Data analytics systems
are the cutting edge of modern corporate computing. While many
people may feel they are behind the “state of the art” they read
about, the truth is that these are projects we’re currently
implementing for prominent companies in life sciences, finance,
healthcare, Internet services, and aerospace. They have a
lot in common with each other, and likely even with
your computing
environment.
That’s the truth on the ground. Meanwhile, we are constantly
seeing buzzwords in the tech media, as writers struggle to help
everyone understand what’s going on out there. Big Data (BD) and
Business Intelligence (BI) get talked about constantly -- but many
people are unclear on what they mean. So let’s cut through the
clutter and look at what these projects are really about: who
they’re for, what problems they solve, what they have in common,
and how they differ.
What is big data, really?
My friend and Deerfield schoolmate Doug Laney, a distinguished
analyst on Gartner’s data team, famously defined big data as having
volume, velocity, and variety.
Raw data points are available to us all in a volume no one
really anticipated, as nearly every object and action in an
enterprise is tracked. Recently the CTO of a hospital group
described to me their world, in which 22,000 medical devices are
putting out logs of data all the time. The volume of information on
hand is overwhelming. How do we move it around and store it? How
does one make sense of it?
Adding to this is a non-stop stream of new data points, coming
in at high velocity. Our friend and client Tom Doris, founder of
financial analytics firm OTAS and now a LiquidNet executive, says
many stock analysts use OTAS’s systems to organize the millions of
new data points emerging every day from each of several stock
markets.
This seems hard enough, and then you add in the enormous variety
of data sources relevant to making good decisions. The delightful
Chris Mackey, CEO of our client Mackey RMS, focuses on organizing the
extensive research and collaboration that goes into major decisions
for hedge funds and the like. How do you choose a course of action
when crucial data is in a dozen different formats, on numerous
different servers, behind a range of APIs and addresses?
Big data is the ultimate “be careful what you wish for”
scenario. Do you wish you knew what was going on? Okay -- now what
would you do if you knew almost everything that was going
on? It’s like buying the daily output from a gold mine. No human,
without massive machine assistance, can extract most of the value
from that torrent of gold ore.
Then what’s business intelligence?
The idea of business intelligence predates computers, but has
been made much more important -- and more useful -- by the vast
amount of data we now have. BI is the practice of making better
decisions through better decision-support systems.
These systems can be as simple as reporting and charting
software, or as elaborate as machine learning and artificial
intelligence. And they rely on organized streams of input data --
which don’t even have to be “Big” to be extremely useful. In fact,
a lot of BI involves digesting the complexity of the raw data,
bringing it down to human-usable tools like dashboards, metrics,
and exception detection. Many BI systems are hierarchical --
presenting decision-makers with a summary of the current situation,
and features to filter or explore the data to learn more about any
part.
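To make that concrete, here is a minimal sketch in Python of the
exception-detection idea: digesting a raw stream of readings into a
dashboard-style summary. The readings and the 95-105 expected band
are assumptions invented for the example, not any client’s system.

```python
# A minimal sketch of exception detection: digest raw readings into a
# dashboard-style summary, flagging values outside an expected band.
# The readings and the 95-105 band are illustrative assumptions.
daily_readings = [98.2, 99.1, 97.8, 112.4, 98.9, 96.5]
LOW, HIGH = 95.0, 105.0

summary = {
    "count": len(daily_readings),
    "mean": round(sum(daily_readings) / len(daily_readings), 1),
    "exceptions": [r for r in daily_readings if not LOW <= r <= HIGH],
}
print(summary)
# {'count': 6, 'mean': 100.5, 'exceptions': [112.4]}
```

A real BI layer performs the same digestion at far greater scale,
feeding the summary to a dashboard while keeping the raw data
available for drill-down.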
Our client Seattle Cancer
Care Alliance, for example, provides life-saving treatments at
several leading cancer-care institutions. Already, they provide
outstanding care to a great many patients. But wouldn’t it
be even more exciting to constantly learn from the outcomes of all
these treatments, to see which therapies are working best for what
sorts of cases -- and then to use this knowledge to deliver the
best possible course of care for every future patient? While a
typical analysis might only involve thousands of patients in total
-- hardly enough to sound like Big Data -- the caliber of insight
that must be provided is exceptionally high.
For a very different example, consider the project we’re working
on right now with a multi-billion-dollar manufacturing company. As
is typical these days, their big expensive machines have a computer
on board that constantly logs their performance. But a lot of this
data just goes into storage, with no one looking at most of it.
What they want is to understand the leading causes of
breakage and downtime, and gradually eliminate these -- through
offline analysis to discover best practices for maintenance, and
near-real-time analysis to improve operations plans during the
workday -- making their operators into computer-assisted
super-operators. That’s business intelligence, turning available
data into better decision-making.
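As a deliberately simplified illustration of that offline analysis,
here is a Python sketch that ranks the leading causes of downtime
from machine logs. The log format, machine names, and causes are
invented for the example.

```python
# A minimal sketch of the BI idea above: turn raw machine logs into a
# ranked list of the leading causes of downtime. All data here is
# invented for illustration.
from collections import Counter

machine_log = [
    {"machine": "press-3", "cause": "bearing wear", "downtime_hours": 2.0},
    {"machine": "press-3", "cause": "operator error", "downtime_hours": 0.5},
    {"machine": "lathe-1", "cause": "bearing wear", "downtime_hours": 1.5},
]

downtime_by_cause = Counter()
for event in machine_log:
    downtime_by_cause[event["cause"]] += event["downtime_hours"]

# The decision-support output: worst causes first.
for cause, hours in downtime_by_cause.most_common():
    print(f"{cause}: {hours:.1f} hours of downtime")
```

The real system ingests vastly more data, but the shape of the job
is the same: aggregate, rank, and hand the decision-maker a short
list worth acting on.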
How can I get both?
As you’ve probably guessed, BD and BI aren’t competing
approaches -- they are IT architectures that play well together,
with Business Intelligence as essentially a layer on top of Big
Data.
We find that most companies already have good IT organizations
in place, with the skills to develop new software when needed, and to
integrate existing Commercial Off-the-Shelf (COTS) tools when
available. The problem, then, isn’t lack of building blocks. Anyone
can obtain or write a program to input a table of data and graph
it, or compute subtotals. The problem is how to put these building
blocks together, and especially, how to take trivial solutions to
production scale.
We break the BD/BI work into three doable pieces: DevOps, DataOps, and cloud
application architecture.
DevOps is another jargon term in constant use -- including here
at FP Complete. It means the engineering that happens *after*
you’ve written some code but *before* your end user receives the
final results on-screen. DevOps is a set of tools and best
practices for scaling up: from a data analysis that runs one time,
on one user’s machine, to a system that runs all the time, on a
reliable and scalable and secure cloud-based system, to support
everyone who needs the answers. If you’re still using manual
processes and mysterious “IT wizards” to scale up your analyses
from the laptop to the data center, you’re not going to reach Big
Data scale or achieve much Business Intelligence. DevOps is a
proven set of techniques and technologies for integration,
deployment, scale-up, and continuous operations.
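To illustrate the scale-up DevOps is aiming at, here is a minimal
Python sketch that takes an analysis which might once have run by
hand on one laptop and wraps it as a small, always-on HTTP service.
The function, its output, and the port are assumptions made for the
example, not a prescribed design.

```python
# A minimal sketch: the analysis that once ran by hand on a laptop,
# wrapped as an always-on HTTP service anyone can query. All names and
# values here are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def daily_downtime_summary():
    # Placeholder for the real analysis; in practice this would read
    # from the organized data feeds described below.
    return {"machines_reporting": 42, "downtime_hours": 3.5}

class AnalysisHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(daily_downtime_summary()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # In production this sits behind the CI/CD, monitoring, and security
    # tooling that DevOps provides, rather than being started by hand.
    HTTPServer(("", 8080), AnalysisHandler).serve_forever()
```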
DataOps is a newer concept -- it’s “DevOps for data.” Just as
numerous tools can clean up and scale up your analytics apps, a
parallel set of tools can clean up and scale up your actual data
feeds. DataOps includes data cleansing, schema enforcement, storage
and replication, warehousing and repositories, metadata management,
version management, uniform API provision, security and monitoring
-- all the tools and processes to turn your “pile” of data into an
“answer factory” capable of responding to any reasonable query, and
constantly ingesting and incorporating the latest data streams.
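As one small example of what that automation can look like, here is
a Python sketch of schema enforcement and cleansing on an incoming
feed. The field names, types, and rules are illustrative
assumptions, not any client’s real schema.

```python
# A minimal sketch of one DataOps step: schema enforcement and cleansing
# on an incoming feed. Field names, types, and rules are illustrative
# assumptions.
from datetime import datetime
from typing import Optional

EXPECTED_FIELDS = {"device_id": str, "timestamp": str, "reading": float}

def clean_record(raw: dict) -> Optional[dict]:
    """Return a validated, normalized record, or None if unsalvageable."""
    record = {}
    for field, ftype in EXPECTED_FIELDS.items():
        if raw.get(field) is None:
            return None  # reject: required field missing
        try:
            record[field] = ftype(raw[field])
        except (TypeError, ValueError):
            return None  # reject: wrong type and not coercible
    # Normalize timestamps to one canonical ISO 8601 form.
    try:
        record["timestamp"] = datetime.fromisoformat(record["timestamp"]).isoformat()
    except ValueError:
        return None  # reject: unparseable timestamp
    return record

feed = [
    {"device_id": "pump-7", "timestamp": "2017-03-01 04:00:00", "reading": "98.6"},
    {"device_id": "pump-7", "timestamp": "not a date", "reading": "98.7"},
]
cleaned = [r for r in map(clean_record, feed) if r is not None]
# cleaned contains only the first record, normalized.
```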
Cloud application architecture means designing your distributed
system -- servers, apps, tools, work processes, jobs, and data
flows -- into a sensible whole. These days, almost no one should be
designing a major new IT system from scratch. If your company is
mostly writing brand-new software from a blank screen, you’re
duplicating work and losing time. Understanding best practices
and existing IT architectures, and picking components from the
existing inventory, will usually get you 80% of the way toward a
good solution. Reuse makes all the difference! Cloud features and
distributed, service-oriented architectures make
building-block-style development productive and fast. Bug-resistant
architectures, with clear separation of responsibilities, will
allow you to break your IT system into pieces -- most not written
from scratch -- each maintainable on its own schedule, and
improvable at will.
What’s realistic to expect?
The good news is that Big Data is not an all-or-nothing
proposition, and neither is Business Intelligence. You can make
stepwise progress on both, which is exactly what we encourage our
clients to do.
Phase 1 will be BI with the limited portion of your data that’s
already in good condition. It’s fairly straightforward to create
new IT solutions -- I don’t say new apps here, because these
solutions will use existing code for much of the work --
that will answer whatever you feel are the most pressing questions
about your data. You are probably already doing some of this,
without even calling it business intelligence. Most companies stay
in Phase 1 for years, never really getting the answers they wish
they had, but at least answering a few crucial questions with
hand-built systems.
Phase 2 will be basic DevOps -- turning your IT work into an IT
factory, in which any analysis that runs for *someone* can be
turned into an analysis that runs for *everyone, all the time* --
maintainably, reproducibly, reliably, scalably, and securely. Likely
steps here include Version Control, Continuous Integration,
Continuous Deployment, Automated Testing, Cloud
Scalability, System Monitoring, and possibly Security Auditing.
With many of these things implemented, you will see your BI
productivity go way up, with new solutions coming online regularly
and predictably.
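For instance, here is what one of those steps, Automated Testing,
might look like for the downtime analysis sketched earlier: a Python
test that a Continuous Integration server runs on every code change.
The function and the expected values are, again, invented for
illustration.

```python
# A minimal sketch of Automated Testing for the downtime analysis
# sketched earlier; a CI server would run this on every code change.
# The data and expected values are illustrative assumptions.
def downtime_by_cause(events):
    """Sum downtime hours per failure cause from a list of log events."""
    totals = {}
    for event in events:
        totals[event["cause"]] = totals.get(event["cause"], 0.0) + event["hours"]
    return totals

def test_downtime_by_cause():
    events = [
        {"cause": "bearing wear", "hours": 2.0},
        {"cause": "bearing wear", "hours": 1.5},
        {"cause": "operator error", "hours": 0.5},
    ]
    assert downtime_by_cause(events) == {
        "bearing wear": 3.5,
        "operator error": 0.5,
    }
```

Once tests like this run automatically, a change that breaks an
analysis is caught before it reaches anyone’s dashboard.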
Phase 3 will be basic DataOps, launched when you rapidly
discover that the questions you really want answered require data
that’s “somewhere around here” and not yet organized. You can
expect to inventory the many formal and informal data feeds you
depend on: what format they’re in, how they arrive, how accurate
they are, and how they are accessed. A set of automated
systems will be set up to filter, correct, or “cleanse” these
feeds, and then to make them available on high-powered, typically
cloud-based, distributed data servers. A set of metadata or “tables
of contents” will be set up to help your team locate and tap into
the data sources needed to answer a particular query. Data sources
will likely always be federated, with no one format conquering all,
and with cloud services bridging the differences. With DataOps
implemented, you can expect to pose any reasonable question about
“what’s really going on” and, if the data is present somewhere, to
build a system that answers it.
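As a sketch of what those “tables of contents” might look like at
their simplest, here is a hypothetical Python metadata catalog.
Every entry, field, and location is an assumption made for
illustration.

```python
# A minimal sketch of a metadata "table of contents" for federated data
# sources. Every entry, field, and location here is a hypothetical
# example, not a real catalog.
CATALOG = [
    {"name": "device_logs", "format": "json", "arrives": "streaming",
     "location": "s3://example-bucket/device-logs/", "owner": "ops"},
    {"name": "maintenance_records", "format": "csv", "arrives": "nightly batch",
     "location": "warehouse.maintenance", "owner": "facilities"},
]

def find_sources(keyword: str) -> list:
    """Return catalog entries whose name mentions the keyword."""
    return [entry for entry in CATALOG if keyword in entry["name"]]

print(find_sources("logs"))  # -> the device_logs entry
```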
The difference between Big Data and Business Intelligence will
fade
We find that mastery of data streams is more and more central to
every industry. Whether you’re in financial technology (FinTech),
aerospace, life sciences, or health care, your world is likely to
look more and more like the world of secure Internet services and
cloud computing. People in every industry tell us that this is
where they’re going.
As automation increases, Big Data will become the norm, and
we’ll soon just be calling it Data. Just as DevOps is becoming the
norm for innovative IT groups, so will DataOps. IT departments will
more and more resemble a two-sided “zipper,” marrying
ever-improving data inputs with ever-improving software inputs,
into ever-improving online solutions that run in their data centers
and in the cloud.
It will be a long road, but realistically we can look forward to
a future in which any question you have about your operations, your
customers, your patients, your research, can be answered with real
data -- reliably, reproducibly, and all the time.