
Methods for predicting the future:
“You can read horoscopes, tea leaves, tarot cards, or crystal balls. Collectively these methods are known as ‘nutty methods.’ Or you can put well-researched facts into sophisticated computer models, more commonly referred to as ‘a complete waste of time.’”
– Scott Adams, creator of Dilbert
Data-driven decision making has rightly become one of the best investments a company can make. Far from being “a complete waste of time,” and much better than a crystal ball, analytics has helped companies increase revenue, reduce costs, and even reinvent themselves. There are many ways to use data to drive your decisions and many components to doing it well. There are so many, in fact, that people get lost trying to understand or talk about them. Different authors use different terms for the same thing, and the boundaries between the terms are less like fences and more like transitions.
This article will attempt to briefly describe the Economy of Data Science (EDS). Why “attempt”? For all the reasons above, this is not an easy task. It’s like talking about the transitions between emotions: when does irritated become annoyed, become frustrated, become upset? Why “briefly”? Each of these components is a discipline of its own, and fully comparing, contrasting, and categorizing them would take far more space than this article allows. Why “economy”? Dictionary.com defines economy as “the disposition or regulation of the parts or functions of any organic whole; an organized system or method.” Why “data science”? Let’s start there.
Data Science
The phrase data science was in vogue for much of the first two decades of this century, and it was probably the most ambiguous term in the EDS. It is still used, but popular articles have moved on to machine learning and AI, and soon it will be called something else. The phrase is still very useful, though, for capturing the organic whole of data-driven decision making. It can mean everything from using data to do scientific research to doing research on how to analyze data better. So think of data science as the interdisciplinary field that encompasses all data-driven decision making, in the same way that theater encompasses everything from Shakespeare to set design to scriptwriting to “You’re a Good Man, Charlie Brown” directed by Mrs. DeGraw in high school.
(Business) Analytics
Analytics, sometimes called business analytics when the focus is on business questions, is the umbrella term for data analytics, advanced analytics, and even machine learning.
Data Analytics
I have heard people use Data Analytics to mean analytics as a whole, so watch out for ambiguity. This data-driven decision-making method calculates descriptive statistics that tell us what has happened in the past. These statistics are useful when we don’t have enough data to make a predictive decision. They can also be used to explore the data (exploratory data analysis) to get a better understanding of what’s there before moving to advanced analytics.
Bar charts, line graphs, KPIs, top-10 lists, cross-tabs, means, bullet charts, correlations, gauges with a target, percentages, scatterplots, and even a standard report are all forms of descriptive statistics or ways of presenting them. They would, for example, tell a sales VP how their team is doing in each region. Data Analytics could also be used to evaluate whether there is evidence of a difference between a test group and a control group.
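To make the sales VP example concrete, here is a minimal Python sketch of descriptive statistics: it counts, totals, and averages past sales by region and ranks the regions, the same numbers a bar chart or top-10 table would display. The sales records and the summarize_by_region function are hypothetical, made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, sale amount in dollars).
sales = [
    ("East", 1200.0),
    ("East", 800.0),
    ("West", 1500.0),
    ("West", 700.0),
    ("West", 1000.0),
    ("South", 400.0),
]

def summarize_by_region(records):
    """Return {region: (count, total, mean)} describing what already happened."""
    by_region = defaultdict(list)
    for region, amount in records:
        by_region[region].append(amount)
    return {
        region: (len(amounts), sum(amounts), mean(amounts))
        for region, amounts in by_region.items()
    }

summary = summarize_by_region(sales)

# Rank regions by total sales, like a "top N" table on a dashboard.
ranked = sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)
for region, (count, total, avg) in ranked:
    print(f"{region:<6} n={count}  total={total:8.2f}  mean={avg:8.2f}")
```

Nothing here predicts anything; it only describes the past, which is exactly the role the text gives to descriptive statistics.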
Advanced Analytics
This category is also called inferential statistics. Here, inferential refers to predicting, forecasting, categorizing, and prescribing (cause and effect). These techniques use probability distributions to model data from the past so that an inference can be made about what will happen in the future. You do not need Big Data (another ambiguous term) to use these techniques. Simple but very important decisions, such as which marketing message is better, have been made well with fewer than 100 data points. The more complex the decision, though, the more data is needed. For example, deciding which marketing message is better for each demographic across all products would require a lot of data.
Common advanced analytics are regression analysis, clustering, causal Bayesian analysis, control charts, response surface modeling, time series forecasting and machine learning.
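As an illustration of inference from a small sample, the sketch below applies a two-proportion z-test to the “which marketing message is better” question. This is one common technique, not the only one. The response counts and the two_proportion_z_test helper are hypothetical, and the normal approximation it relies on is rough when counts are small (an exact or permutation test may be preferable then).

```python
from math import erf, sqrt

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two response rates.

    Uses the normal approximation to the binomial, which is only
    approximate for small samples.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both messages perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value

# Hypothetical test: 90 customers split evenly between two marketing messages.
z, p = two_proportion_z_test(successes_a=18, n_a=45, successes_b=8, n_b=45)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A small p-value suggests the observed difference in response rates is unlikely to be due to chance alone, which is the kind of inference that lets a simple decision rest on fewer than 100 data points.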
Machine Learning
Like other statistical techniques, machine learning emerged from multiple disciplines. One of its earliest forms, logistic regression, has roots going back about 100 years, to a time when computing power was nowhere near adequate. Today’s computing power allows us to answer very advanced, complex questions. How do you automatically translate from one language to another? How can a car drive across town by itself? How do you detect habitable planets in noisy data? How do you detect breast cancer at an early stage from mammograms?
Neural nets, decision trees, and logistic regression are all forms of machine learning. People who do machine learning care more about the predictive capability of a model than about the underlying relationship. Because of that, and because they are working with larger data sets, they divide the data into training and testing subsets: the model is fit on the training data and then evaluated on the held-out testing data to make sure its predictions generalize to future data.
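The train/test idea can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the data is synthetic (a noisy straight line), the “model” is a simple least-squares line rather than a neural net or decision tree, and train_test_split here is a hand-written helper (scikit-learn offers a similarly named function). The point is only that the model is fit on one subset and its error is measured on data it never saw.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_squared_error(model, points):
    """Average squared gap between observed y and the model's prediction."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)

# Synthetic data (an assumption for illustration): y = 3 + 2x plus Gaussian noise.
rng = random.Random(42)
data = [(x, 3 + 2 * x + rng.gauss(0, 1)) for x in range(100)]

train, test = train_test_split(data, test_fraction=0.2, seed=1)
model = fit_line(train)  # learn only from the training subset
print("train MSE:", mean_squared_error(model, train))
print("test  MSE:", mean_squared_error(model, test))  # error on unseen data
```

If the test error came out much larger than the training error, that would suggest the model was memorizing the training data rather than generalizing.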
Data Engineering
Data engineering has also gone by other names: ETL (extract, transform, load) development is the “old” name, and data wrangling is a newer alternative. We think of data engineering as the process of extracting data from its source, often staging it somewhere, and preparing it for analysis (either before or after staging), with the goal of automating the process so that, like electricity or oil, data simply flows to where we want to use it.
Data engineering covers a lot of work. Data cleansing is huge because data is messy. (Garbage in, garbage out, as they say.) Many companies are trying to automate data cleansing because it takes so much time, and some are even using machine learning to do it. Data governance is the effort to make sure the data is correct before it even gets into the system. (Don’t let the garbage in to begin with.) Data warehousing, data lakes, data federation, and data marts are all types of data consolidation, storage, and staging.
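A toy extract-clean-load pipeline shows how these pieces fit together. It is a minimal sketch under stated assumptions: the raw CSV, its column names, and the cleaning rules (normalize region names, drop rows with unparseable amounts or a missing region, drop exact duplicates) are all hypothetical, and an in-memory SQLite table stands in for a real staging area or warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, messy on purpose.
RAW_CSV = """customer_id,region,amount
101, east ,1200
102,West,800
102,West,800
103,,450
104,south,not available
105,SOUTH,300
"""

def extract(text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Transform: normalize fields, drop invalid rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        region = row["region"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # garbage in: skip amounts that are not numbers
        if not region:
            continue  # skip rows missing a region
        record = (int(row["customer_id"]), region, amount)
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned

def load(records, conn):
    """Load: write cleaned records to a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(clean(extract(RAW_CSV)), conn)
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
for row in conn.execute(query):
    print(row)
```

In a real pipeline, each step would be scheduled and monitored so that the flow runs without manual effort, which is the automation goal described above.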
Business Intelligence
One of the early ways companies used data to make decisions was called Business Intelligence (BI). (Of course, this started many jokes.) Business Intelligence combines data engineering and data analytics, with some advanced analytics recently thrown in. BI dashboards present the kinds of visualizations discussed above under Data Analytics. These operations were very valuable when the analytics presented were well thought out and regularly evaluated against business strategy.
Analytics Infrastructure
Underneath all of this data-driven decision making is the infrastructure required to make it happen. Much of it is standard IT infrastructure configured for the specific purpose of analyzing data. On top of that, we include the software platforms for data processing, analytics, and visualization.
The analytics infrastructure doesn’t have to be a fancy Hadoop environment in the cloud; it is possible to make powerful decisions with a personal computer. When companies expand beyond ad hoc decision making into using data to optimize the enterprise, the infrastructure will expand, too.
Conclusion
There are many reasons that data-driven decision making is difficult to implement in an organization. Misunderstanding and miscommunication about what is being done shouldn’t be one of them. These descriptions can help everyone “get on the same page.” More important than which term is used, however, is a consistent understanding of what you want to accomplish.
With this brief description of the Economy of Data Science, you can also begin to set up a roadmap of how you want to achieve analytics maturity so that data can help inform your business decisions.