“Too often we forget that genius, too, depends upon the data within its reach, that even Archimedes could not have devised Edison’s inventions.” – Fr. Ernest Dimnet, The Art of Thinking, 1928
Companies that make decisions based on data consistently outperform their competitors. Our “genius” as a business is only as good as the data we collect and the techniques we use to analyze it. Many analytical techniques are available, offering increasing levels of competitive advantage and business value. The graphic below, first published many years ago and redrawn many times since, shows what many call the Analytical (or Analytics) Continuum.
The chart shows the types of analytical activities along the continuum. You’ll see the bottom four and top four labeled differently by different people: Business Intelligence and Business Analytics, Descriptive Statistics and Inferential Statistics, or Data Analytics and Advanced Analytics. It doesn’t really matter as long as we know what we’re talking about.
Some thoughts to consider first. The chart draws these activities with greater value and greater advantage as they become more advanced. There is much truth to that, but it does not mean there is no value in the lower-left activities. In fact, they provide the foundation for many data-driven decisions. We need the inventions of both Archimedes and Edison. Nor does it mean that a company has to start at the lower left and work its way through each activity before it gets to the upper right. There is great value in forecasting whether or not one has “completed” query drilldowns. Finally, none of these activities is one-and-done. All aspects of the company’s operations will benefit from each of them.
The more parts of the company that use these activities in their decision-making, and the more integrated those parts are, the better off the company will be. This is true up and down the continuum. For example, the learnings from a prescriptive analysis performed in Sales can be implemented as alerts in Marketing for a better, smoother operation all around. To illustrate what each activity does, we’ll stay within the Sales function of the company. These examples try to give a sense of the benefits without getting bogged down in details.
Standard Reports

Standard reports are what companies have been producing for decades. There may be more or different information in them now, but essentially they are the same type of report Joseph gave to the Pharaoh in ancient Egypt: how much of each type of grain do we have in each silo in preparation for the coming drought? In the Sales world, this might be a report of monthly sales activity. How many calls, contacts and visits has each salesperson made this month? How many opportunities are in each stage of the pipeline? Such a report takes data from the last month and aggregates it into various categories by particular filters. These reports used to be static but have been dynamic since Business Intelligence came into its own.
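As a minimal sketch, a standard monthly activity report is just an aggregation of raw records into categories. The names and counts below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical raw activity records: (salesperson, activity, count)
records = [
    ("Ana", "calls", 42), ("Ana", "visits", 5),
    ("Ben", "calls", 31), ("Ben", "visits", 9),
    ("Ana", "calls", 8),
]

# Aggregate by salesperson and activity type, as a standard
# monthly report would.
report = defaultdict(lambda: defaultdict(int))
for person, activity, count in records:
    report[person][activity] += count

for person in sorted(report):
    print(person, dict(report[person]))
```

A Business Intelligence tool does the same aggregation, but dynamically, with interactive filters instead of hard-coded ones.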
Ad hoc Reports
Standard reports are good for the known questions that come up with regularity. Not every question is known ahead of time, though. For those that come up unexpectedly, a data analyst or data scientist will investigate the data and create an ad hoc report that answers them. This could take hours or weeks, depending on how readily available and integrated the data already is. The more siloed an organization has kept its data, the harder and more error-prone its ad hoc reports become.
Suppose that your company is acquiring another company in your region. To smooth the integration, the data analyst is asked to produce a list of the clients and point-of-contact names your sales team has worked with in the last 10 years, along with the products sold and the revenue generated by each client. One data issue that can cause big problems with this request is a system change in which the old data was not carried forward. If a new system was put in place two years ago, it may be difficult, or even impossible, to gather the data from three to 10 years ago, depending on the decisions made at the time.
Query Drilldown

One of the first forms of data analytics was the data cube. You can think of it as a multi-dimensional spreadsheet: rows, columns, worksheets and files all in one big database. With pre-calculated aggregations, percentages and the like, mining this data was fast and easy, as long as you wanted to go where the path was laid. In usage, the query drilldown is a little like an ad hoc report delivered in a standard-report style. Newer database platforms no longer require pre-calculated cubes to provide fast drilldowns.
For example, suppose that we are sponsoring a healthcare analytics conference. We might want to know how our team is doing with healthcare companies this year. We also might want to know which salesperson is doing the best and which is the worst. Do we send one or both to the conference? If we keep all of our sales data in the same database, then we can answer ad hoc questions like these without doing a lot of digging and combining.
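The healthcare question above boils down to a filter plus an aggregation. This sketch uses invented rows and names; a real drilldown would run the equivalent query against the sales database:

```python
# Hypothetical sales rows: (salesperson, client_industry, revenue)
sales = [
    ("Ana", "healthcare", 120_000),
    ("Ben", "healthcare", 80_000),
    ("Ana", "retail", 50_000),
    ("Cara", "healthcare", 150_000),
]

# Filter to healthcare clients and total the revenue per salesperson.
by_person = {}
for person, industry, revenue in sales:
    if industry == "healthcare":
        by_person[person] = by_person.get(person, 0) + revenue

best = max(by_person, key=by_person.get)   # strongest performer
worst = min(by_person, key=by_person.get)  # weakest performer
```

With all sales data in one database, answering "who should go to the conference?" is this simple; with siloed data, each source must first be found and reconciled.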
Alerts

Alerts are like alarms that let you know when something needs to happen. Do you have an alarm to wake up? One for when the bank balance gets too low? One to pick up the kids? Certainly one for an upcoming meeting. When certain conditions are met, an alert is triggered to let someone know to take action. Depending on the urgency, these alerts can be emails, texts, phone calls or some other communication channel.
The manager of our imaginary sales organization thinks it’s important that each member of the sales team calls new prospects each week. A certain number of calls turns into a certain number of discussions, which turn into a certain number of meetings, and so on. However, the manager doesn’t want to keep checking a report each week to see whether everyone is meeting their targets. Instead, they set up an alert: if someone does not meet their target calls for the week, an email is sent to the manager informing them that “so-and-so” only made XX calls last week.
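The core of such an alert is a simple threshold check run on a schedule. The target and the call counts here are invented; a real system would pull them from the sales database and send the messages by email:

```python
TARGET_CALLS = 20  # hypothetical weekly target per salesperson

# Hypothetical call counts pulled from the sales system for last week.
weekly_calls = {"Ana": 25, "Ben": 12, "Cara": 20}

# Anyone below target triggers an alert message; a real system would
# deliver these by email, text or another channel.
alerts = [
    f"{person} only made {calls} calls last week"
    for person, calls in weekly_calls.items()
    if calls < TARGET_CALLS
]
```

The manager only hears about exceptions, which is exactly the point: the report still exists, but nobody has to read it when everything is on track.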
Advanced Analytics

Now we move into Advanced Analytics. The order of these activities in the graphic is not set in stone. Beyond the methodology used, the real driver of value to the organization is the decision to which that methodology is applied. What food we serve in the cafeteria is an important question, and optimizing calorie options will be good for employees and, therefore, good for the company. At the same time, if we are not forecasting sales for the next quarter, we will have far bigger troubles that are good for no one. Optimization usually delivers the highest value, but the other three can change places depending on what they are applied to.
Prescriptive Modeling

Prescriptive modeling answers the question of why something is happening, or what we can do to make things happen a certain way. For these models, the relationship between the variables and the outcomes is of primary importance. Variables are added or removed depending on their ability to explain why the outcome behaves as it does. If a methodology is a black box to the data scientist, then it cannot describe that relationship. Work is being done so that more methodologies can be prescriptive, but not all of them can be.
Our imaginary sales team has been working with the product developers to determine which product features are important, which are not, and whether any features are missing. To do this, they talk to customers. They show them the product, ask them to rate overall how much they like it, and then have them identify how important each of the features is. This data is collected and modeled to determine the relationship between the overall rating and each feature, either separately or possibly in groups. The analysis will show which features decrease the overall rating and which increase it, helping the team develop the product roadmap for future releases.
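One simple way to quantify such a relationship is an ordinary least-squares fit of the overall rating against a feature's importance score; the sign and size of the slope say whether, and how strongly, the feature drives the rating. The survey numbers below are invented for illustration:

```python
# Hypothetical survey data: each customer's importance score for one
# feature (x) and that customer's overall product rating (y).
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.1, 4.8, 6.2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope: covariance(x, y) / variance(x).
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
# A positive slope suggests the feature raises the overall rating;
# a negative one suggests it hurts.
```

A transparent model like this is what makes the analysis prescriptive: the coefficient itself is the answer, not just the prediction.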
Forecasting

Yogi Berra said, “It’s tough to make predictions, especially about the future.” Almost all inferential statistics is about making predictions about the future. The reason it is so hard is that we only have the past to tell us what’s going to happen. As the saying often attributed to Mark Twain goes, “History never repeats itself, but it does often rhyme.” The past only helps with the future if things stay the same, and after the pandemic years we know that things often change unexpectedly. Even so, we can still forecast some processes.
Both prediction and forecasting tell us what is going to happen with some outcome in the future based on what has happened in the past. The main difference is that prediction uses extra variables to predict the outcome, whereas forecasting uses the past behavior of the outcome itself. Some methodologies use both, and when that can be done, the forecasts or predictions tend to be better.
For example, suppose we want to forecast sales for the next quarter. By doing a time series analysis, we can use past sales data to predict what our sales will be in the future, assuming that the behavior stays essentially the same. We can incorporate seasonality into the model, too. If we typically have a peak in the summer, for example, we likely will this year as well. If we can also include extra variables, we can get an even better forecast. For example, we can incorporate certain one-off events, such as an industry conference or a promotional coupon. The pricing of our competitors may be included if we have a reasonable way to forecast that. Depending on what we’re selling, weather may be valuable. Remember, the more data within reach, the better our decisions.
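A full time series model is beyond a short sketch, but a seasonal-naive forecast with a trend adjustment shows the idea: next quarter looks like the same quarter last year, shifted by the average year-over-year change. The quarterly figures below are invented:

```python
# Hypothetical quarterly sales for three years (Q1..Q4 per row).
sales = [100, 120, 150, 110,
         105, 126, 158, 116,
         110, 132, 165, 121]

SEASON = 4  # quarters per year

# Average year-over-year change across all observed quarters.
yoy_changes = [sales[i] - sales[i - SEASON] for i in range(SEASON, len(sales))]
trend = sum(yoy_changes) / len(yoy_changes)

# Seasonal-naive forecast: same quarter last year plus the trend.
forecast_next_quarter = sales[-SEASON] + trend
```

This captures the summer peak automatically, because the forecast for each quarter starts from that same quarter a year earlier. Extra variables such as a conference or a coupon would enter as adjustments on top of this baseline.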
Infrastructure

Underneath all of this data-driven decision-making is the infrastructure required to make it happen. Much of this is standard IT infrastructure designed for the specific purpose of analyzing data. On top of that, we include software platforms for data processing, analytics and visualization.
The analytics infrastructure doesn’t have to be a fancy Hadoop environment in the cloud. It is possible to make powerful decisions with a personal computer. When companies expand beyond ad hoc decision-making into using data to optimize the enterprise, the infrastructure will expand, too.
Predictive Modeling

Predictive models, especially but not exclusively those built through machine learning, can be created for the sole purpose of predicting the future. All variables are included if they help the prediction; there is no concern about how or why a variable is related to the outcome. Think of Netflix recommendations and Google or Facebook ads. Getting the right prediction for the next movie to watch can matter more in this situation than understanding why.
Suppose we want to predict sales for the next seven days, and we have 25 inputs to use, such as day of the month, month, day of the week, promotion, state holiday, school holiday, sales over the previous seven days and promotions over the previous seven days. We don’t care, at this point anyway, what drives the prediction, so we use a neural net to build the model. We divide the data into Train, Test and Validate data sets so that we do not overfit the model (which can happen in pure prediction) and so that it will predict well on future data.
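The neural net itself needs a machine-learning library, but the split that guards against overfitting is plain bookkeeping. As a sketch, with 100 stand-in rows in place of real (inputs, sales) records:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

# Stand-in for 100 historical rows; real rows would hold the 25
# input fields described above plus the sales outcome.
rows = list(range(100))
random.shuffle(rows)

# A common 60/20/20 split: fit on Train, tune on Test, and judge the
# final model once on the held-out Validate set.
train, test, validate = rows[:60], rows[60:80], rows[80:]
```

Because the Validate set is never touched during fitting or tuning, its error is an honest estimate of how the model will do on genuinely new data.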
Optimization

The high point of Advanced Analytics is optimization. Optimization tries to find the best settings of a system according to some success criteria. Suppose we have five errands around town to run this afternoon, and we want to finish in the shortest time possible. We optimize our afternoon by minimizing the time it takes, given the constraints of the locations and distances of the errands.
The same idea works for optimizing a system in an organization. We have limited resources (one person, our driving speed) with certain constraints (the locations and distances of the errands). We want to optimize some outcome (the time it takes) according to some success criterion (as quickly as possible). In a company, such systems exist at many levels, with many systems within each level: one salesperson optimizing their time, a sales department optimizing its marketing budget, or an enterprise optimizing product inventory based on the supply chain and sales forecasts.
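The errand problem can be sketched directly, trimmed to three stops with invented driving times, by checking every possible visit order and keeping the fastest:

```python
from itertools import permutations

# Hypothetical driving times in minutes between home (0) and three
# errands (1-3); t[a][b] is the time from stop a to stop b.
t = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

def route_time(order):
    """Total time for home -> each errand in `order` -> home."""
    stops = (0, *order, 0)
    return sum(t[a][b] for a, b in zip(stops, stops[1:]))

# Exhaustive search over every visit order: fine for a few errands,
# but the number of orders explodes as stops are added, which is why
# larger systems need smarter optimization techniques.
best_order = min(permutations([1, 2, 3]), key=route_time)
```

Brute force works here only because three errands give six possible routes; a warehouse or supply-chain problem needs the methods listed below.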
Many optimization methodologies can be used, depending on the type of system. They include linear, quadratic, nonlinear and dynamic programming, as well as search heuristics such as hill climbing, simulated annealing and genetic algorithms.
Artificial Intelligence

You may ask why Artificial Intelligence is not included in the continuum. The main reason is that general Artificial Intelligence doesn’t really exist yet. What does exist applies automation to all of the previous techniques to create Artificial Narrow Intelligence, or Weak AI.
Data-driven decision-making improves the success of a company. Decisions are better when we understand the past and use it to inform the future. Over the years, almost all companies have adopted the Data Analytics half of the continuum in at least some part of their organization. More and more have added some Advanced Analytics capabilities here and there. The most successful companies are the ones that have built many Advanced Analytics capabilities into decision-making processes across the organization.
Wherever you sit in your organization, adding other analytics activities will help you to be more successful. They can be more advanced but don’t have to be. Any time you add analytics activities, you make better decisions.