“It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts.” — Sherlock Holmes
If you have good data and leverage it wisely, you can accurately predict the future.
It's called predictive analytics – the process of analyzing historical data in an attempt to predict future events. Done correctly, it yields actionable insights.
It's all the rage in this era of big data and machine learning, but it predates both, going back to at least the 17th century.
Lloyd’s of London – most likely the world’s first major insurer – covered the shipping industry. Experts would analyze known variables, assess risks, and decide whether to insure a client. (Unrelated, but interesting nonetheless: Bankers would accept the risk for a given trip and write their names under the risk information – hence, underwriting.)
We’ve come a long way. Today, predictive analytics generally refers to the use of data and statistical algorithms to create mathematical models that, using machine learning, predict the likelihood of future outcomes based on historical data. Think credit scoring or Amazon’s anticipatory shipping.
In the Harvard Business Review, author, professor, and analytics expert Tom Davenport provides a primer on how predictive analytics works. He makes it sound so simple: “All we have to do is gather the right data, do the right type of statistical model, and be careful of our assumptions.”
Simple? Maybe. Easy? Not So Much.
Davenport’s recipe may be simple, but that doesn’t mean it’s easy. To preserve the simplicity, we’ll stick to his format. Here’s an overview of those three basic elements:
- Data: The quality of the analysis depends on the quality of the historical data. Data analytics can’t fix inaccurate or incomplete data. In other words, junk in, junk out.
- Statistical modeling: This can include a range of techniques; regression is the most common.
- Assumptions: The underlying assumption of predictive modeling is that the future will behave like the past. (After all, you are predicting the future based on historical data.) But we know that doesn’t always happen, and faulty assumptions can undermine the model.
Let’s start with the first.
Selecting and “Cleaning” the Data
Everything depends on the type and quality of data being collected. “Lack of good data is the most common barrier to organizations seeking to employ predictive analytics,” says Davenport.
It is essential to knock down silos and collect data from across the enterprise in a data warehouse. Generally, the more data the better, but it’s important to balance data quantity with quality. KPMG’s Shreeshant Dabir offers this warning:
“Massive datasets are great, but a lot of valuable insights can also be uncovered from small, high-quality datasets. Instead of spending all of your time and resources on gathering as much data as possible, start exploring what the data could be telling you and then focus your efforts on finding and curating the right data to support deeper insights.”
Data collection is only part of the process; data must also be structured and organized. Data can be flawed for various reasons, some of them human (e.g., self-reporting errors, improperly labeled data, and poorly phrased questions) and some not (e.g., incomplete data aggregation and poor standardization methods).
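To make those flaws concrete, here is a minimal sketch of a cleaning pass over a handful of made-up survey records. The field names (region, age, purchased), the label mappings, and the plausible age range are all assumptions for illustration; a real project would define its own rules.

```python
# Hypothetical raw records showing common flaws: inconsistent labels,
# missing values, and an implausible self-reported age.
raw_records = [
    {"region": "North", "age": "34", "purchased": "yes"},
    {"region": "north ", "age": "41", "purchased": "Y"},
    {"region": "N", "age": "", "purchased": "no"},
    {"region": "South", "age": "230", "purchased": "no"},
    {"region": "south", "age": "29", "purchased": None},
    {"region": "South", "age": "52", "purchased": "No"},
]

# Assumed mappings that standardize free-form labels.
REGION_LABELS = {"north": "North", "n": "North", "south": "South", "s": "South"}
PURCHASE_LABELS = {"yes": True, "y": True, "no": False, "n": False}


def clean_record(record):
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    region = REGION_LABELS.get((record.get("region") or "").strip().lower())
    purchased = PURCHASE_LABELS.get((record.get("purchased") or "").strip().lower())
    age_text = (record.get("age") or "").strip()

    # Incomplete or unrecognized values: drop the record rather than guess.
    if region is None or purchased is None or not age_text.isdigit():
        return None

    age = int(age_text)
    # Assumed plausibility range for a self-reported age.
    if not 18 <= age <= 100:
        return None

    return {"region": region, "age": age, "purchased": purchased}


cleaned = [c for c in (clean_record(r) for r in raw_records) if c is not None]
dropped = len(raw_records) - len(cleaned)

for row in cleaned:
    print(row)
print(f"Kept {len(cleaned)} of {len(raw_records)} records; dropped {dropped}.")
```

Dropping unrepairable records is only one option. Depending on the analysis, it may be better to flag them for review or go back to the source, since silently discarding data can introduce its own bias.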
Only after the data has been prepared can the statistical modeling begin.
Various tools can be used to create statistical models. Regression analysis is one of the most common in predictive analytics. It entails looking at variables and assessing the relationships between them. “Using that regression equation, the analyst can then use the regression coefficients – the degree to which each variable affects the purchase behavior – to create a score predicting the likelihood of the purchase,” Davenport explains.
GutCheck, a market research firm, offers an illustration related to – what else? – market research. A retailer may want to learn how much the probability of an online purchase is affected by the ease of product search and the cost of delivery. “The regression output could show that the ease of product search has a stronger association with a likelihood to purchase and as a result, more focus should be placed on improving that variable over delivery cost.”
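The retailer example can be sketched in a few lines of Python. This is a toy under stated assumptions, not GutCheck's or Davenport's actual method: the 40 visits and their purchase counts are invented, both variables are reduced to yes/no flags (easy_search, paid_delivery), and the model is a logistic regression fitted with plain gradient descent rather than a statistics package.

```python
import math

# Hypothetical data: 40 site visits, 10 per combination of
# (easy_search, paid_delivery), with an assumed number of purchases in each.
cells = [
    # (easy_search, paid_delivery, purchases out of 10 visits)
    (1, 0, 8),
    (1, 1, 6),
    (0, 0, 4),
    (0, 1, 2),
]
rows = []
for easy, paid, purchases in cells:
    for i in range(10):
        rows.append((easy, paid, 1 if i < purchases else 0))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, learning_rate=0.5, steps=5000):
    """Fit an intercept and two coefficients by plain gradient descent."""
    b0 = b_easy = b_paid = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g_easy = g_paid = 0.0
        for easy, paid, bought in rows:
            error = sigmoid(b0 + b_easy * easy + b_paid * paid) - bought
            g0 += error
            g_easy += error * easy
            g_paid += error * paid
        b0 -= learning_rate * g0 / n
        b_easy -= learning_rate * g_easy / n
        b_paid -= learning_rate * g_paid / n
    return b0, b_easy, b_paid


b0, b_easy, b_paid = fit_logistic(rows)


def purchase_score(easy, paid):
    """Predicted probability of purchase for one visit."""
    return sigmoid(b0 + b_easy * easy + b_paid * paid)


print(f"easy search coefficient:   {b_easy:+.2f}")
print(f"paid delivery coefficient: {b_paid:+.2f}")
print(f"score, easy search + free delivery: {purchase_score(1, 0):.2f}")
print(f"score, hard search + paid delivery: {purchase_score(0, 1):.2f}")
```

With this invented data, the easy-search coefficient comes out at roughly +1.79 and the paid-delivery coefficient at roughly -0.98, so search ease shows the stronger association with purchasing. Comparing coefficient sizes this directly only makes sense because both inputs are 0/1 flags on the same scale; variables measured in different units would need to be standardized first.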
Another interesting tool is the neural network, which, as the name suggests, is loosely modeled on the way neurons in the brain connect. Neural networks learn to recognize patterns in data, which lets them model extremely complex relationships.
But just as these models require accurate data, they also require sound assumptions.
Getting the Assumptions Right
Remember when we said the underlying assumption of predictive modeling is that the future will behave like the past? That becomes a problem when conditions change and the assumption no longer holds. And we all know that past performance does not guarantee future results.
Sometimes, the problem is that a key variable was excluded. Davenport points to the 2008 financial crisis as an example of how faulty assumptions can invalidate models. The models in question predicted the likelihood of borrowers repaying their mortgages.
“The models didn’t include the possibility that housing prices might stop rising, and even that they might fall. When they did start falling, it turned out that the models became poor predictors of mortgage repayment. In essence, the fact that housing prices would always rise was a hidden assumption in the models.”
We all know what happened.
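One simple way to surface a hidden assumption like this is to check whether new inputs fall outside anything the model saw during training. The sketch below is an illustration only, with invented numbers; it is not how the mortgage models worked, and it is not something Davenport proposes.

```python
# Hypothetical training data: yearly house price change (%) seen while the
# model was built. Every value is positive, so "prices rise" is baked in.
training_price_changes = [3.1, 4.8, 6.2, 5.5, 7.9, 4.0, 9.3]


def observed_range(values):
    """Return the (min, max) the model has actually seen for an input."""
    return min(values), max(values)


def outside_training_range(value, seen_range):
    """True if the model would have to extrapolate beyond its training data."""
    low, high = seen_range
    return value < low or value > high


seen = observed_range(training_price_changes)

# Hypothetical new observations, including a year of falling prices.
new_price_changes = [5.0, 8.1, -6.5]
flags = [outside_training_range(v, seen) for v in new_price_changes]

for value, flagged in zip(new_price_changes, flags):
    status = "outside training range, treat prediction with caution" if flagged else "ok"
    print(f"{value:+.1f}%: {status}")
```

A range check will not tell you what the model should predict when prices fall. It only warns that the model is being asked about conditions it has never seen, which is the moment to revisit the assumptions.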
Then Do It Again
Predictive analytics is an iterative process. Once the models are built and the assumptions checked, the models need to be tested. But even that is not enough.
As new data is added, the models will change.
“The very nature of predictive models is that they may lose accuracy over time,” according to a Forrester report. It notes that more sophisticated predictive analytics solutions “include features to monitor the ongoing efficacy of models in production by comparing model output with established key performance indicators and testing new models using a champion/challenger or A/B testing scheme.”
It’s back to the junk in, junk out concept. Predictive models are only as accurate as the data they receive, and data can become outdated. So ideally, models are constantly being refined.
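A champion/challenger comparison can be sketched roughly as follows. Everything here is hypothetical: the two "models" are simple threshold rules on a made-up prior_visits field, the recent outcomes are invented, and the 0.05 promotion margin is an arbitrary assumption. Commercial tools will have their own mechanisms for this.

```python
def accuracy(model, labeled_data):
    """Share of records where the model's prediction matches the outcome."""
    correct = sum(1 for features, outcome in labeled_data if model(features) == outcome)
    return correct / len(labeled_data)


# Hypothetical models: each predicts a purchase (True/False) from a visitor's
# number of prior visits. The thresholds are made up for illustration.
def champion(features):
    return features["prior_visits"] >= 5


def challenger(features):
    return features["prior_visits"] >= 3


# Hypothetical recent outcomes, collected after the champion went live.
recent_data = [
    ({"prior_visits": 1}, False),
    ({"prior_visits": 2}, False),
    ({"prior_visits": 3}, True),
    ({"prior_visits": 4}, True),
    ({"prior_visits": 4}, False),
    ({"prior_visits": 5}, True),
    ({"prior_visits": 6}, True),
    ({"prior_visits": 8}, True),
]

MIN_IMPROVEMENT = 0.05  # assumed margin the challenger must win by

champion_accuracy = accuracy(champion, recent_data)
challenger_accuracy = accuracy(challenger, recent_data)
promote_challenger = challenger_accuracy - champion_accuracy >= MIN_IMPROVEMENT

print(f"champion accuracy:   {champion_accuracy:.3f}")
print(f"challenger accuracy: {challenger_accuracy:.3f}")
print("promote challenger" if promote_challenger else "keep champion")
```

On this invented data the champion scores 0.750 and the challenger 0.875, so the challenger would be promoted. Requiring a minimum margin is a design choice that keeps models from being swapped on noise; with only eight records, a real decision would need far more data than this.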
Making It Matter
None of this matters unless it results in actionable insights that improve your business.
“It is key to remember that predictive analytics is only valuable if you can turn the results of analytical models into actions,” explains Malene Haxholdt, global marketing manager for business analytics at SAS. “The business process, the people, and the technology need to be aligned to successfully deploy predictive analytics.”
Companies that use predictive analytics to inform complex business decisions gain a competitive edge. In the coming years, winners and losers will be determined by their ability to use predictive analytics. If you’re unsure where to start or how to prepare, we can help. We specialize in transforming large data sets into powerful models that disproportionately influence future outcomes. Our advanced analytics team can provide strategy, hands-on engineering, and machine learning expertise. Get in touch.