Univariate vs Multivariate Time Series Analysis
Our second question brings us to the third and fourth types of anomaly detection. It’s a simple one: are we going to look at how things change over time? Data scientists call this kind of data a time series, and we can perform both univariate and multivariate time series analysis. This lets us look at trends, such as increased user engagement with our IoT device, and it also lets us contextualize our data, because it’s reasonable to expect different results at different times.
Going back to the weather, a univariate time series analysis can detect anomalies such as a sudden drop in temperature overnight. Another example is contextual anomaly detection, such as the average temperature during a given month. If we’re looking at the temperature in July, we’ll gather our sensor data throughout the month and then compare it to historical records.
Of course, in the Northern Hemisphere, we expect the average temperature to be higher during July than in December, so this analysis lets us distinguish between these two time periods to establish a normal range for each, thus lending us a useful context.
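The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the temperature values and month names are invented stand-ins for real historical sensor records, and the z-score threshold of 3 is just a common rule of thumb.

```python
import statistics

# Hypothetical daily average temperatures (in Celsius) for two months.
# In practice, these baselines would come from historical sensor records.
history = {
    "July": [24.1, 25.3, 23.8, 26.0, 24.7, 25.1, 23.9, 24.5],
    "December": [2.1, 0.5, -1.2, 1.8, 0.9, -0.3, 1.1, 0.4],
}

def is_contextual_anomaly(reading, month, z_threshold=3.0):
    """Flag a reading as anomalous relative to its own month's history."""
    baseline = history[month]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    z = (reading - mean) / stdev
    return abs(z) > z_threshold

# The same 25-degree reading is normal in July but anomalous in December.
print(is_contextual_anomaly(25.0, "July"))      # False
print(is_contextual_anomaly(25.0, "December"))  # True
```

The key design choice is that each month gets its own mean and standard deviation, so "normal" is defined by context rather than by one global range.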
Multivariate time series analysis, on the other hand, lets us look at systems as a whole. Let’s take, for example, the record-breaking 2020 wildfire season in the American West. While a univariate time series will tell us that yes, indeed, the intensity of this year’s wildfires is anomalous, a multivariate time series can give us clues about why. These factors include low precipitation, high heat, and frequent lightning storms, as well as less obvious contributors such as an increased rate of fires started by more people escaping to the woods because of COVID-19.
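One common way to score a multivariate observation against "normal" weeks is the Mahalanobis distance, which accounts for how the variables move together rather than checking each one in isolation. The sketch below uses made-up weekly weather numbers purely for illustration; a real analysis would draw on actual station records.

```python
import numpy as np

# Hypothetical weekly observations: [precipitation_mm, max_temp_c, lightning_strikes].
# These values are invented for illustration only.
normal_weeks = np.array([
    [30.0, 31.0, 4.0],
    [25.0, 33.0, 6.0],
    [35.0, 30.0, 3.0],
    [28.0, 32.0, 5.0],
    [32.0, 29.0, 4.0],
    [27.0, 34.0, 7.0],
    [33.0, 31.0, 5.0],
    [29.0, 30.0, 4.0],
])

mean = normal_weeks.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_weeks, rowvar=False))

def mahalanobis(x):
    """Distance from the 'normal' cloud, accounting for correlations."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# A fire-weather week: very low rain, high heat, many lightning storms.
fire_week = np.array([5.0, 43.0, 15.0])
print(mahalanobis(fire_week))  # large distance: jointly anomalous
```

The point is that the fire-weather week is flagged because of the joint pattern across all three variables, which is exactly the kind of signal a univariate view can miss.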
While a multivariate time series has the potential to uncover relationships that would be otherwise invisible, figuring out the relevance of any given pattern remains a challenge. We cannot forget that correlation does not imply causation. This brings us to our next important question: how do we sort through our algorithms’ results? How do we know that a data point is truly anomalous and actually unexpected, rather than a statistical improbability that’s guaranteed to surface eventually?
After all, the odds of being dealt a royal flush in five-card poker are about 1 in 649,740 (a probability of roughly 0.00015%), but that doesn’t mean that we’d call the lucky winner a cheater.
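A quick back-of-the-envelope calculation makes the point concrete. The number of hands below is a made-up figure for illustration, not a real statistic.

```python
# With enough observations, even a 1-in-649,740 event is expected to appear.
p = 1 / 649_740           # probability of a royal flush in one five-card hand
hands = 10_000_000        # hypothetical number of hands dealt in a day
expected = p * hands
print(round(expected, 1))  # about 15.4 royal flushes: rare, but not fraud
```

At scale, an event this rare is not just possible but expected, which is why raw improbability alone can’t be our anomaly criterion.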
Answering these questions is the real work of a data scientist. Sure, if we have tons and tons of training data alongside as much compute power as we want, maybe we can just dump everything into a neural network and hope for the best. However, we often don’t have this luxury, and, even if we do, problems like the ones we discussed above can still arise.
This is where subject matter expertise enters the scene. If we know that a certain range of values is normal, then we can categorize anything outside of that as anomalous, regardless of the context or frequency in which those anomalies occur. For instance, if we’re building a smart toaster, we’ll know an acceptable range of temperatures, voltage, toast durations, etc. We can then hardcode these values into our algorithm.
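A hardcoded, rule-based check like the one described above can be as simple as the sketch below. The specific limits are hypothetical placeholders; in a real product they would come from the domain experts who know the hardware.

```python
# Illustrative operating limits for a hypothetical smart toaster,
# as they might be supplied by domain experts (values are invented).
LIMITS = {
    "temperature_c": (20.0, 230.0),   # heating-element temperature
    "voltage_v": (110.0, 125.0),      # expected mains voltage range
    "toast_seconds": (30.0, 300.0),   # plausible toasting-cycle duration
}

def check_reading(sensor, value):
    """Rule-based check: anything outside the expert-set range is anomalous."""
    low, high = LIMITS[sensor]
    return low <= value <= high

print(check_reading("voltage_v", 118.0))  # True  (normal)
print(check_reading("voltage_v", 90.0))   # False (anomalous)
```

Unlike the statistical approaches earlier, this check needs no training data at all, which is exactly why expert knowledge is so valuable when data is scarce.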
And this is just the beginning. Data scientists need to think critically about many questions, such as:
- Who is this algorithm for?
- Where does this information end up getting surfaced?
- What are the implications of this anomaly?
The process that we take to answer these tough questions is beyond the scope of this article, but the point I want to highlight is that anomaly detection isn’t just plug-and-play artificial intelligence.
Anomaly detection requires careful consideration, deep expertise, and collaboration between engineers, businesspeople, and data scientists. If you’re ready to take the next step and learn more about how Very starts thinking about data science on day one of the product life cycle, check out our IoT machine learning services and get in touch today.