A Beginner’s Guide to Anomaly Detection

One hallmark of a mature Internet of Things (IoT) product is a data science approach that features advanced analytics and deploys Machine Learning (ML) models to add value to both the consumer and business sides of the equation. One of the fastest ways to see an ROI in this arena is with anomaly detection.

In Data Mining: Concepts and Techniques by Han et al., the authors define anomaly detection as “the process of finding data objects with behaviors that are very different from expectations. Such objects are called outliers or anomalies.” 

Essentially, we’re looking for observations that break the norm, ones that either exceed or lag behind what we discover or define as usual. An anomaly can be negative, such as a device that’s not working properly, or positive, like a mobile application seeing a sharp uptick in usage during a certain period.

In this way, anomaly detection brings benefits from both a business and a technical standpoint. Organizations can use this analysis to inform strategic decision-making, such as devoting more budget to a project because its value exceeds expectations, or they can optimize an IoT device’s functionality with applications like predictive maintenance, where anomalous sensor data signals impending machine failure so that teams can repair equipment before it breaks and reduce downtime.

In this article, we’re going to go over some of the basic types of anomaly detection that data scientists can choose from, and we’ll explore some of the analytical frameworks that we use to inform our approach to anomaly detection and put our analysis into context.

The Four Basic Types of Anomaly Detection

Univariate vs Multivariate Analysis

The first question we ask when deciding what type of anomaly detection method to use is “how many variables should we consider?” If we only need to look at one variable, we’ll use a univariate approach. On the other hand, if we need to consider multiple factors and the interplay between them, we’ll select multivariate analysis.

Let’s make it simple and talk about the weather. If we want to look at anomalous weather patterns, univariate anomaly detection will measure a single indicator, such as temperature. We can then ask questions like “is this temperature strange for this region?”
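To make that concrete, here’s a minimal sketch of what a univariate check might look like in Python. The temperature readings are simulated, and the three-standard-deviation threshold is just an illustrative convention, not a rule from this article.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated hourly temperature readings (degrees C) for one region -- illustrative data
temps = rng.normal(loc=22.0, scale=1.5, size=500)
temps[250] = 35.0  # inject an anomalous spike

# Univariate detection: flag readings more than 3 standard deviations from the mean
z_scores = np.abs(temps - temps.mean()) / temps.std()
anomalous_idx = np.where(z_scores > 3)[0]

# Should surface the injected spike (and possibly a stray tail value from the noise)
print(anomalous_idx, temps[anomalous_idx])
```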

On the other hand, multivariate analysis will consider a host of factors, like precipitation, humidity, air pressure, or even tangentially related variables such as crop yield or wildlife activity. As a result, multivariate anomaly detection has a higher ceiling for revealing interesting patterns, but it’s also more difficult to successfully pull off.
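A multivariate version of the same idea might hand several weather variables to an off-the-shelf detector at once. The sketch below uses scikit-learn’s Isolation Forest on simulated temperature, humidity, and pressure readings; the feature choices and the 1% contamination rate are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n = 365

# Illustrative daily weather features: temperature (C), humidity (%), pressure (hPa)
X = np.column_stack([
    rng.normal(22, 3, n),
    rng.normal(60, 10, n),
    rng.normal(1013, 5, n),
])
X[100] = [30.0, 95.0, 990.0]  # a day whose *combination* of readings is unusual

# Isolation Forest scores each observation by how easily it can be isolated from the rest
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

print(np.where(labels == -1)[0])
```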

Univariate vs Multivariate Time Series Analysis 

Our second question brings the third and fourth types of anomaly detection. It’s a simple one: are we going to look at how things change over time? Data scientists call this a time series, and we can perform both univariate and multivariate time series analysis. This lets us look at trends, such as increased user engagement with our IoT device, and it also enables us to contextualize our data because it’s reasonable to expect different results at different times.

Going back to the weather, a univariate time series analysis can detect anomalies such as a sudden drop in temperature overnight. Another example is contextual anomaly detection, such as the average temperature during a given month. If we’re looking at the temperature in July, we’ll gather our sensor data throughout the month and then compare it to historical records.

Of course, in the Northern Hemisphere, we expect the average temperature to be higher in July than in December, so this analysis lets us distinguish between these two time periods and establish a normal range for each, which gives us useful context.
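In code, that contextual comparison might look like the sketch below, where each new reading is scored against a historical mean and standard deviation for its own month. The baseline figures are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical historical baseline: mean and std of daily temperature by month
history = pd.DataFrame({
    "month": [7, 12],
    "mean_temp": [28.0, 2.0],
    "std_temp": [3.0, 4.0],
})

# New sensor readings to evaluate
readings = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-15", "2020-12-15"]),
    "temp": [29.5, 18.0],
})
readings["month"] = readings["date"].dt.month

# Compare each reading to the baseline for *its own* month (the context)
merged = readings.merge(history, on="month")
merged["z"] = (merged["temp"] - merged["mean_temp"]) / merged["std_temp"]
merged["anomalous"] = merged["z"].abs() > 3

# 29.5 C is ordinary for July, but 18.0 C in December stands out against its month's baseline
print(merged[["date", "temp", "z", "anomalous"]])
```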

Multivariate time series analysis, on the other hand, lets us look at systems as a whole. Let’s take, for example, the record-breaking 2020 wildfire season in the American West. While a univariate time series will tell us that yes, indeed, the intensity of that season’s wildfires was anomalous, a multivariate time series can give us clues about why. Those factors include low precipitation, high heat, the frequency of lightning storms, or even less obvious contributors such as an increased rate of fires started by more people escaping to the woods because of COVID-19.
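As a sketch of the mechanics (not of any real wildfire analysis), we might align several daily series for one region in a single table and score each day by how unusual its combination of readings is, for example with a Mahalanobis distance. All of the data below is simulated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2020-06-01", periods=120, freq="D")

# Several aligned daily series for one region (illustrative values)
df = pd.DataFrame({
    "precip_mm": rng.gamma(2.0, 1.5, len(dates)),
    "max_temp_c": rng.normal(30, 4, len(dates)),
    "lightning_strikes": rng.poisson(3, len(dates)),
}, index=dates)

# Mahalanobis distance measures how unusual each day's combination of readings is,
# accounting for how the variables normally move together
X = df.to_numpy(dtype=float)
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
df["mahalanobis"] = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Inspect the most unusual days and see which variables drove them
print(df.sort_values("mahalanobis", ascending=False).head())
```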

While a multivariate time series has the potential to uncover relationships that would be otherwise invisible, figuring out the relevancy of any given pattern remains a challenge. We cannot forget that correlation does not imply causation. This brings us to our next important question: how do we sort through our algorithms’ results? How do we know that a data point is truly anomalous and actually unexpected rather than a statistical improbability that’s guaranteed to surface eventually?

After all, the odds of being dealt a royal flush in poker are about 1 in 649,740, or roughly 0.00015%, but that doesn’t mean that we’d call the lucky winner a cheater.
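As a quick check of that arithmetic: there are four royal flushes (one per suit) among all possible five-card hands.

```python
from math import comb

# 4 royal flushes out of every possible 5-card hand
hands = comb(52, 5)            # 2,598,960
print(hands // 4)              # 649,740 -> about 1 in 649,740
print(f"{4 / hands:.6%}")      # ~0.000154%
```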

Analytical Framing

Answering these questions is the real work of a data scientist. Sure, if we have tons and tons of training data alongside as much compute power as we want, maybe we can just dump everything into a neural network and hope for the best. However, we often don’t have this luxury, and, even if we do, problems like the ones we discussed above can still arise.

This is where subject matter expertise enters the scene. If we know that a certain range of values is normal, then we can categorize anything outside of that as anomalous, regardless of the context or frequency in which those anomalies occur. For instance, if we’re building a smart toaster, we’ll know an acceptable range of temperatures, voltage, toast durations, etc. We can then hardcode these values into our algorithm.
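Here’s a minimal sketch of that rule-based approach. The toaster limits and field names below are hypothetical, supplied for illustration rather than taken from a real product.

```python
# Hypothetical acceptable operating ranges for a smart toaster, supplied by
# subject matter experts rather than learned from data
LIMITS = {
    "element_temp_c": (20.0, 260.0),
    "supply_voltage_v": (108.0, 132.0),
    "toast_duration_s": (30.0, 300.0),
}

def rule_based_anomalies(reading: dict) -> list[str]:
    """Return the names of any sensor values outside the expert-defined ranges."""
    flagged = []
    for name, (low, high) in LIMITS.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged

# Example: a voltage sag plus a stuck heating element
print(rule_based_anomalies({
    "element_temp_c": 310.0,
    "supply_voltage_v": 95.0,
    "toast_duration_s": 120.0,
}))  # ['element_temp_c', 'supply_voltage_v']
```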

And this is just the beginning. Data scientists need to think critically about many questions, such as:

  • Who is this algorithm for?
  • Where does this information end up getting surfaced? 
  • What are the implications of this anomaly? 

The process that we take to answer these tough questions is beyond the scope of this article, but the point I want to highlight is that anomaly detection isn’t just plug-and-play artificial intelligence.

Anomaly detection requires careful consideration, deep expertise, and collaboration between engineers, businesspeople, and data scientists. If you’re ready to take the next step and learn more about how Very starts thinking about data science on day one of the product life cycle, check out our IoT machine learning services and get in touch today.
