NoSQL for IoT Development — How to Choose the Right Database

Data is one of the defining value propositions of an internet of things (IoT) device. Connected devices bridge the physical and digital worlds by collecting sensor data, and the ability to run analytics on that data and create recurring value is a hallmark of a mature IoT product.

While we often focus on the firmware that collects and processes IoT data at the edge or the machine learning (ML) algorithms that turn collected data into intelligence in the cloud, today we’re going to go over the intermediary that pulls it all together: data engineering.

What Is a NoSQL Database?

There are many ways to store collected data. One of the most common is a relational database like PostgreSQL, which stores data according to a schema in well-defined rows and columns that we can query with SQL (Structured Query Language).

However, databases and the ecosystems they serve have grown quite a bit in the four decades since SQL hit the scene; namely, we’ve seen the rise of NoSQL (Not Only SQL) databases that store information in myriad ways.

There are many types of NoSQL databases to choose from, and our selection will ultimately depend on our use case. Like most decisions in product development, the best solution is always contextual.

It’s important to keep in mind, though, that while SQL refers to a single language (with dialects), NoSQL is a loose grouping of databases, each with its own query language or API. Some borrow parts of SQL, but each has features that make it unique.

In this article, we’re going to cover why NoSQL databases are useful for IoT, some of the different types of non-relational databases that we have to choose from, and some of the challenges that data engineers face when dealing with IoT data.

Relational Databases vs. NoSQL for IoT Development

Before diving into why NoSQL for IoT is a good choice, it’s important to discuss why we might stray from trusty relational databases like PostgreSQL or MySQL.

Relational databases are incredibly useful, and they offer tooling and semantics for storing and querying data. As always, look before you leap and check if there are plugins or extensions for your favorite relational databases that might support your NoSQL needs.

PostgreSQL, for example, has excellent support for use cases like search, geospatial data, and JSON document columns. Having said all that, let’s think about the cases where NoSQL shines.

Scalability

Scaling write capacity by adding nodes can be difficult with SQL databases. IoT use cases are often write-heavy and prone to unpredictable bursts of traffic. If easy scalability is a concern for your team, document databases like MongoDB or DynamoDB can be great choices to handle your system load.

Consistency

Relational databases also typically offer strong consistency guarantees, but your IoT application may be perfectly fine with an eventual consistency model. Strong consistency is the property that once a write completes, every subsequent read sees it. With eventual consistency, readers may briefly see stale data, but all replicas converge on the latest write over time. While it is tempting to think eventual consistency is a bad thing, in practice it suits many applications.

Consider temperature sensor readings that are being stored for nightly analysis. In this case, you don’t need to guarantee that all readers see the same value immediately, just that the write happened, and by the time you run analytics you can have confidence that the readings have arrived.

Flexibility

With SQL databases, you have schema and validation capabilities built into the database. Popular tooling like Django and Rails also helps you manage your migrations when the schema changes. However, you may find your data needs a little more flexibility, which can make schemaless NoSQL databases appealing, since you can push schema enforcement logic into your application code.
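As a rough illustration of what pushing schema enforcement into application code can look like, here is a minimal Python sketch. The `TemperatureReading` shape, its field names, and the `parse_reading` helper are all hypothetical, and a real project might reach for a validation library instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical set of required fields for one sensor reading.
REQUIRED_FIELDS = {"device_id", "celsius", "recorded_at"}


@dataclass(frozen=True)
class TemperatureReading:
    """Illustrative shape of a validated reading; not tied to any real product."""
    device_id: str
    celsius: float
    recorded_at: datetime


def parse_reading(raw: dict) -> TemperatureReading:
    """Validate an untrusted payload before writing it to a schemaless store."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(raw["device_id"], str) or not raw["device_id"]:
        raise ValueError("device_id must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw["celsius"], bool) or not isinstance(raw["celsius"], (int, float)):
        raise ValueError("celsius must be a number")
    if not isinstance(raw["recorded_at"], str):
        raise ValueError("recorded_at must be an ISO 8601 string")
    recorded_at = datetime.fromisoformat(raw["recorded_at"])
    if recorded_at.tzinfo is None:
        raise ValueError("recorded_at must include a UTC offset")
    return TemperatureReading(
        device_id=raw["device_id"],
        celsius=float(raw["celsius"]),
        recorded_at=recorded_at.astimezone(timezone.utc),
    )
```

Calling `parse_reading` at the point where data enters the system means malformed payloads are rejected before they ever reach the database, which is the job a relational schema would otherwise do for us.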

Types of NoSQL Databases

1. Document Databases

A popular flavor of NoSQL is the document database. Solutions like DynamoDB and MongoDB store data as JSON-like documents and, thanks to the flexibility of this format, are among the most general-purpose options. Document databases are typically write-optimized out of the box, and they can handle the blast of data from your devices, often at the price of strong consistency guarantees. They are well-suited for keeping and updating an IoT device’s current state, but they can fall short when you need to group devices by some feature nested in the document, such as showing all devices with their LEDs blinking blue.
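To make that trade-off concrete, here is a small Python sketch that uses a plain dictionary as a stand-in for a document collection. The document layout, field names, and helper functions are assumptions for illustration only, not the API of DynamoDB or MongoDB.

```python
# In-memory stand-in for a document collection keyed by device ID.
# The document layout and field names are hypothetical.
devices = {
    "dev-1": {"firmware": "1.2.0", "led": {"color": "blue", "mode": "blinking"}},
    "dev-2": {"firmware": "1.1.0", "led": {"color": "green", "mode": "solid"}},
    "dev-3": {"firmware": "1.2.0", "led": {"color": "blue", "mode": "blinking"}},
}


def update_state(device_id: str, patch: dict) -> None:
    """Merge top-level keys into one device's document: a cheap single-key write."""
    devices.setdefault(device_id, {}).update(patch)


def devices_with_led(color: str, mode: str) -> list[str]:
    """Find devices by a nested attribute; with no index, this scans every document."""
    return sorted(
        device_id
        for device_id, doc in devices.items()
        if doc.get("led", {}).get("color") == color
        and doc.get("led", {}).get("mode") == mode
    )
```

Updating one device touches a single key, while the "blinking blue" query has to walk the whole collection. Real document databases typically offer secondary indexes to soften this, at the cost of extra write and storage overhead.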

2. Time-Series Databases

There are also time-series databases like Amazon Timestream. As the name suggests, these databases are optimized to store time-oriented data, making them a good fit for audit logs and machine learning models that rely on time-series data. They are also a good option for storing sensor readings or capturing change events in your system, as they support fast queries of data based on time bounds.
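The reason time-bounded queries can be fast is that the data is organized by time. The toy Python sketch below shows the idea with a sorted list and binary search. `SeriesStore` is a hypothetical class for illustration and says nothing about how Timestream or any other product is implemented.

```python
from bisect import bisect_left, insort


class SeriesStore:
    """Toy time-series store: readings are kept sorted by timestamp (epoch seconds)."""

    def __init__(self) -> None:
        self._points: list[tuple[float, float]] = []  # (timestamp, value)

    def append(self, timestamp: float, value: float) -> None:
        """Insert a reading while keeping the list ordered by timestamp."""
        insort(self._points, (timestamp, value))

    def between(self, start: float, end: float) -> list[tuple[float, float]]:
        """Return readings with start <= timestamp < end, located by binary search."""
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so (start,) lands on the first reading at or after `start`.
        lo = bisect_left(self._points, (start,))
        hi = bisect_left(self._points, (end,))
        return self._points[lo:hi]
```

Because the readings stay ordered, `between` only needs to find the two boundaries and slice, rather than inspect every reading.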

3. Columnar Databases

A third option is columnar databases, such as Amazon Redshift. Whereas a row-oriented database has to read every row to get the values of a single column, a columnar database stores each column’s values together and can read only the columns a query needs. This is useful for IoT applications where analysis or grouping is being done on specific columns or attributes of the data.
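The difference between the two layouts can be sketched in a few lines of Python. The sample readings and function names below are made up for illustration, and real columnar engines add compression and other optimizations on top of this basic idea.

```python
# Row-oriented layout: one record per reading (sample data is made up).
rows = [
    {"device_id": "dev-1", "celsius": 20.0, "humidity": 40},
    {"device_id": "dev-2", "celsius": 22.0, "humidity": 45},
    {"device_id": "dev-3", "celsius": 24.0, "humidity": 50},
]

# Column-oriented layout: one list per attribute, built from the same data.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def mean_from_rows(records: list[dict], field: str) -> float:
    """Touches every record, including the fields the query does not need."""
    return sum(record[field] for record in records) / len(records)


def mean_from_columns(table: dict[str, list], field: str) -> float:
    """Reads only the one column the query needs."""
    values = table[field]
    return sum(values) / len(values)
```

Both functions return the same answer. The columnar version simply never looks at `device_id` or `humidity` when averaging `celsius`, which matters when tables are wide and stored on disk.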

These are just a few of the flavors of NoSQL databases available to IoT developers, and there are plenty more we did not cover today that have compelling features for IoT development. Two of my personal favorites are in-memory and graph databases. While these and others all have their unique advantages, they all pose the same challenge: how do we effectively manage our data given these constraints?

Challenges of NoSQL for IoT Development

When we start working with a NoSQL database, it can be tempting to dump a bunch of data into it and tell ourselves that we’ll work out the details later. Trust me: this isn’t the technical debt that you want to take on.

Without an enforced schema, we can run into major problems down the road, including but not limited to:

  • Not being able to read our data
  • Running into pipeline inefficiencies
  • Outright data loss

It might be hard to think about when you’re trying to organize a database from the ground up, but we should start thinking about what version two looks like before we even finish version one. SQL databases are much more forgiving because you can always perform migrations later, whereas NoSQL databases require a lot more forethought. 

We need to have a plan for how we’re going to access, manage, support, and validate our data from the get-go. Validation is a particular sticking point with unstructured data. In IoT systems especially, we often find that different devices out in the field are running different firmware and are either writing different data or sending the same data in a different format.

We need to ask ourselves: do I trust this data? The key to building this trust is observability into our infrastructure. We need to be able to see what inserts are logged, define what types of data our devices should be sending, and run automatic checks to make sure that’s the case.
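One lightweight way to approach those automatic checks is to write down what each firmware version is expected to send and tally how incoming inserts measure up. The Python sketch below assumes a hypothetical `firmware` field on every payload and a made-up `EXPECTED_FIELDS` contract. It is a starting point, not a full observability setup.

```python
from collections import Counter

# Hypothetical contract: the fields each firmware version is expected to send.
EXPECTED_FIELDS = {
    "1.0": {"device_id", "temp_f"},
    "2.0": {"device_id", "celsius", "recorded_at"},
}


def check_payload(payload: dict) -> str:
    """Classify one insert as 'ok', 'unknown_firmware', or 'field_mismatch'."""
    expected = EXPECTED_FIELDS.get(payload.get("firmware"))
    if expected is None:
        return "unknown_firmware"
    actual = set(payload) - {"firmware"}
    return "ok" if actual == expected else "field_mismatch"


def audit(payloads: list[dict]) -> Counter:
    """Tally outcomes across a batch so drift in the field shows up as a number."""
    return Counter(check_payload(payload) for payload in payloads)
```

Running `audit` over a recent batch of inserts on a schedule, and alerting when anything other than `ok` starts to climb, gives an early signal that devices in the field have drifted from what the pipeline expects.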

Data Access Patterns Should Drive Database Choices

Yes, it can be hard to think about how your eventual ML pipeline is going to access device data before your sensors have collected their first KB, but making this investment early pays huge dividends in the long run. Another part of this is picking the right NoSQL database. 

If there’s one thing that I want you to take away from this article, it’s this: data access patterns should drive your database choices.

This can lead to hybrid architectures where we combine different types of NoSQL databases or even add relational databases into the mix. While it’s crucial that we know the direction of synchronization between them to avoid split-brain scenarios and other headaches, it’s possible to architect creative solutions that leverage the advantages of multiple databases.

If this all sounds quite complicated, that’s because it is. And that’s why it’s so crucial to work with a partner that brings the necessary expertise to the table. 
