Types of NoSQL Databases
1. Document Databases
A popular flavor of NoSQL is the document database. Solutions like DynamoDB and MongoDB store data as JSON-like documents and, thanks to the flexibility of that format, are among the most general-purpose options. Document databases are typically write-optimized out of the box and can absorb the blast of data coming from your devices, often at the price of strong consistency guarantees. They are well suited to storing and updating an IoT device’s current state, but they fall short when you need to group devices by an attribute nested inside the document, such as finding every device whose LED is blinking blue.
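To make that trade-off concrete, here is a minimal Python sketch that uses an in-memory dictionary as a stand-in for a document store. The class name, its methods, and the document shape (a nested led object with color and mode) are hypothetical and are not part of DynamoDB’s or MongoDB’s API. Reading or updating one device’s state is a single key lookup, while finding every device with a blinking blue LED has to inspect every document, which is roughly what a document database does when no secondary index covers the nested attribute.

```python
# In-memory stand-in for a document store keyed by device ID.
# The document shape (firmware, led.color, led.mode) is hypothetical.
from typing import Any, Dict, List

Document = Dict[str, Any]


class DeviceStateStore:
    """Keeps the latest state document for each device, keyed by device ID."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def upsert(self, device_id: str, patch: Document) -> None:
        # Key-based write: merge the patch into the device's current document.
        # This is a shallow merge, so a nested object in the patch replaces the old one.
        current = self._docs.setdefault(device_id, {"device_id": device_id})
        current.update(patch)

    def get(self, device_id: str) -> Document:
        # Key-based read: one lookup, however many devices are stored.
        return self._docs[device_id]

    def find_by_led(self, color: str, mode: str) -> List[str]:
        # Grouping by a nested attribute: with no secondary index on
        # led.color and led.mode, every document must be inspected (a full scan).
        return sorted(
            doc["device_id"]
            for doc in self._docs.values()
            if doc.get("led", {}).get("color") == color
            and doc.get("led", {}).get("mode") == mode
        )


store = DeviceStateStore()
store.upsert("dev-1", {"firmware": "1.2.0", "led": {"color": "blue", "mode": "blinking"}})
store.upsert("dev-2", {"firmware": "1.3.0", "led": {"color": "green", "mode": "solid"}})
store.upsert("dev-2", {"led": {"color": "blue", "mode": "blinking"}})
print(store.find_by_led("blue", "blinking"))  # prints ['dev-1', 'dev-2']
```

Where the database supports it, an index on the nested fields removes the scan, but that index has to be planned for the access pattern ahead of time.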
2. Time-Series Databases
There are also time-series databases like Timestream. As the name suggests, these databases are optimized to store time-oriented data, making them a natural fit for audit logs and for machine learning models that rely on time-series data. They are also a good option for storing sensor readings or capturing change events in your system, because they support fast queries over a bounded time range.
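Time-bounded queries are fast largely because readings are kept in timestamp order, so both ends of a range can be located by binary search instead of by scanning the whole series. The following Python sketch illustrates that idea with an in-memory, append-only series. It is an assumption-laden illustration, not how Timestream is implemented, and the SensorSeries class and its methods are hypothetical.

```python
# Append-only series kept in timestamp order; time-range lookups use binary search.
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Reading = Tuple[int, float]  # (epoch_seconds, value)


class SensorSeries:
    def __init__(self) -> None:
        self._times: List[int] = []
        self._values: List[float] = []

    def append(self, ts: int, value: float) -> None:
        # Sensor data usually arrives roughly in time order; this sketch
        # simply rejects out-of-order writes to keep the series sorted.
        if self._times and ts < self._times[-1]:
            raise ValueError("out-of-order timestamp")
        self._times.append(ts)
        self._values.append(value)

    def range(self, start: int, end: int) -> List[Reading]:
        # Both bounds are found in O(log n), so the query only touches
        # readings inside [start, end] rather than the whole series.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


series = SensorSeries()
for ts, temp in [(100, 21.5), (160, 21.7), (220, 22.0), (280, 22.4)]:
    series.append(ts, temp)
print(series.range(150, 230))  # prints [(160, 21.7), (220, 22.0)]
```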
3. Columnar Databases
A third option is columnar databases, such as Redshift. Whereas a row-oriented database has to read every row in full to get the values of a single column, a columnar database stores each column contiguously and can read only the columns a query actually needs. This is useful for IoT applications where analysis or grouping is done on specific columns or attributes of the data.
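The difference between the two layouts can be sketched in a few lines of Python. The field names and readings are made up for illustration; the point is that the row-oriented average walks every full record, while the column-oriented average reads a single list.

```python
# Row-oriented vs column-oriented layouts of the same hypothetical readings.
from statistics import mean
from typing import Any, Dict, List

# Row-oriented: each record keeps all of its fields together.
rows: List[Dict[str, Any]] = [
    {"device_id": "dev-1", "temp_c": 21.0, "humidity": 40, "firmware": "1.2.0"},
    {"device_id": "dev-2", "temp_c": 23.0, "humidity": 35, "firmware": "1.3.0"},
    {"device_id": "dev-3", "temp_c": 22.0, "humidity": 50, "firmware": "1.3.0"},
]

# Column-oriented: each field is stored as its own contiguous list.
columns: Dict[str, List[Any]] = {key: [row[key] for row in rows] for key in rows[0]}


def avg_temp_row_store() -> float:
    # Every full record is visited just to pull out one field.
    return mean(row["temp_c"] for row in rows)


def avg_temp_column_store() -> float:
    # Only the temp_c column is read; the other columns are never touched.
    return mean(columns["temp_c"])


print(avg_temp_row_store(), avg_temp_column_store())  # prints 22.0 22.0
```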
These are just a few of the flavors of NoSQL databases available to IoT developers, and there are plenty more we did not cover today that have compelling features for IoT development; two of my personal favorites are in-memory and graph databases. While each of these has its own advantages, they all pose the same challenge: how do we effectively manage our data when the database does not enforce its structure for us?
Challenges of NoSQL for IoT Development
When we start working with NoSQL databases, it can be tempting to dump a bunch of data into them and tell ourselves that we’ll work out the details later. Trust me: this isn’t technical debt you want to take on.
Without an enforced schema, we can run into major problems down the road, including but not limited to:
- Not being able to read our data
- Running into pipeline inefficiencies
- Outright data loss
It might be hard to think about when you’re organizing a database from the ground up, but we should start thinking about what version two looks like before we even finish version one. SQL databases are much more forgiving because the schema lives in the database and a migration can update every row at once, whereas in a NoSQL database, documents written in an old shape stay in the store and every reader has to cope with them. That requires a lot more forethought.
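One lightweight way to plan for version two, sketched here under our own assumptions rather than as a prescribed pattern, is to stamp every document with a schema version and upgrade older documents as they are read. The schema_version field, the change from a flat temp_f field to a nested readings.temp_c, and the function names below are all hypothetical.

```python
# Upgrade-on-read: each document carries a schema version, and readers
# apply one-step upgrades until the document reaches the current version.
from typing import Any, Callable, Dict

Document = Dict[str, Any]
CURRENT_VERSION = 2


def _v1_to_v2(doc: Document) -> Document:
    # Hypothetical change: v1 stored a flat "temp_f"; v2 nests readings in Celsius.
    upgraded = {k: v for k, v in doc.items() if k != "temp_f"}
    upgraded["readings"] = {"temp_c": round((doc["temp_f"] - 32) * 5 / 9, 2)}
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version number to the function that upgrades it by one step.
UPGRADES: Dict[int, Callable[[Document], Document]] = {1: _v1_to_v2}


def load(doc: Document) -> Document:
    # Documents written before versioning existed are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


print(load({"device_id": "dev-1", "temp_f": 212}))
# prints {'device_id': 'dev-1', 'readings': {'temp_c': 100.0}, 'schema_version': 2}
```

The same documents can also be rewritten in the background once upgraded, but upgrade-on-read means old and new firmware can coexist in the meantime.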
We need a plan from the get-go for how we’re going to access, manage, support, and validate our data. Speaking of validation, this is another sticking point with unstructured data. Especially in IoT systems, we often encounter the scenario where different devices out in the field run different firmware and either write different data or send the same data in a different format.
We need to ask ourselves: do I trust this data? The key to building that trust is observability into our infrastructure. We need to be able to see which inserts are being logged, define what types of data our devices should be sending, and run automatic checks to make sure they actually do.
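Automatic checks can start as simply as declaring which fields and types each firmware version is expected to send and flagging payloads that don’t match before they are written. The sketch below assumes two hypothetical firmware versions and field names. A production system would more likely use a schema language such as JSON Schema, but the idea is the same. Note that the type check is strict: an integer temperature would be flagged because it is not a float.

```python
# Per-firmware payload validation: each firmware version declares the
# fields and types it is expected to send. Versions and fields are hypothetical.
from typing import Any, Dict, List

EXPECTED_FIELDS: Dict[str, Dict[str, type]] = {
    "1.2.0": {"device_id": str, "temp_f": float},
    "1.3.0": {"device_id": str, "temp_c": float, "battery_pct": int},
}


def validate(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    firmware = payload.get("firmware")
    schema = EXPECTED_FIELDS.get(firmware)
    if schema is None:
        return [f"unknown firmware version: {firmware!r}"]
    problems: List[str] = []
    for name, expected_type in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(payload[name]).__name__}"
            )
    return problems


print(validate({"firmware": "1.3.0", "device_id": "dev-2", "temp_c": "21.5"}))
# prints ['temp_c: expected float, got str', 'missing field: battery_pct']
```

Counting these problems per firmware version is also a cheap observability signal: a sudden spike usually points at a specific rollout.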
Data Access Patterns Should Drive Database Choices
Yes, it can be hard to think about how your eventual ML pipeline is going to access device data before your sensors have collected their first KB, but making this investment early pays huge dividends in the long run. Another part of this is picking the right NoSQL database.
If there’s one thing that I want you to take away from this article, it’s this: data access patterns should drive your database choices.
This can lead to hybrid architectures where we combine different types of NoSQL databases or even add relational databases into the mix. While it’s crucial to decide which store is the source of truth and which direction data synchronizes, to avoid split brain (two stores each accepting writes and drifting apart) and other headaches, it’s possible to architect creative solutions that leverage the advantages of multiple databases.
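As an illustration of a single direction of synchronization, the sketch below treats a document store of current device state as the source of truth and appends every accepted write to a time-series history as a change event. Both stores are in-memory stand-ins with hypothetical names, not real database clients. Because data only ever flows from the current-state store into the history, there is exactly one authoritative copy, and replaying the history doubles as a cheap drift check.

```python
# Hybrid setup with one-way sync: current state (document-store stand-in)
# is the source of truth; history (time-series stand-in) is derived from it.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]


@dataclass
class HybridDeviceData:
    # Source of truth: latest state per device.
    current_state: Dict[str, State] = field(default_factory=dict)
    # Derived copy: append-only, time-stamped change events.
    history: List[Tuple[int, str, State]] = field(default_factory=list)

    def apply_update(self, ts: int, device_id: str, patch: State) -> None:
        # 1. Write to the source of truth first.
        self.current_state.setdefault(device_id, {}).update(patch)
        # 2. Then propagate one way by recording a copy of the change.
        #    Nothing ever writes from history back into current_state.
        self.history.append((ts, device_id, dict(patch)))

    def changes_between(self, device_id: str, start: int, end: int) -> List[State]:
        # Time-bounded questions are answered from the history.
        return [p for ts, d, p in self.history if d == device_id and start <= ts <= end]

    def replay_matches_current(self) -> bool:
        # Replaying the derived history should reproduce the source of truth;
        # a mismatch means the two stores have drifted apart.
        replayed: Dict[str, State] = {}
        for _, device_id, patch in self.history:
            replayed.setdefault(device_id, {}).update(patch)
        return replayed == self.current_state


data = HybridDeviceData()
data.apply_update(100, "dev-1", {"led": "blue"})
data.apply_update(160, "dev-1", {"led": "green", "battery_pct": 80})
print(data.changes_between("dev-1", 150, 300))  # prints [{'led': 'green', 'battery_pct': 80}]
print(data.replay_matches_current())  # prints True
```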
If this all sounds quite complicated, that’s because it is. And that’s why it’s so crucial to work with a partner that brings the necessary expertise to the table. To find out more about how Very approaches today’s IoT engineering landscape, check out our IoT development guide.