Technology for IoT Developers
Building redundant, fault-tolerant systems is very much a part of our approach to the IoT tech stack, so we want to give you an overview of some of the solutions that we use. A big part of this is a software development technique called defensive programming.
This means that all our code should have a plan for errors, and we bake this into our applications while we write them. That way, even if our program does crash or we encounter an unexpected problem, we can reduce the blast radius and mitigate the damage. This could mean isolating the issue, automatically trying to restart the program, or even triggering a full system reboot.
This is the main reason why Elixir is our go-to language for developing connected products. Elixir leverages the power of the Erlang VM, which is designed with fault-tolerance in mind, and as a result, it promotes defensive programming habits.
Take, for instance, the way Elixir/Erlang treats processes. Instead of running all our code in the same space, we break functionality into discrete parts, which we can then isolate as processes. These are lightweight processes scheduled by the Erlang VM rather than operating system processes, so a single system can run a very large number of them concurrently, up to millions.
There are some distinct advantages to this approach. First, if one process dies, then it won’t crash our entire application—we might just lose a bit of functionality.
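To make the isolation point concrete, here is a minimal sketch. The module name IsolationDemo and the "sensor read failed" error are illustrative assumptions, not part of any real codebase. It runs a failing function in its own process and shows that the caller survives and can observe the failure.

```elixir
defmodule IsolationDemo do
  # Runs `fun` in a separate, unlinked process and reports how it ended.
  def run_isolated(fun) do
    # spawn_monitor/1 starts the process and monitors it in one step.
    {pid, ref} = spawn_monitor(fun)

    receive do
      # A :DOWN message arrives when the monitored process exits for any reason.
      {:DOWN, ^ref, :process, ^pid, :normal} -> :ok
      {:DOWN, ^ref, :process, ^pid, reason} -> {:crashed, reason}
    after
      5_000 -> :timeout
    end
  end
end

# The spawned process dies, but the caller keeps running.
result = IsolationDemo.run_isolated(fn -> raise "sensor read failed" end)
IO.inspect(match?({:crashed, _}, result))
IO.puts("caller is still alive: #{Process.alive?(self())}")
```

The crash is reported to the caller as an ordinary message, so the caller can decide what to do next instead of going down with the failed process.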
Second, we can structure processes hierarchically by using supervisors, which are basically processes that monitor other processes. We then configure these supervisors to act based on our contingencies. This could be restarting a dead process, sending an error report, or anything else that we need.
One of the restart strategies enabled by Elixir/Erlang is one_for_all, where a supervisor restarts all of its child processes when one of them dies. You can also arrange your supervisors so that only the process that fails restarts (one_for_one), or so that the failed process and every process started after it restart (rest_for_one).
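The difference between the restart strategies is easier to see by running them. The following is a rough sketch, assuming three placeholder workers (plain Agent processes with the made-up names sensor, uplink, and display) and a hypothetical StrategyDemo module. It kills one worker under each strategy and reports which workers the supervisor restarted.

```elixir
defmodule StrategyDemo do
  # Hypothetical worker names, started in this order.
  @names [:sensor, :uplink, :display]

  # Starts the workers under a supervisor using `strategy`, kills `victim`,
  # and returns the names of the workers whose pid changed (i.e. were restarted).
  def restarted_after_killing(strategy, victim) do
    children =
      for name <- @names do
        Supervisor.child_spec({Agent, fn -> :ok end}, id: name)
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = child_pids(sup)

    Process.exit(before[victim], :kill)
    await_restart(sup, victim, before[victim])

    now = child_pids(sup)
    Supervisor.stop(sup)

    for name <- @names, before[name] != now[name], do: name
  end

  # Maps each child id to its current pid.
  defp child_pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{} do
      {id, pid}
    end
  end

  # Polls the supervisor until the victim is running again under a new pid.
  defp await_restart(sup, victim, old_pid) do
    case child_pids(sup)[victim] do
      pid when is_pid(pid) and pid != old_pid ->
        :ok

      _ ->
        Process.sleep(10)
        await_restart(sup, victim, old_pid)
    end
  end
end

IO.inspect(StrategyDemo.restarted_after_killing(:one_for_one, :uplink))
# [:uplink]
IO.inspect(StrategyDemo.restarted_after_killing(:one_for_all, :uplink))
# [:sensor, :uplink, :display]
IO.inspect(StrategyDemo.restarted_after_killing(:rest_for_one, :uplink))
# [:uplink, :display]
```

Which strategy fits depends on how the workers rely on each other: independent workers suit one_for_one, while workers that share state or depend on an earlier sibling are better served by one_for_all or rest_for_one.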
The key here isn’t that Elixir is the first language to take this process-based approach to programming, but rather that doing so is much easier than in other languages like C or Java. This is because Elixir includes built-in functions for managing processes in its standard library, such as spawn(), which creates a new process, and send(), which sends a message from one process to another.
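As a small illustration of spawn() and send() working together, the sketch below starts a process that answers ping messages. The Echo module and the message shapes {:ping, from} and {:pong, pid} are assumptions made up for this example.

```elixir
defmodule Echo do
  # Waits for messages; replies to pings and exits on :stop.
  def loop do
    receive do
      {:ping, from} ->
        send(from, {:pong, self()})
        loop()

      :stop ->
        :ok
    end
  end

  # Sends a ping to `pid` and waits up to one second for the reply.
  def ping(pid) do
    send(pid, {:ping, self()})

    receive do
      {:pong, ^pid} -> :pong
    after
      1_000 -> :timeout
    end
  end
end

pid = spawn(fn -> Echo.loop() end)
IO.inspect(Echo.ping(pid))
# :pong
send(pid, :stop)
```

The timeout in ping/1 is a defensive choice: if the other process has died, the caller gets :timeout back rather than waiting forever.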
Another part of our fault-tolerance strategy is the Nerves framework, an Elixir toolkit for building embedded Linux systems. While Nerves brings a lot of benefits for IoT development, I want to draw attention to its reliability features. Take, for example, A/B partitions, where we run our current firmware on one disk partition and write updates to the other one.
When we reboot and switch to the new firmware’s partition, we run a series of checks to confirm that the new firmware is working as expected.
If the updated device can’t meet these conditions, then it simply reverts to the previous working version. Not only does this stop devices from turning into bricks, but it also gives us the freedom to push out smaller, more frequent patches to fix bugs, make optimizations, or patch vulnerabilities.
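The validate-or-revert decision can be sketched in plain Elixir. Everything here is hypothetical: the FirmwareCheck module, the check names, and the return values are placeholders for illustration, not the Nerves API and not the actual checks we run. On a real device, the two outcomes would be wired to the firmware validation and revert functions that the Nerves runtime library provides.

```elixir
defmodule FirmwareCheck do
  # Runs every named check and decides whether to keep the new firmware.
  # `checks` is a keyword list of zero-arity functions returning true or false.
  def decide(checks) do
    failed = for {name, check} <- checks, not passes?(check), do: name

    case failed do
      [] -> :validate
      names -> {:revert, names}
    end
  end

  # A check that raises is treated the same as a check that returns false.
  defp passes?(check) do
    check.() == true
  rescue
    _error -> false
  end
end

# Placeholder checks; real ones would probe the running system.
checks = [
  app_started: fn -> true end,
  network_reachable: fn -> false end,
  config_loaded: fn -> raise "config missing" end
]

IO.inspect(FirmwareCheck.decide(checks))
# {:revert, [:network_reachable, :config_loaded]}
```

Treating a crashing check as a failure keeps the decision conservative: when in doubt, the device falls back to the firmware that is known to work.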
The End Game: Exceeding Customer Expectations
When a consumer or a business buys an IoT device, they bring a series of expectations to the table. These can range from a frictionless onboarding experience to expecting it to bring tangible value to their day-to-day, but there’s one thing they always want: longevity. Nobody wants to invest in a product that’s going to keel over in a light breeze.
Some IoT developers approach IoT projects with rose-tinted glasses, imagining that their devices will always function in optimal conditions and that nothing will go wrong. That’s not our philosophy. We always ask the hard questions, figure out what can and will go wrong, and develop a strategy to account for these failures.
And that’s just one small part of our unique process. Learn more about how our agile approach to IoT development is designed to navigate complexity and de-risk your IoT project.