Nerves of Steel: 5 Ways to Guarantee Reliability for Embedded Devices

We have a simple question. How do we prevent our internet of things (IoT) devices from becoming useless bricks?

In a recent talk during the Nerves in Action web series, Jon Carstens from SmartRent explained how embedded developers can harden their Nerves setup to prevent “a rogue update” from bricking a device.

Nerves is built on Elixir, a fault-tolerant language that’s ideal for embedded devices. It enables IoT developers to create devices that require minimal human intervention, which is an important quality for devices that we send out into the wild all by their lonesome.

That’s why Carstens has been working with the Nerves community to create an even more robust IoT platform. In this article, we’re going to cover five mechanisms for making devices as strong as steel.

AB Partition Schemes 

The most common cause of a bricked device is bad firmware, the low-level code that allows our applications to interface with the hardware. When we burn firmware onto an SD card, we usually make four partitions. First, we have the U-Boot env, a small area of settings and scripts that the U-Boot bootloader uses to prepare the system for boot. Don’t worry, we’ll go in-depth on this in just a bit.

After that, we have the two partitions that commonly make up a piece of firmware: the boot partition, which handles the actual boot process, and the root filesystem (aka rootfs), which contains all the embedded Linux nuts and bolts. Once the system boots, it mounts the rootfs, and this lets us access the fourth, final, and largest partition: the application partition, where our app keeps its data.

If we were to update either firmware partition by simply overwriting it, we’d open ourselves up to a huge risk. If something went wrong, the device would be difficult to fix at best and, at worst, completely impossible to recover from a distance.

However, by implementing AB partitions, we’ll always have a good firmware version to fall back on. Say, for example, that we’re running firmware A, and then we receive an over-the-air (OTA) firmware update from a NervesHub server. We’ll then write the updated version onto the B partition.

When we reboot, we’ll launch from the B partition. If it fails to boot, we can simply call the Nerves.Runtime.revert() function to reboot from the working A partition. In this way, we always keep a known-good firmware version on our device, which gives us the freedom to iterate, patch, and improve our devices.
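To see the A/B bookkeeping in one place, here’s a minimal Python sketch of the flow described above. It’s a toy model under stated assumptions: the ABDevice class, its method names, and the slot labels are hypothetical, and on a real Nerves device this state is managed by the firmware tooling (such as fwup) and nerves_runtime rather than by code like this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ABDevice:
    """Toy model of A/B firmware slots (hypothetical, not the Nerves API)."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"a": "fw-1.0", "b": None})
    active: str = "a"
    validated: bool = True

    def inactive(self) -> str:
        return "b" if self.active == "a" else "a"

    def apply_ota(self, image: str) -> None:
        # Write the update to the slot we are NOT running from,
        # so the known-good firmware stays untouched.
        target = self.inactive()
        self.slots[target] = image
        # Boot the new slot next time, but mark it as unproven.
        self.active = target
        self.validated = False

    def validate(self) -> None:
        # Called once the new firmware has shown that it works.
        self.validated = True

    def revert(self) -> None:
        # Switch back to the other slot, which still holds the old image.
        other = self.inactive()
        if self.slots[other] is None:
            raise RuntimeError("no firmware in the other slot to revert to")
        self.active = other
        self.validated = True


device = ABDevice()
device.apply_ota("fw-1.1")
print(device.active, device.slots[device.active], device.validated)  # b fw-1.1 False
device.revert()
print(device.active, device.slots[device.active])  # a fw-1.0
```

The key design choice is that an update never touches the slot we’re currently running from, so a revert amounts to pointing the device back at the old slot.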

U-Boot Scripting

You may have noticed a small problem. If our boot fails to mount the rootfs, how on earth are we supposed to call our revert function? Luckily, U-Boot scripting is here to save the day.

On BeagleBoard hardware, we can add scripts to the U-Boot env that check whether the previous firmware booted correctly. So, if B failed to boot when we last tried it, the U-Boot script will revert to A.
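The decision such a script makes is small, and it runs before Linux starts, which is why it can revert even when the new rootfs never mounts. The Python below models that decision; it is not U-Boot script syntax. The environment is a plain dict, and the key names ("active", "validated", "tried") are hypothetical stand-ins for whatever variables a given system keeps in the U-Boot env. In this model, the running firmware would set "validated" to true once it’s healthy.

```python
from typing import Dict, Union

Env = Dict[str, Union[str, bool]]


def choose_boot_slot(env: Env) -> str:
    """Pick the slot to boot, the way a pre-Linux boot script might.

    `env` is a toy stand-in for the bootloader environment; the keys
    "active", "validated", and "tried" are hypothetical names.
    """
    active = str(env["active"])
    other = "b" if active == "a" else "a"

    if env["validated"]:
        return active  # this firmware has already proved itself
    if not env["tried"]:
        # First attempt at the new firmware: record that we tried it.
        env["tried"] = True
        return active
    # We tried this firmware once and it was never validated,
    # so the last boot must have failed: fall back to the other slot.
    env["active"] = other
    env["validated"] = True
    env["tried"] = False
    return other


env: Env = {"active": "b", "validated": False, "tried": False}
print(choose_boot_slot(env))  # b  (first try of the new firmware)
# ...suppose B hangs before the application can validate it...
print(choose_boot_slot(env))  # a  (reverted on the next boot)
```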

However, other popular boards like the Raspberry Pi don’t support U-Boot scripting. Uh-oh! What’re we supposed to do now?

Initramfs

Though it may seem like alphabet soup at first, initramfs is the solution. Short for initial RAM filesystem, it lets us load a stripped-down rootfs into memory and run checks there before the real rootfs is mounted.

nerves_initramfs closely resembles the U-Boot scripting we saw above and, for practical purposes, does pretty much the same thing: it verifies our firmware. It takes a different approach to get there, but the TL;DR is that it lets us revert to our known-good partition if we need to.

Heart

Since Elixir runs on the Erlang VM, we have access to built-in features like heart. Essentially, heart lets us ‘take a pulse’ at predefined intervals to make sure that everything is running smoothly. We can ask it to check on internet connectivity, an active link to NervesHub, or external sources like sensors or an MQTT service.

In an article titled “You Gotta Have Heart,” Doug Selph explains that “Heart runs in the BEAM as an optional feature, and when active will trigger a restart if it detects an application crash.” It works like a watchdog: a heart callback either gives the OK to keep running or, by withholding that OK, lets the system reboot automatically.
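The watchdog pattern is easy to model. The Python below is a simplified sketch, not heart itself: on Nerves the real mechanism is Erlang’s heart, which can run a validation callback registered with :heart.set_callback/2 before each heartbeat, together with nerves_heart on the device side. The Watchdog class, the single network check, and the 60-second timeout are hypothetical, and time is passed in explicitly so the behavior is easy to trace.

```python
from typing import Callable, List


class Watchdog:
    """Toy heart-style watchdog (hypothetical, not Erlang's heart API)."""

    def __init__(self, checks: List[Callable[[], bool]], timeout: float,
                 reboot: Callable[[str], None]) -> None:
        self.checks = checks    # health checks, e.g. "is the network up?"
        self.timeout = timeout  # seconds allowed without a healthy heartbeat
        self.reboot = reboot    # what to do when heartbeats stop
        self.last_beat = 0.0

    def beat(self, now: float) -> bool:
        """Record a heartbeat only if every health check passes."""
        if all(check() for check in self.checks):
            self.last_beat = now
            return True
        return False  # withhold the heartbeat; the timeout keeps running

    def poll(self, now: float) -> None:
        """Watchdog side: reboot if no healthy heartbeat arrived in time."""
        silence = now - self.last_beat
        if silence > self.timeout:
            self.reboot(f"no healthy heartbeat for {silence:.0f}s")


network_up = True
reasons: List[str] = []
dog = Watchdog([lambda: network_up], timeout=60, reboot=reasons.append)

dog.beat(now=10)
dog.poll(now=30)    # healthy, nothing happens
network_up = False
dog.beat(now=50)    # check fails, so no heartbeat is recorded
dog.poll(now=100)   # 90 s since the last healthy beat
print(reasons)      # ['no healthy heartbeat for 90s']
```

Note that a failed check doesn’t reboot immediately. It only withholds the heartbeat, and the reboot happens once the silence outlasts the timeout, which tolerates brief hiccups.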

You can check out the nerves_heart repository for all the juicy details about how we implement this feature in Nerves.

Shoehorn

Our final tactic for creating reliable embedded systems is Shoehorn, a shim that ensures our virtual machine (VM) will always pass initialization. Let’s take a step back to see why that matters.

Let’s say our application has some critical dependencies that provide networking, secure shell (SSH) access, or a link to NervesHub. If something goes wrong, these tools let us access the device to fix the problem. The trouble is that when our application crashes, these dependencies crash with it. This leaves us in quite the jam: our broken app also broke the tools for fixing it.

With Shoehorn, however, we can configure our initialization sequence to separate the dependencies’ startup from the app itself. That way, even if the app crashes, those dependencies keep running, so we can still reach our device through networking, SSH, and NervesHub.
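Here’s a rough Python model of that separation. It isn’t how Shoehorn is implemented or configured (in a real Nerves project Shoehorn is configured from Elixir, and the services are OTP applications, not functions), and the start_system helper and service names are hypothetical. What it shows is the ordering idea: recovery services start first and on their own, so a failure in the main app is contained instead of taking everything down with it.

```python
from typing import Callable, Dict, List


def start_system(recovery: Dict[str, Callable[[], None]],
                 app: Callable[[], None]) -> List[str]:
    """Start recovery services first, then the app, containing app failures."""
    running: List[str] = []
    for name, start in recovery.items():
        start()  # networking, SSH, etc. come up before the app
        running.append(name)
    try:
        app()
        running.append("app")
    except Exception as exc:
        # The app failed, but the recovery services above keep running.
        print(f"app failed to start: {exc}")
    return running


def broken_app() -> None:
    raise RuntimeError("bad config")


services = {"networking": lambda: None,
            "ssh": lambda: None,
            "nerveshub": lambda: None}
print(start_system(services, broken_app))
# app failed to start: bad config
# ['networking', 'ssh', 'nerveshub']
```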

Conclusion

All of these device hardening strategies ultimately boil down to a central design philosophy. In his talk, Jon Carstens implored us to always keep this mantra at the front of our minds:

“ALWAYS BE ABLE TO UPDATE THE DEVICE!”

Troubleshooting and repair can be hard, but they’re a lot less frustrating than not being able to access the device at all. That’s why each of these techniques gives us a doorway in, even in the worst of times.

Eager to learn more about how Very makes reliable embedded devices on the Nerves platform? Check out our Nerves development services.