Bringing Agile Development to Machine Learning and IoT
One of the reasons I joined Very is that I'm specifically interested in agile development practices for machine learning applications. Very is, in my opinion, one of the industry leaders in agile development for IoT products, and those products very often require machine learning components for certain types of predictions or recommendations (for example, a facial recognition-powered beer kiosk system). Across the industry, agile development practices and data science don't always go hand in hand, so what Very's doing in that regard is pretty cutting edge.
One of the things that makes agile IoT development so successful at Very is our capacity for multidisciplinary work. As a fully remote company long before the COVID-19 pandemic, we’ve become experts in remote collaboration and asynchronous communication. We regularly practice remote pair programming between engineers working on different aspects of a project, like data science and hardware.
This helps each person understand how the piece they're working on fits into the whole, so that we can create and improve upon our development processes. It can be tricky to follow the agile practice of building "thin, vertical slices" when those slices cut through not just the UI and backend but also the hardware, firmware, and machine learning elements of the stack. A cohesive team that stays in tight communication, where every member understands the entire system at a high level (not just their own component), is essential here.
With machine learning specifically, we’re often in a situation where we want to start prototyping the system even before “real” data is flowing. In this case, it’s often useful to use simulated data, and a good rule of thumb is to do the data simulation as close to the data source as possible.
For example, say we want to build an anomaly detection model for an industrial IoT application that streams data off of a PLC on a production line.
One option would be for a data scientist to work in isolation building an anomaly detection pipeline off of data that they simulate in Python.
Another option, which brings us much closer to a product that will work in production, is to have the data scientist work together with a firmware developer and a backend engineer to set up a pipeline that runs end to end: data is simulated on the firmware of a test PLC in the lab, transmitted over cellular, and processed in the cloud. (The anomaly detection piece itself might be the same in both cases.)
Then, when it's time to hook the PLC up to the real plant with the same firmware and the same basic system, you're that much closer to a fully functioning solution.
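To make the "simulate close to the source" idea concrete, here is a minimal Python sketch of the kind of generator that might run on (or stand in for) the test PLC. Everything here is an illustrative assumption: the field names, units, and signal shapes are invented for this example, not taken from a real project.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical reading schema -- these field names and units are
# assumptions for illustration, not an actual PLC register map.
@dataclass
class PlcReading:
    timestamp_ms: int
    temperature_c: float
    vibration_g: float

def simulate_plc_stream(n, anomaly_rate=0.02, seed=42):
    """Generate n plausible-looking readings, injecting occasional
    vibration spikes so downstream anomaly handling can be exercised."""
    rng = random.Random(seed)
    for i in range(n):
        # Slow sinusoidal drift plus sensor noise around a 70 C baseline.
        temp = 70.0 + 2.0 * math.sin(i / 50.0) + rng.gauss(0, 0.3)
        vib = 0.5 + abs(rng.gauss(0, 0.05))
        if rng.random() < anomaly_rate:
            vib += rng.uniform(1.0, 2.0)  # injected anomaly
        yield PlcReading(timestamp_ms=i * 100,
                         temperature_c=temp,
                         vibration_g=vib)

readings = list(simulate_plc_stream(1000))
```

In practice, a generator like this would emit the same message format the real PLC will, so the cellular transmission and cloud processing stages exercise the actual code paths rather than a Python-only stand-in.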
Obviously, simulated data won’t be useful for training accurate models and making predictions, but it can give you a big head start on the scaffolding of the system, and is also extremely valuable for unit and integration testing. Ultimately, this can shorten your time to market for the final product, especially when compared to a process where components are developed in isolation, and then “assembled” towards the end of development.
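To make the testing point concrete, here's a minimal sketch of how a simulated signal with a spike injected at a known index can drive a unit test of a simple detector. The rolling z-score approach and all parameters here are illustrative assumptions, not a description of any production system.

```python
import math
import statistics
from collections import deque

def rolling_zscore_anomalies(values, window=50, threshold=4.0):
    """Flag indices whose value is more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    buf = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(buf) == window:
            mean = statistics.fmean(buf)
            stdev = statistics.pstdev(buf)
            if stdev > 0 and abs(v - mean) / stdev > threshold:
                anomalies.append(i)
        buf.append(v)
    return anomalies

# Simulated "sensor" signal with one spike at a known index -- because
# we control the ground truth, we can assert the detector finds exactly
# the anomaly we planted and nothing else.
signal = [0.1 * math.sin(i / 5.0) for i in range(200)]
signal[150] = 5.0  # injected anomaly

flagged = rolling_zscore_anomalies(signal)
```

This is the payoff of simulating data: you know exactly where the anomalies are, so the test can be deterministic even before any real plant data exists.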
At the end of our talk during Connected Things 2020: WFH Edition, our moderator and MITEF board member Mark Thirman asked if I had any closing advice for our attendees.
I shared that it's a really exciting time in the IoT and machine learning spaces, and that you shouldn't worry too much if you feel like you're behind the innovation curve. It's never too late to get started, especially when you partner with a company like Very that's been there, done that.