Create a Robust OTA Update Mechanism for Your Remote Devices

By Gerardo Stola and Lisandro Pérez Meyer

A successful IoT solution typically needs to obtain and install updates to its own software remotely, for instance a new UI feature, a background task or an operating system update. To support these updates, however, the necessary capabilities must be put in place during the development phase.

About Over-the-Air Updates

This remote updating feature is commonly called over-the-air (OTA) updating because most devices use wireless links, such as WiFi, LoRa and cellular. (The term also applies to the wired setups used in many industries.) An effective OTA service provides robust management of the update process, taking into account that the device fleet may be dispersed over a vast area, which makes manual updates both difficult and expensive.

Ideally, the OTA process should be so seamless and unobtrusive that an end-user doesn’t even notice that an update has taken place. The device needs to keep running, regardless of whether a new software package is being downloaded. Under certain circumstances, however, the update will be more noticeable because the device must be out of service for a while. Think of a medical device in use during a clinical trial: the update should not proceed while the device is in use but rather at another time. In other cases, the device might need to go offline because a restart is required, or to avoid even a brief glitch on the display. Still, a good user experience must be integral to the update process.

Timing and geography

For proper handling of the OTA update process it is important for both the update architecture and the UI to inform the operator at the right time that an update is required, and let them trigger the start of the install. This way, the operator can safely update during non-working hours rather than, say, during critical operations. 
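
The gating logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed design: the function names, the idea of a fixed maintenance window and the `device_busy` flag are all assumptions made for the example.

```python
from datetime import datetime, time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` is inside [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_install(now: datetime, start: time, end: time,
                operator_approved: bool, device_busy: bool) -> bool:
    """Install only if the operator agreed, the device is idle,
    and the current time is inside the maintenance window."""
    return (operator_approved
            and not device_busy
            and in_window(now.time(), start, end))
```

For example, with a window from 22:00 to 05:00, `may_install(datetime(2024, 1, 1, 23, 30), time(22, 0), time(5, 0), True, False)` allows the install, while the same call at noon does not.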

In addition to timing, the update process must also address geography, as it can be quite challenging to keep software up to date on devices that are geographically spread out. First of all, the deliverable must be properly tested before any deployment to the fleet to ensure that all remote devices will work as expected after the update. This means that testing in a controlled environment is a must for each version of hardware in your fleet.

From the update server point of view, ‘proof of life’ is required from all remote devices. Typically this is achieved through periodic inbound messages that inform administrators of the device’s current status and version. 

To help system administrators understand device behavior in the field, developers can build meaningful remote logs into the OTA process and attach them to the status ping.
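
As a rough sketch of such a status ping, the Python below builds a JSON message carrying the device's version, status and the most recent log lines, and shows how a server might decide that a device has gone silent. The field names, the `Heartbeat` class and the "three missed intervals" rule are assumptions for illustration; a real deployment would follow whatever schema its update server expects.

```python
import json
import time
from collections import deque


class Heartbeat:
    """Device side: collects recent log lines and builds the status message."""

    def __init__(self, device_id, version, max_log_lines=20):
        self.device_id = device_id
        self.version = version
        # Bounded buffer: only the newest lines are kept, so the ping stays small.
        self.logs = deque(maxlen=max_log_lines)

    def log(self, line):
        self.logs.append(line)

    def build(self, status, now=None):
        return json.dumps({
            "device_id": self.device_id,
            "version": self.version,
            "status": status,
            "timestamp": time.time() if now is None else now,
            "logs": list(self.logs),
        }, sort_keys=True)


def is_alive(last_seen, now, interval_s, missed_allowed=3):
    """Server side: a device is considered alive until it has missed
    more than `missed_allowed` reporting intervals."""
    return (now - last_seen) <= interval_s * missed_allowed
```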

Consistency and security

Any piece of software must be received and installed atomically. Under no circumstances may an incomplete or corrupted update be installed. To achieve this, standard integrity checks, such as a CRC or a cryptographic hash of the payload, should be performed just after the software is downloaded.
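
One such check is a cryptographic hash comparison. The sketch below assumes the update server publishes the expected SHA-256 digest alongside the payload and that the download arrives as a sequence of byte chunks; the function names are illustrative only.

```python
import hashlib
import hmac


def sha256_of_chunks(chunks):
    """Hash the payload incrementally so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_payload(chunks, expected_hex):
    """True only if the downloaded bytes match the published digest."""
    actual = sha256_of_chunks(chunks)
    # compare_digest runs in constant time with respect to the contents.
    return hmac.compare_digest(actual, expected_hex.lower())
```

A payload that fails this check should be discarded rather than installed. Note that a hash alone only detects corruption; it does not prove who produced the payload.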

The device must properly identify itself to the update server and the server must recognize that the device truly belongs to the fleet. A common way to achieve this is mutual TLS authentication, which involves a certificate and a public/private key pair for each device and for the server.
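
On the device side, a mutual TLS client can be set up with Python's standard `ssl` module roughly as follows. This is a sketch under assumptions: the helper names are invented for the example, and the file paths passed in are placeholders for wherever a given device stores its fleet CA certificate, device certificate and private key.

```python
import ssl


def make_client_context(ca_file=None):
    """Context that verifies the update server's certificate and hostname.
    `ca_file` would normally point at the fleet's own CA certificate;
    with None, the system trust store is used instead."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def attach_device_identity(ctx, cert_file, key_file):
    """Load the device certificate and private key so the server can
    authenticate the device in turn (the 'mutual' half of mutual TLS)."""
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The resulting context can be handed to any client that accepts an `ssl.SSLContext`. The server must be configured separately to require and verify client certificates.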

Regarding the authenticity of the content, the client also needs to verify that it comes from a trusted source in its original form. A digital signature can help in this situation.

Delta updates

Other aspects to consider when establishing a device’s OTA update capabilities relate to the device’s power and communications capabilities. Some devices might be plugged into a wall outlet and connected to the Internet via WiFi while other devices might use a low-speed modem and be battery powered. The first type of device might easily handle a big payload update but the other type probably could not. This is when delta updates become important, as only the differences between the current system and the updated system are transferred to the device, thus decreasing its power and bandwidth consumption.
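
To make the idea concrete, here is a deliberately simple block-based delta in Python: the server compares the old and new images in fixed-size blocks and ships only the blocks that differ. This is an illustration of the principle, not the algorithm any particular OTA product uses; real delta tools cope far better with inserted or shifted data.

```python
BLOCK = 4096  # assumed block size for the example


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> dict:
    """Server side: record only the blocks of `new` that differ from `old`."""
    changed = {}
    for index, start in enumerate(range(0, len(new), block)):
        chunk = new[start:start + block]
        if old[start:start + block] != chunk:
            changed[index] = chunk
    return {"block": block, "new_len": len(new), "changed": changed}


def apply_delta(old: bytes, delta: dict) -> bytes:
    """Device side: rebuild the new image from the old one plus the delta."""
    block = delta["block"]
    n_blocks = (delta["new_len"] + block - 1) // block
    out = bytearray()
    for index in range(n_blocks):
        if index in delta["changed"]:
            out += delta["changed"][index]
        else:
            out += old[index * block:(index + 1) * block]
    return bytes(out)
```

If a single byte changes in a 16 KiB image, only the one 4 KiB block containing it needs to be transferred. The reconstructed image should still be verified before it is installed.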

Being able to deploy updates in batches is also a feature to consider, especially when the number of devices is high. A batched approach will not only reduce the number of simultaneous connections to the update server but will also help in determining if the update is going as expected. It is always wise to start the update on a small batch of devices and then increase the coverage as things prove to work as expected.
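
One way to form such batches, sketched here as an assumption rather than a recommendation from any specific platform, is to hash each device ID into a stable bucket from 0 to 99 and release the update to every bucket below the current rollout percentage. Raising the percentage then only ever adds devices.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Stable pseudo-random bucket in [0, 100) for this device and release.
    Including the release name gives each release a different early batch."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_eligible(device_id: str, release: str, percent: int) -> bool:
    """True if the device belongs to the first `percent` percent of the fleet."""
    return rollout_bucket(device_id, release) < percent
```

A rollout might step `percent` through, say, 1, 10, 50 and 100, pausing at each stage until the devices already updated report a healthy status.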

When updates go wrong

Sometimes things can go wrong. Power is lost during an upgrade. A check fails and the wrong update makes it to an incompatible device. The possibilities are endless. Regardless of the cause, the system needs to be able to either continue the update or roll back to the previously known working version. There are many ways to achieve this. Determining the best option requires understanding the device’s usage patterns, as well as having the flexibility to accommodate different failures and rollback scenarios.
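
One common pattern for this is a two-slot (A/B) scheme with a boot counter: the new version is installed to the inactive slot, tried a limited number of times, and made permanent only after it reports that it is healthy. The Python below models that state machine; the class, its method names and the three-try default are assumptions for illustration, and on a real device this state would live in the bootloader's persistent environment.

```python
class ABBootState:
    """Toy model of A/B slot selection with automatic rollback."""

    def __init__(self, active="A", max_tries=3):
        self.active = active        # last slot known to work
        self.pending = None         # slot holding an unconfirmed update
        self.tries_left = 0
        self.max_tries = max_tries

    def install(self):
        """Mark the inactive slot as holding a freshly written update."""
        self.pending = "B" if self.active == "A" else "A"
        self.tries_left = self.max_tries

    def choose_boot_slot(self):
        """Called at every boot. The counter is decremented before booting,
        so a crash or power loss during boot still uses up a try."""
        if self.pending is not None and self.tries_left > 0:
            self.tries_left -= 1
            return self.pending
        self.pending = None         # out of tries: roll back
        return self.active

    def confirm(self):
        """Called by the new system once it has verified it works."""
        if self.pending is not None:
            self.active, self.pending = self.pending, None
            self.tries_left = 0
```

If the new slot never calls `confirm()`, the device falls back to the previous slot once the tries run out, with no operator intervention.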

Choosing the Right Update Technology

There are many technologies available that provide OTA updates for embedded systems. Some can only do application updates, while others can provide full root filesystem updates. There are even some technologies that combine both capabilities. In addition, there are solutions worth mentioning that pair well with containers, such as libostree-based Aktualizr.

Mender

Mender is an open source, end-to-end solution that provides OTA support for Linux and Android devices, for both application updates and full root filesystem updates (A/B partition switching). Its documentation is thorough and frequently updated. The platform includes a server that handles the update process and a UI to help track what is going on with each device. Device status is updated periodically and displayed in the UI, including firmware version, hardware release and some other data.

On the client side, the update is highly customizable, thanks to a variety of scripts that can be added at various points of the process. The API is well documented, so the client could potentially be implemented on non-Linux MCUs (bare metal or RTOS flavor).

Libostree-based technologies

Linux-based solutions utilizing libostree are based on delta updates. An example is Aktualizr, an OTA update client that implements Uptane, a security framework used mostly in the automotive industry.

Because only deltas are transferred, libostree-based updates consume less bandwidth than full root filesystem updates. Instead of using an A/B partition strategy, libostree changes the root filesystem through atomic changes to files, deployed in a git-like fashion. The filesystem hierarchy must be immutable. This also makes it a good candidate for running containers managed by a slim base OS.

The Takeaway

OTA updates are a must for most if not all embedded systems incorporating connectivity. Selecting the right approach for your device is critical in order to fully leverage your product’s value. If you’d like guidance in your decision making, get in touch with our engineering team.