As the Internet of Things (IoT) continues to expand, managing a large number of heterogeneous devices with diverse requirements and characteristics has become essential. The Cloud computing ecosystem provides a large number of tools to efficiently manage workload deployments. However, there is a wide gap between the requirements of Cloud infrastructure and those of Edge or IoT infrastructure. One crucial difference is the diversity and resource constraints of IoT devices. As billions of devices become part of our environment, the need for secure, robust device management and seamless integration into higher-level orchestration frameworks becomes increasingly critical.

Kubernetes (K8s) has emerged as the standard for container orchestration, but dealing with resource-constrained edge devices poses significant challenges. This blog post explores the challenges of, and proposed solutions for, managing IoT device firmware in the context of cloud-native orchestration.

Challenges in IoT Device Management

Managing IoT devices presents unique challenges due to the diversity of IoT ecosystems, the scale of deployments, and the necessity for secure and efficient operations. These challenges are categorized into three main areas:

  • Firmware Management: Ensuring that all IoT devices are running the latest firmware versions securely and efficiently.
  • Orchestration: Integrating IoT devices into existing orchestration frameworks without overburdening their limited resources.
  • Security and On-boarding: Securely on-boarding new devices and managing the security of existing devices throughout their lifecycle.

The Cloud-native Ecosystem

The cloud-native concept has revolutionized the design, development, deployment, and management of applications by leveraging cloud computing principles such as scalability, resilience, automation, and agility.

Containerization and micro-services are essential to this approach, providing flexibility, efficient resource utilization, and consistency across diverse environments.

However, deploying containers on IoT devices is often not feasible, due to their limited processing power, memory, and storage. To address these constraints, the community has proposed alternative solutions, including cloud-native management frameworks running as micro-services on edge devices close to the IoT infrastructure. These edge devices act as proxies or gateways, facilitating communication with the IoT devices.

IoT Device Management Frameworks

Two popular cloud-native management frameworks for Edge/IoT devices are KubeEdge and Akri. These frameworks integrate with the cloud-native ecosystem, offering various benefits and facing certain limitations:

  • KubeEdge: Extends Kubernetes to the edge, providing infrastructure support for edge computing applications. It enables centralized management of edge nodes and devices, offering features like device management, data synchronization, and edge application deployment.
  • Akri: Focuses on edge device discovery and management, simplifying the integration of IoT devices into Kubernetes clusters. Akri automatically detects and registers IoT devices, making them available to applications running in the cluster.

Neither framework, however, provides a purely cloud-native approach to firmware updates. Both rely on custom containers that fetch the firmware from specific locations, available either locally or publicly, and use custom tools to flash it directly to the device. As a result, neither framework takes advantage of the characteristics of the OCI spec to bring the benefits of cloud-native software delivery to firmware.

Enhancements to Akri

Earlier this year, we shared our take on how to tackle these challenges in a research paper presented at the MECC workshop in EuroSys 2024. In this work, we introduce enhancements to the Akri framework to reduce resource utilization on edge gateways, moving towards a fully unified infrastructure management solution based on cloud-native concepts. These enhancements aim to simplify IoT device firmware management and improve the efficiency of IoT device firmware upgrades across the Cloud-Edge-IoT continuum.

Figure 1: Stock Akri workflow

Essentially, Akri is a fully-featured, modular framework to manage IoT device applications. When it comes to firmware flashing (e.g. OTA updates), Akri relies on the user to provide a custom container image that is responsible for fetching the firmware, communicating with the device, and eventually flashing the firmware on the device. Our approach is to leverage Akri for device identification and mapping, while adding the ability to define which devices we would like to upgrade or re-purpose. A proof of concept has been implemented and the code changes are minimal.

In essence, we add an extra type of job to the Akri Controller logic. First, the Akri Configuration must include an additional field, firmwareJobSpec, which describes the job responsible for managing the firmware of the leaf device. Second, the internal structures holding the fields of the Configuration CRD need to be expanded accordingly. Most importantly, the Controller needs an extra piece of logic that checks whether a firmwareJobSpec field is present in the configuration and, if so, deploys the firmware jobs.

Figure 2: Updated Akri workflow

The only difference in terms of action compared to vanilla Akri is the scheduling of the firmwareJob, which is handled by the Controller.

In more detail, the Akri Configuration CRD needs additional fields to hold the values of firmwareJob:

# deployment/helm/crds/akri-configuration-crd.yaml
            spec:
              type: object
              properties:
                firmwareJobSpec: # {{JobSpec}}
                  x-kubernetes-preserve-unknown-fields: true
                  type: object
                  nullable: true
                discoveryHandler: # {{DiscoveryHandlerInfo}}
                  type: object
                  properties:
                    name:
                      type: string
                    discoveryDetails:
                      type: string
                    discoveryProperties:
                      nullable: true
                      type: array

The logic to handle the extra configuration field resides in controller/src/util/instance_action.rs:

// controller/src/util/instance_action.rs

    // Only schedule a firmware job when the Configuration defines one.
    if let Some(firmware_job_spec) = &configuration.spec.firmware_job_spec {
        trace!("about to handle the firmware job spec {:?}", firmware_job_spec);
        let firmware_change_result = handle_instance_change_job(
            instance,
            *configuration.metadata.generation.as_ref().unwrap(),
            &firmware_job_spec,
            action,
            kube_interface,
        ).await;
        // ...
    }

Essentially, this part checks whether firmwareJobSpec exists in the given configuration and, if so, calls handle_instance_change_job, the function responsible for handling K8s Jobs in Akri.
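To make the optional-field pattern easier to follow outside the Akri code base, here is a minimal, self-contained sketch. It is not Akri's implementation: `JobSpec`, `ConfigurationSpec`, `Workload` and `plan_workloads` are simplified, hypothetical stand-ins that only model the decision of whether a firmware job is scheduled next to the broker.

```rust
// Hypothetical, heavily simplified stand-ins for Akri's types.
// The real Configuration spec and the Kubernetes JobSpec carry many more fields.
#[derive(Debug, Clone, PartialEq)]
struct JobSpec {
    image: String,
}

#[derive(Debug, Default)]
struct ConfigurationSpec {
    broker_pod_image: Option<String>,
    // Mirrors the nullable firmwareJobSpec field added to the CRD.
    firmware_job_spec: Option<JobSpec>,
}

#[derive(Debug, PartialEq)]
enum Workload {
    BrokerPod(String),
    FirmwareJob(String),
}

/// Returns the workloads to schedule for one discovered instance.
fn plan_workloads(spec: &ConfigurationSpec) -> Vec<Workload> {
    let mut plan = Vec::new();
    if let Some(image) = &spec.broker_pod_image {
        plan.push(Workload::BrokerPod(image.clone()));
    }
    // The firmware job is planned only when the optional field is present,
    // so configurations without it behave exactly as before.
    if let Some(job) = &spec.firmware_job_spec {
        plan.push(Workload::FirmwareJob(job.image.clone()));
    }
    plan
}
```

Calling `plan_workloads` with `firmware_job_spec: None` yields only the broker entry, which reflects the point that the firmware job is purely additive to the stock workflow.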

A firmware flash can be triggered by applying a new Configuration or updating an existing one. When we apply a new configuration, the discovery handler is spawned and is responsible for finding devices attached to nodes (virtually or physically). Once detection is over, the brokerPods and firmwareJobs are deployed. The brokerPod holds the workload, i.e. the application utilizing the device. The firmwareJob is responsible for checking the firmware version of the device and, if a newer version is specified, triggering the firmware update.
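The version check performed by the firmwareJob could look roughly like the sketch below. The post does not specify how versions are encoded or compared, so this assumes dotted numeric versions (for example `1.4.2`, optionally prefixed with `v`); `parse_version` and `needs_update` are hypothetical helper names, not part of Akri.

```rust
/// Parses a dotted numeric version such as "1.4.2" or "v1.4.2".
/// Returns None if any component is not an unsigned integer.
fn parse_version(v: &str) -> Option<Vec<u32>> {
    v.trim()
        .trim_start_matches('v')
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// Returns Some(true) when `desired` is strictly newer than `current`,
/// Some(false) otherwise, and None if either version cannot be parsed.
fn needs_update(current: &str, desired: &str) -> Option<bool> {
    let mut cur = parse_version(current)?;
    let mut des = parse_version(desired)?;
    // Pad with zeros so that "1.2" and "1.2.0" compare as equal.
    let len = cur.len().max(des.len());
    cur.resize(len, 0);
    des.resize(len, 0);
    // Vec comparison is lexicographic, i.e. component by component.
    Some(des > cur)
}
```

Returning `None` for an unparsable version leaves the decision to the caller; skipping the flash in that case is probably the safer default, since a failed firmware update on a leaf device is harder to recover from than a skipped one.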

When a configuration is updated, the old one is deleted along with its linked instances, brokerPods and firmwareJobs, and a new Configuration is deployed with its own set of pods and firmwareJobs.
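The delete-then-recreate behaviour on update can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Cluster`, `Owned`, `apply` and the resource naming scheme are hypothetical and stand in for state that really lives in the Kubernetes API.

```rust
use std::collections::BTreeMap;

/// Everything created on behalf of one Configuration (hypothetical model).
#[derive(Debug, Clone, Default, PartialEq)]
struct Owned {
    instances: Vec<String>,
    broker_pods: Vec<String>,
    firmware_jobs: Vec<String>,
}

/// Hypothetical in-memory stand-in for cluster state, keyed by Configuration name.
#[derive(Debug, Default)]
struct Cluster {
    owned: BTreeMap<String, Owned>,
}

impl Cluster {
    /// Applies a Configuration. If one with the same name already exists, it is
    /// removed together with its linked resources before the new set is created.
    /// Returns the resources that were torn down, if any.
    fn apply(&mut self, name: &str, devices: &[&str], firmware_job: bool) -> Option<Owned> {
        let torn_down = self.owned.remove(name);
        let mut fresh = Owned::default();
        for device in devices {
            // Naming scheme is illustrative only.
            let instance = format!("{}-{}", name, device);
            fresh.broker_pods.push(format!("{}-pod", instance));
            if firmware_job {
                fresh.firmware_jobs.push(format!("{}-firmware-job", instance));
            }
            fresh.instances.push(instance);
        }
        self.owned.insert(name.to_string(), fresh);
        torn_down
    }
}
```

Applying the same name twice returns the previously owned resources on the second call, which mirrors the old Configuration and its linked objects being deleted before the new ones appear.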

When we add a new device type, and the device is discoverable by the discovery handler, brokerPods and brokerJobs utilizing the device will be deployed.

We are currently working on a refactor of our approach and we will provide an update once we have something concrete.

Conclusion

In conclusion, the management of IoT devices within a cloud-native ecosystem is crucial for leveraging the full potential of IoT technology. By addressing the challenges posed by resource constraints at the edge and proposing enhancements to existing frameworks like Akri, this work contributes to more accessible and efficient orchestration solutions for IoT environments. Future efforts will focus on further refining these solutions and integrating them into mainstream IoT management practices.

By leveraging cloud-native principles and addressing the unique challenges of IoT device management, we can create a more cohesive and efficient continuum for the deployment and management of applications in IoT environments.

Stay tuned for updates on this front!