As the Internet of Things (IoT) continues to expand, managing a large number of heterogeneous devices with diverse requirements and characteristics has become essential. The Cloud computing ecosystem has provided a vast number of tools to efficiently manage workload deployments. However, there is a huge gap between the requirements of Cloud infrastructure and those of Edge or IoT infrastructure. One crucial difference is the diversity and resource constraints of IoT devices. As billions of devices become part of our environment, the need for secure, robust device management and seamless integration into higher-level orchestration frameworks becomes increasingly critical.
Kubernetes (K8s) has emerged as the standard for container orchestration, but dealing with resource-constrained edge devices poses significant challenges. This blog post explores the challenges and proposed solutions for managing IoT device firmware in the context of cloud-native orchestration.
Challenges in IoT Device Management
Managing IoT devices presents unique challenges due to the diversity of IoT ecosystems, the scale of deployments, and the necessity for secure and efficient operations. These challenges are categorized into three main areas:
- Firmware Management: Ensuring that all IoT devices are running the latest firmware versions securely and efficiently.
- Orchestration: Integrating IoT devices into existing orchestration frameworks without overburdening their limited resources.
- Security and On-boarding: Securely on-boarding new devices and managing the security of existing devices throughout their lifecycle.
The Cloud-native Ecosystem
The cloud-native concept has revolutionized the design, development, deployment, and management of applications by leveraging cloud computing principles such as scalability, resilience, automation, and agility.
Containerization and micro-services are essential to this approach, providing flexibility, efficient resource utilization, and consistency across diverse environments.
However, deploying containers on IoT devices is not entirely feasible, due to their limited processing power, memory, and storage. To address these constraints, the community has proposed alternative solutions, including cloud-native management frameworks running as micro-services on edge devices close to the IoT infrastructure. These edge devices act as proxies or gateways, facilitating communication with the IoT devices.
IoT Device Management Frameworks
Two popular cloud-native management frameworks for Edge/IoT devices are KubeEdge and Akri. These frameworks integrate with the cloud-native ecosystem, offering various benefits and facing certain limitations:
- KubeEdge: Extends Kubernetes to the edge, providing infrastructure support for edge computing applications. It enables centralized management of edge nodes and devices, offering features like device management, data synchronization, and edge application deployment.
- Akri: Focuses on edge device discovery and management, simplifying the integration of IoT devices into Kubernetes clusters. Akri automatically detects and registers IoT devices, making them available to applications running in the cluster.
None of the above frameworks, however, provide a pure cloud-native approach to firmware updating. Specifically, both use custom containers that can fetch the firmware from specific locations available either locally or publicly, and use custom tools to flash the firmware directly to the device. This means that none of these frameworks take advantage of the unique characteristics of the OCI spec to leverage software delivery benefits seamlessly.
Enhancements to Akri
Earlier this year, we shared our take on how to tackle these challenges in a research paper presented at the MECC workshop at EuroSys 2024. In this work, we introduce enhancements to the Akri framework to reduce resource utilization on edge gateways, moving towards a fully unified infrastructure management solution based on cloud-native concepts. These enhancements aim to simplify IoT device firmware management and improve the efficiency of IoT device firmware upgrades across the Cloud-Edge-IoT continuum.
Essentially, Akri is a fully-featured, modular framework to manage IoT device applications. When it comes to firmware flashing (e.g. OTA updates), Akri relies on the user to provide a custom container image that is responsible for fetching the firmware, communicating with the device, and eventually flashing the firmware on the device. Our approach is to leverage Akri for device identification and mapping, while adding the functionality to define which devices we would like to upgrade or re-purpose. A proof of concept has been implemented, and the code changes are minimal.
In essence, we add an extra type of job to the Akri Controller logic. First, the Akri Configuration must include additional values such as the firmwareJobSpec, which is responsible for managing the firmware of the leaf device. Additionally, the internal structures holding the fields of the Configuration CRD need to be expanded. More importantly, the Controller needs another piece of logic to check whether there is a firmwareJobSpec field in the configuration and, if so, deploy the firmware jobs accordingly.
The only difference in terms of action compared to vanilla Akri is the scheduling of the firmwareJob, which is handled by the Controller.
In more detail, the Akri Configuration CRD needs additional fields to hold the values of firmwareJob:
    # deployment/helm/crds/akri-configuration-crd.yaml
    spec:
      type: object
      properties:
        firmwareJobSpec: # {{JobSpec}}
          x-kubernetes-preserve-unknown-fields: true
          type: object
          nullable: true
        discoveryHandler: # {{DiscoveryHandlerInfo}}
          type: object
          properties:
            name:
              type: string
            discoveryDetails:
              type: string
            discoveryProperties:
              nullable: true
              type: array
The logic to handle the extra configuration field resides in controller/src/util/instance_action.rs:

    // controller/src/util/instance_action.rs

    if let Some(firmware_job_spec) = &configuration.spec.firmware_job_spec {
        trace!("about to handle the firmware job spec {:?}", firmware_job_spec);
        let firmware_change_result = handle_instance_change_job(
            instance,
            *configuration.metadata.generation.as_ref().unwrap(),
            &firmware_job_spec,
            action,
            kube_interface,
        ).await;
        // ...
    }
Essentially, this part checks whether firmwareJobSpec exists in the given configuration and then calls handle_instance_change_job, a function responsible for handling K8s Jobs in Akri.
A firmware flash can be triggered by applying a new Configuration or updating an existing one. When we apply a new configuration, the discovery handler is spawned and is responsible for finding devices attached to nodes (virtually or physically). Once detection is over, the brokerPods and firmwareJobs are deployed. The brokerPod hosts the workload (the application) utilizing the device. The firmwareJob is responsible for checking the firmware version of the device and, if a newer version is specified, triggering the firmware update.
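The version check performed by the firmwareJob could look something like the sketch below. It assumes that firmware versions are dotted numeric strings such as "1.4.2", which the post does not specify; `parse_version` and `needs_update` are hypothetical helper names, not part of Akri.

```rust
/// Parse a dotted numeric version such as "1.4.2" into its components.
/// Returns `None` if any component fails to parse as an unsigned integer.
pub fn parse_version(s: &str) -> Option<Vec<u32>> {
    s.trim()
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// Returns `Some(true)` when `desired` is strictly newer than `current`,
/// `Some(false)` when it is the same or older, and `None` when either
/// version string cannot be parsed.
pub fn needs_update(current: &str, desired: &str) -> Option<bool> {
    let mut cur = parse_version(current)?;
    let mut des = parse_version(desired)?;
    // Pad the shorter version with zeros so that "1.2" equals "1.2.0".
    let len = cur.len().max(des.len());
    cur.resize(len, 0);
    des.resize(len, 0);
    // Vec comparison is lexicographic, which matches component-wise ordering.
    Some(des > cur)
}
```

Returning `None` for unparseable versions leaves the decision to the caller, so a job built on this sketch could skip flashing rather than guess when the device reports something unexpected.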
When updating a configuration, the old one is deleted along with the linked instances and brokerPods or firmwareJobs, and a new Configuration with the additional pods or firmwareJobs is deployed.
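The delete-then-redeploy behaviour on update can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the cluster is a plain list, and `Workload`, `WorkloadKind`, and `reapply_configuration` are hypothetical names that do not correspond to Akri or Kubernetes APIs.

```rust
/// The two kinds of workload discussed in the post.
#[derive(Debug, Clone, PartialEq)]
pub enum WorkloadKind {
    BrokerPod,
    FirmwareJob,
}

/// A deployed workload, tagged with the Configuration that owns it.
#[derive(Debug, Clone, PartialEq)]
pub struct Workload {
    pub configuration: String,
    pub generation: u64,
    pub instance: String,
    pub kind: WorkloadKind,
}

/// Replace everything owned by `configuration` with a fresh set of workloads.
pub fn reapply_configuration(
    cluster: &mut Vec<Workload>,
    configuration: &str,
    generation: u64,
    instances: &[&str],
    kinds: &[WorkloadKind],
) {
    // Step 1: delete workloads linked to the old version of this Configuration.
    cluster.retain(|w| w.configuration != configuration);
    // Step 2: deploy the new set, one workload of each kind per instance.
    for instance in instances {
        for kind in kinds {
            cluster.push(Workload {
                configuration: configuration.to_string(),
                generation,
                instance: (*instance).to_string(),
                kind: kind.clone(),
            });
        }
    }
}
```

In this model, re-applying a Configuration with `FirmwareJob` added to `kinds` removes the old broker pods and recreates them alongside the new firmware jobs, all tagged with the new generation.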
When we add a new device type, if the device is discoverable by the discovery handler, brokerPods and brokerJobs will be deployed utilizing the device.
We are currently working on a refactor of our approach and we will provide an update once we have something concrete.
Conclusion
In conclusion, the management of IoT devices within a cloud-native ecosystem is crucial for leveraging the full potential of IoT technology. By addressing the challenges posed by resource constraints at the edge and proposing enhancements to existing frameworks like Akri, this work contributes to more accessible and efficient orchestration solutions for IoT environments. Future efforts will focus on further refining these solutions and integrating them into mainstream IoT management practices.
By leveraging cloud-native principles and addressing the unique challenges of IoT device management, we can create a more cohesive and efficient continuum for the deployment and management of applications in IoT environments.
Stay tuned for updates on this front!