In 2022, NVIDIA released the Jetson Orin modules, specifically designed for extreme computation at the Edge. The NVIDIA Jetson AGX Orin modules deliver up to 275 TOPS of AI performance with power configurable between 15W and 60W.

Figure 1: The NVIDIA Jetson AGX Orin devkit & module

Apart from bleeding-edge GPUs, these powerful machines feature 12 ARMv8 cores and come with 16GB, 32GB or 64GB of memory. That many cores and that much RAM make them a good fit for use-cases where multi-tenancy is essential: using the virtualization extensions of these cores, we can enforce stronger isolation among the workloads running at the Edge. Moreover, with kata-containers, we keep application deployment at the Edge cloud-native.

Following up on our work with kata-containers and vAccel, we got a couple of Orin nodes and started experimenting. Unfortunately, things are a bit different than with the original Jetson AGX Xavier boards we were used to. The SoC is slightly different, with a fully-featured, multi-core GICv3 implementation.

That’s great news, right? Well… not really: the control structures needed for GICv3 are not created during boot, because the interrupts declaration in the device tree is incomplete.

Vadim & Alexey identified the issue and provided the necessary patches to the device tree source files and… tada! We can boot a non-emulated VM, using GICv3, on Jetson Orin!

Walk through the issue

We started by checking whether the stock kernel has KVM enabled:

# dmesg |grep -i kvm
[    0.362967] kvm [1]: IPA Size Limit: 48 bits
[    0.363130] kvm [1]: VHE mode initialized successfully

It looks like KVM is supported, so we moved on to running an AWS Firecracker VM. We fetched the vmlinux, rootfs.img and config_vsock.json we had built for our tests, plus a firecracker binary, and tried to boot the VM:

wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/vm-example/aarch64/rust-vmm/vmlinux
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/vm-example/aarch64/rootfs.img
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/vm-example/aarch64/fc/config_vsock.json
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/vm-example/aarch64/fc/firecracker

# ./firecracker --api-sock fc.sock --config-file config_vsock.json
2023-02-13T18:54:29.061153660 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:480] Building VMM configured from cmdline json failed: Internal(Vm(VmCreateGIC(CreateGIC(Error(19)))))

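The Error(19) in that message is most likely errno 19 (ENODEV, "No such device" on Linux), meaning KVM could not create the in-kernel GIC device for the guest. We were a bit troubled, as the exact same setup was working fine on a Jetson AGX Xavier. We also tried QEMU, with a similar result: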

# qemu-system-aarch64 -cpu max -machine virt,gic-version=3,kernel-irqchip=on -m 1024 -nographic -monitor none -kernel /boot/Image -enable-kvm
qemu-system-aarch64: gic-version=3 is not supported with kernel-irqchip=off

But if we drop -enable-kvm, so that QEMU emulates the CPU instead of using KVM, we can see the kernel booting:

# qemu-system-aarch64 -cpu max -machine virt,gic-version=3,kernel-irqchip=on -m 1024 -nographic -monitor none -kernel /boot/Image
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[    0.000000] Linux version 5.10.65-tegra (buildbrain@mobile-u64-5266-d7000) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2020.08) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #1 SMP PREEMPT Tue Mar 15 00:53:43 PDT 2022
[    0.000000] OF: fdt: memory scan node memory@40000000, reg size 16,
[    0.000000] OF: fdt:  - 40000000 ,  40000000
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] efi: UEFI not found.
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000007fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[snipped]

After digging into the issue, searching online and trying various workarounds, we arrived at the GICv3 initialization problem described above.

Solution

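Following the instructions on how to build a kernel for the Jetson AGX Orin series, we get the sources and patch the device tree sources using the following snippet. It adds the missing interrupts property to the GIC node, which declares the GIC maintenance interrupt (PPI 9) that KVM relies on to set up its virtual GIC: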

--- Linux_for_Tegra/source/public/hardware/nvidia/soc/t23x/kernel-dts/tegra234-soc/tegra234-soc-minimal.dtsi.orig	2022-08-11 03:14:51.000000000 +0000
+++ Linux_for_Tegra/source/public/hardware/nvidia/soc/t23x/kernel-dts/tegra234-soc/tegra234-soc-minimal.dtsi		2023-02-12 09:07:10.259761186 +0000
@@ -43,6 +43,10 @@
 		reg = <0x0 0x0f400000 0x0 0x00010000    /* GICD */
 		       0x0 0x0f440000 0x0 0x00200000>;  /* GICR CPU 0-15 */
 		ranges;
+		interrupts = <GIC_PPI 9
+                       (GIC_CPU_MASK_SIMPLE(8) | IRQ_TYPE_LEVEL_HIGH)>;
+                interrupt-parent = <&intc>;
+
 		status = "disabled";
 
 		gic_v2m: v2m@f410000 {
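To check whether the device tree a board is currently running already carries this property (before patching, or after rebooting into the patched blob), one can inspect the GIC node under /proc/device-tree. The following is only a sketch: gic_nodes_missing_irq is a hypothetical helper name, and it assumes the GIC node's compatible string contains arm,gic-v3, which the patch above does not show.

```shell
# gic_nodes_missing_irq [ROOT]   (hypothetical helper)
# Walk a device-tree directory (default: /proc/device-tree) and print every
# node whose "compatible" property mentions arm,gic-v3 (an assumed value)
# but which has no "interrupts" property.
# Exit status: 0 = all GIC nodes have it, 1 = at least one is missing it,
#              2 = no matching GIC node was found at all.
gic_nodes_missing_irq() {
    local root found missing f node
    root=${1:-/proc/device-tree}
    found=0
    missing=0
    # "compatible" files hold NUL-separated strings; grep -l only needs to
    # report which files match. Device-tree node names contain no spaces,
    # so plain word splitting of the file list is acceptable here.
    for f in $(find "$root" -name compatible -exec grep -l 'arm,gic-v3' {} + 2>/dev/null); do
        found=1
        node=$(dirname "$f")
        if [ ! -e "$node/interrupts" ]; then
            echo "$node"
            missing=1
        fi
    done
    [ "$found" -eq 1 ] || return 2
    return "$missing"
}

# Usage on the board:
#   gic_nodes_missing_irq && echo "GIC maintenance interrupt is declared"
```

Passing a directory as the first argument makes it possible to try the helper against a copy of a device tree instead of the live one.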

We build the kernel using the following command:

Linux_for_Tegra/source/public# ./nvbuild.sh -o kernel_out_updated

Strictly speaking, only the resulting device tree blob is needed:

kernel_out_updated/arch/arm64/boot/dts/nvidia/tegra234-p3701-0000-p3737-0000.dtb

We copy this file to the board at the following directory:

/boot/dtb

and tweak the boot loader config to load the updated device-tree file, instead of the default one:

--- /boot/extlinux/extlinux.conf.orig	2023-02-13 18:41:26.208771762 +0000
+++ /boot/extlinux/extlinux.conf	2023-02-13 18:41:37.452854874 +0000
@@ -1,6 +1,6 @@
 LABEL primary
       MENU LABEL primary kernel
       LINUX /boot/Image
-      FDT /boot/dtb/kernel_tegra234-p3701-0000-p3737-0000.dtb
+      FDT /boot/dtb/tegra234-p3701-0000-p3737-0000.dtb
       INITRD /boot/initrd
       APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200 console=ttyAMA0,115200 console=tty0 firmware_class.path=/etc/firmware fbcon=map:0 net.ifnames=0
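The same edit can be scripted, which helps when more than one board needs it. This is a rough sketch rather than the way we did it: set_fdt is a hypothetical helper, it rewrites every FDT line in the file (so a config with several boot labels would have all of them changed), and it keeps the previous version next to the config as a .orig file, mirroring the diff above.

```shell
# set_fdt CONF DTB   (hypothetical helper, not part of any NVIDIA tooling)
# Point every FDT line of an extlinux-style config at DTB and keep the
# previous version as CONF.orig. DTB must not contain '|', '&' or '\',
# which are special in the sed expression below.
set_fdt() {
    local conf dtb
    conf=$1
    dtb=$2
    [ -f "$conf" ] || { echo "no such config: $conf" >&2; return 1; }
    # Keep a copy of the untouched file first.
    cp -p "$conf" "$conf.orig" || return 1
    # Rewrite only the path, preserving the indentation and the FDT keyword.
    sed -E "s|^([[:space:]]*FDT[[:space:]]+).*|\\1${dtb}|" "$conf.orig" > "$conf"
}

# Usage (as root on the board):
#   set_fdt /boot/extlinux/extlinux.conf /boot/dtb/tegra234-p3701-0000-p3737-0000.dtb
```

If the board fails to boot with the new blob, restoring the .orig file from a recovery environment brings back the stock setup.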

And now we’re ready for reboot! Assuming all went well, you’ll end up with the following in the kernel boot logs:

# dmesg |grep -i kvm
[    3.048360] kvm [1]: IPA Size Limit: 48 bits
[    3.052646] kvm [1]: GICv3: no GICV resource entry
[    3.057437] kvm [1]: disabling GICv2 emulation
[    3.061891] kvm [1]: GIC system register CPU interface enabled
[    3.067830] kvm [1]: vgic interrupt IRQ9
[    3.071899] kvm [1]: VHE mode initialized successfully
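The difference from the stock kernel's log is the set of GIC lines, in particular the "vgic interrupt" one. A small filter can tell the two situations apart. This is a sketch under the assumption that the message formats match the logs shown in this post; kvm_gic_status is a hypothetical name.

```shell
# kvm_gic_status   (hypothetical helper)
# Read a kernel log on stdin and report how far KVM got:
#   "vgic-ready"  (exit 0): KVM reported its vgic interrupt
#   "kvm-no-vgic" (exit 1): KVM lines are present, but no vgic interrupt line
#   "no-kvm"      (exit 2): no KVM lines at all
kvm_gic_status() {
    local log
    log=$(cat)
    if ! printf '%s\n' "$log" | grep -q 'kvm \[[0-9]*]:'; then
        echo "no-kvm"
        return 2
    fi
    if printf '%s\n' "$log" | grep -q 'kvm \[[0-9]*]: vgic interrupt IRQ[0-9]'; then
        echo "vgic-ready"
        return 0
    fi
    echo "kvm-no-vgic"
    return 1
}

# Usage on the board:
#   dmesg | kvm_gic_status
```

Fed with the stock kernel's log from the beginning of this post, the filter reports kvm-no-vgic. With the log above, it reports vgic-ready.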

And if we try spawning a firecracker VM as above, we get the following:

# ./bin/firecracker --config-file config_vsock.json --api-sock fc.sock
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]
[    0.000000] Linux version 5.10.0 (runner@gh-cloud-pod-ckp6h) (gcc (Ubuntu/Linaro 8.4.0-1ubuntu1~18.04) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #1 SMP Wed May 4 06:10:52 UTC 2022
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] earlycon: uart0 at MMIO 0x0000000040003000 (options '')
[    0.000000] printk: bootconsole [uart0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x17f6d7900-0x17f6f8fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000]   DMA32    [mem 0x00000000c0000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000017fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] On node 0 totalpages: 1048576
[    0.000000]   DMA zone: 4096 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 262144 pages, LIFO batch:63
[    0.000000]   DMA32 zone: 4096 pages used for memmap
[    0.000000]   DMA32 zone: 262144 pages, LIFO batch:63
[    0.000000]   Normal zone: 8192 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.1
[    0.000000] percpu: Embedded 22 pages/cpu s49944 r8192 d31976 u90112
[    0.000000] pcpu-alloc: s49944 r8192 d31976 u90112 alloc=22*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: detected: Hardware dirty bit management
[    0.000000] CPU features: detected: Spectre-v4
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: console=ttyS0 reboot=k panic=1 pci=off loglevel=8 root=/dev/vda ip=172.42.0.2::172.42.0.1:255.255.255.0::eth0:off random.trust_cpu=on root=/dev/vda rw earlycon=uart,mmio,0x40003000
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000bbfff000-0x00000000bffff000] (64MB)
[    0.000000] Memory: 4024740K/4194304K available (8064K kernel code, 7700K rwdata, 2060K rodata, 1408K init, 3005K bss, 169564K reserved, 0K cma-reserved)
[    0.000000] random: get_random_u64 called from __kmem_cache_create+0x2c/0x4a0 with crng_init=0
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=2.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: 96 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] GICv3: Distributor has no Range Selector support
[    0.000000] GICv3: 16 PPIs implemented
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000003ffb0000
[    0.000000] arch_timer: cp15 timer(s) running at 31.25MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xe6a171046, max_idle_ns: 881590405314 ns
[    0.000003] sched_clock: 56 bits at 31MHz, resolution 32ns, wraps every 4398046511088ns
[    0.001027] Console: colour dummy device 80x25
[    0.001600] Calibrating delay loop (skipped), value calculated using timer frequency.. 62.50 BogoMIPS (lpj=125000)
[    0.003003] pid_max: default: 32768 minimum: 301
[    0.003621] LSM: Security Framework initializing
[    0.004239] SELinux:  Initializing.
[    0.004705] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.005642] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.007399] rcu: Hierarchical SRCU implementation.
[    0.008182] EFI services will not be available.
[    0.008825] smp: Bringing up secondary CPUs ...
[    0.016089] Detected PIPT I-cache on CPU1
[    0.016135] GICv3: CPU1: found redistributor 1 region 0:0x000000003ffd0000
[    0.016251] CPU1: Booted secondary processor 0x0000000001 [0x410fd421]
[    0.016842] smp: Brought up 1 node, 2 CPUs
[    0.019483] SMP: Total of 2 processors activated.
[    0.020055] CPU features: detected: Privileged Access Never
[    0.020721] CPU features: detected: LSE atomic instructions
[    0.021326] CPU features: detected: User Access Override
[    0.021913] CPU features: detected: 32-bit EL0 Support
[    0.022464] CPU features: detected: Common not Private translations
[    0.023120] CPU features: detected: RAS Extension Support
[    0.023723] CPU features: detected: Data cache clean to the PoU not required for I/D coherence
[    0.024694] CPU features: detected: CRC32 instructions
[    0.025269] CPU features: detected: Speculative Store Bypassing Safe (SSBS)
[    0.051711] CPU: All CPU(s) started at EL1
[    0.052455] alternatives: patching kernel code
[    0.054839] devtmpfs: initialized
[    0.056234] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.057936] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.059329] DMI not present or invalid.
[    0.060220] NET: Registered protocol family 16
[    0.062118] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.064139] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.066196] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.067848] audit: initializing netlink subsys (disabled)
[    0.069643] audit: type=2000 audit(0.068:1): state=initialized audit_enabled=0 res=1
[    0.069818] thermal_sys: Registered thermal governor 'fair_share'
[    0.071171] thermal_sys: Registered thermal governor 'step_wise'
[    0.071923] thermal_sys: Registered thermal governor 'user_space'
[    0.072795] cpuidle: using governor ladder
[    0.074720] cpuidle: using governor menu
[    0.075499] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.076945] ASID allocator initialised with 65536 entries
[    0.085849] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.087077] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[    0.088149] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.089190] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[    0.093766] iommu: Default domain type: Translated
[    0.094597] SCSI subsystem initialized
[    0.095080] pps_core: LinuxPPS API ver. 1 registered
[    0.095636] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.096684] PTP clock support registered
[    0.097441] NetLabel: Initializing
[    0.097830] NetLabel:  domain hash size = 128
[    0.098321] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.098967] NetLabel:  unlabeled traffic allowed by default
[    0.099750] clocksource: Switched to clocksource arch_sys_counter
[    0.100568] VFS: Disk quotas dquot_6.6.0
[    0.101037] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.101952] FS-Cache: Loaded
[    0.102771] CacheFiles: Loaded
[    0.104955] NET: Registered protocol family 2
[    0.105781] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.107212] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.108659] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.110424] TCP: Hash tables configured (established 32768 bind 32768)
[    0.111486] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.112545] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.113614] NET: Registered protocol family 1
[    0.114900] Initialise system trusted keyrings
[    0.115467] Key type blacklist registered
[    0.116112] workingset: timestamp_bits=36 max_order=20 bucket_order=0
[    0.118568] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.132869] Key type asymmetric registered
[    0.133466] Asymmetric key parser 'x509' registered
[    0.134217] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.136154] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[    0.137435] printk: console [ttyS0] disabled
[    0.138118] 40003000.uart: ttyS0 at MMIO 0x40003000 (irq = 14, base_baud = 1500000) is a 16550A
[    0.139487] printk: console [ttyS0] enabled
[    0.139487] printk: console [ttyS0] enabled
[    0.140875] printk: bootconsole [uart0] disabled
[    0.140875] printk: bootconsole [uart0] disabled
[    0.142604] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    0.146080] loop: module loaded
[    0.146816] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[    0.147764] vda: detected capacity change from 0 to 1073741824
[    0.149279] Loading iSCSI transport class v2.0-870.
[    0.150525] iscsi: registered transport (tcp)
[    0.151107] tun: Universal TUN/TAP device driver, 1.6
[    0.152381] rtc-pl031 40004000.rtc: designer ID = 0x41
[    0.153012] rtc-pl031 40004000.rtc: revision = 0x1
[    0.153898] rtc-pl031 40004000.rtc: char device (254:0)
[    0.154530] rtc-pl031 40004000.rtc: registered as rtc0
[    0.155158] rtc-pl031 40004000.rtc: setting system clock to 2023-02-13T19:08:49 UTC (1676315329)
[    0.156324] hid: raw HID events driver (C) Jiri Kosina
[    0.157485] Initializing XFRM netlink socket
[    0.158395] NET: Registered protocol family 10
[    0.160444] Segment Routing with IPv6
[    0.160971] NET: Registered protocol family 17
[    0.161565] Key type dns_resolver registered
[    0.162179] NET: Registered protocol family 40
[    0.163868] registered taskstats version 1
[    0.164554] Loading compiled-in X.509 certificates
[    0.166411] Loaded X.509 cert 'Build time autogenerated kernel key: 5c87d35d601eb9a30312d062d8891ae920951e61'
[    0.167750] Key type ._fscrypt registered
[    0.168431] Key type .fscrypt registered
[    0.169023] Key type fscrypt-provisioning registered
[    0.170251] Key type encrypted registered
2023-02-13T19:08:49.402741111 [anonymous-instance:ERROR:src/devices/src/virtio/net/device.rs:390] Failed to write to tap: Os { code: 5, kind: Other, message: "Input/output error" }
[    0.187729] IP-Config: Complete:
[    0.188127]      device=eth0, hwaddr=aa:fc:00:00:00:01, ipaddr=172.42.0.2, mask=255.255.255.0, gw=172.42.0.1
[    0.189265]      host=172.42.0.2, domain=, nis-domain=(none)
[    0.189908]      bootserver=255.255.255.255, rootserver=255.255.255.255, rootpath=
[    0.190773]
[    0.193054] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
[    0.194395] VFS: Mounted root (ext4 filesystem) on device 254:0.
[    0.195952] devtmpfs: mounted
[    0.197005] Freeing unused kernel memory: 1408K
[    0.215773] Run /sbin/init as init process
[    0.216663]   with arguments:
[    0.217281]     /sbin/init
[    0.217844]   with environment:
[    0.218496]     HOME=/
[    0.218989]     TERM=linux
[    0.219552]     pci=off
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.33:  No such file or directory
[    0.258127] systemd[1]: Failed to find module 'autofs4'
[    0.263505] systemd[1]: systemd 245.4-4ubuntu3.11 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    0.266780] systemd[1]: Detected architecture arm64.

Welcome to Ubuntu 20.04.2 LTS!

[snipped]

Initially, we got an RTC initialization error. If you hit the same issue, disable pl031_driver_init through the initcall_blacklist kernel command-line option:

--- config_vsock.json.orig	2023-02-13 19:17:43.750699285 +0000
+++ config_vsock.json	2023-02-13 19:17:57.154793334 +0000
@@ -1,7 +1,7 @@
 {
 	"boot-source": {
 		"kernel_image_path": "vmlinux",
-		"boot_args": "console=ttyS0 reboot=k panic=1 pci=off loglevel=8 root=/dev/vda ip=172.42.0.2::172.42.0.1:255.255.255.0::eth0:off random.trust_cpu=on"
+		"boot_args": "console=ttyS0 reboot=k panic=1 pci=off loglevel=8 root=/dev/vda ip=172.42.0.2::172.42.0.1:255.255.255.0::eth0:off random.trust_cpu=on initcall_blacklist=pl031_driver_init"
 	},
 	"drives": [
 		{
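If the boot arguments are assembled by a script rather than edited by hand, the extra option can be appended idempotently. A minimal sketch: add_boot_arg is a hypothetical helper, and the jq invocation in the comments assumes jq is available on the host, which this post does not otherwise rely on.

```shell
# add_boot_arg ARGS ARG   (hypothetical helper)
# Print the kernel command line ARGS with ARG appended, unless ARG is
# already present as a whole word.
add_boot_arg() {
    local args arg
    args=$1
    arg=$2
    case " $args " in
        *" $arg "*) printf '%s\n' "$args" ;;                  # already there
        *)          printf '%s\n' "${args:+$args }$arg" ;;    # append
    esac
}

# Usage (assumes jq is installed; writes a new file instead of editing in place):
#   old=$(jq -r '."boot-source".boot_args' config_vsock.json)
#   new=$(add_boot_arg "$old" initcall_blacklist=pl031_driver_init)
#   jq --arg a "$new" '."boot-source".boot_args = $a' config_vsock.json > config_vsock.json.new
```

Running the helper twice on the same command line leaves it unchanged the second time, so it is safe to call from a provisioning script.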

Future steps

Give us a shout at team@cloudkernels.net if you liked it! The plan is to use these boards to expose acceleration functionality to isolated workloads running in sandboxed containers, using kata-containers and vAccel. Take a sneak peek at what we are working on here and stay tuned for the next post, where we describe how to build and run a vAccel-enabled sandboxed container on a Jetson Orin!