Running applications that need hardware acceleration in public clouds remains a challenge, both for end-users and for service providers. The reasons for this are mostly related to the complicated hardware abstractions that acceleration devices expose, as well as to the complicated software stacks that drive these devices.
To hide the complexity of the underlying software stack and let end-users accelerate their applications, we have introduced vAccel, a hardware acceleration abstraction that semantically exposes accelerable functions to workloads running in VMs or even on remote hosts.
In this post, we present how to use vAccel to remotely execute basic FPGA operations on a PYNQ-Z1 board. First, we briefly describe the hardware and software components of this example, as well as the steps to reproduce the experiment. We install the vAccel software stack on the board, which runs a generic Linux distribution, and run a local example. Then, we run the same example remotely, using a client machine connected to the same network as our development board.
Overview
As mentioned above, we are using a PYNQ-Z1 development board. The PYNQ-Z1 board is the hardware platform for the PYNQ open-source framework. It features a Zynq-7000 (XC7Z020-1CLG400C) All Programmable System-On-Chip (APSoC), integrating a feature-rich dual-core Cortex-A9 based processing system (PS) and Xilinx programmable logic (PL) in a single device. Figure 1 shows an image of the PYNQ-Z1 development board by Digilent.
Apart from Petalinux, you can install a generic Linux distribution. We recently walked through the process of installing Debian on a PYNQ-Z1.
Install vAccel
We can use the binary release or build from source. For the sake of completeness we present both options.
Install from binaries
Get the deb package for the core vAccelRT library and install it:
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/master/aarch32/Release-deb/vaccel-0.5.0-Linux.deb
sudo dpkg -i vaccel-0.5.0-Linux.deb
We should be presented with a couple of libraries in /usr/local/lib as well as some example binaries in /usr/local/bin.
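To confirm that the package landed where the rest of this post expects it, a small check can help. The sketch below is a hypothetical helper, not part of vAccel. The file list is an assumption based on paths that appear elsewhere in this post (libvaccel.so, libvaccel-noop.so and the classify example), so adjust it to what your package actually ships.

```shell
# Hypothetical helper (not part of vAccel): check that the files this post
# relies on exist under a given prefix (default: /usr/local).
# Usage: vaccel_check_install [PREFIX]
vaccel_check_install() (
    prefix="${1:-/usr/local}"
    missing=0
    # Assumed file list; taken from paths used elsewhere in this post.
    for f in lib/libvaccel.so lib/libvaccel-noop.so bin/classify; do
        if [ ! -e "$prefix/$f" ]; then
            echo "missing: $prefix/$f" >&2
            missing=1
        fi
    done
    # The status of this last test becomes the function's exit status.
    [ "$missing" -eq 0 ]
)
```

Calling `vaccel_check_install` with no argument checks /usr/local and prints one line per missing file on standard error.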
Skip the next section and go directly to Test the installation.
Build from source
Clone the repo and prepare to build:
git clone https://github.com/cloudkernels/vaccelrt --recursive
cd vaccelrt
mkdir -p build && cd build
cmake ../ -DBUILD_PLUGIN_NOOP=ON -DBUILD_EXAMPLES=ON
To build and install, use the following command:
make install
Test the installation
To make sure we’ve got everything set up correctly, we can run a couple of examples. First, we can use the noop plugin to run image classification on an image.
# Set the path to the vAccel libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

# Set the plugin to noop
export VACCEL_BACKENDS=/usr/local/lib/libvaccel-noop.so

# Enable debug output
export VACCEL_DEBUG_LEVEL=4

# Run the classify example
/usr/local/bin/classify /usr/local/share/images/example.jpg 1
We should be presented with debug output and a dummy classification tag:
user@debian-fpga:~$ /usr/local/bin/classify /usr/local/share/images/example.jpg 1
2023.01.21-02:46:23.09 - <debug> Initializing vAccel
2023.01.21-02:46:23.09 - <debug> Created top-level rundir: /run/user/1001/vaccel.ogcPti
2023.01.21-02:46:23.09 - <debug> Registered plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function noop from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function sgemm from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function image classification from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function image detection from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function image segmentation from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function image pose estimation from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function image depth estimation from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function exec from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function TensorFlow session load from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function TensorFlow session run from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function TensorFlow session delete from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function MinMax from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function Array copy from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function Vector Add from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function Parallel acceleration from plugin noop
2023.01.21-02:46:23.09 - <debug> Registered function Matrix multiplication from plugin noop
2023.01.21-02:46:23.09 - <debug> Loaded plugin noop from /usr/local/lib/libvaccel-noop.so
2023.01.21-02:46:23.09 - <debug> session:1 New session
Initialized session with id: 1
Image size: 79281B
2023.01.21-02:46:23.11 - <debug> session:1 Looking for plugin implementing image classification
2023.01.21-02:46:23.11 - <debug> Found implementation in noop plugin
[noop] Calling Image classification for session 1
[noop] Dumping arguments for Image classification:
[noop] len_img: 79281
[noop] will return a dummy result
classification tags: This is a dummy classification tag!
2023.01.21-02:46:23.11 - <debug> session:1 Free session
2023.01.21-02:46:23.11 - <debug> Shutting down vAccel
2023.01.21-02:46:23.11 - <debug> Cleaning up plugins
2023.01.21-02:46:23.11 - <debug> Unregistered plugin noop
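Since the same variables are exported for every run in this post, it can be convenient to scope them to a single command instead of the whole shell. The following is a minimal sketch of such a wrapper. The name with_vaccel_plugin is hypothetical (it is not a vAccel tool), and the default debug level of 4 simply mirrors the value used above.

```shell
# Hypothetical wrapper: run one command with the vAccel environment used in
# this post, without leaking the variables into the calling shell.
# Usage: with_vaccel_plugin /path/to/plugin.so command [args...]
with_vaccel_plugin() (
    plugin="$1"
    shift
    # Append /usr/local/lib; the ${VAR:+...} form avoids a leading ":" (an
    # empty entry) when LD_LIBRARY_PATH was previously unset.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/lib"
    export VACCEL_BACKENDS="$plugin"
    export VACCEL_DEBUG_LEVEL="${VACCEL_DEBUG_LEVEL:-4}"
    "$@"
)

# Example (not executed here):
#   with_vaccel_plugin /usr/local/lib/libvaccel-noop.so \
#       /usr/local/bin/classify /usr/local/share/images/example.jpg 1
```

Because the function body runs in a subshell, switching plugins between runs only requires changing the first argument.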
Run local example
An example more tailored to the board we’re running on could be a vector operation, such as a vector addition (as in Ichiro’s example).
We already have a pre-compiled example for a vector addition in pynq_vector_add_generic. Let’s try to execute it:
/usr/local/bin/pynq_vector_add_generic
The output is similar to the above. Since we’re using the noop plugin, the result is a dummy result, only for debugging.
user@debian-fpga:~$ /usr/local/bin/pynq_vector_add_generic
2023.01.21-02:49:17.02 - <debug> Initializing vAccel
2023.01.21-02:49:17.02 - <debug> Created top-level rundir: /run/user/1001/vaccel.epcZBL
2023.01.21-02:49:17.02 - <debug> Registered plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function noop from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function sgemm from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function image classification from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function image detection from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function image segmentation from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function image pose estimation from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function image depth estimation from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function exec from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function TensorFlow session load from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function TensorFlow session run from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function TensorFlow session delete from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function MinMax from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function Array copy from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function Vector Add from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function Parallel acceleration from plugin noop
2023.01.21-02:49:17.02 - <debug> Registered function Matrix multiplication from plugin noop
2023.01.21-02:49:17.02 - <debug> Loaded plugin noop from /usr/local/lib/libvaccel-noop.so
2023.01.21-02:49:17.02 - <debug> session:1 New session
Initialized session with id: 1
2023.01.21-02:49:17.02 - <debug> session:1 Looking for plugin implementing fpga_vector_add operation
2023.01.21-02:49:17.02 - <debug> Found implementation in noop plugin
[noop] Calling v_vectoradd for session 1
[noop] Dumping arguments for v_vectoradd:
[noop] len_a: 5 len_b: 5
9.100000
9.100000
9.100000
9.100000
9.100000
2023.01.21-02:49:17.02 - <debug> session:1 Free session
2023.01.21-02:49:17.02 - <debug> Shutting down vAccel
2023.01.21-02:49:17.02 - <debug> Cleaning up plugins
2023.01.21-02:49:17.02 - <debug> Unregistered plugin noop
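With debugging enabled, the program's own output is buried in log lines. One way to see only the results is to filter the logs out after the fact. The sketch below assumes the log format shown above (lines containing `<debug>` and lines starting with `[noop]`); vaccel_strip_logs is a hypothetical name, and whether the logs go to standard output or standard error is not something this post establishes, hence the `2>&1` in the usage comment.

```shell
# Hypothetical filter: keep only the program's own output by dropping the
# vAccel "<debug>" log lines and the noop plugin's "[noop]" lines.
vaccel_strip_logs() {
    grep -v -e '<debug>' -e '^\[noop]'
}

# Example (not executed here):
#   /usr/local/bin/pynq_vector_add_generic 2>&1 | vaccel_strip_logs
```

Applied to the run above, this would leave the session line and the five result values.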
Get the PYNQ hardware plugin
Now that we have established that vAccel is working correctly on the board, we can use the hardware plugin built for PYNQ. It implements three vector operations (vector_add, array_copy and mmult).
To get it, we grab the deb from the binaries page of vAccel:
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/plugins/pynq/main/aarch32/Release-deb/vaccelrt-plugin-pynq-0.1-Linux.deb
dpkg -i vaccelrt-plugin-pynq-0.1-Linux.deb
Once installed, it should place a shared object under /usr/local/lib (here, in the arm-linux-gnueabihf subdirectory):
$ ls -la /usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so
-rw-r--r-- 1 root root 13144 Dec 25 03:41 /usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so
We use this shared object as the vAccel plugin and re-run the pynq_vector_add_generic program:
# Set the path to the vAccel libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

# Set the plugin to PYNQ
export VACCEL_BACKENDS=/usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so

# Enable debug output
export VACCEL_DEBUG_LEVEL=4

# Run the vector add example
/usr/local/bin/pynq_vector_add_generic
The output should be like below:
2023.01.21-02:56:07.30 - <debug> Initializing vAccel
2023.01.21-02:56:07.30 - <debug> Created top-level rundir: /run/user/0/vaccel.kPHyje
2023.01.21-02:56:07.31 - <debug> Registered plugin fpga_functions
2023.01.21-02:56:07.31 - <debug> Registered function Array copy from plugin fpga_functions
2023.01.21-02:56:07.31 - <debug> Registered function Vector Add from plugin fpga_functions
2023.01.21-02:56:07.31 - <debug> Registered function Parallel acceleration from plugin fpga_functions
2023.01.21-02:56:07.31 - <debug> Registered function Matrix multiplication from plugin fpga_functions
2023.01.21-02:56:07.31 - <debug> Loaded plugin fpga_functions from /usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so
2023.01.21-02:56:07.31 - <debug> session:1 New session
Initialized session with id: 1
2023.01.21-02:56:07.31 - <debug> session:1 Looking for plugin implementing fpga_vector_add operation
2023.01.21-02:56:07.31 - <debug> Found implementation in fpga_functions plugin
Calling Vector Add function (FPGA) 1
2.800000
2.100000
8.500000
3.500000
11.299999
2023.01.21-02:56:07.31 - <debug> session:1 Free session
2023.01.21-02:56:07.31 - <debug> Shutting down vAccel
2023.01.21-02:56:07.31 - <debug> Cleaning up plugins
2023.01.21-02:56:07.31 - <debug> Unregistered plugin fpga_functions
As we can see, it actually performed the addition on the two vectors. See the relevant snippet from the code:
[...]
    float a[5] = { 5.0, 1.0, 2.1, 1.2, 5.2 };
    float b[5] = { -2.2, 1.1, 6.4, 2.3, 6.1 };
[...]
    ret = vaccel_fpga_vadd(&sess, a, b, c, len_a, len_b);
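The printed values can be checked against the element-wise sums of a and b worked out by hand (2.8, 2.1, 8.5, 3.5, 11.3). The last value prints as 11.299999 rather than 11.300000, which is consistent with single-precision (float) rounding, so an exact string comparison would be too strict. The sketch below compares with a small tolerance; approx_equal and the tolerance of 0.00001 are illustrative choices, not part of vAccel.

```shell
# Hypothetical check: succeed when two numbers differ by at most a tolerance.
# Usage: approx_equal ACTUAL EXPECTED [TOLERANCE]
approx_equal() {
    awk -v a="$1" -v b="$2" -v tol="${3:-0.00001}" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= tol) }'
}

# a[i] + b[i] for the vectors in the snippet above, computed by hand.
expected="2.8 2.1 8.5 3.5 11.3"
# Values printed by the run with the PYNQ plugin.
actual="2.800000 2.100000 8.500000 3.500000 11.299999"

# Walk both lists in step: positional parameters hold the actual values.
set -- $actual
for e in $expected; do
    if approx_equal "$1" "$e"; then
        echo "ok   $1 ~ $e"
    else
        echo "FAIL $1 vs $e"
    fi
    shift
done
```

The same comparison makes it easy to tell the hardware run apart from the noop run, whose dummy value of 9.100000 fails against every expected sum.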
Run remote example
To be able to run the above example remotely, we need two things:
- run the vaccelrt-agent as a vAccel application locally (on the board)
- run the pynq_vector_add_generic program on the remote host, using the relevant plugin that enables remote execution (vsock).
Run the vAccelRT Agent
The vAccelRT agent is essentially a vAccel application that on one side consumes the vAccel API, and on the other side listens for RPC requests from remote hosts (its startup log below reports a ttRPC server). To get it, use the following commands:
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/agent/59704ec358de8f68345556a774c60788ac957183/aarch32/release/vaccelrt-agent
chmod +x vaccelrt-agent
To expose the above functionality, we use the exact same environment variables, only this time we run the agent, not the pynq_vector_add_generic program:
# Set the path to the vAccel libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

# Set the plugin to PYNQ
export VACCEL_BACKENDS=/usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so

# Enable debug output
export VACCEL_DEBUG_LEVEL=4

# Set the local endpoint
export VACCEL_AGENT_ENDPOINT=tcp://0.0.0.0:8192

# Run the vAccelRT agent
./vaccelrt-agent -a $VACCEL_AGENT_ENDPOINT
The output should be something like the following:
2023.01.21-03:03:34.06 - <debug> Initializing vAccel
2023.01.21-03:03:34.06 - <debug> Created top-level rundir: /run/user/0/vaccel.apxqms
2023.01.21-03:03:34.06 - <debug> Registered plugin fpga_functions
2023.01.21-03:03:34.06 - <debug> Registered function Array copy from plugin fpga_functions
2023.01.21-03:03:34.06 - <debug> Registered function Vector Add from plugin fpga_functions
2023.01.21-03:03:34.06 - <debug> Registered function Parallel acceleration from plugin fpga_functions
2023.01.21-03:03:34.06 - <debug> Registered function Matrix multiplication from plugin fpga_functions
2023.01.21-03:03:34.06 - <debug> Loaded plugin fpga_functions from /usr/local/lib/arm-linux-gnueabihf/libvaccel-pynq.so
vaccel ttRPC server started. address: tcp://0.0.0.0:8192
Server is running, press Ctrl + C to exit
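Before moving to the client, it can be worth checking that the agent's port is reachable over the network. The sketch below splits a tcp://HOST:PORT endpoint (the format used for VACCEL_AGENT_ENDPOINT above) into its parts and probes it. All three function names are hypothetical, and the probe assumes a netcat (`nc`) that supports `-z` and `-w`, which is common but not universal.

```shell
# Hypothetical helpers: split a "tcp://HOST:PORT" endpoint with POSIX
# parameter expansion.
endpoint_host() {
    hostport="${1#tcp://}"      # drop the scheme
    echo "${hostport%:*}"       # drop the trailing ":PORT"
}

endpoint_port() {
    echo "${1##*:}"             # keep what follows the last ":"
}

# Assumes an "nc" that supports -z (connect without sending data) and
# -w (timeout in seconds). Succeeds when the TCP port accepts connections.
agent_reachable() {
    nc -z -w 2 "$(endpoint_host "$1")" "$(endpoint_port "$1")"
}

# Example (not executed here), run from the client machine:
#   agent_reachable tcp://192.168.4.21:8192 && echo "agent is reachable"
```

Note that the agent binds to 0.0.0.0, meaning all interfaces, while a client has to use the board's actual address on the shared network.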
Run the application on the remote host
On the remote host, depending on the architecture and variant, we need to set up vAccelRT and the relevant plugin that enables remote execution: vaccelrt-plugin-vsock.
Let’s assume it’s an x86_64 host. The commands needed to set up vAccel are the following:
# Get & install the vAccelRT core library
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/master/x86_64/Release-deb/vaccel-0.5.0-Linux.deb
dpkg -i vaccel-0.5.0-Linux.deb

# Get & install the vSock plugin
wget https://s3.nbfc.io/nbfc-assets/github/vaccelrt/plugins/vsock/master/x86_64/Release-deb/vaccelrt-plugin-vsock-0.1.0-Linux.deb
dpkg -i vaccelrt-plugin-vsock-0.1.0-Linux.deb
Now we should have the following in /usr/local/lib:
$ tree /usr/local/lib/
/usr/local/lib/
├── libmytestlib.so
├── libvaccel-noop.so
├── libvaccel-python.so
├── libvaccel.so
├── libvaccel-vsock.so

0 directories, 5 files
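The core package URLs used for the board and for the x86_64 host differ only in the architecture directory. A small helper can make that explicit when scripting the setup for both machines. This is a sketch under the assumption that the URL layout stays as shown in this post; vaccel_core_deb_url is a hypothetical name, and only the two architecture directories that appear here (aarch32 and x86_64) are accepted, since others are not confirmed.

```shell
# Hypothetical helper: build the core-package URL for an architecture
# directory. Only "aarch32" and "x86_64" appear in this post; any other
# value is rejected rather than guessed.
vaccel_core_deb_url() {
    case "$1" in
        aarch32|x86_64)
            echo "https://s3.nbfc.io/nbfc-assets/github/vaccelrt/master/$1/Release-deb/vaccel-0.5.0-Linux.deb"
            ;;
        *)
            echo "unsupported architecture directory: $1" >&2
            return 1
            ;;
    esac
}

# Example (not executed here):
#   wget "$(vaccel_core_deb_url x86_64)"
```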
As before, the process to execute the vAccel application is the same; the only difference is that we need to point the plugin to the IP address and port where the vAccelRT agent listens:
# Set the path to the vAccel libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

# Enable debug output
export VACCEL_DEBUG_LEVEL=4

# Set the plugin to VSOCK
export VACCEL_BACKENDS=/usr/local/lib/libvaccel-vsock.so

# Set the IP & port
export VACCEL_VSOCK=tcp://192.168.4.21:8192

# Run the program
/usr/local/bin/pynq_vector_add_generic
The output should be something similar to this:
$ /usr/local/bin/pynq_vector_add_generic
2023.01.20-20:16:25.18 - <debug> Initializing vAccel
2023.01.20-20:16:25.18 - <debug> Created top-level rundir: /run/user/0/vaccel.X1hPel
2023.01.20-20:16:25.20 - <debug> Registered plugin vsock
2023.01.20-20:16:25.20 - <debug> vsock is a VirtIO module
2023.01.20-20:16:25.20 - <debug> Registered function sgemm from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function image classification from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function image detection from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function image segmentation from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function image depth estimation from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function image pose estimation from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function TensorFlow session load from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function TensorFlow session delete from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function TensorFlow session run from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function MinMax from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function Array copy from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function Matrix multiplication from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function Vector Add from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function Parallel acceleration from plugin vsock
2023.01.20-20:16:25.20 - <debug> Registered function exec from plugin vsock
2023.01.20-20:16:25.20 - <debug> Loaded plugin vsock from /usr/local/lib/libvaccel-vsock.so
2023.01.20-20:16:25.21 - <debug> [vsock] Initializing session
2023.01.20-20:16:25.21 - <debug> [vsock] New session 1
2023.01.20-20:16:25.21 - <debug> session:1 New session
Initialized session with id: 1
2023.01.20-20:16:25.21 - <debug> session:1 Looking for plugin implementing fpga_vector_add operation
2023.01.20-20:16:25.21 - <debug> Found implementation in vsock plugin
2.800000
2.100000
8.500000
3.500000
11.299999
2023.01.20-20:16:25.22 - <debug> [vsock] Destroying session 1
2023.01.20-20:16:25.22 - <debug> [vsock] Destroying vsock client
2023.01.20-20:16:25.22 - <debug> session:1 Free session
2023.01.20-20:16:25.22 - <debug> Shutting down vAccel
2023.01.20-20:16:25.22 - <debug> Cleaning up plugins
2023.01.20-20:16:25.22 - <debug> Unregistered plugin vsock
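The five result values are the same as in the local run on the board, which is the point of the exercise. If both runs are saved to files, that can be confirmed mechanically. The sketch below is a hypothetical check that assumes the results are printed one per line as bare decimal numbers, as in the logs above; the file names in the usage comment are placeholders.

```shell
# Hypothetical check: extract the bare numeric result lines from a saved
# run (timestamps and log lines do not match the whole-line pattern).
numeric_lines() {
    grep -E '^-?[0-9]+\.[0-9]+$' "$1"
}

# Succeed when two saved runs contain identical result lines.
same_results() {
    [ "$(numeric_lines "$1")" = "$(numeric_lines "$2")" ]
}

# Example (not executed here), after copying the board's log to the client:
#   /usr/local/bin/pynq_vector_add_generic > remote.log 2>&1
#   same_results local.log remote.log && echo "results match"
```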
That’s it! We managed to use a PYNQ-Z1 board to run simple operations on the FPGA fabric from a remote host. Moreover, the remote host (x86_64) has a different architecture from the PYNQ board (32-bit Arm).
Future steps
As we are far from being experts in hardware design, we plan to build a more elaborate example for the FPGA board (e.g. an image inference accelerator using Tensil) and use it as a backend for vAccel’s image inference API ;)
Give us a shout at team@cloudkernels.net if you liked it, or visit the vAccel website and drop us a note at vaccel@nubificus.co.uk for more info!