TVM
We use TVM to compile our neural networks to run on the GPU of the robot. The networks are compiled to Vulkan shaders, which run on many different GPUs, including non-Nvidia GPUs that don’t support CUDA. This results in a significant speedup compared to running the networks on the CPU.
Installation on the robot
Install helpful Vulkan tools
sudo apt install vulkan-tools
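These tools can be used to verify that a Vulkan-capable GPU and driver are present, e.g. with (the --summary flag may not exist in older versions of vulkaninfo)
vulkaninfo --summary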
Add your user to the groups render and video:
sudo usermod -a -G render,video $USER
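The group change only takes effect after logging out and back in (or rebooting). Afterwards you can verify the membership with
groups $USER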
Install the TVM Python package (from the root of the TVM source tree) with
pip install ./python
Install additional deps
pip install onnx onnxsim xgboost
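To sanity-check the installation, a small Python snippet (a sketch, assuming your TVM build includes the Vulkan runtime) can confirm that TVM sees the GPU:

```python
import tvm

# If TVM was built with the Vulkan runtime, device 0 should exist.
dev = tvm.vulkan(0)
print("Vulkan device available:", dev.exist)
```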
Compile a YOEO model directly
If you use YOEO, you can export the model to ONNX with the yoeo-to-onnx command. See yoeo-to-onnx --help for more information.
You can then compile the ONNX model with TVM using the yoeo-onnx-to-tvm command. See yoeo-onnx-to-tvm --help for more information.
By doing so you can skip most of the steps below that use the tvmc CLI.
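Whichever way you produce the ONNX file, it is worth validating it before compiling; a minimal sketch using the onnx package (the file name is a placeholder):

```python
import onnx

# Load the exported model and run ONNX's structural validity checks.
model = onnx.load("yoeo.onnx")  # placeholder file name
onnx.checker.check_model(model)
print("ONNX model passed the checker")
```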
How to optimize a model
Run the following command if you are annoyed by deprecation warnings
export PYTHONWARNINGS="ignore"
Use tvmc (located in ~/.local/bin, so check that it is in the PATH!) for all simple TVM-related tasks. If it is not in the PATH, you can add it with
export PATH=$PATH:~/.local/bin
You can add this line to your .bashrc (if you use bash) or .zshrc (if you use zsh) to make it permanent.
Simplify your ONNX model using
onnxsim <input_model> <output_model>
This step is needed because exported ONNX graphs can be quite convoluted and TVM does not support all ONNX operations.
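If you prefer to do this from Python instead of the CLI, a minimal sketch using onnxsim's Python API (file names are placeholders):

```python
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")  # placeholder input file

# simplify() returns the simplified model and a flag indicating
# whether its outputs still match the original model.
model_simplified, check_ok = simplify(model)
assert check_ok, "Simplified model does not match the original"

onnx.save(model_simplified, "model_simplified.onnx")  # placeholder output file
```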
Tune your simplified ONNX model using
tvmc tune --target "vulkan" --target-vulkan-from_device 0 --enable-autoscheduler --output <optimization_run_name>.json --repeat 5 --number 50 <onnx_file>
Only do this on the “NUCs” because the model will be optimized for the specific hardware.
If you only optimize for the CPU, select the target “llvm” and possibly narrow it down further by setting LLVM-related options, but we focus on Vulkan (GPU) optimization for the rest of this guide.
Remember to replace the placeholders in the command.
Check with radeontop if the GPU is utilized. The optimization might take hours or even days.
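The tuning log is written incrementally, with one JSON record per measured trial (assuming the auto-scheduler record format), so you can get a rough feeling for the progress by counting lines:
wc -l <optimization_run_name>.json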
Compile the model using the optimizations in the json file from the previous tuning. To do this run
tvmc compile --target "vulkan" --target-vulkan-from_device 0 --output <output_name>.tar --tuning-records <optimization_run_name>.json <onnx_file>
You also want to do this on the “NUC” because the model will be optimized for the specific hardware.
This should not take hours.
Remember to replace all placeholders in the command.
You will hopefully end up with a .tar file containing the following items:
- mod.so
- mod.params
- mod.json
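You can check the archive contents without unpacking it, for example with Python's standard tarfile module (the file name is a placeholder):

```python
import tarfile

# List the files inside the compiled model archive.
with tarfile.open("compiled_model.tar") as tar:
    print(tar.getnames())  # expect mod.so, mod.params and mod.json
```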
Run your compiled model
To test the model, run the following command
tvmc run <model_name>.tar --profile --print-time --device "vulkan" --repeat 100
The command shows you the profiling of each layer. Check that they all run on vulkan0. At the bottom, a timing benchmark is printed.
Run the model using the Python API
Extract the tar using tar -xf <your_file>.
The following code snippet shows how to run a YOEO model using the Python API.
Normally, the bitbots_vision package is used to run the compiled model.
```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# Load model
binary_lib = tvm.runtime.load_module("mod.so")
loaded_params = bytearray(open("mod.params", "rb").read())
loaded_json = open("mod.json").read()

# Create model module
module = graph_executor.create(loaded_json, binary_lib, tvm.vulkan(0))  # Replace with tvm.cpu(0) for CPU models
module.load_params(loaded_params)

# Create dummy data
input_data = np.random.uniform(size=(1, 3, 416, 416)).astype("float32")

# Run the network
module.set_input(InputLayer=input_data)
module.run()
yolo_detections, segmentation = module.get_output(0).numpy(), module.get_output(1).numpy()
print(yolo_detections.shape, segmentation.shape)
```