TensorRT-LightNet: High-Efficiency and Real-Time CNN Implementation on Edge AI

trt-lightnet is a CNN implementation optimized for edge AI devices that combines the advantages of LightNet [1] and TensorRT [2]. LightNet is a lightweight, high-performance neural network framework designed for edge devices, while TensorRT is a high-performance deep learning inference engine developed by NVIDIA for optimizing and running deep learning models on GPUs. trt-lightnet uses the Network Definition API provided by TensorRT to integrate LightNet into TensorRT, allowing it to run efficiently and in real time on edge devices. This is a reproduction of lightNet-TR [6], which generates a TensorRT engine from the ONNX format.

Key Improvements

2:4 Structured Sparsity

trt-lightnet utilizes 2:4 structured sparsity [3] to further optimize the network. 2:4 structured sparsity means that two values must be zero in each contiguous block of four values, resulting in a 50% reduction in the number of weights. This technique allows the network to use fewer weights and computations while maintaining accuracy.
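The pruning pattern can be illustrated with a short sketch. This is a conceptual illustration in NumPy, not the actual TensorRT implementation: for every contiguous block of four weights, the two smallest-magnitude values are zeroed, leaving exactly 50% of the weights.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the two smallest-magnitude values in each contiguous
    block of four weights, yielding a 2:4 structured-sparse tensor.
    Assumes the total number of elements is a multiple of four."""
    flat = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| per block of four.
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05],
              [-0.3, 0.8, 0.02, -0.6]])
sparse = prune_2_4(w)
print(sparse)  # each block of four now contains exactly two zeros
```

On sparsity-capable hardware (NVIDIA Ampere and later), this fixed 2:4 pattern is what allows the sparse tensor cores to skip the zeroed weights.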

[Figure: Sparsity]

NVDLA Execution

trt-lightnet also supports executing the neural network on the NVIDIA Deep Learning Accelerator (NVDLA) [4], a free and open architecture that provides high performance and low power consumption for deep learning inference on edge devices. By using NVDLA, trt-lightnet can further improve the efficiency and performance of the network on edge devices.

[Figure: NVDLA]

Multi-Precision Quantization

In addition to post-training quantization [5], trt-lightnet supports multi-precision quantization, which allows the network to use different precisions for weights and activations. By using mixed precision, trt-lightnet can further reduce the memory usage and computational requirements of the network while maintaining accuracy. The precision of each layer of the CNN can be set in the CFG file.
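As a rough illustration of what per-layer mixed precision means, the sketch below applies symmetric INT8 quantization only to layers that a precision table marks as "int8". The layer names and table layout are hypothetical (the real per-layer settings live in the CFG file), and this is not TensorRT's calibration path.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple:
    """Symmetric per-tensor INT8 quantization: the scale maps the
    maximum absolute value onto the int8 range [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical per-layer precision table, mimicking a CFG that assigns
# a precision to each layer (names are illustrative only).
layer_precision = {"conv1": "fp16", "conv2": "int8", "head": "fp32"}

w = np.linspace(-1.0, 1.0, 8)
for name, prec in layer_precision.items():
    if prec == "int8":
        q, scale = quantize_int8(w)
        print(name, q, f"scale={scale:.4f}")
    else:
        print(name, f"kept in {prec}")
```

Keeping sensitive layers (such as the first layer or the output head) in higher precision while quantizing the rest is the usual trade-off this mechanism enables.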

[Figure: Quantization]

Multitask Execution (Detection/Segmentation)

trt-lightnet also supports multitask execution, allowing the network to perform both object detection and segmentation tasks simultaneously. This enables the network to perform multiple tasks efficiently on edge devices, saving computational resources and power.
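Conceptually, multitask execution amortizes the backbone: the feature map is computed once and consumed by both the detection head and the segmentation head. The NumPy sketch below illustrates this sharing; the shapes and head designs are illustrative only, not the trt-lightnet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))

# Stand-in weights for one shared backbone and two task heads.
W_backbone = rng.standard_normal((3, 16))
W_det = rng.standard_normal((16, 5))   # e.g. box coords + objectness per cell
W_seg = rng.standard_normal((16, 4))   # e.g. 4 segmentation classes

# The expensive part runs once...
features = np.maximum(image @ W_backbone, 0.0)  # ReLU feature map (32, 32, 16)

# ...and both lightweight heads reuse it.
boxes = features @ W_det                        # detection output (32, 32, 5)
masks = (features @ W_seg).argmax(axis=-1)      # per-pixel class map (32, 32)

print(boxes.shape, masks.shape)
```

Running two separate networks would compute the backbone twice; sharing it is where the computational and power savings on edge devices come from.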

Installation

Requirements

For Local Installation

  • CUDA 11.0 or later

  • TensorRT 8.5 or 8.6

  • cnpy for debugging tensors

This repository has been tested with the following environments:

  • CUDA 11.7 + TensorRT 8.5.2 on Ubuntu 22.04

  • CUDA 12.2 + TensorRT 8.6.0 on Ubuntu 22.04

  • CUDA 11.4 + TensorRT 8.6.0 on Jetson JetPack5.1

  • CUDA 11.8 + TensorRT 8.6.1 on Ubuntu 22.04

For Docker Installation

  • Docker
  • NVIDIA Container Toolkit

This repository has been tested with the following environments:

  • Docker 24.0.7 + NVIDIA Container Toolkit 1.14.3 on Ubuntu 20.04

Steps for Local Installation

  1. Clone the repository.
$ git clone git@github.com:tier4/trt-lightnet.git
$ cd trt-lightnet
  2. Install libraries.
$ sudo apt update
$ sudo apt install libgflags-dev
$ sudo apt install libboost-all-dev
$ sudo apt install libopencv-dev

Install cnpy from the following repository.

https://github.com/rogersce/cnpy

  3. Compile the TensorRT implementation.
$ mkdir build && cd build
$ cmake ../
$ make -j

Steps for Docker Installation

  1. Clone the repository.
$ git clone git@github.com:tier4/trt-lightnet.git
$ cd trt-lightnet
  2. Build the docker image.
$ docker build -t trt-lightnet:latest .
  3. Run the docker container.
$ docker run -it --gpus all trt-lightnet:latest

Model

T.B.D.

Usage

Converting a LightNet model to a TensorRT engine

Build FP32 engine

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision fp32

Build FP16 (half-precision) engine

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision fp16

Build INT8 engine
(Prepare a list of calibration images in "models/calibration_images.txt".)

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision int8 --first true

The first layer is much more sensitive to quantization than the others. Therefore, it is left unquantized when "--first true" is specified.

Build DLA engine (supported only on Xavier and Orin)

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision int8 --first true --dla [0/1]

Inference with the TensorRT engine

Inference from images

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision [fp32/fp16/int8] --first true {--dla [0/1]} --d DIRECTORY

Inference from video

$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision [fp32/fp16/int8] --first true {--dla [0/1]} --v VIDEO

Implementation

trt-lightnet is built on the LightNet framework and integrates with TensorRT using the Network Definition API. The implementation is based on the repositories listed in the References below.

Conclusion

trt-lightnet is a powerful and efficient implementation of CNNs using Edge AI. With its advanced features and integration with TensorRT, it is an excellent choice for real-time object detection and semantic segmentation applications on edge devices.

References

[1] LightNet
[2] TensorRT
[3] Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
[4] NVDLA
[5] Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT
[6] lightNet-TR
