High-speed 8-bit 3x3 Convolutions and Linear Layers for x86 VNNI (256-bit vectors) and Arm v8.6-a instructions.
This repo provides fast and scalable kernels for quantized int8 convolutions and linear layers. I noticed that quantized PyTorch networks run well below the attainable throughput on modern Arm and x86 machines, and I wanted to see whether more efficient implementations (individual kernels and full networks) are possible. For example, for a VGG network the speed comparison is:
| Platform                      | PyTorch                | This library   |
|-------------------------------|------------------------|----------------|
| x86: Intel Core Ultra 7 155H  | 326 GFLOP/s            | 818 GFLOP/s    |
| Arm: AWS Graviton4 (8 cores)  | 290 GFLOP/s (QNNPACK)  | 1.5 TFLOP/s    |
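Both targets named in the title expose an 8-bit dot-product instruction that kernels of this kind build on. The snippet below only illustrates those instructions (it is not code from this library); the intrinsics shown are the standard ones from the respective ISAs.

```cpp
// Illustration only: the 8-bit dot-product instructions int8 kernels build on.
#include <cstdint>

#if defined(__AVXVNNI__)
#include <immintrin.h>

// x86 AVX-VNNI: vpdpbusd accumulates 32 u8*s8 products (4 per lane) into 8 i32 lanes.
static inline __m256i dot_accumulate(__m256i acc, __m256i a_u8, __m256i b_s8) {
    return _mm256_dpbusd_epi32(acc, a_u8, b_s8);
}

#elif defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>

// Arm dotprod (available on v8.6-a cores): sdot accumulates 16 s8*s8 products
// (4 per lane) into 4 i32 lanes.
static inline int32x4_t dot_accumulate(int32x4_t acc, int8x16_t a_s8, int8x16_t b_s8) {
    return vdotq_s32(acc, a_s8, b_s8);
}
#endif
```

Note the asymmetry on x86: `vpdpbusd` multiplies an unsigned 8-bit operand with a signed one, which typically shapes how activations and weights are laid out in VNNI kernels.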
The implementation in this repo is fast because it:
- uses efficient, scalable kernels,
- avoids memory allocations at runtime,
- lets the compiler optimize for the deployment sizes of the model, as is common for inference-only frameworks (see the sketch after this list).
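As a sketch of the last point, here is a hypothetical example (not this library's actual API) of how baking the deployment-time shapes into template parameters lets the compiler unroll and vectorize a kernel for exactly those sizes:

```cpp
// Hypothetical sketch (not this library's API): with the layer shape fixed at
// compile time, all loop bounds are constants and the compiler can fully
// unroll the 3x3 taps and vectorize the channel loop.
#include <cstdint>

template <int Channels, int Height, int Width>
void conv3x3_acc(const int8_t* input, const int8_t* kernel, int32_t* output) {
    for (int y = 0; y < Height - 2; ++y) {
        for (int x = 0; x < Width - 2; ++x) {
            int32_t acc = 0;
            for (int c = 0; c < Channels; ++c)
                for (int ky = 0; ky < 3; ++ky)
                    for (int kx = 0; kx < 3; ++kx)
                        acc += int32_t(input[((y + ky) * Width + (x + kx)) * Channels + c]) *
                               int32_t(kernel[(ky * 3 + kx) * Channels + c]);
            output[y * (Width - 2) + x] = acc;
        }
    }
}

// One explicit instantiation per layer of the deployed model, e.g.:
template void conv3x3_acc<64, 224, 224>(const int8_t*, const int8_t*, int32_t*);
```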
A few comparisons with libtorch might be interesting:
- This library does far fewer heap-memory allocations (and none during inference): 140k (libtorch) vs. 151 (this library).
- It has roughly 3x fewer L1 cache misses: 3.5% (libtorch) vs. 1.2% (this library).
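One possible way to reproduce such an allocation count (illustrative only; not necessarily how the numbers above were measured) is to override the global `operator new` and count calls around the inference loop:

```cpp
// Counts C++ heap allocations (not raw malloc) around an inference call.
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

static std::atomic<std::size_t> g_alloc_count{0};

void* operator new(std::size_t size) {
    g_alloc_count.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

int main() {
    // ... build the network and warm it up here ...
    const std::size_t before = g_alloc_count.load();
    // run_inference(network, input);   // hypothetical call
    const std::size_t after = g_alloc_count.load();
    std::printf("heap allocations during inference: %zu\n", after - before);
}
```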
Note that a build requires at least gcc 15 or clang 19. The code can be built with CMake as:
```
cmake -S . -B build
cmake --build build
```
For comparison, a quantized VGG libtorch model is built as well.
Because there are no pre-built libtorch binaries for Arm, libtorch has to be built from source.
The benchmarks are located in the bench folder. For example, one can run:
```
./bench/benchmark_torch PATH_TO_INT8_PT_FILE
./bench/benchmark_network PATH_TO_INT8_FILE PATH_TO_INPUT_DATA
```
The quantized PyTorch weights (in TorchScript format) can be downloaded here: https://drive.google.com/file/d/1aeedspAvXb7UJOMJcu_UCnrPXP5b5-ay/view?usp=drive_link
The weights in FBS format can be downloaded here: https://drive.google.com/file/d/1tPdKA3pPHBj5c6Oc_bB__eAOrhjvpSOP/view?usp=drive_link