FabianSchuetze/convolution_cpp

High-speed 8-bit 3x3 convolutions and linear layers for x86 VNNI (256-bit vectors) and Armv8.6-A instructions.

Motivation:

This repo provides fast and scalable kernels for quantized int8 convolutions and linear layers. I noticed that quantized PyTorch networks run well below the attainable throughput on modern Arm and x86 machines, and I wanted to test whether more efficient implementations (both individual kernels and full networks) are possible. For example, for a VGG network the speed comparison is:

|                               | PyTorch                  | This library   |
|-------------------------------|--------------------------|----------------|
| x86: Intel Ultra 7 155H       | 326 GFLOP/s              | 818 GFLOP/s    |
| Arm: Graviton 4 (8 cores)     | 290 GFLOP/s (QNNPACK)    | 1.5 TFLOP/s    |

The implementation in this repo is fast because it:

  • uses efficient, scalable kernels,
  • avoids memory allocations at runtime, and
  • lets the compiler optimize for the deployment sizes of the model (as is common for inference-only frameworks).

A few comparisons with libtorch might be interesting:

  • This library performs far fewer heap allocations (and none during inference): 151 vs. libtorch's ~140k.
  • It incurs ~3x fewer L1 cache misses: 1.2% vs. libtorch's 3.5%.

Build Instructions:

Note that building requires at least GCC 15 or Clang 19. The code can be built with CMake as:

cmake -S . -B build
cmake --build build 

For comparison, a quantized VGG libtorch model is built too.

Because there are no pre-built libtorch Arm binaries available, libtorch has to be built from source.

The benchmarks are located in the bench folder. For example, one can run:

./bench/benchmark_torch PATH_TO_INT8_PT_FILE
./bench/benchmark_network PATH_TO_INT8_FILE PATH_TO_INPUT_DATA

The quantized PyTorch weights (in TorchScript format) can be downloaded here: https://drive.google.com/file/d/1aeedspAvXb7UJOMJcu_UCnrPXP5b5-ay/view?usp=drive_link

The weights in FBS format can be downloaded here: https://drive.google.com/file/d/1tPdKA3pPHBj5c6Oc_bB__eAOrhjvpSOP/view?usp=drive_link