Fast inference engine for Transformer models (C++, updated Jul 11, 2024)
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
TinyChatEngine: On-Device LLM Inference Library
Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
A resource-conscious neural network implementation for MCUs
Implementation of a subset of the CBP of an H.264 encoder
FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch
Arm Compute Library implementation of efficient low-precision neural networks
Find dominant colors in images with Qt and OpenCV, with a nice GUI to show results in 3D color spaces: RGB, HSV, HSL, HWB, CIE XYZ and L*a*b*, and the recent OKLAB! Export results to images, .CSV files and palettes for popular software like Photoshop, PaintShop Pro and CorelDRAW. Visualize Cube LUTs!
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371
Code used for the numerical examples proposed in https://hal.archives-ouvertes.fr/hal-01514987v2 and https://arxiv.org/abs/1705.01446
INT8 quantization in OpenVINO
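For readers unfamiliar with what INT8 quantization actually does, here is a minimal, library-agnostic sketch of affine (scale/zero-point) quantization. This is not OpenVINO's API; the function names are illustrative only.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine INT8 quantization: x ≈ scale * (q - zero_point).

    The float range is mapped onto [-128, 127]; the range is widened
    to include 0 so that zero is exactly representable.
    """
    qmin, qmax = -128, 127
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)
```

The round-trip error of any in-range value is bounded by half the quantization step (`scale / 2`), which is the usual correctness check for a uniform quantizer.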
Image compression using vector quantization
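As a generic illustration of the technique named in the entry above (not the repo's actual code), vector quantization compresses an image by learning a small codebook of pixel vectors with k-means and storing only the codebook plus one small index per pixel. All names here are hypothetical.

```python
import numpy as np

def vq_compress(pixels: np.ndarray, k: int = 16, iters: int = 10, seed: int = 0):
    """Naive k-means vector quantization over (n, 3) RGB pixel vectors.

    Returns a (k, 3) codebook and an (n,) array of codeword indices;
    together these are the compressed representation.
    """
    rng = np.random.default_rng(seed)
    pixels = pixels.astype(np.float64)
    codebook = pixels[rng.choice(len(pixels), k, replace=False)].copy()
    for _ in range(iters):
        # assign each pixel to its nearest codeword
        dists = ((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)
        # move each codeword to the mean of its assigned pixels
        for j in range(k):
            members = pixels[idx == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, idx

def vq_decompress(codebook, idx):
    """Reconstruct pixels by looking up each index in the codebook."""
    return codebook[idx]
```

With k codewords, each pixel costs only ceil(log2 k) bits instead of 24, at the price of quantization error that k-means tries to minimize.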
Modified inference engine for quantized convolution using product quantization
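Product quantization, mentioned in the entry above, refines plain vector quantization by splitting each vector into m subvectors and learning an independent codebook per subspace, so the effective codebook size grows multiplicatively. A minimal sketch under those assumptions (not the repo's engine; names are illustrative):

```python
import numpy as np

def pq_train(X: np.ndarray, m: int = 2, k: int = 8, iters: int = 10, seed: int = 0):
    """Learn one k-entry codebook per subspace via naive k-means.

    X is (n, d) with d divisible by m; each codebook covers d//m dims.
    """
    n, d = X.shape
    assert d % m == 0
    ds = d // m
    rng = np.random.default_rng(seed)
    codebooks = []
    for i in range(m):
        sub = X[:, i * ds:(i + 1) * ds]
        cb = sub[rng.choice(n, k, replace=False)].astype(np.float64).copy()
        for _ in range(iters):
            idx = ((sub[:, None] - cb[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                pts = sub[idx == j]
                if len(pts):
                    cb[j] = pts.mean(0)
        codebooks.append(cb)
    return codebooks

def pq_encode(X, codebooks):
    """Encode each vector as m small codeword indices (shape (n, m))."""
    ds = codebooks[0].shape[1]
    codes = [((X[:, i * ds:(i + 1) * ds][:, None] - cb[None]) ** 2).sum(-1).argmin(1)
             for i, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1)

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the selected codewords."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])
```

The appeal for inference engines is that dot products against encoded vectors reduce to table lookups over the per-subspace codebooks.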
A simple BMP image viewer, converter and editor, focused primarily on a self-written implementation for working with BMP images
Model Quantization with PyTorch, TensorFlow & Larq
Generating a TensorRT model from ONNX