An advanced and highly diverse set of quantization techniques. This crate supports both quantization and optimized inference, making it unique in the breadth of its usability.

It is used by mistral.rs to power ISQ (in-situ quantization), imatrix collection, and general quantization features.
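
To give a flavor of the overall design idea (interchangeable quantized backends behind one linear-layer interface), here is a minimal, self-contained Rust sketch. Every name in it (`QuantLinear`, `DenseF32`, `forward`) is hypothetical and chosen for illustration; it is not this crate's actual API:

```rust
// Hypothetical sketch only: these names are illustrative, not this crate's API.

/// A linear layer abstracted over its storage format, so a caller can swap
/// GGUF, GPTQ, HQQ, FP8, etc. backends without changing the forward pass.
trait QuantLinear {
    /// Computes y = W x for a weight matrix W held in some
    /// (possibly quantized) format.
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

/// Plain f32 backend, analogous in spirit to an unquantized layer that
/// ISQ would later requantize in place.
struct DenseF32 {
    weight: Vec<Vec<f32>>, // row-major: weight[out][in]
}

impl QuantLinear for DenseF32 {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.weight
            .iter()
            .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
            .collect()
    }
}

fn main() {
    let layer: Box<dyn QuantLinear> = Box::new(DenseF32 {
        weight: vec![vec![1.0, 2.0], vec![3.0, 4.0]],
    });
    // [1*1 + 2*1, 3*1 + 4*1] = [3.0, 7.0]
    println!("{:?}", layer.forward(&[1.0, 1.0]));
}
```

The point of the trait-object indirection is that a model's forward pass stays identical whether the weights are plain f32, GGUF blocks, or GPTQ-packed integers.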
Currently supported:
- GGUF: `GgufMatMul` (2-8 bit quantization, with imatrix); see the block-quantization sketch after this list
- GPTQ: `GptqLayer` (with CUDA marlin kernel)
- HQQ: `HqqLayer` (4, 8 bit quantization)
- FP8: `FP8Linear` (optimized on CUDA)
- Unquantized (used for ISQ): `UnquantLinear`
- BNB (bitsandbytes): `BnbLinear` (int8, fp4, nf4)
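
As a concrete example of the block-quantization idea behind the GGUF formats above, here is a minimal sketch of a Q8_0-style block: 32 values share one scale, and each value is rounded to a signed 8-bit integer. It is simplified for illustration (real Q8_0 stores the scale as f16, and this is not this crate's implementation):

```rust
// Minimal sketch of GGUF-style Q8_0 block quantization. Illustrative only;
// the actual format stores the per-block scale as f16, not f32.

const BLOCK: usize = 32;

struct BlockQ8_0 {
    scale: f32,      // d = max(|x|) / 127
    qs: [i8; BLOCK], // quantized values, x ~= d * q
}

fn quantize_q8_0(xs: &[f32; BLOCK]) -> BlockQ8_0 {
    let amax = xs.iter().fold(0f32, |m, &v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
    let mut qs = [0i8; BLOCK];
    for (q, &x) in qs.iter_mut().zip(xs) {
        *q = (x * inv).round().clamp(-127.0, 127.0) as i8;
    }
    BlockQ8_0 { scale, qs }
}

fn dequantize_q8_0(b: &BlockQ8_0) -> [f32; BLOCK] {
    let mut out = [0f32; BLOCK];
    for (o, &q) in out.iter_mut().zip(&b.qs) {
        *o = q as f32 * b.scale;
    }
    out
}

fn main() {
    let mut xs = [0f32; BLOCK];
    for (i, x) in xs.iter_mut().enumerate() {
        *x = (i as f32 - 16.0) / 4.0;
    }
    let block = quantize_q8_0(&xs);
    let ys = dequantize_q8_0(&block);
    let max_err = xs.iter().zip(&ys).map(|(a, b)| (a - b).abs()).fold(0f32, f32::max);
    println!("max abs round-trip error: {max_err}");
}
```

Roughly speaking, the lower-bit GGUF variants (4-bit, 2-bit, and so on) follow the same block-plus-scale pattern with more aggressive packing, and an imatrix (importance matrix) can weight the rounding so that quantization error is pushed toward less important channels.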
Some kernels are copied or based on implementations in: