I have a question regarding quantization workflows. Does applying QAT before converting to GGUF format (e.g. using Q4, Q4_K_M) result in better quality compared to directly quantizing with GGUF alone?
I'm planning to serve my model with llama.cpp, so converting to GGUF is required. I've noticed a clear quality drop when using the quantization methods llama.cpp provides, so I'm considering QAT to mitigate it.
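For context on why QAT might help here, the core idea is "fake quantization" during training: the forward pass rounds weights to the low-bit grid and maps them back to float, so the model learns weights that survive the rounding that post-training GGUF quantization applies. Below is a minimal, hypothetical sketch of that round-trip for symmetric 4-bit values; it is a simplification for illustration, not llama.cpp's actual Q4_K_M scheme (which uses blocks, per-block scales, and mins).

```python
# Simplified fake-quantization forward pass, as used conceptually in QAT.
# NOTE: illustrative only; not the Q4_K_M format used by llama.cpp.

def fake_quant_4bit(w, scale):
    """Symmetric 4-bit round trip: snap to an integer in [-8, 7],
    then map back to float. The residual error is what QAT lets
    training compensate for (gradients pass through via the
    straight-through estimator, not shown here)."""
    q = max(-8, min(7, round(w / scale)))
    return q * scale

weights = [0.83, -0.41, 0.07, -1.20]
scale = max(abs(w) for w in weights) / 7  # one shared scale, simplified

deq = [fake_quant_4bit(w, scale) for w in weights]
errors = [abs(w - d) for w, d in zip(weights, deq)]
print(deq)
print(max(errors))  # worst-case round-trip error for this group
```

Plain post-training quantization (what `llama-quantize` does) only minimizes this error after the fact, whereas QAT exposes the model to it during training, which is why it can close some of the quality gap at Q4-level bit widths.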
Has anyone experimented with this approach or have any insights to share?