Benefits of Using QAT Before GGUF Quantization? #2419

@kiyoonyoo

Hi,
thank you for the amazing project.

I have a question regarding quantization workflows. Does applying QAT before converting to GGUF format (e.g. using Q4, Q4_K_M) result in better quality compared to directly quantizing with GGUF alone?

I'm planning to serve my model with llama.cpp, so converting to GGUF is required. I've seen a noticeable quality drop with the quantization methods llama.cpp provides, so I'm considering QAT to mitigate it.
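For context on why QAT could help here: during QAT the forward pass uses "fake-quantized" weights (quantize then immediately dequantize), so training adapts the network to the rounding error that post-training GGUF quantization would otherwise introduce cold. A minimal numpy sketch of symmetric per-tensor fake quantization (a simplified illustration, not llama.cpp's actual Q4_K_M scheme, which uses block-wise scales):

```python
import numpy as np

def fake_quantize(w, bits=4):
    # Quantize then dequantize, so the forward pass sees the same
    # rounding error the deployed low-bit model will have.
    qmax = 2 ** (bits - 1) - 1              # 7 for signed 4-bit
    scale = np.max(np.abs(w)) / qmax        # per-tensor scale (simplified)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # values snapped to the 4-bit grid

w = np.array([0.9, -0.35, 0.07, -0.8])
w_fq = fake_quantize(w)
# Training against w_fq (with a straight-through estimator for the
# gradient) lets the model absorb this error before GGUF conversion.
```

The remaining question, of course, is how well the fake-quantization grid used in QAT matches the grid llama.cpp applies afterwards; a mismatch between the two schemes would reduce the benefit.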

Has anyone experimented with this approach or have any insights to share?

Thanks.
