Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantize: Handle user-defined quantization levels for additional tensors #12511

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

EAddario
Copy link

This PR adds the ability to quantize other tensors, beyond token-embedding and output-tensor. It handles most of the supported architectures except Mamba, RWKV6, RWKV6QWEN2 and T5 to avoid having too many command options, but can add as well if maintainers request it.

For full background on the PR, please see: Squeezing Tensor Bits: the quest for smaller LLMs

@EAddario EAddario changed the title Handle user-defined quantization levels for additional tensors quantize: Handle user-defined quantization levels for additional tensors Mar 22, 2025
@max-krasnyansky
Copy link
Collaborator

How about we add a more generic --tensor-type tensor_name_pattern=type.
@slaren has the PR #11397 that overrides backend mapping per tensor.
Let's make this one similar (same patters, etc). That way we'll be able to override specific layers (if needed).

@EAddario
Copy link
Author

That's an excellent idea! and it'll allow to add all supported tensor types (50+) without creating a mess of parameters. Plus, it will give me something to do over the weekend 😆

@jukofyork
Copy link
Contributor

How about we add a more generic --tensor-type tensor_name_pattern=type. @slaren has the PR #11397 that overrides backend mapping per tensor. Let's make this one similar (same patters, etc). That way we'll be able to override specific layers (if needed).

Yeah, I think this is definitely the way to go - the regex support of that PR gives really good flexibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants