Quantization on a HuggingFace model #13775
Unanswered · Impasse52 asked this question in Help: Coding & Implementations · 0 replies
Is it possible to use quantization in pipelines at all? I'm trying to run a 7B LLaMA-based model on a TITAN V (12 GB VRAM), but I run out of memory when loading it. I've been looking for a solution for a while, but I can't find any hints on how to approach this, apart from re-implementing the pipeline logic myself, which is far from ideal. A sketch of what I'm hoping for is below.
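For context, something along these lines is what I'd like to get working. This is a minimal sketch assuming bitsandbytes 4-bit quantization via `BitsAndBytesConfig`; the model name is a placeholder for the actual 7B checkpoint:

```python
# Minimal sketch: load a quantized model and hand it to pipeline().
# Assumes transformers + bitsandbytes are installed; model_id is a placeholder.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "some-org/llama-7b"  # placeholder for the 7B LLaMA-based model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights shrink VRAM use vs fp16
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                     # place layers on the GPU automatically
)

# If a pre-quantized model object can be passed to pipeline() directly,
# no pipeline logic would need to be re-implemented.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Hello, world", max_new_tokens=20)[0]["generated_text"])
```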