Quantization on a HuggingFace model #13775
Unanswered · Impasse52 asked this question in Help: Coding & Implementations · 0 replies
Is it possible to use quantization in pipelines at all? I'm trying to run a 7B LLaMA-based model on a TITAN V (12 GB VRAM), but I run out of memory when loading it. I've been looking for a solution for a while, but I can't find any hints on how to approach this, apart from re-implementing the pipeline logic myself, which is far from ideal. A sketch of what I'm hoping for is below.
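For context, something along these lines is what I'd like to get working. This is a minimal sketch assuming bitsandbytes 4-bit quantization via `BitsAndBytesConfig`; the model name is a placeholder for the actual 7B checkpoint:

```python
# Minimal sketch: load a quantized model and hand it to pipeline().
# Assumes transformers + bitsandbytes are installed; model_id is a placeholder.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "some-org/llama-7b"  # placeholder for the 7B LLaMA-based model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights shrink VRAM use vs fp16
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                     # place layers on the GPU automatically
)

# If a pre-quantized model object can be passed to pipeline() directly,
# no pipeline logic would need to be re-implemented.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Hello, world", max_new_tokens=20)[0]["generated_text"])
```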