Hello,
The model was trained on an A100 GPU, but I am wondering about the GPU memory cost during inference.
Currently, I have a 3060 GPU with 12 GB of VRAM. Can it be used for running inference?
Thank you
You can just load the model in 8-bit, which takes about 7.5 GB of memory, by passing load_in_8bit=True: https://huggingface.co/docs/transformers/main/en/main_classes/quantization
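A minimal sketch of what that could look like, assuming a causal LM checkpoint and that the bitsandbytes and accelerate packages are installed (the model ID below is a placeholder; substitute this repo's actual checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder; use the repo's released checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on the available GPU(s) automatically
    load_in_8bit=True,   # quantize weights to 8-bit via bitsandbytes
)

# Quick inference check
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```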