Finetuned Model Inference error: AttributeError: 'NoneType' object has no attribute 'device' #14
Comments
Which version of bitsandbytes are you on?
@devilismyfriend
I have the same issue. I only get it when I try to run inference with my local fine tune; the downloaded one doesn't have the problem.
Maybe try allocating the foundation model on the CPU? That might save some VRAM for the LoRA model.
Changing device_map to cpu did not help for me; I still get the same stack trace. My device_map (layers 0-26 on GPU 0, the rest offloaded to CPU) is equivalent to:

```python
{
    'base_model.model.model.embed_tokens': 0,
    **{f'base_model.model.model.layers.{i}': 0 for i in range(27)},
    **{f'base_model.model.model.layers.{i}': 'cpu' for i in range(27, 40)},
    'base_model.model.model.norm': 'cpu',
    'base_model.model.lm_head': 'cpu',
}
```
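A compact way to see why a map like that fails: the 8-bit bitsandbytes layers need every module on a CUDA device, so any `'cpu'` (or `'disk'`) entry is enough to trigger the `NoneType` error. A minimal sketch (the helper `has_cpu_offload` is hypothetical, not part of any library):

```python
def has_cpu_offload(device_map):
    """Return True if any module in an accelerate-style device_map is
    placed off-GPU, which 8-bit bitsandbytes layers cannot handle."""
    return any(dev in ("cpu", "disk") for dev in device_map.values())

# Abbreviated version of the device_map pasted above:
split_map = {
    "base_model.model.model.layers.26": 0,      # on cuda:0
    "base_model.model.model.layers.27": "cpu",  # offloaded -> breaks int8
    "base_model.model.lm_head": "cpu",
}
gpu_only_map = {"": 0}  # everything on cuda:0

print(has_cpu_offload(split_map))     # True
print(has_cpu_offload(gpu_only_map))  # False
```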
Right now I am forcing device_map to use only the GPU. It looks like the issue is that PEFT's load will auto-apply a device_map if none is specified, which loads some of the model weights onto the CPU. That is unfortunately not compatible with bitsandbytes. Forcing PEFT to use only the GPU is the workaround I found.
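A sketch of that GPU-only workaround, assuming the usual PEFT loading pattern (the base model and adapter path are placeholders, and `device_map={'': 0}` relies on the accelerate convention that the empty-string key covers the whole model):

```python
# The empty-string key means "the entire model", so nothing gets
# offloaded to the CPU and every bitsandbytes int8 layer sees a
# real CUDA device.
gpu_only_map = {"": 0}

# Hypothetical sketch of passing it to PEFT (needs a GPU and the
# peft package; model names and paths are placeholders):
#
#   model = PeftModel.from_pretrained(
#       base_model, "path/to/lora-adapter", device_map=gpu_only_map
#   )

assert "cpu" not in gpu_only_map.values()
print(gpu_only_map)  # {'': 0}
```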
This seems to work for me as well. Cheers, now I can use my 13B LoRA.
Had the same problem with the stock generate.py; this fixed it for me as well. I can confirm it works on an RTX 3060 with 12 GB (9.9 GB in use), but nvtop reports only 30% GPU usage, so there's a bottleneck somewhere. Also, uncommenting and executing the original test code failed on the last sample with an OOM error. Using the Gradio UI, I get about 1 GB of extra memory used after each request, so I'd say it's a leak. I added
So to clarify, the changes I had to apply were in generate.py:
change this to:
This may be fixed by this PEFT PR |
This may be fixed by a recent PR in accelerate; try installing from source: `pip install git+https://github.com/huggingface/accelerate`
Update: for anyone experiencing this issue, see the workaround I posted in #14 (comment)
I tried out the finetune script locally and it looks like there was no problem with that. However, when trying to run inference, I'm getting
AttributeError: 'NoneType' object has no attribute 'device'
from bitsandbytes. I've checked, and it looks like an issue related to splitting the model between CPU and GPU, but I am not sure which part of this repo is causing that. Any ideas? Relevant issue in bitsandbytes: TimDettmers/bitsandbytes#40