
Finetuned Model Inference error: AttributeError: 'NoneType' object has no attribute 'device' #14

Open
0xbitches opened this issue Mar 16, 2023 · 11 comments

Comments

@0xbitches

0xbitches commented Mar 16, 2023

Update: for anyone experiencing this issue, see the workaround I posted in #14 (comment)

I tried out the finetune script locally and it looks like there was no problem with that. However, when trying to run inference, I'm getting AttributeError: 'NoneType' object has no attribute 'device' from bitsandbytes. I've looked into it, and it seems to be an issue related to splitting the model between CPU and GPU, but I'm not sure which part of this repo is causing that. Any ideas?

Relevant issue in bitsandbytes: TimDettmers/bitsandbytes#40

@devilismyfriend

Which version of bitsandbytes are you on?

@0xbitches
Author

@devilismyfriend
0.37.0, the latest release

@ItsLogic

I have the same issue. I only get it when I try to run inference with my local fine-tune; the downloaded one doesn't have the problem.
I am on the latest bitsandbytes commit, built from source.

@tloen
Owner

tloen commented Mar 16, 2023

Maybe try allocating the foundation model on the CPU? device_map={'': 'cpu'}

That might save some VRAM for the LoRA model.
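
For concreteness, a minimal sketch of that suggestion as it would look where generate.py loads the base model (the checkpoint name and dtype here are assumptions about your setup, not code copied from this repo):

    import torch
    from transformers import LlamaForCausalLM

    # Place the foundation model entirely on the CPU, leaving VRAM free
    # for the LoRA weights that get loaded on top of it.
    model = LlamaForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # assumed base checkpoint
        torch_dtype=torch.float16,
        device_map={"": "cpu"},
    )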

@0xbitches
Author

Changing device_map to cpu did not help for me; I'm still getting the same stack trace.
It looks like the downloaded model uses the {'base_model': 0} device map, which loads everything onto the GPU.
The local finetune's device map looks like this:

{'base_model.model.model.embed_tokens': 0, 'base_model.model.model.layers.0': 0, 'base_model.model.model.layers.1': 0, 'base_model.model.model.layers.2': 0, 'base_model.model.model.layers.3': 0, 'base_model.model.model.layers.4': 0, 'base_model.model.model.layers.5': 0, 'base_model.model.model.layers.6': 0, 'base_model.model.model.layers.7': 0, 'base_model.model.model.layers.8': 0, 'base_model.model.model.layers.9': 0, 'base_model.model.model.layers.10': 0, 'base_model.model.model.layers.11': 0, 'base_model.model.model.layers.12': 0, 'base_model.model.model.layers.13': 0, 'base_model.model.model.layers.14': 0, 'base_model.model.model.layers.15': 0, 'base_model.model.model.layers.16': 0, 'base_model.model.model.layers.17': 0, 'base_model.model.model.layers.18': 0, 'base_model.model.model.layers.19': 0, 'base_model.model.model.layers.20': 0, 'base_model.model.model.layers.21': 0, 'base_model.model.model.layers.22': 0, 'base_model.model.model.layers.23': 0, 'base_model.model.model.layers.24': 0, 'base_model.model.model.layers.25': 0, 'base_model.model.model.layers.26': 0, 'base_model.model.model.layers.27': 'cpu', 'base_model.model.model.layers.28': 'cpu', 'base_model.model.model.layers.29': 'cpu', 'base_model.model.model.layers.30': 'cpu', 'base_model.model.model.layers.31': 'cpu', 'base_model.model.model.layers.32': 'cpu', 'base_model.model.model.layers.33': 'cpu', 'base_model.model.model.layers.34': 'cpu', 'base_model.model.model.layers.35': 'cpu', 'base_model.model.model.layers.36': 'cpu', 'base_model.model.model.layers.37': 'cpu', 'base_model.model.model.layers.38': 'cpu', 'base_model.model.model.layers.39': 'cpu', 'base_model.model.model.norm': 'cpu', 'base_model.model.lm_head': 'cpu'}
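
(If you want to check this on your own machine, here is a small sketch of how to inspect where the weights ended up; `model` is whatever PeftModel.from_pretrained returned, and the hf_device_map attribute is only present when a device map was recorded, so both are assumptions:)

    # Sketch: inspect weight placement after loading.
    print(getattr(model, "hf_device_map", None))   # module-level map, if one was recorded
    devices = {p.device for _, p in model.named_parameters()}
    print(devices)  # e.g. {device(type='cuda', index=0), device(type='cpu')} for a split model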

@0xbitches
Author

0xbitches commented Mar 16, 2023

@ItsLogic

Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.

It looks like the issue is that PEFT's load will auto-apply a device_map if one is not specified, which puts some of the model weights on the CPU. This is unfortunately not compatible with bitsandbytes. Forcing PEFT to use only the GPU is the workaround I found.

@ItsLogic

Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.

This seems to work for me as well. Cheers, now I can use my 13B LoRA.

@paniq

paniq commented Mar 17, 2023

Right now I am forcing device_map to use only the GPU, i.e. adding device_map={'': 0} to PeftModel.from_pretrained, which worked.

I had the same problem with the stock generate.py, and this fixed it for me as well. I can confirm it works on an RTX 3060 with 12GB (9.9GB in use), but nvtop reports only 30% GPU usage, so there's a bottleneck somewhere.

Also, uncommenting and executing the original test code failed on the last sample with an OOM error. Using the Gradio UI, I get about 1GB of extra memory used after each request, so I'd say it's a leak. I added import gc; gc.collect() to generate and that seems to fix it, but long responses can also trigger OOM. Limiting tokens to 128 did help.
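
(A sketch of that workaround, assuming generation happens in a helper like the one below; generate_reply and the extra torch.cuda.empty_cache() call are illustrative assumptions, not code from generate.py:)

    import gc

    import torch

    def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
        # Hypothetical helper showing the leak workaround: drop Python-side
        # references and collect garbage after every request so repeated
        # Gradio calls don't keep accumulating memory.
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        text = tokenizer.decode(output[0], skip_special_tokens=True)
        del inputs, output
        gc.collect()
        torch.cuda.empty_cache()  # extra cache flush; an assumption, not mentioned above
        return text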

@ThatCoffeeGuy

So to clarify, the change I had to apply was in generate.py:

    model = PeftModel.from_pretrained(
        model, "tloen/alpaca-lora-7b",
        torch_dtype=torch.float16
    )

change this to:

    model = PeftModel.from_pretrained(
        model, "tloen/alpaca-lora-7b",
        torch_dtype=torch.float16,
        device_map={'': 0}
    )

@kooshi
Contributor

kooshi commented Mar 23, 2023

This may be fixed by this PEFT PR

@younesbelkada

This may be fixed by a recent PR on accelerate that adds support for weight quantization in the dispatch_model function. Related PR: huggingface/accelerate#1237 - can you try using the main branch of accelerate by installing it from source?

pip install git+https://github.com/huggingface/accelerate

huggingface/peft#115 (comment)
