
Any idea if this will work on CPU? #36

Open

spikespiegel opened this issue Apr 18, 2023 · 6 comments
@spikespiegel

First of all, thanks for this great project! The output quality seems very good, and the idea of running a multimodal model locally is awesome. It seems we already have a GPT-4-like multimodal model in our hands, so this is very exciting. I was wondering if it is possible to run it with llama.cpp on CPU? I am currently running Vicuna-13b on CPU (the 4-bit quantized version), and around 8 GB of RAM is enough. It works just fine, and the inference speed is about 1.5 tokens per second on my computer. (It also seems to work on mobile phones with enough memory; I did not try it myself, but I saw a few examples.) llama.cpp has its own file format (ggml) and provides a way to convert the original weights to ggml. It would be great if people with low VRAM or no VRAM could make it work on CPU. Any thoughts?
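As a rough illustration of that CPU-only setup, here is a minimal sketch assuming the llama-cpp-python bindings and a hypothetical path to a 4-bit ggml checkpoint; this is separate from MiniGPT-4 itself and the path/prompt are only placeholders:

from llama_cpp import Llama

# Load a 4-bit quantized ggml checkpoint on CPU; the path here is hypothetical.
llm = Llama(model_path="./models/vicuna-13b-q4_0.ggml.bin")
out = llm("Q: What is a multimodal model? A:", max_tokens=64)
print(out["choices"][0]["text"])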

@kenneth104

Yep, I also need help running this on CPU.

I am downloading and merging the models now.

@webnizam

Hi! Has there been any progress on running it on a CPU? I'm really interested in this as well, since I don't have a powerful GPU. Any updates or workarounds you've discovered would be greatly appreciated. Thanks!

@rdkmaster

> Yep, I also need help running this on CPU.
>
> I am downloading and merging the models now.

@kenneth104 Has any progress been made on running it on CPU? Can you share?

@kenneth104

> > Yep, I also need help running this on CPU.
> > I am downloading and merging the models now.
>
> @kenneth104 Has any progress been made on running it on CPU? Can you share?

No, I can't run it on CPU; something still needs CUDA and it throws an error.

@liyaozong1991

liyaozong1991 commented Jun 29, 2023

If you want to run the demo on CPU, you need to use float32 and initialize all parameters on the CPU.

1. In demo.py:
# model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
model = model_cls.from_config(model_config).to('cpu')
# chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))
chat = Chat(model, vis_processor, device='cpu')

2. In minigpt4.yaml:
# vit_precision: "fp16"
vit_precision: "fp32"

3. In minigpt4_eval.yaml:
# low_resource: True
low_resource: False

4. In mini_gpt4.py, around line 90:
# torch_dtype=torch.float16,
torch_dtype=torch.float32,

That is all you need to do to run demo.py on CPU, but the speed is very slow.
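A compact sketch of the same idea: instead of hard-coding 'cpu' in demo.py, the device could be chosen based on CUDA availability. This is only an assumption about how one might wire it up, not the repository's actual code; model_cls, model_config, args, vis_processor, and Chat are the names already used in demo.py above.

import torch

# Choose the device once, instead of hard-coding 'cpu' or 'cuda:<id>'.
device = 'cuda:{}'.format(args.gpu_id) if torch.cuda.is_available() else 'cpu'

model = model_cls.from_config(model_config).to(device)
chat = Chat(model, vis_processor, device=device)

# The fp16 -> fp32 changes (vit_precision and torch_dtype) still need to be
# made in the YAML configs and in mini_gpt4.py, as listed above, when device == 'cpu'.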

@rdkmaster

@liyaozong1991 thanks, I'll try this.
