Any idea if this will work on CPU? #36
Comments
Yep, I need help running this on CPU too. I am downloading and merging the models now.
Hi! Has there been any progress on running it on a CPU? I'm really interested in this as well, since I don't have a powerful GPU. Any updates or workarounds you've discovered would be greatly appreciated. Thanks!
@kenneth104
No, I can't run it on CPU; something still needs CUDA and it throws an error.
If you want to run the demo on CPU, you need to use float32 and initialize all parameters on the CPU.
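A minimal sketch of what that could look like with PyTorch, assuming a standard checkpoint file; the model class, config, checkpoint path, and inference call here are hypothetical placeholders, not the project's actual API:

```python
import torch

device = torch.device("cpu")

# map_location="cpu" keeps torch from trying to place tensors on CUDA.
state_dict = torch.load("checkpoint.pth", map_location=device)  # hypothetical path

model = MyMultimodalModel(**model_config)   # hypothetical model class/config
model.load_state_dict(state_dict, strict=False)

# .float() casts any half-precision weights back to float32,
# since most CPU kernels do not support fp16.
model = model.float().to(device).eval()

with torch.no_grad():
    output = model.generate(inputs)          # hypothetical inference call
```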
@liyaozong1991 thanks, I'll try this.
First of all, thanks for this great project! The output quality seems very good, and the idea of running a multimodal model locally is awesome. It feels like we already have a GPT-4-like multimodal model in our hands, which is very exciting. I was wondering whether it is possible to run this with llama.cpp on CPU? I am currently running Vicuna-13B on CPU (the 4-bit quantized version), and around 8 GB of RAM is enough. It works just fine, and the inference speed is about 1.5 tokens per second on my computer. (It also seems to work on mobile phones with enough memory; I did not try it myself, but I saw a few examples.) llama.cpp has its own file format (ggml) and provides a way to convert the original weights to it. It would be great if people with low VRAM or no GPU at all could make this work on CPU. Any thoughts?
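For reference, here is a rough sketch of how the language-model half could be driven on CPU from a 4-bit ggml file, using the llama-cpp-python bindings. The model path, thread count, and prompt format are assumptions for illustration, and this covers only the text side; the vision encoder would still need its own CPU-compatible path to feed image features into the prompt.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a 4-bit (q4_0) ggml conversion of the Vicuna-13B weights.
llm = Llama(
    model_path="./models/vicuna-13b-ggml-q4_0.bin",
    n_ctx=2048,     # context window
    n_threads=8,    # CPU threads; tune to your machine
)

# Plain text-only inference using a Vicuna-style prompt.
result = llm(
    "### Human: Describe what you see in the image.\n### Assistant:",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```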