Any idea if this will work on CPU? #36
Comments
Yep, I need help running this on CPU too. I am downloading and merging the models now.
Hi! Has there been any progress on running it on a CPU? I'm really interested in this as well, since I don't have a powerful GPU. Any updates or workarounds you've discovered would be greatly appreciated. Thanks!
@kenneth104
No, I can't run it on CPU; something still needs CUDA and it throws an error.
If you want to run the demo on CPU, you need to use float32 and initialize all parameters on the CPU.
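A minimal sketch of what that could look like with PyTorch, assuming a standard checkpoint file; the model class, config, checkpoint path, and inference call here are hypothetical placeholders, not the project's actual API:

```python
import torch

device = torch.device("cpu")

# map_location="cpu" keeps torch from trying to place tensors on CUDA.
state_dict = torch.load("checkpoint.pth", map_location=device)  # hypothetical path

model = MyMultimodalModel(**model_config)   # hypothetical model class/config
model.load_state_dict(state_dict, strict=False)

# .float() casts any half-precision weights back to float32,
# since most CPU kernels do not support fp16.
model = model.float().to(device).eval()

with torch.no_grad():
    output = model.generate(inputs)          # hypothetical inference call
```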
@liyaozong1991 thanks, I'll try this.
First of all, thanks for this great project! The output quality seems very good, and the idea of running a multimodal model locally is awesome. It feels like we already have a GPT-4-like multimodal model in our hands, which is very exciting. I was wondering whether it is possible to run this with llama.cpp on CPU? I am currently running Vicuna-13B on CPU (the 4-bit quantized version), and around 8 GB of RAM is enough. It works just fine, and the inference speed is about 1.5 tokens per second on my computer. (It also seems to work on mobile phones with enough memory; I did not try it myself, but I saw a few examples.) llama.cpp has its own file format (ggml) and provides a way to convert the original weights to it. It would be great if people with low VRAM or no GPU at all could make this work on CPU. Any thoughts?
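For reference, here is a rough sketch of how the language-model half could be driven on CPU from a 4-bit ggml file, using the llama-cpp-python bindings. The model path, thread count, and prompt format are assumptions for illustration, and this covers only the text side; the vision encoder would still need its own CPU-compatible path to feed image features into the prompt.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a 4-bit (q4_0) ggml conversion of the Vicuna-13B weights.
llm = Llama(
    model_path="./models/vicuna-13b-ggml-q4_0.bin",
    n_ctx=2048,     # context window
    n_threads=8,    # CPU threads; tune to your machine
)

# Plain text-only inference using a Vicuna-style prompt.
result = llm(
    "### Human: Describe what you see in the image.\n### Assistant:",
    max_tokens=128,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```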