
ggml-old-vic13b-q5_1.bin not supported #567

Closed
DjToMeK30 opened this issue May 31, 2023 · 16 comments

@DjToMeK30

Where can I see supported models?
I tried these Vicuna models and none of them work properly. I always get an error like:
error loading model: this format is no longer supported (see ggerganov/llama.cpp#1305)

DjToMeK30 added the bug (Something isn't working) label on May 31, 2023
@shaggy2626

> Where can I see supported models? I tried these Vicuna models and none of them work properly. I always get an error like: error loading model: this format is no longer supported (see ggerganov/llama.cpp#1305)

I got the same error with GPT4All-13B-snoozy.ggmlv3.q8_0.bin.

@DjToMeK30
Author

@shaggy2626 I used this method
#220 (comment)
pip install llama-cpp-python==0.1.48

Then I changed
MODEL_N_CTX=2048

I also applied this method
#517 (comment)

Now it works better, but I think only with PDFs. CSV ingestion finds only one row, and HTML pages are no good either. I am exporting a Google spreadsheet (Excel) to PDF.
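
For anyone following along, here is roughly what that looked like on my side. Treat it as a sketch rather than an exact copy of my files: the .env keys below are the stock ones from the primordial PrivateGPT example.env, and the paths are examples.

```bash
# Pin llama-cpp-python to a release that still reads the old GGML format
pip install llama-cpp-python==0.1.48

# .env (stock primordial PrivateGPT keys; adjust paths to your setup)
# MODEL_TYPE=LlamaCpp
# MODEL_PATH=models/ggml-old-vic13b-q5_1.bin
# MODEL_N_CTX=2048
# EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2

# Re-ingest the documents and query as usual
python ingest.py
python privateGPT.py
```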

@shaggy2626

shaggy2626 commented May 31, 2023

> @shaggy2626 I used this method #220 (comment): pip install llama-cpp-python==0.1.48
>
> Then I changed MODEL_N_CTX=2048
>
> I also applied this method #517 (comment)
>
> Now it works better, but I think only with PDFs. CSV ingestion finds only one row, and HTML pages are no good either. I am exporting a Google spreadsheet (Excel) to PDF.

How long does it take yours to provide a response? I have 64 GB of RAM and it takes an average of 2 minutes per query. I'm trying to use this model https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML/blob/main/GPT4All-13B-snoozy.ggmlv3.q8_0.bin together with all-mpnet-base-v2 for the embeddings.
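
For reference, the relevant .env entries on my side are roughly these (key names assumed to be the stock primordial PrivateGPT ones; the model path is just wherever the file was downloaded):

```bash
# .env (assuming the stock primordial PrivateGPT keys)
MODEL_PATH=models/GPT4All-13B-snoozy.ggmlv3.q8_0.bin
EMBEDDINGS_MODEL_NAME=all-mpnet-base-v2
```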

@DjToMeK30
Author

I use
MODEL_TYPE=LlamaCpp
MODEL_PATH=ggml-old-vic13b-q5_1.bin

But yeah, it takes about 2-3 minutes for a response; it starts by loading the model into memory, and I have 32 GB of RAM.
The responses themselves are poor on my side, though. I haven't found it useful for my scenario yet.
Maybe it will get better once CSV ingestion is fixed, because saving an Excel/Google spreadsheet as PDF is not really practical.

@jackfood

jackfood commented Jun 1, 2023

Agreed that the returned results are not quite there yet in terms of completeness. Either the embedding technique is not good enough, or the LLM is not up to the standard of what I am asking.

@StephenDWright

Just for comparison, I am using Wizard-Vicuna 13B GGML, but with the GPU implementation, where some of the work gets offloaded. Answers take about 4-5 seconds to start generating, 2-3 seconds when asking several back to back. Answers are pretty good: if an OpenAI implementation were a 10, the answers I get based on the context are a solid 8. Fairly complete and accurate, with sometimes a minor mix-up due to the order it states the information in. Depending on the question ("Tell me about topic" vs. "What is specific thing in topic") it can give pretty lengthy and rich answers.

@DjToMeK30
Author

> Just for comparison, I am using Wizard-Vicuna 13B GGML, but with the GPU implementation, where some of the work gets offloaded. Answers take about 4-5 seconds to start generating, 2-3 seconds when asking several back to back. Answers are pretty good: if an OpenAI implementation were a 10, the answers I get based on the context are a solid 8. Fairly complete and accurate, with sometimes a minor mix-up due to the order it states the information in. Depending on the question ("Tell me about topic" vs. "What is specific thing in topic") it can give pretty lengthy and rich answers.

How did you enable the GPU, and which GPU do you have? Are you running this on your own files? What type?

@StephenDWright

> Just for comparison, I am using Wizard-Vicuna 13B GGML, but with the GPU implementation, where some of the work gets offloaded. Answers take about 4-5 seconds to start generating, 2-3 seconds when asking several back to back. Answers are pretty good: if an OpenAI implementation were a 10, the answers I get based on the context are a solid 8. Fairly complete and accurate, with sometimes a minor mix-up due to the order it states the information in. Depending on the question ("Tell me about topic" vs. "What is specific thing in topic") it can give pretty lengthy and rich answers.
>
> How did you enable the GPU, and which GPU do you have? Are you running this on your own files? What type?

I followed the instructions in the pull request that enables GPU support. It is a bit touchy, but I got it to work, and it really makes it fly. I also had to make a change from the pull request that improves performance. The GPU I am using is a 12 GB RTX 3060. I am currently using it on documentation about labor laws for the teaching profession in my country.

@GuoChang2032

@StephenDWright I also tried to enable GPU acceleration, but it was not successful. Can you give me more details? Thank you.

@StephenDWright

> @StephenDWright I also tried to enable GPU acceleration, but it was not successful. Can you give me more details? Thank you.

I followed the instructions in the thread. In order for it to work, the installed llama-cpp-python version needs to be at least 0.1.54; I think the one in the requirements file is 0.1.50. If you installed 0.1.50, you have to uninstall it and reinstall with a flag that makes sure pip doesn't pull it back from its cache.

I also had to install PyTorch, the NVIDIA CUDA Toolkit 11.8, and CMake, which I installed through Visual Studio. Then you follow the instructions in the thread to build using the flags provided. Even then I had an issue where I ended up with two llama-cpp directories: I had to manually remove one from the build directory and move the 0.1.54 directory into the root of the build directory. It's a little scrappy and, to be honest, I'm not sure I could repeat the install. I removed and re-cloned this repository many times before I got it to work. As far as I could tell, though, my issue came down to getting the right version of llama-cpp-python. By the way, I am using a venv in Visual Studio Code.
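
The llama-cpp-python part boiled down to roughly the following. This is a bash sketch of the idea rather than the exact commands I ran (I was in PowerShell, and versions/paths may differ on your machine):

```bash
# Remove any existing CPU-only build so pip can't silently reuse it
pip uninstall -y llama-cpp-python

# Rebuild with cuBLAS enabled; needs the CUDA Toolkit and CMake installed.
# GPU offloading needs llama-cpp-python >= 0.1.54, and --no-cache-dir stops
# pip from reinstalling the cached CPU-only wheel.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python==0.1.54 --no-cache-dir
```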

You should read through the thread, though. What error are you getting?

@GuoChang2032

@StephenDWright I didn't encounter any errors, but the GPU just doesn't work, and I feel like my CUDA toolkit hasn't been fixed properly :(

@StephenDWright

> @StephenDWright I didn't encounter any errors, but the GPU just doesn't work, and I feel like my CUDA toolkit hasn't been fixed properly :(

What kind of GPU is it, and when the model loads, what does it output? It should output something like this near the end:

llama_model_load_internal: [cublas] offloading 32 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 6656 MB

When you ran

$Env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"; $Env:FORCE_CMAKE=1; py ./setup.py install

Does it say it found the CUDA toolkit, show the path to the toolkit, and also report that cuBLAS was found?

@GuoChang2032

> @StephenDWright I didn't encounter any errors, but the GPU just doesn't work, and I feel like my CUDA toolkit hasn't been fixed properly :(
>
> What kind of GPU is it, and when the model loads, what does it output? It should output something like this near the end:
>
> llama_model_load_internal: [cublas] offloading 32 layers to GPU
> llama_model_load_internal: [cublas] total VRAM used: 6656 MB
>
> When you ran
>
> $Env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"; $Env:FORCE_CMAKE=1; py ./setup.py install
>
> Does it say it found the CUDA toolkit, show the path to the toolkit, and also report that cuBLAS was found?

I installed it in a Conda environment and that resolved this issue, but now I have run into other problems. Thank you very much, though. I salute you.

@DjToMeK30
Author

@StephenDWright thanks for the follow-up. I was wondering what type of graphics card I would need to make this somewhat usable; I don't think my GPU would handle it like yours does :)
A follow-up question: if you wanted 10 clients asking questions of your local AI, how good a graphics card would your PC need in order to handle 10 different conversations about your documents?

@StephenDWright

> @StephenDWright thanks for the follow-up. I was wondering what type of graphics card I would need to make this somewhat usable; I don't think my GPU would handle it like yours does :)
> A follow-up question: if you wanted 10 clients asking questions of your local AI, how good a graphics card would your PC need in order to handle 10 different conversations about your documents?

I have no idea how it would work with 10 different clients at the same time. I know that when I was trying to make a GUI for it, I had to be careful to load the model once up front rather than on every question, because otherwise it just keeps allocating VRAM and runs out. It will probably also run very slowly if you try to generate tokens for multiple clients at the same time. 🤷🏽‍♂️ You would need a pretty beefy card, and probably more than a few of them, to make it usable for multiple clients, even if that number is only 10. At that point, you may as well just use OpenAI's API.

@DjToMeK30
Author

DjToMeK30 commented Jun 5, 2023

> @StephenDWright thanks for the follow-up. I was wondering what type of graphics card I would need to make this somewhat usable; I don't think my GPU would handle it like yours does :)
> A follow-up question: if you wanted 10 clients asking questions of your local AI, how good a graphics card would your PC need in order to handle 10 different conversations about your documents?
>
> I have no idea how it would work with 10 different clients at the same time. I know that when I was trying to make a GUI for it, I had to be careful to load the model once up front rather than on every question, because otherwise it just keeps allocating VRAM and runs out. It will probably also run very slowly if you try to generate tokens for multiple clients at the same time. 🤷🏽‍♂️ You would need a pretty beefy card, and probably more than a few of them, to make it usable for multiple clients, even if that number is only 10. At that point, you may as well just use OpenAI's API.

I know that I would need a much stronger GPU, or a stronger system in general. I'm just wondering how much you would need to accomplish something like that, regardless of the price. And of course you would then also need to save the model state for each client and load it when needed.

mikepsinn added a commit to mikepsinn/privateGPT that referenced this issue Jun 11, 2023
imartinez added the primordial (Related to the primordial version of PrivateGPT, which is now frozen in favour of the new PrivateGPT) label on Oct 19, 2023