
BGE embedding issue related to Model Dimension #1768

Open
nittulukose opened this issue Mar 20, 2024 · 4 comments

Comments

nittulukose commented Mar 20, 2024

I have managed to deploy privateGPT with SageMaker endpoints. I have used the following models:
LLM Model: Mistral 7B Instruct
Embedding Model: BGE Base En V1.5 (dimension 768)

This worked fine.

But when I change the embedding model to any other BGE model (with a dimension other than 768), I get the following error:

ValueError: shapes (4,768) and (384,) not aligned: 768 (dim 1) != 384 (dim 0)

The given error is for the model bge-small-en-v1.5 (Dimension 384).
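The error itself is just a matrix-vector shape mismatch at the similarity-scoring step: the index holds 768-dimensional vectors, while the new query embedding has 384 dimensions. A minimal numpy sketch (illustrative values, not privateGPT's actual code) reproduces it:

```python
import numpy as np

# Index built with bge-base-en-v1.5 -> 768-dim vectors (4 docs here).
stored = np.zeros((4, 768))
# Query embedded with bge-small-en-v1.5 -> 384-dim vector.
query = np.zeros(384)

try:
    scores = stored.dot(query)  # similarity scoring step
except ValueError as err:
    print(err)  # shapes (4,768) and (384,) not aligned: 768 (dim 1) != 384 (dim 0)
```

Vectors already stored in the index keep the dimension of the model that produced them, so simply swapping the embedding model cannot work without rebuilding the store.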

How can I resolve this? I want to use other BGE embedding models with dimensions other than 768.

It seems that the privateGPT application, when running in "sagemaker" mode, expects dimension 768.

How can I explicitly change that to a custom value?

I need to run the embedding model "bge-m3" (dimension 1024) for multilingual support.

Looking forward to a solution.

dbzoo (Contributor) commented Mar 20, 2024

You cannot mix embedding models with different dimensions in the same storage backend without reinitializing it, specifically the vector DB. Since re-initializing the vector store to change dimensions also invalidates the index/doc stores tied to it, you need to reset all database stores.

If you are using simple, chromadb, or postgres, this will wipe the stores (see #1772):
$ make wipe

nittulukose (Author) commented

Hi,

Thank you for your response.

In this case, I am using the "qdrant" vector store. Could you please let me know how to re-init the "qdrant" vector store?

dbzoo commented Mar 22, 2024

If you are using qdrant locally, you can delete the directories and files within its storage path.
Otherwise you need this #1783
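For the local case, the cleanup can be sketched as a small Python helper (the default path `local_data/private_gpt/qdrant` is an assumption based on privateGPT's default local_data layout; adjust if yours differs):

```python
import shutil
from pathlib import Path

def reset_local_qdrant(store_dir: str = "local_data/private_gpt/qdrant") -> bool:
    """Delete the on-disk qdrant store so it is re-created with the
    new embedding dimension on the next startup.

    Returns True if a store was found and removed.
    """
    path = Path(store_dir)
    if path.exists():
        shutil.rmtree(path)  # removes the collection files and metadata
        return True
    return False
```

After removing the directory, restart privateGPT and re-ingest your documents so the collection is rebuilt with the new model's dimension.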

nittulukose (Author) commented

Hi @dbzoo

Thanks, that worked!!!

I have cleared the contents of the directory local_data/private_gpt/qdrant, and the dimension issue is now fixed.

Please suggest an LLM and embedding model combination that supports 'Query Doc' mode for multiple languages (both input and output).

My requirements:

  1. Query Doc Mode
  2. Multilingual support (especially Arabic and English)
  3. Accuracy

Currently I'm using:
LLM Model: Mistral 7B Instruct
Embedding Model: BGE M3 (dim 1024), which claims multilingual support.
But in this setup, Query Doc mode is not accurate.
