
BGE embedding issue related to Model Dimension #1768

Open
nittulukose opened this issue Mar 20, 2024 · 4 comments

Comments

nittulukose commented Mar 20, 2024

I have managed to deploy privateGPT with SageMaker endpoints. I have used the following models:
LLM Model: Mistral 7B Instruct
Embedding Model: BGE Base En V1.5 (dimension 768)

This worked fine.

But when I change the embedding model to any other BGE model (with a dimension other than 768), I get the following error:

ValueError: shapes (4,768) and (384,) not aligned: 768 (dim 1) != 384 (dim 0)

The given error is for the model bge-small-en-v1.5 (Dimension 384).
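The error itself is just a matrix-vector shape mismatch at the similarity-scoring step: the index holds 768-dimensional vectors, while the new query embedding has 384 dimensions. A minimal numpy sketch (illustrative values, not privateGPT's actual code) reproduces it:

```python
import numpy as np

# Index built with bge-base-en-v1.5 -> 768-dim vectors (4 docs here).
stored = np.zeros((4, 768))
# Query embedded with bge-small-en-v1.5 -> 384-dim vector.
query = np.zeros(384)

try:
    scores = stored.dot(query)  # similarity scoring step
except ValueError as err:
    print(err)  # shapes (4,768) and (384,) not aligned: 768 (dim 1) != 384 (dim 0)
```

Vectors already stored in the index keep the dimension of the model that produced them, so simply swapping the embedding model cannot work without rebuilding the store.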

How can I resolve this? I want to use other BGE embedding models with dimensions other than 768.

It seems that the privateGPT application, when running in "sagemaker" mode, expects dimension 768.

How can I explicitly change that to a custom value?

I need to run the embedding model "bge-m3" (dimension 1024) for multilingual support.

Looking forward to a solution.

dbzoo (Contributor) commented Mar 20, 2024

You cannot mix embedding models with different dimensions in the same storage backend without reinitializing it, specifically the vector DB. Since re-initializing the vector store to change dimensions also invalidates the index/doc stores tied to it, you need to reset all database stores.

If you are using simple, chromadb, or postgres, this will wipe the stores (see #1772):
$ make wipe

nittulukose (Author) commented

Hi,

Thank you for your response.

In this case, I am using the "qdrant" vector store. Could you please let me know how to re-init the "qdrant" vector store?

dbzoo commented Mar 22, 2024

If you are using qdrant locally, you can delete the directories and files within its storage path.
Otherwise you need this #1783
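For the local case, the cleanup can be sketched as a small Python helper (the default path `local_data/private_gpt/qdrant` is an assumption based on privateGPT's default local_data layout; adjust if yours differs):

```python
import shutil
from pathlib import Path

def reset_local_qdrant(store_dir: str = "local_data/private_gpt/qdrant") -> bool:
    """Delete the on-disk qdrant store so it is re-created with the
    new embedding dimension on the next startup.

    Returns True if a store was found and removed.
    """
    path = Path(store_dir)
    if path.exists():
        shutil.rmtree(path)  # removes the collection files and metadata
        return True
    return False
```

After removing the directory, restart privateGPT and re-ingest your documents so the collection is rebuilt with the new model's dimension.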

nittulukose (Author) commented

Hi @dbzoo

Thanks, that worked!!!

I have cleared the contents of the directory local_data/private_gpt/qdrant, and the dimension issue is now fixed.

Please suggest an LLM and embedding model combination that supports 'Query Doc' mode for multiple languages (both input and output).

My requirements:

  1. Query Doc Mode
  2. Multilingual support (especially Arabic and English)
  3. Accuracy

Currently I'm using:
LLM Model: Mistral 7B Instruct
Embedding Model: BGE M3 (dim 1024), which claims multilingual support.
But in this setup, Query Doc mode is not accurate.
