
How to set trust_remote_code=True in privateGPT to use nomic-ai/nomic-embed-text-v1.5 embed model from huggingface.co? #1893

Closed
01PAfXWT opened this issue Apr 29, 2024 · 3 comments

Comments

@01PAfXWT

First of all, grateful thanks to the authors of privateGPT for developing such a great app.

However, when I tried to use nomic-ai/nomic-embed-text-v1.5 from huggingface.co as an embedding model coupled with llamacpp for local setups, an error occurred as follows:

ValueError: Loading nomic-ai/nomic-embed-text-v1.5 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

I am wondering how to set the trust_remote_code=True option and pass it through to privateGPT properly. I tried inserting trust_remote_code: true under both the huggingface and llamacpp sections of the YAML file, but neither worked.
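For context, the error comes from the safety gate transformers applies when a model repo ships its own Python code, as nomic-embed-text-v1.5 does. The sketch below is a simplified, offline stand-in for that check (the real logic lives inside transformers' model-loading machinery); it only illustrates why the ValueError is raised and which flag disables it.

```python
# Simplified stand-in for the remote-code gate in transformers.
# Repos that ship custom code require an explicit opt-in before loading.

def resolve_model(repo_id: str, has_remote_code: bool,
                  trust_remote_code: bool = False) -> str:
    if has_remote_code and not trust_remote_code:
        raise ValueError(
            f"Loading {repo_id} requires you to execute the configuration "
            "file in that repo on your local machine. Make sure you have "
            "read the code there to avoid malicious use, then set the "
            "option `trust_remote_code=True` to remove this error."
        )
    return f"loaded {repo_id}"

# Opting in succeeds; omitting the flag reproduces the error above.
resolve_model("nomic-ai/nomic-embed-text-v1.5",
              has_remote_code=True, trust_remote_code=True)
```

So the question is really about plumbing: the flag has to travel from privateGPT's YAML settings down to the call that loads the model.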

The following is my configuration profile.

server:
  env_name: ${APP_ENV:demo-local}

llm:
  mode: llamacpp
  max_new_tokens: 512
  context_window: 3900
  tokenizer: mistralai/Mistral-7B-Instruct-v0.2
  temperature: 0.1  

llamacpp:
  prompt_style: "mistral"
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
  tfs_z: 1.0 
  top_k: 40 
  top_p: 1.0 
  repeat_penalty: 1.1
  
embedding:
  mode: huggingface
  embed_dim: 768
  ingest_mode: pipeline
  count_workers: 4

huggingface:
  embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5

  
rag:
  similarity_top_k: 5
  rerank:
    enabled: false
    model: cross-encoder/ms-marco-MiniLM-L-2-v2
    top_n: 3

data:
  local_data_folder: ../KBs/demo
  
vectorstore:
  database: qdrant

qdrant:
  path: ../KBs/demo/qdrant

nodestore:
  database: simple

Any suggestions? Thank you.

@hamzahassan66

hamzahassan66 commented May 17, 2024

@01PAfXWT Hi, same situation here. Did you find any solution for this?

@01PAfXWT
Author

To set the trust_remote_code option in privateGPT, you need to modify the following code sections:

  1. Add a trust_remote_code field to private_gpt\settings\settings.py, with the default set to False:
class HuggingFaceSettings(BaseModel):
    embedding_hf_model_name: str = Field(
        description="Name of the HuggingFace model to use for embeddings"
    )
    access_token: str = Field(
        None,
        description="Huggingface access token, required to download some models",
    )
    trust_remote_code: bool = Field(
        False,
        description="Trust remote code when downloading models",
    )
  2. Pass trust_remote_code=settings.huggingface.trust_remote_code through to the embedding constructor in private_gpt\components\embedding\embedding_component.py:
self.embedding_model = HuggingFaceEmbedding(
    model_name=settings.huggingface.embedding_hf_model_name,
    cache_folder=str(models_cache_path),
    trust_remote_code=settings.huggingface.trust_remote_code,
)
  3. Finally, add trust_remote_code: true under the huggingface section of settings-{profile}.yaml:
huggingface:
  embedding_hf_model_name: nomic-ai/nomic-embed-text-v1.5
  access_token: ${HUGGINGFACE_TOKEN:********}
  trust_remote_code: true

The above modifications work for me.
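The three steps above can be sketched end-to-end with plain dictionaries. The loader below is a stand-in, not privateGPT's actual pydantic settings machinery; only the key names are taken from the steps above.

```python
# Illustrative plumbing: how the YAML flag reaches the kwargs that
# privateGPT ultimately passes to HuggingFaceEmbedding(...).

def embedding_kwargs(settings: dict) -> dict:
    hf = settings.get("huggingface", {})
    return {
        "model_name": hf["embedding_hf_model_name"],
        # Defaults to False so executing remote code stays opt-in,
        # matching the Field(False, ...) default added in step 1.
        "trust_remote_code": hf.get("trust_remote_code", False),
    }

profile = {
    "huggingface": {
        "embedding_hf_model_name": "nomic-ai/nomic-embed-text-v1.5",
        "trust_remote_code": True,  # from settings-{profile}.yaml, step 3
    }
}
kwargs = embedding_kwargs(profile)  # would feed HuggingFaceEmbedding(**kwargs)
```

The point of defaulting to False is that executing a repo's custom code remains a deliberate, per-profile decision rather than silent behavior.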

@hamzahassan66

@01PAfXWT Worked like a charm, thanks mate 👍
