Use privateGPT for multi-document QA

Ziqing Yang edited this page Jun 8, 2023 · 7 revisions

privateGPT is an open-source project built on llama-cpp-python, LangChain, and other libraries. It provides an interface for analyzing documents and running interactive Q&A over them entirely on the local machine. Users can analyze local documents and ask questions about their content using GPT4All or llama.cpp-compatible model files, so the data stays local and private.

Because the models in this project are derived from LLaMA, this article uses a GGML-format model from llama.cpp as the example when introducing privateGPT.

For more details and usage, please refer to the official privateGPT repository.

Prerequisites: Install llama-cpp-python

privateGPT loads the llama.cpp GGML model through llama-cpp-python, so this package must be installed first. The following command installs it without any acceleration library.

pip install llama-cpp-python

To install a build with OpenBLAS, cuBLAS, or CLBlast support, please refer to the llama-cpp-python installation instructions.

Must-read for Mac users with M-series chips

Make sure that the Python in the current environment is built for the arm64 architecture; otherwise execution will be more than 10x slower. To test this, run the following Python commands after installing llama-cpp-python, replacing the model path with a local GGML model file supported by llama.cpp.

>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")

If the log shows NEON = 1, the installation is correct; NEON = 0 indicates that llama-cpp-python was not built for the arm64 architecture. Below is an example log line with ARM NEON acceleration enabled.

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
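To check these flags programmatically rather than by eye, the log line can be parsed with a few lines of Python. This is a minimal sketch that assumes the `name = value` pairs are separated by `|`, as in the sample above; `parse_system_info` is a hypothetical helper, not part of llama-cpp-python.

```python
def parse_system_info(line: str) -> dict[str, str]:
    """Parse a llama.cpp 'system_info' log line into a {flag: value} dict."""
    # Drop the leading "system_info:" label if it is present.
    _, _, body = line.partition("system_info:")
    if not body:
        body = line
    flags = {}
    for field in body.split("|"):
        name, sep, value = field.partition("=")
        if sep:  # skip empty fields such as the one after the trailing "|"
            flags[name.strip()] = value.strip()
    return flags


# Shortened copy of the sample log line shown above.
sample = (
    "system_info: n_threads = 8 / 10 | AVX = 0 | NEON = 1 | ARM_FMA = 1 | BLAS = 1 |"
)
flags = parse_system_info(sample)
print("NEON enabled:", flags.get("NEON") == "1")
```

Only the `NEON` entry matters for the arm64 check; the other flags are kept so the same helper can be reused when inspecting BLAS or AVX support.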

How to install an arm64-compatible Python?

If you use conda, the following command creates a suitable environment. Python 3.10 is chosen here to meet the requirements of privateGPT.

CONDA_SUBDIR=osx-arm64 conda create -n privategpt python=3.10 -c conda-forge
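Once the environment is activated, a quick way to confirm which architecture the interpreter targets is the standard library's `platform.machine()`. The sketch below is an assumption-laden check, not an official test: it assumes that a native build reports `arm64` (or `aarch64`) and that an x86_64 build running under Rosetta reports `x86_64`. The helper name `is_arm64` is made up for illustration.

```python
import platform


def is_arm64(machine: str) -> bool:
    """Return True if the machine string denotes 64-bit ARM."""
    return machine.lower() in {"arm64", "aarch64"}


machine = platform.machine()
if is_arm64(machine):
    print(f"Python is running natively on {machine}")
else:
    # On an M-series Mac this usually points to an x86_64 Python build.
    print(f"Python reports {machine!r}, which is not an arm64 build")
```

This check needs no model file, so it can be run before installing llama-cpp-python; the NEON flag in the model-loading log remains the more direct confirmation.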

Step 1: Clone the repository and install the dependencies

Once llama-cpp-python is installed correctly, you can install privateGPT with the following commands (note that Python >= 3.10 is required).

git clone
cd privateGPT
pip3 install -r requirements.txt

Step 2: Modify the configuration file

Create a configuration file named .env in the root directory of privateGPT and set the following fields:

  • MODEL_TYPE: Fill in LlamaCpp

  • PERSIST_DIRECTORY: The location where the ingested document data (the vector store) is saved. Here, a db directory will be created in the root directory of privateGPT.

  • MODEL_PATH: The location of the large model file, which here points to the GGML file supported by llama.cpp.

  • MODEL_N_CTX: The maximum context length of the large model in tokens, set to 2048.

  • EMBEDDINGS_MODEL_NAME: The SentenceTransformers embedding model to use. This can be a model name on Hugging Face (downloaded automatically) or a local path. For other supported models, refer to the SentenceTransformers list of pretrained models.
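As a sanity check before running anything, the five fields above can be validated with plain Python. The sketch below parses simple `KEY=VALUE` lines itself instead of relying on any particular loader; the values in `EXAMPLE_ENV` are placeholders for illustration only (the model path is borrowed from the earlier test command, and the embedding model name is deliberately left as a placeholder), and `parse_env` and `missing_keys` are hypothetical helpers.

```python
REQUIRED_KEYS = (
    "MODEL_TYPE",
    "PERSIST_DIRECTORY",
    "MODEL_PATH",
    "MODEL_N_CTX",
    "EMBEDDINGS_MODEL_NAME",
)

# Placeholder values for illustration only; adjust every path and name.
EXAMPLE_ENV = """\
MODEL_TYPE=LlamaCpp
PERSIST_DIRECTORY=db
MODEL_PATH=./models/7B/ggml-model.bin
MODEL_N_CTX=2048
EMBEDDINGS_MODEL_NAME=<embedding-model-name-or-local-path>
"""


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


settings = parse_env(EXAMPLE_ENV)
print("missing fields:", missing_keys(settings))
```

To check a real file, pass the contents of .env to `parse_env` instead of `EXAMPLE_ENV`.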

Step 3: Analyze local files

privateGPT can analyze many common document formats, including the following (only the most commonly used are listed):

  • Word files: .doc, .docx
  • PPT files: .ppt, .pptx
  • PDF files: .pdf
  • Plain text files: .txt
  • CSV files: .csv
  • Markdown files: .md
  • Email files: .eml, .msg
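Before ingesting, it can help to see which files in a folder fall outside the formats listed above. The following is a small sketch using only the standard library; the extension set is copied from this list (privateGPT itself may accept more), and `split_by_support` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions taken from the list above; privateGPT itself may support more.
COMMON_EXTENSIONS = {
    ".doc", ".docx", ".ppt", ".pptx", ".pdf",
    ".txt", ".csv", ".md", ".eml", ".msg",
}


def split_by_support(folder: str) -> tuple[list[str], list[str]]:
    """Return (listed, other) file names found directly in `folder`."""
    listed, other = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # subdirectories are ignored in this sketch
        bucket = listed if path.suffix.lower() in COMMON_EXTENSIONS else other
        bucket.append(path.name)
    return listed, other


folder = "source_documents"
if Path(folder).is_dir():
    listed, other = split_by_support(folder)
    print("listed formats:", listed)
    print("not in the list above:", other)
```

The comparison is case-insensitive, so `report.PDF` is treated the same as `report.pdf`.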

Place the documents to be analyzed (one or more) in the source_documents directory under the root directory of privateGPT. Here, the LangChain sample data from this project is used as an example. The directory contents look similar to the following:

> ls source_documents

Next, run the ingest command to analyze the document.


The output is as follows (on the author's M1 Max, the analysis took only a few seconds). Note that the embedding model named in the configuration file is downloaded on first use if it is given as a Hugging Face model name rather than a local path.

Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.42s/it]
Loaded 1 new documents from source_documents
Split into 7 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run to query your documents

⚠️ Note: If the db directory already contains data from a previous ingestion, new data is added to it rather than replacing it. If you only want to query the current documents, clear the db directory before ingesting.
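Clearing the directory can be scripted. The sketch below is one possible way to do it with the standard library; `reset_vectorstore` is a hypothetical helper, and it assumes the directory passed in is the one configured as PERSIST_DIRECTORY.

```python
import shutil
from pathlib import Path


def reset_vectorstore(persist_directory: str) -> bool:
    """Delete the persisted index so the next ingest starts from scratch.

    Returns True if a directory was removed, False if none existed.
    """
    path = Path(persist_directory)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # irreversible: removes the whole directory tree
    return True
```

With the configuration used in this article, calling `reset_vectorstore("db")` from the privateGPT root directory before ingesting would discard the previously stored data. Double-check the path first, since the deletion cannot be undone.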

Step 4: Ask questions about the document locally

Once the documents have been analyzed in the previous step, run the following command to start asking questions about them:


After the following prompt appears, you can enter a question:

Enter a query: 

For example, ask 李白的诗是什么风格? ("What is the style of Li Bai's poetry?"). The result is as follows:

> Answer:

> source_documents/doc.txt:

> source_documents/doc.txt:
李白曾经在唐玄宗天宝元年(742年)供奉翰林。有一次皇帝因酒酣问李白说:“我朝与天后(武后)之朝何如?”白曰:“天后朝政出多门,国由奸幸,任人之道,如小儿市瓜,不择香味,惟拣肥大者;我朝任人如淘沙取金,剖石采用,皆得其精粹者。”玄宗听后大笑不止[8][9]。但是由于他桀骜不驯的性格,所以仅仅不到两年他就离开了长安。据说是因为他作的《清平调》得罪了当时宠冠后宫的杨贵妃(因李白命“力士脱靴”,高力士引以为大耻,因而以言语诱使杨贵妃认为“可怜飞燕倚新妆”几句是讽刺她)而不容于宫中[注 3]。天宝三年(745年)“恳求还山,帝赐金放还”,离开长安。

> source_documents/doc.txt:
李白[注 1](701年5月19日—762年11月30日),字太白,号青莲居士,中国唐朝诗人。李白自言祖籍陇西成纪(今甘肃静宁西南),汉飞将军李广后裔,西凉武昭王李暠之后,与李唐皇室同宗。
一说其幼时内迁,寄籍剑南道绵州昌隆(今四川省江油市青莲镇)。一说先人隋末被窜于碎叶,出生于碎叶,属唐安西都护府(今吉尔吉斯斯坦共和国楚河州托克马克市)。有“诗仙”、“诗侠”、“酒仙”、“谪仙人”等称呼,活跃于盛唐[1],为杰出的浪漫主义诗人。与杜甫合称“李杜”[注 2]。被贺知章呼为“天上谪仙”、“李谪仙”。

> source_documents/doc.txt:

The whole process is not very fast: it took about half a minute to return the result, along with the 4 source passages it was based on.

Enter exit to quit the script.

Advanced configuration

Use more threads for acceleration

privateGPT calls the llama-cpp-python interface, so without any code changes the default decoding settings are used. Open the privateGPT query script and find the following statement (around lines 30-35, depending on the version).

llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False)

This is where the LlamaCpp model is defined. Additional parameters can be passed according to the llama-cpp-python interface. The following example sets the number of decoding threads, which helps improve decoding speed (configure it according to the actual number of physical cores).

llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False, n_threads=8)

A few lines after this definition, LangChain's RetrievalQA is used for the interaction. For its definition and configuration options, please refer to the LangChain documentation.

Use Alpaca prompt template

If you are using Alpaca models, you can also pass a prompt template to the chain before generation. For example, near line 39, replace the following code

    qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents= not args.hide_source)

with

    # Alpaca-style instruction template; {context} receives the retrieved text.
    prompt_template = ("Below is an instruction that describes a task. "
                      "Write a response that appropriately completes the request.\n\n"
                      "### Instruction:\n{context}\n\n{question}\n\n### Response: ")
    from langchain import PromptTemplate
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["context","question"])
    # Pass the custom prompt to the "stuff" chain through chain_type_kwargs.
    qa = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=retriever,
        return_source_documents=not args.hide_source,
        chain_type_kwargs={"prompt": PROMPT})
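To preview what the model actually receives with this template, the substitution can be reproduced with plain string formatting, without loading a model or LangChain. The sketch below assumes the retrieved chunks are joined with blank lines before being placed in `{context}`; `render_stuff_prompt` and the sample passages are made up for illustration.

```python
prompt_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{context}\n\n{question}\n\n### Response: "
)


def render_stuff_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and fill the Alpaca template."""
    # Assumption: chunks are joined with blank lines before substitution.
    context = "\n\n".join(chunks)
    return prompt_template.format(context=context, question=question)


preview = render_stuff_prompt(
    ["First retrieved passage.", "Second retrieved passage."],
    "What do the passages say?",
)
print(preview)
```

Because every retrieved chunk lands in a single prompt, the total length grows with the number and size of chunks and has to fit within the configured MODEL_N_CTX.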

Optimize LangChain

By default, the chain type is stuff, which is not suitable for long documents. You can switch to the refine or map_reduce chain; please refer to the LangChain examples. For instance, to use refine, first define two prompt templates:

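    refine_prompt_template = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        # "This is the original question: ..."
        "这是原始问题: {question}\n"
        # "Existing answer: ..."
        "已有的回答: {existing_answer}\n"
        # Newly retrieved text; the placeholder is required by input_variables.
        "{context_str}\n"
        "### Response: "
    )

    initial_qa_template = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        # Background text; the placeholder is required by input_variables.
        "{context_str}\n"
        # "Based on the background knowledge above, answer this question: ..."
        "请根据以上背景知识, 回答这个问题:{question}。\n\n"
        "### Response: "
    )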

and modify the code around line 39:

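    from langchain import PromptTemplate
    # Prompt used on every chunk after the first to refine the answer.
    refine_prompt = PromptTemplate(
        input_variables=["question", "existing_answer", "context_str"],
        template=refine_prompt_template,
    )
    # Prompt used on the first chunk to produce the initial answer.
    initial_qa_prompt = PromptTemplate(
        input_variables=["context_str", "question"],
        template=initial_qa_template,
    )
    chain_type_kwargs = {"question_prompt": initial_qa_prompt, "refine_prompt": refine_prompt}
    qa = RetrievalQA.from_chain_type(
        llm=llm, chain_type="refine",
        retriever=retriever, return_source_documents=not args.hide_source,
        chain_type_kwargs=chain_type_kwargs)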
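The idea behind the refine strategy can be illustrated without LangChain or a real model. The sketch below is a simplified stand-in, not LangChain's implementation: the first chunk is answered with the initial prompt, and each further chunk is sent together with the previous answer for refinement. The English templates, `refine_answer`, and `fake_llm` are all hypothetical and exist only for this illustration.

```python
from typing import Callable

# Simplified English stand-ins for the two templates (hypothetical wording).
INITIAL_TEMPLATE = "Context:\n{context_str}\nQuestion: {question}\nAnswer: "
REFINE_TEMPLATE = (
    "Question: {question}\n"
    "Existing answer: {existing_answer}\n"
    "New context:\n{context_str}\n"
    "Refined answer: "
)


def refine_answer(llm: Callable[[str], str], chunks: list[str], question: str) -> str:
    """Answer from the first chunk, then refine once per remaining chunk."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    answer = llm(INITIAL_TEMPLATE.format(context_str=chunks[0], question=question))
    for chunk in chunks[1:]:
        answer = llm(
            REFINE_TEMPLATE.format(
                question=question, existing_answer=answer, context_str=chunk
            )
        )
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in model that records each prompt and returns a numbered reply."""
    calls.append(prompt)
    return f"answer v{len(calls)}"


final = refine_answer(fake_llm, ["chunk A", "chunk B", "chunk C"], "What is it?")
print(final, "after", len(calls), "model calls")
```

Each prompt contains only one chunk plus the running answer, which is why this approach copes with long documents better than stuff. The trade-off is one model call per chunk, so answers take longer.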
