# Retrieval Augmentation for GPT-4 using Pinecone

#### Fixing LLMs that Hallucinate

In this notebook we will learn how to retrieve contexts relevant to our queries from Pinecone and pass them to a GPT-4 model to generate an answer backed by real data sources.

GPT-4 is a big step up from previous OpenAI completion models. It also exclusively uses the `ChatCompletion` endpoint, so we call it slightly differently from the older completion models. However, the power of the model makes the change worthwhile, particularly when augmented with an external knowledge base like the Pinecone vector database.

Original Notebook link:
https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/pinecone


Modified by John Tan Chong Min on 11 Apr

Required installs for this notebook are:

In [1]:
!pip install -qU bs4 tiktoken openai langchain pinecone-client[grpc]


## Preparing the Data

In this example, we will download the LangChain documentation from [https://python.langchain.com/en/latest/](https://python.langchain.com/en/latest/). We get all `.html` files located on the site like so:

In [26]:
!wget -r -A.html -P rtdocs https://python.langchain.com/en/latest/

--2023-04-11 03:27:27--  https://python.langchain.com/en/latest/
Resolving python.langchain.com (python.langchain.com)... 104.17.33.82, 104.17.32.82, 2606:4700::6811:2152, ...
Connecting to python.langchain.com (python.langchain.com)|104.17.33.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘rtdocs/python.langchain.com/en/latest/index.html’


2023-04-11 03:27:27 (5.76 MB/s) - ‘rtdocs/python.langchain.com/en/latest/index.html’ saved [79548]

Loading robots.txt; please ignore errors.
--2023-04-11 03:27:27--  https://python.langchain.com/robots.txt
Reusing existing connection to python.langchain.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 95 [text/plain]
Saving to: ‘rtdocs/python.langchain.com/robots.txt.tmp’


2023-04-11 03:27:27 (17.2 MB/s) - ‘rt

This downloads all HTML into the `rtdocs` directory. Now we can use LangChain itself to process these docs. We do this using the `ReadTheDocsLoader` like so:

In [171]:
from langchain.document_loaders import ReadTheDocsLoader

loader = ReadTheDocsLoader('rtdocs')
docs = loader.load()
len(docs)




475

This leaves us with hundreds of processed doc pages. Let's take a look at the format each one contains:

In [172]:
docs[0]

Document(page_content='', metadata={'source': 'rtdocs/python.langchain.com/robots.txt.tmp'})

In [173]:
docs[1]

Document(page_content='.md\n.pdf\nTracing\n Contents \nTracing Walkthrough\nChanging Sessions\nTracing#\nBy enabling tracing in your LangChain runs, you’ll be able to more effectively visualize, step through, and debug your chains and agents.\nFirst, you should install tracing and set up your environment properly.\nYou can use either a locally hosted version of this (uses Docker) or a cloud hosted version (in closed alpha).\nIf you’re interested in using the hosted platform, please fill out the form here.\nLocally Hosted Setup\nCloud Hosted Setup\nTracing Walkthrough#\nWhen you first access the UI, you should see a page with your tracing sessions.\nAn initial one “default” should already be created for you.\nA session is just a way to group traces together.\nIf you click on a session, it will take you to a page with no recorded traces that says “No Runs.”\nYou can create a new session with the new session form.\nIf we click on the default session, we can see that to start we have no tr

We access the plaintext page content like so:

In [174]:
print(docs[1].page_content)

.md
.pdf
Tracing
 Contents 
Tracing Walkthrough
Changing Sessions
Tracing#
By enabling tracing in your LangChain runs, you’ll be able to more effectively visualize, step through, and debug your chains and agents.
First, you should install tracing and set up your environment properly.
You can use either a locally hosted version of this (uses Docker) or a cloud hosted version (in closed alpha).
If you’re interested in using the hosted platform, please fill out the form here.
Locally Hosted Setup
Cloud Hosted Setup
Tracing Walkthrough#
When you first access the UI, you should see a page with your tracing sessions.
An initial one “default” should already be created for you.
A session is just a way to group traces together.
If you click on a session, it will take you to a page with no recorded traces that says “No Runs.”
You can create a new session with the new session form.
If we click on the default session, we can see that to start we have no traces stored.
If we now start running chain

We can also find the source of each document:

In [175]:
docs[0].metadata['source']

'rtdocs/python.langchain.com/robots.txt.tmp'

In [176]:
docs[0].metadata['source'].replace('rtdocs/', 'https://')

'https://python.langchain.com/robots.txt.tmp'

Now we store the documents in a `data` list, where each record has two fields:
- `text`: the `page_content` of the HTML page
- `url`: the link to the HTML page

In [177]:
data = [{'text': subdoc.page_content, 'url': subdoc.metadata['source'].replace('rtdocs/', 'https://')} for subdoc in docs]

In [178]:
len(data)

475
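
Note that `docs[0]` above came from `robots.txt.tmp` and has an empty `page_content`, so `data` contains at least one record that is not a documentation page. Below is a minimal sketch of how such records could be filtered out while building the list. It uses plain dicts in place of LangChain `Document` objects, and the helper name `to_records`, the `root` parameter, and the sample paths are assumptions made for illustration:

```python
def to_records(raw_docs, root="rtdocs/"):
    """Build {'text', 'url'} records, skipping empty or non-HTML sources."""
    records = []
    for doc in raw_docs:
        source = doc["source"]
        text = doc["page_content"]
        # skip pages with no text and files that are not HTML pages
        if not text.strip() or not source.endswith(".html"):
            continue
        # strip only the leading download directory, not later occurrences
        path = source[len(root):] if source.startswith(root) else source
        records.append({"text": text, "url": "https://" + path})
    return records


# hypothetical sample input, shaped like the loader output shown above
sample_docs = [
    {"page_content": "", "source": "rtdocs/example.com/robots.txt.tmp"},
    {"page_content": "Some page text", "source": "rtdocs/example.com/docs/page.html"},
]
sample_records = to_records(sample_docs)
print(sample_records)
```

Slicing off the prefix, rather than calling `str.replace`, avoids rewriting a later `rtdocs/` that might appear inside a path.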

Let's inspect one of the records in `data`:

In [179]:
data[1]

{'text': '.md\n.pdf\nTracing\n Contents \nTracing Walkthrough\nChanging Sessions\nTracing#\nBy enabling tracing in your LangChain runs, you’ll be able to more effectively visualize, step through, and debug your chains and agents.\nFirst, you should install tracing and set up your environment properly.\nYou can use either a locally hosted version of this (uses Docker) or a cloud hosted version (in closed alpha).\nIf you’re interested in using the hosted platform, please fill out the form here.\nLocally Hosted Setup\nCloud Hosted Setup\nTracing Walkthrough#\nWhen you first access the UI, you should see a page with your tracing sessions.\nAn initial one “default” should already be created for you.\nA session is just a way to group traces together.\nIf you click on a session, it will take you to a page with no recorded traces that says “No Runs.”\nYou can create a new session with the new session form.\nIf we click on the default session, we can see that to start we have no traces stored.\

It's pretty ugly but it's good enough for now. Let's see how we can process all of these. We will split everything into chunks of roughly 400 tokens, which we can do easily with `langchain` and `tiktoken`:

In [180]:
import tiktoken

tokenizer = tiktoken.get_encoding('p50k_base')

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

In [52]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=20,
    length_function=tiktoken_len,
    separators=["\n\n", "\n", " ", ""]
)
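
To give a feel for what splitting on a priority list of separators means, here is a simplified sketch. It is not LangChain's implementation: it measures length in characters rather than tokens, ignores `chunk_overlap`, and the function name `recursive_split` is hypothetical. It tries the coarsest separator first, recurses with finer separators on pieces that are still too long, and then greedily merges neighbouring pieces back together:

```python
def recursive_split(text, separators, max_len):
    """Split text into pieces of at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text else []
    if not separators or separators[0] == "":
        # no separator left: fall back to a hard cut every max_len characters
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        # parts that are still too long are split again with finer separators
        pieces.extend(recursive_split(part, finer, max_len))

    # greedily merge adjacent pieces while the merged text still fits
    merged, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged


demo_text = "aaa bbb\n\nccc ddd eee"
demo_chunks = recursive_split(demo_text, ["\n\n", "\n", " ", ""], max_len=7)
print(demo_chunks)
```

The paragraph break is used first, so `aaa bbb` stays whole, and only the longer second paragraph is broken further at spaces.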

We apply this splitter to every record in `data` to produce the chunks:

In [181]:
from uuid import uuid4
from tqdm.auto import tqdm

chunks = []

for record in tqdm(data):
    # split the page text into ~400 token pieces
    texts = text_splitter.split_text(record['text'])
    # one record per piece, keeping its position and source URL
    chunks.extend([{
        'id': str(uuid4()),
        'text': text,
        'chunk': i,
        'url': record['url']
    } for i, text in enumerate(texts)])

  0%|          | 0/475 [00:00<?, ?it/s]

In [182]:
len(chunks)

2922

In [183]:
chunks[0]

{'id': '57b26110-8285-4c77-a798-2acba3d0c433',
 'text': '.md\n.pdf\nTracing\n Contents \nTracing Walkthrough\nChanging Sessions\nTracing#\nBy enabling tracing in your LangChain runs, you’ll be able to more effectively visualize, step through, and debug your chains and agents.\nFirst, you should install tracing and set up your environment properly.\nYou can use either a locally hosted version of this (uses Docker) or a cloud hosted version (in closed alpha).\nIf you’re interested in using the hosted platform, please fill out the form here.\nLocally Hosted Setup\nCloud Hosted Setup\nTracing Walkthrough#\nWhen you first access the UI, you should see a page with your tracing sessions.\nAn initial one “default” should already be created for you.\nA session is just a way to group traces together.\nIf you click on a session, it will take you to a page with no recorded traces that says “No Runs.”\nYou can create a new session with the new session form.\nIf we click on the default session, we c

In [58]:
chunks[1]

{'id': 'a11e55cf-5df2-4912-ba5a-a2ae6af35e87',
 'text': 'We can keep on exploring each of these nested traces in more detail.\nFor example, here is the lowest level trace with the exact inputs/outputs to the LLM.\nChanging Sessions#\nTo initially record traces to a session other than "default", you can set the LANGCHAIN_SESSION environment variable to the name of the session you want to record to:\nimport os\nos.environ["LANGCHAIN_HANDLER"] = "langchain"\nos.environ["LANGCHAIN_SESSION"] = "my_session" # Make sure this session actually exists. You can create a new session in the UI.\nTo switch sessions mid-script or mid-notebook, do NOT set the LANGCHAIN_SESSION environment variable. Instead: langchain.set_tracing_callback_manager(session_name="my_session")\nprevious\nDeployments\n Contents\n  \nTracing Walkthrough\nChanging Sessions\nBy Harrison Chase\n    \n      © Copyright 2023, Harrison Chase.\n      \n  Last updated on Apr 11, 2023.',
 'chunk': 1,
 'url': 'https://python.langchain

In [184]:
chunks[400]

{'id': '74cb597f-4e64-45f0-8350-43a8f8e9dd53',
 'text': 'Source code for langchain.embeddings.self_hosted_hugging_face\n"""Wrapper around HuggingFace embedding models for self-hosted remote hardware."""\nimport importlib\nimport logging\nfrom typing import Any, Callable, List, Optional\nfrom langchain.embeddings.self_hosted import SelfHostedEmbeddings\nDEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"\nDEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"\nDEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "\nDEFAULT_QUERY_INSTRUCTION = (\n    "Represent the question for retrieving supporting documents: "\n)\nlogger = logging.getLogger(__name__)\ndef _embed_documents(client: Any, *args: Any, **kwargs: Any) -> List[List[float]]:\n    """Inference function to send to the remote hardware.\n    Accepts a sentence_transformer model_id and\n    returns a list of embeddings for each document in the batch.\n    """\n    return client.encode(*args, **kwargs)\ndef load_

In [185]:
chunks[401]

{'id': 'b7a6d30e-8996-48fe-9ef8-8792a1ca0a61',
 'chunk': 1,
 'url': 'https://python.langchain.com/en/latest/_modules/langchain/embeddings/self_hosted_hugging_face.html'}
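
Because each `id` is a random `uuid4`, re-running the chunking cell and upserting again would store every chunk a second time under new ids. One possible alternative, offered here as an assumption rather than as part of the original notebook, is to derive the id from the URL and chunk index with `uuid5`, so the same chunk always maps to the same id. The helper name `chunk_id` and the example URL are hypothetical:

```python
from uuid import NAMESPACE_URL, uuid5


def chunk_id(url, chunk_index):
    """Return a stable id for a given page URL and chunk position."""
    return str(uuid5(NAMESPACE_URL, f"{url}#chunk-{chunk_index}"))


first_id = chunk_id("https://example.com/docs/page.html", 0)
same_id = chunk_id("https://example.com/docs/page.html", 0)
other_id = chunk_id("https://example.com/docs/page.html", 1)
print(first_id == same_id, first_id == other_id)
```

With ids like these, a later upsert of the same page would overwrite the earlier vectors instead of duplicating them.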

Our chunks are ready, so now we move on to embedding and indexing everything.

## Initialize Embedding Model

We use `text-embedding-ada-002` as the embedding model. We can embed text like so:

In [186]:
import openai

# initialize openai API key
openai.api_key = "<your-api-key>"  #platform.openai.com

embed_model = "text-embedding-ada-002"

res = openai.Embedding.create(
    input=[
        chunks[0]['text'], chunks[1]['text'], chunks[400]['text'], chunks[401]['text']
    ], engine=embed_model
)

In the response `res` we will find a JSON-like object containing our new embeddings within the `'data'` field.

In [187]:
res.keys()

dict_keys(['object', 'data', 'model', 'usage'])

Inside `'data'` we will find four records, one for each of the four chunks we just embedded. Each vector embedding contains `1536` dimensions (the output dimensionality of the `text-embedding-ada-002` model).

In [188]:
len(res['data'])

4

In [189]:
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])

(1536, 1536)

We will apply this same embedding logic to the LangChain docs dataset we've just scraped. But before doing so we must create a place to store the embeddings.

Since these embeddings have unit length (checked further down), their dot product equals their cosine similarity. We can compare the four embeddings pairwise:

In [190]:
import itertools
import numpy as np

# every unordered pair of the four embedded chunks
combinations = list(itertools.combinations(range(4), 2))
for first, second in combinations:
    print(f'Similarity between {first} and {second}', np.dot(res['data'][first]['embedding'], res['data'][second]['embedding']))

Similarity between 0 and 1 0.8790477489737241
Similarity between 0 and 2 0.6479160827111908
Similarity between 0 and 3 0.6710545200749323
Similarity between 1 and 2 0.6552577593400561
Similarity between 1 and 3 0.6590764328448566
Similarity between 2 and 3 0.8292819084110552


Check the magnitude of each vector (each should be approximately 1):

In [169]:
[np.linalg.norm(res['data'][i]['embedding']) for i in range(4)]

[1.0000000210234485,
 1.0000000155647548,
 0.9999999802991162,
 1.0000000373702902]
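
The norms above are all 1 up to floating-point error, which is why the dot product and cosine similarity coincide for these embeddings. A small pure-Python sketch of that relationship on toy 2-d vectors (the vectors and helper names are made up for illustration):

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


raw_u, raw_v = [3.0, 4.0], [1.0, 2.0]
unit_u, unit_v = normalize(raw_u), normalize(raw_v)

# for unit vectors the two measures agree; for raw vectors they do not
print(dot(unit_u, unit_v), cosine_similarity(unit_u, unit_v))
print(dot(raw_u, raw_v), cosine_similarity(raw_u, raw_v))
```

This is also why an index using the `dotproduct` metric ranks unit-length embeddings the same way a cosine metric would.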

## Initializing the Index

Now we need a place to store these embeddings and enable an efficient vector search through them all. To do that we use Pinecone. We can get a [free API key](https://app.pinecone.io/) and enter it below, where we initialize our connection to Pinecone and create a new index.

In [92]:
import pinecone

index_name = 'gpt-4-langchain-docs'

# initialize connection to pinecone
pinecone.init(
    api_key="<your-api-key>",  # app.pinecone.io (console)
    environment="<your-environment>"  # next to API key in console
)

# check if index already exists (it shouldn't if this is first time)
if index_name not in pinecone.list_indexes():
    # if does not exist, create index
    pinecone.create_index(
        index_name,
        dimension=len(res['data'][0]['embedding']),
        metric='dotproduct'
    )
# connect to index
index = pinecone.GRPCIndex(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with embeddings built by OpenAI's `text-embedding-ada-002` like so:

In [93]:
from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(chunks), batch_size)):
    # find end of batch
    i_end = min(len(chunks), i+batch_size)
    meta_batch = chunks[i:i_end]
    # get ids
    ids_batch = [x['id'] for x in meta_batch]
    # get texts to encode
    texts = [x['text'] for x in meta_batch]
    # create embeddings, retrying every 5 seconds on errors such as RateLimitError
    while True:
        try:
            res = openai.Embedding.create(input=texts, engine=embed_model)
            break
        except Exception:
            # catch Exception rather than using a bare except so Ctrl+C still interrupts
            sleep(5)
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    meta_batch = [{
        'text': x['text'],
        'chunk': x['chunk'],
        'url': x['url']
    } for x in meta_batch]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

  0%|          | 0/30 [00:00<?, ?it/s]
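
The loop above retries indefinitely at a fixed 5-second interval. As one possible variation, the sketch below separates batching from retrying and uses a bounded number of attempts with exponential backoff. The names `batched` and `call_with_retry`, the default delays, and the simulated failure are all assumptions for illustration; a stand-in function takes the place of the real embedding call:

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)


# stand-in for the embedding call: fails twice, then succeeds
attempt_log = {"count": 0}


def flaky_embed():
    attempt_log["count"] += 1
    if attempt_log["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


recorded_delays = []
# record the delays instead of actually sleeping, to keep the demo fast
retry_result = call_with_retry(flaky_embed, sleep=recorded_delays.append)
batch_sizes = [len(b) for b in batched(list(range(250)), 100)]
print(retry_result, recorded_delays, batch_sizes)
```

In the notebook, `fn` could be a `lambda` wrapping the `openai.Embedding.create` call for the current batch.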

Now we've added all of our LangChain docs to the index. With that we can move on to retrieval and then answer generation using ChatGPT.

## Retrieval

To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs, like so:

In [206]:
query = "how do I use the React Agent in LangChain?"

# Get embedding from OpenAI
res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

xq = res['data'][0]['embedding']

# get relevant contexts (including the questions)
res = index.query(xq, top_k=5, include_metadata=True)

In [207]:
res

{'matches': [{'id': 'af0ae51f-1ac9-48d1-bacc-45b704c9f0d1',
              'metadata': {'chunk': 0.0,
                           'text': '.md\n'
                                   '.pdf\n'
                                   'Agent Types\n'
                                   ' Contents \n'
                                   'zero-shot-react-description\n'
                                   'react-docstore\n'
                                   'self-ask-with-search\n'
                                   'conversational-react-description\n'
                                   'Agent Types#\n'
                                   'Agents use an LLM to determine which '
                                   'actions to take and in what order.\n'
                                   'An action can either be using a tool and '
                                   'observing its output, or returning a '
                                   'response to the user.\n'
                                   'Here a
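
Conceptually, the query scores the stored vectors against `xq` and returns the `top_k` highest-scoring matches with their metadata. The sketch below does this by brute force over a small in-memory list. It only illustrates the shape of the result and is not how Pinecone searches internally; the function name `brute_force_query` and the toy vectors are hypothetical:

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_query(vectors, query_vector, top_k=5):
    """Score every (id, embedding, metadata) tuple and keep the best top_k."""
    scored = (
        {"id": vec_id, "score": dot(query_vector, emb), "metadata": meta}
        for vec_id, emb, meta in vectors
    )
    # nlargest returns the matches sorted from highest to lowest score
    best = heapq.nlargest(top_k, scored, key=lambda m: m["score"])
    return {"matches": best}


toy_vectors = [
    ("a", [1.0, 0.0], {"text": "about agents"}),
    ("b", [0.0, 1.0], {"text": "about chains"}),
    ("c", [0.6, 0.8], {"text": "about both"}),
]
toy_result = brute_force_query(toy_vectors, [1.0, 0.0], top_k=2)
print([m["id"] for m in toy_result["matches"]])
```

The tuples have the same `(id, embedding, metadata)` layout as the `to_upsert` list built earlier.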

With retrieval complete, we move on to feeding these into GPT-4 to produce answers.

## Retrieval Augmented Generation

GPT-4 is currently accessed via OpenAI's `ChatCompletion` endpoint. To add the information we retrieved to the model, we pass it in the user prompt *alongside* our original query. The cells below set `model="gpt-3.5-turbo"`, which is called through the same endpoint. We can do that like so:

In [193]:
# get list of retrieved text
contexts = [item['metadata']['text'] for item in res['matches']]

augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query

In [194]:
print(augmented_query)

for all LLMs, and common utilities for working with LLMs.\n\n🔗 Chains:\n\nChains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.\n\n📚 Data Augmented Generation:\n\nData Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources.\n\n🤖 Agents:\n\nAgents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end

---

Chains: Chains go beyond just a single LLM call, and are sequences 
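
Five chunks of roughly 400 tokens fit comfortably in the prompt, but with a larger `top_k` or longer chunks the augmented query could outgrow the model's context window. A minimal sketch of capping the prompt size follows. The function name `build_augmented_query` and the `max_chars` budget are assumptions, and characters are used only as a rough stand-in for tokens:

```python
def build_augmented_query(contexts, question, max_chars=6000):
    """Join ranked contexts until max_chars is reached, then append the question."""
    separator = "\n\n---\n\n"
    kept, used = [], 0
    for context in contexts:
        cost = len(context) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break  # contexts are ranked, so stop at the first one that does not fit
        kept.append(context)
        used += cost
    return separator.join(kept) + "\n\n-----\n\n" + question


demo_prompt = build_augmented_query(
    ["first context", "second context", "x" * 50],
    "my question?",
    max_chars=40,
)
print(demo_prompt)
```

A token-based budget could be had by swapping `len` for a token-counting function such as the `tiktoken_len` helper defined earlier.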

Now we ask the question:

In [195]:
# system message to 'prime' the model
primer = f"""You are Q&A bot. A highly intelligent system that answers
user questions based on the information provided by the user above
each question. If the information can not be found in the information
provided by the user you truthfully say "I don't know".
"""

res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query}
    ]
)

To display the response nicely, we render it as Markdown.

In [196]:
from IPython.display import Markdown, display

display(Markdown(res['choices'][0]['message']['content']))

LangChain provides a standard interface for chains, called LangChain. You can use it to sequence calls to an LLM or a different utility to create chains. There are lots of integrations with other tools and end-to-end chains for common applications. You can find examples and guidance on how to use LLMChain in LangChain on their website.

# Non-augmented query

Let's compare this to a non-augmented query:

In [197]:
res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

I apologize, but I don't have enough information to answer your question. Could you please provide more context or details about what LLMChain and LangChain are and how they are related?

What happens if we drop the `"I don't know"` part of the `primer`?

In [198]:
res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are Q&A bot. A highly intelligent system that answers user questions"},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

The LLMChain is a blockchain-based system that forms a part of the LangChain ecosystem. To use LLMChain, you will first need to create an account on the LangChain platform. Once you have created an account, you can then start using the LLMChain platform to access and transact in Langcoin cryptocurrency.

To get started, you will need to download and install a digital wallet that supports LangCoin. You can then use this wallet to store and manage your LangCoin holdings. To acquire Langcoin, you can purchase them from a supported exchange or obtain them through mining.

Once you have LangCoins in your wallet, you can use them to make transactions on the LLMChain platform. This can include buying products and services from businesses that accept Langcoin as a payment method. 

The LLMChain also has smart contract capabilities that allow users to create and execute self-executing contracts without the need for intermediaries. These contracts can govern various types of transactions, including the transfer of assets, property rights, and financial instruments.

In conclusion, to use LLMChain in LangChain, you need to create an account on the LangChain platform, obtain LangCoins and a digital wallet that supports LangCoin, and use the LLMChain platform to initiate transactions and execute smart contracts.

# Let's try to make GPT output the URL as well

In [None]:
query = "how do I use the React Agent in LangChain?"

# Get embedding from OpenAI
res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

xq = res['data'][0]['embedding']

# get relevant contexts (including the questions)
res = index.query(xq, top_k=5, include_metadata=True)

In [208]:
# get list of retrieved text
contexts = ['Text: '+item['metadata']['text']+'\nUrl: '+item['metadata']['url'] for item in res['matches']]

augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query

In [209]:
print(augmented_query)

Text: .md
.pdf
Agent Types
 Contents 
zero-shot-react-description
react-docstore
self-ask-with-search
conversational-react-description
Agent Types#
Agents use an LLM to determine which actions to take and in what order.
An action can either be using a tool and observing its output, or returning a response to the user.
Here are the agents available in LangChain.
zero-shot-react-description#
This agent uses the ReAct framework to determine which tool to use
based solely on the tool’s description. Any number of tools can be provided.
This agent requires that a description is provided for each tool.
react-docstore#
This agent uses the ReAct framework to interact with a docstore. Two tools must
be provided: a Search tool and a Lookup tool (they must be named exactly as so).
The Search tool should search for a document, while the Lookup tool should lookup
a term in the most recently found document.
This agent is equivalent to the
original ReAct paper, specifically the Wikipedia example.
self

In [210]:
# system message to 'prime' the model
primer = f"""You are Q&A bot. A highly intelligent system that answers
user questions based on the information provided by the user above
each question. If the information can not be found in the information
provided by the user you truthfully say "I don't know".

You will receive some background information of the format 'Text: <text> Url: <url>'.
You are to answer the user by citing each part of the answer you provide with the relevant url in brackets at the end of the part.
The format for answering for each url is as such: '<Answer from url1>(<url1>) <Answer from url2>(<url2>)'
Use at least 2 urls if possible.
"""

res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query}
    ]
)

In [211]:
display(Markdown(res['choices'][0]['message']['content']))

The React Agent in LangChain is used to determine which tool to use based solely on the tool’s description (zero-shot-react-description). The agent requires that a description is provided for each tool. You can refer to the LangChain documentation for more information on how to use the React Agent (https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html).
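
Since the model is asked to cite URLs, it can be worth checking that every cited URL was actually among the retrieved contexts. The following is a minimal sketch of such a check. The helper names, the regular expression, and the example strings are assumptions for illustration, not part of the original notebook:

```python
import re


def extract_urls(answer):
    """Return the URLs found in the answer text, without trailing punctuation."""
    return [u.rstrip(").,;") for u in re.findall(r"https?://\S+", answer)]


def unsupported_citations(answer, retrieved_urls):
    """Return cited URLs that were not among the retrieved contexts."""
    allowed = set(retrieved_urls)
    return [u for u in extract_urls(answer) if u not in allowed]


demo_answer = (
    "Use the zero-shot agent (https://example.com/docs/agents.html). "
    "See also (https://example.com/docs/made_up.html)."
)
demo_retrieved = ["https://example.com/docs/agents.html"]
print(unsupported_citations(demo_answer, demo_retrieved))
```

In the notebook, the retrieved URLs could be collected with `[item['metadata']['url'] for item in res['matches']]` before `res` is overwritten by the chat response.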