<a href="https://colab.research.google.com/github/xsrv07/AI/blob/main/LC1_3_Simple_RAG%2C_Conversational_RAG_and_Multi_User_Conversational_RAG_Systems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic RAG System with LangChain




## QA Search Engine using Large Language Models - ChatGPT

Here we use an OpenAI embedding model to generate contextual embeddings for each Wikipedia article.

Then we use a ChatGPT model to answer questions much as a human would: we search for the articles most similar to the input query and pass them to the model as context.

`text-embedding-3-small` is OpenAI's highly efficient embedding model and provides a significant upgrade over its predecessor, `text-embedding-ada-002`, which was released in December 2022.

**Stronger performance.** Comparing `text-embedding-ada-002` to `text-embedding-3-small`, the average score on a commonly used benchmark for multi-language retrieval (MIRACL) has increased from 31.4% to 44.0%, while the average score on a commonly used benchmark for English tasks (MTEB) has increased from 61.0% to 62.3%.

**Reduced price.** `text-embedding-3-small` is also substantially more efficient than the previous-generation `text-embedding-ada-002` model. Its price is therefore 5X lower: $0.00002 per 1k tokens, down from $0.0001.
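To make the price difference concrete, here is a small arithmetic sketch. The two per-1k-token prices are the ones quoted above; the corpus size of one million tokens is a hypothetical figure chosen only for illustration.

```python
# Prices per 1k tokens, as quoted in the text above.
ADA_002_PRICE_PER_1K = 0.0001
SMALL_3_PRICE_PER_1K = 0.00002


def embedding_cost(num_tokens: int, price_per_1k: float) -> float:
    """Return the cost in USD of embedding `num_tokens` tokens."""
    return num_tokens / 1000 * price_per_1k


# Hypothetical corpus size, used only for illustration.
corpus_tokens = 1_000_000

old_cost = embedding_cost(corpus_tokens, ADA_002_PRICE_PER_1K)
new_cost = embedding_cost(corpus_tokens, SMALL_3_PRICE_PER_1K)
price_ratio = ADA_002_PRICE_PER_1K / SMALL_3_PRICE_PER_1K

print(f"ada-002: ${old_cost:.2f}, 3-small: ${new_cost:.2f}, ratio: {price_ratio:.1f}x")
```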

GPT-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in the GPT-3.5 family is `gpt-3.5-turbo`, which has been optimized for chat using the Chat Completions API but also works well for traditional completion tasks. Note that the code in this notebook loads the newer `gpt-4o-mini` model rather than `gpt-3.5-turbo`.

### Load Dependencies

In [None]:
!pip install langchain==0.1.16
!pip install langchain-openai==0.1.3
!pip install langchain-community==0.0.33

Collecting langchain==0.1.16
  Downloading langchain-0.1.16-py3-none-any.whl.metadata (13 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain==0.1.16)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain==0.1.16)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-community<0.1,>=0.0.32 (from langchain==0.1.16)
  Downloading langchain_community-0.0.38-py3-none-any.whl.metadata (8.7 kB)
Collecting langchain-core<0.2.0,>=0.1.42 (from langchain==0.1.16)
  Downloading langchain_core-0.1.52-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-text-splitters<0.1,>=0.0.1 (from langchain==0.1.16)
  Downloading langchain_text_splitters-0.0.2-py3-none-any.whl.metadata (2.2 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain==0.1.16)
  Downloading langsmith-0.1.99-py3-none-any.whl.metadata (13 kB)
Collecting tenacity<9.0.0,>=8.1.0 (from langchain==0.1.16)
  Downloading tenacity-8.

In [None]:
!pip install langchain-chroma==0.1.0
!pip install langchainhub==0.1.15

Collecting langchain-chroma==0.1.0
  Downloading langchain_chroma-0.1.0-py3-none-any.whl.metadata (1.3 kB)
Collecting chromadb<0.5.0,>=0.4.0 (from langchain-chroma==0.1.0)
  Downloading chromadb-0.4.24-py3-none-any.whl.metadata (7.3 kB)
Collecting fastapi<1,>=0.95.2 (from langchain-chroma==0.1.0)
  Downloading fastapi-0.112.1-py3-none-any.whl.metadata (27 kB)
Collecting chroma-hnswlib==0.7.3 (from chromadb<0.5.0,>=0.4.0->langchain-chroma==0.1.0)
  Downloading chroma_hnswlib-0.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (252 bytes)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb<0.5.0,>=0.4.0->langchain-chroma==0.1.0)
  Downloading uvicorn-0.30.6-py3-none-any.whl.metadata (6.6 kB)
Collecting posthog>=2.4.0 (from chromadb<0.5.0,>=0.4.0->langchain-chroma==0.1.0)
  Downloading posthog-3.5.0-py2.py3-none-any.whl.metadata (2.0 kB)
Collecting pulsar-client>=3.1.0 (from chromadb<0.5.0,>=0.4.0->langchain-chroma==0.1.0)
  Downloading pulsar_client

## Enter API Tokens

In [None]:
from getpass import getpass

OPENAI_KEY = getpass()

··········


If you are using Azure OpenAI, you may need to configure it according to how it is set up in your organization.

Refer to the [LangChain Azure OpenAI integration guide](https://python.langchain.com/docs/integrations/llms/azure_openai/) for more details.

In [None]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

### Load Wikipedia Data

In [None]:
import gzip
import json
import requests
from tqdm import tqdm
import sys
import os

# Download a file from a URL
def http_get(url, path) -> None:
    """
    Downloads a URL to a given path on disc
    """
    if os.path.dirname(path) != "":
        os.makedirs(os.path.dirname(path), exist_ok=True)

    req = requests.get(url, stream=True)
    if req.status_code != 200:
        print("Exception when trying to download {}. Response {}".format(url, req.status_code), file=sys.stderr)
        req.raise_for_status()
        return

    download_filepath = path + "_part"
    with open(download_filepath, "wb") as file_binary:
        content_length = req.headers.get("Content-Length")
        total = int(content_length) if content_length is not None else None
        progress = tqdm(unit="B", total=total, unit_scale=True)
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                progress.update(len(chunk))
                file_binary.write(chunk)

    os.rename(download_filepath, path)
    progress.close()


wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)

100%|██████████| 50.2M/50.2M [00:05<00:00, 9.29MB/s]


In [None]:
import gzip
import json

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

passages = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())

        #Add all paragraphs
        #passages.extend(data['paragraphs'])

        #Only add the first paragraph
        passages.append(data['paragraphs'][0])

print("Passages:", len(passages))

Passages: 169597


In [None]:
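# Keep only a small subset of Wikipedia so that everything runs faster.
# Pass 1: whitespace-token match on 'fish', 'india' or 'cheetah'.
# Note: a passage that matches more than one keyword is appended once per match,
# so the resulting list can contain duplicates.
passages = [passage for passage in passages for x in ['fish', 'india', 'cheetah']
              if x in passage.lower().split()]
# Pass 2: substring match, which narrows 'fish' down to 'flying fish'.
passages = [passage for passage in passages for x in ['flying fish', 'india', 'cheetah']
              if x in passage.lower()]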

In [None]:
len(passages)

793
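The nested comprehension above appends a passage once for every keyword it matches. If duplicates are not wanted, one possible alternative is to test each passage with `any(...)`. The sketch below runs on a hypothetical three-passage list rather than the real dataset, and the helper name `filter_passages` is an assumption made for illustration. Applying it to the real data would likely give a count different from the one shown above.

```python
def filter_passages(passages, keywords):
    """Keep each passage at most once if it contains any keyword (case-insensitive substring)."""
    return [p for p in passages if any(k in p.lower() for k in keywords)]


# Hypothetical toy passages, used only for illustration.
toy_passages = [
    "Flying fish are found in warm waters, including the seas around India.",
    "The cheetah is a large cat.",
    "Basil is a herb.",
]

# The first passage matches two keywords but is kept only once.
subset = filter_passages(toy_passages, ["flying fish", "india", "cheetah"])
print(len(subset))
```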

In [None]:
passages[0]

'Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.'

### Load OpenAI LLMs

In [None]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

### Generate LLM Embeddings and store them in Chroma Vector DB

**Chroma Vector DB** is a versatile, open-source vector database designed for managing and querying vector embeddings. It is easy to set up and integrates well with various AI tools and algorithms. Chroma is particularly useful for applications that require rapid and precise retrieval of content represented as embeddings—efficient data formats for text, images, and soon, audio and video.

**Key Features:**
- **Integration with AI Tools:** Chroma supports embedding functions from leading providers like OpenAI, Google, and Hugging Face, allowing for flexible and powerful data handling.
- **Ease of Use:** The database provides default embedding functions, or users can integrate external APIs to generate embeddings.
- **Efficient Querying:** Users can create collections to store embeddings, documents, and metadata. These can be queried to retrieve the most similar items, making information retrieval quick and effective.
- **Flexible API:** Chroma offers a straightforward API that supports both standard operations and custom embedding functions.

For more detailed information, visit the official Chroma documentation [here](https://docs.trychroma.com).
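To illustrate what "retrieve the most similar items" means, here is a minimal pure-Python sketch of a similarity query over stored embeddings. It is not Chroma's implementation or API: the `query` function, the three-dimensional vectors, and the in-memory dict are all assumptions made for illustration, and a real vector database uses an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real embedding models return far more dimensions.
collection = {
    "New Delhi is the capital of India.": [0.9, 0.1, 0.0],
    "Flying fish can glide above water.": [0.0, 0.2, 0.9],
    "The cheetah is the fastest land animal.": [0.1, 0.9, 0.2],
}


def query(collection, query_embedding, n_results=2):
    """Return the n_results documents whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_embedding, emb), doc) for doc, emb in collection.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:n_results]]


top = query(collection, [0.8, 0.2, 0.1], n_results=1)
print(top)
```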


In [None]:
passages[:3]

['Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.',
 'The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten nations: Brazil, Chile, Colombia, Cuba, the 

In [None]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

In [None]:
# The vectorstore we'll be using
from langchain_chroma import Chroma

# The text splitter that will break long documents into overlapping chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
from langchain.docstore.document import Document

docs = [Document(page_content=doc) for doc in passages]

In [None]:
docs[:3]

[Document(page_content='Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.'),
 Document(page_content='The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=300)
chunked_docs = splitter.split_documents(docs)

In [None]:
chunked_docs[:3]

[Document(page_content='Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.'),
 Document(page_content='The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten
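The effect of `chunk_size` and `chunk_overlap` can be sketched with a simplified fixed-width chunker. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to split on separators such as paragraph and line breaks); the function `chunk_text` is a hypothetical helper that only shows how consecutive chunks share `chunk_overlap` characters. Text shorter than `chunk_size` comes back as a single chunk, which appears to be why the first chunked documents above match the unchunked ones.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-width chunks where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```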

## Create Vector DB and Retriever

If you already created `wiki_db` in the previous hands-on session, just load the DB and do NOT run the following code to create the database again. Ignore this note when running on Colab.

In [None]:
# create vector DB of docs and embeddings - takes 1 min on Colab
chroma_db = Chroma.from_documents(documents=chunked_docs, collection_name='wiki_db',
                                  embedding=openai_embed_model,
                                  # need to set the distance function to cosine else it uses euclidean by default
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./wiki_db")

## Load Vector DB from disk

Run the following code if your vector DB already exists on disk from the previous hands-on session

In [None]:
# load from disk
chroma_db = Chroma(persist_directory="./wiki_db",
                   collection_name='wiki_db',
                   embedding_function=openai_embed_model)

In [None]:
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x7b5062351390>

In [None]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                                              search_kwargs={"k": 5, "score_threshold": 0.2})
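The retriever settings above combine two limits: return at most `k` documents, and only those whose relevance score reaches `score_threshold`. The sketch below shows that logic on plain tuples. It assumes scores where higher means more similar; the function name and the score values are hypothetical and are not taken from the actual vector store.

```python
def retrieve_with_threshold(scored_docs, k, score_threshold):
    """Return up to k documents whose score is at least score_threshold, best first."""
    passing = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in passing[:k]]


# Hypothetical (document, relevance score) pairs, used only for illustration.
toy_scores = [
    ("New Delhi passage", 0.62),
    ("States of India passage", 0.41),
    ("Basil passage", 0.15),
    ("Kolkata passage", 0.38),
]

hits = retrieve_with_threshold(toy_scores, k=5, score_threshold=0.2)
print(hits)
```

With a threshold in place, a query that matches nothing well can return fewer than `k` documents, or none at all.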

In [None]:
similarity_retriever.invoke('what is the capital of India?')

[Document(page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'),
 Document(page_content='The Republic of India is divided into twenty-eight States,and eight union territories including the National Capital Territory.'),
 Document(page_content="Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India durin

### Build a QA RAG Chain

In [None]:
from langchain import hub

In [None]:
prompt = hub.pull("rlm/rag-prompt")
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = """You are an assistant for question-answering tasks.
            Use the following pieces of retrieved context to answer the question.
            If you don't know the answer, just say that you don't know.
            Keep the answer up to 5 lines unless the user asks for more information

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

## New LangChain Syntax for RAG Chain - Using LCEL

In [None]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

qa_rag_chain = (
    {
        "context": (similarity_retriever
                      |
                    format_docs),
        "question": RunnablePassthrough()
    }
      |
    prompt_template
      |
    chatgpt
)
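The `|` operator in the chain above composes steps so that each step's output becomes the next step's input, and the leading dict runs its branches on the same input. The following toy sketch imitates that behaviour with plain Python. The `Step` class, the `parallel` helper, and all the `fake_*` stand-ins are hypothetical and are not LangChain classes; they only illustrate the data flow.

```python
class Step:
    """Wraps a function and supports `|` composition (a toy stand-in, not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})


# Stand-ins for the retriever, prompt template, and chat model.
fake_retriever = Step(lambda query: ["New Delhi is the capital of India."])
format_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"Question: {d['question']}\nContext: {d['context']}")
fake_llm = Step(lambda prompt_text: f"[stub answer] {prompt_text}")

toy_chain = (
    parallel({"context": fake_retriever | format_docs, "question": passthrough})
    | fake_prompt
    | fake_llm
)
toy_result = toy_chain.invoke("What is the capital of India?")
print(toy_result)
```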

In [None]:
query = "What is the capital of India?"
result = qa_rag_chain.invoke(query)
print(result.content)

The capital of India is New Delhi.


In [None]:
query = "Tell me about the financial capital of India"
result = qa_rag_chain.invoke(query)
print(result.content)

The financial capital of India is Mumbai. It is home to the National Stock Exchange of India (NSE), the largest stock exchange in the country, and the State Bank of India (SBI), the largest bank. Mumbai plays a crucial role in the Indian economy, which is the fifth largest in the world.


In [None]:
query = "Who was the winner of the champions league in 2020?"
result = qa_rag_chain.invoke(query)
print(result.content)

I don't know.


In [None]:
query = "What is the financial capital of India?"
result = qa_rag_chain.invoke(query)
print(result.content)

The financial capital of India is Mumbai. It is home to the National Stock Exchange of India (NSE) and the State Bank of India (SBI), which are key financial institutions in the country.


In [None]:
query = "Tell me more about it in detail"
result = qa_rag_chain.invoke(query)
print(result.content)



I don't know.


# Conversational RAG System with LangChain

In many Q&A applications, the ability to engage in back-and-forth conversations with users is crucial. This necessitates the application having a form of "memory" to recall past interactions and apply this context to current queries.

This guide focuses on integrating historical messages into the application's logic. Additional details on managing chat history can be found [here](https://python.langchain.com/docs/expression_language/how_to/message_history/).

![](https://i.imgur.com/8hLJMPl.gif)

### Building on the Q&A RAG System - to a Conversational Q&A RAG System

We will enhance our Q&A RAG System, which utilizes the Wikipedia dataset, by implementing the following updates:

- **Prompt Adjustment:** Our prompt will be modified to include historical messages as inputs, allowing the system to maintain context over the course of a conversation.

- **Contextualizing Questions:** We will introduce a sub-chain mechanism to reformulate the latest user query by considering the chat history. This is crucial for understanding questions that refer back to previous messages. For example, a query like "Can you elaborate on the second point?" relies on the context provided by preceding interactions, which affects the system's ability to retrieve relevant information effectively.





## Contextualizing the Question

To maintain a seamless flow in conversations, especially in a Q&A setting, it's essential to incorporate historical interactions. Here’s how we achieve this:

### Defining a Sub-Chain for Historical Context

1. **Sub-Chain Creation:** We define a sub-chain that takes both the historical messages and the latest user query. If the query refers to past interactions, the sub-chain reformulates it into a standalone question, so that the retriever (and the vector database behind it) can return the documents most relevant to the reworded question.

2. **Using `MessagesPlaceholder`:** Our prompt includes a `MessagesPlaceholder` variable named `chat_history`. This lets us pass in a list of messages under the `chat_history` key. These messages are inserted after the system message and before the latest user question.

3. **Helper Function Usage:** We use the `create_history_aware_retriever` function, documented [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html). It handles the case where the chat history is empty by passing the input straight to the retriever; otherwise it applies `prompt | llm | StrOutputParser() | retriever` in sequence.

4. **Chain Construction:** The `create_history_aware_retriever` constructs a chain that processes inputs under the keys `input` and `chat_history`, ensuring the output schema aligns with that of a retriever.

By implementing these steps, our system can effectively utilize historical context to better understand and respond to user queries, thereby enhancing the conversational experience.
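The branching described in the steps above can be sketched in plain Python. This is an illustration of the control flow only, not the library's implementation: `make_history_aware_retriever`, `stub_rephrase`, and `stub_retrieve` are hypothetical names, and the stubs stand in for the LLM call and the vector store lookup.

```python
def make_history_aware_retriever(rephrase, retrieve):
    """Build a retriever that rewrites the question only when there is chat history."""

    def run(inputs):
        chat_history = inputs.get("chat_history", [])
        if not chat_history:
            # No history: the question is already standalone, so retrieve directly.
            return retrieve(inputs["input"])
        # History present: rewrite the question first, then retrieve.
        standalone_question = rephrase(chat_history, inputs["input"])
        return retrieve(standalone_question)

    return run


def stub_rephrase(chat_history, question):
    # Hypothetical stand-in for the LLM rephrasing step.
    return f"{question} (about: {chat_history[-1]})"


def stub_retrieve(query):
    # Hypothetical stand-in for the vector store lookup.
    return [f"document matching '{query}'"]


toy_retriever = make_history_aware_retriever(stub_rephrase, stub_retrieve)

first = toy_retriever({"input": "What is the capital of India?", "chat_history": []})
follow_up = toy_retriever(
    {
        "input": "Tell me more about this city",
        "chat_history": ["The capital of India is New Delhi."],
    }
)
print(first)
print(follow_up)
```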


In [None]:
rephrase_prompt = hub.pull("langchain-ai/chat-langchain-rephrase")
rephrase_prompt

PromptTemplate(input_variables=['chat_history', 'input'], metadata={'lc_hub_owner': 'langchain-ai', 'lc_hub_repo': 'chat-langchain-rephrase', 'lc_hub_commit_hash': 'fb7ddb56be11b2ab10d176174dae36faa2a9a6ba13187c8b2b98315f6ca7d136'}, template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.\n\nChat History:\n{chat_history}\nFollow Up Input: {input}\nStandalone Question:')

In [None]:
print(rephrase_prompt.template)

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {input}
Standalone Question:


In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

rephrase_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is."""

rephrase_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rephrase_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    chatgpt, similarity_retriever, rephrase_prompt
)

history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7b5062351390>, search_type='similarity_score_threshold', search_kwargs={'k': 5, 'score_threshold': 0.2}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Given a chat history and the latest user question\nwhich might reference context in the chat history, formulate a standalone question\nwhich can b

This chain prepends a rephrasing of the input query to our retriever, so that the retrieval incorporates the context of the conversation.

## Building the QA RAG Chain with Chat History

Now we're ready to construct our comprehensive QA RAG chain, which leverages historical context for more accurate and relevant responses.

### Components of the QA RAG Chain

1. **Creating Document Chains:**
   - We use the `create_stuff_documents_chain` function, which is detailed [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html). This function is used to create a `question_answer_chain`, accepting inputs such as `context`, `chat_history`, and `input`. It efficiently combines the retrieved context with the conversation history and the current query to generate an informed answer.

2. **Building the Final QA RAG Chain:**
   - The entire QA RAG chain is assembled using the `create_retrieval_chain` function, available [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html). This chain integrates the `history_aware_retriever` with the `question_answer_chain`. It retains intermediate outputs like the retrieved context for added convenience during the query handling process.
   - The `create_retrieval_chain` function accepts keys such as `input` and `chat_history` and includes `input`, `chat_history`, `context`, and `answer` in its outputs.

By implementing these steps, the system not only contextualizes but also provides accurate answers by synthesizing information from both the current and historical interactions. This method enhances the conversational AI’s ability to understand and respond to user queries dynamically, making the interactions more engaging and relevant.
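The input and output keys described above can be illustrated with a small stand-in. This sketch is not the `create_retrieval_chain` implementation; `make_retrieval_chain` and both stubs are hypothetical, and they only show how `context` and `answer` are added to the original inputs.

```python
def make_retrieval_chain(retriever, answer_chain):
    """Combine a retriever and an answer step, keeping the intermediate context in the output."""

    def run(inputs):
        context = retriever(inputs)
        answer = answer_chain({**inputs, "context": context})
        # The original inputs are passed through alongside the new keys.
        return {**inputs, "context": context, "answer": answer}

    return run


def stub_retriever(inputs):
    # Hypothetical stand-in for the history-aware retriever.
    return ["New Delhi is the capital of India."]


def stub_answer_chain(inputs):
    # Hypothetical stand-in for the document chain that calls the LLM.
    return f"Answer to '{inputs['input']}' using {len(inputs['context'])} document(s)"


toy_rag = make_retrieval_chain(stub_retriever, stub_answer_chain)
toy_response = toy_rag({"input": "What is the capital of India?", "chat_history": []})
print(sorted(toy_response))
```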


In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_system_prompt = """You are an assistant for question-answering tasks.
                      Use the following pieces of retrieved context to answer the question.
                      If you don't know the answer, just say that you don't know.
                      Keep the answer up to 5 lines unless the user asks for more information

                      Context:
                      {context}
                  """

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(chatgpt, qa_prompt)

qa_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
qa_rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7b5062351390>, search_type='similarity_score_threshold', search_kwargs={'k': 5, 'score_threshold': 0.2}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Given a chat history and the latest user question\nwhich might reference conte

In [None]:
chat_history = []

question = "What is the capital of India?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

The capital of India is New Delhi.


In [None]:
for chunk in qa_rag_chain.stream({"input": question, "chat_history": chat_history}):
  print(chunk)

{'input': 'What is the capital of India?', 'chat_history': []}
{'context': [Document(page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'), Document(page_content='The Republic of India is divided into twenty-eight States,and eight union territories including the National Capital Territory.'), Document(page_content="Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as de

In [None]:
chat_history

[]

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
chat_history

[HumanMessage(content='What is the capital of India?'),
 AIMessage(content='The capital of India is New Delhi.')]
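Because the history has to be extended after every turn, the update can be wrapped in a small helper. The sketch below uses plain `(role, text)` tuples as stand-ins for `HumanMessage` and `AIMessage` so that it runs without any dependencies; `record_turn` is a hypothetical name, not a LangChain function.

```python
def record_turn(chat_history, question, answer):
    """Append one question/answer exchange to the history list and return it."""
    chat_history.extend([("human", question), ("ai", answer)])
    return chat_history


toy_history = []
record_turn(toy_history, "What is the capital of India?", "The capital of India is New Delhi.")
record_turn(toy_history, "Tell me more about this city", "New Delhi is a union territory.")
print(len(toy_history))
```

In the notebook itself, the same helper could build `HumanMessage(content=question)` and `AIMessage(content=answer)` objects instead of tuples.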

In [None]:
question = "Tell me more about this city"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several monuments. The city covers an area of about 42.7 km and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and falls under the North Indian geographical zone.


In [None]:
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
chat_history

[HumanMessage(content='What is the capital of India?'),
 AIMessage(content='The capital of India is New Delhi.'),
 HumanMessage(content='Tell me more about this city'),
 AIMessage(content='New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several monuments. The city covers an area of about 42.7 km and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and falls under the North Indian geographical zone.')]

In [None]:
question = "Can fish really fly?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

Yes, some fish, like flying fish, can glide above the water's surface. They have winglike fins that allow them to glide for considerable distances to escape predators. However, they do not truly "fly" like birds; instead, they glide through the air after leaping out of the water.


In [None]:
response

{'input': 'Can fish really fly?',
 'chat_history': [HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='Tell me more about this city'),
  AIMessage(content='New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several monuments. The city covers an area of about 42.7 km and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and falls under the North Indian geographical zone.')],
 'context': [Document(page_content='Flying fish are marine oceanic fishes of the family Exocoetidae. They are about 50 species, and they live worldwide in warm waters. They are noted for their ability to glide. They are all small, with a maximum length of about 45 cm (18 inches), and have winglike, rigid fins and an unevenly forked tail.'),
  Document(page_content='The flying snake, or "Chrysopelea", is a mild

In [None]:
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])

In [None]:
chat_history

[HumanMessage(content='What is the capital of India?'),
 AIMessage(content='The capital of India is New Delhi.'),
 HumanMessage(content='Tell me more about this city'),
 AIMessage(content='New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several monuments. The city covers an area of about 42.7 km and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and falls under the North Indian geographical zone.'),
 HumanMessage(content='Can fish really fly?'),
 AIMessage(content='Yes, some fish, like flying fish, can glide above the water\'s surface. They have winglike fins that allow them to glide for considerable distances to escape predators. However, they do not truly "fly" like birds; instead, they glide through the air after leaping out of the water.')]

In [None]:
chat_history[-2:]

[HumanMessage(content='Can fish really fly?'),
 AIMessage(content='Yes, some fish, like flying fish, can glide above the water\'s surface. They have winglike fins that allow them to glide for considerable distances to escape predators. However, they do not truly "fly" like birds; instead, they glide through the air after leaping out of the water.')]

In [None]:
question = "What is the fastest animal?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
print(response['answer'])

The fastest land animal is the cheetah, which can run up to 112 kilometers per hour for short distances.


In [None]:
question = "Tell me about its different species"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
print(response['answer'])

There are several subspecies of cheetah:

1. **South African Cheetah (Acinonyx jubatus jubatus)**: The most abundant subspecies, native to Southern Africa, with over 6,000 individuals in the wild.

2. **Asiatic Cheetah (Acinonyx jubatus venaticus)**: A critically endangered subspecies found in Asia, with a very small population remaining.

These subspecies differ in their geographic distribution and population status.


## Returning Sources in Q&A Applications

An essential feature of Q&A applications is transparency about the sources that contributed to a generated answer. Showing them builds trust and lets users explore the original content.

### Integration with LangChain’s `create_retrieval_chain`

- **Source Propagation:** LangChain’s `create_retrieval_chain` includes the source documents retrieved during question answering in its final output, so answers can be traced back to their origins.
- **Key Implementation:** The retrieved documents are propagated to the output under the `"context"` key, next to `"input"` and `"answer"`, so users can verify and explore the sources themselves.

With this built-in behavior, a Q&A application can always show users where its information came from.
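
As a minimal sketch of working with the `"context"` key, the helper below collects a short, de-duplicated list of source snippets from a response dictionary. The `SourceDoc` class, the `summarize_sources` helper, and the sample response are stand-ins invented for this illustration so the snippet runs without a retriever. In the notebook, `response['context']` holds LangChain `Document` objects, which expose the same `page_content` attribute.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDoc:
    """Hypothetical stand-in for a retrieved document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(response, max_chars=80):
    """Return one shortened, de-duplicated snippet per document under response['context']."""
    seen = set()
    snippets = []
    for doc in response.get("context", []):
        text = doc.page_content.strip()
        if text in seen:
            continue  # skip documents retrieved more than once
        seen.add(text)
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        snippets.append(text)
    return snippets


# Illustrative response shaped like the chain output: 'input', 'answer', 'context'
sample_response = {
    "input": "which is the fastest animal?",
    "answer": "The fastest land animal is the cheetah.",
    "context": [
        SourceDoc("A cheetah is a medium large cat which lives in Africa."),
        SourceDoc("A cheetah is a medium large cat which lives in Africa."),
        SourceDoc("Flying fish are marine oceanic fishes of the family Exocoetidae."),
    ],
}

for i, snippet in enumerate(summarize_sources(sample_response), start=1):
    print(f"[{i}] {snippet}")
```

Numbering the snippets this way makes it easy to show a compact source list under each answer instead of printing every full document.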


In [None]:
chat_history = []

In [None]:
question = "which is the fastest animal?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print('Answer:', response['answer'])
print('Sources:')
for document in response['context']:
    print(document)
    print()

Answer: The fastest land animal is the cheetah, which can run up to 112 kilometers per hour for a short time.
Sources:
page_content='A cheetah ("Acinonyx jubatus") is a medium large cat which lives in Africa. It is the fastest land animal and can run up to 112 kilometers per hour for a short time. Most cheetahs live in the savannas of Africa. There are a few in Asia. Cheetahs are active during the day, and hunt in the early morning or late evening.'

page_content='The Zebra Turkeyfish ("Dendrochirus zebra") is a very venomous fish. It lives in the Indian and Pacific seas. The fish has 13 venomous spines along its back, used to look after itself. The fish is slow and quiet, but can be a danger. The fish rests in dark places such as under a rock or a piece of coral. They aren\'t affected by each other\'s venom. They are solitary fish that are not scared of anything, as they have no predators other than groupers.'

page_content='The Bengal tiger ("Panthera tigris tigris") is a tiger subsp

In [None]:
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])

In [None]:
question = "Tell me more, including different types of this animal and their details"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print('Answer:', response['answer'])
print('Sources:')
for document in response['context']:
    print(document)
    print()

Answer: The cheetah ("Acinonyx jubatus") is the fastest land animal, primarily found in Africa, particularly in savannas. There are two main subspecies:

1. **South African Cheetah ("Acinonyx jubatus jubatus")**: This is the most abundant subspecies, with over 6,000 individuals in the wild. Its population in Namibia has increased from about 2,500 in 1990 to over 3,500 by 2015.

2. **Asiatic Cheetah ("Acinonyx jubatus venaticus")**: This subspecies is critically endangered and is native to Asia.

Cheetahs are diurnal hunters, active during the day, and typically hunt in the early morning or late evening.
Sources:
page_content='A cheetah ("Acinonyx jubatus") is a medium large cat which lives in Africa. It is the fastest land animal and can run up to 112 kilometers per hour for a short time. Most cheetahs live in the savannas of Africa. There are a few in Asia. Cheetahs are active during the day, and hunt in the early morning or late evening.'

page_content='The South African Cheetah ("Ac

# Multi-User Conversational RAG System with LangChain

In many Q&A applications, the ability to engage in back-and-forth conversations with users is crucial. This necessitates the application having a form of "memory" to recall past interactions and apply this context to current queries.

However, in most real-world conversational systems, multiple users or user sessions access the system simultaneously.

![](https://i.imgur.com/X4WivLu.gif)

Here we show how to use `SQLChatMessageHistory` to store a separate conversation history per user or session, which real-world chatbots serving many users at once typically need. Instead of keeping histories in memory, we store them in a SQL database, which can hold a large number of conversations.

We use a `get_session_history` function (named `get_session_history_db` in the code below), which takes a `session_id` and returns a message history object backed by the SQL database. The `session_id` distinguishes separate conversations and should be passed in as part of the config when calling the new chain.
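
To make the role of `session_id` concrete, here is a minimal sketch of per-session storage built directly on Python's `sqlite3` module. It is not how `SQLChatMessageHistory` is implemented. The `SessionStore` class, its table layout, and the in-memory database are assumptions made purely for illustration.

```python
import sqlite3


class SessionStore:
    """Hypothetical per-session message store (illustration only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id, role, content):
        """Append one message to the given session."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def history(self, session_id):
        """Return (role, content) pairs for one session, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return rows.fetchall()


store = SessionStore()
store.add("bond007", "human", "What is the capital of India?")
store.add("bond007", "ai", "The capital of India is New Delhi.")
store.add("jim003", "human", "What is the fastest animal on land?")

print(store.history("bond007"))
print(store.history("jim003"))
```

Because every query filters on `session_id`, one user's messages never appear in another user's history, which is the property the multi-user chain relies on.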

We also use a `memory_buffer_window` function to pass only the last K conversation turns to the LLM, which is essentially our own implementation of `ConversationBufferWindowMemory`.
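
The windowing rule can be checked on plain data. The sketch below assumes each conversation turn contributes exactly two messages (one human, one AI) and uses `(role, text)` tuples as stand-ins for message objects. `last_k_turns` is a hypothetical name for the same slicing idea, with an added guard for `k <= 0`.

```python
def last_k_turns(messages, k=2):
    """Keep the messages from the most recent k turns (2 messages per turn)."""
    if k <= 0:
        return []  # messages[-0:] would return the whole list, so handle this explicitly
    return messages[-(k * 2):]


history = [
    ("human", "q1"), ("ai", "a1"),
    ("human", "q2"), ("ai", "a2"),
    ("human", "q3"), ("ai", "a3"),
]

print(last_k_turns(history, k=2))  # the q2/a2 and q3/a3 turns
print(last_k_turns(history, k=5))  # fewer than 5 turns stored, so everything is kept
```

Trimming only what is sent to the LLM keeps the prompt small while the full history remains in the database.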




In [None]:
# removes the memory database file - usually not needed
# you can run this only when you want to remove all conversation histories
!rm memory.db

rm: cannot remove 'memory.db': No such file or directory


In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

######### REPHRASER ############
rephrase_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is."""

rephrase_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rephrase_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    chatgpt, similarity_retriever, rephrase_prompt
)



######### MULTI_USER RAG RESPONSE GENERATOR ############
qa_system_prompt = """You are an assistant for question-answering tasks.
                      Use the following pieces of retrieved context to answer the question.
                      If you don't know the answer, just say that you don't know.
                      Keep the answer up to 5 lines unless the user asks for more information.

                      Context:
                      {context}
                  """

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# used to retrieve conversation history from database
# based on a specific user or session ID
def get_session_history_db(session_id, topk_conversations=2):
    # topk_conversations is unused here: the full history is returned and
    # trimmed later by memory_buffer_window inside the chain
    history = SQLChatMessageHistory(session_id, "sqlite:///memory.db")
    return history

# subset historical conversations
def memory_buffer_window(messages, topk_conversations=2): # each conversation has 2 messages - (human prompt, AI response)
    return messages[-(topk_conversations*2):]

# custom RAG chain which looks at last K conversational messages
question_answer_chain = (
    RunnablePassthrough.assign(chat_history=lambda x: memory_buffer_window(x["chat_history"]))
      |
    qa_prompt
      |
    chatgpt
      |
    StrOutputParser()
)
qa_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


############ CONVERSATIONAL RAG CHAIN ####################
conversational_rag_chain = RunnableWithMessageHistory(
    qa_rag_chain,
    get_session_history_db,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [None]:
from IPython.display import display, Markdown

def conv_rag_chatbot(usersession_id, prompt):
    response = conversational_rag_chain.invoke(
                                {"input": prompt},
                                config={
                                    "configurable": {"session_id": usersession_id}
                                }
    )
    print('Answer:')
    display(Markdown(response['answer']))
    print('Sources:')
    for document in response['context']:
        print(document)
        print()

    return response

In [None]:
us_id = 'bond007'
r = conv_rag_chatbot(us_id, 'What is the capital of India?')



Answer:


The capital of India is New Delhi.

Sources:
page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'

page_content="Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Sin

In [None]:
r = conv_rag_chatbot(us_id, 'Tell me more about it')



Answer:


New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is known for its numerous monuments. The city covers an area of about 42.7 km² and has a population of approximately 9.4 million people. New Delhi is situated in the North Indian zone and is considered an expensive place to live.

Sources:
page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'

page_content='Rajpath (, meaning "King\'s Way") is the national boulevard of India. It is in New Delhi, the capital of India. The boulevard starts at the home of the President of India and ends at the National Stadium.'

page_content="Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United

{'input': 'Tell me more about it',
 'chat_history': [HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='Tell me more about it'),
  AIMessage(content='New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several significant monuments. The city covers an area of about 42.7 km² and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and is located in the North Indian zone.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.')],
 'context': [Document(page_content='New Delhi () is the capital of In

In [None]:
us_id = 'jim003'
r = conv_rag_chatbot(us_id, 'What is the fastest animal on land?')



Answer:


The fastest land animal is the cheetah, which can run up to 112 kilometers per hour for a short distance.

Sources:
page_content='A cheetah ("Acinonyx jubatus") is a medium large cat which lives in Africa. It is the fastest land animal and can run up to 112 kilometers per hour for a short time. Most cheetahs live in the savannas of Africa. There are a few in Asia. Cheetahs are active during the day, and hunt in the early morning or late evening.'

page_content='The South African Cheetah ("Acinonyx jubatus jubatus"), also known as Namibian Cheetah, is the nominate subspecies of cheetah native to Southern Africa. It is the most abundant subspecies estimated at more than 6,000 individuals in the wild. Since 1990 and onwards, the population was estimated at approximately 2,500 individuals in Namibia, until 2015, the cheetah population has been increased to more than 3,500 in the country.'

page_content='The flying snake, or "Chrysopelea", is a mildly venomous snake found throughout India to the Indonesian archipelago. It can glide, in an arboreal habitat, going from tree to tree, most likely, m

{'input': 'What is the fastest animal on land?',
 'chat_history': [HumanMessage(content='What is the fastest animal on land?'),
  AIMessage(content='The fastest land animal is the cheetah, which can run up to 112 kilometers per hour for a short time.'),
  HumanMessage(content='tell me more about its different species'),
  AIMessage(content='There are several subspecies of cheetah:\n\n1. **South African Cheetah (Acinonyx jubatus jubatus)**: The most abundant subspecies, native to Southern Africa, with an estimated population of over 6,000 individuals in the wild.\n\n2. **Asiatic Cheetah (Acinonyx jubatus venaticus)**: A critically endangered subspecies found in Asia, with a very limited population.\n\nThese subspecies differ in their habitats and conservation status, with the South African Cheetah being more numerous compared to the critically endangered Asiatic Cheetah.')],
 'context': [Document(page_content='A cheetah ("Acinonyx jubatus") is a medium large cat which lives in Africa. I

In [None]:
r = conv_rag_chatbot(us_id, 'tell me more about its different species')



Answer:


The cheetah has several subspecies:

1. **South African Cheetah (Acinonyx jubatus jubatus)**: This is the most common subspecies, primarily found in Southern Africa, with a population exceeding 6,000 individuals.

2. **Asiatic Cheetah (Acinonyx jubatus venaticus)**: This subspecies is critically endangered and native to Asia, with a very small population remaining.

3. **Northeast African Cheetah (Acinonyx jubatus soemmeringii)**: Found in parts of Northeast Africa, this subspecies is also facing threats to its population.

4. **Northwest African Cheetah (Acinonyx jubatus hecki)**: This subspecies is found in the Sahara region and is also endangered.

Each subspecies has unique characteristics and faces different conservation challenges.

Sources:
page_content='A cheetah ("Acinonyx jubatus") is a medium large cat which lives in Africa. It is the fastest land animal and can run up to 112 kilometers per hour for a short time. Most cheetahs live in the savannas of Africa. There are a few in Asia. Cheetahs are active during the day, and hunt in the early morning or late evening.'

page_content='The South African Cheetah ("Acinonyx jubatus jubatus"), also known as Namibian Cheetah, is the nominate subspecies of cheetah native to Southern Africa. It is the most abundant subspecies estimated at more than 6,000 individuals in the wild. Since 1990 and onwards, the population was estimated at approximately 2,500 individuals in Namibia, until 2015, the cheetah population has been increased to more than 3,500 in the country.'

page_content='The Asiatic cheetah ("Acinonyx jubatus venaticus") is a critically endangered subspecies of the cheetah native to Asia.'

page_content='The Felinae is a subfamily of the Felidae. It includes sma

{'input': 'tell me more about its different species',
 'chat_history': [HumanMessage(content='What is the fastest animal on land?'),
  AIMessage(content='The fastest land animal is the cheetah, which can run up to 112 kilometers per hour for a short time.'),
  HumanMessage(content='tell me more about its different species'),
  AIMessage(content='There are several subspecies of cheetah:\n\n1. **South African Cheetah (Acinonyx jubatus jubatus)**: The most abundant subspecies, native to Southern Africa, with an estimated population of over 6,000 individuals in the wild.\n\n2. **Asiatic Cheetah (Acinonyx jubatus venaticus)**: A critically endangered subspecies found in Asia, with a very limited population.\n\nThese subspecies differ in their habitats and conservation status, with the South African Cheetah being more numerous compared to the critically endangered Asiatic Cheetah.'),
  HumanMessage(content='What is the fastest animal on land?'),
  AIMessage(content='The fastest land animal i

In [None]:
us_id = 'bond007'
r = conv_rag_chatbot(us_id, 'What about the financial capital of India?')



Answer:


The financial capital of India is Mumbai. It is home to the National Stock Exchange of India (NSE), which is the largest stock exchange in India and the third largest in the world by transaction volume. Mumbai is also the headquarters for many major financial institutions, banks, and corporations, making it a key economic hub in the country.

Sources:
page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'

page_content='The National Stock Exchange of India Limited (NSE), is a Mumbai-based stock exchange. It is the biggest stock exchange in India and the third biggest in the world in terms of amounts of transactions. NSE is mutually-owned by a set of leading financial institutions, banks, insurance companies and other financial intermediaries in India but its ownership and management operate as separate groups. As of 2006, the NSE VSAT terminals, 2799 in total, cover more than 1500 cities across India. In July 2007, the NSE had a total market capitalization of 42,74,509 crore INR making it the second-largest stock mark

{'input': 'What about the financial capital of India?',
 'chat_history': [HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='Tell me more about it'),
  AIMessage(content='New Delhi is the capital of India and a union territory within the megacity of Delhi. It has a rich history and is home to several significant monuments. The city covers an area of about 42.7 km² and has a population of approximately 9.4 million people. New Delhi is known for its high cost of living and is located in the North Indian zone.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='What is the capital of India?'),
  AIMessage(content='The capital of India is New Delhi.'),
  HumanMessage(content='Tell me more about it'),
