In [1]:
import dotenv
dotenv.load_dotenv()

True

# Retrieval Augmented Generation

In this notebook we will build a chatbot that holds a conversation with a user and answers questions about a specific topic by retrieving relevant information from a knowledge base.

We will use samples from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. Each sample contains an article and its highlights, so we will use the articles as our knowledge base and the highlights to come up with questions to evaluate our chain.

Let's look at a sample from the dataset:

In [2]:
import pandas as pd

df = pd.read_csv('data/data.csv')

df.head()

Unnamed: 0,article,highlights,id
0,"LONDON, England (Reuters) -- Harry Potter star...",Harry Potter star Daniel Radcliffe gets £20M f...,42c027e4ff9730fbb3de84c1af0d2c506e41c3e4
1,Editor's note: In our Behind the Scenes series...,Mentally ill inmates in Miami are housed on th...,ee8871b15c50d0db17b0179a6d2beab35065f1e9
2,"MINNEAPOLIS, Minnesota (CNN) -- Drivers who we...","NEW: ""I thought I was going to die,"" driver sa...",06352019a19ae31e527f37f7571c6dd7f0c5da37
3,WASHINGTON (CNN) -- Doctors removed five small...,"Five small polyps found during procedure; ""non...",24521a2abb2e1f5e34e6824e0f9e56904a2b0e88
4,(CNN) -- The National Football League has ind...,"NEW: NFL chief, Atlanta Falcons owner critical...",7fe70cc8b12fab2d0a258fababf7d9c6b5e1262a


In [3]:
print(df['highlights'][0])
print("\n")
print(df['article'][0][:200])

Harry Potter star Daniel Radcliffe gets £20M fortune as he turns 18 Monday .
Young actor says he has no plans to fritter his cash away .
Radcliffe's earnings from first five Potter films have been held in trust fund .


LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on 


## Knowledge Base aka Vector Database

We will use the `OpenAI` embeddings. Although the embedding model accepts inputs of up to ~8K tokens, it is beneficial to split texts into smaller chunks: this produces more meaningful embeddings and lowers token usage when we insert the retrieved chunks into the prompt together with the question to produce the answer.

When splitting the texts we can set an overlap parameter, which helps ensure that consecutive chunks, and the sentences cut at their boundaries, keep more of their surrounding context.
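
To make the effect of the overlap concrete, here is a minimal sketch of fixed-size character windows that share a few characters with their neighbours. It is a simplification: the function name `split_with_overlap` is hypothetical, and the `RecursiveCharacterTextSplitter` used later in this notebook prefers to split on separators rather than at fixed offsets.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into character windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last four characters of the previous one, so a sentence cut at a boundary still appears whole in at least one of the two chunks when it is shorter than the overlap.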

We will store the vectors using Meta's `FAISS` library, which provides fast similarity search.
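
Conceptually, a similarity search ranks the stored vectors by their distance to the query vector and returns the closest ones. The sketch below does this exhaustively in plain Python with Euclidean distance. It only illustrates the idea and does not show how `FAISS` works internally; the three-dimensional vectors and chunk labels are invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the k stored texts whose vectors are closest to query_vec."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones have hundreds or thousands of dimensions.
toy_index = [
    ("chunk about Radcliffe", [0.9, 0.1, 0.0]),
    ("chunk about the NFL", [0.0, 0.8, 0.2]),
    ("chunk about Iraq", [0.1, 0.1, 0.9]),
]

top = similarity_search([1.0, 0.0, 0.0], toy_index, k=2)
print(top)
# ['chunk about Radcliffe', 'chunk about Iraq']
```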

Our implementation will follow `ConversationalRetrievalChain`, which is one of the predefined chains in LangChain, but we will build it ourselves using LangChain Expression Language.

In [4]:
from langchain.chains import ConversationalRetrievalChain

### High level overview



<div style="display: flex; justify-content: space-between;">
    <div style="flex: 1; text-align: left;">
    <div style="flex: 1; text-align: center;">
<strong>What a user will see: </strong>
</br> </br>
user: Question 1 
</br>
bot: Answer 1 
</br> </br>
user: Question 2
</br>
bot: Answer 2
</br> </br>
user: Question 3
</br>
bot: ...
    </div>
    </div>
    <div style="flex: 1; text-align: right;">
    <div style="flex: 1; text-align: center;">
<strong>What our chain will do to generate an answer:</strong>
</br> </br>
Step 1: Condensing the chat history and the current question into a standalone question:

`{some system prompt}`
</br>
</br>
`{chat_history}` 
</br>
</br>
`{Question 3}`
</br> 
</br>
`standalone question = ...`
</br> 
</br>
Step 2: Retrieving relevant information from the VectorDB

`vectordb.similarity_search(standalone question)`
</br> </br>
Step 3: Generating an answer using the retrieved information

`{some system prompt}`
</br>
</br>
`{context}`
</br>
</br>
`{standalone question}`
</br> 
</br>
`Answer 3 = ... `
    </div>
    </div>
</div>
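
The three steps can be sketched as a plain Python function that takes each step as a callable. This is only an outline of the data flow, not the LangChain implementation built later in this notebook: `answer_turn` and the `fake_*` helpers are hypothetical stand-ins so that the sketch runs without an LLM or a vector store.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (user question, bot answer) pairs


def answer_turn(
    question: str,
    history: History,
    condense: Callable[[History, str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], str],
) -> str:
    standalone = condense(history, question)   # Step 1: standalone question
    context = "\n".join(retrieve(standalone))  # Step 2: retrieve relevant chunks
    answer = generate(context, standalone)     # Step 3: answer from the context
    history.append((question, answer))         # remember the turn for the next one
    return answer


# Hypothetical stand-ins for the LLM calls and the vector store.
def fake_condense(history: History, question: str) -> str:
    # A real implementation would ask the LLM; here we prepend the last user question.
    return question if not history else f"{history[-1][0]} {question}"


def fake_retrieve(standalone: str) -> List[str]:
    return [f"<chunk matching: {standalone}>"]


def fake_generate(context: str, standalone: str) -> str:
    return f"answer to '{standalone}' using {context}"


history: History = []
first = answer_turn("Q1?", history, fake_condense, fake_retrieve, fake_generate)
second = answer_turn("Q2?", history, fake_condense, fake_retrieve, fake_generate)
print(second)
# answer to 'Q1? Q2?' using <chunk matching: Q1? Q2?>
```

Note that only the standalone question reaches the retrieval and generation steps; the raw chat history is used in Step 1 alone.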

### Why do we need to condense the chat history and the current question into a standalone question?

Imagine a sample conversation:
<div style="flex: 1; text-align: center;">
user: How much did Daniel Radcliffe got as he turned 18?
</br>
bot: £20M
</br>
user: What was the reason?
</div>

If we used just "What was the reason?" as the query for similarity search, we would get a lot of irrelevant results. Concatenating the whole history with the current question may not be optimal in the long run either, because the topic can change and the history adds noise. That is why we prompt the model to rephrase the history and the current question into a single standalone question.

Let's see how it works:

In [5]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema.messages import (
    HumanMessage, 
    AIMessage, 
    SystemMessage
)
from langchain.schema.output_parser import StrOutputParser
from langchain.memory import ConversationBufferMemory


CONDENSE_QUESTION_PROMPT = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content="Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question."),
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("Follow Up Input:\n{question}\nStandalone question:\n"),
    ]
)

condense_question_chain = CONDENSE_QUESTION_PROMPT | ChatOpenAI(model='gpt-3.5-turbo', temperature=0.0) | StrOutputParser()

memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")
memory.save_context({"question": "How much did Daniel Radcliffe got as he turned 18?"}, {"answer": "£20M"})

condense_question_chain.invoke({
    "chat_history": memory.load_memory_variables("chat_history")["chat_history"],
    # or just
    # **memory.load_memory_variables("chat_history"),
    "question": "What was the reason?",
})

'What was the reason for Daniel Radcliffe receiving £20M as he turned 18?'

## Vector Database

To create a Vector Database with our articles we need to:
1. Split the articles into chunks
2. Create embeddings for each chunk
3. Store the embeddings in the Vector Database

We will also cache the embeddings, so if you rerun this notebook with the same data and chunk parameters, they will not need to be recomputed.
The cell below keeps the cache on disk, using `LocalFileStore` with the `./.cache/` directory.
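
The idea behind such a cache can be sketched in a few lines: derive a key from the chunk text, and call the embedding model only when that key has not been seen before. The class below is a hypothetical illustration, not LangChain's `CacheBackedEmbeddings`; a plain dict stands in for the storage backend and `toy_embed` replaces the real embedding call.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Wrap an embedding function with a cache keyed by a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], List[float]], namespace: str):
        self._embed_fn = embed_fn
        self._namespace = namespace
        self._store: Dict[str, List[float]] = {}  # stand-in for a storage backend
        self.misses = 0  # number of times the underlying model was called

    def _key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self._namespace}:{digest}"

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self._store:
                self._store[key] = self._embed_fn(text)
                self.misses += 1
            vectors.append(self._store[key])
        return vectors


def toy_embed(text: str) -> List[float]:
    # Hypothetical embedding: a real model returns a high-dimensional vector.
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(toy_embed, namespace="RAG_embeddings")
embedder.embed_documents(["first chunk", "second chunk"])
embedder.embed_documents(["first chunk", "third chunk"])  # "first chunk" is a cache hit
print(embedder.misses)
# 3
```

Because the key depends only on the text and the namespace, changing the chunk size or overlap produces different chunks and therefore new cache entries.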

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 512,
    chunk_overlap  = 128,
)

print(f"Starting with {len(df)} articles")
docs = text_splitter.create_documents(df['article'].values)
print(f"Split articles into {len(docs)} documents")

Starting with 30 articles
Split articles into 258 documents


In [7]:
from langchain.vectorstores import FAISS
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain.storage import LocalFileStore

embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=OpenAIEmbeddings(),
    document_embedding_cache=LocalFileStore('./.cache/'),
    namespace="RAG_embeddings"
)

vecstore = FAISS.from_documents(
    documents=docs,
    embedding=embeddings,
)

In [8]:
vecstore.similarity_search("What was the reason for Daniel Radcliffe receiving £20M as he turned 18?", k=4)

[Document(page_content='LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won\'t cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don\'t plan to be one of those people who, as soon as they'),
 Document(page_content='-- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office chart. Details of how he\'ll mark his landmark birthday are under wraps. His agent and publicist had no comment on his plans. "I\'ll definitely have some sort of party," he said in an interview. "Hopefully none of you will

As we can see, the answer is present only in the 4th document.

In [9]:
vecstore.similarity_search("What was the reason?", k=4)

[Document(page_content='security and stability, they said. There was running water and electricity most of the time. But still life was tough under the dictator, like the time when Zainab\'s uncle disappeared and was never heard from again after he read a "religious book," she said. Sitting in the parking lot of a Target in suburban Los Angeles, Youssif\'s father watched as husbands and wives, boyfriends and girlfriends, parents and their children, came and went. Some held hands. Others smiled and laughed. "Iraq finished," he said'),
 Document(page_content='boy is first and foremost. "I will do anything for Youssif," his father said, pulling his son closer to him. "Our child is everything." His mother tried to coax Youssif to talk to us on this day. But he didn\'t want to; his mother says he\'s shy outside of their home. The biggest obstacle now is getting the visas to leave, and the serious security risks they face every day and hour they remain in Iraq. But this family -- which saw t

Let's try searching for something from another article:

In [10]:
print(df.iloc[4]['highlights'])

NEW: NFL chief, Atlanta Falcons owner critical of Michael Vick's conduct .
NFL suspends Falcons quarterback indefinitely without pay .
Vick admits funding dogfighting operation but says he did not gamble .
Vick due in federal court Monday; future in NFL remains uncertain .


In [11]:
vecstore.similarity_search("For how long are the Atlanta Falcons suspended?", k=4)

[Document(page_content='that he and two co-conspirators killed dogs that did not fight well. Falcons owner Arthur Blank said Vick\'s admissions describe actions that are "incomprehensible and unacceptable." The suspension makes "a strong statement that conduct which tarnishes the good reputation of the NFL will not be tolerated," he said in a statement.  Watch what led to Vick\'s suspension » . Goodell said the Falcons could "assert any claims or remedies" to recover $22 million of Vick\'s signing bonus from the 10-year, $130'),
 Document(page_content='(CNN)  -- The National Football League has indefinitely suspended Atlanta Falcons quarterback Michael Vick without pay, officials with the league said Friday. NFL star Michael Vick is set to appear in court Monday. A judge will have the final say on a plea deal. Earlier, Vick admitted to participating in a dogfighting ring as part of a plea agreement with federal prosecutors in Virginia. "Your admitted conduct was not only illegal, but a

### Question answering with context

The vector database in LangChain returns a list of documents. We have to extract their text and provide it as context to the model together with the question.

The simplest way to do this is to concatenate all the documents and pass the result as the context.
A more advanced solution, at the cost of an additional LLM call, would be to first ask the model to summarize the documents and extract the parts most relevant to the question.
We will go with the first approach for now.

In [12]:
from operator import itemgetter
from langchain.schema.runnable import RunnableLambda, RunnableMap

get_context_chain = {
    "documents": lambda inputs: vecstore.similarity_search(inputs["standalone_question"], k=4)
} | RunnableLambda(lambda inpts: "\n".join([doc.page_content for doc in inpts["documents"]]))

context_resp = get_context_chain.invoke({"standalone_question": "What was the reason for Daniel Radcliffe receiving £20M as he turned 18?"})
print(context_resp)

LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they
-- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office chart. Details of how he'll mark his landmark birthday are under wraps. His agent and publicist had no comment on his plans. "I'll definitely have some sort of party," he said in an interview. "Hopefully none of you will be reading about it." Radcliffe's earnings from the fi

In [13]:
QA_WITH_CONTEXT_PROMPT = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template("""\
Use the following pieces of context to answer the users question. \
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context:
{context}"""),
        HumanMessagePromptTemplate.from_template("{standalone_question}"),
    ]
)

qa_with_context_chain = QA_WITH_CONTEXT_PROMPT | ChatOpenAI(model='gpt-3.5-turbo', temperature=0.0) | StrOutputParser()

qa_with_context_chain.invoke({
    "standalone_question": "What was the reason for Daniel Radcliffe receiving £20M as he turned 18?",
    "context": context_resp
})

'Daniel Radcliffe received £20 million as he turned 18 because it was his accumulated earnings from starring in the first five Harry Potter films.'

# Chain it all together

As an exercise, your task is to chain all of those components into one.
In contrast to notebook 02, where we had a global memory object, this time we want a separate memory per conversation.

To do so, we pass the memory to the chain as part of its input and extract it from the chain's output.
To make calling the chain more convenient, we will wrap it in a function that creates a new memory object per chain.
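
The wrapping relies on a Python closure: every call to the factory creates a fresh state object that only the returned function can reach. Here is a minimal sketch of the pattern, with a plain list standing in for the memory object and a hypothetical `get_new_conversation` factory instead of the real chain.

```python
from typing import Callable, List, Tuple


def get_new_conversation() -> Callable[[str], str]:
    history: List[Tuple[str, str]] = []  # private to this conversation

    def ask(question: str) -> str:
        # Hypothetical stand-in for the real chain: number the answers per conversation.
        answer = f"answer #{len(history) + 1}"
        history.append((question, answer))
        return answer

    return ask


conv_a = get_new_conversation()
conv_b = get_new_conversation()

conv_a("Question 1")
second_a = conv_a("Question 2")
first_b = conv_b("Question 1")
print(second_a, "|", first_b)
# answer #2 | answer #1
```

The two conversations do not share history, even though both were produced by the same factory function.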

In [14]:
from typing import Dict, Any


def step_1_condense_question(inputs: Dict[str, Any]) -> Dict[str, Any]:
    chat_history = ...
    question = ...

    results = condense_question_chain.invoke(...)

    return {
        "standalone_question": ...
    }

def step_2_get_context(inputs: Dict[str, Any]) -> Dict[str, Any]:
    standalone_question = ...

    results = get_context_chain.invoke(...)

    return {
        "context": ...,
        "standalone_question": standalone_question
    }

def step_3_qa_with_context(inputs: Dict[str, Any]) -> Dict[str, Any]:
    standalone_question = ...
    context = ...

    results = qa_with_context_chain.invoke(...)

    return {
        "answer": ...
    }

def update_memory(inputs: Dict[str, Any]) -> Dict[str, Any]:
    answer = ...
    question = ...
    memory = ...

    memory.save_context({"question": question}, {"answer": answer})

    return {
        "answer": answer,
        "memory": memory
    }

def get_new_chain():

    memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")

    qa_with_rag_chain = RunnableMap({
        "answer": RunnableLambda(step_1_condense_question) | RunnableLambda(step_2_get_context) | RunnableLambda(step_3_qa_with_context),
        "question": itemgetter("question"),
        "memory": itemgetter("memory")
    }) | update_memory

    def wrapper(question):    
        results = qa_with_rag_chain.invoke({
            "question": question,
            "memory": memory
        })

        return results["answer"]

    return wrapper


## Solution

In [15]:
from typing import Dict, Any


def step_1_condense_question(inputs: Dict[str, Any]) -> Dict[str, Any]:
    chat_history = inputs['memory'].load_memory_variables("chat_history")["chat_history"]
    question = inputs['question']

    results = condense_question_chain.invoke({
        "chat_history": chat_history,
        "question": question
    })

    return {
        "standalone_question": results
    }

def step_2_get_context(inputs: Dict[str, Any]) -> Dict[str, Any]:
    standalone_question = inputs['standalone_question']

    results = get_context_chain.invoke({
        "standalone_question": standalone_question
    })

    return {
        "context": results,
        "standalone_question": standalone_question
    }

def step_3_qa_with_context(inputs: Dict[str, Any]) -> str:
    standalone_question = inputs['standalone_question']
    context = inputs['context']

    results = qa_with_context_chain.invoke({
        "standalone_question": standalone_question,
        "context": context
    })

    return results

def update_memory(inputs: Dict[str, Any]) -> Dict[str, Any]:
    answer = inputs['answer']
    question = inputs['question']
    memory = inputs['memory']

    memory.save_context({"question": question}, {"answer": answer})

    return {
        "answer": answer,
        "memory": memory
    }

def get_new_chain():

    memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")

    qa_with_rag_chain = RunnableMap({
        "answer": RunnableLambda(step_1_condense_question) | RunnableLambda(step_2_get_context) | RunnableLambda(step_3_qa_with_context),
        "question": itemgetter("question"),
        "memory": itemgetter("memory")
    }) | update_memory

    def wrapper(question):    
        results = qa_with_rag_chain.invoke({
            "question": question,
            "memory": memory
        })

        return results["answer"]
    
    return wrapper


In [16]:
chain = get_new_chain()

In [17]:
chain("How much did Daniel Radcliffe got as he turned 18?")

'Daniel Radcliffe gained access to a reported £20 million ($41.1 million) fortune when he turned 18.'

In [18]:
chain("What was the reason?")

'Daniel Radcliffe gained access to his reported £20 million ($41.1 million) fortune when he turned 18 because his earnings from the first five Harry Potter films were held in a trust fund, which he was not able to touch until he reached that age.'

In [19]:
chain.__closure__[0].cell_contents

ConversationBufferMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content='How much did Daniel Radcliffe got as he turned 18?'), AIMessage(content='Daniel Radcliffe gained access to a reported £20 million ($41.1 million) fortune when he turned 18.'), HumanMessage(content='What was the reason?'), AIMessage(content='Daniel Radcliffe gained access to his reported £20 million ($41.1 million) fortune when he turned 18 because his earnings from the first five Harry Potter films were held in a trust fund, which he was not able to touch until he reached that age.')]), return_messages=True, memory_key='chat_history')

In [20]:
chain2 = get_new_chain()

In [21]:
chain2.__closure__[0].cell_contents

ConversationBufferMemory(return_messages=True, memory_key='chat_history')