# bRAG: Query Transformations with Multi-Query

Query transformations are a set of techniques that rewrite or otherwise modify a user's question before retrieval.

![overview](./image/query-overview.png)

## Pre-requisites (optional but recommended)

### Only do the first step if you have never created a virtual environment for this repository. Otherwise, make sure that the Python Kernel that you selected is from your `venv/` folder.

In [96]:
# Create virtual environment
! python3 -m venv ../venv

In [97]:
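# Activate virtual Python environment
# Note: each `!` command runs in its own subshell, so this does not change the notebook kernel;
# select the venv kernel in your IDE instead (see the next cell).
! source ../venv/bin/activate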

In [98]:
# If your Python is not from your venv path, ensure that your IDE's kernel selection (on the top right corner) is set to the correct path 
# (your path output should contain "...venv/bin/python")

! which python

/Users/taha/Desktop/bRAGAI/code/gh/bRAG-langchain/venv/bin/python


In [99]:
# Install all packages
! pip3 install -r ../requirements.txt --quiet

### * If you choose to skip the pre-requisites and install only the packages specific to this notebook using your global Python path environment, execute the command below; otherwise, proceed to the next step.

In [100]:
! pip3 install --quiet pinecone-client python-dotenv langchain langchain-community langchain-core langchain-openai beautifulsoup4 tiktoken pypdf

## Environment

`(1) Packages`

In [17]:
import os
from dotenv import load_dotenv

# Load all environment variables from .env file
load_dotenv()

# Access the environment variables
langchain_tracing_v2 = os.getenv('LANGCHAIN_TRACING_V2')
langchain_endpoint = os.getenv('LANGCHAIN_ENDPOINT')
langchain_api_key = os.getenv('LANGCHAIN_API_KEY')

## LLM
openai_api_key = os.getenv('OPENAI_API_KEY')

## Pinecone Vector Database
pinecone_api_key = os.getenv('PINECONE_API_KEY')
pinecone_api_host = os.getenv('PINECONE_API_HOST')
index_name = os.getenv('PINECONE_INDEX_NAME')


`(2) LangSmith`

https://docs.smith.langchain.com/

In [18]:
os.environ['LANGCHAIN_TRACING_V2'] = langchain_tracing_v2
os.environ['LANGCHAIN_ENDPOINT'] = langchain_endpoint
os.environ['LANGCHAIN_API_KEY'] = langchain_api_key

`(3) API Keys`

In [19]:
os.environ['OPENAI_API_KEY'] = openai_api_key
openai_model = "gpt-3.5-turbo"

# Pinecone keys
os.environ['PINECONE_API_KEY'] = pinecone_api_key
os.environ['PINECONE_API_HOST'] = pinecone_api_host
os.environ['PINECONE_INDEX_NAME'] = index_name
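
One caveat with the cells above: `os.getenv` returns `None` when a variable is missing from `.env`, and assigning `None` into `os.environ` raises a `TypeError`. A minimal sketch of a guard follows. The helper name `export_if_set` is hypothetical and not part of the original notebook, and a plain dict stands in for `os.environ` in the demo.

```python
import os

def export_if_set(name, value, env=os.environ):
    """Copy value into env under name, skipping unset (None) values.

    Hypothetical helper for illustration; returns True when the variable was set.
    """
    if value is None:
        print(f"Warning: {name} is not set; skipping")
        return False
    env[name] = value
    return True

# Usage with a plain dict standing in for os.environ
demo_env = {}
export_if_set("OPENAI_API_KEY", "example-key", env=demo_env)
export_if_set("LANGCHAIN_ENDPOINT", None, env=demo_env)
print(demo_env)
```

With a guard like this, optional settings such as the LangSmith variables can be left out of `.env` without breaking the setup cells.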

`(4) Pinecone Init`

In [20]:
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index = pc.Index(os.environ['PINECONE_INDEX_NAME'])

## Multi Query RAG Architecture

Flow:

![multi-query](./image/multi-query.png)

Docs:

* https://python.langchain.com/docs/how_to/MultiQueryRetriever/

### Index

In [55]:
# Load PDF document
import bs4
from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone
from pprint import pprint

#### INDEXING ####

# Load Document (Uploading one file at a time)
pdf_file_path = "../test/langchain_turing.pdf"
loader = PyPDFLoader(pdf_file_path)

docs = loader.load()

# Upload multiple PDF files from a directory
# pdf_file_paths = <enter your path here>
# loader = PyPDFDirectoryLoader(pdf_file_paths)

# docs_dir = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=2000, 
    chunk_overlap=500)

# Make splits
splits = text_splitter.split_documents(docs)

# Index
vectorstore = Pinecone.from_documents(
    documents=splits, 
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"), 
    index_name=index_name
)

retriever = vectorstore.as_retriever()
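
To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a simplified sketch of overlapping windows over a token list. It is only an illustration: `RecursiveCharacterTextSplitter` splits on separators recursively and measures length in tiktoken tokens, so its chunk boundaries will differ. The function `sliding_chunks` is hypothetical.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = list(range(10))
chunks = sliding_chunks(tokens, chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each window repeats the last token of the previous one, so text that falls on a chunk boundary appears in both neighbouring chunks. The notebook uses the same idea at a larger scale (2000 tokens per chunk, 500 shared).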

Make sure your Pinecone index contains the uploaded file.

![Pinecone](./image/pinecone.png)


### Prompt

In [56]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_perspectives 
    | ChatOpenAI(model_name=openai_model, temperature=0.1) 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)
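
Splitting on `"\n"` alone can leave empty strings (from blank lines) and list markers such as `1.` in the generated queries. A minimal sketch of a more defensive parser, assuming the model returns one query per line; `parse_queries` is a hypothetical helper, not part of the original notebook:

```python
import re

def parse_queries(raw_output):
    """Split LLM output into one query per line, dropping blanks and list markers."""
    queries = []
    for line in raw_output.split("\n"):
        # Remove leading numbering or bullets such as "1. ", "2) " or "- "
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries

raw = "1. What is LangGraph?\n\n2. What is LangSmith?\n- What is LangServe?\n"
print(parse_queries(raw))
```

A function like this could take the place of the `lambda` at the end of the chain, since a plain function piped into a chain is wrapped as a runnable step.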

In [57]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "How does LangChain leverage modular components like LangGraph, LangSmith, and LangServe to address challenges in building scalable and secure LLM-powered applications?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question})
len(docs)

3
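
Note that `list(set(...))` in `get_unique_union` does not preserve retrieval order. If order matters, an order-preserving de-duplication is a small change. The sketch below uses plain strings as stand-ins for the serialized documents produced by `dumps(doc)`; `unique_in_order` is a hypothetical name.

```python
def unique_in_order(nested_items):
    """Flatten a list of lists and keep the first occurrence of each item."""
    flattened = [item for sublist in nested_items for item in sublist]
    # dict preserves insertion order (Python 3.7+), so this de-duplicates stably
    return list(dict.fromkeys(flattened))

results_per_query = [
    ["chunk-a", "chunk-b"],
    ["chunk-b", "chunk-c"],
    ["chunk-a", "chunk-c"],
]
print(unique_in_order(results_per_query))
```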

In [58]:
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(model_name=openai_model, temperature=0.1)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

print(final_rag_chain.invoke({"question":question}))

LangChain leverages modular components like LangGraph, LangSmith, and LangServe to address challenges in building scalable and secure LLM-powered applications by providing developers with a comprehensive toolkit for simplifying the complexities of working with LLMs. These components enable developers to configure, extend, and deploy applications tailored to specific needs, facilitating the development of stateful, contextually aware applications. LangGraph aids in stateful process modeling, LangServe allows for scalable API deployment, and LangSmith provides monitoring and evaluation capabilities. Additionally, LangChain's emphasis on security through granular permissions, sandboxing, defense in depth, auditability, and monitoring helps mitigate risks associated with data exposure and dependency vulnerabilities. The framework also supports a wide range of third-party integrations, allowing for custom component development and additional functionality, ensuring flexibility and adaptabil
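
In the chain above, the `{context}` slot receives the retrieved `Document` objects as-is, so their string representation (likely including metadata) is what ends up in the prompt. A common alternative is to join only the page text. A minimal sketch, where `FakeDoc` is a stand-in for a LangChain `Document` and `format_docs` is a hypothetical helper:

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is modeled."""
    page_content: str

def format_docs(docs, separator="\n\n"):
    """Join the text of each document into one context string."""
    return separator.join(doc.page_content for doc in docs)

docs_demo = [
    FakeDoc("LangGraph models stateful processes."),
    FakeDoc("LangServe deploys chains as REST APIs."),
]
print(format_docs(docs_demo))
```

Piping the retrieval chain into such a function before the prompt would keep the context shorter and free of metadata noise.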

## RAG-Fusion

RAG-Fusion extends Multi-Query retrieval with a reranking step. As in Multi-Query, an LLM rewrites the user question into several related search queries, and each query is run against the retriever independently. Where Multi-Query simply takes the unique union of the retrieved documents, RAG-Fusion merges the ranked result lists with Reciprocal Rank Fusion (RRF): each document earns a score of 1 / (rank + k) from every list it appears in, and the scores are summed. Documents that rank highly for several queries rise to the top, which makes the final context more robust for multi-faceted questions.

Flow:

![rag-fusion](./image/rag-fusion.png)

Docs:

* https://github.com/langchain-ai/langchain/blob/master/cookbook/rag_fusion.ipynb?ref=blog.langchain.dev

Blog / repo: 

* https://medium.com/towards-data-science/forget-rag-the-future-is-rag-fusion-1147298d8ad1

### Prompt

In [59]:
from langchain.prompts import ChatPromptTemplate

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [60]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_rag_fusion 
    | ChatOpenAI(model=openai_model, temperature=0.1)
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [61]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's contribution using the RRF formula: 1 / (rank + k), where rank is 0-based
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

3
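
To see how the fused scores behave, here is a small worked example of the same scoring rule on plain string IDs instead of serialized documents. It follows the notebook's convention of 0-based ranks from `enumerate` (the RRF formula is often written with ranks starting at 1). The function `rrf_scores` and the sample lists are illustrative only.

```python
def rrf_scores(ranked_lists, k=60):
    """Reciprocal rank fusion over lists of hashable IDs (0-based ranks)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranked_lists = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["b", "a", "d"],
]
fused = rrf_scores(ranked_lists)
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Document `b` comes out on top because it is ranked first in two lists and second in the third, while `d` appears only once and lands last. With `k=60` the scores are close together, so the ordering depends on consistent placement across lists, not on a single top hit.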

In [63]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

pprint(final_rag_chain.invoke({"question":question}))

('LangChain leverages modular components like LangGraph, LangSmith, and '
 'LangServe to address challenges in building scalable and secure LLM-powered '
 'applications by providing developers with a comprehensive toolkit for '
 'building, deploying, and monitoring applications. LangGraph enables stateful '
 'process modeling, LangServe facilitates the deployment of LLM applications '
 'as scalable REST APIs, and LangSmith provides monitoring and evaluation '
 'capabilities. These components work together to enable developers to build '
 'context-aware applications tailored to specific needs across diverse '
 'domains, including NLP, cybersecurity, healthcare, finance, and customer '
 "service. Additionally, LangChain's emphasis on flexibility allows for custom "
 'component development and integration with third-party tools, enhancing the '
 'functionality and adaptability of applications. The security features in '
 'LangChain, such as granular permissions, sandboxing, real-time moni

## RAG Decomposition Architecture

RAG Decomposition breaks a complex query into simpler, manageable sub-queries. Each sub-query focuses on one part of the larger question and gets its own retrieval step (in this notebook, against the same Pinecone-backed retriever). The sub-answers are then combined and synthesized into a single answer to the original query. Because each sub-query targets a narrower context, retrieval tends to return less noise and more relevant passages. Decomposition is particularly useful for multi-part questions, complex topics, or scenarios requiring in-depth, granular answers.

### Answer recursively  

![answer-recursively](./image/answer-recursively.png)

Papers:

* https://arxiv.org/pdf/2205.10625.pdf
* https://arxiv.org/abs/2212.10509

In [64]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [67]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# LLM
llm = ChatOpenAI(model=openai_model, temperature=0.1)

# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What role does LangChain's Retrieval-Augmented Generation (RAG) pipeline play in improving the accuracy and relevance of LLM responses?"
questions = generate_queries_decomposition.invoke({"question":question})

In [68]:
questions

["1. How does LangChain's Retrieval-Augmented Generation (RAG) pipeline incorporate external knowledge sources to enhance the accuracy of LLM responses?",
 "2. What specific techniques are used in LangChain's RAG pipeline to ensure the relevance of LLM responses to the input query?",
 "3. Can LangChain's RAG pipeline be customized or fine-tuned to improve the accuracy and relevance of LLM responses for specific domains or industries?"]

In [69]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here are any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [70]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format Q and A pair"""
    
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

# llm
llm = ChatOpenAI(model_name=openai_model, temperature=0.1)

# Build the chain once; it is reused for every sub-question
rag_chain = (
    {"context": itemgetter("question") | retriever,
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")}
    | decomposition_prompt
    | llm
    | StrOutputParser()
)

q_a_pairs = ""
for q in questions:
    # Answer the sub-question, passing all earlier Q+A pairs as background
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
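
The key property of the loop above is that each sub-question sees the question-answer pairs produced before it. The sketch below isolates that accumulation pattern with a stub in place of the RAG chain, so it runs without an LLM or a vector store. Both `answer_recursively` and `stub_answer` are hypothetical names.

```python
def answer_recursively(sub_questions, answer_fn):
    """Answer sub-questions in order, feeding earlier Q+A pairs into later calls."""
    q_a_pairs = ""
    answer = ""
    for sub_question in sub_questions:
        answer = answer_fn(sub_question, q_a_pairs)
        pair = f"Question: {sub_question}\nAnswer: {answer}"
        q_a_pairs = q_a_pairs + "\n---\n" + pair
    return answer, q_a_pairs

def stub_answer(sub_question, background):
    """Hypothetical stand-in for the RAG chain: reports how many prior pairs it saw."""
    seen = background.count("Question:")
    return f"answer to '{sub_question}' using {seen} earlier pair(s)"

final_answer, history = answer_recursively(["q1", "q2", "q3"], stub_answer)
print(final_answer)
```

Only the last answer is returned as the final result, which is why the order of the sub-questions matters in this variant: the final sub-question should be the one whose answer best addresses the original query.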

In [71]:
pprint(answer)

("Yes, LangChain's RAG pipeline can be customized or fine-tuned to improve the "
 'accuracy and relevance of Large Language Model (LLM) responses for specific '
 'domains or industries. The RAG pipeline incorporates techniques such as '
 'Document Loaders and Text Splitters, Embedding Models and Vector Stores, and '
 'Retrievers and RAG Chains to ensure relevance and accuracy in responses. By '
 'customizing the preprocessing of documents, embedding models, and retrieval '
 'of external data sources, developers can tailor the RAG pipeline to specific '
 'domains or industries, enhancing the quality of generated answers. '
 "Additionally, LangChain's architecture supports custom component development "
 'and integrations, allowing for flexibility and adaptability to meet the '
 'requirements of different application scenarios.')


### Answer individually

Alternatively, we can answer each sub-question independently and then pass all of the question-answer pairs to the LLM as context to generate a final answer.

![answer-individually](./image/answer-individually.png)

In [79]:
# Answer each sub-question individually 
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG prompt

template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Context: {context}
Question: {question} 
Answer: 
"""

prompt_rag = ChatPromptTemplate.from_template(template)
# prompt_rag = hub.pull("rlm/rag-prompt")
# template = " {question} "
# prompt_rag = ChatPromptTemplate.from_template(template)

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """RAG on each sub-question"""
    retriever = vectorstore.as_retriever()
    
    # Generate sub-questions with the decomposition chain
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        print("sq :", sub_question)
        
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)
        print("rdocs: ", retrieved_docs)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs, "question": sub_question})
        
        print("q: ", question)
        print("a: ", answer)
        
        rag_results.append(answer)
    
    return rag_results,sub_questions

# Run retrieval and RAG for each sub-question (called directly here; it could also be wrapped in a RunnableLambda)
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

# print("qqq: ", questions, "\naaa: ", answers)

sq : 1. How does LangChain's Retrieval-Augmented Generation (RAG) pipeline incorporate external knowledge sources to enhance the accuracy of LLM responses?
rdocs:  [Document(metadata={'page': 2.0, 'source': '../test/langchain_turing.pdf'}, page_content='LangChain 3\nneeds, providing a flexible foundation for building scalable, secure, and multi-\nfunctional applications. Figure 1 illustrates a fundamental LangChain pipeline.\nIn this architecture, diverse data sources—including documents, text, and im-\nages—are embedded and stored within a vector store. Upon receiving a user’s\nquery, the system retrieves the most relevant information from the vector store.\nThis retrieved context is then provided to the large language model (LLM),\nenhancing its ability to generate accurate and factually grounded responses.\nFig. 1.LangChain pipeline architecture showcasing the retrieval-augmented genera-\ntion process. Documents in various formats (e.g., PDF, text, images) are preloaded\nand embedde

In [80]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

# question2 = "What do you know?"

print("Original Question:\n", question)
print("\nContext (including 3 question:answer pairs from modified query ):\n", context)

print("\nFinal Answer:\n", final_rag_chain.invoke({"context":context,"question":question}))

Original Question:
 What role does LangChain's Retrieval-Augmented Generation (RAG) pipeline play in improving the accuracy and relevance of LLM responses?

Context (including 3 question:answer pairs from modified query ):
 Question 1: 1. How does LangChain's Retrieval-Augmented Generation (RAG) pipeline incorporate external knowledge sources to enhance the accuracy of LLM responses?
Answer 1: LangChain's RAG pipeline embeds diverse data sources into a vector store. When a user submits a query, the system retrieves the most relevant information from this vector store. This retrieved context is then provided to the large language model (LLM), improving its ability to generate accurate and factually grounded responses.

Question 2: 2. What specific techniques does LangChain's RAG pipeline use to ensure the relevance of LLM responses to user queries?
Answer 2: LangChain's RAG pipeline ensures relevance by using Document Loaders and Text Splitters for preprocessing, Embedding Models and Ve

<!-- Trace:

https://smith.langchain.com/public/ed1cabf5-dea0-478b-8088-f7323d938a9b/r -->

## Step Back

Step-back prompting is a retrieval technique in which the LLM first generates a more abstract, higher-level version of the original question. A few-shot prompt guides the formulation of this step-back question. Context is then retrieved independently for both the original question and the step-back question, and both sets of results are passed to the LLM for the final answer. This dual retrieval can yield more robust answers, particularly in domains with substantial conceptual knowledge, such as technical documentation and textbooks, where high-level concepts and their detailed implementations are often described separately.

![step-back](./image/step-back.png)

Paper: 

* https://arxiv.org/pdf/2310.06117.pdf

In [81]:
# Few Shot Examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)

In [83]:
generate_queries_step_back = prompt | ChatOpenAI(model=openai_model, temperature=0.1) | StrOutputParser()
question = "How does LangChain ensure security when integrating external services like vector databases and API providers in LLM applications?"
generate_queries_step_back.invoke({"question": question})

'How does LangChain manage security in integrating external services in LLM applications?'

In [91]:
# Response prompt 
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and should not contradict the following context if it is relevant. Otherwise, ignore the context if it is not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatOpenAI(model=openai_model, temperature=0.1)
    | StrOutputParser()
)

ans = chain.invoke({"question": question})

print(ans.strip())

LangChain ensures security when integrating external services like vector databases and API providers in LLM applications through a combination of best practices and internal controls. 

1. Granular Permissions: LangChain enforces the principle of least privilege by allowing developers to specify limited permissions. This minimizes the risk of unauthorized actions by ensuring that only necessary permissions are granted, reducing the potential attack surface.

2. Sandboxing and Defense in Depth: LangChain utilizes sandboxed environments and layered security measures to protect sensitive data and limit exposure to vulnerabilities. By isolating external services and implementing multiple layers of security, LangChain mitigates the risk of data breaches and unauthorized access.

3. Auditability and Monitoring: LangSmith, a component of LangChain, provides detailed logging and monitoring capabilities. This enables developers to track application usage in real-time and detect anomalies, allo
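
The dictionary at the start of the chain above fans the input out into two retrievals plus a passthrough of the question. The same data flow can be written as plain Python, which may make it easier to follow. In this sketch `fake_step_back` and `fake_retrieve` are hypothetical stand-ins for the LLM rewrite and the vector-store retriever.

```python
def build_step_back_inputs(question, step_back_fn, retrieve_fn):
    """Retrieve context for both the original and the step-back question."""
    step_back_question = step_back_fn(question)
    return {
        "normal_context": retrieve_fn(question),
        "step_back_context": retrieve_fn(step_back_question),
        "question": question,
    }

# Hypothetical stand-ins for the LLM rewrite and the vector-store retriever
def fake_step_back(question):
    return "What security practices does LangChain recommend?"

def fake_retrieve(query):
    return [f"chunk retrieved for: {query}"]

inputs = build_step_back_inputs(
    "How does LangChain secure vector database integrations?",
    fake_step_back,
    fake_retrieve,
)
print(inputs["normal_context"])
print(inputs["step_back_context"])
```

The resulting dictionary has the three keys that the response prompt expects, so the two contexts stay separate until the final generation step.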

## HyDE

![hyde](./image/hyde.png)

Docs: 

* https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb

Paper:

* https://arxiv.org/abs/2212.10496

In [93]:
from langchain.prompts import ChatPromptTemplate

# HyDE document generation
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_docs_for_retrieval = (
    prompt_hyde | ChatOpenAI(model=openai_model, temperature=0.1) | StrOutputParser() 
)

# Run
question = "How can LangChain's memory module be utilized to maintain context across multi-turn conversations in a chatbot application?"
generate_docs_for_retrieval.invoke({"question":question})

"LangChain's memory module can be utilized to maintain context across multi-turn conversations in a chatbot application by storing and retrieving relevant information from previous interactions. This module can store key information such as user preferences, past queries, and any other relevant data that can help the chatbot maintain a coherent conversation flow. By leveraging this memory module, the chatbot can provide more personalized responses and anticipate user needs based on past interactions. Additionally, the memory module can also help the chatbot track the progression of the conversation and ensure that the context is maintained throughout the dialogue. This can lead to a more seamless and engaging user experience, as the chatbot can reference previous interactions and provide more relevant and accurate responses. Overall, the memory module in LangChain can significantly enhance the conversational capabilities of a chatbot application and improve the overall user experience.

In [94]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever 
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

[Document(metadata={'page': 3.0, 'source': '../test/langchain_turing.pdf'}, page_content='4 Vasilios Mavroudis\nMemory: Enables applications to retain information from past interactions,\nsupporting both basic and advanced memory structures. This component is crit-\nical for maintaining context across sessions and delivering contextually aware\nresponses.\nIndexes: Serve as structured databases that organize and store information,\nallowing for efficient data retrieval when processing language queries.\nRetrievers: Designed to work alongside indexes, retrievers fetch relevant data\nbased on query inputs, ensuring that the generated responses are well-informed\nand accurate.\nVector Store: Manages the embedding of words or phrases as numerical vec-\ntors, a core step in capturing semantic meaning and supporting tasks involving\nlanguage understanding and similarity searches.\nOutput Parsers: Components that refine and structure the generated language\noutputs for specific tasks, ensurin
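
The retrieval chain above embeds the generated passage instead of the raw question. The intuition is that a hypothetical answer tends to share more vocabulary and structure with the stored chunks than a short question does. The toy sketch below illustrates the idea with a bag-of-words "embedding" and cosine similarity; it is a stand-in for a real embedding model and vector store, and all names in it are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query_text, corpus):
    """Return the corpus passage most similar to query_text."""
    query_vec = embed(query_text)
    return max(corpus, key=lambda passage: cosine(query_vec, embed(passage)))

corpus = [
    "memory retains information from past interactions to maintain context",
    "retrievers fetch relevant data from indexes based on query inputs",
]
# In HyDE the LLM writes this passage; here it is hard-coded for illustration
hypothetical_doc = "the memory module retains past interactions to maintain context"
print(search(hypothetical_doc, corpus))
```

In the notebook, the OpenAI embedding model and Pinecone play the roles of `embed` and `search`.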

In [95]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})

"LangChain's memory module enables applications to retain information from past interactions, supporting both basic and advanced memory structures. This component is critical for maintaining context across sessions and delivering contextually aware responses in a chatbot application. By storing and retrieving conversation history as needed, the memory module ensures continuity and persistence of context, making it suitable for prolonged, context-dependent conversations in multi-turn interactions."

## Conclusion

This notebook covers query transformation techniques for Retrieval-Augmented Generation (RAG), with an emphasis on multi-query architectures and their practical use.

1. **Environment Setup**: Creating a virtual environment and installing the required packages.
2. **Data Loading and Indexing**: Loading a PDF, splitting it into chunks, and indexing the chunks in Pinecone.
3. **Multi-Query RAG**: Generating multiple perspectives on a user question to improve retrieval coverage.
4. **RAG-Fusion and Decomposition**: Reranking results across generated queries with reciprocal rank fusion, and breaking complex questions into sub-questions.
5. **Step-Back and HyDE**: Retrieving with a more abstract version of the question, and retrieving with a hypothetical generated passage.
6. **Practical Implementations**: Each technique is implemented with LangChain and OpenAI models.

Together, these examples underscore the importance of context and query diversity for retrieval quality in RAG applications.