## RAG from Scratch: Query Transformations

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
LANGCHAIN_API_KEY=os.getenv('LANGCHAIN_API_KEY')
LANGCHAIN_TRACING_V2=os.getenv('LANGCHAIN_TRACING_V2')
LANGCHAIN_ENDPOINT=os.getenv('LANGCHAIN_ENDPOINT')
GROQ_API_KEY=os.getenv('GROQ_API_KEY')

In [3]:
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
import numpy as np
import bs4
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.load import dumps, loads
from operator import itemgetter

#### Indexing

In [4]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# splitting the document using text splitter

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = splitter.split_documents(blog_docs)

# initializing the vector store

vectordb = Chroma(embedding_function=FastEmbedEmbeddings())
retriever = vectordb.as_retriever(kwargs={"k":1})
retriever.add_documents(splits) # indexing documents

USER_AGENT environment variable not set, consider setting it to identify your requests.


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

['fcdd3e0d-c502-4f3e-bbdd-8a70d8a59216',
 'e0c1c079-5f05-4425-a40f-45f9bcbda6e1',
 '4fab0542-2526-4818-85bf-b3d838aa7c7c',
 '09b71a75-c616-42c6-8c99-79d34bd4b74b',
 'e6eb3726-88a5-4ca9-838f-e443583bde8c',
 '114e8045-7c8f-42b4-812e-117001506f47',
 '967176ff-63b6-4777-bd5f-c029596fd866',
 'c61b1b04-dbdf-45ad-93c0-e25380eabaf8',
 '7469b911-903e-47e0-9c66-5a1d06a6d23c',
 '1d0fefa6-7d91-4322-b1d7-8d79a449475f',
 'fa242fa7-f2ec-4f73-93ad-df01f138af9d',
 '9dfc5098-1c56-4f7f-84e3-e9f48d4989dd',
 '2dc60516-795c-4a3b-9879-be6aef285ec1',
 'c42164dd-abe4-4537-8e6a-21972844d5e5',
 '032428a7-87ac-4361-8f71-6e70eac81a25',
 'fdeff041-b395-40ca-8f60-485bb6a5c59b',
 'fb2d3da2-82e5-45fb-b482-01f26833612d',
 '8abd8319-2754-4821-b157-413e4f9091df',
 '54797c9a-6848-4a94-ab1c-60dd8468cdca',
 '00ac902c-05e0-411e-b2ce-548b7ebbdd5b',
 'ff6d8324-24a4-4af1-84b6-71d12b0ad9d2',
 '75933296-5412-4ba5-a4ee-f015595e7a39',
 'f9c93d9d-50ba-4034-9084-75cc498c2fa7',
 '8b7dffde-15bb-4ff9-9718-7fcee43b0bd6',
 'c7623664-3a1b-

### Query Rewriting

The main motivation for Query Translation is that user queries can be ambiguous, which can negatively impact the retrieval process, as the relevant document will, possibly, not be able to be retrieved. One way to solve is Query Rewriting. In this type of query translation, we will use an LLM to generate multiple forms of rewritten queries from a given user query, and we will use those queries to retrieve relevant documents. There are two types of Query Rewriting.

#### Multi-Query Retrieval

After the generating multiple queries and performing retrieval, we will just compile the documents, and then give them to the LLM as context.

In [5]:
# making the prompt template

prompt = """You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search. Only provide the alternative questions separated by newlines and no spaces should be between the questions. Original question: {question}
"""

prompt_perspectives = ChatPromptTemplate.from_template(prompt)

In [6]:
# making the chain for Multiquery Retrieval

generate_queries = (
    prompt_perspectives
    | ChatGroq(model="llama-3.1-70b-versatile", temperature=0.0)
    | StrOutputParser()
    | (lambda x: [y for y in x.split('\n') if y.strip()])
)

In [7]:
# defining a function to return a unique set of documents from the given retrieved documents

def unique_docs(documents: list[list]):
    # flatten list of lists and convert each Document into a string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique = list(set(flattened_docs))

    return [loads(doc) for doc in unique]

In [8]:
# integrating the MultiQuery chain

question = "What is task decomposition for LLM agents?"
retrieval_chain = (
    generate_queries
    | retriever.map()
    | unique_docs
)

In [9]:
# invoke the chain to retrieve the documents

docs = retrieval_chain.invoke({"question": question})

len(docs)

  return [loads(doc) for doc in unique]


9

In [10]:
# final RAG chain

template = """Answer the question. This context is provided to help you answer the question:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0.0)

final_rag_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

In [11]:
final_rag_chain.invoke({"question": question})

'Task decomposition for LLM (Large Language Model) agents is the process of breaking down large tasks into smaller, manageable subgoals. This enables the agent to efficiently handle complex tasks. Task decomposition can be done in several ways, including:\n\n1. By LLM with simple prompting, such as asking for steps or subgoals for a specific task.\n2. By using task-specific instructions, such as writing a story outline for a novel.\n3. With human inputs, where humans provide guidance on how to break down a task into smaller subgoals.\n\nTask decomposition is an important component of the planning stage in a LLM-powered autonomous agent system, as it allows the agent to plan ahead and execute complex tasks in a more efficient and effective manner.'

#### RAG Fusion

Similar to Multiquery retrieval, but we will be performing an additional processing step of using Reciprocal Reranking Function (RRF) to rerank the retrieved documents into a single unified ranking, which the LLM will use to generate the answer.

In [12]:
prompt = """You are an AI language model assistant. Your task is to generate four different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search. Only provide the alternative questions separated by newlines and no spaces should be between them. Original question: {question}
"""

prompt_rag_fusion = ChatPromptTemplate.from_template(prompt)

In [13]:
generate_queries_rag_fusion = (
    prompt_rag_fusion
    | ChatGroq(model="llama-3.1-8b-instant")
    | StrOutputParser()
    | (lambda x: [y for y in x.split('\n') if y.strip()])
)

In [14]:
# defining the function for RRF

def reciprocal_ranking_function(documents: list[list], k=60):
    fused_scores = {} # fused scores, on which the documents will be reranked

    for docs in documents:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc) # converting docs to a string

            if doc_str not in fused_scores: # if the doc is not already in the fused_scores dictionary
                fused_scores[doc_str] = 0 # assign the minimum score for now

            previous_score = fused_scores[doc_str] # extract the previous score
            fused_scores[doc_str] += 1 / (k + rank) # add the calculated score

    ranked_results = [
        (loads(doc), score) for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True) # rerank the documents
    ]

    return ranked_results

In [15]:
# integrating the RAG fusion chain

retrieval_chain_rag_fusion = (
    generate_queries_rag_fusion
    | retriever.map()
    | reciprocal_ranking_function
)

In [16]:
# retrieving the documents

docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

6

In [17]:
# final RAG fusion chain

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)

final_rag_fusion_chain = (
    {"context": retrieval_chain_rag_fusion,
     "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

In [18]:
final_rag_fusion_chain.invoke({"question": question})

'Task decomposition for LLM (Large Language Model) agents involves breaking down large tasks into smaller, manageable subgoals. This enables efficient handling of complex tasks. According to the provided context, task decomposition can be done in three ways:\n\n1. By LLM with simple prompting, such as "Steps for XYZ.\\\\n1." or "What are the subgoals for achieving XYZ?"\n2. By using task-specific instructions, for example, "Write a story outline" for writing a novel.\n3. With human inputs.\n\nThis process is mentioned in the context as a way to facilitate long-term planning and task management for LLM agents.'

### Query Decomposition