https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb

There are some deviations from the source code because I don't want to pay for OpenAI embeddings or call OpenAI models. All OpenAI integration is replaced with Ollama.

I also removed the LangSmith integration. I don't think it's needed here: it's mostly a frontend for LLM debugging, which I can approximate with `langchain.debug = True`.

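RAG Fusion - After doing multi-query, instead of passing all returned documents into RAG, we may want to rank them first. We use Reciprocal Rank Fusion (RRF) to do so: each document's score is the sum of 1 / (k + rank) over every query result list it appears in, so documents that rank highly for several queries rise to the top.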
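To make the scoring concrete, here is a small sketch of Reciprocal Rank Fusion on plain strings. The ranked lists and the `toy_rrf` name are made up for illustration. It assumes 0-based ranks and `k=60`, so a document contributes `1 / (k + rank)` for every list it appears in.

```python
from collections import defaultdict

def toy_rrf(ranked_lists, k=60):
    """Fuse several ranked lists of hashable items into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # rank is 0-based, so the top hit of each list contributes 1 / k
        for rank, item in enumerate(ranking):
            scores[item] += 1 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical results of three query variants (most relevant first)
ranked_lists = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a"],
    ["doc_b", "doc_d"],
]
fused = toy_rrf(ranked_lists)
for item, score in fused:
    print(f"{item}: {score:.4f}")
```

Here `doc_b` ends up first because it sits at or near the top of all three lists, while `doc_c` and `doc_d` each appear only once.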


In [16]:
from langchain_community.vectorstores import Chroma
# Load documents
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Setting debug to True lets us see what LangChain is actually doing at each step
import langchain
langchain.debug = True

# Get embedding model
from langchain_ollama import OllamaEmbeddings

# Get chat model
from langchain_ollama.chat_models import ChatOllama

from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

from operator import itemgetter

In [2]:
# Everything in this cell is from previous notebooks
# Load docs from bs4
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split docs
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

splits = text_splitter.split_documents(blog_docs)

# Get embedding ollama model
embed = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Embed
vectorstore = Chroma.from_documents(
    documents=splits, 
    embedding=embed)

# Set up a retriever on top of the vector store
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}, # How many to retrieve
    search_type='mmr'       # 'similarity' by default
)

# Get llm
llm = ChatOllama(model="llama3.2:3b-instruct-q5_K_M", temperature=0)

# Get multiquery prompt
prompt = """You are an AI language model assistant. Your task is
to generate {num_qn} different versions of the given user
question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question,
your goal is to help the user do an adequate covering of the distance-based similarity search.  Think in pictures meaning that your questions should cover the largest possible perspective.

Provide these alternative questions separated by newlines. For example:
Question 1: How many breeds of dogs are there?
Question 2: What's the total count of the number of dog breeds?
Question 3: How many subspecies of dogs are there?

Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(prompt)

generate_queries = (
    prompt_perspectives
    | llm # The LLM that responds to the prompt
    | StrOutputParser() # Extracts the text as a str; a chat model returns a message object whose text is in .content
    | (lambda x: x.split("\n")) # Split by newline; blank and preamble lines are kept as-is
)

In [3]:
generate_queries.invoke({"question": 'what types of water bottles are there', "num_qn": 5})

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/start] [chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/end] [chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [chain:RunnableSequence > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are an AI language model assistant. Your task is\nto generate 5 different versions of the given user\nquestion to retrieve relevant documents from a vector database.\nBy generating multiple perspectives on the user question,\nyour goal is to help the user do an adequate covering of the distance-based similarity search.  Think in pictures meaning that you

['Here are five alternative versions of the original question to retrieve relevant documents from a vector database:',
 '',
 '1. What categories or classifications exist for water bottles in terms of material, shape, size, and functionality?',
 '2. Can you provide an exhaustive list of different types of water bottles that cater to various needs, preferences, and lifestyles?',
 '3. How many distinct subcategories or subsets of water bottles can be identified based on their design, purpose, and intended use?',
 '4. What are the primary attributes or characteristics that distinguish one type of water bottle from another in terms of performance, durability, and aesthetics?',
 '5. Are there any specific standards, regulations, or certifications that govern the classification, labeling, or categorization of different types of water bottles?',
 '',
 'These alternative questions aim to capture a broader range of perspectives on the original question, including categories, classifications, sub
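
The output above contains a preamble line, blank strings, and a trailing note, and every one of them is sent to the retriever as if it were a query. One possible cleanup, sketched below, keeps only lines that look like numbered questions. The `clean_queries` helper and its regex are my own assumptions about the output format, not part of the source notebook.

```python
import re

# Matches prefixes such as "1. ", "2) " or "Question 3: " (assumed formats)
_PREFIX = re.compile(r"^\s*(?:question\s*)?\d+\s*[.:)]\s*", re.IGNORECASE)

def clean_queries(lines):
    """Keep only lines that start with a numbered prefix, with the prefix removed."""
    queries = []
    for line in lines:
        match = _PREFIX.match(line)
        if match:
            query = line[match.end():].strip()
            if query:
                queries.append(query)
    return queries

# Shortened, made-up lines in the same shape as the LLM output
raw = [
    "Here are five alternative versions of the original question:",
    "",
    "1. What categories exist for water bottles?",
    "Question 2: Which materials are water bottles made of?",
    "",
    "These alternative questions aim to capture a broader range of perspectives.",
]
print(clean_queries(raw))
```

If used, it could probably be piped after the newline split in `generate_queries`. Dropping the junk lines changes how many result lists are fused, so the RRF scores would differ from the ones shown in this notebook.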

In [13]:
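def rrf(results, k=60):
    """Reciprocal rank fusion: merge several ranked document lists into one."""
    scores = {}
    for documents in results:
        # Each result is the list of documents retrieved for one query,
        # assumed to be ordered most relevant first
        for rank, doc in enumerate(documents):
            # Use the text as the key so identical chunks are merged
            doc_str = doc.page_content
            # rank is 0-based, so the top document of each list adds 1 / k
            scores[doc_str] = scores.get(doc_str, 0) + 1 / (k + rank)
    # List of (page_content, score) tuples, highest score first
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)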
        

In [17]:
retrieval_chain = generate_queries | retriever.map() | rrf 
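
Conceptually, `retriever.map()` runs the retriever once per generated query and returns a list of result lists, which is the shape `rrf` expects. The sketch below mimics that data flow in plain Python. `fake_generate_queries`, `fake_retrieve`, and the tiny corpus are stand-ins invented for illustration, and no LangChain objects are involved.

```python
def fake_generate_queries(question):
    # Stand-in for the LLM step: returns a list of query strings
    return [question, f"{question} (rephrased)"]

def fake_retrieve(query):
    # Stand-in for the retriever: returns documents, most relevant first
    corpus = {
        "agents": ["planning", "memory", "tool use"],
        "agents (rephrased)": ["memory", "tool use"],
    }
    return corpus.get(query, [])

queries = fake_generate_queries("agents")       # list of query strings
results = [fake_retrieve(q) for q in queries]   # one ranked list per query
print(results)
```

The nested list in `results` is what gets handed to the fusion step, one inner list per query.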

In [18]:
retrieval_chain.invoke({"question": 'what types of water bottles are there', "num_qn": 5})

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/start] [chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/end] [chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [chain:RunnableSequence > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are an AI language model assistant. Your task is\nto generate 5 different versions of the given user\nquestion to retrieve relevant documents from a vector database.\nBy generating multiple perspectives on the user question,\nyour goal is to help the user do an adequate covering of the distance-based similarity search.  Think in pictures meaning that you

[('}\n]\nChallenges#\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:',
  0.1605970585396154),
 ('Make sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.\nEnsure to implement all code, if you are unsure, write a plausible implementation.\nInclude module dependency or package manager dependency definition file.\nBefore you finish, double check that all parts of the architecture is present in the files.\nUseful to know:\nYou almost always put different classes in different files.\nFor Python, you always create an appropriate requirements.txt file.\nFor NodeJS, you always create an appropriate package.json file.\nYou always add a comment briefly describing the purpose of the function definition.\nYou try to add comments explaining very complex bits of logic.\nYou always follow the best practices for the requested languages in terms of describing the code

In [21]:
from langchain_core.runnables import RunnableLambda

# Write the generation prompt
generation_prompt = """Answer the following question based on this context:

{context}

Question: {question}"""
generation_prompt_obj = ChatPromptTemplate.from_template(generation_prompt)

# Keep only the texts of the top-n documents ranked by RAG fusion
def downselect_docs(docs_scores, n):
    return [doc for doc, _ in docs_scores[:n]]

# Wrap the downselect step in a RunnableLambda so it can sit in the chain
downselect_lambda = RunnableLambda(lambda x: {"context": downselect_docs(x["context"], 5), "question": x["question"]})

full_chain = (
    {"context": retrieval_chain,
     "question": itemgetter("question")}
    | downselect_lambda
    | generation_prompt_obj
    | llm
)
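
Note that `downselect_docs` returns a Python list of strings, so as far as I can tell `{context}` is filled with the list's printed form (brackets, quotes, and escaped `\n`). That works, but a small model may read plain joined text more easily. A possible variant, with `format_context` as a hypothetical helper name:

```python
def format_context(docs_scores, n=5, separator="\n\n"):
    """Join the texts of the top-n (text, score) pairs into one string."""
    top_texts = [text for text, _ in docs_scores[:n]]
    return separator.join(top_texts)

# Made-up fused results in the same (text, score) shape that rrf returns
docs_scores = [("first chunk", 0.05), ("second chunk", 0.03), ("third chunk", 0.02)]
print(format_context(docs_scores, n=2))
```

It could stand in for `downselect_docs` inside `downselect_lambda`, since the prompt only needs a string for `{context}`.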

In [22]:
full_chain.invoke({"question": 'what types of water bottles are there', "num_qn": 5})

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
{
  "question": "what types of water bottles are there",
  "num_qn": 5
}
[chain/end] [chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequ

AIMessage(content="There is no mention of water bottles in the provided context. The text appears to be discussing challenges and best practices for building large language models (LLMs) centered agents, as well as referencing various research papers and tools related to this topic. If you're looking for information on types of water bottles, I'd be happy to help with that!", additional_kwargs={}, response_metadata={'model': 'llama3.2:3b-instruct-q5_K_M', 'created_at': '2025-02-03T14:42:35.561106298Z', 'done': True, 'done_reason': 'stop', 'total_duration': 18178319708, 'load_duration': 17994505, 'prompt_eval_count': 534, 'prompt_eval_duration': 13320000000, 'eval_count': 70, 'eval_duration': 4839000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-48f620f0-41a2-43a5-bb09-b1be8ac47134-0', usage_metadata={'input_tokens': 534, 'output_tokens': 70, 'total_tokens': 604})