https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb

Some deviations from the source code: I don't want to pay for OpenAI embeddings or OpenAI model calls, so all OpenAI integration is replaced with Ollama.

I also removed the LangSmith integration; I don't think it's needed. LangSmith is mostly a hosted UI for tracing and debugging LLM calls, and `langchain.debug = True` gives me similar visibility in the console.

Figuring out multi-query here. It addresses poorly worded user queries, which lead to subpar document retrieval. The idea is to have an LLM rewrite the query in Y different ways, then hit the retrieval DB once per rewrite: with N documents per query, that's up to Y x N documents instead of N (fewer once duplicates are removed).
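A toy, dependency-free sketch of that flow, just to make the Y x N arithmetic concrete. Everything in it is hypothetical: the rewrites are hard-coded stand-ins for LLM output, and `retrieve` scores documents by word overlap instead of embedding similarity.

```python
import re

# Hypothetical mini corpus standing in for the vector store contents
CORPUS = [
    "Task decomposition splits a complex task into smaller subgoals.",
    "Chain of thought prompts the model to think step by step.",
    "Tree of thoughts explores multiple reasoning paths per step.",
    "Agents use memory to store past observations.",
    "Tool use lets an LLM call external APIs.",
]

def tokens(text: str) -> set[str]:
    """Lowercased alphabetic words; punctuation is ignored."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, n: int) -> list[str]:
    """Toy retriever: top-n corpus entries by shared words (ties keep corpus order)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:n]

def multi_query_retrieve(rewrites: list[str], n: int) -> list[str]:
    """One retrieval per rewrite (at most Y * n hits), then drop duplicates."""
    hits = [doc for q in rewrites for doc in retrieve(q, n)]
    return list(dict.fromkeys(hits))  # dedupe while keeping first-seen order

# Hard-coded stand-ins for what the LLM would generate (Y = 3)
rewrites = [
    "What is task decomposition for LLM agents?",
    "How do agents split a complex task into subgoals?",
    "How does chain of thought help step by step reasoning?",
]
unique_hits = multi_query_retrieve(rewrites, n=2)
print(len(rewrites) * 2, len(unique_hits))  # → 6 4
```

Six hits come back, but the first two rewrites retrieve the same pair, so only four distinct chunks survive.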

In [15]:
from langchain_community.vectorstores import Chroma

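# With debug on, LangChain prints every chain, prompt and LLM call along with its inputs and outputs
# (newer releases prefer: from langchain.globals import set_debug; set_debug(True))
import langchain
langchain.debug = True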

In [1]:
# Load documents
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [5]:
# Split documents into overlapping chunks (sizes are counted in tiktoken tokens)
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

splits = text_splitter.split_documents(blog_docs)
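`from_tiktoken_encoder` makes the splitter measure `chunk_size` and `chunk_overlap` in tiktoken tokens rather than characters. The recursive splitter also prefers to break on separators (paragraphs, lines, spaces), so its chunks are not fixed windows. The sketch below ignores that and only illustrates what a 300/50 overlap means, using a hypothetical `window_chunks` helper over an already-tokenized list.

```python
def window_chunks(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Fixed-size windows where consecutive windows share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end
    return chunks

tokens = [f"t{i}" for i in range(700)]  # hypothetical 700-token document
chunks = window_chunks(tokens, chunk_size=300, chunk_overlap=50)
print([(c[0], c[-1]) for c in chunks])
# → [('t0', 't299'), ('t250', 't549'), ('t500', 't699')]
```

Each window starts 250 tokens after the previous one, so text near a boundary appears in both neighbouring chunks.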

In [6]:
# Get embedding model
from langchain_ollama import OllamaEmbeddings

embed = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Embed every chunk and index it in a Chroma vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embed)

In [7]:
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}, # number of documents returned per query
    search_type='mmr'       # maximal marginal relevance ('similarity' by default)
)
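With `search_type='mmr'` the retriever uses maximal marginal relevance. As I understand LangChain's implementation, it first fetches `fetch_k` candidates by plain similarity (20 by default) and then greedily picks `k` of them, trading relevance to the query against similarity to the already-picked ones via `lambda_mult` (0.5 by default; both can be set in `search_kwargs`). A minimal pure-Python sketch of that greedy selection, with made-up 2-D vectors in place of real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query: list[float], candidates: list[list[float]], k: int,
        lambda_mult: float = 0.5) -> list[int]:
    """Greedy maximal marginal relevance; returns indices of the k picked candidates."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Penalty: similarity to the closest already-selected candidate (0 when none yet)
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected

query = [3.0, 1.0]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # 0 and 1 are exact duplicates
print(mmr(query, candidates, k=2))                   # → [0, 2]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # → [0, 1] (pure similarity)
```

With `lambda_mult=1.0` the duplicate wins on relevance alone; at 0.5 the redundancy penalty pushes the less similar but distinct vector in. In the notebook the equivalent knobs would go in `search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}`.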

Set up the LLM and a prompt for multi-query

In [9]:
from langchain_ollama.chat_models import ChatOllama
llm = ChatOllama(model="llama3.2:3b-instruct-q8_0", temperature=0)

In [10]:
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = """You are an AI language model assistant. Your task is
to generate {num_qn} different versions of the given user
question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question,
your goal is to help the user do an adequate covering of the distance-based similarity search.  Think in pictures meaning that your questions should cover the largest possible perspective.

Provide these alternative questions separated by newlines. For example:
Question 1: How many breeds of dogs are there?
Question 2: What's the total count of the number of dog breeds?
Question 3: How many subspecies of dogs are there?

Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(prompt)

generate_queries = (
    prompt_perspectives
    | llm  # the chat model answers the prompt
    | StrOutputParser()  # chat models return an AIMessage; this extracts its .content as a plain str
    | (lambda x: x.split("\n"))  # one query per line (blank lines are kept as empty strings)
)
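Splitting on `\n` alone keeps blank lines and the `Question N:` labels that the prompt's example invites, and both get sent to the retriever as queries. A slightly more defensive parser, as a sketch; the name `parse_queries` and the label patterns it strips are my assumptions about typical model output.

```python
import re

# Leading labels like "Question 1:", "question 2 -", "3." or "4)"
_LABEL = re.compile(r"^\s*(?:question\s*\d+\s*[:.)-]|\d+\s*[.)])\s*", re.IGNORECASE)

def parse_queries(text: str) -> list[str]:
    """Split LLM output into one query per line, dropping blanks and numbering labels."""
    queries = []
    for line in text.splitlines():
        cleaned = _LABEL.sub("", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries

sample = ("Question 1: How many dog breeds exist?\n\n"
          "Question 2: What's the count of dog breeds?\n"
          "3. How many subspecies of dogs are there?\n")
print(parse_queries(sample))
```

A plain function can replace the lambda in the pipe (`| parse_queries`), since LangChain wraps callables in a `RunnableLambda`; doing so would likely change how many chunks get retrieved. It still passes through a preamble line such as "Here are five versions:".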

In [24]:
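# First Document retrieved for the first generated query (before de-duplication)
(generate_queries | retriever.map()).invoke({"question": "What is task decomposition for LLM agents?", "num_qn": 5})[0][0].page_content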

'[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).\n[12] Parisi et al. “TALM: Tool Augmented Language Models”\n[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).\n[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.\n[15] Li et al. “API-Bank: A Benchmark for Tool-Augmented LLMs” arXiv preprint arXiv:2304.08244 (2023).\n[16] Shen et al. “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace” arXiv preprint arXiv:2303.17580 (2023).\n[17] Bran et al. “ChemCrow: Augmenting large-language models with chemistry tools.” arXiv preprint arXiv:2304.05376 (2023).'

In [26]:
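def get_unique_union(docs):
    # docs is a list (one per generated query) of lists of Documents: flatten to their text
    flattened_docs = [doc.page_content for sublist in docs for doc in sublist]
    # drop empty strings and exact duplicates (set order is arbitrary; metadata is discarded)
    return list(set(filter(bool, flattened_docs)))

question = "What is task decomposition for LLM agents?"
# retriever.map() runs the retriever on each query in the list and returns a list of lists
retrieval_chain = generate_queries | retriever.map() | get_unique_union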
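`get_unique_union` keeps only the text, and because `set` iteration order for strings varies between Python processes (hash randomization), the context order can change from run to run. A sketch of an order-preserving alternative that keeps whole documents and their metadata. `Doc` is a hypothetical stand-in with the same two attributes as LangChain's `Document` (`page_content`, `metadata`), so the block runs without LangChain installed.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_docs(doc_lists):
    """Flatten per-query results; drop empty and duplicate page_content, keep first-seen order."""
    seen = set()
    out = []
    for docs in doc_lists:
        for doc in docs:
            if doc.page_content and doc.page_content not in seen:
                seen.add(doc.page_content)
                out.append(doc)  # the whole object survives, metadata included
    return out

per_query = [
    [Doc("alpha", {"source": "a"}), Doc("beta")],
    [Doc("beta"), Doc(""), Doc("gamma")],
]
print([d.page_content for d in unique_docs(per_query)])  # → ['alpha', 'beta', 'gamma']
```

Keeping the objects means the source URL in `metadata` is still available if the answer should cite where a chunk came from.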

In [27]:
docs = retrieval_chain.invoke({"question":question, "num_qn": 5})

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[0m[outputs]
[32;1m[1;3m[llm/start][0m [1m[chain:RunnableSequence > llm:ChatOllama] Entering LLM run with input:
[0m{
  "prompts": [
    "Human: You are an AI language model assistant. Your task is\nto generate 5 different versions of the given user\nquestion to retrieve relevant documents from a vector database.\nBy generating multiple perspectives on the user question,\nyour goal is to help the user do an adequate covering of the distance-based similarity search.  Think in pictures meanin

In [28]:
len(docs)

22

In [31]:
from operator import itemgetter

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

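final_rag_chain = (
    # A dict of runnables is coerced into a RunnableParallel: both branches get the same input
    {"context": retrieval_chain,           # list of unique chunk texts
     "question": itemgetter("question")}  # the original question, passed through unchanged
    | prompt
    | llm
    | StrOutputParser()
)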

final_rag_chain.invoke({"question":question, "num_qn": 5})
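`retrieval_chain` hands the prompt a Python list, and as far as I can tell the default f-string templating just stringifies it, so `{context}` ends up as the list's repr: brackets, quotes and literal `\n` escapes. A small hypothetical `format_docs` helper that joins the chunks with blank lines instead; plain `str.format` stands in for the prompt template here.

```python
def format_docs(chunks: list[str]) -> str:
    """Join retrieved chunk texts with a blank line between them."""
    return "\n\n".join(chunks)

chunks = ["First chunk\nspans two lines.", "Second chunk."]
as_list = "Context: {context}".format(context=chunks)               # list repr
as_text = "Context: {context}".format(context=format_docs(chunks))  # readable text
print(as_list)  # the newline shows up as a literal backslash-n
print(as_text)
```

Wiring it in would look like `"context": retrieval_chain | format_docs`, since piping a runnable into a plain function wraps the function automatically.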

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question>] Entering Chain run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> > chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
[0m{
  "question": "What is task decomposition for LLM agents?",
  "num_qn": 5
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question> 

'Task decomposition refers to the process of breaking down a complex task into smaller, more manageable sub-tasks that can be executed by an autonomous agent, such as a Large Language Model (LLM) agent. This allows the agent to plan and execute tasks over a lengthy history and effectively explore the solution space.\n\nIn the context of LLM agents, task decomposition involves:\n\n1. Identifying the high-level goals and objectives of the task\n2. Breaking down each goal into smaller sub-tasks that can be executed sequentially or in parallel\n3. Assigning specific actions or commands to each sub-task\n4. Planning and executing the sequence of sub-tasks to achieve the overall goal\n\nTask decomposition is challenging for LLM agents because they struggle with:\n\n1. Long-term planning: LLMs have limited short-term memory, making it difficult to plan and execute tasks over a lengthy history.\n2. Task decomposition: Breaking down complex tasks into smaller sub-tasks requires the agent to und