https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb

This notebook deviates from the source code because I don't want to pay for OpenAI embeddings or call OpenAI models. All OpenAI integration is replaced with Ollama.

I also removed the LangSmith integration, since I don't think it's needed. It is just a frontend for LLM debugging, which I can get with `langchain.debug = True`.

Decomposition - The idea is similar to multi-query, but the core approach differs. Instead of rewriting the given query, we ask the LLM to decompose the question into sub-questions, run RAG to answer each sub-question, and then ask the LLM the original question with every sub-question and sub-answer supplied as extra context.
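The flow described above can be sketched without any model or vector store. This is only an illustration under stated assumptions: `build_prompt`, `answer_by_decomposition`, `fake_generate` and the prompt layout are hypothetical names invented for this sketch, and the lambdas stand in for the LLM decomposition step and the retriever.

```python
from typing import Callable, List


def build_prompt(question: str, qa_pairs: List[str], context: List[str]) -> str:
    """Hypothetical prompt layout: earlier Q/A pairs, retrieved context, then the question."""
    return (
        "Related questions and answers:\n" + "\n\n".join(qa_pairs)
        + "\n\nContext:\n" + "\n".join(context)
        + "\n\nQuestion:\n" + question
    )


def answer_by_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Answer each sub-question in turn, then the original question."""
    qa_pairs: List[str] = []
    for sub in decompose(question):
        # Each sub-question sees its own retrieved context plus all earlier Q/A pairs
        sub_answer = generate(build_prompt(sub, qa_pairs, retrieve(sub)))
        qa_pairs.append(f"Related question: {sub}\nRelated answer: {sub_answer}")
    # Final pass: the original question, with every sub-answer as extra context
    return generate(build_prompt(question, qa_pairs, retrieve(question)))


# Stand-ins so the sketch runs on its own
prompts_seen: List[str] = []


def fake_generate(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"answer {len(prompts_seen)}"


final = answer_by_decomposition(
    "What is X?",
    decompose=lambda q: ["What is part A of X?", "What is part B of X?"],
    retrieve=lambda q: [f"document about: {q}"],
    generate=fake_generate,
)
print(final)  # "answer 3": two sub-question calls, then one call for the original question
```

The number of LLM calls grows with the number of sub-questions (one per sub-question plus one for the final answer, not counting the decomposition call itself), which is worth keeping in mind with a slow local model.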


In [1]:
from langchain_community.vectorstores import Chroma
# Load documents
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Setting debug to True lets us see what LangChain actually builds and sends to the model
import langchain
langchain.debug = True

# Get embedding model
from langchain_ollama import OllamaEmbeddings

# Get chat model
from langchain_ollama.chat_models import ChatOllama

from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

from operator import itemgetter

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [2]:
# Everything in this cell is from previous notebooks
# Load docs from bs4
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split docs
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

splits = text_splitter.split_documents(blog_docs)

# Get embedding ollama model
embed = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Embed
vectorstore = Chroma.from_documents(
    documents=splits, 
    embedding=embed)

# Set up a retriever
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5},  # How many documents to retrieve
    search_type='mmr'        # Maximal marginal relevance; 'similarity' is the default
)

# Get llm
llm = ChatOllama(model="llama3.2:3b-instruct-q5_K_M", temperature=0)
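The retriever above uses `search_type='mmr'` (maximal marginal relevance), which trades relevance to the query against redundancy among the documents already picked. Below is a minimal, dependency-free sketch of that selection rule. The names `cosine` and `mmr_select`, the toy vectors, and the `lambda_mult=0.5` default are assumptions made for illustration; this is not LangChain's or Chroma's actual implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k documents chosen greedily by MMR."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [0.9, 0.10],   # most relevant
    [0.9, 0.12],   # near-duplicate of the first
    [0.7, -0.70],  # less relevant, but different
]
print(mmr_select(query, docs, k=2))                   # [0, 2]: the near-duplicate is skipped
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance ranking
```

With `lambda_mult=1.0` the rule degenerates to plain similarity ranking, which is why MMR tends to return more varied chunks than the default `'similarity'` search when several chunks say nearly the same thing.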

In [3]:
# Improve on the original video's example by using structured output
from pydantic import BaseModel, conlist

class DecompositionModel(BaseModel):
    # Exactly three sub-questions are required, so the chain must be invoked with n=3
    questions: conlist(str, min_length=3, max_length=3)

decomposition_llm = ChatOllama(
    model="llama3.2:3b-instruct-q5_K_M",
    temperature=0,
    format=DecompositionModel.model_json_schema()
)

In [20]:
# The part that decomposes the question
def decomposition_validator(ai_message):
    """Parse the model's JSON reply and return the validated list of sub-questions."""
    return DecompositionModel.model_validate_json(ai_message.content).questions

decomposition_prompt = """Decompose the following question into {n} number of subquestions that can be individually answer.\n{question}"""
decomposition_prompt = ChatPromptTemplate.from_template(decomposition_prompt)
# The validator replaces the plain-text alternative: StrOutputParser() | (lambda x: x.split('\n'))
decomposition_chain = decomposition_prompt | decomposition_llm | decomposition_validator
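To make explicit what the structured-output step buys over splitting plain text on newlines, here is a dependency-free approximation of the check that `DecompositionModel` performs. It is a sketch only: `parse_subquestions` is a hypothetical helper written for this note, it uses the standard `json` module rather than pydantic, and the sample payloads are made up.

```python
import json
from typing import List


def parse_subquestions(raw_json: str, expected: int = 3) -> List[str]:
    """Return the sub-questions, or raise ValueError if the reply has the wrong shape."""
    data = json.loads(raw_json)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    questions = data.get("questions") if isinstance(data, dict) else None
    if not isinstance(questions, list) or len(questions) != expected:
        raise ValueError(f"expected a 'questions' list with exactly {expected} items")
    if not all(isinstance(q, str) for q in questions):
        raise ValueError("every sub-question must be a string")
    return questions


good = '{"questions": ["What is A?", "What is B?", "What is C?"]}'
print(parse_subquestions(good))

try:
    parse_subquestions('{"questions": ["only one"]}')
except ValueError as err:
    print("rejected:", err)
```

A newline split would happily return blank lines, numbering, or preamble text as "questions", whereas a shape check like this fails loudly when the model does not return what was asked for.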

In [23]:
question = "What are the main components of an LLM-powered autonomous agent system?"

In [24]:
questions = decomposition_chain.invoke({'n':3, 'question': question})

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "n": 3,
  "question": "What are the main components of an LLM-powered autonomous agent system?"
}
[chain/start] [chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
{
  "n": 3,
  "question": "What are the main components of an LLM-powered autonomous agent system?"
}
[chain/end] [chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [chain:RunnableSequence > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: Decompose the following question into 3 number of subquestions that can be individually answer.\nWhat are the main components of an LLM-powered autonomous agent system?"
  ]
}
[llm/end] [chain:RunnableSequence > llm:ChatOllama] [12.19s] Exiting LLM run with output:
{
  "genera

In [25]:
questions

['What is a key component of an LLM-powered autonomous agent system?',
 'How does LLM technology contribute to the decision-making process in an autonomous agent system?',
 'What role do other AI technologies play alongside LLM in an autonomous agent system?']

In [26]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7a327c7c1ea0>, search_type='mmr', search_kwargs={'k': 5})

In [27]:
# Answer the sub-questions one at a time, passing every earlier sub-question/sub-answer pair into the next prompt.
# The video's example doesn't make sense to me: why answer each sub-question but never the original question?

subquestion_answers = []

# The prompt and chain do not depend on the loop variable, so build them once
iterative_prompt_template = """Here is the question you need to answer:
{subquestion}


Here are additional related questions and answers to use to answer the question:
{subquestion_answers}


Here is additional context relevant to the question:
{context}


Use the above context and any related questions and answer to answer the question:
{subquestion}
"""

iterative_prompt = ChatPromptTemplate.from_template(iterative_prompt_template)
subquestion_chain = (
    {
        'context': itemgetter('subquestion') | retriever,
        'subquestion_answers': itemgetter('subquestion_answers'),
        'subquestion': itemgetter('subquestion')
    }
    | iterative_prompt
    | llm
    | StrOutputParser()
)

for subquestion in questions:
    subquestion_answer = subquestion_chain.invoke({'subquestion': subquestion, 'subquestion_answers': '\n\n'.join(subquestion_answers)})
    subquestion_answers.append(f"Related question: {subquestion}\nRelated answer: {subquestion_answer}")


[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "subquestion": "What is a key component of an LLM-powered autonomous agent system?",
  "subquestion_answers": ""
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,subquestion_answers,subquestion>] Entering Chain run with input:
{
  "subquestion": "What is a key component of an LLM-powered autonomous agent system?",
  "subquestion_answers": ""
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,subquestion_answers,subquestion> > chain:RunnableSequence] Entering Chain run with input:
{
  "subquestion": "What is a key component of an LLM-powered autonomous agent system?",
  "subquestion_answers": ""
}
[chain/start] [chain:RunnableSequence > chain:RunnableParallel<context,subquestion_answers,subquestion> > chain:RunnableSequence > chain:RunnableLambda] Entering Chain run with inp

In [38]:
# Extend the video's example with one more prompt that combines the sub-questions and their answers with the original question:

final_prompt_template = """You are a helpful assistant. I have some context and related questions for you to read, and I need you to answer the question below:

{subquestion_answers}


Here is additional context relevant to the question:
{context}


Use the above context and any related questions and answer to answer the question:
{question}
"""
final_prompt = ChatPromptTemplate.from_template(final_prompt_template)
final_chain = (
    {
        'context': itemgetter('question') | retriever,
        'subquestion_answers': itemgetter('subquestion_answers'),
        'question': itemgetter('question')
    }
    | final_prompt
    | llm
    | StrOutputParser()
)
final_answer = final_chain.invoke({'question': question, 'subquestion_answers': '\n\n'.join(subquestion_answers)})


[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "question": "What are the main components of an LLM-powered autonomous agent system?",
  "subquestion_answers": "Related question: What is a key component of an LLM-powered autonomous agent system?\nRelated answer: Based on the provided context, a key component of an LLM-powered autonomous agent system is Long-Term Memory (LTM). LTM can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. It consists of two subtypes: Explicit/Declarative memory and Implicit/Procedural memory.\n\nRelated question: How does LLM technology contribute to the decision-making process in an autonomous agent system?\nRelated answer: Based on the provided context, LLM (Large Language Model) technology contributes to the decision-making process in an autonomous agent system through several key components:\n\n1. **Planning**: LLM helps break d

In [42]:
print("You are a helpful assistant. I have some context and related questions for you to read, and I need you to answer the question below:\n\nRelated question: What is a key component of an LLM-powered autonomous agent system?\nRelated answer: Based on the provided context, a key component of an LLM-powered autonomous agent system is Long-Term Memory (LTM). LTM can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. It consists of two subtypes: Explicit/Declarative memory and Implicit/Procedural memory.\n\nRelated question: How does LLM technology contribute to the decision-making process in an autonomous agent system?\nRelated answer: Based on the provided context, LLM (Large Language Model) technology contributes to the decision-making process in an autonomous agent system through several key components:\n\n1. **Planning**: LLM helps break down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n2. **Reflection and refinement**: LLM allows for self-criticism and self-reflection over past actions, learning from mistakes and refining them for future steps, thereby improving the quality of final results.\n3. **Memory**: Long-Term Memory (LTM) can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. This enables the agent to retain knowledge and learn from past experiences.\n\nIn terms of decision-making, LLM technology contributes by:\n\n1. **Generating reasoning traces in natural language**: LLM can generate reasoning traces that help the agent understand its own thought process and identify areas for improvement.\n2. **Providing self-reflection prompts**: The ReAct prompt template enables LLM to think explicitly about its actions and decisions, allowing the agent to reflect on its past choices and refine them for future steps.\n3. 
**Evaluating task results**: LLM can evaluate the correctness of task results, identifying areas where the agent needs to improve.\n\nOverall, LLM technology plays a crucial role in enabling autonomous agents to make informed decisions by providing a powerful general problem solver that can break down complex tasks into manageable subgoals, reflect on past actions, and learn from mistakes.\n\nRelated question: What role do other AI technologies play alongside LLM in an autonomous agent system?\nRelated answer: Based on the provided context, other AI technologies play several key roles alongside Large Language Models (LLMs) in a LLM-powered autonomous agent system:\n\n1. **Planning**: Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n2. **Reflection and refinement**: Self-criticism and self-reflection over past actions, learning from mistakes and refining them for future steps, thereby improving the quality of final results.\n3. **Memory**: Long-Term Memory (LTM) can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. This enables the agent to retain knowledge and learn from past experiences.\n\nAdditionally, other AI technologies such as:\n\n* **Model selection**: LLM distributes tasks to expert models, where the request is framed as a multiple-choice question.\n* **Tool use**: Equipping LLMs with external tools can significantly extend the model capabilities. 
This includes fine-tuning a LM to learn to use external tool APIs (TALM and Toolformer).\n* **Plugin development**: The collection of tool APIs can be provided by other developers (as in Plugins) or self-defined (as in function calls).\n\nThese technologies complement LLMs, enabling autonomous agents to make informed decisions by providing a powerful general problem solver that can break down complex tasks into manageable subgoals, reflect on past actions, and learn from mistakes.\n\n\nHere is additional context relevant to the question:\n[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\\n    \\nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview#\\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\\n\\nPlanning\\n\\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n\\n\\nMemory'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 
2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='}\\n]\\nChallenges#\\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. 
Aim to complete tasks in the least number of steps.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.\\n\\n(2) Model selection: LLM distributes the tasks to expert models, where the request is framed as a multiple-choice question. LLM is presented with a list of models to choose from. Due to the limited context length, task type based filtration is needed.\\nInstruction:')]\n\n\nUse the above context and any related questions and answer to answer the question:\nWhat are the main components of an LLM-powered autonomous agent system?")

You are a helpful assistant. I have some context and related questions for you to read, and I need you to answer the question below:

Related question: What is a key component of an LLM-powered autonomous agent system?
Related answer: Based on the provided context, a key component of an LLM-powered autonomous agent system is Long-Term Memory (LTM). LTM can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. It consists of two subtypes: Explicit/Declarative memory and Implicit/Procedural memory.

Related question: How does LLM technology contribute to the decision-making process in an autonomous agent system?
Related answer: Based on the provided context, LLM (Large Language Model) technology contributes to the decision-making process in an autonomous agent system through several key components:

1. **Planning**: LLM helps break down large tasks into smaller, manageable subgoals, enabling efficient handling of co
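The rendered prompt printed above can also be rebuilt from the template instead of being copied out of the debug log. The sketch below does this with plain `str.format` as a stand-in for `ChatPromptTemplate`; `render_final_prompt` and the sample values are hypothetical, while the template text is the one from the final-prompt cell.

```python
from typing import List

FINAL_PROMPT_TEMPLATE = """You are a helpful assistant. I have some context and related questions for you to read, and I need you to answer the question below:

{subquestion_answers}


Here is additional context relevant to the question:
{context}


Use the above context and any related questions and answer to answer the question:
{question}
"""


def render_final_prompt(question: str, qa_pairs: List[str], context_docs: list) -> str:
    """Fill the three placeholders the same way the chain's input mapping does."""
    return FINAL_PROMPT_TEMPLATE.format(
        subquestion_answers="\n\n".join(qa_pairs),
        context=context_docs,  # a list is rendered with str(), hence the [Document(...)] text in the log
        question=question,
    )


rendered = render_final_prompt(
    "What is X?",
    ["Related question: a\nRelated answer: b"],
    ["doc one", "doc two"],
)
print(rendered)
```

Seeing the context rendered as a raw Python list makes one possible improvement obvious: joining only each document's page content before formatting would give the model a cleaner prompt than the `[Document(metadata=..., page_content=...)]` dump.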

In [40]:
print(final_answer)

Based on the provided context, the main components of a Large Language Model (LLM)-powered autonomous agent system are:

1. **Planning**: Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
2. **Reflection and refinement**: Self-criticism and self-reflection over past actions, learning from mistakes and refining them for future steps, thereby improving the quality of final results.
3. **Memory**: Long-Term Memory (LTM) can store information for a remarkably long time, ranging from a few days to decades, with essentially unlimited storage capacity. This enables the agent to retain knowledge and learn from past experiences.

These components work together to enable an LLM-powered autonomous agent system to make informed decisions by providing a powerful general problem solver that can break down complex tasks into smaller steps, reflect on past actions, and store knowledge for future use.
