## Key components
Reliable RAG is an improvement over naive RAG that addresses:
1. Retrieval of irrelevant documents
2. Hallucinations not grounded in facts from the retrieved information
3. Citation of sources used to generate the response

How does it do this?
1. Use an LLM to filter retrieved documents
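2. Use an LLM to check that the generated answer is grounded in the retrieved documents
3. Cite the sources used to generate the response (the original notebook does this; it is skipped here)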
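
As a rough orientation, the overall flow can be sketched with plain callables. This is only an outline under my own assumptions: `reliable_rag` and its four parameters are hypothetical names, and the notebook below implements each piece with LlamaIndex and OpenAI rather than with this function.

```python
from typing import Callable, List


def reliable_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
) -> dict:
    """Run retrieval, relevance filtering, generation and a grounding check."""
    retrieved = retrieve(question)
    # Keep only the documents the grader judges relevant to the question
    relevant = [doc for doc in retrieved if is_relevant(question, doc)]
    # Generate the answer from the filtered documents only
    answer = generate(question, relevant)
    # Check that the answer is supported by those same documents
    grounded = is_grounded(answer, relevant)
    # Return the documents that were used so they can be cited
    return {"answer": answer, "grounded": grounded, "sources": relevant}
```

Passing the steps in as callables keeps the outline independent of any particular LLM or vector store.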

In [None]:
!poetry add llama-index llama-index-readers-web

In [None]:
!poetry add llama-index-vector-stores-chroma

In [None]:
!poetry add python-dotenv

### Import libraries and environment variables

In [1]:
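import os
from dotenv import load_dotenv

# Load environment variables from '.env' file
load_dotenv()

# Fail early with a clear message if the key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")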

### Load documents

In [2]:

from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.node_parser import SemanticSplitterNodeParser


urls = [
    "https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-2-reflection/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-3-tool-use/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/?ref=dl-staging-website.ghost.io",
    "https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/?ref=dl-staging-website.ghost.io"
]

loaded_docs = SimpleWebPageReader(html_to_text=True).load_data(urls)

With a loader that extracts metadata I could also capture things like the page title, which would help with citing sources later.
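
One way to get a page title, sketched with the standard library only. `TitleParser` and `extract_metadata` are hypothetical helpers of mine, not part of LlamaIndex, and this assumes the raw HTML of each page is available; how it would be wired into the loader is left open.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def extract_metadata(url: str, html: str) -> dict:
    """Return a small metadata dict; title is None when the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title}
```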

### Create vector store

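#### Create Chroma collection if it doesn't exist

In [3]:
import chromadb

# EphemeralClient keeps everything in memory, so the data is lost when the kernel restarts
chroma_client = chromadb.EphemeralClient()
# get_or_create_collection avoids an error when this cell is re-run
chroma_collection = chroma_client.get_or_create_collection("reliable_rag")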

### Chunk the documents

In [4]:

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()
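# Start a new chunk where the embedding dissimilarity between adjacent sentence
# groups exceeds the 95th percentile; buffer_size is the number of neighbouring
# sentences grouped together when comparing
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)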

nodes = splitter.get_nodes_from_documents(loaded_docs)
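
To make the percentile threshold concrete, here is a simplified sketch of the idea: measure the cosine distance between consecutive sentence embeddings and start a new chunk wherever the distance exceeds the chosen percentile. This is my own illustration with toy vectors, not the library's implementation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_by_breakpoints(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    threshold_pct: float = 95,
) -> List[str]:
    """Group sentences into chunks, breaking at unusually large distances."""
    if not sentences:
        return []
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    if not distances:
        return [sentences[0]]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:
            # Large jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

A higher percentile means fewer breakpoints and therefore larger chunks.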


In [6]:
nodes[0].get_content()

"✨ New course! Enroll in  [Retrieval Optimization: From Tokenization to Vector\nQuantization](https://bit.ly/3Y3pjp2)\n\n[![](data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27300%27%20height=%2792%27/%3e)The\nBatch![The\nBatch](/_next/image/?url=%2F_next%2Fstatic%2Fmedia%2Fdlai-batch-\nlogo.a60dbb9f.png&w=640&q=75)](/the-batch/)\n\n  * [Explore Courses](/courses/)\n  * [AI Newsletter](/the-batch/)\n    * [The Batch](/the-batch/)\n    * [Andrew's Letter](/the-batch/tag/letters/)\n    * [Data Points](/the-batch/tag/data-points/)\n    * [ML Research](/the-batch/tag/research/)\n    * [Blog](/blog/)\n  * [Community](/community/)\n    * [Forum](https://community.deeplearning.ai/)\n    * [Events](/events/)\n    * [Ambassadors](/ambassador/)\n    * [Ambassador Spotlight](/blog/category/ambassador-spotlight/)\n  * [Resources](/resources/)\n  * [Company](/about/)\n 

### Store the chunks in Chroma

In [7]:
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore


vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)

In [8]:
retriever = index.as_retriever(similarity_top_k=2)
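
Conceptually, `similarity_top_k=2` asks for the two chunks whose embeddings are closest to the question embedding. A minimal sketch of that ranking, assuming cosine similarity and using toy vectors; `top_k` is a hypothetical helper and the vector store does this work in the notebook.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(
    query_embedding: Sequence[float],
    chunk_embeddings: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Return the ids of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), chunk_id)
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]
```

With only two chunks retrieved, a single irrelevant hit is half of the context, which is why the relevance check later in the notebook matters.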

### Question

In [9]:
question = "what are the different kinds of agentic design patterns?"

### Retrieve documents

In [12]:
docs = retriever.retrieve(question)

### Check what our docs look like

In [15]:
print(f"Text: {docs[0].text}\n\nContent: {docs[0].get_content()}\n\nEmbedding: {docs[0].embedding}")

Text: [Proposed
ChatDev architecture, illustrated.](/_next/image/?url=https%3A%2F%2Fdl-
staging-website.ghost.io%2Fcontent%2Fimages%2F2024%2F04%2Funnamed---
2024-04-17T155856.845-2.png&w=3840&q=75)

Share

  * [](https://twitter.com/intent/tweet?url=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)
  * [](https://www.facebook.com/sharer/sharer.php?u=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)
  * [](https://www.linkedin.com/shareArticle?mini=true&url=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)

Dear friends,

Multi-agent collaboration is the last of the four [key AI agentic design
patterns](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-
performance/?utm_campaign=The%20Batch&utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-8TZzur2df1qdnGx09b-Fg94DTsc3-xXao4StKvKNU2HR51el3n8yOm0CPSw6GiAoLQNKua)
that I’ve described i

### Check document relevancy

In [19]:

from pydantic import BaseModel, Field
from llama_index.llms.openai import OpenAI
from llama_index.core import ChatPromptTemplate

class GradeDocuments(BaseModel):
    """Binary score for the relevance of a retrieved document."""

    binary_score: str = Field(
        description="Document is relevant to the question, 'yes' or 'no'"
    )

llm = OpenAI(model="gpt-4o-mini")
retrieval_grader = llm.as_structured_llm(GradeDocuments)

# Prompt
system = """You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."""
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("user", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)


In [28]:
docs_to_use = []
# Grade each retrieved node and keep only the ones the LLM marks as relevant
for doc in docs:
    print(doc.get_text(), '\n', '-'*50)
    res = retrieval_grader.chat(grade_prompt.format_messages(question=question, document=doc.get_text()))
    print(res, '\n')
    # res.raw holds the parsed GradeDocuments object
    if res.raw.binary_score.strip().lower() == 'yes':
        docs_to_use.append(doc)

[Proposed
ChatDev architecture, illustrated.](/_next/image/?url=https%3A%2F%2Fdl-
staging-website.ghost.io%2Fcontent%2Fimages%2F2024%2F04%2Funnamed---
2024-04-17T155856.845-2.png&w=3840&q=75)

Share

  * [](https://twitter.com/intent/tweet?url=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)
  * [](https://www.facebook.com/sharer/sharer.php?u=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)
  * [](https://www.linkedin.com/shareArticle?mini=true&url=https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)

Dear friends,

Multi-agent collaboration is the last of the four [key AI agentic design
patterns](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-
performance/?utm_campaign=The%20Batch&utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-8TZzur2df1qdnGx09b-Fg94DTsc3-xXao4StKvKNU2HR51el3n8yOm0CPSw6GiAoLQNKua)
that I’ve described in rece
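
Because the grader returns a free-form string, comparing it to an exact `'yes'` can be brittle. A small sketch of a more forgiving filter, assuming the grader might answer with different casing or trailing punctuation; `normalize_score` and `filter_relevant` are hypothetical helpers, and `grade` stands in for the LLM call.

```python
from typing import Callable, List, Tuple


def normalize_score(raw: str) -> bool:
    """Treat variants such as 'Yes' or ' YES. ' as relevant; anything else is not."""
    return raw.strip().strip(".'\"").lower() == "yes"


def filter_relevant(
    docs: List[str], grade: Callable[[str], str]
) -> Tuple[List[str], List[str]]:
    """Split docs into (kept, dropped) according to the grader's answer."""
    kept, dropped = [], []
    for doc in docs:
        if normalize_score(grade(doc)):
            kept.append(doc)
        else:
            dropped.append(doc)
    return kept, dropped
```

Keeping the dropped documents around makes it easy to inspect what the grader filtered out.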

### Generate result

In [31]:
# Prompt
system = """You are an assistant for question-answering tasks. Answer the question using only the retrieved documents.
Use three to five sentences maximum and keep the answer concise."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("user", "Retrieved documents: \n\n <docs>{documents}</docs> \n\n User question: <question>{question}</question>"),
    ]
)

# Wrap each retrieved chunk in numbered <docN> tags so the prompt can tell them apart
def format_docs(docs):
    return "\n".join(f"<doc{i+1}>:\nContent:{doc.get_content()}\n</doc{i+1}>\n" for i, doc in enumerate(docs))


generation = llm.chat(prompt.format_messages(documents=format_docs(docs_to_use), question=question))
print(generation)

assistant: The different kinds of agentic design patterns include four key strategies: Reflection, Tool Use, Planning, and Multi-Agent Collaboration. Reflection involves agents assessing their own performance and adjusting accordingly. Tool Use allows agents to utilize external tools to enhance their capabilities. Planning focuses on agents organizing tasks and workflows effectively. Multi-Agent Collaboration breaks down complex tasks into subtasks handled by different agents, optimizing performance through specialization.


### Check for Hallucinations

In [32]:
# Data model
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in 'generation' answer."""

    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )

# LLM with structured output
structured_llm_grader = llm.as_structured_llm(GradeHallucinations)

# Prompt
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("user", "Set of facts: \n\n <facts>{documents}</facts> \n\n LLM generation: <generation>{generation}</generation>"),
    ]
)

# Pass only the answer text, not the whole chat response object
response = structured_llm_grader.chat(hallucination_prompt.format_messages(documents=format_docs(docs_to_use), generation=generation.message.content))

print(response)

assistant: {"binary_score":"yes"}
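
The notebook only prints the grade. One possible way to act on it, sketched under my own assumptions, is to regenerate a limited number of times when the answer is not grounded. `generate_grounded` is a hypothetical helper; `generate` and `is_grounded` stand in for the LLM calls above.

```python
from typing import Callable, Optional, Tuple


def generate_grounded(
    generate: Callable[[], str],
    is_grounded: Callable[[str], bool],
    max_attempts: int = 2,
) -> Tuple[Optional[str], bool]:
    """Return (answer, grounded) after at most max_attempts generations."""
    answer = None
    for _ in range(max_attempts):
        answer = generate()
        if is_grounded(answer):
            return answer, True
    # Give back the last attempt and flag that it failed the check
    return answer, False
```

Returning the flag alongside the answer lets the caller decide whether to show an ungrounded answer with a warning or withhold it.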


From here, the original notebook adds another LLM layer that cites the sources used in the answer.

The loader I used didn't extract metadata, so I'm skipping that step.
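
If the chunks did carry metadata, a simpler non-LLM alternative would be to list the distinct source URLs of the chunks that were used. This sketch is not what the original notebook does: `Chunk` is a hypothetical stand-in for a retrieved node, and the `url` metadata key is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved node with metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def format_citations(chunks: List[Chunk]) -> str:
    """Return a numbered list of distinct source URLs, in first-seen order."""
    seen: List[str] = []
    for chunk in chunks:
        url = chunk.metadata.get("url")
        # Skip chunks without a URL and avoid listing the same source twice
        if url and url not in seen:
            seen.append(url)
    return "\n".join(f"[{i}] {url}" for i, url in enumerate(seen, start=1))
```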