In [44]:
! pip install langchain langchain-text-splitters langchain-community bs4 langgraph


In [45]:
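import getpass
import os

# Tracing is disabled here; set this to "true" to send traces to LangSmith.
os.environ["LANGSMITH_TRACING"] = "false"
# Read the key from the environment, or prompt for it, rather than hard-coding it.
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"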


Components
We need to select three components from LangChain’s suite of integrations: a chat model, an embeddings model, and a vector store.

Select a chat model (OpenAI in this case):

In [None]:
! pip install -U "langchain[openai]"

In [47]:
import getpass
import os
from langchain.chat_models import init_chat_model

# Read the key from the environment, or prompt for it, rather than hard-coding it.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

model = init_chat_model("gpt-4.1")

Select an embeddings model:

In [None]:
! pip install -U "langchain-openai"

In [48]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Select a vector store:

In [None]:
! pip install -U "langchain-core"

In [50]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

First, we need to load the blog post contents. We can use DocumentLoaders for this, which are objects that load data from a source and return a list of Document objects. In this case we’ll use the WebBaseLoader, which fetches HTML from web URLs and uses BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing parameters to the BeautifulSoup parser via bs_kwargs (see BeautifulSoup docs). Here, only HTML tags with class “post-content”, “post-title”, or “post-header” are relevant, so we’ll discard everything else.

In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1

print(f"Total characters: {len(docs[0].page_content)}")

print(docs[0].page_content[:500])
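
To make the class-based filtering above more concrete, here is a minimal sketch that keeps only the text inside elements with a given class. It uses the standard library's html.parser instead of BeautifulSoup, so it is an approximation of the idea and not how SoupStrainer works internally. The names ClassTextFilter, VOID_TAGS, and the sample HTML are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextFilter(HTMLParser):
    """Collect text only from elements (and their children) whose class is in `keep`."""

    def __init__(self, keep):
        super().__init__()
        self.keep = set(keep)
        self.depth = 0  # > 0 while we are inside a kept element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Enter a kept element, or go one level deeper inside one.
        if self.depth or classes & self.keep:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


# Hypothetical sample page: only the title and content should survive.
html = (
    '<nav class="menu">Home</nav>'
    '<h1 class="post-title">Agents</h1>'
    '<div class="post-content"><p>Planning</p><p>Memory</p></div>'
    "<footer>bye</footer>"
)
parser = ClassTextFilter(["post-title", "post-header", "post-content"])
parser.feed(html)
print(parser.parts)  # ['Agents', 'Planning', 'Memory']
```

The sketch assumes well-formed HTML; a real parser such as BeautifulSoup is far more tolerant of messy markup.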

Our loaded document is over 42k characters, which is too long to fit into the context window of many models. To handle this, we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of the blog post at run time.

We use a RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.
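
The recursive idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: the function name recursive_split and the default separator order are assumptions made for this example, and the sketch ignores chunk overlap entirely.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators (paragraphs) before finer ones (words)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part is still too big: recurse with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb\ncccc dddd\n\neeee"
print(recursive_split(sample, chunk_size=10))
# ['aaaa bbbb', 'cccc dddd', 'eeee']
```

Splitting on paragraph and line boundaries first tends to keep related sentences in the same chunk, which is why this strategy suits generic text.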

In [51]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Split blog post into 63 sub-documents.
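
To see what chunk_size, chunk_overlap, and add_start_index mean in isolation, here is a minimal sliding-window sketch. It cuts at fixed character positions, which is an assumption made to keep the arithmetic visible; the real splitter cuts on separators, so its overlaps and chunk lengths are not this regular. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Return (start_index, chunk) pairs where consecutive chunks
    share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [
        (start, text[start:start + chunk_size])
        for start in range(0, last_start, step)
    ]


# Hypothetical 2,500-character document.
text = "".join(chr(97 + i % 26) for i in range(2500))
chunks = sliding_chunks(text, chunk_size=1000, chunk_overlap=200)
for start, chunk in chunks:
    print(start, len(chunk))
```

With these settings each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the beginning of the next. The recorded start index plays the same role as the start_index metadata that add_start_index=True adds to each split.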


In [None]:
document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])

['0ac226fd-92a1-4920-85ff-8f58f04dbf27', '8c4c3049-af4e-458a-be35-d0304742e943', 'e06c6c56-278a-4c86-b3a2-cddd6266863a']
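
Conceptually, a vector store keeps one embedding per chunk and, at query time, ranks the chunks by how similar their embeddings are to the query's embedding. The toy sketch below illustrates that idea with word counts and cosine similarity. It is not the InMemoryVectorStore implementation: toy_embed is a hypothetical stand-in for a real embeddings model, and ToyVectorStore and its methods are names invented for this example.

```python
import math
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for an embeddings model: a bag of lowercase words.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self):
        self._items = []  # list of (text, embedding) pairs

    def add_texts(self, texts):
        for text in texts:
            self._items.append((text, toy_embed(text)))

    def similarity_search(self, query, k=2):
        query_vec = toy_embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "task decomposition splits a task into steps",
    "memory stores past observations",
    "tools let the agent call external apis",
])
print(store.similarity_search("how does task decomposition work", k=1))
# ['task decomposition splits a task into steps']
```

A real embeddings model captures meaning beyond exact word overlap, so it can match a query to a chunk even when they share no words.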



Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer. We will demonstrate:
    A RAG agent that executes searches with a simple tool. This is a good general-purpose implementation.
    A two-step RAG chain that uses just a single LLM call per query. This is a fast and effective method for simple queries.
    
One formulation of a RAG application is as a simple agent with a tool that retrieves information. We can assemble a minimal RAG agent by implementing a tool that wraps our vector store:

In [53]:
from langchain.tools import tool

@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
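
With response_format="content_and_artifact", the tool returns a two-element tuple: the first element is the text passed to the model, and the second is the raw data kept alongside it as an artifact. The sketch below reproduces the serialization step on its own so the resulting string is easy to inspect. ToyDoc is a hypothetical stand-in for a LangChain Document, and the sample contents are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    # Hypothetical stand-in for a Document: text plus a metadata dict.
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs):
    """Format documents the same way retrieve_context does above."""
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


retrieved = [
    ToyDoc("First retrieved chunk.", {"start_index": 0}),
    ToyDoc("Second retrieved chunk.", {"start_index": 800}),
]
content, artifact = serialize(retrieved), retrieved
print(content)
```

Keeping the original documents in the artifact means downstream code can still reach their metadata without parsing the string that was shown to the model.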

Given our tool, we can construct the agent:

In [54]:
from langchain.agents import create_agent

tools = [retrieve_context]
# If desired, specify custom instructions
prompt = (
    "You have access to a tool that retrieves context from a blog post. "
    "Use the tool to help answer user queries."
)
agent = create_agent(model, tools, system_prompt=prompt)

Let’s test this out. We construct a question that would typically require an iterative sequence of retrieval steps to answer:

In [None]:
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()
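
The iterative behavior can be pictured as a loop: the model either requests a tool call or gives a final answer, and each tool result is appended to the message history before the model is asked again. The sketch below is a simplified illustration with a scripted stand-in for the model; it is not how create_agent is implemented, and run_toy_agent, scripted_model, fake_retrieve, and the message dictionaries are all hypothetical.

```python
def run_toy_agent(model_step, tools, question, max_steps=5):
    """Alternate between the model and its tools until the model stops calling tools."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model_step(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            break  # no tool requested: this reply is the final answer
        output = tools[call["name"]](call["query"])
        messages.append({"role": "tool", "content": output})
    return messages


def scripted_model(messages):
    # Stand-in for an LLM: search twice, then answer.
    searches_done = sum(1 for m in messages if m["role"] == "tool")
    planned_queries = [
        "standard method for task decomposition",
        "common extensions of that method",
    ]
    if searches_done < len(planned_queries):
        return {
            "role": "assistant",
            "content": "",
            "tool_call": {
                "name": "retrieve_context",
                "query": planned_queries[searches_done],
            },
        }
    return {"role": "assistant", "content": "Final answer based on two searches."}


def fake_retrieve(query):
    return f"(retrieved text for: {query})"


messages = run_toy_agent(
    scripted_model,
    {"retrieve_context": fake_retrieve},
    "What is the standard method for Task Decomposition?",
)
for message in messages:
    print(message["role"], "|", message.get("tool_call") or message["content"])
```

The max_steps limit is a safeguard in this sketch so that a model which keeps requesting tools cannot loop forever.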