# 📓 The GenAI Revolution Cookbook

**Title:** LangChain Tutorial: Quick-Start for RAG and Agents (Python) [2025]

**Description:** Build a production-ready Python RAG and agent fast with LangChain: step-by-step setup, copyable code, GitHub repo, testing, and deployment guidance included.

**📖 Read the full article:** [LangChain Tutorial: Quick-Start for RAG and Agents (Python) [2025]](https://blog.thegenairevolution.com/article/langchain-tutorial-quick-start-for-rag-and-agents-python-2025)

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



LangChain is a practical framework for building applications with LLMs. It provides a unified interface for document loading, splitting, embedding, retrieval, and chaining, so you can build a working RAG pipeline without writing low-level vector store or LLM client code. It integrates with OpenAI, Chroma, and other providers out of the box, letting you focus on tuning retrieval and prompts rather than on plumbing. For builders who want to prototype quickly and iterate on retrieval quality, LangChain's abstractions reduce boilerplate and make it easy to swap components (for instance, replacing Chroma with another vector store) without rewriting your chain logic.

## Core Concepts for This Use Case

**Document ingestion and splitting**: Load text from files or URLs, then split it into chunks that fit your embedding model's context window while preserving semantic boundaries.

**Embeddings and vector storage**: Convert chunks into dense vectors using an embedding model, then persist them in a vector store (Chroma) for fast similarity search.

**Retrieval and prompting**: Query the vector store to fetch relevant chunks, inject them into a prompt template, and pass the grounded prompt to an LLM for an answer.

**Evaluation**: Measure retrieval quality and response accuracy using keyword checks, timing, and manual inspection of retrieved sources.
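To make the first three concepts concrete before any API calls, here is a toy, dependency-free sketch of the data flow. It is not how LangChain works internally: the "embedding" is a hypothetical bag-of-words count vector, there is no persistent store, and retrieval is plain cosine similarity over those counts. Only the shape of the pipeline (split, embed, retrieve, stuff into a prompt) mirrors what the notebook builds below.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Toy splitter: one chunk per sentence (LangChain's splitters use character budgets)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed_counts(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts instead of a dense model vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_counts(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose toy embeddings are most similar to the question's."""
    q = embed_counts(question)
    return sorted(chunks, key=lambda c: cosine_counts(q, embed_counts(c)), reverse=True)[:k]


corpus = (
    "LangChain is a framework for developing applications powered by language models. "
    "It provides integrations for document loading, splitting, embeddings, and vector stores. "
    "Chroma stores embeddings on disk."
)
toy_chunks = split_sentences(corpus)          # ingestion and splitting
question = "Where are embeddings stored?"
top = retrieve(question, toy_chunks, k=1)     # embedding + similarity retrieval
context = "\n\n".join(top)
toy_prompt = f"Context:\n{context}\n\nQuestion:\n{question}"  # grounded prompt
print(toy_prompt)
```

Swapping the word counts for a real embedding model and the sorted list for a vector store is essentially what the rest of the notebook does.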

## Setup

Run this notebook in Google Colab or locally with Python 3.9+. Install dependencies first:

In [None]:
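# Quote specifiers so the shell doesn't read "<"/">" as redirects; `langchain.chains` below needs LangChain < 1.0.
!pip install -qU "langchain>=0.3,<1.0" "langchain-openai>=0.3,<1.0" "langchain-community>=0.3,<1.0" chromadb requests==2.32.4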

Set your OpenAI API key securely. In Colab, use the Secrets panel (key name: `OPENAI_API_KEY`) or prompt for it:

In [None]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
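A key saved in Colab's Secrets panel is not automatically copied into `os.environ`, so the cell above would still prompt for it. A minimal sketch of checking the environment first and then Colab's `google.colab.userdata.get`; the helper name `find_openai_key` is hypothetical, and the broad `except` around `userdata.get` is an assumption meant to cover a missing secret or notebook access not being granted.

```python
import os
from typing import Optional


def find_openai_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Hypothetical helper: look in the environment, then in Colab Secrets."""
    key = os.environ.get(name)
    if key:
        return key
    try:
        from google.colab import userdata  # only importable inside Google Colab
    except ImportError:
        return None  # running locally: there is no Secrets panel to consult
    try:
        return userdata.get(name)
    except Exception:  # assumed: secret missing or notebook access not granted
        return None


key = find_openai_key()
if key:
    os.environ["OPENAI_API_KEY"] = key
print("Key found" if key else "No key found; fall back to getpass")
```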

Verify the key and model access with a quick sanity check:

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
resp = llm.invoke("Respond with the single word: ready")
assert isinstance(resp.content, str), "OpenAI client did not return a string response."
print("OpenAI client OK")

## Using LangChain for RAG in Practice

### Ingest and Split Documents

Create a sample text file and load it with TextLoader. For web pages, use WebBaseLoader instead.

In [None]:
from langchain_community.document_loaders import TextLoader

sample_text = """
LangChain is a framework for developing applications powered by language models.
It provides integrations for document loading, splitting, embeddings, vector stores, retrieval, chains, tools, and agents.
This file serves as sample content for RAG demonstrations.
"""

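with open("docs.txt", "w", encoding="utf-8") as f:
    f.write(sample_text)

# Read with the same explicit encoding so results don't depend on the OS default.
loader = TextLoader("docs.txt", encoding="utf-8")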
docs = loader.load()
print(f"Loaded {len(docs)} document(s). Sample content: {docs[0].page_content[:80]}")

Split documents into chunks with `RecursiveCharacterTextSplitter`. It tries the separators in order (paragraph breaks, then line breaks, then spaces, then individual characters), so chunks tend to end at natural boundaries rather than mid-word, unlike naive fixed-width character splits. Note that `chunk_size` and `chunk_overlap` count characters by default, not tokens. Adjust them to balance recall (larger chunks capture more context) against token cost (smaller chunks keep prompts short). If you encounter unexpected retrieval misses or prompt mismatches, it's worth reviewing [common tokenization pitfalls that can break prompts and RAG](/article/common-tokenization-pitfalls-that-can-break-prompts-and-rag).

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""]
)
splits = splitter.split_documents(docs)
print(f"Split into {len(splits)} chunk(s). First chunk preview:\n{splits[0].page_content}")
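To see what `chunk_overlap` does to chunk count and content, here is a simplified fixed-window chunker. It is an illustration only, not the recursive algorithm (it ignores separators entirely): each new window starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share the overlap region.

```python
def fixed_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Simplified sliding-window chunker; unlike RecursiveCharacterTextSplitter it ignores separators."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the last
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows


demo_text = "".join(str(i % 10) for i in range(1200))  # 1,200 characters of filler
windows = fixed_window_chunks(demo_text)
print([len(w) for w in windows])            # → [500, 500, 300]
print(windows[0][-50:] == windows[1][:50])  # → True: consecutive chunks share 50 characters
```

Raising the overlap improves the odds that a sentence cut at a boundary appears whole in some chunk, at the cost of more chunks (and more embedding calls) for the same text.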

### Embed and Persist to Chroma

Embed the chunks using OpenAI's `text-embedding-3-small` model and store them in a Chroma vector database. Persist the store to disk so you can reuse it across sessions without re-embedding.

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
persist_dir = "chroma_rag_store"

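# With chromadb >= 0.4, data written to persist_directory is saved automatically,
# so no explicit persist() call is needed.
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_dir
)
print(f"Vector store created and persisted at '{persist_dir}'.")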

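Re-running `Chroma.from_documents` against the same directory appends another copy of every chunk. To avoid duplicates on re-runs, either delete the persist directory before rebuilding, or reopen the existing store (as below) and add only new documents incrementally: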

In [None]:
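# Reopen the store on disk without re-embedding anything.
reopened = Chroma(
    embedding_function=embeddings,
    persist_directory=persist_dir
)
print(f"Reopened store contains {len(reopened.get()['ids'])} chunk(s).")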
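One way to make incremental adds idempotent is to give each chunk a stable ID derived from its content, so adding the same chunk twice overwrites rather than duplicates. This sketch assumes your Chroma wrapper upserts by ID (recent `langchain_community` versions call Chroma's `upsert` inside `add_texts`); the helper `dedupe_with_ids` and its `text_of` parameter are hypothetical. It also drops repeated content within a batch, since identical content would otherwise produce duplicate IDs in a single call.

```python
import hashlib
from typing import Any, Callable


def dedupe_with_ids(items: list[Any], text_of: Callable[[Any], str] = str) -> tuple[list[Any], list[str]]:
    """Hypothetical helper: drop repeated content and derive a stable SHA-256 ID per item."""
    seen: set[str] = set()
    unique_items: list[Any] = []
    ids: list[str] = []
    for item in items:
        doc_id = hashlib.sha256(text_of(item).encode("utf-8")).hexdigest()
        if doc_id in seen:
            continue  # same content twice in one batch would mean a duplicate ID
        seen.add(doc_id)
        unique_items.append(item)
        ids.append(doc_id)
    return unique_items, ids


items, ids = dedupe_with_ids(["alpha", "beta", "alpha"])
print(items)                                          # → ['alpha', 'beta']
print(ids == dedupe_with_ids(["alpha", "beta"])[1])  # → True: same content, same IDs on every run
```

With the notebook's objects this would look something like `unique_splits, ids = dedupe_with_ids(splits, text_of=lambda d: d.page_content)` followed by `reopened.add_documents(unique_splits, ids=ids)`.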

### Configure Retrieval

Create a retriever that uses Maximal Marginal Relevance (MMR) to reduce redundancy among the retrieved chunks. Start with `k=4` (number of chunks returned) and `fetch_k=20` (candidates fetched before MMR re-ranks them for diversity). If you prefer speed over diversity, use `search_type="similarity"` for plain nearest-neighbor search without the re-ranking step.

In [None]:
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20}
)
print("Retriever initialized with MMR and k=4.")
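To see the trade-off MMR makes, here is a small pure-Python sketch of greedy Maximal Marginal Relevance: each step picks the candidate maximizing `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already_selected)`. LangChain exposes this balance through the `lambda_mult` search kwarg (default 0.5; 1.0 means no diversity penalty). The vectors are made-up toy data, and this illustrates the technique rather than reproducing LangChain's exact code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query: list[float], candidates: list[list[float]], k: int = 2, lambda_mult: float = 0.5) -> list[int]:
    """Greedy MMR: return indices of k candidates, trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy = similarity to the closest chunk already chosen (0 for the first pick).
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
candidates = [
    [0.95, 0.31],   # highly relevant
    [0.94, 0.34],   # near-duplicate of the first
    [0.80, -0.60],  # less relevant but different
]
print(mmr(query_vec, candidates, k=2))                   # → [0, 2]
print(mmr(query_vec, candidates, k=2, lambda_mult=1.0))  # → [0, 1]
```

With the default balance, the near-duplicate loses to the more distinct chunk; with `lambda_mult=1.0` the result matches plain similarity ranking. In the notebook, adding `"lambda_mult": 0.5` to `search_kwargs` would make that default explicit.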

### Define a Grounded Prompt

Write a prompt template that instructs the model to answer only from the provided context and to admit when it doesn't know. This helps reduce hallucination and keeps responses grounded in the retrieved text.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant. Use the context to answer the question. If the answer
is not in the context, say you do not know.

Context:
{context}

Question:
{question}

Answer concisely.
""".strip())
print("Prompt template created for grounded RAG responses.")

### Assemble the RAG Chain

Combine the retriever, prompt, and LLM into a `RetrievalQA` chain. With `chain_type="stuff"`, the chain fetches relevant chunks, inserts ("stuffs") all of them into the prompt's `{context}` slot, and sends the filled prompt to the LLM.

In [None]:
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)
print("RAG chain assembled and ready for queries.")
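Conceptually, the `stuff` strategy is plain string assembly: the retrieved chunks' text is concatenated into `{context}` and the query fills `{question}`. The sketch below mimics that with `str.format`; the `\n\n` separator is assumed to mirror LangChain's default document separator, and `stuff_prompt` is a hypothetical name. It also makes the cost visible: prompt length grows with every chunk you retrieve.

```python
TEMPLATE = """You are a helpful assistant. Use the context to answer the question. If the answer
is not in the context, say you do not know.

Context:
{context}

Question:
{question}

Answer concisely."""


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate every retrieved chunk into one context block, then fill the template."""
    return TEMPLATE.format(context=separator.join(chunks), question=question)


retrieved = [
    "LangChain is a framework for developing applications powered by language models.",
    "It provides integrations for document loading, splitting, embeddings, vector stores, retrieval, chains, tools, and agents.",
]
filled = stuff_prompt(retrieved, "What does LangChain provide?")
print(filled)
print(f"Prompt length: {len(filled)} characters")
```

Because every chunk lands in the prompt, doubling `k` or `chunk_size` roughly doubles the context portion of each request.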

### Run and Evaluate

Test the RAG pipeline with a sample query:

In [None]:
query = "What does LangChain provide for building LLM apps?"
result = rag_chain.invoke({"query": query})
print("RAG result:", result["result"])

Measure latency and check whether expected keywords appear in the answer as a quick quality check:

In [None]:
import time

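def timed_call(fn, *args, **kwargs):
    # perf_counter is a monotonic, high-resolution clock suited to measuring durations.
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - start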

def contains_any(text, keywords):
    text_l = text.lower()
    return any(k.lower() in text_l for k in keywords)

eval_cases = [
    {
        "query": "What components does LangChain offer?",
        "keywords": ["document", "embeddings", "vector", "tools", "agents"]
    }
]

for case in eval_cases:
    res, dur = timed_call(rag_chain.invoke, {"query": case["query"]})
    answer = res["result"]
    ok = contains_any(answer, case["keywords"])
    print(f"Eval: {ok} for '{case['query']}' in {dur:.2f}s (Result: {answer})")
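The `any` check above passes if even one keyword appears. A slightly stricter sketch reports keyword recall (the fraction of expected keywords present) and the median latency over repeated calls. It accepts any callable mapping a query to an answer string, so the stub `fake_answer` stands in for something like `lambda q: rag_chain.invoke({"query": q})["result"]`; the function names and the 0.6 pass threshold are hypothetical choices.

```python
import statistics
import time
from typing import Callable


def keyword_recall(answer: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear (case-insensitively) in the answer."""
    text = answer.lower()
    return sum(k.lower() in text for k in keywords) / len(keywords)


def evaluate(answer_fn: Callable[[str], str], cases: list[dict], runs: int = 3, threshold: float = 0.6) -> list[dict]:
    """Run each case several times; report recall on the last answer and median latency."""
    report = []
    for case in cases:
        latencies = []
        answer = ""
        for _ in range(runs):
            start = time.perf_counter()
            answer = answer_fn(case["query"])
            latencies.append(time.perf_counter() - start)
        recall = keyword_recall(answer, case["keywords"])
        report.append({
            "query": case["query"],
            "recall": recall,
            "passed": recall >= threshold,
            "median_s": statistics.median(latencies),
        })
    return report


def fake_answer(query: str) -> str:
    # Stub standing in for the RAG chain so the sketch runs offline.
    return "LangChain offers document loaders, embeddings, and vector stores."


cases = [{"query": "What components does LangChain offer?",
          "keywords": ["document", "embeddings", "vector", "tools", "agents"]}]
for row in evaluate(fake_answer, cases):
    print(row)  # recall is 0.6 here: 3 of 5 keywords appear
```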

Start with a small k and compact prompts. Use a low temperature for more consistent outputs (even temperature 0 is not fully deterministic). Use smaller models during iteration and reserve higher-end models for production paths that require them. Cache embeddings and responses where possible. For further savings, consider adding [semantic caching with Redis Vector](/article/semantic-caching-with-redis-vector) to cut LLM costs.
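As a minimal illustration of response caching, the sketch below wraps any prompt-to-text function in an exact-match, in-memory cache keyed on a hash of the model name and prompt. `ExactMatchCache` is hypothetical: unlike semantic caching it only hits on identical prompts, and LangChain ships its own caching integrations (for example `set_llm_cache` and `CacheBackedEmbeddings`) that you would likely prefer in practice.

```python
import hashlib
from typing import Callable


class ExactMatchCache:
    """Hypothetical in-memory cache: identical (model, prompt) pairs reuse the stored answer."""

    def __init__(self, generate: Callable[[str], str], model: str):
        self.generate = generate
        self.model = model
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Include the model name so switching models never returns a stale answer.
        return hashlib.sha256(f"{self.model}\x00{prompt}".encode("utf-8")).hexdigest()

    def __call__(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        answer = self.generate(prompt)
        self.store[key] = answer
        return answer


calls = []


def slow_llm(prompt: str) -> str:
    # Stub for a paid model call; records each real invocation.
    calls.append(prompt)
    return prompt.upper()


cached = ExactMatchCache(slow_llm, model="gpt-4o-mini")
cached("What is LangChain?")
cached("What is LangChain?")  # served from the cache, no second model call
print(f"hits={cached.hits}, misses={cached.misses}, model calls={len(calls)}")  # → hits=1, misses=1, model calls=1
```

With the notebook's `llm`, the wrapped function could be `lambda p: llm.invoke(p).content`.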

## Conclusion

You've built a working RAG pipeline with LangChain, Chroma, and OpenAI: ingesting documents, splitting them into chunks, embedding and persisting them to a vector store, retrieving relevant context, and generating grounded answers. You've also added basic evaluation to measure latency and keyword coverage. This foundation lets you iterate on chunking strategy, retrieval parameters, and prompt design to improve accuracy and cost-efficiency. For next steps, explore [building a stateful AI agent with LangGraph](/article/building-a-stateful-ai-agent-with-langgraph) to add multi-turn reasoning and tool use on top of your RAG system.