# Summary

## 1. Documents

- A Document is the unit of text that the LLM processes in order to answer queries.
- A Document is not necessarily the whole file: files are typically chunked. Documents usually carry metadata such as the name of the original file, the page the chunk was extracted from, the author, etc.
- Full texts are usually split into chunks to optimise LLM output. Larger chunks mean more text in the context window, which increases the risk of hallucination.
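
As an illustration of chunking with metadata, here is a minimal, library-free sketch. The `Chunk` class, the `split_text` helper and the file name `report.pdf` are hypothetical names chosen for this example; real text splitters usually split on separators rather than on fixed character offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One Document: a piece of text plus metadata about where it came from."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, source: str, chunk_size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(piece, {"source": source, "start": start}))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# 100 characters, 40-character chunks, 10 characters of overlap
chunks = split_text("a" * 100, source="report.pdf")
for chunk in chunks:
    print(chunk.metadata, len(chunk.text))
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one of the neighbouring chunks, as long as it is shorter than the overlap.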

## 2. Embeddings

- LLMs (and dedicated embedding models) represent text as vectors in a high-dimensional space.
- In this space, the position of each point (embedding) reflects the meaning of its corresponding text. Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space.
- In simpler words: text is placed in a "space", and the way texts are positioned relative to each other captures the relationships between them. Texts that are close to each other tend to have similar meanings, and particular directions in the space may capture other relationships (for example, something like a parent-child relationship).
 
## 3. Vector Stores

- A vector store is a specialised database for storing and querying embeddings. It supports standard operations like create, read, update, and delete (CRUD).
- For retrieval, it allows searching for semantically similar text by comparing embedding vectors using similarity metrics such as cosine similarity.
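
To make the similarity search concrete, here is a small sketch of cosine similarity and a toy in-memory store. `TinyVectorStore` is a hypothetical class written for this example, and the 3-dimensional vectors are hand-made stand-ins for what an embedding model would produce.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 1) -> list[str]:
        """Return the k texts whose vectors are most similar to the query."""
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
# Hand-made "embeddings"; a real embedding model would produce these.
store.add("cat", [0.9, 0.1, 0.0])
store.add("kitten", [0.8, 0.2, 0.1])
store.add("invoice", [0.0, 0.1, 0.9])

results = store.search([0.85, 0.15, 0.05], k=2)
print(results)
```

A production vector store would use an approximate nearest-neighbour index instead of sorting every item, but the ranking criterion is the same idea.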


## 4. Tool Calling
- Tool calling enables an LLM to interact with systems e.g. calling an API or querying a database.
- When interacting with external tools, the request and response typically need to conform to a schema, e.g. an API request payload or an SQL query structure.

```python
from typing import Optional

from langchain_core.utils.function_calling import tool_example_to_messages
from pydantic import BaseModel, Field

class BudgetEntry(BaseModel):
    amount: Optional[float] = Field(description = "The income or expense amount",default=0.0)
    currency: Optional[str] = Field(description = "The currency of the amount",default='AED')
    creditOrDebit: Optional[str] = Field(description = "Credit or Debit. D (debit) if the amount was debited/spent, C (credit) if the amount was received. Defaults to D (debit)", enum=["C","D"],default='D')
    memo: Optional[str] = Field(description="Short description of the credit/debit event e.g. Shopping")
    category: str = Field(description="The category of the credit/debit event e.g. Bills", enum=["Salary","Bills","Rent","Shopping","Car","Home"])

class Extract(BaseModel):
    # No trailing comma here: a trailing comma would turn the field default into a tuple
    entry: Optional[BudgetEntry] = Field(description="The budget entry, if all of the required details of the transaction were present in the text")
    success: bool = Field(description="True/False value indicating if the text contained all required details for a transaction")


examples = [
    (
        "Fifty dollars for a t-shirt",
        Extract(success=True, entry=BudgetEntry(amount=50., currency="USD",creditOrDebit="D",memo="T-Shirt",category="Shopping")),
    ),
    (
        "And having the same one as six other people in this club is a hella don't",
        Extract(success=False, entry=None),
    ),
]


messages = []

for txt, tool_call in examples:
    if tool_call.success:
        # This final message is optional for some providers
        ai_response = "Detected entry."
    else:
        ai_response = "Detected no entry."
    messages.extend(tool_example_to_messages(txt, [tool_call], ai_response=ai_response))

message_with_extraction = {
    "role": "user",
    "content": "Apple Vision Pro thingy for $3999",
}

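# Add your few-shot examples + a new query.
# `model` is assumed to be a chat model created earlier (e.g. with init_chat_model).
response = model.invoke(messages + [message_with_extraction])

# Constrain the model's output to the Extract schema
structured_llm = model.with_structured_output(schema=Extract)
structured_llm.invoke(messages + [message_with_extraction])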
```

## 5. Structured Output

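- Structured output constrains the LLM's response to a schema (e.g. a Pydantic model or a `TypedDict`) so that it can be parsed reliably by code. In LangChain this is done with `with_structured_output`, as in the `Extract` example above.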

## 6. Few-shot Prompting
- Few-shot prompting is a technique used with large language models (LLMs) where you provide a small number of examples (typically 1–5) within the prompt to show the model how to perform a task
- Importantly, the examples should include "negatives" so that the LLM understands how to handle such cases.
- **Different chat model providers impose different requirements for valid message sequences**
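
A minimal sketch of how few-shot examples, including a negative one, can be laid out as chat messages. The `build_few_shot_messages` helper and the JSON-like outputs are hypothetical; the plain user/assistant alternation shown here is an assumption and may need adapting to a given provider's message rules.

```python
def build_few_shot_messages(examples, query):
    """Turn (input, expected_output) pairs into alternating chat messages."""
    messages = []
    for user_text, expected in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": expected})
    # The real query always comes last
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Fifty dollars for a t-shirt", '{"success": true, "amount": 50.0}'),
    # Negative example: the text contains no transaction
    ("What a lovely day", '{"success": false}'),
]
messages = build_few_shot_messages(examples, "Coffee for 4 euros")

for message in messages:
    print(message["role"], "->", message["content"])
```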

## 7. Chatbots
- By default, an LLM does not retain the context from previous invocations. For example, if you tell an LLM your name in one invocation, it will not "remember" your name in the subsequent invocations
- For the LLM to remember the name, the chat history has to be sent with each invocation
- Tools like LangGraph enable this.
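
The effect of resending the chat history can be shown without any library. In this sketch `fake_llm` is a hypothetical stand-in for a chat model: like a real one, it only sees the messages passed in the current call.

```python
def fake_llm(messages: list[dict]) -> str:
    """Stand-in for a chat model: it only sees the messages passed in this call."""
    for message in messages:
        if message["role"] == "user" and message["content"].startswith("My name is "):
            name = message["content"][len("My name is "):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


# Without history: each call is independent, so the name is forgotten.
fake_llm([{"role": "user", "content": "My name is Bob."}])
no_history_reply = fake_llm([{"role": "user", "content": "What is my name?"}])

# With history: the caller resends all previous messages on every call.
history = [{"role": "user", "content": "My name is Bob."}]
history.append({"role": "assistant", "content": fake_llm(history)})
history.append({"role": "user", "content": "What is my name?"})
with_history_reply = fake_llm(history)

print(no_history_reply)
print(with_history_reply)
```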

## 8. LangGraph
- **LangGraph** is an open-source library from the creators of LangChain that is used to build **stateful**, **multi-step** agent/workflow applications with reliable execution and **persistence**.
- Crucially:
    - LangGraph updates the state after each human and AI interaction and persists it through a checkpointer (in memory or in an external store). This state acts as memory used by LLMs to give context-aware answers.
    - Importantly, Human/Agent interactions are modeled as nodes in a graph which makes the orchestration between humans and agents visible and easy to debug.

#### 8.1 Nodes
- A node is a single action: e.g., call an LLM, run a tool, query a database, or invoke custom code.
- Nodes can represent different actors (LLMs, tools, humans).
- In case of a human, the graph encodes when to pause for human input, how to resume, and how actors exchange state.

#### 8.2 Edges
- Edges connect nodes and decide what runs next.
- They can be unconditional, conditional (branching), looping, or fan-out/fan-in (parallel paths and joins).
- Graph branches can run in parallel

#### 8.3 Graph
- A workflow is the directed graph of nodes and edges.
- This makes the system explicit and debuggable: you can see the exact path the execution took.
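
The following library-free sketch illustrates nodes, unconditional and conditional edges, and the recorded execution path. `run_graph` and the `draft`/`review` nodes are hypothetical; this is not how LangGraph is implemented, only a small model of the idea.

```python
END = "__end__"


def run_graph(nodes, edges, start, state):
    """Run nodes in the order chosen by the edges and record the path taken."""
    path = []
    current = start
    while current != END:
        path.append(current)
        state = {**state, **nodes[current](state)}  # merge the node's update
        edge = edges[current]
        # An edge is either a fixed node name or a function of the state.
        current = edge(state) if callable(edge) else edge
    return state, path


nodes = {
    "draft": lambda s: {"attempts": s["attempts"] + 1},
    "review": lambda s: {"approved": s["attempts"] >= 2},
}
edges = {
    "draft": "review",  # unconditional edge
    # Conditional edge: loop back to "draft" until the review approves
    "review": lambda s: END if s["approved"] else "draft",
}

final_state, path = run_graph(nodes, edges, "draft", {"attempts": 0, "approved": False})
print(path)
print(final_state)
```

Because the path is recorded explicitly, it is easy to see that the loop ran twice before the conditional edge routed to the end.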

#### 8.4 State
- The workflow carries a state object (typically a structured dict/schema) that persists across nodes.
- Nodes read from state and propose updates to state rather than mutating it in place
- After each node runs, LangGraph materialises a new state version.
- This **immutability per step** yields reproducibility, diff-ability, and clear audit trails.
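
A small sketch of the "propose an update, get a new state version" idea. `apply_update` and the `reducers` mapping are hypothetical names for this example; the reducer plays a role loosely similar to an append-style reducer on a message list.

```python
from operator import add


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Return a new state; keys with a reducer are combined, others overwritten."""
    new_state = dict(state)  # copy, so the previous version is left untouched
    for key, value in update.items():
        if key in reducers and key in state:
            new_state[key] = reducers[key](state[key], value)
        else:
            new_state[key] = value
    return new_state


reducers = {"messages": add}  # append to the message list instead of replacing it

v1 = {"messages": ["hi"], "language": "English"}
v2 = apply_update(v1, {"messages": ["hello!"]}, reducers)
v3 = apply_update(v2, {"language": "French"}, reducers)

print(v1)
print(v2)
print(v3)
```

Each version remains available after the next one is created, which is what makes diffing and replaying steps possible.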

#### 8.5 Checkpoints
- Each step can be checkpointed (state snapshot + metadata).
- Checkpoints and state can be stored in memory or external stores (DBs, object storage).

- With checkpoints, you can retry or reroute without losing prior work.
- This enables resume after failure/timeouts, deterministic replay, and precise debugging from any point.

#### 8.6 Thread
- A thread is one concrete run of the graph (e.g., a single user session).
- Each thread has its own state history and checkpoints, isolating concurrent users/sessions cleanly.
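
As a rough illustration of checkpoints and threads, the sketch below keeps one list of state snapshots per thread. `TinyCheckpointer` and the thread names are hypothetical; a real checkpointer would also store metadata and could write to a database.

```python
import copy
from collections import defaultdict


class TinyCheckpointer:
    """Hypothetical in-memory checkpointer: one list of snapshots per thread."""

    def __init__(self):
        self.history = defaultdict(list)

    def save(self, thread_id: str, step: str, state: dict) -> None:
        # deepcopy so later changes to `state` cannot alter the snapshot
        self.history[thread_id].append({"step": step, "state": copy.deepcopy(state)})

    def latest(self, thread_id: str) -> dict:
        return self.history[thread_id][-1]


checkpointer = TinyCheckpointer()

# Two threads (e.g. two user sessions) running the same graph
checkpointer.save("alice", "model", {"messages": ["hi"]})
checkpointer.save("alice", "tools", {"messages": ["hi", "tool result"]})
checkpointer.save("bob", "model", {"messages": ["bonjour"]})

# Resume Alice's run from her last checkpoint without touching Bob's
resume_point = checkpointer.latest("alice")
print(resume_point)
```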

![](./docs/langgraph-checkpoints.jpg)

```python
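from typing import Sequence

from langchain_core.messages import BaseMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict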

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),
        MessagesPlaceholder(variable_name="messages"), # state['messages'] will be inserted in this placeholder.
    ]
)

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages] # state stores messages
    language: str  # the prompt template has a {language} variable, so its value is stored in the state alongside the messages


workflow = StateGraph(state_schema=State)

def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
```

## 9. Agents

- An LLM agent is a system that uses a large language model to **autonomously decide which actions or tools to invoke in order to achieve a defined goal**.

## 10. Tools

- LLM tools are external functions or services that a large language model can call—such as search, databases, or APIs—to extend its capabilities beyond text generation.

```python
from langchain_tavily import TavilySearch
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

search = TavilySearch(max_results=2)
search_results = search.invoke("What is the weather in SF")
# Once we have all the tools we want, we can put them in a list that we will reference later.
tools = [search]

model = init_chat_model("gpt-4.1", model_provider="openai")
model_with_tools = model.bind_tools(tools)

agent_executor = create_react_agent(model, tools)
input_message = {"role": "user", "content": "Search for the weather in SF"}
response = agent_executor.invoke({"messages": [input_message]})

for message in response["messages"]:
    message.pretty_print()
```

## 11. Retrieval Augmented Generation 

- Retrieval-Augmented Generation (RAG) is a technique where an LLM retrieves relevant external knowledge (e.g., from documents or databases) and incorporates it into its prompt to generate more accurate and grounded responses.
- A typical RAG application has two main components:

- **Indexing**: a pipeline for ingesting data from a source and indexing it. _This usually happens offline_. The steps are:

    - **Load**: First we need to load our data. This is done with Document Loaders.
    - **Split**: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.
    - **Store**: We need somewhere to store and index our splits, so that they can be searched over later. This is often done using a VectorStore and Embeddings model.

- **Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

    - **Retrieve**:  A retriever is the component in a retrieval-augmented generation (RAG) system that fetches the most relevant information from an external knowledge source, given a user query. Instead of generating text, it returns raw documents or records—often using vector similarity search, keyword search, or a hybrid approach. The retriever grounds the LLM by supplying fresh, domain-specific context, which reduces hallucination and ensures that generated answers are accurate and aligned with business data.
    - **Generate**: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.

```python
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks.
# `vector_store` (and `llm`, used below) are assumed to be initialised elsewhere.
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
# N.B. for non-US LangSmith endpoints, you may need to specify
# api_url="https://api.smith.langchain.com" in hub.pull.
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```

### 11.1 Query Analysis

- Query Analysis is the process of employing models to transform or construct optimized search queries from raw user input.
- For example, this could be transforming user input into an SQL query or a REST API request.


```python
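from typing import Literal

from langchain_core.documents import Document
from typing_extensions import Annotated, List, TypedDict


class Search(TypedDict):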
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]

class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str


def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}
```

The `analyze_query` node maps the user's prompt onto the `Search` schema, which can then be used to build an SQL query or an API request.

### 11.2 RAG Chain
- A RAG Chain is deterministic in its control flow. It always follows these 3 steps:

    **Step 1** → Call retriever with query.

    **Step 2** → Take retrieved docs + user query, feed them into LLM.

    **Step 3** → Return answer.

In a RAG Chain the `retriever` is called a maximum of one time. This does not utilise the reasoning capabilities of the LLM. 

```python
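from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END
from langgraph.prebuilt import tools_condition

# Assumes `graph_builder` is a StateGraph and that `query_or_respond`,
# `tools` and `generate` are nodes defined elsewhere.
graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

# Compile once, after all nodes and edges have been added
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)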
```

#### 11.2.1 Human in the loop

- We can add human intervention before sensitive steps (e.g. querying a database, creating or deleting files). 
- This gives a human the chance to review the step and decide whether to approve or reject it.
- This is enabled by LangGraph's persistence layer, which saves run progress to your storage of choice.
```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory, interrupt_before=["execute_query"])

# Now that we're using persistence, we need to specify a thread ID
# so that we can continue the run after review.
config = {"configurable": {"thread_id": "1"}}
```
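
The pause-and-resume behaviour can be modelled in a few lines of plain Python. This is only a conceptual sketch: the `run` function, its `approved` flag and the two step names are hypothetical, and a real implementation would persist the paused state in a checkpointer rather than in a local variable.

```python
def run(steps, state, interrupt_before=(), start_at=0, approved=False):
    """Run steps in order, pausing before any step listed in `interrupt_before`.

    Returns (state, index of the next step). The index equals len(steps)
    once the run has finished.
    """
    for index in range(start_at, len(steps)):
        name, func = steps[index]
        needs_review = name in interrupt_before
        # Skip the pause only for the step a human has just approved
        if needs_review and not (approved and index == start_at):
            return state, index
        state = {**state, **func(state)}
    return state, len(steps)


steps = [
    ("write_query", lambda s: {"query": "SELECT COUNT(*) FROM users"}),
    ("execute_query", lambda s: {"result": f"ran: {s['query']}"}),
]

# First run stops before the sensitive step
paused_state, paused_index = run(steps, {}, interrupt_before=["execute_query"])
print("waiting for approval of:", paused_state["query"])

# After a human approves, resume from the saved position
final_state, final_index = run(
    steps,
    paused_state,
    interrupt_before=["execute_query"],
    start_at=paused_index,
    approved=True,
)
print(final_state["result"])
```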

### 11.3 RAG Agents

- A RAG Agent is non-deterministic: its control flow is decided by the LLM at run time.
- The retriever is exposed as a tool.
- The LLM decides how many times to call the retriever, and with what query.

```python
from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)
```
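
To contrast the two control flows, here is a library-free sketch. Everything in it is a stand-in: `KNOWLEDGE` holds made-up toy data, `scripted_planner` hard-codes the decisions an LLM would make through tool calls, and `echo_llm` simply joins the retrieved context.

```python
# Made-up toy data for illustration only
KNOWLEDGE = {
    "refund policy": "Refunds are accepted within 30 days.",
    "refund method": "Refunds go back to the original payment method.",
}


def retrieve(query: str) -> str:
    """Toy retriever: exact-match lookup in a dictionary."""
    return KNOWLEDGE.get(query.lower(), "No result.")


def rag_chain(question: str, llm) -> str:
    """Chain: retrieve exactly once with the raw question, then generate."""
    return llm(question, [retrieve(question)])


def rag_agent(question: str, plan_next_query, llm, max_steps: int = 5) -> str:
    """Agent: the model decides whether to retrieve again and with what query."""
    context = []
    for _ in range(max_steps):
        query = plan_next_query(question, context)
        if query is None:  # the model decided it has enough context
            break
        context.append(retrieve(query))
    return llm(question, context)


def scripted_planner(question, context):
    """Stand-in for the LLM's tool-calling decisions (hard-coded here)."""
    planned = ["refund policy", "refund method"]
    return planned[len(context)] if len(context) < len(planned) else None


def echo_llm(question, context):
    """Stand-in for the generation step: just joins the retrieved context."""
    return " ".join(context)


question = "Can I get a refund and how is it paid?"
chain_answer = rag_chain(question, echo_llm)
agent_answer = rag_agent(question, scripted_planner, echo_llm)
print(chain_answer)
print(agent_answer)
```

The chain makes a single lookup with the raw question, while the agent issues two targeted queries before answering.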

### 11.4 Working with Databases

- In order to perform RAG over Databases, we must turn the user's prompt into a query.

#### 11.4.1 Chains

- With chains, our graph typically looks like `write_query` -> `execute_query` -> `generate_answer`.
- To reliably obtain SQL queries (absent markdown formatting and explanations or clarifications), we will make use of LangChain's structured output abstraction.
    ```python
    from langchain_core.prompts import ChatPromptTemplate
    from typing_extensions import Annotated, TypedDict

    system_message = """
    Given an input question, create a syntactically correct {dialect} query to
    run to help find the answer. Unless the user specifies in his question a
    specific number of examples they wish to obtain, always limit your query to
    at most {top_k} results. You can order the results by a relevant column to
    return the most interesting examples in the database.
    
    Never query for all the columns from a specific table, only ask for a the
    few relevant columns given the question.
    
    Pay attention to use only the column names that you can see in the schema
    description. Be careful to not query for columns that do not exist. Also,
    pay attention to which column is in which table.
    
    Only use the following tables:
    {table_info}
    """
    
    user_prompt = "Question: {input}"
    
    query_prompt_template = ChatPromptTemplate(
        [("system", system_message), ("user", user_prompt)]
    )

    class QueryOutput(TypedDict):
        """Generated SQL query."""
        query: Annotated[str, ..., "Syntactically valid SQL query."]
    
    
    def write_query(state: State):
        """Generate SQL query to fetch information."""
        prompt = query_prompt_template.invoke(
            {
                "dialect": db.dialect,
                "top_k": 10,
                "table_info": db.get_table_info(),
                "input": state["question"],
            }
        )
        structured_llm = llm.with_structured_output(QueryOutput)
        result = structured_llm.invoke(prompt)
        return {"query": result["query"]}
    ```

- To execute the query, we will use the [`QuerySQLDatabaseTool`](). It is recommended to add a human in the loop before executing an SQL query.
    ```python
    from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool

    def execute_query(state: State):
        """Execute SQL query."""
        execute_query_tool = QuerySQLDatabaseTool(db=db)
        return {"result": execute_query_tool.invoke(state["query"])}
    ```

#### 11.4.2 Agents

The advantage of using Agents to perform RAG over databases is that they leverage the reasoning capabilities of LLMs to make decisions during execution. Agents can:

- Query the database as many times as needed to answer the user question.
- Recover from errors by running a generated query, catching the traceback and regenerating the query correctly.
- Answer questions based on the database's schema as well as on the database's content (like describing a specific table).
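
The error-recovery behaviour can be sketched with the standard-library `sqlite3` module. The `run_with_retry` helper, the `customer` table and its rows are hypothetical; in a real agent the second query would be regenerated by the LLM from the error message, whereas here both candidates are hard-coded.

```python
import sqlite3


def run_with_retry(conn, candidate_queries):
    """Try queries in order; on an SQL error, move on to the next candidate.

    Returns (rows, list of error messages seen before success).
    """
    errors = []
    for query in candidate_queries:
        try:
            return conn.execute(query).fetchall(), errors
        except sqlite3.Error as exc:
            errors.append(str(exc))  # an agent would feed this back to the LLM
    raise RuntimeError(f"all queries failed: {errors}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ana", "Brazil"), ("Luis", "Brazil"), ("Mia", "Canada")],
)

rows, errors = run_with_retry(
    conn,
    [
        # First attempt uses a column that does not exist
        "SELECT nation, COUNT(*) FROM customer GROUP BY nation",
        # Corrected attempt
        "SELECT country, COUNT(*) FROM customer GROUP BY country ORDER BY country",
    ],
)
print(rows)
print(errors)
```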

To build an agent, we can leverage LangChain's [SQLDatabaseToolkit](https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.toolkit.SQLDatabaseToolkit.html).

The SQLDatabaseToolkit includes tools that can:

- Create and execute queries
- Check query syntax
- Retrieve table descriptions
- ... and more

```python
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent
from langchain_community.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

# Assumes `system_message` is an agent system prompt whose placeholders
# have already been filled in.
agent_executor = create_react_agent(llm, tools, prompt=system_message)
question = "Which country's customers spent the most?"

for step in agent_executor.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
```

#### 11.4.3 Dealing with high cardinality columns

- High-cardinality columns are columns that contain many distinct values.
    - A `full_name` column is an example of a high-cardinality column.
    - `is_verified` is a low-cardinality column.

- To filter on columns that contain proper nouns such as addresses, song names or artists, we first need to check the spelling of the value the user typed; otherwise the filter will not match the stored data.

- We can achieve this by creating a vector store with all the distinct proper nouns that exist in the database. The agent can then query that vector store each time the user includes a proper noun in their question, to find the correct spelling of that word.

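```python
import ast
import re

from langchain.agents.agent_toolkits import create_retriever_tool
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings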


def query_as_list(db, query):
    res = db.run(query)
    res = [el for sub in ast.literal_eval(res) for el in sub if el]
    res = [re.sub(r"\b\d+\b", "", string).strip() for string in res]
    return list(set(res))


artists = query_as_list(db, "SELECT Name FROM Artist")
albums = query_as_list(db, "SELECT Title FROM Album")

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_texts(artists + albums)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
description = (
    "Use to look up values to filter on. Input is an approximate spelling "
    "of the proper noun, output is valid proper nouns. Use the noun most "
    "similar to the search."
)
retriever_tool = create_retriever_tool(
    retriever,
    name="search_proper_nouns",
    description=description,
)

# Add to system message
suffix = (
    "If you need to filter on a proper noun like a Name, you must ALWAYS first look up "
    "the filter value using the 'search_proper_nouns' tool! Do not try to "
    "guess at the proper name - use this function to find similar ones."
)

system = f"{system_message}\n\n{suffix}"

tools.append(retriever_tool)

agent = create_react_agent(llm, tools, prompt=system)
```

## 12. Summarization

There are three approaches to building a summarizer:

1. **Stuffing the prompt**: If the document fits into the context window, we can simply include the whole document in the prompt. This is the simplest approach. It is implemented using `create_stuff_documents_chain`.

2. **Map Reduce**: If the document(s) do not fit into the context window, we **map each document into a summary** and then **reduce the summaries of each document into one final summary**. The mapping step is typically done in parallel. This is implemented using the [`MapReduceDocumentsChain`](https://python.langchain.com/api_reference/langchain/chains/langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain.html).

3. **Iterative Refinement**: Iterative Refinement is an alternative to Map-Reduce. Map-Reduce is effective when the documents do not have a sequential nature. In other cases, such as summarizing a novel or a body of text with an inherent sequence, [iterative refinement](https://python.langchain.com/docs/how_to/summarize_refine/), which updates a running summary one document at a time, may be more effective.
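
The three strategies differ mainly in how many model calls they make and in what each call sees. The sketch below uses a hypothetical `summarize` stub (it just keeps the first three words and records each call) in place of a real LLM, so only the control flow is meaningful.

```python
calls = []  # records every text sent to the stand-in "LLM"


def summarize(text: str) -> str:
    """Stand-in for an LLM summarization call: keeps the first three words."""
    calls.append(text)
    return " ".join(text.split()[:3])


def stuff(docs: list[str]) -> str:
    """Stuffing: put every document into a single prompt."""
    return summarize(" ".join(docs))


def map_reduce(docs: list[str]) -> str:
    """Map each document to a summary, then reduce the summaries into one."""
    partial_summaries = [summarize(doc) for doc in docs]  # map (parallelisable)
    return summarize(" ".join(partial_summaries))  # reduce


def refine(docs: list[str]) -> str:
    """Iterative refinement: update a running summary one document at a time."""
    summary = ""
    for doc in docs:
        summary = summarize(f"{summary} {doc}".strip())
    return summary


def count_calls(strategy, docs: list[str]) -> int:
    """Run a strategy and report how many summarization calls it made."""
    calls.clear()
    strategy(docs)
    return len(calls)


docs = ["alpha beta gamma delta", "epsilon zeta eta theta", "iota kappa lambda mu"]
for strategy in (stuff, map_reduce, refine):
    print(strategy.__name__, count_calls(strategy, docs))
```

With three documents, stuffing makes one call, map-reduce makes one call per document plus one for the reduce step, and refinement makes one sequential call per document.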