# Using Built-in Middleware to Summarize a Long Conversation

```python
from dotenv import load_dotenv

load_dotenv()
```

Output:

```text
True
```

```python
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="gpt-4o-mini",
    checkpointer=InMemorySaver(),
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger=("tokens", 100),
            keep=("messages", 1)
        )
    ],
)

from langchain.messages import HumanMessage, AIMessage

response = agent.invoke(
    {"messages": [
        HumanMessage(content="Are you ready to play the JFK QA game?"),
        AIMessage(content="Sure!"),
        HumanMessage(content="Who was the favorite sister of JFK?"),
        AIMessage(content="His sister Kick."),
        HumanMessage(content="Correct! Who was his favorite brother?"),
        AIMessage(content="Hmmm, that is difficult. I would say Ted. He loved Bobby very much, but Bobby was very different from him."),
        HumanMessage(content="Correct! What was the main source of pain of JFK on a daily basis?"),
        AIMessage(content="Back pain."),
        HumanMessage(content="Correct again! Did JFK have dogs?"),
        ]},
    {"configurable": {"thread_id": "1"}}
)

print(response["messages"][-1].content)
```

Output:

```text
Yes, John F. Kennedy had several dogs during his presidency. One of the most famous was a Welsh Terrier named Pushinka, which was a gift from Soviet Premier Nikita Khrushchev. The Kennedy family also had other pets, including a German Shepherd named Charlie. The presence of these dogs added to the family atmosphere in the White House.
```


#### OK. Let's use `pprint` to see the detailed response

```python
from pprint import pprint

pprint(response)
```

Output (truncated):

```text
{'messages': [HumanMessage(content="Here is a summary of the conversation to date:\n\nThe user engaged in a question-and-answer game about John F. Kennedy (JFK). Key points discussed include:\n- JFK's favorite sister was Kick.\n- His favorite brother was Ted, though he also had a strong bond with Bobby.\n- JFK experienced daily pain primarily from back issues.", additional_kwargs={}, response_metadata={}, id='147008bd-7867-4c9a-a9f0-983a87624f7b'),
              HumanMessage(content='Correct again! Did JFK have dogs?', additional_kwargs={}, response_metadata={}, id='d06fba71-939e-4807-8780-7640c3d5cead'),
              AIMessage(content='Yes, John F. Kennedy had several dogs during his presidency. One of the most famous was a Welsh Terrier named Pushinka, which was a gift from Soviet Premier Nikita Khrushchev. The Kennedy family also had other pets, including a German Shepherd named Charlie. The presence of these dogs added to the family atmosphere in the White House.', additional_kwar
```

#### As you can see, the first message in the response is a summary of the conversation to date. Let's print it.

```python
print(response["messages"][0].content)
```

Output:

```text
Here is a summary of the conversation to date:

The user engaged in a question-and-answer game about John F. Kennedy (JFK). Key points discussed include:
- JFK's favorite sister was Kick.
- His favorite brother was Ted, though he also had a strong bond with Bobby.
- JFK experienced daily pain primarily from back issues.
```


## OK. Let's now explain the previous code in simple terms

Below is the same code, explained **in simple terms, line by line**, followed by a clear explanation of **what `SummarizationMiddleware` does**.

---

#### What this program is doing (big picture)

You’re creating a chat **agent** (a smart assistant loop) that:

1. Uses an LLM (`gpt-4o-mini`)
2. **Remembers the conversation** using a “checkpointer” (memory storage)
3. Uses **SummarizationMiddleware** to *auto-summarize older chat history* when it gets too long
4. Runs the agent on a list of chat messages and prints the agent’s latest reply

Middleware, in short, is a way to intercept and control what happens at specific points inside the agent loop.
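
To make the "intercept" idea concrete, here is a minimal sketch of a toy agent step with a hook that runs before the model. All names (`run_agent_step`, `hooks`, `keep_last_two`, `fake_model`) are made up for illustration. This is not how LangChain implements middleware, only the general shape of the idea.

```python
from typing import Callable

# Hypothetical names for illustration only; this is not LangChain's API.
State = dict  # holds a "messages" list of plain strings
Hook = Callable[[State], State]


def fake_model(state: State) -> str:
    """Stand-in for an LLM call: reports how many messages it was shown."""
    return f"(reply based on {len(state['messages'])} message(s))"


def run_agent_step(state: State, hooks: list[Hook]) -> State:
    """Run every hook on the state, then call the model on the result."""
    for hook in hooks:
        state = hook(state)
    reply = fake_model(state)
    return {"messages": state["messages"] + [reply]}


def keep_last_two(state: State) -> State:
    """Example hook: trim the history before the model sees it."""
    return {"messages": state["messages"][-2:]}


result = run_agent_step({"messages": ["a", "b", "c", "d"]}, hooks=[keep_last_two])
print(result["messages"])
```

The model only ever sees what the hook lets through. A summarizing middleware follows the same pattern, except that it compresses old messages instead of simply dropping them.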

---

#### Imports

```python
from langchain.agents import create_agent
```

* Imports `create_agent`, a helper that builds an agent for you (an agent = model + tools + memory + loop behavior).

```python
from langgraph.checkpoint.memory import InMemorySaver
```

* Imports an **in-memory checkpointer**.
* A *checkpointer* stores the agent’s state so the agent can resume a conversation later (per thread). For quick demos/prototyping, LangChain recommends `InMemorySaver`.

```python
from langchain.agents.middleware import SummarizationMiddleware
```

* Imports the middleware that will **summarize conversation history automatically** when some threshold is reached.

---

#### Create the agent

```python
agent = create_agent(
```

* Starts building an agent object.

```python
    model="gpt-4o-mini",
```

* Sets the main model the agent will use to respond.

```python
    checkpointer=InMemorySaver(),
```

* Adds “short-term memory persistence” using an in-memory store.
* This lets the agent keep a conversation history **per thread id**, so multiple conversations don’t mix.

```python
    middleware=[
```

* Adds middleware components (think: “plugins” that run at certain points inside the agent loop).

```python
        SummarizationMiddleware(
```

* Turns on auto-summarization of older messages.

```python
            model="gpt-4o-mini",
```

* The model used **to write the summary**.
* (You can use the same or a cheaper/faster model than the main one.)

```python
            trigger=("tokens", 100),
```

* **When to summarize.**
* `("tokens", 100)` means: *once the message history reaches roughly 100 tokens (as estimated by the middleware's token counter), summarize the older content.*
* **Important beginner note:** 100 tokens is *tiny* (just a few short messages), so this setting summarizes very aggressively.

```python
            keep=("messages", 1)
```

* **How much recent chat to keep “as-is”** after summarizing.
* `("messages", 1)` means: keep only the **single most recent message** unchanged; everything older gets compressed into a summary.

```python
        )
    ],
)
```

* Finishes building the agent.
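
The `trigger` and `keep` settings can be sketched in plain Python. This is a simplified illustration, not the middleware's real code: the 4-characters-per-token estimate and the helper names (`approx_tokens`, `should_summarize`, `split_for_keep`) are assumptions made for the example, and the real token counter may give different numbers.

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def should_summarize(messages: list[str], token_limit: int) -> bool:
    """Mimic trigger=("tokens", N): fire once the history reaches N tokens."""
    return sum(approx_tokens(m) for m in messages) >= token_limit


def split_for_keep(messages: list[str], keep_last: int) -> tuple[list[str], list[str]]:
    """Mimic keep=("messages", K): return (older to summarize, recent to keep)."""
    cut = max(len(messages) - keep_last, 0)
    return messages[:cut], messages[cut:]


history = ["x" * 80] * 6  # six messages of roughly 20 tokens each
print(should_summarize(history, token_limit=100))
older, recent = split_for_keep(history, keep_last=1)
print(len(older), len(recent))
```

With a limit as low as 100, even a handful of short messages crosses the threshold, which is why the demo conversation gets summarized on the first call.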

---

#### Create message objects

```python
from langchain.messages import HumanMessage, AIMessage
```

* Imports message types.
* `HumanMessage` = user text, `AIMessage` = assistant text.

---

#### Invoke (run) the agent with a conversation

```python
response = agent.invoke(
```

* Runs the agent once and returns the final state: a dictionary whose `messages` key holds the updated conversation.

```python
    {"messages": [
```

* The input is a dictionary with a `messages` list.
* You’re giving the agent a conversation “so far”.

```python
        HumanMessage(content="Are you ready to play the JFK QA game?"),
        AIMessage(content="Sure!"),
        HumanMessage(content="Who was the favorite sister of JFK?"),
        AIMessage(content="His sister Kick."),
        HumanMessage(content="Correct! Who was his favorite brother?"),
        AIMessage(content="Hmmm, that is difficult. I would say Ted. He loved Bobby very much, but Bobby was very different from him."),
        HumanMessage(content="Correct! What was the main source of pain of JFK on a daily basis?"),
        AIMessage(content="Back pain."),
        HumanMessage(content="Correct again! Did JFK have dogs?"),
```

* This is a back-and-forth conversation.
* The last message is a user question: “Did JFK have dogs?”
* The agent should answer that last question.

```python
        ]},
```

* Ends the messages list and the input dict.

```python
    {"configurable": {"thread_id": "1"}}
```

* This is the **config** for the run.
* `thread_id="1"` tells the checkpointer: “store/retrieve memory for conversation thread #1”.
* This is how you keep separate chat sessions.

```python
)
```

* Finishes the agent call.
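
To see why the `thread_id` matters, here is a toy stand-in for a checkpointer that keeps one message list per thread. `ToyThreadStore` and its methods are hypothetical and far simpler than `InMemorySaver`; the only thing borrowed from the real code is the shape of the config dictionary.

```python
class ToyThreadStore:
    """Hypothetical, simplified stand-in for a checkpointer."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = {}

    def append(self, config: dict, message: str) -> None:
        """Store a message under the thread named in the config."""
        thread_id = config["configurable"]["thread_id"]
        self._threads.setdefault(thread_id, []).append(message)

    def history(self, config: dict) -> list[str]:
        """Return a copy of the messages stored for that thread."""
        thread_id = config["configurable"]["thread_id"]
        return list(self._threads.get(thread_id, []))


store = ToyThreadStore()
jfk_thread = {"configurable": {"thread_id": "1"}}
other_thread = {"configurable": {"thread_id": "2"}}

store.append(jfk_thread, "Did JFK have dogs?")
store.append(other_thread, "What is a checkpointer?")

print(store.history(jfk_thread))
print(store.history(other_thread))
```

Each thread id gets its own history, so reusing `"1"` in a later call continues the JFK conversation, while a different id starts from scratch.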

---

#### Print the agent’s final reply

```python
print(response["messages"][-1].content)
```

* `response["messages"]` is the updated message list after the agent answered.
* `[-1]` means “the last message”.
* `.content` gets the text.
* So this prints the agent’s newest answer.
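
The indexing can be tried without calling any model. In this small sketch, `Msg` is a hypothetical stand-in for `HumanMessage` / `AIMessage`, and the message texts are shortened placeholders rather than real output.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for HumanMessage / AIMessage."""
    content: str


# Same shape as the state returned by the agent: a dict with a "messages" list.
response = {
    "messages": [
        Msg("Here is a summary of the conversation to date: (shortened)"),
        Msg("Correct again! Did JFK have dogs?"),
        Msg("Yes, JFK had several dogs. (shortened)"),
    ]
}

print(response["messages"][-1].content)  # last message: the agent's reply
print(response["messages"][0].content)   # first message: here, the summary
```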

---

#### What `SummarizationMiddleware` does (simple explanation)

`SummarizationMiddleware` is an **automatic chat-history compressor**:

* It **monitors how big your message history is** (by tokens, message count, or fraction of context).
* When the **trigger threshold** is hit (your code: 100 tokens), it:

  1. Takes the *older* part of the conversation
  2. Calls a model (your code: `gpt-4o-mini`) to **summarize** that older part
  3. Replaces that older part with a **short summary**
  4. Keeps some most-recent messages untouched based on `keep` (your code: keep only 1 recent message)

A detail from the reference docs: it “maintains context continuity by ensuring **AI/Tool message pairs remain together**” (so it doesn’t break the meaning of tool interactions).
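
The four steps above can be sketched as a small function. This is a simplified illustration under stated assumptions, not the middleware's implementation: messages are plain `(role, text)` tuples, `compress_history` and `stub_summarizer` are hypothetical names, the summarizer is a stub instead of a model call, and the sketch does not try to keep AI/Tool message pairs together.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text); a simplification of real message objects

SUMMARY_PREFIX = "Here is a summary of the conversation to date:\n\n"


def compress_history(
    messages: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace everything except the last `keep_last` messages with one summary."""
    cut = max(len(messages) - keep_last, 0)
    older, recent = messages[:cut], messages[cut:]
    if not older:
        return messages  # nothing old enough to summarize
    summary = ("human", SUMMARY_PREFIX + summarize(older))
    return [summary] + recent


def stub_summarizer(older: list[Message]) -> str:
    """Stand-in for the summary model: just counts what it was given."""
    return f"{len(older)} earlier message(s) from a JFK Q&A game."


conversation = [
    ("human", "Are you ready to play the JFK QA game?"),
    ("ai", "Sure!"),
    ("human", "Correct! Who was his favorite brother?"),
    ("human", "Correct again! Did JFK have dogs?"),
]

for role, text in compress_history(conversation, keep_last=1, summarize=stub_summarizer):
    print(f"{role}: {text}")
```

The result has the same two-part shape as the real output shown earlier: one summary message followed by the most recent message.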

#### In the specific settings we used

* `trigger=("tokens", 100)` → summarize *very quickly*
* `keep=("messages", 1)` → keep almost nothing verbatim, just the latest message(s)
* Result: the agent will often see something like:

  * **Summary:** “We’re playing a JFK Q&A game. User asked X, assistant answered Y…”
  * **Most recent message:** “Correct again! Did JFK have dogs?”

## How to run this code from Visual Studio Code
* Open Terminal.
* Make sure you are in the project folder.
* Make sure the Poetry environment is activated.
* Enter and run the following command:
    * `python 011-mid-to-summ-conversation.py`