# Middleware with History Modification

Ref: https://docs.langchain.com/oss/python/langchain/middleware/built-in

This notebook demonstrates `SummarizationMiddleware`, which automatically
summarizes older conversation history once it passes a configured message or token threshold.


## Setup

Configure `.env` before running. See `.env.sample`.


In [13]:
import rich
from dotenv import load_dotenv

load_dotenv()

True

## SummarizationMiddleware

Built-in middleware that keeps conversation history within a configured size by summarizing older messages:

**How it works:**

1. Checks the message count or token count in the `before_model` hook
2. When the `trigger` threshold is exceeded, summarizes older messages using an LLM
3. Replaces the older messages with the summary and keeps recent messages intact
4. Applies the change by emitting `RemoveMessage(id=REMOVE_ALL_MESSAGES)` to clear the history, then re-adding the summary and the kept messages
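
The steps above can be sketched in plain Python. This is a simplified model, not the library's implementation: messages are plain dicts, `summarize_if_needed` and `fake_summarizer` are hypothetical names, the strict `>` comparison follows the wording "exceeded" above, and the role given to the summary message is an arbitrary choice.

```python
from typing import Callable

Message = dict[str, str]  # simplified stand-in for LangChain message objects


def summarize_if_needed(
    messages: list[Message],
    trigger: int,
    keep: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Return a compacted history once the message count exceeds `trigger`."""
    if len(messages) <= trigger:
        return messages  # threshold not exceeded: leave the history untouched

    cutoff = len(messages) - keep
    older, recent = messages[:cutoff], messages[cutoff:]
    summary = {
        "role": "user",  # assumption: the role of the summary message is arbitrary here
        "content": f"Summary of earlier conversation: {summarize(older)}",
    }
    # The real middleware clears the state and re-adds messages; this sketch
    # only builds the resulting list.
    return [summary, *recent]


def fake_summarizer(older: list[Message]) -> str:
    # Placeholder for the LLM call: reports how many messages were condensed.
    return f"{len(older)} earlier messages"


history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
compacted = summarize_if_needed(history, trigger=8, keep=4, summarize=fake_summarizer)
print(len(compacted))  # 5: one summary plus the four most recent messages
```
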

**Why this matters - ToolMessage constraint:**

LLM APIs require that every `ToolMessage` answers a tool call made by a preceding `AIMessage`:
its `tool_call_id` must match the `id` of an entry in that `AIMessage`'s `tool_calls`. If you
naively trim messages and remove the `AIMessage` while keeping its `ToolMessage` (or vice versa),
the API call will typically fail.

`SummarizationMiddleware` handles this by scanning for `tool_calls` in each `AIMessage` and for the
`ToolMessage`s that answer them, then adjusting the cutoff point so that each pair stays together.
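
One possible way to keep pairs together is sketched below. It is not the middleware's actual algorithm: messages are plain dicts whose `type`, `tool_calls`, and `tool_call_id` fields only mimic the LangChain message attributes, `adjust_cutoff` and `orphaned_tool_results` are hypothetical helpers, and the sketch assumes tool results directly follow the AI message that requested them.

```python
Message = dict  # simplified stand-in: {"type": "human" | "ai" | "tool", ...}


def adjust_cutoff(messages: list[Message], cutoff: int) -> int:
    """Move `cutoff` earlier until the kept slice no longer starts with a tool result."""
    while 0 < cutoff < len(messages) and messages[cutoff]["type"] == "tool":
        cutoff -= 1
    return cutoff


def orphaned_tool_results(messages: list[Message]) -> list[str]:
    """Return tool_call_ids whose requesting AI message is missing from `messages`."""
    requested = {
        call["id"]
        for m in messages
        if m["type"] == "ai"
        for call in m.get("tool_calls", [])
    }
    return [
        m["tool_call_id"]
        for m in messages
        if m["type"] == "tool" and m["tool_call_id"] not in requested
    ]


history = [
    {"type": "human", "content": "What time is it?"},
    {"type": "ai", "content": "", "tool_calls": [{"id": "call_1", "name": "get_current_time"}]},
    {"type": "tool", "content": "2025-01-01T12:00:00", "tool_call_id": "call_1"},
    {"type": "ai", "content": "It is noon."},
]

naive_kept = history[2:]                         # starts with an orphaned tool result
safe_kept = history[adjust_cutoff(history, 2):]  # cutoff moved back to include the AI message
print(orphaned_tool_results(naive_kept))  # ['call_1']
print(orphaned_tool_results(safe_kept))   # []
```
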

**Parameters:**

- `model`: LLM used to generate summaries
- `trigger`: When to summarize - `("messages", N)`, `("tokens", N)`, or `("fraction", 0.8)`
- `keep`: How much recent history to preserve, in the same tuple format as `trigger`
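
To make the tuple format concrete, here is a rough sketch of how such a `(kind, value)` pair could be evaluated. The function `threshold_exceeded` and its `max_input_tokens` parameter are hypothetical, and the sketch assumes that `"fraction"` is measured against the model's maximum input token count; the library's own check may differ in detail.

```python
def threshold_exceeded(
    spec: tuple[str, float],
    message_count: int,
    token_count: int,
    max_input_tokens: int,
) -> bool:
    """Evaluate a (kind, value) threshold against the current history size."""
    kind, value = spec
    if kind == "messages":
        return message_count > value
    if kind == "tokens":
        return token_count > value
    if kind == "fraction":
        # Assumption: the fraction is relative to the model's input token limit.
        return token_count > value * max_input_tokens
    raise ValueError(f"unknown threshold kind: {kind!r}")


print(threshold_exceeded(("messages", 8), message_count=9, token_count=0, max_input_tokens=1000))
print(threshold_exceeded(("fraction", 0.8), message_count=3, token_count=500, max_input_tokens=1000))
```
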


In [14]:
from datetime import datetime
from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph.state import CompiledStateGraph


@tool
def get_current_time() -> str:
    """Get the current time."""
    return datetime.now().isoformat()


model = ChatAnthropic(model="claude-sonnet-4-5-20250929")
checkpointer = InMemorySaver()

# Summarize when messages exceed 8, keep last 4 messages
summarization = SummarizationMiddleware(
    model=model,
    trigger=("messages", 8),
    keep=("messages", 4),
)

agent: CompiledStateGraph[Any] = create_agent(
    model=model,
    tools=[get_current_time],
    system_prompt="You are a helpful assistant. Keep your responses brief.",
    checkpointer=checkpointer,
    middleware=[summarization],
)

## Test Summarization

Send multiple messages to trigger the summarization behavior.


In [15]:
config: RunnableConfig = {"configurable": {"thread_id": "summarization-test"}}

questions = [
    "My name is Alice and I'm a software engineer.",
    "What is 2 + 2?",
    "What is the capital of France?",
    "What time is it?",
    "Tell me a fun fact about cats.",
    "What is my name and profession?",  # Should be preserved in summary
]

for q in questions:
    rich.print(f"[bold cyan]User: {q}[/bold cyan]")
    response = agent.invoke({"messages": [{"role": "user", "content": q}]}, config)
    ai_message = response["messages"][-1]
    rich.print(f"[bold green]Assistant: {ai_message.content}[/bold green]")
    rich.print(f"[dim]Total messages in state: {len(response['messages'])}[/dim]")

## Inspect the Summary

Check what the summarized history looks like.


In [16]:
state = agent.get_state(config)
messages = state.values.get("messages", [])

rich.print(f"[bold]Current message count: {len(messages)}[/bold]")
rich.print("messages =", messages)

## Custom History Modification

You can also write custom middleware to modify history. The key is to return a dict
with a `messages` key from the `before_model` hook.

**Important:** When modifying messages, ensure AIMessage/ToolMessage pairs stay together.
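
How the returned `messages` end up in state depends on the reducer. The sketch below is a simplified plain-Python model of ID-based merging, in the spirit of LangGraph's `add_messages` reducer; `merge_messages` and the `REMOVE_ALL` sentinel are hypothetical stand-ins, and the real reducer handles more cases. It shows why returning a reordered list does not reorder the stored history by itself, and why a clear-then-re-add pattern such as `RemoveMessage(id=REMOVE_ALL_MESSAGES)` is needed for that.

```python
REMOVE_ALL = "__hypothetical_remove_all__"  # stand-in for the real remove-all sentinel


def merge_messages(current: list[dict], update: list[dict]) -> list[dict]:
    """Simplified model of an ID-based message reducer."""
    # A remove-all marker discards the current history; only what follows it is kept.
    marker_positions = [i for i, m in enumerate(update) if m["id"] == REMOVE_ALL]
    if marker_positions:
        return update[marker_positions[-1] + 1:]

    merged = list(current)
    index = {m["id"]: i for i, m in enumerate(merged)}
    for m in update:
        if m["id"] in index:
            merged[index[m["id"]]] = m  # same ID: replace in place
        else:
            index[m["id"]] = len(merged)
            merged.append(m)            # new ID: append at the end
    return merged


state = [{"id": "1", "content": "Hello!"}]
context = {"id": "ctx", "content": "[System context]"}

appended = merge_messages(state, [context, *state])
replaced = merge_messages(state, [{"id": REMOVE_ALL}, context, *state])
print([m["id"] for m in appended])  # ['1', 'ctx']
print([m["id"] for m in replaced])  # ['ctx', '1']
```
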


In [17]:
from langchain.agents.middleware.types import AgentMiddleware, AgentState
from langchain_core.messages import HumanMessage
from langgraph.runtime import Runtime


class SystemPromptInjectionMiddleware(AgentMiddleware[AgentState[Any], None]):
    """Middleware that injects dynamic context into conversation history."""

    def __init__(self, context_provider: str) -> None:
        self.context_provider = context_provider

    def before_model(self, state: AgentState[Any], runtime: Runtime[None]) -> dict[str, Any] | None:
        messages = state["messages"]

        # Only inject right after a user message (not after tool results)
        if not messages or not isinstance(messages[-1], HumanMessage):
            return None

        context_message = HumanMessage(
            content=f"[System context: {self.context_provider}]",
        )

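        rich.print(f"[bold magenta]Injected context: {self.context_provider}[/bold magenta]")
        # With the default ID-based messages reducer, messages already in state are
        # updated in place and the new context message is appended after them.
        return {"messages": [context_message, *messages]}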


# Example: inject context that user always wants to know the time
checkpointer3 = InMemorySaver()
agent3: CompiledStateGraph[Any] = create_agent(
    model=model,
    tools=[get_current_time],
    system_prompt="You are a helpful assistant.",
    checkpointer=checkpointer3,
    middleware=[SystemPromptInjectionMiddleware("User always wants to know the current time")],
)

config3: RunnableConfig = {"configurable": {"thread_id": "custom-test"}}
response = agent3.invoke({"messages": [{"role": "user", "content": "Hello!"}]}, config3)
rich.print("response =", response)