### Middleware

Middleware gives you finer control over what happens at each step inside the agent. It is useful for:
- Tracking agent behavior with logging, analytics, and debugging.
- Transforming prompts, tool selection, and output formatting.
- Adding retries, fallbacks, and early termination logic.
- Applying rate limits, guardrails, and PII detection.
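
To make the idea concrete, here is a minimal, library-free sketch of the middleware pattern: each middleware wraps a model call and can add behavior before, after, or around it. This is not LangChain's API. The names `logging_middleware`, `retry_middleware`, `apply_middleware`, and `flaky_model` are hypothetical and exist only for illustration.

```python
from typing import Callable, List

# A "handler" stands in for a model call: prompt in, text out.
Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]


def logging_middleware(log: List[str]) -> Middleware:
    """Record every request and response in `log` (tracking)."""
    def wrap(handler: Handler) -> Handler:
        def wrapped(prompt: str) -> str:
            log.append(f"request: {prompt}")
            result = handler(prompt)
            log.append(f"response: {result}")
            return result
        return wrapped
    return wrap


def retry_middleware(max_attempts: int = 3) -> Middleware:
    """Retry the wrapped call when it raises RuntimeError (retries)."""
    def wrap(handler: Handler) -> Handler:
        def wrapped(prompt: str) -> str:
            last_error = None
            for _ in range(max_attempts):
                try:
                    return handler(prompt)
                except RuntimeError as exc:
                    last_error = exc
            raise last_error
        return wrapped
    return wrap


def apply_middleware(handler: Handler, middleware: List[Middleware]) -> Handler:
    """Wrap `handler` so the first middleware in the list is the outermost."""
    for mw in reversed(middleware):
        handler = mw(handler)
    return handler


# Hypothetical model that fails on its first call, then succeeds.
calls = {"count": 0}

def flaky_model(prompt: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return f"echo: {prompt}"


log: List[str] = []
model = apply_middleware(flaky_model, [logging_middleware(log), retry_middleware(3)])
answer = model("hello")
print(answer)
print(log)
```

Because logging is the outermost wrapper, it sees one request and one response even though the retry layer called the model twice. Ordering the list differently would log each attempt instead.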

In [1]:
from langchain_groq import ChatGroq
import os
from dotenv import load_dotenv

load_dotenv()

llm = ChatGroq(
    model="llama-3.1-8b-instant",  # Groq-hosted Llama 3.1 8B model
    groq_api_key=os.getenv("GROQ_API_KEY")
)


### Summarization Middleware

Summarization middleware automatically summarizes conversation history when it approaches a configured limit, preserving recent messages while compressing older context. It is useful for:
- Long-running conversations that exceed context windows.
- Multi-turn dialogues with extensive history.
- Applications where preserving full conversation context matters.

In [2]:
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langgraph.checkpoint.memory import InMemorySaver

load_dotenv()

llm = ChatGroq(
    model="llama-3.1-8b-instant",  # Groq-hosted Llama 3.1 8B model
    groq_api_key=os.getenv("GROQ_API_KEY")
)

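agent = create_agent(
    model=llm,  # pass the ChatGroq instance rather than a model-name string
    checkpointer=InMemorySaver(),  # keeps per-thread history in memory
    middleware=[
        SummarizationMiddleware(
            model=llm,  # the same model also writes the summaries
            trigger=("messages", 10),  # summarize once the history reaches 10 messages
            keep=("messages", 4)  # keep the 4 most recent messages verbatim
        )
    ]
)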


In [3]:
config = {"configurable": {"thread_id": "test-1"}}


In [4]:
from langchain_core.messages import HumanMessage

questions = [
    "What is 2+2?",
    "What is 10*5?",
    "What is 100/4?",
    "What is 15-7?",
    "What is 3*3?",
    "What is 4*4?",
]

for q in questions:
    response = agent.invoke(
        {"messages": [HumanMessage(content=q)]},
        config
    )

    print("Assistant:", response["messages"][-1].content)
    print("Total messages stored:", len(response["messages"]))
    print("-" * 50)


Assistant: 2 + 2 = 4.
Total messages stored: 2
--------------------------------------------------
Assistant: 10 * 5 = 50.
Total messages stored: 4
--------------------------------------------------
Assistant: 100 / 4 = 25.
Total messages stored: 6
--------------------------------------------------
Assistant: 15 - 7 = 8.
Total messages stored: 8
--------------------------------------------------
Assistant: 3 * 3 = 9.
Total messages stored: 10
--------------------------------------------------
Assistant: 4 * 4 = 16.
Total messages stored: 6
--------------------------------------------------
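
The drop from 10 stored messages to 6 on the last turn is the summarization kicking in. The sketch below reproduces the counts with plain Python. It is a simplified model rather than the library's implementation, and it assumes the check runs before each model call, fires once the history holds at least `trigger` messages, and replaces everything except the last `keep` messages with a single summary message. The helper `maybe_summarize` is hypothetical.

```python
def maybe_summarize(messages, trigger=10, keep=4):
    """Collapse older messages into one summary once `trigger` is reached."""
    if len(messages) < trigger:
        return messages
    older, recent = messages[:-keep], messages[-keep:]
    # Assumption: the summary is stored as a single message.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


history = []
counts = []
for turn in range(1, 7):
    history.append(f"human {turn}")      # new user question
    history = maybe_summarize(history)   # check runs before the model call
    history.append(f"ai {turn}")         # model answer
    counts.append(len(history))

print(counts)
```

On the sixth turn the history holds 11 messages before the model call, so the 7 oldest are collapsed into one summary: 1 summary + 4 kept messages + 1 new answer = 6.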
