HAMMER (Hybrid-memory Agent Model with Modular Execution and Recall) is a memory-enabled agent built around IBM Granite models.
It shows how to:
- Attach persistent, structured memory to a stateless LLM.
- Keep all data local (SQLite + vector store on your machine).
- Expose the agent both as a REST API and as a set of MCP tools via the MCP Forge Gateway.
In short: HAMMER demonstrates how to move from single-shot chatbots to agents that can remember, summarize and reuse context across longer workflows.
- Three-layer memory
  - Short-term RAM buffer for the current turn window.
  - SQLite summary store with compact bullet-point summaries.
  - Vector store (Chroma) using Granite Embedding 30M for semantic recall.
- Stateless LLM, stateful agent
  - The Granite LLM remains stateless.
  - All state lives in the memory modules, accessible via the agent loop.
- Modular MCP integration
  - HAMMER exposes its capabilities as MCP tools through the MCP Forge Gateway.
  - MCP clients (e.g. editors, other agents) can call HAMMER as a tool.
- Local-first
  - All memory and indexes are stored on the local machine.
  - Suitable for privacy-sensitive experiments and PoCs.
- Transparent introspection
  - Debug endpoints for inspecting RAM buffer, summaries and vector hits.
  - Easy to understand what the agent “remembers” and why.
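The three memory layers listed above can be illustrated with a minimal sketch. This is not the code in memory.py: the class `HybridMemory`, its method names and the table layout are hypothetical, and a toy bag-of-words similarity stands in for Granite Embedding 30M and Chroma so that the example runs with the standard library only.

```python
import math
import sqlite3
from collections import Counter, deque


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HybridMemory:
    """Hypothetical three-layer memory: RAM buffer, SQLite summaries, vector recall."""

    def __init__(self, db_path: str = ":memory:", window: int = 4):
        self.buffer = deque(maxlen=window)      # layer 1: only the most recent turns
        self.db = sqlite3.connect(db_path)      # layer 2: compact summaries
        self.db.execute("CREATE TABLE IF NOT EXISTS summaries (session_id TEXT, bullet TEXT)")
        self.vectors = []                       # layer 3: (embedding, text) pairs

    def add_turn(self, role: str, text: str) -> None:
        self.buffer.append((role, text))
        self.vectors.append((embed(text), text))

    def add_summary(self, session_id: str, bullets: list[str]) -> None:
        self.db.executemany("INSERT INTO summaries VALUES (?, ?)", [(session_id, b) for b in bullets])
        self.db.commit()

    def summaries(self, session_id: str) -> list[str]:
        rows = self.db.execute(
            "SELECT bullet FROM summaries WHERE session_id = ? ORDER BY rowid", (session_id,)
        )
        return [row[0] for row in rows]

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda pair: cosine(q, pair[0]), reverse=True)
        return [text for _, text in ranked[:k]]


mem = HybridMemory(window=2)
mem.add_turn("user", "HAMMER stores summaries in SQLite")
mem.add_turn("assistant", "Vector recall uses embeddings")
mem.add_turn("user", "What about the weather today")
mem.add_summary("demo", ["HAMMER uses three memory layers"])
print(mem.recall("SQLite summaries", k=1))
```

The RAM buffer forgets the oldest turn once the window is full, while the summary table and the vector list keep everything. That difference is what lets the agent recall content that has already left the current turn window.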
At a high level:
- Clients
  - CLI / local tools.
  - MCP clients (e.g. Claude, VS Code, watsonx) via MCP Forge.
- HAMMER Core
  - cli.py – terminal entrypoint.
  - api.py – FastAPI layer (/chat, /summarize_session, /memory_query, /debug_state).
  - llm.py – Granite 4.0-H-Micro (3B) wrapper.
  - embedder.py – Granite Embedding 30M.
  - memory.py – RAM buffer + SQLite summaries + vector store.
- Memory backends
  - RAM: in-process conversation buffer.
  - SQLite: compact summaries per session.
  - Vector store: embeddings + similarity search (Chroma).
- MCP Forge Gateway
  - Runs locally and exposes HAMMER as a set of MCP tools.
  - Provides admin UI, metrics and configuration.
Diagram 1: System Context & Integrations
          +---------------------+
          |     Human User      |
          +----------+----------+
                     |
  (A) CLI            |   (B) MCP Client
  (local terminal)   |   (Claude, VS Code, watsonx)
                     |
          +----------v----------+
          |     HAMMER CLI      |
          |      (cli.py)       |
          +----------+----------+
                     |
                     | HTTP (REST: /chat, ...)
                     |
          +----------v----------+       +-------------------------+
          |     HAMMER API      |       |  MCP Client (external)  |
          | (api.py / FastAPI)  |       |  (implements MCP spec)  |
          +----------+----------+       +-----------+-------------+
                     |                              |
             direct REST calls                      |
              from CLI / curl                       |
                     |                              |
                     |                  MCP JSON-RPC (tools/call)
                     |                              |
                     |                   +----------v-----------+
                     |                   |  MCP Context Forge   |
                     |                   |       Gateway        |
                     |                   |     (mcpgateway)     |
                     |                   +----------+-----------+
                     |                              |
                     |                  REST tools bridge (HTTP)
                     +--------------v---------------+
                                    |
                         +----------v----------+
                         |     HAMMER API      |
                         | (api.py / FastAPI)  |
                         +---------------------+
Diagram 2: Core Architecture & Data Flow
                         +----------------------+
HTTP /chat, /summarize → |      API Layer       |
HTTP /memory_query       |  (FastAPI: api.py)   |
HTTP /debug_state        +----------+-----------+
                                    |
                                    | calls into core logic
                                    v
                         +----------------------+
                         |     HAMMER Core      |
                         |  (agent loop, glue)  |
                         +----------+-----------+
                                    |
        +---------------------------+---------------------------+
        |                           |                           |
        v                           v                           v
+---------------+           +---------------+          +----------------+
|  Short-term   |           | SummaryStore  |          |  VectorStore   |
| Memory (RAM)  |           |   (SQLite)    |          | (Chroma, etc.) |
|   ShortTerm   |           |    get/add    |          |   query/add    |
+-------+-------+           +-------+-------+          +--------+-------+
        |                           |                           |
        |                           |                           |
        +---------------------------+---------------------------+
                                    |
                                    v
                             +--------------+
                             |    Prompt    |
                             |   Builder    |
                             +------+-------+
                                    |
                                    v
                          +--------------------+
                          |    Granite LLM     |
                          | (llm.py / generate |
                          | e.g. 4.0-h-micro)  |
                          +---------+----------+
                                    |
                                    v
                             +-------------+
                             |    Reply    |
                             |    Text     |
                             +------+------+
                                    |
                                    |
                            back to API Layer
                                    |
                                    v
                            HTTP response to:
                              - CLI
                              - curl
                              - MCP Gateway (tools)
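The data flow in Diagram 2 (memory layers feed a prompt builder, the prompt goes to the model, the reply is written back to memory) can be sketched as follows. The function names `build_prompt` and `chat_turn`, the prompt layout and the stub model are assumptions made for illustration; the actual agent loop in the repository may be organised differently.

```python
from collections import deque
from typing import Callable


def build_prompt(summaries: list[str], recalled: list[str],
                 recent: list[tuple[str, str]], message: str) -> str:
    """Assemble one prompt from the three memory layers plus the new message."""
    parts = []
    if summaries:
        parts.append("Session summary:\n" + "\n".join(f"- {s}" for s in summaries))
    if recalled:
        parts.append("Relevant past turns:\n" + "\n".join(f"- {r}" for r in recalled))
    if recent:
        parts.append("Recent turns:\n" + "\n".join(f"{role}: {text}" for role, text in recent))
    parts.append(f"user: {message}\nassistant:")
    return "\n\n".join(parts)


def chat_turn(message: str, recent: deque, summaries: list[str],
              recall: Callable[[str], list[str]],
              generate: Callable[[str], str]) -> str:
    """One pass of the loop: read memory, call the model, update memory."""
    prompt = build_prompt(summaries, recall(message), list(recent), message)
    reply = generate(prompt)             # the model only ever sees this prompt
    recent.append(("user", message))     # state is written back by the agent
    recent.append(("assistant", reply))
    return reply


# Usage with a stub in place of the Granite model.
recent = deque(maxlen=6)
prompts = []                             # capture what the stub model receives


def fake_generate(prompt: str) -> str:
    prompts.append(prompt)
    return "ok"


reply = chat_turn("hi", recent, ["HAMMER keeps summaries in SQLite"], lambda q: [], fake_generate)
print(prompts[0])
```

Passing `recall` and `generate` in as callables keeps the loop independent of any particular vector store or model, which matches the "stateless LLM, stateful agent" split described earlier.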
- Python 3.10+
- git
- Optional but recommended:
  - Access to IBM Granite models
  - MCP Forge Gateway (installed via mcpgateway)
git clone https://github.ibm.com/wojciech-lebek/hammer.git
cd hammer
make setup

The mcpgateway/makefile automates typical MCP tasks:
- Creating a virtualenv for the MCP Gateway.
- Installing mcpgateway.
- Starting the Gateway locally.
- Registering HAMMER API endpoints as MCP tools.
- Opening the admin UI in the browser.
Inspect available targets with:
cd mcpgateway
make help # if defined, or open makefile to see targets

This section shows two end-to-end demos:
- Scenario A: Local API + hybrid memory.
- Scenario B: MCP mode via MCP Forge Gateway.
make run-api

Expected output (example):
Uvicorn running on http://127.0.0.1:9001
curl -s "http://127.0.0.1:9001/debug_state?session_id=demo" | jq

You should see an empty buffer, no summaries and no vector hits.
curl -s -X POST "http://127.0.0.1:9001/chat" \
-H "Content-Type: application/json" \
-d '{
"session_id": "demo",
"message": "Explain what HAMMER is in 3 bullet points."
}' | jq

This writes the interaction into RAM and (depending on configuration) the vector store.
curl -s -X POST "http://127.0.0.1:9001/summarize_session" \
-H "Content-Type: application/json" \
  -d '{"session_id":"demo","max_bullets":3}' | jq

A compact SQLite summary is created for session demo.
Subsequent calls can retrieve and reuse this summary.
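The same Scenario A calls can be issued from Python instead of curl. The sketch below uses only the standard library and builds the requests without sending them, so it can be inspected without a running server. The helper names are hypothetical, the port is taken from the example output above, and the assumption that every response body is a JSON object is inferred from the `| jq` pipelines rather than from a documented schema.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9001"  # port from the example Uvicorn output


def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a HAMMER endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def debug_state_request(session_id: str) -> urllib.request.Request:
    """Build the GET request that inspects the memory layers of one session."""
    query = urllib.parse.urlencode({"session_id": session_id})
    return urllib.request.Request(f"{BASE_URL}/debug_state?{query}", method="GET")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body; needs the API to be running."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


chat = post_json("/chat", {
    "session_id": "demo",
    "message": "Explain what HAMMER is in 3 bullet points.",
})
summarize = post_json("/summarize_session", {"session_id": "demo", "max_bullets": 3})
debug = debug_state_request("demo")

# With the server from `make run-api` up, run the demo in order:
#   send(debug); send(chat); send(summarize); send(debug)
```

Calling `send(debug)` before and after the other two requests shows the same before/after picture as the curl walkthrough.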
This scenario uses MCP Forge Gateway so that MCP-compatible clients can call HAMMER as a tool.
make gateway-venv
make gateway-install
make gateway-token
make gateway-register-hammer
make run-gateway

Then open the MCP Gateway UI in your browser:
make gateway-ui-tools

Alternatively, open the MCP Gateway UI metrics in your browser:
make gateway-ui-metrics

You should see HAMMER tools registered in the admin UI (for example tools exposing chat, memory inspection and architecture explanation).
Configure your MCP-aware client (e.g. CLI, editor plugin or another agent) to use:
- MCP Gateway URL: http://127.0.0.1:4444
- The registered HAMMER tools (e.g. hammer_chat, hammer_debug_state, etc.)
Example prompts from the MCP client:
- “Use the HAMMER tool to explain its own architecture.”
- “Use HAMMER to list what we discussed in this session so far.”
- “Call HAMMER’s debug tool and show me the current memory layers.”
Key effects:
- MCP turns HAMMER into a tool-providing service.
- The LLM on the client side chooses which HAMMER tool to call.
- You can use HAMMER in multi-tool and multi-agent workflows.
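When the client-side LLM picks a HAMMER tool, the MCP client sends a JSON-RPC 2.0 `tools/call` message to the Gateway. The sketch below only constructs such a message. The tool names come from the examples above, while the argument names are an assumption that mirrors the REST inputs; the real tool schema is whatever the Gateway registration defines. In practice an MCP client library would build and transport this message for you.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids, unique per connection


def tools_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are assumed to match the REST /chat input.
request = tools_call("hammer_chat", {
    "session_id": "demo",
    "message": "Explain your own architecture.",
})
print(json.dumps(request, indent=2))
```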
HAMMER exposes a small set of HTTP endpoints (FastAPI in api.py):
- POST /chat
  - Input: {"session_id": "...", "message": "..."}
  - Output: model reply + metadata.
  - Behavior: runs the full agent loop (read memory → call Granite → update memory).
- GET /debug_state
  - Query: session_id=...
  - Output: JSON dump of RAM buffer, summaries and vector hits.
  - Useful for introspection and debugging.
- POST /summarize_session
  - Input: {"session_id": "...", "max_bullets": 3}
  - Output: generated summary.
  - Behavior: writes a compact summary into the SQLite store.
- POST /memory_query
  - Input: query text and/or filters.
  - Output: retrieved summaries and vector matches.
See doc/arch_api.md for a detailed diagram and explanation of the call flow, or open API Docs in your browser:
make api-docs

HAMMER/
├── README.md # This file
├── makefile # Root automation (MCP Gateway, tools, helper targets)
├── requirements.txt # Python dependencies
├── src/
│ ├── cli.py # CLI entrypoint
│ ├── api.py # FastAPI server
│ ├── llm.py # Granite LLM wrapper
│ ├── embedder.py # Granite embedding wrapper
│ └── memory.py # Hybrid memory implementation
├── mcpgateway/ # MCP Forge Gateway configuration and helper Makefile
├── doc/ # Architecture docs, demos, slides, Q&A
├── img/ # Logos and diagrams
└── wp/ # Related papers / whitepapers on agent memory
For detailed walkthroughs, see:
- doc/hammer_local_demo_md.md – local API demo.
- doc/hammer_mcp_demo.md – MCP demo.
- doc/arch-overview.md – high-level system overview.
- doc/arch_api.md – API-layer architecture.
Typical scenarios:
- Enterprise PoC for agent memory
  - Show stakeholders how a stateless Granite model can be extended with persistent memory.
  - Compare behavior with and without hybrid memory.
- Tool-integrated agents
  - Use HAMMER as a tool in MCP workflows (e.g. with other MCP servers).
  - Let a higher-level orchestrator decide when to call HAMMER vs. other tools.
- Research and experimentation
  - Evaluate different memory policies (e.g. when to summarize vs. store full turns).
  - Experiment with alternative summary schemas or vector stores.
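As a starting point for the memory-policy experiments mentioned above, one possible policy is to summarize once the buffer exceeds a turn or size budget and to keep only the latest exchange verbatim. Everything in this sketch (the `SummarizePolicy` class, its thresholds and the `apply_policy` helper) is hypothetical and not taken from the HAMMER code base.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SummarizePolicy:
    """Hypothetical policy: summarize when the buffer exceeds a turn or size budget."""
    max_turns: int = 8
    max_chars: int = 2000

    def should_summarize(self, turns: list[str]) -> bool:
        too_many = len(turns) > self.max_turns
        too_long = sum(len(t) for t in turns) > self.max_chars
        return too_many or too_long


def apply_policy(turns: list[str], policy: SummarizePolicy,
                 summarize: Callable[[list[str]], list[str]]) -> tuple[list[str], list[str]]:
    """Return (remaining_turns, new_summary_bullets) after applying the policy."""
    if not policy.should_summarize(turns):
        return turns, []
    keep = turns[-2:]                      # keep the latest exchange verbatim
    return keep, summarize(turns[:-2])     # compress everything older into bullets


# Usage with a stub summarizer in place of a model call.
policy = SummarizePolicy(max_turns=3)
remaining, bullets = apply_policy(
    ["a", "b", "c", "d", "e"], policy, lambda old: [f"{len(old)} earlier turns"]
)
print(remaining, bullets)
```

Swapping in a different `should_summarize` rule, or a different number of turns to keep, is enough to compare policies without touching the rest of the agent loop.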
If you refer to HAMMER in internal documents or presentations, cite it as:
@misc{lebek2025hammer,
author = {Lebek, W.},
title = {HAMMER: Hybrid-memory Agent Model with Modular Execution and Recall},
year = {2025},
publisher = {IBM CIC Schweiz},
howpublished = {\url{https://github.ibm.com/wojciech-lebek/hammer.git}}
}
Licensed under the Apache License 2.0
