# Query Decomposer Agent Tool Test

This notebook demonstrates how to use the `MemoryAgentTool` with our `KeyValueMemory` implementation to build a decomposer agent that reads from and writes to a shared memory store.

The decomposer agent breaks down complex queries into simpler sub-queries that can be more easily processed by downstream agents.

The `MemoryAgentTool` class provides:
1. **Memory integration** for agents by extending `BaseTool`
2. **Pre-processing** via reader callbacks that fetch relevant context from memory before agent execution
3. **Post-processing** via parser callbacks that extract and store information from agent outputs after execution
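
The three points above amount to a read, run, parse cycle around the agent. The sketch below illustrates that cycle with plain async functions. `run_with_memory`, the dict-based store, and the simplified callback signatures are assumptions made for illustration; they are not the actual `MemoryAgentTool` implementation. Inside Jupyter, which already runs an event loop, `await demo()` would replace `asyncio.run(demo())`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical, simplified callback signatures (the notebook's real callbacks
# also receive a cancellation token and a memory object instead of a dict).
Reader = Callable[[Dict[str, str], str], Awaitable[Dict[str, Any]]]
Parser = Callable[[Dict[str, str], str, str], Awaitable[None]]
Agent = Callable[[str, Dict[str, Any]], Awaitable[str]]


async def run_with_memory(
    store: Dict[str, str],
    task: str,
    agent: Agent,
    reader: Optional[Reader] = None,
    parser: Optional[Parser] = None,
) -> str:
    """Illustrative read -> run -> parse cycle around an agent call."""
    # Pre-processing: fetch context from memory before the agent runs.
    context = await reader(store, task) if reader else {}
    output = await agent(task, context)
    # Post-processing: store whatever the parser extracts from the output.
    if parser:
        await parser(store, task, output)
    return output


async def demo() -> Dict[str, str]:
    store: Dict[str, str] = {"schema": "<database_schema/>"}

    async def reader(store: Dict[str, str], task: str) -> Dict[str, Any]:
        return {"schema": store["schema"]}

    async def fake_agent(task: str, context: Dict[str, Any]) -> str:
        # Stand-in for a model call; it only echoes what it received.
        return f"decomposed({task}) using {sorted(context)}"

    async def parser(store: Dict[str, str], task: str, output: str) -> None:
        store["last_output"] = output

    await run_with_memory(store, "top categories", fake_agent, reader, parser)
    return store


demo_store = asyncio.run(demo())
print(demo_store["last_output"])
```

Keeping the reader and parser as separate callables means the same wrapper can serve different agents, each with its own view of what to load from and write back to memory.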

In [1]:
import asyncio
import datetime  # used by decomposer_parser for history timestamps
import json
import logging
import re
from typing import Dict, Any, List, Optional

# Set up logging
logging.basicConfig(level=logging.INFO, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Reduce noise from autogen
logging.getLogger('autogen_core').setLevel(logging.WARNING)

In [2]:
# Import our KeyValueMemory and MemoryAgentTool
from memory import KeyValueMemory
from memory_agent_tool import MemoryAgentTool, MemoryAgentToolArgs

# Import necessary AutoGen components
from autogen_core import CancellationToken
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult
from autogen_ext.models.openai import OpenAIChatCompletionClient

## 1. Set up our memory store and model client

In [3]:
# Initialize the shared memory store
memory = KeyValueMemory(name="text_to_sql_memory")

# Set up the model client - replace with your specific model
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",  # or other appropriate model
    temperature=0.1,
    timeout=120
)
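
The callbacks in this notebook only call `get`, `set`, and `clear` on the memory object, always with `await`. For readers without access to the `memory` module, the following is a minimal stand-in that exposes those three calls. The class name `InMemoryStore` and its internals are hypothetical; the real `KeyValueMemory` may offer more (logging, metadata, persistence) and is not reproduced here.

```python
import asyncio
from typing import Dict, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in with the async get/set/clear calls used in this notebook."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, str] = {}

    async def get(self, key: str) -> Optional[str]:
        # Missing keys return None, which lets callers chain fallbacks with `or`.
        return self._data.get(key)

    async def set(self, key: str, value: str) -> None:
        self._data[key] = value

    async def clear(self) -> None:
        self._data.clear()


async def store_demo() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    store = InMemoryStore(name="text_to_sql_memory")
    missing = await store.get("current_schema")
    await store.set("full_database_schema", "<database_schema/>")
    # Fallback chain: prefer a selected schema, otherwise use the full one.
    schema = await store.get("current_schema") or await store.get("full_database_schema")
    await store.clear()
    after_clear = await store.get("full_database_schema")
    return missing, schema, after_clear


store_result = asyncio.run(store_demo())
print(store_result)
```

Values are kept as strings here because the notebook serializes structured data with `json.dumps` before storing it.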

## 2. Define memory callback functions for our decomposer agent

The agent will have custom reader and parser functions that define how it interacts with memory.

In [4]:
# Decomposer Agent memory callbacks
async def decomposer_reader(memory, task, cancellation_token):
    """Read relevant information for the decomposer agent."""
    print("🔍 READER: Starting decomposer_reader function")
    context = {}
    
    # Get database schema if available
    schema = await memory.get("current_schema") or await memory.get("full_database_schema")
    if schema:
        print(f"🔍 READER: Found schema (length: {len(schema)})")
        context["schema"] = schema
    else:
        print("🔍 READER: No schema found in memory")
    
    # Get query history for context
    query_history_json = await memory.get("query_history")
    if query_history_json:
        try:
            query_history = json.loads(query_history_json)
            if query_history:
                print(f"🔍 READER: Found query history ({len(query_history)} entries)")
                context["query_history"] = query_history
            else:
                print("🔍 READER: Query history is empty")
        except json.JSONDecodeError:
            logging.error("Failed to parse query history JSON")
            print("🔍 READER: Error parsing query history JSON")
    else:
        print("🔍 READER: No query history found in memory")
    
    # Get previous decompositions if available
    decompositions_json = await memory.get("previous_decompositions")
    if decompositions_json:
        try:
            decompositions = json.loads(decompositions_json)
            if decompositions:
                print(f"🔍 READER: Found previous decompositions ({len(decompositions)} entries)")
                context["previous_decompositions"] = decompositions
            else:
                print("🔍 READER: Previous decompositions list is empty")
        except json.JSONDecodeError:
            logging.error("Failed to parse previous decompositions JSON")
            print("🔍 READER: Error parsing previous decompositions JSON")
    else:
        print("🔍 READER: No previous decompositions found in memory")
    
    print(f"🔍 READER: Returning context with keys: {list(context.keys())}")
    return context

async def decomposer_parser(memory, task, result, cancellation_token):
    """Parse and store decomposition results."""
    print("🔍 PARSER: Starting decomposer_parser function")
    
    if not result.messages:
        print("🔍 PARSER: No messages in result")
        return
        
    last_message = result.messages[-1].content
    print(f"🔍 PARSER: Processing message of length: {len(last_message)}")
    print(f"🔍 PARSER: Message preview: {last_message[:100]}...")
    
    # Look for the decomposition XML. re.search scans the whole message, so this
    # also matches XML wrapped in a Markdown code block; a separate code-block
    # fallback could never succeed where this search fails.
    decomposition_match = re.search(r'<decomposition>.*?</decomposition>', last_message, re.DOTALL)
    if decomposition_match:
        print("🔍 PARSER: Found direct XML match")
    else:
        print("🔍 PARSER: No decomposition XML found in message")
    
    if decomposition_match:
        decomposition_str = decomposition_match.group()
        print(f"🔍 PARSER: Extracted decomposition of length: {len(decomposition_str)}")
        await memory.set("current_decomposition", decomposition_str)
        print("🔍 PARSER: Stored decomposition in memory")
        
        # Extract sub-queries for easier access
        subqueries = []
        subquery_matches = re.findall(r'<subquery.*?>(.+?)</subquery>', decomposition_str, re.DOTALL)
        if subquery_matches:
            print(f"🔍 PARSER: Found {len(subquery_matches)} subqueries")
            for i, subquery in enumerate(subquery_matches):
                subqueries.append({
                    "id": f"subquery_{i+1}",
                    "text": subquery.strip()
                })
            await memory.set("subqueries", json.dumps(subqueries))
            print(f"🔍 PARSER: Stored {len(subqueries)} subqueries in memory")
        else:
            print("🔍 PARSER: No subqueries found in decomposition")
    else:
        print("🔍 PARSER: No decomposition found in result")
        
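    # Parse the original query (the task may be a JSON object or a plain string)
    query = task
    try:
        task_obj = json.loads(task)
        if isinstance(task_obj, dict):
            query = str(task_obj.get("query", task))
            print(f"🔍 PARSER: Extracted query from JSON task: {query[:50]}...")
        else:
            # Valid JSON that is not an object (e.g. a number or list): keep the raw task
            print(f"🔍 PARSER: Using raw task as query: {query[:50]}...")
    except json.JSONDecodeError:
        print(f"🔍 PARSER: Using raw task as query: {query[:50]}...")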
        
    # Store decomposition in history
    decompositions = []
    decompositions_json = await memory.get("previous_decompositions")
    if decompositions_json:
        try:
            decompositions = json.loads(decompositions_json)
            print(f"🔍 PARSER: Loaded existing decomposition history with {len(decompositions)} entries")
        except json.JSONDecodeError:
            logging.error("Failed to parse previous decompositions, creating new list")
            print("🔍 PARSER: Error parsing previous decompositions, creating new list")
    else:
        print("🔍 PARSER: No existing decomposition history, creating new list")
    
    # Add current decomposition to history
    decompositions.append({
        "query": query,
        "decomposition": last_message,
        "timestamp": str(datetime.datetime.now())
    })
    print(f"🔍 PARSER: Added current decomposition to history (now {len(decompositions)} entries)")
    
    # Keep only the last 5 decompositions
    if len(decompositions) > 5:
        decompositions = decompositions[-5:]
        print("🔍 PARSER: Trimmed history to last 5 entries")
        
    await memory.set("previous_decompositions", json.dumps(decompositions))
    print("🔍 PARSER: Updated decomposition history in memory")
    
    # Verify memory was updated
    current_decomp = await memory.get("current_decomposition")
    subqueries_json = await memory.get("subqueries")
    decomp_history = await memory.get("previous_decompositions")
    
    if current_decomp:
        print("🔍 PARSER: Verified current_decomposition was saved")
    else:
        print("🔍 PARSER: WARNING: current_decomposition not found after saving!")
        
    if subqueries_json:
        print("🔍 PARSER: Verified subqueries were saved")
    else:
        # Also reached when no subqueries were extracted from this response
        print("🔍 PARSER: WARNING: no subqueries in memory (none extracted or save failed)")
        
    if decomp_history:
        print("🔍 PARSER: Verified decomposition history was saved")
    else:
        print("🔍 PARSER: WARNING: decomposition history not found after saving!")

## 3. Create our Decomposer Agent with System Message

In [5]:
import datetime

# Define system message for the decomposer agent
DECOMPOSER_SYSTEM_MESSAGE = """
You are a query decomposition expert. Your role is to:
1. Analyze complex natural language queries
2. Break them down into simpler sub-queries
3. Return the decomposition in XML format

For each input query, return a decomposition with one or more subqueries. Each subquery should:
- Be simpler than the original query
- Focus on a single aspect of the original query
- Be answerable using the database schema provided
- Include a description of how results will be combined

ALWAYS return your decomposition in this XML format:
<decomposition>
  <original_query>The original query text</original_query>
  <subquery id="1">First simpler subquery</subquery>
  <subquery id="2">Second simpler subquery</subquery>
  <combination_logic>
    Explanation of how to combine the subquery results to answer the original query
  </combination_logic>
</decomposition>
"""

# Create the agent
decomposer_agent = AssistantAgent(
    name="decomposer",
    system_message=DECOMPOSER_SYSTEM_MESSAGE,
    model_client=model_client,
    description="Breaks down complex queries into simpler sub-queries"
)

# Wrap the agent with memory capabilities
decomposer_tool = MemoryAgentTool(
    agent=decomposer_agent,
    memory=memory,
    reader_callback=decomposer_reader,
    parser_callback=decomposer_parser
)

## 4. Set up Sample Database Schema

We'll use the same sample database schema as in the schema selector test.

In [6]:
# Reset memory for a fresh run
await memory.clear()

# Set up a sample database schema
full_schema = """
<database_schema>
  <table name="customers">
    <column name="customer_id" type="INTEGER" primary_key="true" />
    <column name="name" type="TEXT" />
    <column name="email" type="TEXT" />
    <column name="join_date" type="DATE" />
  </table>
  <table name="orders">
    <column name="order_id" type="INTEGER" primary_key="true" />
    <column name="customer_id" type="INTEGER" foreign_key="customers.customer_id" />
    <column name="order_date" type="DATE" />
    <column name="total_amount" type="DECIMAL" />
  </table>
  <table name="products">
    <column name="product_id" type="INTEGER" primary_key="true" />
    <column name="name" type="TEXT" />
    <column name="price" type="DECIMAL" />
    <column name="category" type="TEXT" />
  </table>
  <table name="order_items">
    <column name="item_id" type="INTEGER" primary_key="true" />
    <column name="order_id" type="INTEGER" foreign_key="orders.order_id" />
    <column name="product_id" type="INTEGER" foreign_key="products.product_id" />
    <column name="quantity" type="INTEGER" />
    <column name="price" type="DECIMAL" />
  </table>
  <table name="inventory">
    <column name="inventory_id" type="INTEGER" primary_key="true" />
    <column name="product_id" type="INTEGER" foreign_key="products.product_id" />
    <column name="quantity" type="INTEGER" />
    <column name="warehouse" type="TEXT" />
  </table>
</database_schema>
"""

# Store the schema
await memory.set("full_database_schema", full_schema)
print("Full database schema stored in memory")

# Initialize empty decomposition history
await memory.set("previous_decompositions", json.dumps([]))
print("Decomposition history initialized")

2025-05-21 13:20:53,046 - root - INFO - [KeyValueMemory] Memory cleared.


Full database schema stored in memory
Decomposition history initialized


## 5. Run the Decomposer Agent for a Complex Query

Let's test our decomposer with a complex query that should be broken down into simpler parts.

In [7]:
# Define a complex query that needs decomposition
complex_query = "Find the top 3 product categories by revenue and for each category, show the customer who spent the most on that category"

# Create a task with the schema and query
task1 = json.dumps({
    "query": complex_query,
    "schema": await memory.get("full_database_schema")
})

# Create a cancellation token
cancellation_token = CancellationToken()

# Create the proper arguments object
args = MemoryAgentToolArgs(task=task1)

# Run the agent
result1 = await decomposer_tool.run(
    args=args,
    cancellation_token=cancellation_token
)

# Display result
print(f"\nAgent Response:\n{result1.messages[-1].content}")

🔍 READER: Starting decomposer_reader function
🔍 READER: Found schema (length: 1406)
🔍 READER: No query history found in memory
🔍 READER: Previous decompositions list is empty
🔍 READER: Returning context with keys: ['schema']


2025-05-21 13:20:58,100 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


🔍 PARSER: Starting decomposer_parser function
🔍 PARSER: Processing message of length: 969
🔍 PARSER: Message preview: ```xml
<decomposition>
  <original_query>Find the top 3 product categories by revenue and for each c...
🔍 PARSER: Found direct XML match
🔍 PARSER: Extracted decomposition of length: 958
🔍 PARSER: Stored decomposition in memory
🔍 PARSER: Found 2 subqueries
🔍 PARSER: Stored 2 subqueries in memory
🔍 PARSER: Extracted query from JSON task: Find the top 3 product categories by revenue and f...
🔍 PARSER: Loaded existing decomposition history with 0 entries
🔍 PARSER: Added current decomposition to history (now 1 entries)
🔍 PARSER: Updated decomposition history in memory
🔍 PARSER: Verified current_decomposition was saved
🔍 PARSER: Verified subqueries were saved
🔍 PARSER: Verified decomposition history was saved

Agent Response:
```xml
<decomposition>
  <original_query>Find the top 3 product categories by revenue and for each category, show the customer who spent the most on that

## 6. Run the Decomposer Agent for Another Complex Query

Let's run another query to demonstrate memory continuity.

In [8]:
# Define another complex query
complex_query2 = "Compare the average order amount between customers who joined before 2020 and those who joined after, broken down by product category"

# Create a task with the schema and query
task2 = json.dumps({
    "query": complex_query2,
    "schema": await memory.get("full_database_schema")
})

# Create the proper arguments object
args = MemoryAgentToolArgs(task=task2)

# Run the agent again
result2 = await decomposer_tool.run(
    args=args,
    cancellation_token=cancellation_token
)

# Display result
print(f"\nAgent Response:\n{result2.messages[-1].content}")

🔍 READER: Starting decomposer_reader function
🔍 READER: Found schema (length: 1406)
🔍 READER: No query history found in memory
🔍 READER: Found previous decompositions (1 entries)
🔍 READER: Returning context with keys: ['schema', 'previous_decompositions']


2025-05-21 13:21:01,540 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


🔍 PARSER: Starting decomposer_parser function
🔍 PARSER: Processing message of length: 922
🔍 PARSER: Message preview: ```xml
<decomposition>
  <original_query>Compare the average order amount between customers who join...
🔍 PARSER: Found direct XML match
🔍 PARSER: Extracted decomposition of length: 911
🔍 PARSER: Stored decomposition in memory
🔍 PARSER: Found 2 subqueries
🔍 PARSER: Stored 2 subqueries in memory
🔍 PARSER: Extracted query from JSON task: Compare the average order amount between customers...
🔍 PARSER: Loaded existing decomposition history with 1 entries
🔍 PARSER: Added current decomposition to history (now 2 entries)
🔍 PARSER: Updated decomposition history in memory
🔍 PARSER: Verified current_decomposition was saved
🔍 PARSER: Verified subqueries were saved
🔍 PARSER: Verified decomposition history was saved

Agent Response:
```xml
<decomposition>
  <original_query>Compare the average order amount between customers who joined before 2020 and those who joined after, broken dow
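
The history count in the parser log grows from one entry to two across the runs because each call appends to `previous_decompositions` and then keeps only the most recent five. A self-contained sketch of that append-and-trim step follows. The helper name `append_history` and its parameters are hypothetical, and it works on JSON strings directly rather than on the memory object.

```python
import datetime
import json
from typing import Optional

MAX_HISTORY = 5  # same cap the notebook's parser applies


def append_history(
    history_json: Optional[str],
    query: str,
    decomposition: str,
    max_entries: int = MAX_HISTORY,
) -> str:
    """Append one entry to a JSON-encoded history and keep the newest entries."""
    try:
        history = json.loads(history_json) if history_json else []
    except json.JSONDecodeError:
        # Corrupt history: start over rather than fail the whole run.
        history = []
    history.append({
        "query": query,
        "decomposition": decomposition,
        "timestamp": datetime.datetime.now().isoformat(),
    })
    # Negative slicing keeps the last max_entries items, dropping the oldest.
    return json.dumps(history[-max_entries:])


state: Optional[str] = None
for n in range(7):
    state = append_history(state, f"query {n}", f"<decomposition>{n}</decomposition>")

kept = [entry["query"] for entry in json.loads(state)]
print(kept)
```

After seven appends only the five most recent queries remain, which bounds how much prior context the reader hands to the agent.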

## 7. Check Memory Contents

Let's examine what's stored in memory after our agent runs.

In [9]:
# Check what's stored in memory
current_decomposition = await memory.get("current_decomposition")
print(f"Current decomposition:\n{current_decomposition}\n")

# Get subqueries
subqueries_json = await memory.get("subqueries")
if subqueries_json:
    subqueries = json.loads(subqueries_json)
    print("Extracted Subqueries:")
    for sq in subqueries:
        print(f"ID: {sq['id']}")
        print(f"Text: {sq['text']}\n")
else:
    print("No subqueries found in memory")

# Get decomposition history
decompositions_json = await memory.get("previous_decompositions")
if decompositions_json:
    decompositions = json.loads(decompositions_json)
    print(f"\nDecomposition History ({len(decompositions)} entries):")
    for i, entry in enumerate(decompositions):
        print(f"\nEntry {i+1}:")
        print(f"Query: {entry['query']}")
        print(f"Timestamp: {entry['timestamp']}")
else:
    print("No decomposition history found in memory")

Current decomposition:
<decomposition>
  <original_query>Compare the average order amount between customers who joined before 2020 and those who joined after, broken down by product category</original_query>
  <subquery id="1">Identify customers who joined before 2020 and calculate the average order amount for each product category for these customers.</subquery>
  <subquery id="2">Identify customers who joined after 2020 and calculate the average order amount for each product category for these customers.</subquery>
  <combination_logic>
    Execute subquery 1 to get the average order amounts for customers who joined before 2020, broken down by product category. Execute subquery 2 to get the same information for customers who joined after 2020. Compare the results from both subqueries to analyze differences in average order amounts between the two groups for each product category.
  </combination_logic>
</decomposition>

Extracted Subqueries:
ID: subquery_1
Text: Identify customers wh

## 8. Test Using Subqueries

Let's demonstrate how we might use the decomposed subqueries in a workflow.

In [10]:
# Get the subqueries
subqueries_json = await memory.get("subqueries")
if subqueries_json:
    subqueries = json.loads(subqueries_json)
    print("Processing subqueries sequentially...\n")
    
    for sq in subqueries:
        print(f"Processing {sq['id']}: {sq['text']}")
        # In a real workflow, we would send each subquery to a SQL generator agent
        # For demonstration, we'll just print what we would do
        print("  → This would be sent to a SQL generator agent")
        print("  → The SQL would be executed against the database")
        print("  → Results would be stored in memory for combination\n")
    
    # Get the combination logic
    current_decomposition = await memory.get("current_decomposition")
    if current_decomposition:
        combination_match = re.search(r'<combination_logic>(.+?)</combination_logic>', current_decomposition, re.DOTALL)
        if combination_match:
            combination_logic = combination_match.group(1).strip()
            print(f"Combination Logic:\n{combination_logic}\n")
            print("This logic would be used to combine the results of the subqueries.")
else:
    print("No subqueries available to process")

Processing subqueries sequentially...

Processing subquery_1: Identify customers who joined before 2020 and calculate the average order amount for each product category for these customers.
  → This would be sent to a SQL generator agent
  → The SQL would be executed against the database
  → Results would be stored in memory for combination

Processing subquery_2: Identify customers who joined after 2020 and calculate the average order amount for each product category for these customers.
  → This would be sent to a SQL generator agent
  → The SQL would be executed against the database
  → Results would be stored in memory for combination

Combination Logic:
Execute subquery 1 to get the average order amounts for customers who joined before 2020, broken down by product category. Execute subquery 2 to get the same information for customers who joined after 2020. Compare the results from both subqueries to analyze differences in average order amounts between the two groups for each produ
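
The loop above only prints what a downstream step would do. As a rough sketch of the next stage, the snippet below sends each stored subquery through a placeholder generator and collects the results by subquery id. `generate_sql_stub` and `process_subqueries` are hypothetical names; the stub returns a comment string and does not produce or execute real SQL.

```python
import asyncio
import json
from typing import Dict, List


async def generate_sql_stub(subquery_text: str) -> str:
    """Hypothetical placeholder for a SQL generator agent."""
    return f"-- SQL for: {subquery_text}"


async def process_subqueries(subqueries_json: str) -> Dict[str, str]:
    """Run each subquery through the generator in order, keyed by subquery id."""
    subqueries: List[Dict[str, str]] = json.loads(subqueries_json)
    results: Dict[str, str] = {}
    for sq in subqueries:
        results[sq["id"]] = await generate_sql_stub(sq["text"])
    return results


# Same shape the parser stores under the "subqueries" key.
stored = json.dumps([
    {"id": "subquery_1", "text": "Average order amount for customers who joined before 2020"},
    {"id": "subquery_2", "text": "Average order amount for customers who joined after 2020"},
])

sql_by_id = asyncio.run(process_subqueries(stored))
for sq_id, sql in sql_by_id.items():
    print(sq_id, sql)
```

Keying the results by id gives a later combination step a stable way to refer back to each subquery when applying the stored combination logic.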

## 9. Conclusion

This notebook demonstrates how the `MemoryAgentTool` allows a decomposer agent to break down complex queries into simpler subqueries while maintaining context through memory. The key features demonstrated include:

1. Reading database schema and past decompositions from memory before agent execution
2. Parsing and storing decomposition results after agent execution
3. Extracting and storing subqueries for easy access by downstream components
4. Maintaining a history of decompositions for context

This pattern can be extended to multi-agent workflows where the decomposer agent provides structure for other agents to follow, enabling more complex reasoning chains in text-to-SQL applications.