# 05. Agent Integration with TensorZero

This notebook demonstrates how to build agents using TensorZero's native tool calling system:
- Understanding TensorZero's tool configuration approach
- Building a simple conversational agent with tools
- Multi-turn conversations with tool usage
- Agent observability and feedback collection
- Performance analysis and benchmarking

In [3]:
# Import required libraries
from tensorzero import TensorZeroGateway, ToolCall
from typing import List, Dict, Any, Optional
import json

print("🔧 TensorZero Agent Development Environment")
print("=" * 50)

# Initialize TensorZero client
client = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")
print("✅ Connected to TensorZero Gateway")

# Test basic connection
try:
    response = client.inference(
        function_name="chat",
        variant_name="gpt4_mini",
        input={"messages": [{"role": "user", "content": "Hello!"}]}
    )
    print(f"✅ Basic inference test: {response.inference_id}...")
except Exception as e:
    print(f"❌ Connection test failed: {e}")
    print("💡 Make sure TensorZero services are running with 'docker compose up'")

🔧 TensorZero Agent Development Environment
✅ Connected to TensorZero Gateway
✅ Basic inference test: 0198f364-81c3-75a0-acaf-190d3a1b534d...


## 1. Understanding TensorZero's Tool System

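TensorZero handles tools differently from the OpenAI API: tools are declared in `tensorzero.toml` and attached to a function, so a request only names that function, and any tool calls come back as `ToolCall` content blocks. Let's see how that works in practice.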

In [4]:
# Test the agent_chat function with tools
print("🔧 Testing TensorZero Tool System")
print("=" * 40)

# Test a simple query that should trigger calculator tool
try:
    response = client.inference(
        function_name="agent_chat",
        variant_name="gpt4_mini",
        input={"messages": [{"role": "user", "content": "What is 25 + 17?"}]}
    )
    
    print(f"📝 Inference ID: {response.inference_id}")
    print(f"🏷️  Variant: {response.variant_name}")
    
    # Process response content
    assistant_text = ""
    tool_calls = []
    
    if hasattr(response, 'content') and response.content:
        for content_block in response.content:
            if hasattr(content_block, 'text') and content_block.text:
                assistant_text += content_block.text
                print(f"💬 Assistant: {content_block.text}")
            elif isinstance(content_block, ToolCall):
                tool_calls.append({
                    "name": content_block.name,
                    "args": content_block.arguments if hasattr(content_block, 'arguments') else {},
                    "id": content_block.id if hasattr(content_block, 'id') else f"call_{len(tool_calls)}"
                })
    
    if tool_calls:
        print(f"🔧 Tool Calls Detected: {len(tool_calls)}")
        for tool_call in tool_calls:
            print(f"   📌 {tool_call['name']}: {tool_call['args']}")
    else:
        print("ℹ️  No tool calls detected")
        
except Exception as e:
    print(f"❌ Tool test failed: {e}")
    print("\n💡 Key Difference: TensorZero configures tools in tensorzero.toml, not in requests!")
    print("   - Tools are defined in the configuration file")
    print("   - Requests just specify the function with tools configured")
    print("   - Tool calls come back as ToolCall objects, not OpenAI format")

🔧 Testing TensorZero Tool System
📝 Inference ID: 0198f364-8472-7033-99da-f003a5e85233
🏷️  Variant: gpt4_mini
🔧 Tool Calls Detected: 1
   📌 calculator: {'expression': '25 + 17'}


## 2. Building a Simple Agent

Now let's create a conversational agent that can use tools effectively.

In [5]:
class TensorZeroAgent:
    """A simple conversational agent using TensorZero's tool system."""
    
    def __init__(self, client: TensorZeroGateway):
        self.client = client
        self.conversation_history = []
        print("🤖 TensorZero Agent initialized")
    
    def chat(self, user_message: str, show_details: bool = True) -> Dict[str, Any]:
        """Send a message and get response with tool handling."""
        
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        if show_details:
            print(f"👤 User: {user_message}")
        
        try:
            # Make inference with TensorZero
            response = self.client.inference(
                function_name="agent_chat",
                variant_name="gpt4_mini",
                input={"messages": self.conversation_history}
            )
            
            # Process the response
            assistant_content = ""
            tool_calls = []
            
            if hasattr(response, 'content') and response.content:
                for content_block in response.content:
                    if hasattr(content_block, 'text') and content_block.text:
                        assistant_content += content_block.text
                    elif isinstance(content_block, ToolCall):
                        tool_calls.append({
                            "name": content_block.name,
                            "args": content_block.arguments if hasattr(content_block, 'arguments') else {},
                            "id": content_block.id if hasattr(content_block, 'id') else f"call_{len(tool_calls)}"
                        })
            
            # Add assistant message to history
            assistant_message = {
                "role": "assistant",
                "content": assistant_content if assistant_content else "I need to use a tool to help you."
            }
            
            if tool_calls:
                assistant_message["tool_calls"] = tool_calls
            
            self.conversation_history.append(assistant_message)
            
            if show_details:
                if assistant_content:
                    print(f"🤖 Agent: {assistant_content}")
                if tool_calls:
                    print(f"🔧 Used tools: {len(tool_calls)}")
                    for tool_call in tool_calls:
                        print(f"   📌 {tool_call['name']}: {tool_call['args']}")
                print(f"📝 Inference: {response.inference_id}...")
            
            return {
                "response": assistant_content,
                "tool_calls": tool_calls,
                "inference_id": response.inference_id,
                "variant": response.variant_name
            }
            
        except Exception as e:
            error_msg = f"Error: {str(e)}"
            print(f"❌ Agent error: {e}")
            return {
                "response": error_msg,
                "tool_calls": [],
                "inference_id": None,
                "variant": None
            }
    
    def show_history(self):
        """Display the conversation history."""
        print("\n📜 Conversation History:")
        print("=" * 40)
        for i, msg in enumerate(self.conversation_history, 1):
            role = msg['role']
            content = msg['content']
            print(f"{i}. {role.title()}: {content}")
            if 'tool_calls' in msg:
                for tool_call in msg['tool_calls']:
                    print(f"   🔧 Tool: {tool_call['name']}({tool_call['args']})")
        print("=" * 40)

# Create our agent
agent = TensorZeroAgent(client)
print("\n✅ Agent ready! Available tools: calculator, get_weather, search_tensorzero_docs")

🤖 TensorZero Agent initialized

✅ Agent ready! Available tools: calculator, get_weather, search_tensorzero_docs


## 3. Testing the Agent

Let's test our agent with various tasks that require tool usage.

In [6]:
# Test 1: Math calculation
print("🧮 Test 1: Math Calculation")
result1 = agent.chat("What is 234 multiplied by 567?")

# Test 2: Weather query
print("\n🌤️  Test 2: Weather Query")
result2 = agent.chat("What's the weather like in Tokyo?")

# Test 3: Documentation search
print("\n📚 Test 3: Documentation Search")
result3 = agent.chat("How does feedback work in TensorZero?")

# Test 4: Multi-step task
print("\n🔄 Test 4: Multi-step Task")
result4 = agent.chat("Calculate 150 USD in GBP at 0.79 exchange rate, and tell me the weather in London.")

print("\n✅ All tests completed!")
print(f"📊 Total inferences: {len([r for r in [result1, result2, result3, result4] if r['inference_id']])}")

🧮 Test 1: Math Calculation
👤 User: What is 234 multiplied by 567?
🔧 Used tools: 1
   📌 calculator: {'expression': '234 * 567'}
📝 Inference: 0198f364-878b-7f21-b8f4-73741beabffc...

🌤️  Test 2: Weather Query
👤 User: What's the weather like in Tokyo?
❌ Agent error: Failed to deserialize JSON to tensorzero::client_input::ClientInput: messages[1].tool_calls: unknown field `tool_calls`, expected `role` or `content` at line 1 column 159

📚 Test 3: Documentation Search
👤 User: How does feedback work in TensorZero?
❌ Agent error: Failed to deserialize JSON to tensorzero::client_input::ClientInput: messages[1].tool_calls: unknown field `tool_calls`, expected `role` or `content` at line 1 column 159

🔄 Test 4: Multi-step Task
👤 User: Calculate 150 USD in GBP at 0.79 exchange rate, and tell me the weather in London.
❌ Agent error: Failed to deserialize JSON to tensorzero::client_input::ClientInput: messages[1].tool_calls: unknown field `tool_calls`, expect
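
The deserialization error says that a history message may carry only `role` and `content`, so the `tool_calls` key that `TensorZeroAgent.chat` attaches to the assistant message is what the gateway rejects from the second turn onward. One possible fix, sketched here under assumptions, is to keep tool calls inside `content` as typed blocks and to return each tool's output as a `tool_result` block in a user message. The block shapes (`text`, `tool_call`, `tool_result`) follow TensorZero's documented input format to the best of our understanding and may differ by gateway version; the helper names `assistant_message` and `tool_result_message` are hypothetical.

```python
from typing import Any, Dict, List


def assistant_message(text: str, tool_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an assistant history entry that uses only `role` and `content`."""
    blocks: List[Dict[str, Any]] = []
    if text:
        blocks.append({"type": "text", "text": text})
    for call in tool_calls:
        # `call` uses the dict shape produced by the agent: name, args, id.
        blocks.append({
            "type": "tool_call",
            "id": call["id"],
            "name": call["name"],
            "arguments": call["args"],
        })
    return {"role": "assistant", "content": blocks}


def tool_result_message(call: Dict[str, Any], result: str) -> Dict[str, Any]:
    """Wrap a tool's output so the model can see it on the next turn."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "id": call["id"],
            "name": call["name"],
            "result": result,
        }],
    }


# Example with the calculator call from Test 1.
call = {"id": "call_0", "name": "calculator", "args": {"expression": "234 * 567"}}
history = [
    {"role": "user", "content": "What is 234 multiplied by 567?"},
    assistant_message("", [call]),
    tool_result_message(call, "132678"),
]
for message in history:
    print(sorted(message))
```

With this shape, the placeholder text "I need to use a tool to help you." is no longer needed, because an assistant turn that contains only a tool call is represented by the `tool_call` block itself.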

## 4. Agent Observability

Let's explore how TensorZero tracks and observes agent interactions.

In [7]:
# Check agent conversation history
print("📜 Agent Conversation History")
agent.show_history()

# Collect some feedback
print("\n📊 Collecting Agent Feedback")
print("-" * 30)

# Simulate user feedback
feedback_data = [
    {"inference_id": result1.get("inference_id"), "rating": 0.95, "helpful": True, "comment": "Perfect calculation!"},
    {"inference_id": result2.get("inference_id"), "rating": 0.8, "helpful": True, "comment": "Weather info was useful"},
    {"inference_id": result3.get("inference_id"), "rating": 0.9, "helpful": True, "comment": "Good documentation search"},
    {"inference_id": result4.get("inference_id"), "rating": 0.85, "helpful": True, "comment": "Handled multiple tools well"}
]

for feedback in feedback_data:
    if feedback["inference_id"]:
        try:
            # Submit feedback metrics
            client.feedback(
                metric_name="user_rating",
                inference_id=feedback["inference_id"],
                value=feedback["rating"]
            )
            client.feedback(
                metric_name="helpful",
                inference_id=feedback["inference_id"],
                value=feedback["helpful"]
            )
            print(f"✅ Feedback submitted: {feedback['rating']}/1.0 - {feedback['comment']}")
        except Exception as e:
            print(f"⚠️  Feedback submission failed: {e}")

print("\n🌐 View agent interactions in TensorZero UI: http://localhost:4000")
print("📈 Check metrics and performance data in the dashboard")

📜 Agent Conversation History

📜 Conversation History:
1. User: What is 234 multiplied by 567?
2. Assistant: I need to use a tool to help you.
   🔧 Tool: calculator({'expression': '234 * 567'})
3. User: What's the weather like in Tokyo?
4. User: How does feedback work in TensorZero?
5. User: Calculate 150 USD in GBP at 0.79 exchange rate, and tell me the weather in London.

📊 Collecting Agent Feedback
------------------------------
✅ Feedback submitted: 0.95/1.0 - Perfect calculation!

🌐 View agent interactions in TensorZero UI: http://localhost:4000
📈 Check metrics and performance data in the dashboard
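
Only one feedback line appears in this output because `result2` through `result4` came back with `inference_id` set to `None`, so the loop skipped them without saying so. The sketch below separates planning from sending so that skipped entries stay visible. `plan_feedback` is a hypothetical helper; the metric names `user_rating` and `helpful` come from the cell above and are assumed to be declared in `tensorzero.toml`.

```python
from typing import Any, Dict, List, Tuple


def plan_feedback(entries: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (keyword arguments for each feedback call, comments of skipped entries)."""
    calls: List[Dict[str, Any]] = []
    skipped: List[str] = []
    for entry in entries:
        inference_id = entry.get("inference_id")
        if inference_id is None:
            # No inference was recorded, so there is nothing to attach feedback to.
            skipped.append(entry.get("comment", ""))
            continue
        calls.append({"metric_name": "user_rating",
                      "inference_id": inference_id,
                      "value": float(entry["rating"])})
        calls.append({"metric_name": "helpful",
                      "inference_id": inference_id,
                      "value": bool(entry["helpful"])})
    return calls, skipped


sample = [
    {"inference_id": "0198f364-878b-7f21-b8f4-73741beabffc", "rating": 0.95,
     "helpful": True, "comment": "Perfect calculation!"},
    {"inference_id": None, "rating": 0.8,
     "helpful": True, "comment": "Weather info was useful"},
]
calls, skipped = plan_feedback(sample)
print(f"{len(calls)} feedback calls planned, {len(skipped)} entries skipped")
# With a live gateway, each planned call could be sent as: client.feedback(**kwargs)
```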


## 5. Agent Performance Analysis

Let's analyze how our agent performs across different types of queries.

In [8]:
import time
import pandas as pd
from typing import List, Dict

def benchmark_agent(test_cases: List[Dict], agent_instance) -> pd.DataFrame:
    """Benchmark the agent across different types of queries."""
    results = []
    
    print(f"🏃 Running Agent Benchmark ({len(test_cases)} test cases)")
    print("=" * 50)
    
    for i, test_case in enumerate(test_cases, 1):
        query = test_case["query"]
        category = test_case["category"]
        expected_tools = test_case.get("expected_tools", [])
        
        print(f"\n🧪 Test {i}/{len(test_cases)}: {category}")
        print(f"Query: {query}")
        
        start_time = time.time()
        
        try:
            # Run agent
            result = agent_instance.chat(query, show_details=False)
            
            end_time = time.time()
            duration = round(end_time - start_time, 2)
            
            # Analyze result
            success = result["inference_id"] is not None
            tool_count = len(result["tool_calls"])
            response_length = len(result["response"])
            
            results.append({
                "test_case": i,
                "category": category,
                "query": query[:50] + "..." if len(query) > 50 else query,
                "success": success,
                "duration_seconds": duration,
                "tool_calls": tool_count,
                "response_length": response_length,
                "inference_id": result["inference_id"][:10] + "..." if result["inference_id"] else None
            })
            
            print(f"✅ Success: {duration}s, {tool_count} tools, {response_length} chars")
            
        except Exception as e:
            end_time = time.time()
            duration = round(end_time - start_time, 2)
            
            results.append({
                "test_case": i,
                "category": category,
                "query": query[:50] + "..." if len(query) > 50 else query,
                "success": False,
                "duration_seconds": duration,
                "tool_calls": 0,
                "response_length": 0,
                "inference_id": None,
                "error": str(e)[:50]
            })
            
            print(f"❌ Failed: {duration}s, Error: {str(e)[:50]}...")
    
    return pd.DataFrame(results)

# Define test cases
test_cases = [
    {
        "category": "Math",
        "query": "Calculate the compound interest on $1000 at 5% annual rate for 3 years",
        "expected_tools": ["calculator"]
    },
    {
        "category": "Weather", 
        "query": "What's the weather forecast for Paris?",
        "expected_tools": ["get_weather"]
    },
    {
        "category": "Documentation",
        "query": "How do I set up variants in TensorZero?",
        "expected_tools": ["search_tensorzero_docs"]
    },
    {
        "category": "Multi-tool",
        "query": "If it's sunny in Miami, calculate how many hours of sunlight that would be if we get 65% of maximum possible (12 hours)",
        "expected_tools": ["get_weather", "calculator"]
    },
    {
        "category": "Conversational",
        "query": "Tell me about TensorZero and why it's useful for LLM applications",
        "expected_tools": []
    }
]

# Create fresh agent for benchmarking
benchmark_agent_instance = TensorZeroAgent(client)
benchmark_results = benchmark_agent(test_cases, benchmark_agent_instance)

# Display results
print("\n📊 Benchmark Results Summary")
print("=" * 40)
summary = benchmark_results.groupby('category').agg({
    'success': ['count', 'sum'],
    'duration_seconds': ['mean', 'std'],
    'tool_calls': 'mean',
    'response_length': 'mean'
}).round(2)
print(summary)

🤖 TensorZero Agent initialized
🏃 Running Agent Benchmark (5 test cases)

🧪 Test 1/5: Math
Query: Calculate the compound interest on $1000 at 5% annual rate for 3 years
❌ Failed: 1.5s, Error: 'UUID' object is not subscriptable...

🧪 Test 2/5: Weather
Query: What's the weather forecast for Paris?
❌ Agent error: Failed to deserialize JSON to tensorzero::client_input::ClientInput: messages[1].tool_calls: unknown field `tool_calls`, expected `role` or `content` at line 1 column 199
✅ Success: 0.13s, 0 tools, 177 chars

🧪 Test 3/5: Documentation
Query: How do I set up variants in TensorZero?
❌ Agent error: Failed to deserialize JSON to tensorzero::client_input::ClientInput: messages[1].tool_calls: unknown field `tool_calls`, expected `role` or `content` at line 1 column 199
✅ Success: 0.13s, 0 tools, 177 chars

🧪 Test 4/5: Multi-tool
Query: If it's sunny in Miami, calculate how many hours of sunlight that would be if we get 65% of maximum possible (12 hours)
❌ A
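
Two separate problems show up in this benchmark output. In Test 1 the inference itself went through, but `result["inference_id"][:10]` failed because the client returns a `UUID` object, which cannot be sliced. In Tests 2 and 3 the agent caught its own error and returned a result with `inference_id=None`, yet the benchmark still printed "Success" because that line does not check the `success` flag. The sketch below shows one way to handle both cases; `short_id` and `summarize_result` are hypothetical helpers, not part of the notebook or of TensorZero.

```python
import uuid
from typing import Any, Dict, Optional


def short_id(inference_id: Optional[Any], length: int = 10) -> Optional[str]:
    """Shorten an inference ID for display; accepts UUID objects or strings."""
    if inference_id is None:
        return None
    return str(inference_id)[:length] + "..."


def summarize_result(result: Dict[str, Any], duration: float) -> str:
    """Report success only when the agent actually recorded an inference."""
    response = str(result.get("response", ""))
    if result.get("inference_id") is None:
        return f"Failed: {duration}s, {response[:50]}"
    tool_count = len(result.get("tool_calls", []))
    return f"Success: {duration}s, {tool_count} tools, {len(response)} chars"


ok = {
    "response": "",
    "tool_calls": [{"name": "calculator"}],
    "inference_id": uuid.UUID("0198f364-878b-7f21-b8f4-73741beabffc"),
}
failed = {
    "response": "Error: unknown field `tool_calls`",
    "tool_calls": [],
    "inference_id": None,
}

print(short_id(ok["inference_id"]))
print(summarize_result(ok, 1.5))
print(summarize_result(failed, 0.13))
```

Converting the ID with `str()` before slicing works for both `UUID` objects and plain strings, so the helper does not depend on which one the client returns.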

## 6. Key Insights and Best Practices

What we've learned about building agents with TensorZero.

### 🎯 TensorZero Agent Best Practices

**1. Tool Configuration Approach:**
- ✅ Configure tools in `tensorzero.toml` (TensorZero's way)
- ❌ Don't try to send tools in request payloads (OpenAI's way)

**2. Response Processing:**
- ✅ Handle `ToolCall` objects from response content
- ✅ Process both text and tool call content blocks
- ✅ Use inference IDs for tracking and feedback

**3. Agent Architecture:**
- ✅ Keep conversation history for context
- ✅ Handle tool call results properly
- ✅ Implement proper error handling

**4. Observability:**
- ✅ Use TensorZero's built-in inference tracking
- ✅ Submit feedback metrics for performance analysis
- ✅ Monitor agent behavior in TensorZero UI

### 🔧 Common Issues & Solutions

- **Error: `"tools" unknown field`** → Configure tools in `tensorzero.toml`, not in requests.
- **Error: `"tool_calls" unknown field`** → History messages may contain only `role` and `content`. Don't attach an OpenAI-style `tool_calls` key; read tool calls from TensorZero's `ToolCall` content blocks instead.
- **No tool calls detected** → Check that the tools are configured and that the function has access to them.

### 🚀 Production Considerations

1. **Error Handling**: Robust fallback mechanisms for tool failures
2. **Rate Limiting**: Manage API costs and quotas through TensorZero
3. **Security**: Validate tool inputs and sanitize outputs
4. **Monitoring**: Set up alerts for agent performance degradation
5. **A/B Testing**: Use TensorZero variants to test different agent behaviors
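
The notebook never executes the `calculator` tool on the client side, so as an illustration of "validate tool inputs", here is a minimal sketch of an arithmetic evaluator that avoids `eval`. It parses the expression with Python's `ast` module and accepts only numeric literals and a small set of operators. `safe_calculate` is a hypothetical helper, the operator whitelist and length limit are assumptions, and a production tool would likely need more than this.

```python
import ast
import operator

# Whitelisted operators; exponentiation is left out on purpose to avoid huge results.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str, max_length: int = 200):
    """Evaluate a plain arithmetic expression, rejecting anything else."""
    if len(expression) > max_length:
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc

    def evaluate(node: ast.AST):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Names, calls, attributes, strings, and so on all end up here.
        raise ValueError("unsupported expression")

    try:
        return evaluate(tree.body)
    except ZeroDivisionError as exc:
        raise ValueError("division by zero") from exc


print(safe_calculate("234 * 567"))
print(safe_calculate("25 + 17"))
```

Anything outside the whitelist, such as a function call or an attribute access, raises `ValueError`, which the agent can turn into an error message for the model instead of executing untrusted input.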

### 📊 Performance Benefits

- **Unified API**: Single interface to multiple LLM providers
- **Built-in Observability**: Automatic metrics collection and tracing
- **Cost Optimization**: Automatic provider routing and caching
- **Experimentation**: Easy A/B testing of different models/prompts
- **Production Ready**: <1ms latency overhead, enterprise features

---

**🌐 TensorZero UI**: http://localhost:4000 - View all agent interactions, metrics, and performance data

**📚 Next Steps**: Explore multi-agent systems, custom tool development, and advanced observability features.