# LangGraph Calculator Tool Integration Example

This notebook demonstrates a ReAct agent with calculator tools using LangGraph.
It showcases:
- LangGraph ReAct agent with tool integration
- Custom calculator tools using @tool decorator
- Automatic tool selection and execution
- Detailed execution flow visualization
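
As a rough illustration of the second bullet, the four tools this notebook reports (`add`, `subtract`, `multiply`, `divide`) could look like the plain functions below. This is a sketch, not the contents of `helpers/calculator_tools.py`: the float return type is inferred from the logged results (for example `382.0`), while the zero-division handling and the `CALCULATOR_FUNCTIONS` registry are assumptions. In a LangChain setup each function would typically also be wrapped with the `@tool` decorator; that step is omitted here so the snippet runs with the standard library only.

```python
from typing import Callable, Dict


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return float(a + b)


def subtract(a: float, b: float) -> float:
    """Subtract b from a."""
    return float(a - b)


def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return float(a * b)


def divide(a: float, b: float) -> float:
    """Divide a by b. Raising on b == 0 is an assumption of this sketch."""
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return float(a / b)


# Hypothetical registry keyed by the tool names the notebook prints.
CALCULATOR_FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply,
    "divide": divide,
}

print(CALCULATOR_FUNCTIONS["add"](145, 237))
```

The docstrings matter more than they would in ordinary code: when a function is exposed as a tool, its description is usually what the model sees when deciding which tool to call.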

## Setup and Imports

In [9]:
import os
import sys
sys.path.append('./helpers')


# Force-reload the helper modules so edits made since the last import take effect
import importlib
import helpers.agent_utils
import helpers.llm_config
import helpers.calculator_tools

importlib.reload(helpers.agent_utils)
importlib.reload(helpers.llm_config)
importlib.reload(helpers.calculator_tools)

from helpers.agent_utils import run_calculation, create_calculator_agent
from helpers.llm_config import is_openrouter_configured
from helpers.calculator_tools import get_calculator_tools

# Select the model
# 🎯 Available models (set MODEL_NAME environment variable):
#    • openai/gpt-4.1-nano (default)     - Latest nano model
#    • openai/gpt-4o-mini                - Fast, reliable, cost-effective
#    • openai/gpt-4o                     - More capable, higher cost
#    • anthropic/claude-3-haiku          - Fast Anthropic model
#    • anthropic/claude-3-5-sonnet       - More capable Anthropic model, higher cost
#    • nvidia/nemotron-nano-9b-v2        - Nvidia's fast model
#    • deepseek/deepseek-chat            - DeepSeek's general-purpose chat model
#    • meta-llama/llama-3.1-8b-instruct  - Open source option
current_model = os.getenv("MODEL_NAME", "openai/gpt-4.1-nano")

# Environment Configuration
print("🔧 Environment Configuration")
print("=" * 40)
if is_openrouter_configured():
    print("✅ OpenRouter API key found")
    print(f"📍 Current model: {current_model}")
else:
    print("⚠️  Warning: No OpenRouter API key found and Ollama may not be running.")
print("=" * 40)


# Create the calculator agent
agent = create_calculator_agent()
print("🤖 Calculator agent created successfully!")

# Show available tools
tools = get_calculator_tools()
print(f"\n📋 Available tools: {[tool.name for tool in tools]}")

🔧 Environment Configuration
✅ OpenRouter API key found
📍 Current model: openai/gpt-4.1-nano
🤖 Calculator agent created successfully!

📋 Available tools: ['add', 'subtract', 'multiply', 'divide']
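
The configuration check above depends on `helpers/llm_config.py`, which is not shown in this notebook. A minimal sketch of what such a check might do follows. The environment variable name `OPENROUTER_API_KEY` and both function names are assumptions made for illustration; only `MODEL_NAME` and its default come from the setup cell.

```python
import os
from typing import Mapping, Optional


def openrouter_key_present(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty API key is set.

    OPENROUTER_API_KEY is an assumed variable name; the real helper may differ.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENROUTER_API_KEY", "").strip())


def resolve_model(
    env: Optional[Mapping[str, str]] = None,
    default: str = "openai/gpt-4.1-nano",
) -> str:
    """Pick the model from MODEL_NAME, falling back to the default.

    Unlike os.getenv("MODEL_NAME", default), an empty MODEL_NAME also
    falls back to the default here.
    """
    env = os.environ if env is None else env
    return env.get("MODEL_NAME") or default


print(resolve_model({"MODEL_NAME": "openai/gpt-4o-mini"}))
```

Passing the environment in as a mapping keeps the check easy to test without touching the real process environment.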


## Basic Calculator Tests

Let's test basic arithmetic operations:

In [10]:
# Test basic addition
run_calculation(agent, "Calculate 145 + 237")

📐 Extracted expression: 145 + 237
🧮 Python eval result: 382.0


🧮 QUERY: Calculate 145 + 237
🛠️  add({'a': 145, 'b': 237}) -> 382.0
🎯 result = 382.0

🤖 FINAL RESPONSE: The result of 145 + 237 is 382.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.40 seconds


{'agent': {'messages': [AIMessage(content='The result of 145 + 237 is 382.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 331, 'total_tokens': 344, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_c4c155951e', 'id': 'gen-1757389671-ZLgEdWwjfijgrIkEi7ZS', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--38deedee-0638-4d1a-9a7f-620df4e54e56-0', usage_metadata={'input_tokens': 331, 'output_tokens': 13, 'total_tokens': 344, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}
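
The `📐 Extracted expression` and `🧮 Python eval result` lines in the output above suggest that `run_calculation` computes a reference value in plain Python before comparing it with the agent's answer. How the helper does this is not shown. The sketch below is one way to compute such a reference without calling `eval()`, by walking the parsed syntax tree and allowing only the four arithmetic operators. The function name `safe_eval` is hypothetical.

```python
import ast
import operator

# Only the four operations the calculator tools cover.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""

    def _eval(node: ast.AST) -> float:
        # Numeric literals (bool is excluded even though it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        # Binary operations from the whitelist above.
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Unary minus, e.g. "-3".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval").body)


print(safe_eval("145 + 237"))
```

Anything outside the whitelist, such as a function call or attribute access, raises `ValueError` instead of being executed, which matters when the expression is extracted from free-form user text.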

In [11]:
# Test multiplication
run_calculation(agent, "What is 23 * 67?")

📐 Extracted expression: 23 * 67
🧮 Python eval result: 1541.0


🧮 QUERY: What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 result = 1541.0

🤖 FINAL RESPONSE: The result of 23 multiplied by 67 is 1541.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.82 seconds


{'agent': {'messages': [AIMessage(content='The result of 23 multiplied by 67 is 1541.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 334, 'total_tokens': 349, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_c4c155951e', 'id': 'gen-1757389673-Do6hZJ4CIT78Sra92f7P', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--55b6f106-a933-405d-a9ff-de7783a12c59-0', usage_metadata={'input_tokens': 334, 'output_tokens': 15, 'total_tokens': 349, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [12]:
# Test division
run_calculation(agent, "Compute 1024 divided by 8")

📐 Extracted expression: 1024 / 8
🧮 Python eval result: 128.0


🧮 QUERY: Compute 1024 divided by 8
🛠️  divide({'a': 1024, 'b': 8}) -> 128.0
🎯 result = 128.0

🤖 FINAL RESPONSE: 1024 divided by 8 equals 128.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.18 seconds


{'agent': {'messages': [AIMessage(content='1024 divided by 8 equals 128.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 334, 'total_tokens': 345, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_c4c155951e', 'id': 'gen-1757389675-UyETQv73kr4O3VmlSBLu', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--d42fb7f8-d906-48f9-bceb-181145758e60-0', usage_metadata={'input_tokens': 334, 'output_tokens': 11, 'total_tokens': 345, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [13]:
# Test subtraction
run_calculation(agent, "Find the result of 500 - 123")

📐 Extracted expression: 500 - 123
🧮 Python eval result: 377.0


🧮 QUERY: Find the result of 500 - 123
🛠️  subtract({'a': 500, 'b': 123}) -> 377.0
🎯 result = 377.0

🤖 FINAL RESPONSE: The result of 500 - 123 is 377.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.33 seconds


{'agent': {'messages': [AIMessage(content='The result of 500 - 123 is 377.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 334, 'total_tokens': 347, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389677-AT1tDckDFneuhUHm5PxL', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--9b6bae13-16b4-4654-bcf6-71cd53a54746-0', usage_metadata={'input_tokens': 334, 'output_tokens': 13, 'total_tokens': 347, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [14]:
# Test decimal addition
run_calculation(agent, "What is 15.5 + 24.3?")

📐 Extracted expression: 15.5 + 24.3
🧮 Python eval result: 39.8


🧮 QUERY: What is 15.5 + 24.3?
🛠️  add({'a': 15.5, 'b': 24.3}) -> 39.8
🎯 result = 39.8

🤖 FINAL RESPONSE: The sum of 15.5 and 24.3 is 39.8.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.29 seconds


{'agent': {'messages': [AIMessage(content='The sum of 15.5 and 24.3 is 39.8.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 341, 'total_tokens': 360, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389679-eG5gU6Tup6g5rHIlY0WP', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--dd972562-f127-4e2f-8c1b-d1c3e772ef0f-0', usage_metadata={'input_tokens': 341, 'output_tokens': 19, 'total_tokens': 360, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [15]:
# Test decimal multiplication
run_calculation(agent, "What's 2.5 * 4.8?")

📐 Extracted expression: 2.5 * 4.8
🧮 Python eval result: 12.0


🧮 QUERY: What's 2.5 * 4.8?
🛠️  multiply({'a': 2.5, 'b': 4.8}) -> 12.0
🎯 result = 12.0

🤖 FINAL RESPONSE: 2.5 multiplied by 4.8 equals 12.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.81 seconds


{'agent': {'messages': [AIMessage(content='2.5 multiplied by 4.8 equals 12.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 340, 'total_tokens': 354, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389681-kssPi9kGStZKA6OfV6wq', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--0c8d18a9-f7d5-42d1-bd51-2e4694cbdfcb-0', usage_metadata={'input_tokens': 340, 'output_tokens': 14, 'total_tokens': 354, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [None]:
# Batch test
additional_queries = [
    "Calculate 999 / 3",
    "Subtract 89 from 234"
]

print(f"Running {len(additional_queries)} additional tests...\n")

for i, query in enumerate(additional_queries, 1):
    print(f"\n📊 Additional Test {i}/{len(additional_queries)}")
    run_calculation(agent, query)

Running 2 additional tests...


📊 Additional Test 1/2
📐 Extracted expression: 999 / 3
🧮 Python eval result: 333.0


🧮 QUERY: Calculate 999 / 3
🛠️  divide({'a': 999, 'b': 3}) -> 333.0
🎯 result = 333.0

🤖 FINAL RESPONSE: The result of dividing 999 by 3 is 333.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.53 seconds

📊 Additional Test 2/2
📐 Extracted expression: 234 - 89
🧮 Python eval result: 145.0


🧮 QUERY: Subtract 89 from 234
🛠️  subtract({'a': 234, 'b': 89}) -> 145.0
🎯 result = 145.0

🤖 FINAL RESPONSE: The result of subtracting 89 from 234 is 145.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.45 seconds


## Complex Multi-Step Calculation

The next cells check whether the agent can chain several tool calls to evaluate nested expressions:

In [None]:
# Complex calculation test
print("🧮 Complex Calculation Test")
complex_query = "I need to calculate (15 + 25) * 3. Can you help me with this step by step?"
run_calculation(agent, complex_query, repeat=4)

🧮 Complex Calculation Test
📐 Extracted expression: (15 + 25) * 3
🧮 Python eval result: 120.0


🧮 QUERY (Run 1/4): I need to calculate (15 + 25) * 3. Can you help me with this step by step?
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🎯 result = 3.0

🤖 FINAL RESPONSE (Run 1): First, we add 15 and 25, which gives us 40. Then, we multiply 40 by 3, resulting in 120.
⏱️  Execution time: 1.96 seconds

🧮 QUERY (Run 2/4): I need to calculate (15 + 25) * 3. Can you help me with this step by step?
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 40, 'b': 3}) -> 120.0
🎯 result = 120.0

🤖 FINAL RESPONSE (Run 2): The result of (15 + 25) * 3 is 120.
⏱️  Execution time: 5.13 seconds

🧮 QUERY (Run 3/4): I need to calculate (15 + 25) * 3. Can you help me with this step by step?
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 40, 'b': 3}) -> 120.0
🎯 result = 120.0

🤖 FINAL RESPONSE (Run 

In [18]:
# Complex calculation test: nested expression with mixed operator precedence
print("🧮 Complex Calculation Test")
complex_query = "I need to calculate: ((15 + 25) * 3)/(43 - 12*5)."
run_calculation(agent, complex_query)


🧮 Complex Calculation Test
📐 Extracted expression: ((15 + 25) * 3)/(43 - 12*5)
🧮 Python eval result: -7.0588235294117645


🧮 QUERY: I need to calculate: ((15 + 25) * 3)/(43 - 12*5).
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  subtract({'a': 43, 'b': 12}) -> 31.0
🛠️  multiply({'a': 12, 'b': 5}) -> 60.0
🛠️  subtract({'a': 31, 'b': 60}) -> -29.0
🛠️  divide({'a': 40, 'b': -29}) -> -1.3793103448275863
🎯 result = -1.3793103448275863

🤖 FINAL RESPONSE: The result of the calculation ((15 + 25) * 3) / (43 - 12 * 5) is approximately -1.38.
🔍 VALIDATION: ❌ wrong (Agent: -1.38, Python: -7.058824)
✅ Calculation completed successfully!

⏱️  Execution time: 9.04 seconds


{'agent': {'messages': [AIMessage(content='The result of the calculation ((15 + 25) * 3) / (43 - 12 * 5) is approximately -1.38.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 33, 'prompt_tokens': 479, 'total_tokens': 512, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389715-qedJJZUOERL9a5OWifq6', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--8d07f598-ccba-49c1-847b-e1b0496f5ab3-0', usage_metadata={'input_tokens': 479, 'output_tokens': 33, 'total_tokens': 512, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}
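
In the run above the agent called `subtract(43, 12)` before `multiply(12, 5)`, which ignores operator precedence, and it never multiplied 40 by 3. For comparison, the sketch below derives the sequence of two-argument tool calls that a precedence-aware decomposition would produce, using Python's own parser. It is an aid for checking a tool trace, not part of the agent; `plan_tool_calls` and the `Step` tuple are hypothetical names.

```python
import ast
from typing import List, Tuple

# Map AST operator types to the tool names used in this notebook.
_TOOL_NAMES = {ast.Add: "add", ast.Sub: "subtract", ast.Mult: "multiply", ast.Div: "divide"}
_TOOL_IMPL = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# (tool name, a, b, result)
Step = Tuple[str, float, float, float]


def plan_tool_calls(expression: str) -> List[Step]:
    """List the binary tool calls needed to evaluate the expression, in order."""
    steps: List[Step] = []

    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _TOOL_NAMES:
            # Operands are resolved first, so inner operations are emitted first.
            left = visit(node.left)
            right = visit(node.right)
            name = _TOOL_NAMES[type(node.op)]
            result = _TOOL_IMPL[name](left, right)
            steps.append((name, left, right, result))
            return result
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    visit(ast.parse(expression, mode="eval").body)
    return steps


for name, a, b, result in plan_tool_calls("((15 + 25) * 3)/(43 - 12*5)"):
    print(f"{name}({a}, {b}) -> {result}")
```

For this expression the plan is `add`, `multiply`, `multiply`, `subtract`, `divide`, with the denominator resolved as 43 - 60 = -17 before the final division.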

In [19]:
# Interactive cell - modify the query as needed
your_query = "Calculate the area of a rectangle with width 12.5 and height 8.3"
run_calculation(agent, your_query)

📐 Extracted expression: 12.5 * 8.3
🧮 Python eval result: 103.75000000000001


🧮 QUERY: Calculate the area of a rectangle with width 12.5 and height 8.3
🛠️  multiply({'a': 12.5, 'b': 8.3}) -> 103.75000000000001
🛠️  multiply({'a': 12.5, 'b': 8.3}) -> 103.75000000000001
🎯 result = 103.75000000000001

🤖 FINAL RESPONSE: The area of the rectangle is 103.75 square units.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.49 seconds


{'agent': {'messages': [AIMessage(content='The area of the rectangle is 103.75 square units.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 402, 'total_tokens': 416, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_c4c155951e', 'id': 'gen-1757389718-9LJsJoR2vL2QOP2uzloF', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--1ec57daa-27a6-4d76-ada1-3d9fc2f6ee4d-0', usage_metadata={'input_tokens': 402, 'output_tokens': 14, 'total_tokens': 416, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}
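
The validation above accepts the answer `103.75` against a Python result of `103.75000000000001`, so the comparison evidently tolerates floating-point noise. The exact rule lives in `helpers/agent_utils.py` and is not shown here. A minimal sketch of one such check follows, assuming the answer is the last number in the final response and a relative tolerance of `1e-6`; both choices and both function names are assumptions.

```python
import math
import re
from typing import Optional

# Integers or decimals with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number mentioned in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def answer_matches(response: str, reference: float, rel_tol: float = 1e-6) -> bool:
    """Compare the last number in the response with the reference value."""
    value = last_number(response)
    return value is not None and math.isclose(value, reference, rel_tol=rel_tol)


print(answer_matches("The area of the rectangle is 103.75 square units.", 12.5 * 8.3))
```

Taking the last number is a heuristic: it works for responses that end with the result, but it would misread a response that restates the inputs after the answer.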

In [20]:
# Test complex calculation with validation 
print("🧪 Testing complex calculation validation:")
run_calculation(agent, "Calculate (15 + 25) * 3", repeat=3)

🧪 Testing complex calculation validation:
📐 Extracted expression: (15 + 25) * 3
🧮 Python eval result: 120.0


🧮 QUERY (Run 1/3): Calculate (15 + 25) * 3
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🛠️  multiply({'a': 40, 'b': 3}) -> 120.0
🎯 result = 120.0

🤖 FINAL RESPONSE (Run 1): The result of (15 + 25) * 3 is 120.
⏱️  Execution time: 2.44 seconds

🧮 QUERY (Run 2/3): Calculate (15 + 25) * 3
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🛠️  multiply({'a': 40, 'b': 3}) -> 120.0
🎯 result = 120.0

🤖 FINAL RESPONSE (Run 2): The result of (15 + 25) * 3 is 120.
⏱️  Execution time: 2.00 seconds

🧮 QUERY (Run 3/3): Calculate (15 + 25) * 3
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🛠️  multiply({'a': 3, 'b': 1}) -> 3.0
🎯 result = 3.0

🤖 FINAL RESPONSE (Run 3): First, 15 + 25 equals 40. Then, multiplying 40 by 3 gives 120.
⏱️  Execution time: 2.29 seconds

📋 VALIDATION SUMMARY (3 runs)
🔍 VALIDATION: ❌ wrong (Agent: 120.0, Python: 120.0)
   • Inc

{'agent': {'messages': [AIMessage(content='First, 15 + 25 equals 40. Then, multiplying 40 by 3 gives 120.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 378, 'total_tokens': 402, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_c4c155951e', 'id': 'gen-1757389725-VwM5cIr6yoIteY6Nclzz', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--f0f6cfc2-a19e-4da2-b789-713cc37fc671-0', usage_metadata={'input_tokens': 378, 'output_tokens': 24, 'total_tokens': 402, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}
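
The summary above reports `❌ wrong` even though both values shown are 120.0, and its detail lines are cut off in this export, so the rule it applies cannot be read from the output. What the trace does show is that Run 3 ended on a tool result of 3.0 while its final response still stated 120. The sketch below is one way to summarize repeated runs so that this kind of disagreement is reported per run. `RunRecord` and `summarize_runs` are hypothetical and do not reproduce the helper's logic.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class RunRecord:
    run: int                 # 1-based run index
    last_tool_result: float  # value of the final tool call in the trace
    answer: float            # number stated in the final response


def summarize_runs(
    runs: List[RunRecord], reference: float, rel_tol: float = 1e-6
) -> Dict[str, Union[int, bool, List[int]]]:
    """Report which runs disagree with the reference, and where."""
    tool_mismatches = [
        r.run for r in runs
        if not math.isclose(r.last_tool_result, reference, rel_tol=rel_tol)
    ]
    answer_mismatches = [
        r.run for r in runs
        if not math.isclose(r.answer, reference, rel_tol=rel_tol)
    ]
    return {
        "total": len(runs),
        "tool_mismatches": tool_mismatches,
        "answer_mismatches": answer_mismatches,
        "consistent": not tool_mismatches and not answer_mismatches,
    }


# Values taken from the three runs logged above.
runs = [
    RunRecord(run=1, last_tool_result=120.0, answer=120.0),
    RunRecord(run=2, last_tool_result=120.0, answer=120.0),
    RunRecord(run=3, last_tool_result=3.0, answer=120.0),
]
print(summarize_runs(runs, reference=120.0))
```

Separating the two lists makes it visible when the model states the right answer without having obtained it from a tool, as in Run 3.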

In [21]:
# Test multiplication with validation (repeat=5)
print("🧪 Testing multiplication validation with 5 runs:")
run_calculation(agent, "What is 23 * 67?", repeat=5)

🧪 Testing multiplication validation with 5 runs:
📐 Extracted expression: 23 * 67
🧮 Python eval result: 1541.0


🧮 QUERY (Run 1/5): What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 result = 1541.0

🤖 FINAL RESPONSE (Run 1): The result of 23 multiplied by 67 is 1541.
⏱️  Execution time: 1.29 seconds

🧮 QUERY (Run 2/5): What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 result = 1541.0

🤖 FINAL RESPONSE (Run 2): The result of 23 multiplied by 67 is 1541.
⏱️  Execution time: 1.74 seconds

🧮 QUERY (Run 3/5): What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 result = 1541.0

🤖 FINAL RESPONSE (Run 3): The result of 23 multiplied by 67 is 1541.
⏱️  Execution time: 1.43 seconds

🧮 QUERY (Run 4/5): What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 result = 1541.0

🤖 FINAL RESPONSE (Run 4): The result of 23 multiplied by 67 is 1541.
⏱️  Execution time: 1.33 seconds

🧮 QUERY (Run 5/5): What is 23 * 67?
🛠️  multiply({'a': 23, 'b': 67}) -> 1541.0
🎯 res

{'agent': {'messages': [AIMessage(content='The result of 23 multiplied by 67 is 1541.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 334, 'total_tokens': 349, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389733-FlqtAm8HvJjeFJQQby8b', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--c7572508-e03c-433a-b9f6-344bc555c5c2-0', usage_metadata={'input_tokens': 334, 'output_tokens': 15, 'total_tokens': 349, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

## Enhanced Validation Testing

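The `repeat` argument used in the cells above runs the same query several times to check that the agent's answers are consistent. The cells below recreate the agent, confirm the enhanced tool display on a single calculation, and then work through the nested expression by hand: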

In [22]:
# Test the enhanced tool display
import sys
sys.path.append('./helpers')
from helpers.agent_utils import create_calculator_agent, run_calculation

# Create agent 
agent = create_calculator_agent()

# Test with a simple calculation to see the enhanced display
run_calculation(agent, "Calculate 15 + 25")

📐 Extracted expression: 15 + 25
🧮 Python eval result: 40.0


🧮 QUERY: Calculate 15 + 25
🛠️  add({'a': 15, 'b': 25}) -> 40.0
🎯 result = 40.0

🤖 FINAL RESPONSE: The result of 15 + 25 is 40.
🔍 VALIDATION: ✅ correct
✅ Calculation completed successfully!

⏱️  Execution time: 1.49 seconds


{'agent': {'messages': [AIMessage(content='The result of 15 + 25 is 40.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 331, 'total_tokens': 344, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'openai/gpt-4.1-nano', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'gen-1757389735-KqCtZ6RxPFsqKZ2iMfrS', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--da111f17-66f3-42a4-962e-1eb7103b944c-0', usage_metadata={'input_tokens': 331, 'output_tokens': 13, 'total_tokens': 344, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'reasoning': 0}})]}}

In [23]:
# Recompute the nested expression from the earlier cell to see what the agent should have returned.
# eval() is acceptable here only because the expression is a hard-coded literal, not user input.
expression = "((15 + 25) * 3)/(43 - 12*5)"
python_result = eval(expression)
print(f"Python eval: {python_result}")

# Tool calls copied by hand from an earlier agent run; the trace varies between runs
print("\nAgent's steps (from notebook):")
print("🛠️ add({'a': 15, 'b': 25}) -> 40.0")
print("🛠️ subtract({'a': 43, 'b': 12}) -> 31.0")  # ERROR: ignores precedence; should be 43 - (12*5) = 43 - 60 = -17
print("🛠️ multiply({'a': 25, 'b': 12}) -> 300.0")  # 25 * 12 does not appear in the expression
print("🛠️ multiply({'a': 40, 'b': 3}) -> 120.0")
print("🛠️ divide({'a': 120, 'b': 31}) -> 3.870967741935484")  # wrong denominator carried over from the subtract step

# Evaluate each sub-expression in precedence order
print("\nCorrect calculation:")
print(f"(15 + 25) = {15 + 25}")
print(f"12 * 5 = {12 * 5}")
print(f"43 - (12*5) = {43 - (12*5)}")
print(f"((15 + 25) * 3) = {(15 + 25) * 3}")
print(f"Final: {((15 + 25) * 3)}/{(43 - (12*5))} = {((15 + 25) * 3)/(43 - (12*5))}")

Python eval: -7.0588235294117645

Agent's steps (from notebook):
🛠️ add({'a': 15, 'b': 25}) -> 40.0
🛠️ subtract({'a': 43, 'b': 12}) -> 31.0
🛠️ multiply({'a': 25, 'b': 12}) -> 300.0
🛠️ multiply({'a': 40, 'b': 3}) -> 120.0
🛠️ divide({'a': 120, 'b': 31}) -> 3.870967741935484

Correct calculation:
(15 + 25) = 40
12 * 5 = 60
43 - (12*5) = -17
((15 + 25) * 3) = 120
Final: 120/-17 = -7.0588235294117645
