# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent with various real-world scenarios and see how it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [1]:
# Import the module itself; the cells below call datetime.datetime.now()
import datetime

from agent import Agent

In [2]:
## TODO: Create the agent's instructions

ECOHOME_SYSTEM_PROMPT = """
You are EcoHome Energy Advisor, an intelligent agent capable of optimizing energy usage across multiple smart home devices and systems.
Your goal is to answer questions regarding optimizing energy consumption and/or come up with personalized recommendations.

Guidelines:
- Rely on tool results rather than your own assumptions or guesses
- Work in a data-driven way and cite the retrieved data in your reasoning
- Make clear recommendations by naming specific temperature ranges, hours, durations, and so on rather than giving vague answers
- If you need clarification, you may ask for it
- If possible, estimate savings either as energy (kWh) or money (USD, EUR, ...)
- Be honest: if you cannot solve a problem, say so and do not make something up
"""

In [3]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [4]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context=f"Location: San Francisco, CA; Time: {datetime.datetime.now()}"
)

In [5]:
print(response["messages"][-1].content)

It seems that I'm currently unable to retrieve the weather forecast for San Francisco, CA. However, I can provide you with the electricity pricing data for tomorrow.

### Electricity Pricing for December 29, 2025:
- **Off-Peak Hours (0:00 - 5:59)**: $0.15 per kWh
- **Peak Hours (6:00 - 21:59)**: $0.18 per kWh
- **Off-Peak Hours (22:00 - 23:59)**: $0.15 per kWh

### Recommendations for Charging Your Electric Car:
1. **Best Time to Charge**: 
   - **From 0:00 to 5:59**: Charge your electric car during these hours to take advantage of the lower rate of $0.15 per kWh.
   - **From 22:00 to 23:59**: You can also charge during these hours at the same lower rate.

2. **Avoid Charging**: 
   - **From 6:00 to 21:59**: Avoid charging during these peak hours as the rate is higher at $0.18 per kWh.

### Solar Power Consideration:
- If you can find out the solar generation forecast for tomorrow, you would ideally want to charge your car during the hours when solar generation is at its peak (usually 

In [7]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_electricity_prices
- get_weather_forecast
- get_weather_forecast
- get_electricity_prices
- get_weather_forecast
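
The tool list above contains repeats, which is easier to see once the calls are counted. The sketch below copies the printed names into a literal list and tallies them; `tool_calls` and `repeated` are illustrative names, not part of the notebook. Repeated `get_weather_forecast` calls may point to retries after a failed lookup, which would be consistent with the agent reporting that it could not retrieve the forecast, but the output alone does not confirm that.

```python
from collections import Counter

# Tool names copied from the printed output above (in call order).
tool_calls = [
    "get_electricity_prices",
    "get_weather_forecast",
    "get_weather_forecast",
    "get_electricity_prices",
    "get_weather_forecast",
]

# Count how often each tool was called.
counts = Counter(tool_calls)

# Keep only tools that were called more than once.
repeated = {name: n for name, n in counts.items() if n > 1}

for name, n in counts.most_common():
    print(f"{name}: {n} call(s)")
```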


## 2. Define Test Cases

In [None]:
# TODO: Define comprehensive test cases for the Energy Advisor
# Create 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations
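
As a starting point for the scenarios listed above, here is a minimal sketch of two more test cases plus a small validator. The questions, the `expected_tools` choices, and the helper `validate_test_cases` are illustrative assumptions; only the tool names `get_weather_forecast` and `get_electricity_prices` appear in this notebook's output, so adjust the expectations to the tools your agent actually exposes.

```python
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}

# Hypothetical examples; the expected tools are guesses based on the question.
example_test_cases = [
    {
        "id": "thermostat_1",
        "question": "What temperature should I set my thermostat to tomorrow afternoon to save energy?",
        "expected_tools": ["get_weather_forecast"],
        "expected_response": "The response should contain a temperature range and a reason tied to the forecast",
    },
    {
        "id": "appliance_1",
        "question": "When is the cheapest time to run my dishwasher tonight?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "The response should contain a time window and the price per kWh in that window",
    },
]


def validate_test_cases(cases):
    """Return a list of human-readable problems; an empty list means the cases look well-formed."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(cases):
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f"case {index}: missing keys {sorted(missing)}")
        case_id = case.get("id")
        if case_id in seen_ids:
            problems.append(f"case {index}: duplicate id {case_id!r}")
        seen_ids.add(case_id)
    return problems


print(validate_test_cases(example_test_cases))
```

Running the validator before the test loop catches missing keys early, instead of surfacing them as a `KeyError` in the middle of a run.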

In [None]:
test_cases = [
    {
        "id": "ev_charging_1",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "The response should contain time recommendation, cost analysis and solar consideration",
    },
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")

## 3. Run Agent Tests

In [None]:
CONTEXT = "Location: San Francisco, CA"

In [None]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases):
    print(f"\nTest {i+1}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
        
        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': response,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.datetime.now().isoformat()
        }
        test_results.append(result)
                
    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': f"Error: {str(e)}",
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


In [None]:
test_results

## 4. Evaluate Responses

In [None]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [None]:
# TODO: Create a response evaluator
def evaluate_response(question, final_response, expected_response):
    """Evaluate a single response against expected response"""
    pass
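
One possible way to fill in the stub above is an LLM-as-judge check, since `expected_response` is a free-text rubric rather than an exact answer. The sketch below is not the notebook's prescribed solution: it adds a hypothetical `judge` parameter (any callable that takes a prompt string and returns the model's reply as a string), so the grading model is an assumption you supply yourself. The 0 to 5 scale and the JSON reply format are also illustrative choices.

```python
import json

JUDGE_TEMPLATE = (
    "You are grading an energy-advisor answer.\n"
    "Question: {question}\n"
    "Expected: {expected}\n"
    "Answer: {answer}\n"
    'Reply with JSON only, e.g. {{"score": 3, "reason": "..."}}. '
    "The score is an integer from 0 (useless) to 5 (fully meets the expectation)."
)


def evaluate_response(question, final_response, expected_response, judge):
    """Score one answer with a caller-supplied judge; returns {'score', 'reason'}."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, expected=expected_response, answer=final_response
    )
    raw = judge(prompt)
    try:
        verdict = json.loads(raw)
        score = float(verdict["score"])
    except (ValueError, KeyError, TypeError):
        # The judge did not return the requested JSON; report it instead of guessing.
        return {"score": None, "reason": f"unparseable judge output: {raw!r}"}

    # Clamp to the documented 0-5 range in case the judge drifts outside it.
    score = max(0.0, min(5.0, score))
    return {"score": score, "reason": str(verdict.get("reason", ""))}
```

Passing the judge in as a parameter keeps the function testable with a stub, for example `lambda prompt: '{"score": 4, "reason": "ok"}'`, before wiring in a real model.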

In [None]:
# TODO: Create a tool usage evaluator
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    pass
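
A minimal sketch of how the tool usage stub could be filled in, assuming tool-result messages can be recognized the same way as in the earlier `TOOLS:` cell (a non-empty `tool_call_id` and a `name`). The metric names (`recall`, `precision`, `redundant_calls`) and the helper `called_tool_names` are illustrative choices, not part of the original notebook.

```python
def called_tool_names(messages):
    """Return the names of tool-result messages, in call order."""
    names = []
    for msg in messages:
        # Accept plain dicts as well as pydantic-style message objects.
        obj = msg if isinstance(msg, dict) else msg.model_dump()
        if obj.get("tool_call_id"):
            names.append(obj.get("name"))
    return names


def evaluate_tool_usage(messages, expected_tools):
    """Compare the tools that were called against the expected tools."""
    called = called_tool_names(messages)
    used, expected = set(called), set(expected_tools)
    hit = used & expected

    if used:
        precision = len(hit) / len(used)
    else:
        # No tools used: perfect only if none were expected.
        precision = 1.0 if not expected else 0.0

    return {
        "called": called,
        "missing": sorted(expected - used),
        "unexpected": sorted(used - expected),
        "recall": len(hit) / len(expected) if expected else 1.0,
        "precision": precision,
        # Calls beyond the first use of each distinct tool.
        "redundant_calls": len(called) - len(used),
    }
```

Recall answers "were all expected tools used?", precision answers "were only expected tools used?", and `redundant_calls` flags repeated calls like those in the run shown earlier.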

In [None]:
# TODO: Generate a comprehensive evaluation report
# Calculate overall scores and metrics
# Identify strengths and weaknesses
# Provide recommendations for improvement
def generate_evaluation_report():
    pass
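
The report stub could be filled in along the following lines. This is a sketch only: it assumes each evaluation has already been reduced to a dict with the hypothetical keys `test_id`, `response_score` (0 to 5, or `None` if scoring failed), and `tool_recall` (0 to 1), and it adds an `evaluations` parameter and a `pass_threshold` that the original signature does not have.

```python
from statistics import mean


def generate_evaluation_report(evaluations, pass_threshold=3.0):
    """Aggregate per-test evaluations into overall metrics."""
    # Only tests with a usable response score count toward the average.
    scored = [e for e in evaluations if e.get("response_score") is not None]

    return {
        "total_tests": len(evaluations),
        "scored_tests": len(scored),
        "avg_response_score": mean(e["response_score"] for e in scored) if scored else None,
        "avg_tool_recall": mean(e["tool_recall"] for e in evaluations) if evaluations else None,
        # Weaknesses: scored below the threshold, or not scorable at all.
        "failing_tests": [e["test_id"] for e in scored if e["response_score"] < pass_threshold],
        "unscored_tests": [e["test_id"] for e in evaluations if e.get("response_score") is None],
    }
```

The `failing_tests` and `unscored_tests` lists give a direct starting point for the "identify weaknesses" step; recommendations for improvement still need a human read of those cases.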