# EcoHome Energy Advisor - Agent Run & Evaluation

In this notebook, you'll run the Energy Advisor agent with various real-world scenarios and see how it helps customers optimize their energy usage.

## Learning Objectives
- Create the agent's instructions
- Run the Energy Advisor with different types of questions
- Evaluate response quality and accuracy
- Measure tool usage effectiveness
- Identify areas for improvement
- Implement evaluation metrics

## Evaluation Criteria
- **Accuracy**: Correct information and calculations
- **Relevance**: Responses address the user's question
- **Completeness**: Comprehensive answers with actionable advice
- **Tool Usage**: Appropriate use of available tools
- **Reasoning**: Clear explanation of recommendations


## 1. Import and Initialize

In [45]:
from datetime import datetime
from agent import Agent

In [46]:
## TODO: Create the agent's instructions

ECOHOME_SYSTEM_PROMPT = """
You are EcoHome Energy Advisor, an intelligent agent capable of optimizing energy usage across multiple smart home devices and systems.
Your goal is to answer questions regarding optimizing energy consumption and/or come up with personalized recommendations.

Guidelines:
- Rely on tool results rather than your own assumptions or guesses
- Be data-driven: cite the data returned by the tools to justify your answer
- Make clear recommendations by naming specific temperature ranges, hours, durations, and so on rather than giving vague answers
- If possible, estimate savings either as energy (kWh) or money (USD/EUR/...)
- Be honest: if you cannot solve a problem, say so and do not make something up
"""

In [47]:
ecohome_agent = Agent(
    instructions=ECOHOME_SYSTEM_PROMPT,
)

In [48]:
response = ecohome_agent.invoke(
    question="When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
    context=f"Location: San Francisco; Time: {datetime.now()}"
)

In [49]:
print(response["messages"][-1].content)

To minimize costs and maximize solar power when charging your electric car in San Francisco on December 30, 2025, consider the following recommendations:

### Solar Power Generation
- **Peak Solar Generation**: The solar power generation is expected to be highest between **11:00 AM and 2:00 PM**. During this time, the solar irradiance will be at its peak, with values reaching up to **519.5 W/m²**.

### Electricity Pricing
- **Off-Peak Rates**: The electricity prices are lower during off-peak hours:
  - **0:00 AM - 5:00 AM**: $0.15 per kWh
  - **10:00 PM - 11:59 PM**: $0.15 per kWh
- **Peak Rates**: From **6:00 AM to 9:00 PM**, the rate is **$0.18 per kWh**.

### Recommendations
1. **Charge During Peak Solar Hours**: 
   - **Best Time to Charge**: **11:00 AM to 2:00 PM**. This will allow you to utilize solar power for charging, reducing reliance on grid electricity.
   
2. **Consider Off-Peak Charging**: 
   - If you need to charge overnight, consider charging from **12:00 AM to 5:00 AM

In [50]:
print("TOOLS:")
for msg in response["messages"]:
    obj = msg.model_dump()
    if obj.get("tool_call_id"):
        print("-", msg.name)

TOOLS:
- get_weather_forecast
- get_electricity_prices


## 2. Define Test Cases

In [51]:
# Define comprehensive test cases for the Energy Advisor
# Create 10 test cases covering different scenarios:
# - EV charging optimization
# - Thermostat settings
# - Appliance scheduling
# - Solar power maximization
# - Cost savings calculations

In [52]:
test_cases = [
    {
        "id": "ev_charging_1",
        "question": "When should I charge my electric car tomorrow to minimize cost and maximize solar power?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "The response should contain time recommendation, cost analysis and solar consideration",
    },
    {
        "id": "energy_tips_8",
        "question": "Can you suggest three practical ways to lower my household electricity consumption?",
        "expected_tools": ["search_energy_tips"],
        "expected_response": "Should provide three concrete, tailored recommendations to improve efficiency."
    },
    {
        "id": "laundry_4",
        "question": "What’s the cheapest time to run my washing machine over the weekend?",
        "expected_tools": ["get_electricity_prices"],
        "expected_response": "Should recognize weekend rate patterns and point out the lowest-cost hours."
    },
    {
        "id": "thermostat_2",
        "question": "To cut costs, what thermostat setting makes sense for Wednesday afternoon?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "Should recommend a numeric temperature range and justify it using price and weather data."
    },
    {
        "id": "usage_history_6",
        "question": "Which device consumed the most power in the previous month?",
        "expected_tools": ["query_energy_usage"],
        "expected_response": "Should name the top-consuming appliance and include kWh usage and cost details."
    },
    {
        "id": "dishwasher_3",
        "question": "What kind of savings could I see if I run my dishwasher overnight instead of around 6 PM?",
        "expected_tools": ["get_electricity_prices", "calculate_energy_savings"],
        "expected_response": "Should approximate savings per run and per month based on TOU price differences."
    },
    {
        "id": "recent_summary_9",
        "question": "Can you give me a quick overview of my electricity use during the last 48 hours?",
        "expected_tools": ["get_recent_energy_summary"],
        "expected_response": "Should include total kWh, total cost, a device-level breakdown, and brief insights."
    },
    {
        "id": "solar_forecast_5",
        "question": "How much solar power is likely to be generated tomorrow in San Francisco?",
        "expected_tools": ["get_weather_forecast"],
        "expected_response": "Should mention expected sunshine, irradiance, or typical generation trends."
    },
    {
        "id": "optimization_multi_device_7",
        "question": "Can you plan the optimal run times for my EV, dishwasher, and dryer tomorrow to minimize costs?",
        "expected_tools": ["get_electricity_prices", "get_weather_forecast"],
        "expected_response": "Should suggest a coordinated schedule that avoids peak pricing."
    },
    {
        "id": "pool_pump_10",
        "question": "When should I operate my pool pump over the coming week for best efficiency and cost?",
        "expected_tools": ["get_weather_forecast", "get_electricity_prices"],
        "expected_response": "Should outline a day-by-day schedule that balances sunlight availability with off-peak rates."
    }
]

if len(test_cases) < 10:
    raise ValueError("You MUST have at least 10 test cases")
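
The length check above only guards the number of cases. A slightly stricter check could also catch missing keys or duplicate IDs before any agent call is made. This is a sketch under the assumption that every case uses the four keys shown above. `validate_test_cases` and the demo cases are hypothetical names introduced for illustration.

```python
REQUIRED_KEYS = {"id", "question", "expected_tools", "expected_response"}


def validate_test_cases(cases):
    """Return a list of problems; an empty list means the cases look well-formed."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(cases):
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f"case {index}: missing keys {sorted(missing)}")
            continue
        if case["id"] in seen_ids:
            problems.append(f"case {index}: duplicate id {case['id']!r}")
        seen_ids.add(case["id"])
        if not case["expected_tools"]:
            problems.append(f"case {index}: expected_tools is empty")
    return problems


# Illustrative cases only: the second one repeats an id, the third lacks a key.
demo_cases = [
    {"id": "a", "question": "q1", "expected_tools": ["t1"], "expected_response": "r1"},
    {"id": "a", "question": "q2", "expected_tools": ["t2"], "expected_response": "r2"},
    {"id": "c", "question": "q3", "expected_tools": ["t3"]},
]

for problem in validate_test_cases(demo_cases):
    print(problem)
```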

## 3. Run Agent Tests

In [53]:
CONTEXT = f"Location: San Francisco; Time: {datetime.now()}"

In [54]:
# Run the agent tests
# For each test case, call the agent and collect the response
# Store results for evaluation

print("=== Running Agent Tests ===")
test_results = []

for i, test_case in enumerate(test_cases):
    print(f"\nTest {i+1}: {test_case['id']}")
    print(f"Question: {test_case['question']}")
    print("-" * 50)
    
    try:
        # Call the agent
        response = ecohome_agent.invoke(
            question=test_case['question'],
            context=CONTEXT
        )
        
        # Store the result
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': response,
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat()
        }
        test_results.append(result)
                
    except Exception as e:
        print(f"Error: {e}")
        result = {
            'test_id': test_case['id'],
            'question': test_case['question'],
            'response': f"Error: {str(e)}",
            'expected_tools': test_case['expected_tools'],
            'expected_response': test_case['expected_response'],
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }
        test_results.append(result)

print(f"\nCompleted {len(test_results)} tests")


=== Running Agent Tests ===

Test 1: ev_charging_1
Question: When should I charge my electric car tomorrow to minimize cost and maximize solar power?
--------------------------------------------------

Test 2: energy_tips_8
Question: Can you suggest three practical ways to lower my household electricity consumption?
--------------------------------------------------

Test 3: laundry_4
Question: What’s the cheapest time to run my washing machine over the weekend?
--------------------------------------------------

Test 4: thermostat_2
Question: To cut costs, what thermostat setting makes sense for Wednesday afternoon?
--------------------------------------------------

Test 5: usage_history_6
Question: Which device consumed the most power in the previous month?
--------------------------------------------------

Test 6: dishwasher_3
Question: What kind of savings could I see if I run my dishwasher overnight instead of around 6 PM?
--------------------------------------------------

Test

In [55]:
from rich.pretty import Pretty

Pretty(test_results[-1])
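
Note that a failed run stores a plain string under `'response'` plus an `'error'` key, so later cells that index `result["response"]["messages"]` would raise on such an entry. One way to guard against this, sketched here with a hypothetical helper `split_results` and made-up demo entries, is to separate failed runs before evaluating.

```python
def split_results(results):
    """Separate successful runs from failed ones based on the presence of an 'error' key."""
    succeeded = [r for r in results if "error" not in r]
    failed = [r for r in results if "error" in r]
    return succeeded, failed


# Made-up entries that mirror the two result shapes built in the test loop.
demo_results = [
    {"test_id": "case_ok", "response": {"messages": ["..."]}},
    {"test_id": "case_failed", "response": "Error: timeout", "error": "timeout"},
]

succeeded, failed = split_results(demo_results)
print("evaluate:", [r["test_id"] for r in succeeded])
print("skip:", [r["test_id"] for r in failed])
```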

## 4. Evaluate Responses

In [56]:
called_tools = [m.name for m in test_results[-1]["response"]["messages"] if m.model_dump().get("tool_call_id")]
Pretty(called_tools)

In [57]:
from ragas.integrations.langgraph import convert_to_ragas_messages

convert_to_ragas_messages(test_results[-1]["response"]["messages"])

[HumanMessage(content='When should I operate my pool pump over the coming week for best efficiency and cost?', metadata=None, type='human'),
 AIMessage(content='', metadata=None, type='ai', tool_calls=[ToolCall(name='get_weather_forecast', args={'location': 'San Francisco', 'days': 7}), ToolCall(name='get_electricity_prices', args={})]),
 ToolMessage(content='{"location": "San Francisco, United States", "forecast_days": 7, "current": {"temperature_c": 7.4, "condition": "sunny", "cloud_cover": 0, "solar_irradiance": 0.0, "humidity": 60, "wind_speed": 12.2}, "units": {"time": "iso8601", "temperature_2m": "°C", "relative_humidity_2m": "%", "wind_speed_10m": "km/h", "shortwave_radiation": "W/m²", "cloud_cover": "%"}, "hourly": [{"time": "2025-12-29T00:00", "temperature_c": 8.1, "condition": "sunny", "cloud_cover": 0, "solar_irradiance": 0.0, "humidity": 71, "wind_speed": 14.2}, {"time": "2025-12-29T01:00", "temperature_c": 8.0, "condition": "sunny", "cloud_cover": 0, "solar_irradiance": 0.

In [58]:
# TODO: Implement evaluation functions
# Create functions to evaluate:
# - Final Response
# - Tool usage

In [59]:
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import AnswerAccuracy

import os

# TODO: Create a response evaluator
async def evaluate_response(question, final_response, expected_response):
    """Evaluate a single response against expected response"""
    print("question: ", question)
    print("final_response: ", final_response)
    print("expected_response: ", expected_response)
    
    client = AsyncOpenAI()
    llm = llm_factory("gpt-4o-mini", client=client, api_key=os.getenv("OPEN_AI_API_KEY"))

    # Create metric
    scorer = AnswerAccuracy(llm=llm)

    # Score the agent's answer (not a hard-coded example) against the expected response
    result = await scorer.ascore(
        user_input=question,
        response=final_response,
        reference=expected_response
    )
    return result

accuracy_result = await evaluate_response(
    question=test_results[-2]["question"],
    final_response=test_results[-2]["response"]["messages"][-1].content,
    expected_response=test_results[-2]["expected_response"]
)
print(f"Answer Accuracy Score: {accuracy_result.value}")

question:  Can you plan the optimal run times for my EV, dishwasher, and dryer tomorrow to minimize costs?
final_response:  To minimize costs for running your EV, dishwasher, and dryer tomorrow (December 30, 2025), we can analyze the electricity pricing and plan the optimal run times based on the time-of-use rates.

### Electricity Pricing for December 30, 2025:
- **Off-Peak Hours (0:00 - 5:59)**: $0.15 per kWh
- **Peak Hours (6:00 - 19:59)**: $0.18 per kWh
- **Off-Peak Hours (20:00 - 23:59)**: $0.15 per kWh

### Recommendations for Device Run Times:
1. **EV Charging**:
   - **Optimal Time**: Charge during off-peak hours.
   - **Recommended Hours**: 12:00 AM - 6:00 AM (0:00 - 5:59) or 8:00 PM - 11:59 PM (20:00 - 23:59).

2. **Dishwasher**:
   - **Optimal Time**: Run during off-peak hours.
   - **Recommended Hours**: 12:00 AM - 6:00 AM (0:00 - 5:59) or 8:00 PM - 11:59 PM (20:00 - 23:59).

3. **Dryer**:
   - **Optimal Time**: Run during off-peak hours.
   - **Recommended Hours**: 12:00 A

In [41]:
test_results[-3]["response"]["messages"][-1].content

'It appears that there is no solar power generation data available for tomorrow (December 30, 2025) in San Francisco. This could be due to various factors, such as weather conditions or the absence of solar generation records for that date.\n\nIf you have specific solar panels installed, I recommend checking their performance metrics or contacting your solar provider for more accurate predictions.'

In [21]:
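# TODO: Create a tool usage evaluator
def evaluate_tool_usage(messages, expected_tools):
    """Evaluate if the right tools were used"""
    # Use sets so that a tool called twice cannot push a ratio above 1.0
    tools_called = {m.name for m in messages if m.model_dump().get("tool_call_id")}
    expected = set(expected_tools)
    correctly_called_tools = tools_called & expected

    return {
        # Share of called tools that were expected (0.0 if no tool was called)
        "appropriateness": len(correctly_called_tools) / len(tools_called) if tools_called else 0.0,
        # Share of expected tools that were actually called
        "completeness": len(correctly_called_tools) / len(expected) if expected else 1.0
    }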


evaluate_tool_usage(
    test_results[-3]["response"]["messages"],
    test_results[-3]["expected_tools"]
)

{'appropriateness': 1.0, 'completeness': 1.0}
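
A perfect score hides how the two ratios behave when the agent deviates from the expected tools. The sketch below works on plain lists of tool names instead of message objects. The helper name `tool_usage_scores` and the sample inputs are hypothetical, and treating an empty call list as 0.0 is an assumption rather than a fixed convention.

```python
def tool_usage_scores(called, expected):
    """Score tool usage from plain lists of tool names (duplicates are ignored)."""
    called_set, expected_set = set(called), set(expected)
    hits = called_set & expected_set
    return {
        # Fraction of called tools that were expected.
        "appropriateness": len(hits) / len(called_set) if called_set else 0.0,
        # Fraction of expected tools that were called.
        "completeness": len(hits) / len(expected_set) if expected_set else 1.0,
    }


# Hypothetical run: one expected tool called, one missed, one unexpected.
mixed = tool_usage_scores(
    called=["get_weather_forecast", "search_energy_tips"],
    expected=["get_weather_forecast", "get_electricity_prices"],
)
print(mixed)

# No tools called: both ratios fall back to 0.0 instead of raising ZeroDivisionError.
none_called = tool_usage_scores(called=[], expected=["get_electricity_prices"])
print(none_called)
```

In the first case one of the two called tools was expected and one of the two expected tools was called, so both ratios come out at 0.5.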

In [None]:
# TODO: Generate a comprehensive evaluation report
# Calculate overall scores and metrics
# Identify strengths and weaknesses
# Provide recommendations for improvement
def generate_evaluation_report():
    pass
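
One possible starting point for the report is to average each metric over all test cases and list the cases that fall below a threshold. This is a sketch only: `summarize_scores`, the 0.7 threshold, and the demo scores are assumptions made for illustration, not results from the runs above.

```python
from statistics import mean


def summarize_scores(per_test_scores, threshold=0.7):
    """Aggregate per-test metric dicts into overall means and a list of weak tests."""
    metric_names = sorted({name for scores in per_test_scores.values() for name in scores})
    overall = {
        name: mean(s[name] for s in per_test_scores.values() if name in s)
        for name in metric_names
    }
    # A test needs attention if any of its metrics falls below the threshold.
    weak = sorted(
        test_id
        for test_id, scores in per_test_scores.items()
        if any(value < threshold for value in scores.values())
    )
    return {"overall": overall, "needs_attention": weak}


# Made-up scores for two hypothetical test cases.
demo_scores = {
    "case_a": {"accuracy": 1.0, "appropriateness": 1.0, "completeness": 1.0},
    "case_b": {"accuracy": 0.25, "appropriateness": 1.0, "completeness": 0.5},
}

report = summarize_scores(demo_scores)
print(report["overall"])
print(report["needs_attention"])
```

The per-test dicts could be built by merging the outputs of `evaluate_response` and `evaluate_tool_usage` for each entry in `test_results`.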