# Testing Agent Interfaces in the Sailing Environment

This notebook demonstrates how to use and evaluate different agents in the sailing environment. We'll explore three types of agents:
1. **Random Agent**: Makes random decisions (baseline)
2. **North Agent**: Always tries to move north
3. **Smart Agent**: Uses sailing physics to make informed decisions

We'll test these agents in a simple scenario and compare their performance.

In [1]:
# Cell 1: Imports and Setup
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

# Add the src directory to the path
sys.path.append(os.path.abspath('../src'))
sys.path.append(os.path.abspath('..'))  # Add project root to path

# Import the environment and evaluation modules
from env_sailing import SailingEnv
from evaluation import evaluate_agent, visualize_trajectory
from agents.agent_north import AgentNorth
from agents.agent_random import AgentRandom
from agents.agent_smart import AgentSmart
from agents.base_agent import BaseAgent

# Import scenarios
from scenarios import get_scenario, SCENARIOS

## Part 1: Testing Individual Agents
Let's start by testing each agent individually to understand their behavior.

In [2]:
# Cell 2: Setup Simple Test Scenario with Visualization Parameters
# Get the simple test scenario
scenario = get_scenario('simple_test')

# Add visualization parameters to the scenario
viz_params = {
    'env_params': {
        'wind_grid_density': 25,    # Fewer arrows = clearer visualization
        'wind_arrow_scale': 80,     # Larger value = smaller arrows
        'render_mode': "rgb_array"
    }
}
scenario.update(viz_params)
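
One caveat about the cell above: `dict.update` replaces a top-level key wholesale, so if the scenario already contained an `'env_params'` entry, its other settings would be dropped. Whether `simple_test` has such an entry is not shown here. A minimal sketch of a recursive merge that preserves existing nested settings follows; `deep_merge` and the example keys `wind_init_params`, `base_speed`, and `grid_size` are hypothetical names used only for illustration.

```python
import copy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` merged into `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge their contents instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical scenario that already carries an 'env_params' entry.
example_scenario = {
    'wind_init_params': {'base_speed': 3.0},
    'env_params': {'grid_size': 32},
}
example_viz = {'env_params': {'wind_grid_density': 25, 'render_mode': 'rgb_array'}}

merged = deep_merge(example_scenario, example_viz)
print(merged['env_params'])
```

With this helper the original dictionary is left untouched, which also makes it safe to reuse the same scenario with different visualization settings.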

### Random Agent Test
First, let's test the random agent in a single episode. This will help us understand the baseline performance and visualize how a random policy behaves in the environment.

In [3]:
# Cell 3: Test Random Agent
# Create and test random agent
random_agent = AgentRandom()
results = evaluate_agent(
    agent=random_agent,
    scenario=scenario,
    seeds=42,  # Single seed for reproducibility
    max_horizon=100,  # Using shorter horizon
    verbose=True,
    render=True,
    full_trajectory=True  # Enable full trajectory for visualization
)

# Display results
print("Random Agent Results:")
print(f"Total Reward: {results['mean_reward']:.2f} ± {results['std_reward']:.2f}")
print(f"Success Rate: {results['success_rate']:.2%}")
print(f"Average Steps: {results['mean_steps']:.1f} ± {results['std_steps']:.1f}")

# Visualize trajectory with slider
visualize_trajectory(results, None, with_slider=True)




Random Agent Results:
Total Reward: 0.00 ± 0.00
Success Rate: 0.00%
Average Steps: 100.0 ± 0.0



### Multiple Seeds Evaluation
To get a more robust evaluation of the random agent's performance, let's test it with multiple random seeds. This gives us a better understanding of its average behavior.

In [4]:
# Cell 4: Test Random Agent with Multiple Seeds
print("Testing random agent with multiple seeds...")
results = evaluate_agent(
    agent=random_agent,
    scenario=scenario,
    seeds=[42, 43, 44, 45, 46],  # Multiple seeds
    max_horizon=1000,
    verbose=True,
    render=False  # No rendering needed for multiple seeds
)

# Display aggregate results
print("\nAggregate Results:")
print(f"Mean Reward: {results['mean_reward']:.2f} ± {results['std_reward']:.2f}")
print(f"Success Rate: {results['success_rate']:.2%}")
print(f"Average Steps: {results['mean_steps']:.1f} ± {results['std_steps']:.1f}")

Testing random agent with multiple seeds...




Aggregate Results:
Mean Reward: 0.00 ± 0.00
Success Rate: 0.00%
Average Steps: 1000.0 ± 0.0
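
The aggregate figures above can be reproduced from per-episode values. The following is a minimal sketch of that aggregation, not the actual `evaluate_agent` implementation: `summarize_episodes` is a hypothetical helper, and the use of the population standard deviation is an assumption. The example input encodes what the output above implies, namely five episodes that each earn zero reward and run to the 1000-step horizon.

```python
from statistics import fmean, pstdev


def summarize_episodes(rewards, steps, successes):
    """Aggregate per-episode values into the metric keys used in this notebook."""
    return {
        'mean_reward': fmean(rewards),
        'std_reward': pstdev(rewards),   # population std (assumed convention)
        'mean_steps': fmean(steps),
        'std_steps': pstdev(steps),
        'success_rate': fmean([1.0 if s else 0.0 for s in successes]),
    }


# Five seeds, all timing out at the horizon without reward.
summary = summarize_episodes(
    rewards=[0.0] * 5,
    steps=[1000] * 5,
    successes=[False] * 5,
)
print(f"Mean Reward: {summary['mean_reward']:.2f} ± {summary['std_reward']:.2f}")
print(f"Success Rate: {summary['success_rate']:.2%}")
print(f"Average Steps: {summary['mean_steps']:.1f} ± {summary['std_steps']:.1f}")
```

A standard deviation of exactly zero here means every seed produced the same outcome, so additional seeds of the random agent are unlikely to change the picture in this scenario.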


### North Agent Test
Now let's test the North Agent, which always tries to move northward. This is a simple heuristic strategy that might work well when the goal is north of the starting position.

In [5]:
# Cell 5: Test North Agent
print("Testing NorthAgent...")
north_agent = AgentNorth()
north_results = evaluate_agent(
    agent=north_agent,
    scenario=scenario,
    seeds=42,
    max_horizon=100,
    verbose=True,
    render=True,
    full_trajectory=True
)

# Display results
print("\nNorthAgent Results:")
print(f"Total Reward: {north_results['mean_reward']:.2f} ± {north_results['std_reward']:.2f}")
print(f"Success Rate: {north_results['success_rate']:.2%}")
print(f"Average Steps: {north_results['mean_steps']:.1f} ± {north_results['std_steps']:.1f}")

# Visualize trajectory
visualize_trajectory(north_results, None, with_slider=True)

Testing NorthAgent...




NorthAgent Results:
Total Reward: 0.00 ± 0.00
Success Rate: 0.00%
Average Steps: 100.0 ± 0.0



### Smart Agent Test
Finally, let's test the Smart Agent, which uses sailing physics to make informed decisions about movement direction.
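
To make "using sailing physics" concrete, here is a hedged sketch of one possible heuristic: score each heading by progress toward the goal multiplied by a sailing efficiency that depends on the angle to the wind. This is not the `AgentSmart` implementation. The heading encoding, the efficiency curve (including the 40° no-go zone), and the function names are all assumptions made for illustration.

```python
import math

# Eight compass headings as (dx, dy); index 0 is north. This encoding is an assumption.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def unit(vector):
    """Normalize a 2D vector, returning (0, 0) for a zero-length input."""
    norm = math.hypot(vector[0], vector[1])
    return (vector[0] / norm, vector[1] / norm) if norm > 0 else (0.0, 0.0)


def sailing_efficiency(angle_deg):
    """Toy efficiency curve; angle is between the heading and where the wind comes from."""
    if angle_deg < 40:
        return 0.05                                  # no-go zone: almost no speed
    if angle_deg < 90:
        return 0.5 + 0.5 * (angle_deg - 40) / 50     # close-hauled up to beam reach
    if angle_deg < 135:
        return 1.0                                   # beam to broad reach
    return 1.0 - 0.5 * (angle_deg - 135) / 45        # running downwind is slower


def choose_heading(position, goal, wind_to):
    """Pick the heading index with the best efficiency-weighted progress to the goal.

    `wind_to` is the direction the wind blows toward (assumed convention).
    """
    to_goal = unit((goal[0] - position[0], goal[1] - position[1]))
    wind_from = unit((-wind_to[0], -wind_to[1]))
    best_index, best_score = 0, -math.inf
    for index, direction in enumerate(DIRECTIONS):
        heading = unit(direction)
        cos_angle = heading[0] * wind_from[0] + heading[1] * wind_from[1]
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        progress = heading[0] * to_goal[0] + heading[1] * to_goal[1]
        score = sailing_efficiency(angle) * progress
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Goal due north with the wind blowing from the north (toward the south).
print(choose_heading(position=(0, 0), goal=(0, 10), wind_to=(0, -1)))
```

In this headwind example, heading straight north falls in the no-go zone, so the sketch prefers a diagonal heading. That is the kind of behavior a heuristic that always heads north cannot produce.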

In [6]:
# Cell 6: Test Smart Agent
print("Testing SmartAgent...")
smart_agent = AgentSmart()
smart_results = evaluate_agent(
    agent=smart_agent,
    scenario=scenario,
    seeds=42,
    max_horizon=100,
    verbose=True,
    render=True,
    full_trajectory=True
)

# Display results
print("\nSmartAgent Results:")
print(f"Total Reward: {smart_results['mean_reward']:.2f} ± {smart_results['std_reward']:.2f}")
print(f"Success Rate: {smart_results['success_rate']:.2%}")
print(f"Average Steps: {smart_results['mean_steps']:.1f} ± {smart_results['std_steps']:.1f}")

# Visualize trajectory
visualize_trajectory(smart_results, None, with_slider=True)

Testing SmartAgent...




SmartAgent Results:
Total Reward: 62.35 ± 0.00
Success Rate: 0.00%
Average Steps: 48.0 ± 0.0



## Part 2: Comparing All Agents
Now that we've tested each agent individually, let's compare them all together using multiple seeds for a fair comparison. This will help us understand their relative performance.

Key metrics we'll compare:
- Mean reward and its standard deviation
- Success rate in reaching the goal
- Average number of steps taken
- Trajectory visualization for seed 42
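
This notebook does not state how reward relates to episode length. As a hedged observation, the single-seed Smart Agent result in Part 1 (reward 62.35 in 48 steps) is numerically consistent with a goal reward of 100 discounted by a factor of 0.99 for each step after the first. Both constants are assumptions inferred from that one data point, not taken from the environment's code, and `discounted_goal_reward` is a hypothetical helper.

```python
def discounted_goal_reward(steps, goal_reward=100.0, gamma=0.99):
    """Reward for reaching the goal after `steps` steps under the assumed discounting."""
    return goal_reward * gamma ** (steps - 1)


# Compare against the Smart Agent's single-seed result from Part 1.
print(f"{discounted_goal_reward(48):.2f}")  # prints 62.35
```

If this reading is right, a higher mean reward implies a faster arrival, and an episode that runs out the horizon earns nothing, which matches the Random Agent's results.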

In [7]:
# Cell 7: Compare All Agents
# Create different agents
print("Creating agents...")
agent_north = AgentNorth()
agent_smart = AgentSmart()
agent_random = AgentRandom()

# Create a dictionary of agents
agents = {
    'North': agent_north,
    'Smart': agent_smart,
    'Random': agent_random
}

print("\nComparing agents with multiple seeds...")
seeds = [42, 43, 44, 45, 46]  # Test with 5 different seeds
max_horizon = 1000  # Give agents enough time to reach the goal

# Store results for each agent
all_results = {}

for agent_name, agent in agents.items():
    print(f"\nEvaluating {agent_name}...")
    results = evaluate_agent(
        agent=agent,
        scenario=scenario,
        seeds=seeds,
        max_horizon=max_horizon,
        verbose=True,
        render=False  # No need to render for multiple seeds
    )
    all_results[agent_name] = results
    
    # Print aggregate results for this agent
    print(f"\n{agent_name} Aggregate Results:")
    print(f"Mean Reward: {results['mean_reward']:.2f} ± {results['std_reward']:.2f}")
    print(f"Success Rate: {results['success_rate']:.2%}")
    print(f"Average Steps: {results['mean_steps']:.1f} ± {results['std_steps']:.1f}")

# Visualize trajectories for one seed
for agent_name, agent in agents.items():
    print(f"\nVisualizing {agent_name} trajectory (seed 42)...")
    results = evaluate_agent(
        agent=agent,
        scenario=scenario,
        seeds=42,
        max_horizon=max_horizon,
        verbose=False,
        render=True,
        full_trajectory=True
    )
    visualize_trajectory(results, None, with_slider=True)

Creating agents...

Comparing agents with multiple seeds...

Evaluating North...




North Aggregate Results:
Mean Reward: 10.34 ± 4.06
Success Rate: 0.00%
Average Steps: 236.2 ± 45.8

Evaluating Smart...




Smart Aggregate Results:
Mean Reward: 60.52 ± 1.50
Success Rate: 0.00%
Average Steps: 51.0 ± 2.4

Evaluating Random...




Random Aggregate Results:
Mean Reward: 0.00 ± 0.00
Success Rate: 0.00%
Average Steps: 1000.0 ± 0.0

Visualizing North trajectory (seed 42)...




Visualizing Smart trajectory (seed 42)...




Visualizing Random trajectory (seed 42)...



## Conclusion

This notebook demonstrated how to:
1. Create and test different types of agents in the sailing environment
2. Evaluate agent performance using single and multiple seeds
3. Visualize agent trajectories using an interactive slider
4. Compare different agents' performance using various metrics
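
The comparison in step 4 is easier to read as a single table. The sketch below assumes a dictionary shaped like `all_results` from Part 2 (agent name mapped to a metrics dictionary). The numbers are copied from the Part 2 output, and `format_comparison` is a hypothetical helper, not part of the `evaluation` module.

```python
# Aggregate metrics copied from the Part 2 output (5 seeds, 1000-step horizon).
reported = {
    'North': {'mean_reward': 10.34, 'std_reward': 4.06,
              'mean_steps': 236.2, 'std_steps': 45.8, 'success_rate': 0.0},
    'Smart': {'mean_reward': 60.52, 'std_reward': 1.50,
              'mean_steps': 51.0, 'std_steps': 2.4, 'success_rate': 0.0},
    'Random': {'mean_reward': 0.00, 'std_reward': 0.00,
               'mean_steps': 1000.0, 'std_steps': 0.0, 'success_rate': 0.0},
}


def format_comparison(results_by_agent):
    """Build a text table of agents ranked by mean reward, best first."""
    header = f"{'Agent':<8}{'Reward':>16}{'Steps':>18}{'Success':>10}"
    rows = [header, '-' * len(header)]
    ranked = sorted(results_by_agent.items(),
                    key=lambda item: item[1]['mean_reward'], reverse=True)
    for name, metrics in ranked:
        reward = f"{metrics['mean_reward']:.2f} ± {metrics['std_reward']:.2f}"
        steps = f"{metrics['mean_steps']:.1f} ± {metrics['std_steps']:.1f}"
        rows.append(f"{name:<8}{reward:>16}{steps:>18}{metrics['success_rate']:>10.2%}")
    return '\n'.join(rows)


table = format_comparison(reported)
print(table)
```

Passing `all_results` from Part 2 instead of `reported` should give the same table, provided the result dictionaries carry the keys used in the print statements above.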

Key takeaways:
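- The Random Agent is a useful baseline but, as expected, performs poorly: it earned zero reward on all five seeds and always ran to the 1000-step horizon.
- The North Agent shows that a simple heuristic can make progress given enough time: it earned no reward within a 100-step horizon, but averaged 10.34 reward in about 236 steps with the 1000-step horizon.
- The Smart Agent shows how domain knowledge (sailing physics) improves performance: it averaged 60.52 reward in about 51 steps.
- The reported success rate is 0.00% for every agent, even in runs that ended early with positive reward, so reward and step count are the more informative metrics in these results.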

Next steps:
- Try the agents in different scenarios (e.g., with varying wind patterns)
- Modify the Smart Agent to improve its performance
- Create your own agent by implementing the BaseAgent interface
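
As a starting point for the last item, here is a hedged sketch of a custom agent. The real `BaseAgent` interface is not shown in this notebook, so the sketch defines its own stand-in base class. The method names `act`, `reset`, and `seed`, the action encoding (0 for north, nine actions in total), and the class names are assumptions; check `agents/base_agent.py` before adapting it.

```python
import random
from abc import ABC, abstractmethod


class SketchBaseAgent(ABC):
    """Stand-in for the project's BaseAgent; the real interface may differ."""

    @abstractmethod
    def act(self, observation):
        """Return an integer action for the given observation."""

    def reset(self):
        """Clear any per-episode state (nothing to clear by default)."""

    def seed(self, seed=None):
        """Seed the agent's random number generator (no-op by default)."""


class MostlyNorthAgent(SketchBaseAgent):
    """Heads north most of the time and explores randomly with probability epsilon."""

    NORTH = 0        # assumed action index for "north"
    N_ACTIONS = 9    # assumed: eight headings plus "stay"

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.rng = random.Random()

    def seed(self, seed=None):
        # A dedicated generator keeps the agent reproducible per seed.
        self.rng = random.Random(seed)

    def act(self, observation):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.N_ACTIONS)
        return self.NORTH


agent = MostlyNorthAgent(epsilon=0.2)
agent.seed(42)
actions = [agent.act(observation=None) for _ in range(10)]
print(actions)
```

Once the class inherits from the real `BaseAgent`, an instance can presumably be passed to `evaluate_agent` in the same way as the three agents used in this notebook.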