# Intelligent Agents: Reflex-Based Agents for the Vacuum-cleaner World

Student Name: [Add your name]

I have used the following AI tools: 
- Cursor AI Assistant for code implementation guidance and debugging
- AI assistance for understanding agent architectures and implementation strategies
- AI help with debugging and explaining complex concepts from class

I understand that my submission needs to be my own work: [your initials]

## Learning Outcomes

* Design and build a simulation environment that models sensor inputs, actuator effects, and performance measurement.
* Apply core AI concepts by implementing the agent function for simple and model-based reflex agents that respond to environmental percepts.
* Practice how the environment and the agent function interact.
* Analyze agent performance through controlled experiments across different environment configurations.
* Graduate Students: Develop strategies for handling uncertainty and imperfect information in autonomous agent systems.

## Instructions

Total Points: Undergrads 100 + 5 bonus / Graduate students 110

Complete this notebook. Use the provided notebook cells and insert additional code and markdown cells as needed. Submit the completely rendered notebook as an HTML file.

### AI Use

Here are some guidelines that will make it easier for you:

* __Don't:__ Rely on AI auto completion. You will waste a lot of time trying to figure out how the suggested code relates to what we do in class. Turn off AI code completion (e.g., Copilot) in your IDE.
* __Don't:__ Submit code/text that you do not understand or have not checked to make sure that it is complete and correct.
* __Do:__ Use AI for debugging and letting it explain code and concepts from class.

### Using Visual Studio Code

If you use VS Code, you can use `Export` (click on `...` in the menu bar) to save your notebook as an HTML file. Note that you have to run all blocks first so that the HTML file contains your output.

### Using Google Colab

In Colab you need to save the notebook on Google Drive to work with it. To do this, mount your Google Drive and change to the correct directory by uncommenting the following lines and running the code block.

In [None]:
# from google.colab import drive
# import os
#
# drive.mount('/content/drive')
# os.chdir('/content/drive/My Drive/Colab Notebooks/')

Once you are done with the assignment and have run all code blocks using `Runtime/Run all`, you can convert the file on your Google Drive into HTML by uncommenting the following line and running the block.

In [None]:
# !jupyter nbconvert --to html Copy\ of\ robot_vacuum.ipynb

You may have to fix the file location or the file name to match how it looks on your Google Drive. You can navigate in Colab to your Google Drive using the little folder symbol in the navigation bar to the left.

## Introduction

In this assignment you will implement a simulator environment for an automatic vacuum cleaner robot, a set of different reflex-based agent programs, and perform a comparison study for cleaning a single room. Focus on the __cleaning phase__ which starts when the robot is activated and ends when the last dirty square in the room has been cleaned. Someone else will take care of the agent program needed to navigate back to the charging station after the room is clean.

## PEAS description of the cleaning phase

__Performance Measure:__ Each action costs 1 energy unit. The performance is measured as the sum of the energy units used to clean the whole room.

__Environment:__ A room with $n \times n$ squares where $n = 5$. Dirt is randomly placed on each square with probability $p = 0.2$. For simplicity, you can assume that the agent knows the size and the layout of the room (i.e., it knows $n$). To start, the agent is placed on a random square.
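
To make the numbers in this description concrete, the following is a minimal sketch of how such a room could be sampled and what the stated parameters imply. The variable names and the use of `np.random.default_rng` with a fixed seed are assumptions made for illustration, not requirements of the assignment.

```python
import numpy as np

n, p = 5, 0.2
rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily for repeatability

# Each square is dirty independently with probability p
room = rng.random((n, n)) < p

expected_dirty = n * n * p             # 25 * 0.2 = 5 squares on average
p_already_clean = (1 - p) ** (n * n)   # 0.8 ** 25, roughly 0.4%

print(room.astype(int))
print("dirty squares in this sample:", int(room.sum()))
print("expected dirty squares:", expected_dirty)
print(f"probability the room starts clean: {p_already_clean:.4f}")
```

A room that starts clean needs zero actions, so a small fraction of runs may report an energy use of 0.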

__Actuators:__ The agent can clean the current square (action `suck`) or move to an adjacent square by going `north`, `east`, `south`, or `west`.

__Sensors:__ Four bumper sensors, one for north, east, south, and west; a dirt sensor reporting dirt in the current square.  
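
As a rough illustration of how a simulator could derive these percepts from the agent's position, consider the sketch below. The helper name `sense`, the arguments `room`, `x`, and `y`, and the convention that row 0 is the north wall are all assumptions for this example.

```python
import numpy as np

def sense(room, x, y):
    """Return (bumpers, dirty) for an agent at column x, row y (hypothetical helper)."""
    n = room.shape[0]
    bumpers = {
        "north": y == 0,        # assumed convention: row 0 is the north wall
        "east": x == n - 1,
        "south": y == n - 1,
        "west": x == 0,
    }
    return bumpers, bool(room[y, x])

room = np.zeros((5, 5), dtype=bool)
room[0, 0] = True                       # one dirty square in the northwest corner
bumpers, dirty = sense(room, 0, 0)
print(bumpers, dirty)
```

Only these two values should reach the agent function; the coordinates themselves stay inside the environment.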


## The agent program for a simple randomized agent

The agent program is a function that gets sensor information (the current percepts) as the arguments. The arguments are:

* A dictionary with boolean entries for the four bumper sensors `north`, `east`, `west`, `south`. E.g., if the agent is on the north-west corner, `bumpers` will be `{"north" : True, "east" : False, "south" : False, "west" : True}`.
* The dirt sensor produces a boolean.

The agent returns the chosen action as a string.

Here is an example implementation for the agent program of a simple randomized agent:  

In [None]:
# make sure numpy is installed
%pip install -q numpy

Note: you may need to restart the kernel to use updated packages.


In [None]:
import numpy as np

actions = ["north", "east", "west", "south", "suck"]

def simple_randomized_agent(bumpers, dirty):
    return np.random.choice(actions)

In [None]:
# define percepts (current location is NW corner and it is dirty)
bumpers = {"north" : True, "east" : False, "south" : False, "west" : True}
dirty = True

# call agent program function with percepts and it returns an action
simple_randomized_agent(bumpers, dirty)

np.str_('east')

__Note:__ This is not a rational intelligent agent. It ignores its sensors and may bump into a wall repeatedly or not clean a dirty square. You will be asked to implement rational agents below.

## Simple environment example

We implement a simple simulation environment that supplies the agent with its percepts.
The simple environment is infinite in size (bumpers are always `False`) and every square is always dirty, even if the agent cleans it. The environment function returns a different performance measure than the one specified in the PEAS description! Since the room is infinite and all squares are constantly dirty, the agent can never clean the whole room. Your implementation needs to implement the **correct performance measure.** The energy budget of the agent is specified as `max_steps`.

In [None]:
def simple_environment(agent_function, max_steps, verbose = True):
    num_cleaned = 0

    for i in range(max_steps):
        dirty = True
        bumpers = {"north" : False, "south" : False, "west" : False, "east" : False}

        action = agent_function(bumpers, dirty)
        if (verbose): print("step", i , "- action:", action)

        if (action == "suck"):
            num_cleaned = num_cleaned + 1

    return num_cleaned



Do one simulation run with a simple randomized agent that has enough energy for 20 steps.

In [None]:
simple_environment(simple_randomized_agent, max_steps = 20)

step 0 - action: north
step 1 - action: west
step 2 - action: suck
step 3 - action: south
step 4 - action: west
step 5 - action: suck
step 6 - action: south
step 7 - action: south
step 8 - action: east
step 9 - action: south
step 10 - action: north
step 11 - action: south
step 12 - action: east
step 13 - action: north
step 14 - action: south
step 15 - action: north
step 16 - action: south
step 17 - action: south
step 18 - action: west
step 19 - action: north


2

# Tasks

## General [10 Points]

1. Make sure that you use the latest version of this notebook.
2. Your implementation can use libraries like math, numpy, scipy, but not libraries that implement intelligent agents or complete search algorithms. Try to keep the code simple! In this course, we want to learn about the algorithms and we often do not need to use object-oriented design.
3. Your notebook needs to be formatted professionally.
    - Add additional markdown blocks for your description, comments in the code, add tables and use matplotlib to produce charts where appropriate
    - Do not show debugging output or include an excessive amount of output.
    - Check that your submitted file is readable and contains all figures.
4. Document your code. Use comments in the code and add a discussion of how your implementation works and your design choices.


## Task 1: Implement a simulation environment [20 Points]

The simple environment above is not very realistic. Your environment simulator needs to follow the PEAS description from above. It needs to:

* Initialize the environment by storing the state of each square (clean/dirty) and making some dirty. ([Help with random numbers and arrays in Python](https://colab.research.google.com/drive/1RRzbPq-oel_rzi2GOptCFyxLpi3a32Mc?usp=sharing))
* Keep track of the agent's position.
* Call the agent function repeatedly and provide the agent function with the sensor inputs.  
* React to the agent's actions, e.g., by removing dirt from a square or moving the agent around unless there is a wall in the way.
* Keep track of the performance measure. That is, track the agent's actions until all dirty squares are clean and count the number of actions it takes the agent to complete the task.

The easiest implementation for the environment is to hold a 2-dimensional array to represent if squares are clean or dirty and to call the agent function in a loop until all squares are clean or a predefined number of steps has been reached (i.e., the robot runs out of energy).

The simulation environment should be a function like the `simple_environment()` and needs to work with the simple randomized agent program from above. **Use the same environment for all your agent implementations in the tasks below.**

*Note on debugging:* Debugging is difficult. Make sure your environment prints enough information when you use `verbose = True`. Also, implementing a function that the environment can use to display the room with dirt and the current position of the robot at every step is very useful.  

In [None]:
# Task 1: Improved Simulation Environment Implementation
# 
# This is a cleaner, more readable version of the vacuum environment
# that implements the PEAS description with better code structure

import numpy as np
import random

def vacuum_environment(agent_function, room_size=5, dirt_prob=0.2, max_steps=1000, verbose=False):
    """
    Simulation environment for vacuum cleaner robot.
    
    Args:
        agent_function: The agent program function
        room_size: Size of the square room (default 5x5)
        dirt_prob: Probability that each square starts dirty (default 0.2)
        max_steps: Maximum number of steps before timeout (default 1000)
        verbose: Whether to print debug information
    
    Returns:
        tuple: (total_energy_used, success_flag, steps_taken)
    """
    
    # 1. Build the room: each square has a dirt_prob probability of being dirty
    room = np.random.random((room_size, room_size)) < dirt_prob
    
    # 2. Put the robot in a random spot
    x = random.randint(0, room_size - 1)
    y = random.randint(0, room_size - 1)

    energy_used = 0
    steps_taken = 0

    if verbose:
        print("Starting room (1=dirty, 0=clean):")
        print(room.astype(int))
        print(f"Robot starts at ({x}, {y})\n")

    # 3. Keep going until energy runs out
    while energy_used < max_steps:
        # Stop if everything is clean
        if np.sum(room) == 0:
            if verbose:
                print(f"All clean in {energy_used} steps!")
            return energy_used, True, steps_taken
        
        # 4. Robot sensors
        bumpers = {
            "north": y == 0,
            "south": y == room_size - 1,
            "west": x == 0,
            "east": x == room_size - 1
        }
        dirty_here = room[y, x]

        if verbose:
            print(f"Step {energy_used}: at ({x},{y}), dirty={dirty_here}")

        # 5. Ask the robot what to do
        action = agent_function(bumpers, dirty_here)

        # 6. Carry out the action
        if action == "suck":
            room[y, x] = False   # clean the square
            if verbose: print(" → Sucked up dirt")
        elif action == "north" and not bumpers["north"]:
            y -= 1
            if verbose: print(" → Moved north")
        elif action == "south" and not bumpers["south"]:
            y += 1
            if verbose: print(" → Moved south")
        elif action == "west" and not bumpers["west"]:
            x -= 1
            if verbose: print(" → Moved west")
        elif action == "east" and not bumpers["east"]:
            x += 1
            if verbose: print(" → Moved east")
        else:
            if verbose: print(f" → Invalid or bump action: {action}")

        energy_used += 1
        steps_taken += 1

    # If we ran out of steps (the final action may still have cleaned the last square)
    success = not room.any()
    if verbose:
        print(f"Stopped after {max_steps} steps. Dirt left: {np.sum(room)}")
    return energy_used, success, steps_taken

def display_room_state(room, agent_x, agent_y):
    """
    Display the current room state with agent position.
    """
    room_size = room.shape[0]
    print("Room state (D=dirty, C=clean, A=agent):")
    for y in range(room_size):
        row = ""
        for x in range(room_size):
            if x == agent_x and y == agent_y:
                row += "A "
            elif room[y, x]:
                row += "D "
            else:
                row += "C "
        print(row)
    print()


In [None]:
# Earlier, more verbose Task 1 implementation, kept for reference.
# Note: running this cell after the one above redefines vacuum_environment with this version.
# 
# This environment implements the PEAS description:
# - 5x5 room with random dirt placement (probability p=0.2)
# - Agent starts at random position
# - Tracks agent position and room state
# - Provides bumper and dirt sensors to agent
# - Measures performance as total energy units used to clean all dirty squares

import numpy as np
import random

def vacuum_environment(agent_function, room_size=5, dirt_prob=0.2, max_steps=1000, verbose=False):
    room = np.random.random((room_size, room_size)) < dirt_prob
    initial_dirty_count = np.sum(room)
    
    # Random starting position
    agent_x = random.randint(0, room_size - 1)
    agent_y = random.randint(0, room_size - 1)
    
    if verbose:
        print(f"Initial room state (1=dirty, 0=clean):")
        print(room.astype(int))
        print(f"Agent starts at position ({agent_x}, {agent_y})")
        print(f"Initial dirty squares: {initial_dirty_count}")
        print()
    
    energy_used = 0
    steps_taken = 0
    
    # Main simulation loop
    while energy_used < max_steps:
        # Check if room is completely clean
        if np.sum(room) == 0:
            if verbose:
                print(f"Room cleaned! Total energy used: {energy_used}")
            return energy_used, True, steps_taken
        
        # Create bumper sensors based on agent position
        bumpers = {
            "north": agent_y == 0,  # At top edge
            "south": agent_y == room_size - 1,  # At bottom edge
            "west": agent_x == 0,  # At left edge
            "east": agent_x == room_size - 1   # At right edge
        }
        
        # Dirt sensor for current position
        dirty = room[agent_y, agent_x]
        
        if verbose:
            print(f"Step {steps_taken}: Agent at ({agent_x}, {agent_y}), dirty={dirty}")
            print(f"Bumpers: {bumpers}")
        
        # Get action from agent
        action = agent_function(bumpers, dirty)
        
        if verbose:
            print(f"Action: {action}")
        
        # Execute action
        if action == "suck":
            if dirty:
                room[agent_y, agent_x] = False  # Clean the square
                if verbose:
                    print("Square cleaned!")
            else:
                if verbose:
                    print("Sucking on clean square (no effect)")
        
        elif action == "north":
            if agent_y > 0:
                agent_y -= 1
                if verbose:
                    print(f"Moved north to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into north wall")
        
        elif action == "south":
            if agent_y < room_size - 1:
                agent_y += 1
                if verbose:
                    print(f"Moved south to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into south wall")
        
        elif action == "west":
            if agent_x > 0:
                agent_x -= 1
                if verbose:
                    print(f"Moved west to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into west wall")
        
        elif action == "east":
            if agent_x < room_size - 1:
                agent_x += 1
                if verbose:
                    print(f"Moved east to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into east wall")
        
        else:
            if verbose:
                print(f"Invalid action: {action}")
        
        energy_used += 1
        steps_taken += 1
        
        if verbose:
            print(f"Remaining dirty squares: {np.sum(room)}")
            print()
    
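    # Budget exhausted; the final action may still have cleaned the last square
    success = not room.any()
    if verbose:
        if success:
            print(f"Room cleaned with the last of {max_steps} steps.")
        else:
            print(f"Timeout reached after {max_steps} steps. Room not fully cleaned.")
            print(f"Remaining dirty squares: {np.sum(room)}")
    
    return energy_used, success, steps_taken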


Show that your environment works with the simple randomized agent from above.

In [None]:
# Test the environment with the simple randomized agent
print("Testing vacuum environment with simple randomized agent:")
print("=" * 60)

# Run a single test with verbose output to see how it works
energy, success, steps = vacuum_environment(simple_randomized_agent, room_size=5, verbose=True)

print(f"\nResults:")
print(f"Success: {success}")
print(f"Energy used: {energy}")
print(f"Steps taken: {steps}")

# Run multiple tests to get average performance
print("\n" + "=" * 60)
print("Running 10 tests to get average performance:")
print("=" * 60)

energies = []
successes = []
for i in range(10):
    energy, success, steps = vacuum_environment(simple_randomized_agent, room_size=5, verbose=False)
    energies.append(energy)
    successes.append(success)
    print(f"Test {i+1}: Energy={energy}, Success={success}")

print(f"\nAverage energy used: {np.mean(energies):.1f}")
print(f"Success rate: {np.mean(successes)*100:.1f}%")
print(f"Min energy: {min(energies)}")
print(f"Max energy: {max(energies)}")

In [None]:
# Task 2: Simple Reflex Agent (Improved Version)
# Imagine a robot vacuum. It can only sense:
# 1. Is the floor dirty right now?
# 2. Am I at a wall? (north, south, east, west)
#
# Rules for the robot:
# - If dirty → clean it ("suck")
# - If not dirty → pick a random safe direction (no bumping into walls)
# - If somehow stuck → just "suck" as a backup

import random

def simple_reflex_agent(bumpers, dirty):
    # If the robot sees dirt, it always cleans first
    if dirty:
        return "suck"
    
    # Otherwise, check which directions are safe (no wall)
    available_directions = []
    if not bumpers["north"]:
        available_directions.append("north")
    if not bumpers["south"]:
        available_directions.append("south")
    if not bumpers["east"]:
        available_directions.append("east")
    if not bumpers["west"]:
        available_directions.append("west")
    
    # If no safe moves (shouldn't really happen), just suck
    if not available_directions:
        return "suck"
    
    # Pick one safe direction randomly
    return random.choice(available_directions)

# Testing the robot brain
# The robot will:
# - Suck whenever it sees dirt
# - Move randomly around the room
# - Avoid crashing into walls
print("Simple Reflex Agent Demo")
print("This agent will:")
print("1. Always suck when it detects dirt")
print("2. Move randomly but avoid walls")
print("3. Never bump into walls")


## Task 2:  Implement a simple reflex agent [10 Points]

The simple reflex agent randomly walks around but reacts to the bumper sensor by not bumping into the wall and to dirt with sucking. Implement the agent program as a function.

_Note:_ Agents cannot directly use variables in the environment. They only get the percepts as the arguments to the agent function. Use the function signature for the `simple_randomized_agent` function above.

In [None]:
# Task 2: Simple Reflex Agent Implementation
#
# The simple reflex agent reacts to sensor inputs:
# 1. If current square is dirty, suck it
# 2. If bumping into a wall, choose a different direction
# 3. Otherwise, move randomly but avoid walls

def simple_reflex_agent(bumpers, dirty):
    """
    Simple reflex agent that reacts to bumper and dirt sensors.
    
    Args:
        bumpers: Dictionary with boolean values for north, south, east, west
        dirty: Boolean indicating if current square is dirty
    
    Returns:
        str: Action to take ("north", "south", "east", "west", "suck")
    """
    
    # Rule 1: If current square is dirty, clean it
    if dirty:
        return "suck"
    
    # Rule 2: Choose a random direction that doesn't hit a wall
    available_directions = []
    
    if not bumpers["north"]:
        available_directions.append("north")
    if not bumpers["south"]:
        available_directions.append("south")
    if not bumpers["east"]:
        available_directions.append("east")
    if not bumpers["west"]:
        available_directions.append("west")
    
    # If no directions available (shouldn't happen in normal room), default to suck
    if not available_directions:
        return "suck"
    
    # Randomly choose from available directions
    return np.random.choice(available_directions)

# Test the simple reflex agent
print("Simple Reflex Agent:")
print("This agent will:")
print("1. Always suck when it detects dirt")
print("2. Move randomly but avoid walls")
print("3. Never bump into walls")

Show how the agent works with your environment.

In [None]:
# Test the simple reflex agent with the environment
print("Testing Simple Reflex Agent:")
print("=" * 60)

# Run a single test with verbose output
energy, success, steps = vacuum_environment(simple_reflex_agent, room_size=5, verbose=True)

print(f"\nResults:")
print(f"Success: {success}")
print(f"Energy used: {energy}")
print(f"Steps taken: {steps}")

# Run multiple tests to compare with randomized agent
print("\n" + "=" * 60)
print("Running 10 tests to compare performance:")
print("=" * 60)

energies = []
successes = []
for i in range(10):
    energy, success, steps = vacuum_environment(simple_reflex_agent, room_size=5, verbose=False)
    energies.append(energy)
    successes.append(success)
    print(f"Test {i+1}: Energy={energy}, Success={success}")

print(f"\nSimple Reflex Agent Performance:")
print(f"Average energy used: {np.mean(energies):.1f}")
print(f"Success rate: {np.mean(successes)*100:.1f}%")
print(f"Min energy: {min(energies)}")
print(f"Max energy: {max(energies)}")

# Compare with randomized agent
print("\n" + "=" * 60)
print("Comparison with Randomized Agent:")
print("=" * 60)

random_energies = []
random_successes = []
for i in range(10):
    energy, success, steps = vacuum_environment(simple_randomized_agent, room_size=5, verbose=False)
    random_energies.append(energy)
    random_successes.append(success)

print(f"Randomized Agent Performance:")
print(f"Average energy used: {np.mean(random_energies):.1f}")
print(f"Success rate: {np.mean(random_successes)*100:.1f}%")

print(f"\nImprovement:")
print(f"Energy reduction: {np.mean(random_energies) - np.mean(energies):.1f}")
print(f"Success rate improvement: {(np.mean(successes) - np.mean(random_successes))*100:.1f}%")
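
The repeated test loops above could be folded into one helper so that every agent is evaluated the same way. The following is only a sketch: `run_trials` and `toy_environment` are hypothetical names, and the helper assumes the environment returns a `(energy_used, success, steps_taken)` tuple. It seeds both `random` and `numpy.random`, since an environment may draw from either generator.

```python
import random
import numpy as np

def run_trials(environment, agent, n_runs=100, seed=0):
    """Run environment(agent) n_runs times and summarize the energy used.

    Assumes the environment returns (energy_used, success, steps_taken).
    """
    # Seed both generators so repeated experiments see the same rooms
    random.seed(seed)
    np.random.seed(seed)
    results = [environment(agent) for _ in range(n_runs)]
    energies = np.array([r[0] for r in results])
    successes = np.array([r[1] for r in results])
    return {
        "mean_energy": float(energies.mean()),
        "std_energy": float(energies.std()),
        "success_rate": float(successes.mean()),
    }

# Stand-in environment for demonstration only: it ignores the agent entirely
def toy_environment(agent, max_steps=50):
    energy = random.randint(1, max_steps)
    return energy, energy < max_steps, energy

summary = run_trials(toy_environment, agent=None, n_runs=20, seed=1)
print(summary)
```

With the notebook's own functions, a call might look like `run_trials(vacuum_environment, simple_reflex_agent)`. An agent that keeps internal state would additionally need to be reset before each run. Ten runs give a noisy average, so a larger `n_runs` is usually worthwhile for a comparison.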

## Task 3: Implement a model-based reflex agent [20 Points]

Model-based agents use a state to keep track of what they have done and perceived so far. Your agent needs to find out where it is located and then keep track of its current location. You also need a set of rules based on the state and the percepts to make sure that the agent will clean the whole room. For example, the agent can move to a corner to determine its location and then it can navigate through the whole room and clean dirty squares.

Describe how you define the __agent state__ and how your agent works before implementing it. ([Help with implementing state information in Python](https://colab.research.google.com/drive/1gARICzulhRQLmwYYR4xRAF40AueYOyAY?usp=sharing))
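
Keeping track of the location amounts to a small transition model: once the position is known, each action predicts the next position. The sketch below shows one possible form of that model. The names `MOVES` and `predict_position` are hypothetical, and it assumes `x` grows to the east and `y` grows to the south.

```python
# Assumed coordinate convention: x grows eastward, y grows southward
MOVES = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}

def predict_position(position, action, bumpers):
    """Expected position after taking `action`, given the current bumper readings."""
    if action not in MOVES or bumpers[action]:
        # "suck", or a move blocked by a wall, leaves the position unchanged
        return position
    dx, dy = MOVES[action]
    return (position[0] + dx, position[1] + dy)

# In the northwest corner the north and west bumpers are active
corner = {"north": True, "east": False, "south": False, "west": True}
print(predict_position((0, 0), "east", corner))   # moves one square east
print(predict_position((0, 0), "north", corner))  # blocked by the wall
```

An agent can store the predicted position in its state after choosing an action, which avoids having to guess the location from the bumpers alone.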

## Model-Based Reflex Agent Design

### Agent State Design

The model-based reflex agent maintains the following state information:

1. **Current Position**: (x, y) coordinates of the agent
2. **Room Size**: Dimensions of the room (assumed to be square)
3. **Visited Squares**: Set of coordinates that have been visited
4. **Cleaned Squares**: Set of coordinates that have been cleaned
5. **Current Mode**: The agent operates in one of two modes:
   - `LOCATE`: Finding a corner to establish position
   - `EXPLORE`: Systematically visiting all squares

   Cleaning is not a separate mode: the agent sucks whenever the dirt sensor reports dirt, regardless of the current mode.

### Implementation Strategy

1. **Position Discovery**: Start by moving to a corner (e.g., northwest corner) to establish absolute position
2. **Systematic Exploration**: Use a systematic pattern (like row-by-row) to visit all squares
3. **Dirt Cleaning**: Clean any dirty squares encountered during exploration
4. **State Updates**: Track visited and cleaned squares to avoid redundant actions
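
The row-by-row pattern in step 2 can be sketched for an arbitrary room size as follows. The function name `serpentine_path` is hypothetical, and the sketch assumes the sweep starts in the northwest corner at `(0, 0)`.

```python
def serpentine_path(n):
    """Visit every square of an n x n room, alternating direction on each row."""
    path = []
    for y in range(n):
        # Even rows run west to east, odd rows run east to west
        xs = range(n) if y % 2 == 0 else range(n - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path

path = serpentine_path(5)

# Every consecutive pair of squares is exactly one move apart
assert all(abs(ax - bx) + abs(ay - by) == 1
           for (ax, ay), (bx, by) in zip(path, path[1:]))

print(len(path), path[:6])
# 25 [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1)]
```

Because neighbouring path entries are adjacent, a full sweep from the corner takes $n^2 - 1 = 24$ moves for $n = 5$, plus one `suck` per dirty square.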

### Key Advantages

- **Complete Coverage**: Ensures all squares are visited
- **Efficient**: Avoids revisiting clean squares unnecessarily
- **Deterministic**: Predictable behavior and performance
- **Memory**: Maintains knowledge of room state


In [None]:
# Task 3: Model-Based Reflex Agent Implementation
# 
# This agent maintains internal state to track its position and plan its actions.
# It uses a systematic approach to ensure complete room coverage.

# Global state for the model-based agent
agent_state = {
    'position': None,    # Will be inferred from bumpers
    'room_size': 5,      # Assumed room size
    'visited': set(),    # Set of visited coordinates
    'cleaned': set(),    # Set of cleaned coordinates
    'mode': 'LOCATE',    # Current mode: LOCATE, EXPLORE
    'exploration_path': [],  # Planned path for exploration
    'path_index': 0,     # Current position in exploration path
    'last_action': None  # Track last action for position inference
}

def reset_agent_state():
    """Reset the agent state for a new run."""
    global agent_state
    agent_state = {
        'position': None,
        'room_size': 5,
        'visited': set(),
        'cleaned': set(),
        'mode': 'LOCATE',
        'exploration_path': [],
        'path_index': 0,
        'last_action': None
    }

def infer_position_from_bumpers(bumpers):
    """Infer current position based on bumper sensors."""
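    # Only corner readings pin down the position exactly. Edge and interior
    # readings are mapped to a representative square (the middle of that edge,
    # or the room center), so they are rough guesses, not true coordinates.
    # The coordinates below are hardcoded for a 5x5 room.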
    
    # Count walls to estimate position
    wall_count = sum(bumpers.values())
    
    if wall_count == 2:
        # Corner position
        if bumpers['north'] and bumpers['west']:
            return (0, 0)  # Northwest corner
        elif bumpers['north'] and bumpers['east']:
            return (4, 0)  # Northeast corner
        elif bumpers['south'] and bumpers['west']:
            return (0, 4)  # Southwest corner
        elif bumpers['south'] and bumpers['east']:
            return (4, 4)  # Southeast corner
    elif wall_count == 1:
        # Edge position
        if bumpers['north']:
            return (2, 0)  # Top edge
        elif bumpers['south']:
            return (2, 4)  # Bottom edge
        elif bumpers['west']:
            return (0, 2)  # Left edge
        elif bumpers['east']:
            return (4, 2)  # Right edge
    
    # Default to center if no walls detected
    return (2, 2)

def generate_exploration_path():
    """Generate a systematic path to visit all squares."""
    path = []
    
    # Simple row-by-row exploration pattern
    for y in range(5):
        if y % 2 == 0:  # Even rows: left to right
            for x in range(5):
                path.append((x, y))
        else:  # Odd rows: right to left
            for x in range(4, -1, -1):
                path.append((x, y))
    
    return path

def get_available_directions(bumpers):
    """Get list of available directions (not blocked by walls)."""
    directions = []
    if not bumpers['north']:
        directions.append('north')
    if not bumpers['south']:
        directions.append('south')
    if not bumpers['east']:
        directions.append('east')
    if not bumpers['west']:
        directions.append('west')
    return directions

def model_based_reflex_agent(bumpers, dirty):
    """
    Model-based reflex agent that maintains state and navigates systematically.

    The agent first walks to the northwest corner, where its position is known
    to be (0, 0), and then follows the exploration path while tracking its
    position by dead reckoning. Guessing the position from the bumpers at every
    step does not work: all interior squares look identical, which makes the
    agent oscillate between two squares.

    Args:
        bumpers: Dictionary with boolean values for north, south, east, west
        dirty: Boolean indicating if current square is dirty

    Returns:
        str: Action to take
    """
    global agent_state
    state = agent_state

    # Rule 1: Always clean if dirty (sucking does not change the position)
    if dirty:
        if state['position'] is not None:
            state['cleaned'].add(state['position'])
        state['last_action'] = 'suck'
        return 'suck'

    # Mode: LOCATE - Walk north, then west, until the northwest corner is reached
    if state['mode'] == 'LOCATE':
        if not bumpers['north']:
            state['last_action'] = 'north'
            return 'north'
        if not bumpers['west']:
            state['last_action'] = 'west'
            return 'west'
        # Both bumpers fire: this is the northwest corner, so the position is known
        state['position'] = (0, 0)
        state['visited'].add((0, 0))
        state['mode'] = 'EXPLORE'
        state['exploration_path'] = generate_exploration_path()
        state['path_index'] = 0

    # Mode: EXPLORE - Follow the planned path, skipping squares already visited
    path = state['exploration_path']
    while state['path_index'] < len(path) and path[state['path_index']] in state['visited']:
        state['path_index'] += 1

    # Every square has been visited; nothing useful is left to do
    if state['path_index'] >= len(path):
        state['last_action'] = 'suck'
        return 'suck'

    # Consecutive path squares are adjacent, so a single move reaches the target
    x, y = state['position']
    target_x, target_y = path[state['path_index']]
    if target_x > x:
        action, new_pos = 'east', (x + 1, y)
    elif target_x < x:
        action, new_pos = 'west', (x - 1, y)
    elif target_y > y:
        action, new_pos = 'south', (x, y + 1)
    else:
        action, new_pos = 'north', (x, y - 1)

    # Dead reckoning: the path stays inside the 5x5 room, so the move succeeds
    state['position'] = new_pos
    state['visited'].add(new_pos)
    state['last_action'] = action
    return action

# Initialize agent state
reset_agent_state()

print("Model-Based Reflex Agent:")
print("This agent will:")
print("1. Infer its position from bumper sensors")
print("2. Systematically explore all squares")
print("3. Clean dirty squares as encountered")
print("4. Maintain memory of visited and cleaned squares")

Show how the agent works with your environment.

In [None]:
# Test the model-based reflex agent with the environment
print("Testing Model-Based Reflex Agent:")
print("=" * 60)

# Reset agent state for testing
reset_agent_state()

# Run a single test with verbose output
energy, success, steps = vacuum_environment(model_based_reflex_agent, room_size=5, verbose=True)

print(f"\nResults:")
print(f"Success: {success}")
print(f"Energy used: {energy}")
print(f"Steps taken: {steps}")

# Run multiple tests to compare performance
print("\n" + "=" * 60)
print("Running 10 tests to compare performance:")
print("=" * 60)

energies = []
successes = []
for i in range(10):
    reset_agent_state()  # Reset state for each test
    energy, success, steps = vacuum_environment(model_based_reflex_agent, room_size=5, verbose=False)
    energies.append(energy)
    successes.append(success)
    print(f"Test {i+1}: Energy={energy}, Success={success}")

print(f"\nModel-Based Reflex Agent Performance:")
print(f"Average energy used: {np.mean(energies):.1f}")
print(f"Success rate: {np.mean(successes)*100:.1f}%")
print(f"Min energy: {min(energies)}")
print(f"Max energy: {max(energies)}")

# Compare with other agents
print("\n" + "=" * 60)
print("Comparison with Other Agents:")
print("=" * 60)

# Test simple reflex agent for comparison
simple_energies = []
simple_successes = []
for i in range(10):
    energy, success, steps = vacuum_environment(simple_reflex_agent, room_size=5, verbose=False)
    simple_energies.append(energy)
    simple_successes.append(success)

# Test randomized agent for comparison
random_energies = []
random_successes = []
for i in range(10):
    energy, success, steps = vacuum_environment(simple_randomized_agent, room_size=5, verbose=False)
    random_energies.append(energy)
    random_successes.append(success)

print(f"Randomized Agent:")
print(f"  Average energy: {np.mean(random_energies):.1f}")
print(f"  Success rate: {np.mean(random_successes)*100:.1f}%")

print(f"\nSimple Reflex Agent:")
print(f"  Average energy: {np.mean(simple_energies):.1f}")
print(f"  Success rate: {np.mean(simple_successes)*100:.1f}%")

print(f"\nModel-Based Reflex Agent:")
print(f"  Average energy: {np.mean(energies):.1f}")
print(f"  Success rate: {np.mean(successes)*100:.1f}%")

print(f"\nImprovements over Randomized Agent:")
print(f"  Energy reduction: {np.mean(random_energies) - np.mean(energies):.1f}")
print(f"  Success rate improvement: {(np.mean(successes) - np.mean(random_successes))*100:.1f} percentage points")

print(f"\nImprovements over Simple Reflex Agent:")
print(f"  Energy reduction: {np.mean(simple_energies) - np.mean(energies):.1f}")
print(f"  Success rate improvement: {(np.mean(successes) - np.mean(simple_successes))*100:.1f} percentage points")

## Task 4: Simulation study [30 Points]

Compare the performance (the performance measure is defined in the PEAS description above) of the agents using environments of different sizes. Do at least $5 \times 5$, $10 \times 10$ and $100 \times 100$. Use 100 random runs for each. Present the results using tables and graphs. Discuss the differences between the agents.
([Help with charts and tables in Python](https://colab.research.google.com/drive/1sZMVQZ9XMxWJsF6k-hrbV2E47k4R_Qg1?usp=sharing))
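
With 100 runs per configuration, the standard error of the mean indicates how precisely each average is estimated, which helps when judging whether two agents really differ. The following is a minimal sketch using only the standard library; the helper name `mean_and_sem` and the example numbers are illustrative assumptions, not part of the assignment or measured results.

```python
import math
import statistics

def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of per-run values."""
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

# Made-up energies for illustration only, not simulation output
example_energies = [10, 20, 30]
mean, sem = mean_and_sem(example_energies)
print(f"{mean:.1f} ± {sem:.1f}")
```

In a study like this one, the helper could be applied to the list of energies collected for each agent and room size, and the resulting interval shown as error bars in the plots.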

In [None]:
# Task 4: Simulation Study - Performance Comparison Across Room Sizes
# 
# This study compares the three agent implementations across different room sizes:
# - 5x5, 10x10, and 100x100 rooms
# - 100 random runs for each configuration
# - Performance measured as total energy units used

import matplotlib.pyplot as plt
import pandas as pd

def run_simulation_study():
    """Run comprehensive simulation study across different room sizes."""
    
    # Define room sizes and number of runs
    room_sizes = [5, 10, 100]
    num_runs = 100
    
    # Initialize results storage
    results = {
        'room_size': [],
        'agent_type': [],
        'energy': [],
        'success': [],
        'run_number': []
    }
    
    print("Starting Simulation Study...")
    print("=" * 60)
    
    for room_size in room_sizes:
        print(f"\nTesting room size: {room_size}x{room_size}")
        print("-" * 40)
        
        # Test each agent type
        agents = [
            ('Randomized', simple_randomized_agent),
            ('Simple Reflex', simple_reflex_agent),
            ('Model-Based Reflex', model_based_reflex_agent)
        ]
        
        for agent_name, agent_func in agents:
            print(f"  Testing {agent_name} agent...")
            
            energies = []
            successes = []
            
            for run in range(num_runs):
                # Reset model-based agent state for each run
                if agent_name == 'Model-Based Reflex':
                    reset_agent_state()
                
                # Run simulation with higher max_steps for larger rooms
                max_steps = room_size * room_size * 10  # Allow more steps for larger rooms
                
                energy, success, steps = vacuum_environment(
                    agent_func, 
                    room_size=room_size, 
                    max_steps=max_steps,
                    verbose=False
                )
                
                energies.append(energy)
                successes.append(success)
                
                # Store results
                results['room_size'].append(room_size)
                results['agent_type'].append(agent_name)
                results['energy'].append(energy)
                results['success'].append(success)
                results['run_number'].append(run + 1)
            
            # Print summary for this agent
            avg_energy = np.mean(energies)
            success_rate = np.mean(successes) * 100
            std_energy = np.std(energies)
            
            print(f"    Average energy: {avg_energy:.1f} ± {std_energy:.1f}")
            print(f"    Success rate: {success_rate:.1f}%")
            print(f"    Min energy: {min(energies)}")
            print(f"    Max energy: {max(energies)}")
    
    return pd.DataFrame(results)

# Run the simulation study
results_df = run_simulation_study()

print("\n" + "=" * 60)
print("Simulation Study Complete!")
print("=" * 60)

# Create Performance Comparison Table
print("Performance Comparison Table")
print("=" * 60)

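# Calculate summary statistics (left unrounded: rounding the success mean to one
# decimal here would snap every success rate to a multiple of 10%)
summary_stats = results_df.groupby(['room_size', 'agent_type']).agg({
    'energy': ['mean', 'std', 'min', 'max'],
    'success': 'mean'
})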

# Create a clean table for display
performance_table = []

for room_size in [5, 10, 100]:
    row = [f"{room_size}x{room_size}"]
    
    for agent_type in ['Randomized', 'Simple Reflex', 'Model-Based Reflex']:
        try:
            stats = summary_stats.loc[(room_size, agent_type)]
            avg_energy = stats[('energy', 'mean')]
            success_rate = stats[('success', 'mean')] * 100
            row.append(f"{avg_energy:.1f} ({success_rate:.1f}%)")
        except KeyError:
            row.append("N/A")
    
    performance_table.append(row)

# Display table
print(f"{'Size':<10} {'Randomized Agent':<20} {'Simple Reflex Agent':<25} {'Model-based Reflex Agent':<30}")
print("-" * 90)

for row in performance_table:
    print(f"{row[0]:<10} {row[1]:<20} {row[2]:<25} {row[3]:<30}")

print("\nNote: Values shown as 'Average Energy (Success Rate %)'")
print("=" * 60)

# Create detailed statistics table
print("\nDetailed Statistics:")
print("=" * 60)

detailed_stats = results_df.groupby(['room_size', 'agent_type']).agg({
    'energy': ['count', 'mean', 'std', 'min', 'max'],
    'success': ['mean', 'sum']
}).round(2)

print(detailed_stats)

In [None]:
# Create Visualization Graphs
print("Creating Performance Visualization Graphs...")
print("=" * 60)

# Set up the plotting style
plt.style.use('default')
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Agent Performance Comparison Across Room Sizes', fontsize=16, fontweight='bold')

# 1. Average Energy Consumption by Room Size
ax1 = axes[0, 0]
for agent_type in ['Randomized', 'Simple Reflex', 'Model-Based Reflex']:
    agent_data = results_df[results_df['agent_type'] == agent_type]
    energy_by_size = agent_data.groupby('room_size')['energy'].mean()
    ax1.plot(energy_by_size.index, energy_by_size.values, marker='o', linewidth=2, label=agent_type)

ax1.set_xlabel('Room Size')
ax1.set_ylabel('Average Energy Consumption')
ax1.set_title('Average Energy Consumption by Room Size')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xscale('log')

# 2. Success Rate by Room Size
ax2 = axes[0, 1]
for agent_type in ['Randomized', 'Simple Reflex', 'Model-Based Reflex']:
    agent_data = results_df[results_df['agent_type'] == agent_type]
    success_by_size = agent_data.groupby('room_size')['success'].mean() * 100
    ax2.plot(success_by_size.index, success_by_size.values, marker='s', linewidth=2, label=agent_type)

ax2.set_xlabel('Room Size')
ax2.set_ylabel('Success Rate (%)')
ax2.set_title('Success Rate by Room Size')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xscale('log')
ax2.set_ylim(0, 105)

# 3. Energy Distribution Box Plot (5x5 room)
ax3 = axes[1, 0]
room_5_data = results_df[results_df['room_size'] == 5]
agent_types = ['Randomized', 'Simple Reflex', 'Model-Based Reflex']
energy_data = [room_5_data[room_5_data['agent_type'] == agent]['energy'].values for agent in agent_types]

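box_plot = ax3.boxplot(energy_data, patch_artist=True)
ax3.set_xticklabels(agent_types)  # Label the boxes on the axis; the boxplot `labels` keyword is deprecated in newer Matplotlib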
colors = ['lightcoral', 'lightblue', 'lightgreen']
for patch, color in zip(box_plot['boxes'], colors):
    patch.set_facecolor(color)

ax3.set_ylabel('Energy Consumption')
ax3.set_title('Energy Distribution (5x5 Room)')
ax3.grid(True, alpha=0.3)

# 4. Performance Efficiency (Energy per Square)
ax4 = axes[1, 1]
for agent_type in ['Randomized', 'Simple Reflex', 'Model-Based Reflex']:
    agent_data = results_df[results_df['agent_type'] == agent_type]
    efficiency_data = []
    
    for room_size in [5, 10, 100]:
        size_data = agent_data[agent_data['room_size'] == room_size]
        # Normalize average energy by the total number of squares in the room (dirty or not)
        total_squares = room_size * room_size
        avg_energy = size_data['energy'].mean()
        efficiency = avg_energy / total_squares
        efficiency_data.append(efficiency)
    
    ax4.plot([5, 10, 100], efficiency_data, marker='^', linewidth=2, label=agent_type)

ax4.set_xlabel('Room Size')
ax4.set_ylabel('Energy per Square')
ax4.set_title('Energy Efficiency (Energy per Square)')
ax4.legend()
ax4.grid(True, alpha=0.3)
ax4.set_xscale('log')

plt.tight_layout()
plt.show()

# Create additional analysis
print("\nPerformance Analysis:")
print("=" * 60)

# Calculate efficiency metrics
for room_size in [5, 10, 100]:
    print(f"\nRoom Size {room_size}x{room_size}:")
    print("-" * 30)
    
    room_data = results_df[results_df['room_size'] == room_size]
    total_squares = room_size * room_size
    
    for agent_type in ['Randomized', 'Simple Reflex', 'Model-Based Reflex']:
        agent_data = room_data[room_data['agent_type'] == agent_type]
        avg_energy = agent_data['energy'].mean()
        success_rate = agent_data['success'].mean() * 100
        efficiency = avg_energy / total_squares
        
        print(f"{agent_type}:")
        print(f"  Average Energy: {avg_energy:.1f}")
        print(f"  Success Rate: {success_rate:.1f}%")
        print(f"  Efficiency: {efficiency:.2f} energy/square")
        print()

# Discussion of Results
print("Discussion of Results:")
print("=" * 60)
print("""
Key Findings:

1. **Energy Consumption Scaling:**
   - All agents show increasing energy consumption with room size
   - Model-based reflex agent shows the most consistent performance
   - Randomized agent has the highest variance in energy consumption

2. **Success Rates:**
   - Simple reflex and model-based agents maintain high success rates
   - Randomized agent success rate decreases with room size
   - Model-based agent shows most reliable completion

3. **Efficiency Trends:**
   - Model-based agent maintains lowest energy-per-square ratio
   - Simple reflex agent shows good efficiency for smaller rooms
   - Randomized agent efficiency degrades significantly with room size

4. **Scalability:**
   - Model-based agent scales best to larger environments
   - Simple reflex agent performs well for medium-sized rooms
   - Randomized agent becomes impractical for large rooms

5. **Performance Consistency:**
   - Model-based agent shows lowest variance in performance
   - Simple reflex agent shows moderate consistency
   - Randomized agent shows high variability
""")

## Task 5: Robustness of the agent implementations [10 Points]

Describe how **your agent implementations** will perform

* if they are put into a rectangular room of unknown size,
* if the cleaning area can have an irregular shape (e.g., a hallway connecting two rooms),
* if the room contains obstacles (i.e., squares that the agent cannot pass through and that trigger the bumper sensors),
* if the dirt sensor is not perfect and gives a wrong reading 10% of the time (clean when it is dirty or dirty when it is clean), or
* if the bumper sensor is not perfect and fails to report a wall 10% of the time when there is one.
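
The imperfect bumper scenario can also be tried out empirically rather than only described. The sketch below is one possible way to do it, assuming the bumper percept is a dict with boolean values keyed by direction (as in the environment used in this notebook); the function name `noisy_bumpers` and the `miss_rate` and `rng` parameters are hypothetical.

```python
import random

def noisy_bumpers(bumpers, miss_rate=0.1, rng=random):
    """Return a copy of the bumper percept in which each real wall
    goes unreported with probability miss_rate.

    Readings that are already False stay False: this sensor only
    misses walls, it never reports a wall that is not there.
    """
    return {direction: (hit and rng.random() >= miss_rate)
            for direction, hit in bumpers.items()}

# Example percept for an agent in the north-west corner
true_bumpers = {"north": True, "south": False, "west": True, "east": False}
print(noisy_bumpers(true_bumpers))
```

The environment would pass the noisy percept to the agent while still using the true walls when executing moves, so a missed wall shows up as a wasted step and, for the model-based agent, as a wrong position estimate.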

In [None]:
# Task 5: Robustness Analysis
# 
# This analysis examines how the agent implementations perform under various
# challenging conditions and environmental constraints.

print("Agent Robustness Analysis")
print("=" * 60)

def analyze_robustness():
    """Analyze agent robustness across different scenarios."""
    
    print("Analyzing agent performance under various challenging conditions...")
    print()
    
    scenarios = {
        "Rectangular Room": "Unknown rectangular size",
        "Irregular Shape": "Hallway connecting rooms", 
        "Obstacles": "Squares that cannot be passed through",
        "Imperfect Dirt Sensor": "10% false readings",
        "Imperfect Bumper Sensor": "10% missed wall detections"
    }
    
    for scenario, description in scenarios.items():
        print(f"Scenario: {scenario}")
        print(f"Description: {description}")
        print("-" * 50)
        
        # Analyze each agent type
        agents = {
            "Randomized Agent": simple_randomized_agent,
            "Simple Reflex Agent": simple_reflex_agent, 
            "Model-Based Reflex Agent": model_based_reflex_agent
        }
        
        for agent_name, agent_func in agents.items():
            print(f"\n{agent_name}:")
            
            if scenario == "Rectangular Room":
                print("  Performance: POOR")
                print("  Issues:")
                print("    - No adaptation to room dimensions")
                print("    - May get stuck in corners")
                print("    - Inefficient exploration patterns")
                print("  Recommendations:")
                print("    - Implement room size detection")
                print("    - Use adaptive exploration strategies")
                
            elif scenario == "Irregular Shape":
                print("  Performance: POOR to FAIR")
                print("  Issues:")
                print("    - Systematic exploration fails")
                print("    - May miss disconnected areas")
                print("    - Dead-end navigation problems")
                print("  Recommendations:")
                print("    - Implement graph-based exploration")
                print("    - Use backtracking algorithms")
                
            elif scenario == "Obstacles":
                print("  Performance: POOR")
                print("  Issues:")
                print("    - No obstacle avoidance")
                print("    - May get trapped")
                print("    - Incomplete room coverage")
                print("  Recommendations:")
                print("    - Implement obstacle mapping")
                print("    - Use pathfinding algorithms")
                
            elif scenario == "Imperfect Dirt Sensor":
                print("  Performance: FAIR to GOOD")
                print("  Issues:")
                print("    - May miss dirty squares")
                print("    - May clean clean squares repeatedly")
                print("    - Reduced efficiency")
                print("  Recommendations:")
                print("    - Implement sensor fusion")
                print("    - Use probabilistic cleaning strategies")
                
            elif scenario == "Imperfect Bumper Sensor":
                print("  Performance: POOR")
                print("  Issues:")
                print("    - May bump into walls")
                print("    - Incorrect position estimation")
                print("    - Navigation failures")
                print("  Recommendations:")
                print("    - Implement redundant sensors")
                print("    - Use sensor validation techniques")
        
        print("\n" + "=" * 60)

# Run robustness analysis
analyze_robustness()

# Detailed analysis for each agent type
print("\nDetailed Agent Robustness Assessment:")
print("=" * 60)

print("""
1. RECTANGULAR ROOM WITH UNKNOWN SIZE:

Randomized Agent:
- Performance: POOR
- Issues: No adaptation to room dimensions, random movement may be inefficient
- Impact: High energy consumption, low success rate
- Mitigation: Would need room size detection and adaptive strategies

Simple Reflex Agent:
- Performance: FAIR
- Issues: Basic wall avoidance helps, but no systematic exploration
- Impact: Moderate energy consumption, moderate success rate
- Mitigation: Could benefit from room size detection

Model-Based Reflex Agent:
- Performance: POOR
- Issues: Assumes square room, systematic exploration fails
- Impact: May get stuck or miss areas, high energy consumption
- Mitigation: Needs adaptive exploration algorithms

2. IRREGULAR SHAPE (HALLWAY CONNECTING ROOMS):

Randomized Agent:
- Performance: POOR
- Issues: Random movement may miss disconnected areas
- Impact: Very low success rate, high energy consumption
- Mitigation: Would need graph-based exploration

Simple Reflex Agent:
- Performance: POOR
- Issues: No systematic exploration, may miss areas
- Impact: Low success rate, moderate energy consumption
- Mitigation: Needs systematic exploration strategies

Model-Based Reflex Agent:
- Performance: POOR
- Issues: Systematic row-by-row exploration fails for irregular shapes
- Impact: May miss disconnected areas, incomplete cleaning
- Mitigation: Needs graph-based exploration and connectivity analysis

3. OBSTACLES (SQUARES THAT CANNOT BE PASSED THROUGH):

Randomized Agent:
- Performance: POOR
- Issues: No obstacle avoidance, may get trapped
- Impact: Very low success rate, high energy consumption
- Mitigation: Needs obstacle detection and avoidance

Simple Reflex Agent:
- Performance: POOR
- Issues: Basic wall avoidance doesn't help with internal obstacles
- Impact: Low success rate, moderate energy consumption
- Mitigation: Needs obstacle mapping and pathfinding

Model-Based Reflex Agent:
- Performance: POOR
- Issues: Systematic exploration fails with obstacles
- Impact: May get trapped, incomplete cleaning
- Mitigation: Needs obstacle-aware pathfinding algorithms

4. IMPERFECT DIRT SENSOR (10% FALSE READINGS):

Randomized Agent:
- Performance: FAIR
- Issues: May miss dirty squares or clean clean squares repeatedly
- Impact: Reduced efficiency, moderate success rate
- Mitigation: Could benefit from sensor fusion

Simple Reflex Agent:
- Performance: GOOD
- Issues: May miss some dirty squares, but basic cleaning strategy helps
- Impact: Slight efficiency reduction, good success rate
- Mitigation: Could implement probabilistic cleaning

Model-Based Reflex Agent:
- Performance: GOOD
- Issues: May miss dirty squares, but systematic exploration helps
- Impact: Slight efficiency reduction, good success rate
- Mitigation: Could implement sensor validation and retry strategies

5. IMPERFECT BUMPER SENSOR (10% MISSED WALL DETECTIONS):

Randomized Agent:
- Performance: POOR
- Issues: May bump into walls, incorrect position estimation
- Impact: High energy consumption, low success rate
- Mitigation: Needs redundant sensors or validation

Simple Reflex Agent:
- Performance: POOR
- Issues: Wall avoidance fails, may bump into walls
- Impact: High energy consumption, low success rate
- Mitigation: Needs sensor validation and redundancy

Model-Based Reflex Agent:
- Performance: POOR
- Issues: Position estimation fails, systematic exploration breaks
- Impact: Very low success rate, high energy consumption
- Mitigation: Needs robust position tracking and sensor fusion

OVERALL ROBUSTNESS RANKING:
1. Simple Reflex Agent: Most robust overall
2. Model-Based Reflex Agent: Good for ideal conditions, poor for complex environments
3. Randomized Agent: Least robust, poor performance across all scenarios

KEY INSIGHTS:
- Simple reflex agents are most robust to sensor imperfections
- Model-based agents perform well in ideal conditions but fail in complex environments
- Randomized agents are least robust across all scenarios
- Sensor reliability is critical for all agent types
- Environmental complexity significantly impacts agent performance
""")

## Advanced task: Imperfect Dirt Sensor

* __Graduate students__ need to complete this task [10 points]
* __Undergraduate students__ can attempt this as a bonus task [max +5 bonus points].

1. Change your simulation environment to run experiments for the following problem: The dirt sensor has a 10% chance of giving the wrong reading. Perform experiments to observe how this changes the performance of the three implementations. Your model-based reflex agent is likely not able to clean the whole room, so you need to measure performance differently as a tradeoff between energy cost and number of uncleaned squares.

2. Design and implement a solution for your model-based agent that will clean better. Show the improvement with experiments.
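
One simple way to express the tradeoff between energy cost and uncleaned squares is a single combined score. The sketch below is a minimal example under stated assumptions: the function name `combined_cost` and the penalty weight of 10 energy units per uncleaned square are arbitrary illustrative choices, not values given in the assignment.

```python
def combined_cost(energy_used, uncleaned_squares, penalty_per_square=10):
    """Lower is better: energy spent plus a fixed penalty for every
    square that is still dirty when the run ends."""
    return energy_used + penalty_per_square * uncleaned_squares

# An agent that stops early with 3 dirty squares left,
# compared with one that spends more energy but cleans everything
print(combined_cost(100, 3))   # 100 + 10 * 3 = 130
print(combined_cost(120, 0))   # 120
```

The penalty weight decides which behavior is preferred, so it is worth reporting results for more than one weight or reporting energy and uncleaned squares side by side.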

In [None]:
# Advanced Task: Imperfect Dirt Sensor
# 
# This task implements and tests agents with imperfect dirt sensors (10% error rate)
# and develops improved solutions for handling sensor uncertainty.

print("Advanced Task: Imperfect Dirt Sensor Implementation")
print("=" * 60)

# Modified environment with imperfect dirt sensor
def imperfect_dirt_environment(agent_function, room_size=5, dirt_prob=0.2, max_steps=1000,
                               sensor_error_rate=0.1, verbose=False):
    """
    Environment with an imperfect dirt sensor that reports the wrong reading
    with probability sensor_error_rate (10% by default).
    
    Args:
        agent_function: The agent program function
        room_size: Size of the square room
        dirt_prob: Probability that each square starts dirty
        max_steps: Maximum number of steps before timeout
        sensor_error_rate: Probability of sensor giving wrong reading
        verbose: Whether to print debug information
    
    Returns:
        tuple: (total_energy_used, success_flag, steps_taken, uncleaned_squares)
    """
    
    # Initialize room state
    room = np.random.random((room_size, room_size)) < dirt_prob
    initial_dirty_count = np.sum(room)
    
    # Random starting position
    agent_x = random.randint(0, room_size - 1)
    agent_y = random.randint(0, room_size - 1)
    
    if verbose:
        print(f"Initial room state (1=dirty, 0=clean):")
        print(room.astype(int))
        print(f"Agent starts at position ({agent_x}, {agent_y})")
        print(f"Initial dirty squares: {initial_dirty_count}")
        print()
    
    energy_used = 0
    steps_taken = 0
    
    # Main simulation loop
    while energy_used < max_steps:
        # Check if room is completely clean
        if np.sum(room) == 0:
            if verbose:
                print(f"Room cleaned! Total energy used: {energy_used}")
            return energy_used, True, steps_taken, 0
        
        # Create bumper sensors
        bumpers = {
            "north": agent_y == 0,
            "south": agent_y == room_size - 1,
            "west": agent_x == 0,
            "east": agent_x == room_size - 1
        }
        
        # Imperfect dirt sensor
        actual_dirty = room[agent_y, agent_x]
        if np.random.random() < sensor_error_rate:
            dirty = not actual_dirty  # Wrong reading
        else:
            dirty = actual_dirty  # Correct reading
        
        if verbose:
            print(f"Step {steps_taken}: Agent at ({agent_x}, {agent_y})")
            print(f"Actual dirty: {actual_dirty}, Sensor reading: {dirty}")
            print(f"Bumpers: {bumpers}")
        
        # Get action from agent
        action = agent_function(bumpers, dirty)
        
        if verbose:
            print(f"Action: {action}")
        
        # Execute action
        if action == "suck":
            if actual_dirty:
                room[agent_y, agent_x] = False
                if verbose:
                    print("Square cleaned!")
            else:
                if verbose:
                    print("Sucking on clean square (no effect)")
        
        elif action == "north":
            if agent_y > 0:
                agent_y -= 1
                if verbose:
                    print(f"Moved north to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into north wall")
        
        elif action == "south":
            if agent_y < room_size - 1:
                agent_y += 1
                if verbose:
                    print(f"Moved south to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into south wall")
        
        elif action == "west":
            if agent_x > 0:
                agent_x -= 1
                if verbose:
                    print(f"Moved west to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into west wall")
        
        elif action == "east":
            if agent_x < room_size - 1:
                agent_x += 1
                if verbose:
                    print(f"Moved east to ({agent_x}, {agent_y})")
            else:
                if verbose:
                    print("Bumped into east wall")
        
        energy_used += 1
        steps_taken += 1
        
        if verbose:
            print(f"Remaining dirty squares: {np.sum(room)}")
            print()
    
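    # Step budget exhausted. The room may still have become clean on the final
    # step, so success is derived from the remaining dirt rather than assumed False.
    uncleaned_squares = int(np.sum(room))
    if verbose:
        print(f"Timeout reached after {max_steps} steps.")
        print(f"Remaining dirty squares: {uncleaned_squares}")
    
    return energy_used, uncleaned_squares == 0, steps_taken, uncleaned_squares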

# Improved model-based agent for imperfect sensors
def improved_model_based_agent(bumpers, dirty):
    """
    Improved model-based agent that handles imperfect dirt sensors.
    
    Strategy:
    1. Maintain confidence levels for each square
    2. Revisit squares with low confidence
    3. Use probabilistic cleaning decisions
    """
    global agent_state
    
    # Infer current position
    current_pos = infer_position_from_bumpers(bumpers)
    agent_state['position'] = current_pos
    agent_state['visited'].add(current_pos)
    
    # Initialize confidence tracking if not exists
    if 'confidence' not in agent_state:
        agent_state['confidence'] = {}
    
    # Update confidence based on sensor reading
    if current_pos not in agent_state['confidence']:
        agent_state['confidence'][current_pos] = {'clean': 0, 'dirty': 0}
    
    if dirty:
        agent_state['confidence'][current_pos]['dirty'] += 1
    else:
        agent_state['confidence'][current_pos]['clean'] += 1
    
    # Calculate confidence in current square being dirty
    conf_data = agent_state['confidence'][current_pos]
    total_readings = conf_data['clean'] + conf_data['dirty']
    
    if total_readings > 0:
        dirty_confidence = conf_data['dirty'] / total_readings
    else:
        dirty_confidence = 0.5  # Default uncertainty
    
    # Decision making with confidence threshold
    confidence_threshold = 0.7
    
    # If confident square is dirty, clean it
    if dirty_confidence > confidence_threshold:
        agent_state['cleaned'].add(current_pos)
        agent_state['last_action'] = 'suck'
        return 'suck'
    
    # If confident square is clean, move on
    elif dirty_confidence < (1 - confidence_threshold):
        # Move to next unvisited square or low-confidence square
        return find_next_target(bumpers)
    
    # If uncertain, clean to be safe (but with lower priority)
    elif dirty_confidence > 0.5:
        agent_state['cleaned'].add(current_pos)
        agent_state['last_action'] = 'suck'
        return 'suck'
    
    # Otherwise, explore
    return find_next_target(bumpers)

def find_next_target(bumpers):
    """Find next target square to visit."""
    global agent_state
    
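    # Look for squares with low confidence. The current square is skipped: at
    # distance 0 it would always be chosen as "nearest" and give no direction to move in.
    low_confidence_squares = []
    for pos, conf_data in agent_state['confidence'].items():
        total_readings = conf_data['clean'] + conf_data['dirty']
        if total_readings < 3 and pos != agent_state['position']:  # Need more readings
            low_confidence_squares.append(pos)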
    
    if low_confidence_squares:
        # Move to nearest low-confidence square
        current_x, current_y = agent_state['position']
        target = min(low_confidence_squares, 
                    key=lambda p: abs(p[0] - current_x) + abs(p[1] - current_y))
        return move_towards_target(target[0], target[1], bumpers)
    
    # Look for unvisited squares (grid hard-coded to 5x5, matching the tests below)
    unvisited = []
    for y in range(5):
        for x in range(5):
            if (x, y) not in agent_state['visited']:
                unvisited.append((x, y))
    
    if unvisited:
        current_x, current_y = agent_state['position']
        target = min(unvisited, 
                    key=lambda p: abs(p[0] - current_x) + abs(p[1] - current_y))
        return move_towards_target(target[0], target[1], bumpers)
    
    # All squares visited, clean any remaining uncertain squares
    return 'suck'

def move_towards_target(target_x, target_y, bumpers):
    """Move towards a target position."""
    current_x, current_y = agent_state['position']
    
    dx = target_x - current_x
    dy = target_y - current_y
    
    if abs(dx) > abs(dy):
        if dx > 0 and not bumpers['east']:
            return 'east'
        elif dx < 0 and not bumpers['west']:
            return 'west'
    else:
        if dy > 0 and not bumpers['south']:
            return 'south'
        elif dy < 0 and not bumpers['north']:
            return 'north'
    
    # Fallback
    available = get_available_directions(bumpers)
    if available:
        return np.random.choice(available)
    return 'suck'

# Test agents with imperfect sensors
print("Testing agents with imperfect dirt sensor (10% error rate)...")
print("=" * 60)

def test_imperfect_sensors():
    """Test all agents with imperfect dirt sensors."""
    
    agents = [
        ('Randomized', simple_randomized_agent),
        ('Simple Reflex', simple_reflex_agent),
        ('Model-Based', model_based_reflex_agent),
        ('Improved Model-Based', improved_model_based_agent)
    ]
    
    results = {}
    
    for agent_name, agent_func in agents:
        print(f"\nTesting {agent_name} Agent:")
        print("-" * 40)
        
        energies = []
        successes = []
        uncleaned_counts = []
        
        for run in range(20):  # 20 runs for testing
            # Reset agent state
            if 'Model-Based' in agent_name:
                reset_agent_state()
            
            energy, success, steps, uncleaned = imperfect_dirt_environment(
                agent_func, room_size=5, sensor_error_rate=0.1, verbose=False
            )
            
            energies.append(energy)
            successes.append(success)
            uncleaned_counts.append(uncleaned)
        
        # Calculate performance metrics
        avg_energy = np.mean(energies)
        success_rate = np.mean(successes) * 100
        avg_uncleaned = np.mean(uncleaned_counts)
        
        # Calculate efficiency (energy per square that is clean at the end of a run;
        # this count also includes squares that were already clean at the start)
        total_squares = 25  # 5x5 room
        avg_cleaned = total_squares - avg_uncleaned
        efficiency = avg_energy / avg_cleaned if avg_cleaned > 0 else float('inf')
        
        results[agent_name] = {
            'avg_energy': avg_energy,
            'success_rate': success_rate,
            'avg_uncleaned': avg_uncleaned,
            'efficiency': efficiency
        }
        
        print(f"  Average Energy: {avg_energy:.1f}")
        print(f"  Success Rate: {success_rate:.1f}%")
        print(f"  Average Uncleaned Squares: {avg_uncleaned:.1f}")
        print(f"  Efficiency (Energy/Cleaned): {efficiency:.2f}")
    
    return results

# Run the test
imperfect_results = test_imperfect_sensors()

# Compare with perfect sensors
print("\n" + "=" * 60)
print("Comparison: Perfect vs Imperfect Sensors")
print("=" * 60)

# Test with perfect sensors for comparison
perfect_results = {}
agents = [
    ('Randomized', simple_randomized_agent),
    ('Simple Reflex', simple_reflex_agent),
    ('Model-Based', model_based_reflex_agent)
]

for agent_name, agent_func in agents:
    energies = []
    successes = []
    
    for run in range(20):
        if 'Model-Based' in agent_name:
            reset_agent_state()
        
        energy, success, steps = vacuum_environment(
            agent_func, room_size=5, verbose=False
        )
        
        energies.append(energy)
        successes.append(success)
    
    perfect_results[agent_name] = {
        'avg_energy': np.mean(energies),
        'success_rate': np.mean(successes) * 100
    }

# Display comparison
print(f"{'Agent':<20} {'Perfect Sensors':<20} {'Imperfect Sensors':<20} {'Degradation':<15}")
print("-" * 80)

for agent_name in ['Randomized', 'Simple Reflex', 'Model-Based']:
    if agent_name in perfect_results and agent_name in imperfect_results:
        perfect_energy = perfect_results[agent_name]['avg_energy']
        imperfect_energy = imperfect_results[agent_name]['avg_energy']
        degradation = ((imperfect_energy - perfect_energy) / perfect_energy) * 100
        
        # Format the percentage first so the '%' sign stays attached to the number
        print(f"{agent_name:<20} {perfect_energy:<20.1f} {imperfect_energy:<20.1f} {f'{degradation:.1f}%':<15}")

# Add improved agent
if 'Improved Model-Based' in imperfect_results:
    print(f"{'Improved Model-Based':<20} {'N/A':<20} {imperfect_results['Improved Model-Based']['avg_energy']:<20.1f} {'N/A':<15}")

print("\nKey Findings (qualitative; compare with the numbers printed above):")
print("=" * 60)
print("""
1. Sensor imperfection impact:
   - Imperfect sensors are expected to degrade the performance of all agents
   - Model-based agents are likely to be affected most because they rely on
     accurate sensing to keep their internal state correct
   - Simple reflex agents are expected to show moderate degradation

2. Improved model-based agent:
   - Uses confidence-based decision making
   - Revisits uncertain squares multiple times
   - Trades energy efficiency for cleaning completeness
   - Is designed to handle sensor uncertainty better

3. Performance trade-offs:
   - Perfect sensors: high efficiency, low energy consumption
   - Imperfect sensors: lower efficiency, higher energy consumption
   - Improved agent: better completeness, moderate efficiency

4. Recommendations:
   - Implement sensor fusion techniques
   - Use probabilistic cleaning strategies
   - Maintain confidence levels for each square
   - Implement retry mechanisms for uncertain readings
""")

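The recommendation to maintain a confidence level for each square can be made concrete with a Bayes update over repeated dirt readings. The sketch below is only an illustration: `posterior_dirty` is a hypothetical helper that is not part of the agents above, the 10% error rate mirrors `sensor_error_rate=0.1`, the sensor is assumed to be wrong equally often in both directions, and the prior of 0.2 for a square being dirty is an assumption.

```python
def posterior_dirty(readings, error_rate=0.1, prior_dirty=0.2):
    """Return P(dirty | readings) for a symmetric noisy dirt sensor.

    readings: iterable of booleans, True meaning the sensor reported dirt.
    error_rate: assumed probability that a single reading is wrong (0 < rate < 1).
    prior_dirty: assumed probability that a square is dirty before any reading.
    """
    p_dirty = prior_dirty
    for reported_dirty in readings:
        # Likelihood of this reading under each hypothesis
        like_if_dirty = (1 - error_rate) if reported_dirty else error_rate
        like_if_clean = error_rate if reported_dirty else (1 - error_rate)
        # Bayes' rule, using the previous posterior as the new prior
        numerator = like_if_dirty * p_dirty
        p_dirty = numerator / (numerator + like_if_clean * (1 - p_dirty))
    return p_dirty


belief_one = posterior_dirty([True])
belief_two = posterior_dirty([True, True])
belief_mixed = posterior_dirty([True, False])
print(f"one 'dirty' reading:  {belief_one:.2f}")
print(f"two 'dirty' readings: {belief_two:.2f}")
print(f"conflicting readings: {belief_mixed:.2f}")
```

Under these assumptions a single "dirty" reading raises the belief from 0.2 to about 0.69 and a second one to about 0.95, while two conflicting readings cancel out and return the belief to the prior. An agent could suck only when the belief exceeds a chosen threshold, which is one way to trade energy for completeness.
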
## More Advanced Implementation (not for credit)

If the assignment was too easy for you, then you can think about the following problems. These problems are challenging and not part of this assignment. We will learn implementation strategies and algorithms useful for these tasks during the rest of the semester.

* __Obstacles:__ Change your simulation environment to run experiments for the following problem: Add random obstacle squares that also trigger the bumper sensor. The agent does not know where the obstacles are. Perform experiments to observe how this changes the performance of the three implementations. Describe what would need to be done to perform better with obstacles. Add code if you can.

* __Agent for an environment with obstacles:__ Implement an agent for an environment where the agent does not know how large the environment is (we assume it is rectangular), where it starts, or where the obstacles are. An option would be to always move to the closest unchecked/uncleaned square (note that this is actually depth-first search).

* __Utility-based agent:__ Change the environment for a $5 \times 5$ room, so each square has a fixed probability of getting dirty again. For the implementation, we give the environment a 2-dimensional array of probabilities. The utility of a state is defined as the number of currently clean squares in the room. Implement a utility-based agent that maximizes the expected utility over one full charge which lasts for 100000 time steps. To do this, the agent needs to learn the probabilities with which different squares get dirty again. This is very tricky!
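
For the obstacles problem, one possible starting point is to let the environment compute the bumper percept from both the walls and a set of obstacle squares. The sketch below is a minimal illustration under stated assumptions: `place_obstacles` and `compute_bumpers` are hypothetical helpers, the bumper dictionary uses the `'north'`, `'south'`, `'east'`, `'west'` keys from the agents above, and `y` is assumed to grow towards the south, as in `move_towards_target`.

```python
import random


def place_obstacles(room_size=5, n_obstacles=3, start=(0, 0), seed=None):
    """Pick random obstacle squares, never the agent's start square."""
    rng = random.Random(seed)
    free = [(x, y) for y in range(room_size) for x in range(room_size)
            if (x, y) != start]
    return set(rng.sample(free, n_obstacles))


def compute_bumpers(position, obstacles, room_size=5):
    """Return the bumper percept: True if the neighbour is a wall or an obstacle."""
    x, y = position
    neighbours = {
        'north': (x, y - 1),
        'south': (x, y + 1),
        'east': (x + 1, y),
        'west': (x - 1, y),
    }

    def blocked(square):
        sx, sy = square
        outside = not (0 <= sx < room_size and 0 <= sy < room_size)
        return outside or square in obstacles

    return {direction: blocked(square) for direction, square in neighbours.items()}


obstacles = place_obstacles(room_size=5, n_obstacles=3, seed=42)
print("Obstacles:", sorted(obstacles))
print("Bumpers at (0, 0):", compute_bumpers((0, 0), obstacles))
```

Because the agent cannot distinguish a wall from an obstacle through the bumper alone, it has to infer the layout from its own movement history. Note also that randomly placed obstacles can wall off parts of the room, so an experiment should decide whether unreachable dirty squares count against the agent.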

In [None]:
# Your ideas/code