# Rational Agents

## Agent with internal world model

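This code demonstrates a world model agent (also known as a model-based reflex agent) that simulates a thermostat.

The agent:
- interprets each raw sensor reading as an accurate temperature (the state);
- maintains an internal world model: the inferred temperature trend ("rising", "falling", or "stable");
- decides actions ("cool", "heat", or "off") based on condition-action rules.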

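**Formula**:

$$ f_{\text{model}}: \mathcal{O}^T \times \mathcal{M} \to \mathcal{X} \times \mathcal{M} \to \mathcal{A} $$

That is, the agent combines the observation history $\mathcal{O}^T$ with its current world model $\mathcal{M}$ to infer a state in $\mathcal{X}$ and an updated model, and then maps the state and model to an action in $\mathcal{A}$.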
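The two stages in the formula can be written as separate functions. The sketch below is a restructured, pure-function version of the agent's logic rather than the notebook's own code: `interpret`, `decide`, `f_model`, and `RULES` are hypothetical names, and it keeps the notebook's assumption that the sensor reads 2 degrees high.

```python
from typing import Dict, List, Tuple

# Assumption carried over from the notebook: the sensor reads 2 degrees high.
SENSOR_BIAS = 2

# Condition-action rules: inferred trend -> action
RULES: Dict[str, str] = {"rising": "cool", "falling": "heat", "stable": "off"}


def interpret(observations: List[float], model: List[str]) -> Tuple[float, List[str]]:
    """Stage 1, O^T x M -> X x M: infer the current state and append the new trend."""
    state = observations[-1] - SENSOR_BIAS
    trend = "stable"
    if len(observations) >= 2:
        prev = observations[-2] - SENSOR_BIAS
        if state > prev:
            trend = "rising"
        elif state < prev:
            trend = "falling"
    return state, model + [trend]


def decide(state: float, model: List[str]) -> str:
    """Stage 2, X x M -> A: condition-action rules on the latest trend.

    `state` is unused by these simple rules; a goal-based agent would need it.
    """
    return RULES[model[-1]]


def f_model(observations: List[float], model: List[str]) -> Tuple[str, List[str]]:
    """Compose the two stages into one agent step."""
    state, model = interpret(observations, model)
    return decide(state, model), model


action, model = f_model([16, 14], ["stable"])
print(action, model)  # heat ['stable', 'falling']
```

Returning the updated model instead of appending to a global list keeps each step a pure function of its inputs, which makes the stages easy to test in isolation.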

> Hint: press `Ctrl + Enter` to run a code cell.

### Define variables for the formula symbols

```python
T = 24    # Number of time steps
O_T = []  # Observation history: raw temperature readings from the sensor (e.g., 21)
X = []    # Interpreted states: bias-corrected temperatures (e.g., 19 for a raw reading of 21)
M = []    # World-model history: inferred temperature trends (rising/falling/stable)
A = []    # Action history: cool/heat/off
```

### Define the World Model agent function

```python
def world_model_agent(observation: float) -> str:
    """Update the world model with a new observation and decide an action."""

    # Record the latest raw observation
    O_T.append(observation)

    # For simplicity, assume the sensor reads 2 degrees too high,
    # so subtracting 2 degrees gives the accurate temperature.
    accurate_temperature = observation - 2
    X.append(accurate_temperature)

    # A simple world model: compare only the last two interpreted states
    trend = "stable"
    if len(X) >= 2:
        last = X[-1]
        prev = X[-2]
        if last > prev:
            trend = "rising"
        elif last < prev:
            trend = "falling"
    M.append(trend)

    # Hardcoded condition-action rules
    if trend == "rising":
        return "cool"
    elif trend == "falling":
        return "heat"
    else:
        return "off"
```
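The `if`/`elif` chain at the end of `world_model_agent` is a lookup from trend to action, so the condition-action rules can equally be stored as data. A sketch of that alternative, where `RULES` and `decide_action` are hypothetical names; unknown trends fall back to "off", matching the `else` branch above:

```python
# Hypothetical table-driven version of the condition-action rules.
RULES = {
    "rising": "cool",   # temperature going up -> cool
    "falling": "heat",  # temperature going down -> heat
    "stable": "off",    # no change -> do nothing
}


def decide_action(trend: str) -> str:
    """Look up the action for an inferred trend; unknown trends default to 'off'."""
    return RULES.get(trend, "off")


print([decide_action(t) for t in ("rising", "falling", "stable")])  # ['cool', 'heat', 'off']
```

Keeping the rules in a dictionary makes it possible to swap in a different rule set without touching the agent's control flow.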

### Simulation

Simulate the temperature changes over 24 time steps.
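Each step, the simulation adds `round(math.sin(2 * math.pi * step / 24 - math.pi / 2)) * 2` to the temperature. Since sin(θ - π/2) = -cos θ, this is a coarse negative cosine: -2 °C per step at the start and end of the 24-step cycle, +2 °C in the middle, and 0 in between. At steps 4, 8, 16, and 20 the exact value is ±0.5, so the result depends on floating-point error and on Python's round-half-to-even behaviour; it is safer to print the sequence than to assume it. A small sketch, where `outside_delta` is a hypothetical helper:

```python
import math


def outside_delta(step: int, period: int = 24) -> int:
    """Per-step temperature change used by the simulation (in degrees C)."""
    return round(math.sin(2 * math.pi * step / period - math.pi / 2)) * 2


deltas = [outside_delta(step) for step in range(24)]
print(deltas)  # inspect the boundary steps 4, 8, 16 and 20 on your platform
```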

```python
import math

# Environment starts at 18°C
current_temp = 18
print(f"Initial temperature: {current_temp}°C")

for step in range(T):
    # Outside temperature variation over time: a sinusoid with a 24-step period,
    # rounded so that each step changes the temperature by -2, 0, or +2 °C
    current_temp += round(math.sin(2 * math.pi * step / 24 - math.pi / 2)) * 2
    # The agent's previous action also affects the temperature
    if len(A) >= 1:
        if A[-1] == 'cool':
            current_temp -= 2
        elif A[-1] == 'heat':
            current_temp += 2

    # Agent decides action
    action = world_model_agent(current_temp)
    A.append(action)

    print(f"Time {step+1}: Temp:{current_temp}°C → Action: {action}")
```

```
Initial temperature: 18°C
Time 1: Temp:16°C → Action: off
Time 2: Temp:14°C → Action: heat
Time 3: Temp:14°C → Action: off
Time 4: Temp:12°C → Action: heat
Time 5: Temp:14°C → Action: cool
Time 6: Temp:12°C → Action: heat
Time 7: Temp:14°C → Action: cool
Time 8: Temp:12°C → Action: heat
Time 9: Temp:14°C → Action: cool
Time 10: Temp:14°C → Action: off
Time 11: Temp:16°C → Action: cool
Time 12: Temp:16°C → Action: off
Time 13: Temp:18°C → Action: cool
Time 14: Temp:18°C → Action: off
Time 15: Temp:20°C → Action: cool
Time 16: Temp:20°C → Action: off
Time 17: Temp:22°C → Action: cool
Time 18: Temp:20°C → Action: heat
Time 19: Temp:22°C → Action: cool
Time 20: Temp:20°C → Action: heat
Time 21: Temp:20°C → Action: off
Time 22: Temp:18°C → Action: heat
Time 23: Temp:18°C → Action: off
Time 24: Temp:16°C → Action: heat
```
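Because `world_model_agent` appends to the global lists `O_T`, `X`, `M`, and `A`, re-running the simulation cell without first re-running the variable-definition cell keeps appending to the old history, and the previous run's last action in `A` then also affects the first step. One way to avoid this, sketched below with the same rules, sensor bias, and environment as above, is to keep the histories inside an object and create a fresh one per run. `WorldModelThermostat` and `simulate` are hypothetical names:

```python
import math
from typing import List, Tuple


class WorldModelThermostat:
    """World model agent whose histories live on the instance, not in globals."""

    def __init__(self, sensor_bias: float = 2) -> None:
        self.sensor_bias = sensor_bias  # same 2-degree bias assumption as the notebook
        self.O_T: List[float] = []      # raw observations
        self.X: List[float] = []        # interpreted (bias-corrected) temperatures
        self.M: List[str] = []          # inferred trends
        self.A: List[str] = []          # chosen actions

    def act(self, observation: float) -> str:
        self.O_T.append(observation)
        self.X.append(observation - self.sensor_bias)
        trend = "stable"
        if len(self.X) >= 2:
            if self.X[-1] > self.X[-2]:
                trend = "rising"
            elif self.X[-1] < self.X[-2]:
                trend = "falling"
        self.M.append(trend)
        action = {"rising": "cool", "falling": "heat"}.get(trend, "off")
        self.A.append(action)
        return action


def simulate(steps: int = 24, start_temp: float = 18) -> Tuple[WorldModelThermostat, List[float]]:
    """Run the notebook's environment with a fresh agent on every call."""
    agent = WorldModelThermostat()
    temp = start_temp
    temps: List[float] = []
    for step in range(steps):
        temp += round(math.sin(2 * math.pi * step / 24 - math.pi / 2)) * 2
        if agent.A:  # the previous action affects the environment
            if agent.A[-1] == "cool":
                temp -= 2
            elif agent.A[-1] == "heat":
                temp += 2
        agent.act(temp)
        temps.append(temp)
    return agent, temps


first, temps1 = simulate()
second, temps2 = simulate()
print(first.A == second.A, len(second.A))  # True 24
```

Since each call starts from empty histories, two runs produce identical results, unlike re-running the original cell.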


```python
# Check the values of symbols
print('O_T:', O_T)
print('X:', X)
print('M:', M)
print('A:', A)
```

```
O_T: [16, 14, 14, 12, 14, 12, 14, 12, 14, 14, 16, 16, 18, 18, 20, 20, 22, 20, 22, 20, 20, 18, 18, 16]
X: [14, 12, 12, 10, 12, 10, 12, 10, 12, 12, 14, 14, 16, 16, 18, 18, 20, 18, 20, 18, 18, 16, 16, 14]
M: ['stable', 'falling', 'stable', 'falling', 'rising', 'falling', 'rising', 'falling', 'rising', 'stable', 'rising', 'stable', 'rising', 'stable', 'rising', 'stable', 'rising', 'falling', 'rising', 'falling', 'stable', 'falling', 'stable', 'falling']
A: ['off', 'heat', 'off', 'heat', 'cool', 'heat', 'cool', 'heat', 'cool', 'off', 'cool', 'off', 'cool', 'off', 'cool', 'off', 'cool', 'heat', 'cool', 'heat', 'off', 'heat', 'off', 'heat']
```
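Since each list is derived from the previous one, a few invariants should hold after any single run: each list has one entry per time step, each interpreted state is its observation minus the assumed 2-degree bias, each trend matches the change between consecutive states, and each action follows from its trend. A sketch of a checker, where `check_histories` is a hypothetical helper, useful for spotting stale printed outputs or histories that have grown past `T` entries because the simulation cell was run more than once:

```python
from typing import List, Sequence

RULES = {"rising": "cool", "falling": "heat", "stable": "off"}


def check_histories(O_T: Sequence[float], X: Sequence[float], M: Sequence[str],
                    A: Sequence[str], expected_len: int, bias: float = 2) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []

    # 1. One entry per time step (catches histories accumulated over repeated runs)
    for name, seq in (("O_T", O_T), ("X", X), ("M", M), ("A", A)):
        if len(seq) != expected_len:
            problems.append(f"{name} has {len(seq)} entries, expected {expected_len}")

    # 2. Each interpreted state is its observation minus the assumed sensor bias
    for i, (o, x) in enumerate(zip(O_T, X)):
        if x != o - bias:
            problems.append(f"X[{i}]={x}, but O_T[{i}] - bias = {o - bias}")

    # 3. Each trend matches the change between consecutive states
    for i, m in enumerate(M[:len(X)]):
        if i == 0 or X[i] == X[i - 1]:
            expected = "stable"
        else:
            expected = "rising" if X[i] > X[i - 1] else "falling"
        if m != expected:
            problems.append(f"M[{i}]={m!r}, expected {expected!r}")

    # 4. Each action follows from its trend via the condition-action rules
    for i, (m, a) in enumerate(zip(M, A)):
        if RULES.get(m) != a:
            problems.append(f"A[{i}]={a!r} does not follow from M[{i}]={m!r}")

    return problems


# A short, hand-made history (not the notebook's run), for illustration:
print(check_histories([16, 14], [14, 12], ["stable", "falling"], ["off", "heat"], expected_len=2))  # []
```

In the notebook, `check_histories(O_T, X, M, A, expected_len=T)` should return an empty list after a single run on a fresh kernel.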


## Further Exercise

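**The results above show that the world model agent is not very rational: it reacts only to the most recent trend, so it keeps switching between heating and cooling and never steers the temperature toward a target. Try implementing a goal-based agent on top of it so that the thermostat keeps the temperature at a specified goal (e.g., 25°C).**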
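One possible starting point, sketched under stated assumptions rather than as the intended solution: keep the bias-corrected temperature from the world model, but choose the action by comparing it with a goal instead of reacting to the trend alone. `goal_based_agent`, `goal`, and `tolerance` are hypothetical names and parameters; the 2-degree sensor bias is the same assumption as in `world_model_agent`, and the loop reuses the notebook's environment.

```python
import math


def goal_based_agent(observation: float, goal: float = 25.0,
                     tolerance: float = 1.0, bias: float = 2.0) -> str:
    """Choose an action by comparing the bias-corrected temperature with a goal."""
    temperature = observation - bias  # same sensor-bias assumption as world_model_agent
    if temperature < goal - tolerance:
        return "heat"
    if temperature > goal + tolerance:
        return "cool"
    return "off"  # within the tolerance band around the goal


# Run it in the same environment as the notebook's simulation.
current_temp, last_action = 18, None
for step in range(24):
    current_temp += round(math.sin(2 * math.pi * step / 24 - math.pi / 2)) * 2
    if last_action == "cool":
        current_temp -= 2
    elif last_action == "heat":
        current_temp += 2
    last_action = goal_based_agent(current_temp)
    print(f"Time {step + 1}: accurate temp {current_temp - 2}°C -> {last_action}")
```

With only ±2 °C of actuation per step, heating can at best cancel the outside drop while the outside variation is also -2 °C per step, so the temperature may lag behind the goal; using the trend from the world model to act earlier is a natural refinement.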
