# Create a Simple Reflex-Based Lunar Lander Agent

In this example, we use Gymnasium, a library of environments for training agents via reinforcement learning (RL). We do not use RL here; instead, we use the environment with a simple custom reflex-based agent.

## Install Gymnasium

The documentation for Gymnasium is available at https://gymnasium.farama.org/ 

Steps:
1. Create a new folder, open it in VS Code, and install the required Python extensions.
2. Create a new virtual environment (Ctrl+Shift+P, then `Python: Create Environment...`).
3. On WSL2, I needed to install swig and the Python development headers via the terminal:
    * `sudo apt install swig`
    * `sudo apt-get install python3-dev`
4. Install Gymnasium with the required extras.

In [None]:
%pip install -q swig
%pip install -q gymnasium[box2d,classic_control]

## The Lunar Lander Environment 

The documentation of the environment is available at: https://gymnasium.farama.org/environments/box2d/lunar_lander/

* Performance Measure: A reward of +100 points for landing safely or -100 points for crashing. The environment also returns
  intermediate rewards at every step, but we do not use them here.

* Environment: This environment is a classic rocket trajectory optimization problem. A ship needs to land safely. The space is **continuous** with
  x and y coordinates in the range [-2.5, 2.5]. The landing pad is at coordinate (0,0).

* Actuators:  According to Pontryagin’s
  maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off. There are four discrete actions available:

    - 0: do nothing
    - 1: fire left orientation engine
    - 2: fire main engine
    - 3: fire right orientation engine

* Sensors: Each observation is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
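
As a small illustration of the observation layout described above, the following sketch unpacks one observation into named fields. The `LanderObservation` type and the `decode` helper are hypothetical names introduced here and are not part of Gymnasium; the sample values are copied from the first observation printed in the random-agent output of this notebook.

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Named view of the 8-dimensional observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def decode(observation: Sequence[float]) -> LanderObservation:
    """Convert a raw observation vector into named fields."""
    if len(observation) != 8:
        raise ValueError("expected an 8-dimensional observation")
    # the first six entries are continuous, the last two are leg-contact flags
    *continuous, left, right = observation
    return LanderObservation(*(float(v) for v in continuous), bool(left), bool(right))


sample = [-0.00191422, 1.4000576, -0.19390155, -0.48278096,
          0.00222485, 0.04392162, 0.0, 0.0]
decoded = decode(sample)
print(decoded.vy, decoded.left_leg_contact)
```

Named fields are only a readability aid; the agent functions in this notebook keep working on the raw vector.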

Gymnasium environments are implemented as classes. An environment is created with the `gym.make` function and provides a `reset` method to start an episode and a `step` method to execute an action.
To use it with an agent function that expects percepts and returns an action, we need to write glue code that connects the environment with the agent function.

In [5]:
import gymnasium as gym

def run_episode(agent_function, max_steps=1000):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialize the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation (use seed=42 in reset to get reproducible results)
    observation, info = env.reset()

    # run one episode
    for _ in range(max_steps):
        # call the agent function to select an action
        action = agent_function(observation)

        print(f"Obs: {observation} -> Action: {action}")

        # step: execute an action in the environment
        observation, reward, terminated, truncated, info = env.step(action)
    
        env.render()

        # stop when the episode ends (terminated) or hits the time limit (truncated)
        if terminated or truncated:
            print(f"Final Reward: {reward}")
            break
    
    env.close()
    return reward

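Note: The environment is only displayed if it is created with `render_mode="human"`; with `render_mode=None`, as used above, `env.render()` shows nothing. Rendering works when the notebook is run locally (e.g., in VS Code). On Colab, you cannot see the environment because the code runs on a headless server (i.e., a server without a display). There are some workarounds that you can find online.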
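
To try out the glue loop without installing Box2D, a toy stand-in with the same `reset`/`step` return shapes can help. The sketch below is an assumption-laden illustration: `StubLander` and `run_stub_episode` are hypothetical names, and the "physics" is made up (the height drops by one per step unless the main engine fires). It only demonstrates the difference between `terminated` (the episode ended) and `truncated` (the step limit was reached).

```python
class StubLander:
    """Toy stand-in that mimics the reset/step return shapes (not real physics)."""

    def __init__(self, start_height=5, max_steps=20):
        self.start_height = start_height
        self.max_steps = max_steps
        self.height = start_height
        self.steps = 0

    def _observation(self):
        # same 8-slot layout as the lander; only y (index 1) is meaningful here
        return [0.0, float(self.height), 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]

    def reset(self):
        self.height = self.start_height
        self.steps = 0
        return self._observation(), {}

    def step(self, action):
        self.steps += 1
        # action 2 (main engine) holds the height; anything else drops by one
        if action != 2:
            self.height -= 1
        terminated = self.height <= 0
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = -100 if terminated else 0
        return self._observation(), reward, terminated, truncated, {}


def run_stub_episode(env, agent_function):
    """Same loop structure as the glue code, for any object with reset/step."""
    observation, info = env.reset()
    steps = 0
    while True:
        action = agent_function(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        steps += 1
        if terminated or truncated:
            return reward, steps, terminated, truncated


always_idle = run_stub_episode(StubLander(), lambda obs: 0)
always_main = run_stub_episode(StubLander(), lambda obs: 2)
print(always_idle)
print(always_main)
```

The first run ends because the toy craft hits the ground, the second because the step limit is reached; a loop that only checks `terminated` would never stop in the second case without its own step cap.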

## Example: A Random Agent

We randomly return one of the actions. The environment accepts the integers 0-3.


In [8]:
import numpy as np

def random_agent_function(observation): 
    """A random agent that selects actions uniformly at random. It ignores the observation."""
    return np.random.choice([0, 1, 2, 3], p=[0.25, 0.25, 0.25, 0.25])

run_episode(random_agent_function)

Obs: [-0.00191422  1.4000576  -0.19390155 -0.48278096  0.00222485  0.04392162
  0.          0.        ] -> Action: 2
Obs: [-0.00391626  1.390079   -0.20199127 -0.44349682  0.00398342  0.0351756
  0.          0.        ] -> Action: 1
Obs: [-0.00598555  1.3795073  -0.21041162 -0.46986824  0.00742835  0.06890505
  0.          0.        ] -> Action: 2
Obs: [-0.00822887  1.3696134  -0.22700128 -0.43973824  0.01006807  0.0527994
  0.          0.        ] -> Action: 3
Obs: [-0.01038752  1.3591299  -0.21637464 -0.46594182  0.0105744   0.01012736
  0.          0.        ] -> Action: 1
Obs: [-0.01262779  1.3480498  -0.22661981 -0.4924714   0.01313464  0.05120957
  0.          0.        ] -> Action: 0
Obs: [-0.01486816  1.3363696  -0.22662804 -0.5191399   0.01569312  0.05117427
  0.          0.        ] -> Action: 1
Obs: [-0.0171772   1.3240843  -0.23522858 -0.546066    0.01997583  0.08566215
  0.          0.        ] -> Action: 2
Obs: [-0.01941042  1.3122385  -0.22807793 -0.52654964  0.02468403 

-100
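
If reproducible action sequences are wanted, one option is to give the random agent its own seeded generator. The following is a minimal sketch using only the standard library; `make_random_agent` is a hypothetical helper name. It seeds the agent only; the environment's randomness is controlled separately through the `seed` argument of `env.reset`, as mentioned in the glue code comment above.

```python
import random


def make_random_agent(seed=None, n_actions=4):
    """Return an agent function that picks one of n_actions uniformly at random."""
    rng = random.Random(seed)  # private generator, independent of global state

    def agent_function(observation):
        # the observation is ignored, as in random_agent_function
        return rng.randrange(n_actions)

    return agent_function


agent_a = make_random_agent(seed=42)
agent_b = make_random_agent(seed=42)
actions_a = [agent_a(None) for _ in range(10)]
actions_b = [agent_b(None) for _ in range(10)]
print(actions_a == actions_b)
```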

## A Simple Reflex-Based Agent

To make the code easier to read, we use enumerations for actions (integers) and observations (index in the observation vector).

In [2]:
from enum import Enum

class Act(Enum):
    LEFT = 1
    RIGHT = 3
    MAIN = 2
    NO_OP = 0

class Obs(Enum):
    X = 0
    Y = 1
    VX = 2
    VY = 3
    ANGLE = 4
    ANGULAR_VELOCITY = 5
    LEFT_LEG_CONTACT = 6
    RIGHT_LEG_CONTACT = 7
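
A possible alternative, shown here only as a sketch, is `IntEnum` from the standard library. Its members behave like integers, so they can index a list or array directly without `.value`. The names `ActInt` and `ObsInt` are hypothetical and chosen so that they do not shadow the enumerations above.

```python
from enum import IntEnum


class ActInt(IntEnum):
    NO_OP = 0
    LEFT = 1
    MAIN = 2
    RIGHT = 3


class ObsInt(IntEnum):
    X = 0
    Y = 1
    VX = 2
    VY = 3
    ANGLE = 4
    ANGULAR_VELOCITY = 5
    LEFT_LEG_CONTACT = 6
    RIGHT_LEG_CONTACT = 7


sample = [-0.0019, 1.4001, -0.1939, -0.4828, 0.0022, 0.0439, 0.0, 0.0]
vertical_speed = sample[ObsInt.VY]   # index without .value
print(vertical_speed)
print(ActInt.MAIN == 2)              # members compare equal to plain integers
print(int(ActInt.RIGHT))             # explicit conversion to a plain int
```

Converting with `int(...)` before handing an action to the environment is a cautious choice if there is any doubt about how enum members are handled.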



## Implement A Better Reflex-Based Agent

Build a better agent that uses its right and left thrusters to land the craft (more) safely. Test your agent function using 100 problems.

In [3]:
def rocket_agent_function(observation):
    """Rule-based agent for lunar lander."""

    rules = [
        # (condition, action)
        # Brake the fall speed
        (lambda obs: obs[Obs.VY.value] < -0.2, Act.MAIN.value),
        # Correct the angle
        (lambda obs: obs[Obs.ANGLE.value] > 0.15, Act.RIGHT.value),  # nose tilts right, so fire the right engine to push it back
        (lambda obs: obs[Obs.ANGLE.value] < -0.15, Act.LEFT.value),  # nose tilts left, so fire the left engine to push it back
        # Steer the craft toward the designated landing pad
        (lambda obs: obs[Obs.X.value] > 0.1, Act.LEFT.value),
        (lambda obs: obs[Obs.X.value] < -0.1, Act.RIGHT.value),
    ]

    # Check the rules in order; the first matching rule decides the action
    for condition, action in rules:
        if condition(observation):
            return action

    return Act.NO_OP.value

run_episode(rocket_agent_function)


NameError: name 'run_episode' is not defined
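
Because the rules are checked in order, only the first matching rule acts. The sketch below makes this visible by returning the name of the rule that fired together with the action. It is a self-contained illustration that repeats the thresholds of `rocket_agent_function` with plain integer indices; `NAMED_RULES` and `explain` are hypothetical names, and the probe observations are made up.

```python
# Plain indices and action codes so that the sketch is self-contained
X, VY, ANGLE = 0, 3, 4
NO_OP, LEFT, MAIN, RIGHT = 0, 1, 2, 3

NAMED_RULES = [
    ("brake fall", lambda o: o[VY] < -0.2, MAIN),
    ("angle > 0.15", lambda o: o[ANGLE] > 0.15, RIGHT),
    ("angle < -0.15", lambda o: o[ANGLE] < -0.15, LEFT),
    ("x > 0.1", lambda o: o[X] > 0.1, LEFT),
    ("x < -0.1", lambda o: o[X] < -0.1, RIGHT),
]


def explain(observation):
    """Return (rule name, action) for the first rule whose condition holds."""
    for name, condition, action in NAMED_RULES:
        if condition(observation):
            return name, action
    return "no rule", NO_OP


# made-up probes: [x, y, vx, vy, angle, angular velocity, left leg, right leg]
falling_and_tilted = [0.3, 1.0, 0.0, -0.5, 0.2, 0.0, 0.0, 0.0]
slow_and_tilted = [0.3, 1.0, 0.0, -0.1, 0.2, 0.0, 0.0, 0.0]
slow_and_level = [0.3, 1.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0]
centered = [0.0, 1.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0]

for probe in (falling_and_tilted, slow_and_tilted, slow_and_level, centered):
    print(explain(probe))
```

In the first probe the craft is both falling fast and tilted, but only the braking rule acts; the tilt is corrected only once the fall speed is back under the threshold.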

## Evaluating the Agent

Run the agent on 100 problems and report the average reward.

In [11]:
#TEST 1: VY < -0.3
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, -100, 100, -100, -100, 100, -100, -100, -100, 100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, 100, -100, -100, -100, -100, -100, -100, -100, -100]
Average reward: -82.0
Success rate: 9/100
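
With only 100 episodes, a success rate is a noisy estimate. As a rough aid for reading such results, the sketch below adds a normal-approximation 95% interval to the summary. `summarize` is a hypothetical helper, and the approximation is crude when the rate is close to 0 or 1; the example list reproduces the counts of the run above (9 landings, 91 crashes), not the order of the outcomes.

```python
import math


def summarize(rewards, success_reward=100, z=1.96):
    """Mean reward and success rate with a normal-approximation interval."""
    n = len(rewards)
    successes = sum(1 for r in rewards if r == success_reward)
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return {
        "n": n,
        "successes": successes,
        "mean_reward": sum(rewards) / n,
        "success_rate": p,
        # clip to [0, 1] because the approximation can overshoot
        "interval": (max(0.0, p - half_width), min(1.0, p + half_width)),
    }


example_rewards = [100] * 9 + [-100] * 91
summary = summarize(example_rewards)
print(summary)
```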


In [26]:
#TEST 2: VY < -0.2
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[100, 100, -100, -100, -100, -100, 100, -100, -100, np.float64(-0.0806241681046842), 100, 100, -100, 100, -100, -100, 100, -100, 100, -100, 100, 100, -100, -100, 100, -100, -100, 100, -100, -100, -100, -100, 100, -100, 100, 100, -100, -100, -100, -100, 100, -100, -100, 100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, 100, -100, 100, -100, 100, -100, -100, -100, -100, -100, 100, 100, -100, -100, -100, -100, -100, -100, -100, 100, -100, 100, -100, 100, -100, -100, 100, -100, 100, 100, -100, -100, 100, -100, 100, -100, -100, -100, -100, -100, 100, np.float64(1.234205915060316), -100, -100, 100, 100]
Average reward: -29.988464182530443
Success rate: 34/100
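
Two entries in the list above are neither 100 nor -100. A likely explanation, stated here as an assumption, is that those episodes used up all 1000 steps without terminating, so the value returned is the intermediate reward of the last step. The sketch below separates the three cases; `classify` and `outcome_counts` are hypothetical helper names.

```python
from collections import Counter


def classify(reward):
    """Map a final reward to a coarse outcome label."""
    if reward == 100:
        return "landed"
    if reward == -100:
        return "crashed"
    return "other"  # e.g., the loop ended without a landing or a crash


def outcome_counts(rewards):
    """Count how often each outcome label occurs."""
    return Counter(classify(r) for r in rewards)


example_rewards = [100, -100, -0.0806241681046842, 100, 1.234205915060316, -100]
counts = outcome_counts(example_rewards)
print(counts)
```

Counting the "other" cases separately keeps them from being silently lumped together with crashes in the success rate.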


In [25]:
#TEST 3: VY < -0.15
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[-100, 100, -100, -100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, 100, 100, -100, 100, 100, 100, -100, -100, -100, -100, -100, 100, -100, -100, -100, 100, -100, -100, 100, -100, -100, -100, -100, -100, np.float64(-0.22023092004587738), -100, 100, -100, np.float64(-0.021794024895214648), 100, 100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, 100, 100, -100, -100, 100, 100, -100, -100, 100, -100, -100, -100, -100, -100, 100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, 100]
Average reward: -50.00242024944941
Success rate: 24/100
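
The three tests above differ only in the fall-speed threshold, which was changed by editing the agent function between runs. One way to avoid the manual edits, sketched here under the assumption that the rule order stays fixed, is a factory that builds an agent from its thresholds. `make_rule_agent` and its parameter names are hypothetical.

```python
# Plain indices and action codes so that the sketch is self-contained
X, VY, ANGLE = 0, 3, 4
NO_OP, LEFT, MAIN, RIGHT = 0, 1, 2, 3


def make_rule_agent(vy_threshold=-0.2, angle_threshold=0.15, x_threshold=0.1):
    """Build a reflex agent with the same rule order as rocket_agent_function."""

    def agent_function(obs):
        if obs[VY] < vy_threshold:          # brake the fall
            return MAIN
        if obs[ANGLE] > angle_threshold:    # correct the angle
            return RIGHT
        if obs[ANGLE] < -angle_threshold:
            return LEFT
        if obs[X] > x_threshold:            # steer toward the pad
            return LEFT
        if obs[X] < -x_threshold:
            return RIGHT
        return NO_OP

    return agent_function


agents = {vy: make_rule_agent(vy_threshold=vy) for vy in (-0.3, -0.2, -0.15)}
probe = [0.0, 1.0, 0.0, -0.25, 0.0, 0.0, 0.0, 0.0]  # made-up observation
chosen = {vy: agent(probe) for vy, agent in agents.items()}
print(chosen)
```

Each agent in `agents` could then be passed to `run_episodes` to repeat the comparison of thresholds in a single cell.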


In [None]:
#TEST 3: VY < -0.2 | X > 0.1 -> left, X < -0.1 -> right
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode="human")

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

The craft lands better and finds its way to the target better. I should add a rule so that the main engine is not switched off while X is not yet 0.

In [None]:
#TEST 3: VY < -0.2 | X > 0.1 -> left, X < -0.1 -> right | abs(X) > 0.1 -> main
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode="human")

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

The craft cannot touch down: whenever it gets close to the ground, it bounces back up.

In [None]:
#TEST 3: change the rule order to: steer to the pad (x = 0) -> correct the angle -> use main engine
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode="human")

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

The craft crashes over and over.

In [21]:
#TEST 3: change the rule order to: use main engine -> steer to the pad (x = 0) -> correct the angle
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[-100, -100, -100, -100, -100, -100, -100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
Average reward: -94.0
Success rate: 3/100


The result is terrible.

In [53]:
#TEST 3: disable the angular-velocity rule (the craft falls faster because the main engine fires less often); the best result rose to 41/100, although the craft loses balance more often
# I will try setting the angle threshold to 0.15; this gives much better results
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[100, -100, -100, 100, 100, -100, 100, 100, 100, 100, 100, 100, 100, 100, -100, 100, 100, -100, -100, -100, 100, 100, 100, 100, 100, -100, 100, -100, 100, -100, 100, 100, 100, 100, -100, 100, -100, -100, 100, 100, -100, -100, 100, 100, -100, 100, -100, 100, 100, -100, 100, -100, -100, 100, 100, 100, 100, 100, -100, -100, 100, 100, 100, 100, -100, 100, 100, 100, -100, 100, 100, 100, -100, 100, -100, 100, 100, -100, -100, -100, -100, -100, -100, -100, -100, 100, -100, 100, 100, -100, 100, 100, 100, -100, 100, -100, -100, -100, 100, 100]
Average reward: 20.0
Success rate: 60/100


In [None]:
#TEST 3: disable the angular-velocity rule (the craft falls faster because the main engine fires less often); the best result rose to 41/100, although the craft loses balance more often
# I will try setting the angle threshold to 0.15; this gives much better results
# I added the angular-velocity condition again, but placed it after the angle condition
# I added a condition to fire the main engine when Y < -0.1 ->> this does not work very well
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode="human")

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

In [122]:
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode=None)

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")

[100, -100, -100, -100, 100, -100, -100, -100, 100, -100, 100, 100, 100, 100, -100, -100, 100, -100, 100, -100, -100, 100, -100, 100, -100, 100, 100, -100, -100, 100, 100, 100, -100, 100, -100, 100, np.float64(0.21965917964532025), 100, -100, -100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, -100, -100, 100, -100, 100, 100, 100, -100, 100, -100, 100, 100, 100, 100, -100, 100, 100, 100, 100, -100, 100, -100, 100, -100, 100, -100, 100, 100, 100, np.float64(0.10289941503260039), 100, -100, 100, 100, 100, -100, 100, 100, 100, -100, 100, 100, 100, 100, -100, -100, 100]
Average reward: 28.00322558594678
Success rate: 63/100
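
Whether 63/100 is really better than the 60/100 of the earlier run is hard to tell from single runs of 100 episodes. As a rough check, the sketch below computes a two-proportion z statistic with a pooled standard error; values of |z| below about 1.96 are commonly read as "no clear difference at the 5% level". `two_proportion_z` is a hypothetical helper, and it does not handle the degenerate case where both rates are 0 or both are 1.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two success rates (pooled SE)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / standard_error


z_small = two_proportion_z(60, 100, 63, 100)  # two similar runs
z_large = two_proportion_z(9, 100, 63, 100)   # first test vs. this run
print(round(z_small, 2), round(z_large, 2))
```

Under this rough check, the step from 60 to 63 successes is well within run-to-run noise, while the step from 9 to 63 is not.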


In [6]:
import numpy as np

def run_episode_test(agent_function):
    """Run one episode in the LunarLander-v3 environment using the provided agent."""

    # Initialise the environment
    env = gym.make("LunarLander-v3", render_mode="human")

    # Reset the environment to generate the first observation
    observation, info = env.reset()

    # run one episode (max. 1000 steps)
    for _ in range(1000):
        # call the agent to select an action
        action = agent_function(observation)

        # step (transition) through the environment with the action
        observation, reward, terminated, truncated, info = env.step(action)

        if terminated:
            break

    env.close()
    return reward

def run_episodes(agent_function, n=100):
    """Run multiple episodes with the given agent and return the rewards for each episode."""
    return [run_episode_test(agent_function) for _ in range(n)]

rewards = run_episodes(rocket_agent_function)
print(rewards)

print(f"Average reward: {np.average(rewards)}")
print(f"Success rate: {np.sum(np.array(rewards) == 100)}/{len(rewards)}")
