# Lunar Lander Problem
The problem has an 8-dimensional continuous state space and a discrete action space. The four discrete
actions are: do nothing, fire the left orientation engine, fire the main engine, and fire the right orientation
engine. The landing pad is always at coordinates (0,0), and the coordinates are the first two numbers in the state
vector. The total reward for moving from the top of the screen to the landing pad ranges from 100 to 140 points,
depending on where the lander is placed on the pad. If the lander moves away from the landing pad, it loses the
reward it would have gained by moving towards the pad. An episode finishes if the lander crashes or
comes to rest, receiving an additional -100 or +100 points respectively. Each leg ground contact is worth +10
points. Firing the main engine incurs a -0.3 point penalty for each frame it fires. Landing outside of the landing
pad is possible. Fuel is infinite, so an agent could learn to fly and then land on its first attempt. The problem is
considered solved when the average score over 100 consecutive runs is 200 points or higher.
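
The solved criterion above can be checked with a rolling average over episode scores. The following is a minimal sketch, not part of the original notebook; the names `is_solved`, `SOLVED_THRESHOLD`, and `WINDOW` are hypothetical helpers introduced for illustration.

```python
from collections import deque

SOLVED_THRESHOLD = 200.0  # average score required, taken from the text above
WINDOW = 100              # number of consecutive runs that are averaged


def is_solved(episode_scores, threshold=SOLVED_THRESHOLD, window=WINDOW):
    """Return True if any `window` consecutive scores average at least `threshold`."""
    recent = deque(maxlen=window)  # holds the most recent `window` scores
    running_sum = 0.0
    for score in episode_scores:
        if len(recent) == window:
            # the oldest score is about to be dropped from the window
            running_sum -= recent[0]
        recent.append(score)
        running_sum += score
        if len(recent) == window and running_sum / window >= threshold:
            return True
    return False
```

Fewer than 100 recorded runs never count as solved in this sketch, regardless of how high the individual scores are.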

## States
At each time step, the environment returns a tuple of 8 state variables: &emsp;  *(x,y,$v_{x}$,$v_{y}$,$\theta$,$v_{\theta}$,$leg_{L}$,$leg_{R}$)*
The state variables, in order:
- *x-coordinate* 
- *y-coordinate*
- *horizontal velocity ($v_{x}$)*
- *vertical velocity ($v_{y}$)*
- *angle of the lander with respect to the vertical axis ($\theta$)*
- *angular velocity of the lander ($v_{\theta}$)*
- *boolean indicating whether the left leg is touching the ground ($leg_{L}$)*
- *boolean indicating whether the right leg is touching the ground ($leg_{R}$)*
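
To make the ordering above concrete, the sketch below unpacks an observation into named fields. It is an illustration only: `LanderState` and `unpack_observation` are hypothetical names, and the sample values are copied from the initial observation printed later in this notebook. The leg flags arrive as floats (0.0 or 1.0), so the sketch assumes a 0.5 threshold to turn them into booleans.

```python
from typing import NamedTuple


class LanderState(NamedTuple):
    """Named view of the 8-element observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def unpack_observation(observation):
    """Convert an 8-element observation into a LanderState."""
    x, y, vx, vy, angle, angular_velocity, left_leg, right_leg = (
        float(v) for v in observation
    )
    # assumption: leg contact is encoded as 0.0 (no contact) or 1.0 (contact)
    return LanderState(x, y, vx, vy, angle, angular_velocity,
                       left_leg > 0.5, right_leg > 0.5)


# sample observation copied from the reset output shown in this notebook
sample = [-0.00676279, 1.4031343, -0.68501765, -0.34606552,
          0.00784321, 0.15516661, 0.0, 0.0]
state = unpack_observation(sample)
print(state.y, state.left_leg_contact)
```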

## Rewards
- Moving from the top of the screen to the landing pad and coming to rest is worth about 100-140 points.
- If the lander moves away from the landing pad, it loses reward.
- If the lander crashes, it receives an additional -100 points.
- If the lander comes to rest, it receives an additional +100 points.
- Each leg with ground contact is worth +10 points.
- Firing the main engine costs 0.3 points each frame.
- Firing a side engine costs 0.03 points each frame.
- The problem is considered solved at 200 points.
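
As a worked example of the engine penalties and terminal bonuses listed above, the sketch below tallies the fuel cost for a sequence of actions. It covers only these terms, not the full reward computed by the environment. The `Action` enum, the constants, and both functions are hypothetical names, and the action indices assume the same order in which the four actions are listed in the problem description.

```python
from enum import IntEnum


class Action(IntEnum):
    # assumption: indices follow the order the actions are listed in the text
    NOTHING = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


MAIN_ENGINE_COST = 0.3   # points lost per frame the main engine fires
SIDE_ENGINE_COST = 0.03  # points lost per frame a side engine fires
CRASH_REWARD = -100.0    # added when the lander crashes
REST_REWARD = 100.0      # added when the lander comes to rest


def fuel_penalty(actions):
    """Total (negative) reward from engine use over a sequence of actions."""
    n_main = sum(1 for a in actions if a == Action.FIRE_MAIN)
    n_side = sum(1 for a in actions
                 if a in (Action.FIRE_LEFT, Action.FIRE_RIGHT))
    return -(MAIN_ENGINE_COST * n_main + SIDE_ENGINE_COST * n_side)


def terminal_reward(crashed):
    """Bonus or penalty applied when the episode ends."""
    return CRASH_REWARD if crashed else REST_REWARD
```

For instance, 100 frames of main engine and 10 frames of side engine cost about 30.3 points, which is a noticeable share of the 200 points needed.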

# Import and Create Environment

In [41]:
import gym
import matplotlib.pyplot as plt
from IPython import display
%matplotlib inline

# define seed for reproducibility
seed = 222980

# initialize environment with an on-screen window for rendering
env = gym.make('LunarLander-v2', render_mode="human")
env.action_space.seed(seed)

# reset the environment with the seed and get the initial observation
observation, _ = env.reset(seed=seed, options={})

# number of discrete actions and length of the observation vector
num_actions = env.action_space.n
num_inputs = env.observation_space.shape[0]

observation

array([-0.00676279,  1.4031343 , -0.68501765, -0.34606552,  0.00784321,
        0.15516661,  0.        ,  0.        ], dtype=float32)

In [19]:
num_actions

4

In [20]:
num_inputs

8

# Demonstrate Untrained Simulation
- Take a random action at each step.
- Unpack the information returned by the simulation step.
- Show the simulation.
- Close the environment.

In [42]:
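# run 100 simulation steps with randomly sampled actions
for _ in range(100):
    # step returns the new observation, the reward, and two end-of-episode flags
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()

    # start a new episode if the lander crashed, came to rest, or timed out
    if terminated or truncated:
        observation, info = env.reset()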

env.close()
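
The loop above discards the reward returned by each step. To compare a random agent against the 200-point target, the per-step rewards can be summed over one episode. The following is a minimal sketch under the same five-value `step` API used above; `run_episode`, `policy`, and `max_steps` are hypothetical names, not part of the original notebook.

```python
def run_episode(env, policy, max_steps=1000, seed=None):
    """Run one episode and return the summed reward.

    `env` is any environment whose `reset` returns (observation, info) and
    whose `step` returns (observation, reward, terminated, truncated, info).
    `policy` is a callable mapping an observation to an action.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # stop as soon as the episode ends for any reason
        if terminated or truncated:
            break
    return total_reward


# Example usage with a random policy (requires gym and a fresh environment):
# env = gym.make('LunarLander-v2')
# score = run_episode(env, lambda obs: env.action_space.sample(), seed=seed)
# env.close()
```

Passing the policy as a callable keeps the loop unchanged when a trained agent later replaces the random action sampler.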