In [None]:
from tensorforce.environments import Environment 
from tensorforce.agents import Agent
from tensorforce.execution import Runner 

### Brief Intro to LunarLander
---
LunarLander is a 2D game in which an agent controls a lunar module. The goal is to land the module between two flags by firing thrusters, which have noisy outcomes and consume fuel.

The state has 8 components:

    - position (horizontal and vertical)
    - velocity (horizontal and vertical)
    - angle and angular velocity
    - left and right leg contact 
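
To make the layout of the state concrete, here is a minimal sketch that wraps an 8-element observation in a named structure. The `LanderState` class and its field names are hypothetical (they are not part of Gym or Tensorforce), and the component order is an assumption based on the list above and on how Gym's LunarLander-v2 is commonly described.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LanderState:
    """Named view of the 8-component observation (hypothetical helper)."""
    x: float                 # horizontal position
    y: float                 # vertical position
    vx: float                # horizontal velocity
    vy: float                # vertical velocity
    angle: float             # tilt of the module
    angular_velocity: float  # rate of change of the tilt
    left_leg_contact: bool   # True if the left leg touches the ground
    right_leg_contact: bool  # True if the right leg touches the ground

    @classmethod
    def from_observation(cls, obs: Sequence[float]) -> "LanderState":
        # Assumed order: position, velocity, angle, angular velocity, leg contacts.
        if len(obs) != 8:
            raise ValueError(f"expected 8 components, got {len(obs)}")
        x, y, vx, vy, angle, angular_velocity, left, right = obs
        return cls(float(x), float(y), float(vx), float(vy),
                   float(angle), float(angular_velocity),
                   bool(left), bool(right))

    @property
    def both_legs_down(self) -> bool:
        return self.left_leg_contact and self.right_leg_contact


# Made-up observation: slightly left of the pad, descending, left leg touching.
state = LanderState.from_observation([-0.1, 0.5, 0.0, -0.3, 0.02, 0.0, 1.0, 0.0])
print(state.vy, state.both_legs_down)
```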
    
The agent can take 4 actions:

    - do nothing
    - fire main engine (push up)
    - fire left engine (push right)
    - fire right engine (push left)
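
The four actions can be sketched as an enumeration. The integer values below are an assumption based on Gym's LunarLander-v2 discrete action space, where the index order (nothing, left, main, right) differs from the order of the list above; `LanderAction`, `PUSH_DIRECTION` and `describe` are hypothetical names used only for illustration.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    # Index values assumed from Gym's LunarLander-v2; verify against your version.
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


# Direction in which each action pushes the module, as described in the list above.
PUSH_DIRECTION = {
    LanderAction.NOOP: "none",
    LanderAction.FIRE_MAIN: "up",
    LanderAction.FIRE_LEFT: "right",
    LanderAction.FIRE_RIGHT: "left",
}


def describe(action_index: int) -> str:
    """Return a readable label for a raw action index."""
    action = LanderAction(action_index)  # raises ValueError for indices outside 0..3
    return f"{action.name} (push {PUSH_DIRECTION[action]})"


print(describe(2))
```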
    
Each episode starts with the lunar module at the top of the screen with a random initial velocity; the landing pad is always at coordinates (0, 0).

Rewards:

    - crashing (-100 points) or coming to rest (+100 points)
    - leg ground contact (+10 points for each leg)
    - firing main engine (-0.3 points for each frame)
    - firing side engine (-0.03 points for each frame)
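
As a rough worked example, the sketch below adds up the reward items listed above for a single episode. The function `listed_reward_terms` and its parameters are hypothetical; it covers only the four listed items, so any other shaping the environment applies is not modeled and the result should not be read as the exact episode return.

```python
def listed_reward_terms(outcome: str,
                        legs_in_contact: int,
                        main_engine_frames: int,
                        side_engine_frames: int) -> float:
    """Sum the listed reward items for one episode (illustrative only)."""
    # +100 for coming to rest, -100 for crashing, nothing if the episode was cut off.
    terminal = {"landed": 100.0, "crashed": -100.0, "ongoing": 0.0}[outcome]
    if not 0 <= legs_in_contact <= 2:
        raise ValueError("legs_in_contact must be 0, 1 or 2")
    legs = 10.0 * legs_in_contact        # +10 points per leg on the ground
    fuel = (-0.3 * main_engine_frames    # main engine cost per frame
            - 0.03 * side_engine_frames) # side engine cost per frame
    return terminal + legs + fuel


# Made-up episode: clean landing on both legs, 100 main-engine frames, 50 side-engine frames.
total = listed_reward_terms("landed", 2, 100, 50)
print(round(total, 2))
```

For this made-up episode the terms are 100 + 20 - 30 - 1.5, which comes to about 88.5, so fuel use noticeably reduces the payoff of a successful landing.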


### Initializing an Environment (with monitoring) 
---

Note: I needed to run `brew install ffmpeg` on my machine for monitoring to work.

In [None]:
level = 'LunarLander-v2'


environment = Environment.create(
    environment='gym',
    level=level,
    max_episode_timesteps=500, 
    terminal_reward=50, # reward for finishing before max_episode_timesteps (encouraging landing not hovering)
                        # if this is too high we are encouraging crashes
    visualize=True,
    visualize_directory='../tensorforce/monitor/' + level 
)

### Initializing an Agent 
---

In [None]:
agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    update=64,
    optimizer=dict(optimizer='adam', learning_rate=5e-4), # lower learning rate than CartPole solution
    objective='policy_gradient',
    reward_estimation=dict(horizon=20),
    summarizer=dict(
        directory='../tensorforce/summaries',
        summaries='all'
    )
)

### Execution with the Runner Utility
---

In [None]:
runner = Runner(
    agent=agent,
    environment=environment,
    max_episode_timesteps=500 
)

runner.run(num_episodes=2000) # higher num_episodes than CartPole solution

runner.run(num_episodes=100, evaluation=True)

runner.close()

In [None]:
!tensorboard --logdir=../tensorforce/summaries