In [1]:
from tensorforce.environments import Environment 
from tensorforce.agents import Agent
from tensorforce.execution import Runner 

### Brief Intro to LunarLander
---
LunarLander is a 2D game in which an agent controls a lunar module. The goal is to land the module between two flags by firing its thrusters, which consume fuel and have noisy effects.

The state has 8 components:

    - position (horizontal and vertical)
    - velocity (horizontal and vertical)
    - angle and angular velocity
    - left and right leg contact 
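
As a small sketch of how these 8 components can be given names, the snippet below unpacks a raw observation into a named tuple. The ordering (x, y, x velocity, y velocity, angle, angular velocity, left leg, right leg) follows Gym's LunarLander documentation as far as I know; `LanderState` and `unpack_state` are hypothetical helpers and not part of Gym or Tensorforce.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-component LunarLander observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def unpack_state(observation: Sequence[float]) -> LanderState:
    """Convert a raw 8-element observation into a LanderState."""
    if len(observation) != 8:
        raise ValueError(f"expected 8 components, got {len(observation)}")
    # The first six entries are continuous; the last two are 0/1 contact flags.
    *motion, left, right = observation
    return LanderState(*(float(v) for v in motion), bool(left), bool(right))


# Made-up observation, for illustration only.
example = unpack_state([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 1.0])
print(example)
```

Naming the fields makes it easier to inspect what the agent sees at a given timestep than indexing into a bare array.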
    
The agent can take 4 actions:

    - do nothing
    - fire main engine (push up)
    - fire left engine (push right)
    - fire right engine (push left)
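
For readability, the four discrete actions can be wrapped in an enum. Note that the integer order assumed here (0 = do nothing, 1 = left engine, 2 = main engine, 3 = right engine) is the one given in Gym's documentation for `LunarLander-v2` as I understand it, which differs from the order of the bullets above; it is worth checking against the installed Gym version. `LanderAction` and `describe` are hypothetical names.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Discrete LunarLander actions (index order is an assumption, see above)."""
    NOOP = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


def describe(action: int) -> str:
    """Return a human-readable label for an action index."""
    return LanderAction(action).name.replace("_", " ").lower()


for index in range(len(LanderAction)):
    print(index, describe(index))
```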
    
Each episode starts with the lunar module at the top of the screen with a random initial velocity. The landing pad is always at coordinates (0, 0).

Rewards:

    - episode end (-100 points for crashing, +100 points for coming to rest)
    - leg ground contact (+10 points for each leg)
    - firing main engine (-0.3 points for each frame)
    - firing side engine (-0.03 points for each frame)
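
To get a feel for the magnitudes, here is a rough sketch that adds up only the reward terms listed above for a hypothetical episode. It is a simplified accounting, not the environment's actual reward function: it ignores any shaping terms the environment applies for distance, speed, or tilt, and it treats the leg bonus as a one-time total. The function name and its parameters are made up for illustration.

```python
from typing import Optional

MAIN_ENGINE_COST = 0.3    # points per frame with the main engine firing
SIDE_ENGINE_COST = 0.03   # points per frame with a side engine firing
LEG_CONTACT_BONUS = 10.0  # points per leg touching the ground
TERMINAL_REWARD = 100.0   # +100 for coming to rest, -100 for crashing


def listed_reward_terms(main_frames: int, side_frames: int,
                        legs_in_contact: int,
                        landed: Optional[bool]) -> float:
    """Sum the listed reward terms; landed=None means neither rest nor crash."""
    fuel_cost = MAIN_ENGINE_COST * main_frames + SIDE_ENGINE_COST * side_frames
    leg_bonus = LEG_CONTACT_BONUS * legs_in_contact
    if landed is None:
        terminal = 0.0
    else:
        terminal = TERMINAL_REWARD if landed else -TERMINAL_REWARD
    return terminal + leg_bonus - fuel_cost


# Hypothetical episode: 120 main-engine frames, 40 side-engine frames,
# both legs down, and the module comes to rest.
total = listed_reward_terms(120, 40, legs_in_contact=2, landed=True)
print(f"{total:.1f}")
```

The fuel penalties are small per frame, but over a few hundred frames they become comparable to the leg bonuses, which is why an efficient descent matters.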


### Initializing an Environment (with monitoring) 
---

Note: I needed to run `brew install ffmpeg` on my machine for monitoring to work.

In [2]:
level = 'LunarLander-v2'


environment = Environment.create(
    environment='gym',
    level=level,
    max_episode_timesteps=700,
    visualize=True,
    visualize_directory='../tensorforce/monitor/' + level 
)

### Initializing an Agent 
---

In [3]:
agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    update=64,  # batch size per update; the unit defaults to timesteps
    optimizer=dict(optimizer='adam', learning_rate=5e-4),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20),  # estimate returns over a 20-step horizon
    summarizer=dict(
        directory='../tensorforce/summaries',
        summaries='all'
    )
)
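
To illustrate what a reward-estimation horizon means, the sketch below computes, for each timestep, the sum of at most `horizon` upcoming rewards. This is plain Python written for intuition only and is not Tensorforce's implementation: the exact windowing, the `discount` parameter shown here, and any bootstrapping from a value estimate are assumptions or are left out entirely.

```python
from typing import List, Sequence


def horizon_returns(rewards: Sequence[float], horizon: int,
                    discount: float = 1.0) -> List[float]:
    """Return, for each timestep t, the discounted sum of rewards[t:t + horizon]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    returns = []
    for t in range(len(rewards)):
        # Near the end of the episode the window is simply shorter.
        window = rewards[t:t + horizon]
        returns.append(sum(discount ** k * r for k, r in enumerate(window)))
    return returns


# Toy reward sequence, for illustration only.
print(horizon_returns([1.0, 1.0, 1.0, 1.0], horizon=2))
```

A short horizon makes each estimate depend on fewer future rewards, which reduces variance but can hide the effect of rewards that arrive late, such as the landing bonus.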



### Execution with the Runner Utility
---

In [4]:
runner = Runner(
    agent=agent,
    environment=environment,
    max_episode_timesteps=700
)

# Train for 2000 episodes.
runner.run(num_episodes=2000)

# Run 100 further episodes in evaluation mode.
runner.run(num_episodes=100, evaluation=True)

# Shut down the runner.
runner.close()

Episodes:   0%|          | 0/2000 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]

Episodes:   0%|          | 0/100 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]

In [5]:
!tensorboard --logdir=../tensorforce/summaries


NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.9.0 at http://localhost:6006/ (Press CTRL+C to quit)
^C
