# An example of the Gym framework: CartPole

Inspired by http://gym.openai.com/docs/.

- If gym is not yet installed on your computer, run `pip install gym`;
- Alternatively, install a kernelspec for your conda environment if you don't want to install gym globally.

In [1]:
import gym 
import numpy as np

## What is inside the environment

The source code can be found [here](https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py). 

In [2]:
import gym

# load the CartPole environment
env = gym.make('CartPole-v1')
env.reset()

print("the action space is", env.action_space) # two actions 
print("the observation space (state space here) is", env.observation_space, "\n") # a four-dimensional product of intervals

print("range of the observation space:")
print(env.observation_space.low) 
print(env.observation_space.high)

the action space is Discrete(2)
the observation space (state space here) is Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32) 

range of the observation space:
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
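
The finite bounds printed above are not arbitrary. In the linked source file, the cart position and pole angle bounds are set to twice the thresholds at which an episode terminates (2.4 for the position, 12 degrees for the angle), while the two velocities are bounded only by the largest `float32` value. The sketch below reproduces those numbers with NumPy alone; the variable names are chosen for illustration and are not taken from the Gym API.

```python
import math
import numpy as np

# Termination thresholds, as read from the linked cartpole.py source
x_threshold = 2.4                          # cart position limit
theta_threshold = 12 * 2 * math.pi / 360   # pole angle limit: 12 degrees, in radians

# Velocities are unbounded in practice: they use the largest float32 value
float32_max = np.finfo(np.float32).max

# Observation layout: [cart position, cart velocity, pole angle, pole angular velocity]
high = np.array(
    [2 * x_threshold, float32_max, 2 * theta_threshold, float32_max],
    dtype=np.float32,
)
low = -high

print(low)
print(high)
```

Because the bounds are twice the thresholds, an observation can still lie inside the observation space on the step where the episode ends.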


## Visualizing a random policy interacting with the environment 

In [4]:
episode = 0
cumulative_reward = 0
maximum_reward = 0

for _ in range(500):
    env.render()
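    # take a random action; step() returns the new observation, the reward,
    # a termination flag, and a dictionary of diagnostic information
    obs, reward, done, info = env.step(env.action_space.sample()) 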
    if done:
        env.reset()
        episode +=1
        if cumulative_reward > maximum_reward:
            maximum_reward = cumulative_reward
        cumulative_reward = 0
    else:
        cumulative_reward += 1
        
env.close()

print("the number of finished episodes is ", episode)
print("the maximum duration of an episode is ", maximum_reward)

the number of finished episodes is  23
the maximum duration of an episode is  45
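
The loop above assumes that `env.step` returns four values. As far as I know, newer Gym releases (0.26 and later) and Gymnasium return five values instead, splitting the termination flag into `terminated` and `truncated`. The following is a minimal sketch of a helper that tolerates both conventions and sums the actual rewards rather than counting steps. The names `unpack_step`, `episode_returns` and `ToyEnv` are hypothetical and not part of Gym; `ToyEnv` is only a stand-in so that the example runs without Gym installed.

```python
import random


def unpack_step(result):
    """Normalize the tuple returned by env.step() to (obs, reward, done, info)."""
    if len(result) == 5:
        # Newer convention: obs, reward, terminated, truncated, info
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    # Older convention: obs, reward, done, info
    obs, reward, done, info = result
    return obs, reward, done, info


def episode_returns(env, n_steps, policy):
    """Run `policy` for n_steps and return the total reward of each finished episode."""
    returns = []
    total = 0.0
    env.reset()
    for _ in range(n_steps):
        obs, reward, done, info = unpack_step(env.step(policy()))
        total += reward  # the terminal step's reward is counted too
        if done:
            returns.append(total)
            total = 0.0
            env.reset()
    return returns


class ToyEnv:
    """Hypothetical stand-in environment: every episode ends after exactly 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        return [0.0, 0.0, 0.0, 0.0], 1.0, self.t >= 3, {}


# Random policy over two actions, mirroring Discrete(2)
returns = episode_returns(ToyEnv(), 10, lambda: random.randint(0, 1))
print("finished episodes:", len(returns))
print("best return:", max(returns))
```

With a real environment, the same helper could be called as `episode_returns(env, 500, env.action_space.sample)`. Note that it counts the reward of the terminal step, so its episode totals can be one higher than the durations reported by the loop above.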
