Import gym and run an instance of the CartPole-v0 environment for up to 1000 timesteps, rendering the environment at each step. The loop stops early if the episode ends.

In [10]:
import gym

env = gym.make('CartPole-v0')
env.reset()

for _ in range(1000):
    env.render()
    # take a random action and keep the results so we know when the episode ends
    observation, reward, done, info = env.step(env.action_space.sample())

    if done:
        break

[2017-04-12 14:47:03,992] Making new env: CartPole-v0
[2017-04-12 14:47:04,001] Finished writing results. You can upload them to the scoreboard via gym.upload('/Users/DreamFactory/cartpole-experiments')


Do the same, but this time for the Ms. Pac-Man environment, MsPacman-v0.

In [11]:
env = gym.make('MsPacman-v0')
env.reset()

for _ in range(1000):
    env.render()
    # take a random action and keep the results so we know when the episode ends
    observation, reward, done, info = env.step(env.action_space.sample())

    if done:
        break

[2017-04-12 14:47:24,868] Making new env: MsPacman-v0


Each call to env.step() returns four values: the new observation, the reward for the action, a done flag that signals the end of the episode, and a dictionary of diagnostic info.

In [4]:
import gym

env = gym.make('CartPole-v0')

for i_episode in range(20):
    observation = env.reset()
    
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        print(observation, reward, done, info)
        
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

[2017-04-03 14:34:02,065] Making new env: CartPole-v0


[ 0.04571104 -0.0150851  -0.03563222  0.01190199]
[ 0.04540934  0.18052927 -0.03539418 -0.29180697] 1.0 False {}
[ 0.04540934  0.18052927 -0.03539418 -0.29180697]
[ 0.04901992 -0.01407061 -0.04123032 -0.01049368] 1.0 False {}
[ 0.04901992 -0.01407061 -0.04123032 -0.01049368]
[ 0.04873851 -0.20857775 -0.04144019  0.2689008 ] 1.0 False {}
[ 0.04873851 -0.20857775 -0.04144019  0.2689008 ]
[ 0.04456696 -0.40308457 -0.03606218  0.5482306 ] 1.0 False {}
[ 0.04456696 -0.40308457 -0.03606218  0.5482306 ]
[ 0.03650527 -0.20747506 -0.02509757  0.24440713] 1.0 False {}
[ 0.03650527 -0.20747506 -0.02509757  0.24440713]
[ 0.03235576 -0.01200379 -0.02020942 -0.05608534] 1.0 False {}
[ 0.03235576 -0.01200379 -0.02020942 -0.05608534]
[ 0.03211569  0.18340201 -0.02133113 -0.35507536] 1.0 False {}
[ 0.03211569  0.18340201 -0.02133113 -0.35507536]
[ 0.03578373  0.37882066 -0.02843264 -0.65440753] 1.0 False {}
[ 0.03578373  0.37882066 -0.02843264 -0.65440753]
[ 0.04336014  0.18410589 -0.04152079 -0.370811
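
The four values returned by env.step() follow a fixed contract, which a small sketch can make concrete. The CountdownEnv class below is hypothetical and not part of gym; it only mimics the reset()/step() interface used above, assuming the 4-tuple return shown in the cell.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment (not part of gym) mimicking reset()/step()."""

    def __init__(self, horizon=5):
        self.horizon = horizon  # number of steps before the episode ends
        self.remaining = horizon

    def reset(self):
        # start a new episode and return the first observation
        self.remaining = self.horizon
        return self.remaining

    def step(self, action):
        # every step uses up one unit of time and pays a reward of 1.0
        self.remaining -= 1
        observation = self.remaining
        reward = 1.0
        done = self.remaining == 0
        info = {"last_action": action}
        return observation, reward, done, info


env = CountdownEnv(horizon=5)
observation = env.reset()
total_reward = 0.0
steps = 0

while True:
    action = random.choice([0, 1])  # random action, as in the cell above
    observation, reward, done, info = env.step(action)
    total_reward += reward
    steps += 1
    if done:
        print("Episode finished after {} timesteps".format(steps))
        break
```

Because the toy environment ignores the action, the episode always lasts exactly `horizon` steps; in CartPole the actions determine how long the pole stays up.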

Every environment has an action_space and an observation_space, both 'Space' objects that describe the valid actions and observations.

In [5]:
env = gym.make('CartPole-v0')

print(env.action_space)
print(env.observation_space)

[2017-04-03 14:38:56,970] Making new env: CartPole-v0


Discrete(2)
Box(4,)


The 'Discrete' space allows a fixed range of non-negative integers, so in this example the valid actions are 0 and 1. The 'Box' space represents an n-dimensional box, so here a valid observation is an array of 4 numbers. We can also check the bounds of the Box.

In [6]:
print(env.observation_space.high)
print(env.observation_space.low)

[  4.80000000e+00   3.40282347e+38   4.18879020e-01   3.40282347e+38]
[ -4.80000000e+00  -3.40282347e+38  -4.18879020e-01  -3.40282347e+38]
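
The bounds above can be reproduced by hand. The sketch below assumes, based on the CartPole implementation, that the position and angle bounds are twice the episode-ending thresholds (2.4 units and 12 degrees) and that the velocity bounds are simply the largest float32 value. These constants are stated here as assumptions, not read from the environment.

```python
import math
import struct

# assumed thresholds at which a CartPole episode ends
x_threshold = 2.4                    # cart position
theta_threshold = math.radians(12)   # pole angle, in radians

# largest finite float32, decoded from its bit pattern
float32_max = struct.unpack('<f', struct.pack('<I', 0x7F7FFFFF))[0]

# observation order: position, velocity, angle, angular velocity
high = [x_threshold * 2, float32_max, theta_threshold * 2, float32_max]
low = [-bound for bound in high]

print(high)
print(low)
```

Read this way, the second and fourth entries are effectively unbounded velocities, while the first and third are the cart position and pole angle.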


We can sample from a Space and check that something belongs to it.

In [7]:
from gym import spaces

# set space with 8 elements {0, 1, 2, ..., 7}
space = spaces.Discrete(8) 

x = space.sample()
assert space.contains(x)
assert space.n == 8

In [9]:
print(x)

3


In [11]:
x = space.sample()

In [12]:
print(x)

1
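
To make sample() and contains() less of a black box, here is a minimal sketch of a Discrete-like space in plain Python. TinyDiscrete is a hypothetical stand-in, not gym's implementation; it only assumes that a discrete space holds the integers 0 to n-1.

```python
import random


class TinyDiscrete:
    """Hypothetical stand-in for a discrete space holding {0, 1, ..., n-1}."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # draw one valid element uniformly at random
        return random.randrange(self.n)

    def contains(self, x):
        # only integers in [0, n) belong to the space
        return isinstance(x, int) and 0 <= x < self.n


space = TinyDiscrete(8)
samples = [space.sample() for _ in range(1000)]

# every sample should be a member of the space
print(all(space.contains(x) for x in samples))
# values outside the range should not be members
print(space.contains(8), space.contains(-1))
```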


To list available environments:

In [14]:
from gym import envs

print(envs.registry.all())

dict_values([EnvSpec(YarsRevenge-ram-v3), EnvSpec(PongDeterministic-v3), EnvSpec(UpNDown-ramNoFrameskip-v3), EnvSpec(CarnivalNoFrameskip-v0), EnvSpec(UpNDownDeterministic-v3), EnvSpec(JourneyEscape-v3), EnvSpec(RoadRunner-v3), EnvSpec(Pong-v0), EnvSpec(JourneyEscape-ramDeterministic-v0), EnvSpec(NameThisGame-v3), EnvSpec(PrivateEyeDeterministic-v3), EnvSpec(Berzerk-v3), EnvSpec(Pong-ram-v0), EnvSpec(AssaultDeterministic-v3), EnvSpec(Venture-ramDeterministic-v3), EnvSpec(Amidar-v0), EnvSpec(Carnival-v3), EnvSpec(Pitfall-v3), EnvSpec(PhoenixNoFrameskip-v3), EnvSpec(Robotank-v0), EnvSpec(Robotank-v3), EnvSpec(Tutankham-ramDeterministic-v3), EnvSpec(MsPacmanDeterministic-v3), EnvSpec(FrozenLake-v0), EnvSpec(Freeway-ramNoFrameskip-v0), EnvSpec(BreakoutDeterministic-v0), EnvSpec(BerzerkDeterministic-v0), EnvSpec(ElevatorActionNoFrameskip-v3), EnvSpec(Gopher-ramDeterministic-v0), EnvSpec(IceHockey-ram-v0), EnvSpec(EnduroNoFrameskip-v3), EnvSpec(StarGunner-v3), EnvSpec(Boxing-v0), EnvSpec(Assa
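
The listing is long because many environments are registered in several variants and versions. As a rough sketch, the IDs can be grouped by base name. The group_by_name helper and the short sample list are illustrative assumptions; the IDs are copied from the output above, and the code relies only on the name-vN pattern visible there.

```python
import re
from collections import defaultdict

# a few IDs copied from the listing above
env_ids = ['Pong-v0', 'PongDeterministic-v3', 'Robotank-v0', 'Robotank-v3',
           'Pong-ram-v0', 'FrozenLake-v0']


def group_by_name(ids):
    """Map each base name to the sorted list of its version numbers."""
    groups = defaultdict(list)
    for env_id in ids:
        match = re.match(r'^(.+)-v(\d+)$', env_id)
        if match is None:
            continue  # skip anything that does not follow the name-vN pattern
        groups[match.group(1)].append(int(match.group(2)))
    return {name: sorted(versions) for name, versions in groups.items()}


for name, versions in sorted(group_by_name(env_ids).items()):
    print(name, versions)
```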

We can use the 'Monitor' wrapper to record an algorithm's performance on an environment.

In [9]:
import gym
from gym import wrappers

env = gym.make('CartPole-v0')
# record episode statistics and videos; force=True clears results from earlier runs
env = wrappers.Monitor(env, 'cartpole-experiments/', force=True)

for i_episode in range(20):
    observation = env.reset()
    
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

[2017-04-12 12:44:25,024] Making new env: CartPole-v0
[2017-04-12 12:44:25,032] Finished writing results. You can upload them to the scoreboard via gym.upload('/tmp/cartpole-experiment-1')
[2017-04-12 12:44:25,038] Creating monitor directory cartpole-experiments/
[2017-04-12 12:44:25,040] Starting new video recorder writing to /Users/DreamFactory/cartpole-experiments/openaigym.video.2.57037.video000000.mp4


[ 0.0108736   0.00374293  0.04373512 -0.04805103]
[ 0.01094846 -0.19197797  0.0427741   0.25810362]
[ 0.0071089   0.00250805  0.04793617 -0.02078681]
[ 0.00715906  0.19691093  0.04752044 -0.29796833]
[ 0.01109728  0.39132441  0.04156107 -0.5752936 ]
[ 0.01892377  0.58583985  0.0300552  -0.85459936]
[ 0.03064056  0.78053957  0.01296321 -1.13768215]
[ 0.04625135  0.5852505  -0.00979043 -0.84096208]
[ 0.05795636  0.39026356 -0.02660968 -0.55137403]
[ 0.06576164  0.58574894 -0.03763716 -0.8523206 ]
[ 0.07747661  0.78136322 -0.05468357 -1.15659694]
[ 0.09310388  0.97715372 -0.07781551 -1.46591265]
[ 0.11264695  1.17313753 -0.10713376 -1.78185344]
[ 0.1361097   1.36928888 -0.14277083 -2.10583068]


[2017-04-12 12:44:25,621] Starting new video recorder writing to /Users/DreamFactory/cartpole-experiments/openaigym.video.2.57037.video000001.mp4


[ 0.16349548  1.56552438 -0.18488744 -2.43902122]
Episode finished after 15 timesteps
[ 0.01017865  0.00874627 -0.04572754  0.01110334]
[ 0.01035358  0.20449317 -0.04550547 -0.2956493 ]
[ 0.01444344  0.01004849 -0.05141846 -0.01765821]
[ 0.01464441 -0.18429979 -0.05177162  0.25836857]
[ 0.01095842 -0.37864588 -0.04660425  0.53428304]
[ 0.0033855  -0.18290058 -0.03591859  0.22728721]
[-0.00027251  0.01271578 -0.03137284 -0.07650587]
[ -1.81963299e-05   2.08273115e-01  -3.29029614e-02  -3.78919620e-01]
[ 0.00414727  0.40384651 -0.04048135 -0.68179265]
[ 0.0122242   0.20930944 -0.05411721 -0.40212439]
[ 0.01641038  0.01499516 -0.06215969 -0.12698237]
[ 0.01671029  0.21094998 -0.06469934 -0.4386101 ]
[ 0.02092929  0.01680057 -0.07347154 -0.16700365]
[ 0.0212653   0.21289311 -0.07681162 -0.48193014]
[ 0.02552316  0.40901044 -0.08645022 -0.79779915]
[ 0.03370337  0.60520541 -0.1024062  -1.11637729]
[ 0.04580748  0.80151136 -0.12473375 -1.43934839]
[ 0.06183771  0.99792977 -0.15352072 -1.7682

[2017-04-12 12:44:29,940] Starting new video recorder writing to /Users/DreamFactory/cartpole-experiments/openaigym.video.2.57037.video000008.mp4


[ 0.1634254   0.94127229 -0.18882469 -1.52596036]
Episode finished after 18 timesteps
[ 0.0362162   0.0130795  -0.00122622  0.03365336]
[ 0.03647779 -0.18202484 -0.00055315  0.32594916]
[ 0.0328373   0.01310498  0.00596583  0.03309184]
[ 0.0330994   0.20814087  0.00662767 -0.25770285]
[ 0.03726221  0.01292493  0.00147361  0.03706315]
[ 0.03752071 -0.18221812  0.00221488  0.33021066]
[ 0.03387635 -0.37737153  0.00881909  0.62359123]
[ 0.02632892 -0.18237382  0.02129091  0.33369881]
[ 0.02268144  0.01243873  0.02796489  0.04780526]
[ 0.02293022 -0.18307282  0.028921    0.34917853]
[ 0.01926876 -0.3785939   0.03590457  0.65083911]
[ 0.01169688 -0.57419706  0.04892135  0.95460841]
[  2.12941771e-04  -7.69941768e-01   6.80135169e-02   1.26225133e+00]
[-0.01518589 -0.96586427  0.09325854  1.57543618]
[-0.03450318 -1.16196602  0.12476727  1.89568887]
[-0.0577425  -0.96839794  0.16268104  1.64418024]
[-0.07711046 -0.7755112   0.19556465  1.40628782]
Episode finished after 17 timesteps
[ 0.0309

Now we can try a simple non-random policy, pushing the cart toward the side the pole is leaning, and track the highest score over multiple episodes.

In [3]:
import gym

env = gym.make('CartPole-v0')
highscore = 0

for _ in range(20):
    observation = env.reset()
    # track reward of each episode
    points = 0
    
    # run until episode is finished
    while True: 
        env.render()
        
        # if angle is positive then move right, if negative then move left
        if observation[2] > 0:
            action = 1
        else: 
            action = 0
        
        observation, reward, done, info = env.step(action)
        points += reward
        
        if done:
            # record high score
            if points > highscore:
                highscore = points
                print('New highscore: %s' % highscore)
            break

print('Final highscore: %s' % highscore)     

[2017-04-12 11:48:32,999] Making new env: CartPole-v0


New highscore: 42.0
New highscore: 55.0
New highscore: 59.0
Final highscore: 59.0
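
The angle rule above can be pulled out into a function so that it is easy to test and to swap for other policies. This is a minimal sketch: angle_policy, run_episode, and the StubEnv used to exercise them are hypothetical names, and StubEnv is not a physics simulation, only a stand-in with the same reset()/step() interface.

```python
def angle_policy(observation):
    """Push right (1) if the pole leans right, otherwise push left (0)."""
    return 1 if observation[2] > 0 else 0


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    points = 0.0
    while True:
        observation, reward, done, info = env.step(policy(observation))
        points += reward
        if done:
            return points


class StubEnv:
    """Hypothetical stand-in that replays a fixed list of pole angles."""

    def __init__(self, angles):
        self.angles = angles
        self.t = 0
        self.actions = []  # actions received, kept for inspection

    def reset(self):
        self.t = 0
        self.actions = []
        return [0.0, 0.0, self.angles[0], 0.0]

    def step(self, action):
        self.actions.append(action)
        self.t += 1
        done = self.t == len(self.angles) - 1
        return [0.0, 0.0, self.angles[self.t], 0.0], 1.0, done, {}


env = StubEnv([0.02, -0.01, 0.03, -0.05])
highscore = max(run_episode(env, angle_policy) for _ in range(3))
print('Final highscore: %s' % highscore)
print(env.actions)
```

With a real environment, run_episode(gym.make('CartPole-v0'), angle_policy) would play one episode, assuming the same API in which step() returns four values.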
