# The Cart Pole Environment
Let's take what we've learned and try it out on the [Cart Pole environment](https://gym.openai.com/envs/CartPole-v1/).

In [1]:
import gym
env = gym.make('CartPole-v1')
observation = env.reset()
env.render()

True

![image.png](attachment:image.png)

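The idea on this one is that there's a cart you can push left or right, with a pole attached to it by an unpowered joint. The agent gets a reward of +1 for every step the episode keeps going, so the longer the pole stays up, the more reward it collects. The episode ends if the cart moves more than 2.4 units from the center, if the pole tilts more than 12 degrees from vertical, or, for CartPole-v1, after 500 steps.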
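To make the ending conditions concrete, here's a minimal sketch of the check. It assumes the limits used in Gym's CartPole source (cart position more than 2.4 units from center, pole angle more than 12 degrees) and Gym's observation order of (cart position, cart velocity, pole angle, pole angular velocity). `pole_has_fallen` is a hypothetical helper for illustration, not part of Gym, and it ignores the separate 500-step time limit.

```python
import math

# Limits assumed from Gym's CartPole source.
X_THRESHOLD = 2.4                   # max distance of the cart from center
THETA_THRESHOLD = math.radians(12)  # max pole tilt, about 0.2094 rad

def pole_has_fallen(observation):
    """Return True if this observation would end a CartPole episode.

    observation is (cart_position, cart_velocity, pole_angle, pole_angular_velocity).
    The time limit is not checked here; Gym enforces it with a separate wrapper.
    """
    x, _x_dot, theta, _theta_dot = observation
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD

print(pole_has_fallen([0.0, 0.0, 0.0, 0.0]))                # False: centered and upright
print(pole_has_fallen([0.0, 0.0, math.radians(13), 0.0]))   # True: tilted past 12 degrees
```

Note that only position and angle matter for ending the episode; the two velocities can be anything.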

Let's bring in some of our earlier code and get a random agent going.

In [29]:
env.close()

In [13]:
class Episode:
    """Tracks the history of what happened in a playthrough, which we can use for training."""
    def __init__(self):
        self.steps = [] # For each time step, a tuple of (state, action, reward)
        self.got_reward = False
    
    def record_step(self, state, action, reward):
        step = (state, action, reward)
        self.steps.append(step)
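        # Holdover from the earlier code: any reward above -1 counts as a "real" reward.
        # In CartPole every step returns +1.0, so this is always True here.
        if reward > -1:
            self.got_reward = True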

    def __len__(self):
        return len(self.steps)
            
class RandomPolicy:            
    def __init__(self, env):
        self.env = env

    def suggest_action(self, state):
        return self.env.action_space.sample()
    
    def update_policy(self, episode):
        pass
    
    def generalize(self, observation):
        return observation
            
class Agent:
    max_steps_per_episode = 1000
    
    def __init__(self, env, policy):
        self.env = env
        self.policy = policy
    
    def run_episode(self, render=False):
        episode = Episode()
        # Use the agent's own environment rather than the global `env`.
        observation = self.env.reset()
        for _ in range(self.max_steps_per_episode):
            state = self.policy.generalize(observation)
            action = self.policy.suggest_action(state)
            observation, reward, done, info = self.env.step(action)
            episode.record_step(state, action, reward)
            if render:
                self.env.render()
            if done:
                break
        return episode
    
    def train(self, episode_count):
        """Train for a number of episodes.  Returns a list of episode lengths."""
        training_history = []
        for i in range(episode_count):
            episode = self.run_episode()
            self.policy.update_policy(episode)
            training_history.append(len(episode))
        return training_history

rand_agent = Agent(env, RandomPolicy(env))

In [23]:
episode = rand_agent.run_episode(render=True)

Let's look at our state space. How many attributes are in the state, and how many actions are there?

In [22]:
print(env.observation_space)
print(env.observation_space.high)
print(env.observation_space.low)

Box(4,)
[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]
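The four attributes are cart position, cart velocity, pole angle (in radians), and pole angular velocity. The position and angle bounds are twice the termination limits (4.8 = 2 * 2.4, and 0.419 rad is about 24 degrees), while the velocity bounds are just the largest float32 value, so they're effectively unbounded. Here's a small sketch that turns the printed bounds into something readable; `describe_bounds` and the component labels are hypothetical names for illustration, and the numbers are copied from the output above.

```python
import math

# Component names in Gym's documented observation order (labels are our own).
OBSERVATION_NAMES = ("cart position", "cart velocity", "pole angle", "pole angular velocity")

# Upper bounds copied from the env.observation_space.high output above.
HIGH = (4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38)
FLOAT32_MAX = 3.4028235e+38  # largest finite float32, used as "no real limit"

def describe_bounds(high):
    """Map each observation component to a readable description of its bound."""
    descriptions = {}
    for name, bound in zip(OBSERVATION_NAMES, high):
        if bound >= FLOAT32_MAX:
            descriptions[name] = "unbounded (float32 max)"
        elif name == "pole angle":
            descriptions[name] = f"+/- {math.degrees(bound):.1f} degrees"
        else:
            descriptions[name] = f"+/- {bound:.1f} units"
    return descriptions

for name, text in describe_bounds(HIGH).items():
    print(f"{name}: {text}")
```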


In [24]:
print(env.action_space)

Discrete(2)


So the state has 4 continuous attributes, and there are just 2 actions: 0 pushes the cart to the left and 1 pushes it to the right.
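Since CartPole hands out +1.0 on every step, an episode's total reward should simply equal its length. A small helper can tally that along with how often each action was picked. This is a sketch: `summarize_episode` and `ACTION_NAMES` are hypothetical, and the input is assumed to be a list of `(state, action, reward)` tuples shaped like `Episode.steps`.

```python
from collections import Counter

# CartPole action meanings in Gym: 0 pushes the cart left, 1 pushes it right.
ACTION_NAMES = {0: "left", 1: "right"}

def summarize_episode(steps):
    """Summarize a list of (state, action, reward) tuples like Episode.steps."""
    total_reward = sum(reward for _state, _action, reward in steps)
    action_counts = Counter(ACTION_NAMES[action] for _state, action, _reward in steps)
    return {"length": len(steps), "total_reward": total_reward, "actions": dict(action_counts)}

# Hand-made steps in the same shape as Episode.steps (states replaced with None).
example_steps = [(None, 0, 1.0), (None, 0, 1.0), (None, 1, 1.0)]
print(summarize_episode(example_steps))
# {'length': 3, 'total_reward': 3.0, 'actions': {'left': 2, 'right': 1}}
```

In the notebook you'd call it as `summarize_episode(episode.steps)`; for a random agent the left and right counts should come out roughly even.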

Let's look at that episode to see if we were getting any rewards at first.

In [26]:
episode.steps

[(array([-0.03947672, -0.00646558,  0.02318071, -0.00657104]), 0, 1.0),
 (array([-0.03960603, -0.20191217,  0.02304929,  0.2933346 ]), 0, 1.0),
 (array([-0.04364427, -0.39735504,  0.02891598,  0.59319692]), 1, 1.0),
 (array([-0.05159137, -0.20264954,  0.04077992,  0.30976087]), 1, 1.0),
 (array([-0.05564436, -0.00813162,  0.04697514,  0.03021255]), 0, 1.0),
 (array([-0.055807  , -0.20389465,  0.04757939,  0.33733878]), 0, 1.0),
 (array([-0.05988489, -0.39966024,  0.05432617,  0.64463792]), 1, 1.0),
 (array([-0.06787809, -0.20533579,  0.06721892,  0.3695453 ]), 0, 1.0),
 (array([-0.07198481, -0.4013452 ,  0.07460983,  0.68264292]), 1, 1.0),
 (array([-0.08001171, -0.20733428,  0.08826269,  0.41435111]), 0, 1.0),
 (array([-0.0841584 , -0.40358914,  0.09654971,  0.73350303]), 1, 1.0),
 (array([-0.09223018, -0.20992435,  0.11121977,  0.47270031]), 0, 1.0),
 (array([-0.09642867, -0.40642692,  0.12067378,  0.79826425]), 1, 1.0),
 (array([-0.10455721, -0.21314896,  0.13663906,  0.5458477 ]), 1