
# Reinforcement Learning
### env: _Taxi-v2_

In [1]:
import gym

## Loading and initializing an environment

In [2]:
env = gym.make('Taxi-v2')
env.reset()

[2017-10-17 13:37:22,537] Making new env: Taxi-v2


164

### Observation states

In [3]:
print('Total number of states = {:,}'.format(env.observation_space.n))

Total number of states = 500


### Visualizing the state
In this environment, the yellow square represents the taxi, each pipe character (“|”) represents a wall, the blue letter marks the pick-up location, and the purple letter marks the drop-off location. The taxi turns green when it has a passenger aboard. While we see colors and shapes, the algorithm does not perceive the environment as we do: it only receives a flattened state, in this case a single integer.

In [4]:
env.render()

+---------+
|[35mR[0m: | : :[34;1mG[0m|
| : : :[43m [0m: |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
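
To make the flattened state concrete, here is a minimal sketch that packs the taxi row, taxi column, passenger location, and destination into one integer and unpacks it again. It assumes the layout Gym's Taxi environment uses as we understand it (5 rows, 5 columns, 5 passenger locations, 4 destinations, which gives the 500 states reported above). The names `encode_state` and `decode_state` are illustrative and not part of the Gym API.

```python
# Assumed location indices: R=0, G=1, Y=2, B=3; passenger location 4 means "in the taxi".
LOCATIONS = ["R", "G", "Y", "B", "in taxi"]


def encode_state(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four components into a single integer in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination


def decode_state(state):
    """Reverse encode_state: return (taxi_row, taxi_col, passenger_loc, destination)."""
    state, destination = divmod(state, 4)
    state, passenger_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_loc, destination


# State 164 is the value returned by env.reset() earlier in this notebook.
row, col, passenger, destination = decode_state(164)
print(row, col, LOCATIONS[passenger], LOCATIONS[destination])
```

Under these assumptions, state 164 decodes to a taxi in row 1, column 3, with the passenger waiting at G and a destination of R, which agrees with the rendering above.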



### Action space
The action space contains six discrete actions. Gym will not always tell you what the actions mean, but in this case they are: down (0), up (1), right (2), left (3), pick-up (4), and drop-off (5).

In [5]:
print('Total number of actions the agent can carry out = {:,}'.format(env.action_space.n))

Total number of actions the agent can carry out = 6
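
Because Gym only exposes the actions as integers, it can help to keep a small lookup table next to your code. The sketch below is one way to do that; `ACTION_NAMES`, `ACTION_IDS`, and `describe_action` are our own illustrative names, not something Gym provides.

```python
# Human-readable labels for the six Taxi actions listed above.
ACTION_NAMES = {
    0: "down",
    1: "up",
    2: "right",
    3: "left",
    4: "pick-up",
    5: "drop-off",
}

# Reverse lookup so code can say ACTION_IDS["up"] instead of a bare 1.
ACTION_IDS = {name: action for action, name in ACTION_NAMES.items()}


def describe_action(action):
    """Return a label such as 'up (1)' for an integer action."""
    return "{} ({})".format(ACTION_NAMES[action], action)


for action in sorted(ACTION_NAMES):
    print(describe_action(action))
```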


### Overriding and moving the agent state

In [6]:
env.env.s = 114
env.render()

+---------+
|R: | : :G|
|[43m [0m: : : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+



In [7]:
step = env.step(1)  # move up (1)
print(step)
env.render()

(14, -1, False, {'prob': 1.0})
+---------+
|[43mR[0m: | : :G|
| : : : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+
  (North)


In [8]:
step = env.step(0)  # move down (0)
print(step)
env.render()

(114, -1, False, {'prob': 1.0})
+---------+
|R: | : :G|
|[43m [0m: : : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+
  (South)


In [9]:
step = env.step(2)  # move right (2)
print(step)
env.render()

(134, -1, False, {'prob': 1.0})
+---------+
|R: | : :G|
| :[43m [0m: : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+
  (East)


In [10]:
step = env.step(3)  # move left (3)
print(step)
env.render()

(114, -1, False, {'prob': 1.0})
+---------+
|R: | : :G|
|[43m [0m: : : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+
  (West)


In [11]:
step = env.step(1)  # move up (1)
print(step)
env.render()

(14, -1, False, {'prob': 1.0})
+---------+
|[43mR[0m: | : :G|
| : : : : |
| : : : : |
| | : | : |
|[35mY[0m| : |[34;1mB[0m: |
+---------+
  (North)
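
Each call to `env.step()` above returns a 4-tuple: the next state, the reward, a done flag, and an info dictionary. The sketch below replays the five outputs recorded in the cells above (no environment is needed) and measures how far each move shifts the state number. The `StepResult` field names are our own labels for the tuple positions.

```python
from collections import namedtuple

# Our own labels for the 4-tuple returned by the classic Gym step() call.
StepResult = namedtuple("StepResult", ["state", "reward", "done", "info"])

# (action, step output) pairs copied from the cells above, starting from state 114.
observed = [
    (1, (14, -1, False, {"prob": 1.0})),   # up
    (0, (114, -1, False, {"prob": 1.0})),  # down
    (2, (134, -1, False, {"prob": 1.0})),  # right
    (3, (114, -1, False, {"prob": 1.0})),  # left
    (1, (14, -1, False, {"prob": 1.0})),   # up
]

state = 114
total_reward = 0
deltas = []
for action, raw in observed:
    result = StepResult(*raw)
    deltas.append((action, result.state - state))  # change in the state number
    total_reward += result.reward
    state = result.state

print(deltas)
print(total_reward)
```

In these recorded steps, a vertical move changes the state number by 100 and a horizontal move changes it by 20, and every move costs a reward of -1.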


## Random Actions

One surprising way to solve this environment is to choose randomly among the six possible actions. The environment is considered solved when the taxi picks up the passenger and drops them off at their desired location, at which point you receive a reward of 20 and `done` equals `True`. The odds that any particular sequence of random actions succeeds are small, but given enough of them the agent will eventually luck out. A random agent is also a useful baseline: a core part of evaluating any agent’s performance is comparing it with one that acts completely at random. In a Gym environment, you can choose a random action using **`env.action_space.sample()`**. You can create a loop that takes random actions until the environment is solved. We will put a counter in the loop to see how many steps it takes to solve the environment.
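
A minimal sketch of such a loop is shown here, assuming the classic Gym API in which `env.step()` returns `(state, reward, done, info)`. The function name `run_random_agent` and its parameters are illustrative. The sketch treats a reward of 20 as the signal that the passenger was delivered; if an episode ends without that reward (for example because of a step limit), it resets the environment and keeps counting.

```python
def run_random_agent(env, solved_reward=20, max_steps=1000000):
    """Take random actions until the delivery reward is seen.

    Returns (steps, total_reward). max_steps is a safety cap so the
    loop always terminates.
    """
    env.reset()
    steps = 0
    total_reward = 0
    while steps < max_steps:
        action = env.action_space.sample()  # pick one of the actions at random
        state, reward, done, info = env.step(action)
        steps += 1
        total_reward += reward
        if reward == solved_reward:
            break  # passenger dropped off at the right place
        if done:
            env.reset()  # episode ended without a delivery; start again
    return steps, total_reward
```

With the environment created earlier, you could call `steps, total_reward = run_random_agent(env)` and print `steps` to see how long the random agent needed.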