# 00 Environment

#### 👉Before you solve a Reinforcement Learning problem, you need to define:
- the actions
- the states of the world
- the rewards

#### 👉We are using the `Taxi-v3` environment, originally from OpenAI's gym and now maintained in Gymnasium: https://gym.openai.com/envs/Taxi-v3/

#### 👉`Taxi-v3` is an easy environment because the action space is small (6 actions) and the state space, while larger (500 states), is finite.

#### 👉Environments with a finite number of actions and states are called tabular
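
To make "tabular" concrete, here is a minimal sketch of a table with one entry per (state, action) pair. The sizes 500 and 6 are those of `Taxi-v3`; the name `table` and the zero initial values are illustrative assumptions, not part of the environment.

```python
# Sizes for Taxi-v3: 500 states, 6 actions.
n_states = 500
n_actions = 6

# One row per state, one column per action (`table` is an illustrative name).
table = [[0.0 for _ in range(n_actions)] for _ in range(n_states)]

print(len(table), "rows x", len(table[0]), "columns")
```

Because both dimensions are finite, the whole table fits in memory and every entry can be addressed directly as `table[state][action]`.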

In [None]:
%load_ext autoreload
%autoreload 2
%pylab inline
%config InlineBackend.figure_format = 'svg'

## Load the environment 🌎

In [None]:
import gymnasium as gym
env = gym.make("Taxi-v3", render_mode=None)

## Action space

In [None]:
print("Action Space {}".format(env.action_space))
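
The six actions are plain integers from 0 to 5. A small lookup table makes them easier to read; the mapping below follows the Gymnasium documentation for `Taxi-v3`, and the name `ACTION_NAMES` is only illustrative.

```python
# Meaning of each discrete action in Taxi-v3.
# ACTION_NAMES is an illustrative name, not part of the library.
ACTION_NAMES = {
    0: "move south",
    1: "move north",
    2: "move east",
    3: "move west",
    4: "pick up passenger",
    5: "drop off passenger",
}

for action, name in ACTION_NAMES.items():
    print(action, name)
```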

## State space

In [None]:
print("State Space {}".format(env.observation_space))
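
The 500 states come from 25 taxi positions on a 5x5 grid, 5 passenger locations (4 fixed locations plus "in the taxi"), and 4 destinations: 25 x 5 x 4 = 500. The sketch below mirrors how the Taxi environment packs these four values into a single integer; `encode_state` and `decode_state` are illustrative names, not library functions.

```python
def encode_state(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four components into one integer in [0, 499]."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination


def decode_state(state):
    """Inverse of encode_state: recover the four components."""
    state, destination = divmod(state, 4)
    state, passenger_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_loc, destination


# State 123: taxi at row 1, column 1, passenger at location 0, destination 3.
print(decode_state(123))

# The same configuration with the taxi one row further south.
print(encode_state(2, 1, 0, 3))
```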

## Rewards

In [None]:
# env.P is a nested dictionary.
# - The 1st key represents the state, from 0 to 499
# - The 2nd key represents the action taken by the agent,
#   from 0 to 5

# example
state = 123
action = 0  # move south

# env.P[state][action][0] is a tuple with 4 elements
# (probability, next_state, reward, done)
# 
#  - probability
#    It is always 1 in this environment, which means
#    there are no external/random factors that determine the
#    next_state
#    apart from the agent's action a.
#
#  - next_state: 223 in this case
# 
#  - reward: -1 in this case
#
#  - done: boolean (True/False) indicates whether the
#    episode has ended (i.e. the driver has dropped the
#    passenger at the correct destination)
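# Read P from the unwrapped environment: gym.make() returns wrappers,
# and newer Gymnasium versions may not forward attributes such as P.
print('env.P[state][action][0]: ', env.unwrapped.P[state][action][0])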
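
The reward of -1 in this example is the per-step penalty. As a rough summary of the reward scheme described in the Gymnasium documentation for `Taxi-v3` (-1 per step, +20 for delivering the passenger, -10 for an illegal pick-up or drop-off), here is a small sketch. The helper `sketch_reward` and its parameters are hypothetical; the environment itself stores rewards in its transition table rather than calling such a function.

```python
PICKUP, DROPOFF = 4, 5  # action indices in Taxi-v3


def sketch_reward(action, is_legal=True, delivers_passenger=False):
    """Hypothetical helper summarizing the Taxi-v3 reward rule."""
    if action in (PICKUP, DROPOFF) and not is_legal:
        return -10  # illegal pick-up or drop-off
    if action == DROPOFF and delivers_passenger:
        return 20   # passenger dropped at the correct destination
    return -1       # every other step, including all moves


print(sketch_reward(0))                                 # a move
print(sketch_reward(PICKUP, is_legal=False))            # illegal pick-up
print(sketch_reward(DROPOFF, delivers_passenger=True))  # successful drop-off
```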

In [None]:
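# render() only produces output when the environment has a render mode,
# so recreate it with render_mode="ansi" (text output).
env = gym.make("Taxi-v3", render_mode="ansi")

# reset() must be called at least once before render() will work.
# Note: `seed` only seeds the random number generator; it does not pick a state.
env.reset(seed=123)

# Set the state from the example above directly, then render it.
env.unwrapped.s = 123
print(env.render())

In [None]:
# The state reached after moving south from state 123.
env.unwrapped.s = 223
print(env.render())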