# 00 Environment

#### 👉Before you solve a Reinforcement Learning problem you need to define:
- the actions
- the states of the world
- the rewards

#### 👉`MountainCar-v0` is still an easy environment, but harder than `Taxi-v3`, which we solved in [part 1](https://github.com/Paulescu/hands-on-rl/tree/main/01_taxi) of the course.

#### 👉`MountainCar-v0` is **not** a tabular environment (tabular means a finite number of actions and states), because its state is continuous. With a small trick it can become one, which is why I say it is **still an easy environment**.

#### 👉Let's explore it!

In [None]:
%load_ext autoreload
%autoreload 2
%pylab inline
%config InlineBackend.figure_format = 'svg'

from matplotlib import pyplot as plt
%matplotlib inline

## Load the environment 🌎

In [None]:
import gymnasium as gym
env = gym.make('MountainCar-v0', render_mode='rgb_array')

## Plot it 🎨

In [None]:
# Workaround for pygame error: "error: No available video device"
# See https://stackoverflow.com/questions/15933493/pygame-error-no-available-video-device?rq=1
# This is probably needed only for Linux
import os
os.environ["SDL_VIDEODRIVER"] = "dummy"
_, _ = env.reset()
frame = env.render()

fig, ax = plt.subplots(figsize=(8, 6))
ax.axes.yaxis.set_visible(False)
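# Read the position bounds from the unwrapped env: recent Gymnasium versions
# no longer forward attributes such as min_position through wrappers.
ax.imshow(frame, extent=[env.unwrapped.min_position, env.unwrapped.max_position, 0, 1])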

## Action space

- `0` Accelerate to the left
- `1` Don't accelerate
- `2` Accelerate to the right

In [None]:
print("Action Space {}".format(env.action_space))
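
To get a feel for what these three actions do to the car, here is a minimal sketch of the velocity update. The constants and the formula come from my reading of the Gymnasium Mountain Car documentation, not from this notebook. `next_velocity` is a hypothetical helper, and it ignores the clipping of the velocity to the speed limit.

```python
import math

# Assumed values, as documented for Gymnasium's Mountain Car.
FORCE = 0.001     # strength of the engine push
GRAVITY = 0.0025  # strength of the pull from the slope


def next_velocity(position, velocity, action):
    """Velocity after one step, before clipping to the speed limit.

    action - 1 turns the action ids 0, 1, 2 into a push of -1, 0, +1.
    """
    return velocity + (action - 1) * FORCE - math.cos(3 * position) * GRAVITY


# At position 0 the slope pulls hardest. Even a full push to the right
# leaves the car with a negative (leftward) velocity.
print(next_velocity(position=0.0, velocity=0.0, action=2))
```

Under these assumptions the engine alone cannot beat gravity on the steep part of the hill, so the car has to swing back and forth to build momentum.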

## State space

In [None]:
# The state consists of 2 numbers:
# - Car's position, from -1.2 to 0.6
# - Car's velocity, from -0.07 to 0.07
print("State Space {}".format(env.observation_space))

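# Access the bounds through env.unwrapped: recent Gymnasium versions
# no longer forward these attributes through wrappers.
print(f'Position ranges from {env.unwrapped.min_position} to {env.unwrapped.max_position}')
print(f'Velocity ranges from {-env.unwrapped.max_speed} to {env.unwrapped.max_speed}')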
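
Both state variables are continuous, which is what keeps this environment from being tabular. The notebook does not spell out the "small trick" at this point. One common option, assumed in this sketch, is to cut each range into equally sized bins so that every state maps to a pair of integers. `to_bin`, `discretize` and the choice of 20 bins are hypothetical.

```python
# Bounds of the state, as listed in the cell above.
POSITION_RANGE = (-1.2, 0.6)
VELOCITY_RANGE = (-0.07, 0.07)


def to_bin(value, low, high, n_bins):
    """Map a value in [low, high] to an integer bin index in [0, n_bins - 1]."""
    ratio = (value - low) / (high - low)
    index = int(ratio * n_bins)
    # Clamp so that value == high (or a slightly out-of-range value)
    # still lands in a valid bin.
    return min(max(index, 0), n_bins - 1)


def discretize(state, n_bins=20):
    """Turn a continuous (position, velocity) pair into a pair of bin indices."""
    position, velocity = state
    return (
        to_bin(position, *POSITION_RANGE, n_bins),
        to_bin(velocity, *VELOCITY_RANGE, n_bins),
    )


print(discretize((-1.0, 0.0)))
```

With 20 bins per variable there are 20 x 20 = 400 discrete states, few enough to index a table.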

## Rewards

- A reward of -1 is given at every step while the car's position is below 0.5 (the goal).
- The episode ends once the car's position reaches 0.5, or once the max number of steps has been reached: `n_steps >= env._max_episode_steps` (200 for `MountainCar-v0`).

Because every step costs -1, the agent is encouraged to escape the valley as fast as possible.
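
To see this reward structure in action, here is a minimal sketch that plays one episode and adds up the rewards. `run_episode` and `policy` are hypothetical names. The function only assumes that the environment follows the Gymnasium `reset`/`step` interface.

```python
def run_episode(env, policy):
    """Play one episode and return (total_reward, n_steps).

    policy is any function that maps an observation to an action.
    """
    observation, _ = env.reset()
    total_reward, n_steps = 0.0, 0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        n_steps += 1
        # terminated: the car reached the goal
        # truncated: the step limit was hit
        done = terminated or truncated
    return total_reward, n_steps
```

For example, `run_episode(env, lambda obs: env.action_space.sample())` plays a random policy on the `env` loaded above. Since every step costs -1, the total reward is simply minus the number of steps. A random policy will typically run into the step limit without reaching the flag.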