## The Taxi Environment

In this section, we represent the task of the [Taxi
environment](https://gym.openai.com/envs/Taxi-v3/)
[@dietterich2000hierarchical]
as a finite-state machine and use the TempRL
framework to set up training.

```python
from typing import cast
from gym.envs.toy_text import TaxiEnv
```

```python
import gym

env = gym.make("Taxi-v3")
print(f"State space: {env.observation_space}")
print(f"Action space: {env.action_space}")

env.reset()
env.render()
```

```text
State space: Discrete(500)
Action space: Discrete(6)
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
```

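For the purposes of this tutorial, we consider a factorized state space:
the 500 discrete states correspond to the $5 \times 5 \times 5 \times 4 = 500$
combinations of the following variables.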

- `taxi_row`: an integer between 0 and 4
- `taxi_col`: an integer between 0 and 4
- `passenger_location`: an integer between 0 and 4:
    - 0: R(ed)
    - 1: G(reen)
    - 2: Y(ellow)
    - 3: B(lue)
    - 4: in taxi
- `destination`: an integer between 0 and 3
    - 0: R(ed)
    - 1: G(reen)
    - 2: Y(ellow)
    - 3: B(lue)
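The four variables are packed into the single integer observation that Gym returns. As a minimal sketch, assuming the mixed-radix scheme we believe gym's `TaxiEnv.encode`/`TaxiEnv.decode` use (radices 5, 5, 5, 4, with `taxi_row` most significant), the mapping can be reproduced in plain Python. `encode_state`, `decode_state` and `RADICES` are hypothetical names, not part of gym.

```python
from itertools import product

# Assumed radices of the factorized state, most significant first:
# taxi_row (5), taxi_col (5), passenger_location (5), destination (4).
RADICES = (5, 5, 5, 4)


def encode_state(taxi_row: int, taxi_col: int, passenger_location: int, destination: int) -> int:
    """Pack the four state variables into one integer (mixed-radix encoding)."""
    index = 0
    for value, radix in zip((taxi_row, taxi_col, passenger_location, destination), RADICES):
        if not 0 <= value < radix:
            raise ValueError(f"value {value} out of range for radix {radix}")
        index = index * radix + value
    return index


def decode_state(index: int) -> tuple[int, ...]:
    """Inverse of encode_state: unpack an integer into the four variables."""
    values = []
    for radix in reversed(RADICES):
        index, value = divmod(index, radix)
        values.append(value)
    return tuple(reversed(values))


# Every combination of the four variables gets a distinct index in 0..499.
all_indices = {encode_state(*s) for s in product(*(range(r) for r in RADICES))}
print(len(all_indices), min(all_indices), max(all_indices))  # 500 0 499
```

For example, `encode_state(1, 2, 1, 3)` returns 147, and `decode_state(147)` gives back `(1, 2, 1, 3)`.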

There are 6 deterministic actions:

- 0: move south
- 1: move north
- 2: move east
- 3: move west
- 4: pickup passenger
- 5: dropoff passenger
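To make the action indices readable in code, one option is an `IntEnum`. The hypothetical `move` helper sketches how the four movement actions change the taxi position on the 5×5 grid, clamping at the border. It deliberately ignores the interior walls drawn as `|` in the rendered map, so it only approximates the real dynamics; `TaxiAction`, `move` and `GRID_SIZE` are illustrative names, not gym API.

```python
from enum import IntEnum


class TaxiAction(IntEnum):
    """Action indices of Taxi-v3, as listed above."""
    SOUTH = 0
    NORTH = 1
    EAST = 2
    WEST = 3
    PICKUP = 4
    DROPOFF = 5


GRID_SIZE = 5  # rows and columns both range over 0..4


def move(taxi_row: int, taxi_col: int, action: TaxiAction) -> tuple[int, int]:
    """Approximate new taxi position; interior walls are NOT modeled."""
    if action == TaxiAction.SOUTH:
        taxi_row = min(taxi_row + 1, GRID_SIZE - 1)
    elif action == TaxiAction.NORTH:
        taxi_row = max(taxi_row - 1, 0)
    elif action == TaxiAction.EAST:
        taxi_col = min(taxi_col + 1, GRID_SIZE - 1)
    elif action == TaxiAction.WEST:
        taxi_col = max(taxi_col - 1, 0)
    # PICKUP and DROPOFF never change the taxi position.
    return taxi_row, taxi_col


print(move(1, 2, TaxiAction.SOUTH))  # (2, 2)
print(move(0, 0, TaxiAction.NORTH))  # (0, 0)
```

Because `TaxiAction` subclasses `int`, a member such as `TaxiAction.PICKUP` can be passed wherever an integer action is expected, e.g. `env.step(TaxiAction.PICKUP)`.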

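```python
class TaxiWrapper(gym.ObservationWrapper):
    """Expose observations as (taxi_row, taxi_col, passenger_location, destination)."""

    def observation(self, observation):
        """Decode the integer observation into a tuple of state variables."""
        # cast() only informs the type checker, so its return value must be used.
        taxi_env = cast(TaxiEnv, self.unwrapped)
        # TaxiEnv.decode returns an iterator, so materialize it as a tuple.
        return tuple(taxi_env.decode(observation))


wrapper = TaxiWrapper(env)
taxi_row, taxi_col, passenger_location, destination = wrapper.reset()
wrapper.render()
print(f"taxi_row: {taxi_row}")
print(f"taxi_col: {taxi_col}")
print(f"passenger_location: {passenger_location}")
print(f"destination: {destination}")
```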

```text
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+

taxi_row: 1
taxi_col: 2
passenger_location: 1
destination: 3
```

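To model the problem as a temporal-goal task,
we consider two representations:

- $(x, y) \in \{0, \dots, 4\}^2$, the position of the taxi, used as the agent's low-level features; and
- a set of fluents $\mathcal{F}$, where $pAtX$ means that the passenger is at location $X \in \{R, G, Y, B\}$, $pOnT$ means that the passenger is on the taxi, and $dAtX$ means that the destination is $X$: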
    - $pAtR$
    - $pAtY$
    - $pAtG$
    - $pAtB$
    - $pOnT$
    - $dAtR$
    - $dAtY$
    - $dAtG$
    - $dAtB$

The available high-level actions are $pickUp(X)$ and $dropOff(X)$, with $X \in \{R, G, Y, B\}$.
Note that the movements of the taxi do not need to be modeled at this level
of abstraction.
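To illustrate this abstraction, the sketch below treats $pickUp(X)$ and $dropOff(X)$ as STRIPS-style operators on sets of fluents. The preconditions and effects are our assumption rather than something stated here: $pickUp(X)$ requires $pAtX$ and replaces it with $pOnT$, while $dropOff(X)$ requires $pOnT$ and replaces it with $pAtX$. `pick_up`, `drop_off` and `Fluents` are hypothetical names.

```python
# A high-level state is the set of fluents that currently hold.
Fluents = frozenset[str]


def pick_up(state: Fluents, location: str) -> Fluents:
    """pickUp(X): requires pAtX; afterwards the passenger is on the taxi."""
    if f"pAt{location}" not in state:
        raise ValueError(f"pickUp({location}) is not applicable")
    return (state - {f"pAt{location}"}) | {"pOnT"}


def drop_off(state: Fluents, location: str) -> Fluents:
    """dropOff(X): requires pOnT; afterwards the passenger is at X."""
    if "pOnT" not in state:
        raise ValueError(f"dropOff({location}) is not applicable")
    return (state - {"pOnT"}) | {f"pAt{location}"}


# Hypothetical episode: passenger at green, destination red.
state: Fluents = frozenset({"pAtG", "dAtR"})
state = pick_up(state, "G")
state = drop_off(state, "R")
print(sorted(state))  # ['dAtR', 'pAtR']
```

Two high-level steps suffice, whereas the low-level policy still has to drive the taxi between the pickup and dropoff locations.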

Assume that in a given episode the passenger is
currently at green and the goal is to bring them
to the red destination.

The initial condition is the following:

$$
\mathcal{I} = pAtG \wedge dAtR
$$

The goal can be represented by the following LTL$_f$ formula, where $\lozenge$ is the "eventually" operator:
$$
\lozenge(pAtR \wedge dAtR)
$$
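The goal formula above says that, at some step, the passenger is at red while the destination is red (fluents $pAtR$ and $dAtR$ both hold). Since the task is meant to be represented as a finite-state machine, it may help to see that this "eventually" goal corresponds to a two-state automaton: it waits until a step where both fluents hold, then moves to an absorbing accepting state. The hand-written sketch below runs that automaton over a finite trace of fluent sets; it is not TempRL's API, and `step`, `goal_dfa_accepts`, `WAITING` and `DONE` are hypothetical names.

```python
from collections.abc import Iterable

# Two automaton states for the LTL_f formula  <>(pAtR & dAtR).
WAITING, DONE = 0, 1
ACCEPTING = {DONE}


def step(q: int, fluents: frozenset[str]) -> int:
    """Transition function: DONE is absorbing; otherwise test the goal condition."""
    if q == DONE:
        return DONE
    return DONE if {"pAtR", "dAtR"} <= fluents else WAITING


def goal_dfa_accepts(trace: Iterable[frozenset[str]]) -> bool:
    """Run the automaton over a finite trace and report acceptance."""
    q = WAITING
    for fluents in trace:
        q = step(q, fluents)
    return q in ACCEPTING


success = [frozenset({"pAtG", "dAtR"}), frozenset({"pOnT", "dAtR"}), frozenset({"pAtR", "dAtR"})]
failure = [frozenset({"pAtG", "dAtR"}), frozenset({"pOnT", "dAtR"})]
print(goal_dfa_accepts(success), goal_dfa_accepts(failure))  # True False
```

A common design is to track the automaton state alongside the environment state and give a reward when the accepting state is reached; how TempRL wires this up is not shown here.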

