# **Tabular Reinforcement Learning**

# Q-Learning on FrozenLake environment

## Non-Evaluable Practical Exercise

This is a non-evaluable practical exercise, but it is recommended that students complete it fully and individually, since it is an important part of the learning process.

The solution will be available, although it is not recommended that students consult the solution until they have completed the exercise. 

## The FrozenLake environment

In this activity, we are going to solve the [Frozen Lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) environment.

Main characteristics:
- The game starts with the player at location [0,0] of the frozen lake grid world, with the goal located at the far extent of the world, e.g. [3,3] for the 4x4 environment.
- Holes in the ice are distributed in set locations when using a pre-determined map or in random locations when a random map is generated.
- The player makes moves until they reach the goal or fall in a hole.
- The lake is slippery (unless disabled) so the player may move perpendicular to the intended direction sometimes (see _is_slippery_ param).
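To make the grid layout and the slippery dynamics concrete, here is a minimal sketch in plain Python that does not call Gymnasium. It assumes the conventions described in the Gymnasium documentation: states are numbered `row * ncols + col`, actions are encoded as 0=left, 1=down, 2=right, 3=up, and with the default settings a slippery move goes in the intended direction or in one of the two perpendicular directions with probability 1/3 each. The helper names (`to_state`, `possible_moves`, `sample_move`) are hypothetical and exist only for this illustration.

```python
import random

# Action encoding documented for FrozenLake: 0=left, 1=down, 2=right, 3=up.
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}


def to_state(row, col, ncols=4):
    """Flatten a [row, col] grid position into the integer state index."""
    return row * ncols + col


def possible_moves(intended):
    """Directions that can occur on slippery ice: the intended one and both perpendiculars."""
    return [(intended - 1) % 4, intended, (intended + 1) % 4]


def sample_move(intended, rng):
    """Pick one of the three candidate directions uniformly (assumed default: 1/3 each)."""
    return rng.choice(possible_moves(intended))


rng = random.Random(0)
goal_state = to_state(3, 3)  # goal cell of the 4x4 map
moves_for_right = [ACTION_NAMES[a] for a in possible_moves(2)]
sampled = [sample_move(2, rng) for _ in range(1000)]

print("Goal state index:", goal_state)
print("Possible outcomes when choosing 'right':", moves_for_right)
```

Note that the opposite direction never occurs: choosing "right" can move the player down, right, or up, but never left.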

<img src="https://gymnasium.farama.org/_images/frozen_lake.gif" />

## Q-Learning algorithm

<u>Question 1</u>: **Implement the *Q-Learning* algorithm** using the following parameters:

- Number of episodes = 15,000
- *learning rate* = 0.8
- *discount factor* = 1
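As a reminder of what these parameters control, the standard tabular Q-Learning update is $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$, where $\alpha$ is the learning rate and $\gamma$ the discount factor. The sketch below applies that single update step in isolation. It is not the exercise solution: the function name `q_learning_update`, the list-of-lists Q-table, and the hand-picked transitions are assumptions made only for illustration.

```python
def q_learning_update(Q, state, action, reward, next_state, terminated,
                      learning_rate=0.8, discount=1.0):
    """Apply one tabular Q-Learning update in place and return the new Q(s, a)."""
    # A terminal next state has no future value to bootstrap from.
    best_next = 0.0 if terminated else max(Q[next_state])
    td_target = reward + discount * best_next
    Q[state][action] += learning_rate * (td_target - Q[state][action])
    return Q[state][action]


# Hypothetical Q-table for the 4x4 map: 16 states x 4 actions, all zeros.
Q = [[0.0] * 4 for _ in range(16)]

# Moving right (action 2) from state 14 reaches the goal (state 15) with reward 1.
value_near_goal = q_learning_update(Q, 14, 2, 1.0, 15, terminated=True)

# Moving right from state 13 reaches state 14 with reward 0; the value propagates back.
value_one_step_back = q_learning_update(Q, 13, 2, 0.0, 14, terminated=False)

print(value_near_goal, value_one_step_back)
```

With a discount factor of 1, the value learned next to the goal propagates backwards without shrinking due to discounting; only the learning rate scales each individual update.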

Additionally, implement the **$\epsilon$-Greedy with decay factor** method with the following parameters:
- *epsilon* = 1.0
- *max_epsilon* = 1.0
- *min_epsilon* = 0.01
- *decay_rate* = 0.005
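The exercise does not fix the exact decay formula. One common choice, assumed here, is an exponential schedule: $\epsilon_t = \epsilon_{min} + (\epsilon_{max} - \epsilon_{min}) \, e^{-\text{decay\_rate} \cdot t}$, where $t$ is the episode index. The sketch below shows that schedule together with a basic $\epsilon$-greedy action choice. The function names `decayed_epsilon` and `epsilon_greedy_action` are hypothetical, and other schedules (for example linear or multiplicative decay) are equally valid.

```python
import math
import random


def decayed_epsilon(episode, min_epsilon=0.01, max_epsilon=1.0, decay_rate=0.005):
    """Exponential decay from max_epsilon towards min_epsilon (assumed schedule)."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest action value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)

# Epsilon at a few episode indices, from the first episode to the last of 15,000.
schedule = [decayed_epsilon(ep) for ep in (0, 100, 1000, 15000)]

# With epsilon = 0 the choice is purely greedy.
greedy_choice = epsilon_greedy_action([0.1, 0.5, 0.2, 0.0], epsilon=0.0, rng=rng)

# With epsilon = 1 the choice is purely random.
random_choice = epsilon_greedy_action([0.1, 0.5, 0.2, 0.0], epsilon=1.0, rng=rng)

print(schedule, greedy_choice, random_choice)
```

Under this schedule the agent explores almost exclusively in the first episodes and acts almost greedily after roughly a thousand episodes, so `decay_rate` is a natural hyperparameter to vary in Question 2.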

<u>Question 2</u>: Once you have coded the algorithm, try different **values for the hyperparameters** and comment on which ones work best (providing an empirical comparison):

- Number of episodes
- *learning rate* 
- *discount factor* 
- *epsilon* values (including min value and decay factor)

<u>Question 3</u>: Try to solve the same environment but using an _8 x 8_ grid (also in slippery mode):
> gym.make(ENV_NAME, desc=None, map_name="8x8", is_slippery=True)

In [1]:
import gymnasium as gym

# defining the environment
env = gym.make("FrozenLake-v1", desc=None, map_name="4x4", is_slippery=False)

print("Gymnasium version is {} ".format(gym.__version__))
print("Action space is {} ".format(env.action_space))
print("Observation space is {} ".format(env.observation_space))

Gymnasium version is 1.2.0 
Action space is Discrete(4) 
Observation space is Discrete(16) 


In [2]:
from collections import defaultdict

import numpy as np


def epsilon_greedy_policy(Q, state, nA, epsilon):
    '''
    Create a policy where epsilon dictates the probability of a random action being carried out.

    :param Q: mapping from state to action values (dictionary)
    :param state: state in which the agent is (int)
    :param nA: number of actions (int)
    :param epsilon: probability of taking a random action (float)
    :return: probability of each action (array)
    '''

    # Placeholder: to be completed as part of Question 1
    probs = np.ones(nA)
    
    return probs


def QLearning(episodes, learning_rate, discount, epsilon):
    '''
    Learn to solve the environment using the Q-Learning algorithm

    :param episodes: Number of episodes (int)
    :param learning_rate: Learning rate (float [0, 1])
    :param discount: Discount factor (float [0, 1])
    :param epsilon: probability of taking a random action (float [0, 1])
    :return: x, y: number of episodes and number of steps
    :return: Q: action value function
    '''

    # Map each state to its action values (initialised to zero)
    Q = defaultdict(lambda: np.zeros(env.action_space.n))

    # Placeholder: the learning loop is to be completed as part of Question 1
    return Q

<div class="alert alert-block alert-danger">
<strong>Solution</strong>
</div>