<a href="https://colab.research.google.com/github/tawsifkamal/Q-learning/blob/main/FrozenLake.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Reinforcement Learning in FrozenLake**
#### In this notebook, we use a basic Q-learning algorithm, a reinforcement learning method, to train an agent to play OpenAI Gym's FrozenLake game.

**The steps to implement the algorithm are as follows:**

1. Initialize the FrozenLake environment, define the global variables and hyperparameters, and create an empty Q-table.

2. Iterate through 10,000 episodes of up to 100 time steps each, training the RL agent by updating the Q-values in the Q-table with the Bellman optimality equation.

3. Play the game by selecting the optimal actions from the updated Q-table and watch the agent win! 
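
For reference, the update applied in step 2 can be written as below. The symbols are notation assumed here rather than taken from the notebook: $\alpha$ is the learning rate (`lr` in the code), $\gamma$ is the discount factor (`discount_rate`), $r$ is the reward received, and $s'$ is the state reached after taking action $a$ in state $s$.

$$
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
$$

The first term keeps a fraction of the old estimate, and the second term moves it toward the observed reward plus the discounted value of the best action in the next state.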

### **1. Setting up the Environment, global variables, and Q-table**  

In [1]:
# import the modules needed
import numpy as np
import gym
import random 
import time 
from IPython.display import clear_output

In [2]:
# initializing the environment and making the q_table
env = gym.make('FrozenLake-v0')
action_space = env.action_space.n
state_space = env.observation_space.n
q_table = np.zeros((state_space, action_space))

In [3]:
# Check the shape: (number of states, number of actions)
q_table.shape

(16, 4)

In [4]:
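# Setting the hyperparameters
num_episodes = 10000
rewards_all_episodes = []       # total reward collected in each episode
exploration_rate = 1            # start by exploring on every step
exploration_decay_rate = 0.001  # speed of the exponential decay per episode
max_exploration_rate = 1
min_exploration_rate = 0.01
lr = 0.1                        # learning rate
discount_rate = 0.99            # weight given to future rewards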
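
To get a feel for these settings, the sketch below evaluates the exploration schedule used later in training at a few episode indices. The helper name `exploration_rate_at` is hypothetical and introduced only for illustration; its default arguments mirror the hyperparameters above.

```python
import numpy as np

def exploration_rate_at(episode, min_rate=0.01, max_rate=1.0, decay_rate=0.001):
    """Exponentially decayed exploration rate for a given episode index.

    Hypothetical helper: same formula as the training loop, wrapped in a function.
    """
    return min_rate + (max_rate - min_rate) * np.exp(-decay_rate * episode)

# The rate starts at max_rate and approaches min_rate as episodes accumulate.
for episode in (0, 1000, 5000, 9999):
    print(f"episode {episode:>5}: exploration rate = {exploration_rate_at(episode):.3f}")
```

With these values the agent explores on almost every step early on and acts almost entirely greedily by the end of training.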

### **2. Training the RL agent**

In [5]:
for episode in range(num_episodes):
  state = env.reset()
  done = False 

  reward_current_episode = 0

  for step in range(100):

    # Exploration-exploitation trade-off
    threshold = random.uniform(0, 1)
    if threshold > exploration_rate:
      action = np.argmax(q_table[state, :])
    else:
      action = env.action_space.sample()
    
    # Taking the action
    new_state, reward, done, info = env.step(action)

    # Updating the q_values 
    q_table[state, action] = (1 - lr) * q_table[state, action] + lr * (reward + discount_rate * np.max(q_table[new_state, :]))

    state = new_state 
    reward_current_episode += reward

    if done: 
      break
    
  # Exploration decay formula
  exploration_rate = min_exploration_rate + (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

  rewards_all_episodes.append(reward_current_episode)
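
The Q-value update inside the loop can also be looked at in isolation. The following is a minimal sketch, not part of the original notebook: `q_update` is a hypothetical helper, and the two transitions are hand-picked for illustration rather than sampled from the environment.

```python
import numpy as np

def q_update(q_table, state, action, reward, new_state, lr=0.1, discount_rate=0.99):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = np.max(q_table[new_state, :])  # value of the best action in the next state
    q_table[state, action] = (1 - lr) * q_table[state, action] + lr * (reward + discount_rate * best_next)
    return q_table[state, action]

q = np.zeros((16, 4))  # same shape as the FrozenLake Q-table

# Assumed transition: moving Right (action 2) from state 14 reaches the goal (state 15) with reward 1.
first = q_update(q, state=14, action=2, reward=1.0, new_state=15)

# Assumed transition: moving Right from state 13 reaches state 14 with no reward.
# The value learned for state 14 now propagates one step backwards.
second = q_update(q, state=13, action=2, reward=0.0, new_state=14)

print(first, second)
```

This illustrates how reward information spreads backwards from the goal over repeated episodes, even though only the final step is rewarded.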

In [6]:
# Displaying the metrics
reward_per_thousand_episodes = np.split(np.array(rewards_all_episodes), num_episodes // 1000)
count = 1000
print('\n ***** Average reward per thousand episodes ***** \n')
for reward in reward_per_thousand_episodes:
  print(count, ': ', str(sum(reward)/1000))
  count += 1000

print(' \n ****** Updated Q-table ******')
print(q_table)


 ***** Average reward per thousand episodes ***** 

1000 :  0.026
2000 :  0.121
3000 :  0.383
4000 :  0.559
5000 :  0.638
6000 :  0.65
7000 :  0.665
8000 :  0.7
9000 :  0.675
10000 :  0.679
 
 ****** Updated Q-table ******
[[0.54268927 0.48679449 0.49175608 0.48594515]
 [0.23278903 0.23265411 0.18026242 0.4385909 ]
 [0.35750983 0.24838672 0.18525946 0.22710666]
 [0.18628561 0.01033244 0.00754911 0.01253793]
 [0.55792456 0.39664852 0.33856906 0.36500834]
 [0.         0.         0.         0.        ]
 [0.21612097 0.14013744 0.31725624 0.08513492]
 [0.         0.         0.         0.        ]
 [0.40508375 0.29192378 0.40869606 0.5908078 ]
 [0.40037599 0.63823276 0.46624761 0.36434178]
 [0.63322769 0.45112699 0.33991055 0.27445116]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.41654935 0.47333203 0.75989377 0.57138342]
 [0.73903394 0.84561353 0.74001725 0.76299689]
 [0.         0.         0.         0.        ]]
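
Each row of the Q-table corresponds to a state and each column to an action; the all-zero rows (5, 7, 11, 12, 15) are the holes and the goal, which are terminal and therefore never updated. As a rough sketch of how the table turns into a policy, the snippet below takes the greedy action for two rows copied from the output above. The action names follow FrozenLake's encoding (0 = Left, 1 = Down, 2 = Right, 3 = Up); the variable names are illustrative.

```python
import numpy as np

ACTION_NAMES = ["Left", "Down", "Right", "Up"]  # FrozenLake action indices 0..3

# Two rows copied from the printed Q-table above.
q_rows = np.array([
    [0.54268927, 0.48679449, 0.49175608, 0.48594515],  # state 0 (start)
    [0.73903394, 0.84561353, 0.74001725, 0.76299689],  # state 14 (next to the goal)
])

# The greedy policy picks the column with the largest Q-value in each row.
greedy_actions = np.argmax(q_rows, axis=1)
policy = [ACTION_NAMES[a] for a in greedy_actions]
print(policy)
```

Applying `np.argmax(q_table, axis=1)` to the full table would give the greedy action for every state in the same way.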


### **3. Agent playing the game**

In [7]:
# Playing the game 
for episode in range(3): 
  state = env.reset()

  for step in range(100): 
    clear_output(wait=True)
    time.sleep(1)
    action = np.argmax(q_table[state, :])
    new_state, reward, done, info = env.step(action)
    state = new_state
    env.render()

    if reward == 1: 
      print('*****You won the game!*****')
      break

    elif done:
      print('*****You lost oh no!*****')
      time.sleep(1)
      break 

  (Down)
SFFF
FHFH
FFFH
HFF[41mG[0m
*****You won the game!*****
