# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the set up, describe the reward system you plan on using.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*

In [3]:
import gymnasium as gym
import random
import pandas as pd
from gymnasium.envs.toy_text.frozen_lake import generate_random_map


In [31]:
# Make maze
cells =["S", "F", "F", "H","F","G"]
maze=["SFFHFFFFFH", "FHFFFHHFFF", "FFFHFHFFFH", "FHFFFFFFFF", "FFFFHFFHFH","FHFHFHFHFH","FFFFFFFFFF", "FHFFFFFFFF","HHFFFHFFFF","FFFFFFFFGF"]
env = gym.make('FrozenLake-v1', desc=maze)

initial_state = env.reset()

env.render()

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
action = 2
new_state, reward, terminated, truncated, info = env.step(action)

env.render()

  gym.logger.warn(


In [33]:
env.close()

#### Reward System:
## Empty Space = -1 point
## Present = 100 points
## Hole = -100 points

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the standard Python library and Pandas to train your Q-Learning model. A level 4 uses external libraries like Baseline3.*

In [38]:
# Learning rate - 0.5, discount - 0.5
# Bellman Equation: (1-alpha)q(s , a) + alpha(R + gamma(max(q(s' , a'))))
# Q Table Diagram

In [40]:
q = {
    "Up": [0] * 100,
    "Down": [0] * 100,
    "Left": [0] * 100,
    "Right": [0] * 100
}
q_table = pd.DataFrame(q)
q_table

Unnamed: 0,Up,Down,Left,Right
0,0,0,0,0
1,0,0,0,0
2,0,0,0,0
3,0,0,0,0
4,0,0,0,0
...,...,...,...,...
95,0,0,0,0
96,0,0,0,0
97,0,0,0,0
98,0,0,0,0


In [42]:
def updateQ(q, alpha, gamma, step, cell, reward):
    row = [q[3][cell], q[1][cell], q[0][cell], q[2][cell]]
    bell = (1-alpha)*(q[step][cell]) + alpha*(reward + (gamma*max(row)))
    q[step][cell] = bell

In [44]:
# Train Q-Model
# Learning Rate - 0.5, Discount Rate - 0.5
# Reward: +100 gift, -1 empty space, -100 lake
# Belman Equation: (1-alpha)q(s,a) + alpha(R + gamma(max(s`, a`)))

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
for episode in range(1000):
    while not terminated:
        # Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
        action = random.randint(0, 3)
        new_state, reward, terminated, truncated, info = env.step(action)
        idx = new_state % len(q[3])
        if cells[new_state] == "F" or cells[new_state] == "S":
            reward = -1
        elif cells[new_state] == "H":
            reward = -100
        else:
            reward = 100
        updateQ(q, 0.5, 0.5, action, idx, reward)
    inital = env.reset()
    terminated = False

KeyError: 3

In [48]:
df = pd.DataFrame(q)
df.head()

Unnamed: 0,Up,Down,Left,Right
0,0,0,0,0
1,0,0,0,0
2,0,0,0,0
3,0,0,0,0
4,0,0,0,0


### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*

In [13]:
# Test model here.

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.