# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
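
One way to confirm that a map meets the level 5 size and lake requirements is to count cells in its text layout. The sketch below assumes the layout is a list of equal-length strings using Frozen Lake's letters (`S` start, `F` frozen, `H` hole, `G` goal), which is the format `generate_random_map` returns. The helper name `meets_level_5` and the sample layout are hypothetical and written only for illustration.

```python
def meets_level_5(layout, min_size=10, min_holes=5):
    """Return True if the maze is at least min_size x min_size and has at least min_holes 'H' cells."""
    rows = len(layout)
    cols = min(len(row) for row in layout) if layout else 0
    holes = sum(row.count("H") for row in layout)
    return rows >= min_size and cols >= min_size and holes >= min_holes


# Hypothetical 10 x 10 layout, written by hand for illustration only.
sample_layout = [
    "SFFFFFFFFF",
    "FFHFFFFFFF",
    "FFFFFHFFFF",
    "FFFFFFFFHF",
    "FHFFFFFFFF",
    "FFFFHFFFFF",
    "FFFFFFFHFF",
    "FFHFFFFFFF",
    "FFFFFFFFFF",
    "FFFFFFFFFG",
]

print(meets_level_5(sample_layout))  # True: 10 rows, 10 columns, 7 holes
```

In the notebook, the same check could be applied to a freshly generated map, for example `meets_level_5(generate_random_map(size=10))`, and the map regenerated if it comes back `False`.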

In [3]:
import gymnasium as gym
import random
import pandas as pd
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

In [4]:
# Make maze
env = gym.make('FrozenLake-v1', desc=generate_random_map(size=10), render_mode='human')

# reset() returns a (observation, info) tuple in Gymnasium
initial_state, info = env.reset()

env.render()

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
action = 2
new_state, reward, terminated, truncated, info = env.step(action)

env.render()

In [5]:
env.close()

#### Reward System:
- Empty space = -1 point
- Present = 100 points
- Hole = -100 points
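
The reward system above can be written as a lookup from cell type to points. This is a minimal sketch: it assumes the present sits on the goal cell (`G`) and that every other non-hole cell (`F` or `S`) counts as empty space. The names `REWARDS`, `STEP_REWARD`, and `reward_for_cell` are hypothetical. Frozen Lake itself only pays 1 for reaching the goal and 0 otherwise, so a custom scheme like this has to be applied by your own training code.

```python
# Points for the two special cell types; everything else is empty space.
REWARDS = {"G": 100, "H": -100}
STEP_REWARD = -1


def reward_for_cell(cell):
    """Map a Frozen Lake cell letter to the custom reward."""
    return REWARDS.get(cell, STEP_REWARD)


for cell in "SFHG":
    print(cell, reward_for_cell(cell))
```

The small negative reward on empty cells is what pushes the agent toward shorter paths, since every extra step costs a point.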

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the Python standard library and Pandas to train your Q-Learning model. Level 4 work uses external libraries like Stable-Baselines3.*

In [8]:
# Learning rate - 0.5, discount - 0.5
# Bellman equation: Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (R + gamma * max_a' Q(s', a'))
# Q Table Diagram
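
To make the update rule in the comments concrete, here is a single update worked through with the learning rate and discount noted above (both 0.5). The starting Q-value, the reward, and the best next-state value are made-up numbers for illustration, and `q_update` is a hypothetical helper name.

```python
def q_update(q_sa, reward, max_next_q, alpha, gamma):
    """One Q-learning update: blend the old estimate with the new target."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_next_q)


# Old estimate 0.0, step reward -1, best value in the next state 10.0
new_q = q_update(q_sa=0.0, reward=-1, max_next_q=10.0, alpha=0.5, gamma=0.5)
print(new_q)  # (1 - 0.5) * 0.0 + 0.5 * (-1 + 0.5 * 10.0) = 2.0
```

A larger `alpha` moves the estimate further toward the new target on each update, and a larger `gamma` gives more weight to the value of the next state.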

In [None]:
# Read the maze layout from the environment created above.
# Cells: S = start, F = frozen (empty), H = hole, G = goal (the present)
maze = [[cell.decode() for cell in row] for row in env.unwrapped.desc]
size = len(maze)

states = list(range(size * size))
actions = [0, 1, 2, 3]  # 0 = left, 1 = down, 2 = right, 3 = up (Gymnasium order)

gamma = 0.9  # Discount factor
alpha = 0.1  # Learning rate
epsilon = 0.1  # Exploration rate
max_steps = 200  # Safety cap on episode length

# Initialize Q-table (float zeros so updates do not change the dtype)
def initialize_q_table(states, actions):
    return pd.DataFrame(0.0, index=states, columns=actions)

Q = initialize_q_table(states, actions)

def get_next_state(state, action):
    # Deterministic move (assumes a non-slippery lake); hitting a wall keeps the state
    if action == 0:  # Left
        return state - 1 if state % size != 0 else state
    elif action == 1:  # Down
        return state + size if state < size * (size - 1) else state
    elif action == 2:  # Right
        return state + 1 if state % size != size - 1 else state
    elif action == 3:  # Up
        return state - size if state >= size else state
    return state

def get_cell(state):
    return maze[state // size][state % size]

def get_reward(next_state):
    # Reward system: present = 100, hole = -100, empty space = -1
    return {'G': 100, 'H': -100}.get(get_cell(next_state), -1)

def is_terminal(state):
    return get_cell(state) in ('G', 'H')

def choose_action(state):
    if random.uniform(0, 1) < epsilon:
        return random.choice(actions)  # Exploration
    else:
        return Q.loc[state].idxmax()  # Exploitation

# Train Q-learning model
start_states = [s for s in states if not is_terminal(s)]
episodes = 500
for _ in range(episodes):
    state = random.choice(start_states)  # Start at a non-terminal state
    for step in range(max_steps):
        action = choose_action(state)
        next_state = get_next_state(state, action)
        reward = get_reward(next_state)

        # Update Q-value (terminal states have no future value)
        best_future_q = 0.0 if is_terminal(next_state) else Q.loc[next_state].max()
        Q.loc[state, action] = (1 - alpha) * Q.loc[state, action] + alpha * (reward + gamma * best_future_q)

        state = next_state
        if is_terminal(state):
            break

# Display final Q-table
print(Q)




### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
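
Both metrics can be computed in one evaluation loop. The sketch below is a self-contained illustration using only the standard library: it runs a fixed policy on a small hand-written deterministic maze rather than on your trained Q-table, and the names `LAYOUT`, `step`, `evaluate`, and `POLICY` are all hypothetical. Average steps is taken over successful episodes only, which is one reasonable convention among several.

```python
# Hypothetical 3 x 3 maze: S start, F frozen, H hole, G goal
LAYOUT = ["SFF",
          "FHF",
          "FFG"]
SIZE = len(LAYOUT)
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Deterministic move; returns (next_state, reached_goal, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    cell = LAYOUT[row][col]
    return row * SIZE + col, cell == "G", cell in "GH"


def evaluate(policy, episodes=1000, max_steps=100):
    """Return (success rate, average steps over successful episodes)."""
    successes = 0
    steps_on_success = []
    for _ in range(episodes):
        state = 0
        for t in range(1, max_steps + 1):
            state, reached_goal, done = step(state, policy[state])
            if done:
                if reached_goal:
                    successes += 1
                    steps_on_success.append(t)
                break
    success_rate = successes / episodes
    if steps_on_success:
        avg_steps = sum(steps_on_success) / len(steps_on_success)
    else:
        avg_steps = float("nan")
    return success_rate, avg_steps


# Hypothetical policy: right along the top row, then down the right column.
POLICY = {0: 2, 1: 2, 2: 1, 5: 1}
success_rate, avg_steps = evaluate(POLICY)
print(success_rate, avg_steps)  # 1.0 4.0
```

Because this toy maze is deterministic, every episode is identical. On a slippery lake the outcomes differ between episodes, which is where running the full 1000 episodes matters. In the notebook, the policy lookup would likely be replaced by the greedy action from your Q-table, for example `Q.loc[state].idxmax()`.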

In [None]:
# Test model here.

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.
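
One way to describe the path is to follow the highest-valued action from the start cell until the goal is reached. The sketch below assumes deterministic moves and Gymnasium's action order (0 left, 1 down, 2 right, 3 up). The tiny Q-table, the 3 x 3 grid, and the names `move` and `greedy_path` are hypothetical stand-ins for your trained table and maze.

```python
SIZE = 3   # hypothetical 3 x 3 grid
GOAL = 8   # bottom-right cell
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def move(state, action):
    """Deterministic move that stays in place when it would leave the grid."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    return row * SIZE + col


def greedy_path(q_table, start=0, goal=GOAL, max_steps=50):
    """Follow the best action in each state; stop at the goal or after max_steps."""
    path, state = [start], start
    while state != goal and len(path) <= max_steps:
        values = q_table[state]
        action = max(range(len(values)), key=lambda a: values[a])
        state = move(state, action)
        path.append(state)
    return path


# Made-up Q-values: one entry per visited state, one value per action.
q_table = {
    0: [0, 1, 5, 0],  # best action: right
    1: [0, 0, 6, 0],  # best action: right
    2: [0, 7, 0, 0],  # best action: down
    5: [0, 8, 0, 0],  # best action: down
}
print(greedy_path(q_table))  # [0, 1, 2, 5, 8]
```

The returned list of states can be translated into row and column positions with `divmod(state, SIZE)` when writing up the path.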

#### Describe how well your Q-Learning model performed here.