# Exploring Markov Decision Processes (MDPs) Through a Gridworld Example

This notebook walks through a gridworld explorer that shows how states and actions drive transitions. It is also useful for observing reward dynamics and episode termination, and for seeing stochastic transitions in action.

## Setup

In [1]:
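# Reload edited modules automatically so changes in utils/ take effect without a kernel restart.
%load_ext autoreload
%autoreload 2

import sys

# Make the repository root importable so the utils package can be found.
sys.path.append("../")
from utils.gridworld import Gridworld
from utils.gridworld_solver import value_iteration
from utils.plot_utils import plot_value_heatmap, plot_policy_arrows

import matplotlib.pyplot as plt
import numpy as np

# 4x4 grid: states are numbered 0..15 in row-major order.
env = Gridworld(size=4, start=0, goal=15, traps=[5, 11])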


## Visualize the Grid

In [2]:
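def draw_grid(state, size=4):
    """Print the grid with the agent (A), goal (G), and traps (X)."""
    # size must match the size passed to Gridworld in the setup cell.
    grid = np.full((size, size), ".")
    grid[env.goal // size, env.goal % size] = "G"
    for trap in env.traps:
        grid[trap // size, trap % size] = "X"
    # Draw the agent last so it stays visible when it stands on the goal or a trap.
    grid[state // size, state % size] = "A"

    print("\nGrid:")
    for row in grid:
        print(" ".join(row))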
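
`draw_grid` converts a flat state index into a row and a column with integer division and modulo. The sketch below isolates that mapping and its inverse; the helper names `state_to_cell` and `cell_to_state` are hypothetical and are not part of `utils.gridworld`.

```python
def state_to_cell(state, size=4):
    """Convert a flat, row-major state index into (row, col)."""
    # divmod(state, size) returns (state // size, state % size).
    return divmod(state, size)


def cell_to_state(row, col, size=4):
    """Inverse mapping: (row, col) back to the flat state index."""
    return row * size + col


print(state_to_cell(5))     # (1, 1), the first trap
print(state_to_cell(15))    # (3, 3), the goal
print(cell_to_state(2, 3))  # 11, the second trap
```

With this layout, state 0 is the top-left cell and state 15 is the bottom-right cell of the 4x4 grid.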

## Run Episode Manually

In [None]:
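# Start a new episode and show the initial position.
state = env.reset()
draw_grid(state)

# Play up to 20 steps, reading one action per step from the keyboard.
for t in range(20):
    action = int(input("Action (0=up, 1=right, 2=down, 3=left): "))
    next_state, reward, done = env.step(action)
    draw_grid(next_state)
    print(f"Step {t+1} - Reward: {reward}")
    # done is True once the environment reports the episode has terminated.
    if done:
        print("Episode finished.")
        break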
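
When transitions are stochastic, `env.step` can leave the agent in a different cell than the chosen action suggests. How `Gridworld` implements this is not shown in this notebook, so the following is only a minimal sketch under an assumed "slip" model: with probability `slip_prob`, a uniformly random action replaces the intended one. The function `slippery_step` and the parameter `slip_prob` are hypothetical names, not the library's API.

```python
import random

# Action encoding matches the prompt above: 0=up, 1=right, 2=down, 3=left.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def slippery_step(state, action, size=4, slip_prob=0.2, rng=random):
    """Assumed transition model: slip to a random action with probability slip_prob."""
    if rng.random() < slip_prob:
        action = rng.choice(list(MOVES))
    row, col = divmod(state, size)
    d_row, d_col = MOVES[action]
    # Clamp to the grid so that walking into a wall leaves the agent in place.
    row = min(max(row + d_row, 0), size - 1)
    col = min(max(col + d_col, 0), size - 1)
    return row * size + col


# Take "right" from state 0 many times and count where the agent lands.
rng = random.Random(0)
counts = {}
for _ in range(1000):
    landed = slippery_step(0, 1, rng=rng)
    counts[landed] = counts.get(landed, 0) + 1
print(counts)
```

Most samples should land in state 1, the intended cell. The rest land in state 4 (slipped down) or stay in state 0 (slipped up or left into a wall).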

## Solve Using Value Iteration

In [None]:
V, policy = value_iteration(env)

In [None]:
# visualize results
plot_value_heatmap(V, env, title="Optimal State Values")
plot_policy_arrows(policy, env, title="Optimal Policy")
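
The implementation of `value_iteration` in `utils.gridworld_solver` is not shown here. For intuition, the sketch below runs value iteration on a standalone 4x4 grid with the same goal and trap positions. It rests on several assumptions that may not match the real environment: moves are deterministic, the reward is +1 for entering the goal, -1 for entering a trap, and -0.01 otherwise, and the discount factor is 0.9. All names in it are hypothetical.

```python
import numpy as np

SIZE, GOAL, TRAPS = 4, 15, (5, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # 0=up, 1=right, 2=down, 3=left


def next_state(state, action):
    """Deterministic move; walking into a wall leaves the state unchanged."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    return row * SIZE + col


def reward(state):
    """Assumed reward for entering a state (not taken from utils.gridworld)."""
    if state == GOAL:
        return 1.0
    if state in TRAPS:
        return -1.0
    return -0.01


def value_iteration_sketch(gamma=0.9, tol=1e-6):
    terminal = {GOAL, *TRAPS}
    n_states = SIZE * SIZE
    V = np.zeros(n_states)

    def q_values(s):
        # One-step lookahead: reward for the move plus discounted value of the successor.
        successors = [next_state(s, a) for a in range(len(MOVES))]
        return [reward(ns) + gamma * V[ns] for ns in successors]

    # Repeat Bellman optimality backups until the largest update falls below tol.
    while True:
        delta = 0.0
        for s in range(n_states):
            if s in terminal:
                continue  # terminal states keep value 0
            best = max(q_values(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = np.zeros(n_states, dtype=int)
    for s in range(n_states):
        if s not in terminal:
            policy[s] = int(np.argmax(q_values(s)))
    return V, policy


V_sketch, policy_sketch = value_iteration_sketch()
print(V_sketch.reshape(SIZE, SIZE).round(3))
print(policy_sketch.reshape(SIZE, SIZE))
```

Under these assumptions, state 14 sits directly left of the goal, so its value is 1.0 and its greedy action is 1 (right). Policy entries for terminal states are placeholders and carry no meaning.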