<h2>Environment: Windy Gridworld</h2>

The Windy Gridworld environment is implemented as a class that holds the maze itself, the wind direction and wind strength at each position, and the actions allowed within the environment. It also provides helper functions that compute the rewards and transitions of the available moves.

<h3>Environment Windy Gridworld: Maze</h3> 

The maze is represented as a 20x20 numpy array of integers. Each element of the 2D array represents a position in the maze, and its value indicates whether the agent can occupy that position (the position is not a wall) or not (the position is a wall):

$$\large maze[i,j]
    = 
    \begin{cases}
    -1\quad\,\,\,\,\,\,\,\normalsize\text{position is a wall} \\
    0\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall} 
    \end{cases}
$$
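To illustrate this encoding, the sketch below builds a small hypothetical 5x5 maze (not the 20x20 maze used in this notebook), saves it to a `.npy` file, and loads it back read-only in the same way the environment class loads its maze. The layout and the file name `tiny_maze.npy` are made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical 5x5 maze: -1 marks a wall, 0 marks a square the agent may occupy
tiny_maze = np.array([
    [ 0,  0,  0,  0,  0],
    [ 0, -1, -1, -1,  0],
    [ 0,  0,  0, -1,  0],
    [-1, -1,  0, -1,  0],
    [ 0,  0,  0,  0,  0],
], dtype=np.int32)

# Save to a temporary .npy file, then load it back as a read-only memory map
path = os.path.join(tempfile.mkdtemp(), "tiny_maze.npy")
np.save(path, tiny_maze)
loaded = np.load(path, mmap_mode="r")

# Boolean array that is True on every wall square
is_wall = loaded == -1
print(loaded.shape, int(is_wall.sum()), "walls")
```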

<h3>Environment Windy Gridworld: Wind Direction and Wind Strength</h3>

The windDirections object is a 2D numpy array with the same shape as the maze. Each element holds an integer code for the direction in which the wind blows at that position:

$$\large windDirections[i,j]
    = 
    \begin{cases}
    0\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows N } \\
    1\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows S } \\
    2\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows W } \\
    3\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows E } 
    \end{cases}
$$
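The direction codes use the same numbering as the agent's actions defined below (0 is north/Up, 1 south/Down, 2 west/Left, 3 east/Right), so a single table of (row, column) offsets can serve both. The sketch assumes, as the environment code does, that row 0 is the top of the grid, so moving north decreases the row index. The names `OFFSETS`, `NAMES` and `neighbour` are illustrative.

```python
# (row, column) offset for each direction code; row 0 is the top of the grid
OFFSETS = {
    0: (-1, 0),  # N / Up
    1: (1, 0),   # S / Down
    2: (0, -1),  # W / Left
    3: (0, 1),   # E / Right
}
NAMES = {0: "N", 1: "S", 2: "W", 3: "E"}

def neighbour(position, direction):
    """Square one step from `position` in `direction`, ignoring walls and edges."""
    di, dj = OFFSETS[direction]
    return (position[0] + di, position[1] + dj)

for code in sorted(OFFSETS):
    print(NAMES[code], neighbour((2, 2), code))
```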

The windStrengths object is a 2D numpy array with the same shape as the maze. Each element holds the number of positions the wind moves the agent in the direction in which it is blowing:

$$\large windStrengths[i,j]
    = 
    \begin{cases}
    0\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind is not blowing} \\
    1\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows with intensity 1 } \\
    2\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows with intensity 2 } \\
    3\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows with intensity 3 } \\
    4\quad\quad \,\,\,\,\,\,\normalsize\text{position is not a wall and wind blows with intensity 4 }
    \end{cases}
$$
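The strength tells how many positions the wind moves the agent. One plausible reading, sketched below under the assumption that a push stops early when the next square is a wall or off the grid, is to take windStrengths[i, j] single steps in direction windDirections[i, j]. The function `apply_wind` and the 4x4 grids are hypothetical.

```python
import numpy as np

OFFSETS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # N, S, W, E

def apply_wind(maze, wind_dirs, wind_strengths, position):
    """Push `position` by the wind at that square, one square at a time.

    Assumption: the push stops early at a wall (-1) or at the grid edge.
    """
    i, j = position
    di, dj = OFFSETS[int(wind_dirs[i, j])]
    for _ in range(int(wind_strengths[i, j])):
        ni, nj = i + di, j + dj
        if not (0 <= ni < maze.shape[0] and 0 <= nj < maze.shape[1]) or maze[ni, nj] == -1:
            break  # blocked: stay on the last reachable square
        i, j = ni, nj
    return (i, j)

maze = np.zeros((4, 4), dtype=np.int32)
maze[0, 1] = -1                                    # one wall square
wind_dirs = np.zeros((4, 4), dtype=np.int32)       # wind blows N everywhere
wind_strengths = np.zeros((4, 4), dtype=np.int32)
wind_strengths[3, 1] = 3                           # strong wind under the wall
wind_strengths[3, 2] = 2

print(apply_wind(maze, wind_dirs, wind_strengths, (3, 1)))  # stops just below the wall
print(apply_wind(maze, wind_dirs, wind_strengths, (3, 2)))  # moved the full 2 squares
print(apply_wind(maze, wind_dirs, wind_strengths, (3, 3)))  # no wind, no movement
```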

<h3>Environment Windy Gridworld: Mask</h3>


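The Windy Gridworld class contains a data member mask, a boolean $n\times m$ numpy array that is True wherever the maze has a wall ($maze[i,j]=-1$). It is passed to the helper functions in [helpers.ipynb](helpers.ipynb) that plot the state values and policies, so that wall positions are hidden in those plots.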
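The class comments say the mask is used with a seaborn heatmap, whose `heatmap` function accepts such a boolean array through its `mask` parameter. As a dependency-free stand-in, the sketch below applies the same mask with `numpy.ma` to hide made-up state values on wall squares so they do not affect summary statistics.

```python
import numpy as np

maze = np.array([
    [0,  0, 0],
    [0, -1, 0],
    [0,  0, 0],
], dtype=np.int32)

# True wherever the square is a wall
mask = maze == -1

# Hypothetical state values; the value stored on the wall square is meaningless
values = np.array([
    [-4.0, -3.0, -2.0],
    [-3.0, 99.0, -1.0],
    [-2.0, -1.0,  0.0],
])

masked_values = np.ma.masked_array(values, mask=mask)
print(masked_values.max())    # the wall's 99.0 is ignored
print(masked_values.count())  # number of non-wall squares
```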

<h3>Environment Windy Gridworld: State Space</h3>

Each white (non-wall) square in the maze represents a state; the state is a random variable denoted $S$.
Each observed state $s\in S$ is an ordered pair $(x, y)$ giving the position of the agent on the grid.

$$\large S\coloneqq\{(x, y)\in\mathbb{Z}^2 : 0\le x<n,\ 0\le y<m,\ maze[x,y]\neq -1\}$$
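Since only white (non-wall) squares are states, the state space can be enumerated directly from the maze as the coordinate pairs whose value is not -1. The sketch uses `np.argwhere` on a small hypothetical maze.

```python
import numpy as np

maze = np.array([
    [0,  0, 0],
    [0, -1, 0],
], dtype=np.int32)

# Every (x, y) with 0 <= x < n, 0 <= y < m whose square is not a wall
states = [tuple(int(v) for v in idx) for idx in np.argwhere(maze != -1)]
print(states)
```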


<h3>Environment Windy Gridworld: Action Space</h3>

The action is a random variable denoted $A$, and four actions are available to the agent: action "0" moves the agent one step "Up", "1" one step "Down", "2" one step "Left", and "3" one step "Right".


$$\large A\coloneqq\{0, 1, 2, 3\},\qquad a
    = 
    \begin{cases}
    0\quad\quad \,\,\,\,\,\,\normalsize\text{agent moves "Up"} \\
    1\quad\quad \,\,\,\,\,\,\normalsize\text{agent moves "Down"} \\
    2\quad\quad \,\,\,\,\,\,\normalsize\text{agent moves "Left"} \\
    3\quad\quad \,\,\,\,\,\,\normalsize\text{agent moves "Right"}
    \end{cases}
$$
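Before any wind is applied, an action can be sketched as a one-step move that leaves the agent in place when the target square is off the grid or a wall, which is the rule described in the comments of the class's `transitions` method. The function `move` and the tiny maze are hypothetical.

```python
import numpy as np

ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # Up, Down, Left, Right

def move(maze, state, action):
    """One step in the direction of `action`; stay put at walls and grid edges."""
    di, dj = ACTIONS[action]
    i, j = state[0] + di, state[1] + dj
    if 0 <= i < maze.shape[0] and 0 <= j < maze.shape[1] and maze[i, j] != -1:
        return (i, j)
    return tuple(state)

maze = np.array([
    [0, -1],
    [0,  0],
], dtype=np.int32)

print(move(maze, (0, 0), 0))  # Up from the top row: blocked by the edge
print(move(maze, (0, 0), 3))  # Right into the wall: blocked
print(move(maze, (0, 0), 1))  # Down: allowed
```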

<h3>Environment Windy Gridworld: Episodes and Time Steps</h3>


In Windy Gridworld, an episode is the discrete sequence of time steps from the moment the agent starts on the "Start" square of the grid to the moment it reaches the "Finish" square.

The time step is denoted by the variable $t$ and is set to zero at the beginning of the first episode. At this time our agent is located in the "Start" square on the grid.

The time step variable $t$ is then incremented by one after each action taken by our agent, until it reaches the "Finish" square on the grid.

The variable $T$ is used to represent the value of the time step $t$ upon which our agent reaches the "Finish" square on the grid and the episode ends.

If another episode is carried out, the agent returns to the "Start" square and the time step variable $t$ is set to $T+1$.

$$\large\text{Episode}\coloneqq\left(S_0, A_0, R_1, S_1, A_1, R_2, \ldots, S_{T-1}, A_{T-1}, R_T, S_T\right)$$
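To make the time indexing concrete, the sketch below rolls out one episode on a hypothetical 1x4 corridor with no walls and no wind, using a fixed "always move Right" policy, and records each (state, action, reward) triple in order. It assumes the reward rule described in the next subsection (0 for reaching the finish square, -1 otherwise).

```python
# Hypothetical 1x4 corridor: start at column 0, finish at column 3, no walls, no wind
START, FINISH, N_COLS = 0, 3, 4
RIGHT = 3

def step(state, action):
    """Apply `action` (only Right is modelled here), stopping at the right edge."""
    col = state[1]
    if action == RIGHT:
        col = min(col + 1, N_COLS - 1)
    next_state = (0, col)
    reward = 0 if col == FINISH else -1
    return next_state, reward

trajectory = []              # (S_t, A_t, R_{t+1}) triples
state, t = (0, START), 0
while state != (0, FINISH):
    action = RIGHT           # fixed policy: always move Right
    next_state, reward = step(state, action)
    trajectory.append((state, action, reward))
    state, t = next_state, t + 1

T = t                        # time step at which the finish square is reached
print(trajectory)
print("T =", T)
```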

<h3>Environment Windy Gridworld: Rewards</h3>

At each time step the reward is 0 if the action (together with any wind) moves the agent onto the finish square, and -1 otherwise.

$$
    \large R_{t+1}
    =
    \begin{cases}
    0\quad\quad \normalsize S_{t+1} =\text{ finish square on the grid} \\
    -1\quad\, \normalsize S_{t+1}\neq\text{ finish square on the grid}
    \end{cases}
$$
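Because every step except the last earns -1 and the step onto the finish square earns 0, an episode of $T$ steps collects $T-1$ rewards of -1 followed by a single 0, so its undiscounted return is $-(T-1)$. Maximizing return therefore means reaching the finish in as few steps as possible. A quick check with hypothetical episode lengths (`episode_return` is an illustrative name):

```python
def episode_return(T):
    """Undiscounted return of an episode that reaches the finish square after T steps.

    The first T-1 steps earn -1 each; the final step onto the finish square earns 0.
    """
    rewards = [-1] * (T - 1) + [0]
    return sum(rewards)

for T in (1, 5, 20):
    print(T, episode_return(T))
```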

In [18]:
import numpy as np
import import_ipynb

In [22]:
class envWindyGridworld:
    def __init__(self, path_to_maze, path_to_windDirections, path_to_windStrengths, start_position, finish_position):
        
        # load maze from numpy file (memory-mapped, read-only)
        self._maze = np.load(path_to_maze, mmap_mode='r')
        
        # load windDirections 2d array from numpy file
        self._windDirections = np.load(path_to_windDirections, mmap_mode='r')

        # load windStrengths 2d array from numpy file
        self._windStrengths = np.load(path_to_windStrengths, mmap_mode='r')

        # Mask maze walls (indicated by -1) for seaborn heatmap in helper function plot_values
        self._mask = (self._maze == -1)

        # Set the start and finish positions through the property setters, which raise
        # an exception if a position lies outside the grid or on a maze wall
        self.StartPosition = start_position
        self.FinishPosition = finish_position
         
        # Define the four actions available to choose from in the environment
        #   0 -> Up
        #   1 -> Down
        #   2 -> Left
        #   3 -> Right
        self._actions = (0, 1, 2, 3)
        
        # Create an n x m x 2 array in which entry [i, j] holds the coordinates (i, j)
        # of that square on the grid
        self._states = np.stack(np.indices(self._maze.shape), axis=-1).astype(np.int32)
        
    def reward(self, state, next_state):
        # Each step that does not reach the finish square has a reward of -1
        # Moving onto the finish square has a reward of 0
        # tuple() makes the comparison work when next_state is a list or numpy array
        if (tuple(next_state) != self._finish_position):
            return -1
        return 0

    # <TH>import ipynb.fs.full.targetPolicyWindyGridworld as tPolicy

    # This is the conditional probability of the reward and next state given the current state and action
    # def p_sp_r_given_s_a(self, next_state, reward, state, action):
    #   return 1.0
    # </TH>

    # (row, column) offset for each move. The same codes are used for actions
    # (0 Up, 1 Down, 2 Left, 3 Right) and wind directions (0 N, 1 S, 2 W, 3 E).
    _OFFSETS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    # Return the square one step from state in the given direction, or state itself
    # if that step would leave the grid or run into a maze wall.
    def _step(self, state, direction):
        di, dj = self._OFFSETS[direction]
        i, j = state[0] + di, state[1] + dj
        if 0 <= i < self._maze.shape[0] and 0 <= j < self._maze.shape[1] and self._maze[i, j] != -1:
            return (i, j)
        return (state[0], state[1])

    # This function returns the next state and reward after choosing an action from the
    # current state. A move into a maze wall or off the grid leaves the state unchanged.
    # This is also where the transition rules for "windy" positions on the maze are defined:
    # when the agent lands on a new square, the wind at that square pushes it
    # windStrengths[i, j] squares in direction windDirections[i, j], stopping early at a
    # maze wall or the grid boundary.
    # The next_state argument is not used; it is kept so existing calls keep working.
    def transitions(self, state, next_state, action):
        state = (int(state[0]), int(state[1]))

        # Apply the agent's own move
        next_state = self._step(state, action)

        # Apply the wind only if the agent actually moved onto a new square
        if next_state != state:
            windDirection = int(self._windDirections[next_state])
            windStrength = int(self._windStrengths[next_state])
            for _ in range(windStrength):
                pushed = self._step(next_state, windDirection)
                if pushed == next_state:
                    break  # blocked by a maze wall or the grid boundary
                next_state = pushed

        # Compute the reward for going to the next state
        r = self.reward(state, next_state)

        # <TH>
        # p = self.p_sp_r_given_s_a(next_state, r, state, action)
        # </TH>
        return next_state, r# <TH> , p </TH>
    
    # Validate a position and return it as a tuple of Python ints
    def _check_position(self, position, name):
        i, j = position
        if not (0 <= i < self._maze.shape[0] and 0 <= j < self._maze.shape[1]):
            raise ValueError(f"Tuple {name} is outside the maze")
        if self._maze[i, j] == -1:
            raise ValueError(f"Tuple {name} is a wall in the maze")
        return (int(i), int(j))

    # Getters and setters via property decorators
    @property
    def FinishPosition(self):
        return self._finish_position
    @FinishPosition.setter
    def FinishPosition(self, finish_position):
        self._finish_position = self._check_position(finish_position, "finish_position")
    @property
    def StartPosition(self):
        return self._start_position
    @StartPosition.setter
    def StartPosition(self, start_position):
        self._start_position = self._check_position(start_position, "start_position")
    @property
    def States(self):
        return self._states
    @property
    def Actions(self):
        return self._actions
    @property
    def Maze(self):
        return self._maze
    @property
    def Mask(self):
        return self._mask

<h2>Instantiate a Windy Gridworld object</h2>

In [12]:
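# Set start and finish positions (the start position is a placeholder: use any non-wall square)
#start_position = (i_start, j_start)
#fin_position = (0,4)

# Instantiate a Windy Gridworld object
# (the wind file names below are placeholders for your own .npy files)
#ex_gridworld = envWindyGridworld(path_to_maze='./maze2.npy',
#                                 path_to_windDirections='./windDirections.npy',
#                                 path_to_windStrengths='./windStrengths.npy',
#                                 start_position=start_position,
#                                 finish_position=fin_position)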

In [13]:
# Get the maze data member
#ex_gridworld.Maze

In [14]:
# Get the maze actions available
#ex_gridworld.Actions

In [15]:
# Set and get the finish line position on the grid
#ex_gridworld.FinishPosition = (0,0)
#ex_gridworld.FinishPosition

In [16]:
# Get the maze mask
#ex_gridworld.Mask

In [17]:
# Delete the gridworld object
#del ex_gridworld