
### Standard Grid and Negative Grid Environments

![Gridworld.](./figures/StandardGridWorldSmall.png)

For the gridworld problem both grids have

* three rows and four columns
* states that are characterized by a row and column index, an $(i,j)$ pair
* actions at each state are a subset of {left, right, down, up} 
* state (0,0) is the initial state, where the decision-making agent starts
* state (1,1) is a barrier state; the agent cannot make decisions that move to this state
* states (1,3) and (2,3) are terminal states
* the decision to move **up** from state (0,3) has an immediate reward of -1, which results in the agent moving to the terminal state (1,3)
* the decision to move **right** from state (1,2) has an immediate reward of -1, which also results in the agent moving to the terminal state (1,3)
* the decision to move **right** from state (2,2) has an immediate reward of 1, which results in the agent moving to the terminal state (2,3)

For the standard grid 
* all other rewards are 0

For the negative grid
* all other rewards are -0.1
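
The reward rules above can be summarized in a few lines of Python. This is a minimal sketch that does not use `rlgridworld`; the names `SPECIAL_REWARDS` and `step_reward` are hypothetical, introduced only for illustration, and the sketch does not check whether a move is actually permitted.

```python
# Rewards that are identical in both grids, keyed by (state, action).
SPECIAL_REWARDS = {
    ((0, 3), "up"): -1.0,     # enters terminal state (1,3)
    ((1, 2), "right"): -1.0,  # enters terminal state (1,3)
    ((2, 2), "right"): 1.0,   # enters terminal state (2,3)
}


def step_reward(state, action, default_reward):
    """Return the immediate reward for taking `action` in `state`.

    `default_reward` is 0.0 for the standard grid and -0.1 for the negative grid.
    """
    return SPECIAL_REWARDS.get((state, action), default_reward)


print(step_reward((0, 0), "right", 0.0))   # standard grid, ordinary move
print(step_reward((0, 0), "right", -0.1))  # negative grid, ordinary move
print(step_reward((2, 2), "right", -0.1))  # special reward, same in both grids
```

The two grids differ only in the `default_reward` argument, which mirrors the description above.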


Import for standard grid

In [1]:
from rlgridworld.standard_grid import create_standard_grid

Create the Standard Grid

In [2]:
gw = create_standard_grid()

The cell below calls a function that lists every state in the created grid. Each output line reports:

1. Row index
2. Column index
3. Value
4. is_terminal flag
5. is_barrier flag

Note: State (1,1) is a barrier state and states (1,3) and (2,3) are terminal states.

In [4]:
gw.print_grid_state()

Row: 0, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 1, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 3, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 1, Value: 0, is_terminal: False, is_barrier: True
Row: 1, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 3, Value: 0, is_terminal: True, is_barrier: False
Row: 2, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 1, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 3, Value: 0, is_terminal: True, is_barrier: False


The cell below calls a function that lists the reward associated with each decision (or action) at each state. A reward of **None** means the decision is prohibited: the agent cannot move off the grid or into a barrier state, and no decisions are available in terminal or barrier states. Each output line reports:

1. Row index
2. Column index
3. Reward for moving left
4. Reward for moving right
5. Reward for moving down
6. Reward for moving up

Note in particular the -1 reward for moving **up** from state (0,3) to (1,3). Similarly, note the -1 reward for moving **right** from (1,2) to (1,3). Lastly, note that moving **right** from (2,2) to (2,3) results in a reward of 1.

In [5]:
gw.print_grid_rewards()

Row: 0, Column: 0, Left: None, Right: 0.0, Down: None, Up: 0.0
Row: 0, Column: 1, Left: 0.0, Right: 0.0, Down: None, Up: None
Row: 0, Column: 2, Left: 0.0, Right: 0.0, Down: None, Up: 0.0
Row: 0, Column: 3, Left: 0.0, Right: None, Down: None, Up: -1.0
Row: 1, Column: 0, Left: None, Right: None, Down: 0.0, Up: 0.0
Row: 1, Column: 1, Left: None, Right: None, Down: None, Up: None
Row: 1, Column: 2, Left: None, Right: -1.0, Down: 0.0, Up: 0.0
Row: 1, Column: 3, Left: None, Right: None, Down: None, Up: None
Row: 2, Column: 0, Left: None, Right: 0.0, Down: 0.0, Up: None
Row: 2, Column: 1, Left: 0.0, Right: 0.0, Down: None, Up: None
Row: 2, Column: 2, Left: 0.0, Right: 1.0, Down: 0.0, Up: None
Row: 2, Column: 3, Left: None, Right: None, Down: None, Up: None
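
The pattern of **None** entries in this listing follows from the grid layout. The sketch below rebuilds one row of the listing from the rules stated earlier; it is an illustration under the assumption that "up" increases the row index and "right" increases the column index (as the listing suggests), and `reward_row` and the constants are hypothetical names, not part of `rlgridworld`.

```python
ROWS, COLS = 3, 4
BARRIERS = {(1, 1)}
TERMINALS = {(1, 3), (2, 3)}
# Assumed convention: "up" increases the row index, "right" the column index.
MOVES = {"left": (0, -1), "right": (0, 1), "down": (-1, 0), "up": (1, 0)}
SPECIAL_REWARDS = {
    ((0, 3), "up"): -1.0,
    ((1, 2), "right"): -1.0,
    ((2, 2), "right"): 1.0,
}


def reward_row(state, default_reward=0.0):
    """Map each action to its reward in `state`, or None if the move is prohibited."""
    row = {}
    for action, (di, dj) in MOVES.items():
        target = (state[0] + di, state[1] + dj)
        on_grid = 0 <= target[0] < ROWS and 0 <= target[1] < COLS
        no_actions_here = state in BARRIERS or state in TERMINALS
        if no_actions_here or not on_grid or target in BARRIERS:
            row[action] = None
        else:
            row[action] = SPECIAL_REWARDS.get((state, action), default_reward)
    return row


for i in range(ROWS):
    for j in range(COLS):
        print((i, j), reward_row((i, j)))
```

Passing `default_reward=-0.1` gives the corresponding rows for the negative grid.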


Import for negative grid

In [9]:
from rlgridworld.standard_grid import create_negative_grid

Create the Negative Grid

In [10]:
gw = create_negative_grid()

Print states, which are the same as for the standard grid.

In [11]:
gw.print_grid_state()

Row: 0, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 1, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 0, Column: 3, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 1, Value: 0, is_terminal: False, is_barrier: True
Row: 1, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 1, Column: 3, Value: 0, is_terminal: True, is_barrier: False
Row: 2, Column: 0, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 1, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 2, Value: 0, is_terminal: False, is_barrier: False
Row: 2, Column: 3, Value: 0, is_terminal: True, is_barrier: False


Print the rewards. The only change is that every 0.0 reward is now -0.1, which penalizes each move. The effect is to favor the shortest path to a terminal state, since fewer moves mean a smaller total penalty.

In [12]:
gw.print_grid_rewards()

Row: 0, Column: 0, Left: None, Right: -0.1, Down: None, Up: -0.1
Row: 0, Column: 1, Left: -0.1, Right: -0.1, Down: None, Up: None
Row: 0, Column: 2, Left: -0.1, Right: -0.1, Down: None, Up: -0.1
Row: 0, Column: 3, Left: -0.1, Right: None, Down: None, Up: -1.0
Row: 1, Column: 0, Left: None, Right: None, Down: -0.1, Up: -0.1
Row: 1, Column: 1, Left: None, Right: None, Down: None, Up: None
Row: 1, Column: 2, Left: None, Right: -1.0, Down: -0.1, Up: -0.1
Row: 1, Column: 3, Left: None, Right: None, Down: None, Up: None
Row: 2, Column: 0, Left: None, Right: -0.1, Down: -0.1, Up: None
Row: 2, Column: 1, Left: -0.1, Right: -0.1, Down: None, Up: None
Row: 2, Column: 2, Left: -0.1, Right: 1.0, Down: -0.1, Up: None
Row: 2, Column: 3, Left: None, Right: None, Down: None, Up: None
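
To see why the -0.1 penalty favors short paths, the sketch below sums the immediate rewards along two paths from (0,0) to (2,3) under each grid. It assumes an undiscounted sum of rewards, and `path_return` is a hypothetical helper written for this illustration, not an `rlgridworld` function.

```python
SPECIAL_REWARDS = {
    ((0, 3), "up"): -1.0,
    ((1, 2), "right"): -1.0,
    ((2, 2), "right"): 1.0,
}
# Assumed convention: "up" increases the row index, "right" the column index.
MOVES = {"left": (0, -1), "right": (0, 1), "down": (-1, 0), "up": (1, 0)}


def path_return(actions, default_reward, start=(0, 0)):
    """Sum the immediate rewards collected by following `actions` from `start`."""
    state, total = start, 0.0
    for action in actions:
        total += SPECIAL_REWARDS.get((state, action), default_reward)
        di, dj = MOVES[action]
        state = (state[0] + di, state[1] + dj)
    return total


direct = ["up", "up", "right", "right", "right"]                   # 5 moves
detour = ["right", "left", "up", "up", "right", "right", "right"]  # 7 moves

for name, default in [("standard", 0.0), ("negative", -0.1)]:
    print(name,
          round(path_return(direct, default), 2),
          round(path_return(detour, default), 2))
```

In the standard grid both paths total 1.0, so the detour costs nothing. In the negative grid the direct path totals 0.6 and the detour 0.4, so the shorter path is preferred.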


In [None]:
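# Initial policy: maps each state (row, column) to the action taken there.
# An empty string marks states with no action (barrier and terminal states).
policy = {
    (0, 0): 'up', (0, 1): 'right', (0, 2): 'right', (0, 3): 'up',
    (1, 0): 'up', (1, 1): '', (1, 2): 'right', (1, 3): '',
    (2, 0): 'right', (2, 1): 'right', (2, 2): 'right', (2, 3): '',
}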


The cell above defines the initial policy for our GridWorld environment. The policy is represented as a dictionary where:

- Keys: Tuple coordinates (i,j) representing each state in the 3×4 grid
- Values: String representing the action to take at each state
  - 'up': Move up (the row index increases by 1)
  - 'right': Move right (the column index increases by 1)
  - '': Empty string for the terminal states (1,3) and (2,3) and the barrier state (1,1), where no action is taken

The actions assigned to each state are:

- Starting state (0,0): Move up
- States (0,1), (0,2): Move right
- State (0,3): Move up (leads to terminal state with -1 reward)
- State (1,0): Move up
- State (1,2): Move right (leads to terminal state with -1 reward)
- States (2,0), (2,1), (2,2): Move right (2,2 leads to terminal state with +1 reward)

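Starting from (0,0), this policy moves the agent up the first column and then right along row 2 to the terminal state (2,3), where it collects the +1 reward. It is one possible solution path through the GridWorld environment.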
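
The path implied by the policy can be traced by repeatedly applying the action stored for the current state. The sketch below is self-contained and does not call `rlgridworld`; `follow_policy` and `MOVES` are hypothetical names, and the `max_steps` guard is an added assumption so that a looping policy cannot run forever.

```python
policy = {
    (0, 0): "up", (0, 1): "right", (0, 2): "right", (0, 3): "up",
    (1, 0): "up", (1, 1): "", (1, 2): "right", (1, 3): "",
    (2, 0): "right", (2, 1): "right", (2, 2): "right", (2, 3): "",
}
# Assumed convention: "up" increases the row index, "right" the column index.
MOVES = {"left": (0, -1), "right": (0, 1), "down": (-1, 0), "up": (1, 0)}


def follow_policy(policy, start=(0, 0), max_steps=20):
    """Return the states visited from `start` until a state has no action."""
    path = [start]
    state = start
    for _ in range(max_steps):  # guard against policies that loop forever
        action = policy[state]
        if not action:          # '' marks terminal and barrier states
            break
        di, dj = MOVES[action]
        state = (state[0] + di, state[1] + dj)
        path.append(state)
    return path


print(follow_policy(policy))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```

Starting the trace from another state, for example `follow_policy(policy, start=(0, 1))`, shows that some states lead to the -1 terminal state (1,3) instead.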

In [14]:
gw.print_values()
gw.print_policy(policy)

-------------------------------------
|   0.00 |   0.00 |   0.00 |   0.00 |
-------------------------------------
|   0.00 |   0.00 |   0.00 |   0.00 |
-------------------------------------
|   0.00 |   0.00 |   0.00 |   0.00 |
-------------------------------------
-------------------------------------
|  Right |  Right |  Right |        |
-------------------------------------
|     Up |        |  Right |        |
-------------------------------------
|     Up |  Right |  Right |     Up |
-------------------------------------
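
In both printouts, row 2 appears at the top and row 0 at the bottom, so "up" moves toward the top of the printed grid. A layout like this can be produced by iterating over the rows in reverse order. The sketch below mimics the format shown above; `render_policy` is a hypothetical function written for illustration and is not the library's `print_policy` implementation.

```python
policy = {
    (0, 0): "up", (0, 1): "right", (0, 2): "right", (0, 3): "up",
    (1, 0): "up", (1, 1): "", (1, 2): "right", (1, 3): "",
    (2, 0): "right", (2, 1): "right", (2, 2): "right", (2, 3): "",
}


def render_policy(policy, rows=3, cols=4):
    """Return text lines for `policy`, highest row index first."""
    border = "-" * (9 * cols + 1)  # each cell is 9 characters wide, plus a closing "|"
    lines = [border]
    for i in reversed(range(rows)):
        cells = "".join(
            f"| {policy[(i, j)].capitalize():>6} " for j in range(cols)
        )
        lines.append(cells + "|")
        lines.append(border)
    return lines


print("\n".join(render_policy(policy)))
```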
