## BTTT Environment Tutorial

This IPython notebook explains the basic functions of the BTTT gym environment and how to use them!

# Initializing
To initialize the gym environment, call `gym()`. This creates an environment with an empty 7x7 grid.

In [1]:
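import numpy as np  # used later in this tutorial to build custom board states
from gym import *    # the BTTT environment module, which provides gym()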
env = gym()

# Visualizing
Visualize the current board with `env.printboard()`. Rows are labelled A to G and columns 1 to 7.
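To make the layout concrete, here is a minimal sketch that renders a 7x7 NumPy board in the same text format. `render_board` and `SYMBOLS` are hypothetical names, not part of the environment; the symbols follow the stone-value mapping listed in the next section, and the real `printboard()` may be implemented differently.

```python
import numpy as np

# Stone value -> printed symbol (0 blank, 1 Player 1, 2 Player 2, 3 brick)
SYMBOLS = {0: ' ', 1: 'O', 2: 'X', 3: 'B'}

def render_board(state):
    """Return a printboard()-style string for a 2-D board array (hypothetical helper)."""
    state = np.asarray(state)
    n_rows, n_cols = state.shape
    lines = []
    for r in range(n_rows):
        # Row label A, B, C, ... followed by one '|'-separated cell per column
        cells = '|'.join(SYMBOLS[int(v)] for v in state[r])
        lines.append(f"{chr(ord('A') + r)}|{cells}|")
    # Footer: a rule, then the 1-based column numbers
    lines.append(' ' + '_' * (2 * n_cols + 2))
    lines.append(' |' + '|'.join(str(c + 1) for c in range(n_cols)) + '|')
    return '\n'.join(lines)

board = np.zeros((7, 7), dtype=int)
print(render_board(board))
```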

In [2]:
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | | | | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


# Placing the starting stone
Place the starting stone with `env.place(grid, stone)`.

This puts the given stone at row `grid[0]`, column `grid[1]` without changing whose turn it is.

Stone values map to board symbols as follows:
- 0: ' ' (blank)
- 1: 'O' (Player 1 piece)
- 2: 'X' (Player 2 piece)
- 3: 'B' (Brick block)

Note that row and column indices start at 0, so each must be between 0 and 6 on the 7x7 grid.
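Conceptually, `env.place` is a direct write into the board array that leaves the turn untouched. The sketch below mimics that on a plain NumPy array; the `place` function and its bounds check are illustrative assumptions, not the environment's actual code. The explicit check matters because NumPy would silently accept negative indices and wrap around the board.

```python
import numpy as np

BLANK, PLAYER1, PLAYER2, BRICK = 0, 1, 2, 3  # stone values used by the environment

def place(state, grid, stone):
    """Hypothetical stand-in for env.place: write `stone` at (row, col) = grid.

    Only the board changes; no turn is tracked or switched here.
    """
    row, col = grid
    n_rows, n_cols = state.shape
    # Reject out-of-range indices explicitly: NumPy would wrap negative ones
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError(f"grid {grid} is outside the {n_rows}x{n_cols} board")
    state[row, col] = stone
    return state

board = np.zeros((7, 7), dtype=int)
place(board, (3, 3), BRICK)  # row index 3, column index 3 -> cell D4
```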

Below, we place the starting stone at the *D4* position, just like in Variant 1.

In [3]:
env.place((3,3), 3)
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


# Get list of valid moves
`env.moves` returns the list of valid move indices. Moves are numbered from 0 to 48 by counting the grid cells from left to right, then top to bottom, so the cell at row `r`, column `c` (both 0-based) has index `7 * r + c`.

Here, move 24 (cell D4, since `7 * 3 + 3 = 24`) is missing from the list because the brick occupies it. Any cell filled with Player 1's piece (O), Player 2's piece (X) or a brick block (B) is excluded from the valid moves.
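The index arithmetic is easy to reproduce when you need to convert between move indices, (row, column) pairs and the printed labels. The helpers below are hypothetical, and `valid_moves` assumes, consistent with the description above, that `env.moves` is simply the list of empty cells.

```python
import numpy as np

N = 7  # the board is N x N, so move indices run from 0 to N * N - 1 = 48

def to_index(row, col):
    """(row, col), both 0-based -> move index counted left to right, top to bottom."""
    return row * N + col

def to_grid(index):
    """Move index -> (row, col)."""
    return divmod(index, N)

def to_label(index):
    """Move index -> label in printboard() style, with rows A-G and columns 1-7."""
    row, col = to_grid(index)
    return f"{chr(ord('A') + row)}{col + 1}"

def valid_moves(state):
    """Indices of empty (value 0) cells; assumed to match env.moves."""
    return np.flatnonzero(np.asarray(state).ravel() == 0).tolist()

board = np.zeros((7, 7))
board[3, 3] = 3  # brick at D4
moves = valid_moves(board)
```

For example, `to_label(24)` gives `'D4'`, the brick's cell, which is why 24 is absent from `moves`.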

In [4]:
env.moves

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48]

# Get other details for the current board state

- `env.state` is a 7x7 NumPy array holding the contents of each cell, according to the stone value mapping given earlier.
- `env.turn` is the player to move: 1 for Player 1, 2 for Player 2. In the start state, Player 1 moves first.
- `env.reward` is 1 if Player 1 has won, -1 if Player 2 has won, and 0 if the game is drawn or still in progress.
- `env.done` is `True` once the game has been won by either player or drawn, and `False` otherwise.
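Because `env.reward` is 0 both for a draw and for a game still in progress, you need `env.done` to tell them apart. A small hypothetical helper that combines the three values into a readable status:

```python
def describe(turn, reward, done):
    """Hypothetical helper: summarize (env.turn, env.reward, env.done) as text."""
    if not done:
        # Game still running: reward is 0 here, so report who moves next
        symbol = 'O' if turn == 1 else 'X'
        return f"Player {turn} ({symbol}) to move"
    if reward == 1:
        return "Player 1 (O) won"
    if reward == -1:
        return "Player 2 (X) won"
    return "Draw"  # done with reward 0

print(describe(1, 0, False))
```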

In [5]:
env.state

array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 3, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

In [6]:
env.turn

1

In [7]:
env.reward

0

# Make an action

To make an action in the environment, use `env.step(action)`.

The action index must be one of the valid moves; if it is not, the environment will not process the move and will return `Invalid Move`.

After a valid action, the turn passes to the other player and `env.turn` is updated.
If the action wins or draws the game, `env.reward` and `env.done` are also updated accordingly.
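A rough model of what a valid `env.step(action)` does to the board and turn, written against a plain NumPy array. This is an assumption-laden sketch, not the environment's code: it returns `None` where the real environment reports `Invalid Move`, it copies the board instead of modifying it in place, and it leaves out win/draw detection.

```python
import numpy as np

def step(state, turn, action, n=7):
    """Sketch of env.step's effect on board and turn (hypothetical, no win check).

    Returns (new_state, next_turn), or None if `action` is not a legal move.
    """
    # Out-of-range or occupied cells are illegal; bounds are checked first
    if not (0 <= action < n * n):
        return None
    row, col = divmod(action, n)
    if state[row, col] != 0:
        return None
    new_state = state.copy()        # leave the caller's board untouched
    new_state[row, col] = turn      # Player 1 -> 1 ('O'), Player 2 -> 2 ('X')
    return new_state, 3 - turn      # switch 1 <-> 2

board = np.zeros((7, 7), dtype=int)
board[3, 3] = 3                     # brick at D4
result = step(board, 1, 0)          # Player 1 plays index 0 (A1)
```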

Here, we make an action at the top left corner of the board (index 0).

In [8]:
env.step(0)

# Visualizing what happens after an action

Now, we will see what has changed after our action. We will look at the following:
- board state: The top left square has been updated to `1`. We can also visualize this using `printboard()`.
- turn: This has been updated to `2`, indicating Player 2's turn.
- reward: There is no change, since the game is not won or drawn yet.
- done: There is no change, since the game is not won or drawn yet.

In [9]:
env.state

array([[1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 3, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

In [10]:
env.printboard()

A|O| | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


In [11]:
env.turn

2

In [12]:
env.reward

0

In [13]:
env.done

False

# Playing the game till completion

Let us now play a game to completion.

Player 1 wins by completing four O's in column 1 (A1 to D1). Note that `env.reward` and `env.done` are then updated to `1` and `True` respectively, indicating Player 1's win.
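The finished board below has four O's in a column, while three O's in that column did not end the game. Assuming that a line of four stones in any direction (horizontal, vertical or diagonal) wins, which this tutorial does not state explicitly, a win check could be sketched as follows. `has_line` and `reward_of` are hypothetical helpers.

```python
import numpy as np

# Right, down, down-right and down-left; the opposite directions are covered
# by starting the scan from every cell.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def has_line(state, player, length=4):
    """True if `player` has `length` stones in a straight line (assumed win rule)."""
    state = np.asarray(state)
    n_rows, n_cols = state.shape
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in DIRECTIONS:
                cells = [(r + k * dr, c + k * dc) for k in range(length)]
                if all(0 <= rr < n_rows and 0 <= cc < n_cols and state[rr, cc] == player
                       for rr, cc in cells):
                    return True
    return False

def reward_of(state):
    """1 if Player 1 has a line, -1 if Player 2 does, else 0 (intended to mirror env.reward)."""
    if has_line(state, 1):
        return 1
    if has_line(state, 2):
        return -1
    return 0

# The position from the game below, just before Player 1's final move
board = np.zeros((7, 7), dtype=int)
board[3, 3] = 3          # brick at D4
board[0:3, 0] = 1        # O at A1, B1, C1
board[0, 1:4] = 2        # X at A2, A3, A4
```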

In [14]:
# Player 2 plays at index 1
env.step(1)
env.printboard()
# Player 1 plays at index 7
env.step(7)
env.printboard()
# Player 2 plays at index 2
env.step(2)
env.printboard()
# Player 1 plays at index 14
env.step(14)
env.printboard()
# Player 2 plays at index 3
env.step(3)
env.printboard()
# Player 1 plays at index 21, winning the game
env.step(21)
env.printboard()

A|O|X| | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|
A|O|X| | | | | |
B|O| | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|
A|O|X|X| | | | |
B|O| | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|
A|O|X|X| | | | |
B|O| | | | | | |
C|O| | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|
A|O|X|X|X| | | |
B|O| | | | | | |
C|O| | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|
A|O|X|X|X| | | |
B|O| | | | | | |
C|O| | | | | | |
D|O| | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


In [15]:
env.reward

1

In [16]:
env.done

True

# Resetting the game
To reset to the initial state (a blank grid by default), use `env.reset()`.

In [17]:
env.reset()
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | | | | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


# Resetting to a specific game state
To reset to a board with a custom initial state (e.g. with the starting brick in the centre), set the environment's `initialstate` attribute before calling `env.reset()`.

In [18]:
initialstate = np.zeros((7, 7))
initialstate[3, 3] = 3
print(initialstate)

[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 3. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]


In [19]:
env.initialstate = initialstate

In [20]:
env.reset()
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


Alternatively, pass the initial state to the constructor through its `state` parameter, which gives the same result as the code above.

In [21]:
initialstate = np.zeros((7, 7))
initialstate[3, 3] = 3
env = gym(state = initialstate)
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


# Playing a random action
To play a random valid action, e.g. for random rollouts, use `env.sample()`.

This does the same as `np.random.choice(env.moves)`.
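If you want reproducible rollouts, one option is to draw from an explicitly seeded NumPy `Generator`. The `sample` function below is a hypothetical stand-in that picks uniformly from a list of legal moves. If `env.sample()` is indeed built on `np.random.choice`, as stated above, calling `np.random.seed(...)` before a rollout should also make its choices repeatable; that is an inference, not something this tutorial confirms.

```python
import numpy as np

def sample(moves, rng=None):
    """Pick one legal move uniformly at random (hypothetical stand-in for env.sample)."""
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(moves))

moves = [0, 1, 2, 25, 48]
rng = np.random.default_rng(seed=42)   # fixed seed -> repeatable sequence
picks = [sample(moves, rng) for _ in range(1000)]
```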

In [22]:
action = env.sample()
env.step(action)
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | |O| |
F| | | | | | | |
G| | | | | | | |
 ________________
 |1|2|3|4|5|6|7|


# A fully random game
Let us play a game till completion!

Keep running the next cell as many times as you like to see the various outcomes of Random vs Random!
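To compare outcomes over many games instead of one at a time, you can collect `env.reward` at the end of each game and tally the results. The `tally` helper below is a hypothetical sketch; the commented loop shows how it might be used with the environment from this tutorial and is not executed here.

```python
from collections import Counter

# env.reward at the end of a game -> outcome
OUTCOME = {1: 'Player 1 (O) won', -1: 'Player 2 (X) won', 0: 'Draw'}

def tally(rewards):
    """Count outcomes from a list of final rewards (hypothetical helper)."""
    return Counter(OUTCOME[r] for r in rewards)

# Possible use with this tutorial's environment (not executed here):
# rewards = []
# for _ in range(1000):
#     env.reset()
#     while not env.done:
#         env.step(env.sample())
#     rewards.append(env.reward)
# print(tally(rewards))

print(tally([1, -1, -1, 0]))
```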

In [23]:
env.reset()

while not env.done:
    action = env.sample()
    env.step(action)

env.printboard()
if env.reward == 1:
    print('Player 1 (O) won!')
elif env.reward == -1:
    print('Player 2 (X) won!')
else:
    print('It is a draw!')

A| | |X|O| |O| |
B|X|O| |X|X|O|O|
C| |O| | |O| |X|
D|O|X| |B| | |O|
E| |X| |O| |O| |
F|O|X|X| |X|X| |
G| |X| | |O|X| |
 ________________
 |1|2|3|4|5|6|7|
Player 2 (X) won!


# Extracting the key contents of the environment

If you need to extract the key contents of the environment and store them elsewhere for later retrieval (e.g. to store the game state in a game tree node), use `env.get_state()`.

This returns the following, in order:
- `env.state`
- `env.turn`
- `env.reward`
- `env.done`
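When storing these values in a game tree, it can be safer to copy the board array, since this tutorial does not say whether `get_state()` returns a copy or a reference. The sketch below bundles the four values in a hypothetical `Snapshot` named tuple and derives a hashable key, e.g. for a transposition table.

```python
from typing import NamedTuple

import numpy as np

class Snapshot(NamedTuple):
    """The four values returned by env.get_state(), in the same order."""
    state: np.ndarray
    turn: int
    reward: int
    done: bool

def snapshot(state, turn, reward, done):
    # Copy the board so later moves on the live environment cannot alter it
    return Snapshot(np.array(state, copy=True), turn, reward, done)

def key(snap):
    """Hashable key: raw board bytes plus the player to move.

    Equal boards only give equal keys if their arrays share a dtype.
    """
    return (snap.state.tobytes(), snap.turn)

board = np.zeros((7, 7), dtype=int)
board[3, 3] = 3
board[6, 4] = 1
snap = snapshot(board, 2, 0, False)
table = {key(snap): snap}   # e.g. a transposition table keyed by position
```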

In [24]:
# This gives a particular board state after Player 1's move
env.reset()
env.step(env.sample())
env.printboard()

state, turn, reward, done = env.get_state()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


# Reloading back the game state

To reload the game state, create a new gym environment from the saved game state parameters.

In [25]:
env.reset()
env = gym(state = state, turn = turn, reward = reward, done = done)
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


# Duplicating an exact copy of the environment

If you need an exact copy of the environment to store elsewhere for later retrieval (e.g. to store the game state in a game tree node), use `env.get_env()`.

This returns a deep copy (`deepcopy`) of the environment, so changes made in either environment do not affect the other.

You can use this to run parallel playthroughs of the game.
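The distinction between a deep and a shallow copy matters because the board is a NumPy array held by the environment. This self-contained sketch uses a hypothetical `Board` class (not the environment itself) to show that `copy.copy` shares the array while `copy.deepcopy` does not.

```python
import copy

import numpy as np

class Board:
    """Hypothetical stand-in holding a board array, like the environment does."""
    def __init__(self):
        self.state = np.zeros((7, 7), dtype=int)
        self.turn = 1

original = Board()
shallow = copy.copy(original)      # new object, but shares the same array
deep = copy.deepcopy(original)     # new object with its own array

original.state[0, 0] = 1           # a move in the original environment
print(shallow.state[0, 0], deep.state[0, 0])  # the shallow copy sees the move
```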

In [26]:
newenv = env.get_env()
newenv.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


In [27]:
# Play some other random moves in original environment
env.step(env.sample())
env.step(env.sample())
env.printboard()

A| | | | | | | |
B| | | | | | | |
C| |X| | | | | |
D| |O| |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


In [28]:
# The new environment is not affected by changes in the original environment
newenv.printboard()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


# Advanced: Getting the tensor representation of the environment

If your agent uses deep learning methods and needs a tensor representation of the environment, use the built-in `get_tensor_state()` method.

It converts the game state into four 7x7 planes:
- Plane 1: Turn of player (entire plane of 1 for Player 1, entire plane of -1 for Player 2)
- Plane 2: Position of Player 1's (O) pieces (1 for a grid cell with Player 1's piece, 0 otherwise)
- Plane 3: Position of Player 2's (X) pieces (1 for a grid cell with Player 2's piece, 0 otherwise)
- Plane 4: Position of brick blocks (B) (1 for a grid cell with a brick block, 0 otherwise)

The result is a NumPy array of shape (4, 7, 7), so you still have to convert it into the tensor type of your deep learning framework (e.g. PyTorch, TensorFlow).
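The plane layout can be reproduced from `env.state` and `env.turn` with a few NumPy operations. This is a sketch under the assumption that the turn plane holds 1 for Player 1 and -1 for Player 2, matching the output printed below; `to_planes` is a hypothetical name, not the environment's function.

```python
import numpy as np

def to_planes(state, turn):
    """Stack a 2-D board into 4 planes: turn, Player 1, Player 2, bricks (hypothetical)."""
    state = np.asarray(state)
    turn_plane = np.full(state.shape, 1.0 if turn == 1 else -1.0)
    return np.stack([
        turn_plane,                       # all 1 for Player 1 to move, all -1 for Player 2
        (state == 1).astype(np.float64),  # Player 1's stones (O)
        (state == 2).astype(np.float64),  # Player 2's stones (X)
        (state == 3).astype(np.float64),  # brick blocks (B)
    ])

# Position from the example below: brick at D4, O at G5, Player 2 to move
board = np.zeros((7, 7))
board[3, 3] = 3
board[6, 4] = 1
planes = to_planes(board, turn=2)

# Add a batch dimension and use float32, which many frameworks expect by default
batch = planes[np.newaxis].astype(np.float32)
```

In PyTorch, for instance, `torch.from_numpy(batch)` then wraps this array as a tensor without copying it.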

In [29]:
newenv.printboard()
tensor_state = newenv.get_tensor_state()

A| | | | | | | |
B| | | | | | | |
C| | | | | | | |
D| | | |B| | | |
E| | | | | | | |
F| | | | | | | |
G| | | | |O| | |
 ________________
 |1|2|3|4|5|6|7|


In [30]:
tensor_state

array([[[-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1., -1., -1.]],

       [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]],

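       [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]]])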

# Conclusion

We have come to the end of the BTTT gym environment tutorial!

We hope you are now able to use the BTTT gym environment to construct your own agent to play the game of BTTT.

Note that in the BTTT tournament script `main.py`, the agent receives the env itself as input and should return a valid action index.
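A minimal sketch of agents with that shape: each takes the env and returns one index from `env.moves`. Both agents are hypothetical examples rather than anything shipped with `main.py`, the centre-first heuristic is only an illustration, and the `SimpleNamespace` object merely stands in for a real env so the functions can be tried offline.

```python
import random
from types import SimpleNamespace

def random_agent(env):
    """Return a uniformly random legal move; relies only on env.moves."""
    return random.choice(env.moves)

def centre_first_agent(env, n=7):
    """Illustrative heuristic: play the legal move nearest the board centre."""
    centre = (n - 1) / 2
    def distance(move):
        row, col = divmod(move, n)
        return abs(row - centre) + abs(col - centre)  # Manhattan distance
    # Ties are broken by the lower move index so the choice is deterministic
    return min(env.moves, key=lambda m: (distance(m), m))

# Stand-in with only a `moves` attribute, for trying the agents without the real env
fake_env = SimpleNamespace(moves=[0, 1, 23, 25, 48])
print(centre_first_agent(fake_env))
```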