# About

This notebook is for testing and commentary.

Each model has a corresponding .py file that was used to train it.

In [1]:
import os
import random
import numpy as np
%matplotlib inline
from kaggle_environments import make, evaluate
from stable_baselines3 import PPO, DQN
from OneStepNegamaxQLearning import AgentDQN

In [2]:
MODEL_DIR = os.path.join(".","models")

# Utility functions

In [3]:
# divides board into 3 channels - https://www.kaggle.com/c/connectx/discussion/168246
# first channel: player 1 pieces
# second channel: player 2 pieces
# third channel: possible moves. 1 for player_1 and -1 for player_2
def transform_board(board, mark):
    grid = board[0]
    rows, columns = grid.shape

    # Binary masks of each player's pieces
    layer1 = (grid == 1).astype(grid.dtype)
    layer2 = (grid == 2).astype(grid.dtype)

    # Mark the lowest empty cell of every column that is not full
    layer3 = np.zeros_like(grid)
    for c in range(columns):
        for r in range(rows - 1, -1, -1):
            if grid[r, c] == 0:
                layer3[r, c] = 1 if mark == 1 else -1
                break

    # Shape: (1, 3, rows, columns)
    return np.array([[layer1, layer2, layer3]])

def get_win_percentages(agent1, agent2, n_rounds=100):
    """
    Prints each agent's win rate (as a fraction of all games) and the
    percentage of games that ended with an invalid play by each agent.
    """
    # Use default Connect Four setup
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    # Agent 1 goes first (roughly) half the time
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds//2)
    # Agent 2 goes first (roughly) half the time
    outcomes += [[b,a] for [a,b] in evaluate("connectx", [agent2, agent1], config, [], n_rounds-n_rounds//2)]
    print("Agent 1 Win Percentage:", np.round(outcomes.count([1,-1])/len(outcomes), 4))
    print("Agent 2 Win Percentage:", np.round(outcomes.count([-1,1])/len(outcomes), 4))
    print("Percentage of Invalid Plays by Agent 1:", int(outcomes.count([None, 0])/n_rounds*100))
    print("Percentage of Invalid Plays by Agent 2:", int(outcomes.count([0, None])/n_rounds*100))
    
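def agent(obs, config):
    # Relies on the global `model`, which is loaded in the cells below
    board_2d = np.array(obs['board']).reshape(1,6,7)
    board_3c = transform_board(board_2d, obs.mark)
    col, _ = model.predict(board_3c, deterministic=True)
    # NOTE: this early return makes the fallback below unreachable, so the
    # model's raw choice is always played and invalid moves show up in the stats.
    return int(col)
    # Check if selected column is valid (the top cell of the column is empty)
    is_valid = (obs['board'][int(col)] == 0)
    # If not valid, select a random valid move
    if is_valid:
        return int(col)
    else:
        return random.choice([c for c in range(config.columns) if obs['board'][c] == 0])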
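
To make the three-channel encoding concrete, the sketch below rebuilds it with plain Python lists on a small 3x3 board. `encode_board` is a hypothetical stand-in for `transform_board` (it drops NumPy and the batch dimension), so treat it as an illustration, not as the code used for training.

```python
def encode_board(grid, mark):
    """Split a 2D grid (0 = empty, 1/2 = player pieces) into three channels."""
    rows, columns = len(grid), len(grid[0])
    layer1 = [[1 if cell == 1 else 0 for cell in row] for row in grid]
    layer2 = [[1 if cell == 2 else 0 for cell in row] for row in grid]
    layer3 = [[0] * columns for _ in range(rows)]
    sign = 1 if mark == 1 else -1
    for c in range(columns):
        # Scan from the bottom row up; the first empty cell is where a piece lands
        for r in range(rows - 1, -1, -1):
            if grid[r][c] == 0:
                layer3[r][c] = sign
                break
    return [layer1, layer2, layer3]

# Toy position: column 2 is full, columns 0 and 1 hold one piece each
grid = [
    [0, 0, 2],
    [0, 0, 1],
    [1, 2, 1],
]
p1, p2, moves = encode_board(grid, mark=2)
for name, layer in (("player 1", p1), ("player 2", p2), ("moves", moves)):
    print(name, layer)
```

The full column 2 gets no entry in the third channel, which is how the network can see that it is not playable.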

### PPO CNN vs random (Stable Baselines)

Trained by playing only vs the built-in random agent.

In [4]:
model = PPO.load(os.path.join(MODEL_DIR, 'ppo_cnn_vs_random'))
get_win_percentages(agent1=agent, agent2="random", n_rounds=1000)

Agent 1 Win Percentage: 0.969
Agent 2 Win Percentage: 0.028
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0


### DQN CNN self-play (Stable Baselines)

About the same as PPO (0.966 vs 0.969 win rate). Still not 100% vs random.

In [5]:
model = DQN.load(os.path.join(MODEL_DIR, 'dqn_cnn_self_play'))
get_win_percentages(agent1=agent, agent2="random", n_rounds=1000)

Agent 1 Win Percentage: 0.966
Agent 2 Win Percentage: 0.033
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0


### DQN One Step Negamax Q Learning (From scratch in PyTorch)

Solution inspired by https://www.kaggle.com/c/connectx/discussion/129145

In [6]:
osnql_agent = AgentDQN()
osnql_agent.load_policy_net('OneStepNegamaxQLearning/policy_net.pt')

In [7]:
get_win_percentages(osnql_agent.kaggle_agent, 'random', 1000)

Agent 1 Win Percentage: 0.99
Agent 2 Win Percentage: 0.004
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0


Ahh. Still not 100%. So close though. Maybe random played 4 games really well? Let's try 10 000.

In [8]:
get_win_percentages(osnql_agent.kaggle_agent, 'random', 10_000)

Agent 1 Win Percentage: 0.9883
Agent 2 Win Percentage: 0.0065
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0
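
Whether 1000 games are enough to tell "random got lucky" apart from a real weakness can be checked with a confidence interval. Below is a minimal sketch using the Wilson score interval. The choice of interval is an assumption, not something this notebook computes, and `wilson_interval` is a hypothetical helper. The win counts are the ones implied by the two runs above.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half

# Win counts implied by the printed rates: 0.99 * 1000 and 0.9883 * 10_000
for wins, games in ((990, 1000), (9883, 10_000)):
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: {low:.4f} to {high:.4f}")
```

The 10 000-game interval is narrower and stays below 1, which suggests the losses are a real weakness and not just noise.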


Not good. There are many ways to improve this solution:

1) Look N steps into the future instead of 1 when calculating expected_state_action_values

2) Train vs a fixed opponent

3) Hardcode legal moves

4) Add MCTS (this would likely produce a near-perfect solution)

5) Optimize hyperparameters

...
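
As an illustration of idea 3, a hardcoded legal-move filter can be just a few lines: ignore the Q-values of full columns before taking the argmax. The sketch below uses plain lists and a hypothetical helper, `best_legal_column`. It assumes the board layout used by the agents above, a flat row-major list whose first `columns` entries are the top row.

```python
def best_legal_column(q_values, board, columns=7):
    """Return the column with the highest Q-value among columns that are not full."""
    # A column is playable when its top cell (first row of the flat board) is empty
    legal = [c for c in range(columns) if board[c] == 0]
    if not legal:
        raise ValueError("no legal moves left")
    return max(legal, key=lambda c: q_values[c])

# Empty 6x7 board, except that column 3 is completely filled
board = [0] * 42
for row in range(6):
    board[row * 7 + 3] = 1 if row % 2 == 0 else 2

# Made-up Q-values; the raw argmax would be the full column 3
q_values = [0.1, 0.3, 0.2, 0.9, 0.0, -0.4, 0.25]
print(best_legal_column(q_values, board))
```

Without the filter, the plain argmax would pick the full column 3, which counts as an invalid play.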

But I'd like to show you something else. I trained 2 more models with identical settings. Let's see how well they do.

In [9]:
osnql_agent2 = AgentDQN()
osnql_agent2.load_policy_net('OneStepNegamaxQLearning/policy_net2.pt')
osnql_agent3 = AgentDQN()
osnql_agent3.load_policy_net('OneStepNegamaxQLearning/policy_net3.pt')

print('Agent 2')
get_win_percentages(osnql_agent2.kaggle_agent, 'random', 10_000)
print('Agent 3')
get_win_percentages(osnql_agent3.kaggle_agent, 'random', 10_000)

Agent 2
Agent 1 Win Percentage: 0.993
Agent 2 Win Percentage: 0.0032
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0
Agent 3
Agent 1 Win Percentage: 0.992
Agent 2 Win Percentage: 0.0042
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0


Now let's define a new agent, which takes outputs from all 3 agents and sums them.

In [10]:
def three_musketeers(obs, config):
    state = osnql_agent.get_state(obs, obs.mark)
    q1 = osnql_agent.policy(state)
    q2 = osnql_agent2.policy(state)
    q3 = osnql_agent3.policy(state)
    q_c = q1 + q2 + q3
    return q_c.max(1)[1].view(1, 1).item()
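
Summing Q-values, as `three_musketeers` does, is not the same as letting the three networks vote: one very confident network can outweigh two that only mildly prefer another column. Here is a toy sketch with made-up Q-values. The numbers and the helper names `argmax`, `sum_ensemble`, and `majority_vote` are illustrative and not taken from the trained models.

```python
from collections import Counter

def argmax(values):
    """Index of the largest value (the first one wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def sum_ensemble(q_lists):
    """Add the Q-values column by column, then pick the best column."""
    totals = [sum(column) for column in zip(*q_lists)]
    return argmax(totals)

def majority_vote(q_lists):
    """Let each network pick its own best column, then take the most common pick."""
    votes = Counter(argmax(q) for q in q_lists)
    return votes.most_common(1)[0][0]

# Made-up Q-values for a 3-column game
q1 = [0.9, 0.1, 0.0]   # strongly prefers column 0
q2 = [0.4, 0.5, 0.0]   # mildly prefers column 1
q3 = [0.4, 0.5, 0.0]   # mildly prefers column 1

print("sum:", sum_ensemble([q1, q2, q3]))
print("vote:", majority_vote([q1, q2, q3]))
```

Here the summed ensemble plays column 0 while a majority vote would play column 1, so the two schemes can disagree even with identical networks.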

In [11]:
get_win_percentages(three_musketeers, 'random', 10_000)

Agent 1 Win Percentage: 0.9989
Agent 2 Win Percentage: 0.0008
Percentage of Invalid Plays by Agent 1: 0
Percentage of Invalid Plays by Agent 2: 0


See? Nearly perfect. It won 9989 of the 10 000 games and lost only 8. Same test vs the built-in negamax agent:

In [13]:
print('Agent 1')
get_win_percentages(osnql_agent.kaggle_agent, 'negamax', 100)
print('\nAgent 2')
get_win_percentages(osnql_agent2.kaggle_agent, 'negamax', 100)
print('\nAgent 3')
get_win_percentages(osnql_agent3.kaggle_agent, 'negamax', 100)
print('\nCombined')
get_win_percentages(three_musketeers, 'negamax', 100)

Agent 1
Agent 1 Win Percentage: 0.61
Agent 2 Win Percentage: 0.26
Percentage of Invalid Plays by Agent 1: 3
Percentage of Invalid Plays by Agent 2: 0

Agent 2
Agent 1 Win Percentage: 0.6
Agent 2 Win Percentage: 0.3
Percentage of Invalid Plays by Agent 1: 2
Percentage of Invalid Plays by Agent 2: 0

Agent 3
Agent 1 Win Percentage: 0.56
Agent 2 Win Percentage: 0.32
Percentage of Invalid Plays by Agent 1: 6
Percentage of Invalid Plays by Agent 2: 0

Combined
Agent 1 Win Percentage: 0.71
Agent 2 Win Percentage: 0.22
Percentage of Invalid Plays by Agent 1: 2
Percentage of Invalid Plays by Agent 2: 0
