In [1]:
using Gen, Plots

## In this notebook we will implement Theory of Mind agents for a version of poker

Kuhn poker is an extremely simplified form of poker developed by Harold W. Kuhn as a simple model of a zero-sum, two-player, imperfect-information game that is amenable to a complete game-theoretic analysis. In Kuhn poker, the deck includes only three playing cards, for example a King, a Queen, and a Jack. One card is dealt to each player, who may then place bets much as in standard poker. If both players bet or both players pass, the player with the higher card wins; otherwise, the betting player wins.

#### Rules

In conventional poker terms, a game of Kuhn poker proceeds as follows:

* Each player antes 1.
* Each player is dealt one of the three cards, and the third is put aside unseen.
* Player one can check or bet 1.
    * If player one checks then player two can check or bet 1.
        * If player two checks there is a showdown for the pot of 2 (i.e. the higher card wins 1 from the other player).
        * If player two bets then player one can fold or call.
            * If player one folds then player two takes the pot of 3 (i.e. winning 1 from player one).
            * If player one calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).
    * If player one bets then player two can fold or call.
        * If player two folds then player one takes the pot of 3 (i.e. winning 1 from player two).
        * If player two calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).
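
The betting tree above has only five terminal sequences, so the payoff to player one can be written as a small lookup. The following is a minimal sketch, not part of the original notebook: the function name `kuhn_payoff` and the `Symbol` encoding of the betting history are assumptions made for illustration.

```julia
# Payoff to player one for a finished hand of Kuhn poker.
# `history` is a hypothetical encoding of the betting sequence:
# :check / :bet for opening moves, :fold / :call in response to a bet.
function kuhn_payoff(card1::Int, card2::Int, history::Vector{Symbol})
    showdown = card1 > card2 ? 1 : -1   # +1 if player one holds the higher card
    if history == [:check, :check]
        return showdown                 # showdown for the pot of 2
    elseif history == [:check, :bet, :fold]
        return -1                       # player one folds, loses the ante
    elseif history == [:check, :bet, :call]
        return 2 * showdown             # showdown for the pot of 4
    elseif history == [:bet, :fold]
        return 1                        # player two folds, player one wins the ante
    elseif history == [:bet, :call]
        return 2 * showdown             # showdown for the pot of 4
    else
        throw(ArgumentError("not a terminal betting sequence: $history"))
    end
end
```

For example, `kuhn_payoff(3, 1, [:bet, :call])` is the King winning a called bet against the Jack, and `kuhn_payoff(1, 3, [:check, :bet, :fold])` is the Jack folding to a bet.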

#### Optimal strategy


The game has a mixed-strategy Nash equilibrium; when both players play equilibrium strategies, the first player should expect to lose 1/18 per hand, i.e. an expected value of −1/18 (as the game is zero-sum, the second player should expect to win at a rate of +1/18). There is no pure-strategy equilibrium.

Kuhn demonstrated that there are infinitely many equilibrium strategies for the first player, forming a continuum governed by a single parameter. In one possible formulation, player one freely chooses the probability $\alpha \in [0, 1/3]$ with which he will bet when holding a Jack (otherwise he checks; if the other player then bets, he should always fold). When holding a King, he should bet with probability $3\alpha$ (otherwise he checks; if the other player then bets, he should always call). He should always check when holding a Queen, and if the other player bets after this check, he should call with probability $\alpha + 1/3$.

The second player has a single equilibrium strategy: always bet or call when holding a King; when holding a Queen, check if possible, otherwise call with probability 1/3; when holding a Jack, never call, and bet with probability 1/3.
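
The claimed game value can be checked by enumerating the six possible deals under the strategies just described. This is a sketch under stated assumptions rather than part of the original notebook: the names `showdown` and `equilibrium_value` are introduced here for illustration, and exact `Rational` arithmetic is used so the result can be compared with −1/18 without rounding.

```julia
# Showdown result for player one: +1 if player one holds the higher card.
showdown(c1, c2) = c1 > c2 ? 1 : -1

# Expected payoff to player one when both players follow the equilibrium
# strategies described above, with player one's free parameter α.
function equilibrium_value(α::Rational)
    J, Q, K = 1, 2, 3
    p1_bet  = Dict(J => α,    Q => 0//1,     K => 3 * α)  # P1 opening bet
    p1_call = Dict(J => 0//1, Q => α + 1//3, K => 1//1)   # P1 call after check, bet
    p2_bet  = Dict(J => 1//3, Q => 0//1,     K => 1//1)   # P2 bet after a check
    p2_call = Dict(J => 0//1, Q => 1//3,     K => 1//1)   # P2 call after a bet

    total = 0//1
    for c1 in (J, Q, K), c2 in (J, Q, K)
        c1 == c2 && continue          # both players cannot hold the same card
        s = showdown(c1, c2)
        # P1 bets: P2 calls (showdown for 2) or folds (P1 wins 1).
        v_bet = p2_call[c2] * (2 * s) + (1 - p2_call[c2]) * 1
        # P1 checks and P2 bets: P1 calls (showdown for 2) or folds (loses 1).
        v_facing_bet = p1_call[c1] * (2 * s) + (1 - p1_call[c1]) * (-1)
        # P1 checks: P2 bets, or checks behind (showdown for 1).
        v_check = p2_bet[c2] * v_facing_bet + (1 - p2_bet[c2]) * s
        total += p1_bet[c1] * v_bet + (1 - p1_bet[c1]) * v_check
    end
    return total // 6                 # six equally likely deals
end
```

Calling `equilibrium_value(0//1)` or `equilibrium_value(1//3)` returns `-1//18`, matching the value stated above at both ends of the allowed range of $\alpha$.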

In [2]:
JACK = 1
QUEEN = 2
KING = 3
FULL_DECK = [JACK, QUEEN, KING]

3-element Array{Int64,1}:
 1
 2
 3

In [3]:
FOLD = 1
CHECK = 2
BET = 3
ACTIONS = [FOLD, CHECK, BET]

3-element Array{Int64,1}:
 1
 2
 3

In [4]:
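# Betting stages. The names are prefixed with STAGE_ so that they do not
# overwrite the action constant BET = 3 defined in the previous cell.
STAGE_NO_BET = 1
STAGE_BET = 2
BETTING_STAGES = [STAGE_NO_BET, STAGE_BET]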

2-element Array{Int64,1}:
 1
 2

In [5]:
transition_matrix = ones((length(FULL_DECK), length(ACTIONS), length(BETTING_STAGES))) / 3

3×3×2 Array{Float64,3}:
[:, :, 1] =
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333

[:, :, 2] =
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333

In [6]:
struct PokerPlayer
    name::String
    card::Int                  # JACK, QUEEN, or KING
    transition_matrix::Matrix  # rows: cards; columns: P(FOLD), P(CHECK), P(BET)
end

In [7]:
# Showdown outcome from my perspective: +1 if my card is higher, -1 otherwise.
function reward(my_card::Int, buddy_card::Int)
    return my_card > buddy_card ? 1 : -1
end

reward (generic function with 1 method)

In [43]:
function agent(me::PokerPlayer, buddy::PokerPlayer, depth = 0, first_player = true)
    my_card = me.card
    buddy_card = buddy.card

    # Row of my transition matrix for my card: P(FOLD), P(CHECK), P(BET)
    my_transition_vec = me.transition_matrix[my_card, :]

    # Level-0 model of the opponent: read their action probabilities from their matrix
    p_buddy_fold, p_buddy_check, p_buddy_bet = buddy.transition_matrix[buddy_card, :]

    # Deeper levels: model the opponent as an agent reasoning about me at depth - 1
    if depth > 0
        p_buddy_fold, p_buddy_check, p_buddy_bet = agent(buddy, me, depth - 1)
    end

    # Expected reward of each action (expectimax over the opponent's response)
    showdown = reward(my_card, buddy_card)
    r_fold = -1
    r_check_without_bet = p_buddy_fold * 1 + p_buddy_check * showdown
    r_check_with_bet = p_buddy_bet * (my_transition_vec[FOLD] * (-1) + my_transition_vec[CHECK] * showdown * 2)
    r_check = r_check_without_bet + r_check_with_bet
    r_bet = p_buddy_fold * 1 + p_buddy_check * 2 * showdown

    print(r_bet)  # debug output: expected reward of betting
    # Softmax over the expected rewards gives the action probabilities
    q1, q2, q3 = exp(r_fold), exp(r_check), exp(r_bet)
    z = q1 + q2 + q3
    return q1 / z, q2 / z, q3 / z
end

agent (generic function with 2 methods)
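
The softmax step at the end of `agent` could be factored into a reusable helper. The sketch below is an assumption, not code from the original notebook: the name `softmax_policy` and the inverse-temperature keyword `β` are hypothetical, and subtracting the maximum before exponentiating is a standard trick that leaves the result unchanged while avoiding overflow for large rewards.

```julia
# Softmax over a vector of expected rewards.
# `β` is a hypothetical inverse-temperature parameter (not in the original code);
# β = 1 reproduces exp(r) / sum(exp(r)) as used in `agent`.
function softmax_policy(rewards::AbstractVector{<:Real}; β::Real = 1.0)
    shifted = β .* (rewards .- maximum(rewards))  # subtract the max for numerical stability
    weights = exp.(shifted)
    return weights ./ sum(weights)
end
```

With rewards `[-1, -1/3, -1/3]` (fold, check, bet) and the default `β`, this gives approximately `[0.204, 0.398, 0.398]`. Larger values of `β` would concentrate probability on the highest-reward action, and `β = 0` would give a uniform policy.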

In [9]:
transition = ones((3,3)) / 3 

3×3 Array{Float64,2}:
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333
 0.333333  0.333333  0.333333

In [21]:
transition[:,1]

3-element Array{Float64,1}:
 0.3333333333333333
 0.3333333333333333
 0.3333333333333333

In [44]:
agent(PokerPlayer("alice", JACK, transition), PokerPlayer("bob", KING, transition), 0)

-0.3333333333333333

(0.20427055865291674, 0.39786472067354167, 0.39786472067354167)
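
As a check on this output: with uniform transition rows, each of Bob's action probabilities is 1/3, each entry of Alice's row is 1/3, and `reward(JACK, KING)` is −1. The quantities computed inside `agent` at depth 0 then work out as follows (the subscripted symbols mirror the variable names in the code):

$$
\begin{aligned}
r_{\text{fold}} &= -1 \\
r_{\text{bet}} &= \tfrac{1}{3}\cdot 1 + \tfrac{1}{3}\cdot 2\cdot(-1) = -\tfrac{1}{3} \\
r_{\text{check}} &= \left[\tfrac{1}{3}\cdot 1 + \tfrac{1}{3}\cdot(-1)\right] + \tfrac{1}{3}\left[\tfrac{1}{3}\cdot(-1) + \tfrac{1}{3}\cdot(-1)\cdot 2\right] = -\tfrac{1}{3} \\
p_{\text{fold}} &= \frac{e^{-1}}{e^{-1} + 2e^{-1/3}} \approx 0.204, \qquad
p_{\text{check}} = p_{\text{bet}} = \frac{e^{-1/3}}{e^{-1} + 2e^{-1/3}} \approx 0.398
\end{aligned}
$$

The printed value −0.333… is $r_{\text{bet}}$, and the returned tuple is $(p_{\text{fold}}, p_{\text{check}}, p_{\text{bet}})$.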
