In [43]:
using Gen, Plots

## In this notebook we will implement a  Theory Of Mind agents on a version of Poker

Kuhn poker is an extremely simplified form of poker developed by Harold W. Kuhn as a simple model zero-sum two-player imperfect-information game, amenable to a complete game-theoretic analysis. In Kuhn poker, the deck includes only three playing cards, for example a King, Queen, and Jack. One card is dealt to each player, which may place bets similarly to a standard poker. If both players bet or both players pass, the player with the higher card wins, otherwise, the betting player wins.

#### Rules

In conventional poker terms, a game of Kuhn poker proceeds as follows:

Each player antes 1.
Each player is dealt one of the three cards, and the third is put aside unseen.
Player one can check or bet 1.
* If player one checks then player two can check or bet 1.
    * If player two checks there is a showdown for the pot of 2 (i.e. the higher card wins 1 from the other player).
    * If player two bets then player one can fold or call.
        * If player one folds then player two takes the pot of 3 (i.e. winning 1 from player 1).
        * If player one calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).
* If player one bets then player two can fold or call.
    * If player two folds then player one takes the pot of 3 (i.e. winning 1 from player 2).
    * If player two calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).

#### Optimal strategy


The game has a mixed-strategy Nash equilibrium; when both players play equilibrium strategies, the first player should expect to lose at a rate of −1/18 per hand (as the game is zero-sum, the second player should expect to win at a rate of +1/18). There is no pure-strategy equilibrium.

Kuhn demonstrated there are infinitely many equilibrium strategies for the first player, forming a continuum governed by a single parameter. In one possible formulation, player one freely chooses the probability $\alpha$ $\in$ [0,1/3] with which he will bet when having a Jack (otherwise he checks; if the other player bets, he should always fold). When having a King, he should bet with the probability of $3\alpha$ (otherwise he checks; if the other player bets, he should always call). He should always check when having a Queen, and if the other player bets after this check, he should call with the probability of $\alpha$ +1/3.

The second player has a single equilibrium strategy: Always betting or calling when having a King; when having a Queen, checking if possible, otherwise calling with the probability of 1/3; when having a Jack, never calling and betting with the probability of 1/3.

In [44]:
JACK = 1
QUEEN = 2
KING = 3
FULL_DECK = [JACK, QUEEN, KING]

3-element Array{Int64,1}:
 1
 2
 3

In [45]:
FOLD = 1
CHECK = 2
BET = 3
ACTIONS = [FOLD, CHECK, BET]

3-element Array{Int64,1}:
 1
 2
 3

In [46]:
NO_BET_STAGE = 1
BET_STAGE = 2
BETTING_STAGES = [NO_BET_STAGE, BET_STAGE]

2-element Array{Int64,1}:
 1
 2

In [47]:
transition_matrix = ones((length(FULL_DECK), length(ACTIONS), length(BETTING_STAGES))) / 2
transition_matrix[:,FOLD, NO_BET_STAGE] .= 0
transition_matrix[:,BET,BET_STAGE] .= 0
transition_matrix

3×3×2 Array{Float64,3}:
[:, :, 1] =
 0.0  0.5  0.5
 0.0  0.5  0.5
 0.0  0.5  0.5

[:, :, 2] =
 0.5  0.5  0.0
 0.5  0.5  0.0
 0.5  0.5  0.0

In [48]:
struct PokerPlayer
    name::String
    card::Int
    transition_matrix::Array{Float64,3}
end

In [63]:
struct GameStage1
    bet_stage::Int
end

In [49]:
function reward(my_card::Int, buddy_card::Int)
    return my_card > buddy_card ? 1 : -1
end

reward (generic function with 1 method)

In [50]:
function agent(me::PokerPlayer, buddy::PokerPlayer, game_stage::GameStage1 depth = 0)
    my_card = me.card
    buddy_card = buddy.card
    
    my_transition_matrix = me.transition_matrix
    buddy_transition_matrix = buddy.transition_matrix
    
    my_transition_matrix_for_card = my_transition_matrix[my_card, :, game_stage.bet_stage]
    
    buddy_transition_matrix_for_card = buddy_transition_matrix[buddy_card, :, :]
    
    if depth > 0
        me_no_bet = agent(buddy, me, GameStage(NO_BET_STAGE), depth -1)
        me_bet =  agent(buddy, me, GameStage(BET), depth -1)
        buddy_transition_matrix = my_transition_matrix[CHECK]*me_no_bet + my_transition_matrix[bet]*me_bet
    end
    
    # Computing expectimax
    r_fold = -1
    
    if game_stage.bet_stage == NO_BET_STAGE
        r_check_without_bet = buddy_transition_matrix_for_card[FOLD][NO_BET_STAGE] * 1  + buddy_transition_matrix_for_card[CHECK][NO_BET_STAGE] * reward(my_card, buddy_card)
        r_check_with_bet = buddy_transition_matrix_for_card[BET, NO_BET_STAGE]*(my_transition_matrix_for_card[FOLD, BET_STAGE]*-1 + my_transition_matrix_for_card[CHECK, BET_STAGE]*reward(my_card, buddy_card) * 2)
        r_check = r_check_without_bet + r_check_with_bet
        r_bet = buddy_transition_matrix_for_card[FOLD, BET_STAGE] * 1 + buddy_transition_matrix_for_card[CHECK, NO_BET_STAGE] * 2 * reward(my_card, buddy_card)
        
        # Softmax
        _, q2, q3 = 0, exp(r_check), exp(r_bet)
        z = q2 + q3
        p_fold, p_check, p_bet = 0, q2/z, q3/z
    else  
        #BET Stage
        r_check = 2 * reward(my_card, buddy_card)
        
        # Softmax
        q1, q2, _ = exp(r_fold), exp(r_check), 0
        z = q1 + q2
        p_fold, p_check, p_bet = q1/z, q2/z, 0
    end
    return p_fold, p_check, p_bet    
end

agent (generic function with 2 methods)

In [52]:
agent(PokerPlayer("alice", JACK, transition_matrix), PokerPlayer("bob", KING, transition_matrix) ,0)

(0.29175596372884977, 0.22721977301778057, 0.48102426325336967)

In [53]:
agent(PokerPlayer("alice", JACK, transition_matrix), PokerPlayer("bob", KING, transition_matrix) ,0)

(0.29175596372884977, 0.22721977301778057, 0.48102426325336967)

In [54]:
agent(PokerPlayer("alice", JACK, transition_matrix), PokerPlayer("bob", QUEEN, transition_matrix) ,0)

(0.29175596372884977, 0.22721977301778057, 0.48102426325336967)

In [55]:
agent(PokerPlayer("alice", QUEEN, transition_matrix), PokerPlayer("bob", JACK, transition_matrix) ,0)

(0.05280640528926272, 0.3038798811878344, 0.6433137135229029)

In [56]:
agent(PokerPlayer("alice", QUEEN, transition_matrix), PokerPlayer("bob", KING, transition_matrix) ,0)

(0.29175596372884977, 0.22721977301778057, 0.48102426325336967)

In [57]:
agent(PokerPlayer("alice", KING, transition_matrix), PokerPlayer("bob", JACK, transition_matrix) ,0)

(0.05280640528926272, 0.3038798811878344, 0.6433137135229029)

In [58]:
agent(PokerPlayer("alice", KING, transition_matrix), PokerPlayer("bob", QUEEN, transition_matrix) ,0)

(0.05280640528926272, 0.3038798811878344, 0.6433137135229029)