In [9]:
using Plots

In [14]:
using Gen, Memoize, Random

In [167]:
using StatsBase

## In this notebook we implement Theory of Mind agents for a simplified version of poker

Kuhn poker is an extremely simplified form of poker developed by Harold W. Kuhn as a simple model of a zero-sum, two-player, imperfect-information game that is amenable to a complete game-theoretic analysis. In Kuhn poker, the deck includes only three playing cards, for example a King, a Queen, and a Jack. One card is dealt to each player, who may then place bets much as in standard poker. If both players bet or both players pass, the player with the higher card wins; otherwise, the betting player wins.

#### Rules

In conventional poker terms, a game of Kuhn poker proceeds as follows:

* Each player antes 1.
* Each player is dealt one of the three cards, and the third is put aside unseen.
* Player one can check or bet 1.
    * If player one checks then player two can check or bet 1.
        * If player two checks there is a showdown for the pot of 2 (i.e. the higher card wins 1 from the other player).
        * If player two bets then player one can fold or call.
            * If player one folds then player two takes the pot of 3 (i.e. winning 1 from player one).
            * If player one calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).
    * If player one bets then player two can fold or call.
        * If player two folds then player one takes the pot of 3 (i.e. winning 1 from player two).
        * If player two calls there is a showdown for the pot of 4 (i.e. the higher card wins 2 from the other player).

#### Optimal strategy


The game has a mixed-strategy Nash equilibrium; when both players play equilibrium strategies, the first player's expected payoff is −1/18 per hand (as the game is zero-sum, the second player's expected payoff is +1/18). There is no pure-strategy equilibrium.

Kuhn demonstrated there are infinitely many equilibrium strategies for the first player, forming a continuum governed by a single parameter. In one possible formulation, player one freely chooses the probability $\alpha \in [0, 1/3]$ with which he will bet when having a Jack (otherwise he checks; if the other player bets, he should always fold). When having a King, he should bet with probability $3\alpha$ (otherwise he checks; if the other player bets, he should always call). He should always check when having a Queen, and if the other player bets after this check, he should call with probability $\alpha + 1/3$.

The second player has a single equilibrium strategy: always betting or calling when having a King; when having a Queen, checking if possible, otherwise calling with probability 1/3; when having a Jack, never calling and betting with probability 1/3.
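As a check on the figures quoted above, the following sketch enumerates all six deals with exact rational arithmetic and evaluates player one's expected payoff when both players follow the strategies just described. The helper names (`p1_bet`, `p1_call`, `p2_bet`, `p2_call`, `game_value`) are illustrative and are not used elsewhere in this notebook.

```julia
# Card ranks, matching the convention Jack < Queen < King.
JACK, QUEEN, KING = 1, 2, 3

# Player one's equilibrium family, indexed by alpha in [0, 1/3].
p1_bet(card, alpha)  = card == JACK ? alpha : (card == KING ? 3 * alpha : 0//1)
p1_call(card, alpha) = card == JACK ? 0//1 : (card == QUEEN ? alpha + 1//3 : 1//1)

# Player two's equilibrium strategy.
p2_bet(card)  = card == JACK ? 1//3 : (card == KING ? 1//1 : 0//1)   # after a check
p2_call(card) = card == JACK ? 0//1 : (card == QUEEN ? 1//3 : 1//1)  # after a bet

# Expected payoff of player one, averaged over the six equally likely deals.
function game_value(alpha)
    total = 0//1
    for c1 in (JACK, QUEEN, KING), c2 in (JACK, QUEEN, KING)
        c1 == c2 && continue
        showdown = c1 > c2 ? 1 : -1
        # Player one bets: player two calls (showdown for 2) or folds (+1).
        after_bet = p2_call(c2) * 2 * showdown + (1 - p2_call(c2)) * 1
        # Player one checks, player two bets: player one calls or folds (-1).
        facing_bet = p1_call(c1, alpha) * 2 * showdown + (1 - p1_call(c1, alpha)) * (-1)
        # Player one checks: player two bets or checks (showdown for 1).
        after_check = p2_bet(c2) * facing_bet + (1 - p2_bet(c2)) * showdown
        total += p1_bet(c1, alpha) * after_bet + (1 - p1_bet(c1, alpha)) * after_check
    end
    return total / 6
end

for alpha in (0//1, 1//6, 1//3)
    println("alpha = ", alpha, "  value = ", game_value(alpha))
end
```

Under these assumptions every choice of `alpha` in the allowed range should report a value of `-1//18`, which is the sense in which player one's equilibria form a continuum.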

### Constants

In [230]:
INVALID_VALUE = -100

-100

In [10]:
JACK = 1
QUEEN = 2
KING = 3
FULL_DECK = [JACK, QUEEN, KING]

3-element Array{Int64,1}:
 1
 2
 3

In [11]:
FOLD = -1
CHECK = 0
BET = 1
ACTIONS = [FOLD, CHECK, BET]

3-element Array{Int64,1}:
 -1
  0
  1

### Utility functions

In [176]:
function get_remaining_cards(card)
   return setdiff(FULL_DECK, [card])
end

get_remaining_cards (generic function with 1 method)

In [12]:
function reward(my_card::Int, buddy_card::Int)
    return my_card > buddy_card ? 1 : -1
end

reward (generic function with 1 method)

In [13]:
function reward(my_card::Int, buddy_card::Int, my_action::Int, buddy_action::Int)
    if my_action == FOLD
        return -1
    end
    if buddy_action == FOLD
        return 1
    end
    
    bet_factor = my_action == BET || buddy_action == BET ? 2 : 1
    
    return bet_factor * reward(my_card, buddy_card)
end

reward (generic function with 2 methods)

In [177]:
function expected_reward(my_card::Int, my_action::Int, buddy_action::Int, opp_cards_distribution=[0.5, 0.5])
    # the opponent holds one of the two cards we do not hold
    optional_buddy_cards = get_remaining_cards(my_card)
    rewards = map(optional_buddy_card -> reward(my_card, optional_buddy_card, my_action, buddy_action), optional_buddy_cards)
    # probability-weighted average of the per-card rewards
    return rewards' * opp_cards_distribution
end

expected_reward (generic function with 2 methods)

### Some tests

In [103]:
expected_reward(JACK, CHECK, BET)

-2.0

In [104]:
expected_reward(JACK, FOLD, BET)

-1.0

In [105]:
expected_reward(QUEEN, FOLD, BET)

-1.0

In [106]:
expected_reward(QUEEN, CHECK, BET)

0.0

In [107]:
expected_reward(KING, FOLD, BET)

-1.0

In [108]:
expected_reward(KING, CHECK, BET)

2.0

#### We want to design an algorithm that infers the posterior over the check-or-bet policy and over the fold-or-call policy

#### For each player we want to compute two vectors:
<ol>
    <li>Check or bet probability</li>
    <li>Fold or call probability</li>
</ol>

The reward for fold or call is fairly straightforward.<br>
Compute the expected reward given your card.<br>
A later stage can instead use a distribution over the opponent's cards computed given that opp_action is BET.
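The last remark can be made concrete with Bayes' rule: if we know, or assume, how often the opponent bets with each card, then observing a bet reweights the uniform prior over the opponent's possible cards. The sketch below is an illustration only. `posterior_given_bet` is a hypothetical helper, and the bet probabilities are simply the second player's equilibrium values quoted earlier (1/3 with a Jack, 1 with a King), not something this notebook learns.

```julia
# Bayes update over the opponent's possible cards after observing a bet.
# prior[i] and bet_prob[i] refer to the same candidate card.
function posterior_given_bet(prior, bet_prob)
    unnormalized = prior .* bet_prob
    return unnormalized ./ sum(unnormalized)
end

# We hold the Queen, so the opponent holds the Jack or the King.
prior = [0.5, 0.5]
bet_prob = [1/3, 1.0]          # assumed: P(bet | Jack), P(bet | King)
posterior = posterior_given_bet(prior, bet_prob)

# Showdown payoffs for calling with the Queen against Jack / King (pot doubled).
call_rewards = [2, -2]
expected_call = call_rewards' * posterior

println(posterior)       # approximately [0.25, 0.75]
println(expected_call)   # approximately -1.0
```

A vector like `posterior` could be passed as the `opp_cards_distribution` argument of `expected_reward`. In this example the bet makes a King three times as likely as a Jack, so calling with the Queen is worth about as much as folding.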

In [191]:
@gen function fold_or_call(card, unused_flag)
    call = @trace(bernoulli(0.5), :call)    
    if call
        r = expected_reward(card, CHECK, BET)
    else
        r = expected_reward(card, FOLD, BET)
    end
    @trace(bernoulli(exp(r)), :r)
end;
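Conditioning on `:r = true` turns the uniform prior over `:call` into a softmax over the two expected rewards, because each branch is weighted by `exp(r)`. The sketch below computes that closed form so that sampled estimates can be compared against it. It assumes the `:r` choice is only ever constrained to `true`, so that `exp(r)` acts as an unnormalized weight even when it exceeds 1. `call_probability` is an illustrative name, and the reward pairs are the `expected_reward` values listed in the tests above.

```julia
# Posterior probability of calling when each action is weighted by exp(reward)
# and the prior over {call, fold} is uniform.
call_probability(r_call, r_fold) = exp(r_call) / (exp(r_call) + exp(r_fold))

# (reward for calling, reward for folding) against a bet, for each card.
rewards_vs_bet = Dict("JACK" => (-2.0, -1.0), "QUEEN" => (0.0, -1.0), "KING" => (2.0, -1.0))

for name in ("JACK", "QUEEN", "KING")
    r_call, r_fold = rewards_vs_bet[name]
    println(name, ": ", round(call_probability(r_call, r_fold), digits=3))
end
```

This should print roughly 0.269, 0.731 and 0.953, so a sampler that mixes well ought to land near those values.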

The reward for check or bet is less straightforward.<br>
It depends on how the opponent reacts to our action.<br>
We can use the opponent's fold_or_call model to derive the reward when we choose BET,<br>
and the opponent's check_or_bet model, followed by our own fold_or_call, to derive the reward when we choose CHECK.<br>

The last_check argument limits how many check_or_bet computations are chained in a row (we cut the recursion after 2, as the rules of the game dictate).

In [249]:
@gen function check_or_bet(card, last_check=false)
    bet = @trace(bernoulli(0.5), :bet)
    optional_buddy_cards = get_remaining_cards(card)
    if bet
        r = 0
        for opp_card = optional_buddy_cards
            
            # theory of mind about opponent's fold-call policy
            calls = run_episodic(fold_or_call, opp_card, :call)
            call_ratio = sum(calls) / length(calls)
            
            # opp will call
            r_opp_will_call = reward(card, opp_card, BET, CHECK)

            # opp will fold
            r_opp_will_fold = reward(card, opp_card, BET, FOLD)
           
            r_opp_card = call_ratio * r_opp_will_call + (1-call_ratio)*r_opp_will_fold
            
            # expectation over cards; later we may use learned probabilities
            r += 0.5 * r_opp_card
        end
    else
        r = 0
        for opp_card = optional_buddy_cards
            if last_check # cut the computation cycle
                r_opp_card = reward(card, opp_card, CHECK, CHECK)
            else # the opponent can check again or bet
                # theory of mind about opponent's check-bet policy
                bets = run_episodic(check_or_bet, opp_card, :bet, true)
                bet_ratio = sum(bets) / length(bets)

                
                # check whether I will call or fold
                
                # theory of mind about my own fold-call policy
                calls = run_episodic(fold_or_call, card, :call)
                call_ratio = sum(calls) / length(calls)
                
                # I will call
                r_opp_will_bet_i_will_call = reward(card, opp_card, CHECK, BET)
                
                # I will fold
                r_opp_will_bet_i_will_fold = reward(card, opp_card, FOLD, BET)
                
                r_opp_will_bet = call_ratio * r_opp_will_bet_i_will_call + (1-call_ratio)*r_opp_will_bet_i_will_fold
                # opp will bet
                # NOTE: this assignment replaces the call/fold mixture computed just above,
                # so an opponent bet is currently scored like a check-check showdown
                r_opp_will_bet = reward(card, opp_card, CHECK, CHECK)
                
                # opp will check
                r_opp_will_check = reward(card, opp_card, CHECK, CHECK)

                r_opp_card = bet_ratio * r_opp_will_bet + (1-bet_ratio)*r_opp_will_check
            end
            r += 0.5 * r_opp_card
        end
        
    end
    @trace(bernoulli(exp(r)), :r)
end;
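For the `last_check = true` branch the same reasoning gives a closed form that is useful as a cross-check: the value of betting averages over the opponent's two possible cards and over their call-or-fold response, the value of checking is the plain showdown, and the bet probability is again a softmax of the two. The sketch below is self-contained and replaces the sampled call ratios with exact softmax call probabilities. `showdown`, `call_probability` and `bet_probability` are illustrative names.

```julia
JACK, QUEEN, KING = 1, 2, 3
showdown(my_card, opp_card) = my_card > opp_card ? 1 : -1

# Opponent's softmax probability of calling a bet, under a uniform belief about our card.
function call_probability(opp_card)
    others = [c for c in (JACK, QUEEN, KING) if c != opp_card]
    r_call = sum(0.5 * 2 * showdown(opp_card, c) for c in others)
    r_fold = -1.0
    return exp(r_call) / (exp(r_call) + exp(r_fold))
end

# Probability of betting when a check would end the betting (last_check = true).
function bet_probability(card)
    others = [c for c in (JACK, QUEEN, KING) if c != card]
    r_bet = 0.0
    r_check = 0.0
    for opp_card in others
        q = call_probability(opp_card)
        # A called bet is a showdown for 2; a fold wins the ante.
        r_bet += 0.5 * (q * 2 * showdown(card, opp_card) + (1 - q) * 1)
        # Checking leads straight to a showdown for 1.
        r_check += 0.5 * showdown(card, opp_card)
    end
    return exp(r_bet) / (exp(r_bet) + exp(r_check))
end

for (name, card) in (("JACK", JACK), ("QUEEN", QUEEN), ("KING", KING))
    println(name, ": ", round(bet_probability(card), digits=3))
end
```

The printed values should be close to 0.372, 0.427 and 0.622, which gives a reference point for the sampled estimates of `check_or_bet` with `last_check = true`.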

### Generic Gen inference methods made by David in bob-alice-musings

In [187]:
@memoize function run_episodic(model, card, sym, last_check=false, niter=1000)
    observations = Gen.choicemap()
    observations[:r] = true

    trace, _ = Gen.generate(model, (card, last_check), observations)
    values = []
    for i = 1:niter
        trace, _ = Gen.mh(trace, select(sym))
        push!(values, get_choices(trace)[sym])
    end
    return values
end
empty!(memoize_cache(run_episodic));
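To see what this loop does without Gen, the sketch below runs a hand-written Metropolis-Hastings chain over a single boolean: each step proposes a fresh value from the uniform prior and accepts it with a probability given by the ratio of the `exp(r)` weights. It is meant to mirror what a selection-based `Gen.mh` move does for a single Bernoulli choice, but that correspondence is an assumption about Gen's behavior rather than a description of its internals. `mh_boolean` and its arguments are hypothetical.

```julia
using Random

# Metropolis-Hastings over one boolean with a uniform prior proposal.
# weight_true / weight_false are the unnormalized likelihoods exp(r) of each value.
function mh_boolean(weight_true, weight_false; niter=20_000, rng=MersenneTwister(1))
    state = rand(rng, Bool)
    hits = 0
    for _ in 1:niter
        proposal = rand(rng, Bool)
        w_new = proposal ? weight_true : weight_false
        w_old = state ? weight_true : weight_false
        # Accept with probability min(1, w_new / w_old); prior and proposal cancel.
        if rand(rng) < w_new / w_old
            state = proposal
        end
        hits += state
    end
    return hits / niter
end

# Jack facing a bet: calling is worth -2, folding -1.
estimate = mh_boolean(exp(-2.0), exp(-1.0))
exact = exp(-2.0) / (exp(-2.0) + exp(-1.0))
println("estimate = ", estimate, ", exact = ", round(exact, digits=4))
```

The fraction of `true` states should approach the softmax value as `niter` grows, which is why averaging the returned samples yields a policy probability.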

In [188]:
function genepisodic(model, card, sym, last_check = false, niter=5000)
    # discard the first div(niter, 10) samples as burn-in, then average niter samples
    nburn = div(niter, 10)
    values = run_episodic(model, card, sym, last_check, nburn + niter)[nburn+1:end]
    return sum(values)/length(values)
end;

In [189]:
function simulate(model, card, sym, last_check = false, niter=5000)
    # discard the first div(niter, 10) samples as burn-in, then draw one of the rest
    nburn = div(niter, 10)
    values = run_episodic(model, card, sym, last_check, nburn + niter)[nburn+1:end]
    return sample(values)
end

simulate (generic function with 4 methods)

### Some tests

### for fold_or_call

In [148]:
genepisodic(fold_or_call, JACK, :call)

0.266

In [179]:
sum([simulate(fold_or_call, JACK, :call) for _ in 1:100]) / 100

0.21

In [149]:
genepisodic(fold_or_call, QUEEN, :call)

0.7324

In [180]:
sum([simulate(fold_or_call, QUEEN, :call) for _ in 1:100]) / 100

0.72

In [150]:
genepisodic(fold_or_call, KING, :call)

0.952

In [181]:
sum([simulate(fold_or_call, KING, :call) for _ in 1:100]) / 100

0.98

### for check_or_bet

In [199]:
genepisodic(check_or_bet, JACK, :bet)

0.3662

In [200]:
genepisodic(check_or_bet, QUEEN, :bet)

0.4278

In [201]:
genepisodic(check_or_bet, KING, :bet)

0.6222

### for check_or_bet - last_check

In [244]:
genepisodic(check_or_bet, JACK, :bet, true)

0.3832

In [245]:
genepisodic(check_or_bet, QUEEN, :bet, true)

0.4188

In [247]:
genepisodic(check_or_bet, KING, :bet, true)

0.619

### Simulators that run the inference and compute scores

In [235]:
function compute_policy_fold_or_call(card, last_check)
    p_call = genepisodic(fold_or_call, card, :call, last_check)
    return [1-p_call, p_call]
end

compute_policy_fold_or_call (generic function with 2 methods)

In [236]:
function compute_policy_check_or_bet(card, last_check)
    p_bet = genepisodic(check_or_bet, card, :bet, last_check)
    return [1-p_bet, p_bet]
end

compute_policy_check_or_bet (generic function with 2 methods)

In [215]:
function compute_policy(card, last_check, previous_player_betted=false)
    if previous_player_betted
        return compute_policy_fold_or_call(card, last_check)
    else
        return compute_policy_check_or_bet(card, last_check) 
    end
end

compute_policy (generic function with 3 methods)

In [216]:
function sample_check_bet(policy)
   sample([CHECK,BET], Weights(policy)) 
end

sample_check_bet (generic function with 1 method)

In [217]:
function sample_fold_call(policy)
    # CHECK doubles as "call" here: the pot is doubled because the other player bet
    sample([FOLD,CHECK], Weights(policy))
end

sample_fold_call (generic function with 1 method)

In [224]:
function sample_action(policy, previous_player_betted=false)
    if previous_player_betted
        return sample_fold_call(policy)
    else
        return sample_check_bet(policy) 
    end
end

sample_action (generic function with 2 methods)

In [226]:
function poker_round(first_player_card, second_player_card, history)
    # player one opens with a check or a bet
    first_player_policy = compute_policy(first_player_card, false)
    first_player_move = sample_action(first_player_policy)
    
    # player two folds/calls after a bet, or checks/bets after a check
    second_player_policy  = compute_policy(second_player_card, first_player_move == CHECK, first_player_move == BET)
    second_player_move = sample_action(second_player_policy, first_player_move == BET)
    
    # each entry is (card, move, previous_player_betted) for both players
    round_history = [((first_player_card, first_player_move, false), (second_player_card, second_player_move, first_player_move == BET))]
    
    doubled_pot = first_player_move == BET || second_player_move == BET
    if second_player_move == FOLD
        push!(history, (round_history, 1))
        return 1
    end
    if second_player_move == BET
        # player one checked and now has to fold or call
        first_player_policy = compute_policy(first_player_card, false, true)
        first_player_move = sample_action(first_player_policy, true)
        
        push!(round_history, ((first_player_card, first_player_move, true),(INVALID_VALUE, INVALID_VALUE, false)))
        if first_player_move == FOLD
            # record folded rounds too, so history covers every round
            push!(history, (round_history, -1))
            return -1
        end
    end
    # showdown: the higher card wins 1, or 2 if the pot was doubled
    score = (first_player_card > second_player_card ? 1 : -1) * (1 + doubled_pot)
    push!(history, (round_history, score))
    return score
end

poker_round (generic function with 1 method)

In [227]:
function game(num_of_rounds = 10)
    total_score = 0
    history = []
    for i in 1:num_of_rounds
        first_player_card, second_player_card = sample(FULL_DECK, 2; replace=false)
        score = poker_round(first_player_card, second_player_card, history)
        total_score += score
    end
#     print(history)
    avg_score = total_score / num_of_rounds
    return avg_score
end

game (generic function with 2 methods)

In [257]:
game()

-0.3

In [258]:
game(100)

-0.13

In [259]:
game(1000)

-0.02

In [260]:
game(10000)

-0.0402

#### We can see the game is close to even, with a slight disadvantage for the first player, as expected
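How much weight the averages above can bear depends on the number of rounds. Each round's score lies in {-2, -1, 1, 2}, so its standard deviation is at most 2 and the standard error of an n-round average is at most 2/sqrt(n). The sketch below tabulates this conservative bound next to the equilibrium value of −1/18 quoted earlier. `standard_error_bound` is an illustrative helper, and the bound assumes the rounds are independent.

```julia
# Upper bound on the standard error of the mean of n scores, each in [-2, 2].
standard_error_bound(n) = 2 / sqrt(n)

equilibrium_value = -1 / 18   # first player's expected payoff at equilibrium

for n in (10, 100, 1000, 10000)
    println("n = ", n, "  standard error <= ", round(standard_error_bound(n), digits=3))
end
println("equilibrium value = ", round(equilibrium_value, digits=4))
```

With 10,000 rounds the bound is about 0.02, so an average of -0.0402 is compatible with a small first-player disadvantage but does not pin its size down precisely.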

#### The next stage is to take the history of moves into consideration and see how it affects the policies
