# RPS - Rock Paper Scissors Agent - using PPL

In this notebook I show an experiment that simulates a game of Rock Paper Scissors (RPS).
I will use two players:
<ol>
    <li>Simple Player: plays according to a Categorical distribution seeded by a Dirichlet alpha vector.</li>
    <li>Inferring Player: models the opponent as a probabilistic program and, from its observed moves, tries to infer the latent probability vector behind them.<br>
        Using the inferred vector, this player tries to exploit the simple player.</li>
</ol>
    

In [16]:
import numpy as np
import pymc3 as pm

### Simple Player
The simple player builds a Categorical distribution with a Dirichlet prior, parameterized by a given alpha vector, and returns `num_of_samples` samples from this distribution.

### Smart Player

The inferring player uses MCMC inference. Each round it takes the simple player's moves as observations and uses the same probabilistic model, with a uniform Dirichlet prior (alpha = [1, 1, 1]), to infer the posterior over the opponent's action probabilities.

### RPS Model
A probabilistic program describing the decision process:<br>
    $dir \sim Dirichlet(\alpha_1, \alpha_2, \alpha_3)$<br>
    $Action \sim Categorical(dir)$
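To make the generative story concrete, here is a minimal NumPy sketch of the same model outside PyMC3. The helper name `sample_rps_actions` is hypothetical, and it assumes the encoding ROCK=0, PAPER=1, SCISSORS=2 used later in the notebook. It draws one mixed strategy $dir$ and then `n` actions from it, so all actions share a single $dir$, which is how the smart player's model treats its observations.

```python
import numpy as np

def sample_rps_actions(alpha, n, rng=None):
    """Hypothetical helper: forward-simulate dir ~ Dirichlet(alpha), Action ~ Categorical(dir)."""
    rng = np.random.default_rng() if rng is None else rng
    # One latent mixed strategy: probabilities for ROCK=0, PAPER=1, SCISSORS=2
    dir_ = rng.dirichlet(alpha)
    # n actions drawn independently from that single strategy
    actions = rng.choice(len(alpha), size=n, p=dir_)
    return dir_, actions

rng = np.random.default_rng(0)
dir_, actions = sample_rps_actions([1, 6, 1], n=10, rng=rng)
print(dir_.round(3), actions)
```

Passing an explicit `rng` makes the draw reproducible; the PyMC3 model above expresses the same two sampling steps declaratively.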
    


In [17]:
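def rps_player_model(alpha=[1, 1, 1], observed=None):
    with pm.Model() as model:
        # Latent mixed strategy: probabilities of ROCK, PAPER, SCISSORS
        dirichlet = pm.Dirichlet('dirichlet', a=alpha)
        # Actions drawn from that strategy; conditioned on `observed` when it is given
        phi = pm.Categorical('phi', p=dirichlet, observed=observed)
    return model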

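### Posterior Inference
We use PyMC3's Metropolis-Hastings sampler (`pm.Metropolis`), chosen here because the model contains a discrete distribution. Strictly speaking, only the observed action is discrete; the latent Dirichlet vector is continuous, so PyMC3's default NUTS sampler could also be used.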
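Because the Dirichlet prior is conjugate to the Categorical likelihood, this model also has a closed-form posterior: if action $k$ was observed $n_k$ times, then $dir \mid observations \sim Dirichlet(\alpha_1 + n_1, \alpha_2 + n_2, \alpha_3 + n_3)$. The sketch below (NumPy only; `conjugate_posterior` is a hypothetical helper, not part of the notebook) computes it and could serve as a sanity check on the Metropolis trace.

```python
import numpy as np

def conjugate_posterior(observed, alpha=(1, 1, 1)):
    """Hypothetical helper: exact Dirichlet posterior for the Dirichlet-Categorical model."""
    alpha = np.asarray(alpha, dtype=float)
    # Count how often each action (0=ROCK, 1=PAPER, 2=SCISSORS) was observed
    counts = np.bincount(np.asarray(observed, dtype=int), minlength=len(alpha))
    posterior_alpha = alpha + counts
    # Posterior mean of each action probability
    posterior_mean = posterior_alpha / posterior_alpha.sum()
    return posterior_alpha, posterior_mean

post_alpha, post_mean = conjugate_posterior([1, 1, 2, 1, 0, 1, 1, 2, 1, 1])
print(post_alpha, post_mean.round(3))
```

With the uniform prior used by the smart player, seven Paper observations out of ten give a posterior mean for Paper of 8/13 ≈ 0.615.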

In [32]:
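def infer(model):
    with model:
        # Metropolis-Hastings draws from the posterior of the latent Dirichlet vector;
        # returns an ArviZ InferenceData object
        trace = pm.sample(step=pm.Metropolis(), return_inferencedata=True, progressbar=False)
    return trace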

### Predictive Posterior Sampling
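Sample from the posterior predictive distribution and summarize the predicted opponent action at each position; the code below takes the median over all draws.<br>
The smart player uses these predictions to choose its moves.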

In [33]:
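def sample_from_posterior(model, trace):
    with model:
        # Posterior predictive draws of 'phi', shape (num_draws, num_observations)
        posterior_pred = pm.sample_posterior_predictive(trace, progressbar=False)
    # Summarize each position by the median over all draws
    median_over_samples = np.median(posterior_pred['phi'], axis=0)
    return median_over_samples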
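`sample_from_posterior` summarizes the draws with `np.median`. For three unordered categories the median is not the most common action, and with an even number of draws it can fall halfway between two actions (e.g. 0.5). If the intent is to pick the most common predicted action, a mode-based variant could look like the sketch below. It assumes the draws form a 2-D integer array of shape (draws, positions) with values in {0, 1, 2}; the helper name `mode_over_samples` is hypothetical.

```python
import numpy as np

def mode_over_samples(draws, num_actions=3):
    """Hypothetical helper: most common action in each column of a (draws, positions) array."""
    draws = np.asarray(draws, dtype=int)
    # Count occurrences of each action per column: result has shape (num_actions, positions)
    counts = np.apply_along_axis(np.bincount, 0, draws, minlength=num_actions)
    # Most frequent action per column; ties go to the lower action index
    return counts.argmax(axis=0)

draws = np.array([[0, 2], [1, 2], [1, 0], [2, 2]])
print(mode_over_samples(draws))
```

Replacing `np.median(posterior_pred['phi'], axis=0)` with `mode_over_samples(posterior_pred['phi'])` would make the smart player respond to the opponent's most likely move.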

### Sampling from the model without observations
The simple player uses this sampling.<br> Given its alpha vector, it draws samples from the prior predictive distribution.

In [20]:
def sample_from_prior(model, num_of_samples):
    with model:
        # Draws of 'phi' from the prior predictive (no observations involved)
        samples = pm.sample_prior_predictive(samples=num_of_samples)['phi']
    return samples
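Marginalizing out $dir$ gives the probability of each individual move of the simple player, namely the Dirichlet mean:

$$P(Action = k) = E[dir_k] = \frac{\alpha_k}{\alpha_1 + \alpha_2 + \alpha_3}, \qquad k = 1, 2, 3 \ \text{(Rock, Paper, Scissors)}$$

If, as assumed here, each prior predictive sample draws a fresh $dir$, the simple player's moves are independent draws from this Categorical distribution. For example, alpha = [1, 6, 1] gives move probabilities (0.125, 0.75, 0.125).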

### Simulation
In the experiment we let the two players play against each other and examine the results.

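### Some auxiliary functions for RPS
Rock Paper Scissors is a popular game with 3 actions (Rock, Paper, Scissors): each action beats exactly one other action and loses to exactly one other action.
With the encoding ROCK=0, PAPER=1, SCISSORS=2, the action that beats action `i` is `(i + 1) % 3`.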

In [21]:
def beats(i):
    # The action that beats i, given ROCK=0, PAPER=1, SCISSORS=2
    return (i + 1) % 3

In [22]:
from enum import IntEnum

class RPS(IntEnum):
    ROCK = 0
    PAPER = 1
    SCISSORS = 2
    
def get_result(first_player, second_player):
    if first_player == second_player:
        return 0
    elif (first_player == RPS.ROCK and second_player == RPS.SCISSORS) or (
            first_player == RPS.PAPER and second_player == RPS.ROCK) or (
            first_player == RPS.SCISSORS and second_player == RPS.PAPER):
        return 1
    else:
        return -1
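The rule in `get_result` can also be written with modular arithmetic: with ROCK=0, PAPER=1, SCISSORS=2, the difference $d = (a - b) \bmod 3$ is 0 for a tie, 1 when the first player wins, and 2 when it loses, so $((d + 1) \bmod 3) - 1$ maps it to 0, 1, -1. The sketch below prints the resulting payoff table for all nine pairs, which can be compared with the explicit rules above. `get_result_mod` is a hypothetical name, and the block redefines `RPS` and `beats` only so that it runs on its own.

```python
from enum import IntEnum

class RPS(IntEnum):
    ROCK = 0
    PAPER = 1
    SCISSORS = 2

def beats(i):
    # The action that beats i under ROCK=0, PAPER=1, SCISSORS=2
    return (i + 1) % 3

def get_result_mod(first_player, second_player):
    # Hypothetical compact equivalent of get_result:
    # (first - second) mod 3 is 0 for a tie, 1 for a win, 2 for a loss,
    # and adding 1 before the mod, then subtracting 1, maps 0 -> 0, 1 -> 1, 2 -> -1.
    return (first_player - second_player + 1) % 3 - 1

# Payoff table for the first player (rows) against the second player (columns)
for a in RPS:
    print(a.name.ljust(8), [get_result_mod(a, b) for b in RPS])
```

Since `beats(i)` returns `(i + 1) % 3`, `get_result_mod(beats(i), i)` is 1 for every action `i`, which is exactly the property the smart player relies on.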

### Simulator
We run `num_of_simulations` simulations.<br>
In each simulation the simple player draws actions from the Dirichlet-Categorical distribution parameterized by its alpha vector.<br>
The smart player runs inference on the simple player's observed moves from the previous round and proposes `num_of_samples` actions.<br> The simple player draws the same number of actions from its fixed distribution, and the simulator compares the two sequences move by move and updates the win, loss, and tie counts.<br>
At the end we report the average number of wins, losses, and ties per simulation.<br>
We check whether the smart player is really "smarter" than the simple player.

In [34]:
def simulate_with_latent_alpha(num_of_simulations=10, alpha=[1, 1, 1], num_of_samples=10):
    total_smart_player_wins = 0
    total_simple_player_wins = 0
    total_ties = 0

    # The simple player's model is fixed for the whole experiment
    simple_player = rps_player_model(alpha=alpha)
    # Initial observations of the simple player, used before the first round
    simple_player_observations = sample_from_prior(simple_player, num_of_samples=num_of_samples)

    for i in range(num_of_simulations):
        # Learning phase: infer the opponent's action probabilities from the observed moves
        smart_player = rps_player_model(observed=simple_player_observations)
        trace = infer(smart_player)

        # Predict the opponent's moves and answer each with the action that beats it
        smart_player_next_moves = sample_from_posterior(smart_player, trace)
        smart_player_next_moves = list(map(beats, smart_player_next_moves))

        # Evaluation phase: the simple player draws new moves from its fixed distribution
        simple_player_next_moves = sample_from_prior(simple_player, num_of_samples=num_of_samples)

        smart_player_wins = 0
        simple_player_wins = 0
        ties = 0

        for smart_move, simple_move in zip(smart_player_next_moves, simple_player_next_moves):
            result = get_result(smart_move, simple_move)
            if result > 0:
                smart_player_wins += 1
            elif result < 0:
                simple_player_wins += 1
            else:
                ties += 1
        total_smart_player_wins += smart_player_wins
        total_simple_player_wins += simple_player_wins
        total_ties += ties
        print(f'in simulation {i}: wins: {smart_player_wins}, loses: {simple_player_wins}, ties: {ties}')

        # The moves just played become the observations for the next round
        simple_player_observations = simple_player_next_moves
    print(
        f'For opponent\'s alpha vector: {alpha} averages in all simulations is wins: {total_smart_player_wins / num_of_simulations} '
        f' loses:{total_simple_player_wins / num_of_simulations} ties:{total_ties / num_of_simulations}')

### Experiments
I check the results for different alpha vectors used by the simple player.
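As a rough benchmark, one can compute what an idealized smart player would expect if it knew the opponent's marginal strategy $\alpha_k / (\alpha_1 + \alpha_2 + \alpha_3)$ exactly and always played the action that beats the opponent's most likely move. The helper `expected_outcomes` below is hypothetical, not part of the notebook; it ignores the estimation error from only 10 observations and the median summary used by `sample_from_posterior`, so the simulated averages should only be expected to land in the same ballpark.

```python
import numpy as np

def expected_outcomes(alpha, num_moves=10):
    """Hypothetical benchmark: expected wins/losses/ties of a best response to a known strategy."""
    # Marginal probabilities of ROCK=0, PAPER=1, SCISSORS=2 (the Dirichlet mean)
    p = np.asarray(alpha, dtype=float) / np.sum(alpha)
    # Opponent's most likely move (ties broken toward the lower index)
    predicted = int(np.argmax(p))
    my_move = (predicted + 1) % 3    # the action that beats `predicted`
    win = p[(my_move - 1) % 3]       # opponent plays the move that my_move beats
    loss = p[(my_move + 1) % 3]      # opponent plays the move that beats my_move
    tie = p[my_move]
    return num_moves * win, num_moves * loss, num_moves * tie

for alpha in ([1, 6, 1], [1, 3, 5], [1, 1, 1]):
    wins, losses, ties = expected_outcomes(alpha)
    print(alpha, f'wins: {wins:.2f}, losses: {losses:.2f}, ties: {ties:.2f}')
```

Under these assumptions the benchmark is 7.5 expected wins per 10 moves for alpha = [1, 6, 1], about 5.6 for [1, 3, 5], and about 3.3 for [1, 1, 1], which can be compared with the averages reported below.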

alpha = [1, 10, 10] (playing less Rock)

In [None]:
simulate_with_latent_alpha(alpha=[1, 10, 10])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 1, loses: 7, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]


In [None]:
simulate_with_latent_alpha(alpha=[10, 6, 1])

In [29]:
simulate_with_latent_alpha(alpha=[1, 6, 1])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 7, loses: 1, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 1: wins: 10, loses: 0, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 2: wins: 5, loses: 4, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 3: wins: 8, loses: 2, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 4: wins: 8, loses: 0, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 5: wins: 7, loses: 3, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 6: wins: 8, loses: 0, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 7: wins: 4, loses: 2, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 8: wins: 7, loses: 2, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 19 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 9: wins: 4, loses: 2, ties: 4
For opponent's alpha vector: [1, 6, 1] averages in all simulations is wins: 6.8  loses:1.6 ties:1.6


In [30]:
simulate_with_latent_alpha(alpha=[1, 3, 5])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 0: wins: 2, loses: 0, ties: 8


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 1: wins: 2, loses: 2, ties: 6


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 2: wins: 7, loses: 2, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 3: wins: 5, loses: 1, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 4: wins: 6, loses: 4, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 5: wins: 9, loses: 1, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 6: wins: 5, loses: 0, ties: 5


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 7: wins: 4, loses: 5, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 8: wins: 4, loses: 6, ties: 0


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 9: wins: 5, loses: 0, ties: 5
For opponent's alpha vector: [1, 3, 5] averages in all simulations is wins: 4.9  loses:2.1 ties:3.0


In [31]:
simulate_with_latent_alpha(alpha=[1, 1, 1])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 2, loses: 1, ties: 7


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 1: wins: 5, loses: 3, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 2: wins: 2, loses: 4, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 3: wins: 3, loses: 5, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 4: wins: 2, loses: 2, ties: 6


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 20 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 5: wins: 6, loses: 2, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 6: wins: 3, loses: 3, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 26 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 7: wins: 3, loses: 4, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 8: wins: 3, loses: 4, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 21 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 9: wins: 2, loses: 5, ties: 3
For opponent's alpha vector: [1, 1, 1] averages in all simulations is wins: 3.1  loses:3.3 ties:3.6


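### Summary
The experiments show that the "smart" player is able to exploit the simple opponent.<br> The further the opponent's strategy is from uniformly random, the better it can be exploited: against alpha = [1, 6, 1] the smart player averaged 6.8 wins per 10 moves, and against alpha = [1, 3, 5] it averaged 4.9.<br> When the opponent plays uniformly at random (alpha = [1, 1, 1]), no strategy can gain an edge in expectation, and the results are roughly even (3.1 wins, 3.3 losses, 3.6 ties on average).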
