# RPS - Rock Paper Scissors Agent - using PPL

In this notebook I show an experiment that simulates the game of Rock Paper Scissors (RPS).
The simulation uses two players:
<ol>
    <li>Simple Player - plays moves drawn from a Categorical distribution whose probabilities come from a Dirichlet prior with a given alpha vector.</li>
    <li>Inferring Player - models the opponent as a probabilistic program and uses the observed moves to infer the latent alpha vector.<br>
        Using the inferred vector, the player tries to exploit the simple player.</li>
</ol>
    

In [1]:
import numpy as np
import pymc3 as pm

### Simple Player
The simple player builds a Categorical distribution with a Dirichlet prior parameterized by a given alpha vector, and returns `num_of_samples` samples from this distribution.

In [7]:
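def simple_player_model(num_of_samples=5, alpha=[1, 1, 1]):
    """Sample moves (0, 1, 2) from a Categorical with a Dirichlet(alpha) prior."""
    with pm.Model():
        # Prior over the move probabilities
        dirichlet = pm.Dirichlet('dirichlet', a=alpha)
        # Moves are drawn from a Categorical with those probabilities
        phi = pm.Categorical('phi', p=dirichlet)
        return phi.random(size=num_of_samples)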
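As a cross-check on the model above, the same generative process can be sketched with NumPy alone. This is an illustration under stated assumptions, not the notebook's implementation: the helper `sample_simple_player` and its `seed` parameter are hypothetical, and the sketch assumes one probability vector is drawn per call and shared by every move in that call.

```python
import numpy as np

def sample_simple_player(num_of_samples=5, alpha=(1, 1, 1), seed=None):
    """Draw moves from a Dirichlet-Categorical model using NumPy only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Latent move probabilities: p ~ Dirichlet(alpha)
    p = rng.dirichlet(alpha)
    # Each move ~ Categorical(p), encoded as 0 (rock), 1 (paper), 2 (scissors)
    return rng.choice(len(alpha), size=num_of_samples, p=p)

moves = sample_simple_player(num_of_samples=10, alpha=(1, 10, 10), seed=0)
print(moves)
```

A small first entry in `alpha` makes index 0 (rock) rare on average, which is the behaviour the experiments later rely on.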

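### Smart Player
The smart player places a uniform Dirichlet prior on the opponent's move probabilities, conditions a Categorical likelihood on the observed moves, samples the posterior with Metropolis, and returns the median of the posterior predictive draws.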

In [42]:
def smart_player_model(observed=None):
    """Infer the opponent's move distribution from observed moves and predict its moves."""
    with pm.Model():
        # Uniform Dirichlet prior over the opponent's move probabilities
        alpha = [1, 1, 1]
        dirichlet = pm.Dirichlet('dirichlet', a=alpha)
        # Likelihood: the opponent's observed moves
        phi = pm.Categorical('phi', p=dirichlet, observed=observed)
        posterior = pm.sample(step=pm.Metropolis(), progressbar=False, return_inferencedata=True)
        posterior_pred = pm.sample_posterior_predictive(posterior, progressbar=False)

        # Summarize the predictive draws by their median, one value per observed position
        median_over_samples = np.median(posterior_pred['phi'], axis=0)
        return median_over_samples
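Because the Dirichlet prior is conjugate to the Categorical likelihood, the posterior in this model is also available in closed form: with prior `Dirichlet(alpha)` and observed move counts `c`, the posterior is `Dirichlet(alpha + c)`. The sketch below uses that fact as an alternative to the MCMC run above; it is not the notebook's method, and the names `conjugate_posterior` and `best_response` are hypothetical. It responds with the move that beats the opponent's most probable move.

```python
import numpy as np

def conjugate_posterior(observed, prior_alpha=(1, 1, 1)):
    """Return the Dirichlet posterior parameters after seeing `observed` moves."""
    prior_alpha = np.asarray(prior_alpha, dtype=float)
    # Count how often each move (0, 1, 2) was observed
    counts = np.bincount(np.asarray(observed, dtype=int), minlength=len(prior_alpha))
    return prior_alpha + counts

def best_response(observed, prior_alpha=(1, 1, 1)):
    """Pick the move that beats the opponent's most probable move."""
    posterior_alpha = conjugate_posterior(observed, prior_alpha)
    # Posterior mean of the move probabilities
    posterior_mean = posterior_alpha / posterior_alpha.sum()
    likely_move = int(np.argmax(posterior_mean))
    # (i + 1) % 3 is the move that beats move i
    return (likely_move + 1) % 3, posterior_mean

history = [1, 2, 2, 1, 2, 2, 0, 2, 1, 2]  # made-up observations for illustration
move, mean = best_response(history)
print(move, mean)
```

Taking the argmax of the posterior mean always yields a valid move index, whereas a median over categorical draws may fall between two move codes.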

### Simulation
In the experiment we let these two players play against each other and examine the results.

### Some auxiliary functions for RPS

In [43]:
def beats(i):
    """Return the move that beats move i (0=rock, 1=paper, 2=scissors)."""
    return (i + 1) % 3

In [44]:
from enum import IntEnum

class RPS(IntEnum):
    ROCK = 0
    PAPER = 1
    SCISSORS = 2
    
def get_result(first_player, second_player):
    if first_player == second_player:
        return 0
    elif (first_player == RPS.ROCK and second_player == RPS.SCISSORS) or (
            first_player == RPS.PAPER and second_player == RPS.ROCK) or (
            first_player == RPS.SCISSORS and second_player == RPS.PAPER):
        return 1
    else:
        return -1
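Since the moves are encoded as 0, 1, 2 and each move is beaten by the next one in cyclic order, the outcome can also be read off the difference of the two moves modulo 3. The sketch below shows that equivalence; `get_result_mod` is a hypothetical name, and the enum and `beats` are repeated only to keep the block self-contained.

```python
from enum import IntEnum

class RPS(IntEnum):
    ROCK = 0
    PAPER = 1
    SCISSORS = 2

def beats(i):
    """Return the move that beats move i."""
    return (i + 1) % 3

def get_result_mod(first_player, second_player):
    """Return 1 if the first player wins, -1 if the second wins, 0 on a tie."""
    diff = (int(first_player) - int(second_player)) % 3
    # diff == 0: same move; diff == 1: first is one step ahead and wins; diff == 2: first loses
    return (0, 1, -1)[diff]

# Every move loses to beats(move), and beats(move) wins against it
for m in RPS:
    print(m.name, get_result_mod(beats(m), m), get_result_mod(m, beats(m)))
```

This form only works for integer move codes, so any estimate fed into it should be rounded or cast to a valid move first.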

### Simulator

In [None]:
def simulate_with_latent_alpha(num_of_simulations=5, alpha=[1, 1, 1]):
    total_smart_player_wins = 0
    total_simple_player_wins = 0
    total_ties = 0
    for i in range(num_of_simulations):
        simple_player_history = simple_player_model(num_of_samples=10, alpha=alpha)

        # gets a list of observed values and returns the distribution of probable action
        smart_player_estimates_simple_player_moves = smart_player_model(observed=simple_player_history)

        smart_player_next_moves = list(map(beats, smart_player_estimates_simple_player_moves))

        # Evaluation phase
        simple_player_next_moves = simple_player_model(len(smart_player_next_moves), alpha=alpha)

        smart_player_wins = 0
        simple_player_wins = 0
        ties = 0

        for j in range(len(simple_player_next_moves)):
            result = get_result(smart_player_next_moves[j], simple_player_next_moves[j])
            if result > 0:
                smart_player_wins += 1
            elif result < 0:
                simple_player_wins += 1
            else:
                ties += 1
        total_smart_player_wins += smart_player_wins
        total_simple_player_wins += simple_player_wins
        total_ties += ties
        print(f'in simulation {i}: wins: {smart_player_wins}, loses: {simple_player_wins}, ties: {ties}')
    print(
        f'For opponent\'s alpha vector: {alpha} averages in all simulations is wins: {total_smart_player_wins / num_of_simulations} '
        f' loses:{total_simple_player_wins / num_of_simulations} ties:{total_ties / num_of_simulations}')

### Experiments
I will check the results for different alpha vectors played by the simple player.
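As a reference point for reading the results, one can compute what a player who always plays a single fixed move would expect against each alpha vector. Under a Dirichlet(alpha) prior, the marginal probability of each opponent move is `alpha / sum(alpha)`. The sketch below is an assumption-laden baseline, not part of the simulation: `fixed_move_baseline` is a hypothetical helper, and it assumes 10 evaluation rounds per simulation, as in the simulator above.

```python
import numpy as np

def fixed_move_baseline(alpha, rounds=10):
    """Best fixed move against the marginal move distribution (hypothetical helper).

    Returns (move, expected_wins, expected_losses, expected_ties) over `rounds` games.
    """
    alpha = np.asarray(alpha, dtype=float)
    p = alpha / alpha.sum()  # marginal probability of each opponent move
    rows = []
    for move in range(3):
        win = p[(move - 1) % 3]   # opponent plays the move that `move` beats
        lose = p[(move + 1) % 3]  # opponent plays the move that beats `move`
        tie = p[move]
        rows.append((move, rounds * win, rounds * lose, rounds * tie))
    # Choose the move with the largest expected wins minus losses
    return max(rows, key=lambda r: r[1] - r[2])

for alpha in ([1, 10, 10], [10, 6, 1], [1, 6, 1], [1, 3, 5], [1, 1, 1]):
    move, wins, losses, ties = fixed_move_baseline(alpha)
    print(f'alpha={alpha}: move={move}, wins={wins:.2f}, losses={losses:.2f}, ties={ties:.2f}')
```

For a symmetric alpha such as `[1, 1, 1]` every fixed move has the same expected score, so no strategy can be expected to gain an edge there.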

alpha = [1, 10, 10] (the simple player rarely plays Rock)

In [49]:
simulate_with_latent_alpha(alpha=[1, 10, 10])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 1, loses: 6, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 28 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 1: wins: 0, loses: 4, ties: 6


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 2: wins: 1, loses: 3, ties: 6


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 3: wins: 0, loses: 7, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 4: wins: 0, loses: 4, ties: 6
For opponent's alpha vector: [1, 10, 10] averages in all simulations is wins: 4.8  loses:0.4 ties:4.8


In [50]:
simulate_with_latent_alpha(alpha=[10, 6, 1])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 28 seconds.
The estimated number of effective samples is smaller than 200 for some parameters.


in simulation 0: wins: 1, loses: 6, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 1: wins: 1, loses: 8, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 2: wins: 4, loses: 4, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 3: wins: 0, loses: 9, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 4: wins: 2, loses: 5, ties: 3
For opponent's alpha vector: [10, 6, 1] averages in all simulations is wins: 6.4  loses:1.6 ties:2.0


In [51]:
simulate_with_latent_alpha(alpha=[1, 6, 1])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 0: wins: 2, loses: 7, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 1: wins: 0, loses: 9, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 22 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 2: wins: 1, loses: 6, ties: 3


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 3: wins: 0, loses: 8, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 4: wins: 2, loses: 7, ties: 1
For opponent's alpha vector: [1, 6, 1] averages in all simulations is wins: 7.4  loses:1.0 ties:1.6


In [52]:
simulate_with_latent_alpha(alpha=[1, 3, 5])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 2, loses: 2, ties: 6


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 1: wins: 1, loses: 7, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 2: wins: 2, loses: 7, ties: 1


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 26 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 3: wins: 2, loses: 6, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 4: wins: 3, loses: 7, ties: 0
For opponent's alpha vector: [1, 3, 5] averages in all simulations is wins: 5.8  loses:2.0 ties:2.2


In [53]:
simulate_with_latent_alpha(alpha=[1, 1, 1])

Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 10% for some parameters.


in simulation 0: wins: 4, loses: 2, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 24 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 1: wins: 3, loses: 3, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 2: wins: 3, loses: 3, ties: 4


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 3: wins: 3, loses: 5, ties: 2


Multiprocess sampling (4 chains in 4 jobs)
Metropolis: [dirichlet]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
The number of effective samples is smaller than 25% for some parameters.


in simulation 4: wins: 4, loses: 4, ties: 2
For opponent's alpha vector: [1, 1, 1] averages in all simulations is wins: 3.4  loses:3.4 ties:3.2
