# RPS - Rock Paper Scissors Agent - using PPL

In this notebook I show an experiment that simulates the Rock Paper Scissors (RPS) game.
I use two players:
<ol>
    <li>Simple Player - plays according to a Categorical distribution seeded by an alpha vector</li>
    <li>Inferring Player - models the opponent as a Probabilistic Program and uses observations to infer the latent alpha vector.<br>
        Using this inferred vector, the player tries to exploit the simple player.</li>
</ol>
    

In [1]:
import numpy as np
import scipy as sp
import scipy.stats  # make sure sp.stats is loaded; a bare `import scipy` may not load it
import pymc3 as pm
import logging
logger = logging.getLogger('pymc3')
logger.setLevel(logging.ERROR)

### Simple Player
The simple player builds a categorical distribution (with a Dirichlet prior) from a given alpha vector and returns `num_of_samples` samples from this distribution.
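
The sampling process described above can be sketched with NumPy alone. This is a minimal illustration, not the notebook's implementation: the function name `sample_simple_player`, the `rng` argument, and the seed are assumptions made for the example.

```python
import numpy as np

def sample_simple_player(alpha, num_of_samples, rng):
    """Draw move probabilities from Dirichlet(alpha), then sample moves from them."""
    probs = rng.dirichlet(alpha)  # latent move probabilities, sums to 1
    # 0 = rock, 1 = paper, 2 = scissors
    moves = rng.choice(3, size=num_of_samples, p=probs)
    return probs, moves

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
probs, moves = sample_simple_player([5, 1, 1], 10, rng)
print(probs, moves)
```

A larger first entry in `alpha` makes rock more likely on average, which is the kind of bias the inferring player tries to detect.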

### Smart Player

The inferring player uses the MCMC inference toolset. Each round, this player takes the simple player's moves as observations and uses the same probabilistic model for posterior inference.

### RPS Model
A probabilistic program that describes the decision process.<br>
    $dir \sim Dirichlet(\alpha_1, \alpha_2, \alpha_3)$<br>
    $Action \sim Categorical(dir)$
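
Because the Dirichlet prior is conjugate to the Categorical likelihood, the posterior for this model has a closed form: after observing $n_k$ plays of move $k$, it is $Dirichlet(\alpha_1 + n_1, \alpha_2 + n_2, \alpha_3 + n_3)$. The sketch below computes it directly, which can serve as a sanity check on the MCMC results. The helper name `dirichlet_posterior` and the observed moves are made up for illustration.

```python
import numpy as np

def dirichlet_posterior(alpha, observed_moves):
    """Return the posterior Dirichlet parameters: prior alpha plus move counts."""
    counts = np.bincount(np.asarray(observed_moves, dtype=int), minlength=3)
    return np.asarray(alpha, dtype=float) + counts

# Hypothetical observations: rock four times, paper once, scissors once.
post_alpha = dirichlet_posterior([1, 1, 1], [0, 0, 1, 0, 2, 0])
post_mean = post_alpha / post_alpha.sum()  # posterior predictive probabilities
print(post_alpha)  # [5. 2. 2.]
print(post_mean)
```

With the default prior of ones, the posterior mean equals the normalized smoothed counts.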
    


In [2]:
def rps_player_model(alpha=[1, 1, 1], observed=None):
    with pm.Model() as model:
        dirichlet = pm.Dirichlet('dirichlet', a=alpha)
        phi = pm.Categorical('phi', p=dirichlet, observed=observed)
        return model

In [3]:
def beats(i):
    """
    Maps a move to the move that beats it,
    i.e. beats(ROCK) == PAPER
    """
    return (i + 1) % 3

## The hierarchy

### Base class of all players

In [4]:
class Player:
    def __init__(self, id):
        self.id = id

    def move(self, history):
        raise NotImplementedError()

### Naive player

In [5]:
class NaivePlayer(Player):
    """Naive player chooses a move according to fixed probabilities.
    """
    def __init__(self, id, p=[1, 1, 1]):
        Player.__init__(self, id)
        p = np.array(p)
        p = p/sum(p)
        self.p = p
    
    def move(self, history):
        return np.argmax(sp.stats.multinomial.rvs(1, self.p))
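
Taking the `argmax` of a single multinomial draw is one way to sample a categorical index. As an alternative sketch, the same kind of draw can be made with NumPy's `Generator.choice`. The function `draw_move` and the seed are assumptions for this example, not part of the notebook.

```python
import numpy as np

def draw_move(p, rng):
    """Sample one move index with probability proportional to the weights p."""
    p = np.asarray(p, dtype=float)
    return int(rng.choice(len(p), p=p / p.sum()))

rng = np.random.default_rng(42)  # arbitrary seed
moves = [draw_move([5, 1, 1], rng) for _ in range(1000)]
freq = np.bincount(moves, minlength=3) / len(moves)
print(freq)  # empirical frequencies; rock should be near 5/7
```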

### Frequentist Players

In [6]:
class FrequentistPlayer(Player):
    """Frequentist player uses prior history 
    to choose a move
    """
    def __init__(self, id, counts=None):
        Player.__init__(self, id)
        if counts is None:
            counts = [1, 1, 1]
        self.counts = counts
    
    def stats(self, history):
        counts = self.counts[:]
        for id, m in history:
            if id != self.id:
                counts[m] += 1
        return np.array(counts)

In [7]:
class FixedFrequentistPlayer(FrequentistPlayer):
    def __init__(self, id, counts=None):
        FrequentistPlayer.__init__(self, id, counts)
        
    def move(self, history):
        counts = self.stats(history)
        return beats(np.argmax(counts))
    
# Example
# ffp = FixedFrequentistPlayer(1)
# print([ffp.move([(1, 1), (2, 1), (1, 0), (2, 1)]) for _ in range(10)])
# print([ffp.move([(1, 1), (2, 2), (1, 0), (2, 1), (1, 2), (2, 2)]) for _ in range(10)])
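
To make the counting rule concrete, here is a self-contained sketch of what the fixed frequentist strategy computes for the first commented example above. `best_response` is a hypothetical helper that mirrors `stats` followed by `beats`; it is not a class from the notebook.

```python
import numpy as np

def beats(i):
    """Return the move that beats move i (0=rock, 1=paper, 2=scissors)."""
    return (i + 1) % 3

def best_response(history, my_id, prior_counts=(1, 1, 1)):
    """Count the opponent's moves and answer the most frequent one."""
    counts = list(prior_counts)
    for player_id, move in history:
        if player_id != my_id:
            counts[move] += 1
    return beats(int(np.argmax(counts))), counts

move, counts = best_response([(1, 1), (2, 1), (1, 0), (2, 1)], my_id=1)
print(counts, move)  # [1, 3, 1] 2
```

The opponent (player 2) played paper twice, so the counts become `[1, 3, 1]` and the response is scissors.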

In [8]:
class RandomFrequentistPlayer(FrequentistPlayer):
    def __init__(self, id, counts=None):
        FrequentistPlayer.__init__(self, id, counts)
        
    def move(self, history):
        counts = self.stats(history)
        return beats(np.argmax(sp.stats.multinomial.rvs(n=1, p=counts/sum(counts))))
    
# Example
# rfp = RandomFrequentistPlayer(1)
# print([rfp.move([(1, 1), (2, 1), (1, 0), (2, 1)]) for _ in range(10)])
# print([rfp.move([(1, 1), (2, 2), (1, 0), (2, 1), (1, 2), (2, 2)]) for _ in range(10)])

In [9]:
class BayesianPlayer(Player):
    def __init__(self, id, alpha=None):
        Player.__init__(self, id)
        if alpha is None:
            alpha = 1, 1, 1
        self.alpha = np.array(alpha)
        
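    def opponent_model(self, history):
        # Subclasses build and return the PyMC3 model of the opponent.
        raise NotImplementedError()
    
    def select_action(self, samples):
        # Subclasses pick the predicted opponent move from the samples.
        raise NotImplementedError()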
    
    def infer(self, model):
        with model:
            trace = pm.sample(step=pm.Metropolis(), model=model, return_inferencedata=True, progressbar=False, cores=1)
            return trace
        
    def sample_from_posterior(self, model, trace, theta_var):
        with model:
            posterior_pred = pm.sample_posterior_predictive(trace, progressbar=False)
            return beats(self.select_action(posterior_pred[theta_var]))
        
    def opponent_history(self, history):
        moves_history = []
        for idx, move in history:
            if idx != self.id:
                moves_history.append(move)
        return moves_history
    
    def move(self, history):
        history = self.opponent_history(history)
        opponent_model = self.opponent_model(history)
        trace = self.infer(opponent_model)
        return self.sample_from_posterior(opponent_model, trace, self.get_theta_var())

In [10]:
class CategoricalBaysianPlayer(BayesianPlayer):
    def __init__(self, id, alpha = None):
        BayesianPlayer.__init__(self, id, alpha)
    
    def opponent_model(self, history):
        with pm.Model() as model:
            dirichlet = pm.Dirichlet('dirichlet', a=self.alpha)
            phi = pm.Categorical('phi', p=dirichlet, observed=history)
            return model
    
    def get_theta_var(self):
        return 'phi'

In [11]:
class FixedCategoricalBaysianPlayer(CategoricalBaysianPlayer):
    def __init__(self, id, alpha = None):
        CategoricalBaysianPlayer.__init__(self, id, alpha)
    
    def select_action(self, samples):
        samples = samples.reshape(-1)
        counts = [0, 0, 0]
        # return most common sample
        for m in samples:
            counts[m] += 1
        return np.argmax(np.array(counts))
        
# Example
# fcbp = FixedCategoricalBaysianPlayer(1)
# print([fcbp.move([(1, 1), (2, 1), (1, 0), (2, 1)]) for _ in range(10)])
# print([fcbp.move([(1, 1), (2, 2), (1, 0), (2, 1), (1, 2), (2, 2)]) for _ in range(10)])
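
The tally inside `select_action` can also be written with `np.bincount`. The sketch below uses a small made-up array in place of real posterior predictive draws, so the values are illustrative only.

```python
import numpy as np

# Made-up stand-in for posterior predictive draws of the opponent's move.
samples = np.array([[0, 1, 0], [2, 0, 0]])

# minlength=3 keeps a slot for every move even if one never appears.
counts = np.bincount(samples.reshape(-1), minlength=3)
most_common = int(np.argmax(counts))
print(counts, most_common)  # [4 1 1] 0
```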

In [12]:
class RandomCategoricalBaysianPlayer(CategoricalBaysianPlayer):
    def __init__(self, id, alpha = None):
        CategoricalBaysianPlayer.__init__(self, id, alpha)
    
    def select_action(self, samples):
        samples = samples.reshape(-1)
        counts = [0, 0, 0]
        # sample a move with probability proportional to its count
        for m in samples:
            counts[m] += 1
        counts = np.array(counts)
        return np.argmax(sp.stats.multinomial.rvs(n=1, p=counts/sum(counts)))
        
# Example
# rcbp = RandomCategoricalBaysianPlayer(1)
# print([rcbp.move([(1, 1), (2, 1), (1, 0), (2, 1)]) for _ in range(10)])
# print([rcbp.move([(1, 1), (2, 2), (1, 0), (2, 1), (1, 2), (2, 2)]) for _ in range(10)])

In [19]:
ROCK = 0
PAPER = 1
SCISSORS = 2
def score(m1, m2):
    # ROCK < PAPER < SCISSORS < ROCK
    if m1==m2:
        return 0
    score = -1
    if m1 > m2:
        m1, m2 = m2, m1
        score = -score
    if m2 - m1 == 2:
        score = -score
    return score

def summerize_score(scores):
    first_wins = scores.count(1)
    ties = scores.count(0)
    second_wins = scores.count(-1)
    return first_wins, ties, second_wins

def game(player1, player2, n=20):
    # Play n rounds. Both players choose before either move is recorded,
    # so neither player sees the other's move for the current round.
    history = []
    scores = []
    for i in range(n):
        m1 = player1.move(history)
        m2 = player2.move(history)
        history.append([1, m1])
        history.append([2, m2])
        scores.append(score(m1, m2))
    return scores

def play_game_and_print_summery(player1, player2):
    results = game(player1, player2)
    print(results)
    first_wins, ties, second_wins = summerize_score(results)
    print(f'player1 wins: {first_wins} ties: {ties} player2 wins: {second_wins}')
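
The branching in `score` can be cross-checked against a modular-arithmetic formulation: player 1 wins exactly when `m1 == (m2 + 1) % 3`. The function `score_mod` below is a hypothetical alternative written for this check, not a replacement taken from the notebook.

```python
def score_mod(m1, m2):
    """Return 1 if m1 beats m2, -1 if m2 beats m1, and 0 on a tie."""
    d = (m1 - m2) % 3  # 0: tie, 1: m1 wins, 2: m2 wins
    return (0, 1, -1)[d]

# Rows are player 1's move, columns are player 2's move (0=rock, 1=paper, 2=scissors).
table = [[score_mod(a, b) for b in range(3)] for a in range(3)]
print(table)  # [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
```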

### Game between random frequentist player and naive player

In [20]:
rfp = RandomFrequentistPlayer(1)
nap = NaivePlayer(2)
play_game_and_print_summery(rfp,nap)

[1, -1, -1, -1, -1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, -1, -1, -1, 1, 1]
player1 wins: 6 ties: 7 player2 wins: 7


### Game between categorical bayesian player and naive player

In [21]:
rcbp = RandomCategoricalBaysianPlayer(1)
nap = NaivePlayer(2)
play_game_and_print_summery(rcbp,nap)

The estimated number of effective samples is smaller than 200 for some parameters.


[0, 1, -1, 0, -1, 0, 0, 0, -1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
player1 wins: 9 ties: 8 player2 wins: 3


### Game between random frequentist player and categorical Bayesian player

In [22]:
rfp = RandomFrequentistPlayer(1)
rcbp = RandomCategoricalBaysianPlayer(2)
play_game_and_print_summery(rfp,rcbp)

The estimated number of effective samples is smaller than 200 for some parameters.


[0, 0, -1, 1, 0, 1, 0, 0, -1, 1, -1, 1, 0, 0, -1, 1, 1, -1, -1, 1]
player1 wins: 7 ties: 7 player2 wins: 6


### Game between random frequentist player and fixed frequentist player

In [23]:
rfp = RandomFrequentistPlayer(1)
ffp = FixedFrequentistPlayer(2)
play_game_and_print_summery(rfp,ffp)

[-1, -1, 1, 0, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, 1]
player1 wins: 8 ties: 1 player2 wins: 11


### Game between fixed Bayesian player and naive player

In [24]:
fcbp = FixedCategoricalBaysianPlayer(1)
nap = NaivePlayer(2)
play_game_and_print_summery(fcbp,nap)

The estimated number of effective samples is smaller than 200 for some parameters.


[1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, -1, -1, 0, 1, 1, -1, 0, 0]
player1 wins: 7 ties: 10 player2 wins: 3


### Game between fixed Bayesian player and exploitable naive player

In [25]:
fcbp = FixedCategoricalBaysianPlayer(1)
nap = NaivePlayer(2, p=[5, 1, 1])
play_game_and_print_summery(fcbp,nap)

The estimated number of effective samples is smaller than 200 for some parameters.

[1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, -1, 1, 1, 1]
player1 wins: 17 ties: 0 player2 wins: 3


### Game between fixed frequentist and exploitable naive player

In [26]:
ffp = FixedFrequentistPlayer(1)
nap = NaivePlayer(2, p=[5, 1, 1])
play_game_and_print_summery(ffp,nap)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1]
player1 wins: 15 ties: 0 player2 wins: 5


## Bayesian analysis of the results

### Summary
The experiments show that the "smart" player is able to exploit the simple opponent.<br> The farther the opponent is from a completely random strategy, the better we exploit it: against the naive player with p=[5, 1, 1], the fixed Bayesian player won 17 of 20 rounds and the fixed frequentist player won 15 of 20.<br> When the opponent plays completely at random, the smart player does the same and the results are even in expectation.
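
The link between the opponent's bias and how exploitable it is can be checked with a short expected-score calculation. This is a sketch under the assumption that the opponent's move probabilities are known and fixed; the payoff matrix follows the scoring convention used in this notebook.

```python
import numpy as np

# payoff[a, b]: score for playing a against b (0=rock, 1=paper, 2=scissors)
payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

p_opponent = np.array([5, 1, 1]) / 7   # the exploitable naive player
expected = payoff @ p_opponent         # expected score of each of our moves
best = int(np.argmax(expected))
print(expected, best)

p_uniform = np.ones(3) / 3
print(payoff @ p_uniform)              # every move has expected score 0
```

Against p=[5, 1, 1], always playing paper has an expected score of 4/7 per round, while against a uniform opponent no move has an edge.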
