# STOR609 Assessment 3 
## Replication and Reproduction Study
Group 3: Vlad Bercovici, Malcolm Connolly, Rebekah Fearnhead, Niharika Peddinenikalva


## 1 Introduction

Pig is a turn-based jeopardy dice game played over a number of rounds. On their turn a player may roll any number of times, scoring the cumulative sum of the numbers rolled, with the jeopardy condition that if the player rolls a $1$ then their score for that round is zero. The game continues until either player attains a score of at least $100$.

The aim of this package and notebook is to provide a replication study of the findings on the optimal play of the game Pig published in The UMAP Journal (Neller and Presser, 2004), that is, to provide substantial evidence that the original results can be replicated. Neller and Presser report applying Value Iteration to the game of Pig, and produce several figures illustrating an optimal policy for playing the game.

We reproduce all figures relating to the optimal policy in the original article, rigorously describe the methodology from which they are derived, and provide our original source code as an open source package.



## 2 Methodology

In this section we give a detailed account of our notation and of our application of the Value Iteration algorithm to reproduce the results of Neller and Presser's article. We also describe the instances where our approach differs from the one described in the paper.

### 2.1 Piglet

Piglet is a simplification of Pig, discussed in the paper, in which a coin is flipped instead of a die being rolled. Heads adds $1$ to the turn total, while tails scores $0$ and ends the turn, as a roll of $1$ does in Pig. This game is studied only for the case where the goal, or winning score, is $2$.

#### 2.1.1 States
Following the paper, we define the states as $3$-tuples, $(i, \ j, \ k)$, where $i$ is the current score of the player, $j$ is the current score of the opponent and $k$ is the current turn score of the player. Neller and Presser state 'winning and losing states are terminal'; however, by their own reasoning utilising the symmetry, we need only consider a single further 'winning' state $(W, \ L, \ 0)$. The states are then,

$$S = \{ (i, \ j, \ k) : i, \ j, \ k \in \{0,1\}, \  i+k <2 \} \cup \{(W, \ L, \ 0)\}.$$

#### 2.1.2 Actions

The actions $A$ in the game of Piglet are whether to flip or not, $A=\{\text{flip},\text{hold}\}$.


#### 2.1.3 Transitions

There are three types of transitions that we consider.

##### (i)
When the action is to **flip** and the outcome is **heads**, which occurs with probability $\frac{1}{2}$, the transition is given by:
$$(i, \ j, \ k) \  \mapsto \ \begin{cases} (i, \ j, \ k+1) &\text{ if }i+k+1<2 \\
(W, \ L, \ 0) &\text{ otherwise.}\end{cases}$$

##### (ii)
When the action is to **flip** and the outcome is **tails**, which occurs with probability $\frac{1}{2}$, the transition is given by:
$$(i, \ j, \ k) \mapsto (j, \  i, \ 0). $$

##### (iii)

Finally, if the action is to **hold** then with probability $1$ the transition is given by:
$$(i, \  j,  \ k) \mapsto (j, \ i+k, \ 0).$$

Transitions (ii) and (iii) utilise symmetry. After we flip tails or hold, play passes to the opponent, and the resulting state is written from the opponent's point of view. Our probability of winning from that point is the probability that the opponent does not win from it. Because the opponent is also assumed to be playing optimally, we need only consider winning from this opposing state as if we were the opponent, using our existing list of states.
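The three transition types can be sketched as a small function. This is a minimal illustration rather than the code in our package: the names `piglet_transitions`, `GOAL` and `WIN` are hypothetical, and the win check compares $i+k+1$ with the goal so that it agrees with the constraint $i+k<2$ on the state set.

```python
from fractions import Fraction

GOAL = 2                 # Piglet goal used in this section
WIN = ("W", "L", 0)      # single absorbing winning state


def piglet_transitions(state, action, goal=GOAL):
    """Return a list of (probability, next_state) pairs for one action.

    After tails or a hold, the next state is written from the
    opponent's point of view, so the two scores are swapped.
    """
    i, j, k = state
    half = Fraction(1, 2)
    if action == "flip":
        # (i) heads: the turn total grows, or the goal is reached.
        heads = (i, j, k + 1) if i + k + 1 < goal else WIN
        # (ii) tails: the turn total is lost and play passes over.
        tails = (j, i, 0)
        return [(half, heads), (half, tails)]
    if action == "hold":
        # (iii) hold: bank the turn total and pass play over.
        return [(Fraction(1), (j, i + k, 0))]
    raise ValueError(f"unknown action: {action!r}")


print(piglet_transitions((1, 0, 0), "flip"))
print(piglet_transitions((0, 1, 1), "hold"))
```

Returning the swapped state for tails and hold keeps a single state list for both players, which is the symmetry argument made above.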


#### 2.1.4 Rewards

We award a reward of $1$ for any transition from a non-winning state to the winning state, and $0$ otherwise. 

$$R_{ss'}^{a} = \begin{cases} 1 &\text{ if }s=(i, \ j, \ k), \ s'=(W, \ L, \ 0), \text{ where } a=\text{flip} \text { and }i+k<2  \\
                         0 &\text{ otherwise.}\end{cases}$$

#### 2.1.5 Iteration

As in the paper, we conflate probability of winning from a given state and the value of that state, taking $P_{(i, \ j, \ k)} = V( (i, \ j, \ k) ).$ In our implementation we directly solve the $6$ equations for $P_{s}$ with $s\in S\setminus \{(W,\ L, \ 0)\}$.

$$P_{(i, \ j, \ k)} = \max \left\{ \overbrace{\frac{1}{2}\left[\left( 1-{P}_{(j, \ i, \ 0)} \right) + P_{(i, \ j, \ k+1)}\right]}^{a = \text{flip}}, \overbrace{1- P_{(j, \ i+k, \ 0)}}^{a= \text{hold}} \right\}. $$

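Here we take $P_{(i, \ j, \ k)}=1$ whenever $i+k\geq 2$, for example $P_{(1,0,1)}=P_{(1,1,1)}=1$, since a flip of heads that brings the player's total to the goal wins the game. These are equivalent to equations (1) in the paper (pg 28, Neller and Presser, 2004).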

Initially we took the values $P_{s} = 0, \forall s\in S\setminus \{(W,\ L, \ 0)\}$, and $P_{(W,L,0)} = 1$. Initial iterations were tabulated by hand, and are displayed here for completeness.





| Iteration |$P_{(0,0,0)}$ | $P_{(0,0,1)}$ | $P_{(0,1,0)}$ | $P_{(0,1,1)}$ | $P_{(1,0,0)}$ | $P_{(1,1,0)}$ |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | $\max\left\{\frac{1}{2}, 1\right\} = 1$ | $\max\left\{1, 1\right\} = 1$ | $\max\left\{\frac{1}{2}, 1\right\} = 1$ | $\max\left\{1, 1\right\} = 1$ | $\max\left\{1, 1\right\} = 1$ | $\max\left\{1, 1\right\} = 1$ |
| 2 | $\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ |$\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ | $\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ | $\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ | $\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ | $\max\left\{\frac{1}{2}, 0\right\} = \frac{1}{2}$ |
| 3 | $\max\left\{\frac{1}{2}, \frac{1}{2}\right\} = \frac{1}{2}$ |$\max\left\{\frac{3}{4}, \frac{1}{2}\right\} = \frac{3}{4}$ | $\max\left\{\frac{1}{2}, \frac{1}{2}\right\} = \frac{1}{2}$ | $\max\left\{\frac{3}{4}, \frac{1}{2}\right\} = \frac{3}{4}$ | $\max\left\{\frac{3}{4}, \frac{1}{2}\right\} = \frac{3}{4}$ | $\max\left\{\frac{3}{4}, \frac{1}{2}\right\} = \frac{3}{4}$ |
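The hand-tabulated iterations can be checked with a short script. This is a sketch under two assumptions: every value in a sweep is computed from the previous sweep's table (a synchronous update, which is what the table rows imply), and any state with $i+k\geq 2$ is treated as won. The names `STATES`, `lookup` and `sweep` are illustrative and are not taken from our package.

```python
from fractions import Fraction

GOAL = 2
# Column order matches the table: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,1,0).
STATES = [(i, j, k)
          for i in range(GOAL) for j in range(GOAL) for k in range(GOAL)
          if i + k < GOAL]


def lookup(P, i, j, k):
    """Winning probability, treating any state with i + k >= GOAL as won."""
    return Fraction(1) if i + k >= GOAL else P[(i, j, k)]


def sweep(P):
    """One synchronous sweep: every new value uses only the old table P."""
    new = {}
    for (i, j, k) in STATES:
        flip = Fraction(1, 2) * ((1 - lookup(P, j, i, 0)) + lookup(P, i, j, k + 1))
        hold = 1 - lookup(P, j, i + k, 0)
        new[(i, j, k)] = max(flip, hold)
    return new


P = {s: Fraction(0) for s in STATES}   # initial values, all zero
history = []
for _ in range(3):
    P = sweep(P)
    history.append(P)

for n, table in enumerate(history, start=1):
    print(n, [str(table[s]) for s in STATES])
```

Using `Fraction` keeps the arithmetic exact, so the printed rows can be compared directly with the fractions in the table.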

### 2.2 Pig

Our methodology for finding the optimal policy for the game of Pig is similar to that for Piglet in Section 2.1.

#### 2.2.1 States

Again we define the states as $3$-tuples, $(i, \ j, \ k)$, where $i$ is the current score of the player, $j$ is the current score of the opponent and $k$ is the current turn score of the player, in addition to a single further 'winning' state $(W, \ L, \ 0)$. The states are then,

$$S = \{ (i, \ j, \ k) : i, \ j, \ k \in \{0,1,\ldots,99\}, \  i+k < 100 \} \cup \{(W, \ L, \ 0)\}.$$

#### 2.2.2 Actions

The actions $A$ in the game of pig answer the question of whether to roll or not to roll. That is $A=\left\{ \text{roll}, \ \text{hold} \right\}.$

#### 2.2.3 Transitions

There are three types of transitions that we consider.

##### (i)
When the action is to **roll** and the outcome is $r \in \{2,3,4,5,6\}$, each value occurring with probability $\frac{1}{6}$, the transition is given by:
$$(i, \ j, \ k) \  \mapsto \ \begin{cases} (i, \ j, \ k+r) &\text{ if }i+k+r<100 \\
(W, \ L, \ 0) &\text{ otherwise.}\end{cases}$$

##### (ii)
When the action is to **roll** and the outcome is $r=1$, which occurs with probability $\frac{1}{6}$, the transition is given by:
$$(i, \ j, \ k) \mapsto (j, \  i, \ 0). $$

##### (iii)

Finally, if the action is to **hold** then with probability $1$ the transition is given by:
$$(i, \  j,  \ k) \mapsto (j, \ i+k, \ 0).$$


#### 2.2.4 Rewards

We award a reward of $1$ for any transition from a non-winning state to the winning state, and $0$ otherwise. 

$$R_{ss'}^{a} = \begin{cases} 1 &\text{ if }s=(i, \ j, \ k), \ s'=(W, \ L, \ 0), \text{ where } a=\text{roll} \text { and }i+k<100  \\
                         0 &\text{ otherwise.}\end{cases}$$

#### 2.2.5 Iteration

Again we take $P_{(i, \ j, \ k)} = V( (i, \ j, \ k) ).$ In our implementation we directly solve the equations for $P_{s}$ with $s\in S\setminus \{(W,\ L, \ 0)\}$, taking $P_{(i, \ j, \ k)}=1$ whenever $i+k\geq 100$.

$$P_{(i, \ j, \ k)} = \max \left\{ \overbrace{\frac{1}{6}\left[\left( 1-{P}_{(j, \ i, \ 0)} \right) + P_{(i, \ j, \ k+2)}+ P_{(i, \ j, \ k+3)}+ \ldots + P_{(i, \ j, \ k+6)}\right]}^{a = \text{roll}}, \overbrace{1- P_{(j, \ i+k, \ 0)}}^{a= \text{hold}} \right\}. $$
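The equation above can be turned into a value-iteration sweep as follows. This is a minimal sketch and not necessarily how our package is organised: the function names, the in-place update order, the tolerance `tol` and the cap `max_sweeps` are all assumptions chosen for illustration. Any roll that brings $i+k+r$ to the goal or beyond is counted as a win with probability $1$.

```python
def pig_states(goal=100):
    """All non-terminal states (i, j, k) with i + k < goal and j < goal."""
    return [(i, j, k)
            for i in range(goal)
            for j in range(goal)
            for k in range(goal - i)]


def backup(P, state, goal=100):
    """Return (roll_value, hold_value) for one state under the table P."""
    i, j, k = state
    # Rolling a 1 loses the turn total; the opponent then moves.
    roll = 1.0 - P[(j, i, 0)]
    for r in range(2, 7):
        # Reaching the goal is treated as a certain win.
        roll += 1.0 if i + k + r >= goal else P[(i, j, k + r)]
    roll /= 6.0
    # Holding banks the turn total and passes play to the opponent.
    hold = 1.0 - P[(j, i + k, 0)]
    return roll, hold


def solve(goal=100, tol=1e-9, max_sweeps=10_000):
    """Sweep all states in place until the largest change falls below tol."""
    P = {s: 0.0 for s in pig_states(goal)}
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0
        for s in P:
            new = max(backup(P, s, goal))
            delta = max(delta, abs(new - P[s]))
            P[s] = new
        if delta < tol:
            break
    return P, sweep, delta


# Small goal so the example runs quickly; goal=100 gives the full game.
P, sweeps, delta = solve(goal=2)
print(sweeps, delta, P[(0, 0, 0)])
```

With `goal=2` every roll of $2$ or more wins immediately, so each value satisfies $P = \frac{1}{6}(1-P) + \frac{5}{6}$, giving $P=\frac{6}{7}$; this provides a quick sanity check before running the full game.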

The authors state that there are $505,000$ such equations, one for each non-terminal state. We verify this number by counting the complement. Counting all triples $(i,j,k)$, where each entry takes a value from $0$ to $99$, gives $100^3$ triples. From these we remove the triples with $i+k\geq 100$; the table below lists, for each value of $i$, the choices of $k$ that satisfy this condition.

| $i$ | $k\geq 100 - i$ | choices for $k$ |
| --- | --- | --- | 
| $i=1$ | $k\geq 99$ | $1$ | 
| $i=2$ | $k\geq 98$ | $2$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $i=99$ | $k\geq 1$ | $99$ |

The number of pairs $(i,k)$ with $i+k\geq 100$ is then the sum of the last column, $1+2+\cdots+99 = \frac{99\times 100}{2}$. Each such pair can be combined with any of the $100$ values of $j$, and so the total number of equations is,

$$100^3 - \frac{99\times 100}{2}\times 100 = 505,000. $$
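The count can also be confirmed numerically. The sketch below compares a brute-force enumeration with the formula; the function names are illustrative only.

```python
def count_by_enumeration(goal=100):
    """Count triples (i, j, k) in {0, ..., goal-1}^3 with i + k < goal."""
    return sum(1
               for i in range(goal)
               for j in range(goal)
               for k in range(goal)
               if i + k < goal)


def count_by_formula(goal=100):
    """All triples minus those with i + k >= goal."""
    excluded_pairs = (goal - 1) * goal // 2   # 1 + 2 + ... + (goal - 1)
    return goal ** 3 - excluded_pairs * goal


print(count_by_enumeration(), count_by_formula())  # 505000 505000
```

Setting `goal=2` gives $6$ in both cases, matching the six Piglet equations of Section 2.1.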

## 3 Results

Here we give an account of each of the figures we were able to replicate from the paper, thereby corroborating the main results of the paper regarding optimal play. 

### 3.1 Optimal piglet play (Figure 2)

### 3.2 Optimal pig play (Figure 3)

### 3.3 Reachable states (Figures 4, 5 and 6)

## 4 Discussion

We were able to verify the main findings of the paper, as seen in Section 3.

#### Limitations regarding replication

In the paper, a comparison is made between a version of a 'hold at $20$' policy and the optimal policy. A hold at $20$ policy entails rolling until the turn total satisfies $k\geq 20$ and then holding. However, the authors explain that in their version the player holds at $20$ unless close to winning. Exactly how close, and the resulting policy, are not detailed in the paper. In our comparison we were able to play the optimal policy against the standard hold at $20$ policy, but this resulted in winning percentages significantly different from those quoted in the paper.
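For concreteness, the standard hold at $20$ policy can be sketched as below. This is an illustration, not the comparison code from our package: all names are hypothetical, holding once $i+k$ reaches the goal is an assumption added so that a winning turn total is banked, and the authors' near-goal variant is not reproduced because it is not specified.

```python
import random


def hold_at_20(i, j, k, goal=100, threshold=20):
    """Standard 'hold at 20': roll until the turn total reaches the threshold.

    Holding once i + k reaches the goal is an assumption made here.
    """
    if i + k >= goal or k >= threshold:
        return "hold"
    return "roll"


def play_turn(policy, i, j, rng, goal=100):
    """Play one turn for the player with score i; return their new score."""
    k = 0
    while policy(i, j, k, goal) == "roll":
        r = rng.randint(1, 6)
        if r == 1:
            return i          # the turn total is lost
        k += r
    return i + k


def play_game(policy_a, policy_b, seed=0, goal=100):
    """Play one game; return the index of the winner and the final scores."""
    rng = random.Random(seed)
    scores = [0, 0]
    policies = [policy_a, policy_b]
    player = 0
    while True:
        scores[player] = play_turn(policies[player], scores[player],
                                   scores[1 - player], rng, goal)
        if scores[player] >= goal:
            return player, scores
        player = 1 - player


winner, scores = play_game(hold_at_20, hold_at_20, seed=1)
print(winner, scores)
```

Any other policy with the same `(i, j, k, goal)` signature, such as one read off a solved value table, could be passed to `play_game` in place of `hold_at_20` to estimate winning percentages over many seeds.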

In the paper, a parameter $\Delta$ is introduced to give the tolerance for the convergence criterion. However, this threshold is not given in the paper. We tried thresholds from $10^{-6}$ down to $10^{-16}$ and were unable to reproduce the Piglet plot in Figure 2 with any of them. Instead, we implemented a maximum number of iterations in our code. However, this leaves open the question of how to choose the number of iterations for the algorithm.

The method for finding the optimal policy for the games of Piglet and Pig is referred to as Value Iteration throughout the article (pg. 30, Neller and Presser, 2004). Bellman's equation for Value Iteration as given in the paper reads:

$$ V(s) \approx \max_{a\in A} \left\{ \sum_{s'\in S:s\mapsto s'} P_{ss'}^{a} [ R_{ss'}^{a} + \gamma V(s')]\right\}.$$ 

It is explained that $\gamma = 1$, that the rewards are $1$ when rolling takes one to a winning state, and that the value of a state $V(s)$ is the probability of eventually winning from that state. However it is not explained to the reader how one encodes the symmetry into the value function that leads from Bellman's equation to equations of the form (1) on page $28$ of the article.


#### Other issues 

Neller and Presser describe their method as applying Value Iteration in stages for subsets of the states. For example, they use the fact that there is no transition from $(99,99,0)$ to $(1,1,0)$, and instead limit consideration to those states which could precede that state, such as $(98,99,0)$. The authors describe this as 'partitioning score sums'. It is our understanding that this approach differs from our implementation, in which we solve all equations iteratively. It remains to be seen whether partitioning reduces the total number of computations or improves computational efficiency, as the authors claim, since no comparison with a direct approach is made in the paper.


## 5 Conclusions and further work

The terminology of the paper partially aligns with the application of Value Iteration to Markov Decision Processes (MDPs). Here the language of discount factors, values and policies is interpreted for the game of Pig. However, there are aspects of the problem for which there is no straightforward alignment with the MDP literature, such as the value of the 'next' state in the MDP approach being identified through problem-specific symmetry with another part of the state space. It would be interesting to research whether there are other problems for which the value of a state is a function, in some generalised MDP approach.

We contacted the authors of the paper, who supplied their original Java code using a backward iteration approach as described earlier. In further work we could translate this to Python and compare both methods for computational efficiency. 



## References

Neller, Todd W. and Clifton G.M. Presser. "Optimal Play of the Dice Game Pig," The UMAP Journal 25.1 (2004), 25-47.