# Example: Rock Paper Scissors Game Simulation
Rock, paper, scissors is a simple zero-sum game in which two players simultaneously play `rock`, `paper`, or `scissors`. The winner of a round receives a reward of `+1`, the loser receives `-1`, and a tie gives both players `0`. Rules:
* `Rock` beats `scissors` but loses to `paper`
* `Paper` beats `rock` but loses to `scissors`
* `Scissors` beats `paper` but loses to `rock`
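
As a compact summary of these rules, the payoffs can be written as a matrix. This is a sketch in which rows are player 1's action, columns are player 2's action, and each entry is (player 1's reward, player 2's reward); a tie is taken to give both players `0`, consistent with the `reward` function defined later in this example:

$$
\begin{equation*}
\begin{array}{c|ccc}
 & \text{rock} & \text{paper} & \text{scissors} \\ \hline
\text{rock} & (0, 0) & (-1, +1) & (+1, -1) \\
\text{paper} & (+1, -1) & (0, 0) & (-1, +1) \\
\text{scissors} & (-1, +1) & (+1, -1) & (0, 0)
\end{array}
\end{equation*}
$$

Every entry sums to zero, which is what makes the game zero-sum.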

### Learning objectives
The objective of this example is to familiarize students with simple zero-sum games such as rock, paper, scissors, and in particular with the ideas and implementations for exploring these games found in the `Decisions` book:

* [Algorithms For Decision Making, Kochenderfer, Wheeler, Wray, MIT Press, 2022](https://algorithmsbook.com)

We've implemented some of the code found in `Chapter 24` of the `Decisions` book in our package [VLDecisionsPackage.jl](https://github.com/varnerlab/VLDecisionsPackage.jl.git).

## Setup
Let's load some packages that are required for the example by calling the `include(...)` function on our initialization file `Include.jl`:

In [1]:
include("Include.jl");

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m    Updating[22m[39m `~/Desktop/julia_work/CHEME-5760-Examples-F23/Project.toml`
  [90m[10f378ab] [39m[93m~ VLDecisionsPackage v0.1.0 `https://github.com/varnerlab/VLDecisionsPackage.jl.git#main` ⇒ v0.1.0 `https://github.com/varnerlab/VLDecisionsPackage.jl.git#main`[39m
[32m[1m    Updating[22m[39m `~/Desktop/julia_work/CHEME-5760-Examples-F23/Manifest.toml`
  [90m[10f378ab] [39m[93m~ VLDecisionsPackage v0.1.0 `https://github.com/varnerlab/VLDecisionsPackage.jl.git#main` ⇒ v0.1.0 `https://github.com/varnerlab/VLDecisionsPackage.jl.git#main`[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39mVLDecisionsPackage
  1 dependency successfully precompiled in 4 seconds. 226 already precompiled.
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5760-Examples-F23`
[32m[1m    Updating[22m[39m 

### Types
This example will use two new types.

The `MySimpleGameModel` encodes data about simple games in the fields:
* The `γ::Float64` field holds the discount factor, which describes how much we care about current moves compared with potential future moves
* The `ℐ::Array{Int64,1}` field holds the indexes of the players of the game
* The `𝒜` field holds the joint action space of the game
* The `R` field holds the joint reward function of the game

The `MySimpleGamePolicy` type holds information about the policy used by a player in a game. This type has one important field:
* The `p::Dict{Symbol,Float64}` field is a dictionary that maps each action to the probability of playing it
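
To make the field descriptions above concrete, here is a minimal sketch of what such types could look like. The names `SketchSimpleGameModel` and `SketchSimpleGamePolicy` are hypothetical stand-ins, and normalizing the probabilities in the constructor is an assumption; the actual definitions in `VLDecisionsPackage.jl` may differ.

```julia
# Hypothetical stand-ins for the package types; the real definitions may differ.
mutable struct SketchSimpleGameModel
    γ::Float64            # discount factor
    ℐ::Array{Int64,1}     # indexes of the players
    𝒜                     # joint action space
    R                     # joint reward function

    # empty constructor: fields are populated after construction
    SketchSimpleGameModel() = new()
end

struct SketchSimpleGamePolicy
    p::Dict{Symbol,Float64}   # action => probability

    # assumption: rescale the supplied weights so they sum to one
    function SketchSimpleGamePolicy(weights::Dict{Symbol,Float64})
        total = sum(values(weights))
        return new(Dict(a => w / total for (a, w) in weights))
    end
end

# build an empty game and fill in two of its fields
game = SketchSimpleGameModel()
game.γ = 0.9
game.ℐ = [1, 2]

# unnormalized weights are rescaled to probabilities
policy = SketchSimpleGamePolicy(Dict(:rock => 2.0, :paper => 1.0, :scissors => 1.0))
```

An empty constructor followed by field assignment mirrors how the game object is populated manually later in this example.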

### Helper functions
Before we begin playing the game, we set up several helper functions that we use to initialize the game and during game play. First, the `number_of_agents(simpleGame::MySimpleGameModel) = 2` method takes a `MySimpleGameModel` instance as an argument and returns the number of players in the game:

In [2]:
number_of_agents(simpleGame::MySimpleGameModel) = 2;

Next, the `ordered_actions(simpleGame::MySimpleGameModel, i::Int) = [:rock, :paper, :scissors]` method takes a `MySimpleGameModel` instance and a player index `i`, and returns an array of the actions that are open to that player:

In [3]:
ordered_actions(simpleGame::MySimpleGameModel, i::Int) = [:rock, :paper, :scissors];

The `reward(simpleGame::MySimpleGameModel, i::Int, a)` function takes a `MySimpleGameModel` instance, a player index `i`, and a joint action `a::Tuple{Symbol,Symbol}`, and returns the reward to player `i` for that round of game play:

In [4]:
function reward(simpleGame::MySimpleGameModel, i::Int, a)
    # index of the opposing player
    noti = (i == 1) ? 2 : 1

    if a[i] == a[noti]
        r = 0.0
    elseif a[i] == :rock && a[noti] == :paper
        r = -1.0
    elseif a[i] == :rock && a[noti] == :scissors
        r = 1.0
    elseif a[i] == :paper && a[noti] == :rock
        r = 1.0
    elseif a[i] == :paper && a[noti] == :scissors
        r = -1.0
    elseif a[i] == :scissors && a[noti] == :rock
        r = -1.0
    elseif a[i] == :scissors && a[noti] == :paper
        r = 1.0
    end

    return r
end

reward (generic function with 1 method)

Finally, the `joint_reward(simpleGame::MySimpleGameModel, a)` function takes a `MySimpleGameModel` instance and a joint action `a`, and returns the array of rewards, one per player, for that joint action:

In [5]:
function joint_reward(simpleGame::MySimpleGameModel, a)
    return [reward(simpleGame, i, a) for i in 1:number_of_agents(simpleGame)]
end

joint_reward (generic function with 1 method)
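
As a quick sanity check on the reward logic above, the following standalone sketch enumerates all nine joint actions and confirms that the two players' rewards always sum to zero. The helper `rps_payoff` is a hypothetical compact rewrite of the rules, not the `reward` function itself, and it does not depend on the package types.

```julia
# Hypothetical helper: reward to the player who chose `mine`
# against an opponent who chose `theirs`.
function rps_payoff(mine::Symbol, theirs::Symbol)
    beats = Dict(:rock => :scissors, :paper => :rock, :scissors => :paper)
    mine == theirs && return 0.0                  # tie
    return beats[mine] == theirs ? 1.0 : -1.0     # win or loss
end

actions = [:rock, :paper, :scissors]

# all joint actions (a₁, a₂) and the reward pair for each
joint_actions = [(a1, a2) for a1 in actions for a2 in actions]
reward_pairs = [[rps_payoff(a[1], a[2]), rps_payoff(a[2], a[1])] for a in joint_actions]

# zero-sum: the rewards of the two players cancel for every joint action
is_zero_sum = all(sum(r) == 0.0 for r in reward_pairs)
```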

## Rock, Paper, Scissors Game Setup
Let's build and manually populate an instance of the game object, which is of type `MySimpleGameModel`:

In [7]:
mysimplegame = MySimpleGameModel();
mysimplegame.γ = 0.9;
mysimplegame.ℐ = [1,2];
mysimplegame.𝒜 = [ordered_actions(mysimplegame, i) for i in 1:number_of_agents(mysimplegame)]
mysimplegame.R = (a) -> joint_reward(mysimplegame, a);

Next, we set up a policy for each player. Each policy is a `MySimpleGamePolicy` wrapping a `Dict` whose keys are actions, e.g., `:rock`, and whose values are the probabilities of playing those actions. We assign these to the `π₁` and `π₂` variables:

In [8]:
π₁ = MySimpleGamePolicy(Dict(:rock => 0.6, :paper => 0.2, :scissors => 0.2));
π₂ = MySimpleGamePolicy(Dict(:rock => 0.2, :paper => 0.7, :scissors => 0.1));

Finally, we construct the _joint policy_ which holds the policies for each of the players:

In [9]:
π = [π₁ ; π₂];
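
To see what these policies imply, the expected reward of each player under the joint policy can be computed by weighting every joint action by its probability. The sketch below uses plain `Dict`s and a hypothetical `rps_payoff` helper instead of the package types, so it is an illustration under those assumptions rather than the package's own utility routine.

```julia
# Hypothetical helper: reward to the player who chose `mine`
# against an opponent who chose `theirs`.
function rps_payoff(mine::Symbol, theirs::Symbol)
    beats = Dict(:rock => :scissors, :paper => :rock, :scissors => :paper)
    mine == theirs && return 0.0
    return beats[mine] == theirs ? 1.0 : -1.0
end

# the same action probabilities as the two policies above
π₁ = Dict(:rock => 0.6, :paper => 0.2, :scissors => 0.2)
π₂ = Dict(:rock => 0.2, :paper => 0.7, :scissors => 0.1)

# expected reward of the player using `mine` against a player using `theirs`
expected_utility(mine, theirs) =
    sum(p * q * rps_payoff(a, b) for (a, p) in mine for (b, q) in theirs)

U₁ = expected_utility(π₁, π₂)   # ≈ -0.24
U₂ = expected_utility(π₂, π₁)   # ≈ +0.24
```

Player 1 expects to lose on average because its most likely action, `rock`, loses to player 2's most likely action, `paper`.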

## Compute the Best Deterministic Response
The _deterministic best response_ of agent $i$ to the policies of the other agents $\pi^{-i}$ is a policy $\pi^{i}$ that maximizes agent $i$'s utility $U^{i}$, i.e., it satisfies:

$$
\begin{equation*}
    U^{i}(\pi^{i}, \pi^{-i}) \geq U^{i}(\pi^{i\prime}, \pi^{-i}) \quad \forall \pi^{i\prime} \neq \pi^{i}
\end{equation*}
$$

In [10]:
best_deterministic_policy = Dict{Int64, MySimpleGamePolicy}()
for i ∈ 1:number_of_agents(mysimplegame)
    best_deterministic_policy[i] = best_response_policy(mysimplegame,π,i);
end
best_deterministic_policy

Dict{Int64, MySimpleGamePolicy} with 2 entries:
  2 => MySimpleGamePolicy(Dict(:paper=>1.0))
  1 => MySimpleGamePolicy(Dict(:scissors=>1.0))
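
The result above can be checked by hand: for each pure action, compute its expected reward against the opponent's policy and keep the best one. The sketch below does this with plain `Dict`s; `rps_payoff` and `best_pure_response` are hypothetical names for illustration, not the package's `best_response_policy`.

```julia
# Hypothetical helper: reward to the player who chose `mine`
# against an opponent who chose `theirs`.
function rps_payoff(mine::Symbol, theirs::Symbol)
    beats = Dict(:rock => :scissors, :paper => :rock, :scissors => :paper)
    mine == theirs && return 0.0
    return beats[mine] == theirs ? 1.0 : -1.0
end

# pure action with the highest expected reward against `opponent`
function best_pure_response(opponent::Dict{Symbol,Float64})
    actions = [:rock, :paper, :scissors]
    utilities = [sum(q * rps_payoff(a, b) for (b, q) in opponent) for a in actions]
    return actions[argmax(utilities)]
end

π₁ = Dict(:rock => 0.6, :paper => 0.2, :scissors => 0.2)
π₂ = Dict(:rock => 0.2, :paper => 0.7, :scissors => 0.1)

response_of_player_1 = best_pure_response(π₂)   # → :scissors
response_of_player_2 = best_pure_response(π₁)   # → :paper
```

Against `π₂`, the expected rewards for player 1 are `-0.6` for rock, `0.1` for paper, and `0.5` for scissors, so scissors is the best response, in agreement with the output above.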

## Compute the Best Softmax Response
The _softmax response model_ selects action $a^{i}$ with a probability proportional to the exponential of its utility:

$$
\begin{equation*}
    \pi^{i}(a^{i}) \propto \exp(\lambda\cdot{U}^{i}(a^{i}, \pi^{-i}))
\end{equation*}
$$

The parameter $\lambda$ determines the degree of rationality: as $\lambda \rightarrow 0$, the agent chooses among its actions uniformly at random, while as $\lambda \rightarrow \infty$, the agent becomes perfectly rational and plays a best response.

In [11]:
best_softmax_policy = Dict{Int64, MySimpleGamePolicy}()
for i ∈ 1:number_of_agents(mysimplegame)
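    # Note: the player index is hard-coded to 2 here, so both entries hold player 2's response; pass `i` to compute each player's own response
    best_softmax_policy[i] = softmax_response_policy(mysimplegame, π, 2, 10.0);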
end
best_softmax_policy

Dict{Int64, MySimpleGamePolicy} with 2 entries:
  2 => MySimpleGamePolicy(Dict(:scissors=>0.00032932, :rock=>0.0179803, :paper=…
  1 => MySimpleGamePolicy(Dict(:scissors=>0.00032932, :rock=>0.0179803, :paper=…
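
The softmax response defined above can be sketched in a few lines. The example below is a standalone illustration using plain `Dict`s; `rps_payoff` and `softmax_response` are hypothetical names, and this is not the package's `softmax_response_policy`.

```julia
# Hypothetical helper: reward to the player who chose `mine`
# against an opponent who chose `theirs`.
function rps_payoff(mine::Symbol, theirs::Symbol)
    beats = Dict(:rock => :scissors, :paper => :rock, :scissors => :paper)
    mine == theirs && return 0.0
    return beats[mine] == theirs ? 1.0 : -1.0
end

# action probabilities proportional to exp(λ * expected utility)
function softmax_response(opponent, λ)
    actions = [:rock, :paper, :scissors]
    U = [sum(q * rps_payoff(a, b) for (b, q) in opponent) for a in actions]
    weights = exp.(λ .* U)
    return Dict(zip(actions, weights ./ sum(weights)))
end

π₁ = Dict(:rock => 0.6, :paper => 0.2, :scissors => 0.2)
π₂ = Dict(:rock => 0.2, :paper => 0.7, :scissors => 0.1)

softmax_of_player_2 = softmax_response(π₁, 10.0)   # mass concentrates on :paper
softmax_of_player_1 = softmax_response(π₂, 10.0)   # mass concentrates on :scissors
uniform_limit = softmax_response(π₁, 0.0)          # λ = 0 gives 1/3 for each action
```

For player 2 with `λ = 10.0`, this gives approximately `0.018` for rock and `0.0003` for scissors, matching the values displayed above, with the remaining mass on paper.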

## Compute the Best Hierarchical Softmax Policy
The _hierarchical softmax response model_ describes the _depth of rationality_ of an agent by a level parameter $k\geq{0}$, along with the softmax parameter $\lambda$:

* A level `k = 0` agent selects actions from the initial policy.
* A level `k = 1` agent selects actions according to the _softmax response model_ with parameter $\lambda$, assuming the other players play at level `0`.
* A level `k ≥ 2` agent selects actions according to a _softmax response model_ of the other players playing at level $k-1$.

In [53]:
λ = 5.0
k = 0
hierarchical_softmax_policy = MyHierarchicalSoftmaxPolicy(λ, k, π)

MyHierarchicalSoftmaxPolicy(5.0, 0, MySimpleGamePolicy[MySimpleGamePolicy(Dict(:scissors => 0.2, :rock => 0.6, :paper => 0.2)), MySimpleGamePolicy(Dict(:scissors => 0.1, :rock => 0.2, :paper => 0.7))])

In [54]:
Z = solve(hierarchical_softmax_policy, mysimplegame)

2-element Vector{MySimpleGamePolicy}:
 MySimpleGamePolicy(Dict(:scissors => 0.2, :rock => 0.6, :paper => 0.2))
 MySimpleGamePolicy(Dict(:scissors => 0.1, :rock => 0.2, :paper => 0.7))
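
With `k = 0` the solution is simply the initial joint policy, as the output above shows. To illustrate what higher levels do, the sketch below applies the softmax response repeatedly, with each level responding to the previous level's policies. It is a standalone illustration for the two-player case; `rps_payoff`, `softmax_response`, and `hierarchical_softmax` are hypothetical names, and the package's `solve` method may be implemented differently.

```julia
# Hypothetical helper: reward to the player who chose `mine`
# against an opponent who chose `theirs`.
function rps_payoff(mine::Symbol, theirs::Symbol)
    beats = Dict(:rock => :scissors, :paper => :rock, :scissors => :paper)
    mine == theirs && return 0.0
    return beats[mine] == theirs ? 1.0 : -1.0
end

# action probabilities proportional to exp(λ * expected utility)
function softmax_response(opponent, λ)
    actions = [:rock, :paper, :scissors]
    U = [sum(q * rps_payoff(a, b) for (b, q) in opponent) for a in actions]
    weights = exp.(λ .* U)
    return Dict(zip(actions, weights ./ sum(weights)))
end

# level-k joint policy for two players, starting from `initial` at level 0
function hierarchical_softmax(initial, λ, k)
    current = initial
    for _ in 1:k
        # each player responds to the other player's previous-level policy
        current = [softmax_response(current[2], λ), softmax_response(current[1], λ)]
    end
    return current
end

initial = [Dict(:rock => 0.6, :paper => 0.2, :scissors => 0.2),
           Dict(:rock => 0.2, :paper => 0.7, :scissors => 0.1)]

level0 = hierarchical_softmax(initial, 5.0, 0)   # unchanged initial policies
level1 = hierarchical_softmax(initial, 5.0, 1)   # softmax responses to level 0
level2 = hierarchical_softmax(initial, 5.0, 2)   # softmax responses to level 1
```

At level 1, player 1 leans toward scissors and player 2 toward paper; at level 2, each player responds to those level-1 policies in turn.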