# L6c: Applications of the Multiplicative Weights Update Algorithm
In this lecture, we'll continue our discussion of online learning and the multiplicative weights update algorithm. Today, we'll explore a basic implementation of the algorithm and some of its applications. The key ideas of this lecture are:
* [The Multiplicative Weights Algorithm (MWA)](https://en.wikipedia.org/wiki/Multiplicative_weight_update_method) is a powerful online learning algorithm. The MWA updates expert weights based on past performance, assigning higher weights to better-performing experts and lower weights to others. This enables adaptation to changing data distributions and learning from mistakes.
* A [zero-sum game](https://en.wikipedia.org/wiki/Zero-sum_game) is a competitive scenario where one participant's gain is exactly balanced by another participant's loss, resulting in a net change of zero in total wealth or benefit. This concept is commonly applied in economics and game theory, with examples including poker, chess, and financial transactions like futures and options contracts.

The lecture notes today are taken from [the CMS 139 Course at Caltech prepared by Prof. Thomas Vidick](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-6/L6c/docs/CMS139-Vidick-Caltech-multiplicative_weights-2018.pdf).

## The Multiplicative Weights Update Algorithm (MWA)
The Multiplicative Weights Update Algorithm (MWA) is a generalization of the Weighted Majority Algorithm and the Hedge strategy. The MWA is a simple and robust online learning algorithm: its performance guarantee holds for any sequence of costs, even one chosen by an adversary, and it can be used to solve many optimization problems.

* __Game__: Let $t = 1, 2, \ldots, T$ denote the current round of the game, and let $i = 1, \ldots, N$ index the experts advising us. In each round, we compute a _belief distribution_ $\mathbf{p}^{(t)} = \left\{p_{1}^{(t)}, p_{2}^{(t)}, \ldots, p_{N}^{(t)}\right\}$ over the experts, select a _random_ expert by _sampling_ this distribution, and use the selected expert to make a decision. The _adversary_ (nature) then reveals the outcome, and we compute the cost vector $\mathbf{m}^{(t)} = \left\{m_{1}^{(t)}, m_{2}^{(t)}, \ldots, m_{N}^{(t)}\right\}$, where $m_{i}^{(t)}\in[-1, 1]$ is the cost of following expert $i$ at time $t$. The expected loss in round $t$ is $L^{(t)} = \sum_{i=1}^{N}p_{i}^{(t)}m_{i}^{(t)}$, and the overall loss experienced by the _aggregator_ at the end of the game is $L_{A} = \sum_{t=1}^{T}L^{(t)}$.
* __Goal__: The goal of the aggregator (us) is to minimize the total expected loss $L_{A}$ over the game, so that it is not significantly worse than the loss of the best single expert in hindsight, i.e., $\min_{i}\left(\sum_{t=1}^{T}m_{i}^{(t)}\right)$.

#### Algorithm
Fix a learning rate $\eta\in\left(0,{1}/{2}\right]$ and, for each expert, initialize the weight $w_{i}^{(1)} = 1$. The costs for correct and incorrect predictions lie in the range $m_{i}^{(t)}\in[-1, 1]$.

For round $t=1,2,\dots,T$:
1. Choose expert $i$ with probability $p_{i}^{(t)} = w_{i}^{(t)}/\sum_{j=1}^{N}w_{j}^{(t)}$. Ask expert $i$ to predict the outcome of the experiment; denote this prediction by $y_{i}^{(t)}$.
2. The adversary (nature) reveals the true outcome $y_{t}$. Compute the cost of following each expert $i$: if expert $i$ predicted the outcome _correctly_, the cost is $m_{i}^{(t)}$ = `-1`; otherwise, for an _incorrect prediction_, the cost is $m_{i}^{(t)}$ = `1`.

3. Update the weight of every expert $i = 1, \ldots, N$ as:
$$
\begin{align*}
w_{i}^{(t+1)} = w_{i}^{(t)}\cdot\left(1-\eta\cdot{m_{i}^{(t)}}\right)
\end{align*}
$$
4. __Note__: The Caltech notes give the update rule as: $w_{i}^{(t+1)} = w_{i}^{(t)}\cdot\exp\left(-\eta\cdot{m}_{i}^{(t)}\right)$ and $\eta\in\left(0,1\right)$.
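The steps above can be sketched in Julia as follows. This is a minimal illustration, not the course implementation in `src/`: the function name `mwa_experts`, the three synthetic experts, and the random seed are our own assumptions. Instead of sampling an expert each round, the sketch accumulates the expected loss $L^{(t)}$ directly, since that is the quantity in the __Goal__ above.

```julia
using Random

# Hypothetical helper (not part of the course's src/ code): runs the MWA steps above
# on binary predictions and tracks the expected loss instead of sampling an expert.
# predictions[t, i] is expert i's forecast in round t; outcomes[t] is the revealed outcome.
function mwa_experts(predictions::AbstractMatrix, outcomes::AbstractVector; η::Real = 0.25)
    @assert 0 < η <= 0.5 "learning rate must lie in (0, 1/2]"
    T, N = size(predictions)
    w = ones(N)                               # w_i^(1) = 1 for every expert
    W = zeros(T + 1, N)                       # weight history, row t holds w^(t)
    W[1, :] .= w
    L_A = 0.0                                 # aggregator's total expected loss
    for t in 1:T
        p = w ./ sum(w)                       # belief distribution p^(t)
        m = [predictions[t, i] == outcomes[t] ? -1.0 : 1.0 for i in 1:N]  # cost vector m^(t)
        L_A += sum(p .* m)                    # expected loss L^(t) = Σ_i p_i m_i
        w = w .* (1 .- η .* m)                # multiplicative update
        W[t + 1, :] .= w
    end
    return W, L_A
end

# Usage: expert 1 is always right, expert 2 always wrong, expert 3 guesses at random
rng = MersenneTwister(1234)
T = 50
outcomes = rand(rng, 0:1, T)
predictions = hcat(outcomes, 1 .- outcomes, rand(rng, 0:1, T))
W, L_A = mwa_experts(predictions, outcomes; η = 0.25)

# best expert in hindsight: min_i Σ_t m_i^(t)
best = minimum(sum(predictions[t, i] == outcomes[t] ? -1.0 : 1.0 for t in 1:T) for i in 1:3)
println("L_A = ", round(L_A, digits = 2), ", best expert in hindsight = ", best)
```

Because expert 1 is always right, its weight grows by a factor of $1+\eta$ every round while the always-wrong expert shrinks by $1-\eta$, so the belief distribution concentrates on expert 1 and $L_A$ is pulled toward the best expert's total cost of $-T$.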

__Theorem__: The MWA has the following theoretical guarantee. Assume all costs are in the range $m_{i}^{(t)}\in[-1, 1]$ and $\eta\leq{1}/{2}$. Then the Multiplicative Weights Algorithm (MWA) guarantees that after $T$ rounds, for any expert $i$, we have:
$$
\begin{align*}
\sum_{t=1}^{T}\mathbf{m}^{(t)}\cdot\mathbf{p}^{(t)} & \leq \sum_{t = 1}^{T}m_{i}^{(t)}+\eta\sum_{t=1}^{T}|m_{i}^{(t)}|+\frac{\ln{N}}{\eta}
\end{align*}
$$
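One way to build intuition for the theorem is to check it numerically. The sketch below uses a hypothetical helper, `check_mwa_bound`, with costs drawn uniformly from $[-1, 1]$ as a stand-in for the adversary (an assumption for illustration). It runs the update $w_{i}^{(t+1)} = w_{i}^{(t)}(1-\eta m_{i}^{(t)})$ and compares the algorithm's expected loss with the right-hand side of the bound for every expert $i$.

```julia
using Random

# Hypothetical numerical check of the MWA regret bound (not from the course code).
# Costs are drawn uniformly from [-1, 1]; the update is w ← w·(1 − η·m) with η ≤ 1/2.
function check_mwa_bound(; N = 5, T = 1_000, η = 0.1, seed = 42)
    rng = MersenneTwister(seed)
    w = ones(N)
    algo_loss = 0.0              # Σ_t m^(t)·p^(t)
    cum = zeros(N)               # Σ_t m_i^(t) for each expert i
    cum_abs = zeros(N)           # Σ_t |m_i^(t)| for each expert i
    for t in 1:T
        p = w ./ sum(w)
        m = 2 .* rand(rng, N) .- 1          # adversary's costs in [-1, 1]
        algo_loss += sum(m .* p)
        cum .+= m
        cum_abs .+= abs.(m)
        w .*= (1 .- η .* m)
        w ./= sum(w)             # renormalize to avoid underflow; p^(t+1) is unchanged
    end
    bound = cum .+ η .* cum_abs .+ log(N) / η   # right-hand side, one entry per expert i
    return algo_loss, bound
end

algo_loss, bound = check_mwa_bound()
println("MWA expected loss = ", round(algo_loss, digits = 2),
        "; smallest right-hand side = ", round(minimum(bound), digits = 2))
```

The inequality is deterministic, so it must hold for every cost sequence and every seed; random costs are only a convenient test input.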

## Applications of the Multiplicative Weights Update Algorithm
The Multiplicative Weights Update (MWA) algorithm has a wide range of applications across various fields, including machine learning, optimization, and game theory. Here are some of its key applications:

* __Machine Learning and Prediction__: The MWA method is used in machine learning for online prediction problems, such as learning from expert advice. It helps in combining predictions from multiple experts by iteratively updating weights based on their performance, ensuring that the overall prediction is close to the best expert's performance.
* __Game Theory and Portfolio Management__: In game theory, MWA is used to [solve zero-sum games](https://en.wikipedia.org/wiki/Zero-sum_game) by iteratively adjusting strategies based on outcomes. It is also applied in [portfolio management problems](https://www.cis.upenn.edu/~mkearns/finread/helmbold98line.pdf) to optimize investment strategies by dynamically updating the weights of different assets based on their performance. 
* __Optimization and Linear Programming__: The MWA can be applied to solve linear programs and other optimization problems by iteratively adjusting weights to satisfy constraints. It can efficiently handle systems of linear inequalities and is used in algorithms like Clarkson's for linear programming.
* __Complexity Theory and Other Applications__: Additionally, the MWA is used in complexity theory for hardness amplification and in computational geometry for solving specific geometric problems.

## Zero-sum games
Let's consider the application of the multiplicative weights update algorithm to zero-sum games. In [a zero-sum game](https://en.wikipedia.org/wiki/Zero-sum_game), two players have _opposing interests_, and the sum of their payoffs is always zero. The MWA can be used to solve zero-sum games by iteratively adjusting strategies based on outcomes.
* __Game__: A set of $k$ players play a zero-sum game. During each turn of the game, each player chooses an action $a\in\mathcal{A}$ from the set of actions $\mathcal{A}$, where the number of possible actions is $|\mathcal{A}| = N$. For $k = 2$, the payoffs are represented by a payoff matrix $\mathbf{M}\in\mathbb{R}^{N\times{N}}$. Let player `1` be the row player and player `2` the column player; then $m_{ij}\in\mathbf{M}$ is the payoff associated with player `1` choosing action $i$ and player `2` choosing action $j$.
* __Goal__: Each player wants to maximize their own payoff, which in a zero-sum game is the same as minimizing the opponent's payoff. If the row player chooses action $i$ and the column player chooses action $j$, the payoff for the row player is $-m_{ij}$ and the payoff for the column player is $m_{ij}$; that is, $m_{ij}$ is the row player's loss. Suppose the row player chooses actions according to a distribution $p$, and the column player according to a distribution $q$. The expected payoff for the row player is $-p^{\top}\mathbf{M}q$, while the expected payoff for the column player is $p^{\top}\mathbf{M}q$. Thus, the row player wants to minimize $p^{\top}\mathbf{M}q$, while the column player wants to maximize it.
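To make the expected payoff concrete, here is a small worked example. The $2\times 2$ matrix is matching pennies, chosen only for illustration (it is not part of the notes), written in the convention above where $m_{ij}$ is the row player's loss; the strategies $p$ and $q$ are arbitrary.

```julia
# Hypothetical 2×2 example (matching pennies) in the convention above:
# M[i, j] is the row player's loss, i.e. the column player's payoff.
M = [1.0 -1.0; -1.0 1.0]
p = [0.7, 0.3]                  # row player's mixed strategy
q = [0.6, 0.4]                  # column player's mixed strategy

v = p' * M * q                  # expected payoff p'Mq to the column player
println("column player: ", v, ", row player: ", -v)

# For fixed p the objective is linear in q, so a pure action is a best response:
col_payoffs = M' * p            # (M'p)_j = payoff of pure column action j against p
j_best = argmax(col_payoffs)
println("column player's best pure response: action ", j_best, " with payoff ", col_payoffs[j_best])
```

Here $\mathbf{M}q = (0.2, -0.2)$, so $p^{\top}\mathbf{M}q = 0.7(0.2) + 0.3(-0.2) = 0.08$: the column player gains $0.08$ per round on average, and against this $p$ the pure action $j = 1$ would earn $0.4$.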

### Von Neumann's Minimax Theorem
[Von Neumann's Minimax Theorem](https://en.wikipedia.org/wiki/Minimax_theorem) states that every finite two-player zero-sum game has optimal mixed strategies: the row player has a strategy $p^{*}$ that minimizes the worst-case (maximum) expected loss, the column player has a strategy $q^{*}$ that maximizes the worst-case (minimum) expected payoff, and the two guarantees coincide:
$$
\begin{align*}
\max_{q}\min_{p}p^{\top}\mathbf{M}q & = \min_{p}\max_{q}p^{\top}\mathbf{M}q = \lambda^{\star}
\end{align*}
$$
where $\lambda^{\star}$ is the optimal utility (also called the value of the game); $p^{*}$ attains the outer minimum and $q^{*}$ the outer maximum. The MWA can compute _near-optimal_ approximations of $p^{*}$ and $q^{*}$.
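For a $2\times 2$ game, both sides of the minimax equation can be approximated by brute force over a grid of mixed strategies. The sketch below does this for matching pennies (our illustrative choice, with $m_{ij}$ the row player's loss); the grid resolution is an arbitrary assumption. Both quantities come out as $0$ up to floating point, attained at $p^{*} = q^{*} = (1/2, 1/2)$.

```julia
# Hypothetical check of the minimax theorem on matching pennies.
# Convention as above: M[i, j] is the row player's loss (column player's payoff).
M = [1.0 -1.0; -1.0 1.0]
grid = range(0, 1; length = 1001)       # a mixed strategy over 2 actions is [a, 1 - a]

# min over p of max over q; the inner max is attained at a pure column action
v_minmax = minimum(maximum(M' * [a, 1 - a]) for a in grid)
# max over q of min over p; the inner min is attained at a pure row action
v_maxmin = maximum(minimum(M * [b, 1 - b]) for b in grid)

println("min_p max_q = ", v_minmax, ",  max_q min_p = ", v_maxmin)
```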

### Algorithm
We have a [two player zero sum game](https://en.wikipedia.org/wiki/Zero-sum_game) with a payoff matrix $\mathbf{M}\in\mathbb{R}^{N\times{N}}$. The row player chooses actions according to a distribution $p$, and the column player chooses actions based on a distribution $q$. The MWA algorithm can compute the near-optimal mixed strategies $p^{*}$ and $q^{*}$ for the row and column players, respectively.

__Initialize__ the weights $w_{i} = \texttt{rand}$ for all actions $i\in\mathcal{A}$, and set the learning rate $\eta\in\left(0,1\right)$.

For round $t=1,2,\dots,T$:
1. The row player chooses an action $i$ with probability $p^{(t)} = \left\{w_{i}^{(t)}/\Phi^{(t)} \mid i = 1,2,\dots,N\right\}$ where $\Phi^{(t)} = \sum_{j=1}^{N}w_{j}^{(t)}$. 
2. The column player best-responds: $q^{(t)} = \arg\max_{q}\left\{(p^{(t)})^{\top}\mathbf{M}q\right\}$. Because this objective is linear in $q$, a pure action (a standard basis vector) is always among the maximizers. The row player's cost vector is then $m^{(t)} = \mathbf{M}q^{(t)}$.
3. Penalize costly decisions by updating the weights as $w_{i}^{(t+1)} = w_{i}^{(t)}\cdot\exp\left(-\eta\cdot{m}_{i}^{(t)}\right)$ for all actions $i\in\mathcal{A}$.

where we assume that the payoffs (elements of $\mathbf{M}$) are in the range $m_{ij}\in[-1, 1]$.
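A minimal Julia sketch of these three steps is shown below. It is not the course's `play(...)` method: the function name `mwa_zero_sum`, the uniform initial weights (the notes use random ones), the values $\eta = 0.01$ and $T = 100{,}000$, and the matching-pennies test matrix are all our assumptions. Because the MWA guarantee is about averages, the sketch returns the time-averaged strategies $\bar{p} = \frac{1}{T}\sum_{t}p^{(t)}$ and $\bar{q} = \frac{1}{T}\sum_{t}q^{(t)}$.

```julia
# Hypothetical sketch of the algorithm above (not the course's play(...) method).
# M[i, j] ∈ [-1, 1] is the row player's loss (column player's payoff); the row player
# runs MWA and the column player best-responds. Weights start uniform here (the notes
# use random weights); the function returns the time-averaged strategies.
function mwa_zero_sum(M::AbstractMatrix; η::Real = 0.01, T::Integer = 100_000)
    N, K = size(M)
    w = ones(N)
    pbar = zeros(N)
    qbar = zeros(K)
    for t in 1:T
        p = w ./ sum(w)              # step 1: row player's distribution p^(t)
        j = argmax(M' * p)           # step 2: column player's best (pure) response
        m = M[:, j]                  #         cost vector m^(t) = M q^(t) with q^(t) = e_j
        w .*= exp.(-η .* m)          # step 3: penalize costly actions
        w ./= sum(w)                 # renormalize (does not change p^(t+1))
        pbar .+= p ./ T
        qbar[j] += 1 / T
    end
    return pbar, qbar
end

M = [1.0 -1.0; -1.0 1.0]             # matching pennies, written as the row player's loss
pbar, qbar = mwa_zero_sum(M)
println("p̄ ≈ ", round.(pbar, digits = 3), ", q̄ ≈ ", round.(qbar, digits = 3))
println("row player's worst-case expected loss under p̄: ", maximum(M' * pbar))
```

Combining the regret bound for this exponential update with the fact that the column player best-responds, one can show $\max_{q}\bar{p}^{\top}\mathbf{M}q \leq \lambda^{\star} + \eta + \frac{\ln N}{\eta T}$ (using $|m_{i}^{(t)}| \leq 1$), so a smaller $\eta$ with more rounds drives $\bar{p}$ and $\bar{q}$ toward the optimal strategies; for matching pennies both approach $(1/2, 1/2)$.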

## Example: Rock-Paper-Scissors
Let's consider an example of a two-player zero-sum game: [Rock-Paper-Scissors](https://en.wikipedia.org/wiki/Rock_paper_scissors). In this game, each player _simultaneously_ chooses one of three possible actions: Rock, Paper, or Scissors. Each round has three possible outcomes: win, lose, or draw.
* __Rules__: A player who plays rock beats a player who plays scissors (`rock crushes scissors`) but loses to one who plays paper (`paper covers rock`); paper loses to scissors (`scissors cuts paper`). If both players choose the same shape, the game is a draw.

The payoff matrix for this game is the `3` $\times$ `3` matrix:
$$
\begin{align*}
\mathbf{M} = \begin{pmatrix}
0 & -1 & 1\\
1 & 0 & -1\\
-1 & 1 & 0
\end{pmatrix}
\end{align*}
$$
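where the rows correspond to the actions of the _row player_ and the columns correspond to the actions of the _column player_, in the order rock, paper, scissors. Note that, as written, $m_{ij} = 1$ when the row player's action $i$ beats the column player's action $j$ (e.g., $m_{13} = 1$: rock crushes scissors), so this matrix lists the _row player's_ payoffs. In the loss convention of the algorithm above, where $m_{ij}$ is the row player's loss, the matrix to use is therefore $-\mathbf{M}$.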

```julia
include("Include.jl"); # load my codes, packages, etc
```

__Build a model__. Let's construct an instance of [the `MyTwoPersonZeroSumGameModel` type](src/Types.jl) using [a custom `build(...)` method](src/Factory.jl). The model holds information associated with the game. 

We store the game model in the `model::MyTwoPersonZeroSumGameModel` variable:

```julia
model = let

    # setup 
    M = [0 -1 1; 1 0 -1 ; -1 1 0]; # rock paper scissors payoff matrix

    # build a model -
    model = build(MyTwoPersonZeroSumGameModel, (
        ϵ = 0.8, # learning rate
        n = 3, # number of actions
        T = 20, # number of rounds we play the game
        payoffmatrix = M, # payoff matrix
    ));

    model; # return the model
end;
```

__Play the game__. Next, we play the game. We pass the `model::MyTwoPersonZeroSumGameModel` instance into [the `play(...)` method](src/Online.jl) as the only argument. This method returns two values: the raw game output, where each row is a game instance (round) and each column is a player's action, and the weights matrix, which holds `T + 1` rows of action weights (one per round plus the final update).

```julia
(rps_sim, weights) = play(model);
```

```julia
weights
```

```
21×3 Matrix{Float64}:
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
 0.564277  0.16212   0.340581
 0.253546  0.16212   0.757978
 0.253546  0.360805  0.340581
```
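It is worth checking which sign convention the recorded weights follow. The sketch below is our own reconstruction, not the code in `src/Online.jl`: it starts from the first row of `weights` (copied from the output above), uses $\eta = 0.8$ from the model cell, and applies steps 1 to 3 of the zero-sum algorithm with the row player's loss matrix taken to be $-\mathbf{M}$, i.e., reading the rock-paper-scissors matrix as the row player's payoffs.

```julia
# Consistency check (our reconstruction, not the code in src/Online.jl): start from the
# first recorded row of `weights` and apply steps 1-3 with the row player's loss matrix
# taken to be -M, i.e. reading M from the model cell as the row player's payoff.
responses, history = let
    M = [0 -1 1; 1 0 -1; -1 1 0]       # payoff matrix from the model cell
    Lrow = -M                          # row player's loss = column player's payoff
    η = 0.8                            # learning rate ϵ from the model cell
    w = [0.564277, 0.16212, 0.340581]  # first row of `weights` above
    history = [copy(w)]
    responses = Int[]
    for t in 1:3
        p = w ./ sum(w)                       # row player's distribution p^(t)
        j = argmax(Lrow' * p)                 # column best response (1 rock, 2 paper, 3 scissors)
        push!(responses, j)
        w = w .* exp.(-η .* Lrow[:, j])       # w_i ← w_i·exp(-η·m_i) with m = Lrow·q^(t)
        push!(history, copy(w))
    end
    (responses, history)
end
println("column player's responses: ", responses)
foreach(println, history)
```

Under this reading, the column player's responses are paper, rock, scissors, matching the first three rounds of the game table below, and the weights reproduce rows 2 to 4 of the output up to the six printed digits. Each round multiplies one weight by $e^{-0.8}$ and another by $e^{0.8}$, so after three rounds the factors cancel and the weights cycle with period 3, as in the recorded output.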

__Games outcome table__. The code block below constructs the game table [using the `pretty_table(...)` method exported by the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl).
* _Summary_: Each row of the table displays the outcome of one round. The first column shows the action of the _row player_, while the second column shows the action of the _column player_, which is a best response to the row player's current mixed strategy $p^{(t)}$.

```julia
let

    # initialize -
    T = model.T;
    moves = Dict{Int, String}(1 => "rock", 2=> "paper", 3=>"scissors"); # setup moves map
    df = DataFrame();

    # build rounds table -
    for t ∈ 1:T
        row_df = (
            game = t,
            player_1 = rps_sim[t,1] |> i-> moves[i],
            player_2 = rps_sim[t,2] |> i-> moves[i],
        )
        push!(df, row_df);
    end
    
    # build a table -
    pretty_table(df, tf = tf_simple)
end
```

```
   game   player_1   player_2
  Int64     String     String
      1       rock      paper
      2   scissors       rock
      3      paper   scissors
      4       rock      paper
      5   scissors       rock
      6      paper   scissors
      7       rock      paper
      8   scissors       rock
      9      paper   scissors
     10       rock      paper
     11   scissors       rock
     12      paper   scissors
     13       rock      paper
     14   scissors       rock
     15      paper   scissors
     16       rock      paper
     17   scissors       rock
     18      paper   scissors
     19       rock      paper
     20   scissors       rock
```


### What about the min-max theorem?
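From the game table, the column player's action beats the row player's action in every round, because the column player best-responds to the row player's current strategy. To connect this with the payoff matrix, let's compute the payoff $p^{\top}\mathbf{M}q$ for a pair of pure strategies: rock for player 1 and scissors for player 2.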

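```julia
let 

    # setup -
    M = model.payoffmatrix; # M[i, j] = payoff to the row player (m_13 = 1: rock crushes scissors)

    # actions - (r,p,s)
    p = [1,0,0]; # select a move for player 1 (rock)
    q = [0,0,1]; # select a move for player 2 (scissors)

    # compute the payoff
    payoff = transpose(p)*M*q
    
    # print: player 1 receives p'Mq, player 2 receives its negative (zero-sum)
    println("Payoff for player 1: $(payoff) and player 2: $(-payoff)")
end
```

```
Payoff for player 1: 1.0 and player 2: -1.0
```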
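To see the minimax theorem at work in rock-paper-scissors, consider the uniform mixed strategy $u = (1/3, 1/3, 1/3)$. The sketch below (our illustration, using the matrix from the model cell) evaluates $u^{\top}\mathbf{M}u$ and the payoff of every pure action against $u$. Every row and column of $\mathbf{M}$ sums to zero, so each of these payoffs is $0$; since any mixed deviation is an average of pure ones, no player can gain by leaving $u$, which gives $p^{*} = q^{*} = u$ and $\lambda^{\star} = 0$.

```julia
# Minimax check for rock-paper-scissors with the matrix from the model cell.
# Because every row and column of M sums to zero, the check holds in either sign convention.
M = [0 -1 1; 1 0 -1; -1 1 0]
u = fill(1 / 3, 3)                  # uniform mixed strategy (1/3, 1/3, 1/3)

game_value = u' * M * u             # expected payoff when both players mix uniformly
vs_pure_cols = M' * u               # u'M e_j: uniform row play against each pure column action
vs_pure_rows = M * u                # e_i'M u: each pure row action against uniform column play
println("value = ", game_value)
println("against pure column actions: ", vs_pure_cols)
println("against pure row actions:    ", vs_pure_rows)
```

With the large learning rate used in the model ($\epsilon = 0.8$) and only 20 rounds, the recorded weights cycle rather than settle near $u$; the regret bound suggests using a smaller learning rate and more rounds to approach it.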


# Today?
That's a wrap! What are some of the interesting things we discussed today?