# L10c: Application of Multiplicative Weights Algorithms
In this lecture, we will explore the application of the Multiplicative Weights Update Algorithm (MWA) to zero-sum games and linear programming problems. 

> __Learning Objectives.__
> By the end of this lecture, you will be able to define and demonstrate mastery of the following key concepts:
>
> * **Zero-sum games**: A competitive scenario where one participant's gain is exactly balanced by another participant's loss, resulting in a net change of zero in total wealth or benefit. We'll use the multiplicative weights algorithm to find approximate Nash equilibria by iteratively down-weighting poorly performing actions over repeated play.
> * **Linear programming with multiplicative weights**: An approach to solving linear programs using the multiplicative weights framework, particularly effective for fractional packing and covering problems. We'll explore how to treat decision variables as "experts" and use constraint violations to guide weight updates toward feasible solutions.

These algorithms showcase the versatility of multiplicative weights beyond simple expert advice, demonstrating their power in game theory and optimization. Let's dive in!

___

## Zero-sum games
Let's consider the application of the multiplicative weights update algorithm to zero-sum games. 

> In [a zero-sum game](https://en.wikipedia.org/wiki/Zero-sum_game), players have _opposing interests_, and the players' payoffs sum to zero: one's gain is the other's loss. The multiplicative-weights (MW) algorithm finds (approximate) Nash equilibria by down-weighting poorly performing actions over repeated play.

Let's dig into some of the details of the game:
* __Game__: A set of $k$ players play a zero-sum game. During each turn of the game, each player can choose an action $a\in\mathcal{A}$ from the set of actions $\mathcal{A}$, where the number of possible actions is $|\mathcal{A}| = N$. If we consider $k = 2$, the payoff for the players is represented by a payoff matrix $\mathbf{M}\in\mathbb{R}^{N\times{N}}$. If the row player plays $i$ and the column player plays $j$, then row's payoff is $m_{ij}$ and column's is $-m_{ij}$.
* __Goals__: The goal of the row player is to _maximize_ their payoff. However, the goal of the column player is to _minimize_ the row player's payoff.  Suppose the _row player_ chooses actions according to a distribution $\mathbf{p}$, and the _column player_ chooses actions based on a distribution $\mathbf{q}$. The expected payoff for the _row player_ is: $\mathbf{p}^{\top}\mathbf{M}\mathbf{q}$ while the expected payoff for the _column player_ is: $-\mathbf{p}^{\top}\mathbf{M}\mathbf{q}$.
* __Nash Equilibrium__: In a Nash equilibrium, each player's strategy is optimal given the other player's strategy. For zero-sum games, a Nash equilibrium corresponds to a _minimax_ solution where the row player maximizes their minimum expected payoff while the column player minimizes the row player's maximum expected payoff. The multiplicative weights algorithm converges to an approximate Nash equilibrium by iteratively adjusting strategies based on observed performance.


To use multiplicative weights to solve this game, we need to look at it from a different perspective. Instead of maximizing their expected payoff, suppose the row player wants to minimize their __expected loss__ against the column player's strategy. If we define the loss matrix $\mathbf{L} = -\mathbf{M}$, then maximizing the payoff $\mathbf{p}^\top\mathbf{M}\mathbf{q}$ is equivalent to minimizing the loss $\mathbf{p}^\top\mathbf{L}\mathbf{q} = -\mathbf{p}^\top\mathbf{M}\mathbf{q}$.
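To make the payoff-to-loss conversion concrete, here is a minimal sketch using rock-paper-scissors as an illustrative payoff matrix. The matrix and the two mixed strategies are assumptions chosen for the example; they are not part of the lecture.

```python
import numpy as np

# Illustrative payoff matrix for the row player (rock, paper, scissors).
# Entry M[i, j] is the row player's payoff when row plays i and column plays j.
M = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
L = -M  # loss matrix seen by the row player

p = np.array([0.5, 0.25, 0.25])  # assumed mixed strategy for the row player
q = np.array([0.2, 0.3, 0.5])    # assumed mixed strategy for the column player

payoff = p @ M @ q  # expected payoff p^T M q
loss = p @ L @ q    # expected loss p^T L q, which equals -payoff
print(payoff, loss)
```

Maximizing `payoff` over `p` and minimizing `loss` over `p` select the same strategies, which is why the row player can be treated as a loss-minimizing learner.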

### Algorithm
Let's outline a simple implementation of the multiplicative weights update algorithm for a two-player zero-sum game. Given a payoff matrix $\mathbf{M}\in\mathbb{R}^{N\times{N}}$, we want to find a _mixed strategy_, a probability distribution over actions, for the row player that minimizes expected loss.

__Initialization:__ Given a payoff matrix $\mathbf{M}\in\mathbb{R}^{N\times{N}}$ whose payoffs (the elements of $\mathbf{M}$) are in the range $m_{ij}\in[-1, 1]$, initialize the weights $w_{i}^{(1)} \gets 1$ for all actions $i\in\mathcal{A}$, where $\mathcal{A} = \{1,2,\dots,N\}$, and set the learning rate $\eta\in(0,1)$.

> __Choosing T__: The number of rounds $T$ determines the accuracy of the approximate Nash equilibrium. To achieve an $\epsilon$-Nash equilibrium, choose $T \geq \frac{\ln N}{\epsilon^2}$ (up to a constant factor that depends on the range of the losses). For example, with $N=10$ actions and desired accuracy $\epsilon=0.1$, we need $T \geq \frac{\ln 10}{0.01} \approx 230.3$, i.e., at least $231$ rounds.

> __Choosing η__: The learning rate $\eta$ controls the step size of weight updates. Common rules of thumb include:
> - __Theory-based__: $\eta = \sqrt{\frac{\ln N}{T}}$ optimizes the convergence bound
> - __Simple rule__: $\eta = \frac{1}{\sqrt{T}}$ for practical applications  
> - __Adaptive__: Start with $\eta = 0.1$ and reduce by half if convergence stalls
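> - __Constraint__: Keep $\eta \leq 1$. The exponential update used below can never produce negative weights, but the bound matters for the linear variant $w_i\,(1-\eta\,\ell_i)$, where $\eta \leq 1$ keeps every factor non-negative when losses are bounded in $[-1,1]$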
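
The rules of thumb for $T$ and $\eta$ can be sketched in a few lines. The helper name `mw_parameters` is hypothetical, and the formulas are the ones quoted above without any extra constant factors.

```python
import math

def mw_parameters(num_actions: int, epsilon: float) -> tuple[int, float]:
    """Return (T, eta) from the rules of thumb above (hypothetical helper)."""
    # Smallest integer T with T >= ln(N) / epsilon^2.
    T = math.ceil(math.log(num_actions) / epsilon**2)
    # Theory-based learning rate eta = sqrt(ln(N) / T).
    eta = math.sqrt(math.log(num_actions) / T)
    return T, eta

T, eta = mw_parameters(num_actions=10, epsilon=0.1)
print(T, eta)
```

For $N = 10$ and $\epsilon = 0.1$ this gives $T = 231$ and a learning rate just under $0.1$.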

For each round $t=1,2,\dots,T$ __do__:
1. Compute the normalization factor: $\Phi^{(t)} \gets \sum_{i=1}^{N}w_{i}^{(t)}$.
2. __Row player__ computes its strategy: The _row player_ chooses action $i$ with probability $p_{i}^{(t)} \gets w_{i}^{(t)}/\Phi^{(t)}$ for $i = 1,2,\dots,N$.
3. __Column player__ computes its strategy: The _column player_ best-responds with the action $j^{(t)}\gets \arg\min_{j\in\mathcal{A}}\left\{\mathbf{p}^{(t)\top}\mathbf{M}\mathbf{e}_{j}\right\}$, so that $\mathbf{q}^{(t)} \gets \mathbf{e}_{j^{(t)}}$, where $\mathbf{e}_{j}$ is the $j$-th standard basis vector. The row player experiences the loss vector $\boldsymbol{\ell}^{(t)} \gets \mathbf{L}\mathbf{q}^{(t)}$.
4. Update the weights: $w_i^{(t+1)} \gets w_i^{(t)}\;\exp\bigl(-\eta\,\ell_i^{(t)}\bigr)$ for all actions $i\in\mathcal{A}$.
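
The loop above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `mw_zero_sum` is hypothetical, the rock-paper-scissors matrix is an assumed test case, and the per-round renormalization of the weights is an addition for numerical safety that leaves $\mathbf{p}^{(t)}$ unchanged.

```python
import numpy as np

def mw_zero_sum(M: np.ndarray, T: int, eta: float):
    """Row player runs multiplicative weights; column player best-responds."""
    n_rows, n_cols = M.shape
    L = -M                     # loss matrix of the row player
    w = np.ones(n_rows)        # w_i^{(1)} = 1
    p_sum = np.zeros(n_rows)   # running sums for the average strategies
    q_sum = np.zeros(n_cols)
    for _ in range(T):
        p = w / w.sum()                # row strategy p^{(t)}
        j = int(np.argmin(p @ M))      # column action minimizing row's payoff
        q = np.zeros(n_cols)
        q[j] = 1.0                     # q^{(t)} = e_j
        loss = L @ q                   # loss vector for the row player
        w = w * np.exp(-eta * loss)    # exponential weight update
        w = w / w.sum()                # rescale only (assumption: avoids overflow)
        p_sum += p
        q_sum += q
    return p_sum / T, q_sum / T

# Assumed example: rock-paper-scissors, whose equilibrium is uniform play.
M = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
T = 2000
eta = np.sqrt(np.log(3) / T)
p_bar, q_bar = mw_zero_sum(M, T, eta)
print(p_bar, q_bar)
```

The function returns the time-averaged strategies rather than the last iterate, because the guarantee is stated for the averages.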

### Convergence
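After $T$ rounds, define the average strategies:  
$$
\bar{\mathbf{p}} \;=\;\frac{1}{T}\sum_{t=1}^{T}\mathbf{p}^{(t)}, 
\quad
\bar{\mathbf{q}} \;=\;\frac{1}{T}\sum_{t=1}^{T}\mathbf{q}^{(t)}.
$$
Then $(\bar{\mathbf{p}},\bar{\mathbf{q}})$ is an $\epsilon$-Nash equilibrium. Because the row player maximizes $\mathbf{p}^{\top}\mathbf{M}\mathbf{q}$ and the column player minimizes it, the gap between the best payoff the row player can obtain against $\bar{\mathbf{q}}$ and the payoff that $\bar{\mathbf{p}}$ guarantees is small:
$$
  \max_{\mathbf{p}}\,\mathbf{p}^{\top}\mathbf{M}\,\bar{\mathbf{q}}
  \;-\;\min_{\mathbf{q}}\,\bar{\mathbf{p}}^{\top}\mathbf{M}\,\mathbf{q}
  \;\le\;\epsilon,
  \quad
  \epsilon = O\Bigl(\sqrt{\tfrac{\ln N}{T}}\Bigr).
$$
In terms of the loss matrix $\mathbf{L} = -\mathbf{M}$, the same gap reads $\max_{\mathbf{q}}\,\bar{\mathbf{p}}^{\top}\mathbf{L}\,\mathbf{q} - \min_{\mathbf{p}}\,\mathbf{p}^{\top}\mathbf{L}\,\bar{\mathbf{q}}$.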
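One way to check how close a pair of strategies is to equilibrium is to evaluate this gap directly. The sketch below uses the convention that the row player maximizes $\mathbf{p}^{\top}\mathbf{M}\mathbf{q}$, so the gap is the best payoff the row player can obtain against $\bar{\mathbf{q}}$ minus the payoff that $\bar{\mathbf{p}}$ guarantees. The helper name `nash_gap` and the rock-paper-scissors matrix are assumptions for illustration.

```python
import numpy as np

def nash_gap(M: np.ndarray, p_bar: np.ndarray, q_bar: np.ndarray) -> float:
    """Equilibrium gap for a row player who maximizes p^T M q (hypothetical helper)."""
    # A linear function over the simplex attains its extremum at a pure strategy,
    # so max over p and min over q reduce to max/min over vector entries.
    best_row_payoff = np.max(M @ q_bar)    # max_p p^T M q_bar
    guaranteed_payoff = np.min(p_bar @ M)  # min_q p_bar^T M q
    return float(best_row_payoff - guaranteed_payoff)

M = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1.0 / 3.0)
rock = np.array([1.0, 0.0, 0.0])

gap_uniform = nash_gap(M, uniform, uniform)  # uniform play is the exact equilibrium
gap_rock = nash_gap(M, rock, uniform)        # always playing rock is exploitable
print(gap_uniform, gap_rock)
```

The gap is never negative, and it is zero exactly at a Nash equilibrium.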

___

## Linear Programming
Next, let's explore the application of the multiplicative weights update algorithm to linear programming problems. Let $\Delta_{m} = \{\mathbf{x} \in \mathbb{R}_{\geq{0}}^{m} \mid \sum_{i=1}^{m}x_{i} = \tau\}$ be the set of $m$-dimensional vectors with non-negative entries that sum to $\tau$ (a scaled simplex). Then, we want to solve the following linear feasibility problem:
$$
\begin{align*}
\text{Find} &\quad \mathbf{x} \in \Delta_{m} \\
\text{subject to} &\quad \mathbf{A}\mathbf{x} \leq \mathbf{b}
\end{align*}
$$
This formulation may seem restrictive, but we can convert _most_ linear programs into this form. A famous (Cornell) multiplicative weights update algorithm solves this problem: [Plotkin, Serge A., et al. "Fast Approximation Algorithms for Fractional Packing and Covering Problems." Mathematics of Operations Research, vol. 20, no. 2, 1995, pp. 257–301](https://www.jstor.org/stable/3690406?socuuid=57de56c3-135d-4376-9af5-be0257a4c2d8)
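
To make the problem statement concrete, the following sketch checks whether a candidate vector lies in $\Delta_{m}$ and satisfies $\mathbf{A}\mathbf{x} \leq \mathbf{b}$. The helper name `is_feasible`, the numerical tolerance `tol`, and the small two-variable instance are assumptions for illustration.

```python
import numpy as np

def is_feasible(A, b, x, tau, tol=1e-9) -> bool:
    """Check x >= 0, sum(x) == tau, and A x <= b, up to a small tolerance."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    in_simplex = np.all(x >= -tol) and abs(x.sum() - tau) <= tol
    satisfies_constraints = np.all(A @ x <= b + tol)
    return bool(in_simplex and satisfies_constraints)

# Assumed instance: x1 <= 0.3, x2 <= 0.9, with x1 + x2 = tau = 1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([0.3, 0.9])

ok = is_feasible(A, b, [0.2, 0.8], tau=1.0)   # inside the simplex and within both bounds
bad = is_feasible(A, b, [0.5, 0.5], tau=1.0)  # violates x1 <= 0.3
print(ok, bad)
```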


### Algorithm
Let's outline an implementation [inspired by a lecture by Prof. Saranurak at the University of Michigan](https://www.youtube.com/watch?v=5u8wYZjsHuc&t=3190s). Given a constraint matrix $\mathbf{A}\in\mathbb{R}^{n\times{m}}$ and a right-hand side vector $\mathbf{b}\in\mathbb{R}^{n}$, we want to find a solution $\mathbf{x}\in\Delta_{m}$ such that $\mathbf{A}\mathbf{x} \leq \mathbf{b}$. We assume that the entries of $\mathbf{A}$ are bounded by $-\rho\leq{a_{ij}}\leq{\rho}$ for all $i,j$.

__Initialization__: We have $m$ experts (one for each unknown variable $x_{i}$). Each expert has a weight $w_{i}^{(t)}$ at round $t$. The weights are initialized to $w_{i}^{(1)}=1$ for all experts. Specify a learning rate $\eta\in{(0,1)}$, a convergence flag $\texttt{converged} \gets \texttt{false}$, a small tolerance $\epsilon > 0$, a maximum number of iterations $T \gets \lceil\ln(m)/\epsilon^2\rceil$, and an iteration counter $t\gets 1$.

> __Choosing ε__: The tolerance $\epsilon$ determines the maximum acceptable constraint violation in the final solution. Rules of thumb include:
> - __Problem-dependent__: Set $\epsilon$ based on the physical or business meaning of constraint violations (e.g., if constraints represent capacity limits, choose $\epsilon$ as a small fraction of those limits)
> - __Relative tolerance__: Use $\epsilon = 0.01 \times \min_k |b_k|$ (1% of the smallest constraint bound)
> - __Absolute tolerance__: For normalized problems, $\epsilon = 0.01$ or $\epsilon = 0.001$ often work well
> - __Computational trade-off__: Smaller $\epsilon$ requires more iterations ($T \propto 1/\epsilon^2$) but gives higher accuracy
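
The computational trade-off is easy to see numerically. This sketch evaluates the iteration budget $T = \lceil\ln(m)/\epsilon^2\rceil$ from the initialization step for a few tolerances; the helper name `iteration_budget` and the choice $m = 100$ are assumptions for illustration.

```python
import math

def iteration_budget(num_variables: int, epsilon: float) -> int:
    """Maximum number of rounds T = ceil(ln(m) / epsilon^2) (hypothetical helper)."""
    return math.ceil(math.log(num_variables) / epsilon**2)

m = 100
budgets = {eps: iteration_budget(m, eps) for eps in (0.1, 0.01, 0.001)}
for eps, T in budgets.items():
    print(f"epsilon = {eps:g}: T = {T}")
```

Each tenfold reduction in $\epsilon$ multiplies the budget by roughly one hundred, so a tolerance tighter than the application needs can be expensive.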

While not $\texttt{converged}$ __do__:
1. Compute the normalization factor: $\Phi^{(t)} \gets \sum_{i=1}^{m}w_{i}^{(t)}$.
1. For each expert $i = 1,2,\dots,m$ __do__:
    - Compute a candidate solution : $x_{i}^{(t)} \gets \left({\tau}/{\Phi^{(t)}}\right)\;w_{i}^{(t)}$
1. For each constraint $k = 1,2,\dots,n$ __do__:
    - Compute the constraint value: $r_{k}^{(t)} \gets \sum_{i=1}^{m}a_{ki}x_{i}^{(t)} - b_{k} - 2\epsilon$.
        > __Slack__: The extra $2\epsilon$ slack guarantees that, after $T$ rounds, any remaining violation is $\le\epsilon$.
2. Check for convergence:
    - If all $r_{k}^{(t)} \leq 0$ for $k = 1,2,\dots,n$, then $\mathbf{x}^{(t)}$ satisfies every constraint to within the slack and is accepted as a feasible solution. Set $\texttt{converged} \gets \texttt{true}$ and return the $\mathbf{x}^{(t)}$ vector. 
    - Otherwise, if $t \geq T$, then we've run out of iterations. Set $\texttt{converged} \gets \texttt{true}$ and stop with a warning: the problem may be $\texttt{infeasible}$. 
    - Otherwise, $\max_k r_k^{(t)}>0$: let $V=\{k : r_k^{(t)}>0\}$ be the set of violated constraints and perform the weight update.
3. For each violated constraint $k \in V$, update the weights of _all experts_ using the update rule:
    $$
    \begin{align*}
    w_{j}^{(t+1)} \gets w_{j}^{(t)}\cdot\left(1-\eta\cdot{a_{k,j}}\right) \quad j = 1,2,\dots,m
    \end{align*}
    $$
    > __Bound__: We assume $|a_{ij}|\le\rho$. To keep every factor $1-\eta\,a_{k,j}$, and hence every weight, non-negative, we require $\eta\le{1/\rho}$. 
4. Update the iteration counter: $t \gets t + 1$.
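
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's or the lab's implementation: the function name `mw_lp_feasibility` is hypothetical, the default $\eta = 1/(2\rho)$ is one choice that satisfies $\eta \le 1/\rho$, several violated constraints are handled by multiplying their factors together, and the weights are rescaled each round purely for numerical safety.

```python
from math import ceil, log
import numpy as np

def mw_lp_feasibility(A, b, tau=1.0, eps=0.05, eta=None, max_iter=None):
    """Search for x in the scaled simplex with A x <= b + 2*eps (sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = A.shape
    rho = np.abs(A).max()                       # bound on |a_ij|
    if eta is None:
        eta = 0.5 / rho if rho > 0 else 0.5     # assumed default, eta <= 1/rho
    if max_iter is None:
        max_iter = max(1, ceil(log(m) / eps**2))  # T = ceil(ln(m) / eps^2)
    w = np.ones(m)                              # one weight per variable
    x = tau * w / w.sum()
    for _ in range(max_iter):
        x = tau * w / w.sum()                   # candidate solution
        r = A @ x - b - 2.0 * eps               # constraint values with slack
        violated = r > 0
        if not violated.any():
            return x, True                      # all constraints met within slack
        # Multiply each weight by (1 - eta * a_kj) for every violated k.
        w = w * np.prod(1.0 - eta * A[violated], axis=0)
        w = w / w.sum()                         # rescale only; x is unchanged
    return x, False                             # budget exhausted, may be infeasible

# Assumed instance: x1 <= 0.3, x2 <= 0.9, with x1 + x2 = 1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([0.3, 0.9])
eps = 0.05
x, converged = mw_lp_feasibility(A, b, tau=1.0, eps=eps)
print(x, converged)
```

Note that a returned solution satisfies $\mathbf{A}\mathbf{x} \leq \mathbf{b} + 2\epsilon$ rather than $\mathbf{A}\mathbf{x} \leq \mathbf{b}$ exactly, because the convergence check includes the slack term.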
___

## Lab
In the associated lab, you will implement the Multiplicative Weights Update Algorithm to solve a linear programming problem. 

## Summary

In this notebook, we've explored powerful applications of the multiplicative weights framework beyond simple expert advice, demonstrating its versatility in game theory and constrained optimization:

> __Key takeaways:__
>
> * **Zero-sum games and Nash equilibria**: The multiplicative weights algorithm provides a practical approach to finding approximate Nash equilibria in two-player zero-sum games by treating actions as experts and iteratively down-weighting poorly performing strategies. By converting payoff maximization into loss minimization, the row player adapts its mixed strategy over time while the column player responds optimally. The averaged strategies converge to an approximate equilibrium whose error shrinks as $O\bigl(\sqrt{\ln N/T}\bigr)$ in the number of rounds $T$.
> * **Linear programming via multiplicative weights**: The algorithm extends naturally to fractional packing and covering problems by treating decision variables as experts and using constraint violations to guide weight updates. Each violated constraint triggers multiplicative penalties proportional to the constraint coefficients, gradually steering the solution toward feasibility while maintaining the simplex constraint that variables remain non-negative and sum to a specified total.
> * **Convergence guarantees and parameter selection**: Both applications converge with careful parameter tuning. In the linear update used for linear programs, the learning rate must respect the coefficient bound ($\eta\le 1/\rho$) to prevent negative weights; iteration counts scale logarithmically with problem size and inversely with the square of the desired accuracy; and slack tolerances ensure that approximate solutions satisfy constraints within acceptable margins after finitely many rounds.

These applications showcase how the multiplicative weights framework provides unified computational strategies for diverse problems in game theory and optimization where iterative adaptation and probabilistic reasoning lead to provably good approximate solutions.

___