In this notebook, we'll

* Define Shapley values.
* Implement them for toy examples.
* Define Baseline Shapley values.
* Implement them for toy examples.

# Definition of Shapley value

Let 

* $N$ be the set of all $n$ players, i.e. $n = |N|$.
* $\mathcal{P}$ be all permutations of $N$.
* $P$ be a permutation of $N$.
* $d$ be the 0-based index of player $i$ in permutation $P$.
* $P[:d+1]$ be the set of players up to and including $i$ in $P$ (Python-style slicing).
* $P[:d]$ be the set of players strictly before $i$ in $P$, i.e. without $i$.
* $v$ be a value function that maps a set of players to a real number, $v: \text{Power Set}(N) \rightarrow \mathbb{R}$.
* $\phi_i(N, v)$ be the Shapley value of player $i$ given $N$ and $v$.

One way of formulating the Shapley value is

\begin{align*}
\phi_i(N, v) = \frac{1}{n!} \sum_{P \in \mathcal{P}} \Big( v ( P[:d+1] ) - v( P[:d] ) \Big)
\end{align*}


So the Shapley value can be interpreted as the average marginal contribution of player $i$ over all permutations of the players.
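The permutation form translates almost line by line into code. The following is a minimal sketch, not the `shapley` module used later in this notebook; the names `shapley_by_permutations` and `toy_v` are made up for illustration, and the value function is assumed to accept a `frozenset` of players.

```python
from itertools import permutations
from math import factorial


def shapley_by_permutations(players, v, i):
    """Average marginal contribution of player i over all orderings (hypothetical helper)."""
    players = sorted(players)
    total = 0.0
    for P in permutations(players):
        d = P.index(i)  # 0-based position of i in P
        # v(P[:d+1]) - v(P[:d]): value with i minus value without i
        total += v(frozenset(P[:d + 1])) - v(frozenset(P[:d]))
    return total / factorial(len(players))


def toy_v(S):
    """Hypothetical toy game: the value of a coalition is its size squared."""
    return len(S) ** 2


print(shapley_by_permutations({"a", "b"}, toy_v, "a"))  # → 2.0
```

With two players, `a` contributes 1 when it goes first and 3 when it goes second, so the average is 2. The loop visits all $n!$ permutations, so this is only practical for very small $n$.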

Let $S$ be the set of players in $P[:d]$, so that $S \cup \{i\}$ is the set of players in $P[:d + 1]$, and let $s = |S|$. Then $v(S) = v(P[:d])$ and $v(S \cup \{i\}) = v(P[: d + 1])$. Note that $P$ is ordered while $S$ is not.

Notice that once $S$ is fixed, the ordering of the players within $S$ does not affect $v(S)$, and there are $s!$ such orderings. Similarly, the ordering of the players after $i$ does not matter either, and there are $(n - s - 1)!$ such orderings. So exactly $s!(n - s - 1)!$ of the $n!$ permutations place the players of $S$ before $i$ and everyone else after $i$, and they all give the same marginal contribution. Therefore, the Shapley value can also be written in a more common form,

\begin{align*}
\phi_i(N, v)
&= \sum _{S \subseteq N \backslash \{i\}} \frac{s!(n - s - 1)!}{n!} \Big(v(S \cup \{i\}) - v(S) \Big) \\
&= \sum _{S \subseteq N \backslash \{i\}} \frac{|S|!(|N| - |S| - 1)!}{|N|!} \Big(v(S \cup \{i\}) - v(S) \Big) \\
\end{align*}

In this form, the Shapley value is interpreted as the weighted average of the marginal contributions of $i$ over all possible sets $S$ of players preceding $i$. The weight of each $S$ is $\frac{s!(n - s - 1)!}{n!}$, the fraction of permutations in which exactly the players of $S$ come before $i$.
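The subset form can be sketched in the same way. Again, this is an illustrative sketch rather than the implementation in the `shapley` module; `shapley_by_subsets` and `toy_v` are hypothetical names, and player `i` is assumed to be a member of `players`.

```python
from itertools import combinations
from math import factorial


def shapley_by_subsets(players, v, i):
    """Weighted sum of marginal contributions over all S ⊆ N \\ {i} (hypothetical helper)."""
    others = sorted(set(players) - {i})
    n = len(others) + 1
    total = 0.0
    for s in range(n):  # s = |S| runs from 0 to n - 1
        weight = factorial(s) * factorial(n - s - 1) / factorial(n)
        for S in combinations(others, s):
            S = frozenset(S)
            total += weight * (v(S | {i}) - v(S))
    return total


def toy_v(S):
    """Hypothetical toy game: the value of a coalition is its size squared."""
    return len(S) ** 2


print(shapley_by_subsets({"a", "b"}, toy_v, "a"))  # → 2.0
```

This visits $2^{n-1}$ subsets instead of $n!$ permutations, which is still exponential but considerably cheaper.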

A third form comes from rewriting the factorials directly as a binomial coefficient,

\begin{align*}
\phi_i(N, v)
&= \sum _{S \subseteq N \backslash \{i\}} \frac{s!(n - s - 1)!}{n!} \Big(v(S \cup \{i\}) - v(S) \Big) \\
&= \frac{1}{n} \sum _{S \subseteq N \backslash \{i\}} \frac{s!(n - s - 1)!}{(n - 1)!} \Big(v(S \cup \{i\}) - v(S) \Big) \\
&= \frac{1}{n} \sum _{S \subseteq N \backslash \{i\}} \binom{n - 1}{s}^{-1} \Big(v(S \cup \{i\}) - v(S) \Big) \\
&= \frac{1}{n} \sum_{t = 0}^{n - 1} \binom{n - 1}{t}^{-1} \sum _{\substack{S \subseteq N \backslash \{i\} \\ \text{s.t. } |S| = t}} \Big(v(S \cup \{i\}) - v(S) \Big) \\
\end{align*}

Note,

* In the 3rd equality, $\binom{n - 1}{s}$ is the number of ways to pick $s$ players out of the $n - 1$ players other than $i$. In the 4th equality we therefore group the summands by the possible values of $s$, which range from $0$ (pick nobody) to $n - 1$ (pick all players but $i$).

We used $t$ instead of $s$ in the 4th equality to avoid confusion because by definition $s$ is always equal to $|S|$. More specifically, the difference between the last two equalities is that in Equality 3, we pick $S$ first, and then calculate its size $s$, while in Equality 4, we determine the size of the $S$ to pick first and then pick $S$ of only that size.
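The last equality reads as "pick a coalition size $t$ uniformly, then average the marginal contribution of $i$ over all coalitions of that size". A small sketch of that reading, using `math.comb` for the binomial coefficient; `shapley_by_size` and `toy_v` are hypothetical names chosen for this example.

```python
from itertools import combinations
from math import comb


def shapley_by_size(players, v, i):
    """Shapley value of i, grouping coalitions S ⊆ N \\ {i} by their size t (hypothetical helper)."""
    others = sorted(set(players) - {i})
    n = len(others) + 1
    total = 0.0
    for t in range(n):  # t = 0, ..., n - 1
        # Sum of marginal contributions over all S with |S| = t
        inner = sum(
            v(frozenset(S) | {i}) - v(frozenset(S))
            for S in combinations(others, t)
        )
        total += inner / comb(n - 1, t)  # average over the C(n-1, t) coalitions of size t
    return total / n  # average over the n possible sizes


def toy_v(S):
    """Hypothetical toy game: the value of a coalition is its size squared."""
    return len(S) ** 2


print(shapley_by_size({"a", "b"}, toy_v, "a"))  # → 2.0
```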

# Properties of Shapley values

Based on https://www.youtube.com/watch?v=9OFMRiAVH-w, Shapley values satisfy the following properties (axioms?).

* **Symmetry**: if for all $S$ that contain neither player $i$ nor player $j$, $v(S \cup \{i\}) = v(S \cup \{j\})$, then $\phi_i(N, v) = \phi_j(N, v)$.
* **Dummy player**: if for all $S$ that doesn't contain $i$, $v(S \cup \{i\}) = v(S)$, then $\phi_i(N, v) = 0$.
* **Additivity**: if $v$ can be decomposed into two parts, $v = v_1 + v_2$, then $\phi_i(N, v_1 + v_2) = \phi_i(N, v_1) + \phi_i(N, v_2)$.
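The three properties can be checked numerically on a small game. The sketch below is only a sanity check, not a proof; the helper `shapley_value` and the games `v1`, `v2` are made up for this example (in `v1`, players `a` and `b` are interchangeable and `c` is a dummy).

```python
from itertools import combinations
from math import factorial, isclose


def shapley_value(players, v, i):
    """Subset form of the Shapley value of player i (hypothetical helper)."""
    others = sorted(set(players) - {i})
    n = len(others) + 1
    total = 0.0
    for s in range(n):
        weight = factorial(s) * factorial(n - s - 1) / factorial(n)
        for S in combinations(others, s):
            S = frozenset(S)
            total += weight * (v(S | {i}) - v(S))
    return total


players = {"a", "b", "c"}


def v1(S):
    # Worth 1 only when both a and b are present; c never changes the value.
    return 1.0 if {"a", "b"} <= set(S) else 0.0


def v2(S):
    # Every player adds exactly 1.
    return float(len(S))


def v_sum(S):
    return v1(S) + v2(S)


phi1 = {p: shapley_value(players, v1, p) for p in players}
phi2 = {p: shapley_value(players, v2, p) for p in players}
phi_sum = {p: shapley_value(players, v_sum, p) for p in players}

assert isclose(phi1["a"], phi1["b"])                                  # symmetry
assert isclose(phi1["c"], 0.0, abs_tol=1e-12)                         # dummy player
assert all(isclose(phi_sum[p], phi1[p] + phi2[p]) for p in players)   # additivity
```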


The supplemental material of [A Unified Approach to Interpreting Model Predictions](https://papers.nips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html) additionally lists **Efficiency**: $\sum_{i \in N} \phi_i(N, v) = v(N) - v(\emptyset)$, which is quite straightforward. There, Dummy player is known as **Null effects** and Additivity is known as **Linearity**.
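A rough sketch of why Efficiency holds, using the permutation form from the definition section. The position index $k$ is introduced only for this sketch. For a fixed permutation $P$, the marginal contributions of all $n$ players telescope:

\begin{align*}
\sum_{k=0}^{n-1} \Big( v(P[:k+1]) - v(P[:k]) \Big) = v(P[:n]) - v(P[:0]) = v(N) - v(\emptyset)
\end{align*}

Summing the Shapley values over all players and swapping the two sums then gives

\begin{align*}
\sum_{i \in N} \phi_i(N, v)
&= \frac{1}{n!} \sum_{P \in \mathcal{P}} \sum_{k=0}^{n-1} \Big( v(P[:k+1]) - v(P[:k]) \Big) \\
&= \frac{1}{n!} \cdot n! \cdot \Big( v(N) - v(\emptyset) \Big) \\
&= v(N) - v(\emptyset)
\end{align*}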

<span style="color:red">TODO: It'd be nice to learn the proofs</span>

Based on the interpretable ML community (e.g. https://hughchen.github.io/its_blog/index.html#shapley_values):

* **Local Accuracy/Efficiency**: the sum of Shapley values adds up to $v(N) - v(\emptyset)$.
* **Consistency/Monotonicity**: TBD (compare to [A Unified Approach to Interpreting Model Predictions](https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html)).
* **Missingness**: same as the Dummy player property above.
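As a placeholder for the TBD item, here is my current understanding of the monotonicity (consistency) property in this notebook's notation. It should still be checked against Young's paper and the SHAP supplement. The second value function $v'$ is notation introduced only for this statement.

\begin{align*}
&\text{If } v'(S \cup \{i\}) - v'(S) \ge v(S \cup \{i\}) - v(S) \text{ for all } S \subseteq N \backslash \{i\}, \\
&\text{then } \phi_i(N, v') \ge \phi_i(N, v).
\end{align*}

For the Shapley value itself this follows from the subset form: every weight $\frac{s!(n - s - 1)!}{n!}$ is non-negative, so increasing each marginal contribution cannot decrease the weighted sum.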

Read [Monotonic solutions of cooperative games](https://link.springer.com/article/10.1007/BF01769885#:~:text=The%20principle%20of%20monotonicity%20for,player's%20allocation%20should%20not%20decrease) for more related ideas.

Based on the supplemental material, it looks like the following holds.

The Shapley value has four properties: efficiency, symmetry, null effects, and linearity. Young's 1985 paper shows that monotonicity implies linearity and null effects, and the "A Unified Approach to Interpreting Model Predictions" paper shows that monotonicity also implies symmetry. The properties of the Shapley value can therefore be summarized as just

* efficiency and
* monotonicity

For some reason, the "A Unified Approach to Interpreting Model Predictions" paper keeps the missingness property, which seems identical to the null effects property. The paper says the property is required to adapt the Shapley proofs to the class of additive feature attribution methods. *So to understand why it's kept, it's essential to go through all the proofs!!*

More ref: Question about missingness (https://github.com/slundberg/shap/issues/175).

# Implementation of Shapley values

We use this implementation to reproduce the calculations at https://hughchen.github.io/its_blog/index.html#shapley_values for Preset D. The players keep the names Ava, Ben, and Cat; we abbreviate them to a, b, c only in the variable names (ϕ_a, ϕ_b, ϕ_c).

In [1]:
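from typing import Set

import numpy as np  # needed for the np.testing checks below

import shapley

# `magic()` is deprecated in recent IPython versions; `run_line_magic` is the replacement.
get_ipython().run_line_magic("load_ext", "autoreload")
get_ipython().run_line_magic("autoreload", "2")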

In [2]:
def value_func(players: Set[shapley.Player]) -> float:
    """Return the value of a coalition in Preset D; keys are sorted tuples of player names."""
    values = {
        (): 1,
        ("Ava",): 0,
        ("Ben",): 1,
        ("Cat",): 1,
        ("Ava", "Ben"): 1,
        ("Ava", "Cat"): 1,
        ("Ben", "Cat"): 2,
        ("Ava", "Ben", "Cat"): 2,
    }
    return values[tuple(sorted(players))]

In [3]:
coalition = {"Ava", "Ben", "Cat"}

In [4]:
ϕ_a = shapley.shapley(coalition, value_func, "Ava")
ϕ_a

-0.3333333333333333

In [5]:
ϕ_b = shapley.shapley(coalition, value_func, "Ben")
ϕ_b

0.6666666666666666

In [6]:
ϕ_c = shapley.shapley(coalition, value_func, "Cat")
ϕ_c

0.6666666666666666

In [7]:
np.testing.assert_allclose(ϕ_a, -1/3)
np.testing.assert_allclose(ϕ_b, 2/3)
np.testing.assert_allclose(ϕ_c, 2/3)

In [10]:
# Efficiency: the Shapley values sum to v(N) - v(∅); compare with a float tolerance.
np.testing.assert_allclose(ϕ_a + ϕ_b + ϕ_c, value_func(coalition) - value_func(set()))
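
As a hand check of ϕ_a, the subset form can be evaluated directly on the value table in `value_func`. This is just the arithmetic written out, with $A$, $B$, $C$ standing for Ava, Ben, Cat (abbreviations used only here):

\begin{align*}
\phi_A
&= \tfrac{0!\,2!}{3!} \big(v(\{A\}) - v(\emptyset)\big) + \tfrac{1!\,1!}{3!} \big(v(\{A, B\}) - v(\{B\})\big) \\
&\quad + \tfrac{1!\,1!}{3!} \big(v(\{A, C\}) - v(\{C\})\big) + \tfrac{2!\,0!}{3!} \big(v(\{A, B, C\}) - v(\{B, C\})\big) \\
&= \tfrac{1}{3}(0 - 1) + \tfrac{1}{6}(1 - 1) + \tfrac{1}{6}(1 - 1) + \tfrac{1}{3}(2 - 2) \\
&= -\tfrac{1}{3}
\end{align*}

This agrees with the computed value of -0.3333333333333333.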