# A Unified Approach to Interpreting Model Predictions

https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf

## Additive feature attribution methods

##### Original prediction model $f$ and explanation model $g$

Note that $f$ is NOT the true function that generated the data; it is the model being explained, e.g. a trained ML model.

Let $\mathbf{x}'$ be the simplified features, which map back to the original features via

$$\mathbf{x} = h_\mathbf{x}(\mathbf{x}')$$

The explanation model should then satisfy

$$g(\mathbf{z}') \approx f(h_\mathbf{x}(\mathbf{z}'))$$

whenever $\mathbf{z}' \approx \mathbf{x}'$.
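
As a minimal sketch of the mapping $h_\mathbf{x}$, assume that a simplified feature set to 1 keeps the original value and one set to 0 is replaced by a fixed background value. This is one possible choice, not something fixed by the notes above, and the names `make_h_x` and `background` are hypothetical.

```python
import numpy as np

def make_h_x(x, background):
    """Build h_x, which maps a binary vector z' to the original input space.

    Assumption: z'_i = 1 keeps the original value x_i, while z'_i = 0
    replaces it with a fixed background value (e.g. a training-set mean).
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)

    def h_x(z_prime):
        z_prime = np.asarray(z_prime, dtype=bool)
        return np.where(z_prime, x, background)

    return h_x

x = [3.0, 5.0, 7.0]
h_x = make_h_x(x, background=[0.0, 1.0, 2.0])
recovered = h_x([1, 1, 1])  # all simplified features present: recovers x
masked = h_x([1, 0, 1])     # second feature replaced by its background value
```

With $\mathbf{x}' = (1, 1, 1)$ the mapping returns $\mathbf{x}$ itself, which is the relation $\mathbf{x} = h_\mathbf{x}(\mathbf{x}')$.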

##### Additive feature attribution methods

$$
g(\mathbf{z}') = \phi_0 + \sum_{i=1}^M \phi_i z_i'
$$

where

* $g$ is the explanation model
* $\mathbf{z}' \in \{0, 1\}^M$, i.e. an $M$-dimensional binary vector, with $M$ being the number of simplified features
* $\phi_i \in \mathbb{R}$ is the effect attributed to simplified feature $i$
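
The additive form can be sketched in a few lines. The values of `phi_0` and `phi` below are made up purely for illustration.

```python
import numpy as np

def g(z_prime, phi_0, phi):
    """Additive explanation model: phi_0 + sum_i phi_i * z'_i."""
    z_prime = np.asarray(z_prime, dtype=float)
    return phi_0 + float(np.dot(phi, z_prime))

phi_0 = 0.5                          # hypothetical base value
phi = np.array([1.0, -2.0, 0.25])    # hypothetical attributions
all_on = g([1, 1, 1], phi_0, phi)    # 0.5 + 1.0 - 2.0 + 0.25
none_on = g([0, 0, 0], phi_0, phi)   # only the base value remains
```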

| Method                           | Variant                      | Note                                                                | Model to explain | Simplified inputs    |
|----------------------------------|------------------------------|---------------------------------------------------------------------|------------------|----------------------|
| LIME                             |                              |                                                                     | black box        | interpretable inputs |
| DeepLIFT                         |                              |                                                                     | DNN              |                      |
| Layer-wise relevance propagation |                              |                                                                     | DNN              |                      |
| Classic Shapley Value Estimation | Shapley regression values    | needs retraining the model on every subset of features              |                  |                      |
|                                  | Shapley sampling values      | applies a sampling approximation to Shapley regression values       |                  |                      |
|                                  | Quantitative Input Influence | another sampling approximation to Shapley regression values         |                  |                      |

## Simple properties uniquely determine additive feature attributions

##### Property 1: Local accuracy

$$
f(\mathbf{x}) = g(\mathbf{x}') = \phi_0 + \sum_{i=1}^M \phi_i x_i'
$$

which means the explanation model output should match the prediction model output when $\mathbf{z}' = \mathbf{x}'$, and hence $\mathbf{x} = h_\mathbf{x}(\mathbf{x}') = h_\mathbf{x}(\mathbf{z}')$.

Note:

* $\phi_0 = f_{\mathbf{x}}(\mathbf{0}) = f(h_\mathbf{x}(\mathbf{0}))$, i.e. the model output when no features are provided. In SHAP this is taken to be $\mathbb{E}[f(\mathbf{x})]$, e.g. the average model prediction over the training set.

##### Property 2: Missingness

When $x_i' = 0$, then $\phi_i = 0$, i.e. a feature that is missing from the simplified input should have no impact on the prediction.

##### Property 3: Consistency

Let $y(\mathbf{z}') = f(h_\mathbf{x}(\mathbf{z}'))$, and let $\mathbf{z}'_{\backslash i}$ denote setting $z_i' = 0$ in the simplified binary feature vector. For two models $y_A$ and $y_B$, if for all $\mathbf{z}' \in \{0, 1\}^M$

$$
y_A(\mathbf{z}') - y_A(\mathbf{z}'_{\backslash i}) \ge y_B(\mathbf{z}') - y_B(\mathbf{z}'_{\backslash i})
$$

which can be expanded to $f_A(h_\mathbf{x}(\mathbf{z}')) - f_A(h_\mathbf{x}(\mathbf{z}'_{\backslash i})) \ge f_B(h_\mathbf{x}(\mathbf{z}')) - f_B(h_\mathbf{x}(\mathbf{z}'_{\backslash i}))$

then the corresponding impact of the $i$th feature in the two models should satisfy

$$
\phi_{A, i} \ge \phi_{B, i}
$$

where the subscripts $A$ and $B$ identify which model each $\phi_i$ belongs to.

In words, consistency means that for two models, if the exclusion of a feature results in a larger reduction in the predicted value in model A than in model B, then this feature should have a bigger impact in model A than in model B, too.

##### Theorem 1

Only one explanation model of the **additive feature attribution** form satisfies all three properties:

$$
\phi_i(f, \mathbf{x}) = \frac{1}{M} \sum_{\mathbf{z}' \subseteq  \mathbf{x}'} \binom{M - 1}{|\mathbf{z}'|}^{-1} \Big [ f(h_\mathbf{x}(\mathbf{z}')) - f(h_\mathbf{x}(\mathbf{z}_{\backslash i}')) \Big ]
$$

Note:

* when $x_i' = 0$, every $\mathbf{z}' \subseteq \mathbf{x}'$ has $z_i' = 0$, so $\mathbf{z}' = \mathbf{z}'_{\backslash i}$, each difference in the sum vanishes, and $\phi_i(f, \mathbf{x}) = 0$.
* when $x_i' = 1$, $\mathbf{z}' \subseteq \mathbf{x}'$ ranges over all $\mathbf{z}'$ vectors whose non-zero entries are a subset of the non-zero entries of $\mathbf{x}'$, with $z_i' = 1$. This corresponds to all subsets of the present features of $\mathbf{x}$ that include feature $i$.

Note on symbols.

* $M - 1$ because of exclusion of feature $i$.
* $f(h_\mathbf{x}(\mathbf{z}'))$ doesn't depend on $i$.
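* $|\mathbf{z}'|$ should be read as the number of non-zero elements of $\mathbf{z}'$ minus 1 (as $z_i'$ is always equal to 1), i.e. the number of present features other than $i$. The original paper is a bit unclear about this: if $|\mathbf{z}'|$ counted feature $i$, then $|\mathbf{z}'| = M$ would give $\binom{M - 1}{M} = 0$, whose inverse is undefined.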

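The equation can be interpreted as a weighted sum of the marginal contributions of feature $i$ over all subsets of the other features: for each subset size, the contributions are averaged over the $\binom{M - 1}{|\mathbf{z}'|}$ subsets of that size, and the factor $\frac{1}{M}$ then averages over the $M$ possible sizes $0, \dots, M - 1$.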
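
The formula can be checked by brute force on a toy example. The sketch below enumerates every subset that excludes feature $i$ and uses the size of that subset in the binomial weight, following the reading of $|\mathbf{z}'|$ given in the notes above. The model `f_x` is a hypothetical function of the binary vector $\mathbf{z}'$, standing in for $f(h_\mathbf{x}(\mathbf{z}'))$.

```python
from itertools import combinations
from math import comb

def shapley_values(f_x, M):
    """Exact Shapley values by enumerating subsets (exponential in M).

    f_x takes a tuple z' in {0,1}^M and returns f(h_x(z')).
    """
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):  # size of the subset that excludes i
            weight = 1.0 / (M * comb(M - 1, size))
            for S in combinations(others, size):
                without_i = [0] * M
                for j in S:
                    without_i[j] = 1
                with_i = list(without_i)
                with_i[i] = 1
                phi[i] += weight * (f_x(tuple(with_i)) - f_x(tuple(without_i)))
    return phi

# Hypothetical toy model with an interaction between the two features.
def f_x(z):
    return 1.0 + 2.0 * z[0] + 3.0 * z[1] + 4.0 * z[0] * z[1]

M = 2
phi = shapley_values(f_x, M)   # the interaction term is split equally
phi_0 = f_x((0,) * M)          # output with all features missing
```

Here `phi_0 + sum(phi)` equals `f_x((1, 1))`, which is the local accuracy property.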

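The above equation can also be written as

$$
\phi_i(f, \mathbf{x}) = \frac{1}{M} \sum_{R \in \mathcal{R}} \frac{1}{(M - 1)!} \Big[ f_\mathbf{x} \left(P_i^R \cup \{i\} \right) - f_\mathbf{x}\left(P_i^R \right)\Big]
$$

or

$$
\phi_i(f, \mathbf{x}) = \frac{1}{M} \sum_{S \subseteq \mathcal{F} \setminus \{i\}} \binom{|\mathcal{F}| - 1}{|S|} ^{-1} \left(f_\mathbf{x} (S \cup \{i\}) - f_\mathbf{x}(S) \right)
$$

where

* $\mathcal{R}$ is the set of all feature orderings. The weight $\frac{1}{M} \cdot \frac{1}{(M - 1)!} = \frac{1}{M!}$ averages over $M!$ orderings, so each ordering $R$ contains all $M$ features, including feature $i$.
* $P_i^R$ is the set of features that come before feature $i$ in ordering $R$
* $\mathcal{F}$ is the set of all $M$ features, and $f_\mathbf{x}(S) = f(h_\mathbf{x}(\mathbf{z}'))$ with $\mathbf{z}'$ having ones exactly for the features in $S$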
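
The ordering form can be sketched as follows, assuming $\mathcal{R}$ ranges over the orderings of all $M$ features (so there are $M!$ of them, matching the total weight $\frac{1}{M} \cdot \frac{1}{(M - 1)!}$). The model `f_x` is a hypothetical set function used only for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_by_orderings(f_x, M):
    """Average each feature's marginal contribution over all M! orderings.

    f_x takes a frozenset of present feature indices.
    """
    phi = [0.0] * M
    for R in permutations(range(M)):
        before = set()
        for i in R:
            # `before` holds the features that precede i in ordering R.
            phi[i] += f_x(frozenset(before | {i})) - f_x(frozenset(before))
            before.add(i)
    return [p / factorial(M) for p in phi]

# Hypothetical toy model: two interacting features and one independent feature.
def f_x(S):
    z = [1.0 if j in S else 0.0 for j in range(3)]
    return 1.0 + 2.0 * z[0] + 3.0 * z[1] + 4.0 * z[0] * z[1] - z[2]

phi = shapley_by_orderings(f_x, 3)
```

The interaction term is shared equally between the first two features, and the attributions sum to `f_x({0, 1, 2}) - f_x(set())`.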

## SHAP (SHapley Additive exPlanation) Values

* Kernel SHAP (Linear LIME + Shapley values)
* Deep SHAP (DeepLIFT + Shapley values)

# Explainable AI for Trees: From Local Explanations to Global Understanding

https://arxiv.org/pdf/1905.04610.pdf

### Algorithm 1 Estimating $\mathbb{E}[f(\mathbf{x})|\mathbf{x}_S]$

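Complexity: $O(TLM2^M)$ when used to compute the SHAP values of all $M$ features by enumerating every feature subset. A single evaluation of $\mathbb{E}[f(\mathbf{x})|\mathbf{x}_S]$ on one tree visits at most every node, i.e. $O(L)$.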

Notations

* $T$: number of trees
* $D$: maximum depth of any tree
* $L$: number of leaves
* $M$: number of features


* $\mathbf{v}$: vector of node values, $v_j \in \mathbb{R} \cup \{\text{internal}\}$, i.e. the leaf value for leaf nodes and the marker $\text{internal}$ for internal nodes
* $\mathbf{a}$: vector of indices of the left child of each internal node
* $\mathbf{b}$: vector of indices of the right child of each internal node
* $\mathbf{t}$: vector of thresholds for each internal node
* $\mathbf{d}$: vector of indices of features used for splitting in each internal node. $d_j \in \text{feature set}$.
* $\mathbf{r}$: vector of covers (i.e. how many data points in the training set fall in the corresponding sub-tree) of each node.

All vectors are of length $N$, the number of nodes in the tree.
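
A minimal sketch of the recursion, using the vectors listed above. When the split feature is in $S$ it follows the branch that $\mathbf{x}$ takes, and otherwise it averages both children weighted by their covers. The 0-based indexing, the `INTERNAL` marker, the `<=` comparison direction, and the example tree are assumptions made for illustration.

```python
INTERNAL = None  # marker used in `v` for internal (non-leaf) nodes

def exp_value(x, S, tree):
    """Estimate E[f(x) | x_S] for a single tree.

    `tree` is a dict holding the vectors v, a, b, t, d, r, with 0-based
    node indices and the root at index 0 (an assumption of this sketch).
    """
    v, a, b, t, d, r = (tree[k] for k in "vabtdr")

    def G(j):
        if v[j] is not INTERNAL:
            return v[j]  # leaf: return its value
        if d[j] in S:
            # Split feature is known: follow the branch x takes.
            return G(a[j]) if x[d[j]] <= t[j] else G(b[j])
        # Split feature is unknown: cover-weighted average of both children.
        return (G(a[j]) * r[a[j]] + G(b[j]) * r[b[j]]) / r[j]

    return G(0)

# Hypothetical tree: the root splits on feature 0 at 0.5,
# and its right child splits on feature 1 at 0.5.
tree = {
    "v": [INTERNAL, 1.0, INTERNAL, 2.0, 4.0],
    "a": [1, -1, 3, -1, -1],
    "b": [2, -1, 4, -1, -1],
    "t": [0.5, 0.0, 0.5, 0.0, 0.0],
    "d": [0, -1, 1, -1, -1],
    "r": [10, 4, 6, 3, 3],
}
x = [1.0, 1.0]
full = exp_value(x, {0, 1}, tree)   # both features known: plain tree prediction
empty = exp_value(x, set(), tree)   # no features known: cover-weighted mean of leaves
only_1 = exp_value(x, {1}, tree)    # feature 0 unknown, feature 1 known
```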

### Algorithm 2

Complexity: $O(TLD^2)$

* $m$ is the path of unique features we have split on so far, and contains four attributes:
  1. $d$, the feature index, 
  1. $z$, the fraction of “zero” paths (where this feature is not in the set S) that flow through this branch, 
  1. $o$, the fraction of “one” paths (where this feature is in the set S) that flow through this branch, and
  1. $w$, which is used to hold the proportion of sets of a given cardinality that are present weighted by their Shapley weight
  
  
* $p_z$, fraction of zeros that are going to extend the subsets.
* $p_o$, fraction of ones that are going to extend the subsets.
* $p_i$, index of the feature used to make the last split.

hot child: the child followed by the tree when given the input $\mathbf{x}$.
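
As a small illustration of the hot child, the sketch below picks it for one internal node using the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{t}$ and $\mathbf{d}$ from Algorithm 1. The helper name, the 0-based indices, the `<=` comparison direction, and the label `cold` for the other child are assumptions of this sketch.

```python
def hot_and_cold_child(j, x, a, b, t, d):
    """Return (hot, cold) child indices of internal node j for input x.

    The hot child is the one x is routed to; the cold child is the other.
    """
    if x[d[j]] <= t[j]:
        return a[j], b[j]
    return b[j], a[j]

# Hypothetical single-split tree: node 0 splits on feature 0 at threshold 0.5.
a, b = [1, -1, -1], [2, -1, -1]
t, d = [0.5, 0.0, 0.0], [0, -1, -1]
hot, cold = hot_and_cold_child(0, [0.9], a, b, t, d)  # 0.9 > 0.5, so x goes right
```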