# Introduction

Machine learning models are increasingly being used to replace or augment human
judgment. The decisions that these models support have important ramifications,
especially in high-stakes medical and financial settings. As a result, there is
a growing demand for model explanations that address why a particular
prediction was made. While some models are inherently interpretable, others are
black boxes and do not lend themselves easily to explanation. A model may be a
black box either because it is too complicated to understand or because access
to the model is limited. For example, a model may functionally be a black box
if only the model’s predictions are made available for the purposes of
explanation. The ubiquitous use of such black-box models to make decisions
has fueled the development of the field of explainable AI (XAI), which
encompasses a broad set of methods that can explain a model’s decisions.

Methods derived from the Shapley value are among the most common ways to
generate post-hoc explanations of black-box models. The Shapley value is a
solution concept from cooperative game theory that quantifies how to fairly
distribute the payout received by a group among its players based on each
player’s contribution [@shapley_value_1953]. Most Shapley methods treat a model’s features
as the players in the game and some aspect of the model, such as its
prediction, loss, or variance explained, as the payout. To compute the Shapley
value, the explainer specifies a value function, which indicates the payout
(e.g. the model’s prediction) that is received for each possible subset of
features. Since most machine learning models cannot handle missing inputs, the
value function is responsible for substituting a value for any feature that is
not part of the subset. In essence, it must simulate what happens when features
are removed from the model [see @janzing_feature_2019;
@merrick_explanation_2020; @covert_explaining_2020]. Numerous options have
been proposed: train different models for each subset of features, use a fixed
value (e.g. zero), use a value from a reference data point, or use the expected
value over a collection of reference points. Each approach yields different
results, all of which are, strictly speaking, Shapley values. The decision
about how to specify the value function also impacts the interpretation of the
resulting values. These differences are not only quantitatively significant,
but also take on real-world significance when they are used to explain a
model’s decision. For example, Shapley explanations have been proposed as a
tool for providing recourse recommendations to subjects of automated decisions,
for ensuring algorithmic fairness, and for other applications. However,
different Shapley methods are likely to produce explanations that lend
themselves to different conclusions. It is therefore vitally important for
practitioners to make an informed decision about which method is appropriate
for a given situation.
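
The effect of the value function can be illustrated with a minimal sketch. The toy model, the data point, and the reference points below are hypothetical and chosen only for illustration; the helper names (`shapley_values`, `baseline_value_fn`, `expected_value_fn`) are likewise assumptions, not part of any library. The sketch computes exact Shapley values by enumerating all feature subsets and compares two of the options listed above: substituting values from a single fixed reference point (here, zero) and taking the expected value over a collection of reference points.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to subset S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def model(x):
    # Hypothetical toy model with an interaction term (illustration only).
    return 2.0 * x[0] + x[1] * x[2]


def baseline_value_fn(model, x, reference):
    """Features outside the subset take their value from one reference point."""
    def value(subset):
        return model([x[j] if j in subset else reference[j] for j in range(len(x))])
    return value


def expected_value_fn(model, x, references):
    """Features outside the subset are averaged over several reference points."""
    def value(subset):
        outputs = [
            model([x[j] if j in subset else ref[j] for j in range(len(x))])
            for ref in references
        ]
        return sum(outputs) / len(outputs)
    return value


# Hypothetical data point and reference points.
x = [1.0, 2.0, 3.0]
zero_reference = [0.0, 0.0, 0.0]
references = [[0.0, 0.0, 0.0], [2.0, 1.0, 1.0]]

phi_zero = shapley_values(baseline_value_fn(model, x, zero_reference), 3)
phi_mean = shapley_values(expected_value_fn(model, x, references), 3)
print("zero baseline: ", phi_zero)
print("expected value:", phi_mean)
```

In this toy setup, the first feature receives an attribution of approximately 2 under the zero baseline and approximately 0 under the expected-value function, even though the model and the explained data point are identical. In both cases the attributions sum to the difference between the model’s prediction and the value of the empty subset, so both are valid Shapley values.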

Shapley-based methods are used to provide different levels of explanation.
Model-level explanations treat the model as an input-output system, while
world-level explanations account for relationships that exist in the world.
Failure to clearly distinguish between different levels of explanation has
fueled two ongoing debates in the literature: which value function is
appropriate, and whether features that indirectly influence
the model should have non-zero Shapley values. Recent work resolves these
debates by suggesting that what is correct is context-dependent
[@chen_true_2020; @covert_explaining_2020]. While we fundamentally agree, we
arrive at this conclusion in a more principled way by considering these debates
from a causal perspective that is grounded in a model explanation evaluation
framework.

In this work, we propose a human-centric framework for evaluating model
explanations and review existing Shapley methods in light of this framework.
Our primary goal is to equip practitioners with the information necessary to
make informed decisions when generating model explanations. Selecting an
appropriate method is challenging because it requires evaluating the quality of
an explanation. Much of the work in XAI either ignores this problem entirely
or leverages an implicit operational definition of correctness. In the
following section, we propose a human-centric framework for assessing the
quality of an explanation based on the premise that correct explanations are
ones that align with an explainee’s objectives. Next, we provide an
introduction to the formalisms used in causal inference, because causal reasoning is
essential to evaluating explanations. We then provide a brief introduction to
Shapley values and how they are used to generate model explanations. With all
of this context, we then review existing Shapley methods from a causal
perspective and point out the key considerations that practitioners should keep
in mind based on our proposed evaluation framework. Combining the prior
sections, we then provide practical guidance for selecting an appropriate
explanation method. In the final section, we take a step back and consider how
a causal perspective changes our understanding of the Shapley explanation
literature and its implications for future work on Shapley-based model
explanations.