% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ppc-overview.R
\name{PPC-overview}
\alias{PPC-overview}
\alias{PPC}
\title{Graphical posterior predictive checking}
\description{
The \strong{bayesplot} PPC module provides various plotting functions
for creating graphical displays comparing observed data to simulated data
from the posterior (or prior) predictive distribution. See the sections
below for a brief discussion of the ideas behind posterior predictive
checking, an overview of the available PPC plots, and tips on providing an
interface to \strong{bayesplot} from another package.
For plots of posterior (or prior) predictive distributions that do \emph{not}
include observed data, see \link{PPD-overview} instead.
}
\details{
The idea behind posterior predictive checking is simple: if a model
is a good fit, then we should be able to use it to generate data that looks
a lot like the data we observed.
\subsection{Posterior predictive distribution}{
To generate the data used for posterior predictive checks we simulate from
the \emph{posterior predictive distribution}. The posterior predictive
distribution is the distribution of the outcome variable implied by a model
after using the observed data \eqn{y} (a vector of outcome values), and
typically predictors \eqn{X}, to update our beliefs about the unknown
parameters \eqn{\theta} in the model. For each draw of the parameters
\eqn{\theta} from the posterior distribution
\eqn{p(\theta \,|\, y, X)}{p(\theta | y, X)}
we generate an entire vector of outcomes. The result is
an \eqn{S \times N}{S x N} matrix of simulations, where \eqn{S} is the
size of the posterior sample (number of draws from the posterior
distribution) and \eqn{N} is the number of data points in \eqn{y}. That is,
each row of the matrix is an individual "replicated" dataset of \eqn{N}
observations.
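As a rough illustration, the base \R sketch below builds such an \eqn{S \times N}{S x N} matrix for a simple normal linear regression with one predictor. It assumes we already have posterior draws of an intercept \code{alpha}, a slope \code{beta}, and a residual standard deviation \code{sigma}; here those draws are faked with \code{rnorm()} and \code{rexp()} purely for illustration (in practice they would be extracted from a fitted model), and all variable names are hypothetical.

```r
set.seed(123)

N <- 20   # number of observations
S <- 500  # number of posterior draws

# Hypothetical predictor and observed outcome
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N, sd = 0.5)

# Stand-ins for posterior draws (in practice: extracted from a fitted model)
alpha <- rnorm(S, mean = 1, sd = 0.1)
beta  <- rnorm(S, mean = 2, sd = 0.1)
sigma <- rexp(S, rate = 2)

# S x N matrix of linear predictors: row s uses draw s of alpha and beta
mu <- alpha + outer(beta, x)

# One replicated dataset per row: yrep[s, n] is drawn from Normal(mu[s, n], sigma[s])
yrep <- matrix(rnorm(S * N, mean = mu, sd = sigma), nrow = S, ncol = N)
dim(yrep)  # 500 20
```

Each row of \code{yrep} is then one replicated dataset that can be compared with \code{y}.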
}
\subsection{Notation}{
When simulating from the posterior predictive distribution we can use either
the same values of the predictors \eqn{X} that we used when fitting the model
or new observations of those predictors. When we use the same values of
\eqn{X}, we denote the resulting simulations by \eqn{y^{rep}}{yrep}, as they
can be thought of as \emph{replications} of the outcome \eqn{y} rather than
predictions for future observations. This corresponds to the notation from
Gelman et al. (2013) and is the notation used throughout the documentation
for this package.
}
\subsection{Graphical posterior predictive checking}{
Using the datasets \eqn{y^{rep}}{yrep} drawn from the posterior predictive
distribution, the functions in the \strong{bayesplot} package produce various
graphical displays comparing the observed data \eqn{y} to the replications.
For a more thorough discussion of posterior predictive checking, see
Chapter 6 of Gelman et al. (2013).
}
\subsection{Prior predictive checking}{
To use \strong{bayesplot} for \emph{prior} predictive checks, you can simply use draws
from the prior predictive distribution instead of the posterior predictive
distribution. See Gabry et al. (2019) for more on prior predictive checking
and when it is reasonable to compare the prior predictive distribution to the
observed data. If you want to avoid using the observed data for prior
predictive checks, then you can use the \strong{bayesplot} \link{PPD} plots instead,
which do not take a \code{y} argument, or you can use the PPC plots but provide
plausible or implausible \code{y} values that you want to compare to the prior
predictive realizations.
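For example, prior predictive draws can be simulated by first drawing the parameters from their priors and then drawing one outcome vector per parameter draw. The base \R sketch below assumes a hypothetical normal linear regression with standard normal priors on the intercept \code{alpha} and slope \code{beta} and an exponential(1) prior on \code{sigma}; these priors and all variable names are illustrative assumptions, not recommendations.

```r
set.seed(456)

N <- 20        # number of observations
S <- 200       # number of prior predictive draws
x <- rnorm(N)  # hypothetical predictor values

# Step 1: draw parameters from (hypothetical) priors instead of the posterior
alpha <- rnorm(S, mean = 0, sd = 1)
beta  <- rnorm(S, mean = 0, sd = 1)
sigma <- rexp(S, rate = 1)

# Step 2: simulate one outcome vector per prior draw (rows = draws)
mu <- alpha + outer(beta, x)
yrep_prior <- matrix(rnorm(S * N, mean = mu, sd = sigma), nrow = S, ncol = N)

# A quick numeric check: how often do the prior draws produce extreme outcomes?
mean(abs(yrep_prior) > 10)
```

The resulting matrix can be passed as \code{yrep} to the PPC plots (together with observed or hypothetical \code{y} values) or, without any \code{y}, to the PPD plots.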
}
}
\section{PPC plotting functions}{
The plotting functions for prior and
posterior predictive checking all have the prefix \code{ppc_} and all require
the arguments \code{y}, a vector of observations, and \code{yrep}, a matrix of
replications (in-sample predictions). The plots are organized into several
categories, each with its own documentation:
\itemize{
\item \link{PPC-distributions}: Histograms, kernel density estimates, boxplots, and
other plots comparing the empirical distribution of data \code{y} to the
distributions of individual simulated datasets (rows) in \code{yrep}.
\item \link{PPC-test-statistics}: The distribution of a statistic, or a pair of
statistics, over the simulated datasets (rows) in \code{yrep}, compared to the value of
the statistic(s) computed from \code{y}.
\item \link{PPC-intervals}: Interval estimates of \code{yrep} with \code{y}
overlaid. The x-axis variable can be optionally specified by the user
(e.g. to plot against a predictor variable or over time).
\item \link{PPC-errors}: Plots of predictive errors (\code{y - yrep}) computed from \code{y} and
each of the simulated datasets (rows) in \code{yrep}. For binomial models, binned
error plots are also available.
\item \link{PPC-scatterplots}: Scatterplots (and similar visualizations) of the data
\code{y} vs. individual simulated datasets (rows) in \code{yrep}, or vs. the average
value of each data point's predictive distribution (columns) in \code{yrep}.
\item \link{PPC-discrete}: PPC functions that can only be used if \code{y} and \code{yrep} are
discrete. Examples include rootograms for count outcomes and bar plots for
ordinal, categorical, and multinomial outcomes.
\item \link{PPC-loo}: PPC functions for predictive checks based on (approximate)
leave-one-out (LOO) cross-validation.
\item \link{PPC-censoring}: PPC functions comparing the empirical
distribution of censored data \code{y} to the distributions of individual
simulated datasets (rows) in \code{yrep}.
}
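To make these categories more concrete, the base \R sketch below computes a few of the raw quantities that such plots summarize, given an observed vector \code{y} and an \eqn{S \times N}{S x N} matrix \code{yrep}: a test statistic for each replicated dataset, the predictive errors \code{y - yrep}, per-observation intervals, and the average of each column. The Poisson data are simulated stand-ins, and the choice of \code{max} as the statistic and of the 10th and 90th percentiles for the intervals is only an example.

```r
set.seed(789)

N <- 25   # number of observations
S <- 100  # number of replicated datasets

# Simulated stand-ins for observed counts and their replications
y <- rpois(N, lambda = 4)
yrep <- matrix(rpois(S * N, lambda = 4), nrow = S, ncol = N)

# Test statistic: T(yrep) for each row (dataset) versus T(y)
T_rep <- apply(yrep, 1, max)
T_y <- max(y)
prop_extreme <- mean(T_rep >= T_y)  # share of replications at least as large as T(y)

# Predictive errors: y minus each row of yrep (an S x N matrix)
errors <- matrix(y, nrow = S, ncol = N, byrow = TRUE) - yrep

# Per-observation 10th and 90th percentiles of yrep (a 2 x N matrix)
intervals <- apply(yrep, 2, quantile, probs = c(0.1, 0.9))

# Average of each column, i.e., the mean replication for each data point
yrep_avg <- colMeans(yrep)
```

In \strong{bayesplot} these quantities are plotted directly, for example by \code{ppc_stat()}, \code{ppc_error_hist()}, \code{ppc_intervals()}, and \code{ppc_scatter_avg()}.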
}
\section{Providing an interface for predictive checking from another package}{
In addition to the various plotting functions, the \strong{bayesplot} package
provides the S3 generic \code{\link[=pp_check]{pp_check()}}. Authors of \R packages for
Bayesian inference are encouraged to define \code{pp_check()} methods for the
fitted model objects created by their packages. See the package vignettes for
more details and a simple example, and see the \strong{rstanarm} and \strong{brms}
packages for full examples of \code{pp_check()} methods.
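As a hedged sketch of what such a method can look like, the code below defines a \code{pp_check()} method for a hypothetical model class \code{"foo"} whose objects simply store \code{y} and \code{yrep}. The class, its constructor \code{new_foo()}, and the \code{type} argument are illustrative assumptions; a real method would extract the observed data and posterior predictive draws from the package's own fitted-model objects. The sketch assumes \strong{bayesplot} is installed and uses its example data.

```r
library(bayesplot)

# Hypothetical constructor for an imaginary fitted-model class "foo"
new_foo <- function(y, yrep) structure(list(y = y, yrep = yrep), class = "foo")

# Sketch of an S3 method for the pp_check() generic provided by bayesplot.
# type = "overlaid": densities of 50 replications overlaid on the density of y.
# type = "multiple": histograms of y and of 5 individual replications.
pp_check.foo <- function(object, ..., type = c("overlaid", "multiple"))
  switch(
    match.arg(type),
    overlaid = ppc_dens_overlay(object$y, object$yrep[1:50, ], ...),
    multiple = ppc_hist(object$y, object$yrep[1:5, ], ...)
  )

# Usage with the example data shipped with bayesplot (print the objects to draw them)
fit <- new_foo(y = example_y_data(), yrep = example_yrep_draws())
p_overlay <- pp_check(fit)
p_multiple <- pp_check(fit, type = "multiple")
```

Keeping the method thin, so that it only extracts \code{y} and \code{yrep} and hands them to existing \code{ppc_} functions, means the plotting logic itself stays in \strong{bayesplot}.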
}
\references{
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and
Gelman, A. (2019). Visualization in Bayesian workflow.
\emph{J. R. Stat. Soc. A}, 182: 389-402. doi:10.1111/rssa.12378.
(\href{https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12378}{journal version},
\href{https://arxiv.org/abs/1709.01449}{arXiv preprint},
\href{https://github.com/jgabry/bayes-vis-paper}{code on GitHub})

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari,
A., and Rubin, D. B. (2013). \emph{Bayesian Data Analysis.} Chapman & Hall/CRC
Press, London, third edition. (Ch. 6)
}
\seealso{
Other PPCs:
\code{\link{PPC-censoring}},
\code{\link{PPC-discrete}},
\code{\link{PPC-distributions}},
\code{\link{PPC-errors}},
\code{\link{PPC-intervals}},
\code{\link{PPC-loo}},
\code{\link{PPC-scatterplots}},
\code{\link{PPC-test-statistics}}
}
\concept{PPCs}