% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gather_rvars.R, R/spread_rvars.R
\name{gather_rvars}
\alias{gather_rvars}
\alias{spread_rvars}
\title{Extract draws from a Bayesian model into tidy data frames of random variables}
\usage{
gather_rvars(model, ..., ndraws = NULL, seed = NULL)
spread_rvars(model, ..., ndraws = NULL, seed = NULL)
}
\arguments{
\item{model}{A supported Bayesian model fit. Tidybayes supports a variety of model objects;
for a full list of supported models, see \link{tidybayes-models}.}
\item{...}{Expressions in the form of
\code{variable_name[dimension_1, dimension_2, ...]}. See \emph{Details}.}
\item{ndraws}{The number of draws to return, or \code{NULL} to return all draws.}
\item{seed}{A seed to use when subsampling draws (i.e. when \code{ndraws} is not \code{NULL}).}
}
\value{
A data frame.
}
\description{
Extract draws from a Bayesian model for one or more variables (possibly with named
dimensions) into one of two types of long-format data frames of \link[posterior:rvar]{posterior::rvar} objects.
}
\details{
Imagine a JAGS or Stan fit named \code{model}. The model may contain a variable named
\code{b[i,v]} (in the JAGS or Stan language) with dimension \code{i} in \code{1:100} and
dimension \code{v} in \code{1:3}. However, the default format for draws returned from
JAGS or Stan in R does not reflect this indexing structure; instead,
the draws have multiple columns with names like \code{"b[1,1]"}, \code{"b[2,1]"}, etc.
\code{spread_rvars} and \code{gather_rvars} provide a straightforward
syntax to translate these columns back into properly-indexed \code{\link[posterior:rvar]{rvar}}s in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.
\code{spread_rvars} will spread names of variables in the model across the data frame as column names,
whereas \code{gather_rvars} will gather variable names into a single column named \code{".variable"} and place
values of variables into a column named \code{".value"}. To use naming schemes from other packages
(such as \code{broom}), consider passing
results through functions like \code{\link[=to_broom_names]{to_broom_names()}} or \code{\link[=to_ggmcmc_names]{to_ggmcmc_names()}}.
For example, \code{spread_rvars(model, a[i], b[i,v])} might return a data frame with:
\itemize{
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}
\item column \code{"a"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"a[i]"}
\item column \code{"b"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"b[i,v]"}
}
\code{gather_rvars(model, a[i], b[i,v])} on the same model would return a data frame with:
\itemize{
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}, or \code{NA}
on rows where \code{".variable"} is \code{"a"}.
\item column \code{".variable"}: value in \code{c("a", "b")}.
\item column \code{".value"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"a[i]"} (when \code{".variable"} is \code{"a"})
or \code{"b[i,v]"} (when \code{".variable"} is \code{"b"})
}
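
The \code{".variable"} and \code{".value"} columns described above can be renamed after the fact. The following is a small, hedged sketch using the \code{RankCorr} example draws from \code{ggdist} (the same object used in \emph{Examples}); it assumes that \code{to_broom_names()} maps \code{".variable"} to \code{"term"} and \code{".value"} to \code{"estimate"}, and the names \code{long} and \code{broom_style} are purely illustrative:
\preformatted{library(tidybayes)
data(RankCorr, package = "ggdist")

# long format: one row per variable and index value
long <- gather_rvars(RankCorr, tau[i])

# rename tidybayes-style columns to broom-style names
broom_style <- to_broom_names(long)
names(broom_style)
}
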
\code{spread_rvars} and \code{gather_rvars} can use type information
applied to the \code{model} object by \code{\link[=recover_types]{recover_types()}} to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the \code{v} dimension
in the original data frame \code{data} was a factor with levels \code{c("a","b","c")},
then we could use \code{recover_types} before \code{spread_rvars}:
\preformatted{model \%>\%
  recover_types(data) \%>\%
  spread_rvars(b[i,v])
}
This would return the same data frame as above, except that the \code{"v"} column
would contain values in \code{c("a","b","c")} instead of \code{1:3}.
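
A minimal sketch of this workflow on the \code{RankCorr} example draws from \code{ggdist} follows. \code{RankCorr} does not come with the data frame it was fit to, so the \code{types} data frame and its level names are hypothetical stand-ins for \code{data}; the sketch also assumes that \code{recover_types()} matches the index name \code{i} to the column of the same name:
\preformatted{library(tidybayes)
data(RankCorr, package = "ggdist")

# number of values taken by the i index of tau[i]
n_i <- nrow(spread_rvars(RankCorr, tau[i]))

# hypothetical stand-in for the original data: a factor column named i
types <- data.frame(i = factor(paste0("group_", seq_len(n_i))))

# attach the type information, then extract; i is expected to be a factor
typed <- spread_rvars(recover_types(RankCorr, types), tau[i])
typed$i
}
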
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable \code{d[i]} with the same \code{i} subscript
as \code{b[i,v]}, and a variable \code{x} with no subscripts, we could do this:
\preformatted{spread_rvars(model, x, d[i], b[i,v])}
Which is roughly equivalent to this:
\preformatted{spread_rvars(model, x) \%>\%
  inner_join(spread_rvars(model, d[i])) \%>\%
  inner_join(spread_rvars(model, b[i,v]))
}
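
The rough equivalence can be checked on a small case. The sketch below is illustrative only: it uses the \code{RankCorr} example draws from \code{ggdist}, whose variables \code{tau[i]} and \code{b[i,j]} share the \code{i} subscript, and it spells out \code{by = "i"} in the join. The names \code{combined} and \code{joined} are hypothetical:
\preformatted{library(dplyr)
library(tidybayes)
data(RankCorr, package = "ggdist")

# one call with both specifications
combined <- spread_rvars(RankCorr, tau[i], b[i, j])

# separate calls, joined on the shared index column
joined <- inner_join(
  spread_rvars(RankCorr, tau[i]),
  spread_rvars(RankCorr, b[i, j]),
  by = "i"
)

# both are expected to have the same columns and the same number of rows
setequal(names(combined), names(joined))
nrow(combined) == nrow(joined)
}
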
Similarly, this:
\preformatted{gather_rvars(model, x, d[i], b[i,v])}
Is roughly equivalent to this:
\preformatted{bind_rows(
  gather_rvars(model, x),
  gather_rvars(model, d[i]),
  gather_rvars(model, b[i,v])
)}
The \code{c} and \code{cbind} functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts \code{i} and \code{v}, we could do either of these:
\preformatted{spread_rvars(model, c(w, x, y, z)[i,v])}
\preformatted{spread_rvars(model, cbind(w, x, y, z)[i,v]) # equivalent}
Each of which is roughly equivalent to this:
\preformatted{spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])}
Besides being more compact, the \code{c()}-style syntax is currently also slightly
faster (though that may change).
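
As a concrete sketch of the compact syntax, the \code{RankCorr} example draws from \code{ggdist} contain \code{tau[i]} and \code{u_tau[i]}, which share the subscript \code{i}. The names \code{compact} and \code{spelled_out} are illustrative only:
\preformatted{library(tidybayes)
data(RankCorr, package = "ggdist")

# compact form: both variables share the [i] specification
compact <- spread_rvars(RankCorr, c(tau, u_tau)[i])

# spelled-out form: one specification per variable
spelled_out <- spread_rvars(RankCorr, tau[i], u_tau[i])

# both are expected to contain the same columns
setequal(names(compact), names(spelled_out))
}
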
Dimensions can be left nested in the resulting \code{\link[posterior:rvar]{rvar}} objects by leaving their names
blank; e.g. \code{spread_rvars(model, b[i,])} will place the first index (\code{i}) into
rows of the data frame but leave the second index nested in the \code{b} column
(see \emph{Examples} below).
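
One way to inspect the nested result is sketched here on the \code{RankCorr} example draws from \code{ggdist}; the name \code{nested} is illustrative. The data frame has one row per value of \code{i}, and the unnamed second index stays inside the \code{b} column as an extra dimension of the \code{rvar}:
\preformatted{library(tidybayes)
data(RankCorr, package = "ggdist")

nested <- spread_rvars(RankCorr, b[i, ])

# one row per value of i
nrow(nested)

# the first dimension of the rvar column runs along the rows (i);
# the second dimension is the index that was left nested
dim(nested$b)
}
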
}
\examples{

library(dplyr)

data(RankCorr, package = "ggdist")

RankCorr \%>\%
  spread_rvars(b[i, j])

# leaving an index out nests the index in the column containing the rvar
RankCorr \%>\%
  spread_rvars(b[i, ])

RankCorr \%>\%
  spread_rvars(b[i, j], tau[i], u_tau[i])

# gather_rvars places variables and values in a longer format data frame
RankCorr \%>\%
  gather_rvars(b[i, j], tau[i], typical_r)
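
# a hedged sketch of subsampling: ndraws limits the number of draws kept in
# each rvar, and seed makes the subsample reproducible (the values 100 and
# 1234 and the name RankCorr_sub are arbitrary illustrations)
RankCorr_sub <- spread_rvars(RankCorr, b[i, j], ndraws = 100, seed = 1234)
posterior::ndraws(RankCorr_sub$b)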
}
\seealso{
\code{\link[=spread_draws]{spread_draws()}}, \code{\link[=recover_types]{recover_types()}}, \code{\link[=compose_data]{compose_data()}}. See also
\code{\link[posterior:rvar]{posterior::rvar()}} and \code{\link[posterior:draws_rvars]{posterior::as_draws_rvars()}}, the functions that power
\code{spread_rvars} and \code{gather_rvars}.
}
\author{
Matthew Kay
}
\keyword{manip}