% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gather_rvars.R, R/spread_rvars.R
\name{gather_rvars}
\alias{gather_rvars}
\alias{spread_rvars}
\title{Extract draws from a Bayesian model into tidy data frames of random variables}
\usage{
gather_rvars(model, ..., ndraws = NULL, seed = NULL)

spread_rvars(model, ..., ndraws = NULL, seed = NULL)
}
\arguments{
\item{model}{A supported Bayesian model fit. Tidybayes supports a variety of model objects;
for a full list of supported models, see \link{tidybayes-models}.}

\item{...}{Expressions in the form of
\code{variable_name[dimension_1, dimension_2, ...]}. See \emph{Details}.}

\item{ndraws}{The number of draws to return, or \code{NULL} to return all draws.}

\item{seed}{A seed to use when subsampling draws (i.e. when \code{ndraws} is not \code{NULL}).}
}
\value{
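A data frame with one column per named dimension and \code{\link[posterior:rvar]{rvar}} columns
holding the draws (see \emph{Details}).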
}
\description{
Extract draws from a Bayesian model for one or more variables (possibly with named
dimensions) into one of two types of long-format data frames of \link[posterior:rvar]{posterior::rvar} objects.
}
\details{
Imagine a JAGS or Stan fit named \code{model}. The model may contain a variable named
\code{b[i,v]} (in the JAGS or Stan language) with dimension \code{i} in \code{1:100} and
dimension \code{v} in \code{1:3}. However, the default format for draws returned from
JAGS or Stan in R will not reflect this indexing structure; instead,
the draws will have multiple columns with names like \code{"b[1,1]"}, \code{"b[2,1]"}, etc.
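
As a rough base-R illustration of this flat naming (not how tidybayes performs the translation),
the sketch below builds a hypothetical draws matrix with such column names and recovers the
\code{i} and \code{v} indices from them. The matrix \code{draws} and its small dimensions are
assumptions made only for this example.

\preformatted{# Hypothetical flat draws: 4 draws of b[i,v] with i in 1:2 and v in 1:3
idx <- expand.grid(i = 1:2, v = 1:3)
draws <- matrix(rnorm(4 * nrow(idx)), nrow = 4)
colnames(draws) <- paste0("b[", idx$i, ",", idx$v, "]")

# Strip the leading "b[" and the trailing "]", then split on the comma
inner <- substring(colnames(draws), 3, nchar(colnames(draws)) - 1)
parts <- strsplit(inner, ",", fixed = TRUE)

# One row per column of draws, holding the indices encoded in its name
recovered <- data.frame(
  i = as.integer(vapply(parts, function(p) p[1], character(1))),
  v = as.integer(vapply(parts, function(p) p[2], character(1)))
)
}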

\code{spread_rvars} and \code{gather_rvars} provide a straightforward
syntax to translate these columns back into properly-indexed \code{\link[posterior:rvar]{rvar}}s in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.

\code{spread_rvars} will spread names of variables in the model across the data frame as column names,
whereas \code{gather_rvars} will gather variable names into a single column named \code{".variable"} and place
values of variables into a column named \code{".value"}. To use naming schemes from other packages
(such as \code{broom}), consider passing
results through functions like \code{\link[=to_broom_names]{to_broom_names()}} or \code{\link[=to_ggmcmc_names]{to_ggmcmc_names()}}.

For example, \code{spread_rvars(model, a[i], b[i,v])} might return a data frame with:
\itemize{
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}
\item column \code{"a"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"a[i]"}
\item column \code{"b"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"b[i,v]"}
}

\code{gather_rvars(model, a[i], b[i,v])} on the same model would return a data frame with:
\itemize{
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}, or \code{NA}
on rows where \code{".variable"} is \code{"a"}.
\item column \code{".variable"}: value in \code{c("a", "b")}.
\item column \code{".value"}: \code{\link[posterior:rvar]{rvar}} containing draws from \code{"a[i]"} (when \code{".variable"} is \code{"a"})
or \code{"b[i,v]"} (when \code{".variable"} is \code{"b"})
}

\code{spread_rvars} and \code{gather_rvars} can use type information
applied to the \code{model} object by \code{\link[=recover_types]{recover_types()}} to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the \code{v} dimension
in the original data frame \code{data} was a factor with levels \code{c("a","b","c")},
then we could use \code{recover_types} before \code{spread_rvars}:

\preformatted{model \%>\%
 recover_types(data) \%>\%
 spread_rvars(b[i,v])
}

This would return a data frame like the one described above, except that the \code{"v"} column
would contain values in \code{c("a","b","c")} instead of \code{1:3}.
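
The index-to-level mapping that this relies on can be sketched in base R. This is only an
illustration of the idea, under the assumption that integer indices correspond to factor levels
in order; the data frame \code{data} and the vector \code{v_index} are hypothetical.

\preformatted{# Hypothetical original data with a three-level factor v
data <- data.frame(v = factor(c("a", "b", "c", "a")))

# Integer codes as a model would see them
v_codes <- as.integer(data$v)

# Integer indices as they would appear in names such as b[i,v]
v_index <- c(1L, 2L, 3L)

# Map the indices back onto the original factor levels
v_recovered <- factor(levels(data$v)[v_index], levels = levels(data$v))
}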

For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable \code{d[i]} with the same \code{i} subscript
as \code{b[i,v]}, and a variable \code{x} with no subscripts, we could do this:

\preformatted{spread_rvars(model, x, d[i], b[i,v])}

Which is roughly equivalent to this:

\preformatted{spread_rvars(model, x) \%>\%
 inner_join(spread_rvars(model, d[i])) \%>\%
 inner_join(spread_rvars(model, b[i,v]))
}

Similarly, this:

\preformatted{gather_rvars(model, x, d[i], b[i,v])}

Is roughly equivalent to this:

\preformatted{bind_rows(
 gather_rvars(model, x),
 gather_rvars(model, d[i]),
 gather_rvars(model, b[i,v])
)}
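
To make the difference between the two layouts concrete, here is a small base-R sketch of
the join-style and stack-style shapes described above. It uses plain numbers as stand-ins for
\code{rvar} columns and the hypothetical tables \code{d_tbl} and \code{b_tbl}; it mimics only
the resulting shapes, not the actual implementation.

\preformatted{# Plain numeric stand-ins for rvar columns (an assumption for illustration)
d_tbl <- data.frame(i = 1:2, d = c(0.1, 0.2))
b_tbl <- expand.grid(i = 1:2, v = 1:3)
b_tbl$b <- seq_len(nrow(b_tbl)) / 10

# spread-style: join on the shared subscript i, one column per variable
wide <- merge(d_tbl, b_tbl, by = "i")

# gather-style: stack rows, using NA for subscripts a variable lacks
long <- rbind(
  data.frame(i = d_tbl$i, v = NA_integer_, .variable = "d", .value = d_tbl$d),
  data.frame(i = b_tbl$i, v = b_tbl$v, .variable = "b", .value = b_tbl$b)
)
}

In this sketch \code{wide} has one row per combination of \code{i} and \code{v}, whereas
\code{long} has one row per variable element, with \code{v} missing on the rows for \code{d}.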

The \code{c} and \code{cbind} functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts \code{i} and \code{v}, we could do either of these:

\preformatted{spread_rvars(model, c(w, x, y, z)[i,v])}
\preformatted{spread_rvars(model, cbind(w, x, y, z)[i,v])  # equivalent}

Each of which is roughly equivalent to this:

\preformatted{spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])}

Besides being more compact, the \code{c()}-style syntax is currently also slightly
faster (though that may change).

Dimensions can be left nested in the resulting \code{\link[posterior:rvar]{rvar}} objects by leaving their names
blank; e.g. \code{spread_rvars(model, b[i,])} will place the first index (\code{i}) into
rows of the data frame but leave the second index nested in the \code{b} column
(see \emph{Examples} below).
}
\examples{

library(dplyr)

data(RankCorr, package = "ggdist")

RankCorr \%>\%
  spread_rvars(b[i, j])

# leaving an index out nests the index in the column containing the rvar
RankCorr \%>\%
  spread_rvars(b[i, ])

RankCorr \%>\%
  spread_rvars(b[i, j], tau[i], u_tau[i])

# gather_rvars places variables and values in a longer format data frame
RankCorr \%>\%
  gather_rvars(b[i, j], tau[i], typical_r)

}
\seealso{
\code{\link[=spread_draws]{spread_draws()}}, \code{\link[=recover_types]{recover_types()}}, \code{\link[=compose_data]{compose_data()}}. See also
\code{\link[posterior:rvar]{posterior::rvar()}} and \code{\link[posterior:draws_rvars]{posterior::as_draws_rvars()}}, the functions that power
\code{spread_rvars} and \code{gather_rvars}.
}
\author{
Matthew Kay
}
\keyword{manip}