-
Notifications
You must be signed in to change notification settings - Fork 10
/
flash.R
240 lines (236 loc) · 12 KB
/
flash.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
#' Empirical Bayes matrix factorization
#'
#' Fits an empirical Bayes matrix factorization (see \strong{Details} for a
#' description of the model). The resulting fit is referred to as a "flash"
#' object (short for Factors and Loadings using Adaptive SHrinkage). Two
#' interfaces are provided. The \code{flash} function provides a simple
#' interface that allows a flash object to be fit in a single pass, while
#' \code{flash_xxx} functions are pipeable functions that allow for more
#' complex flash objects to be fit incrementally (available functions are
#' listed below under \strong{See Also}). See the vignettes and
#' \strong{Examples} for usage.
#'
#' If \eqn{Y} is an \eqn{n \times p} data matrix, then the rank-one
#' empirical Bayes matrix factorization model is:
#' \deqn{Y = \ell f' + E,} where \eqn{\ell} is an
#' \eqn{n}-vector of \strong{loadings}, \eqn{f} is a
#' \eqn{p}-vector of \strong{factors}, and \eqn{E} is an
#' \eqn{n \times p} matrix of \strong{residuals} (or "errors").
#' Additionally:
#' \deqn{e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p}
#' \deqn{\ell \sim g_\ell \in G_\ell}
#' \deqn{f \sim g_f \in G_f.}
#' The residual variance parameters \eqn{s_{ij}^2} are constrained to have
#' a simple structure and are fit via maximum likelihood. (For example, one
#' might assume that all standard errors are identical: \eqn{s_{ij}^2 = s^2}
#' for some \eqn{s^2} and for all \eqn{i}, \eqn{j}).
#' The functions \eqn{g_\ell} and \eqn{g_f} are assumed to belong to
#' some families of priors \eqn{G_\ell} and \eqn{G_f} that are
#' specified in advance, and are estimated via variational approximation.
#'
#' The general rank-\eqn{K} empirical Bayes matrix factorization model is:
#' \deqn{Y = LF' + E} or
#' \deqn{y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,}
#' where \eqn{L} is now a matrix of loadings and \eqn{F} is a matrix of
#' factors.
#'
#' Separate priors \eqn{g_\ell^{(k)}} and \eqn{g_f^{(k)}} are estimated via
#' empirical Bayes, and different prior families may be used for different
#' values of \eqn{k}. In general, then:
#' \deqn{e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p}
#' \deqn{\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K}
#' \deqn{f_{ik} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.}
#'
#' Typically, \eqn{G_\ell^{(k)}} and \eqn{G_f^{(k)}} will be closed under
#' scaling, in which case \eqn{\ell_k} and \eqn{f_k} are only identifiable
#' up to a \strong{scaling factor} \eqn{d_k}. In other words, we can write:
#' \deqn{Y = LDF' + E,}
#' where \eqn{D} is a diagonal matrix with diagonal entries \eqn{d_1, ..., d_K}.
#' The model can then be made identifiable by constraining the scale of
#' \eqn{\ell_k} and \eqn{f_k} for \eqn{k = 1, ..., K}.
#'
#' @param data The observations. Usually a matrix, but can also be a sparse
#' matrix of class \code{\link[Matrix]{Matrix}} or a low-rank matrix
#' representation as returned by, for example, \code{\link{svd}},
#' \code{\link[irlba]{irlba}}, \code{\link[rsvd]{rsvd}}, or
#' \code{\link[softImpute]{softImpute}} (in general, any list that
#' includes fields \code{u}, \code{d}, and \code{v} will be interpreted
#' as a low-rank matrix representation).
#'
#' @param S The standard errors. Can be \code{NULL} (in which case all residual
#' variance will be estimated) or a matrix, vector, or scalar. \code{S}
#' should be a scalar if standard errors are identical across observations. It
#' should be a vector if standard errors either vary across columns but are
#' constant within any given row, or vary across rows but are constant within
#' any given column (\code{flash} will use the length of the vector
#' to determine whether the supplied values correspond to rows or columns; if the
#' data matrix is square, then the sense must be specified using parameter
#' \code{S_dim} in function \code{\link{flash_init}}).
#'
#' @param ebnm_fn The function or functions used to solve the empirical Bayes
#' normal means (EBNM) subproblems. Most importantly, these functions specify
#' the families of distributions \eqn{G_\ell^{(k)}} and \eqn{G_f^{(k)}} to which the
#' priors on loadings and factors \eqn{g_\ell^{(k)}} and \eqn{g_f^{(k)}} are
#' assumed to belong. If the same function is to be used for both loadings
#' \eqn{L} and factors \eqn{F}, then \code{ebnm_fn} can be a single function.
#' If one function is to be used for loadings and a second for factors,
#' then \code{ebnm_fn} should be a list of length two, with the first
#' element giving the function for loadings and the second the function
#' for factors. If different functions are to be used for different values of
#' \eqn{k}, then factor/loadings pairs must be added successively using
#' multiple calls to either \code{\link{flash_greedy}} or
#' \code{\link{flash_factors_init}}.
#'
#' Any EBNM function provided by package \code{\link[ebnm]{ebnm}} can be
#' used as input. Non-default arguments to parameters can be supplied using
#' the helper function \code{\link{flash_ebnm}}. Custom EBNM functions can
#' also be used: for details, see \code{\link{flash_ebnm}}.
#'
#' @param var_type Describes the structure of the estimated residual variance.
#' Can be \code{NULL}, \code{0}, \code{1}, \code{2}, or \code{c(1, 2)}. If
#' \code{NULL}, then \code{S} accounts for all residual variance. If
#' \code{var_type = 0}, then the estimated residual variance (which is added
#' to any variance given by \code{S}) is assumed to be constant
#' across all observations. Setting \code{var_type = 1} estimates a single
#' variance parameter for each row; \code{var_type = 2} estimates one
#' parameter for each column; and \code{var_type = c(1, 2)} optimizes over
#' all rank-one matrices (that is, it assumes that the residual variance
#' parameter \eqn{s_{ij}} can be written \eqn{s_{ij} = a_i b_j}, where the
#' \eqn{n}-vector \eqn{a} and the \eqn{p}-vector \eqn{b} are to be
#' estimated).
#'
#' Note that if any portion of the residual variance is to be estimated, then
#' it is usually faster to set \code{S = NULL} and to let \code{flash}
#' estimate all of the residual variance. Further, \code{var_type = c(1, 2)}
#' is typically much slower than other options, so it should be used with
#' care.
#'
#' @param greedy_Kmax The maximum number of factors to be added. This will not
#' necessarily be the total number of factors added by \code{flash}, since
#' factors are only added as long as they increase the variational lower
#' bound on the log likelihood for the model.
#'
#' @param backfit A "greedy" fit is performed by adding up to
#' \code{greedy_Kmax} factors, optimizing each newly added factor in one go
#' without returning to optimize previously added factors. When
#' \code{backfit = TRUE}, \code{flash} will additionally perform a final
#' "backfit" where all factors are cyclically updated until convergence.
#' The backfitting procedure typically takes much longer than the greedy
#' algorithm, but it also usually improves the final fit to a significant
#' degree.
#'
#' @param nullcheck If \code{nullcheck = TRUE}, then \code{flash} will check
#' that each factor in the final flash object improves the overall fit. Any
#' factor that fails the check will be removed.
#'
#' @param verbose When and how to display progress updates. Set to
#' \code{0} for none, \code{1} for updates after a factor is added or a
#' backfit is completed, \code{2} for additional notifications about the
#' variational lower bound, and \code{3} for updates after every iteration.
#' It is also possible to output a single tab-delimited table of values
#' using function \code{\link{flash_set_verbose}} with \code{verbose = -1}.
#'
#' @return A \code{flash} object. Contains elements:
#' \describe{
#' \item{\code{n_factors}}{The total number of factor/loadings pairs \eqn{K}
#' in the fitted model.}
#' \item{\code{pve}}{The proportion of variance explained by each
#' factor/loadings pair. Since factors and loadings are not required to be
#' orthogonal, this should be interpreted loosely: for example, the total
#' proportion of variance explained could be larger than 1.}
#' \item{\code{elbo}}{The variational lower bound achieved by the
#' fitted model.}
#' \item{\code{residuals_sd}}{Estimated residual standard deviations (these
#' include any variance component given as an argument to \code{S}).}
#' \item{\code{L_pm, L_psd, L_lfsr}}{Posterior means,
#' standard deviations, and local false sign rates for loadings \eqn{L}.}
#' \item{\code{F_pm, F_psd, F_lfsr}}{Posterior means,
#' standard deviations, and local false sign rates for factors \eqn{F}.}
#' \item{\code{L_ghat}}{The fitted priors on loadings
#' \eqn{\hat{g}_\ell^{(k)}}.}
#' \item{\code{F_ghat}}{The fitted priors on factors
#' \eqn{\hat{g}_f^{(k)}}.}
#' \item{\code{sampler}}{A function that takes a single argument
#' \code{nsamp} and returns \code{nsamp} samples from the posterior
#' distributions for factors \eqn{F} and loadings \eqn{L}.}
#' \item{\code{flash_fit}}{A \code{\link{flash_fit}} object. Used by
#' \code{flash} when fitting is not performed all at once, but
#' incrementally via calls to various \code{flash_xxx} functions.}
#' }
#' The following methods are available:
#' \describe{
#' \item{\code{\link{fitted.flash}}}{Returns the "fitted values"
#' \eqn{E(LF') = E(L) E(F)'}.}
#' \item{\code{\link{residuals.flash}}}{Returns the expected residuals
#' \eqn{Y - E(LF') = Y - E(L) E(F)'}.}
#' \item{\code{\link{ldf.flash}}}{Returns an \eqn{LDF} decomposition (see
#' \strong{Details} above), with columns of \eqn{L} and \eqn{F} scaled
#' as specified by the user.}
#' }
#'
#' @examples
#' data(gtex)
#'
#' # Fit up to 3 factors and backfit.
#' fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
#'
#' # This is equivalent to the series of calls:
#' fl <- flash_init(gtex) |>
#' flash_greedy(Kmax = 3L) |>
#' flash_backfit() |>
#' flash_nullcheck()
#'
#' # Fit a unimodal distribution with mean zero to each set of loadings
#' # and a scale mixture of normals with mean zero to each factor.
#' fl <- flash(gtex,
#' ebnm_fn = c(ebnm_unimodal,
#' ebnm_normal_scale_mixture),
#' greedy_Kmax = 3)
#'
#' # Fit point-laplace priors using a non-default optimization method.
#' fl <- flash(gtex,
#' ebnm_fn = flash_ebnm(prior_family = "point_laplace",
#' optmethod = "trust"),
#' greedy_Kmax = 3)
#'
#' # Fit a "Kronecker" (rank-one) variance structure (this can be slow).
#' fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
#'
#' @references
#' Wei Wang and Matthew Stephens (2021).
#' "Empirical Bayes matrix factorization." \emph{Journal of Machine Learning
#' Research} 22, 1--40.
#'
#' @seealso \code{\link{flash_init}}, \code{\link{flash_greedy}},
#' \code{\link{flash_backfit}}, and \code{\link{flash_nullcheck}}. For more
#' advanced functionality, see \code{\link{flash_factors_init}},
#' \code{\link{flash_factors_fix}}, \code{\link{flash_factors_set_to_zero}},
#' \code{\link{flash_factors_remove}}, \code{\link{flash_set_verbose}}, and
#' \code{\link{flash_set_conv_crit}}.
#' For extracting useful data from \code{flash} objects, see
#' \code{\link{fitted.flash}}, \code{\link{residuals.flash}}, and
#' \code{\link{ldf.flash}}.
#'
#' @importFrom ebnm ebnm_point_normal
#'
#' @export
#'
flash <- function(data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L) {
fl <- flash_init(data, S = S, var_type = var_type)
fl <- flash_greedy(fl, Kmax = greedy_Kmax, ebnm_fn = ebnm_fn,
verbose = verbose)
if (backfit) {
fl <- flash_backfit(fl, verbose = verbose)
}
if (nullcheck) {
fl <- flash_nullcheck(fl, verbose = verbose)
}
return(fl)
}