🤠 📿 The Highly Adaptive Lasso

R/hal9001

Project status: WIP (initial development is in progress, but there has not yet been a stable, usable release suitable for the public). License: GPL v3

Fast and scalable estimation procedure for the Highly Adaptive LASSO

Authors: Jeremy Coyle and Nima Hejazi


What’s hal9001?

hal9001 is an R package providing an implementation of the scalable Highly Adaptive LASSO (HAL), a nonparametric regression estimator that applies L1-regularized regression (i.e., the LASSO) to a design matrix composed of indicator basis functions corresponding to a set of covariates and interactions thereof. Recent theoretical results show that HAL has several properties that make it well suited to estimating highly complex functions. In particular, HAL converges to the true regression function at a rate faster than n^(-1/4) under only mild assumptions (the true function need only be càdlàg with finite sectional variation norm), which makes it useful for inference in problem settings where causal parameters are estimated via data-adaptive techniques (i.e., machine learning), as is the case with Targeted Minimum Loss-Based Estimation (TMLE). While it is certainly possible to implement HAL purely in R, the algorithm is computationally intensive, so writing its core routines in C++ (and making these available in R via the Rcpp framework) ought to provide significant efficiency gains. hal9001 is just such an implementation.
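
To make the description above concrete, the following is a minimal base-R sketch of how a zero-order HAL design matrix can be assembled, assuming knots are placed at the observed covariate values and every non-empty subset of covariates (up to `max_degree`) contributes one block of indicator columns. The helper `make_hal_basis()` is hypothetical and is not part of the hal9001 API; the package builds its basis in C++ and may differ in details such as column ordering and duplicate handling.

```r
# Hypothetical helper (not part of hal9001's API): build a zero-order HAL
# design matrix whose columns are indicators I(x_s >= knot_s), one column per
# (covariate subset s, knot) pair, with knots taken from the rows of `knots`.
make_hal_basis <- function(X, knots = X, max_degree = ncol(X)) {
  X <- as.matrix(X)
  knots <- as.matrix(knots)
  p <- ncol(X)
  max_degree <- min(max_degree, p)
  # all non-empty covariate subsets with at most max_degree members
  subsets <- unlist(
    lapply(seq_len(max_degree), function(k) combn(p, k, simplify = FALSE)),
    recursive = FALSE
  )
  blocks <- lapply(subsets, function(s) {
    cols <- vapply(seq_len(nrow(knots)), function(i) {
      # indicator that every covariate in s is at or above the knot value
      ind <- rep(TRUE, nrow(X))
      for (j in s) ind <- ind & (X[, j] >= knots[i, j])
      as.numeric(ind)
    }, numeric(nrow(X)))
    matrix(cols, nrow = nrow(X))
  })
  do.call(cbind, blocks)
}

# usage with simulated data shaped like the Example section (n = 100, p = 3)
set.seed(1)
x <- matrix(rnorm(100 * 3), 100, 3)
basis <- make_hal_basis(x)
dim(basis)  # 100 700 (7 covariate subsets x 100 knots)
```

With n observations and p covariates the matrix can have up to n(2^p - 1) columns, which is why the computationally heavy parts are worth writing in C++. The L1-penalized fit on such a matrix could then be obtained with, for example, `glmnet::cv.glmnet(basis, y)`.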

For detailed discussions of the Highly Adaptive LASSO estimator, the interested reader might consider consulting Benkeser and van der Laan (2016), van der Laan (2017a), and van der Laan (2017b).
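
For orientation, the zero-order estimator analyzed in those references can be written roughly as follows for squared-error loss, with knots at the observed covariate values X_i and the bound M on the L1 norm typically chosen by cross-validation. This is a simplified statement (here the indicator means x_j >= X_{i,j} for every j in s), and the papers give the precise conditions; the constrained form shown is equivalent to the usual penalized LASSO form for a suitable penalty λ.

```latex
\psi_\beta(x) = \beta_0 + \sum_{\substack{s \subseteq \{1,\dots,p\} \\ s \neq \emptyset}} \; \sum_{i=1}^{n} \beta_{s,i}\, \mathbb{I}\{x_s \ge X_{i,s}\},
\qquad
\hat\beta = \operatorname*{arg\,min}_{\beta:\ \lVert \beta \rVert_1 \le M} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - \psi_\beta(X_i) \bigr)^2
```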


Installation

You can install the development version of hal9001 from GitHub via devtools with

devtools::install_github("tlverse/hal9001", build_vignettes = FALSE)

Issues

If you encounter any bugs or have any specific feature requests, please file an issue on GitHub (https://github.com/tlverse/hal9001/issues).


Example

This minimal example shows how to use hal9001 to obtain predictions based on the Highly Adaptive LASSO. For details on the properties of the estimator, the interested reader is referred to Benkeser and van der Laan (2016) and van der Laan (2017a).

# load the hal9001 package
library(hal9001)
#> Loading required package: Rcpp
#> hal9001 v0.2.0: The Scalable Highly Adaptive Lasso

# simulate data
set.seed(385971)
n <- 100
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] * sin(x[, 2]) + rnorm(n, mean = 0, sd = 0.2)

# fit the HAL regression
hal_fit <- fit_hal(X = x, Y = y)
#> [1] "Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over."
hal_fit$times
#>                   user.self sys.self elapsed user.child sys.child
#> design_matrix         0.002    0.001   0.003          0         0
#> remove_duplicates     0.004    0.000   0.004          0         0
#> reduce_basis          0.000    0.000   0.000          0         0
#> lasso                 0.279    0.000   0.279          0         0
#> total                 0.285    0.001   0.286          0         0

# in-sample (training) predictions; the resulting MSE is optimistic
preds <- predict(hal_fit, new_data = x)
hal_mse <- mean((preds - y)^2)
hal_mse
#> [1] 0.009266171
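
The `times` output in the example breaks the fit into stages, one of which is `remove_duplicates`. Identical basis columns carry no extra information for the LASSO, so they can be dropped before fitting. A minimal base-R sketch of that idea, assuming the basis is stored as a dense numeric matrix, is shown below; the helper name `drop_duplicate_columns()` is hypothetical, and hal9001's own C++ routine may handle this differently (for example, by keeping track of which columns were merged).

```r
# Hypothetical helper (not part of hal9001's API): drop basis columns that
# are exact copies of an earlier column, keeping the first occurrence.
drop_duplicate_columns <- function(basis) {
  basis <- as.matrix(basis)
  # duplicated() on a matrix compares rows, so transpose to compare columns
  keep <- !duplicated(t(basis))
  basis[, keep, drop = FALSE]
}

# toy example: columns 1 and 3 are identical
B <- cbind(c(1, 1, 0), c(0, 1, 1), c(1, 1, 0))
drop_duplicate_columns(B)  # returns a 3 x 2 matrix (the third column is dropped)
```

Dropping exact duplicates does not change the set of functions the LASSO can represent; it only shrinks the problem handed to the solver.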

Contributions

hal9001 is the primary implementation of the Highly Adaptive LASSO, a nonparametric function estimation procedure with numerous optimality properties. While contributions are very welcome, we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.


Citation

If you use the hal9001 R package, please cite the following:

    @misc{coyle2018hal9001,
      author = {Coyle, Jeremy R and Hejazi, Nima S},
      title = {{hal9001}: The Scalable {Highly Adaptive LASSO}},
      year  = {2018},
      howpublished = {\url{https://github.com/tlverse/hal9001}},
      url = {https://doi.org/DOI_TBD},
      doi = {DOI_TBD}
    }

License

© 2017-2018 Jeremy R. Coyle & Nima S. Hejazi

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.


References

Benkeser, David, and Mark J van der Laan. 2016. “The Highly Adaptive Lasso Estimator.” In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE. https://doi.org/10.1109/dsaa.2016.93.

van der Laan, Mark J. 2017a. “A Generally Efficient Targeted Minimum Loss Based Estimator Based on the Highly Adaptive Lasso.” The International Journal of Biostatistics. De Gruyter. https://doi.org/10.1515/ijb-2015-0097.

———. 2017b. “Finite Sample Inference for Targeted Learning.” ArXiv E-Prints.