R/sl3: modern Super Learning with pipelines

Project Status: Active (the project has reached a stable, usable state and is being actively developed). License: GPL v3. Join the chat at https://gitter.im/sl3-Rpkg/Lobby

A modern implementation of the Super Learner algorithm for ensemble learning and model stacking

Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin


What’s sl3?

sl3 is a modern implementation of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007). The Super Learner algorithm performs ensemble learning in one of two fashions:

  1. The discrete Super Learner can be used to select the best prediction algorithm from among a supplied library of machine learning algorithms (“learners” in the sl3 nomenclature) – that is, the discrete Super Learner is the single learning algorithm that minimizes the cross-validated risk with respect to an appropriate loss function.
  2. The ensemble Super Learner can be used to assign weights to a set of specified learning algorithms (from a user-supplied library of such algorithms) so as to create a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been referred to as stacked regression (Breiman 1996) and stacked generalization (Wolpert 1992).
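The distinction between the two fashions can be illustrated with a minimal base-R sketch. It does not use sl3: the simulated data, the three toy learners, and all object names are hypothetical, and the convex weights are found with a softmax reparameterization passed to `optim()`, which is a stand-in chosen for self-containment rather than the metalearner sl3 uses.

```r
# Illustrative base-R sketch; sl3 is not used and all names are hypothetical.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.3))

# Each "learner" takes training data and returns a prediction function.
learners <- list(
  mean   = function(d) { m <- mean(d$y); function(nd) rep(m, nrow(nd)) },
  linear = function(d) { f <- lm(y ~ x, data = d); function(nd) predict(f, nd) },
  cubic  = function(d) { f <- lm(y ~ poly(x, 3), data = d); function(nd) predict(f, nd) }
)

# Out-of-fold predictions from V-fold cross-validation, one column per learner.
V <- 5
folds <- sample(rep(seq_len(V), length.out = n))
cv_preds <- matrix(NA_real_, n, length(learners),
                   dimnames = list(NULL, names(learners)))
for (v in seq_len(V)) {
  train <- dat[folds != v, ]
  valid <- dat[folds == v, ]
  for (k in names(learners)) {
    cv_preds[folds == v, k] <- learners[[k]](train)(valid)
  }
}

# Cross-validated risk of each learner under squared-error loss.
cv_risk <- colMeans((cv_preds - dat$y)^2)

# Discrete Super Learner: the single learner with the smallest CV risk.
discrete_sl <- names(which.min(cv_risk))

# Ensemble Super Learner: convex weights (non-negative, summing to one)
# that minimize the CV risk of the weighted combination.
ensemble_risk <- function(a) {
  a <- a - max(a)                 # guard against overflow in exp()
  w <- exp(a) / sum(exp(a))       # softmax maps any real vector to convex weights
  mean((cv_preds %*% w - dat$y)^2)
}
opt <- optim(rep(0, length(learners)), ensemble_risk, method = "BFGS")
a_hat <- opt$par - max(opt$par)
weights <- setNames(exp(a_hat) / sum(exp(a_hat)), names(learners))
```

Inspecting `cv_risk`, `discrete_sl`, and `round(weights, 3)` shows which learner the discrete selector picks and how the ensemble distributes weight across the library. Both are computed from the same out-of-fold predictions, which is what keeps the comparison honest.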

Installation

Install the most recent stable release from GitHub via devtools:

devtools::install_github("tlverse/sl3")

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.


Examples

sl3 streamlines applying screening algorithms and learning algorithms, combining both types of algorithms into a stacked regression model, and cross-validating the whole process. The best way to understand this is to see the sl3 package in action:

set.seed(49753)
suppressMessages(library(data.table))
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(SuperLearner)
#> Loading required package: nnls
#> Super Learner
#> Version: 2.0-24
#> Package created on 2018-08-10
library(origami)
#> origami: Generalized Cross-Validation Framework
#> Version: 1.0.0
library(sl3)

# load example data set
data(cpp)
cpp <- cpp %>%
  dplyr::filter(!is.na(haz)) %>%
  mutate_all(~ replace(., is.na(.), 0))

# use covariates of interest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
task <- sl3_Task$new(cpp, covariates = covars, outcome = "haz")

# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")

# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loaded glmnet 2.0-16
preds <- stack_fit$predict()
head(preds)
#>    Lrnr_pkg_SuperLearner_SL.glmnet Lrnr_glm_TRUE
#> 1:                      0.35345519    0.36298498
#> 2:                      0.35345519    0.36298498
#> 3:                      0.24554305    0.25993072
#> 4:                      0.24554305    0.25993072
#> 5:                      0.24554305    0.25993072
#> 6:                      0.02953193    0.05680264
#>    Lrnr_pkg_SuperLearner_screener_screen.glmnet___Lrnr_glm_TRUE
#> 1:                                                   0.36228209
#> 2:                                                   0.36228209
#> 3:                                                   0.25870995
#> 4:                                                   0.25870995
#> 5:                                                   0.25870995
#> 6:                                                   0.05600958
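The stack above returns one column of predictions per learner and does not yet combine them. A hedged sketch of the remaining step, fitting an ensemble Super Learner over a stack, might look like the following. It assumes the installed sl3 version exports `Lrnr_sl`, `Lrnr_nnls`, and `Lrnr_mean` with the arguments shown, and that the `nnls` package is available for the metalearner. The smaller two-learner library is a choice made here to avoid the SuperLearner and glmnet dependencies, not part of the original example.

```r
library(sl3)

# Rebuild the task from the example data (same covariates and outcome as above),
# using base R so that no other packages are required.
data(cpp)
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
cpp <- as.data.frame(cpp)
cpp <- cpp[!is.na(cpp$haz), c(covars, "haz")]
cpp[is.na(cpp)] <- 0
task <- sl3_Task$new(cpp, covariates = covars, outcome = "haz")

# A small illustrative library: an intercept-only learner and a GLM.
stack <- Stack$new(Lrnr_mean$new(), Lrnr_glm$new())

# Lrnr_sl cross-validates the stack and fits the metalearner on the
# cross-validated predictions (assumed API; check ?Lrnr_sl for your version).
sl <- Lrnr_sl$new(learners = stack, metalearner = Lrnr_nnls$new())
sl_fit <- sl$train(task)

# Ensemble predictions on the training task: one value per observation.
sl_preds <- sl_fit$predict()
head(sl_preds)
```

Printing `sl_fit` should report the fitted metalearner, which indicates how much each learner in the stack contributes to the ensemble.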

Contributions

It is our hope that sl3 will grow to be widely used for creating stacked regression models and the cross-validation of pipelines that make up such models, as well as the variety of other applications in which the Super Learner algorithm plays a role. To that end, contributions are very welcome, though we ask that interested contributors consult our contribution guidelines prior to submitting a pull request.


Citation

After using the sl3 R package, please cite the following:

    @misc{coyle2018sl3,
      author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
        Sofrygin, Oleg},
      title = {{sl3}: Modern Pipelines for Machine Learning and {Super
        Learning}},
      year  = {2018},
      howpublished = {\url{https://github.com/tlverse/sl3}},
      note = {{R} package version 1.1.0},
      url = {https://doi.org/10.5281/zenodo.1342294},
      doi = {10.5281/zenodo.1342294}
    }

License

© 2017-2018 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Oleg Sofrygin

The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.


References

Breiman, Leo. 1996. “Stacked Regressions.” Machine Learning 24 (1). Springer: 49–64.

van der Laan, Mark J., Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).

Wolpert, David H. 1992. “Stacked Generalization.” Neural Networks 5 (2). Elsevier: 241–59.