`ddml` is an implementation of the double/debiased machine learning estimators proposed by Chernozhukov et al. (2018). The key feature of `ddml` is the straightforward estimation of nuisance parameters using (short-)stacking (Wolpert, 1992), which allows multiple machine learners to be combined to increase robustness to the underlying data generating process. See also Ahrens et al. (2024) for a detailed illustration of the practical benefits of combining DDML with (short-)stacking.

`ddml` is the sister R package to our Stata package, mirroring its key features while also leveraging R to simplify estimation with user-provided machine learners and/or sparse matrices. See also Ahrens et al. (2023) for additional discussion of the supported causal models and the benefits of (short-)stacking.
Install the latest development version from GitHub (requires the `devtools` package):

``` r
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("thomaswiemann/ddml", dependencies = TRUE)
```
Install the latest public release from CRAN:
install.packages("ddml")
To illustrate `ddml` with a simple example, consider the included random subsample of 5,000 observations from the data of Angrist & Evans (1998). The data contain information on the labor supply of mothers, their children, and demographic characteristics. See `?AE98` for details.
``` r
# Load ddml and set seed
library(ddml)
set.seed(75523)

# Construct variables from the included Angrist & Evans (1998) data
y <- AE98[, "worked"]
D <- AE98[, "morekids"]
Z <- AE98[, "samesex"]
X <- AE98[, c("age", "agefst", "black", "hisp", "othrace", "educ")]
```
`ddml_late` estimates the local average treatment effect (LATE) using double/debiased machine learning (see `?ddml_late`). Since the statistical properties of machine learners depend heavily on the underlying (unknown!) structure of the data, adaptively combining multiple machine learners can increase robustness. In the snippet below, `ddml_late` estimates the LATE with short-stacking based on three base learners:

- linear regression (see `?ols`)
- lasso (see `?mdl_glmnet`)
- gradient boosting (see `?mdl_xgboost`)
``` r
# Estimate the local average treatment effect using short-stacking with base
# learners ols, lasso (via glmnet), and gradient boosting (via xgboost).
late_fit_short <- ddml_late(y, D, Z, X,
                            learners = list(list(fun = ols),
                                            list(fun = mdl_glmnet),
                                            list(fun = mdl_xgboost,
                                                 args = list(nrounds = 100,
                                                             max_depth = 1))),
                            ensemble_type = 'nnls1',
                            shortstack = TRUE,
                            sample_folds = 10,
                            silent = TRUE)
summary(late_fit_short)
#> LATE estimation results:
#>
#>         Estimate Std. Error   t value  Pr(>|t|)
#> nnls1 -0.2105019   0.195529 -1.076576 0.2816698
```
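Short-stacking computes the ensemble weights from the cross-fitted predictions and is therefore much cheaper than conventional stacking, which cross-validates the base learners within every sample fold. The package supports both; below is a minimal sketch that reuses the call above with the `shortstack` flag switched off (and is noticeably slower to run):

``` r
# Same specification as above, but with conventional stacking rather than
# short-stacking: base learners are cross-validated within each sample fold,
# which increases computation time.
late_fit_stack <- ddml_late(y, D, Z, X,
                            learners = list(list(fun = ols),
                                            list(fun = mdl_glmnet),
                                            list(fun = mdl_xgboost,
                                                 args = list(nrounds = 100,
                                                             max_depth = 1))),
                            ensemble_type = 'nnls1',
                            shortstack = FALSE,
                            sample_folds = 10,
                            silent = TRUE)
summary(late_fit_stack)
```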
Check out our articles to learn more:
vignette("ddml")
is a more detailed introduction toddml
vignette("stacking")
discusses computational benefits of short-stackingvignette("new_ml_wrapper")
shows how to write user-provided base learnersvignette("sparse")
illustrates support of sparse matrices (see?Matrix
)vignette("did")
discusses integration with the diff-in-diff packagedid
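As a preview of what `vignette("new_ml_wrapper")` covers, here is a minimal sketch of a user-provided base learner. It assumes the wrapper interface described there: a function taking `y` and `X` that returns a fitted object with an associated `predict(object, newdata)` method returning fitted values. The `mdl_logit` name is illustrative; see the vignette for the exact requirements (e.g., handling of sample weights).

``` r
# Sketch of a user-provided base learner: logistic regression via stats::glm().
# Assumes ddml expects a function of (y, X) that returns an object with a
# predict(object, newdata) method; see vignette("new_ml_wrapper") for details.
mdl_logit <- function(y, X, ...) {
  fit <- stats::glm(y ~ ., data = data.frame(y = y, X), family = binomial(), ...)
  class(fit) <- c("mdl_logit", class(fit))
  fit
}

predict.mdl_logit <- function(object, newdata, ...) {
  class(object) <- setdiff(class(object), "mdl_logit")  # dispatch to predict.glm()
  as.numeric(predict(object, newdata = data.frame(newdata), type = "response"))
}

# The new learner could then be passed alongside the built-in wrappers, e.g.,
# learners = list(list(fun = mdl_logit), list(fun = mdl_xgboost))
```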
For additional applied examples, see our case studies:
vignette("example_401k")
revisits the effect of 401k participation on retirement savingsvignette("example_BLP95")
considers flexible demand estimation with endogenous prices
`ddml` is built to easily (and quickly) estimate common causal parameters with multiple machine learners. With its support for short-stacking and sparse matrices, and its easy-to-learn syntax, we hope `ddml` is a useful complement to DoubleML, the expansive R and Python package. DoubleML supports many advanced features such as multiway clustering and stacking.
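To give a sense of the other common causal parameters the package covers, here is a hedged sketch of the partially linear model estimator with the same base learners, assuming `ddml_plm(y, D, X, ...)` mirrors the `ddml_late()` interface shown above:

``` r
# Sketch: partially linear model with the same set of base learners.
# Assumes ddml_plm(y, D, X, ...) mirrors the ddml_late() interface above.
plm_fit_short <- ddml_plm(y, D, X,
                          learners = list(list(fun = ols),
                                          list(fun = mdl_glmnet),
                                          list(fun = mdl_xgboost,
                                               args = list(nrounds = 100,
                                                           max_depth = 1))),
                          ensemble_type = 'nnls1',
                          shortstack = TRUE,
                          sample_folds = 10,
                          silent = TRUE)
summary(plm_fit_short)
```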
Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). “ddml: Double/debiased machine learning in Stata.” https://arxiv.org/abs/2301.09397
Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2024). “Model averaging and double machine learning.” https://arxiv.org/abs/2401.01645
Angrist J, Evans W (1998). “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review, 88(3), 450-477.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C B, Newey W, Robins J (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, 21(1), C1-C68.
Wolpert D H (1992). “Stacked generalization.” Neural Networks, 5(2), 241-259.