R Implementation of Bayesian Additive Semi-Multivariate Decision Trees (BAMDT).
Reference:
Luo, Z. T., Sang, H., & Mallick, B. (2022). BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression. Proceedings of the 39th International Conference on Machine Learning (ICML 2022).
- `Demo.R`: Demo code for fitting BAMDT on a U-shape domain
- `Tree.R`: Class of semi-structured decision trees
- `Model.R`: Class of BAMDT models
- `ComplexDomainFun.R`: Utility functions for U-shape domain
- `SimData.R`: Code to simulate data on a U-shape domain (no need to run unless you would like to regenerate data)
- `input_U.RData`: Data sets generated by running `SimData.R`
The code depends on the following R packages: `R6`, `collections`, `igraph`, `fdaPDE`, `BART`, `sf`, `ggplot2`.
Please make sure they are installed before running the demo code.
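One way to check the dependencies before running the demo is sketched below. The variable names `required_pkgs` and `missing_pkgs` are illustrative and not part of this repository, and the installation line is left commented out because it assumes every package is available from your configured repository.

```r
# Packages listed as dependencies in this README
required_pkgs <- c("R6", "collections", "igraph", "fdaPDE", "BART", "sf", "ggplot2")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install (assumes the packages are available from your repository):
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```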
`model = Model$new(Y, X, graphs, projections, hyperpar, X_new, projections_new)`
creates a BAMDT model object named `model`.
Parameters:
- `Y`: Numeric response vector of length `n`.
- `X`: Numeric matrix of unstructured training features of size `n * p`.
- `graphs`: List of `M` spatial graphs, where `M` is the number of trees. Each graph should be an `igraph` object.
- `projections`: Integer matrix of size `n * M`, where `projections[i, j]` is the nearest knot index corresponding to training observation `i` for tree `j`.
- `hyperpar`: Named vector of hyperparameters with the following elements:
  - `hyperpar['M']`: Number of trees `M`.
  - `hyperpar['sigmasq_mu']`: Variance of the prior for $\mu$, i.e., $\sigma^2_\mu$.
  - `hyperpar['q']`: Quantile used to calibrate the prior for the noise variance $\sigma^2$.
  - `hyperpar['nu']`: Degrees of freedom of the inverse-$\chi^2$ prior for the noise variance $\sigma^2$.
  - `hyperpar['alpha']`: Hyperparameter $\alpha$ in the tree generating process.
  - `hyperpar['beta']`: Hyperparameter $\beta$ in the tree generating process.
  - `hyperpar['numcut']`: Number of candidate split points for unstructured features.
  - `hyperpar['prob_split_by_x']`: Probability of performing an unstructured split.
- `X_new`: Numeric matrix of unstructured test features of size `n_new * p`.
- `projections_new`: Integer matrix of size `n_new * M`, where `projections_new[i, j]` is the nearest knot index corresponding to test observation `i` for tree `j`.
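The argument shapes above can be checked before constructing the model. The following is a minimal sketch on toy data, not the setup used in `Demo.R`: the helper `check_bamdt_inputs`, the toy sizes, the number of knots `n_knots`, and all hyperparameter values are assumptions chosen for illustration (several follow common BART-style defaults). The `graphs` argument is left out of the check because building it requires `igraph` and a spatial mesh.

```r
set.seed(1)

# Toy sizes (illustrative only); n_knots is a hypothetical number of knots per graph
n <- 50; n_new <- 10; p <- 3; M <- 5; n_knots <- 20

Y <- rnorm(n)
X <- matrix(runif(n * p), nrow = n, ncol = p)
X_new <- matrix(runif(n_new * p), nrow = n_new, ncol = p)

# Random stand-ins for nearest-knot indices; real values come from the spatial graphs
projections <- matrix(sample.int(n_knots, n * M, replace = TRUE), nrow = n, ncol = M)
projections_new <- matrix(sample.int(n_knots, n_new * M, replace = TRUE),
                          nrow = n_new, ncol = M)

# Named hyperparameter vector; the values are assumptions, not recommendations
hyperpar <- c(
  M = M,
  sigmasq_mu = (0.5 / (2 * sqrt(M)))^2,  # BART-style choice, assumes a rescaled response
  q = 0.9,
  nu = 3,
  alpha = 0.95,
  beta = 2,
  numcut = 100,
  prob_split_by_x = 0.5
)

# Hypothetical helper: stops with an error if the documented shapes do not match
check_bamdt_inputs <- function(Y, X, projections, hyperpar, X_new, projections_new) {
  M <- hyperpar[["M"]]
  stopifnot(
    is.numeric(Y), is.matrix(X), nrow(X) == length(Y),
    all(dim(projections) == c(length(Y), M)),
    is.matrix(X_new), ncol(X_new) == ncol(X),
    all(dim(projections_new) == c(nrow(X_new), M))
  )
  invisible(TRUE)
}

inputs_ok <- check_bamdt_inputs(Y, X, projections, hyperpar, X_new, projections_new)

# With Model.R sourced and a list `graphs` of M igraph objects available:
# model <- Model$new(Y, X, graphs, projections, hyperpar, X_new, projections_new)
```

Running the check first tends to surface dimension mismatches with a clearer error than one raised deep inside the sampler.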
To fit a BAMDT model and predict for test data, use
`model$Fit(init_val, MCMC, BURNIN, THIN, seed = 1234, save_partitions = FALSE)`
Parameters:
- `init_val`: Named list of initial values with the following element:
  - `init_val[['sigmasq_y']]`: Initial value for the noise variance $\sigma^2$.
- `MCMC`: Number of MCMC iterations.
- `BURNIN`: Number of burn-in iterations.
- `THIN`: Thinning interval. One MCMC sample is retained every `THIN` iterations, so the number of posterior samples is `npost = (MCMC - BURNIN) / THIN`.
- `seed`: Random seed.
- `save_partitions`: Logical value indicating whether posterior samples of partitions are saved. Default is `FALSE` (recommended). Setting `save_partitions = TRUE` is very memory intensive.
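As a worked example of the sample-count formula, the sketch below computes `npost` for one illustrative choice of `MCMC`, `BURNIN`, and `THIN`. The numbers, the initial value in `init_val`, and the exact set of retained iterations (`kept_iters`) are assumptions for illustration; the sampler may index retained draws differently.

```r
# Illustrative MCMC settings (not recommendations from this repository)
MCMC   <- 30000
BURNIN <- 15000
THIN   <- 5

# Number of posterior samples retained after burn-in and thinning
npost <- (MCMC - BURNIN) / THIN
stopifnot(npost == floor(npost))  # choose settings so that npost is an integer

# One plausible set of retained iterations (assumption): every THIN-th after burn-in
kept_iters <- seq(BURNIN + THIN, MCMC, by = THIN)

# Initial value for the noise variance (illustrative)
init_val <- list(sigmasq_y = 1)

# With a constructed `model` object:
# model$Fit(init_val, MCMC, BURNIN, THIN, seed = 1234, save_partitions = FALSE)
```

Here `npost` is 3000, which is also the first dimension of each posterior output stored on the model object.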
The `model` object has the following public members:
- `model$sigmasq_y_out`: Posterior samples of the noise variance $\sigma^2$.
- `model$g_out`: `npost * n * M` array of posterior samples of (in-sample) fitted values from each tree.
- `model$Y_new_out`: `npost * n_new` matrix of posterior samples of (out-of-sample) predicted values.
- `model$importance_out`: `npost * (p + 1)` matrix of posterior samples of feature importance metrics.
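The outputs listed above can be summarized with base R. The sketch below uses simulated stand-ins with the documented shapes rather than a fitted model, so every number in it is synthetic. It also assumes that the in-sample fit is the sum of the per-tree values in `g_out`, which follows from the additive structure but ignores any rescaling of the response the implementation may apply.

```r
set.seed(1)

# Simulated stand-ins with the documented shapes (not real BAMDT output)
npost <- 200; n <- 10; n_new <- 5; M <- 3; p <- 2
g_out          <- array(rnorm(npost * n * M), dim = c(npost, n, M))
Y_new_out      <- matrix(rnorm(npost * n_new), nrow = npost, ncol = n_new)
importance_out <- matrix(runif(npost * (p + 1)), nrow = npost, ncol = p + 1)
sigmasq_y_out  <- 1 / rgamma(npost, shape = 3, rate = 3)

# In-sample fitted values: sum over trees for each draw (assumption), then average draws
fitted_draws <- apply(g_out, c(1, 2), sum)  # npost x n matrix
fitted_mean  <- colMeans(fitted_draws)

# Out-of-sample posterior mean and 95% credible interval per test observation
pred_mean <- colMeans(Y_new_out)
pred_ci   <- apply(Y_new_out, 2, quantile, probs = c(0.025, 0.975))  # 2 x n_new

# Posterior means of the feature importance metrics and the noise variance
importance_mean <- colMeans(importance_out)
sigmasq_mean    <- mean(sigmasq_y_out)
```

With a fitted model, replace the simulated objects with `model$g_out`, `model$Y_new_out`, `model$importance_out`, and `model$sigmasq_y_out`.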