Dear Reviewer:

We thank you for your time in reviewing our submission. We understand that you are performing a service to the community. In order to ensure that review of this code is as easy as possible, we have:

Included a one-line script to recreate all results (repro_results.sh)
Heavily commented and documented the code
Included an overview of the codebase and the files.

Thank you for reviewing our submission!

Description of Files

The files are organized as follows:

data_structures contains all of the data structures used in our experiments, e.g., forests, trees, nodes, and histograms.
- the wrappers subdirectory contains convenience classes to instantiate models of various types, e.g., a RandomForestClassifier is a forest classifier with bootstrap=True (indicating to draw a bootstrap sample of the n datapoints for each tree), feature_subsampling=SQRT (indicating to consider only sqrt(F) features of the original F features at each node split), etc.
the experiments subdirectory contains all the code for our core experiments
- experiments/runtime_exps contains the script (compare_runtimes.py) to reproduce the results of Tables 1 and 2, as well as the results of running that script (the files ending in _profile or _dict)
- experiments/budget_exps contains the script (compare_budgets.py) to reproduce the results of Tables 3 and 4, as well as the results of running that script (the files ending in _dict)
- experiments/sklearn_exps contains the script (compare_baseline_implementations.py) to reproduce the results of Table 6 in Appendix 4
- experiments/scaling_exps contains the scripts (investigate_scaling.py and make_scaling_plot.py) to reproduce Appendix Figure 1 in Appendix 2
the tests subdirectory tests that we wrote to verify the correctness of our implementations
- tests/feature_importance_tests.py is also used to regenerate the results in Table 5
  - You can reproduce the results for just Table 5 by running tests/feature_importance_tests.py. The results will be stored in the first 4 lines of tests/stat_test_stability_log/reproduce_stability.csv file.
the utils directory contains helper code for training forest-based models
- utils/solvers.py includes the core implementation of MABSplit in the solve_mab() function

Reproduce the tables

To reproduce the results in all the tables, and to reproduce the figure in Appendix 2, please run repro_script.sh. This may take many hours.

Name		Name	Last commit message	Last commit date
Latest commit History 637 Commits
.github/workflows		.github/workflows
data_structures		data_structures
experiments		experiments
hyperparams_sweep		hyperparams_sweep
tests		tests
utils		utils
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
environment.yml		environment.yml
repro_script.sh		repro_script.sh
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github/workflows

.github/workflows

data_structures

data_structures

experiments

experiments

hyperparams_sweep

hyperparams_sweep

tests

tests

utils

utils

.DS_Store

.DS_Store

.gitignore

.gitignore

README.md

README.md

init.py

init.py

environment.yml

environment.yml

repro_script.sh

repro_script.sh

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Dear Reviewer:

Description of Files

Reproduce the tables

About

Releases

Packages

Contributors 3

Languages

ThrunGroup/FastForest

Folders and files

Latest commit

History

Repository files navigation

Dear Reviewer:

Description of Files

Reproduce the tables

About

Resources

Stars

Watchers

Forks

Languages