Instructions & explanations

Implementation

To implement Boosted Trees on a Diet (ToaD), we made some adaptations to the LightGBM framework. We added a new penalizer to the serial tree learner; see the functions enabled via the mrf_ pointer in src/treelearner/serial_tree_learner.cpp for details. Moreover, we added various helper functions, implemented in src/treelearner/memory_restricted_forest.hpp.

Experiments

The experiments folder provides the means to fetch the tested datasets and run the Trees on a Diet (ToaD) variant. The steps are split up to allow for short runtimes. The buildToaD.sh script (or buildToaD-windows.sh on Windows) builds LightGBM with the ToaD extension and automatically starts the experiments. (Running .sh scripts on Windows might require additional steps or a specific shell, such as Git Bash.) The prerequisites for building the project can be found in the LightGBM documentation. Depending on your system, training and evaluating the different model configurations might take several hours to days! Modify the build script to enable or disable GPU usage for a speedup.
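For example, assuming the build script is invoked the same way as runExperiments.sh below (a sketch, not part of the repository's documented interface):

sh buildToaD.sh            # Linux/macOS: builds LightGBM with the ToaD extension and starts the experiments
sh buildToaD-windows.sh    # Windows, e.g. from Git Bash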

Getting Datasets

For now, we assume you install the required Python packages yourself; a requirements.txt will be added later.

python/get_datasets.py downloads the datasets. The files are stored in python/data with an 80/20 training/testing split.
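A typical invocation from the repository root, assuming the script takes no command-line arguments, would be:

python python/get_datasets.py    # downloads the datasets and writes the 80/20 splits to python/data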

Running ToaD

./runExperiments.sh checks for datasets in the data folder following the naming scheme name.train. The corresponding file with test data is assumed to be called name.test. You need to call the script with the respective LightGBM build path, e.g. sh runExperiments.sh "../lightgbm" (Mac/Ubuntu) or sh runExperiments.sh "../Release/lightgbm" (Windows).
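As an illustration of the naming scheme (the dataset name magic is purely hypothetical; any name.train/name.test pair in the data folder is picked up):

magic.train    # training split found by runExperiments.sh in the data folder
magic.test     # matching test split, expected under the same base name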

❗ For every dataset, the script runs 40,620 configurations: (26 feature penalties × 26 threshold penalties, plus one run without split and threshold penalties) × 20 tree sizes × 3 depths ❗

For testing purposes, you might want to modify the for-loops inside the script.

for i in $(seq -10 1 15); do        # penalty exponent, 26 values (see note below)
    for j in $(seq -10 1 15); do    # penalty exponent, 26 values
        for tree in 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 100 200 500 1000 10000; do    # 20 tree sizes
            for depth in 3 5 7; do  # 3 depths

(Inside the script, i and j are converted to powers of two and represent the penalties.)
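For example, the conversion of a loop counter into a penalty value could look like the following (a sketch only; the exact mapping and the parameter it is passed to are defined in runExperiments.sh):

i=-3
penalty=$(awk "BEGIN { print 2^$i }")    # 2^-3 = 0.125; positive i yields penalties above 1
echo "penalty for i=$i: $penalty"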

Evaluation of ToaD

Again, we assume you install the required Python packages yourself.

The data inside the models is transformed into .csv files with the python/evaluate_models.py script. This might take longer than you would expect, as accuracy metrics need to be calculated. The .csv files are stored in data/datasetname/last.csv. Afterwards, graphical representations similar to ours can be generated by calling the python/plot.py script.
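Assuming both scripts run without additional command-line arguments, the evaluation step then reduces to two calls:

python python/evaluate_models.py    # writes the accuracy metrics per model to data/datasetname/last.csv
python python/plot.py               # turns the .csv files into figures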

To enable figure creation without running the whole training and evaluation process, the results of our experiments are placed in the respective results directory.