To implement Boosted Trees on a Diet (ToaD) we made some adaptations to the LightGBM framework. We included a new penalizer in the serial tree learner; see the mrf_pointer-enabled functions in src/treelearner/serial_tree_learner.cpp for details. Moreover, we added various helper functionalities, implemented in src/treelearner/memory_restricted_forest.hpp.
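To illustrate the idea behind the penalizer (this is a sketch, not the actual C++ code): the split gain can be reduced whenever a split would introduce a feature or threshold that the forest has not used yet, which biases the learner towards reusing known splits. All names below (`penalized_gain`, `feature_penalty`, `threshold_penalty`) are illustrative:

```python
def penalized_gain(gain, feature, threshold, used_features, used_thresholds,
                   feature_penalty, threshold_penalty):
    """Reduce the raw split gain when a split would introduce a feature or
    threshold the forest has not used before (illustrative sketch)."""
    if feature not in used_features:
        gain -= feature_penalty
    if (feature, threshold) not in used_thresholds:
        gain -= threshold_penalty
    return gain

# A split reusing a known feature/threshold pair keeps its full gain:
used_f = {0, 3}
used_t = {(0, 0.5)}
print(penalized_gain(1.0, 0, 0.5, used_f, used_t, 0.25, 0.25))  # → 1.0
# A split on an entirely new feature pays both penalties:
print(penalized_gain(1.0, 1, 0.7, used_f, used_t, 0.25, 0.25))  # → 0.5
```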
The experiments folder provides the means to fetch the tested datasets and run the Trees on a Diet (ToaD) variant.
The steps are split to allow short runtimes.
The buildToaD.sh script (or buildToaD-windows.sh for Windows) builds LightGBM with the ToaD extension and automatically starts the experiments. (Running .sh scripts on Windows might require additional steps or a specific shell, such as Git Bash.)
Prerequisites to build the project can be found in the LightGBM documentation.
Depending on your system, training and evaluating the different model configurations might take several hours to days! Please modify the script to enable or disable GPU usage for a speedup.
For now, we assume you install the Python packages yourself; a requirements.txt will be added later.
python/get_datasets.py downloads the datasets. The files are stored in python/data with an 80/20 training/testing split.
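An 80/20 split of this kind can be sketched as follows (the actual script may differ in shuffling, seeding, and file naming):

```python
import random

def split_80_20(rows, seed=42):
    """Shuffle rows and return (train, test) with an 80/20 split (sketch)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

train, test = split_80_20(range(100))
print(len(train), len(test))  # → 80 20
```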
./runExperiments.sh checks the data folder for datasets following the scheme name.train. It is assumed that the corresponding file with test data is called name.test.
You need to call the script with the respective LightGBM build path, e.g. sh runExperiments.sh "../lightgbm" (Mac/Ubuntu) or sh runExperiments.sh "../Release/lightgbm" (Windows).
❗ The script runs for every dataset with 40,620 configurations (26 feature penalties, 26 threshold penalties, 20 tree sizes, 3 depths, and a run without split and threshold penalties) ❗
For testing purposes, you might want to modify the for-loops inside the script.
for i in $(seq -10 1 15); do
  for j in $(seq -10 1 15); do
    for tree in 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 100 200 500 1000 10000; do
      for depth in 3 5 7; do
(i and j are converted to different powers of two and represent the penalties.)
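The penalty grid and the total configuration count stated above can be reproduced as follows (a sketch; the exact power-of-two mapping in the script may differ):

```python
# seq -10 1 15 yields the 26 integers -10..15; each is mapped to a
# power-of-two penalty value, per the description above.
exponents = list(range(-10, 16))
penalties = [2.0 ** i for i in exponents]

trees = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 100, 200, 500, 1000, 10000]
depths = [3, 5, 7]

# 26 feature penalties x 26 threshold penalties x 20 tree sizes x 3 depths,
# plus one penalty-free baseline run per (tree size, depth) combination:
total = len(penalties) ** 2 * len(trees) * len(depths) + len(trees) * len(depths)
print(total)  # → 40620
```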
Again, we assume you install the Python packages yourself.
The data inside the models is transformed to .csv files with the python/evaluate_models.py script. This might require more time than you would expect, as accuracy metrics need to be calculated. The .csv files are stored in data/datasetname/last.csv.
Afterwards, similar graphical representations can be generated by calling the python/plot.py script.
To enable figure creation without the whole training and evaluation process, the results of our experiments are placed in the respective results directory.