- Women Clothing Review Dataset: https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews
- Churn Dataset: https://www.kaggle.com/datasets/muhammadshahidazeem/customer-churn-dataset?select=customer_churn_dataset-training-master.csv
- Default Credit Dataset: https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset
- Marketing Dataset: https://www.kaggle.com/datasets/rabieelkharoua/predict-conversion-in-digital-marketing-dataset
Note: Before running the scripts in Section 1, ensure you have run the scripts in Section 3 first.
To run the experiments and generate the image for the first section, execute the following scripts:
-
main_intro.py:- Generates samples for the app reviews dataset.
- Generates the base models.
- Generates metrics for 50 samples with different unbalanced percentages.
Usage:
python main_intro.py --dataset_folder ./data --experiments_folder ./experiments
-
analyses_intro/intro.py:- Generates the metrics for the graphic.
Usage:
python analyses_intro/intro.py --experiments_folder ./experiments
-
analyses_intro/intro.R:- Generates the graphic.
Usage:
Rscript analyses_intro/intro.R
To generate the image for section 2, execute the script:
analyses_intro/graph_prob.R:- Generates Figure 2.
To execute the experiments with the datasets, run the following script:
-
main.py:- Generates samples for all datasets.
- Generates the base models.
- Generates metrics for 50 samples.
The main script has two parameters:
--experiments_folder: Folder where the experiments should be stored.--dataset_folder: Folder containing the raw datasets.
Example usage:
python main.py --dataset_folder ./data --experiments_folder ./experiments
To execute the metric tests, run the following scripts:
-
analyses/generate_base_metrics.py:- Generates the confusion matrix for the base models.
The script has one parameter:
--experiments_folder: Folder where the experiments have been stored.
Example usage:
python analyses/generate_base_metrics.py --experiments_folder ./experiments
-
analyses/test_*.py:- Generates the data for the hypothesis test for a specified metric.
Example usage:
python analyses/test_specificity.py --experiments_folder ./experiments
-
analyses/heatmap_*.R:- Generates the heatmap for the test of a metric.
Example usage:
Rscript analyses/heatmap_spe.R
Replace the * with the specific metric for the figure.
To generate the functional boxplot analyses, execute:
-
analyses/generate_roc_curve.py:- Generates the ROC curve data.
Example usage:
python analyses/generate_roc_curve.py --experiments_folder ./experiments
-
analyses/box_plot_functional.R:- Generates the functional boxplot graphic.
The parameters to run this script are:
- Path to the analyses folder.
- Path to the experiments folder.
- Name of the dataset (see the folder names in the experiments folder).
- Size of the result: 500 or 2000.
Example usage:
Rscript analyses/box_plot_functional.R ./analyses ./experiments churn 500
To execute the metric tests, run the following scripts:
-
analyses_logistic/generate_base_metrics.py:- Generates the confusion matrix for the base models.
Example usage:
python analyses_logistic/generate_base_metrics.py --experiments_folder ./experiments
-
analyses_logistic/test_ba.py:- Generates the data for the hypothesis test for balanced accuracy.
Example usage:
python analyses_logistic/test_ba.py --experiments_folder ./experiments
-
analyses_logistic/heatmap_ba.R:- Generates the heatmap for the test of balanced accuracy.
Example usage:
Rscript analyses_logistic/heatmap_ba.R
-
analyses_logistic/test_metrics.py:- Generates the data for the hypothesis test for the Brier score and ROC AUC.
Example usage:
python analyses_logistic/test_metrics.py --experiments_folder ./experiments
-
analyses_logistic/heatmap_metrics.R:- Generates the heatmap for the test of the Brier score and ROC AUC.
Example usage:
Rscript analyses_logistic/heatmap_metrics.R
To execute the experiment with the full dataset, run the following script:
-
main_full.py:- Generates the train and validation sets for all datasets.
- Generates the base models.
- Generates metrics for 50 samples.
Example usage:
python main_full.py --dataset_folder ./data --experiments_folder ./experiments
-
analyses_full/generate_base_metrics.py:- Generates the confusion matrix for the base models.
Example usage:
python analyses_full/generate_base_metrics.py --experiments_folder ./experiments
-
analyses_full/test_ba.py:- Generates the data for the hypothesis test for balanced accuracy.
Example usage:
python analyses_full/test_ba.py --experiments_folder ./experiments
-
analyses_full/heatmap_ba.R:- Generates the heatmap for the test of balanced accuracy.
Example usage:
Rscript analyses_full/heatmap_ba.R