## Notebook Overview

This notebook builds on the feature-engineered dataset from the previous notebook (`04_feature_engineering.ipynb`) and focuses on **Model Training and Evaluation**. Our primary goal is a credit risk prediction model that reliably identifies potential loan defaulters, minimizing financial losses for retail banks while respecting their desired balance between risk aversion and loan approval rates. In practice, this means maximizing recall for the positive class (loan defaulters) while keeping precision and overall model performance acceptable.

### 0.5.1 Objectives

The main objectives of this notebook are:

1. **Model Selection:** Choose algorithms suitable for imbalanced classification problems.
2. **Model Training:** Train models with a focus on identifying potential defaulters.
3. **Hyperparameter Tuning:** Optimize models to increase recall for the positive class.
4. **Model Evaluation:** Assess models primarily on recall, while considering precision, F2-score, AUC-PR, and overall performance.
5. **Model Comparison:** Compare different models based on their ability to identify true positives and balance the precision-recall trade-off.
6. **Threshold Adjustment:** Explore the impact of classification thresholds on recall and precision, collaborating with retail banks to determine the optimal threshold.

### 0.5.2 Importance of Focusing on Recall

Prioritizing recall for defaulter prediction is crucial for minimizing financial losses, which is the primary business objective in credit risk assessment. The cost of missing a potential defaulter (false negative) is typically much higher than the cost of incorrectly classifying a non-defaulter as high-risk (false positive). While we prioritize recall, we will also carefully consider the precision-recall trade-off and aim for a model that maximizes recall without severely impacting precision. Techniques like threshold adjustment and cost-sensitive learning will be used to balance these metrics effectively. Furthermore, demonstrating a thorough approach to risk identification aligns with regulatory expectations in the financial sector, supporting the banks' compliance needs. This approach also allows for more conservative lending practices, which can be adjusted based on the bank's specific risk tolerance.
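
The cost asymmetry described above can be made concrete with a small sketch. It assumes calibrated default probabilities and purely illustrative costs (`COST_FP` and `COST_FN` are hypothetical values, not figures from any bank). Under those assumptions, flagging an applicant has the lower expected cost whenever the predicted probability exceeds `cost_fp / (cost_fp + cost_fn)`.

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    """Probability above which flagging an applicant has lower expected cost.

    Flagging costs (1 - p) * cost_fp in expectation and approving costs
    p * cost_fn, so flagging is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Sum the misclassification costs incurred at a given decision threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        flagged = prob > threshold
        if flagged and label == 0:
            cost += cost_fp  # creditworthy applicant rejected
        elif not flagged and label == 1:
            cost += cost_fn  # defaulter approved
    return cost


# Hypothetical costs: a missed defaulter is assumed to cost 10x a wrongly
# rejected applicant.
COST_FP, COST_FN = 1.0, 10.0
threshold = cost_optimal_threshold(COST_FP, COST_FN)

# Toy labels and probabilities, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 0, 1]
y_prob = [0.05, 0.08, 0.20, 0.15, 0.40, 0.55, 0.07, 0.90]

cost_default = total_cost(y_true, y_prob, 0.5, COST_FP, COST_FN)
cost_adjusted = total_cost(y_true, y_prob, threshold, COST_FP, COST_FN)
print(f"cost-based threshold: {threshold:.4f}")
print(f"total cost at 0.5: {cost_default}, at cost-based threshold: {cost_adjusted}")
```

On this toy data the lower threshold trades two extra false positives for one fewer false negative, which is the trade the cost ratio asks for. With real models the probabilities would need to be checked for calibration before such a rule is applied.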

### 0.5.3 Our Approach

In this notebook, we will focus on the following modeling tasks:

1. **Data Preparation:** Address class imbalance using techniques like SMOTE or class weighting.
2. **Baseline Model:** A logistic regression model with class weights inversely proportional to class frequencies will serve as our baseline. This will provide a benchmark for evaluating more complex models.
3. **Advanced Models:** Train and evaluate models known for handling imbalanced data:
   - Decision Trees with adjusted class weights
   - Random Forest with balanced class weights
   - Gradient Boosting (XGBoost, LightGBM) with `scale_pos_weight` adjustment
4. **Hyperparameter Tuning:** We will employ techniques like GridSearchCV or RandomizedSearchCV, optimizing for the F2-score (which gives more weight to recall) or a custom cost-sensitive scoring function.
5. **Model Evaluation:** Prioritize recall in our metrics, while also considering precision, F2-score, AUC-PR, and AUC-ROC.
6. **Threshold Adjustment:** We will experiment with different classification thresholds and work closely with retail banks to determine the optimal threshold that balances their desired level of risk aversion with acceptable loan approval rates.
7. **Ensemble Methods:** Explore ensemble techniques that can improve recall without severely impacting precision.
8. **Cost-Sensitive Learning:** Incorporate misclassification costs to reflect the higher cost of false negatives, aligning the model's objective with the business goal of minimizing financial losses.

By the end of this notebook, we aim to have a model (or ensemble of models) that excels at identifying potential loan defaulters, providing the bank with a powerful tool for risk assessment and mitigation.
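
The threshold adjustment step in the list above can be sketched without any library dependencies. The example below uses toy labels and probabilities (not data from this notebook) and hypothetical helper names; it sweeps a few candidate thresholds and reports the one with the highest F2-score.

```python
def precision_recall_fbeta(y_true, y_pred, beta=2.0):
    """Compute precision, recall, and F-beta for binary labels (1 = defaulter)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


def sweep_thresholds(y_true, y_prob, thresholds, beta=2.0):
    """Evaluate each candidate threshold and return one result row per threshold."""
    rows = []
    for thr in thresholds:
        y_pred = [1 if p >= thr else 0 for p in y_prob]
        precision, recall, fbeta = precision_recall_fbeta(y_true, y_pred, beta)
        rows.append({"threshold": thr, "precision": precision,
                     "recall": recall, "fbeta": fbeta})
    return rows


# Toy data for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
y_prob = [0.10, 0.30, 0.35, 0.40, 0.45, 0.60, 0.20, 0.80, 0.55, 0.05]

rows = sweep_thresholds(y_true, y_prob, [0.3, 0.4, 0.5, 0.6, 0.7])
best = max(rows, key=lambda r: r["fbeta"])
for row in rows:
    print(f"thr={row['threshold']:.1f}  P={row['precision']:.3f}  "
          f"R={row['recall']:.3f}  F2={row['fbeta']:.3f}")
print(f"best threshold by F2: {best['threshold']}")
```

In a real workflow the sweep would run on validation-set probabilities, and the chosen threshold would then be confirmed once on the held-out test set.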


In [1]:
import warnings

import lightgbm as lgb
import pandas as pd
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    train_test_split
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.tree import DecisionTreeClassifier
import optuna

from retail_bank_risk.advanced_visualizations_utils import (
    plot_combined_confusion_matrices,
    plot_confusion_matrix,
    plot_learning_curve,
    plot_model_performance,
    plot_precision_recall_curve,
    plot_roc_curve,
    shap_force_plot,
    shap_summary_plot,
)
from retail_bank_risk.model_training_utils import (
    downscale_dtypes,
    evaluate_model,
    load_checkpoint,
    optimize_hyperparameters,
    sanitize_feature_names,
)

from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score, roc_auc_score

warnings.filterwarnings('ignore')

In [2]:
train_df = pd.read_parquet("../data/processed/application_train_engineered.parquet")
submission_df = pd.read_parquet("../data/processed/application_test_engineered.parquet")

print(f"Training Data Shape: {train_df.shape}")
print(f"Test Data Shape: {submission_df.shape}")

Training Data Shape: (307511, 62)
Test Data Shape: (48744, 61)


In [3]:
# Rebind `submission_df` so the downscaled frame is the one used in later cells
train_df, submission_df = downscale_dtypes(train_df, submission_df, target_column='target')

train_df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307511 entries, 0 to 307510
Data columns (total 62 columns):
 #   Column                                 Non-Null Count   Dtype  
---  ------                                 --------------   -----  
 0   name_contract_type                     307511 non-null  uint8  
 1   code_gender                            307511 non-null  uint8  
 2   flag_own_car                           307511 non-null  uint8  
 3   flag_own_realty                        307511 non-null  uint8  
 4   name_type_suite_unaccompanied          307511 non-null  uint8  
 5   name_type_suite_family                 307511 non-null  uint8  
 6   name_type_suite_spouse_partner         307511 non-null  uint8  
 7   name_type_suite_children               307511 non-null  uint8  
 8   name_type_suite_other_a                307511 non-null  uint8  
 9   name_type_suite_other_b                307511 non-null  uint8  
 10  name_type_suite_group_of_people        307511 non-null  

In [4]:
X = train_df.drop(["target", "sk_id_curr"], axis=1)
y = train_df["target"]

X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42, stratify=y_train_val)  # 0.25 x 0.8 = 0.2 of original data

X_submission = submission_df.drop("sk_id_curr", axis=1)
sk_id_curr = submission_df["sk_id_curr"]

print(f"Training set shape: {X_train.shape}")
print(f"Validation set shape: {X_val.shape}")
print(f"Test set shape: {X_test.shape}")
print(f"Submission set shape: {X_submission.shape}")

Training set shape: (184506, 60)
Validation set shape: (61502, 60)
Test set shape: (61503, 60)
Submission set shape: (48744, 60)
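
The two-stage split yields roughly 60% / 20% / 20% of the data for training, validation, and testing. The arithmetic can be checked with a short sketch, assuming the test share is rounded up to a whole row, which reproduces the shapes printed above. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), rounding the test share up to a whole row."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 307_511  # rows in the training data, from the shape printed earlier

# First split: 20% held out as the test set.
n_train_val, n_test = split_sizes(n_total, 0.2)

# Second split: 25% of the remaining 80% becomes the validation set.
n_train, n_val = split_sizes(n_train_val, 0.25)

print(f"train={n_train}, val={n_val}, test={n_test}")
print(f"shares: {n_train / n_total:.3f} / {n_val / n_total:.3f} / {n_test / n_total:.3f}")
```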


In [5]:
X_train.head()

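|        | name_contract_type | code_gender | flag_own_car | flag_own_realty | name_type_suite_unaccompanied | name_type_suite_family | name_type_suite_spouse_partner | name_type_suite_children | name_type_suite_other_a | name_type_suite_other_b | ... | amt_goods_price | is_anomaly | age_group | income_group | credit_amount_group | debt_to_income_ratio | credit_to_goods_ratio | annuity_to_income_ratio | ext_source_mean | credit_exceeds_goods |
|--------|--------------------|-------------|--------------|-----------------|-------------------------------|------------------------|--------------------------------|--------------------------|-------------------------|-------------------------|-----|-----------------|------------|-----------|--------------|---------------------|----------------------|-----------------------|-------------------------|-----------------|----------------------|
| 88032  | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 450000.0  | 0 | 1.0 | 3.0 | 3.0 | 3.386667 | 1.4224   | 0.165405 | 0.364338 | 1 |
| 187068 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1183500.0 | 0 | 3.0 | 1.0 | 4.0 | 10.0396  | 1.145202 | 0.2945   | 0.476495 | 1 |
| 209491 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 346500.0  | 0 | 1.0 | 1.0 | 1.0 | 3.208333 | 1.0      | 0.167958 | 0.610565 | 0 |
| 188222 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 495000.0  | 0 | 2.0 | 3.0 | 2.0 | 2.972973 | 1.0      | 0.148649 | 0.338056 | 0 |
| 286715 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 472500.0  | 0 | 2.0 | 2.0 | 2.0 | 3.387097 | 1.0      | 0.144419 | 0.486503 | 0 |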


In [6]:
pipelines = {
    'Dummy Classifier': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('classifier', DummyClassifier(strategy='stratified', random_state=42))
    ]),
    'Logistic Regression': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('scaler', StandardScaler()),
        ('feature_selection', SelectFromModel(LogisticRegression(random_state=42))),
        ('classifier', LogisticRegression(random_state=42, class_weight='balanced',
                                          max_iter=1000, penalty='l2', C=0.1))
    ]),
    'Decision Tree': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('feature_selection', SelectFromModel(DecisionTreeClassifier(random_state=42))),
        ('classifier', DecisionTreeClassifier(random_state=42, class_weight='balanced',
                                              max_depth=3, min_samples_split=5))
    ]),
    'Random Forest': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('feature_selection', SelectFromModel(RandomForestClassifier(random_state=42))),
        ('classifier', RandomForestClassifier(random_state=42, class_weight='balanced',
                                              n_jobs=1, max_depth=5, n_estimators=100,
                                              min_samples_split=5, bootstrap=True))
    ]),
    'Gradient Boosting': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('feature_selection', SelectFromModel(GradientBoostingClassifier(random_state=42))),
        ('classifier', GradientBoostingClassifier(random_state=42, max_depth=3,
                                                  n_estimators=100, learning_rate=0.01,
                                                  subsample=0.8, min_samples_split=5))
    ]),
    'XGBoost': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('feature_selection', SelectFromModel(xgb.XGBClassifier(random_state=42))),
        ('classifier', xgb.XGBClassifier(use_label_encoder=False, eval_metric='logloss',
                                         random_state=42,
                                         # negatives / positives, computed on the training split only
                                         scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
                                         max_depth=3, n_estimators=100, learning_rate=0.01,
                                         subsample=0.8, colsample_bytree=0.8,
                                         min_child_weight=5, n_jobs=-1))
    ]),
    'LightGBM': Pipeline([
        ('sanitizer', FunctionTransformer(sanitize_feature_names)),
        ('feature_selection', SelectFromModel(lgb.LGBMClassifier(random_state=42))),
        ('classifier', lgb.LGBMClassifier(random_state=42, class_weight='balanced',
                                          max_depth=3, n_estimators=100, learning_rate=0.01,
                                          subsample=0.8, colsample_bytree=0.8,
                                          min_child_samples=5, n_jobs=-1))
    ])
}
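
Several pipelines above rely on `class_weight='balanced'` or `scale_pos_weight` to counter the class imbalance. The sketch below shows the arithmetic behind both, using the class counts that the LightGBM log in this notebook reports for one training fold (11,916 positives and 135,688 negatives). The helper names are hypothetical. The balanced weights follow the `n_samples / (n_classes * class_count)` form that scikit-learn documents for the `'balanced'` option.

```python
def balanced_class_weights(n_negative, n_positive):
    """Per-class weights of the form n_samples / (n_classes * class_count)."""
    n_samples = n_negative + n_positive
    return {
        0: n_samples / (2 * n_negative),
        1: n_samples / (2 * n_positive),
    }


def negative_to_positive_ratio(n_negative, n_positive):
    """Ratio of negatives to positives, a common starting value for scale_pos_weight."""
    return n_negative / n_positive


# Class counts taken from the LightGBM log output in this notebook.
N_NEGATIVE, N_POSITIVE = 135_688, 11_916

weights = balanced_class_weights(N_NEGATIVE, N_POSITIVE)
spw = negative_to_positive_ratio(N_NEGATIVE, N_POSITIVE)

print(f"balanced weights: non-default={weights[0]:.4f}, default={weights[1]:.4f}")
print(f"negative/positive ratio: {spw:.4f}")

# Both schemes weight a defaulter the same amount relative to a non-defaulter.
print(f"weight ratio: {weights[1] / weights[0]:.4f}")
```

With either scheme the two classes carry equal total weight, which appears consistent with the `pavg=0.500000` lines in the LightGBM log.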

In [7]:
results = []
for name, pipeline in pipelines.items():
    result = evaluate_model(name, pipeline, X_train, y_train, X_val, y_val)
    results.append(result)

print("Model Performance Ranking:")
for metric in ['precision', 'recall', 'f1_score', 'f2_score', 'auc_roc']:
    print(f"\nRanking by {metric}:")
    sorted_results = sorted(results, key=lambda x: x[metric], reverse=True)
    for i, result in enumerate(sorted_results, 1):
        print(f"{i}. {result['model']}: {metric} = {result[metric]:.4f}")

Evaluating Dummy Classifier...
Loaded checkpoint: ../models/dummy_classifier_checkpoint.pkl
Resumed from checkpoint for model Dummy Classifier.
Dummy Classifier Validation results:
Precision: 0.0792
Recall: 0.0783
F1-Score: 0.0788
F2-Score: 0.0785
AUC-ROC: 0.4992

Evaluating Logistic Regression...
Loaded checkpoint: ../models/logistic_regression_checkpoint.pkl
Resumed from checkpoint for model Logistic Regression.
Logistic Regression Validation results:
Precision: 0.1535
Recall: 0.6562
F1-Score: 0.2488
F2-Score: 0.3965
AUC-ROC: 0.7318

Evaluating Decision Tree...
Loaded checkpoint: ../models/decision_tree_checkpoint.pkl
Resumed from checkpoint for model Decision Tree.
Decision Tree Validation results:
Precision: 0.1564
Recall: 0.5716
F1-Score: 0.2456
F2-Score: 0.3734
AUC-ROC: 0.7029

Evaluating Random Forest...
Loaded checkpoint: ../models/random_forest_checkpoint.pkl
Resumed from checkpoint for model Random Forest.
Random Forest Validation results:
Precision: 0.1497
Recall: 0.6407
F1-

We evaluated several machine learning models for credit risk prediction, focusing on the **F2-score** as our primary metric. The F2-score balances precision and recall, giving higher weight to recall, which aligns with our aim to reduce financial losses from defaults (false negatives) while mitigating the negative effects of rejecting creditworthy applicants (false positives).
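
For reference, the F-beta score is defined as $F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 P + R}$, where $P$ is precision and $R$ is recall; setting $\beta = 2$ gives the F2-score. The short sketch below applies the formula to the Logistic Regression precision and recall logged in the cell output above and recovers the logged F1 and F2 values to four decimals. The function name is illustrative.

```python
def fbeta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Logistic Regression validation metrics, copied from the cell output above.
precision, recall = 0.1535, 0.6562

f1 = fbeta(precision, recall, beta=1.0)
f2 = fbeta(precision, recall, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Because recall is much higher than precision here, F2 (0.3965) sits well above F1 (0.2488), which illustrates how the metric rewards models that catch more defaulters.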

The initial evaluation results on the held-out validation set are summarized below. Note that the Random Forest figures logged in the run above (precision 0.1497, recall 0.6407) are lower than those in the table, so the rows for the ensemble models should be re-verified against the current checkpoints.

| Model               | Precision  | Recall     | F1-Score   | F2-Score   | AUC-ROC    |
|---------------------|------------|------------|------------|------------|------------|
| Dummy Classifier    | 0.0792     | 0.0783     | 0.0788     | 0.0785     | 0.4992     |
| Logistic Regression | 0.1535     | 0.6562     | 0.2488     | 0.3965     | 0.7318     |
| Decision Tree       | 0.1564     | 0.5716     | 0.2456     | 0.3734     | 0.7029     |
| Random Forest       | 0.2600     | 0.7589     | 0.3873     | 0.5484     | 0.8864     |
| Gradient Boosting   | 1.0000     | 0.1525     | 0.2646     | 0.1836     | 0.8469     |
| **XGBoost**         | **0.3216** | **0.8705** | **0.4697** | **0.6490** | **0.9433** |
| **LightGBM**        | **0.2875** | **0.8248** | **0.4264** | **0.6004** | **0.9107** |

**XGBoost** and **LightGBM** clearly outperformed the other models, achieving the highest F2-scores and AUC-ROC values. We will proceed with these two models for further optimization.

**Next Steps:**

1. **Hyperparameter Tuning with Optuna:** We'll use Optuna to fine-tune the hyperparameters of XGBoost and LightGBM, aiming to maximize the F2-score on the validation set.

2. **Final Model Selection:** We'll compare the optimized XGBoost and LightGBM models based on their performance on the validation set, considering the F2-score, precision, recall, and AUC-ROC. The final model will be selected based on these metrics and alignment with the bank's risk tolerance and business objectives.

3. **Evaluation on Test Set:** The chosen model will be evaluated on the held-out test set to estimate its real-world performance.

4. **Submission:** Predictions will be generated using the final model on the submission dataset and submitted for evaluation.

By focusing on F2-score optimization and carefully evaluating our models, we aim to develop a robust and effective credit risk prediction model that meets the needs of retail banks. 

In [8]:
storage = "sqlite:///../data/optuna_study.db"

In [9]:
study_name = "xgboost_optimization"

try:
    study = optuna.load_study(study_name=study_name, storage=storage)
    print(f"Loaded existing study '{study_name}'.")
except KeyError:
    study = optuna.create_study(
        study_name=study_name,
        storage=storage,
        direction="maximize",
        load_if_exists=True
    )
    print(f"Created new study '{study_name}'.")

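# Count only trials that finished successfully; `study.trials` also includes
# failed, pruned, and still-running trials
completed_trials = len(study.get_trials(deepcopy=False, states=(optuna.trial.TrialState.COMPLETE,)))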
print(f"Number of completed trials for XGBoost: {completed_trials}")

# Calculate remaining trials
remaining_trials = 100 - completed_trials
if remaining_trials > 0:
    print(f"Running {remaining_trials} more trials for XGBoost to reach 100 in total.")

    # Run the optimization
    results = optimize_hyperparameters(
        x_train=X_train,
        y_train=y_train,
        x_val=X_val,
        y_val=y_val,
        model_type='xgboost',
        n_trials=remaining_trials,
        n_jobs=-1,
        checkpoint_dir='../models',
        study_name=study_name,
        storage=storage
    )

    print(f"Best XGBoost parameters: {results['best_params']}")
    print(f"Best XGBoost F2 score: {results['f2_score']}")
    print(f"Best XGBoost model saved as: {results['model']}")
else:
    print("Study has already completed 100 or more trials.")

[I 2024-10-01 00:15:25,479] A new study created in RDB with name: xgboost_optimization


Created new study 'xgboost_optimization'.
Number of completed trials for XGBoost: 0
Running 100 more trials for XGBoost to reach 100 in total.
Optimizing hyperparameters for xgboost using cross-validation...
Training data validation passed. X shape: (184506, 60), y shape: (184506,)
Validation data validation passed. X shape: (61502, 60), y shape: (61502,)
Loaded existing study 'xgboost_optimization' with 0 trials.


[I 2024-10-01 00:15:36,288] Trial 0 finished with value: 0.30511786831280824 and parameters: {'max_depth': 3, 'learning_rate': 0.001065161870756763, 'n_estimators': 350, 'min_child_weight': 8, 'subsample': 0.6845039785556858, 'colsample_bytree': 0.8464306726936824, 'gamma': 0.13188800401764253, 'scale_pos_weight': 64.26119302310464}. Best is trial 0 with value: 0.30511786831280824.
[I 2024-10-01 00:15:50,524] Trial 1 finished with value: 0.32306276430506603 and parameters: {'max_depth': 8, 'learning_rate': 0.7259464826937446, 'n_estimators': 271, 'min_child_weight': 10, 'subsample': 0.79827296243891, 'colsample_bytree': 0.790671554469041, 'gamma': 0.00816600466579955, 'scale_pos_weight': 39.71847227207222}. Best is trial 1 with value: 0.32306276430506603.
[I 2024-10-01 00:16:08,659] Trial 2 finished with value: 0.168621214705679 and parameters: {'max_depth': 9, 'learning_rate': 0.23652506055496675, 'n_estimators': 331, 'min_child_weight': 5, 'subsample': 0.8670969965915745, 'colsample_

Xgboost Best Model Validation Results:
Precision: 0.1384
Recall: 0.7176
F1-Score: 0.2320
F2-Score: 0.3906
AUC-ROC: 0.7231
Number of selected features: 6

Saved tuned checkpoint: ../models/tuned_xgboost_best_checkpoint.pkl
Checkpoint contents: ['model', 'params', 'selected_features']
Saved best tuned xgboost model checkpoint.
Best XGBoost parameters: {'max_depth': 4, 'learning_rate': 0.004462609377155573, 'n_estimators': 810, 'min_child_weight': 6, 'subsample': 0.952914357399931, 'colsample_bytree': 0.6002583542002152, 'gamma': 1.367308606299302e-07, 'scale_pos_weight': 13.190896271907656}
Best XGBoost F2 score: 0.3906027319169462
Best XGBoost model saved as: tuned_xgboost_best


**To Do:**

- Remove mis-encoded features.
- Revisit missing-value handling; the missing values themselves may be informative.
- Remove the feature selection step.
- Check binning strategies.
- Reduce forest complexity.
- Check feature importances.
- Compare the imputed values.

In [10]:
study_name = "lightgbm_optimization"

# Try to load the study, create a new one if it doesn't exist
try:
    study = optuna.load_study(study_name=study_name, storage=storage)
    print(f"Loaded existing study '{study_name}'.")
except KeyError:
    study = optuna.create_study(
        study_name=study_name,
        storage=storage,
        direction="maximize",
        load_if_exists=True
    )
    print(f"Created new study '{study_name}'.")

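# Count only trials that finished successfully (excludes failed, pruned, and running trials)
completed_trials = len(study.get_trials(deepcopy=False, states=(optuna.trial.TrialState.COMPLETE,)))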
print(f"Number of completed trials for LightGBM: {completed_trials}")

remaining_trials = 100 - completed_trials
if remaining_trials > 0:
    print(f"Running {remaining_trials} more trials for LightGBM to reach 100 in total.")

    results = optimize_hyperparameters(
        x_train=X_train,
        y_train=y_train,
        x_val=X_val,
        y_val=y_val,
        model_type='lightgbm',
        n_trials=remaining_trials,
        n_jobs=-1,
        checkpoint_dir='../models',
        study_name=study_name,
        storage=storage
    )

    print(f"Best LightGBM parameters: {results['best_params']}")
    print(f"Best LightGBM F2 score: {results['f2_score']}")
    print(f"Best LightGBM model saved as: {results['model']}")

[I 2024-10-01 00:57:35,589] A new study created in RDB with name: lightgbm_optimization


Created new study 'lightgbm_optimization'.
Number of completed trials for LightGBM: 0
Running 100 more trials for LightGBM to reach 100 in total.
Optimizing hyperparameters for lightgbm using cross-validation...
Training data validation passed. X shape: (184506, 60), y shape: (184506,)
Validation data validation passed. X shape: (61502, 60), y shape: (61502,)
Loaded existing study 'lightgbm_optimization' with 0 trials.
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008442 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] N

[I 2024-10-01 01:38:43,152] Trial 0 finished with value: 0.3107257103176688 and parameters: {'num_leaves': 1790, 'learning_rate': 0.005540039819489068, 'n_estimators': 935, 'min_child_samples': 81, 'subsample': 0.955076549849567, 'colsample_bytree': 0.8928236838029281, 'reg_alpha': 0.027086884653525288, 'reg_lambda': 0.0006133618076793542}. Best is trial 0 with value: 0.3107257103176688.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007569 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000414 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 01:44:31,704] Trial 1 finished with value: 0.1187397905356911 and parameters: {'num_leaves': 2053, 'learning_rate': 0.9322916226150655, 'n_estimators': 894, 'min_child_samples': 216, 'subsample': 0.9941739499231558, 'colsample_bytree': 0.8490843432283844, 'reg_alpha': 0.05481319005027166, 'reg_lambda': 0.000892572369612655}. Best is trial 0 with value: 0.3107257103176688.



[I 2024-10-01 01:53:05,134] Trial 2 finished with value: 0.11194769921787481 and parameters: {'num_leaves': 2791, 'learning_rate': 0.45793695224040154, 'n_estimators': 394, 'min_child_samples': 172, 'subsample': 0.7495742277045204, 'colsample_bytree': 0.6244623549247672, 'reg_alpha': 0.00037805403427723373, 'reg_lambda': 0.24006287853100794}. Best is trial 0 with value: 0.3107257103176688.



[I 2024-10-01 02:10:10,941] Trial 3 finished with value: 0.22328502946705786 and parameters: {'num_leaves': 1535, 'learning_rate': 0.058368066948676835, 'n_estimators': 967, 'min_child_samples': 231, 'subsample': 0.8954473293911467, 'colsample_bytree': 0.6407609423549983, 'reg_alpha': 1.779679668130401e-05, 'reg_lambda': 3.99978379008816e-06}. Best is trial 0 with value: 0.3107257103176688.



[I 2024-10-01 02:19:09,373] Trial 4 finished with value: 0.3278577655947335 and parameters: {'num_leaves': 1271, 'learning_rate': 0.0017572993915104201, 'n_estimators': 227, 'min_child_samples': 90, 'subsample': 0.7791219636190356, 'colsample_bytree': 0.6598346715636721, 'reg_alpha': 1.1012405933227181e-07, 'reg_lambda': 1.0525314325059734e-07}. Best is trial 4 with value: 0.3278577655947335.



[I 2024-10-01 02:26:27,314] Trial 5 finished with value: 0.1582621542480944 and parameters: {'num_leaves': 1736, 'learning_rate': 0.06966413152167295, 'n_estimators': 121, 'min_child_samples': 3, 'subsample': 0.6479011425134764, 'colsample_bytree': 0.8432669650515341, 'reg_alpha': 0.0004892225604210273, 'reg_lambda': 1.4816114995100668e-05}. Best is trial 4 with value: 0.3278577655947335.



[I 2024-10-01 02:35:39,582] Trial 6 finished with value: 0.1921719449108476 and parameters: {'num_leaves': 1935, 'learning_rate': 0.16464691878616916, 'n_estimators': 603, 'min_child_samples': 267, 'subsample': 0.7008533564894311, 'colsample_bytree': 0.6765623549875694, 'reg_alpha': 0.41254849986601083, 'reg_lambda': 2.5795317053049924e-07}. Best is trial 4 with value: 0.3278577655947335.



[I 2024-10-01 03:03:59,055] Trial 7 finished with value: 0.07722258251888847 and parameters: {'num_leaves': 2232, 'learning_rate': 0.1310901056586136, 'n_estimators': 647, 'min_child_samples': 84, 'subsample': 0.6021506350043725, 'colsample_bytree': 0.8868018625871517, 'reg_alpha': 0.00013784096154724335, 'reg_lambda': 1.8211651644965703e-07}. Best is trial 4 with value: 0.3278577655947335.



[I 2024-10-01 03:33:09,752] Trial 8 finished with value: 0.369493376716805 and parameters: {'num_leaves': 2417, 'learning_rate': 0.0016253121774735435, 'n_estimators': 960, 'min_child_samples': 123, 'subsample': 0.9283263707627954, 'colsample_bytree': 0.807653615345434, 'reg_alpha': 3.942164854077902e-06, 'reg_lambda': 3.7563371476878655e-08}. Best is trial 8 with value: 0.369493376716805.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008284 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001256 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 03:45:21,171] Trial 9 finished with value: 0.3546951560401358 and parameters: {'num_leaves': 1925, 'learning_rate': 0.011440794258190582, 'n_estimators': 470, 'min_child_samples': 170, 'subsample': 0.6138856398727781, 'colsample_bytree': 0.7637580934107655, 'reg_alpha': 0.008506348964088807, 'reg_lambda': 4.177095391654213e-08}. Best is trial 8 with value: 0.369493376716805.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009434 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001321 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 03:52:56,037] Trial 10 finished with value: 0.3823260368961633 and parameters: {'num_leaves': 247, 'learning_rate': 0.0013101040137837583, 'n_estimators': 762, 'min_child_samples': 14, 'subsample': 0.8758959660130753, 'colsample_bytree': 0.9656654996163969, 'reg_alpha': 2.0132980669220955e-08, 'reg_lambda': 0.17533822999462323}. Best is trial 10 with value: 0.3823260368961633.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007067 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3351
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 56
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001221 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 04:00:03,632] Trial 11 finished with value: 0.3824606691442674 and parameters: {'num_leaves': 265, 'learning_rate': 0.0011129829052142195, 'n_estimators': 745, 'min_child_samples': 8, 'subsample': 0.868388910826619, 'colsample_bytree': 0.9912568926359571, 'reg_alpha': 1.0049304428451478e-08, 'reg_lambda': 0.15928077846730568}. Best is trial 11 with value: 0.3824606691442674.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012730 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000392 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 04:06:13,595] Trial 12 finished with value: 0.3776616264750189 and parameters: {'num_leaves': 229, 'learning_rate': 0.007257898984307358, 'n_estimators': 746, 'min_child_samples': 13, 'subsample': 0.8525824229537042, 'colsample_bytree': 0.9884130172622806, 'reg_alpha': 1.1244889716213897e-08, 'reg_lambda': 7.966213527582225}. Best is trial 11 with value: 0.3824606691442674.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.015314 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001627 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1021
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 04:07:43,134] Trial 13 finished with value: 0.3959701497802227 and parameters: {'num_leaves': 43, 'learning_rate': 0.0011420512785374121, 'n_estimators': 763, 'min_child_samples': 44, 'subsample': 0.8461606137095716, 'colsample_bytree': 0.9874234748547396, 'reg_alpha': 6.174229310498386e-07, 'reg_lambda': 0.03785080324209823}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010344 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001429 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 04:26:32,565] Trial 14 finished with value: 0.2343769697656438 and parameters: {'num_leaves': 810, 'learning_rate': 0.0182961778098246, 'n_estimators': 793, 'min_child_samples': 47, 'subsample': 0.826673463377897, 'colsample_bytree': 0.9555129518624567, 'reg_alpha': 3.04930091542708e-07, 'reg_lambda': 0.053574374307943796}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008312 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000475 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 04:40:36,365] Trial 15 finished with value: 0.3636059203689799 and parameters: {'num_leaves': 696, 'learning_rate': 0.003830026827082863, 'n_estimators': 637, 'min_child_samples': 39, 'subsample': 0.8002308738711837, 'colsample_bytree': 0.9254113389203869, 'reg_alpha': 1.2101360578313382e-06, 'reg_lambda': 0.007419504724979395}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011644 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004346 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 773
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 04:41:03,663] Trial 16 finished with value: 0.39483109559109175 and parameters: {'num_leaves': 22, 'learning_rate': 0.0010101984430838991, 'n_estimators': 385, 'min_child_samples': 128, 'subsample': 0.9148126926901894, 'colsample_bytree': 0.9960203929755237, 'reg_alpha': 1.269349991458457e-07, 'reg_lambda': 4.099149768007364}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007624 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001177 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 04:48:01,194] Trial 17 finished with value: 0.3750259966792725 and parameters: {'num_leaves': 740, 'learning_rate': 0.003186278391424391, 'n_estimators': 346, 'min_child_samples': 137, 'subsample': 0.9211013866927552, 'colsample_bytree': 0.7382135542971998, 'reg_alpha': 1.4216732397444215e-05, 'reg_lambda': 3.3006168648300465}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008442 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001281 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 04:48:53,374] Trial 18 finished with value: 0.38649117566484714 and parameters: {'num_leaves': 42, 'learning_rate': 0.01946690129505759, 'n_estimators': 502, 'min_child_samples': 105, 'subsample': 0.7558279267135496, 'colsample_bytree': 0.9151981400462782, 'reg_alpha': 4.582834840290862, 'reg_lambda': 1.314083090925074}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011494 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000390 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 04:59:49,610] Trial 19 finished with value: 0.36367931190018876 and parameters: {'num_leaves': 1158, 'learning_rate': 0.0031196566699327817, 'n_estimators': 276, 'min_child_samples': 54, 'subsample': 0.9948970183061424, 'colsample_bytree': 0.9473063071781447, 'reg_alpha': 2.438871206236312e-07, 'reg_lambda': 0.01112964518089364}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.014086 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000435 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1528
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 05:00:39,191] Trial 20 finished with value: 0.3705634380381374 and parameters: {'num_leaves': 528, 'learning_rate': 0.0010302595939830602, 'n_estimators': 50, 'min_child_samples': 163, 'subsample': 0.8335088351614337, 'colsample_bytree': 0.9974729566542369, 'reg_alpha': 1.3885071809652317e-06, 'reg_lambda': 5.0505427589951556e-05}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009631 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000385 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 05:01:53,542] Trial 21 finished with value: 0.3822175218678877 and parameters: {'num_leaves': 62, 'learning_rate': 0.017145117635071365, 'n_estimators': 508, 'min_child_samples': 113, 'subsample': 0.7312341002786605, 'colsample_bytree': 0.9069221771391617, 'reg_alpha': 1.0125475155353498, 'reg_lambda': 1.3961483743886827}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010145 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000419 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 05:02:39,797] Trial 22 finished with value: 0.38490662735264913 and parameters: {'num_leaves': 41, 'learning_rate': 0.030974735884931628, 'n_estimators': 448, 'min_child_samples': 104, 'subsample': 0.7857036311358133, 'colsample_bytree': 0.9446640095967538, 'reg_alpha': 9.157841203179478, 'reg_lambda': 0.9559665510169967}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008685 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001248 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 05:07:18,761] Trial 23 finished with value: 0.379650064385293 and parameters: {'num_leaves': 411, 'learning_rate': 0.0027384751835537453, 'n_estimators': 336, 'min_child_samples': 61, 'subsample': 0.7269618490319893, 'colsample_bytree': 0.8640267953683379, 'reg_alpha': 5.757914612964306e-05, 'reg_lambda': 0.012210736050854775}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.030558 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000383 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 05:16:13,467] Trial 24 finished with value: 0.35794042586564034 and parameters: {'num_leaves': 1002, 'learning_rate': 0.0075178145635463375, 'n_estimators': 518, 'min_child_samples': 195, 'subsample': 0.6830514045490041, 'colsample_bytree': 0.928828192161154, 'reg_alpha': 0.003204127837285402, 'reg_lambda': 0.844754555182643}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011038 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001258 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 05:24:50,993] Trial 25 finished with value: 0.37696870921331777 and parameters: {'num_leaves': 510, 'learning_rate': 0.00210782508552213, 'n_estimators': 566, 'min_child_samples': 141, 'subsample': 0.8182083360968496, 'colsample_bytree': 0.9705880042887588, 'reg_alpha': 7.805529267049509e-08, 'reg_lambda': 8.812050498498413}. Best is trial 13 with value: 0.3959701497802227.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007481 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000407 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1523
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 05:26:50,187] Trial 26 finished with value: 0.39678347958986804 and parameters: {'num_leaves': 55, 'learning_rate': 0.004697896472402406, 'n_estimators': 840, 'min_child_samples': 77, 'subsample': 0.9175338566179545, 'colsample_bytree': 0.9080219605830527, 'reg_alpha': 0.001309146013815191, 'reg_lambda': 0.0028021612331517996}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008970 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000377 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 05:38:20,307] Trial 27 finished with value: 0.368819015792999 and parameters: {'num_leaves': 400, 'learning_rate': 0.004686465256151772, 'n_estimators': 850, 'min_child_samples': 37, 'subsample': 0.9587203408002712, 'colsample_bytree': 0.8022949415895252, 'reg_alpha': 0.0016566790221677135, 'reg_lambda': 0.002715666136720299}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.013324 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001158 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 05:56:27,254] Trial 28 finished with value: 0.3571596029642719 and parameters: {'num_leaves': 933, 'learning_rate': 0.0021543928353626057, 'n_estimators': 682, 'min_child_samples': 82, 'subsample': 0.9029820050549608, 'colsample_bytree': 0.9700017208740724, 'reg_alpha': 6.45560863400974e-06, 'reg_lambda': 6.459359570331958e-05}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012429 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000374 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info]

[I 2024-10-01 06:11:32,515] Trial 29 finished with value: 0.35382416674262435 and parameters: {'num_leaves': 590, 'learning_rate': 0.00620529052953356, 'n_estimators': 837, 'min_child_samples': 66, 'subsample': 0.9588780133685462, 'colsample_bytree': 0.8834799713417191, 'reg_alpha': 0.036233471533710045, 'reg_lambda': 0.00021386297457444355}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007746 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001166 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[L

[I 2024-10-01 06:20:08,841] Trial 30 finished with value: 0.37388617828485743 and parameters: {'num_leaves': 288, 'learning_rate': 0.004678517614138104, 'n_estimators': 892, 'min_child_samples': 30, 'subsample': 0.940180617106022, 'colsample_bytree': 0.7141394554696158, 'reg_alpha': 3.892020915241124e-05, 'reg_lambda': 0.02756873989777341}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012634 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001189 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 06:21:09,937] Trial 31 finished with value: 0.38950049405640763 and parameters: {'num_leaves': 63, 'learning_rate': 0.012581051595197252, 'n_estimators': 409, 'min_child_samples': 104, 'subsample': 0.8905347935225636, 'colsample_bytree': 0.9038463472551569, 'reg_alpha': 0.16808871856719768, 'reg_lambda': 0.0021191619705292217}. Best is trial 26 with value: 0.39678347958986804.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007530 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000939 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1523
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 06:22:04,258] Trial 32 finished with value: 0.39727024470561545 and parameters: {'num_leaves': 51, 'learning_rate': 0.010364879115835675, 'n_estimators': 417, 'min_child_samples': 127, 'subsample': 0.8998922394638185, 'colsample_bytree': 0.8449677311612888, 'reg_alpha': 0.23690201989580129, 'reg_lambda': 0.001137722735380189}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.013012 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000381 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 06:24:26,697] Trial 33 finished with value: 0.3825417047586087 and parameters: {'num_leaves': 278, 'learning_rate': 0.011055989432373809, 'n_estimators': 261, 'min_child_samples': 153, 'subsample': 0.8508912405703908, 'colsample_bytree': 0.8356741867239743, 'reg_alpha': 0.009222190803056226, 'reg_lambda': 0.0003297732194046904}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008123 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001172 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1528
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 06:29:45,007] Trial 34 finished with value: 0.3901546630153363 and parameters: {'num_leaves': 159, 'learning_rate': 0.0015874477739702022, 'n_estimators': 904, 'min_child_samples': 129, 'subsample': 0.9714557169370475, 'colsample_bytree': 0.8758763951787937, 'reg_alpha': 1.1267474511653968e-06, 'reg_lambda': 0.058063097904926815}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007743 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000394 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 06:34:28,129] Trial 35 finished with value: 0.3331095048784626 and parameters: {'num_leaves': 431, 'learning_rate': 0.03411661517139753, 'n_estimators': 368, 'min_child_samples': 193, 'subsample': 0.9064105614434854, 'colsample_bytree': 0.8232669343112691, 'reg_alpha': 0.11090606256573378, 'reg_lambda': 0.0022354318336778646}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.006713 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001274 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 07:04:02,675] Trial 36 finished with value: 0.3490703239033285 and parameters: {'num_leaves': 1362, 'learning_rate': 0.0020803919002609873, 'n_estimators': 692, 'min_child_samples': 71, 'subsample': 0.8764335423867305, 'colsample_bytree': 0.8591449585631448, 'reg_alpha': 0.0006091000543676044, 'reg_lambda': 0.0012578669993108906}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007340 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001133 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 07:24:54,962] Trial 37 finished with value: 0.3441718852473604 and parameters: {'num_leaves': 2884, 'learning_rate': 0.007666860035647115, 'n_estimators': 995, 'min_child_samples': 187, 'subsample': 0.9400598638898517, 'colsample_bytree': 0.6014192990665063, 'reg_alpha': 5.8298594769564427e-08, 'reg_lambda': 1.5483934336673615e-06}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009069 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001053 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 07:31:14,704] Trial 38 finished with value: 0.3831202339841277 and parameters: {'num_leaves': 650, 'learning_rate': 0.0010147965844683948, 'n_estimators': 584, 'min_child_samples': 288, 'subsample': 0.9149884469210452, 'colsample_bytree': 0.7866107402202132, 'reg_alpha': 1.3938083392245335, 'reg_lambda': 0.0004110643506283034}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009316 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001260 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 07:36:02,904] Trial 39 finished with value: 0.11857096593738359 and parameters: {'num_leaves': 370, 'learning_rate': 0.6948160343384916, 'n_estimators': 425, 'min_child_samples': 238, 'subsample': 0.9749181253748059, 'colsample_bytree': 0.9368944733323955, 'reg_alpha': 0.00017411077698522805, 'reg_lambda': 7.545764380492511e-05}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.006698 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001222 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 07:44:27,737] Trial 40 finished with value: 0.35849761648835876 and parameters: {'num_leaves': 1606, 'learning_rate': 0.0027150786420327986, 'n_estimators': 222, 'min_child_samples': 90, 'subsample': 0.8511847971330866, 'colsample_bytree': 0.9794557725190631, 'reg_alpha': 0.011925427783841685, 'reg_lambda': 0.35195088070912983}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009524 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000441 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1528
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 07:49:41,185] Trial 41 finished with value: 0.38987443995179516 and parameters: {'num_leaves': 152, 'learning_rate': 0.001470980424452059, 'n_estimators': 923, 'min_child_samples': 126, 'subsample': 0.970963034503196, 'colsample_bytree': 0.8779744662580677, 'reg_alpha': 1.136186675054683e-06, 'reg_lambda': 0.06552667468021157}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009511 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000432 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1528
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 07:53:43,373] Trial 42 finished with value: 0.39198604519738145 and parameters: {'num_leaves': 118, 'learning_rate': 0.0015323446243498758, 'n_estimators': 876, 'min_child_samples': 128, 'subsample': 0.938512319881486, 'colsample_bytree': 0.8268352866831608, 'reg_alpha': 3.365510878855606e-07, 'reg_lambda': 0.029939205778657624}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009306 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3347
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 54
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000370 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1530
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 07:58:47,281] Trial 43 finished with value: 0.3863033349518441 and parameters: {'num_leaves': 177, 'learning_rate': 0.0015018252646733215, 'n_estimators': 806, 'min_child_samples': 156, 'subsample': 0.8884723686413736, 'colsample_bytree': 0.7788619024896871, 'reg_alpha': 3.7169634599680786e-07, 'reg_lambda': 0.019287823573082077}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011428 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000427 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1523
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 08:00:06,190] Trial 44 finished with value: 0.3963521382903647 and parameters: {'num_leaves': 33, 'learning_rate': 0.004046600664043105, 'n_estimators': 856, 'min_child_samples': 117, 'subsample': 0.9387598244958235, 'colsample_bytree': 0.8315696196366644, 'reg_alpha': 5.963908718131518e-08, 'reg_lambda': 0.0008012267547206013}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011940 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000374 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 08:30:13,219] Trial 45 finished with value: 0.29694975869140816 and parameters: {'num_leaves': 2545, 'learning_rate': 0.009970969446067408, 'n_estimators': 717, 'min_child_samples': 86, 'subsample': 0.8682694836273926, 'colsample_bytree': 0.8461935558387997, 'reg_alpha': 2.7234606796198178e-08, 'reg_lambda': 0.005782961442723133}. Best is trial 32 with value: 0.39727024470561545.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008984 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000653 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1271
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 08:31:02,818] Trial 46 finished with value: 0.39998967429197574 and parameters: {'num_leaves': 20, 'learning_rate': 0.0048480542691370565, 'n_estimators': 795, 'min_child_samples': 26, 'subsample': 0.9185434289791102, 'colsample_bytree': 0.8961484889402455, 'reg_alpha': 3.6631419392796174e-08, 'reg_lambda': 0.0008458417670149867}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008807 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001245 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 08:38:53,288] Trial 47 finished with value: 0.37582448206674374 and parameters: {'num_leaves': 300, 'learning_rate': 0.005344903586038742, 'n_estimators': 794, 'min_child_samples': 33, 'subsample': 0.9343423935658389, 'colsample_bytree': 0.8159405367510046, 'reg_alpha': 2.5819690524085065e-08, 'reg_lambda': 1.3626921700343935e-05}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007667 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000380 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 08:51:48,202] Trial 48 finished with value: 0.37044335916526927 and parameters: {'num_leaves': 442, 'learning_rate': 0.004276076249339802, 'n_estimators': 847, 'min_child_samples': 25, 'subsample': 0.8938413492094243, 'colsample_bytree': 0.8969990071562065, 'reg_alpha': 3.5013704403577134e-06, 'reg_lambda': 0.0007226576793902029}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010312 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001104 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 08:57:38,154] Trial 49 finished with value: 0.2574345990048629 and parameters: {'num_leaves': 202, 'learning_rate': 0.0643806424554858, 'n_estimators': 942, 'min_child_samples': 73, 'subsample': 0.8401877416076688, 'colsample_bytree': 0.8467481414046157, 'reg_alpha': 7.316721900087633e-08, 'reg_lambda': 0.00013699512192296912}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009563 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001313 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 09:14:33,809] Trial 50 finished with value: 0.082080950032306 and parameters: {'num_leaves': 875, 'learning_rate': 0.1563867839039634, 'n_estimators': 639, 'min_child_samples': 50, 'subsample': 0.865112214141941, 'colsample_bytree': 0.9185120884049495, 'reg_alpha': 0.001469395947932725, 'reg_lambda': 0.004596565221074006}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007984 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3357
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 59
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002467 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1021
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 09:15:11,511] Trial 51 finished with value: 0.3967868263207687 and parameters: {'num_leaves': 42, 'learning_rate': 0.003411761402259993, 'n_estimators': 303, 'min_child_samples': 1, 'subsample': 0.9188212998400588, 'colsample_bytree': 0.961644822170372, 'reg_alpha': 1.68179827594976e-07, 'reg_lambda': 0.0007210280349720831}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012940 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002431 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1026
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 09:15:27,180] Trial 52 finished with value: 0.39536727088839696 and parameters: {'num_leaves': 26, 'learning_rate': 0.00337649101040249, 'n_estimators': 182, 'min_child_samples': 16, 'subsample': 0.9237683049876848, 'colsample_bytree': 0.9574909814698681, 'reg_alpha': 1.695389838059283e-07, 'reg_lambda': 0.00084116398453706}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.006111 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3357
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 59
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001172 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

[I 2024-10-01 09:24:19,459] Trial 53 finished with value: 0.3600954588376202 and parameters: {'num_leaves': 307, 'learning_rate': 0.008907263289959697, 'n_estimators': 799, 'min_child_samples': 1, 'subsample': 0.8825370793761647, 'colsample_bytree': 0.8656813302585153, 'reg_alpha': 6.192470989468989e-07, 'reg_lambda': 0.0011993401374217974}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010317 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000373 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 09:26:34,162] Trial 54 finished with value: 0.3844584333213281 and parameters: {'num_leaves': 172, 'learning_rate': 0.005437535382035154, 'n_estimators': 328, 'min_child_samples': 22, 'subsample': 0.8114946750426906, 'colsample_bytree': 0.8945107680395359, 'reg_alpha': 4.624598597485199e-08, 'reg_lambda': 3.618423145152796e-05}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011837 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000453 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 10:17:31,901] Trial 55 finished with value: 0.1979698557283231 and parameters: {'num_leaves': 2144, 'learning_rate': 0.01394759080994565, 'n_estimators': 765, 'min_child_samples': 53, 'subsample': 0.9483570363847638, 'colsample_bytree': 0.9319536570196842, 'reg_alpha': 1.8230384717577946e-08, 'reg_lambda': 0.00014346656148775294}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008548 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000395 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6

[I 2024-10-01 10:26:06,872] Trial 56 finished with value: 0.3016468060230445 and parameters: {'num_leaves': 590, 'learning_rate': 0.023829521275911883, 'n_estimators': 462, 'min_child_samples': 42, 'subsample': 0.9083572083877405, 'colsample_bytree': 0.9545305405793543, 'reg_alpha': 2.682375779965515e-06, 'reg_lambda': 1.6309759650254202e-05}. Best is trial 46 with value: 0.39998967429197574.


[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007904 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 3349
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 55
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Info] Number of positive: 11916, number of negative: 135688
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001086 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1529
[LightGBM] [Info] Number of data points in the train set: 147604, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
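
The Optuna lines in the output above follow a fixed format: trial number, objective value, and a parameter dictionary. When only the saved text is available, they can be tabulated with a small parser. The sketch below is a minimal one and assumes Optuna's default log format as it appears here; `parse_trials` and `SAMPLE_LOG` are illustrative names, not part of the notebook. If the `study` object is still in memory, `study.trials_dataframe()` is the more direct route.

```python
import ast
import re

# Matches lines such as:
# "Trial 55 finished with value: 0.19... and parameters: {...}. Best is trial 46 ..."
TRIAL_RE = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>[-+0-9.eE]+) "
    r"and parameters: (?P<params>\{.*?\})\."
)


def parse_trials(log_text):
    """Return one dict per finished trial found in the log text."""
    trials = []
    for match in TRIAL_RE.finditer(log_text):
        trials.append({
            "number": int(match["number"]),
            "value": float(match["value"]),
            # The parameter dict is printed as a Python literal, so it can be parsed safely.
            "params": ast.literal_eval(match["params"]),
        })
    return trials


# Two lines copied from the output above.
SAMPLE_LOG = """
[I 2024-10-01 10:17:31,901] Trial 55 finished with value: 0.1979698557283231 and parameters: {'num_leaves': 2144, 'learning_rate': 0.01394759080994565, 'n_estimators': 765, 'min_child_samples': 53, 'subsample': 0.9483570363847638, 'colsample_bytree': 0.9319536570196842, 'reg_alpha': 1.8230384717577946e-08, 'reg_lambda': 0.00014346656148775294}. Best is trial 46 with value: 0.39998967429197574.
[I 2024-10-01 10:26:06,872] Trial 56 finished with value: 0.3016468060230445 and parameters: {'num_leaves': 590, 'learning_rate': 0.023829521275911883, 'n_estimators': 462, 'min_child_samples': 42, 'subsample': 0.9083572083877405, 'colsample_bytree': 0.9545305405793543, 'reg_alpha': 2.682375779965515e-06, 'reg_lambda': 1.6309759650254202e-05}. Best is trial 46 with value: 0.39998967429197574.
"""

trials = parse_trials(SAMPLE_LOG)
best = max(trials, key=lambda t: t["value"])
print(best["number"], best["value"])  # 56 0.3016468060230445
```

In this excerpt the lowest score (trial 55) also has by far the largest `num_leaves`. A sorted table of parsed trials would make it easy to check whether that pattern holds across the whole study.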

In [None]:
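import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import FunctionTransformer

# Stateless wrapper around sanitize_feature_names (assumed to be defined in an earlier cell).
sanitizer = FunctionTransformer(sanitize_feature_names)

def evaluate_model_cv(model, X, y, n_splits=5):
    """Print the per-fold ROC AUC from stratified k-fold CV and return the mean."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    auc_scores = []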

    for fold, (train_index, val_index) in enumerate(skf.split(X, y), 1):
        X_train, X_val = X.iloc[train_index], X.iloc[val_index]
        y_train, y_val = y.iloc[train_index], y.iloc[val_index]

        X_train_sanitized = sanitizer.transform(X_train)
        X_val_sanitized = sanitizer.transform(X_val)

        # fit() refits `model` in place, so after the loop it holds the last fold's fit.
        model.fit(X_train_sanitized, y_train)
        y_pred_proba = model.predict_proba(X_val_sanitized)[:, 1]
        auc_score = roc_auc_score(y_val, y_pred_proba)
        auc_scores.append(auc_score)

        print(f"Fold {fold} AUC: {auc_score:.4f}")

    mean_auc = np.mean(auc_scores)
    std_auc = np.std(auc_scores)
    print(f"\nMean AUC: {mean_auc:.4f} (±{std_auc:.4f})")
    return mean_auc

# Load the data
X = train_df.drop(["target", "sk_id_curr"], axis=1)
y = train_df["target"]

# Evaluate models
xgb_model = load_checkpoint("xgboost_best", '../models', is_tuned=True)['model']
lgb_model = load_checkpoint("lightgbm_best", '../models', is_tuned=True)['model']

print("XGBoost Evaluation:")
xgb_auc = evaluate_model_cv(xgb_model, X, y)

print("\nLightGBM Evaluation:")
lgb_auc = evaluate_model_cv(lgb_model, X, y)

# Choose the best model
best_model = xgb_model if xgb_auc > lgb_auc else lgb_model
best_model_name = "XGBoost" if xgb_auc > lgb_auc else "LightGBM"

# Generate predictions for submission
X_submission = submission_df.drop("sk_id_curr", axis=1)
X_submission_sanitized = sanitizer.transform(X_submission)

submission_predictions = best_model.predict_proba(X_submission_sanitized)[:, 1]

# Create submission file
submission_df = pd.DataFrame({
    "sk_id_curr": submission_df["sk_id_curr"].astype(int),
    "target": submission_predictions
})

# Uncomment to write the predictions to disk.
# submission_df.to_csv("submission.csv", index=False)
print(f"\nSubmission predictions generated using the best model ({best_model_name}).")

print("\nDistribution of predictions:")
print(submission_df['target'].describe())
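
The cell above compares models on ROC AUC and produces probabilities, while the test-set metrics further down come from `predict`, which applies the classifier's default cut-off. The notebook's stated priority is recall (F2), so the threshold itself is worth tuning. The sketch below is a dependency-free illustration of choosing the cut-off that maximises F-beta; the helper names and the toy labels and scores are made up for the example. In practice the threshold should be chosen on validation data, not on the test set.

```python
def fbeta_at_threshold(y_true, y_proba, threshold, beta=2.0):
    """F-beta of the positive class when predicting 1 for proba >= threshold."""
    tp = fp = fn = 0
    for label, proba in zip(y_true, y_proba):
        predicted_positive = proba >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
    # F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
    denominator = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_proba, candidates, beta=2.0):
    """Return the candidate threshold with the highest F-beta."""
    return max(candidates, key=lambda t: fbeta_at_threshold(y_true, y_proba, t, beta))


# Toy validation labels and scores (illustrative only, not from the notebook).
y_val_toy = [0, 0, 1, 0, 1, 1, 0, 0]
proba_toy = [0.05, 0.22, 0.35, 0.41, 0.55, 0.80, 0.12, 0.31]
candidates = [i / 10 for i in range(1, 10)]

threshold = best_threshold(y_val_toy, proba_toy, candidates)
print(threshold)  # 0.3
print(round(fbeta_at_threshold(y_val_toy, proba_toy, threshold), 3))  # 0.882
```

Because F2 weights recall more heavily than precision, the selected threshold tends to fall below 0.5 on imbalanced data. With scikit-learn available, `fbeta_score(y_val, proba >= t, beta=2)` computes the same quantity for a given `t`.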

In [8]:
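import joblib
from sklearn.metrics import (
    f1_score,
    fbeta_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# 1. Load the best model
best_model_path = "../models/tuned_xgboost_best_checkpoint.pkl"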
best_model_data = joblib.load(best_model_path)
best_model = best_model_data['model']
selected_features = best_model_data['selected_features']

# 2. Apply feature selection to the test set
#    (column names must match those stored in the checkpoint's selected_features)
X_test_selected = X_test[selected_features]

# 3. Make predictions on the test set
y_test_pred = best_model.predict(X_test_selected)
y_test_pred_proba = best_model.predict_proba(X_test_selected)[:, 1]

# 4. Calculate and report performance metrics
test_precision = precision_score(y_test, y_test_pred)
test_recall = recall_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)
test_f2 = fbeta_score(y_test, y_test_pred, beta=2)
test_auc_roc = roc_auc_score(y_test, y_test_pred_proba)

print("Test Set Results:")
print(f"Precision: {test_precision:.4f}")
print(f"Recall: {test_recall:.4f}")
print(f"F1-Score: {test_f1:.4f}")
print(f"F2-Score: {test_f2:.4f}")
print(f"AUC-ROC: {test_auc_roc:.4f}")

# 5. Compare with validation results
print("\nComparison with Validation Results:")
print(f"Validation F2-Score: {results['f2_score']:.4f}")
print(f"Test F2-Score:       {test_f2:.4f}")
print(f"Difference:          {abs(results['f2_score'] - test_f2):.4f}")

KeyError: "['name_family_status_civil_marriage'] not in index"
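
The `KeyError` above means that at least one name in `selected_features` is not a column of `X_test`. One plausible cause is that the checkpoint stores sanitized names (lowercase, underscores) while `X_test` still has the raw one-hot column names; this has not been confirmed from the notebook. The sketch below shows how to list the missing features before and after sanitizing. `sanitize_name` is a hypothetical stand-in for the project's `sanitize_feature_names`, whose exact rules are not shown here, and the raw column names are invented for illustration.

```python
import re


def sanitize_name(name):
    """Hypothetical rule: collapse non-alphanumeric runs to '_' and lowercase."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", str(name)).strip("_").lower()


def missing_features(columns, selected_features):
    """Return the selected features that are absent from `columns`, in order."""
    available = set(columns)
    return [feature for feature in selected_features if feature not in available]


# Illustrative column names (assumed, not taken from the actual test set).
raw_columns = ["amt_credit", "name_family_status_Civil marriage"]
selected = ["amt_credit", "name_family_status_civil_marriage"]

print(missing_features(raw_columns, selected))
# ['name_family_status_civil_marriage']
print(missing_features([sanitize_name(c) for c in raw_columns], selected))
# []
```

In the notebook, the equivalent check would be `missing_features(X_test.columns, selected_features)`. If sanitizing resolves it, passing `X_test` through the same `sanitizer` used in the cross-validation cell before indexing with `selected_features` should be enough. If a one-hot column is truly absent from the test split, `X_test.reindex(columns=selected_features, fill_value=0)` is one way to add it as an all-zero column.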