LightGBM has many tuning options, but the most important ones are:

🔹 learning_rate: Shrinks the contribution of each tree. Smaller values often generalize better but need more boosting rounds (n_estimators), so training is slower.

🔹 num_leaves: Maximum number of leaves per tree (default 31) and the main control on tree complexity. More leaves = better fit but higher risk of overfitting.

🔹 max_depth: Limits tree depth to prevent overfitting. The default, -1, means no limit.

🔹 n_estimators: Number of boosting rounds (trees). More rounds can improve the fit, especially with a small learning_rate, but they increase computation and can eventually overfit.

🔹 min_data_in_leaf: Minimum number of samples in a leaf, which prevents overly specific splits (called min_child_samples in the scikit-learn API).

🔹 feature_fraction: Fraction of features randomly selected for each tree (helps regularization).

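🔹 bagging_fraction: Fraction of training rows randomly sampled for a boosting iteration (row subsampling, as in stochastic gradient boosting). It only takes effect when bagging_freq is greater than 0.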

🔹 lambda_l1, lambda_l2: L1 and L2 regularization to reduce overfitting.

🔹 boosting_type: gbdt (default, gradient boosting), dart (dropout-based boosting), goss (Gradient-based One-Side Sampling).

In [8]:
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

In [9]:
# Generate sample classification data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

In [10]:
# Split into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [11]:
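# Define parameter grid (3 x 3 x 3 x 3 = 81 combinations).
# Note: a tree of depth d has at most 2**d leaves, so with max_depth=3
# all three num_leaves values behave identically.
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'num_leaves': [20, 31, 50],
    'max_depth': [3, 5, 7],
    'n_estimators': [50, 100, 200],
}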
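
To see what this grid costs before running it, the candidates can be enumerated by hand. The sketch below uses only the standard library and repeats the grid so that it runs on its own. It counts the candidates, the total fits for 3-fold cross-validation, and how many candidates request more leaves than their depth limit allows (a tree of depth d has at most 2**d leaves, so the extra leaves can never be used). The names `candidates`, `cv_folds`, and `capped` are illustrative.

```python
from itertools import product

param_grid = {
    "learning_rate": [0.01, 0.1, 0.2],
    "num_leaves": [20, 31, 50],
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200],
}

# One dict per combination, i.e. per candidate the grid search evaluates.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 3  # assumed to match the cv=3 setting of the search
total_fits = len(candidates) * cv_folds

# Candidates whose num_leaves exceeds the 2**max_depth ceiling.
capped = [c for c in candidates if c["num_leaves"] > 2 ** c["max_depth"]]

print(len(candidates), total_fits, len(capped))  # → 81 243 36
```

So 36 of the 81 candidates differ from another candidate only in a num_leaves value that the depth limit overrides. Choosing num_leaves values at or below 2**max_depth would make every candidate distinct.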

In [12]:
# Initialize LightGBM classifier

lgbm_clf = lgb.LGBMClassifier()

In [13]:
# Perform Grid Search with 3-fold cross-validation
grid_search = GridSearchCV(
    lgbm_clf,
    param_grid,
    cv=3,
    scoring='accuracy',
    verbose=1
)

In [14]:
grid_search.fit(X_train,y_train)

Fitting 3 folds for each of 81 candidates, totalling 243 fits
[LightGBM] [Info] Number of positive: 258, number of negative: 275
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000272 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1780
[LightGBM] [Info] Number of data points in the train set: 533, number of used features: 10
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.484053 -> initscore=-0.063812
[LightGBM] [Info] Start training from score -0.063812
[LightGBM] [Info] Number of positive: 259, number of negative: 274
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000249 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1780
[LightGBM] [Info] Number of data points in the train set: 533, number of used features: 10
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.485929 -> initscore=-0.056300
... (remaining log output truncated)

In [15]:
# Best parameters

print(f'Best Parameters : {grid_search.best_params_}')

Best Parameters : {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 200, 'num_leaves': 20}


In [16]:
# Predict and evaluate with the best model (refit on the full training set)
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print(f'Accuracy of Best Model : {accuracy_score(y_test, y_pred):.4f}')

Accuracy of Best Model : 0.8850
