1️⃣ Essential XGBoost Parameters
Parameter          Effect
n_estimators       Number of boosting rounds (trees)
learning_rate      Shrinks each tree's contribution (lower values often generalize better but need more rounds)
max_depth          Maximum tree depth (deeper trees capture more complex patterns but risk overfitting)
min_child_weight   Minimum sum of instance weight (hessian) required in a child node for a split to be made (higher values reduce overfitting)
subsample          Fraction of training rows sampled for each tree (lower values add randomness, which can reduce overfitting)
colsample_bytree   Fraction of features sampled for each tree (reduces correlation between trees)
reg_alpha          L1 regularization on leaf weights (can drive small leaf weights to exactly zero)
reg_lambda         L2 regularization on leaf weights (shrinks leaf values, reducing overfitting)


🔹 Hyperparameter Tuning Methods

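Grid Search              (Tries every combination in the grid; exhaustive but slow, because the cost multiplies with each parameter added)
Random Search            (Evaluates a fixed number of randomly sampled combinations; faster, but may miss the best one)
Bayesian Optimization    (Uses the results of earlier trials to pick promising candidates to try next)
Hyperopt/Optuna          (Python libraries that implement this kind of sequential, model-based search)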

In [50]:
from xgboost import XGBClassifier

from sklearn.datasets import make_classification

from sklearn.model_selection import GridSearchCV, train_test_split

from sklearn.metrics import accuracy_score, classification_report

In [51]:
# Generate synthetic data

X, y = make_classification(n_samples=500, n_features=10, n_informative=2, n_redundant=2, random_state=42)

In [52]:
# Split the data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [53]:
# Initialize model

xgb = XGBClassifier(eval_metric="logloss")


In [54]:
# Define parameter grid

param_grid = {

    "n_estimators": [50,100,200],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [3,5,7],
    "subsample": [0.7, 0.8, 1.0],
    "colsample_bytree" : [0.7, 0.8, 1.0]
}
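The cost of an exhaustive search follows directly from the grid: the number of candidates is the product of the list lengths, and each candidate is fitted once per CV fold. A small stdlib sketch, copying the grid above:

```python
from math import prod

# Same grid as the cell above
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [3, 5, 7],
    "subsample": [0.7, 0.8, 1.0],
    "colsample_bytree": [0.7, 0.8, 1.0],
}

n_candidates = prod(len(values) for values in param_grid.values())  # 3**5
cv_folds = 3
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 243 729
```

This matches the "243 candidates, totalling 729 fits" message printed by the grid search; adding a sixth parameter with three values would triple both numbers.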

In [55]:
# Grid Search

grid_search = GridSearchCV(
    xgb,  
    param_grid=param_grid,
    scoring="accuracy",
    cv=3,
    verbose=1,
    n_jobs=-1
)

grid_search.fit(X_train, y_train)


Fitting 3 folds for each of 243 candidates, totalling 729 fits


In [56]:
# Best Parameters

print("Best Parameters :", grid_search.best_params_)

Best Parameters : {'colsample_bytree': 0.7, 'learning_rate': 0.2, 'max_depth': 7, 'n_estimators': 50, 'subsample': 0.8}


In [57]:
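# Evaluate the best model on the held-out test set
# (with the default refit=True, best_estimator_ is refit on the full training set)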

best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy with Best Parameters: {accuracy:.4f}")

report = classification_report(y_test, y_pred)
print(f"Classification report with Best Parameters:\n{report}")


Accuracy with Best Parameters: 0.9267
Classification report with Best Parameters:
              precision    recall  f1-score   support

           0       0.96      0.90      0.93        79
           1       0.89      0.96      0.93        71

    accuracy                           0.93       150
   macro avg       0.93      0.93      0.93       150
weighted avg       0.93      0.93      0.93       150

