# Stacking 2.0: Meta-XGBoost on Top of Base Models (FE v2)
Combining XGBoost v2, LightGBM, and Random Forest using out-of-fold predictions and a meta-model.

## References:
- Stacking concept: Wolpert, D. H. (1992). "Stacked generalization"
- Implementation inspired by: https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/

In [None]:
# Import necessary libraries
import pandas as pd  # For data manipulation and analysis
import numpy as np   # For numerical operations
from sklearn.model_selection import KFold  # For K-fold cross-validation
from sklearn.ensemble import RandomForestRegressor  # Random Forest algorithm
from xgboost import XGBRegressor  # XGBoost algorithm
from lightgbm import LGBMRegressor  # LightGBM algorithm
from sklearn.metrics import mean_squared_log_error  # For evaluation metric


In [None]:
# Load preprocessed training and test datasets with feature engineering (v2)
train = pd.read_csv("datasets/train_fe_v2.csv")
test = pd.read_csv("datasets/test_fe_v2.csv")

# Separate features from target variable in training set
X = train.drop(columns=['id', 'Calories'])  # Features (drop ID and target)
y = train['Calories']  # Target variable
X_test = test.drop(columns='id')  # Test features (drop ID)
test_ids = test['id']  # Store test IDs for submission file


In [None]:
# Define optimized XGBoost parameters (likely from prior hyperparameter tuning)
xgb_params = {
    'n_estimators': 761,  # Number of trees/boosting rounds
    'max_depth': 8,  # Maximum tree depth
    'learning_rate': 0.0433,  # Step size shrinkage to prevent overfitting
    'subsample': 0.8292,  # Fraction of samples used for tree building
    'colsample_bytree': 0.6293,  # Fraction of features used per tree
    'gamma': 0.0251,  # Minimum loss reduction for further partition
    'reg_alpha': 0.8449,  # L1 regularization term
    'reg_lambda': 2.7842,  # L2 regularization term
    'random_state': 42,  # For reproducibility
    'n_jobs': -1  # Use all available CPU cores
}

# Define base models for stacking ensemble
base_models = [
    ('xgb', XGBRegressor(**xgb_params)),  # XGBoost with optimized parameters
    ('lgb', LGBMRegressor(n_estimators=200, random_state=42)),  # LightGBM with default params + 200 trees
    ('rf', RandomForestRegressor(n_estimators=200, random_state=42, n_jobs=-1))  # Random Forest with 200 trees
]


In [None]:
def get_oof_preds(models, X, y, X_test, n_splits=5):
    """
    Generate out-of-fold predictions for training data and averaged predictions for test data.
    
    Args:
        models: List of (name, model) tuples
        X: Training features
        y: Target values
        X_test: Test features
        n_splits: Number of cross-validation folds
        
    Returns:
        oof_train: Out-of-fold predictions for training data (used as meta-features)
        oof_test: Average predictions for test data (used as meta-features)
    """
    # Initialize K-fold cross-validation with shuffling
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    
    # Initialize arrays for out-of-fold predictions
    oof_train = np.zeros((X.shape[0], len(models)))  # For training data
    oof_test = np.zeros((X_test.shape[0], len(models)))  # For test data

    # Loop through each base model
    for i, (name, model) in enumerate(models):
        test_preds_folds = []  # Store test predictions from each fold
        
        # Perform K-fold cross-validation
        for train_idx, val_idx in kf.split(X):
            # Split data for this fold
            X_train_fold, X_val_fold = X.iloc[train_idx], X.iloc[val_idx]
            y_train_fold = y.iloc[train_idx]

            # Train model on training fold
            model.fit(X_train_fold, y_train_fold)
            
            # Generate out-of-fold predictions for validation fold
            oof_train[val_idx, i] = model.predict(X_val_fold)
            
            # Generate predictions for test data
            test_preds_folds.append(model.predict(X_test))

        # Average test predictions across all folds
        oof_test[:, i] = np.mean(test_preds_folds, axis=0)

    return oof_train, oof_test
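
The property that `get_oof_preds` relies on is that K-fold splitting holds out every training row exactly once, so each row's meta-feature comes from a model that never saw that row. The sketch below checks this on synthetic data. `X_toy`, `y_toy`, and the `LinearRegression` stand-in are illustrative assumptions, not the notebook's data or base models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (hypothetical, not the notebook's dataset)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
oof = np.full(100, np.nan)                  # NaN marks rows not yet predicted
times_validated = np.zeros(100, dtype=int)  # how often each row was held out

for train_idx, val_idx in kf.split(X_toy):
    # A fresh model per fold, trained only on the other four folds
    model = LinearRegression().fit(X_toy[train_idx], y_toy[train_idx])
    oof[val_idx] = model.predict(X_toy[val_idx])
    times_validated[val_idx] += 1

# Every row should be held out exactly once, leaving no gaps in the OOF vector
print("Any missing OOF predictions:", bool(np.isnan(oof).any()))
print("Distinct hold-out counts:", np.unique(times_validated))
```

Creating a fresh estimator per fold, as done here, is a design choice; refitting one instance, as `get_oof_preds` does, also works for scikit-learn style estimators because `fit` retrains from scratch.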


In [None]:
# Generate out-of-fold predictions from base models to use as meta-features
# These will serve as input features for our meta-model
X_meta_train, X_meta_test = get_oof_preds(base_models, X, y, X_test)


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.019036 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2096
[LightGBM] [Info] Number of data points in the train set: 600000, number of used features: 14
[LightGBM] [Info] Start training from score 4.141163
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.016152 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2098
[LightGBM] [Info] Number of data points in the train set: 600000, number of used features: 14
[LightGBM] [Info] Start training from score 4.141466
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.018016 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2096
[LightGBM] [Info] Number of data points in the train set: 600000, number of used features: 14
[LightGBM] [Info] Start 

In [None]:
# Define and train the meta-model (XGBoost) on out-of-fold predictions
# Meta-model parameters are simplified compared to base XGBoost model
meta_model = XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=42)
meta_model.fit(X_meta_train, y)  # Train meta-model on base models' predictions


In [None]:
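# mean_squared_log_error is already imported in the first cell

# Predict on the same meta-features the meta-model was just fit on.
# Caveat: the base-model columns are out-of-fold, but the meta-model has seen
# every one of these rows, so this score is in-sample for the meta-model.
meta_train_preds = meta_model.predict(X_meta_train)

# Calculate Root Mean Squared Logarithmic Error (RMSLE)
# RMSLE is often used for positively skewed targets such as calorie expenditure
# Note: mean_squared_log_error applies log1p internally. If y is already
# log1p-transformed (as the expm1 step at submission time assumes), the log is
# taken twice; plain RMSE on (y, meta_train_preds) would then equal the
# original-scale RMSLE.
# Reference: https://www.kaggle.com/code/carlolepelaars/understanding-the-metric-rmsle
rmsle = np.sqrt(mean_squared_log_error(y, meta_train_preds))
print(f"Stacked Meta-XGB RMSLE (train set OOF): {rmsle:.5f}")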


Stacked Meta-XGB RMSLE (train set OOF): 0.01742
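
The score above is computed on the rows the meta-model was fit on. One way to get a held-out estimate for the meta-model is to cross-validate it on the meta-features. The sketch below illustrates the gap on synthetic data; `X_meta_toy` and `y_toy` are hypothetical stand-ins for `X_meta_train` and `y`, and `DecisionTreeRegressor` replaces the XGBoost meta-model only so the example runs with scikit-learn alone.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
y_toy = rng.normal(loc=4.0, scale=1.0, size=500)
# Hypothetical meta-features: three noisy copies of the target,
# mimicking OOF predictions from three base models
X_meta_toy = y_toy[:, None] + rng.normal(scale=0.1, size=(500, 3))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# In-sample: fit and predict on the same rows (what the cell above does)
meta = DecisionTreeRegressor(random_state=42)
in_sample_rmse = rmse(y_toy, meta.fit(X_meta_toy, y_toy).predict(X_meta_toy))

# Held-out: each row is predicted by a meta-model that did not train on it
cv = KFold(n_splits=5, shuffle=True, random_state=0)
held_out_preds = cross_val_predict(
    DecisionTreeRegressor(random_state=42), X_meta_toy, y_toy, cv=cv
)
held_out_rmse = rmse(y_toy, held_out_preds)

print(f"In-sample RMSE: {in_sample_rmse:.5f}")
print(f"Held-out RMSE:  {held_out_rmse:.5f}")
```

In the notebook, the same pattern would be `cross_val_predict(meta_model, X_meta_train, y, cv=...)`, with the resulting predictions scored in place of `meta_train_preds`.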


In [None]:
# Generate predictions on test data using the meta-model
meta_preds = meta_model.predict(X_meta_test)

# Convert predictions back to original scale if log transformation was applied during preprocessing
# expm1() is the inverse of log1p() transformation
# Reference: https://numpy.org/doc/stable/reference/generated/numpy.expm1.html
final_preds = np.expm1(meta_preds)  # only if target was log1p-transformed during training

# Create submission dataframe with IDs and predictions
submission = pd.DataFrame({
    'id': test_ids,
    'Calories': final_preds
})

# Save submission to CSV file for Kaggle submission
submission.to_csv("submission_stacked_v2_meta_xgb.csv", index=False)
print("✅ Submission saved as 'submission_stacked_v2_meta_xgb.csv'")
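
The `expm1` step assumes the models were trained on `log1p(Calories)`. Under that assumption, RMSLE on the original scale is the same quantity as plain RMSE on the log-scale values, while calling `mean_squared_log_error` on already-logged values applies the logarithm a second time and yields a smaller number. The sketch below checks both statements on made-up data; `calories` and `preds` are hypothetical arrays, not the notebook's target or predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

rng = np.random.default_rng(0)
calories = rng.uniform(1.0, 300.0, size=1000)        # made-up positive targets
preds = calories * rng.uniform(0.9, 1.1, size=1000)  # made-up predictions

# Log-scale versions, as they would look after a log1p preprocessing step
y_log = np.log1p(calories)
preds_log = np.log1p(preds)

# RMSLE on the original scale
rmsle_raw = np.sqrt(mean_squared_log_error(calories, preds))
# Plain RMSE on the log scale: the same quantity
rmse_log = np.sqrt(np.mean((y_log - preds_log) ** 2))
# RMSLE applied to log-scale values: the log is taken twice
rmsle_double = np.sqrt(mean_squared_log_error(y_log, preds_log))

print(f"RMSLE on raw values:        {rmsle_raw:.5f}")
print(f"RMSE on log1p values:       {rmse_log:.5f}")
print(f"RMSLE on log1p values:      {rmsle_double:.5f}")
print("expm1 inverts log1p:", bool(np.allclose(np.expm1(y_log), calories)))
```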


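### Stacked Meta-XGB RMSLE (train set, OOF meta-features): 0.01742
✅ Interpretation:

- At 0.01742, the stack scores slightly worse than standalone XGBoost v2 (0.01712), even though the meta-model is scored on the same rows it was fit on

- This suggests that RF and LightGBM aren't adding helpful diversity

- Meta-XGB may be fitting noise in the OOF predictions rather than extracting additional signal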
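
A quick way to probe the diversity question is to look at how strongly the base models' errors correlate: if the residuals move together, the meta-model has little independent information to combine. The sketch below shows the diagnostic on synthetic data. `oof_toy` is a hypothetical stand-in for `X_meta_train`, built so that all three models share most of their error; the numbers say nothing about the notebook's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y_toy = rng.normal(4.0, 1.0, n)
shared_err = rng.normal(scale=0.05, size=n)  # error common to all models

# Hypothetical OOF matrix: one column per base model
names = ["xgb", "lgb", "rf"]
oof_toy = np.column_stack([
    y_toy + shared_err + rng.normal(scale=0.01, size=n),
    y_toy + shared_err + rng.normal(scale=0.02, size=n),
    y_toy + shared_err + rng.normal(scale=0.03, size=n),
])

# Residuals per model, their pairwise correlation, and per-model RMSE
residuals = oof_toy - y_toy[:, None]
resid_corr = np.corrcoef(residuals, rowvar=False)
per_model_rmse = np.sqrt((residuals ** 2).mean(axis=0))

for name, score in zip(names, per_model_rmse):
    print(f"{name}: RMSE = {score:.4f}")
print("Residual correlation matrix:")
print(np.round(resid_corr, 2))
```

In the notebook, the same calculation would use `X_meta_train - y.to_numpy()[:, None]` as the residuals. Correlations close to 1 would support the reading that the weaker models mostly repeat the strongest model's mistakes.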

💡 Why This Happened:

- XGBoost v2 is already heavily optimized, with tuned hyperparameters and SHAP features

- LightGBM and RF, used with mostly default settings, don't beat it, so they pull the ensemble down

- Any robustness gained from the meta-model doesn't outweigh this dilution
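
If the meta-model is suspected of fitting noise, one option to compare against is a low-capacity meta-learner, such as a linear blend with non-negative weights, whose coefficients also show how much each base model contributes. The sketch below is an illustration under stated assumptions, not the notebook's method: `X_meta_toy` is a hypothetical stand-in for `X_meta_train` with one strong and two weaker models, and `LinearRegression(positive=True)` is swapped in for the XGBoost meta-model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n = 1000
y_toy = rng.normal(4.0, 1.0, n)
# Hypothetical meta-features: one strong and two weaker base models
X_meta_toy = np.column_stack([
    y_toy + rng.normal(scale=0.02, size=n),
    y_toy + rng.normal(scale=0.05, size=n),
    y_toy + rng.normal(scale=0.08, size=n),
])

# Linear blend constrained to non-negative weights
linear_meta = LinearRegression(positive=True)

# Held-out RMSE of the blend (scikit-learn returns it negated)
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    linear_meta, X_meta_toy, y_toy, cv=cv, scoring="neg_root_mean_squared_error"
)
cv_rmse = float(-scores.mean())

# Fit on all rows to inspect the learned blend weights
linear_meta.fit(X_meta_toy, y_toy)
weights = dict(zip(["xgb", "lgb", "rf"], linear_meta.coef_.round(3)))

print(f"Linear meta-model CV RMSE: {cv_rmse:.5f}")
print("Blend weights:", weights)
```

A weight near zero for a base model would indicate that the blend gains little from it, which is one way to test the dilution explanation above.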

## References:
- RMSLE metric: https://www.kaggle.com/code/carlolepelaars/understanding-the-metric-rmsle
- Ensemble diversity importance: Zhou, Z. H. (2012) "Ensemble Methods: Foundations and Algorithms"