### Evaluation Metrics

In [None]:
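import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate_regression(name, y_true, y_pred, X_data):
    """Print error metrics and a Breusch-Pagan test, then plot residual diagnostics.

    X_data is the feature matrix behind y_pred; it supplies the predictor
    count for adjusted R2 and the regressors for the Breusch-Pagan test.
    """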
    # 1. Basic Metrics (MAE, MSE, RMSE)
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)

    # 2. R2 and Adjusted R2
    r2 = r2_score(y_true, y_pred)

    # Formula for Adjusted R2
    n = len(y_true)
    k = X_data.shape[1]
    adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - k - 1))

    print(f"--- {name} Performance ---")
    print(f"MAE: {mae:.4f}")
    print(f"MSE: {mse:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"R2 Score: {r2:.4f}")
    print(f"Adjusted R2: {adj_r2:.4f}")

    # 3. Heteroscedasticity Test (Breusch-Pagan)
    # Residuals as plain arrays, so a pandas index on y_true cannot misalign the subtraction
    residuals = np.asarray(y_true) - np.asarray(y_pred)

    # Simple Breusch-Pagan test
    print("\n--- Statistical Tests ---")

    bp_test = sms.het_breuschpagan(residuals, sm.add_constant(X_data))
    print(f"Breusch-Pagan p-value: {bp_test[1]:.4f}")

    # 4. Visualizations (Residual Plot & Q-Q Plot)
    plt.figure(figsize=(12, 5))

    # Residual Plot
    plt.subplot(1, 2, 1)
    sns.scatterplot(x=y_pred, y=residuals, alpha=0.5)
    plt.axhline(y=0, color='red', linestyle='--')
    plt.title(f'Residuals vs Predicted ({name})')
    plt.xlabel('Predicted Values')
    plt.ylabel('Residuals')

    # Q-Q Plot
    plt.subplot(1, 2, 2)
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.title(f'Normal Q-Q Plot ({name})')

    plt.tight_layout()
    plt.show()
    plt.close()
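
The adjusted R2 line in the function above divides by `n - k - 1`, which is zero or negative when the number of predictors is not smaller than `n - 1`. The sketch below isolates that formula in a standalone helper with a guard. `adjusted_r2` is a hypothetical helper name, the row count is the rounded 10.6 million figure quoted in this notebook, and `n_features=20` is an assumed value used only for illustration.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        # The formula is undefined (or flips sign) without spare degrees of freedom.
        raise ValueError("adjusted R2 needs n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Small sample: the penalty for extra predictors is visible.
small = adjusted_r2(0.5, n_samples=11, n_features=5)  # 1 - 0.5 * 10 / 5

# Large sample (n_features is an assumed, illustrative value): almost no penalty.
large = adjusted_r2(0.2445, n_samples=10_600_000, n_features=20)

print(small, round(large, 4))
```

With millions of rows the correction is negligible, so R2 and adjusted R2 will agree to four decimals unless the feature count is very large.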

### Ranked Model Performance (From Best to Worst)
Models are ordered by RMSE, lowest first.

| Rank | Model | MAE | RMSE | R2 Score | Comment |
|---|---|---|---|---|---|
| 1 | LightGBM Regressor (The Winner) | 6.7969 | 8.5189 | 0.2445 | The most accurate and efficient model. |
| 2 | Random Forest Regressor | 6.8011 | 8.5203 | 0.2411 | Highly stable and very close to LightGBM. |
| 3 | Decision Tree (max_depth=10) | 6.8053 | 8.5370 | 0.2412 | Excellent performance for a single tree model. |
| 4 | OLS / Ridge / Linear SVR (Tied) | 7.7423 | 9.4044 | 0.0792 | Baseline linear models, showing limited ability to capture complex patterns. |
| 5 | Lasso (L1) | 7.7423 | 9.4070 | 0.0787 | Slightly lower performance due to its penalty on features. |
| 6 | Elastic Net | 7.7455 | 9.4083 | 0.0785 | Similar to Lasso, showing linear limitations. |
| 7 | KNN (k=11, Manhattan) | 7.8789 | 9.6209 | NaN | Poor performance; struggled with the data distribution. |
| 8 | RBF SVR (Sampled) | 7.3980 | 9.8179 | -0.0002 | Worst RMSE and R2; the negative R2 indicates it performed worse than a horizontal mean line. |
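
The ordering above follows RMSE. As a minimal sketch of how that ranking could be reproduced, the snippet below sorts the reported figures. The numbers are copied from this section, while `results` and `rank_by` are illustrative names that do not come from the notebook.

```python
import math

# Metrics copied from the ranking above (deliberately listed out of order).
results = [
    {"model": "RBF SVR (Sampled)", "mae": 7.3980, "rmse": 9.8179, "r2": -0.0002},
    {"model": "Random Forest Regressor", "mae": 6.8011, "rmse": 8.5203, "r2": 0.2411},
    {"model": "KNN (k=11, Manhattan)", "mae": 7.8789, "rmse": 9.6209, "r2": math.nan},
    {"model": "LightGBM Regressor", "mae": 6.7969, "rmse": 8.5189, "r2": 0.2445},
    {"model": "Elastic Net", "mae": 7.7455, "rmse": 9.4083, "r2": 0.0785},
    {"model": "OLS / Ridge / Linear SVR", "mae": 7.7423, "rmse": 9.4044, "r2": 0.0792},
    {"model": "Decision Tree (max_depth=10)", "mae": 6.8053, "rmse": 8.5370, "r2": 0.2412},
    {"model": "Lasso (L1)", "mae": 7.7423, "rmse": 9.4070, "r2": 0.0787},
]


def rank_by(rows, metric):
    """Return rows sorted by an error metric, lowest (best) first."""
    return sorted(rows, key=lambda row: row[metric])


ranked_rmse = rank_by(results, "rmse")
ranked_mae = rank_by(results, "mae")

for position, row in enumerate(ranked_rmse, start=1):
    r2_text = "n/a" if math.isnan(row["r2"]) else f"{row['r2']:.4f}"
    print(f"{position}. {row['model']}: RMSE={row['rmse']:.4f}, R2={r2_text}")
```

Sorting by RMSE avoids the `NaN` in the R2 column. Ranking by MAE instead would move RBF SVR from last place to fourth, so the choice of metric matters for the lower half of the list.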

### Final Verdict:

#### The winner is the LightGBM Regressor
LightGBM is the best-performing model for this dataset for the following reasons:

* **Highest accuracy:** It achieved the lowest MAE (6.7969) and the highest R2 Score (0.2445), so it explains more of the variance in `days_since_prior_order` than any other model, although roughly three quarters of that variance remains unexplained.
* **Error minimization:** It has the lowest RMSE (8.5189). The margin over Random Forest (8.5203) and the Decision Tree (8.5370) is small, so this is a slight edge on large errors rather than a decisive one.
* **Efficiency:** Unlike SVR and Random Forest, which required sampling, LightGBM handled the full 10.6 million rows efficiently, so it was fit on the patterns of the entire dataset.
* **Non-linearity:** The jump in R2 from about 0.08 (linear models) to 0.24 (tree-based models) suggests that the relationship between user features and order frequency is substantially non-linear. All three tree-based models capture it; LightGBM does so slightly better than the others.

### Important Technical Note:
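All models showed a Breusch-Pagan p-value of 0.0000 (that is, below 0.00005 after rounding to four decimals). This rejects the null hypothesis of constant residual variance and indicates heteroscedasticity. With millions of rows the test has enough power to flag even small departures from constant variance, so the residual plots are the better guide to how severe the effect is.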
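
To make the test less of a black box, here is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan statistic for a single regressor: regress the squared residuals on the feature and take `n * R^2`. It is an illustration on synthetic data, not the notebook's `sms.het_breuschpagan` call, and `breusch_pagan_lm_single` is a hypothetical helper name.

```python
import random


def breusch_pagan_lm_single(residuals, x):
    """LM statistic n * R^2 from regressing residuals**2 on one regressor.

    With a single regressor plus intercept, R^2 equals the squared Pearson
    correlation between the squared residuals and x.
    """
    e2 = [r * r for r in residuals]
    n = len(e2)
    mean_e2 = sum(e2) / n
    mean_x = sum(x) / n
    cov = sum((a - mean_e2) * (b - mean_x) for a, b in zip(e2, x))
    var_e2 = sum((a - mean_e2) ** 2 for a in e2)
    var_x = sum((b - mean_x) ** 2 for b in x)
    return n * cov * cov / (var_e2 * var_x)


# Synthetic residuals whose spread grows with x (heteroscedastic by construction).
rng = random.Random(0)
x = [rng.uniform(1.0, 10.0) for _ in range(5000)]
residuals = [rng.gauss(0.0, 1.0) * xi for xi in x]

lm_stat = breusch_pagan_lm_single(residuals, x)
CHI2_CRIT_95_DF1 = 3.841  # 95% critical value of chi-squared with 1 degree of freedom

print(f"LM statistic: {lm_stat:.1f}, reject homoscedasticity: {lm_stat > CHI2_CRIT_95_DF1}")
```

Because the statistic scales with `n`, a fixed amount of heteroscedasticity yields a larger statistic, and therefore a smaller p-value, as the dataset grows.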

### Technical Analysis of Model Failures

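#### 1. Why did KNN show `NaN` for the $R^2$ score?
The `NaN` (Not a Number) $R^2$ for KNN needs care, because the same run reports a finite MAE (7.8789) and RMSE (9.6209), so the predictions themselves are finite. The usual suspects can be narrowed down as follows:

* **Scale and distance:** KNN relies entirely on distances between points. If the features are not on comparable scales, those with the largest ranges dominate the Manhattan distance and the selected neighbors become less meaningful. This degrades accuracy, but on its own it does not produce `NaN`.
* **Constant predictions:** If the model predicted nearly the same value for every instance, $SS_{res}$ would approach $SS_{tot}$ and $R^2 = 1 - SS_{res}/SS_{tot}$ would be close to zero, not undefined. The denominator $SS_{tot}$ depends only on the true values, so it is zero only when the target itself is constant in the evaluation set.
* **Computational overhead:** KNN is not designed for datasets of this magnitude (10M+ rows): every prediction requires a neighbor search over the training data, and distances become less informative as the number of features grows (the **"curse of dimensionality"**). If the evaluation was cut short or run on a subset and the $R^2$ value was never written to the results, the missing entry would appear as `NaN`. Re-running `evaluate_regression` on the KNN predictions is the most direct way to check.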
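
The role of $SS_{tot}$ in the $R^2$ formula can be checked with a few lines of arithmetic. The sketch below computes $R^2$ by hand on made-up numbers; `r2_by_hand` is a hypothetical helper and the values are illustrative, not taken from the dataset.

```python
import math


def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, returning NaN when SS_tot is zero."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # depends on y_true only
    if ss_tot == 0:
        return math.nan
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 7.0, 14.0, 30.0]  # mean is 13.5

# Constant prediction at the mean: SS_res equals SS_tot, so R^2 is 0.
r2_constant_pred = r2_by_hand(y_true, [13.5, 13.5, 13.5, 13.5])

# Constant target: SS_tot is 0, so R^2 is undefined.
r2_constant_target = r2_by_hand([5.0, 5.0, 5.0, 5.0], [4.0, 5.0, 6.0, 5.0])

print(r2_constant_pred, r2_constant_target)
```

In this hand-rolled version an undefined $R^2$ can only come from a target with no variance, never from constant predictions.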




#### 2. Why did RBF SVR perform poorly (negative $R^2$)?
An $R^2$ of **-0.0002** means the model is marginally worse than a horizontal line at the mean of the target; in practice it explains none of the variance. Its MAE (7.3980) is nevertheless lower than the MAE reported for the linear models, while its RMSE (9.8179) is the highest of all. That pattern is consistent with SVR's epsilon-insensitive loss, which penalizes absolute rather than squared deviations. Likely contributing factors:

* **Sampling:** Since RBF SVR is computationally expensive, it was trained on a small sample. A small sample may not represent the patterns in the full dataset, so the fitted model can generalize poorly.
* **Hyperparameter sensitivity:** The RBF kernel is very sensitive to the $C$ (regularization) and $\gamma$ (kernel width) parameters. Without a systematic search such as a grid search, the fitted regression function can end up nearly flat or overly wiggly; an $R^2$ near zero is more consistent with the former.
* **Noise sensitivity:** The RBF kernel implicitly maps the data into a higher-dimensional space. Behavioral data such as order frequency is noisy, and a poorly tuned kernel can fit that noise instead of the trend, leaving predictions no better than the simple average.
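
The results reported for RBF SVR combine a comparatively low MAE with the highest RMSE and a negative $R^2$. The sketch below shows, on a tiny made-up and right-skewed target, how a predictor that favors absolute error (here simply the median) can produce exactly that combination. The numbers and the helper name `error_metrics` are illustrative assumptions, not data from the notebook.

```python
import math
import statistics


def error_metrics(y_true, constant_prediction):
    """MAE, RMSE and R^2 for a model that predicts one constant value."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    abs_errors = [abs(t - constant_prediction) for t in y_true]
    sq_errors = [(t - constant_prediction) ** 2 for t in y_true]
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "r2": 1.0 - sum(sq_errors) / ss_tot,
    }


y_true = [1.0, 2.0, 3.0, 4.0, 30.0]  # skewed: one large value

mean_model = error_metrics(y_true, statistics.mean(y_true))      # predicts 8.0
median_model = error_metrics(y_true, statistics.median(y_true))  # predicts 3.0

print("mean  :", mean_model)
print("median:", median_model)
```

The median predictor wins on MAE but loses on RMSE and drops below zero on $R^2$, because $R^2$ is defined relative to the mean predictor.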