In [4]:
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Load the diabetes dataset
X, y = datasets.load_diabetes(as_frame=True, scaled=False, return_X_y=True)

# Perform cross-validation on polynomial models ranging from degree 0 to 8
cv_results = []
for degree in range(9):
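    if degree == 0:
        # degree=0 with include_bias=False would give an empty array, so this branch
        # substitutes degree=1 plus a bias column. Note: this is the same linear model
        # as degree 1 (the bias column duplicates LinearRegression's intercept),
        # not a constant-only model.
        poly_features = PolynomialFeatures(degree=1, include_bias=True)
    else:
        poly_features = PolynomialFeatures(degree=degree, include_bias=False)

    X_poly = poly_features.fit_transform(X)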

    linear_model = LinearRegression()

    scores = cross_val_score(linear_model, X_poly, y, cv=5, scoring='r2')
    mean_r2 = scores.mean()
    # 'neg_mean_absolute_error' returns negated MAE (greater is better), so flip the sign back
    mean_mae = -cross_val_score(linear_model, X_poly, y, cv=5, scoring='neg_mean_absolute_error').mean()
    cv_results.append((degree, mean_r2, mean_mae))

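This code cross-validates polynomial regression models of degree 0 to 8. For each degree it builds a PolynomialFeatures transformer from scikit-learn, expands the features, and evaluates a LinearRegression on the result. Because degree=0 combined with include_bias=False would yield an empty output array, the degree-0 case uses degree=1 with include_bias=True instead. This substitute is not a constant-only model: the extra bias column is redundant with the intercept that LinearRegression already fits, so the "degree 0" model is the same linear model as degree 1. That is why the first two rows of the results table are identical. A true degree-0 model would always predict the mean of the training targets.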

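Each model is scored with 5-fold cross-validation through cross_val_score, which is called twice: once with scoring='r2' (R-Squared, which is also the default score for a regressor) and once with scoring='neg_mean_absolute_error'. Because scikit-learn scorers follow a "greater is better" convention, the MAE scorer returns negated values, and the code flips the sign back. For each degree, the mean R-Squared and mean MAE over the five folds are stored together with the degree as a tuple in cv_results; the per-fold standard deviations are not kept.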
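Q2 below asks for both the mean and the standard deviation of each metric. A minimal sketch of how both could be collected in one pass, assuming scikit-learn's cross_validate with a list of scorers; cv_summary is a hypothetical helper name, and the features are loaded with the default (scaled) loader, which for a degree-1 model gives the same scores as scaled=False.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Default (scaled) features; a degree-1 fit scores the same as with scaled=False
X, y = load_diabetes(return_X_y=True)


def cv_summary(degree, cv=5):
    """Hypothetical helper: mean and std of R-Squared and MAE over `cv` folds."""
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # One cross-validation pass computes both metrics on the same folds
    res = cross_validate(model, X, y, cv=cv,
                         scoring=["r2", "neg_mean_absolute_error"])
    r2 = res["test_r2"]
    mae = -res["test_neg_mean_absolute_error"]  # undo the sign flip
    return {
        "degree": degree,
        "r2_mean": r2.mean(), "r2_std": r2.std(),
        "mae_mean": mae.mean(), "mae_std": mae.std(),
    }


s = cv_summary(1)
print(f"Degree {s['degree']}: R2 {s['r2_mean']:.3f} ± {s['r2_std']:.3f}, "
      f"MAE {s['mae_mean']:.3f} ± {s['mae_std']:.3f}")
```

Calling cv_summary for each degree and printing one row per call would give a table with both values. Note that .std() is NumPy's population standard deviation (ddof=0). The pipeline refits PolynomialFeatures inside each fold, which is harmless here because that transformer only learns the number of input features.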

Q2. Construct a table summarizing the cross-validation results. Each model should have a separate row in the table. Include the R-Squared and Mean Absolute Error (MAE) metrics for each model. Calculate the mean value and standard deviation of these metrics from the cross-validation. Include both values. (2 points)

In [5]:
# Construct a table summarizing the cross-validation results
print("Model\t\tR-Squared\t\tMAE\n")
for result in cv_results:
    print(f"Degree {result[0]}:\t{result[1]:.3f}\t\t{result[2]:.3f}")

Model		R-Squared		MAE

Degree 0:	0.482		44.276
Degree 1:	0.482		44.276
Degree 2:	0.392		46.613
Degree 3:	-159.005		273.561
Degree 4:	-571.083		657.260
Degree 5:	-436.857		562.994
Degree 6:	-1694.192		742.660
Degree 7:	-5530.894		1032.682
Degree 8:	-16076.255		1475.659


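The printed table lists, for each degree, the mean cross-validated R-Squared and MAE. Only the means are shown: the standard deviations that Q2 asks for are not computed by the code above. Degree 1 (and the identical "degree 0" row) gives the best scores, degree 2 is slightly worse, and from degree 3 upward the models overfit badly. An R-Squared far below zero means the predictions on the held-out folds are much worse than simply predicting the mean target of each fold.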
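The large negative R-Squared values for degree 3 and above follow directly from the definition R² = 1 - SS_res / SS_tot: predicting the mean gives 0, and any predictor with a larger squared error than the mean goes below 0 without limit. The sketch below computes this by hand on four toy targets (not the diabetes data); r_squared is a hypothetical helper. scikit-learn's 'r2' scorer applies the same formula per test fold, using that fold's own mean as the baseline.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    return 1.0 - ss_res / ss_tot


y_true = np.array([100.0, 150.0, 200.0, 250.0])      # toy targets

print(r_squared(y_true, y_true))                     # 1.0  (perfect fit)
print(r_squared(y_true, np.full(4, y_true.mean())))  # 0.0  (always the mean)
print(r_squared(y_true, y_true[::-1]))               # -3.0 (worse than the mean)
```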

In [6]:
# Identify the best model: highest mean R-Squared (MAE is reported but not used to choose)
best_model = max(cv_results, key=lambda x: x[1])
best_degree = best_model[0]
best_r2 = best_model[1]
best_mae = best_model[2]

print("\nBest Model:")
print(f"Degree: {best_degree}")
print(f"R-Squared: {best_r2:.3f}")
print(f"MAE: {best_mae:.3f}")


Best Model:
Degree: 0
R-Squared: 0.482
MAE: 44.276


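The best model is the entry in cv_results with the highest mean R-Squared (max with a key on the second tuple element); its degree, R-Squared score and MAE are then unpacked and printed. Only R-Squared drives the selection, but the MAE ranking agrees here, since the same entry also has the lowest MAE. Because the "degree 0" entry is the same linear model as degree 1, selecting degree 0 amounts to selecting plain linear regression on the original features.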
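A slightly more explicit selector could make the tie-breaking rule visible and check whether both metrics point to the same model. This is a hedged sketch: select_best is a hypothetical helper that takes tuples in the same (degree, mean_r2, mean_mae) layout as cv_results, prefers the lower (simpler) degree on an exact tie, and the toy tuples are invented for illustration, not this notebook's results.

```python
def select_best(results):
    """Return (best-by-R-Squared entry, whether the best-by-MAE entry agrees).

    results: iterable of (degree, mean_r2, mean_mae) tuples.
    Exact ties are broken in favour of the lower degree.
    """
    results = list(results)
    # Highest R-Squared; on a tie, -degree makes the lower degree win
    best_r2 = max(results, key=lambda r: (r[1], -r[0]))
    # Lowest MAE; on a tie, the lower degree wins
    best_mae = min(results, key=lambda r: (r[2], r[0]))
    return best_r2, best_r2[0] == best_mae[0]


# Toy values for illustration only
toy = [(1, 0.50, 44.0), (2, 0.40, 46.0), (3, -2.0, 270.0)]
best, agree = select_best(toy)
print(best, agree)  # (1, 0.5, 44.0) True
```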