# Theoretical
# Question 1: What does R-squared represent in a regression model?
# Answer:
# R-squared (R²) measures the proportion of variance in the dependent variable that is explained by the independent variables. For ordinary least squares with an intercept, evaluated on the training data, it ranges from 0 (no variance explained) to 1 (perfect fit); on held-out data it can be negative. Higher values indicate a better fit to the data used for evaluation.
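#
# A minimal sketch of the definition above, assuming NumPy and small made-up
# arrays (y_true and y_pred are illustrative, not taken from any dataset):
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(f"R-squared: {r_squared:.3f}")  # R-squared: 0.975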

# Question 2: What are the assumptions of linear regression?
# Answer:
# - Linearity: Relationship between dependent and independent variables is linear.
# - Independence: Observations are independent of each other.
# - Homoscedasticity: Constant variance of residuals.
# - Normality: Residuals are normally distributed.
# - No multicollinearity: Independent variables are not highly correlated.

# Question 3: What is the difference between R-squared and Adjusted R-squared?
# Answer:
# R-squared never decreases when predictors are added, even irrelevant ones. Adjusted R-squared penalizes the number of predictors and increases only if a new predictor improves the fit more than would be expected by chance.

# Question 4: Why do we use Mean Squared Error (MSE)?
# Answer:
# MSE measures the average squared difference between actual and predicted values. It penalizes larger errors more than smaller ones, helping to identify models that make large prediction errors.

# Question 5: What does an Adjusted R-squared value of 0.85 indicate?
# Answer:
# It means that about 85% of the variance in the dependent variable is explained by the model after penalizing for the number of predictors. This suggests a strong fit, although it says nothing about whether the regression assumptions hold.

# Question 6: How do we check for normality of residuals in linear regression?
# Answer:
# - Use a Q-Q plot
# - Histogram of residuals
# - Shapiro-Wilk or Kolmogorov-Smirnov test
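#
# A hedged sketch of the Shapiro-Wilk check, assuming SciPy is installed and
# using simulated values in place of residuals from a fitted model:
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sim_residuals = rng.normal(loc=0.0, scale=1.0, size=200)  # stand-in residuals

statistic, p_value = stats.shapiro(sim_residuals)
print(f"Shapiro-Wilk statistic: {statistic:.3f}, p-value: {p_value:.3f}")
# A small p-value (commonly below 0.05) is evidence against normality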

# Question 7: What is multicollinearity, and how does it impact regression?
# Answer:
# Multicollinearity occurs when independent variables are highly correlated. It inflates standard errors, making coefficients unreliable and reducing interpretability.

# Question 8: What is Mean Absolute Error (MAE)?
# Answer:
# MAE is the average of absolute differences between predicted and actual values. It gives equal weight to all errors and is less sensitive to outliers than MSE.

# Question 9: What are the benefits of using an ML pipeline?
# Answer:
# - Streamlines preprocessing and modeling
# - Reduces risk of data leakage
# - Makes code cleaner and reproducible
# - Facilitates model deployment and tuning
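#
# A short sketch of the data-leakage point, assuming scikit-learn and synthetic
# data: because the scaler lives inside the pipeline, each cross-validation
# fold fits it on that fold's training portion only.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
leak_free = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
fold_scores = cross_val_score(leak_free, X_cv, y_cv, cv=5)  # R-squared per fold
print(f"Mean R-squared across folds: {fold_scores.mean():.3f}")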

# Question 10: Why is RMSE considered more interpretable than MSE?
# Answer:
# RMSE is the square root of MSE, which brings the error metric back to the same unit as the target variable, making it more interpretable.

# Question 11: What is pickling in Python, and how is it useful in ML?
# Answer:
# Pickling is the process of converting a Python object into a byte stream. It allows saving trained ML models to disk so they can be reloaded later for predictions.
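#
# A minimal round-trip sketch using only the standard library; the dictionary
# stands in for a trained model object and its contents are made up:
import pickle

model_params = {"coef": [3.0, -1.5], "intercept": 0.25}

payload = pickle.dumps(model_params)  # serialize to a byte stream
restored = pickle.loads(payload)      # deserialize (only do this with trusted data)
print(restored == model_params)       # True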

# Question 12: What does a high R-squared value mean?
# Answer:
# It means a large proportion of the variance in the dependent variable is explained by the independent variables. However, high R² doesn't guarantee a good model if assumptions are violated.

# Question 13: What happens if linear regression assumptions are violated?
# Answer:
# - Inaccurate predictions
# - Invalid statistical inference (e.g., p-values)
# - Biased or inefficient estimators

# Question 14: How can we address multicollinearity in regression?
# Answer:
# - Remove or combine correlated variables
# - Use dimensionality reduction (e.g., PCA)
# - Use regularization (e.g., Ridge, Lasso)
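#
# A rough sketch of the regularization option, assuming NumPy and two made-up,
# nearly identical features. Ridge is written in closed form here (without an
# intercept) rather than with a library estimator:
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # almost a copy of x1
X_col = np.column_stack([x1, x2])
y_col = 3 * x1 + rng.normal(scale=0.5, size=200)

alpha = 1.0  # illustrative penalty strength
ols_coef, *_ = np.linalg.lstsq(X_col, y_col, rcond=None)
ridge_coef = np.linalg.solve(X_col.T @ X_col + alpha * np.eye(2), X_col.T @ y_col)

print("OLS coefficients:  ", ols_coef)    # can be unstable when features are collinear
print("Ridge coefficients:", ridge_coef)  # shrunk, so never larger in norm than OLS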

# Question 15: How can feature selection improve model performance in regression analysis?
# Answer:
# - Reduces overfitting
# - Simplifies the model
# - Improves interpretability
# - Reduces training time
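#
# One possible sketch, assuming scikit-learn's univariate SelectKBest with the
# f_regression score; the synthetic data and k=3 are illustrative choices:
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X_fs, y_fs = make_regression(n_samples=200, n_features=10, n_informative=3,
                             noise=5.0, random_state=0)
selector = SelectKBest(score_func=f_regression, k=3)
X_selected = selector.fit_transform(X_fs, y_fs)

print(X_selected.shape)                    # (200, 3)
print(selector.get_support(indices=True))  # indices of the columns that were kept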

# Question 16: How is Adjusted R-squared calculated?
# Answer:
# Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - p - 1)]
# Where n = number of observations, p = number of predictors
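#
# The formula can be sketched as a small helper; the function name adjusted_r2
# and the sample numbers are illustrative assumptions:
def adjusted_r2(r2, n, p):
    """Return adjusted R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared, more predictors: the adjusted value drops
print(f"{adjusted_r2(0.90, n=100, p=5):.4f}")   # 0.8947
print(f"{adjusted_r2(0.90, n=100, p=20):.4f}")  # 0.8747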

# Question 17: Why is MSE sensitive to outliers?
# Answer:
# MSE squares the errors, so large errors (from outliers) contribute disproportionately more to the total, making the metric sensitive to outliers.
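#
# A small worked example with made-up error values shows the effect; a single
# large error moves MSE far more than it moves MAE:
errors_clean = [1, 1, 1, 1]
errors_outlier = [1, 1, 1, 10]  # one error replaced by an outlier

def mse_of(errors):
    return sum(e ** 2 for e in errors) / len(errors)

def mae_of(errors):
    return sum(abs(e) for e in errors) / len(errors)

print(mae_of(errors_clean), mae_of(errors_outlier))  # 1.0 3.25
print(mse_of(errors_clean), mse_of(errors_outlier))  # 1.0 25.75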

# Question 18: What is the role of homoscedasticity in linear regression?
# Answer:
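# Homoscedasticity means the variance of the residuals is constant across all levels of the predictors. When it is violated (heteroscedasticity), OLS coefficient estimates remain unbiased but become inefficient, and the usual standard errors are biased, which makes p-values and confidence intervals unreliable.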

# Question 19: What is Root Mean Squared Error (RMSE)?
# Answer:
# RMSE is the square root of the Mean Squared Error. It provides a measure of the average magnitude of error, in the same units as the dependent variable.

# Question 20: Why is pickling considered risky?
# Answer:
# - Pickled files can execute arbitrary code when unpickled (security risk)
# - Not cross-language compatible
# - Not always backward-compatible across Python versions
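#
# A harmless sketch of the first point: a class can tell pickle which callable
# to run at load time through __reduce__. The class name Demo is hypothetical,
# and print stands in for what could be any function in a malicious file:
import pickle

class Demo:
    def __reduce__(self):
        # Instructs pickle to call print(...) when the bytes are loaded
        return (print, ("this ran during unpickling",))

result = pickle.loads(pickle.dumps(Demo()))  # prints the message above
print(result)  # None: the loaded "object" is whatever the callable returned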

# Question 21: What alternatives exist to pickling for saving ML models?
# Answer:
# - `joblib` (more efficient for large NumPy arrays; it builds on pickle, so only load files you trust)
# - `ONNX` (cross-platform model format)
# - `PMML` (Predictive Model Markup Language)
# - Saving model weights/params manually (e.g., JSON, HDF5)
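#
# A sketch of the last option, assuming a plain linear model whose parameters
# are fitted with NumPy least squares; the data and variable names are made up:
import json
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.random((50, 2))
y_demo = X_demo @ np.array([2.0, -1.0]) + 0.5  # noiseless linear target

# Fit intercept and coefficients by least squares
design = np.column_stack([np.ones(len(X_demo)), X_demo])
params, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

# Save as JSON text (human-readable, nothing is executed on load)
saved = json.dumps({"intercept": float(params[0]), "coef": params[1:].tolist()})

# Reload and predict without needing the original model object
loaded = json.loads(saved)
y_hat = loaded["intercept"] + X_demo @ np.array(loaded["coef"])
print(np.allclose(y_hat, y_demo))  # True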

# Question 22: What is heteroscedasticity, and why is it a problem?
# Answer:
# Heteroscedasticity means non-constant variance of residuals. It violates regression assumptions, leading to inefficient estimates and unreliable confidence intervals.
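#
# A simulated illustration, assuming NumPy; the noise is deliberately made to
# grow with x, so the residual spread differs across the range of x:
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y_het = 2 * x + rng.normal(scale=x)  # noise standard deviation proportional to x

slope, intercept = np.polyfit(x, y_het, deg=1)
resid = y_het - (slope * x + intercept)

low = resid[x < np.median(x)]
high = resid[x >= np.median(x)]
print(f"Residual std (low x):  {low.std():.2f}")
print(f"Residual std (high x): {high.std():.2f}")  # noticeably larger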

# Question 23: How can interaction terms enhance a regression model's predictive power?
# Answer:
# Interaction terms capture combined effects of two or more variables on the target. This adds flexibility and allows the model to detect relationships that depend on variable combinations.
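#
# A minimal sketch with simulated data, assuming NumPy; the target depends on
# the product x1 * x2, so a model without that column cannot capture it. The
# helper name r2_of_fit is hypothetical:
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, size=300)
x2 = rng.uniform(-1, 1, size=300)
y_int = 2 * x1 * x2 + rng.normal(scale=0.1, size=300)

def r2_of_fit(features, target):
    """Fit least squares with an intercept and return R-squared."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)

r2_additive = r2_of_fit(np.column_stack([x1, x2]), y_int)
r2_interaction = r2_of_fit(np.column_stack([x1, x2, x1 * x2]), y_int)
print(f"Without interaction term: {r2_additive:.3f}")
print(f"With interaction term:    {r2_interaction:.3f}")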

# Practical

# Question 1: Write a Python script to visualize the distribution of errors (residuals) for a multiple linear regression model using Seaborn's "diamonds" dataset.
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

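# Load the diamonds dataset (Seaborn downloads it on first use)
diamonds = sns.load_dataset('diamonds')

# Use three numeric features to predict price
X = diamonds[['carat', 'depth', 'table']]
y = diamonds['price']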

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Calculate residuals
residuals = y - y_pred

# Plot residuals
sns.histplot(residuals, kde=True)
plt.title("Distribution of Residuals")
plt.show()

# Question 2: Write a Python script to calculate and print Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) for a linear regression model.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Generate synthetic data
X, y = np.random.rand(100, 2), np.random.rand(100)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Calculate MSE, MAE, RMSE
mse = mean_squared_error(y, y_pred)
mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mse)

# Print the results
print(f"MSE: {mse}")
print(f"MAE: {mae}")
print(f"RMSE: {rmse}")

# Question 3: Write a Python script to check if the assumptions of linear regression are met. Use a scatter plot to check linearity, residuals plot for homoscedasticity, and correlation matrix for multicollinearity.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = np.random.rand(100, 2), np.random.rand(100)

# Check linearity
plt.scatter(X[:, 0], y)
plt.title("Linearity Check (X1 vs y)")
plt.xlabel("X1")
plt.ylabel("y")
plt.show()

# Train model
model = LinearRegression()
model.fit(X, y)

# Predict and calculate residuals
y_pred = model.predict(X)
residuals = y - y_pred

# Homoscedasticity Check (Residuals plot)
plt.scatter(y_pred, residuals)
plt.title("Homoscedasticity Check")
plt.xlabel("Predicted")
plt.ylabel("Residuals")
plt.show()

# Correlation matrix for multicollinearity
corr_matrix = np.corrcoef(X.T)
sns.heatmap(corr_matrix, annot=True)
plt.title("Correlation Matrix")
plt.show()

# Question 4: Write a Python script that creates a machine learning pipeline with feature scaling and evaluates the performance of different regression models
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate synthetic data
X, y = np.random.rand(100, 2), np.random.rand(100)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline with standard scaling and a linear regression model
pipeline_lr = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
])

# Fit the pipeline
pipeline_lr.fit(X_train, y_train)

# Predict
y_pred_lr = pipeline_lr.predict(X_test)

# Evaluate performance
mse_lr = mean_squared_error(y_test, y_pred_lr)
print(f"Linear Regression MSE: {mse_lr}")

# Add Ridge regression to the pipeline and evaluate
pipeline_ridge = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Ridge())
])

# Fit the Ridge pipeline
pipeline_ridge.fit(X_train, y_train)

# Predict
y_pred_ridge = pipeline_ridge.predict(X_test)

# Evaluate Ridge performance
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge}")

# Question 5: Implement a simple linear regression model on a dataset and print the model's coefficients, intercept, and R-squared score.
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Print the results
print(f"Model Coefficient: {model.coef_[0]}")
print(f"Model Intercept: {model.intercept_}")
print(f"R-squared Score: {model.score(X, y)}")

# Question 6: Write a Python script that analyzes the relationship between total bill and tip in the 'tips' dataset using simple linear regression and visualizes the results.
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the 'tips' dataset
tips = sns.load_dataset('tips')

# Prepare the data
X = tips[['total_bill']]  # Independent variable
y = tips['tip']  # Dependent variable

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Plot the results
plt.scatter(X, y, color='blue', label="Actual Data")
plt.plot(X, y_pred, color='red', label="Regression Line")
plt.title("Total Bill vs Tip")
plt.xlabel("Total Bill")
plt.ylabel("Tip")
plt.legend()
plt.show()

# Question 7: Write a Python script that fits a linear regression model to a synthetic dataset with one feature. Use the model to predict new values and plot the data points along with the regression line.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data
X = np.random.rand(100, 1) * 10  # Feature
y = 3 * X + np.random.randn(100, 1) * 5  # Target with some noise

# Train the model
model = LinearRegression()
model.fit(X, y)

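# Predict new values on an evenly spaced grid of inputs not used for training
X_new = np.linspace(0, 10, 50).reshape(-1, 1)
y_new = model.predict(X_new)

# Plot data and regression line
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X_new, y_new, color='red', label="Regression line")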
plt.title("Linear Regression with Synthetic Data")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 8: Write a Python script that pickles a trained linear regression model and saves it to a file.
import pickle
from sklearn.linear_model import LinearRegression
import numpy as np

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = 3 * X + np.random.randn(100, 1) * 5

# Train the model
model = LinearRegression()
model.fit(X, y)

# Pickle the trained model
with open("linear_regression_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model has been pickled and saved.")

# Question 9: Write a Python script that fits a polynomial regression model (degree 2) to a dataset and plots the regression curve.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = X**2 + np.random.randn(100, 1) * 10  # Quadratic relation with noise

# Transform data for polynomial regression
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit the model
model = LinearRegression()
model.fit(X_poly, y)

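# Predict on a sorted grid so the curve is drawn left to right
# (plotting against the unsorted X would produce a zigzag line)
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_pred = model.predict(poly.transform(X_grid))

# Plot the data and regression curve
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X_grid, y_pred, color='red', label="Polynomial regression curve")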
plt.title("Polynomial Regression (Degree 2)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 10: Generate synthetic data for simple linear regression (use random values for X and y) and fit a linear regression model to the data. Print the model's coefficient and intercept.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Print the results
print(f"Model Coefficient: {model.coef_[0]}")
print(f"Model Intercept: {model.intercept_}")
print(f"R-squared Score: {model.score(X, y)}")

# Question 11: Write a Python script that fits polynomial regression models of different degrees to a synthetic dataset and compares their performance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = X**3 + np.random.randn(100, 1) * 50  # Cubic relationship with noise

# List of polynomial degrees to try
degrees = [1, 2, 3, 4]

# Plotting the data and polynomial regression curves
plt.scatter(X, y, color='blue', label="Data points")

for degree in degrees:
    # Transform the data for polynomial regression
    poly = PolynomialFeatures(degree)
    X_poly = poly.fit_transform(X)

    # Fit the model
    model = LinearRegression()
    model.fit(X_poly, y)

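    # Predict on the training data (used for the error metric below)
    y_pred = model.predict(X_poly)

    # Plot the fitted curve on a sorted grid; unsorted X would draw a zigzag
    X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
    plt.plot(X_grid, model.predict(poly.transform(X_grid)), label=f"Degree {degree}")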

    # Print the performance metrics
    mse = mean_squared_error(y, y_pred)
    print(f"Degree {degree} MSE: {mse:.2f}")

# Plot customization
plt.title("Polynomial Regression for Different Degrees")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 12: Write a Python script that fits a simple linear regression model with two features and prints the model's coefficients, intercept, and R-squared score.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate synthetic data with two features
X, y = make_regression(n_samples=100, n_features=2, noise=10.0, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Print the results
print(f"Model Coefficients: {model.coef_}")
print(f"Model Intercept: {model.intercept_}")
print(f"R-squared Score: {model.score(X, y)}")

# Question 13: Write a Python script that generates synthetic data, fits a linear regression model, and visualizes the regression line along with the data points.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = 2 * X + np.random.randn(100, 1) * 2  # Linear relationship with noise

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict the values
y_pred = model.predict(X)

# Plot the data and regression line
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X, y_pred, color='red', label="Regression line")
plt.title("Simple Linear Regression")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 14: Write a Python script that uses the Variance Inflation Factor (VIF) to check for multicollinearity in a dataset with multiple features.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Generate synthetic data
X = np.random.rand(100, 3) * 10
y = 2 * X[:, 0] + 3 * X[:, 1] + 4 * X[:, 2] + np.random.randn(100) * 5

# Convert X to a DataFrame
X_df = pd.DataFrame(X, columns=['Feature1', 'Feature2', 'Feature3'])

# Add constant for VIF calculation
X_with_const = add_constant(X_df)

# Calculate VIF for each feature
vif_data = pd.DataFrame()
vif_data["Feature"] = X_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(X_with_const.values, i) for i in range(X_with_const.shape[1])]

print(vif_data)

# Question 15: Write a Python script that generates synthetic data for a polynomial relationship (degree 4), fits a polynomial regression model, and plots the regression curve.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate synthetic data (degree 4)
X = np.random.rand(100, 1) * 10
y = X**4 + np.random.randn(100, 1) * 100  # Polynomial relationship with noise

# Transform data for polynomial regression
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)

# Fit the model
model = LinearRegression()
model.fit(X_poly, y)

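# Predict on a sorted grid so the curve is drawn left to right
# (plotting against the unsorted X would produce a zigzag line)
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_pred = model.predict(poly.transform(X_grid))

# Plot the data and regression curve
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X_grid, y_pred, color='red', label="Polynomial regression curve")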
plt.title("Polynomial Regression (Degree 4)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 16: Write a Python script that creates a machine learning pipeline with data standardization and a multiple linear regression model, and prints the R-squared score.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic data for multiple regression
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Create a pipeline with standardization and linear regression
pipeline = make_pipeline(StandardScaler(), LinearRegression())

# Train the model
pipeline.fit(X, y)

# Print the R-squared score
print(f"R-squared Score: {pipeline.score(X, y)}")

# Question 17: Write a Python script that performs polynomial regression (degree 3) on a synthetic dataset and plots the regression curve.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate synthetic data (degree 3)
X = np.random.rand(100, 1) * 10
y = X**3 + np.random.randn(100, 1) * 100  # Cubic relationship with noise

# Transform data for polynomial regression
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

# Fit the model
model = LinearRegression()
model.fit(X_poly, y)

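# Predict on a sorted grid so the curve is drawn left to right
# (plotting against the unsorted X would produce a zigzag line)
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_pred = model.predict(poly.transform(X_grid))

# Plot the data and regression curve
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X_grid, y_pred, color='red', label="Polynomial regression curve")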
plt.title("Polynomial Regression (Degree 3)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 18: Write a Python script that performs multiple linear regression on a synthetic dataset with 5 features. Print the R-squared score and model coefficients.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate synthetic data with 5 features
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Print the results
print(f"Model Coefficients: {model.coef_}")
print(f"Model Intercept: {model.intercept_}")
print(f"R-squared Score: {model.score(X, y)}")

# Question 19: Write a Python script that generates synthetic data for linear regression, fits a model, and visualizes the data points along with the regression line.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data
X = np.random.rand(100, 1) * 10
y = 3 * X + np.random.randn(100, 1) * 5  # Linear relationship with noise

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict the values
y_pred = model.predict(X)

# Plot the data and regression line
plt.scatter(X, y, color='blue', label="Data points")
plt.plot(X, y_pred, color='red', label="Regression line")
plt.title("Linear Regression")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Question 20: Create a synthetic dataset with 3 features and perform multiple linear regression. Print the model's R-squared score and coefficients.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate synthetic data with 3 features
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X, y)

# Print the results
print(f"Model Coefficients: {model.coef_}")
print(f"Model Intercept: {model.intercept_}")
print(f"R-squared Score: {model.score(X, y)}")

# Question 21: Write a Python script that demonstrates how to serialize and deserialize machine learning models using joblib instead of pickling.
import joblib
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Train a model
model = LinearRegression()
model.fit(X, y)

# Save the model using joblib
joblib.dump(model, 'linear_regression_model.joblib')

# Load the model from file
loaded_model = joblib.load('linear_regression_model.joblib')

# Print R-squared score of the loaded model
print(f"R-squared score from loaded model: {loaded_model.score(X, y)}")

# Question 22: Write a Python script to perform linear regression with categorical features using one-hot encoding. Use the Seaborn 'tips' dataset.
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load the 'tips' dataset
tips = sns.load_dataset('tips')

# Define features (categorical and continuous) and target
X = tips[['sex', 'day', 'time', 'size']]
y = tips['total_bill']

# Define preprocessing pipeline (one-hot encode categorical features)
preprocessor = ColumnTransformer(
    transformers=[
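        # drop='first' removes one level per feature to avoid perfectly collinear dummy columns
        ('cat', OneHotEncoder(drop='first'), ['sex', 'day', 'time']),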
        ('num', 'passthrough', ['size'])
    ])

# Define full pipeline (preprocessing + linear regression)
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
pipeline.fit(X_train, y_train)

# Print R-squared score on the test set
print(f"R-squared score: {pipeline.score(X_test, y_test)}")

# Question 23: Compare Ridge Regression with Linear Regression on a synthetic dataset and print the coefficients and R-squared score.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Train Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X, y)

# Train Ridge Regression model
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X, y)

# Print results for Linear Regression
print("Linear Regression:")
print(f"Coefficients: {lr_model.coef_}")
print(f"Intercept: {lr_model.intercept_}")
print(f"R-squared Score: {lr_model.score(X, y)}")

# Print results for Ridge Regression
print("\nRidge Regression:")
print(f"Coefficients: {ridge_model.coef_}")
print(f"Intercept: {ridge_model.intercept_}")
print(f"R-squared Score: {ridge_model.score(X, y)}")

# Question 24: Write a Python script that uses cross-validation to evaluate a Linear Regression model on a synthetic dataset.
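from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Perform cross-validation with 5 folds (the default score for regressors is R-squared)
cross_val_scores = cross_val_score(LinearRegression(), X, y, cv=5)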

# Print cross-validation results
print(f"Cross-validation scores: {cross_val_scores}")
print(f"Mean cross-validation score: {cross_val_scores.mean()}")

# Question 25: Write a Python script that compares polynomial regression models of different degrees and prints the R-squared score for each.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Generate synthetic data for polynomial regression
X = np.random.rand(100, 1) * 10
y = X**3 + np.random.randn(100, 1) * 100  # Cubic relationship with noise

# Try different polynomial degrees and compare R-squared scores
for degree in [1, 2, 3, 4]:
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)

    # Fit the model
    poly_model = LinearRegression()
    poly_model.fit(X_poly, y)

    # Print R-squared score for each degree
    print(f"Degree {degree} Polynomial Regression R-squared: {poly_model.score(X_poly, y)}")