<a href="https://colab.research.google.com/github/xesmaze/cpsc541-fall2024/blob/main/lab_3/lab_3_EC_Ans.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Objective

This exercise builds on the photosynthesis lab, where you explored how environmental factors affect photosynthesis rates. Here you will compare two statistical models, a linear regression model and a polynomial regression model, on how well they predict photosynthesis rates.

##Introduction

In the photosynthesis lab, you generated synthetic datasets to model how environmental factors like light intensity, CO₂ concentration, and temperature affect photosynthesis rates using multiple linear regression. Now, we'll extend that work by investigating whether incorporating non-linear relationships improves our model's predictive capabilities.

###Step 1: Generate Synthetic Data
We'll generate a dataset similar to the one you created in the photosynthesis lab. This synthetic dataset simulates the environmental variables and calculates the photosynthesis rate from fixed coefficients plus random noise.

In [None]:
import numpy as np

# Generate synthetic data
np.random.seed(50)
n_samples = 100

# Environmental variables (similar to the lab)
light = np.random.uniform(50, 200, n_samples)       # Light intensity (µmol photons m⁻² s⁻¹)
CO2 = np.random.uniform(300, 800, n_samples)        # CO₂ concentration (ppm)
temperature = np.random.uniform(15, 35, n_samples)  # Temperature (°C)

# Coefficients based on the lab's realistic values
intercept = 1.0
beta_light = 0.01
beta_CO2 = 0.03
beta_temp = 0.04

# Simulate photosynthesis rate with added noise
photosynthesis_rate = (
    intercept
    + beta_light * light
    + beta_CO2 * CO2
    + beta_temp * temperature
    + np.random.normal(0, 0.5, n_samples)  # Adding random noise as in the lab
)
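
Before fitting anything, it can help to estimate how much of the variation in the simulated rate comes from each term. The sketch below is a back-of-the-envelope calculation, not part of the original lab: it uses the variance of a uniform distribution, (b - a)² / 12, together with the coefficients and noise level from the cell above. The helper `uniform_var` and the name `r2_ceiling` are made up for this illustration, and the result is a population-level approximation, so a sample of 100 points will land near it rather than exactly on it.

```python
# Variance of a Uniform(a, b) variable is (b - a)**2 / 12.
def uniform_var(a, b):
    return (b - a) ** 2 / 12

# Variance each term contributes to the noiseless signal: beta**2 * Var(x).
# Coefficients and ranges are copied from the data-generation cell.
var_light = 0.01 ** 2 * uniform_var(50, 200)
var_CO2 = 0.03 ** 2 * uniform_var(300, 800)
var_temp = 0.04 ** 2 * uniform_var(15, 35)

# The predictors are drawn independently, so their variances add.
var_signal = var_light + var_CO2 + var_temp
var_noise = 0.5 ** 2  # noise standard deviation is 0.5

# Share of total variance that even a perfect model could explain
r2_ceiling = var_signal / (var_signal + var_noise)

print(f"Signal variance: light={var_light:.4f}, CO2={var_CO2:.4f}, temp={var_temp:.4f}")
print(f"Approximate R² ceiling: {r2_ceiling:.3f}")
```

The CO₂ term dominates the signal because of its wide range, so any model that captures the CO₂ effect will already reach a very high R².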


###Step 2: Fit Two Different Models
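Just like in the lab, we'll fit a linear regression model on the three environmental variables. We'll also fit a polynomial regression model. With `PolynomialFeatures(degree=2)`, the three inputs are expanded into nine features: the original terms, the square of each variable (including temperature squared), and all pairwise interaction terms. This lets the model capture non-linear effects if they are present.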

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Prepare the data for modeling
X = np.column_stack((light, CO2, temperature))

# Simple Linear Regression Model (as in the lab)
model_linear = LinearRegression()
model_linear.fit(X, photosynthesis_rate)
y_pred_linear = model_linear.predict(X)

# Polynomial Regression Model (all degree-2 terms, including temperature squared)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
# This will add interaction terms and squared terms for each feature

# Fit the polynomial regression model
model_poly = LinearRegression()
model_poly.fit(X_poly, photosynthesis_rate)
y_pred_poly = model_poly.predict(X_poly)


###Step 3: Evaluate the Models
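We'll evaluate both models using Mean Squared Error (MSE) and R², as we did in the lab. Note that both metrics are computed on the same data used to fit the models, so they measure in-sample fit rather than performance on new data.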

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate Simple Linear Regression Model
mse_linear = mean_squared_error(photosynthesis_rate, y_pred_linear)
r2_linear = r2_score(photosynthesis_rate, y_pred_linear)

# Evaluate Polynomial Regression Model
mse_poly = mean_squared_error(photosynthesis_rate, y_pred_poly)
r2_poly = r2_score(photosynthesis_rate, y_pred_poly)

print(f"Linear Model - MSE: {mse_linear:.4f}, R²: {r2_linear:.4f}")
print(f"Polynomial Model - MSE: {mse_poly:.4f}, R²: {r2_poly:.4f}")


Linear Model - MSE: 0.1997, R²: 0.9902
Polynomial Model - MSE: 0.1915, R²: 0.9906
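
The metrics above are in-sample. One way to check predictive performance is to hold out part of the data. The sketch below is an optional extension, not part of the original lab: it regenerates the Step 1 data, then uses scikit-learn's `train_test_split` and `make_pipeline`. The 70/30 split and `random_state=0` are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Regenerate the synthetic data exactly as in Step 1
np.random.seed(50)
n_samples = 100
light = np.random.uniform(50, 200, n_samples)
CO2 = np.random.uniform(300, 800, n_samples)
temperature = np.random.uniform(15, 35, n_samples)
y = (1.0 + 0.01 * light + 0.03 * CO2 + 0.04 * temperature
     + np.random.normal(0, 0.5, n_samples))
X = np.column_stack((light, CO2, temperature))

# Hold out 30% of the samples for testing (split settings are illustrative)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Linear": LinearRegression(),
    "Polynomial": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
}

# Fit on the training split, score on the held-out split
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = (mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred))
    print(f"{name} Model - test MSE: {results[name][0]:.4f}, test R²: {results[name][1]:.4f}")
```

If the polynomial model's advantage shrinks or reverses on the held-out split, that suggests the in-sample gain came from fitting noise.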


###Step 4: Discussion Questions
1. Why did the polynomial model perform slightly better than the linear model?

Answer:

The polynomial model includes additional terms: the square of each variable (such as temperature squared) and interaction terms like light × CO₂. The linear model is a special case of the polynomial model (set the extra coefficients to zero), so least squares can never fit the training data worse when the extra terms are added. Because the data were generated from a purely linear relationship, there is no real curvature or interaction to capture. The small gain comes from the extra terms fitting random noise, which is why the improvement is minimal.
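
The point that extra terms can only raise in-sample R² can be illustrated with features that have nothing to do with the response. The sketch below is a standalone toy example with made-up data and a hypothetical helper, `in_sample_r2`. It fits ordinary least squares with NumPy, once with the true predictor and once with six added columns of pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, n)  # truly linear data plus noise

def in_sample_r2(features, y):
    """Fit ordinary least squares with an intercept and return training R²."""
    A = np.column_stack((np.ones(len(y)), features))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

junk = rng.normal(size=(n, 6))  # six columns unrelated to y

r2_base = in_sample_r2(x, y)
r2_junk = in_sample_r2(np.column_stack((x, junk)), y)

print(f"R² with x only:        {r2_base:.4f}")
print(f"R² with x + 6 junk:    {r2_junk:.4f}")
```

The second R² is at least as large as the first even though the added columns carry no information, which mirrors the small gain seen for the polynomial model.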

2. How does adding non-linear relationships (e.g., temperature squared) improve the model?

Answer:

Adding non-linear terms allows the model to capture more complex patterns in the data. For example, the effect of temperature on photosynthesis might not be strictly linear; photosynthesis rates could increase with temperature up to an optimal point and then decrease. The squared term for temperature helps model this curvature, providing a better fit if such a non-linear relationship exists.
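
For contrast, here is a sketch of a case where the squared term does matter. The response below is hypothetical: it assumes a rate that peaks at 25 °C and falls off on either side, which is not how the lab's data were generated. The coefficients, the seed, and the helper `r2_of_polyfit` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
temperature = rng.uniform(15, 35, 200)

# Hypothetical response with an optimum at 25 °C (assumed for this example)
rate = 10.0 - 0.05 * (temperature - 25.0) ** 2 + rng.normal(0, 0.5, 200)

def r2_of_polyfit(x, y, degree):
    """Fit a polynomial of the given degree and return in-sample R²."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

r2_line = r2_of_polyfit(temperature, rate, 1)
r2_quad = r2_of_polyfit(temperature, rate, 2)

# The vertex of the fitted parabola estimates the optimal temperature
a, b, _ = np.polyfit(temperature, rate, 2)
t_opt = -b / (2 * a)

print(f"Straight line R²: {r2_line:.3f}")
print(f"Quadratic R²:     {r2_quad:.3f}")
print(f"Estimated optimum: {t_opt:.1f} °C")
```

Here the straight line explains almost none of the variation, because the rise and fall cancel out, while the quadratic fit recovers the curve and its optimum.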

3. Based on the results, should we prefer the linear or the polynomial model? Why?

Answer:

In this case, the polynomial model shows only a marginal improvement (MSE 0.1915 vs. 0.1997, R² 0.9906 vs. 0.9902), and both metrics were measured on the same data used for fitting. That gain does not justify six extra terms, so the linear model is the better choice here: it is simpler, easier to interpret, and matches how the data were generated. The polynomial model would be worth choosing only if there were evidence of real non-linear relationships, for example a clear improvement on held-out data.
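
One quick way to account for the extra terms is adjusted R², which penalizes each added feature. The sketch below applies the standard formula to the rounded R² values printed in Step 3, with 100 samples, 3 features for the linear model, and 9 for the polynomial model. The function name `adjusted_r2` is chosen for this illustration, and because the inputs are rounded to four decimals, the results are approximate.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# R² values as printed in Step 3 (rounded to four decimals)
adj_linear = adjusted_r2(0.9902, n_samples=100, n_features=3)
adj_poly = adjusted_r2(0.9906, n_samples=100, n_features=9)

print(f"Adjusted R² (linear):     {adj_linear:.4f}")
print(f"Adjusted R² (polynomial): {adj_poly:.4f}")
```

After the penalty, the linear model comes out slightly ahead, which is consistent with preferring the simpler model for this dataset.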