<a href="https://colab.research.google.com/github/xesmaze/cpsc541-fall2024/blob/main/lab_3/lab_3_EC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Objective

Building on the photosynthesis lab, where you explored how environmental factors affect photosynthesis rates, this exercise asks you to compare two statistical models, a multiple linear regression model and a polynomial regression model, on how well they predict photosynthesis rates.

##Introduction

In the photosynthesis lab, you generated synthetic datasets to model how environmental factors like light intensity, CO₂ concentration, and temperature affect photosynthesis rates using multiple linear regression. Now, we'll extend that work by investigating whether incorporating non-linear relationships improves our model's predictive capabilities.

###Step 1: Generate Synthetic Data
We'll generate a dataset similar to the one you built in the photosynthesis lab. It draws three environmental variables at random and computes the photosynthesis rate as a linear combination of them, using fixed coefficients, plus Gaussian noise.

In [None]:
import numpy as np

# Generate synthetic data
np.random.seed(50)
n_samples = 100

# Environmental variables (similar to the lab)
light = np.random.uniform(50, 200, n_samples)       # Light intensity (µmol photons m⁻² s⁻¹)
CO2 = np.random.uniform(300, 800, n_samples)        # CO₂ concentration (ppm)
temperature = np.random.uniform(15, 35, n_samples)  # Temperature (°C)

# Coefficients based on the lab's realistic values
intercept = 1.0
beta_light = 0.01
beta_CO2 = 0.03
beta_temp = 0.04

# Simulate photosynthesis rate with added noise (as in the lab)
photosynthesis_rate = (
    intercept
    + beta_light * light
    + beta_CO2 * CO2
    + beta_temp * temperature
    + np.random.normal(0, 0.5, n_samples)
)
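
Before fitting anything, it can help to see how much each term contributes to the simulated rate. The sketch below is only an illustration: it copies the ranges and coefficients from the cell above, and the names `ranges`, `betas`, and `spans` are hypothetical helpers that are not part of the lab code.

```python
# Ranges and coefficients copied from the data-generation cell above.
ranges = {"light": (50, 200), "CO2": (300, 800), "temperature": (15, 35)}
betas = {"light": 0.01, "CO2": 0.03, "temperature": 0.04}

# For each predictor, compute how far the term beta * x moves
# as x goes from the low end to the high end of its sampled range.
spans = {}
for name, (low, high) in ranges.items():
    spans[name] = betas[name] * (high - low)
    print(
        f"{name:12s} contributes {betas[name] * low:.2f} to "
        f"{betas[name] * high:.2f} (span {spans[name]:.2f})"
    )
```

The CO₂ term spans about 15 units, while the light and temperature terms span about 1.5 and 0.8 units, against a noise standard deviation of 0.5. Most of the variation in the simulated rate therefore comes from CO₂.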

###Step 2: Fit Two Different Models
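Just as in the lab, we'll fit a linear regression model on the three predictors. We'll also fit a polynomial regression model. With `degree=2`, `PolynomialFeatures` adds the square of every predictor and every pairwise product (not only temperature squared), so this model can capture curvature and interactions between variables.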

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Prepare the data for modeling
X = np.column_stack((light, CO2, temperature))

# Simple Linear Regression Model (as in the lab)
model_linear = LinearRegression()
# Fit the model (fill in the blanks)
# model_linear.____(____, ____)


# Make predictions
# y_pred_linear = model_linear.____(____)


# Polynomial Regression Model (degree 2: squared and interaction terms)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
# Transform the features (fill in the blanks)
# X_poly = poly_features.____(____)


# Fit the polynomial regression model
model_poly = LinearRegression()
# model_poly.____(____, ____)


# Make predictions with the polynomial model
# y_pred_poly = model_poly.____(____)



###Step 3: Evaluate the Models
We'll evaluate both models using Mean Squared Error (MSE) and R², as we did in the lab.
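
As a reminder of what the two metrics measure, here is a small sketch that computes them by hand with NumPy. The arrays `y_true` and `y_pred` hold made-up numbers for illustration only; they are not lab data.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # made-up observations
y_pred = np.array([2.5, 5.5, 7.0, 9.5])  # made-up predictions

residuals = y_true - y_pred

# MSE: the average squared residual.
mse = np.mean(residuals ** 2)

# R²: one minus the ratio of residual variation to total variation
# around the mean of the observations.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.4f}, R²: {r2:.4f}")  # MSE: 0.1875, R²: 0.9625
```

A lower MSE means predictions sit closer to the observations, and an R² closer to 1 means the model explains more of the variation in the observations.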

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate Simple Linear Regression Model
# mse_linear = mean_squared_error(____, ____)

# r2_linear = r2_score(____, ____)

# Evaluate Polynomial Regression Model
# mse_poly = mean_squared_error(____, ____)
# r2_poly = r2_score(____, ____)


print(f"Linear Model - MSE: {mse_linear:.4f}, R²: {r2_linear:.4f}")
print(f"Polynomial Model - MSE: {mse_poly:.4f}, R²: {r2_poly:.4f}")


###Step 4: Discussion Questions
1. Why did the polynomial model perform slightly better than the linear model?

Your Answer:

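Hint: Consider how the extra non-linear terms give the model more flexibility to fit patterns in the data, and recall that the data in Step 1 were generated from a linear relationship plus noise.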

2. How does adding non-linear terms (e.g., temperature squared) improve the model?

Your Answer:

Hint: Think about modeling curvature and interactions between variables.

3. Based on the results, should we prefer the linear or the polynomial model? Why?

Your Answer:

Hint: Weigh the trade-off between model complexity and performance improvement.
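
One common way to weigh complexity against performance is to score each model on data it was not fitted to. The sketch below is an optional extension and not part of the original lab: it regenerates the Step 1 data so that it runs on its own, then compares 5-fold cross-validated MSE for the two models. The names `candidates` and `cv_mse`, and the choice of five folds, are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Regenerate the Step 1 data so this sketch is self-contained.
np.random.seed(50)
n_samples = 100
light = np.random.uniform(50, 200, n_samples)
CO2 = np.random.uniform(300, 800, n_samples)
temperature = np.random.uniform(15, 35, n_samples)
photosynthesis_rate = (
    1.0
    + 0.01 * light
    + 0.03 * CO2
    + 0.04 * temperature
    + np.random.normal(0, 0.5, n_samples)
)
X = np.column_stack((light, CO2, temperature))

# The two candidate models; the pipeline applies the degree-2
# expansion before the linear fit.
candidates = {
    "Linear": LinearRegression(),
    "Polynomial": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LinearRegression(),
    ),
}

# Average held-out MSE over 5 folds (scikit-learn reports it negated).
cv_mse = {}
for label, estimator in candidates.items():
    scores = cross_val_score(
        estimator, X, photosynthesis_rate,
        cv=5, scoring="neg_mean_squared_error",
    )
    cv_mse[label] = -scores.mean()
    print(f"{label} Model - cross-validated MSE: {cv_mse[label]:.4f}")
```

If the polynomial model's held-out error is no lower than the linear model's, its extra terms are probably fitting noise, which is an argument for the simpler model.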