# Unit 2 Polynomial Regression

## Lesson Introduction

Hello! Today, we're diving into Polynomial Regression, an extension of linear regression for modeling more complex relationships between variables. We'll learn how to use Python and Scikit-Learn to perform polynomial regression. By the end, you'll know how to create polynomial features, train a model, and make predictions.

Polynomial regression is useful for capturing non-linear relationships. For instance, predicting exam scores (the target) based on study hours (the feature) might not follow a simple linear pattern. Polynomial regression can help in such cases.

## Understanding Polynomial Features

Why do we need polynomial features? To fit a curve instead of a straight line, we create new features that include polynomial terms (like $x^2$, $x^3$). This helps in modeling more complex relationships.
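Written out for a single feature, a degree-2 model takes the form below. The coefficient symbols $\beta_0$, $\beta_1$, and $\beta_2$ are notation introduced here for illustration; they do not appear elsewhere in the lesson.

$$
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2
$$

The model is still linear in the coefficients, which is why an ordinary linear regression can fit it once $x^2$ is supplied as an extra feature.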

Scikit-Learn offers `PolynomialFeatures` to transform our input data. Here's how it works:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])
print("Original X:\n", X)
# Output:
# Original X:
# [[2]
#  [3]
#  [4]]

# Transforming to include polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print("Transformed X (with polynomial terms):\n", X_poly)
# Output:
# Transformed X (with polynomial terms):
# [[ 1.  2.  4.]
#  [ 1.  3.  9.]
#  [ 1.  4. 16.]]
```

Each row of `X_poly` now holds three columns: a constant 1 (the first column, which plays the role of the intercept term), the original value $x$, and its square $x^2$.

## Loading and Preparing Data

Let's create a synthetic dataset to work with. We'll generate random values between -1 and 1 as the feature, and our target variable will follow the quadratic equation $y = 3x^2 + 2x + \text{noise}$, simulating realistic data with some noise.

```python
import numpy as np

# Generate a synthetic sample dataset
np.random.seed(42)  # For reproducible results
X = np.random.rand(100, 1) * 2 - 1  # Random values between -1 and 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1  # Quadratic function with noise

# Display the first 5 values of X and y (each prints as a 5x1 column)
print("First 5 rows of feature X:\n", X[:5])
# Values, rounded and flattened: -0.250919, 0.90142, 0.463988, 0.1973, -0.688
print("First 5 rows of target y:\n", y[:5])
# Values, rounded and flattened: -0.304, 4.21067, 1.583, 0.31268, 0.02198
```

Now, we have the data where our target variable has a non-linear relationship with the feature.
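As a quick sanity check on the generated data, one option is to fit a quadratic directly with NumPy and confirm that the estimated coefficients land near the 3 and 2 used in the formula. This is only an illustrative check, not part of the lesson's Scikit-Learn workflow; the names `a`, `b`, and `c` are introduced here.

```python
import numpy as np

np.random.seed(42)  # Same seed as the lesson, so the data matches
X = np.random.rand(100, 1) * 2 - 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1

# np.polyfit expects 1-D arrays and returns coefficients
# ordered from the highest degree to the lowest
a, b, c = np.polyfit(X.ravel(), y.ravel(), deg=2)
print(f"Estimated curve: y = {a:.2f}x^2 + {b:.2f}x + {c:.2f}")
```

The estimates should sit close to $a = 3$, $b = 2$, and $c = 0$, with small deviations caused by the noise term.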

## Splitting Data into Training and Test Sets

As always, we'll split our data into training and test sets. We will fit the model on the training set (`X_train`, `y_train`) and evaluate its performance on the held-out test set (`X_test`, `y_test`).

```python
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the train/test sets
print("X_train shape:", X_train.shape, "y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape, "y_test shape:", y_test.shape)
# Output:
# X_train shape: (80, 1) y_train shape: (80, 1)
# X_test shape: (20, 1) y_test shape: (20, 1)
```

## Training a Simple Linear Regression Model

First, we'll train a simple linear regression model without polynomial features, like we did in the first lesson.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train a simple linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Calculate the mean squared error
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f"Linear Regression MSE: {mse_linear}")
# Output
# Linear Regression MSE: 0.7138921735032644
```

This MSE gives us a baseline. On its own the number says little, but it lets us compare this model against others. Let's train a polynomial regression model and check whether it does better.

## Transforming Features and Training a Polynomial Regression Model

Next, we'll transform the input data to include polynomial terms and train a polynomial regression model.

```python
from sklearn.preprocessing import PolynomialFeatures

# Transforming the features into polynomial features
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Training a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate the mean squared error
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")
# Output
# Polynomial Regression MSE: 0.006358406072820809
```

By calling `fit_transform()` on `X_train` and `transform()` on `X_test` with the same `PolynomialFeatures(degree=2)` object, we add an $x^2$ column (and a constant column) to both sets, which lets the linear model fit a quadratic relationship.
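The transform-then-fit steps above can also be chained into a single estimator. The sketch below uses Scikit-Learn's `make_pipeline` as one possible way to do this; it is an alternative arrangement of the same steps, not something the lesson requires, and the names `pipeline`, `y_pred_pipeline`, and `mse_pipeline` are introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Recreate the lesson's dataset and split
np.random.seed(42)
X = np.random.rand(100, 1) * 2 - 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline applies the polynomial transform, then fits the regression
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X_train, y_train)

# predict() transforms X_test with the fitted transformer before predicting
y_pred_pipeline = pipeline.predict(X_test)
mse_pipeline = mean_squared_error(y_test, y_pred_pipeline)
print(f"Pipeline Polynomial Regression MSE: {mse_pipeline}")
```

With a pipeline, the raw `X_train` and `X_test` are passed in directly, so there is no separate `X_test_poly` to keep in sync.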

Having trained both models, we can now compare their performance using the mean squared error (MSE).

```python
# Linear Regression MSE: 0.7138921735032644
# Polynomial Regression MSE: 0.006358406072820809
```

The polynomial regression model has a much lower MSE, indicating it fits the data much better.
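One way to see why the polynomial model fits better is to look at the coefficients it learned and compare them with the formula that generated the data. The sketch below assumes `include_bias=False` so that the two coefficients map directly to $x$ and $x^2$; the names `coef_x`, `coef_x2`, and `intercept` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Recreate the lesson's dataset and split
np.random.seed(42)
X = np.random.rand(100, 1) * 2 - 1
y = 3 * X**2 + 2 * X + np.random.randn(100, 1) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without the bias column, the transformed columns are [x, x^2]
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly_features.fit_transform(X_train)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# y_train is 2-D, so coef_ has shape (1, 2) and intercept_ has shape (1,)
coef_x, coef_x2 = poly_model.coef_.ravel()
intercept = poly_model.intercept_[0]
print(f"Learned: y = {coef_x2:.2f}x^2 + {coef_x:.2f}x + {intercept:.2f}")
```

The learned values should be close to the generating formula $y = 3x^2 + 2x$. A plain linear model has no $x^2$ term at all, so it cannot represent the curvature.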

## Lesson Summary

Great job! We covered polynomial regression, from creating polynomial features to training a model and making predictions. Here’s a quick recap:

  * **Polynomial Features:** We used `PolynomialFeatures` to transform our features.
  * **Sample Data:** We created a sample dataset using a quadratic formula with noise.
  * **Train/Test Split:** We split the data into training and test sets.
  * **Model Training:** We trained both a simple linear regression model and a polynomial regression model.
  * **Evaluation:** We compared their performance using MSE.

Next, you'll move to practice, where you'll apply what you've learned. You'll generate your own polynomial features, train models, and make predictions.

Happy coding!

## Adjust Polynomial Degree to Enhance Regression

Our next mission, Space Voyager!

Change the polynomial degree from 2 to 3 to see how it affects the house price predictions. This change will help capture more complex relationships. Modify the `degree` parameter of `PolynomialFeatures` to accomplish this.

Let's code!

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate sample data for house prices with a cubic relationship
np.random.seed(42)
X = 3 * np.random.rand(100, 1)
y = 2 + 1*X + 0.5*X**2 + 0.2*X**3 + np.random.randn(100, 1)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# TODO: Change degree of the polynomial features to 3
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Train a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions and calculate the mean squared error
y_pred_poly = poly_model.predict(X_test_poly)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")

```

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate sample data for house prices with a cubic relationship
np.random.seed(42)
X = 3 * np.random.rand(100, 1)
y = 2 + 1*X + 0.5*X**2 + 0.2*X**3 + np.random.randn(100, 1)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Change degree of the polynomial features to 3
poly_features = PolynomialFeatures(degree=3) # Changed degree from 2 to 3
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Train a polynomial regression model
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)

# Make predictions and calculate the mean squared error
y_pred_poly = poly_model.predict(X_test_poly)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f"Polynomial Regression MSE: {mse_poly}")
```
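To compare several degrees side by side instead of editing the code by hand, the same steps can be wrapped in a loop. This is a sketch on the exercise's data; the dictionary `mse_by_degree` is a name introduced here. Because the noise in this dataset is fairly large, the gap between degree 2 and degree 3 may turn out to be small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Same cubic sample data and split as in the exercise
np.random.seed(42)
X = 3 * np.random.rand(100, 1)
y = 2 + 1*X + 0.5*X**2 + 0.2*X**3 + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

mse_by_degree = {}
for degree in (1, 2, 3):
    # Build polynomial features for this degree
    poly_features = PolynomialFeatures(degree=degree)
    X_train_poly = poly_features.fit_transform(X_train)
    X_test_poly = poly_features.transform(X_test)

    # Fit and score on the held-out test set
    model = LinearRegression()
    model.fit(X_train_poly, y_train)
    mse_by_degree[degree] = mean_squared_error(y_test, model.predict(X_test_poly))

for degree, mse in mse_by_degree.items():
    print(f"Degree {degree} MSE: {mse}")
```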

## Polynomial Features for House Sizes

Nice work! Now, transform the `house_sizes` data to include polynomial features (degree 2) and print the resulting features. Fill in the missing parts to complete the code.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Example feature: House sizes in 1000 sq ft
house_sizes = np.array([[1.2], [1.5], [2.3]])

# TODO: Transform house_sizes to include polynomial features up to degree 2
# TODO: Print the polynomial features

```

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Example feature: House sizes in 1000 sq ft
house_sizes = np.array([[1.2], [1.5], [2.3]])

# Transform house_sizes to include polynomial features up to degree 2
poly = PolynomialFeatures(degree=2)
house_sizes_poly = poly.fit_transform(house_sizes)

# Print the polynomial features
print("Original House Sizes:\n", house_sizes)
print("Polynomial Features (degree 2) for House Sizes:\n", house_sizes_poly)
```

## Predicting House Prices with Polynomial Regression

Alright, Space Voyager! Let’s fill in the missing pieces to predict house prices using polynomial regression. Complete the TODOs to transform the features and evaluate the performance of the model using Mean Squared Error (MSE).

May you reach for the stars!

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulate house size data (from 500 to 3000 sq ft)
np.random.seed(42)
X = np.random.rand(100, 1) * 2500 + 500

# Simulate house prices (price = 40 * size^2 - size/2 + noise)
y = 40 * X**2 - X / 2 + np.random.randn(100, 1) * 100

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Create a PolynomialFeatures object with degree 2
# TODO: Transform X_train and X_test to include polynomial terms

# Train the linear regression model on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict house prices
y_pred = model.predict(X_test_poly)

# Evaluate the model using Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Polynomial Regression House Prices MSE: {mse}")

```

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulate house size data (from 500 to 3000 sq ft)
np.random.seed(42)
X = np.random.rand(100, 1) * 2500 + 500

# Simulate house prices (price = 40 * size^2 - size/2 + noise)
y = 40 * X**2 - X / 2 + np.random.randn(100, 1) * 100

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a PolynomialFeatures object with degree 2
poly_features = PolynomialFeatures(degree=2)

# Transform X_train and X_test to include polynomial terms
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Train the linear regression model on polynomial features
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict house prices
y_pred = model.predict(X_test_poly)

# Evaluate the model using Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Polynomial Regression House Prices MSE: {mse}")
```

## Predict House Prices Using Polynomial Regression

Hi Space Explorer! Predict house prices using Polynomial Regression. Follow these steps:

  * Transform the features into polynomial terms (degree=2).
  * Train a polynomial regression model.
  * Make predictions with the trained model.
  * Calculate and print the Mean Squared Error (MSE).

Good luck!

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Generate synthetic house data: features (size, number of rooms, age) and price of the house
np.random.seed(0)
X = np.random.randint(500, 2500, size=(100, 3))
y = X[:, 0] * 200 + X[:, 1] * 500 + X[:, 2] * 2 - 1000 + np.random.randint(-100, 100, size=100)

# TODO: Transform the features to include polynomial terms (degree=2)

# TODO: Train a polynomial regression model using the transformed features and target vector

# TODO: Make predictions with the trained model on the transformed features

# TODO: Calculate and print the mean squared error (MSE) of the predictions

```

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Generate synthetic house data: features (size, number of rooms, age) and price of the house
np.random.seed(0)
X = np.random.randint(500, 2500, size=(100, 3))
y = X[:, 0] * 200 + X[:, 1] * 500 + X[:, 2] * 2 - 1000 + np.random.randint(-100, 100, size=100)

# Transform the features to include polynomial terms (degree=2).
# This creates new features that are combinations of the original features up to the second degree.
poly_features = PolynomialFeatures(degree=2)
X_poly = poly_features.fit_transform(X)

# Train a polynomial regression model using the transformed features and target vector.
# LinearRegression is used here because polynomial regression is essentially linear regression
# on the polynomial features.
model = LinearRegression()
model.fit(X_poly, y)

# Make predictions with the trained model on the same transformed features used for training
y_pred = model.predict(X_poly)

# Calculate and print the mean squared error (MSE) between actual prices (y) and predictions (y_pred)
mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error (MSE): {mse}")
```
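The solution above measures MSE on the same rows the model was trained on. A possible variation, sketched below, holds out a test set as in the lesson and reports both errors. The split parameters (`test_size=0.2`, `random_state=0`) and the names `mse_train` and `mse_test` are assumptions made for this example, and the features are cast to float only to make the data type explicit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic house data as in the exercise
np.random.seed(0)
X = np.random.randint(500, 2500, size=(100, 3)).astype(float)
y = X[:, 0] * 200 + X[:, 1] * 500 + X[:, 2] * 2 - 1000 + np.random.randint(-100, 100, size=100)

# Hold out 20% of the rows for evaluation (assumed split settings)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# With 3 input features, degree 2 yields 10 columns:
# 1 constant, 3 linear, 3 squared, and 3 pairwise interaction terms
poly_features = PolynomialFeatures(degree=2)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)
print("Expanded feature count:", X_train_poly.shape[1])

model = LinearRegression()
model.fit(X_train_poly, y_train)

mse_train = mean_squared_error(y_train, model.predict(X_train_poly))
mse_test = mean_squared_error(y_test, model.predict(X_test_poly))
print(f"Train MSE: {mse_train}")
print(f"Test MSE: {mse_test}")
```

Comparing the two numbers gives a more honest picture of how the model handles houses it has not seen.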