# Polynomial Regression
Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. It fits a nonlinear relationship between the value of x and the corresponding conditional mean of y.
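Written out for a single predictor, the nth-degree model described above takes the following form. The coefficient symbols $\beta_i$ and the noise term $\varepsilon$ are notation introduced here for illustration:

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \varepsilon
```

The model is nonlinear in x but linear in the coefficients, which is why it can be fitted with the same solvers as ordinary linear regression once the powers of x are computed as features.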

## Advantages:
- Flexibility: It can fit a wide range of curves and capture more complex relationships between variables compared to linear regression.
- Better Fit: It can provide a better fit for datasets that exhibit a nonlinear trend.
- Customizable: The degree of the polynomial can be adjusted to improve the model's performance.

## Disadvantages:
- Overfitting: Higher-degree polynomials can lead to overfitting, where the model performs well on the training data but poorly on unseen data.
- Computationally Expensive: The number of polynomial terms grows quickly with the degree and with the number of input features, so fitting can become expensive for higher degrees.
- Complexity: Interpretation of the model becomes increasingly complex as the degree of the polynomial increases.
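The overfitting risk listed above can be sketched on synthetic data. Everything in this example is an assumption made for illustration (the quadratic trend, the noise level, the degrees compared); it is not taken from the notebook's dataset. It uses `numpy.polyfit` on a single predictor and compares the error on the fitted points with the error on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): a quadratic trend plus noise.
x_fit = np.linspace(-1, 1, 12)
y_fit = 1.0 + 2.0 * x_fit**2 + rng.normal(scale=0.2, size=x_fit.size)
x_new = np.linspace(-0.95, 0.95, 50)
y_new = 1.0 + 2.0 * x_new**2 + rng.normal(scale=0.2, size=x_new.size)


def mse(y_true, y_hat):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_hat) ** 2))


results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_fit, y_fit, deg=degree)
    results[degree] = (
        mse(y_fit, np.polyval(coeffs, x_fit)),   # error on the fitted points
        mse(y_new, np.polyval(coeffs, x_new)),   # error on unseen points
    )
    print(degree, results[degree])
```

The error on the fitted points can only go down as the degree increases, because each lower-degree model is a special case of the higher-degree one. The error on unseen points carries no such guarantee, which is why a held-out set is needed to choose the degree.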

## Use Cases:
- Economics: Modeling economic indicators and forecasting.
- Biology: Growth curves of organisms.
- Physics: Modeling physical phenomena like projectile motion.

## Scaling (necessary)
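Polynomial regression generally benefits from feature scaling. Raising a feature to a power widens its range (a value near 10^5 becomes roughly 10^10 when squared), so standardizing the polynomial features keeps them on a comparable scale. This helps iterative solvers converge and, for regularized models such as the ElasticNet used below, keeps the penalty from treating coefficients unevenly.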
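A minimal sketch of the scale problem, using three values of the same magnitude as this notebook's `R&D Spend` column. The choice of `include_bias=False` and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# One feature with values around 1e5 (same magnitude as R&D Spend).
x_demo = np.array([[165349.2], [162597.7], [153441.51]])

# Columns: x and x**2 (the constant column is left out here).
X_poly_demo = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_demo)
print(X_poly_demo.max(axis=0))      # x is about 1e5, x**2 is about 1e10

# After standardization every column has mean 0 and unit variance.
X_scaled_demo = StandardScaler().fit_transform(X_poly_demo)
print(X_scaled_demo.mean(axis=0))
print(X_scaled_demo.std(axis=0))
```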

## Encoding (necessary)
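Categorical features must be encoded as numbers before fitting. In the dataset used here, the state appears to be already one-hot encoded as the `Florida` and `New York` columns.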
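For a dataset that still has a raw categorical column, one-hot encoding with `pandas.get_dummies` is one option. The `State` column name and its values below are hypothetical, chosen so that the result resembles the `Florida` and `New York` columns of this notebook's file.

```python
import pandas as pd

# Hypothetical raw data: the "State" column and its values are assumptions.
raw = pd.DataFrame({
    "R&D Spend": [165349.2, 162597.7, 153441.51],
    "State": ["New York", "California", "Florida"],
})

# drop_first=True removes the alphabetically first category ("California"),
# which becomes the baseline and avoids a redundant column.
encoded = pd.get_dummies(
    raw, columns=["State"], prefix="", prefix_sep="", drop_first=True, dtype=float
)
print(encoded.columns.tolist())
print(encoded)
```

A row with zeros in both dummy columns then stands for the dropped baseline category.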

# Import Libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from scipy.stats import uniform, loguniform

# Read Dataset

In [2]:
df = pd.read_csv('50_StartUp_dataset.csv')
df.head()

| Unnamed: 0.1 | Unnamed: 0 | R&D Spend | Administration | Marketing Spend | Profit | Florida | New York |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 165349.2 | 136897.8 | 471784.1 | 192261.83 | 0.0 | 1.0 |
| 1 | 1 | 162597.7 | 151377.59 | 443898.53 | 191792.06 | 0.0 | 0.0 |
| 2 | 2 | 153441.51 | 101145.55 | 407934.54 | 191050.39 | 1.0 | 0.0 |
| 3 | 3 | 144372.41 | 118671.85 | 383199.62 | 182901.99 | 0.0 | 1.0 |
| 4 | 4 | 142107.34 | 91391.77 | 366168.42 | 166187.94 | 1.0 | 0.0 |


# Get X and y

In [3]:
x=df.drop('Profit',axis=1)
y=df['Profit']
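The preview shows columns named `Unnamed: 0.1` and `Unnamed: 0`, which look like row indices written out by earlier `to_csv` calls. If that is what they are, they carry no real signal, and because the preview rows are ordered by decreasing `Profit`, a row counter could leak information about the target. A hedged sketch of excluding such columns, using a small stand-in frame (`df_demo`, `x_demo`, and `y_demo` are hypothetical names):

```python
import pandas as pd

# Small stand-in for the real file; values copied from the preview.
df_demo = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "R&D Spend": [165349.2, 162597.7, 153441.51],
    "Profit": [192261.83, 191792.06, 191050.39],
})

# Collect columns whose name starts with "Unnamed" and drop them with the target.
index_like = [c for c in df_demo.columns if c.startswith("Unnamed")]
x_demo = df_demo.drop(columns=index_like + ["Profit"])
y_demo = df_demo["Profit"]
print(x_demo.columns.tolist())
```

Applying this to the real data would change the feature matrix, so the shapes and scores recorded in this notebook would no longer apply as printed.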

# Polynomial Features

In [4]:
from sklearn.preprocessing import PolynomialFeatures
# degree=2 adds a constant column, every squared feature, and every pairwise product
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(x)
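The number of columns that `PolynomialFeatures` produces (with its default `include_bias=True` and `interaction_only=False`) equals the number of monomials of total degree at most d in n variables, which is the binomial coefficient C(n + d, d). The helper below is a hypothetical name introduced for illustration. The input count of 6 is an inference from the 28 columns in the shapes printed in this notebook, not something stated in it.

```python
from math import comb


def n_poly_features(n_inputs: int, degree: int) -> int:
    """Number of output columns for a full polynomial expansion with a bias column."""
    return comb(n_inputs + degree, degree)


print(n_poly_features(6, 2))  # 28
print(n_poly_features(6, 3))  # 84
print(n_poly_features(6, 5))  # 462
```

With only 50 rows, even degree 3 would give more columns than samples, which is one reason to keep the degree low or rely on regularization.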

## Get train, test and valid data

In [5]:
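from sklearn.model_selection import train_test_split
# First split: hold out 10% of all rows as the test set
x_train, x_test, y_train, y_test=train_test_split(X_poly,y,test_size=.1, random_state=42)
# Second split: hold out 10% of the remaining rows as the validation set
x_train, x_valid, y_train, y_valid=train_test_split(x_train,y_train,test_size=.1, random_state=42)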

In [6]:
print('x_train shape =',x_train.shape)
print('x_test shape =',x_test.shape)
print('x_valid shape =',x_valid.shape)
print('y_train shape =',y_train.shape)
print('y_test shape =',y_test.shape)
print('y_valid shape =',y_valid.shape)

x_train shape = (40, 28)
x_test shape = (5, 28)
x_valid shape = (5, 28)
y_train shape = (40,)
y_test shape = (5,)
y_valid shape = (5,)
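The 40/5/5 sizes follow from applying `test_size=0.1` twice: 10% of 50 rows is 5, and 10% of the remaining 45 rows is 4.5, which `train_test_split` rounds up to 5. A small sketch on a stand-in array of 50 row indices (the names `rows`, `rest`, `train`, `valid`, and `test` are assumptions for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(50)  # stand-in for the 50 rows of the dataset

# Same two-step split as above, applied to row indices only.
rest, test = train_test_split(rows, test_size=0.1, random_state=42)
train, valid = train_test_split(rest, test_size=0.1, random_state=42)

print(len(train), len(valid), len(test))
```

With only 5 rows each in the validation and test sets, any score computed on them will vary a lot from one random split to another.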


# Scaling

In [7]:
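from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
# Fit the scaler on the training data only, then reuse its statistics
# for the validation and test sets to avoid leaking information
x_train=scaler.fit_transform(x_train)
x_valid=scaler.transform(x_valid)
x_test=scaler.transform(x_test)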
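The expand, scale, and fit steps can also be chained in a scikit-learn pipeline, so that the scaler is always fitted on whatever data is passed to `fit`. This is an alternative sketch, not what the notebook does. The synthetic data, the `alpha=0.1` setting, and `include_bias=False` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumed): two features with a partly quadratic relationship.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(60, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] ** 2 + rng.normal(scale=0.5, size=60)

pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # the model fits its own intercept
    StandardScaler(),
    ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000),
)

# Fit on the first 50 rows, score (R^2) on the last 10.
pipe.fit(X_demo[:50], y_demo[:50])
holdout_r2 = pipe.score(X_demo[50:], y_demo[50:])
print(holdout_r2)
```

A pipeline also works directly with cross-validation utilities, since each fold refits the scaler on its own training part.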

## Train ElasticNet without search

In [8]:
from sklearn.linear_model import ElasticNet
model=ElasticNet(alpha=1, l1_ratio=0.5, fit_intercept=True, max_iter=1000)
model.fit(x_train, y_train)
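For reference, the objective that scikit-learn's `ElasticNet` minimizes is shown below, where $n$ is the number of training samples, $w$ the coefficient vector, $\alpha$ the `alpha` parameter, and $\rho$ the `l1_ratio` parameter (the symbols $n$, $w$, and $\rho$ are notation introduced here):

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
```

With `alpha=1` and `l1_ratio=0.5` as above, the penalty is $0.5\,\lVert w \rVert_1 + 0.25\,\lVert w \rVert_2^2$. Because the penalty depends on the size of each coefficient, the features need to be on a comparable scale.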

# Check overfitting

In [9]:
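y_train_pred=model.predict(x_train)
# r2_score takes (y_true, y_pred) and is not symmetric. The value shown below
# was recorded with the two arguments swapped, so a rerun may differ slightly.
r2_score(y_train, y_train_pred)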

0.9415132847449187

In [10]:
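y_valid_pred=model.predict(x_valid)
# r2_score takes (y_true, y_pred) and is not symmetric. The value shown below
# was recorded with the two arguments swapped, so a rerun may differ slightly.
r2_score(y_valid, y_valid_pred)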

0.7729601852276689
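The signature of `r2_score` is `r2_score(y_true, y_pred)`, and the metric is not symmetric: R² divides the residual sum of squares by the variance of whichever array is passed first. A tiny example with made-up numbers (the values of `y_true_demo` and `y_hat_demo` are assumptions chosen to keep the arithmetic simple):

```python
from sklearn.metrics import r2_score

y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_hat_demo = [1.5, 2.0, 3.0, 3.5]

# Residual sum of squares is 0.5 in both cases; only the denominator changes.
r2_correct = r2_score(y_true_demo, y_hat_demo)   # 1 - 0.5 / 5.0, about 0.9
r2_swapped = r2_score(y_hat_demo, y_true_demo)   # 1 - 0.5 / 2.5, about 0.8
print(r2_correct, r2_swapped)
```

The gap between training and validation scores is the signal to look at here, but with 5 validation rows that gap is itself a noisy estimate.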

# Evaluate model

In [11]:
y_pred = model.predict(x_test)

## r2_score

In [12]:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
r2

0.9348445121188516

## mean_squared_error

In [13]:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
mse

44376890.35689034

## mean_absolute_error

In [14]:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
mae

5262.79210905229
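The MSE is expressed in squared units of `Profit`, which makes it hard to compare with the MAE. Taking the square root gives the RMSE in the same units as the target. A short sketch using the two values reported above (the variable names are assumptions for the example):

```python
import math

mse_reported = 44376890.35689034   # MSE reported above
mae_reported = 5262.79210905229    # MAE reported above

# Root mean squared error: same units as Profit.
rmse = math.sqrt(mse_reported)
print(round(rmse, 1))
print(rmse >= mae_reported)  # RMSE is never smaller than MAE on the same errors
```

An RMSE noticeably larger than the MAE suggests that a few test rows have errors well above the typical one, since squaring weights large errors more heavily.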