# Example for Generating Different Types of Polynomial Features
There are several ways to add polynomial features to your original data. First, we load the libraries and the Boston dataset.

In [1]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, ElasticNet, Lasso, LassoCV

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

from sklearn.preprocessing import PolynomialFeatures
%matplotlib inline

In [2]:
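# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2
from sklearn.datasets import load_boston
boston = load_boston()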
print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

In [3]:
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

## Select two variables from the dataset
For the sake of simplicity, we work with a smaller dataset that contains only two of the original variables, LSTAT and CRIM.

In [4]:
df_small = df[['LSTAT','CRIM']]

## Generate the Full Polynomial of Degree 2
Next, we generate the full set of degree-2 polynomial features, which for our two variables are LSTAT, CRIM, LSTAT^2, LSTAT×CRIM, and CRIM^2.

In [5]:
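# degree=2 generates all products of total degree 1 and 2;
# include_bias=False drops the constant column of ones
polynomial = PolynomialFeatures(degree=2, include_bias=False)
X_small_poly = polynomial.fit_transform(df_small)
# get_feature_names() is deprecated since scikit-learn 1.0 and removed in 1.2 (use get_feature_names_out())
polynomial.get_feature_names()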

['x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
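The order of these names follows from enumerating every multiset of feature indices, first of size 1 and then of size 2. The following is a minimal pure-Python sketch of that enumeration for a single row, assuming the ordering scikit-learn reports above; poly_row is a hypothetical helper for illustration, not part of scikit-learn.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_row(row, degree=2):
    """Return the degree-1..degree products of one row, without a bias term.

    Hypothetical helper: for two features and degree 2 the index tuples are
    (0,), (1,), (0, 0), (0, 1), (1, 1), i.e. x0, x1, x0^2, x0 x1, x1^2.
    """
    values = []
    for d in range(1, degree + 1):
        # each feature may repeat inside a product, hence "with replacement"
        for idx in combinations_with_replacement(range(len(row)), d):
            values.append(prod(row[i] for i in idx))
    return values


# toy row standing in for one (LSTAT, CRIM) observation
print(poly_row([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

For two features and degree 2 this yields five values, matching the five names above; in general, n features expanded up to degree d give C(n + d, d) - 1 columns when the bias is excluded.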

We now rename these variables so that they refer to the original features in the dataset.

In [6]:
X_small_names = [s.replace("x0","LSTAT").replace("x1","CRIM") for s in polynomial.get_feature_names()]
X_small_names

['LSTAT', 'CRIM', 'LSTAT^2', 'LSTAT CRIM', 'CRIM^2']
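Chained str.replace calls work here because there are only two inputs, but with eleven or more features, replacing "x1" would also rewrite the beginning of "x10". A slightly more robust sketch, assuming the names follow the x<index> pattern shown above, substitutes whole tokens with a regular expression; rename_features is a hypothetical helper. In scikit-learn 1.0 and later, get_feature_names_out(input_features) can return the original names directly, so this step mainly matters for older versions.

```python
import re


def rename_features(names, originals):
    """Replace each whole token x<i> with originals[i] (hypothetical helper)."""
    # \b stops "x1" from matching inside "x10"; \d+ takes the full index
    pattern = re.compile(r"\bx(\d+)\b")
    return [pattern.sub(lambda m: originals[int(m.group(1))], s) for s in names]


print(rename_features(['x0', 'x1', 'x0^2', 'x0 x1', 'x1^2'], ['LSTAT', 'CRIM']))
# ['LSTAT', 'CRIM', 'LSTAT^2', 'LSTAT CRIM', 'CRIM^2']
```

Because re.sub scans the original string only once, the inserted names are never re-matched, even if they happen to look like x<index>.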

Next, we store the expanded features in a pandas DataFrame.

In [7]:
df_polynomial = pd.DataFrame(data=X_small_poly, columns=X_small_names)
df_polynomial.describe()

| | LSTAT | CRIM | LSTAT^2 | LSTAT CRIM | CRIM^2 |
|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 12.653063 | 3.613524 | 210.993989 | 73.653001 | 86.897912 |
| std | 7.141062 | 8.601545 | 236.06192 | 196.454499 | 513.258198 |
| min | 1.73 | 0.00632 | 2.9929 | 0.031474 | 4e-05 |
| 25% | 6.95 | 0.082045 | 48.3037 | 0.616244 | 0.006731 |
| 50% | 11.36 | 0.25651 | 129.05 | 3.070517 | 0.065804 |
| 75% | 16.955 | 3.677083 | 287.4721 | 51.206063 | 13.52094 |
| max | 37.97 | 88.9762 | 1441.7209 | 1691.690778 | 7916.764166 |
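As a quick sanity check on this table: LSTAT is positive, so squaring preserves its order and the min and max of LSTAT^2 are the squares of the min and max of LSTAT. The mean of LSTAT^2, however, is not the square of the mean; the gap is the population variance (pandas reports the sample standard deviation, with ddof=1). The interaction column also does not follow the squared pattern: its maximum is well below the product of the two maxima, which means the largest LSTAT and the largest CRIM do not occur in the same row. The sketch below only redoes this arithmetic on the values printed above.

```python
import math

# values copied from the describe() table above; pandas std uses ddof=1
n = 506
lstat = {"mean": 12.653063, "std": 7.141062, "min": 1.73, "max": 37.97}
lstat_sq = {"mean": 210.993989, "min": 2.9929, "max": 1441.7209}
crim_max = 88.9762
interaction_max = 1691.690778


def mean_of_square(stats, n):
    """E[x^2] = mean^2 + population variance (sample std converted via (n-1)/n)."""
    pop_var = stats["std"] ** 2 * (n - 1) / n
    return stats["mean"] ** 2 + pop_var


# squaring a positive column preserves order, so min/max map to squared min/max
print(math.isclose(lstat["min"] ** 2, lstat_sq["min"]))  # True
print(math.isclose(lstat["max"] ** 2, lstat_sq["max"]))  # True

# mean of LSTAT^2 reconstructed from the mean and std of LSTAT
print(round(mean_of_square(lstat, n), 4))  # 210.994

# if the two maxima shared a row, the interaction max would be at least their product
print(interaction_max < lstat["max"] * crim_max)  # True
```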


## Generate the Polynomial without Interactions
Next, we generate a polynomial of degree 2 without interactions, which contains LSTAT, CRIM, LSTAT^2, and CRIM^2. One way to achieve this is to start from the full polynomial, which includes the interaction, and remove the interaction column:

In [14]:
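# drop the interaction column; drop() returns a new DataFrame,
# so df_polynomial itself is left unchanged
# (assigning None to the column would keep it, just filled with missing values)
df_no_interactions = df_polynomial.drop(columns=['LSTAT CRIM'])

# check the result
df_no_interactions.describe()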

| | LSTAT | CRIM | LSTAT^2 | CRIM^2 |
|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 12.653063 | 3.613524 | 210.993989 | 86.897912 |
| std | 7.141062 | 8.601545 | 236.06192 | 513.258198 |
| min | 1.73 | 0.00632 | 2.9929 | 4e-05 |
| 25% | 6.95 | 0.082045 | 48.3037 | 0.006731 |
| 50% | 11.36 | 0.25651 | 129.05 | 0.065804 |
| 75% | 16.955 | 3.677083 | 287.4721 | 13.52094 |
| max | 37.97 | 88.9762 | 1441.7209 | 7916.764166 |
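Dropping the interaction column by name is fine for two features, but the number of interaction columns grows quickly with more features or a higher degree. A more general sketch, assuming scikit-learn's naming convention seen above (factors of an interaction joined by a space, powers written with ^) and original feature names that contain no spaces, keeps only the single-feature terms; powers_only_columns is a hypothetical helper.

```python
def powers_only_columns(names):
    """Keep the terms that involve a single feature (hypothetical helper).

    Relies on interaction terms such as 'LSTAT CRIM' or 'LSTAT^2 CRIM'
    containing a space, while powers such as 'LSTAT^2' do not.
    """
    return [name for name in names if " " not in name]


full = ['LSTAT', 'CRIM', 'LSTAT^2', 'LSTAT CRIM', 'CRIM^2']
print(powers_only_columns(full))  # ['LSTAT', 'CRIM', 'LSTAT^2', 'CRIM^2']
```

With the DataFrame built earlier, df_polynomial[powers_only_columns(df_polynomial.columns)] would select the same four columns as the drop above.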


Alternatively, you can create a separate table with the powers of LSTAT using the same code we used in the other notebooks, do the same for each of the other variables, and then join all the tables column-wise with the pandas function pd.concat(..., axis=1).

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
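A minimal sketch of this alternative, assuming a small toy DataFrame in place of df_small and a hypothetical helper powers_of that builds the table of powers for one column:

```python
import pandas as pd


def powers_of(df, column, degree):
    """Return a table with column^1 .. column^degree (hypothetical helper)."""
    return pd.DataFrame(
        {column if d == 1 else f"{column}^{d}": df[column] ** d
         for d in range(1, degree + 1)}
    )


# toy values standing in for df_small
toy = pd.DataFrame({"LSTAT": [2.0, 4.0], "CRIM": [0.5, 3.0]})

# one table per variable, joined column-wise on the shared index
df_powers = pd.concat([powers_of(toy, "LSTAT", 2), powers_of(toy, "CRIM", 2)], axis=1)

# reorder to match the column order produced by PolynomialFeatures
df_powers = df_powers[["LSTAT", "CRIM", "LSTAT^2", "CRIM^2"]]
print(df_powers)
```

Since pd.concat with axis=1 aligns rows on the index, all the tables must share the same index, which holds here because each one is derived from the same DataFrame.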

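## Generate the Polynomial with Interactions Only
Setting interaction_only=True keeps the original features and their products but drops the squares, so for LSTAT and CRIM it generates LSTAT, CRIM, and LSTAT×CRIM.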

In [15]:
interactions = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)
X_interactions = interactions.fit_transform(df_small)
interactions.get_feature_names()

['x0', 'x1', 'x0 x1']

Again, we rename these variables so that they refer to the original features in the dataset.

In [16]:
X_interactions_names = [s.replace("x0","LSTAT").replace("x1","CRIM") for s in interactions.get_feature_names()]
X_interactions_names

['LSTAT', 'CRIM', 'LSTAT CRIM']

In [17]:
df_interactions = pd.DataFrame(data=X_interactions, columns=X_interactions_names)
df_interactions.describe()

| | LSTAT | CRIM | LSTAT CRIM |
|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 |
| mean | 12.653063 | 3.613524 | 73.653001 |
| std | 7.141062 | 8.601545 | 196.454499 |
| min | 1.73 | 0.00632 | 0.031474 |
| 25% | 6.95 | 0.082045 | 0.616244 |
| 50% | 11.36 | 0.25651 | 3.070517 |
| 75% | 16.955 | 3.677083 | 51.206063 |
| max | 37.97 | 88.9762 | 1691.690778 |
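Conceptually, restricting to interactions means each feature appears at most once in a product, so the index tuples are plain combinations instead of combinations with replacement. A pure-Python sketch for one row, assuming the same ordering scikit-learn reports above; interaction_row is a hypothetical helper:

```python
from itertools import combinations
from math import prod


def interaction_row(row, degree=2):
    """Products of distinct features, of size 1..degree (hypothetical helper)."""
    values = []
    for d in range(1, degree + 1):
        # no index repeats inside a tuple, so squares such as x0^2 never appear
        for idx in combinations(range(len(row)), d):
            values.append(prod(row[i] for i in idx))
    return values


print(interaction_row([2.0, 3.0]))       # [2.0, 3.0, 6.0]
print(interaction_row([2.0, 3.0, 5.0]))  # [2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
```

With two features this gives three values, matching LSTAT, CRIM, and LSTAT CRIM in the table above.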
