<h2 style="color:green" align="center"> Machine Learning With Python: Linear Regression with Multiple Variables</h2>

<h3 style="color:purple">Sample problem: predicting home prices in Monroe, New Jersey (USA)</h3>

Below is a table of home prices in Monroe Township, NJ. Here the price depends on **area (square feet), number of bedrooms, and age of the home (in years)**. Given these prices, we have to predict the prices of new homes based on area, bedrooms, and age.

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

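We will use regression with multiple variables here. Price can be calculated using the following equation:

$$\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b$$

Here area, bedrooms, and age are called independent variables or **features**, whereas price is the dependent variable. The coefficients $m_1$, $m_2$, $m_3$ and the intercept $b$ are learned from the data.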
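
To make the roles of the coefficients and the intercept concrete, here is a minimal sketch of the price formula (price = m1 * area + m2 * bedrooms + m3 * age + b) as a plain Python function. The function name `linear_price` and the coefficient values are made up for illustration only; they are not the values the model learns later.

```python
def linear_price(area, bedrooms, age, m1, m2, m3, b):
    """Return m1*area + m2*bedrooms + m3*age + b."""
    return m1 * area + m2 * bedrooms + m3 * age + b

# Hypothetical coefficients, chosen only to show the arithmetic
example_price = linear_price(
    area=1000, bedrooms=2, age=10,
    m1=100.0, m2=5000.0, m3=-1000.0, b=50000.0,
)
print(example_price)  # 150000.0
```

Training a linear regression model amounts to finding the values of m1, m2, m3 and b that best fit the known prices.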

In [4]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [5]:
df = pd.read_csv('homeprices.csv')
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,NaN,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000


**Data preprocessing: fill missing (NA) values with the median value of the column**

In [6]:
df.bedrooms.median()

3.5

In [7]:
# Bedroom counts are whole numbers, so round the median down to an integer
import math
median_bedrooms = math.floor(df.bedrooms.median())
median_bedrooms

3
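
The median of 3.5 comes from the four known bedroom counts (3, 4, 3, 5), because pandas skips missing values by default when computing a median. A small sketch, with the bedroom column typed inline instead of read from `homeprices.csv`:

```python
import math

import numpy as np
import pandas as pd

# Bedroom column from the table above, with the missing entry as NaN
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0])

median_value = bedrooms.median()          # NaN is skipped (skipna=True by default)
floored_median = math.floor(median_value)

print(median_value)    # 3.5
print(floored_median)  # 3
```

Rounding down is a choice made in this notebook; rounding up to 4 would be equally defensible and would give a slightly different model.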

In [8]:
df.bedrooms = df.bedrooms.fillna(median_bedrooms)
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,3.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
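
Before training, it can be worth confirming that no missing values remain. The sketch below rebuilds the table inline (the name `df_check` is used only for this illustration) and counts the missing entries per column after the fill.

```python
import numpy as np
import pandas as pd

# Same data as the table above, typed inline so the example is self-contained
df_check = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})

missing_before = df_check.isna().sum()
df_check["bedrooms"] = df_check["bedrooms"].fillna(3)  # 3 is the floored median
missing_after = df_check.isna().sum()

print(missing_before)
print(missing_after)
```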


# Build the linear regression model with multiple variables

In [9]:
# Create a linear regression model
reg = linear_model.LinearRegression()
# Train the model on the training data set with the fit method
# input variables (features) --> area, bedrooms, age
# output variable (target)   --> price
reg.fit(df[['area', 'bedrooms', 'age']], df.price)
# Equivalent alternative: drop the target column and pass the remaining columns as features.
# reg.fit(df.drop('price', axis='columns'), df.price)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [10]:
# Check the coefficients m1, m2, m3 (for area, bedrooms, age, in that order)
reg.coef_

array([   137.25, -26025.  ,  -6825.  ])

In [11]:
# Check the intercept b
reg.intercept_

383724.9999999998
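
The coefficients come back in the same order as the feature columns passed to `fit`. A minimal sketch that pairs each coefficient with its feature name, using the filled-in table typed inline (the names `homes`, `model` and `coef_by_feature` are illustrative):

```python
import pandas as pd
from sklearn import linear_model

# Table from above, after filling the missing bedroom count with 3
homes = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})
features = ["area", "bedrooms", "age"]

model = linear_model.LinearRegression()
model.fit(homes[features], homes["price"])

# coef_ follows the column order of the training features
coef_by_feature = dict(zip(features, model.coef_))
for name, value in coef_by_feature.items():
    print(f"{name}: {value:.2f}")
print(f"intercept: {model.intercept_:.2f}")
```

With only five rows, the signs and sizes of these coefficients should be read with caution; they describe this small sample and may not generalize.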

**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

In [12]:
# The predicted price is low because the home is 40 years old.
# Compare it with the 3000 sq ft and 3200 sq ft homes in the table (565000 and 610000).
reg.predict([[3000, 3, 40]])

array([444400.])

In [13]:
# house price = m1 * area + m2 * number of bedrooms + m3 * age of the house + intercept
137.25 * 3000 + -26025. * 3 + -6825. * 40 + 383724.9999999998
# The result differs from reg.predict only by floating-point rounding; the two values are effectively the same

444399.9999999998
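
The same manual check can be written as a dot product and compared with a tolerance rather than by eye. This sketch hard-codes the coefficient and intercept values printed above:

```python
import numpy as np

coef = np.array([137.25, -26025.0, -6825.0])  # values printed by reg.coef_
intercept = 383724.9999999998                 # value printed by reg.intercept_
home = np.array([3000, 3, 40])                # area, bedrooms, age

manual_price = np.dot(coef, home) + intercept
matches_prediction = np.isclose(manual_price, 444400.0)

print(manual_price)
print(matches_prediction)
```

`np.isclose` compares floats within a small tolerance, which is usually safer than testing for exact equality.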

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old**

In [14]:
# The predicted price is higher because the home is only 5 years old.
# Compare it with the 2600 sq ft, 20-year-old home in the table (550000).
reg.predict([[2500, 4, 5]])

array([588625.])
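
Both homes can also be scored in a single call by passing a DataFrame whose columns match the training features. Recent scikit-learn versions warn when a model fitted on named columns is given a plain list, so this form also avoids that warning. The sketch refits the model on the inline table so that it runs on its own; the names `homes`, `model` and `new_homes` are illustrative.

```python
import pandas as pd
from sklearn import linear_model

# Training table from above, after filling the missing bedroom count with 3
homes = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})
features = ["area", "bedrooms", "age"]
model = linear_model.LinearRegression().fit(homes[features], homes["price"])

# The two homes from the problem statement, one row each
new_homes = pd.DataFrame({
    "area": [3000, 2500],
    "bedrooms": [3, 4],
    "age": [40, 5],
})
predicted = model.predict(new_homes[features])
print(predicted.round(2))
```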