# Linear Regression with Python

## This is the solution to exercise 1 from Andrew Ng's Machine Learning Coursera course.
### By: Tu Pham

In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

> The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the 
> house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house. Run 
> this section now to preview the data.

We will use a data set $(x,y)$ and apply gradient descent to fit a model $L_\theta(x) = \theta_0 + \theta_1 x_1+\theta_2 x_2$ that minimizes the cost function $J(\theta)$. With two features, the fitted model is a plane rather than a line.

## Libraries
- numpy
- pandas

First, we import the necessary libraries.

In [335]:
import numpy as np
np.set_printoptions(suppress=True)
import pandas
import math
from matplotlib import pyplot as plt

Import the data from ex1data2.csv into Python as `data` and set `m` to the training set size.

In [336]:
data=pandas.read_csv('ex1data2.csv',sep=',',na_values='.')
m=len(data)

In [337]:
print(data)

    size of house  bedrooms   price
0            2104         3  399900
1            1600         3  329900
2            2400         3  369000
3            1416         2  232000
4            3000         4  539900
5            1985         4  299900
6            1534         3  314900
7            1427         3  198999
8            1380         3  212000
9            1494         3  242500
10           1940         4  239999
11           2000         3  347000
12           1890         3  329999
13           4478         5  699900
14           1268         3  259900
15           2300         4  449900
16           1320         2  299900
17           1236         3  199900
18           2609         4  499998
19           3031         4  599000
20           1767         3  252900
21           1888         2  255000
22           1604         3  242900
23           1962         4  259900
24           3890         3  573900
25           1100         3  249900
26           1458         3 

Define the training set with $x$ as the input and $y$ as the output.

Let $X = [x_0;x_1;x_2]$, where the first column is all ones, i.e. $x_0=1$.

Note: Since the model is $X\theta \approx y$, the first column of $X$ is always a column of ones, corresponding to the first parameter $\theta_0$ of $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$

We extract the house size as col1, the number of bedrooms as col2, and the price as col3. Each column is turned into a list of single-element rows, which is then converted into a 2-D array using the np.array() method.

In [338]:
col1=data.iloc[:,0]
col2=data.iloc[:,1]
col3=data.iloc[:,2]
x_1=[[col1[i]] for i in range(m)]
x_2=[[col2[i]] for i in range(m)]
y=[[col3[i]] for i in range(m)]
X=[[1,x_1[i][0],x_2[i][0]] for i in range(m)]
x_1=np.array(x_1)
x_2=np.array(x_2)
y=np.array(y)
X=np.array(X)
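
The same arrays can be built without list comprehensions. The following is a minimal sketch, assuming a small hand-typed frame (`sample`, holding the first three rows of the preview above) in place of the real file; the names `sample`, `features`, `X_vec` and `y_vec` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas

# Hypothetical stand-in for ex1data2.csv: the first three rows of the preview.
sample = pandas.DataFrame({
    "size of house": [2104, 1600, 2400],
    "bedrooms": [3, 3, 3],
    "price": [399900, 329900, 369000],
})
m_sample = len(sample)

# The first two columns are the features, the third is the target.
features = sample.iloc[:, :2].to_numpy(dtype=float)                   # shape (m, 2)
y_vec = sample.iloc[:, 2].to_numpy(dtype=float).reshape(m_sample, 1)  # shape (m, 1)

# Prepend the column of ones that corresponds to theta_0.
X_vec = np.column_stack([np.ones(m_sample), features])                # shape (m, 3)
```

Here `X_vec` and `y_vec` have the same shapes as `X` and `y` above, but hold floats rather than integers.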

# Normalizing Features

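Since the features differ greatly in scale (house sizes are in the thousands, bedroom counts are single digits), we normalize each feature by subtracting its mean and dividing by its standard deviation. This lets a single step size $\alpha$ work well for every parameter when we apply gradient descent.

We define normalizeFeature(X) below. The column of ones is left unchanged: its mean is set to 0 and its zero standard deviation is replaced by 1.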

In [339]:
def normalizeFeature(X):
    m=len(X)
    n=len(X[0])
    X_norm=np.zeros((m,n))
    mu=np.zeros((1,n))
    sigma=np.zeros((1,n))
    
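    for j in range(n):
        # gather column j of X
        col=[]
        for i in range(m):
            col.append(X[i][j])
        
        # leave the intercept column (all ones) unshifted
        if j==0:
            mu[0][0]=0
        else:
            mu[0][j]=np.mean(col)
        sigma[0][j]=np.std(col)
        # a constant column has zero spread; use 1 to avoid dividing by zero
        if sigma[0][j]==0:
            sigma[0][j]=1
        
        for i in range(m):
            X_norm[i][j]=(X[i][j]-mu[0][j])/sigma[0][j]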
    
    
    return [X_norm,mu,sigma]
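
The loops above can also be written with NumPy's column-wise reductions. This is a sketch rather than the notebook's method: `normalize_feature_vectorized` and the three-row `X_demo` are hypothetical names introduced here, and the function assumes, as normalizeFeature does, that column 0 is the intercept column of ones.

```python
import numpy as np

def normalize_feature_vectorized(X):
    """Column-wise z-score scaling; assumes column 0 is the intercept column."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0, keepdims=True)     # shape (1, n)
    sigma = X.std(axis=0, keepdims=True)   # population std, the np.std default
    mu[0, 0] = 0.0                         # do not shift the column of ones
    sigma[sigma == 0] = 1.0                # constant columns: avoid division by zero
    return (X - mu) / sigma, mu, sigma

# Illustrative input only; not the full training set.
X_demo = np.array([[1, 2104, 3],
                   [1, 1600, 3],
                   [1, 2400, 4]], dtype=float)
X_norm_demo, mu_demo, sigma_demo = normalize_feature_vectorized(X_demo)
```

After scaling, every feature column of `X_norm_demo` has mean 0 and standard deviation 1, while the first column is still all ones.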

# Cost Function 

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^m\left(\theta_0+\theta_1 x_1^{(i)}+\theta_2 x_2^{(i)}- y^{(i)}\right)^2$$

where

$m$ is the size of the training sample,

$\theta = [\theta_0; \theta_1;\theta_2]$,

$(x_1^{(i)},x_2^{(i)},y^{(i)})$ is the $i$th training example.
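
As a small worked check of the formula, the sketch below evaluates $J(\theta)$ on a two-example toy set. The function name `compute_cost_vectorized` and the toy arrays are assumptions made for illustration; the toy set has one feature, but the expression is the same for any number of columns.

```python
import numpy as np

def compute_cost_vectorized(X, y, theta):
    """J(theta) = sum of squared errors divided by 2m."""
    m = len(y)
    error = X @ theta - y                  # shape (m, 1)
    return float(np.sum(error ** 2)) / (2 * m)

# Toy data: y equals x exactly, so theta = [0, 1] fits perfectly.
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0]])
y_toy = np.array([[1.0],
                  [2.0]])

cost_at_zero = compute_cost_vectorized(X_toy, y_toy, np.zeros((2, 1)))
cost_at_fit = compute_cost_vectorized(X_toy, y_toy, np.array([[0.0], [1.0]]))
```

With $\theta = 0$ the errors are $-1$ and $-2$, so $J = (1 + 4)/(2 \cdot 2) = 1.25$; at the exact fit the cost is 0.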

# Gradient Descent

To apply gradient descent, we start with an initial $\theta$ and a given step size $\alpha$, then repeat the following update for every $j$ simultaneously (with $x_0^{(i)}=1$):

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^m\left(\theta_0+\theta_1 x_1^{(i)}+\theta_2 x_2^{(i)} - y^{(i)}\right)x_j^{(i)}$$
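
The per-parameter updates can be collected into one matrix expression, which is the form the gradientDescent code uses (X.transpose() multiplied by the error vector). As a short derivation sketch, let $X$ be the $m \times 3$ matrix whose $i$th row is $(1, x_1^{(i)}, x_2^{(i)})$, so that $x_j^{(i)} = X_{ij}$ and the $i$th prediction error is $(X\theta - y)_i$. Then

$$\sum_{i=1}^m (X\theta - y)_i \, X_{ij} = \left(X^T (X\theta - y)\right)_j$$

and updating all components at once gives

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$$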




## Gradient Descent function

gradientDescent(X,y,$\theta$,$\alpha$,iterations):

| Parameter | Description |
| :------ | :------ |
| X | the matrix form of the training set input |
| y | the vector form of the training set output |
| $\theta$ | the initialized theta, this is the starting point for our gradientDescent |
| $\alpha$ | the step size |
| iterations | number of gradient descent updates to apply when computing $\theta$ |


| Return | Description |
| ------ | ------ |
| $\theta$ | the fitted theta after the given number of iterations, which approximately minimizes the cost function |

In [340]:
def computeCost(X,y,theta):
    predicted=np.matmul(X,theta)
    error=np.subtract(predicted,y)
        
    sumerrorSq=0
    m=len(y)
    for j in range(m):
        sumerrorSq+=math.pow(error[j][0],2)
    
    return sumerrorSq/(2*m)
    
    


def gradientDescent(X,y,theta,alpha,iterations):
    m=len(y)
    # repeat the gradient descent update `iterations` times
    for i in range(iterations):
        predicted=np.matmul(X,theta)
        error=np.subtract(predicted,y)

        # simultaneous update of every theta_j
        theta=theta-alpha/m*(np.matmul(X.transpose(),error))
        # cost after this update (computed for monitoring; not used further)
        cost=computeCost(X,y,theta)
        
        
    return theta
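
One way to check that a chosen $\alpha$ makes the cost decrease is to record the cost after every update. The variant below is only a sketch under that assumption: `gradient_descent_with_history` and the toy data are hypothetical and not part of the original exercise code.

```python
import numpy as np

def gradient_descent_with_history(X, y, theta, alpha, iterations):
    """Same update rule as gradientDescent, but also returns the cost per step."""
    m = len(y)
    cost_history = []
    for _ in range(iterations):
        error = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ error)
        # cost evaluated at the updated theta
        cost_history.append(float(np.sum((X @ theta - y) ** 2)) / (2 * m))
    return theta, cost_history

# Toy data generated from y = 3 + 2x, with x already centred around 0.
X_toy = np.array([[1.0, -1.0],
                  [1.0, 0.0],
                  [1.0, 1.0]])
y_toy = np.array([[1.0], [3.0], [5.0]])

theta_fit, cost_history = gradient_descent_with_history(
    X_toy, y_toy, np.zeros((2, 1)), alpha=0.1, iterations=200)
```

If `cost_history` grows instead of shrinking, the step size is probably too large for the data.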
                            

In [341]:

A=normalizeFeature(X)
X_norm=A[0]
mu=A[1]
sigma=A[2]
theta = np.zeros((3,1))
theta = gradientDescent(X_norm,y,theta,0.1,1000)
print(theta)

[[340412.65957447]
 [109447.79646964]
 [ -6578.35485416]]


The fitted model, expressed in the normalized features, is $L_\theta(x) = 340412.660+109447.796x_1-6578.355x_2$.

Let's use it to predict the price of a house with 1650 sq ft and 3 bedrooms.

First we need to normalize each feature with our $\mu$ and $\sigma$ (the arrays mu and sigma returned by normalizeFeature).


In [353]:

x=[1, 1650, 3]
for i in range(len(X[0])):
    x[i]=float((x[i]-mu[0][i])/sigma[0][i])


price=round(theta[0][0]+theta[1][0]*x[1]+theta[2][0]*x[2],2)
print('Predicted price of house: $',price)

Predicted price of house: $ 293081.46
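
The normalize-then-predict steps can be wrapped in a small helper so that new houses are always scaled with the training statistics. This is a sketch: `predict_price` is a hypothetical name, and the `theta`, `mu` and `sigma` values below are made-up round numbers chosen to keep the arithmetic easy to follow, not the fitted values from above.

```python
import numpy as np

def predict_price(features, theta, mu, sigma):
    """Predict from raw features using training mu/sigma of shape (1, n) and theta of shape (n, 1)."""
    x = np.concatenate(([1.0], np.asarray(features, dtype=float)))  # prepend x_0 = 1
    x_norm = (x - mu[0]) / sigma[0]   # mu[0][0] = 0 and sigma[0][0] = 1 keep x_0 at 1
    return float(x_norm @ theta[:, 0])

# Made-up parameters for illustration only.
theta_demo = np.array([[10.0], [2.0], [3.0]])
mu_demo = np.array([[0.0, 5.0, 1.0]])
sigma_demo = np.array([[1.0, 2.0, 4.0]])

# (7 - 5) / 2 = 1 and (5 - 1) / 4 = 1, so the prediction is 10 + 2 + 3.
demo_prediction = predict_price([7.0, 5.0], theta_demo, mu_demo, sigma_demo)
```

With the notebook's own `theta`, `mu` and `sigma`, the call would be `predict_price([1650, 3], theta, mu, sigma)`.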
