# Multiple Linear Regression

## Questions:

### What is the Wikipedia link to the algorithm?

- https://en.wikipedia.org/wiki/Linear_regression

### Which type of machine learning algorithm is this?

- Supervised learning

### What is the best video tutorial on this algorithm?

- [Video](https://www.youtube.com/watch?v=ZkjP5RJLQF4)

### What is the best text?

- [Link](https://ml-cheatsheet.readthedocs.io/en/latest/linear_regression.html#multivariable-regression)

### What is the best picture which describes the algorithm?

- ![Linear Regression](Images/LinearRegression.jpeg)

### What is one use case for the algorithm?

- A company has stored its historical sales data. If it fits a linear regression to its monthly sales and the related sales data, it can eventually predict future sales.

## From scratch implementation.

### Steps for reproducing the algorithm:
1. **Hypothesis**
    - Returns the predicted dependent variable y
2. **Cost function**
    - Half the mean squared error, which measures the difference between the predicted y and the actual y
3. **Gradient descent**
    - Updates the theta values for each sample, i.e. a batch of size one (new theta = old theta - learning rate $\cdot$ gradient)
    - Implements the partial derivative with respect to each theta
4. **Linear regression**
    - Collects the cost function values and theta values over n iterations
5. **Validation**
    - Tests whether the theta values from the trained model predict the dependent variable y correctly

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### Hypothesis

$y= \theta_1 \cdot x_1 + \theta_2 \cdot x_2 + \cdots + \theta_n \cdot x_n$

In [2]:
def hypothesis(X, thetas):
    y = np.dot(X, thetas)
    return y 
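
As a quick illustration, the hypothesis can be evaluated for one sample or for a whole matrix of samples at once. The sketch below repeats the function so that it runs on its own; `X_demo` and `thetas_demo` are made-up values chosen only for this example.

```python
import numpy as np

def hypothesis(X, thetas):
    # Same definition as above: one dot product per sample
    return np.dot(X, thetas)

# Two made-up samples with three features each (illustrative values only)
X_demo = np.array([[1.0, 2.0, 3.0],
                   [0.0, 1.0, 0.5]])
thetas_demo = np.array([2.0, -1.0, 0.5])

y_single = hypothesis(X_demo[0], thetas_demo)  # scalar for one sample
y_batch = hypothesis(X_demo, thetas_demo)      # one prediction per row
print(y_single, y_batch)
```

Because `np.dot` handles both a 1-D sample and a 2-D matrix, the same function can serve training on single samples and prediction on a full test set.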

### Cost function

$J = \frac{1}{2m}\sum_{i=1}^{m}(y_i - (\theta_1 \cdot x_{i1} + \theta_2 \cdot x_{i2} + \cdots + \theta_n \cdot x_{in}))^2$

Here $m$ is the number of samples and $x_{ij}$ is the value of feature $j$ in sample $i$.

In [3]:
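def costFunction(X, y, thetas):
    # Half mean squared error; accepts a single sample or a batch of samples
    X = np.atleast_2d(X)
    residuals = y - hypothesis(X, thetas)
    j = np.sum(residuals**2) / (2 * len(X))
    return j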
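
The cost formula above can be checked by hand on a tiny dataset. The sketch below uses a hypothetical helper `half_mse` and made-up values (`X_demo`, `y_demo`) that are not part of the notebook; it only illustrates how half the mean of the squared residuals is computed.

```python
import numpy as np

def half_mse(X, y, thetas):
    # Half the mean of the squared residuals over all samples
    residuals = y - np.dot(X, thetas)
    return np.sum(residuals ** 2) / (2 * len(X))

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y_demo = np.array([5.0, 11.0])

# thetas = [1, 2] reproduces y_demo exactly, so the cost is zero
cost_exact = half_mse(X_demo, y_demo, np.array([1.0, 2.0]))
# thetas = [0, 0] predicts 0 everywhere: (5^2 + 11^2) / (2 * 2) = 36.5
cost_zero = half_mse(X_demo, y_demo, np.zeros(2))
print(cost_exact, cost_zero)
```

Note the parentheses in `2 * len(X)`: without them, Python evaluates `1/2*len(X)` as `(1/2) * len(X)`, which multiplies by the sample count instead of dividing by it.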

### Gradient descent

Using the chain rule to derive the partial derivative of the per-sample cost with respect to each $\theta$.

$f'(\theta_1) = -x_1(y - (\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)) \\$
$f'(\theta_2) = -x_2(y - (\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)) \\$
$\vdots\\$
$f'(\theta_n) = -x_n(y - (\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n))$


In [4]:
def updateTheta(X, y, thetas, learningRate):
    # X is a single sample (1-D feature vector) and y is its target value
    residual = y - np.dot(X, thetas)
    # Gradient of the per-sample cost: -x_j * residual for every theta_j
    gradient = -X * residual
    return thetas - learningRate * gradient
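
One way to gain confidence in the partial derivatives is to compare them against a finite-difference approximation. The sketch below is an illustrative check, not part of the original notebook: the helper names (`sample_cost`, `analytic_gradient`, `numeric_gradient`) and the demo values are assumptions introduced for this example, and the per-sample cost is taken to be $\frac{1}{2}(y - x \cdot \theta)^2$.

```python
import numpy as np

def sample_cost(x, y, thetas):
    # Squared error of one sample, with the 1/2 factor from J
    return 0.5 * (y - np.dot(x, thetas)) ** 2

def analytic_gradient(x, y, thetas):
    # Partial derivatives from the formulas above: -x_j * (y - x . thetas)
    return -x * (y - np.dot(x, thetas))

def numeric_gradient(x, y, thetas, eps=1e-6):
    # Central differences, perturbing one theta at a time
    grad = np.zeros_like(thetas)
    for j in range(thetas.size):
        step = np.zeros_like(thetas)
        step[j] = eps
        grad[j] = (sample_cost(x, y, thetas + step)
                   - sample_cost(x, y, thetas - step)) / (2 * eps)
    return grad

x_demo = np.array([0.5, -1.0, 2.0])   # made-up sample
y_demo = 3.0
thetas_demo = np.array([1.0, 1.0, 1.0])

grad_a = analytic_gradient(x_demo, y_demo, thetas_demo)
grad_n = numeric_gradient(x_demo, y_demo, thetas_demo)
print(np.allclose(grad_a, grad_n, atol=1e-5))
```

If the two gradients disagree, the sign or the feature index in the derivative is the first thing to check.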

### Building the model

In [5]:
def linearRegression(X, y, thetas, learningRate, iterations):
    costFunctionResults = []
    thetasResults = []
    for _ in range(iterations):
        # One pass over the data, updating thetas after every sample
        for i in range(len(X)):
            j = costFunction(X[i], y[i], thetas)
            thetas = updateTheta(X[i], y[i], thetas, learningRate)
            costFunctionResults.append(j)
            thetasResults.append(thetas)
    return costFunctionResults, thetasResults
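
The loop above updates the thetas after every single sample. For comparison, a batch variant averages the gradient over all samples and updates once per pass. The following is a minimal sketch of that alternative, not the notebook's method: `batch_gradient_descent`, the synthetic data, and the chosen learning rate and iteration count are all assumptions made for this example.

```python
import numpy as np

def batch_gradient_descent(X, y, thetas, learning_rate, iterations):
    # One update per pass, using the gradient averaged over all samples
    costs = []
    for _ in range(iterations):
        residuals = y - X @ thetas
        costs.append(np.sum(residuals ** 2) / (2 * len(X)))
        gradient = -(X.T @ residuals) / len(X)
        thetas = thetas - learning_rate * gradient
    return costs, thetas

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))            # synthetic features
true_thetas = np.array([2.0, -1.0, 0.5])     # made-up coefficients
y_demo = X_demo @ true_thetas                # noise-free targets

costs, fitted = batch_gradient_descent(X_demo, y_demo, np.ones(3), 0.1, 500)
print(fitted)
print(costs[0], costs[-1])
```

On noise-free data the fitted thetas should approach `true_thetas`, and the recorded cost should shrink from the first to the last pass, which makes this a convenient sanity check for the learning rate.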
        

### Data creation with scikit-learn

In [6]:
X, y = datasets.make_regression(
    n_samples=100, n_features=4, noise=20, random_state=4
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

### Train my own model

In [7]:
thetas = np.ones(X_train[0].size)
iterations = 100
learningRate = 0.01
costFunctionResults, thetasResults = linearRegression(X_train, y_train, thetas, learningRate, iterations)

### Predicting dependent variables with trained model

In [8]:
def val(Xt, thetas): 
    yPred = []
    for i in range(len(Xt)):
        y = np.dot(thetas, Xt[i])
        yPred.append(y)
    return yPred
print(thetasResults[-1])
print(val(X_test, thetasResults[-1]))

[67.89771335  6.79887016 89.16350876 76.09871124]
[-234.26458246427438, -25.185667349663873, -93.55514269635881, -36.94025736266564, -83.38359268611588, 66.29843739469774, 110.72681894797145, -169.96633673234095, 33.984825328541035, -80.00296166833355, 106.28319339635966, -78.95826796461118, 17.446963007468838, 352.6270407110843, -179.2598629039047, -225.88854512588654, -99.88308171236258, -30.872202017347355, -169.42309378535975, -174.61313518875153]


### Using scikit-learn for comparing multiple linear regression results

In [9]:
# Note: the model is fitted on the full dataset (X, y), which includes the test samples
reg = LinearRegression().fit(X, y)
reg.predict(X_test)

    

array([-235.90189249,  -26.44061702,  -94.25756023,  -34.37850339,
        -82.72296268,   66.76667711,  110.24828677, -170.1283782 ,
         33.86355047,  -77.76008859,  109.09775579,  -81.15534581,
         19.48684785,  353.49377319, -178.16899171, -227.73980317,
        -96.64067068,  -30.43986168, -168.05853676, -173.43390532])
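
To put a number on how close the two sets of predictions are, a root mean squared difference can be computed. The sketch below is illustrative: `rmse` is a hypothetical helper, and for brevity it uses only the first four predictions copied (rounded) from each of the two outputs above.

```python
import numpy as np

def rmse(a, b):
    # Root mean squared difference between two prediction vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.mean((a - b) ** 2))

# First four predictions copied from the two outputs above
own_pred = [-234.26458246, -25.18566735, -93.5551427, -36.94025736]
sk_pred = [-235.90189249, -26.44061702, -94.25756023, -34.37850339]
print(rmse(own_pred, sk_pred))
```

Passing the full prediction arrays instead of the four-element excerpts gives the difference over the whole test set.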