# Multiple Linear Regression
### NOTE: This version is for CSV file imports!

---

Created By: Xavier De Carvalho  
Created On: 12/08/2021  
Updated By: N/A  
Updated On: N/A  
Version: csv0.0.01

### Requirements

---

##### Required Data Format
- File Type: CSV
- File Shape: (n) Columns, (n) Rows, with the dependent variable in the last column

##### Required Python Packages
- NumPy
- pandas
- scikit-learn
    - compose (ColumnTransformer)
    - preprocessing (OneHotEncoder)
    - model_selection (train_test_split)
    - linear_model (LinearRegression)

### Description

---

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:     

1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).     
2. The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
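
For reference, the relationship described above is usually written in the following textbook form. The symbols are notation assumed here for illustration and are not names used elsewhere in this notebook: $y$ is the dependent variable, $x_1, \dots, x_p$ are the independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients, and $\epsilon$ is the error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon
$$

Question 1 in the list concerns the size of the coefficients $\beta_1, \dots, \beta_p$; question 2 concerns evaluating the fitted equation at chosen values of $x_1, \dots, x_p$.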

### Example

---

You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns and gather data on the percentage of people in each town who smoke, the percentage of people in each town who bike to work, and the percentage of people in each town who have heart disease.     

Because you have two independent variables and one dependent variable, and all your variables are quantitative, you can use multiple linear regression to analyze the relationship between them.

[Scribbr - Multiple Linear Regression](https://www.scribbr.com/statistics/multiple-linear-regression/)
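
The scenario above can be sketched with synthetic data. This is a minimal illustration only: the 500 rows are randomly generated, and the coefficients (15, 0.2, -0.2), value ranges, and variable names are assumptions made up for the sketch, not figures from the survey or the cited article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the 500-town survey; every number here is made up
rng = np.random.default_rng(0)
n_towns = 500
smoking = rng.uniform(0, 30, n_towns)  # % of people who smoke
biking = rng.uniform(0, 75, n_towns)   # % of people who bike to work

# Assumed relationship, chosen only so there is something to recover
heart_disease = 15 + 0.2 * smoking - 0.2 * biking + rng.normal(0, 0.5, n_towns)

# Two independent variables, one dependent variable
X_demo = np.column_stack((smoking, biking))
model = LinearRegression().fit(X_demo, heart_disease)

print("Intercept:", model.intercept_)
print("Coefficients (smoking, biking):", model.coef_)
```

Because the data were generated from a known equation, the fitted intercept and coefficients should land close to the assumed values, which makes this a convenient sanity check before moving on to a real CSV file.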


### Install Dependencies If Needed

---

NOTE: This might not be required if you're running your notebook instance in the cloud! 
<br><br>
Delete the cell below if this is the case...

In [None]:
# Import the sys dependency
import sys
# Install dependencies
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn

### Import Packages

---

In [None]:
# Import packages
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Confirm packages have been imported
print("Packages imported!")

### Import CSV Data

---

In [None]:
# Import the data set using pandas csv reader
dataset = pd.read_csv("@YOUR_CSV_FILE_PATH_HERE")
# Confirm data has been imported
print("Data has been imported!")
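# Initialize variables:
# x holds every column except the last (the independent variables)
# y holds the last column (the dependent variable)
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values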
# Confirm variables have been initialized
print("Variables have been initialized!")
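
To see what the two `iloc` slices produce, here is a small sketch on a hypothetical three-row data set. The column names and values are invented for illustration; only the slicing pattern matches the cell above.

```python
import pandas as pd

# Hypothetical data set; the last column plays the role of the dependent variable
demo = pd.DataFrame({
    "region": ["north", "south", "north"],
    "spend": [10.0, 20.0, 30.0],
    "profit": [1.5, 2.5, 3.5],
})

x_demo = demo.iloc[:, :-1].values  # every column except the last
y_demo = demo.iloc[:, -1].values   # only the last column

print(x_demo.shape)  # (3, 2)
print(y_demo.shape)  # (3,)
```

If the dependent variable in your file is not the last column, reorder the columns before slicing or select the columns by name instead.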

### Encode Categorical Data (One-hot)

---

In [None]:
# Transform Columns and One-hot Encode Data
ct = ColumnTransformer(
    transformers=[
        (
            'encoder',
            OneHotEncoder(),
            [0] # Set this to the index of the
                # column you need to One-hot encode
        )
    ],
    remainder='passthrough',
    sparse_threshold=0 # Always return a dense array so that
                       # np.array() below works as intended
)
# One-hot encode x
X = np.array(ct.fit_transform(x)) # One-hot encoded columns come
                                  # first in the new array
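
The effect of the transformer is easier to see on a tiny array. The sketch below uses a hypothetical two-column input (a category and a number); the values are invented, and the names `x_demo`, `ct_demo`, and `X_demo` are for illustration only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical input: column 0 is categorical, column 1 is numeric
x_demo = np.array(
    [["north", 10.0], ["south", 20.0], ["north", 30.0]],
    dtype=object,
)

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
)
X_demo = np.array(ct_demo.fit_transform(x_demo))

# One column per category comes first, followed by the untouched numeric column
print(X_demo.shape)  # (3, 3)
print(X_demo)
```

The single "region" column becomes two indicator columns (one for each category), and the numeric column is passed through unchanged after them.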

### Split Data Set For Training & Testing

---

In [None]:
# Initialize training & testing variables
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=1/3, # Fraction of rows held out for testing; adjust if required
    random_state=0 # Seed for a reproducible split; any fixed integer works
)
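
As a quick check of what `test_size=1/3` does, the sketch below splits a hypothetical 15-row array. The data are placeholder integers and the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 15 rows, 2 feature columns
X_demo = np.arange(30).reshape(15, 2)
y_demo = np.arange(15)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=1/3,
    random_state=0,
)

print(X_tr.shape, X_te.shape)  # (10, 2) (5, 2)
```

Running the cell again with the same `random_state` returns the same rows in each subset, which keeps results comparable between runs.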

### Train Model Using Training Set

---

In [None]:
# Create the model and fit it to the training set
regressor = LinearRegression()
regressor.fit(
    X_train, # Feature matrix
    y_train # Dependent variable vector
)
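
After fitting, the learned intercept and coefficients are available as the `intercept_` and `coef_` attributes. The sketch below shows this on noise-free toy data generated from an assumed equation, y = 1 + 2*x1 + 3*x2, so the expected result is known in advance; the data and names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from the assumed equation y = 1 + 2*x1 + 3*x2 (no noise)
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y_demo = 1 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]

demo_regressor = LinearRegression().fit(X_demo, y_demo)

print("Intercept:", demo_regressor.intercept_)   # approximately 1
print("Coefficients:", demo_regressor.coef_)     # approximately [2, 3]
```

On the notebook's own model, the same attributes can be read from `regressor` once the cell above has run; each coefficient corresponds to one column of `X_train`, in order.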

### Predict Test Results

---

In [None]:
# Predict results for the test set
y_pred = regressor.predict(X_test)
# Print all numerical values with just 2 decimal places
np.set_printoptions(precision=2)
# Display two columns - Prediction (left) vs Actual (right)
print(
    np.concatenate(
        (
            # Reshape each vector into a column and join them side by side
            y_pred.reshape(len(y_pred), 1),
            y_test.reshape(len(y_test), 1)
        ),
        axis=1
    )
)
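
Comparing the two columns by eye becomes impractical for larger test sets. One possible follow-up, not part of the original notebook, is to summarize the comparison with scikit-learn's `mean_absolute_error` and `r2_score`. The sketch below uses small hypothetical vectors in place of `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical actual and predicted values standing in for y_test and y_pred
y_test_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.0, 7.5, 9.0])

mae = mean_absolute_error(y_test_demo, y_pred_demo)
r2 = r2_score(y_test_demo, y_pred_demo)

print("Mean absolute error:", mae)  # 0.25
print("R squared:", r2)             # 0.975
```

A lower mean absolute error and an R squared closer to 1 indicate predictions that track the actual values more closely.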