# Simple Linear Regression
### NOTE: This version is for CSV file imports!

---

Created By: Xavier De Carvalho  
Created On: 18/06/2021  
Updated By: N/A  
Updated On: N/A  
Version: csv0.0.01

### Requirements

---

##### Required Data Format
- File Type: CSV
- File Shape: 2 Columns (independent variable first, dependent variable last), (n) Rows, with a header row
##### Required Python Packages
- NumPy
- Matplotlib
    - pyplot
- pandas
- scikit-learn
    - model_selection (train_test_split)
    - linear_model (LinearRegression)

### Description

---

This statistical model describes the relationship between one dependent variable and one independent variable using a straight line.  
  
We use this model to predict the dependent variable from the independent variable, based on that relationship.
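For reference, the fitted line is usually written as follows, where $b_0$ is the intercept, $b_1$ is the slope and $\hat{y}$ is the predicted value. Ordinary least squares, which is what scikit-learn's `LinearRegression` fits, chooses $b_0$ and $b_1$ to minimize the sum of squared differences $\sum_i (y_i - \hat{y}_i)^2$. For a single independent variable this gives the closed-form estimates in the second line, where $\bar{x}$ and $\bar{y}$ are the sample means:

$$\hat{y} = b_0 + b_1 x$$

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$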

### Example

---

Let's say we want to predict the salary a person could earn based on their years of experience, or vice versa.  
  
We would plot both variables on the x and y axes to see whether a relationship between them is visible.  
  
Let's define our variables:  
<i>x</i> = Years of experience  
<i>y</i> = Salary  
  
Let's say we see that, on average (with some variance), every time <i>x</i> increases by 1 year, <i>y</i> increases by a roughly constant amount.  
  
This would be an example of a linear relationship.  
  
Based on this linear relationship, we can make a reasonable prediction for <i>y</i> given an input value for <i>x</i>. To predict in the other direction (<i>x</i> from <i>y</i>), you would swap the roles of the two columns and fit a new model.
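To make the idea concrete, here is a minimal sketch that applies the least-squares formulas directly with NumPy, independent of the rest of this notebook. The `years` and `salary` values are hypothetical, chosen so that salary rises by exactly 5,000 per year on top of a 30,000 base, and `fit_line` is a helper name made up for this example; real data will scatter around the line rather than sit on it.

```python
import numpy as np

# Hypothetical data: years of experience (x) and salary (y), perfectly linear
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = 30000.0 + 5000.0 * years


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(years, salary)

# Predict the salary for a hypothetical 7 years of experience
predicted = b0 + b1 * 7.0
print(f"intercept={b0:.1f}, slope={b1:.1f}, prediction for 7 years={predicted:.1f}")
# prints: intercept=30000.0, slope=5000.0, prediction for 7 years=65000.0
```

Because these points sit exactly on a line, the slope and intercept are recovered exactly; with real, scattered data the same formulas give the best-fitting line rather than a perfect one.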


### Install Dependencies If Needed

---

NOTE: This might not be required if you're running your notebook instance in the cloud! 
<br><br>
Delete the cell below if this is the case...

In [None]:
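# Import sys so pip installs into the same Python environment as this notebook's kernel
import sys
# Install dependencies ("scikit-learn" is the PyPI name; the "sklearn" package name is deprecated)
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn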
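To confirm the installs worked before moving on, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) can report the installed version of each package. The helper name `installed_versions` is hypothetical, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (note: scikit-learn, not sklearn)
PACKAGES = ["numpy", "matplotlib", "pandas", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```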

### Import Packages

---

In [None]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Confirm packages have been imported
print("Packages imported!")

### Import CSV Data

---

In [None]:
# Import the data set using pandas csv reader
dataset = pd.read_csv("YOUR_CSV_FILE_PATH_HERE")
# Confirm data has been imported
print("Data has been imported!")
# Initialize variables
X = dataset.iloc[:, :-1].values # Feature matrix: every column except the last
y = dataset.iloc[:, -1].values # Dependent variable vector: the last column
# Confirm variables have been initialized
print("Variables have been initialized!")
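The two `iloc` slices above assume the dependent variable is the last column. A minimal sketch on a hypothetical in-memory CSV (the column names `YearsExperience` and `Salary` and the values are made up for illustration) shows the shapes they produce:

```python
import io

import pandas as pd

# Hypothetical two-column CSV in the required format: feature first, target last
demo_csv = """YearsExperience,Salary
1,35000
2,40000
3,45000
4,50000
"""

dataset = pd.read_csv(io.StringIO(demo_csv))

X = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, -1].values   # only the last column -> 1-D array

print(X.shape, y.shape)  # prints: (4, 1) (4,)
```

`X` stays 2-D even with a single feature column because scikit-learn's `fit` expects a 2-D feature matrix; slicing with `iloc[:, 0]` instead would give a 1-D array, and `fit` would raise an error.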

### Split Data Set For Training & Testing

---

In [None]:
# Initialize training & testing variables
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=1/3, # Fraction of rows held out for testing; adjust if required
    random_state=0 # Seed for a reproducible split; any fixed integer works
)
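To see what `test_size=1/3` and `random_state=0` actually do, here is a minimal sketch on 30 rows of hypothetical data lying on y = 2x + 1. It checks the resulting split sizes, that rows stay paired with their targets, and that repeating the call with the same seed gives the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 30 rows, one feature column, target y = 2x + 1
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Same parameters as the split cell above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))  # prints: 20 10

# Re-running with the same random_state reproduces exactly the same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
print(np.array_equal(X_test, X_test2))  # prints: True
```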

### Train Model Using Training Set

---

In [None]:
# Create the model and fit it to the training set
regressor = LinearRegression()
regressor.fit(
    X_train, # Feature matrix (independent variable)
    y_train # Dependent variable vector
)
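After `fit`, the learned line is stored on the estimator itself: `coef_` holds the slope (one entry per feature column) and `intercept_` holds the intercept. A minimal sketch on hypothetical training data that sits exactly on y = 2x + 1, so the fitted values can be checked by eye:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 2.0 * X_train.ravel() + 1.0

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# coef_ is the slope b1 (an array, one value per feature); intercept_ is b0
print("slope:", regressor.coef_, "intercept:", regressor.intercept_)
```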

### Predict Test Results

---

In [None]:
# Predict the dependent variable for the test set
y_pred = regressor.predict(X_test)
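The notebook stops at producing `y_pred`. One common way to judge those predictions, shown here as a minimal sketch on hypothetical numbers small enough to check by hand, is scikit-learn's `mean_absolute_error` and `r2_score`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical test targets and predictions
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 7.5])

mae = mean_absolute_error(y_true, y_hat)  # (0.5 + 0 + 0.5) / 3
r2 = r2_score(y_true, y_hat)              # 1 - SS_res / SS_tot = 1 - 0.5 / 8
print(f"MAE={mae:.4f}, R^2={r2:.4f}")     # prints: MAE=0.3333, R^2=0.9375
```

For a fitted `LinearRegression`, `regressor.score(X_test, y_test)` returns the same R² as `r2_score(y_test, y_pred)`.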

### Visualize The Training Result

---

In [None]:
# Create the first visualization for the training set
plt.scatter(
    X_train,
    y_train,
    color = 'red'
)
plt.plot(
    X_train,
    regressor.predict(X_train), # Plot the regression line
    color = 'blue'
)
# Replace the text marked with '@' with your own text.
# Don't forget to remove the '@' character!
plt.title('@YOUR_X vs @YOUR_Y (Training Set)')
plt.xlabel('@YOUR_X_AXIS_NAME')
plt.ylabel('@YOUR_Y_AXIS_NAME')
plt.show()

### Visualize The Test Result

---

In [None]:
# Create the second visualization, for the test set
plt.scatter(
    X_test,
    y_test,
    color = 'red'
)
plt.plot(
    X_train,
    regressor.predict(X_train), # Plot the regression line
    color = 'blue'
)
# Replace the text marked with '@' with your own text.
# Don't forget to remove the '@' character!
plt.title('@YOUR_X vs @YOUR_Y (Test Set)')
plt.xlabel('@YOUR_X_AXIS_NAME')
plt.ylabel('@YOUR_Y_AXIS_NAME')
plt.show()
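The two plotting cells above draw one figure each. As an alternative (not the original notebook's approach), here is a minimal sketch that puts both panels side by side with `plt.subplots` and saves the figure to a file instead of displaying it, which is handy when running outside Jupyter. The data are hypothetical noisy points around y = 2x + 1, the filename `regression_results.png` is an arbitrary choice, and the `matplotlib.use("Agg")` line should be dropped inside a notebook where inline display is wanted.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical noisy data around y = 2x + 1
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=1.0, size=30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

# A straight line only needs two points: the smallest and largest x values
x_line = np.array([[X.min()], [X.max()]])
y_line = regressor.predict(x_line)

fig, (ax_train, ax_test) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, X_part, y_part, label in [
    (ax_train, X_train, y_train, "Training Set"),
    (ax_test, X_test, y_test, "Test Set"),
]:
    ax.scatter(X_part.ravel(), y_part, color="red")    # observed points
    ax.plot(x_line.ravel(), y_line, color="blue")      # same fitted line in both panels
    ax.set_title(f"x vs y ({label})")
    ax.set_xlabel("x")
    ax.set_ylabel("y")

fig.tight_layout()
output_path = "regression_results.png"
fig.savefig(output_path)
plt.close(fig)
```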