## Linear Regression
- Linear Regression is a type of supervised machine learning in which we predict a continuous variable from one or more independent variables.

- **Goal:**
    - To create a mathematical model that captures the linear relationship between the dependent and independent variables.
    - To generate predictions.
    - To assist in decision making by quantifying the expected change in the outcome when the independent (predictor) variables change.

- **Regression Types**:
    1. `Simple Linear Regression`
    2. `Multiple Linear Regression`
    3. `Polynomial Regression`

### 1. Simple Linear Regression

- In Simple Linear Regression, we consider a single independent variable and a single dependent variable.
- It describes the relationship between two variables: an independent variable (say x) and a dependent variable (say y).
- The mathematical model of simple linear regression takes the form of a straight line.

- **Mathematically,**
    - `Y = β0 + β1X` 
        - β0: the intercept
        - β1: the slope
        - X: an independent variable (the variable used to predict Y)
        - Y: dependent variable (the variable we want to predict)
- Here, β0 and β1 are the model parameters. They are estimated by minimizing an objective (cost) function, for example with an optimization technique such as Gradient Descent.

- **Visually (β0=38423, β1=821)**
    - `Y = 38423 + 821X`
    - <img src='images/1.png' width='400'>
    - Here, we fit the Linear Regression line to the scatter data.
    - Afterwards, we can predict Y for a new X by projecting X onto the line and reading off the corresponding Y value.
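
As a small illustration of making a prediction from the fitted line, the sketch below plugs the example parameters (β0=38423, β1=821) into `Y = β0 + β1X`. The function name `predict` and the input value `new_x = 10` are assumptions chosen only for this example.

```python
# Parameters taken from the example line Y = 38423 + 821X.
BETA_0 = 38423  # intercept
BETA_1 = 821    # slope


def predict(x: float) -> float:
    """Return the Y value on the fitted line for a given X."""
    return BETA_0 + BETA_1 * x


# Hypothetical new input, chosen only for illustration.
new_x = 10
print(predict(new_x))  # 38423 + 821 * 10 = 46633
```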

- **Simple Linear Regression (Training Process Using Gradient Descent)**
    1. `Initialization`
    2. `Compute Cost`
    3. `Gradient Descent Optimization`  
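
The three steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes zero initialization, MSE as the cost, and a fixed learning rate, none of which are specified in these notes. The function name `gradient_descent`, its defaults, and the toy data are hypothetical.

```python
def gradient_descent(xs, ys, learning_rate=0.01, n_iters=5000):
    """Fit Y = beta_0 + beta_1 * X by batch gradient descent on the MSE."""
    n = len(xs)

    # 1. Initialization (assumption: start both parameters at zero).
    beta_0, beta_1 = 0.0, 0.0
    cost_history = []

    for _ in range(n_iters):
        errors = [y - (beta_0 + beta_1 * x) for x, y in zip(xs, ys)]

        # 2. Compute cost (assumption: mean squared error).
        cost_history.append(sum(e * e for e in errors) / n)

        # 3. Gradient descent update: step against the MSE gradient.
        grad_0 = -2.0 / n * sum(errors)
        grad_1 = -2.0 / n * sum(e * x for e, x in zip(errors, xs))
        beta_0 -= learning_rate * grad_0
        beta_1 -= learning_rate * grad_1

    return beta_0, beta_1, cost_history


# Toy data generated from Y = 1 + 2X, so the fit should approach (1, 2).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
beta_0, beta_1, cost_history = gradient_descent(xs, ys)
print(round(beta_0, 3), round(beta_1, 3))
```

The learning rate and iteration count usually need tuning per dataset; a rate that is too large makes the cost diverge instead of shrink.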




### **Simple Linear Regression (Training Process Using Ordinary Least Squares (OLS))**
- OLS stands for Ordinary Least Squares.
- OLS minimizes the sum of squared errors; since N is constant, this is equivalent to minimizing the mean squared error.
- Sum of Squared Errors, i.e. SSE(β0, β1)
    - Σ(Yi - (β0 + β1 * Xi))² 
- Mean Squared Error, i.e. MSE(β0, β1)
    - SSE(β0, β1) / N
- With OLS, β0 and β1 can be computed directly from closed-form formulae.
- Aim: To select the best-fit line (optimal β0 and β1) that minimizes the error between predicted and actual values.
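
To make the two error measures concrete, here is a short sketch that evaluates SSE and MSE for a candidate line. The helper names `sse` and `mse`, the toy data, and the candidate parameters are assumptions for illustration only.

```python
def sse(xs, ys, beta_0, beta_1):
    """Sum of squared errors: sum over i of (Yi - (beta_0 + beta_1 * Xi))^2."""
    return sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))


def mse(xs, ys, beta_0, beta_1):
    """Mean squared error: SSE divided by the number of points N."""
    return sse(xs, ys, beta_0, beta_1) / len(xs)


# Toy data and a candidate line Y = 0 + 2X (both hypothetical).
xs = [1, 2, 3]
ys = [2, 4, 7]

# Predictions are 2, 4, 6, so the errors are 0, 0, 1.
print(sse(xs, ys, beta_0=0, beta_1=2))  # 1
print(mse(xs, ys, beta_0=0, beta_1=2))  # 1 / 3
```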

### Derivation of OLS Estimate for Simple Linear Regression

We start with the simple linear regression model:


Y = β0 + β1*X


Where:
- Y is the observed dependent variable.
- X  is the independent variable.
- β0 is the intercept.
- β1 is the slope.

**Step 1: Define the Cost Function**

The goal is to minimize the sum of squared errors (SSE), which is the sum of the squared differences between the observed Y and the predicted 
$(\hat{Y})$ values:

$$
SSE = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2
$$


**Step 2: Minimize SSE by Finding $(\beta_0)$**

To minimize SSE, we differentiate it with respect to $( \beta_0 )$ and set the derivative equal to 0 (since the error function reaches its minimum where its derivative is 0).

Differentiate SSE with respect to $(\beta_0)$: 

$$
\frac{\partial SSE}{\partial \beta_0} = \frac{\partial \left( \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))^2 \right)}{\partial \beta_0}
$$

Apply the chain rule and simplify:

$$
\frac{\partial SSE}{\partial \beta_0} = -2 \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))
$$

Set the derivative equal to 0 and solve for $(\beta_0)$:  
$$
-2 \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i)) = 0  
$$  

$$
\sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i)) = 0
$$

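Now, solving for $(\beta_0)$: expand the sum and divide by $N$, using $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$:

$$
\sum_{i=1}^{N} Y_i - N\beta_0 - \beta_1 \sum_{i=1}^{N} X_i = 0
$$

$$
\beta_0 = \bar{Y} - \beta_1\bar{X} \quad \text{(i)}
$$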


**Step 3: Minimize SSE by Finding $(\beta_1)$**

To minimize SSE, we differentiate it with respect to $( \beta_1 )$ and set the derivative equal to 0 (since the error function reaches its minimum where its derivative is 0).

Differentiate SSE with respect to $(\beta_1)$: 

$$
\frac{\partial SSE}{\partial \beta_1} = \frac{\partial \left( \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))^2 \right)}{\partial \beta_1}
$$

Apply the chain rule and simplify:

$$
\frac{\partial SSE}{\partial \beta_1} = -2 \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))X_i
$$

Set the derivative equal to 0 and solve for $( \beta_1 )$:

$$
-2 \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))X_i = 0
$$

$$
\sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1X_i))X_i = 0
$$

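Solving for $( \beta_1)$ by substituting $\beta_0 = \bar{Y} - \beta_1\bar{X}$ from Step 2:

$$
\sum_{i=1}^{N} \left( (Y_i - \bar{Y}) - \beta_1 (X_i - \bar{X}) \right) X_i = 0
$$

$$
\beta_1 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y}) X_i}{\sum_{i=1}^{N} (X_i - \bar{X}) X_i}
$$

Because $\sum_{i=1}^{N} (Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{N} (X_i - \bar{X}) = 0$, replacing $X_i$ with $(X_i - \bar{X})$ in both sums changes neither value.

Hence,

$$
\beta_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}
$$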


$$
\beta_0 = \bar{Y} - \beta_1\bar{X}          
$$
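
The closed-form estimates can be computed directly from data. The sketch below uses the standard OLS expressions β1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and β0 = Ȳ - β1X̄. The function name `ols_fit` and the toy data are assumptions made for illustration.

```python
def ols_fit(xs, ys):
    """Return (beta_0, beta_1) for Y = beta_0 + beta_1 * X using closed-form OLS."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator: sum of (Xi - X_bar) * (Yi - Y_bar).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of (Xi - X_bar)^2; zero if all Xi are identical.
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


# Toy data (hypothetical): X_bar = 3, Y_bar = 4, s_xy = 6, s_xx = 10.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta_0, beta_1 = ols_fit(xs, ys)
print(round(beta_0, 4), round(beta_1, 4))  # approximately 2.2 and 0.6
```

Note that the slope is undefined when every X value is the same, because the denominator is then zero; this sketch does not guard against that case.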


### Simple Linear Regression (using OLS)
1. Define the Linear Model
    - Y = β0 + β1*X  
    
    - Y is the observed dependent variable.
    - X  is the independent variable.
    - β0 is the intercept.
    - β1 is the slope.

2. Define the objective function
    - The objective in Linear Regression is to find the β0 and β1 that minimize the sum of squared errors (SSE).
    - The error for each data point is the difference between the observed value and the predicted value:
    - `e(i) = Y(i) - (β0 + β1*X(i))`
    - SSE = Σ(Yi - (β0 + β1 * Xi))²

3. Minimize the Objective function
    - As derived in the section above, to minimize SSE we take the partial derivatives of SSE with respect to β0 and β1, set them equal to zero, and solve for β0 and β1.
    - Refer to the section above for the derivation.

4. Final OLS Equation
    - $\hat{Y}$ = β0 + β1*X
    - where β0 and β1 are estimated using the formulae derived in the section above.
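
As a sanity check on step 3, the sketch below fits a line with the standard closed-form OLS expressions and then confirms numerically that both partial derivatives of SSE are (approximately) zero at the estimates, and that nudging a parameter away from them increases SSE. All function names, the toy data, and the nudge size of 0.1 are assumptions for illustration.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (beta_0, beta_1) for Y = beta_0 + beta_1 * X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    return y_bar - beta_1 * x_bar, beta_1


def sse(xs, ys, beta_0, beta_1):
    """Objective function: sum of squared errors."""
    return sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))


# Toy data (hypothetical).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta_0, beta_1 = ols_fit(xs, ys)

# Errors e(i) = Y(i) - (beta_0 + beta_1 * X(i)) at the fitted parameters.
residuals = [y - (beta_0 + beta_1 * x) for x, y in zip(xs, ys)]

# Partial derivatives of SSE; both should vanish up to rounding error.
grad_0 = -2 * sum(residuals)
grad_1 = -2 * sum(r * x for r, x in zip(residuals, xs))

# Moving the intercept away from the estimate should make SSE larger.
best_sse = sse(xs, ys, beta_0, beta_1)
nudged_sse = sse(xs, ys, beta_0 + 0.1, beta_1)
print(abs(grad_0) < 1e-9, abs(grad_1) < 1e-9, nudged_sse > best_sse)
```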

### **Simple Linear Regression (Training Process using Gradient Descent)**

### References: 
- https://www.youtube.com/watch?v=KZ1mWboXE6g
- https://www.coursera.org/learn/data-analysis-with-python/lecture/Wlyce/linear-regression-and-multiple-linear-regression