## Linear Regression

### What is Linear Regression?

- It is a **supervised learning algorithm**.
- It is used to **model the relationship between a dependent variable (target) and one or more independent variables (features)** by fitting a linear equation.

### Mathematical Formulation

#### Simple Linear Regression (SLR)

Involves one independent variable:
$$
y = \beta_0 + \beta_1 x + \varepsilon
$$
- $y$: Dependent variable (target/output)
- $x$: Independent variable (feature/input)
- $\beta_0$: Intercept
- $\beta_1$: Slope
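- $\varepsilon$: Error term (the variation in $y$ not explained by the linear relationship; its estimate from a fitted model is the residual)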
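
For a single feature, the least-squares estimates have a well-known closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the line pass through the point of means. A minimal sketch in pure Python, where the function name `fit_simple_linear_regression` and the toy data are illustrative assumptions:

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = s_xy / s_xx
    # Intercept: forces the line through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
beta_0, beta_1 = fit_simple_linear_regression(x, y)
print(beta_0, beta_1)
```

On this noise-free data the fit recovers an intercept of 1 and a slope of 2. In practice a library such as NumPy or scikit-learn would normally be used instead.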

#### Multiple Linear Regression (MLR)

Involves $p \geq 2$ independent variables:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
$$
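
Once the coefficients are known, a prediction is simply the intercept plus a weighted sum of the feature values (the error term is not part of the prediction). The sketch below illustrates this with a hypothetical `predict` helper and made-up coefficients:

```python
def predict(intercept, coefficients, features):
    """Return intercept + sum_j coefficients[j] * features[j]."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical coefficients for a two-feature model.
beta_0 = 1.0
betas = [2.0, -3.0]  # beta_1, beta_2
y_hat = predict(beta_0, betas, [4.0, 1.0])
print(y_hat)  # 1 + 2*4 - 3*1 = 6.0
```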

### Goal of Linear Regression

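The goal is to minimize the difference between the actual and predicted values (the error, or residual). This is typically done with **Ordinary Least Squares (OLS)**, which chooses the coefficients that minimize the sum of squared residuals over all $n$ observations:

$$
\min_{\beta} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$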

Where:
- $y_i$: Actual values
- $\hat{y}_i$: Predicted values
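
Setting the gradient of this sum to zero gives the normal equations $(X^\top X)\beta = X^\top y$, where $X$ is the feature matrix with a leading column of ones for the intercept. The following is a rough pure-Python sketch that builds and solves these equations by Gaussian elimination. The helper names `solve_linear_system` and `fit_ols` and the data are assumptions for illustration, and numerical libraries generally rely on more stable solvers:

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (features may be collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(rows, targets):
    """Return [intercept, coef_1, ...] minimizing the sum of squared residuals."""
    # Prepend a constant 1 so the first coefficient acts as the intercept.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
targets = [1, 3, 4, 6, 8]
beta = fit_ols(rows, targets)
print([round(b, 6) for b in beta])
```

Because the toy data contains no noise, the recovered coefficients are approximately 1, 2, and 3, matching the generating equation.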

### Assumptions of Linear Regression

1. **Linearity**: The relationship between predictors and target is linear.
2. **Independence**: Residuals (errors) are independent of each other.
3. **Homoscedasticity**: The errors have constant variance across all values of the predictors.
4. **Normality of Errors**: Residuals are normally distributed.
5. **No multicollinearity (for MLR)**: Independent variables are not highly correlated.
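
Some of these assumptions can be screened with simple checks. As one example, the sketch below flags pairs of features whose Pearson correlation is close to ±1, a quick but incomplete screen for multicollinearity. The 0.9 threshold, the helper names, and the data are all hypothetical choices:

```python
from math import sqrt


def pearson_correlation(a, b):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def highly_correlated_pairs(features, threshold=0.9):
    """Return (name_1, name_2, r) for feature pairs with |r| >= threshold."""
    names = list(features)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_correlation(features[names[i]], features[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical features: area_sqft is nearly a rescaled copy of area_sqm.
features = {
    "area_sqm": [50.0, 65.0, 80.0, 95.0, 120.0],
    "area_sqft": [538.0, 700.0, 861.0, 1023.0, 1292.0],
    "age_years": [30.0, 5.0, 18.0, 42.0, 11.0],
}
flagged = highly_correlated_pairs(features)
for name_1, name_2, r in flagged:
    print(f"{name_1} vs {name_2}: r = {r:.3f}")
```

Pairwise correlation only catches relationships between two features at a time; the variance inflation factor (VIF) is a common, more thorough diagnostic.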

### Model Evaluation Metrics

- **$R^2$ (Coefficient of Determination)**

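Measures the proportion of variance in the target that is explained by the model:

$$
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
$$

Here $SS_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares, where $\bar{y}$ is the mean of the actual values.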

- **MAE (Mean Absolute Error)**

Average of the absolute differences between predicted and actual values.

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

- **MSE (Mean Squared Error)**

MSE measures the average of the squared differences between actual and predicted values. It's more sensitive to outliers than MAE.

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

- **RMSE (Root Mean Squared Error)**

RMSE is the square root of the Mean Squared Error. It is expressed in the same unit as the target variable and penalizes larger errors more heavily than MAE.

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$
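
The four metrics above can be computed directly from paired actual and predicted values. A small sketch, assuming a hypothetical `regression_metrics` helper and toy numbers:

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, MSE and RMSE for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    y_bar = sum(y_true) / n
    ss_res = sum(e ** 2 for e in residuals)         # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        # R^2 is undefined when the target is constant (ss_tot == 0).
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mae": sum(abs(e) for e in residuals) / n,
        "mse": mse,
        "rmse": sqrt(mse),
    }


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the sketch gives $R^2 = 0.7$, MAE $= 1.0$, MSE $= 1.5$, and RMSE $\approx 1.22$.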



### Advantages

- Easy to implement and interpret.
- Computationally efficient.
- Works well when assumptions hold.
- Good baseline for regression problems.

### Disadvantages

- Prone to **underfitting** in complex/nonlinear datasets.
- Assumes **linear relationships** only.
- Sensitive to **outliers**.
- Unstable, hard-to-interpret coefficient estimates in the presence of **multicollinearity**.

### Use Cases in Data Science

- **Forecasting** (e.g., sales, temperature, demand)
- **Risk assessment** (e.g., credit scoring)
- **Medical prediction** (e.g., patient health outcomes)
- **Economic modeling** (e.g., housing price estimation)

### Next Steps

- Learn **Regularized Linear Models: Ridge, Lasso, and ElasticNet**
- Use **Residual Plots** and **Q-Q plots** for diagnostic analysis
- Understand the **Bias-Variance tradeoff**
- **Integrate with Pipelines and Cross-validation** for robust ML workflows