# Simple Linear Regression

## The Linear Model

Simple linear regression models the relationship between two variables by fitting a linear equation to the observed data. The general form of this equation is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

Where:
- $y$ is the dependent variable
- $x$ is the independent variable
- $\beta_0$ is the y-intercept
- $\beta_1$ is the slope coefficient
- $\epsilon$ is the error term
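
To make the roles of these terms concrete, the following is a minimal sketch that generates synthetic data from the model. The parameter values, the sample size, and the Gaussian error distribution are assumptions chosen only for illustration; the model itself does not prescribe them.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
beta_0, beta_1 = 2.0, 0.5

rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)                    # independent variable
epsilon = rng.normal(loc=0.0, scale=1.0, size=x.size)  # error term (assumed Gaussian)
y = beta_0 + beta_1 * x + epsilon                 # dependent variable

print(x[:3], y[:3])
```

Removing `epsilon` from `y` recovers the straight line exactly, which is the deterministic part of the model.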

## Loss Function: Mean Squared Error

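The Mean Squared Error (MSE) measures how well the model fits the data as the average squared difference between the observed and predicted values, so lower values indicate a better fit: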

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:
- $n$ is the number of observations
- $y_i$ is the actual value
- $\hat{y}_i$ is the predicted value ($\hat{y}_i = \beta_0 + \beta_1 x_i$)
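
The MSE formula translates almost directly into NumPy. The sketch below uses a hypothetical helper name (`mean_squared_error`), made-up toy data, and candidate parameters that are not fitted values.

```python
import numpy as np

def mean_squared_error(y, y_hat):
    """Average of the squared residuals (y_i - y_hat_i)^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])

beta_0, beta_1 = 1.0, 2.0        # candidate parameters, not fitted
y_hat = beta_0 + beta_1 * x      # predictions: [3, 5, 7, 9]

mse = mean_squared_error(y, y_hat)
print(mse)  # 0.25
```

Only the last point is mispredicted (by 1), so the squared errors sum to 1 and the mean over four observations is 0.25.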

## Parameter Estimation

The parameters that minimize the MSE have a closed-form solution under the Ordinary Least Squares (OLS) method:

$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$

Where:
- $\bar{x}$ is the mean of the independent variable
- $\bar{y}$ is the mean of the dependent variable
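
The two OLS formulas can be sketched as a small function. The name `fit_ols` and the toy data are assumptions for illustration. The guard against a zero denominator is an addition not stated in the formulas: the slope is undefined when every $x_i$ is identical.

```python
import numpy as np

def fit_ols(x, y):
    """Return (beta_0, beta_1) from the closed-form OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()

    s_xy = np.sum((x - x_bar) * (y - y_bar))  # numerator of the slope formula
    s_xx = np.sum((x - x_bar) ** 2)           # denominator of the slope formula
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")

    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

beta_0, beta_1 = fit_ols(x, y)
print(beta_0, beta_1)
```

For this data $\bar{x} = 3$ and $\bar{y} = 4$, the numerator is 6 and the denominator is 10, so the slope is 0.6 and the intercept is $4 - 0.6 \cdot 3 = 2.2$ (up to floating-point rounding). As a cross-check, `np.polyfit(x, y, 1)` should return the same slope and intercept.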

## Model Evaluation

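The coefficient of determination ($R^2$) measures the proportion of the variance in $y$ that is explained by the model. A value of 1 indicates a perfect fit, and a value of 0 means the model does no better than always predicting $\bar{y}$: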

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
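
A minimal sketch of the $R^2$ computation follows. The function name `r_squared` and the data are assumptions for illustration, and the coefficients 2.2 and 0.6 are the OLS fit for this particular toy data set.

```python
import numpy as np

def r_squared(y, y_hat):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    # Note: ss_tot is zero when y is constant, in which case R^2 is undefined
    return 1.0 - ss_res / ss_tot

# Toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_hat = 2.2 + 0.6 * x                      # predictions from the fitted line

r2 = r_squared(y, y_hat)
print(r2)
```

Here the residual sum of squares is 2.4 and the total sum of squares is 6, so $R^2 = 1 - 2.4/6 = 0.6$ (up to floating-point rounding).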

The Root Mean Squared Error (RMSE) is the square root of the MSE, so it expresses the typical prediction error in the same units as $y$:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
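
The RMSE can be sketched in the same way. The function name `root_mean_squared_error` and the values below are made up for illustration.

```python
import numpy as np

def root_mean_squared_error(y, y_hat):
    """Square root of the mean squared residual."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy values, made up for illustration
y = np.array([3.0, 5.0, 7.0, 10.0])
y_hat = np.array([3.0, 5.0, 7.0, 9.0])

rmse = root_mean_squared_error(y, y_hat)
print(rmse)  # 0.5
```

The MSE for these values is 0.25, and its square root is 0.5, expressed in the units of $y$ rather than squared units.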

```python
import numpy as np
import pandas as pd
```