# Simple Linear Regression Notebook

## Introduction

In this notebook, we will learn about Simple Linear Regression, a fundamental technique in machine learning and statistics. Simple Linear Regression helps us understand the relationship between two variables by fitting a linear equation to observed data.



### Key Concepts:
- Simple Linear Regression Equation
- Coefficients (Slope and Intercept)
- Assumptions of Linear Regression
- Evaluation Metrics: Mean Squared Error (MSE), R-squared

Let's dive in!

## 1. Understanding Simple Linear Regression

Simple Linear Regression models the relationship between a single independent variable \( x \) and a dependent variable \( y \) using a linear equation:

\[ y = mx + b \]

- \( y \) is the dependent variable.
- \( x \) is the independent variable.
- \( m \) is the slope of the line.
- \( b \) is the y-intercept.
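
For a single feature, the slope and intercept that minimize the squared error have a closed form: the slope is the covariance of \( x \) and \( y \) divided by the variance of \( x \), and the intercept makes the line pass through the point of means. The following is a minimal sketch in plain Python. The `hours` and `scores` values are made up for illustration and lie exactly on \( y = 10x + 2 \), so the recovered coefficients are easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data (not from exam_scores.csv): points exactly on y = 10x + 2.
hours = [1, 2, 3, 4, 5]
scores = [12, 22, 32, 42, 52]

slope, intercept = fit_simple_linear_regression(hours, scores)
print(slope, intercept)  # 10.0 2.0
```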

## 2. Assumptions of Linear Regression

Linear Regression assumes:
1. **Linearity**: The relationship between \( x \) and \( y \) is linear.
2. **Independence**: Observations are independent of each other.
3. **Homoscedasticity**: The variance of the residuals is constant across all levels of \( x \).
4. **Normality**: Residuals are normally distributed.
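
The homoscedasticity assumption can be given a rough numeric check once a line has been fitted: compare the residual variance for small \( x \) with the residual variance for large \( x \). The sketch below is a crude heuristic and not a formal statistical test. The data are hypothetical, and the helper `fit_line` is written here only for illustration.

```python
from statistics import mean, pvariance


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical data, already sorted by x: y = 10x + 2 with alternating +1/-1 noise.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [13, 21, 33, 41, 53, 61, 73, 81]

slope, intercept = fit_line(hours, scores)
residuals = [y - (slope * x + intercept) for x, y in zip(hours, scores)]

# Compare the residual spread for small x with the spread for large x.
half = len(residuals) // 2
var_low = pvariance(residuals[:half])
var_high = pvariance(residuals[half:])
variance_ratio = var_high / var_low

print("Residual variance (low x): ", var_low)
print("Residual variance (high x):", var_high)
print("Ratio (near 1 suggests constant variance):", variance_ratio)
```

A ratio far from 1 would hint that the spread of the residuals changes with \( x \). In practice a residual plot is the more common way to inspect this.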

## 3. Example: Predicting Exam Scores

Let's consider a dataset containing the number of study hours and corresponding exam scores.

### Python Example



- **Overfitting**: the model performs well on the training data but poorly on the test data.
- **Underfitting**: the model performs poorly on both the training data and the test data.

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [4]:
# Load the dataset
data = pd.read_csv('exam_scores.csv')

In [5]:
# Split data into features (X) and target variable (y)
X = data[['Study Hours']]
y = data['Exam Scores']

In [6]:
m = LinearRegression()  # model object; not the slope m from y = mx + b

- `X`: independent variable (the input feature, Study Hours)
- `y`: dependent variable (the target, i.e. the actual Exam Scores)

In [7]:
m.fit(X,y)  # X,y - Original Data (Train the model)

LinearRegression()

In [8]:
p = pd.DataFrame({"Study Hours": [4, 5, 9]})  # new study-hour values to predict for

Predicted Output

In [9]:
predoutput=m.predict(p)
predoutput

array([41.58688697, 51.36269036, 90.46590392])

In [10]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
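
# The idea behind a shuffled 80/20 split can be sketched in plain Python.
# This is an illustrative sketch only: `simple_train_test_split` and the `rows`
# data are hypothetical, and it will not select the same rows as scikit-learn.

```python
import math
import random


def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = math.ceil(test_size * len(rows))  # assumption: round the test size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Hypothetical (study hours, exam score) pairs, not taken from exam_scores.csv.
rows = [(1, 12), (2, 22), (3, 32), (4, 42), (5, 52),
        (6, 62), (7, 72), (8, 82), (9, 92), (10, 102)]

train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))  # 8 2
```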

In [11]:
# Create Linear Regression model
model = LinearRegression()

In [12]:
# Train the model
model.fit(X_train, y_train)

LinearRegression()

In [15]:
model.intercept_  # intercept (b); fitted line: y = 9.682*x + 2.827

2.826892353899737

In [17]:
model.coef_  # slope (m) in y = mx + b

array([9.68207815])

In [25]:
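# Manual prediction for 2.5 study hours using y = m*x + b
# (stored in y_manual so the target Series y is not overwritten)
y_manual = (9.68207815 * 2.5) + 2.826892353899737
y_manual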

27.03208772889974

In [23]:
y_test

8     81
16    30
0     21
23    76
11    62
Name: Exam Scores, dtype: int64

In [10]:
X_test

| | Study Hours |
|---|---|
| 8 | 8.3 |
| 16 | 2.5 |
| 0 | 2.5 |
| 23 | 6.9 |
| 11 | 5.9 |


In [26]:
# Make predictions
y_pred = model.predict(X_test)

In [27]:
y_pred

array([83.18814104, 27.03208774, 27.03208774, 69.63323162, 59.95115347])
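
Each of these predictions is just the fitted line evaluated at the corresponding study hours. As a quick check, the sketch below recomputes them from the slope and intercept printed earlier in this notebook. The helper `predict` is defined here for illustration and is not part of scikit-learn.

```python
# Coefficients as reported by model.coef_ and model.intercept_ in this notebook.
slope = 9.68207815
intercept = 2.826892353899737

# Study hours in X_test and the corresponding values in y_pred.
study_hours = [8.3, 2.5, 2.5, 6.9, 5.9]
reported_predictions = [83.18814104, 27.03208774, 27.03208774, 69.63323162, 59.95115347]


def predict(hours):
    """Evaluate the fitted line y = m*x + b for one input."""
    return slope * hours + intercept


manual_predictions = [predict(h) for h in study_hours]

for hours, manual, reported in zip(study_hours, manual_predictions, reported_predictions):
    # Any difference is limited to rounding in the printed coefficients.
    print(f"{hours} h -> manual {manual:.4f}, model {reported:.4f}")
```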

In [20]:
# Residuals: actual score minus predicted score
R = y_test - y_pred
R

8    -2.188141
16    2.967912
0    -6.032088
23    6.366768
11    2.048847
Name: Exam Scores, dtype: float64

In [21]:
# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 18.943211722315272
R-squared: 0.9678055545167994


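- R-squared close to 1: the model explains most of the variance in the target (good fit). The value of about 0.97 obtained here falls in this range.
- R-squared close to 0: the model predicts little better than always guessing the mean of the target (poor fit).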
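
Both metrics can be recomputed by hand from the residuals, which makes their definitions concrete. The sketch below uses the five actual and predicted test values printed earlier in this notebook. The variable names (`y_true`, `y_hat`, `mse_manual`, `r2_manual`) are chosen here for illustration.

```python
# Actual and predicted exam scores for the five test rows shown in this notebook.
y_true = [81, 30, 21, 76, 62]
y_hat = [83.18814104, 27.03208774, 27.03208774, 69.63323162, 59.95115347]

n = len(y_true)
residuals = [actual - predicted for actual, predicted in zip(y_true, y_hat)]

# MSE: average of the squared residuals.
ss_res = sum(r ** 2 for r in residuals)
mse_manual = ss_res / n

# R-squared: 1 minus (residual sum of squares / total sum of squares around the mean).
y_mean = sum(y_true) / n
ss_tot = sum((actual - y_mean) ** 2 for actual in y_true)
r2_manual = 1 - ss_res / ss_tot

print("Mean Squared Error:", round(mse_manual, 4))  # 18.9432
print("R-squared:", round(r2_manual, 4))            # 0.9678
```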