## Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.
Simple Linear Regression (SLR) is a statistical technique used to model the relationship between **one independent variable (X)** and **one dependent variable (Y)**. It fits the straight line that best represents how Y changes as X changes.

**Purpose of SLR:**
- To predict the value of Y based on X.
- To analyze the strength and direction of their relationship.
- To understand how much Y changes when X changes by one unit.

## Question 2: What are the key assumptions of Simple Linear Regression?
1. **Linearity** – The relationship between X and Y is linear.
2. **Independence** – Observations are independent of each other.
3. **Homoscedasticity** – Constant variance of errors across all levels of X.
4. **Normality of errors** – The error terms are normally distributed (checked in practice by examining the residuals).
5. **No multicollinearity** – Not applicable to SLR, since there is only one X variable.
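In practice, these assumptions are checked by examining the residuals of a fitted line. The sketch below is an informal illustration only. It assumes the sample "hours studied" data from Question 9 and uses `np.polyfit` for the fit. It prints each residual against X (to look for patterns that would suggest non-linearity) and compares the residual spread for low versus high X as a rough homoscedasticity check. It is not a formal statistical test. Independence is usually judged from how the data was collected, and normality is usually judged with a normal Q-Q plot of the residuals.

```python
import numpy as np

# Sample data (same "hours studied" vs "exam score" values as in Question 9)
X = np.array([1, 2, 3, 4, 5, 7, 8, 10], dtype=float)
y = np.array([50, 55, 62, 68, 70, 80, 88, 95], dtype=float)

# Fit a straight line; polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(X, y, deg=1)

# Residuals = actual Y - predicted Y
fitted = intercept + slope * X
residuals = y - fitted

# Linearity: residuals should show no systematic pattern against X
for x_val, r in zip(X, residuals):
    print(f"X = {x_val:4.1f}   residual = {r:+.2f}")

# Rough homoscedasticity check: compare residual spread for low vs high X
median_x = np.median(X)
low = residuals[X <= median_x]
high = residuals[X > median_x]
print(f"Residual std (low X):  {low.std(ddof=1):.2f}")
print(f"Residual std (high X): {high.std(ddof=1):.2f}")
```

If the two spreads differed greatly, or the residuals curved upward or downward as X increased, that would suggest a violated assumption.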

## Question 3: Mathematical Equation of Simple Linear Regression
The simple linear regression model is:

\[ Y = b_0 + b_1X + \varepsilon \]

Where:
- **Y** = Dependent variable (output)
- **X** = Independent variable (input)
- **b₀** = Intercept (value of Y when X = 0)
- **b₁** = Slope (change in Y for each 1-unit increase in X)
- **ε** = Error term
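The sketch below makes each symbol concrete by generating synthetic data directly from the equation. It uses hypothetical "true" values b₀ = 40 and b₁ = 5, with normally distributed errors ε of standard deviation 3. All of these values are assumptions chosen for illustration. A least-squares fit with `np.polyfit` should then recover estimates close to the true values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Hypothetical "true" parameters, chosen only for illustration
b0_true, b1_true = 40.0, 5.0

n = 200
X = rng.uniform(0, 10, size=n)      # independent variable
eps = rng.normal(0, 3, size=n)      # error term: mean 0, standard deviation 3
Y = b0_true + b1_true * X + eps     # Y = b0 + b1*X + ε

# Estimate the line from the noisy data (polyfit returns [slope, intercept])
b1_hat, b0_hat = np.polyfit(X, Y, deg=1)
print(f"Estimated intercept b0: {b0_hat:.2f}  (true {b0_true})")
print(f"Estimated slope b1:     {b1_hat:.2f}  (true {b1_true})")
```

The estimates will not match the true values exactly, because of ε. Rerunning with a different seed gives slightly different estimates, and that scatter is the effect of the error term.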

## Question 4: Real-world Example of Simple Linear Regression
A very common and intuitive real-world example of Simple Linear Regression is predicting a student's exam score based on the number of hours they studied.

**Example: Predicting Exam Score from Hours Studied**
- **Independent Variable (X):** Hours of study
- **Dependent Variable (Y):** Exam score (out of 100)

**How the data is used**

We collect data from many students:
- How many hours each student studied
- What exam score each student achieved

When we plot this data on a graph, we usually see a positive trend: students who study more hours generally score higher.

**How SLR is applied in this example**
1. **Finding the best-fit line:** Simple Linear Regression identifies the straight line that best fits the data points.
2. **Quantifying the relationship:** The model might show something like “For every additional hour of study, the exam score increases by about 5 points.” This gives a numerical measure of how strongly exam performance is associated with study time.
3. **Making predictions:** If a new student studies for 7 hours, the model can predict their likely score. For example: “A student who studies 7 hours is predicted to score around 80.”
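The three steps can be sketched in a few lines of NumPy. This sketch assumes the same sample data used in Question 9 (not real student data) and uses `np.polyfit` to find the best-fit line. The predicted score for a new student who studies 7 hours comes out at about 81, close to the "around 80" figure in the example.

```python
import numpy as np

# Sample data (same values as in Question 9): hours studied and exam scores
hours = np.array([1, 2, 3, 4, 5, 7, 8, 10], dtype=float)
scores = np.array([50, 55, 62, 68, 70, 80, 88, 95], dtype=float)

# Step 1: find the best-fit line (polyfit returns [slope, intercept])
slope, intercept = np.polyfit(hours, scores, deg=1)

# Step 2: quantify the relationship
print(f"Each extra hour of study adds about {slope:.2f} points")

# Step 3: predict the score of a new student who studies 7 hours
new_hours = 7
predicted = intercept + slope * new_hours
print(f"Predicted score for {new_hours} hours: {predicted:.1f}")
```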

## Question 5: What is the method of Least Squares?
The **Method of Least Squares** is the standard technique used to find the best-fitting line in a linear regression model.

**1. What the method tries to do**
- We have a set of data points (X, Y).
- We want to draw a straight line that predicts Y from X:

\[ \hat{Y} = b_0 + b_1X \]

Every line we draw will have some errors (residuals).

**2. What is a residual?**
- Residual = Actual Y − Predicted Y
- It is the vertical distance between a data point and the regression line.
- Some residuals are positive (point above the line), and some are negative (point below the line).

**3. Why we square the errors**
- If we add residuals directly, positive and negative values cancel out.
- Squaring them makes every error non-negative and penalizes large errors more than small ones.
- Adding up the squared errors gives the Sum of Squared Errors (SSE).

**4. Sum of Squared Errors (SSE)**

\[ SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

SSE represents the total error of the line.

**5. Goal of the method of least squares**

The method finds the values of b₀ (intercept) and b₁ (slope) that produce the smallest possible SSE. The line with the lowest SSE is considered the best-fit line, because it is the closest to all the data points overall, measured by squared vertical distance.
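For simple linear regression, minimizing SSE has a well-known closed-form solution: b₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² and b₀ = Ȳ − b₁X̄. The sketch below implements these two formulas with NumPy on the sample data from Question 9. It then shows that moving the slope away from the least-squares value increases SSE. The helper name `sse` is hypothetical and is used only for this illustration.

```python
import numpy as np

# Sample data (same "hours studied" vs "exam score" values as in Question 9)
X = np.array([1, 2, 3, 4, 5, 7, 8, 10], dtype=float)
y = np.array([50, 55, 62, 68, 70, 80, 88, 95], dtype=float)

def sse(b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * X (hypothetical helper)."""
    residuals = y - (b0 + b1 * X)
    return np.sum(residuals ** 2)

# Closed-form least-squares estimates
x_mean, y_mean = X.mean(), y.mean()
b1 = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
b0 = y_mean - b1 * x_mean
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {sse(b0, b1):.4f}")

# Any other line has a larger SSE: try slightly different slopes,
# keeping each alternative line passing through the point of means
for delta in (-0.5, 0.5):
    b1_alt = b1 + delta
    b0_alt = y_mean - b1_alt * x_mean
    print(f"b1 = {b1_alt:.4f} -> SSE = {sse(b0_alt, b1_alt):.4f}")
```

The computed intercept and slope (about 45.85 and 5.03) match the scikit-learn results in Question 9, because `LinearRegression` also minimizes SSE.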

## Question 6: What is Logistic Regression? How is it different from Linear Regression?
**Logistic Regression** is a classification algorithm used when the output variable is **categorical**, usually binary (0 or 1).

**Differences:**
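- Linear Regression predicts **continuous values**; Logistic Regression predicts **probabilities** of belonging to a class.
- Linear Regression fits a straight line; Logistic Regression passes a linear combination of the inputs through the **sigmoid curve**.
- Linear Regression outputs are unbounded, while Logistic Regression outputs values between **0 and 1**, which are typically converted into class labels with a threshold such as 0.5.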
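The sigmoid function σ(z) = 1 / (1 + e^(−z)) turns the straight-line output b₀ + b₁X into a probability between 0 and 1. The sketch below uses hypothetical coefficients b₀ = −4 and b₁ = 1 for a pass/fail-versus-hours-studied example. These coefficients are made up for illustration and are not fitted to data. The sketch then applies the common 0.5 threshold to turn probabilities into class labels. In practice, the coefficients would be estimated from data, for example with scikit-learn's `LogisticRegression`.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, NOT fitted to data: pass/fail vs hours studied
b0, b1 = -4.0, 1.0

hours = np.array([0, 2, 4, 6, 8], dtype=float)
z = b0 + b1 * hours                    # linear part, same form as linear regression
p_pass = sigmoid(z)                    # probability of class 1 ("pass")
labels = (p_pass >= 0.5).astype(int)   # threshold at 0.5 to get 0/1 classes

for h, p, c in zip(hours, p_pass, labels):
    print(f"hours = {h:.0f}  P(pass) = {p:.3f}  predicted class = {c}")
```

At 4 hours the linear part is exactly 0, so the probability is exactly 0.5. That point is the decision boundary for these coefficients.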

## Question 7: Three Common Evaluation Metrics for Regression
1. **MAE (Mean Absolute Error):** Average of the absolute differences between predicted and actual values.
2. **MSE (Mean Squared Error):** Average of the squared differences; squaring penalizes large errors more heavily.
3. **RMSE (Root Mean Squared Error):** Square root of MSE; it is expressed in the original units of Y, which makes it easier to interpret.
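The three metrics can be computed directly from their definitions. The sketch below uses a small made-up set of actual and predicted values so that the results can be checked by hand. The scikit-learn functions `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` compute the same MAE and MSE.

```python
import numpy as np

# Small made-up example: actual vs predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred          # [1, 0, -2]

mae = np.mean(np.abs(errors))     # (1 + 0 + 2) / 3 = 1.0
mse = np.mean(errors ** 2)        # (1 + 0 + 4) / 3, about 1.667
rmse = np.sqrt(mse)               # about 1.291, in the same units as y

print(f"MAE  = {mae:.3f}")
print(f"MSE  = {mse:.3f}")
print(f"RMSE = {rmse:.3f}")
```

The single error of size 2 contributes 4 of the 5 units in the squared total. This is why MSE and RMSE react more strongly than MAE to large errors.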

## Question 8: Purpose of R-squared in Regression
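R-squared (R²), also called the **Coefficient of Determination**, measures how well a regression model explains the variation in the dependent variable.

**1. What R-squared measures**
- It tells you what proportion (percentage) of the variation in Y is explained by X.
- For a least-squares line with an intercept, evaluated on the data it was fitted to, the value ranges from 0 to 1 (0% to 100%). On new data, or for a model without an intercept, it can even be negative.

**2. How to interpret R-squared**
- **R² = 1.0 (100%):** The model explains all the variability. Perfect fit.
- **R² = 0.60 (60%):** The model explains 60% of the variation in Y. The remaining 40% is left unexplained (other factors or random error).
- **R² = 0.0 (0%):** The model explains none of the variation. The fitted line does no better than always predicting the mean of Y, so there is no linear relationship between X and Y in the data.

**3. Purpose of R-squared**
- Shows how well the model fits the data.
- Helps evaluate how much of Y's behavior is captured by the regression model.
- Complements error metrics such as MAE or MSE. R² is a unitless, relative measure, while those metrics are expressed in the units of Y (or its square).

**4. Important notes**
- A high R² does not automatically mean a good model. It does not confirm that the regression assumptions hold, and it does not guarantee good predictions on new data.
- R² never decreases when more predictors are added, so it should not be used to compare models with different numbers of predictors.
- In such cases, **Adjusted R-squared**, which penalizes extra predictors, is preferred.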
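R² is computed as 1 − SSE/SST. Here, SSE is the sum of squared residuals, and SST = Σ(Yᵢ − Ȳ)² is the total variation of Y around its mean. The sketch below applies this formula to the sample data from Question 9. It also confirms that, for simple linear regression with an intercept, R² equals the squared Pearson correlation between X and Y. The adjusted R² line uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), with p = 1 predictor.

```python
import numpy as np

# Sample data (same "hours studied" vs "exam score" values as in Question 9)
X = np.array([1, 2, 3, 4, 5, 7, 8, 10], dtype=float)
y = np.array([50, 55, 62, 68, 70, 80, 88, 95], dtype=float)

slope, intercept = np.polyfit(X, y, deg=1)
y_hat = intercept + slope * X

sse = np.sum((y - y_hat) ** 2)        # unexplained variation
sst = np.sum((y - y.mean()) ** 2)     # total variation around the mean
r2 = 1 - sse / sst

# In SLR with an intercept, R^2 equals the squared correlation of X and Y
r = np.corrcoef(X, y)[0, 1]

# Adjusted R^2 with n observations and p predictors (p = 1 for SLR)
n, p = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R^2          = {r2:.4f}")
print(f"r^2          = {r ** 2:.4f}")
print(f"Adjusted R^2 = {adj_r2:.4f}")
```

For this data, R² is about 0.99, meaning hours studied accounts for nearly all of the variation in the sample scores. Adjusted R² is slightly lower because it charges a small penalty for the predictor.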

## Question 9: Python Code for Simple Linear Regression
Below is the Python code using scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Create some sample data
# Let's use the "hours studied" vs "exam score" example
# X = hours_studied, y = exam_score
# .reshape(-1, 1) turns X into a 2-D column of shape (n_samples, 1), as scikit-learn expects
X = np.array([1, 2, 3, 4, 5, 7, 8, 10]).reshape(-1, 1)
y = np.array([50, 55, 62, 68, 70, 80, 88, 95])

# 2. Create the Linear Regression model object
# This is like creating a "blank" (untrained) model
model = LinearRegression()

# 3. Fit the model to the data
# This is the "training" step. The model "learns" the best
# intercept (b0) and slope (b1) from the data (X, y).
model.fit(X, y)

# 4. Get the learned parameters
# The 'intercept_' attribute stores b0
intercept = model.intercept_

# The 'coef_' attribute is an array of slopes; for SLR it has a single entry
slope = model.coef_[0]

# 5. Print the slope and intercept
print("--- Simple Linear Regression Results ---")
print(f"Intercept (b0): {intercept}")
print(f"Slope (b1): {slope}")
print("----------------------------------------")
print(f"\nModel Equation: y = {intercept:.2f} + {slope:.2f} * X")
```

Output:

```
--- Simple Linear Regression Results ---
Intercept (b0): 45.85294117647059
Slope (b1): 5.029411764705882
----------------------------------------

Model Equation: y = 45.85 + 5.03 * X
```


## Question 10: Interpretation of Coefficients in SLR

In a simple linear regression model, the fitted equation is:

\[ \hat{Y} = b_0 + b_1X \]

There are two important coefficients.

**1. Intercept (b₀)**
- This is the predicted value of Y when X = 0.
- It represents the starting level or baseline value.
- Sometimes it has a real meaning (example: predicted exam score with 0 hours of study).
- Sometimes it does NOT have a real meaning (example: predicted price for a house with 0 square feet), especially when X = 0 lies far outside the range of the observed data.

**2. Slope (b₁)**
- This tells us how much the predicted Y changes when X increases by 1 unit.
- Its sign shows the direction of the relationship, and its magnitude shows how steeply Y changes with X. Because the slope depends on the units of X and Y, its size alone does not measure how strong (tight) the relationship is.
- Example: If b₁ = 4.87, then for every extra hour of study, the predicted exam score increases by 4.87 points.

**3. Direction of relationship**
- If b₁ is positive: Y increases when X increases.
- If b₁ is negative: Y decreases when X increases.
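Both interpretations can be checked numerically. The sketch below reuses the sample data from Question 9, fitted with `np.polyfit`, which gives the same least-squares line as scikit-learn. It shows that the prediction at X = 0 equals b₀. It also shows that raising X by one unit, from any starting value, changes the prediction by exactly b₁. The `predict` helper is a hypothetical name used only here.

```python
import numpy as np

# Sample data (same "hours studied" vs "exam score" values as in Question 9)
X = np.array([1, 2, 3, 4, 5, 7, 8, 10], dtype=float)
y = np.array([50, 55, 62, 68, 70, 80, 88, 95], dtype=float)

b1, b0 = np.polyfit(X, y, deg=1)   # polyfit returns [slope, intercept]

def predict(x):
    """Predicted Y from the fitted line (hypothetical helper)."""
    return b0 + b1 * x

# Intercept: the prediction when X = 0
print(f"b0 = {b0:.2f}, predict(0) = {predict(0):.2f}")

# Slope: a 1-unit increase in X changes the prediction by b1, wherever we start
for x in (2, 5, 9):
    change = predict(x + 1) - predict(x)
    print(f"predict({x + 1}) - predict({x}) = {change:.2f}  (b1 = {b1:.2f})")

# Direction: a positive slope means Y tends to increase with X
print("Direction:", "positive" if b1 > 0 else "negative")
```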