#1 Simple Linear Regression (SLR)

1. Definition

    Simple Linear Regression is a statistical technique used to study the relationship between one independent variable (X) and one dependent variable (Y).
    It assumes a linear relationship between the two variables.

2. Mathematical Form

   The model is expressed as:

     Y = β₀ + β₁X + ε

   (β₀): Intercept (value of Y when X = 0)

   (β₁): Slope (change in Y for a one-unit change in X)

   (ε): Random error term

3. Purpose of SLR

   To Measure Relationship: Determines the direction and strength of the linear association between X and Y.
   To Estimate Impact: Quantifies how much Y changes when X changes.
   To Make Predictions: Uses the fitted regression equation to predict values of Y for given values of X.
   To Test Significance: Helps assess whether the relationship between variables is statistically significant.

4. Applications

   Used in economics, business, social sciences, and scientific research for analyzing trends and supporting data-driven decisions.
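
The "To Test Significance" purpose listed above can be sketched with a t-statistic for the slope. This is only a minimal illustration, assuming the five-point sample data from the Python cell later in this notebook; the variable names are chosen here for readability. It computes t = β̂₁ / SE(β̂₁), which would then be compared with a t distribution on n − 2 degrees of freedom.

```python
import numpy as np

# Sample data (same five points as the SLR code cell in this notebook).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = x.size

# Least-squares estimates of slope and intercept.
sxx = np.sum((x - x.mean()) ** 2)
slope = np.sum((x - x.mean()) * (y - y.mean())) / sxx
intercept = y.mean() - slope * x.mean()

# Residual variance estimate uses n - 2 degrees of freedom
# because two parameters (intercept and slope) were estimated.
residuals = y - (intercept + slope * x)
residual_variance = np.sum(residuals ** 2) / (n - 2)

# Standard error of the slope and the t-statistic for H0: slope = 0.
slope_se = np.sqrt(residual_variance / sxx)
t_stat = slope / slope_se

print(f"slope = {slope:.3f}, SE = {slope_se:.3f}, t = {t_stat:.3f}")
```

With only five observations, the resulting t of about 2.12 is below the two-sided 5% critical value of roughly 3.18 for 3 degrees of freedom, so this tiny sample on its own would not establish a significant relationship.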



#2. Simple Linear Regression is based on the following key assumptions:

1. Linearity

   There exists a linear relationship between the independent variable (X) and the dependent variable (Y).
   The expected value of Y is a linear function of X.

2. Independence of Errors

   The residuals (error terms) are independent of each other.
   The error for one observation does not influence the error for another.

3. Homoscedasticity (Constant Variance)

   The variance of the error terms remains constant across all levels of the independent variable.
   In other words, the spread of residuals is the same for all predicted values.

4. Normality of Errors

   The error terms are normally distributed.
   This assumption is particularly important for hypothesis testing and confidence interval estimation.

5. No Perfect Multicollinearity

   In Simple Linear Regression, there is only one independent variable, so perfect multicollinearity is not applicable.
   However, the independent variable should exhibit variation (it should not be constant).

When these assumptions hold, the least-squares estimators are unbiased and have the smallest variance among linear unbiased estimators. The normality assumption additionally makes the usual t-tests and confidence intervals valid in small samples.
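
The assumptions above can be examined informally through the residuals of a fitted line. The sketch below is an assumption-laden illustration, not a formal test: the data are simulated (so the assumptions hold by construction), and the three summary numbers are rough indicators whose names are made up for this example.

```python
import numpy as np

# Hypothetical simulated data that satisfies the assumptions by construction.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line; np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Homoscedasticity: residual spread in the lower and upper halves of x
# should be similar, so this ratio should be near 1.
half = x.size // 2
spread_ratio = residuals[half:].std() / residuals[:half].std()

# Independence: correlation between consecutive residuals should be near 0.
lag1_corr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Normality (rough check): sample skewness of the residuals should be near 0.
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)

print(f"spread ratio: {spread_ratio:.2f}")
print(f"lag-1 correlation: {lag1_corr:.2f}")
print(f"skewness: {skewness:.2f}")
```

In practice these checks are usually done with plots (residuals against fitted values, a Q-Q plot) rather than single numbers.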


#3. The mathematical equation for a Simple Linear Regression (SLR) model is:

     Yᵢ = β₀ + β₁Xᵢ + εᵢ

where i indexes the observations.

Explanation of each term:

1. (Yᵢ) (Dependent Variable)

   Represents the observed value of the response variable for the ith observation.
   It is the variable we aim to explain or predict.

2. (Xᵢ) (Independent Variable)

   Represents the value of the predictor variable for the ith observation.
   It is the variable used to explain changes in the dependent variable.

3. (β₀) (Intercept)

   The expected value of (Y) when (X = 0).
   It indicates where the regression line crosses the Y-axis.

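4. (β₁) (Slope Coefficient)

   Represents the change in the expected value of (Y) for a one-unit increase in (X).
   Its sign indicates the direction of the linear relationship (positive or negative) and its magnitude indicates how steeply Y changes with X. The strength of the relationship is measured by the correlation coefficient or R², not by the slope alone.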

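5. (εᵢ) (Error Term)

   Captures the difference between the observed value (Yᵢ) and its expected value β₀ + β₁Xᵢ on the true regression line. Its sample counterpart, the residual, is the difference between Yᵢ and the fitted value Ŷᵢ.
   It accounts for other factors affecting (Y) that are not included in the model.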

Thus, the equation expresses the dependent variable as a linear function of the independent variable plus a random error component.
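
To make the roles of the terms concrete, the following sketch generates data directly from the equation and then estimates the coefficients back from that data. The "true" values of β₀ and β₁, the noise level, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" parameters of the data-generating model.
beta0, beta1 = 3.0, 2.0

# X values, random errors, and Y built exactly as Y = b0 + b1*X + error.
x = rng.uniform(0, 10, size=500)
eps = rng.normal(0.0, 1.0, size=500)
y = beta0 + beta1 * x + eps

# Least-squares fit; np.polyfit returns [slope, intercept] for deg=1.
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

print(f"true intercept {beta0}, estimated {b0_hat:.3f}")
print(f"true slope     {beta1}, estimated {b1_hat:.3f}")
```

The estimates land close to, but not exactly on, the assumed values because the error term adds random scatter around the line.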


#4 One common real-world application of Simple Linear Regression is in predicting house prices.

Example: House Price Prediction

Suppose a real estate analyst wants to estimate the price of a house based on its size (measured in square feet).

 Dependent Variable (Y): House price
 Independent Variable (X): Size of the house in square feet

The analyst collects data from recently sold houses and fits a simple linear regression model:

Price = β₀ + β₁(Size) + ε

Interpretation:

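 The intercept (β₀) represents the estimated price when the size is zero. No house has zero size, so this value lies outside the range of the data and serves mainly to position the line rather than as a realistic price.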
 The slope (β₁) represents how much the price is expected to increase for each additional square foot.
 The error term (ε) accounts for other factors affecting price, such as location, number of bedrooms, age of the property, and market conditions.

Purpose:

The model helps understand the relationship between house size and price and allows the analyst to predict the expected price of a house given its size.

This example illustrates how Simple Linear Regression can be used in business and economic decision-making to estimate outcomes based on a single predictor variable.
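
A minimal sketch of this example follows. The sizes and prices are made-up numbers used only to show the mechanics; they are not real market data, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical data: house size in square feet and sale price.
size_sqft = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price = np.array([200_000, 270_000, 330_000, 410_000, 470_000], dtype=float)

# Fit Price = b0 + b1 * Size by least squares.
# np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(size_sqft, price, deg=1)

# Predict the expected price of a house of a given size.
new_size = 1800.0
predicted_price = intercept + slope * new_size

print(f"Estimated price per extra square foot: {slope:.2f}")
print(f"Estimated intercept: {intercept:.2f}")
print(f"Predicted price for {new_size:.0f} sq ft: {predicted_price:.2f}")
```

For this made-up data the fitted slope is 136 per square foot, so the prediction for 1,800 square feet falls between the observed prices of the 1,500 and 2,000 square foot houses.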


#5 The method of least squares is a mathematical technique used in linear regression to estimate the best-fitting line through a set of data points.

In Simple Linear Regression, the objective is to determine the values of the intercept (β₀) and slope (β₁) that produce the line which best represents the relationship between the independent variable (X) and the dependent variable (Y).

Each observation has a residual (error), defined as:

Residual = Actual value − Predicted value

The method of least squares follows these steps:

1. Compute the residual for each observation.
2. Square each residual to eliminate negative signs and assign greater weight to larger errors.
3. Sum all the squared residuals.
4. Select the values of β₀ and β₁ that minimize this total sum.

Mathematically, the method minimizes:

∑ (Yᵢ − Ŷᵢ)²

where
Yᵢ = observed value
Ŷᵢ = predicted value

Thus, the method of least squares determines the regression line that minimizes the total squared prediction error, providing the best linear fit to the data.
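
The four steps above can be sketched as follows, using the five-point sample data from the Python cell later in this notebook. For simple linear regression the minimizing values have a closed form, β̂₁ = ∑(Xᵢ − X̄)(Yᵢ − Ȳ) / ∑(Xᵢ − X̄)² and β̂₀ = Ȳ − β̂₁X̄. The function name and the small perturbations are illustrative choices.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

def sum_squared_residuals(b0, b1):
    """Steps 1-3: compute residuals, square them, and sum them."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Step 4: closed-form values of the slope and intercept that minimize the sum.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

best = sum_squared_residuals(b0_hat, b1_hat)
print(f"b0 = {b0_hat:.2f}, b1 = {b1_hat:.2f}, SSE = {best:.2f}")

# Nudging either coefficient away from the solution increases the sum.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    worse = sum_squared_residuals(b0_hat + db0, b1_hat + db1)
    print(f"shift ({db0:+.2f}, {db1:+.2f}) -> SSE = {worse:.3f}")
```

The estimates match the slope (0.6) and intercept (2.2) printed by the scikit-learn cell in this notebook.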


#6. Logistic Regression is a statistical classification technique used to model the probability of a binary outcome based on one or more independent variables. The dependent variable in logistic regression is categorical, typically taking values such as 0 and 1 (for example, pass/fail, yes/no, disease/no disease).

Instead of predicting a continuous value, logistic regression predicts the probability that an observation belongs to a particular class. It uses the logistic (sigmoid) function to transform a linear combination of predictors into a value between 0 and 1:

p = 1 / (1 + e^(−(β₀ + β₁X)))

Where:

p = probability of the event occurring
β₀ = intercept
β₁ = coefficient of the independent variable
X = independent variable
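
The formula above can be sketched in a few lines. The coefficients below are hypothetical values for a pass/fail outcome against hours studied, chosen only so the curve crosses p = 0.5 at 4 hours; the function name is also an illustrative choice.

```python
import math

def predict_probability(x, beta0, beta1):
    """Logistic (sigmoid) transform of the linear predictor beta0 + beta1 * x."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (not estimated from any data).
beta0, beta1 = -4.0, 1.0

for hours in (0, 2, 4, 6, 8):
    p = predict_probability(hours, beta0, beta1)
    # The log-odds log(p / (1 - p)) recovers the linear predictor.
    log_odds = math.log(p / (1.0 - p))
    print(f"hours={hours}: p={p:.3f}, log-odds={log_odds:.2f}")
```

The probabilities stay strictly between 0 and 1, while the log-odds increase by β₁ for each additional unit of X.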

Differences between Logistic Regression and Linear Regression:

1. Type of Dependent Variable

   Linear Regression: Continuous outcome (for example, income, temperature).
   Logistic Regression: Categorical (usually binary) outcome.

2. Output

   Linear Regression: Produces a numeric value.
   Logistic Regression: Produces a probability between 0 and 1.

3. Model Form

   Linear Regression: Fits a straight line.
   Logistic Regression: Uses an S-shaped (sigmoid) curve to model probabilities.

4. Estimation Method

   Linear Regression: Uses the method of least squares.
   Logistic Regression: Uses maximum likelihood estimation.

5. Interpretation

   Linear Regression: Coefficients represent change in Y for a one-unit change in X.
   Logistic Regression: Coefficients represent change in the log-odds of the outcome for a one-unit change in X.

Thus, linear regression is used for predicting continuous values, whereas logistic regression is primarily used for classification problems involving categorical outcomes.


#7. Three common evaluation metrics for regression models are:

1. Mean Absolute Error (MAE)

    MAE measures the average of the absolute differences between the actual values and the predicted values.
    It tells us, on average, how much the predictions deviate from the true values, without considering direction.
    Formula:
     MAE = (1/n) ∑ |Yᵢ − Ŷᵢ|
    It is easy to interpret because it is expressed in the same units as the dependent variable.

2. Mean Squared Error (MSE)

   MSE measures the average of the squared differences between actual and predicted values.
   Squaring the errors penalizes larger errors more heavily than smaller ones.
   Formula:
     MSE = (1/n) ∑ (Yᵢ − Ŷᵢ)²
   It is useful when large errors are particularly undesirable.

3. R-squared (R²)

   R² measures the proportion of variance in the dependent variable that is explained by the model.
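   For a least-squares fit with an intercept, evaluated on the data used for fitting, it ranges from 0 to 1. On new data it can be negative when the model predicts worse than simply using the mean of Y.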
   A higher R² indicates that the model explains a greater portion of the variability in the data.
   It provides an overall measure of goodness of fit.

These metrics help assess prediction accuracy and model performance from different perspectives.
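
The three metrics can be computed directly from their definitions. The sketch below uses made-up actual and predicted values purely for illustration, and computes R² as 1 − ∑(Yᵢ − Ŷᵢ)² / ∑(Yᵢ − Ȳ)²; the function name is an illustrative choice.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R-squared) for actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                        # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.0]

mae, mse, r2 = regression_metrics(actual, predicted)
print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

Here the single error of size 1 contributes most of the MSE, which shows how squaring weights larger errors more heavily than MAE does.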


#8. The purpose of the R-squared (R²) metric in regression analysis is to measure how well the regression model explains the variability in the dependent variable.

R-squared represents the proportion of the total variation in the dependent variable (Y) that is explained by the independent variable(s) included in the model.

It is calculated as:

R² = Explained Variation / Total Variation

For a least-squares model with an intercept, its value on the fitted data ranges between 0 and 1.

* R² = 0 indicates that the model does not explain any of the variability in the dependent variable.
* R² = 1 indicates that the model perfectly explains all the variability in the dependent variable.

For example, if R² = 0.85, this means that 85% of the variation in the dependent variable is explained by the regression model, while the remaining 15% is due to other factors or random error.

Thus, the primary purpose of R-squared is to evaluate the goodness of fit of a regression model and to assess how well the model captures the underlying relationship between variables.
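
The ratio of explained to total variation can be worked through on the five-point sample data from the Python cell in this notebook. This sketch assumes a least-squares line with an intercept, for which total variation splits exactly into explained and unexplained parts; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Least-squares line; np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

total_variation = np.sum((y - y.mean()) ** 2)        # sum of (Y - mean)^2
explained_variation = np.sum((y_hat - y.mean()) ** 2)  # sum of (fitted - mean)^2
unexplained_variation = np.sum((y - y_hat) ** 2)     # sum of squared residuals

r_squared = explained_variation / total_variation

print(f"total = {total_variation:.2f}")
print(f"explained = {explained_variation:.2f}")
print(f"unexplained = {unexplained_variation:.2f}")
print(f"R^2 = {r_squared:.2f}")
```

For this data the explained variation is 3.6 out of a total of 6, so R² = 0.6: the fitted line accounts for 60% of the variation in Y.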


#9. SLR code in Python

# Import required libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (Independent variable X and Dependent variable y)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)


Slope (Coefficient): 0.6000000000000002
Intercept: 2.1999999999999993


#10. In a Simple Linear Regression model:

Y = β₀ + β₁X + ε

the coefficients β₀ (intercept) and β₁ (slope) have specific interpretations.

1. Intercept (β₀)

   The intercept represents the expected value of the dependent variable (Y) when the independent variable (X) is equal to zero.

   In practical terms, it is the baseline or starting value of Y before the effect of X is considered.

   Example:
   If we model exam marks based on study hours and the intercept is 40, this means that when study hours are zero, the predicted exam score is 40 marks (assuming the model is valid in that range).

2. Slope (β₁)

   The slope coefficient represents the expected change in the dependent variable (Y) for a one-unit increase in the independent variable (X).

   It indicates both the direction and magnitude of the relationship:

   If β₁ is positive, Y increases as X increases.
   If β₁ is negative, Y decreases as X increases.

   Example:
   If β₁ = 5 in the exam example, it means that for every additional hour of study, the predicted exam score increases by 5 marks.

Thus, in simple linear regression, the intercept provides the baseline level of the dependent variable, while the slope quantifies the effect of the independent variable on the dependent variable.
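
The exam example can be written as a tiny function using the intercept (40) and slope (5) quoted above. The function name and the chosen study hours are illustrative.

```python
def predict_score(hours, intercept=40.0, slope=5.0):
    """Predicted exam score from the fitted line: intercept + slope * hours."""
    return intercept + slope * hours

# Baseline when no hours are studied equals the intercept.
print(predict_score(0))

# Each additional hour adds the slope to the prediction.
for hours in (1, 2, 3):
    print(hours, predict_score(hours))

# The difference between consecutive hours is always the slope.
print(predict_score(4) - predict_score(3))
```

For instance, 3 hours of study gives a predicted score of 40 + 5 × 3 = 55 marks.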
