# LOGISTIC REGRESSION

1. What is Simple Linear Regression (SLR)? Explain its purpose
   -    **Simple Linear Regression (SLR)** is a basic statistical technique used to study how one variable changes in response to another variable. It fits a straight-line relationship between a single input (independent variable) and an output (dependent variable).
  **Purpose of Simple Linear Regression:**
The main goal of SLR is to understand and measure how one variable is associated with another. It helps us:

* Identify whether a relationship exists between two variables
* Estimate how much the dependent variable changes when the independent variable changes
* Predict future or unknown values based on existing data
* Represent data trends in a simple and easy-to-understand mathematical form

In simple terms, **SLR helps explain relationships and make predictions using a straight line**.


2. What are the key assumptions of Simple Linear Regression?
   -   Simple Linear Regression works properly only when certain basic conditions are met. These are its **key assumptions**, explained in simple words:

1. **Linear relationship**
   The independent variable and dependent variable should be related in a straight-line manner, not a curve.

2. **Independence of observations**
   Each data point should be collected independently, meaning one observation should not influence another.

3. **Constant variance (Homoscedasticity)**
   The spread of errors should be roughly the same for all values of the independent variable.

4. **Normal distribution of errors**
   The prediction errors (residuals) should be approximately normally distributed. This matters mainly for hypothesis tests and confidence intervals.

5. **No major outliers**
   Extreme or unusual values should not strongly affect the regression line. This is usually treated as a practical data check rather than a formal model assumption.

In short, **these assumptions ensure that the regression results are reliable, accurate, and meaningful**.
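
Some of these assumptions can be probed informally by looking at the residuals of a fitted line. The sketch below is only an illustration: the data are made up, and NumPy's `polyfit` is used as a convenient stand-in for any least-squares fit. It is a rough check, not a formal diagnostic test.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Fit a straight line; for degree 1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Rough constant-variance check: compare residual spread for small vs. large x
low_spread = residuals[x <= np.median(x)].std()
high_spread = residuals[x > np.median(x)].std()

print("Residuals:", np.round(residuals, 3))
print("Spread (low x): ", round(float(low_spread), 3))
print("Spread (high x):", round(float(high_spread), 3))
```

If the two spreads differ greatly, or the residuals trace a curve when plotted against x, the constant-variance or linearity assumption is probably not satisfied.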


3. Write the mathematical equation for a simple linear regression model and
explain each term.
   -    The mathematical equation of a **Simple Linear Regression** model is:

$$
y = a + bx + e
$$

**Explanation of each term (brief):**

* **$y$**: the dependent variable (what we want to predict)
* **$x$**: the independent variable (input or predictor)
* **$a$**: the intercept, the value of $y$ when $x = 0$
* **$b$**: the slope, showing how much $y$ changes for a one-unit increase in $x$
* **$e$**: the error term, representing random variation not explained by the model

This equation shows how $y$ changes in a straight-line relationship with $x$.


4. Provide a real-world example where simple linear regression can be
applied.
   -  A real-world example of **simple linear regression** is predicting **electricity usage** based on **daily temperature**.

Here, temperature is the independent variable, and electricity usage is the dependent variable. As temperature increases, electricity use often rises because of cooling devices. Simple linear regression can be used to understand this relationship and estimate how much electricity will be consumed at a given temperature.

This helps power companies plan energy supply more effectively.
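
The example above could be sketched roughly as follows. The temperature and usage figures are hypothetical values invented for illustration, and `predict_usage` is a helper name introduced here, not something from a real data set or library.

```python
import numpy as np

# Hypothetical daily observations (illustrative values, not real measurements)
temperature_c = np.array([20, 22, 25, 28, 30, 33, 35], dtype=float)
usage_kwh = np.array([30, 33, 38, 44, 47, 53, 56], dtype=float)

# Least-squares straight line: polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(temperature_c, usage_kwh, 1)

def predict_usage(temp_c):
    """Estimate usage (kWh) at a given temperature from the fitted line."""
    return intercept + slope * temp_c

print(f"Estimated extra usage per degree: {slope:.2f} kWh")
print(f"Estimated usage at 27 degrees C: {predict_usage(27.0):.1f} kWh")
```

Predictions are most trustworthy inside the observed temperature range; extrapolating far beyond it assumes the straight-line pattern continues.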


5. What is the method of least squares in linear regression?
   -    **The method of least squares** in linear regression is a mathematical approach used to determine the most suitable straight line that represents the relationship between two variables.

In this method, the difference between the observed values and the values predicted by the regression line is calculated for each data point. These differences are then squared and added together. The regression line is chosen so that this total squared error is **as small as possible**.

Squaring the errors makes sure that positive and negative errors do not cancel each other and gives more importance to larger mistakes. As a result, the fitted line closely follows the overall pattern of the data and provides reliable estimates for prediction and analysis.
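
For a single predictor, the line that minimizes the total squared error has a well-known closed form: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies it to a small illustrative data set; the helper name `sse` is introduced here for the example.

```python
import numpy as np

# Small illustrative data set
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares estimates for one predictor
x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

def sse(intercept, slope):
    """Sum of squared errors of the line intercept + slope * x on the data."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
print(f"SSE at the least-squares fit: {sse(a, b):.3f}")
print(f"SSE with a nudged slope:      {sse(a, b + 0.1):.3f}")
```

Any other choice of slope or intercept, such as the nudged slope in the last line, gives a larger total squared error on the same data.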


6. What is Logistic Regression? How does it differ from Linear Regression?
   -    **Logistic Regression** is a statistical method used when the outcome is **categorical**, most commonly with two possible results such as *yes/no*, *pass/fail*, or *true/false*. Instead of predicting a direct value, it estimates the **probability** that an observation belongs to a particular category.

**How it differs from Linear Regression:**

* **Type of output:**
  Linear regression predicts a continuous numerical value, while logistic regression predicts a probability that falls between 0 and 1.

* **Nature of the model:**
  Linear regression fits a straight line to the data. Logistic regression uses an S-shaped curve to keep predicted probabilities within a valid range.

* **Purpose:**
  Linear regression is used for estimation and trend analysis. Logistic regression is mainly used for classification problems.

* **Error handling:**
  Linear regression assumes normally distributed errors and is usually fitted by least squares, whereas logistic regression treats the outcome as a binary (Bernoulli) variable and is typically fitted by maximum likelihood.

In simple words, **linear regression predicts “how much,” while logistic regression predicts “which category.”**
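
The S-shaped curve mentioned above is the sigmoid (logistic) function applied to a linear score. The sketch below shows only that mapping, not model fitting: the coefficients `a` and `b` and the "hours studied" scenario are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients (assumed for illustration, not estimated from data)
a, b = -4.0, 1.0

hours_studied = np.array([0, 2, 4, 6, 8], dtype=float)
linear_score = a + b * hours_studied      # same form as a linear regression line
probability = sigmoid(linear_score)       # squeezed into the range (0, 1)
predicted_class = (probability >= 0.5).astype(int)  # 1 = pass, 0 = fail

for h, p, c in zip(hours_studied, probability, predicted_class):
    print(f"hours={h:.0f}  P(pass)={p:.3f}  class={c}")
```

The 0.5 cut-off is a common default, not a requirement; a different threshold can be chosen depending on the cost of each kind of mistake.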


7. Name and briefly describe three common evaluation metrics for regression
models.
   -   Here are **three common evaluation metrics for regression models**, explained briefly in simple words:

1. **Mean Absolute Error (MAE)**
   It measures the average size of prediction errors by taking the absolute difference between actual and predicted values. Smaller MAE means better accuracy.

2. **Mean Squared Error (MSE)**
   This metric squares the errors before averaging them, which gives more weight to larger mistakes and helps identify models with big prediction errors.

3. **R-squared (R²)**
   It shows how well the model explains the variation in the data. A higher R² value means the model fits the data better.

These metrics help judge how accurate and reliable a regression model is.
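
The three metrics can be computed directly from their definitions. The sketch below uses a few made-up actual and predicted values; the numbers are for illustration only.

```python
import numpy as np

# Made-up actual and predicted values for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                    # average absolute error
mse = np.mean(errors ** 2)                       # average squared error
ss_res = np.sum(errors ** 2)                     # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2 = 1.0 - ss_res / ss_tot

print(f"MAE: {mae:.4f}")   # MAE: 0.3750
print(f"MSE: {mse:.4f}")   # MSE: 0.3125
print(f"R^2: {r2:.4f}")    # R^2: 0.9375
```

MAE and MSE are in the units (or squared units) of the dependent variable, while R² is unitless, which makes it easier to compare across data sets.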


8. What is the purpose of the R-squared metric in regression analysis?
   -   The **purpose of the R-squared (R²) metric** in regression analysis is to show **how well a model explains the data**.

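It tells us what **portion of the variation in the dependent variable** can be explained by the independent variable(s) in the model. For a least-squares model with an intercept, evaluated on the data it was fitted to, R² ranges from 0 to 1, where a higher value means the model fits the data better. On new data, or for a model that fits worse than simply predicting the mean, R² can be negative.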

In simple terms, **R-squared helps us understand how useful the regression model is in explaining real-world behavior**, not just making predictions.
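
For reference, the usual definition of R² is given below. The notation is the standard one and is an addition here, not taken from the text above.

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

Here $y_i$ are the observed values, $\hat{y}_i$ are the model's predictions, and $\bar{y}$ is the mean of the observed values. The numerator is the variation the model leaves unexplained and the denominator is the total variation, so R² is the explained share.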


9. Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.
(Include your Python code and output in the code box below.)

```python
# Import necessary libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
# x = independent variable, y = dependent variable
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # reshape for sklearn
y = np.array([2, 4, 5, 4, 5])

# Create the linear regression model
model = LinearRegression()

# Fit the model
model.fit(x, y)

# Print slope (coefficient) and intercept
print("Slope (b):", model.coef_[0])
print("Intercept (a):", model.intercept_)
```

Output:

```
Slope (b): 0.6
Intercept (a): 2.2
```


10. How do you interpret the coefficients in a simple linear regression model?
    -   In a **simple linear regression model** $y = a + bx$, the coefficients are interpreted as follows:

1. **Intercept ($a$)**

   * This is the value of the dependent variable $y$ when the independent variable $x = 0$.
   * It represents the starting point of the regression line.

2. **Slope ($b$)**

   * This shows how much $y$ changes for a **one-unit increase** in $x$.
   * A positive slope means $y$ increases as $x$ increases, while a negative slope means $y$ decreases as $x$ increases.

**Example:**
If the regression line is $y = 3 + 2x$:

* Intercept (3) → When $x = 0$, $y = 3$
* Slope (2) → For every 1-unit increase in $x$, $y$ increases by 2

In short, the **intercept tells where the line crosses the y-axis** and the **slope tells the direction and rate of change of y as x increases**. The strength of the relationship is measured separately, for example by R².
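
The interpretation of the example line y = 3 + 2x can be checked with a few lines of code. The function name `predict` is introduced here only for illustration.

```python
def predict(x, intercept=3.0, slope=2.0):
    """Value of y on the line y = intercept + slope * x."""
    return intercept + slope * x

# Intercept: the value of y when x = 0
print(predict(0))               # 3.0

# Slope: the change in y for a one-unit increase in x, anywhere on the line
print(predict(5) - predict(4))  # 2.0
print(predict(11) - predict(10))  # 2.0
```

The difference is the same at every x because a straight line has a constant rate of change.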

