# Introduction to Logistic Regression

## 1. What is Logistic Regression?

- **Logistic Regression** is a **supervised learning algorithm** used for **classification** tasks.
- Unlike linear regression, logistic regression is used when the dependent variable is **categorical** (binary or multiclass).
- It models the probability of a particular class or event, using the **logistic function** (also known as the **sigmoid function**).
- The output is a probability between 0 and 1, which is compared against a threshold to assign a class (e.g., true/false, spam/not spam).

---

## 2. Logistic vs. Linear Regression

- **Linear Regression**: Predicts a continuous output (regression tasks).
- **Logistic Regression**: Predicts the probability of categorical outcomes (classification tasks).
  
The core idea is to transform the linear model output into a probability using the **sigmoid function**:
  
\[
\hat{y} = \frac{1}{1 + e^{-(wx + b)}}
\]

- Here, `w` is the weight and `b` is the bias, `x` is the input, and `e` is Euler's number. With several features, `w` and `x` are vectors and `wx` is their dot product.
- The output (`\hat{y}`) is a probability between 0 and 1.

---

## 3. Sigmoid (Logistic) Function

The logistic function converts the output of a linear equation into a probability:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

- When `z` (the linear combination of inputs and weights) is large and positive, the output approaches 1.
- When `z` is large and negative, the output approaches 0.
- It maps any real-valued number into a value between 0 and 1.
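
The behaviour described above can be checked numerically. The following is a minimal sketch using NumPy; the function name `sigmoid` and the sample values in `z_values` are chosen here for illustration and are not part of any library.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Illustrative inputs: large negative, moderate, zero, and large positive
z_values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z_values)

for z, p in zip(z_values, probs):
    print(f"z = {z:6.1f} -> sigma(z) = {p:.5f}")
```

At `z = 0` the output is exactly 0.5, and the outputs increase monotonically with `z`.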

---

## 4. Binary Classification

- Logistic regression is typically used for **binary classification**, where the target variable has two classes, such as:
  - **Spam or Not Spam**
  - **Pass or Fail**
  - **Disease or No Disease**

For binary classification:
- If the output probability is **> 0.5**, classify the observation as class **1**.
- If the output probability is **<= 0.5**, classify the observation as class **0**.
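
As a small illustration of this rule, the sketch below turns a handful of made-up probabilities into class labels. The array `probs` and the variable `threshold` are assumptions for the example; in practice the probabilities would come from a fitted model.

```python
import numpy as np

# Hypothetical predicted probabilities for four observations
probs = np.array([0.10, 0.50, 0.51, 0.93])
threshold = 0.5

# Strictly greater than the threshold -> class 1, otherwise class 0
labels = (probs > threshold).astype(int)
print(labels)  # [0 0 1 1]
```

Note that a probability of exactly 0.5 falls into class 0 under this rule.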

---

## 5. Decision Boundary

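- A **decision boundary** is the set of points in feature space where the predicted probability equals the classification threshold; inputs on either side of it are assigned to different classes.
- For binary classification, the threshold is often **0.5**, which corresponds to `wx + b = 0`, so the boundary is linear in the features (a point, a line, or a hyperplane).
- It separates the feature space into two regions: one for class 0 and one for class 1.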
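
For a model with a single feature, the boundary is the one input value at which the predicted probability equals the threshold. The sketch below solves for it; the values of `w` and `b` are hypothetical, picked only to make the arithmetic easy to follow.

```python
import math

# Hypothetical fitted parameters for a one-feature model
w, b = 2.0, -3.0
threshold = 0.5


def predict_proba(x):
    """Predicted probability of class 1 for input x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# The probability equals the threshold where w*x + b = log(t / (1 - t)),
# which is 0 when the threshold t is 0.5.
logit_t = math.log(threshold / (1.0 - threshold))
x_boundary = (logit_t - b) / w

print(x_boundary)                 # 1.5
print(predict_proba(x_boundary))  # 0.5
print(predict_proba(1.0) > threshold, predict_proba(2.0) > threshold)  # False True
```

Changing the threshold moves the boundary along the feature axis but keeps it linear.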

---

## 6. Cost Function in Logistic Regression

Instead of the Mean Squared Error (MSE) used in linear regression, logistic regression uses a **log loss** (or **binary cross-entropy**) cost function:

\[
\text{Cost}(h(x), y) = - \left[ y \log(h(x)) + (1 - y) \log(1 - h(x)) \right]
\]

Where:
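- `h(x)` is the predicted probability (the sigmoid output, written `\hat{y}` in Section 2).
- `y` is the actual class (0 or 1).

This is the cost for a single example; the training objective averages it over all examples. The loss is close to 0 when the predicted probability of the true class is close to 1, and it grows without bound as that probability approaches 0, so confident wrong predictions are penalized most heavily.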
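
To make the penalty concrete, here is a minimal NumPy sketch of the per-example cost. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions of this example, added to avoid taking `log(0)`.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Per-example log loss for labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))


# Three examples whose true class is 1, predicted with decreasing confidence
y = [1, 1, 1]
p = [0.9, 0.5, 0.1]

per_example = binary_cross_entropy(y, p)
print(per_example)          # loss rises as the prediction moves away from 1
print(per_example.mean())   # averaged cost over the three examples
```

The loss for the prediction of 0.5 equals `log(2)`, and the loss for 0.1 is more than three times larger.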

---

## 7. Multiclass Logistic Regression (One-vs-Rest)

- **Binary logistic regression** is for two classes, but logistic regression can be extended to **multiclass classification** using the **One-vs-Rest (OvR)** approach.
- In **One-vs-Rest**, the algorithm fits a binary classifier for each class against all other classes, then selects the class with the highest probability.
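
The One-vs-Rest procedure can be sketched by hand with scikit-learn's binary `LogisticRegression`. This is an illustrative sketch, not a recommended production setup: the Iris dataset, the `max_iter=1000` setting, and the variable names are assumptions made for the example, and the model is scored on its own training data only to keep the code short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Fit one binary classifier per class: "this class" (1) vs. "all others" (0)
binary_models = []
for c in classes:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, (y == c).astype(int))
    binary_models.append(clf)

# Column k holds the probability that each sample belongs to class k
scores = np.column_stack([m.predict_proba(X)[:, 1] for m in binary_models])

# Pick the class whose binary classifier is most confident
ovr_pred = classes[np.argmax(scores, axis=1)]
accuracy = np.mean(ovr_pred == y)
print(f"Training accuracy: {accuracy:.3f}")
```

Because each classifier is trained independently, the per-class probabilities in a row of `scores` do not necessarily sum to 1.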

---

## 8. Example Code

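```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary dataset, used here only so the example runs end to end
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Initialize and configure the logistic regression model
model = LogisticRegression(random_state=0)

# Fit the model to training data (X_train, y_train)
model.fit(X_train, y_train)

# Predict probabilities on new data: one column per class, rows sum to 1
pred_probs = model.predict_proba(X_test)

# Make class predictions (0 or 1)
predictions = model.predict(X_test)
```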

---

## 9. Evaluation Metrics for Logistic Regression

- **Accuracy**: The ratio of correct predictions to total predictions.
- **Precision**: The ratio of true positives to predicted positives.
- **Recall**: The ratio of true positives to actual positives.
- **F1 Score**: The harmonic mean of precision and recall, providing a single measure that balances both.

\[
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]

- **ROC Curve (Receiver Operating Characteristic)**: Plots the true positive rate vs. the false positive rate at different threshold levels.
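- **AUC (Area Under the ROC Curve)**: Summarizes the ROC curve as a single number between 0 and 1. A value of 0.5 corresponds to random guessing and 1.0 to ranking every positive above every negative, so a higher AUC indicates better performance.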
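
These metrics are available in `sklearn.metrics`. The sketch below computes them on a small hand-made example; the labels in `y_true` and the scores in `y_score` are invented for illustration, and class predictions are obtained with a 0.5 threshold.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented ground truth (4 positives, 6 negatives) and predicted probabilities
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35, 0.05, 0.15])

# Threshold the scores to get class labels: TP=2, FN=2, FP=1, TN=5
y_pred = (y_score > 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 7/10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)    = 2/3
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)    = 2/4
f1 = f1_score(y_true, y_pred)           # harmonic mean     = 4/7

# AUC uses the raw scores, not the thresholded labels
auc = roc_auc_score(y_true, y_score)

print(acc, prec, rec, f1, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the scores across all thresholds.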

---

## 10. Regularization in Logistic Regression

- Logistic regression often uses **regularization** to prevent overfitting.
- Common regularization types:
  - **L2 Regularization (Ridge)**: Adds a penalty proportional to the square of the magnitude of the coefficients.
  - **L1 Regularization (Lasso)**: Adds a penalty proportional to the absolute value of the coefficients, often resulting in sparse models (some weights become zero).

Regularization strength is controlled by the `C` parameter of scikit-learn's `LogisticRegression` model. `C` is the inverse of the regularization strength, so a smaller `C` means stronger regularization.

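```python
from sklearn.linear_model import LogisticRegression

# L2-penalized model; C is the inverse of the regularization strength
model = LogisticRegression(C=1.0, penalty='l2', solver='lbfgs')
```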
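
The effect of `C` can be observed by fitting the same data with different values and comparing the size of the learned coefficients. The following is a sketch under stated assumptions: the synthetic dataset, the three `C` values, and `max_iter=1000` are chosen for illustration, and the default L2 penalty is used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only for illustration
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Fit with strong, moderate, and weak regularization
norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C = {C:>6}: ||w|| = {norms[C]:.4f}")
```

Smaller values of `C` shrink the coefficient vector toward zero, which is the mechanism by which regularization limits overfitting.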

---

## 11. Applications of Logistic Regression

- **Medical Diagnostics**: Predict the likelihood of disease (e.g., diabetes, heart disease).
- **Spam Detection**: Classify emails as spam or not spam.
- **Credit Scoring**: Assess the risk of loan default.
- **Marketing**: Predict whether a customer will respond to a campaign (response or no response).

---

## 12. Limitations of Logistic Regression

- Assumes a **linear relationship** between the features and the log-odds of the output.
- Can underperform when the decision boundary is non-linear (other algorithms like decision trees or SVMs may be better).
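- Sensitive to **outliers**: extreme feature values can pull the fitted coefficients and shift the decision boundary.
- Works best when there is little or no multicollinearity between features; strongly correlated features make individual coefficient estimates unstable and harder to interpret.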

---

## 13. Summary

- **Logistic Regression** is a simple, widely used classification algorithm that outputs class probabilities.
- It uses the logistic function to model the probability of class membership.
- Regularization can be applied to improve generalization.
- It is widely used for binary classification tasks and can be extended to multiclass problems.

--- 

