<a href="https://colab.research.google.com/github/waelrash1/predictive_analytics_DT302/blob/main/algorithms_implementation/Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Linear Logistic Regression**

This notebook builds logistic regression step by step, from the mathematical derivation to a detailed Python implementation.

### 1. **Introduction to Logistic Regression**

Logistic regression is a statistical model that is commonly used for **binary classification** problems (where the target variable has two possible outcomes, e.g., 0 or 1). It models the probability that a given input belongs to a particular class.

Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that the class label is 1. It does this by passing a linear combination of the input features through the sigmoid function.

### 2. **Logistic Function (Sigmoid Function)**

The logistic regression model uses the **sigmoid function** to convert the linear combination of input features into a probability score between 0 and 1.

The sigmoid function is given by:

$
h_\theta(x) = \sigma(z) = \frac{1}{1 + e^{-z}}
$
Where:
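- $ z = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + ... + \theta_n x_n $, using the convention $ x_0 = 1 $ so that $ \theta_0 $ acts as the intercept (bias).
- $ \theta = [\theta_0, \theta_1, ..., \theta_n] $ is the parameter (weight) vector.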
- $ x $ is the input feature vector.

The sigmoid function squashes any input value $ z $ into a range between 0 and 1, making it suitable for binary classification.
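Written directly as above, the sigmoid can trigger floating-point overflow warnings: for a large negative $ z $, `np.exp(-z)` exceeds the range of a double. A minimal sketch of a numerically stable variant (the name `stable_sigmoid` is our own) evaluates an algebraically equivalent form on each side of zero, so `exp` is only ever applied to non-positive numbers. SciPy ships the same function as `scipy.special.expit`.

```python
import numpy as np

def stable_sigmoid(z):
    """Numerically stable logistic sigmoid for NumPy arrays (sketch).

    For z >= 0 use 1 / (1 + exp(-z)); for z < 0 use exp(z) / (1 + exp(z)).
    Both forms are equal, but each branch only calls exp on a value <= 0,
    so exp can underflow to 0 but never overflow.
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

print(stable_sigmoid(np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])))
```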

### 3. **Logistic Regression Model**

The hypothesis of logistic regression is given by the sigmoid of the linear model:

$
h_\theta(x) = P(y=1 | x; \theta) = \frac{1}{1 + e^{-\theta^T x}}
$

Where:
- $ h_\theta(x) $ represents the probability that the output $ y $ is 1 given input $ x $.

For binary classification, the output $ y $ is either 0 or 1. The decision boundary is the set of inputs where $ P(y=1 | x; \theta) = 0.5 $; because $ \sigma(0) = 0.5 $, this is exactly where $ \theta^T x = 0 $.
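A small numerical check of this equivalence, using hypothetical parameters $ \theta = [-1, 2, 0.5] $ and four hand-picked inputs (the first column of `X` is $ x_0 = 1 $). Because the sigmoid is strictly increasing, thresholding the probabilities at 0.5 and thresholding $ z $ at 0 should give identical labels.

```python
import numpy as np

# Hypothetical parameters theta = [theta_0, theta_1, theta_2]
theta = np.array([-1.0, 2.0, 0.5])
X = np.array([
    [1.0, 0.0, 0.0],    # z = -1.0
    [1.0, 0.5, 0.0],    # z =  0.0, exactly on the boundary
    [1.0, 1.0, 2.0],    # z =  2.0
    [1.0, 0.2, -1.0],   # z = -1.1
])

z = X @ theta
p = 1.0 / (1.0 + np.exp(-z))

# sigma(0) = 0.5 and sigma is increasing, so both rules agree
labels_from_p = (p > 0.5).astype(int)
labels_from_z = (z > 0).astype(int)
print(z, p.round(3), labels_from_p, labels_from_z)
```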

### 4. **Loss Function (Cost Function)**

The **cost function** for logistic regression is derived from the **likelihood function** of the training labels, since the model outputs probabilities. The goal is to choose the $ \theta $ that maximizes the likelihood of the observed labels, which is equivalent to minimizing the negative log-likelihood.
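Assuming the $ m $ training examples are independent, the likelihood can be written out explicitly. Each label is modeled as a Bernoulli variable with success probability $ h_\theta(x) $, which packs both cases ($ y = 1 $ and $ y = 0 $) into one expression:

$
P(y \mid x; \theta) = h_\theta(x)^{y} \left(1 - h_\theta(x)\right)^{1 - y}
$

Multiplying over the training set and taking the logarithm turns the product into a sum:

$
L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}
$

$
\log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]
$

Maximizing $ \log L(\theta) $ is the same as minimizing $ -\frac{1}{m} \log L(\theta) $.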

The cost function $ J(\theta) $ is the average **negative log-likelihood** (also called binary cross-entropy or log loss):

$
J(\theta) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]
$

Where:
- $ y^{(i)} $ is the true label (0 or 1) for the $i$-th example.
- $ h_\theta(x^{(i)}) $ is the predicted probability for the $i$-th example.
- $ m $ is the number of training examples.

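This cost function penalizes confident wrong predictions heavily. For an example with $ y^{(i)} = 1 $, its term is $ -\log(h_\theta(x^{(i)})) $, which grows without bound as $ h_\theta(x^{(i)}) \to 0 $; symmetrically, for $ y^{(i)} = 0 $ the term is $ -\log(1 - h_\theta(x^{(i)})) $, which grows without bound as $ h_\theta(x^{(i)}) \to 1 $.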
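A minimal NumPy sketch of $ J(\theta) $, computed from a vector of predicted probabilities `h` and labels `y`. The function name `logistic_cost` and the clipping constant `eps` are our own choices; clipping keeps `np.log` away from exactly 0, where the cost would be infinite.

```python
import numpy as np

def logistic_cost(h, y, eps=1e-12):
    """Average negative log-likelihood from predicted probabilities h.

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    h = np.clip(h, eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1, 0])
print(logistic_cost(np.array([0.9, 0.1, 0.8, 0.2]), y))  # mostly correct
print(logistic_cost(np.array([0.1, 0.9, 0.2, 0.8]), y))  # confidently wrong
```

The second call, where every prediction points the wrong way, returns a much larger cost than the first.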

### 5. **Gradient Descent for Logistic Regression**

To minimize the cost function $ J(\theta) $, we use **gradient descent**, an optimization algorithm that updates the weights iteratively to reduce the cost.

The gradient of the cost function with respect to $ \theta_j $ (for each feature $ j $) is given by:

$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$
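One way to check this formula without redoing the calculus is a finite-difference gradient check: perturb each $ \theta_j $ by a small $ \pm\epsilon $, measure the change in $ J(\theta) $, and compare with the analytic expression. The sketch below does this on small random data; the data, the step size `eps`, and the helper names are illustrative assumptions, and $ x_0 = 1 $ is stored as the first column of `X`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_grad(theta, X, y):
    # (1/m) * sum_i (h(x_i) - y_i) * x_ij, for every j at once
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

def numeric_grad(theta, X, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])  # x_0 = 1 column
y = rng.integers(0, 2, size=20)
theta = rng.normal(size=4)

print(np.max(np.abs(analytic_grad(theta, X, y) - numeric_grad(theta, X, y))))
```

The printed maximum difference should be tiny compared with the gradient entries themselves; a large value would point to a mistake in the derivation or its implementation.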

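We update all weights simultaneously using gradient descent, computing every partial derivative with the current $ \theta $ before changing any of them: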

$
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
$

Where:
- $ \alpha $ is the learning rate (a hyperparameter that controls the step size in gradient descent).
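Stacking the $ m $ training examples as the rows of a design matrix $ X $ (with $ x_0 = 1 $ as its first column) and the labels into a vector $ y $, the $ n + 1 $ partial derivatives above collapse into one matrix-vector product, and the update for all $ \theta_j $ can be written at once:

$
\nabla_\theta J(\theta) = \frac{1}{m} X^T \left( \sigma(X\theta) - y \right)
$

$
\theta := \theta - \alpha \nabla_\theta J(\theta)
$

Here $ \sigma $ is applied element-wise, and the $ j $-th entry of $ X^T(\sigma(X\theta) - y) $ is exactly $ \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} $.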

### 6. **Logistic Regression Algorithm: Steps**

1. **Initialize Parameters:**
   Start with initial guesses for the weights $ \theta = [\theta_0, \theta_1, ..., \theta_n] $, typically initialized to zero or small random values.

2. **Forward Pass:**
   Compute the predicted probabilities $ h_\theta(x) $ using the sigmoid function.

3. **Cost Function:**
   Calculate the cost function $ J(\theta) $ using the current weights.

4. **Gradient Descent:**
   Update the weights $ \theta $ using the gradient descent rule.

5. **Repeat:**
   Iterate steps 2 to 4 until the cost converges (i.e., the decrease in $ J(\theta) $ between successive iterations falls below a small tolerance) or a maximum number of iterations is reached.
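The five steps can be put together as a short batch gradient descent loop. This is a sketch under stated assumptions: `X` already contains the $ x_0 = 1 $ column, and the learning rate `alpha`, the iteration cap `max_iters`, and the tolerance `tol` on the per-iteration decrease of $ J $ are illustrative defaults, not recommended values.

```python
import numpy as np

def train_logistic_gd(X, y, alpha=0.1, max_iters=10_000, tol=1e-6):
    """Batch gradient descent for logistic regression (sketch).

    Assumes X already has a leading column of ones (x_0 = 1).
    Returns the learned theta and the cost recorded at each iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)                          # Step 1: initialize parameters
    costs = []
    prev_cost = np.inf
    for _ in range(max_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # Step 2: forward pass
        h_safe = np.clip(h, 1e-12, 1 - 1e-12)    # avoid log(0)
        cost = -np.mean(y * np.log(h_safe) + (1 - y) * np.log(1 - h_safe))  # Step 3
        costs.append(cost)
        if prev_cost - cost < tol:               # Step 5: stop once J stops decreasing
            break
        prev_cost = cost
        theta -= alpha * (X.T @ (h - y)) / m     # Step 4: gradient descent update
    return theta, costs

# Illustrative data: two overlapping Gaussian clusters in 2D
rng = np.random.default_rng(42)
features = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
                      rng.normal(1.0, 1.0, size=(100, 2))])
X = np.hstack([np.ones((200, 1)), features])     # prepend x_0 = 1
y = np.concatenate([np.zeros(100), np.ones(100)])

theta, costs = train_logistic_gd(X, y)
print(f"iterations: {len(costs)}, initial cost: {costs[0]:.4f}, final cost: {costs[-1]:.4f}")
```

This loop records the cost at every step and stops early once the cost no longer decreases by at least `tol`; plotting `costs` is a quick way to check that the learning rate is not too large.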

### 7. **Python Implementation**

Let's implement logistic regression from scratch using NumPy.

```python
import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iters=1000):
        self.learning_rate = learning_rate
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        # Number of training examples and features
        n_samples, n_features = X.shape
        
        # Initialize parameters (weights and bias)
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        # Gradient descent
        for _ in range(self.n_iters):
            # Linear model (z = X * w + b)
            linear_model = np.dot(X, self.weights) + self.bias
            
            # Apply sigmoid function to get probabilities (h_theta(x))
            y_predicted = self.sigmoid(linear_model)
            
            # Compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            
            # Update weights and bias
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        # Linear model (z = X * w + b)
        linear_model = np.dot(X, self.weights) + self.bias
        # Apply sigmoid function
        y_predicted = self.sigmoid(linear_model)
        # Convert probabilities to binary output (0 or 1)
        y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]
        return np.array(y_predicted_cls)

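# Example usage
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Load a dataset
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (zero mean, unit variance) using training-set statistics only.
# The raw features span very different ranges, which makes gradient descent with a
# single learning rate converge poorly and can trigger overflow warnings in np.exp.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train the logistic regression model
model = LogisticRegression(learning_rate=0.01, n_iters=1000)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate accuracy (fraction of test labels predicted correctly)
accuracy = np.mean(predictions == y_test)
print(f"Logistic Regression accuracy: {accuracy * 100:.2f}%")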
```

### Explanation of the Code:

1. **Initialization:**
   The `LogisticRegression` class has two important hyperparameters:
   - `learning_rate`: Controls the size of the steps in the gradient descent.
   - `n_iters`: Number of iterations for the gradient descent optimization.

   The `fit` method is responsible for training the model, and the `predict` method is used to make predictions on new data.

2. **Sigmoid Function:**
   The `sigmoid` function is used to convert the linear model's output into probabilities.
   
   ```python
   def sigmoid(self, z):
       return 1 / (1 + np.exp(-z))
   ```

3. **Gradient Descent:**
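   The `fit` method computes the linear model output, applies the sigmoid function, and then uses the gradients of the cost function to update the weights and bias. In this implementation `self.bias` plays the role of $ \theta_0 $ and `self.weights` holds $ \theta_1, \dots, \theta_n $, so `X` does not need a column of ones.

   The gradient descent step is the vectorized form of the gradient formula from Section 5: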
   
   ```python
   dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
   db = (1 / n_samples) * np.sum(y_predicted - y)
   
   self.weights -= self.learning_rate * dw
   self.bias -= self.learning_rate * db
   ```

4. **Prediction:**
   The `predict` method computes probabilities with the sigmoid and then thresholds them at 0.5 to get class predictions. Because the comparison is strict, a probability of exactly 0.5 maps to class 0.

   ```python
   y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]
   ```
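The list comprehension can also be written as a single vectorized comparison. The sketch below wraps it in a hypothetical helper, `predict_labels`, with a `threshold` parameter of our own; 0.5 reproduces the behavior of `predict`, while other values shift the trade-off between false positives and false negatives.

```python
import numpy as np

def predict_labels(probs, threshold=0.5):
    """Vectorized thresholding of predicted probabilities (hypothetical helper)."""
    return (np.asarray(probs) > threshold).astype(int)

probs = np.array([0.2, 0.5, 0.51, 0.9])
print(predict_labels(probs))                 # [0 0 1 1]
print(predict_labels(probs, threshold=0.3))  # [0 1 1 1]
```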

### 8. **Example Output:**

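After running the code on the breast cancer dataset, you should see a single accuracy line such as the one below; the exact value depends on the train/test split, feature preprocessing, learning rate, and number of iterations: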

```
Logistic Regression accuracy: 95.61%
```

### 9. **Evaluation and Improvements:**

- **Baseline Model:** Logistic regression is often a good first model to try for binary classification problems because it is simple, fast to train, and its weights are easy to interpret.
- **Regularization:** Adding an L1 (lasso) or L2 (ridge) penalty on the weights to the cost function helps reduce overfitting, especially on high-dimensional datasets.
- **Hyperparameter Tuning:** Experimenting with the learning rate and the number of iterations can improve convergence and, in turn, model performance.

