# Logistic Regression

## Notation

- $x$ - Input features.
- $x_j$ - The $j^{th}$ feature.
- $\vec{x^{(i)}}$ - Features of the $i^{th}$ training example; the $i^{th}$ row.
- $x_j^{(i)}$ - Value of the $j^{th}$ feature in the $i^{th}$ training example.
- $y$ - Output/target variable.
- $y^{(i)}$ - The $i^{th}$ output value.
- $m$ - Number of training examples.
- $n$ - Number of features.
- $\vec{w}, b$ - Model parameters.
- $\alpha$ - Learning rate.
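In NumPy terms, the notation maps onto arrays as sketched below. This assumes the common convention that the training examples are stored as the rows of a 2-D array `X` of shape `(m, n)`. Indices are 0-based in code, unlike the 1-based math. The toy values are arbitrary.

```python
import numpy as np

# Toy data: m = 3 training examples (rows), n = 2 features (columns); values are arbitrary
X = np.array([[0.5, 1.0],
              [1.5, -2.0],
              [3.0, 0.0]])
y = np.array([0, 1, 1])   # y^(i): one target per example

m, n = X.shape            # m = number of examples, n = number of features
x_i = X[1]                # x^(i) for the 2nd example (NumPy is 0-based): one row
x_ij = X[1, 0]            # x_j^(i): the 1st feature of that same example
w = np.zeros(n)           # one weight w_j per feature
b = 0.0                   # scalar bias

print(m, n, x_i, x_ij)
```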

## Formulas

### Sigmoid Function / Model Prediction

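$g(z) = \frac{1}{1 + e^{-z}}$, where $z = \vec{w} \cdot \vec{x} + b$.

The model's prediction applies the sigmoid to that linear combination:

$f_{\vec{w}, b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$

The prediction lies in $(0, 1)$ and is interpreted as the estimated probability that $y = 1$ given $\vec{x}$.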
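Below is a minimal NumPy sketch of the sigmoid and the model prediction $f_{\vec{w}, b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b)$, applied to every row of a feature matrix at once. The names `sigmoid` and `predict_proba` and the toy weights are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Model prediction f_wb(x) = g(w . x + b) for every row of X."""
    z = X @ w + b          # linear part, one value per example
    return sigmoid(z)      # squashed into (0, 1)

print(sigmoid(0.0))        # z = 0 sits exactly on the decision boundary

X = np.array([[1.0, 2.0],
              [-1.0, -2.0]])
w = np.array([0.5, 0.25])
# z is +1 for the first row and -1 for the second, so the two
# probabilities sum to 1 (because g(-z) = 1 - g(z))
print(predict_proba(X, w, b=0.0))
```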

### Loss Function

$L(f_{\vec{w}, b}(\vec{x^{(i)}}), y^{(i)}) = $\
$-\log (f_{\vec{w}, b}(\vec{x^{(i)}}))$ if $y^{(i)} = 1$\
$-\log (1 - f_{\vec{w}, b}(\vec{x^{(i)}}))$ if $y^{(i)} = 0$

Because $y^{(i)}$ is always 0 or 1, the two cases combine into a single expression:

$L(f_{\vec{w}, b}(\vec{x^{(i)}}), y^{(i)}) = - y^{(i)} \log (f_{\vec{w}, b}(\vec{x^{(i)}})) - (1 - y^{(i)}) \log (1 - f_{\vec{w}, b}(\vec{x^{(i)}}))$

Intuitively, if the target is $y^{(i)} = 1$, we punish the model by letting $L \to \infty$ as the prediction $f_{\vec{w}, b}(\vec{x^{(i)}}) \to 0$. Conversely, if $y^{(i)} = 0$, then $L \to \infty$ as the prediction $\to 1$.
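The sketch below checks two things numerically. First, the one-line loss agrees with the piecewise definition. Second, the loss grows sharply as the prediction moves toward the wrong label. The helper names are illustrative, and the predictions `f` are assumed to lie strictly between 0 and 1 so that both logarithms are finite.

```python
import numpy as np

def loss_piecewise(f, y):
    """Loss for one example, written case by case."""
    return -np.log(f) if y == 1 else -np.log(1 - f)

def loss(f, y):
    """Same loss in the combined form: -y log(f) - (1 - y) log(1 - f)."""
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

for f in (0.9, 0.5, 0.1, 1e-6):
    for y in (0, 1):
        # The combined form reproduces the piecewise definition for both labels
        assert np.isclose(loss(f, y), loss_piecewise(f, y))
    print(f"f = {f:g}: L(y=1) = {loss(f, 1):.3f}, L(y=0) = {loss(f, 0):.3f}")
```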

### Cost Function

$J(\vec{w}, b) = \frac{1}{m} \sum_{i = 1}^{m} L(f_{\vec{w}, b}(\vec{x^{(i)}}), y^{(i)})$
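Here is a vectorized sketch of the cost as the average of the per-example losses, $J(\vec{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L(f_{\vec{w}, b}(\vec{x^{(i)}}), y^{(i)})$. The name `compute_cost` and the small dataset are illustrative. With all parameters at zero, every prediction is $g(0) = 0.5$, so each loss, and therefore $J$, equals $\log 2$.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Average logistic loss J(w, b) over all m training examples."""
    m = X.shape[0]
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))               # predictions f_wb(x^(i))
    losses = -y * np.log(f) - (1 - y) * np.log(1 - f)    # L(f_wb(x^(i)), y^(i))
    return np.sum(losses) / m

X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

print(compute_cost(X, y, w=np.zeros(2), b=0.0))       # log(2), since every prediction is 0.5
print(compute_cost(X, y, w=np.array([1.0, 1.0]), b=-3.0))  # a better boundary gives a lower cost
```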

### Gradient Descent

Repeat until convergence, updating every $w_j$ (for $j = 1, \dots, n$) and $b$ simultaneously:

$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(\vec{w}, b)$\
$b = b - \alpha \frac{\partial}{\partial b} J(\vec{w}, b)$

Substituting the partial derivatives of $J$, repeat until convergence:

$w_j =  w_j - \alpha [\frac{1}{m} \sum_{i = 1}^{m}(f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)})x_j^{(i)}]$\
$b = b - \alpha [\frac{1}{m} \sum_{i = 1}^{m}(f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)})]$
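The following vectorized sketch computes the two bracketed partial derivatives. It includes a central finite-difference check confirming that they match the derivatives of $J$ when the cost is defined with the $\frac{1}{m}$ average. `compute_cost` and `compute_gradient` are illustrative names, and the random data is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def compute_gradient(X, y, w, b):
    """Return (dJ/dw, dJ/db) using the closed-form expressions."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y      # f_wb(x^(i)) - y^(i), one entry per example
    dj_dw = X.T @ err / m             # entry j: (1/m) * sum_i err_i * x_j^(i)
    dj_db = np.sum(err) / m           # (1/m) * sum_i err_i
    return dj_dw, dj_db

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w, b = rng.normal(size=3), 0.1

dj_dw, dj_db = compute_gradient(X, y, w, b)

# Central finite differences approximate each partial derivative of J
eps = 1e-6
num_dw = np.array([
    (compute_cost(X, y, w + eps * e, b) - compute_cost(X, y, w - eps * e, b)) / (2 * eps)
    for e in np.eye(3)
])
num_db = (compute_cost(X, y, w, b + eps) - compute_cost(X, y, w, b - eps)) / (2 * eps)

print(np.allclose(dj_dw, num_dw, atol=1e-6), np.isclose(dj_db, num_db, atol=1e-6))
```

One gradient-descent step is then `w, b = w - alpha * dj_dw, b - alpha * dj_db`. Both gradients are computed from the old parameters before either is updated, which is exactly the simultaneous update in the rules above.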

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
```


## Logistic Regression Implementation

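```python
class LogisticRegression:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, lr=0.001, n_iters=1000, threshold=0.5):
        self.lr = lr                # learning rate (alpha)
        self.n_iters = n_iters      # number of gradient descent steps
        self.threshold = threshold  # probability cut-off for predicting class 1
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        # Gradient descent
        for _ in range(self.n_iters):
            # Model prediction f_wb(x^(i)) for all training examples at once
            y_predict = self._sigmoid(np.dot(X, self.weights) + self.bias)

            # Partial derivatives of J(w, b) with respect to w and b
            dw = (1 / n_samples) * np.dot(X.T, y_predict - y)
            db = (1 / n_samples) * np.sum(y_predict - y)

            # Simultaneous update of all parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

        return self

    def predict(self, X):
        y_predict = self._sigmoid(np.dot(X, self.weights) + self.bias)
        # Class 1 where the predicted probability exceeds the threshold, else 0
        return (y_predict > self.threshold).astype(int)

    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
```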
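`_sigmoid` computes `np.exp(-z)`, which overflows for very negative `z`. In float64 that happens roughly below `z = -709`. NumPy reports this as a `RuntimeWarning`, even though `1 / (1 + inf)` still evaluates to the correct value of 0. The sketch below shows one common workaround, which is an alternative rather than the notebook's own method. It only ever exponentiates `-|z|`, so the exponential stays in $(0, 1]$ and cannot overflow. The function names are illustrative.

```python
import warnings
import numpy as np

def sigmoid_naive(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_stable(z):
    """Logistic function that never exponentiates a positive number."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))                 # always in (0, 1], so it cannot overflow
    # For z >= 0: 1 / (1 + e^-z).  For z < 0: e^z / (1 + e^z), the same value rearranged.
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])

with warnings.catch_warnings():
    warnings.simplefilter("error")         # turn warnings into exceptions for this demo
    try:
        sigmoid_naive(z)
    except RuntimeWarning as exc:
        print("naive version:", exc)
    print("stable version:", sigmoid_stable(z))
```

To use this in the class, `_sigmoid` could simply return `sigmoid_stable(z)`. The predictions are unchanged, but the overflow warnings disappear.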

## Generate and Visualize Random Dataset

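```python
# With a single feature it must be the informative one, and make_classification then
# needs n_redundant=0 and n_clusters_per_class=1 (the defaults raise a ValueError).
X, y = make_classification(n_samples=100, n_features=1, n_informative=1,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
# train_test_split returns X_train, X_test, y_train, y_test (in that order)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

fig, ax = plt.subplots(figsize=(8, 6))  # a Figure has no scatter(); plot on its Axes
ax.scatter(X[:, 0], y)
plt.show()
```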