# Implementing _logistic regression_ using Python and `numpy`


In [1]:
import numpy as np

# Logistic function
The logistic model is defined as follows.
$$
p(X) = \frac{\exp(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}\tag{4.6, logreg_prob}
$$
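Writing $\beta x_i$ for the linear predictor $\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$ (the dot product of $\beta$ with $x_i$ prefixed by a constant 1), this simplifies, for substitution into (i) below, to: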
$$
p(x_i) = \frac{\exp(\beta x_i)}{1+\exp(\beta x_i)}
$$
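A quick numerical check of that substitution, as a sketch with made-up numbers (`beta`, `x_i` and the helper `linear_predictor` are hypothetical, not part of the notebook): prepending a constant 1 to $x_i$ makes the single dot product $\beta x_i$ equal to the expanded linear predictor, and the ratio in (4.6) equals the equivalent form $1/(1+\exp(-\beta x_i))$.

```python
import numpy as np

def linear_predictor(beta, x_i):
    '''Hypothetical helper: beta x_i with a constant 1 prepended to x_i.'''
    return np.dot(beta, np.concatenate(([1.0], x_i)))

beta = np.array([0.5, -1.0, 2.0])   # beta_0 (intercept), beta_1, beta_2
x_i = np.array([3.0, 1.5])          # one observation with p = 2 predictors

z = linear_predictor(beta, x_i)
z_expanded = beta[0] + beta[1] * x_i[0] + beta[2] * x_i[1]
print(z, z_expanded)                # both 0.5

p = np.exp(z) / (1 + np.exp(z))     # ratio form of (4.6)
print(p, 1 / (1 + np.exp(-z)))      # same probability, about 0.6225
```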

In [2]:
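def logistic(x):
    '''Map a linear predictor (scalar or array) to a probability in (0, 1).'''
    # Equivalent to exp(x) / (1 + exp(x)), but avoids inf/inf = nan for large positive x
    return 1 / (1 + np.exp(-x))

def logreg_prob(x, coeff, intercept):
    '''Return predicted probabilities in a one-versus-rest setting.

    x : (n, p) predictors; coeff : (k, p) or (p,); intercept : (k,) or scalar.
    Returns an (n, k) array for k > 1 models, else an (n,) array.
    '''
    return logistic(np.dot(x, coeff.T).squeeze() + intercept)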

# Likelihood function
The likelihood of the parameter set $\beta$, i.e. how plausible the observed responses are under the logistic model $p(x)$ with those parameters, is:
$$
\require{action}
\mathscr{l}(\beta) = \prod_{\texttip{i:y_i=1}{for i such that y_i = 1}}p(x_i)\prod_{\texttip{i':y_{i'}=0}{for i' such that y_i' = 0}}(1-p(x_{i'}))\tag{4.5}\\
$$
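Since each $y_i$ is either 0 or 1, both products can be merged into a single product over all $n$ observations (a product of Bernoulli probabilities):
$$
\mathscr{l}(\beta) = \prod_{i=1}^n[p(x_i)]^{y_i}[1-p(x_i)]^{1-y_i}
$$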
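To see that nothing is lost in the merge, here is a small sketch with hypothetical probabilities and responses (not taken from the notebook): for $y_i = 1$ the Bernoulli factor reduces to $p(x_i)$, for $y_i = 0$ to $1 - p(x_i)$, so both forms give the same number.

```python
import numpy as np

# Hypothetical fitted probabilities p(x_i) and observed responses y_i
p = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])

# (4.5): product of p over y_i = 1 times product of (1 - p) over y_i = 0
split_form = np.prod(p[y == 1]) * np.prod(1 - p[y == 0])

# Bernoulli form: one product over all observations
bernoulli_form = np.prod(p**y * (1 - p)**(1 - y))

print(split_form, bernoulli_form)   # both equal 0.9 * 0.7 * 0.8 * 0.6 = 0.3024 (up to rounding)
```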
As $\log(x)$ is monotonically increasing, maximising $\mathscr{l}(\beta)$ is equivalent to maximising $\log\mathscr{l}(\beta)$, which may be simplified:
$$
\begin{align}
\log\mathscr{l}(\beta) & = \log\left(\prod_{i=1}^n[p(x_i)]^{y_i}[1-p(x_i)]^{1-y_i}\right)\\
& = \sum_{i=1}^n\log\left([p(x_i)]^{y_i}[1-p(x_i)]^{1-y_i}\right)\\
& = \sum_{i=1}^n\left(y_i\log p(x_i)+(1-y_i)\log[1-p(x_i)]\right)\tag{i}\\
& = \sum_{i=1}^n\left(y_i\log\frac{\exp(\beta x_i)}{1+\exp(\beta x_i)}+(1-y_i)\log\left(1-\frac{\exp(\beta x_i)}{1+\exp(\beta x_i)}\right)\right)\\
& = \sum_{i=1}^n\left(y_i\log\frac{\exp(\beta x_i)}{1+\exp(\beta x_i)}+(1-y_i)\log\frac{1}{1+\exp(\beta x_i)}\right)\\
& = \sum_{i=1}^n\left(y_i[\beta x_i-\log(1 + \exp(\beta x_i))]+(1-y_i)[0-\log(1 + \exp(\beta x_i))]\right)\\
& = \sum_{i=1}^n\left(y_i\beta x_i-y_i\log(1 + \exp(\beta x_i))-\log(1 + \exp(\beta x_i))+y_i\log(1 + \exp(\beta x_i))\right)\\
& = \sum_{i=1}^n[y_i\beta x_i-\log(1 + \exp(\beta x_i))]
\end{align}
$$
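The last line of the derivation translates directly into numpy. The sketch below (the function name `log_likelihood` and the random data are hypothetical, not part of the notebook) evaluates the simplified form and checks it against form (i); it uses `np.logaddexp(0, z)` for $\log(1+\exp(z))$ so that large $z$ does not overflow.

```python
import numpy as np

def log_likelihood(X_full, y, beta):
    '''Simplified log-likelihood: sum_i [y_i beta x_i - log(1 + exp(beta x_i))].

    X_full : (n, p+1) predictors with a leading column of ones.
    '''
    z = X_full @ beta                       # beta x_i for every observation
    # np.logaddexp(0, z) = log(exp(0) + exp(z)) = log(1 + exp(z)), without overflow
    return np.sum(y * z - np.logaddexp(0, z))

rng = np.random.default_rng(0)
X_full = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 2))))
y = rng.integers(0, 2, size=20)
beta = np.array([0.3, -0.8, 1.2])

# Form (i) from the derivation, evaluated directly from the probabilities
p = 1 / (1 + np.exp(-(X_full @ beta)))
form_i = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(np.isclose(log_likelihood(X_full, y, beta), form_i))  # True
```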

To find the maximum of $\log\mathscr{l}(\beta)$ by gradient ascent, we need its gradient with respect to $\beta$. Since $\log\mathscr{l}(\beta)$ is concave, any point where this gradient vanishes is a global maximum (no finite maximum exists if the classes are perfectly separable). Treating $x_i$ as the predictor vector prefixed by a constant 1, the gradient simplifies to:
$$
\begin{align}
\frac{d}{d\beta}\log\mathscr{l}(\beta) & = \frac{d}{d\beta}\sum_{i=1}^n[y_i\beta x_i-\log(1 + \exp(\beta x_i))]\\
& = \sum_{i=1}^n\left(y_i x_i-\frac{\frac{d}{d\beta}[1+\exp(\beta x_i)]}{1 + \exp(\beta x_i)}\right)\\
& = \sum_{i=1}^n\left(y_i x_i-\frac{0+\exp(\beta x_i)x_i}{1 + \exp(\beta x_i)}\right)\\
& = \sum_{i=1}^n\left(y_i -\frac{\exp(\beta x_i)}{1 + \exp(\beta x_i)}\right) x_i\\
& = \sum_{i=1}^n[y_i - p(x_i)] x_i\tag{log_likelihood_gradient}
\end{align}
$$
Further reading: [Why the gradient is the direction of steepest ascent](https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/gradient-and-directional-derivatives/v/why-the-gradient-is-the-direction-of-steepest-ascent)
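A derived gradient is easy to get subtly wrong, so a finite-difference check is a cheap safeguard. In this sketch (all function names and the random data are hypothetical), the analytic result $\sum_i [y_i - p(x_i)] x_i$ is compared with central differences of the simplified log-likelihood.

```python
import numpy as np

def log_likelihood(X_full, y, beta):
    '''Simplified log-likelihood sum_i [y_i beta x_i - log(1 + exp(beta x_i))].'''
    z = X_full @ beta
    return np.sum(y * z - np.logaddexp(0, z))

def analytic_gradient(X_full, y, beta):
    '''sum_i (y_i - p(x_i)) x_i, the result of the derivation.'''
    p = 1 / (1 + np.exp(-(X_full @ beta)))
    return (y - p) @ X_full

def numerical_gradient(f, beta, eps=1e-6):
    '''Central finite differences, one coordinate of beta at a time.'''
    grad = np.zeros_like(beta)
    for j in range(len(beta)):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X_full = np.hstack((np.ones((30, 1)), rng.normal(size=(30, 2))))
y = rng.integers(0, 2, size=30)
beta = np.array([-0.2, 0.5, 1.0])

num = numerical_gradient(lambda b: log_likelihood(X_full, y, b), beta)
print(np.allclose(analytic_gradient(X_full, y, beta), num, atol=1e-5))  # True
```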

In [3]:
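def log_likelihood_gradient(x, y, beta):
    '''Gradient of the log-likelihood w.r.t. beta: sum_i (y_i - p(x_i)) x_i.

    x : (n, p) or (n, p+1) predictors; a column of ones is prepended if missing.
    y : (n,) binary response (0/1 or boolean); beta : (p+1,), beta[0] is the intercept.
    '''
    # beta has one more entry than x has columns -> x lacks the constant term
    if len(beta) - x.shape[1] == 1:
        x = np.hstack((np.ones((len(x), 1)), x))

    return np.dot(y - logreg_prob(x[:, 1:], beta[1:], beta[0]), x)

def gradient_ascent(f, x_init, learning_rate, iterations):
    '''
    Fixed-step gradient ascent.

    f : gradient of the function to ascend
    x_init : starting point of the ascent
    Returns the point reached after `iterations` steps.
    '''
    x = np.array(x_init, dtype=float)  # copy, so the caller's array is not modified in place
    for _ in range(iterations):
        x += learning_rate * f(x)
    return x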
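`gradient_ascent` is generic: it only needs a gradient function. A toy sketch on a one-dimensional concave function (the objective and the learning rates are made up for illustration) shows the fixed-step update converging, and what happens when the step is too large for the curvature.

```python
import numpy as np

def gradient_ascent(f, x_init, learning_rate, iterations):
    '''Fixed-step gradient ascent, as in the cell above.'''
    x = np.array(x_init, dtype=float)
    for _ in range(iterations):
        x += learning_rate * f(x)
    return x

# Toy concave objective g(x) = -(x - 3)^2, maximised at x = 3; its gradient is -2(x - 3)
grad_g = lambda x: -2 * (x - 3)

print(gradient_ascent(grad_g, [0.0], 0.1, 100))   # close to [3.]
print(gradient_ascent(grad_g, [0.0], 1.1, 100))   # step too large: the iterates diverge
```

Copying `x_init` with `np.array(..., dtype=float)` also means an integer starting point does not make the in-place `+=` fail.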

# Fitting: training predictors + training response --> model parameters

In [4]:
def logreg_fit(X, y, learning_rate=0.0001, iterations=3000):
    '''
    Fit logistic regression by maximum likelihood, iterating over
        each target class (one-versus-rest).
    If classification is binary, only the larger class label is fitted
        as the positive class.

    Return coefficients (k, p), intercepts (k,).
    '''
    X_full = np.hstack((np.ones((len(X), 1)), X))  # add constant term once
    beta_stack = np.empty((0, X_full.shape[1]))

    targets = np.unique(y)
    if len(targets) == 2:
        targets = targets[1:2]

    for k in targets:
        y_k = (y == k).astype(float)  # 1 for class k, 0 for all other classes
        beta_init = np.zeros(X_full.shape[1])
        gradient_function = lambda b: log_likelihood_gradient(X_full, y_k, b)
        beta = gradient_ascent(gradient_function, beta_init, learning_rate, iterations)

        beta_stack = np.append(beta_stack, beta.reshape(1, -1), axis=0)

    return beta_stack[:, 1:], beta_stack[:, 0]
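Because the one-versus-rest models are fitted independently, the loop over classes can also be written as a single matrix update. A sketch under that observation (`logreg_fit_all` and `fit_one` are hypothetical names, and the random data is made up): with a one-hot matrix $Y$ and a coefficient matrix $B$ holding one $\beta$ per column, the gradient (log_likelihood_gradient) for every class at once is $X^\top(Y - P)$, and the result matches the per-class loop.

```python
import numpy as np

def logreg_fit_all(X, y, learning_rate=0.0001, iterations=3000):
    '''Hypothetical vectorised variant: all one-versus-rest models updated together.

    Returns coefficients (k, p) and intercepts (k,); intended for k >= 3 classes,
    since for binary y it fits both classes rather than only the larger label.
    '''
    X_full = np.hstack((np.ones((len(X), 1)), X))       # add constant term
    targets = np.unique(y)
    Y = (y[:, None] == targets[None, :]).astype(float)  # (n, k) one-hot indicator matrix
    B = np.zeros((X_full.shape[1], len(targets)))       # column j holds beta for class j
    for _ in range(iterations):
        P = 1 / (1 + np.exp(-(X_full @ B)))             # (n, k) one-vs-rest probabilities
        B += learning_rate * X_full.T @ (Y - P)         # column j: sum_i (y_ij - p_ij) x_i
    return B[1:].T, B[0]

def fit_one(X_full, y_k, learning_rate, iterations):
    '''Per-class loop body of logreg_fit, written out directly.'''
    beta = np.zeros(X_full.shape[1])
    for _ in range(iterations):
        p = 1 / (1 + np.exp(-(X_full @ beta)))
        beta += learning_rate * (y_k - p) @ X_full
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = rng.integers(0, 3, size=150)

coeff_all, intercept_all = logreg_fit_all(X, y)
X_full = np.hstack((np.ones((150, 1)), X))
betas = np.array([fit_one(X_full, (y == k).astype(float), 0.0001, 3000) for k in np.unique(y)])
print(np.allclose(coeff_all, betas[:, 1:]), np.allclose(intercept_all, betas[:, 0]))  # True True
```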

# Predicting: test predictors -- [model(parameters)] --> test response

In [5]:
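def logreg_predict(x, coeff, intercept):
    '''
    Predict response class based on predicted probability of a
        logistic regression fit.
    Returns class indices, so labels are assumed to be 0..k-1 (binary: 0/1).
    '''

    if len(coeff) == 1:
        # binary: positive class if its probability exceeds 0.5
        return (logreg_prob(x, coeff, intercept) > 0.5).astype(int)
    else:
        # one-versus-rest: pick the class whose model gives the highest probability
        return np.argmax(logreg_prob(x, coeff, intercept), axis=1)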
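`logreg_predict` returns class indices, which coincide with the labels only when those are themselves 0..k-1, as in the iris example. A hedged sketch for arbitrary labels (the helper `predict_labels` and the sample probabilities are hypothetical) indexes the sorted labels from `np.unique(y)` instead:

```python
import numpy as np

def predict_labels(prob, targets):
    '''Hypothetical helper: turn one-vs-rest probabilities into original class labels.

    prob : (n,) positive-class probabilities (binary) or (n, k) one-vs-rest probabilities.
    targets : sorted unique labels, i.e. np.unique(y) from the training data.
    '''
    if prob.ndim == 1:
        # binary: logreg_fit models targets[1] as the positive class
        return np.where(prob > 0.5, targets[1], targets[0])
    return targets[np.argmax(prob, axis=1)]   # index -> label

targets = np.array(['setosa', 'versicolor', 'virginica'])
prob = np.array([[0.8, 0.3, 0.1],
                 [0.2, 0.1, 0.6]])
print(predict_labels(prob, targets))                          # ['setosa' 'virginica']
print(predict_labels(np.array([0.7, 0.4]), np.array([2, 5])))  # [5 2]
```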

# Test implementation

## Load sample dataset

In [6]:
from sklearn import datasets

data = datasets.load_iris()
X = data.data[:,:-1]  # first three features (petal width is dropped)
y = data.target
# y = (data.target > 0).astype(int)  # uncomment this for binary classification
print("n_observations (n): {}".format(X.shape[0]))
print("n_predictors (p): {}".format(X.shape[1]))
print("n_class (k): {}".format(len(np.unique(y))))

n_observations (n): 150
n_predictors (p): 3
n_class (k): 3


### Fit

In [7]:
coeff, intercept = logreg_fit(X, y)

### Fit with sklearn

In [8]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(solver='liblinear')  # one-versus-rest, as in the output below; newer sklearn defaults to lbfgs
clf.fit(X,y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

## Test against sklearn: fitting

In [9]:
print('sklearn model parameters:', clf.coef_, clf.intercept_)
print('my model parameters:', coeff, intercept)

sklearn model parameters: [[ 0.3763905   1.54657021 -2.56915316]
 [ 0.60278842 -1.7292152  -0.07309094]
 [-1.82898401 -1.25199132  3.36769776]] [ 0.28215851  1.18099044 -1.37108097]
my model parameters: [[ 0.3988074   1.57142293 -2.64498051]
 [ 0.38331814 -1.23543256  0.08002893]
 [-1.45550443 -1.25837044  2.77768663]] [ 0.29570172  0.38751133 -0.79073748]
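The parameters differ because `LogisticRegression` regularises by default (`penalty='l2'`, `C=1.0` in the output above) and uses a different solver, whereas `logreg_fit` maximises the unpenalised likelihood for a fixed number of iterations. A sketch of an L2-penalised gradient ascent, assuming the objective $\log\mathscr{l}(\beta) - \frac{1}{2C}\sum_{j\ge1}\beta_j^2$ with an unpenalised intercept (sklearn's liblinear solver also penalises the intercept, so this is not expected to reproduce its numbers exactly), shows how a smaller `C` shrinks the coefficients. The names `penalised_gradient` and `fit_penalised` and the synthetic data are hypothetical.

```python
import numpy as np

def penalised_gradient(X_full, y, beta, C):
    '''Gradient of log-likelihood - ||beta[1:]||^2 / (2C); the intercept beta[0] is not penalised.'''
    p = 1 / (1 + np.exp(-(X_full @ beta)))
    penalty = beta / C
    penalty[0] = 0.0                      # leave the intercept unpenalised
    return (y - p) @ X_full - penalty

def fit_penalised(X, y, C, learning_rate=0.001, iterations=5000):
    '''Hypothetical binary fit by gradient ascent on the penalised objective.'''
    X_full = np.hstack((np.ones((len(X), 1)), X))
    beta = np.zeros(X_full.shape[1])
    for _ in range(iterations):
        beta += learning_rate * penalised_gradient(X_full, y, beta, C)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X @ np.array([1.5, -1.0]) + rng.normal(size=100) > 0).astype(float)

for C in (100.0, 1.0, 0.01):
    beta = fit_penalised(X, y, C)
    print(C, np.round(beta, 3), np.linalg.norm(beta[1:]))  # norm shrinks as C decreases
```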


## Test against sklearn: predictions 

In [10]:
my_predict = logreg_predict(X, coeff, intercept)
sklearn_predict = clf.predict(X)

print('Predictions match:', (my_predict == sklearn_predict).all())

Predictions match: True
