# Logistic regression
[Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is, despite its name, a model used for classification. The model is very simple; in fact, it is a linear classifier.

However, when you start exploring your data, the simplest model is a good place to begin.

It is easy to program, fast to train, and interpretable: its weights tell you what influence each feature has on the dependent variable.
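
A minimal sketch of both claims, assuming made-up weights `w` and points (they are not taken from the notebook): thresholding the predicted probability at 0.5 gives the same decision as the sign of the linear score, so the decision boundary is a straight line, and raising one feature by one unit shifts the log-odds by exactly that feature's weight.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical weights [bias, w1, w2] and a few 2-D points, for illustration only
w = np.array([-1.0, 2.0, -0.5])
points = np.array([[0.1, 0.4], [0.9, 0.2], [0.6, 0.0], [0.3, 1.0]])
X = np.c_[np.ones(len(points)), points]  # prepend the bias column

score = X @ w          # linear score, i.e. the log-odds of class 1
prob = sigmoid(score)  # predicted probability of class 1

# thresholding the probability at 0.5 is the same as taking the sign of the score,
# so the decision boundary w0 + w1*x1 + w2*x2 = 0 is a straight line
pred_from_prob = prob >= 0.5
pred_from_score = score >= 0
print(np.array_equal(pred_from_prob, pred_from_score))  # True

# interpretability: raising x1 by one unit shifts the log-odds by w[1]
log_odds = lambda p: np.log(p / (1 - p))
x = np.array([1.0, 0.3, 0.7])
x_plus = x + np.array([0.0, 1.0, 0.0])
delta = log_odds(sigmoid(x_plus @ w)) - log_odds(sigmoid(x @ w))
print(np.isclose(delta, w[1]))  # True
```

Equivalently, a one-unit increase of feature j multiplies the odds of class 1 by exp(w_j).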

![day79-logistic_regression](resource/day79-logistic_regression.png)

Green and blue: correct classification; red: misclassified.

Moreover, the input data can be manually extended with polynomial features, which turns logistic regression into a quite useful non-linear classifier that requires almost no assumptions about the data.
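
To illustrate, here is a small sketch in which the circle-shaped data and the hand-picked weight vector `w` are illustrative assumptions (the weights are not learned). Points inside a circle cannot be separated by a line in the original (x, y) plane, but once squared features are added, a rule that is linear in the extended features classifies them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, size=(2, 500))

# target: 1 inside the circle of radius 0.5, 0 outside (not linearly separable in x, y)
target = (x ** 2 + y ** 2 <= 0.25).astype(int)

# manually extend the inputs with polynomial (squared) features
features = np.c_[np.ones_like(x), x, y, x ** 2, y ** 2]

# hand-picked weights: 0.25 - x^2 - y^2 >= 0 exactly when the point lies inside the circle
w = np.array([0.25, 0.0, 0.0, -1.0, -1.0])
pred = (features @ w >= 0).astype(int)
print(np.mean(pred == target))  # 1.0
```

Logistic regression would have to find such weights by training; the notebook below does exactly that for a parabola-shaped boundary.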

```python
import numpy as np
from bokeh.plotting import figure, show, output_notebook
```

## logistic regression
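
For reference, the `logistic_regression` function minimizes the average cross-entropy (log) loss with batch gradient descent. In the notation of the code (design matrix $X$ with $m$ rows, targets $Y$, weights $W$, learning rate $\mathrm{lr}$), one step computes:

$$
h = \sigma(XW), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(W) = -\frac{1}{m}\left[Y^\top \log h + (1 - Y)^\top \log(1 - h)\right]
$$

$$
\nabla_W J = \frac{1}{m} X^\top (h - Y), \qquad W \leftarrow W - \mathrm{lr}\,\nabla_W J
$$

The simple form of the gradient comes from $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which cancels the denominators produced by differentiating the logarithms.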

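```python
def logistic_regression(X, Y, W, lr=0.001, steps=1000):
    """Batch gradient descent on the cross-entropy loss.

    W is updated in place, so repeated calls continue training.
    Returns the loss (computed before the last update) and the predicted probabilities.
    """
    m = len(Y)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    
    for _ in range(steps):
        # prediction: probability of class 1 for every sample
        hypothesis = sigmoid(X @ W)
        
        # keep probabilities away from 0 and 1 so that log() stays finite
        hypothesis = np.clip(hypothesis, 1e-5, 1 - 1e-5)
        
        # cross-entropy loss, its gradient, and one gradient-descent step
        loss = -1 / m * (Y @ np.log(hypothesis) + (1 - Y) @ np.log(1 - hypothesis))
        gradient = 1 / m * (X.T @ (hypothesis - Y))
        W -= lr * gradient
    
    # last loss & predictions with the updated weights
    return loss, sigmoid(X @ W)
```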
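
A quick way to confirm that the gradient formula matches the loss is a finite-difference check. The sketch below is an illustration on small random data; the helpers `loss_fn`, `analytic_gradient` and `numeric_gradient` are hypothetical names, and `loss_fn` omits the clipping used in `logistic_regression`.

```python
import numpy as np

def loss_fn(X, Y, W):
    # average cross-entropy loss (same formula as logistic_regression, without clipping)
    h = 1 / (1 + np.exp(-(X @ W)))
    return -np.mean(Y * np.log(h) + (1 - Y) * np.log(1 - h))

def analytic_gradient(X, Y, W):
    # closed-form gradient: X^T (h - Y) / m
    h = 1 / (1 + np.exp(-(X @ W)))
    return X.T @ (h - Y) / len(Y)

def numeric_gradient(X, Y, W, eps=1e-6):
    # central finite differences, one weight at a time
    grad = np.zeros_like(W)
    for j in range(len(W)):
        step = np.zeros_like(W)
        step[j] = eps
        grad[j] = (loss_fn(X, Y, W + step) - loss_fn(X, Y, W - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]
Y = (rng.random(50) < 0.5).astype(float)
W = rng.normal(size=3)

diff = np.max(np.abs(analytic_gradient(X, Y, W) - numeric_gradient(X, Y, W)))
print(diff < 1e-6)  # True
```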

## data

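```python
n = 10000
# random points uniformly distributed in the unit square
x_, y_ = np.random.rand(2, n)
# design matrix: bias column, linear terms and squared (polynomial) terms
X = np.c_[np.ones(n), x_, y_, x_ ** 2, y_ ** 2]
# Y: target class, 1 where x - 0.5 <= (y - 0.5)^2, else 0
Y = (x_ - .5 <= (y_ - .5) ** 2).astype(int)
# weights (a float array, updated in place during training)
W = np.zeros(5)
```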
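
Why are these five features enough? Expanding the target condition (writing $x, y$ for `x_`, `y_`) shows that the true boundary is linear in the features $(1, x, y, x^2, y^2)$:

$$
x - 0.5 \le (y - 0.5)^2
\iff 0 \le y^2 - y + 0.25 - x + 0.5
\iff 0.75 - x - y + 0 \cdot x^2 + 1 \cdot y^2 \ge 0
$$

So the weight vector $W^* = (0.75, -1, -1, 0, 1)$, or any positive multiple of it, separates the two classes exactly. Since the classes are separable in this feature space, scaling such weights up keeps every prediction on the correct side while pushing the probabilities toward 0 and 1, so the unregularized loss can keep decreasing; this is consistent with the loss still shrinking after 10,000 steps below. The learned weights need not be proportional to $W^*$, because many weight vectors separate a finite sample.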

## train classification model

```python
for _ in range(10):
    loss, H = logistic_regression(X, Y, W, lr=5., steps=1000)
    print(loss)
```

```
0.09234048570649253
0.07344961653877012
0.06325616496816702
0.05674259777926254
0.05213704196723088
0.048660330164744664
0.045914138748199974
0.043672042159229194
0.041794952417991124
0.040192183512828085
```
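
Each of the ten calls resumes where the previous one stopped because `W -= lr * gradient` writes into the caller's NumPy array rather than creating a new one. A tiny demonstration with two hypothetical helpers, `update` and `update_copy`:

```python
import numpy as np

def update(W, gradient, lr=0.1):
    # in-place subtraction modifies the array the caller passed in
    W -= lr * gradient

def update_copy(W, gradient, lr=0.1):
    # rebinding the local name creates a new array; the caller's array is untouched
    W = W - lr * gradient
    return W

W = np.zeros(3)
update(W, np.ones(3))
print(W)  # [-0.1 -0.1 -0.1]

V = np.zeros(3)
update_copy(V, np.ones(3))
print(V)  # [0. 0. 0.]
```

This also means `W` must be a floating-point array: `np.zeros(5)` is float64, whereas an integer array would make the in-place float update raise a casting error.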


```python
print('accuracy', np.mean(Y == H.round()))
print('weights', W)
```

```
accuracy 0.9905
weights [ 24.03062333 -17.38877785 -37.91447882 -21.85021542  38.01297428]
```
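
Accuracy alone does not show which class the errors fall into. A small sketch of confusion counts follows; `confusion_counts` is a hypothetical helper and the toy labels and probabilities are made up. On the notebook's data it would be called as `confusion_counts(Y, H.round())`.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    # counts of true negatives, false positives, false negatives, true positives
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return tn, fp, fn, tp

# toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0])
probs = np.array([0.1, 0.7, 0.8, 0.4, 0.9, 0.2])
y_pred = probs.round()

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print(tn, fp, fn, tp)            # 2 1 1 2
print((tn + tp) / len(y_true))   # accuracy, same as np.mean(y_true == y_pred)
```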


## plot

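```python
output_notebook()

# Y + round(H): 0 = true negative (blue), 1 = misclassified (red), 2 = true positive (green)
palette = ['steelblue', 'red', 'lightgreen']
color = [palette[i] for i in (Y + H.round()).astype(int)]

plot = figure()
# scatter() draws circles by default; circle() with size= is deprecated in Bokeh 3.4+
plot.scatter(x_, y_, line_color='#c0c0c0', fill_color=color, alpha=.6, size=8)

show(plot)
```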
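
The color trick relies on adding the true label and the rounded prediction: the sum is 0 or 2 when they agree and 1 when they disagree in either direction. A toy check of that mapping, with made-up `Y` and `H` values:

```python
import numpy as np

palette = ['steelblue', 'red', 'lightgreen']

# toy targets and predicted probabilities, for illustration only
Y = np.array([0, 0, 1, 1])
H = np.array([0.2, 0.9, 0.3, 0.8])

# 0 = correct class 0, 1 = misclassified (either direction), 2 = correct class 1
codes = (Y + H.round()).astype(int)
color = [palette[i] for i in codes]
print(color)  # ['steelblue', 'red', 'red', 'lightgreen']
```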