# [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression)

### What is it?
Logistic regression is a type of generalized linear model (GLM) that models the probability of a categorical outcome.

### When do I use it?
Use it to predict or classify a categorical response variable (e.g. is a customer in danger of leaving? how drunk will I get tonight: low, medium, or high?) based on arbitrary predictor data.

### Why do I use it?
It is simple to set up, and it is less sensitive to outliers than fitting a linear regression directly to the class labels.

## Math intro
We want a model that outputs a probability for arbitrary values of the predictors. This means we need a function $f$ with $range(f) \subseteq [0,1]$.

### Logistic function
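The logistic function is a sigmoid (S-shaped) function defined as:

$\sigma(x) = \frac{L}{1+e^{-k(x-x_0)}}$

where $L$ is the maximum value of the curve, $k$ is its steepness and $x_0$ is its midpoint. It looks like this: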
<img src="./logistic_curve.png" />

If we set the parameters to $L = 1$, $k = 1$, $x_0 = 0$ and take a linear combination of the data and the regression coefficients as the argument, we get:

$\sigma(X\beta) = \frac{1}{1+e^{-X\beta}}$

... a function of a linear combination of the data with $range(\sigma) = (0,1) \subset [0,1]$.
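
As a minimal sketch of the formula above, the following NumPy snippet evaluates $\sigma(X\beta)$ for a small design matrix. The names `sigmoid`, `X_demo` and `beta_demo` and all numeric values are illustrative assumptions, not part of the example later in this notebook. The column of ones plays the role of the intercept.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical design matrix: a column of ones (intercept) and one predictor.
X_demo = np.array([[1.0, -2.0],
                   [1.0,  0.0],
                   [1.0,  3.0]])
# Hypothetical regression coefficients (intercept, slope).
beta_demo = np.array([0.5, 1.0])

# One probability per row of X_demo, each strictly between 0 and 1.
probabilities = sigmoid(X_demo @ beta_demo)
print(probabilities)
```

Whatever values the linear combination takes, the result always stays inside the unit interval, which is what lets us read it as a probability.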

### [Generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model)
GLMs are models of the form

$E[Y] = g^{-1}(X\beta),$

where: 
* E ... expected value
* Y ... response variable with a distribution from the exponential family (Normal, Poisson, Bernoulli, Binomial, ...)
* X ... predictor data matrix
* $\beta$ ... vector of regression coefficients
* g ... link function

Our response variable $Y$ takes values in $\{0,1\}$, each with a certain probability. Thus it is distributed as a [Bernoulli random variable](https://en.wikipedia.org/wiki/Bernoulli_distribution), for which $E[Y] = P(Y = 1)$. We need to find a link function for our model...

### Logit function
The logit function is the inverse of the standard logistic function. For $x \in (0,1)$ it is defined as:

$logit(x) = \ln\left(\frac{x}{1-x}\right)$

If we set the link function $g(x) = logit(x)$, then $g^{-1} = \sigma$ and we obtain a model with $E[Y] = \sigma(X\beta)$.
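
The inverse relationship between the two functions can be checked numerically. The snippet below is a small sketch; the helper names `sigmoid` and `logit` and the grid of test values are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return np.log(p / (1.0 - p))

# Arbitrary grid of values for the linear predictor (an assumption).
z = np.linspace(-5.0, 5.0, 11)

# Applying the logit to the logistic output should give back z.
round_trip = logit(sigmoid(z))
print(np.allclose(round_trip, z))
```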

## Model fitting
Once we have a model that fits the GLM formalism, we can use GLM methods to find the optimal $\beta$. This is usually done by maximum likelihood estimation, which is commonly computed with the [Newton-Raphson algorithm](https://en.wikipedia.org/wiki/Newton%27s_method), an iterative numerical method for finding a root of an equation (here, the point where the gradient of the log-likelihood is zero).
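
To make the fitting step concrete, here is a rough sketch of Newton-Raphson applied to the Bernoulli log-likelihood. It is not the routine scikit-learn runs internally. The function name `fit_logistic_newton`, the simulated data and the "true" coefficients are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, tol=1e-8):
    """Maximum likelihood fit of logistic regression by Newton-Raphson.

    X: (n, p) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 responses.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        gradient = X.T @ (y - p)            # gradient of the log-likelihood
        weights = p * (1.0 - p)             # Bernoulli variances
        hessian = -(X.T * weights) @ X      # Hessian of the log-likelihood
        step = np.linalg.solve(hessian, gradient)
        beta = beta - step                  # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with hypothetical "true" coefficients (intercept, slope).
rng = np.random.default_rng(0)
x_sim = rng.normal(size=200)
X_sim = np.column_stack([np.ones_like(x_sim), x_sim])
beta_true = np.array([-0.5, 2.0])
y_sim = (rng.random(200) < sigmoid(X_sim @ beta_true)).astype(float)

beta_hat = fit_logistic_newton(X_sim, y_sim)
print(beta_hat)
```

At the solution the gradient `X.T @ (y - p)` is numerically zero, which is the root that Newton-Raphson looks for. The sketch assumes the classes are not perfectly separable; with separable data the maximum likelihood estimate does not exist and the iteration diverges.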

## Questions
1] Why the logistic function in particular?

## Simple Example - Competitor mismatch detection

In [2]:
# import needed modules
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

In [3]:
# create the predictor data of distance metric
# single data point ~ single competitorXmaterial distance from group median
X = pd.DataFrame([0.034,0.002,0.43,0.562,0.01,0.003,0,0.34,0.012,0.87,0.523,0.14,0.164,0.08, 3.213])
# create the response data ... 1 ~ detected mismatch; 0 ~ datapoint is OK 
Y = np.array([0,0,1,1,0,0,0,1,0,1,1,0,0,1,1])

In [4]:
# create model instance. Large C means low regularization.
logit_model = LogisticRegression(C=1e5, fit_intercept = True)

In [5]:
# fit the model
logit_model.fit(X,Y)
# show how well it fits the data: mean accuracy on the training set, 1 => perfect fit
logit_model.score(X,Y)

0.93333333333333335

In [6]:
# predict & print probability of mismatch for new input data
X_new = pd.DataFrame([0.1,0.18,0.3])
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
for prediction in logit_model.predict_proba(X_new.to_numpy()):
    # print probability: P[y == True]
    print(prediction[1])

0.179267147228
0.449325463722
0.854891103176
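
The probabilities above are the logistic function applied to the fitted linear predictor. The following sketch refits the same data under separate names (`X_check`, `Y_check`, `check_model` and `X_check_new` are introduced here so the notebook variables stay untouched) and compares a manual computation against `predict_proba`. The exact fitted values may vary with the scikit-learn version and solver, so no output is shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as in the cells above, as a plain (n, 1) NumPy array.
X_check = np.array([0.034, 0.002, 0.43, 0.562, 0.01, 0.003, 0, 0.34, 0.012,
                    0.87, 0.523, 0.14, 0.164, 0.08, 3.213]).reshape(-1, 1)
Y_check = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1])

check_model = LogisticRegression(C=1e5, fit_intercept=True).fit(X_check, Y_check)

X_check_new = np.array([[0.1], [0.18], [0.3]])
# Linear predictor: x * coefficient + intercept, shape (3, 1).
linear_predictor = X_check_new @ check_model.coef_.T + check_model.intercept_
# Logistic function of the linear predictor gives P[y == 1].
manual = (1.0 / (1.0 + np.exp(-linear_predictor))).ravel()

print(np.allclose(manual, check_model.predict_proba(X_check_new)[:, 1]))
```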


In [7]:
# helper function 1: the logistic (sigmoid) function, i.e. the inverse of the logit
def logistic_function(x):
    return 1 / (1 + np.exp(-x))

# helper function 2: plot the data points and the fitted sigmoid
def regression_plot(model, X, Y, data_range=(-.5, 1), grid_step=300):
    # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
    x_values = X.to_numpy().ravel()
    # extend the x-axis so the largest data point is visible
    # (computed locally so the default argument is never mutated)
    x_min = data_range[0]
    x_max = max(data_range[1], x_values.max() * 1.05)
    p1 = go.Scatter(x=x_values, y=Y,
                    mode='markers',
                    marker=dict(color='black'),
                    showlegend=False
                   )
    support = np.linspace(x_min, x_max, grid_step)
    # fitted probability P[y == 1] at each grid point
    probability = logistic_function(support * model.coef_ + model.intercept_).ravel()

    p2 = go.Scatter(x=support, y=probability,
                    mode='lines',
                    line=dict(color='red', width=3),
                    name='Logistic Regression Model')

    layout = go.Layout(xaxis=dict(title='x', range=[x_min, x_max],
                                  zeroline=False),
                       yaxis=dict(title='y', range=[-0.25, 1.25],
                                  zeroline=False))

    fig = go.Figure(data=[p1, p2], layout=layout)
    init_notebook_mode(connected=True)
    iplot(fig)

In [8]:
# plot the regression sigmoid & data with plotly
regression_plot(logit_model, X, Y)