# Logistic Regression
- Learn the sigmoid function and gradient descent/ascent.

## Cats and Dogs example
- We want to classify cats (label **0**) and dogs (label **1**). Let's denote the label set as $L=\{0,1\}$.
- We extract a set of feature vectors (**F**) from our dataset.
- Using logistic regression with L and F, we want to create a function that takes a new feature vector and outputs either 0 or 1.
- Instead of using a step function (non-differentiable), we use the sigmoid function:
$$
s(t) = 1/(1+e^{-t})
$$
- For a given image, say we extract the following feature vector:
$$
v = [v_0, v_1, v_2, \dots, v_n]
$$
- To perform classification using logistic regression, we multiply each feature vector value $v_i$ by a weight $w_i$ and take the sum. The three forms below are equivalent:
$$
x = \sum_{i}w_iv_i
$$
$$
x = w^Tv
$$
$$
x = w_{0}v_{0} + w_{1}v_{1} + w_{2}v_{2} + \cdots + w_{n}v_{n}
$$
- The $x$ value is then passed through the sigmoid function $s$, whose output is constrained such that:
$$
0 < s(x) < 1
$$
- The classified label will be:
$$
L = 
\begin{cases}
1 & \text{if } s(x) \ge 0.5\\
0 & \text{if } s(x) < 0.5
\end{cases}
$$
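
The classification rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the function names `sigmoid` and `predict` and the example weights and feature values are assumptions made up for this sketch.

```python
import numpy as np

def sigmoid(t):
    """Logistic sigmoid s(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))

def predict(w, v, threshold=0.5):
    """Return label 1 if s(w^T v) >= threshold, otherwise 0."""
    x = np.dot(w, v)  # weighted sum x = w^T v
    return 1 if sigmoid(x) >= threshold else 0

# Hypothetical weights and feature vector, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 2.0, 0.5])

x = np.dot(w, v)  # 0.5*1.0 - 1.0*2.0 + 2.0*0.5 = -0.5
print("x =", x)
print("s(x) =", sigmoid(x))
print("label =", predict(w, v))
```

Here $x = -0.5$, so $s(x)$ is roughly 0.378, which falls below 0.5 and gives label 0 (cat).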

## Gradient descent and ascent
- We use gradient descent to find a local minimum of a function, and gradient ascent to find a local maximum.
- Steps:
    - Extract feature vectors from all images in our dataset.
    - Initialize all weight entries w to 1.
    - Loop N times (or until convergence)
        - Calculate the gradient over the entire dataset.
        - Update the weight entries based on the current values of w, gradient, and learning rate $\alpha$.
    - Return weights w.
- The gradient is computed from our error E, which is:
$$
E = (L - s(F_{t} \times w))
$$
where $F_t$ holds the feature vectors of the training set and $L$ holds the corresponding training labels.
- We can then update the weight vector by using:
$$
w = w + (\alpha \times (F_{t}^{T} \times E))
$$
- Here the learning rate $\alpha$ controls the magnitude (i.e. size) of our step. Larger values of $\alpha$ take larger steps and learn faster, but can skip right over optimal values of w. Smaller values of $\alpha$ take smaller steps and learn slower, and may get stuck in local minima/maxima and miss the optimal values of w altogether.
- We start off at $w_{0}$, where we have trivially initialized our weights. Then, at each subsequent iteration, our algorithm moves closer and closer to the optimal w values marked at the center region of the figure.
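
The steps above can be sketched as a short batch gradient ascent loop. This is an illustrative sketch under stated assumptions, not a reference implementation: the toy two-cluster data, the added bias column, the function name `gradient_ascent`, and the values of `alpha` and `n_iters` are all made up for the example. The update uses $F_t^{T} E$, which is the gradient of the log-likelihood of the logistic model, so each step climbs that objective.

```python
import numpy as np

def sigmoid(t):
    """Logistic sigmoid s(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))

def gradient_ascent(F, L, alpha=0.001, n_iters=500):
    """Batch gradient ascent for logistic regression.

    F: (m, n) matrix of training feature vectors.
    L: (m,) vector of training labels in {0, 1}.
    """
    w = np.ones(F.shape[1])            # initialize all weight entries to 1
    for _ in range(n_iters):           # loop N times
        E = L - sigmoid(F @ w)         # error over the entire dataset
        w = w + alpha * (F.T @ E)      # step along the gradient
    return w

# Hypothetical toy data: two Gaussian clusters standing in for cats and dogs.
rng = np.random.default_rng(0)
cats = rng.normal(-2.0, 1.0, size=(50, 2))
dogs = rng.normal(2.0, 1.0, size=(50, 2))
F = np.vstack([cats, dogs])
F = np.hstack([np.ones((100, 1)), F])  # constant column acts as a bias term
L = np.concatenate([np.zeros(50), np.ones(50)])

w = gradient_ascent(F, L)
pred = (sigmoid(F @ w) >= 0.5).astype(int)
print("weights:", w)
print("training accuracy:", (pred == L).mean())
```

Raising `alpha` makes each update larger, which is where the overshooting behaviour described above can show up; a fixed iteration count is used here instead of a convergence test to keep the sketch short.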

## Applying logistic regression to face classification

```python
from __future__ import print_function
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn import datasets

# scikit-learn 0.18+ provides train_test_split in model_selection;
# fall back to the old cross_validation module on older versions
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

# grab a small subset of the Labeled Faces in the Wild dataset, then construct
# the training and testing splits (note: if this is your first time running this
# script it may take a while for the dataset to download, but once it has downloaded
# the data will be cached locally and subsequent runs will be substantially faster)
print("[INFO] fetching data...")
dataset = datasets.fetch_lfw_people(min_faces_per_person=70, funneled=True, resize=0.5)
(trainData, testData, trainLabels, testLabels) = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=42)
```

```text
[INFO] fetching data...
```


```python
# train a logistic regression classifier with default settings, then
# report per-class precision, recall, and F1 on the held-out test split
print("[INFO] training model...")
model = LogisticRegression()
model.fit(trainData, trainLabels)
print(classification_report(testLabels, model.predict(testData),
                            target_names=dataset.target_names))
```

```text
[INFO] training model...
                   precision    recall  f1-score   support

     Ariel Sharon       0.47      0.54      0.50        13
     Colin Powell       0.87      0.87      0.87        60
  Donald Rumsfeld       0.61      0.63      0.62        27
    George W Bush       0.91      0.88      0.90       146
Gerhard Schroeder       0.63      0.76      0.69        25
      Hugo Chavez       0.64      0.47      0.54        15
       Tony Blair       0.76      0.81      0.78        36

         accuracy                           0.80       322
        macro avg       0.70      0.71      0.70       322
     weighted avg       0.81      0.80      0.81       322
```

The run also emitted a warning that the solver hit its iteration limit before converging:

```text
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
```
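
The warning in the output suggests two remedies: raising `max_iter` or scaling the data. One way to apply both is sketched below, assuming a pipeline that standardizes the features before fitting. To stay self-contained it uses synthetic data from `make_classification` rather than the LFW faces, and `max_iter=1000` is an arbitrary choice, so the printed accuracy says nothing about how the face classifier would score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, used to avoid downloading LFW).
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Standardize each feature to zero mean and unit variance, then fit
# logistic regression with a higher iteration limit than the default.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

clf = model.named_steps["logisticregression"]
print("iterations used:", clf.n_iter_)
print("test accuracy:", model.score(X_test, y_test))
```

Putting the scaler inside the pipeline means its mean and variance are estimated from the training split only and then reused on the test split.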
