# Facial expression recognition using Logistic Regression

### Class Imbalance
* We will implement a binary classifier using logistic regression, so we focus on classes 0 and 1 only.
* There are 4953 samples of class 0 and 547 samples of class 1.
* What would the classification rate be if we just chose 0 every time?
    * 4953 / (4953 + 547) ≈ 90%, without learning anything. This is therefore an imbalanced classification problem.
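
As a quick check of that baseline, here is a minimal sketch that builds labels from the class counts quoted above and scores a "classifier" that always predicts 0. The variable names are illustrative only.

```python
import numpy as np

# class counts quoted above for the binary subset
n0, n1 = 4953, 547

# true labels, and predictions from a model that always answers 0
Y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
P = np.zeros_like(Y)

accuracy = np.mean(P == Y)           # equals n0 / (n0 + n1)
recall_1 = np.mean(P[Y == 1] == 1)   # fraction of class 1 samples that are found

print(f"accuracy: {accuracy:.4f}, class 1 recall: {recall_1:.4f}")
```

The accuracy is about 90% while not a single class 1 sample is detected, which is why accuracy alone is misleading here.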

### 2-class problem vs. 7-class problem 
* When we switch to softmax, will the problem get easier or harder?
    * 2-class: guess at random - expect 50% error
    * 7-class: guess at random - expect 6/7 = 86% error
    * K-class: 1/K chance of being correct, so (K-1)/K expected error
* Kaggle top score: ~70% correct
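
The random-guess figures above can be checked with a small simulation. This is only a sketch: it assumes balanced classes and uniform guesses, and `random_guess_error` is a helper name made up for this example.

```python
import numpy as np

def random_guess_error(K):
    """Expected error rate when guessing uniformly among K balanced classes."""
    return (K - 1) / K

rng = np.random.default_rng(0)
for K in (2, 7):
    labels = rng.integers(0, K, size=100_000)    # balanced synthetic labels
    guesses = rng.integers(0, K, size=100_000)   # uniform random guesses
    simulated = np.mean(labels != guesses)
    print(K, round(random_guess_error(K), 3), round(simulated, 3))
```

The simulated error should land close to 1/2 for K=2 and close to 6/7 for K=7.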

    
### Solving class imbalance
* Suppose we have 1000 samples from class 1, 100 samples from class 2
    * Method 1) Pick 100 samples from class 1, now we have 100 vs. 100
    * Method 2) Repeat class 2 10 times, now we have 1000 vs. 1000
    * Both methods give the same 'expected' error rate
    * But method 2 is better: it keeps all the data, so the result has less variance
* Other options to expand class 2:
    * Add Gaussian noise
    * Add invariant transformations (shift left or right, rotate, etc.)
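
A rough sketch of the two expansion ideas above (Gaussian noise and a one-pixel shift). It assumes images are flattened 48x48 vectors scaled to [0, 1]; the function names, the noise level and the toy data are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def oversample_with_noise(X_minor, factor, noise_std=0.01, seed=0):
    """Repeat each minority sample `factor` times and add Gaussian noise to the copies."""
    rng = np.random.default_rng(seed)
    X_rep = np.repeat(X_minor, factor, axis=0)
    noise = rng.normal(0.0, noise_std, size=X_rep.shape)
    # pixels are assumed to lie in [0, 1], so clip after adding noise
    return np.clip(X_rep + noise, 0.0, 1.0)

def shift_left(X_flat, side=48, pixels=1):
    """Shift flattened side x side images left by `pixels` (>= 1), padding with zeros."""
    imgs = X_flat.reshape(-1, side, side)
    shifted = np.zeros_like(imgs)
    shifted[:, :, :-pixels] = imgs[:, :, pixels:]
    return shifted.reshape(len(X_flat), -1)

# toy data standing in for the minority class: 100 images of 48x48 pixels
X2 = np.random.default_rng(1).random((100, 48 * 48))
X2_big = oversample_with_noise(X2, factor=10)   # 100 -> 1000 samples
X2_shifted = shift_left(X2)                     # same count, shifted content
print(X2_big.shape, X2_shifted.shape)
```

Unlike plain repetition, the noisy or shifted copies are not exact duplicates, so the model sees slightly different inputs for the minority class.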

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.utils import shuffle

## Logistic Regression with sigmoid (binary classification)

## Logistic Regression with softmax (softmax regression)

In [1]:
def get_data(balance_ones=False):
    # each image is 48x48 pixels, flattened into a vector of length 2304
    # column 0 of each CSV row is the label, column 1 the space-separated pixels
    # all classes are kept; filter on y here for the 2-class version
    Y = []
    X = []
    with open('../data/fer2013/fer2013.csv') as f:
        next(f)  # skip the header row
        for line in f:
            row = line.split(',')
            Y.append(int(row[0]))
            X.append([int(p) for p in row[1].split()])
    # scale pixel values to [0, 1]
    X, Y = np.array(X) / 255.0, np.array(Y)

    if balance_ones:
        # class 1 is under-represented, so repeat its samples 9 times
        XO, YO = X[Y != 1, :], Y[Y != 1]
        X1 = X[Y == 1, :]
        X1 = np.repeat(X1, 9, axis=0)
        X = np.vstack([XO, X1])
        Y = np.concatenate((YO, [1] * len(X1)))
    return X, Y

In [2]:
def softmax(A):
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)
    
def cost(T, Y):
    tot = T * np.log(Y)
    return -tot.sum()

def error_rate(T, P):
    return np.mean(T != P)
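
One caveat with the definitions above: `np.exp` overflows for large activations, and `np.log` returns `-inf` when a predicted probability is exactly 0. A common workaround, sketched here under the hypothetical names `softmax_stable` and `cost_stable` (they are not used elsewhere in this notebook), is to subtract the row-wise maximum before exponentiating and to clip probabilities before taking the log.

```python
import numpy as np

def softmax_stable(A):
    # subtracting the row-wise max does not change the result mathematically,
    # but it keeps np.exp from overflowing
    A = A - A.max(axis=1, keepdims=True)
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def cost_stable(T, Y, eps=1e-12):
    # clip probabilities away from 0 so np.log never returns -inf
    return -(T * np.log(np.clip(Y, eps, 1.0))).sum()

A = np.array([[1000.0, 1001.0, 1002.0],
              [0.0, 0.0, 0.0]])
P = softmax_stable(A)
print(P.sum(axis=1))   # every row is a valid probability distribution
```

With the plain version, the first row would produce `inf / inf = nan`; the shifted version gives the same probabilities as for activations `[0, 1, 2]`.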

In [3]:
# since we use softmax, this can also be called softmax regression
class LogisticRegression(object):
    def __init__(self):
        pass
    
    def fit(self, X, Y, epochs=100, learning_rate=10e-8, reg=10e-12, show_fig=False):
        
        X, Y = shuffle(X, Y)
        print(X.shape)
        print(Y.shape)
        
        N, D = X.shape
        K = len(set(Y))
        
        # split data into training set and validation set
        # (take the validation rows first, otherwise they are sliced out of the training set)
        Xvalid, Yvalid = X[-1000:], Y[-1000:]
        X, Y = X[:-1000], Y[:-1000]
        
        # one-hot-encoding the labels for both training set and validation set
        T = y2indicator(Y, K)
        Tvalid = y2indicator(Yvalid, K)
        print(T.shape)
        print(Tvalid.shape)
        
        # initialize weights randomly
        self.W = np.random.randn(D, K) / np.sqrt(D + K)
        self.b = np.zeros(K)
        
        
        # train the model
        costs = []
        best_validation_error = 1
        for ep in range(epochs):
            
            # forward propagation
            Py = self.forward(X)

            # gradient descent step (cross-entropy gradient plus L2 regularization)
            Py_Y = Py - T
            self.W -= learning_rate * (X.T.dot(Py_Y) + reg * self.W)
            self.b -= learning_rate * (Py_Y.sum(axis=0) + reg * self.b)

            # show performance metrics
            if ep % 10 == 0:
                Pyvalid = self.forward(Xvalid)
                c = cost(Tvalid, Pyvalid)
                costs.append(c)
                e = error_rate(Yvalid, np.argmax(Pyvalid, axis=1))
                print("ep:", ep, "cost:", c, "error:", e)
                if e < best_validation_error:
                    best_validation_error = e
        print("best_validation_error:", best_validation_error)              

        if show_fig:
            plt.plot(costs)
            plt.show()     

    
    def forward(self, X):
        return softmax(X.dot(self.W) + self.b)
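
The weight update in `fit` relies on the gradient of the cross-entropy cost with respect to `W` being `X.T.dot(Py - T)`. A small sanity check, sketched on random toy data with `softmax` and `cost` restated so the block runs on its own (regularization is left out), compares that expression against central finite differences.

```python
import numpy as np

def softmax(A):
    expA = np.exp(A - A.max(axis=1, keepdims=True))
    return expA / expA.sum(axis=1, keepdims=True)

def cost(T, Y):
    return -(T * np.log(Y)).sum()

rng = np.random.default_rng(0)
N, D, K = 20, 5, 3
X = rng.random((N, D))
T = np.eye(K)[rng.integers(0, K, size=N)]   # one-hot targets
W = rng.normal(size=(D, K))
b = np.zeros(K)

# analytic gradient, as used in fit() without the regularization term
grad_W = X.T.dot(softmax(X.dot(W) + b) - T)

# central finite differences on every entry of W
num_grad = np.zeros_like(W)
h = 1e-6
for i in range(D):
    for k in range(K):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, k] += h
        Wm[i, k] -= h
        num_grad[i, k] = (cost(T, softmax(X.dot(Wp) + b)) -
                          cost(T, softmax(X.dot(Wm) + b))) / (2 * h)

print(np.abs(grad_W - num_grad).max())   # should be very close to 0
```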

In [4]:
# one-hot-encoding on labels
def y2indicator(y, dims):
    N = len(y)
    y = y.astype(np.int32)
    ind = np.zeros((N, dims))
    for i in range(N):
        ind[i, y[i]] = 1
    return ind
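
The Python loop is fine for a few thousand labels. For larger arrays, a vectorized variant using integer-array indexing should give the same matrix. This is a sketch, and `y2indicator_vectorized` is a name introduced here for illustration.

```python
import numpy as np

def y2indicator_vectorized(y, dims):
    y = np.asarray(y).astype(np.int32)
    ind = np.zeros((len(y), dims))
    # set ind[i, y[i]] = 1 for every row i in one indexing operation
    ind[np.arange(len(y)), y] = 1
    return ind

print(y2indicator_vectorized(np.array([0, 2, 1]), 3))
```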

In [8]:
X, Y = get_data()

XO, YO = X[Y!=1, :], Y[Y!=1]
X1 = X[Y==1, :]
X1 = np.repeat(X1, 9, axis=0)
X = np.vstack([XO, X1])
Y = np.concatenate((YO, [1]*len(X1)))
print(X.shape)
print(Y.shape)

(40263, 2304)
(40263,)


In [9]:
model = LogisticRegression()
model.fit(X, Y, epochs=1000, show_fig=True)

(40263, 2304)
(40263,)
(39263, 7)
(1000, 7)
ep: 0 cost: 2021.41768959 error: 0.869
ep: 10 cost: 1930.68303262 error: 0.786
ep: 20 cost: 1925.77484986 error: 0.781
ep: 30 cost: 1921.71272939 error: 0.778
ep: 40 cost: 1918.01546719 error: 0.773
ep: 50 cost: 1914.61777947 error: 0.77
ep: 60 cost: 1911.47440339 error: 0.767
ep: 70 cost: 1908.54853829 error: 0.772
ep: 80 cost: 1905.80997482 error: 0.773
ep: 90 cost: 1903.2338262 error: 0.769
ep: 100 cost: 1900.79951111 error: 0.771
ep: 110 cost: 1898.48993466 error: 0.771
ep: 120 cost: 1896.29083002 error: 0.769
ep: 130 cost: 1894.19022965 error: 0.768
ep: 140 cost: 1892.17804064 error: 0.763


KeyboardInterrupt: 

### Pitfalls

```python
def softmax(A):
    expA = np.exp(A)
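    # expA / expA.sum() would not work: it normalizes over the whole matrix,
    # while each row (one sample) must sum to 1
    return expA / expA.sum(axis=1, keepdims=True)

def cost(T, Y):
    tot = T * np.log(Y)
    # note the minus sign: the cost is the negative log-likelihood
    return -tot.sum()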
```
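
To see the first pitfall concretely, here is a small sketch on a toy 2x3 activation matrix (the values are arbitrary). It compares the whole-matrix sum with the row-wise sum, and shows what happens when `keepdims=True` is dropped.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 1.0]])
expA = np.exp(A)

wrong = expA / expA.sum()                        # normalizes over the whole matrix
right = expA / expA.sum(axis=1, keepdims=True)   # normalizes each row separately

print(wrong.sum(axis=1))   # rows do not sum to 1
print(right.sum(axis=1))   # each row sums to 1

# without keepdims the row sums have shape (2,), which cannot broadcast against (2, 3)
try:
    expA / expA.sum(axis=1)
except ValueError as err:
    print("broadcast error:", err)
```

Note that when the number of samples happens to equal the number of classes, the version without `keepdims` does not raise an error and silently divides along the wrong axis.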
    