# Logistic Regression

### Table of Contents
[Conventions](#conventions)

### Conventions
__Number of training examples__: $M$

__Number of features__: $N$

__Design matrix__: $X$<br>
The design matrix $X$ is a matrix $X \in \mathbb{R}^{M, N}$ with rows `X[i, :].reshape(1, N)` $\in \mathbb{R}^{1, N}$ and columns `X[:, j].reshape(M, 1)` $\in \mathbb{R}^{M, 1}$.

__Output Vector__: $Y$<br>
The output vector $Y$ is a matrix $Y \in \mathbb{R}^{M, 1}$ of labels that our model will be evaluated against.

__Parameter vector__: $\theta$<br>
The parameter vector $\theta$ is a matrix $\theta \in \mathbb{R}^{N, 1}$ that parameterizes our hypothesis

__Hypothesis__: $H$<br>
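The hypothesis $H$ is a function $H: \mathbb{R}^{N, 1} \times \mathbb{R}^{1, N} \to \mathbb{R}$, written $H(\theta, x)$, that models our data. It takes the parameter vector $\theta$ and a single row $x$ of the design matrix.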

__Learning rate__: $\alpha$

__Regularization Constant__: $\lambda$

__Normalizing Constant__: $K$
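
Written out with these conventions, the quantities that the code in this notebook computes can be sketched as follows. This assumes $K = 1/M$ (as set in the code), writes $x_i$ for row $i$ of $X$ and $y_i$ for entry $i$ of $Y$, and omits any regularization term in $\lambda$, since the code does not apply one. The `cost` function in the notebook sums the same terms without the factor $K$.

$$H(\theta, x) = \frac{1}{1 + e^{-x\theta}}$$

$$J(\theta) = -K \sum_{i=1}^{M} \Big[ y_i \log H(\theta, x_i) + (1 - y_i) \log\big(1 - H(\theta, x_i)\big) \Big]$$

$$\nabla_\theta J(\theta) = K \, X^{T} \big( H(\theta, X) - Y \big)$$

$$\theta \leftarrow \theta - \alpha \, K \, X^{T} \big( H(\theta, X) - Y \big)$$

Here $H(\theta, X)$ denotes the column vector obtained by applying $H$ to every row of $X$, so $X^{T}(H(\theta, X) - Y)$ has the same shape as $\theta$.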

In [104]:
%matplotlib inline
from jupyterthemes import jtplot
jtplot.style()
import numpy as np
import matplotlib.pyplot as plt

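# Load the MONK's-1 data: column 0 is the class label, columns 1-6 are the features
_raw = np.genfromtxt('monks-1.csv', delimiter=',')
_DATA = _raw[:, 0:7]
X = _raw[:, 1:7]  # design matrix, shape (M, 6)
Y = _raw[:, 0]    # labels, shape (M,)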

In [105]:
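M = Y.size        # number of training examples
N = X.shape[1]    # number of features
ALPHA = 0.1       # learning rate
# Sigmoid hypothesis. Note: this reads the global `theta` at call time.
H = lambda x: (1 / (1 + np.exp(-x @ theta)))
K = 1 / M         # normalizing constant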

In [106]:
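def addFeatures(X):
    # Append the squares of columns 0, 1, and 4 as three extra features
    col = lambda i: X[:, i].reshape(-1, 1)  # column i as an (M, 1) array
    sq = lambda x: np.multiply(x, x)
    a, b, c = sq(col(0)), sq(col(1)), sq(col(4))
    res = np.concatenate((X, a, b, c), axis=1)
    return res
X = addFeatures(X)
N = X.shape[1]  # feature count after expansion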
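
The feature expansion above can also be written with a single slice, without a helper per column. The following is a minimal sketch on a toy array; `add_squared_features` and its `columns` parameter are hypothetical names introduced here for illustration and are not part of the notebook.

```python
import numpy as np

def add_squared_features(X, columns=(0, 1, 4)):
    """Return a copy of X with the squares of `columns` appended on the right."""
    X = np.asarray(X, dtype=float)
    squares = X[:, list(columns)] ** 2  # shape (M, len(columns))
    return np.hstack((X, squares))

# Toy input: 2 examples with 6 features each (not the MONK's data)
toy = np.arange(12, dtype=float).reshape(2, 6)
expanded = add_squared_features(toy)
print(expanded.shape)  # (2, 9)
```

Selecting the columns with a list keeps the result two-dimensional, so no reshape is needed before stacking.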

In [134]:
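def cost(theta: np.ndarray) -> float:
    '''
    param: theta, parameter vector
    returns: cross-entropy cost that theta incurs on your data, summed over all examples
    '''
    total = 0.0  # avoid shadowing the built-in sum
    for i in range(M):
        x, y = X[i].reshape(1, N), Y[i] # transfer
        h = H(x).item()  # predicted probability as a Python float
        total += - (y * np.log(h)) - ((1 - y) * np.log(1 - h))
    return float(total)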

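def accuracy(theta: np.ndarray) -> float:
    '''
    param: theta, parameter vector
    returns: fraction of examples classified correctly at a 0.5 threshold
    '''
    correct = 0
    threshold = 0.5
    for i in range(M):
        x, y = X[i].reshape(1, N), Y[i]
        h = H(x).item()  # predicted probability as a Python float
        pos, neg = y == 1 and h >= threshold, y == 0 and h < threshold
        correct += 1 if pos or neg else 0
    return correct / M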

def distances(theta: np.ndarray) -> np.ndarray:
    '''
    param: theta, parameter vector
    returns: an (M, 1) array of differences between the model and the actual values
    '''
    res = np.zeros(M)
    for i in range(M):
        x, y = X[i].reshape(1, N), Y[i] # transfer
        res[i] = H(x).item() - y  # .item() turns the (1, 1) result into a scalar
    return res.reshape(M, 1)

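def descend(theta):
    # One gradient-descent step. The gradient is X^T (H - Y): X.T is (N, M) and
    # distances(theta) is (M, 1), so the product is (N, 1), the same shape as theta.
    gradient = X.T @ distances(theta)
    return theta - ALPHA * K * gradient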
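
The per-example loops above can be collapsed into array operations. The following is a minimal sketch under stated assumptions, not the notebook's own method: it passes `theta`, `X`, and `Y` explicitly instead of using globals, expects `Y` with shape `(M, 1)`, and runs on synthetic data. All names ending in `_vec` or `_demo`, plus `sigmoid` and `predict_proba`, are hypothetical names introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    # (M, N) @ (N, 1) -> (M, 1) column of predicted probabilities
    return sigmoid(X @ theta)

def cost_vec(theta, X, Y):
    # Cross-entropy summed over all examples (no normalizing factor)
    p = predict_proba(theta, X)
    return float(-np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p)))

def accuracy_vec(theta, X, Y, threshold=0.5):
    # Fraction of examples whose thresholded prediction matches the label
    predicted_positive = predict_proba(theta, X) >= threshold
    return float(np.mean(predicted_positive == (Y == 1)))

def descend_vec(theta, X, Y, alpha=0.1):
    # One gradient-descent step: theta - alpha * (1/M) * X^T (H - Y)
    grad = X.T @ (predict_proba(theta, X) - Y) / X.shape[0]
    return theta - alpha * grad

# Synthetic data for illustration only: the label depends on the sign of feature 0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
Y_demo = (X_demo[:, [0]] > 0).astype(float)  # shape (50, 1)

theta_demo = np.zeros((3, 1))
initial_cost = cost_vec(theta_demo, X_demo, Y_demo)
for _ in range(300):
    theta_demo = descend_vec(theta_demo, X_demo, Y_demo)
final_cost = cost_vec(theta_demo, X_demo, Y_demo)
final_accuracy = accuracy_vec(theta_demo, X_demo, Y_demo)
print(initial_cost, final_cost, final_accuracy)
```

Passing `theta` explicitly avoids depending on whichever `theta` happens to be bound globally when the hypothesis is evaluated.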

In [135]:
theta = np.zeros((N, 1))  # start from the zero parameter vector
ITERATIONS = 300
for i in range(ITERATIONS):
    print(theta)
    theta = descend(theta)
print(f'cost {cost(theta)} accuracy {accuracy(theta)} theta {theta}')

[[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]


ValueError: shapes (432,9) and (432,1) not aligned: 9 (dim 1) != 432 (dim 0)
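
The `ValueError` above is NumPy reporting that the inner dimensions of a matrix product differ: a `(432, 9)` matrix cannot be multiplied by a `(432, 1)` column. The gradient of the logistic cost is $X^{T}(H - Y)$, so the design matrix has to be transposed before the product. A minimal sketch with placeholder arrays of the same shapes follows; the zero-filled `X_demo` and `residuals` are assumptions that stand in for the real data and the output of `distances`.

```python
import numpy as np

M, N = 432, 9
X_demo = np.zeros((M, N))     # placeholder for the design matrix
residuals = np.zeros((M, 1))  # placeholder for H - Y

try:
    X_demo @ residuals        # (432, 9) @ (432, 1): 9 != 432
    failed = False
except ValueError:
    failed = True

grad = X_demo.T @ residuals   # (9, 432) @ (432, 1) -> (9, 1)
print(failed, grad.shape)     # True (9, 1)
```

The transposed product has shape `(N, 1)`, which matches `theta` and lets the update be applied elementwise.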