# I. Algorithm

## 1. Mathematics
### Logistic Regression:
Logistic regression in vector form, where $\mathbf{w}$ is the weight vector and $\mathbf{x}$ is the input vector:
$$f(\mathbf{x}) = \theta(\mathbf{w}^T\mathbf{x})$$
Note: 
- $\theta$ is the logistic function (the activation function).
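
As a rough illustration of the model above, here is a minimal sketch that evaluates $f(\mathbf{x})$ for one input. It assumes $\theta$ is the sigmoid; the function name `model` and the values of `w` and `x` are made up for the example.

```python
import numpy as np

def sigmoid(s):
    # Logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-s))

def model(w, x):
    # f(x) = theta(w^T x), with theta assumed to be the sigmoid.
    return sigmoid(np.dot(w, x))

# Hypothetical weights and input, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])

p = model(w, x)  # here w^T x = 0.5 - 2.0 + 1.0 = -0.5
print(p)
```

Because $\mathbf{w}^T\mathbf{x}$ is negative here, the output falls below 0.5.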

### Activation functions and their derivatives:
__Sigmoid function:__
- Formula:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
- Derivative:
$$\sigma'(z) = \left(\frac{1}{1 + e^{-z}}\right)'$$
$$= \frac{e^{-z}}{(1 + e^{-z})^2}$$ 
$$= \frac{1}{(1 + e^{-z})} \frac{e^{-z}}{(1 + e^{-z})}$$
$$= \frac{1}{(1 + e^{-z})} \left(\frac{1+e^{-z}}{(1 + e^{-z})}- \frac{1}{1+e^{-z}}\right)$$
$$= \sigma(z) (1-\sigma(z))$$
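
The closed form above can be checked numerically. The sketch below compares $\sigma(z)(1-\sigma(z))$ against a central finite difference; the step size `h` and the grid of test points are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Closed form from the derivation: sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 11)  # arbitrary test points
h = 1e-6                        # arbitrary finite-difference step

# Central difference approximation of the derivative.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
max_err_sigmoid = np.max(np.abs(numeric - sigmoid_grad(z)))
print(max_err_sigmoid)
```

The two results should agree up to floating-point and truncation error.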

__Tanh function:__
- Formula:
$$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$
- Derivative:
$$\tanh'(z) = \left(\frac{e^z - e^{-z}}{e^z + e^{-z}}\right)'$$
$$= \frac{(e^z - e^{-z})'(e^z + e^{-z}) - (e^z - e^{-z})(e^z + e^{-z})'}{(e^z + e^{-z})^2}$$ 
$$= \frac{(e^z + e^{-z})(e^z + e^{-z}) - (e^z - e^{-z})(e^z - e^{-z})}{(e^z + e^{-z})^2}$$
$$= \left(\frac{e^z + e^{-z}}{e^z + e^{-z}}\right)^2 - \left(\frac{e^z - e^{-z}}{e^z + e^{-z}}\right)^2$$
$$= 1 - \tanh(z)^2$$
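
The same kind of numerical check works for tanh. This sketch relies on NumPy's built-in `np.tanh`; the test points and step size are again arbitrary.

```python
import numpy as np

def tanh_grad(z):
    # Closed form from the derivation: 1 - tanh(z)^2.
    return 1.0 - np.tanh(z) ** 2

z = np.linspace(-3.0, 3.0, 13)  # arbitrary test points
h = 1e-6                        # arbitrary finite-difference step

# Central difference approximation of the derivative.
numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)
max_err_tanh = np.max(np.abs(numeric - tanh_grad(z)))
print(max_err_tanh)
```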

__ReLU function:__
- Formula:
$$\text{ReLU}(z) = \max(0,z)$$
- Derivative:
$$\text{ReLU}'(z) = \max(0,z)'$$
$$= \begin{cases} z' & \text{if } z > 0 \\ 0' & \text{otherwise} \end{cases}$$
$$= \begin{cases} 1 & \text{if } z > 0 \\ \text{undefined} & \text{if } z = 0 \\ 0 & \text{if } z < 0 \end{cases}$$
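
In code, ReLU and its derivative might look like the sketch below. Since the derivative is undefined at $z = 0$, an implementation has to pick a value there; this sketch uses 0, which is an assumed convention and not something the derivation prescribes.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z).
    return np.maximum(0.0, z)

def relu_grad(z):
    # 1 where z > 0, 0 where z < 0.
    # At z == 0 the derivative is undefined; returning 0 is an assumption.
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
out = relu(z)        # zero for negative inputs, identity for positive ones
grad = relu_grad(z)
print(out, grad)
```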

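### Loss function:
The loss function is the cross-entropy, i.e. the negative log-likelihood of the labels, where $z_i = f(\mathbf{x}_i)$ is the predicted probability for sample $i$ and $y_i \in \{0, 1\}$ is its label.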
$$J(\mathbf{w}) = -\log P(\mathbf{y}|\mathbf{X}; \mathbf{w})$$
$$= -\sum_{i=1}^N(y_i \log {z}_i + (1-y_i) \log (1 - {z}_i))$$
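
A minimal sketch of this loss in NumPy follows. The clipping constant `eps` is an addition for numerical safety (it avoids `log(0)`) and is not part of the formula; the labels and predictions are made-up values.

```python
import numpy as np

def cross_entropy(y, z, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logs stay finite.
    # The clipping is an assumption added for numerical safety.
    z = np.clip(z, eps, 1.0 - eps)
    return -np.sum(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

# Hypothetical labels and predicted probabilities.
y = np.array([1.0, 0.0, 1.0])
z = np.array([0.9, 0.2, 0.6])

loss = cross_entropy(y, z)  # equals -(log 0.9 + log 0.8 + log 0.6)
print(loss)
```

Each sample contributes $-\log z_i$ when $y_i = 1$ and $-\log(1 - z_i)$ when $y_i = 0$, so confident wrong predictions are penalized heavily.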

### Optimize:
For a single sample $(\mathbf{x}_i, y_i)$, let $s = \mathbf{w}^T\mathbf{x}_i$ and $z_i = \theta(s)$. Then:
$$\begin{eqnarray}
\frac{\partial J(\mathbf{w}; \mathbf{x}_i, y_i)}{\partial \mathbf{w}} &=& -\left(\frac{y_i}{z_i} - \frac{1- y_i}{1 - z_i} \right) \frac{\partial z_i}{\partial \mathbf{w}} \\
&=& \frac{z_i - y_i}{z_i(1 - z_i)} \frac{\partial z_i}{\partial \mathbf{w}}\\
&=& \frac{z_i - y_i}{z_i(1 - z_i)} \frac{\partial z_i}{\partial s} \mathbf{x}_i\\
\end{eqnarray}$$
Note:
- By the chain rule, $\frac{\partial z_i}{\partial \mathbf{w}} = \frac{\partial z_i}{\partial s} \frac{\partial s}{\partial \mathbf{w}} = \frac{\partial z_i}{\partial s} \mathbf{x}_i$.
- $\frac{\partial z_i}{\partial s} = \theta'(s)$ is the activation derivative calculated above, evaluated at $s = \mathbf{w}^T\mathbf{x}_i$.
- For the sigmoid, $\theta'(s) = z_i(1 - z_i)$, so the gradient simplifies to $(z_i - y_i)\mathbf{x}_i$.
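
One way to gain confidence in this derivation is a numerical gradient check. The sketch below assumes the sigmoid activation, for which $\frac{\partial z_i}{\partial s} = z_i(1 - z_i)$ and the summed gradient reduces to $\mathbf{X}^T(\mathbf{z} - \mathbf{y})$. The random data, the seed, and the step size `h` are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def loss(w, X, y):
    # Cross-entropy summed over all samples.
    z = sigmoid(X.dot(w))
    return -np.sum(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

def grad(w, X, y):
    # Analytic gradient for the sigmoid case: X^T (z - y).
    return X.T.dot(sigmoid(X.dot(w)) - y)

# Random toy problem (made up for the check).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)

# Central finite differences, one coordinate at a time.
h = 1e-6
numeric = np.zeros_like(w)
for j in range(w.size):
    e = np.zeros_like(w)
    e[j] = h
    numeric[j] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * h)

grad_err = np.max(np.abs(numeric - grad(w, X, y)))
print(grad_err)
```

A small `grad_err` suggests the analytic gradient matches the loss; a large one usually points to a sign or indexing mistake.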

## 2. Code

```python
import numpy as np


class LogisticRegression(object):
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self):
        self.X_train = np.array([])
        self.y_train = np.array([])
        self.X_predict = np.array([])
        self.y_predict = np.array([])
        self.theta = np.array([])
        self.iteration = 0

    def fit(self, X_train, y_train, iteration=2000):
        # Store the training data, then learn the weights once here
        # so that predict() does not retrain on every call.
        self.X_train = X_train
        self.y_train = y_train
        self.iteration = iteration
        self.calculate_theta()

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def calculate_theta(self, eta=0.05):
        # Prepend a column of ones so theta[0] acts as the bias term.
        X_bar = np.concatenate((np.ones((self.X_train.shape[0], 1)), self.X_train), axis=1)
        d = X_bar.shape[1]
        self.theta = np.zeros(d)
        for _ in range(self.iteration):
            # Gradient of the summed cross-entropy loss: X^T (z - y).
            z = self.sigmoid(X_bar.dot(self.theta))
            self.theta -= eta * X_bar.T.dot(z - self.y_train)

    def predict(self, X_predict):
        # Return predicted probabilities P(y = 1 | x) for each row.
        self.X_predict = X_predict
        X_bar_predict = np.concatenate((np.ones((self.X_predict.shape[0], 1)), self.X_predict), axis=1)
        self.y_predict = self.sigmoid(X_bar_predict.dot(self.theta))
        return self.y_predict
```
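
The class returns probabilities, not class labels. The standalone sketch below applies the same update rule to a tiny made-up dataset and thresholds the output at 0.5 to get labels. The helper names `train` and `predict_proba`, the data, and the 0.5 threshold are all assumptions for the example.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def add_bias(X):
    # Prepend a column of ones so theta[0] acts as the bias term.
    return np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)

def train(X, y, eta=0.05, iteration=2000):
    # Batch gradient descent with the update theta -= eta * X^T (z - y).
    X_bar = add_bias(X)
    theta = np.zeros(X_bar.shape[1])
    for _ in range(iteration):
        theta -= eta * X_bar.T.dot(sigmoid(X_bar.dot(theta)) - y)
    return theta

def predict_proba(X, theta):
    return sigmoid(add_bias(X).dot(theta))

# Made-up 1-D data: negative inputs belong to class 0, positive to class 1.
X = np.array([[-3.0], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [3.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

theta = train(X, y)
probs = predict_proba(X, theta)
labels = (probs >= 0.5).astype(int)  # assumed decision threshold of 0.5
print(labels)
```

On data this cleanly separated, the thresholded predictions should match the training labels.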