# EUPS Logistic Regression

Logistic regression is closely related to linear regression. While linear regression predicts continuous values such as height or cost, logistic regression is typically used as a *classifier*: it predicts which of two classes an input data sample most likely belongs to, e.g. mammal or reptile. The two classes are usually represented by 0 and 1.

| $x_1$ | $x_2$ | $y$ (Class) |
| ----- | ----- | --- |
|  2    | 3     |  0  |
|  1    | 4     |  1  |
|  0.2    | -2     |  1  |
|  5    | 5     |  0  |

*Note: these are dummy numbers, so don't look for any meaning in them or their relationship!*
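To make the shapes used in the rest of the notebook concrete, here is one way the table above might be stored as numpy arrays. The names `X_table` and `y_table` are our own choice for illustration, not something the exercises require.

```python
import numpy as np

# One row per sample, one column per feature (x_1, x_2).
X_table = np.array([
    [2.0, 3.0],
    [1.0, 4.0],
    [0.2, -2.0],
    [5.0, 5.0],
])

# Class labels as a column vector, one row per sample.
y_table = np.array([[0], [1], [1], [0]])

print(X_table.shape)  # (4, 2): 4 samples, 2 features
print(y_table.shape)  # (4, 1): 4 samples, 1 label each
```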

In [None]:
# Import some useful libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time
%matplotlib notebook

## Step 1 - Error

We cannot use the same error measure as linear regression. The error used here is called *cross-entropy*.

\begin{equation}
Error = - \sum_{i=1}^N \left[ y_i\log(p_i) + (1-y_i)\log(1-p_i) \right] \\
\text{where } p_i = \text{predictions}_i
\end{equation}

Code an `error(predictions, ys)` function that takes a **column vector** (shape $N \times 1$, one row per sample) of predictions and a **column vector** of actual values and computes the error as described, returning a single number, just as in linear regression. The $i$th row of the prediction vector is the prediction for the $i$th data input.

Useful `numpy` functions:

`np.log()`, `np.sum()`
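If you get stuck, or want something to compare against, the following is one possible sketch of the cross-entropy formula above. It assumes the natural logarithm (which is what `np.log` computes) and that every prediction lies strictly between 0 and 1, since the log of 0 is undefined.

```python
import numpy as np

def error(predictions, ys):
    """Cross-entropy error summed over all samples (natural log)."""
    # Element-wise terms, then a single sum over every sample.
    per_sample = ys * np.log(predictions) + (1 - ys) * np.log(1 - predictions)
    return -np.sum(per_sample)

predictions_example = np.array([[0.8], [0.06], [0.6], [0.98]])
ys_example = np.array([[1], [0], [1], [1]])
print(error(predictions_example, ys_example))  # approximately 0.816
```

Only one of the two terms is non-zero for each sample: the first when $y_i = 1$, the second when $y_i = 0$.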

In [None]:
# An example input:
# predictions_example = np.array([[0.8],[0.06],[0.6],[0.98]])
# ys_example = np.array([[1],[0],[1],[1]])
# Expected output : 0.8160472861158075
# Write your code below!

## Step 2 - Prediction

The prediction is very similar to linear regression: compute the same weighted sum of the inputs as before, then pass the result through a special function called the *sigmoid activation function* $\sigma$.

\begin{equation}
\sigma \left(x \right) = \frac{1}{1+e^{-x}}
\end{equation}

So the prediction of one data point $\mathbf{x} = \begin{pmatrix}x_1 & x_2 & ... & x_n\end{pmatrix}$ for logistic regression is just this:

\begin{equation}
\hat{y} = \sigma \left(b + w_1x_1 + w_2x_2 + \cdots + w_nx_n \right) \\
= \sigma \left(b + \sum_{i=1}^n w_ix_i \right)
\end{equation}

First code a function `sigmoid(x)` which takes in a numpy array of values and applies the $\sigma$ function to them.

Useful `numpy` functions:

`np.exp()`
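As a reference point, a minimal sketch of the sigmoid formula might look like the following. It is a direct translation of the equation and makes no attempt to avoid the overflow warning that `np.exp` can raise for very large negative inputs.

```python
import numpy as np

def sigmoid(x):
    """Apply 1 / (1 + e^(-x)) element-wise to a numpy array."""
    return 1.0 / (1.0 + np.exp(-x))

values = np.array([-4.0, 0.0, 4.0])
print(sigmoid(values))  # small value, exactly 0.5, value close to 1
```

Every output lies between 0 and 1, which is what lets us read the prediction as a probability of belonging to class 1.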

In [None]:
# Write sigmoid here!

Now use that to code a `predict(X, ws, b)` function, exactly as you did for linear regression but using the new prediction formula above.
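One possible sketch of `predict` is shown here for comparison. It assumes the shapes used in the example cell: `X` has one row per sample, `ws` is a row vector of shape $1 \times n$, and the result is a row vector with one prediction per sample. A small `sigmoid` is repeated so the snippet runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(X, ws, b):
    """Return sigmoid(b + w_1*x_1 + ... + w_n*x_n) for every row of X."""
    # (1, n) @ (n, N) gives shape (1, N): one weighted sum per sample.
    weighted_sums = ws @ X.T + b
    return sigmoid(weighted_sums)

X_example = np.array([
    [2, 5],
    [1, 6],
    [-2, 5],
])
ws_example = np.array([[7.32, 1.11]])
b_example = 5
print(predict(X_example, ws_example, b_example))  # shape (1, 3)
```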

In [None]:
# An example input:
# X_example = np.array([
# [2,5],
# [1,6],
# [-2,5]
# ])
# ws_example = np.array([[7.32,1.11]])
# b_example = 5
# Expected output : array([[1.        , 0.99999999, 0.01646364]])
# Write predict below!

## Step 3 - Learning

The procedure is identical to linear regression, but the partial derivative terms are now different because both the prediction formula and the error have changed. Try to calculate the terms yourself before checking them against ours in the Gradient Terms section at the end. The terms we need for gradient descent are the derivatives of the error (above) w.r.t. the bias term and w.r.t. a single weight term, i.e.

\begin{equation}
\frac{\partial Error}{\partial b} \\
\frac{\partial Error}{\partial w_i}
\end{equation}

Using your results, write a function `update(b, ws, X, predictions, ys, learning_rate = 0.001)`. The only new term here is the learning rate $\eta$, which appears in the gradient descent formula. The function should return the new weights and bias after updating them with one step of gradient descent. As a reminder, here are the formulae:

\begin{equation}
b_{new} = b_{old} - \eta \frac{\partial Error}{\partial b_{old}} \\
w_{i,new} = w_{i,old} - \eta \frac{\partial Error}{\partial w_{i,old}}
\end{equation}
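To see the update rule in action without giving away the derivatives you need for this exercise, here is a small sketch on a toy problem. The function `gradient_descent_step` and the toy error $(w-3)^2$ are our own illustration and are not part of the exercise.

```python
def gradient_descent_step(value, gradient, learning_rate=0.001):
    """Apply new = old - learning_rate * gradient."""
    return value - learning_rate * gradient

# Toy error (w - 3)^2 has derivative 2 * (w - 3) and its minimum at w = 3.
w = 0.0
for _ in range(100):
    gradient = 2 * (w - 3)
    w = gradient_descent_step(w, gradient, learning_rate=0.1)

print(w)  # very close to 3
```

Your `update` function applies the same step to the bias and to every weight, with the logistic regression derivatives in place of the toy one.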

In [None]:
# Code update here!

## Step 4 - All Together

All the groundwork for the algorithm has been laid; now for the easy part of stitching it all together. Write a function `fit(X, y, epochs)` that takes a numpy array of input features for a number of samples, together with the correct output values for those samples, and runs the logistic regression algorithm on them by performing `epochs` steps of gradient descent. It must:

- Initialise random weights, one per feature, and a bias term.
- Run the following each epoch:
    - Call your `predict` function to get a set of predictions for the input features.
    - Call `error` on your predictions and the correct values to measure how good the predictions were.
    - Print out your error so we can hopefully see it reducing!
    - Update the weights using your `update` function.
- Finally, return the numpy array of trained weights and the bias term.
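The steps above can be sketched as follows. This is only one possible arrangement: the helper functions are compact stand-ins for your own Step 1 to 3 code, the `learning_rate` and `seed` arguments are our additions to the `fit(X, y, epochs)` signature, and the toy data is made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(X, ws, b):
    return sigmoid(ws @ X.T + b)  # shape (1, N)

def error(predictions, ys):
    p, y = predictions.reshape(-1), ys.reshape(-1)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def update(b, ws, X, predictions, ys, learning_rate=0.001):
    diff = predictions.reshape(-1) - ys.reshape(-1)  # p_i - y_i per sample
    new_b = b - learning_rate * np.sum(diff)
    new_ws = ws - learning_rate * (diff @ X).reshape(ws.shape)
    return new_ws, new_b

def fit(X, y, epochs, learning_rate=0.1, seed=0):
    # learning_rate and seed are assumed extras, not part of the exercise.
    rng = np.random.default_rng(seed)
    ws = rng.normal(scale=0.01, size=(1, X.shape[1]))  # one weight per feature
    b = 0.0
    for epoch in range(epochs):
        predictions = predict(X, ws, b)
        print(f"epoch {epoch}: error = {error(predictions, y):.4f}")
        ws, b = update(b, ws, X, predictions, y, learning_rate)
    return ws, b

# Made-up toy data: one feature, larger values belong to class 1.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([[0], [0], [1], [1]])
ws_toy, b_toy = fit(X_toy, y_toy, epochs=5)
```

If the printed error goes up instead of down, a smaller learning rate is usually the first thing to try.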

In [None]:
# Code fit here!

## Gradient Terms

\begin{equation}
\frac{\partial Error}{\partial b} = - \sum_{i=1}^N \left[ y_i(1-p_i) - (1-y_i)p_i \right] = \sum_{i=1}^N (p_i - y_i) \\
\frac{\partial Error}{\partial w_j} = - \sum_{i=1}^N \left[ x_{ij}y_i(1-p_i) - x_{ij}(1-y_i)p_i \right] = \sum_{i=1}^N x_{ij}(p_i - y_i)
\end{equation}

Both sums run over the $N$ training examples (samples); $x_{ij}$ is the value of feature $j$ for sample $i$.
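A useful way to check derivatives like these is to compare them with finite differences of the error. The sketch below does that, using the simplified forms $\partial Error / \partial b = \sum_i (p_i - y_i)$ and $\partial Error / \partial w_j = \sum_i x_{ij}(p_i - y_i)$. The function names `analytic_gradients` and `numerical_gradients`, the step size and the test values are all our own choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(X, ws, b):
    return sigmoid(ws @ X.T + b)

def error(predictions, ys):
    p, y = predictions.reshape(-1), ys.reshape(-1)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(X, ws, b, ys):
    diff = predict(X, ws, b).reshape(-1) - ys.reshape(-1)  # p_i - y_i
    return np.sum(diff), diff @ X

def numerical_gradients(X, ws, b, ys, eps=1e-6):
    # Central differences: nudge one parameter at a time by +/- eps.
    def err(ws_, b_):
        return error(predict(X, ws_, b_), ys)
    grad_b = (err(ws, b + eps) - err(ws, b - eps)) / (2 * eps)
    grad_ws = np.zeros(ws.shape[1])
    for j in range(ws.shape[1]):
        step = np.zeros_like(ws)
        step[0, j] = eps
        grad_ws[j] = (err(ws + step, b) - err(ws - step, b)) / (2 * eps)
    return grad_b, grad_ws

# Dummy data from the table at the top of the notebook.
X_check = np.array([[2.0, 3.0], [1.0, 4.0], [0.2, -2.0], [5.0, 5.0]])
y_check = np.array([[0], [1], [1], [0]])
ws_check = np.array([[0.1, -0.2]])
b_check = 0.3

print(analytic_gradients(X_check, ws_check, b_check, y_check))
print(numerical_gradients(X_check, ws_check, b_check, y_check))
```

The two printed results should agree to several decimal places; a mismatch in sign or size usually points to a slip in the derivation.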