# Logistic Regression

Logistic regression refers to a classifier that assigns an observation to one of two classes, and multinomial logistic regression is used when there are more than two classes; in practice the two terms are often used interchangeably. We will look at an example where we want to classify handwritten digits into one of 10 classes: 0-9.

The logistic regression model works in a similar fashion to a linear regression model, except that the weighted sum of the input features is passed through a function that maps it to a value between 0 and 1. This function is called the logistic function, giving the model its name.
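
As a minimal sketch of the idea above, the snippet below computes a weighted sum of the inputs and passes it through the logistic function $\sigma(z) = 1/(1+e^{-z})$. The weights, bias, and input values are made up purely for illustration.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Weighted sum of the inputs, squashed by the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical weights and inputs, for illustration only.
weights = [0.5, -1.0]
bias = 0.25
p = predict_probability(weights, bias, [2.0, 1.0])  # z = 0.25 + 1.0 - 1.0 = 0.25
print(p)
```

A weighted sum of exactly 0 maps to a probability of 0.5, which is why the sign of the weighted sum decides the predicted class in the two-class case.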

We can create a logistic regressor in the same way as we created a linear regression computational graph.

We will use the MNIST database of <a href="http://yann.lecun.com/exdb/mnist/"> handwritten digits</a> for this example.

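TensorFlow provides easy access to some sample data sets. We can load MNIST through one of these helpers; the dataset contains 60,000 training images and their corresponding labels as well as 10,000 testing images and their corresponding labels. The loader used below holds out 5,000 of the training images as a validation set by default, leaving 55,000 for training.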
- Each image is 28 pixels by 28 pixels
- Each image represents a digit between 0 and 9
- The labels are one-hot encoded => each label is a 1x10 vector
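
To make the data layout concrete, here is a small plain-Python sketch of one-hot encoding a digit label and flattening a 28x28 image into a vector of 784 values. The helper names `one_hot` and `flatten` are hypothetical and are not part of the MNIST loader.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 elsewhere."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in [0, num_classes)")
    return [1.0 if k == digit else 0.0 for k in range(num_classes)]

def flatten(image):
    """Flatten a 2-D image (list of rows) into a single row-major list."""
    return [pixel for row in image for pixel in row]

label = one_hot(3)
blank_image = [[0.0] * 28 for _ in range(28)]  # stand-in for a real 28x28 image
flat = flatten(blank_image)
print(label)      # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(len(flat))  # 784
```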

* Logistic regression model is a classifier
* Aims to model $p(\text{label} \mid \text{data})$
* Train the classifier to predict $Y_i = 1$ if $\theta \cdot X > 0$

\begin{equation*}
y = g(\Theta^{T} X) = g(\sum_{i=0}^{n} \theta_{i} x_{i}) \text{ where } x_0 = 1
\end{equation*}

- g is the <a href = "https://en.wikipedia.org/wiki/Softmax_function"> softmax function</a>, which generalizes the logistic function to more than two classes
    - It squashes every component of a vector input to lie between 0 and 1
    - Ensures that the components sum to 1, so they can be read as class probabilities

- The logistic regression model is trained by finding the $\theta$ that minimizes a cross-entropy loss
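
The two ingredients named above, softmax and cross-entropy, can be sketched in a few lines of plain Python. The three class scores and the label below are made-up values (the digit model has 10 classes); this is an illustration, not the TensorFlow implementation.

```python
import math

def softmax(scores):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(p)
                for y, p in zip(one_hot_label, probabilities) if y > 0)

scores = [2.0, 1.0, 0.1]   # hypothetical class scores
probs = softmax(scores)
label = [1.0, 0.0, 0.0]    # the true class is index 0
loss = cross_entropy(label, probs)
print(probs, sum(probs), loss)
```

Because the label is one-hot, the loss reduces to the negative log of the probability assigned to the true class: it approaches 0 as that probability approaches 1.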

<img src="../resources/img/LogRegDataflow.png">

# Logistic Regression in TensorFlow

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt 
rng = np.random
logs_path = 'logs/lesson1'

In [2]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


## Processing Input 

First we will build the <font color="red"><b>computational graph</b></font> for logistic regression based on the algebraic equation that the model is defined by. We will use two new TensorFlow concepts, <font color="red">placeholders and variables</font>, to build our graph.

<font color="red">**Placeholders**</font> are entry points into the graph allowing for training data to be passed into the graph.

<font color="red">**Variables**</font> are used to represent parameters of the graph which need to retain their value between runs (iterations) while training in a session.

```python
x = tf.placeholder(dtype = tf.float32, shape = [None, 784], name = "inputData") 
y = tf.placeholder(dtype = tf.float32, shape = [None, 10], name = "actualLabel")

W = tf.Variable(initial_value = tf.zeros([784, 10]), name = "weight")
b = tf.Variable(initial_value = tf.zeros([10]), name = "bias")

with tf.name_scope('model'):
    prediction = tf.nn.softmax(tf.add(b, tf.matmul(x, W))) 
```

New concepts:
1. <font color="red">**Placeholders**</font>: https://www.tensorflow.org/api_guides/python/io_ops#Placeholders
2. <font color="red">**Variables**</font>: https://www.tensorflow.org/programmers_guide/variables

* Placeholders receive input
    * flattened images of dimension 28*28 = 784
    * one-hot encoded labels for the 10 classes

* Specifying `None` as a dimension in a placeholder allows for variable batch sizes
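
The reason a variable batch size works is that the model's arithmetic is applied row by row, so the same weights serve any number of input rows. The plain-Python sketch below illustrates this with tiny, hypothetical sizes (3 features and 2 classes instead of 784 and 10); `linear_scores` is an illustrative helper, not a TensorFlow function.

```python
def linear_scores(batch, weights, bias):
    """Compute batch @ weights + bias for a batch of any length.

    batch:   list of rows, each with n_features values
    weights: n_features x n_classes nested list
    bias:    list of n_classes values
    """
    n_classes = len(bias)
    return [
        [bias[k] + sum(row[j] * weights[j][k] for j in range(len(row)))
         for k in range(n_classes)]
        for row in batch
    ]

# Tiny hypothetical sizes: 3 features, 2 classes.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.5, -0.5]

one_example = [[1.0, 2.0, 3.0]]
three_examples = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

print(linear_scores(one_example, weights, bias))     # [[4.5, 4.5]]
print(linear_scores(three_examples, weights, bias))  # [[4.5, 4.5], [0.5, -0.5], [2.5, 1.5]]
```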

To quote [TensorFlow's programmer's guide](https://www.tensorflow.org/programmers_guide/variables):
>A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program.

>Variables are manipulated via the tf.Variable class. A tf.Variable represents a tensor whose value can be changed by running ops on it. Unlike tf.Tensor objects, a tf.Variable exists outside the context of a single session.run call.

>Internally, a tf.Variable stores a persistent tensor. Specific ops allow you to read and modify the values of this tensor. These modifications are visible across multiple tf.Sessions, so multiple workers can see the same values for a tf.Variable.

* Variables change state
    * W and b
    * initialized to zeros in this example

* Each placeholder or variable is an edge in the TensorFlow computation graph
    * Edges represent Tensors
    * Tensors are n-dimensional arrays

* Each operation on Tensors is a node in the TensorFlow graph
    * Nodes take tensors as input
    * Return tensors as output
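
To illustrate how placeholders, variables, and operations fit together, here is a toy deferred-evaluation graph in plain Python. It is a deliberately simplified analogy with scalar values, and every name in it (`Node`, `placeholder`, `variable`, `add`, `multiply`) is hypothetical; it does not reflect how TensorFlow is implemented.

```python
class Node:
    """A graph node: an operation whose output is passed along an edge."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Evaluate the input nodes first, then apply this node's operation.
        values = [node.evaluate(feed) for node in self.inputs]
        return self.op(feed, *values)

def placeholder(name):
    # The value is supplied at evaluation time through the feed dictionary.
    return Node(lambda feed: feed[name])

def variable(state):
    # The value lives in `state` and persists between evaluations.
    return Node(lambda feed: state["value"])

def add(a, b):
    return Node(lambda feed, x, y: x + y, (a, b))

def multiply(a, b):
    return Node(lambda feed, x, y: x * y, (a, b))

x = placeholder("x")
w_state = {"value": 2.0}
w = variable(w_state)
b = variable({"value": 1.0})
y = add(multiply(w, x), b)      # building the graph computes nothing yet

print(y.evaluate({"x": 3.0}))   # 7.0
w_state["value"] = 5.0          # update the persistent state
print(y.evaluate({"x": 3.0}))   # 16.0
```

The second evaluation uses the same graph and the same fed input, but yields a different result because the variable's stored value changed. Training relies on the same separation between fed inputs and persistent state.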

In [3]:
x = tf.placeholder(dtype = tf.float32, shape = [None, 784], name = "inputFeatures") # mnist data image of shape 28*28=784
y = tf.placeholder(dtype = tf.float32, shape = [None, 10], name = "actualLabel") # 0-9 digits recognition => 10 classes

W = tf.Variable(initial_value = tf.zeros([784, 10]), name = "weight")
b = tf.Variable(initial_value = tf.zeros([10]), name = "bias")

with tf.name_scope('model'):
    prediction = tf.nn.softmax(tf.add(b, tf.matmul(x, W))) # Softmax

## Loss and Optimization 

Our model is complete, but our computational graph is not yet complete. To complete the computational graph, we need to define a <font color="blue"><b>loss function</b></font> and an <font color="blue"><b>optimization strategy</b></font> to allow for the training of the free variables, $W$ and $b$.

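We will use <font color="blue"><b>cross-entropy</b></font> as our loss function.

* Cross-entropy loss
\begin{equation*}
loss = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=0}^{9} y_{ik} \log \hat{y}_{ik}
\end{equation*}

Here $y_{ik}$ is entry $k$ of the one-hot label for example $i$, $\hat{y}_{ik}$ is the predicted probability of class $k$ for that example, and $n$ is the number of examples in the batch.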

TensorFlow provides various inbuilt optimizers that allow for the optimization of objective functions. These inbuilt optimizers are mostly directed toward neural network optimization, but a user can specify their own optimization functions by extending a base class. The base class provides access to various methods that calculate the gradients at all points in our computational graph. However, for most industrial projects the set of optimizers provided by TensorFlow is sufficient.

To optimize this logistic regression model, we will use the inbuilt <font color="blue"><b>Gradient Descent Optimizer</b></font>.

New concepts:
3. <font color="blue"><b>reduce_sum</b></font> : https://www.tensorflow.org/api_docs/python/tf/reduce_sum
4. <font color="blue"><b>Gradient Descent Optimizer</b></font> : https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer

```python
with tf.name_scope('Loss'):
    # Mean over the batch of the per-example cross-entropy
    loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(prediction), axis=1))

learning_rate = 0.01
with tf.name_scope('Optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
```

In [4]:
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 5

In [5]:
with tf.name_scope('Loss'):
    loss = tf.reduce_mean(-tf.reduce_sum(y*tf.log(prediction), axis=1))
                          
tf.summary.scalar("loss", loss)
merged_summary_op = tf.summary.merge_all()
                          
with tf.name_scope('Optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
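
The loss above takes the logarithm of the softmax output, which can fail or produce `NaN` if a predicted probability underflows to zero. A common remedy is to compute the log-probabilities directly from the pre-softmax scores (logits) with the log-sum-exp trick; TensorFlow 1.x offers `tf.nn.softmax_cross_entropy_with_logits`, which takes logits for this reason. The plain-Python sketch below illustrates the idea with made-up logits; it is not the notebook's code.

```python
import math

def log_softmax(logits):
    """log(softmax(logits)) computed without exponentiating large numbers."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy_from_logits(one_hot_label, logits):
    """Cross-entropy for one example, computed from raw scores."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))

# Hypothetical extreme logits: the softmax probability of the second class
# underflows to 0.0, so taking its log directly would fail. The log-space
# form stays finite.
logits = [1000.0, 0.0]
label = [0.0, 1.0]
loss = cross_entropy_from_logits(label, logits)
print(loss)  # 1000.0
```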

In [6]:
init = tf.global_variables_initializer()

We now have a complete computational graph. Each run of the optimizer takes a batch of Xs as input and makes a prediction. The prediction is compared against the supplied Ys to compute the loss. The optimizer then updates the free variables, $W$ and $b$, using the gradient of the loss for that batch.
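
What a single optimizer run does can be sketched without TensorFlow. For softmax regression with a mean cross-entropy loss, the gradient with respect to the class scores is $(\hat{y} - y)/n$, and the weights and biases move a small step against it. The toy batch, sizes, and learning rate below are assumptions chosen for illustration; TensorFlow derives the same gradients automatically from the graph.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(X, W, b):
    """Predicted class probabilities for every row of X."""
    return [softmax([b[k] + sum(x[j] * W[j][k] for j in range(len(x)))
                     for k in range(len(b))]) for x in X]

def mean_cross_entropy(Y, P):
    return -sum(y * math.log(p) for ys, ps in zip(Y, P)
                for y, p in zip(ys, ps) if y > 0) / len(Y)

def gradient_step(X, Y, W, b, learning_rate):
    """One gradient descent update of W and b, performed in place."""
    n = len(X)
    P = forward(X, W, b)  # predictions with the current parameters
    for i in range(n):
        for k in range(len(b)):
            err = (P[i][k] - Y[i][k]) / n  # gradient w.r.t. score k of example i
            b[k] -= learning_rate * err
            for j in range(len(W)):
                W[j][k] -= learning_rate * err * X[i][j]

# Hypothetical toy batch: 2 examples, 2 features, 2 classes.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]  # zero-initialized, as in the notebook
b = [0.0, 0.0]

loss_before = mean_cross_entropy(Y, forward(X, W, b))
gradient_step(X, Y, W, b, learning_rate=0.5)
loss_after = mean_cross_entropy(Y, forward(X, W, b))
print(loss_before, loss_after)
```

With zero weights, every class starts equally likely, so the initial loss is $\log 2$ for two classes. After one step the loss on this batch is lower.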

## Executing the Model

TensorFlow uses the <font color="red">**`tf.Session`**</font> class to represent a <font color="red">connection between the client program</font>---typically a Python program, although a similar interface is available in other languages---and the <font color = "red">C++ runtime</font>. 

A `tf.Session` object provides access to devices in the local machine, and remote devices using the distributed TensorFlow runtime. 

It also caches information about your `tf.Graph` so that you can efficiently run the same computation multiple times.



- Training the computational graph in a session

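```python
with tf.Session() as sess:
    sess.run(init)

    total_batches = mnist.train.num_examples // batch_size
    for epoch in range(training_epochs):
        for batch in range(total_batches):
            # Draw the next mini-batch of images and labels
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(fetches=[optimizer, loss], feed_dict={x: batch_xs, y: batch_ys})
```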

We will use <font color = "red">**Mini-batch Gradient Descent**</font> to optimize our loss function: each update uses a batch of 100 examples rather than the full training set.


The `run` method runs one "step" of TensorFlow computation, by running the necessary graph fragment to execute every Operation and evaluate every Tensor in `fetches`, substituting the values in `feed_dict` for the corresponding input values.

Notice that we pass a batch of Xs and Ys to the feed_dict parameter when we run the optimizer in a session.

The `feed_dict` parameter in the `run` function of a session accepts key-value entries where each key is a tensor in the graph and each value is a Python scalar, string, list, or numpy ndarray that can be converted to the same dtype as the tensor represented by the key. Additionally, if the key is a `tf.placeholder`, the shape of the value will be checked for compatibility with the placeholder.
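
To show where the batches passed to `feed_dict` come from, the sketch below shuffles a data set and slices it into fixed-size mini-batches. It is a simplified stand-in for `mnist.train.next_batch`, not a copy of its implementation; the function name, the seed, and the choice to drop a short final batch are all assumptions.

```python
import random

def iterate_minibatches(features, labels, batch_size, seed=0):
    """Yield (batch_features, batch_labels) pairs covering the data once.

    The examples are shuffled first; a final short batch is dropped so that
    every batch has exactly `batch_size` rows.
    """
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    usable = len(indices) - len(indices) % batch_size
    for start in range(0, usable, batch_size):
        chosen = indices[start:start + batch_size]
        yield [features[i] for i in chosen], [labels[i] for i in chosen]

# Hypothetical toy data: 10 examples with 2 features each.
features = [[float(i), float(i) + 0.5] for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = list(iterate_minibatches(features, labels, batch_size=4))
print(len(batches))        # 2 full batches; the 2 leftover examples are dropped
print(len(batches[0][0]))  # 4
```

Each yielded pair keeps features and labels aligned, which is the property the `{x: batch_xs, y: batch_ys}` feed depends on.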

In [7]:
# Start training
with tf.Session() as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter(logs_path + "/logistic", graph=tf.get_default_graph())

    # Training cycle
    for epoch in range(training_epochs):
        avg_loss = 0.
        total_batch = int(mnist.train.num_examples/batch_size) # 550 batches with the default 55,000-example training split
        
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            
            # Fit training using batch data
            _, c = sess.run([optimizer, loss], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_loss += c / total_batch
            
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(avg_loss))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))

Epoch: 0005 loss= 0.465482926
Epoch: 0010 loss= 0.392333548
Epoch: 0015 loss= 0.362695539
Epoch: 0020 loss= 0.345441683
Epoch: 0025 loss= 0.333685642
Optimization Finished!
Accuracy: 0.889333
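
The accuracy reported above comes from comparing the most probable predicted class with the labelled class for each test example, then averaging the matches. The same computation can be sketched in plain Python; the predictions and labels below are made-up values for 4 examples and 3 classes.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(predictions, one_hot_labels):
    """Fraction of rows whose most probable class matches the label."""
    correct = [argmax(p) == argmax(y)
               for p, y in zip(predictions, one_hot_labels)]
    return sum(correct) / len(correct)

# Hypothetical predicted probabilities and one-hot labels.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
print(accuracy(predictions, labels))  # 0.5
```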


## Using TensorBoard to View Graph Structure

We can have a look at the computational graph that we have just defined in TensorBoard. We have installed a Jupyter extension that makes connecting to TensorBoard very simple; the steps are as follows.

In your Jupyter directory tree view, select the log directory for lesson 1 and click the <font color = "red">**Tensorboard**</font> button as shown in the picture.
<img src = "../resources/img/TensorboardInit1.PNG">

Next, go to the <font color = "red">**Running**</font> tab, and choose the Tensorboard instance corresponding to the correct log directory as shown in the screenshot.
<img src = "../resources/img/TensorboardInit2.PNG">

