# Logistic regression

# Y = WX + B

!['Data shape'](tf2.jpg)

For every sample of X (X1, X2, X3), we get logits for label 1 (Y1) and label 2 (Y2).

To add the bias to the product WX, we had to turn b into a matrix of the same shape as WX.
This is unnecessary, because TensorFlow and NumPy both support broadcasting: a lower-rank array such as b is automatically stretched across the matching dimension during the addition.
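
As a minimal sketch of broadcasting, the NumPy snippet below compares the explicit "tile b into a matrix" approach with a plain addition. The toy shapes (3 samples, 4 features, 2 labels) and the values are assumptions chosen for illustration, and samples are stored as rows, as in the TensorFlow code in this notebook.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 4 features, 2 labels.
X = np.arange(12, dtype=np.float32).reshape(3, 4)
W = np.ones((4, 2), dtype=np.float32)
b = np.array([0.5, -0.5], dtype=np.float32)

# Explicit approach: copy b into a (3, 2) matrix before adding.
b_matrix = np.tile(b, (X.shape[0], 1))
logits_explicit = X @ W + b_matrix

# Broadcasting: b has shape (2,) and is stretched across the 3 rows.
logits_broadcast = X @ W + b

print(logits_broadcast.shape)                           # (3, 2)
print(np.allclose(logits_explicit, logits_broadcast))   # True
```

Both versions give the same logits; the broadcast version simply avoids materializing the tiled bias matrix.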

In [None]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from test import *

def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []
    mnist = input_data.read_data_sets('mnist', one_hot=True)
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels

In [9]:
n_features = 784
n_labels = 3
learning_rate = 0.08

X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

w_dim = (n_features, n_labels)
W = tf.Variable(tf.truncated_normal(w_dim))
b = tf.Variable(tf.zeros(n_labels))

logits = tf.matmul(X, W) + b
train_features, train_labels = mnist_features_labels(n_labels)

print(len(train_features), train_features[1].shape)
print(len(train_labels), train_labels[1].shape)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())    
    prediction = tf.nn.softmax(logits)
    cross_entropy = -tf.reduce_sum(y * tf.log(prediction), reduction_indices=1)
    loss = tf.reduce_mean(cross_entropy)    
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    _, l = sess.run(
        [optimizer, loss],
        feed_dict={X: train_features, y: train_labels})

print('Loss: {}'.format(l))

Extracting mnist/train-images-idx3-ubyte.gz
Extracting mnist/train-labels-idx1-ubyte.gz
Extracting mnist/t10k-images-idx3-ubyte.gz
Extracting mnist/t10k-labels-idx1-ubyte.gz
3060 (784,)
3060 (3,)
Loss: 6.138011455535889


# Softmax 


In [16]:
import numpy as np

def softmax(x):
    # axis=0 sums down each column, so every column (one sample) adds up to 1
    return np.exp(x)/np.sum(np.exp(x), axis=0)

logits = [3.0, 3.0, 3.0]
print(softmax(logits))

# Each column is a sample (a data point)
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
    
print(softmax(logits))

[0.33333333 0.33333333 0.33333333]
[[0.09003057 0.00242826 0.01587624 0.33333333]
 [0.24472847 0.01794253 0.11731043 0.33333333]
 [0.66524096 0.97962921 0.86681333 0.33333333]]


# Mini-batching

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
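
The shuffle-then-batch idea can be sketched in plain NumPy as follows. The helper name `minibatches`, the toy data, and the batch size of 4 are assumptions made for illustration; the gradient descent step itself is left as a comment.

```python
import numpy as np

def minibatches(features, labels, batch_size, rng):
    """Yield shuffled (features, labels) mini-batches covering one epoch."""
    order = rng.permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

rng = np.random.default_rng(0)
# Hypothetical toy dataset: 10 samples with 2 features each.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10)

for epoch in range(2):
    sizes = []
    # A fresh shuffle happens on every call, i.e. once per epoch.
    for batch_features, batch_labels in minibatches(features, labels, 4, rng):
        # A gradient descent update on this batch would go here.
        sizes.append(len(batch_features))
    print(epoch, sizes)
```

Indexing both arrays with the same permutation keeps each feature row aligned with its label while the sample order changes from epoch to epoch.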

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [3]:
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

Extracting mnist/train-images-idx3-ubyte.gz
Extracting mnist/train-labels-idx1-ubyte.gz
Extracting mnist/t10k-images-idx3-ubyte.gz
Extracting mnist/t10k-labels-idx1-ubyte.gz


# Memory size

```
train_features Shape: (55000, 784) Type: float32
train_labels Shape: (55000, 10) Type: float32
weights Shape: (784, 10) Type: float32
bias Shape: (10,) Type: float32
```


In [22]:
# How much memory does train_features need? (55000 * 784 values, 4 bytes each)
print(55000 * 784 * 4 // 1000, 'kB')


172480 kB
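
The same arithmetic can be applied to every array in the listing above. This is a small sketch that only uses the shapes and the float32 type stated there; the dictionary and variable names are made up for the example.

```python
import numpy as np

# Shapes copied from the listing above; float32 takes 4 bytes per value.
shapes = {
    'train_features': (55000, 784),
    'train_labels': (55000, 10),
    'weights': (784, 10),
    'bias': (10,),
}
itemsize = np.dtype(np.float32).itemsize

sizes = {name: int(np.prod(shape)) * itemsize for name, shape in shapes.items()}
for name, n_bytes in sizes.items():
    print('{}: {} bytes'.format(name, n_bytes))
print('total: {} bytes'.format(sum(sizes.values())))
```

The training features dominate the total, which is why batching the inputs matters far more than the size of the weights or the bias.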


### TensorFlow Mini-batching

In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not divide 1000 evenly, you'd wind up with 7 batches of 128 samples and 1 batch of 104 samples (`7*128 + 1*104 = 1000`).

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's `tf.placeholder()` function to receive the varying batch sizes.

If each sample had `n_input = 784` features (aka pixels) and `n_classes = 10` possible labels, the dimensions for features would be `[None, n_input]` and labels would be `[None, n_classes]`.

```python
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
```
The **None** dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
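
To make the varying batch size concrete, here is a NumPy-only sketch that slices a stand-in array of 1000 samples into batches of 128. The zero-filled array is a placeholder for real features, not MNIST data.

```python
import numpy as np

n_samples, n_input, batch_size = 1000, 784, 128
# Stand-in data; only the shapes matter for this illustration.
features = np.zeros((n_samples, n_input), dtype=np.float32)

batch_shapes = [
    features[start:start + batch_size].shape
    for start in range(0, n_samples, batch_size)
]

print(len(batch_shapes))                  # 8
print(batch_shapes[0], batch_shapes[-1])  # (128, 784) (104, 784)
```

Every batch shares the second dimension (`n_input`), and only the first dimension changes, which is exactly the dimension declared as `None` in the placeholders.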

#### Question 2

Using the parameters below, how many batches are there, and what is the size of the last batch?

```
features is (50000, 400)
labels is (50000, 10)
batch_size is 128
```

In [6]:
# Round up so the final, partial batch is counted as well
num_batches = (50000 + 128 - 1) // 128
last_batch_size = 50000 % 128
print('num_batches: ', num_batches, ', last_batch_size: ', last_batch_size)

num_batches:  391 , last_batch_size:  80


# Feed mini batches of MNIST features and labels into a model

In [35]:
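from pprint import pprint
import numpy as np

def batches(batch_size, features, labels):
    """
    Shuffle features and labels together, then split them into batches
    :param batch_size: Maximum number of samples per batch
    :param features: List of features
    :param labels: List of labels
    :return: List of [features, labels] batches; the last one may be smaller
    """
    assert len(features) == len(labels)

    # Shuffle the pairs so each feature stays aligned with its label
    combined = list(zip(features, labels))
    np.random.shuffle(combined)
    x_train, y_train = zip(*combined)

    output_batches = []
    for start in range(0, len(x_train), batch_size):
        end = start + batch_size
        output_batches.append([x_train[start:end], y_train[start:end]])

    return output_batches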

# test
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]
pprint(batches(3, example_features, example_labels))

[[(['F41', 'F42', 'F43', 'F44'],
   ['F11', 'F12', 'F13', 'F14'],
   ['F21', 'F22', 'F23', 'F24']),
  (['L41', 'L42'], ['L11', 'L12'], ['L21', 'L22'])],
 [(['F31', 'F32', 'F33', 'F34'],), (['L31', 'L32'],)]]


In [40]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    current_cost = sess.run(
        cost,
        feed_dict={x: last_features, y: last_labels})
    
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={x: valid_features, y: valid_labels})
    
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))
    
mnist = input_data.read_data_sets('mnist', one_hot=True)
train_features = mnist.train.images
train_labels = mnist.train.labels.astype(np.float32)

valid_features = mnist.validation.images
valid_labels = mnist.validation.labels.astype(np.float32)

test_features  = mnist.test.images
test_labels = mnist.test.labels.astype(np.float32)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.random_normal([10]))

# Logits: xW + b, with shapes (batch, 784) @ (784, 10) + (10,)
z = tf.add(tf.matmul(x, W), b)

learning_rate = tf.placeholder(tf.float32)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=z, labels=y))
opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

correct_prediction = tf.equal(tf.argmax(z, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

batch_size = 128
epochs = 100
learn_rate = 0.1

init = tf.global_variables_initializer()
train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    for epoch_i in range(epochs):        
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                    x: batch_features,
                    y: batch_labels,
                    learning_rate: learn_rate}
            sess.run(opt, feed_dict=train_feed_dict)
        
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)
        # learn_rate = learn_rate*(100-epoch_i)/100.0
        
    test_accuracy = sess.run(
        accuracy,
        feed_dict={x: test_features, y: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Extracting mnist/train-images-idx3-ubyte.gz
Extracting mnist/train-labels-idx1-ubyte.gz
Extracting mnist/t10k-images-idx3-ubyte.gz
Extracting mnist/t10k-labels-idx1-ubyte.gz
Epoch: 0    - Cost: 1.79     Valid Accuracy: 0.711
Epoch: 1    - Cost: 1.32     Valid Accuracy: 0.792
Epoch: 2    - Cost: 1.17     Valid Accuracy: 0.823
Epoch: 3    - Cost: 1.08     Valid Accuracy: 0.84 
Epoch: 4    - Cost: 1.01     Valid Accuracy: 0.851
Epoch: 5    - Cost: 0.951    Valid Accuracy: 0.861
Epoch: 6    - Cost: 0.899    Valid Accuracy: 0.866
Epoch: 7    - Cost: 0.853    Valid Accuracy: 0.872
Epoch: 8    - Cost: 0.812    Valid Accuracy: 0.876
Epoch: 9    - Cost: 0.776    Valid Accuracy: 0.879
Epoch: 10   - Cost: 0.744    Valid Accuracy: 0.88 
Epoch: 11   - Cost: 0.716    Valid Accuracy: 0.883
Epoch: 12   - Cost: 0.691    Valid Accuracy: 0.885
Epoch: 13   - Cost: 0.668    Valid Accuracy: 0.888
Epoch: 14   - Cost: 0.648    Valid Accuracy: 0.889
Epoch: 15   - Cost: 0.63     Valid Accuracy: 0.891
Epoch: 16 