# Machine Learning Models in TensorFlow

Machine learning models can be implemented in TensorFlow using its lower-level primitives: operations and tensors.

TensorFlow also provides low-level primitives for specifying optimizers that can find the maxima or minima of a loss function.

If a machine learning model can be reduced to linear algebraic operations, it can be implemented in TensorFlow.



We will see:

1. Linear regression

2. Logistic regression

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt 
rng = np.random
logs_path = '../../logs/lesson2'

## Linear Regression 

Linear regression attempts to model the relationship between variables by fitting a linear equation to observed data. One variable is considered to be the dependent variable, and the others are considered to be explanatory variables. For our example, we want to relate the variable Y to the variable X using a linear regression model.

Specification of the model:
$y = b + w_1 x_1 + \dots + w_p x_p$
- $y$ is the regressed variable
- $w$'s are the weights
- $b$ is the bias term
- $x$'s are the features used to model y
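
As a rough illustration of the specification above, the prediction for several observations at once can be written as a matrix-vector product. This is a NumPy sketch rather than part of the TensorFlow graph; the helper name `predict` and the example numbers are made up for illustration.

```python
import numpy as np

def predict(X, w, b):
    """Return b + w_1*x_1 + ... + w_p*x_p for every row of X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    return b + X @ w

# Two observations with p = 2 features each (illustrative values)
X_example = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
w_example = np.array([0.5, -1.0])
b_example = 2.0

y_hat = predict(X_example, w_example, b_example)
print(y_hat)
```

With $p = 1$, as in the toy data used in this lesson, `X` has a single column and `w` holds a single weight.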



In [2]:
# Some toy data
train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

First we will build the computational graph for linear regression based on the algebraic equation that the model is defined by. We will use two new TensorFlow concepts, placeholders and variables, to build our graph. 

Placeholders are entry points into the graph allowing for training data to be passed into the graph.

Variables are used to represent parameters of the graph which need to retain their value between runs (iterations) while training in a session. 

New concepts:
1. Placeholders: https://www.tensorflow.org/api_guides/python/io_ops#Placeholders
2. Variables: https://www.tensorflow.org/programmers_guide/variables

To quote [TensorFlow's programmer's guide](https://www.tensorflow.org/programmers_guide/variables):
>A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program.

>Variables are manipulated via the tf.Variable class. A tf.Variable represents a tensor whose value can be changed by running ops on it. Unlike tf.Tensor objects, a tf.Variable exists outside the context of a single session.run call.

>Internally, a tf.Variable stores a persistent tensor. Specific ops allow you to read and modify the values of this tensor. These modifications are visible across multiple tf.Sessions, so multiple workers can see the same values for a tf.Variable.

In [3]:
# Defining the computational graph for linear regression with 1 explanatory variable
# p = 1

# Input to the graph
x = tf.placeholder(dtype = tf.float32, name = 'InputData') # Placeholders - https://www.tensorflow.org/api_guides/python/io_ops#Placeholders
y = tf.placeholder(dtype = tf.float32, name = 'LabelData')

# Model parameters are defined using variables
# Variables - https://www.tensorflow.org/programmers_guide/variables
# Variables retain their value even outside the bounds of a session's run call
w = tf.Variable(initial_value = rng.randn(), name = "weight") 
b = tf.Variable(initial_value = rng.randn(), name = "bias")

# Connecting up the nodes in our linear model
# y = b + Wx
with tf.name_scope('model'):
    prediction = tf.add(b, tf.multiply(w, x))

# prediction holds the tensor that is the output of the operation add which takes tensors b, and the output of the multiply operation between the weight w, and the input x

Our model is complete, but our computational graph is not yet complete. To complete the computational graph, we need to define a loss function and an optimization strategy to allow for the training of the free variables, $b$ and $w$.

TensorFlow provides various inbuilt optimizers for minimizing objective functions. These inbuilt optimizers are mostly directed toward neural network optimization, but users can specify their own optimizers by extending a base class, which provides access to methods that calculate the gradients at all points in our computational graph. However, for most industrial projects the set of optimizers provided by TensorFlow is sufficient.

To optimize a linear regressor, we will use the inbuilt Gradient Descent Optimizer.

New concepts:
3. reduce_sum operation: https://www.tensorflow.org/api_docs/python/tf/reduce_sum
4. Gradient descent optimizer: https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer

In [4]:
# Defining loss for our model
# Loss is half the mean squared error between actual y and predicted y
with tf.name_scope('Loss'):
    loss = tf.reduce_sum( input_tensor = tf.pow(prediction-y, 2))/(2*n_samples)
# reduce_sum is a function to compute the sum across dimensions of a tensor. In this case, the input tensor is a 1 x n_samples dimensional tensor of the prediction errors corresponding to the training samples  
# https://www.tensorflow.org/api_docs/python/tf/reduce_sum
tf.summary.scalar("loss", loss)
merged_summary_op = tf.summary.merge_all()
# Our previous definitions implicitly create the relation between the loss and the variables w and b

# We can use gradient descent to train our linear model
# https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer
learning_rate = 0.01
with tf.name_scope('SGD'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

We now have a complete computational graph. Each run of the optimizer takes whatever values of X and Y we feed in (one sample at a time in the training loop that follows), makes a prediction, and updates the free variables $w$ and $b$ to reduce the loss for that input.
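
To make the update concrete, here is a NumPy sketch of what one pass of per-sample gradient descent does for this model. It is an illustration, not TensorFlow's implementation: the function names `half_mse` and `sgd_epoch` are hypothetical, the gradients are derived by hand from the loss defined above, and the parameters start at zero instead of random values.

```python
import numpy as np

# Same toy data as in the lesson
train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]

def half_mse(w, b, xs, ys):
    """Same loss as the graph: sum of squared errors / (2 * n_samples)."""
    return np.sum((b + w * xs - ys) ** 2) / (2 * n_samples)

def sgd_epoch(w, b, xs, ys, learning_rate=0.01):
    """One pass over the data, updating w and b after every sample."""
    for x_i, y_i in zip(xs, ys):
        error = (b + w * x_i) - y_i
        # Gradients of (error ** 2) / (2 * n_samples) with respect to w and b
        grad_w = error * x_i / n_samples
        grad_b = error / n_samples
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = 0.0, 0.0  # assumption: start from zero rather than rng.randn()
loss_before = half_mse(w, b, train_X, train_Y)
for _ in range(50):
    w, b = sgd_epoch(w, b, train_X, train_Y)
loss_after = half_mse(w, b, train_X, train_Y)
print("loss before:", loss_before, "loss after:", loss_after)
```

Because the loss divides by `n_samples` even when a single sample is fed, each per-sample step is scaled down by that factor. This may be one reason why the bias moves slowly during training.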

In [5]:
# We also need an operation to initialize our global variables (w and b)
init = tf.global_variables_initializer()

In [6]:
# We can now run the computational graph in a session
training_epochs = 1000              # We will make 1000 passes over the training data
display_step = 1                 # Display the loss after every epoch
final_w, final_b = 0,0
with tf.Session() as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

    # Fit all training data
    for epoch in range(training_epochs):
        for (sample_x, sample_y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={x: sample_x, y: sample_y})

        #Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c , summary= sess.run([loss, merged_summary_op], feed_dict={x: train_X, y:train_Y})
            summary_writer.add_summary(summary, epoch)
            print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(c), \
                "w=", sess.run(w), "b=", sess.run(b))

    print("Optimization Finished!")
    training_loss = sess.run(loss, feed_dict={x: train_X, y:train_Y})
    print("Training loss=", training_loss, "w=", sess.run(w), "b=", sess.run(b), '\n')
    final_w, final_b = sess.run(w), sess.run(b)

Epoch: 0001 loss= 48.691528320 w= 1.9161016 b= -0.5114414
Epoch: 0002 loss= 19.436344147 w= 1.378653 b= -0.58614177
Epoch: 0003 loss= 7.820686817 w= 1.0395125 b= -0.63258046
Epoch: 0004 loss= 3.214914083 w= 0.8254732 b= -0.66119117
Epoch: 0005 loss= 1.392511606 w= 0.69035304 b= -0.67855614
Epoch: 0006 loss= 0.673821211 w= 0.60501873 b= -0.6888277
Epoch: 0007 loss= 0.391865104 w= 0.5510915 b= -0.69462514
Epoch: 0008 loss= 0.282136917 w= 0.51697713 b= -0.6976011
Epoch: 0009 loss= 0.239955932 w= 0.49536166 b= -0.6987978
Epoch: 0010 loss= 0.224030703 w= 0.48163122 b= -0.69887304
Epoch: 0011 loss= 0.218160123 w= 0.47287512 b= -0.69824153
Epoch: 0012 loss= 0.216042563 w= 0.4672572 b= -0.6971652
Epoch: 0013 loss= 0.215262309 w= 0.46361905 b= -0.695809
Epoch: 0014 loss= 0.214915827 w= 0.46122992 b= -0.6942771
Epoch: 0015 loss= 0.214680210 w= 0.45962885 b= -0.69263524
Epoch: 0016 loss= 0.214449912 w= 0.458525 b= -0.6909246
Epoch: 0017 loss= 0.214197412 w= 0.457735 b= -0.6891717
Epoch: 0018 loss

Epoch: 0146 loss= 0.177072585 w= 0.4264984 b= -0.47117394
Epoch: 0147 loss= 0.176827088 w= 0.4262816 b= -0.4696142
Epoch: 0148 loss= 0.176582232 w= 0.42606503 b= -0.46805644
Epoch: 0149 loss= 0.176337942 w= 0.42584875 b= -0.4665005
Epoch: 0150 loss= 0.176094234 w= 0.42563272 b= -0.46494648
Epoch: 0151 loss= 0.175851166 w= 0.42541695 b= -0.46339437
Epoch: 0152 loss= 0.175608635 w= 0.42520148 b= -0.46184424
Epoch: 0153 loss= 0.175366744 w= 0.4249863 b= -0.46029592
Epoch: 0154 loss= 0.175125450 w= 0.42477134 b= -0.45874947
Epoch: 0155 loss= 0.174884722 w= 0.42455664 b= -0.457205
Epoch: 0156 loss= 0.174644619 w= 0.42434216 b= -0.45566243
Epoch: 0157 loss= 0.174405068 w= 0.42412803 b= -0.4541217
Epoch: 0158 loss= 0.174166113 w= 0.42391407 b= -0.4525829
Epoch: 0159 loss= 0.173927739 w= 0.42370042 b= -0.45104593
Epoch: 0160 loss= 0.173689976 w= 0.42348704 b= -0.4495109
Epoch: 0161 loss= 0.173452765 w= 0.42327395 b= -0.44797772
Epoch: 0162 loss= 0.173216134 w= 0.4230611 b= -0.44644642
Epoch: 0

Epoch: 0294 loss= 0.146570191 w= 0.39713582 b= -0.2599419
Epoch: 0295 loss= 0.146399468 w= 0.396955 b= -0.2586411
Epoch: 0296 loss= 0.146229193 w= 0.39677438 b= -0.25734192
Epoch: 0297 loss= 0.146059290 w= 0.39659402 b= -0.2560442
Epoch: 0298 loss= 0.145889834 w= 0.39641383 b= -0.25474814
Epoch: 0299 loss= 0.145720780 w= 0.39623392 b= -0.25345364
Epoch: 0300 loss= 0.145552158 w= 0.39605418 b= -0.2521608
Epoch: 0301 loss= 0.145383969 w= 0.39587468 b= -0.25086948
Epoch: 0302 loss= 0.145216122 w= 0.39569542 b= -0.24957973
Epoch: 0303 loss= 0.145048752 w= 0.3955164 b= -0.24829158
Epoch: 0304 loss= 0.144881740 w= 0.39533752 b= -0.24700503
Epoch: 0305 loss= 0.144715175 w= 0.3951589 b= -0.24572004
Epoch: 0306 loss= 0.144548997 w= 0.39498052 b= -0.24443664
Epoch: 0307 loss= 0.144383222 w= 0.39480233 b= -0.2431548
Epoch: 0308 loss= 0.144217864 w= 0.39462435 b= -0.24187452
Epoch: 0309 loss= 0.144052938 w= 0.39444658 b= -0.24059582
Epoch: 0310 loss= 0.143888369 w= 0.39426902 b= -0.23931865
Epoch:

Epoch: 0443 loss= 0.125240341 w= 0.37249595 b= -0.0826849
Epoch: 0444 loss= 0.125121921 w= 0.37234533 b= -0.081601284
Epoch: 0445 loss= 0.125003785 w= 0.37219492 b= -0.08051902
Epoch: 0446 loss= 0.124885939 w= 0.37204468 b= -0.079438075
Epoch: 0447 loss= 0.124768384 w= 0.37189463 b= -0.078358434
Epoch: 0448 loss= 0.124651127 w= 0.37174475 b= -0.077280134
Epoch: 0449 loss= 0.124534130 w= 0.37159503 b= -0.07620315
Epoch: 0450 loss= 0.124417476 w= 0.37144548 b= -0.07512749
Epoch: 0451 loss= 0.124301068 w= 0.37129614 b= -0.07405312
Epoch: 0452 loss= 0.124184944 w= 0.37114698 b= -0.07298009
Epoch: 0453 loss= 0.124069087 w= 0.37099802 b= -0.071908385
Epoch: 0454 loss= 0.123953551 w= 0.37084922 b= -0.070838
Epoch: 0455 loss= 0.123838298 w= 0.37070063 b= -0.06976893
Epoch: 0456 loss= 0.123723313 w= 0.3705522 b= -0.06870115
Epoch: 0457 loss= 0.123608619 w= 0.37040398 b= -0.06763469
Epoch: 0458 loss= 0.123494178 w= 0.3702559 b= -0.06656953
Epoch: 0459 loss= 0.123380043 w= 0.37010804 b= -0.065505

Epoch: 0594 loss= 0.110280856 w= 0.35172027 b= 0.06677423
Epoch: 0595 loss= 0.110199139 w= 0.3515951 b= 0.06767466
Epoch: 0596 loss= 0.110117599 w= 0.35147014 b= 0.06857399
Epoch: 0597 loss= 0.110036261 w= 0.3513453 b= 0.069472216
Epoch: 0598 loss= 0.109955117 w= 0.35122058 b= 0.070369326
Epoch: 0599 loss= 0.109874189 w= 0.35109597 b= 0.07126535
Epoch: 0600 loss= 0.109793462 w= 0.35097155 b= 0.07216028
Epoch: 0601 loss= 0.109712906 w= 0.35084733 b= 0.0730541
Epoch: 0602 loss= 0.109632559 w= 0.3507232 b= 0.07394684
Epoch: 0603 loss= 0.109552406 w= 0.35059932 b= 0.07483848
Epoch: 0604 loss= 0.109472468 w= 0.35047552 b= 0.07572903
Epoch: 0605 loss= 0.109392710 w= 0.35035184 b= 0.07661849
Epoch: 0606 loss= 0.109313153 w= 0.35022837 b= 0.07750685
Epoch: 0607 loss= 0.109233789 w= 0.35010505 b= 0.078394115
Epoch: 0608 loss= 0.109154619 w= 0.34998187 b= 0.0792803
Epoch: 0609 loss= 0.109075636 w= 0.34985888 b= 0.08016539
Epoch: 0610 loss= 0.108996883 w= 0.34973598 b= 0.081049405
Epoch: 0611 los

Epoch: 0749 loss= 0.099731028 w= 0.3340413 b= 0.19395582
Epoch: 0750 loss= 0.099675149 w= 0.33393782 b= 0.19470039
Epoch: 0751 loss= 0.099619448 w= 0.33383444 b= 0.19544403
Epoch: 0752 loss= 0.099563874 w= 0.33373117 b= 0.19618678
Epoch: 0753 loss= 0.099508412 w= 0.333628 b= 0.1969286
Epoch: 0754 loss= 0.099453107 w= 0.333525 b= 0.19766952
Epoch: 0755 loss= 0.099397928 w= 0.33342218 b= 0.19840951
Epoch: 0756 loss= 0.099342875 w= 0.3333194 b= 0.19914863
Epoch: 0757 loss= 0.099287987 w= 0.33321676 b= 0.19988684
Epoch: 0758 loss= 0.099233195 w= 0.3331143 b= 0.20062411
Epoch: 0759 loss= 0.099178568 w= 0.33301193 b= 0.2013605
Epoch: 0760 loss= 0.099124067 w= 0.3329097 b= 0.20209599
Epoch: 0761 loss= 0.099069692 w= 0.3328076 b= 0.2028306
Epoch: 0762 loss= 0.099015430 w= 0.33270562 b= 0.20356429
Epoch: 0763 loss= 0.098961338 w= 0.33260372 b= 0.20429705
Epoch: 0764 loss= 0.098907366 w= 0.33250204 b= 0.20502894
Epoch: 0765 loss= 0.098853521 w= 0.3324004 b= 0.20575991
Epoch: 0766 loss= 0.0987997

Epoch: 0905 loss= 0.092483170 w= 0.31933695 b= 0.29973745
Epoch: 0906 loss= 0.092445076 w= 0.3192515 b= 0.30035236
Epoch: 0907 loss= 0.092407092 w= 0.31916615 b= 0.30096653
Epoch: 0908 loss= 0.092369206 w= 0.31908092 b= 0.30157995
Epoch: 0909 loss= 0.092331417 w= 0.31899574 b= 0.30219263
Epoch: 0910 loss= 0.092293687 w= 0.31891063 b= 0.30280456
Epoch: 0911 loss= 0.092256084 w= 0.31882566 b= 0.3034157
Epoch: 0912 loss= 0.092218570 w= 0.31874081 b= 0.30402607
Epoch: 0913 loss= 0.092181139 w= 0.31865606 b= 0.30463576
Epoch: 0914 loss= 0.092143804 w= 0.3185714 b= 0.3052447
Epoch: 0915 loss= 0.092106543 w= 0.31848684 b= 0.3058529
Epoch: 0916 loss= 0.092069410 w= 0.31840244 b= 0.30646032
Epoch: 0917 loss= 0.092032306 w= 0.3183181 b= 0.307067
Epoch: 0918 loss= 0.091995336 w= 0.31823394 b= 0.30767286
Epoch: 0919 loss= 0.091958478 w= 0.3181498 b= 0.30827805
Epoch: 0920 loss= 0.091921687 w= 0.31806576 b= 0.3088825
Epoch: 0921 loss= 0.091884971 w= 0.31798184 b= 0.30948627
Epoch: 0922 loss= 0.0918

In [None]:
# Graphic display (no session is needed: final_w and final_b are already plain numeric values)
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.plot(train_X, final_w * train_X + final_b, label='Fitted line')
plt.legend()
plt.show()

## Logistic Regression 

Logistic regression refers to a classifier that assigns an observation to one of two classes. Multinomial logistic regression is used when classifying into more than two classes, though the two terms are often used interchangeably. We will look at an example where we want to classify handwritten digits into one of 10 classes: 0-9.

The logistic regression model works in a similar fashion to a linear regression model, except that the weighted sum of the explanatory variables is passed through a function that maps it to a value between 0 and 1. This function is called the logistic function, giving the model its name.
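
A small NumPy sketch of the two-class case may help here. The weights, bias, and features are hypothetical values chosen only to show the weighted sum being squashed into the interval (0, 1); the function name `logistic` is likewise just for illustration.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

# Hypothetical weights, bias and features for a single observation
w_example = np.array([0.8, -0.4])
b_example = 0.1
x_example = np.array([2.0, 1.0])

z = b_example + np.dot(w_example, x_example)  # weighted sum, any real number
p = logistic(z)                               # squashed into (0, 1)
print("z =", z, "p =", p)
```

The output `p` can be read as the estimated probability of the positive class; a weighted sum of exactly 0 maps to 0.5.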

We can create a logistic regressor in the same way as we created a linear regression computational graph.

We will use the MNIST database of <a href="http://yann.lecun.com/exdb/mnist/">handwritten digits</a> for this example.

TensorFlow provides easy access to some sample data sets, including MNIST. The MNIST database contains 60,000 training images and their corresponding labels as well as 10,000 testing images and their corresponding labels.
- Each image is 28 pixels by 28 pixels
- Each image represents a digit between 0 and 9
- The labels are one-hot encoded => each label is a 1x10 vector
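
For readers new to one-hot encoding, the following NumPy sketch shows the idea on three made-up digit labels. The helper `one_hot` is hypothetical and is not how the MNIST loader is implemented.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[np.asarray(labels)]

encoded = one_hot([3, 0, 9])
# encoded has shape (3, 10); row 0 has a single 1 at index 3
decoded = np.argmax(encoded, axis=1)  # recovers the integer labels
print(encoded)
print(decoded)
```

Taking the `argmax` of a one-hot row recovers the original digit, which is the same trick the accuracy computation later in this lesson relies on.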

In [None]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

#### Logistic regression model specification:
\begin{equation*}
y = g(\Theta^{T} X) = g(\sum_{i=0}^{n} \theta_{i} x_{i}) \text{ where } x_0 = 1
\end{equation*}
- g is the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a>, an extension of the logistic function to multiple dimensions
    - It squashes every component of a vector input to lie between 0 and 1
    - It ensures that the components sum to 1

- TensorFlow provides an implementation of the softmax function, which we can use.

- The logistic regression model is trained by minimizing a cross-entropy loss.

New concepts:
5. Softmax : https://www.tensorflow.org/api_docs/python/tf/nn/softmax
6. Reduce_mean : https://www.tensorflow.org/api_docs/python/tf/reduce_mean
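
The softmax and cross-entropy computations can be sketched in NumPy as follows. This is an illustration under stated assumptions, not TensorFlow's implementation: the function names are hypothetical, and subtracting the row maximum before exponentiating is a numerical-stability step added here.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy(y_true, y_pred):
    """Mean over examples of -sum_k y_k * log(p_k)."""
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# Two examples, 10 classes. All-zero logits correspond to weights
# and biases that are all zero.
logits = np.zeros((2, 10))
labels = np.eye(10)[[3, 7]]          # one-hot labels for digits 3 and 7
probs = softmax(logits)              # every class gets probability 0.1
loss = cross_entropy(labels, probs)  # equals log(10)
print(probs[0], loss)
```

With all-zero parameters every class is equally likely, so the loss starts at $\log 10 \approx 2.30$ and training should drive it down from there.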

In [None]:
# Specifying the model
# tf Graph Input
x = tf.placeholder(dtype = tf.float32, shape = [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder(dtype = tf.float32, shape = [None, 10]) # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(initial_value = tf.zeros([784, 10]))
b = tf.Variable(initial_value = tf.zeros([10]))


# Construct model
prediction = tf.nn.softmax(tf.add(b, tf.matmul(x, W))) # Softmax

# lamb = 0.01  # Uncomment this line to enable regularization

# Minimize error using cross-entropy loss
# reduce_mean calculates the mean across dimensions of a tensor
loss = tf.reduce_mean(-tf.reduce_sum(y*tf.log(prediction), axis=1))
# To enable regularization, uncomment lamb above and use this loss instead:
# loss = tf.reduce_mean(-tf.reduce_sum(y*tf.log(prediction), axis=1) + lamb * (tf.nn.l2_loss(W) + tf.nn.l2_loss(b)))
                      
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

In [None]:
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 5

This time, we will use mini-batch gradient descent to optimize our loss function: each update uses a batch of 100 examples rather than a single example. However, TensorFlow is agnostic to the difference between stochastic, mini-batch, and batch gradient descent. The only difference is in how we have defined our input placeholders in our computational graph and what we pass to the feed_dict parameter when we run the optimizer in a session.
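
The batching itself happens outside the graph. As a rough NumPy sketch of the idea, the hypothetical generator `iterate_minibatches` shuffles the data once and yields slices of `batch_size` rows. It stands in for `mnist.train.next_batch` conceptually and is not that function's implementation; the data here is random filler.

```python
import numpy as np

def iterate_minibatches(features, labels, batch_size, random_state):
    """Yield shuffled (features, labels) batches covering the data once."""
    order = random_state.permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

# Stand-in data: 1,000 fake "images" of 784 pixels with integer labels
random_state = np.random.RandomState(0)
fake_images = random_state.rand(1000, 784)
fake_labels = random_state.randint(0, 10, size=1000)

batch_shapes = [xs.shape for xs, _ in
                iterate_minibatches(fake_images, fake_labels, 100, random_state)]
print(len(batch_shapes), batch_shapes[0])
```

Each yielded pair would be what gets passed to `feed_dict`; feeding one row at a time gives stochastic gradient descent, and feeding the whole array gives batch gradient descent.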

In [None]:
# Start training
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_loss = 0.
        total_batch = int(mnist.train.num_examples/batch_size) # 550 batches if the loader holds out its default 5,000 validation images from the 60,000
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, c = sess.run([optimizer, loss], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_loss += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(avg_loss))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))

The advantages of TensorFlow lie in its extensibility and the comparative simplicity with which complex machine learning models can be written. We can modify the above logistic regression model into a regularized logistic regression model with the simple addition of 2 code fragments.

```python
lamb = 0.01 #This is the hyperparameter that controls the strength of the regularization

# Minimize error using cross entropy loss
# reduce_mean calculates the mean across dimensions of a tensor
loss = tf.reduce_mean(-tf.reduce_sum(y*tf.log(prediction), axis=1)  + lamb * (tf.nn.l2_loss(W) + tf.nn.l2_loss(b)))
                     
```
We add an L2 penalty simply by applying TensorFlow's inbuilt `tf.nn.l2_loss` to the parameters of our model.
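
For intuition about the size of the penalty, `tf.nn.l2_loss(t)` computes half the sum of the squared entries of `t`. The NumPy sketch below mirrors that formula; the helper names and the example parameter values are hypothetical, and the data loss of 2.0 is a made-up stand-in for the cross-entropy term.

```python
import numpy as np

def l2_loss(t):
    """NumPy equivalent of tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(np.square(t)) / 2.0

def regularized_loss(data_loss, weights, biases, lamb=0.01):
    """Add the L2 penalty on weights and biases to an existing loss value."""
    return data_loss + lamb * (l2_loss(weights) + l2_loss(biases))

# Illustrative parameters, not trained values
W_example = np.array([[3.0, 0.0],
                      [0.0, 4.0]])
b_example = np.array([1.0, 1.0])

penalty = l2_loss(W_example)                         # (9 + 16) / 2
total = regularized_loss(2.0, W_example, b_example)  # 2.0 + 0.01 * (12.5 + 1.0)
print(penalty, total)
```

A larger `lamb` pushes the optimizer toward smaller weights at the cost of a worse fit to the training data.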