# DEVNET TensorFlow GPU Matrix Lab

Welcome to the DEVNET TensorFlow lab! In this lab our goal is to use TensorFlow to show you how GPUs improve the performance of data science applications. There are several introductions to TensorFlow included in this Jupyter Notebook. In this walkthrough we will explain some of the benefits of GPUs and show how TensorFlow runs on them! Let's start by loading the libraries we will be using for this lab.

In [52]:
# First we load the TensorFlow library.  TensorFlow is built for numerical computation using data flow graphs.
import tensorflow as tf
# Then we load NumPy.  NumPy is a library for array and matrix operations in Python.
import numpy as np
# Finally, import the timer so we can time how long it takes to run operations. 
from timeit import default_timer as timer

## 1. Making Matrix Operations Faster

GPUs were originally designed to render graphics for games and CAD software. To do this quickly and efficiently, the idea was to pack in a large number of processor cores and make them extremely good at matrix operations. Things were humming along nicely until machine learning and cryptocurrency started gaining steam. It turns out that this massively parallel hardware is exactly what is needed to speed up the matrix operations at the heart of machine learning, as well as the proof-of-work hashing used by cryptocurrencies.
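
To make the link between machine learning and matrix operations concrete, here is a minimal NumPy sketch of one fully connected neural network layer. The layer sizes and variable names are hypothetical, chosen only for illustration. It shows that the whole layer reduces to a single matrix multiplication, which is the kind of work a GPU spreads across its many cores.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, n_inputs, n_outputs = 4, 3, 2

rng = np.random.default_rng(seed=0)
inputs = rng.uniform(size=(batch_size, n_inputs))   # one row per example
weights = rng.uniform(size=(n_inputs, n_outputs))   # one column per output unit
bias = np.zeros(n_outputs)

# The whole layer is one matrix multiplication plus a bias.
layer_output = inputs @ weights + bias

# The same result computed one multiply-add at a time.
loop_output = np.zeros((batch_size, n_outputs))
for i in range(batch_size):
    for j in range(n_outputs):
        for k in range(n_inputs):
            loop_output[i, j] += inputs[i, k] * weights[k, j]
        loop_output[i, j] += bias[j]

print(layer_output.shape)                      # (4, 2)
print(np.allclose(layer_output, loop_output))  # True
```

Every entry of the output is independent of the others, so the multiply-adds in the inner loops can all run in parallel.
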
In the code below we will launch TensorFlow and then compare the time it takes to perform some matrix operations on the GPU and on the CPU. Don't get hung up on the syntax. The point of this exercise is to show you the large performance improvement we can get by running TensorFlow on a GPU.

### 1.1 Exercise: Run the TensorFlow code below on a GPU and on a CPU and compare the times

The code below uses TensorFlow and the other Python libraries we loaded earlier. Normally when we start creating a TensorFlow graph we just use the default device. However, we can choose which device the graph runs on by building it inside a ```with tf.device()``` block. In the code below try two different devices: ```/gpu:0``` and ```/cpu:0```. Be sure to put them in quotes!

In [51]:
shape = (15000, 15000)  # The shape of our matrix is 15,000 x 15,000.  That is really big!
with tf.device():   # Fill in a device: try "/cpu:0", then "/gpu:0", and compare the times.
    # create the computational graph.  This doesn't run it, it just gets it ready!
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)
    
start = timer()
# now start running the tensorflow computational graph to get the final answer. 
with tf.Session() as session:
    result = session.run(sum_operation)
duration = timer() - start
print("Duration: {:0.2f} seconds".format(duration))

TypeError: device() missing 1 required positional argument: 'device_name_or_function'

If you got ```TypeError: device() missing 1 required positional argument```, be sure to fill in a device name for ```tf.device()```, for example ```tf.device("/gpu:0")```.

On the CPU the operation takes about 6.5 seconds. The same operation on the GPU takes about 1.7 seconds, which is nearly four times faster. While this example is pretty contrived, the takeaway still holds. When we train deep neural networks we are doing millions of these operations, so the savings add up quickly.
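
As a rough back-of-the-envelope check, the sketch below counts the arithmetic in the single 15,000 x 15,000 matrix multiplication from the exercise and computes the speedup from the two timings quoted above. The variable names are illustrative. The measured durations also include generating the random matrix, the transpose, the sum, and session startup, so treat this as an estimate of scale rather than a benchmark.

```python
n = 15000  # matrix side length used in the exercise

# Each of the n * n output entries needs n multiplications and n - 1 additions.
matmul_flops = n * n * (2 * n - 1)

# Timings quoted in the text; yours will differ with your hardware.
cpu_seconds = 6.5
gpu_seconds = 1.7

speedup = cpu_seconds / gpu_seconds

print("Operations in one matmul: {:.2e}".format(matmul_flops))
print("GPU speedup over CPU: {:.1f}x".format(speedup))
```

A single multiplication of this size already involves trillions of arithmetic operations, which is why the choice of device matters so much.
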