# A Brief Intro to TensorFlow (TF)
For this tutorial, we'll start by talking a little about how TF works and then build a small demo classifier in TF that runs on a CPU. My goal is for everyone to be able to go home and implement their own TF models, or at least be able to follow the source code of more advanced applications. I'll present a demo of logistic regression in TF, since logistic regression is a great foundation for learning about more advanced neural network architectures. You can think of logistic regression as the fundamental building block of deep learning networks.

For a TF tutorial straight from the source, the official wide-model tutorial (https://www.tensorflow.org/tutorials/wide) goes more in depth about how to select and combine features within TensorFlow. I did most of that work in pandas, because it's easier and more straightforward. However, if you're building a production system that will learn, update, and repeat, there are some helpful functions for handling unknown datatypes in that tutorial.

## Tensors and Graphs
At its core, TF is made up of tensors. According to the Wikipedia entry:
> In mathematics, tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps.

Thanks, Wikipedia. For TF neural networks, the "data structures" you mainly need to be concerned about are scalars, vectors, matrices, and stacks of matrices. TF encodes these data structures into tensor objects instead of leaving them as plain scalars, vectors, etc., and it does this because of the graph functionality, which I'll talk about later. If you're familiar with numpy arrays, TF works with those, so that's not a problem. The mathematical functions you mainly need to know are the dot product and linear maps (linear transformations) for manipulating the input and transformed data structures.
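To make those shapes and operations concrete, here is a minimal sketch in plain Python. Nested lists stand in for tensors, and the helper names (`dot`, `linear_map`, `shape`) are my own for illustration, not TF functions.

```python
# Plain-Python stand-ins for the shapes TF wraps in tensor objects.
scalar = 3.0                                   # rank 0
vector = [1.0, 2.0, 3.0]                       # rank 1, shape (3,)
matrix = [[1.0, 0.0, 2.0],
          [0.0, 1.0, 1.0]]                     # rank 2, shape (2, 3)
stack = [matrix, matrix]                       # rank 3, shape (2, 2, 3)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_map(m, v):
    """Apply matrix m to vector v: one dot product per row of m."""
    return [dot(row, v) for row in m]


def shape(t):
    """Shape of a nested list, assuming it is rectangular."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


print(dot(vector, vector))          # 1 + 4 + 9 = 14.0
print(linear_map(matrix, vector))   # [7.0, 5.0]
print(shape(stack))                 # (2, 2, 3)
```

The linear map is just a dot product repeated once per row, which is why those two operations cover most of what a simple network does to its inputs.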

## Graphs?
Yea, buddy. TF is set up so that the *tensors* **flow** through *graphs*. Get it? So, going back to the tensors I just introduced, you would input your data into a TF tensor object like so, `A = tf.constant(1234)`, and then to display that constant, you would have to run it through a TF *session*, like so:
```
import tensorflow as tf  # TF 1.x API; tf.Session is not in the top-level TF 2.x namespace

A = tf.constant(1234)
with tf.Session() as sess:
    output = sess.run(A)
    print(output)
```

Why do we need the session? From the documentation:
> A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.

Basically, TF has you describe everything up front: the data that you have, how you will input that data (usually in a pipeline), and the operations to perform on it. All of this is defined before you have even imported the data, and the session is what finally executes it. This gives flexibility in how you import data, whether in batches (as is typical with data >1 GB, e.g. images, tons of pdfs, etc.) or all at once, and in where operations take place, whether on a CPU or a GPU. This is the kind of flexibility a system like AlphaGo relies on: its policy and value networks are deep learning models (on GPU), while its tree search runs on CPU. The following image gives a better representation of the flexibility allowed by using a session.
![Slide from Large-Scale Deep Learning with TensorFlow](https://image.slidesharecdn.com/k2jeffdean-160609173832/95/large-scale-deep-learning-with-tensorflow-31-638.jpg?cb=1465493958)
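If the "define everything first, run it later" idea feels abstract, here is a toy sketch of it in plain Python. This is not how TF is implemented; `Node`, `constant`, `placeholder`, `add`, `mul`, and `run` are hypothetical names I made up to mimic the pattern of building a graph and then evaluating it with fed-in data.

```python
# A toy "define the graph first, run it later" sketch in plain Python.
# Node and run are hypothetical names, not TF classes.

class Node:
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op            # 'const', 'placeholder', 'add', or 'mul'
        self.inputs = inputs    # upstream nodes this node depends on
        self.value = value      # only used by 'const'
        self.name = name        # only used by 'placeholder'


def constant(value):
    return Node('const', value=value)


def placeholder(name):
    return Node('placeholder', name=name)


def add(a, b):
    return Node('add', inputs=(a, b))


def mul(a, b):
    return Node('mul', inputs=(a, b))


def run(node, feed_dict=None):
    """Evaluate a node by recursively evaluating what it depends on."""
    feed_dict = feed_dict or {}
    if node.op == 'const':
        return node.value
    if node.op == 'placeholder':
        return feed_dict[node.name]
    left, right = (run(n, feed_dict) for n in node.inputs)
    return left + right if node.op == 'add' else left * right


# Build the graph: nothing is computed yet.
x = placeholder('x')
y = add(mul(constant(2), x), constant(1))   # y = 2 * x + 1

# Run it twice with different data, as a session would.
print(run(y, {'x': 3}))    # 7
print(run(y, {'x': 10}))   # 21
```

Because the graph exists before any data does, the same description can be evaluated on one row, a batch, or the full dataset without being rebuilt.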

The rest of this tutorial will be focused on using a local machine with a reasonable amount of data and performing calculations on the CPU. 

Let's get started by building a logistic regression with some example data. The goal will be to predict which people will default on their next credit card payment. You can download the data from this UCI repository: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Also, here is an example of what a finalized TensorFlow model looks like in TensorBoard. This is where we want to get to.
![Animation of tensors flowing through a TF graph](https://www.tensorflow.org/images/tensors_flowing.gif)

One of the nuances that drives people crazy about TF is how you input data. There are multiple methods for inputting your data, and which one to use depends on where your data is coming from. The TF Dataset API has many more capabilities for importing and using data, but for this first example I'll stick with the basics.

For this tutorial, we'll build a logistic regression classifier from the ground up to determine whether someone from the dataset will default on their next credit card payment. Logistic regression (LogR) classifiers are the 'lego blocks' of neural networks, so an understanding of LogR will help with an understanding of the whole. To perform LogR, we basically need to model:
$$\mathrm{sigmoid}(W \cdot x + b)$$

The sigmoid function is applied to the result of multiplying the inputs by the weights of the system (randomly initialized at first) and adding the bias, like so:

![Image of Logistic Regression](https://i.stack.imgur.com/bA57S.png)
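Here is the same forward pass as a minimal plain-Python sketch for a single example. The weights, bias, and feature values are made-up numbers, not taken from the credit card data, and `predict` is just a name I picked.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """sigmoid(W*x + b) for a single example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Made-up numbers, not taken from the credit card data.
weights = [0.5, -0.25]
bias = 0.1
features = [2.0, 4.0]

print(predict(weights, bias, features))   # z = 1.0 - 1.0 + 0.1 = 0.1
print(sigmoid(0.0))                       # 0.5
```

The output can be read as a probability, which is what makes the sigmoid a natural fit for a yes/no target like "will default".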

The 'intelligent' part of machine learning comes when we define the 'cost function' for our model. Basically, for each data point (in this case, each customer of the credit card company), we will use the cost function to find out how different the calculated output is from the actual output. Gradient descent is where the learning happens: it is how we change the weights (W) and bias (b) in `W*x+b` to make the cost smaller. Once the training by gradient descent is done, we'll test the model on some hold-out data. We do this to avoid creating a model that's only really good at detecting what it has already seen.
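As a rough sketch of what the cost function and gradient descent are doing, here is a plain-Python version on a toy dataset. I'm assuming the binary cross-entropy cost, which is the usual choice for logistic regression; the data, learning rate, and function names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def cost(weights, bias, xs, ys):
    """Mean binary cross-entropy between predictions and true labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(weights, bias, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)


def gradient_step(weights, bias, xs, ys, learning_rate):
    """One gradient descent update; returns the new (weights, bias)."""
    n = len(xs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        error = predict(weights, bias, x) - y      # d(cost)/d(W*x + b)
        for j, xj in enumerate(x):
            grad_w[j] += error * xj / n
        grad_b += error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - learning_rate * grad_b


# Toy data (made up): label is 1 when the single feature is positive.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0, 0, 1, 1]

weights, bias = [0.0], 0.0
before = cost(weights, bias, xs, ys)
for _ in range(100):
    weights, bias = gradient_step(weights, bias, xs, ys, learning_rate=0.5)
after = cost(weights, bias, xs, ys)
print(before, after)   # the cost should shrink as the weights are updated
```

TF works out the gradients for you, so in the TF version you only write the cost and hand it to an optimizer.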

As a recap, we are doing the following in TF:
1. Defining a model to use (Logistic Regression)
2. Training it (changing weights by gradient descent from output of cost function)
3. Testing it against hold-out data
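Step 3 can be sketched without any TF at all. The snippet below is a minimal plain-Python illustration of holding out data and scoring predictions; `train_test_split`, `accuracy`, the 20% split, and the 0.5 threshold are my own assumptions for the example, not something the dataset prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split off a hold-out set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)


rows = list(range(10))                 # stand-in for 10 customers
train, test = train_test_split(rows)
print(len(train), len(test))           # 8 2

# Pretend model outputs for four hold-out customers (made-up numbers).
print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))   # 0.75
```

The model only ever trains on `train`; the score on `test` is the honest estimate of how it does on customers it has not seen.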

As with all ML, we need to start with preprocessing. TF is great for advanced deep learning but is not the tool to assess and clean data. Instead, I'll use pandas to assess, clean, and set up the data before handing it to a TF model.

In [47]:
# Necessary (for the most part) libraries
import tensorflow as tf
import pandas as pd
import numpy as np

# use whatever file name you saved the csv as in the pandas import statement below
# (pd.DataFrame.from_csv was removed from newer pandas; read_csv with index_col=0 matches it)
defaults_df = pd.read_csv('default_of_credit_card_clients.csv', index_col=0)

# A little bit of pre-processing: the first data row holds the real column names
new_header = defaults_df.iloc[0]
defaults_df = defaults_df[1:]

# Renaming columns and adding target variable
new_header = list(new_header)
new_header[-1] = 'Y'
defaults_df.columns = new_header

# Changing to numeric - data was imported as string objects
defaults_df = defaults_df.apply(pd.to_numeric)

# Removing invalid education values
defaults_df = defaults_df[defaults_df.EDUCATION != 0]

# One-hot encoding
clf_df = pd.get_dummies(defaults_df, columns=["SEX", "EDUCATION", "MARRIAGE", "PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"],
               prefix=["SEX", "EDU", "MARRY", "PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"])

# Setting X (input) and Y(target) variables
X = clf_df.drop('Y', axis=1)
Y = clf_df.Y
Y = Y.values.reshape(-1, 1)  # column vector; -1 lets numpy infer the row count

In [42]:
len(list(X))

90

Preprocessing can be done in pandas, but if you're dealing with something like images, you'd want to write some functions to **normalize** and, potentially, **flatten** your pixel values. Flattening mainly comes up in image processing with CNNs. Potentially later, we'll look at an example of preprocessing images and using a CNN (very, very rough overview).
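For a feel of what those two functions might look like, here is a minimal plain-Python sketch on a made-up 2x3 grayscale image. I'm assuming 8-bit pixels (0 to 255); real image pipelines would use numpy or TF ops instead of nested lists.

```python
def normalize(image, max_value=255.0):
    """Scale integer pixel values from [0, max_value] into [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


def flatten(image):
    """Turn a 2-D grid of pixels into one long row of features."""
    return [pixel for row in image for pixel in row]


# A made-up 2x3 grayscale "image" with 8-bit pixel values.
image = [[0, 51, 102],
         [153, 204, 255]]

features = flatten(normalize(image))
print(len(features))                # 6
print(features[0], features[-1])    # 0.0 1.0
```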

For this model, we now need to hand the data from a pandas dataframe over to TensorFlow. Since this is a small dataset that can fit in memory, we'll just feed all of it in at once through placeholders instead of building an input pipeline.

Let's get this thing set up and visualized in TensorBoard!

In [4]:
# Clear Tensorboard graph for troubleshooting
tf.reset_default_graph()

num_features = len(list(X))
num_outputs = 1

# A placeholder is filled in at run time; None lets us feed any number of rows
x_input = tf.placeholder(tf.float32, [None, num_features], name = 'inputs')

# Now create the variables to be trained
W = tf.Variable(tf.truncated_normal([num_features, num_outputs],         # One output and num_features inputs
                                    stddev=0.1), name = 'W')   # Truncated normal is a trick that can
                                                               # speed up gradient descent
b = tf.Variable(tf.zeros([num_outputs]), name = 'bias') # b can be initialized with a zero, but could also use truncated normal

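# Linear part W*x + b. These pre-sigmoid values are what TF calls logits.
linear = tf.add(tf.matmul(x_input, W), b, name='linear')

# Squash with sigmoid(z) = 1 / (1 + exp(-z)) to get a probability of default
y_output = tf.sigmoid(linear, name='sigmoid')

# Create the cost function using the real values from Y
y_labels = tf.placeholder(tf.float32, [None, num_outputs], name = 'actual_output')

# Calculate the cost using binary cross-entropy:
#   -(y * log(p) + (1 - y) * log(1 - p)), summed over all rows.
# With a single sigmoid output both terms are needed, and this TF op
# computes them from the logits in a numerically stable way.
# (Squared error would also work: tf.reduce_sum(tf.pow(y_labels - y_output, 2)))
cost = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_labels, logits=linear), name='cost')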

# Train the model, where 0.0001 is the learning rate
learning_rate = 0.0001
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

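# Calculate accuracy: round the probability to 0 or 1 and compare with the label
# (argmax over axis 1 would always return 0 here, since there is only one output column)
correct_prediction = tf.equal(tf.round(y_output), y_labels)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))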

# Initialize session - Just want to see the graph structure
with tf.Session() as sess:
    # Initialize the variables - since our graph will be small, can initialize all at once
    sess.run(tf.global_variables_initializer())
    # Write the graph to a local directory - will do this with training later
    file_writer = tf.summary.FileWriter('./logs/1', sess.graph)
    file_writer.close()  # flush the event file to disk
    # Go to command line and write: $ tensorboard --logdir logs/1


We can see the fruits of our labor so far in the TensorBoard representation. We can also see some of the operations that happen 'under the hood' for the functions we used. You can dig in here if you're curious about what is happening in these areas of the graph. Notice how the arrows are directed. The arrows in TensorBoard show dependencies, not how data moves around. This is most evident when we look at how the weights and bias are updated from gradient descent, with the arrows for both of those parameters pointing to the operation.

Now, let's make this look a little nicer so that we have larger chunks that can be isolated. After all, we only have our inputs, weights, biases, a matrix multiplication, sigmoid transformation, loss function, accuracy output, and gradients to compute. So 8 things. Let's compress these so they look like the image of a logistic regression from earlier. This means we only have our inputs and weights/bias layer, a sigmoid, output layer, and updating layer. So 4 layers. Awfully close to the 3 things I said we would learn. :^)

In [5]:
# Define Hyperparameters - just learning rate for now, but this is good practice
learning_rate = 0.0001

In [10]:
# Clear Tensorboard graph for troubleshooting
tf.reset_default_graph()

# Number of features and number of outputs
num_features = len(list(X))
num_outputs = 1

x = tf.placeholder(tf.float32, [None, num_features], name = 'inputs')
with tf.name_scope('Linear_Output'):
    W = tf.Variable(tf.truncated_normal([num_features, num_outputs],
                                        stddev=0.1), name = 'W')
    b = tf.Variable(tf.zeros([num_outputs]), name = 'bias')
    linear_output = tf.add(tf.matmul(x, W, name='mat_mul'), b, name='linear_output')

    
with tf.name_scope('Sigmoid'):
    # linear_output holds the logits; the sigmoid turns them into probabilities
    probabilities = tf.sigmoid(linear_output, name='sigmoid')


y_labels = tf.placeholder(tf.float32, [None, num_outputs], name = 'actual_output')
with tf.name_scope('Cost'):
    # Binary cross-entropy, computed from the pre-sigmoid values for numerical stability
    cost = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y_labels, logits=linear_output))


with tf.name_scope('Train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


with tf.name_scope('Accuracy'):
    # Round the sigmoid output to 0 or 1; argmax over a single column is always 0
    correct_prediction = tf.equal(tf.round(probabilities), y_labels, name='correct_prediction')
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')


# Initialize session to train the model - Just want to see the graph structure
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    file_writer = tf.summary.FileWriter('./logs/3', sess.graph)
    # Go to command line and write: $ tensorboard --logdir logs/3

## Using Tensorboard to Monitor Your Model's Progress
Let's use this to find out how we can monitor our model's progress. We have all of the graph pieces nicely in place, so now we just need to use TensorBoard's monitoring tools to understand how our model trains and updates weights. I'll introduce some more machine learning jargon, specifically the concept of an epoch. An epoch is one full forward and backward pass over the complete dataset. In our case, that means all of the rows in the imported csv file. Datasets are usually much bigger than this one and have to be fed in batches, which comes with a host of its own considerations. However, that's for later.
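To pin down the epoch/batch vocabulary, here is a small plain-Python sketch. The 10 rows, batch size of 4, and the `iterate_batches` helper are made up for illustration; the training step itself is left as a comment.

```python
def iterate_batches(rows, batch_size):
    """Yield consecutive slices of rows; the last one may be smaller."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(10))      # stand-in for 10 rows of a csv file
epochs = 2
batch_size = 4

updates = 0
for epoch in range(epochs):
    for batch in iterate_batches(rows, batch_size):
        # A real training loop would run one optimizer step on `batch` here.
        updates += 1

print([len(b) for b in iterate_batches(rows, batch_size)])   # [4, 4, 2]
print(updates)                                               # 6
```

In this tutorial the whole dataset is fed in one go, so each epoch is exactly one update.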

Here, we want to track how the model is updating the weights and bias as it trains. This is helpful in understanding where we can introduce some efficiencies to speed up training. I'll compare two different ways to initialize weights in this case.

To do this comparison, we need to use another special kind of TensorFlow object, a Summary. The ops in tf.summary output a protocol buffer that contains the summarized data. In this case, we just want to see how the weights/bias are changing, and how the accuracy is (hopefully) increasing. We'll use tf.summary.scalar and tf.summary.histogram summaries to inspect the accuracy and weights/bias, respectively.
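As background for the initialization comparison, here is a plain-Python sketch of what a truncated normal initializer does: it draws from a normal distribution and re-draws anything more than two standard deviations from the mean. Pairing it with all-zeros as the second initializer is my assumption for illustration, and both function names are made up.

```python
import random


def truncated_normal(n, stddev=0.1, seed=0):
    """Draw n values from N(0, stddev), re-drawing any beyond 2 stddev."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        sample = rng.gauss(0.0, stddev)
        if abs(sample) <= 2 * stddev:
            values.append(sample)
    return values


def zeros(n):
    """The simplest alternative: every weight starts at exactly zero."""
    return [0.0] * n


weights = truncated_normal(90)      # 90 matches the feature count shown earlier
print(len(weights))                             # 90
print(max(abs(w) for w in weights) <= 0.2)      # True
```

A histogram summary of `W` at epoch 0 is essentially a picture of this starting distribution, which is what makes the two initializations easy to compare in TensorBoard.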

In [11]:
# Hyperparameters
learning_rate = 0.0001
epochs = 1000

In [55]:
# Clear Tensorboard graph for troubleshooting
tf.reset_default_graph()

# Number of features and number of outputs
num_features = len(list(X))
num_outputs = 1

x = tf.placeholder(tf.float32, [None, num_features], name = 'inputs')

with tf.name_scope('Linear_Output'):
    W = tf.Variable(tf.truncated_normal([num_features, num_outputs],
                                        stddev=0.1), name = 'W')
    b = tf.Variable(tf.zeros([num_outputs]), name = 'bias')
    linear_output = tf.add(tf.matmul(x, W, name='mat_mul'), b, name='linear_output')
    tf.summary.histogram('Weights', W)
    # tf.summary.scalar needs a scalar, and b has shape [1], so squeeze it
    tf.summary.scalar('Bias', tf.squeeze(b))


with tf.name_scope('Sigmoid'):
    # linear_output holds the logits; the sigmoid turns them into probabilities
    probabilities = tf.sigmoid(linear_output, name='sigmoid')


y_labels = tf.placeholder(tf.float32, [None, num_outputs], name = 'y_labels')
with tf.name_scope('Cost'):
    # Binary cross-entropy, computed from the pre-sigmoid values for numerical stability
    cost = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y_labels, logits=linear_output))
    tf.summary.scalar('Cost', cost)


with tf.name_scope('Train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


with tf.name_scope('Accuracy'):
    # Round the sigmoid output to 0 or 1; argmax over a single column is always 0
    correct_prediction = tf.equal(tf.round(probabilities), y_labels, name='correct_prediction')
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')
    tf.summary.scalar('Accuracy', accuracy)

# Merge all of the summary statistics for convenience
# This helps with MUCH larger models
merged = tf.summary.merge_all()

# Finally, time to train the model and log its progress
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Passing sess.graph here also writes the graph itself
    file_writer = tf.summary.FileWriter('./logs/4', sess.graph)
    # Go to command line and write: $ tensorboard --logdir logs/4
    feed = {x: X.values.astype(np.float32), y_labels: Y}
    for e in range(epochs):
        # One training step plus the merged summaries, tagged with the epoch number
        s, _ = sess.run([merged, optimizer], feed_dict=feed)
        file_writer.add_summary(s, e)
    file_writer.close()
        # Add an epoch counter


In [54]:
Y.shape

(29986, 1)

## Creating a Model with Regularization
To do.