# Deep Nets with TF Abstractions

Let's explore a few of the abstractions that TensorFlow offers. Check out the tf.contrib documentation for more options.

# The Data

In [1]:
import pandas as pd
import numpy as np
import keras

np.random.seed(2)

Using TensorFlow backend.


In [2]:
data = pd.read_csv('Numeric_case3data.csv')

In [3]:
data = data.drop(['studentID'],axis=1)
data.head()

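|   | grade | year | dropped | zip | ethnicity | sex | gpa | subsidizedLunches | employmentHours | hrsWifiPerWeek | sanctions | librarySwipesPerWeek | apClasses | athleticSeasons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 2012 | 0 | 15232 | 5 | 2 | 1.89 | 0 | 0 | 4 | 1 | 1 | 1 | 0 |
| 1 | 12 | 2012 | 0 | 15206 | 5 | 2 | 2.21 | 1 | 0 | 10 | 2 | 3 | 0 | 0 |
| 2 | 12 | 2012 | 0 | 15206 | 5 | 2 | 2.72 | 0 | 0 | 8 | 2 | 6 | 0 | 0 |
| 3 | 12 | 2012 | 1 | 15206 | 5 | 2 | 1.67 | 0 | 0 | 4 | 2 | 6 | 2 | 0 |
| 4 | 12 | 2012 | 0 | 15201 | 5 | 2 | 2.0 | 2 | 0 | 8 | 2 | 5 | 0 | 0 |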


In [4]:
feat_data = data.iloc[:, data.columns != 'dropped']
labels = data.iloc[:, data.columns == 'dropped']

### Train Test Split

As with any machine learning model, you should split your data into training and test sets so you can evaluate the model's performance on data it hasn't seen. Because this particular dataset is small, we'll just do a simple 70/30 train/test split without a separate holdout set.

Again, we'll use Scikit-learn here for convenience:

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(feat_data,
                                                    labels,
                                                    test_size=0.3,
                                                    random_state=101)
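To see what a 70/30 split amounts to, here is a minimal NumPy sketch that shuffles the row indices and slices them. The helper `split_indices` is hypothetical, and it will not reproduce Scikit-learn's exact rows for `random_state=101` because it uses a different random generator; it only mirrors the sizing rule, where Scikit-learn rounds the test share up.

```python
import math
import numpy as np

def split_indices(n_samples, test_size=0.3, seed=101):
    """Hypothetical helper: shuffle row indices and split them into train/test."""
    n_test = math.ceil(test_size * n_samples)  # test share rounded up
    n_train = n_samples - n_test
    rng = np.random.default_rng(seed)           # not the RNG Scikit-learn uses
    perm = rng.permutation(n_samples)
    return perm[:n_train], perm[n_train:]

# 12,519 training rows + 5,366 test rows: the sizes that show up in this notebook
train_idx, test_idx = split_indices(12519 + 5366)
print(len(train_idx), len(test_idx))  # 12519 5366
```

Every row lands in exactly one of the two sets, which is what makes the test score an estimate of performance on unseen data.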

In [6]:
X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

### Scale the Data

With neural network models, it's important to scale the data. Again, we can do this easily with Scikit-learn (I promise we'll get to TensorFlow soon!)

In [7]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

Keep in mind that we fit the scaler only on the training data, because we don't want to assume any knowledge of future test data.

In [8]:
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)
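A minimal NumPy sketch of what `fit_transform` and `transform` do here, on made-up numbers: the per-column minimum and maximum are learned from the training data only and then reused for the test data, so test values can fall outside [0, 1].

```python
import numpy as np

# Hypothetical toy data: a single feature column
x_train = np.array([[2.0], [4.0], [6.0]])
x_test = np.array([[1.0], [5.0], [8.0]])

# "Fit": learn the per-column min and max from the training data only
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)

def transform(x):
    # Same formula MinMaxScaler applies with its default feature_range=(0, 1)
    return (x - col_min) / (col_max - col_min)

print(transform(x_train).ravel().tolist())  # [0.0, 0.5, 1.0]
print(transform(x_test).ravel().tolist())   # [-0.25, 0.75, 1.5]
```

Values outside [0, 1] in the test set are expected and harmless; refitting on the test data to "fix" them would leak test information into preprocessing.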

# Abstractions

With our data set up, it's now time to explore some TensorFlow abstractions! Let's start with the Estimator API, one of the abstractions featured in the official documentation tutorials.

## Estimator API

We start by importing both TensorFlow and the Estimator API.

In [9]:
import tensorflow as tf
from tensorflow import estimator 

The Estimator API provides pre-made ("canned") estimators for both deep neural network classification and regression (DNNClassifier, DNNRegressor), as well as plain linear classification and regression (LinearClassifier, LinearRegressor).

In [10]:
estimator.DNNClassifier
estimator.DNNRegressor
#estimator.

tensorflow_estimator.python.estimator.canned.dnn.DNNRegressor

In [11]:
X_train.shape

(12519, 13)

In [12]:
feat_cols = [tf.feature_column.numeric_column("x", shape=[13])]

In [13]:
deep_model = estimator.DNNClassifier(hidden_units=[13,13,13],
                            feature_columns=feat_cols,
                            n_classes=2,
                            optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01) )

W0222 12:11:35.150440 12504 estimator.py:1811] Using temporary folder as model directory: C:\Users\caiyi\AppData\Local\Temp\tmpg9lanbk3


In [14]:
input_fn = estimator.inputs.numpy_input_fn(x={'x':scaled_x_train},y=y_train,shuffle=True,batch_size=10,num_epochs=5)

In [15]:
deep_model.train(input_fn=input_fn,steps=500)


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x1be179f46d8>
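It's worth checking how `steps` interacts with the input function: each training step consumes one batch, so `steps=500` with `batch_size=10` shows the model only 5,000 examples. Against the 12,519 training rows reported by `X_train.shape`, that is well under one pass over the data, so training stops long before `num_epochs=5` is reached. A quick back-of-the-envelope check (counting a final partial batch as one step):

```python
n_train = 12519   # rows in X_train
batch_size = 10   # from numpy_input_fn above
num_epochs = 5
steps = 500       # from deep_model.train above

examples_seen = steps * batch_size
epochs_covered = examples_seen / n_train
# Steps needed to finish all num_epochs, rounding up for a final partial batch
steps_for_all_epochs = -(-n_train * num_epochs // batch_size)

print(examples_seen)             # 5000
print(round(epochs_covered, 2))  # 0.4
print(steps_for_all_epochs)      # 6260
```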

In [16]:
input_fn_eval = estimator.inputs.numpy_input_fn(x={'x':scaled_x_test},shuffle=False)

In [17]:
preds = list(deep_model.predict(input_fn=input_fn_eval))



In [18]:
predictions = [p['class_ids'][0] for p in preds]

In [19]:
from sklearn.metrics import confusion_matrix,classification_report

In [20]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.94      1.00      0.97      5037
           1       0.00      0.00      0.00       329

    accuracy                           0.94      5366
   macro avg       0.47      0.50      0.48      5366
weighted avg       0.88      0.94      0.91      5366
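The report shows the classifier never predicting class 1 (precision and recall of 0.00). Because the test set is imbalanced, a trivial model that always predicts 0 already reaches the reported 0.94 accuracy, as a quick calculation from the `support` column confirms:

```python
support = {0: 5037, 1: 329}  # from the 'support' column of the report
n_test = sum(support.values())

# Accuracy of a trivial model that always predicts the majority class 0
majority_accuracy = support[0] / n_test
print(n_test)                       # 5366
print(round(majority_accuracy, 2))  # 0.94
```

So for this dataset, accuracy alone says little; the per-class precision and recall for class 1 are the numbers to watch.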





____________
______________

# TensorFlow Keras

### Create the Model

In [21]:
from tensorflow.contrib.keras import models

In [22]:
dnn_keras_model = models.Sequential()

### Add Layers to the model

In [23]:
from tensorflow.contrib.keras import layers

In [24]:
dnn_keras_model.add(layers.Dense(units=13,input_dim=13,activation='relu'))

In [25]:
dnn_keras_model.add(layers.Dense(units=13,activation='relu'))
dnn_keras_model.add(layers.Dense(units=13,activation='relu'))

In [26]:
dnn_keras_model.add(layers.Dense(units=2,activation='softmax'))

### Compile the Model

In [27]:
from tensorflow.contrib.keras import losses,optimizers,metrics

In [28]:
# explore these
# losses.

In [29]:
#optimizers.

In [30]:
losses.sparse_categorical_crossentropy

<function tensorflow.python.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)>

In [31]:
dnn_keras_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
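`sparse_categorical_crossentropy` takes integer class labels (here the 0/1 `dropped` values) rather than one-hot vectors, which is why `y_train` can be passed to `fit` as-is. The NumPy sketch below, on made-up probabilities, shows that it equals ordinary categorical cross-entropy computed on the one-hot encoding of the same labels.

```python
import numpy as np

# Hypothetical softmax outputs for 3 examples over 2 classes
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])  # integer class ids

# Sparse form: pick out the probability assigned to the true class in each row
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])

# Dense form: one-hot labels multiplied against the log-probabilities
one_hot = np.eye(2)[labels]
dense_ce = -(one_hot * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_ce, dense_ce))  # True
```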

### Train Model

In [32]:
dnn_keras_model.fit(scaled_x_train,y_train,epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x1be1c269080>

In [33]:
predictions = dnn_keras_model.predict_classes(scaled_x_test)
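For a softmax output layer, `predict_classes` amounts to taking the index of the largest predicted probability in each row. `predict_classes` was deprecated and later removed from `tf.keras`, with `np.argmax(model.predict(x), axis=-1)` as the suggested replacement. A sketch on made-up probabilities standing in for `model.predict` output:

```python
import numpy as np

# Hypothetical output of model.predict for 4 test rows (softmax over 2 classes)
probs = np.array([[0.97, 0.03],
                  [0.40, 0.60],
                  [0.55, 0.45],
                  [0.10, 0.90]])

# Equivalent of predict_classes for a softmax output layer
classes = np.argmax(probs, axis=-1)
print(classes.tolist())  # [0, 1, 0, 1]
```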

In [34]:
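# Note: the arguments are in (y_pred, y_true) order here, while classification_report
# expects (y_true, y_pred), so precision and recall are interchanged in the report below.
print(classification_report(predictions,y_test))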

              precision    recall  f1-score   support

           0       0.98      0.98      0.98      5061
           1       0.64      0.69      0.66       305

    accuracy                           0.96      5366
   macro avg       0.81      0.83      0.82      5366
weighted avg       0.96      0.96      0.96      5366
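Argument order matters with `classification_report`: it expects `(y_true, y_pred)`. Swapping the two interchanges the precision and recall values and makes `support` count predicted labels instead of true ones. The pure-Python sketch below uses a hypothetical helper `precision_recall` and made-up labels to show the effect.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Hypothetical helper: precision and recall for one class, from scratch."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    actual_pos = sum(t == positive for t in y_true)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall

y_true = [0, 0, 1, 1, 1, 0]  # hypothetical labels
y_pred = [0, 1, 1, 0, 1, 1]  # hypothetical predictions

p, r = precision_recall(y_true, y_pred)                  # correct order
p_swapped, r_swapped = precision_recall(y_pred, y_true)  # swapped order
print(p, r)                  # 0.5 0.6666666666666666
print(p_swapped, r_swapped)  # 0.6666666666666666 0.5
```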



# Layers API

https://www.tensorflow.org/tutorials/layers

## Formatting Data

In [35]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [36]:

data = pd.read_csv('Numeric_case3data.csv')
# This time also drop the zip code column
data = data.drop(['studentID','zip'],axis=1)
data.head()

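|   | grade | year | dropped | ethnicity | sex | gpa | subsidizedLunches | employmentHours | hrsWifiPerWeek | sanctions | librarySwipesPerWeek | apClasses | athleticSeasons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 2012 | 0 | 5 | 2 | 1.89 | 0 | 0 | 4 | 1 | 1 | 1 | 0 |
| 1 | 12 | 2012 | 0 | 5 | 2 | 2.21 | 1 | 0 | 10 | 2 | 3 | 0 | 0 |
| 2 | 12 | 2012 | 0 | 5 | 2 | 2.72 | 0 | 0 | 8 | 2 | 6 | 0 | 0 |
| 3 | 12 | 2012 | 1 | 5 | 2 | 1.67 | 0 | 0 | 4 | 2 | 6 | 2 | 0 |
| 4 | 12 | 2012 | 0 | 5 | 2 | 2.0 | 2 | 0 | 8 | 2 | 5 | 0 | 0 |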


In [37]:
feat_data = data.iloc[:, data.columns != 'dropped']
labels = data.iloc[:, data.columns == 'dropped']

In [38]:
X_train, X_test, y_train, y_test = train_test_split(feat_data,
                                                    labels,
                                                    test_size=0.3,
                                                    random_state=101)

In [39]:
scaler = MinMaxScaler()
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)
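# ONE HOT ENCODED: pass the 'dropped' Series rather than the DataFrame, because
# get_dummies leaves numeric DataFrame columns unchanged. (as_matrix was removed
# from pandas; to_numpy is its replacement.)
onehot_y_train = pd.get_dummies(y_train['dropped'], dtype=float).to_numpy()
onehot_y_test = pd.get_dummies(y_test['dropped'], dtype=float).to_numpy()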
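Called on a DataFrame, `pd.get_dummies` only expands object, string, or categorical columns and passes numeric columns through unchanged, so an integer label column has to be passed as a Series (or named in `columns=`) to be one-hot encoded. The same encoding can be sketched directly in NumPy by indexing rows of an identity matrix with the integer labels:

```python
import numpy as np

labels = np.array([0, 1, 0, 0, 1])  # hypothetical 0/1 'dropped' values

# Row i of the 2x2 identity matrix is the one-hot vector for class i
one_hot = np.eye(2, dtype=np.float32)[labels]
print(one_hot.tolist())
# [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```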

  """
  


### Parameters

In [40]:
num_feat = 12
num_hidden1 = 13
num_hidden2 = 13
num_outputs = 1
learning_rate = 0.01

In [41]:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

### Placeholder

In [42]:
X = tf.placeholder(tf.float32,shape=[None,num_feat])
y_true = tf.placeholder(tf.float32,shape=[None,1])

### Activation Function

In [43]:
actf = tf.nn.relu

### Create Layers

In [44]:
hidden1 = fully_connected(X,num_hidden1,activation_fn=actf)

In [45]:
hidden2 = fully_connected(hidden1,num_hidden2,activation_fn=actf)

In [46]:
# activation_fn defaults to ReLU, so pass None to get raw (unbounded) logits
output = fully_connected(hidden2,num_outputs,activation_fn=None)
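Each `fully_connected` call creates a weight matrix and a bias vector and computes `activation_fn(inputs @ weights + biases)`; note that `activation_fn` defaults to ReLU in `tf.contrib.layers.fully_connected`, so an output layer meant to emit raw logits needs `activation_fn=None`. Here is a NumPy sketch of the forward pass for the 12 → 13 → 13 → 1 shapes used here. The helper `dense` is hypothetical, and random, untrained weights stand in for TensorFlow's initializer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, n_out, activation=None):
    """Hypothetical stand-in for one fully connected layer: activation(x @ W + b)."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_out))  # random, untrained weights
    b = np.zeros(n_out)
    z = x @ w + b
    return z if activation is None else activation(z)

x = rng.random((4, 12))      # 4 rows of num_feat = 12 scaled features
h1 = dense(x, 13, relu)      # like hidden1
h2 = dense(h1, 13, relu)     # like hidden2
logits = dense(h2, 1)        # output layer with no activation: raw logits
print(h1.shape, h2.shape, logits.shape)  # (4, 13) (4, 13) (4, 1)
```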

### Loss Function

In [47]:
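# Caution: softmax cross-entropy expects one logit per class and one-hot labels;
# num_outputs = 1 and a y_true of shape [None, 1] do not meet that requirement.
loss = tf.losses.softmax_cross_entropy(onehot_labels=y_true, logits=output)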
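Softmax normalizes across the class axis, so with a single output column the softmax is 1 for every row whatever the logit, and the cross-entropy is always 0 regardless of the label. The NumPy sketch below, on made-up logits, contrasts this with sigmoid cross-entropy, the usual loss for a single-logit binary classifier (TF 1.x exposes it as `tf.losses.sigmoid_cross_entropy`).

```python
import numpy as np

logits = np.array([[-2.0], [0.5], [3.0]])  # hypothetical single-column logits
labels = np.array([[0.0], [1.0], [1.0]])

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Softmax over one column is always 1, so -y * log(softmax) is always 0
softmax_ce = -(labels * np.log(softmax(logits))).sum(axis=1)
print(np.allclose(softmax_ce, 0.0))  # True

# Sigmoid cross-entropy in its numerically stable form:
# max(z, 0) - z * y + log(1 + exp(-|z|))
sigmoid_ce = (np.maximum(logits, 0) - logits * labels
              + np.log1p(np.exp(-np.abs(logits)))).ravel()
print(sigmoid_ce.round(3).tolist())  # non-zero, and depends on the labels
```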

### Optimizer

In [48]:
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)
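`AdamOptimizer` keeps running averages of each gradient and of its square, and scales every update by their ratio. Below is a NumPy sketch of a single Adam step in the textbook form from Kingma and Ba, using this notebook's `learning_rate = 0.01` and TF 1.x's default `beta1=0.9`, `beta2=0.999`, `epsilon=1e-8`. TensorFlow's implementation places epsilon slightly differently, so its numbers can differ in the last digits. The helper `adam_step` is hypothetical.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: one Adam update (textbook form); t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
grad = np.array([0.5, -40.0])  # gradients of very different scale
m = np.zeros_like(theta)
v = np.zeros_like(theta)

theta, m, v = adam_step(theta, grad, m, v, t=1)
# On the first step m_hat = grad and v_hat = grad**2, so each parameter moves
# by about lr against its gradient's sign, regardless of the gradient's size
print(np.round(theta, 4).tolist())  # [0.99, -1.99]
```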

### Init

In [49]:
init = tf.global_variables_initializer()

In [50]:
training_steps = 1000
with tf.Session() as sess:
    sess.run(init)

    # Full-batch training: every step feeds the whole training set
    for i in range(training_steps):
        sess.run(train,feed_dict={X:scaled_x_train,y_true:y_train})

    # Get predictions: raw output values for the test set
    logits = output.eval(feed_dict={X:scaled_x_test})

# Predicted class = index of the largest output in each row
results = np.argmax(logits,axis=1)

In [51]:
from sklearn.metrics import confusion_matrix,classification_report
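# classification_report expects (y_true, y_pred). With the predictions passed first,
# precision and recall are interchanged and 'support' counts predicted labels.
print(classification_report(results,y_test))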

              precision    recall  f1-score   support

           0       1.00      0.94      0.97      5366
           1       0.00      0.00      0.00         0

    accuracy                           0.94      5366
   macro avg       0.50      0.47      0.48      5366
weighted avg       1.00      0.94      0.97      5366
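Because the predictions are passed first, `support` here counts predicted labels: all 5,366 test rows were predicted as class 0. That follows from the shapes alone. With `num_outputs = 1` the logits have a single column, and argmax over an axis of length 1 can only return 0. For a single-logit binary classifier, thresholding the logit at 0 (equivalently, the sigmoid probability at 0.5) is the usual way to get class predictions, as this sketch on made-up logits shows:

```python
import numpy as np

logits = np.array([[-1.3], [0.2], [2.4], [-0.1]])  # hypothetical (n, 1) output

# argmax over a length-1 axis can only ever return index 0
print(np.argmax(logits, axis=1).tolist())  # [0, 0, 0, 0]

# Thresholding the logit at 0 is the same as thresholding sigmoid(logit) at 0.5
preds = (logits[:, 0] > 0).astype(int)
probs = 1.0 / (1.0 + np.exp(-logits[:, 0]))
print(preds.tolist())                      # [0, 1, 1, 0]
print((probs > 0.5).astype(int).tolist())  # [0, 1, 1, 0]
```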



