# Homework 5 Part I: Spam Classification in SciKit-Learn

This assignment uses data from https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

Data processing was inspired by https://www.kaggle.com/overflow012/d/uciml/sms-spam-collection-dataset/text-preprocessing-classification

Before getting started, run this to upgrade SciKit-Learn to 0.19.1 (`-U` installs the latest release, so pin `scikit-learn==0.19.1` if you need exactly that version). Then go to Kernel | Restart in Jupyter.

In [1]:
! pip install -U scikit-learn

Requirement already up-to-date: scikit-learn in /Users/vikramnagashoka/miniconda3/envs/lane_lines/lib/python3.5/site-packages
You are using pip version 9.0.1, however version 9.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
import pandas as pd

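####
# Helper function:
#  Return the k most frequently appearing keywords in the dataframe.
#  Note: this refits `vec` on data_df['sms'], replacing its vocabulary.
def top_k(data_df, vec, k):
    X = vec.fit_transform(data_df['sms'].values)
    labels = vec.get_feature_names()  # get_feature_names_out() in scikit-learn >= 1.0

    # Total count of each term over all messages, largest first
    return pd.DataFrame(columns=labels, data=X.toarray()).sum().sort_values(ascending=False)[:k]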



sms_df = pd.read_csv('spam.csv', encoding='latin-1')
sms_df.columns = ['class', 'sms', 'a', 'b', 'c']
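`top_k` densifies the whole document-term matrix with `toarray()` before summing. A sketch of the same ranking that sums the sparse matrix column-wise instead, on a small hypothetical corpus (`top_k_sparse` is an illustrative name). It reads term names from `vocabulary_`, which exists across scikit-learn versions, unlike `get_feature_names`, which newer releases replace with `get_feature_names_out`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def top_k_sparse(texts, k):
    """Return the k most frequent non-stop-word terms as (term, count) pairs."""
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(texts)                 # sparse document-term matrix
    counts = np.asarray(X.sum(axis=0)).ravel()   # per-term totals, never densified
    # vocabulary_ maps term -> column index; invert it to index -> term
    terms = np.empty(len(vec.vocabulary_), dtype=object)
    for term, idx in vec.vocabulary_.items():
        terms[idx] = term
    order = np.argsort(-counts, kind='stable')[:k]
    return [(terms[i], int(counts[i])) for i in order]

# Hypothetical toy messages, not taken from the dataset
toy = ["Free prize! Claim your free prize now",
       "Call me later, ok?",
       "Free entry, txt WIN to claim"]
print(top_k_sparse(toy, 3))
```

On the full dataset this avoids allocating a dense 5572 x 8404 array just to take column sums.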

## Step 1.1 Data Wrangling

Clean up sms_df: delete columns 'a', 'b', and 'c', and lowercase the sms text.

In [3]:
## TODO: Data wrangling / cleaning
sms_df = sms_df.drop(['a', 'b', 'c'], axis=1)
# Lowercasing is left to CountVectorizer, which lowercases by default (lowercase=True)
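A minimal pandas sketch of both cleaning steps that Step 1.1 names (dropping the three extra columns and lowercasing the text), on a hypothetical two-row frame with the same column layout as `spam.csv` after the rename:

```python
import pandas as pd

# Hypothetical frame mimicking sms_df right after the column rename
df = pd.DataFrame({'class': ['ham', 'spam'],
                   'sms': ['Ok lar... Joking wif u oni...', 'WINNER!! Claim NOW'],
                   'a': [None, None], 'b': [None, None], 'c': [None, None]})

df = df.drop(columns=['a', 'b', 'c'])   # keep only the label and the text
df['sms'] = df['sms'].str.lower()       # lowercase the message text
print(df)
```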

## Step 1.1 Results

In [4]:
sms_df

|   | class | sms |
|---|-------|-----|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |
| 5 | spam | FreeMsg Hey there darling it's been 3 week's n... |
| 6 | ham | Even my brother is not like to speak with me. ... |
| 7 | ham | As per your request 'Melle Melle (Oru Minnamin... |
| 8 | spam | WINNER!! As a valued network customer you have... |
| 9 | spam | Had your mobile 11 months or more? U R entitle... |


In [5]:
sms_df.groupby('class').describe()

| class | sms count | sms unique | sms top | sms freq |
|-------|-----------|------------|---------|----------|
| ham | 4825 | 4516 | Sorry, I'll call later | 30 |
| spam | 747 | 653 | Please call our customer service representativ... | 4 |


## Step 1.2. Vectorizing the Text

In [6]:
## TODO: Generate feature vectors
from sklearn import feature_extraction
vectorizer = feature_extraction.text.CountVectorizer(decode_error = 'ignore', stop_words = 'english')
X = vectorizer.fit_transform(sms_df['sms'].values)
X

<5572x8404 sparse matrix of type '<class 'numpy.int64'>'
	with 43478 stored elements in Compressed Sparse Row format>

## Let's see the most frequent terms in spam

In [7]:
top_spam = top_k(sms_df[sms_df['class'] == 'spam'], vectorizer, 30)

top_spam

free          224
txt           163
ur            144
mobile        127
text          125
stop          121
claim         113
reply         104
www            98
prize          93
just           78
cash           76
won            76
uk             74
150p           71
send           70
new            69
nokia          67
win            64
urgent         63
tone           60
week           60
50             57
contact        56
service        56
msg            54
com            54
18             51
16             51
guaranteed     50
dtype: int64

## Vs ham...

In [8]:
top_ham = top_k(sms_df[sms_df['class'] == 'ham'], vectorizer, 30)

top_ham

gt       318
lt       316
just     293
ok       287
ll       265
ur       241
know     236
good     233
got      232
like     232
come     227
day      209
time     201
love     199
going    169
home     165
want     164
lor      162
need     158
sorry    157
don      151
da       150
today    139
later    135
dont     132
did      129
send     129
think    128
pls      123
hi       122
dtype: int64
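Several of these "words" (`ll`, `don`, and on the spam side `150p`, `18`) come from the tokenizer rather than the vocabulary of the messages. `CountVectorizer`'s default `token_pattern` is `(?u)\b\w\w+\b`: runs of two or more word characters, taken after lowercasing. A contraction such as `I'll` therefore yields `ll` (the single character `i` is dropped), `don't` yields `don`, and digit runs survive as tokens. The `gt`/`lt` pair likely comes from HTML-escaped `&lt;`/`&gt;` entities in the raw messages. A sketch that reproduces the tokenization rule with `re` (stop-word removal omitted):

```python
import re

# Default CountVectorizer token pattern: two or more consecutive word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase, then extract tokens the way the default CountVectorizer does."""
    return TOKEN_PATTERN.findall(text.lower())

print(tokenize("I'll call u later, don't txt 150p to 87121"))
print(tokenize("&lt;#&gt;"))
```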

## Step 1.2.2 Regularize URLs and Numbers

Import _regularize_ here, and apply *regularize_urls* and *regularize_numbers*
to the sms column.

In [9]:
# TODO: Regularize/tokenize URLs and numbers
import regularize
sms_df['sms'] = regularize.regularize_urls(sms_df['sms'])
sms_df['sms'] = regularize.regularize_numbers(sms_df['sms'])
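The `regularize` module itself is not shown in this notebook. Judging from the `_url_` and `_num_` tokens in the next output, its functions replace URLs and numbers with placeholder tokens. A hypothetical regex-based stand-in, assuming the functions take and return a pandas Series of strings; the patterns and the `*_sketch` names are illustrative guesses, not the module's actual rules:

```python
import pandas as pd

# Hypothetical stand-ins for regularize.regularize_urls / regularize_numbers.
# The real module's patterns are unknown; these are illustrative only.
URL_RE = r'(?:https?://|www\.)\S+'
NUM_RE = r'\d+'

def regularize_urls_sketch(series):
    """Replace URL-like substrings with the placeholder token _url_."""
    return series.str.replace(URL_RE, ' _url_ ', regex=True)

def regularize_numbers_sketch(series):
    """Replace every run of digits with the placeholder token _num_."""
    return series.str.replace(NUM_RE, ' _num_ ', regex=True)

sample = pd.Series(["Visit www.example.com now", "Call 08712 for 150p msg"])
# URLs first, so digits inside a URL are not split off before the URL is replaced
print(regularize_numbers_sketch(regularize_urls_sketch(sample)).tolist())
```

Because underscores count as word characters, `_url_` and `_num_` pass through `CountVectorizer`'s default tokenizer as single tokens.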

## Step 1.2.2 Results

Re-run the CountVectorizer, re-create vector X, and re-compute the top-30 spam terms.  Output the top-30 spam terms.

In [10]:
# TODO: Top-30 spam terms
vectorizer = feature_extraction.text.CountVectorizer(decode_error = 'ignore', stop_words = 'english')
X_regularize = vectorizer.fit_transform(sms_df['sms'].values)

top_spam_reg = top_k(sms_df[sms_df['class'] == 'spam'], vectorizer, 30)     
top_spam_reg


_num_         3289
free           228
txt            165
ur             144
_url_          141
mobile         129
stop           126
text           125
claim          113
reply          104
prize           92
just            78
won             76
cash            76
nokia           71
send            70
win             70
new             69
urgent          63
week            60
tone            59
box             57
msg             56
service         56
contact         56
guaranteed      50
ppm             49
customer        49
mins            47
phone           46
dtype: int64

## Step 1.3 Creating Features

Take the top-30 spam + top-30 ham words, and create a new CountVectorizer,
called *relevant_vec*, which _only_ includes those words.
See http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html.
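When `vocabulary` is passed as a list, `CountVectorizer` uses exactly those terms as columns: column `i` counts the `i`-th word, and every other token is ignored. A small sketch with a hypothetical three-word vocabulary:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical fixed vocabulary; in the notebook it is the union of top spam and ham terms
vocab = ['free', '_num_', 'ok']
vec = CountVectorizer(vocabulary=vocab)

# 'entry', 'see', and 'you' are not in the vocabulary, so they are not counted
X = vec.fit_transform(["free free entry _num_", "ok see you"]).toarray()
print(X)
```

Since Python does not guarantee a `set`'s iteration order across sessions (string hashing is randomized per process), a list built with `list(set(...))` fixes the column order only within one kernel session.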

In [11]:
# TODO: Vector of 'important' words
top_ham_reg = top_k(sms_df[sms_df['class'] == 'ham'], vectorizer, 30)
top_ham_reg

_num_    1227
gt        318
lt        316
just      293
ok        287
ll        265
ur        241
know      236
good      233
got       233
like      232
come      228
day       214
time      201
love      199
going     169
home      165
want      165
lor       162
need      158
sorry     157
don       151
da        150
today     138
later     135
dont      132
send      129
did       129
think     128
tell      123
dtype: int64

In [12]:
# Union of the top ham and spam terms; the set drops duplicates such as 'ur' and '_num_'
spam_ham = list(top_ham_reg.keys()) + list(top_spam_reg.keys())
relevant_vec = feature_extraction.text.CountVectorizer(decode_error='ignore', stop_words='english',
                                                       vocabulary=list(set(spam_ham)))


In [13]:
import sklearn.model_selection as ms
import numpy as np

# X is the feature array, based on the relevant words
X = relevant_vec.fit_transform(sms_df['sms'].values).toarray()

# Compute the length of each sms message, normalized
# by max length
Xlen = np.zeros((X.shape[0], 1))
for inx, v in enumerate(sms_df['sms'].values):
    Xlen[inx, 0] = len(v)
Xlen = Xlen / Xlen.max()
# Add the length as another feature (the last column of X)
X = np.hstack((X, Xlen))

# Labels: 1 = spam, 0 = ham
y = np.array((sms_df['class'] == 'spam').astype(int))

# Now we split: 80% train, 20% test
X_train, X_test, y_train, y_test = ms.train_test_split(X,
                                                    y, test_size=0.2, random_state=42)

X_train

array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.09110867],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.16684962],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.05049396],
       ..., 
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.04939627],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.02854007],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.03841932]])
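The message-length feature can also be computed without an explicit Python loop. A vectorized NumPy sketch on a hypothetical count matrix and message list:

```python
import numpy as np

messages = ["ok", "free prize now", "see you later"]   # hypothetical messages
X_counts = np.array([[0, 1], [2, 0], [0, 0]])          # hypothetical word counts

lengths = np.array([len(m) for m in messages], dtype=float).reshape(-1, 1)
lengths /= lengths.max()                 # scale into [0, 1] by the longest message
X_full = np.hstack((X_counts, lengths))  # length becomes the last column
print(X_full)
```

Dividing by the maximum keeps the length on roughly the same scale as the small word counts, which matters for the SVM and the neural networks.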

## Step 1.4 Classifier Evaluation

In [14]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
import sklearn.model_selection as ms
from sklearn.linear_model import LogisticRegression
import numpy as np

# Results, as a list of dictionaries
classifier_results = []

In [15]:
# Decision trees with max_depth from 1 to 5
max_depth = 5
for i in range(1,max_depth+1):
    dt_model = DecisionTreeClassifier(max_depth=i)
    dt_model.fit(X_train, y_train)
    y_pred_test = dt_model.predict(X_test)
    test_score = dt_model.score(X_test, y_test)
    classifier_results.append({'Classifier': 'DecTree', 'Depth': i, 'Score': test_score})

# TODO: Code for creating and testing classifiers mentioned in Step 1.4 of HW document
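The classifier cells in this notebook repeat one pattern: fit, score on the test split, and append a dict to the results list. A sketch of a helper that factors this out; the name `evaluate` and the synthetic data from `make_classification` are illustrative assumptions, not part of the assignment:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate(results, name, model, X_train, y_train, X_test, y_test, **extra):
    """Fit `model`, append its test accuracy to `results`, and return the score."""
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)   # mean accuracy on the held-out split
    results.append(dict(Classifier=name, Score=score, **extra))
    return score

# Synthetic stand-in data (hypothetical, not the SMS features)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

results = []
for depth in range(1, 6):
    evaluate(results, 'DecTree', DecisionTreeClassifier(max_depth=depth),
             X_tr, y_tr, X_te, y_te, Depth=depth)
evaluate(results, 'LogReg-L2', LogisticRegression(solver='liblinear', penalty='l2'),
         X_tr, y_tr, X_te, y_te)
```

Extra keyword arguments such as `Depth=depth` become extra columns, so the list still loads into `pd.DataFrame` the same way as `classifier_results`.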

In [16]:
#Logistic Regression L1 and L2 regularization

lr = LogisticRegression(solver='liblinear', penalty='l1',random_state=42)
lr.fit(X_train, y_train)
y_pred_test = lr.predict(X_test)
test_score = lr.score(X_test, y_test)
classifier_results.append({'Classifier': 'LogReg-L1', 'Score': test_score})

lr_l2 = LogisticRegression(solver='liblinear', penalty='l2',random_state=42)
lr_l2.fit(X_train, y_train)
y_pred_test = lr_l2.predict(X_test)
test_score = lr_l2.score(X_test, y_test)
classifier_results.append({'Classifier': 'LogReg-L2', 'Score': test_score})

In [17]:
#Support vector Machines

svm = SVC(random_state=42)
svm.fit(X_train, y_train)
y_pred_test = svm.predict(X_test)
test_score = svm.score(X_test, y_test)
classifier_results.append({'Classifier': 'SVM', 'Score': test_score})
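Because the classes are imbalanced, accuracy should be compared against the majority-class baseline: a classifier that always predicts `ham`. Using the class counts from the `describe()` output in Step 1.1:

```python
ham, spam = 4825, 747           # class counts from sms_df.groupby('class').describe()
baseline = ham / (ham + spam)   # accuracy of always predicting 'ham'
print(f"{baseline:.4f}")
```

So any score near 0.87 means the model has learned little; the gap above that floor is what matters.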

## Step 1.4 Results

In [18]:
pd.DataFrame(classifier_results)

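|   | Classifier | Depth | Score |
|---|------------|-------|-------|
| 0 | DecTree | 1 | 0.93991 |
| 1 | DecTree | 2 | 0.93991 |
| 2 | DecTree | 3 | 0.947085 |
| 3 | DecTree | 4 | 0.95157 |
| 4 | DecTree | 5 | 0.961435 |
| 5 | LogReg-L1 |   | 0.9713 |
| 6 | LogReg-L2 |   | 0.970404 |
| 7 | SVM |   | 0.9713 |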


## Step 2.0 Ensembles

In [19]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier

## Compute ensemble classifier results here

In [20]:
# TODO: Code for classifier construction and testing mentioned in Step 2.0 of HW document
RFC = RandomForestClassifier(n_estimators=31, random_state=314)
RFC.fit(X_train, y_train)
y_pred_test = RFC.predict(X_test)
test_score = RFC.score(X_test, y_test)

classifier_results.append({'Classifier': 'RandomForest','Count':31, 'Score': test_score})

In [21]:
#Decision tree
dec_tree = DecisionTreeClassifier(random_state=42)
BC = BaggingClassifier(random_state=314, n_estimators=31, base_estimator=dec_tree) 
BC.fit(X_train, y_train)
y_pred_test = BC.predict(X_test)
test_score = BC.score(X_test, y_test)
classifier_results.append({'Classifier': 'Bag-DecTree','Count':31, 'Score': test_score})
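`BaggingClassifier` fits each of its `n_estimators` base models on a bootstrap sample of the training rows (drawn with replacement) and combines their predictions. A hand-rolled sketch of the idea on synthetic data, combining 31 trees by majority vote; scikit-learn's version differs in details, for example it averages `predict_proba` when the base estimator provides it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(314)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)  # synthetic data
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

n_estimators = 31
votes = np.zeros((len(X_te), n_estimators), dtype=int)
for i in range(n_estimators):
    idx = rng.randint(0, len(X_tr), size=len(X_tr))   # bootstrap sample of row indices
    tree = DecisionTreeClassifier(random_state=i).fit(X_tr[idx], y_tr[idx])
    votes[:, i] = tree.predict(X_te)

# An odd number of voters on binary labels means no ties
y_pred = (votes.mean(axis=1) > 0.5).astype(int)
accuracy = (y_pred == y_te).mean()
print(accuracy)
```

Each tree sees a slightly different training set, so their errors are partly independent and the vote smooths them out; this is why the unpruned bagged trees beat every single depth-limited tree in the results.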


In [22]:
#Logistic regression - L1

BC_lr = BaggingClassifier(base_estimator=lr, n_estimators=31, random_state=314) 
BC_lr.fit(X_train, y_train)
y_pred_test = BC_lr.predict(X_test)
test_score = BC_lr.score(X_test, y_test)
classifier_results.append({'Classifier': 'Bag-LogReg-L1','Count':31, 'Score': test_score})

In [23]:
#Logistic regression - L2

BC_lr_l2 = BaggingClassifier(base_estimator=lr_l2, n_estimators=31, random_state=314) 
BC_lr_l2.fit(X_train, y_train)
y_pred_test = BC_lr_l2.predict(X_test)
test_score = BC_lr_l2.score(X_test, y_test)
classifier_results.append({'Classifier': 'Bag-LogReg-L2','Count':31, 'Score': test_score})

In [24]:
#Bagging SVM


BC_svm = BaggingClassifier(base_estimator=svm, n_estimators=31, random_state=314)
BC_svm.fit(X_train, y_train)
y_pred_test = BC_svm.predict(X_test)
test_score = BC_svm.score(X_test, y_test)
classifier_results.append({'Classifier': 'Bag-SVM','Count':31, 'Score': test_score})

In [25]:
# AdaBoost Decision tree
dec_tree_boost = DecisionTreeClassifier(random_state=42)
ADC = AdaBoostClassifier(random_state=314,n_estimators=31,base_estimator=dec_tree_boost)
ADC.fit(X_train, y_train)
y_pred_test = ADC.predict(X_test)
test_score = ADC.score(X_test, y_test)
classifier_results.append({'Classifier': 'Boost-DecTree','Count':31, 'Score': test_score})

In [26]:
#AdaBoost Logistic regression(L1)

ADC_lr = AdaBoostClassifier(base_estimator=lr,n_estimators=31, random_state=314) 
ADC_lr.fit(X_train, y_train)
y_pred_test = ADC_lr.predict(X_test)
test_score = ADC_lr.score(X_test, y_test)
classifier_results.append({'Classifier': 'Boost-LogReg-L1','Count':31, 'Score': test_score})

In [27]:
#AdaBoost Logistic regression(L2)

ADC_lr_l2 = AdaBoostClassifier(base_estimator=lr_l2,n_estimators=31,random_state=314) 
ADC_lr_l2.fit(X_train, y_train)
y_pred_test = ADC_lr_l2.predict(X_test)
test_score = ADC_lr_l2.score(X_test, y_test)
classifier_results.append({'Classifier': 'Boost-LogReg-L2','Count':31, 'Score': test_score})

In [28]:
#Adaboost SVM

svm_boost = SVC(random_state=42)
ADC_svm = AdaBoostClassifier(base_estimator=svm_boost, algorithm='SAMME',n_estimators=31,random_state=314)
ADC_svm.fit(X_train, y_train)
y_pred_test = ADC_svm.predict(X_test)
test_score = ADC_svm.score(X_test, y_test)
classifier_results.append({'Classifier': 'Boost-SVM','Count':31, 'Score': test_score})
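The SVM booster sets `algorithm='SAMME'` because the `SAMME.R` variant needs class probabilities, and `SVC` only provides `predict_proba` when built with `probability=True`. Under SAMME, each round's estimator receives a weight computed from its weighted error, and the misclassified samples are up-weighted for the next round. A NumPy sketch of one round's update, with hypothetical labels and predictions:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])          # hypothetical labels
y_pred = np.array([0, 1, 0, 0, 1])          # hypothetical weak-learner predictions
w = np.full(len(y_true), 1 / len(y_true))   # start from uniform sample weights
K = 2                                       # number of classes

miss = (y_pred != y_true)
err = np.sum(w[miss]) / np.sum(w)                 # weighted error rate
alpha = np.log((1 - err) / err) + np.log(K - 1)   # estimator weight
w = w * np.exp(alpha * miss)                      # up-weight misclassified samples
w /= w.sum()                                      # renormalize to sum to 1
print(err, alpha, w)
```

Here one of five samples is wrong, so `err = 0.2`, `alpha = log(4)`, and the misclassified sample ends up with half of the total weight.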

## Step 2.0 Results

In [29]:
pd.DataFrame(classifier_results)

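|   | Classifier | Count | Depth | Score |
|---|------------|-------|-------|-------|
| 0 | DecTree |   | 1 | 0.93991 |
| 1 | DecTree |   | 2 | 0.93991 |
| 2 | DecTree |   | 3 | 0.947085 |
| 3 | DecTree |   | 4 | 0.95157 |
| 4 | DecTree |   | 5 | 0.961435 |
| 5 | LogReg-L1 |   |   | 0.9713 |
| 6 | LogReg-L2 |   |   | 0.970404 |
| 7 | SVM |   |   | 0.9713 |
| 8 | RandomForest | 31 |   | 0.98296 |
| 9 | Bag-DecTree | 31 |   | 0.980269 |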


## Step 3.0 Neural Networks

In [30]:
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

In [31]:
# TODO: Code for classifier construction and testing mentioned in Step 3.0 of HW document

perceptron = Perceptron(random_state=42)
perceptron.fit(X_train, y_train)
y_pred_test = perceptron.predict(X_test)
test_score = perceptron.score(X_test, y_test)
classifier_results.append({'Classifier': 'Perceptron', 'Score': test_score})



In [32]:
mlp_three = MLPClassifier(random_state=42,hidden_layer_sizes=(3,))
mlp_three.fit(X_train, y_train)
y_pred_test = mlp_three.predict(X_test)
test_score = mlp_three.score(X_test, y_test)
classifier_results.append({'Classifier': 'MLPClassifier','Hidden':'(3,)', 'Score': test_score})

In [33]:
mlp_ten = MLPClassifier(random_state=42,hidden_layer_sizes=(10,))
mlp_ten.fit(X_train, y_train)
y_pred_test = mlp_ten.predict(X_test)
test_score = mlp_ten.score(X_test, y_test)
classifier_results.append({'Classifier': 'MLPClassifier','Hidden':'(10,)', 'Score': test_score})

In [34]:
mlp_ten_three = MLPClassifier(random_state=42,hidden_layer_sizes=(10,10,10))
mlp_ten_three.fit(X_train, y_train)
y_pred_test = mlp_ten_three.predict(X_test)
test_score = mlp_ten_three.score(X_test, y_test)
classifier_results.append({'Classifier': 'MLPClassifier','Hidden':'(10,10,10)', 'Score': test_score})
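`hidden_layer_sizes` lists the width of each hidden layer. With the 57 input columns used here (56 words plus the length feature) and the single logistic output unit that `MLPClassifier` uses for a binary target, the number of trainable parameters (weights plus biases) for each configuration can be counted directly:

```python
def mlp_param_count(n_inputs, hidden, n_outputs=1):
    """Weights + biases of a fully connected net with the given hidden layer widths."""
    sizes = [n_inputs, *hidden, n_outputs]
    # Each layer transition a -> b has an a x b weight matrix and b biases
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

for hidden in [(3,), (10,), (10, 10, 10)]:
    print(hidden, mlp_param_count(57, hidden))
```

Even the deepest configuration has only a few hundred parameters, small relative to the 4457 training messages.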

In [35]:
pd.DataFrame(classifier_results)

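|   | Classifier | Count | Depth | Hidden | Score |
|---|------------|-------|-------|--------|-------|
| 0 | DecTree |   | 1 |   | 0.93991 |
| 1 | DecTree |   | 2 |   | 0.93991 |
| 2 | DecTree |   | 3 |   | 0.947085 |
| 3 | DecTree |   | 4 |   | 0.95157 |
| 4 | DecTree |   | 5 |   | 0.961435 |
| 5 | LogReg-L1 |   |   |   | 0.9713 |
| 6 | LogReg-L2 |   |   |   | 0.970404 |
| 7 | SVM |   |   |   | 0.9713 |
| 8 | RandomForest | 31 |   |   | 0.98296 |
| 9 | Bag-DecTree | 31 |   |   | 0.980269 |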


## Step 4.0 TensorFlow

In [36]:
#! pip install tensorflow
import tensorflow as tf

# TODO: Define TensorFlow columns
features = relevant_vec.get_feature_names()  # vocabulary words, in X's column order
len(features)

56

In [37]:
# TODO: Create function input_fn(x,y)
def input_fn(x, y):
    # One float32 tensor per feature name; the whole array is returned as a single batch.
    # Only the first len(features) columns are read, so X's final length column is skipped.
    tensor_dict = {}
    for i, item in enumerate(features):
        tensor_dict[item] = tf.convert_to_tensor(x[:, i], dtype=tf.float32)
    return tensor_dict, tf.convert_to_tensor(y)

# TODO: Create function test_input_fn()
def test_input_fn(test_x, test_y):
    feat_dict, labels = input_fn(test_x, test_y)
    return feat_dict, labels

# TODO: Create function train_input_fn()
def train_input_fn(train_x, train_y):
    feat_dict, labels = input_fn(train_x, train_y)
    return feat_dict, labels
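`input_fn` hands the Estimator a dictionary that maps each feature name to one column of the matrix, matching the `numeric_column` keys. A NumPy sketch of the same split, with a hypothetical three-word vocabulary, that also counts the matrix columns left without a name (here the normalized-length column), which therefore never reach the model:

```python
import numpy as np

feature_names = ['free', 'ok', '_num_']     # hypothetical vocabulary
X = np.array([[1., 0., 2., 0.30],           # last column: normalized message length
              [0., 1., 0., 0.05]])

# One entry per named feature, as input_fn builds (minus the tf.Tensor conversion)
columns = {name: X[:, i] for i, name in enumerate(feature_names)}

unused = X.shape[1] - len(feature_names)    # trailing columns that have no name
print(sorted(columns), unused)
```

Giving the length column a name of its own (for example a hypothetical `'length'` entry) would be one way to pass it to the TensorFlow models as well.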

## Step 4.3.1

In [38]:
# TODO: Create DNNClassifier

# One numeric feature column per vocabulary word (same keys as input_fn's dict)
my_feature_columns = [tf.feature_column.numeric_column(key=key) for key in features]

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 5 nodes each.
    hidden_units=[5, 5],
    # Binary task: ham (0) vs. spam (1).
    n_classes=2)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_master': '', '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x10cfe5400>, '_save_summary_steps': 100, '_model_dir': '/var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmpt3ehp1xn', '_task_type': 'worker', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_is_chief': True, '_save_checkpoints_secs': 600, '_task_id': 0, '_session_config': None, '_num_ps_replicas': 0}


## Step 4.3.1 Results

In [39]:
# TODO: train

classifier.train(
    input_fn=lambda:train_input_fn(X_train, y_train), steps=1000) 

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmpt3ehp1xn/model.ckpt.
INFO:tensorflow:step = 1, loss = 3200.76
INFO:tensorflow:global_step/sec: 307.856
INFO:tensorflow:step = 101, loss = 283.28 (0.327 sec)
INFO:tensorflow:global_step/sec: 468.679
INFO:tensorflow:step = 201, loss = 250.329 (0.213 sec)
INFO:tensorflow:global_step/sec: 488.761
INFO:tensorflow:step = 301, loss = 231.447 (0.206 sec)
INFO:tensorflow:global_step/sec: 480.471
INFO:tensorflow:step = 401, loss = 220.499 (0.206 sec)
INFO:tensorflow:global_step/sec: 491.669
INFO:tensorflow:step = 501, loss = 213.13 (0.204 sec)
INFO:tensorflow:global_step/sec: 468.693
INFO:tensorflow:step = 601, loss = 207.797 (0.213 sec)
INFO:tensorflow:global_step/sec: 463.813
INFO:tensorflow:step = 701, loss = 203.872 (0.216 sec)
INFO:tensorflow:global_step/sec: 482.332
INFO:tensorflow:step = 801, loss = 200.811 (0.207 sec)
INFO:tensorflow:global_step/se

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x124a54a58>

In [40]:
# TODO: evaluate
eval_result = classifier.evaluate(input_fn=lambda:test_input_fn(X_test, y_test), steps=1)

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Starting evaluation at 2018-04-15-00:20:02
INFO:tensorflow:Restoring parameters from /var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmpt3ehp1xn/model.ckpt-1000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2018-04-15-00:20:03
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.970404, accuracy_baseline = 0.865471, auc = 0.967133, auc_precision_recall = 0.941587, average_loss = 0.122035, global_step = 1000, label/mean = 0.134529, loss = 136.069, prediction/mean = 0.129392

Test set accuracy: 0.970



In [41]:
# TODO: results
eval_result

{'accuracy': 0.97040361,
 'accuracy_baseline': 0.86547089,
 'auc': 0.96713293,
 'auc_precision_recall': 0.94158673,
 'average_loss': 0.12203461,
 'global_step': 1000,
 'label/mean': 0.13452914,
 'loss': 136.06859,
 'prediction/mean': 0.12939163}
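The evaluation dictionary is internally consistent. `accuracy_baseline` is the majority-class rate, `max(label/mean, 1 - label/mean)`, and because `test_input_fn` delivers the whole test split as one batch, `loss` matches the per-example `average_loss` summed over that batch. A quick arithmetic check with the numbers reported above, assuming the test split that `test_size=0.2` produces on 5572 messages:

```python
import math

n_test = math.ceil(0.2 * 5572)    # train_test_split rounds a fractional test size up
label_mean = 0.13452914           # fraction of spam in the test split
average_loss = 0.12203461         # mean per-example loss reported above

baseline = max(label_mean, 1 - label_mean)
summed_loss = average_loss * n_test
n_spam = round(label_mean * n_test)

print(n_test, n_spam, round(baseline, 6), round(summed_loss, 3))
```

This also shows the test split holds 150 spam messages, so the 0.970 accuracy corresponds to 33 misclassified messages out of 1115.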

## Step 4.3.2

In [42]:
# TODO: Create LinearClassifier
model = tf.estimator.LinearClassifier(feature_columns=my_feature_columns) 

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_master': '', '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1092b8c88>, '_save_summary_steps': 100, '_model_dir': '/var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmp8fm9x8l1', '_task_type': 'worker', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_is_chief': True, '_save_checkpoints_secs': 600, '_task_id': 0, '_session_config': None, '_num_ps_replicas': 0}


## Step 4.3.2 Results

In [43]:
# TODO: train
model.train(input_fn=lambda: train_input_fn(X_train, y_train), steps=1000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmp8fm9x8l1/model.ckpt.
INFO:tensorflow:step = 1, loss = 3089.4
INFO:tensorflow:global_step/sec: 167.357
INFO:tensorflow:step = 101, loss = 487.108 (0.600 sec)
INFO:tensorflow:global_step/sec: 335.838
INFO:tensorflow:step = 201, loss = 393.757 (0.297 sec)
INFO:tensorflow:global_step/sec: 354.434
INFO:tensorflow:step = 301, loss = 358.016 (0.283 sec)
INFO:tensorflow:global_step/sec: 352.563
INFO:tensorflow:step = 401, loss = 339.428 (0.283 sec)
INFO:tensorflow:global_step/sec: 419.465
INFO:tensorflow:step = 501, loss = 328.233 (0.238 sec)
INFO:tensorflow:global_step/sec: 363.88
INFO:tensorflow:step = 601, loss = 320.863 (0.275 sec)
INFO:tensorflow:global_step/sec: 372.947
INFO:tensorflow:step = 701, loss = 315.71 (0.268 sec)
INFO:tensorflow:global_step/sec: 385.658
INFO:tensorflow:step = 801, loss = 311.94 (0.259 sec)
INFO:tensorflow:global_step/sec:

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x124a17240>

In [44]:
# TODO: evaluate
results = model.evaluate(input_fn=lambda: test_input_fn(X_test, y_test), steps=1)
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**results))

INFO:tensorflow:Starting evaluation at 2018-04-15-00:20:32
INFO:tensorflow:Restoring parameters from /var/folders/23/n8wyjlns1r55_gr899ps5xf00000gp/T/tmp8fm9x8l1/model.ckpt-1000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2018-04-15-00:20:34
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.972197, accuracy_baseline = 0.865471, auc = 0.976781, auc_precision_recall = 0.945907, average_loss = 0.0961665, global_step = 1000, label/mean = 0.134529, loss = 107.226, prediction/mean = 0.134901

Test set accuracy: 0.972



In [45]:
# TODO: results
results

{'accuracy': 0.97219729,
 'accuracy_baseline': 0.86547089,
 'auc': 0.97678065,
 'auc_precision_recall': 0.94590682,
 'average_loss': 0.096166454,
 'global_step': 1000,
 'label/mean': 0.13452914,
 'loss': 107.22559,
 'prediction/mean': 0.13490112}