# Classification
An example of how to use a TensorFlow estimator to classify flowers.   

# Imports and Setup

In [55]:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import pandas as pd

# Dataset
This dataset separates flowers into 3 species classes:


* Setosa  
* Versicolor
* Virginica

Each flower is described by the following four features:

* sepal length
* sepal width
* petal length
* petal width





In [56]:
# Let's define some constants that will be helpful later
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth','PetalLength','PetalWidth','Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [57]:
# Using keras (a module inside TensorFlow), we grab our datasets and read them into a pandas dataframe
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

A quick glance at the first 5 rows of data


In [58]:
train.head()

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0
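
The `Species` column stores integer class ids rather than names. A minimal sketch of decoding them, assuming the ids index into the `SPECIES` list defined earlier (which is how the prediction cells later in this notebook use it). The ids are copied from the five rows shown above.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Species ids of the five rows printed by train.head()
head_ids = [2, 1, 2, 0, 0]

# Each id is an index into SPECIES
head_names = [SPECIES[i] for i in head_ids]
print(head_names)
```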


Now we can pop the species column off and use that as our label.

In [59]:
train_y = train.pop('Species')
test_y = test.pop('Species')
train.shape  # we have 120 entries with 4 features

(120, 4)
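
`pop` returns the column and removes it from the DataFrame, which is why `train` is left with only four feature columns. The same idea on a plain dictionary of columns, used here as a stand-in for the DataFrame so the sketch runs without pandas (the values are the first three rows shown above):

```python
# Column-oriented stand-in for the first three rows of the training data
columns = {
    'SepalLength': [6.4, 5.0, 4.9],
    'SepalWidth': [2.8, 2.3, 2.5],
    'PetalLength': [5.6, 3.3, 4.5],
    'PetalWidth': [2.2, 1.0, 1.7],
    'Species': [2, 1, 2],
}

# pop() returns the label column and removes it from the container
labels = columns.pop('Species')

print(list(columns))  # only the four feature names remain
print(labels)
```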

# Input Function

In [60]:
def input_fn(features, labels, training=True, batch_size=256):
  # Convert the inputs to a Dataset
  dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

  # Shuffle and repeat if you are in training mode
  if training:
    dataset = dataset.shuffle(1000).repeat()

  return dataset.batch(batch_size=batch_size)
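
To make the effect of `shuffle`, `repeat`, and `batch` more concrete, here is a plain-Python imitation of the pipeline. It is only a sketch: the helper `batches` is hypothetical, it does not use `tf.data`, and the 30-row evaluation set is an illustrative size. It shows that training mode yields an endless stream of full batches, while evaluation mode yields the data exactly once.

```python
import itertools
import random

def batches(rows, batch_size, training, seed=0):
    """Yield lists of rows, loosely imitating shuffle().repeat().batch()."""
    rows = list(rows)

    def stream():
        if not training:
            yield from rows          # evaluation: one pass, original order
            return
        rng = random.Random(seed)
        while True:                  # repeat(): never runs out of data
            epoch = rows[:]
            rng.shuffle(epoch)       # shuffle(): new order on every pass
            yield from epoch

    source = stream()
    while True:
        batch = list(itertools.islice(source, batch_size))
        if not batch:
            return
        yield batch

# Evaluation: 30 rows and batch_size=256 give one short batch
eval_batches = list(batches(range(30), 256, training=False))
print(len(eval_batches), len(eval_batches[0]))

# Training: the stream is endless, so take only the first two batches
train_batches = list(itertools.islice(batches(range(120), 256, training=True), 2))
print([len(b) for b in train_batches])
```

Because the training stream never ends, something else has to stop training, which is the role of the `steps` argument.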

# Feature Columns

In [61]:
my_feautre_columns = []
for key in train.keys():
  my_feautre_columns.append(tf.feature_column.numeric_column(key=key))
print(my_feautre_columns)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]


# Building the Model
We are ready to choose the model.  For classification tasks there are a variety of different estimators/models that we can pick from.

Some options are listed below.
* DNNClassifier (Deep Neural Network)
* LinearClassifier

We can choose either model, but the DNN seems to be the better choice because the relationship between the features and the species may not be linear.

In [62]:
# Build a DNN with 2 hidden layers of 30 and 10 nodes.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feautre_columns,
    # Two hidden layers with 30 and 10 nodes respectively
    hidden_units=[30,10],
    # The model must choose between 3 classes.  These are your output nodes.
    n_classes=3
    )
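
For a sense of how small this network is, the sketch below counts its trainable parameters. It assumes plain fully connected layers with one bias per node; the estimator's internals are not inspected here, and `dense_parameter_count` is a hypothetical helper.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output node
    return total

# 4 input features -> 30 -> 10 -> 3 output classes
layer_sizes = [4, 30, 10, 3]
print(dense_parameter_count(layer_sizes))  # 493
```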



# Training
The steps argument can be modified to try to improve the model's fit.  Keep in mind that more steps are not always better.

In [63]:
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000
  )  # The lambda wraps input_fn in a zero-argument callable, which is what train() expects

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7e4ec9272020>
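
A rough way to read `steps=5000`: each step consumes one batch, and because the training dataset repeats, the same 120 rows are revisited many times. The arithmetic below uses the values from this notebook (the `batch_size` default of 256 and the 120 training rows).

```python
steps = 5000        # from classifier.train(...)
batch_size = 256    # default in input_fn
train_rows = 120    # from train.shape

examples_seen = steps * batch_size
passes_over_data = examples_seen / train_rows
print(examples_seen)            # 1280000
print(round(passes_over_data))  # 10667
```

With that many passes over so few rows, adding steps is unlikely to help and may encourage over-fitting.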

# Evaluation
We can assess the accuracy of the model by comparing its predictions with the labels of the test data. The model reaches 90% accuracy on the test set, but it's important to remember that this might not necessarily carry over to real-world situations. Several factors, such as under-fitting, over-fitting, data leakage, and cross-contamination, can affect the model's performance. However, since this is our first classifier model, we won't worry about these issues for now.

Moreover, we've retrained the model a few times to improve its accuracy. We did this by simply rerunning the code in the previous cell. Since the data is shuffled each time, the model will be trained differently, and we can expect different results each time.

In [64]:
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False)
) # No need to specify steps.  Since we are not training, the model only looks at the testing data once.

accuracy = eval_result['accuracy']
print(f'Test set accuracy: {accuracy:.3f}')

Test set accuracy: 0.900
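
The accuracy metric is simply the fraction of test examples whose predicted class matches the true class. A minimal sketch with hypothetical labels (not taken from the real test set), using a hypothetical helper named `accuracy`:

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels, not taken from the real test set
actual    = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
predicted = [0, 1, 2, 0, 1, 1, 0, 1, 2, 0]
print(f'{accuracy(predicted, actual):.3f}')  # 0.900
```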


# Predictions
After completing the training of our model, we are now ready to make predictions. Below is a script that allows you to enter the characteristics of a flower. You can then see the predicted class of the flower using the model we have just created. This script is an example of how you can use our model to accept user input one at a time.

In [65]:
def input_fn(features, batch_size=256):
  # Converts the inputs to a Dataset without labels.
  return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size=batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}


print("Please type numeric values as prompted.")
for feature in features:
  # Keep asking until the input parses as a number (decimals such as 4.5 are allowed)
  while True:
    val = input(feature + ": ")
    try:
      predict[feature] = [float(val)]
      break
    except ValueError:
      print("Please enter a number.")

print(predict)

predictions = classifier.predict(input_fn=lambda: input_fn(predict))
for pred_dict in predictions:
  class_id = pred_dict['class_ids'][0]
  probability = pred_dict['probabilities'][class_id]
  species = SPECIES[class_id]
  print(f'Prediction is "{species}" {100*probability:.2f}%')

Please type numeric values as prompted.
SepalLength: 4.5
SepalWidth: 3.0
PetalLength: 5.1
PetalWidth: 2.0
{'SepalLength': [4.5], 'SepalWidth': [3.0], 'PetalLength': [5.1], 'PetalWidth': [2.0]}
Prediction is "Versicolor" 34.41%
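
Each `pred_dict` carries one probability per class, and the predicted class is the one with the largest probability. The sketch below reproduces that relationship with a softmax over hypothetical raw scores; the numbers are made up and are not produced by the trained model.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one flower, not produced by the trained model
logits = [0.10, 0.20, 0.15]
probabilities = softmax(logits)

# The predicted class is the index of the largest probability
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(SPECIES[class_id], f'{100 * probabilities[class_id]:.2f}%')
```

When the scores are this close together, the winning probability sits only slightly above 1/3, which is what a reported confidence of about 34% looks like.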


Next, we have some batch information available. The variable 'predict_x' holds the feature values for three different flowers, whereas the variable 'expected' contains the correct species of each flower. The following script is a slightly modified version of the previous one, which can now handle batch information. It processes the input data and then compares the predicted results with the expected ones.

In [66]:
expected = ['Setosa', 'Versicolor', 'Virginica'] # The expected results for the following predictions.
predict_x = {
    'SepalLength':[5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1]
}

for i in range(len(expected)):
  for key, val in predict_x.items():
    # Take the i-th flower's value and wrap it in a list so the dataset holds a single example
    predict[key] = [val[i]]

  predictions = classifier.predict(input_fn=lambda: input_fn(predict))
  for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    species = SPECIES[class_id]
    print(f'Prediction is "{species}" {100*probability:.2f}%')
    print(f'Should be: {expected[i]}')

Prediction is "Setosa" 70.94%
Should be: Setosa
Prediction is "Versicolor" 38.23%
Should be: Versicolor
Prediction is "Versicolor" 35.33%
Should be: Virginica


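Although the model predicted 2 out of 3 flowers correctly, its confidence in the latter two predictions was low (about 38% and 35%, barely above the 33% expected from guessing among three classes), indicating that the model is still far from perfect.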
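
The comparison can also be tallied in a few lines. The sketch below copies the predicted species and percentages from the output above and measures how far each confidence sits above chance level for three classes.

```python
expected   = ['Setosa', 'Versicolor', 'Virginica']
predicted  = ['Setosa', 'Versicolor', 'Versicolor']  # copied from the output above
confidence = [70.94, 38.23, 35.33]                   # percentages from the output above

n_classes = 3
chance = 100 / n_classes  # confidence of a uniform guess, in percent

correct = sum(p == e for p, e in zip(predicted, expected))
print(f'{correct} of {len(expected)} correct')
for species, pct in zip(predicted, confidence):
    print(f'{species}: {pct - chance:+.2f} points above chance')
```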