Here is a complete machine learning pipeline in TensorFlow that uses a simple feedforward neural network (FNN) to classify the famous Iris dataset, which contains measurements of 150 iris flowers from three species (50 of each). Each sample has 4 features: sepal length, sepal width, petal length, and petal width, all in centimeters. (*Sepals* are the green leafy parts of a flower directly underneath its petals.) Our goal is to build a model that predicts the species of an iris flower from these measurements.

First we import and preprocess the dataset, then perform a train/test split, allocating 80% of the dataset (120 samples) to training and the remaining 20% (30 samples) to testing.

In [1]:
# !pip install tensorflow
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

iris = datasets.load_iris()
inputs = iris.data
labels = iris.target

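# iris.target is already integer-encoded (0, 1, 2), so this step leaves the
# labels unchanged here; it matters for datasets with string labels
le = LabelEncoder()
labels = le.fit_transform(labels)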

inputs_train, inputs_test, labels_train, labels_test = train_test_split(inputs, labels, test_size=0.2, random_state=1234)



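Next, we build and compile a 4-layer FNN: three hidden layers of 10 nodes each with ReLU activations, followed by a softmax output layer with 3 nodes, one for each of the 3 classes into which we wish to classify the data. The dataset is pretty simple, so 10 nodes per hidden layer should be sufficient. With this architecture we can pretty consistently get 100% accuracy on the test set.
When we compile, we tell the model to minimize the sparse categorical cross-entropy loss using Adam, an adaptive variant of gradient descent, and to report a simple accuracy metric along the way. Accuracy is only monitored; the loss is what the optimizer actually works on.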
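To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes each dense layer has one weight per input-output pair plus one bias per output node, which is how a standard fully connected layer works. The helper name `dense_params` is made up for illustration.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights (n_inputs * n_outputs) plus one bias per output node."""
    return n_inputs * n_outputs + n_outputs

# layer widths: 4 input features, three hidden layers of 10, 3 output classes
layer_sizes = [4, 10, 10, 10, 3]

# pair each layer width with the next one to get (inputs, outputs) per dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # parameters in each of the 4 dense layers
print(total_params)  # total trainable parameters
```

Once the model is built, this total should agree with the figure reported by `model.summary()`.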

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential([
    Input(shape=(4,)),
    Dense(10, activation='relu'),
    Dense(10, activation='relu'),
    Dense(10, activation='relu'),
    Dense(3, activation='softmax'),
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',  # the quantity Adam actually minimizes
    metrics=['accuracy']  # reported for monitoring only
)
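To make the loss less of a black box, here is a rough pure-Python sketch of what the softmax output layer and the sparse categorical cross-entropy compute for a single sample. "Sparse" here means the label is an integer class index rather than a one-hot vector. The function names and the example scores are illustrative assumptions, not Keras internals.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_class):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probs[true_class])

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for the 3 classes
probs = softmax(logits)

loss_if_class_0 = sparse_categorical_crossentropy(probs, 0)  # true class got the highest score
loss_if_class_2 = sparse_categorical_crossentropy(probs, 2)  # true class got the lowest score

print([round(p, 3) for p in probs])
print(round(loss_if_class_0, 3), round(loss_if_class_2, 3))
```

The loss is small when the true class receives a high probability and grows quickly as that probability shrinks, which is what pushes the network toward confident, correct predictions.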

Next, we train the model.

In [3]:
m = 20 # the model will go through the entire training dataset m times
k = 5 # the model will update its weights after every k samples

history = model.fit(
    inputs_train,
    labels_train,
    validation_data=(inputs_test, labels_test),  # the test set doubles as validation data here
    epochs=m,
    batch_size=k,
)

Epoch 1/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.0227 - loss: 1.0954 - val_accuracy: 0.3000 - val_loss: 1.0827
Epoch 2/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4775 - loss: 1.0731 - val_accuracy: 0.5333 - val_loss: 1.0421
Epoch 3/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6315 - loss: 1.0270 - val_accuracy: 0.5667 - val_loss: 1.0044
Epoch 4/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7155 - loss: 0.9961 - val_accuracy: 0.5000 - val_loss: 0.9545
Epoch 5/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6764 - loss: 0.9235 - val_accuracy: 0.5667 - val_loss: 0.9091
Epoch 6/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7782 - loss: 0.8367 - val_accuracy: 0.5667 - val_loss: 0.8538
Epoch 7/20
24/24 ━━━━━━━━━━
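The `24/24` in the training log is the number of weight updates per epoch. The arithmetic behind it can be sketched as follows, using the split fraction, batch size, and epoch count from the cells above. The variable names are chosen here for illustration.

```python
import math

n_samples = 150      # size of the Iris dataset
test_fraction = 0.2  # test_size passed to train_test_split
batch_size = 5       # k in the training cell
epochs = 20          # m in the training cell

n_test = round(n_samples * test_fraction)  # works out to a whole number here
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # a final partial batch would still count as a step
total_updates = steps_per_epoch * epochs

print(n_train, n_test, steps_per_epoch, total_updates)
```

So the model sees 120 training flowers per epoch in 24 batches of 5, for 480 weight updates over the full run.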

How did it do?

In [4]:
loss, accuracy = model.evaluate(inputs_test, labels_test)

print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 118ms/step - accuracy: 1.0000 - loss: 0.4323
Test Loss: 0.4323333501815796
Test Accuracy: 1.0
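Two things are worth reading out of these numbers. With only 30 test flowers, every accuracy value is a multiple of 1/30, and a loss of about 0.43 alongside 100% accuracy suggests the model is right but not yet very confident. The sketch below is a back-of-the-envelope check. It assumes the reported loss is the plain average of the per-sample cross-entropy, with no regularization terms added.

```python
import math

n_test = 30  # 20% of the 150 samples

# validation accuracies copied from the first few epochs of the training log
logged_val_accuracy = [0.3000, 0.5333, 0.5667, 0.5000]
correct_counts = [round(acc * n_test) for acc in logged_val_accuracy]
print(correct_counts)  # test flowers classified correctly at each of those epochs

# the mean cross-entropy is the average of -log(p_true), so exp(-loss) is the
# geometric mean of the probabilities the model gave to the correct classes
test_loss = 0.4323333501815796
typical_confidence = math.exp(-test_loss)
print(round(typical_confidence, 3))
```

Under that assumption, the model assigns roughly a 65% probability to the correct species on a typical test flower. That is enough to win the argmax every time, but training for more epochs would likely push the loss lower.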


Wonderful, we like 100% accuracy. Let's see it on some specific data, choosing 10 random samples from the test set and having the model guess.

In [7]:
import numpy as np

# select 10 random samples from the test dataset
random_samples = np.random.choice(len(inputs_test), size=10, replace=False)

for i in random_samples:
    sample = inputs_test[i]
    sample = sample.reshape(1, -1)  # reshaping the sample to have a batch dimension
    prediction = model.predict(sample)
    predicted_class = np.argmax(prediction)

    print(f"Predicted class: {predicted_class}")

    actual_class = labels_test[i]

    print(f"Actual class: {actual_class}")
    
    if predicted_class == actual_class:
        print("Woohoo!")


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 0
Actual class: 0
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 1
Actual class: 1
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 0
Actual class: 0
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step
Predicted class: 1
Actual class: 1
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 0
Actual class: 0
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 1
Actual class: 1
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step
Predicted class: 0
Actual class: 0
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
Predicted class: 0
Actual class: 0
Woohoo!
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step
Predicte
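The class indices 0, 1, and 2 are easier to read as species names. Below is a small sketch of that mapping, assuming the order scikit-learn uses in `iris.target_names`. The `describe` helper and the example probabilities are hypothetical and not taken from the run above.

```python
# species names in the order scikit-learn uses for iris.target_names
species = ["setosa", "versicolor", "virginica"]

def describe(probabilities):
    """Return the most likely species and its probability for one softmax output row."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])  # plain-Python argmax
    return species[best], probabilities[best]

# hypothetical model output for a single flower
example_output = [0.05, 0.80, 0.15]
name, prob = describe(example_output)
print(f"Predicted species: {name} ({prob:.0%})")
```

In the notebook itself, the same lookup could be written as `iris.target_names[predicted_class]`.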

Voila. Technology is amazing.