##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

In [2]:

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("TensorFlow:", tf.__version__)


TensorFlow: 2.20.0


In [3]:

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

class_names = [
    "airplane","automobile","bird","cat","deer",
    "dog","frog","horse","ship","truck"
]


y_train = y_train.squeeze().astype("int64")
y_test  = y_test.squeeze().astype("int64")


x_train = x_train.astype("float32")
x_test  = x_test.astype("float32")

print("Train:", x_train.shape, y_train.shape)
print("Test :", x_test.shape, y_test.shape)

BATCH_SIZE = 64
IMG_SIZE = 224   

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
    .shuffle(20000) \
    .batch(BATCH_SIZE) \
    .prefetch(tf.data.AUTOTUNE)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
    .batch(BATCH_SIZE) \
    .prefetch(tf.data.AUTOTUNE)




Train: (50000, 32, 32, 3) (50000,)
Test : (10000, 32, 32, 3) (10000,)
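The `782/782` step counter in the training logs follows directly from the batch size. `tf.data`'s `batch()` keeps the final partial batch by default (`drop_remainder=False`), so the number of steps per epoch is the ceiling of samples divided by `BATCH_SIZE`. A small arithmetic sketch using the dataset sizes printed above:

```python
import math

BATCH_SIZE = 64
N_TRAIN, N_TEST = 50_000, 10_000  # sizes printed by the cell above

# batch() keeps the last, smaller batch unless drop_remainder=True,
# so the step count is the ceiling of samples / batch size.
train_steps = math.ceil(N_TRAIN / BATCH_SIZE)
test_steps = math.ceil(N_TEST / BATCH_SIZE)
last_train_batch = N_TRAIN - (train_steps - 1) * BATCH_SIZE

print(train_steps, test_steps, last_train_batch)  # 782 157 16
```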


In [4]:

data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
])
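`RandomFlip("horizontal")` mirrors images left-right. `RandomRotation(0.05)` treats a float factor as a fraction of a full turn (2π), so rotation angles are drawn from about ±18°. Both layers are only active when the model runs in training mode, so evaluation on `test_ds` sees unaugmented images. The numpy sketch below illustrates the flip logic on an NHWC batch. The helper `horizontal_flip` is hypothetical and is not the Keras implementation.

```python
import numpy as np

ROTATION_FACTOR = 0.05  # value passed to layers.RandomRotation above

# A float factor is a fraction of a full turn, so 0.05 * 360 = 18 degrees each way.
max_degrees = ROTATION_FACTOR * 360
print(max_degrees)  # 18.0

def horizontal_flip(batch, rng):
    """Mirror each NHWC image left-right with probability 0.5 (illustrative sketch)."""
    flip = rng.random(batch.shape[0]) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]  # reverse the width axis (axis 2 in NHWC)
    return out, flip
```

For example, `horizontal_flip(x_train[:8], np.random.default_rng(0))` returns a batch with the same shape, where roughly half the images are mirrored.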


In [5]:

mobilenet_base = keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights="imagenet"
)
mobilenet_base.trainable = False  


inputs = keras.Input(shape=(32, 32, 3))
x = data_augmentation(inputs)
x = layers.Resizing(IMG_SIZE, IMG_SIZE)(x)
x = keras.applications.mobilenet_v2.preprocess_input(x)
x = mobilenet_base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(10, activation="softmax")(x)

mobilenet_model = keras.Model(inputs, outputs, name="MobileNetV2_CIFAR10")


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
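The two preprocessing steps inside this model do simple, checkable arithmetic:

- `keras.applications.mobilenet_v2.preprocess_input` rescales pixels from [0, 255] to [-1, 1]. This is why the data cell casts to `float32` without dividing by 255.
- `Resizing(224, 224)` enlarges each 32×32 image by a factor of 7 per side.

The numpy sketch below reproduces both numbers without TensorFlow. `mobilenet_v2_scale` is a hypothetical helper written for illustration, not a Keras function.

```python
import numpy as np

def mobilenet_v2_scale(x):
    """Same arithmetic as MobileNetV2's preprocess_input: [0, 255] -> [-1, 1]."""
    return x / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0], dtype=np.float32)
print(mobilenet_v2_scale(pixels))  # [-1.  0.  1.]

# Upsampling 32x32 CIFAR-10 images to 224x224 multiplies the pixels per image.
SRC, DST = 32, 224
pixel_ratio = (DST / SRC) ** 2
print(pixel_ratio)  # 49.0
```

Convolution cost grows roughly in proportion to the spatial area. Every backbone layer therefore does on the order of 49 times more work than it would on native 32×32 inputs.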


In [6]:

print("\n=== model.summary() ===")
mobilenet_model.summary()


depth_total = len(mobilenet_model.layers)
depth_backbone = len(mobilenet_base.layers)

total_params = mobilenet_model.count_params()
trainable_params = np.sum([np.prod(v.shape) for v in mobilenet_model.trainable_weights])
nontrainable_params = np.sum([np.prod(v.shape) for v in mobilenet_model.non_trainable_weights])

print("\n--- Observations ---")
print("Network depth (total layers):", depth_total)
print("Backbone depth (MobileNetV2 layers):", depth_backbone)
print("Total parameters:", f"{total_params:,}")
print("Trainable parameters:", f"{int(trainable_params):,}")
print("Frozen (non-trainable) parameters:", f"{int(nontrainable_params):,}")



=== model.summary() ===



--- Observations ---
Network depth (total layers): 7
Backbone depth (MobileNetV2 layers): 154
Total parameters: 2,270,794
Trainable parameters: 12,810
Frozen (non-trainable) parameters: 2,257,984
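The trainable count can be checked by hand. With the backbone frozen, only the final `Dense(10)` layer has trainable weights. It receives the 1280-dimensional vector that global average pooling produces from MobileNetV2's last feature map, so it has 1280 × 10 weights plus 10 biases. A short arithmetic check against the printed totals:

```python
# Head on top of the frozen backbone: GlobalAveragePooling2D -> Dropout -> Dense(10).
FEATURES = 1280      # channels of MobileNetV2's final feature map (include_top=False)
NUM_CLASSES = 10

gap_params = 0       # pooling has no weights
dropout_params = 0   # dropout has no weights
dense_params = FEATURES * NUM_CLASSES + NUM_CLASSES  # kernel + bias

backbone_params = 2_257_984  # frozen count reported above
total = backbone_params + gap_params + dropout_params + dense_params

print(dense_params, total)  # 12810 2270794
```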


In [7]:
EPOCHS_FROZEN = 3

mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

t0 = time.time()
history_frozen = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS_FROZEN,
    verbose=1
)
time_frozen = time.time() - t0

loss_frozen, acc_frozen = mobilenet_model.evaluate(test_ds, verbose=0)

print("\n MobileNetV2 (frozen) results")
print("Test accuracy:", round(float(acc_frozen), 4))
print("Train time (sec):", round(float(time_frozen), 1))


Epoch 1/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m853s[0m 1s/step - accuracy: 0.6962 - loss: 0.8788 - val_accuracy: 0.8318 - val_loss: 0.4913
Epoch 2/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m818s[0m 1s/step - accuracy: 0.7614 - loss: 0.6852 - val_accuracy: 0.8325 - val_loss: 0.4882
Epoch 3/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m794s[0m 1s/step - accuracy: 0.7700 - loss: 0.6635 - val_accuracy: 0.8341 - val_loss: 0.4759

 MobileNetV2 (frozen) results
Test accuracy: 0.8341
Train time (sec): 2465.0
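As a consistency check, the per-epoch times in the progress bars add up to the measured `time_frozen`. Keras rounds each epoch to whole seconds, so small differences are expected. The sketch also converts the total into seconds per step, which matches the roughly `1s/step` shown in the logs:

```python
# Per-epoch wall-clock times (seconds) copied from the progress bars above.
frozen_epoch_secs = [853, 818, 794]
STEPS_PER_EPOCH = 782

total_secs = sum(frozen_epoch_secs)
secs_per_step = total_secs / (len(frozen_epoch_secs) * STEPS_PER_EPOCH)

print(total_secs)               # 2465
print(round(secs_per_step, 2))  # 1.05
```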


In [8]:
EPOCHS_FINETUNE = 3


mobilenet_base.trainable = True


N = 30
for layer in mobilenet_base.layers[:-N]:
    layer.trainable = False

print("Trainable layers in backbone:",
      sum(l.trainable for l in mobilenet_base.layers), "/", len(mobilenet_base.layers))

mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

t1 = time.time()
history_ft = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS_FINETUNE,
    verbose=1
)
time_ft = time.time() - t1

loss_ft, acc_ft = mobilenet_model.evaluate(test_ds, verbose=0)

print("\n  MobileNetV2 (fine-tuned) results")
print("Test accuracy:", round(float(acc_ft), 4))
print("Train time (sec):", round(float(time_ft), 1))


MOBILENET_ACC = float(acc_ft)
MOBILENET_TIME = float(time_frozen + time_ft)
MOBILENET_EPOCHS = int(EPOCHS_FROZEN + EPOCHS_FINETUNE)


Trainable layers in backbone: 30 / 154
Epoch 1/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1156s[0m 1s/step - accuracy: 0.7428 - loss: 0.7525 - val_accuracy: 0.8298 - val_loss: 0.5030
Epoch 2/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1075s[0m 1s/step - accuracy: 0.7997 - loss: 0.5790 - val_accuracy: 0.8507 - val_loss: 0.4393
Epoch 3/3
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1093s[0m 1s/step - accuracy: 0.8204 - loss: 0.5225 - val_accuracy: 0.8686 - val_loss: 0.3910

  MobileNetV2 (fine-tuned) results
Test accuracy: 0.8686
Train time (sec): 3324.7
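The fine-tuning phase can be summarized with the logged numbers:

- Its epoch times sum to the measured `time_ft`.
- Unfreezing the top 30 backbone layers raised test accuracy by about 3.45 percentage points.
- Each epoch was roughly 35% slower than in the frozen phase. This is plausibly because gradients must now also be computed for those 30 layers.

A minimal arithmetic sketch:

```python
frozen_epoch_secs = [853, 818, 794]       # from the frozen-phase logs
finetune_epoch_secs = [1156, 1075, 1093]  # from the fine-tuning logs
acc_frozen, acc_finetuned = 0.8341, 0.8686

gain_pp = round((acc_finetuned - acc_frozen) * 100, 2)  # percentage points
mean_frozen = sum(frozen_epoch_secs) / len(frozen_epoch_secs)
mean_finetune = sum(finetune_epoch_secs) / len(finetune_epoch_secs)
slowdown = mean_finetune / mean_frozen   # relative cost of a fine-tuning epoch

print(sum(finetune_epoch_secs), gain_pp, round(slowdown, 2))
```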


1) Which model achieved the highest accuracy?
ResNet50V2 achieved the highest test accuracy (0.9162 / 91.62%). The newly selected pretrained model, MobileNetV2, came second after fine-tuning (0.8686 / 86.86%), while the custom CNN had the lowest accuracy (0.7028 / 70.28%).
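The ranking and the size of each gap can be computed directly from the three test accuracies reported in this answer:

```python
# Test accuracies as reported in the answer above.
results = {"ResNet50V2": 0.9162, "MobileNetV2 (fine-tuned)": 0.8686, "Custom CNN": 0.7028}

ranking = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for (name, acc), (_, next_acc) in zip(ranking, ranking[1:]):
    print(f"{name}: {acc:.2%} (+{(acc - next_acc) * 100:.2f} pp over the next model)")
print(f"{ranking[-1][0]}: {ranking[-1][1]:.2%}")
```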

2) Which model trained faster?
Under the same CPU setup, the custom CNN trained much faster (about 120 s for 10 epochs, ≈ 12 s/epoch) than MobileNetV2. MobileNetV2 took 2465.0 s (frozen) + 3324.7 s (fine-tuning) = 5789.7 s in total for 6 epochs (≈ 965 s/epoch). ResNet50V2's training time is not included because it was not run under the same conditions (GPU/Colab was unavailable), so it is not directly comparable. If all three models were run on the same CPU with the same settings, we would still expect the custom CNN to be the fastest, followed by MobileNetV2, with ResNet50V2 the slowest.
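Because the two models were trained for different numbers of epochs, a per-epoch comparison is the fairer one. The sketch below normalizes the reported times:

```python
cnn_secs, cnn_epochs = 120.0, 10       # custom CNN, as reported above
mobilenet_secs = 2465.0 + 3324.7       # frozen phase + fine-tuning phase
mobilenet_epochs = 3 + 3

cnn_per_epoch = cnn_secs / cnn_epochs
mobilenet_per_epoch = mobilenet_secs / mobilenet_epochs
slowdown = mobilenet_per_epoch / cnn_per_epoch

print(f"{mobilenet_secs:.1f} s total, {mobilenet_per_epoch:.0f} s/epoch, "
      f"~{slowdown:.0f}x the custom CNN's per-epoch time")
```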

3) How might the architecture explain the differences?
ResNet50V2 is a much larger, higher-capacity network (about 25.6M parameters with its ImageNet classification head, versus about 3.5M for MobileNetV2). That capacity often gives the best accuracy after fine-tuning, but it also requires more computation per image, which makes training slower. MobileNetV2 benefits from ImageNet-pretrained features, so it outperforms the custom CNN trained from scratch. It is also built from depthwise separable convolutions, which make it far more efficient than heavier networks such as ResNet50V2. It is nevertheless slow on CPU here because every image is resized from 32×32 to 224×224. That is a 49-fold increase in pixels per image, and convolution cost grows roughly in proportion to the number of pixels. If all models were trained on the same hardware with the same settings, the expected speed order is the custom CNN first, then MobileNetV2, then ResNet50V2 (on both CPU and GPU). The custom CNN is the smallest model, and MobileNetV2 is lighter than ResNet50V2.
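The efficiency argument can be made concrete with the standard cost formulas. Consider a k×k convolution from C_in to C_out channels, producing an H×W output:

- A regular convolution costs about k²·C_in·C_out·H·W multiply-accumulates.
- A depthwise separable convolution, MobileNetV2's building block, costs k²·C_in·H·W + C_in·C_out·H·W.

The ratio between them is 1/C_out + 1/k². The layer sizes below are illustrative assumptions, not values read from either network in this notebook. The loop also shows how the cost of the same layer scales with feature-map size.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates of a k x k convolution producing an h x w x c_out map."""
    return k * k * c_in * c_out * h * w

def depthwise_separable_macs(k, c_in, c_out, h, w):
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64   # illustrative layer sizes (assumption)
for side in (32, 224):       # feature-map side length (illustrative)
    std = standard_conv_macs(k, c_in, c_out, side, side)
    sep = depthwise_separable_macs(k, c_in, c_out, side, side)
    print(side, std, sep, round(sep / std, 4))
```

With these sizes, the separable version needs about 12.7% of the regular convolution's work at either resolution. Both costs grow with H·W, so efficiency per layer does not offset the extra pixels introduced by resizing.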