# Genre classification using spectrograms

**Vivek Vijayan**

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/enter-opy/genre-classification/blob/main/notebooks/spectrograms.ipynb)

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split



## Data preprocessing

In [2]:
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

In [3]:
train_datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.1)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    "../Data/images_original/train",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="training"
)

validation_generator = train_datagen.flow_from_directory(
    "../Data/images_original/train",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="validation"
)

test_generator = test_datagen.flow_from_directory(
    "../Data/images_original/test",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
)

Found 720 images belonging to 10 classes.
Found 79 images belonging to 10 classes.
Found 199 images belonging to 10 classes.
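
The image counts above fix the number of batches per epoch. A minimal sketch of that arithmetic in plain Python, assuming the generators yield a final partial batch rather than dropping it (the helper name `steps_per_epoch` is my own, not a Keras function):

```python
import math

# Image counts reported by flow_from_directory above
N_TRAIN, N_VAL, N_TEST = 720, 79, 199
BATCH_SIZE = 32


def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)


train_steps = steps_per_epoch(N_TRAIN, BATCH_SIZE)
val_steps = steps_per_epoch(N_VAL, BATCH_SIZE)
test_steps = steps_per_epoch(N_TEST, BATCH_SIZE)

# Share of the train folder that ended up in the validation subset
val_fraction = N_VAL / (N_TRAIN + N_VAL)

print(train_steps, val_steps, test_steps, round(val_fraction, 3))
```

The 23 training steps and 7 test steps agree with the `23/23` and `7/7` progress counters in the training and evaluation logs.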


## VGG16

In [4]:
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the pretrained convolutional layers

I0000 00:00:1741084580.775284    5603 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2248 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5


In [5]:
x = Flatten()(base_model.output)
x = Dense(512, activation="relu")(x)
x = Dropout(0.2)(x)
x = Dense(256, activation="relu")(x)
x = Dropout(0.2)(x)
x = Dense(128, activation="relu")(x)
x = Dropout(0.2)(x)
x = Dense(64, activation="relu")(x)
x = Dropout(0.2)(x)
x = Dense(10, activation="softmax")(x)

In [6]:
model = Model(inputs=base_model.input, outputs=x)

In [7]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

In [8]:
model.summary()
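
The summary output is not shown here, so as a rough check, the trainable parameters of the new head can be counted by hand. This sketch assumes the standard VGG16 feature map of `7 x 7 x 512` for a `224 x 224 x 3` input with `include_top=False`; the helper `dense_params` is hypothetical and only does arithmetic.

```python
# VGG16 without its top maps a 224x224x3 input to a 7x7x512 feature map,
# so Flatten() produces 7 * 7 * 512 values per image.
flatten_units = 7 * 7 * 512

# Flatten output followed by the Dense layer widths defined above
layer_sizes = [flatten_units, 512, 256, 128, 64, 10]


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Dropout layers have no parameters, so only the Dense layers count
head_params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_head_params = sum(head_params)

print(head_params)
print(total_head_params)
```

Almost all of the roughly 13 million trainable parameters sit in the first `Dense(512)` layer, which is a lot to fit with 720 training images.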

## Training

In [9]:
model.fit(train_generator, validation_data=validation_generator, epochs=50)


Epoch 1/50


I0000 00:00:1741084585.003717    5747 service.cc:148] XLA service 0x7fb410004c10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1741084585.003777    5747 service.cc:156]   StreamExecutor device (0): NVIDIA GeForce GTX 1650, Compute Capability 7.5
I0000 00:00:1741084585.510991    5747 cuda_dnn.cc:529] Loaded cuDNN version 90600
I0000 00:00:1741084604.501586    5747 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m59s[0m 2s/step - accuracy: 0.0977 - loss: 2.9608 - val_accuracy: 0.1646 - val_loss: 2.2427
Epoch 2/50
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 427ms/step - accuracy: 0.1136 - loss: 2.3358 - val_accuracy: 0.2025 - val_loss: 2.1860
Epoch 3/50
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 428ms/step - accuracy: 0.1734 - loss: 2.2018 - val_accuracy: 0.2911 - val_loss: 1.9732
Epoch 4/50
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 430ms/step - accuracy: 0.2444 - loss: 2.0021 - val_accuracy: 0.3924 - val_loss: 1.8453
Epoch 5/50
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 430ms/step - accuracy: 0.2722 - loss: 1.9499 - val_accuracy: 0.4051 - val_loss: 1.7703
Epoch 6/50
[1m23/23[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 441ms/step - accuracy: 0.3691 - loss: 1.7848 - val_accuracy: 0.4304 - val_loss: 1.7091
Epoch 7/50
[1m23/23[0m [32m━━━━━━

<keras.src.callbacks.history.History at 0x7fb533386980>

## Evaluation

In [10]:
loss, accuracy = model.evaluate(test_generator)

print(f"Test Accuracy: {(accuracy * 100):.2f}%")

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 1s/step - accuracy: 0.5299 - loss: 1.6919   
Test Accuracy: 52.76%
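
To put the test accuracy in context, this small sketch converts it back to a count of correctly classified spectrograms and compares it with random guessing. It assumes the 10 genres are roughly balanced in the test folder, which is not verified here.

```python
N_TEST = 199               # test images found by flow_from_directory
N_CLASSES = 10             # genres
reported_accuracy = 0.5276  # from the evaluation output above

# Nearest whole number of correct predictions consistent with the reported value
correct = round(reported_accuracy * N_TEST)

# Expected accuracy of uniform random guessing (assumes balanced classes)
chance = 1 / N_CLASSES

print(f"{correct} of {N_TEST} correct ({correct / N_TEST:.2%})")
print(f"{reported_accuracy / chance:.1f}x the chance level of {chance:.0%}")
```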


## Discussion
- I used a pretrained `VGG16` as a frozen base and applied transfer learning
- I removed the `top` and replaced it with a few dense layers (with dropout) and an output layer with `10` softmax units
- The model seems to overfit: test `accuracy` only just exceeds `50%` (52.76%)
- Validation `accuracy` does not seem to improve beyond `50%`
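
One possible response to the validation plateau, not something I tried in this notebook, is to stop training once the validation loss stops improving. Below is a minimal plain-Python sketch of a patience rule, similar in spirit to the Keras `EarlyStopping` callback. The function `early_stop_epoch` and the `hypothetical_tail` values are illustrative assumptions; only the first six losses come from the training log above.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss for epochs 1-6, copied from the training log
logged = [2.2427, 2.1860, 1.9732, 1.8453, 1.7703, 1.7091]

# Hypothetical continuation, made up purely to illustrate a plateau
hypothetical_tail = [1.72, 1.75, 1.71, 1.74, 1.78]

print(early_stop_epoch(logged))                      # still improving
print(early_stop_epoch(logged + hypothetical_tail))  # plateau triggers a stop
```

With the logged values alone the rule never fires, because `val_loss` falls every epoch. With the made-up tail it stops after three epochs without a new best.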