<a href="https://colab.research.google.com/github/takanto/CNN_Voice_Detection/blob/main/Voice_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CNN Voice Classification**

This notebook contains the code for training a voice classification model with a CNN and saving the model for later use in a mobile app. The motivation is to prevent phone scams, which cause devastating financial damage, especially to elderly people who cannot tell their children's voices from those of scammers. Because the model is trained specifically to distinguish the voices of their children from those of scammers, elderly people can rely on AI rather than on their own hearing to identify whether a call comes from a scammer.

When training, do not forget to run on a TPU to speed up the process, and split the training into several sessions if necessary. (When splitting the training, it is recommended that you keep track of epochs by adding the number of epochs to the name of the model you save.)
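
One way to keep track of epochs across sessions is to put the cumulative count in the file name. The sketch below is only an illustration: the helper `checkpoint_name` and the `detector_model_ep...` naming pattern are hypothetical and are not used elsewhere in this notebook.

```python
def checkpoint_name(epochs_done_before, epochs_this_session, prefix="detector_model"):
    """Return a file name that records the cumulative number of epochs.

    `prefix` and the `_ep` suffix are illustrative choices, not a convention
    required by Keras.
    """
    total = epochs_done_before + epochs_this_session
    # Zero-pad the count so that file names sort in training order
    return f"{prefix}_ep{total:06d}.h5"


# Example: a second session of 10000 epochs after a first session of 10000
print(checkpoint_name(10000, 10000))
```

The returned name can be passed to `detector.save(...)` in place of the fixed `"detector_model.h5"`, as long as the same name is used when loading the model again.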

## **Code**

### **Preparation**

Run the following code before any operation.

・ Importing important libraries.

In [1]:
import tensorflow as tf

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys, os

from tensorflow.keras.layers import Input, Dense, Flatten, Conv2DTranspose, MaxPooling2D, Dropout, BatchNormalization, Reshape, LeakyReLU, Conv2D
from tensorflow.keras.applications.vgg16 import VGG16 as PretrainedModel, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing import image

・ Setting up TPU environment.

In [None]:
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print("all devices:", tf.config.list_logical_devices("TPU"))

In [None]:
strategy = tf.distribute.TPUStrategy(resolver)

・ Preprocessing training data. Please import your training folder. It needs to contain 100,000+ .wav files of voices of family members and non-family members. Voices of non-family members will be provided on the GitHub page in a later version.

In [None]:
from opensoundscape.audio import Audio
from opensoundscape.spectrogram import Spectrogram
from pathlib import Path

folder = 'name of the training file'

if not os.path.exists("spectrogram"):
  os.makedirs("spectrogram")

for audio_filename in os.listdir(folder):
  # Settings
  image_shape = (214, 214)  # must match the input size passed to build_detector
  image_save_path = Path('spectrogram/'+audio_filename+'.png')

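  # os.listdir returns bare file names, so prepend the folder to get a valid path
  audio = Audio.from_file(os.path.join(folder, audio_filename))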

  spectrogram = Spectrogram.from_audio(audio)

  image = spectrogram.to_image(shape=image_shape,invert=True)

  image.save(image_save_path)
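
The loop above visits every entry in the training folder, so a stray non-audio file or a subfolder would be passed to `Audio.from_file` as well. A minimal sketch of one way to restrict the loop to `.wav` files is shown below; `list_wav_files` is a hypothetical helper, not part of opensoundscape.

```python
from pathlib import Path


def list_wav_files(folder):
    """Return the .wav files directly inside `folder`, sorted by name.

    Subfolders and files with other extensions are skipped. The extension
    check is case-insensitive, so "voice.WAV" is accepted too.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Each returned `Path` already includes the folder, so it can be handed to `Audio.from_file` directly, and `path.name` gives the bare file name for building the spectrogram file name.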

In [None]:
from PIL import Image
import os, numpy as np

folder = 'spectrogram'

read = lambda imname: np.asarray(Image.open(imname).convert("L"))

ims = [read(os.path.join(folder, filename)) for filename in os.listdir(folder)]
im_array = np.array(ims, dtype='uint8')
im_array = (im_array / 255.0)*2 - 1
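
The last line of the cell above rescales 8-bit pixel values from [0, 255] to [-1, 1]. The sketch below spells out that mapping and its inverse for a single pixel in plain Python; the function names are hypothetical and chosen only for this illustration.

```python
def to_model_range(pixel):
    """Map a uint8 pixel value in [0, 255] to a float in [-1, 1]."""
    return (pixel / 255.0) * 2 - 1


def to_pixel(value):
    """Inverse mapping: bring a value in [-1, 1] back to an integer pixel."""
    return round((value + 1) / 2 * 255)


# Black maps to -1, white maps to +1
print(to_model_range(0), to_model_range(255))
```

The inverse is handy when you want to view a normalized spectrogram as an image again.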

In [None]:
im_array_scammers = np.load('scammer_file') # scammer file will be added

### **First Time Running**

Run the code below the first time you train the model.

・ Define detector

In [None]:
def build_detector(image_size):
  i = Input(shape=image_size)
  x = Conv2D(64,(5,5), strides=4, padding = "same", activation=LeakyReLU(alpha=0.2))(i)
  x = Conv2D(128,(5,5), strides=4, padding = "same", activation=LeakyReLU(alpha=0.2))(x)
  x = Conv2D(128,(5,5), strides=4, padding = "same", activation=LeakyReLU(alpha=0.2))(x)
  x = Conv2D(256,(5,5), strides=4, padding = "same", activation=LeakyReLU(alpha=0.2))(x)
  x = Flatten()(x)
  x = Dropout(0.4)(x)
  x = Dense(1, activation="sigmoid")(x)

  model = Model(i,x)

  return model

・ Check the architecture of the model

In [None]:
detector_check = build_detector((214,214,1))
detector_check.summary()
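
The shapes printed by `summary()` can be cross-checked by hand. With `padding="same"`, a convolution with stride `s` produces `ceil(n / s)` outputs along each spatial axis. The sketch below applies that rule to the four stride-4 layers of the detector; the helper names are hypothetical.

```python
import math


def same_conv_output(size, stride):
    """Spatial output size of a convolution that uses padding="same"."""
    return math.ceil(size / stride)


def trace_sizes(size, strides):
    """Return the spatial size before and after each convolution in turn."""
    sizes = [size]
    for stride in strides:
        size = same_conv_output(size, stride)
        sizes.append(size)
    return sizes


# Four stride-4 convolutions applied to a 214 x 214 input
print(trace_sizes(214, [4, 4, 4, 4]))  # [214, 54, 14, 4, 1]
```

The last convolution therefore outputs a 1 x 1 map with 256 channels, so `Flatten` passes 256 features to the final `Dense` layer.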

・ Set up important variables

In [None]:
batch_size = 128
epochs = 10000

ones = np.ones(batch_size)
zeros = np.zeros(batch_size)

d_losses = []
d_accs = []

・ Actual Training

In [None]:
with strategy.scope():
  detector = build_detector((214,214,1))
  detector.compile(optimizer = Adam(0.0002, 0.5),
                      loss = "binary_crossentropy",
                      metrics = ["accuracy"])

  for epoch in range(epochs):

    idx = np.random.randint(0, im_array.shape[0], batch_size)
    real_imgs = im_array[idx]

    idx = np.random.randint(0, im_array_scammers.shape[0], batch_size)
    fake_imgs = im_array_scammers[idx]

    d_loss_real, d_acc_real = detector.train_on_batch(real_imgs, ones)
    d_loss_fake, d_acc_fake = detector.train_on_batch(fake_imgs, zeros)
    d_loss = (d_loss_real + d_loss_fake) / 2
    d_acc = (d_acc_real + d_acc_fake) / 2

    d_losses.append(d_loss)
    d_accs.append(d_acc)

    if epoch % 10 == 0:
      print(f"epoch: {epoch+1} / {epochs}, d_loss: {d_loss:.2f}, d_acc: {d_acc:.2f}")

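・ Save the model and weights (make sure to download them to your hard drive)

In [None]:
detector.save("detector_model.h5")
detector_weights = detector.get_weights()
# The weight arrays have different shapes, so store them as an object array
np.save('detector_weights.npy', np.array(detector_weights, dtype=object), allow_pickle=True)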

・ Run the cells below to plot the loss and accuracy of the model over time

In [None]:
plt.plot(d_losses, label = "d_losses")
plt.legend()

In [None]:
plt.plot(d_accs, label = "d_accs")
plt.legend()
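
Because each recorded value comes from a single random batch, the raw curves can be noisy. As an optional aid, the sketch below smooths a list with a trailing moving average; `moving_average` and its default window of 50 are assumptions made for this illustration, not part of the original training code.

```python
def moving_average(values, window=50):
    """Average each value with up to `window - 1` values that precede it.

    The first few outputs average over fewer values, so the result has the
    same length as the input.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

For example, `plt.plot(moving_average(d_losses), label="d_losses (smoothed)")` draws the smoothed loss on the same axes as the raw one.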

### **Additional Training of your model**

When you already have a trained model and weights and want to train the model further, run the code below.

・ Load your model and weights. (If you saved them under different names, reflect the changes here too.)

In [None]:
detector_loaded = tf.keras.models.load_model("detector_model.h5")
detector_weights = np.load("detector_weights.npy", allow_pickle = True)

detector_loaded.set_weights(detector_weights)

・ Training

In [None]:
batch_size = 128
epochs = 10000

ones = np.ones(batch_size)
zeros = np.zeros(batch_size)

d_losses = []
d_accs = []

In [None]:
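with strategy.scope():
  # Rebuild the architecture inside the TPU scope, then restore the saved weights
  # so that training resumes from the earlier session instead of starting over
  detector = build_detector((214,214,1))
  detector.set_weights(detector_weights)
  detector.compile(optimizer = Adam(0.0002, 0.5),
                      loss = "binary_crossentropy",
                      metrics = ["accuracy"])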

  for epoch in range(epochs):

    idx = np.random.randint(0, im_array.shape[0], batch_size)
    real_imgs = im_array[idx]

    idx = np.random.randint(0, im_array_scammers.shape[0], batch_size)
    fake_imgs = im_array_scammers[idx]

    d_loss_real, d_acc_real = detector.train_on_batch(real_imgs, ones)
    d_loss_fake, d_acc_fake = detector.train_on_batch(fake_imgs, zeros)
    d_loss = (d_loss_real + d_loss_fake) / 2
    d_acc = (d_acc_real + d_acc_fake) / 2

    d_losses.append(d_loss)
    d_accs.append(d_acc)

    if epoch % 10 == 0:
      print(f"epoch: {epoch+1} / {epochs}, d_loss: {d_loss:.2f}, d_acc: {d_acc:.2f}")

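・ Save the model and weights (make sure to download them to your hard drive)

In [None]:
detector.save("detector_model.h5")
detector_weights = detector.get_weights()
# The weight arrays have different shapes, so store them as an object array
np.save('detector_weights.npy', np.array(detector_weights, dtype=object), allow_pickle=True)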

・ Run the cells below to plot the loss and accuracy of the model over time

In [None]:
plt.plot(d_losses, label = "d_losses")
plt.legend()

In [None]:
plt.plot(d_accs, label = "d_accs")
plt.legend()

## **TF Lite file for mobile app**

・ If you have saved the trained model and have not run additional training in this session, run the cell below

In [None]:
detector_loaded = tf.keras.models.load_model("detector_model.h5")
detector_weights = np.load("detector_weights.npy", allow_pickle = True)

detector_loaded.set_weights(detector_weights)

・ The code below converts the model to a .tflite file, which can be used in mobile app development

In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(detector_loaded) ## change the name according to the current name of the detector

tflite_model = converter.convert()

with open("detector_model.tflite", "wb") as f:
  f.write(tflite_model)
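
In the app, the model's single sigmoid output still has to be turned into a decision. During training the `im_array` batches were labeled with ones and the scammer batches with zeros, so a score near 1 points to a voice from `im_array`, taken here to be a family member. The sketch below shows one possible thresholding step; `label_from_score`, the label strings, and the 0.5 threshold are assumptions for illustration and should be tuned on held-out recordings.

```python
def label_from_score(score, threshold=0.5):
    """Turn the detector's sigmoid output into a label.

    Scores at or above `threshold` are treated as "family" (label 1 during
    training), anything lower as "scammer" (label 0 during training).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    return "family" if score >= threshold else "scammer"


print(label_from_score(0.93))
print(label_from_score(0.08))
```

Raising the threshold makes the app flag more calls as scams, at the cost of more false alarms on genuine family calls.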