1D CNN based classifier for Speech Commands Dataset

vj-1988/AudioNet-V1

AudioNet-V1

AudioNet is a simple convolutional neural network built on 1-D convolutions. It is trained and tested on Google's Speech Commands Dataset.

Requirements

Tested with the following setup:

Software

  1. Python 3.5
  2. NumPy
  3. SciPy
  4. Keras 2.0.8
  5. TensorFlow 1.4.1
  6. scikit-learn

Hardware

  1. GTX 1050 Ti (4 GB)

One Dimensional CNN

Here, 1-D convolutions (linear convolutions) are placed in front of regular hidden layers to classify the speech signal. The dataset used is Google's Speech Commands Dataset.

The network has five 1-D convolutional layers, each with a kernel size of 32 and a stride of 4. They are followed by four hidden layers with 512 neurons each. The network has approximately 10 million parameters in total.
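
To get a feel for how quickly five stride-4 convolutions shrink the signal, the sketch below computes the sequence length after each layer. It is only an illustration: the input length of 16000 samples (one second at 16 kHz) and the padding mode are assumptions, since neither is stated here, and the helper names are hypothetical.

```python
def conv1d_output_length(input_length, kernel_size=32, stride=4, padding="valid"):
    """Length of a 1-D convolution output along the time axis."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_length - kernel_size) // stride + 1
    if padding == "same":
        # Padded input: output length is ceil(input_length / stride).
        return -(-input_length // stride)
    raise ValueError("padding must be 'valid' or 'same'")


def stacked_lengths(input_length, num_layers=5, **conv_kwargs):
    """Sequence length before the first layer and after each conv layer."""
    lengths = [input_length]
    for _ in range(num_layers):
        lengths.append(conv1d_output_length(lengths[-1], **conv_kwargs))
    return lengths


# Assumed input: 16000 samples (1 s of audio at 16 kHz).
print(stacked_lengths(16000))                  # [16000, 3993, 991, 240, 53, 6]
print(stacked_lengths(16000, padding="same"))  # [16000, 4000, 1000, 250, 63, 16]
```

Under either assumption the time axis is reduced by roughly a factor of 1000 before the hidden layers, which keeps the first fully connected layer at a manageable size.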

Data Augmentation used

  1. Random noise
  2. Random shift
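
The two augmentations above could be sketched roughly as follows. This is not taken from AudioNet32.py: the function names, the Gaussian noise model, the zero-padded shift, and the default magnitudes are all assumptions chosen for illustration.

```python
import numpy as np


def add_random_noise(signal, noise_level=0.005, rng=None):
    """Add zero-mean Gaussian noise (noise_level is an assumed default)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, 1.0, size=signal.shape)
    return signal + noise_level * noise


def random_shift(signal, max_shift=1600, rng=None):
    """Shift the signal left or right by a random offset, padding with zeros.

    max_shift is an assumed default and must be smaller than len(signal).
    """
    rng = np.random.default_rng() if rng is None else rng
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]   # delay: zeros at the start
    elif shift < 0:
        shifted[:shift] = signal[-shift:]   # advance: zeros at the end
    else:
        shifted[:] = signal
    return shifted


# Example on a synthetic one-second clip (16000 samples assumed).
rng = np.random.default_rng(0)
clip = np.sin(np.linspace(0.0, 100.0, 16000))
augmented = random_shift(add_random_noise(clip, rng=rng), rng=rng)
```

Both functions keep the clip length unchanged, so augmented samples can be fed to the network without any further resizing.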

Training Loss vs Epochs

Training Accuracy vs Epochs

Training

The dataset must be organized into subfolders, one per class, with each folder name serving as the class label. The script AudioNet32.py needs the following inputs to train:

  1. data_path : root folder of the dataset
  2. train_ratio : fraction of files used for training; the remainder is used for validation
  3. batch_size : minibatch size for training
  4. num_epochs : total number of epochs
  5. dst : destination folder for saved weights and logs
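
As a rough illustration of how data_path and train_ratio might be used, the sketch below scans a folder-per-class dataset and splits the files into training and validation sets. The helper names, the .wav extension filter, and the seeded shuffle are assumptions, not the script's actual code.

```python
import os
import random


def list_labeled_files(data_path, extension=".wav"):
    """Return (paths, labels, synset), using each subfolder name as a class label."""
    synset = sorted(
        name for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name))
    )
    paths, labels = [], []
    for class_index, class_name in enumerate(synset):
        class_dir = os.path.join(data_path, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(extension):
                paths.append(os.path.join(class_dir, file_name))
                labels.append(class_index)
    return paths, labels, synset


def split_train_val(paths, labels, train_ratio=0.8, seed=0):
    """Shuffle the files and keep the first train_ratio fraction for training."""
    indices = list(range(len(paths)))
    random.Random(seed).shuffle(indices)
    num_train = int(len(indices) * train_ratio)
    train = [(paths[i], labels[i]) for i in indices[:num_train]]
    val = [(paths[i], labels[i]) for i in indices[num_train:]]
    return train, val
```

With train_ratio set to 0.8, for instance, 80% of the files go to training and the remaining 20% to validation.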

The script generates a pickle file that contains the synset needed for validation, along with the training and validation file paths and labels. This file can be used to resume training with the resume_training() function.
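
A pickle file of this kind could be written and read back along the following lines. The dictionary keys and function names here are hypothetical and may not match what the script actually stores, so inspect the generated file before relying on any particular layout.

```python
import pickle


def save_training_state(pickle_path, synset, train_files, train_labels,
                        val_files, val_labels):
    """Store the class list and the train/validation split in one pickle file."""
    state = {
        # All key names below are assumptions made for this sketch.
        "synset": synset,
        "train_files": train_files,
        "train_labels": train_labels,
        "val_files": val_files,
        "val_labels": val_labels,
    }
    with open(pickle_path, "wb") as handle:
        pickle.dump(state, handle)


def load_training_state(pickle_path):
    """Read the stored split back, for example before resuming training."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)
```

Saving the split alongside the synset means a resumed run sees exactly the same training and validation files as the original run.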

The script saves weights once every 2 epochs.

Validation

The synset used for training is available in the train_data_dic.pkl file. The pretrained weights are available at the following link:

Download pretrained weights (Epoch 10)
