zvikinoza/MASR

Mini Automatic Speech Recognition

Using Fourier transforms and a VGG-like CNN architecture
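As a rough illustration of the Fourier-transform step, the sketch below turns a 1-D waveform into a log-magnitude spectrogram that a CNN could consume as an image. The function name `log_spectrogram`, the frame length, the hop size, and the Hann window are assumptions made for this example; the repository's actual preprocessing may differ.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    Hypothetical helper: frame_len and hop are illustrative defaults.
    The signal must contain at least frame_len samples.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real-input FFT of each frame gives frame_len // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(frames, axis=1)
    # log1p compresses the dynamic range; transpose to (freq, time).
    return np.log1p(np.abs(spectrum)).T

# Example: one second of a 1 kHz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
```

The result is a 2-D array with frequency on the first axis and time on the second, which is the layout a VGG-like network would treat as a single-channel image.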

EDA and modeling on ≈150 audio samples of the spoken numbers 1 to 5, recorded by 10 people.

Accuracy on the large (>10k) dataset: 94%.
Accuracy on the given small dataset: 93%.
The task was somewhat challenging because of the small dataset.

Augmentation techniques:

  • increase/decrease pitch
  • increase/decrease speed
  • stretching
  • frequency and time masking
  • white noise injection
  • time shifting
  • overlay 2 samples (quiet and louder)
  • pre/post noise padding
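A few of the techniques listed above can be sketched with NumPy alone. This is a minimal illustration, not the repository's code: the function names, the default noise levels, and the mask widths are all assumptions chosen for the example. Pitch, speed, and stretching changes usually rely on an audio library and are left out here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is repeatable

def add_white_noise(signal, noise_level=0.005):
    """White noise injection: add Gaussian noise scaled by noise_level (assumed value)."""
    return signal + noise_level * rng.standard_normal(len(signal))

def time_shift(signal, max_shift):
    """Time shifting: move the waveform left or right, filling the gap with zeros."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted

def mask_spectrogram(spec, max_freq_width=8, max_time_width=8):
    """Frequency and time masking: zero one random band on each axis of a (freq, time) array."""
    masked = spec.copy()
    n_freq, n_time = masked.shape
    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    masked[f_start:f_start + f_width, :] = 0
    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    masked[:, t_start:t_start + t_width] = 0
    return masked

def pad_with_noise(signal, n_pre, n_post, noise_level=0.005):
    """Pre/post noise padding: surround the clip with low-level noise."""
    pre = noise_level * rng.standard_normal(n_pre)
    post = noise_level * rng.standard_normal(n_post)
    return np.concatenate([pre, signal, post])

# Example usage on a dummy waveform and a dummy spectrogram.
wave = np.ones(100)
noisy = add_white_noise(wave)
shifted = time_shift(wave, max_shift=10)
masked = mask_spectrogram(np.ones((20, 30)))
padded = pad_with_noise(wave, n_pre=5, n_post=7)
```

Waveform-level augmentations (noise, shift, padding) would be applied before computing the spectrogram, while masking operates on the spectrogram itself.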

The data is split by speaker: 5 speakers on each side of the split.
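Splitting by speaker keeps every recording from a given person on one side of the split, so the model is evaluated on voices it has not heard. The sketch below assumes each sample is a dict with a `speaker` key and that the two halves serve as train and test sets; both the data layout and the helper name `split_by_speaker` are hypothetical.

```python
import random

def split_by_speaker(samples, n_train_speakers=5, seed=0):
    """Split samples so that no speaker appears in both halves.

    Assumes each sample is a dict with a "speaker" key (illustrative layout).
    """
    speakers = sorted({s["speaker"] for s in samples})
    random.Random(seed).shuffle(speakers)
    train_speakers = set(speakers[:n_train_speakers])
    train = [s for s in samples if s["speaker"] in train_speakers]
    test = [s for s in samples if s["speaker"] not in train_speakers]
    return train, test

# Example: 10 speakers, each saying the digits 1 to 5 once.
samples = [{"speaker": sp, "digit": d} for sp in range(10) for d in range(1, 6)]
train, test = split_by_speaker(samples)
```

A random per-sample split would let the same voice land in both sets, which tends to inflate accuracy on a dataset this small.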

Training graph: