Mini Automatic Speech Recognition

Using Fourier transforms and a VGG-like CNN architecture

EDA and modeling on ≈150 recordings of the spoken numbers from 1 to 5, recorded by 10 people.
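
To illustrate the Fourier-transform front end, here is a minimal sketch of turning a waveform into a log-magnitude spectrogram that a CNN could consume as an image. It is not the notebook's code: the function name `log_spectrogram`, the frame length, the hop size, the Hann window and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    Hypothetical helper: frame_len, hop and the Hann window are
    illustrative defaults, not values taken from the project.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Pad short clips so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame gives frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).T


# Assumed input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
spec = log_spectrogram(tone)
print(spec.shape)
```

Each column of `spec` is the spectrum of one short frame, so the 2D array can be treated like a single-channel image by a convolutional network.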

Accuracy on large (>10k) dataset: 94%.
Accuracy on given small dataset: 93%.
The task was fairly challenging because of the small dataset.

Augmentation techniques:

increase/decrease pitch
increase/decrease speed
stretching
frequency and time masking
white noise injection
time shifting
overlay 2 samples (quiet and louder)
pre/post noise padding
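
Several of the techniques listed above can be sketched in plain NumPy. This is an illustrative sketch, not the notebook's implementation: every function name, default value and the fixed random seed are assumptions. Pitch shifting and time stretching are left out because they usually rely on a dedicated audio library (for example `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`); the speed change below is a simple resampling that also alters pitch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only to make the sketch repeatable


def add_white_noise(wave, noise_level=0.005):
    """White noise injection: add Gaussian noise scaled by noise_level."""
    return wave + noise_level * rng.standard_normal(len(wave))


def time_shift(wave, max_shift):
    """Time shifting: roll the waveform circularly by a random offset."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(wave, shift)


def overlay(loud, quiet, quiet_gain=0.2):
    """Overlay two samples: mix an attenuated clip under a louder one."""
    n = min(len(loud), len(quiet))
    return loud[:n] + quiet_gain * quiet[:n]


def pad_with_noise(wave, pre, post, noise_level=0.005):
    """Pre/post noise padding: surround the clip with low-level noise."""
    before = noise_level * rng.standard_normal(pre)
    after = noise_level * rng.standard_normal(post)
    return np.concatenate([before, wave, after])


def change_speed(wave, rate):
    """Speed change by linear-interpolation resampling (rate > 1 is faster)."""
    n_out = max(1, int(round(len(wave) / rate)))
    positions = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(positions, np.arange(len(wave)), wave)


def mask_spectrogram(spec, max_freq_width=8, max_time_width=8):
    """Frequency and time masking: zero one random band along each axis."""
    spec = spec.copy()
    n_freq, n_time = spec.shape
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    spec[f_start:f_start + f_width, :] = 0.0
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    spec[:, t_start:t_start + t_width] = 0.0
    return spec


# Assumed input: a 1000-sample sine wave standing in for a recording.
wave = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
noisy = add_white_noise(wave)
shifted = time_shift(wave, max_shift=100)
mixed = overlay(wave, noisy)
padded = pad_with_noise(wave, pre=50, post=150)
faster = change_speed(wave, rate=1.25)
masked = mask_spectrogram(np.ones((64, 40)))
```

With a dataset this small, such transforms would typically be applied randomly at training time so that each epoch sees slightly different versions of the same clips.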

The data is split by speaker, 5 and 5, so that no speaker's recordings appear on both sides of the split.
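
A speaker-level split can be sketched as follows. The record layout `(speaker_id, label, path)`, the file names, the three takes per digit and the function name `split_by_speaker` are hypothetical and chosen only to match the rough dataset size; the point is that the split is made over speaker IDs rather than over individual clips.

```python
import random


def split_by_speaker(samples, n_train_speakers, seed=0):
    """Split (speaker_id, label, path) records so no speaker is in both sets."""
    speakers = sorted({s[0] for s in samples})
    # Shuffle speaker IDs, not clips, so each speaker lands on one side only.
    random.Random(seed).shuffle(speakers)
    train_speakers = set(speakers[:n_train_speakers])
    train = [s for s in samples if s[0] in train_speakers]
    held_out = [s for s in samples if s[0] not in train_speakers]
    return train, held_out


# Hypothetical metadata: 10 speakers, digits 1 to 5, 3 takes each (150 clips).
samples = [
    (f"speaker_{sp:02d}", digit, f"speaker_{sp:02d}_{digit}_{take}.wav")
    for sp in range(10)
    for digit in range(1, 6)
    for take in range(3)
]
train, held_out = split_by_speaker(samples, n_train_speakers=5)
print(len(train), len(held_out))
```

Splitting this way measures how well the model generalizes to voices it has never heard, which a random clip-level split would overstate.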

Training graph:

zvikinoza/MASR
