We will be using `fastai2`, which is still under active development. I am using a pip editable install, as explained in the [fastai2 repository](https://github.com/fastai/fastai2). This gives me a bit more control and also lets me navigate through the code.

Alternatively, you can install `fastai2` directly from GitHub: `pip install git+https://github.com/fastai/fastai2`

When working on this notebook, I was at this version (hash in git commits) of the library: `3f27f55bb298591aef8f2f72e32615317cc2f77e`

In [1]:
from fastai2.test import *
from fastai2.basics import *
from fastai2.callback.all import *
from fastai2.vision.all import *

from pathlib import Path
import pandas as pd
import numpy as np
import librosa

The first order of business will be to read in our data.

In [2]:
trn_df = pd.read_csv('data/train.csv')
trn_paths = list(Path('data/audio_train/').iterdir())

In [3]:
trn_paths[:4]

[PosixPath('data/audio_train/5fca79da.wav'),
 PosixPath('data/audio_train/5f3f655e.wav'),
 PosixPath('data/audio_train/8c0e42bb.wav'),
 PosixPath('data/audio_train/a9bd898d.wav')]

Our first model will be a simple CNN on the first two seconds of audio, downsampled to 16 kHz.
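At 16 kHz, two seconds of audio is 2 × 16000 = 32000 samples. Here is a minimal sketch of how a clip could be brought to exactly that length. The helper name `first_seconds` and the choice to zero-pad clips shorter than two seconds are my assumptions for illustration, not something the rest of this notebook depends on.

```python
import numpy as np

SAMPLE_RATE = 16_000   # target sample rate used in this notebook
CLIP_SECONDS = 2       # length of the excerpt the model will see

def first_seconds(x, sr=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Return exactly sr * seconds samples: crop long clips, zero-pad short ones."""
    n = sr * seconds
    x = np.asarray(x, dtype=np.float32)[:n]   # crop (no-op for short clips)
    return np.pad(x, (0, n - len(x)))         # pad width is 0 for long clips

# Synthetic stand-ins for a 5 s clip and a 0.5 s clip
long_clip = np.ones(5 * SAMPLE_RATE, dtype=np.float32)
short_clip = np.ones(SAMPLE_RATE // 2, dtype=np.float32)
print(first_seconds(long_clip).shape, first_seconds(short_clip).shape)
```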

We'll load the data with the mid-level `DataSource` API. First we define how the sounds should be read in; constructing the labels comes later.

We could resample the audio files on the fly, but that is quite an expensive operation.

In [4]:
path = trn_paths[0]

In [5]:
%time librosa.core.load(path, sr=16000)

CPU times: user 733 ms, sys: 19.8 ms, total: 753 ms
Wall time: 752 ms


(array([-4.5412817e-06, -4.5701829e-05, -1.4590675e-05, ...,
        -1.3681429e-05, -4.5426914e-05, -5.1247978e-05], dtype=float32), 16000)

In [6]:
%time librosa.core.load(path, sr=None)

CPU times: user 0 ns, sys: 3.81 ms, total: 3.81 ms
Wall time: 3.12 ms


(array([ 0.0000000e+00, -3.0517578e-05, -3.0517578e-05, ...,
        -3.0517578e-05, -6.1035156e-05, -3.0517578e-05], dtype=float32), 44100)

Reading a file without resampling is much, much quicker.
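To put the two timings side by side, here is a rough back-of-the-envelope calculation. It assumes every file costs about as much as this one, which is certainly not exact since clips differ in length; the count of 9473 training files is the one reported by the `parallel` run further down.

```python
# Wall times copied from the two %time cells above (one file, one machine)
resampled_ms = 752    # load + resample to 16 kHz
native_ms = 3.12      # load at the native 44.1 kHz
n_train_files = 9473  # size of the training set

slowdown = resampled_ms / native_ms
# Cost of resampling every training file once, one after another
serial_minutes = n_train_files * resampled_ms / 1000 / 60

print(f"resampling is ~{slowdown:.0f}x slower per file")
print(f"~{serial_minutes:.0f} min per pass over the training set if done serially")
```

Under these assumptions, resampling on the fly would add on the order of two hours of CPU work to every pass over the data.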

Let's resample the files beforehand and save them to disk.

In [7]:
!rm -rf data/audio_train_16k
!rm -rf data/audio_test_16k

In [8]:
!mkdir data/audio_train_16k
!mkdir data/audio_test_16k
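The two shell cells above could also be written in plain Python, which helps if the notebook has to run where `rm` and `mkdir` are not available. This is only a sketch: `fresh_dir` is a hypothetical helper of mine, and it is demonstrated on a temporary directory rather than on `data/`.

```python
import shutil
import tempfile
from pathlib import Path

def fresh_dir(path):
    """Delete `path` if it exists, then recreate it empty (like rm -rf + mkdir)."""
    path = Path(path)
    shutil.rmtree(path, ignore_errors=True)  # no error if the directory is missing
    path.mkdir(parents=True)                 # unlike plain mkdir, also creates parents
    return path

# Demonstration in a throwaway location
with tempfile.TemporaryDirectory() as tmp:
    out = fresh_dir(Path(tmp) / 'audio_train_16k')
    (out / 'stale.wav').touch()
    out = fresh_dir(out)                     # second call wipes the stale file
    leftover = list(out.iterdir())
print(leftover)
```

In this notebook the calls would be `fresh_dir('data/audio_train_16k')` and `fresh_dir('data/audio_test_16k')`.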

In [9]:
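def resample(path, target_sr, output_dir):
    # Load and resample in one step; librosa returns (samples, sample_rate).
    x, sr = librosa.core.load(path, sr=target_sr)
    # norm=True scales the samples to the range [-1, 1] before writing.
    # Note: librosa.output was removed in librosa 0.8.0; newer versions need soundfile.write instead.
    librosa.output.write_wav(f'{output_dir}/{path.name}', x, sr, norm=True)

def resample_train(path): resample(path, 16000, 'data/audio_train_16k')
def resample_test(path): resample(path, 16000, 'data/audio_test_16k')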

In [10]:
%time parallel(resample_train, trn_paths)

CPU times: user 6.12 s, sys: 658 ms, total: 6.78 s
Wall time: 7min 30s


(#9473) [None,None,None,None,None,None,None,None,None,None...]
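Since this step takes several minutes, it can be worth making it resumable. Below is a minimal sketch, assuming the output files keep the input file names (as `resample` does above); `pending` is a hypothetical helper, not part of fastai2 or librosa.

```python
import tempfile
from pathlib import Path

def pending(paths, output_dir):
    """Return the input paths that have no same-named file in `output_dir` yet."""
    output_dir = Path(output_dir)
    done = {p.name for p in output_dir.iterdir()} if output_dir.exists() else set()
    return [p for p in paths if Path(p).name not in done]

# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / 'src', Path(tmp) / 'dst'
    src.mkdir()
    dst.mkdir()
    for name in ['a.wav', 'b.wav', 'c.wav']:
        (src / name).touch()
    (dst / 'b.wav').touch()  # pretend b.wav was already resampled
    todo = sorted(p.name for p in pending(src.iterdir(), dst))
print(todo)
```

Here that would look like `parallel(resample_train, pending(trn_paths, 'data/audio_train_16k'))`, provided the cell that wipes the output directories is skipped. One caveat: a file left half-written by an interrupted run would count as done.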

Three files in the test set are corrupt; let's remove them before proceeding.

In [15]:
for fn in ['0b0427e2.wav', '6ea0099f.wav', 'b39975f5.wav']:
    !rm data/audio_test/{fn} 
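I have hard-coded the three file names above. If you would rather look for unreadable files yourself, one possible check is sketched below using the standard library's `wave` module. This is an assumption about what "corrupt" means (the file cannot be opened, or contains no frames); `wave` only understands uncompressed PCM, so files in other encodings would be flagged too. Both function names are hypothetical.

```python
import wave
from pathlib import Path

def is_readable_wav(path):
    """True if `wave` can open the file and it holds at least one frame."""
    try:
        with wave.open(str(path), 'rb') as f:
            return f.getnframes() > 0
    except (wave.Error, EOFError):  # bad header, or file ends too early
        return False

def find_unreadable(directory):
    """Names of the .wav files in `directory` that fail the check, sorted."""
    return sorted(p.name for p in Path(directory).glob('*.wav')
                  if not is_readable_wav(p))
```

Calling `find_unreadable('data/audio_test')` would then list the candidates for removal.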

In [17]:
tst_paths = list(Path('data/audio_test/').iterdir())

In [18]:
%time parallel(resample_test, tst_paths)

CPU times: user 6.17 s, sys: 823 ms, total: 6.99 s
Wall time: 6min 38s


(#9397) [None,None,None,None,None,None,None,None,None,None...]