### Zebra finch custom parsing
- A labelled (but smaller) dataset of zebra finch vocalizations
    - .WAV files with labels
- This notebook creates a JSON corresponding to each WAV file (and Noise file where available).
- Dataset origin:
    - https://drive.google.com/drive/folders/1etvuPjaNHV9oFPgUAuLxP3bk1aWfj3Pl
    - https://www.nature.com/articles/s41467-018-06394-9
    - https://www.ncbi.nlm.nih.gov/pubmed/26581377

In [1]:
from avgn.utils.general import prepare_env

In [2]:
prepare_env()

env: CUDA_VISIBLE_DEVICES=GPU


### Import relevant packages

In [3]:
from joblib import Parallel, delayed
from tqdm.autonotebook import tqdm
import pandas as pd
pd.options.display.max_columns = None
import librosa
from datetime import datetime
import numpy as np



In [4]:
import avgn
from avgn.custom_parsing.zebra_finch_theunisson import generate_json, parse_wavlist
from avgn.utils.paths import DATA_DIR

### Load data in original format

In [5]:
DATASET_ID = 'zebra_finch_theunisson'

In [6]:
# create a unique datetime identifier for the files output by this notebook
DT_ID = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
DT_ID

'2019-06-26_13-24-38'

In [7]:
DSLOC = avgn.utils.paths.Path('/mnt/cube/Datasets/ZebraFinch/VocalizationBank/ZebraFinchVocalizationBank/')
DSLOC

PosixPath('/mnt/cube/Datasets/ZebraFinch/VocalizationBank/ZebraFinchVocalizationBank')

In [8]:
WAVLIST = list((DSLOC).expanduser().glob('*/*.wav'))
len(WAVLIST), WAVLIST[0]

(3433,
 PosixPath('/mnt/cube/Datasets/ZebraFinch/VocalizationBank/ZebraFinchVocalizationBank/AdultVocalizations/BluRas07dd_110607-TukC-15.wav'))

In [9]:
wav_df = parse_wavlist(WAVLIST)

In [10]:
print(len(wav_df))
wav_df[:3]

3376


| | indv | age | recordingdate | vocalization_type | voc_type_full | voc_num | wav_loc |
|---|---|---|---|---|---|---|---|
| 0 | BluRas07dd | AdultVocalizations | 2011-06-07 | Tu | TukC | 15 | /mnt/cube/Datasets/ZebraFinch/VocalizationBank... |
| 1 | GraGra0201 | AdultVocalizations | 2011-09-07 | Th | ThuckC | 47 | /mnt/cube/Datasets/ZebraFinch/VocalizationBank... |
| 2 | BlaLbl8026 | AdultVocalizations | 2011-06-09 | Ne | NestC | 29 | /mnt/cube/Datasets/ZebraFinch/VocalizationBank... |
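
The columns above appear to be derived from each file's path: the first WAV file, `AdultVocalizations/BluRas07dd_110607-TukC-15.wav`, yields `indv=BluRas07dd`, `recordingdate=2011-06-07`, `voc_type_full=TukC` and `voc_num=15`. The following is a minimal sketch of that mapping, not the actual `parse_wavlist` implementation. The function name `parse_wav_name`, the `%y%m%d` date format, and taking the first two characters of the call type as `vocalization_type` are all assumptions inferred from the rows shown.

```python
from datetime import datetime
from pathlib import Path


def parse_wav_name(wav_loc):
    """Split a path like .../AdultVocalizations/BluRas07dd_110607-TukC-15.wav
    into the fields shown in the dataframe (hypothetical helper)."""
    wav_loc = Path(wav_loc)
    # "BluRas07dd_110607-TukC-15" -> "BluRas07dd", "110607-TukC-15"
    indv, rest = wav_loc.stem.split("_", 1)
    # "110607-TukC-15" -> "110607", "TukC", "15"
    date_str, voc_type_full, voc_num = rest.split("-")
    return {
        "indv": indv,
        "age": wav_loc.parent.name,  # the subfolder name, e.g. AdultVocalizations
        "recordingdate": datetime.strptime(date_str, "%y%m%d").date(),  # assumed format
        "vocalization_type": voc_type_full[:2],  # assumed: first two characters
        "voc_type_full": voc_type_full,
        "voc_num": int(voc_num),
        "wav_loc": wav_loc,
    }


row = parse_wav_name(
    "/mnt/cube/Datasets/ZebraFinch/VocalizationBank/ZebraFinchVocalizationBank/"
    "AdultVocalizations/BluRas07dd_110607-TukC-15.wav"
)
print(row["indv"], row["recordingdate"], row["vocalization_type"], row["voc_num"])
```

In this sketch, a file name that does not follow the pattern raises a `ValueError`; how the real parser treats such files is not shown in this notebook.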


### Parse into JSON

In [11]:
with Parallel(n_jobs=-1, verbose=10) as parallel:
    parallel(
        delayed(generate_json)(
            row, DT_ID
        )
        for idx, row in tqdm(wav_df.iterrows(), total = len(wav_df))
    )

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 24 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    5.1s
[Parallel(n_jobs=-1)]: Done  13 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:    5.3s
[Parallel(n_jobs=-1)]: Done  50 tasks      | elapsed:    5.3s
[Parallel(n_jobs=-1)]: Done  65 tasks      | elapsed:    5.3s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.1866s.) Setting batch_size=2.
[Parallel(n_jobs=-1)]: Done  80 tasks      | elapsed:    5.3s
[Parallel(n_jobs=-1)]: Done  97 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 114 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0654s.) Setting batch_size=12.
[Parallel(n_jobs=-1)]: Done 148 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 186 tasks      | elapsed:    5.5s
[Parallel(n_jobs=-1)]: Done 298 tasks      | elapsed:    5.6s
...
[Parallel(n_jobs=-1)]: Done 3376 out of 3376 | elapsed:    8.6s finished
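
The body of `generate_json` is not shown in this notebook. As a rough illustration of what a per-row call could do, the sketch below writes one JSON file per WAV file, grouped under the datetime identifier. The function name `write_record_json`, the `out_root` argument, the `<out_root>/<DT_ID>/JSON/` layout and the choice of fields are all hypothetical; the real function may store different metadata (for example sample rate or duration) in a different location.

```python
import json
import tempfile
from pathlib import Path


def write_record_json(row, dt_id, out_root):
    """Write one JSON file describing a single WAV recording.

    Hypothetical stand-in for generate_json; `row` can be a dict or a
    pandas Series with the columns of wav_df.
    """
    record = {
        "indv": row["indv"],
        "age": row["age"],
        "recordingdate": str(row["recordingdate"]),
        "vocalization_type": row["vocalization_type"],
        "voc_type_full": row["voc_type_full"],
        "voc_num": int(row["voc_num"]),
        "wav_loc": str(row["wav_loc"]),
    }
    # Assumed layout: <out_root>/<DT_ID>/JSON/<wav stem>.json
    out_dir = Path(out_root) / dt_id / "JSON"
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / (Path(row["wav_loc"]).stem + ".json")
    json_path.write_text(json.dumps(record, indent=2))
    return json_path


example_row = {
    "indv": "BluRas07dd",
    "age": "AdultVocalizations",
    "recordingdate": "2011-06-07",
    "vocalization_type": "Tu",
    "voc_type_full": "TukC",
    "voc_num": 15,
    "wav_loc": "/mnt/cube/Datasets/ZebraFinch/VocalizationBank/ZebraFinchVocalizationBank/"
    "AdultVocalizations/BluRas07dd_110607-TukC-15.wav",
}
# Write into a temporary directory so the example has no side effects elsewhere
json_path = write_record_json(example_row, "2019-06-26_13-24-38", tempfile.mkdtemp())
print(json_path.name)  # BluRas07dd_110607-TukC-15.json
```

Because each row is written to its own file, the calls are independent of one another, which is what makes the `Parallel`/`delayed` pattern above a natural fit.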
