### Common spadefoot custom parsing
- This dataset consists of around 550 relatively well-parsed frog vocalizations recorded by Leonie ten Hagen. Most WAVs contain only one vocalization; some contain more than one. The recordings come from a paper by Leonie ten Hagen and consist of:
    - A CSV with information about where each song was recorded, plus age, sex, and call type
    - WAV files of the vocalizations
- This notebook creates a JSON corresponding to each WAV file (and Noise file where available).
- Dataset origin:
    - http://www.tierstimmenarchiv.de/webinterface/contents/treebrowser.php
    - https://link.springer.com/article/10.1007/s00114-016-1401-0

In [1]:
from avgn.utils.general import prepare_env

In [2]:
prepare_env()

env: CUDA_VISIBLE_DEVICES=GPU


### Import relevant packages

In [3]:
from joblib import Parallel, delayed
from tqdm.autonotebook import tqdm
import pandas as pd
pd.options.display.max_columns = None
import librosa
from datetime import datetime
import numpy as np



In [4]:
import avgn
from avgn.custom_parsing.hagen_common_spadefoot import generate_wav_json
from avgn.utils.paths import DATA_DIR

### Load data in original format

In [5]:
DATASET_ID = 'hagen_common_spadefoot'

In [6]:
# create a unique datetime identifier for the files output by this notebook
DT_ID = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
DT_ID

'2019-06-24_09-01-51'
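
The identifier format above sorts chronologically when compared as plain strings and avoids characters that are awkward in file names. The following is a minimal sketch of that property; `make_dt_id` and `parse_dt_id` are hypothetical helper names introduced here for illustration and are not part of `avgn`.

```python
from datetime import datetime

# Same pattern as the DT_ID cell above: zero-padded, most significant field first.
DT_FORMAT = "%Y-%m-%d_%H-%M-%S"


def make_dt_id(moment: datetime) -> str:
    """Format a datetime as a filesystem-safe identifier (hypothetical helper)."""
    return moment.strftime(DT_FORMAT)


def parse_dt_id(dt_id: str) -> datetime:
    """Recover the datetime from an identifier (hypothetical helper)."""
    return datetime.strptime(dt_id, DT_FORMAT)


earlier = make_dt_id(datetime(2019, 6, 24, 9, 1, 51))
later = make_dt_id(datetime(2019, 11, 3, 17, 30, 0))

print(earlier)          # 2019-06-24_09-01-51
print(earlier < later)  # True: string order matches chronological order
```

Because every field is zero-padded, listing output files alphabetically also lists them in the order the notebook runs produced them.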

In [7]:
DSLOC = avgn.utils.paths.Path('/mnt/cube/Datasets/animalsoundarchive/pelobates_fuscus/')
DSLOC

PosixPath('/mnt/cube/Datasets/animalsoundarchive/pelobates_fuscus')

In [8]:
fg_list = list(DSLOC.glob('*.mp3'))
len(fg_list), np.sort(fg_list)[-2:]

(560,
 array([PosixPath('/mnt/cube/Datasets/animalsoundarchive/pelobates_fuscus/Pelobates_fuscus_juvenil_LtH_0526_short.mp3'),
        PosixPath('/mnt/cube/Datasets/animalsoundarchive/pelobates_fuscus/Pelobates_fuscus_juvenil_LtH_0527_short.mp3')],
       dtype=object))

In [9]:
vocalization_lib = pd.read_csv(DSLOC.parent / 'recording_df.csv')
# keep only the common spadefoot recordings authored by Leonie ten Hagen
voc_df = vocalization_lib[
    (vocalization_lib.species == 'Pelobates fuscus')
    & (vocalization_lib.author == 'ten Hagen, Leonie')
]

# select the metadata columns of interest (each column listed once)
voc_df = voc_df[
    [
        "filename",
        "species",
        "locality",
        "administrative_area",
        "country",
        "state",
        "recording_date",
        "sex",
        "age",
        "sound_type",
        "collection",
        "filename_ext",
        "description",
        "duration",
        "notes",
        "unique_identifier",
        "bytes",
        "recording_type",
    ]
]
voc_df[:3]


| | filename | species | locality | administrative_area | country | state | recording_date | sex | age | sound_type | collection | filename_ext | description | duration | notes | unique_identifier | bytes | recording_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15070 | Pelobates_fuscus_adult_LtH_0001 | Pelobates fuscus | Ennigerloh | Warendorf | DE | Nordrhein-Westfalen | 8.4.2015 | | adult | advertisment call | TSA | Pelobates_fuscus_adult_LtH_0001_short.mp3 | Rufe adulter Knoblauchkröten aufgenommen in de... | 00:00:01 | | TSA:Pelobates_fuscus_adult_LtH_0001 | 17000.0 | z |
| 15071 | Pelobates_fuscus_adult_LtH_0002 | Pelobates fuscus | Ennigerloh | Warendorf | DE | Nordrhein-Westfalen | 8.4.2015 | | adult | advertisment call | TSA | Pelobates_fuscus_adult_LtH_0002_short.mp3 | Rufe adulter Knoblauchkröten aufgenommen in de... | 00:00:01 | | TSA:Pelobates_fuscus_adult_LtH_0002 | 17000.0 | z |
| 15072 | Pelobates_fuscus_adult_LtH_0003 | Pelobates fuscus | Ennigerloh | Warendorf | DE | Nordrhein-Westfalen | 8.4.2015 | | adult | advertisment call | TSA | Pelobates_fuscus_adult_LtH_0003_short.mp3 | Rufe adulter Knoblauchkröten aufgenommen in de... | 00:00:01 | | TSA:Pelobates_fuscus_adult_LtH_0003 | 17000.0 | z |
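
Before generating JSONs, it can help to confirm that every `filename_ext` value in the metadata has a matching audio file on disk, since the generation step looks files up by name. The following is a minimal sketch using toy stand-ins; `find_unmatched` is a hypothetical helper, and the paths and names below are made up for illustration.

```python
from pathlib import Path


def find_unmatched(expected_names, audio_paths):
    """Return the expected file names that have no same-named file in audio_paths."""
    available = {Path(p).name for p in audio_paths}
    return sorted(name for name in expected_names if name not in available)


# Toy stand-ins: in this notebook the inputs would presumably be
# voc_df.filename_ext and fg_list.
audio_paths = [Path("/data/a_short.mp3"), Path("/data/b_short.mp3")]
expected_names = ["a_short.mp3", "b_short.mp3", "c_short.mp3"]

missing = find_unmatched(expected_names, audio_paths)
print(missing)  # ['c_short.mp3']
```

An empty result means every metadata row can be paired with a file; any names that are returned would otherwise surface later as lookup errors in the generation loop.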


### Generate JSON for each wav

In [21]:
# map each mp3 file name to its path once, instead of rescanning fg_list for every row
mp3_paths = {path.name: path for path in fg_list}

with Parallel(n_jobs=1, verbose=10) as parallel:
    parallel(
        delayed(generate_wav_json)(
            row,
            DT_ID,
            mp3_path=mp3_paths[row.filename_ext],
        )
        for idx, row in tqdm(voc_df.iterrows(), total=len(voc_df))
    );

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.8s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    1.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    1.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    1.9s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    2.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    2.5s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    2.8s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 560 out of 560 | elapsed:  2.9min finished
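
To make the idea of one JSON per recording concrete, the following is a minimal sketch of writing a metadata sidecar file next to an audio file. It is not the implementation of `generate_wav_json`: the helper `write_recording_json`, the JSON keys, and the output layout are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_recording_json(metadata, audio_path, out_dir, dt_id):
    """Write one JSON file describing one recording (hypothetical layout)."""
    record = {
        "audio_path": str(audio_path),
        "datetime_id": dt_id,
        # drop fields with no value, e.g. an unknown sex
        "metadata": {k: v for k, v in metadata.items() if v is not None},
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / (Path(audio_path).stem + ".json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return json_path


# Demonstrate with a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    json_path = write_recording_json(
        {"species": "Pelobates fuscus", "age": "adult", "sex": None},
        Path("Pelobates_fuscus_adult_LtH_0001_short.mp3"),
        Path(tmp) / "json",
        "2019-06-24_09-01-51",
    )
    loaded = json.loads(json_path.read_text(encoding="utf-8"))

print(loaded["metadata"])  # {'species': 'Pelobates fuscus', 'age': 'adult'}
```

Keeping the run identifier inside each record is one way to trace a JSON back to the notebook run that produced it.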
