### Rodent custom parsing
- This dataset consists of 3861 vocalizations from mice of two different strains, rats (pleasant calls and distress calls), and two different gerbils:
    - WAV files of vocalizations, with labels for species and vocalization.
- This notebook creates a JSON corresponding to each WAV file.
- Dataset origin:
    - https://sites.google.com/view/rtachi/resources
    - https://www.biorxiv.org/content/biorxiv/early/2019/03/10/572743.full.pdf
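
As a rough illustration of what "a JSON corresponding to each WAV file" could look like, the sketch below writes a small metadata record for one WAV using only the standard library. The helper `write_wav_json` and the field names (`wav_loc`, `samplerate_hz`, `length_s`, `species`) are hypothetical; they are not the schema produced by `generate_json`, which is not shown in this notebook.

```python
import json
import wave
from pathlib import Path


def write_wav_json(wav_path, json_dir, species):
    """Write a small JSON metadata record for one WAV file.

    Hypothetical schema for illustration only; not the avgn format.
    """
    wav_path = Path(wav_path)
    json_dir = Path(json_dir)
    json_dir.mkdir(parents=True, exist_ok=True)

    # read basic properties from the WAV header
    with wave.open(str(wav_path), "rb") as wf:
        samplerate = wf.getframerate()
        n_frames = wf.getnframes()

    record = {
        "wav_loc": wav_path.as_posix(),
        "samplerate_hz": samplerate,
        "length_s": n_frames / samplerate,
        "species": species,  # assumed to be supplied by the caller
    }

    # one JSON per WAV, named after the WAV's stem
    json_path = json_dir / (wav_path.stem + ".JSON")
    with open(json_path, "w") as f:
        json.dump(record, f, indent=2)
    return json_path
```

Naming each JSON after the WAV's stem keeps the two files easy to pair later, which matches the output file names printed further down in this notebook.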

In [1]:
%load_ext autoreload
%autoreload 2
%env CUDA_VISIBLE_DEVICES=[]

env: CUDA_VISIBLE_DEVICES=[]


In [2]:
from avgn.utils.general import prepare_env



In [3]:
prepare_env()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
env: CUDA_VISIBLE_DEVICES=GPU


### Import relevant packages

In [4]:
from joblib import Parallel, delayed
from tqdm.autonotebook import tqdm
import pandas as pd
pd.options.display.max_columns = None
import librosa
from datetime import datetime
import numpy as np

In [5]:
import avgn
from avgn.custom_parsing.rodent_tachibana import generate_json
from avgn.utils.paths import DATA_DIR

### Load data in original format

In [6]:
# create a unique datetime identifier for the files output by this notebook
DT_ID = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
DT_ID

'2019-09-26_16-31-03'

In [7]:
DSLOC = DATA_DIR/"raw/rodent/zip_contents"
DSLOC

PosixPath('/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/raw/rodent/zip_contents')

In [8]:
all_wavs = list(DSLOC.expanduser().glob('*/*.wav')) + list(DSLOC.expanduser().glob('*/*.WAV'))
all_wavs = [i for i in all_wavs if i.stem[0] != '.']
len(all_wavs)

21
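
The cell above runs two globs, one per suffix case. On platforms where `pathlib` matching is case-insensitive (for example Windows), the two patterns can match the same file twice. A minimal alternative sketch, assuming the same one-folder-per-category layout, scans once and compares the lowercased suffix; `find_wavs` is a hypothetical helper, not part of `avgn`.

```python
from pathlib import Path


def find_wavs(root):
    """Return WAV files one directory below root, skipping hidden files."""
    root = Path(root).expanduser()
    return sorted(
        p
        for p in root.glob("*/*")
        # match .wav, .WAV, .Wav, ... in a single pass
        if p.suffix.lower() == ".wav"
        # drop hidden files such as macOS "._" resource forks
        and not p.name.startswith(".")
    )
```

With the notebook's variables this would be called as `find_wavs(DSLOC)`.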

In [9]:
all_csvs = list(DSLOC.expanduser().glob('*/*.csv'))
all_csvs = [i for i in all_csvs if i.stem[0] != '.']
len(all_csvs)

21
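
Both globs return 21 files, which suggests one CSV per WAV. A minimal sketch for checking that pairing is shown below. It assumes that each CSV shares its file stem with its WAV, which the notebook does not confirm; `unmatched_stems` is a hypothetical helper.

```python
from pathlib import Path


def unmatched_stems(wav_paths, csv_paths):
    """Return (WAV stems without a CSV, CSV stems without a WAV), each sorted."""
    wav_stems = {Path(p).stem for p in wav_paths}
    csv_stems = {Path(p).stem for p in csv_paths}
    return sorted(wav_stems - csv_stems), sorted(csv_stems - wav_stems)
```

Calling `unmatched_stems(all_wavs, all_csvs)` would return two empty lists if every recording has a matching CSV under that naming assumption.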

In [10]:
all_wavs[:3]

[PosixPath('/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/raw/rodent/zip_contents/rat_distressed/20181026_153035_384K_1Ch_a60_r.wav'),
 PosixPath('/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/raw/rodent/zip_contents/rat_distressed/20181026_150742_384K_1Ch_a58_r.wav'),
 PosixPath('/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/raw/rodent/zip_contents/rat_distressed/20181026_145634_384K_1Ch_a57_r.wav')]

In [11]:
wav_df = pd.DataFrame(all_wavs, columns = ['wavloc'])
wav_df[:3]

|   | wavloc |
|---|--------|
| 0 | /mnt/cube/tsainbur/Projects/github_repos/avgn_... |
| 1 | /mnt/cube/tsainbur/Projects/github_repos/avgn_... |
| 2 | /mnt/cube/tsainbur/Projects/github_repos/avgn_... |


### Create JSON for WAVs

In [12]:
# generate one JSON per row of wav_df; n_jobs=1 runs the jobs sequentially
with Parallel(n_jobs=1, verbose=10) as parallel:
    parallel(
        delayed(generate_json)(
            row,
            DT_ID
        )
        for _, row in tqdm(wav_df.iterrows(), total=len(wav_df))
    );

/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181026_153035_384K_1Ch_a60_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181026_150742_384K_1Ch_a58_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181026_145634_384K_1Ch_a57_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181026_142935_384K_1Ch_a55_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181010_093120_384K_1Ch_a18_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181010_095344_384K_1Ch_a14_r.JSON
/mnt/cube/tsainbur/Projects/github_repos/avgn_paper/data/processed/tachibana_rat/2019-09-26_16-31-03/JSON/20181010_093817_384K_1Ch_a42_r.JSON
/mnt/c

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  21 out of  21 | elapsed:    0.2s finished
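
To sanity-check the run, the generated files can be read back and compared against the input WAVs. The sketch below loads every `*.JSON` file in a directory into a dict keyed by file stem. `load_json_records` is a hypothetical helper, and the directory to pass in is an assumption based on the paths printed above (for the rat recordings, a `JSON` folder under `data/processed/tachibana_rat/` and the run's `DT_ID`).

```python
import json
from pathlib import Path


def load_json_records(json_dir):
    """Load every *.JSON file in json_dir into a dict keyed by file stem."""
    records = {}
    for json_path in sorted(Path(json_dir).glob("*.JSON")):
        with open(json_path) as f:
            records[json_path.stem] = json.load(f)
    return records
```

Comparing `set(records)` with `{p.stem for p in all_wavs}` would then show whether any WAV is missing its JSON, keeping in mind that different species may be written to different output folders.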
