# About RawAudioBatcher

This Jupyter notebook explains the class `RawAudioBatcher`, which creates batches from audio data.
It provides a Python generator method whose output can be passed to the `fit_generator` method of a Keras neural network model that takes audio waveforms as input.

In [6]:
from __future__ import print_function, absolute_import
import sys
import os
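# Add the parent directory to sys.path so that dcase_framework can be imported.
# '__file__' is quoted on purpose: notebooks do not define __file__, so the
# path is resolved relative to the current working directory.
sys.path.append(os.path.split(os.path.dirname(os.path.realpath('__file__')))[0])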
from dcase_framework.raw_audio_utils import RawAudioBatcher
import numpy as np

### Set up the files
For this demo, the five wav files in the `data` folder will be used. Their class labels are defined in the `annotations` dict. In practice, these labels are read from a text file together with the rest of the files in the database.

In the same way, the list `class_labels` contains all the possible labels in the dataset.

In [19]:
training_files = os.listdir('data')
training_files = [os.path.join('data', tf) for tf in training_files]

annotations = \
{'data/a001_140_150.wav': {'file':'data/a001_140_150.wav', 'identifier': 'a001', 'scene_label': 'residential_area'},
 'data/a001_160_170.wav': {'file':'data/a001_160_170.wav', 'identifier': 'a001', 'scene_label': 'residential_area'},
 'data/a005_110_120.wav': {'file':'data/a005_110_120.wav', 'identifier': 'a005', 'scene_label': 'city_center'},
 'data/a006_30_40.wav': {'file':'data/a006_30_40.wav', 'identifier': 'a006', 'scene_label': 'beach'},
 'data/a006_50_60.wav': {'file':'data/a006_50_60.wav', 'identifier': 'a006', 'scene_label': 'beach'}}

class_labels = ['beach', 'bus', 'cafe/restaurant', 'car', 'city_center', 'forest_path', 'grocery_store',
 'home', 'library', 'metro_station', 'office', 'park', 'residential_area', 'train', 'tram']
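
Because `os.listdir` returns whatever is in the folder, it can help to confirm that every file has an annotation and that every label appears in `class_labels` before building batches. The following is a minimal sketch of such a check; `check_annotations` and `label_to_index` are hypothetical helpers that are not part of `dcase_framework`, and the annotations are shortened to two files for illustration.

```python
# Shortened copy of the annotations above (two of the five files).
annotations = {
    'data/a001_140_150.wav': {'file': 'data/a001_140_150.wav',
                              'identifier': 'a001',
                              'scene_label': 'residential_area'},
    'data/a006_30_40.wav': {'file': 'data/a006_30_40.wav',
                            'identifier': 'a006',
                            'scene_label': 'beach'},
}
training_files = sorted(annotations)

class_labels = ['beach', 'bus', 'cafe/restaurant', 'car', 'city_center',
                'forest_path', 'grocery_store', 'home', 'library',
                'metro_station', 'office', 'park', 'residential_area',
                'train', 'tram']

# Hypothetical helper: position of each label in class_labels.
label_to_index = {label: i for i, label in enumerate(class_labels)}


def check_annotations(files, annotations, class_labels):
    """Return a list of problems; an empty list means everything is consistent."""
    problems = []
    for path in files:
        entry = annotations.get(path)
        if entry is None:
            problems.append('no annotation for ' + path)
        elif entry['scene_label'] not in class_labels:
            problems.append('unknown label for ' + path)
    return problems


print(check_annotations(training_files, annotations, class_labels))  # []
print(label_to_index['residential_area'])  # 12
```

Note that the annotation keys use forward slashes, so on a platform where `os.path.join` produces a different separator the lookup would report missing annotations.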

## RawAudioBatcher
### Parameters
The parameters for RawAudioBatcher are:
- **split_files** : list
    
    Full list of audio files to load


- **_annotations** : dict
    
    Dictionary containing a nested dictionary for each audio file in split_files.
    The nested dictionary must contain the keys 'file', 'identifier' and 'scene_label'. Example:
    {'a001_140_150.wav': {'file':'a001_140_150.wav', 'identifier': 'a001', 'scene_label': 'residential_area'}}


- **class_labels** : list
    
    All possible class labels

- **batch_size** : int
    
    Number of audio files to load and output


- **mono** : bool
    
    If True, the audio is mixed down to a single (mono) channel


- **desired_fs** : int

    Sampling frequency of the output data


- **segment** : bool
    
    If True, each audio file is split into segments whose duration is set by frame_size_sec0.


- **frame_size_sec0** : float
    
    Duration in seconds of each segment when segment is True


- **normalize** : bool
    
    Normalize output values between 0 and 1. If mono is False, channels are normalized with the same value.
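
To make the effect of `mono`, `segment` and `normalize` concrete, here is a minimal NumPy sketch of the three operations. It is not the code used inside `RawAudioBatcher`: the function names are hypothetical, the mixdown is assumed to be a channel average, the normalization is assumed to be min-max scaling with one minimum and maximum shared by all channels, and samples that do not fill a whole segment are assumed to be dropped.

```python
import numpy as np


def to_mono(audio):
    """Average the channels of an array shaped (samples, channels)."""
    return audio.mean(axis=1, keepdims=True)


def split_into_segments(audio, fs, frame_size_sec):
    """Cut (samples, channels) into (n_segments, segment_len, channels).

    Assumption: trailing samples that do not fill a segment are dropped.
    """
    segment_len = int(round(frame_size_sec * fs))
    n_segments = audio.shape[0] // segment_len
    trimmed = audio[:n_segments * segment_len]
    return trimmed.reshape(n_segments, segment_len, audio.shape[1])


def normalize_01(audio):
    """Min-max scale to [0, 1] using one min and max for all channels."""
    lo, hi = audio.min(), audio.max()
    if hi == lo:
        # Constant signal: avoid dividing by zero.
        return np.zeros_like(audio, dtype=float)
    return (audio - lo) / (hi - lo)


fs = 16000
rng = np.random.RandomState(0)
# Synthetic 10-second stereo signal standing in for a loaded wav file.
stereo = rng.uniform(-1.0, 1.0, size=(10 * fs, 2))

segments = split_into_segments(normalize_01(to_mono(stereo)), fs, 5.0)
print(segments.shape)  # (2, 80000, 1)
```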
    

### Usage
#### Instantiation
`RawAudioBatcher` must be instantiated before it can produce any output. The following cell does that.

In [34]:
batch_size = 4
mono = True
desired_fs = 16000
segment = True
frame_size_sec0 = 5.0
normalize = True

train_batcher = RawAudioBatcher(
    training_files, annotations, class_labels, batch_size, mono,
    desired_fs, segment, frame_size_sec0, normalize)

#### Getting output from the instance
At the moment, samples can only be obtained through the generator method. Pass the generator to Python's built-in function `next`, as in the next cell. Each call returns two items: the audio data and its labels. The labels are one-hot encoded.

The shape of the audio data is `(batch_size * number of segments, duration, channels)`

The shape of the labels matrix is `(batch_size * number of segments, number of classes)`

Note that, if `segment` is False, the number of segments is 1.
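
The shapes can be worked out by hand from the settings used in this demo. The sketch below does that arithmetic and shows what a one-hot encoded label looks like. It assumes that each demo file is 10 seconds long and that every segment inherits the label of its file; `one_hot` is a hypothetical helper, not a function from `dcase_framework`.

```python
import numpy as np

class_labels = ['beach', 'bus', 'cafe/restaurant', 'car', 'city_center',
                'forest_path', 'grocery_store', 'home', 'library',
                'metro_station', 'office', 'park', 'residential_area',
                'train', 'tram']


def one_hot(label, class_labels):
    """Vector of zeros with a single 1 at the position of `label`."""
    vec = np.zeros(len(class_labels), dtype=np.float32)
    vec[class_labels.index(label)] = 1.0
    return vec


batch_size = 4
desired_fs = 16000
frame_size_sec0 = 5.0
n_channels = 1            # mono = True
file_duration_sec = 10.0  # assumption about the demo files

n_segments = int(file_duration_sec // frame_size_sec0)
samples_per_segment = int(frame_size_sec0 * desired_fs)

audio_shape = (batch_size * n_segments, samples_per_segment, n_channels)
labels_shape = (batch_size * n_segments, len(class_labels))
print(audio_shape)   # (8, 80000, 1)
print(labels_shape)  # (8, 15)

# Assumption: both segments of a 'beach' file get the same one-hot row.
beach_rows = np.repeat(one_hot('beach', class_labels)[np.newaxis, :],
                       n_segments, axis=0)
print(beach_rows.shape)  # (2, 15)
```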

In [35]:
audio_data, labels = next(train_batcher.generator())

print(np.shape(audio_data))
print(np.shape(labels))

(8, 80000, 1)
(8, 15)
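
Calling `train_batcher.generator()` creates a new generator object each time, so to draw several batches in a row it is usually better to keep a reference to one generator and iterate over it. The sketch below illustrates this with a hypothetical stand-in, `dummy_batch_generator`, which yields random `(audio, labels)` pairs with the same shapes as the output above. It also assumes an endless generator, which is what Keras expects from the argument to `fit_generator`; whether `RawAudioBatcher.generator` loops forever is not confirmed here.

```python
import itertools

import numpy as np


def dummy_batch_generator(n_items=8, n_samples=80000, n_channels=1,
                          n_classes=15, seed=0):
    """Hypothetical stand-in that yields (audio, labels) batches forever."""
    rng = np.random.RandomState(seed)
    while True:
        audio = rng.uniform(0.0, 1.0,
                            size=(n_items, n_samples, n_channels))
        class_ids = rng.randint(0, n_classes, size=n_items)
        # Rows of the identity matrix are one-hot vectors.
        labels = np.eye(n_classes, dtype=np.float32)[class_ids]
        yield audio.astype(np.float32), labels


gen = dummy_batch_generator()  # keep one generator and reuse it
for audio, labels in itertools.islice(gen, 3):
    print(audio.shape, labels.shape)

# With a compiled Keras model, the real generator would be passed in a
# similar way (untested sketch; `model` is not defined in this notebook):
# model.fit_generator(train_batcher.generator(), steps_per_epoch=10, epochs=1)
```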


### To do soon

- apply transformations