## 1. Dataset Creation

In [None]:
%load_ext autoreload
%autoreload 2

This notebook is a guide to creating your Carnatic Music Instrument dataset. We will start by loading the dataset with the mirdata API, then extract the relevant sections and instruments, apply any necessary processing steps, and store the dataset in an intuitive and accessible format.

Typical Carnatic music ensembles contain a wide range of instruments. For this task we are going to focus on:

- Voice
- Violin
- Mridangam

You can refer to the instrumentation section of the [compIAM tutorial](https://mtg.github.io/IAM-tutorial-ismir22/indian_art_music/carnatic-music.html) for more information.

The final dataset will be a collection of short audio clips labelled with these instruments. They will be organised so that each clip can be retrieved according to the instruments it contains, the performer, the raga and a unique identifier (for reproducibility later).

It is up to you to fill in each subsection with the code that performs that task. If possible, split the sections among the project group so that you can work in parallel. When the task is complete, try to abstract the code into .py files so that it can be run without a Python notebook.

### Load Dataset

You can access the Saraga Carnatic dataset using the [mirdata API](https://github.com/mir-dataset-loaders/mirdata). You should already have the dataset downloaded to your machine, in the directory you use as mirdata's `data_home`.

In [None]:
import mirdata

In [None]:
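data_home = '/Volumes/MyPassport/mir_datasets/'  # change to where your Saraga Carnatic files live (mirdata's default is ~/mir_datasets/saraga_carnatic)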

In [None]:
saraga = mirdata.initialize('saraga_carnatic', data_home=data_home)
saraga.validate()

You can choose a random track using `.choice_track()`

In [None]:
example_track = saraga.choice_track()

Explore the metadata for this example track. What information is available? Do all tracks have multitrack recordings (i.e. separate voice, violin and mridangam audio files)?

In [None]:
# explore
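
One way to answer the multitrack question is to check which isolated-stem paths are set and exist on disk for a track. The sketch below assumes the stem attribute names used by mirdata's `saraga_carnatic` loader (`audio_vocal_path`, `audio_violin_path`, `audio_mridangam_left_path`, `audio_mridangam_right_path`); check them against `dir(example_track)` for your mirdata version. `available_stems` is a hypothetical helper, not part of mirdata.

```python
import os

# Stem attribute names assumed from mirdata's saraga_carnatic Track; verify with dir(track)
STEM_ATTRIBUTES = {
    "voice": "audio_vocal_path",
    "violin": "audio_violin_path",
    "mridangam_left": "audio_mridangam_left_path",
    "mridangam_right": "audio_mridangam_right_path",
}


def available_stems(track):
    """Return the names of the isolated stems whose audio file exists for <track>."""
    found = []
    for name, attribute in STEM_ATTRIBUTES.items():
        path = getattr(track, attribute, None)  # a missing attribute counts as no stem
        if path is not None and os.path.exists(path):
            found.append(name)
    return found
```

For example, `available_stems(example_track)` lists the stems present for the random track; a track is fully multitrack if the result has the same length as `STEM_ATTRIBUTES`.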

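Maybe you have noticed that each track has a unique id, the `track_id`. You can list all available track_ids with `saraga.track_ids`, and load every track (as a dictionary mapping each `track_id` to its track object) with `saraga.load_tracks()`:

In [None]:
track_ids = saraga.track_ids  # list of every track_id in the dataset
all_tracks = saraga.load_tracks()  # dict: track_id -> Track object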

Can you create some functions to explore these tracks?

In [None]:
def get_metadata(track_id):
    """
    For <track_id>, return a dataframe of associated metadata
    """
    # code here
    return metadata

def get_performer(track_id):
    """
    For <track_id>, return the performer
    """
    # code here
    return performer

def get_performance(track_id):
    """
    For <track_id>, return the performance name
    """
    # code here
    return performance

def get_raga(track_id):
    """
    For <track_id>, return the raga name
    """
    # code here
    return raga


def get_tonic(track_id):
    """
    For <track_id>, return the tonic in hertz
    """
    # code here
    return tonic

How many ragas/performers/performances are available?

In [None]:
# get dataset statistics
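
Once the `get_*` helpers above are filled in, these statistics reduce to counting distinct values. A minimal sketch, assuming you have collected one dict per track with the hypothetical keys `raga`, `performer` and `performance`, e.g. `[{'raga': get_raga(tid), 'performer': get_performer(tid), 'performance': get_performance(tid)} for tid in saraga.track_ids]`. `summarise` is a hypothetical helper.

```python
from collections import Counter


def summarise(records, keys=("raga", "performer", "performance")):
    """Return {key: (number of distinct values, Counter of tracks per value)} for <records>."""
    summary = {}
    for key in keys:
        counts = Counter(record[key] for record in records)
        summary[key] = (len(counts), counts)
    return summary
```

The `Counter` also shows how unevenly tracks are spread across ragas or performers, which matters when you later sample snippets.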

### Load Audio

The mirdata API returns paths to the audio files associated with each track. Can you create some loaders that load the audio for a given `track_id`? Hint: the `librosa` library contains functions to load audio from a file into an array of amplitude values.

In [None]:
def load_mixed_audio(track_id):
    """
    For <track_id>, return the loaded audio
    """
    # code here
    return audio_array

def load_violin_audio(track_id):
    """
    For <track_id>, return the isolated violin track
    """
    # code here
    return audio_array

def load_voice_audio(track_id):
    """
    For <track_id>, return the isolated voice track
    """
    # code here
    return audio_array

def load_mridangam_audio(track_id):
    """
    For <track_id>, return the isolated mridangam track
    """
    # code here
    return audio_array
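
As far as we are aware, mirdata's `saraga_carnatic` loader exposes the mridangam as two stems (`audio_mridangam_left_path` and `audio_mridangam_right_path`), so `load_mridangam_audio` has to merge them. A minimal sketch, assuming each stem has already been loaded as a mono NumPy array at the same sample rate (e.g. with `librosa.load(path, sr=sr, mono=True)`); `combine_stems` is a hypothetical helper. Stems of slightly different lengths are zero-padded to the longest one.

```python
import numpy as np


def combine_stems(*stems):
    """Sum mono stems of possibly different lengths into one array, avoiding clipping."""
    length = max(len(stem) for stem in stems)
    mix = np.zeros(length, dtype=np.float32)
    for stem in stems:
        mix[: len(stem)] += stem  # shorter stems are implicitly zero-padded
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix /= peak  # rescale only if the sum leaves the [-1, 1] range
    return mix
```

Inside `load_mridangam_audio` this would be `combine_stems(left, right)` after loading both files.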

### Listen to Audio

Let's write some functions to listen to and visualise these audio arrays in the notebook. You should find the `IPython` and `matplotlib` libraries useful.

In [None]:
def plot_waveform(audio_array, sr):
    """
    Plot waveform for <audio_array> (sampled at <sr> Hz) using matplotlib.pyplot
    """
    pass


def play_audio(audio_array, sr):
    """
    Generate audio player for <audio_array> (sampled at <sr> Hz) using the IPython library
    """
    pass

Are there any important observations about the mixed or isolated instrument tracks? What is the quality like? Do you hear all of the instruments clearly?

What if you wanted to plot or listen to only a short excerpt of the audio track?

In [None]:
# 
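
To work with an excerpt, convert start time and duration from seconds to sample indices using the sample rate and slice the array. `excerpt` is a hypothetical helper; its output can be passed straight to `plot_waveform` or `play_audio`. Alternatively, `librosa.load` accepts `offset` and `duration` (in seconds) to read only part of a file.

```python
import numpy as np


def excerpt(audio_array, sr, start=0.0, duration=10.0):
    """Return <duration> seconds of <audio_array> starting at <start> seconds."""
    start_sample = int(round(start * sr))
    end_sample = start_sample + int(round(duration * sr))
    return audio_array[start_sample:end_sample]  # slicing stops safely at the array end
```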

### Processing

Are the isolated vocal tracks sufficiently isolated? Libraries such as `spleeter` can separate the singing voice from the accompanying instruments. Does it help here?

In [None]:
def separate_voice(audio):
    """
    Apply spleeter source separation to input audio
    """
    pass
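
One possible way to fill in `separate_voice`, sketched with spleeter's pretrained 2-stem model (`spleeter:2stems`, which splits `vocals` from `accompaniment`). As far as we know, `Separator.separate` takes a float array of shape `(n_samples, n_channels)` and the pretrained models expect 44.1 kHz audio, so mono input is duplicated to two channels; resampling is assumed to have been done beforehand. spleeter is imported lazily, and a `separator` can be passed in to avoid reloading the model for every call.

```python
import numpy as np


def to_stereo(audio):
    """Convert a mono (n,) array to the (n, 2) float32 layout passed to spleeter."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = np.stack([audio, audio], axis=1)  # duplicate mono into two channels
    return audio


def separate_voice(audio, separator=None):
    """Return a mono 'vocals' estimate for <audio> (assumed to be sampled at 44.1 kHz)."""
    if separator is None:
        from spleeter.separator import Separator  # requires `pip install spleeter`
        separator = Separator("spleeter:2stems")
    prediction = separator.separate(to_stereo(audio))
    return prediction["vocals"].mean(axis=1)  # average the two channels back to mono
```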

How does the quality compare? Does spleeter work effectively? Do we lose any important information?

### Tagging Audio

We want to tag our audios with whether or not a particular instrument is sounding. We can do this by identifying non-silent regions in the isolated tracks and tagging the mixed tracks with the instrument. The `librosa` library contains functionality for identifying silent regions in audio (`librosa.effects.split`).

In [None]:
def detect_silence(audio_array):
    """
    Return an array of 0s and 1s for input <audio_array>: 0 where the audio is silent, 1 where it is not
    """
    return is_silent
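
`librosa.effects.split(audio_array, top_db=60)` returns an array of `[start, end)` sample intervals of *non-silent* audio, so the 0/1 array above can be built by marking those intervals. A sketch of that conversion (`intervals_to_mask` is a hypothetical helper); `top_db`, how many decibels below the peak counts as silence, is worth tuning per instrument.

```python
import numpy as np


def intervals_to_mask(intervals, n_samples):
    """Return a 0/1 array of length <n_samples>: 1 inside each [start, end) interval, 0 (silent) elsewhere."""
    mask = np.zeros(n_samples, dtype=np.int8)
    for start, end in intervals:
        mask[start:end] = 1
    return mask


# Possible use inside detect_silence (requires librosa):
# intervals = librosa.effects.split(audio_array, top_db=60)
# is_silent = intervals_to_mask(intervals, len(audio_array))
```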

Do these regions correspond to what you hear when playing the audio with `play_audio()`, or to what you see with `plot_waveform()`?

Very small silent gaps can perhaps be ignored, either as errors or as momentary pauses inherent in how the instrument is performed (such as a singer inhaling). Can you fill in the gaps shorter than some threshold? What is a suitable threshold?

In [None]:
def interpolate_silence(is_silent, threshold=0.5):
    """
    For an input array of 0/1 values (silent/is not silent), fill silent gaps shorter than <threshold> seconds.
    """
    return is_silent_filled
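
A minimal sketch of the gap filling, assuming the 0/1 convention above (0 = silent) and that you know the rate of the array (the sample rate for a per-sample mask); `rate` is an extra, hypothetical parameter needed to convert `threshold` from seconds into array elements. Only silent runs with sound on both sides are filled, so silence at the very start and end of a track is kept.

```python
import numpy as np


def fill_short_gaps(mask, rate, threshold=0.5):
    """Set to 1 every run of 0s shorter than <threshold> seconds that lies between two runs of 1s."""
    mask = np.asarray(mask, dtype=np.int8).copy()
    max_len = int(round(threshold * rate))  # gaps must be shorter than this many elements
    padded = np.concatenate(([1], mask, [1]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == -1)  # first index of each run of 0s
    ends = np.flatnonzero(diff == 1)  # index just after each run of 0s
    for start, end in zip(starts, ends):
        bounded = start > 0 and end < len(mask)  # skip leading/trailing silence
        if bounded and end - start < max_len:
            mask[start:end] = 1
    return mask
```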

Can we tag our mixed tracks with instrument activity (where each instrument is sounding), using the silent regions of their constituent isolated tracks as a proxy?

In [None]:
def tag_mix(track_id):
    """
    For <track_id>, return three arrays indicating where the violin, mridangam and voice are sounding.
    
    Return violin_onsets, mridangam_onsets, voice_onsets
        (each an array of 0 and 1 values: is violin/mridangam/voice sounding or not?)
    """
    return violin_onsets, mridangam_onsets, voice_onsets
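
When assembling `tag_mix`, the isolated stems and the mix are not guaranteed to have exactly the same number of samples, so each activity array should be trimmed or zero-padded to the mix length before being used as a label. A sketch with a hypothetical `match_length` helper, which treats any padding as silence:

```python
import numpy as np


def match_length(mask, n_samples):
    """Trim or zero-pad (i.e. mark as silent) <mask> so that it has exactly <n_samples> elements."""
    mask = np.asarray(mask)
    if len(mask) >= n_samples:
        return mask[:n_samples]
    return np.pad(mask, (0, n_samples - len(mask)))  # constant padding with 0 = silent
```

For example, `voice_onsets = match_length(voice_mask, len(mix_audio))`, where both names stand for arrays you have already computed.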

### Extracting Samples

We should now have all the tools needed to load and annotate audio. Next, we want to extract short snippets of audio from the mixed tracks across the dataset and annotate each snippet as containing voice, mridangam, violin or none of the above (a single snippet can have more than one tag).

It is important that we have examples for all combinations of tags (violin, voice, mridangam, none). Each sample should be of the same length. What should that length be? Think about the two extremes, very short and very long samples: what problems would arise in each case?

Each sample should have a unique identifier (index). Its tags should be stored in a metadata DataFrame, which should also contain information about the performance it comes from.

Each sample should be saved as an individual audio file.

In [None]:
# create metadata JSON
# extract samples
# annotate
# save samples with unique index
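
A hedged sketch of these steps for one track: cut the mix into non-overlapping fixed-length windows, tag each window with every instrument that sounds for at least a fraction `min_active` of it, and save the windows plus a JSON metadata file. The window length, `min_active`, the id format `'<track_id>_<window number>'` and the file layout are all assumptions to adapt; deterministic ids (rather than random ones) mean a rerun gives every window the same id. The functions are hypothetical and expect per-sample 0/1 activity arrays already aligned with the mix. `soundfile.write(path, audio, sr)` is an alternative to the stdlib WAV writer used here.

```python
import json
import wave
from pathlib import Path

import numpy as np


def extract_windows(track_id, mix, activity, sr, window_s=2.0, min_active=0.5):
    """Cut <mix> into non-overlapping <window_s>-second windows and tag each one.

    activity: dict mapping instrument name -> 0/1 array aligned sample-by-sample with <mix>.
    Returns a list of (audio_window, metadata_record) pairs; a trailing partial window is dropped.
    """
    window = int(round(window_s * sr))
    samples = []
    for i, start in enumerate(range(0, len(mix) - window + 1, window)):
        end = start + window
        # An instrument is tagged if it sounds for at least <min_active> of the window
        tags = sorted(name for name, mask in activity.items()
                      if np.mean(mask[start:end]) >= min_active)
        record = {
            "index": f"{track_id}_{i:05d}",  # deterministic, reproducible identifier
            "track_id": track_id,
            "start_s": start / sr,
            "tags": tags or ["none"],
            # add performer, raga, performance, ... from the track-level helpers here
        }
        samples.append((mix[start:end], record))
    return samples


def save_wav(path, audio, sr):
    """Write mono float audio in [-1, 1] to <path> as 16-bit PCM WAV (stdlib only)."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())


def save_dataset(samples, out_dir, sr):
    """Save each window as <out_dir>/<index>.wav and all records to <out_dir>/metadata.json."""
    out_dir = Path(out_dir)
    for audio, record in samples:
        save_wav(out_dir / f"{record['index']}.wav", audio, sr)
    with open(out_dir / "metadata.json", "w") as f:
        json.dump([record for _, record in samples], f, indent=2)
```

The list of records converts directly to the metadata DataFrame with `pandas.DataFrame(records)`.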

### Load Created Dataset

With our dataset created and saved in an intuitive and accessible format, let's create some loaders to load the files and their metadata.

In [None]:
def load_sample(index):
    """
    Load sample with index, <index>
    """
    return sample

def get_sample_metadata(index):
    """
    Get metadata for sample with index, <index>
    """
    return metadata
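
A sketch of the two loaders under an assumed, hypothetical layout: one 16-bit PCM mono `<index>.wav` file per sample in a dataset directory, plus a `metadata.json` file holding a list of records that each have an `index` field. The extra `dataset_dir` parameter is also an assumption.

```python
import json
import wave
from pathlib import Path

import numpy as np


def load_sample(index, dataset_dir):
    """Load <dataset_dir>/<index>.wav (16-bit PCM mono) as float32 in [-1, 1], plus its sample rate."""
    with wave.open(str(Path(dataset_dir) / f"{index}.wav"), "rb") as f:
        sr = f.getframerate()
        frames = f.readframes(f.getnframes())
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768
    return audio, sr


def get_sample_metadata(index, dataset_dir):
    """Return the metadata record whose 'index' field equals <index>."""
    with open(Path(dataset_dir) / "metadata.json") as f:
        records = json.load(f)
    for record in records:
        if record["index"] == index:
            return record
    raise KeyError(index)
```

Reading the JSON on every call keeps the sketch simple; for many lookups, load it once into a dict keyed by `index` (or a DataFrame indexed by it).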

Typically, when datasets are presented, they are accompanied by statistics detailing their size and constituent parts. What statistics can you report for our dataset? Think about: total duration in seconds, number of performers, performances, instruments and ragas, file sizes, etc.

In [None]:
# stats
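
A minimal sketch of some of these statistics, assuming each sample's metadata record has a `tags` list and a `track_id` (plus any other keys you want distinct counts for, such as a hypothetical `performer` or `raga`), and that every sample lasts `window_s` seconds. Counting tag combinations also checks whether every combination of violin, voice, mridangam and none is represented. File sizes can be added with `os.path.getsize` on each saved file.

```python
from collections import Counter


def dataset_summary(records, window_s, keys=("track_id",)):
    """Summarise a list of sample metadata records (each with a 'tags' list)."""
    combos = Counter(tuple(sorted(record["tags"])) for record in records)
    summary = {
        "n_samples": len(records),
        "total_seconds": len(records) * window_s,  # all samples share the same length
        "per_tag_combination": dict(combos),
    }
    for key in keys:
        summary[f"n_{key}"] = len({record[key] for record in records})
    return summary
```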

### Reproducible Code

Jupyter notebooks are great for experimenting, especially when visualisation or audio playback is required. However, they are not great for reproducibility or source control. Can you abstract the code created here into .py file(s) so that it can be run in future without having to open the notebook?
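
A common pattern is a script with a command-line entry point, so the dataset can be rebuilt with a single command. A skeleton for a hypothetical `create_dataset.py`; the option names and the 2-second default window are assumptions, and passing `data_home=None` to `mirdata.initialize` should make mirdata fall back to its default location.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical create_dataset.py script."""
    parser = argparse.ArgumentParser(description="Create the Carnatic instrument dataset.")
    parser.add_argument("--data-home", default=None,
                        help="mirdata data_home (omit to use mirdata's default)")
    parser.add_argument("--out-dir", default="carnatic_instrument_dataset")
    parser.add_argument("--window", type=float, default=2.0, help="sample length in seconds")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Call the functions developed in the notebook here: initialise mirdata with
    # args.data_home, tag and window each track, then save samples to args.out_dir.
    return args


if __name__ == "__main__":
    main()
```

The notebook can then import the same functions (e.g. `from create_dataset import main`), keeping experimentation and the reproducible pipeline in sync.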