<a href="https://colab.research.google.com/github/yujin-kimmm/live-vox-extraction/blob/main/Soundata_colab_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Welcome to the Soundata Colab Example!**

This notebook provides a hands-on introduction to using the `soundata` library in Google Colab. `soundata` is a Python library designed to make it easy to load and work with common sound and audio datasets.

In this notebook, you will learn how to:

* Install `soundata`
* Load a dataset
* Download and store the dataset
* Validate the dataset

Let's get started!

## Getting Ready

First, install the `soundata` package:

In [None]:
!pip install soundata

Next, import the `soundata` package:

In [None]:
import soundata

To see all datasets available in `soundata`, print the list of dataset names:

In [None]:
print(soundata.list_datasets())

['clotho', 'dcase23_task2', 'dcase23_task4b', 'dcase23_task6a', 'dcase23_task6b', 'dcase_bioacoustic', 'dcase_birdVox20k', 'eigenscape', 'eigenscape_raw', 'esc50', 'freefield1010', 'fsd50k', 'fsdnoisy18k', 'marco', 'singapura', 'starss2022', 'tau2019sse', 'tau2019uas', 'tau2020sse_nigens', 'tau2020uas_mobile', 'tau2021sse_nigens', 'tau2022uas_mobile', 'tut2017se', 'urbansed', 'urbansound8k', 'warblrb10k']
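
Because the call above prints a plain Python list of names, ordinary list operations are enough to search it. The sketch below is only an illustration: `find_datasets` is a hypothetical helper, not part of `soundata`, and the list is a hard-coded subset of the names printed above so that the cell runs without the library installed.

```python
def find_datasets(names, keyword):
    """Return the dataset names that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


# A subset of the names printed above; in the notebook you could pass
# soundata.list_datasets() instead of this hard-coded list.
available = ["esc50", "fsd50k", "tau2019uas", "tau2020uas_mobile", "urbansed", "urbansound8k"]

urban = find_datasets(available, "urban")
print(urban)  # ['urbansed', 'urbansound8k']
```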


## Initialize a dataset

To use a loader in `soundata`, you must first initialize it. This notebook uses `urbansound8k` as an example.

In [None]:
dataset = soundata.initialize('urbansound8k')

### Dataset versions

`soundata` supports working with multiple dataset versions. To see all available versions of a specific dataset, run `soundata.list_dataset_versions('urbansound8k')`. Use the `version` parameter if you wish to use a version other than the default one.

In [None]:
# To use the default version of the dataset, skip this cell.

# List all available versions of a specific dataset:
print(soundata.list_dataset_versions('urbansound8k'))

# Pass `version` to use a version other than the default one.
# Replace `data_home` with the directory where the data should live.
dataset = soundata.initialize('urbansound8k', data_home='/choose/where/data/live', version="1.0")

## Download the dataset

In Colab, `soundata` downloads datasets to the default directory `/root/sound_datasets/<dataset_name>`. This directory is temporary and is reset every time you restart your Google Colab session.

In [None]:
dataset.download()

By default, data is downloaded to
```
/root/sound_datasets/<dataset_name>
```
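
The default location follows the pattern `<home>/sound_datasets/<dataset_name>`, and the home directory in Colab is `/root`. As a minimal sketch, the hypothetical helper `default_data_home` below (not part of `soundata`) rebuilds that path so you can check whether a dataset folder already exists before downloading again. It assumes the default location is exactly the one stated above.

```python
import os
import posixpath


def default_data_home(dataset_name, home="/root"):
    """Build the default Colab download path described in this notebook."""
    return posixpath.join(home, "sound_datasets", dataset_name)


path = default_data_home("urbansound8k")
print(path)                 # /root/sound_datasets/urbansound8k
print(os.path.isdir(path))  # False until the dataset folder has been created
```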

### Data Storage

There are several ways to keep the dataset so that you do not have to download it every time you restart the session.

1. **Copying the downloaded dataset to Google Drive:** After downloading the dataset, you can copy it to your Google Drive for persistent storage and easy access across different Colab sessions.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Replace <dataset_name> with the name of the dataset you downloaded.
!cp -r /root/sound_datasets/<dataset_name> /content/drive/MyDrive/<dataset_name>

2. **Setting a custom download path when initializing the dataset:** You can specify a different download directory when initializing the dataset loader using the `data_home` parameter. This allows you to download the dataset directly to a desired location, such as a mounted Google Drive folder.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Replace <dataset_name> with the name of the dataset you want to download,
# and <Folder_Name> with the Google Drive folder that should hold it.
import soundata
dataset = soundata.initialize('<dataset_name>', data_home='/content/drive/MyDrive/<Folder_Name>')
dataset.download() # Dataset will be downloaded to `data_home` directory.

3. **Accessing a dataset downloaded outside of Google Colab:** If you have already downloaded a dataset locally, you can upload it to your Google Drive or directly to the Colab environment and then initialize the `soundata` loader with the path to the dataset directory using the `data_home` parameter.
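
The copy step in option 1 can also be written in Python rather than with `!cp`, which makes it easy to skip the copy when the data is already on Drive. The following is a minimal sketch using the standard library; `copy_dataset` is a hypothetical helper, and the demonstration uses temporary folders as stand-ins for `/root/sound_datasets/<dataset_name>` and `/content/drive/MyDrive/<dataset_name>` so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dataset(src, dst):
    """Copy the dataset folder `src` to `dst` unless `dst` already exists.

    Returns True if a copy was made and False if it was skipped.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"No dataset folder at {src}")
    if dst.exists():
        return False  # already copied in an earlier session
    shutil.copytree(src, dst)  # also creates missing parent folders of dst
    return True


# Demonstration with temporary folders standing in for the Colab paths.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sound_datasets" / "demo"
    src.mkdir(parents=True)
    (src / "clip.wav").write_bytes(b"\x00")
    dst = Path(tmp) / "drive" / "demo"

    first = copy_dataset(src, dst)   # copies the folder
    second = copy_dataset(src, dst)  # skipped, destination exists
    copied_ok = (dst / "clip.wav").is_file()

print(first, second, copied_ok)  # True False True
```

In Colab, you would call `copy_dataset` with the real source and Drive paths after `drive.mount('/content/drive')` has completed.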

## Validate the dataset

The `validate()` method checks that the local files match the canonical version of the dataset and that they were downloaded correctly (none of them are missing or corrupted).

In [None]:
dataset.validate()

100%|██████████| 1/1 [00:00<00:00, 207.26it/s]
100%|██████████| 8732/8732 [00:58<00:00, 148.58it/s]


({'metadata': {}, 'clips': {}}, {'metadata': {}, 'clips': {}})
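
The output above is a tuple of two dictionaries. As far as the `soundata` documentation describes it, the first reports files that are missing locally and the second reports files whose checksum differs from the reference, so two empty results mean the dataset passed validation. The sketch below shows one way to condense such a result. `summarize_validation` is a hypothetical helper, not part of `soundata`, and it assumes each inner dictionary maps an identifier to the affected file paths.

```python
def summarize_validation(missing_files, invalid_checksums):
    """Count flagged entries per section in the two dicts returned by validate()."""
    summary = {
        "missing": {key: len(items) for key, items in missing_files.items()},
        "invalid": {key: len(items) for key, items in invalid_checksums.items()},
    }
    # The dataset is fine only if no section reports any entry.
    summary["ok"] = not any(summary["missing"].values()) and not any(
        summary["invalid"].values()
    )
    return summary


# The result shown above: nothing missing, nothing corrupted.
clean = ({'metadata': {}, 'clips': {}}, {'metadata': {}, 'clips': {}})
print(summarize_validation(*clean)["ok"])  # True

# A made-up result in which one clip file is missing (ID and path are invented).
broken = (
    {'metadata': {}, 'clips': {'clip_0': ['audio/clip_0.wav']}},
    {'metadata': {}, 'clips': {}},
)
print(summarize_validation(*broken)["ok"])  # False
```

In the notebook, the same helper could be applied directly with `summarize_validation(*dataset.validate())`.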

Now you are ready to use `soundata` in Google Colab! Explore the usage examples in the `soundata` documentation to learn about different datasets and tasks, then apply these techniques to your own sound and audio research projects.