# Dataset downloading and pre-processing

This notebook illustrates how to download and pre-process the MDA collision datasets.

It uses the first session as an example.

## Setup

Before starting, copy your Kaggle API token to the `src/` directory.

If you don't already have an API token, go to your Kaggle profile and click the _Create New API Token_ button under the **API** section:

![kaggle](../doc/images/kaggle.png)

The browser will prompt you to download a file named `kaggle.json`; save it to this project's `src/` directory.
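Before moving on, it can help to confirm that the token is where the code expects it. The snippet below is a minimal sketch, not part of this project: `check_kaggle_token` is a hypothetical helper, and it assumes the notebook runs from the `src/` directory and that `kaggle.json` holds the usual `username` and `key` fields.

```python
import json
from pathlib import Path


def check_kaggle_token(directory="."):
    """Return the Kaggle username stored in `kaggle.json` under `directory`.

    Hypothetical helper (not part of this project). Raises FileNotFoundError
    if the file is missing and ValueError if an expected field is absent.
    """
    token_path = Path(directory) / "kaggle.json"
    if not token_path.is_file():
        raise FileNotFoundError(f"No API token found at {token_path}")

    # The token file is a small JSON object with the account credentials.
    token = json.loads(token_path.read_text())
    missing = {"username", "key"} - token.keys()
    if missing:
        raise ValueError(f"{token_path} is missing fields: {sorted(missing)}")

    return token["username"]
```

Calling `check_kaggle_token()` from a cell should return your Kaggle username if the token is in place, and raise an error describing the problem otherwise.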

## Download

To download Session 1 to the `data` subdirectory under this project's directory, run the code below:

In [None]:
from datasets import Download

download_01 = Download.SESSIONS[0]
download_01.download()

## Pre-processing

Run the code below to parse the downloaded files into a dataset organized by trial:

In [None]:
dataset_01 = download_01.load()
dataset_01.save()

The dataset is stored under `data/sessions/<session id>`.
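To see which trials were written, the session directory can be listed. This is a rough sketch under one assumption: each trial is saved as its own subdirectory of the session directory (as the path `../data/sessions/2020-10-05/01` used in the Visualization section suggests). `list_trials` is a hypothetical helper, not part of this project.

```python
from pathlib import Path


def list_trials(session_dir):
    """Return the sorted names of the subdirectories of `session_dir`.

    Hypothetical helper: assumes each trial is stored as one subdirectory.
    """
    return sorted(p.name for p in Path(session_dir).iterdir() if p.is_dir())
```

For example, `list_trials('../data/sessions/2020-10-05')` would return the trial directory names for that session, provided the layout assumption holds.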

## Visualization

The `viewers.VideoGrid` class combines all videos of a trial into a single video arranged in a grid, for convenient viewing. The borders between individual videos are colored red or blue depending on whether the arm was within collision range of the obstacle in the corresponding frames.

In [None]:
from viewers import VideoGrid

# Uncomment to suppress log messages of level WARNING and below.
#import logging
#logging.disable(logging.WARNING)

grid = VideoGrid('../data/sessions/2020-10-05/01')
grid.save('../data/sessions/2020-10-05/01/collage.mp4')

Once the collage video has been generated, it can be displayed with:

In [None]:
from IPython.display import Video
Video('../data/sessions/2020-10-05/01/collage.mp4', width=1024)