## Fitting the PS-VAE to an example dataset

This notebook walks you through downloading an example dataset, along with some already trained models; the next notebook shows how to evaluate those models.

Before beginning, first make sure that you have properly installed the BehaveNet package and environment by following the instructions [here](https://behavenet.readthedocs.io/en/latest/source/installation.html). Specifically, (1) set up the Anaconda virtual environment; and (2) install the `BehaveNet` package. You do not need to set user paths at this time (this will be covered below).

To illustrate the use of BehaveNet we will use an example dataset from the [International Brain Lab](https://www.biorxiv.org/content/10.1101/2020.01.17.909838v5).

Briefly, a head-fixed mouse performed a visual decision-making task. Behavioral data were recorded using a single camera at a 60 Hz frame rate. Grayscale video frames were downsampled to 192x192 pixels. We labeled the forepaw positions using [Deep Graph Pose](https://papers.nips.cc/paper/2020/file/4379cf00e1a95a97a33dac10ce454ca4-Paper.pdf). The data consist of batches of 100 contiguous frames and their accompanying labels.

The data are stored on the IBL data repository; you will download them after setting some user paths.

**Note**: make sure that you are running the `behavenet` IPython kernel; you should see the current kernel name in the upper right-hand corner of this notebook. If it is not `behavenet` (for example, it might be `Python 3`), change it using the dropdown menus above: `Kernel > Change kernel > behavenet`. If you do not see `behavenet` as an option, see [here](https://behavenet.readthedocs.io/en/latest/source/installation.html#environment-setup).

<br>

### Contents
* [Set user paths](#0.-Set-user-paths)
* [Download the data](#1.-Download-the-data)
* [Add dataset hyperparameters](#2.-Add-dataset-hyperparameters)

### 0. Set user paths
First set the paths to the directories where data, results, and figures will be stored on your local machine. Note that the data take up ~3 GB, so make sure that your data directory has enough space.

A note about the BehaveNet path structure: every dataset is uniquely identified by a lab id, experiment id, animal id, and session id. Paths to data and results contain directories for each of these id types. For example, a sample data path will look like `/home/user/data/lab_id/expt_id/animal_id/session_id/data.hdf5`. In this case the base data directory is `/home/user/data/`.
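
As a rough illustration of this layout, the sketch below joins the four ids onto a base data directory. The helper name `build_data_path` is hypothetical and is not part of BehaveNet; the id values are the placeholders from the example path above.

```python
import os


def build_data_path(data_dir, lab, expt, animal, session):
    """Join the four ids onto the base data directory (hypothetical helper)."""
    return os.path.join(data_dir, lab, expt, animal, session, 'data.hdf5')


# placeholder ids taken from the example path in the text
path = build_data_path('/home/user/data', 'lab_id', 'expt_id', 'animal_id', 'session_id')
print(path)
```

Results paths follow the same pattern, with the base results directory in place of the base data directory.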

The data in the downloaded zip file will automatically be extracted to `data_dir/ibl/angelakilab/IBL-T4/2019-04-23-001/data.hdf5`.

Additionally, the zip file contains already trained VAE and PS-VAE models, which will automatically be saved in the directories:
* `results_dir/ibl/angelakilab/IBL-T4/2019-04-23-001/vae/conv/06_latents/demo-run/`
* `results_dir/ibl/angelakilab/IBL-T4/2019-04-23-001/ps-vae/conv/06_latents/demo-run/`

To set the user paths, run the cell below.

[Back to contents](#Contents)

In [None]:
from behavenet import setup
setup()

The directory file is stored in your user home directory; it is a JSON file that can be updated in a text editor at any time.

### 1. Download the data
Run the cell below; data and results will be stored in the directories provided in the previous step.

[Back to contents](#Contents)

In [None]:
import os
import io
import shutil
import requests
import zipfile as zf
from behavenet import get_user_dir

dataset = 'head-fixed'
# 'head-fixed': IBL data (the dataset described in this notebook)
# 'mouse-face': dipoppa data
# 'two-view': musall data

if dataset == 'head-fixed':
    url = 'https://ibl.flatironinstitute.org/public/ps-vae_demo_head-fixed.zip'
    lab = 'ibl'
elif dataset == 'mouse-face':
    url = 'https://ndownloader.figshare.com/files/26450972'
    lab = 'dipoppa'
elif dataset == 'two-view':
    url = 'https://ndownloader.figshare.com/files/26476925'
    lab = 'musall'
else:
    raise ValueError('%s is not a valid dataset' % dataset)

print('Downloading data - this may take several minutes')

# fetch the zip archive from the data repository (the whole file is held in memory)
print('fetching data from url...', end='')
r = requests.get(url)
r.raise_for_status()  # stop here if the download failed
z = zf.ZipFile(io.BytesIO(r.content))
print('done')

# extract data
data_dir = get_user_dir('data')
os.makedirs(data_dir, exist_ok=True)
print('extracting data to %s...' % data_dir, end='')
for file in z.namelist():
    if file.startswith('ps-vae_demo_%s/data/' % dataset):
        z.extract(file, data_dir)
# clean up paths
shutil.move(os.path.join(data_dir, 'ps-vae_demo_%s' % dataset, 'data', lab), data_dir)
shutil.rmtree(os.path.join(data_dir, 'ps-vae_demo_%s' % dataset))
print('done')

# extract results
results_dir = get_user_dir('save')
os.makedirs(results_dir, exist_ok=True)
print('extracting results to %s...' % results_dir, end='')
for file in z.namelist():
    if file.startswith('ps-vae_demo_%s/results/' % dataset):
        z.extract(file, results_dir)
# clean up paths
shutil.move(os.path.join(results_dir, 'ps-vae_demo_%s' % dataset, 'results', lab), results_dir)
shutil.rmtree(os.path.join(results_dir, 'ps-vae_demo_%s' % dataset))
print('done')
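
The extraction steps above keep only the archive members that sit under a given folder prefix. A minimal, self-contained sketch of that filtering pattern is shown below. It uses a tiny in-memory archive with made-up contents rather than the real download, and the helper name `extract_prefix` and the file names are assumptions for illustration only.

```python
import io
import os
import tempfile
import zipfile


def extract_prefix(archive, prefix, dest_dir):
    """Extract only members whose names start with `prefix`; return their names."""
    members = [name for name in archive.namelist() if name.startswith(prefix)]
    archive.extractall(dest_dir, members=members)
    return members


# build a tiny in-memory archive that mimics the demo layout (contents are made up)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf_out:
    zf_out.writestr('ps-vae_demo_head-fixed/data/ibl/example.txt', 'data')
    zf_out.writestr('ps-vae_demo_head-fixed/results/ibl/example.txt', 'results')

with tempfile.TemporaryDirectory() as tmp_dir:
    with zipfile.ZipFile(buffer) as archive:
        extracted = extract_prefix(archive, 'ps-vae_demo_head-fixed/data/', tmp_dir)
    # only the data member should be on disk; the results member was skipped
    data_file = os.path.join(tmp_dir, 'ps-vae_demo_head-fixed', 'data', 'ibl', 'example.txt')
    data_found = os.path.exists(data_file)
    results_found = os.path.exists(os.path.join(tmp_dir, 'ps-vae_demo_head-fixed', 'results'))
```

Filtering by prefix is what lets a single archive feed two different destinations: the data members go to the data directory and the results members go to the results directory.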

### 2. Add dataset hyperparameters
The last step is to save some of the dataset hyperparameters in their own JSON file, which simplifies the command line arguments to the model fitting functions. This file is already provided in the data directory, alongside the `data.hdf5` file; you should see a file named `ibl_angelakilab_params.json`. Copy this file into the `.behavenet` directory in your home directory:

* On Linux, `~/.behavenet`
* On macOS, `/Users/CurrentUser/.behavenet`
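
If you prefer to do this copy from Python, a minimal sketch could look like the following. The helper `copy_params_file` is hypothetical (it is not a BehaveNet function), and the demo runs on temporary directories with a placeholder file so that it does not touch your real files.

```python
import os
import shutil
import tempfile


def copy_params_file(session_dir, config_dir, filename='ibl_angelakilab_params.json'):
    """Copy the params file from the session directory into the config directory."""
    src = os.path.join(session_dir, filename)
    os.makedirs(config_dir, exist_ok=True)  # create the config directory if it is missing
    return shutil.copy(src, config_dir)  # returns the path of the new copy


# demo on temporary directories that stand in for the real locations
with tempfile.TemporaryDirectory() as tmp_dir:
    session_dir = os.path.join(tmp_dir, 'session')
    config_dir = os.path.join(tmp_dir, '.behavenet')
    os.makedirs(session_dir)
    with open(os.path.join(session_dir, 'ibl_angelakilab_params.json'), 'w') as f:
        f.write('{}')  # placeholder contents
    copied = copy_params_file(session_dir, config_dir)
    copied_exists = os.path.isfile(copied)
    print(os.path.basename(copied))
```

To use it on the downloaded data, you would pass the session directory that contains `data.hdf5` as `session_dir` and `os.path.expanduser('~/.behavenet')` as `config_dir`.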

The next notebook walks you through evaluating the downloaded models and data.

[Back to contents](#Contents)