## Fitting BehaveNet to example datasets

This series of notebooks will walk you through how to download example datasets and fit the various models in the BehaveNet toolbox.

Before beginning, make sure that you have installed the BehaveNet package and environment by following the instructions [here](https://behavenet.readthedocs.io/en/latest/source/installation.html); specifically, (1) set up the Anaconda virtual environment, and (2) install the `BehaveNet` and `ssm` packages.

To illustrate the use of BehaveNet we will use an example dataset from [Musall et al. 2019](https://www.nature.com/articles/s41593-019-0502-4), which is also one of the datasets used in the original [BehaveNet paper](https://openreview.net/forum?id=ByxMASrlUB).

Briefly, a head-fixed mouse performed a visual decision-making task while neural activity across dorsal cortex was optically recorded using widefield calcium imaging. We used the [LocaNMF](https://www.biorxiv.org/content/10.1101/650093v2) decomposition approach to extract signals from the calcium imaging video. Behavioral data were recorded using two cameras: one side view and one bottom view. Grayscale video frames were downsampled to 128x128 pixels. The data consist of 1126 trials across two sessions in the same mouse, with 189 frames per trial (30 Hz frame rate). Neural activity was acquired at the same frame rate.
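
As a quick sanity check on these numbers, the trial duration and total frame count follow directly from the figures quoted above. The sketch below only performs that arithmetic; the variable names are illustrative and are not part of BehaveNet.

```python
# Figures quoted in the dataset description above.
n_trials = 1126          # trials across both sessions
frames_per_trial = 189   # video frames per trial
frame_rate_hz = 30       # video (and neural) frame rate

# duration of a single trial in seconds
trial_duration_s = frames_per_trial / frame_rate_hz
# total number of time points across all trials
total_frames = n_trials * frames_per_trial

print('trial duration: %.1f s' % trial_duration_s)  # trial duration: 6.3 s
print('total frames: %i' % total_frames)            # total frames: 212814
```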

The data are stored on the Cold Spring Harbor data repository; you will download them after setting some user paths.

**Note**: make sure that you are running the `behavenet` IPython kernel; the current kernel name is shown in the upper right-hand corner of this notebook. If it is not `behavenet` (for example, it might be `Python 3`), change it using the dropdown menus above: `Kernel > Change kernel > behavenet`. If you do not see `behavenet` as an option, see [here](https://behavenet.readthedocs.io/en/latest/source/installation.html#environment-setup).

<br>

### Contents
* [Set user paths](#0.-Set-user-paths)
* [Download the data](#1.-Download-the-data)
* [Add dataset hyperparameters](#2.-Add-dataset-hyperparameters)

### 0. Set user paths
First, set the paths to the directories where data, results, and figures will be stored on your local machine. Note that the data are ~7.5 GB, so make sure that your data directory has enough free space.
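
One way to check the available space before downloading is the standard-library `shutil.disk_usage` call. The sketch below is a minimal illustration; the helper name `has_enough_space` is hypothetical and not part of BehaveNet, and the home directory is used only as a stand-in for your own data directory.

```python
import os
import shutil

def has_enough_space(path, required_gb):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    path = os.path.abspath(path)
    # the data directory may not exist yet, so walk up to the nearest existing parent
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without finding anything
            break
        path = parent
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

# stand-in location; replace with the data directory you plan to use
print(has_enough_space(os.path.expanduser('~'), required_gb=7.5))
```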

A note about the BehaveNet path structure: every dataset is uniquely identified by a lab id, experiment id, animal id, and session id. Paths to data and results contain directories for each of these id types. For example, a sample data path will look like `/home/user/data/lab_id/expt_id/animal_id/session_id/data.hdf5`. In this case the base data directory is `/home/user/data/`.
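
The path structure described above can be sketched with `os.path.join`. The helper `session_data_path` is a hypothetical name used for illustration only; BehaveNet constructs these paths internally and its own functions may differ.

```python
import os

def session_data_path(data_dir, lab, expt, animal, session):
    """Build the path to a session's data.hdf5 file under the base data directory."""
    return os.path.join(data_dir, lab, expt, animal, session, 'data.hdf5')

# placeholder ids, matching the sample path in the text above
print(session_data_path('/home/user/data', 'lab_id', 'expt_id', 'animal_id', 'session_id'))
```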

The downloaded zip file contains two datasets, which will automatically be saved as:
* `data_dir/musall/vistrained/mSM36/05-Dec-2017/data.hdf5`
* `data_dir/musall/vistrained/mSM36/07-Dec-2017/data.hdf5`

Additionally, the zip file contains already-trained convolutional autoencoders (AEs); training these is the most time-consuming step of the pipeline. They will automatically be saved in the directories:
* `results_dir/musall/vistrained/mSM36/05-Dec-2017/ae/conv/09_latents/ae-example/`
* `results_dir/musall/vistrained/mSM36/07-Dec-2017/ae/conv/09_latents/ae-example/`
* `results_dir/musall/vistrained/mSM36/multisession-00/ae/conv/09_latents/ae-example/`

The first two directories contain AEs trained on the individual sessions; the third directory contains an AE trained on both sessions simultaneously.

To set the user paths, run the cell below.

[Back to contents](#Contents)

In [None]:
from behavenet import setup
setup()

The directory file is stored in your user home directory; it is a JSON file that can be edited in a text editor at any time.

### 1. Download the data
Run the cell below; data and results will be stored in the directories provided in the previous step.

[Back to contents](#Contents)

In [None]:
import os
import shutil
import zipfile as zf

from behavenet import get_user_dir

# open the zip file (this path points to a local copy of the archive; edit it to match your machine)
zfile = '/media/mattw/data/behavenet_example_data/behavenet_example_data.zip'
z = zf.ZipFile(zfile)

# extract data
data_dir = get_user_dir('data')
print('extracting data to %s...' % data_dir, end='')
for file in z.namelist():
    if file.startswith('behavenet_ex/data/'):
        z.extract(file, data_dir)
# clean up paths
shutil.move(os.path.join(data_dir, 'behavenet_ex', 'data', 'musall'), data_dir)
shutil.rmtree(os.path.join(data_dir, 'behavenet_ex'))
print('done')

# extract results
results_dir = get_user_dir('save')
print('extracting results to %s...' % results_dir, end='')
for file in z.namelist():
    if file.startswith('behavenet_ex/results/'):
        z.extract(file, results_dir)
# clean up paths
shutil.move(os.path.join(results_dir, 'behavenet_ex', 'results', 'musall'), results_dir)
shutil.rmtree(os.path.join(results_dir, 'behavenet_ex'))
print('done')

In [None]:
import io
import os
import zipfile as zf

import requests
from behavenet import get_user_dir, make_dir_if_not_exists

# temporary location for the zip file inside the user data directory
tmp_zip_file = os.path.join(get_user_dir('data'), 'tmp.zip')
make_dir_if_not_exists(tmp_zip_file)

url = 'https://drive.google.com/open?id=13nqHu_UA2eOr6cWImKmfG7E3Y_eEiwOz'

print('beginning file download')
r = requests.get(url, stream=True)
# accessing r.content reads the full response body into memory before it is opened as a zip archive
z = zf.ZipFile(io.BytesIO(r.content))
# z.extractall(get_user_dir('data'))
# with open(tmp_zip_file, 'wb') as f:
#     f.write(r.content)
# wget.download(url, tmp_zip_file)
print('done')
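
After extraction, it can be useful to confirm that the two data files listed in the "Set user paths" section are where they are expected. The following is a minimal sketch using only standard-library calls; the helper `find_missing` is a hypothetical name, and the base directory shown is a placeholder (in this notebook it would be the value returned by `get_user_dir('data')`).

```python
import os

# relative locations of the two example sessions, as listed earlier in this notebook
EXPECTED_DATA_FILES = [
    os.path.join('musall', 'vistrained', 'mSM36', '05-Dec-2017', 'data.hdf5'),
    os.path.join('musall', 'vistrained', 'mSM36', '07-Dec-2017', 'data.hdf5'),
]

def find_missing(base_dir, relative_paths):
    """Return the entries of `relative_paths` that are not files under `base_dir`."""
    return [p for p in relative_paths if not os.path.isfile(os.path.join(base_dir, p))]

# placeholder base directory; substitute your own data directory
data_dir = os.path.join(os.path.expanduser('~'), 'data')
missing = find_missing(data_dir, EXPECTED_DATA_FILES)
if missing:
    print('missing files:')
    for p in missing:
        print('  %s' % p)
else:
    print('all expected data files found')
```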

### 2. Add dataset hyperparameters
The last step is to save some of the dataset hyperparameters in their own JSON file, which simplifies the command-line arguments passed to the model-fitting functions. The relevant parameters and their values are:

* `lab or experimenter name` (musall - note: quotes are not needed around strings)
* `experiment name` (vistrained)
* `example animal name` (mSM36)
* `example session name` (05-Dec-2017)
* `trial splits` (8;1;1;0) - this is how trials will be split among training, validation, testing, and gap trials, respectively. Typically we use training data to train the models; validation data to choose the best model from a collection of models using different hyperparameters; test data to produce plots and videos; and gap trials can optionally be inserted between training, validation, and test trials if desired.
* `x pixels` (128)
* `y pixels` (128)
* `input channels` (2) - this can refer to color channels (for RGB data) and/or multiple camera views, which should be concatenated along the color channel dimension. In the Musall dataset we use grayscale images from two camera views, so a trial with 189 frames will have a block of video data of shape (189, 2, 128, 128).
* `use output mask` (False) - an optional output mask can be applied to each video frame if desired; these output masks must also be stored in the `data.hdf5` files as `masks`.
* `frame rate` (30) - in Hz; BehaveNet assumes that the video data and neural data are binned at the same temporal resolution.
* `neural data type` (ca) - either `ca` for 2-photon/widefield data, or `spikes` for ephys data. This parameter controls the noise distribution for encoding models, as well as several other model hyperparameters.
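
To make the `trial splits` entry above more concrete, the sketch below parses a string such as `8;1;1;0` and assigns trial indices to training, validation, test, and gap sets in repeating blocks. This is an illustrative scheme only, not necessarily how BehaveNet assigns trials internally; the function names and the trial count of 20 are assumptions made for the example.

```python
def parse_trial_splits(splits_str):
    """Parse a string such as '8;1;1;0' into named integer block sizes."""
    names = ('train', 'val', 'test', 'gap')
    values = [int(v) for v in splits_str.split(';')]
    if len(values) != len(names):
        raise ValueError('expected 4 values separated by semicolons')
    return dict(zip(names, values))

def assign_trials(n_trials, splits):
    """Assign trial indices to splits in repeating blocks (illustrative scheme only)."""
    # one repeating pattern, e.g. 8 'train' labels, then 1 'val', 1 'test', 0 'gap'
    pattern = [name for name, count in splits.items() for _ in range(count)]
    if not pattern:
        raise ValueError('at least one split must be nonzero')
    assigned = {name: [] for name in splits}
    for trial in range(n_trials):
        assigned[pattern[trial % len(pattern)]].append(trial)
    return assigned

splits = parse_trial_splits('8;1;1;0')
assigned = assign_trials(20, splits)  # 20 is a hypothetical trial count
print({name: len(idxs) for name, idxs in assigned.items()})
# {'train': 16, 'val': 2, 'test': 2, 'gap': 0}
```

With an `8;1;1;0` split, 80% of trials go to training and 10% each to validation and testing, with no gap trials.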

To save these, run the `add_dataset()` cell below and enter them one at a time when prompted.

[Back to contents](#Contents)

In [None]:
from behavenet import add_dataset
add_dataset()