# Plotting Neural Data Demo

### Neural data can be complex and can contain many variables within one dataset. How do we make sense of this information in a meaningful way?

Today, we will plot the calcium signal from inhibitory neurons recorded while a mouse performed a visual change detection task.


This notebook will help us investigate data collected from the [Visual Behavior 2P](https://portal.brain-map.org/explore/circuits/visual-behavior-2p) dataset from the Allen Brain Institute. 

We will also introduce concepts about coding in Python along the way.


---
### Prior activities that provide a foundation for using this notebook:
- Work through the **Introduction to Jupyter Notebooks** [tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) which will introduce you to Python and the Jupyter Notebook environment.


<hr>

# Learning Objectives

## At the end of this notebook, you'll be able to:
* Use Colab/Jupyter Notebooks to run Python code
* Understand and use common Python packages for data visualization
* Plot calcium imaging data
* Apply best practices for graphing data 





# Dataset Notes

The entire dataset includes neural and behavioral measurements from: 

*   107 mice
*   4787 behavior training sessions
*   704 *in vivo* imaging sessions

The data are openly accessible, and include information about all recorded timeseries, behavioral events, and experimental data in a standard data format: [Neurodata Without Borders (NWB)](https://www.nwb.org/nwb-neurophysiology/).

There are many variables in this dataset that could be explored and analyzed, but for the purposes of the exercise today, we will use a subset of the data for simplicity. The variables included and used will be described a bit later.


# Section 1: Setup and Data Intro

<a id="one"></a>
## Step 1: Set up coding environment
Each time we start an analysis in Python, we must import the code packages we need. If you're running this notebook in Colab, these packages are already installed in Colab's cloud environment (*not* on your computer), so we only need to import them.

### Import common packages
Below, we'll `import` a common selection of packages that will help us analyze and plot our data. We'll also configure the plotting in our notebook.

*   This loads [NumPy](https://numpy.org/), [Pandas](https://pandas.pydata.org/), [Matplotlib](https://matplotlib.org/), and [Seaborn](https://seaborn.pydata.org/) into our session so that we can use them.

><b>Task</b>: Import the numpy module with the alias <code>np</code>. Add a <code>print</code> message at the end that says "Packages imported!" so that you know the code ran.



>> <u>Hint</u>: This is similar to the exercise from the **[Introduction to Jupyter Notebooks](https://www.dataquest.io/blog/jupyter-notebook-tutorial/)** tutorial.

In [None]:
# Import our plotting package from matplotlib
import matplotlib.pyplot as plt

# Specify that all plots will happen inline & in high resolution
%matplotlib inline  
%config InlineBackend.figure_format = 'retina'

# Import pandas for working with tables of data (dataframes)
import pandas as pd

# Import seaborn for plotting adjustments with matplotlib
import seaborn as sns

## <Add your code below>
# Import numpy below


# Add your print() statement below


## Step 2: Load and Examine Data Structure

><b>Task</b>: Run the code below to download (clone) the GitHub repository that contains the data file.

In [None]:
# Clone the GitHub repository that contains the data file
!git clone https://github.com/tmckim/teaching.git

><b>Task</b>: Run the code below to read the dataset into a **pandas** dataframe.

In [None]:
# Load the data file - SST_data.csv
data = pd.read_csv('/content/teaching/data/SST_data.csv')

# Show the data for review
data
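The cell above prints the whole dataframe, which can be hard to scan. A few pandas attributes and methods summarize it more compactly. The minimal sketch below runs them on a tiny made-up CSV held in memory (`io.StringIO`), so it works even without the repository; the name `toy` and all values are hypothetical. On the real data you would call the same methods on `data`, e.g. `data.shape` or `data.head()`.

```python
import io

import pandas as pd

# Hypothetical miniature CSV using some of the same column names as SST_data.csv.
# The values are invented for illustration only.
csv_text = """cell_id,trial_id,time_from_stim,dF_F,exposure
1,24,-0.03,0.01,familiar
1,24,0.00,0.02,familiar
1,24,0.03,0.15,familiar
2,24,-0.03,0.00,familiar
2,24,0.00,0.05,familiar
2,24,0.03,0.09,familiar
"""

# pd.read_csv accepts a file path (as in the cell above) or any file-like object
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)             # (number of rows, number of columns)
print(toy.columns.tolist())  # column names
print(toy.head(3))           # first three rows
print(toy.dtypes)            # data type pandas inferred for each column
```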

As mentioned above, this dataset is borrowed from the Allen Institute and there is an introductory video on [Neuromatch Academy](https://compneuro.neuromatch.io/projects/neurons/README.html) with more details. 

Briefly: this data file contains two-photon calcium imaging signals from a single mouse performing a visual change detection task. It is simplified from the full dataset and only contains data from the inhibitory neurons investigated in the study: somatostatin (SST)-expressing interneurons. Example images of what these might look like and where they are in the brain can be found [here](https://observatory.brain-map.org/visualcoding/viewer?id=741951571).

### Example plot

Here is an example of a simple plot, to demonstrate that with 7 lines of code it's possible to visualize some of the data from this dataset. We will go through these lines step by step in the following sections, once you've had a chance to review the dataset. For now, all you need to do is review the code and run the cell to produce the plot.

><b>Task</b>: Run the cell below to plot the data!

In [None]:
# A simple plot
# select trial id and cell id from dataframe
singlecell_trial_data = data[(data.trial_id == 24) & (data.cell_id == 1086500633)]
# array for calcium signal
single_trial_trace = np.array(singlecell_trial_data.dF_F)
# array for x-axis (time)
single_trial_timepoints = np.array(singlecell_trial_data['time_from_stim'])
# plot 
plt.plot(single_trial_timepoints, single_trial_trace)
plt.xlabel('Time from image onset (s)')
plt.ylabel('Calcium fluorescence (dF/F)')
plt.title('Single cell calcium activity')

### Examine dataset

Now, let's take a closer look at the data and do some preliminary exploration to understand what we are working with. 

><b>Task</b>: Run the cell below to list the unique cell IDs in the dataset. The number of entries in this array is the number of cells.

In [None]:
# Find the unique entries from the cell_id column of the dataset
# Review: index the dataframe using .COLUMN_NAME to refer to the column that you want
data.cell_id.unique()
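`unique()` lists the IDs themselves; to get the number of cells directly, pandas also offers `nunique()`, and `value_counts()` shows how many rows belong to each ID (here, one row per timepoint per trial). A sketch on a small made-up dataframe (`toy` and its values are hypothetical); on the real data, replace `toy` with `data`.

```python
import pandas as pd

# Hypothetical data: 2 cells x 2 trials x 3 timepoints (made-up values)
toy = pd.DataFrame({
    "cell_id":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "trial_id": [24, 24, 24, 25, 25, 25, 24, 24, 24, 25, 25, 25],
    "dF_F":     [0.0, 0.1, 0.2, 0.0, 0.05, 0.1, 0.0, 0.3, 0.2, 0.1, 0.1, 0.0],
})

unique_cells = toy.cell_id.unique()         # array of the distinct IDs
n_cells = toy.cell_id.nunique()             # number of distinct IDs
rows_per_cell = toy.cell_id.value_counts()  # number of rows for each ID

print(unique_cells)   # [1 2]
print(n_cells)        # 2
print(rows_per_cell)
```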

><b>Task</b>: Run the code below to select one `cell_id` and one `trial_id` from the dataset. Display and review the data.

In [None]:
# select which trial id and cell id from the dataframe
singlecell_trial_data = data[(data.trial_id == 24) & (data.cell_id == 1086500633)]

# Display the data
singlecell_trial_data

<u> **Note on data format**</u>: the dataset is in long format: each row holds the value at a single timepoint for one cell on one trial, so each combination of `cell_id` and `trial_id` is repeated across many rows (one per timepoint). This is in contrast to wide format, where each row would represent a different `cell_id` and all data for that cell would be contained in the columns of a single row. For comparison, see the wide-format table in the example image below, focusing on the 'dff' column, which holds multiple values in each entry:

![](https://github.com/tmckim/teaching/blob/ac99bee311f132ef64d547807559022034e0ea87/imgs/data_wide.png?raw=true)
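If you ever need a wide layout, pandas can reshape long-format data. This is a minimal sketch on made-up values (`long_df` and the numbers are hypothetical): `pivot_table` gives one row per cell and trial with one column per timepoint, and `groupby(...).apply(list)` stores each trace as a list in a single `dff` column, similar to the image above.

```python
import pandas as pd

# Hypothetical long-format data: one row per (cell, trial, timepoint); values are made up
long_df = pd.DataFrame({
    "cell_id":        [1, 1, 1, 2, 2, 2],
    "trial_id":       [24, 24, 24, 24, 24, 24],
    "time_from_stim": [-0.03, 0.00, 0.03, -0.03, 0.00, 0.03],
    "dF_F":           [0.01, 0.02, 0.15, 0.00, 0.05, 0.09],
})

# Wide layout 1: one row per (cell, trial), one column per timepoint
# (pivot_table averages duplicate entries by default)
wide_df = long_df.pivot_table(index=["cell_id", "trial_id"],
                              columns="time_from_stim",
                              values="dF_F")

# Wide layout 2 (like the image): the whole trace stored as a list in one column
trace_df = (long_df.groupby(["cell_id", "trial_id"])["dF_F"]
            .apply(list)
            .reset_index(name="dff"))

print(wide_df.shape)  # (2, 3)
print(trace_df)
```

On the real data, the pivot only lines up cleanly if `time_from_stim` takes identical values on every trial; otherwise you may get extra columns filled with `NaN`.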



## Step 3: Review variables (columns) in the dataset

*   `dF_F`: calcium imaging signal (baseline-corrected, normalized fluorescence)
*   `time_from_stim`: the timepoint of each row, aligned to an image presentation. Image onset is time zero, and the times span a window from -1.25 s to 1.5 s
*   `cell_id`: ID number of the cell
*   `exposure`: whether the image shown on a trial was familiar or novel
*   `trial_id`: ID of the trial; each image presentation is a separate trial
*   `omitted`: whether the image on a trial was omitted
*   `pupil_area`: pupil area, measured 500 ms after stimulus presentation
*   `mean_response`: average dF/F over the 500 ms following image presentation
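To connect `mean_response` to `dF_F`: the description above says it is the average dF/F over the 500 ms after image presentation. The sketch below computes that kind of average for one made-up trace, assuming a window of 0 ≤ t < 0.5 s. The actual column comes from the Allen Institute's processing, whose exact windowing may differ, so treat this as an illustration of the definition rather than a way to reproduce the column exactly.

```python
import pandas as pd

# Hypothetical single-cell, single-trial trace sampled every 0.1 s (made-up values)
trace = pd.DataFrame({
    "time_from_stim": [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "dF_F":           [0.0,  0.0,  0.0,  0.2, 0.4, 0.4, 0.2, 0.2, 0.0, 0.0, 0.0],
})

# Boolean mask selecting the 500 ms after image onset (assumed window: 0 <= t < 0.5 s)
in_window = (trace.time_from_stim >= 0) & (trace.time_from_stim < 0.5)

# Average dF/F inside the window
approx_mean_response = trace.loc[in_window, "dF_F"].mean()
print(round(approx_mean_response, 2))  # 0.28
```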

# Section 2: Plotting with Matplotlib
Let's plot the trace of a single cell, on a single trial. 

We can use the package **matplotlib** (which we imported as `plt`) for this. 

First, we need to select the data we want to plot. 

## Step 1: Select data to plot

><b>Task</b>: Run the code below to select a ```cell_id``` and ```trial_id``` from the dataset. I've already picked one to start for now. You will have the opportunity to select your own later :)

In [None]:
# Select trial id and cell id from the dataframe
single_trial_data = data[(data.trial_id == 605) & (data.cell_id == 1086500092)]

# Show the data we selected
single_trial_data

><b>Task</b>: Insert the correct column name into the code below at `<insert_var_name>`. Run the code. <br>
We use the package **numpy**  (which we imported as `np`) to convert the data column to an array for plotting.

>> <u>Hint</u>: We want to plot the column with the calcium signal ```dF_F```.



In [None]:
# Make an array out of the dataset variable we want to plot- calcium signal

# <Adjust this code below>
single_trial_trace = np.array(single_trial_data.<insert_var_name>)

# Show the values
single_trial_trace
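The cell above uses dot access inside `np.array(...)`. Two equivalent spellings are common: bracket access (`df['dF_F']`), which also works for column names that contain spaces or clash with dataframe methods, and pandas' own `.to_numpy()`. A sketch on a made-up dataframe (`toy_trial` is hypothetical, named differently so it won't overwrite your `single_trial_data`).

```python
import numpy as np
import pandas as pd

# Hypothetical selection for one cell on one trial (made-up values)
toy_trial = pd.DataFrame({
    "time_from_stim": [-0.03, 0.00, 0.03],
    "dF_F": [0.01, 0.02, 0.15],
})

a = np.array(toy_trial.dF_F)        # dot access, as in the cell above
b = np.array(toy_trial["dF_F"])     # bracket access
c = toy_trial["dF_F"].to_numpy()    # pandas' built-in conversion

print(a.shape)                                        # (3,): one value per timepoint
print(np.array_equal(a, b) and np.array_equal(b, c))  # True
```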

## Step 2: Plot
><b>Task</b>: Insert the name of the variable we want to plot to show the calcium signal using **matplotlib**.

Run the code once you've added the variable below in place of ```<insert_var_name>```.

>> <u>Hint</u>: We created this variable above^.

In [None]:
# Use matplotlib (plt) to plot our variable of interest

# <Adjust this code below>
plt.plot(<insert_var_name>)



Great! We have our first plot. Take a look and see what you think. 

Can you make sense of this data as it is? Is there anything missing?

## Step 3: Plot 2.0

><b>Task</b>: Let's add a more informative x-axis. We have values in our dataset that we can use for this. Find the name of the column we should use from the dataset info above, and insert the variable name into the code below. 

Run the code once you've added the variable below in place of ```<insert_var_name>```.

>> <u>Hint</u>: This is usually a time variable.

In [None]:
# Add the column name (variable) we should use to scale the x-axis in the plot

# <Adjust this code below>
single_trial_timepoints = np.array(single_trial_data['<insert_var_name>'])


# Show the var
single_trial_timepoints

><b>Task</b>: Replot the data.


Run the code once you've replaced ```x``` and ```y``` below with the variables that hold the timepoints and the calcium signal.




In [None]:
# Replot the data to include the x and y data 

# <Adjust this code below>
plt.plot(x,y)



Great! Now the x-axis shows time values that reflect how the data were collected: each timepoint is aligned to an image presentation, with image onset at time zero.

><b>Task:</b> Label your axes and add a title accordingly using `plt.xlabel()`,  `plt.ylabel()`, and `plt.title()`.
    
>><u>Hint</u>: If you need help, see the [`plt.plot()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html). 

In [None]:
# Your plotting code (with axis labels and title) goes here
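Once your labels are in place, one optional refinement is to mark image onset so readers can see where time zero falls. A sketch on a simulated trace (the arrays `t` and `trace` are made up, not from the dataset) using `plt.axvline`:

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated trace (made up): flat baseline with a brief rise during the 500 ms after onset
t = np.linspace(-1.25, 1.5, 56)
trace = np.where((t >= 0) & (t < 0.5), 0.3, 0.0)

plt.figure()                                  # start a fresh figure
plt.plot(t, trace)
plt.axvline(0, color="gray", linestyle="--")  # dashed vertical line at image onset (t = 0)
plt.xlabel("Time from image onset (s)")
plt.ylabel("dF/F")
plt.title("Simulated single-cell trace")
ax = plt.gca()                                # the Axes we just drew on
```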

# Exercise


1.   Repeat the above steps to plot a different cell and trial.
     *   Produce a single plot that includes all labels and an informative title
     *   Comment your code along the way to explain the steps and procedure you went through in your own words

2.   Compare and contrast the calcium signals from the two cells that you and your partner have plotted.
     *   Do they look similar or different? For example, you could discuss whether you chose the same cell but different trials.
     *   How does this relate to other variables in the dataset? (Scroll back up to the variable list and the dataframe.)



In [None]:
# Exercise code goes here
# Include comments for each line to briefly describe what you did

Exercise Question 2: Note some observations that you discussed with your partner here.

# Section 3: Plotting data with Seaborn


**Seaborn** is a plotting package built on top of matplotlib that makes it easier to create statistical plots and adjust their aesthetics.
 
 
><b>Task:</b> Read the [introduction](https://seaborn.pydata.org/tutorial/introduction.html) to the package, up to and including the **Statistical estimation section**. After reading, return to the notebook and continue exploring this package with the guided prompts below. 



## Step 1: Plotting separate conditions with Seaborn
Recall that we imported **seaborn** as `sns`. First, we will use a function called `sns.lineplot` to plot our calcium signal separately for familiar and novel trials. **Seaborn** lets us do this by setting the `hue` parameter to the name of the variable that defines the conditions. <br>

Read what this does and refer to the examples in the `sns.lineplot` [documentation](https://seaborn.pydata.org/generated/seaborn.lineplot.html#seaborn.lineplot). 

Familiar and novel trials in the dataset:

![](https://allensdk.readthedocs.io/en/latest/_static/visual_behavior_2p/expt_design_notes.png)

><b>Task:</b> Run the code once you've added the variable for the familiar/novel conditions below in place of ```<insert_var_name>```.


>><u>Hint</u>: Review the columns from the dataset print outs above^. 


Note: This cell may take longer to run (~10 seconds) than the previous plotting code, because seaborn computes some statistics in the background before drawing the plot.

In [None]:
# Lineplot code

# <Adjust this code below>
sns.lineplot(data = data, x = 'time_from_stim', y = 'dF_F', hue = '<insert_var_name>')



><b>Task:</b> Review the plot above. How does this output compare to what was produced when we plotted with **matplotlib**?

>> <u>Hint</u>: Which specific aspects of the plot were added automatically?
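The shaded band is one of the things seaborn adds for you. By default, `sns.lineplot` groups all rows that share the same x value within each `hue` level, draws their mean as the line, and shades a bootstrapped 95% confidence interval around it; computing that interval on a dataset this size is what makes the cell slow. The pandas sketch below (made-up values, hypothetical name `toy`) computes only the line's y-values. In recent seaborn versions (0.12 and later), passing `errorbar=None` or `errorbar='se'` to `sns.lineplot` skips the bootstrap; check which version your environment has.

```python
import pandas as pd

# Hypothetical long-format data: 2 cells per exposure condition, 3 timepoints (made-up values)
toy = pd.DataFrame({
    "exposure":       ["familiar"] * 6 + ["novel"] * 6,
    "cell_id":        [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time_from_stim": [0.0, 0.1, 0.2] * 4,
    "dF_F":           [0.0, 0.2, 0.1, 0.2, 0.4, 0.1, 0.1, 0.5, 0.3, 0.3, 0.7, 0.5],
})

# One mean per (condition, timepoint): these are the y-values of the plotted lines
line_means = toy.groupby(["exposure", "time_from_stim"])["dF_F"].mean()
print(line_means)
```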

Now, let's plot again using a different function within **seaborn** to compare. <br>
Review the documentation on `relplot` [here](https://seaborn.pydata.org/tutorial/relational.html#visualizing-statistical-relationships). <br>
Read the first section, **[Visualizing statistical relationships](https://seaborn.pydata.org/tutorial/relational.html#visualizing-statistical-relationships)**, then skip ahead to **[Emphasizing continuity with line plots](https://seaborn.pydata.org/tutorial/relational.html#emphasizing-continuity-with-line-plots)** and read it.

><b>Task:</b> Run the code once you've set the parameter for `kind` instead of ```<insert_param>```.


In [None]:
# Relplot example

# <Adjust this code below>
sns.relplot(data = data, x='time_from_stim', y='dF_F', kind = '<insert_param>', hue = 'exposure')



Does this plot look similar to the first one? It should show exactly the same data (note: the figure may have a different size or shape, so the axes might appear stretched, but the data haven't changed!). <br>

This demonstrates that there may be multiple ways to use functions within packages to produce the same plots and data visualizations!

## Step 2: Figure aesthetics
With **seaborn**, there are multiple options for controlling [aesthetics](https://seaborn.pydata.org/tutorial/aesthetics.html#). Review the link to see how we will start using `sns.set_style` and `sns.set_context`.

Let's also look at the data in a different way, using another variable from the dataset: `mean_response`. Discuss with your partner what this variable measures and how it relates to the previous one we were using, `dF_F`. <br> <br>

We will also introduce another plot type `catplot` to display individual data points. Review the documentation [here](https://seaborn.pydata.org/tutorial/introduction.html#plots-for-categorical-data).

><b>Task:</b> Run the code below.

In [None]:
# Setup and plot data
# Set style and context
sns.set_style("whitegrid")
sns.set_context('talk') 

# Subset of the data (faster display, but entire dataset can be used)
data_sample = data.sample(1000) 

# Categorical plot
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response')

><b>Task:</b> Re-run the code above after you have changed the `set_style` and `set_context` to different [options](https://seaborn.pydata.org/tutorial/aesthetics.html#). 
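Because `data.sample(1000)` draws a new random set of rows every time it runs, the points will shift a little each time you re-run the cell, which makes it harder to tell whether a change came from your style settings or from the data. Passing `random_state` makes the draw repeatable. A sketch on a made-up 100-row dataframe (`toy` is hypothetical); on the real data this would be `data.sample(1000, random_state=0)`.

```python
import pandas as pd

# Hypothetical dataframe with 100 rows (made-up values)
toy = pd.DataFrame({"mean_response": range(100)})

sample_a = toy.sample(10, random_state=0)
sample_b = toy.sample(10, random_state=0)
sample_c = toy.sample(10)  # no seed: usually a different subset on each run

print(sample_a.index.equals(sample_b.index))  # True: same seed, same rows
```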

#### 2.1 Colors
**Seaborn** has multiple options for setting/changing colors. One recommended option is to use a predefined color [palette](https://seaborn.pydata.org/tutorial/color_palettes.html), but you can adjust this as you see fit. 

><b>Task:</b> Run the code below. Review the code and the graph, and then re-run it with a different color.

In [None]:
# Plot with different colors
# Setup context
sns.set_context('talk')

# <Change color>
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response', color = 'blue')

><b>Task:</b> Run the code below. Review the code and the graph, and then re-run it with a different palette.

In [None]:
# Plot with palette
# Setup context
sns.set_context('talk')

# <Change palette>
sns.catplot(data = data_sample, x = 'exposure', y = 'mean_response', hue = 'exposure', palette = 'gray', legend = False)

><b>Task:</b> Review with your partner what properties were adjusted with the plots. What is the importance of these various settings? Can you think of examples of plots (good and bad!) you've seen before and why/why not you might choose different aesthetics?

#### 2.2 Labels
We can also change the names of the labels on our plots. There are a few ways to do this, but the easiest is to interface with **matplotlib** like we did previously.
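Another route: seaborn's axes-level functions such as `sns.lineplot` return the matplotlib `Axes` they drew on, so you can write `ax = sns.lineplot(...)` and then label through `ax.set(...)`. The figure-level functions (`relplot`, `catplot`) return a `FacetGrid` instead, which has its own `set_axis_labels` method. The sketch below shows `Axes.set` with plain matplotlib and made-up points, so it does not depend on the dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical data points (made up)
t = [-0.5, 0.0, 0.5, 1.0]
y = [0.0, 0.1, 0.3, 0.1]

fig, ax = plt.subplots()
ax.plot(t, y)

# Axes.set() sets several properties in one call
ax.set(xlabel="Time from image onset (s)",
       ylabel="dF/F",
       title="Labels set through the Axes object")
```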


><b>Task:</b> Review the code and add labels by replacing `<insert_text>`. Feel free to edit the settings we have been testing if you would like more practice with options such as `set_style` and `palette`. Run the code below. <br> We are again plotting the calcium signal (`dF_F`) over time (`time_from_stim`) for familiar vs. novel (`exposure`) trials.

In [None]:
# Plot with adjusted labels
# Another style option
sns.set_style('ticks')

# Plot with palette and adjust labels/title
sns.lineplot(data = data, x = 'time_from_stim', y = 'dF_F', hue = 'exposure', palette = 'Blues')

# <Adjust code below>
plt.xlabel('<insert_text>')
plt.ylabel('<insert_text>')
plt.title('<insert_text>')




## Exercise
1) Take one of the graphs we've created in class and change: <br>
    a) colors - find a colorblind friendly palette <br>
    b) labels - what should the axes say? Make sure these are informative. <br>
    c) title - make this informative <br>
    d) Advanced 1: modify the legend (if the plot has one) - can you change the bounding box color/style? <br>
    e) Advanced 2: can you adjust the font of your graphs? <br>


Feel free to use the documentation and search online as questions come up! Please also ask us if you run into errors or have anything you'd like to discuss.

In [None]:
# Exercise code and notes go here

## Extra Exercise
If you finish the above tasks in the exercise, try these out!

1) Can you figure out how to make a [histogram](https://seaborn.pydata.org/tutorial/distributions.html) of pupil area in **Seaborn**? Use the subset of data below. <br>
2) Now, how can you split the histogram by `exposure` to plot? <br>
3) Advanced: If you have more time, play around with your plot! Be creative and see what else you can add to your histogram, using the [documentation](https://seaborn.pydata.org/tutorial/introduction.html#distributional-representations) as a guide.

In [None]:
# Subset of data to use for this exercise
data_sample = data.sample(1000)

In [None]:
# Exercise code goes here

# Saving Figures
You may want to save your plots separately from this notebook. This can be tricky: even if you specify a high resolution with `dpi` (dots per inch; 300 is a common choice, though the right value depends on the size of your plot), parts of the figure such as axis labels can get cut off. Go through the code below and check your understanding. <br>

Do you know where the figures are saved?

>> <u>Hint</u>: Click on the folder icon on the left side of the notebook.

><b>Task:</b> Run the code below. Review the lines and make sure you understand what they do from what we've covered today. If you're unsure or something is new, check out the [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html).

In [None]:
# Code to save figures
# Set the size
plt.figure(figsize=(8,4))

# Plot the data
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF_F', hue = 'exposure', style = 'omitted')
plt.xlabel('Time from image onset (s)')
plt.ylabel('dF/F')
plt.title('SST cell calcium responses')

# Save
plt.savefig('my_figure.png', dpi = 300)

This doesn't quite work: the x-axis label can get cut off at the bottom of the saved image.

><b>Task:</b> Run the code below, which adds `bbox_inches='tight'` to `plt.savefig()` so that the saved area is adjusted to fit all of the figure's contents, including axis labels. Compare the saved file with the previous one. If you're unsure about any line, check out the [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html).

In [None]:
# Code to save figures as PNG
# Set the size
plt.figure(figsize=(8,4))

# Plot the data
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF_F', hue = 'exposure', style = 'omitted')
plt.xlabel('Time from image onset (s)')
plt.ylabel('dF/F')
plt.title('SST cell calcium responses')

# Save, adjusting with bbox_inches to fix cut-off text
plt.savefig('my_figure.png', dpi = 300, bbox_inches = 'tight')

In [None]:
# Code to save figures as PDF
# Set the size
plt.figure(figsize=(8,4))

# Plot the data
sns.lineplot(data = data, x = 'time_from_stim' , y = 'dF_F', hue = 'exposure', style = 'omitted')
plt.xlabel('Time from image onset (s)')
plt.ylabel('dF/F')
plt.title('SST cell calcium responses')

# Save, adjusting with bbox_inches to fix cut-off text
plt.savefig('my_figure.pdf', dpi = 300, bbox_inches = 'tight')
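To answer "where did the file go?" in code: a relative filename such as `'my_figure.png'` is written to the current working directory (in Colab this is usually `/content`, the folder the file panel opens on). The sketch below, with made-up data, prints that directory and saves into a temporary folder (via `tempfile`, an arbitrary choice so the example leaves no files behind). For a vector format like PDF, `dpi` mainly affects any rasterized parts of the figure.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot([0, 1, 2], [0.0, 0.3, 0.1])   # made-up data
ax.set_xlabel("Time from image onset (s)")
ax.set_ylabel("dF/F")

# Relative filenames are resolved against this folder
print(os.getcwd())

# Save into a temporary folder so this example leaves no files behind
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "my_figure.png")
pdf_path = os.path.join(out_dir, "my_figure.pdf")
fig.savefig(png_path, dpi=300, bbox_inches="tight")
fig.savefig(pdf_path, bbox_inches="tight")

print(os.path.abspath(png_path))       # full path of the saved PNG
```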

In [None]:
# Fun =)
from IPython.display import HTML
print('Great work today!')
HTML('<img src="https://media.giphy.com/media/jkvmzOg3LtpF6/giphy.gif">')


-----------

# Technical notes & credits


Much more information can be found in the [Allen Brain Institute whitepaper](https://brainmapportal-live-4cc80a57cd6e400d854-f7fdcae.divio-media.net/filer_public/4e/be/4ebe2911-bd38-4230-86c8-01a86cfd758e/visual_behavior_2p_technical_whitepaper.pdf) as well as in their <a href="http://allensdk.readthedocs.io/en/latest/visual_behavior_optical_physiology.html"> documentation</a>.


This file was developed from the [Allen Institute Notebooks](https://allensdk.readthedocs.io/en/latest/visual_behavior_optical_physiology.html), [Neuromatch Academy](https://compneuro.neuromatch.io/projects/neurons/README.html#allen-institute), and the [Columbia-Neuropythonistas repository](https://github.com/Columbia-Neuropythonistas). 