<a href="https://colab.research.google.com/github/tmckim/materials-fa23-colab/blob/main/projects/project2/Project2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project 2: Defining cell types by their electrophysiology

### The brain is complex and contains thousands of cells. How do we make sense of this information in a meaningful way?


We can define neurons by features such as:
* <b>gene expression patterns</b>
* <b>electrophysiology features</b>
* <b>structure</b>

We'll use these three features to compare and contrast cell types in the brain.

<b>Note</b>: there are additional ways to classify neurons as well, including function, location, neurotransmitters, and morphology!


This notebook will help us investigate specific features in the [electrophysiology cell types dataset](https://celltypes.brain-map.org/) from the Allen Institute for Brain Science.

You will apply the coding skills you've learned in Python along the way.



---
### Prior activities that provide a foundation for using this notebook:
- Complete [the web-based activity from the Project Instructions](https://drive.google.com/file/d/1zFVaZknIxaZ1rkTJXceVxQEzu5vL2jGj/view?usp=sharing), which asks you to look at and work with this data on the Allen Institute website.
- Work through lab08 from the course that introduced `pandas` and related it to what you've already learned in the `datascience` package

<hr>

**Learning Objectives:**


*   Understand the metrics that we can use to compare cell types
*   Practice and apply Python coding skills to access the AllenSDK
*   Compare electrophysiological characteristics of neurons between humans and mice






### Logistics

**Rules.** Don't share your code with anybody but your partner. You are welcome to discuss questions with other students, but don't share the answers. The experience of solving the problems in this project will prepare you for exams (and life). If someone asks you for the answer, resist! Instead, you can demonstrate how you would solve a similar problem.

**Support.** You are not alone! Come to office hours and talk to your classmates. If you want to ask about the details of your solution to a problem, come see me. If you're ever feeling overwhelmed or don't know how to make progress, email for help.


**Free Response Questions:** Make sure that you put the answers to the written questions in the indicated cell we provide. **Every free response question should include an explanation** that adequately answers the question.


**Advice.** Develop your answers incrementally. Break your code up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names, etc. if you want to in the provided cells. Make sure that you are using distinct and meaningful variable names throughout the notebook.

You **never** have to use just one line in this project or any others. Use intermediate variables and multiple lines as much as you would like!

## Before you start - Save this notebook!

When you open a new Colab notebook from WebCampus (as you hopefully did for this one), you cannot save changes. So it's best to store the Colab notebook in your personal drive `"File > Save a copy in drive..."` **before** you do anything else.

The file will open in a new tab in your web browser, and it is automatically named something like: "**Copy of Project2.ipynb**". You can rename this to just the title of the assignment "**Project2.ipynb**". Make sure you do keep an informative name (like the name of the assignment) so that you know which files to submit back to WebCampus for grading! More instructions on this are at the end of the notebook.


**Where does the notebook get saved in Google Drive?**

By default, the notebook will be copied to a folder called “Colab Notebooks” at the root (home directory) of your Google Drive. If you use this for other courses or personal code notebooks, I recommend creating a folder for this course and then moving the assignments AFTER you have completed them.

## Setup

**Do not skip this!**

1. [Set up our coding environment](#setup)

## Part 1

**Data from a single cell**

Steps 2-4 are a demonstration of the same cell from the [web-based activity](add link).

2. [Import data for a single cell](#import)
3. [Plot a raw sweep of data](#plotsweep)
4. [Plot the morphology of the cell](#morphology)

Complete Part 1 to reach the checkpoint for this project. Submit your notebook in WebCampus after this is completed (more below).

## Part 2

**Data from many cells**

*Steps 5-7: we will review and plot pre-computed features for all of the cells in the database.*

5. [Analyze computed features](#metrics)
6. [Compare action potential waveforms](#waveforms)
7. [Compare cell types](#compare)

Complete Part 2 to finish this project. Submit your final notebook in WebCampus after this is completed (more below).

<hr>

<a id="setup"></a>

## Step 1. Set up coding environment
Each time we start an analysis in Python, we must import the necessary code packages. The cells below will install packages into your coding environment. As a reminder, these are not installed on your computer.

### Install the AllenSDK
The Allen Institute has compiled a set of code and tools called a **Software Development Kit** (SDK). These tools will help us import and analyze the cell types data. See [Technical Notes](#technical) at the end of this notebook for more information about working with the AllenSDK.

><b>Task</b>: Run the cell below, which will install the allensdk into your coding environment.

**Technical notes about installing the allensdk**
- If you receive an error, contact your instructor and also check out the documentation <a href="http://alleninstitute.github.io/AllenSDK/install.html">here</a>.

In [None]:
# Install the AllenSDK and other packages we need
# Keep this to a minimum (no-deps)

!pip install allensdk --no-deps
!pip install SimpleITK
!pip install pynwb
!pip install simplejson
!pip install requests_toolbelt

**Note**: You may receive notifications from above that not all package dependencies were installed. In this case, it should be okay to run this notebook. It was tested with the minimum requirements and ran without issues. <br>
If you do run into issues, contact the instructor for help and solutions right away! <br>

You can review the images here and see if it is similar to what appears for you. If it is, then the notebook should work.
![](https://drive.google.com/uc?export=view&id=1Nge8zf_USFaQLd2wxFhNmjzsKRMqwc58)

![](https://drive.google.com/uc?export=view&id=1RNoFvTPcf_w80e9bbUKgRfUEPd55pC14)

In [None]:
# Ensure that the AllenSDK is installed
try:
    import allensdk
    print('allensdk imported')
except ImportError as e:
    !pip install allensdk

><b>Task</b>: Run the cell below -- any packages that are missing will be installed for you.


*   We also need to make sure that our coding environment has [NumPy](https://numpy.org/), [Pandas](https://pandas.pydata.org/), and [Matplotlib](https://matplotlib.org/) already installed.




In [None]:
# Ensure that NumPy, Pandas, and Matplotlib are installed
try:
    import numpy
    print('numpy already installed')
except ImportError as e:
    !pip install numpy
try:
    import pandas
    print('pandas already installed')
except ImportError as e:
    !pip install pandas
try:
    import matplotlib
    print('matplotlib already installed')
except ImportError as e:
    !pip install matplotlib
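
The cell above tries to import each package and installs it only if the import fails. As a side note, a similar check can be sketched with the standard library's `importlib.util.find_spec`, which looks a package up without importing it. The helper name `missing_packages` and the second package name are made up for illustration; this is not part of the course setup.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'json' ships with Python; the second name is an invented package used only for illustration
print(missing_packages(['json', 'hypothetical_package_that_does_not_exist']))
```

Anything that appears in the printed list would still need to be installed with `pip`.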

### Import common packages
Below, we'll `import` a common selection of packages that will help us analyze and plot our data. We'll also configure the plotting in our notebook.


><b>Task</b>: Import the numpy module nicknamed as <code>np</code>. Add a <code>print</code> message at the end that says "Packages imported!" so that you know the code ran.


>> <u>Hint</u>: You've done this many times in past notebooks. Look back at one of those if you are unsure.

In [None]:
# Import our plotting package from matplotlib
import matplotlib.pyplot as plt

# Specify that all plots will happen inline & in high resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Import pandas for working with databases
import pandas as pd

# Import numpy below as np
...

# Add your print() statement below
...


### Import the CellTypesCache from the allensdk
With the allensdk installed, we can `import` the **CellTypesCache** class.

The CellTypesCache that we're importing provides tools to allow us to get information from the cell types database. We're giving it a **manifest** filename as well. CellTypesCache will create this manifest file, which contains metadata about the cache. If you want, you can look in the cell_types folder in your code directory and take a look at the file.

><b>Task</b>: Run the cell below.

In [None]:
# Import the "Cell Types Cache" from the AllenSDK core package
from allensdk.core.cell_types_cache import CellTypesCache

# Initialize the cache as 'ctc' (cell types cache)
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')

print('CellTypesCache imported.')

# Part 1

<a id="import"></a>

## Step 2. Import Cell Types data
Now that we have the module that we need, let's import a raw sweep of the data. The cell below will grab the data for the same experiment you just looked at on the website. This data is in the form of a [**Neurodata Without Borders** (NWB)](https://www.nwb.org/) file.

><b>Task</b>: Find the cell specimen ID for the first cell you looked at on the Allen Institute website and assign this to <code>cell_id</code> below by replacing the <code>...</code>. Run the cell.

>><u>Hint</u>: The ID is also in the URL.


*Note*: This might take a minute or two. You should wait until you have a green checkmark in the upper right of the Colab window to continue.

In [None]:
# Enter your cell_id below
cell_id = ...

# Get the electrophysiology (ephys) data for that cell
data = ctc.get_ephys_data(cell_id)
print('Data retrieved')

Thankfully, our NWB file has some built-in **methods** to enable us to pull out a recording sweep. We can access methods of objects like our `data` object by adding a period, and then the method. That's what we're doing below, with `data.get_sweep()`.


><b>Task:</b> Choose your favorite sweep below.

>><u>Hint</u>: Go back to the website to see what the sweep numbers are.
    
<i>Note</i>: If you get an `H5pyDeprecationWarning`, don't worry about it - this is out of our control.

In [None]:
# Assign your favorite sweep number to a variable "sweep_number" below.
sweep_number = ...

sweep_data = data.get_sweep(sweep_number)
print('Sweep obtained')

In [None]:
# Take a look at the data - this looks a little different than the data we've previously worked with in tables and arrays
# This is another format of data in Python
sweep_data

In [None]:
# Notice it's in the format 'dict' with {key:value} pairs
type(sweep_data)

Refer back to lab08 if you need a refresher on this new type in Python or google any questions you have :)
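
As a quick refresher, here is a minimal sketch of how a dictionary like this behaves. The dictionary `example_sweep` is a made-up stand-in: its key names mirror the ones this notebook uses, but the values are invented and far shorter than a real recording.

```python
import numpy as np

# A made-up stand-in for a sweep (invented values, not data from the Allen Institute)
example_sweep = {
    'stimulus': np.zeros(5),           # injected current, in amps
    'response': np.full(5, -0.070),    # membrane potential, in volts
    'sampling_rate': 50000.0,          # samples per second (Hz)
}

print(list(example_sweep.keys()))          # every key stored in the dictionary
print(example_sweep['sampling_rate'])      # look up one value by its key
print(type(example_sweep['response']))     # in this example, the trace is a NumPy array
```

Each value is retrieved by placing its key in square brackets after the dictionary name.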

<a id="plotsweep"></a>
## Step 3. Plot a raw sweep of data
So far, we:

*   loaded the data
*   chose a cell ID
*   chose a sweep number

Now, let's plot that data.

><b>Task:</b> Run the cell below to get the stimulus and recorded response information from the dataset.

In [None]:
# Get the stimulus trace (in amps) and convert to pA
stim_current = sweep_data['stimulus'] * 1e12

# Get the voltage trace (in volts) and convert to mV
response_voltage = sweep_data['response'] * 1e3

# Get the sampling rate and create a time axis for our data
sampling_rate = sweep_data['sampling_rate'] # in Hz
timestamps = (np.arange(0, len(response_voltage)) * (1.0 / sampling_rate))
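
To see what the unit conversions and the time axis do, here is a small sketch on a handful of invented numbers. The sampling rate and the trace values are assumptions chosen for readability, not values from the dataset.

```python
import numpy as np

# Invented values for illustration only (not from the Allen dataset)
example_rate = 10000.0                                       # hypothetical sampling rate in Hz
example_volts = np.array([-0.070, -0.069, 0.020, -0.072])    # hypothetical voltages in volts
example_amps = np.array([0.0, 1e-10, 1e-10, 0.0])            # hypothetical currents in amps

example_mv = example_volts * 1e3        # volts -> millivolts
example_pa = example_amps * 1e12        # amps -> picoamps

# Sample number i was recorded at i / sampling_rate seconds
example_times = np.arange(len(example_mv)) / example_rate

print(example_mv)
print(example_pa)
print(example_times)
```

Dividing the sample number by the sampling rate is equivalent to multiplying by `1.0 / sampling_rate`, as the cell above does.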

><b>Task</b>: In the cell below, use <code>plt.plot(x,y)</code> to plot our voltage trace.
>- You will need to give it two arguments, which are variables we created above: <code>timestamps</code> (x axis) and <code>response_voltage</code> (y axis).
>- Without changing the limits on the x-axis, you won't be able to see individual action potentials.
>- Modify the x-axis using <code>plt.xlim([min,max])</code> to specify the limits by replacing <code>min</code> and <code>max</code> with numbers that make sense for this x-axis.
>- Add correct axis labels using `plt.xlabel()` and `plt.ylabel()`
>- Add a title using `plt.title()` that includes text about what you are plotting

Here is the documentation for `matplotlib.pyplot`: https://matplotlib.org/stable/tutorials/pyplot.html#sphx-glr-tutorials-pyplot-py

In [None]:
# Plot the raw recording here
plt.plot(x, y)                # change these to correct variable names
plt.xlabel(...)               # add the string to describe the x-axis values
plt.ylabel(...)               # add the string to describe the y-axis values
plt.title(...)                # Add a descriptive title


Here is an example plotting both the stimulus and the response.
>**Task**: Just run this cell.

In [None]:
# More complex- plot both stim and resp
# Set up our plot- general layout parameters
fig, axes = plt.subplots(2, 1, sharex=True,figsize=(8,6))

# axes 0 is our first plot, of the recorded voltage data
axes[0].plot(timestamps, response_voltage, color='blue')    # code to actually plot
axes[0].set_ylabel('mV')                                    # make sure to label our y-axis values
axes[0].set_xlim(0,3)                                       # determines the scaling of the x-axis
axes[0].set_title('whole-cell patch recording')             # make sure to set a title for what we are looking at

# axes 1 is our second plot, of the stimulus trace
axes[1].plot(timestamps, stim_current, color='gray')        # code to actually plot
axes[1].set_ylabel('pA')                                    # make sure to label our y-axis values- different units from above!
axes[1].set_xlabel('seconds')                               # label here and not above because they are shared axis values. It will look best at bottom and not cluttered
axes[1].set_title('stimulus')                               # set the title- it's not the same as the first plot!

plt.show()

Review this combined plot. Does the stimulus shape match the sweep number you selected? For example, the sweep numbers vary and are associated with the 'stimulus type'. The options included:

*   Long square
*   Noise 1
*   Noise 2
*   Short square

![](https://drive.google.com/uc?export=view&id=1Av1UTB1tr2v9gciQ4FesozCicKI8CDD1)



><b>Task</b>: In the cells below, select a **different sweep number** from **a different stimulus type** and plot below to compare.

You can refer back to the Allen Cell Types Dataset website [here](https://celltypes.brain-map.org/experiment/electrophysiology/474626527). <br>
Make sure to check the cell ID so it is consistent with what you input to the notebook at the beginning! <br>
You can 'Browse Electrophysiology Data' to review the sweeps and also change the stimulus type to see what your plot should look like at the end of the steps below!

>**Task**: We will repeat the process above. You can copy and paste the same code, but you need to rename your variables so the names do not overwrite your original. #Comments provided below - replace the `...` with the correct information.

**Step 1**: Create a new variable below to hold the new sweep number so it doesn't overwrite your original choice.

In [None]:
# Assign your favorite sweep number to a variable "a_new_sweep_number" below.
a_new_sweep_number = ...

a_new_sweep_data = data.get_sweep(...) # insert the correct variable name here
print('Another sweep obtained')

**Step 2**: Get the information needed about this sweep for plotting. Create new variables so we don't overwrite the original data.

In [None]:
# Get the stimulus trace (in amps) and convert to pA
new_stim_current = a_new_sweep_data['...'] * 1e12          # insert the key for the stimulus trace

# Get the voltage trace (in volts) and convert to mV
new_response_voltage = a_new_sweep_data['...'] * 1e3       # insert the key for the response trace. This should match the line right above, except we select the 'response' key instead of 'stimulus'

# Get the sampling rate and create a time axis for our data
new_sampling_rate = ...['sampling_rate']           # insert the variable name for our new data. This value has units of Hertz (Hz)
new_timestamps = (np.arange(0, len(...)) * (1.0 / ...)) # insert variables we created in the lines right before this. If you're unsure, look back up at the original code, but use the modified names

**Step 3**: Adapt your plotting code from above to plot the new sweep number data you selected. Adjust the x-axis as needed to see the data here.

In [None]:
# Plot both stim and resp
#Set up our plot
fig, axes = plt.subplots(2, 1, sharex=True,figsize=(8,6))

# axes 0 is our first plot, of the recorded voltage data
axes[0].plot(..., ..., color='magenta')             # insert new variable names
axes[0].set_ylabel(...)
axes[0].set_xlim(...,...)                           # adjust as needed to set the scaling of the x-axis
axes[0].set_title(...)

# axes 1 is our second plot, of the stimulus trace
axes[1].plot(..., ..., color='black')                   # insert new variables names
axes[1].set_ylabel(...)
axes[1].set_xlabel(...)
axes[1].set_title(...)

plt.show()

Your plot should look different from the first one you created above. If it is exactly the same, then you need to check that you changed the correct variables in selecting a new sweep and the plotting code.

Does your plot match what you saw on the website? Check that your x-axis limits (`set_xlim`) allow you to see enough of the plot. If they don't, go back and change those numbers to expand your axis values.

<a id="morphology"></a>
## Step 4. Plot the morphology of the cell
The Cell Types Database also contains **3D reconstructions** of neuronal morphologies. Here, we'll plot the reconstruction of our cell's morphology. We took a look at these already when interacting with the website and completing the Data Sheet.

We will now use code to produce plots!

*Note*: It could take up to several minutes to run the cell below, possibly longer over a slow internet connection.
Try it out. If it doesn't work for you, that is okay- it is a less fancy version of what is already on the website!

><b>Task</b>: Just run this cell and review the plot.

In [None]:
# Import necessary toolbox
from allensdk.core.swc import Marker

# Download and open morphology and marker files
morphology = ctc.get_reconstruction(cell_id)
markers = ctc.get_reconstruction_markers(cell_id)

# Set up our plot
fig, axes = plt.subplots(1, 2, sharey=True, sharex=True, figsize=(10,10))
axes[0].set_aspect('equal')
axes[1].set_aspect('equal')

# Make a line drawing of x-y and y-z views
for n in morphology.compartment_list:
    for c in morphology.children_of(n):
        axes[0].plot([n['x'], c['x']], [n['y'], c['y']], color='black')
        axes[1].plot([n['z'], c['z']], [n['y'], c['y']], color='black')

# cut dendrite markers
dm = [ m for m in markers if m['name'] == Marker.CUT_DENDRITE ]
axes[0].scatter([m['x'] for m in dm], [m['y'] for m in dm], color='#3333ff') # blue
axes[1].scatter([m['z'] for m in dm], [m['y'] for m in dm], color='#3333ff')

# no reconstruction markers
nm = [ m for m in markers if m['name'] == Marker.NO_RECONSTRUCTION ]
axes[0].scatter([m['x'] for m in nm], [m['y'] for m in nm], color='#333333') # grey
axes[1].scatter([m['z'] for m in nm], [m['y'] for m in nm], color='#333333')

axes[0].set_ylabel('y')
axes[0].set_xlabel('x')
axes[1].set_xlabel('z')
plt.show()

Notes on the plot above: We used the marker file, which contains information about the reconstruction. The blue circles are locations where dendrites were truncated due to slicing. The grey circles mark locations where axons were not reconstructed. Depending on your cell of interest, these markers may or may not appear on the image, since not every cell has them in its data!
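
The line drawing above works by connecting every compartment to each of its children. Here is a minimal sketch of that idea using plain Python dictionaries. The `compartments` list, its field names, and its coordinates are all hypothetical; the objects returned by the AllenSDK are richer than this.

```python
# Hypothetical compartments: each has an id, 3D coordinates, and the id of its parent
# (-1 marks the root). These are made-up values, not an AllenSDK data structure.
compartments = [
    {'id': 1, 'x': 0.0, 'y': 0.0, 'z': 0.0, 'parent': -1},
    {'id': 2, 'x': 0.0, 'y': 10.0, 'z': 1.0, 'parent': 1},
    {'id': 3, 'x': 5.0, 'y': 18.0, 'z': 2.0, 'parent': 2},
    {'id': 4, 'x': -4.0, 'y': 17.0, 'z': 0.5, 'parent': 2},
]

# Look-up table from compartment id to the compartment itself
by_id = {c['id']: c for c in compartments}

# One line segment per parent-child pair, projected onto the x-y plane
xy_segments = [
    ((by_id[c['parent']]['x'], by_id[c['parent']]['y']), (c['x'], c['y']))
    for c in compartments
    if c['parent'] in by_id
]

for start, end in xy_segments:
    print(start, '->', end)
```

Drawing each segment as a short line reproduces the branching shape; swapping `x` for `z` would give the second view.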

Great job! You have reached the checkpoint. Please save your work and submit your notebook with code completed up until this point.

---
## Checkpoint Submission
Please submit the above sections of your completed notebook for Part 1 as a checkpoint to the project. Follow the instructions below.

### **Important submission steps:**
1. Choose **Save** (and make sure you've already saved a copy in your drive) from the **File** menu.
2. You will save two files in the following steps.
3. You will submit the two files for this assignment to the corresponding Assignment on the WebCampus (Canvas) course website.

**It is your responsibility to make sure your work is saved before following the instructions in the last cell.**



Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output.
**Please save (or check again) before exporting!**
You will save two files:

1.   Go to `"File > Download"` and choose the **.ipynb format** (first option)
  - This will save a copy of the python notebook file- extension .ipynb- in the Downloads folder on your computer (or wherever you have opted to save files)


2.  Go to `"File > Print"` and save a copy of your notebook in **PDF format**. This is needed for grading the answers by hand as a double check, and to specifically grade any written responses.

<hr>

# Part 2

<a id="metrics"></a>
## Step 5. Analyze pre-computed features

The Cell Types Database contains a set of features that have already been computed, which could serve as good starting points for analysis. We can query the database to get these features. Below, we'll use the Pandas package that we imported above to create a **[dataframe](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe)** of our data. We practiced working with pandas during lab time for the course the other week.

><b>Task</b>: Run the cell below. Note we have a new method to access the ephys data called `ctc.get_ephys_features()`. Scroll to the right to see many of the different features available in this dataset.

*Note*: It may take ~10 seconds. The number of cells with available features will be printed, followed by a dataframe, which looks like the tables you've worked with before.

In [None]:
# Download all electrophysiology features for all cells using `.get_ephys_features`
ephys_features = ctc.get_ephys_features()
dataframe = pd.DataFrame(ephys_features).set_index('specimen_id', drop=False)   # this is the main dataset (dataframe in pandas) we will work with

print('Ephys features available for %d cells:' % len(dataframe))
dataframe.head()                                                    # Just show the first 5 rows (the head) of our dataframe
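
Here is a minimal sketch of what `pd.DataFrame(...).set_index('specimen_id', drop=False)` does, assuming the query returns one dictionary per cell. The records, IDs, and values below are invented for illustration.

```python
import pandas as pd

# Made-up records standing in for the per-cell dictionaries returned by the query
example_records = [
    {'specimen_id': 101, 'vrest': -70.2, 'tau': 21.5},
    {'specimen_id': 102, 'vrest': -65.8, 'tau': 14.0},
    {'specimen_id': 103, 'vrest': -72.1, 'tau': 30.3},
]

# drop=False keeps 'specimen_id' as a regular column in addition to using it as the index
records_df = pd.DataFrame(example_records).set_index('specimen_id', drop=False)

print(records_df.shape)           # (number of rows, number of columns)
print(records_df.loc[102])        # look up one row by its index label
```

Keeping `specimen_id` as both the index and a column means the ID can label rows and still be used in comparisons.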

What if we want to see **all** columns in our dataframe? Notice there are 56 columns, and not all of them are shown. This is indicated by the `...` in the middle of the table column labels.

><b>Task</b>: Fill in the code to show all of the columns in the dataframe. There are multiple ways to do this! Try googling how to do this with a dataframe in pandas, or you can go directly to the pandas documentation and search.

>><u>Hint</u>: I'm not looking for a specific format here. They can be a list, an object, etc. The output should show all the names. You can refer back to lab08 for an example.

In [None]:
# Viewing *all* column labels from our dataframe
...

In [None]:
# Examine the data for our cell_id from above
cell_ephys_features = dataframe[dataframe['specimen_id'] == cell_id]
cell_ephys_features

><b>Task</b>: Demonstrate your understanding of the line of code above by writing a short description of what is happening in Python. You can always refer back to the lab for similar examples.

>><u>Hint</u>: Try starting with the part in the blue brackets first: `dataframe['specimen_id'] == cell_id`. State what that does and/or returns. What is the name of that type of variable that is returned with the `==` sign? How do those values then relate to what we get out from our `dataframe` as the `cell_ephys_features` that is printed out for you? Feel free to test it out with lines of code below to 'show your work'. <br>
Refer back to lab08 if you need another example.

*Type your response here*

In [None]:
# Add your code here if you would like
...

In [None]:
# Add your code here if you would like
...

As you can see in the dataframe above, there are many **pre-computed features** available in this dataset. [Here's a glossary](https://drive.google.com/file/d/1yBfYm1yMtFSFB2erhfZ0SpeeuoWJNMEk/view?usp=sharing), in case you're curious.

![](https://drive.google.com/uc?export=view&id=1bCj0kl4Dd5J_Qf2DlmxRUva57yK5xVf_)


Image from the <a href="https://help.brain-map.org/download/attachments/8323525/CellTypes_Ephys_Overview.pdf?version=2&modificationDate=1508180425883&api=v2">Allen Institute Cell Types Database Technical Whitepaper.</a>
<br><br>


Let's first look at the **depth of the fast trough**, and the **ratio between the upstroke and downstroke** of the action potential:
- **Action potential fast trough** (<code>fast_trough_v_long_square</code>): Minimum value of the membrane potential in the interval lasting 5 ms after the peak.
- **Upstroke/downstroke ratio** (<code>upstroke_downstroke_ratio_long_square</code>): The ratio between the absolute values of the action potential peak upstroke and the action potential peak downstroke.
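
To make these two definitions concrete, here is a simplified sketch that computes both on a synthetic action potential. The waveform is invented, and the calculation is only an approximation of the definitions above; it is not the Allen Institute's feature-extraction code, which works on detected spikes with additional rules.

```python
import numpy as np

# Synthetic action potential (invented shape, not real data)
t_ms = np.arange(0.0, 20.0, 0.01)                 # time in milliseconds
peak_time = 5.0
width = np.where(t_ms < peak_time, 0.2, 0.5)      # fast rise, slower fall
spike = 100.0 * np.exp(-(t_ms - peak_time) ** 2 / (2 * width ** 2))
after_dip = -10.0 * np.exp(-(t_ms - (peak_time + 2.0)) ** 2 / 2.0)
v_mv = -70.0 + spike + after_dip                  # membrane potential in mV

# Fast trough: minimum voltage in the 5 ms after the peak
peak_index = np.argmax(v_mv)
in_window = (t_ms > t_ms[peak_index]) & (t_ms <= t_ms[peak_index] + 5.0)
fast_trough_mv = v_mv[in_window].min()

# Upstroke/downstroke ratio: steepest rise divided by steepest fall (absolute values)
dv_dt = np.gradient(v_mv, t_ms)                   # slope in mV per ms
ratio = abs(dv_dt.max()) / abs(dv_dt.min())

print(round(fast_trough_mv, 1), round(ratio, 2))
```

Because this synthetic spike rises faster than it falls, the ratio comes out greater than 1. A spike that repolarizes as quickly as it depolarizes would give a ratio close to 1.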

We created a pandas dataframe above of all of these features. Here, we'll assign the columns we're interested in to two different **variables**, so that they will contain the datapoints we're interested in. Remember, we can access different columns of the dataframe by using the syntax `dataframe['column_of_interest']`. The columns of interest here are `fast_trough_v_long_square` and `upstroke_downstroke_ratio_long_square`.

><b>Task:</b> Edit and run the cell below to store these columns into our two new variables.

In [None]:
fast_trough = ...
upstroke_downstroke = ...

In [None]:
# Show the data we just extracted
fast_trough

Note that when working with dataframes, the index values (we have labeled them with our `specimen_id`) also appear when you select certain columns you want to work with. This is a good way to help you keep track of the data.
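
Here is a small sketch of that behavior on a made-up table. The column names `feature_a` and `feature_b` and the specimen IDs are hypothetical.

```python
import pandas as pd

# Made-up table indexed by a hypothetical specimen ID
indexed_df = pd.DataFrame(
    {'feature_a': [1.5, 2.5, 3.5], 'feature_b': [10, 20, 30]},
    index=pd.Index([101, 102, 103], name='specimen_id'),
)

one_column = indexed_df['feature_a']    # selecting a single column gives a pandas Series

print(type(one_column).__name__)        # the type of the selected column
print(one_column)                       # the specimen_id labels travel with the values
print(one_column.loc[102])              # values can be looked up by index label
```

Because the labels stay attached, two columns taken from the same dataframe stay matched up row by row when you plot one against the other.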

><b>Task:</b> Create a scatterplot that plots the **fast trough** (x axis) versus the **upstroke-downstroke ratio** (y axis). Label your axes accordingly using `plt.xlabel()` and `plt.ylabel()`.
    
<u>Hint</u>: If you need help, see the [`plt.scatter()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html), [`plt.xlabel` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html), and [`plt.ylabel` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html).

In [None]:
# Your scatterplot here
...

It looks like there may be roughly two clusters in the data above. Maybe they relate to whether the cells are presumably excitatory (spiny) cells or inhibitory (aspiny) cells. Let's query the API and split up the two sets to see.

><b>Task:</b> The cell below will load new data that contains the dendrite type of these cells using `.get_cells()`. It will then make this a dataframe we can view. Review the code comments and then run the cells.

In [None]:
# Get information about our cells' dendrites
# this is different info that we will need to incorporate into the dataframe we've already been working with
# we call the `Cell Types Cache (ctc)` and use  `.get_cells()`
cells = ctc.get_cells()


In [None]:
# Make this a dataframe so we can view the data
cells = pd.DataFrame(cells)
cells

You can see we have a lot of information here, but not as many columns as the previous data we were working with. Some of it isn't relevant for right now. <br> <br>
What we need to do is to combine two dataframes to get all the info we need to plot. We will use `join` to do this!
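
As a minimal sketch of how an index-based `join` lines up rows, here are two small made-up tables. The column names echo the ones in this notebook, but the IDs and values are invented.

```python
import pandas as pd

# Made-up tables; the values are invented for illustration
features = pd.DataFrame(
    {'specimen_id': [101, 102, 103], 'vrest': [-70.2, -65.8, -72.1]}
).set_index('specimen_id', drop=False)

cell_info = pd.DataFrame(
    {'id': [103, 101, 102], 'dendrite_type': ['spiny', 'aspiny', 'spiny']}
)

# join() lines rows up by index label, so the row order in the two tables does not matter
combined = features.join(cell_info.set_index('id'))

print(combined)
```

Each row of `features` picks up the `dendrite_type` whose `id` matches its own index label, even though the two tables list the cells in a different order.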

><b>Task:</b> The cell below will join our dataframes. Review the comments, run the cell, and check out the columns in the dataframe.

In [None]:
# Use join to combine our dataframes
# dataframe is the original we've been working with so far
# cells is the new one we just imported and want to combine with
# setting the index of cells to be the 'id', indicates which column to join on
# id column from cells dataframe is equal to specimen_id column from dataframe
full_dataframe = dataframe.join(cells.set_index('id'))
full_dataframe

We now have even more columns of data to work with, but again, you can't see them all.

><b>Task:</b> Repeat what you did above in **Step 5** to show all of the column names. Notice that the new columns got added to where the previous ones ended (after `vrest`). Refer back to lab08 if you need an example of how to do this.

In [None]:
# Show all columns of our full_dataframe
...

Now it's time to plot the data again, but using the information we have about `dendrite_type` on our plot.

><b>Task:</b> The cell below creates a similar scatterplot, where each dot is colored by dendrite type. Note the column name (`dendrite_type`) and the two cell type strings from that column that are used to split the data. <br>
Just run this cell.

In [None]:
# Create a dataframe for spiny cells, and a dataframe for aspiny cells
spiny_df = full_dataframe[full_dataframe['dendrite_type'] == 'spiny']
aspiny_df = full_dataframe[full_dataframe['dendrite_type'] == 'aspiny']

# Create our plot! Calling scatter twice like this will draw both of these on the same plot.
plt.scatter(spiny_df['fast_trough_v_long_square'],spiny_df['upstroke_downstroke_ratio_long_square'], c = '#d95f02') # orange
plt.scatter(aspiny_df['fast_trough_v_long_square'],aspiny_df['upstroke_downstroke_ratio_long_square'], c = '#7570b3') # purple

plt.ylabel('upstroke-downstroke ratio')
plt.xlabel('fast trough depth (mV)')
plt.legend(['Spiny','Aspiny'])

plt.show()

><b>Task</b>: Demonstrate your understanding of the two lines of code above where we define `spiny_df` and `aspiny_df`. Write a short description of what is happening in Python. You can always refer back to the lab for similar examples.

>><u>Hint</u>: This is similar to a previous question when we were looking for the row with a certain cell ID. In this case, we are taking a subset of the data and getting many values for each `dendrite_type`. Feel free to test it out with lines of code below to 'show your work'. Refer to previous questions that were similar, or back to lab08 for another example.

*Type your response here*

In [None]:
# Add your code here if you would like
...

In [None]:
# Add your code here if you would like
...

Looks like these two clusters do partially relate to the dendritic type.

Cells with spiny dendrites (which are typically excitatory cells) have a big ratio of upstroke:downstroke, and a more shallow trough (less negative).

Cells with aspiny dendrites (typically inhibitory cells) are a little bit more varied. But <i>only</i> aspiny cells have a low upstroke:downstroke ratio and a deeper trough (more negative). Look at the very bottom and left of the plot, where there are only purple values.

><b>Task:</b> What else can we say about the aspiny dendrites compared to the spiny dendrites? Review the plot and describe where you see purple (aspiny) versus orange (spiny).

*Type your answer here*

<a id="waveforms"></a>

## Step 6. Compare waveforms
Let's take a closer look at the action potentials of these cells to see what these features actually mean for the action potential waveform. We will choose one of the cells with the highest upstroke:downstroke ratio. Our first line of code, where it says `dataframe.sort_values()`, is the line of code that will arrange our dataframe by the **upstroke_downstroke_ratio_long_square** column.


><b>Task</b>: To choose one of the cells with the highest `upstroke_downstroke_ratio_long_square`, we need a way to arrange our data based on this value. If we want the **highest values** at the top of this column in our dataframe, what method can we use to put them in this order?

<u>Hint</u>: We've done this many times with the `datascience` package, and the name of the method is very similar, with only a slight difference in the name. Think about what steps you would go through if you wanted to find the `max` value in a column from your table (it's not `max` though).
If you aren't sure how to order your data from highest to lowest in that column, review the previous lab notebook.



In [None]:
# Fill in the code below at locations with `...`
... = dataframe.insert_method_here('upstroke_downstroke_ratio_long_square',ascending=...)

In [None]:
# Assign one of the top cells in our dataframe (default = 2)
specimen_id = sorted_dataframe.index[2]                     # the index holds the specimen IDs; [2] picks the third entry because we index starting with 0

# Also get the ratio value so we can print the info together
ratio = sorted_dataframe.iloc[2]['upstroke_downstroke_ratio_long_square']

# Print our results so that we can see them
print('Specimen ID: ' + str(specimen_id) + ' with upstroke-downstroke ratio: ' + str(ratio))

Notice we've used several methods we've practiced many times with the `datascience` package, and now you know how to do these in `pandas` too!

><b>Task</b>: Demonstrate your understanding of the line of code above that obtained the `ratio`.

>><u>Hint</u>: Try starting with the part `ratio = sorted_dataframe.iloc[2]`. What does this do? Test that part out in the cells below to show your work. Also describe in words what it does. After this, try out the remainder of the code using the `ratio` variable you just created and referencing the column name like it appears above [`upstroke_downstroke_ratio_long_square`]. Again, write briefly what this part does. Use the empty cells below to fill in your code and either write comments or type an answer into the textbox. <br>
Refer back to lab08 for an example.
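
If it helps to see the same pattern on something smaller first, here is a minimal sketch using a made-up table. The dataframe `toy`, its column names, and its ID values are all hypothetical and are not part of the project data. It shows that `.iloc[position]` picks out a row by its position, that `.index[position]` gives the label of that row, and that adding a column name in square brackets pulls a single value out of the row.

```python
import pandas as pd

# Hypothetical toy data: the IDs and values are made up for illustration
toy = pd.DataFrame(
    {"height_cm": [150, 172, 181, 165], "age": [31, 45, 27, 52]},
    index=[901, 902, 903, 904],
)

# Row at position 2 (the third row, since positions start at 0)
third_row = toy.iloc[2]

# Index label of that same row
third_label = toy.index[2]

# One value from that row, selected by column name
third_height = toy.iloc[2]["height_cm"]

print(type(third_row).__name__)
print(third_label, third_height)
```

Try printing `third_row` on its own to see what kind of object a single row is before you pull one value out of it.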

In [None]:
# Explanation and code part 1
...

In [None]:
# Explanation and code part 2
...

*If you don't comment your code to explain what it does, write out your description of each step here*

Now we can take a closer look at the action potential for that cell by grabbing a raw sweep of recording from it, just like we did above. You will just use the sweep that is already set up for you.

><b>Task:</b> Run the cell below. This may take a minute or so.

*Note*: If you receive an "H5pyDeprecationWarning," you can ignore it.

In [None]:
# Get the data for our specimen
upstroke_data = ctc.get_ephys_data(specimen_id)

# Get one sweep for our specimen (I've already handselected a gorgeous one for you, 45)
upstroke_sweep = upstroke_data.get_sweep(45)

# Get the current & voltage traces
current = upstroke_sweep['stimulus'] * 1e12 # in A, converted to pA
voltage = upstroke_sweep['response'] * 1e3 # converted to mV

# Get the time stamps for our voltage trace
timestamps = (np.arange(0, len(voltage)) * (1.0 / upstroke_sweep['sampling_rate']))

print('Sweep obtained')

><b>Task:</b> Plot the sweep we obtained above. <br>
<i>Hint</i>: You'll want to use `plt.plot(x,y)` where `x` is the `timestamps` and `y` is the `voltage`. Be sure to give your plot accurate labels- including x and y values (and units) and a title.
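
As a rough illustration of how timestamps relate to the sampling rate, and of how to zoom in on part of a trace, here is a minimal sketch on a synthetic trace. Every value in it (`demo_rate`, `demo_voltage`, the spike position) is made up for the example and has nothing to do with the real sweep. Each sample's time is its sample number divided by the number of samples per second, and `plt.xlim()` restricts the x-axis to a window of interest.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical values for illustration only (not taken from the dataset)
demo_rate = 1000                          # samples per second (Hz)
demo_voltage = np.full(2000, -70.0)       # flat -70 mV trace, 2 s long
demo_voltage[500:505] = 20.0              # a crude 5 ms "spike" starting at 0.5 s

# One timestamp per sample: sample number multiplied by seconds per sample
demo_timestamps = np.arange(0, len(demo_voltage)) * (1.0 / demo_rate)

plt.plot(demo_timestamps, demo_voltage)
plt.xlim(0.45, 0.55)                      # zoom in around the spike
plt.xlabel('Time (s)')
plt.ylabel('Membrane potential (mV)')
plt.title('Synthetic trace, zoomed in on one spike')
plt.show()
```

For a real sweep, the window you pass to `plt.xlim()` depends on where the action potentials occur, so plot the whole trace first and then choose the limits.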

In [None]:
# Plot the new sweep here
...

><b>Task</b>: Generate a similar plot for a cell with a <b>low</b> upstroke:downstroke ratio. Similar to above, zoom in on the x-axis so that you can actually see the shape of the action potential waveform.

><b>Hint</b>: You only need to change <i>one</i> value in all of the code in this step in order to make this change. How did we arrange our dataframe at first?

In [None]:
# Sort the dataframe and reassign
low_sorted_dataframe = ...        # use this name since the line below depends on this value

# Assign one of the top cells in our dataframe (default = 2) and the ratio to different variables
low_specimen_id = low_sorted_dataframe.index[2]
low_ratio = low_sorted_dataframe.iloc[2]['upstroke_downstroke_ratio_long_square']

# Print our results so that we can see them
print('Specimen ID: ' + str(low_specimen_id) + ' with upstroke-downstroke ratio: ' + str(low_ratio))

In [None]:
# Get the data for our specimen
low_upstroke_data = ...         # adjust the code here- similar to the first plot, but make sure to use the *new* variable name

# Don't change the lines from here down
# Get one sweep for our specimen (I've already handselected a gorgeous one for you, 45)
low_upstroke_sweep = low_upstroke_data.get_sweep(45)

# Get the current & voltage traces
low_current = low_upstroke_sweep['stimulus'] * 1e12 # in A, converted to pA
low_voltage = low_upstroke_sweep['response'] * 1e3 # converted to mV!

# Get the time stamps for our voltage trace
low_timestamps = (np.arange(0, len(low_voltage)) * (1.0 / low_upstroke_sweep['sampling_rate']))

print('Sweep obtained')

In [None]:
# Plot the new sweep here
...

As you'll hopefully see, even that one feature, upstroke:downstroke ratio, means the shape of the action potential is dramatically different. The other feature we looked at above, size of the trough, is highly correlated with upstroke:downstroke. You can see that by comparing the two cells here. Cells with high upstroke:downstroke tend to have less negative troughs (undershoots) after the action potential.

![](https://github.com/tmckim/materials-fa23-colab-working/blob/main/project/project2/highvslowratio.jpg?raw=1)

![](https://drive.google.com/uc?export=view&id=1AgZCFiLaBcggWoErdO1rfMvEyzY6StUu)

Image from the <a href="http://help.brain-map.org/pages/viewpage.action?pageId=10158207">Allen Institute Website </a> on Upstroke:Downstroke Ratio.
<br><br>
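
To make the ratio more concrete, here is a simplified sketch that estimates an upstroke:downstroke ratio from a synthetic waveform. It assumes the ratio is the steepest rising slope divided by the size of the steepest falling slope. The waveform, its widths, and the sampling rate are all made up, and this is an illustration only, not the AllenSDK's feature-extraction code.

```python
import numpy as np

# Hypothetical synthetic "action potential": fast rise, slower fall
sampling_rate = 50_000                       # samples per second (assumed)
dt = 1.0 / sampling_rate                     # seconds per sample
t = np.arange(0, 0.004, dt)                  # 4 ms of time points
peak_time = 0.001                            # spike peaks at 1 ms

# Narrow width before the peak, wider width after it
width = np.where(t < peak_time, 0.0002, 0.0008)
voltage = -70 + 100 * np.exp(-((t - peak_time) / width) ** 2)   # in mV

# Rate of change of voltage, converted from mV/s to mV/ms
dvdt = np.gradient(voltage, dt) / 1000.0

upstroke = dvdt.max()        # steepest rise (positive)
downstroke = dvdt.min()      # steepest fall (negative)
ratio = upstroke / abs(downstroke)

print('Upstroke: %.1f mV/ms' % upstroke)
print('Downstroke: %.1f mV/ms' % downstroke)
print('Upstroke:downstroke ratio: %.2f' % ratio)
```

Because the rising side of this waveform is four times narrower than the falling side, the ratio comes out close to 4. Making the two widths more similar pushes the ratio toward 1.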

<a id="compare"></a>

## Step 7. Compare cell types
Let's get out of the action potential weeds a bit. What if we want to know a big-picture thing, such as:

 * Are human cells different from mouse cells?
 * How are excitatory cells different from inhibitory cells?

To ask these questions, we can pull out the data for two different cell types, defined by their species, dendrite type, or transgenic line.

**About Transgenic Cre Lines.** The Allen Institute for Brain Science uses transgenic mouse lines that have Cre-expressing cells to mark specific types of cells in the brain. This technology is called the **Cre-Lox system**, and is a common way in neuroscience (and some other fields) to target cells based on their expression of specific genetic promoters. For more information about Cre/Lox technology, see [this website](https://old.abmgood.com/marketing/knowledge_base/Cre-Lox_Recombination.php). Information about the different Cre lines that are available can be found in [this glossary](https://drive.google.com/file/d/1nI3tFHaP5Fp-DLj93ObMu3bLxZ7oZ--w/view?usp=sharing) or on the [Allen Institute's website](http://connectivity.brain-map.org/transgenic).

**For this final step, it's up to you to choose which cell types to compare.** You'll also decide which pre-computed feature to compare between these cell types.

- If you'd like to compare cells from different **species**, the column name is `species`.
- If you'd like to compare **spiny vs. aspiny cells**, the column name is `dendrite_type`.
- If you'd like to compare two **transgenic lines** (mouse cells only), the column name is `transgenic_line`. What if we want to know whether different genetically-identified cells have different intrinsic physiology?


>**Task**: Assign `column_name` below to the name of your column to see the unique values in that column. Make sure your column name is a **string**.

In [None]:
# Define your column name below
# The output will tell you which strings you should use in the following cell
column_name = ...

print(full_dataframe[column_name].unique())

Using the possible values in your column, create two separate dataframes by **taking a subset of** the dataframe below. You can think of this as similar to using a method like `where` from the `datascience` package. We want only the rows of the dataframe where the value in the `column_name` column is equal to a specific string, such as `celltype_1`.
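
Here is a minimal sketch of this kind of subsetting on a made-up table. The dataframe `toy` and its columns are hypothetical and unrelated to the project data. The comparison inside the brackets produces one `True` or `False` per row, and indexing the dataframe with those values keeps only the `True` rows.

```python
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "plum"],
    "weight_g": [150, 170, 140, 60],
})

# Step 1: compare every value in the column to a string (one True/False per row)
mask = toy["fruit"] == "apple"

# Step 2: keep only the rows where the comparison was True
apples = toy[mask]

print(mask.tolist())
print(len(apples))
```

The two steps can be combined into one line, `toy[toy["fruit"] == "apple"]`, which has the same shape as the lines that create `celltype_1_df` and `celltype_2_df`.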

>**Task**: Assign `celltype_1` and `celltype_2` to the names of your cell types from above. For example, if you chose `dendrite_type` they would be `'spiny'` and `'aspiny'`. Make sure your cell type names are in quotes (they should be strings) and *exactly* match what is found in the dataframe.

In [None]:
# Define your cell type variables below
celltype_1 = ...
celltype_2 = ...

# Create two separate dataframes for each type
celltype_1_df = full_dataframe[full_dataframe[column_name] == celltype_1]
celltype_2_df = full_dataframe[full_dataframe[column_name] == celltype_2]

# Tell us how many cells there are per type
print("Type 1 # Cells: %d" % len(celltype_1_df))
print("Type 2 # Cells: %d" % len(celltype_2_df))

Let's start by plotting a distribution of the recorded resting membrane potential (`vrest`) for one cell type versus the other cell type.

>**Task**: Run the cell below to plot a histogram to compare one pre-computed feature of your choice between your two cell types.

- Note that the distribution is normalized by the total count (`density=True`), since there may be very different numbers of cells for your two cell types. You can set `density=False` to plot the raw numbers of cells.
- You can also specify the number of bins with `bins= < #bins > `.
- Look through the [`plt.hist()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) for more information.
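
To see why normalizing matters when group sizes differ, here is a small sketch using `np.histogram`, which offers the same `density` option. The two arrays are made up: `large_group` simply repeats the values of `small_group` ten times, so both have the same shape of distribution but very different counts.

```python
import numpy as np

# Hypothetical data: same distribution shape, very different group sizes
small_group = np.array([1.0, 2.0, 2.0, 3.0])
large_group = np.repeat(small_group, 10)

# Three bins, each of width 1
bins = np.array([0.5, 1.5, 2.5, 3.5])

# Raw counts depend on how many values each group has
counts_small, _ = np.histogram(small_group, bins=bins)
counts_large, _ = np.histogram(large_group, bins=bins)

# With density=True, each group is scaled so its bars have a total area of 1
dens_small, _ = np.histogram(small_group, bins=bins, density=True)
dens_large, _ = np.histogram(large_group, bins=bins, density=True)

print(counts_small, counts_large)
print(dens_small, dens_large)
```

The raw counts differ by a factor of ten, while the normalized values are identical, which is what makes two groups of unequal size comparable on one plot.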

In [None]:
plt.figure()

# Change your feature below.
feature = 'vrest'

# Plot the histogram, with density = True
plt.hist([celltype_1_df[feature],celltype_2_df[feature]],density = True)

# Change the labels below
plt.xlabel(...)                           # feature name
plt.ylabel('Normalized Number of Cells')
plt.legend(['Cell Type 1','Cell Type 2']) # adjust these according to your feature values
plt.title(...)                            # descriptive title
plt.show()

>**Task**: Describe what you see in the plot. Do the values differ based on the feature you selected? Why or why not? Refer to the graph to support your answers.

*Type your answer here*

> <b>Task</b>: Choose a different feature to compare between your cell types, and rerun the plot above. Use the documentation below to get the exact name of the feature (in parentheses), and change the x-axis label so that we know what you're plotting. Also include a y-axis label and title.

Here are a few additional pre-computed features you might consider comparing (you can find a complete glossary [here](https://drive.google.com/file/d/1yBfYm1yMtFSFB2erhfZ0SpeeuoWJNMEk/view?usp=sharing)):

- <b>Tau (<code>tau</code>)</b>: time constant of the membrane in milliseconds
- <b>Adaptation ratio (<code>adaptation</code>)</b>: the rate at which firing speeds up or slows down during a stimulus
- <b>Average ISI (<code>avg_isi</code>)</b>: the mean value of all interspike intervals in a sweep
- <b>Slope of f/I curve (<code>f_i_curve_slope</code>)</b>: slope of the curve between firing rate (f) and current injected
- <b>Input resistance (<code>input_resistance_mohm</code>)</b>: the input resistance of the cell, in megaohms
- <b>Voltage of after-hyperpolarization (<code>trough_v_short_square</code>)</b>: minimum value of the membrane potential during the after-hyperpolarization

The code cells have already been copied and pasted below. Edit these to create another plot based on the task above.

In [None]:
# Define your column name below
# The output will tell you which strings you should use in the following cell
column_name2 = ...

print(full_dataframe[column_name2].unique())

In [None]:
# Define your cell type variables below
celltype_3 = ...
celltype_4 = ...

# Create two separate dataframes for each type
# (new names are used so the dataframes from the first comparison are not overwritten)
celltype_3_df = full_dataframe[full_dataframe[column_name2] == celltype_3]
celltype_4_df = full_dataframe[full_dataframe[column_name2] == celltype_4]

# Tell us how many cells there are per type
print("Type 3 # Cells: %d" % len(celltype_3_df))
print("Type 4 # Cells: %d" % len(celltype_4_df))

In [None]:
plt.figure()

# Change your feature below.
feature = ...

# Plot the histogram, with density = True
plt.hist([celltype_3_df[feature],celltype_4_df[feature]],density = True)

# Change the labels below
plt.xlabel(...)                           # feature name
plt.ylabel('Normalized Number of Cells')
plt.legend(['Cell Type 3','Cell Type 4']) # adjust these according to your feature values
plt.title(...)                            # descriptive title
plt.show()

>**Task**: Describe what you see in the plot. Do the values differ based on the feature you selected? Why or why not? Refer to the graph to support your answers. <br>
Also compare this to your previous plot. Although it is a different feature, do they look the same or different? Refer to the plots and what the data looks like to support your answer.

*Type your answer here*

><b>Task</b>: It's more common to plot summary statistics like a mean or median, so let's compare our two cell types with a boxplot. To do so, we can use `plt.boxplot()` ([Documentation here](https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.boxplot.html)). The code below is already set up for you -- just run it and edit your labels as necessary.
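
Before plotting, it can help to see summary statistics as plain numbers. The sketch below uses a made-up table, with hypothetical `group` and `value` columns, to compute the mean and median for each group with `groupby`. Group `a` contains one unusually large value, which pulls its mean well above its median.

```python
import pandas as pd

# Hypothetical toy data: group "a" includes one unusually large value
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 9.0, 4.0, 5.0, 6.0],
})

# Mean and median of "value" for each group
summary = toy.groupby("group")["value"].agg(["mean", "median"])

print(summary)
```

When the mean and median of a group disagree like this, a plot that shows the spread of the data tells you more than a single number does.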

In [None]:
# Boxplot creation lines below
# This uses the first cell types, but you are welcome to change to the most recent ones above if you'd like. Either option is fine.
# Note: `feature` holds whichever feature you assigned most recently.
plt.boxplot([celltype_1_df[feature],celltype_2_df[feature]])
plt.ylabel(...)                                                 # adjust y-axis label
plt.xticks([1, 2], ['Cell Type 1','Cell Type 2'])               # adjust these values

# Plot title -- be sure to update!
plt.title(...)

plt.show()

><b>Task</b>: What is the name of the value shown as an orange line in the boxplot? What are the names for the values that are the black whisker lines? If you aren't sure, review the documentation link above, or search Google for an answer =)

*Type your answer here*

## You are finished with Project 2!

In [None]:
from IPython.display import HTML
print('Your neurons are firing!')
HTML('<img src="https://media.giphy.com/media/OBIBNR9ATt3HdpcmLC/giphy.gif">')


# Final Submission

### **Important submission steps:**
1. Choose **Save** (and make sure you've already saved a copy in your drive) from the **File** menu.
2. You will save two files in the following steps.
3. You will submit the two files for this assignment to the corresponding Assignment on the WebCampus (Canvas) course website.

**It is your responsibility to make sure your work is saved before following the instructions in the last cell.**

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output.
**Please save (or check again) before exporting!**
You will save two files:


1.   Go to `"File > Download"` and choose the **.ipynb format** (first option)
  - This will save a copy of the python notebook file- extension .ipynb- in the Downloads folder on your computer (or wherever you have opted to save files)


2.  Go to `"File > Print"` and save a copy of your notebook in **PDF format**. This is needed for grading the answers by hand as a double check, and to specifically grade any written responses.

-----------
<a id="technical"></a>

## Technical notes & credits

This notebook demonstrates most of the features of the AllenSDK that help manipulate data in the Cell Types Database.  The main entry point will be through the `CellTypesCache` class. `CellTypesCache` is responsible for downloading Cell Types Database data to a standard directory structure, and you will not have to keep track of where your data lives.

Much more information can be found in the <a href="https://brainmapportal-live-4cc80a57cd6e400d854-f7fdcae.divio-media.net/filer_public/4e/be/4ebe2911-bd38-4230-86c8-01a86cfd758e/visual_behavior_2p_technical_whitepaper.pdf">Allen Brain Atlas whitepaper</a> as well as in their <a href="http://help.brain-map.org/display/celltypes/Documentation">documentation</a>.


This file was modified from <a href='https://alleninstitute.github.io/AllenSDK/cell_types.html'>these</a> notebooks.

In case you're curious, <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html">here's documentation</a> for plotting pandas series (which we do quite a bit above).

This notebook was modified from [Ashley Juavinett, PhD at UCSD](https://sites.google.com/ucsd.edu/neuroedu/home).