# Bootcamp 2020

In the last part of today I will

* introduce you to Python
* introduce you to some basic ideas about analyzing aggregates of neurons
* give some tips on cultivating your beautiful life in data

We don't have time today to give a good programming lesson, so forgive the lack of detail on the specifics of Python.


First things first: in Python we explicitly "import" other code that we want to use from "packages".

In [1]:
import os
from glob import glob # wildcard file selection
from pprint import pprint # pretty printing

# load the dreaded matplotlib
%matplotlib notebook
import matplotlib.pyplot as plt

import pandas as pd      # data structures
import numpy as np       # basic numerical operations
import altair as alt     # plotting library
from scipy.io import wavfile # loading sound files
from scipy import signal # for making spectrograms
from ipywidgets import interact, fixed # buttons and stuff
# with respect to https://github.com/Chekos/blog-posts/tree/master/altair%20%2B%20ipywidgets


# tell plotting library not to try and hold everything in memory
alt.data_transformers.enable('json')
# and let it render good
alt.renderers.enable('notebook')

RendererRegistry.enable('notebook')

## 1 - Load Data!

I've already taken the data output from the recording and cleaned it up into "long" format -- more on that later -- using the `clean_out_dir.m` MATLAB code found in the repository

Now find and load it!

In [2]:
# start with tuning curves -- 
# build a string with a wildcard * to search for files we want
tc_search = os.path.join(os.getcwd(),'data','tones', "*.csv")

# use glob to find the files!
tc_fns = glob(tc_search)

#what did we get?
print(f"""\nSearch string: {tc_search}
          \nGlobbed Filenames: {tc_fns}""")


Search string: /Users/jonny/git/bootcamp_2020/data/tones/*.csv
          
Globbed Filenames: ['/Users/jonny/git/bootcamp_2020/data/tones/2020-09-03_17-41-32_tones.csv']
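
If you want to see what the wildcard does without the real recording files, here is a minimal sketch that builds a throwaway directory and globs it. The file names are made up for illustration and are not part of the bootcamp data.

```python
import os
import tempfile
from glob import glob

# make a temporary directory with a few made-up (hypothetical) files in it
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a_tones.csv", "b_tones.csv", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()

    # * matches any run of characters, so this selects every .csv file
    pattern = os.path.join(tmp_dir, "*.csv")

    # glob does not promise any particular order, so sort to be safe
    matches = sorted(glob(pattern))
    basenames = [os.path.basename(fn) for fn in matches]

print(basenames)
```

The `.txt` file is left out because it does not match the pattern.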


Now we'll load the file and see what's inside

In [3]:
if len(tc_fns) == 1:
    # make a pandas dataframe out of our .csv file
    df = pd.read_csv(tc_fns[0])
    
# print the first n rows (default 5, try giving another number as an argument)
#df.head()
df.head(10)

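| Unnamed: 0 | amps | cell | dur | expt | freqs | rep | spikes |
|---|---|---|---|---|---|---|---|
| 0 | 40 | 1 | 25 | tuning_curve | 4000.0 | 1 | 1.566667 |
| 1 | 40 | 1 | 25 | tuning_curve | 5656.85 | 1 | 21.633333 |
| 2 | 40 | 1 | 25 | tuning_curve | 11313.7 | 1 | 26.866667 |
| 3 | 40 | 1 | 25 | tuning_curve | 11313.7 | 1 | 36.366667 |
| 4 | 40 | 1 | 25 | tuning_curve | 22627.4 | 1 | 16.666667 |
| 5 | 55 | 1 | 25 | tuning_curve | 1000.0 | 1 | 1.433333 |
| 6 | 55 | 1 | 25 | tuning_curve | 1414.21 | 1 | 21.333333 |
| 7 | 55 | 1 | 25 | tuning_curve | 1414.21 | 1 | 29.1 |
| 8 | 55 | 1 | 25 | tuning_curve | 22627.4 | 1 | 22.3 |
| 9 | 70 | 1 | 25 | tuning_curve | -1.0 | 1 | 9.766667 |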


Our data is in a format where every **row** is a single spike and every **column** is a variable that describes the spike. In this case we have

* **expt** - some short description of the type of experiment that was run
* **cell** - the cell that the spike came from
* **rep** - the repetition of the tone that was presented
* **freqs** - the frequency of the presented tone
* **amps** - the amplitude of the presented tone in dB SPL
* **dur** - the duration of the presented tone
* **spikes** - the time of the spike in ms
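
To make the "long" idea concrete, here is a minimal sketch of going from a "wide" table to a long one with `pandas.DataFrame.melt`. The table, its `rep_1`/`rep_2` columns, and the spike counts are all made up for illustration, and this is not what `clean_out_dir.m` does. Note that this toy example has one row per (cell, repetition) count rather than one row per spike.

```python
import pandas as pd

# hypothetical "wide" table: one row per cell, one column per repetition
wide = pd.DataFrame({
    "cell": [1, 7],
    "rep_1": [3, 0],   # made-up spike counts
    "rep_2": [5, 2],
})

# melt turns the rep_* columns into rows: one row per observation,
# with the old column label stored in "rep" and its value in "n_spikes"
long = wide.melt(id_vars="cell", var_name="rep", value_name="n_spikes")

# store the repetition as a number rather than as a column label
long["rep"] = long["rep"].str.replace("rep_", "", regex=False).astype(int)

print(long)
```

In the long version every column is a variable (`cell`, `rep`, `n_spikes`), so adding a third repetition means adding rows, not changing the shape of the table.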



Since our data is in a computable format, we don't need special custom code to do simple analysis and description of it. Data has a long lifetime -- it goes through many forms, is combined with data of different types, etc. -- so being deliberate about the format in which we store and operate on our data is extremely important to staying sane in analysis. Again, more on this later.

A very simple summary we can do is count the number of unique cells in our data:

In [4]:
# here we are using python f-string formatting. by putting f before the string,
# we can interpolate our variables inside of {}s

unique_cells = df['cell'].unique()
n_unique_cells = len(unique_cells)

print(f"There are {n_unique_cells} unique cells: {unique_cells}")


There are 2 unique cells: [1 7]


We can do the same thing for frequencies and amplitudes

In [5]:
uq_freqs = df['freqs'].unique()
uq_amps  = df['amps'].unique()

print('Frequencies:\n')
pprint(uq_freqs)
print('\n\nAmplitudes:\n')
pprint(uq_amps)

Frequencies:

array([ 4.00000e+03,  5.65685e+03,  1.13137e+04,  2.26274e+04,
        1.00000e+03,  1.41421e+03, -1.00000e+00,  2.82843e+03,
        1.60000e+04,  6.40000e+04,  2.00000e+03,  8.00000e+03,
        3.20000e+04,  4.52548e+04])


Amplitudes:

array([40, 55, 70])
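
Because every row is one spike, counting spikes per condition is just counting rows per group. Here is a minimal sketch using `groupby` on a small made-up table that has the same column names as `df`; the values are invented for illustration, so swap in `df` to run it on the real data.

```python
import pandas as pd

# made-up spikes in the same long layout as df (one row per spike)
toy = pd.DataFrame({
    "cell":   [1, 1, 1, 1, 7, 7],
    "freqs":  [4000.0, 4000.0, 8000.0, 8000.0, 4000.0, 8000.0],
    "amps":   [40, 40, 40, 70, 40, 70],
    "spikes": [1.5, 21.6, 26.9, 36.4, 16.7, 9.8],
})

# count the rows (spikes) in each cell x frequency x amplitude group
counts = (
    toy.groupby(["cell", "freqs", "amps"])
       .size()                          # number of rows in each group
       .reset_index(name="n_spikes")    # back to a plain long DataFrame
)

print(counts)
```

The result is itself in long format, so it can be passed straight to a plotting library or grouped again.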


## 2 - Spike Raster

We make a classic raster first.

I am trying out a new plotting library, [Altair](https://altair-viz.github.io), so refer to its documentation :)

First we're going to make a **function** to make the plot -- making a function lets us encapsulate all the logic of an operation we want to do (like plot a spike raster) so we don't have to write it all every time. You don't need to understand the code that goes inside of the function for now, as it is mostly idiosyncratic to the Altair library.

In [6]:
def spike_raster(cell: int, data: pd.DataFrame) -> alt.Chart:
    """
    Plot a spike raster plot for a single cell
    
    Args:
        cell (int): The ID of the cell to plot
        data: (pandas.DataFrame): DataFrame object containing data
        
    Returns:
        altair.Chart: The created chart
    """
    
    # we first declare a Chart object, subsetting our data to a single cell
    # we then mark_circles with the encoding (map from data to graphics)
        
    chart = alt.Chart(data[data['cell'] == cell]).mark_circle().encode(
        x = alt.X('spikes'),      # X axis will be spike time
        y = alt.Y('rep' ),        # Y is the stimulus repetition
        size = alt.value(5),      # make the dots small
        opacity=alt.value(1.),    # opaque
        color=alt.condition(      # and...
            # if the spike happened during the stimulus presentation
            (alt.datum.spikes >= 0) & (alt.datum.spikes<=alt.datum.dur), 
            alt.value('red'),     # colored red
            alt.value('black')    # otherwise black
        )
    ).properties(
        width = 200,
        height = 30,
    ).facet(
        row='freqs',  # split the plot into rows by frequency
        column='amps' # and columns by amplitude
    )
    
    return chart

To use our function, we call it with `()`, putting our arguments within

In [7]:
alt.Chart(df[df['cell'] == 1]).mark_circle().encode(
        x = alt.X('spikes'),      # X axis will be spike time
        y = alt.Y('rep' ),        # Y is the stimulus repetition
        size = alt.value(5),      # make the dots small
        opacity=alt.value(1.),    # opaque
        color=alt.condition(      # and...
            # if the spike happened during the stimulus presentation
            (alt.datum.spikes >= 0) & (alt.datum.spikes<=alt.datum.dur), 
            alt.value('red'),     # colored red
            alt.value('black')    # otherwise black
        ))

<vega.vegalite.VegaLite at 0x13a6bd550>



In [11]:
chart = spike_raster(cell = unique_cells[0], data = df)


In [12]:
interact(spike_raster,cell = sorted(df['cell'].unique()), data=fixed(df));

interactive(children=(Dropdown(description='cell', options=(1, 7), value=1), Output()), _dom_classes=('widget-…
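
The red/black coloring rule in the raster is an ordinary comparison, so it can also be computed as a column and checked outside the plot. Here is a minimal sketch on made-up values; like the plotting code, it assumes `spikes` and `dur` are in the same units.

```python
import pandas as pd

# made-up spike times around a 25-unit stimulus
toy = pd.DataFrame({
    "spikes": [-5.0, 0.0, 12.5, 25.0, 40.0],
    "dur":    [25, 25, 25, 25, 25],
})

# True when the spike falls between stimulus onset (0) and offset (dur),
# the same test used by alt.condition in the raster
toy["during_stim"] = (toy["spikes"] >= 0) & (toy["spikes"] <= toy["dur"])

print(toy)
```

Both ends of the window are inclusive, so a spike exactly at 0 or exactly at `dur` counts as during the stimulus.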

# Beauty in Analysis

You probably didn't sign up to be a programmer, but surprise! All neuroscientists have to be programmers. You can think of your data analysis code as a beautiful garden: if you cultivate it by being conscious of good practices, it gets more comfortable, powerful, and easy to use over time. Or you can hate and fear it, struggle with it by constantly trying to do the minimum possible programming, and have it be just as frustrating every time you return to it.

* data format
    * long
    * annotated - documentation *in* the data
    * documented - documentation *about* how the data is structured
    * indexable - be able to find your data and trust how it is stored.
* code structure - write in small chunks.
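
As one possible illustration of "annotated" data and small chunks of code, here is a sketch that splits loading into three short functions and writes the source file name into the data itself. The function names, the `source_file` column, and the two tiny files are all hypothetical and are not part of the bootcamp repository.

```python
import os
import tempfile
from glob import glob

import pandas as pd


def find_files(directory: str, pattern: str = "*.csv") -> list:
    """Return the sorted paths in `directory` that match `pattern`."""
    return sorted(glob(os.path.join(directory, pattern)))


def load_annotated(filename: str) -> pd.DataFrame:
    """Load one .csv and record which file each row came from."""
    data = pd.read_csv(filename)
    data["source_file"] = os.path.basename(filename)  # annotation *in* the data
    return data


def load_all(directory: str) -> pd.DataFrame:
    """Combine every matching file in `directory` into one long DataFrame."""
    frames = [load_annotated(fn) for fn in find_files(directory)]
    return pd.concat(frames, ignore_index=True)


# try it on two made-up files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"cell": [1], "spikes": [1.5]}).to_csv(
        os.path.join(tmp_dir, "a_tones.csv"), index=False)
    pd.DataFrame({"cell": [7], "spikes": [9.8]}).to_csv(
        os.path.join(tmp_dir, "b_tones.csv"), index=False)
    combined = load_all(tmp_dir)

print(combined)
```

Each function does one thing, so each can be tested or replaced on its own, and every row of `combined` still says where it came from.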


