# PNI Bootcamp Day 4: Principles of "good" coding

Hi. This notebook contains code and text describing exercises for day 4 of the PNI graduate bootcamp on how to write "good" code. 

It is intended as a teaching tool, so if you have any questions about what's going on in here, feel free to ask!

*We begin with the imports used below:*

In [1]:
import numpy as np
from scipy.ndimage import label # labels connected runs of True values; used by both cells below

### Motivation: The distinction between good and bad code

Below I've written two data processing cells that accomplish the same goal (i.e., perform the same analysis) -- one with good style and one with bad style. Note that neither will run since the referenced data file doesn't exist. By looking **only** at the cell below, can you figure out what analysis is being performed?

In [None]:
mp = np.loadtxt('mp_data_08302019.csv', skiprows=8, usecols=(4))
mp = mp[330:50000]
for i in range(500, len(mp) - 500):
    mp[i] -= np.mean(mp[i - 500 : i + 500])
thresh = np.mean(mp) + np.std(mp) * 2.5
above_thresh = mp > thresh
labs,nlabs = label(above_thresh)
idxs = []
for l in range(1,nlabs+1): # iterate
    idxs2 = np.where(labs==l)[0]
    peak_idx = idxs2[np.argmax(mp[idxs2])]
    idxs.append(peak_idx)
avg = 0
for i in range(1, len(idxs)):
    avg += idxs[i] - idxs[i - 1]
avg /= (len(idxs) - 1) * 1000

Now what about this block?

In [None]:
######################## Helper functions ###########################

# Load data from file. Skip first 8 header rows, and only return column 4 (membrane potential)
def load_data(filename):
    return np.loadtxt(filename, skiprows=8, usecols=(4))

# Preprocess raw membrane potential data with sliding window mean subtraction. This 
# accounts for baseline drift in the recording and keeps the signal approximately 0-mean.
# Note that memb_pot is modified in place.
def preproc_data(memb_pot, window_radius):
    for i in range(window_radius, len(memb_pot) - window_radius):
        memb_pot[i] -= np.mean(memb_pot[i - window_radius : i + window_radius])
    
    return memb_pot

# Given a preprocessed membrane potential signal and a threshold multiplier n_std,
# return the sample indices at which spikes occur in the signal. This is accomplished
# by splitting the signal into sections > mean + n_std * std and then returning the peak
# index within each section. 
def detect_peaks(memb_pot, n_std):
    # Compute all sections of the signal above the threshold
    thresh = np.mean(memb_pot) + np.std(memb_pot) * n_std
    above_thresh = memb_pot > thresh    # boolean array
    
    # split the above_thresh signal into connected segments
    labs, nlabs = label(above_thresh)

    # iterate through each segment and store idx of each peak
    peak_idxs = [] 
    for l in range(1,nlabs+1):

        # find the indices of this segment
        idxs = np.where(labs==l)[0]

        # extract the signal values at these idxs
        vals = memb_pot[idxs]

        # select the index corresponding to the peak signal value
        peak_idx = idxs[np.argmax(vals)]

        # store in our list
        peak_idxs.append(peak_idx)
    
    return peak_idxs

# Given idxs of events in a signal and optional sampling rate, return avg time
# between those events. If sampling rate is not provided, it is assumed that
# sampling rate = 1Hz. IET = Inter-Event Time
def compute_avg_IET(event_idxs, sampling_rate=1):
    s = 0
    for i in range(1, len(event_idxs)):
        s += event_idxs[i] - event_idxs[i - 1]
    
    s /= (len(event_idxs) - 1) * sampling_rate # If rate not specified, this returns "time" in sample space
    
    return s
    
######################## Main analysis logic ###########################

FILENAME = 'mp_data_08302019.csv'
SAMPLING_RATE = 1000 # samples per second (Hz)
START_SAMPLES = int(round(0.33 * SAMPLING_RATE)) # Ignore first START_SAMPLES data samples
END_SAMPLES = 50 * SAMPLING_RATE     # Ignore data after END_SAMPLES data samples
WINDOW_RADIUS = 500 # half-width of window for preprocessing
N_STD = 2.5 # spike threshold, in standard deviations above the mean

# Load raw membrane potential data from FILENAME
raw_memb_pot = load_data(FILENAME)

# Preprocessing step: subtract the mean in a sliding window of radius WINDOW_RADIUS
# to remove baseline drift
preproc_memb_pot = preproc_data(raw_memb_pot[START_SAMPLES:END_SAMPLES], WINDOW_RADIUS)

# Extract indices of peaks in the signal (putative action potentials)
memb_pot_peak_idxs = detect_peaks(preproc_memb_pot, N_STD)

# Given all of "spike" idxs, compute the average time between them 
avg_interspike_time = compute_avg_IET(memb_pot_peak_idxs, SAMPLING_RATE)
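
Since the data file above doesn't exist, here is a small sketch of the same two ideas (peak detection by thresholding, then averaging the gaps between peaks) on a made-up signal that you can actually run. It uses only the standard library, so the run-finding loop stands in for `label`. The names `detect_peaks_toy`, `avg_inter_event_time`, and `toy_signal` are hypothetical and not part of the analysis above.

```python
from statistics import mean, pstdev

def detect_peaks_toy(signal, n_std):
    """Return the index of the largest sample in each run above mean + n_std * std."""
    # pstdev is the population standard deviation, matching np.std's default
    thresh = mean(signal) + n_std * pstdev(signal)
    peak_idxs = []
    run = []  # indices of the current above-threshold run
    for i, value in enumerate(signal):
        if value > thresh:
            run.append(i)
        elif run:
            # the run just ended: keep the index of its largest sample
            peak_idxs.append(max(run, key=lambda j: signal[j]))
            run = []
    if run:  # the signal ended while still above threshold
        peak_idxs.append(max(run, key=lambda j: signal[j]))
    return peak_idxs

def avg_inter_event_time(event_idxs, sampling_rate=1):
    """Average gap between consecutive events, converted from samples to time."""
    gaps = [b - a for a, b in zip(event_idxs, event_idxs[1:])]
    return sum(gaps) / (len(gaps) * sampling_rate)

# Made-up signal with three bumps, sampled at an assumed 2 Hz
toy_signal = [0, 0, 1, 5, 2, 0, 0, 0, 3, 6, 4, 0, 0, 0, 0, 7, 0, 0]
toy_peaks = detect_peaks_toy(toy_signal, n_std=0.5)
toy_avg_gap = avg_inter_event_time(toy_peaks, sampling_rate=2)
print(toy_peaks, toy_avg_gap)
```

Testing each helper on a tiny input like this, where you can work out the answer by hand, is a cheap way to catch indexing mistakes before running on real recordings.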

### Exercises

*Exercise 1: Decomposing language into programmatic structure*

*1a.* Given the following text describing how one should play the guitar, write down the **structure** of a program that accepts sheet music and plays it on the guitar. Essentially, write the "main script" portion of a typical modular program, making references to whatever functions you need to accomplish your goal. You do **not** need to actually fill out the details of the functions.

1. Load sheet music.
2. For every page in the music:
    * For every note on the page:
        * Play the note

In [None]:
SONGNAME = 'Julia - The Beatles'

sheetmusic = load_sheetmusic(SONGNAME)
pages = get_pages(sheetmusic)

for page in pages:
    for note in page:
        play(note)

*1b.* Given the following text describing how one should build a bridge, write down the **structure** of a program that accepts a bridge-specification and produces a bridge. Essentially, write the "main script" portion of a typical modular program, making references to whatever functions you need to accomplish your goal. You do **not** need to actually fill out the details of the functions.

1. Load specification from client.
2. Compute list of materials required from specification.
3. Compute cost of materials from list.
    * If materials are too expensive, lambast your client for their ambitions.
    * If not, purchase materials.
4. Given materials, build the bridge.

In [None]:
CLIENT_SPEC = 'Princeton Neuroscience Institute 2.0'
THRESHOLD = 100

blueprint = load_specs(CLIENT_SPEC)
materials = derive_materials(blueprint)
cost = get_material_cost(materials)
if cost > THRESHOLD:
    print(f'{cost} is too much! Go away.')
else:
    actual_materials = purchase_materials(materials)
    building = build_from_materials(actual_materials)

*Exercise 2: Reusing code and building modular programs*

*2a. The Fibonacci sequence is a somewhat famous sequence of numbers. It is defined as follows: $fib(0) = 0,\,fib(1) = 1,\,fib(n) = fib(n-1) + fib(n-2)$ for integer $n \geq 2$. The factorial function $n!$ computes the product $n! = n \cdot (n - 1) \cdot\,...\,\cdot\, 2 \cdot 1$ for positive integer $n$.*

*Write a function `fib_over_nfac` that accepts an integer `n` and returns $\frac{fib(n)}{n!}$. In accordance with the principles of modular programming, you should accomplish this by writing two helper functions -- one that computes $fib(n)$ and one that computes $n!$.*
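
One possible structure for 2a, offered as a sketch and not the only reasonable answer. The helper names `fib` and `factorial` are my own choice, and I'm assuming the usual convention $0! = 1$ even though the text above only defines $n!$ for positive $n$.

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 0:
        raise ValueError('n must be a non-negative integer')
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # slide the pair (fib(i), fib(i+1)) forward by one
    return a

def factorial(n):
    """Return n! for a non-negative integer n (assumes 0! = 1)."""
    if n < 0:
        raise ValueError('n must be a non-negative integer')
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def fib_over_nfac(n):
    """Return fib(n) / n!, built entirely from the two helpers above."""
    return fib(n) / factorial(n)

print(fib(10), factorial(5), fib_over_nfac(5))
```

Notice that `fib_over_nfac` contains no arithmetic of its own beyond the division. Each helper can be tested, fixed, or reused without touching the others.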

*2b. Write a function `n_choose_k` that accepts two integers `n, k` and returns $\frac{n!}{k!(n-k)!}$. Note that this corresponds to the number of ways that one can select a subset of $k$ items from a set of $n$ items.*

*You should use the factorial helper function you defined above in order to do this.*
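
A sketch of one way 2b might look. The `factorial` helper is repeated here only so the cell runs on its own; in the notebook you would reuse whichever factorial function you wrote for 2a. I'm assuming $0 \leq k \leq n$ and $0! = 1$.

```python
def factorial(n):
    """Return n! for a non-negative integer n (assumes 0! = 1)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def n_choose_k(n, k):
    """Return n! / (k! (n-k)!), the number of k-item subsets of n items."""
    if not 0 <= k <= n:
        raise ValueError('need 0 <= k <= n')
    # The quotient is always a whole number, so integer division is exact here
    return factorial(n) // (factorial(k) * factorial(n - k))

print(n_choose_k(5, 2))
```

Because `factorial` already exists and has been checked, `n_choose_k` is only a couple of lines long. That is the payoff of writing the helper once and reusing it.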

*Exercise 3: Building a slightly more challenging modular program*

*In this exercise, you will write a function `is_contained` that accepts a point `p` and a list of axis-aligned rectangles `rectangles` and returns the indices of the rectangles that contain `p`.* 

*A point is represented as a tuple `(x, y)` of coordinates. Rectangles will be represented as a pair of points `(p1, p2)`, where `p1` gives the coordinates of the bottom-left corner and `p2` gives the coordinates of the top-right corner.*

*Unlike previous examples, I won't tell you explicitly what structure to use. Think about what computations are being carried out here and how you can represent them as functions. Feel free to check with me before you start coding.*

*Some example inputs to test your function on:*

In [None]:
rect1 = [(0.5, 0.5), (1, 1)] # Defines a rectangle with bottom left corner at (0.5, 0.5), top right at (1, 1)
rect2 = [(-0.5, -0.5), (0.5, 0.5)]
rect3 = [(0, 0), (1, 1)]
rect4 = [(-1, -1), (0, 0)]

rectangles = [rect1, rect2, rect3, rect4]
p1 = (0, 0) # The origin
p2 = (2, 1)
p3 = (-0.5, -0.5)
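
Once you have tried it yourself, here is one possible decomposition to compare against. It is a sketch under one assumption the exercise leaves open: a point lying exactly on a rectangle's edge or corner counts as contained. The helper name `point_in_rectangle` is my own. The example inputs are repeated so the cell runs on its own.

```python
def point_in_rectangle(p, rectangle):
    """Return True if point p lies inside or on the boundary of one rectangle."""
    (x_min, y_min), (x_max, y_max) = rectangle  # bottom-left, top-right corners
    x, y = p
    return x_min <= x <= x_max and y_min <= y <= y_max

def is_contained(p, rectangles):
    """Return the indices of all rectangles that contain point p."""
    return [i for i, rect in enumerate(rectangles) if point_in_rectangle(p, rect)]

rect1 = [(0.5, 0.5), (1, 1)]
rect2 = [(-0.5, -0.5), (0.5, 0.5)]
rect3 = [(0, 0), (1, 1)]
rect4 = [(-1, -1), (0, 0)]
rectangles = [rect1, rect2, rect3, rect4]

print(is_contained((0, 0), rectangles))
print(is_contained((2, 1), rectangles))
print(is_contained((-0.5, -0.5), rectangles))
```

The single-rectangle test is the only place where geometry appears, so changing the boundary rule (for example, to exclude edges) means editing one line in `point_in_rectangle` and nothing else.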