# Day 2 Exercises (NumPy + Matplotlib)

## Part 1: Basic NumPy Operations
a) Generate an array of numbers 0-24. Reshape to a 5x5 matrix.

b) Extract the diagonal of this matrix.

c) Multiply the matrix by an identity matrix of the same shape. Confirm that it is identical to the original.

Hint: Use `np.all` to confirm that all elements are equal.

d) Join the matrix with itself and return a new matrix with shape (2,5,5).

e) Compute the mean of the concatenated matrix along the first axis. Confirm that it is equal to the original matrix.

f) Return the indices of the matrix where the elements are greater than 15.

g) Using `np.where`, set all elements of the matrix greater than 15 to 1, else 0.


h) Set all elements of the matrix greater than 15 to 2, less than 5 to 1, else 0.

Hint: `np.where` can be passed as an input to `np.where`.
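The nesting mentioned in the hint can be sketched on a small toy array (the array `x` and the cutoffs 6 and 3 are illustrative assumptions, not the values from the exercise). The inner `np.where` supplies the "else" branch of the outer one:

```python
import numpy as np

# Toy 1D array; the exercise uses a 5x5 matrix and different cutoffs.
x = np.arange(10)

# Outer call handles the first condition; the inner call resolves
# everything that failed it.
recoded = np.where(x > 6, 2, np.where(x < 3, 1, 0))
print(recoded)
```

The same pattern works unchanged on arrays of any shape, because `np.where` operates elementwise.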

i) Return the lower triangle of the original matrix.

j) Define a demean function.

k) Apply the demean function across each row of the matrix.

## Part 2: Spike Detection

In the following exercises, you will manipulate, analyze, and visualize preprocessed extracellular electrophysiological data. Specifically, the following 10 s recording was taken from the abdomen of a crayfish. Action potentials are readily apparent throughout the recording.

First, we load the data.

In [None]:
import numpy as np

## Load data.
npz = np.load('spikes.npz')
data = npz['data'] * 1e6      # Convert to uV
times = npz['times']

a) Plot the entire raw recording. Do multiple types of spikes appear to be present?

b) In a recent paper, [Rey et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0361923015000684) suggest a simple spike detection technique via data-driven amplitude thresholding. Specifically, they propose an automated amplitude threshold that is defined as a multiple of an estimate of the standard deviation of the noise:

$$ \text{threshold} = k \cdot \hat{\sigma}_n $$

where $k$ is a constant typically between 3 and 5, and $\hat{\sigma}_n$ is an estimate of the standard deviation of the noise, defined as:

$$ \hat{\sigma}_n = \frac{\text{median} \left( |X| \right)}{0.6745} $$ 

where $|X|$ is the absolute value of the raw data.
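As a quick sanity check on the estimator, the sketch below applies it to synthetic Gaussian noise with a known standard deviation. The noise level, sample count, and seed are arbitrary assumptions chosen for illustration; this is not the crayfish recording.

```python
import numpy as np

# Synthetic zero-mean Gaussian noise with a known standard deviation.
rng = np.random.default_rng(0)
true_sigma = 2.0
noise = rng.normal(loc=0.0, scale=true_sigma, size=100_000)

# Median-based noise estimate from the formula above.
sigma_hat = np.median(np.abs(noise)) / 0.6745
print(true_sigma, sigma_hat)
```

For pure Gaussian noise the two values should be close. Because the median is much less sensitive to a handful of large samples than the ordinary standard deviation, the estimate is expected to stay near the noise level even when spikes are present.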

Write a function that returns the amplitude threshold as defined above. The function should accept as arguments the raw data, $X$, and the constant, $k$. 

c) Next we need a function that can detect slices of the raw signal that exceed the threshold. This ultimately becomes a clustering problem (i.e. identifying "islands" of signal rising above an "ocean of noise"). Though this is definitely doable with core NumPy, the SciPy library has built-in functions specifically written for these purposes. 

Because these functions are beyond the scope of the bootcamp, we have provided a peak finding function for you. The function, `peak_finder`, accepts a raw data trace and a threshold. It then finds all clusters of contiguous samples above the threshold, and returns the index and signal magnitude corresponding to the peak of each cluster.

The function relies on the labeling and measurement tools in `scipy.ndimage`. For a tutorial, see [here](https://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/).
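To make the "islands" idea concrete, here is a minimal sketch of the labeling step on a tiny hand-made trace. The array `toy` and the threshold of 4 are made-up values for illustration only.

```python
import numpy as np
from scipy import ndimage

# Toy trace with two runs of samples above the threshold.
toy = np.array([0., 1., 5., 7., 1., 0., 6., 9., 8., 0.])
thresh = 4.0

# Each contiguous run of True values gets its own integer label (1, 2, ...);
# samples below threshold are labeled 0.
labels, n_clusters = ndimage.label(toy > thresh)
index = np.arange(n_clusters) + 1

# Position and value of the maximum within each labeled run.
positions = ndimage.maximum_position(toy, labels=labels, index=index)
peak_loc = np.array([p[0] for p in positions])
peak_mag = np.asarray(ndimage.maximum(toy, labels=labels, index=index))

print(labels)
print(peak_loc, peak_mag)
```

Here the two runs are samples 2-3 and samples 6-8, so one peak is reported per run rather than one per supra-threshold sample.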



In [None]:
def peak_finder(X, thresh):
    """Simple peak finding algorithm.
    
    Parameters
    ----------
    X : array_like, shape (n_times,)
        Raw data trace.
    thresh : float
        Amplitude threshold.
        
    Returns
    -------
    peak_loc : array_like, shape (n_clusters,)
        Index of peak amplitudes.
    peak_mag : array_like, shape (n_clusters,)
        Magnitude of peak amplitudes.
    """
    import numpy as np
    from scipy import ndimage  # ndimage.measurements namespace is deprecated
    
    ## Error-catching.
    assert X.ndim == 1
    
    ## Identify clusters (contiguous runs of samples above threshold).
    clusters, n_clusters = ndimage.label(X > thresh)
    index = np.arange(n_clusters) + 1
    
    ## Identify index of peak amplitudes. 
    peak_loc = np.concatenate(ndimage.maximum_position(X, labels=clusters, index=index))
    
    ## Identify magnitude of peak amplitudes.
    peak_mag = ndimage.maximum(X, labels=clusters, index=index)
    return peak_loc, peak_mag

d) Apply the peak detection algorithm to the raw data using a constant $k=6$. Plot a histogram of the spike amplitudes (try bins of 0-150 in increments of 5 uV). 

How many spikes are detected? How many types of spikes do there appear to be?
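One way to build the suggested bin edges is sketched below, using `np.histogram` on made-up amplitudes (the uniform random values are a stand-in assumption, not real spike amplitudes). The same `bins` array could be passed to `plt.hist` through its `bins` argument.

```python
import numpy as np

# Bin edges 0, 5, ..., 150 (the stop value is exclusive, hence 151).
bins = np.arange(0, 151, 5)

# Stand-in amplitudes in uV, for illustration only.
rng = np.random.default_rng(0)
fake_amplitudes = rng.uniform(0, 150, size=200)

# counts[i] is the number of values falling in [bins[i], bins[i+1]).
counts, edges = np.histogram(fake_amplitudes, bins=bins)
print(len(edges), counts.sum())
```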

e) Plot the first second of the data. Using a scatterplot (or any other method you can think of), indicate the peak for each detected spike.

f) Remake the plot above, this time repeating the procedure with a constant $k=2$. How trustworthy is the spike detection algorithm with this more liberal threshold?

g) Returning now to the detected spikes when $k=6$, define a set of boundaries that divides the spikes into three clusters. How many spikes are in each cluster?

h) Action potentials last roughly 1-2 milliseconds. With this in mind, extract a 3 ms window around each detected spike; that is, extract 1.5 ms of samples on either side of the detected peak. Store each epoch according to its cluster. 

Hint: The data were recorded at 10 kHz, meaning there are 10 samples per millisecond.
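The window arithmetic can be sketched as follows. The trace is random noise and the peak indices are hypothetical, chosen so that two of them sit too close to the edges to have a full window; how to treat such edge cases in the real data is left as a design choice.

```python
import numpy as np

sfreq = 10_000                      # samples per second (10 kHz)
half = int(round(1.5e-3 * sfreq))   # 1.5 ms on each side -> 15 samples

# Stand-in trace and hypothetical peak indices, for illustration only.
rng = np.random.default_rng(0)
trace = rng.normal(size=1000)
peaks = np.array([5, 100, 500, 995])

# Keep only peaks with a full window on both sides.
valid = peaks[(peaks >= half) & (peaks + half < trace.size)]

# Broadcasting builds an (n_peaks, 2 * half + 1) index array in one step.
offsets = np.arange(-half, half + 1)
epochs = trace[valid[:, None] + offsets]
print(epochs.shape)
```

Each row of `epochs` is one window, with the detected peak in the middle column (`epochs[:, half]`).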

i) Plot each averaged spike waveform in a single plot. Add a legend denoting the spike cluster.

## Part 3: Two-Photon Recordings

In this set of exercises, you will manipulate, analyze, and visualize preprocessed two-photon calcium imaging data collected from a larval zebrafish. A schematic of the experimental setup is below. The larval zebrafish was fixed in place and presented with a series of rotating light-dark bands. The stimuli moved in either clockwise rotation (red) or counterclockwise rotation (blue). The green box indicates the section of the zebrafish imaged.

<img src="larval_zebrafish.png">

Next we load in the data. Note, the data is three-dimensional, *[time, x-dim, y-dim]*.

In [None]:
import numpy as np

## Load data.
npz = np.load('calcium.npz')
data = npz['data']
times = npz['times']
cw = npz['cw']
ccw = npz['ccw']

print(data.shape)

a) Using `plt.imshow`, plot the total luminance (i.e. sum across all timepoints within each pixel).

b) Using `np.percentile`, create different masks of the data by setting to zero pixels that fall beneath some percentile of brightness. How well can you isolate individual neurons?
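A minimal sketch of percentile masking on a random stand-in image (the 20x30 shape and the 90th-percentile cutoff are arbitrary assumptions):

```python
import numpy as np

# Random stand-in for a 2D luminance map.
rng = np.random.default_rng(0)
image = rng.random((20, 30))

# Brightness value below which 90% of the pixels fall.
cutoff = np.percentile(image, 90)

# Keep pixels at or above the cutoff; set the rest to zero.
masked = np.where(image >= cutoff, image, 0)
print(cutoff, (masked > 0).mean())
```

Raising the percentile keeps fewer, brighter pixels; lowering it keeps more of the background.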

c) To get a better understanding of the timing of the experiment, plot the clockwise (CW) and counterclockwise (CCW) stimulus timeseries. Add a legend to differentiate the two.

d) Next, we will try to isolate neurons selective to clockwise rotation of the visual stimulus. To do so, matrix multiply (i.e. take the inner product of) the clockwise stimulus timeseries and the 3D data matrix. Plot the resulting mask with `plt.imshow`.
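One possible way to contract a 1D timeseries against the time axis of a 3D array is `np.tensordot`. The sketch below uses tiny made-up arrays and assumes the stimulus has shape `(n_times,)` and the data has shape `(n_times, x, y)`; check the shapes of the real arrays before relying on this.

```python
import numpy as np

# Toy stimulus (n_times,) and toy data (n_times, x, y), for illustration only.
stim = np.array([1., 0., 1.])
toy_data = np.arange(12, dtype=float).reshape(3, 2, 2)

# Sum over time of stim[t] * toy_data[t], leaving an (x, y) image.
weighted = np.tensordot(stim, toy_data, axes=(0, 0))

# Equivalent route: flatten the pixels, matrix multiply, reshape back.
flat = stim @ toy_data.reshape(toy_data.shape[0], -1)
print(weighted.shape, np.allclose(weighted, flat.reshape(2, 2)))
```

Pixels whose activity rises and falls with the stimulus receive large values in the resulting image.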

e) Repeat the step above but now for the counterclockwise stimulus timeseries. Are a different set of neurons now more prominent?

f) Extract the timeseries of the 99th percentile brightest pixels from the thresholded maps in (d) and (e). Average the pixel timeseries within each condition (i.e. CW, CCW). Finally, plot the two separate averaged timeseries. Do they resemble the stimulus timeseries plots in (c)?