# Project A: Analysis of Calcium Imaging Data

**Project Overview**

In this project you will be provided with imaging data from an experiment in head-fixed mice. The data stem from two animals in which different regions of the hippocampus were recorded: the dentate gyrus (DG) and area CA1. For each animal you will get both a continuous signal representing the Ca<sup>2+</sup> level in the neurons and a binary signal representing detected calcium events. The goal is to explore the neural activity of the two regions at a basic level.
<br />
<br />

**What you can expect from this project:**
- use pandas to filter data
- use numpy for basic arithmetic operations
- plot histograms, timeseries, and box plots
- generate an artificial spike train
- apply some statistical tests
- get an impression of what calcium imaging data look like
<br />
<br />

**Questions to answer:**
1. Is the distribution of inter-event intervals (IEI) different between the two hippocampal areas?
2. Are the IEI distributions different from that of a random Poisson process?
3. The binary signal is the result of a thresholding analysis. How well do you think it fits the underlying continuous data?
4. Is the average continuous signal of neurons with higher event rate indeed higher as expected?
5. Is there a difference in how correlated neural activity is in the two regions? 
<br />
<br />

-----------------------------------

*Let’s get started…*

### Download the data
Run the cell below to download the data from sciebo. The data will be stored in the folder "day_4/project_A/data". You will see two .csv files, each containing data from both animals:
- continuous.csv
- binary.csv

Some Information about the data:

| Type of information | Value |
| ------ | ----------- |
| Framerate | 15 Hz |
| Time per frame | 0.067 s |
| Calcium sensor | GCaMP6s |
| Continuous data | df/f |


<span style="color:mediumseagreen">Explanatory note on "df/f":</span> "df/f" is short for "delta f over f". It is an operation commonly applied to continuous imaging signals: at each timepoint, the deviation of the signal from a baseline value (often a preceding one) is calculated and then normalized by some reference value (often the mean, median, or a percentile of the whole signal). This removes baseline drifts from the signal, and the normalization step allows signals from neurons with different baseline intensities (e.g. due to different levels of sensor expression) to be compared.
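
To make the normalization concrete, here is a minimal sketch on synthetic traces. The data in this project are already df/f, and the baseline definition used to produce them is not stated here; taking the 10th percentile of the whole trace as baseline, as well as the helper name `delta_f_over_f`, are assumptions made only for this illustration.

```python
import numpy as np

def delta_f_over_f(raw, baseline_percentile=10):
    """Return (F - F0) / F0, with F0 taken as a low percentile of the trace.

    The percentile baseline is an assumption for this sketch, not
    necessarily the method used to create the project data.
    """
    raw = np.asarray(raw, dtype=float)
    f0 = np.percentile(raw, baseline_percentile)
    return (raw - f0) / f0

# Two synthetic neurons with the same relative transient
# but different baseline brightness (e.g. different sensor expression)
transient = np.zeros(100)
transient[40:50] = 1.0
dim_cell = 100.0 * (1.0 + 0.5 * transient)
bright_cell = 400.0 * (1.0 + 0.5 * transient)

dff_dim = delta_f_over_f(dim_cell)
dff_bright = delta_f_over_f(bright_cell)

# After normalization both traces are directly comparable
print(dff_dim.max(), dff_bright.max())
```

The raw traces differ by a factor of four, yet their df/f traces coincide, which is why the normalized signal can be compared across neurons.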

In [None]:
# Install libraries
%pip install numpy pandas matplotlib scipy seaborn requests tqdm

In [12]:
# Import sciebo
import sciebo

# Download files
sciebo.download_from_sciebo('https://uni-bonn.sciebo.de/s/6XQzJTECUkrHUAv',  'data/continuous.csv')
sciebo.download_from_sciebo('https://uni-bonn.sciebo.de/s/xyE0HgxGi3kAlWm',  'data/binary.csv')

---------------------------------------

### Open the data 

To do:
- Import the necessary libraries
- Open the data as dataframes
- Inspect the data


In [13]:
# Import libraries

# ...

In [14]:
# Open data as dataframes

# ...

In [15]:
# Inspect data

# ...

---------------------------------------

### 1. Is the distribution of IEIs different between the two hippocampal areas? 

To do:
- Import the necessary libraries
- Filter the binary data by brain region
- Convert the resulting dataframe to a numpy matrix of shape neurons x timepoints
- Calculate the inter-event intervals (IEIs) for all neurons of one hippocampal region
- Plot the IEIs as a histogram
- Save the histograms as a .jpg
- Perform a statistical test comparing the two distributions


<span style="color:mediumseagreen">Explanatory note on IEIs:</span> You may have heard of inter-spike intervals. Here, however, we are dealing with Ca<sup>2+</sup> imaging, which can only be used as a proxy for neural activity, and inferring the underlying spikes is challenging. The binary Ca<sup>2+</sup> event data mark the onsets of Ca<sup>2+</sup> events (not actual spikes), hence we talk about inter-event intervals. With this constraint in mind, the data can still be used to draw some conclusions about neural activity.

<span style="color:mediumseagreen">Event rate vs IEI:</span> Often the (spike or event) rate is used to assess neural activity. While this is an easy-to-use measure, it occludes any information about the distribution of events (are there many of them happening in a small window of time or are they spread evenly across time?) and thereby information about neural activity dynamics.
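
The difference between event rate and IEIs can be illustrated with a small sketch on two made-up binary traces. The helper names `inter_event_intervals` and `pooled_ieis` are hypothetical, and the traces are synthetic; only the frame rate of 15 Hz is taken from the data table in the download section.

```python
import numpy as np

FRAME_RATE_HZ = 15  # from the data table in the download section

def inter_event_intervals(binary_trace, frame_rate=FRAME_RATE_HZ):
    """Return the intervals (in seconds) between successive event onsets."""
    onsets = np.nonzero(binary_trace)[0]   # frame indices where the trace is 1
    return np.diff(onsets) / frame_rate

def pooled_ieis(binary_matrix, frame_rate=FRAME_RATE_HZ):
    """Pool the IEIs of all neurons (rows) of a neurons x timepoints matrix."""
    pooled = []
    for trace in binary_matrix:
        pooled.extend(inter_event_intervals(trace, frame_rate))
    return np.array(pooled)

# Two synthetic neurons with 5 events each in 10 s, i.e. the same event rate
regular = np.zeros(150, dtype=int)
regular[[0, 30, 60, 90, 120]] = 1          # evenly spread events
bursty = np.zeros(150, dtype=int)
bursty[[0, 1, 2, 3, 120]] = 1              # a burst followed by a long pause

iei_regular = inter_event_intervals(regular)
iei_bursty = inter_event_intervals(bursty)
pooled = pooled_ieis(np.vstack([regular, bursty]))

print(iei_regular)   # identical intervals
print(iei_bursty)    # very short intervals and one long interval
```

Both traces have the same event rate, but only the IEIs reveal that one neuron fires regularly while the other fires in a burst.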

In [16]:
# Import libraries

# ...

In [17]:
# Filter the binary data by region and convert the resulting dataframe to a numpy matrix
# Hint: extract some useful information, e.g. the number of neurons in each region

# ...

In [18]:
# Calculate the IEIs

# ...

In [19]:
# Plot histograms and save as .jpg

# ...

In [20]:
# Perform a statistical test

# Check normal distribution of data using Shapiro-Wilk test
# Hint: if the p-Value is significant, the data are not normally distributed
# and the Mann-Whitney U test should be used instead of a t-test to compare the distributions 

# ...


# Perform a two-sided MWU test if data are not normally distributed, otherwise a t-test for independent samples
# Hint: if the p-Value is significant, the H0 (that the two distributions stem from one population) can be rejected,
# i.e. we can assume that the distribution of IEIs in the DG and CA1 are indeed different

# ...

<span style="color:violet">**Question:**</span>

What does this tell you about the recorded neuron populations?

<span style="color:tomato">**Hints:**</span>

| Goal | Function |
| ------ | ----------- |
| get rid of columns | DataFrame.drop() |
| get nonzero elements of an array | numpy.nonzero() |
| adding elements of list to another list | list.extend() |
| plot a histogram | matplotlib.pyplot.hist() |
| save a figure | matplotlib.pyplot.savefig() |
| perform a Shapiro-Wilk test | scipy.stats.shapiro() |
| perform a Mann-Whitney U test | scipy.stats.mannwhitneyu() |
| perform a t-test for independent samples | scipy.stats.ttest_ind() |
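
The decision logic described in the test cell (check normality first, then pick the test) could look roughly like the sketch below. It runs on synthetic, exponentially distributed samples rather than the real IEIs; the function name `compare_distributions` and the significance level of 0.05 are assumptions for this illustration.

```python
import numpy as np
from scipy import stats

def compare_distributions(x, y, alpha=0.05):
    """Choose between Mann-Whitney U and t-test based on Shapiro-Wilk results."""
    _, p_normal_x = stats.shapiro(x)
    _, p_normal_y = stats.shapiro(y)
    if p_normal_x < alpha or p_normal_y < alpha:
        # At least one sample deviates from normality: use the rank-based test
        test_name = "Mann-Whitney U"
        _, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
    else:
        test_name = "t-test"
        _, p_value = stats.ttest_ind(x, y)
    return test_name, p_value

# Synthetic stand-ins for the pooled IEIs of two regions (not real data)
rng = np.random.default_rng(0)
iei_a = rng.exponential(scale=2.0, size=300)
iei_b = rng.exponential(scale=3.0, size=300)

test_name, p_value = compare_distributions(iei_a, iei_b)
print(test_name, p_value)
```

Note that for very large samples (more than 5000 values) SciPy warns that the Shapiro-Wilk p-value may be inaccurate, which can matter once the IEIs of many neurons are pooled.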

**Bonus:**
- Calculate the Coefficient of Variation (CV) for the two distributions
- Calculate the Fano factor for the two distributions
- What do they tell you?
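
As a starting point for the bonus, the following sketch uses the common definitions: the CV is the standard deviation of the IEIs divided by their mean, and the Fano factor is the variance of event counts in fixed time windows divided by their mean. Whether the Fano factor here is meant to be computed on windowed counts, and the window length of 100 frames, are assumptions; the helper names are hypothetical and the traces are synthetic.

```python
import numpy as np

def coefficient_of_variation(ieis):
    """CV = standard deviation of the IEIs divided by their mean."""
    ieis = np.asarray(ieis, dtype=float)
    return ieis.std() / ieis.mean()

def fano_factor(binary_trace, window_frames):
    """Variance of event counts per window divided by the mean count."""
    trace = np.asarray(binary_trace)
    n_windows = trace.size // window_frames
    counts = trace[: n_windows * window_frames].reshape(n_windows, window_frames).sum(axis=1)
    return counts.var() / counts.mean()

# Perfectly regular synthetic train: one event every 30 frames
regular = np.zeros(150, dtype=int)
regular[::30] = 1
cv_regular = coefficient_of_variation(np.diff(np.nonzero(regular)[0]))
fano_regular = fano_factor(regular, window_frames=30)

# Random synthetic train with a fixed event probability of 0.05 per frame
rng = np.random.default_rng(5)
random_train = (rng.random(100_000) < 0.05) * 1
cv_random = coefficient_of_variation(np.diff(np.nonzero(random_train)[0]))
fano_random = fano_factor(random_train, window_frames=100)

print(cv_regular, fano_regular)   # both 0 for a perfectly regular train
print(cv_random, fano_random)     # both close to 1 for a random train
```

Values near 0 indicate regular activity and values near 1 are what a Poisson-like process produces. Both measures are undefined for a neuron without events, because the mean in the denominator is then zero or missing.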

In [21]:
# Coefficient of Variation

# ...

---------------------------------------

### 2. Are the IEI distributions different from that of a random Poisson process?

To do:
- Import the necessary libraries
- Calculate the event rate of each neuron for both regions
- Generate a Poisson process matching each neuron's event rate
- Extract the IEIs for each of these artificial event trains and pool for each region across neurons
- Plot the real and artificial distributions for both hippocampal regions and save as .jpg
- Compare the real and artificial distribution of each brain region using a statistical test

<span style="color:mediumseagreen">Poisson model of spike generation:</span>

- Generation of a random Poisson process using Python (chapter 6.5): https://mrgreene09.github.io/computational-neuroscience-textbook/Ch5.html
- Something more theoretical: https://www.cns.nyu.edu/~david/handouts/poisson.pdf

In [22]:
# Calculate the event rate of each neuron for both regions

# ...

In [23]:
# Generate a Poisson process matching each neuron's event rate
# Hints:
# 1) calculate the probability of an event occurring in one time bin,
#    where one bin has the same duration as one frame in the real recordings. --> p = event_rate * time_per_frame
# 2) create a vector of random numbers between 0 and 1 with a length equal to the number of frames in the real recording
# 3) for each value in this vector check if it is smaller than the probability calculated in 1).
#    If yes, mark it with a 1. If no, mark it with a 0.
# 4) Repeat steps 1-3 for each neuron.

# ...

In [24]:
# Extract IEIs of these artificial event trains
# Hint: you just need to repeat what you did for answering the first question

# ...

In [25]:
# Plot histograms and save as .jpg
# Hint: you can use subplots to plot the corresponding real and artificial data from both regions at the same time

# ...

In [26]:
# Perform a statistical test
# Hint: you know already from answering question 1 if your data are distributed normally or not.
# You just need to repeat the same statistical test that you used before.

# ...

<span style="color:violet">**Question:**</span>

Are the real and artificial distributions similar? What does it mean if yes/no?

<span style="color:tomato">**Hints:**</span>

| Goal | Function |
| ------ | ----------- |
| create an array of zeros with same size as some other array | numpy.zeros_like() |
| generate random numbers in the interval [0, 1) | numpy.random.rand() |
| convert a boolean to int array | some_bool_array * 1 |
| plot two figures in one, use subplots | matplotlib.pyplot.subplot() |
| avoid overlap of subplot titles/labels | matplotlib.pyplot.tight_layout() |
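
The frame-wise recipe for an artificial event train (one random draw per frame, event if the draw falls below p = event_rate * time_per_frame) can be sketched as follows. The event rates are made up, the function name `poisson_event_trains` is hypothetical, and the sketch uses NumPy's `Generator.random` instead of `numpy.random.rand` so that the result is reproducible with a seed.

```python
import numpy as np

TIME_PER_FRAME_S = 1 / 15  # 15 Hz frame rate

def poisson_event_trains(event_rates_hz, n_frames, rng):
    """Return a neurons x frames matrix of 0/1 with the given event rates."""
    rates = np.asarray(event_rates_hz, dtype=float)
    p_event = rates * TIME_PER_FRAME_S            # probability of an event per frame
    draws = rng.random((rates.size, n_frames))    # uniform numbers in [0, 1)
    # An event occurs where the draw is smaller than the neuron's probability
    return (draws < p_event[:, None]) * 1

rng = np.random.default_rng(42)
rates = np.array([0.05, 0.5, 2.0])   # made-up event rates in Hz
n_frames = 90_000                    # 100 minutes at 15 Hz
trains = poisson_event_trains(rates, n_frames, rng)

# The empirical rates of the artificial trains should be close to the targets
empirical_rates = trains.sum(axis=1) / (n_frames * TIME_PER_FRAME_S)
print(empirical_rates)
```

This is only an approximation of a Poisson process: it allows at most one event per frame, so it works well as long as p is much smaller than 1.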

---------------------------------------

### 3. How well does the binary signal fit the underlying continuous data?


<span style="color:mediumseagreen">Thresholding procedure:</span>

The binary signal is the result of a rather complicated thresholding procedure that assesses the noise of the signal and uses drops of the signal (negative going events so to say) to estimate a false positive rate during the event detection.

To do:
- Obtain following quantiles of event rates for both regions: Q-25, Q-50, Q-75
- For each region, find 2 neurons such that each meets one of the following criteria:
  1) has an event rate r < Q-25,
  2) has an event rate r > Q-75
- We now need the continuous data: filter them by brain region and convert the resulting dataframe to a numpy matrix of shape neurons x timepoints
- Get the indices of the timepoints at which there is an event for the selected neurons
- Plot the continuous signal of these neurons
- Plot the events from the binary data on top
- Save plots as .jpg

In [27]:
# Obtain following quantiles of event rates for both regions: Q-25, Q-50, Q-75
# Note: the Q-50 will be needed for question 4

# ...

In [28]:
#  For each region, find 2 neurons such that...
# ...one has an event rate r < Q-25,
# ...one has an event rate r > Q-75
# Hint: You can obtain all indices of neurons that meet the respective criterion
# and then just pick the first or any one of them for the current task (you can also compare several different ones)

# ...


In [29]:
# Select which neurons from the lowest and highest quartile you want to plot
# ...

In [30]:
# We now need the continuous data:
# Filter them by brain region and
# convert the resulting dataframe to a numpy matrix of shape neurons x timepoints
# (as you did with the binary data when answering question 1)

# ...

In [31]:
# Get the indices of the timepoints at which there is an event onset (= 1)

# ...

In [32]:
# Plot the continuous activity of the DG neurons you selected and overlay the events from the binary signal and save as .jpg
# Hint 1: you can use plt.subplot again to plot the signals of the two neurons in one figure.
# (use a for loop if you do not want to type the code twice)
# Hint 2: you can use another for loop (maybe within the first loop) to plot the binary events

# ...


In [33]:
# Plot the continuous activity of the CA1 neurons you selected and overlay the events from the binary signal and save as .jpg
# (same as above, just for CA1)

# ...

<span style="color:violet">**Question:**</span>

- What part of the event was detected for the binary signal?
- How well does it fit the continuous signal?
- Do you think the detection algorithm is working well? If not, why? 

<span style="color:tomato">**Hints:**</span>

| Goal | Function |
| ------ | ----------- |
| get q-th quantile of data | numpy.quantile() |
| get both the index and value of a list item (useful for loops) | enumerate() |
| plot a vertical line | matplotlib.pyplot.vlines() |
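
The selection and overlay steps of this question could be sketched as below, on fully synthetic data. The event trains, the exponential transient shape, and the noise level are all made up for the illustration; with the real data, the binary and continuous matrices of one region would take the place of `binary` and `continuous`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

FRAME_RATE_HZ = 15
n_neurons, n_frames = 20, 900
rng = np.random.default_rng(1)

# Synthetic binary data: neuron i gets 2 * (i + 1) evenly spaced event onsets
binary = np.zeros((n_neurons, n_frames), dtype=int)
for i in range(n_neurons):
    binary[i, np.linspace(10, n_frames - 10, 2 * (i + 1)).astype(int)] = 1

# Synthetic continuous data: each onset triggers a decaying transient, plus noise
kernel = np.exp(-np.arange(30) / 10.0)
continuous = np.array([np.convolve(trace, kernel)[:n_frames] for trace in binary])
continuous = continuous + rng.normal(0, 0.05, size=continuous.shape)

# Event rates (Hz) and their quantiles
event_rates = binary.sum(axis=1) / (n_frames / FRAME_RATE_HZ)
q25, q50, q75 = np.quantile(event_rates, [0.25, 0.5, 0.75])

# Indices of neurons in the lowest and highest quartile; pick the first of each
low_idx = np.nonzero(event_rates < q25)[0]
high_idx = np.nonzero(event_rates > q75)[0]
selected = [low_idx[0], high_idx[0]]

# Continuous trace with the binary event onsets overlaid as vertical lines
time_s = np.arange(n_frames) / FRAME_RATE_HZ
fig, axes = plt.subplots(len(selected), 1, sharex=True, figsize=(8, 4))
for ax, neuron in zip(axes, selected):
    onsets = np.nonzero(binary[neuron])[0]
    ax.plot(time_s, continuous[neuron], color="gray")
    ax.vlines(time_s[onsets], ymin=continuous[neuron].min(),
              ymax=continuous[neuron].max(), colors="tomato")
    ax.set_title(f"neuron {neuron}")
axes[-1].set_xlabel("time (s)")
fig.tight_layout()
# fig.savefig("overlay_sketch.jpg")  # uncomment to write the figure to disk
```

Drawing the lines from the minimum to the maximum of each trace keeps them visible regardless of the signal amplitude of the chosen neuron.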

---------------------------------------

### 4. Is the average continuous signal of neurons with higher event rate indeed higher as expected? 

To do:
- For each region, assign every neuron to one of 4 classes using the event rates and quantiles calculated above:
  1) neurons with event rate r < Q-25 (1st quartile)
  2) neurons with event rate Q-25 ≤ r < Q-50 (2nd quartile)
  3) neurons with event rate Q-50 ≤ r < Q-75 (3rd quartile)
  4) neurons with event rate r ≥ Q-75 (4th quartile)
- Calculate the average intensity of each neuron's continuous signal
- Use this to calculate the average (of the neuron-wise average) intensity for each of the 4 classes defined above
- Create a box plot displaying the different average continuous signal intensities per class for both regions
- Save the box plot as .jpg


In [34]:
# Import libraries

# ...

In [35]:
# Assign neurons to one of the 4 quartile classes

# ...

In [36]:
# Calculate the average continuous signal intensity of each neuron

# ...


In [37]:
# Create a box plot where you compare for each region the average continuous signal for the quartiles and save it as .jpg
# Hint 1: when you use seaborn, it may be useful to combine the data to plot in a dictionary such that:
# {'class': class_vector, 'signal': neuron_mean_signal}
# Hint 2: you can try to plot the individual datapoints on top of the boxplot
# Hint 3: you can try to plot both regions in one plot

# ...

<span style="color:violet">**Question:**</span>

- Does the average signal intensity of neurons correspond to the quartiles that were based on the binary event data?
- What did you expect?
- How does this observation fit to what you observed in question 3?  

<span style="color:tomato">**Hints:**</span>

| Goal | Function |
| ------ | ----------- |
| apply two criteria to filter values, e.g. 0 < a < 1 | (a > 0) & (a < 1) |
| plot data points on top of boxplot | seaborn.stripplot() |
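
Assigning neurons to quartile classes with combined boolean masks could look like the sketch below. The event rates and continuous signals are synthetic, the function name `quartile_classes` is hypothetical, and values lying exactly on a quantile are assigned to the upper class, which is a choice made for this illustration.

```python
import numpy as np

def quartile_classes(event_rates):
    """Return a class label (1-4) per neuron based on event-rate quartiles."""
    rates = np.asarray(event_rates, dtype=float)
    q25, q50, q75 = np.quantile(rates, [0.25, 0.5, 0.75])
    classes = np.zeros(rates.size, dtype=int)
    classes[rates < q25] = 1
    classes[(rates >= q25) & (rates < q50)] = 2
    classes[(rates >= q50) & (rates < q75)] = 3
    classes[rates >= q75] = 4
    return classes

# Made-up event rates (Hz) for 8 neurons
rates = np.arange(1, 9) * 0.1
classes = quartile_classes(rates)

# Synthetic continuous signals whose mean level follows the event rate
rng = np.random.default_rng(3)
continuous = rng.normal(loc=rates[:, None], scale=0.05, size=(rates.size, 500))
neuron_mean_signal = continuous.mean(axis=1)

# Dictionary layout suggested in the hints, ready for plotting
plot_data = {"class": classes, "signal": neuron_mean_signal}
class_means = {c: neuron_mean_signal[classes == c].mean() for c in (1, 2, 3, 4)}
print(classes, class_means)

# With seaborn installed, the box plot could then be drawn with, for example:
# import seaborn as sns
# sns.boxplot(x=plot_data["class"], y=plot_data["signal"])
# sns.stripplot(x=plot_data["class"], y=plot_data["signal"], color="black")
```

In this synthetic example the mean signal rises with the class by construction; whether the real data behave the same way is exactly what the question asks.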

---------------------------------------

### 5. Is there a difference in how correlated neural activity is in the two regions?

To do:
- Calculate the Pearson correlation coefficient of all pairs of neurons within each hippocampal region
- Remove the diagonal and the lower triangle of the resulting square correlation coefficient matrices
- Plot the distributions of correlation coefficients of each region as histograms and save as .jpg
- Perform a statistical test
- Bonus: Plot the correlation coefficients as heatmaps for better visualization and save as .jpg

In [32]:
# Calculate pairwise Pearson coefficient

# ...

In [33]:
# Remove the diagonal and the lower triangle of the resulting square correlation coefficient matrices

# ...

In [38]:
# Plot histograms  and save as .jpg

# ...


In [39]:
# Perform a statistical test

# Check normal distribution of data using Shapiro-Wilk test

# ...

# Perform a two-sided MWU test or t-test for independent samples

# ...

In [40]:
# Bonus: Plot heatmap  and save as .jpg

# ...

<span style="color:violet">**Question:**</span>

- Does one of the regions have higher correlation values between its pairs of neurons?
- How could you interpret this?

<span style="color:tomato">**Hints:**</span>

| Goal | Function |
| ------ | ----------- |
| calculate Pearson correlation coefficient | numpy.corrcoef() |
| plot a heatmap | seaborn.heatmap() |
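
Computing the pairwise coefficients and keeping only the upper triangle could be sketched as follows. The two activity matrices are synthetic (one with independent neurons, one with a shared input), and the function name `pairwise_correlations` is hypothetical.

```python
import numpy as np

def pairwise_correlations(activity):
    """Return the full correlation matrix and the unique pairwise coefficients.

    `activity` is a neurons x timepoints matrix; each row is one neuron.
    """
    corr = np.corrcoef(activity)                 # rows are treated as variables
    upper = np.triu_indices_from(corr, k=1)      # k=1 skips the diagonal
    return corr, corr[upper]

rng = np.random.default_rng(7)
n_neurons, n_frames = 6, 2000

# Synthetic region without shared input and one with a common drive
independent = rng.normal(size=(n_neurons, n_frames))
shared = rng.normal(size=n_frames)
coupled = 0.7 * shared + 0.7 * rng.normal(size=(n_neurons, n_frames))

corr_independent, r_independent = pairwise_correlations(independent)
corr_coupled, r_coupled = pairwise_correlations(coupled)

print(r_independent.mean(), r_coupled.mean())
```

Each pair of neurons appears once in the returned vector, so the diagonal (always 1) and the mirrored lower triangle do not distort the histogram or the statistical test. A neuron with a constant signal, for example a binary trace without any event, yields NaN coefficients and may need to be excluded first.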

------------------------------------
*You did it!!*