# Extending an existing pipeline

One of the major strengths of DataJoint is the ease with which you can take an existing data pipeline and extend it with your own tables to perform new analyses. In this session, you will continue to work in a group to:

1. explore and understand an existing data pipeline
2. extend the data pipeline with new analysis

Let's get started!

## Importing an existing pipeline

As always, let's start by importing DataJoint, along with the packages we will use for analysis and plotting.

In [None]:
import datajoint as dj

import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

dj.config.load('dj_local_conf.json')

We are provided access to an existing pipeline defined in `workshop.calcium`. Let's go ahead and import it!

In [None]:
import workshop.calcium as ca

Let's take a look at the diagram of the pipeline to get an overview.

In [None]:
dj.Diagram(ca)

As you may be able to guess, this pipeline defines a workflow for multi-photon fluorescent functional imaging (Ca2+ imaging) in mice. During a scan, the mouse may be presented with a visual stimulus (`ca.Stimulus`), and we might record the mouse's pupil information (`ca.Pupil`) and its running state (`ca.Treadmill`).

Go ahead and take some time to explore the pipeline. See if you can query the pipeline to gain better insights!

# Exploring the data

Let's carefully study the mouse's pupil information (`ca.Pupil`) and running state information (`ca.Treadmill`), and achieve the following:

1. Pick a single scan for which both pupil and running state information is available.

2. For that scan, plot all pupil-related information over time. What information is available?

3. For the same scan, plot the running state traces over time.

4. Do you see any relationship between pupil and running state? How would you go about quantifying this?

1) Pick a single scan with both `Pupil` and `Treadmill`.

In [None]:
keys = (ca.Scan & ca.Pupil & ca.Treadmill).fetch('KEY')
key = keys[1]

2) Plot pupil information.

In [None]:
ca.Pupil() & key

In [None]:
pupil_r, pupil_x, pupil_y = (ca.Pupil & key).fetch1('pupil_r', 'pupil_x', 'pupil_y')

plt.plot(pupil_r, label='pupil radius')
plt.plot(pupil_x, label='pupil x')
plt.plot(pupil_y, label='pupil y')
plt.legend(loc='upper right')

3) Plot running state.

In [None]:
ca.Treadmill & key

In [None]:
velocity = (ca.Treadmill & key).fetch1('treadmill_vel')

plt.plot(np.abs(velocity), label='treadmill velocity')
plt.legend()

Plot pupil radius and treadmill velocity together.

In [None]:
plt.subplot(2, 1, 1)
plt.plot(pupil_r, label='pupil radius')
plt.title('Pupil radius')
plt.subplot(2, 1, 2)
plt.plot(np.abs(velocity), label='treadmill velocity')
plt.title('Absolute treadmill velocity')
plt.tight_layout()

Let's compute the Pearson correlation coefficient.

$$
r = E\left[\frac{(x - \mu_x)(y-\mu_y)}{\sigma_x \sigma_y}\right]
$$
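
For sampled traces, the expectation is estimated by an average over the time points. As a sketch, assuming both traces have the same length $N$, with $x$ the pupil radius and $y$ the absolute treadmill velocity, and with $\sigma$ taken as the population standard deviation (dividing by $N$, which is the default for `np.std` and `np.nanstd`):

$$
r \approx \frac{1}{N}\sum_{i=1}^{N} \frac{(x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y},
\qquad
\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)^2
$$

The definitions of $\mu_y$ and $\sigma_y$ are analogous.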

In [None]:
r = pupil_r
v = np.absolute(velocity)

mu_r = np.nanmean(r)
mu_v = np.nanmean(v)

sigma_r = np.nanstd(r)
sigma_v = np.nanstd(v)

corr = np.nanmean((r - mu_r) * (v - mu_v) / sigma_r / sigma_v)

print('Correlation is {:.3f}'.format(corr))
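
One way to sanity-check a hand-written correlation is to compare it with `np.corrcoef` on synthetic data. The sketch below is an assumption-laden illustration, not part of the pipeline: `nan_pearson`, `fake_r`, and `fake_v` are hypothetical names, and the data are random numbers rather than real recordings. It also drops every time point where either trace is not finite before computing means and standard deviations. The cell above instead computes each mean and standard deviation over that trace's own non-NaN samples, so the two approaches can differ slightly when NaNs appear in only one trace.

```python
import numpy as np


def nan_pearson(x, y):
    """Pearson correlation over the samples where both traces are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError('traces must have the same shape')
    valid = np.isfinite(x) & np.isfinite(y)  # keep only complete pairs
    x, y = x[valid], y[valid]
    zx = (x - x.mean()) / x.std()  # population std (ddof=0)
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))


# Synthetic stand-ins for pupil radius and absolute velocity (not real data)
rng = np.random.default_rng(0)
fake_v = np.abs(rng.normal(size=1000))
fake_r = 0.5 * fake_v + rng.normal(scale=0.5, size=1000)
fake_r[::50] = np.nan  # simulate frames where pupil tracking failed

ours = nan_pearson(fake_r, fake_v)

# Reference value from NumPy, computed on the same complete pairs
ok = np.isfinite(fake_r) & np.isfinite(fake_v)
reference = np.corrcoef(fake_r[ok], fake_v[ok])[0, 1]

print('ours = {:.3f}, np.corrcoef = {:.3f}'.format(ours, reference))
```

The two printed values should agree to within floating-point precision, since both compute the same Pearson coefficient over the same samples.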

# Extending pipeline with our own analysis

Looks like we might be onto something quite interesting! Let's go ahead and implement a table that will compute and store the analysis results for all scans.

* What are the dependencies for the table?
* What does each row represent? (In other words, what's the entity that's getting computed?)
* What should be the tier for this table?

In [None]:
schema = dj.schema('ca_extension')

In [None]:
@schema
class PupilCorr(dj.Computed):
    definition = """
    -> ca.Pupil
    -> ca.Treadmill 
    ---
    pupil_corr :  float   # correlation between pupil radius and locomotion velocity
    """
    
    def make(self, key):
        print('Working on', key)
        r = (ca.Pupil & key).fetch1('pupil_r')
        v = (ca.Treadmill & key).fetch1('treadmill_vel')
        v = np.abs(v)
        
        mu_r = np.nanmean(r)
        mu_v = np.nanmean(v)

        sigma_r = np.nanstd(r)
        sigma_v = np.nanstd(v)

        corr = np.nanmean((r - mu_r) * (v - mu_v) / sigma_r / sigma_v)
        
        key['pupil_corr'] = corr
        self.insert1(key)

In [None]:
PupilCorr()

In [None]:
PupilCorr.populate()

In [None]:
PupilCorr()

In [None]:
PupilCorr.populate()
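
Calling `populate()` a second time should find nothing left to compute, because `make` is only called for keys that do not yet have a row in the table. The toy class below is a rough in-memory illustration of that idea. `ToyComputed` and its attributes are hypothetical and are not how DataJoint is implemented; real tables store their rows in the database.

```python
class ToyComputed:
    """In-memory stand-in for a computed table (hypothetical, not DataJoint code)."""

    def __init__(self, upstream_keys):
        self.upstream_keys = list(upstream_keys)  # keys available upstream
        self.rows = {}                            # key -> stored result
        self.make_calls = 0                       # how many times make() ran

    def make(self, key):
        self.make_calls += 1
        self.rows[key] = 'result for scan {}'.format(key)

    def populate(self):
        # Only keys without a stored row are passed to make()
        for key in self.upstream_keys:
            if key not in self.rows:
                self.make(key)


table = ToyComputed(upstream_keys=[1, 2, 3])

table.populate()
calls_after_first = table.make_calls

table.populate()  # nothing is missing, so make() is not called again
calls_after_second = table.make_calls

print(calls_after_first, calls_after_second)
```

In this toy model the call count is the same after both `populate()` calls, which mirrors why the second `PupilCorr.populate()` prints no further 'Working on' lines.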

In [None]:
dj.Diagram(ca) + dj.Diagram(schema)