# 8-Spots paper analysis

> This is the main notebook to be executed to reproduce the analysis of the 8-spot paper.

# How to use this notebook?

First and foremost, this notebook, and all the other notebooks linked below, can be used as a read-only resource to inspect every single step of the analysis. Furthermore, many steps of the analysis (alternative burst searches or fitting procedures) are not presented in the main paper for obvious space constraints and can be found here, where the complete analysis is provided.

Second, this resource can be used to re-run the analysis on your machine. You can modify the burst search parameters or use a different fitting approach, and see how these changes affect the complete analysis.

> If you are new to the notebook interface you can get a quick briefing by clicking on menu *Help* -> *User interface tour*.


# Execution Overview

This "master" notebook sequentially executes all the notebooks that perform the different steps of the analysis presented in the 8-spot paper.

We execute both standard notebooks and "template notebooks". A template notebook is executed several times, each time with a different input file or different burst search parameters. Template notebooks are used for the μs-ALEX analysis.
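
The template mechanism can be pictured with plain dictionaries standing in for the notebook JSON structure. The sketch below is only an illustration and not the helper used in this analysis (which relies on `nbformat`): the names `new_code_cell` and `inject_parameters` and the toy template are hypothetical, and it assumes that the template sets its default parameter in the first cell.

```python
import copy

def new_code_cell(source):
    """Build a minimal code cell following the notebook v4 JSON layout."""
    return {"cell_type": "code", "source": source, "metadata": {},
            "outputs": [], "execution_count": None}

def inject_parameters(template, position=1, **params):
    """Return a copy of `template` with one assignment cell per parameter.

    The cells are inserted at `position` (by default right after the first
    cell), so they run after any default assigned at the top of the template.
    """
    nb = copy.deepcopy(template)
    cells = [new_code_cell("%s = %r" % (name, value))
             for name, value in sorted(params.items())]
    nb["cells"][position:position] = cells
    return nb

# Toy template (hypothetical): cell 0 sets a default, cell 1 uses it.
template = {"cells": [new_code_cell("data_id = '7d'"),
                      new_code_cell("print(data_id)")]}

# One parametrized copy per sample; the template itself is left untouched.
for data_id in ["7d", "12d", "17d", "22d", "27d"]:
    nb = inject_parameters(template, data_id=data_id)
    print([cell["source"] for cell in nb["cells"]])
```

Working on a deep copy keeps the template unmodified between runs, which is the same reason the real helper re-reads the template file for every sample.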

To execute other notebooks we use two helper functions that we import here:

In [1]:
# %load notebook_runner.py
"""
This module defines two functions to be used inside an IPython Notebook
in order to execute other notebooks.

These functions are used to run a notebook one or more times
(with different data files as input).

For the template notebooks the result of each execution (including data and
plots) is saved as a new notebook. The parameters (`data_id` and
`ph_sel_name`) are passed to the notebook by inserting code cells that
assign them right after the first cell of the template (for example
[usALEX-5samples](usALEX-5samples.ipynb)). The template uses these
variables to decide which file to process.
"""

import os
import pandas as pd
from IPython.display import display, FileLink
from IPython import nbformat
from IPython.nbconvert.preprocessors import ExecutePreprocessor

## Monkey-patching to workaround a limitation of current ipython
## See: https://github.com/ipython/ipython/issues/8286
from IPython.nbformat.v4 import output_from_msg

def preprocess_cell(self, cell, resources, cell_index):
    """
    Apply a transformation on each code cell. See base.py for details.
    """
    if cell.cell_type != 'code':
        return cell, resources

    outputs = self.run_cell(cell)
    cell.outputs = outputs

    for out in outputs:
        if out.output_type == 'error':
            msg = 'Error executing the following the notebook cell:\n'
            msg += str(cell.source)
            raise RuntimeError(msg)

    return cell, resources

ExecutePreprocessor.preprocess_cell = preprocess_cell
## End of monkey-patch


def run_notebook(notebook_name):
    """Runs the notebook `notebook_name` (file name with no extension).

    This function executes notebook with name `notebook_name` (no extension)
    and saves the fully executed notebook in a new file appending "-out"
    to the original file name.

    It also displays links to the original and executed notebooks.
    """
    nb_name_full  = notebook_name + '.ipynb'
    display(FileLink(nb_name_full))
    
    out_path = 'out_notebooks/'
    out_nb_name = out_path + notebook_name + '-out.ipynb'
    
    nb = nbformat.read(nb_name_full, as_version=4)
    ep = ExecutePreprocessor(timeout = 3600)

    try:
        out = ep.preprocess(nb, {'metadata': {'path': './'}})
    except Exception:
        msg = 'Error executing the notebook "%s".\n\n' % notebook_name
        msg += 'See notebook "%s" for the traceback.' % out_nb_name
        print(msg)
        raise
    finally:
        nbformat.write(nb, out_nb_name)
        display(FileLink(out_nb_name))
    

def run_notebook_template(notebook_name, remove_out=True,
                          data_ids=['7d', '12d', '17d', '22d', '27d'],
                          ph_sel=None):
    """Run a template ALEX notebook for all the 5 samples.

    Fit results are saved in the folder 'results'.
    For each sample, the evaluated notebook containing both plots
    and text output is saved in the 'out_notebooks' folder.
    """
    ## Compute TXT data results file name (removing a previous copy)
    assert ph_sel in ['all-ph', 'Dex', 'DexDem', 'AexAem', 'AND-gate', None]
    ph_sel_suffix = '' if ph_sel is None else '-%s' % ph_sel
    data_fname = 'results/' + notebook_name + '%s.txt' % ph_sel_suffix
    if remove_out and os.path.exists(data_fname):
        os.remove(data_fname)

    nb_name_full = notebook_name + '.ipynb'
    display(FileLink(nb_name_full))

    out_path = 'out_notebooks/'
    
    ep = ExecutePreprocessor(timeout = 3600)
    for data_id in data_ids:
        nb = nbformat.read(nb_name_full, as_version=4)
        
        nb['cells'].insert(1, nbformat.v4.new_code_cell('data_id = "%s"' % data_id))
        nb['cells'].insert(1, nbformat.v4.new_code_cell('ph_sel_name = "%s"' % ph_sel))

        out_nb_name = out_path + notebook_name + '-out%s-%s.ipynb' %\
                      (ph_sel_suffix, data_id)
    
        try:
            out = ep.preprocess(nb, {'metadata': {'path': './'}})
        except Exception:
            msg = 'Error executing the notebook "%s".\n\n' % notebook_name
            msg += 'See notebook "%s" for the traceback.' % out_nb_name
            print(msg)
            raise
        finally:
            nbformat.write(nb, out_nb_name)
            display(FileLink(out_nb_name))
    display(pd.read_csv(data_fname, sep=r"\s+").set_index('sample'))

# Architecture notes

The simple practice of systematically saving intermediate results in text files is an important step in any scientific analysis that aims to be [computationally reproducible](http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003285).

Saving intermediate results in a text file makes it easy to track and find them (either manually or programmatically). Moreover, it makes it easy to spot regressions (or errors) when a [Version Control System](http://en.wikipedia.org/wiki/Revision_control) (e.g. [git](http://git-scm.com/)) is used.

In the present analysis, each notebook saves a selection of results in a text file in the `results` folder. These files are often used as an input by subsequent notebooks.
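
As a rough illustration of this practice, the sketch below appends one row per sample to a whitespace-separated text file and reads it back. The exact format written by the analysis notebooks is not shown here: a header line starting with `sample` is an assumption inferred from how `run_notebook_template()` reads the file (`sep=r"\s+"` and `set_index('sample')`). The function names and the file name are hypothetical.

```python
import os
import tempfile

def append_result_row(fname, sample, values):
    """Append one whitespace-separated row, writing the header on first use."""
    names = sorted(values)
    write_header = not os.path.exists(fname)
    with open(fname, "a") as f:
        if write_header:
            f.write(" ".join(["sample"] + names) + "\n")
        f.write(" ".join([sample] + [repr(values[n]) for n in names]) + "\n")

def read_results(fname):
    """Read the file back as {sample: {variable: float}}."""
    with open(fname) as f:
        header = f.readline().split()
        rows = [line.split() for line in f if line.strip()]
    return {row[0]: dict(zip(header[1:], map(float, row[1:])))
            for row in rows}

# Illustrative file, written to a temporary folder instead of `results`.
results_dir = tempfile.mkdtemp()
fname = os.path.join(results_dir, "example-results.txt")
append_result_row(fname, "7d", {"E_kde_w": 0.9338, "S_kde": 0.5588})
append_result_row(fname, "12d", {"E_kde_w": 0.7606, "S_kde": 0.586})
print(read_results(fname)["12d"]["E_kde_w"])
```

Because the file is plain text with one row per sample, a change in any fitted value shows up as a one-line diff under version control.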

# Batch execution

The μs-ALEX notebooks are executed in batch from a single template notebook for the 5 samples. This allows maintaining a single "analysis recipe" that is uniformly applied to all the samples. Moreover, if the template notebook is modified, the results for all the 5 samples can be computed in one step. 

> Note that not all μs-ALEX notebooks are templates. For example the notebooks for fitting the global leakage coefficient or direct excitation are simple notebooks that read the previously saved per-sample results and compute averages.

## The notebooks links
When a template notebook is executed a set of output notebooks is generated, one for each sample. After the execution, the cell will show links to the *template notebook* (unexecuted, contains no output) and to the *output notebooks* for each sample. These output notebooks are meant to be read-only and are used to inspect the full analysis results for each sample.
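
The names of the output notebooks follow directly from the template name, the burst search suffix and the sample. The sketch below mirrors the naming logic of `run_notebook_template()`; the function `output_notebook_names` itself is a hypothetical helper added for illustration.

```python
def output_notebook_names(notebook_name, data_ids, ph_sel=None,
                          out_path="out_notebooks/"):
    """List the output notebook paths produced by one template run."""
    ph_sel_suffix = "" if ph_sel is None else "-%s" % ph_sel
    return [out_path + notebook_name + "-out%s-%s.ipynb" % (ph_sel_suffix, d)
            for d in data_ids]

names = output_notebook_names("usALEX-5samples-PR-raw",
                              ["7d", "12d", "17d", "22d", "27d"],
                              ph_sel="DexDem")
for name in names:
    print(name)
```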

## The results table
After the links, the cell also shows a table with a row for each sample and a column for each variable saved by the template notebook. To find where a specific variable is computed, search for the variable name in the output notebook of a specific sample (use the browser "find").

## Working on template notebooks
You can open and modify a template notebook in order to perform a different analysis. To open the template notebook click on the corresponding link (see below). Beware that notebooks with "-out" in their names are *output notebooks* not meant to be executed.

After applying the modifications (and saving) you can run the template notebook directly. In this case a default data file will be loaded. The default is specified at the beginning of the notebook (look for the `data_id` variable). If the execution succeeds for a single data file you can run the notebook as a template: just switch back to the present notebook and execute the corresponding `run_notebook_template()` cell.


# Software version

Here we keep track of the versions of all the libraries used for the execution.

In [2]:
%load_ext version_information

In [3]:
%version_information numpy, scipy, matplotlib, tables, pandas, lmfit, seaborn, fretbursts

 - Optimized (cython) burst search loaded.
 - Optimized (cython) photon counting loaded.
-------------------------------------------------------------




 You are running FRETBursts (version 0.4rc10-7-gb2e1b5d).

 If you use this software in a publication, please cite it as:

   FRETBursts - An opensource single-molecule FRET bursts analysis toolkit.
   A. Ingargiola 2014. https://github.com/tritemio/FRETBursts

-------------------------------------------------------------


Software,Version
Python,2.7.8 64bit [MSC v.1500 64 bit (AMD64)]
IPython,3.1.0
OS,Windows 7 6.1.7601 SP1
numpy,1.9.2
scipy,0.15.1
matplotlib,1.4.3
tables,3.1.1
pandas,0.16.0
lmfit,0.8.3
seaborn,0.5.1
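
Outside IPython, a similar table can be gathered with the standard library alone. This is only a sketch and not the `version_information` extension used above: it assumes Python 3.8 or later (for `importlib.metadata`), and `collect_versions` is a hypothetical helper.

```python
import platform
from importlib import metadata  # standard library, Python 3.8+

def collect_versions(packages):
    """Return (name, version) pairs for Python, the OS and each package."""
    rows = [("Python", platform.python_version()),
            ("OS", platform.platform())]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        rows.append((name, version))
    return rows

packages = ["numpy", "scipy", "matplotlib", "tables", "pandas",
            "lmfit", "seaborn", "fretbursts"]
for name, version in collect_versions(packages):
    print("%s,%s" % (name, version))
```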


## 1. μs-ALEX: Raw Proximity Ratio Analysis

We start the analysis by processing and fitting the raw PR values for the 5 μs-ALEX measurements. 
The only correction applied to the bursts here is the background correction.
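
For a single burst, the raw proximity ratio is the fraction of the background-corrected donor-excitation counts that is detected in the acceptor channel. The sketch below illustrates this definition; it is not the FRETBursts implementation, and the function name, argument names and numbers are hypothetical.

```python
def raw_proximity_ratio(nd_raw, na_raw, bg_rate_d, bg_rate_a, width):
    """Background-corrected proximity ratio of one burst.

    nd_raw, na_raw: donor and acceptor counts during donor excitation
    bg_rate_d, bg_rate_a: background rates in counts per second
    width: burst duration in seconds
    """
    nd = nd_raw - bg_rate_d * width
    na = na_raw - bg_rate_a * width
    return na / (nd + na)

# Hypothetical burst: 30 donor and 50 acceptor counts in 2 ms,
# with a background of 1000 counts per second in each channel.
print(raw_proximity_ratio(30, 50, 1000.0, 1000.0, 2e-3))
```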

The same analysis is performed for different 
burst searches, as indicated in the notebook file name suffix:

- **all-ph**: burst search on all photons
- **Dex**: burst search on photons during Donor excitation periods
- **DexDem**: burst search on photons during Donor excitation detected in the Donor channel (donor emission)
- **AND-gate**: AND-gate burst search. The bursts are the intersection (in time) of the bursts from a "Dex" burst search and the bursts from an "AexAem" burst search.
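
The AND-gate operation described in the last item can be sketched as an intersection of time intervals. The code below is a generic two-pointer interval intersection, not the FRETBursts implementation; it assumes each input is a time-sorted list of non-overlapping `(start, stop)` pairs, and the burst times are made up.

```python
def intersect_bursts(bursts_a, bursts_b):
    """Intersect two time-sorted lists of non-overlapping (start, stop) bursts."""
    result = []
    i = j = 0
    while i < len(bursts_a) and j < len(bursts_b):
        start = max(bursts_a[i][0], bursts_b[j][0])
        stop = min(bursts_a[i][1], bursts_b[j][1])
        if start < stop:
            result.append((start, stop))
        # Advance the list whose current burst ends first.
        if bursts_a[i][1] < bursts_b[j][1]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical burst times (in ms) from the two burst searches.
dex_bursts = [(0.0, 1.0), (2.0, 3.5), (5.0, 6.0)]
aexaem_bursts = [(0.5, 1.5), (3.0, 4.0), (7.0, 8.0)]
print(intersect_bursts(dex_bursts, aexaem_bursts))
```

Only the time spans where both searches report a burst survive, which is why this selection favors doubly-labeled molecules.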

In [4]:
nb_name = 'usALEX-5samples-PR-raw'
run_notebook_template(nb_name, ph_sel='all-ph')

sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,E_pr_do_kde,nt_mean
7d,1370,724,600,0.9338,0.931395,0.054563,0.002228,0.5588,0.561594,0.098463,0.00402,0.095,22.375165
12d,1528,410,1081,0.7606,0.749737,0.085984,0.002615,0.586,0.571272,0.104204,0.003169,0.102,22.612463
17d,3013,558,2389,0.4922,0.484872,0.101857,0.002084,0.5364,0.558672,0.110041,0.002251,0.1004,23.168331
22d,2552,403,2045,0.2768,0.279818,0.070842,0.001567,0.5636,0.564369,0.112976,0.002498,0.0924,25.330934
27d,999,203,759,0.1966,0.195458,0.058387,0.002119,0.5564,0.582804,0.112845,0.004096,0.084,19.505634


In [5]:
nb_name = 'usALEX-5samples-PR-raw'
run_notebook_template(nb_name, ph_sel='DexDem')

sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,E_pr_do_kde,nt_mean
7d,1685,1396,151,0.8426,0.795226,0.089862,0.007313,0.6372,0.600875,0.092822,0.007554,0.095,28.103636
12d,1915,719,1094,0.7142,0.706974,0.075747,0.00229,0.561,0.586976,0.101789,0.003077,0.0924,28.011287
17d,4497,1042,3274,0.463,0.447567,0.102044,0.001783,0.5704,0.574126,0.109356,0.001911,0.0938,29.138126
22d,3455,579,2750,0.2642,0.265737,0.066868,0.001275,0.5638,0.570597,0.109526,0.002089,0.0828,31.978287
27d,1397,401,944,0.1814,0.18344,0.054032,0.001759,0.617,0.601571,0.10828,0.003524,0.0778,23.272539


In [6]:
nb_name = 'usALEX-5samples-PR-raw'
run_notebook_template(nb_name, ph_sel='Dex')

sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,E_pr_do_kde,nt_mean
7d,1653,955,641,0.9336,0.927281,0.054494,0.002152,0.5752,0.577278,0.100331,0.003963,0.0982,23.005923
12d,1690,486,1155,0.751,0.748524,0.084144,0.002476,0.5996,0.589008,0.104657,0.003079,0.1026,23.376161
17d,3451,764,2584,0.4908,0.480718,0.101802,0.002003,0.5754,0.58363,0.108354,0.002132,0.1056,23.04604
22d,2801,501,2184,0.2752,0.280169,0.067912,0.001453,0.5768,0.583304,0.110295,0.00236,0.0938,26.572532
27d,1145,314,793,0.1944,0.194758,0.057257,0.002033,0.58,0.605761,0.10845,0.003851,0.0882,19.855257


In [7]:
nb_name = 'usALEX-5samples-PR-raw-AND-gate'
run_notebook_template(nb_name)

sample,n_bursts_all,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,nt_mean
7d,667,611,0.9356,0.929024,0.05387,0.002179,0.5764,0.571353,0.097668,0.003951,23.149236
12d,1128,1102,0.7526,0.750735,0.081054,0.002442,0.5938,0.5835,0.101023,0.003043,23.289481
17d,2475,2450,0.491,0.482356,0.09915,0.002003,0.5746,0.575922,0.102744,0.002076,23.037785
22d,2197,2111,0.2746,0.279806,0.06868,0.001495,0.569,0.577698,0.108606,0.002364,26.511882
27d,782,761,0.1944,0.193718,0.058629,0.002125,0.5818,0.60065,0.105768,0.003834,19.658123


## 2. μs-ALEX: Leakage coefficient fit

In this section we compute the spectral leakage coefficient from the Donor-only peak positions in the Raw PR histograms.
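
The relation between the Donor-only peak position and the leakage coefficient can be sketched as follows. For Donor-only bursts the acceptor-channel signal is only leakage, `lk * nd`, so the raw PR is `lk / (1 + lk)` and therefore `lk = PR / (1 - PR)`. The example uses the `E_pr_do_kde` values reported in section 1 for the DexDem burst search; the unweighted mean is an assumption, since the averaging used by the linked notebook is not stated here, and `leakage_from_pr` is a hypothetical name.

```python
def leakage_from_pr(e_pr_do):
    """Leakage coefficient from the raw PR of the Donor-only peak.

    For Donor-only bursts: PR = lk * nd / (nd + lk * nd) = lk / (1 + lk),
    hence lk = PR / (1 - PR).
    """
    return e_pr_do / (1.0 - e_pr_do)

# E_pr_do_kde column, DexDem burst search (section 1).
e_pr_do_kde = {"7d": 0.095, "12d": 0.0924, "17d": 0.0938,
               "22d": 0.0828, "27d": 0.0778}

leakage = {sample: leakage_from_pr(e) for sample, e in e_pr_do_kde.items()}
# Assumption: plain (unweighted) mean over the 5 samples.
mean_leakage = sum(leakage.values()) / len(leakage)
print(leakage)
print(mean_leakage)
```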

The computation is performed by [usALEX - Corrections - Leakage fit](usALEX - Corrections - Leakage fit.ipynb)
and uses results from [usALEX-5samples-PR-raw-DexDem](usALEX-5samples-PR-raw-DexDem.ipynb) executed in the previous section.

In [8]:
run_notebook('usALEX - Corrections - Leakage fit')

## 3. μs-ALEX: Direct excitation fit

In this section we extract the direct excitation parameter by fitting the S peak position of the Acceptor-only population.

For comparison, two burst searches are performed:

- **all-ph**: burst search on all photons
- **AexAem**: burst search on photons during Acceptor excitation detected by the Acceptor detector (acceptor emission).

The **all-ph** burst search provides less accurate results and is performed only to show the improvement of the more appropriate **AexAem** burst search.
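
How an S peak position translates into a direct excitation coefficient can be sketched under two assumptions that are not spelled out in this notebook: S is the uncorrected ratio of donor-excitation counts over donor-excitation plus AexAem counts, and the direct excitation signal is modelled as `d` times the AexAem counts. For Acceptor-only bursts this gives `S = d / (1 + d)`, hence `d = S / (1 - S)`. The function names and the S value are hypothetical.

```python
def dir_ex_from_s(s_ao):
    """Direct excitation coefficient from the Acceptor-only S peak position."""
    return s_ao / (1.0 - s_ao)

def s_from_dir_ex(d):
    """Inverse relation: expected Acceptor-only S for a given coefficient."""
    return d / (1.0 + d)

s_peak = 0.06  # hypothetical Acceptor-only S peak position
d = dir_ex_from_s(s_peak)
print(d, s_from_dir_ex(d))
```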

### 3.1 Fitting of the S peak

In [9]:
nb_name = 'usALEX-5samples-PR-raw-dir_ex_aa-fit'
run_notebook_template(nb_name, ph_sel='all-ph')

sample,n_bursts_aa,dir_ex_S_kde,dir_ex_S2p,dir_ex_S2pa,dir_ex_S_kde_w1,dir_ex_S_kde_w4,dir_ex_S_kde_w5,dir_ex_S2p_w5,dir_ex_S2p_w5a
7d,392,0.068148,0.07188,0.047785,0.06792,0.062473,0.062473,0.055039,0.046237
12d,169,0.073422,0.099534,0.045945,0.073422,0.069519,0.069519,0.076191,0.039622
17d,372,0.08672,0.107564,0.063839,0.086012,0.073422,0.073422,0.072389,0.046629
22d,331,0.078749,0.089866,0.057455,0.078051,0.070435,0.070435,0.073369,0.050282
27d,61,0.04015,0.120768,0.020582,0.04015,0.044932,0.044932,0.035625,0.003425


In [10]:
nb_name = 'usALEX-5samples-PR-raw-dir_ex_aa-fit'
run_notebook_template(nb_name, ph_sel='AexAem')

sample,n_bursts_aa,dir_ex_S_kde,dir_ex_S2p,dir_ex_S2pa,dir_ex_S_kde_w1,dir_ex_S_kde_w4,dir_ex_S_kde_w5,dir_ex_S2p_w5,dir_ex_S2p_w5a
7d,630,0.061797,0.068843,0.061137,0.061571,0.055743,0.055743,0.055734,0.050935
12d,269,0.062473,0.091852,0.044397,0.062473,0.058873,0.058873,0.058348,0.048265
17d,610,0.085305,0.12122,0.071039,0.084363,0.078051,0.078051,0.074779,0.061504
22d,430,0.085305,0.109323,0.057408,0.084599,0.072271,0.072271,0.062908,0.048816
27d,129,0.05042,0.081572,0.034044,0.0502,0.070893,0.070893,0.051986,0.015566


### 3.2 Computation of the direct excitation coefficient

The notebook [usALEX - Corrections - Direct excitation fit](usALEX - Corrections - Direct excitation fit.ipynb) computes the direct excitation coefficient by averaging the values extracted from each measurement:

In [11]:
run_notebook('usALEX - Corrections - Direct excitation fit')

## 4. μs-ALEX: $\gamma$-factor fit

In this section we compute the $\gamma$-factor for the μs-ALEX measurements.

First we reprocess the μs-ALEX measurements including background, leakage and direct excitation corrections (standard PR). Then, from the fitted $E$ and $S$ peak positions of the FRET population, we fit the $\gamma$-factor.
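
A commonly used ALEX relation for this kind of fit is the linear dependence $1/S = \Omega + \Sigma\,E$ across samples, from which $\gamma = (\Omega - 1)/(\Omega + \Sigma - 1)$ and $\beta = \Omega + \Sigma - 1$. Whether the linked notebook uses exactly this relation is not stated here, so the sketch below should be read as one possible approach. The function `fit_gamma` is hypothetical and the data are synthetic, generated from known $\Omega$ and $\Sigma$ to check that the fit recovers them.

```python
def fit_gamma(E, S):
    """Fit 1/S = omega + sigma * E by least squares; return (gamma, beta)."""
    n = len(E)
    y = [1.0 / s for s in S]
    x_mean = sum(E) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in E)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(E, y))
    sigma = sxy / sxx                   # slope of the line
    omega = y_mean - sigma * x_mean     # intercept of the line
    beta = omega + sigma - 1.0
    gamma = (omega - 1.0) / beta
    return gamma, beta

# Synthetic peak positions (not measured data): omega = 1.5 and sigma = 0.5
# correspond to gamma = 0.5 and beta = 1.0.
true_omega, true_sigma = 1.5, 0.5
E = [0.1, 0.3, 0.5, 0.7, 0.9]
S = [1.0 / (true_omega + true_sigma * e) for e in E]

gamma, beta = fit_gamma(E, S)
print("gamma = %.3f, beta = %.3f" % (gamma, beta))
```

Fitting a line to $1/S$ rather than fitting $S$ directly keeps the problem linear, so an ordinary least-squares formula is enough.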

### 4.1: Proximity Ratio analysis

In [12]:
nb_name = 'usALEX-5samples-PR-leakage-dir-ex-all-ph'
run_notebook_template(nb_name)

sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,E_pr_do_kde,nt_mean
7d,1167,582,542,0.9302,0.925533,0.059437,0.002553,0.5524,0.550366,0.099261,0.004264,0.0014,22.1989
12d,1303,329,944,0.743,0.731184,0.092252,0.003003,0.5768,0.556622,0.105519,0.003434,0.0164,21.841737
17d,2482,462,1957,0.431,0.42615,0.114118,0.00258,0.5436,0.536833,0.113025,0.002555,0.0126,21.054439
22d,2047,319,1666,0.1788,0.182669,0.078314,0.001919,0.5426,0.543185,0.114819,0.002813,0.0008,22.918132
27d,786,160,584,0.0834,0.084095,0.070112,0.002901,0.515,0.556142,0.115363,0.004774,-0.0068,16.980705


### 4.2 $\gamma$-factor fit

The notebook [usALEX - Corrections - Gamma factor fit](usALEX - Corrections - Gamma factor fit.ipynb)
fits the $\gamma$-factor from $E$ and $S$ values computed in the previous section.

In [13]:
run_notebook('usALEX - Corrections - Gamma factor fit')

## 5. μs-ALEX: Corrected E analysis

Using all the correction coefficients, we compute the corrected $E$ histograms:

In [14]:
nb_name = 'usALEX-5samples-E-corrected-all-ph'
run_notebook_template(nb_name)

sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,S_kde,S_gauss,S_gauss_sig,S_gauss_err,E_pr_do_kde,nt_mean
7d,1167,583,541,0.929,0.923612,0.060361,0.002595,0.553,0.551031,0.099151,0.004263,0.0018,22.1989
12d,1303,329,944,0.7396,0.727858,0.091651,0.002983,0.5778,0.558006,0.105762,0.003442,0.0158,21.841737
17d,2482,463,1957,0.4266,0.422469,0.112972,0.002554,0.5458,0.539102,0.112595,0.002545,0.0124,21.054439
22d,2047,320,1665,0.1762,0.180234,0.077545,0.0019,0.5462,0.546301,0.114942,0.002817,0.0008,22.918132
27d,786,160,584,0.082,0.082579,0.069136,0.002861,0.519,0.560614,0.115021,0.00476,-0.007,16.980705


## 6. Direct excitation: physical parameter

Here we compute the "physical" direct excitation coefficient:

$$ d_{exT} = \frac{\sigma_{A532}}{\sigma_{D532}}$$

This coefficient is based only on the physical properties of the dyes and can therefore be applied to non-ALEX measurements. For details see:

- [usALEX - Corrections - Direct excitation physical parameter](usALEX - Corrections - Direct excitation physical parameter.ipynb)

In [15]:
run_notebook('usALEX - Corrections - Direct excitation physical parameter')

## 7. Multi-spot analysis

### 7.1 DCR and background

Plot background fits:

In [16]:
run_notebook('Multi-spot 5-Samples analysis - Background')

Compute the detectors' DCR:

In [17]:
run_notebook('8-pixel DCR')

### 7.2 Leakage coefficient

The first step in the analysis of the multi-spot data is fitting the leakage coefficient. This is performed by the notebook

* [Multi-spot 5-Samples analysis - Leakage coefficient fit](Multi-spot 5-Samples analysis - Leakage coefficient fit.ipynb)

that outputs these 3 text files:

- [Multi-spot - leakage coefficient Dem.txt](results/Multi-spot - leakage coefficient Dem.txt) *(all-ch, all-samples mean)*
- [Multi-spot - leakage coefficient mean per-sample Dem.txt](results/Multi-spot - leakage coefficient mean per-sample Dem.txt)
- [Multi-spot - leakage coefficient mean per-ch Dem.txt](results/Multi-spot - leakage coefficient mean per-ch Dem.txt)
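
The three files correspond to three ways of averaging a samples × channels table of fitted leakage values. The sketch below shows the idea with unweighted means; the actual notebook may weight the values differently (for example by number of bursts), and all numbers here are hypothetical, with only 3 channels shown for brevity.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical leakage values: one row per sample, one column per channel.
leakage = {
    "7d":  [0.030, 0.032, 0.034],
    "12d": [0.031, 0.033, 0.035],
    "17d": [0.029, 0.031, 0.036],
}

per_sample = {s: mean(row) for s, row in leakage.items()}   # mean per-sample
per_ch = [mean(col) for col in zip(*leakage.values())]      # mean per-ch
overall = mean(v for row in leakage.values() for v in row)  # all-ch, all-samples

print(per_sample)
print(per_ch)
print(overall)
```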

In [18]:
run_notebook('Multi-spot 5-Samples analysis - Leakage coefficient fit')

### 7.3 PR and FRET analysis

Then we perform the complete analysis by running:

* [Multi-spot 5-Samples analysis - PR-raw](Multi-spot 5-Samples analysis - PR-raw.ipynb) 

In [19]:
run_notebook('Multi-spot 5-Samples analysis - PR-raw')

Error executing the notebook "Multi-spot 5-Samples analysis - PR-raw".

See notebook "out_notebooks/Multi-spot 5-Samples analysis - PR-raw-out.ipynb" for the traceback.


RuntimeError: Error executing the following the notebook cell:
stop