# Critical Scale Invariance in a Healthy Human Heart Rate
<hr>

> Tommaso Bertola, Giacomo Di Prima, Giuseppe Viterbo, Marco Zenari

## Abstract
In this notebook we reproduce the analysis by Kiyono et al. published in Physical Review Letters [Kiyono, 1](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.93.178103). We investigate the probability distribution function of heartbeat time intervals recorded from healthy individuals. A model inspired by the effect of high-Reynolds-number turbulence on fluid velocities, taken from Castaing's work, is used to fit the data and test the scale invariance of the distribution of interbeat times. Rhythms from unhealthy individuals are also considered, to test whether they follow the same distributions and trends, which could provide hints for diagnosing pathological or life-threatening conditions.

# Healthy Heart Rate Variability and its Probability Distribution
Human heart rate is a complex biological signal whose statistical properties are studied in depth not only from a medical point of view, i.e. to assess a patient's medical condition, but also from a mathematical perspective.

Here we mainly focus on the probability distribution function (PDF) of the cardiac interbeat times, defined as the time differences between consecutive heart contractions. This PDF is not Gaussian [Peng, 2](https://link.aps.org/doi/10.1103/PhysRevLett.70.1343) and, more importantly, displays a robust scale invariance, suggesting that a healthy cardiac system operates near a critical, out-of-equilibrium state [Ivanov, 3](https://www.nature.com/articles/20924).

To further investigate these hypotheses, unhealthy rhythms will also be analyzed. Some studies suggest that deviations from the healthy PDF could be used as a tool for cardiac health checks [Bigger, 4](https://www.ahajournals.org/doi/10.1161/01.CIR.93.12.2142), as discrepancies from a healthy critical behaviour may reduce the overall efficiency of transport phenomena, similarly to what happens in other physical systems [Takayasu, 5](https://www.sciencedirect.com/science/article/pii/S0378437199004999).


# Experimental measurements of Heart Beats
The data used to reproduce the analysis are taken from the PhysioNet.org public database. Three different datasets are analyzed here:

- ["Fantasia" dataset](https://doi.org/10.13026/C2RG61), made of recordings from 20 young (21-34 years old) and 20 elderly (68-85 years old) individuals. All subjects were selected after a strict health check to verify the absence of any pathological rhythms. The ECG signals were recorded for a total of 120 minutes, sampled at 250 Hz, while each participant was watching the Disney movie Fantasia to help maintain wakefulness. Respiration was recorded together with the ECG and, for some individuals only, blood pressure as well.
- [Normal Sinus Rhythm dataset](https://doi.org/10.13026/C2NK5R), which includes 18 long-term (~22 hours) ECG recordings of subjects found to have had no significant arrhythmias; they include 5 men, aged 26 to 45, and 13 women, aged 20 to 50. Electric signals were sampled at 128 Hz.
- [Congestive Heart Failure dataset](https://doi.org/10.13026/C29G60), which includes long-term ECG recordings from 15 subjects (11 men, aged 22 to 71, and 4 women, aged 54 to 63) with severe congestive heart failure. Each recording lasts 20 hours and was sampled at 250 Hz with an ambulatory ECG recorder.

The measurements record the electrical activity of the heart using ECG electrodes attached to the patient's skin. The sampling rate (128 or 250 Hz) is much higher than the heart rate, so beat times can be resolved to within one sampling period, i.e. about 0.008 s at 128 Hz and 0.004 s at 250 Hz.

The recorded electrical activity originates from the response of the cardiac muscle to the stimulus first sent by the pacemaker cells in the sinoatrial node. The signal is therefore the overall response of different parts of the heart activating at different times, partly modulated by the anatomy of the tissues.

<table><tr>
<td> <img src="images/heart_pulse.png" alt="heart_pulse_signal" style="width: 250px;"/> </td>
<td> <img src="images/ECG.gif" alt="GIF ECG" style="width: 250px;"/> </td>
</tr></table>

The figure shows the unit of measure of the signal, $mV$, and the time scale over which the signal lasts. More importantly, the different phases of the signal can be recognized, also by comparison with the animation above.
The first small deflection corresponds to the P phase and is related to the electrical activity originating from the sinoatrial node. The so-called QRS complex follows; during this time the maximum electrical activity, the R peak, occurs, and the detection algorithms use it to determine the beat time. Finally, the ventricles contract during the following ST phase, and the cycle then repeats. Strictly speaking, the peak detection algorithms record the R peak times and not the contraction times, but since the delay between these two instants is approximately constant and its variations are negligible with respect to the interbeat time, the extracted intervals remain statistically meaningful and a robust analysis can be carried out.


# Datasets and Tools
Here the datasets are briefly presented to explain how they are used in the analysis. To access the data files, the `WFDB` Python package was used. It is an open source library specifically built to interface with PhysioNet datasets; it provides methods for peak detection on raw ECG files, visualization shortcuts, and a high level of customization and configuration.
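
As a rough sketch of how a record can be read with `WFDB`, the snippet below wraps `wfdb.rdrecord` and `wfdb.rdann`. The PhysioNet directory names (`fantasia`, `nsrdb`, `chfdb`), the annotation extensions, and the helper name `load_ecg` are assumptions made for illustration and should be checked against each dataset page; downloading requires network access.

```python
import numpy as np

# Assumed PhysioNet directory and beat-annotation extension for each dataset.
DATASETS = {
    "fantasia": ("fantasia", "ecg"),
    "normal_sinus": ("nsrdb", "atr"),
    "heart_failure": ("chfdb", "ecg"),
}


def load_ecg(record_name, dataset="fantasia", sampto=None):
    """Download one record and its beat annotations (hypothetical helper)."""
    import wfdb  # imported lazily so this module loads even without wfdb

    pn_dir, ann_ext = DATASETS[dataset]
    record = wfdb.rdrecord(record_name, pn_dir=pn_dir, sampto=sampto)
    ann = wfdb.rdann(record_name, ann_ext, pn_dir=pn_dir, sampto=sampto)
    # p_signal has shape (n_samples, n_channels), in physical units
    return record.p_signal, record.sig_name, record.fs, np.asarray(ann.sample)
```

A call such as `load_ecg("f1y01", sampto=5000)` would then return the first 5000 samples of one young Fantasia subject, assuming that record name exists in the database.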

<table><tr>
    <td><figure><img src="images/fantasia_patient_distribution.png" alt="fantasia" style="width:100%">
  <figcaption>Fig.1 - Fantasia</figcaption>
</figure></td>
        <td><figure><img src="images/mit_patient_distribution.png" alt="congestive" style="width:100%">
  <figcaption>Fig.2 - Normal Sinus</figcaption>
</figure></td>
        <td><figure><img src="images/congested_patient_distribution.png" alt="congestive" style="width:100%">
  <figcaption>Fig.3 - Congestive Heart Failure</figcaption>
</figure></td>
</tr></table>

## Fantasia
The Fantasia dataset contains recordings from 40 different subjects. The age of the first 20 subjects ranges from 21 to 34 years, and the corresponding files are marked with the letter "y". The age of the remaining subjects ranges from 68 to 85 years, and the letter "o" is used instead, as shown in Fig.1.

Each record is described by a `.hea` header file, while the raw ECG signal is stored in a `.dat` file; it was loaded with `WFDB` and then converted into a more manageable Pandas DataFrame. Blood pressure and respiratory data are not used in the following analysis.
ECG data are stored as an ordered sequence of voltage differences, sampled at 250 Hz with a conversion factor of 2000 adu/mV (analog-to-digital units per millivolt). For each patient there is a precomputed annotation file containing the sample indexes at which the ECG signal peaks, i.e. the R peaks.
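
The two conversions implied above (raw units to millivolts, sample indexes to seconds) can be sketched as follows. The function names, the zero baseline, and the example values are hypothetical; only the 250 Hz rate and the 2000 adu/mV factor come from the dataset description.

```python
import numpy as np

FS_HZ = 250.0              # Fantasia sampling frequency
GAIN_ADU_PER_MV = 2000.0   # conversion factor quoted in the text


def adu_to_mv(raw_adu, gain=GAIN_ADU_PER_MV, baseline_adu=0.0):
    """Convert raw ADC counts to millivolts (baseline offset assumed zero)."""
    return (np.asarray(raw_adu, dtype=float) - baseline_adu) / gain


def samples_to_seconds(sample_idx, fs=FS_HZ):
    """Convert annotation sample indexes to times in seconds."""
    return np.asarray(sample_idx, dtype=float) / fs


raw = np.array([0, 500, 2000, -400])   # made-up ADC counts
peaks = np.array([125, 350, 590])      # made-up R-peak sample indexes
print(adu_to_mv(raw))
print(samples_to_seconds(peaks))
```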

- Graph of single ecg cycle
- Graph of some cycles
- Graph of all cycles (not very readable)


## Normal Sinus
This database includes 18 long-term (24-hour) ECG recordings, sampled at 128 Hz. The subjects were found to have had no significant arrhythmias; their age and sex distribution is shown in Fig.2.
This dataset is useful to cross-check the results obtained from the Fantasia recordings, and gives more accurate information because far more data are available. The signals were recorded with two leads; to speed up the analysis, the annotations already available will be used.

## Congestive Heart Failure
This dataset collects long-term ECG recordings of 15 subjects, whose age and sex distribution is shown in Fig.3 above. Each recording lasts 20 hours and contains two ECG _leads_, each sampled at 250 Hz with 12-bit resolution over a range of ±10 millivolts. The patients suffered from severe congestive heart failure (NYHA class 3–4), a chronic progressive condition that affects the pumping power of the heart muscle, causing a build-up of fluids and a significant reduction in the efficiency of the organ.

Data were recorded using two different leads, so the available signal differs somewhat from that of Fantasia. It is still possible, however, to compute the interbeat times, since the periodic signal displays a clear absolute maximum in each pulse. Unchecked beat annotations were available, and these were used instead of running the peak detection algorithm.

We analyze the congestive heart failure dataset to verify whether its interbeat time distribution differs from the healthy one, as argued in [Bigger, 4](https://www.ahajournals.org/doi/10.1161/01.CIR.93.12.2142).



# Calculations of RR Intervals
wfdb, how to use it, a short description on the data files, annotations, raw ecg, XQRS algorithm
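
A minimal sketch of this step, assuming the R peaks are either read from the annotation files or detected with the XQRS implementation shipped in `wfdb.processing`. The helper names are hypothetical and the peak indexes in the example are made up.

```python
import numpy as np


def detect_r_peaks(ecg_mv, fs):
    """Run WFDB's XQRS detector on a 1-D ECG trace; returns sample indexes."""
    from wfdb import processing  # lazy import: only needed for detection

    signal = np.asarray(ecg_mv, dtype=float)
    return processing.xqrs_detect(sig=signal, fs=fs, verbose=False)


def rr_intervals(r_peak_samples, fs):
    """Interbeat (RR) intervals in seconds from consecutive R-peak indexes."""
    return np.diff(np.asarray(r_peak_samples)) / float(fs)


# Made-up peak positions for a signal sampled at 250 Hz
print(rr_intervals([100, 300, 525, 740], fs=250))
```

Using precomputed annotations skips `detect_r_peaks` entirely: the annotation sample indexes can be passed straight to `rr_intervals`.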

# Cleaning of the data
Show the outliers of the RR intervals, where they originated from and how we managed to clean them
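
One possible cleaning rule is sketched below. It is an assumption made for illustration, not necessarily the procedure adopted in this analysis: intervals outside a plausible physiological range are dropped, as are intervals deviating too much from a running median. All thresholds are hypothetical defaults.

```python
import numpy as np


def clean_rr(rr, lo=0.3, hi=2.0, window=11, max_rel_dev=0.3):
    """Return (cleaned_rr, keep_mask) after two simple outlier tests."""
    rr = np.asarray(rr, dtype=float)
    if rr.size == 0:
        return rr, np.zeros(0, dtype=bool)
    # Test 1: physiological range in seconds (assumed bounds)
    keep = (rr >= lo) & (rr <= hi)
    # Test 2: relative deviation from a centred running median
    half = window // 2
    padded = np.pad(rr, half, mode="edge")
    local_median = np.array(
        [np.median(padded[i:i + window]) for i in range(rr.size)]
    )
    keep &= np.abs(rr - local_median) <= max_rel_dev * local_median
    return rr[keep], keep


# A missed beat (1.7 s) and a spurious detection (0.1 s) among regular beats
rr = np.array([0.8] * 10 + [1.7] + [0.8] * 10 + [0.1])
cleaned, mask = clean_rr(rr)
print(cleaned.size, "of", rr.size, "intervals kept")
```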

# Computation of  $\Delta_s B(i)$
## Computation of $B(m)$
Formula on how to compute it, the idea of a cumulative sum
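
As we read the definition in [Kiyono, 1](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.93.178103), the integrated series is the cumulative sum of the interbeat intervals $b(j)$, i.e. $B(m) = \sum_{j=1}^{m} b(j)$. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def integrate_series(b):
    """Cumulative sum B(m) = b(1) + ... + b(m) of the interbeat intervals."""
    return np.cumsum(np.asarray(b, dtype=float))


print(integrate_series([0.8, 0.9, 0.7]))
```

No mean is subtracted here, on the assumption that the polynomial detrending applied afterwards absorbs any linear trend.
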
## Detrending with polynomials in sliding segments
Insert a brief description on why this procedure is important, cite the main article, talk about obspy, the degree that was used, sliding segments and why they are important
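
The sketch below illustrates the idea under stated assumptions: segments of length $2s$ that advance by $s$ samples, a polynomial of degree 3, and `numpy.polyfit` used in place of the obspy routine mentioned above. The segment layout and the default degree are our reading of the procedure and should be checked against the main article.

```python
import numpy as np


def sliding_segments(B, s):
    """Yield (start, segment) for windows of length 2*s advancing by s."""
    B = np.asarray(B, dtype=float)
    for start in range(0, B.size - 2 * s + 1, s):
        yield start, B[start:start + 2 * s]


def detrend_segment(segment, order=3):
    """Subtract the least-squares polynomial of the given order."""
    segment = np.asarray(segment, dtype=float)
    # Rescaled abscissa keeps the fit well conditioned for long segments
    t = np.linspace(-1.0, 1.0, segment.size)
    coeffs = np.polyfit(t, segment, order)
    return segment - np.polyval(coeffs, t)


B = np.cumsum(np.random.default_rng(0).normal(size=200))
residuals = [detrend_segment(seg) for _, seg in sliding_segments(B, s=20)]
print(len(residuals), "segments detrended")
```

The segment length `2*s` must exceed the polynomial order for the fit to be well defined.
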
## Deviations $B^\ast(i)$ and $\Delta_s B(i)$
What we finally get, plot the signal and the histogram of the data
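
A self-contained sketch of the whole chain follows. It assumes $B^\ast(i)$ is the residual of a local polynomial fit and $\Delta_s B(i) = B^\ast(i+s) - B^\ast(i)$ is evaluated inside each segment of length $2s$. The function names, the segment step of $s$, and the number of bins are illustrative choices.

```python
import numpy as np


def detrended_increments(B, s, order=3):
    """Increments B*(i+s) - B*(i) of the locally detrended series."""
    B = np.asarray(B, dtype=float)
    t = np.linspace(-1.0, 1.0, 2 * s)
    out = []
    for start in range(0, B.size - 2 * s + 1, s):
        seg = B[start:start + 2 * s]
        resid = seg - np.polyval(np.polyfit(t, seg, order), t)  # B*(i)
        out.append(resid[s:] - resid[:s])                       # Delta_s B(i)
    return np.concatenate(out) if out else np.empty(0)


def standardized_histogram(x, bins=51):
    """Histogram estimate of the PDF of x rescaled to zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    density, edges = np.histogram(z, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


rng = np.random.default_rng(0)
B = np.cumsum(rng.normal(size=5000))   # synthetic stand-in for real data
centers, density = standardized_histogram(detrended_increments(B, s=20))
```

The pair `(centers, density)` can be plotted on a logarithmic vertical axis to inspect the tails.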

## Plots of signals
For all subjects and all datasets

# Fitting distributions
## Gaussian models
What is our aim, why gauss fit fails, how badly it fails, chi squared and residuals graph
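
A hedged sketch of such a test: the standardized data are binned, the counts are compared with those expected from a unit Gaussian, and normalized residuals and a reduced chi-squared are returned. The binning, the minimum expected count, and the use of the bin-centre density to approximate expected counts are assumptions of this example.

```python
import numpy as np


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def chi_square_vs_gaussian(samples, bins=41, lim=5.0, min_expected=5.0):
    """Reduced chi-squared and residuals of a histogram against N(0, 1)."""
    z = np.asarray(samples, dtype=float)
    z = (z - z.mean()) / z.std()
    counts, edges = np.histogram(z, bins=bins, range=(-lim, lim))
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    expected = z.size * gaussian_pdf(centers) * width
    ok = expected >= min_expected        # skip bins with too few expected counts
    residuals = (counts[ok] - expected[ok]) / np.sqrt(expected[ok])
    ndof = int(ok.sum()) - 3             # normalization, mean and variance fixed
    return float(np.sum(residuals**2)) / ndof, centers[ok], residuals


rng = np.random.default_rng(1)
print(chi_square_vs_gaussian(rng.normal(size=20000))[0])    # Gaussian data
print(chi_square_vs_gaussian(rng.laplace(size=20000))[0])   # heavy-tailed data
```

Because bins with few expected counts are skipped, the statistic is driven by the central region; the far tails are better judged from the residuals plot on a logarithmic scale.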

## Castaing's model
Non gaussian model, explanation of the formula, where it comes from, the physical meaning of the parameters, how the fit was implemented on the data, how good the fit is, residuals graph
Explain the normalisation choices used; in log scale a division is just a vertical offset
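
As a sketch of how the model can be evaluated numerically, we assume the form used in [Kiyono, 1](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.93.178103): $P_s(x) = \int P_L(x/\sigma)\,\sigma^{-1}\,G_s(\ln\sigma)\,d\ln\sigma$, with $P_L$ a zero-mean Gaussian and $G_s$ a Gaussian in $\ln\sigma$ of variance $\lambda^2$. The function name, the integration grid, and the parameter `sigma0` (standard deviation of $P_L$) are choices of this example.

```python
import numpy as np


def castaing_pdf(x, lam, sigma0=1.0, n_grid=401, n_std=6.0):
    """Castaing PDF with a log-normal kernel, by trapezoidal integration."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.linspace(-n_std * lam, n_std * lam, n_grid)   # u = ln(sigma)
    du = u[1] - u[0]
    weights = np.full(n_grid, du)
    weights[0] = weights[-1] = 0.5 * du                  # trapezoid end points
    kernel = np.exp(-0.5 * (u / lam) ** 2) / (np.sqrt(2 * np.pi) * lam)
    sigma = sigma0 * np.exp(u)                           # rescaled width of P_L
    # P_L(x / sigma) / sigma for every (x, sigma) pair
    gauss = np.exp(-0.5 * (x[:, None] / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return gauss @ (kernel * weights)


x = np.linspace(-10, 10, 401)
lam = 0.5
p_unit = castaing_pdf(x, lam, sigma0=np.exp(-lam**2))   # unit-variance version
```

Under these assumptions the variance of the model is $\sigma_0^2 e^{2\lambda^2}$, so `sigma0 = exp(-lam**2)` yields a unit-variance PDF comparable with standardized data, and $\lambda \to 0$ recovers the Gaussian. Multiplying the PDF by a constant only shifts the curve vertically in a logarithmic plot.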

## Unhealthy individuals
Does the fit look like the previous one, how does it change, does it change significantly

# Scale invariance
## Collapse plot on the histogram distributions
Show the collapse plot of the data, use other papers cited in literature, fluctuations at different scales

## Dependence on s parameter
What do we expect, plot how it changes when s is varied, change detrending order and see if there are any variations
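
A quick, model-dependent check of the $s$ dependence can be sketched without running a full fit. If the increments follow Castaing's form with a log-normal kernel, their kurtosis equals $3e^{4\lambda^2}$, hence $\lambda^2 = \ln(K/3)/4$. This moment estimator is a stand-in chosen for this example, not the fitting procedure of the analysis, and the segment layout is the same assumption used for the detrending step.

```python
import numpy as np


def detrended_increments(B, s, order=3):
    """Increments of the locally detrended series in windows of length 2*s."""
    B = np.asarray(B, dtype=float)
    t = np.linspace(-1.0, 1.0, 2 * s)
    out = []
    for start in range(0, B.size - 2 * s + 1, s):
        seg = B[start:start + 2 * s]
        resid = seg - np.polyval(np.polyfit(t, seg, order), t)
        out.append(resid[s:] - resid[:s])
    return np.concatenate(out) if out else np.empty(0)


def lambda2_from_kurtosis(x):
    """Moment estimate of lambda^2, clipped at zero for sub-Gaussian tails."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    kurt = np.mean(x**4) / np.mean(x**2) ** 2
    return max(0.0, 0.25 * float(np.log(kurt / 3.0)))


def lambda2_vs_scale(b, scales, order=3):
    """Estimate lambda^2 at each scale s from the interbeat series b."""
    B = np.cumsum(np.asarray(b, dtype=float))
    return {s: lambda2_from_kurtosis(detrended_increments(B, s, order))
            for s in scales}


rng = np.random.default_rng(2)
print(lambda2_vs_scale(rng.normal(size=4000), scales=[8, 16, 32], order=3))
```

Repeating the call with a different `order` gives a first indication of how sensitive the estimate is to the detrending degree.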

# Multifractal Nature of the Interbeat Heart Times


# Bibliography