In [3]:
import logging
from collections import defaultdict
from typing import Union, List, Dict, Any

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from tqdm import trange

from models import *
from models_dst import *
from utils import *


In [4]:
logger = logging.getLogger('BayesianScience')

logger.setLevel(logging.DEBUG)

# Kirchberger et al.

## Methodological issues

Kirchberger et al. 2015. _Are daylight saving time transitions associated with changes in myocardial infarction incidence? Results from the German MONICA/KORA Myocardial Infarction Registry_. BMC Public Health. 2015; 15: 778. [link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535383/)

Quotes (_Background_ section):

> A recent study from Croatia on 2,412 hospitalized AMI survivors confirmed the significant increase of AMI incidence for the first 4 workdays after spring transition with a particular excess on Monday [8]. In contrast to the Swedish results, the authors reported a significant increase after the autumn transition for the first four workdays with a peak on Tuesday and Thursday. However, this study did not include fatal AMI cases and consider meteorological variables as potential confounders.
> 
> A smaller study performed by Jiddou et al. [9] on 935 hospitalized U.S. AMI survivors finally found a significantly increased AMI incidence for the first day (Sunday) after spring transition but no significant effects in terms of the autumn shift. Limitations of this study refer to its small sample size, an exclusion of fatal AMI cases, the use of the time of hospital admission as AMI onset, and the lacking consideration of meteorological confounders.

---------------------

**Questions**

**Q1.** What are the odds that at least one of the candidate time windows (numbers of days after the transition) yields a p-value below 0.05 by chance alone?

The fact that every study uses a different time frame for its hypothesis (e.g. the 3, 4, 7, 14, or 28 days following the DST adjustment) suggests that these hypotheses were chosen after the fact rather than motivated by previous research.

> Overall, no significant changes of AMI risk during the first 3 days or 1 week after the transition to and from DST were found.
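To get a rough feel for Q1, here is a minimal simulation sketch. It is not the paper's method: it assumes a constant daily AMI rate of about 2.7 (25,499 cases over roughly 9,435 days), nested windows of 3/4/7/14/28 days pooled over 26 transitions, and an exact two-sided Poisson test per window. The function name and all parameters are hypothetical choices for illustration.

```python
import numpy as np
import scipy.stats as st


def prob_any_window_significant(windows=(3, 4, 7, 14, 28), daily_rate=2.7,
                                n_years=26, alpha=0.05, n_sim=20_000, seed=0):
    """Estimate P(at least one window has p < alpha) when there is no DST effect.

    Assumptions (not from the paper): constant Poisson rate, windows that all
    start on the day of the transition, counts pooled over n_years transitions.
    """
    rng = np.random.default_rng(seed)
    max_w = max(windows)
    # Pooled count for each day offset after the transition, under the null.
    daily = rng.poisson(n_years * daily_rate, size=(n_sim, max_w))
    # Nested windows share days, so cumulative sums give correlated tests.
    cumulative = daily.cumsum(axis=1)

    any_significant = np.zeros(n_sim, dtype=bool)
    for w in windows:
        observed = cumulative[:, w - 1]
        expected = n_years * daily_rate * w
        # Two-sided exact Poisson p-value: twice the smaller tail, capped at 1.
        lower_tail = st.poisson.cdf(observed, expected)
        upper_tail = st.poisson.sf(observed - 1, expected)
        p_value = np.minimum(1.0, 2 * np.minimum(lower_tail, upper_tail))
        any_significant |= p_value < alpha
    return any_significant.mean()


if __name__ == "__main__":
    print("one window  :", prob_any_window_significant(windows=(3,)))
    print("five windows:", prob_any_window_significant())
    print("independent upper bound:", 1 - 0.95 ** 5)
```

Because the windows overlap, the tests are positively correlated, so the simulated probability should fall between the single-test level (at most 0.05) and the bound for five independent tests, $1 - 0.95^5 \approx 0.23$.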

**Q2.** What are the odds that some particular variable sticks out like a sore thumb just by random chance? This depends on the size of the strata: the smaller the stratum, the more likely it is to produce an outlier.

> However, subgroup analyses on the spring transition revealed significantly increased risks for men in the first 3 days after transition (RR 1.155, 95 % CI 1.000–1.334) and for persons who **took angiotensine converting enzyme (ACE) inhibitors** prior to the AMI (3 days: RR 1.489, 95 % CI 1.151–1.927; 1 week: RR 1.297, 95 % CI 1.063–1.582). After the clock shift in autumn, patients with a prior infarction had an increased risk to have a re-infarction (3 days: RR 1.319, 95 % CI 1.029–1.691; 1 week: RR 1.270, 95 % CI 1.048–1.539).

(emphasis mine)
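The intuition behind Q2 can be sketched with a small simulation. The expected counts below (20, 200, 2000 events per stratum) and the number of subgroups are hypothetical, not taken from the paper; the point is only that, with no true effect, the observed/expected ratio scatters more widely in small strata, and that testing many subgroups inflates the chance of at least one "significant" result.

```python
import numpy as np


def null_rr_interval(expected_counts, n_sim=50_000, seed=0):
    """Central 95% range of observed/expected under the null, per stratum.

    Assumes Poisson counts with the given expected values (hypothetical).
    """
    rng = np.random.default_rng(seed)
    expected = np.asarray(expected_counts, dtype=float)
    # One column per stratum; each row is one simulated "study".
    observed = rng.poisson(expected, size=(n_sim, expected.size))
    ratio = observed / expected
    low, high = np.percentile(ratio, [2.5, 97.5], axis=0)
    return low, high


def family_wise_error(n_tests, alpha=0.05):
    """P(at least one false positive) for n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    sizes = [20, 200, 2000]  # hypothetical expected events per stratum
    low, high = null_rr_interval(sizes)
    for n, lo, hi in zip(sizes, low, high):
        print(f"expected={n:5d}  null RR range: {lo:.2f} to {hi:.2f}")
    for k in (1, 5, 10, 20):
        print(f"{k:2d} subgroup tests -> P(any p < 0.05) = {family_wise_error(k):.3f}")
```

Note the distinction this makes visible: a properly computed p-value already accounts for stratum size, so small strata mainly produce large-looking risk ratios, while the inflated false-positive rate comes from the number of subgroups tested.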

---------------------

**Data**:
- AMI count: 25,499 cases of AMI
- data source: MONICA/KORA Myocardial Infarction Registry ([link](https://www.helmholtz-muenchen.de/herzschlag-info/); public data should be published yearly according to [this website](http://www.gbe-bund.de/gbe10/abrechnung.prc_abr_test_logon?p_uid=gast&p_aid=0&p_knoten=FID&p_sprache=E&p_suchstring=7014), but I did not find a link to download the dataset) 
- time period: 1 January 1985 to 31 October 2010 (26 spring and 25 fall DST changes; the 2010 fall adjustment was on 31 October)
- ages: 25–74
- includes: coronary death and AMI
- location: city of Augsburg (Germany) and the two adjacent counties (about 600,000 inhabitants)
- additional variables: information on re-infarction, various medication prior to AMI, current occupation, history of hypertension, hyperlipidemia, diabetes, smoking, and obesity.
- confounders accounted for: global time trend, temperature, relative humidity, barometric pressure, and indicators for month of the year, weekday and holiday

> The final model included the following covariates: time trend and previous two day mean relative humidity as regression splines with four and two degrees of freedom, respectively, previous two day mean temperature as a linear term and day of the week as categorical variable.

> The optimized spring model [of the data from March and April, excluding the week in question] included time trend and same day mean relative humidity as regression splines with six and three degrees of freedom.

Six d.o.f. for two months of data is probably overfitting, even though the data were pooled over 26 years. However, it should not introduce a predictable bias, and overall its effect was likely negligible.

> The incidence rate ratio was assessed as observed over expected events per day and the mean per weekday and corresponding 95% confidence intervals were calculated.

However, it is not stated how the confidence intervals were calculated. It most likely wasn't a Bayesian _credible interval_ because they didn't specify the priors; so then the exact statistical test  confidence intervals would require