# Prediction of New Onset Atrial Fibrillation Using Routinely Reported 12-Lead ECG Variables and Electronic Health Data

## Table of Contents

- Background
- Research Question
- Data Dictionary
- Exploratory Data Analysis
- References

## Background

Atrial fibrillation (AF) is the most common irregular heart rhythm and is often described as a cardiovascular epidemic of the 21st century. The lifetime risk of developing AF after the age of 45 is estimated at roughly 1 in 3 *(Kornej et al., 2020; Linz et al., 2024)*. One of its most dangerous complications is the formation of blood clots in the heart, which can travel to the brain and cause a stroke. Individuals with AF are 4 to 5 times more likely to experience a stroke than those without it *(Kornej et al., 2020; Healey et al., 2012)*. Predicting who is likely to develop AF matters because it allows clinicians to start preventive treatment, such as blood-thinning medication, at an early stage.

Several risk scores for predicting AF have been developed with traditional statistical models, such as the C2HEST *(Li et al., 2019)* and CHARGE-AF *(Alonso et al., 2013)* scores. Their performance in validation datasets has been modest (C-index 0.59-0.73). In addition, some of these scores were derived in specific ethnic groups, which may limit how well they generalize.

Recently, machine learning (ML) algorithms have been explored for this task and have shown improved predictive performance. One study combined patient data, electronic health records (EHR), and cardiac MRI scans to predict AF over five years, reaching a C-index of 0.78 *(Dykstra et al., 2022)*. Another model, FIND-AF, was developed in the UK using routine EHR data to predict new cases of AF within six months, with a ROC-AUC of 0.82 *(Nadarajah et al., 2023)*. Both metrics measure discrimination, that is, how well the model ranks patients who develop AF above those who do not, rather than accuracy in the strict sense.
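
The C-index and ROC-AUC values quoted above can be read as the share of (case, non-case) pairs that a model ranks correctly. The following is a minimal sketch of that idea for a binary outcome, using made-up risks and outcomes; `concordance_binary` is a hypothetical helper name. It ignores censoring, so it is not the time-to-event C-index used in survival analyses.

```python
from itertools import product


def concordance_binary(scores, outcomes):
    """Fraction of (case, non-case) pairs where the case has the higher score.

    Ties in score count as half. For a binary outcome this equals the ROC-AUC.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            total += 1.0
        elif case_score == control_score:
            total += 0.5
    return total / (len(cases) * len(controls))


# Toy predicted risks and observed outcomes (illustrative values only).
risk = [0.9, 0.4, 0.35, 0.8, 0.1]
developed_af = [1, 0, 1, 0, 0]
print(concordance_binary(risk, developed_af))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from non-cases.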

There is still a need for better AF prediction models that work well across patient groups and clinical settings. **It is not yet clear whether adding ECG (heart rhythm test) data makes predictions more accurate than using EHR data alone.**

## Research Question

Can a **risk prediction model** be developed using a **large repository of synthetic patient health data**, including **12-lead ECG** and **electronic health record (EHR) variables** from patients in **Southern Alberta** with suspected or known **cardiovascular disease**, to accurately predict the **future occurrence of new-onset atrial fibrillation (AF)** for individual patients?

**Study cohort**: A synthetic dataset of about 100,000 patients without a history of AF. These patients underwent a baseline ECG between January 2010 and January 2023 and were followed for at least 12 months. Patients with current or past AF/flutter were excluded based on their baseline ECG and on records from continuous ECG monitoring (Holter), diagnostic codes (ICD-10-CA), or procedure codes related to AF/flutter treatment. The synthetic data were generated from a random sample of approximately 100,000 patients in the Cardiovascular Imaging Registry of Calgary (CIROC).

**Outcome of interest**: New-onset future AF/flutter detected by any follow-up ECG, continuous ambulatory ECG monitoring (Holter), ICD-10-CA code, or procedural code for AF/flutter intervention.

In [3]:
import pandas as pd

## Data Dictionary

In [5]:
data_dictionary_df = pd.read_csv('../data/data_dictionary.csv')
data_dictionary_df.head()

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
0,patient_id,System,Tracking,alpha_num,Randomly generated 9-digit alpha-numeric ident...,,,,,,...,,,,,,,,,,
1,demographics_age_index_ecg,Demographics,Age,numeric,Chronological age at time of referenced index ...,,,,,,...,,,,,,,,,,
2,demographics_birth_sex,Demographics,Sex,categorical,Sex assigned at birth,Male,Female,,,,...,,,,,,,,,,
3,hypertension_icd10,Cardiac Risk,Hypertension,boolean,ICD-10 coding of hypertension in either DAD or...,No,Yes,,,,...,,,,,,,,,,
4,diabetes_combined,Cardiac Risk,Diabetes,boolean,Documented presence of hyperglycaemic state in...,No,Yes,,,,...,,,,,,,,,,
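
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its row index and read back without `index_col`. The following is a minimal sketch, assuming the first CSV column is only that saved index; `load_data_dictionary` is a hypothetical helper, and a small inline CSV stands in for the real file.

```python
import io

import pandas as pd


def load_data_dictionary(source):
    """Read the data dictionary, treating the first CSV column as the row index.

    Assumption: the unnamed first column is only a saved pandas index.
    """
    return pd.read_csv(source, index_col=0)


# Toy stand-in for '../data/data_dictionary.csv' (illustrative rows only).
toy_csv = io.StringIO(
    ",Variable name,Section category,Variable type\n"
    "0,patient_id,System,alpha_num\n"
    "1,demographics_age_index_ecg,Demographics,numeric\n"
)
toy_dictionary = load_data_dictionary(toy_csv)
print(toy_dictionary.columns.tolist())
```

With the real file, the same helper would be called with the path used in the cell above.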


In [6]:
data_dictionary_df['Section category'].value_counts()

Section category
Medications                    54
Laboratory                     50
Disease - Non-CV               13
Prior cardiovascular events    11
Disease - CV                   10
ECG                             6
Future outcomes                 5
Prior procedures - CV           4
Cardiac Risk                    3
Devices - CV                    3
Demographics                    2
System                          1
Name: count, dtype: int64

**System:**

- `patient_id`: A randomly generated identifier with no predictive value of its own. It is still needed to detect duplicate records; once duplicates are addressed, it can be dropped from the feature set.

In [8]:
data_dictionary_df[data_dictionary_df['Section category'] == 'System']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
0,patient_id,System,Tracking,alpha_num,Randomly generated 9-digit alpha-numeric ident...,,,,,,...,,,,,,,,,,
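
The duplicate check described for `patient_id` could look like the sketch below. The patient-level table here is a toy with invented values, and keeping the first row per patient is an assumption; which record to keep is a study decision.

```python
import pandas as pd

# Toy patient-level table; values are illustrative, not from the study data.
patients = pd.DataFrame(
    {
        "patient_id": ["A1", "B2", "B2", "C3"],
        "demographics_age_index_ecg": [61, 47, 47, 70],
    }
)

# Count rows whose patient_id has already appeared earlier in the table.
n_duplicates = int(patients["patient_id"].duplicated().sum())

# Keep the first row per patient, then drop the identifier before modelling.
features = (
    patients.drop_duplicates(subset="patient_id", keep="first")
    .drop(columns="patient_id")
    .reset_index(drop=True)
)
print(n_duplicates, features.shape)
```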


**Demographics:**

- `demographics_age_index_ecg`: Strong predictor; AF risk increases significantly with age due to cumulative cardiovascular changes.
- `demographics_birth_sex`: Captures sex-specific differences in AF risk and outcomes; men have higher incidence, women may have worse outcomes.

In [10]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Demographics']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
1,demographics_age_index_ecg,Demographics,Age,numeric,Chronological age at time of referenced index ...,,,,,,...,,,,,,,,,,
2,demographics_birth_sex,Demographics,Sex,categorical,Sex assigned at birth,Male,Female,,,,...,,,,,,,,,,


**Cardiac Risk:**

- `hypertension_icd10`: Hypertension is a major modifiable risk factor for AF due to its role in promoting structural and electrical remodeling of the heart.
- `diabetes_combined`: Diabetes contributes to AF risk through systemic inflammation, oxidative stress, and cardiac remodeling.
- `dyslipidemia_combined`: Dyslipidemia indirectly affects AF risk via its contribution to atherosclerosis and cardiovascular disease.

In [12]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Cardiac Risk']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
3,hypertension_icd10,Cardiac Risk,Hypertension,boolean,ICD-10 coding of hypertension in either DAD or...,No,Yes,,,,...,,,,,,,,,,
4,diabetes_combined,Cardiac Risk,Diabetes,boolean,Documented presence of hyperglycaemic state in...,No,Yes,,,,...,,,,,,,,,,
5,dyslipidemia_combined,Cardiac Risk,Dyslipidemia,boolean,Documented presence of dyslipidemia (treated o...,No,Yes,,,,...,,,,,,,,,,


**Disease - CV:**

- `dcm_icd10`: Dilated cardiomyopathy increases AF risk due to structural heart changes and impaired ventricular function.
- `hcm_icd10`: Hypertrophic cardiomyopathy predisposes to AF through left atrial enlargement and fibrosis.
- `arvc_icd10`: Arrhythmogenic cardiomyopathy promotes AF via fibrofatty infiltration and arrhythmias in the right ventricle.
- `amyloid_cardiac_icd10`: Cardiac amyloidosis heightens AF risk due to atrial infiltration and stiffness.
- `myocarditis_icd10_prior`: Acute myocarditis can trigger AF through inflammation and myocardial scarring.
- `pericarditis_icd10_prior`: Acute pericarditis may increase AF risk via pericardial inflammation and atrial irritation.
- `bav_icd10`: Bicuspid aortic valve is linked to AF through associated valvular and hemodynamic abnormalities.
- `aortic_insufficiency_icd10`: Aortic insufficiency contributes to AF risk via volume overload and left atrial dilation.
- `aortic_dilation_icd10`: Aortic dilation/aneurysm raises AF risk through mechanical effects on adjacent cardiac structures.
- `aortic_dissection_icd10_prior`: Aortic dissection may indirectly increase AF risk via associated acute hemodynamic stress.

In [14]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Disease - CV']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
6,dcm_icd10,Disease - CV,Dilated cardiomyopathy,boolean,ICD-10 coded presence of dilated cardiomyopath...,No,Yes,,,,...,,,,,,,,,,
7,hcm_icd10,Disease - CV,Hypertrophic cardiomyopathy,boolean,ICD-10 coded presence of hypertrophic cardiomy...,No,Yes,,,,...,,,,,,,,,,
8,arvc_icd10,Disease - CV,Arrhythmogenic cardiomyopathy,boolean,ICD-10 coded presence of arrhythmogenic right ...,No,Yes,,,,...,,,,,,,,,,
9,amyloid_cardiac_icd10,Disease - CV,Cardiac amyloid,boolean,ICD-10 coded presence of cardiac amyloidosis a...,No,Yes,,,,...,,,,,,,,,,
10,myocarditis_icd10_prior,Disease - CV,Myocarditis - acute,boolean,ICD-10 coded presence of acute myocarditis at ...,No,Yes,,,,...,,,,,,,,,,
11,pericarditis_icd10_prior,Disease - CV,Pericarditis - acute,boolean,ICD-10 coded presence of acute pericarditis at...,No,Yes,,,,...,,,,,,,,,,
12,bav_icd10,Disease - CV,Bicuspid aortic valve,boolean,ICD-10 coded presence of bicuspid aortic valve...,No,Yes,,,,...,,,,,,,,,,
13,aortic_insufficiency_icd10,Disease - CV,Aortic insufficiency (?moderate),boolean,ICD-10 coded presence of aortic insufficiency ...,No,Yes,,,,...,,,,,,,,,,
14,aortic_dilation_icd10,Disease - CV,Aortic dilation / aneurysm,boolean,ICD-10 coded presence of aortic dilatation/ane...,No,Yes,,,,...,,,,,,,,,,
15,aortic_dissection_icd10_prior,Disease - CV,Aortic dissection,boolean,ICD-10 coded presence of aortic dissection at ...,No,Yes,,,,...,,,,,,,,,,


**Disease - Non-CV:**

- `amyloid_any_icd10`: Systemic amyloidosis can indirectly increase AF risk through multi-organ involvement, including cardiac effects.
- `copd_icd10`: COPD heightens AF risk by promoting hypoxia, systemic inflammation, and pulmonary hypertension.
- `hyperthyroid_icd10`: Hyperthyroidism is a direct trigger for AF due to increased metabolic rate and atrial excitability.
- `hypothyroid_icd10`: Hypothyroidism can contribute to AF risk via systemic effects like hypertension and diastolic dysfunction.
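- `obstructive_sleep_apnea_icd10`: Obstructive sleep apnea is strongly associated with AF due to intermittent hypoxia, atrial remodeling, and autonomic dysfunction. Note that the data dictionary lists this variable as `obstructive _sleep_apnea_icd10`, with a stray space after `obstructive`.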
- `pulmonary_htn_icd10`: Pulmonary hypertension increases AF risk by causing right atrial enlargement and strain.
- `rheumatoid_arthritis_icd10`: RA raises AF risk through chronic systemic inflammation and potential cardiovascular involvement.
- `sle_icd10`: SLE predisposes to AF due to systemic inflammation, autoimmune damage, and thrombotic risks.
- `ckd_icd10`: Chronic kidney disease promotes AF risk via uremic toxins, systemic inflammation, and electrolyte imbalances.
- `pad_icd10`: Peripheral arterial disease is a marker of systemic atherosclerosis, indirectly linked to AF risk.
- `sarcoid_icd10`: Sarcoidosis contributes to AF risk through granulomatous infiltration and potential cardiac involvement.
- `chronic_liver_disease_icd10`: Liver disease increases AF risk via systemic inflammation, coagulopathy, and hemodynamic changes.
- `cancer_any_icd10`: Cancer elevates AF risk through pro-thrombotic states, systemic inflammation, and potential cardiac involvement from treatments.

In [30]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Disease - Non-CV']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
16,amyloid_any_icd10,Disease - Non-CV,Amyloidosis,boolean,ICD-10 coded presence of amyloidosis at any ti...,No,Yes,,,,...,,,,,,,,,,
17,copd_icd10,Disease - Non-CV,COPD,boolean,ICD-10 coded presence of chronic obstructive a...,No,Yes,,,,...,,,,,,,,,,
18,hyperthyroid_icd10,Disease - Non-CV,Hyperthyroidism,boolean,ICD-10 coded presence of hyperthyroidism at an...,No,Yes,,,,...,,,,,,,,,,
19,hypothyroid_icd10,Disease - Non-CV,Hypothyroidism,boolean,ICD-10 coded presence of hypothyroidism at any...,No,Yes,,,,...,,,,,,,,,,
20,obstructive _sleep_apnea_icd10,Disease - Non-CV,Obstructive Sleep Apnea,boolean,ICD-10 coded presence of obstructive sleep apn...,No,Yes,,,,...,,,,,,,,,,
21,pulmonary_htn_icd10,Disease - Non-CV,Pulmonary hypertension,boolean,ICD-10 coded presence of pulmonary hypertensio...,No,Yes,,,,...,,,,,,,,,,
22,rheumatoid_arthritis_icd10,Disease - Non-CV,Rheumatoid arthritis,boolean,ICD-10 coded presence of rheumatoid arthritis ...,No,Yes,,,,...,,,,,,,,,,
23,sle_icd10,Disease - Non-CV,Systemic Lupus Erthythamatosis,boolean,ICD-10 coded presence of systemic lupus erythe...,No,Yes,,,,...,,,,,,,,,,
24,ckd_icd10,Disease - Non-CV,Chronic kidney disease,boolean,ICD-10 coded presence of chronic kidney diseas...,No,Yes,,,,...,,,,,,,,,,
25,pad_icd10,Disease - Non-CV,peripheral artery disease,boolean,ICD-10 coded presence of peripheral arterial d...,No,Yes,,,,...,,,,,,,,,,
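
Row 20 in the output above shows a variable name containing a space (`obstructive _sleep_apnea_icd10`). A minimal sketch for flagging such names follows, using a toy excerpt of the `Variable name` column. Whether the patient-level dataset uses the same spelling is not known here, so the cleaned names are only candidates.

```python
import pandas as pd

# Toy excerpt of the 'Variable name' column, including the name shown above
# that carries a stray space.
variable_names = pd.Series(
    ["copd_icd10", "obstructive _sleep_apnea_icd10", "pulmonary_htn_icd10"],
    name="Variable name",
)

# Flag any name that contains whitespace anywhere.
has_whitespace = variable_names.str.contains(r"\s", regex=True)
flagged = variable_names[has_whitespace].tolist()

# Candidate cleaned names; assumption: whitespace is never meaningful here.
cleaned = variable_names.str.replace(r"\s+", "", regex=True)
print(flagged, cleaned.tolist())
```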


**Prior cardiovascular events:**

- `event_cv_HF_admission_icd10_prior`: Heart failure hospitalization is a strong predictor of AF due to atrial remodeling and elevated atrial pressures.
- `event_cv_cad_acs_any_icd10_prior`: Acute coronary syndromes signal increased AF risk through ischemia-induced atrial dysfunction.
- `event_cv_cad_acs_acute_mi_icd10_prior`: Myocardial infarction directly elevates AF risk via cardiac ischemia and scar-related electrical remodeling.
- `event_cv_cad_acs_unstable_angina_icd10_prior`: Unstable angina increases AF risk as a precursor to ischemic damage and atrial strain.
- `event_cv_cad_acs_other_icd10_prior`: Other acute coronary syndromes contribute to AF risk through similar ischemic and systemic effects.
- `event_cv_ep_vt_any_icd10_prior`: Ventricular tachycardia reflects electrical instability and structural heart disease, which are associated with AF.
- `event_cv_ep_sca_survived_icd10_cci_prior`: Surviving cardiac arrest indicates severe cardiovascular dysfunction, increasing AF susceptibility.
- `event_cv_cns_stroke_any_icd10_prior`: Stroke, both ischemic and hemorrhagic, is linked to systemic risk factors and shared mechanisms for AF.
- `event_cv_cns_stroke_ischemic_icd10_prior`: Ischemic stroke reflects embolic and cardiovascular risk factors strongly tied to AF.
- `event_cv_cns_stroke_hemorrh_icd10_prior`: Hemorrhagic stroke can be linked to systemic risk factors that overlap with AF etiology.
- `event_cv_cns_TIA_icd10_prior`: TIA indicates transient ischemic episodes often associated with embolic risk factors and latent AF.

In [41]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Prior cardiovascular events']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
29,event_cv_HF_admission_icd10_prior,Prior cardiovascular events,Heart failure admission,boolean,Heart failure hospitalization by ICD-10 coding...,No,Yes,,,,...,,,,,,,,,,
30,event_cv_cad_acs_any_icd10_prior,Prior cardiovascular events,Acute coronary syndrome,boolean,Any acute coronary syndrome (acute myocardial ...,No,Yes,,,,...,,,,,,,,,,
31,event_cv_cad_acs_acute_mi_icd10_prior,Prior cardiovascular events,Acute coronary syndrome,boolean,Acute myocardial infarction by ICD-10 coding p...,No,Yes,,,,...,,,,,,,,,,
32,event_cv_cad_acs_unstable_angina_icd10_prior,Prior cardiovascular events,Acute coronary syndrome,boolean,Unstable angina by ICD-10 coding prior to inde...,No,Yes,,,,...,,,,,,,,,,
33,event_cv_cad_acs_other_icd10_prior,Prior cardiovascular events,Acute coronary syndrome,boolean,Other acute coronary syndrome by ICD-10 coding...,No,Yes,,,,...,,,,,,,,,,
34,event_cv_ep_vt_any_icd10_prior,Prior cardiovascular events,Ventricular tachycardia,boolean,Ventricular tachycardia by ICD-10 coding prior...,No,Yes,,,,...,,,,,,,,,,
35,event_cv_ep_sca_survived_icd10_cci_prior,Prior cardiovascular events,Survived sudden cardiac arrest,boolean,Survived cardiac arrest by ICD-10 coding prior...,No,Yes,,,,...,,,,,,,,,,
36,event_cv_cns_stroke_any_icd10_prior,Prior cardiovascular events,Stroke,boolean,Any acute stroke (ischemic or hemorrhagic) by ...,No,Yes,,,,...,,,,,,,,,,
37,event_cv_cns_stroke_ischemic_icd10_prior,Prior cardiovascular events,Stroke,boolean,Acute ischemic stroke by ICD-10 coding prior t...,No,Yes,,,,...,,,,,,,,,,
38,event_cv_cns_stroke_hemorrh_icd10_prior,Prior cardiovascular events,Stroke,boolean,Acute hemorrhagic stroke by ICD-10 coding prio...,No,Yes,,,,...,,,,,,,,,,


**Prior procedures - CV:**

- `pci_prior`: Prior PCI indicates coronary artery disease, which is a significant risk factor for AF due to ischemic and structural heart changes.
- `cabg_prior`: CABG reflects severe coronary artery disease, increasing AF risk via postoperative atrial remodeling and inflammation.
- `transplant_heart_cci_prior`: Heart transplantation indicates advanced cardiac disease, with AF risk linked to surgical and immunological factors.
- `lvad_cci_prior`: LVAD implantation reflects end-stage heart failure, strongly associated with AF due to atrial strain and hemodynamic changes.

In [44]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Prior procedures - CV']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
40,pci_prior,Prior procedures - CV,Percutaneous coronary intervention (PCI),boolean,Prior PCI by CCI coding or from APPROACH datab...,No,Yes,,,,...,,,,,,,,,,
41,cabg_prior,Prior procedures - CV,Coronary artery bypass grafting (CABG),boolean,Prior coronary artery bypass grafting by CCI c...,No,Yes,,,,...,,,,,,,,,,
42,transplant_heart_cci_prior,Prior procedures - CV,Heart transplantation,boolean,Prior cardiac transplantation by CCI coding at...,No,Yes,,,,...,,,,,,,,,,
43,lvad_cci_prior,Prior procedures - CV,LVAD implantation,boolean,Prior left ventricular assist device (LVAD) im...,No,Yes,,,,...,,,,,,,,,,


**Devices - CV:**

- `pacemaker_permanent_cci_prior`: Permanent pacemaker implantation reflects underlying conduction system disease, which is associated with increased AF risk.
- `crt_cci_prior`: Cardiac resynchronization therapy (CRT) is linked to advanced heart failure, a condition strongly associated with AF due to atrial remodeling and strain.
- `icd_cci_prior`: Implantable cardioverter defibrillator (ICD) implantation indicates severe arrhythmias or structural heart disease, both of which significantly increase AF risk.

In [47]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Devices - CV']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
44,pacemaker_permanent_cci_prior,Devices - CV,Permanent pacemaker,boolean,Prior permanent pacemaker implantation by CCI ...,No,Yes,,,,...,,,,,,,,,,
45,crt_cci_prior,Devices - CV,Permanent pacemaker,boolean,Prior cardiac resynchronization therapy (CRT) ...,No,Yes,,,,...,,,,,,,,,,
46,icd_cci_prior,Devices - CV,Implantable cardioverter defibrillator,boolean,Prior internal cardioverter defibrillator (ICD...,No,Yes,,,,...,,,,,,,,,,


**ECG:**

- `ecg_hr`: Heart rate provides insights into baseline cardiac function and autonomic tone, which are important for identifying AF risk.
- `ecg_pr`: PR interval reflects atrioventricular conduction; prolongation or shortening may indicate structural or electrical abnormalities linked to AF.
- `ecg_qrs`: QRS duration identifies ventricular conduction delays, which can signify underlying cardiac disease associated with AF.
- `ecg_qtc`: Corrected QT interval is important for identifying repolarization abnormalities that may predispose to arrhythmias, including AF.
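- `ecg_rhythm`: Heart rhythm on the index ECG. The category list includes `afib` and `aflutter`, which should not occur in this cohort because patients with AF/flutter on the baseline ECG were excluded; other non-sinus rhythms may signal a higher risk of future AF.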
- `ecg_qrs_morphology`: QRS morphology reflects conduction abnormalities like bundle branch blocks, which can indicate structural heart issues increasing AF risk.

In [50]:
data_dictionary_df[data_dictionary_df['Section category'] == 'ECG']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
47,ecg_hr,ECG,12 lead ECG,numeric,Heart rate from index 12 lead ECG,,,,,,...,,,,,,,,,,
48,ecg_pr,ECG,12 lead ECG,numeric,PR interval duration from index 12 lead ECG,,,,,,...,,,,,,,,,,
49,ecg_qrs,ECG,12 lead ECG,numeric,QRS complex duration from index 12 lead ECG,,,,,,...,,,,,,,,,,
50,ecg_qtc,ECG,12 lead ECG,numeric,Corrected QT interval from index 12 lead ECG,,,,,,...,,,,,,,,,,
51,ecg_rhythm,ECG,12 lead ECG,categorical,Heart rhythm from index 12 lead ECG,sinus,afib,aflutter,bigemini,wide complex rhythm,...,,,,,,,,,,
52,ecg_qrs_morphology,ECG,12 lead ECG,categorical,QRS morphology from index 12 lead ECG,LBBB,RBBB,LAFB,LPFB,Bifascicular block,...,,,,,,,,,,
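
The `cat_*` columns hold the allowed labels for each categorical variable, as the `ecg_rhythm` and `ecg_qrs_morphology` rows above show. The sketch below collects those labels for one variable. The toy dictionary includes only three `cat_*` columns, and `category_levels` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy excerpt of the dictionary; only the first three cat_* columns are shown.
toy_dictionary = pd.DataFrame(
    {
        "Variable name": ["ecg_hr", "ecg_rhythm", "demographics_birth_sex"],
        "Variable type": ["numeric", "categorical", "categorical"],
        "cat_1": [np.nan, "sinus", "Male"],
        "cat_2": [np.nan, "afib", "Female"],
        "cat_3": [np.nan, "aflutter", np.nan],
    }
)


def category_levels(dictionary, variable):
    """Return the non-missing cat_* labels for one variable, in column order."""
    row = dictionary.loc[dictionary["Variable name"] == variable]
    if row.empty:
        raise KeyError(variable)
    cat_columns = [c for c in dictionary.columns if c.startswith("cat_")]
    return row.iloc[0][cat_columns].dropna().tolist()


print(category_levels(toy_dictionary, "ecg_rhythm"))
```

The returned list could then be passed as `categories` to `pd.Categorical` so that unexpected labels in the patient-level data become missing values instead of silently creating new levels.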


**Laboratory:**

- `hgb_peri` & `hct_peri`: Low hemoglobin and hematocrit indicate anemia, which can contribute to AF via increased cardiac workload and hypoxia.
- `rdw_peri`: Elevated RDW reflects variability in red cell size, linked to systemic inflammation and higher AF risk.
- `wbc_peri`: Elevated WBC counts indicate inflammation, a known contributor to AF pathogenesis.
- `plt_peri`: Platelet levels may impact clotting dynamics, relevant to AF-related thromboembolism.
- `inr_peri` & `ptt_peri`: Coagulation measures (INR, PTT) are important for understanding clotting status in patients at risk for AF complications.
- `esr_peri` & `crp_high_sensitive_peri`: Elevated ESR or CRP signals systemic inflammation, a key factor in AF development.
- `albumin_peri`: Low albumin is associated with poor overall health and increased AF risk.
- `alkaline_phophatase_peri`, `alanine_transaminase_peri`, `aspartate_transaminase_peri`, `bilirubin_total_peri`, `bilirubin_direct_peri`, `bilirubin_indirect_peri`: Liver function markers are relevant for systemic metabolic health, indirectly impacting AF risk.
- `urea_peri`, `creatinine_peri`, `uric_acid_peri`, `urine_alb_cr_ratio_peri`: Kidney function markers highlight metabolic stress and fluid balance issues, which influence AF.
- `sodium_peri`, `potassium_peri`, `chloride_peri`: Electrolyte imbalances directly affect cardiac electrophysiology and AF risk.
- `ck_peri`, `troponin_i_peri_closest`, `troponin_i_peri_highest`, `troponin_t_hs_peri_closest`, `troponin_t_hs_peri_highest`: Creatine kinase and troponins are markers of myocardial stress or injury, which is strongly linked to AF risk.
- `NTproBNP_peri_closest`, `NTproBNP_peri_highest`: NT-proBNP levels reflect cardiac strain, a strong predictor of AF.
- `glucose_fasting_peri_closest`, `glucose_fasting_peri_highest`, `glucose_postprandial_peri_closest`, `glucose_postprandial_peri_highest`, `glucose_random_peri_closest`, `glucose_random_peri_highest`, `hga1c_peri_closest`, `hga1c_peri_highest`: Glucose and HbA1c levels identify diabetes and metabolic dysfunction, major AF risk factors.
- `tchol_peri_closest`, `tchol_peri_highest`, `ldl_peri_closest`, `ldl_peri_highest`, `hdl_peri_closest`, `hdl_peri_lowest`, `tg_peri_closest`, `tg_peri_highest`: Lipid levels signal cardiovascular risk and systemic inflammation, impacting AF development.
- `iron_peri`, `tibc_peri`, `ferritin_peri`: Iron studies reveal anemia or overload, both of which can influence AF via systemic effects.
- `tsh_peri`: Abnormal TSH levels indicate thyroid dysfunction, a well-known trigger for AF.

In [53]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Laboratory']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
53,hgb_peri,Laboratory,CBC,numeric,Closest hemoglobin within 1 year prior to 6 mo...,,,,,,...,,,,,,,,,,
54,hct_peri,Laboratory,CBC,numeric,Closest hematocrit within 1 year prior to 6 mo...,,,,,,...,,,,,,,,,,
55,rdw_peri,Laboratory,CBC,numeric,Closest red cell distributiomn width (RDW) wit...,,,,,,...,,,,,,,,,,
56,wbc_peri,Laboratory,CBC,numeric,Closest white blood cell count within 1 year p...,,,,,,...,,,,,,,,,,
57,plt_peri,Laboratory,CBC,numeric,Closest platelet count within 1 year prior to ...,,,,,,...,,,,,,,,,,
58,inr_peri,Laboratory,Coagulation,numeric,Closest international normalized ratio (INR) w...,,,,,,...,,,,,,,,,,
59,ptt_peri,Laboratory,Coagulation,numeric,Closest partial thromboplastin time (PTT) with...,,,,,,...,,,,,,,,,,
60,esr_peri,Laboratory,Inflammatory markers,numeric,Closest erhythrocyte sedimentation rate (ESR) ...,,,,,,...,,,,,,,,,,
61,crp_high_sensitive_peri,Laboratory,Inflammatory markers,numeric,Closest high sensitivity C reactive protein (C...,,,,,,...,,,,,,,,,,
62,albumin_peri,Laboratory,Liver function,numeric,Closest albumin within 1 year prior to 6 month...,,,,,,...,,,,,,,,,,


**Medications:**

**Anti-Platelet/Anti-Coagulant Medications**

- `anti_platelet_oral_asa_peri` & `anti_platelet_oral_non_asa_any_peri`: Aspirin and non-ASA anti-platelets reduce thrombotic risk but may indicate a history of cardiovascular events, indirectly linked to AF risk.
- `anti_coagulant_oral_any_peri`: Use of anticoagulants suggests a high-risk thromboembolic profile, often coexisting with AF.

**Anti-Anginal Medications**
- `nitrates_any_peri` to `nitrates_dinitrates_peri` & `ranolazine_peri`: Use reflects ischemic heart disease, which is a major AF risk factor due to myocardial stress.

**ACE/ARB/Entresto Medications**
- `arb_peri`, `acei_peri`, `arni_entresto_peri`: These medications indicate hypertension or heart failure, both strongly associated with AF.

**Beta Blockers and Heart Failure Medications**
- `beta_blocker_any_peri`, `ivabradine_peri`: These suggest underlying cardiac arrhythmias or heart failure, which contribute to AF risk.

**Calcium Channel Blockers (CCBs)**
- `ccb_any_peri`, `ccb_dihydro_peri`, `ccb_non_dihydro_peri`: CCB use reflects hypertension or arrhythmias, both relevant to AF development.

**Diuretics**
- `diuretic_loop_peri` to `diuretic_vasopressin_antagonist_peri`: Diuretic use suggests fluid overload or heart failure, contributing to atrial stretch and AF risk.

**Anti-Arrhythmic Medications**
- `anti_arrhythmic_any_peri` to `anti_arrhythmic_disopyramide_peri`: Use reflects prior arrhythmias, directly linked to recurrent or new-onset AF.

**Digoxin and Myosin Inhibitors**
- `digoxin_peri`: Indicates rate control for arrhythmias, commonly used in AF management.
- `mavacamten_peri`: Suggests underlying hypertrophic cardiomyopathy, which increases AF risk.

**Amyloid Therapeutics**
- `amyloid_therapeutics_any_peri` to `amyloid_therapeutics_inotersen_peri`: Reflects treatment for systemic amyloidosis, a condition that can lead to atrial remodeling and AF.

**Lipid-Lowering Therapies**
- `lipid_any_peri` to `lipid_other_peri`: Use of lipid-lowering medication signals dyslipidemia or atherosclerotic cardiovascular disease, which is indirectly linked to AF risk.

**Glucose-Lowering Therapies**
- `glucose_lowering_any_peri` to `glucose_ohg_other_peri`: Use indicates diabetes, a significant metabolic risk factor for AF.

**Smoking Cessation Medications**
- `smoking_cessation_oral_peri`, `smoking_cessation_nicotine_replacement_peri`: Use indicates current or recent smoking, which is associated with cardiovascular inflammation and heightened AF risk.
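
Several bullets above name a block of variables as "`first` to `last`", which relies on the row order of the data dictionary. The sketch below expands such a range, using a toy excerpt of the `Variable name` column; `expand_range` is a hypothetical helper name.

```python
import pandas as pd

# Toy excerpt of the 'Variable name' column in dictionary order.
variable_names = pd.Series(
    [
        "anti_coagulant_oral_any_peri",
        "nitrates_any_peri",
        "nitrates_trinitrates_long_acting_peri",
        "nitrates_trinitrates_short_acting_peri",
        "nitrates_mononitrates_peri",
        "nitrates_dinitrates_peri",
        "ranolazine_peri",
    ]
)


def expand_range(names, first, last):
    """Return every name from `first` to `last` inclusive, in dictionary order."""
    ordered = names.tolist()
    start, stop = ordered.index(first), ordered.index(last)
    if start > stop:
        raise ValueError(f"{first!r} comes after {last!r}")
    return ordered[start : stop + 1]


nitrate_variables = expand_range(
    variable_names, "nitrates_any_peri", "nitrates_dinitrates_peri"
)
print(nitrate_variables)
```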

In [56]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Medications']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
103,anti_platelet_oral_asa_peri,Medications,ASA,boolean,Aspirin use within 90 days prior to or after d...,No,Yes,,,,...,,,,,,,,,,
104,anti_platelet_oral_non_asa_any_peri,Medications,Anti-platelet (non-ASA),boolean,Non-aspirin anti-platelet medication use withi...,No,Yes,,,,...,,,,,,,,,,
105,anti_coagulant_oral_any_peri,Medications,Oral anti-coagulant,boolean,Oral anticoagulant medication use within 90 da...,No,Yes,,,,...,,,,,,,,,,
106,nitrates_any_peri,Medications,Anti-anginal,boolean,"Any nitrate medication use (mononitrate, dinit...",No,Yes,,,,...,,,,,,,,,,
107,nitrates_trinitrates_long_acting_peri,Medications,Anti-anginal,boolean,Trinitrate medication use (patch or ointment) ...,No,Yes,,,,...,,,,,,,,,,
108,nitrates_trinitrates_short_acting_peri,Medications,Anti-anginal,boolean,Trinitrate medication use (sublingual spray or...,No,Yes,,,,...,,,,,,,,,,
109,nitrates_mononitrates_peri,Medications,Anti-anginal,boolean,Mononitrate medication use within 90 days prio...,No,Yes,,,,...,,,,,,,,,,
110,nitrates_dinitrates_peri,Medications,Anti-anginal,boolean,Dinitrate medication use within 90 days prior ...,No,Yes,,,,...,,,,,,,,,,
111,ranolazine_peri,Medications,Anti-anginal,boolean,Ranolazine medication use within 90 days prior...,No,Yes,,,,...,,,,,,,,,,
112,arb_peri,Medications,ACE/ARB/Entresto,boolean,Angiotensin Receptor Blocker (ARB) medication ...,No,Yes,,,,...,,,,,,,,,,


**Future outcomes:**

- `event_all_death`: Indicates overall patient mortality, a critical outcome to assess the broader implications of AF and related interventions.
- `time_to_all_death`: Provides a temporal context for survival analysis, helping assess the timing of mortality relative to AF onset or other predictors.
- `event_cv_ep_afib_aflutter_new_icd10_post`: Primary outcome variable; directly identifies the occurrence of new-onset AF/flutter, essential for model predictions.
- `time_to_event_cv_ep_afib_aflutter_new_icd10_post`: Adds granularity by measuring the time until AF/flutter onset, useful for survival analysis and model calibration.
- `time_last_seen`: Tracks patient follow-up duration, ensuring accurate interpretation of non-events and censoring in time-to-event analyses.

In [62]:
data_dictionary_df[data_dictionary_df['Section category'] == 'Future outcomes']

Unnamed: 0,Variable name,Section category,Variable category,Variable type,Definition,cat_1,cat_2,cat_3,cat_4,cat_5,...,cat_12,cat_13,cat_14,cat_14.1,cat_15,cat_16,cat_17,cat_18,cat_19,cat_20
157,event_all_death,Future outcomes,All cause death,boolean,All cause death from Vital statistics Alberta ...,No,Yes,,,,...,,,,,,,,,,
158,time_to_all_death,Future outcomes,Time to all cause death,numeric,Time from index 12 lead ECG to all cause death...,,,,,,...,,,,,,,,,,
159,event_cv_ep_afib_aflutter_new_icd10_post,Future outcomes,New onset afib/flutter,boolean,New-onset atrial fibrillation/flutter by any o...,No,Yes,,,,...,,,,,,,,,,
160,time_to_event_cv_ep_afib_aflutter_new_icd10_post,Future outcomes,Time to new-onset afib/flutter,numeric,Time from index 12 lead ECG to new onset atria...,,,,,,...,,,,,,,,,,
161,time_last_seen,Future outcomes,Time last seen,numeric,Time from index 12 lead ECG to when the patien...,,,,,,...,,,,,,,,,,
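
The outcome and follow-up variables above can be combined into a fixed-horizon label that respects censoring. The sketch below makes several assumptions: times are in days, events are coded `"Yes"`/`"No"` as in the dictionary's category columns, and death is treated like any other end of follow-up. The toy rows are invented and `label_within_horizon` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

EVENT = "event_cv_ep_afib_aflutter_new_icd10_post"
TIME_TO_EVENT = "time_to_event_cv_ep_afib_aflutter_new_icd10_post"
LAST_SEEN = "time_last_seen"

# Toy rows; assumption: times are in days and events are coded "Yes"/"No".
cohort = pd.DataFrame(
    {
        EVENT: ["Yes", "Yes", "No", "No"],
        TIME_TO_EVENT: [200.0, 900.0, np.nan, np.nan],
        LAST_SEEN: [1500.0, 1200.0, 800.0, 300.0],
    }
)


def label_within_horizon(df, horizon):
    """1 = AF/flutter by `horizon`, 0 = event-free through it, NaN = censored earlier."""
    had_event = df[EVENT].eq("Yes")
    event_in_window = had_event & df[TIME_TO_EVENT].le(horizon)
    # Event-free through the horizon: either the event came later, or the
    # patient was still under follow-up at the horizon without an event.
    event_after = had_event & df[TIME_TO_EVENT].gt(horizon)
    followed_through = ~had_event & df[LAST_SEEN].ge(horizon)
    label = pd.Series(np.nan, index=df.index, dtype="float64")
    label[event_in_window] = 1.0
    label[event_after | followed_through] = 0.0
    return label


labels = label_within_horizon(cohort, horizon=365)
print(labels.tolist())
```

Rows left as missing are patients whose follow-up ended before the horizon without an event; a time-to-event model would use them directly, whereas a binary classifier would need an explicit rule for them.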


## Exploratory Data Analysis

## References

Alonso, A., et al. (2013). Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: The CHARGE‐AF consortium. *Journal of the American Heart Association, 2*(1).

Dykstra, S., et al. (2022). Machine learning prediction of atrial fibrillation in cardiovascular patients using cardiac magnetic resonance and electronic health information. *Frontiers in Cardiovascular Medicine, 9*.

Healey, J. S., et al. (2012). Subclinical atrial fibrillation and the risk of stroke. *New England Journal of Medicine, 366*(2), 120–129.

Kornej, J., Börschel, C. S., Benjamin, E. J., & Schnabel, R. B. (2020). Epidemiology of atrial fibrillation in the 21st century. *Circulation Research, 127*(1), 4–20.

Li, Y.-G., et al. (2019). A simple clinical risk score (C2HEST) for predicting incident atrial fibrillation in Asian subjects: Derivation in 471,446 Chinese subjects, with internal validation and external application in 451,199 Korean subjects. *Chest, 155*(3), 510–518.

Linz, D., et al. (2024). Atrial fibrillation: Epidemiology, screening and digital health. *The Lancet Regional Health – Europe, 37*, Article 100786.

Nadarajah, R., et al. (2023). Prediction of short-term atrial fibrillation risk using primary care electronic health records. *Heart, 109*(12), 1072–1079.