Code for the paper:
Decomposing heterogeneity in disease progression speeds and pathways
Yuichiro Yada, Naoki Honda
npj Digital Medicine (2026) — https://www.nature.com/articles/s41746-026-02665-8
DiSPAH (Disease progression Speed and Pathway Analysis based on Hidden Markov model) is a machine learning framework that models individual patient disease progression using an individual-progression-speed continuous-time hidden Markov model (IPS-CTHMM). Applied to ALS (ALSFRS-R longitudinal scores), it simultaneously infers each patient's latent disease state trajectory and progression speed.
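To make the role of a per-patient speed concrete, the sketch below shows one common way a progression speed can enter a continuous-time HMM: as a positive scalar `speed` that rescales time in the transition-rate matrix `Q`, so that the state transition probabilities over an interval `dt` become `expm(speed * Q * dt)`. This parameterization, the function name `transition_probs`, and the 3-state matrix are illustrative assumptions and are not taken from `IPSCTHMM_model.py`; the paper gives the exact model.

```python
import numpy as np
from scipy.linalg import expm


def transition_probs(Q, dt, speed=1.0):
    """Transition probabilities over an interval dt for one patient.

    Assumption (for illustration only): the patient-specific speed
    multiplies the shared transition-rate matrix Q, which is the same
    as rescaling elapsed time by that factor.
    """
    Q = np.asarray(Q, dtype=float)
    return expm(speed * Q * dt)


# Hypothetical 3-state, left-to-right rate matrix (each row sums to zero;
# the last state is absorbing). The values are made up.
Q = np.array([
    [-0.2, 0.2, 0.0],
    [0.0, -0.1, 0.1],
    [0.0, 0.0, 0.0],
])

# Same 6-month interval, two hypothetical patients with different speeds.
P_slow = transition_probs(Q, dt=6.0, speed=0.5)
P_fast = transition_probs(Q, dt=6.0, speed=2.0)

print("P(stay in state 0), slow patient:", P_slow[0, 0])
print("P(stay in state 0), fast patient:", P_fast[0, 0])
```

Under this assumption, a patient with speed 2 observed for 3 months has the same transition probabilities as a patient with speed 1 observed for 6 months, which is what lets the speed be separated from the pathway through the states.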
| File | Description |
|---|---|
| `IPSCTHMM_model.py` | Core IPS-CTHMM model (EM algorithm, forward-backward) |
| `ALSdataread.py` | Data loading and preprocessing for AnswerALS and PRO-ACT dataset |
| `AnswerALS_DiSPAH.py` | Main DiSPAH analysis on AnswerALS cohort |
| `AnswerALS_DiSPAH_twostage.py` | Two-stage model fitting (population-wide parameters and individual speeds) |
| `AnswerALS_posthoc_analysis.py` | Speed/cluster associations with transcriptomics and proteomics |
| `PROACT_DiSPAH.py` | Cross-cohort validation on PRO-ACT |
| `AnswerALS_prediction.py` | Prediction of speed and pathway from baseline features |
| `IPSCTHMM_simulator.py` | Simulator for synthetic data experiments |
| `stratification.py` | Trajectory clustering (DTW + hierarchical clustering) |
| `num_state_selection.py` | Number of latent states selection via cross-validation |
| `compute_state_characteristics.py` | State characterization table (domain-level expectations) |
| `PROACT_first3_prediction.py` | PRO-ACT holdout prediction using first 3 visits |
| `ENCALS_prediction_restrictPred.py` | Survival prediction comparison with ENCALS score |
| `ENCALS_prediction_milestones_restrictPred.py` | Functional milestone prediction comparison with ENCALS |
| `survival_analysis.py` | Kaplan-Meier and Cox proportional hazard analyses (overall survival) |
| `survival_functional_milestone_analysis.py` | Milestone-based survival analysis by ALSFRS-R domain |
| `extract_discordant_state_sequences.py` | Extract latent state sequences for discordant patients |
| `speed_slope_discordant_analysis.py` | Identification and analysis of speed-slope discordant patients |
| `simdata_interval_exp.py` | Simulation experiment varying observation intervals |
| `replot_simdata_pathway_speed_recovery.py` | Plot speed/pathway recovery curves from simulation outputs |
| `recompute_speed_metrics_spearman_and_scatter.py` | Recompute speed recovery metrics from saved simulation outputs |
| `visualization_same_start_different_speed.py` | Visualize patient pairs with similar baseline but different speed |
| `plot_predcomparion.py` | Prediction comparison bar chart |
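The "DTW + hierarchical clustering" step listed for `stratification.py` can be illustrated with a minimal sketch. It is not the repository's implementation: it uses an exact quadratic-time DTW written in NumPy rather than the `fastdtw` approximation from the requirements, treats latent state indices as numbers with an absolute-difference local cost, and uses average linkage. The function names `dtw_distance` and `cluster_trajectories` and the toy trajectories are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Exact dynamic time warping distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i, j] = local + min(
                cost[i - 1, j],      # advance in a only
                cost[i, j - 1],      # advance in b only
                cost[i - 1, j - 1],  # advance in both
            )
    return cost[n, m]


def cluster_trajectories(trajectories, n_clusters):
    """Group sequences by pairwise DTW distance with average linkage."""
    k = len(trajectories)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(trajectories[i], trajectories[j])
            dist[i, j] = dist[j, i] = d
    # linkage expects the condensed (upper-triangular) distance vector.
    Z = linkage(squareform(dist), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Toy latent-state sequences of different lengths (made-up values).
trajectories = [
    [0, 0, 1, 2],
    [0, 1, 1, 2, 2],
    [0, 3, 4, 4],
    [0, 3, 3, 4],
]
labels = cluster_trajectories(trajectories, n_clusters=2)
print(labels)
```

DTW lets sequences of different lengths and different dwell times in each state be compared by the order of states visited, which is why it suits grouping patients by pathway rather than by speed.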
Model training and latent state/pathway estimation:

    python AnswerALS_DiSPAH.py

Speed and cluster associations with clinical features (sex, age, riluzole, mutations):

    python AnswerALS_posthoc_analysis.py

Overall survival analysis (KM curves, Cox regression, forest plots):

    python survival_analysis.py \
        --dispah_csv AnswerALS_covar_estimated_results.csv \
        --outdir survival_out

DiSPAH applied to PRO-ACT cohort:

    python PROACT_DiSPAH.py

Fig. 4 & 5: Association analysis with genetic information and omics data of patient-derived motor neurons.

Speed and cluster associations with clinical features (sex, age, riluzole, mutations):

    python AnswerALS_posthoc_analysis.py

Fig. 6: Prediction of progression speeds and pathways from information available at the first medical visit.

**Leave-one-out cross validation**

    python AnswerALS_prediction.py \
        --num_genes 0

Comparison with ENCALS score for survival and functional milestone prediction:

    python ENCALS_prediction_restrictPred.py \
        --out_dir encals_pred_out
    python ENCALS_prediction_milestones_restrictPred.py \
        --out_dir encals_milestone_out

Speed-slope discordant patient analysis (Supplementary Fig. 2 & 3):

    python speed_slope_discordant_analysis.py \
        --est_csv AnswerALS_covar_estimated_results.csv \
        --outdir discordant_out

Robustness of relative progression speed estimates to fixing the transition-rate matrix (Supplementary Fig. 4):

    python AnswerALS_DiSPAH_twostage.py

Patient pairs with similar baseline but different speeds (Supplementary Fig. 5):

    python visualization_same_start_different_speed.py \
        --est-results-csv AnswerALS_covar_estimated_results.csv

Simulation-based validation of patient-specific information estimation (Supplementary Fig. 7):

    python simdata_interval_exp.py \
        --outdir sim_interval_exp

ALSFRS-R domain-specific functional milestone analysis (Supplementary Fig. 8):

    python survival_functional_milestone_analysis.py \
        --dispah_csv AnswerALS_covar_estimated_results.csv \
        --outdir milestone_out

Holdout prediction on PRO-ACT using first 3 visits (speed-CTHMM vs uniform-CTHMM; Supplementary Fig. 10):

    python PROACT_first3_prediction.py

Characteristics of the estimated latent disease states (Supplementary Table 2):

    python compute_state_characteristics.py \
        --out-dir AnswerALS_DiSPAH

Sensitivity analysis for the standard deviation of the speed prior (Supplementary Table 4):

    python AnswerALS_DiSPAH_twostage.py

The remaining supplementary figures and tables are generated as byproducts of the code for the main figures and tables.

easydict==1.10
fastdtw==0.3.4
gseapy==1.0.5
jax==0.4.13
jaxlib==0.4.13
matplotlib==3.7.2
mygene==3.2.2
numpy==1.25.1
numpyro==0.12.1
pandas==2.0.3
scikit-learn==1.3.0
scipy==1.11.1
seaborn==0.12.2
statsmodels==0.14.0
Install with:

    pip install -r requirements.txt

This code is designed for the AnswerALS and PRO-ACT datasets. Both require separate data access applications.

@article{yada2026dispah,
title = {Decomposing heterogeneity in disease progression speeds and pathways},
author = {Yada, Yuichiro and Honda, Naoki},
journal = {npj Digital Medicine},
year = {2026},
url = {https://www.nature.com/articles/s41746-026-02665-8}
}