- To better learn patterns from phenotypes.
- To test associations between topics and genotypes and identify any novel associations.
Manuscript at: https://www.biorxiv.org/content/early/2018/05/31/335745
Summary — Identifying the clinical associations of genetic variants remains crucial in understanding how the human genome modulates disease risk. Traditional phenome-wide association studies consider each disease phenotype as an independent variable; however, diseases often present as complex clusters of comorbid conditions. In this study, we evaluated topic modeling to model electronic health record data as a mixture of topics (e.g., disease clusters) and tested associations between topics and genetic variants. Our results demonstrated the feasibility of using topic modeling to replicate and discover associations between the human genome and clinical diseases.
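The approach described in the summary can be illustrated with a minimal sketch. It is not the code from the notebook: it uses only the Python standard library, a hypothetical toy patient-by-phenotype matrix, and made-up genotype dosages. The factorization uses the standard multiplicative-update rules for non-negative matrix factorization, and a plain Pearson correlation stands in for whatever association test the actual analysis uses. All function and variable names here are assumptions chosen for illustration.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def nmf(v, n_topics, n_iter=500, seed=0, eps=1e-9):
    """Factor V (patients x codes) into W (patients x topics) and H (topics x codes)
    using multiplicative updates that keep every entry non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n)]
    return w, h


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: 6 patients x 4 phenotype codes (1 = code present).
# Patients 0-2 share one cluster of codes, patients 3-5 another.
patients_by_code = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical minor-allele dosages (0, 1, or 2) for one variant.
genotype = [1, 2, 1, 0, 0, 0]

w, h = nmf(patients_by_code, n_topics=2)

# Squared reconstruction error of W H against the input matrix.
recon = matmul(w, h)
err = sum((patients_by_code[i][j] - recon[i][j]) ** 2
          for i in range(len(recon)) for j in range(len(recon[0])))

# Correlate each patient's topic weight with genotype dosage, one topic at a time.
correlations = [pearson([row[k] for row in w], genotype) for k in range(2)]

print("reconstruction error:", err)
for k, r in enumerate(correlations):
    print("topic", k, "correlation with genotype:", r)
```

Each row of `w` gives one patient's weight on each topic, and each row of `h` shows which phenotype codes define that topic. A real analysis would likely rely on an established NMF implementation and a regression model with covariates rather than the raw correlation used here.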
The main script is in topic_model_visual_wordcloud.ipynb
Code for the topic modeling evaluation is in the sensitivity folder.
Please note: if you use this code for any academic or commercial purpose, please cite the paper: Zhao J, Feng Q, Wu P, Warner JL, Denny JC, Wei W-Q. Using Topic Modeling via Non-negative Matrix Factorization to Identify Relationships between Genetic Variants and Disease Phenotypes: A Case Study of Lipoprotein(a) (LPA). bioRxiv. 2018 May 31;335745.