Topic modeling on phenotypes

  • To better learn patterns from phenotypes.
  • Test associations between topics and genotypes to identify novel associations.

Manuscript at: https://www.biorxiv.org/content/early/2018/05/31/335745

Summary — Identifying the clinical associations of genetic variants remains crucial in understanding how the human genome modulates disease risk. Traditional phenome-wide association studies consider each disease phenotype as an independent variable; however, diseases often present as complex clusters of comorbid conditions. In this study, we evaluated topic modeling to model electronic health record data as a mixture of topics (e.g., disease clusters) and tested associations between topics and genetic variants. Our results demonstrated the feasibility of using topic modeling to replicate and discover associations between the human genome and clinical diseases.
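
To make the idea in the summary concrete, here is a minimal sketch on synthetic data. It is not the code from this repository. It factorizes a patient-by-phenotype count matrix with non-negative matrix factorization (NMF), then checks how strongly each topic's patient weights track a genotype. The function names `nmf` and `topic_genotype_correlation`, the matrix sizes, and the simulated genotype effect are all assumptions made for illustration. The plain NumPy multiplicative-update NMF and the Pearson correlation are simple stand-ins for whatever NMF implementation and association test the notebook and manuscript actually use.

```python
import numpy as np


def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate X (patients x phenotypes) as W @ H with non-negative factors.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    W holds per-patient topic weights; H holds per-topic phenotype loadings.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, n_topics)) + eps
    H = rng.random((n_topics, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_genotype_correlation(W, genotype):
    """Pearson correlation between each topic's patient weights and a genotype."""
    return np.array(
        [np.corrcoef(W[:, k], genotype)[0, 1] for k in range(W.shape[1])]
    )


# Synthetic data (hypothetical): 200 patients, 12 phenotype codes.
rng = np.random.default_rng(42)
n_patients, n_phenotypes = 200, 12
genotype = rng.integers(0, 3, size=n_patients)  # allele count: 0, 1, or 2

# Background phenotype counts, unrelated to genotype.
X = rng.poisson(0.3, size=(n_patients, n_phenotypes)).astype(float)
# Simulated effect: the first four codes occur more often with each extra allele.
X[:, :4] += rng.poisson(1.5 * genotype[:, None], size=(n_patients, 4))

W, H = nmf(X, n_topics=3)
r = topic_genotype_correlation(W, genotype)
best = int(np.argmax(np.abs(r)))

print("topic-genotype correlations:", np.round(r, 2))
print("most associated topic:", best)
print("its top phenotype codes:", np.argsort(H[best])[::-1][:4])
```

In a real analysis the rows of `X` would come from electronic health record phenotype codes, and the association step would normally be a regression adjusted for covariates rather than a raw correlation.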

How to use

The main analysis is in the notebook topic_model_visual_wordcloud.ipynb.

Code for evaluating the topic models is in the sensitivity folder.

Note

If you use this code for academic or commercial purposes, please cite the paper: Zhao J, Feng Q, Wu P, Warner JL, Denny JC, Wei W-Q. Using Topic Modeling via Non-negative Matrix Factorization to Identify Relationships between Genetic Variants and Disease Phenotypes: A Case Study of Lipoprotein(a) (LPA). bioRxiv. 2018 May 31;335745.