Use topic modeling for discovering clusters of phenotypes and association study between phenotypes and genotypes
sensitivity
README.md
topic_model_visual_wordcloud.ipynb

Topic modeling on phenotypes

  • Learn patterns from phenotype data more effectively.
  • Test associations between topics and genotypes to identify novel associations.

Manuscript at: https://www.biorxiv.org/content/early/2018/05/31/335745

Summary — Identifying the clinical associations of genetic variants remains crucial in understanding how the human genome modulates disease risk. Traditional phenome-wide association studies consider each disease phenotype as an independent variable; however, diseases often present as complex clusters of comorbid conditions. In this study, we evaluated topic modeling to model electronic health record data as a mixture of topics (e.g., disease clusters) and tested associations between topics and genetic variants. Our results demonstrated the feasibility of using topic modeling to replicate and discover associations between the human genome and clinical diseases.
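As an illustration of the idea in the summary, the sketch below factors a toy patient-by-phenotype matrix into two topics with non-negative matrix factorization (the method named in the manuscript title), then compares topic weights between carriers and non-carriers of a variant. It is not the code in the notebook: the toy matrix `V`, the `carrier` labels, and the function names `nmf`, `matmul`, `transpose`, and `squared_error` are all assumptions made for this example. The multiplicative-update rule is one common way to fit NMF, and the group mean difference is only a crude stand-in for a proper association test such as a regression with covariates.

```python
import random
from statistics import mean


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def squared_error(V, W, H):
    """Squared Frobenius distance between V and the reconstruction W @ H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


def nmf(V, k, n_iter=300, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m)
    using multiplicative updates for the squared-error loss."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so no entry is stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


# Hypothetical toy data: 6 patients (rows) x 4 phenotype codes (columns).
# Patients 0-2 share codes 0-1; patients 3-5 share codes 2-3.
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical genotype labels: 1 = carries the variant, 0 = does not.
carrier = [1, 1, 1, 0, 0, 0]

# W holds each patient's topic weights; H holds each topic's phenotype loadings.
W, H = nmf(V, k=2)

# Crude association summary: per-topic difference in mean patient weight
# between carriers and non-carriers.
diffs = [
    mean(W[i][t] for i in range(len(V)) if carrier[i] == 1)
    - mean(W[i][t] for i in range(len(V)) if carrier[i] == 0)
    for t in range(2)
]
print("squared reconstruction error:", squared_error(V, W, H))
print("topic weight difference (carriers - non-carriers):", diffs)
```

For real electronic health record data, a library implementation of NMF and a regression-based test would be the more practical choice; the pure-Python version here only keeps the example dependency-free.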

How to use

The main analysis is in the notebook topic_model_visual_wordcloud.ipynb.

The code for evaluating the topic models is in the sensitivity folder.

Note

If you use this code for academic or commercial purposes, please cite the paper: Zhao J, Feng Q, Wu P, Warner JL, Denny JC, Wei W-Q. Using Topic Modeling via Non-negative Matrix Factorization to Identify Relationships between Genetic Variants and Disease Phenotypes: A Case Study of Lipoprotein(a) (LPA). bioRxiv. 2018 May 31;335745.
