tangjiji199645/LMM_DGE_Pipeline
LMM DGE Pipeline

If you use this pipeline for published work, please cite our paper:

Tang, S., Buchman, A.S., Wang, Y. et al. Differential gene expression analysis based on linear mixed model corrects false positive inflation for studying quantitative traits. Sci Rep 13, 16570 (2023). https://doi.org/10.1038/s41598-023-43686-7

Setup GEMMA

  1. For Mac or Windows, use Docker to run GEMMA (https://github.com/genetics-statistics/GEMMA). After adding the GEMMA image to Docker, open a terminal and run the following command, replacing ${local_path for your files} with the local directory that holds your files (the image ID shown here, ed5bf7499691, may differ on your machine):
 docker run -w /run -v ${local_path for your files}:/run ed5bf7499691 gemma
  2. For Linux or HPC, download the binary from https://github.com/genetics-statistics/GEMMA, then make it executable and run it:
 chmod u+x gemma
 ./gemma

1. Normalization and file preparation for GEMMA

Required files:
Raw read counts: sample_raw_reads.txt
Covariate matrix: cov_matrix.txt

In the read counts file, the first column is the gene ID, and the second and third columns are allele types. The allele types are not used here, so fill each of these two columns with a single placeholder letter (A, T, C, or G).
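As a rough sketch of the layout described above, the following helper inserts two placeholder allele columns after the gene ID. The function name add_allele_columns, the file name counts_only.txt, and the choice of A and T as placeholders are assumptions for illustration, not part of this repository. It also assumes a whitespace-delimited table with no header row and writes space-delimited output.

```shell
# Hypothetical helper: insert two placeholder allele columns after the gene ID.
# Assumes: column 1 is the gene ID, the remaining columns are per-sample counts,
# and the file has no header row.
add_allele_columns() {
    # $1 = counts table without allele columns, $2 = output file
    awk '{
        id = $1                 # remember the gene ID
        $1 = ""                 # drop it from the record (rebuilds $0 with spaces)
        sub(/^[ \t]+/, "")      # remove the leading separator left behind
        print id, "A", "T", $0  # gene ID, two placeholder alleles, then counts
    }' "$1" > "$2"
}

# Example call (file names are placeholders):
# add_allele_columns counts_only.txt sample_raw_reads.txt
```

If your counts table has a header row, handle it separately before using a helper like this.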

The covariate matrix must include an intercept column (a column of 1s), which you need to add manually.
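One possible way to add the intercept is sketched below. The function name add_intercept and the file name covariates_no_intercept.txt are hypothetical, and the sketch assumes a space-delimited covariate file with one row per sample and no header row.

```shell
# Hypothetical helper: prepend a column of 1s (the intercept) to a covariate file.
# Assumes: one row per sample, space-delimited, no header row.
add_intercept() {
    # $1 = covariate file without an intercept, $2 = output file
    awk '{ print 1, $0 }' "$1" > "$2"
}

# Example call (file names are placeholders):
# add_intercept covariates_no_intercept.txt cov_matrix.txt
```

Check that the row order of the covariate file still matches the sample order of the read counts file afterwards.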

Use DESeq2 to normalize the raw read counts. For details, see normalization.R.

The files generated by normalization.R are:
normalized_reads.txt
cov_bim.txt

Use the gzip command to compress the normalized read counts into the .gz file that GEMMA requires.
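A minimal sketch of the compression step, wrapped in a hypothetical helper named compress_counts. Writing through gzip -c keeps the original text file in place, which is a convenience choice here rather than a requirement of the pipeline.

```shell
# Hypothetical helper: compress a file to <file>.gz and keep the original.
compress_counts() {
    # $1 = plain-text file, e.g. normalized_reads.txt
    gzip -c "$1" > "$1.gz"
}

# Example call:
# compress_counts normalized_reads.txt   # writes normalized_reads.txt.gz
```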

2. LMM test using GEMMA

Required files:
gzip read counts: normalized_reads.txt.gz
Phenotype: phenotype.txt
Covariate matrix: cov_bim.txt

Get Kinship matrix

gemma -g normalized_reads.txt.gz -p phenotype.txt -c cov_bim.txt -gk 2 -notsnp -o cov_mat

By default, the output file cov_mat.sXX.txt is written to the output folder under the data directory. Note that this matrix is generated from the gene expression data, so it is not a true kinship matrix.
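Before running the LMM step, it can help to confirm that the matrix has one row and one column per sample. The check below is only a sketch: the function name check_relatedness_matrix is hypothetical, and it assumes that phenotype.txt has one row per sample with no header row.

```shell
# Hypothetical helper: succeed only if the matrix is n x n, where n is the
# number of non-empty rows in the phenotype file (assumed one row per sample).
check_relatedness_matrix() {
    # $1 = relatedness matrix file, $2 = phenotype file
    awk '
        NR == FNR { if (NF > 0) n++; next }   # first file: count samples
        { rows++ }                            # second file: count matrix rows
        NF != n { bad = 1 }                   # every row must have n columns
        END { exit ((bad || rows != n) ? 1 : 0) }
    ' "$2" "$1"
}

# Example call:
# check_relatedness_matrix output/cov_mat.sXX.txt phenotype.txt && echo "matrix OK"
```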

LMM

gemma -g normalized_reads.txt.gz -p phenotype.txt -k output/cov_mat.sXX.txt -c cov_bim.txt -lmm 4 -notsnp -o output

This runs the DGE analysis using the GEMMA LMM approach. By default, the output file output.assoc.txt is written to the output folder under the data directory.

With -lmm 4, GEMMA reports all three tests: p_wald, p_lrt, and p_score are the p-values from the Wald test, the likelihood ratio test, and the score test, respectively, for differential expression of each gene.
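To pull out genes below a chosen p-value cutoff, something like the following could be used. The function name significant_genes is hypothetical. The sketch looks up the rs and p_wald columns by header name because the exact column layout of the association file is an assumption here. The cutoff is left to the user, since no threshold or multiple-testing correction is prescribed above.

```shell
# Hypothetical helper: print gene ID and Wald p-value for rows below a cutoff.
# Assumes the association file has a header row containing "rs" and "p_wald".
significant_genes() {
    # $1 = association file, $2 = p-value cutoff (e.g. 0.001)
    awk -v cutoff="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map names to columns
        ($col["p_wald"] + 0) < (cutoff + 0) { print $col["rs"], $col["p_wald"] }
    ' "$1"
}

# Example call:
# significant_genes output/output.assoc.txt 0.001
```

Replacing p_wald with p_lrt or p_score in the helper selects one of the other two tests.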

3. Visualization

Required file:
Sample data: sample_data.txt

Create a QQ plot, a Manhattan plot, and a volcano plot.

Example QQ plot:

[Figure: QQplot]

Example Manhattan plot:

[Figure: manplot]

Example volcano plot:

[Figure: Voc_plot]
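For readers who want the numbers behind a QQ plot without the plotting step, the sketch below prints expected and observed -log10(p) pairs. It is not the repository's plotting code. The function name qq_points is hypothetical, the input is assumed to be an association file with a p_wald header column, and the expected quantiles use (i - 0.5) / n, which is one common convention among several.

```shell
# Hypothetical helper: print tab-separated expected and observed -log10(p)
# values, one pair per gene, ordered from the smallest p-value upward.
qq_points() {
    # $1 = association file with a header row containing "p_wald" (assumed)
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "p_wald") c = i; next }
         { print $c }' "$1" |
    LC_ALL=C sort -g |                       # numeric sort, handles 1e-06 notation
    awk '{ p[NR] = $1 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%.4f\t%.4f\n",
                        -log((i - 0.5) / NR) / log(10),   # expected under the null
                        -log(p[i]) / log(10)              # observed
         }'
}

# Example call:
# qq_points output/output.assoc.txt > qq_points.txt
```

Points that follow the diagonal indicate p-values consistent with the null distribution, while a systematic rise above it suggests inflation.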
