xqwen edited this page Jan 8, 2018 · 17 revisions

Integrative Genetic Association Analysis using TORUS and DAP: A Tutorial

1. Overview of Integrative Genetic Association Analysis

Traditional genetic association analysis typically attempts to identify association signals between a complex trait and densely genotyped genetic markers (SNPs). Integrative analysis additionally incorporates genomic annotations of those markers, quantitatively, into the association analysis. Our software package addresses three inter-related analysis goals:

  1. Assess the enrichment level of the annotated SNPs in the association signals (Enrichment Analysis)
  2. Discover genetic loci that harbor causal variants (QTL Discovery)
  3. Perform multi-SNP fine-mapping analysis for the identified loci from 2 (Multi-SNP Fine-mapping)

The first two goals are handled by the executable "torus", and the third by the executable "dap".

1.1 Types of Applications

We currently support two types of applications: molecular (cis) QTL mapping and traditional single-phenotype genome-wide association studies (GWAS). Compared with GWAS, a distinct feature of molecular QTL mapping is that tens of thousands (or hundreds of thousands) of molecular phenotypes (e.g., gene expression, DNA methylation, chromatin accessibility, histone modifications) are measured and analyzed simultaneously. In addition, the candidate (cis) genomic region for each molecular phenotype is typically not large (usually spanning 1 to 2 Mb), whereas in GWAS the candidate SNPs cover the whole genome.

1.2 Defining Genomic Loci

In molecular QTL mapping, the candidate (cis) locus for each molecular phenotype is naturally defined. For GWAS, we adopt the partitioning algorithm proposed by Berisa and Pickrell, 2015 to segment the whole genome into a set of disjoint loci, which roughly represent independent LD blocks. The partition is population specific: for the European population, there are about 1,700 loci, and each locus spans 1.6 Mb on average. Detailed information on the partitioning is provided here by the Pickrell lab.
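Once such a partition is available, each GWAS SNP has to be mapped to the locus that contains it. The sketch below shows one way to do this for a single chromosome with a binary search over block start positions. The block boundaries, the half-open interval convention, and the function name `assign_locus` are illustrative assumptions, not values or code taken from the partition files or from torus.

```python
from bisect import bisect_right

# Invented boundaries for one chromosome: sorted, disjoint, half-open [start, stop).
blocks = [(0, 1_600_000), (1_600_000, 3_100_000), (3_100_000, 5_000_000)]


def assign_locus(pos, blocks):
    """Return the index of the block containing pos, or None if no block does."""
    starts = [start for start, _ in blocks]
    # Index of the last block whose start is <= pos (or -1 if there is none).
    i = bisect_right(starts, pos) - 1
    if i >= 0 and blocks[i][0] <= pos < blocks[i][1]:
        return i
    return None


if __name__ == "__main__":
    for snp_pos in (250_000, 1_600_000, 4_999_999, 5_000_000):
        print(snp_pos, assign_locus(snp_pos, blocks))
```

Because the loci are disjoint, every SNP falls into at most one block, so a single lookup per SNP is enough.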

1.3 Supported Data Structure

Version 2 of the DAP implementation is optimized only for the single-study setting. The previous version supports genetic association data collected in a single study or in a meta-analytic setting. We are actively working on extending the software to support applications such as multi-tissue eQTL mapping, as described in Flutre et al, 2013, and will gradually integrate support for multiple data structures back into version 2. This tutorial covers only single association studies.

1.4 Summary Statistics vs. Individual-Level Data

In the current implementation, the multi-SNP fine-mapping analysis requires individual-level genotype data; we are actively working to extend it so that it can run on summary-level data alone.

Both enrichment analysis and QTL discovery require only summary statistics (in the simplest case, z-score or p-value from the single SNP association test).
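When only p-values are at hand, they can be turned into z-scores before further processing. The following is a minimal sketch of that conversion for a two-sided test, using only the Python standard library. The function name `p_to_z` and the `effect_sign` argument are assumptions made for illustration; this is not part of torus, and it does not describe the input format torus expects.

```python
from statistics import NormalDist


def p_to_z(p_value, effect_sign=1):
    """Convert a two-sided p-value to a z-score carrying the sign of the effect."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    # Use the lower tail so that very small p-values do not round to 1.0.
    magnitude = abs(NormalDist().inv_cdf(p_value / 2.0))
    return magnitude if effect_sign >= 0 else -magnitude


if __name__ == "__main__":
    print(p_to_z(0.05))       # magnitude close to 1.96
    print(p_to_z(0.05, -1))   # same magnitude, negative sign
```

The sign of the effect cannot be recovered from a p-value, so it has to be supplied separately (for example, from the estimated effect size).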

2. Case Studies

2.1 Enrichment Analysis

2.1.1 Enrichment Analysis in cis-eQTL Mapping

This case study gives an example of enrichment analysis in molecular QTL mapping. The question we ask is: are SNPs predicted to disrupt transcription factor (TF) binding enriched in cis-eQTLs? We perform the analysis using two eQTL data sets: GTEx liver tissue and GEUVADIS.

In both examples, we use the TF binding annotations from the CENTIPEDE model and account for the genomic position of each candidate SNP with respect to the transcription start site (TSS) of the corresponding target gene.
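To make the idea of SNP position relative to the TSS concrete, the sketch below computes a strand-aware signed distance from the TSS to a SNP and groups distances into fixed-width bins. The 100 kb bin width, the strand convention, and both function names are assumptions chosen for illustration; the binning actually used in the case studies may differ.

```python
def tss_distance(snp_pos, tss_pos, strand="+"):
    """Signed distance from the TSS to the SNP; negative means upstream of the gene."""
    d = snp_pos - tss_pos
    # On the minus strand, transcription runs toward smaller coordinates.
    return d if strand == "+" else -d


def distance_bin(distance, bin_width=100_000):
    """Map a signed distance to an integer bin; bin k covers [k*w, (k+1)*w)."""
    return distance // bin_width


if __name__ == "__main__":
    d = tss_distance(1_050_000, 1_000_000, strand="-")
    print(d, distance_bin(d))
```

Each bin index can then be treated as a categorical annotation of the SNP-gene pair alongside the TF binding annotation.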

In our demonstration, we run the enrichment analysis with two input formats. The single-tissue analysis of GTEx liver tissue uses the summary-level output from the software package MatrixEQTL. For the GEUVADIS data, we pre-process the data using a Bayesian single-SNP meta-analysis method and use the resulting Bayes factors as the input for the enrichment analysis.

2.1.2 Enrichment Analysis in GWAS

In this example, we demonstrate the enrichment analysis in a GWAS of a complex trait (HDL) using the genomic annotations from Gusev et al, 2014 (downloaded from here). The analysis uses the summary-level single-SNP association z-scores originally used by Pickrell, 2014. Follow the link for details.

2.2 QTL Discovery

QTL discovery performs multiple hypothesis testing to identify genomic loci that harbor causal variants. In cis-eQTL mapping, QTL discovery is often referred to as eGene discovery. The analysis is done by the executable torus, and the details can be found here.
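As an illustration of locus-level multiple testing, the sketch below applies a generic Bayesian FDR control rule: loci are sorted by their posterior probability of containing no causal variant, and they are rejected in that order for as long as the running average of those probabilities stays at or below the target level. The function name, the input values, and the 5% level are assumptions for illustration; this is a sketch of the general technique, not a description of torus's internals or its output format.

```python
def bayesian_fdr_reject(null_posteriors, alpha=0.05):
    """Return indices of loci rejected at Bayesian FDR level alpha.

    null_posteriors[i] is the posterior probability that locus i
    harbors no causal variant.
    """
    order = sorted(range(len(null_posteriors)), key=lambda i: null_posteriors[i])
    rejected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += null_posteriors[i]
        # The running mean estimates the FDR among the top `rank` loci.
        if running_sum / rank > alpha:
            break
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    # Invented posterior null probabilities for five loci.
    print(bayesian_fdr_reject([0.001, 0.9, 0.02, 0.2, 0.04]))
```

Because the probabilities are visited in increasing order, the running mean never decreases, so stopping at the first violation yields the largest rejection set that meets the target level.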

2.3 Multi-SNP Fine-mapping

In this section of the tutorial, we provide detailed instructions for multi-SNP fine-mapping analysis using DAP. Some utility scripts that help interpret the fine-mapping results are also introduced here.