Analysis Workflow for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq)

ATAC-Seq Pipeline Installation

git clone https://github.com/tobiasrausch/ATACseq.git

cd ATACseq

make all

If one of the above commands fails, your operating system probably lacks some build essentials. These are usually pre-installed; if they are missing, install them first. On Ubuntu, for instance:

apt-get install build-essential g++ git wget unzip

Building promoter regions for QC and downloading motifs

To annotate motifs and estimate TSS enrichment, this repository includes simple scripts that build the promoter regions and download the motif database.

cd bed/ && Rscript promoter.R && cd ..

cd motif/ && ./downloadMotifs.sh && cd ..

Running the ATAC-Seq analysis pipeline for a single sample

./src/atac.sh <hg19|mm10> <read1.fq.gz> <read2.fq.gz> <genome.fa> <output prefix>

Plotting the key ATAC-Seq Quality Control metrics

At various steps the pipeline produces JSON QC files (*.json.gz), which you can upload and browse interactively at https://gear.embl.de/alfred/. In addition, the pipeline produces a succinct QC file for each sample (*.key.metrics). If you have multiple output folders (one per ATAC-Seq sample), you can concatenate the QC metrics of all samples into a single table:

head -n 1 ./*/*.key.metrics | grep "TssEnrichment" | uniq > summary.tsv

cat ./*/*.key.metrics | grep -v "TssEnrichment" >> summary.tsv
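The two commands above keep the header once and then append the data rows of every sample. A minimal sketch of the same idea in a single awk pass follows, run against two mock metrics files. The folder names, the `Sample` column, and the values are invented for illustration, and the sketch assumes each *.key.metrics file starts with exactly one header line.

```shell
#!/bin/sh
set -eu
# Mock input: two hypothetical sample folders (names and values are invented).
work=$(mktemp -d)
mkdir -p "$work/SampleA" "$work/SampleB"
printf 'Sample\tTssEnrichment\nSampleA\t7.5\n' > "$work/SampleA/SampleA.key.metrics"
printf 'Sample\tTssEnrichment\nSampleB\t9.1\n' > "$work/SampleB/SampleB.key.metrics"

# FNR == 1 is the first line of each file; NR == 1 is the first line overall.
# Skip every header except the one from the first file, print everything else.
awk 'FNR == 1 && NR != 1 { next } { print }' "$work"/*/*.key.metrics > "$work/summary.tsv"

cat "$work/summary.tsv"
```

Unlike the grep-based version, this variant does not depend on the header containing a particular column name, but it does rely on every file having a header line.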

To plot the distribution of all QC parameters:

Rscript R/metrics.R summary.tsv

Differential peak calling

Merge peaks across samples and create a raw count matrix.

ls ./Sample1/Sample1.peaks ./Sample2/Sample2.peaks ./SampleN/SampleN.peaks > peaks.lst

ls ./Sample1/Sample1.bam ./Sample2/Sample2.bam ./SampleN/SampleN.bam > bams.lst

./src/count.sh hg19 peaks.lst bams.lst <output prefix>
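Before counting, it can be worth checking that peaks.lst and bams.lst describe the same samples in the same order. Whether count.sh requires this is not stated here, so treat the following as an optional sanity check rather than part of the pipeline. It uses mock lists with hypothetical sample names, and the `sample_names` helper is introduced only for this illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
# Mock lists with hypothetical sample names; in practice point at your own peaks.lst and bams.lst.
printf './Sample1/Sample1.peaks\n./Sample2/Sample2.peaks\n' > "$work/peaks.lst"
printf './Sample1/Sample1.bam\n./Sample2/Sample2.bam\n' > "$work/bams.lst"

# Reduce each path to its sample name: drop the directory part, then the last extension.
sample_names() {
  sed -e 's#.*/##' -e 's#\.[^.]*$##' "$1"
}

sample_names "$work/peaks.lst" > "$work/peaks.names"
sample_names "$work/bams.lst" > "$work/bams.names"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$work/peaks.names" "$work/bams.names"; then
  status="lists match"
else
  status="lists differ"
fi
echo "$status"
```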

To call differential peaks with DESeq2 on a count matrix for TSS peaks (here called counts.tss.gz), first create a file with sample-level information (sample.info). For instance, if you have 2 replicates per condition and the replicates of each condition sit in adjacent columns:

echo -e "name\tcondition" > sample.info

zcat counts.tss.gz | head -n 1 | cut -f 5- | tr '\t' '\n' | sed 's/\.final$//' | awk '{print $0"\t"int((NR-1)/2);}' >> sample.info

Rscript R/dpeaks.R counts.tss.gz sample.info
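To make the sample.info construction above concrete, the sketch below runs the same steps on a mock count matrix header. The four leading column names and the sample names are hypothetical; only the layout (sample columns starting at column 5, names ending in `.final`) is taken from the commands above.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
# Mock header: four annotation columns, then one column per sample (all names invented).
printf 'chr\tstart\tend\tid\tCtrl1.final\tCtrl2.final\tTreat1.final\tTreat2.final\n' \
  | gzip -c > "$work/counts.tss.gz"

printf 'name\tcondition\n' > "$work/sample.info"
# Take the sample columns (5 onwards), put one name per line, strip the
# literal ".final" suffix, and number conditions as floor((line - 1) / 2),
# so lines 1-2 get condition 0, lines 3-4 get condition 1, and so on.
gzip -dc "$work/counts.tss.gz" | head -n 1 | cut -f 5- | tr '\t' '\n' \
  | sed 's/\.final$//' \
  | awk '{ print $0 "\t" int((NR - 1) / 2) }' >> "$work/sample.info"

cat "$work/sample.info"
```

With this input, Ctrl1 and Ctrl2 are assigned condition 0, and Treat1 and Treat2 condition 1. If your replicates are not in adjacent columns, or a condition has a different number of replicates, write sample.info by hand instead.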

Footprinting

The pipeline also includes a module that calls footprints of nucleosome occupancy or transcription factor binding and annotates them with motif hits.

./src/footprints.sh <hg19|mm10> <genome.fa> <input.bam> <output prefix>

Citation

Tobias Rausch, Markus Hsi-Yang Fritz, Jan O Korbel, Vladimir Benes.
Alfred: Interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing.
Bioinformatics.

License

This ATAC-Seq pipeline is distributed under the GPLv3.