Join GitHub today
Build Signal Track
In this page, I describe how to use MACS v2 to generate signal track by comparing ChIP treatment against control, an essential step for calling peaks. Although MACS v2 main command 'macs2 callpeak' can generate signal tracks internally, for example p-value tracks if p-value is used as cutoff, or q-value track if q-value is used as cutoff, MACS v2 will only keep such signal tracks in memory instead of harddisk in order to save time and disk space. To generate appropriate signal track to profile transcription factor or histone modification enrichment levels over whole genome, users need to use another command 'macs2 bdgcmp' on the bedGraph files generated by 'macs2 callpeak'.
Table of Contents
Currently, MACS2 is on PyPI: https://pypi.python.org/pypi/MACS2/. Installation or upgrade can be done by pip or easy_install command. The current version is 220.127.116.1160309.
Do this on *nix computer with internet connection. Make sure you have permission to write.
$ easy_install MACS2
or if you have [SetUpVirtualEnv|VirtualEnv] set up:
$ pip install MACS2
To update old installation:
$ pip install -U MACS2
Here is a tutorial to process ChIP-seq data to generate signal tracks:
$ macs2 callpeak -t H3K36me1_EE_rep1.bam -c Input_EE_rep1.bam -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE_rep1
* --nomodel and --extsize 147 tell MACS2 use 147bp as fragment size to pileup sequencing reads. * -g ce lets MACS2 consider C elegans genome as background. Set it as 'dm' for fly or 'hs' for human * -B --SPMR ask MACS2 to generate pileup signal file of 'fragment pileup per million reads' in bedGraph format.
Then you will have these two bedGraph files in the same directory:
$ macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_FE.bdg -m FE
$ macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_logLR.bdg -m logLR -p 0.00001
* -m FE means to calculate fold enrichment. Other options can be logLR for log likelihood, subtract for subtracting noise from treatment sample. * -p sets pseudocount. This number will be added to 'pileup per million reads' value. You don't need it while generating fold enrichment track because control lambda will always >0. But in order to avoid log(0) while calculating log likelihood, we'd add pseudocount. Because I set precision as 5 decimals, here I use 0.00001.
Then you will have this bedGraph file for fold-enrichment and logLR.
You need UCSC tools from (if linux):
Since sometimes, chromosome names in alignment and bedGraph files are not consistent with UCSC naming convention, you may need to fix the chromosomenames using awk or perl.
Then you need to convert bedGraph to bigWig using this simple script:https://gist.github.com/2469050
You have to download chromosome length file from UCSC too.
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz for human ( I renamed it as hg19.len )
http://hgdownload.cse.ucsc.edu/goldenPath/dm3/database/chromInfo.txt.gz for fly ( I renamed it as dm3.len )
$ bdg2bw H3K36me1_EE_rep1_FE.bdg hg19.len $ bdg2bw H3K36me1_EE_rep1_logLR.bdg hg19.len
And you will have these bigwig files:
You need UCSC tool:
$ wigCorrelate H3K36me1_EE_rep1_FE.bw H3K36me1_EE_rep2_FE.bw
If the correlation is good enough, you can combine replicates and run MACS2 again.
You don't need to use "samtools merge", just
$ macs2 callpeak -t H3K36me1_EE_rep1.bam H3K36me1_EE_rep2.bam -c Input_EE_rep1.bam Input_EE_rep2.bam -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE
Then use the similar approaches as step 2 and 3 to generate signal tracks in bigWig format.
You can easily write a script to build a pipeline from step1 to step5.