
MeRIPseqPipe: An integrated analysis pipeline for MeRIP-seq data based on Nextflow.


MeRIPseqPipe

A MeRIP-seq analysis pipeline that integrates multiple alignment tools, peak-calling tools, peak-merging methods, and methylation analysis methods.


Introduction

Here, we present MeRIPseqPipe, an integrated analysis pipeline for MeRIP-seq data based on Nextflow. It integrates ten main functional modules: data preprocessing, quality control, read mapping, peak calling, peak merging, motif searching, peak annotation, differential methylation analysis, differential expression analysis, and data visualization. Together, these cover the basic analysis of MeRIP-seq data.

All analysis modules are implemented in Nextflow, and all third-party tools are packaged in a Docker container.

Quick Start

  1. Install Nextflow.

  2. Pull the Docker image from Docker Hub: kingzhuky/meripseqpipe:dev

  3. Clone this repository and view the pipeline help:

    git clone https://github.com/canceromics/MeRIPseqPipe.git
    nextflow run /path/to/MeRIPseqPipe --help
  4. Test it on a minimal dataset with a single command:

    nextflow run /path/to/MeRIPseqPipe -profile test,docker
  5. Start running your own analysis!

    nextflow run /path/to/MeRIPseqPipe -profile docker \
        --designfile designfile.tsv --comparefile compare.txt -resume \
        --aligners star --fasta hg38_genome.fa --gtf gencode.v25.annotation.gtf \
        --rRNA_fasta hg38_rRNA.fasta --outdir path/to/results --skip_createbedgraph \
        --peakMerged_mode rank --star_index hg38/starindex \
        --skip_meyer --skip_matk --methylation_analysis_mode Wilcox-test

See the usage documentation for more details and for all of the options available when running the pipeline.

Documentation

The MeRIPseqPipe documentation is split into the following files:

  1. Usage
    • Parameter Documentation
    • An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
    • Let us know if you need more customization!
  2. Output
    • An overview of the different results produced by the pipeline

Pipeline overview

This pipeline is built using Nextflow and integrates tools as follows:

  • Quality control and preprocessing of raw data
    • fastp: quality trimming and adapter clipping
    • FastQC: generate quality reports
    • RSeQC: assess mapping performance to give more insight into data quality
  • Read alignment
    • STAR: Spliced Transcripts Alignment to a Reference
    • HISAT2: memory-efficient, splice-aware alignment to a reference
    • TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
    • BWA: fast and accurate short read alignment with Burrows-Wheeler transform
  • Peak calling
    • MACS2: Model-based Analysis of ChIP-seq
    • MeTPeak: a novel, graphical model-based peak-calling method
    • MATK: a deep learning-based MeRIP-seq analysis tool at single-nucleotide resolution
    • Meyer: a peak-calling tool based on Fisher's exact test
  • Peak merging
    • RobustRankAggreg: a rank aggregation algorithm
    • MSPC: using combined evidence from replicates to evaluate ChIP-seq peaks
    • BEDTools: merging and intersecting peaks with the "mergeBed" and "intersectBed" functions
  • Peak annotation
    • Perl scripts: report peak start/end positions, gene start/end positions, transcript ID, strand, gene type (coding or noncoding, lncRNA or mRNA, etc.), peak location, Ensembl gene ID, etc.
    • annotatePeaks.pl (HOMER): reports whether a peak lies in the TSS (transcription start site), TTS (transcription termination site), Exon (Coding), 5' UTR Exon, 3' UTR Exon, Intronic, or Intergenic region, and its distance to the TSS
  • Motif searching
    • HOMER: Hypergeometric Optimization of Motif EnRichment
  • m6A site prediction
    • MATK: predict m6A sites at single nucleotide resolution
  • Differential expression analysis
    • featureCounts: read counting relative to gene biotype
    • DESeq2: for differential expression analysis of RNA-Seq, SAGE-Seq, ChIP-Seq or HiC count data
    • edgeR: for differential expression analysis of RNA-Seq, SAGE-Seq, ChIP-Seq or HiC count data
  • Differential methylation analysis
    • QNB: a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data
    • MATK: using a Bayesian hierarchical model to eliminate the effect of basal expression and quantify the true m6A level by Markov Chain Monte Carlo sampling
    • Wilcox-test: results are generated by custom R scripts based on RPKM values
    • DESeq2: uses a generalized linear model to detect changes in IP coverage while controlling for differences in Input coverage
    • edgeR: uses a generalized linear model to detect changes in IP coverage while controlling for differences in Input coverage
  • Report
    • MultiQC: summarize all results from quality control and alignment
    • R packages
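
For the BEDTools peak-merging option, the core of mergeBed (collapsing overlapping intervals into one) can be sketched in a few lines of awk. This is a minimal illustration, not MeRIPseqPipe's code: it assumes BED input already sorted by chromosome and start, ignores extra columns, and does not reproduce bedtools options such as a merge distance. The function name `merge_intervals` is hypothetical.

```shell
# Sketch of the default interval merge performed by bedtools' mergeBed:
# overlapping or book-ended intervals on the same chromosome become one interval.
# Assumes BED input (chrom, start, end, ...) already sorted by chrom, then start,
# e.g. with `sort -k1,1 -k2,2n`; columns after the third are ignored.
merge_intervals() {
  awk 'BEGIN { OFS = "\t" }
    {
      if (NR > 1 && $1 == chrom && $2 + 0 <= end) {
        if ($3 + 0 > end) end = $3 + 0        # extend the current merged block
      } else {
        if (NR > 1) print chrom, start, end  # flush the previous block
        chrom = $1; start = $2 + 0; end = $3 + 0
      }
    }
    END { if (NR > 0) print chrom, start, end }' "$@"
}
```

Typical use would be `sort -k1,1 -k2,2n peaks.bed | merge_intervals > merged.bed`, where `peaks.bed` stands for the pooled peaks of several replicates or peak callers.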
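
RobustRankAggreg scores each item by how consistently it sits near the top of several ranked lists. Below is a minimal sketch of its rho score, assuming each peak has already been given a normalized rank (rank divided by list length) in every list. `rra_rho` is a hypothetical helper, and the R package applies a further correction to this minimum, so its reported values will differ.

```shell
# Sketch of the robust rank aggregation (RRA) rho score behind RobustRankAggreg.
# Not MeRIPseqPipe code. Arguments: one normalized rank in (0, 1] per ranked list.
# For the k-th smallest of n ranks, beta_k = P(Beta(k, n-k+1) <= r_(k)),
# which equals the binomial upper tail P(Binomial(n, r_(k)) >= k).
# rho = min_k beta_k (uncorrected).
rra_rho() {
  printf '%s\n' "$@" | awk '
    function choose(a, b,    i, c) {   # binomial coefficient C(a, b)
      c = 1
      for (i = 1; i <= b; i++) c = c * (a - b + i) / i
      return c
    }
    {   # insertion sort keeps r[1..n] in ascending order
      v = $1 + 0; i = ++n
      while (i > 1 && r[i - 1] > v) { r[i] = r[i - 1]; i-- }
      r[i] = v
    }
    END {
      rho = 1
      for (k = 1; k <= n; k++) {
        p = 0   # upper binomial tail P(X >= k), X ~ Binomial(n, r[k])
        for (j = k; j <= n; j++) p += choose(n, j) * r[k] ^ j * (1 - r[k]) ^ (n - j)
        if (p < rho) rho = p
      }
      printf "%.4f\n", rho
    }'
}
```

For example, `rra_rho 0.5 0.1` scores a peak ranked in the top 10% of one list and the top half of another; smaller values mean more consistent support across lists.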
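
MSPC, as described in its paper, combines the p-values of overlapping peaks across replicates with Fisher's method. The sketch below implements only that combination step, assuming one p-value per replicate is already available. `fisher_combine` is a hypothetical name, and MSPC's peak classification and thresholds are not reproduced.

```shell
# Sketch of Fisher's combined probability test. Not MeRIPseqPipe or MSPC code.
# Arguments: one peak p-value per replicate, each in (0, 1].
# X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
# freedom under the null; for even df the survival function has a closed form:
#   P(chi2_2k > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
fisher_combine() {
  printf '%s\n' "$@" | awk '
    { half_x += -log($1); k++ }   # accumulates X/2 = -sum(ln p_i)
    END {
      term = 1; tail = 1          # i = 0 term of the series
      for (i = 1; i < k; i++) { term = term * half_x / i; tail += term }
      printf "%.4f\n", exp(-half_x) * tail
    }'
}
```

With a single argument the function returns that p-value unchanged, which is a quick sanity check of the closed form.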
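
The distance to the TSS reported during peak annotation is strand-aware: on the minus strand, a gene starts at its higher coordinate. Below is a minimal sketch of that calculation using the usual convention that negative values lie upstream. The five-column input layout and the `tss_distance` name are hypothetical, and annotatePeaks.pl's own rules (such as choosing the nearest TSS) are not reproduced.

```shell
# Sketch of a strand-aware signed distance from a peak center to a gene TSS
# (negative = upstream of the TSS). Hypothetical tab-separated input columns:
#   peak_start  peak_end  gene_start  gene_end  strand
# All coordinates are assumed to use one consistent convention.
tss_distance() {
  awk 'BEGIN { FS = "\t" }
    {
      center = int(($1 + $2) / 2)
      if ($5 == "+") print center - $3   # plus strand: TSS at gene start
      else           print $4 - center   # minus strand: TSS at gene end
    }' "$@"
}
```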
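
The Wilcox-test mode is described only as custom R scripts based on RPKM values. As an assumption about what such a script might compute, the sketch below gives the standard RPKM formula and a hypothetical per-peak IP/Input ratio. Per-sample values like these could then be compared between groups with a Wilcoxon rank-sum test, but the pipeline's actual scripts may differ.

```shell
# RPKM (reads per kilobase of feature per million mapped reads):
#   RPKM = reads * 1e9 / (length_bp * total_mapped_reads)
rpkm() {  # args: read_count feature_length_bp total_mapped_reads
  awk -v c="$1" -v len="$2" -v tot="$3" 'BEGIN { printf "%.4f\n", c * 1e9 / (len * tot) }'
}

# Hypothetical helper (an assumption, not the pipeline's method): IP-over-Input
# enrichment of one peak in one sample as the ratio of the two RPKM values.
ip_input_ratio() {  # args: ip_count ip_total input_count input_total peak_length
  awk -v ic="$1" -v it="$2" -v nc="$3" -v nt="$4" -v len="$5" \
    'BEGIN { printf "%.4f\n", (ic * 1e9 / (len * it)) / (nc * 1e9 / (len * nt)) }'
}
```

Because both RPKM values use the same peak length, the length cancels in the ratio, leaving a library-size-normalized IP/Input count ratio.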
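
For the DESeq2 and edgeR options, "detecting changes in IP coverage while controlling for Input coverage" is commonly expressed as a negative binomial GLM with a condition-by-library-type interaction. The formulation below is an assumed sketch of that idea, not taken from the pipeline's code:

```latex
% K_{ij}: read count for peak i in sample j;  s_j: size factor of sample j
% c_j = 1 for treatment, 0 for control;  t_j = 1 for IP, 0 for Input
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), \\
\log \mu_{ij} &= \log s_j + \beta_{i0} + \beta_{i1} c_j + \beta_{i2} t_j + \beta_{i3}\, c_j t_j, \\
H_0 &: \beta_{i3} = 0 .
\end{aligned}
```

Here β_i2 is the log IP/Input enrichment of peak i in the control condition, β_i1 captures changes visible in both IP and Input (for example, expression changes), and β_i3 is the change in enrichment between conditions. Rejecting H_0 therefore flags differential methylation rather than differential expression.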

Credits

MeRIPseqPipe was originally written by Xiaoqiong Bao and Kaiyu Zhu.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

BioTreasury

MeRIPseqPipe is listed on BioTreasury (https://biotreasury.rjmart.cn/#/tool?id=61140); you are welcome to use it and leave comments!

Citation

Xiaoqiong Bao, Kaiyu Zhu, Xuefei Liu, Zhihang Chen, Ziwei Luo, Qi Zhao, Jian Ren, Zhixiang Zuo. MeRIPseqPipe: an integrated analysis pipeline for MeRIP-seq data based on Nextflow. Bioinformatics, 2022, btac025. https://doi.org/10.1093/bioinformatics/btac025

Acknowledgements

Thanks to nf-core for the support and guidance!

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
