# <b>Module 4. nf-meripseq - an integrated Nextflow pipeline</b>
--------------------------------------------


## Overview
<b>nf-meripseq</b> is an integrated analysis pipeline for MeRIP-seq data built on <a href="https://www.nextflow.io/docs/latest/index.html">Nextflow</a>, with full support from the <a href="https://nf-co.re/">nf-core</a> community. It provides an automated, user-friendly solution for in-depth mining of MeRIP-seq data, focusing on m6A modification analysis. The pipeline integrates a wide range of functional modules, from basic processing (e.g., alignment, peak calling) to advanced downstream analysis (e.g., differential expression, visualization). As an nf-core-supported pipeline, it adheres to best-practice standards and leverages Nextflow with <a href="https://docs.docker.com/">Docker</a> and <a href="https://docs.sylabs.io/guides/4.3/user-guide/">Singularity</a> support, ensuring high reproducibility and scalability across diverse computing environments. nf-meripseq is ideal for processing large sample sets with a single command, delivering results in a structured output directory organized by step and tool. Comprehensive visualization reports, including tables and plots, are provided as HTML files, benefiting from nf-core’s rigorous testing and documentation framework.

## Learning Objectives
+ Get familiar with Nextflow and nf-core workflows
+ Run nf-meripseq using the example dataset
    - Understand the input parameters of the nf-meripseq workflow
    - Understand the output and reports from the nf-meripseq workflow

## Prerequisites
- An understanding of the basic processing steps for MeRIP-seq data

## Get Started
All analysis modules are implemented in Nextflow, and all third-party tools are encapsulated in Docker containers.

### 1. Install the necessary packages
Install <code>nextflow</code> and <code>Singularity</code> with conda.

In [None]:
! conda install bioconda::nextflow conda-forge::singularity -y --quiet
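
Before moving on, it can help to confirm that both executables are visible on the `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `find_tools` is illustrative and not part of nf-meripseq.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report where each tool was found (or that it is missing).
for tool, path in find_tools(["nextflow", "singularity"]).items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

If either tool is reported as missing, restarting the kernel after the conda install is a reasonable first thing to try.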

### 2. Get datasets
This example dataset is the same one used in Submodule 1.

In [None]:
# copy the data from s3 bucket to example_dataset directory
! aws s3 cp s3://nigms-sandbox/ovarian-cancer-example-fastqs/ example_dataset --recursive
# decompress the sequence reads files
! tar -zxvf example_dataset/fastqs.tar.gz -C example_dataset
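
A quick check that the inputs landed where the pipeline command expects them can save a failed run. The sketch below assumes the three file names used in the run command of this module sit directly under `example_dataset`; the helper name `missing_files` is illustrative.

```python
from pathlib import Path

# File names taken from the nf-meripseq run command in this module.
EXPECTED = [
    "samplesheet.csv",
    "gencode.v46.pri.chr11.1.5M.gtf",
    "chr11_1.5M.fasta",
]


def missing_files(base_dir, names=EXPECTED):
    """Return the expected file names that are not present under base_dir."""
    base = Path(base_dir)
    return [name for name in names if not (base / name).is_file()]


print(missing_files("example_dataset") or "all expected inputs found")
```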

### 3. Run nf-meripseq
<b>samplesheet.csv</b><br>
The <code>control</code> column should contain the sample identifier of the control for a given IP sample. Together with the <code>control_replicate</code> column, it determines which control is paired with each sample in the table.
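
The pairing rule can be checked before launching the pipeline. The sketch below assumes nf-core-style column names `sample` and `replicate` alongside `control` and `control_replicate`; only the latter two are named in this module, so adjust the others to match your sample sheet. The rows in the example are made up for illustration.

```python
import csv
import io


def unmatched_controls(rows):
    """Return (sample, replicate) pairs whose control is not itself a row in the sheet.

    Rows with an empty `control` field are treated as controls and skipped.
    """
    known = {(r["sample"], r["replicate"]) for r in rows}
    problems = []
    for r in rows:
        if not r["control"]:
            continue
        if (r["control"], r["control_replicate"]) not in known:
            problems.append((r["sample"], r["replicate"]))
    return problems


# Hypothetical sheet: one IP sample paired with one input sample.
example = """sample,replicate,control,control_replicate
IP_A,1,INPUT_A,1
INPUT_A,1,,
"""
rows = list(csv.DictReader(io.StringIO(example)))
print(unmatched_controls(rows))  # prints []
```

An empty list means every IP row points at a control that exists in the sheet.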

In [None]:
! nextflow run nf-meripseq -profile singularity \
    --input example_dataset/samplesheet.csv \
    --gtf example_dataset/gencode.v46.pri.chr11.1.5M.gtf \
    --fasta example_dataset/chr11_1.5M.fasta \
    --read_length 37 \
    --outdir="meripseq_results" \
    -c add.config
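
The overview notes that reports are written as HTML files inside a structured output directory. A minimal sketch for listing them once the run finishes, assuming the `--outdir` value `meripseq_results` from the command above; the helper name `html_reports` is illustrative.

```python
from pathlib import Path


def html_reports(outdir):
    """Return all .html files under outdir as sorted paths relative to outdir."""
    root = Path(outdir)
    if not root.is_dir():
        # The run has not produced an output directory yet.
        return []
    return sorted(p.relative_to(root) for p in root.rglob("*.html"))


for report in html_reports("meripseq_results"):
    print(report)
```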

## Conclusion
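In this module, you installed Nextflow and Singularity, downloaded the example dataset, and ran the nf-meripseq pipeline with a single command, supplying its input parameters and writing results to a structured output directory.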

## Clean up
Remember to shut down your VM and delete any related resources.