# <b>Module 4. nf-meripseq: an integrated Nextflow pipeline</b>
--------------------------------------------

## Overview
You have learned the principles of MeRIP-seq data analysis and performed each processing step manually in previous modules. In this module, we shift focus from step-by-step manual analysis to using a reproducible, automated pipeline to streamline the entire workflow.

<b>nf-meripseq</b> is an integrated analysis pipeline built with <a href="https://www.nextflow.io/docs/latest/index.html">Nextflow</a> and supported by the <a href="https://nf-co.re/">nf-core</a> community. It bundles the essential processing steps (quality control, alignment, peak calling, differential analysis, and reporting) into a single, easy-to-run command.

Advantages of using a Nextflow pipeline such as nf-meripseq:
- **Automation**: Run all steps in one workflow, reducing manual error.
- **Reproducibility**: Ensure consistent results across computing environments.
- **Scalability**: Handle large datasets on local machines, HPC clusters, or cloud platforms.
- **Community best-practices**: Built and maintained by nf-core following strict standards for quality and testing.

## Learning Objectives
After this module, you will be able to:
+ Understand the purpose and advantages of using a pipeline for MeRIP-seq analysis.
+ Set up the environment to run a Nextflow pipeline (Nextflow + Singularity).
+ Successfully execute the nf-meripseq pipeline on a test dataset.
+ Explore the output structure and generated reports.

## Prerequisites
- Completion of Modules 1–3, which cover the basic concepts of MeRIP-seq and the manual data processing steps.

## Get Started
In this section, you will set up the necessary environment and successfully run the nf-meripseq pipeline on a test dataset.

### 1. Install necessary packages using <code>conda</code>
- <b>Nextflow</b> manages the execution of the pipeline. 
- <b>Singularity</b> is a container engine — a tool that packages and runs entire software environments in an isolated and portable way. Most nf-core pipelines are containerized, so the environment is automatically managed inside containers.

In [None]:
! conda install bioconda::nextflow conda-forge::singularity -y
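
Before moving on, it can help to confirm that both tools are visible to the notebook. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_tools` is hypothetical and not part of the tutorial or of either tool.

```python
import shutil


def find_missing_tools(tools=("nextflow", "singularity")):
    """Return the names of the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Check the two executables installed in the cell above.
missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("nextflow and singularity are both available.")
```

If a tool is reported as missing, the conda environment used for the installation may not be the one the notebook kernel is running in.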

### 2. Get datasets
This example dataset is the same one used in Submodule 1.

In [None]:
# copy the data from s3 bucket to example_dataset directory
! aws s3 cp s3://ovarian-cancer-example-fastqs/ example_dataset --recursive
# decompress the sequence reads files
! tar -zxvf example_dataset/fastqs.tar.gz -C example_dataset
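
After the download and extraction, a quick listing shows whether the expected inputs are in place. This is a sketch only: the helper `list_files_by_suffix` is hypothetical, and the suffix list is an assumption based on the file types passed to the pipeline later in this module (reads, samplesheet, GTF, and FASTA).

```python
from pathlib import Path


def list_files_by_suffix(
    directory,
    suffixes=(".fastq.gz", ".fq.gz", ".fastq", ".csv", ".gtf", ".fasta"),
):
    """Return sorted paths under `directory` whose names end with one of `suffixes`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing to list if the directory does not exist yet
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.name.endswith(tuple(suffixes))
    )


# Print each matching file with its size in megabytes.
for path in list_files_by_suffix("example_dataset"):
    print(f"{path}  ({path.stat().st_size / 1e6:.1f} MB)")
```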

### 3. Run nf-meripseq
The pipeline reads its sample definitions from <b>samplesheet.csv</b>. For each IP sample, the <code>control</code> column should contain the sample identifier of its matching control. Together with the <code>control_replicate</code> column, it determines which control is paired with each sample in the table.
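
The pairing rule can be checked before launching the pipeline. The sketch below flags rows whose control cannot be found in the table. Only the `control` and `control_replicate` column names come from the description above; the `sample` and `replicate` column names, the convention that control rows leave `control` empty, and the example rows are assumptions made for illustration and may differ from the real samplesheet.

```python
import csv
import io


def find_unmatched_controls(rows):
    """Return (sample, control, control_replicate) for rows whose control row is missing."""
    # Assumed column names: 'sample' and 'replicate' identify each row.
    known = {(r["sample"], r["replicate"]) for r in rows}
    unmatched = []
    for r in rows:
        control = (r.get("control") or "").strip()
        if not control:
            continue  # assumption: control rows leave the 'control' column empty
        key = (control, (r.get("control_replicate") or "").strip())
        if key not in known:
            unmatched.append((r["sample"], *key))
    return unmatched


# Hypothetical rows for illustration only, not the tutorial's real samplesheet.
example = io.StringIO(
    "sample,replicate,control,control_replicate\n"
    "tumor_IP,1,tumor_input,1\n"
    "tumor_input,1,,\n"
    "normal_IP,1,normal_input,1\n"
)
rows = list(csv.DictReader(example))
print(find_unmatched_controls(rows))
# → [('normal_IP', 'normal_input', '1')]
```

To check the real file, replace the `io.StringIO` example with `open("example_dataset/samplesheet.csv", newline="")` and adjust the column names if they differ.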

In [None]:
! nextflow run nf-meripseq -profile singularity \
    --input example_dataset/samplesheet.csv \
    --gtf example_dataset/gencode.v46.pri.chr11.1.5M.gtf \
    --fasta example_dataset/chr11_1.5M.fasta \
    --genome hg38 \
    --read_length 37 \
    --outdir="Tutorial_4" \
    --contrast "omental_tumor_vs_normal_Fallopian_tube" \
    -c add.config \
    -resume --skip_exomepeak2_single true

In [None]:
! ls /home/ec2-user/SageMaker/docker_image/exomepeak2/r-meripseq.sif
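
Once the run finishes, the results are written to the directory given by `--outdir` (here `Tutorial_4`). As a starting point for exploring them, the sketch below lists the top-level subdirectories and any HTML reports it finds. The helper `summarize_outdir` is hypothetical, and no particular subdirectory or report names are assumed.

```python
from pathlib import Path


def summarize_outdir(outdir):
    """Return (top-level subdirectory names, HTML report paths) for an output directory."""
    root = Path(outdir)
    if not root.is_dir():
        return [], []  # the pipeline has not produced this directory yet
    subdirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    reports = sorted(root.rglob("*.html"))
    return subdirs, reports


subdirs, reports = summarize_outdir("Tutorial_4")
print("Subdirectories:", subdirs)
for report in reports:
    print("Report:", report)
```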


## Conclusion
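In this module, you moved from step-by-step manual analysis to an automated, reproducible workflow. You set up an environment with Nextflow and Singularity, ran the nf-meripseq pipeline on a test dataset with a single command, and explored the output structure and generated reports.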

## Clean up
Remember to shut down your VM and delete any resources you no longer need.