# <b>Module 1 - MeRIP-seq Data Acquisition and Preprocessing</b>

## Overview
This module will guide you through the acquisition and preprocessing of MeRIP-seq data, including essential steps such as quality control, adapter trimming, reference genome preparation, read alignment, peak calling, and motif discovery. The focus is on understanding how to process raw MeRIP-seq data to prepare it for downstream analysis.

## Learning Objectives
+ Explore an example MeRIP-seq dataset and its experiment design
+ Understand the core steps involved in preprocessing MeRIP-seq data
    - quality control, alignment, m6A peak identification, and annotation.
    - get familiar with the bioinformatics tools and important parameters for MeRIP-seq analysis
+ Data Visualization using IGV
    - understand the role of file formats (e.g., BAM, BED, BigWig) in genomic data visualization
    - explore and interpret alignment and m6A peak data for biological insights

## <mark>Prerequisites</mark>
- APIs that should be enabled: AWS S3 (the example dataset is stored there)
- List cloud platform account roles that must be assigned.
- List necessary cloud platform access.

## Outline
- **Getting started**
    1. Installing packages
    2. Setting up directory structures
    3. Downloading the Example Dataset
- **Step-by-Step Data Preprocessing**
    1. Quality Control 
    2. Read alignment
    3. Peak Calling and annotation
    4. Motif Discovery
- **Data Visualization with IGV**
    1. View Alignments (BAM, bigWig)
    2. View identified m6A Peaks (BED, bedGraph)

---
## **1. Getting started**
### 1.1 Installing packages
[**mamba**](https://mamba.readthedocs.io/en/latest/user_guide/mamba.html) is a re-implementation of the **conda** package manager in C++. It is designed to address some of the performance limitations of conda, offering faster environment creation, package installation, and dependency resolution. Mamba is fully compatible with the conda ecosystem, meaning it can be used as a drop-in replacement for conda, providing access to the same package repositories.

In [None]:
! curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
! bash Miniforge3-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge

Installation of the tools for this tutorial using <code>mamba</code>:

In [None]:
! mamba install -y -c conda-forge -c bioconda \
    fastqc \
    multiqc \
    trim-galore \
    star \
    bedtools \
    samtools \
    deeptools \
    ucsc-bigwigmerge \
    homer

Install **additional** packages and tools:
- human genome <code>hg38</code> for HOMER
- a Python package <code>igv-notebook</code> for embedding IGV (Integrative Genomics Viewer) in an IPython notebook.

In [None]:
# install the HOMER human genome package (adjust the path to match your conda environment)
! perl /home/ec2-user/anaconda3/envs/python3/share/homer/configureHomer.pl -install hg38
# install IGV 
! pip install igv-notebook

### 1.2 Setting up directory structures
Establishing input and output directories

In [None]:
! mkdir -p Tutorial_1
! mkdir -p Tutorial_1/fastqc
! mkdir -p Tutorial_1/trimmed
! mkdir -p Tutorial_1/ref_genome
! mkdir -p Tutorial_1/homer
! mkdir -p Tutorial_1/igv
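
The same directory layout can also be created from Python, which is handy when the notebook is re-run. This is a minimal sketch: the function name `make_tutorial_dirs` is illustrative, and the `STAR` and `multiqc` folders are added here as an assumption, since later steps write their output there.

```python
from pathlib import Path

# Sub-directories used in this module. "STAR" and "multiqc" are an assumption:
# they are not created by the mkdir commands above but are written to later.
SUBDIRS = ["fastqc", "trimmed", "ref_genome", "homer", "igv", "STAR", "multiqc"]


def make_tutorial_dirs(base="Tutorial_1"):
    """Create the tutorial directory tree and return the created paths."""
    base = Path(base)
    created = []
    for name in SUBDIRS:
        path = base / name
        # parents=True and exist_ok=True give the same behavior as `mkdir -p`
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_tutorial_dirs("Tutorial_1")` is safe to repeat because existing directories are left untouched.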

### 1.3 Downloading/Preparing the example dataset
#### About the dataset
The dataset used in this tutorial is derived from **GSE119168**, which was originally published as part of the study using the **RADAR** pipeline for MeRIP-seq. RADAR is a computational framework designed to identify m6A-modified regions in RNA from high-throughput MeRIP-seq data, and we will use it for downstream analysis in the next tutorial. The dataset includes six omental tumor tissues and seven normal fallopian tube tissues; both input and m6A immunoprecipitation libraries were sequenced on the NextSeq 500 platform in PE37 mode (paired-end, 37 bp). 

For this tutorial, a subset of the original data has been selected, focusing specifically on chromosome 11 (chr11:1-1,000,000). This region was chosen to include the **HRAS** gene, which is a key gene implicated in cancer progression and is regulated by m6A methylation. The HRAS gene is highlighted in another study ([Pan, Yongbo, et al.  PNAS (2023)](https://www.pnas.org/doi/abs/10.1073/pnas.2302291120)), which demonstrates how m6A modifications on HRAS RNA influence tumor progression, making it a biologically relevant region for studying m6A methylation.

The dataset provides a small, manageable region of the genome for tutorial purposes, enabling users to quickly process the data and explore m6A peak calling and motif discovery in the context of a known cancer-associated gene. 

This example dataset is stored at an AWS S3 bucket: s3://ovarian-cancer-example-fastqs

In [None]:
# copy the data from s3 bucket to Tutorial_1 directory
! aws s3 cp s3://ovarian-cancer-example-fastqs/ Tutorial_1 --recursive
# decompress the sequence reads files
! tar -zxvf Tutorial_1/fastqs.tar.gz -C Tutorial_1

---
## **2. Step-by-Step Data Preprocessing**
### 2.1 Quality control

#### Step 1. FastQC 
[**FastQC**](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) is a simple tool for running quality control checks on raw sequence data from high-throughput sequencing pipelines. It provides a modular set of analyses that give a quick impression of whether your data has any problems you should be aware of before doing further analysis. You can find examples of "good" and "bad" sequencing data on the [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) website, in the section "Example Reports". 

Run FastQC on the sequence files; the FastQC reports will be saved in `Tutorial_1/fastqc`. The `for` loop iterates through all the fastq.gz files in the `Tutorial_1/fastqs` directory and runs `fastqc` on each of them separately.

In [None]:
! for file in Tutorial_1/fastqs/*.gz; do fastqc -q -o Tutorial_1/fastqc "${file}"; done;
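
With many FASTQ files it can help to list the flagged modules programmatically. The sketch below assumes the usual FastQC output layout, where each `*_fastqc.zip` archive contains a tab-separated `summary.txt` with one `STATUS<TAB>module<TAB>file` line per module. The function names are illustrative and not part of FastQC.

```python
import zipfile


def parse_fastqc_summary(text):
    """Map each FastQC module name to its status (PASS, WARN or FAIL)."""
    results = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            status, module = parts[0], parts[1]
            results[module] = status
    return results


def flagged_modules(text):
    """Return the modules that FastQC marked as WARN or FAIL."""
    summary = parse_fastqc_summary(text)
    return [module for module, status in summary.items() if status in ("WARN", "FAIL")]


def read_summary_from_zip(zip_path):
    """Read summary.txt from a FastQC zip archive (assumed layout)."""
    with zipfile.ZipFile(zip_path) as archive:
        name = next(n for n in archive.namelist() if n.endswith("summary.txt"))
        return archive.read(name).decode()
```

For a single report, `flagged_modules(read_summary_from_zip(path_to_zip))` returns the modules worth inspecting in the HTML report.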

#### MultiQC (optional)
__[MultiQC](https://multiqc.info/)__ aggregates results from bioinformatics analyses across many samples into a single report. In our case, it reads the FastQC reports and generates a compiled report for all the analyzed FASTQ files, which is saved in `Tutorial_1/multiqc`. We can also view the report in .html format:

In [None]:
# Run multiqc to summarize all the fastqc reports
! multiqc -f -p  Tutorial_1/fastqc -o Tutorial_1/multiqc

# View multiqc report
from IPython.display import IFrame
IFrame(src='Tutorial_1/multiqc/multiqc_report.html', width=1200, height=400)

#### Step 2. Adapter Trimming and Quality Filtering
Adapter sequences should be removed from reads because they interfere with downstream analyses, such as alignment of reads to a reference. In the FastQC report, the adapter content plot shows the percentage of reads (y-axis) that have an adapter starting at a particular position along the read (x-axis). If the RNA was fragmented to below the target molecule length, a high proportion of reads with adapters will be observed (right).

<img src="images/1-adapter_content.png" width=400 />

In this tutorial, we use __[Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)__ for adapter trimming and quality filtering. Use the `--fastqc` flag to run FastQC again on the trimmed reads, and the `-j` flag to set the number of cores used for trimming. This command trims adapters from the paired-end reads, saves the results in the trimmed directory, and re-runs FastQC on the trimmed data. **Note**: Don't forget to check out the FastQC reports after adapter trimming.

In [None]:
! trim_galore -j 4 --paired --illumina -o Tutorial_1/trimmed --fastqc Tutorial_1/fastqs/*.gz
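
In `--paired` mode the files are consumed as consecutive R1/R2 pairs, so it is worth confirming that every sample has both mates before trimming. This is a minimal sketch that assumes the files are named `<sample>_R1.<ext>` and `<sample>_R2.<ext>`. That naming is an assumption inferred from the `_R1_val_1.fq` names used in the alignment step, and `pair_fastqs` is an illustrative helper.

```python
import re
from collections import defaultdict


def pair_fastqs(filenames):
    """Group FASTQ file names into (R1, R2) tuples keyed by sample name."""
    mates = defaultdict(dict)
    for name in filenames:
        # Assumed naming scheme: <sample>_R1.<ext> / <sample>_R2.<ext>
        match = re.match(r"(.+)_R([12])(\..+)$", name)
        if match:
            mates[match.group(1)]["R" + match.group(2)] = name

    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"R1", "R2"}:
            raise ValueError(f"{sample}: missing mate, found only {sorted(found)}")
        pairs[sample] = (found["R1"], found["R2"])
    return pairs
```

A `ValueError` here points to an incomplete download or a naming mismatch, which is easier to diagnose before trimming than after.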

### 2.2 Read alignment
To determine where in the human genome our reads originated, we will align them to the reference genome using **STAR** (Spliced Transcripts Alignment to a Reference). [STAR](https://github.com/alexdobin/STAR?tab=readme-ov-file) is an aligner designed to address many of the challenges of RNA-seq read mapping by accounting for spliced alignments. More details about how to use STAR can be found [here](https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html).
#### Step 1. Reference Genome Preparation
For this tutorial, we are using reads that originate from a small subsection of chromosome 11, so we use only a small region of human chr11 (**chr11:1-1.5M**) as the reference genome. This subset of the reference genome should already be included with the example dataset downloaded in the earlier steps. For a full-scale alignment of a complete dataset, however, the entire human genome should be downloaded and indexed. The latest FASTA and GTF annotation files for the human genome can be obtained from [Gencode](https://www.gencodegenes.org/), with the FASTA file size being approximately 800 MB. The genome must be indexed before running the alignment.

In [None]:
# Generate Reference Genome Before using STAR
! STAR --runThreadN 4 --runMode genomeGenerate \
    --genomeDir Tutorial_1/ref_genome \
    --genomeFastaFiles Tutorial_1/chr11_1.5M.fasta \
    --sjdbGTFfile Tutorial_1/gencode.v46.pri.chr11.1.5M.gtf \
    --genomeSAindexNbases 9
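
The value `--genomeSAindexNbases 9` follows the STAR manual's recommendation for small genomes, which scales the parameter down to `min(14, log2(GenomeLength)/2 - 1)`. A small sketch of that calculation (the function name is illustrative):

```python
import math


def genome_sa_index_nbases(genome_length):
    """Suggested --genomeSAindexNbases for a genome of the given length (bp)."""
    # STAR manual: min(14, log2(GenomeLength)/2 - 1), rounded down to an integer
    return min(14, int(math.log2(genome_length) / 2 - 1))


# 1.5 Mb reference used in this tutorial
print(genome_sa_index_nbases(1_500_000))
```

For the 1.5 Mb region used here this gives 9, while a full human genome reaches the default cap of 14.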

<div class="alert alert-block alert-info"> <b>Tip:</b> To build an index for the full-scale human genome (THIS WILL TAKE HOURS), you can use a command similar to the one below. </div>
<p style="background:#EEEEEE;color:black"><code>! STAR --runThreadN 4 --runMode genomeGenerate --genomeDir ref/genome --genomeFastaFiles ref/GRCh38.primary_assembly.genome.fa --sjdbGTFfile ref/gencode.v46.primary_assembly.annotation.gtf </code></p>

Once this job has successfully finished, the genome directory (`Tutorial_1/ref_genome`) should contain the following files: 
- chrLength.txt    
- chrNameLength.txt    
- chrName.txt    
- chrStart.txt
- exonGeTrInfo.tab
- exonInfo.tab
- geneInfo.tab
- Genome
- genomeParameters.txt
- SA
- SAindex
- sjdbInfo.txt
- sjdbList.fromGTF.out.tab
- sjdbList.out.tab
- transcriptInfo.tab

#### Step 2. Aligning reads
After the genome indices are generated, we can perform the read alignment using STAR. STAR reads uncompressed FASTQ files by default, so we decompress the trimmed fastq.gz files first (alternatively, STAR's `--readFilesCommand zcat` option lets it read gzipped files directly). The output files will be written to the `Tutorial_1/STAR` directory.

In [None]:
%%bash

# Create the output directory for the alignments
mkdir -p Tutorial_1/STAR

# Decompress the trimmed fastq.gz files so STAR can read them
for i in Tutorial_1/trimmed/*.gz; do
    gzip -d "$i"
done

for i in Tutorial_1/trimmed/*_R1_val_1.fq; do
    base_name=$(basename "$i" _R1_val_1.fq)
   
   # Run STAR with the proper parameters
    STAR --runThreadN 4 \
         --genomeDir Tutorial_1/ref_genome \
         --readFilesIn "$i" "${i/_R1_val_1.fq/_R2_val_2.fq}" \
         --outFileNamePrefix Tutorial_1/STAR/"$base_name" \
         --outSAMtype BAM SortedByCoordinate \
         --quantMode TranscriptomeSAM GeneCounts
done
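
For each sample STAR writes a `<prefix>Log.final.out` file with mapping statistics, one `name | value` pair per line. The sketch below pulls those pairs into a dictionary so the uniquely mapped rate can be checked per sample. The function names are illustrative, and the parsing assumes the standard `|`-separated layout of that file.

```python
def parse_star_log(text):
    """Parse the 'name | value' lines of a STAR Log.final.out file."""
    stats = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats


def uniquely_mapped_percent(text):
    """Return the 'Uniquely mapped reads %' value as a float."""
    value = parse_star_log(text)["Uniquely mapped reads %"]
    return float(value.rstrip("%"))
```

To use it, read a log file (for example with `open(path).read()`) and pass the text to `uniquely_mapped_percent`.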

#### MultiQC (optional) 
Again, we can use MultiQC to generate a summary report of the alignments.

In [None]:
# Run multiqc to summarize all the alignment reports
! multiqc -f -p  Tutorial_1/STAR -o Tutorial_1/multiqc

# View multiqc report
from IPython.display import IFrame
IFrame(src='Tutorial_1/multiqc/multiqc_report.html', width=1200, height=400)

### 2.3 Peak Calling and annotation using HOMER
[**HOMER**](http://homer.ucsd.edu/homer/) (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for motif discovery and next-gen sequencing analysis.  It is a collection of command line programs for UNIX-style operating systems written mostly in Perl and C++, although some functionality requires additional tools to be installed as well (e.g. samtools, R, etc.). HOMER was primarily written as a de novo motif discovery algorithm and is well suited for finding 8-20 bp motifs in large scale genomics data.  HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.
#### Step 1. Convert BAM to HOMER tag directory
To facilitate the analysis of MeRIP-seq data, it is useful to first transform the sequence alignments into a platform-independent data structure representing the experiment, analogous to loading the data into a database. HOMER does this by placing all relevant information about the experiment into a "Tag Directory". During the creation of tag directories, several quality control routines are run to provide feedback about the quality of the experiment. Several important parameters used in downstream analysis, such as the estimated length of MeRIP-seq fragments, are also estimated during this phase.

Here, we combined the read alignments from the same group (tumor vs. normal, input vs. m6A-IP) together to identify peaks for each group.

In [None]:
# Convert BAM to HOMER tag directory
! makeTagDirectory Tutorial_1/homer/tagDirectory-tumor-input -format sam $(ls Tutorial_1/STAR/*ByCoord.out.bam | grep -E '3558|3559|3560|3561|3562|3563')
! makeTagDirectory Tutorial_1/homer/tagDirectory-normal-input -format sam $(ls Tutorial_1/STAR/*ByCoord.out.bam | grep -E '3564|3565|3566|3567|3568|3569|3570')
! makeTagDirectory Tutorial_1/homer/tagDirectory-tumor-m6AIP -format sam $(ls Tutorial_1/STAR/*ByCoord.out.bam | grep -E '3571|3572|3573|3574|3575|3576')
! makeTagDirectory Tutorial_1/homer/tagDirectory-normal-m6AIP -format sam $(ls Tutorial_1/STAR/*ByCoord.out.bam | grep -E '3577|3578|3579|3580|3581|3582|3583')
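
The `grep -E` patterns above select samples by the last four digits of their SRR accession. The same grouping can be written out explicitly, which makes it easier to check that each group has the expected number of replicates (six tumor, seven normal). This is a sketch: the accession ranges and the `subset_` file prefix are taken from the commands in this module, while the helper names are illustrative.

```python
# First and last SRR accession number of each group (inclusive),
# matching the grep patterns used for makeTagDirectory.
GROUP_RANGES = {
    "tumor-input": (7763558, 7763563),
    "normal-input": (7763564, 7763570),
    "tumor-m6AIP": (7763571, 7763576),
    "normal-m6AIP": (7763577, 7763583),
}


def sample_groups():
    """Return a dict mapping each group name to its list of SRR accessions."""
    return {
        group: [f"SRR{number}" for number in range(first, last + 1)]
        for group, (first, last) in GROUP_RANGES.items()
    }


def bam_paths(group, star_dir="Tutorial_1/STAR"):
    """List the STAR coordinate-sorted BAM files expected for one group."""
    return [
        f"{star_dir}/subset_{sample}Aligned.sortedByCoord.out.bam"
        for sample in sample_groups()[group]
    ]
```

For example, `" ".join(bam_paths("tumor-input"))` produces the file list passed to `makeTagDirectory` for the tumor input group.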

#### Step 2. Peak Calling
Finding peaks is one of the central goals of a MeRIP-seq experiment, and the same basic principles apply to other types of sequencing such as ChIP-Seq and DNase-Seq. The basic idea is to identify regions in the genome where we find more sequencing reads than we would expect to see by chance. There are a number of different approaches for finding peaks, and correspondingly many different methods for identifying differential peaks. We will introduce some other tools in later tutorials.

When calling the peaks, the first argument must be the tag directory (required). The <code>-style</code> option can be either "factor", "histone", or one of a number of specialized types (see [here](http://homer.ucsd.edu/homer/ngs/peaks.html) for more details). Use the <code>-i</code> option to specify a control experiment tag directory (the input libraries for MeRIP-seq).

In [None]:
# Call peaks, using input as control
! findPeaks Tutorial_1/homer/tagDirectory-tumor-m6AIP -style factor -o Tutorial_1/homer/tumor-peaks.txt -i Tutorial_1/homer/tagDirectory-tumor-input
! findPeaks Tutorial_1/homer/tagDirectory-normal-m6AIP -style factor -o Tutorial_1/homer/normal-peaks.txt -i Tutorial_1/homer/tagDirectory-normal-input

#### Step 3. Peak Merging

In [None]:
# Merge peaks across samples
! mergePeaks Tutorial_1/homer/normal-peaks.txt  Tutorial_1/homer/tumor-peaks.txt >  Tutorial_1/homer/merged_peaks.txt

#### Step 4. Peak Annotation
Use HOMER to annotate the identified peaks against the reference genome (hg38). This step provides biological context to the m6A peaks.

In [None]:
# Annotate peaks with HOMER
# Note: -noann skips the genome annotation step; remove it to obtain full annotations
! annotatePeaks.pl Tutorial_1/homer/merged_peaks.txt hg38 -noann > Tutorial_1/homer/annotated_merged_peaks.txt

### 2.4 Motif Discovery
Motifs are biologically significant nucleic acid sequence patterns that RNA methylation-related enzymes recognize and bind to regulate gene expression. The **RRACH** motif is a well-known consensus sequence associated with m6A (N6-methyladenosine) modifications in RNA (R = A or G, H = A, C or U), where the adenosine serves as the methylation site. This motif is commonly found in **3' UTRs** and near **stop codons**, playing a crucial role in post-transcriptional gene regulation by influencing RNA stability, splicing, and translation efficiency. Highly conserved across species, RRACH is a primary target for m6A methylation and dynamically regulated in response to cellular conditions. 

<img src="images/1-RRACH-motif.png" width=600 />
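
To make the consensus concrete, the RRACH pattern can be written as a regular expression, with R = A or G and H = A, C or U as defined above. The following sketch scans a sequence for all occurrences, including overlapping ones. The function name is illustrative, and converting T to U is a convenience so that DNA-style sequences can be searched too.

```python
import re

# R = A or G, H = A, C or U; the lookahead allows overlapping matches
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")


def find_rrach(sequence):
    """Return (start, motif) tuples for every RRACH occurrence (0-based start)."""
    rna = sequence.upper().replace("T", "U")
    return [(match.start(), match.group(1)) for match in RRACH.finditer(rna)]


# The methylated adenosine is the third base of the motif (start + 2)
for start, motif in find_rrach("ccGGACUaa"):
    print(start, motif, "methylated A at index", start + 2)
```

This is only a sequence scan; it does not replace motif enrichment analysis, which compares motif frequency in peaks against a background.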

In **MeRIP-seq** studies, RRACH motifs are often enriched within m6A peaks, making them key markers for understanding the functional impact of m6A modifications. To identify whether m6A peaks contain the RRACH motif, tools like **HOMER** can be used for motif discovery in peak regions, providing insights into credible methylation sites.

In [None]:
# Find motifs within peaks
! findMotifsGenome.pl Tutorial_1/homer/merged_peaks.txt hg38 Tutorial_1/homer/motif_output/ -rna -len 5,6,7

#### Visualization of identified motifs

In [None]:
from IPython.display import IFrame
IFrame('Tutorial_1/homer/motif_output/homerResults.html', width=800, height=400)

---
## **3. Data Visualization with IGV**
The Integrative Genomics Viewer (IGV) is an interactive tool for the visual exploration of genomic data. It supports flexible integration of all the common types of genomic data and metadata, and many different file formats, such as .bam, .bed, GFF/GTF, and .fasta. For a full list of the file formats IGV supports, please visit https://software.broadinstitute.org/software/igv/FileFormats.

IGV can be downloaded as a desktop application, and it also has a JavaScript version (igv.js) that can be embedded in web apps. The igv-notebook package we use in this tutorial is a Python package that wraps igv.js for embedding it in an IPython notebook.

#### Basic usage
+ Select reference genome - IGV hosts dozens of genomes and you can load other genomes too
+ Load data tracks
+ Navigate
    - Zoom in/out - from whole genome view to base pair resolution
    - Scroll/pan - view neighboring regions
    - Jump to locus - enter coordinates or name
    
#### Install igv-notebook
The Python package <code>igv-notebook</code> needs to be installed with pip: <code>pip install igv-notebook</code>. It should already be installed in the installation steps in this module.

#### Initialize IGV
Create a browser "b", showing the human reference genome hg38 at a locus on chromosome 11. You can change the settings in the browser interactively. The output should look like this:

<img src="images/1-igv1.png" width=800 />

In [None]:
import igv_notebook
igv_notebook.init()

b = igv_notebook.Browser(
    {
        "genome": "hg38",
        "locus": "chr11:523,498-543,502"
    }
)

#### Load data tracks
IGV displays data in horizontal rows called **tracks**. Typically, each track represents one sample or experiment. Track names are listed in the far-left panel. Legibility of the names depends on the height of the tracks, i.e., the smaller the track the less legible the name. There are different types of tracks (different file formats) that IGV can display:  
- **Data tracks** display numeric values, such as the methylation levels in our tutorial
- **Feature tracks** identify genomic features. For an example, see the Refseq Genes track, which IGV loads when you select a genome.
- **Alignment tracks** display read alignments 

### 3.1 View Alignments
#### 1. BAM files and indexing  

IGV requires that both SAM and BAM files be **sorted** by position and **indexed**, and that the index files follow a specific naming convention. Specifically, a BAM index file should be named by appending `.bai` to the BAM file name. 

To view the alignment files generated by STAR in this tutorial, we first sort the alignment files (.bam) and then generate the index (.bai) files (the .bam files are in the `Tutorial_1/STAR` directory):

In [None]:
%%bash
# Loop through all .bam files in the STAR directory
for bam_file in Tutorial_1/STAR/*Aligned.sortedByCoord.out.bam; do
    base_name=$(basename "$bam_file" Aligned.sortedByCoord.out.bam)
    #base_name=$(basename "$bam_file" Aligned.toTranscriptome.out.bam)
    # Sort the BAM file
    samtools sort "$bam_file" -o "Tutorial_1/igv/${base_name}.sorted.bam"
    
    # Index the sorted BAM file
    samtools index "Tutorial_1/igv/${base_name}.sorted.bam"
done
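
Because IGV expects each index to be named `<file>.bam.bai`, a quick check for missing indexes can save a confusing load error later. A minimal sketch (the helper name is illustrative):

```python
from pathlib import Path


def missing_bam_indexes(directory):
    """Return the names of BAM files in `directory` without a matching .bai file."""
    missing = []
    for bam in sorted(Path(directory).glob("*.bam")):
        # The index must sit next to the BAM file with ".bai" appended
        if not Path(str(bam) + ".bai").exists():
            missing.append(bam.name)
    return missing
```

After the loop above has finished, `missing_bam_indexes("Tutorial_1/igv")` should return an empty list.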

#### 2. Alignment coverage and bigWig files
The **bigWig** format is an indexed binary format useful for displaying dense, continuous data in genome browsers such as the UCSC Genome Browser and IGV. Using it avoids loading the much larger BAM files for visualization, which is slower and can cause memory issues. The coverage values in a bigWig file can also be normalized so that coverage can be compared across multiple samples, which is not possible with raw BAM files. The bigWig format is also supported by various bioinformatics tools for downstream processing such as meta-profile plotting.

<img src="images/1-bigWig.png" width=600 />

Image source: [deepTools documentation](https://deeptools.readthedocs.io/en/latest/content/tools/bamCoverage.html)

In [None]:
%%bash
# Loop through all BAM files and convert them to BigWig
for bam_file in Tutorial_1/igv/*.bam; do
    # Define the output BigWig filename
    bw_file="${bam_file%.sorted.bam}.bw"  # Change extension from .sorted.bam to .bw

    # Run bamCoverage with CPM normalization and a bin size of 10 bp
    bamCoverage -b "$bam_file" -o "$bw_file" --normalizeUsing CPM --binSize 10
done
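
CPM (counts per million) normalization divides the number of reads in each bin by the total number of mapped reads in millions, so that samples sequenced to different depths can be compared. A small worked example, with made-up read counts chosen only for illustration:

```python
def cpm(reads_in_bin, total_mapped_reads):
    """Counts per million: reads in a bin scaled by library size in millions."""
    return reads_in_bin / (total_mapped_reads / 1_000_000)


# Two hypothetical samples with the same raw count in a bin but different depths
shallow = cpm(50, 2_000_000)   # 2 million mapped reads
deep = cpm(50, 10_000_000)     # 10 million mapped reads
print(shallow, deep)
```

The same raw count yields a higher CPM in the shallower library, which is why normalized tracks are preferable when comparing samples side by side.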

#### 3. Load and view the alignment (BAM) and coverage (bigWig) files in igv-notebook
Start a new browser "b2", load into the browser: 
+ reference genome
+ the alignment .bam and its index file .bai
+ bigWig coverage file 

In this example, only data from one sample (SRR7763558, an input library from the tumor samples) is used to show the visualization of the alignment and its coverage.

In [None]:
import igv_notebook
igv_notebook.init()
b2 = igv_notebook.Browser(
    {
        "genome": "hg38",
        "locus": "chr11:531,000-536,000"
    }
)
b2.load_track({
    "name": ".bam",
    "path": "Tutorial_1/igv/subset_SRR7763558.sorted.bam",
    "indexPath": "Tutorial_1/igv/subset_SRR7763558.sorted.bam.bai",
    "format": "bam",
    "type": "alignment",
    "height": 100
})
b2.load_track({
    "name": ".bigWig",
    "path": "Tutorial_1/igv/subset_SRR7763558.bw",
    "format": "bigWig"
})


### 3.2 View identified m6A peaks
#### BED files
Create .bed files for the peaks identified using HOMER:

In [None]:
# Convert the HOMER peak files to BED format (chrom, start, end, peak ID, score, strand)
! awk '!/^#/ {print $2 "\t" $3 "\t" $4 "\t" $1 "\t" $8 "\t" $5}' Tutorial_1/homer/normal-peaks.txt > Tutorial_1/igv/normal-peaks.bed
! awk '!/^#/ {print $2 "\t" $3 "\t" $4 "\t" $1 "\t" $8 "\t" $5}' Tutorial_1/homer/tumor-peaks.txt > Tutorial_1/igv/tumor-peaks.bed
! head Tutorial_1/igv/normal-peaks.bed
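
The `awk` one-liner reorders the HOMER peak columns (peak ID, chromosome, start, end, strand, ..., score in column 8) into BED order. The same conversion in Python may be easier to read and adapt. This is a sketch that mirrors the column choice above, with an illustrative function name; it assumes tab-separated HOMER output with `#` comment lines.

```python
def homer_peaks_to_bed(lines):
    """Convert HOMER peak lines to BED lines: chrom, start, end, ID, score, strand."""
    bed = []
    for line in lines:
        # Skip the commented header block and empty lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        peak_id, chrom, start, end, strand = fields[0:5]
        score = fields[7]  # column 8, as used in the awk command
        bed.append("\t".join([chrom, start, end, peak_id, score, strand]))
    return bed
```

Applied to the lines of `Tutorial_1/homer/normal-peaks.txt`, it should give the same rows as `normal-peaks.bed`.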

#### bedGraph files
Merge the bigWig files of each group into a single bedGraph file using <code>bigWigMerge</code>, which adds the signal values of its input files together. Each merged file can then be visualized in IGV as a single track, simplifying the comparison between input and IP (immunoprecipitated) samples.

In [None]:
%%bash
cd Tutorial_1/igv
bigWigMerge subset_SRR7763558.bw subset_SRR7763559.bw subset_SRR7763560.bw subset_SRR7763561.bw subset_SRR7763562.bw subset_SRR7763563.bw tumor-input.bg
bigWigMerge subset_SRR7763564.bw subset_SRR7763565.bw subset_SRR7763566.bw subset_SRR7763567.bw subset_SRR7763568.bw subset_SRR7763569.bw subset_SRR7763570.bw normal-input.bg
bigWigMerge subset_SRR7763571.bw subset_SRR7763572.bw subset_SRR7763573.bw subset_SRR7763574.bw subset_SRR7763575.bw subset_SRR7763576.bw tumor-m6A-IP.bg
bigWigMerge subset_SRR7763577.bw subset_SRR7763578.bw subset_SRR7763579.bw subset_SRR7763580.bw subset_SRR7763581.bw subset_SRR7763582.bw subset_SRR7763583.bw normal-m6A-IP.bg
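
Since `bigWigMerge` adds signal values together, a group with seven replicates will tend to show a higher merged signal than a group with six. If per-replicate averages are preferred for comparing tumor and normal tracks, the merged values can be divided by the number of replicates. The following is an optional sketch, not part of the original workflow, and the function name is illustrative.

```python
def scale_bedgraph(lines, n_replicates):
    """Divide the value column of bedGraph lines by the number of replicates."""
    scaled = []
    for line in lines:
        # Skip empty lines and any track or comment lines
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        mean_value = float(value) / n_replicates
        scaled.append(f"{chrom}\t{start}\t{end}\t{mean_value:.4f}")
    return scaled
```

For example, the lines of `tumor-input.bg` would be scaled with `n_replicates=6` and those of `normal-input.bg` with `n_replicates=7`.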

#### Load .bed and .bedGraph tracks in an IGV browser
<img src="images/1-igv2.png" width=800/>

In [None]:
import igv_notebook
igv_notebook.init()
b3 = igv_notebook.Browser(
    {
        "genome": "hg38",
        "locus": "chr11:531,000-536,000"
    }
)
b3.load_track({
    "name": "Tumor input",
    "path": "Tutorial_1/igv/tumor-input.bg",
    "format": "bedGraph",
    "color": "blue"
})
b3.load_track({
    "name": "Tumor m6A-IP",
    "path": "Tutorial_1/igv/tumor-m6A-IP.bg",
    "format": "bedGraph",
    "color": "red"
})
b3.load_track({
    "name": "Normal input",
    "path": "Tutorial_1/igv/normal-input.bg",
    "format": "bedGraph",
    "color": "blue"
})
b3.load_track({
    "name": "Normal m6A-IP",
    "path": "Tutorial_1/igv/normal-m6A-IP.bg",
    "format": "bedGraph",
    "color": "red"
})
b3.load_track({
    "name": "tumor-peaks.bed",
    "path": "Tutorial_1/igv/tumor-peaks.bed",
    "format": "bed",
    "color": "black",
    "height": 40
})
b3.load_track({
    "name": "normal-peaks.bed",
    "path": "Tutorial_1/igv/normal-peaks.bed",
    "format": "bed",
    "color": "darkgreen",
    "height": 40
})
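
The six `load_track` calls above differ only in name, file, format, and color, so the track configurations can also be generated from a small table. This sketch builds the same dictionaries; the helper name `build_track_configs` is illustrative, and loading them still relies on the `b3.load_track` call shown above.

```python
# (display name, file name, format, color) for each track loaded above
TRACKS = [
    ("Tumor input", "tumor-input.bg", "bedGraph", "blue"),
    ("Tumor m6A-IP", "tumor-m6A-IP.bg", "bedGraph", "red"),
    ("Normal input", "normal-input.bg", "bedGraph", "blue"),
    ("Normal m6A-IP", "normal-m6A-IP.bg", "bedGraph", "red"),
    ("tumor-peaks.bed", "tumor-peaks.bed", "bed", "black"),
    ("normal-peaks.bed", "normal-peaks.bed", "bed", "darkgreen"),
]


def build_track_configs(igv_dir="Tutorial_1/igv"):
    """Return the list of track configuration dicts for the IGV browser."""
    configs = []
    for name, filename, fmt, color in TRACKS:
        config = {
            "name": name,
            "path": f"{igv_dir}/{filename}",
            "format": fmt,
            "color": color,
        }
        if fmt == "bed":
            config["height"] = 40  # peak tracks are drawn as thin rows
        configs.append(config)
    return configs
```

With a browser `b3` created as above, the tracks can then be loaded in a loop: `for config in build_track_configs(): b3.load_track(config)`.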


## Conclusion
In this module, we covered the following key concepts and workflows:
+ **MeRIP-seq Data Preprocessing**: Downloading the dataset, setting up directories, quality control with FastQC, adapter trimming, and aligning reads using STAR.
+ **Peak Calling and Annotation**: Using HOMER for peak calling and motif discovery, helping us identify regions enriched in m6A modifications.
+ **Data Visualization**: Using IGV for exploring alignment files, coverage, and peaks, providing a comprehensive view of the MeRIP-seq data.
+ **BigWig and BAM Manipulations**: Converting alignment files and coverage tracks into formats suitable for visualization, merging replicates for more streamlined analysis.

By following these steps, you now have a full understanding of the MeRIP-seq data preprocessing pipeline and can apply similar workflows to your own datasets for robust RNA methylation analysis.

## Clean up
A reminder to shut down the VM and delete any relevant resources. <br><br>

<br>