RNA Sequencing Pipeline

A Nextflow pipeline to perform quality control, alignment, and quantification of RNA sequencing data.

The pipeline was created to run on the ETH Euler cluster and relies on the server's Lmod environment modules and genome files. It therefore needs to be adapted before it can run on a different HPC cluster.

Pipeline steps

Required parameters

Path to the input FASTQ files, given as a glob pattern inside the folder where they are located. --input

--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz

Output directory where the files will be saved. --outdir

--outdir /cluster/work/nme/data/josousa/project
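A minimal launch sketch combining the two required parameters. It assumes the pipeline is started with nextflow run main.nf from the repository root, and that the glob is quoted so the shell passes it on unexpanded; both are assumptions, not documented behaviour. The command is only printed, not executed.

```shell
#!/bin/sh
# Paths copied from the examples above; adjust them to your project.
INPUT='/cluster/work/nme/data/josousa/project/fastq/*fastq.gz'
OUTDIR='/cluster/work/nme/data/josousa/project'

# Assemble the command as a string so it can be inspected before launching.
# The glob stays inside single quotes so the shell would not expand it.
CMD="nextflow run main.nf --input '$INPUT' --outdir '$OUTDIR'"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into a job script on the cluster.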

Input optional parameters

Option to force the pipeline to treat the input as single-end. --single_end

By default, the pipeline detects whether the input files are single-end or paired-end.

Option to select the RNA-Seq library strandedness. --strandness

--strandness 'smartseq2' # Default (same as 'unstranded')
--strandness 'forward'
--strandness 'reverse'
--strandness 'unstranded'

This option will only affect quantification.
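Since quantification is done with featureCounts, the strandness values plausibly correspond to its -s option (0 unstranded, 1 stranded, 2 reversely stranded). The sketch below illustrates that correspondence; the function name is hypothetical, and whether the pipeline performs exactly this mapping internally is an assumption.

```shell
# strandness_to_s VALUE: print the featureCounts -s value for a strandness setting.
# Assumed mapping; 'smartseq2' is treated like 'unstranded', as stated above.
strandness_to_s() {
    case "$1" in
        smartseq2|unstranded) echo 0 ;;
        forward)              echo 1 ;;
        reverse)              echo 2 ;;
        *) echo "unknown strandness: $1" >&2; return 1 ;;
    esac
}

# Example: look up the value for a reverse-stranded library.
strandness_to_s reverse
```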

Genomes

Reference genome used for alignment. --genome

Available genomes:

    GRCm39 # Default
    GRCm38
    GRCh38
    GRCh37 
    panTro6
    CHIMP2.1.4
    BDGP6
    susScr11
    Rnor_6.0
    R64-1-1
    TAIR10
    WBcel235
    E_coli_K_12_DH10B
    E_coli_K_12_MG1655
    Vectors
    Lambda
    PhiX
    Mitochondria

Option to use a custom genome for alignment by providing an absolute path to a custom genome file.

--custom_genome_file '/cluster/work/nme/data/josousa/project/genome/CHM13.genome'

Example of a genome file:

name           GRCm39                                                                      
species        Mouse                                                                       
fasta          ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/WholeGenomeFasta/           
bismark        ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BismarkIndex/               
bowtie         ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BowtieIndex/genome          
bowtie2        ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Bowtie2Index/genome         
star           ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/            
bwa            ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BWAIndex/genome             
hisat2         ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/genome          
hisat2_splices ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/splice_sites.txt
gtf            ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Annotation/Genes/genes.gtf
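The genome file shown above is a plain two-column table: a key, whitespace, then a value. The sketch below reads one value from such a file. It assumes that ${GENOMES} stands for a base directory held in an environment variable of the same name; the function name and the demo root directory are hypothetical.

```shell
# genome_get FILE KEY: print the value stored under KEY in a genome file,
# replacing the literal text ${GENOMES} with the content of $GENOMES.
# Caveat: $GENOMES must not contain '|' or '&', which sed treats specially.
genome_get() {
    while read -r key value; do
        if [ "$key" = "$2" ]; then
            printf '%s\n' "$value" | sed 's|\${GENOMES}|'"$GENOMES"'|g'
            return 0
        fi
    done < "$1"
    return 1   # key not found
}

# Demo with a hypothetical root directory and a two-line genome file.
GENOMES='/path/to/genomes'
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
name           GRCm39
star           ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/
EOF
genome_get "$demo_file" star
rm -f "$demo_file"
```

A lookup like this is a quick way to check that every path in a custom genome file resolves before passing the file to --custom_genome_file.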

Aligner options

Option to choose the aligner.

--aligner 'star' # Default
--aligner 'hisat2'

HISAT2 parameters

Option to disallow soft-clipping. --hisat2_no_softclip Default: true
Option to suppress unpaired alignments for paired reads. --hisat2_no_mixed Default: true
Option to suppress discordant alignments for paired reads. --hisat2_no_discordant Default: true
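The three switches above match the HISAT2 options --no-softclip, --no-mixed and --no-discordant. The sketch below shows how boolean settings could be turned into those flags; the function name is hypothetical, and the way the pipeline actually assembles its HISAT2 command is an assumption.

```shell
# hisat2_flags NO_SOFTCLIP NO_MIXED NO_DISCORDANT
# Each argument is 'true' or 'false'; prints the matching HISAT2 flags.
hisat2_flags() {
    flags=''
    if [ "$1" = true ]; then flags="$flags --no-softclip"; fi
    if [ "$2" = true ]; then flags="$flags --no-mixed"; fi
    if [ "$3" = true ]; then flags="$flags --no-discordant"; fi
    # Drop the leading space left by the first appended flag.
    printf '%s\n' "${flags# }"
}

# With the documented defaults (all true), every flag is emitted.
hisat2_flags true true true
```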

FastQ Screen optional parameters

Option to provide a custom FastQ Screen config file.

--fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default

Option to pass the flag --bisulfite to FastQ Screen. --bisulfite Default: false

featureCounts optional parameters

Option to only count read pairs that have both ends aligned. --featurecounts_B_flag Default: true
Option to not count read pairs that have their two ends mapping to different chromosomes, or to the same chromosome but on different strands. --featurecounts_C_flag Default: true
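Both switches correspond to the featureCounts paired-end filters -B and -C. The sketch below prints what a resulting call might look like. The function name and the file names genes.gtf, counts.txt and sample.bam are hypothetical, and -p is included on the assumption that the input is paired-end; newer featureCounts releases may need additional options to count fragments, which is not covered here.

```shell
# featurecounts_pair_flags B_FLAG C_FLAG
# Each argument is 'true' or 'false'; prints the paired-end options to pass on.
featurecounts_pair_flags() {
    flags='-p'
    if [ "$1" = true ]; then flags="$flags -B"; fi
    if [ "$2" = true ]; then flags="$flags -C"; fi
    printf '%s\n' "$flags"
}

# Dry run: show the call for the documented defaults (both true).
echo "featureCounts $(featurecounts_pair_flags true true) -a genes.gtf -o counts.txt sample.bam"
```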

Skipping options

Option to skip FastQC, TrimGalore, and FastQ Screen. The first step of the pipeline will then be the alignment (STAR or HISAT2). --skip_qc
Option to skip FastQ Screen. --skip_fastq_screen
Option to skip quantification. --skip_quantification

Extra arguments

Option to add extra arguments to FastQC. --fastqc_args
Option to add extra arguments to FastQ Screen. --fastq_screen_args
Option to add extra arguments to Trim Galore. --trim_galore_args
Option to add extra arguments to the STAR aligner. --star_align_args
Option to add extra arguments to the HISAT2 aligner. --hisat2_align_args
Option to add extra arguments to Samtools sort. --samtools_sort_args
Option to add extra arguments to Samtools index. --samtools_index_args
Option to add extra arguments to featureCounts. --featurecounts_args
Option to add extra arguments to MultiQC. --multiqc_args
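A sketch of how extra arguments might be passed through. The --name="value" form is used as a precaution so that values beginning with a dash stay attached to their parameter; whether the pipeline requires this form is an assumption. The helper name is hypothetical, --nogroup is a FastQC option and --outFilterMultimapNmax is a STAR option. The required parameters are left out for brevity, and the command is only printed.

```shell
# extra_arg NAME VALUE: format one pass-through parameter as --NAME="VALUE".
extra_arg() {
    printf '%s%s="%s"' '--' "$1" "$2"
}

# Assemble a launch command with two pass-through settings.
CMD="nextflow run main.nf $(extra_arg fastqc_args '--nogroup') $(extra_arg star_align_args '--outFilterMultimapNmax 1')"

# Dry run: print the command instead of executing it.
echo "$CMD"
```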

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for their help and support.

vonMeyennLab/nf_rnaseq
