A Nextflow pipeline to perform quality control, alignment, and quantification of RNA sequencing data.

vonMeyennLab/nf_rnaseq

RNA Sequencing Pipeline

The pipeline was created to run on the ETH Euler cluster and relies on the server's Lmod environment modules and genome files. It therefore needs to be adapted before it can run on a different HPC cluster.

Pipeline steps

  1. FastQC (raw reads)
  2. FastQ Screen
  3. Trim Galore
  4. FastQC (trimmed reads)
  5. STAR (default aligner)
  6. HISAT2 (optional alternative to STAR)
  7. Samtools sort
  8. Samtools index
  9. featureCounts
  10. MultiQC

Required parameters

Path to the FASTQ files, including a glob pattern that matches them. --input

--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz

Output directory where the files will be saved. --outdir

--outdir /cluster/work/nme/data/josousa/project
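
Putting the two required parameters together, a minimal launch command might look like the sketch below. Launching by repository name (`nextflow run vonMeyennLab/nf_rnaseq`) and quoting the glob so that it reaches Nextflow unexpanded are assumptions, not documented behavior of this pipeline. The helper `build_cmd` is hypothetical and only prints the command so it can be inspected before running.

```shell
# Sketch: assemble and print a minimal launch command (nothing is executed).
# Assumption: the pipeline can be launched by its GitHub repository name.
build_cmd() {
    # $1 = FASTQ glob, $2 = output directory
    printf 'nextflow run vonMeyennLab/nf_rnaseq --input %s --outdir %s\n' "'$1'" "$2"
}

build_cmd '/cluster/work/nme/data/josousa/project/fastq/*fastq.gz' \
          '/cluster/work/nme/data/josousa/project'
```

On a cluster other than Euler, the environment modules and genome paths would still need to be adapted first.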

Input optional parameters

  • Option to force the pipeline to assign input as single-end. --single_end

    By default, the pipeline detects whether the input files are single-end or paired-end.

  • Option to select the RNA-Seq library strandedness. --strandness

    --strandness 'smartseq2' # Default (same as 'unstranded')
    --strandness 'forward'
    --strandness 'reverse'
    --strandness 'unstranded'

    This option only affects quantification.
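
Because strandedness only matters at the quantification step, one way to picture it is as a translation into featureCounts' `-s` option (0 = unstranded, 1 = stranded, 2 = reversely stranded). The sketch below is an illustration under that assumption; the function name `strand_flag` is hypothetical and the pipeline's actual internal mapping may differ.

```shell
# Sketch: translate a --strandness value into featureCounts' -s option.
# Assumption: the pipeline performs a mapping along these lines internally.
strand_flag() {
    case "$1" in
        smartseq2|unstranded) echo '-s 0' ;;  # unstranded counting
        forward)              echo '-s 1' ;;  # stranded counting
        reverse)              echo '-s 2' ;;  # reversely stranded counting
        *) echo "unknown strandness: $1" >&2; return 1 ;;
    esac
}

strand_flag reverse
```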

Genomes

  • Reference genome used for alignment. --genome

    Available genomes:

        GRCm39 # Default
        GRCm38
        GRCh38
        GRCh37 
        panTro6
        CHIMP2.1.4
        BDGP6
        susScr11
        Rnor_6.0
        R64-1-1
        TAIR10
        WBcel235
        E_coli_K_12_DH10B
        E_coli_K_12_MG1655
        Vectors
        Lambda
        PhiX
        Mitochondria

  • Option to use a custom genome for alignment by providing an absolute path to a custom genome file.

    --custom_genome_file '/cluster/work/nme/data/josousa/project/genome/CHM13.genome'

    Example of a genome file:

    name           GRCm39                                                                      
    species        Mouse                                                                       
    fasta          ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/WholeGenomeFasta/           
    bismark        ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BismarkIndex/               
    bowtie         ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BowtieIndex/genome          
    bowtie2        ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Bowtie2Index/genome         
    star           ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/STARIndex/            
    bwa            ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/BWAIndex/genome             
    hisat2         ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/genome          
    hisat2_splices ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Sequence/Hisat2Index/splice_sites.txt
    gtf            ${GENOMES}/Mus_musculus/Ensembl/GRCm39/Annotation/Genes/genes.gtf           
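
The genome file is a plain two-column table of keys and paths, so a single field can be looked up with standard tools. The following sketch writes a hypothetical custom genome file (placeholder paths, only a few keys shown for brevity) and reads the STAR index path back from it. The helper `genome_field` is an illustration, not part of the pipeline, and the set of keys the pipeline actually requires is not stated here.

```shell
# Sketch: read one field from a whitespace-separated genome file.
# Assumption: each line is "<key> <value>", as in the example above.
genome_field() {
    # $1 = genome file, $2 = key (for example: star, hisat2, gtf)
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical custom genome file with placeholder paths.
genome_file=$(mktemp)
cat > "$genome_file" <<'EOF'
name     CHM13
species  Human
fasta    /path/to/CHM13/WholeGenomeFasta/
star     /path/to/CHM13/STARIndex/
gtf      /path/to/CHM13/genes.gtf
EOF

genome_field "$genome_file" star
```

A lookup like this is a quick way to check that a hand-written genome file contains the entry the chosen aligner needs before starting a run.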

Aligner options

  • Option to choose the aligner.
    --aligner 'star' # Default
    --aligner 'hisat2'

HISAT2 parameters

  • Option to disable soft-clipping. --hisat2_no_softclip Default: true

  • Option to suppress unpaired alignments for paired reads. --hisat2_no_mixed Default: true

  • Option to suppress discordant alignments for paired reads. --hisat2_no_discordant Default: true
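
These three parameters correspond to the HISAT2 options `--no-softclip`, `--no-mixed`, and `--no-discordant`. The sketch below shows how boolean settings could be turned into a flag string. The function `hisat2_flags` is hypothetical, and treating the literal value `true` as "enabled" is an assumption about how the pipeline interprets these parameters.

```shell
# Sketch: turn the three boolean parameters into HISAT2 command-line flags.
# Assumption: "true" enables a flag and any other value omits it.
hisat2_flags() {
    # $1 = no_softclip, $2 = no_mixed, $3 = no_discordant
    flags=''
    [ "$1" = true ] && flags="$flags --no-softclip"
    [ "$2" = true ] && flags="$flags --no-mixed"
    [ "$3" = true ] && flags="$flags --no-discordant"
    echo "${flags# }"   # strip the leading space
}

# With the documented defaults, all three flags are passed.
hisat2_flags true true true
```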

FastQ Screen optional parameters

  • Option to provide a custom FastQ Screen config file.

    --fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default

  • Option to pass the --bisulfite flag to FastQ Screen. --bisulfite Default: false

featureCounts optional parameters

  • Option to only count read pairs that have both ends aligned. --featurecounts_B_flag Default: true

  • Option to not count read pairs whose two ends map to different chromosomes, or to the same chromosome but on different strands. --featurecounts_C_flag Default: true

Skipping options

  • Option to skip FastQC, Trim Galore, and FastQ Screen. The first step of the pipeline will then be the alignment (STAR or HISAT2). --skip_qc

  • Option to skip FastQ Screen. --skip_fastq_screen

  • Option to skip quantification. --skip_quantification

Extra arguments

  • Option to add extra arguments to FastQC. --fastqc_args

  • Option to add extra arguments to FastQ Screen. --fastq_screen_args

  • Option to add extra arguments to Trim Galore. --trim_galore_args

  • Option to add extra arguments to the STAR aligner. --star_align_args

  • Option to add extra arguments to the HISAT2 aligner. --hisat2_align_args

  • Option to add extra arguments to Samtools sort. --samtools_sort_args

  • Option to add extra arguments to Samtools index. --samtools_index_args

  • Option to add extra arguments to featureCounts. --featurecounts_args

  • Option to add extra arguments to MultiQC. --multiqc_args
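
When an extra-arguments value contains spaces, it has to be quoted so that the shell passes it as a single argument. The sketch below demonstrates the difference with a small hypothetical helper, `count_args`. The Trim Galore options `--clip_R1` and `--three_prime_clip_R1` are used only as example values, and the exact format the pipeline expects for these strings is an assumption.

```shell
# Sketch: show that a quoted multi-word value stays one argument.
count_args() { echo "$#"; }

# Quoted: the parameter name plus one value, so 2 arguments.
count_args --trim_galore_args '--clip_R1 3 --three_prime_clip_R1 3'

# Unquoted: the shell splits the value into separate words, so 5 arguments.
count_args --trim_galore_args --clip_R1 3 --three_prime_clip_R1 3
```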

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for all their help and support.
