Material and methods

Genome DNA sequences and annotations were downloaded from Ensembl. Pyfaidx [1] was used to filter out non-canonical chromosomes. AGAT [2] was used to correct common issues found in Ensembl genome annotation files, filter out non-canonical chromosomes, and remove transcripts with a transcript support level (TSL) equal to NA. Samtools [3] and Picard [4] were used to index genome sequences.
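The two filtering steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the pipeline's actual implementation (which relies on Pyfaidx and AGAT). The canonical chromosome set assumes human Ensembl naming (1-22, X, Y, MT), and the function names `filter_fasta` and `filter_gtf` are hypothetical. The GTF filter assumes the Ensembl `transcript_support_level` attribute and only drops lines that carry it, so gene-level lines are left untouched.

```python
import re

# Assumption: human Ensembl chromosome names; adjust for other species.
CANONICAL = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

# Assumption: Ensembl GTF files store TSL as: transcript_support_level "NA"
TSL_NA = re.compile(r'transcript_support_level "NA')


def filter_fasta(lines, keep=CANONICAL):
    """Yield only the FASTA lines of sequences whose name is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited header token.
            tokens = line[1:].split()
            keeping = bool(tokens) and tokens[0] in keep
        if keeping:
            yield line


def filter_gtf(lines, keep=CANONICAL):
    """Yield GTF lines on canonical chromosomes whose TSL is not NA."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] not in keep:
            continue  # feature located on a non-canonical sequence
        if len(fields) >= 9 and TSL_NA.search(fields[8]):
            continue  # feature belongs to a transcript with TSL equal to NA
        yield line
```

Both functions are generators over lines, so they can be applied to open file handles without loading a whole genome or annotation into memory.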

Raw FASTQ file quality was assessed with FastQC [5]. Raw FASTQ files were trimmed using Fastp [6]. Cleaned reads were aligned to the indexed Ensembl genome with Bowtie2 [7]. Sambamba [8] was used to sort, filter, mark duplicates, and compress aligned reads. Quality controls were performed on cleaned, sorted, deduplicated aligned reads using Picard [4] and Samtools [3]. Additional quality assessments were done with RSeQC [9], NGSderive [10], and GOleft [11]. Quality reports produced during both trimming and mapping steps were aggregated with MultiQC [12].
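To make the duplicate-marking step more concrete, the sketch below shows the general idea in a deliberately simplified form. It is an illustration under stated assumptions, not Sambamba's actual algorithm. Reads are reduced to a hypothetical `Read` record, reads sharing the same chromosome, position, and strand are treated as one duplicate group, and the read with the highest base-quality sum is kept as the representative. Real tools also account for read pairs, soft clipping, and library information.

```python
from collections import namedtuple

# Hypothetical minimal read record; real BAM records carry many more fields.
Read = namedtuple("Read", "name chrom pos strand qual_sum")


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    `reads` must be a list (it is traversed twice). Within each
    (chrom, pos, strand) group, the read with the highest quality sum is
    kept; on ties, the first read encountered is kept.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.qual_sum > best[key].qual_sum:
            best[key] = read
    # Every read that is not the representative of its group is a duplicate.
    return {
        read.name
        for read in reads
        if best[(read.chrom, read.pos, read.strand)] is not read
    }
```

Flagging duplicates rather than deleting them mirrors the "mark" wording used above: downstream steps can then decide whether to ignore the flagged reads.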

The whole pipeline was powered by Snakemake [13]. This pipeline is freely available on GitHub; details about installation, usage, and results can be found on the Snakemake workflow page.

[1]	Shirley, Matthew D., et al. "Efficient 'pythonic' access to FASTA files using pyfaidx." No. e1196. PeerJ PrePrints, 2015.

[2]	Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. (Version v0.7.0). Zenodo. https://www.doi.org/10.5281/zenodo.3552717

[3]	Li, Heng, et al. "The sequence alignment/map format and SAMtools." Bioinformatics 25.16 (2009): 2078-2079.

[4]	McKenna, Aaron, et al. "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data." Genome research 20.9 (2010): 1297-1303.

[5]	Andrews, S. "FastQC: a quality control tool for high throughput sequence data." (2010).

[6]	Chen, Shifu, et al. "fastp: an ultra-fast all-in-one FASTQ preprocessor." Bioinformatics 34.17 (2018): i884-i890.

[7]	Langmead, Ben, and Steven L. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature methods 9.4 (2012): 357-359.

[8]	Tarasov, Artem, et al. "Sambamba: fast processing of NGS alignment formats." Bioinformatics 31.12 (2015): 2032-2034.

[9]	Wang, Liguo, Shengqin Wang, and Wei Li. "RSeQC: quality control of RNA-seq experiments." Bioinformatics 28.16 (2012): 2184-2185.

[10]	McLeod, Clay, et al. "St. Jude Cloud: a pediatric cancer genomic data-sharing ecosystem." Cancer discovery 11.5 (2021): 1082-1099.

[11]	Pedersen, Brent S., et al. "Indexcov: fast coverage quality control for whole-genome sequencing." Gigascience 6.11 (2017): gix090.

[12]	Ewels, Philip, et al. "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32.19 (2016): 3047-3048.

[13]	Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.

Authors:	Thibault Dayris
Version:	3.5.1 of 06/09/2024
