bambu: reference-guided transcript discovery and quantification for long read RNA-Seq data

bambu is an R package for multi-sample transcript discovery and quantification using long read RNA-Seq data. You can use bambu after read alignment to obtain expression estimates for known and novel transcripts and genes. The output from bambu can be used directly for visualisation and downstream analysis, such as differential gene expression or transcript usage.

Installation

You can install bambu from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("bambu")

General Usage

The default mode to run bambu uses a set of aligned reads (BAM files), reference genome annotations (GTF file, TxDb object, or bambuAnnotation object), and the reference genome sequence (FASTA file or BSgenome). bambu returns a SummarizedExperiment object containing the genomic coordinates of annotated and novel transcripts together with transcript expression estimates.

We highly recommend using the same annotations that were used for genome alignment. If you have a GTF file and a FASTA file, you can run bambu with the following options:

library(bambu)

test.bam <- system.file("extdata", "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam", package = "bambu")
  
fa.file <- system.file("extdata", "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", package = "bambu")

gtf.file <- system.file("extdata", "Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf", package = "bambu")

bambuAnnotations <- prepareAnnotations(gtf.file)

se <- bambu(reads = test.bam, annotations = bambuAnnotations, genome = fa.file)

Quantification of annotated transcripts and genes only (no transcript/gene discovery)

bambu(reads = test.bam, annotations = bambuAnnotations, genome = fa.file, discovery = FALSE)

Large sample numbers / limited memory

For larger sample numbers, we recommend writing the processed data to disk with rcOutDir:

bambu(reads = test.bam, rcOutDir = "./bambu/", annotations = bambuAnnotations, genome = fa.file)
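With many samples, the reads argument can take a character vector of BAM file paths. A minimal sketch, assuming all BAM files sit in one directory; the helper listBamFiles and the directory "path/to/bams" are hypothetical placeholders, not part of bambu:

```r
# Hypothetical helper: collect every .bam file in a directory as full paths.
# The pattern matches names ending in ".bam", so index files (.bam.bai) are skipped.
listBamFiles <- function(dir) {
  sort(list.files(dir, pattern = "\\.bam$", full.names = TRUE))
}

# "path/to/bams" is a placeholder; point it at your own alignment directory
# bam.files <- listBamFiles("path/to/bams")
# se <- bambu(reads = bam.files, rcOutDir = "./bambu/",
#             annotations = bambuAnnotations, genome = fa.file)
```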

Use precalculated annotation objects

You can also use precalculated annotations.

If you plan to run bambu frequently, we recommend saving the bambuAnnotations object.

The bambuAnnotation object can be calculated from a .gtf file:

annotations <- prepareAnnotations(gtf.file)

From a TxDb object:

# txdb is a TxDb object, e.g. created with GenomicFeatures::makeTxDbFromGFF(gtf.file)
annotations <- prepareAnnotations(txdb)
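One way to save and reuse the annotation object is base R's saveRDS()/readRDS(). A minimal sketch; the helper loadOrPrepareAnnotations and the .rds file name are hypothetical, and the prepare argument simply defaults to bambu's prepareAnnotations:

```r
# Hypothetical helper: reuse a cached annotation object if it exists,
# otherwise build it once with `prepare` and save it for future runs
loadOrPrepareAnnotations <- function(source, rds.file,
                                     prepare = bambu::prepareAnnotations) {
  if (file.exists(rds.file)) {
    return(readRDS(rds.file))
  }
  annotations <- prepare(source)
  saveRDS(annotations, rds.file)
  annotations
}

# annotations <- loadOrPrepareAnnotations(gtf.file, "bambuAnnotations.rds")
```

The cache is keyed only on the file name, so delete the .rds file whenever the GTF file or TxDb object changes.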

Advanced Options

More stringent filtering thresholds imposed on potential novel transcripts

Keep novel transcripts with at least 5 reads in at least one sample:

bambu(reads = reads, annotations = annotations, genome = genome, opt.discovery = list(min.readCount = 5))

Keep novel transcripts that have at least 2 reads in at least 5 samples:

bambu(reads = reads, annotations = annotations, genome = genome, opt.discovery = list(min.sampleNumber = 5))
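To make the two count thresholds concrete, the base R sketch below applies them to a toy transcript-by-sample count matrix. It illustrates the filter semantics described above and is not bambu's internal code; the helper passesCountFilter, the toy matrix, and its defaults (2 reads per sample, taken from the sentence above, and 1 sample) are assumptions:

```r
# Illustrative only: keep a transcript if at least `min.sampleNumber` samples
# have at least `min.readCount` reads
passesCountFilter <- function(counts, min.readCount = 2, min.sampleNumber = 1) {
  rowSums(counts >= min.readCount) >= min.sampleNumber
}

# Toy counts: 3 transcripts (rows) x 6 samples (columns)
counts <- matrix(c(6, 0, 0, 0, 0, 0,
                   2, 2, 3, 2, 2, 0,
                   1, 1, 1, 1, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("txA", "txB", "txC"), paste0("s", 1:6)))

passesCountFilter(counts, min.readCount = 5)     # only txA passes
passesCountFilter(counts, min.sampleNumber = 5)  # only txB passes
```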

Filter out transcripts whose relative abundance within their gene is below 10%:

bambu(reads = reads, annotations = annotations, genome = genome, opt.discovery = list(min.readFractionByGene = 0.1))
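Relative abundance within a gene is a transcript's reads divided by the total reads of its gene. A base R sketch of that calculation for a single sample, with made-up counts and gene assignments; it illustrates the threshold only and makes no claim about how bambu combines multiple samples:

```r
# Illustrative only: relative abundance of each transcript within its gene,
# computed for a single sample
tx.counts <- c(tx1 = 90, tx2 = 6, tx3 = 4, tx4 = 20)
gene.ids  <- c("geneA", "geneA", "geneA", "geneB")

# ave() returns, for each transcript, the summed counts of its gene
gene.totals  <- ave(tx.counts, gene.ids, FUN = sum)
frac.in.gene <- tx.counts / gene.totals
keep         <- frac.in.gene >= 0.1

keep  # tx1 and tx4 pass; tx2 (6%) and tx3 (4%) fall below 10%
```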

Quantification without bias correction

By default, bambu applies bias correction to the expression estimates. You can also perform the quantification without bias correction:

bambu(reads = reads, annotations = annotations, genome = genome, opt.em = list(bias = FALSE))

Parallel computation

bambu supports parallel computation; set ncore to the number of cores to use:

bambu(reads = reads, annotations = annotations, genome = genome, ncore = 8)
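The right value for ncore depends on the machine. A small sketch using base R's parallel package to pick it at run time; leaving one core free is a common convention, not a bambu recommendation:

```r
# Use all but one of the available cores; detectCores() can return NA,
# in which case fall back to a single core
n.cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# se <- bambu(reads = test.bam, annotations = bambuAnnotations,
#             genome = fa.file, ncore = n.cores)
```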

See the manual for details on customizing other options.

Complementary functions

Transcript expression to gene expression

transcriptToGeneExpression(se)
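Conceptually, gene-level counts aggregate the counts of all transcripts that belong to a gene. A base R sketch of that aggregation on a toy matrix using rowsum(); it illustrates the idea only and is not the code behind transcriptToGeneExpression, which may handle additional details:

```r
# Toy transcript-by-sample counts and a transcript-to-gene mapping
tx.counts <- matrix(c(10, 4,
                       5, 1,
                       7, 9),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("tx1", "tx2", "tx3"),
                                    c("sample1", "sample2")))
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB")

# rowsum() adds up the rows that share the same gene label
gene.counts <- rowsum(tx.counts, group = tx2gene[rownames(tx.counts)])
gene.counts
```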

Visualization

You can visualize the novel genes/transcripts using the plotBambu function:

plotBambu(se, type = "annotation", gene_id = gene_id)

plotBambu(se, type = "annotation", transcript_id = transcript_id)

plotBambu can also be used to visualize the clustering of input samples based on gene/transcript expression:

plotBambu(se, type = "heatmap") # heatmap 

plotBambu(se, type = "pca") # PCA visualization

plotBambu can also be used to visualize the clustering of input samples based on gene/transcript expression with a grouping variable:

plotBambu(se, type = "heatmap", group.var) # heatmap 

plotBambu(se, type = "pca", group.var) # PCA visualization

Write bambu outputs to files

writeBambuOutput generates three files: a .gtf file with the extended annotations and two .txt files with the expression counts at the transcript and gene levels.

writeBambuOutput(se, path = "./bambu/")

Release History

bambu version 1.0.2

Release date: 2020-11-10

bug fix for author name display
bug fix for calling fasta file and bam file from ExperimentHub
update NEWS file

bambu version 1.0.0

Release date: 2020-11-06

bug fix for parallel computation to avoid bplapply

bambu version 0.99.4

Release date: 2020-08-18

remove code using seqlevelsStyle to allow customized annotations
update the requirement of R version and ExperimentHub version

bambu version 0.3.0

Release date: 2020-07-27

bambu now runs on Windows with a fasta file
update to the documentation (vignette)
prepareAnnotations now works with TxDb or gtf file
minor bug fixes

bambu version 0.2.0

Release date: 2020-06-18

bambu version 0.1.0

Release date: 2020-05-29

Citation

A manuscript describing bambu is currently in preparation. If you use bambu in your research, please cite it using the following DOI: 10.18129/B9.bioc.bambu.

Contributors

This package is developed and maintained by Ying Chen, Yuk Kei Wan, and Jonathan Goeke at the Genome Institute of Singapore. If you want to contribute, please open an issue. Thank you.
