Introduction

title

date

vignette

output

htxcomp -- a compendium of sequenced human transcriptomes

`r format(Sys.time(), '%B %d, %Y')`

%\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{htxcomp -- a compendium of sequenced human transcriptomes} %\VignetteEncoding{UTF-8}

BiocStyle::html_document

highlight	number_sections	theme	toc
pygments	true	united	true

suppressPackageStartupMessages({
suppressMessages({
library(BiocStyle)
library(htxcomp)
library(beeswarm)
library(SummarizedExperiment)
library(DT)
})
})

Introduction

Comprehensive archiving of genome-scale sequencing experiments is valuable for substantive and methodological progress in multiple domains.

The r Biocpkg("htxcomp") package provides functions for interacting with quantifications and metadata for over 180000 sequenced human transcriptomes.

Access to gene-level quantifications

r Biocpkg("BiocFileCache") is used to manage access to a modest collection of metadata about compendium contents. By default, loadHtxcomp will load the cache and establish a connection to remote HDF5 representation of quantifications. As of 26 November 2018 the gene level quantifications are obtained via an HDF Server instance run by Channing Division of Network Medicine at Brigham and Women's Hospital.

library(htxcomp)
genelev = loadHtxcomp()
genelev
assay(genelev)

Identifying single-cell RNA-seq studies

We use crude pattern-matching in the study titles to identify single cell RNA-seq experiments

sing = grep("single.cell", genelev$study_title, 
   ignore.case=TRUE)
length(sing)

Now we will determine which studies are involved. We will check out the titles of the single-cell studies to assess the specificity of this approach.

sa = genelev$study_accession[sing]
sing2 = sing[-which(duplicated(sa))]
length(sing2)
datatable(as.data.frame(colData(genelev[,sing2])),
   options=list(lengthMenu=c(3,5,10,50,100)))

Collecting bulk RNA-seq samples on a disease of interest: glioblastoma

bulk = genelev[,-sing]
kpglio = grep("glioblastoma", bulk$study_title, 
  ignore.case=TRUE)
glioGene = bulk[,kpglio]
glioGene

To acquire numerical values, as.matrix(assay()) is needed.

beeswarm(as.matrix(assay(
   glioGene["ENSG00000138413.13",1:100])), pwcol=as.numeric(factor(glioGene$study_title[1:100])), ylab="IDH1 expression")
legend(.6, 15000, legend=unique(glioGene$study_accession[1:100]),
   col=1:2, pch=c(1,1))

Access to transcript-level quantifications

By setting genesOnly to FALSE in loadHtxcomp, we get a transcript-level version of the compendium. Note that the number of samples in this version exceeds that of the gene version by two. There are two unintended columns in the underlying HDF Cloud array, with names 'X0' and 'X0.1', that should be ignored.

txlev = loadHtxcomp(genesOnly=FALSE)
txlev

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
R		R
data		data
inst/pamphlet		inst/pamphlet
man		man
vignettes		vignettes
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R

R

data

data

inst/pamphlet

inst/pamphlet

man

man

vignettes

vignettes

DESCRIPTION

DESCRIPTION

NAMESPACE

NAMESPACE

README.md

README.md

Repository files navigation

Introduction

Access to gene-level quantifications

Identifying single-cell RNA-seq studies

Collecting bulk RNA-seq samples on a disease of interest: glioblastoma

Access to transcript-level quantifications

About

Releases

Packages

Languages

vjcitn/htxcomp

Folders and files

Latest commit

History

Repository files navigation

Introduction

Access to gene-level quantifications

Identifying single-cell RNA-seq studies

Collecting bulk RNA-seq samples on a disease of interest: glioblastoma

Access to transcript-level quantifications

About

Resources

Stars

Watchers

Forks

Languages