Introduction

title

author

date

vignette

output

bibliography

brcaProteo -- proteogenomics data archive for breast cancer

Vincent J. Carey, stvjc at channing.harvard.edu

`r format(Sys.time(), '%B %d, %Y')`

%\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{} %\VignetteEncoding{UTF-8}

BiocStyle::html_document

highlight	number_sections	theme	toc
pygments	true	united	true

brcaProteo.bib

Introduction

@Mertins2016 unites information developed in the NCI CPTAC with TCGA data. The brcaProteo package reviews approaches to interrogating this data with Bioconductor.

Multiassay experiment and a sanity check

The MultiAssayExperiment instance brProteoMAE has two components that use DelayedArray assay representation. The assay data can be retrieved using assay when the environment variable CGC_BILLING is set to a valid Google Cloud Platform project id.

library(brcaProteo)
data(brProteoMAE)
brProteoMAE

The proteomic results are available for a subset of the TCGA contributions.

upsetSamples(brProteoMAE)

We can see that there are 76 samples contributing to three assays.

Intersection of samples and features across assays

We would like to line up samples and features that are common across assay types, in the simplest possible way.

Harmonizing feature annotation

We will relabel the rows of the RNA-seq data, which are given as Entrez identifiers. The other two assays use gene symbols.

We'll start out by isolating the relevant SummarizedExperiment and forcing an authentication, by querying for assay values.

library(restfulSE)
rna = experiments(brProteoMAE)[["rnaseq"]]
rna
assay(rna[1:3,1:4])

Now we use the orgDb interface to obtain gene symbols.

library(org.Hs.eg.db)
id = mapIds(org.Hs.eg.db, keys=rownames(rna), 
       column="SYMBOL", keytype="ENTREZID") 
id[is.na(id)] = rownames(rna)[is.na(id)]
rownames(rna) = id  # if no symbol, just retain entrez id

Reducing multiple instances of features

The proteomic data can have multiple feature values per gene; for this initial view we'll omit all but the first when there are multiplicities.

rd = rowData(experiments(brProteoMAE)[["itraq"]])
ok = which(!is.na(rd$geneName)&!duplicated(rd$geneName))
ph = experiments(brProteoMAE)[["itraq"]][ok,]
rownames(ph) = rowData(ph)$geneName

Using intersect*

The MultiAssayExperiment with common samples and features is then produced as follows:

slot(brProteoMAE, "ExperimentList")[["itraq"]] = ph
slot(brProteoMAE, "ExperimentList")[["rnaseq"]] = rna
brint = intersectColumns(intersectRows(brProteoMAE))

Retrieval from BigQuery

We'll materialize the assay components for RPPA and RNA-seq, taking logs of RNA-seq values:

assay(experiments(brint)[["rnaseq"]]) = 
   log(as.matrix(assay(experiments(brint)[["rnaseq"]]))+1)
assay(experiments(brint)[["rppa"]]) = 
   as.matrix(assay(experiments(brint)[["rppa"]]))
brint

Feature-specific correlation between assays

We extract a matrix for comparison of assay values over samples, focusing here on gene BCL2.

mt = sapply(experiments(brint["BCL2",]), assay) 
pairs(mt)

For all the selected, coincident features, interassay correlation is computed as follows:

fixup = function(x) data.matrix(na.omit(data.frame(x)))
corm = function(x) cor(fixup(sapply(experiments(brint[x,]), assay)))
allc = sapply(rownames(brint)[[1]], corm)
dim(allc)
summary(allc[2,]) # itraq vs rppa
summary(allc[3,]) # itraq vs rnaseq
summary(allc[8,]) # rppa vs rnaseq

The distributions of correlations can be searched to find assays that are not highly concordant across samples. The pairs plot for BRCA2 is

mt = sapply(experiments(brint["BRCA2",]), assay) 
pairs(mt)

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
R		R
data		data
man		man
vignettes		vignettes
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R

R

data

data

man

man

vignettes

vignettes

DESCRIPTION

DESCRIPTION

NAMESPACE

NAMESPACE

README.md

README.md

Repository files navigation

Introduction

Multiassay experiment and a sanity check

Intersection of samples and features across assays

Harmonizing feature annotation

Reducing multiple instances of features

Using intersect*

Retrieval from BigQuery

Feature-specific correlation between assays

References

About

Releases

Packages

Languages

vjcitn/brcaProteo

Folders and files

Latest commit

History

Repository files navigation

Introduction

Multiassay experiment and a sanity check

Intersection of samples and features across assays

Harmonizing feature annotation

Reducing multiple instances of features

Using intersect*

Retrieval from BigQuery

Feature-specific correlation between assays

References

About

Resources

Stars

Watchers

Forks

Languages