From a7bc51dea2b14bc797b2b0395282ca3a1fa6e8d9 Mon Sep 17 00:00:00 2001
From: Vince Carey
Date: Mon, 16 Sep 2013 14:04:36 -0400
Subject: [PATCH] trying direct md link

---
 ArchiveLab/archive.Rnw.md | 266 ++------------------------------------
 1 file changed, 13 insertions(+), 253 deletions(-)

diff --git a/ArchiveLab/archive.Rnw.md b/ArchiveLab/archive.Rnw.md
index b9bcf21..a8705e7 100644
--- a/ArchiveLab/archive.Rnw.md
+++ b/ArchiveLab/archive.Rnw.md
@@ -66,200 +66,43 @@ a yeast microarray probe.

 First, we obtain the reference genomic sequence for sacCer2 version of yeast.
-
-```r
+```{r lkg}
 library(BSgenome.Scerevisiae.UCSC.sacCer2)
-```
-
-```
-## Loading required package: BSgenome Loading required package: BiocGenerics
-## Loading required package: parallel
-##
-## Attaching package: 'BiocGenerics'
-##
-## The following objects are masked from 'package:parallel':
-##
-## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
-## clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,
-## parSapply, parSapplyLB
-##
-## The following object is masked from 'package:stats':
-##
-## xtabs
-##
-## The following objects are masked from 'package:base':
-##
-## anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
-## duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply,
-## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
-## rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply,
-## union, unique, unlist
-##
-## Loading required package: IRanges Loading required package: GenomicRanges
-## Loading required package: XVector Loading required package: Biostrings
-```
-
-```r
 class(Scerevisiae)
-```
-
-```
-## [1] "BSgenome"
-## attr(,"package")
-## [1] "BSgenome"
-```
-
-```r
 Scerevisiae
 ```

-```
-## Yeast genome
-## |
-## | organism: Saccharomyces cerevisiae (Yeast)
-## | provider: UCSC
-## | provider version: sacCer2
-## | release date: June 2008
-## | release name: SGD June 2008 sequence
-## |
-## | sequences (see '?seqnames'):
-## | chrI chrII chrIII chrIV chrV chrVI chrVII chrVIII
-## | chrIX chrX chrXI chrXII chrXIII chrXIV chrXV chrXVI
-## | chrM 2micron
-## |
-## | (use the '$' or '[[' operator to access a given sequence)
-```
-
-
 We focus attention on chrIV for now.
-
-```r
+```{r doc4}
 c4 = Scerevisiae$chrIV
 class(c4)
-```
-
-```
-## [1] "DNAString"
-## attr(,"package")
-## [1] "Biostrings"
-```
-
-```r
 c4
 ```

-```
-## 1531919-letter "DNAString" instance
-## seq: ACACCACACCCACACCACACCCACACACACCACA...AAACATAAAATAAAGGTAGTAAGTAGCTTTTGG
-```
-
-
 Now we obtain the probe and annotation data for the Affy yeast2 array.
-
-```r
+```{r getp}
 library(yeast2.db)
-```
-
-```
-## Loading required package: AnnotationDbi Loading required package: Biobase
-## Welcome to Bioconductor
-##
-## Vignettes contain introductory material; view with 'browseVignettes()'. To
-## cite Bioconductor, see 'citation("Biobase")', and for packages
-## 'citation("pkgname")'.
-##
-## Attaching package: 'AnnotationDbi'
-##
-## The following object is masked from 'package:BSgenome':
-##
-## species
-##
-## Loading required package: org.Sc.sgd.db Loading required package: DBI
-```
-
-```r
 library(yeast2probe)
-yeast2probe[1:3, ]
+yeast2probe[1:3,]
 ```

-```
-## sequence x y Probe.Set.Name
-## 1 GAAAGTTTCAGTGCACGTCTTCAAA 380 257 1769438_at
-## 2 GTATATTTCTAATCTTCCTCTTCAT 28 327 1769438_at
-## 3 ATATCAAACCGCGTACTTCGTGACT 188 19 1769438_at
-## Probe.Interrogation.Position Target.Strandedness
-## 1 1117 Antisense
-## 2 1170 Antisense
-## 3 1240 Antisense
-```
-
-
 We'll pick one probe set, and then the sequence of one probe.
-
-```r
-ypick = yeast2probe[yeast2probe$Probe.Set.Name == "1769311_at", ]
+```{r getps}
+ypick = yeast2probe[yeast2probe$Probe.Set.Name=="1769311_at",]
 dim(ypick)
-```
-
-```
-## [1] 11 6
-```
-
-```r
-ypick[1:3, ]
-```
-
-```
-## sequence x y Probe.Set.Name
-## 1959 ATGAGCACTATGTTTTCTGTTGGAT 486 39 1769311_at
-## 1960 GTTTTCTGTTGGATTTGGCTCATAC 154 321 1769311_at
-## 1961 TTGGCTCATACTTGGCATCTGGGAA 20 493 1769311_at
-## Probe.Interrogation.Position Target.Strandedness
-## 1959 100 Antisense
-## 1960 111 Antisense
-## 1961 125 Antisense
-```
-
-```r
+ypick[1:3,]
 a = "ATGAGCACTATGTTTTCTGTTGGAT"
 ra = reverseComplement(DNAString(a))
 ```

-
 Now we use the simple lookup of Biostrings.
-
-```r
+```{r getm}
 matchPattern(ra, c4)
-```
-
-```
-## Views on a 1531919-letter DNAString subject
-## subject: ACACCACACCCACACCACACCCACACACACCA...ACATAAAATAAAGGTAGTAAGTAGCTTTTGG
-## views:
-## start end width
-## [1] 174478 174502 25 [ATCCAACAGAAAACATAGTGCTCAT]
-```
-
-```r
 get("1769311_at", yeast2CHRLOC)
-```
-
-```
-## 4
-## -174232
-```
-
-```r
 get("1769311_at", yeast2CHRLOCEND)
 ```

-```
-## 4
-## -174588
-```
-
-
 Exercise. Discuss how to identify probes harboring SNP in the
 affy u133plus2 array.

@@ -267,50 +110,13 @@ affy u133plus2 array.

ExpressionSet and self-description

 The ExpressionSet class unifies information on a microarray experiment.
-
-```r
+```{r lkall}
 library(ALL)
 data(ALL)
 getClass(class(ALL))
-```
-
-```
-## Class "ExpressionSet" [package "Biobase"]
-##
-## Slots:
-##
-## Name: experimentData assayData phenoData
-## Class: MIAME AssayData AnnotatedDataFrame
-##
-## Name: featureData annotation protocolData
-## Class: AnnotatedDataFrame character AnnotatedDataFrame
-##
-## Name: .__classVersion__
-## Class: Versions
-##
-## Extends:
-## Class "eSet", directly
-## Class "VersionedBiobase", by class "eSet", distance 2
-## Class "Versioned", by class "eSet", distance 3
-```
-
-```r
 experimentData(ALL)
 ```

-```
-## Experiment data
-## Experimenter name: Chiaretti et al.
-## Laboratory: Department of Medical Oncology, Dana-Farber Cancer Institute, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
-## Contact information:
-## Title: Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival.
-## URL:
-## PMIDs: 14684422 16243790
-##
-## Abstract: A 187 word abstract is available. Use 'abstract' method.
-```
-
-
 A very high-level aspect of self-description is the facility for binding information on the publication of expression data to the data object itself. The abstract method gives
@@ -343,15 +149,13 @@ to the genomic location of the read or probe for interpretation. The
 A large-scale illustration of this class is in the \textit{dsQTL} package, but it is very large so I do not require that you download it.
-
-```r
+```{r lkdsq,eval=FALSE}
 library(dsQTL)
 data(DSQ_17)
 DSQ_17
 rowData(DSQ_17)[1:3]
 ```

-
 You can use \verb+example("SummarizedExperiment-class")+ to get a working minimal example.
@@ -361,57 +165,13 @@ subscripting still selects samples.
 The VCF class extends SummarizedExperiment to manage information on variant calls on a number of samples.
-
-```r
+```{r lkv}
 library(VariantAnnotation)
-```
-
-```
-## Loading required package: Rsamtools
-##
-## Attaching package: 'VariantAnnotation'
-##
-## The following object is masked from 'package:base':
-##
-## tabulate
-```
-
-```r
-fl <- system.file("extdata", "structural.vcf", package = "VariantAnnotation")
-vcf <- readVcf(fl, genome = "hg19")
+fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
+vcf <- readVcf(fl, genome="hg19")
 vcf
 ```

-```
-## class: CollapsedVCF
-## dim: 7 1
-## rowData(vcf):
-## GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
-## info(vcf):
-## DataFrame with 10 columns: BKPTID, CIEND, CIPOS, END, HOMLEN, HOMSEQ,...
-## info(header(vcf)):
-## Number Type Description
-## BKPTID . String ID of the assembled alternate allele in th...
-## CIEND 2 Integer Confidence interval around END for impreci...
-## CIPOS 2 Integer Confidence interval around POS for impreci...
-## END 1 Integer End position of the variant described in t...
-## HOMLEN . Integer Length of base pair identical micro-homolo...
-## HOMSEQ . String Sequence of base pair identical micro-homo...
-## IMPRECISE 0 Flag Imprecise structural variation
-## MEINFO 4 String Mobile element info of the form NAME,START...
-## SVLEN . Integer Difference in length between REF and ALT a...
-## SVTYPE 1 String Type of structural variant
-## geno(vcf):
-## SimpleList of length 4: GT, GQ, CN, CNQ
-## geno(header(vcf)):
-## Number Type Description
-## GT 1 String Genotype
-## GQ 1 Float Genotype quality
-## CN 1 Integer Copy number genotype for imprecise events
-## CNQ 1 Float Copy number genotype quality for imprecise...
-```
-
-

Management of information on HTS experiments