some more edits to pkgdown and GHA

thelovelab · Aug 26, 2023 · 2ac093b
1 parent 69cd5a4

Showing 4 changed files with 62 additions and 3 deletions.
diff --git a/.github/workflows/check-bioc.yml b/.github/workflows/check-bioc.yml
@@ -73,6 +73,9 @@ jobs:
           mkdir /__w/_temp/Library
           echo ".libPaths('/__w/_temp/Library')" > ~/.Rprofile
 
+      - name: work around permission issue
+        run: git config --global --add safe.directory /__w/thelovelab/tximeta
+
       ## Most of these steps are the same as the ones in
       ## https://github.com/r-lib/actions/blob/master/examples/check-standard.yaml
       ## If they update their steps, we will also need to update ours.
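
Note on the `safe.directory` step above: since Git 2.35.2, Git refuses to operate in a repository owned by a different user than the one running the command, which commonly happens when a checkout is mounted into a container job. The sketch below reproduces the same `git config` call outside of Actions and reads the setting back. The `REPO_DIR` variable and the throwaway `HOME` are illustration-only assumptions and are not part of the workflow.

```shell
#!/bin/sh
# Hypothetical stand-in for the checkout path; defaults to the path in the workflow step.
repo_dir="${REPO_DIR:-/__w/thelovelab/tximeta}"

# Use a throwaway HOME so this demo never touches a real ~/.gitconfig.
demo_home="$(mktemp -d)"
HOME="$demo_home"
export HOME
unset XDG_CONFIG_HOME

# Same command as the workflow step: mark the checkout as trusted for this user.
git config --global --add safe.directory "$repo_dir"

# Read the setting back to confirm it was recorded in the global config.
safe_dirs="$(git config --global --get-all safe.directory)"
echo "safe.directory entries: $safe_dirs"
```

Because the setting is written to the global config of the user running the job, it has to run inside the container before any step that invokes `git` on the checkout.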

diff --git a/README.md b/README.md
@@ -2,16 +2,54 @@
 
 [![R build status](https://github.com/thelovelab/tximeta/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/thelovelab/tximeta/actions/workflows/check-bioc.yml)
 
-For a reference for `tximeta`:
+# Automatic metadata for RNA-seq
+
+*tximeta* provides a set of functions for conveniently working with
+metadata for transcript quantification data in Bioconductor. The
+`tximeta()` function imports quantification data from *Salmon* or
+other quantifiers, and returns a 
+[SummarizedExperiment](https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#anatomy-of-a-summarizedexperiment)
+object.
+
+If `tximeta()` recognizes the reference transcripts used
+for quantification, it will automatically download relevant
+information about the location of the transcripts in the correct genome.
+*These actions happen in the background without requiring any extra
+effort or information from the user.*
+
+This metadata is attached to the *SummarizedExperiment* in the
+`metadata()` and `rowRanges()` slots.
+
+For a list of the reference transcriptomes supported by `tximeta()`,
+see the "Pre-computed checksums" section of the vignette in the 
+`Get started` tab.
+
+Further steps are also facilitated, e.g. `summarizeToGene()`, `addIds()`,
+or even `retrieveCDNA()` (the transcripts used for quantification) or
+`retrieveDb()` (the correct *TxDb* or *EnsDb* to match the
+quantification data).
+
+# How it works
+
+The key idea behind *tximeta* is that *Salmon* propagates a hash value
+summarizing the reference transcripts into each quantification
+directory it outputs. *tximeta* can be used with other tools as long
+as the 
+[hash of the transcripts](https://github.com/COMBINE-lab/FastaDigest) 
+is also included in the output directories. 
+
+![](man/figures/diagram.png)
+
+# Reference
+
+A reference for the *tximeta* package is:
 
 > Michael I. Love, Charlotte Soneson, Peter F. Hickey, Lisa K. Johnson,
 > N. Tessa Pierce, Lori Shepherd, Martin Morgan, Rob Patro.
 > "Tximeta: reference sequence checksums for provenance
 > identification in RNA-seq" *PLOS Computational Biology* (2020)
 > [doi: 10.1371/journal.pcbi.1007664](https://doi.org/10.1371/journal.pcbi.1007664)
 
-![](man/figures/diagram.png)
-
 # Feedback
 
 We would love to hear your feedback. Please post to 

diff --git a/vignettes/images/assignRanges-abundant.png b/vignettes/images/assignRanges-abundant.png
diff --git a/vignettes/tximeta.Rmd b/vignettes/tximeta.Rmd
@@ -312,6 +312,8 @@ gse <- summarizeToGene(se)
 rowRanges(gse)
 ```
 
+## Assign ranges by abundance
+
 We also offer a new type of range assignment, based on the most
 abundant isoform rather than the leftmost to rightmost coordinate. See
 the `assignRanges` argument of `?summarizeToGene`. Using the most
@@ -323,6 +325,22 @@ than the default option.
 gse <- summarizeToGene(se, assignRanges="abundant")
 ```
 
+For more explanation about why this may be a better choice, see the
+following tutorial chapter:
+
+<https://tidyomics.github.io/tidy-ranges-tutorial/gene-ranges-in-tximeta.html>
+
+In the diagram below, the pink feature is the set of all exons
+belonging to any isoform of the gene, such that the TSS is on the
+right side of this minus-strand feature. However, the blue feature is
+the most abundant isoform (the brown features are the next most
+abundant isoforms). The pink feature is therefore not a good
+representation of the locus.
+
+```{r echo=FALSE}
+knitr::include_graphics("images/assignRanges-abundant.png")
+```
+
 # Add different identifiers
 
 We would like to add support to easily map transcript or gene