Write up a short tidy multi-assay proteomics tutorial using QFeatures #20

mikelove · 2024-06-25T13:01:51Z

Ideally, this would start with support for MultiAssayExperiment, which would then work directly for QFeatures.

lgatto · 2024-06-26T06:21:52Z

Here's, for illustration, an example pipeline that processes data in a QFeatures object:

1. Filter out decoy features (i.e. those with a label of -1) in all assays.
2. Keep only features of rank 1 in all assays.
3. Keep features that have a spectrum FDR smaller than 0.05 in all assays.
4. Replace 0s by NA in all assays, to make missing values explicit.
5. Log-transform quantitative values: three new assays are created from the three initial ones and named with the “log_” prefix.
6. Aggregate the PSMs into peptide-level quantities by computing the median and ignoring missing values. The new assay names are defined as ‘peptide’ followed by the original assay identifier.
7. Join the 3 peptide assays into a new assay that will contain the 33 samples. Missing values are incorporated accordingly when a peptide is observed in only some of the sets.
8. Normalise the joined peptide data using median centring.
9. Aggregate the peptide-level quantities into protein-level quantities by computing the median and ignoring missing values. The new assay is named ‘proteins’.

qf |>
    filterFeatures(~ label > 0) |>           ## 1
    filterFeatures(~ rank == 1) |>           ## 2
    filterFeatures(~ spectrum_fdr < 0.05) |> ## 3
    zeroIsNA(1:3) |>                         ## 4
    logTransform(i = 1:3,                    ## 5
                 name = paste0("log_", names(qf))) |>
    aggregateFeaturesOverAssays(i = 4:6,     ## 6
                                fcol = "peptide",
                                name = sub("psm", "peptide", names(qf)),
                                fun = colMedians,
                                na.rm = TRUE) |>
    joinAssays(i = 7:9,                      ## 7
               name = "peptides") |>
    normalize(i = 10,                        ## 8
              name = "norm_peptides",
              method = "center.median") |>
    aggregateFeatures(i = "norm_peptides",   ## 9
                      name = "proteins",
                      fcol = "proteins",
                      fun = colMedians,
                      na.rm = TRUE)

From the sager vignette.
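
For readers without a QFeatures object at hand, here is a minimal base-R sketch of the arithmetic behind steps 4, 5, 6 and 8 on a toy matrix. It is an illustration only, not the QFeatures implementation. The data, the `peptide` mapping and the variable names are made up, and it assumes a base-2 log transform and that median centring means subtracting each sample's median.

```r
## Toy PSM-level intensities: 4 PSMs (rows) x 3 samples (columns).
## All values and names here are hypothetical.
psm <- matrix(c( 8,  0,  4,
                 2, 16,  4,
                32,  4,  0,
                 8,  4, 16),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("psm", 1:4), paste0("S", 1:3)))

## Which peptide each PSM belongs to (plays the role of fcol = "peptide").
peptide <- c("PEPA", "PEPA", "PEPB", "PEPB")

## Step 4: make missing values explicit.
psm[psm == 0] <- NA

## Step 5: log-transform (base 2 assumed).
log_psm <- log2(psm)

## Step 6: per peptide and per sample, take the median over its PSMs,
## ignoring missing values.
rows_by_peptide <- split(seq_along(peptide), peptide)
pep <- t(sapply(rows_by_peptide, function(rows) {
  apply(log_psm[rows, , drop = FALSE], 2, median, na.rm = TRUE)
}))

## Step 8: median centring, i.e. subtract each sample's median.
norm_pep <- sweep(pep, 2, apply(pep, 2, median, na.rm = TRUE))

print(pep)
print(norm_pep)
```

After centring, every sample column of `norm_pep` has a median of 0, which is the property the normalisation step is after.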

lgatto · 2024-06-26T06:27:58Z

And by the way, there's already a longFormat() function for MultiAssayExperiment and QFeatures objects:

> fts1
An instance of class QFeatures containing 2 assays:
 [1] assay1: SummarizedExperiment with 10 rows and 4 columns 
 [2] assay2: SummarizedExperiment with 4 rows and 4 columns 
> colData(fts1)
DataFrame with 4 rows and 2 columns
         Var1        Var2
    <numeric> <character>
S1 -1.0588267           A
S2  0.0199355           B
S3 -0.0761972           C
S4 -1.0501452           D
> longFormat(fts1, colvars = names(colData(fts1)))
DataFrame with 56 rows and 7 columns
          assay     primary  rowname  colname     value       Var1        Var2
    <character> <character> <factor> <factor> <integer>  <numeric> <character>
1        assay1          S1        a       S1         1   -1.05883           A
2        assay1          S1        b       S1         2   -1.05883           A
3        assay1          S1        c       S1         3   -1.05883           A
4        assay1          S1        d       S1         4   -1.05883           A
5        assay1          S1        e       S1         5   -1.05883           A
...         ...         ...      ...      ...       ...        ...         ...
52       assay2          S3        d       S3        12 -0.0761972           C
53       assay2          S4        a       S4        13 -1.0501452           D
54       assay2          S4        b       S4        14 -1.0501452           D
55       assay2          S4        c       S4        15 -1.0501452           D
56       assay2          S4        d       S4        16 -1.0501452           D
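
To make the shape of that output concrete, here is a rough base-R sketch of the reshaping itself. `to_long()` is a hypothetical helper written for this illustration, not part of QFeatures or MultiAssayExperiment, and the two matrices are toy data chosen to match the dimensions and values printed above. It leaves out the `primary` column and the `colData` variables.

```r
## Hypothetical helper: stack a named list of feature-by-sample
## matrices into a single long data.frame, one row per value.
to_long <- function(assays) {
  pieces <- lapply(names(assays), function(nm) {
    m <- assays[[nm]]
    data.frame(assay   = nm,
               rowname = rep(rownames(m), times = ncol(m)),
               colname = rep(colnames(m), each = nrow(m)),
               value   = as.vector(m))  # column-major, matches the rep() calls
  })
  do.call(rbind, pieces)
}

samples <- paste0("S", 1:4)
assays <- list(
  assay1 = matrix(1:40, nrow = 10, dimnames = list(letters[1:10], samples)),
  assay2 = matrix(1:16, nrow = 4,  dimnames = list(letters[1:4],  samples))
)

long <- to_long(assays)
nrow(long)    # 10 * 4 + 4 * 4 rows
long[52, ]    # a row from assay2
```

The row count is the sum of rows times columns over the assays, which is where the 56 rows in the output above come from.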

mikelove · 2024-06-26T09:09:42Z

This is great! I will change this challenge to: 1) write up a short tutorial for tidyomics/tidy-proteomics showing the analogy between tidySE and QFeatures; 2) allow arbitrary tidyverse verbs, e.g. mutate, group_by, summarize?
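
As a rough idea of what point 2 could look like, the sketch below applies dplyr verbs to a plain long-format table. This is only an illustration under assumptions: the toy data frame mimics a few of the columns that `longFormat()` returns, and on real data one would presumably start from something like `as.data.frame(longFormat(qf))`. It does not show any existing tidy interface to QFeatures.

```r
library(dplyr)

## Toy long-format table (hypothetical values), one row per
## assay / feature / sample combination.
long <- data.frame(
  assay   = rep(c("assay1", "assay2"), each = 4),
  colname = rep(c("S1", "S1", "S2", "S2"), times = 2),
  rowname = rep(c("a", "b"), times = 4),
  value   = c(1, 4, 16, NA, 2, 8, 4, 4)
)

summary_tbl <- long |>
  mutate(log_value = log2(value)) |>          # add a transformed column
  group_by(assay, colname) |>                 # one group per assay and sample
  summarize(median_log = median(log_value, na.rm = TRUE),
            n_missing  = sum(is.na(value)),
            .groups = "drop")

print(summary_tbl)
```

The trade-off is that a long table loses the assay links and feature metadata that the QFeatures object keeps, so this style suits summaries and plotting more than the processing pipeline itself.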

mikelove changed the title ~~build a tidy multi-assay proteomics infrastructure (e.g. QFeatures)~~ Write up a short tidy multi-assay proteomics tutorial using QFeatures Jun 26, 2024

mikelove added the documentation and good first issue labels Jun 26, 2024
