Write up a short tidy multi-assay proteomics tutorial using QFeatures #20

Open
mikelove opened this issue Jun 25, 2024 · 3 comments
Labels: documentation, good first issue

Comments


mikelove commented Jun 25, 2024

Ideally, this would start with support for MultiAssayExperiment, which would then work directly for QFeatures.


lgatto commented Jun 26, 2024

Here's, for illustration, an example pipeline that processes data in a QFeatures object:

  1. Filter decoy features (i.e. those with a label of -1) in all assays.
  2. Only keep features of rank 1 in all assays.
  3. Keep features that have a spectrum FDR smaller than 0.05 in all assays.
  4. Replace 0s by NA in all assays, to make missing values explicit.
  5. Log-transform quantitative values - three new assays are created from the three initial ones and named with the “log_” prefix.
  6. Aggregate the PSMs into peptide-level quantities by computing the median and ignoring missing values. The new assay names are defined as ‘peptide’ followed by the original assay identifier.
  7. Join the 3 peptide assays into a new assay that will contain the 33 samples. Missing values are inserted where a peptide is observed in only some of the sets.
  8. Normalise the joined peptide data using median centring.
  9. Aggregate the peptide-level quantities into protein-level quantities by computing the median and ignoring missing values. The new assay is named ‘proteins’.
qf |>
    filterFeatures(~ label > 0) |>           ## 1
    filterFeatures(~ rank == 1) |>           ## 2
    filterFeatures(~ spectrum_fdr < 0.05) |> ## 3
    zeroIsNA(1:3) |>                         ## 4
    logTransform(i = 1:3,                    ## 5
                 name = paste0("log_", names(qf))) |>
    aggregateFeaturesOverAssays(i = 4:6,     ## 6
                                fcol = "peptide",
                                name = sub("psm", "peptide", names(qf)),
                                fun = colMedians,
                                na.rm = TRUE) |>
    joinAssays(i = 7:9,                      ## 7
               name = "peptides") |>
    normalize(i = 10,                        ## 8
              name = "norm_peptides",
              method = "center.median") |>
    aggregateFeatures(i = "norm_peptides",   ## 9
                      name = "proteins",
                      fcol = "proteins",
                      fun = colMedians,
                      na.rm = TRUE)

From the sager vignette.
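
Since the `qf` object is not available in this thread, here is a minimal sketch of steps 5, 6 and 9 on the small `feat1` example data that ships with QFeatures. The assay names `log_psms`, `peptides` and `proteins` and the object name `qf_toy` are chosen here for illustration, and `colMeans` is swapped in for `colMedians` to keep the example to base R functions.

```r
library(QFeatures)

# feat1 is a toy QFeatures object from the package: a single "psms" assay
# whose rowData includes the Sequence and Protein columns used below.
data(feat1)

qf_toy <- feat1 |>
    logTransform(i = "psms",                  ## log-transform (step 5)
                 name = "log_psms") |>
    aggregateFeatures(i = "log_psms",         ## PSMs to peptides (step 6)
                      fcol = "Sequence",
                      name = "peptides",
                      fun = colMeans,
                      na.rm = TRUE) |>
    aggregateFeatures(i = "peptides",         ## peptides to proteins (step 9)
                      fcol = "Protein",
                      name = "proteins",
                      fun = colMeans,
                      na.rm = TRUE)

# Printing the object lists the original assay plus the three new ones.
qf_toy
```

Each verb returns the whole QFeatures object with one more assay appended, which is what makes the pipe-based style possible.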


lgatto commented Jun 26, 2024

And by the way, there's already a longFormat() function for MultiAssayExperiment and QFeatures objects:

> fts1
An instance of class QFeatures containing 2 assays:
 [1] assay1: SummarizedExperiment with 10 rows and 4 columns 
 [2] assay2: SummarizedExperiment with 4 rows and 4 columns 
> colData(fts1)
DataFrame with 4 rows and 2 columns
         Var1        Var2
    <numeric> <character>
S1 -1.0588267           A
S2  0.0199355           B
S3 -0.0761972           C
S4 -1.0501452           D
> longFormat(fts1, colvars = names(colData(fts1)))
DataFrame with 56 rows and 7 columns
          assay     primary  rowname  colname     value       Var1        Var2
    <character> <character> <factor> <factor> <integer>  <numeric> <character>
1        assay1          S1        a       S1         1   -1.05883           A
2        assay1          S1        b       S1         2   -1.05883           A
3        assay1          S1        c       S1         3   -1.05883           A
4        assay1          S1        d       S1         4   -1.05883           A
5        assay1          S1        e       S1         5   -1.05883           A
...         ...         ...      ...      ...       ...        ...         ...
52       assay2          S3        d       S3        12 -0.0761972           C
53       assay2          S4        a       S4        13 -1.0501452           D
54       assay2          S4        b       S4        14 -1.0501452           D
55       assay2          S4        c       S4        15 -1.0501452           D
56       assay2          S4        d       S4        16 -1.0501452           D

@mikelove
Member Author

This is great! I will change this challenge to: 1) write up a short tutorial for tidyomics/tidy-proteomics showing the analogy between tidySE and QFeatures, and 2) allow arbitrary tidyverse verbs (e.g. mutate, group_by, summarize)?
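
As a rough sketch of what tidyverse verbs on the long table could look like, the example below builds a plain tibble with the same `assay`, `rowname`, `colname` and `value` columns as the `longFormat()` output shown earlier, then applies `mutate`, `group_by` and `summarize`. The table `long_tbl` is a hand-made stand-in with toy values (an assumption, not the real `fts1` data), and the summary column names are hypothetical.

```r
library(dplyr)

# Stand-in for the long table: assay1 has 10 features, assay2 has 4,
# each measured in samples S1 to S4 (56 rows in total).
assay1 <- expand.grid(rowname = letters[1:10], colname = paste0("S", 1:4),
                      stringsAsFactors = FALSE)
assay1$value <- 1:40
assay1$assay <- "assay1"

assay2 <- expand.grid(rowname = letters[1:4], colname = paste0("S", 1:4),
                      stringsAsFactors = FALSE)
assay2$value <- 1:16
assay2$assay <- "assay2"

long_tbl <- as_tibble(bind_rows(assay1, assay2))
# With a real object, the table would presumably come from something like
# as_tibble(as.data.frame(longFormat(fts1))).

# mutate: add a log2-transformed value column.
long_tbl <- long_tbl |>
    mutate(log_value = log2(value))

# group_by + summarize: one median per assay and sample.
sample_medians <- long_tbl |>
    group_by(assay, colname) |>
    summarize(median_value = median(value),
              n_features = n(),
              .groups = "drop")

sample_medians
```

This only covers the read-only direction (object to long table to summary). Writing results back into a QFeatures assay would need extra care and is not shown here.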

mikelove changed the title from "build a tidy multi-assay proteomics infrastructure (e.g. QFeatures)" to "Write up a short tidy multi-assay proteomics tutorial using QFeatures" on Jun 26, 2024
mikelove added the documentation and good first issue labels on Jun 26, 2024