feature request: read different types of scan #20

Open
tentrillion opened this issue Sep 15, 2023 · 4 comments

tentrillion commented Sep 15, 2023

I've been using RaMS for about a year and it is amazing; finally an easy, dependency-light way to quickly read and tidily manipulate MS data! My feature request: could there be a way to label the different types of MS1 scans acquired in the same experiment? (I often acquire data like this when varying ion source parameters.)

I have *.mzML files generated from Sciex *.wiff2 files that were acquired from a qTOF that was running multiple MS1 scan types. Sciex calls the different scan types "experiments" and in the XML (generated via ProteoWizard) these different scan types are referred to like this:

<spectrum index="0" id="sample=1 period=1 cycle=1 experiment=2" defaultArrayLength="2271">
[...]
<spectrum index="1" id="sample=1 period=1 cycle=1 experiment=4" defaultArrayLength="4300">
[...]
 <spectrum index="3" id="sample=1 period=1 cycle=1 experiment=7" defaultArrayLength="3">

I'd be happy to supply an example mzML file.

I imagine one output type might be an extra column (relative to what grab_what = c('MS1') returns) containing the spectrum id strings, like sample=1 period=1 cycle=1 experiment=4.
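As a rough illustration of what such a column could be turned into, here is a minimal base R sketch that splits id strings of the form shown above into integer columns. The helper name `parse_spectrum_id` and the fixed list of fields are assumptions made for this example; neither is part of RaMS.

```r
# Hypothetical helper (not part of RaMS): split Sciex-style spectrum ids
# such as "sample=1 period=1 cycle=1 experiment=4" into integer columns.
parse_spectrum_id <- function(ids) {
  # Assumed field names, taken from the id strings quoted in this issue
  fields <- c("sample", "period", "cycle", "experiment")
  cols <- lapply(fields, function(field) {
    pattern <- paste0(".*", field, "=([0-9]+).*")
    # Return NA when a field is absent instead of echoing the whole id
    value <- ifelse(grepl(pattern, ids), sub(pattern, "\\1", ids), NA_character_)
    as.integer(value)
  })
  names(cols) <- fields
  cbind(data.frame(id = ids, stringsAsFactors = FALSE), as.data.frame(cols))
}

# The three id strings quoted in the XML excerpt above
demo_ids <- c(
  "sample=1 period=1 cycle=1 experiment=2",
  "sample=1 period=1 cycle=1 experiment=4",
  "sample=1 period=1 cycle=1 experiment=7"
)
parsed_ids <- parse_spectrum_id(demo_ids)
print(parsed_ids)
```

Keeping the raw id alongside the parsed fields means nothing is lost if a vendor uses a different id layout.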

wkumler (Owner) commented Sep 25, 2023

Hi @tentrillion, thanks for the feature request. I'm a little swamped with other tasks for my PhD right now (as you may have already determined from the backlog of issues) but I'm hoping to push out v1.4 by the end of the year or early 2024. This looks like a good contribution for that but I will need some demo mzML files. If you're able to share one or two publicly with a Box or Dropbox link that'd be great - otherwise we'll have to chat about a good way of getting those to me for testing.

wkumler (Owner) commented Nov 13, 2023

Hi @tentrillion, I've got some time to work on this issue now and think it's worth the effort. Do you have a demo mzML file you're able to share?

tentrillion added a commit to tentrillion/ipython_notebooks that referenced this issue Jul 24, 2024
tentrillion (Author) commented

Apologies for missing your November reply until now. I couldn't figure out how to attach an mzML directly in this thread. As an (odd, I know) workaround I've committed it to a random git repo I use to store / publish random notebooks. LMK if there's a better way to send you these; I have more if you need them.
https://github.com/tentrillion/ipython_notebooks/blob/master/example_sciex_multiMS1scantypes.mzML

wkumler (Owner) commented Aug 1, 2024

Hi @tentrillion, thanks for providing the demo file! It's a good question how best to get this data and combine it with the rest of the MS1 info. This feels similar to what grabAccessionData does, but since the information is stored in the spectrum tag itself, that function doesn't work for extracting it. Instead I had to manually read in the XML and extract the experiment number, bind that to the associated retention time, and then merge it back onto the MS1 info. This could definitely be streamlined into a single function (which would then also avoid having to read the mzML file twice) but is this essentially what you're looking for?

library(xml2)
library(RaMS)
library(data.table) # provides %between%, used in the plots below
library(ggplot2)

mzml_path <- "~/../Downloads/example_sciex_multiMS1scantypes.mzML"

# Pull the experiment number out of each <spectrum> id attribute
xml_data <- read_xml(mzml_path)
all_spectra <- xml_find_all(xml_data, "//d1:spectrum")
scan_ids <- xml_attr(all_spectra, "id")
experiment_nums <- as.numeric(gsub(".*experiment=", "", scan_ids))

# MS:1000016 is the scan start time (retention time) of each spectrum
scan_rts <- grabAccessionData(mzml_path, "MS:1000016")
rt_id_df <- data.frame(rt=as.numeric(scan_rts$value), exp_num=experiment_nums)

# Read the MS data as usual, then merge the experiment number on by rt
msdata <- grabMSdata(mzml_path)
ms1_w_expnum <- merge(msdata$MS1, rt_id_df)
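The streamlining mentioned above could look something like the following sketch, which collects the spectrum id, experiment number, and scan start time in a single pass over the XML instead of calling grabAccessionData separately. This is only an illustration under stated assumptions: `grab_experiment_index` is a hypothetical name and not part of RaMS, the seconds-to-minutes conversion assumes the rt column being merged against is in minutes, and the tiny inline file is a made-up stand-in for a real mzML.

```r
library(xml2)

# Hypothetical helper (not part of RaMS): one pass over the mzML to collect
# each spectrum's id, experiment number, and scan start time.
grab_experiment_index <- function(mzml_path) {
  xml_data <- read_xml(mzml_path)
  spectra <- xml_find_all(xml_data, "//d1:spectrum")
  scan_ids <- xml_attr(spectra, "id")
  # MS:1000016 is the scan start time accession used in the code above
  rt_nodes <- xml_find_first(spectra, ".//d1:cvParam[@accession='MS:1000016']")
  rt <- as.numeric(xml_attr(rt_nodes, "value"))
  # Assumption: convert seconds to minutes so rt lines up with the MS1 table
  rt_unit <- xml_attr(rt_nodes, "unitName")
  rt <- ifelse(!is.na(rt_unit) & rt_unit == "second", rt / 60, rt)
  data.frame(
    scan_id = scan_ids,
    exp_num = as.integer(sub(".*experiment=([0-9]+).*", "\\1", scan_ids)),
    rt = rt,
    stringsAsFactors = FALSE
  )
}

# Made-up two-spectrum file, just enough structure to exercise the helper
demo_path <- tempfile(fileext = ".mzML")
writeLines(c(
  '<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList>',
  '<spectrum index="0" id="sample=1 period=1 cycle=1 experiment=2"><scanList><scan>',
  '<cvParam accession="MS:1000016" name="scan start time" value="0.01" unitName="minute"/>',
  '</scan></scanList></spectrum>',
  '<spectrum index="1" id="sample=1 period=1 cycle=1 experiment=4"><scanList><scan>',
  '<cvParam accession="MS:1000016" name="scan start time" value="30" unitName="second"/>',
  '</scan></scanList></spectrum>',
  '</spectrumList></run></mzML>'
), demo_path)
exp_index <- grab_experiment_index(demo_path)
print(exp_index)
```

The resulting table could then be merged onto msdata$MS1 by rt as in the code above. Note that grabMSdata would still read the file itself, and that matching on a floating-point rt relies on both values having been parsed from the same text in the file.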

There seems to be some quirky data in the file: each mass is "bracketed" by two zero-intensity points, at higher and lower masses, creating a strange triplicate data point pattern:

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 100)]) +
  geom_point(aes(x=rt, y=mz, color=int>0))

[image: plot produced by the code above]

but when those points are removed you can see the instrument cycling through each of the different MS1 scan types

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 100)][int>0]) +
  geom_point(aes(x=rt, y=mz, color=factor(exp_num)))

[image: plot produced by the code above]

and you can then use the experiment number to separate out the types of scans and plot them individually

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 10)][int>0]) +
  geom_line(aes(x=rt, y=int)) +
  facet_wrap(~exp_num, ncol=1, scales = "free_y")

[image: plot produced by the code above]
