fix: wrong time zones returned by strptime #42

Merged
merged 1 commit into from
Sep 17, 2024


ethanbass
Contributor

@ethanbass ethanbass commented Sep 17, 2024

Hi William,

I noticed that the current parsing of startTimeStamp does not interpret the time zone correctly, because strptime does not treat the "Z" at the end of the timestamp as a UTC designator. For example, in your test file wk_chrom.mzML, the time stamp is 2022-08-11T12:34:56Z.

In the current version of grabMzmlMetadata this is parsed as:

time_val <- "2022-08-11T12:34:56Z"
time_stamp <- as.POSIXct(strptime(time_val, "%Y-%m-%dT%H:%M:%SZ"))

which returns (on my computer) "2022-08-11 12:34:56 EDT" because strptime does not seem to recognize the UTC time zone indicated by 'Z'.

I tried to fix this by replacing "Z" with a UTC offset that strptime can parse, "+0000". Alternatively, lubridate recognizes the time zone correctly: lubridate::ymd_hms("2022-08-11T12:34:56Z") returns "2022-08-11 12:34:56 UTC". However, this would add a dependency, which I think you would prefer to avoid.

best,
Ethan
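The substitution approach described in the comment above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code from the PR diff: the variable names `time_val_offset` and `time_stamp` are made up for the example, and it assumes the timestamp always ends in a literal "Z".

```r
# Illustrative sketch only; variable names are hypothetical.
time_val <- "2022-08-11T12:34:56Z"

# Swap the trailing "Z" for a numeric UTC offset that %z can parse.
time_val_offset <- sub("Z$", "+0000", time_val)

# Parse with %z and display the result in UTC, so the outcome
# does not depend on the time zone of the machine running the code.
time_stamp <- as.POSIXct(
  strptime(time_val_offset, "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
)

print(format(time_stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE))
```

Because the offset is read from the string, this form would also cope with a non-zero offset if one ever appeared, at the cost of the extra substitution step.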

@wkumler
Owner

wkumler commented Sep 17, 2024

Nice catch! I'll admit that time zones are not my specialty, but that's definitely something we should fix. It does sound like mzML files should all use Universal Time for the timestamp (according to the specification doc on the HUPO-PSI website, https://www.psidev.info/mzml):

[image: excerpt from the mzML specification]

so I'm happy to hard-code that timezone in using the tz="UTC" you suggest in the PR. However, I'm a bit confused by the substitution and offset specification and I'm inclined to just specify the timezone as UTC in the strptime call without editing the character string. I get identical results with both of the new expressions below and can't tell if I'm missing something with the simpler version.

# Current code:
as.POSIXct(strptime("2022-08-11T12:34:56Z", "%Y-%m-%dT%H:%M:%SZ"))

# PR suggestion:
as.POSIXct(strptime("2022-08-11T12:34:56+0000", "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))

# Proposed instead:
as.POSIXct(strptime("2022-08-11T12:34:56Z", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))

If this also looks ok to you I'll probably go ahead and implement the simpler version since that feels a bit safer and we don't have to worry about character substitutions and offsets that way. Especially appreciated is the unit test that I'll reuse for the other mzML files in the repo!
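A quick way to check that the two new expressions agree, and that only the current code depends on the session time zone, is to run all three under a non-UTC `TZ`. The sketch below is an assumption-laden illustration rather than the package's actual unit test: the helper names `parse_current`, `parse_simple` and `parse_offset` are hypothetical, and it assumes the "America/New_York" zone is available on the system.

```r
# Illustrative comparison; helper names are hypothetical.
time_val <- "2022-08-11T12:34:56Z"
fmt_literal_z <- "%Y-%m-%dT%H:%M:%SZ"

# Current code: "Z" is matched literally, local time zone is assumed.
parse_current <- function(x) as.POSIXct(strptime(x, fmt_literal_z))

# Simpler proposal: same format string, time zone fixed to UTC.
parse_simple <- function(x) as.POSIXct(strptime(x, fmt_literal_z, tz = "UTC"))

# PR suggestion: substitute "Z" with "+0000" and parse the offset with %z.
parse_offset <- function(x) {
  as.POSIXct(strptime(sub("Z$", "+0000", x), "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))
}

# Temporarily pretend the session runs in US Eastern time.
old_tz <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "America/New_York")

# Hours by which the current code is off relative to the UTC reading.
shift_hours <- (as.numeric(parse_current(time_val)) -
                as.numeric(parse_simple(time_val))) / 3600

# Do the two new expressions describe the same instant?
same_instant <- as.numeric(parse_simple(time_val)) ==
  as.numeric(parse_offset(time_val))

# Restore the original TZ setting.
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)

print(shift_hours)
print(same_instant)
```

One difference to keep in mind: the literal-"Z" format only matches strings that actually end in "Z", so a timestamp written with a numeric offset would not parse with the simpler version.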

@ethanbass
Contributor Author

ethanbass commented Sep 17, 2024 via email

@ethanbass
Contributor Author

ethanbass commented Sep 17, 2024 via email

@wkumler wkumler changed the base branch from master to timezone_hotfix September 17, 2024 23:19
@wkumler wkumler merged commit 158892b into wkumler:timezone_hotfix Sep 17, 2024
3 checks passed
@wkumler
Owner

wkumler commented Sep 19, 2024

Your code's been merged into main and is on its way to CRAN as version 1.4.1 with some other fixes/updates! Thanks for your continued use and support of the package.

And yes, unfortunately most of the other files come from mzXMLs which don't have a way (AFAIK) of encoding the timestamp so that information's lost.

@ethanbass
Contributor Author

Thank you!!
