-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: wrong time zones returned by strptime #42
Conversation
Nice catch! I'll admit that time zones are not my specialty but that's definitely something we should fix. It does sound like mzML files should all use Universal Time for the timestamp (according to the specification doc on the ProteoWizard website, https://www.psidev.info/mzml): so I'm happy to hard-code that timezone in using the
If this also looks ok to you I'll probably go ahead and implement the simpler version since that feels a bit safer and we don't have to worry about character substitutions and offsets that way. Especially appreciated is the unit test that I'll reuse for the other mzML files in the repo! |
I wasn't sure if the run times are always stored in UTC, but it seems like this should be the case (as you say) according to the mzML specifications. My proposed version should allow different timezone offsets. For example, if we had "2022-08-11T12:34:56+0100", my version would correctly convert the time to UTC.
`as.POSIXct(strptime("2022-08-11T12:34:56+0100", "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))`
However, if we don't need to account for different timezones because all the times are UTC, I think your simpler version should work just as well.
Ethan
|
P.S. From what I saw most of the other test files don't seem to have the
runStartTime field. I'm not sure if they were lost somehow in the
minification process or if they never had them to start with.
…On Tue, Sep 17, 2024 at 5:45 PM Ethan Bass ***@***.***> wrote:
I wasn't sure if the run times are always stored in UTC, but it seems like
this should be the case (as you say) according to the mzML specifications.
My proposed version should allow different timezone offsets. For example,
if we had "2022-08-11T12:34:56+0100", my version would correctly convert
the time to UTC.
`as.POSIXct(strptime("2022-08-11T12:34:56+0100", "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))`
However, if we don't need to account for different timezones because all
the times are UTC, I think your simpler version should work just as well.
Ethan
On Tue, Sep 17, 2024 at 5:25 PM William Kumler ***@***.***>
wrote:
> Nice catch! I'll admit that time zones are not my specialty but that's
> definitely something we should fix. It does sound like mzML files should
> all use Universal Time for the timestamp (according to the specification
> doc on the ProteoWizard website, https://www.psidev.info/mzml):
>
> image.png (view on web)
> <https://github.com/user-attachments/assets/2ad72422-1125-434a-95ce-5e5b2eb0c126>
>
> so I'm happy to hard-code that timezone in using the tz="UTC" you
> suggest in the PR. However, I'm a bit confused by the substitution and
> offset specification and I'm inclined to just specify the timezone as UTC
> in the strptime call without editing the character string. I get identical
> results with both of the new expressions below and can't tell if I'm
> missing something with the simpler version.
>
> # Current code:
> as.POSIXct(strptime("2022-08-11T12:34:56Z", "%Y-%m-%dT%H:%M:%SZ"))
>
> # PR suggestion:
> as.POSIXct(strptime("2022-08-11T12:34:56+0000", "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"))
>
> # Proposed instead:
> as.POSIXct(strptime("2022-08-11T12:34:56Z", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))
>
> If this also looks ok to you I'll probably go ahead and implement the
> simpler version since that feels a bit safer and we don't have to worry
> about character substitutions and offsets that way. Especially appreciated
> is the unit test that I'll reuse for the other mzML files in the repo!
>
> —
> Reply to this email directly, view it on GitHub
> <#42 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ADZEBO7CYHSIKI4RT6DIAXTZXCM4DAVCNFSM6AAAAABOMHAW4WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNJWHE3DGMRWG4>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
Your code's been merged into main and is on its way to CRAN as version 1.4.1 with some other fixes/updates! Thanks for your continued use and support of the package. And yes, unfortunately most of the other files come from mzXMLs which don't have a way (AFAIK) of encoding the timestamp so that information's lost. |
Thank you!! |
Hi William,
I noticed that the current way of parsing of the startTimeStamp does not correctly interpret the time zone because strptime does not correctly interpret the "Z" at the end of the time zone. For example in your test file
wk_chrom.mzML
, the time stamp is2022-08-11T12:34:56Z
.In the current version of
grabMzmlMetadata
this is parsed as:which returns (on my computer) "2022-08-11 12:34:56 EDT" because
strptime
does not seem to recognize the UTC time zone indicated by 'Z'.I tried to fix this by replacing "Z" with a timezone that strptime can understand, "+0000". Alternatively, lubridate recognizes the timezone correctly. For example,
lubridate::ymd_hms("2022-08-11T12:34:56Z")
correctly returns "2022-08-11 12:34:56 UTC". However, this would add an additional dependency which I think you would probably prefer to avoid.best,
Ethan