suggest test for date format consistency and suggested revision #143
Comments
Hi all-
Thanks Deb for raising this again. Ambiguous dates are a real problem. There are a number of ways of helping to decide which of the possible readings is the correct one. In addition to your list above, you can make some reasonable guesses: if it is an Australian dataset you can be pretty sure it is going to be DD-MM-YYYY, but if it is North American (USA or Canada) it is a lot more difficult. See some of the discussion under Issue #86: where the date is ambiguous, we have suggested a range. There are also a number of other DATE/TIME tests; if you click on the TIME label, they will come up as a group.
Thanks Mobb for the link. Our problem is not with the ISO format per se; it is interpreting ambiguous dates in datasets where the value may be 3rd June or 6th March, e.g. where it is written in a verbatim field as 03/06/2018.
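To make the ambiguity concrete, here is a minimal Python sketch that lists every valid reading of a verbatim value such as 03/06/2018 and, when more than one reading survives, reports the span between them as an ISO 8601 interval, in the spirit of the range suggestion mentioned under #86. The function names `candidate_dates` and `as_iso_range` are hypothetical and are not taken from any TDWG or Kurator code; the sketch also assumes the year always comes last and has four digits.

```python
from __future__ import annotations

from datetime import date


def candidate_dates(verbatim: str, sep: str = "/") -> list[date]:
    """Return every valid reading of an NN/NN/YYYY string, earliest first.

    Hypothetical helper: tries both day-first and month-first order and
    keeps only the readings that are real calendar dates.
    """
    first, second, year = (int(part) for part in verbatim.split(sep))
    readings = set()
    for day, month in ((first, second), (second, first)):
        try:
            readings.add(date(year, month, day))
        except ValueError:
            # e.g. month 25 does not exist, so this reading is discarded
            pass
    return sorted(readings)


def as_iso_range(readings: list[date]) -> str:
    """Format one reading as a date, or several as an ISO 8601 interval."""
    if not readings:
        raise ValueError("no valid reading of the verbatim date")
    if len(readings) == 1:
        return readings[0].isoformat()
    return f"{readings[0].isoformat()}/{readings[-1].isoformat()}"


print(as_iso_range(candidate_dates("03/06/2018")))  # two readings survive
print(as_iso_range(candidate_dates("25/12/2018")))  # only day-first is valid
```

A value whose first or second component is greater than 12 resolves to a single date; anything else stays a range until metadata (for example the country of the dataset) settles the order.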
@ArthurChapman, that was the issue for us as well: a data value alone cannot be interpreted, and at a minimum, interpretation requires some metadata. I thought you might be interested in seeing what other EML-based systems were doing in the area of data checking. There are some significant differences between our system and TDWG's: However, in a DC archive (the format I imagine TDWG is interested in interpreting), the vocabularies are external, so your solution will be different, and probably involves interpreting that external metadata (analogous to the way you might confirm a species binomial from the taxonomic_id + system_name in related fields).
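A mix of MM-DD-YYYY and DD-MM-YYYY in verbatimEventDate within the same dataset is possible and not necessarily wrong. I would make it explicit here that the target is eventDate.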
Hi Christian,
Yes, no problem with verbatimEventDate, but the need for a consistent
eventDate is key.
On 2018-07-03 12:09 PM, Christian Gendreau wrote:
A mix of MM-DD-YYYY and DD-MM-YYYY in |verbatimEventDate| within the
same dataset is possible and not necessarily wrong. I would make it
explicit here that the target is |eventDate|.
-- Deborah Paul, iDigBio Digitization and Workforce Training Specialist
The original discussion was really about methodology: how one could determine some of the difficult cases, for example resolving ambiguous dates by looking in different places. I think our #86 handles this for now, and if someone needs to resolve it further they could try different options for resolving the problem. I don't think we can hard-wire those methodologies in at present; maybe with more coding in the future (I see that @tuco has added it to the Kurator repo), or through the Profiles. I think it can be closed for now.
I think more discussion will take us in further circles, truthfully. I
think we have the bulk of the highest impact stuff pretty well under
control.
On Mon, Aug 13, 2018 at 11:21 PM Arthur Chapman wrote:
The original discussion was really methodology on how one could determine
some of the difficult ones - for example ambiguous dates by looking in
different places to resolve possible ambiguities etc. I think our #86
handles this for now and then if someone needs to resolve it further they
could try different options for resolving the problem. I don't think we
can hard-wire those methodologies in at present - maybe with more coding
in the future (I see that @tuco has added it to the Kurator repo) - or
through the Profiles. I think it can be closed for now.
Hi all, following a DwC Hour observation that dates are sometimes MM-DD-YYYY and sometimes DD-MM-YYYY, and so cannot always be distinguished, we suggest a two-part test. See tdwg/dwc#100 for the complete issue description.
Note @tucotuco has added this test idea to the Kurator repo.
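As an illustration only, and not the test described in tdwg/dwc#100, the Python sketch below shows one way a dataset-level check on date order could work. It scans a column of verbatim values and looks for unambiguous evidence: a first component above 12 can only be a day, and a second component above 12 can only be a day. The function name `classify_date_order`, the result labels, and the accepted NN-NN-YYYY or NN/NN/YYYY pattern are all assumptions made for this example.

```python
import re
from typing import Iterable

# Assumed input shape: one or two digits, separator, one or two digits,
# separator, four-digit year (e.g. 03/06/2018 or 25-12-2018).
PATTERN = re.compile(r"^(\d{1,2})[-/](\d{1,2})[-/](\d{4})$")


def classify_date_order(values: Iterable[str]) -> str:
    """Classify a column as day-first, month-first, mixed, or undetermined."""
    day_first = month_first = False
    for value in values:
        match = PATTERN.match(value.strip())
        if not match:
            continue  # other formats are out of scope for this sketch
        first, second = int(match.group(1)), int(match.group(2))
        if 12 < first <= 31 and second <= 12:
            day_first = True      # e.g. 25/06/2018 can only be day-first
        elif 12 < second <= 31 and first <= 12:
            month_first = True    # e.g. 06/25/2018 can only be month-first
    if day_first and month_first:
        return "MIXED"
    if day_first:
        return "DD-MM-YYYY"
    if month_first:
        return "MM-DD-YYYY"
    return "UNDETERMINED"


print(classify_date_order(["03/06/2018", "25/06/2018"]))
print(classify_date_order(["03/06/2018", "25/06/2018", "06/25/2018"]))
```

A "MIXED" result on verbatim values is not necessarily an error, as noted in the comments; it mainly signals that ambiguous values in the same column cannot safely be converted to eventDate using a single assumed order.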