UAX35 numeric datatype format pattern ambiguity #894
Comments
Recognize that these questions are mostly relevant to UAX35 and not CSVW specifically. UAX35's primary consideration is for formatting data, not parsing. We really just use it for parsing.
In the number pattern,
Depending on the implementation you use, that might not be valid: on the left-hand-side of a decimal point, you'd expect a
The number of
Yes, but smaller fractional parts will be matched. As mentioned, UAX35 is mostly about emitting numbers rather than parsing them. We restrict ourselves to parsing numbers.
My implementation doesn't support this, but maybe?
I don't think there is a minimum number of digits to be matched.
Same.
For parsing purposes, I think the group character is largely ignored. At least, it is in my implementation. I think it would be reasonable for a tool meant to validate input data to take a more strict interpretation of number patterns (or other patterns, for that matter) and report on fields in the input that don't correspond to the pattern.
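A minimal Python sketch of the two readings described above, offered as an illustration rather than as what any CSVW implementation actually does. The function names `parse_lenient` and `grouping_is_regular` are hypothetical, as is the fixed group size of 3. The strict check looks only at the integer part; how grouping should behave in the fractional part is left open here.

```python
from decimal import Decimal, InvalidOperation


def parse_lenient(text, group_char=",", decimal_char="."):
    """Lenient reading: drop every group character, then parse the rest."""
    # Remove group characters first so a group_char of "." cannot be
    # confused with the normalised decimal point.
    cleaned = text.replace(group_char, "").replace(decimal_char, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def grouping_is_regular(text, group_char=",", decimal_char=".", group_size=3):
    """Stricter reading (assumption): integer groups must have a fixed size."""
    integer_part = text.lstrip("+-").split(decimal_char, 1)[0]
    groups = integer_part.split(group_char)
    if len(groups) == 1:
        return True  # no group character in the integer part
    if not 1 <= len(groups[0]) <= group_size:
        return False
    return all(len(group) == group_size for group in groups[1:])
```

A validator could run `parse_lenient` to obtain the value and then use a check like `grouping_is_regular` only to decide whether to emit a warning for the field.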
Generally, UAX35 defines a grammar for these patterns, and patterns that don't match that grammar would be considered an error. The last entry in the referenced table is an empty value (no characters at all), which would seem to be legitimate.
I believe so. It's intended to describe if a "+" should be used for numbers when formatting. It would largely be ignored when parsing.
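As a rough sketch of what "largely ignored when parsing" could look like, the following Python accepts an optional sign on both the mantissa and the exponent regardless of whether the pattern contains "+". The regular expression and the name `accepts_optional_sign` are assumptions made for illustration; grouping and locale-specific characters are deliberately left out.

```python
import re

# Assumed simplified number syntax: optional sign, digits, optional
# fraction, optional exponent with its own optional sign.
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?")


def accepts_optional_sign(text):
    """Return True if text is a number, with or without an explicit sign."""
    return _NUMBER.fullmatch(text) is not None
```

Under this reading, both `12` and `+12` are accepted for the same pattern, and the presence of "+" only matters when formatting.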
Sorry, it's been a long time since we worked on these specs, and the details are a bit fuzzy without really diving back into it now.
Thanks for your explanation! I have a question though. So test 304 fails because there is an empty line in the referenced table, and not because the table entry "12.34,567" does not match the specified pattern "#0.0#,#"? There are also multiple other tests that contain such an empty entry (303, 302, 301, ...). Do these also fail for the same reason?
I am developing a CSV validator according to the CSVW recommendations in C#.
I have some questions about the interpretation of number patterns defined for numeric datatypes. If I understand it correctly there are two cases to consider:
So firstly my questions about the first case (1.):
a. In the linked document it is stated that: "The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum". What is the digit character exactly? Is it the character '0' or '#' or both? Are patterns like this: 0E0##0# correct?
b. What gives the minimum number of integer digits in the pattern? Is it the leftmost '0' in the integer part of the pattern?
c. What gives the minimum number of fractional digits in the pattern? Is it the rightmost '0' in the fractional part?
d. Can the groupChar be in the exponent part of the pattern?
Questions about second case (2.):
a. What gives the minimum number of integer digits in the pattern? Is it the leftmost '0' in the integer part of the pattern?
b. What gives the minimum number of fractional digits in the pattern? Is it the rightmost '0' in the fractional part?
c. How should the grouping separator be treated in the fractional and the exponent part? The document states the following:
However, based on the integration tests, I am not sure exactly how the grouping separator should work in such cases.
d. What gives the maximum number of fractional digits in the pattern? Based on the Validation Test 304 I would say that it is the number of characters [#0].
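To make the digit-count questions concrete, here is a small Python sketch of one possible reading, based on the formatting-oriented convention in UAX35 where '0' marks a required digit and '#' an optional one. Whether a CSVW parser should enforce these limits is exactly what is being asked, so treat this as an assumption. The name `digit_limits` is hypothetical, and quoted literals, prefixes, and suffixes in patterns are not handled.

```python
def digit_limits(pattern, decimal_char="."):
    """Count required ('0') and optional ('#') digits in a number pattern."""
    # Ignore the exponent part; only the mantissa is examined here.
    mantissa = pattern.split("E", 1)[0]
    integer_part, _, fraction_part = mantissa.partition(decimal_char)
    required_fraction = fraction_part.count("0")
    return {
        "min_integer": integer_part.count("0"),
        "min_fraction": required_fraction,
        # Group characters in the fraction are skipped, not counted.
        "max_fraction": required_fraction + fraction_part.count("#"),
    }
```

For the pattern `#0.0#,#` this gives a minimum of one integer digit, a minimum of one fractional digit, and a maximum of three fractional digits, so a value such as `12.34,567` (five fractional digits) would exceed the maximum under this reading.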
One last question that is not related to either of these cases. If the character '+' is present in the pattern, it means that non-negative numbers must be preceded by a plus sign. When the '+' character is not present, does it mean that the plus sign in front of non-negative numbers (or the exponent) is optional?
Thank you for your help in advance.