CLDR-14032 More space adjustments in DAIP; add test. Run it on en.xml, adjust tests #2001

pedberg-icu commented May 12, 2022

CLDR-14032

  • This PR completes the ticket.

Enhanced DAIP (DisplayAndInputProcessor) to make the space adjustments described in the ticket and the associated proposal, as discussed in the TC:

  • In intervalFormats for Latin-script locales, spaces around the en dash \u2013 become thin spaces \u2009
  • In time formats, the space between the time and the AM/PM marker becomes NNBSP \u202F
  • In date formats with a year in Cyrillic-script locales, the space between y and the year marker becomes NNBSP \u202F
  • In narrow units, space(s) on either side of {0} become NNBSP \u202F
  • In short units, space(s) on either side of {0} become NBSP \u00A0
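
To make the rules above concrete, here is a minimal Java sketch of three of them (interval, time, and unit patterns). It is not the DAIP code from this PR: the class name `SpaceAdjustSketch`, its method names, and the regular expressions are assumptions made for illustration. It ignores quoted literals in date patterns and skips the Cyrillic year-marker rule.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; this is not the DisplayAndInputProcessor implementation. */
final class SpaceAdjustSketch {
    static final String NBSP = "\u00A0";   // no-break space
    static final String THIN = "\u2009";   // thin space
    static final String NNBSP = "\u202F";  // narrow no-break space
    static final String EN_DASH = "\u2013";

    // A run of ordinary or special spaces (the set is an assumption of this sketch).
    private static final String SP = "[ " + NBSP + THIN + NNBSP + "]+";

    private static final Pattern BEFORE_DASH = Pattern.compile(SP + "(?=" + EN_DASH + ")");
    private static final Pattern AFTER_DASH = Pattern.compile("(?<=" + EN_DASH + ")" + SP);
    private static final Pattern BEFORE_AMPM = Pattern.compile(SP + "(?=a)");
    private static final Pattern BEFORE_ARG = Pattern.compile(SP + "(?=\\{0\\})");
    private static final Pattern AFTER_ARG = Pattern.compile("(?<=\\{0\\})" + SP);

    /** intervalFormats (Latin-script locales): spaces around the en dash become thin spaces. */
    static String adjustInterval(String pattern) {
        String s = BEFORE_DASH.matcher(pattern).replaceAll(THIN);
        return AFTER_DASH.matcher(s).replaceAll(THIN);
    }

    /** Time formats: the space before an AM/PM field (pattern letter a) becomes NNBSP. */
    static String adjustTime(String pattern) {
        return BEFORE_AMPM.matcher(pattern).replaceAll(NNBSP);
    }

    /** Unit patterns: spaces next to {0} become NNBSP (narrow) or NBSP (short). */
    static String adjustUnit(String pattern, String width) {
        String space;
        if ("narrow".equals(width)) {
            space = NNBSP;
        } else if ("short".equals(width)) {
            space = NBSP;
        } else {
            return pattern; // other widths are left untouched in this sketch
        }
        String s = BEFORE_ARG.matcher(pattern).replaceAll(space);
        return AFTER_ARG.matcher(s).replaceAll(space);
    }

    public static void main(String[] args) {
        // Compare against escaped expectations, since the spaces are invisible when printed.
        System.out.println(adjustUnit("{0} g", "short").equals("{0}" + NBSP + "g"));
        System.out.println(adjustTime("h:mm a").equals("h:mm" + NNBSP + "a"));
    }
}
```

The lookahead and lookbehind forms replace only the spaces themselves, so a pattern with a space on just one side of `{0}` or the en dash is handled the same way as one with spaces on both sides.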

Added a unit test for these changes.

Then ran CLDRModify -fp on en.xml to show the result of these changes (this also reordered some data items unrelated to the DAIP changes here). The resulting spacing changes required adjusting the expected results of some unit tests.

I plan to run CLDRModify on the other locales as a separate PR before start of regular submission, either under the ticket for this or another ticket.

Member

@btangmu btangmu left a comment

the comments on lines 529 and 532 of TestDisplayAndInputProcessor.java refer to \u00A0, which, however, doesn't seem to occur in the data on those lines

it might be more robust if the \u escapes were in the strings themselves rather than in comments -- anyway, here the comments don't seem to match the data

"{0} g", "{0} g"), // \u00A0
new PathSpaceAdjustData("en",
"//ldml/units/unitLength[@type=\"short\"]unit[@type=\"mass-gram\"]/unitPattern[@count=\"other\"]",
"g {0}", "g {0}"), // \u00A0
Member

sorry, never mind, they do have \u00A0 but somehow it was changed to \u0020 when copy-pasting from browser
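
The suggestion to put the escapes in the strings themselves could look like the sketch below. The class and member names are hypothetical and not taken from TestDisplayAndInputProcessor.java. The helper `showEscapes` is an assumed convenience that renders invisible characters as escapes, so a failure message or a pasted snippet shows which kind of space is present.

```java
/** Illustrative sketch only; names here are hypothetical, not from the CLDR test code. */
final class EscapedSpacesSketch {
    // Written with an escape, the NBSP (U+00A0) stays visible in review and survives copy-paste.
    static final String SHORT_GRAM_EXPECTED = "{0}\u00A0g";

    /** Renders every character outside printable ASCII as a four-digit hex escape. */
    static String showEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An ordinary space passes through unchanged; a no-break space is shown as an escape.
        System.out.println(showEscapes("{0} g"));
        System.out.println(showEscapes(SHORT_GRAM_EXPECTED));
    }
}
```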

@@ -834,13 +834,13 @@ public void TestDayPeriods() {
checkDayPeriod("pl", "format", "morning1", "〖06:00 – 10:00⁻〗〖❬8:00 ❭rano〗");
checkDayPeriod("pl", "stand-alone", "morning1", "〖06:00 – 10:00⁻〗");

checkDayPeriod("en", "format", "night1", "〖00:00 – 06:00⁻; 21:00 – 24:00⁻〗〖❬3:00 ❭at night〗");
checkDayPeriod("en", "format", "night1", "〖00:00 – 06:00⁻; 21:00 – 24:00⁻〗〖❬3:00❭at night〗");
Member

if these are nbsp may be better to escape it in the java source for clarity. That might be a good sweeping change though.

Contributor Author

Yes, filed a separate ticket for that: CLDR-15636

Contributor Author

pedberg-icu commented May 12, 2022

I will file a separate ticket to update data files per CLDRModify before general submission. Should probably add to BRS too.

pedberg-icu merged commit a83026a into unicode-org:main May 12, 2022
pedberg-icu deleted the CLDR-14032-DAIP-space-handling-updates branch May 12, 2022 15:37

checkDayPeriod("en", "format", "am", "〖00:00 – 12:00⁻〗〖❬6:00 ❭AM〗");
checkDayPeriod("en", "format", "pm", "〖12:00 – 24:00⁻〗〖❬6:00 ❭PM〗");
checkDayPeriod("en", "format", "noon", "〖12:00〗〖❬12:00 ❭noon〗");
checkDayPeriod("en", "format", "midnight", "〖00:00〗〖❬12:00 ❭midnight〗");
Member

I think it is probably a mistake to use a NBSP (or NNBSP) between the time (13:00) and an unabbreviated day period. The am/pm are fine, because they are short. But forcing a string like "12:00 midnight" to break as a unit could leave lines unnecessarily ragged.

However, I think that can be discussed and corrected afterwards if necessary.

Contributor Author

@macchiati I did not do anything to the spacing in the patterns that have B (for day periods). I think the issue here is just that the ExampleGenerator code is using time patterns with 'a' to generate examples for dayPeriods. That seems like an ExampleGenerator problem.

Contributor Author

Oh, actually I think it might be in the DateTimePatternGenerator code which, if given a skeleton with 'B', will adjust a time pattern that has 'a' by substituting in the 'B'. Hmm, need to think about how to address that.

Contributor Author

Could fix that in DTPG code: it can remove all NBSP in patterns that end up using B.

Contributor Author

Actually I do not think this is necessarily a problem in DTPG. We have separate availableFormats and intervalFormats for Bh, Bhm and Bhms - and this PR did not change those - so if ExampleGenerator is correctly using DTPG and DIF it should not have gotten the NNBSP above. It may be an ExampleGenerator problem after all. I will investigate.

Contributor Author

Problem is in ICUServiceBuilder.formatDayPeriod, filed https://unicode-org.atlassian.net/browse/CLDR-15645 to fix.

Member

macchiati commented May 12, 2022 via email

@pedberg-icu
Contributor Author

Or replace

Yeah, that is actually what I meant, sorry.
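
For reference, the idea floated in this thread (replacing no-break spaces with ordinary spaces in patterns that end up using B) could be sketched as follows. This is a hypothetical helper, not code from DateTimePatternGenerator or ICUServiceBuilder, and the thread above ultimately traced the problem to ICUServiceBuilder.formatDayPeriod rather than to DTPG. As a simplification, the replacement also applies to any NBSP or NNBSP inside quoted literals.

```java
/** Illustrative sketch only; not code from DateTimePatternGenerator or ICUServiceBuilder. */
final class DayPeriodSpaceSketch {
    /** True if the pattern uses the flexible day period letter B outside quoted literals. */
    static boolean usesFlexibleDayPeriod(String pattern) {
        boolean inQuote = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote; // a doubled apostrophe toggles twice, leaving the state unchanged
            } else if (!inQuote && c == 'B') {
                return true;
            }
        }
        return false;
    }

    /** Replaces NBSP and NNBSP with an ordinary space, but only in patterns that use B. */
    static String relaxSpaces(String pattern) {
        if (!usesFlexibleDayPeriod(pattern)) {
            return pattern;
        }
        return pattern.replace('\u00A0', ' ').replace('\u202F', ' ');
    }

    public static void main(String[] args) {
        System.out.println(relaxSpaces("h:mm\u202FB").equals("h:mm B"));
        System.out.println(relaxSpaces("h:mm\u202Fa").equals("h:mm\u202Fa"));
    }
}
```

Keeping the no-break space for `a` but not for `B` matches the concern raised earlier in the review: AM/PM markers are short, while unabbreviated day periods such as "midnight" are long enough that forcing them to stay on the same line as the time can leave lines ragged.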

Member

macchiati commented May 12, 2022 via email
