Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Improve handling of non-Gregorian calendars #227

Closed
wants to merge 3 commits into from

Conversation

littledan
Copy link
Member

In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

The patch adds an additional field to DateTimeFormat: relatedYear.
This is the related Gregorian year, e.g., as used in some Chinese calendar
formats when the Gregorial year is put next to the year in a 60-year cycle.

The patch makes an editorial change to make the descriptions
of the locale database fully switch to records (there was some object-style
notation remaining).

Closes #225

@littledan
Copy link
Member Author

cc @jackhorton @jungshik

@jackhorton
Copy link
Contributor

Should relatedYear be in [short, long]? My understanding was that the related year was always the year in the Gregorian calendar, which is some number of digits. I would have thought [2-digit, numeric] would be better values for the related year... though honestly, I also wasn't expecting to allow it as an input option. I am not sure what the right path forward is for someone who wants a relatedYear but not a year in the Gregorian calendar, or for someone who asks for a year but not a relatedYear in the Chinese calendar. I assumed the only input option would be year and that could map to both a year and relatedYear output field.

@anba
Copy link
Contributor

anba commented Apr 18, 2018

I guess I agree with @jackhorton's comment from above when looking at the description of 'r' at https://unicode.org/repos/cldr/trunk/specs/ldml/tr35-dates.html#dfst-year. That means if it is supported it should not be an input option and the value should not be [short, long] (maybe just numeric?). Will there be a similar issue with 'U' (cyclic year name)? Or do we just map 'U' to year?

@littledan
Copy link
Member Author

littledan commented Apr 18, 2018

I just used long/short as a copypasta error. 2-digit/numeric seems right to me. Fixed.

@littledan
Copy link
Member Author

littledan commented Apr 25, 2018

Is there anything else needed to verify this patch before landing? (I don't see how we can write test262 tests for it, unfortunately, unless tests are allowed to depend on some locale data.)

@anba
Copy link
Contributor

anba commented May 15, 2018

Is there anything else needed to verify this patch before landing?

Yes, there's still more work to do. 😄

  • The CLDR link shouldn't link to "https://unicode.org/repos/cldr/trunk/specs/ldml/tr35-dates.html#dfst-year", but instead use a stable version.
  • CLDR only supports "numeric" for the related-year, so it's not obvious to me why we want to support "2-digit", too.
  • PartitionDateTimePattern may need to be updated to handle "relatedYear" similar to "year"? (cf. step 18.e.iii)
  • ToLocalTime needs to be updated, too. (Note: The record names are currently incorrectly in lowercase form in ToLocalTime, but actually they need to match the names in Table 5.)

From my earlier comment:

  • Do we really want to support related-year as an input option? Is this actually supported by CLDR?
  • Will there be a similar issue for the cyclic year name "U"?

@jackhorton
Copy link
Contributor

I believe on the call we concluded that the year name would be treated as { type: "year" }

@littledan
Copy link
Member Author

In the May 2018 Intl call, we concluded:

  • Only permit relatedYear: "numeric" and not "2-digit", to match CLDR
  • Leave cyclic year for a possible follow-on, based on demand

The patch was discussed in the presentation at the May 2018 TC39 meeting, with no concerns raised, so I believe it has consensus with the committee.

@anba

The CLDR link shouldn't link to "https://unicode.org/repos/cldr/trunk/specs/ldml/tr35-dates.html#dfst-year", but instead use a stable version.

What's your reasoning here? I thought we were switching to unversioned Unicode references. cc @mathiasbynens

PartitionDateTimePattern, ToLocalTime

Yes, looks like we need an update, seems pretty straightforward.

@jackhorton I'm not sure what you mean.

@jackhorton
Copy link
Contributor

Sorry, I was trying to answer "Will there be a similar issue for the cyclic year name "U"?" by saying that my understanding was the consensus was that U would become type = year and r would become type = relatedYear

@anba
Copy link
Contributor

anba commented Jun 13, 2018

The CLDR link shouldn't link to "https://unicode.org/repos/cldr/trunk/specs/ldml/tr35-dates.html#dfst-year", but instead use a stable version.

What's your reasoning here? I thought we were switching to unversioned Unicode references. cc @mathiasbynens

As discussed yesterday, I think we rather want to use the link https://unicode.org/reports/tr35/tr35-dates.html#dfst-year, which seems to be the stable link to the latest released version of UTR 35.

@littledan littledan force-pushed the nongregorian branch 2 times, most recently from d8f9ed3 to d476d6f Compare July 13, 2018 18:09
@littledan
Copy link
Member Author

Last time we discussed this issue, we reached consensus on making the change, and were just waiting for an implementation to merge the patch. Has anyone implemented it and written tests, or think they will do so soon? It's now more practical to write tests than previously due to the test262 locale tag. If so, I'll merge this patch. If not, I'll separate out the "calendar" elements from the "relativeYear" elements (where calendar represents implementation reality, and relativeYear is a new feature). cc @jungshik @FrankYFTang @jackhorton

littledan added a commit to tc39/proposal-intl-datetime-style that referenced this pull request Oct 13, 2018
- Specify that a CLDR-style schema* and ICU-style logic for how the
  pattern is chosen based on the dateStyle and timeStyle parameters,
  rather than being entirely implementation-defined.
  Closes #3
- Rebase against master and modernize the specification editorially;
  remove unchanged specification sections.
  Closes #5

This patch implies that OS options are not taken into account when
using dateStyle/timeStyle; the data instead comes from a source like
CLDR.

* The schema here omits that the patterns are calendar-specific. The
fix is out for review in tc39/ecma402#227 .
@FrankYFTang
Copy link
Contributor

@FrankYFTang

@FrankYFTang
Copy link
Contributor

I try to implement this PR in v8 and I believe it does not produce meaningful result.
If I only allow the api to take additional relatedYear option as "numeric" and convert it to 'r' to specify the desired format, it won't produce meaningful result because in step 2 it will call
Let options be ? ToDateTimeOptions(options, "any", "date").
and therefore we will get options as
{"relatedYear": "numeric", "year": "numeric", "month": "numeric", "day": "numeric" }
as the return of ToDateTimeOptions. The first one is passed in by the user and the year/month/day are added by ToDateTimeOptions.

Now, after step 22 we will also get
{"year": "numeric", "month": "numeric", "day": "numeric", "relatedYear": "numeric"}
as opt.
In the implementation, what should we turn these into?
According to your issue, it will turn relatedYear into "r".
So we will get the value into a skeleton "yMdr" to try to match pattern.
But there are no pattern in CLDR supporting "yMdr".

It seems 'r' is not for skeleton but pattern
For example I see the following skeleton->pattern mapping in ICU related to 'r'
https://cs.chromium.org/chromium/src/third_party/icu/source/data/locales/en.txt?q=locales/en.txt&sq=package:chromium&dr&l=245

        chinese{
            DateTimePatterns{
...
            }
            availableFormats{
 ...
                Gy{"r(U)"}
                GyMMM{"MMM r(U)"}
                GyMMMEd{"E, MMM d, r(U)"}
                GyMMMd{"MMM d, r"}
...
                UM{"M/U"}
                UMMM{"MMM U"}
                UMMMd{"MMM d, U"}
                UMd{"M/d/U"}
 ...
                y{"r(U)"}
                yMd{"M/d/r"}
                yyyy{"r(U)"}
                yyyyM{"M/r"}
                yyyyMEd{"E, M/d/r"}
                yyyyMMM{"MMM r(U)"}
                yyyyMMMEd{"E, MMM d, r(U)"}
                yyyyMMMM{"MMMM r(U)"}
                yyyyMMMd{"MMM d, r"}
                yyyyMd{"M/d/r"}
                yyyyQQQ{"QQQ r(U)"}
                yyyyQQQQ{"QQQQ r(U)"}
            }

but notice 'r' is not on the left side (the skeleton) but only appeared in the right side (the pattern)

@littledan
Copy link
Member Author

Interesting, thanks for the analysis. Should we remove relatedYear as an option and only permit it in the formatToParts output then? Cc @Ms2ger.

In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

The patch adds an additional field to DateTimeFormat: relatedYear.
This is the related Gregorian year, e.g., as used in some Chinese calendar
formats when the Gregorial year is put next to the year in a 60-year cycle.

The patch makes an editorial change to make the descriptions
of the locale database fully switch to records (there was some object-style
notation remaining).

Closes tc39#225
- Point to more appropriate UTS35 version
- Include relatedYear in ToLocalTime
- Only permit numeric, not 2-digit, for relatedYear
@littledan
Copy link
Member Author

Pushed a new PR that omits relatedYear as an option and in resolvedOptions. PTAL.

@@ -303,6 +304,10 @@ <h1>PartitionDateTimePattern ( _dateTimeFormat_, _x_ )</h1>
1. Else,
1. Let _fv_ be an implementation and locale dependent String value representing `"ante meridiem"`.
1. Add new part record { [[Type]]: `"dayPeriod"`, [[Value]]: _fv_ } as a new element of the list _result_.
1. Else if _p_ is equal to `"relativeYear"`, then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be relativeYear or relatedYear ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Fixed.

@FrankYFTang
Copy link
Contributor

I think this current PR need some improvement. r and U are often appear together in the pattern and I think when we deal with r we also need to deal with U.

Here is the example

dtf = new Intl.DateTimeFormat("en-u-ca-chinese", {year: "numeric"})

in the above code, after considering the calendar for getting the pattern, we got

                y{"r(U)"}

so the pattern is "r(U)"
and the following code

d = new Date();
dtf.format(d);
"2019(ji-hai)"

notice "2019" is the r:relatedYear, and "ji-hai" is the U "cyclic year name, as in Chinese lunar calendar"
see http://userguide.icu-project.org/formatparse/datetime

now, for formatToParts, what should the output of dtf.formatToParts(d); ?
Should it be

[{type: "relatedYear", value: "2019"}, 
 {type: "literal", value: "("}, 
 {type: "year", value: "ji-hai"}, 
 {type: "literal", value: ")"}]

and treat the U the same way as y ? to report the "ji-hai" part as "year"? or should we report it differently?

@FrankYFTang
Copy link
Contributor

relatedYear is only present in the pattern, not the skeleton, so it
is omitted from the options bag as well as resolvedOptions.
@littledan
Copy link
Member Author

@FrankYFTang Do you think we should give it a different part name? What should that part name be?

@FrankYFTang
Copy link
Contributor

FrankYFTang commented Apr 30, 2019

@FrankYFTang Do you think we should give it a different part name? What should that part name be?

I think so. How about "yearName"? The ICU enum for that is UDAT_YEAR_NAME_FIELD
See http://icu-project.org/apiref/icu4c641/udat_8h.html#adb09b47d4576513229f83f2e8f507fc2ab15e42a6def565977ba0fa86cda77b49

@FrankYFTang
Copy link
Contributor

Here is my prototype in v8
https://chromium-review.googlesource.com/c/v8/v8/+/1588401
And here is the output

ftang@ftang4:~/v8/v8$ out/x64.release/d8 --harmony_intl_other_calendars
V8 version 7.6.0 (candidate)
d8> dtf = new Intl.DateTimeFormat("zh-u-ca-chinese", {year: "numeric"});
{}
d8> t = new Date();
Tue May 21 2019 18:45:39 GMT-0700 (Pacific Daylight Time)
d8> dtf.format(t)
"2019己亥年"
d8> dtf.formatToParts(t)
[{type: "relatedYear", value: "2019"}, {type: "yearName", value: "己亥"}, {type: "literal", value: "年"}]
d8>  dtf = new Intl.DateTimeFormat("zh-u-ca-chinese", {year: "numeric", month: "short"})
{}
d8> dtf.format(t)
"2019己亥年四月"
d8> dtf.formatToParts(t)
[{type: "relatedYear", value: "2019"}, {type: "yearName", value: "己亥"}, {type: "literal", value: "年"}, {type: "month", value: "四月"}]

This is more than what you spec in the current change, it also add the "yearName" as the output of ToLocalTime.

@littledan
Copy link
Member Author

If you want to upload a revised PR, I am happy to review it. Otherwise, I plan to fix this to match those semantics soon.

@FrankYFTang
Copy link
Contributor

I think the change in this PR, in particular

-        1. Let _formats_ be _dataLocaleData_.[[formats]].
+        1. Let _formats_ be _dataLocaleData_.[[formats]].[[&lt;_calendar_&gt;]].

Need to be part of #175
Otherwise, the adding of "calendar" option mean not much, right?
I suggest we fold this PR into #175

littledan added a commit to littledan/ecma402 that referenced this pull request May 27, 2019
In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

Splitting off part of tc39#227. Fixes part of tc39#225.
littledan added a commit to littledan/ecma402 that referenced this pull request May 27, 2019
The patch makes an editorial change to make the descriptions
of the locale database fully switch to records (there was some object-style
notation remaining).

Split off from tc39#227
littledan added a commit to littledan/ecma402 that referenced this pull request May 27, 2019
These fields are present in, e.g., some Chinese calendars.

Broken out from tc39#227. Fixes part of tc39#225.
@littledan
Copy link
Member Author

Subsumed by #349, #350, #351

@littledan littledan closed this May 27, 2019
leobalter pushed a commit that referenced this pull request May 30, 2019
The patch makes an editorial change to make the descriptions
of the locale database fully switch to records (there was some object-style
notation remaining).

Split off from #227
leobalter pushed a commit to leobalter/ecma402 that referenced this pull request Oct 11, 2019
These fields are present in, e.g., some Chinese calendars.

Broken out from tc39#227. Fixes part of tc39#225.
Ms2ger pushed a commit to littledan/ecma402 that referenced this pull request Nov 20, 2019
In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

Splitting off part of tc39#227. Fixes part of tc39#225.
Ms2ger pushed a commit to littledan/ecma402 that referenced this pull request Nov 20, 2019
These fields are present in, e.g., some Chinese calendars.

Broken out from tc39#227. Fixes part of tc39#225.
leobalter pushed a commit that referenced this pull request Nov 20, 2019
These fields are present in, e.g., some Chinese calendars.

Broken out from #227. Fixes part of #225.
littledan added a commit to littledan/ecma402 that referenced this pull request Jan 9, 2020
In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

Splitting off part of tc39#227. Fixes part of tc39#225.
leobalter pushed a commit that referenced this pull request Jan 9, 2020
In CLDR, the set of patterns is based on the the calendar, not
simply the language-region-script. Previously, however, the set
was based purely on the base name. This patch switches to a two-level
lookup to permit implementations to have more accuracy.

Splitting off part of #227. Fixes part of #225.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarifying the expected behavior for non-gregorian calendars in DateTimeFormat.formatToParts
5 participants