[future] capitalization context #11

zbraniecki · 2017-01-16T21:28:33Z

Relative time format may be useful within localization contexts, but in order for that to work, we'd need to follow ICU and expose the ability to capitalize the formatted string.

ICU exposes UDisplayContext [0] which in this case would take one of the three values standalone, sentence-start or sentence-middle. Example:

let rtf = new Intl.RelativeTimeFormat('en-US', {
  context: 'sentence-start'
});
rtf.format(new Date() - 1000 * 60 * 60 * 24); // 'Yesterday'

let rtf = new Intl.RelativeTimeFormat('en-US', {
  context: 'sentence-middle'
});
rtf.format(new Date() - 1000 * 60 * 60 * 24); // 'yesterday'

I don't think we should do this for the first revision, but I wanted to file an issue already to put it on people's radar when talking about the API.

[0] http://www.icu-project.org/apiref/icu4c/udisplaycontext_8h.html#ac80aa1aceff6c7ad2e9f983a19d8d868
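Until something like the `context` option above exists, the sentence-start case could be roughly approximated in user-land. The following is only a sketch under stated assumptions: `formatRelativeWithContext` is a hypothetical helper, not part of any proposal, and it calls the shipped `format(value, unit)` signature rather than the Date-based call shown in the example above. It also assumes that sentence-start differs from the other contexts only by the case of the first code point, which will not hold for every language.

```javascript
// Hypothetical user-land approximation of the proposed `context` option.
// Assumption: sentence-start differs only by the case of the first code point.
function formatRelativeWithContext(locale, value, unit, context) {
  // numeric: 'auto' allows phrases such as "yesterday" instead of "1 day ago".
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: 'auto' });
  const text = rtf.format(value, unit);
  if (context !== 'sentence-start') {
    return text; // leave the formatter's own casing untouched
  }
  // Split by code points so a surrogate pair is never cut in half.
  const [first = '', ...rest] = Array.from(text);
  return first.toLocaleUpperCase(locale) + rest.join('');
}

console.log(formatRelativeWithContext('en-US', -1, 'day', 'sentence-start'));  // 'Yesterday'
console.log(formatRelativeWithContext('en-US', -1, 'day', 'sentence-middle')); // 'yesterday'
```

A built-in option would presumably draw on locale data for this instead of assuming that a case change on the first letter is enough.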


littledan · 2017-06-23T11:36:48Z

There are a bunch more contexts linked from there. How did you arrive at this more minimal set?

zbraniecki · 2017-07-05T21:23:13Z

I don't remember, but it may be that I was just going for the minimal set that provides the default (standalone) and in-sentence options. I don't see a reason not to add others, but for l10n purposes those three would probably be the most important ones.

caridy · 2017-07-14T20:36:19Z

We don't support this in DateTimeFormat. And I think formatToParts helps here, in that people can implement tricks using CSS and replacements for the time being. Definitely not something that we need to look at now.

littledan · 2017-07-15T12:21:04Z

I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

caridy · 2017-07-17T15:45:24Z

> I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

I'm thinking of using parts, plus a little bit of data or a function on the client side to decide when to capitalize, and applying CSS to a span around the parts.

littledan · 2017-07-18T15:42:26Z

@caridy How would you do the actual capitalization?

caridy · 2017-07-19T00:36:58Z

This could be a poor man's implementation of capitalization for weekdays:

new Intl.DateTimeFormat('en', { weekday: 'short' }).formatToParts(Date.now()).reduce((result, part) => {
    if (part.type === 'weekday') {
        const c = shouldLocaleWeekDayBeCapitalizedForThisCase(); // CLDR based implementation
        result += `<span class="${c ? 'upper' : 'lower'}">${part.value}</span>`;
    }
    return result;
}, '')

Again, my point is that some of this kind of stuff can be done in user-land today with parts and a little bit of CLDR data.
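A runnable variant of the snippet above might look like the following. It is a sketch only: `shouldCapitalizeWeekday` and `weekdayMarkup` are hypothetical names, and the predicate is a stub standing in for a real CLDR-backed lookup. Unlike the original, it passes non-weekday parts through unchanged instead of dropping them.

```javascript
// Hypothetical stand-in for a CLDR-backed lookup. It ignores the locale and
// just trusts the caller-supplied context; real data would be per-locale.
function shouldCapitalizeWeekday(locale, context) {
  return context === 'sentence-start' || context === 'standalone';
}

// Wraps the weekday part in a span whose class says how it should be cased.
function weekdayMarkup(locale, date, context) {
  const parts = new Intl.DateTimeFormat(locale, { weekday: 'short' }).formatToParts(date);
  return parts
    .map((part) => {
      if (part.type !== 'weekday') {
        return part.value; // keep literals and any other parts as they are
      }
      const cls = shouldCapitalizeWeekday(locale, context) ? 'upper' : 'lower';
      return `<span class="${cls}">${part.value}</span>`;
    })
    .join('');
}

console.log(weekdayMarkup('en', Date.now(), 'sentence-start'));
```

The casing itself is still left to whatever styles back the `upper` and `lower` classes.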

littledan · 2017-07-19T08:53:19Z

@caridy What's the CSS that backs the upper/lower classes? That's the part that I'm getting at that worries me more.

caridy · 2017-07-19T13:37:33Z

Oh, that's just text transformation:

.upper {
    text-transform: capitalize;
}
.lower {
    text-transform: lowercase;
}

littledan · 2017-07-19T14:29:46Z

Yeah, text-transform: capitalize is not always quite right. It does not do a good linguistic title case, even if it works out fine for English and Spanish. From the spec, it does this:

Puts the first typographic letter unit of each word, if lowercase, in titlecase; other characters are unaffected.

For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Authors should not expect capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).

The typographical letter unit is defined as:

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33 THAI CHARACTER SARA AM differently; or the behaviour of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character represents a unit of the writing system— such as a Latin alphabetic letter (including its diacritics), Hangul syllable, Chinese ideographic character, Myanmar syllable cluster— that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in [UAX29], as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

The rules for such tailorings are out of scope for CSS.

For one, Firefox seems to recognize the Dutch IJ digraph as a single typographical character unit when the text has that language declared, whereas Chrome seems to treat it as two, leading to a linguistically incorrect output. (See examples on the MDN article).

Until the CSS spec is clarified or a new, more linguistically accurate feature is added, I think we should avoid recommending anyone to use text-transform: capitalize for important things.
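For comparison, a script-side version of "uppercase the first typographic character unit" can be sketched with `Intl.Segmenter`, which exposes the extended grapheme clusters of UAX29 mentioned in the quoted spec text. `capitalizeFirstGrapheme` is a hypothetical helper, and this assumes an engine that ships `Intl.Segmenter`. It uses the locale-aware uppercase mapping, which is not the same as Unicode titlecase for a few characters, and it hits the same Dutch IJ gap because "ij" is two grapheme clusters.

```javascript
// Hypothetical helper: uppercases only the first extended grapheme cluster.
// Assumes Intl.Segmenter is available in the engine.
function capitalizeFirstGrapheme(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  const first = segmenter.segment(text)[Symbol.iterator]().next();
  if (first.done) {
    return text; // empty string: nothing to capitalize
  }
  const head = first.value.segment;
  // head is a prefix of text, so slicing by its UTF-16 length is safe.
  return head.toLocaleUpperCase(locale) + text.slice(head.length);
}

console.log(capitalizeFirstGrapheme('yesterday', 'en')); // 'Yesterday'
console.log(capitalizeFirstGrapheme('ijsland', 'nl'));   // 'Ijsland', not the expected 'IJsland'
```

The Dutch result shows that moving the operation out of CSS does not by itself supply the missing language-specific tailoring.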

cc @frivoal @eaenet

frivoal · 2017-07-19T15:15:16Z

> I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

> Until the CSS spec is clarified or a new, more linguistically accurate feature is added, I think we should avoid recommending anyone to use text-transform: capitalize for important things.

The way I see it, text-transform: capitalize is absolutely meant to follow reasonable linguistic rules. However, CSS can't take the responsibility of defining what that is, and defers it to Unicode, which is a much more appropriate place for that. In turn, Unicode gives somewhat decent baseline rules, but isn't exhaustive, and invites UAs to fill in the blanks.

So my advice would be to use text-transform: capitalize when that's what you intend, to make sure you implement the guidelines and rules given by Unicode, and when you hit cases that ought to be handled better (such as the Dutch IJ digraph), add that to your implementation of word-boundary detection or extended grapheme cluster detection, and attempt to get this refinement included in Unicode.

If you find faults in the way CSS invokes Unicode, then this is certainly something we'd want to fix in CSS, but CSS absolutely wants to defer to Unicode the language-specific questions on this topic.

Just my 2 cents, possibly missing the point entirely since I was sort of summoned without knowing much of the context.

littledan · 2017-07-19T19:19:18Z

Eh, OK, FWIW there's a Blink bug about the IJ thing.

zbraniecki · 2017-10-08T23:42:54Z

I don't think we can expect formatToParts + CSS to replace this feature for localization purposes.

Firstly, because CSS is meant for styling, while what we're trying to format is content. I don't think you can argue that "You should leave In 5 minutes" should be fixed by calling formatToParts on "In 5 minutes" and applying lowercase to the first part.

Secondly, for performance reasons.

Thirdly (not verified), my intuition tells me that the sentence-start, sentence-middle and sentence-end forms don't differ only by capitalization in every language.

jswalden · 2017-10-09T18:47:50Z

I'm reviewing the Firefox patch and independently raised this issue -- good to see people on this.

I'm leery of not having context be part of the 1.0 API. It seems a small enough adjustment (implementation-wise and spec-wise) that I'd just do it now. Context information doesn't introduce cross-cutting complexity concerns that would clearly motivate deferring. And failing to specify context now leaves room for surprising authors and enables cross-implementation disagreement.

However, I'm fine with not having context control in 1.0 if the specification requires that a particular context be used, one that would be the default in a customizable future: "RelativeTimeFormat formats a date, with respect to another date, for display in grammatically standalone context." or something like it. *jazzhands* That locks down the behavior implementations should provide as the default in the future, and it sets current authors' behavioral expectations.

littledan · 2017-10-11T09:54:05Z

OK, if this is the single most useful context, I don't have any real objection to adding it for 1.0. Anyone else have a source of hesitation?

zbraniecki · 2017-10-11T16:36:31Z

I believe that if we don't allow for context selection, standalone is the only one we can operate with, so I agree with Jeff about adding the note.

caridy added the enhancement label Mar 17, 2017

caridy added this to the 2nd Revision milestone Mar 17, 2017

littledan added the v2 label Apr 17, 2018
