This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

[future] capitalization context #11

Open
zbraniecki opened this issue Jan 16, 2017 · 16 comments

@zbraniecki
Member

Relative time format may be useful within localization contexts, but for that to work we'd need to follow ICU and expose the ability to capitalize the formatted string.

ICU exposes UDisplayContext [0], which in this case would take one of three values: standalone, sentence-start, or sentence-middle. Example:

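// `context` is the option proposed in this issue. Engines that implement
// Intl.RelativeTimeFormat ignore unknown options, so today both calls
// return 'yesterday'.
const rtfStart = new Intl.RelativeTimeFormat('en-US', {
  numeric: 'auto',
  context: 'sentence-start'
});
rtfStart.format(-1, 'day'); // 'Yesterday' (proposed behavior)
const rtfMiddle = new Intl.RelativeTimeFormat('en-US', {
  numeric: 'auto',
  context: 'sentence-middle'
});
rtfMiddle.format(-1, 'day'); // 'yesterday'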

I don't think we should do this for the first revision, but I wanted to file an issue now to put it on people's radar when talking about the API.

[0] http://www.icu-project.org/apiref/icu4c/udisplaycontext_8h.html#ac80aa1aceff6c7ad2e9f983a19d8d868
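Until something like this is specified, the sentence-start case can be roughly approximated in user land. The sketch below is a hypothetical wrapper (`formatRelative` and its `context` parameter are illustrative names, not part of any spec) that formats with the standard `Intl.RelativeTimeFormat` and then uppercases the first grapheme cluster for `'sentence-start'`. It assumes an engine with `Intl.Segmenter`, and it ignores locale-specific capitalization data such as CLDR's contextTransforms, so it only approximates what ICU's UDisplayContext does.

```javascript
// Hypothetical user-land approximation of the proposed `context` option.
// The formatter's default output is treated as the standalone form, and
// only 'sentence-start' changes it; 'sentence-middle' returns it unchanged,
// which is wrong for locales whose default output starts capitalized.
function formatRelative(locale, value, unit, context = 'standalone') {
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: 'auto' });
  const text = rtf.format(value, unit);
  if (context !== 'sentence-start') {
    return text;
  }
  // Use the first extended grapheme cluster rather than the first UTF-16
  // code unit, so combining marks and surrogate pairs stay intact.
  const graphemes = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  const first = graphemes.segment(text).containing(0);
  if (first === undefined) {
    return text; // empty output, nothing to capitalize
  }
  return first.segment.toLocaleUpperCase(locale) + text.slice(first.segment.length);
}

console.log(formatRelative('en-US', -1, 'day', 'sentence-start'));  // "Yesterday"
console.log(formatRelative('en-US', -1, 'day', 'sentence-middle')); // "yesterday"
```

Doing the capitalization after formatting keeps the sketch independent of the formatter, at the cost of not knowing which locales actually capitalize in each context.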

@caridy added this to the 2nd Revision milestone on Mar 17, 2017
@littledan
Member

There are a bunch more contexts linked from there. How did you arrive at this more minimal set?

@zbraniecki
Member Author

I don't remember, but it may be that I was just going for the minimal set that provides the default (standalone) and in-sentence options. I don't see a reason not to add others, but for l10n purposes those three would probably be the most important ones.
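For comparison, ICU's UDisplayContext capitalization values are none, middle-of-sentence, beginning-of-sentence, UI-list-or-menu, and standalone. The following is a hypothetical sketch of how a `context` option could be validated and mapped onto those constants, mirroring the way Intl constructors throw a RangeError for unsupported string option values. `resolveContext` is an illustrative name, and `'ui-list-or-menu'` is not part of this proposal.

```javascript
// Hypothetical mapping from proposed option strings to ICU's
// UDisplayContext capitalization constants (names as in udisplaycontext.h).
const CAPITALIZATION_CONTEXTS = new Map([
  ['standalone', 'UDISPCTX_CAPITALIZATION_FOR_STANDALONE'],
  ['sentence-start', 'UDISPCTX_CAPITALIZATION_FOR_BEGINNING_OF_SENTENCE'],
  ['sentence-middle', 'UDISPCTX_CAPITALIZATION_FOR_MIDDLE_OF_SENTENCE'],
  // Not in the proposal: ICU's additional context for UI lists and menus.
  ['ui-list-or-menu', 'UDISPCTX_CAPITALIZATION_FOR_UI_LIST_OR_MENU'],
]);

// Resolve an options bag the way Intl constructors treat string options:
// undefined falls back to a default, unknown values throw a RangeError.
function resolveContext(options = {}) {
  const value = options.context === undefined ? 'standalone' : String(options.context);
  if (!CAPITALIZATION_CONTEXTS.has(value)) {
    throw new RangeError(`Value ${value} out of range for option "context"`);
  }
  return CAPITALIZATION_CONTEXTS.get(value);
}

console.log(resolveContext()); // "UDISPCTX_CAPITALIZATION_FOR_STANDALONE"
console.log(resolveContext({ context: 'sentence-start' })); // "UDISPCTX_CAPITALIZATION_FOR_BEGINNING_OF_SENTENCE"
```

Starting from a small allowed list keeps the option extensible: adding ICU's other contexts later only grows the list, while rejecting unknown values now prevents authors from relying on values that may later mean something else.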

@caridy
Collaborator

caridy commented Jul 14, 2017

We don't support this in DateTimeFormat. And I think formatToParts helps here, in that developers could implement workarounds using CSS and string replacement in the meantime. Definitely not something that we need to look at now.

@littledan
Member

I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

@caridy
Collaborator

caridy commented Jul 17, 2017

> I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

I'm thinking of using parts, plus a little bit of data or a function on the client side to decide when to capitalize, and applying CSS to a span around the parts.

@littledan
Member

@caridy How would you do the actual capitalization?

@caridy
Collaborator

caridy commented Jul 19, 2017

This could be a poor man's implementation of capitalization for weekdays:

new Intl.DateTimeFormat('en', { weekday: 'short' }).formatToParts(Date.now()).reduce((result, part) => {
  if (part.type === 'weekday') {
    // Hypothetical CLDR-based helper: does this locale capitalize weekday names here?
    const c = shouldLocaleWeekDayBeCapitalizedForThisCase();
    return result + `<span class="${c ? 'upper' : 'lower'}">${part.value}</span>`;
  }
  return result + part.value; // keep literals and other parts unchanged
}, '');

Again, my point is that some of this kind of stuff can be done in user-land today with parts and a little bit of CLDR data.

@littledan
Member

@caridy What's the CSS that backs the upper/lower classes? That's the part that worries me more.

@caridy
Collaborator

caridy commented Jul 19, 2017

Oh, that's just text transformation:

.upper {
    text-transform: capitalize;
}
.lower {
    text-transform: lowercase;
}

@littledan
Member

Yeah, text-transform: capitalize is not always quite right. It does not do a good linguistic title case, even if it works out fine for English and Spanish. From the spec, it does this:

Puts the first typographic letter unit of each word, if lowercase, in titlecase; other characters are unaffected.

For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Authors should not expect capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).

The typographical letter unit is defined as:

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33 THAI CHARACTER SARA AM differently; or the behaviour of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character represents a unit of the writing system— such as a Latin alphabetic letter (including its diacritics), Hangul syllable, Chinese ideographic character, Myanmar syllable cluster— that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in [UAX29], as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

The rules for such tailorings are out of scope for CSS.

For one, Firefox seems to recognize the Dutch IJ digraph as a single typographic character unit when the text has that language declared, whereas Chrome seems to treat it as two, leading to linguistically incorrect output. (See the examples in the MDN article.)

Until the CSS spec is clarified or a new, more linguistically accurate feature is added, I think we should avoid recommending anyone to use text-transform: capitalize for important things.

cc @frivoal @eaenet
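To make the concern concrete, here is a rough JavaScript approximation of what the spec text describes for capitalize. It is a sketch under stated assumptions: `capitalizeWords` is a hypothetical helper, not a browser API; word boundaries come from `Intl.Segmenter`, whose word segmentation is based on UAX #29; and `toLocaleUpperCase` stands in for Unicode titlecasing, which JavaScript does not expose. With no Dutch tailoring, the output for "ijsland" is "Ijsland" rather than "IJsland".

```javascript
// Hypothetical approximation of CSS `text-transform: capitalize`:
// word boundaries come from Intl.Segmenter (UAX #29 based), and the first
// grapheme cluster of each word-like segment is uppercased. JavaScript has
// no titlecase mapping, so toLocaleUpperCase stands in for it here.
function capitalizeWords(text, locale) {
  const words = new Intl.Segmenter(locale, { granularity: 'word' });
  const graphemes = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let result = '';
  for (const { segment, isWordLike } of words.segment(text)) {
    if (!isWordLike) {
      result += segment; // spaces and punctuation pass through unchanged
      continue;
    }
    const first = graphemes.segment(segment).containing(0).segment;
    result += first.toLocaleUpperCase(locale) + segment.slice(first.length);
  }
  return result;
}

console.log(capitalizeWords('yesterday in amsterdam', 'en')); // "Yesterday In Amsterdam"
console.log(capitalizeWords('ijsland', 'nl')); // "Ijsland", not the Dutch "IJsland"
```

The output is only as good as the segmentation and case mappings underneath it, which is exactly where language-specific rules like the IJ digraph would have to come in.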

@frivoal

frivoal commented Jul 19, 2017

> I hope you're not thinking of using text-transform: capitalize. This doesn't really follow reasonable linguistic rules generally.

> Until the CSS spec is clarified or a new, more linguistically accurate feature is added, I think we should avoid recommending anyone to use text-transform: capitalize for important things.

The way I see it, text-transform: capitalize is absolutely meant to follow reasonable linguistic rules. However, CSS can't take responsibility for defining what those rules are, and defers that to Unicode, which is a much more appropriate place for it. In turn, Unicode gives somewhat decent baseline rules, but isn't exhaustive, and invites UAs to fill in the blanks.

So my advice would be to use text-transform: capitalize when that's what you intend, to make sure you implement the guidelines and rules given by Unicode, and, when you hit cases that ought to be handled better (such as the Dutch IJ digraph), to add them to your implementation of word-boundary detection or extended grapheme cluster detection and attempt to get the refinement included in Unicode.

If you find faults in the way CSS invokes Unicode, then that is certainly something we'd want to fix in CSS, but CSS absolutely wants to defer the language-specific questions on this topic to Unicode.

Just my 2 cents, possibly missing the point entirely since I was sort of summoned without knowing much of the context.
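The approach suggested above, Unicode defaults plus per-language tailoring, can be sketched for the Dutch case. This assumes the input is a single word and models the Dutch rule simply as "a word-initial ij is capitalized as IJ"; `capitalizeFirst` and the `INITIAL_DIGRAPHS` table are hypothetical, not part of Unicode, CLDR, or any browser.

```javascript
// Hypothetical per-language tailorings: word-initial sequences that must be
// uppercased as a unit. Only the Dutch IJ digraph is modeled here.
const INITIAL_DIGRAPHS = { nl: ['ij'] };

// Capitalize the first "letter unit" of a single word: a tailored digraph
// when the language defines one, otherwise the first grapheme cluster.
function capitalizeFirst(word, locale) {
  const language = new Intl.Locale(locale).language;
  for (const digraph of INITIAL_DIGRAPHS[language] ?? []) {
    const head = word.slice(0, digraph.length);
    if (head.toLocaleLowerCase(locale) === digraph) {
      return head.toLocaleUpperCase(locale) + word.slice(digraph.length);
    }
  }
  const graphemes = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  const first = graphemes.segment(word).containing(0);
  if (first === undefined) {
    return word; // empty input
  }
  return first.segment.toLocaleUpperCase(locale) + word.slice(first.segment.length);
}

console.log(capitalizeFirst('ijsland', 'nl'));  // "IJsland"
console.log(capitalizeFirst('ijsland', 'en'));  // "Ijsland"
console.log(capitalizeFirst('appel', 'nl-BE')); // "Appel"
```

Keying the table on the language subtag means regional variants such as nl-BE pick up the same rule, while text not tagged as Dutch falls back to the default grapheme behavior, which matches the observation that the result depends on the declared language.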

@littledan
Member

Eh, OK, FWIW there's a Blink bug about the IJ thing.

@zbraniecki
Member Author

I don't think we can expect formatToParts + CSS to replace this feature for localization purposes.

Firstly, because CSS is meant for styling, while what we're trying to format is content. I don't think anyone would argue that the right way to get "You should leave in 5 minutes" instead of "You should leave In 5 minutes" is to call formatToParts on "In 5 minutes" and apply lowercase to the first part.

Secondly, for performance reasons.

Thirdly (not verified), my intuition tells me that in some languages the sentence-start/middle/end forms differ by more than just capitalization.
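For reference, here is what formatToParts returns with the API as it eventually shipped (`format(value, unit)` and `formatToParts(value, unit)`). The text a sentence-start transform would have to capitalize is a literal part whose extent varies with the value: the prefix "in " in one case, the whole word "yesterday" in another. `firstPartText` is a hypothetical helper for illustration only.

```javascript
// Return the text of the first part, i.e. what a hypothetical
// 'sentence-start' transform would need to capitalize.
function firstPartText(locale, value, unit) {
  const rtf = new Intl.RelativeTimeFormat(locale, { numeric: 'auto' });
  const [first] = rtf.formatToParts(value, unit);
  return first.value;
}

console.log(firstPartText('en', 5, 'minute')); // "in "
console.log(firstPartText('en', -1, 'day'));   // "yesterday"
```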

@jswalden

jswalden commented Oct 9, 2017

I'm reviewing the Firefox patch and independently raised this issue -- good to see people on this.

I'm leery of not having context be part of the 1.0 API. It seems a small enough adjustment (implementation-wise and spec-wise) that I'd just do it now. Context information doesn't introduce cross-cutting complexity concerns that would clearly motivate deferring it. And failing to specify context now leaves room for surprising authors and for cross-implementation disagreement.

However, I'm fine with not having context control in 1.0 if the specification requires that a particular context be used, namely the one that would be the default in a future where context is customizable. "RelativeTimeFormat formats a date, with respect to another date, for display in grammatically standalone context." or something like it. *jazzhands* That locks down the behavior implementations should provide as the default in the future, and it sets current authors' behavioral expectations.

@littledan
Member

OK, if this is the single most useful context, I don't have any real objection to adding it for 1.0. Anyone else have a source of hesitation?

@zbraniecki
Member Author

zbraniecki commented Oct 11, 2017

I believe that if we don't allow for context selection, standalone is the only context we can operate with, so I agree with Jeff about adding the note.

@littledan added the v2 label on Apr 17, 2018