
add String.toTitleCase, String.toLocaleTitleCase #294

Open
srl295 opened this issue Nov 9, 2018 · 18 comments
Labels
c: text (Component: case mapping, collation, properties); Proposal (Larger change requiring a proposal); s: comment (Status: more info is needed to move forward)

Comments

@srl295
Member

srl295 commented Nov 9, 2018

'ამდენი'.toUpperCase(); // ᲐᲛᲓᲔᲜᲘ
'ამდენი'.toTitleCase(); // ამდენი
'Ijussite'.toTitleCase(); // Ijussite
'ijsselmeer'.toLocaleTitleCase('nl'); // IJsselmeer

Some background on title case issues for Georgian at this document: https://gist.github.com/srl295/1d9603ecfbcae55a08b04e9cd925d349#problem
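As a rough illustration of why the last example needs a locale, here is a minimal sketch of the Dutch "ij" rule. The helper name `capitalizeFirst` is hypothetical and not part of this proposal, and it uses the uppercase mapping rather than a true titlecase mapping, so it does not reproduce the Georgian behaviour shown above.

```javascript
// Hypothetical helper for illustration only: capitalizes the first letter of
// a single word and leaves the rest of the word unchanged.
function capitalizeFirst(word, locale) {
  if (word.length === 0) return word;
  const language = new Intl.Locale(locale).language;
  // Dutch treats the digraph "ij" as one letter at the start of a word,
  // so both code units are capitalized together.
  if (language === 'nl' && /^ij/i.test(word)) {
    return 'IJ' + word.slice(2);
  }
  const [first] = word; // first code point, not first UTF-16 code unit
  return first.toLocaleUpperCase(locale) + word.slice(first.length);
}

capitalizeFirst('ijsselmeer', 'nl'); // 'IJsselmeer'
capitalizeFirst('ijsselmeer', 'en'); // 'Ijsselmeer'
```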

@jungshik

Shouldn't this be filed against tc39/ecma262 (as well) ?

@littledan
Member

I think it's enough to track it here, even though we would make changes in the main spec too. See also #99.

@srl295
Member Author

srl295 commented Nov 14, 2018

Also, CLDR (data) and ICU (implementation) implement casing via transforms (transliterators) because of its complexity.

@srl295
Member Author

srl295 commented Nov 15, 2018

Title case is more complex than just these issues.

@sffc
Contributor

sffc commented Feb 5, 2019

This is a big pain point in JavaScript. Here's a SO question with 461 upvotes where almost all of the answers are, "take the first char and make it upper case":

https://stackoverflow.com/q/196972/1407170

We should discuss whether the right answer is title case or whether we should use sentence casing, etc.
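For reference, the "first char" approach from those answers looks roughly like the sketch below. The function names are illustrative. One pitfall that needs no locale data at all is that `charAt(0)` returns a UTF-16 code unit, so letters outside the Basic Multilingual Plane are left untouched.

```javascript
// The common answer: uppercase the first UTF-16 code unit.
function naiveCapitalize(s) {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// Slightly better: uppercase the first code point instead.
function codePointCapitalize(s) {
  if (s.length === 0) return s;
  const [first] = s; // string iteration yields whole code points
  return first.toUpperCase() + s.slice(first.length);
}

naiveCapitalize('hello'); // 'Hello'

// U+10428 DESERET SMALL LETTER LONG I is stored as a surrogate pair.
const deseret = '\u{10428}';
naiveCapitalize(deseret) === deseret;          // true, nothing was capitalized
codePointCapitalize(deseret) === '\u{10400}';  // true, the capital form
```

Neither version handles locale-specific rules or the difference between titlecase and uppercase.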

@tomayac
Member

tomayac commented Feb 7, 2019

I was pointed to this very issue by @littledan because I just wrote an article on a potential CSS text-transform: titlecase feature. The long history of the discussion around this (linked from the article) might be interesting for dealing with the question now in JavaScript land.

@hanguokai

hanguokai commented Feb 13, 2019

Title case is usually used in article titles (e.g. <h1></h1>) and in the menu items of an application.

It depends on rules that differ between languages and locales. Because natural languages are not like programming languages, there may be more complicated rules, uncertain variants, and exceptions. I don't know every language's rules. I think that if a language has static, definite rules for title case that are unambiguous and not affected by contextual semantics, it could be implemented in JavaScript; otherwise it is not suitable for implementation in JavaScript.

Chinese has almost no concept of capitalization, uppercase, or lowercase, so title case does not apply to Chinese.

For English, I found these references and an implementation at https://individed.com/code/to-title-case/ by @gouch.

@sffc
Contributor

sffc commented Feb 13, 2019

Another use case to consider is tc39/proposal-intl-displaynames#13

Different types of display names have different capitalization rules based on context. For example, you might titlecase month names in some locales but not in others.
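The month-name difference is already visible in the data that `Intl.DateTimeFormat` exposes today. This is only a small illustration with an assumed helper name, `monthName`, and the Spanish result assumes the runtime ships full locale data.

```javascript
// A fixed date in January, formatted in UTC so the result does not depend
// on the machine's time zone.
const sampleDate = new Date(Date.UTC(2020, 0, 15));

// Illustrative helper: the standalone long month name for a locale.
function monthName(locale) {
  return new Intl.DateTimeFormat(locale, { month: 'long', timeZone: 'UTC' })
    .format(sampleDate);
}

monthName('en'); // 'January'
monthName('es'); // 'enero' (assuming full ICU locale data is available)
```

A UI that wants the Spanish name capitalized, for example at the start of a heading, has to apply casing itself, which is where a context-aware capitalization function would come in.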

@sffc added the labels c: text (Component: case mapping, collation, properties) and s: help wanted (Status: help wanted; needs proposal champion) on Mar 19, 2019
@sffc
Contributor

sffc commented May 2, 2020

@markusicu What are your thoughts on putting titlecasing more front and center in JavaScript?

@markusicu

The question is what people mean by "titlecasing". Unicode has a decent spec, and ICU has a solid implementation, for titlecasing at certain boundaries (with adjustment options) and leaving alone or lowercasing the rest of the string. However, different people use it for different things.

Some people want just the start of the string titlecased. Some want the start of each sentence. Some want the start of each word. ICU lets you provide different BreakIterator instances/options for these choices.
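In JavaScript, the closest analogue to swapping BreakIterator instances is probably `Intl.Segmenter`, which is not mentioned elsewhere in this thread. The sketch below assumes a runtime that supports it, uses a hypothetical function name, and applies the uppercase mapping rather than a true titlecase mapping.

```javascript
// Capitalizes the first code point after each boundary reported by the
// segmenter. `granularity` is expected to be 'word' or 'sentence'.
function capitalizeAtBoundaries(text, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  let result = '';
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // In word mode, whitespace and punctuation come back as their own
    // segments; copy them through unchanged.
    if (granularity === 'word' && !isWordLike) {
      result += segment;
      continue;
    }
    const [first] = segment; // first code point of the segment
    result += first.toLocaleUpperCase(locale) + segment.slice(first.length);
  }
  return result;
}

capitalizeAtBoundaries('hello wide world', 'word'); // 'Hello Wide World'
```

Passing 'sentence' instead capitalizes only at the sentence boundaries the segmenter finds; the rest of each segment is left as written.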

In the US, there is a peculiar style of "titlecasing" book titles and article headlines that titlecases some words but not others. This is language- and style-specific and not built into ICU. You would need to provide the offsets to ICU for where to titlecase and where not.

Note that like all case mapping operations, titlecasing is a lossy operation. It's also not always obvious. It is not always actually desirable to titlecase the first character of a word and lowercase the rest. Think of acronyms like NASA, names like McDonald, product names like iPhone. The best we have for that is the "don't lowercase the rest" option.
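To make the trade-off concrete, here is a minimal sketch with a hypothetical `lowercaseRest` flag standing in for that option. It is not ICU's API, just an illustration of the two behaviours on the examples above.

```javascript
// Illustrative only: capitalize the first code point of a word and
// optionally lowercase everything after it.
function capitalizeWord(word, { lowercaseRest = false } = {}) {
  if (word.length === 0) return word;
  const [first] = word;
  const rest = word.slice(first.length);
  return first.toUpperCase() + (lowercaseRest ? rest.toLowerCase() : rest);
}

// Lowercasing the rest destroys information:
capitalizeWord('NASA', { lowercaseRest: true });     // 'Nasa'
capitalizeWord('McDonald', { lowercaseRest: true }); // 'Mcdonald'

// Leaving the rest alone preserves it, but is still wrong for some names:
capitalizeWord('NASA');     // 'NASA'
capitalizeWord('McDonald'); // 'McDonald'
capitalizeWord('iPhone');   // 'IPhone'
```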

FYI For some characters, titlecasing is different from uppercasing.
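The Latin digraph letters are a small example. JavaScript exposes no titlecase mapping, so the sketch below hard-codes a few entries; the table is an illustrative subset chosen for this example, not the full Unicode data, and the function name is hypothetical.

```javascript
// Lowercase digraph → titlecase form. Each of these also has a distinct
// uppercase form, which is what toUpperCase() returns.
const TITLECASE_EXCEPTIONS = new Map([
  ['\u01C6', '\u01C5'], // ǆ → ǅ (uppercase is Ǆ, U+01C4)
  ['\u01C9', '\u01C8'], // ǉ → ǈ (uppercase is Ǉ, U+01C7)
  ['\u01CC', '\u01CB'], // ǌ → ǋ (uppercase is Ǌ, U+01CA)
  ['\u01F3', '\u01F2'], // ǳ → ǲ (uppercase is Ǳ, U+01F1)
]);

// Titlecases the first code point, falling back to uppercase when the
// table has no entry. The rest of the word is left unchanged.
function titlecaseFirst(word) {
  if (word.length === 0) return word;
  const [first] = word;
  const mapped = TITLECASE_EXCEPTIONS.get(first) ?? first.toUpperCase();
  return mapped + word.slice(first.length);
}

'\u01C6'.toUpperCase();     // '\u01C4' (Ǆ), the uppercase form
titlecaseFirst('\u01C6ep'); // '\u01C5ep' (ǅep), the titlecase form
```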

FYI Yes, CLDR/ICU have "Transliterator" rules for case mappings, but most people don't use them. For example, Greek uppercasing would be more difficult with a Transliterator rule than with the hand-coded implementation in the low-level API I think.

@sffc added the labels s: comment (Status: more info is needed to move forward) and Proposal (Larger change requiring a proposal), and removed the label s: help wanted (Status: help wanted; needs proposal champion), on Jun 5, 2020
@domenic
Member

domenic commented Jul 12, 2020

Just chiming in because opinions were solicited on Twitter...

> In the US, there is a peculiar style of "titlecasing" book titles and article headlines that titlecases some words but not others. This is language- and style-specific and not built into ICU. You would need to provide the offsets to ICU for where to titlecase and where not.

If there was a JS standard library function called toTitleCase(), and it was not usable for the purposes of US book titles and article headlines, this would be extremely surprising. From the rest of this thread I am gathering that what Unicode/ICU calls "titlecase" applies only to single "words", for some definition of word? In that case a method name more like wordToTitleCase() would help.

@aphillips

@domenic The US isn't the world. Different cultural conventions for titlecasing exist, even within English. If the JS standard library function toTitleCase() only serves the idiosyncratic needs of US English book titles, it's not really as useful as one might think.

W3C-I18N recently closed an issue related to CSS (it was quite an old issue--we were housekeeping). Basically CSS decided that the text-transform: capitalize style (used to create a titlecasing effect) only affects lowercase letters. This helps avoid problems with over-case-normalizing words with internal capitals ("McGowan"). The overall thread helps illustrate how titlecase is more complex than it appears to be. (Charmod-Norm spills a small amount of ink on it too, although it barely mentions titlecasing).

I do think a locale-aware titlecasing function would be useful. As @markusicu mentions, ICU has a solid implementation that covers most users' needs for most strings. But the gaps are not isolated in obscure locales or scripts.

@domenic
Member

domenic commented Jul 12, 2020

To be clear, I'm not suggesting adding a toTitleCase() that does US English titlecasing. I'm simply saying that if a toTitleCase() is added, and it fails to do US English titlecasing, that would be extremely surprising. As such, I was suggesting that a different name be used for a function that does the style of titlecasing that this thread seems to be discussing.

@aphillips

@domenic Ah, I get it. Still, most functions that claim to "titlecase" in other programming languages are algorithmic and also fail to get US English titlecasing correct. To your point, notice that CSS's transform is called "capitalize". That might be a good choice here too, since unlike wordToTitleCase, it suggests avoiding lowercasing the rest of the string.

@leobalter
Member

As the editor, I don't have a specific preference if we should add the features discussed here, but I have some observations:

  • Based on what @domenic has said, any toTitleCase should observe locales correctly or it might be a new resource for mistakes and possibly frustration in the Developer Experience.
  • Therefore, I'd be up to discuss a localized version of toTitleCase/toLocaleTitleCase, but we might end up with a single method.
  • Let's consider the costs mentioned by @zbraniecki in tc39/proposal-intl-displaynames#13 ("Evaluate the cost of capitalization rules"). If the cost remains too high for the benefit, we can spare ourselves later frustration.
  • We might wanna discuss a String#capitalizeish function separately, this one seems to be more universal and not much dependent on locale.

IMO, wordToTitleCase is not a method name I'd love to see, but take this as a personal note and we first need to verify if we are up to add the feature, regardless of naming.

@sffc let's add this to the discussions for the next TG2 meeting?

@sffc
Contributor

sffc commented Jul 13, 2020

@everyone: if you want to see titlecasing (regardless of the exact implementation, e.g. capitalize individual words versus string toTitleCase), please 👍 the OP. There are still only 2 votes for this issue. I can't tell whether the discussion in this thread is "if we were to theoretically do this, this is what it should look like" or "I think we should do this, and here's some discussion to get the ball rolling".

@aphillips

@leobalter capitalize is not "universal" and does depend on locale in exactly the same way that upper/lower/titlecasing does and for the same reasons. The point of doing capitalize instead of titlecase is what I mentioned in my earlier comments: it's complex to get titlecase right. Users of capitalize can get the effect of (suboptimal) titlecasing by lowercasing the string first.
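A minimal sketch of both points, using a hypothetical `capitalize` helper built on the existing `toLocaleUpperCase` and `toLocaleLowerCase` methods. It handles a single word and assumes the runtime applies the Turkish casing rules for the 'tr' locale.

```javascript
// Illustrative only: uppercase the first code point under the given locale
// and leave the rest of the word unchanged.
function capitalize(word, locale) {
  if (word.length === 0) return word;
  const [first] = word;
  return first.toLocaleUpperCase(locale) + word.slice(first.length);
}

// Same input, different result depending on locale:
capitalize('istanbul', 'en'); // 'Istanbul'
capitalize('istanbul', 'tr'); // 'İstanbul' (capital I with dot, U+0130)

// Internal capitals survive unless the caller lowercases first:
capitalize('McGowan', 'en');                          // 'McGowan'
capitalize('McGOWAN'.toLocaleLowerCase('en'), 'en');  // 'Mcgowan'
```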

@leobalter
Member

@aphillips it seems it's not that simple even for capitalization, then. TIL, thanks for the heads up.

@sffc in my position I'd be using the feature, rather than implementing it. I'm definitely down to see it being discussed.

10 participants