[css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 #4419

kojiishi · 2019-10-15T03:08:03Z

@litherum found that Gecko handles U+2010 very nicely, and I'd like to consider using their idea.

Currently, the line-break property requires:

The following breaks are allowed for normal and loose line breaking if the writing system is Chinese or Japanese, and are otherwise forbidden:
breaks before hyphens:
‐ U+2010, – U+2013, 〜 U+301C, ゠ U+30A0
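A minimal Python sketch of the rule as quoted, to make its side effect concrete. The function names are hypothetical. Checking only the primary language subtag is an assumption that stands in for "the writing system is Chinese or Japanese". Because the rule depends on the language and not on neighboring characters, an English word inside Japanese content gets a break opportunity before U+2010.

```python
# Hypothetical sketch of the pre-resolution css-text-3 rule quoted above.
# Only the primary language subtag is checked, as a simplification of
# "the writing system is Chinese or Japanese".
from typing import List

HYPHENS = {"\u2010", "\u2013", "\u301C", "\u30A0"}


def break_before_hyphen_allowed(line_break: str, lang: str) -> bool:
    """Return True if the quoted rule allows a break before one of HYPHENS."""
    primary = lang.split("-")[0].lower()
    # Allowed for normal and loose when the language is Chinese or Japanese...
    if line_break in ("normal", "loose") and primary in ("zh", "ja"):
        return True
    # ...and otherwise forbidden (including line-break: strict).
    return False


def hyphen_break_positions(text: str, line_break: str, lang: str) -> List[int]:
    """Indices i such that a break is allowed between text[i-1] and text[i]."""
    if not break_before_hyphen_allowed(line_break, lang):
        return []
    return [i for i, ch in enumerate(text) if i > 0 and ch in HYPHENS]


# An English word inside a lang="ja" element gets a break before U+2010.
print(hyphen_break_positions("e\u2010mail", "normal", "ja"))  # [1]
print(hyphen_break_positions("e\u2010mail", "normal", "en"))  # []
```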

U+2010 and U+2013 are unified code points (used in both Latin and CJK text), so this rule may affect English words in undesired ways. I'm not sure whether that is intentional, but Gecko allows these breaks only when the hyphen follows a Japanese character, not when it follows a Latin letter, regardless of the content language.

jsbin test

This looks like a very good idea to me. It may not apply to all cases, but these two code points at least (a) are unified and ambiguous, and (b) prohibit breaking before them, so looking at the previous character makes sense to me.
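For comparison, a hypothetical sketch of the previous-character check attributed to Gecko above; it is not Gecko's code. It approximates "Japanese character" with the Hiragana, Katakana, and CJK Unified Ideographs blocks, and it models only the preceding-character test. Any other conditions Gecko applies are left out.

```python
# Hypothetical approximation of the previous-character heuristic; not Gecko's code.
from typing import List

AMBIGUOUS_HYPHENS = {"\u2010", "\u2013"}  # HYPHEN, EN DASH

# Assumed stand-in for "Japanese character".
JAPANESE_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def is_japanese(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def ambiguous_hyphen_breaks(text: str) -> List[int]:
    """Indices of U+2010/U+2013 that may start a new line.

    Only the preceding character is consulted.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i] in AMBIGUOUS_HYPHENS and is_japanese(text[i - 1])
    ]


print(ambiguous_hyphen_breaks("漢字\u2010かな"))  # [2]
print(ambiguous_hyphen_breaks("e\u2010mail"))    # []
```

The English word keeps its hyphen attached, while the hyphen after Kanji can still start a line.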

Note that the jsbin test renders U+2010 and U+2013 in common CJK fonts; the fonts seem to disagree on which of these code points get a full-width CJK glyph and which get a Latin glyph.

Thoughts?

/cc @fantasai @frivoal @emilio @jfkthame @drott

kojiishi · 2019-10-18T04:07:44Z

@himorin can JLTF discuss this?

Another, wilder idea is to simply remove these rules. IIUC, U+2010 and U+2013 are not commonly used code points. I'm wondering whether having different behavior for these code points under lang=ja is a net plus.

hftf · 2019-10-24T23:13:43Z

U+2010 and U+2013 are not commonly used code points IIUC.

Just to clarify, are you stating that U+2013 EN DASH is not commonly used, or only not commonly used in a specific context (Japanese text)? Because it is certainly one of the most frequently used characters in the General Punctuation block (cf. https://stackoverflow.com/a/5575000/1057509).

Marginally related issue about U+2010 HYPHEN and line breaking: #3434

kojiishi · 2019-10-25T04:36:08Z

Thank you for pointing that out; I meant the latter, in a Japanese context.

fantasai · 2019-11-06T00:46:34Z

@kojiishi EN DASH is frequently used for numeric ranges, e.g. 7–11. I think numbers are often used in Japanese; what would you expect to happen there?

kojiishi · 2019-11-06T02:44:46Z

@fantasai After discussing this with Kobayashi-sensei, I need your help. Do you remember why we allow breaking before these code points in normal?

Kobayashi-sensei thinks these code points are rarely used in Japanese, but that prohibiting breaks before them feels more natural. Even if there are cases where we would want to break before them, changing the rule would be fine if the current rule has side effects.

I checked the JLREQ line-breaking table and found that it prohibits breaks before cl-03 (hyphens). I thought normal was a copy of the JLREQ rules, but maybe we tweaked it for reasons I don't remember.

I think we can remove normal from this rule. In fact, apart from this rule, normal matches UAX #14. How about removing all the rules for normal and deferring to UAX #14? UAX #14 doesn't define strict or loose, so we can keep those, but the normal rules don't seem necessary if these four code points match UAX #14.
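Deferring normal to UAX #14 works for these four code points only if UAX #14 already forbids breaking before them. A lookup sketch, assuming the default Line_Break values from LineBreak.txt (BA for U+2010 and U+2013, NS for U+301C and U+30A0); rule LB21 forbids a break before BA, HY, and NS. Only LB21 is modeled here: earlier rules such as LB18 (break after spaces) and any tailoring are ignored, and the names are illustrative.

```python
# Default UAX #14 Line_Break classes for the four characters in the rule.
LINE_BREAK_CLASS = {
    "\u2010": "BA",  # HYPHEN
    "\u2013": "BA",  # EN DASH
    "\u301C": "NS",  # WAVE DASH
    "\u30A0": "NS",  # KATAKANA-HIRAGANA DOUBLE HYPHEN
}

# LB21: do not break before BA, HY, or NS.
LB21_NO_BREAK_BEFORE = {"BA", "HY", "NS"}


def uax14_breaks_before(ch: str) -> bool:
    """Apply only LB21; earlier rules (e.g. LB18, break after spaces) are ignored."""
    return LINE_BREAK_CLASS[ch] not in LB21_NO_BREAK_BEFORE


for ch, cls in LINE_BREAK_CLASS.items():
    # Every line reports False: LB21 forbids all four breaks.
    print(f"U+{ord(ch):04X} {cls}: break before allowed = {uax14_breaks_before(ch)}")
```

Under this simplified model, the default behavior matches the JLREQ prohibition on breaking before cl-03.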

kojiishi · 2019-11-06T02:57:34Z

@kojiishi EN DASH is frequently used for numeric ranges, e.g. 7–11. I think numbers are often used in Japanese, what would you expect to happen there?

U+2013 was added to JIS in 2013; it wasn't used before that. I think most Japanese writers use U+002D HYPHEN-MINUS for numeric ranges, or its full-width counterpart, U+FF0D, when the digits are full-width.

frivoal · 2019-11-07T03:59:43Z

I'd say that for ranges, Japanese people would often write 7〜11, rather than use a dash or hyphen of some kind.

himorin · 2019-11-13T07:22:01Z

Just for the record:
this section was initially introduced by commit 0b1a55a, to provide a detailed list of line-breaking rules.

Also, in commit 8d2b106, the text changed from

Following breaks be forbidden in strict line breaking and allowed in normal:

breaks before the hyphens (U+2010, U+2013, U+301C, U+30A0)

to

Additionally, if the language is known to be Chinese or Japanese, breaks
before hyphens (U+2010, U+2013, U+301C, U+30A0) may be allowed in
‘normal’.

himorin · 2019-11-13T07:23:51Z

could be related: https://bugzilla.mozilla.org/show_bug.cgi?id=1595428

kojiishi · 2019-11-13T17:06:15Z

just for record,,,
initially this section was introduced by commit 0b1a55a, to have developed list of line breaking rule.

Oh, thank you. It looks like it was just an error; I guess the intention was to make sure these breaks are prohibited in strict, not to allow them in normal.

fantasai · 2020-01-23T14:17:22Z

Just discussed this with @frivoal, @kojiishi, and @jfkthame. The conclusion is:

Disallow breaks before hyphens in normal and strict.
Allow breaks between ID and hyphen in loose. This means Kanji+Hyphen breaks, and Alphabetic+Hyphen doesn't break unless word-break: break-all makes Alphabetic behave like ID.
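A hypothetical sketch of the resolved behavior: no break before the four hyphens in normal or strict, and in loose a break only when the preceding character is ID. The `is_id` check is a rough stand-in built from assumed Kana and Kanji ranges, not the real UAX #14 ID class, and `word-break: break-all` is modeled simply as letting alphabetic letters count as ID.

```python
# Hypothetical sketch of the resolved rule; is_id() only approximates
# UAX #14 class ID and is not the real Line_Break property.
HYPHENS = {"\u2010", "\u2013", "\u301C", "\u30A0"}

ID_RANGES = [
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Kanji)
]


def is_id(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ID_RANGES)


def is_alphabetic(ch: str) -> bool:
    # Rough stand-in for class AL: any letter not treated as ID.
    return ch.isalpha() and not is_id(ch)


def may_break_before_hyphen(prev: str, hyphen: str, line_break: str,
                            word_break: str = "normal") -> bool:
    if hyphen not in HYPHENS:
        raise ValueError("rule only covers U+2010, U+2013, U+301C, U+30A0")
    if line_break in ("normal", "strict"):
        return False  # disallowed in normal and strict
    if line_break == "loose":
        # word-break: break-all makes alphabetic letters behave like ID.
        return is_id(prev) or (word_break == "break-all" and is_alphabetic(prev))
    raise ValueError(f"unsupported line-break value: {line_break!r}")


print(may_break_before_hyphen("漢", "\u2010", "loose"))              # True
print(may_break_before_hyphen("e", "\u2010", "loose"))               # False
print(may_break_before_hyphen("e", "\u2010", "loose", "break-all"))  # True
print(may_break_before_hyphen("漢", "\u2010", "normal"))             # False
```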

css-meeting-bot · 2020-01-24T10:32:41Z

The CSS Working Group just discussed [css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013, and agreed to the following:

RESOLVED: Adopt the suggestion in https://github.com/w3c/csswg-drafts/issues/4419#issuecomment-577700150
RESOLVED: Disallow before hyphen in normal and strict. Allow break between ID and hyphen in loose. This means Kanji+Hyphen breaks; and Alphabetic+Hyphen doesn't break, unless word-break: break-all makes Alphabetic behave like ID.

The full IRC log of that discussion

<Rossen__> Topic: [css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013
<Rossen__> github: https://github.com//issues/4419
<fantasai> ScribeNick: fantasai
<fantasai> ScribeNick: emilio
<emilio> koji: the current CSS spec says that if the language is japanese and line-break: normal there should be a break opportunity before 2010 and 2013
<emilio> ... it can break strangely for english words within japanese text
<emilio> ... gecko fixed it by not breaking if the previous character is a latin character
<emilio> ... but I want to fix this in the spec
<emilio> ... and make sure all browsers agree
<emilio> fantasai: we got together yesterday and concluded in all langs you want to disallow breaks before hyphens in normal breaking mode but japanese wants to allow it in loose mode
<fantasai> https://github.com//issues/4419#issuecomment-577700150
<emilio> ... so word-break break-all would allow between the latin letter and the hyphen
<Rossen__> q?
<emilio> ... so that's the solution outlined in the last comment (above)
<emilio> myles: are we going to contact ICU
<emilio> koji: if we agree I'll do
<emilio> florian: I support this
<myles> s/ICU/ICU and CLDR?
<myles> s/ICU/ICU and CLDR?/
<emilio> Rossen__: objections?
<emilio> RESOLVED: Adopt the suggestion in https://github.com//issues/4419#issuecomment-577700150
<emilio> RESOLVED: Disallow before hyphen in normal and strict. Allow break between ID and hyphen in loose. This means Kanji+Hyphen breaks; and Alphabetic+Hyphen doesn't break, unless word-break: break-all makes Alphabetic behave like ID.

See w3c/csswg-drafts#4419

…s-wg resolution, a=testonly Automatic update from web-platform-tests [css-text] Adjust test cases to match css-wg resolution See w3c/csswg-drafts#4419 -- wpt-commits: d2767c04559c016e04ad43fcc07f63f1153d18bf wpt-pr: 21626

kojiishi changed the title ~~[css-text]~~ [css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 Oct 18, 2019

kojiishi added the css-text-3, i18n-jlreq, and i18n-tracker labels Oct 18, 2019

himorin mentioned this issue Oct 18, 2019

[css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 w3c/i18n-activity#787

Closed

xfq added the i18n-clreq label Nov 7, 2019

kojiishi added the Agenda+ F2F label Jan 15, 2020

fantasai added the Needs Edits label Jan 23, 2020

css-meeting-bot mentioned this issue Jan 24, 2020

[css-ruby-1] ruby overhang control #4492

Closed

mozilla-apprentice mentioned this issue Jan 24, 2020

[css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 mozilla/wg-decisions#189

Closed

frivoal added the Needs Testcase (WPT) label Jan 24, 2020

astearns removed the Agenda+ F2F label Jan 24, 2020

frivoal added a commit to frivoal/wpt that referenced this issue Feb 6, 2020

[css-text] Adjust test cases to match css-wg resolution

d59cc61

See w3c/csswg-drafts#4419

frivoal mentioned this issue Feb 6, 2020

[css-text] Adjust test cases to match css-wg resolution web-platform-tests/wpt#21626

Merged

frivoal closed this as completed in 2cfcf43 Feb 6, 2020

frivoal added a commit to frivoal/wpt that referenced this issue Feb 6, 2020

[css-text] Adjust test cases to match css-wg resolution

964d265

See w3c/csswg-drafts#4419

frivoal added a commit to web-platform-tests/wpt that referenced this issue Feb 6, 2020

[css-text] Adjust test cases to match css-wg resolution

d2767c0

See w3c/csswg-drafts#4419

frivoal added the Tested label and removed the Needs Edits and Needs Testcase (WPT) labels Apr 2, 2020

fantasai added the Closed Accepted by CSSWG Resolution and Tracked in DoC labels Dec 15, 2020
