[css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 #4419
Comments
@himorin can JLTF discuss this? Maybe another wild idea is to just remove these rules. U+2010 and U+2013 are not commonly used code points, IIUC. I'm wondering whether having different behavior for these code points for |
Just to clarify, are you stating that U+2013 EN DASH is not commonly used, or only not commonly used in a specific context (Japanese text)? Because it is certainly one of the most frequently used characters in the General Punctuation block (cf. https://stackoverflow.com/a/5575000/1057509). Marginally related issue about U+2010 HYPHEN and line breaking: #3434 |
Thank you for pointing that out. I meant the latter, in the Japanese context. |
@kojiishi EN DASH is frequently used for numeric ranges, e.g. 7–11. I think numbers are often used in Japanese, what would you expect to happen there? |
@fantasai After discussing with Kobayashi-sensei, I need your help. Do you remember why we allow breaking before these code points for Kobayashi-sensei thinks these are rarely used code points in Japanese, but he feels it is more natural to prohibit breaking before them. Even if there were cases or reasons where we want to break before them, if it has side effects, changing it would be fine. I checked the JLREQ line break table and found that it prohibits a break before cl-03. I thought I think we can remove |
U+2013 was added to JIS in 2013, it wasn't used before. I think most Japanese use U+002D HYPHEN-MINUS for numeric ranges, or its full-width counterpart if digits are in full-width. |
I'd say that for ranges, Japanese people would often write 7〜11, rather than use a dash or hyphen of some kind. |
Just for the record... Also, in commit 8d2b106, this text changed from
to
|
could be related: https://bugzilla.mozilla.org/show_bug.cgi?id=1595428 |
Oh, thank you. Looks like just an error, I guess the intention was to make sure they are prohibited for |
The CSS Working Group just discussed
The full IRC log of that discussion
<Rossen__> Topic: [css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013
<Rossen__> github: https://github.com//issues/4419
<fantasai> ScribeNick: fantasai
<fantasai> ScribeNick: emilio
<emilio> koji: the current CSS spec says that if the language is japanese and line-break: normal there should be a break opportunity before 2010 and 2013
<emilio> ... it can break strangely for english words within japanese text
<emilio> ... gecko fixed it by not breaking if the previous character is a latin character
<emilio> ... but I want to fix this in the spec
<emilio> ... and make sure all browsers agree
<emilio> fantasai: we got together yesterday and concluded in all langs you want to disallow breaks before hyphens in normal breaking mode but japanese wants to allow it in loose mode
<fantasai> https://github.com//issues/4419#issuecomment-577700150
<emilio> ... so word-break break-all would allow between the latin letter and the hyphen
<Rossen__> q?
<emilio> ... so that's the solution outlined in the last comment (above)
<emilio> myles: are we going to contact ICU
<emilio> koji: if we agree I'll do
<emilio> florian: I support this
<myles> s/ICU/ICU and CLDR?
<myles> s/ICU/ICU and CLDR?/
<emilio> Rossen__: objections?
<emilio> RESOLVED: Adopt the suggestion in https://github.com//issues/4419#issuecomment-577700150
<emilio> RESOLVED: Disallow before hyphen in normal and strict. Allow break between ID and hyphen in loose. This means Kanji+Hyphen breaks; and Alphabetic+Hyphen doesn't break, unless word-break: break-all makes Alphabetic behave like ID. |
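For readers who want the second resolution as a decision rule, here is a minimal Python sketch. It is not taken from any spec text or browser source. The function names are hypothetical, and "ID" is approximated by East Asian Width `W`/`F`, which is an assumption and only roughly matches the UAX #14 ID class. The sketch models only `line-break` values `normal`, `strict`, and `loose`, and ignores any content-language condition.

```python
import unicodedata

# The two ambiguous code points discussed in this issue.
HYPHENS = {"\u2010", "\u2013"}  # HYPHEN, EN DASH


def is_id_like(ch: str) -> bool:
    """Assumption: treat wide/fullwidth characters as a stand-in for class ID."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def may_break_before_hyphen(prev: str, hyphen: str,
                            line_break: str = "normal",
                            word_break: str = "normal") -> bool:
    """Return True if a break is allowed between `prev` and `hyphen`."""
    if hyphen not in HYPHENS:
        raise ValueError("only U+2010 and U+2013 are modelled here")
    if line_break not in ("normal", "strict", "loose"):
        raise ValueError("only normal, strict and loose are modelled here")
    # Resolution: disallow before the hyphen in normal and strict.
    if line_break != "loose":
        return False
    # Resolution: in loose, allow the break between ID and the hyphen.
    if is_id_like(prev):
        return True
    # Resolution: break-all makes Alphabetic behave like ID.
    return word_break == "break-all" and prev.isalpha()
```

Under these assumptions, a kanji followed by U+2010 is breakable only in `loose`, while a Latin letter followed by U+2010 stays unbreakable unless `word-break: break-all` is also set.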
…s-wg resolution, a=testonly Automatic update from web-platform-tests [css-text] Adjust test cases to match css-wg resolution See w3c/csswg-drafts#4419 -- wpt-commits: d2767c04559c016e04ad43fcc07f63f1153d18bf wpt-pr: 21626
…s-wg resolution, a=testonly Automatic update from web-platform-tests [css-text] Adjust test cases to match css-wg resolution See w3c/csswg-drafts#4419 -- wpt-commits: d2767c04559c016e04ad43fcc07f63f1153d18bf wpt-pr: 21626 UltraBlame original commit: 2fe9fdc7c1b09ffa178cc20496b02f1d7ee52878
@litherum found that Gecko handles U+2010 very nicely, and I'd like to consider using their idea.
Currently, the line-break property requires:
U+2010 and U+2013 are unified code points, so this may affect English words in an undesired way. I'm not sure whether it is intentional, but Gecko supports this only when they follow Japanese characters, and not when they follow Latin letters, regardless of the content language.
jsbin test
It looks to me that this is a very good idea. Maybe not applicable to all cases, but at least these two code points a) are unified and ambiguous, and b) prohibit break before, so looking at the previous character makes sense to me.
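To make the "look at the previous character" idea concrete, here is a toy Python sketch. It is not Gecko's implementation, which is not shown in this thread. The function names are hypothetical, and "Japanese character" is approximated by East Asian Width `W`/`F`, which is an assumption.

```python
import unicodedata

# The two unified, ambiguous code points from this issue.
AMBIGUOUS = {"\u2010", "\u2013"}  # HYPHEN, EN DASH


def follows_cjk(text: str, i: int) -> bool:
    """Assumption: 'follows a Japanese character' means the previous
    character has East Asian Width W (wide) or F (fullwidth)."""
    return i > 0 and unicodedata.east_asian_width(text[i - 1]) in ("W", "F")


def breaks_before_ambiguous(text: str) -> list[int]:
    """Return the offsets of U+2010/U+2013 before which this heuristic
    would allow a line break, ignoring the content language."""
    return [i for i, ch in enumerate(text)
            if ch in AMBIGUOUS and follows_cjk(text, i)]
```

With this heuristic, `co‐op` and `7–11` get no break opportunity before the hyphen or dash, while a dash that follows a kanji does.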
Note: the jsbin test includes U+2010 and U+2013 in common CJK fonts. It looks like fonts disagree on which code points have a full-width CJK glyph and which have a Latin glyph.
Thoughts?
/cc @fantasai @frivoal @emilio @jfkthame @drott