[css-text-3] Switch line-breaking handling of atomic inlines #4949

Open
fantasai opened this issue Apr 14, 2020 · 20 comments
Labels
css-text-4 i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. Needs Thought

Comments

@fantasai
Collaborator

CSS Text tried to define that atomic inlines behave like ID characters with respect to line breaking (e.g., a break between an atomic inline and a closing parenthesis or comma is forbidden). We had to alter that to allow breaks between atomic inlines and nbsp due to compat. More recently, #4576 found that there were sites depending on the "always breakable" behavior of atomic inlines with punctuation as well. This means that by default, atomic inlines need to always allow breaks before and after them, regardless of what character is there.

The problem is that this is an unnatural line breaking pattern for things like emoticons, gaiji, or other images that are intended to behave like text. We need some way to switch atomic inlines into this mode.
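
To make the contrast concrete, here is a minimal Python sketch of the two behaviors described above. It is a toy model, not a real line breaker: it covers only four UAX #14 classes (ID, OP, CL, GL), and the `OBJ` pseudo-class, the function names, and the `atomic_as_id` flag are hypothetical names invented for this illustration.

```python
# Toy model of break opportunities around an atomic inline.
# ID, OP, CL and GL are real UAX #14 class names; "OBJ" and every
# function/parameter name below are assumptions made for this sketch.

OBJ = "OBJ"  # stands in for an atomic inline such as an <img>


def pair_allows_break(before, after):
    """Simplified pair rules for the classes ID, OP, CL and GL only."""
    if before == "GL" or after == "GL":
        return False  # no break on either side of NBSP
    if after == "CL":
        return False  # no break before closing punctuation
    if before == "OP":
        return False  # no break after opening punctuation
    return True       # otherwise (e.g. ID followed by ID) a break is allowed


def break_allowed(before, after, atomic_as_id):
    """Decide whether a break is allowed between two adjacent items."""
    if OBJ in (before, after):
        if not atomic_as_id:
            return True  # legacy behavior: always breakable around the object
        # Opt-in behavior: the object is classified as ID.
        before = "ID" if before == OBJ else before
        after = "ID" if after == OBJ else after
    return pair_allows_break(before, after)


def break_opportunities(classes, atomic_as_id):
    """Return each index i where a break may occur after classes[i]."""
    return [
        i
        for i in range(len(classes) - 1)
        if break_allowed(classes[i], classes[i + 1], atomic_as_id)
    ]
```

In this sketch, `["OP", OBJ, "CL"]` (that is, "(IMG)") gets a break opportunity on both sides of the image in legacy mode and none when the image is treated as ID, while two adjacent images remain breakable in both modes.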

Two proposals were raised on the thread in #4576:

  • Re-use the inheritable line-break for this purpose: values other than auto (the initial value) treat atomic inlines like ID.
  • Introduce a new value for the non-inherited wrap-before and wrap-after properties in CSS Text Level 4 to make this distinction.

A third option would be to introduce yet another line-breaking property dedicated to this problem. A fourth option is not to solve it. (Personally I do not prefer these solutions, as we have way too many line-breaking controls already, and I do think this is a problem worth solving.)

What do we want to do here?

@fantasai
Collaborator Author

fantasai commented Apr 14, 2020

@frivoal opened PR #4755 for the first option, fwiw.

@xfq xfq added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Apr 14, 2020
@kojiishi
Contributor

My preference is 3rd atm.

Also fine with the 4th: defer until we hear actual voices wanting it. We have already heard some requests to customize line breaking behavior. By deferring, we will have the opportunity to design features that can satisfy more requests.

@litherum
Contributor

I have a slight preference for 2 because these existing properties are already designed to control line breaking around atomic inlines, so they are a natural place to put this functionality.

@fantasai
Collaborator Author

fantasai commented Apr 14, 2020

@litherum Note, the wrap properties control line breaking around any inline, not just atomic inlines.

@kojiishi
Contributor

I don't have good confidence whether wrap-before is the right way to go or not. Defining/disallowing break opportunities at element boundaries is a complex problem.

I prefer to customize line breaking classes of a character (or a meta character that represents atomic inlines in this case) then use UAX#14 rules as is. If that's not good enough, we can consider customizing UAX#14 rules. But I'm hesitating to define behavior at element boundaries.

@jfkthame I think what I described above is the same as what you described in some other issues about line breaking behavior at element boundaries, but I'm not certain if I understood your opinion there. WDYT about this issue?

@jfkthame
Contributor

I started writing a long comment here that I thought was going to be in support of PR #4755, but by the end I found myself thinking that maybe the wrap-* properties are the right solution. So I guess I'm leaning towards proposal 2.

Could we define additional values for wrap-* that don't directly force or prohibit a break, but instead override the line-breaking class of the inline? So wrap-before: ideographic would mean the element behaves as class ID for the purpose of determining whether a break is allowed before it, etc.

So the initial values wrap-before: auto; wrap-after: auto; would give the web-compatible legacy "always breakable" behavior for an inline image; wrap-before: ideographic; wrap-after: ideographic; would give the behavior CSS Text tried to specify as default (without needing any specific line-break value); and other values could be defined to correspond to other line-breaking classes if there's any demand for them.

(It'd be nice to have a shorthand that sets both wrap-before and wrap-after to a single value: img { wrap: ideographic; } to opt in to the ID-like behavior. But maybe wrap is too short and non-specific, given how many assorted line-breaking/wrapping controls we have.)

@kojiishi
Contributor

kojiishi commented Apr 30, 2020

I'm good with that too if wrap-* becomes what @jfkthame suggested. One minor thing though: can it be wrap-before: 'A' to make it "wrap like the character 'A'", instead of mapping to the line breaking class? This is easier for the ICU line breaker to handle.

@fantasai
Collaborator Author

fantasai commented May 6, 2020

Maybe wrap-as for the shorthand?

@fantasai
Collaborator Author

fantasai commented May 6, 2020

I think picking a character to emulate can be an implementation detail; as long as the UA picks a character from the correct category, there's no difference in behavior.
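
As a rough illustration of treating the emulated character as an implementation detail, the following Python sketch swaps each atomic-inline placeholder for a representative character before the text is handed to a string-based line breaker. The mapping table, the keyword name, and the function name are hypothetical. U+FFFC (OBJECT REPLACEMENT CHARACTER) is used here as the placeholder for an atomic inline, and U+4E00 is chosen as one possible character of UAX #14 class ID; any other ID character should behave the same.

```python
from typing import Optional, Sequence

# Placeholder assumed to mark an atomic inline in the text run.
OBJECT_REPLACEMENT = "\uFFFC"

# Hypothetical keyword -> representative character table.
# U+4E00 is a CJK ideograph, which has UAX #14 line-breaking class ID.
REPRESENTATIVE = {
    "ideographic": "\u4E00",
}


def text_for_line_breaker(text: str, wrap_as: Sequence[Optional[str]]) -> str:
    """Replace each placeholder with the character its keyword maps to.

    wrap_as holds one keyword (or None) per placeholder, in document order.
    Placeholders with no known keyword are left untouched.
    """
    keywords = iter(wrap_as)
    out = []
    for ch in text:
        if ch == OBJECT_REPLACEMENT:
            keyword = next(keywords, None)
            out.append(REPRESENTATIVE.get(keyword, ch))
        else:
            out.append(ch)
    return "".join(out)
```

Because each placeholder is replaced by exactly one character, break offsets reported for the substituted string line up with positions in the original text.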

@jfkthame
Contributor

jfkthame commented May 6, 2020

wrap-as sounds workable to me.

@css-meeting-bot
Member

css-meeting-bot commented May 6, 2020

The CSS Working Group just discussed Switch line-breaking handling of atomic inlines.

RESOLVED: add "wrap-as" and values, details TBD later

The full IRC log of that discussion
<astearns> topic: Switch line-breaking handling of atomic inlines
<astearns> github: https://github.com//issues/4949
<fremy> fantasai: we had defined atomic inlines to work like ideographic characters
<fremy> fantasai: but that is unfortunately not web compatible
<fremy> fantasai: even if this would be a nicer behavior
<fremy> fantasai: but since forever, atomic inlines have allowed breaking opportunities
<fremy> fantasai: so we accepted our fate
<fremy> fantasai: but there are use cases for the correct behavior though
<fremy> fantasai: so there was a question of how to switch to that behavior
<fremy> fantasai: line-break not being auto ===> atomic treated as ID
<fremy> fantasai: another option: wrap-before/after to control wrapping before a particular inline, so you could have values to prevent/avoid
<fremy> fantasai: one of them could be this smart behavior
<fremy> fantasai: so, do we want to introduce a switch of behavior toggle
<fremy> fantasai: and if so, which option?
<fremy> fantasai: an issue would be that this won't be very visible to most languages
<fremy> fantasai: and koji was afraid some people might set it, then have big effects for CJK languages
<fremy> fantasai: the other option is more targeted
<fantasai> s/big/subtle/
<myles> q+
<fremy> fantasai: but it has the downside you have to target each element independently
<astearns> q?
<astearns> ack fantasai
<Zakim> fantasai, you wanted to ask if 'contain:layout' trapping scroll snapping is actually what we wnt
<fremy> florian: one other issue is that the line breaking properties currently don't exist anywhere
<fremy> florian: so adding new behavior to them is wishful thinking
<koji> +q
<xfq_> ack my
<fremy> myles: in all the ebooks that use images-as-text I have seen, they use a class on these images
<astearns> ack myles
<faceless2_> +1 to myles
<fremy> myles: so the rule to target them all is very easy
<astearns> ack koji
<fremy> koji: in the github issue, we said it's fine with the property, but we want a different feature
<fremy> koji: the proposal was to pretend that atomic inline was a line-breaking class
<fremy> koji: and as we discussed in other issues, we have to resolve the ambiguity between elements boundaries
<fremy> koji: and maybe that should be discussed in that context
<fremy> koji: I like that idea that was proposed on github
<fremy> koji: I talked to ICU people to see if that would be possible
<fremy> koji: but that didn't get an approval
<faceless2_> q+
<fremy> koji: so they suggested to pick a specific character instead
<fremy> fantasai: I'm fine with selecting one specific character we consider to be representative of ID
<fremy> fantasai: it would be confusing for people to have to pick on char
<fremy> fantasai: the mapping can be implementation detail
<fantasai> s/on/a/
<fremy> koji: i agree
<astearns> ack faceless2_
<fremy> faceless2: I agree with koji, that proposal is quite flexible
<fremy> florian: if this means we are going to prioritize implementing these properties, I agree
<fremy> florian: but this is a very useful case for us
<fremy> florian: and just pushing it to a new level doesn't do much for us
<faceless2_> We've implemented already I believe.
<faceless2_> pending testing, of course...
<fremy> myles: priority of the feature > stage of the spec
<fremy> myles: we should design the feature well, not worry too much about which spec level we put things in
<fantasai> +1
<fremy> florian: yes, but what we are wanting to do is tie this to a new property nobody implemented
<fremy> florian: and we don't know if that property itself will survive or still function in the same way
<fremy> myles: I think it's true, but if this happens, we can revisit later
<fremy> astearns: I agree with myles here
<fremy> astearns: also, it's very separate to how line-break works today
<fremy> astearns: this extra switch doesn't sound like very good design to me
<fremy> florian: ok, I rescind my earlier comment
<fremy> astearns: sounds like we are in agreement to resolve to add one more value to wrap-before/after, which would specify which character we want to emulate
<fremy> astearns: is that correct?
<fremy> faceless2_: does that make sense as a single property?
<fremy> koji: yes, maybe we only want one property, a "wrap" shorthand
<fremy> fantasai: but he also mentioned that it was rather non-specific as a name
<fremy> fantasai: and could be confusing
<fremy> fantasai: also, this wouldn't encompass "wrap-inside"
<fremy> fantasai: but maybe "wrap-as: ideographic"
<fremy> koji: I like that naming
<fremy> koji: maybe we can have different ideas
<fremy> koji: but one nice thing is if you apply on an inline box, we can have each side apply to the first/last character of the inline
<fremy> fantasai: yes
<fremy> florian: I don't like wrap-as: avoid
<fremy> florian: maybe wrap-outside: ideographic/avoid ?
<fremy> fantasai: I like that
<fremy> fantasai: I am worried about changing the class of the chars though
<fremy> fantasai: because it also affects the breaking between first and second
<fremy> fantasai: so I would say "for the purpose of breaking before" the first character
<fremy> fantasai: (abc) + wrap-outside: avoid should not affect breaking between a and b
<fremy> koji: not sure I see what is wrong
<fremy> fantasai: because that is affecting the inside of the element
<fremy> fantasai: while we are trying to change the behavior outside
<fremy> koji: yeah i understood correctly
<fremy> koji: I have use case for that I think
<fremy> koji: elements never break, unless it's inline block
<fremy> myles: but this issue is about atomics?
<fantasai> https://www.w3.org/TR/css-text-4/#wrap-before
<fremy> fantasai: yeah but wrap-before applies to inlines too
<fremy> fantasai: so we need to define an effect for them as well
<fremy> koji: hence what I proposed
<fremy> fantasai: then I would prefer another property
<fremy> fantasai: I really don't find the proposal to change the breaking inside for changing the behavior outside
<fremy> astearns: and that would allow combinations too?
<fremy> fantasai: yes, but there is no combination that makes sense
<fremy> fantasai: (flex is special, and the others don't care about character class)
<fremy> fantasai: but if that's not possible to implement
<fremy> fantasai: then we need another property
<fremy> myles: yes, it's worth talking about implementability
<fremy> myles: when we compute the line breaking opportunities, we have a big string, and opportunities
<fremy> myles: the model we propose with before/after is not compatible with how line breakers work today
<fremy> myles: so I am in favor of a single property that works on both sides
<faceless2_> q+
<fremy> astearns: if it doesn't really make sense to have separate switches for both sides
<fremy> astearns: then a new property that affects both is better
<faceless2_> <p>a <span style="margin-left: 5em; white-space:pre">&#x0a;</span> b</p>
<astearns> ack faceless2_
<fremy> astearns: correct?
<fremy> faceless2_: we had one use case where this didn't apply to an atomic inline
<fremy> faceless2_: (...)_
<fremy> fantasai: yeah, I don't think we were proposing to remove the properties altogether
<fremy> fantasai: just that for the specific use case of atomic inlines, we should have a separate one
<faceless2_> My example above was a case where suppressing line-breaking before a non-atomic inline was useful - in that example we would want to prevent the break before the <span>, due to the force break inside it.
<fremy> astearns: ok, so what I am hearing is support for "wrap-as" with values for atomic inline
<fremy> myles: and editors need to figure out interactions with the rest
<fremy> fantasai: I don't think it
<fremy> fantasai: .... is too difficult
<fremy> koji: what about the values? a string would be nice?
<fremy> fantasai: I am ok with the spec behavior described as that
<fremy> fantasai: but I would rather specify keywords
<fremy> fantasai: that would map to some specific strings
<fremy> myles: was the proposal for the string to be a single char?
<fremy> myles: or "ideographic"
<fremy> koji: no, the char between quotes
<fremy> myles: then I think I agree with florian and fantasai
<fremy> fantasai: and I don't think people will even see this behavior as using ideographic
<faceless2_> -1000 to normal
<fremy> florian: "normal"?
<fremy> astearns: doesn't mean much to me though
<fremy> fantasai: I think it's decent name; "normal" is ID just because
<fremy> fantasai: it happens ID is the best char to map to to have the desired behavior
<fremy> astearns: proposed resolution is to add "wrap-as" and values, details TBD later
<fremy> astearns: RESOLVED: add "wrap-as" and values, details TBD later
<fremy> florian: level 3?
<fremy> fantasai: no ^_^

@css-meeting-bot
Member

The CSS Working Group just discussed Break.

The full IRC log of that discussion
<fremy> Topic: Break
<fantasai> I was thinking 'wrap-as: break-all | normal'
<fantasai> with break-all as the initial value
<fantasai> or something like that I guess it's not clear it only applies to objects
<fantasai> :/
<myles> i disagree with these names
<myles> we can discuss it in github i guess
<fantasai> myles, basically I think we should be clear with the initial value that it breaks everything
<fantasai> and that the other value is treating it as text-like
<myles> fantasai: how about `break-all | ideographic`
<fantasai> I don't like using ideographic because it sounds like the wrong thing to use for most people who will want it
<fantasai> It sounds like only CJK will want to use that value, but in fact it's useful in many more contexts...
<myles> i expect most people will want to use the break-all value
<fantasai> we didn't choose to emulate ID because of CJK, we chose to emulate ID because it happened to have the correct line-breaking behavior
<fantasai> myles, I don't think so
<fantasai> break-all is the default, but it doesn't give sensible behavior in running text
<fantasai> it breaks against nbsp
<fantasai> it breaks against )
<fantasai> it results in very awkward breaks if you actually use it in running text
<myles> right, most images are images. most images don't look like inline text
<myles> they should break on both sides by default
<jfkthame> advantage of `ideographic` is the clear mapping to the unicode line-break algorithm
<fantasai> jfkthame, yes, but that's helpful to implementers not to users :)
<jfkthame> we could use `ID` if you don't want it to sound so clearly CJK-ish
<myles> i think it's helpful to users. it tells them "what kind of text this image should behave as"
<fantasai> myles, most images aren't used as inline-level content in effect
<fantasai> myles, most people don't know about line-breaking rules for languages other than their own
<astearns> github: https://github.com//issues/4949
<fantasai> myles, ideographic is extremely cryptic
<jfkthame> in the event we add more values (e.g. like closing-punctuation, opening-punct, etc) we'll care about that mapping being clear
<myles> we may want to add "alphabetic" one day, and having it be `break-all | normal | alphabetic` doesn't make any sense
<fantasai> myles, to the extent that images are mixed just with other images, they will continue to break
<myles> right, and that's not a bug
<jfkthame> i fear that if we try to do something other than follow the unicode classes we may paint ourselves into an awkward corner
<fantasai> myles, to the extent that they're mixed with punctuation, they should follow kinsoku rules
<fantasai> myles, treating as ID does both of these things
<myles> only if they're supposed to be texty
<fantasai> myles, breaking "([image])" inside the parens is never ok
<myles> disagree
<TabAtkins> ScribeNick: TabAtkins
<myles> if the image is a picture of a tree
<myles> then i want it to break on both sides
<fantasai> why????
<TabAtkins> don't break the forest for the trees
<fantasai> that makes no sense
<myles> cause it doesn't look like text
<fantasai> you put it in parens
<fantasai> don't care what it looks like, I can't imagine anyone wanting that to break
<myles> that is how all browsers behave on all content today. hard to argue it isn't a sensible default
<fantasai> if you didn't put it in parens, whatever.
<TabAtkins> Yeah, having a ( at the end of a line, then the tree and ) at the start of the next line, seems like it would be broken-looking

@litherum
Contributor

litherum commented May 6, 2020

I'd like to propose the syntax wrap-as: normal | ideographic where normal is the initial value.

@kojiishi
Contributor

kojiishi commented May 7, 2020

> I'd like to propose the syntax wrap-as: normal | ideographic where normal is the initial value.

I like it. How about adding a few more? Not wanting to break before or after might be useful for some types of images.

It would also be great to define whether the table cell width calculation quirk should apply for values other than normal.

@fantasai
Collaborator Author

fantasai commented May 14, 2020

I don't like it, because ideographic does not convey the useful things to authors who don't use CJK. We chose to match ID class because it has the right behavior, not because we wanted to match CJK. This kind of naming is helpful to people who implement a line-breaker, not to authors using the property.

@fantasai
Collaborator Author

fantasai commented May 14, 2020

Alternate syntax: wrap-as: break-all | letter | word where 'letter' behaves like AL and 'word' behaves like ID.

'break-all' is the initial value for legacy reasons, and breaks against everything including nbsp. (I would not call the initial value's behavior "normal" except insofar as it's legacy behavior, it does very weird things when mixed with text. Breaking within "IMGnbspIMG" or "(IMG)" is very very weird.)
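
A small Python sketch of how these three keywords might differ in practice follows. It is only a toy under stated assumptions: the pair rules are loosely modeled on UAX #14 and omit most of it, and the table, the `IMG` marker, and the function names are hypothetical.

```python
# Hypothetical mapping of the proposed wrap-as keywords to UAX #14 classes.
# None means "no class": the image is breakable on both sides regardless.
WRAP_AS_CLASS = {"break-all": None, "letter": "AL", "word": "ID"}


def no_break_pair(before, after):
    """Toy rules only; real UAX #14 has many more cases."""
    return (
        "GL" in (before, after)                # nothing breaks against NBSP
        or after == "CL"                       # no break before ")"
        or before == "OP"                      # no break after "("
        or (before == "AL" and after == "AL")  # letters stay together
    )


def breaks(items, wrap_as):
    """items is a list of class names, with "IMG" marking an atomic inline.

    Returns each index i where a break is allowed after items[i].
    """
    cls = WRAP_AS_CLASS[wrap_as]
    result = []
    for i in range(len(items) - 1):
        pair = items[i:i + 2]
        if "IMG" in pair and cls is None:
            result.append(i)  # break-all: always breakable around the image
            continue
        before, after = (cls if c == "IMG" else c for c in pair)
        if not no_break_pair(before, after):
            result.append(i)
    return result
```

Under these toy rules, "IMGnbspIMG" and "(IMG)" break only with `break-all`, two adjacent images still break with `word` but not with `letter`, and an image following a letter is kept with it only under `letter`.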

@kojiishi
Contributor

I'm fine with either break-all or normal, but letter and word don't work well for scripts that do not use spaces to delimit words. Alphabetic and ideographic look more correct to me. @r12a, do you have suggestions?

As above, I would like to add open, close, and exclamation. Many chat apps use images for Emoji, and exclamation mark images are common. There are ~40 classes in UAX#14, probably all of them are too much, but these 3 are useful.

@kojiishi
Contributor

> I don't like it, because ideographic does not convey the useful things to authors who don't use CJK.

We can add an emoji alias if people outside CJK don't seem to understand how ideographic would behave.

@kojiishi
Contributor

graphic-symbol, from wikipedia.

@fantasai fantasai removed the css-text-3 Current Work label May 25, 2020
@frivoal
Collaborator

frivoal commented Oct 20, 2023

@kojiishi Do you think we need the subtle differences between the CL, CP, and EX classes? Theoretically, I suppose we could come up with use cases for most UAX14 classes (as you could have picture-based representation of mostly anything), but in practice, it seems to me that one type of closing (probably based on CL) is good enough, and it is much simpler for authors to only need to deal with one.

8 participants