{granularity: "line"} promotes reimplementing paragraph layout in script #49
Indeed, I thought it would be useful for cases where HTML and CSS don't quite cut it by themselves, for example certain rich text editors. I hope people don't use it where it is unnecessary. Maybe we can be more explicit about this in the documentation. In the context of the ongoing development of Houdini custom layout APIs, it seems only right that we expose the primitives for line breaking. As for accessibility, it is definitely true that extra work would be required to retain accessibility if a site does custom line breaking, but do you see the platform as missing any primitives to implement it? |
If I understand correctly, this can be used not only in Web APIs but also in Node.js for line-by-line text processing. |
Node.js modules are not web standards. Historically, “I hope people don’t use it for cases that are unnecessary” hasn’t worked on the Web. People abuse our APIs all the time. One difference between the Houdini APIs and this one is that the Houdini APIs improve your ability to do something you were already able to do before. Before custom paint, there was canvas; before custom layout, there was absolute positioning, etc. There are currently no facilities on the web for performing line breaking. |
@litherum Well, this subsumes Chrome's non-standard It's possible for JavaScript itself to implement line breaking without platform support, and there are many npm modules which do so, such as css-line-break. |
I do not think the Web platform should monopolize JavaScript or veto new features unless they break the Web. JavaScript is a general-purpose language now, and maybe one of the Perl replacements for text processing. So it would be a pity to limit its possibilities with perpetual caution against possible misuse on the Web. |
I don't think we have to think of this as either-or. We have strong motivating use cases both on and off the web platform. |
Consider the following scenario: assume we eventually remove {granularity: "line"} from this API. Those who WANT to lay out lines themselves in JS can still misuse {granularity: "word"} in place of the missing {granularity: "line"}. They will still be able to do the layout they want, but with a worse result: it will sort of work on English / Latin-based pages but will poorly support Japanese / Chinese pages. Isn't that an even worse API to promote? |
The objection to supporting {granularity: "line"} is based on the assumption that ECMA-402 always operates in a world that also has HTML and CSS. But that is simply not true. ECMA-402 as a whole (see https://ecma-international.org/ecma-402/) has no notion of CSS or HTML; it is a lower-level library with fewer constraints than functionality in a CSS/HTML-based environment. For example, all ECMA-402 functionality is accessible inside a web worker, and a web worker has no access to the DOM or CSS. With {granularity: "line"} support in Intl.Segmenter, JavaScript in a web worker could break lines based on Intl.Segmenter and other constraints (not necessarily font metrics or width in pixels). Also notice that not all web applications operate under HTML and CSS, and in those environments HTML+CSS are simply not accessible. For example, let's say a user wants to support Chinese / Japanese rendering in https://delphic.me.uk/tutorials/webgl-text |
Another example is JavaScript in a web worker receiving text from the UI thread and creating a PDF file as output. The rendering context will be PDF, not HTML with CSS. It will require line-breaking support for line layout, but it cannot depend on HTML/CSS, since there is no need for HTML + CSS to generate a PDF file. For example, take a look at https://parall.ax/products/jspdf . jsPDF needs to lay out lines of text into PDF, not into HTML, and since jsPDF is implemented in JavaScript it can access ECMA-402. I am not sure it currently supports multi-line layout, but if one day it wants to, it will need line-breaking support that is not bound to HTML+CSS. For example, switch to the "String Splitter" example on that page to see such usage. It currently won't display Chinese/Japanese correctly, but that is because Chinese / Japanese fonts are not pre-installed with the PDF viewer, which could be addressed by embedding fonts. |
This sounds like an argument for not having You're right that people can do line segmentation badly even without My proposal doesn't remove functionality that apps have today. It simply doesn't cater to the problematic use cases.
We can all agree that the Web is a major client ( I'm not arguing against having libraries that do line breaking. I'm arguing that it shouldn't be part of the standard library.
By using the
All major browsers / operating systems support printing to PDF. I don't understand this use case. |
So you now agree that your concern that users may misuse it to do line segmentation badly is NON-UNIQUE, because such behavior could happen with or without this function in the library, right? In other words, regardless of whether we add this function to the library or not, the addition won't produce any worse result for the web.
|
I prefer keeping line break in the proposal. Reasons for my position include:
Our official charter says:
It says nothing about the browser versus Node.js, just "programs". |
A number of us from Google, Apple, Mozilla and Igalia got together and discussed this issue. We concluded that line breaking would be better developed as a Houdini API, since it inherently has to do with paragraph layout. I plan to follow up with a PR to remove line breaking from this proposal. |
It sounds like a lot of smart people reached this conclusion, so I'll defer to that, even though I'm not sure I agree. However, in general I'm not comfortable with a subcommittee of a subcommittee making a major decision without conferring back for consensus to the full subcommittee. |
+1 we need to discuss this in the Intl meeting before making the spec change. |
I won't repeat the points made by @FrankYFTang and @vsemozhetbyt .
I don't get why having line-breaking support is incompatible with the web. I wouldn't characterize opening the door to possible misuse as 'incompatible'. |
There’ll be less risk of such an API being misused once the Web Platform provides a dedicated text/font layout API (outside of ECMAScript). If we exclude |
If the conclusion is to remove it, I would propose we also make a change so that any JavaScript engine that decides to still ship { granularity: 'line' } will still be considered conforming to the ECMA-402 spec if the browser chooses to do so. |
I strongly object to this. Interoperability of implementations is critical. |
Then we must fight tooth and nail to keep { granularity: 'line' } in the spec, so our browser can ship it with conformance and interoperability while also fulfilling customer needs. We really have to ask ourselves one important question: are we taking a humble position and trusting that developers will do the right thing when calling these APIs, or taking an arrogant position and assuming developers are likely to misuse them? I believe that depending on CSS/HTML to solve every line layout is reasonable in most cases, but in reality it does not necessarily fulfill every requirement our developers have. For example, some developers may need to lay out lines of text themselves because they have to mimic the line layout behavior of proprietary software that has dominated the market for several decades, and it may not be practical to turn all of that software's behavior (even the buggy parts) into a standard (since it is buggy, and we should not encourage such misuse). |
IIRC your browser already ships this without conformance or interoperability. |
OK, let's discuss this further in the next Intl meeting. Let's see if we can bring in some more attendees of that breakout to explain the logic. Cc @tabatkins Note that supporting an additional granularity would be a spec violation, as GetOption is specified to throw an exception on an invalid option (but browsers have tons of known spec violations, so maybe this isn't the end of the world). If we end up going with the line segmentation API deferred until Houdini, I would suggest that Chrome might want to keep its legacy Intl.v8BreakIterator around a bit longer, rather than adding an additional non-standard API. |
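To make the GetOption point concrete, here is a small probe. It is only a sketch, and it assumes a runtime that ships the standardized `Intl.Segmenter`, which as finally specified accepts "grapheme", "word" and "sentence" but not "line". The `probeGranularity` helper is hypothetical, introduced here purely for illustration.

```javascript
// Hypothetical helper: returns the name of the error thrown when constructing
// a segmenter with the given granularity, or null if construction succeeds.
function probeGranularity(granularity) {
  try {
    new Intl.Segmenter("en", { granularity });
    return null;
  } catch (err) {
    return err.name;
  }
}

// Assumption: the runtime implements the standardized Intl.Segmenter.
const supported = ["grapheme", "word", "sentence"].map(probeGranularity);
const unsupported = probeGranularity("line");

console.log(supported);   // one null per accepted granularity
console.log(unsupported); // the error name for the rejected value
```

An engine that silently accepted "line" here would be the kind of spec violation described above, which is why keeping a separately named legacy API around is the less disruptive option.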
If this functionality ends up being part of Houdini, it seems to follow that it won’t be available natively in non-browser envs, correct? It’d continue being userland implementations? |
You recall incorrectly. |
@mathiasbynens I read that as a reference to Intl.v8BreakIterator. |
So... one main reason @litherum proposes to drop "line" is that he is afraid of "misuse by web developers" and that it is "incompatible with the web platform"
|
So, here's the deal. Of the four possible break/segment types, three of them give a useful semantic meaning to the segments between breaks - graphemes, words, and sentences are all meaningful units on their own. One, the line-break type, does not have a meaningful semantic for the segmentation; the stuff between successive breaks are just "fragments of text that linebreak atomically". The upshot of this is that the three "semantic" categories are useful for lots of things beyond layout. You can do word counting, or highlight the entire sentence that a find-in-page match is in, etc. A lot of these things can be done purely in JS or with DOM operations, not invoking the layout engine at all. The line segments, however, don't have this. The sole use for these segments is to collect a sequence of them, see if they'll fit into a container without breaking, and then add more until they do (then back off one to have "a line worth" of text). This operation has no meaning without layout of some kind; you can't tell just by looking at the segments how many you'll need, you need to lay out the text and take measurements. Unless you have monospace text (or something like a dot-font for an LED display that has known character widths and no kerning), doing this requires you to invoke a text layout engine, and thus (generally speaking) requires a browser doing layout. Further, if you are doing text layout like this, you can't even use these line-break segments for it. Actually laying out text properly requires, for example, properly handling bidi - this means you'll need to skip around in the segment list in a somewhat complicated way that does not match the indexing. Giving people an API that acts like you can just accumulate segments in index order is handing them a footgun. Even further in the same concern, note that there is a special argument (line break style) intended solely for configuring the "line" break type; it has no meaning for the other break types. 
This reproduces the effect of one of CSS's properties for controlling line-breaking; as you can see from https://drafts.csswg.org/css-text/ and https://drafts.csswg.org/css-text-4/, however, there are many options that you have to worry about if you want to linebreak and lay out text professionally - whitespace handling, emergency breaking, hyphenation, etc. So the proposal as it stands carves out a funny special-case for the "line" break style only, and does so very incompletely. This is not what we want to leave authors with! This is the difference Apple is concerned about. If the sole use-case for the "line" break type is to do text layout, and there are zero use-cases for it outside of this, and it doesn't even let you do text layout well in the first place, then why is it here? This functionality belongs in a properly designed text-layout-related API, like Houdini's inline layout stuff coming down the pipeline, which can handle all the weird corner cases properly for you and provide you with all the information you actually need (instead of silly hacks like doing actual text measurement in an off-screen iframe...). Then Intl.Segmenter can remain a more narrowly-focused and coherent design capturing actual semantic groupings of text, each with a multitude of uses outside of text layout. |
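As a rough illustration of the "semantic" uses mentioned above (word counting, finding the sentence around a match), here is a sketch that needs no layout engine at all. It assumes a runtime with the standardized `Intl.Segmenter`; `countWords` and `sentenceContaining` are hypothetical helper names, not part of any API.

```javascript
// Hypothetical helper: counts word-like segments, skipping spaces and punctuation.
function countWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) count += 1;
  }
  return count;
}

// Hypothetical helper: returns the sentence segment covering the code unit at
// `matchIndex` (e.g. the start of a find-in-page hit), or undefined if the
// index is out of range.
function sentenceContaining(text, matchIndex, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const hit = segmenter.segment(text).containing(matchIndex);
  return hit ? hit.segment : undefined;
}

const sample = "First one. Second one here. Third.";
console.log(countWords(sample));
console.log(sentenceContaining(sample, sample.indexOf("Second")));
```

Neither function measures anything or knows about fonts, which is the contrast being drawn with line-break segments.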
Notice that when the web was invented and used with HTML3, none of these "many options that you have to worry about" were addressed, yet it was still widely used, and it did handle line breaking for different languages. This history shows that breaking lines in a linguistically correct way is far more important than handling all the secondary style issues you mentioned above. So it is surely important to provide this functionality on its own for the use case of an important existing web API: the https://www.w3.org/TR/2dcontext/ . That does not mean the issues you mentioned are unimportant, but they are far less important than linguistically correct line breaking. They also do not need to be addressed at the same time (just as browsers supporting HTML3, not even HTML4, were widely accepted without them 20 years ago). litherum's starting argument is "The only use case I can imagine for line break iterators would be people trying to do their own paragraph layout themselves (e.g. eventually painting into a canvas)." I cannot see why that use case is unimportant or inappropriate. All major browsers support canvas, and it is a W3C standard. Why should we remove this facility and fail to support a use case that all browsers, including Apple's, already support? What is wrong with increasing such usage on the web? There are complex use cases and there are simple ones. The existence of complex use cases should not be a reason to ignore the need to address simple ones. And most users fall under the simple use cases anyway. litherum said "The best way to perform paragraph layout in a browser ... " |
Note that
If we're shipping this feature with the goal of matching HTML3-era text layout, then, uh, I think we're doing everyone a huge disservice. That said, HTML3 still had far more sophisticated text layout than what Remember, we're not saying never do text layout in JS! We have proper text-layout APIs coming down the pipeline right now - they'll likely be finished and shipping in a year or two! It's just that this specific API is very, very weak for text-layout use-cases, but has no other use beyond text-layout, so it doesn't actually pay for itself, and doesn't have a growth path to become more suitable in the future. (That is, an actual usable text-layout API will look totally different; there's no evolutionary path available here.) |
FWIW, I would not say line breaks are all layout and no semantics. As for human-readable texts, I can think of poetry (where not so rarely line breaks are the only formal prosodic, semantic and syntactic delimiters). Also, think of various lists, tables of contents etc. As for semi-human-semi-machine-readable texts, you can think of various line-delimited configs or DSL formats (for example, in some digital dictionary formats, line breaks have key meaning). And JavaScript is not a web-only language anymore. |
|
I do not think this is an all-changing difference in the mentioned contexts. We cannot predict all the possible ways and links between opportunities and realization. |
What can I use to reformat hard-broken plain text outside of Web rendering instead of this API? |
It is indeed an all-changing difference, because you're talking about two 100% different things. "Line breaks", as in places where a line is purposely broken, are indeed meaningful; they're also indicated by a line-break character. The API under discussion has nothing to do with that; it tells you where, in a string of text, it's appropriate to insert a soft line break and continue on a further line. In English, for example, if we ignore hyphenation entirely, this will roughly divide a string into words, plus some additional breaking around punctuation like dashes. Like, given the string "The over-world beckons.", it would return a sequence like |
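For readers who want to see what "soft break opportunities" look like, here is a deliberately naive sketch. It is not the output of `Intl.Segmenter` or of any real line-breaking algorithm (Unicode's rules are far richer); `naiveBreakSegments` is a hypothetical helper that only assumes a break is allowed after whitespace and after a hyphen.

```javascript
// Hypothetical helper, NOT any Intl API: splits text at positions that follow
// whitespace or a hyphen and precede a non-whitespace character.
// This ignores nearly everything a real line breaker handles (CJK, URLs,
// non-breaking spaces, locale rules, and so on).
function naiveBreakSegments(text) {
  return text.split(/(?<=[\s-])(?=\S)/).filter((s) => s.length > 0);
}

console.log(naiveBreakSegments("The over-world beckons."));
// ["The ", "over-", "world ", "beckons."]
```

The segments concatenate back to the input, and none of them is a meaningful unit on its own: "over-" is just a fragment that must not be split further when wrapping.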
Not this, except in special circumstances. If you're trying to reformat monospace "plain text", with absolutely no style variations or bidi or anything, then this API is sufficient. You just collect as many segments as will fit on a line. If your font is variable-width, this is insufficient. You have to pair it with If you have anything more complex than this, this API does not help at all. The future Houdini Text Layout API will do the job. |
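The monospace case described above ("collect as many segments as will fit on a line") can be sketched as a greedy fill. This is an illustration under stated assumptions, not a recommended layout routine: `fillLines` and its `measure` callback are hypothetical, the segments are supplied by hand, and the default measurement counts UTF-16 code units, which only suits monospace text without wide or combining characters.

```javascript
// Hypothetical greedy line filler over pre-computed break segments.
// `measure` is an assumed callback returning the width of a string; the default
// (string length) is only meaningful for simple monospace text.
function fillLines(segments, maxWidth, measure = (s) => s.length) {
  const lines = [];
  let current = "";
  for (const segment of segments) {
    const candidate = current + segment;
    // Trailing whitespace is trimmed before measuring, so it may hang past the margin.
    if (current !== "" && measure(candidate.trimEnd()) > maxWidth) {
      lines.push(current.trimEnd());
      current = segment; // start the next line with the segment that did not fit
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current.trimEnd());
  return lines;
}

const lines = fillLines(["The ", "over-", "world ", "beckons."], 10);
console.log(lines); // ["The over-", "world", "beckons."]
```

Note what is missing: a segment wider than `maxWidth` simply overflows (no emergency breaking), there is no hyphenation, and segments are consumed strictly in index order, so bidi text is not handled. Those gaps are the ones the comment above says a real text layout API has to cover.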
So the question is: is a possible danger of web abuse more significant than the mentioned API sufficiency for common cases in the wider context to such an extent that we need to remove it from the language level and outsource to some web framework? |
"common cases" is overselling, I think. ^_^ And no, abuse is one aspect of it. It's also simply insufficient to do reasonable text layout. The core thing it's moderately useful for would be to lay out text in a monospace console, and even then, as stated up in my first post, it doesn't provide enough knobs to do that well. (It only has a single knob to twiddle, the equivalent of the This API simply isn't fit for purpose, as far as I can tell. It exposes a single aspect of a larger problem, but without providing a more complete solution for that larger problem, this doesn't do anything sufficiently useful to justify specifying, implementing, testing, and shipping it. |
@tabatkins thanks for explaining the reasons so clearly. Incidentally,
is a good description of typical box-drawing in a terminal, which is what made this interesting to me for Node usage. But the argument that it doesn’t belong alongside the other segmentation functionality on Intl (and may be inadequate in any case) is convincing. |
EDIT: See following comment. It appears that the Docs team had used v8BreakIterator for this purpose in the past, but is no longer using it for line breaks. |
Line breaking iterator alone is never sufficient for line-layout. It's just one piece of information for properly laying out paragraphs and other forms of text. Obviously, it has to be used together with font / text measurement. Those against including line breaking argued that including line breaking would lead to misuse/abuse. Well, my prediction is that NOT including line breaking will lead some folks to come up with their own device using word/grapheme break iterators included in Intl.Segmenter. And, the result would be worse. As for https://drafts.css-houdini.org/, when is it expected to be spec'd out and implemented by major players? Font metrics API proposals have come and gone since 2010... BTW, the current CSS line breaking does not work well for multi-line heading, movie/song/book titles, product names, ad copy for CJK because for those applications, part-of-speech tag is also necessary and browsers are not likely to have that info available anytime soon, which means whatever Houdini does is not sufficient. Segmentation along with PoS has to be combined with font/text measurement to support a satisfactory line-breaking for multi-line heading, song, movie titles and ad copy. See some examples at https://github.com/google/budou . |
Correction: with Jungshik's help, I did some more investigation of Google code, and it seems that although Docs may have used v8BreakIterator for line breaking in the past, I cannot find any instances right now where they are currently using it in line break mode. |
The only use case I can imagine for line break iterators would be people trying to do their own paragraph layout themselves (e.g. eventually painting into a canvas).
The best way to perform paragraph layout in a browser is to use HTML elements and CSS. An author trying to do it themselves in JavaScript would almost certainly end up with something slower, less correct, and less accessible than doing it with the browser's engine.
This probably isn't true for the other segmenters - I can think of plenty of use cases for the other ones, but if there is wide adoption of line breaking, specifically, it would be unfortunate for the Web.