Commit

[css-text-3] Undefine segment break transformation rules in Level 3. #…
fantasai committed Nov 18, 2020
1 parent d3e4c4a commit b3bb0ed
Showing 1 changed file with 35 additions and 27 deletions.
62 changes: 35 additions & 27 deletions css-text-3/Overview.bs
@@ -2048,13 +2048,18 @@ Order of Operations</h4>
white-space-processing-018.xht
</wpt>

<p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>.
Any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
<p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>,
and are collapsed as follows:

<ol>
<li>First, any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
is removed.
Then any remaining <a>segment break</a> is
<li>Then any remaining <a>segment break</a> is
either transformed into a space (U+0020) or removed
depending on the context before and after the break:
depending on the context before and after the break.
The rules for this operation are UA-defined in this level.

<!-- CUT SEGMENT BREAK TRANSFORM
<wpt pathprefix="/css/vendor-imports/mozilla/mozilla-central-reftests/text3/">
segment-break-transformation-removable-2.html
segment-break-transformation-removable-4.html
@@ -2082,7 +2087,7 @@ Order of Operations</h4>
<li>Otherwise, if both the characters before and after the [=segment break=]
belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
then the [=segment break=] is removed.
<!--
<li>Otherwise, if the <a>East Asian Width property</a> [[!UAX11]] of both
the character before and after the [=segment break=] is
<code>Fullwidth</code>, <code>Wide</code>, or <code>Halfwidth</code>
@@ -2170,7 +2175,6 @@ Order of Operations</h4>
<wpt>
writing-system/writing-system-segment-break-001.html
</wpt>
-->
<li>Otherwise, the [=segment break=] is converted to a space (U+0020).
<wpt>
@@ -2183,18 +2187,25 @@ Order of Operations</h4>
</wpt>
</ul>
<!--
<p>
For this purpose,
Emoji (Unicode property <code>Emoji</code>)
with an <a>East Asian Width property</a> of
<code>Wide</code> or <code>Neutral</code>
are treated as having an <a>East Asian Width property</a> of
<code>Ambiguous</code>.
-->
Note: The white space processing rules have already
ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
CUT SEGMENT BREAK TRANSFORM -->

Note: The white space processing rules have already
removed any [=tabs=] and [=spaces=] around the [=segment break=]
before these checks take place.
before this context is evaluated.
</ol>

<div class="example">
The purpose of the segment break transformation rules
@@ -2210,9 +2221,10 @@ Order of Operations</h4>
Here is an English paragraph
that is broken into multiple lines
in the source code so that it can
more easily read in a text editor.
be more easily read and edited
in a text editor.
</pre>
<p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read in a text editor.</p>
<p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read and edited in a text editor.</p>
<figcaption>
Eliminating a line break in English requires maintaining a [=space=] in its place.
</figcaption>
@@ -2233,21 +2245,16 @@ Order of Operations</h4>
</figcaption>
</figure>

The segment break transformation rules thus use adjacent context
The segment break transformation rules can use adjacent context
to either transform the segment break into a space
or eliminate it entirely.
</div>

<p class="feedback issue">Comments on how well these rules would work in practice would
be very much appreciated, particularly from people who work with
Thai and similar scripts.
Note that browser implementations do not currently follow these rules consistently
(although IE does in some cases transform the break,
and Firefox follows the first two bullet points).</p>

ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?

ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
Note: Historically, HTML and CSS have unconditionally converted [=segment breaks=] to spaces,
which has prevented content authored in languages such as Chinese
from being able to break lines within the source.
Thus UA heuristics need to be conservative about where they discard [=segment breaks=]
even as they strive to improve support for such languages.

<h3 id="tab-size-property" caniuse="css3-tabsize" oldids="tab-size">
Tab Character Size: the 'tab-size' property</h3>
@@ -5921,6 +5928,7 @@ Characters and Properties</h2>
but take their other properties from the first combining character in the sequence.
</ul>

<!-- CUT SEGMENT BREAK TRANSFORM
<h2 id="space-discard-set" class="no-num">Appendix F.
Space-Discarding Unicode Characters</h2>
@@ -6069,15 +6077,15 @@ Space-Discarding Unicode Characters</h2>
the Unicode Consortium will recognize the need for an “unbreaking” algorithm
and take over maintenance of such.
<!-- things that could use an unbreaking algorithm:
things that could use an unbreaking algorithm:
* HTML/CSS
* Markdown
* TeX
* text editors' “unbreak lines” commands
-->
</details>
CUT SEGMENT BREAK TRANSFORM -->

<h2 id="script-tagging" class="no-num">Appendix G.
<h2 id="script-tagging" class="no-num">Appendix F.
Identifying the Content Writing System</h2>

<p><em>This appendix is normative.</em></p>
@@ -6187,7 +6195,7 @@ Identifying the Content Writing System</h2>
Note: Mere omission of the [=writing system=] information when the [=content language=] is declared
means that the [=writing system=] is implied, not unknown.

<h2 id="small-kana" class=no-num>Appendix H.
<h2 id="small-kana" class=no-num>Appendix G.
Small Kana Mappings</h2>
<style>
.pairs-table th {
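
Illustrative note (not part of the commit): the two-step collapsing that the first hunk describes, where repeated segment breaks are dropped and each remaining break is then turned into a space or removed, can be sketched as below. This is a minimal sketch under stated assumptions, not the spec's algorithm: segment breaks are assumed to be already normalized to "\n", tabs and spaces around them are assumed to be already removed, and the names `collapse_segment_breaks`, `always_space`, and `discard_between_wide` are hypothetical. Since the revised text leaves step 2 UA-defined, the policy is a pluggable function. `always_space` mirrors the historical behavior mentioned in the new Note, and `discard_between_wide` is only a simplified stand-in loosely based on the East Asian Width rule that this commit comments out.

```python
import re
import unicodedata


def always_space(before: str, after: str) -> str:
    """Historical behavior: every remaining segment break becomes U+0020."""
    return " "


def discard_between_wide(before: str, after: str) -> str:
    """Hypothetical UA heuristic (an assumption, not mandated by the spec):
    remove the break only when both neighbors have an East Asian Width of
    Fullwidth, Wide, or Halfwidth; otherwise fall back to a space."""
    wide = {"F", "W", "H"}
    if (
        before
        and after
        and unicodedata.east_asian_width(before) in wide
        and unicodedata.east_asian_width(after) in wide
    ):
        return ""
    return " "


def collapse_segment_breaks(text: str, policy=always_space) -> str:
    # Step 1: a segment break immediately following another one is removed.
    text = re.sub(r"\n+", "\n", text)

    # Step 2: each remaining break is replaced by whatever the policy
    # returns, based on the characters before and after the break.
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        out.append(policy(before, after))
    return "".join(out)
```

With the default policy, the English example from the diff keeps a space where each line break was. With the hypothetical wide-character policy, a break between two CJK ideographs is dropped while a break between Latin letters still becomes a space, which is the conservative fallback the new Note asks UAs to keep in mind.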