Clarify whitespace handling when xml:space="default" #224

Closed
chrisb-bbcrd opened this issue Mar 27, 2017 · 8 comments

@chrisb-bbcrd
commented Mar 27, 2017

A feature of the fillLineGap example file (example 7) in IMSC1.0.1 has raised a question regarding the handling of whitespace, which @nigelmegitt has suggested I raise here to get clarification.

Example 7 has tabs at the end of some of its lines. The section I'm particularly interested in is as follows (with tabs shown as '\t'):

[...]
<span style="spanStyle">jumps over the </span><span style="spanStyleSmall">lazy</span><span style="spanStyle"> dog</span><br/>\t\t\t\t
\t\t\t\t<span style="spanStyle">##Line gaps##</span>
[...]

Between the <br/> and the last span in this section we have an anonymous span with the following text:

"\t\t\t\t
\t\t\t\t"

As I read the specs, the linefeed-treatment and white-space-collapse rules apply as follows:

  1. Replace newline by space:

"\t\t\t\t \t\t\t\t"

  2. Collapse down the whitespace, leaving the initial tab:

"\t"

Then, when it comes to line building, the last line of the block will contain the final span ("##Line gaps##") preceded by the single remaining tab character. According to the suppress-at-line-break="auto" rules, only space (U+0020) characters have a value of 'suppress' applied to them. Thus, the white-space-treatment="ignore-if-surrounding-linefeed" rules won't remove the tab at the start of this final line, and the line is rendered with an indent.
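
The line-edge behaviour described above can be sketched in a few lines of Python. This is only an illustration of the reading given in this comment, not code from any TTML or XSL-FO implementation; the function name `suppress_at_line_edges` is hypothetical, and the sketch assumes that only U+0020 is suppressed at the start and end of a line.

```python
SPACE = "\u0020"


def suppress_at_line_edges(line: str) -> str:
    """Drop only U+0020 at the start and end of a line.

    Hypothetical helper: models suppress-at-line-break="auto" as read in
    this issue, where every character other than U+0020 is retained.
    """
    return line.strip(SPACE)


# A leading tab is not U+0020, so it survives and the line stays indented.
indented = suppress_at_line_edges("\t##Line gaps##")
# A leading space is suppressed, so the line starts flush.
flush = suppress_at_line_edges(" ##Line gaps##")

print(repr(indented))  # '\t##Line gaps##'
print(repr(flush))     # '##Line gaps##'
```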

If the sequence I've just outlined is the correct interpretation of the specs, then Fig. 1, showing how the lines should be rendered, is wrong, as the last line in each image should be indented.

The question is: Is this the correct interpretation of the specs?

@skynavga

Contributor

commented Mar 27, 2017

@chrisb-bbcrd

Author

commented Mar 28, 2017

[...]

No. The sequence of tabs and the space from the newline are collapsed to (SPACE), and not (HT). See [1] for details.

[1] https://www.w3.org/TR/xsl/#white-space-collapse

That doesn't seem to correspond with the rules in the referenced section:

Specifies, for any character flow object such that:

  • its character is classified as white space in XML, and
  • it is not, however, a U+000A (linefeed) character, and
  • the immediately preceding flow object is a character flow object with a character classified as white space in XML or the immediately following flow object is a linefeed,

that flow object shall not generate an area.

So whitespace characters (other than linefeed) don't generate an area if the immediately preceding character is another whitespace character. In the string above ("\t\t\t\t \t\t\t\t"), that applies to every character except the first tab; therefore what remains after this rule is applied is a single tab, is it not?
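
To make that reading concrete, here is a minimal Python sketch that applies the quoted rule character by character. It is an assumption-laden illustration, not a reference implementation: the names `XML_WHITESPACE`, `linefeed_to_space` and `collapse` are hypothetical, each character is treated as one flow object, and "preceding" is taken to mean the preceding character in the input sequence, whether or not that character itself generates an area.

```python
# Characters classified as white space in XML: space, tab, CR, LF.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def linefeed_to_space(text: str) -> str:
    """Model linefeed-treatment="treat-as-space": each U+000A becomes U+0020."""
    return text.replace("\n", " ")


def collapse(text: str) -> str:
    """Apply the quoted white-space-collapse rule literally.

    A non-linefeed whitespace character generates no area when the
    preceding character is whitespace or the following one is a linefeed.
    """
    kept = []
    for i, ch in enumerate(text):
        if ch in XML_WHITESPACE and ch != "\n":
            prev_is_ws = i > 0 and text[i - 1] in XML_WHITESPACE
            next_is_lf = i + 1 < len(text) and text[i + 1] == "\n"
            if prev_is_ws or next_is_lf:
                continue  # this flow object does not generate an area
        kept.append(ch)
    return "".join(kept)


anonymous_span = "\t\t\t\t\n\t\t\t\t"
result = collapse(linefeed_to_space(anonymous_span))
print(repr(result))  # '\t'
```

Under these assumptions the surviving character is the first tab rather than a space, which is the point being argued in this comment.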

@skynavga

Contributor

commented Mar 28, 2017

@palemieux

Contributor

commented Mar 29, 2017

Is this the correct interpretation of the specs?

Looks like it. I suggest filing a bug against Example 7 to remove tabs.

@nigelmegitt

Contributor

commented Mar 29, 2017

This is scheduled for discussion in tomorrow's TTWG call. I suspect we probably need a bug against TTML1 and TTML2 which are the same in this respect, if this behaviour is not actually what we want.

Alternatively we may reasonably conclude that what we have is deterministic and that the only improvement needed is some informative explanation to warn people about this particular scenario.

@palemieux

Contributor

commented Mar 29, 2017

@nigelmegitt

Contributor

commented Mar 30, 2017

Meeting 2017-03-30: All agreed to fix the example to remove the tabs.

There appears to be a discrepancy between the spec detail and what implementations do; it is @skynavga's view that implementations such as Antenna House and FOP would not present anything for the first tab, so if this differs from what the specs say then something needs to be clarified in TTML1 and TTML2. Having investigated during the meeting, we found that the XSL-FO attributes originated in CSS2, yet they appear to differ from the CSS2 white-space: normal; property.

Aside from fixing the example in IMSC @skynavga will raise issues on TTML1 and TTML2 and communicate out to other experts who may be able to assist in the correct interpretation. It could be that some changes are needed to clarify this in TTML.

@palemieux

Contributor

commented Mar 30, 2017

Filed #225

@palemieux palemieux closed this Mar 30, 2017

fdo-mirror pushed a commit to freedesktop/gstreamer-gst-plugins-bad that referenced this issue Apr 20, 2017

ttmlparse: Convert tabs to spaces in input
The TTML spec has an issue in which tab (U+0009) characters that are
first in a sequence of whitespace characters are not suppressed at the
start and end of line areas. This issue was reported in [1] and the
editor of the TTML specs confirmed that this was not the intention
behind the spec.

The editor has created an issue to fix this in both the TTML1 and TTML2
specs [2], giving a proposal of what the spec should say. This patch
updates ttmlparse to implement the intended behaviour as proposed, in
which tabs in the input are converted to spaces before processing.

[1] w3c/imsc#224
[2] w3c/ttml1#235

https://bugzilla.gnome.org/show_bug.cgi?id=781539
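
The effect of the approach described in the commit message (converting tabs to spaces before any other whitespace processing) can be sketched as follows. This is not the ttmlparse code, which is not shown in this thread; the function names are hypothetical, and the sketch assumes linefeeds are treated as spaces, runs of spaces collapse to one, and only U+0020 is suppressed at line edges.

```python
import re


def tabs_to_spaces(text: str) -> str:
    """Replace every U+0009 with U+0020 before further processing."""
    return text.replace("\t", " ")


def normalise_whitespace(text: str) -> str:
    """Hypothetical pipeline: tabs to spaces, linefeeds to spaces, then collapse."""
    text = tabs_to_spaces(text).replace("\n", " ")
    # After the conversions only U+0020 remains in this input,
    # so collapsing runs of spaces is sufficient for the sketch.
    return re.sub(r" {2,}", " ", text)


# The anonymous span from Example 7, followed by the final span's text.
line = normalise_whitespace("\t\t\t\t\n\t\t\t\t") + "##Line gaps##"
print(repr(line))             # ' ##Line gaps##'

# The remaining character is now U+0020, so it is suppressed at the line start.
print(repr(line.strip(" ")))  # '##Line gaps##'
```

With the tab replaced up front, the last line is no longer indented, which matches the rendering the thread agreed was intended.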