Ambiguity regarding tab (U+0009) processing in significant whitespace. #235
Comments
I posted the following question to Steve Zilles and Tony Graham on 2017-03-31:

This is basically where TTML1 processing ends, since Section 7.2.3 explicitly states "The semantics of the above four cited XSL-FO properties are defined by [XSL 1.1], § 7.17.3, 7.16.7, 7.16.12, and 7.16.8, respectively," and XSL 1.1 references CSS2. Regardless of what is done in TTML2, it sounds prudent to recommend that TTML1 authors not use TAB when xml:space="default" applies, given the ambiguity.

@skynavga to open PR containing a note to resolve

Noting that during the call on 2017-05-11 we tentatively agreed for TTML2 to map a tab character to a single space for presentation purposes, which does not match current CSS behaviour. The idea of mapping a tab to no presentation at all was rejected on security grounds since it could facilitate spoofing.

Can someone clarify if this issue would have a different resolution in TTML2 and TTML1? Is there a TTML2 issue tracking it? And if TTML2's resolution is different, does this create an incompatibility between TTML1 and TTML2?

The processing of "significant" whitespace by an XML application requires [1] that all non-markup characters be passed to the application, and, further, that the `xml:space` attribute, if declared, may be used by an author to signal the application as to whether (1) default application whitespace processing applies or (2) whitespace should be preserved (by the application, as defined by the application).

In TTML the attribute `xml:space` is "declared", and its semantics are mapped [2] to XSL-FO style properties [3], specifically: `suppress-at-line-break`, `linefeed-treatment`, `white-space-collapse`, and `white-space-treatment`. These properties are intended to reflect the semantics of the CSS2 `white-space` property [4], but at a finer level of functional granularity.

Now, in the course of TTML implementation activity, it has been asked what the behavior should be regarding an element to which `xml:space="default"` applies and whose content is, for example, a single HORIZONTAL TAB (U+0009) character followed by a single 'X' character.

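As a minimal sketch of the starting point, the snippet below shows that an XML parser hands the tab to the application unchanged, so any mapping or collapsing is the application's job. The element name `p` is an assumption chosen only for illustration, and Python's standard `xml.etree.ElementTree` stands in for whatever parser an implementation actually uses.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Hypothetical fragment: the element name "p" is an assumption; the content is
# a HORIZONTAL TAB (U+0009) followed by 'X', as described above.
fragment = '<p xml:space="default">&#x9;X</p>'

elem = ET.fromstring(fragment)
xml_space = elem.get(XML_SPACE)
text = elem.text

print(xml_space)                             # default
print([f"U+{ord(ch):04X}" for ch in text])   # ['U+0009', 'U+0058']
```

The parser does not interpret `xml:space` itself; it only exposes the attribute value and the character data.
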
The particular question is whether the HORIZONTAL TAB (U+0009) character should:
If the answer to this question is that it should be mapped to the SPACE (U+0020) character, then a secondary question arises as to when, i.e., during which processing step, should this mapping take place?
To untangle this subject, we will need to look at the original specification of CSS2, which defines the (default) initial `normal` value for the `white-space` property [5] as:

and which, further, defines whitespace [6] as:

Now, while whitespace is well defined here, and corresponds precisely with the definition given in XML 1.1 [7], the phrase "collapse sequences of whitespace" is not well defined. In CSS2.1, this latter phrase is given more substance by defining a whitespace processing model [8], which does define an operational model that provides greater detail, including:
So, what is the problem with respect to TTML? TTML bases its definition of `xml:space="default"` semantics on XSL-FO 1.1, published in 2006, which is based on the original CSS2 that does not include the above clarifications found in CSS2.1. Furthermore, TTML bases the definition of `xml:space="default"` semantics on the XSL-FO definitions of the newly minted XSL-FO (but not CSS2) properties:

- `suppress-at-line-break="auto"`
- `linefeed-treatment="treat-as-space"`
- `white-space-collapse="true"`
- `white-space-treatment="ignore-if-surrounding-linefeed"`

where these values also happen to be the (default) initial values for these properties when they are not otherwise specified.

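To make the coincidence with the initial values concrete, here is a small sketch that records the four property values listed above and checks that an element with nothing specified ends up with the same set. The names `XSL_FO_INITIAL`, `XML_SPACE_DEFAULT`, and `effective_properties` are hypothetical helpers, not part of TTML or XSL-FO.

```python
# Initial values of the four XSL-FO properties, as stated in the text above.
XSL_FO_INITIAL = {
    "suppress-at-line-break": "auto",
    "linefeed-treatment": "treat-as-space",
    "white-space-collapse": "true",
    "white-space-treatment": "ignore-if-surrounding-linefeed",
}

# Values associated with xml:space="default", also as stated in the text above.
XML_SPACE_DEFAULT = {
    "suppress-at-line-break": "auto",
    "linefeed-treatment": "treat-as-space",
    "white-space-collapse": "true",
    "white-space-treatment": "ignore-if-surrounding-linefeed",
}


def effective_properties(specified):
    """Fall back to the initial value for every property not specified."""
    return {**XSL_FO_INITIAL, **specified}


# With nothing specified, the effective set equals the xml:space="default" set.
print(effective_properties({}) == XML_SPACE_DEFAULT)  # True
```
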
In contrast, XSL-FO defines `white-space="normal"` as

- `linefeed-treatment="treat-as-space"`
- `white-space-collapse="true"`
- `white-space-treatment="ignore-if-surrounding-linefeed"`
- `wrap-option="wrap"`

a definition which also happens to be implicitly dependent upon the `suppress-at-line-break` property, since the interpretation of `white-space-treatment="ignore-if-surrounding-linefeed"` depends upon the value of the `suppress-at-line-break` property.

Combining the default initial values of these properties with the definition of `white-space="normal"`, we surmise that the default whitespace processing behavior for XSL-FO is intended to align with the default whitespace processing behavior of CSS2. However, a detailed reading of the semantics of this behavior raises a number of possible problems:

- the `suppress-at-line-break` property defines `auto` to suppress only the SPACE (U+0020) but not HORIZONTAL TAB (U+0009), and, further, explicitly states that all other characters are to be treated as if the value `retain` applies;
applies;white-space-collapse
property where it is stated that:To return to the example fragment of TTML cited above, absent a mapping from HORIZONTAL TAB (U+0009) to SPACE (U+0020), the whitespace processing behavior that applies to this fragment would seem to retain the HORIZONTAL TAB (U+0009) in
since, according to
white-space-collapse="true"
, we have	
is classified as white space in XML, and	
is not

, but<fo:character character="	"/>
) is not a character flow object and the immediate following flow object is not a linefeed, i.e.,<fo:character character="
"/>
so the
	
is not collapsed, i.e., it does generate a glyph area.But now, we have a problem since the (now elaborated) definition of normal whitespace processing behavior in CSS2.1 appears to call for every
	
to be mapped to 
prior to performing white space collapsing behavior.

So, where does this leave us with respect to TTML? I believe we have two questions to resolve:

1. Is `&#x9;` mapped to `&#x20;`? If so, then in what context and during which processing step?
2. If `&#x9;` is not mapped to `&#x20;`, then what are the intended presentation semantics?

My answers to these questions are as follows:

1. If `xml:space="default"` applies, then `&#x9;` is mapped to `&#x20;` prior to performing any other white space processing. This mapping would ideally occur during or immediately after constructing the reduced XML infoset of a TTML abstract document instance.
2. If `xml:space="preserve"` applies, then `&#x9;` is not mapped to `&#x20;`, in which case the CSS2.1 semantics would apply, namely:

Specification text that implements the above could easily be added to both TTML1 and TTML2, ideally under the definition of `xml:space` [9] (and its TTML2 counterpart).

I don't have a strong opinion about whether we should adopt the CSS2.1 presentation semantics for HORIZONTAL TAB in cases where `xml:space="preserve"` applies. Alternative semantics could be to ignore it entirely (i.e., treat it like ZERO WIDTH SPACE) or to treat it as SPACE.

[1] https://www.w3.org/TR/REC-xml/#sec-white-space
[2] https://www.w3.org/TR/ttml1/#content-attribute-space
[3] https://www.w3.org/TR/xsl/
[4] https://www.w3.org/TR/xsl/#d0e297
[5] https://www.w3.org/TR/1998/REC-CSS2-19980512/text.html#white-space-prop
[6] https://www.w3.org/TR/1998/REC-CSS2-19980512/syndata.html#whitespace
[7] https://www.w3.org/TR/REC-xml/#NT-S
[8] https://www.w3.org/TR/2011/REC-CSS2-20110607/text.html#white-space-model
[9] https://www.w3.org/TR/ttml1/#content-attribute-space
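
The first proposed answer above (map `&#x9;` to `&#x20;` wherever `xml:space="default"` applies, before any other white space processing, and leave it alone under `xml:space="preserve"`) could be sketched roughly as follows. This is only an illustration under stated assumptions: `map_tabs` is a hypothetical helper, the element names `p` and `span` are made up, Python's `xml.etree.ElementTree` stands in for the reduced XML infoset, and the root is assumed to start from `default` when no `xml:space` is specified.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def map_tabs(elem, inherited="default"):
    """Replace TAB with SPACE in character content governed by xml:space="default".

    Hypothetical helper: xml:space applies to an element and its descendants
    unless a descendant overrides it.
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode == "default" and elem.text:
        elem.text = elem.text.replace("\t", " ")
    for child in elem:
        map_tabs(child, mode)
        # A child's tail is character content of *this* element, so it is
        # governed by this element's mode, not the child's.
        if mode == "default" and child.tail:
            child.tail = child.tail.replace("\t", " ")


# Hypothetical document: "p" and "span" are assumed names for illustration.
doc = ET.fromstring('<p>&#x9;X<span xml:space="preserve">&#x9;Y</span>&#x9;Z</p>')
map_tabs(doc)

print(repr(doc.text))     # ' X'   (tab mapped under default)
print(repr(doc[0].text))  # '\tY'  (tab kept under preserve)
print(repr(doc[0].tail))  # ' Z'   (tail belongs to the parent, so mapped)
```

Running the mapping as a separate pass right after parsing keeps it independent of whatever collapsing rules are applied later.
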