Specify attribute term delimiter as post-normalized space #191

nigelmegitt · 2016-09-29T15:27:03Z

See also #185 and #170 for background: the current use of <lwsp> permits white space even though XML attribute normalization would remove leading and trailing white space and replace intermediate strings of white space with a single #x20 character. My proposal for this was to replace <lwsp> with <nsp> where:

<nsp>: #x20 after applying the normalization rules in [1]

[1] https://www.w3.org/TR/REC-xml/#AVNormalize

Right now, traversing all the links from https://w3c.github.io/ttml2/spec/ttml2.html#reduced-infoset-attribute through the term definition and the reference into https://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.attribute , we already specify attribute values in terms of normalized values in the reduced infoset, so the use of <lwsp> is actually rather difficult to achieve - anything other than a single #x20 character would have to be escaped. However it is possible to escape those characters. I do not know why that would be useful.

Some (non-mutually-exclusive) proposals to allow for simpler implementations:

1. Add an informative note that the processing of XML normalized attribute values may limit the kinds of characters that can appear in linear white space.
2. Add feature designators to indicate that processors handle/do not handle escaped whitespace characters that pass through the normalization process, and that documents contain/do not contain such escaped whitespace characters.
3. Add an additional requirement to de-escape escaped whitespace characters prior to the XML attribute value normalization process so that the resulting information set never has leading or trailing whitespace and always has exactly one #x20 character between terms.


skynavga · 2017-05-16T18:09:58Z

So, the facts are as follows:

TTML is neutral to the concrete representation of a document instance, and merely recommends XML (in the absence of other requirements); consequently, we can't say for certain that XML space normalization has occurred on attribute values prior to creating their counterpart in the reduced infoset;
even if XML is used, one can escape whitespace to avoid XML normalization;

In conclusion, we need to retain the current definition of <lwsp> and not refer to XML normalized space. Therefore, no action is required on this issue, so closing.

nigelmegitt · 2017-05-19T16:16:36Z

@skynavga this is a bit surprising. Firstly, we do require attribute value normalisation when constructing the XML infoset, independently of the concrete representation of the document instance, and secondly you seem not to have addressed the third proposal at the end of #191 (comment):

Add an additional requirement to de-escape escaped whitespace characters prior to the XML attribute value normalization process so that the resulting information set never has leading or trailing whitespace and always has exactly one #x20 character between terms.

This would ensure that implementations always get a consistent single #x20 between terms regardless of how the document is represented, in other words we would have a processing model with less implementation complexity in handling a variety of white space scenarios. Is that not a good idea?
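
As a rough illustration of the difference in implementation complexity being discussed, the sketch below contrasts splitting an attribute value into terms when any run of XML white space may separate them with splitting when exactly one #x20 is guaranteed. The function names `split_terms_lwsp` and `split_terms_single_space` are hypothetical, and the whitespace set (#x9, #xA, #xD, #x20) is an assumption based on the XML definition of white space, not text taken from the TTML2 specification.

```python
import re

# Assumption: "linear white space" means one or more XML white space
# characters (#x20, #x9, #xA, #xD).
_XML_WS_RUN = re.compile(r"[ \t\n\r]+")


def split_terms_lwsp(value: str) -> list[str]:
    """Split on runs of XML white space, ignoring leading/trailing padding."""
    trimmed = value.strip(" \t\n\r")
    return _XML_WS_RUN.split(trimmed) if trimmed else []


def split_terms_single_space(value: str) -> list[str]:
    """Split assuming terms are separated by exactly one #x20."""
    return value.split(" ") if value else []
```

For a value such as `" 10px\t\n20px "` the first function yields the two terms `10px` and `20px`; the second is only correct when the input is already in the single-#x20 form, for example `"10px 20px"`.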

skynavga · 2017-05-19T17:00:34Z

@nigelmegitt your proposal contradicts the algorithm specified in https://www.w3.org/TR/REC-xml/#AVNormalize

Furthermore, implementations do not assume that normalization applies to character references, which allow non-normalized whitespace to be inserted in attribute values. For example, TTV tests for the presence of whitespace padding around an attribute value and reports an error if it appears. Testing this verification process requires the ability to insert non-normalized whitespace in this context, which is done using character references. With your proposal, the expansion of character references would be followed by a second pass of normalization, which would prevent testing the padding detection.

I would suggest we limit changes to adding a note under B.3 [normalized value] that reminds the reader that XML normalization does not normalize character references and that, consequently, the unnormalized XML whitespace characters #x9 (HT), #xA (LF), #xD (CR), and #x20 (SPACE) may appear in a [normalized value] item.
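
The behaviour described here can be observed with an ordinary XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module (expat-based and non-validating; no DTD is supplied, so the attributes are treated as CDATA). The helper name `attribute_values` and the sample element are hypothetical and serve only to illustrate the point.

```python
import xml.etree.ElementTree as ET


def attribute_values(xml_text: str) -> dict[str, str]:
    """Parse one element and return its attributes as the parser reports them."""
    return dict(ET.fromstring(xml_text).attrib)


# 'literal' contains a literal tab character in the XML text;
# 'charref' contains the same tab written as the character reference &#x9;.
doc = '<p literal="a\tb" charref="a&#x9;b"/>'
values = attribute_values(doc)

print(repr(values["literal"]))  # literal tab is normalized to #x20: 'a b'
print(repr(values["charref"]))  # referenced tab survives normalization: 'a\tb'
```

The literal tab is replaced by a space during attribute value normalization, while the tab introduced through a character reference reaches the application unchanged.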

nigelmegitt · 2017-05-22T08:19:17Z

OK, I do not understand which part of a pre-processing algorithm can be contradictory to a step that comes after it, but on further reflection, most of what I am proposing here is about implementation optimisation, an area we don't need to define normatively.

Adding a note under B.3 as you suggest seems like the best way to go. I'll prepare a pull request.

nigelmegitt mentioned this issue Dec 19, 2016

Specify attribute value syntax post-normalization #190

Closed

skynavga modified the milestone: TTML2WR Feb 23, 2017

nigelmegitt mentioned this issue Mar 21, 2017

Attribute syntax definition: missing spaces w3c/imsc#221

Closed

skynavga self-assigned this Apr 20, 2017

skynavga removed their assignment May 11, 2017

skynavga closed this as completed May 16, 2017

nigelmegitt reopened this May 19, 2017

skynavga added editor considers closed and removed editor considers closed labels May 21, 2017

nigelmegitt self-assigned this May 22, 2017

nigelmegitt mentioned this issue May 22, 2017

LWSP in attribute and value expressions? #315

Closed

skynavga added a commit that referenced this issue May 28, 2017

Clarify that unnormalized whitespace may appear in [normalized value] (#191).

5696dfa

skynavga assigned skynavga and unassigned nigelmegitt May 28, 2017

skynavga mentioned this issue May 28, 2017

Clarify that unnormalized whitespace may appear in [normalized value]… #343

Merged

skynavga closed this as completed in #343 May 28, 2017

skynavga added a commit that referenced this issue May 28, 2017

Merge pull request #343 from w3c/issue-0191-clarify-normalized-value

05757be

Clarify that unnormalized whitespace may appear in [normalized value] (#191). Merging this editorial only PR.

skynavga removed their assignment May 28, 2017

skynavga added the pr merged label May 28, 2017

cconcolato mentioned this issue Jan 18, 2018

Incorporate resolutions of additional TTML1 Issues. #358

Closed

21 tasks
