
Investigate Data Model and TTML Syntax mapping #214

Open
cconcolato opened this issue Mar 1, 2024 · 7 comments · May be fixed by #216

@cconcolato
Contributor

In the context of #158 (comment) and in #192, the group noted that the specification is currently written with a data model, and for each part of the model a corresponding TTML representation is defined. However, the normative TTML profile definition permits more than the TTML representation of the data model indicates. This issue is to study options for addressing possible mismatches. Possible options (non-exhaustive, not necessarily mutually exclusive) are:

  1. add more formal restrictions to the TTML profile to match exactly what the data model permits
  2. recommend writing only TTML documents that match the data model
  3. define recommended behavior for readers encountering TTML syntax not matching the model

This issue is also related to #110

@cconcolato
Contributor Author

Examples of possible TTML syntax that would be permitted per the profile definition but not matching the data model:

  • xml:lang on div, per #192 (Clarify language application and inheritance model)
  • div elements inside the body element not matching the characteristics of a Script Event, e.g. missing xml:id. Note that this seems to be the only current requirement for a Script Event, pending resolution of #52 (comment) (Collaborative editing or partial editing - do we need "unfinished" state?)
  • Timing attributes (begin, end or dur) on elements other than div (and animation elements).
  • ttm:desc element anywhere other than in a div representing a Script Event
  • div elements nested in the div representing the Script Event
  • a metadata element not matching the character representation (e.g. whose type attribute is not set to character or person) or in a different location (div)

The above just focused on checking the elements/attributes already permitted somewhere in DAPT. For other TTML attributes/elements, #110 should cover that.
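
To make a few of these cases concrete, here is a minimal sketch (illustration only, not proposed spec text) that flags three of the mismatches listed above: timing attributes on elements other than div, nested div elements, and div elements without xml:id. The function name `find_mismatches`, the sample document, and the choice of `animate` and `set` as the animation elements are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TIMING = ("begin", "end", "dur")
# Assumption: div plus the animation elements animate/set may carry timing.
TIMED_OK = {TT + "div", TT + "animate", TT + "set"}


def find_mismatches(ttml_text):
    """Return notes for syntax that does not fit the data model (hypothetical helper)."""
    root = ET.fromstring(ttml_text)
    notes = []
    for elem in root.iter():  # document order
        name = elem.tag.replace(TT, "")
        if elem.tag not in TIMED_OK and any(a in elem.attrib for a in TIMING):
            notes.append(f"timing attribute on <{name}>")
        if elem.tag == TT + "div":
            if elem.find(TT + "div") is not None:  # direct div child
                notes.append("nested <div>")
            elif XML_ID not in elem.attrib:
                notes.append("<div> without xml:id")
    return notes


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div xml:id="outer">
      <div xml:id="e1" begin="0s" end="2s"><p begin="1s">Hi</p></div>
    </div>
    <div><p>No identifier</p></div>
  </body>
</tt>"""

for note in find_mismatches(SAMPLE):
    print(note)
```

A real check would also need to cover ttm:desc and metadata placement, which this sketch leaves out.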

@nigelmegitt
Contributor

Editor's discussion 2024-03-01:

Adopt the proposal in #110 to rename §5.2 to be about unrecognised rather than foreign vocabulary.

General agreement that for extensibility we design the spec so that adding new features does not break old processors, but that they just ignore them for semantic processing (but should preserve them syntactically).

For mapping from TTML to the DAPT data model, either:

  1. put a statement on each data model entity explaining how to do it, or
  2. add a new section (maybe a top level numbered section) explaining the general principles for mapping, being:
    • Compute the value of properties that apply to any data model entity, taking into account any inheritance semantics, using generic TTML processing rules, and only take that value into account on the entity where it applies.
    • Define any rules needed for handling a structural mismatch.

The only TTML element that structurally cannot necessarily be mapped into the DAPT Model is nested <div> - propose to add a new construct to the data model allowing arbitrary grouping of Script Events into Script Event Groups that are each a <div> and can contain other Script Event Groups or Script Events:

  • This then allows arbitrary metadata to be attached to those <div>s.
  • However we have not defined the "applies to" and inheritance models for the properties OnScreen and Script Event Type, so we should do that.
  • This approach would imply that the leaf-most <div> elements are the only ones that can be mapped into Script Events when processing a generic TTML2 document into the DAPT data model.
  • A leafmost <div> without a Script Event Identifier shall not be mapped to a DAPT Script Event. BUT what about a <div> that has both <div> children AND <p> children? EITHER:
    1. A <p> attached to a <div> not mapped to a Script Event shall not be mapped to a DAPT Text object. OR
    2. A <div> with an xml:id and one or more <p> children shall be considered a Script Event and any descendant <div>s shall not be mapped to any DAPT data model entity. OR
    3. We require that in a DAPT document a <div> cannot contain a mix of <div> and <p> children, through the use of an extension feature.

Discussion to be continued...

@nigelmegitt
Contributor

My thought over the weekend: if we are going to write a DAPT extension feature constraining a document from having a <div> that includes both a <div> and a <p> child, we may as well constrain the document from having a <div> that can have a <div> child and keep the data model as it is.

The only issue with this is that I don't know how we would later be able to introduce nested <div> elements without breaking existing DAPT document readers. It seems reasonably likely, or at least foreseeable, that we might want such functionality in the future.

@nigelmegitt
Contributor

Concluded discussion with @cconcolato .

Decided not to make new constraints, and not to change the data model. Instead, add a new section that explains how to map a DAPT TTML2 document into the DAPT data model.

Effectively these are like "parsing" rules, but not parsing because that's XML's domain. Rather, the section should explain that the spec leaves flexibility; there can be conformant DAPT TTML documents that contain more than what is strictly in the data model. That being the case, the new section needs to explain how to retrieve or build a DAPT data model instance from a DAPT TTML document.

This also pertains to considerations about extensibility and backwards compatibility.
Should note that validators can warn if recognised and syntactically valid vocabulary does not fit data model - in this case a DAPT validator might discard such content when mapping into the data model. Validators should still error if syntactically invalid vocabulary is found even if it does not fit in the data model. This way, generic TTML2 validators can be used.

@nigelmegitt nigelmegitt self-assigned this Mar 4, 2024
@nigelmegitt
Contributor

Agreed mapping rules (so far):

Any <div> that contains a <div> child does not map to a Script Event in the data model, but its <div> children may do (as long as they also don't contain <div> children and they meet any other constraints of a Script Event, such as having an xml:id attribute).

A <p> that is not a child of a <div> that maps to a Script Event is not mapped to a Text object (or anything else in the data model). A <p> child of a Script Event <div> is a Text object.
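
A minimal sketch of the two rules above, for illustration only: it is not the pull request text, and the function name `map_script_events`, the dictionary it returns, and the sample document are assumptions made for this example. It treats xml:id as the only other Script Event constraint, as noted earlier in this thread.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def map_script_events(ttml_text):
    """Map each qualifying <div> to {xml:id: [text of each <p> child]}."""
    root = ET.fromstring(ttml_text)
    body = root.find(TT + "body")
    events = {}
    if body is None:
        return events
    for div in body.iter(TT + "div"):
        has_div_child = div.find(TT + "div") is not None
        event_id = div.get(XML_ID)
        # Rule 1: a <div> with a <div> child is not a Script Event;
        # neither is a <div> lacking xml:id.
        if has_div_child or event_id is None:
            continue
        # Rule 2: only <p> children of a Script Event <div> become Text objects.
        events[event_id] = [
            "".join(p.itertext()).strip() for p in div.findall(TT + "p")
        ]
    return events


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div xml:id="group">
      <p>ignored: parent has a div child</p>
      <div xml:id="e1"><p>First</p><p>Second</p></div>
      <div><p>ignored: no xml:id</p></div>
    </div>
    <div xml:id="e2"><p>Third</p></div>
  </body>
</tt>"""

print(map_script_events(SAMPLE))
```

Here the outer `group` div and its direct `<p>` child are skipped, as is the div without an xml:id; only `e1` and `e2` are mapped.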

@nigelmegitt nigelmegitt added the agenda Issue flagged for in-meeting discussion label Mar 12, 2024
@css-meeting-bot
Member

The Timed Text Working Group just discussed Investigate Data Model and TTML Syntax mapping w3c/dapt#214, and agreed to the following:

  • SUMMARY: @nigelmegitt to continue drafting a pull request
The full IRC log of that discussion
<nigel> Subtopic: Investigate Data Model and TTML Syntax mapping #214
<nigel> github: https://github.com//issues/214
<nigel> Nigel: The issue is that it is possible to construct TTML documents that are conformant DAPT documents
<nigel> .. but which contain things that do not map directly to the DAPT data model.
<nigel> .. Things that we considered were:
<nigel> .. Adding more constraints to the DAPT documents to prevent that;
<nigel> .. Adding generic grouping of Script Events to match nested divs
<nigel> .. Adding statements into the DAPT Data Model -> TTML representation saying how to reverse it
<nigel> .. Adding a new section explaining TTML -> DAPT Data Model mapping (we decided to do that)
<nigel> .. Add no extra constraints or features (we decided it is better not to add any, and to have explanations instead)
<nigel> .. I am drafting a pull request to add a possibly informative new section explaining
<nigel> .. suggested rules for mapping TTML to DAPT, and also updating the Foreign Vocabulary section to make it
<nigel> .. more generally apply to any unrecognised vocabulary even if it's in the TTML or DAPT namespaces.
<atai> q+
<nigel> .. Any thoughts about this?
<nigel> ack at
<nigel> Andreas: You say the new section will be informative?
<nigel> Nigel: That's what I'm thinking at the moment.
<nigel> Andreas: What's the normative expected behaviour of a processor. Is it implementation dependent?
<nigel> Nigel: It's what's defined by the TTML features and extensions
<nigel> Andreas: Er, ok. Nested divs for example, are not forbidden?
<nigel> Nigel: That's right
<nigel> Andreas: That would be part of the expected behaviour to deal with that?
<nigel> Nigel: Yes
<nigel> Andreas: You say the mapping rules will be informative, but what will the normative expected behaviour be?
<nigel> Nigel: It's what TTML says. There's no normative requirement to map into the DAPT data model.
<nigel> .. The fact that a DAPT document was generated from the data model is interesting maybe but
<nigel> .. doesn't define the processing behaviour.
<nigel> Andreas: So you cannot guarantee that two DAPT data model-based implementations handle a generic TTML
<nigel> .. document the same way?
<nigel> .. There's no normative deterministic parsing into the data model?
<nigel> Nigel: That's right, but parsing into the data model isn't a requirement.
<nigel> .. There is already text around handling unknown stuff in §5.2, which is normative, and quite broad,
<nigel> .. but essentially the processing semantics are defined by TTML, because DAPT is defining a profile of TTML.
<nigel> .. Most of the extension features are constraining syntax, I don't think there are any that define
<nigel> .. processing behaviours that wouldn't apply more generally.
<nigel> .. In particular, none of the extension features is based on anything in the DAPT data model;
<nigel> .. they are all constraining the TTML representation directly.
<nigel> .. I think adding this guidance feels helpful, but the question is if it actually needs to be any more normative than guidance.
<nigel> .. I suspect you're thinking about it and need to see the pull request.
<nigel> Andreas: Yes, it would be good to see it written down and then play it through.
<nigel> Nigel: Sure, I just wanted to inform you where we got to and the direction of travel.
<nigel> .. Happy to have any comments either on the issue or the pull request when opened.
<nigel> SUMMARY: @nigelmegitt to continue drafting a pull request

@palemieux

In my experience, the process of mapping from TTML/XML to an internal model is a two-step process:

  1. apply xml:lang inheritance rules throughout the entire TTML/XML document to obtain a computed value of xml:lang on every element
  2. when mapping XML elements to internal model elements, use the computed value of xml:lang on the XML elements to set the language of internal model elements
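
The two steps can be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: `compute_lang` and `text_languages` are hypothetical names, the internal model is reduced to (text, language) pairs, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def compute_lang(root):
    """Step 1: computed xml:lang for every element, inherited from ancestors."""
    computed = {}

    def walk(elem, inherited):
        # An explicit xml:lang (even an empty one) overrides the inherited value.
        lang = elem.get(XML_LANG, inherited)
        computed[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, "")
    return computed


def text_languages(ttml_text):
    """Step 2: build model entries using the computed language of each <p>."""
    root = ET.fromstring(ttml_text)
    computed = compute_lang(root)
    return [
        ("".join(p.itertext()).strip(), computed[p])
        for p in root.iter(TT + "p")
    ]


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div xml:id="e1" xml:lang="fr">
      <p>Bonjour</p>
      <p xml:lang="de">Hallo</p>
    </div>
    <div xml:id="e2"><p>Hello</p></div>
  </body>
</tt>"""

print(text_languages(SAMPLE))
```

Keeping the inheritance pass separate means the mapping step never has to walk up the tree, whichever elements end up mapped into the model.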

@nigelmegitt nigelmegitt removed the agenda Issue flagged for in-meeting discussion label Mar 14, 2024