
Add section about mapping from TTML to the DAPT data model #216

Open. Wants to merge 36 commits into base: main.
Conversation

nigelmegitt
Contributor

@nigelmegitt nigelmegitt commented Mar 21, 2024

Closes #214 and #110 and #234.

Adds a new section about the mapping from TTML to the DAPT data model.

A mix of normative and informative contents.

Clarifies the TTML document type, the TTML representation, and the mapping back from TTML into the DAPT data model. An optional #scriptEventMapping feature is added.

This PR has changed significantly since it was first opened, to address review feedback and in-meeting discussions, especially about forward and backward compatibility.

Changes the "Foreign elements and attributes" section to "Unrecognised vocabulary", adds a SHOULD requirement for presentation processors to ignore unrecognised vocabulary, and adds a MUST requirement for transformation processors to prune unrecognised vocabulary except under <metadata> elements.

Transformation processors are prohibited from putting values into ttp:contentProfiles for profiles that they (the processors) don't support.

Also includes informative discussion about forwards and backwards compatibility within the section on unrecognised elements and attributes, and mention of the compatibility design goals within the new Mapping from TTML to the DAPT Data Model section.
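The pruning behaviour described above for transformation processors can be sketched as follows. This is an illustrative sketch only: the recognised-element set is a hypothetical placeholder, not the spec's actual vocabulary tables, and `future` stands in for vocabulary from a later version.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"

# Hypothetical recognised-element set for illustration; the real set comes
# from the DAPT/TTML2 vocabulary tables.
RECOGNISED = {
    f"{{{TT_NS}}}{name}"
    for name in ("tt", "head", "body", "div", "p", "span", "metadata")
}

def prune_unrecognised(elem):
    """Remove unrecognised child elements, leaving <metadata> subtrees intact."""
    if elem.tag == f"{{{TT_NS}}}metadata":
        return  # unrecognised vocabulary under <metadata> is retained
    for child in list(elem):
        if child.tag not in RECOGNISED:
            elem.remove(child)  # prune: the processor cannot vouch for it
        else:
            prune_unrecognised(child)

doc = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p>Hello</p><future/>'
    '<metadata><future/></metadata>'
    '</div></body></tt>'
)
prune_unrecognised(doc)
```

After pruning, the unrecognised `<future/>` element directly under the `<div>` is gone, while the one inside `<metadata>` survives.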



@nigelmegitt nigelmegitt self-assigned this Mar 21, 2024
@nigelmegitt nigelmegitt added agenda Issue flagged for in-meeting discussion and removed editorial labels Mar 21, 2024
@nigelmegitt nigelmegitt linked an issue Mar 26, 2024 that may be closed by this pull request
@css-meeting-bot
Member

The Timed Text Working Group just discussed Add informative section about mapping from TTML to the DAPT data model w3c/dapt#216, and agreed to the following:

  • SUMMARY: Reviews to continue, revisit this after more thought and discussion.
The full IRC log of that discussion:
<nigel> Subtopic: Add informative section about mapping from TTML to the DAPT data model #216
<nigel> github: https://github.com//pull/216
<nigel> Nigel: Quite a big change, new informative section added, mostly as discussed in the issue and our previous call.
<nigel> .. Introduces some implementation considerations for ingesting generic TTML2 documents that are also
<nigel> .. DAPT conformant, which might not have a 1:1 mapping to the DAPT Data Model.
<nigel> .. My biggest question is: is there anything in here that anyone thinks needs to be normative?
<atai> q+
<nigel> Cyril: Thank you for this, I started to review it, and that was my biggest question.
<nigel> .. You phrased it that there is no requirement for DAPT implementations to perform the task...
<nigel> .. which is what? Parsing the DAPT document into a DAPT Data Model?
<nigel> Nigel: It's "attempting to populate a data structure corresponding to the DAPT data model from any conformant DAPT document"
<nigel> Cyril: Why isn't that a requirement?
<nigel> Nigel: Because the only normative requirement is that a processor conforming to the TTML2 profile can process
<nigel> .. the document.
<nigel> Andreas: But a DAPT processor should be able to do it?
<nigel> Nigel: That's not a thing though, a "DAPT processor"
<nigel> Andreas: You ask if there's any concern about making this informative, and I think I mentioned in the last
<nigel> .. call that I have some questions about whether that's enough.
<nigel> .. My simple thinking: 2 implementations that implement DAPT, are DAPT implementations.
<nigel> .. If they're parsing DAPT-conformant TTML documents they should have the same result,
<nigel> .. so that there's interoperability. If you leave this open to be implementation dependent,
<nigel> .. that could be a problem. Could they come up with different results?
<nigel> q+
<nigel> ack atai
<nigel> .. This would be my main question when I come to review it.
<nigel> ack n
<nigel> Nigel: The reason I don't think that's a problem is because the semantics for processing the document
<nigel> .. are normatively defined by TTML2.
<nigel> .. Where we might need something normative would be if we wanted to have a fixed mapping back from
<nigel> .. every possible DAPT conformant document into the DAPT data model.
<nigel> .. Because that data model is defined by this spec, there can't be normative requirements for doing that within TTML.
<atai> q+
<nigel> Andreas: Again, maybe it's simple how I'm thinking, but a possible implementation would be to make the DAPT
<nigel> .. data model the internal data model, and handle the document accordingly.
<nigel> ack at
<Cyril> q+
<nigel> .. If two implementations do this I would assume they come up with the same values.
<nigel> ack cy
<nigel> Cyril: I'm wondering what are the classes of TTML processor that could do something meaningful with
<nigel> .. a DAPT document. For example if I have a TTML authoring tool and I load a DAPT document,
<nigel> .. I don't think the generic tool would be able to do any edits and guarantee that the output would
<nigel> .. remain conformant with the profile. Are you thinking that a generic authoring tool would see what is
<nigel> .. permitted or prohibited and constrain its UI to do only what's permitted?
<nigel> .. The bigger question: yes, a DAPT document is a TTML document, but is that useful for having a TTML
<nigel> .. processor process the document. I think we need tailored DAPT processors.
<nigel> .. They will do meaningful edits to the document.
<nigel> .. Just knowing they are TTML documents does not seem to be sufficient to do meaningful edits.
<nigel> q?
<nigel> Nigel: A generic TTML2 presentation processor that supports audio features should be able to
<nigel> .. play back a DAPT audio description script for example.
<nigel> .. A generic TTML2 transformation processor can observe the profile feature and extension requirements
<nigel> .. and decide if it supports the required features or not, and if it doesn't, presumably all bets are off, in terms of
<nigel> .. what it does.
<nigel> .. My thinking is that the profile constraints manage this for us.
<nigel> Cyril: On the playback side, I agree.
<nigel> .. A generic TTML processor could present a DAPT document without knowing about the DAPT data model, for sure.
<nigel> .. For a generic TTML transformation processor, that seems like a theoretical concept, so I wouldn't worry about these.
<nigel> .. So I think we should introduce a DAPT processor, which does not necessarily support generic TTML processing,
<nigel> .. but does support DAPT Data model processing.
<nigel> .. Then that section that you wrote can have SHALL statements that only apply to this class of processor.
<nigel> Nigel: This is for validators, or editors?
<nigel> .. I have another concern, which is extensibility of DAPT, and forwards/backwards compatibility.
<nigel> .. If we do as you suggest Cyril, how do we avoid introducing compatibility issues in the future?
<nigel> Cyril: Can you repeat the example?
<nigel> Nigel: If we introduce a new entity into the data model in the future, for example. E.g. a Script Event grouping
<nigel> .. structure. Wouldn't that break this new class of DAPT processor?
<nigel> Cyril: Generically, if we define a new entity that does not exist today at all, the old data model processors
<nigel> .. would ignore it, but the grouping one is a special case - I think that's what you've done, to identify
<nigel> .. what is a script event wherever it is in the document.
<nigel> Nigel: I think the answer will hinge on whether we really need a DAPT processor class.
<nigel> .. It would be helpful to get review comments on the pull request.
<nigel> Cyril: When I looked at your example, today we seem to rely on the xml:id to decide if a div might be a
<nigel> .. Script Event. I wonder if we should have a ttm:role that says "script event" more explicitly.
<nigel> .. Then the DAPT processor just finds this, and that's it.
<nigel> .. How deeply nested in the TTML shouldn't matter.
<nigel> .. Other things can be pruned.
<nigel> .. You do whatever you want with the rest. That seems simpler to me.
<nigel> Nigel: If we do that then we need structural constraints, that we currently don't list.
<nigel> .. For example what if a div marked as a Script Event contains another div marked as a Script Event.
<nigel> Cyril: Yes, we could have constraints about that.
<nigel> Nigel: Yes, that's an option.
<nigel> .. My overall direction here is steered by "less is more", and if we don't need a DAPT Processor, say,
<nigel> .. then we shouldn't create one. Of course maybe it is actually important, and we should define it.
<nigel> Cyril: The original question is what a processor should do if it encounters a DAPT document with
<nigel> .. contents that go beyond what would get directly made by a mapping from this version's DAPT Data Model.
<nigel> Nigel: Yes.
<nigel> SUMMARY: Reviews to continue, revisit this after more thought and discussion.

@nigelmegitt nigelmegitt removed the agenda Issue flagged for in-meeting discussion label Mar 28, 2024
@nigelmegitt
Contributor Author

nigelmegitt commented Apr 9, 2024

Adding this to the agenda for 11th April, now that folks have had time to think some more; I'd prefer review comments on the pull request before the meeting if possible.

I think the main question is whether any of the TTML -> DAPT model provisions need to be made normative, or whether we can live with them being informative, as they currently are in this pull request. Reminder that this question also strongly relates to #44.

@nigelmegitt nigelmegitt added the agenda Issue flagged for in-meeting discussion label Apr 9, 2024
@css-meeting-bot
Member

The Timed Text Working Group just discussed Add informative section about mapping from TTML to the DAPT data model w3c/dapt#216, and agreed to the following:

  • SUMMARY: @nigelmegitt to make edits as discussed, @cconcolato to review, discussion to continue.
The full IRC log of that discussion:
<nigel> Subtopic: Add informative section about mapping from TTML to the DAPT data model #216
<nigel> github: https://github.com//pull/216
<nigel> Nigel: From last time, I think the determining factor is if we need a class of DAPT implementation
<nigel> .. that maps from TTML2 into the DAPT data model. If we do, that means we need to make these
<nigel> .. provisions normative.
<nigel> Cyril: Taking a step back, we did this pull request to cover the case that there is a document that
<nigel> .. conforms to the profile but does not map directly to the DAPT data model.
<nigel> .. In practice if you have an implementation of DAPT that is "just" DAPT, which I think will be the majority case,
<nigel> .. then this situation should not happen. You shouldn't end up with a document that cannot be easily mapped.
<nigel> Nigel: Yes, to a point.
<nigel> .. The exception could be from the compatibility perspective - some future version of DAPT adds in a feature
<nigel> .. that we want older DAPT processors to handle gracefully.
<nigel> .. It's not just about TTML2 generically.
<nigel> Cyril: You're right.
<nigel> .. Thinking out loud, if we added constraints like feature extensions to restrict a DAPT document
<nigel> .. to correspond only to the DAPT data model, what would be the problem? Extensibility?
<nigel> Nigel: Yes, that would be the main one.
<nigel> .. It's really the structural issue of divs containing other divs or mixed div and p children.
<nigel> .. Which we agreed there could be a future use for.
<nigel> .. If we prohibited that then we wouldn't be able to use that capability in the future without making a
<nigel> .. breaking version change to DAPT.
<nigel> .. Maybe we could argue that, to make sure that conformant implementations can deal with those changes,
<nigel> .. that's why we need to make the informative provisions normative.
<nigel> Cyril: What about text content anywhere other than p and span?
<nigel> Nigel: It's allowed in p and span but TTML doesn't allow it anywhere else.
<nigel> .. Except for metadata elements etc, of course.
<nigel> Cyril: Ok, thank you.
<nigel> Nigel: Does that argument about future compat seem correct?
<nigel> Cyril: Yes. In general I prefer something normative otherwise there won't be interoperability.
<nigel> .. Does this mean we need a DAPT processor type?
<nigel> Nigel: No I don't think it does.
<nigel> Cyril: In §7.2 it defines a DAPT Processor in terms of conformance to the profile provisions and to the document.
<nigel> .. How would we do that?
<nigel> Nigel: I'd make extension features referencing the new normative provisions, so it all ties together.
<nigel> Cyril: I would like to take a stab at re-writing §5 or proposing changes.
<nigel> Nigel: OK, that's fine, otherwise I'd have done it.
<nigel> Cyril: Not sure when I'll do it.
<nigel> Nigel: Why don't I do a first pass, and then you can review it?
<nigel> Cyril: That's fine.
<nigel> .. I think we should move the new section 5 to after the Constraints section. We're only concerned
<nigel> .. with valid documents, which are defined in the Constraints section.
<nigel> .. I would start by saying that the processing behaviour for a processor processing a valid document that
<nigel> .. contains additional content not in the DAPT model is the following...
<nigel> .. Say there may be conformant DAPT docs that contain more, e.g. for a new version, or a round trip through
<nigel> .. a generic TTML tool.
<nigel> .. That's how I'd start, by explaining that.
<nigel> .. Once the context is clear I think it's easier to understand.
<nigel> Nigel: I think it'll be important to say that the graceful handling feature requirements may be replaced
<nigel> .. in future versions by something that defines some other behaviour.
<nigel> Cyril: Did you mention parsing, or just mapping?
<nigel> Nigel: Just mapping. I think parsing is defined by XML, we're talking about building a data model from the parsed entities.
<nigel> Cyril: OK
<nigel> Pierre: Are you going to take the TTML approach of pruning?
<nigel> Nigel: I don't think so, not quite
<nigel> Cyril: For validation purposes, yes.
<nigel> .. But a read/write processor should try to retain unrecognised vocab
<nigel> Pierre: The reason for mentioning: if the processor sees elements or attributes it does not understand then
<nigel> .. there's no hope it can understand how to deal with those unknown elements.
<nigel> .. If you merely preserve them, that doesn't take into account the semantics of the unknown elements.
<nigel> .. Generally it's not possible unless you specify extension rules such as vocabulary in a particular part of the
<nigel> .. model does not affect e.g. timing etc.
<nigel> .. Some things can be preserved with minimal risk, but everything else, it's hopeless.
<nigel> Cyril: You can have multiple values of profiles in the contentProfiles attribute, but if you write back
<nigel> .. a file then you shouldn't write back values of contentProfiles that you don't understand.
<nigel> .. You could end up with semantically incorrect content.
<nigel> Nigel: Example is an attribute for number of words, doc says 3, editor adds 2, saves the value as 3 because it
<nigel> .. doesn't understand it.
<nigel> Pierre: There's a danger of getting rules that are so complex that nobody understands them.
<nigel> Pierre: The TTML model is blunt but straightforward. Just get rid of everything you don't understand.
<nigel> .. Maybe some stuff could have been kept, but at least it is predictable.
<nigel> .. When the author wants the document to be compatible with an older version,
<nigel> .. do it so that when you strip the newer stuff it's still valid for the older version.
<nigel> SUMMARY: @nigelmegitt to make edits as discussed, @cconcolato to review, discussion to continue.

@nigelmegitt nigelmegitt removed the agenda Issue flagged for in-meeting discussion label Apr 11, 2024
* Fix broken reference to unrecognised attribute section
* Mention design goal of future compatibility
* Add section on computed attribute values
* Add TODO for validation warnings and errors
* Make Data model diagram clearly informative
* Add links
* Put explanatory text around examples into a single example `<aside>` including the example XML
* Add rule for non-rejection of Script Events if they contain unrecognised attributes
* Add the results of applying the rules to the div and p example
* Add example showing the effect of attribute computation
* Add section about retaining unrecognised vocabulary
* Add section about validation warnings and errors
Clarify that attribute computation needs to happen before anything is ignored.
@nigelmegitt nigelmegitt force-pushed the issue-0214/map-ttml-to-data-model branch from de94ce1 to fd02219 on April 12, 2024 at 13:16
@nigelmegitt
Contributor Author

nigelmegitt commented Apr 15, 2024

Extracting the key change actions from last week's meeting:

  • To ensure the compatibility requirements we have, some of the provisions need to be normative.
    • Add extension features for each one.
  • A processor that reads, allows modification of, and then writes a DAPT document must not include in the output document any ttp:contentProfiles values that signal a greater version number than the processor supports.
  • When a processor encounters, within an input document, unknown vocabulary in a known DAPT or TTML namespace (i.e. an entity whose syntax and semantics are presumably defined in some future version of the specification), any changes to the document could in principle render those unknown entities invalid. They should be pruned on output, since the processor cannot guarantee that they remain correct in the presence of those changes.
    • Example: a future version introduces an attribute for the number of words in a Text. An older processor, e.g. an editing application, allows the user to add or delete words from the Text. If the attribute is saved unchanged, it would be wrong; better to omit it altogether.
  • Move the new section to after the Constraints section.
    • Start by saying that the processing behaviour for a processor processing a valid document that contains additional content not in the DAPT model is the following...
    • Say there may be conformant DAPT docs that contain more, e.g. for a new version, or a round trip through a generic TTML tool.
    • Say that the graceful handling feature requirements may be replaced in future versions by something that defines some other behaviour.
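The ttp:contentProfiles rule in the first bullet above amounts to a filter applied when writing the document out. A minimal sketch, using hypothetical profile designators rather than registered DAPT ones:

```python
# Sketch only: on output, keep just the declared profile designators that
# this processor itself supports. The designators are hypothetical
# placeholders, not real DAPT profile URIs.
SUPPORTED_PROFILES = {"https://example.org/profiles/dapt/1.0"}

def filter_content_profiles(declared: str) -> str:
    """Given a space-separated ttp:contentProfiles value, drop any
    designator the processor does not support."""
    return " ".join(p for p in declared.split() if p in SUPPORTED_PROFILES)

value = ("https://example.org/profiles/dapt/1.0 "
         "https://example.org/profiles/dapt/2.0")
print(filter_content_profiles(value))  # the unsupported 2.0 designator is dropped
```

A processor that supports no declared profile would write out an empty value (or omit the attribute entirely), never a claim it cannot back up.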

* Rework introduction to Mapping from TTML to DAPT section.
  * Make it normative.
* Make most subsections informative, but make the normative ones use normative keywords, e.g. the one about div and p structural handling.
* Add section for document conformance claims handling
@nigelmegitt nigelmegitt added the agenda Issue flagged for in-meeting discussion label Apr 23, 2024
@nigelmegitt
Contributor Author

Bringing this back to the table for more discussion. After the last meeting I began making the editorial updates, but then realised that we hadn't properly reached a conclusion on two competing, mutually incompatible views expressed by participants.

Considering a processor that reads a DAPT document, potentially makes some changes, and then writes it out again, what should that processor do with vocabulary that it did not recognise? Particularly if it's in a DAPT or TTML namespace?

The options are:

  1. Unrecognised vocabulary must be omitted from the output document, because the processor, being unaware of the semantics of that vocabulary, cannot claim that the content using it is still valid - the example we discussed was an attribute for "word count", which would be invalidated by adding a word.
  2. Unrecognised vocabulary should be preserved when possible, because that means that older processors in the chain can pass through newer vocabulary without bringing the whole chain down to the level that it supports.

The second option is affected by something we agreed: a processor must not include a value in ttp:contentProfiles that it does not actually support. That opens up the possibility that a validating processor supporting some future version could open the document and say "hey, you output this vocabulary that you don't support, let's check it's correct and fix it if necessary" - of course there's no guarantee that a fix is possible!

I (as Editor) need us to have an agreed direction here before I can proceed. Myself, I'm tending towards option 1, but possibly with adequate safeguards, option 2 could work.

@cconcolato
Contributor

I (as Editor) need us to have an agreed direction here before I can proceed. Myself, I'm tending towards option 1, but possibly with adequate safeguards, option 2 could work.

I prefer option 1 too

@nigelmegitt
Contributor Author

I (as Editor) need us to have an agreed direction here before I can proceed. Myself, I'm tending towards option 1, but possibly with adequate safeguards, option 2 could work.

I prefer option 1 too

Thanks, I will raise this on the agenda for our meeting, but assuming there are no surprises, I'll proceed on that basis. We need to check that we agree whether this applies to all unrecognised vocabulary, only unrecognised vocabulary in known namespaces, or only vocabulary in namespaces used in the specification.

@css-meeting-bot
Member

The Timed Text Working Group just discussed Add informative section about mapping from TTML to the DAPT data model w3c/dapt#216, and agreed to the following:

  • SUMMARY: @nigelmegitt to attempt to edit this into the pull request; everyone else to think about the semantics for including TTML styling vocabulary.
The full IRC log of that discussion:
<nigel> Subtopic: Add informative section about mapping from TTML to the DAPT data model #216
<nigel> github: https://github.com//pull/216
<nigel> Nigel: We discussed this last call and I began working on the edits to the pull request,
<nigel> .. but then realised we hadn't concluded properly on something where two mutually exclusive views
<nigel> .. had been presented.
<nigel> Nigel: [summarises https://github.com//pull/216#issuecomment-2072894545]
<nigel> .. From Cyril's response I think it's clear that for vocabulary in DAPT or TTML namespaces, or any
<nigel> .. namespace included in the spec, processors should not write out vocabulary relating to features they don't support.
<nigel> .. My next question is: what about stuff in completely different namespaces?
<nigel> .. Like locally defined metadata.
<nigel> Pierre: Unless the spec defines a place in the model for metadata, with enough semantics that
<nigel> .. the processor can do something with it without knowing its contents, the safest thing is for the processor
<nigel> .. to prune them altogether.
<nigel> Nigel: Such a place might be as a child of the <metadata> element for example.
<nigel> Pierre: You might say that in head/metadata, temporally, any metadata applies to the whole document.
<nigel> .. They might still put in some average word count, say, so a processor that doesn't know it could
<nigel> .. update the document and invalidate it.
<nigel> q+ cyril
<atai> q+
<nigel> .. From a spec perspective it makes sense to prune all unknown vocabulary. It's the best approach.
<nigel> ack cyril
<nigel> Cyril: A few points. I may have responded too quickly! I'm thinking about changing my mind.
<nigel> .. With an MP4 file, if there are boxes I don't understand, do I expect a processor to remove them?
<nigel> .. You would remove the declaration of the features not understood.
<nigel> .. In TTML, you would definitely remove the contentProfiles that aren't supported.
<nigel> .. Does it mean you have to remove everything?
<nigel> .. You could remove it, but if you leave it in, then you're not claiming its validity.
<nigel> .. The entity that would process the document could decide to keep, say, a word count element,
<nigel> .. but would remove the contentProfiles declaration.
<nigel> .. Secondly, when you rewrite a document, it's two steps: parsing and then writing.
<nigel> .. What's the difference here? You could write valid documents against the specifications you declare validity for.
<nigel> .. Option 1 was "MUST omit" - I think it is probably too strong.
<nigel> q+
<nigel> ack ata
<nigel> Andreas: I'm a bit reluctant if we recommend at all to prune unknown vocabulary especially in foreign namespaces,
<nigel> .. because it could remove some benefits of extending TTML documents with data, especially metadata.
<nigel> .. From the user point of view, what could you expect if foreign namespace metadata is in the document,
<nigel> .. and it goes through an implementation not controlled by the user.
<nigel> .. You could say it is implementation dependent, but in the best case it stays where it was before.
<nigel> .. Then the user needs to live with the fact that it may not be meaningful any more.
<nigel> .. If we recommend pruning, then metadata would only survive implementations that understand it.
<nigel> .. I think that's too strong a requirement.
<nigel> Pierre: Just to clarify, I'm saying this strictly from a specification conformance perspective.
<nigel> .. I expect implementations to do whatever.
<nigel> Cyril: What does it mean though, because if the implementation doesn't meet the spec then it's not compliant.
<nigel> Pierre: Say there's a presentation processor, pruning unknown vocabulary can be tested.
<nigel> .. If you feed a presentation processor two documents, one with foreign vocab and one without, you
<nigel> .. would expect the same outcome.
<nigel> .. If there is a conformance for a translation or transport processor...
<nigel> Nigel: That's what we're talking about
<nigel> Pierre: From my experience in TTML and MXF, I've only seen pain in trying to keep around things that
<nigel> .. you don't know what they are.
<nigel> .. From a conformance standpoint it's either a MUST or nothing.
<nigel> q?
<nigel> ack nigel
<nigel> Nigel: The point about contentProfiles was agreed, there was no doubt about that.
<nigel> .. We are talking about a document that reads and writes DAPT documents, rather than a presentation processor.
<nigel> .. A suggestion I have based on the above is as follows:
<nigel> .. If there is any feature or vocabulary that has some semantic that means it could be invalidated by being
<nigel> .. "passed through" then the definer of that vocabulary better make a profile for it and have that appear in
<nigel> .. contentProfiles when output by a processor that supports it.
<nigel> .. Other vocabulary that is just inside any metadata element can be passed through and no assumptions
<nigel> .. about validity should be made.
<nigel> .. I think we should prune any unsupported vocabulary in namespaces listed in DAPT though.
<nigel> .. The third part is that a processor that does support some extra profile and receives a document that
<nigel> .. doesn't claim conformance to that profile but does contain vocabulary relating to it needs to take extra
<nigel> .. validation steps and may need to modify it.
<nigel> .. That last point is hard to write a testable specification conformance rule for.
<Cyril> RRSAgent, pointer
<RRSAgent> See https://www.w3.org/2024/04/25-tt-irc#T15-38-04
<nigel> Nigel: Does that make any sense?
<atai> q+
<nigel> Cyril: What can we say about the definer of vocabulary making a profile?
<nigel> Nigel: We can make a note recommending that people do this.
<nigel> Cyril: That's fine.
<nigel> Andreas: I think at least the first two options seem fine to me, but they're more recommendations
<nigel> .. to the author than the processor, right?
<nigel> Nigel: From this I can define processor behaviour, as follows:
<nigel> .. 1. Never include in ttp:contentProfile profiles which the processor does not support.
<nigel> .. 2. Foreign namespace vocabulary in <metadata> elements should be preserved
<nigel> .. 3. Non-foreign namespace vocabulary that is not supported MUST be removed
<nigel> .. And then there's advice too, for authors as you suggest.
<pal> q+
<nigel> ack at
<nigel> Andreas: Yes I think those are processor requirements.
<nigel> Pierre: Number 2 is not testable because of the SHOULD.
<atai> q+
<nigel> .. By the way, I think that's what will happen in practice, but from a spec perspective it is not testable.
<pal> q-
<nigel> ack at
<nigel> Andreas: I think that it's implementation dependent, what happens, but whatever keyword you use
<nigel> .. it is a recommendation to keep where possible. In some cases it may not be possible,
<nigel> .. because a semantically identical document has its structure changed, and the parent of the metadata element
<nigel> .. was removed while doing that.
<nigel> Nigel: That last point is a whole other headache - I think with DAPT it should not happen but I guess it is possible.
<nigel> Pierre: The only alternative I can think of is to preserve vocabulary but if you put it in a particular
<nigel> .. location then it has limited impact. It's going to be fragile I think.
<nigel> Nigel: There's a good test case there for us to think about which is what if you take a DAPT document
<nigel> .. and add styling in to make it an IMSC document.
<nigel> Cyril: I was thinking the same thing.
<nigel> .. I'm wondering if, once you start making a subtitle out of a DAPT document, you've forked it and you're
<nigel> .. in the subtitling space - I'm not sure you'd want to go back to the DAPT processor for anything.
<nigel> .. It's a one-way door I would say.
<nigel> Pierre: One thing just occurred: does the spec really need to define a transformation processor?
<nigel> Nigel: Validation processors are a subset of transformation processors.
<nigel> Pierre: Just for validation it's okay to prune everything that's unknown.
<nigel> .. Maybe just don't define transformation processor other than validation processor so we don't need this?
<nigel> Nigel: That's where I started out when we began thinking about this.
<nigel> q?
<nigel> Nigel: I think I have enough to go ahead and try editing here, and see how that works out.
<nigel> SUMMARY: @nigelmegitt to attempt to edit this into the pull request; everyone else to think about the semantics for including TTML styling vocabulary.

As discussed in #216 (comment):
* Define restriction on `ttp:contentProfiles` values within the section on contentProfiles
* Define and use the term "foreign vocabulary"
* Advise retention of unsupported foreign vocabulary in `<metadata>` elements
* Require pruning of unsupported vocabulary outside `<metadata>` elements
* Provide a documented path to allowing foreign vocabulary to be defined and supported
* Note that structural changes could lead to loss of foreign vocabulary if the `<metadata>` element to which it is associated is removed.
@nigelmegitt
Contributor Author

Those sections imply more complex processing on the reader side, for the purpose of extensibility.

Yes, this is another choice we considered previously, and could come back to, i.e. prohibiting nested <div>s. Then our path forward if we ever want to introduce nested <div>s for some purpose would be to require v n+1 processors for v n+1 documents, unless those documents happen also to be v1 documents. A v n+1 processor would still be able to process a v1 document.
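A structural constraint of the kind described here, prohibiting nested <div>s, could be checked mechanically by a validator. The sketch below illustrates such a hypothetical check; it is not a normative DAPT rule.

```python
import xml.etree.ElementTree as ET

DIV = "{http://www.w3.org/ns/ttml}div"

def has_nested_div(elem, inside_div=False):
    """Return True if any <div> occurs inside another <div>."""
    if elem.tag == DIV and inside_div:
        return True
    inside = inside_div or elem.tag == DIV
    return any(has_nested_div(child, inside) for child in elem)

# A flat structure (one div per Script Event) passes the check...
flat = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml"><body>'
    '<div><p>one</p></div><div><p>two</p></div></body></tt>'
)
# ...while a nested structure would be rejected by a v1 validator.
nested = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml"><body>'
    '<div><div><p>nested</p></div></div></body></tt>'
)
```

Under this option, a v n+1 document using nesting would simply fail v1 validation, which is what makes the compatibility question significant.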

Based on @cconcolato 's review comments, with small editorial adjustments.
Rename "Document conformance claims on output", mark as informative; move the section about processor behaviour into a new normative section "Not supporting features excluded by the content profile".

Some wording edits too.
@cconcolato
Contributor

cconcolato commented Jul 9, 2024

Comments about section 5.2.

The title of the section is now "Unrecognised Elements and Attributes" but the section is also about "foreign" elements and attributes, and moreover it is about their processing. I would suggest renaming it "Processing of unrecognised or foreign elements and attributes".

I think the section would flow better if the sentences were reordered, I would suggest the following order:

  1. NOTE 1 about why the section exists

  2. definition of unrecognised vocabulary

  3. indication that documents may contain unrecognised vocabulary

  4. processing rules with notes 3 and 4 using unrecognised vocabulary (not foreign vocabulary)

  5. definition of foreign vocabulary + NOTE 2 about foreign vocabulary profiles

  6. sections about use of foreign vocabulary for proprietary metadata and consequences (sections 5.2.1 and 5.2.1.1)

I wonder if we should not have 2 clear subsections: 5.2.1 unrecognised vocabulary (steps 1 to 4) and 5.2.2 special considerations on foreign vocabulary (steps 5 and 6) with 2 subsections: 5.2.2.1 foreign vocabulary as metadata and 5.2.2.2 foreign vocabulary that is not metadata

Restructure the Unrecognised Elements and Attributes section to split unrecognised vocabulary sections from foreign vocabulary sections.

Rename Profile Designator section to be plural.

Delete informative section that repeats normative requirements elsewhere, re avoiding unverified profile conformance signalling.
@nigelmegitt
Contributor Author

I wonder if we should not have 2 clear subsections: 5.2.1 unrecognised vocabulary (steps 1 to 4) and 5.2.2 special considerations on foreign vocabulary (steps 5 and 6) with 2 subsections: 5.2.2.1 foreign vocabulary as metadata and 5.2.2.2 foreign vocabulary that is not metadata

Done in f69d320.

@cconcolato
Contributor

Comments about section 6.

Overall, section 6 is sound. The main question being do we need it, per comment #216 (comment)

Editorially:

  • I'm not sure section 6.6 adds much value given that it rephrases 5.2. For overall conciseness, I would suggest removing it, but not a strong feeling.
  • I find section 6.7 a bit verbose but that's not a big deal.
  • I wonder if we should not group 6.6 with 6.7 and maybe rename that to "considerations for transformation and validation processors" with 2 subsections

Define foreign vocabulary before referencing it.

Put considerations for transformation and validation processors into a separate subsection of §6.
@nigelmegitt
Contributor Author

  • I wonder if we should not group 6.6 with 6.7 and maybe rename that to "considerations for transformation and validation processors" with 2 subsections

Done in c68a08a.

@nigelmegitt
Contributor Author

Overall, section 6 is sound. The main question being do we need it, per comment #216 (comment)

I think we should keep it for now, and then when working on #233 we may be able to remove some of it - I suspect not all of it though, given some of §6 isn't only about TTML vocabulary.

index.html (outdated), comment on lines 2324 to 2338:
<p>The following processing rules resolve these cases:</p>
<ol>
<li>A <code>&lt;div&gt;</code> element that contains any <code>&lt;div&gt;</code> element children
MUST NOT be mapped to a <a>Script Event</a>;
the processor instead MUST iterate through those <code>&lt;div&gt;</code> element children
(recursively) and consider if each one meets the requirements of a <a>Script Event</a>;</li>
<li>A <code>&lt;div&gt;</code> element that contains unrecognised attributes
MUST NOT be rejected as a <a>Script Event</a> if it otherwise meets the requirements
for being a <a>Script Event</a>, such as having a valid <code>xml:id</code>
representing the <a>Script Event Identifier</a>;</li>
<li>A <code>&lt;p&gt;</code> element that is not a child of a <code>&lt;div&gt;</code> element
that maps to a <a>Script Event</a> MUST NOT be mapped to a <a>Text</a> object;</li>
<li>A <code>&lt;p&gt;</code> element that is a child of a <code>&lt;div&gt;</code> element
that maps to a <a>Script Event</a> MUST be mapped to a <a>Text</a> object.</li>
</ol>
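The four quoted rules can be sketched as a recursive traversal. This is a rough illustration under stated assumptions, not the normative mapping: here "meets the requirements of a Script Event" is simplified to "has an xml:id and no child `<div>` elements", and a Text object is represented simply by the `<p>` element itself.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def collect_script_events(div):
    """Recursively map <div> elements to (id, [texts]) Script Events."""
    events = []
    child_divs = div.findall(TT + "div")
    if child_divs:
        # Rule 1: a div with div children is not itself a Script Event;
        # iterate through the children recursively instead
        for child in child_divs:
            events.extend(collect_script_events(child))
    elif div.get(XML_ID):
        # Rule 2: unrecognised attributes do not disqualify the div;
        # Rule 4: each child <p> of a Script Event div maps to a Text object
        texts = div.findall(TT + "p")
        events.append((div.get(XML_ID), texts))
    # Rule 3: <p> children of divs that are not Script Events are not mapped
    return events
```

Applied to a `<body>` containing an intermediate `<div>` wrapping two identified `<div>`s, this yields one Script Event per identified `<div>`, in document order.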
Contributor

Rules are negative (MUST NOT) and it is not clear how to clearly identify a script event.

Contributor Author

Fixed in 6c1b4ab.

Makes the data model section match the requirements of the TTML -> DAPT model parsing section.
@css-meeting-bot
Member

The Timed Text Working Group just discussed Add section about mapping from TTML to the DAPT data model w3c/dapt#216, and agreed to the following:

  • SUMMARY: Updates and review continue
The full IRC log of that discussion:
<nigel> Subtopic: Add section about mapping from TTML to the DAPT data model #216
<nigel> github: https://github.com//pull/216
<nigel> Nigel: I had a useful discussion with Cyril about this, and I think all his feedback has been resolved.
<nigel> .. I also asked some individuals who have contributed in the discussions if they had any more feedback.
<nigel> .. Pierre had some comments by email too, which I think are mostly resolved.
<nigel> .. I will mention some additional work I haven't pushed yet, but any other comments on this right now?
<nigel> Andreas: I have not had time to review it properly yet, but I intend to - however do not wait for me.
<nigel> Nigel: I have begun to draft a further update that clarifies that a DAPT document is a
<nigel> .. TTML timed text content document instance, which is a specific term in TTML2 that clarifies which
<nigel> .. kind of document instance it is.
<nigel> .. The other addition, which is a response to Pierre's feedback, is to state in the DAPT data model, explicitly,
<nigel> .. that there may be intermediate div elements between the body and the div representing a script event,
<nigel> .. and that script event divs must not contain any child div elements.
<nigel> .. I think that sort of closes the loop a bit better than it is now.
<nigel> .. Andreas, do you remember making that point?
<nigel> Andreas: Yes, I remember making that point. I think it is better to avoid misinterpretation and unpredicted parsing of the document.
<nigel> .. To be more restrictive.
<nigel> Nigel: OK, I'll try to finish that and get it pushed.
<nigel> .. I can't merge this until I have at least one approval, so this is going to drag on until someone else
<nigel> .. says they're happy with it!
<nigel> SUMMARY: Updates and review continue

@nigelmegitt nigelmegitt linked an issue Jul 22, 2024 that may be closed by this pull request
Rather than assuming that the reader already knows exactly what a TTML2 timed text content document instance looks like.
* Be explicit about the need for `<head>`, `<metadata>` and `<body>` elements.
* Specify the path of Script Events and Text objects, for convenience.
* Be clear that the `daptm:langSrc` attribute on a `<p>` element represents the Text Language Source (weird that we didn't before).
* State that no semantic is defined for `<div>` elements that are not Script Events.
* Now that there is explicit permission to have nested `<div>`s, allow for that in the section about handling `<div>` and `<p>` elements, i.e. they are not just a theoretical construct allowed in TTML, but they are explicitly permitted in DAPT too.
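The explicit structural requirements listed above (`<head>`, `<metadata>` and `<body>` must be present) can be checked with a short validation sketch. The element names come from the commit description; the check itself is illustrative, not the spec's conformance test.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"

def has_required_structure(root):
    """True if the document has <head> containing <metadata>, and <body>."""
    head = root.find(TT + "head")
    body = root.find(TT + "body")
    return (head is not None
            and head.find(TT + "metadata") is not None
            and body is not None)
```

A document missing any of the three elements would fail this check, while nested `<div>`s inside `<body>` are permitted and not examined here.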