Rework audio description original and translation languages #179

Merged
merged 28 commits into from Dec 21, 2023

nigelmegitt
Contributor

@nigelmegitt nigelmegitt commented Jul 31, 2023

Closes #173 and #148.

Text Language Source updated to remove the link between it and Primary Language. Needed to introduce a new concept of a "nominal" original language for the programme, for the Original language of the Audio Description.

Added notes and examples about how to handle non-nominal original language in-image text in audio descriptions, and the potential need to translate it.

Text Language Source now indicates the source language for the translation.

Tidied some of the existing notes.


@andreastai andreastai left a comment

The introduction of the term nominal original language could make it difficult to understand when to use original or translation. Currently, there is no definition of what distinguishes a nominal original language from an original language.

@nigelmegitt
Contributor Author

I've been thinking about how to resolve the difficulties you mention @andreastai and it's not straightforward. My thinking so far, based on this PR as it is now:

  • "Primary Language" is a concept given too much importance - really it's just a default language for the document, and doesn't carry any other significance.
  • When non-dialogue sounds or non-text video image contents are described, we need a concept and a term for expressing what language they are described in, in the "original" transcript.

I'm tempted to downgrade "Primary Language" to "default language" as a term (but not really make any other changes to it), and to introduce something like "Non-verbal Content Original Language" which is a bit of a wordy term, but doesn't need to be used too often, hopefully.

I'm envisaging a specific attribute on the tt level for establishing the "Non-verbal Content Original Language", which would not change even if the descriptive content is translated. Then this slightly worrying sounding "nominal original language" would be an explicitly defined thing we could reference.

Does that sound like a good way forward?

@nigelmegitt nigelmegitt added the agenda Issue flagged for in-meeting discussion label Aug 2, 2023
@nigelmegitt
Contributor Author

I've made a change to this PR roughly matching the intent of #179 (comment) that no longer requires or references some invented "nominal" original audio language:

  • Rename Primary Language to Default Language
  • Introduce Original Language as the mandatory daptm:originalLang
  • Reference this Original Language within the definition text for Original text language source

I will check the extension feature set matches these requirements too, will likely need another commit and push.

@nigelmegitt
Contributor Author

This is ready for a re-review now @andreastai @cconcolato .

@andreastai

Thanks for your attempt to resolve the issues @nigelmegitt. After a quick first review, I think this could formally work. But we still have the issue that it makes the use of the Text Language Source very difficult to understand. Following your proposed concept, I think the introduced property "OriginalLanguage" would be better named along the lines proposed in #179 (comment), such as Non-verbal Content Original Language.

But the more I think about it, the more I wonder whether the property Text Language Source should be used at all for non-verbal content. We should consider either not identifying the language source for non-verbal content, or adding another attribute to Text Objects, such as Non-Verbal Content Language Source.

@nigelmegitt nigelmegitt removed the agenda Issue flagged for in-meeting discussion label Aug 3, 2023
@nigelmegitt
Contributor Author

See discussion logged at #173 (comment) for suggestions for how to proceed with this.

@nigelmegitt
Contributor Author

Also see #148 (comment) for an additional relevant change relating to this.

@css-meeting-bot
Member

The Timed Text Working Group just discussed Rework audio description original and translation languages w3c/dapt#179.

The full IRC log of that discussion:
<nigel> Subtopic: Rework audio description original and translation languages #179
<nigel> github: https://github.com//pull/179
<nigel> Nigel: I think the issues have now been resolved by removing Text Language Source for non-translated content,
<nigel> .. and having it set to the source language for translated content.
<nigel> .. The latest commits should express that.
<nigel> Andreas: I will review offline.
<nigel> Nigel: Thank you.

@nigelmegitt nigelmegitt force-pushed the issue-0173-text-lang-source-for-non-dialogue-sound branch from 7509859 to df8261c Compare October 13, 2023 13:06
Contributor

@cconcolato cconcolato left a comment

Text language source seems to be optional but it does not appear so in the diagram.

@andreastai andreastai left a comment

Thanks, @nigelmegitt. The use of the Text Language Source property is clear. However, the name and usage of the Original Source property of the DAPT Script is confusing. The name does not reflect that it only applies to content that does not have an inherent language (e.g. non-dialogue sounds). Furthermore, I wonder if it would be sufficient to just use the Text Language Source property for content without an inherent language to indicate whether the text content was translated and, if so, from which language.

@nigelmegitt
Contributor Author

Thanks @andreastai, I can see why "Original Language" is a confusing name for text describing content that does not have an inherent language. Is your proposal to treat Text whose Text Language Source is the same as its Language as a special case to represent that the text was authored to describe that content?

For example:

<div xml:id="event_3"
     begin="9663f" end="9682f">
  <p xml:lang="en" daptm:langSrc="en">The computer screen turns blue.</p>
</div>

@nigelmegitt nigelmegitt force-pushed the issue-0173-text-lang-source-for-non-dialogue-sound branch from df8261c to 7e7d4cb Compare October 19, 2023 13:58
@andreastai

Is your proposal to treat Text whose Text Language Source is the same as its Language as a special case to represent that the text was authored to describe that content?

For example:

<div xml:id="event_3"
     begin="9663f" end="9682f">
  <p xml:lang="en" daptm:langSrc="en">The computer screen turns blue.</p>
</div>

I would propose not using specific tagging for descriptions of content without an inherent language if it was described just using that content and not a previously existing text, e.g.

<div xml:id="event_3"
     begin="9663f" end="9682f">
  <p xml:lang="en">The computer screen turns blue.</p>
</div>

But if this is now translated into German based on the English description the daptm:langSrc attribute could be used:

<div xml:id="event_3"
     begin="9663f" end="9682f">
  <p xml:lang="de" daptm:langSrc="en">Der Computerbildschirm wird blau.</p>
</div>
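
The distinction drawn in the two snippets above can be sketched in Python. This is only an illustration of the proposal as stated at this point in the thread, not specification text: the `describe` and `parse_p` helpers are hypothetical, the `daptm` namespace URI is an assumption to be checked against the specification, and only attributes on the `<p>` itself are considered (no inheritance).

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM_NS = "http://www.w3.org/ns/ttml/profile/dapt#metadata"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LANG_SRC = f"{{{DAPTM_NS}}}langSrc"


def describe(p):
    """Summarise one <p> using only the attributes present on the element itself."""
    lang = p.get(XML_LANG)
    lang_src = p.get(LANG_SRC)
    if lang_src is None:
        return f"{lang}: authored directly, no source text"
    if lang_src != lang:
        return f"{lang}: translated from {lang_src}"
    return f"{lang}: source language equals text language"


def parse_p(fragment):
    """Wrap a <p> fragment so that the namespace prefixes resolve, then return the <p>."""
    wrapped = f'<div xmlns="{TT_NS}" xmlns:daptm="{DAPTM_NS}">{fragment}</div>'
    return ET.fromstring(wrapped).find(f"{{{TT_NS}}}p")


english = parse_p('<p xml:lang="en">The computer screen turns blue.</p>')
german = parse_p(
    '<p xml:lang="de" daptm:langSrc="en">Der Computerbildschirm wird blau.</p>'
)
print(describe(english))
print(describe(german))
```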

@nigelmegitt nigelmegitt force-pushed the issue-0173-text-lang-source-for-non-dialogue-sound branch from 2ccd8ac to 932006b Compare November 2, 2023 14:39
@nigelmegitt
Contributor Author

I would propose not using specific tagging for descriptions of content without an inherent language if it was described just using that content and not a previously existing text,

@andreastai Thanks, I think I understand. Source content with an inherent language has that language marked up in daptm:langSrc, and source content without an inherent language omits the attribute. That's close to what we have in this pull request but will need some changes. I think it may only need minor tweaks. Enumerating the possibilities with an example where original language is English:

| Transcript source | Inherent language of what's being transcribed | xml:lang | daptm:langSrc |
|---|---|---|---|
| Text in video image in original language | English | en | en |
| Text in video image in non-original language | German | de | de |
| Video image (non text) | none | en | absent |
| Sound effect | none | en | absent |
| Dialogue in original language | English | en | en |
| Dialogue in non-original language | French | fr | fr |

Then whenever these pieces of text are translated, the translations have their daptm:langSrc set to the computed xml:lang of the source for the translation, and at that point the values of xml:lang and daptm:langSrc differ.

Also, in this proposal there is no longer any particular need to record the document's Original Language, because the original language of the transcript content can always be derived.

Does that match your proposal?
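
As a rough model of the rules enumerated above, the following Python sketch treats each piece of text as a record with a computed `xml:lang` and an optional `daptm:langSrc`. The `Text` class and the `translate`, `is_translation` and `source_language` helpers are hypothetical names introduced for illustration, and the sample sentences are invented; the sketch assumes that "absent" can be modelled as `None` and that language tags compare case-insensitively.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Text:
    content: str
    lang: str                # computed xml:lang
    lang_src: Optional[str]  # daptm:langSrc; None models "absent"


def translate(source: Text, target_lang: str, translated_content: str) -> Text:
    """The translation's langSrc is the computed xml:lang of its source."""
    return Text(translated_content, target_lang, source.lang)


def is_translation(text: Text) -> bool:
    """A text is a translation when langSrc is present and differs from xml:lang."""
    return text.lang_src is not None and text.lang_src.lower() != text.lang.lower()


def source_language(text: Text) -> str:
    """Language of the text this one was produced from, or its own if untranslated."""
    return text.lang_src if is_translation(text) else text.lang


sound_effect = Text("A door slams.", lang="en", lang_src=None)
dialogue = Text("Bonjour.", lang="fr", lang_src="fr")
german = translate(sound_effect, "de", "Eine Tür schlägt zu.")
english = translate(dialogue, "en", "Hello.")
```

Under this model the values of `lang` and `lang_src` differ only on the translated records, which is the property the comment relies on to derive the original language without a document-level field.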

@css-meeting-bot
Member

The Timed Text Working Group just discussed Rework audio description original and translation languages w3c/dapt#179, and agreed to the following:

  • SUMMARY: the proposal seems OK. Original Language still has a use, but we can relax some of its requirements: make it non-mandatory and allow multiple values, if that's the best representation of the content
The full IRC log of that discussion:
<nigel> Subtopic: Rework audio description original and translation languages #179
<nigel> github: https://github.com//pull/179
<cpn> Nigel: This has gone through some iterations. Andreas has been actively reviewing
<cpn> ... The point to deal with is flagging in the best way the difference between content described that had an inherent language, and content that didn't
<cpn> ... I added a table in the issue with permutations
<cpn> ... It's clear if you're transcribing dialog or signing
<cpn> ... For content in the video image, it might be text or might not be. If text, can use lang source
<cpn> ... If it's a description of the image, or a sound effect, those things don't have a language
<cpn> ... The proposal is for any content with a language, use lang source to record it, and the language of the text transcripted in xml lang
<cpn> ... And omit lang source if there isn't a language
<cpn> ... Also the concept of original language doesn't serve a purpose any more, and could be removed
<cpn> Cyril: On the proposal to be clear on when lang source is present or not, that's fine
<cpn> ... Disagree that we don't need the original language
<cpn> ... It's to provide context on what the language was. When you're dubbing a show with multiple languages in the original source
<cpn> ... or a show translated to a pivot language to create a dubbing, knowing the original language can be useful
<cpn> Nigel: That's a change of status of the flag. There are normative statements around original lang to change: make it a list
<cpn> ... if someone creates a silent video with no dialog, then it's audio described, what use is original lang?
<cpn> Cyril: Doesn't have to be mandatory. The concept of original language should be preserved in the spec
<cpn> ... or original languages
<cpn> ... If the original languages are French and English, you can translate to English only, French only, or German
<cpn> ... If translation to German, you translate everything, but if translating to French you only translate the English parts
<cpn> ... It's complex, I need to continue to review. Seems going in the right direction
<cpn> Mike: I try to get people to conform to BCP47. If you constrain to that, it's good. Use the IANA codes
<cpn> Nigel: We do, yes
<cpn> Nigel: To summarise, the proposal seems OK. Original language still has a use, but we can relax some of its requirements: make it non-mandatory and allow multiple values, if that's the best representation of the content
<nigel> SUMMARY: the proposal seems OK. Original Language still has a use, but we can relax some of its requirements: make it non-mandatory and allow multiple values, if that's the best representation of the content
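
The multi-language scenario raised in the discussion (originals in French and English, translated to French only, English only, or German) can be sketched as follows. The `Segment` type and both helper functions are hypothetical, the sample dialogue is invented, and the comparison is a simple case-insensitive match on the whole tag rather than full BCP 47 matching.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    text: str
    lang: str  # language tag of the original content


def original_languages(segments: List[Segment]) -> List[str]:
    """Distinct original languages, in order of first appearance."""
    seen: List[str] = []
    for segment in segments:
        if segment.lang.lower() not in (lang.lower() for lang in seen):
            seen.append(segment.lang)
    return seen


def segments_to_translate(segments: List[Segment], target_lang: str) -> List[Segment]:
    """Only segments not already in the target language need translating."""
    return [s for s in segments if s.lang.lower() != target_lang.lower()]


script = [Segment("Bonjour.", "fr"), Segment("Hello.", "en")]
to_french = segments_to_translate(script, "fr")
to_german = segments_to_translate(script, "de")
```

A list-valued result from `original_languages` corresponds to the relaxed requirement in the summary, where Original Language is optional and may hold multiple values.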

@nigelmegitt
Contributor Author

Changes as discussed in #179 (comment) now made in 84f9e0c. Hopefully this makes the handling of content in multiple languages, and the handling of transcripts of content without an inherent language much clearer, and removes the complex dependency on Original Language, which is now optional.

@andreastai @cconcolato please re-review when you can.

@andreastai

@nigelmegitt Apologies for the late reply to your comment in https://github.com/w3c/dapt/pulls#issuecomment-1831523515.

After reading through the current PR I would be in favor of allowing @daptm:langSrc only on <p> and <span> elements (so, only on the elements where it applies). Furthermore, I propose to make the attribute mandatory on <p> elements (one of the options you propose in your comments).

Reasoning: The potential error scenarios are:

  • Users may overlook inherited values and take only the values of @daptm:langSrc on <p> and <span> elements to conclude if the text objects are Original or Translation.

  • The default value of @daptm:langSrc applies only to a specific use case and not the majority of the use cases. Therefore users may easily imply unwanted semantics if they do not specify the attribute.

Although you mention these potential errors in the informative note, it may be better to reduce them by being more restrictive. This would also be better for the requirement of readability that @cconcolato mentions in https://github.com/w3c/dapt/pulls#issuecomment-1828846945.

For "readability" it may also be better to make xml:lang on text objects mandatory.
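
A check for the stricter rule proposed in this comment might look like the sketch below. It illustrates the proposal only, not the behaviour that was eventually merged; `missing_explicit_attributes` is a hypothetical helper, the sample document is invented, and the `daptm` namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM_NS = "http://www.w3.org/ns/ttml/profile/dapt#metadata"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = f"{{{XML_NS}}}lang"
XML_ID = f"{{{XML_NS}}}id"
LANG_SRC = f"{{{DAPTM_NS}}}langSrc"


def missing_explicit_attributes(root):
    """Return (xml:id, missing attribute names) for each <p> relying on inheritance."""
    problems = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        missing = [
            name
            for name, key in (("xml:lang", XML_LANG), ("daptm:langSrc", LANG_SRC))
            if key not in p.attrib
        ]
        if missing:
            problems.append((p.get(XML_ID), missing))
    return problems


doc = ET.fromstring(
    f'<tt xmlns="{TT_NS}" xmlns:daptm="{DAPTM_NS}" xml:lang="en">'
    '<body><div>'
    '<p xml:id="p1" xml:lang="en" daptm:langSrc="en">Hello.</p>'
    '<p xml:id="p2">A door slams.</p>'
    '</div></body></tt>'
)
problems = missing_explicit_attributes(doc)
```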

@nigelmegitt
Contributor Author

On reflection I don't think this is much about 'users', practically, unless someone is making a very naive authoring implementation, or editing manually. Instead, the constituency that would be most worried about this is probably implementers.

The advantage of allowing inheritance of daptm:langSrc and xml:lang is to reduce document size and processor time when parsing the document. In a distribution environment and playback those matter to some extent; in an editing and exchange environment, they are not so important.

You've described the disadvantages quite clearly: in this case, inheritance is a vector for introducing unexpected behaviour or bugs if implementers don't deal with this.

My own view is swayed slightly more in favour of reduced duplication and to allow more inheritance, because I would like to see this format adopted in players eventually, for AD.

But I would certainly like to hear any input from implementers. Also, maybe the notes are targeted wrongly and should be more implementation-focused. At the moment they describe mitigations that authors can take, but realistically, authors are unlikely to be making a choice here. I'd be happy to try to draft a warning directly to implementers to point out that when changing daptm:langSrc or xml:lang on an element they need to check down the tree and, if appropriate, specify the value on descendant elements so that their meaning does not change unintentionally.
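
The "check down the tree" step described above can be sketched as follows. This is a minimal illustration under several assumptions: both attributes inherit from the nearest ancestor that specifies them, the initial value is an empty string, `daptm:langSrc` may be pinned only on `<p>` and `<span>` below the root, and the `daptm` namespace URI is as shown. The `computed_values` and `change_lang_src` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM_NS = "http://www.w3.org/ns/ttml/profile/dapt#metadata"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LANG_SRC = f"{{{DAPTM_NS}}}langSrc"
# Elements below the root on which this sketch will write daptm:langSrc.
PINNABLE = {f"{{{TT_NS}}}p", f"{{{TT_NS}}}span"}


def computed_values(root):
    """Map each element to its computed (xml:lang, daptm:langSrc) pair."""
    result = {}

    def walk(element, lang, lang_src):
        lang = element.get(XML_LANG, lang)
        lang_src = element.get(LANG_SRC, lang_src)
        result[element] = (lang, lang_src)
        for child in element:
            walk(child, lang, lang_src)

    walk(root, "", "")  # assumed initial values
    return result


def change_lang_src(root, target, new_value):
    """Set daptm:langSrc on target without changing descendants' computed values."""
    before = computed_values(root)

    def pin(element):
        for child in element:
            if LANG_SRC in child.attrib:
                continue  # explicit value: this subtree is unaffected
            if child.tag in PINNABLE:
                child.set(LANG_SRC, before[child][1])  # freeze the old computed value
            else:
                pin(child)  # keep looking for the nearest pinnable descendants

    pin(target)
    target.set(LANG_SRC, new_value)


doc = ET.fromstring(
    f'<tt xmlns="{TT_NS}" xmlns:daptm="{DAPTM_NS}" xml:lang="en" daptm:langSrc="en">'
    '<body><div><p>Hello.</p></div></body></tt>'
)
paragraph = doc.find(f".//{{{TT_NS}}}p")
change_lang_src(doc, doc, "fr")
```

After the call, the root carries the new value while the paragraph keeps its earlier computed value as an explicit attribute, so its meaning does not change. The same approach would apply to `xml:lang`, which is not shown here.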

@nigelmegitt
Copy link
Contributor Author

After reading through the current PR I would be in favor of allowing @daptm:langSrc only on <p> and <span> elements (so, only on the elements where it applies).

@andreastai In 1693603 I changed it so that daptm:langSrc can only be specified on tt, p and span which matches the data model diagram.
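
The placement restriction described here (only `tt`, `p` and `span`) could be checked along these lines. `misplaced_lang_src` is a hypothetical helper, the sample document is invented, and the `daptm` namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumed namespace URI for the daptm prefix; verify against the specification.
DAPTM_NS = "http://www.w3.org/ns/ttml/profile/dapt#metadata"
LANG_SRC = f"{{{DAPTM_NS}}}langSrc"
ALLOWED = {f"{{{TT_NS}}}{name}" for name in ("tt", "p", "span")}


def misplaced_lang_src(root):
    """Local names of elements that carry daptm:langSrc but are not tt, p or span."""
    return [
        element.tag.rsplit("}", 1)[-1]
        for element in root.iter()
        if LANG_SRC in element.attrib and element.tag not in ALLOWED
    ]


doc = ET.fromstring(
    f'<tt xmlns="{TT_NS}" xmlns:daptm="{DAPTM_NS}" xml:lang="en">'
    '<body><div daptm:langSrc="en"><p daptm:langSrc="en">Hello.</p></div></body></tt>'
)
misplaced = misplaced_lang_src(doc)
```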

@nigelmegitt
Contributor Author

Also, maybe the notes are targeted wrongly and should be more implementation-focused. At the moment they describe mitigations that authors can take, but realistically, authors are unlikely to be making a choice here. I'd be happy to try to draft a warning directly to implementers to point out that when changing daptm:langSrc or xml:lang on an element they need to check down the tree and, if appropriate, specify the value on descendant elements so that their meaning does not change unintentionally.

Warnings improved to mention implementers as well as authors in 106b951.

@nigelmegitt
Contributor Author

@cconcolato @andreastai thank you both for persevering with your reviews on this. I think all the comments have now been addressed, if you'd like to check, and mark any outstanding conversation threads as "resolved" and add positive reviews if you're happy to.

The one conversation where we have maybe not arrived at the same conclusion is whether or not to permit daptm:langSrc on <tt>. I think I've gone as far as I'd be happy with on this for now - @andreastai are you able to accept it as is?

Contributor

@cconcolato cconcolato left a comment

@nigelmegitt Thank you for the work done! It reads great to me now. I still have some minor changes to suggest but they are not blocking, so I'm approving this PR!

@andreastai andreastai left a comment

Thanks for all the effort @nigelmegitt. I went through all the changes made to address my comments and all looks good to me : ) I had just one minor comment that is not blocking and could also be applied after merging the PR.

@nigelmegitt nigelmegitt merged commit 8d16753 into main Dec 21, 2023
2 checks passed
@nigelmegitt nigelmegitt deleted the issue-0173-text-lang-source-for-non-dialogue-sound branch December 21, 2023 17:49