Spoken subtitle #13
This issue is not in a valid format, as it does not include the information requested in the requirement template. Please edit your comment above, or close the issue if you do not intend to provide this information. Also, please be aware that TTML2 already provides support for spoken subtitles.
Thank you. I am writing further requirements for spoken subtitles, and I hope to post them soon. I have never posted to GitHub before, so I had to create an account and work out how to do it. Sorry for the delay, but I will deliver very soon.
@porero thanks; please paste your detailed requirements into the initial comment above (by using the edit option). Also, please be sure to identify (1) how your requirements are not met by current TTML2 audio and speech to text functionality, and (2) whether your proposal applies to IMSC and/or TTML. If you are not familiar with IMSC: it is a profile of TTML, and includes support for only a subset of the features of TTML; e.g., it does not (at present) support the audio or speech to text features.
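For readers unfamiliar with the TTML2 functionality being referred to, here is a minimal sketch, assuming TTML2's audio styling vocabulary (the `tta:speak` attribute in the `http://www.w3.org/ns/ttml#audio` namespace); the timing and text are invented for illustration, and this is not normative:

```xml
<!-- Minimal sketch: a subtitle paragraph marked to be rendered by a
     speech synthesizer using TTML2 audio styling (tta:speak).
     Timing and text are illustrative only. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xml:lang="en">
  <body>
    <div>
      <p begin="10s" end="14s" tta:speak="normal">
        This line is intended to be spoken as well as displayed.
      </p>
    </div>
  </body>
</tt>
```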
[Updated: content of this comment moved to the top]
Hello @porero. I have read your above elaboration, and I must admit that I do not understand what you are asking for that isn't already supported by TTML (in one form or another). Your requests seem to boil down to a need to provide text to speech (although above you mention "speech to text", which confuses me). Much of what you mention above pertains to applications that make use of TTML, and not to TTML-specific features or technology.

We (the TTWG) view TTML (and IMSC and other profiles, such as SMPTE-TT and EBU-TT) as enabling technologies to be integrated into and employed by applications in a variety of domains, only one of which is the delivery of caption or subtitle data. That said, the TTWG does not undertake to define specific applications of TTML, though we have, at times, focused on the needs of specific applications to drive the definition of new features (e.g., Japanese subtitles, live captioning, karaoke, etc.). However, unless we can identify what specific features are missing, we cannot take further action.

Regarding the specifics of your proposal, we need more information, such as:
Finally, I would urge you to carefully review the details of audio and text to speech support in TTML2 so that we may base this conversation on a common understanding of what is already present, and what might be missing. Absent the identification of specific missing features, I fear the TTWG will (eventually) close this issue without taking any action.
Please note that @skynavga is a member of the Timed Text Working Group and the group has not discussed this issue yet; as such his #13 (comment) does not represent a consensus view of the group at this time.

As Chair of the TTWG I would like to thank you for raising this @porero, and congratulate you (and sympathise) for using GitHub in this way for the first time. What you've done is fine for us to make a start with understanding your submission, and I may go and edit the opening comment at the top of the issue to match what you added in #13 (comment), for clarity, if that's okay with you? We may also have some follow-up questions, so please watch this space.

For the benefit of others watching this, as it happens @porero and I had a chance to discuss this briefly around a month ago, and my understanding was that the core requirement is to be able to provide a user experience where someone who does not understand the original language audio, and cannot read the visual representation of the (audio) spoken words translated into a language they do understand, instead gets to hear an audio representation of that translation text, co-timed with the audio. This therefore makes that content accessible. This practice is in use in some countries already; I remember hearing it in use in the Netherlands many years ago.

I agree with @skynavga that there are likely to be parts of the big picture requirement that we cannot handle in TTWG, for example optical character recognition of burnt-in translation subtitle text, which does not seem to be within our scope. However, there are other parts of this requirement that may need some modification to TTML or IMSC. For example, right now we can specify the language of text, and the timing, and whether or not the presentation of that text should be "forced", so one solution might be to recommend in implementations that all forced subtitles/captions are made available to a screen reader, which can be done. However, another might be to add richer data, i.e. to label the text using a …

Another side to this is what the user experience should be, which may or may not be in scope of the TTWG's work. For example, should we recommend that implementations provide options for presenting translations (however they have been identified) in vision only, in audio only, or in both? Should the default audio renderer be a screen reader, or something else?

As mentioned before, TTWG will need to make a call on which of these are in scope and achievable, and which are not. And it may be that no change is needed in the TTML or IMSC specifications at all.
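A minimal sketch of the existing signalling mentioned here, assuming plain TTML: `xml:lang` labels the language of the translated text and `begin`/`end` give the co-timing. The languages, times, and text are invented for illustration:

```xml
<!-- Sketch: translated subtitle text labelled with its language and
     co-timed with the programme audio; an implementation could hand
     such text to a screen reader. Content is illustrative only. -->
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <!-- The programme audio is (say) Dutch; this paragraph is the
           English translation of what is spoken from 5s to 9s. -->
      <p begin="5s" end="9s" xml:lang="en">I will meet you at the station.</p>
    </div>
  </body>
</tt>
```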
Dear Nigel, please go ahead and edit the opening comment at the top of the issue to match what I added in #13 (comment). Thank you.
Just to be clear, I am speaking with my editor hat on, not simply as a member.
@porero Thanks, I've done that.
@porero Thanks for submitting this important request. I very much like how you added a real-world example for this use case. I support this request, and I think that two additions to TTML, IMSC and/or a profile are needed:

a) adding syntax that expresses the desired behaviour
b) …

I also think that this is not a niche requirement but a requirement that supports an important accessibility service. This should be taken into consideration when there is a discussion about whether this is in scope.
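To make the kind of syntax addition being suggested concrete, here is a purely hypothetical sketch. Nothing like this exists in TTML or IMSC: the `ex:spokenSubtitle` attribute and its namespace are invented solely for illustration (TTML permits foreign-namespace attributes, so such an experiment would at least be well-formed):

```xml
<!-- Hypothetical only: an invented attribute marking a paragraph as a
     translation that should also be voiced. Not part of any spec. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ex="http://example.org/ns/spoken-subtitle"
    xml:lang="en">
  <body>
    <div>
      <p begin="5s" end="9s" ex:spokenSubtitle="true">
        Translated text to be both displayed and spoken.
      </p>
    </div>
  </body>
</tt>
```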
@porero @TairT this thread mentions three different possible high-level requirements, as far as I can tell:

1. converting burnt-in (image) or spoken (audio) source content into text;
2. automatically translating that text into another language;
3. presenting translated text track content as audio, e.g., via text to speech.
While the first two of these are interesting research projects, they are clearly out of scope for TTML/IMSC. The third is also interesting, as part of an application environment that makes use of TTML/IMSC, but here I don't see a specific ask that would lead to the possibility of, say, "adding syntax". What I would need to see in order to proceed (in any fashion at all on this request) is an actual implementation of a real-world system that uses language translation on existing text track content, from which specific proposals might appear that would suggest adding any specific syntax. As it is, the existing metadata (and general language extensibility) support in TTML/IMSC already supports this last (of the three) applications, so I again conclude that no requirement for a new syntax or feature is being proposed here.
@skynavga My reading of this is that there may be gaps in the signalling aspect: when subtitles are a translation vs when they are in the base language, and how to trigger the desired presentation behaviour. That is, there may (or may not) be syntactic and semantic gaps relating to this in our specs. I agree that content processing tasks like speech or image to text, or automated translation, are out of scope of the document formats we are chartered to work on in TTWG, except insofar as it should be possible to express the output of those tasks in a TTML document.
@nigelmegitt It is not the design intent of TTML to define or employ semantic markup, at least beyond what is currently supported by ….

As the TTML metadata and markup systems are extensible on a per-application basis, TTML already supports any and all markup that might be desired by a specific application. The present proposal (this issue) does not make reference to any specific application and does not give any hint of what markup they are seeking, let alone how such markup may have any presentation semantics for TTML. As such, I see this issue as non-actionable, and certainly not suggestive of any new features, either semantic or syntactic. If the proposer develops a specific application and brings to our attention a requirement for specific semantic markup, then they are free to do so in the future.
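As one reading of the per-application extensibility being described, a minimal sketch using TTML's `metadata` element to carry foreign-namespace data; the `example.org` vocabulary and all content are invented for illustration:

```xml
<!-- Sketch of per-application extensibility: TTML's metadata element
     carrying foreign-namespace data that an application defines for
     itself. The example.org vocabulary is invented for illustration. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ex="http://example.org/ns/myapp"
    xml:lang="en">
  <body>
    <div>
      <p begin="5s" end="9s">
        <metadata>
          <!-- Application-defined marker: this text translates Dutch audio. -->
          <ex:translationOf lang="nl"/>
        </metadata>
        I will meet you at the station.
      </p>
    </div>
  </body>
</tt>
```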
This was picked up by TTWG today. The discussion was not fully recorded in the minutes due to the IRC server going down and coming back again mid-way through, but some parts are available at https://www.w3.org/2019/01/31-tt-minutes.html#item08

SUMMARY: We think the requirement here is to signal translations, and describe (potential) workflows for triggering TTS based on translations.
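One possible shape for that combination, as a sketch only: translation signalled by `xml:lang` plus a TTS trigger via TTML2's `tta:speak`, on the assumption that a processor voices content so marked. The timing, text, and choice of attributes are illustrative, not a worked-out proposal:

```xml
<!-- Sketch of a possible workflow input: translated text labelled by
     language and marked for speech synthesis via TTML2 audio styling.
     All values here are illustrative only. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tta="http://www.w3.org/ns/ttml#audio"
    xml:lang="nl">
  <body>
    <div>
      <!-- Programme audio is Dutch; this English translation is both
           displayed and handed to a speech synthesizer. -->
      <p begin="5s" end="9s" xml:lang="en" tta:speak="normal">
        I will meet you at the station.
      </p>
    </div>
  </body>
</tt>
```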
Couldn't the forced narrative track be used, since it includes timed text intended for all viewers of a particular language?
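For reference, a minimal sketch of how IMSC marks forced content, assuming IMSC's `itts:forcedDisplay` attribute in the `http://www.w3.org/ns/ttml/profile/imsc1#styling` namespace; the text and timing are invented for illustration:

```xml
<!-- Sketch: a "forced" subtitle in IMSC, intended to be shown even when
     subtitles are otherwise switched off. Content is illustrative. -->
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:itts="http://www.w3.org/ns/ttml/profile/imsc1#styling"
    xml:lang="en">
  <body>
    <div>
      <p begin="5s" end="9s" itts:forcedDisplay="true">
        [Sign reads: "Central Station"]
      </p>
    </div>
  </body>
</tt>
```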
My position is:

…
The Timed Text Working Group just discussed Spoken subtitle.

The full IRC log of that discussion:
<nigel> Topic: Spoken subtitle tt-reqs#13
<nigel> github: https://github.com//issues/13
<nigel> Glenn: Nigel have you managed to contact the issue raiser on this?
<nigel> Nigel: No, sorry, thanks for the reminder, I need to follow up with Pilar.
Is your feature request related to a problem? Please describe.
When media content is spoken in a language different from the viewer's and is subtitled, people who cannot read the subtitles do not have access to the content, because what they hear is in a language they do not understand.
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
State if you intend for this requirement to be met by a particular specification
Does this requirement represent a change in scope
Additional context
Use cases:

…