Distinguishing between cues for visually impaired users and cues for hearing impaired users #488
Comments
Both HTML and VTT-in-MP4 allow use of a 'kind' designation, and this would seem to be in scope for that attribute. Could it work? https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track |
Time to understand the difference between subtitles and closed captions, I admit it! So, the first case is a subtitle and the second one is a closed caption? However, what about mixing both kinds in one VTT file this way? |
Visually impaired users are often not helped by text on the screen. There's a specific class of accessibility content called audio descriptions. |
Yes, though I have heard of some braille readers being able to present timed text; I don't remember when or where. |
Generally, the difference between subtitles and closed captions is that subtitles are just for speech, whereas closed captions include other auditory cues. Your second example of mumbling would generally occur in closed captions but not in subtitles. A description There's an example plugin for Video.js that finds description tracks and uses Text to Speech in the browser to read them aloud: https://www.ca11y.com/videojs-speak-descriptions-track/ https://github.com/OwenEdwards/videojs-speak-descriptions-track |
If I've understood the issue right, this is a request to be able to tag individual cues within the same file, so they have their own "kind". |
Now I hope I see correctly:
However, the problem I'm trying to solve here, at least as it looks to me, is that probably no one in the industry uses two or three kinds of VTT files for each language. Maybe it's SRT's fault, as it's the dominant format in the industry despite being less mature than VTT. But in practice, managing and editing one VTT (or SRT) file per language is much more practical and easier than managing and editing three or four per language. Even Cue-3
00:00:23.450 --> 00:00:27.000
## Scene 1: Introduction Markdown in VTT! (Thinking of opening a new issue for this! 🤔) |
In practice, most publishers link many VTT files to a video so a user can choose the one most appropriate for them. Even for authoring, it's much easier to manage separate files than to manage cues of different types within one file. You may be hand-authoring your file and thus think it's easiest to have everything in one file, but that's not scalable. Most VTT files go through code pipelines, and authoring happens in an authoring application that stores the cues in a database and then creates the VTT files from that. Even if you are hand-authoring the VTT file, you need to optimise the result for your users, so they can choose what they need from a list once, not for every individual cue. |
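The pipeline described above (cues stored once, one VTT file generated per track kind) could be sketched roughly as follows. This is only an illustration: the `Cue` record, its `kinds` field, and the `render_vtt` and `split_by_kind` helpers are hypothetical names, not part of WebVTT or of any tool mentioned in this thread. Speech cues are tagged for both subtitles and captions, since captions carry speech plus other auditory cues.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cue:
    start: str    # WebVTT timestamp, e.g. "00:00:01.000"
    end: str
    text: str
    kinds: tuple  # hypothetical authoring-side tag: track kinds this cue belongs to


def render_vtt(cues):
    """Serialize a list of cues into the text of one WebVTT file."""
    blocks = ["WEBVTT"]
    for cue in cues:
        blocks.append(f"{cue.start} --> {cue.end}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def split_by_kind(cues):
    """Return {kind: vtt_text}, one WebVTT document per track kind."""
    grouped = defaultdict(list)
    for cue in cues:
        for kind in cue.kinds:
            grouped[kind].append(cue)
    return {kind: render_vtt(items) for kind, items in grouped.items()}


# A single "master" cue list, as an authoring tool might store it.
master = [
    Cue("00:00:01.000", "00:00:03.000", "Hello there.", ("subtitles", "captions")),
    Cue("00:00:03.500", "00:00:05.000", "[door slams]", ("captions",)),
    Cue("00:00:05.500", "00:00:08.000", "A woman enters the room.", ("descriptions",)),
]

tracks = split_by_kind(master)
for kind, text in tracks.items():
    print(f"--- {kind}.vtt ---")
    print(text)
```

Each resulting document could then be linked as its own track with the matching 'kind', so the user picks a track once rather than filtering individual cues.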
Seems like the actual issue here after all was my lack of information about vtt. 😳 So, thank you all for your time and help. |
@anasram First, thanks so much for bringing your requirements to this public issue tracker. It is so important that users like you bring in their ideas and requests from daily operation. Only through you can standards people see whether a specification stands up to the real world or needs an update. It is also never expected that you know all the details of the standards. In the end, only the people who wrote the standard or contributed to it have this detailed knowledge. And even then, after some time, standards people also need to dig longer for an answer. The issue you brought up is, from my perspective, a valid use case, and I have also encountered it when speaking with other stakeholders. One question is related to what @silviapfeiffer commented:
Do you have a subtitle and caption base that is authored for multiple purposes in advance, or does it contain only the essential information, with other audience- or distribution-channel-specific metadata added dynamically before playing it out? There is the case where you keep all the information in one file. Although subtitle standards are often not made for that, users have, for example, multiple languages in one "master file". For the general task of annotating cues, the other question is whether you can already do it by assigning classes through span markup, e.g.
You can then use arbitrary class names to tag not only the complete cue but also part of the cue. You can also pre-process the file based on the "tags". So, I agree with @nigelmegitt that it is about tagging (or annotation), but I would extend this beyond the question of "kind" and complete cues. Of course, this only works in an environment where you agree on the semantics of specific tags. But this is maybe not an issue for a technical standard, and you need more flexible ways to negotiate this anyway. Classes are obviously used in WebVTT for assigning properties of a CSS pseudo-element. But I think it is no error to use class names for annotation as well. Of course, you need to be sure not to unintentionally apply the CSS of a cue pseudo-element with a selector like for example |
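As a rough sketch of the pre-processing idea, the following filter removes WebVTT class spans (`<c.classname>...</c>`) whose class is in an excluded set and drops any cue left with no text. The class name `sound`, the `filter_vtt` helper, and the sample file are assumptions made for illustration; the sketch also assumes class spans are not nested and that blocks are separated by a single blank line.

```python
import re

# Matches a WebVTT class span such as <c.sound>...</c> or <c.sound.loud>...</c>.
# Assumption: class spans are not nested inside each other.
CLASS_SPAN = re.compile(r"<c((?:\.[^\s.>]+)+)>(.*?)</c>", re.DOTALL)


def filter_vtt(vtt_text, excluded):
    """Remove spans tagged with any class in `excluded`; drop cues left empty."""
    excluded = set(excluded)

    def keep_or_drop(match):
        classes = set(match.group(1).split(".")[1:])
        return "" if classes & excluded else match.group(0)

    kept = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.split("\n")
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            kept.append(block)  # header, NOTE or STYLE block: pass through unchanged
            continue
        payload = CLASS_SPAN.sub(keep_or_drop, "\n".join(lines[timing + 1:])).strip()
        if payload:
            kept.append("\n".join(lines[: timing + 1] + [payload]))
    return "\n\n".join(kept) + "\n"


# Hypothetical sample: "sound" marks non-speech auditory cues.
sample = """WEBVTT

00:00:01.000 --> 00:00:03.000
Hello there.

00:00:03.500 --> 00:00:05.000
<c.sound>[door slams]</c>

00:00:05.500 --> 00:00:08.000
<c.sound>[sighs]</c> Fine, I'll go.
"""

result = filter_vtt(sample, {"sound"})
print(result)
```

Because the tags are ordinary class names, a filter like this only works where authors and tools have agreed on what each class means, as noted above.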
Thank you @TairT for your kindness and interest.
I've actually been working on this: https://libreplanet.org/wiki/Group:FSF/User_Shoetool_Video_Translation/ar The original movie is here: https://www.fsf.org/blogs/community/presenting-shoetool-happy-holidays-from-the-fsf
And with a CSS rule like
... it couldn't be considered a standard solution. BTW, does the MP4 container distinguish between those kinds of VTT tracks? It seems to me that this part of the VTT standard is not implemented in the MP4 standard or in related tools like ffmpeg. I only noticed this when I tried to split ShoeTool's VTT file into three kinds and merge them into the original MP4 file. |
I'm not sure what you're asking here, as you are clearly aware of 14496-32, packing text tracks into MP4 files. There could easily be something extra needing said; what are you looking for? |
Of course I can pack text tracks into an MP4 file, but the resulting MP4 treats all of those text tracks as "subtitles", even if the VTT header says Is this a limitation in:
I'm simply looking for a way to produce and test a video that encloses different kinds of VTT tracks. |
ISOBMFF allows putting the 'kind' into the track as user data to elevate its visibility. I'm not sure whether people do, though. |
Hi all!
Picture this in a movie:
As you can see:
The question is: shouldn't those two cases be distinguished from each other? What do you think of using some tags for this purpose? Something like this:
Or even using emoji, like this:
Such tags would be useful for filtering cues according to users' preferences and requirements.