
Distinguishing between cues for visually impaired users and cues for hearing impaired users #488

Closed
anasram opened this issue Aug 24, 2020 · 14 comments

Comments

@anasram

anasram commented Aug 24, 2020

Hi all!

Picture this in a movie:

  • A scene with a textual banner; e.g. "XYZ Shop".
    • You probably write a cue for it like this: ["XYZ Shop"].
  • A scene with non-lingual sounds (music instruments, someone mumbling… etc).
    • You may write a cue for it like this: [Soft music] or [Mumbling].

As you can see:

  • The first case is useful for visually impaired users, and could also be useful, when translated, for speakers of other languages.
  • The second one is useful for hearing impaired users.

The question is: shouldn't those two cases be distinguished from each other? What do you think of using some tags for this purpose? Something like this:

  • [::text::]["XYZ Shop"]
  • [::sound::][Soft music], [::sound::][Mumbling]

Or even using emoji, like this:

  • [🖹]["XYZ Shop"]
  • [🔊][Soft music], [🔊][Mumbling]

Such tags would be useful for filtering cues according to the user's preferences and requirements.
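To make the idea more concrete, here is a minimal sketch (TypeScript, with a made-up cue shape; the [::sound::]/[::text::] prefixes are only the proposal above, not anything WebVTT defines) of the kind of filtering such tags would enable:

// Minimal sketch of the proposed filtering; the prefixes are hypothetical.
type TaggedCue = { start: number; end: number; text: string };

function filterCues(cues: TaggedCue[], wanted: "sound" | "text"): TaggedCue[] {
  const prefix = `[::${wanted}::]`;
  return cues
    .filter((cue) => cue.text.startsWith(prefix))
    .map((cue) => ({ ...cue, text: cue.text.slice(prefix.length) }));
}

// A hard-of-hearing user keeps only the sound cues:
const sample: TaggedCue[] = [
  { start: 1, end: 3, text: '[::text::]["XYZ Shop"]' },
  { start: 4, end: 6, text: "[::sound::][Soft music]" },
];
console.log(filterCues(sample, "sound")); // -> only the [Soft music] cue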

@dwsinger

Both HTML and VTT-in-MP4 allow the use of a 'kind' designation, and this would seem in scope for that attribute. Could it work? https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track
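A minimal sketch of what that looks like on the HTML side (the file names are placeholders): one VTT file per kind, each attached with its own kind, so the player's track menu does the filtering for the user.

// Sketch only: attach one <track> per kind to an existing <video> element.
const video = document.querySelector("video")!;

const trackDefs: Array<{ kind: TextTrackKind; src: string; srclang: string; label: string }> = [
  { kind: "captions", src: "movie.captions.en.vtt", srclang: "en", label: "English captions" },
  { kind: "descriptions", src: "movie.descriptions.en.vtt", srclang: "en", label: "English descriptions" },
  { kind: "subtitles", src: "movie.subtitles.ar.vtt", srclang: "ar", label: "Arabic subtitles" },
];

for (const def of trackDefs) {
  const track = document.createElement("track");
  track.kind = def.kind;
  track.src = def.src;
  track.srclang = def.srclang;
  track.label = def.label;
  video.appendChild(track);
}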

@anasram
Author

anasram commented Aug 24, 2020

Time to understand the difference between subtitles and closed captions, I admit it!

So, the first case is a subtitle and the second one is a cc?

However, what about mixing both kinds in one vtt file this way?

@silviapfeiffer
Member

Visually impaired users are often not helped by text on the screen. There's a specific class of accessibility content called audio descriptions.
We've talked a lot about how to turn a WebVTT track into an audio description using speech synthesis, and examples exist.

@dwsinger

Yes, though I have heard of some braille readers being able to present timed text; I don't remember when or where.

@gkatsev
Collaborator

gkatsev commented Aug 24, 2020

Generally, the difference between subtitles and closed captions is that subtitles are just for speech, whereas closed captions also include other auditory cues. Your second example of mumbling would generally occur in closed captions but not in subtitles.
Also, worth noting that Closed Captions is a fairly US-centric term. A lot of other countries refer to these as Subtitles for the Hard of Hearing.

A descriptions kind is likely what's wanted for writing out visible things, as Silvia mentioned.

There's an example plugin for Video.js that finds description tracks and uses Text to Speech in the browser to read them aloud: https://www.ca11y.com/videojs-speak-descriptions-track/ https://github.com/OwenEdwards/videojs-speak-descriptions-track
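Not the plugin's actual code, but a bare-bones sketch of the same idea using the Web Speech API: watch a kind="descriptions" track and speak each cue as it becomes active.

// Bare-bones sketch (not the plugin's code): speak descriptions cues aloud.
const video = document.querySelector("video")!;
const descriptions = Array.from(video.textTracks).find((t) => t.kind === "descriptions");

if (descriptions) {
  descriptions.mode = "hidden"; // cues still fire, but are not rendered on screen
  descriptions.addEventListener("cuechange", () => {
    const active = descriptions.activeCues;
    if (!active) return;
    for (let i = 0; i < active.length; i++) {
      const cue = active[i] as VTTCue;
      window.speechSynthesis.speak(new SpeechSynthesisUtterance(cue.text));
    }
  });
}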

@nigelmegitt
Contributor

If I've understood the issue right, this is a request to be able to tag individual cues within the same file, so they have their own "kind".

@anasram
Author

anasram commented Aug 25, 2020

Now I hope I'm seeing it correctly:

  • descriptions: [🖹]["XYZ Shop"]
  • captions: [🔊][Soft music], [🔊][Mumbling]

However, the problem I'm trying to solve here, as it looks to me at least, is that probably no one in the industry uses two or three kinds of VTT files for each language. Maybe it's SRT's fault, as it's the dominant format in the industry despite not being as mature as the VTT format.

But in practice, managing and editing one VTT (or SRT) file for each language is much more practical and easier than managing and editing three or four of them per language.

Even the chapters VTT file could be merged too. Think of something like this:

Cue-3
00:00:23.450 --> 00:00:27.000
## Scene 1: Introduction

Markdown in VTT!

(Thinking of opening a new issue for this! 🤔)

@silviapfeiffer
Member

In practice, most publishers link many vtt files to a video so a user can choose the one that is most appropriate for them.

Even for authoring, it's much easier to manage the files than to manage cues with different types within a file.

You may be hand authoring your file and thus think it's the easiest to have everything in one file. But that's not scalable. Most vtt files go through code pipelines and authoring happens with an authoring application that stores the cues in a database and then creates the vtt files from that.

Even if you are hand authoring the vtt file, you need to optimise your result for your users, so they are able to choose what they need from a list once, not for every individual cue.

@anasram
Author

anasram commented Aug 29, 2020

Seems like the actual issue here after all was my lack of information about vtt. 😳

So, thank you all for your time and help.

@anasram anasram closed this as completed Aug 29, 2020
@andreastai

@anasram First, thanks so much for bringing your requirements to this public issue tracker. It is so important that users like you bring in their ideas and requests from daily operation. Only through you can standards people see whether a specification stands up to the real world or needs an update. It is also never expected that you know all the details of the standards. In the end, only the people who wrote or contributed to the standard have that detailed knowledge. And even then, after some time, standards people too need to dig for a while to find an answer.

The issue you brought up is, from my perspective, a valid use case, and I have also encountered it when speaking with other stakeholders. One question relates to what @silviapfeiffer commented:

You may be hand authoring your file and thus think it's the easiest to have everything in one file. But that's not scalable. Most vtt files go through code pipelines and authoring happens with an authoring application that stores the cues in a database and then creates the vtt files from that.

Do you have a subtitle and caption base that is authored for multiple purposes in advance, or does it contain only the essential information, with other distribution audience/channel-specific metadata added dynamically before playout?

There is the case where you keep all the information in one file. Although subtitle standards are often not made for that, users have, for example, multiple languages in one "master file".

For the general task of annotating cues, the other question is whether you can do it already by assigning classes through span markup, e.g.

00:00:00.000 --> 00:00:02.000
<c.music>[Soft music]</c> Hello world!

You can then use arbitrary class names to tag not only the complete cue but also part of a cue. You can also pre-process the file based on those "tags". So, I agree with @nigelmegitt that it is about tagging (or annotation), but I would extend this beyond the question of "kind" and complete cues.
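For example, a rough sketch of such pre-processing (plain text handling, not a real WebVTT parser) that drops every cue carrying a given class before the file is published:

// Rough sketch: remove cues tagged with a given class (e.g. "music") from a
// WebVTT string. Plain text handling only, not a real WebVTT parser.
function stripTaggedCues(vtt: string, className: string): string {
  const blocks = vtt.split(/\r?\n\r?\n/); // header block, then one block per cue
  const tag = new RegExp(`<c[\\w.]*\\.${className}\\b`);
  return blocks.filter((block) => !tag.test(block)).join("\n\n");
}

const source = `WEBVTT

00:00:00.000 --> 00:00:02.000
<c.music>[Soft music]</c> Hello world!

00:00:02.000 --> 00:00:04.000
Plain dialogue stays in.`;

console.log(stripTaggedCues(source, "music"));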

Of course, this only works in an environment where you agree on the semantics of specific tags. But this is maybe not an issue for a technical standard, and you need more flexible ways to negotiate this anyway.

Classes are obviously used in WebVTT for assigning properties of a CSS pseudo-element. But I think it is no error to use class names for annotation as well. Of course, you need to be sure not to unintentionally apply CSS through a cue pseudo-element whose selector matches a class like music.

@anasram
Author

anasram commented Sep 1, 2020

Thank you @TairT for your kindness and interest.

Do you have a subtitle and caption base that is authored for multiple purposes in advance, or does it contain only the essential information, with other distribution audience/channel-specific metadata added dynamically before playout?

I've been actually working on this:

https://libreplanet.org/wiki/Group:FSF/User_Shoetool_Video_Translation/ar

The original movie is here:

https://www.fsf.org/blogs/community/presenting-shoetool-happy-holidays-from-the-fsf


Classes are obviously used in WebVTT for assigning properties of a CSS pseudo-element.

And with a CSS rule like {display: none}, we'll have a very flexible solution. But since you need to

use arbitrary class names

... it can't be considered a standard solution.

BTW, does the MP4 container distinguish between those kinds of VTTs? It seems to me this part of the VTT standard is not implemented in the MP4 standard or in related tools like ffmpeg. I noticed this when I tried to split ShoeTool's VTT file into 3 kinds and merge them into the original MP4 file.

@dwsinger

dwsinger commented Sep 1, 2020

BTW, does the MP4 container distinguish between those kinds of VTTs? It seems to me this part of the VTT standard is not implemented in the MP4 standard or in related tools like ffmpeg. I noticed this when I tried to split ShoeTool's VTT file into 3 kinds and merge them into the original MP4 file.

I'm not sure what you're asking here, as you are clearly aware of 14496-30, packing text tracks into MP4 files. There could easily be something extra that needs saying; what are you looking for?

@anasram
Author

anasram commented Sep 2, 2020

Of course I can pack text tracks into an MP4 file, but the resulting MP4 treats all of those text tracks as "subtitles", even if the VTT header says kind: captions or kind: descriptions, for example.

Is this a limitation in:

  • MP4 standard?
  • or the player I use (Kaffeine, XPlayer, VLC... etc)
  • or the tool I use to pack text tracks into the video file (ffmpeg)?

Simply put, I'm now looking for a way to produce and test a video that contains different kinds of VTT tracks.
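For now, the best I can think of for testing at least the browser side is something like this (assuming the browser exposes the tracks, including in-band MP4 text tracks where supported, through video.textTracks):

// Sketch: list whatever track kinds the player exposes after loading the file.
const video = document.querySelector("video")!;
video.addEventListener("loadedmetadata", () => {
  for (const track of Array.from(video.textTracks)) {
    console.log(track.kind, track.language, track.label);
  }
});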

@dwsinger

dwsinger commented Sep 2, 2020

The ISOBMFF allows putting the 'kind' into the track as user data, to elevate its visibility. Not sure whether people do that, though.
