Live captioning - incremental cues review #320

silviapfeiffer opened this issue Oct 14, 2016 · 3 comments


@silviapfeiffer silviapfeiffer commented Oct 14, 2016

To address REQ2 of #318, we are after an extension of the WebVTT file format.

The principal idea is that we map the TextTrack API calls from #319 to how we would archive them in a WebVTT file to replicate the functionality.

The approach we try out here is to use the 00:00:00 cue timestamps as the means to separate cues into smaller incremental cues. <now()> will be a cue timestamp with the now() time.

| Ref | 608 control commands | TextTrackCue API calls | WebVTT file addition |
| --- | --- | --- | --- |
| 1 | start caption text / resume caption text / resume direct captioning | `new VTTCue(now(), NULL, '')` - make sure to set the defaults as required by 608 - then: `textTrack.addCue(cue)` | `now() --> NULL` |
| 2 | add a character | `cue.text += char` | `<now()> char` |
| 3 | next row down toggle (includes end all style) | `cue.text += '\n'` (may need to end `</c>`, `</i>`, `</b>`, `</u>`) | `<now()> \n` |
| 4 | row indicator (one of 15 rows) | `cue.line = row` (whichever row calculated) | `<now()> <set line=row>` |
| 5 | underline toggle | `cue.text += "<u>"` or `cue.text += "</u>"` (need to keep track of toggle state) | `<now()> <u>` or `<now()> </u>` |
| 6 | style change (one of 7 text colors and italics) | `cue.text += "<c.white>"` (need to have the color style classes pre-defined) and `cue.text += "<i>"` | `<now()> <c.white> <i>` etc. |
| 7 | 8 indent positions | `cue.position = offset` (whichever offset calculated from indent pos) | `<now()> <set position=offset>` |
| 8 | 8 background colors | `cue.text += "<c.bg_white>"` (need to have the background color style classes pre-defined) | `<now()> <c.bg_white>` |
| 9 | backspace | `cue.text = cue.text.substr(0, cue.text.length - 1)` | `<now()> <set substr=(0,-1)>` |
| 10 | delete till end of row | `cue.text = cue.text.substr(0, cursor_pos)` (need to keep track of the 608 cursor position) | `<now()> <set substr=(0, cursor_pos)>` |
| 11 | rollup caption with 2, 3 or 4 rows | `new VTTRegion()` then `region.lines = x` | N/A - make sure any required regions have been defined in the header |
| 12 | flash on (srlsy?) | `cue.text += "<c.blink>"` | `<now()> <c.blink>` |
| 13 | erase displayed memory (clear screen) | `cue.text = ''` | `<now()> <set substr=(0,0)>` |
| 14 | carriage return (scroll lines up) | `cue.endTime = now(); cue.region = region; new VTTCue();` | `<now()> <set region=ID> \n\n now() --> NULL` |
| 15 | end of caption | `cue.endTime = now()` | `<now()> <set endTime=now()>` |
| 16 | clear screen (erase display memory) | `cue.text = ''` | `<now()> <set substr=(0,0)>` |
| 17 | tab offset 1/2/3 (add whitespace) | `cue.text += " " * num_space` (calculate numspace from tab offset as per ) | `<now()>` (add required number of spaces) |

Looks like what we need is a way to change cue settings and cut cue length half-way through cues, as well as an undefined end time that can be set at a later stage.
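
To make the API column of the table a bit more concrete, here is a minimal sketch of how a handful of the 608 commands could be applied to a cue. It is only an illustration under assumptions: the cue is a plain object standing in for a `VTTCue` so the snippet runs outside a browser, `endTime: null` stands in for the "undefined end time", and the command names (`char`, `newline`, `row`, `indent`, `backspace`, `clear`, `end`) are made up for this example and not part of any spec.

```javascript
// Hypothetical stand-in for a VTTCue so the sketch runs outside a browser.
// endTime stays null while the cue is still open (the undefined end time).
function startCaption(now) {
  return { startTime: now, endTime: null, text: "", line: "auto", position: "auto" };
}

// Apply one 608-style command to the cue, following the API column of the table.
// The command names are illustrative assumptions, not taken from 608 or WebVTT.
function applyCommand(cue, cmd, now) {
  switch (cmd.type) {
    case "char":      // ref 2: add a character
      cue.text += cmd.value;
      break;
    case "newline":   // ref 3: next row down
      cue.text += "\n";
      break;
    case "row":       // ref 4: row indicator
      cue.line = cmd.value;
      break;
    case "indent":    // ref 7: indent position
      cue.position = cmd.value;
      break;
    case "backspace": // ref 9: drop the last character
      cue.text = cue.text.slice(0, -1);
      break;
    case "clear":     // refs 13 and 16: erase displayed memory
      cue.text = "";
      break;
    case "end":       // ref 15: end of caption, the end time becomes known
      cue.endTime = now;
      break;
    default:
      throw new Error("unknown command: " + cmd.type);
  }
  return cue;
}
```

In a browser the same calls would be made against a real `VTTCue` added to a `TextTrack`; the style toggles (refs 5, 6, 8, 12) are left out here because they need the toggle-state tracking mentioned in the table.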


Member Author

@silviapfeiffer silviapfeiffer commented Oct 4, 2017

goal from FOMS: add test cases


Member Author

@silviapfeiffer silviapfeiffer commented Oct 4, 2017

discussion at FOMS: separate between two use cases:

1/ live broadcasting

This has near-realtime requirements and focuses on the use of WebVTT with MSE/HLS.
Main requirement there is to allow undefined end times and update them on cues.

2/ real-time video/audio communication (also called Realtime Captioning RTC)

This has realtime requirements with an ability to support the "editing"-type functionality of 608/708.
This use case motivates the spec in this bug.
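
For the live broadcasting case, the "undefined end time that gets updated later" requirement could look roughly like the sketch below. This is an assumption-laden illustration, not a proposal for the actual API: `OpenCueTracker` is a hypothetical helper, cues are plain objects, and `Infinity` is used as the placeholder for an end time that is not yet known.

```javascript
// Hypothetical tracker for cues whose end time is not known when they start.
class OpenCueTracker {
  constructor() {
    this.cues = new Map();
  }

  // Start a cue with an open end; Infinity is an assumed placeholder value.
  open(id, startTime, text) {
    const cue = { id, startTime, endTime: Infinity, text };
    this.cues.set(id, cue);
    return cue;
  }

  // Fill in the end time once it becomes known.
  close(id, endTime) {
    const cue = this.cues.get(id);
    if (!cue) throw new Error(`unknown cue: ${id}`);
    if (endTime < cue.startTime) throw new Error("end time precedes start time");
    cue.endTime = endTime;
    return cue;
  }

  // Cues that should be displayed at the given media time.
  activeAt(time) {
    return [...this.cues.values()].filter(
      (c) => c.startTime <= time && time < c.endTime
    );
  }
}
```

An open cue stays active for every time after its start until `close()` is called, which is the behaviour a player would need while the end of a live caption has not arrived yet.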



@dwsinger dwsinger commented Oct 4, 2017

I think we may need VTT file-level support for the concept "I am updating this cue" so that clients can work out what's new/changed.
One case is incremental builds, e.g. after speech recognition:
I think
I think we may
I think we may need
I think we may need incrementally

and so on. Another case is where a cue has to be sent immediately but can then be edited and fixed.

I would very much like to understand best current practices in captioning of video telephony and conferences (if any).
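
As a rough illustration of the incremental build case above, the following sketch turns a series of recognizer updates into a single cue payload using the existing WebVTT cue timestamp tags (`<hh:mm:ss.ttt>`). The function names and the shape of the `updates` array are assumptions made for this example, and it only handles updates that extend the previous text; the "sent immediately, then edited and fixed" case is not covered and would need something like the file-level update support discussed here.

```javascript
// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const t = ms % 1000;
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(t, 3)}`;
}

// Each update carries the full text recognized so far and the time it arrived.
// Assumes every update extends the previous one (no corrections).
function buildIncrementalPayload(updates) {
  let previous = "";
  let payload = "";
  updates.forEach((update, i) => {
    if (!update.text.startsWith(previous)) {
      throw new Error("update does not extend the previous text");
    }
    const delta = update.text.slice(previous.length);
    // The first chunk is shown from the cue start, so it needs no timestamp tag.
    payload += i === 0 ? delta : `<${formatTimestamp(update.time)}>${delta}`;
    previous = update.text;
  });
  return payload;
}
```

Feeding it "I think", "I think we may", "I think we may need" with their arrival times yields one payload in which each new fragment is preceded by a timestamp tag, so a client can reveal the text progressively.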
