<pre class='metadata'>
Title: WebVTT: The Web Video Text Tracks Format
H1: WebVTT: The Web Video Text Tracks Format
Shortname: webvtt1
Status: CG-DRAFT
Group: texttracks
ED: https://w3c.github.io/webvtt/
Level: 1
Editor: Simon Pieters, Opera Software ASA http://www.opera.com/, simonp@opera.com
Editor: Courtney Kennedy, Apple Inc. http://www.apple.com/, ckennedy@apple.com
Former Editor: Silvia Pfeiffer, NICTA http://nicta.com.au/, silviapfeiffer1@gmail.com
Former Editor: Philip Jägenstedt, Opera Software ASA http://www.opera.com/, philipj@opera.com
Former Editor: Ian Hickson, Google http://www.google.com/, ian@hixie.ch
!Participate: <a href=https://github.com/w3c/webvtt>GitHub w3c/webvtt</a> (<a href=https://github.com/w3c/webvtt/issues/new>new issue</a>, <a href=https://github.com/w3c/webvtt/issues>open issues</a>, <a href=https://www.w3.org/Bugs/Public/buglist.cgi?product=TextTracks%20CG&component=WebVTT&resolution=--->legacy open bugs</a>)
!Commits: <a href=https://github.com/w3c/webvtt/commits>GitHub w3c/webvtt/commits</a>
Test Suite: https://github.com/w3c/web-platform-tests/tree/master/webvtt
Abstract: This specification defines WebVTT, the Web Video Text Tracks format. Its main use is for marking up external text track resources in connection with the HTML <track> element.
Abstract: WebVTT files provide captions or subtitles for video content, and also text video descriptions [[MAUR]], chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.
Boilerplate: omit conformance, omit feedback-header
</pre>
<pre class='anchors'>
urlPrefix: https://html.spec.whatwg.org/multipage/
type: dfn
urlPrefix: infrastructure.html
text: ascii digits
text: split a string on spaces
text: skip whitespace
text: alphanumeric ascii characters
text: space character
urlPrefix: embedded-content.html
text: text track kind
text: text track cue
text: text track list of cues
text: text track
text: list of text tracks
text: media element
text: text track mode
text: text track showing
text: rules for updating the text track rendering
text: text track cue active flag
text: text track cue text
text: text track cue display state
text: current playback position
text: text track cue identifier
text: text track cue pause-on-exit flag
text: rules for extracting the chapter title
text: text track cue start time
text: text track cue end time
text: expose a user interface to the user
text: text track cue order
type: element-attr
urlPrefix: dom.html
text: title; url: #attr-title
text: lang; url: #attr-lang
text: class; url: #classes
urlPrefix: https://encoding.spec.whatwg.org/
type: dfn
text: utf-8 decode
urlPrefix: https://heycam.github.io/webidl/
type: exception
text: IndexSizeError
</pre>
<pre class=link-defaults>
spec:dom-ls; type:interface; text:Document
spec:css-ruby-1; type:value; text:ruby-base
spec:css-color-4; type:property; text:color
spec:css-fonts-3; type:property; text:font-style
spec:css-fonts-3; type:property; text:font-weight
spec:css-ruby-1; type:value; text:ruby
spec:css-ruby-1; type:value; text:ruby-text
spec:css21; type:property; text:min-height
spec:css21; type:property; text:max-height
spec:css-flexbox-1; type:value; text:inline-flex
</pre>
<pre class=biblio>
{
"MAUR": {
"authors": [ "Shane McCarron", "Michael Cooper", "Mark Sadecki" ],
"href": "http://www.w3.org/TR/media-accessibility-reqs/",
"title": "Media Accessibility User Requirements",
"status": "WD",
"publisher": "W3C"
}
}
</pre>
<h2 id=introduction>Introduction</h2>
<p><i>This section is non-normative.</i></p>
<p>The <dfn>WebVTT</dfn> (Web Video Text Tracks) format is intended for marking up external text
track resources.</p>
<p>The main use for WebVTT files is captioning or subtitling video content. Here is a sample file
that captions an interview:</p>
<pre>
WEBVTT

00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

00:22.000 --> 00:24.000
<v Roger Bingham>at the AMNH.

00:24.000 --> 00:26.000
<v Roger Bingham>Thank you for walking down here.

00:27.000 --> 00:30.000
<v Roger Bingham>And I want to do a follow-up on the last conversation we did.

00:30.000 --> 00:31.500 align:end size:50%
<v Roger Bingham>When we e-mailed—

00:30.500 --> 00:32.500 align:start size:50%
<v Neil deGrasse Tyson>Didn't we talk about enough in that conversation?

00:32.000 --> 00:35.500 align:end size:50%
<v Roger Bingham>No! No no no no; 'cos 'cos obviously 'cos

00:32.500 --> 00:33.500 align:start size:50%
<v Neil deGrasse Tyson><i>Laughs</i>

00:35.500 --> 00:38.000
<v Roger Bingham>You know I'm so excited my glasses are falling off here.
</pre>
<h3 id=introduction-multiple-lines>Cues with multiple lines</h3>
<p><i>This section is non-normative.</i></p>
<p>Line breaks in cues are honored. User agents will also insert extra line breaks if necessary to
fit the cue in the cue's width. In general, therefore, authors are encouraged to write cues all on
one line except when a line break is definitely necessary.</p>
<div class="example">
<p>These captions on a public service announcement video demonstrate line breaking:</p>
<pre>
WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.

00:10.000 --> 00:14.000
The Organisation for Sample Public Service Announcements accepts no liability for the content of this advertisement, or for the consequences of any actions taken on the basis of the information provided.
</pre>
<p>The first cue is simple; it will probably just display on one line. The second will take two
lines, one for each speaker. The third will wrap to fit the width of the video, possibly taking
multiple lines. For example, the three cues could look like this:</p>
<!-- 50 -->
<pre>
Never drink liquid nitrogen.

— It will perforate your stomach.
— You could die.

The Organisation for Sample Public Service
Announcements accepts no liability for the
content of this advertisement, or for the
consequences of any actions taken on the
basis of the information provided.
</pre>
<p>If the width of the cues is smaller, the first two cues could wrap as well, as in the following
example. Note how the second cue's explicit line break is still honored, however:</p>
<!-- 25 -->
<pre>
Never drink
liquid nitrogen.

— It will perforate
your stomach.
— You could die.

The Organisation for
Sample Public Service
Announcements accepts
no liability for the
content of this
advertisement, or for
the consequences of
any actions taken on
the basis of the
information provided.
</pre>
<p>Also notice how the wrapping is done so as to keep the line lengths balanced.</p>
</div>
<h3 id=introduction-comments>Comments</h3>
<p><i>This section is non-normative.</i></p>
<p>Comments can be included in WebVTT files.</p>
<p>Comments are just blocks that are preceded by a blank line, start with the word
"<code>NOTE</code>" (followed by a space or newline), and end at the first blank line.</p>
<div class="example">
<p>Here, a one-line comment is used to note a possible problem with a cue.</p>
<pre>
WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

NOTE I'm not sure the timing is right on the following cue.

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.
</pre>
</div>
<div class="example">
<p>In this example, the author has written many comments.</p>
<pre>
WEBVTT

NOTE
This file was written by Jill. I hope
you enjoy reading it. Some things to
bear in mind:
- I was lip-reading, so the cues may
not be 100% accurate
- I didn't pay too close attention to
when the cues should start or end.

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

NOTE check next cue

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.

NOTE end of file
</pre>
</div>
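<div class="example">
<p>The comment rule described above can be sketched in a few lines of Python. This is an
illustrative simplification, not the parsing algorithm of this specification: the helper names
<code>split_blocks</code> and <code>is_comment_block</code> are hypothetical, only lines that are
completely empty are treated as blank, and the sample is an abridged version of the file
above.</p>

```python
import re


def split_blocks(text):
    """Split WebVTT text into blocks separated by one or more blank lines."""
    # Treat CRLF and lone CR as line feeds before splitting.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [block for block in re.split(r"\n{2,}", normalized.strip("\n")) if block]


def is_comment_block(block):
    """True if the block starts with NOTE followed by a space or newline.

    A block consisting of the bare word NOTE is also accepted, because
    split_blocks() has already removed the newline that followed it.
    """
    return re.match(r"NOTE(?:[ \n]|$)", block) is not None


SAMPLE = (
    "WEBVTT\n"
    "\n"
    "00:01.000 --> 00:04.000\n"
    "Never drink liquid nitrogen.\n"
    "\n"
    "NOTE check next cue\n"
    "\n"
    "00:05.000 --> 00:09.000\n"
    "You could die.\n"
    "\n"
    "NOTE end of file\n"
)

comments = [block for block in split_blocks(SAMPLE) if is_comment_block(block)]
```

<p>With this sample, <code>comments</code> holds the two <code>NOTE</code> blocks. A block that
begins with a longer word such as "NOTES" is not treated as a comment.</p>
</div>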
<h3 id=introduction-other-features>Other features</h3>
<p><i>This section is non-normative.</i></p>
<p>WebVTT also supports some less-often used features.</p>
<div class="example">
<p>In this example, the cues have an identifier:</p>
<pre>
WEBVTT

1
00:00.000 --> 00:02.000
That's an, an, that's an L!

crédit de transcription
00:04.000 --> 00:05.000
Transcrit par Célestes™
</pre>
<p>This allows a style sheet to specifically target the cues (notice the use of CSS character
escape sequences):</p>
<pre>
::cue(#\31) { color: green; }
::cue(#crédit\ de\ transcription) { color: red; }
</pre>
</div>
<div class="example">
<p>In this example, each cue says who is talking using voice spans. In the first cue, the span
specifying the speaker is also annotated with two classes, "first" and "loud". In the third cue,
there is also some italic text (not associated with a specific speaker). The last cue is annotated
with just the class "loud".</p>
<pre>
WEBVTT

00:00.000 --> 00:02.000
<v.first.loud Esme>It's a blue apple tree!

00:02.000 --> 00:04.000
<v Mary>No way!

00:04.000 --> 00:06.000
<v Esme>Hee!</v> <i>laughter</i>

00:06.000 --> 00:08.000
<v.loud Mary>That's awesome!
</pre>
<p>Notice that as a special exception, the voice spans don't have to be closed if they cover the
entire cue text.</p>
<p>Style sheets can style these spans:</p>
<pre>
::cue(v[voice="Esme"]) { color: blue }
::cue(v[voice="Mary"]) { color: green }
::cue(i) { font-style: italic }
::cue(.loud) { font-size: 2em }
</pre>
</div>
<div class="example">
<p>This example shows how to position cues at explicit positions in the video viewport.</p>
<pre>
WEBVTT

00:00:00.000 --> 00:00:04.000 position:10%,start align:start size:35%
Where did he go?

00:00:03.000 --> 00:00:06.500 position:90% align:end size:35%
I think he went down this lane.

00:00:04.000 --> 00:00:06.500 position:45%,end align:middle size:35%
What are you waiting for?
</pre>
<p>Since the cues in these examples are horizontal, the "position" setting refers to a percentage
of the width of the video viewport. If the text were vertical, the "position" setting would refer
to the height of the viewport.</p>
<p>The "start" or "end" only refers to the physical side of the box to which the "position" setting
applies, in a way which is agnostic regarding the horizontal or vertical direction of the cue. It
does not affect or relate to the direction or position of the text itself within the box.</p>
<p>The cues cover only 35% of the video viewport's width: that is the <a lt="WebVTT cue box">cue
box</a>'s "size" for all three cues.</p>
<p>The first cue has its <a lt="WebVTT cue box">cue box</a> positioned at the 10% mark. The "start"
and "end" within the "position" setting indicate which side of the <a lt="WebVTT cue box">cue
box</a> the position refers to. Since in this case the text is horizontal, "start" refers to the
left side of the box, and the cue box is thus positioned between the 10% and the 45% mark of the
video viewport's width, probably underneath a speaker on the left of the video image. If the cue
were vertical, "start" positioning would be from the top of the video viewport's height and the <a
lt="WebVTT cue box">cue box</a> would cover 35% of the video viewport's height.</p>
<p>The text within the first cue's cue box is aligned using the "align" cue setting. For
left-to-right rendered text, "start" alignment is the left of that box, for right-to-left rendered
text the right of the box. So, independent of the directionality of the text, it will stay
underneath that speaker. Note that "start" alignment of the cue box is the default for start
aligned text, so does not need to be specified in "position".</p>
<p>The second cue has its <a lt="WebVTT cue box">cue box</a> right aligned at the 90% mark of the
video viewport width ("end" aligned text right aligns the box). The same effect can be achieved
with "position:55%,start", which explicitly positions the cue box. The third cue has middle aligned
text within the same positioned cue box as the first cue.</p>
</div>
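<div class="example">
<p>The arithmetic behind the positioning example above can be checked with a small Python sketch.
The function <code>cue_box_extent</code> is a hypothetical helper, not part of this specification;
it assumes horizontal cues and does not model how a user agent adjusts a box that would extend
beyond the viewport. For the second cue, the position alignment is passed explicitly as "end",
which is what "end" aligned text implies according to the description above.</p>

```python
def cue_box_extent(position, position_alignment, size):
    """Return the (left, right) edges of a cue box, in percent of viewport width.

    position           -- the "position" setting, in percent
    position_alignment -- "start", "middle" or "end": which part of the box
                          is anchored at the position
    size               -- the "size" setting, in percent
    """
    if position_alignment == "start":
        left = position
    elif position_alignment == "middle":
        left = position - size / 2
    elif position_alignment == "end":
        left = position - size
    else:
        raise ValueError("unknown position alignment: %r" % (position_alignment,))
    return (left, left + size)


first = cue_box_extent(10, "start", 35)   # position:10%,start size:35%
second = cue_box_extent(90, "end", 35)    # position:90% align:end size:35%
third = cue_box_extent(45, "end", 35)     # position:45%,end size:35%
```

<p>Under these assumptions the first and third cue boxes both span 10% to 45% of the viewport
width, and the second spans 55% to 90%, which is why "position:55%,start" has the same effect for
that cue.</p>
</div>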
<div class="example">
<p>This example shows two regions containing rollup captions for two different speakers. Fred's
cues scroll up in a region in the left half of the video, Bill's cues scroll up in a region on the
right half of the video. Fred's first cue disappears at 12.5sec even though it is defined until
20sec because its region is limited to 3 lines and at 12.5sec a fourth cue appears:</p>
<pre>
WEBVTT
Region: id=fred width=40% lines=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up
Region: id=bill width=40% lines=3 regionanchor=100%,100% viewportanchor=90%,90% scroll=up

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I'm Bill

00:00:05.000 --> 00:00:25.000 region:fred align:left
<v Fred>Would you like to get a coffee?

00:00:07.500 --> 00:00:27.500 region:bill align:right
<v Bill>Sure! I've only had one today.

00:00:10.000 --> 00:00:30.000 region:fred align:left
<v Fred>This is my fourth!

00:00:12.500 --> 00:00:32.500 region:fred align:left
<v Fred>OK, let's go.
</pre>
<p>Note that regions are only defined for horizontal cues.</p>
</div>
<h2 id=conformance>Conformance</h2>
<h3 id=conformance-for-authors>Conformance for authors</h3>
<p>The <a href="#syntax">Syntax</a> section of this specification defines what constitutes a valid
WebVTT document. Authors need to follow the <a href="#syntax">Syntax</a> specification and are
encouraged to use a validator.</p>
<p>The <a href="#parsing">Parsing</a> section of this specification defines in some detail the
required processing for valid and also for invalid documents. It is a little more tolerant of author
errors than the syntax allows, so as to reject fewer documents and provide for extensibility.
However, authors must not take advantage of it. Only documents that follow the <a
href="#syntax">Syntax</a> specification are valid.</p>
<h3 id=unicode-normalization>Unicode normalization</h3>
<p>Implementations of this specification must not normalize Unicode text during processing.</p>
<p>For example, a cue with an identifier consisting of the characters U+0041 LATIN CAPITAL LETTER A
followed by U+030A COMBINING RING ABOVE, or of the single character U+212B ANGSTROM SIGN, will not
match a selector targeting a cue with an ID consisting of the character U+00C5 LATIN CAPITAL LETTER
A WITH RING ABOVE.</p>
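<div class="example">
<p>The effect of this requirement can be illustrated with Python's standard
<code>unicodedata</code> module. The two comparison functions below are hypothetical helpers
written for this sketch. The second one shows what a non-conforming implementation that applied
NFC normalization would report, purely for contrast.</p>

```python
import unicodedata

decomposed = "A\u030A"   # U+0041 followed by U+030A COMBINING RING ABOVE
angstrom = "\u212B"      # U+212B ANGSTROM SIGN
precomposed = "\u00C5"   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE


def ids_match(cue_id, selector_id):
    """Compare identifiers code point by code point, with no normalization."""
    return cue_id == selector_id


def ids_match_if_normalized(cue_id, selector_id):
    """Contrast case: compare after NFC normalization (not permitted here)."""
    def nfc(value):
        return unicodedata.normalize("NFC", value)
    return nfc(cue_id) == nfc(selector_id)
```

<p>All three strings render alike, yet <code>ids_match</code> treats them as different
identifiers, while the normalizing comparison would conflate them.</p>
</div>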
<h3 id=document-conformance>Document conformance</h3>
<p>All diagrams, examples, and notes in this specification are non-normative, as are all sections
explicitly marked non-normative. Everything else in this specification is normative.</p>
<p>The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in RFC2119. The key word "OPTIONALLY" in
the normative parts of this document is to be interpreted with the same normative meaning as "MAY"
and "OPTIONAL". For readability, these words do not appear in all uppercase letters in this
specification. [[!RFC2119]]</p>
<p>Requirements phrased in the imperative as part of algorithms (such as "strip any leading space
characters" or "return false and abort these steps") are to be interpreted with the meaning of the
key word ("must", "should", "may", etc.) used in introducing the algorithm.</p>
<p>Conformance requirements phrased as algorithms or specific steps may be implemented in any
manner, so long as the end result is equivalent. (In particular, the algorithms defined in this
specification are intended to be easy to follow, and not intended to be performant.)</p>
<h2 id=data-model>Data model</h2>
<!-- Add some content here about cues and serialisation format in general -->
<!-- Describe metadata, caption/subtitle, chapter & description cues -->
<h3 id=cues>WebVTT cues</h3>
<p>A <dfn>WebVTT cue</dfn> is a <a>text track cue</a> that additionally consists of the following:
[[!HTML]]</p>
<dl>
<dt><dfn lt="WebVTT cue box">A cue box</dfn></dt>
<dd>
<p>The cue box of a <a>WebVTT cue</a> is a box within which the text of all lines of the cue is to
be rendered.</p>
<p class="note">The position of the <a lt="WebVTT cue box">cue box</a> within the video viewport's
dimensions depends on the value of the <a>WebVTT cue position</a> and the <a>WebVTT cue
line</a>.</p>
<p class="note">Lines are wrapped within the <a lt="WebVTT cue box">cue box</a>'s <a lt="WebVTT
cue size">size</a> if lines' lengths make this necessary.</p>
</dd>
<dt><dfn lt="WebVTT cue writing direction">A writing direction</dfn></dt>
<dd>
<p>A writing direction, either</p>
<ul>
<li><dfn lt="WebVTT cue horizontal writing direction">horizontal</dfn> (a line extends
horizontally and is offset vertically from the video viewport's top edge, with consecutive lines
displayed below each other),</li>
<li> <dfn lt="WebVTT cue vertical growing left writing direction">vertical growing left</dfn> (a
line extends vertically and is offset horizontally from the video viewport's
right edge, with consecutive lines displayed to the left of each other<!-- used for east
asian-->), or</li>
<li><dfn lt="WebVTT cue vertical growing right writing direction">vertical growing right</dfn> (a
line extends vertically and is offset horizontally from the video viewport's left edge, with
consecutive lines displayed to the right of each other<!-- used for mongolian -->).</li>
</ul>
<p>If the <a lt="WebVTT cue writing direction">writing direction</a> is <a lt="WebVTT cue
horizontal writing direction">horizontal</a>, then the <a lt="WebVTT cue line">line</a>
percentages are relative to the height of the video, and <a lt="WebVTT cue position">position</a>
and <a lt="WebVTT cue size">size</a> percentages are relative to the width of the video.</p>
<p>Otherwise, <a lt="WebVTT cue line">line</a> percentages are relative to the width of the video,
and <a lt="WebVTT cue position">position</a> and <a lt="WebVTT cue size">size</a> percentages are
relative to the height of the video.</p>
<p>The <a lt="WebVTT cue writing direction">writing direction</a> defaults to <a lt="WebVTT cue
horizontal writing direction">horizontal</a>.</p>
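<div class="example">
<p>As an informal illustration, the mapping from writing direction to the video dimension that
each percentage refers to can be written as a small Python lookup. The function name
<code>percentage_bases</code> and the string spellings of the writing directions are assumptions
made for this sketch, and cues placed in a region are not considered.</p>

```python
WRITING_DIRECTIONS = ("horizontal", "vertical growing left", "vertical growing right")


def percentage_bases(writing_direction, video_width, video_height):
    """Return the length that 100% refers to for line, position and size."""
    if writing_direction not in WRITING_DIRECTIONS:
        raise ValueError("unknown writing direction: %r" % (writing_direction,))
    if writing_direction == "horizontal":
        # Lines stack vertically, so line offsets use the height;
        # position and size run along the text, so they use the width.
        return {"line": video_height, "position": video_width, "size": video_width}
    # For both vertical directions the roles of width and height swap.
    return {"line": video_width, "position": video_height, "size": video_height}


bases = percentage_bases("horizontal", 640, 360)
size_in_pixels = bases["size"] * 35 / 100   # a 35% cue size on a 640 pixel wide video
```

<p>Here a size of 35% on a 640 by 360 video corresponds to 224 pixels for a horizontal cue.</p>
</div>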
</dd>
<dt><dfn lt="WebVTT cue snap-to-lines flag">A snap-to-lines flag</dfn></dt>
<dd>
<p>A boolean indicating whether the <a lt="WebVTT cue line">line</a> is an integer number of lines
(using the line dimensions of the first line of the cue), or whether it is a percentage of the
dimension of the video. The flag is set when lines are counted, unset otherwise.</p>
<p>Cues whose <a>WebVTT cue snap-to-lines flag</a> is set will be placed within the title-safe
area on user agents that use overscan. Cues with the flag unset will be offset as requested
(modulo overlap avoidance if multiple cues are in the same place).</p>
<p>By default, the <a lt="WebVTT cue snap-to-lines flag">snap-to-lines flag</a> is set to
true.</p>
</dd>
<dt><dfn lt="WebVTT cue line">A line</dfn></dt>
<dd>
<p>The <a lt="WebVTT cue line">line</a> defines positioning of the <a lt="WebVTT cue box">cue
box</a>.</p>
<p>The <a lt="WebVTT cue line">line</a> offsets the <a lt="WebVTT cue box">cue box</a> from the
top, the right or left of the video viewport as defined by the <a lt="WebVTT cue writing
direction">writing direction</a>, the <a lt="WebVTT cue snap-to-lines flag">snap-to-lines
flag</a>, or the lines occupied by any other showing tracks.</p>
<p>The <a lt="WebVTT cue line">line</a> is set either as a number of lines, a percentage of the
video viewport height or width, or as the special value <dfn lt="WebVTT cue line
automatic">auto</dfn>, which means the offset is to depend on the other showing tracks.</p>
<p>A <a>WebVTT cue</a> has a <dfn lt="cue computed line">computed line</dfn> whose value is that
returned by the following algorithm, which is defined in terms of the other aspects of the
cue:</p>
<ol>
<li><p>If the <a lt="WebVTT cue line">line</a> is numeric, the <a>WebVTT cue snap-to-lines
flag</a> of the <a>WebVTT cue</a> is not set, and the <a lt="WebVTT cue line">line</a> is
negative or greater than 100, then return 100 and abort these steps.</p></li>
<li><p>If the <a lt="WebVTT cue line">line</a> is numeric, return the value of the <a>WebVTT cue
line</a> and abort these steps. (Either the <a>WebVTT cue snap-to-lines flag</a> is set, so any
value, not just those in the range 0..100, is valid, or the value is in the range 0..100 and is
thus valid regardless of the value of that flag.)</p></li>
<li><p>If the <a>WebVTT cue snap-to-lines flag</a> of the <a>WebVTT cue</a> is not set, return
the value 100 and abort these steps. (The <a lt="WebVTT cue line">line</a> is the special value
<a lt="WebVTT cue line automatic">auto</a>.)</p></li>
<li><p>Let <var>cue</var> be the <a>WebVTT cue</a>.</p></li>
<li><p>If <var>cue</var> is not in a <a lt="text track list of cues">list of cues</a> of a
<a>text track</a>, or if that <a>text track</a> is not in the <a>list of text tracks</a> of a
<a>media element</a>, return −1 and abort these steps.</p></li>
<li><p>Let <var>track</var> be the <a>text track</a> whose <a lt="text track list of cues">list
of cues</a> the <var>cue</var> is in.</p></li>
<li><p>Let <var>n</var> be the number of <a lt="text track">text tracks</a> whose <a>text track
mode</a> is <a lt="text track showing">showing</a> and that are in the <a>media element</a>'s
<a>list of text tracks</a> before <var>track</var>.</p></li>
<li><p>Increment <var>n</var> by one.</p></li>
<li><p>Negate <var>n</var>.</p></li>
<li><p>Return <var>n</var>.</p></li>
</ol>
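<div class="example">
<p>The steps above can be sketched in Python as follows. This is an informal sketch, not a
replacement for the algorithm: the function name, the <code>AUTO</code> sentinel and the
<code>showing_tracks_before</code> parameter are assumptions of this example. The parameter stands
for the number of showing text tracks that precede the cue's track in the media element's list of
text tracks, with <code>None</code> meaning that the cue is not in a track attached to a media
element.</p>

```python
AUTO = "auto"


def computed_line(line, snap_to_lines, showing_tracks_before=None):
    """Return the computed line of a cue, following the numbered steps."""
    if line != AUTO:
        # Step 1: an out-of-range percentage falls back to 100.
        if not snap_to_lines and (line < 0 or line > 100):
            return 100
        # Step 2: any other numeric line is used unchanged.
        return line
    # Step 3: automatic line without snap-to-lines.
    if not snap_to_lines:
        return 100
    # Step 5: the cue is not reachable from a media element.
    if showing_tracks_before is None:
        return -1
    # Steps 6 to 10: count the showing tracks before this one, add one, negate.
    n = showing_tracks_before
    n += 1
    return -n
```

<p>For instance, an automatic line in the third showing track
(<code>showing_tracks_before=2</code>) yields −3, which places the cue on the third line counted
from the end of the viewport.</p>
</div>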
</dd>
<dt><dfn lt="WebVTT cue line alignment">A line alignment</dfn></dt>
<dd>
<p>An alignment for the <a lt="WebVTT cue box">cue box</a>'s <a lt="WebVTT cue line">line</a>, one
of:</p>
<dl>
<dt><dfn lt="WebVTT cue line start alignment">Start alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a>'s top side (for <a lt="WebVTT cue horizontal writing
direction">horizontal</a> cues), left side (for <a lt="WebVTT cue vertical growing right writing
direction">vertical growing right</a>), or right side (for <a lt="WebVTT cue vertical growing
left writing direction">vertical growing left</a>) is aligned at the <a lt="WebVTT cue
line">line</a>.</dd>
<dt><dfn lt="WebVTT cue line middle alignment">Middle alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a> is centered at the <a lt="WebVTT cue
line">line</a>.</dd>
<dt><dfn lt="WebVTT cue line end alignment">End alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a>'s bottom side (for <a lt="WebVTT cue horizontal
writing direction">horizontal</a> cues), right side (for <a lt="WebVTT cue vertical growing right
writing direction">vertical growing right</a>), or left side (for <a lt="WebVTT cue vertical
growing left writing direction">vertical growing left</a>) is aligned at the <a lt="WebVTT cue
line">line</a>.</dd>
</dl>
<p>A <a>WebVTT cue</a> has a default <a>WebVTT cue line alignment</a> of <a lt="WebVTT cue line
start alignment">start</a>.</p>
</dd>
<dt><dfn lt="WebVTT cue position">A position</dfn></dt>
<dd>
<p>The <a lt="WebVTT cue position">position</a> defines the indent of the <a lt="WebVTT cue
box">cue box</a> in the direction defined by the <a lt="WebVTT cue writing direction">writing
direction</a>.</p>
<p>The <a lt="WebVTT cue position">position</a> is either a number giving the position of the <a
lt="WebVTT cue box">cue box</a> as a percentage value or the special value <dfn lt="WebVTT cue
automatic position">auto</dfn>, which means the position is to depend on the <a lt="WebVTT cue
text alignment">text alignment</a> of the cue.</p>
<p>If the cue is not within a <a lt="WebVTT region">region</a>, the percentage value is to be
interpreted as a percentage of the video dimensions, otherwise as a percentage of the region
dimensions.</p>
<p>A <a>WebVTT cue</a> has a <dfn lt="cue computed position">computed position</dfn> whose value
is that returned by the following algorithm, which is defined in terms of the other aspects of the
cue:</p>
<ol>
<li><p>If the <a lt="WebVTT cue position">position</a> is numeric, then return the value of the
<a lt="WebVTT cue position">position</a> and abort these steps. (Otherwise, the <a lt="WebVTT cue
position">position</a> is the special value <a lt="WebVTT cue automatic
position">auto</a>.)</p></li>
<li><p>If the <a lt="WebVTT cue text alignment">cue text alignment</a> is <a lt="WebVTT cue start
alignment">start</a> or <a lt="WebVTT cue left alignment">left</a>, return 0 and abort these
steps.</p></li>
<li><p>If the <a lt="WebVTT cue text alignment">cue text alignment</a> is <a lt="WebVTT cue end
alignment">end</a> or <a lt="WebVTT cue right alignment">right</a>, return 100 and abort these
steps.</p></li>
<li><p>If the <a lt="WebVTT cue text alignment">cue text alignment</a> is <a lt="WebVTT cue
middle alignment">middle</a>, return 50 and abort these steps.</p></li>
</ol>
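<div class="example">
<p>A minimal Python sketch of the steps above, for illustration only. The function name and the
plain-string spellings of the alignment values are assumptions of this example.</p>

```python
def computed_position(position, text_alignment):
    """Return the computed position of a cue, in percent."""
    # Step 1: a numeric position is used as given.
    if position != "auto":
        return position
    # Steps 2 to 4: an automatic position follows the text alignment.
    if text_alignment in ("start", "left"):
        return 0
    if text_alignment in ("end", "right"):
        return 100
    if text_alignment == "middle":
        return 50
    raise ValueError("unknown text alignment: %r" % (text_alignment,))
```

<p>For example, <code>computed_position("auto", "right")</code> gives 100, while
<code>computed_position(30, "right")</code> gives 30 because an explicit position takes
precedence.</p>
</div>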
<p class="note">Since the default value of the <a>WebVTT cue position alignment</a> is <a
lt="WebVTT cue middle alignment">middle</a>, if there is no <a>WebVTT cue text alignment</a>
setting for a cue, the <a>WebVTT cue position</a> defaults to 50%.</p>
<p class="note">Even for <a lt="WebVTT cue horizontal writing direction">horizontal</a> cues with
right-to-left <i>paragraph direction</i> text, the <a lt="WebVTT cue box">cue box</a> is
positioned from the left edge of the video viewport. This allows defining a rendering space
template which can be filled with either left-to-right or right-to-left <i>paragraph direction</i>
text. If such a <a lt="WebVTT cue box">cue box</a> template is created with <a lt="WebVTT cue
start alignment">start</a> or <a lt="WebVTT cue end alignment">end</a> aligned text, it is best to
also specify a <a lt="WebVTT cue size">size</a> since otherwise the text may flip from one side of
the video viewport to the other.</p>
</dd>
<dt><dfn lt="WebVTT cue position alignment">A position alignment</dfn></dt>
<dd>
<p>An alignment for the <a lt="WebVTT cue box">cue box</a> in the dimension of the <a lt="WebVTT
cue writing direction">writing direction</a>, describing what the <a lt="WebVTT cue
position">position</a> is anchored to, one of:</p>
<dl>
<dt><dfn lt="WebVTT cue position start alignment">Start alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a>'s left side (for <a lt="WebVTT cue horizontal writing
direction">horizontal</a> cues) or top side (otherwise) is aligned at the <a lt="WebVTT cue
position">position</a>.</dd>
<dt><dfn lt="WebVTT cue position middle alignment">Middle alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a> is centered at the <a lt="WebVTT cue
position">position</a>.</dd>
<dt><dfn lt="WebVTT cue position end alignment">End alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a>'s right side (for <a lt="WebVTT cue horizontal writing
direction">horizontal</a> cues) or bottom side (otherwise) is aligned at the <a lt="WebVTT cue
position">position</a>.</dd>
<dt><dfn lt="WebVTT cue position automatic alignment">Auto alignment</dfn></dt>
<dd>The <a lt="WebVTT cue box">cue box</a>'s alignment depends on the value of the <a lt="WebVTT
cue text alignment">text alignment</a> of the cue.</dd>
</dl>
<p>A <a>WebVTT cue</a> has a <dfn lt="cue computed position alignment">computed position
alignment</dfn> whose value is that returned by the following algorithm, which is defined in terms
of other aspects of the cue:</p>
<ol>
<li><p>If the <a>WebVTT cue position alignment</a> is not <a lt="WebVTT cue position automatic
alignment">auto</a>, then return the value of the <a>WebVTT cue position alignment</a> and abort
these steps.</p></li>
<li><p>If the <a>WebVTT cue text alignment</a> is <a lt="WebVTT cue start alignment">start</a> or
<a lt="WebVTT cue left alignment">left</a>, return <a lt="WebVTT cue position start
alignment">start</a> and abort these steps.</p></li>
<li><p>If the <a>WebVTT cue text alignment</a> is <a lt="WebVTT cue end alignment">end</a> or <a
lt="WebVTT cue right alignment">right</a>, return <a lt="WebVTT cue position end
alignment">end</a> and abort these steps.</p></li>
<li><p>If the <a>WebVTT cue text alignment</a> is <a lt="WebVTT cue middle alignment">middle</a>,
return <a lt="WebVTT cue position middle alignment">middle</a> and abort these steps.</p></li>
</ol>
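<div class="note">
<p>The steps above can be sketched as a small function. This is an illustrative sketch only, not
part of the algorithm's definition; the function name and the use of plain strings for the
alignment values are assumptions made for the example.</p>

```python
def computed_position_alignment(position_alignment: str, text_alignment: str) -> str:
    """Sketch of the computed position alignment steps (names are hypothetical)."""
    # Step 1: an explicit (non-auto) position alignment is returned as-is.
    if position_alignment != "auto":
        return position_alignment
    # Step 2: start or left text alignment maps to start.
    if text_alignment in ("start", "left"):
        return "start"
    # Step 3: end or right text alignment maps to end.
    if text_alignment in ("end", "right"):
        return "end"
    # Step 4: middle text alignment maps to middle.
    if text_alignment == "middle":
        return "middle"
    # Not covered by the steps; rejected here as an assumption of this sketch.
    raise ValueError(f"unknown text alignment: {text_alignment!r}")
```

<p>For instance, a cue with an auto position alignment and a right text alignment would get a
computed position alignment of end under this sketch.</p>
</div>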
<p class="note">Since the <a lt="WebVTT cue position">position</a> always measures from the left
of the video (for <a lt="WebVTT cue horizontal writing direction">horizontal</a> cues) or the top
(otherwise), the <a>WebVTT cue position alignment</a> <a lt="WebVTT cue position start
alignment">start value</a> refers to the left side for horizontal cues and to the top side for
vertical cues; it does not switch between left and right when the <i>paragraph direction</i>
changes.</p>
</dd>
<dt><dfn lt="WebVTT cue size">A size</dfn></dt>
<dd>
<p>A number giving the size of the <a lt="WebVTT cue box">cue box</a>, to be interpreted as a
percentage of the video, as defined by the <a lt="WebVTT cue writing direction">writing
direction</a>.</p>
<p>By default, the <a>WebVTT cue size</a> is 100%.</p>
</dd>
<dt><dfn lt="WebVTT cue text alignment">A text alignment</dfn></dt>
<dd>
<p>An alignment for all lines of text within the <a lt="WebVTT cue box">cue box</a>, in the
dimension of the <a lt="WebVTT cue writing direction">writing direction</a> and the <i>paragraph
direction</i> [[!BIDI]], one of:</p>
<dl>
<dt><dfn lt="WebVTT cue start alignment">Start alignment</dfn></dt>
<dd>The text is aligned towards the <i>paragraph direction</i> start side of the <a lt="WebVTT
cue box">cue box</a>.</dd>
<dt><dfn lt="WebVTT cue middle alignment">Middle alignment</dfn></dt>
<dd>The text is centered between the box's start and end sides.</dd>
<dt><dfn lt="WebVTT cue end alignment">End alignment</dfn></dt>
<dd>The text is aligned towards the <i>paragraph direction</i> end side of the <a lt="WebVTT cue
box">cue box</a>.</dd>
<dt><dfn lt="WebVTT cue left alignment">Left alignment</dfn></dt>
<dd>The text is aligned to the box's left side.</dd>
<dt><dfn lt="WebVTT cue right alignment">Right alignment</dfn></dt>
<dd>The text is aligned to the box's right side.</dd>
</dl>
<p>By default, the value of the <a>WebVTT cue text alignment</a> is <a lt="WebVTT cue middle
alignment">middle aligned</a>.</p>
</dd>
<dt><dfn lt="WebVTT cue region">A region</dfn></dt>
<dd>
<p>An optional <a>WebVTT region</a> to which a cue belongs.</p>
</dd>
</dl>
<p>The associated <a>rules for updating the text track rendering</a> of <a lt="WebVTT cue">WebVTT
cues</a> are the <a>rules for updating the display of WebVTT text tracks</a>.</p>
<div class="impl">
<p>When a <a>WebVTT cue</a> whose <a lt="text track cue active flag">active flag</a> is set has its
<a lt="WebVTT cue writing direction">writing direction</a>, <a lt="WebVTT cue snap-to-lines
flag">snap-to-lines flag</a>, <a lt="WebVTT cue line">line</a>, <a lt="WebVTT cue
position">position</a>, <a lt="WebVTT cue size">size</a>, <a lt="WebVTT cue text alignment">text
alignment</a>, <a lt="WebVTT cue region">region</a>, or <a lt="text track cue text">text</a> change
value, then the user agent must empty the <a>text track cue display state</a>, and then immediately
run the <a>text track</a>'s <a>rules for updating the display of WebVTT text tracks</a>.</p>
</div>
<h3 id=regions>WebVTT regions</h3>
<p>A <dfn>WebVTT region</dfn> represents a subpart of the video viewport and provides a rendering
area for <a lt="WebVTT cue">WebVTT cues</a>.</p>
<p>Each <a>WebVTT region</a> consists of:</p>
<dl>
<dt><dfn lt="WebVTT region identifier">An identifier</dfn></dt>
<dd>
<p>An arbitrary string.</p>
</dd>
<dt><dfn lt="WebVTT region width">A width</dfn></dt>
<dd>
<p>A number giving the width of the box within which the text of each line of the region's cues
is to be rendered, to be interpreted as a percentage of the video width. Defaults to 100.</p>
</dd>
<dt><dfn lt="WebVTT region lines">A lines value</dfn></dt>
<dd>
<p>A number giving the height, in lines, of the box within which the text of the region's cues is
to be rendered. Defaults to 3.</p>
</dd>
<dt><dfn lt="WebVTT region anchor">A region anchor point</dfn></dt>
<dd>
<p>Two numbers giving the x and y coordinates of the point within the region that is anchored to
the video viewport and does not change location even when the region does, e.g. because of font
size changes. Defaults to (0,100), i.e. the bottom left corner of the region.</p>
</dd>
<dt><dfn lt="WebVTT region viewport anchor">A region viewport anchor point</dfn></dt>
<dd>
<p>Two numbers giving the x and y coordinates within the video viewport to which the region anchor
point is anchored. Defaults to (0,100), i.e. the bottom left corner of the viewport.</p>
</dd>
<dt><dfn lt="WebVTT region scroll">A scroll value</dfn></dt>
<dd>
<p>One of the following:</p>
<dl>
<dt><dfn lt="WebVTT region scroll none">None</dfn></dt>
<dd>Indicates that the cues in the region are not to scroll and instead stay fixed at the
location they were first painted in.</dd>
<dt><dfn lt="WebVTT region scroll up">Up</dfn></dt>
<dd>Indicates that the cues in the region will be added at the bottom of the region and push any
already displayed cues in the region up until all lines of the new cue are visible in the
region.</dd>
<!-- in the future we may introduce scroll="down"-->
</dl>
</dd>
</dl>
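<div class="note">
<p>As an illustration, the components listed above could be modelled as a simple record. This is a
sketch only: the class and field names are hypothetical, and the defaults for the identifier (empty
string) and the scroll value (none) are assumptions of the example, since the list above does not
state them.</p>

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Region:
    """Hypothetical in-memory model of a WebVTT region."""
    identifier: str = ""                                  # assumed default
    width: float = 100.0                                  # percentage of the video width
    lines: int = 3                                        # height of the box, in lines
    region_anchor: Tuple[float, float] = (0.0, 100.0)     # bottom left of the region
    viewport_anchor: Tuple[float, float] = (0.0, 100.0)   # bottom left of the viewport
    scroll: str = "none"                                  # "none" or "up"; assumed default
```

<p>Constructing <code>Region()</code> with no arguments then yields the default values given in the
list above.</p>
</div>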
<div class="note">
<p>The following diagram illustrates how anchoring of a region to a video viewport works. The black
cross is the anchor, the orange arrows show the anchor's offset within the region, and the green
arrows show the anchor's offset within the viewport. Think of it as sticking a pin through a note
onto a board:</p>
<p><img src="webvtt-region-diagram.png" alt="Within the video viewport, there is a WebVTT region.
Inside the region, there is an anchor point marked with a black cross. The vertical and horizontal
distance from the video viewport's edges to the anchor is marked with green arrows, representing
the region viewport anchor X and Y offsets. The vertical and horizontal distance from the region's
edges to the anchor is marked with orange arrows, representing the region anchor X and Y offsets.
The size of the region is represented by the region width for the horizontal axis, and region lines
for the vertical axis."></p>
</div>
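<div class="note">
<p>The pin analogy can be made concrete with a little arithmetic. The following sketch computes
where the top left corner of a region would land, given the two anchor points as percentages. It is
an illustration under stated assumptions, not the rendering rules: the function name is
hypothetical, and the region height is passed in as a pixel value because the height of a line
depends on font metrics that are not modelled here.</p>

```python
from typing import Tuple


def region_top_left(
    video_width: float,
    video_height: float,
    region_width_pct: float,
    region_height_px: float,
    region_anchor: Tuple[float, float],
    viewport_anchor: Tuple[float, float],
) -> Tuple[float, float]:
    """Return the (left, top) pixel offset of the region inside the viewport."""
    region_width_px = video_width * region_width_pct / 100
    # Where the pin sits on the board (the video viewport).
    pin_x = video_width * viewport_anchor[0] / 100
    pin_y = video_height * viewport_anchor[1] / 100
    # Where the pin pierces the note (the region), measured from its top left.
    offset_x = region_width_px * region_anchor[0] / 100
    offset_y = region_height_px * region_anchor[1] / 100
    return pin_x - offset_x, pin_y - offset_y
```

<p>With both anchors at their default of (0,100), the region's bottom left corner coincides with
the viewport's bottom left corner, so a region that is 60 pixels tall in a 640 by 360 viewport has
its top left corner at (0, 300).</p>
</div>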
<p>For parsing, we also need the following:</p>
<dl>
<dt><dfn lt="text track list of regions">A text track list of regions</dfn></dt>
<dd>
<p>A list of zero or more <a lt="WebVTT region">WebVTT regions</a>.</p>
</dd>
</dl>
<h2 id=syntax>Syntax</h2>
<h3 id=file-structure>WebVTT file structure</h3>
<p>A <dfn>WebVTT file</dfn> must consist of a <a>WebVTT file body</a> encoded as UTF-8 and labeled
with the <a spec=html>MIME type</a> <code>text/vtt</code>. [[!RFC3629]]</p>
<p>A <dfn>WebVTT file body</dfn> consists of the following components, in the following order:</p>
<ol>
<li>An optional U+FEFF BYTE ORDER MARK (BOM) character.</li>
<li>The string "<code>WEBVTT</code>".</li>
<li>Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character
followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN
(CR) characters.</li> <!-- allows for Emacs line -->
<li>Exactly one <a>WebVTT line terminator</a> to terminate the line with the file magic and
separate it from the rest of the body.</li>
<li>Zero or more <a lt="WebVTT metadata header">WebVTT metadata headers</a>.</li>
<li>One or more <a lt="WebVTT line terminator">WebVTT line terminators</a> to terminate the header
block and separate the cues from the file header.</li>
<li>Zero or more <a lt="WebVTT cue block">WebVTT cue blocks</a> and <a lt="WebVTT comment
block">WebVTT comment blocks</a> separated from each other by one or more <a lt="WebVTT line
terminator">WebVTT line terminators</a>.</li>
<li>Zero or more <a lt="WebVTT line terminator">WebVTT line terminators</a>.</li>
</ol>
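<div class="note">
<p>The first components of the list above (the optional BOM, the file magic, the optional trailing
text, and the terminator of the first line) can be checked with a regular expression. This is a
minimal sketch that looks only at the first line of an already decoded string; it does not validate
the rest of the file body, and the function name is hypothetical.</p>

```python
import re

# Optional BOM, "WEBVTT", optionally a space or tab followed by any
# non-terminator characters, then a line terminator (CRLF, LF, or CR).
_SIGNATURE = re.compile(r"\ufeff?WEBVTT(?:[ \t][^\n\r]*)?(?:\r\n|\n|\r)")


def has_webvtt_signature(text: str) -> bool:
    """Return True if the text starts with a well-formed WEBVTT signature line."""
    return _SIGNATURE.match(text) is not None
```

<p>Under this sketch, <code>"WEBVTT - my captions\n\n"</code> is accepted, while
<code>"WEBVTTX\n"</code> is rejected because the file magic is not followed by a space, a tab, or a
line terminator.</p>
</div>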
<p>A <dfn>WebVTT line terminator</dfn> consists of one of the following:</p>
<ul class="brief">
<li>A U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.</li>
<li>A single U+000A LINE FEED (LF) character.</li>
<li>A single U+000D CARRIAGE RETURN (CR) character.</li>
</ul>
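<div class="note">
<p>A tool that reads WebVTT files might split a decoded file body into lines using exactly these
three terminators. The sketch below is one way to do so; the function name is hypothetical. Python's
built-in <code>str.splitlines()</code> is deliberately not used, because it also splits on
characters that are not WebVTT line terminators.</p>

```python
import re
from typing import List

# CRLF must be tried first so that it counts as a single terminator.
_LINE_TERMINATOR = re.compile(r"\r\n|\n|\r")


def split_webvtt_lines(body: str) -> List[str]:
    """Split text on CRLF, LF, or CR, keeping empty lines."""
    return _LINE_TERMINATOR.split(body)
```

<p>Empty strings in the result correspond to blank lines, which is how consecutive line terminators
between blocks show up.</p>
</div>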
<p>A <dfn>WebVTT metadata header</dfn> consists of the following components, in the given order:</p>
<ol>
<li>A <a>WebVTT metadata header name</a>.</li>
<li>A U+003A COLON (colon) character.</li>
<li>A <a>WebVTT metadata header value</a>.</li>
<li>A <a>WebVTT line terminator</a>.</li>
</ol>
<p>A <dfn>WebVTT metadata header name</dfn> and a <dfn>WebVTT metadata header value</dfn> each
consist of any sequence of one or more characters other than U+000A LINE FEED (LF) characters and
U+000D CARRIAGE RETURN (CR) characters, except that the entire resulting string must not contain the
substring "<code>--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
SIGN).</p>
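<div class="note">
<p>The constraints on a metadata header can be illustrated with a small checker that takes one line
(without its terminator) and returns the name and value. This is a sketch, not a parsing algorithm:
the function name is hypothetical, and splitting at the first colon is an assumption of the
example, since the definition above does not say which colon separates the name from the value.</p>

```python
from typing import Optional, Tuple

ARROW = "-->"


def parse_metadata_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (name, value) for a well-formed header line, else None."""
    # Line terminators are not allowed inside a name or a value.
    if "\n" in line or "\r" in line:
        return None
    # Assumption of this sketch: the first colon is the separator.
    name, colon, value = line.partition(":")
    # Both parts need at least one character, and the colon must be present.
    if not colon or not name or not value:
        return None
    # Neither part may contain the arrow substring.
    if ARROW in name or ARROW in value:
        return None
    return name, value
```

<p>Note that the value is returned verbatim, including any leading space after the colon.</p>
</div>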
<p>A <dfn>WebVTT cue block</dfn> consists of the following components, in the given order:</p>
<ol>
<li>Optionally, a <a>WebVTT cue identifier</a> followed by a <a>WebVTT line terminator</a>.</li>
<li><a>WebVTT cue timings</a>.</li>
<li>Optionally, one or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters
followed by a <a>WebVTT cue settings list</a>.</li>
<li>A <a>WebVTT line terminator</a>.</li>
<li>The <dfn>cue payload</dfn>: either <a>WebVTT cue text</a>, <a>WebVTT chapter title text</a>, or
<a>WebVTT metadata text</a>, but it must not contain the substring "<code>--></code>" (U+002D
HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</li>
<li>A <a>WebVTT line terminator</a>.</li>
</ol>
<p class="note">A <a>WebVTT cue block</a> corresponds to one piece of time-aligned text or data in
the <a>WebVTT file</a>, for example one subtitle. The <a>cue payload</a> is the text or data
associated with the cue.</p>
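<div class="note">
<p>Because neither a cue identifier nor a cue payload may contain the substring
"<code>--></code>", the timings line of a cue block can be located by looking for that substring.
The sketch below uses this to separate an optional identifier, the timings line (with any
settings), and the payload lines. It is an illustration only, with a hypothetical function name; it
takes the lines of a single block and does not validate the timings themselves.</p>

```python
from typing import List, Optional, Tuple

CueParts = Tuple[Optional[str], str, List[str]]


def split_cue_block(lines: List[str]) -> Optional[CueParts]:
    """Split one block's lines into (identifier, timings line, payload lines)."""
    if not lines:
        return None
    # No identifier: the first line already holds the timings.
    if "-->" in lines[0]:
        return None, lines[0], lines[1:]
    # Identifier present: the timings must be on the second line.
    if len(lines) >= 2 and "-->" in lines[1]:
        return lines[0], lines[1], lines[2:]
    # Neither of the first two lines looks like a timings line.
    return None
```

<p>A block for which this returns <code>None</code> has no recognisable timings line and so is not
a cue block under this sketch.</p>
</div>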
<p>A <dfn>WebVTT cue identifier</dfn> is any sequence of one or more characters not containing the
substring "<code>--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN),
nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.</p>
<p>A <a>WebVTT cue identifier</a> must be unique amongst all the <a lt="WebVTT cue
identifier">WebVTT cue identifiers</a> of all <a lt="WebVTT cue">WebVTT cues</a> of a <a>WebVTT
file</a>.</p>
<p class="note">A <a>WebVTT cue identifier</a> can be used to reference a specific cue, for example
from script or CSS.</p>
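<div class="note">
<p>The two requirements on cue identifiers, the character constraints and uniqueness within a file,
could be checked by an authoring tool along the following lines. This is a sketch with hypothetical
function names; it assumes the identifiers have already been extracted from the file.</p>

```python
from collections import Counter
from typing import Iterable, List


def is_valid_cue_identifier(identifier: str) -> bool:
    """Check the character constraints on a single cue identifier."""
    return (
        len(identifier) > 0
        and "-->" not in identifier
        and "\n" not in identifier
        and "\r" not in identifier
    )


def duplicate_identifiers(identifiers: Iterable[str]) -> List[str]:
    """Return, in sorted order, the identifiers that occur more than once."""
    counts = Counter(identifiers)
    return sorted(ident for ident, n in counts.items() if n > 1)
```

<p>A file conforms to the uniqueness requirement only if <code>duplicate_identifiers</code> returns
an empty list for its identifiers.</p>
</div>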
<p>The <dfn>WebVTT cue timings</dfn> part of a <a>WebVTT cue block</a> consists of the following
components, in the given order:</p>
<ol>
<!-- we could allow leading and trailing spaces and tabs, and make the space between the arrow
either optional or allow multiple spaces or tabs -->
<li>A <a>WebVTT timestamp</a> representing the start time offset of the cue. The time represented
by this <a>WebVTT timestamp</a> must be greater than or equal to the start time offsets of all
previous cues in the file.</li>
<li>One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.</li>
<li>The string "<code>--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
SIGN).</li>
<li>One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.</li>