Wide Review Comment 2017: serialisation and parsing #367

Closed
nigelmegitt opened this issue Sep 27, 2017 · 14 comments

Comments

@nigelmegitt
Contributor

Copy/paste from https://lists.w3.org/Archives/Public/public-tt/2017Sep/0080.html - raising as an issue for tracking/disposition purposes.

The WebVTT syntax is similar to (but incompatible with) SRT but otherwise distinct from all other syntaxes, and includes a subsection that is effectively CSS syntax. I consider the serialisation and parsing of a document format to be an architectural layer in its own right, ideally with tests, tools and support for the format. In the case of WebVTT the fact that it has a unique format means that the benefits of referencing an independent serialisation and parsing layer are absent. For internal business to business transactions this creates some hurdles: it is costlier to develop a syntax checker for example to validate that received files are well formed, or to quality check the content; writing custom parser code becomes a security risk since issues like buffer overflow are more commonly, though not uniquely, found in less mature code. The tool support for e.g. JSON, HTML or XML serialisation is much more mature and less likely to suffer from these problems.

It is unclear what action could resolve this with WebVTT in its current form, without taking seemingly extreme steps. For example if WebVTT were a semantic model plus an API, and alternative representations were defined, and at least one of those alternative representations were a more commonly used one, that would help, though at the expense of adding an initial step for every WebVTT import or export, which is to work out which representation to use.

From this perspective, the syntax of WebVTT seems better suited to direct writing and editing in text editors by humans than by software, though obviously it is ultimately feasible to use either. For an organisation like the BBC authoring and distributing subtitle documents at scale it would be better to optimise for machine reading and writing instead of human reading and writing, since we expect subtitle authors and editors to use specialist software rather than tweaking files directly.

@dwsinger

I'm not sure that I agree that the model means it's easier to write by hand. But though this is an interesting comment, it doesn't seem actionable ("It is unclear what action could resolve this with WebVTT").

@nigelmegitt
Contributor Author

The possible line of action I propose is to split the WebVTT semantic model from the representation. I (probably) expect a "won't fix" response to that, but it's at least a route to consider.
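To make the proposal concrete, here is a minimal sketch of what splitting the semantic model from the representation could look like. Everything in it is an assumption for illustration: the `Cue` class, its three fields, and the JSON layout are hypothetical and not defined by the WebVTT specification. Only the `WEBVTT` header, the `hh:mm:ss.ttt` timestamps and the `-->` separator come from the existing syntax, and settings, regions, styles and escaping are left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Cue:
    """Hypothetical minimal cue model: times in seconds plus payload text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an hh:mm:ss.ttt timestamp."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialise the model in the line-based WebVTT syntax (no escaping)."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def to_json(cues: list[Cue]) -> str:
    """Serialise the same model in an assumed JSON layout."""
    return json.dumps({"cues": [asdict(cue) for cue in cues]})


def from_json(document: str) -> list[Cue]:
    """Parse the assumed JSON layout back into the model."""
    return [Cue(**item) for item in json.loads(document)["cues"]]
```

With this split, validation of the JSON form can lean on an existing JSON parser, while the WebVTT form stays available for players that expect it.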

@dwsinger

This is an architectural question that was resolved by the WG long ago, and such comments should have been made in the WG prior to asking for a first, let alone second, external wide review.

@nigelmegitt
Contributor Author

Perhaps that was before my time. I had the impression it was an architectural decision made by WHATWG in their fork of HTML before WebVTT was split out into a separate spec. I certainly don't recall it being discussed in the WG.

@silviapfeiffer
Member

The comment is that the line-based format of WebVTT should be abandoned and replaced by an XML- or JSON-based format. While it is appreciated that JSON parsing is already built into the browser and an alternative version of WebVTT could be specified in JSON, this is not a practical suggestion in the context of the current status of WebVTT, which has well-established usage and a community of use. The WG may wish to discuss further, but I see no other resolution than "works for me".

Also note that both XML and JSON require "end tags" for creating valid files. When first discussing the ability to specify time-aligned text and an encapsulation format for it (about 8 years ago), a file format with end tags was deemed unusable because of the need for progressive delivery, particularly in live streaming and when muxed into media files where you cannot wait for the end of the caption file before rendering captions. Thus, not having end tags is actually an advantage of WebVTT, since it avoids the need for file flattening which hierarchical file formats require.
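The progressive-delivery property described here can be sketched as a generator that emits each cue as soon as the blank line ending its block arrives, without waiting for the end of the file. This is a deliberately simplified illustration, not the parsing algorithm from the WebVTT specification: the `iter_cues` name is made up, timestamps are not validated, and the header, cue identifiers, cue settings and NOTE blocks are simply skipped.

```python
import re
from typing import Iterable, Iterator, Optional

# Simplified timing-line pattern: "<start> --> <end>", settings ignored.
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")


def iter_cues(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (start, end, text) for each cue as soon as its block is complete."""
    start: Optional[str] = None
    end: Optional[str] = None
    payload: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = TIMING.match(line)
        if start is None and match:
            # A timing line opens a new cue.
            start, end = match.groups()
        elif line == "":
            # A blank line closes the current block; emit the cue immediately.
            if start is not None and end is not None:
                yield start, end, "\n".join(payload)
            start, end, payload = None, None, []
        elif start is not None:
            payload.append(line)
        # Any other line (header, identifier, NOTE text) is skipped.
    if start is not None and end is not None:
        # The stream ended without a trailing blank line.
        yield start, end, "\n".join(payload)
```

Because `lines` can be any iterable, the same function works on a complete file or on a socket-like source that is still receiving data.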

@nigelmegitt
Contributor Author

> a file format with end tags was deemed unusable because of the need for progressive delivery

Doesn't it depend on the unit of information that requires tags to be closed? I mean, you could define an "update message" format with close tags that modifies a previous entity. My main point was that it is helpful to separate the serialisation and parsing layer from the format, and to make that layer reusable.

In any case the general idea that live use cases cannot be achieved if end tags are required has been shown to be false by counter-example, in that there exist now live subtitled streams whose data format contains close tags.
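As a sketch of the "update message" idea, each message below is a small, fully closed JSON document on its own line, and a later message can modify or remove an entity introduced by an earlier one. The message layout (`op`, `id`, `start`, `end`, `text`) and the `apply_updates` name are hypothetical, invented for this example; they do not describe the format used by any particular live subtitle stream.

```python
import json
from typing import Iterable, Iterator


def apply_updates(messages: Iterable[str]) -> Iterator[dict[str, dict]]:
    """Apply one self-contained JSON message per line; yield the cue state after each."""
    cues: dict[str, dict] = {}
    for line in messages:
        if not line.strip():
            continue  # tolerate blank keep-alive lines
        message = json.loads(line)  # every message is a complete, closed document
        if message["op"] == "remove":
            cues.pop(message["id"], None)
        else:
            # "set" creates the cue or replaces an earlier version of it.
            cues[message["id"]] = {
                "start": message["start"],
                "end": message.get("end"),  # None while the cue is still open
                "text": message["text"],
            }
        yield dict(cues)  # snapshot, so callers can render immediately
```

The unit that has to be closed is the message rather than the whole document, so a receiver can act on each line as it arrives while still using an off-the-shelf JSON parser.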

I'm not sure why this is labelled "WR-commenter-rejected" since I haven't received a disposition to reject or accept yet, so I'm going to remove that label.

@silviapfeiffer
Member

I admit I don't fully understand what the labels mean ;-)

@silviapfeiffer
Member

@nigelmegitt You can most certainly use a format with end tags for live use cases, but that requires repackaging the format as you describe in the need for "serialisation" of other formats. That's not required for WebVTT because it is inherently serialised. So, the decision on creation of WebVTT (which happened jointly in the HTML and WHATWG groups) was that WebVTT should be serialised from the start. It's a bit moot now anyway.

@silviapfeiffer
Member

I'd like to close this as "works for me"

@dwsinger

WFM too.

@dwsinger added the WR label Dec 11, 2017

@silviapfeiffer
Member

@nigelmegitt now that the bug is closed, could you add your disposition?

@nigelmegitt
Contributor Author

I'm still not clear what the WG disposition is that I'm being asked to respond to. On the basis that it is "comment rejected, we will take no action", all I can do is recognise that is the case and move on. It hasn't changed my view or original comment.

@dwsinger

It's hard to say what the disposition of this is, as it suggests that some other syntax might be better, but then says "It is unclear what action could resolve this with WebVTT in its current form, without taking seemingly extreme steps." And we all agree that extreme steps are...mildly undesirable. I hope I got the tags right.

@nigelmegitt
Contributor Author

Yes, they look about right, thanks.
