[WebVTT] Allow spaces before newlines for CueBlock #7681
Merged
Description
This PR fixes the `CueBlock` parser for WebVTT to allow optional spaces or tabs in front of the settings list. See: https://www.w3.org/TR/webvtt1/#webvtt-cue-block
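In particular, the cue block definition allows the cue timings to be followed, optionally, by one or more U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) characters and then a WebVTT cue settings list, before the WebVTT line terminator.
The WebVTT cue settings list definition matters as well: a settings list is a sequence of zero or more cue settings, so spaces or tabs followed by an empty settings list (that is, trailing whitespace right before the newline) are valid.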
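The cue block grammar described above can be sketched as a single regular expression. This is a minimal illustration under stated assumptions, not the regex this PR adds to `yt_dlp/webvtt.py`: `TIMESTAMP`, `CUE_TIMINGS_LINE` and `parse_timings` are hypothetical names, and the settings list is returned as raw text instead of being parsed into individual settings.

```python
import re

# Hypothetical timestamp pattern (not yt-dlp's own regex): optional hours
# with two or more digits, then mm:ss.ttt.
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"

# Cue timings line: start, spaces/tabs, "-->", spaces/tabs, end, then an
# optional run of spaces/tabs followed by a settings list that may be empty,
# and finally a line terminator (CRLF, LF or CR).
CUE_TIMINGS_LINE = re.compile(
    rf"(?P<start>{TIMESTAMP})[ \t]+-->[ \t]+(?P<end>{TIMESTAMP})"
    r"(?:[ \t]+(?P<settings>[^\r\n]*))?"
    r"(?:\r\n|\r|\n)"
)


def parse_timings(line):
    """Return (start, end, settings) for one timings line, or None."""
    m = CUE_TIMINGS_LINE.fullmatch(line)
    if m is None:
        return None
    # An absent or whitespace-only settings list both become "".
    return m["start"], m["end"], (m["settings"] or "").strip()
```

With this shape, `00:00:01.000 --> 00:00:02.000` parses the same with or without a trailing space or tab, while a settings list such as `align:start line:0` is still captured.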
This PR came about when I encountered a VTT file with a single space before the newline at the end of the cue timings line, which caused the parser to fail with a `ParserError`.
Position 57 in the error (the `1`) is not where parsing actually fails; that happens at position 88, right at the end of the last timestamp, because no newline character can be matched there.
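To see why the failure lands right after the end timestamp, here is a small cursor-based sketch that consumes the start timestamp, the arrow, the end timestamp and a line terminator in sequence, and reports the offset of the first piece that does not match. `failure_offset`, `TIMESTAMP`, `ARROW` and `NEWLINE` are hypothetical names; the code only imitates the general approach of matching patterns at a position and is not yt-dlp's parser.

```python
import re

# Hypothetical building blocks for a cursor-based parser.
TIMESTAMP = re.compile(r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}")
ARROW = re.compile(r"[ \t]+-->[ \t]+")
NEWLINE = re.compile(r"\r\n|\r|\n")  # strict: no whitespace allowed first


def failure_offset(text, pos=0):
    """Consume 'start --> end' plus a line terminator from text[pos:].

    Return the offset at which the first pattern fails to match, or None
    if the whole timings line was consumed.
    """
    for pattern in (TIMESTAMP, ARROW, TIMESTAMP, NEWLINE):
        m = pattern.match(text, pos)
        if m is None:
            return pos
        pos = m.end()
    return None
```

For `00:00:01.000 --> 00:00:02.000 ` followed by a newline, `failure_offset` returns 29, the index of the trailing space just past the end timestamp; without that space it returns `None`.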
See `yt-dlp/yt_dlp/webvtt.py`, lines 289 to 290 in 86aea0d.
Returning `None` there then causes the `parse_fragment` method to raise the `ParserError`; see `yt-dlp/yt_dlp/webvtt.py`, lines 392 to 397 in 86aea0d.
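How a `None` return becomes an error that points at the start of the cue, rather than at the actual failure, can be illustrated with a simplified two-level parser. This is a hypothetical sketch loosely modeled on the control flow described in this PR: `parse_cue_block`, `parse_fragment`, `ParserError`, `STRICT` and `LENIENT` are stand-ins, only LF line endings are handled, and the input is assumed to contain cue blocks without a `WEBVTT` header. A block parser that fails returns `None` without advancing the caller's position, so the caller can only report where the block began.

```python
import re

TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
# Strict: the end timestamp must be followed directly by a newline.
STRICT = re.compile(rf"({TS})[ \t]+-->[ \t]+({TS})\n")
# Lenient: optional spaces/tabs plus a (possibly empty) settings list first.
LENIENT = re.compile(rf"({TS})[ \t]+-->[ \t]+({TS})(?:[ \t]+[^\n]*)?\n")
ID_LINE = re.compile(r"(?!.*-->)([^\n]+)\n")  # identifier: no "-->" on the line
PAYLOAD_LINE = re.compile(r"([^\n]+)(?:\n|$)")
BLANK = re.compile(r"\n+")


class ParserError(Exception):
    """Raised when no cue block can be parsed at the current position."""


def parse_cue_block(text, pos, timings):
    """Return ((start, end, payload), new_pos), or None on failure.

    On failure nothing is consumed: the caller keeps its old position.
    """
    m = ID_LINE.match(text, pos)
    cur = m.end() if m else pos
    t = timings.match(text, cur)
    if t is None:
        return None
    cur = t.end()
    lines = []
    while (p := PAYLOAD_LINE.match(text, cur)) is not None:
        lines.append(p.group(1))
        cur = p.end()
    if (b := BLANK.match(text, cur)) is not None:
        cur = b.end()
    return (t.group(1), t.group(2), "\n".join(lines)), cur


def parse_fragment(text, timings=LENIENT):
    """Parse consecutive cue blocks, raising ParserError on the first failure."""
    pos, cues = 0, []
    while pos < len(text):
        result = parse_cue_block(text, pos, timings)
        if result is None:
            # Only the block's start offset is known at this level.
            raise ParserError(f"parse error at position {pos}")
        cue, pos = result
        cues.append(cue)
    return cues


# The second cue's timings line ends with a space before the newline.
sample = (
    "1\n00:00:01.000 --> 00:00:02.000\nHello\n\n"
    "2\n00:00:03.000 --> 00:00:04.000 \nWorld\n"
)
```

Parsing `sample` with `STRICT` raises an error at position 39, the start of the second cue's identifier, even though the offending space sits further along its timings line; with `LENIENT` both cues are returned.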
Fixes #7453
Replaces #7454, as it doesn't adhere to the WebVTT spec.
Template notes
Comments about some template items
I have a few issues with some of the template items, but didn't know where to add my comments, so I'm doing that here:
There is another open PR for this (see above), but it doesn't adhere to the WebVTT spec and would break vtt files without a space at the end of the cue timings.
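The compatibility concern can be checked with two regular expressions: a hypothetical one that requires at least one space or tab before the newline, and one that makes the whitespace optional as the spec does. Neither pattern is taken from #7454 or from this PR; `REQUIRED_WS`, `OPTIONAL_WS` and `accepts` are illustrative names that only isolate the difference between mandatory and optional whitespace.

```python
import re

TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"

# Hypothetical: whitespace before the line terminator is mandatory.
REQUIRED_WS = re.compile(rf"{TS}[ \t]+-->[ \t]+{TS}[ \t]+\n")
# Spec-style: spaces/tabs, then a possibly empty settings list, are optional.
OPTIONAL_WS = re.compile(rf"{TS}[ \t]+-->[ \t]+{TS}(?:[ \t]+[^\n]*)?\n")

WITH_SPACE = "00:00:01.000 --> 00:00:02.000 \n"
WITHOUT_SPACE = "00:00:01.000 --> 00:00:02.000\n"


def accepts(pattern, line):
    """True if pattern matches the whole timings line."""
    return pattern.fullmatch(line) is not None
```

The mandatory-whitespace pattern rejects `WITHOUT_SPACE`, an ordinary cue timings line, while the optional form accepts both lines.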
Running flake8 results in the following error:
This error is in code I didn't change, and running black doesn't actually resolve it.
Black changes a few more things in that file and also causes another issue for flake8:
The code in question was reformatted like that:
Removing the whitespace after `parser._pos` then causes this flake8 error:
I wasn't sure how to resolve this, so I ignored BLK100 instead, after which the flake8 run completed successfully.
As for the tests, I couldn't find any WebVTT-specific tests, so I ran the regular test suite as described in the developer instructions.
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?
Copilot Summary
Generated by Copilot at 28ccfe9
Summary
Improve WebVTT cue parsing by using a regex to handle whitespace. This affects the file `yt_dlp/webvtt.py`.
Walkthrough
`parse` method of the `CueBlock` class in `yt_dlp/webvtt.py`