New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix GoPro extractor. #9019
Fix GoPro extractor. #9019
Conversation
yt_dlp/extractor/gopro.py
Outdated
metadata = self._parse_json( | ||
self._html_search_regex(r'window\.__reflectData\s*=\s*([^;]+)', webpage, 'metadata'), video_id) | ||
self._search_regex(r'<script>window\.__reflectData\s*=\s*(.+)\s*;\s*</script>', | ||
webpage, 'metadata'), video_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata = self._search_json(r'<script>window\.__reflectData\s*=', ...)
should work better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
yt_dlp/extractor/gopro.py
Outdated
'track': '', | ||
'artist': '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The track is not supposed to be an empty string. Extractor's returning wrong value. You don't have to fix the code in this PR, but don't add wrong value to test.
'track': '', | |
'artist': '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split the fix into a separate commit.
The extractor fails even when used on the URLs recorded in the tests with an error like: [GoPro] Extracting URL: https://gopro.com/v/KRm6Vgp2peg4e [GoPro] KRm6Vgp2peg4e: Downloading webpage ERROR: [GoPro] KRm6Vgp2peg4e: KRm6Vgp2peg4e: Failed to parse JSON (caused by JSONDecodeError('Expecting \',\' delimiter in \'"shop_url"=>"https:/\': line 1 column 5120 (char 5119)')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U File "/storage/avn/git/yt-dlp/yt_dlp/extractor/common.py", line 718, in extract ... - The JSON in the webpage now contains colons as part of HTML entities such as ". This makes the regexp used for matching truncate the JSON prematurely. - The same entities are replaced with the respective characters by clean_html, resulting in an invalid JSON. At the reviewer's suggestion, replaced _html_search_regex with _search_json to resolve both issues. Signed-off-by: Alexey Neyman <stilor@att.net>
gopro.com returns empty string as author/track for some videos; consider such attributes to be missing. Signed-off-by: Alexey Neyman <stilor@att.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please don't force push. The commits will be squashed upon merge.
yt_dlp/extractor/gopro.py
Outdated
'artist': str_or_none( | ||
'artist': nonempty_or_none( | ||
video_info.get('music_track_artist')), | ||
'track': str_or_none( | ||
'track': nonempty_or_none( | ||
video_info.get('music_track_name')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just do:
'artist': str_or_none(
video_info.get('music_track_artist')) or None,
'track': str_or_none(
video_info.get('music_track_name')) or None,
Signed-off-by: Alexey Neyman <stilor@att.net>
Authored by: stilor
IMPORTANT: PRs without the template will be CLOSED
The GoPro extractor fails even when used on the URLs recorded in the tests with an error like:
"
. This makes the regexp used for matching truncate the JSON prematurely.clean_html
, resulting in an invalid JSON.author
andtrack
fields in the returned dictionary.Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?