Fix GoPro extractor. #9019

stilor · 2024-01-18T02:59:56Z

IMPORTANT: PRs without the template will be CLOSED

The GoPro extractor fails even when used on the URLs recorded in the tests with an error like:

[GoPro] Extracting URL: https://gopro.com/v/KRm6Vgp2peg4e
[GoPro] KRm6Vgp2peg4e: Downloading webpage
ERROR: [GoPro] KRm6Vgp2peg4e: KRm6Vgp2peg4e: Failed to parse JSON (caused by JSONDecodeError('Expecting \',\' delimiter in \'"shop_url"=>"https:/\': line 1 column 5120 (char 5119)')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/storage/avn/git/yt-dlp/yt_dlp/extractor/common.py", line 718, in extract
...

- The JSON in the webpage now contains semicolons as part of HTML entities such as &quot;. This makes the regexp used for matching truncate the JSON prematurely.
- The same entities are replaced with the respective characters by clean_html, resulting in invalid JSON.
- One of the tests for the extractor fails because of unexpected author and track fields in the returned dictionary.
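The first two failure modes can be reproduced outside yt-dlp. The snippet below is only a sketch: the page fragment is made up, and `html.unescape` from the standard library stands in for the entity replacement that clean_html performs (an assumption about the relevant part of its behavior, not a copy of it).

```python
import html
import json
import re

# Hypothetical page fragment; the real gopro.com markup is not reproduced here.
WEBPAGE = 'window.__reflectData = {"title": "Say &quot;hi&quot;"};'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Failure 1: [^;]+ stops at the first semicolon, which now sits inside &quot;.
truncated = re.search(r'window\.__reflectData\s*=\s*([^;]+)', WEBPAGE).group(1)

# Failure 2: even with the full object captured, unescaping the entities
# turns &quot; into bare double quotes inside a JSON string.
full = re.search(r'window\.__reflectData\s*=\s*(.+);', WEBPAGE).group(1)
unescaped = html.unescape(full)

print(is_valid_json(truncated), is_valid_json(full), is_valid_json(unescaped))
```

Only the untouched, fully captured text parses; both the truncated and the unescaped variants are rejected by the JSON parser.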

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

pukkandan · 2024-01-18T03:26:08Z

yt_dlp/extractor/gopro.py

        metadata = self._parse_json(
-            self._html_search_regex(r'window\.__reflectData\s*=\s*([^;]+)', webpage, 'metadata'), video_id)
+            self._search_regex(r'<script>window\.__reflectData\s*=\s*(.+)\s*;\s*</script>',
+                               webpage, 'metadata'), video_id)


metadata = self._search_json(r'<script>window\.__reflectData\s*=', ...)

should work better
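A rough standalone sketch of why matching only the start of the assignment and letting a JSON decoder find the end is more robust than a character-class regex. This is not yt-dlp's `_search_json` implementation; it uses the standard library's `json.JSONDecoder.raw_decode`, and both the page fragment and the helper name `extract_json_after` are made up for illustration.

```python
import json
import re

# Hypothetical page fragment; the real gopro.com markup is not reproduced here.
WEBPAGE = (
    '<script>window.__reflectData = '
    '{"title": "Say &quot;hi&quot;", "id": 7};</script>'
)


def extract_json_after(start_pattern, webpage):
    """Parse the JSON value that begins right after start_pattern.

    The decoder itself decides where the value ends, so semicolons inside
    string values cannot cut it short. Returns None if the pattern is absent.
    """
    match = re.search(start_pattern, webpage)
    if match is None:
        return None
    # raw_decode does not skip leading whitespace, so the pattern must
    # consume it (hence the trailing \s* in the pattern used below).
    obj, _end = json.JSONDecoder().raw_decode(webpage, match.end())
    return obj


data = extract_json_after(r'window\.__reflectData\s*=\s*', WEBPAGE)
print(data)
```

Because the text is never HTML-unescaped before parsing, the entities stay inside the string values and the JSON remains valid.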

pukkandan · 2024-01-18T03:27:58Z

yt_dlp/extractor/gopro.py

+            'track': '',
+            'artist': '',


The track is not supposed to be an empty string; the extractor is returning a wrong value. You don't have to fix the code in this PR, but don't add the wrong value to the test.

Suggested change

-            'track': '',
-            'artist': '',

Split the fix into a separate commit.

The extractor fails even when used on the URLs recorded in the tests with an error like:

[GoPro] Extracting URL: https://gopro.com/v/KRm6Vgp2peg4e
[GoPro] KRm6Vgp2peg4e: Downloading webpage
ERROR: [GoPro] KRm6Vgp2peg4e: KRm6Vgp2peg4e: Failed to parse JSON (caused by JSONDecodeError('Expecting \',\' delimiter in \'"shop_url"=>"https:/\': line 1 column 5120 (char 5119)')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
  File "/storage/avn/git/yt-dlp/yt_dlp/extractor/common.py", line 718, in extract
...

- The JSON in the webpage now contains semicolons as part of HTML entities such as &quot;. This makes the regexp used for matching truncate the JSON prematurely.
- The same entities are replaced with the respective characters by clean_html, resulting in invalid JSON.

At the reviewer's suggestion, replaced _html_search_regex with _search_json to resolve both issues.

Signed-off-by: Alexey Neyman <stilor@att.net>

gopro.com returns empty string as author/track for some videos; consider such attributes to be missing. Signed-off-by: Alexey Neyman <stilor@att.net>

seproDev

Also, please don't force push. The commits will be squashed upon merge.

seproDev · 2024-01-18T14:54:13Z

yt_dlp/extractor/gopro.py

-            'artist': str_or_none(
+            'artist': nonempty_or_none(
                video_info.get('music_track_artist')),
-            'track': str_or_none(
+            'track': nonempty_or_none(
                video_info.get('music_track_name')),


I would just do:

            'artist': str_or_none(
                video_info.get('music_track_artist')) or None,
            'track': str_or_none(
                video_info.get('music_track_name')) or None,
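The effect of the trailing `or None` can be checked in isolation. In the sketch below, `str_or_none` is a simplified stand-in for the yt-dlp helper of the same name (an assumption about its behavior: `None` stays `None`, anything else goes through `str()`), and the `video_info` values are invented.

```python
def str_or_none(value, default=None):
    # Simplified stand-in for the yt-dlp helper (assumption): None maps to
    # the default, anything else is converted with str().
    return default if value is None else str(value)


# Invented sample data: an empty artist, a real track name, no other keys.
video_info = {'music_track_artist': '', 'music_track_name': 'Song'}

# An empty string is falsy, so "or None" turns it into None, while a
# non-empty string passes through unchanged.
artist = str_or_none(video_info.get('music_track_artist')) or None
track = str_or_none(video_info.get('music_track_name')) or None
missing = str_or_none(video_info.get('nonexistent')) or None

print(artist, track, missing)
```

This keeps the existing helper and avoids introducing a new function just to treat empty strings as missing values.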

pukkandan requested changes Jan 18, 2024

seproDev added the site-bug (Issue with a specific website) label Jan 18, 2024

stilor force-pushed the master branch from 8afb3bb to 64a94e9, January 18, 2024 06:13

Fix the test for GoPro extractor.

5687be1

stilor force-pushed the master branch from 64a94e9 to 5687be1, January 18, 2024 06:14

seproDev requested changes Jan 18, 2024

seproDev added the pending-fixes (PR has had changes requested) label Jan 19, 2024

Review feedback: use or None instead of a func.

f32deb1

Signed-off-by: Alexey Neyman <stilor@att.net>

seproDev added the pending-review (PR needs a review) label and removed the pending-fixes (PR has had changes requested) label Jan 19, 2024

Fix formatting

6b017ff

bashonly approved these changes Jan 19, 2024

bashonly removed the pending-review (PR needs a review) label Jan 19, 2024

bashonly assigned seproDev Jan 19, 2024

Remove script tag

4c03a30

seproDev merged commit 4a07a45 into yt-dlp:master Jan 19, 2024
5 checks passed

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/GoPro] Fix extractor (yt-dlp#9019)

1d8c309

Authored by: stilor
