[extractor/abc] Fix #6433 by add 2 re patterns #7434

meliber · 2023-06-26T19:09:41Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Fixes #6433

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 1e57797`

Summary

🎥🆕🧪

Enhance ABCIE extractor to handle YouTube-embedded videos. Add a new test case for yt_dlp/extractor/abc.py.

Walkthrough

Add logic to handle ABC-embedded videos in ABCIE extractor ([link](https://github.com/yt-dlp/yt-dlp/pull/7434/files?diff=unified&w=0#diff-65fb45ff5464c45c0963dd9a56a6ea7b3f85fab95e15cbd3fe360391c073051bR108-R116))

bashonly · 2023-06-26T19:29:51Z

Instead of adding more if/else statements, I think this could be done like this:

diff --git a/yt_dlp/extractor/abc.py b/yt_dlp/extractor/abc.py
index 0ca76b85a..b45e4d12c 100644
--- a/yt_dlp/extractor/abc.py
+++ b/yt_dlp/extractor/abc.py
@@ -12,6 +12,7 @@
     int_or_none,
     parse_iso8601,
     str_or_none,
+    traverse_obj,
     try_get,
     unescapeHTML,
     update_url_query,
@@ -107,7 +108,7 @@ def _real_extract(self, url):
                 video = True
 
         if mobj is None:
-            mobj = re.search(r'(?P<type>)"sources": (?P<json_data>\[[^\]]+\]),', webpage)
+            mobj = re.search(r'(?P<type>)"(?:sources|files|renditions)":\s*(?P<json_data>\[[^\]]+\])', webpage)
             if mobj is None:
                 mobj = re.search(
                     r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
@@ -121,7 +122,7 @@ def _real_extract(self, url):
             urls_info = self._parse_json(
                 mobj.group('json_data'), video_id, transform_source=js_to_json)
             youtube = mobj.group('type') == 'YouTube'
-            video = mobj.group('type') == 'Video' or urls_info[0]['contentType'] == 'video/mp4'
+            video = mobj.group('type') == 'Video' or traverse_obj(urls_info, (0, 'contentType')) == 'video/mp4'
 
         if not isinstance(urls_info, list):
             urls_info = [urls_info]
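
To make the effect of the suggested diff easier to see, here is a minimal standalone sketch, not the extractor's actual code. It reuses the combined pattern from the diff as-is, but everything else is an assumption for illustration: `find_sources` and `first_content_type` are hypothetical helpers, `json.loads` stands in for `self._parse_json(..., transform_source=js_to_json)`, `first_content_type` only imitates what `traverse_obj(urls_info, (0, 'contentType'))` does for this one path, and the page fragments are invented.

```python
import json
import re

# Pattern copied from the suggested diff: one alternation covers all three keys,
# and \s* tolerates any whitespace (or none) after the colon.
SOURCES_RE = re.compile(
    r'(?P<type>)"(?:sources|files|renditions)":\s*(?P<json_data>\[[^\]]+\])')


def first_content_type(urls_info):
    """Hypothetical stand-in for traverse_obj(urls_info, (0, 'contentType')).

    Returns None instead of raising when the list is empty, the first
    entry is not a dict, or the key is missing.
    """
    if isinstance(urls_info, list) and urls_info and isinstance(urls_info[0], dict):
        return urls_info[0].get('contentType')
    return None


def find_sources(webpage):
    """Return (urls_info, is_video) for the first matching array, else (None, False)."""
    mobj = SOURCES_RE.search(webpage)
    if mobj is None:
        return None, False
    # Assumption: the matched text is strict JSON; the extractor itself
    # goes through js_to_json first.
    urls_info = json.loads(mobj.group('json_data'))
    return urls_info, first_content_type(urls_info) == 'video/mp4'


# Invented page fragments, one per key name the pattern accepts.
samples = {
    'sources': '{"sources": [{"url": "https://example.com/a.mp4", "contentType": "video/mp4"}],',
    'files': '{"files":[{"url": "https://example.com/b.mp3"}]}',
    'renditions': '{"renditions":\n [{"url": "https://example.com/c.mp4", "contentType": "video/mp4"}]}',
}

for key, page in samples.items():
    urls_info, is_video = find_sources(page)
    print(key, len(urls_info), is_video)
```

The `files` sample has no `contentType` key, which is the case where the old `urls_info[0]['contentType']` lookup would raise `KeyError`; the safe lookup simply yields `None`, so `video` ends up `False`.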

meliber · 2023-06-26T20:09:09Z

That's much better. Thank you.

yt_dlp/extractor/abc.py

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

Closes yt-dlp#6433 Authored by: meliber

[extractor/abc] Fix yt-dlp#6433 by add 2 re patterns

1e57797

bashonly added the site-bug (Issue with a specific website) and pending-fixes (PR has had changes requested) labels Jun 26, 2023

[extractor/abc] Fix yt-dlp#6433 combine re patterns

a8347a7

bashonly approved these changes Jun 26, 2023

bashonly added the pending-review (PR needs a review) label and removed the pending-fixes (PR has had changes requested) label Jun 26, 2023

Update yt_dlp/extractor/abc.py

ee5d9b9

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

pukkandan assigned bashonly Jun 27, 2023

bashonly merged commit 8f05fba into yt-dlp:master Jun 27, 2023
11 checks passed

bashonly removed the pending-review (PR needs a review) label Jun 27, 2023

meliber deleted the fix_#6433 branch June 27, 2023 21:24

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[extractor/abc] Fix extraction (yt-dlp#7434)

30a1864

Closes yt-dlp#6433 Authored by: meliber
