[extractor/antenna] Improvements #7584

stdedos · 2023-07-13T21:36:39Z

Rename base extractor as Antenna, as that's the top-level name
Modernize extractor code, including
- Avoid _og_search_description warning
- Set format to mp4 (or equivalent, instead of 0)
Update tests for bigger images
Support extracting also antenna.gr websites (+ add a test)

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

ADD DESCRIPTION HERE

Fixes #

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 5b2d77b`

Summary

🚚✨♻️

Renamed and refactored the ant1newsgr.py extractor to antenna.py to support both ant1news.gr and antenna.gr websites. Added a new extractor class for antenna.gr watch pages and a base class for common logic. Updated the import statement in _extractors.py.

ant1newsgr.py
Renamed to antenna.py
Snowflakes of refactor

Walkthrough

Rename ant1newsgr.py to antenna.py and update import statement in _extractors.py (link)
Create abstract base class AntennaWatchGrBaseIE with abstract property _API_PATH and simplify format and subtitle extraction logic (link, link, link)
Modify Ant1NewsGrWatchIE to inherit from AntennaWatchGrBaseIE and implement _API_PATH (link)
Add new subclass AntennaGrWatchIE to handle videos from antenna.gr and provide test case (link)
Modify Ant1NewsGrEmbedIE to inherit from AntennaWatchGrBaseIE and implement _API_PATH (link)

pukkandan

All you had to do is

-    _VALID_URL = r'https?://(?P<netloc>(?:www\.)?ant1news\.gr)/watch/(?P<id>\d+)/'
+    _VALID_URL = r'https?://(?P<netloc>(?:www\.)?(?:antenna|ant1news)\.gr)/watch/(?P<id>\d+)/'

then add test and maybe change IE_NAME/DESC. The refactoring is not welcome and the new class seem pointless

stdedos · 2023-07-14T06:05:48Z

The refactoring is not welcome and the new class seem pointless

Well - they look like different websites, and I thought changing the names reported might be confusing for the users.

The new class (and the selected test) suffer from

ERROR: [antennagr:watch] 1643812: [antennagr:watch] unable to extract OpenGraph description; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 710, in extract
    ie_result = self._real_extract(url)
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/antenna.py", line 65, in _real_extract
    info['description'] = self._og_search_description(webpage)
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1392, in _og_search_description
    return self._og_search_property('description', html, fatal=False, **kargs)
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1383, in _og_search_property
    escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1260, in _search_regex
    self.report_warning('unable to extract %s' % _name + bug_reports_message())
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1127, in report_warning
    self._downloader.report_warning(msg, *args, **kwargs)
  File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/test/helper.py", line 315, in _report_warning
    real_warning(w, *args, **kwargs)
  File "test/test_download.py", line 51, in report_warning
    raise ExtractorError(message)

E

which I don't know how to fix either 😕

pukkandan · 2023-07-14T06:09:13Z

Do we want that warning to be shown to user? If so, add expected_warnings to test. If not, add a default=None to self._og_search_description

stdedos · 2023-07-14T07:21:55Z

Test should now be working.

Do you still want me to squash the new class to the old one?
If so, which name do I pick?

stdedos · 2023-07-14T08:28:33Z

Can I somehow make that output better?

[info] 1643812: Downloading 1 format(s): 0

like ... idk? mp4?

That definitely makes sense in the youtube extractor, but not on this extractor 😕

pukkandan · 2023-07-22T02:43:35Z

Can I somehow make that output better?
[info] 1643812: Downloading 1 format(s): 0
like ... idk? mp4?

You can set format_ids, but mp4 won't work unless there's only a single format and it's guaranteed to be mp4

pukkandan · 2023-07-22T02:54:22Z

Pls enable "allow edits by maintainers"
The tests are failing
Apply https://github.com/pukkandan/yt-dlp-dev/tree/pr/7584

stdedos · 2023-07-22T07:07:44Z

The tests are failing

I am not sure which tests are failing 😕

I ran the ones I edited, and the CI didn't give anything anything red

* Rename base extractor as Antenna, as that's the top-level name * Make `Ant1NewsGrBaseIE` to `AntennaWatchGrBaseIE` as an ABC * `_API_PATH` is mandatory, differs per extractor * Untangle `formats, subs` assignment * Create a clone of `Ant1NewsGrWatchIE`, `AntennaGrWatchIE` (with almost everything similar) * Convert `Ant1NewsGrArticleIE` to `InfoExtractor`, as it shared nothing with its parent class Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

pukkandan · 2023-07-22T08:16:58Z

❯ pytest --tb=short -k AntennaGrWatch
========================================================================================== test session starts ==========================================================================================
platform win32 -- Python 3.11.4, pytest-7.4.0, pluggy-1.2.0 -- REDACTED
cachedir: .pytest_cache
rootdir: REDACTED
configfile: setup.cfg
collected 5882 items / 5880 deselected / 2 selected

test/test_download.py::TestDownload::test_AntennaGrWatch FAILED                                                                                                                                    [ 50%]
test/test_download.py::TestDownload::test_AntennaGrWatch_all PASSED                                                                                                                                [100%]

=============================================================================================== FAILURES ================================================================================================
___________________________________________________________________________________ TestDownload.test_AntennaGrWatch ____________________________________________________________________________________
test\test_download.py:215: in test_template
    expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
test\helper.py:233: in expect_info_dict
    expect_dict(self, got_dict, expected_dict)
test\helper.py:186: in expect_dict
    expect_value(self, got, expected, info_field)
test\helper.py:178: in expect_value
    self.assertEqual(
E   AssertionError: 'ΟΙ ΠΡΟΔΟΤΕΣ – ΕΠΕΙΣΟΔΙΟ 01' != 'antennagr-watch video #1643812'
E   - ΟΙ ΠΡΟΔΟΤΕΣ – ΕΠΕΙΣΟΔΙΟ 01
E   + antennagr-watch video #1643812
E    : Invalid value for field title, expected 'ΟΙ ΠΡΟΔΟΤΕΣ – ΕΠΕΙΣΟΔΙΟ 01', got 'antennagr-watch video #1643812'
----------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------
[debug] Loaded 1856 extractors
[antennagr:watch] Extracting URL: https://www.antenna.gr/watch/1643812/oi-prodotes-epeisodio-01
[antennagr:watch] 1643812: Downloading webpage
[antennagr:watch] 1643812: Downloading JSON metadata
[debug] Extractor gave empty title. Creating a generic title
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] 1643812: Downloading 1 format(s): 0
[info] Writing video metadata as JSON to: test_AntennaGrWatch_1643812.info.json
[debug] Invoking http downloader on "https://antennavod.akamaized.net/VODS2/noGR.mp4"
[debug] File locking is not supported. Proceeding without locking[download] Destination: test_AntennaGrWatch_1643812.mp4
[download] 100% of   10.00KiB in 00:00:00 at 53.79KiB/s
======================================================================================== short test summary info ========================================================================================
FAILED test/test_download.py::TestDownload::test_AntennaGrWatch - AssertionError: 'ΟΙ ΠΡΟΔΟΤΕΣ – ΕΠΕΙΣΟΔΙΟ 01' != 'antennagr-watch video #1643812'

Add missed `class` rename Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

stdedos · 2023-07-22T09:31:40Z

I don't have the same failures as you 😕

$ pytest --tb=short -k AntennaGrWatch
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: ~repos/yt-dlp, configfile: setup.cfg
plugins: pudb-0.7.0
collected 6013 items / 6010 deselected / 3 selected                                                                                                                                                       

test/test_download.py::TestDownload::test_AntennaGrWatch FAILED                                                                                                                                     [ 33%]
test/test_download.py::TestDownload::test_AntennaGrWatch_1 PASSED                                                                                                                                   [ 66%]
test/test_download.py::TestDownload::test_AntennaGrWatch_all PASSED                                                                                                                                 [100%]

================================================================================================ FAILURES =================================================================================================
____________________________________________________________________________________ TestDownload.test_AntennaGrWatch _____________________________________________________________________________________
test/test_download.py:212: in test_template
    expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
test/helper.py:233: in expect_info_dict
    expect_dict(self, got_dict, expected_dict)
test/helper.py:186: in expect_dict
    expect_value(self, got, expected, info_field)
test/helper.py:178: in expect_value
    self.assertEqual(
E   AssertionError: 'http[22 chars]e.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg' != 'http[22 chars]e.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
E   - https://ant1media.azureedge.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg
E   ?                                            ^^
E   + https://ant1media.azureedge.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg
E   ?                                            ^^^
E    : Invalid value for field thumbnail, expected 'https://ant1media.azureedge.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg', got 'https://ant1media.azureedge.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------
[debug] Loaded 1857 extractors
[antenna:watch] Extracting URL: https://www.ant1news.gr/watch/1506168/ant1-news-09112021-stis-18-45
[antenna:watch] 1506168: Downloading webpage
[antenna:watch] 1506168: Downloading JSON metadata
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] 1506168: Downloading 1 format(s): mp4
[info] Writing video metadata as JSON to: test_AntennaGrWatch_1506168.info.json
[debug] Invoking http downloader on "https://antennavod.akamaized.net/VODS2/3c2ce15b-0749-40cb-853a-869fb34d6fed.mp4"
[download] Destination: test_AntennaGrWatch_1506168.mp4
[download] 100% of   10.00KiB in 00:00:00 at 35.10KiB/s    
========================================================================================= short test summary info =========================================================================================
FAILED test/test_download.py::TestDownload::test_AntennaGrWatch - AssertionError: 'http[22 chars]e.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg' != 'http[22 chars]e.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
============================================================================== 1 failed, 2 passed, 6010 deselected in 3.69s ===============================================================================

Both URLs are valid; they just have difference in size (funnily, /1000/ has an 999px * img) - and the templates/data/player was changed to offer 1000 "since then".
It seems that https://ant1media.azureedge.net/imgHandler/%d/26d46bf6-8158-4f02-b197-7096c714b2de.jpg is an argument`

Not strictly related to this PR, but since we are updating the extractor, might as well clean it up

pukkandan · 2023-07-22T11:40:28Z

I don't have the same failures as you 😕

Must be due to geolocation. If tests are passing for you, we can merge.

Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

stdedos · 2023-07-22T13:04:30Z

Somehow I had to update the md5 hash. Took the opportunity to add one for the new test too.

stdedos · 2023-08-22T07:45:07Z

@pukkandan Are we now ready? 😕

bashonly · 2023-08-22T12:23:16Z

shouldn't we revert the AntennaGrWatchIE class name back to Ant1NewsGrWatchIE ?

stdedos · 2023-08-22T12:38:18Z

I don't see why. "Antenna" is the company, and Ant1 "sounds the same as Antenna", and "ant1news.gr" belongs to "antenna.gr".
Original Author probably focused on the specific place he tested - but the logic and code are exactly the same for both.

"I'd consider that" if we go with the original approach of AntennaGrBaseIE.

bashonly · 2023-08-22T12:52:46Z

I don't see why.

It's not just a cosmetic change, it would be to maintain compatibility of --download-archive

Or else you should probably add _old_archive_ids to its returned info dict

Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

stdedos · 2023-08-22T13:14:45Z

Is this now complete + correct?

yt_dlp/extractor/antenna.py

stdedos · 2023-08-22T13:26:00Z

"Are we there yet"? 🙃

bashonly

Needs a final review by maintainer-in-chief before it can be merged; it will happen when time permits. Thanks for your patience

pukkandan

Would have been easier imo to not touch the IE class names. It's only IE_NAME that users see. But since it's already done, I won't object

Authored by: stdedos

pukkandan requested changes Jul 14, 2023

View reviewed changes

pukkandan added site-enhancement Feature request for some website pending-fixes PR has had changes requested labels Jul 14, 2023

stdedos force-pushed the impr/extractor/antenna/2023-07-14 branch from 5b2d77b to 3d23a30 Compare July 14, 2023 07:21

stdedos force-pushed the impr/extractor/antenna/2023-07-14 branch from 846ec51 to 5c6e8a7 Compare July 22, 2023 07:13

Revert and rewrite

7ecd092

stdedos force-pushed the impr/extractor/antenna/2023-07-14 branch from 5c6e8a7 to e52fb83 Compare July 22, 2023 07:15

Make the format output prettier: API returns a URL filename

46ea0b8

Add missed `class` rename Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

stdedos force-pushed the impr/extractor/antenna/2023-07-14 branch from e52fb83 to 46ea0b8 Compare July 22, 2023 09:31

Cleanup extractor

185da65

Not strictly related to this PR, but since we are updating the extractor, might as well clean it up

pukkandan removed the pending-fixes PR has had changes requested label Jul 22, 2023

Fix/Add md5 for videos

dd675d1

Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

Add _old_archive_ids to preserve compatibility with the new name

a8df283

Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>

bashonly requested changes Aug 22, 2023

View reviewed changes

yt_dlp/extractor/antenna.py Outdated Show resolved Hide resolved

... only where needed

a4fb755

bashonly approved these changes Aug 22, 2023

View reviewed changes

bashonly added the pending-review PR needs a review label Aug 22, 2023

pukkandan approved these changes Aug 26, 2023

View reviewed changes

pukkandan assigned bashonly Aug 26, 2023

bashonly removed the pending-review PR needs a review label Aug 27, 2023

bashonly merged commit 6658760 into yt-dlp:master Aug 29, 2023
13 checks passed

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/antenna] Support antenna.gr (yt-dlp#7584)

6cfc9e1

Authored by: stdedos

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[extractor/antenna] Improvements #7584

[extractor/antenna] Improvements #7584

stdedos commented Jul 13, 2023 •

edited

pukkandan left a comment

stdedos commented Jul 14, 2023

pukkandan commented Jul 14, 2023

stdedos commented Jul 14, 2023

stdedos commented Jul 14, 2023

pukkandan commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

stdedos commented Aug 22, 2023

bashonly commented Aug 22, 2023

stdedos commented Aug 22, 2023

bashonly commented Aug 22, 2023

stdedos commented Aug 22, 2023

stdedos commented Aug 22, 2023

bashonly left a comment

pukkandan left a comment •

edited

[extractor/antenna] Improvements #7584

[extractor/antenna] Improvements #7584

Conversation

stdedos commented Jul 13, 2023 • edited

Description of your pull request and other information

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

What is the purpose of your pull request?

🤖 Generated by Copilot at 5b2d77b

Summary

Walkthrough

pukkandan left a comment

Choose a reason for hiding this comment

stdedos commented Jul 14, 2023

pukkandan commented Jul 14, 2023

stdedos commented Jul 14, 2023

stdedos commented Jul 14, 2023

pukkandan commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

pukkandan commented Jul 22, 2023

stdedos commented Jul 22, 2023

stdedos commented Aug 22, 2023

bashonly commented Aug 22, 2023

stdedos commented Aug 22, 2023

bashonly commented Aug 22, 2023

stdedos commented Aug 22, 2023

stdedos commented Aug 22, 2023

bashonly left a comment

Choose a reason for hiding this comment

pukkandan left a comment • edited

Choose a reason for hiding this comment

stdedos commented Jul 13, 2023 •

edited

`🤖 Generated by Copilot at 5b2d77b`

pukkandan left a comment •

edited