New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extractor/antenna] Improvements #7584
[extractor/antenna] Improvements #7584
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All you had to do is
- _VALID_URL = r'https?://(?P<netloc>(?:www\.)?ant1news\.gr)/watch/(?P<id>\d+)/'
+ _VALID_URL = r'https?://(?P<netloc>(?:www\.)?(?:antenna|ant1news)\.gr)/watch/(?P<id>\d+)/'
then add test and maybe change IE_NAME/DESC. The refactoring is not welcome and the new class seem pointless
Well - they look like different websites, and I thought changing the names reported might be confusing for the users. The new class (and the selected test) suffer from ERROR: [antennagr:watch] 1643812: [antennagr:watch] unable to extract OpenGraph description; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 710, in extract
ie_result = self._real_extract(url)
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/antenna.py", line 65, in _real_extract
info['description'] = self._og_search_description(webpage)
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1392, in _og_search_description
return self._og_search_property('description', html, fatal=False, **kargs)
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1383, in _og_search_property
escaped = self._search_regex(og_regexes, html, name, flags=re.DOTALL, **kargs)
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1260, in _search_regex
self.report_warning('unable to extract %s' % _name + bug_reports_message())
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/yt_dlp/extractor/common.py", line 1127, in report_warning
self._downloader.report_warning(msg, *args, **kwargs)
File "/home/stdedos/Documents/WorkBulk/Ubuntu/repos/yt-dlp/test/helper.py", line 315, in _report_warning
real_warning(w, *args, **kwargs)
File "test/test_download.py", line 51, in report_warning
raise ExtractorError(message)
E which I don't know how to fix either 😕 |
Do we want that warning to be shown to user? If so, add |
5b2d77b
to
3d23a30
Compare
Test should now be working. Do you still want me to squash the new class to the old one? |
Can I somehow make that output better?
like ... idk? That definitely makes sense in the youtube extractor, but not on this extractor 😕 |
You can set |
|
I am not sure which tests are failing 😕 I ran the ones I edited, and the CI didn't give anything anything red |
* Rename base extractor as Antenna, as that's the top-level name * Make `Ant1NewsGrBaseIE` to `AntennaWatchGrBaseIE` as an ABC * `_API_PATH` is mandatory, differs per extractor * Untangle `formats, subs` assignment * Create a clone of `Ant1NewsGrWatchIE`, `AntennaGrWatchIE` (with almost everything similar) * Convert `Ant1NewsGrArticleIE` to `InfoExtractor`, as it shared nothing with its parent class Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>
846ec51
to
5c6e8a7
Compare
5c6e8a7
to
e52fb83
Compare
|
Add missed `class` rename Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>
e52fb83
to
46ea0b8
Compare
I don't have the same failures as you 😕 $ pytest --tb=short -k AntennaGrWatch
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: ~repos/yt-dlp, configfile: setup.cfg
plugins: pudb-0.7.0
collected 6013 items / 6010 deselected / 3 selected
test/test_download.py::TestDownload::test_AntennaGrWatch FAILED [ 33%]
test/test_download.py::TestDownload::test_AntennaGrWatch_1 PASSED [ 66%]
test/test_download.py::TestDownload::test_AntennaGrWatch_all PASSED [100%]
================================================================================================ FAILURES =================================================================================================
____________________________________________________________________________________ TestDownload.test_AntennaGrWatch _____________________________________________________________________________________
test/test_download.py:212: in test_template
expect_info_dict(self, tc_res_dict, tc.get('info_dict', {}))
test/helper.py:233: in expect_info_dict
expect_dict(self, got_dict, expected_dict)
test/helper.py:186: in expect_dict
expect_value(self, got, expected, info_field)
test/helper.py:178: in expect_value
self.assertEqual(
E AssertionError: 'http[22 chars]e.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg' != 'http[22 chars]e.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
E - https://ant1media.azureedge.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg
E ? ^^
E + https://ant1media.azureedge.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg
E ? ^^^
E : Invalid value for field thumbnail, expected 'https://ant1media.azureedge.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg', got 'https://ant1media.azureedge.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------
[debug] Loaded 1857 extractors
[antenna:watch] Extracting URL: https://www.ant1news.gr/watch/1506168/ant1-news-09112021-stis-18-45
[antenna:watch] 1506168: Downloading webpage
[antenna:watch] 1506168: Downloading JSON metadata
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] 1506168: Downloading 1 format(s): mp4
[info] Writing video metadata as JSON to: test_AntennaGrWatch_1506168.info.json
[debug] Invoking http downloader on "https://antennavod.akamaized.net/VODS2/3c2ce15b-0749-40cb-853a-869fb34d6fed.mp4"
[download] Destination: test_AntennaGrWatch_1506168.mp4
[download] 100% of 10.00KiB in 00:00:00 at 35.10KiB/s
========================================================================================= short test summary info =========================================================================================
FAILED test/test_download.py::TestDownload::test_AntennaGrWatch - AssertionError: 'http[22 chars]e.net/imgHandler/640/26d46bf6-8158-4f02-b197-7096c714b2de.jpg' != 'http[22 chars]e.net/imgHandler/1000/26d46bf6-8158-4f02-b197-7096c714b2de.jpg'
============================================================================== 1 failed, 2 passed, 6010 deselected in 3.69s =============================================================================== Both URLs are valid; they just have difference in size (funnily, |
Not strictly related to this PR, but since we are updating the extractor, might as well clean it up
Must be due to geolocation. If tests are passing for you, we can merge. |
Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>
Somehow I had to update the |
@pukkandan Are we now ready? 😕 |
shouldn't we revert the |
I don't see why. "Antenna" is the company, and "I'd consider that" if we go with the original approach of |
It's not just a cosmetic change, it would be to maintain compatibility of Or else you should probably add |
Signed-off-by: Stavros Ntentos <133706+stdedos@users.noreply.github.com>
Is this now complete + correct? |
"Are we there yet"? 🙃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a final review by maintainer-in-chief before it can be merged; it will happen when time permits. Thanks for your patience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would have been easier imo to not touch the IE class names. It's only IE_NAME
that users see. But since it's already done, I won't object
Authored by: stdedos
_og_search_description
warningformat
tomp4
(or equivalent, instead of 0)IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
ADD DESCRIPTION HERE
Fixes #
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?
Copilot Summary
🤖 Generated by Copilot at 5b2d77b
Summary
🚚✨♻️
Renamed and refactored the
ant1newsgr.py
extractor toantenna.py
to support both ant1news.gr and antenna.gr websites. Added a new extractor class for antenna.gr watch pages and a base class for common logic. Updated the import statement in_extractors.py
.Walkthrough
ant1newsgr.py
toantenna.py
and update import statement in_extractors.py
(link)AntennaWatchGrBaseIE
with abstract property_API_PATH
and simplify format and subtitle extraction logic (link, link, link)Ant1NewsGrWatchIE
to inherit fromAntennaWatchGrBaseIE
and implement_API_PATH
(link)AntennaGrWatchIE
to handle videos from antenna.gr and provide test case (link)Ant1NewsGrEmbedIE
to inherit fromAntennaWatchGrBaseIE
and implement_API_PATH
(link)