[duoplay] Add Duoplay extractor #8542
$ pip install -r requirements.txt
$ python test/test_download.py TestDownload.test_Duoplay
To customize the filename, use an output format instead: -o "%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s"
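The zero-padding in that template can be checked in plain Python, because the `%(name)02d` fields follow printf-style formatting. This is only a minimal sketch: the metadata values below are made up for illustration, and yt-dlp's own template handling supports more than plain `%` formatting.

```python
# Hypothetical metadata values, for illustration only.
info = {
    'series': 'Example Show',
    'season_number': 2,
    'episode_number': 2,
    'title': 'Example Episode',
    'ext': 'mp4',
}

# Same field layout as the -o template quoted above.
template = '%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s'

# 02d pads the season and episode numbers to two digits.
filename = template % info
print(filename)
```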
Added more metadata, so you can use this output format to save the file with show and episode info:
Can copilot be refreshed to use HEAD commit?
Don't worry about it. Copilot is pretty useless and will be removed soon.
So, this is ready from my side (changed PR to non-draft). Can this be reviewed now?
telecast_id, episode = self._match_valid_url(url).groups('id', 'ep')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: groups() takes at most 1 argument (2 given)
@seproDev thanks for the feedback! Addressed the issues. CI should pass as well.
- https://docs.python.org/3/library/re.html#re.Match.groups > Match.groups(default=None)
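The difference between the two `re.Match` methods can be sketched as follows. The pattern here is a simplified, hypothetical stand-in for the extractor's `_VALID_URL`, used only to get a match object with `id` and `ep` groups.

```python
import re

# Hypothetical pattern for illustration; the real _VALID_URL may differ.
pattern = re.compile(r'https?://duoplay\.ee/(?P<id>\d+)/[\w-]+(?:\?ep=(?P<ep>\d+))?')
m = pattern.match('https://duoplay.ee/17/uhikarotid?ep=14')

# groups() returns all groups as a tuple; its only parameter is `default`.
telecast_id, episode = m.groups()

# group() accepts several names (or indices) and returns them as a tuple.
by_name = m.group('id', 'ep')

# Passing two names to groups() raises the TypeError quoted above.
try:
    m.groups('id', 'ep')
    raised = False
except TypeError:
    raised = True
```

Either `m.groups()` or `m.group('id', 'ep')` works here; the former depends on group order, the latter on group names.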
yt_dlp/extractor/duoplay.py
            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
            **traverse_obj(episode_attr, {
                'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
This now brings up the problem of the title. This is the best I came up with, tell me if you think this makes sense:
For movies, always use the title. For shows/series, first try the subtitle, then try to name it after the episode number and if all else fails also use the title.
-            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
-            **traverse_obj(episode_attr, {
-                'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
+            'title': episode_attr.get('title') if episode_attr.get('category') == 'movies' else (
+                traverse_obj(episode_attr, 'subtitle', ('episode_nr', {lambda x: f'Episode {x}' if x else None}), 'title')),
+            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            **traverse_obj(episode_attr, {
This means the tests will need to be updated. But I think it would be unexpected to have, for example, https://duoplay.ee/17/uhikarotid?ep=14 be named Episode 14 even though it is the 2nd episode of the 2nd season.
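The fallback order described in this comment can be sketched without any yt-dlp helpers. This is an illustration only: `pick_title` is a hypothetical helper, the field names (`category`, `subtitle`, `episode_nr`, `title`) are taken from the suggestion above, and the sample dictionaries are made up.

```python
def pick_title(episode_attr):
    """Pick a title: movies use 'title'; series try subtitle, then episode number, then title."""
    if episode_attr.get('category') == 'movies':
        return episode_attr.get('title')
    subtitle = episode_attr.get('subtitle')
    if subtitle:
        return subtitle
    episode_nr = episode_attr.get('episode_nr')
    if episode_nr:
        return f'Episode {episode_nr}'
    # Last resort for series: fall back to the show title.
    return episode_attr.get('title')


# Hypothetical sample metadata.
movie = {'category': 'movies', 'title': 'Some Movie', 'subtitle': 'ignored'}
named_episode = {'category': 'shows', 'title': 'Some Show', 'subtitle': 'Pilot', 'episode_nr': 1}
numbered_episode = {'category': 'shows', 'title': 'Some Show', 'episode_nr': 2}
bare_episode = {'category': 'shows', 'title': 'Some Show'}
```

The `if episode_nr` check mirrors the `if x else None` guard in the suggestion, so an episode number of 0 or a missing number falls through to the show title.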
I thought of handling movies in a separate PR with the same logic, and my version was more verbose.
Sadly, for uhikarotid you have to accept the absolute numbering scheme or rename items manually. Maybe if a show has only absolute numbers, then omit the season 01 from its name?
diff --git a/yt_dlp/extractor/duoplay.py b/yt_dlp/extractor/duoplay.py
index f12c91012..d2dbc27c4 100644
--- a/yt_dlp/extractor/duoplay.py
+++ b/yt_dlp/extractor/duoplay.py
@@ -53,8 +53,15 @@ class DuoplayIE(InfoExtractor):
         },
     }]
+    @staticmethod
+    def is_movie(attr):
+        # return attr.get('episode_id', None) == 0
+        return attr.get('category', None) == 'movies'
+
     def _real_extract(self, url):
         telecast_id, episode = self._match_valid_url(url).groups()
         video_id = join_nonempty(telecast_id, episode, delim='_')
         webpage = self._download_webpage(url, video_id)
         video_player = try_call(lambda: extract_attributes(
@@ -63,14 +70,28 @@ def _real_extract(self, url):
             self.raise_no_formats('No video found', expected=True)
         episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
-        return {
+        data = {
             'id': video_id,
             'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            'description': 'synopsis',
+            'thumbnail': traverse_obj(episode_attr, ('images', 'original')),
+        }
+
+        if self.is_movie(episode_attr):
+            return {
+                **data,
+                'title': f"{traverse_obj(episode_attr, 'title')} ({traverse_obj(episode_attr, 'year')})",
+            }
+
+        return {
+            **data,
             **traverse_obj(episode_attr, {
                 'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
-                'description': 'synopsis',
-                'thumbnail': ('images', 'original'),
                 'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
                 'series': 'title',
                 'series_id': ('telecast_id', {str_or_none}),
> Maybe if a show has only absolute numbers, then omit the season 01 from its name?
That could work.
If no season number present ⇒ absolute value. If season number present ⇒ relative value.
I'd be curious to hear what one of the maintainers thinks.
As for your diff, I see that you are excluding season and episode data on movies, which makes sense. I'll make a better suggestion later working that in.
I added test data for a movie. And yes, the series data doesn't make sense for a movie. Also, I'd like to put the year in the movie title. Can you help with that too?
Also, the movie will expire in 4 days. I'm not sure whether they remove the URL and data once it expires.
"_visible_until": {
    "diff": 4,
    "unit": "day"
}
lol. The test movie expires in 3 hours: "Vaadatav 3 tundi" ("Viewable for 3 hours"). Should that test data be set to skip?
so, found one "movie" (in translation it says "special episode") without an expiry date:
and that's reflected in the metadata too: "category": "shows"
Added test data for a movie that doesn't expire: 29ac66b
What should be done with the item that will expire? (Of course, we have to wait a few hours to see how it appears on the site.)
So, the movie has expired and returns 404. Should the test be dropped or use some skip attribute?
$ curl -i https://duoplay.ee/4325/naljamangud
HTTP/2 404
content-language: et-ee
x-frame-options: SAMEORIGIN
content-type: text/html; charset=utf-8
date: Fri, 17 Nov 2023 08:57:30 GMT
via: 1.1 varnish (Varnish/6.0)
vary: Accept-Encoding, User-Agent
cache-control: no-cache
age: 0
content-length: 33749
[...]
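For reference, the skip option asked about here could look roughly like the entry below. yt-dlp test entries accept a 'skip' key holding a reason string. The reason text is an assumption, the variable name is hypothetical, and the other fields a real `_TESTS` entry would carry (such as `md5` and `info_dict`) are omitted because their values are not given in this thread.

```python
# Sketch of a test entry for the expired movie, marked as skipped.
# The 'skip' reason text is an assumption; md5/info_dict are omitted on purpose.
EXPIRED_MOVIE_TEST = {
    'url': 'https://duoplay.ee/4325/naljamangud',
    'skip': 'No video found',
}
```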
I will add it in the next cleanup PR/commit. Thanks for the reminder
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Okay final set of suggestions from me.
Having thought about it a bit more, I don't think there is a truly great way to handle the metadata here. I think it's fine as is. Maybe one of the core maintainers has a different opinion.
Authored by: glensc
Description of your pull request and other information
Adds extractor for duoplay.ee site
Example:
Copilot Summary
🤖 Generated by Copilot at 61a6bcc
Summary
🚚📄🛠️
Added a new extractor class DuoplayIE to handle videos from the duoplay.ee website. Created a new module duoplay.py to define the class and its extraction logic.
Walkthrough
- [...] duoplay.ee videos
- [...] DuoplayIE class in duoplay.py
- [...] _real_extract method to parse video webpage and extract info and formats
- [...] _VALID_URL pattern to match duoplay.ee URLs and _TESTS cases to verify extraction
- [...] sts.postimees.ee API
- [...] DuoplayIE class in _extractors.py and register it with gen_extractor_classes