[duoplay] Add Duoplay extractor #8542

Merged
merged 41 commits into from
Nov 16, 2023

Conversation

glensc
Contributor

@glensc glensc commented Nov 7, 2023

Description of your pull request and other information

Adds extractor for duoplay.ee site

Example:

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 61a6bcc

Summary

🚚📄🛠️

Added a new extractor class DuoplayIE to handle videos from the duoplay.ee website. Created a new module duoplay.py to define the class and its extraction logic.

DuoplayIE class
extracts video formats
from winter website

Walkthrough

  • Create and register a new extractor class for duoplay.ee videos (link, link)
    • Import helper modules and define DuoplayIE class in duoplay.py (link)
    • Implement _real_extract method to parse video webpage and extract info and formats (link)
    • Define _VALID_URL pattern to match duoplay.ee URLs and _TESTS cases to verify extraction (link)
    • Use custom logic to obtain and update session token from sts.postimees.ee API (link)
    • Leave some debugging code and unfinished parts for further development (link)
    • Import DuoplayIE class in _extractors.py and register it with gen_extractor_classes (link)

@bashonly bashonly self-requested a review November 7, 2023 15:45
yt_dlp/extractor/duoplay.py (outdated, resolved)
@glensc

This comment was marked as outdated.

$ pip install -r requirements.txt
$ python test/test_download.py TestDownload.test_Duoplay
@glensc glensc changed the title Add Duoplay extractor [duoplay] Add Duoplay extractor Nov 7, 2023
@glensc glensc marked this pull request as ready for review November 7, 2023 18:50
yt_dlp/extractor/duoplay.py (outdated, resolved)
To customize filename, use output format instead:
-o "%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s"
@glensc
Contributor Author

glensc commented Nov 7, 2023

Added more metadata, so you can use this output template to save the file with show and episode info:

 -o "%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s"
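
yt-dlp output templates build on Python's `%` string formatting, so the effect of the template above can be previewed with a plain dictionary. This is only a sketch: the metadata values below are made up for illustration, and yt-dlp's real template engine handles more cases (missing fields, for instance) than the bare `%` operator does.

```python
# Output template copied from the comment above.
TEMPLATE = '%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s'

# Hypothetical metadata, shaped like the fields the extractor reports.
info = {
    'series': 'Example Show',
    'season_number': 2,
    'episode_number': 2,
    'title': 'Episode 2',
    'ext': 'mp4',
}


def render(template, fields):
    # %(name)02d zero-pads an integer field to two digits.
    return template % fields


print(render(TEMPLATE, info))  # Example Show S02E02 - Episode 2.mp4
```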

@glensc
Contributor Author

glensc commented Nov 8, 2023

🤖 Generated by Copilot at 61a6bcc

Can copilot be refreshed to use HEAD commit?

@seproDev
Member

seproDev commented Nov 8, 2023

Can copilot be refreshed to use HEAD commit?

Don't worry about it. Copilot is pretty useless and will be removed soon.

@glensc
Contributor Author

glensc commented Nov 8, 2023

So, this is ready from my side (I changed the PR to non-draft). Can it be reviewed now?

    telecast_id, episode = self._match_valid_url(url).groups('id', 'ep')
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: groups() takes at most 1 argument (2 given)
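
The traceback comes from mixing up two `re.Match` methods: `groups()` takes only an optional `default` and returns every group, while `group()` accepts several group names at once. A minimal sketch with the standard `re` module follows; the pattern is a simplified, hypothetical stand-in for the extractor's `_VALID_URL`, not the real one.

```python
import re

# Hypothetical, simplified pattern; the extractor's actual _VALID_URL may differ.
VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)/[\w-]+(?:\?ep=(?P<ep>\d+))?'


def split_url(url):
    match = re.match(VALID_URL, url)
    # group() accepts several names and returns a tuple in that order.
    return match.group('id', 'ep')


def split_url_positional(url):
    # groups() returns all groups in pattern order; its only parameter is `default`.
    return re.match(VALID_URL, url).groups()
```

With this pattern, `split_url('https://duoplay.ee/17/uhikarotid?ep=14')` gives `('17', '14')`, and a URL without `?ep=` yields `None` for the second element.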
@glensc
Contributor Author

glensc commented Nov 10, 2023

@seproDev thanks for the feedback! addressed the issues. CI should pass as well.

yt_dlp/extractor/duoplay.py (outdated, resolved)
yt_dlp/extractor/duoplay.py (outdated, resolved)
Comment on lines 69 to 71
'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
**traverse_obj(episode_attr, {
'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
Member

This now brings up the problem of the title. This is the best I came up with; tell me if you think it makes sense:
For movies, always use the title. For shows/series, first try the subtitle, then try to name it after the episode number, and if all else fails, fall back to the title.

Suggested change
- 'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
- **traverse_obj(episode_attr, {
- 'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
+ 'title': episode_attr.get('title') if episode_attr.get('category') == 'movies' else (
+     traverse_obj(episode_attr, 'subtitle', ('episode_nr', {lambda x: f'Episode {x}' if x else None}), 'title')),
+ 'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+ **traverse_obj(episode_attr, {

This means the tests will need to be updated. But I think it would be unexpected to have for example https://duoplay.ee/17/uhikarotid?ep=14 be named Episode 14 even though it is the 2nd episode of the 2nd season.
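
The fallback order described above can be sketched in plain Python, without yt-dlp's helpers. The function name `pick_title` is hypothetical, the field names (`category`, `subtitle`, `episode_nr`, `title`) are taken from the suggestion, and the sample values in the usage note are invented.

```python
def pick_title(attr):
    """Choose a display title from an episode-attribute dict."""
    title = attr.get('title')
    # Movies always use their own title.
    if attr.get('category') == 'movies':
        return title
    # Shows: prefer the subtitle. Empty values are treated as missing
    # (an assumption of this sketch).
    if attr.get('subtitle'):
        return attr['subtitle']
    # Next, name the episode after its number.
    if attr.get('episode_nr'):
        return f"Episode {attr['episode_nr']}"
    # If all else fails, fall back to the title.
    return title
```

For instance, `pick_title({'category': 'shows', 'title': 'Some Show', 'episode_nr': 2})` returns `'Episode 2'`, while the same dict with `'category': 'movies'` returns `'Some Show'`.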

Contributor Author

I had planned to handle movies in a separate PR with the same logic, and my version was more verbose.

Sadly, for uhikarotid you have to accept the absolute numbering scheme or rename items manually. Maybe if a show has only absolute numbers, then omit the season 01 from its name?

diff --git a/yt_dlp/extractor/duoplay.py b/yt_dlp/extractor/duoplay.py
index f12c91012..d2dbc27c4 100644
--- a/yt_dlp/extractor/duoplay.py
+++ b/yt_dlp/extractor/duoplay.py
@@ -53,8 +53,15 @@ class DuoplayIE(InfoExtractor):
         },
     }]
 
+    @staticmethod
+    def is_movie(attr):
+        # return attr.get('episode_id', None) == 0
+        return attr.get('category', None) == 'movies'
+
     def _real_extract(self, url):
         telecast_id, episode = self._match_valid_url(url).groups()
         video_id = join_nonempty(telecast_id, episode, delim='_')
         webpage = self._download_webpage(url, video_id)
         video_player = try_call(lambda: extract_attributes(
@@ -63,14 +70,28 @@ def _real_extract(self, url):
             self.raise_no_formats('No video found', expected=True)
 
         episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
 
-        return {
+        data = {
             'id': video_id,
             'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            'description': traverse_obj(episode_attr, 'synopsis'),
+            'thumbnail': traverse_obj(episode_attr, ('images', 'original')),
+        }
+
+        if self.is_movie(episode_attr):
+            return {
+                **data,
+                'title': f"{traverse_obj(episode_attr, 'title')} ({traverse_obj(episode_attr, 'year')})",
+            }
+
+        return {
+            **data,
             **traverse_obj(episode_attr, {
                 'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
-                'description': 'synopsis',
-                'thumbnail': ('images', 'original'),
                 'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
                 'series': 'title',
                 'series_id': ('telecast_id', {str_or_none}),
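
One detail worth watching when keys are lifted out of a `traverse_obj` mapping: inside the mapping a bare string such as `'synopsis'` is a path to look up, whereas in an ordinary dict literal it is just a string. The sketch below imitates that with a tiny hypothetical `pick` helper instead of yt-dlp's own function; the sample data is invented.

```python
def pick(obj, mapping):
    """Hypothetical mini-helper: resolve each value in `mapping` as a key of `obj`."""
    return {key: obj[path] for key, path in mapping.items() if obj.get(path) is not None}


# Invented sample data standing in for the parsed episode attributes.
episode_attr = {'synopsis': 'Two students share a dorm.', 'year': 2020}

# Inside the helper, 'synopsis' is looked up in episode_attr.
resolved = pick(episode_attr, {'description': 'synopsis'})

# In a plain dict literal, 'synopsis' stays a literal string.
literal = {'description': 'synopsis'}

# Looking the value up explicitly restores the intended result.
explicit = {'description': episode_attr.get('synopsis')}
```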

Member

maybe if show has only absolute numbers, then omit the season 01 from its name?

That could work.
If no season number present ⇒ absolute value. If season number present ⇒ relative value.
I'd be curious to hear what one of the maintainers thinks.

As for your diff, I see that you are excluding season and episode data on movies, which makes sense. I'll make a better suggestion later working that in.
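
The numbering rule floated above (no season number means the episode number is absolute, a season number means it is relative to that season) could look like this when building a label. This is only a sketch of the idea under discussion, not what the extractor does; `episode_label` is a hypothetical helper and the numbers in the usage note are illustrative.

```python
def episode_label(season_number, episode_number):
    """Build 'S02E02' when a season is known, otherwise an absolute 'E14'."""
    if season_number is None:
        # No season: treat the episode number as absolute.
        return f'E{episode_number:02d}'
    # Season present: the episode number is relative to that season.
    return f'S{season_number:02d}E{episode_number:02d}'
```

Here `episode_label(2, 2)` gives `'S02E02'` and `episode_label(None, 14)` gives `'E14'`.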

Contributor Author

@glensc glensc Nov 13, 2023

I added test data for a movie. And yes, the series data doesn't make sense for a movie. I'd also like to put the year in the movie title. Can you help with that too?

Contributor Author

Also, the movie will expire in 4 days; I'm not sure whether they remove the URL and data once it expires.

  "_visible_until": {
    "diff": 4,
    "unit": "day"
  }

Contributor Author

lol. The test movie expires in 3 hours: "Vaadatav 3 tundi" ("Viewable for 3 hours"). Should that test data be set to skip?

Contributor Author

so, found one "movie" (in translation it says "special episode") without an expiry date:

and that's reflected in the metadata too: "category": "shows"

Contributor Author

@glensc glensc Nov 16, 2023

Added test data for a movie that doesn't expire: 29ac66b

What should be done with the item that will expire? (Of course, we have to wait a few hours to see how it appears on the site.)

Contributor Author

So, the movie has expired and now returns 404. Should the test be dropped, or marked with some skip attribute?

✖ curl -i https://duoplay.ee/4325/naljamangud
HTTP/2 404 
content-language: et-ee
x-frame-options: SAMEORIGIN
content-type: text/html; charset=utf-8
date: Fri, 17 Nov 2023 08:57:30 GMT
via: 1.1 varnish (Varnish/6.0)
vary: Accept-Encoding, User-Agent
cache-control: no-cache
age: 0
content-length: 33749

[...]
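
For reference, yt-dlp test definitions can carry a `skip` key holding a reason string, which is one way to keep an entry for content that has gone away. The sketch below shows the shape of such an entry and a hypothetical `runnable` filter; the skip reason is invented, and the real test runner's handling may differ in detail.

```python
# Entries shaped like an extractor's _TESTS list, trimmed to the relevant keys.
TESTS = [{
    'url': 'https://duoplay.ee/4325/naljamangud',
    'skip': 'Content has expired',  # hypothetical reason string
}, {
    'url': 'https://duoplay.ee/17/uhikarotid?ep=14',
}]


def runnable(tests):
    """Hypothetical filter: keep only the entries that are not marked as skipped."""
    return [test for test in tests if not test.get('skip')]


print([test['url'] for test in runnable(TESTS)])
```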

Member

I will add it in the next cleanup PR/commit. Thanks for the reminder

glensc and others added 6 commits November 13, 2023 11:37
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Member

@seproDev seproDev left a comment

Okay final set of suggestions from me.
Having thought about it a bit more, I don't think there is a truly great way to handle the metadata here. I think it's fine as is. Maybe one of the core maintainers has a different opinion.

yt_dlp/extractor/duoplay.py (outdated, resolved)
yt_dlp/extractor/duoplay.py (outdated, resolved)
@seproDev seproDev added pending-review PR needs a review and removed pending-fixes PR has had changes requested labels Nov 14, 2023
@bashonly bashonly removed the pending-review PR needs a review label Nov 15, 2023
@bashonly bashonly assigned bashonly and unassigned bashonly Nov 15, 2023
glensc and others added 4 commits November 16, 2023 15:59
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
yt_dlp/extractor/duoplay.py (outdated, resolved)
yt_dlp/extractor/duoplay.py (outdated, resolved)
@bashonly bashonly merged commit 66a0127 into yt-dlp:master Nov 16, 2023
15 checks passed
@glensc
Contributor Author

glensc commented Nov 17, 2023

yay. thanks @seproDev, @bashonly!

@glensc glensc deleted the ie-duoplay branch November 17, 2023 08:42
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Labels
site-request Request to support a new website
3 participants