[duoplay] Add Duoplay extractor #8542

glensc · 2023-11-07T15:38:18Z

Description of your pull request and other information

Adds extractor for duoplay.ee site

Example:

https://duoplay.ee/4312/siberi-vomm?ep=24

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 61a6bcc`

Summary

🚚📄🛠️

Added a new extractor class DuoplayIE to handle videos from the duoplay.ee website. Created a new module duoplay.py to define the class and its extraction logic.

DuoplayIE class
extracts video formats
from winter website

Walkthrough

Create and register a new extractor class for duoplay.ee videos
- Import helper modules and define DuoplayIE class in duoplay.py
- Implement _real_extract method to parse video webpage and extract info and formats
- Define _VALID_URL pattern to match duoplay.ee URLs and _TESTS cases to verify extraction
- Use custom logic to obtain and update session token from sts.postimees.ee API
- Leave some debugging code and unfinished parts for further development
- Import DuoplayIE class in _extractors.py and register it with gen_extractor_classes

yt_dlp/extractor/duoplay.py

$ pip install -r requirements.txt
$ python test/test_download.py TestDownload.test_Duoplay

yt_dlp/extractor/duoplay.py

To customize the filename, use an output template instead: -o "%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s"

glensc · 2023-11-07T19:19:09Z

Added more metadata, so you can use this output template to save the file with show and episode info:

 -o "%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s"
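yt-dlp output templates use Python %-style named fields, so the template above can be previewed with the plain `%` operator. A minimal sketch, assuming hypothetical metadata values (the series name, season and title below are made up for illustration, not taken from the extractor's output):

```python
# Output template from the comment above.
TEMPLATE = '%(series)s S%(season_number)02dE%(episode_number)02d - %(title)s.%(ext)s'

# Hypothetical metadata; real values come from the extractor's info dict.
info = {
    'series': 'Siberi võmm',
    'season_number': 1,
    'episode_number': 24,
    'title': 'Episode 24',
    'ext': 'mp4',
}

# Plain %-formatting with a mapping: %02d zero-pads the numbers to two digits.
filename = TEMPLATE % info
print(filename)  # Siberi võmm S01E24 - Episode 24.mp4
```

With plain `%`, a missing key such as season_number raises KeyError; yt-dlp's own template engine instead substitutes its placeholder for missing fields (NA by default).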

glensc · 2023-11-08T09:59:26Z

🤖 Generated by Copilot at 61a6bcc

Can copilot be refreshed to use HEAD commit?

seproDev · 2023-11-08T10:01:44Z

Can copilot be refreshed to use HEAD commit?

Don't worry about it. Copilot is pretty useless and will be removed soon.

glensc · 2023-11-08T22:08:25Z

So, this is ready from my side (changed the PR to non-draft). Can this be reviewed now?

telecast_id, episode = self._match_valid_url(url).groups('id', 'ep')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: groups() takes at most 1 argument (2 given)

glensc · 2023-11-10T10:04:56Z

@seproDev thanks for the feedback! addressed the issues. CI should pass as well.

- https://docs.python.org/3/library/re.html#re.Match.groups
  > Match.groups(default=None)
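`Match.groups()` only accepts a default value for groups that did not participate in the match, which is why passing two group names fails. `Match.group()` accepts several group names and returns a tuple (a later commit in this PR switched to the `group()` method). A minimal sketch, assuming a hypothetical pattern that approximates the extractor's `_VALID_URL` (not copied from it) and a stdlib stand-in for yt-dlp's `join_nonempty`:

```python
import re

# Hypothetical approximation of DuoplayIE._VALID_URL (assumption):
# numeric telecast id, a slug, and an optional ?ep=N query parameter.
VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)/[\w-]+/?(?:\?(?:[^#]+&)?ep=(?P<ep>\d+))?'


def parse_ids(url):
    m = re.match(VALID_URL, url)
    if not m:
        raise ValueError(f'Unsupported URL: {url}')
    # group() takes several names and returns a tuple of their values;
    # a group that did not match (e.g. no ?ep=) comes back as None.
    telecast_id, episode = m.group('id', 'ep')
    # Stand-in for join_nonempty(telecast_id, episode, delim='_'):
    # drop empty/None parts and join the rest with '_'.
    video_id = '_'.join(str(p) for p in (telecast_id, episode) if p)
    return telecast_id, episode, video_id


m = re.match(VALID_URL, 'https://duoplay.ee/4312/siberi-vomm?ep=24')
try:
    m.groups('id', 'ep')  # the failing call from the traceback above
except TypeError as err:
    print('TypeError:', err)

print(parse_ids('https://duoplay.ee/4312/siberi-vomm?ep=24'))       # ('4312', '24', '4312_24')
print(parse_ids('https://duoplay.ee/8279/mulle-eestist-ei-piisa'))  # ('8279', None, '8279')
```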

yt_dlp/extractor/duoplay.py

seproDev · 2023-11-12T10:12:59Z

yt_dlp/extractor/duoplay.py

+            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            **traverse_obj(episode_attr, {
+                'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),


This now brings up the problem of the title. This is the best I came up with, tell me if you think this makes sense:
For movies, always use the title. For shows/series, first try the subtitle, then try to name it after the episode number and if all else fails also use the title.

Suggested change

-            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
-            **traverse_obj(episode_attr, {
-                'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
+            'title': episode_attr.get('title') if episode_attr.get('category') == 'movies' else (
+                traverse_obj(episode_attr, 'subtitle', ('episode_nr', {lambda x: f'Episode {x}' if x else None}), 'title')),
+            'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            **traverse_obj(episode_attr, {

This means the tests will need to be updated. But I think it would be unexpected to have for example https://duoplay.ee/17/uhikarotid?ep=14 be named Episode 14 even though it is the 2nd episode of the 2nd season.
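The title rule proposed above (movies use the title; shows try the subtitle, then "Episode N", then the title) can be sketched in plain Python. This is an approximation that uses truthiness checks in place of `traverse_obj`; the sample payloads are hypothetical, with field names taken from the suggested change:

```python
def pick_title(episode_attr):
    """Plain-Python approximation of the proposed title fallback chain."""
    # Movies: always use the title.
    if episode_attr.get('category') == 'movies':
        return episode_attr.get('title')
    # Shows: prefer the episode subtitle...
    if episode_attr.get('subtitle'):
        return episode_attr['subtitle']
    # ...then a name built from the episode number...
    if episode_attr.get('episode_nr'):
        return f"Episode {episode_attr['episode_nr']}"
    # ...and finally fall back to the show title.
    return episode_attr.get('title')


# Hypothetical ':episode' payloads; the values are made up.
print(pick_title({'category': 'movies', 'title': 'Some Movie', 'subtitle': 'x'}))  # Some Movie
print(pick_title({'category': 'shows', 'title': 'Show', 'subtitle': 'Pilot'}))     # Pilot
print(pick_title({'category': 'shows', 'title': 'Show', 'episode_nr': 2}))         # Episode 2
print(pick_title({'category': 'shows', 'title': 'Show'}))                          # Show
```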

I had planned to handle movies in a separate PR with the same logic, and my version was more verbose.

Sadly, for uhikarotid you have to accept the absolute numbering scheme, or rename items manually. Maybe if a show has only absolute numbers, then omit the "season 01" from its name?

diff --git a/yt_dlp/extractor/duoplay.py b/yt_dlp/extractor/duoplay.py
index f12c91012..d2dbc27c4 100644
--- a/yt_dlp/extractor/duoplay.py
+++ b/yt_dlp/extractor/duoplay.py
@@ -53,8 +53,15 @@ class DuoplayIE(InfoExtractor):
         },
     }]
 
+    @staticmethod
+    def is_movie(attr):
+        # return attr.get('episode_id', None) == 0
+        return attr.get('category', None) == 'movies'
+
     def _real_extract(self, url):
         telecast_id, episode = self._match_valid_url(url).groups()
         video_id = join_nonempty(telecast_id, episode, delim='_')
         webpage = self._download_webpage(url, video_id)
         video_player = try_call(lambda: extract_attributes(
@@ -63,14 +70,28 @@ def _real_extract(self, url):
             self.raise_no_formats('No video found', expected=True)
 
         episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
-        return {
+        data = {
             'id': video_id,
             'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
+            'description': traverse_obj(episode_attr, 'synopsis'),
+            'thumbnail': traverse_obj(episode_attr, ('images', 'original')),
+        }
+
+        if self.is_movie(episode_attr):
+            return {
+                **data,
+                'title': f"{traverse_obj(episode_attr, 'title')} ({traverse_obj(episode_attr, 'year')})",
+            }
+
+        return {
+            **data,
             **traverse_obj(episode_attr, {
                 'title': (None, ('subtitle', ('episode_id', {lambda x: f'Episode {x}'}))),
-                'description': 'synopsis',
-                'thumbnail': ('images', 'original'),
                 'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
                 'series': 'title',
                 'series_id': ('telecast_id', {str_or_none}),
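The diff derives `timestamp` by appending ' +0200' to `airtime` before calling `unified_timestamp`, i.e. it treats the airtime as local time at a fixed UTC+2 offset. A stdlib-only sketch of that conversion, assuming (not confirmed in this thread) that `airtime` looks like 'YYYY-MM-DD HH:MM:SS':

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset, mirroring the ' +0200' suffix used in the diff above.
UTC_PLUS_2 = timezone(timedelta(hours=2))


def airtime_to_timestamp(airtime, tz=UTC_PLUS_2):
    """Convert a naive 'YYYY-MM-DD HH:MM:SS' string (assumed format) into a
    Unix timestamp, interpreting the wall-clock time in the given timezone."""
    naive = datetime.strptime(airtime, '%Y-%m-%d %H:%M:%S')
    return int(naive.replace(tzinfo=tz).timestamp())


print(airtime_to_timestamp('2023-11-07 21:00:00'))  # 1699383600 (19:00 UTC)
```

One design caveat: Estonia switches to UTC+3 in summer, so a fixed +0200 offset would place summer airtimes one hour off; a zone-aware conversion (for example `zoneinfo.ZoneInfo('Europe/Tallinn')` where tz data is available) would avoid that.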

maybe if show has only absolute numbers, then omit the season 01 from it's name?

That could work.
If no season number present ⇒ absolute value. If season number present ⇒ relative value.
I'd be curious to hear what one of the maintainers thinks.
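The rule above (no season number means the episode number is absolute; a season number means it is relative to that season) could drive how an episode is labelled. A hypothetical helper, not part of the extractor, using the uhikarotid example (absolute episode 14, which the reviewer notes is the 2nd episode of the 2nd season):

```python
def episode_label(season_number=None, episode_number=None):
    """Hypothetical helper: with a season number the episode number is read as
    relative to that season (S02E02); without one, as absolute (E14)."""
    if episode_number is None:
        return None
    if season_number is None:
        # Absolute numbering: omit the season part entirely.
        return f'E{episode_number:02d}'
    return f'S{season_number:02d}E{episode_number:02d}'


print(episode_label(2, 2))      # S02E02
print(episode_label(None, 14))  # E14
```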

As for your diff, I see that you are excluding season and episode data on movies, which makes sense. I'll make a better suggestion later working that in.

I added test data for a movie. And yes, the series data doesn't make sense for a movie. I'd also like to add the year to the movie title; can you help with that too?
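One way to split movies from shows, in the spirit of the later commit "No longer include series related data in movies, also add release_year", is to build a shared base dict and extend it per category. A plain-dict sketch; the field names on `episode_attr` come from this thread, while the structure and sample values are illustrative, not the merged extractor:

```python
def build_info(video_id, episode_attr):
    """Hypothetical sketch: movies get title + release_year only;
    shows get series/episode fields."""
    info = {
        'id': video_id,
        'description': episode_attr.get('synopsis'),
        'thumbnail': (episode_attr.get('images') or {}).get('original'),
    }
    if episode_attr.get('category') == 'movies':
        # Movies: no series/episode fields; the year goes to release_year.
        return {
            **info,
            'title': episode_attr.get('title'),
            'release_year': episode_attr.get('year'),
        }
    telecast_id = episode_attr.get('telecast_id')
    return {
        **info,
        'title': episode_attr.get('subtitle') or episode_attr.get('title'),
        'series': episode_attr.get('title'),
        'series_id': str(telecast_id) if telecast_id is not None else None,
        'episode_number': episode_attr.get('episode_nr'),
        'episode_id': episode_attr.get('episode_id'),
    }


# Hypothetical movie payload; the year can then go into the filename via
# an output template instead of being baked into the title.
movie = build_info('1234', {'category': 'movies', 'title': 'Some Movie', 'year': 2020})
print('%(title)s (%(release_year)s)' % movie)  # Some Movie (2020)
```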

Also, the movie will expire in 4 days; I'm not sure whether they remove the URL and data once it expires.

"_visible_until": { "diff": 4, "unit": "day" }

lol. The test movie expires in 3 hours: "Vaadatav 3 tundi" ("Viewable for 3 hours"). Should that test be set to skip?
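The `_visible_until` object pairs a number with a unit, which maps naturally onto `datetime.timedelta`. A hypothetical helper, assuming `diff` is the remaining time relative to when the page was fetched; only the 'day' unit appears in this thread, so the 'hour' and 'minute' entries are guesses:

```python
from datetime import timedelta

# Assumed unit names -> timedelta keyword arguments ('day' is the only one seen).
_UNITS = {'day': 'days', 'hour': 'hours', 'minute': 'minutes'}


def remaining_visibility(visible_until):
    """Turn e.g. {'diff': 4, 'unit': 'day'} into a timedelta, or None if unknown."""
    visible_until = visible_until or {}
    unit = _UNITS.get(visible_until.get('unit'))
    diff = visible_until.get('diff')
    if unit is None or diff is None:
        return None
    return timedelta(**{unit: diff})


print(remaining_visibility({'diff': 4, 'unit': 'day'}))  # 4 days, 0:00:00
```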

So, I found one "movie" (the translation says "special episode") without an expiry date:

https://duoplay.ee/8279/mulle-eestist-ei-piisa

And that's reflected in the metadata too: "category": "shows"

Added test data for movie that doesn't expire: 29ac66b

What should we do with the item that will expire? (Of course, we have to wait a few hours to see how it appears on the site.)

So, the movie has expired and returns 404. Should the test be dropped, or should it use some skip attribute?

✖ curl -i https://duoplay.ee/4325/naljamangud
HTTP/2 404
content-language: et-ee
x-frame-options: SAMEORIGIN
content-type: text/html; charset=utf-8
date: Fri, 17 Nov 2023 08:57:30 GMT
via: 1.1 varnish (Varnish/6.0)
vary: Accept-Encoding, User-Agent
cache-control: no-cache
age: 0
content-length: 33749
[...]

I will add it in the next cleanup PR/commit. Thanks for the reminder!

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

seproDev

Okay final set of suggestions from me.
Having thought about it a bit more, I don't think there is a truly great way to handle the metadata here. I think it's fine as is. Maybe one of the core maintainers has a different opinion.

yt_dlp/extractor/duoplay.py

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

yt_dlp/extractor/duoplay.py

glensc · 2023-11-17T08:42:16Z

yay. thanks @seproDev, @bashonly!

Authored by: glensc

glensc added 6 commits November 7, 2023 17:10

Create DuoplayIE

8938c68

Register DuoplayIE

2895aa5

fixup! Create DuoplayIE

9e75ce9

WIP

61a6bcc

fixup! Register DuoplayIE

291b3e1

Remove copy paste leftovers for now

74bd2c8

bashonly self-requested a review November 7, 2023 15:45

Save some progress

6496d43

glensc commented Nov 7, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

Finalized implementation

b6566cd

This comment was marked as outdated.

glensc added 2 commits November 7, 2023 20:20

Cleanup imports

37832d0

Update test data

d580a1e

glensc changed the title ~~Add Duoplay extractor~~ [duoplay] Add Duoplay extractor Nov 7, 2023

glensc marked this pull request as ready for review November 7, 2023 18:50

fixup! Update test data

86aa729

glensc commented Nov 7, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

glensc added 3 commits November 7, 2023 21:14

Include season-episode in title

8cb4fbb

Fill extra info obj rather change default title

f2f9362

Fill only title to title

1e7c6a0

glensc added 2 commits November 7, 2023 21:19

Cleanup unused join_nonempty import

f000349

Test data fix

a98bad9

glensc added 4 commits November 9, 2023 01:36

Add fallback to absolute episode id in case of missing subtitle

13d896e

Pass series_id and episode_id metadata

9a3c374

Add test for example with no title

34bbe84

fixup! Add test for example with no title

1301c71

glensc added 2 commits November 10, 2023 11:50

Fix groups arguments

5beb56f

Update test data

f848464

Fix .groups() arguments

834a590

seproDev reviewed Nov 12, 2023

glensc and others added 6 commits November 13, 2023 11:37

Update yt_dlp/extractor/duoplay.py: Use group() method

0936b27

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Update yt_dlp/extractor/duoplay.py: Try episode number cast

de17d2e

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Update yt_dlp/extractor/duoplay.py: Title changes

c54853f

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Add back episode_id, it's useful for absolute ordering

eb112c9

Update test data

e6bc3cd

Add test data for a movie

8a9a5de

seproDev approved these changes Nov 14, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

yt_dlp/extractor/duoplay.py (outdated, resolved)

seproDev added the pending-review label (PR needs a review) and removed the pending-fixes label (PR has had changes requested) Nov 14, 2023

seproDev reviewed Nov 14, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

bashonly approved these changes Nov 15, 2023

bashonly removed the pending-review label (PR needs a review) Nov 15, 2023

bashonly assigned bashonly and unassigned bashonly Nov 15, 2023

glensc and others added 4 commits November 16, 2023 15:59

No longer include series related data in movies, also add release_year

2775f97

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Update test data

e3182e5

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Update test data, cleanup None

d92869b

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>

Add test data of movie without expiry date

29ac66b

bashonly reviewed Nov 16, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

bashonly reviewed Nov 16, 2023

yt_dlp/extractor/duoplay.py (outdated, resolved)

yt_dlp/extractor/duoplay.py (outdated, resolved)

Apply suggestions from code review

29bfc41

bashonly merged commit 66a0127 into yt-dlp:master Nov 16, 2023
15 checks passed

glensc deleted the ie-duoplay branch November 17, 2023 08:42

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/duoplay] Add extractor (yt-dlp#8542)

8d8fcca

Authored by: glensc
