[RadioComercial] Add extractor #8508
Same as #8507 (comment). Please instead edit the description and push additional commits onto the other branch.
@seproDev thank you very much for taking the time to look into this and for providing those valuable suggestions. I'm working on them and will update the code once complete.
Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
I think after this, we should be good from my side.
…h single episodes
…ve query parameters or use anchors. Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
…ate extraction code
I'm currently having an issue extracting playlists from the website because of how it deals with unavailable episodes. At first, I was using a Python set to collect the episode URLs. The trouble is, any unavailable episode defaults to the same URL (like radiocomercial/podcast/<season>). This behaviour can be seen in test number 4.
This situation leads to two problems:
Problem 1: When I was using a Python set and more than one episode was missing, I ended up with fewer episodes than expected, because the identical URLs collapse into a single set entry.
Problem 2: The unavailable episodes cause an error when the downloader tries to process them:
[RadioComercialPlaylist] Playlist TNT - Todos No Top - Temporada 2023: Downloading 41 items of 41
[download] Downloading item 1 of 41
[RadioComercial] Extracting URL: https://radiocomercial.pt/podcasts/tnt-todos-no-top/2023/t-n-t-29-de-outubro
[RadioComercial] t-n-t-29-de-outubro: Downloading webpage
[info] t-n-t-29-de-outubro: Downloading 1 format(s): 0
[download] Destination: T.N.T 29 de outubro [t-n-t-29-de-outubro].mp3
[download] 100% of 86.69MiB in 00:00:13 at 6.62MiB/s
[download] Downloading item 2 of 41
ERROR: No suitable extractor (RadioComercial) found for URL https://radiocomercial.pt/podcasts/tnt-todos-no-top/2023/
[download] Downloading item 3 of 41
[RadioComercial] Extracting URL: https://radiocomercial.pt/podcasts/tnt-todos-no-top/2023/t-n-t-15-de-outubro
[RadioComercial] t-n-t-15-de-outubro: Downloading webpage
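To make Problem 1 concrete, here is a minimal sketch of the collapse. The `scraped` list is a hypothetical stand-in for the links found on a season page; only the two episode URLs and the season URL are taken from the log above, and the number of unavailable episodes is an assumption.

```python
# Hypothetical list of links scraped from a season page, in page order.
# Unavailable episodes all point back at the season URL itself.
base = 'https://radiocomercial.pt/podcasts/tnt-todos-no-top/2023/'
scraped = [
    base + 't-n-t-29-de-outubro',
    base,  # unavailable episode
    base + 't-n-t-15-de-outubro',
    base,  # another unavailable episode, same URL
]

# A plain set merges the two unavailable entries into one,
# so the count no longer matches the number of items on the page.
unique = set(scraped)
print(len(scraped), len(unique))
```

The season URL also survives in the set, which is the entry that later triggers the "No suitable extractor" error from Problem 2.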
Both of the problems you mentioned mean that a `PagedList` is not a viable option for this playlist extractor.
Instead we'll need to just use a generator function (e.g. `_entries()`), and we can use `playlist_from_matches()`, which casts the URLs to an `orderedSet` and then constructs the `url_result`s for us. We can also use `RadioComercialIE.suitable()` to ensure the URLs are not bogus.
I think I initially suggested using a
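For illustration, the approach suggested above could look roughly like the sketch below. It deliberately does not import yt-dlp: `is_suitable()` and `ordered_unique()` are hypothetical stand-ins for `RadioComercialIE.suitable()` and `orderedSet`, and `entries()` stands in for the `_entries()` generator. The regex is the single-episode pattern from this PR; the input list is made up.

```python
import re

# Single-episode pattern from this PR (without the trailing anchor).
EPISODE_URL_RE = re.compile(
    r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)')


def is_suitable(url):
    """Hypothetical stand-in for RadioComercialIE.suitable()."""
    return EPISODE_URL_RE.match(url) is not None


def ordered_unique(urls):
    """Hypothetical stand-in for orderedSet: drop repeats, keep first-seen order."""
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url


def entries(scraped_urls):
    """Yield each real episode URL once, skipping the bogus season-only URLs."""
    for url in ordered_unique(scraped_urls):
        if is_suitable(url):
            yield url


base = 'https://radiocomercial.pt/podcasts/tnt-todos-no-top/2023/'
scraped = [base + 't-n-t-29-de-outubro', base, base + 't-n-t-15-de-outubro', base]
result = list(entries(scraped))
print(result)
```

In the real extractor the filtered URLs would be handed to `playlist_from_matches()` rather than collected into a list; the list here only makes the result easy to inspect.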
@seproDev all good, the issues with using a
Thank you very much for taking the time and providing the proper solution for this use case. This has been a great learning experience so far.
yt_dlp/extractor/radiocomercial.py
class RadioComercialIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)/?(?:$|[?#])'
Read #8535 (comment)
Here I have the same question:
- _VALID_URL = r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)/?(?:$|[?#])'
+ _VALID_URL = r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)'
The trailing `/?(?:$|[?#])` is needed in `RadioComercialPlaylistIE._VALID_URL`, but I don't think we need it here? Unless I'm missing something.
I believe you are correct bashonly. I removed it from the single episode regex.
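As a quick check of the change discussed above, the sketch below compares the two single-episode patterns with plain `re.match`. The helper `groups()` and the `/extra` suffix are assumptions added for illustration; the patterns and the episode URL are taken from this PR.

```python
import re

# Original pattern with the trailing anchor, and the suggested one without it.
ANCHORED = re.compile(
    r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)/?(?:$|[?#])')
UNANCHORED = re.compile(
    r'https?://(?:www\.)?radiocomercial\.pt/podcasts/[^/?#]+/t?(?P<season>\d+)/(?P<id>[\w-]+)')

episode = ('https://radiocomercial.pt/podcasts/convenca-me-num-minuto/t3/'
           'convenca-me-num-minuto-que-os-lobisomens-existem')


def groups(pattern, url):
    """Hypothetical helper: named groups if the URL matches from its start, else None."""
    match = pattern.match(url)
    return match.groupdict() if match else None


# Both patterns capture the same season and id for a normal episode URL,
# with or without a query string.
print(groups(ANCHORED, episode) == groups(UNANCHORED, episode))
print(groups(UNANCHORED, episode + '?foo=bar'))

# They differ only when something follows the id as an extra path segment.
print(groups(ANCHORED, episode + '/extra'), groups(UNANCHORED, episode + '/extra') is not None)
```

Neither pattern matches a season-only URL such as the one in the error log, because the `id` group needs at least one character.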
Authored by: SirElderling
IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
This extractor was created specifically for the Portuguese radio station Radio Comercial. Its main function is to fetch and download podcast episodes.
Presently, it offers two extract functions:
Valid URLs that are covered by this extractor:
https://radiocomercial.pt/podcasts/convenca-me-num-minuto/t3/convenca-me-num-minuto-que-os-lobisomens-existem
https://radiocomercial.pt/podcasts/as-minhas-coisas-favoritas
https://radiocomercial.pt/podcasts/convenca-me-num-minuto/t3
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?
Copilot Summary
🤖 Generated by Copilot at bd9904c
Summary
🎙️📻🎧
This pull request adds two new extractors for `yt-dlp`, `RadioComercialIE` and `RadioComercialPlaylistIE`, which enable downloading audio and playlists from the Portuguese radio station Radio Comercial. It also fixes a minor formatting issue in `_extractors.py`.
Walkthrough
`RadioComercialIE` and `RadioComercialPlaylistIE` from `radiocomercial.py` in `_extractors.py` (link)
`radiocomercial.py` that inherit from `RadioComercialBaseExtractor` (link)
`_extractors.py` for formatting consistency (link)