[RadioFrance] Extractor for Radio France stations (www.radiofrance.fr) #31435
base: master
The existing extractors no longer work since their respective websites were merged into www.radiofrance.fr.
Thanks for this!
I've made what may look like a load of suggestions, mostly aimed at robustness, but a lot are very similar.
    int_or_none,
    parse_iso8601,
    strip_or_none,
    url_or_none
Suggested change: replace `url_or_none` with `url_or_none,` (add a trailing comma).
    _VALID_URL = r'^https?://maison\.radiofrance\.fr/radiovisions/(?P<id>[^?#]+)'
    IE_NAME = 'radiofrance'
class RadioFranceBaseIE(InfoExtractor):
    _BASE_URL = r'https://www.radiofrance.fr/'
Suggested change: replace `_BASE_URL = r'https://www.radiofrance.fr/'` with `_BASE_URL = 'https://www.radiofrance.fr/'` (no raw string is needed, as the URL contains no backslashes).
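The `r` prefix only changes how backslash escapes are read. A quick check shows the raw and plain literals are identical for this URL, while the prefix does matter for the regex patterns further down, which contain `\.` and `\d`:

```python
# The r prefix only affects backslash escapes; this URL contains none.
raw = r'https://www.radiofrance.fr/'
plain = 'https://www.radiofrance.fr/'
print(raw == plain)  # True

# In a regex pattern, r'\.' keeps the backslash, same as the escaped '\\.'.
print(r'\.' == '\\.', len(r'\.'))  # True 2
```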
        'url': 'http://maison.radiofrance.fr/radiovisions/one-one',
        'md5': 'bdbb28ace95ed0e04faab32ba3160daf',
    def extract_api_data(self, api_path, id, html):
        pattern = r'<script [^>]*sveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>' % api_path
Suggested change, so that any whitespace may follow the tag name and the `sveltekit` attribute must start at a word boundary:
        pattern = r'<script\b[^>]+\bsveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>' % api_path
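A sketch comparing the original and suggested patterns against a hypothetical page fragment (the markup and the `path/fip` API path are assumptions, not taken from radiofrance.fr). With a tab instead of a space after `<script`, only the suggested pattern matches. `re.escape()` is added here as a precaution; the PR interpolates `api_path` directly.

```python
import json
import re

API_PATH = 'path/fip'  # hypothetical API path, for illustration only

OLD = (r'<script [^>]*sveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s'
       r'[^>]*>(?P<json>.*)</script>')
NEW = (r'<script\b[^>]+\bsveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s'
       r'[^>]*>(?P<json>.*)</script>')

# Hypothetical markup with a tab (not a space) after the tag name.
html = ('<script\ttype="application/json" '
        'sveltekit:data-url="https://www.radiofrance.fr/api/v2.0/path/fip">'
        '{"media": {"sources": []}}</script>')

print(re.search(OLD % re.escape(API_PATH), html))  # None: OLD needs a literal space
m = re.search(NEW % re.escape(API_PATH), html)
print(json.loads(m.group('json')))  # {'media': {'sources': []}}
```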
        if not json:
            raise ExtractorError('%s: JSON data not found' % id)
`_search_regex()` already raises unless passed `fatal=False`, so this check is redundant. Suggested change: remove
        if not json:
            raise ExtractorError('%s: JSON data not found' % id)
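To illustrate the `fatal` behaviour, here is a simplified stand-in: the `search_regex()` function and `ExtractorError` class below are hypothetical models, not youtube-dl's actual implementation (which also handles `default`, warnings and more).

```python
import re


class ExtractorError(Exception):
    """Simplified stand-in for youtube-dl's ExtractorError."""


def search_regex(pattern, string, name, fatal=True, group=None):
    """Hypothetical, simplified model of fatal/non-fatal regex extraction."""
    m = re.search(pattern, string)
    if m:
        return m.group(group or 0)
    if fatal:
        # A missing match is already an error, so callers need no extra
        # "if not json: raise ..." check afterwards.
        raise ExtractorError('Unable to extract %s' % name)
    return None  # fatal=False: the caller handles the missing value


page = '<p>no embedded data here</p>'
print(search_regex(r'(?P<json>\{.*\})', page, 'API data', fatal=False, group='json'))  # None
try:
    search_regex(r'(?P<json>\{.*\})', page, 'API data', group='json')
except ExtractorError as e:
    print(e)  # Unable to extract API data
```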
    _TEST = {
        'url': 'http://maison.radiofrance.fr/radiovisions/one-one',
        'md5': 'bdbb28ace95ed0e04faab32ba3160daf',
    def extract_api_data(self, api_path, id, html):
Use a different parameter name instead of `id` (which shadows the built-in `id()`), say `item_id`.
If `extract_api_data()` is only meant to be used in this class or its subclasses, call it `_extract_api_data()`.
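Naming a parameter `id` hides Python's built-in `id()` for the whole function body. A minimal illustration (both function names are hypothetical):

```python
def shadowing(id):
    # 'id' now names the string argument, so the built-in id() is unreachable.
    return callable(id)


def renamed(item_id):
    # With the parameter renamed, the built-in id() remains available.
    return callable(id)


print(shadowing('fip'), renamed('fip'))  # False True
```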
        }
    }]

    def get_livestream_formats(self, id, api_data):
Rename to `_get_livestream_formats()`? Also change the `id` parameter name, as before.
        sources = api_data['media']['sources']

        formats = []
        for source in sources:
            url = source.get('url')
Avoid crashing when `media` or `sources` is missing. Suggested change (importing `traverse_obj` from `..utils`), replacing:
        sources = api_data['media']['sources']
        formats = []
        for source in sources:
            url = source.get('url')
with:
        formats = []
        for source in traverse_obj(api_data, ('media', 'sources', Ellipsis), expected_type=dict):
            url = url_or_none(source.get('url'))
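A plain-Python sketch of what the suggested traversal guards against, assuming the `api_data` layout used in the PR (`media` containing a `sources` list of dicts with a `url` key). `get_livestream_urls()` is a hypothetical helper; it is not youtube-dl's `traverse_obj()`, and the URL check is a rough http(s)-only approximation of `url_or_none()`, which accepts more schemes.

```python
def get_livestream_urls(api_data):
    """Collect source URLs without raising on missing or malformed keys."""
    media = api_data.get('media') if isinstance(api_data, dict) else None
    sources = media.get('sources') if isinstance(media, dict) else None
    if not isinstance(sources, list):
        sources = []
    urls = []
    for source in sources:
        if not isinstance(source, dict):
            continue  # similar in spirit to expected_type=dict: skip non-dicts
        url = source.get('url')
        # Rough url_or_none() approximation: keep only http(s) URL strings.
        if isinstance(url, str) and url.startswith(('http://', 'https://')):
            urls.append(url)
    return urls


print(get_livestream_urls({}))  # []
print(get_livestream_urls({'media': {'sources': [
    {'url': 'https://example.com/fip.m3u8'}, None, {'url': 'not a url'}]}}))
# ['https://example.com/fip.m3u8']
```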
            'formats': self.get_livestream_formats(id, api_data),
            'thumbnail': self.get_thumbnail(api_data, webpage),
            'channel_id': self.get_brand(api_data, webpage),
            'is_live': True
Suggested change: add a trailing comma, `'is_live': True,`.
        return formats

    def _real_extract(self, url):
        id = self._match_id(url)
Suggested change: `live_id = self._match_id(url)` instead of `id = self._match_id(url)`, and likewise elsewhere.
        if len(formats) == 0:
            raise ExtractorError('No live streaming URL found')
`_sort_formats()` is still needed in yt-dl, especially as `preference` was set, and it raises if `formats` is empty. Suggested change, replacing:
        if len(formats) == 0:
            raise ExtractorError('No live streaming URL found')
with:
        self._sort_formats(formats)
Or allow an empty `formats` list to be returned and call `_sort_formats()` in the caller.
To customise the error message, specialise `_sort_formats()` in the base class:
    def _sort_formats(self, formats, *args, **kwargs):
        try:
            # super() already binds self, so it is not passed again
            return super(RadioFranceBaseIE, self)._sort_formats(formats, *args, **kwargs)
        except ExtractorError as e:
            e.args = ('No formats found', )
            raise
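A self-contained sketch of the override pattern above, runnable without youtube-dl. `InfoExtractor` and `ExtractorError` here are hypothetical stubs that only model `_sort_formats()` raising on an empty list; the stub's sort key and base error message are assumptions, not youtube-dl's real format ordering.

```python
class ExtractorError(Exception):
    """Simplified stand-in for youtube-dl's ExtractorError."""


class InfoExtractor(object):
    """Hypothetical stub: only models _sort_formats() raising on empty input."""

    def _sort_formats(self, formats, field_preference=None):
        if not formats:
            raise ExtractorError('No video formats found')
        # Worst first, best last (simplified: preference only).
        formats.sort(key=lambda f: f.get('preference') or 0)


class RadioFranceBaseIE(InfoExtractor):
    def _sort_formats(self, formats, *args, **kwargs):
        try:
            return super(RadioFranceBaseIE, self)._sort_formats(formats, *args, **kwargs)
        except ExtractorError as e:
            # Replace the message, then re-raise the same exception object.
            e.args = ('No formats found', )
            raise


ie = RadioFranceBaseIE()
try:
    ie._sort_formats([])
except ExtractorError as e:
    print(e)  # No formats found
```

Re-raising with a bare `raise` keeps the original traceback, which `raise e` loses on Python 2.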
Consider adding playlist support, as per #31464 (comment). In the problem page, the playlist is in an
Description of your pull request and other information
This PR adds support for Radio France podcasts (radio shows) and live music webradios from the stations France Inter, France Culture, France Info, FIP, Le Mouv and France Musique.
The old extractors are removed, since they no longer work after their respective websites were merged into www.radiofrance.fr.
This closes: