[RadioFrance] Extractor for Radio France stations (www.radiofrance.fr) #31435

Open. otrichet wants to merge 5 commits into master.

@otrichet otrichet commented Dec 22, 2022

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR adds support for podcasts (radio shows) and live music webradios of Radio France (stations France Inter, France Culture, France Info, FIP, Le Mouv and France Musique).

Old extractors are removed since they do not work anymore after the merge of their respective websites under www.radiofrance.fr.

This closes :

Contributor

@dirkf dirkf left a comment

Thanks for this!

I've made what may look like a load of suggestions, mostly aimed at robustness, but a lot are very similar.

int_or_none,
parse_iso8601,
strip_or_none,
url_or_none

Suggested change
    -url_or_none
    +url_or_none,

_VALID_URL = r'^https?://maison\.radiofrance\.fr/radiovisions/(?P<id>[^?#]+)'
IE_NAME = 'radiofrance'
class RadioFranceBaseIE(InfoExtractor):
_BASE_URL = r'https://www.radiofrance.fr/'

Suggested change
    -_BASE_URL = r'https://www.radiofrance.fr/'
    +_BASE_URL = 'https://www.radiofrance.fr/'

'url': 'http://maison.radiofrance.fr/radiovisions/one-one',
'md5': 'bdbb28ace95ed0e04faab32ba3160daf',
def extract_api_data(self, api_path, id, html):
pattern = r'<script [^>]*sveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>' % api_path

Suggested change
    -pattern = r'<script [^>]*sveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>' % api_path
    +pattern = r'<script\b[^>]+\bsveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>' % api_path
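
To illustrate what the suggested pattern tolerates that the original does not, here is a minimal sketch using only Python's `re` and `json` modules. The sample HTML, the API path `stations/fip/live` and the names `ORIGINAL`, `SUGGESTED` and `find_json` are assumptions made up for this example; they are not taken from the PR.

```python
import json
import re

# The two patterns from the review, each keeping the %s placeholder for the API path.
ORIGINAL = r'<script [^>]*sveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>'
SUGGESTED = r'<script\b[^>]+\bsveltekit:data-url="https://www\.radiofrance\.fr/api/v[\d.]+/%s[^>]*>(?P<json>.*)</script>'


def find_json(pattern, api_path, html):
    """Return the decoded JSON payload, or None if the pattern does not match."""
    mobj = re.search(pattern % api_path, html)
    return json.loads(mobj.group('json')) if mobj else None


# Hypothetical page fragments, identical except for the whitespace after "<script".
API_PATH = 'stations/fip/live'  # assumed path, for illustration only
ATTRS = ('type="application/json" '
         'sveltekit:data-url="https://www.radiofrance.fr/api/v2.1/stations/fip/live"')
html_space = '<script ' + ATTRS + '>{"now": "ok"}</script>'
html_tab = '<script\t' + ATTRS + '>{"now": "ok"}</script>'

print(find_json(ORIGINAL, API_PATH, html_space))   # both patterns match a plain space
print(find_json(SUGGESTED, API_PATH, html_space))
print(find_json(ORIGINAL, API_PATH, html_tab))     # the literal space does not match a tab
print(find_json(SUGGESTED, API_PATH, html_tab))    # \b plus [^>]+ accepts any separator
```

The `\b` before `sveltekit` also prevents a match on an attribute whose name merely ends in `sveltekit:data-url`.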

Comment on lines +23 to +25

if not json:
raise ExtractorError('%s: JSON data not found' % id)

_search_regex() already raises unless passed fatal=False:

Suggested change
    -if not json:
    -    raise ExtractorError('%s: JSON data not found' % id)
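
The behaviour this comment relies on can be illustrated with a simplified stand-in. The sketch below is not youtube-dl's implementation: `search_regex` and the local `ExtractorError` class are hypothetical substitutes that only mimic the `fatal` switch, and the real method additionally reports a warning when `fatal=False`.

```python
import re


class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError (simplified)."""


def search_regex(pattern, string, name, fatal=True, group=1):
    """Simplified stand-in for InfoExtractor._search_regex()."""
    mobj = re.search(pattern, string)
    if mobj:
        return mobj.group(group)
    if fatal:
        # Default behaviour: a failed match is already an error,
        # so the caller needs no extra "if not result: raise" check.
        raise ExtractorError('Unable to extract %s' % name)
    return None


html = '<p>no script element here</p>'

# With fatal=False the caller gets None and must handle it.
print(search_regex(r'(?P<json>\{.*\})', html, 'JSON data', fatal=False, group='json'))

# With the default fatal=True the helper raises on its own.
try:
    search_regex(r'(?P<json>\{.*\})', html, 'JSON data', group='json')
except ExtractorError as e:
    print('raised:', e)
```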

_TEST = {
'url': 'http://maison.radiofrance.fr/radiovisions/one-one',
'md5': 'bdbb28ace95ed0e04faab32ba3160daf',
def extract_api_data(self, api_path, id, html):

Use a different parameter name instead of id, say item_id.

If extract_api_data() is only meant to be used in this class or its subclasses, call it _extract_api_data().

}
}]

def get_livestream_formats(self, id, api_data):

_get_livestream_formats()?

Change id parameter name, as before.

Comment on lines +244 to +248
sources = api_data['media']['sources']

formats = []
for source in sources:
url = source.get('url')

Avoid crashing:

Suggested change
    -sources = api_data['media']['sources']
    -formats = []
    -for source in sources:
    -    url = source.get('url')
    +formats = []
    +for source in traverse_obj(api_data, ('media', 'sources', Ellipsis), expected_type=dict):
    +    url = url_or_none(source.get('url'))
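
As a rough illustration of why the suggested form avoids crashing, the sketch below approximates the defensive traversal in plain Python. It does not use youtube-dl: `iter_sources`, `collect_urls` and the local `url_or_none` are hypothetical, simplified stand-ins for the helpers named in the suggestion, and the sample data is invented.

```python
def url_or_none(url):
    """Rough stand-in for youtube_dl.utils.url_or_none (simplified)."""
    if not isinstance(url, str):
        return None
    url = url.strip()
    return url if url.startswith(('http://', 'https://', '//')) else None


def iter_sources(api_data):
    """Yield each dict under api_data['media']['sources'] without ever raising."""
    media = api_data.get('media') if isinstance(api_data, dict) else None
    sources = media.get('sources') if isinstance(media, dict) else None
    for source in sources if isinstance(sources, list) else []:
        if isinstance(source, dict):  # mirrors expected_type=dict
            yield source


def collect_urls(api_data):
    """Collect the usable stream URLs, skipping malformed entries."""
    urls = []
    for source in iter_sources(api_data):
        url = url_or_none(source.get('url'))
        if url:
            urls.append(url)
    return urls


# api_data['media']['sources'] would raise KeyError or TypeError on the first two inputs.
print(collect_urls({}))
print(collect_urls({'media': None}))
print(collect_urls({'media': {'sources': [
    {'url': 'https://example.com/live.mp3'},  # hypothetical URL
    'not-a-dict',
    {'url': 12},
]}}))
```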

'formats': self.get_livestream_formats(id, api_data),
'thumbnail': self.get_thumbnail(api_data, webpage),
'channel_id': self.get_brand(api_data, webpage),
'is_live': True

Suggested change
    -'is_live': True
    +'is_live': True,

return formats

def _real_extract(self, url):
id = self._match_id(url)

Suggested change
    -id = self._match_id(url)
    +live_id = self._match_id(url)

etc

Comment on lines +272 to +273
if len(formats) == 0:
raise ExtractorError('No live streaming URL found')

_sort_formats() is still needed in yt-dl, especially as preference was set, and it raises if formats is empty, so the explicit check is redundant.

Suggested change
    -if len(formats) == 0:
    -    raise ExtractorError('No live streaming URL found')
    +self._sort_formats(formats)

Or allow empty formats to be returned and call _sort_formats() in the caller.

To customise the error message, specialise _sort_formats() in the base class:

    def _sort_formats(self, formats, *args, **kwargs):
        try:
            return super(RadioFranceBaseIE, self)._sort_formats(formats, *args, **kwargs)
        except ExtractorError as e:
            e.args = ('No formats found', )
            raise e
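
For anyone wanting to try the override outside youtube-dl, here is a self-contained sketch. The `InfoExtractor` and `ExtractorError` classes below are hypothetical stubs that only mimic raising on an empty format list; the real base class does much more. In this sketch the `super()` call passes `formats` only, since the bound method already receives `self`.

```python
class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError."""


class InfoExtractor(object):
    """Stub base class: only mimics raising when there are no formats."""

    def _sort_formats(self, formats, *args, **kwargs):
        if not formats:
            raise ExtractorError('No video formats found')
        formats.sort(key=lambda f: f.get('preference') or 0)


class RadioFranceBaseIE(InfoExtractor):
    def _sort_formats(self, formats, *args, **kwargs):
        try:
            # Bound super() call: self is supplied implicitly.
            return super(RadioFranceBaseIE, self)._sort_formats(formats, *args, **kwargs)
        except ExtractorError as e:
            # Replace the generic message with an extractor-specific one.
            e.args = ('No formats found', )
            raise e


ie = RadioFranceBaseIE()

formats = [{'url': 'b', 'preference': 2}, {'url': 'a', 'preference': 1}]
ie._sort_formats(formats)
print([f['url'] for f in formats])

try:
    ie._sort_formats([])
except ExtractorError as e:
    print('raised:', e)
```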

dirkf commented Jan 23, 2023

Consider adding playlist support, as per #31464 (comment).

On the problem page, the playlist is available in an ItemList in an ld+json block, as well as in the HTML itself.
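
A minimal sketch of reading such an ItemList, using only the standard library. The sample HTML and URLs, the name `extract_item_list_urls`, and the assumption that each list element carries a plain `url` key are all hypothetical; the actual structure of the Radio France page was not checked here.

```python
import json
import re

# Matches every <script type="application/ld+json"> block in the page.
LD_JSON_RE = r'(?s)<script\b[^>]+\btype="application/ld\+json"[^>]*>(?P<json>.+?)</script>'


def extract_item_list_urls(html):
    """Return the URLs of all ItemList entries found in ld+json blocks."""
    urls = []
    for mobj in re.finditer(LD_JSON_RE, html):
        try:
            data = json.loads(mobj.group('json'))
        except ValueError:
            continue  # ignore malformed JSON rather than failing
        for block in data if isinstance(data, list) else [data]:
            if not isinstance(block, dict) or block.get('@type') != 'ItemList':
                continue
            for item in block.get('itemListElement') or []:
                if isinstance(item, dict) and isinstance(item.get('url'), str):
                    urls.append(item['url'])
    return urls


# Invented page fragment for illustration only.
sample_html = '''
<script type="application/ld+json">{"@type": "WebPage", "name": "ignored"}</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "ItemList", "itemListElement": [
  {"@type": "ListItem", "position": 1, "url": "https://example.com/episode-1"},
  {"@type": "ListItem", "position": 2, "url": "https://example.com/episode-2"}
]}
</script>
'''

print(extract_item_list_urls(sample_html))
```

Each returned URL could then be handed back to the single-episode extractor as a playlist entry.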

Successfully merging this pull request may close these issues.

Broken site support: https://www.radiofrance.fr/ to download playlists