Add extractor for www.mujrozhlas.cz #31513
Signed-off-by: Petr Tesarik <petr@tesarici.cz>
This seems to be a new site. Please create an issue for it using the site support template, or paste the completed template here. But I'm pretty sure the site will be fine to support.
Thanks for your work!
I've made some suggestions (I didn't add the necessary imports).
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_data = self._search_regex(
    r'\bvar dl = ({[^\n]+});',
Relax the RE, and unless you set the `s` flag, `[^\n]` is implied by `.`:
- r'\bvar dl = ({[^\n]+});',
+ r'\bvar\s+dl\s*=\s*({.+});',
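The effect of the relaxed pattern can be checked with the standard `re` module alone. This is a minimal sketch: the page fragments below are made up for illustration, and only the `contentId` key is taken from the extractor code.

```python
import json
import re

# Original pattern: requires exactly one space on each side of `=`.
STRICT = r'\bvar dl = ({[^\n]+});'
# Suggested pattern: tolerates any whitespace around `dl` and `=`.
RELAXED = r'\bvar\s+dl\s*=\s*({.+});'

# Hypothetical page fragments; the real page markup may differ.
page_compact = 'var dl={"contentId": "abc"};\nvar x = 1;'
page_spaced = 'var dl = {"contentId": "abc"};\nvar x = 1;'

strict_on_compact = re.search(STRICT, page_compact)
relaxed_on_compact = re.search(RELAXED, page_compact)
relaxed_on_spaced = re.search(RELAXED, page_spaced)

# Without re.DOTALL, `.` stops at the newline just like `[^\n]` does,
# so the capture never runs into the following line.
player_data = json.loads(relaxed_on_compact.group(1))
```

The strict pattern misses the compact spelling, while the relaxed one matches both and still captures only the object literal on the first line.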
    webpage, 'player data', default=None)
if not player_data:
    raise ExtractorError('Could not find player data')
This would have the same effect, since `_search_regex` raises an error by itself when nothing matches and no `default` is given:
- webpage, 'player data', default=None)
- if not player_data:
- raise ExtractorError('Could not find player data')
+ webpage, 'player data')
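To illustrate why the explicit check is redundant, here is a simplified stand-in for the helper's default handling. It is not youtube-dl's implementation: `search_regex`, `NO_DEFAULT` and the `ExtractorError` class below are local to this sketch, and the real method takes more parameters and words its error differently.

```python
import re

NO_DEFAULT = object()  # sentinel meaning "the caller passed no default"


class ExtractorError(Exception):
    """Local stand-in for youtube-dl's ExtractorError."""


def search_regex(pattern, string, name, default=NO_DEFAULT):
    """Return group 1 on a match, the default if one was given, else raise."""
    m = re.search(pattern, string)
    if m:
        return m.group(1)
    if default is not NO_DEFAULT:
        return default
    raise ExtractorError('Unable to extract {0}'.format(name))


pattern = r'\bvar\s+dl\s*=\s*({.+});'

# With default=None a miss is silent, so the caller must check and raise.
silent = search_regex(pattern, '<html></html>', 'player data', default=None)

# Without a default the helper raises on its own.
try:
    search_regex(pattern, '<html></html>', 'player data')
    raised = False
except ExtractorError as e:
    raised = True
    message = str(e)
```

Dropping `default=None` therefore leaves the error path to the helper and removes three lines from the extractor.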
if bundle not in ('episode', 'serialPart'):
    raise ExtractorError('Unsupported entity: {0}'.format(bundle))

url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(player_data['contentId'])
- url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(player_data['contentId'])
+ url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(audio_id)
'format_id': '-'.join(('mp3', str(bitrate))),
'vcodec': 'none',
'abr': bitrate,
'tbr': bitrate,
Don't return `tbr`. With the two previous items (`vcodec` and `abr`) set, it should be calculated automatically; if it isn't, that is something we need to fix.
- 'tbr': bitrate,
'url': link['url'],
'protocol': m.group('proto'),
'ext': m.group('ext'),
'format_id': '-'.join(('mp3', str(bitrate))),
Use `compat_str`, not `str`, in yt-dl; but here just use string formatting:
- 'format_id': '-'.join(('mp3', str(bitrate))),
+ 'format_id': 'mp3-{0}'.format('NA' if bitrate is None else bitrate),
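A quick comparison of the two spellings, as a sketch in plain Python; the `make_format_id` wrapper is only there for the demonstration and is not part of the extractor.

```python
def make_format_id(bitrate):
    # 'NA' marks an unknown bitrate instead of leaking 'None' into the id.
    return 'mp3-{0}'.format('NA' if bitrate is None else bitrate)


ids = [make_format_id(b) for b in (128, 256, None)]

# The original join-based spelling, for contrast, with a missing bitrate.
old_id_for_missing = '-'.join(('mp3', str(None)))
```

With a missing bitrate the original expression yields `mp3-None`, whereas the suggested one yields `mp3-NA`.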
return {
    'id': audio_id,
    'title': attr['title'],
    'description': clean_html(attr['description']),
Not a mandatory field:
- 'description': clean_html(attr['description']),
+ 'description': clean_html(attr.get('description')),
url = link['url']
m = re.search(
    r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$',
    url)
bitrate = link['bitrate']
formats.append({
    'url': link['url'],
    'protocol': m.group('proto'),
    'ext': m.group('ext'),
Use the library functions to parse a URL; also, `bitrate` is going to be `abr`, which is not a required field:
- url = link['url']
- m = re.search(
- r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$',
- url)
- bitrate = link['bitrate']
- formats.append({
- 'url': link['url'],
- 'protocol': m.group('proto'),
- 'ext': m.group('ext'),
+ parsed_url = compat_urlparse.urlparse(link_url)
+ bitrate = int_or_none(link.get('bitrate'))
+ formats.append({
+ 'url': link_url,
+ 'protocol': parsed_url.scheme,
+ 'ext': determine_ext(parsed_url.path),
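One way to see the benefit of a real URL parser is to run both approaches on a URL with a query string. This sketch uses only the Python 3 standard library: `guess_ext` is a rough, hypothetical stand-in for `determine_ext`, and the media URL is invented, so the real API's URLs may behave differently.

```python
import posixpath
import re
from urllib.parse import urlparse


def guess_ext(path, default=None):
    """Rough stand-in for determine_ext: extension of the last path segment."""
    ext = posixpath.splitext(path)[1].lstrip('.')
    return ext or default


# Hypothetical media URL with a query string.
link_url = 'https://example.invalid/audio/episode-123.mp3?token=abc'

# Parser-based approach from the suggestion.
parsed_url = urlparse(link_url)
protocol = parsed_url.scheme
ext = guess_ext(parsed_url.path)

# Regex from the original patch, for contrast.
m = re.search(
    r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$',
    link_url)
regex_ext = m.group('ext')
```

On this input the parser separates the query from the path, while the regex folds the query string into the extension.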
for link in attr['audioLinks']:
    variant = link['variant']
More robust; maybe also validate the media URL? Then use `link_url` later.
- for link in attr['audioLinks']:
- variant = link['variant']
+ for link in traverse_obj(attr, ('audioLinks', Ellipsis), expected_type=dict):
+ link_url = url_or_none(link.get('url'))
+ if not link_url:
+ continue
+ variant = link.get('variant')
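The defensive loop can be approximated in plain Python to show which entries it skips. This is a sketch under stated assumptions: `iter_audio_links` is a hypothetical helper, its URL check is a crude substitute for `url_or_none` (which accepts more schemes), and the payload is invented.

```python
from urllib.parse import urlparse


def iter_audio_links(attr):
    """Yield (url, link) for entries that are dicts with a usable URL."""
    links = attr.get('audioLinks')
    if not isinstance(links, list):
        return
    for link in links:
        if not isinstance(link, dict):
            continue
        link_url = link.get('url')
        # Crude stand-in for url_or_none: a string with an http(s) scheme.
        if not isinstance(link_url, str):
            continue
        if urlparse(link_url).scheme not in ('http', 'https'):
            continue
        yield link_url, link


# Hypothetical API payload with some malformed entries.
attr = {'audioLinks': [
    {'url': 'https://example.invalid/a.mp3', 'variant': 'mp3', 'bitrate': 128},
    {'variant': 'hls'},    # no URL
    None,                  # not a dict
    {'url': 'not a url'},  # fails the URL check
]}

usable = [(u, link.get('variant')) for u, link in iter_audio_links(attr)]
empty = list(iter_audio_links({}))
```

Malformed entries and a missing `audioLinks` key are skipped quietly instead of raising `KeyError` or `TypeError` in the middle of extraction.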
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
The online archive at www.mujrozhlas.cz provides access to a comprehensive collection of audio material created by Český rozhlas (Czech Radio), an official public broadcasting company in Czechia.