Add extractor for www.mujrozhlas.cz #31513

Open. ptesarik wants to merge 1 commit into base: master.

@ptesarik ptesarik commented Feb 5, 2023

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

The online archive at www.mujrozhlas.cz provides access to a comprehensive collection of audio material created by Český rozhlas (Czech Radio), an official public broadcasting company in Czechia.

Signed-off-by: Petr Tesarik <petr@tesarici.cz>

dirkf commented Feb 6, 2023

This seems to be a new site. Please create an issue for it using the site support template, or paste the completed template here. But I'm pretty sure the site will be fine to support.

@dirkf dirkf left a comment

Thanks for your work!

I've made some suggestions (I didn't add the necessary imports).

display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_data = self._search_regex(
    r'\bvar dl = ({[^\n]+});',

Relax the RE to tolerate whitespace variations. Also, unless the s (DOTALL) flag is set, . already excludes newlines, so [^\n] is redundant:

Suggested change
-     r'\bvar dl = ({[^\n]+});',
+     r'\bvar\s+dl\s*=\s*({.+});',
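
For illustration, here is a minimal sketch of the difference between the two patterns, using only Python's `re` and `json` modules. The sample page string and its `contentId` value are invented for this example and are not taken from the site; `find_player_data` is a hypothetical helper, not part of youtube-dl.

```python
import json
import re

STRICT = r'\bvar dl = ({[^\n]+});'
RELAXED = r'\bvar\s+dl\s*=\s*({.+});'

# Invented sample: irregular spacing around the assignment, then a second line.
PAGE = 'var  dl={"contentId": "abc"};\nvar other = {"x": 1};\n'


def find_player_data(pattern, page):
    """Return the JSON object captured by group 1 of pattern, or None."""
    m = re.search(pattern, page)
    return json.loads(m.group(1)) if m else None


# The strict pattern insists on exactly one space around "dl" and "=".
print(find_player_data(STRICT, PAGE))
# The relaxed pattern accepts any spacing; "." still stops at the newline.
print(find_player_data(RELAXED, PAGE))
```

Because `.` does not match a newline by default, the greedy `.+` cannot run past the end of the first line, which is the same guarantee `[^\n]+` gave.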

Comment on lines +49 to +51
    webpage, 'player data', default=None)
if not player_data:
    raise ExtractorError('Could not find player data')

This would have the same effect:

Suggested change
-     webpage, 'player data', default=None)
- if not player_data:
-     raise ExtractorError('Could not find player data')
+     webpage, 'player data')
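
The reasoning here is that `_search_regex` already raises an error derived from `ExtractorError` when nothing matches and no `default` is given. The following is a simplified stand-in written for this note, not the youtube-dl implementation: the class and function names mirror the real ones, but the body is an assumption-level sketch that only models the `default` behaviour.

```python
import re


class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError."""


class RegexNotFoundError(ExtractorError):
    """Stand-in for the error raised when a pattern is not found."""


NO_DEFAULT = object()  # sentinel meaning "caller supplied no default"


def search_regex(pattern, string, name, default=NO_DEFAULT):
    """Simplified model: return group 1, else the default, else raise."""
    m = re.search(pattern, string)
    if m:
        return m.group(1)
    if default is not NO_DEFAULT:
        return default
    raise RegexNotFoundError('Unable to extract {0}'.format(name))


def lookup_raises(page):
    """Report whether a lookup without a default raises ExtractorError."""
    try:
        search_regex(r'\bvar\s+dl\s*=\s*({.+});', page, 'player data')
    except ExtractorError:
        return True
    return False
```

With this model, passing `default=None` and then raising by hand duplicates what the call without a default does on its own.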

if bundle not in ('episode', 'serialPart'):
    raise ExtractorError('Unsupported entity: {0}'.format(bundle))

url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(player_data['contentId'])

Suggested change
- url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(player_data['contentId'])
+ url = 'https://api.mujrozhlas.cz/episodes/{0}'.format(audio_id)

'format_id': '-'.join(('mp3', str(bitrate))),
'vcodec': 'none',
'abr': bitrate,
'tbr': bitrate,

Don't return tbr. With 'vcodec': 'none' and 'abr' already set, it should be calculated automatically; if it isn't, we need to fix that.

Suggested change
- 'tbr': bitrate,

'url': link['url'],
'protocol': m.group('proto'),
'ext': m.group('ext'),
'format_id': '-'.join(('mp3', str(bitrate))),

Use compat_str, not str in yt-dl; but here just use string formatting:

Suggested change
- 'format_id': '-'.join(('mp3', str(bitrate))),
+ 'format_id': 'mp3-{0}'.format('NA' if bitrate is None else bitrate),
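
A small sketch of why the formatting version is preferable once the bitrate may be missing. `format_id_for` and `format_id_joined` are hypothetical helper names used only for this illustration.

```python
def format_id_for(bitrate):
    """Build the format id, using 'NA' when no bitrate is known."""
    return 'mp3-{0}'.format('NA' if bitrate is None else bitrate)


def format_id_joined(bitrate):
    """The original approach: str() turns a missing bitrate into 'None'."""
    return '-'.join(('mp3', str(bitrate)))


print(format_id_for(128))      # numeric bitrate is formatted directly
print(format_id_for(None))     # missing bitrate gets the 'NA' placeholder
print(format_id_joined(None))  # the join variant leaks the text 'None'
```

Using `str.format` also sidesteps the `str` versus `compat_str` question, since no explicit conversion call is needed.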

return {
    'id': audio_id,
    'title': attr['title'],
    'description': clean_html(attr['description']),

Not a mandatory field:

Suggested change
- 'description': clean_html(attr['description']),
+ 'description': clean_html(attr.get('description')),

Comment on lines +74 to +82
url = link['url']
m = re.search(
    r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$',
    url)
bitrate = link['bitrate']
formats.append({
    'url': link['url'],
    'protocol': m.group('proto'),
    'ext': m.group('ext'),

Use the library functions to parse the URL. Also, bitrate ends up as abr, which is not a required field, so don't assume it is present:

Suggested change
- url = link['url']
- m = re.search(
-     r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$',
-     url)
- bitrate = link['bitrate']
- formats.append({
-     'url': link['url'],
-     'protocol': m.group('proto'),
-     'ext': m.group('ext'),
+ parsed_url = compat_urlparse.urlparse(link_url)
+ bitrate = int_or_none(link.get('bitrate'))
+ formats.append({
+     'url': link_url,
+     'protocol': parsed_url.scheme,
+     'ext': determine_ext(parsed_url.path),
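
To show what URL parsing buys over the hand-written regex, here is a minimal sketch using the Python 3 standard library. `urllib.parse.urlparse` stands in for `compat_urlparse.urlparse`, and `os.path.splitext` is only a rough approximation of `determine_ext`, not its actual logic. The example URL and the helper names are invented.

```python
import os.path
import re
from urllib.parse import urlparse

ORIGINAL_RE = r'(?P<proto>[^:]+):(?:.*/)*(?P<id>[^.]+)\.(?P<ext>[^/.]+)$'

# Invented URL with a query string, which the regex does not anticipate.
SAMPLE_URL = 'https://example.com/a/b/episode.mp3?token=1'


def scheme_and_ext(link_url):
    """Split the URL first, then take the extension from the path only."""
    parsed = urlparse(link_url)
    ext = os.path.splitext(parsed.path)[1].lstrip('.') or None
    return parsed.scheme, ext


def regex_ext(link_url):
    """Extension as the original regex would report it, or None."""
    m = re.search(ORIGINAL_RE, link_url)
    return m.group('ext') if m else None


print(scheme_and_ext(SAMPLE_URL))  # extension comes from the path alone
print(regex_ext(SAMPLE_URL))       # the query string leaks into the extension
```

The regex also returns no match at all for a URL without a dot in its last segment, in which case `m.group(...)` would fail on `None`.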

Comment on lines +64 to +65
for link in attr['audioLinks']:
    variant = link['variant']

More robust, and also validate the media URL? Then use link_url later.

Suggested change
- for link in attr['audioLinks']:
-     variant = link['variant']
+ for link in traverse_obj(attr, ('audioLinks', Ellipsis), expected_type=dict):
+     link_url = url_or_none(link.get('url'))
+     if not link_url:
+         continue
+     variant = link.get('variant')
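
The defensive pattern in this suggestion can be sketched in plain Python, without the youtube-dl helpers. This is an approximation under stated assumptions: `iter_audio_links` is a hypothetical name, the URL check is a simplified substitute for `url_or_none`, and only list-valued `audioLinks` are handled, whereas `traverse_obj` with `Ellipsis` is more general. The sample data is invented.

```python
import re


def iter_audio_links(attr):
    """Yield (url, variant, bitrate) for each usable audio link.

    Entries that are not dicts, or that lack a plausible URL, are skipped
    instead of raising KeyError or TypeError.
    """
    links = attr.get('audioLinks')
    if not isinstance(links, list):
        return
    for link in links:
        if not isinstance(link, dict):
            continue
        link_url = link.get('url')
        # Simplified validity check: absolute http(s) or scheme-relative URL.
        if not isinstance(link_url, str) or not re.match(r'(?:https?:)?//', link_url):
            continue
        yield link_url, link.get('variant'), link.get('bitrate')


# Invented sample mixing one good entry with several malformed ones.
SAMPLE_ATTR = {
    'audioLinks': [
        {'url': 'https://example.com/a.mp3', 'variant': 'mp3', 'bitrate': 128},
        {'variant': 'hls'},   # no URL
        'not-a-dict',         # wrong type
        {'url': 'garbage'},   # not a URL
    ],
}

for entry in iter_audio_links(SAMPLE_ATTR):
    print(entry)
```

A missing or malformed `audioLinks` value then yields no formats rather than aborting the extraction with an unrelated exception.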
