[ie/joqrag] Add extractor #8384

Merged
merged 19 commits into from Dec 19, 2023


pzhlkj6612
Contributor

@pzhlkj6612 pzhlkj6612 commented Oct 19, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Hi! I would like to add yt-dlp support for the radio station "超!A&G+" on Nippon Cultural Broadcasting (JOQR).

Please do not confuse it with Radiko (radiko.py).

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 4f478a9

Summary

📻🇯🇵🎙️

Add support for extracting live streams from JOQR radio station. Add joqrag.py module with JoqrAgIE class and import it in _extractors.py.

JoqrAgIE class
Extracts anime radio streams
Autumn leaves flutter

Walkthrough

  • Implement live stream extraction for JOQR radio station
    • Import JoqrAgIE class from joqrag.py module in _extractors.py
    • Define JoqrAgIE class in joqrag.py module that inherits from InfoExtractor base class
    • Override suitable and ie_key methods to match the JOQR website url and return the extractor key
    • Override _real_extract method to parse the webpage and extract the metadata and the m3u8 url using self._html_search_regex and self._extract_m3u8_formats methods
    • Return a dictionary with the extracted information, such as id, title, formats, is_live, etc.
    • Add a test case for the extractor in joqrag.py module that checks the id, title, ext, and is_live fields of the extracted information

Support radio "超!A&G+" on Nippon Cultural Broadcasting (JOQR).
@garret1317 garret1317 added the site-request Request to support a new website label Oct 20, 2023
@pzhlkj6612 pzhlkj6612 marked this pull request as draft October 20, 2023 10:35
pzhlkj6612 and others added 5 commits October 21, 2023 01:51
Co-authored-by: garret <garret1317@yandex.com>
Co-authored-by: garret <garret1317@yandex.com>
Co-authored-by: garret <garret1317@yandex.com>
Co-authored-by: garret <garret1317@yandex.com>
Co-authored-by: garret <garret1317@yandex.com>
@pzhlkj6612 pzhlkj6612 marked this pull request as ready for review October 20, 2023 19:02
@bashonly bashonly self-requested a review October 21, 2023 13:51
@pzhlkj6612
Contributor Author

Hi there, I'm going to add a change:

The "joqrag" stream goes off air at night and plays a looping video of the station logo instead. I think the extractor should not download that loop; it should treat the stream as an upcoming live ("is_upcoming") and call "raise_no_formats()".

I'm converting this PR to a draft for now.

@pzhlkj6612 pzhlkj6612 marked this pull request as draft October 21, 2023 14:13
@pzhlkj6612 pzhlkj6612 marked this pull request as ready for review October 21, 2023 19:13
@bashonly bashonly added the pending-fixes PR has had changes requested label Nov 15, 2023
@pzhlkj6612 pzhlkj6612 marked this pull request as draft November 16, 2023 14:26
pzhlkj6612 and others added 4 commits November 28, 2023 05:47
- Order of imports
- More readable url regex
- More strict and safer quotes matching
- Shorter inner function name
- Using single quotes
- More lenient start time regex
- Inlined description
- More readable code for constructing message
- Raising exceptions if no m3u8 formats

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
@pzhlkj6612 pzhlkj6612 marked this pull request as ready for review November 28, 2023 07:53
Comment on lines 15 to 16
_VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/player\.php',
              r'https?://www\.uniqueradio\.jp/agplayer5/inc-player-hls\.php',
Member

Suggested change
- _VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/player\.php',
-               r'https?://www\.uniqueradio\.jp/agplayer5/inc-player-hls\.php',
+ _VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php',
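To illustrate why the single alternation pattern is equivalent to the two separate ones, here is a minimal standalone sketch. It uses only the standard `re` module rather than yt-dlp itself; the helper name `is_supported` and the third test URL are hypothetical, introduced only for this example.

```python
import re

# Combined pattern from the suggested change, taken out of its list context.
VALID_URL = r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php'


def is_supported(url):
    """Hypothetical helper: True if the URL matches the combined pattern."""
    return re.match(VALID_URL, url) is not None


# Both player pages are covered by the one pattern.
print(is_supported('https://www.uniqueradio.jp/agplayer5/player.php'))
print(is_supported('https://www.uniqueradio.jp/agplayer5/inc-player-hls.php'))
# A page outside the alternation is rejected.
print(is_supported('https://www.uniqueradio.jp/agplayer5/other.php'))
```

The non-capturing group `(?:...)` keeps the alternation local to the file name, so the shared prefix and the `\.php` suffix are written only once.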

Comment on lines 47 to 50
def _extract_metadata(self, variable, html, name):
    return clean_html(urllib.parse.unquote_plus(self._search_regex(
        rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
        html, name, group='value', default=''))) or None
Member

it's always non-fatal so the name param doesn't really matter

Suggested change
- def _extract_metadata(self, variable, html, name):
-     return clean_html(urllib.parse.unquote_plus(self._search_regex(
-         rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
-         html, name, group='value', default=''))) or None
+ def _extract_metadata(self, variable, html):
+     return clean_html(urllib.parse.unquote_plus(self._search_regex(
+         rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+         html, 'metadata', group='value', default=''))) or None
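For readers unfamiliar with the quote-matching regex in this helper, the following standalone sketch shows the same idea using only the standard library. It is not the extractor's code: `extract_js_string` is a hypothetical name, the tag-stripping step is a crude stand-in for yt-dlp's `clean_html`, and the sample page strings are invented for illustration.

```python
import html
import re
import urllib.parse


def extract_js_string(variable, page):
    """Return the decoded value of `var <variable> = '...'`, or None if absent."""
    # (["\']) captures the opening quote; (?:(?!\1).)+ consumes characters
    # until that same quote character reappears, so the other quote type
    # may occur inside the value.
    match = re.search(
        rf'var\s+{re.escape(variable)}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
        page)
    if not match:
        return None
    decoded = urllib.parse.unquote_plus(match.group('value'))
    # Crude stand-in for clean_html (assumption): drop tags, unescape entities.
    text = html.unescape(re.sub(r'<[^>]+>', '', decoded)).strip()
    return text or None


# Invented sample input: a percent-encoded value in double quotes.
print(extract_js_string('Program_name', 'var Program_name = "Morning%20Show+%26+Talk";'))
# A variable that does not appear in the page yields None.
print(extract_js_string('Program_text', 'var Other = "x";'))
```

Returning `None` for a missing or empty value mirrors the `default=''` followed by `or None` in the reviewed helper, which is why the lookup never needs a descriptive name for an error message.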

metadata = self._download_webpage(
    'https://www.uniqueradio.jp/aandg', video_id,
    note='Downloading metadata', errnote='Failed to download metadata')
title = self._extract_metadata('Program_name', metadata, 'program title')
Member

Suggested change
- title = self._extract_metadata('Program_name', metadata, 'program title')
+ title = self._extract_metadata('Program_name', metadata)

'id': video_id,
'title': title,
'channel': '超!A&G+',
'description': self._extract_metadata('Program_text', metadata, 'program description'),
Member

Suggested change
- 'description': self._extract_metadata('Program_text', metadata, 'program description'),
+ 'description': self._extract_metadata('Program_text', metadata),

Comment on lines 56 to 62
start_time = self._search_regex(
    r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(?P<time>\d{1,2}:\d{1,2})',
    self._download_webpage(
        f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
        note=f'Downloading program list of {date}', fatal=False,
        errnote=f'Failed to download program list of {date}') or '',
    'start time of the first program', default=None, group='time')
Member

could make this a little less busy

Suggested change
- start_time = self._search_regex(
-     r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(?P<time>\d{1,2}:\d{1,2})',
-     self._download_webpage(
-         f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
-         note=f'Downloading program list of {date}', fatal=False,
-         errnote=f'Failed to download program list of {date}') or '',
-     'start time of the first program', default=None, group='time')
+ start_time = self._search_regex(
+     r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(\d{1,2}:\d{1,2})',
+     self._download_webpage(
+         f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
+         note=f'Downloading program list of {date}', fatal=False,
+         errnote=f'Failed to download program list of {date}') or '',
+     'start time', default=None)
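As a rough illustration of what the simplified pattern captures, here is a standalone sketch using plain `re`. The function name `find_time` and the sample markup are hypothetical; the real program-list page may be structured differently. The pattern skips the first time and the en dash in the header and captures the time that follows the dash.

```python
import re

# Pattern from the suggestion, with the named group replaced by a plain group.
TIME_RE = (
    r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>'
    r'[\s\d:]+–\s*(\d{1,2}:\d{1,2})')


def find_time(page):
    """Return the time after the dash in the first matching header, or None."""
    # `page or ''` mirrors the `or ''` fallback used when the download fails.
    match = re.search(TIME_RE, page or '')
    return match.group(1) if match else None


# Invented sample markup, for illustration only.
SAMPLE = '<h3 class="dailyProgram-itemHeaderTime">5:00 – 6:00</h3>'
print(find_time(SAMPLE))
print(find_time(None))
```

With only one capture group in the pattern, the group can be selected positionally, which is what lets the suggestion drop the `group='time'` argument.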

@bashonly bashonly removed the pending-fixes PR has had changes requested label Dec 12, 2023
@pzhlkj6612 pzhlkj6612 marked this pull request as draft December 12, 2023 14:34
pzhlkj6612 and others added 2 commits December 12, 2023 23:14
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
@pzhlkj6612 pzhlkj6612 marked this pull request as ready for review December 12, 2023 15:25
@bashonly bashonly merged commit db8b4ed into yt-dlp:master Dec 19, 2023
15 checks passed
@pzhlkj6612 pzhlkj6612 deleted the joqr branch December 19, 2023 14:28
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024