[ie/joqrag] Add extractor #8384

pzhlkj6612 · 2023-10-19T10:55:32Z (edited by ghost)

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Hi! I would like yt-dlp to support the radio service "超!A&G+" on Nippon Cultural Broadcasting (JOQR).

Please do not confuse it with Radiko (radiko.py).

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 4f478a9`

Summary

📻🇯🇵🎙️

Add support for extracting live streams from JOQR radio station. Add joqrag.py module with JoqrAgIE class and import it in _extractors.py.

JoqrAgIE class
Extracts anime radio streams
Autumn leaves flutter

Walkthrough

Implement live stream extraction for JOQR radio station
- Import JoqrAgIE class from joqrag.py module in _extractors.py
- Define JoqrAgIE class in joqrag.py module that inherits from InfoExtractor base class
- Override suitable and ie_key methods to match the JOQR website url and return the extractor key
- Override _real_extract method to parse the webpage and extract the metadata and the m3u8 url using self._html_search_regex and self._extract_m3u8_formats methods
- Return a dictionary with the extracted information, such as id, title, formats, is_live, etc.
- Add a test case for the extractor in joqrag.py module that checks the id, title, ext, and is_live fields of the extracted information

Support radio "超!A&G+" on Nippon Cultural Broadcasting (JOQR).

yt_dlp/extractor/joqrag.py

Co-authored-by: garret <garret1317@yandex.com>

pzhlkj6612 · 2023-10-21T14:13:52Z

Hi there, I'm going to add a change:

The "joqrag" site stops broadcasting at night and plays a looping video of its logo instead. I think the extractor should not download that stream; it should treat it as an upcoming live stream (`live_status: 'is_upcoming'`) and call `raise_no_formats()`.

Converting this PR to a draft for now.

yt_dlp/extractor/joqrag.py

Daily programs list: http://www.joqr.co.jp/qr/agdailyprogram/?date=%Y%m%d
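The list is keyed by the calendar day in Japan, so the date should be taken in JST (UTC+9) rather than the local or UTC day. A small sketch, assuming a timezone-aware input; `program_list_url` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, UTC+9, no DST


def program_list_url(when: datetime) -> str:
    """Build the daily program list URL for the JST calendar day of `when`."""
    day = when.astimezone(JST).strftime('%Y%m%d')
    return f'http://www.joqr.co.jp/qr/agdailyprogram/?date={day}'


# 2023-10-21 20:00 UTC is already 2023-10-22 05:00 in Tokyo
print(program_list_url(datetime(2023, 10, 21, 20, 0, tzinfo=timezone.utc)))
# http://www.joqr.co.jp/qr/agdailyprogram/?date=20231022
```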

Co-authored-by: bashonly <bashonly@bashonly.com>

A failed non-fatal download returns False, which makes _search_regex() fail with: TypeError: expected string or bytes-like object
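A standalone illustration of that failure and of the `or ''` fallback used later in the PR. Plain `re.search` stands in for `_search_regex()`; the time regex and the `search_time` helper are hypothetical:

```python
import re

TIME_RE = r'(\d{1,2}:\d{1,2})'


def search_time(page):
    """Search a downloaded page; `page` may be False after a non-fatal failure."""
    # `page or ''` turns False into '', so re.search() simply finds nothing
    m = re.search(TIME_RE, page or '')
    return m.group(1) if m else None


try:
    re.search(TIME_RE, False)  # what happens without the fallback
except TypeError as err:
    print('TypeError:', err)

print(search_time(False))           # None
print(search_time('24:00 – 5:00'))  # 24:00
```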

yt_dlp/extractor/joqrag.py

- Order of imports
- More readable URL regex
- Stricter and safer quote matching
- Shorter inner function name
- Using single quotes
- More lenient start time regex
- Inlined description
- More readable code for constructing message
- Raising exceptions if no m3u8 formats

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

bashonly · 2023-12-12T00:37:52Z

yt_dlp/extractor/joqrag.py

+    _VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/player\.php',
+                  r'https?://www\.uniqueradio\.jp/agplayer5/inc-player-hls\.php',


Suggested change

-    _VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/player\.php',
-                  r'https?://www\.uniqueradio\.jp/agplayer5/inc-player-hls\.php',
+    _VALID_URL = [r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php',
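A quick standalone check that the combined pattern accepts both player URLs and rejects an unrelated page. It uses plain `re.match`, which anchors only at the start of the string; `matches` is a hypothetical helper and `other.php` is a made-up counterexample:

```python
import re

# Combined pattern from the suggestion; (?:...) is a non-capturing group
PLAYER_URL_RE = r'https?://www\.uniqueradio\.jp/agplayer5/(?:player|inc-player-hls)\.php'


def matches(url):
    return re.match(PLAYER_URL_RE, url) is not None


for url in (
    'https://www.uniqueradio.jp/agplayer5/player.php',
    'https://www.uniqueradio.jp/agplayer5/inc-player-hls.php',
    'https://www.uniqueradio.jp/agplayer5/other.php',
):
    print(url, matches(url))
```

The alternation keeps a single pattern per path prefix, which is easier to read than two near-identical entries in the list.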

bashonly · 2023-12-12T00:38:54Z

yt_dlp/extractor/joqrag.py

+    def _extract_metadata(self, variable, html, name):
+        return clean_html(urllib.parse.unquote_plus(self._search_regex(
+            rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+            html, name, group='value', default=''))) or None


It's always non-fatal, so the `name` param doesn't really matter.

Suggested change

-    def _extract_metadata(self, variable, html, name):
-        return clean_html(urllib.parse.unquote_plus(self._search_regex(
-            rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
-            html, name, group='value', default=''))) or None
+    def _extract_metadata(self, variable, html):
+        return clean_html(urllib.parse.unquote_plus(self._search_regex(
+            rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+            html, 'metadata', group='value', default=''))) or None
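A standalone sketch of the revised helper. `strip_tags` is a crude stand-in for yt-dlp's `clean_html`, and the sample page is made up. The point is the backreference `\1`, which makes the value end at the same kind of quote it started with, and the empty default, which turns a missing variable into `None` instead of an error:

```python
import re
import urllib.parse


def strip_tags(text):
    # Crude stand-in for yt_dlp.utils.clean_html (assumption: tags only)
    return re.sub(r'<[^>]+>', '', text).strip()


def extract_metadata(variable, html):
    """Return the value of `var <variable> = '...'` in a page, or None."""
    m = re.search(
        rf'var\s+{variable}\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', html)
    value = m.group('value') if m else ''  # like default='' in _search_regex
    return strip_tags(urllib.parse.unquote_plus(value)) or None


page = "var Program_name = 'A%26G+Show';\nvar Program_text = \"<b>Hello</b>+world\";"
print(extract_metadata('Program_name', page))  # A&G Show
print(extract_metadata('Program_text', page))  # Hello world
print(extract_metadata('Missing', page))       # None
```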

bashonly · 2023-12-12T00:39:14Z

yt_dlp/extractor/joqrag.py

+        metadata = self._download_webpage(
+            'https://www.uniqueradio.jp/aandg', video_id,
+            note='Downloading metadata', errnote='Failed to download metadata')
+        title = self._extract_metadata('Program_name', metadata, 'program title')


Suggested change

-        title = self._extract_metadata('Program_name', metadata, 'program title')
+        title = self._extract_metadata('Program_name', metadata)

bashonly · 2023-12-12T00:39:35Z

yt_dlp/extractor/joqrag.py

+            'id': video_id,
+            'title': title,
+            'channel': '超!A&G+',
+            'description': self._extract_metadata('Program_text', metadata, 'program description'),


Suggested change

-            'description': self._extract_metadata('Program_text', metadata, 'program description'),
+            'description': self._extract_metadata('Program_text', metadata),

bashonly · 2023-12-12T00:42:40Z

yt_dlp/extractor/joqrag.py

+            start_time = self._search_regex(
+                r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(?P<time>\d{1,2}:\d{1,2})',
+                self._download_webpage(
+                    f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
+                    note=f'Downloading program list of {date}', fatal=False,
+                    errnote=f'Failed to download program list of {date}') or '',
+                'start time of the first program', default=None, group='time')


could make this a little less busy

Suggested change

-            start_time = self._search_regex(
-                r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(?P<time>\d{1,2}:\d{1,2})',
-                self._download_webpage(
-                    f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
-                    note=f'Downloading program list of {date}', fatal=False,
-                    errnote=f'Failed to download program list of {date}') or '',
-                'start time of the first program', default=None, group='time')
+            start_time = self._search_regex(
+                r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(\d{1,2}:\d{1,2})',
+                self._download_webpage(
+                    f'https://www.joqr.co.jp/qr/agdailyprogram/?date={date}', video_id,
+                    note=f'Downloading program list of {date}', fatal=False,
+                    errnote=f'Failed to download program list of {date}') or '',
+                'start time', default=None)
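A standalone sketch of turning that match into a `release_timestamp`. The regex captures the first `H:MM` that follows an en dash (`–`) in a `dailyProgram-itemHeaderTime` heading, which the PR treats as the start time. This sketch assumes that value is a JST wall-clock time on the listed day; the sample markup and the `start_timestamp` helper are hypothetical, and the real extractor may convert differently (e.g. via yt-dlp's `unified_timestamp`):

```python
import re
from datetime import date, datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))

# Regex from the suggestion: capture the time after the en dash (–)
START_RE = r'<h3[^>]+\bclass="dailyProgram-itemHeaderTime"[^>]*>[\s\d:]+–\s*(\d{1,2}:\d{1,2})'


def start_timestamp(html, day):
    """Return a Unix timestamp for the captured time on JST date `day`, or None."""
    m = re.search(START_RE, html or '')  # `or ''` tolerates a failed download
    if not m:
        return None
    hour, minute = map(int, m.group(1).split(':'))
    # timedelta addition also copes with "24:00"-style times
    start = datetime(day.year, day.month, day.day, tzinfo=JST) + timedelta(hours=hour, minutes=minute)
    return int(start.timestamp())


# Hypothetical markup; the real page layout may differ
sample = '<h3 class="dailyProgram-itemHeaderTime">5:00 – 6:00</h3>'
ts = start_timestamp(sample, date(2023, 10, 22))
print(datetime.fromtimestamp(ts, JST).isoformat())  # 2023-10-22T06:00:00+09:00
```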

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

Authored by: pzhlkj6612

[ie/joqrag] Add extractor

4f478a9

Support radio "超!A&G+" on Nippon Cultural Broadcasting (JOQR).

garret1317 suggested changes Oct 20, 2023

garret1317 added the site-request label (Request to support a new website) Oct 20, 2023

pzhlkj6612 marked this pull request as draft October 20, 2023 10:35

pzhlkj6612 and others added 5 commits October 21, 2023 01:51

[ie/joqrag] Add channel; clean title

988329c

Co-authored-by: garret <garret1317@yandex.com>

[ie/joqrag] Extract text from title and description in HTML

d5c6e48

Co-authored-by: garret <garret1317@yandex.com>

[ie/joqrag] Join url segments with urljoin

e9f8f3e

Co-authored-by: garret <garret1317@yandex.com>
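Why `urljoin` rather than string concatenation: it resolves a root-relative path against the host instead of producing a double slash, and leaves absolute URLs alone. A sketch using the standard library's `urllib.parse.urljoin` as a stand-in for `yt_dlp.utils.urljoin`; the m3u8 paths and the `m3u8_url` helper are made up:

```python
from urllib.parse import urljoin

BASE = 'https://www.uniqueradio.jp/'


def m3u8_url(src):
    """Resolve a (hypothetical) <source src> value against the site root."""
    return urljoin(BASE, src)


print(BASE + '/hls/live.m3u8')                        # https://www.uniqueradio.jp//hls/live.m3u8
print(m3u8_url('/hls/live.m3u8'))                     # https://www.uniqueradio.jp/hls/live.m3u8
print(m3u8_url('https://cdn.example.com/live.m3u8'))  # https://cdn.example.com/live.m3u8
```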

[ie/joqrag] Match multiple urls by pattern array

3c90db7

Co-authored-by: garret <garret1317@yandex.com>

[ie/joqrag] Fix download test; add only_matching tests

11eb240

Co-authored-by: garret <garret1317@yandex.com>

pzhlkj6612 marked this pull request as ready for review October 20, 2023 19:02

pzhlkj6612 requested a review from garret1317 October 20, 2023 19:18

bashonly self-requested a review October 21, 2023 13:51

pzhlkj6612 marked this pull request as draft October 21, 2023 14:13

garret1317 reviewed Oct 21, 2023

yt_dlp/extractor/joqrag.py (outdated, resolved)

pzhlkj6612 and others added 2 commits October 22, 2023 03:09

[ie/joqrag] Do not download if the stream has not started yet

c2c195d

Daily programs list: http://www.joqr.co.jp/qr/agdailyprogram/?date=%Y%m%d

[ie/joqrag] Extract code for metadata extraction into a function

dcc0c86

Co-authored-by: bashonly <bashonly@bashonly.com>

pzhlkj6612 marked this pull request as ready for review October 21, 2023 19:13

pzhlkj6612 added 4 commits October 22, 2023 03:14

[ie/joqrag] Fix string literal

b2920d0

[ie/joqrag] Add ability to find release_timestamp

0d5d575

[ie/joqrag] Remove useless datetime precision

2113492

[ie/joqrag] Non-fatal download in _search_regex is not reasonable

d63976e

A failed non-fatal download returns False, which makes _search_regex() fail with: TypeError: expected string or bytes-like object

bashonly requested changes Nov 15, 2023

bashonly added the pending-fixes label (PR has had changes requested) Nov 15, 2023

pzhlkj6612 marked this pull request as draft November 16, 2023 14:26

pzhlkj6612 and others added 4 commits November 28, 2023 05:47

[ie/joqrag] merge "master" branch

aa9b253

[ie/joqrag] Add the formerly known name

0207600

[ie/joqrag] Make program list request non-fatal

9361343

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

pzhlkj6612 marked this pull request as ready for review November 28, 2023 07:53

pzhlkj6612 requested a review from bashonly November 28, 2023 07:53

bashonly approved these changes Dec 12, 2023

bashonly removed the pending-fixes label (PR has had changes requested) Dec 12, 2023

pzhlkj6612 marked this pull request as draft December 12, 2023 14:34

pzhlkj6612 and others added 2 commits December 12, 2023 23:14

[ie/joqrag] combine regex; no variant metadata names; shorter func call

a330815

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>

[ie/joqrag] merge "master"

c839381

pzhlkj6612 marked this pull request as ready for review December 12, 2023 15:25

[ie/joqrag] "AGQR" is the old name

ef39027

bashonly merged commit db8b4ed into yt-dlp:master Dec 19, 2023
15 checks passed

pzhlkj6612 deleted the joqr branch December 19, 2023 14:28

pzhlkj6612 mentioned this pull request Apr 6, 2024

[ie/joqrag] Fix live_status check when "Program_name" is empty #9624

Merged

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/JoqrAg] Add extractor (yt-dlp#8384)

2640f43

Authored by: pzhlkj6612
