NHK World vods in Japanese exhibit list index out of range error #8303

Contik · 2023-10-07T15:42:47Z

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

I'm reporting that yt-dlp is broken on a supported site
I've verified that I'm running yt-dlp version 2023.10.07 (update instructions) or later (specify commit)
I've checked that all provided URLs are playable in a browser with the same IP and same login details
I've checked that all URLs and arguments with special characters are properly quoted or escaped
I've searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
I've read the guidelines for opening an issue
I've read about sharing account credentials and I'm willing to share it if required

Region

From anywhere outside Japan, as determined by the request's source IP address

Provide a description that is worded well enough to be understood

Per issue #8242 comment 1751738030:

Attempting to download NhkVod videos in Japanese at https://www3.nhk.or.jp/nhkworld/ja/... currently produces a list index out of range error. English-language videos at https://www3.nhk.or.jp/nhkworld/en/... do not exhibit the same behavior; they are working as of yesterday's merged pull request #8249 and yt-dlp version 2023.10.07.

For example, the daily noon and evening news videos in Japanese at https://www3.nhk.or.jp/nhkworld/ja/ondemand/video produce the verbose output attached below.

For context: these videos are intended for Japanese people living outside the country, so they can only be downloaded from outside Japan. The site sends different HTTP response bodies depending on whether it perceives the request's source IP address to be inside or outside Japan. From outside Japan, the page shows:

As I understand it, the videos highlighted with a red border have no retention period: NHK only ever offers the current day's video for download. These two examples are today's 7 pm news (ニュース7, a.k.a. nyusu 7) and noon news (正午のニュース, a.k.a. shogo no nyusu).

From "within" Japan you get:

This page basically asks you to use your NHK Plus account to watch a show you missed, or to get an NHK World Premium subscription.

Provide verbose output that clearly demonstrates the problem

Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
If using API, add 'verbose': True to YoutubeDL params instead
Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', 'https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0451269387/']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.10.07 [377e85a17]
[debug] Python 3.11.5 (CPython x86_64 64bit) - Linux-6.5.5-arch1-1-x86_64-with-glibc2.38 (OpenSSL 3.1.3 19 Sep 2023, glibc 2.38)
[debug] exe versions: ffmpeg 6.0 (setts), ffprobe 6.0, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.12.0, certifi-2023.07.22, sqlite3-3.43.1, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1886 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Available version: stable@2023.10.07, Current version: stable@2023.10.07
yt-dlp is up to date (stable@2023.10.07)
[NhkVod] Extracting URL: https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0451269387/
[NhkVod] 0451-269: Downloading JSON metadata
ERROR: list index out of range
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1567, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1702, in __extract_info
    ie_result = ie.extract(url)
                ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/nhk.py", line 205, in _real_extract
    return self._extract_episode_info(url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/nhk.py", line 77, in _extract_episode_info
    episode = self._call_api(
              ^^^^^^^^^^^^^^^
IndexError: list index out of range
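The traceback ends on the line in _extract_episode_info that assigns episode from self._call_api(...), which suggests the extractor indexes into a list returned by the API and that list comes back empty. A minimal sketch of that failure mode, assuming (not confirmed in this thread) a response shaped like {"data": {"episodes": [...]}}; first_episode is a hypothetical helper, not yt-dlp code:

```python
# Assumption: the API wraps results as {"data": {"episodes": [...]}} and the
# caller takes element [0] without checking. Not yt-dlp's actual code.

def first_episode(api_response):
    """Hypothetical helper mirroring an unguarded `[0]` lookup."""
    episodes = (api_response or {}).get('data', {}).get('episodes') or []
    return episodes[0]  # raises IndexError when the list is empty


# An endpoint with no matching episodes yields an empty list, so the lookup
# fails with the same message as in the log.
try:
    first_episode({'data': {'episodes': []}})
except IndexError as err:
    print(f'IndexError: {err}')  # IndexError: list index out of range
```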


garret1317 · 2023-10-08T00:26:00Z

Looks like the episode_id is getting cut off.
For https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0046268465/ the extractor grabs

https://nwapi.nhk.jp/nhkworld/vodesdlist/v7b/episode/0046-268/ja/all/all.json (nothing there)
but the site grabs
https://nwapi.nhk.jp/nhkworld/vodesdlist/v7b/episode/0046-268465/ja/all/all.json
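A simplified sketch of how the 7-character match produces the truncated API URL. The literal URL prefix stands in for the %s%s part of the real pattern, API_TEMPLATE is reconstructed from the two URLs above, and old_api_url is a hypothetical helper; none of these are the extractor's actual code:

```python
import re

# Assumption: a literal prefix replaces the '%s%s' part of the NhkVodIE pattern.
OLD_VIDEO_RE = re.compile(
    r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/video/'
    r'(?P<id>[0-9a-z]{7}|[^/]+?-\d{8}-[0-9a-z]+)')

# Reconstructed from the API URLs quoted in this comment.
API_TEMPLATE = 'https://nwapi.nhk.jp/nhkworld/vodesdlist/v7b/episode/{id}/{lang}/all/all.json'


def old_api_url(url):
    """Hypothetical helper mimicking the pre-fix ID handling."""
    m = OLD_VIDEO_RE.match(url)  # re.match needs no end anchor, so {7} stops early
    episode_id, lang = m.group('id'), m.group('lang')
    # Only 7-character IDs get a dash after the 4th character.
    if len(episode_id) == 7:
        episode_id = episode_id[:4] + '-' + episode_id[4:]
    return episode_id, API_TEMPLATE.format(id=episode_id, lang=lang)


episode_id, api_url = old_api_url(
    'https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0046268465/')
print(episode_id)  # 0046-268
print(api_url)
```

The trailing 465 is simply never captured, so the request goes to the empty 0046-268 endpoint.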

It seems the assumption is that IDs will always be 7 characters, but the Japanese news IDs are 10.
Maybe they ran out of 7-character IDs, idk.
NhkVodIE regex: r'%s%s(?P<id>[0-9a-z]{7}|[^/]+?-\d{8}-[0-9a-z]+)'
If I replace the {7} with a +
and remove the length check in _extract_episode_info

        if len(episode_id) == 7:
            episode_id = episode_id[:4] + '-' + episode_id[4:]

it all starts working beautifully

But the NhkVodIE regex has another alternative, [^/]+?-\d{8}-[0-9a-z]+;
we should probably check what that's for and whether these changes break it.
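The two-part change described above ({7} replaced with +, length check dropped) can be sketched under the same assumptions: a simplified URL prefix, an API_TEMPLATE reconstructed from the URLs quoted earlier, and a hypothetical helper name. This illustrates the idea only; it is not the merged patch:

```python
import re

# Assumption: a literal prefix replaces the '%s%s' part of the real pattern.
NEW_VIDEO_RE = re.compile(
    r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/video/'
    r'(?P<id>[0-9a-z]+|[^/]+?-\d{8}-[0-9a-z]+)')

API_TEMPLATE = 'https://nwapi.nhk.jp/nhkworld/vodesdlist/v7b/episode/{id}/{lang}/all/all.json'


def new_api_url(url):
    """Hypothetical helper: variable-length IDs, dash always inserted."""
    m = NEW_VIDEO_RE.match(url)
    episode_id, lang = m.group('id'), m.group('lang')
    episode_id = episode_id[:4] + '-' + episode_id[4:]  # length check removed
    return episode_id, API_TEMPLATE.format(id=episode_id, lang=lang)


for url in (
    'https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0046268465/',
    'https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0451269387/',
):
    print(new_api_url(url)[0])
# 0046-268465
# 0451-269387
```

Note that [0-9a-z]+ now sits first in the alternation, so it will also grab the leading part of any hyphenated ID, which is exactly the concern raised here.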

garret1317 · 2023-10-08T00:32:28Z

It could be for radio on demand?
Oh well, that's broken already.

edit:
Yes, it was added in 061d1cd and updated in b79df1b.

garret1317 · 2023-10-08T00:44:58Z

Wait, no, it's only broken because it gets matched by the video regex lmao
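Why the video branch can catch radio IDs: Python's re tries alternatives left to right and, without an end anchor, accepts the first branch that matches a prefix. A small demonstration on the ID part of the pattern alone; the slugs are hypothetical radio-style IDs chosen to fit [^/]+?-\d{8}-[0-9a-z]+, not URLs taken from this thread:

```python
import re

# ID part of the pattern before and after the proposed {7} -> + change.
OLD_ID_RE = re.compile(r'(?P<id>[0-9a-z]{7}|[^/]+?-\d{8}-[0-9a-z]+)')
NEW_ID_RE = re.compile(r'(?P<id>[0-9a-z]+|[^/]+?-\d{8}-[0-9a-z]+)')

# Hypothetical radio-style slugs, for illustration only.
SLUGS = ('livinginjapan-20190808-1/', 'plugin-20190404-1/')

for slug in SLUGS:
    old_id = OLD_ID_RE.match(slug).group('id')
    new_id = NEW_ID_RE.match(slug).group('id')
    print(f'{slug!r}: old={old_id!r} new={new_id!r}')
# 'livinginjapan-20190808-1/': old='livingi' new='livinginjapan'
# 'plugin-20190404-1/': old='plugin-20190404-1' new='plugin'
```

With {7}, only slugs whose first hyphen comes within the first 7 characters fall through to the radio branch; with +, none do.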

Radio was getting matched by a section of the regex meant for the video extractor, and Japanese-language VODs broke because their IDs were too long. This commit fixes NhkVodIE so it can extract Japanese-language VODs by removing the explicit specification of the ID's length. It also splits radio and TV into their own IEs, with separate regexes, so they don't conflict with each other. Closes yt-dlp#8303 and fixes radio extraction.

Radio was getting matched by a section of the regex meant for the video extractor, and Japanese-language VODs broke because their IDs were too long. This commit fixes NhkVodIE so it can extract Japanese-language VODs by removing the explicit specification of the ID's length. It also splits radio and TV into their own regexes so they don't conflict with each other. Fixes yt-dlp#8303 and radio extraction; replaces yt-dlp#8305.
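The split described in these commit messages can be sketched as one pattern per content type, each keyed on its own path segment so neither can match the other's URLs. The /ondemand/audio/ path, the pattern names, and classify are assumptions for illustration, not the extractors added in #8309:

```python
import re

# Assumed URL layout: /nhkworld/<lang>/ondemand/<video|audio>/<id>/
BASE = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand'
# Anchoring with /?$ stops a hyphenated slug from being partially matched.
VIDEO_RE = re.compile(BASE + r'/video/(?P<id>[0-9a-z]+)/?$')
AUDIO_RE = re.compile(BASE + r'/audio/(?P<id>[^/]+?-\d{8}-[0-9a-z]+)/?$')


def classify(url):
    """Return (kind, id) for the first pattern that matches, else None."""
    for kind, pattern in (('video', VIDEO_RE), ('audio', AUDIO_RE)):
        m = pattern.match(url)
        if m:
            return kind, m.group('id')
    return None


print(classify('https://www3.nhk.or.jp/nhkworld/ja/ondemand/video/0046268465/'))
# ('video', '0046268465')
print(classify('https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/livinginjapan-20190808-1/'))
# ('audio', 'livinginjapan-20190808-1')
```

Because each pattern only sees its own path segment, the order of the alternatives no longer decides which extractor wins.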

Closes #8303 Authored by: garret1317

Contik · 2023-10-16T20:33:57Z

❤️


Contik added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels Oct 7, 2023

Contik mentioned this issue Oct 7, 2023

NHK World vods posted after September 26th, 2023 are broken #8242 (closed)

garret1317 removed the triage (Untriaged issue) label Oct 8, 2023

garret1317 mentioned this issue Oct 8, 2023

NHK World regex fixes #8305 (closed)

garret1317 mentioned this issue Oct 8, 2023

NHK World regex fixes 2 #8309 (merged)

This comment was marked as spam.


bashonly closed this as completed in #8309 Oct 9, 2023

bashonly pushed a commit that referenced this issue Oct 9, 2023

[ie/nhk] Fix Japanese-language VOD extraction (#8309)

4de94b9

Closes #8303 Authored by: garret1317

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024

[ie/nhk] Fix Japanese-language VOD extraction (yt-dlp#8309)

946292c

Closes yt-dlp#8303 Authored by: garret1317
