
NBC.com URL failed #18236

Closed
josejuan05 opened this issue Nov 18, 2018 · 1 comment
@josejuan05 josejuan05 commented Nov 18, 2018

This was originally posted in response to a different issue where the NBC.com extractor failed on the same line (see #18202), but I think I actually hit a different bug. For http://www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729 I received a traceback that failed on line 96 of nbc.py. (When I try it now, it no longer fails.)

Looking at the _real_extract function that was failing:

    def _real_extract(self, url):
        permalink, video_id = re.match(self._VALID_URL, url).groups()
        permalink = 'http' + compat_urllib_parse_unquote(permalink)
        response = self._download_json(
            'https://api.nbc.com/v3/videos', video_id, query={
                'filter[permalink]': permalink,
                'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
                'fields[shows]': 'shortTitle',
                'include': 'show.shortTitle',
            })
        video_data = response['data'][0]['attributes']

The function was failing because the 'data' list in the response was empty.

Running youtube-dl http://www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729 --dump-pages dumped the following for the response object:

{"data":[],"meta":{"count":0,"version":"v3.0.0"},"links":{"self":"https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1"}}
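A minimal sketch, using the dumped response body quoted above, shows how indexing into the empty 'data' list reproduces the failure on line 96 of nbc.py:

```python
import json

# The response body dumped by --dump-pages, quoted verbatim above.
body = (
    '{"data":[],"meta":{"count":0,"version":"v3.0.0"},'
    '"links":{"self":"https://api.nbc.com/v3/videos?filter%5Bpermalink%5D='
    'http%3A//www.nbc.com/saturday-night-live/video/november-17-steve-carell/'
    '3828729&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1"}}'
)
response = json.loads(body)

try:
    video_data = response['data'][0]['attributes']  # the failing line in nbc.py
except IndexError as exc:
    print('IndexError: %s' % exc)  # prints: IndexError: list index out of range
```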

If I curled the URL in response['links']['self'], it returned the object that the video_data assignment actually requires.

There are other NBC videos that still worked (e.g. https://www.nbc.com/the-good-place/video/the-ballad-of-donkey-doug/3814933). The fix shouldn't break those.

I added a check to the function, as shown in the following code, and it appeared to fix my problem:

    def _real_extract(self, url):
        permalink, video_id = re.match(self._VALID_URL, url).groups()
        permalink = 'http' + compat_urllib_parse_unquote(permalink)
        response = self._download_json(
            'https://api.nbc.com/v3/videos', video_id, query={
                'filter[permalink]': permalink,
                'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
                'fields[shows]': 'shortTitle',
                'include': 'show.shortTitle',
            })
        if not response['data']:
            response = self._download_json(
                response['links']['self'], video_id)

Originally posted by @josejuan05 in #18202 (comment)
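The retry logic can be sketched in isolation as below. fetch_video_data and download_json are hypothetical names introduced for illustration; download_json stands in for InfoExtractor._download_json:

```python
def fetch_video_data(download_json, api_url, video_id, query):
    """Fetch the v3/videos response, retrying via links.self when data is empty.

    download_json is a stand-in for youtube-dl's InfoExtractor._download_json.
    """
    response = download_json(api_url, video_id, query=query)
    if not response['data']:
        # The API sometimes returns an empty data list for fresh episodes;
        # the URL in links.self returns the populated object.
        response = download_json(response['links']['self'], video_id)
    return response['data'][0]['attributes']
```

With a stub downloader that returns an empty data list on the first call, the helper issues exactly two requests and returns the attributes from the second response.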

@josejuan05 josejuan05 mentioned this issue Nov 18, 2018
@dstftw dstftw closed this Nov 19, 2018
@dstftw dstftw added the duplicate label Nov 19, 2018
@josejuan05 josejuan05 commented Dec 2, 2018

I had this problem occur again this morning, with the following URL:
https://www.nbc.com/saturday-night-live/video/december-1-claire-foy/3836786

Again, after a few hours something was changed on NBC's end, and youtube-dl works again. In both cases there is no problem viewing the video immediately, but stock youtube-dl breaks.

I recorded the terminal output, writing downloaded pages to file:

youtube-dl -F https://www.nbc.com/saturday-night-live/video/december-1-claire-foy/3836786 --verbose --write-pages
[debug] System config: []   
[debug] User config: []
[debug] Custom config: []   
[debug] Command-line args: [u'-F', u'https://www.nbc.com/saturday-night-live/video/december-1-claire-foy/3836786', u'--verbose', u'--write-pages']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.11.18
[debug] Python version 2.7.15+ (CPython) - Linux-4.18.0-12-generic-x86_64-with-Ubuntu-18.10-cosmic
[debug] exe versions: ffmpeg 4.0.2-2, ffprobe present, rtmpdump 2.4
[debug] Proxy map: {}
[NBC] 3836786: Downloading JSON metadata
[NBC] Saving request to 3836786_https_-_api.nbc.com_v3_videosfilter%5Bpermalink%5D=http%3A%2F%2Fwww.nbc.com%2Fsaturday-night-live%2Fvideo%2Fdecember-1-claire-foy%2F3836786_include=show.shortTitle_fields%5Bshows%5D=shortTitle_fie_105927de57e84e2f6f81a967bf60f6d5.dump
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 472, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 462, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2001, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 792, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 508, in extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/nbc.py", line 96, in _real_extract
IndexError: list index out of range

The request saved to 3836786_https_-_api.nbc.com_v3_videosfilter%5Bpermalink%5D=http%3A%2F%2Fwww.nbc.com%2Fsaturday-night-live%2Fvideo%2Fdecember-1-claire-foy%2F3836786_include=show.shortTitle_fields%5Bshows%5D=shortTitle_fie_105927de57e84e2f6f81a967bf60f6d5.dump looked like:

{"data":[],"meta":{"count":0,"version":"v3.0.0"},"links":{"self":"https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com/saturday-night-live/video/december-1-claire-foy/3836786&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1"}}

Again, the reason this fails is that the 'data' element is empty. But if you apply the patch in #18233, which retries the api.nbc.com request with the URL in response['links']['self'], the problem goes away (see the result of the second request: request_links_self.txt):

[NBC] 3836786: Downloading JSON metadata
[ThePlatform] 3836786: Downloading SMIL data
[ThePlatform] 3836786: Downloading m3u8 information
[ThePlatform] 3836786: Downloading JSON metadata
[info] Available formats for 3836786:
format code  extension  resolution note
hls-95       mp4        audio only   95k , mp4a.40.2
hls-319      mp4        416x236     319k , avc1.66.30, mp4a.40.2
hls-525      mp4        416x236     525k , avc1.66.30, mp4a.40.2
hls-847      mp4        1920x1080   847k , avc1.66.30, mp4a.40.2
hls-1347     mp4        960x540    1347k , avc1.66.30, mp4a.40.2
hls-1987     mp4        960x540    1987k , avc1.77.30, mp4a.40.2
hls-2716     mp4        1280x720   2716k , avc1.77.30, mp4a.40.2
hls-4274     mp4        1920x1080  4274k , avc1.77.30, mp4a.40.2 (best)

(the same result could be observed without the -F flag: the video can be successfully downloaded)

The patch above was rejected on the grounds that it just repeats the same request, but that assessment is incorrect: the URL changes. I don't know how many shows or pages this applies to, but I would bet that if you run youtube-dl on a new episode of Saturday Night Live via the NBC webpage the morning after it airs (for the next episode, that would be December 9, sometime before noon EST), you will likely hit this bug, and the above patch will fix it.
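The claim that the retried URL is not the same request can be checked by parsing the query string of the links.self URL quoted in the dump above: its include parameter is show,show.shortTitle rather than the show.shortTitle the extractor sends, and it carries an explicit page[number] parameter the extractor never adds.

```python
from urllib.parse import urlsplit, parse_qs

# links.self URL taken verbatim from the dumped response above.
retry_url = (
    'https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com'
    '/saturday-night-live/video/december-1-claire-foy/3836786'
    '&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1'
)

params = parse_qs(urlsplit(retry_url).query)
print(params['include'])       # ['show,show.shortTitle'] -- not 'show.shortTitle'
print(params['page[number]'])  # ['1'] -- a parameter the extractor never sends
```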
