NBC site broken #22693

Closed
ddurdle opened this issue Oct 13, 2019 · 17 comments

@ddurdle ddurdle commented Oct 13, 2019

Checklist

  • [x] I'm reporting a broken site support
  • [x] I've verified that I'm running youtube-dl version 2019.09.28
  • [x] I've checked that all provided URLs are alive and playable in a browser
  • [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

 youtube-dl --hls-prefer-native https://www.nbc.com/dateline/video/the-plan/4040189 -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--hls-prefer-native', u'https://www.nbc.com/dateline/video/the-plan/4040189', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.09.28
[debug] Python version 2.7.13 (CPython) - Linux-3.10.0-957.12.2.vz7.96.21-i686-with-debian-9.9
[debug] exe versions: ffmpeg 3.2.14-1, ffprobe 3.2.14-1, rtmpdump 2.4
[debug] Proxy map: {}
[NBC] 4040189: Downloading JSON metadata
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 474, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 464, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2018, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 796, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 530, in extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/nbc.py", line 94, in _real_extract
IndexError: list index out of range

Description

The latest NBC Dateline episode plays in a browser, but youtube-dl fails with an IndexError:
https://www.nbc.com/dateline/video/the-plan/4040189

@bonacker1 bonacker1 commented Oct 13, 2019

The original poster's URL works for me, but this one does not; it produces the same log as the one posted above:

http://www.nbc.com/saturday-night-live/video/october-12-david-harbour/4046108

Contributor

@johnhawkinson johnhawkinson commented Oct 13, 2019

I ran into this this morning. The index error is from this:

 /Users/jhawk/src/youtube-dl/youtube_dl/extractor/nbc.py(94)_real_extract()
-> video_data = response['data'][0]['attributes']
(Pdb) print response['data']
[]
(Pdb) print response
{u'meta': {u'count': 1, u'version': u'v3.0.0'}, u'data': [], u'links': {u'self': u'https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com/saturday-night-live/video/october-12-david-harbour/4046108&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1'}}

That is, the data array in the metadata response (from api.nbc.com/v3/videos) is empty. youtube-dl expects to get the guid, title, and authentication entitlement from there:

video_data = response['data'][0]['attributes']
query = {
    'mbr': 'true',
    'manifest': 'm3u',
}
video_id = video_data['guid']
title = video_data['title']
if video_data.get('entitlement') == 'auth':
    resource = self._get_mvpd_resource(
        'nbcentertainment', title, video_id,
        video_data.get('vChipRating'))
    query['auth'] = self._extract_mvpd_auth(
        url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(
    'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/' + video_id,
    query), {'force_smil_url': True})

For URLs like these, the guid is known from the URL itself (4040189 for the Dateline episode above), but a manual request to the theplatform link returns an inaccessible mp4. (Maybe I'm just missing an authentication step, since I get a similar problem for a guid that is otherwise available to youtube-dl.) Here's an example with irrelevant stuff trimmed:

pb3:xo13 jhawk$ curl -vL 'http://link.theplatform.com/s/NnzsPC/media/guid/2410887629/4040189'
*> GET /s/NnzsPC/media/guid/2410887629/4040189 HTTP/1.1
> Host: link.theplatform.com
> 
< HTTP/1.1 302 Found
< Location: http://nbcmpx-vh.akamaihd.net/z/video/55/1010/191001_4040189_The_Plan_200.mp4?hdnea=st=1570988119~exp=1571000749~acl=/z/video/55/1010/191001_4040189_The_Plan_*~id=fd9758e5-7e88-45e9-b006-8c36ce20a460~hmac=f876fa6aecb0d27e57a7bd7b1b977355239925f20fbda1b895f1054fa9d6c20d

so far so good but:

* Issue another request to this URL: 'http://nbcmpx-vh.akamaihd.net/z/video/55/1010/191001_4040189_The_Plan_200.mp4?hdnea=st=1570988119~exp=1571000749~acl=/z/video/55/1010/191001_4040189_The_Plan_*~id=fd9758e5-7e88-45e9-b006-8c36ce20a460~hmac=f876fa6aecb0d27e57a7bd7b1b977355239925f20fbda1b895f1054fa9d6c20d'
* Connected to nbcmpx-vh.akamaihd.net (162.216.58.66) port 80 (#1)
> GET /z/video/55/1010/191001_4040189_The_Plan_200.mp4?hdnea=st=1570988119~exp=1571000749~acl=/z/video/55/1010/191001_4040189_The_Plan_*~id=fd9758e5-7e88-45e9-b006-8c36ce20a460~hmac=f876fa6aecb0d27e57a7bd7b1b977355239925f20fbda1b895f1054fa9d6c20d HTTP/1.1
> Host: nbcmpx-vh.akamaihd.net
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 403 Forbidden
< Server: AkamaiGHost
<
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

Perhaps this is helpful to someone more familiar with the NBC/theplatform APIs.

Contributor

@raleeper raleeper commented Oct 13, 2019

Contributor

@johnhawkinson johnhawkinson commented Oct 13, 2019

GOOD EYE, @raleeper!

Confirmed. That seems…really weird.

https://api.nbc.com/ lists the API versions, and oddly enough I had tried v4 earlier, but I didn't think to try any of the v3 variants. I would have assumed that v3 would map to the latest of them (v3.14), but actually it appears to map to v3.0.0 (this is reported by any query to it).

It appears that v3.0.0 through v3.1.1 fail, but v3.2 through v3.14 all work.
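A check like this can be scripted rather than done by hand. The following is only a minimal sketch in Python 3: `build_videos_url` and `working_versions` are hypothetical helper names, the `fields[videos]` list is shortened, and `fetch_json` is a caller-supplied function, so nothing here is youtube-dl's own code.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.nbc.com'


def build_videos_url(version, permalink):
    """Build the /videos lookup URL for one API version, e.g. 'v3.14'."""
    query = {
        'filter[permalink]': permalink,
        # Shortened field list; the extractor asks for more fields.
        'fields[videos]': 'guid,title,entitlement',
    }
    return '%s/%s/videos?%s' % (API_ROOT, version, urlencode(query))


def working_versions(versions, permalink, fetch_json):
    """Return the versions whose response has a non-empty 'data' list.

    fetch_json is a hypothetical, caller-supplied function that takes a
    URL and returns the decoded JSON body as a dict.
    """
    found = []
    for version in versions:
        response = fetch_json(build_videos_url(version, permalink))
        if response.get('data'):
            found.append(version)
    return found
```

For a live check, `fetch_json` could be something like `lambda url: json.load(urllib.request.urlopen(url))`, assuming the API answers unauthenticated requests the way it did in the session above.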

I'm really at a loss to understand a rational basis for this behavior. Surely 2 through 14 have been around for a while, and why would they tinker with the functionality of 0 through 1? Maybe the most plausible theory is somehow the mapping of v3 to the latest was inadvertently messed up, and it went from v3 -> v3.13 to v3 -> v3.0.0 overnight by accident?

The below patch certainly makes it work, but it doesn't seem like the best solution? On the other hand, forcing a roundtrip to check for the latest v3.* API version seems a waste — we could just submit a PR if that were necessary.

(I had also spent a while trying to figure out how to make the v4/videos API return anything at all with a ?filter parameter and didn't succeed. I see now that the same mapping problem exists there and v4.21/video is probably the endpoint to use, though I didn't make real progress there either.)

diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index 3282f84ee..49c987320 100644
--- a/youtube_dl/extractor/nbc.py
+++ b/youtube_dl/extractor/nbc.py
@@ -85,7 +85,7 @@ class NBCIE(AdobePassIE):
         permalink, video_id = re.match(self._VALID_URL, url).groups()
         permalink = 'http' + compat_urllib_parse_unquote(permalink)
         response = self._download_json(
-            'https://api.nbc.com/v3/videos', video_id, query={
+            'https://api.nbc.com/v3.14/videos', video_id, query={
                 'filter[permalink]': permalink,
                 'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
                 'fields[shows]': 'shortTitle',

Contributor

@raleeper raleeper commented Oct 13, 2019

There's also a time element here. As others have reported, some videos return empty data from the v3 API when they are very new (for example, the morning after airing). The data appears correctly later in the day. It seems like some sort of timezone issue in v3 that is fixed in v3.14. That explains why the issue has been hard to re-create after the fact.

It's a mystery why v3 isn't an alias for the latest v3.x.

Contributor

@johnhawkinson johnhawkinson commented Oct 13, 2019

So, PR with the above patch (and perhaps an explanatory comment, although I've been slapped down for those before)?

Or something else?

Contributor

@johnhawkinson johnhawkinson commented Oct 13, 2019

Oh, also, do we want better error handling for this situation ("No metadata returned by API call."), or is "IndexError: list index out of range" ok?

Author

@ddurdle ddurdle commented Oct 13, 2019

Right now it is an uncaught exception; a customized error to help guide the user would be better.

Author

@ddurdle ddurdle commented Oct 13, 2019

BTW, I could have sworn I came back here this morning to post an update that the originally cited video had started working. I even clicked "close and comment". I assumed something was broken on the provider side and had since been fixed, but it sounds like something more is amiss here.

Contributor

@raleeper raleeper commented Oct 13, 2019

I would go with the simplest possible change. If you aren't a Python programmer, any message about a situation that you can't fix by changing command-line parameters is the same as an uncaught exception. Both mean "extractor broken."

Contributor

@johnhawkinson johnhawkinson commented Oct 13, 2019

> Right now it is an uncaught exception; a customized error to help guide the user would be better.

Although it's true that there's not much difference from the user's perspective, and we don't want to discourage users from reporting an issue with a message like "This will probably work if you wait a day and try tomorrow," it's also true that many users are Python programmers. Giving them an idea of the problem that doesn't involve pulling out a debugger makes it far more likely that someone will take a look and either craft a solution or at least produce a bug report with some analysis.

That said, without some guidance as to what kind of checking and reporting would be most appropriate, I'm not at all sure what to do. At a minimum we could break
video_data = response['data'][0]['attributes']
into a few more statements, so it is clear from code inspection which dereference failed.

I would probably test len(response['data']) and, if it is zero, print a message about missing metadata and pretty-print the JSON content of response. But maybe that's too much.
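To make that concrete, here is a rough sketch of the check, written as a standalone Python function rather than as a patch to nbc.py. `MissingMetadataError` and `get_video_attributes` are hypothetical names for illustration; a real change would presumably use youtube-dl's own error type and live inside `_real_extract`.

```python
import json


class MissingMetadataError(Exception):
    """Hypothetical error type, used only for this illustration."""


def get_video_attributes(response):
    """Return the first video's attributes or raise a descriptive error."""
    data = response.get('data')
    if not data:
        # Empty or missing 'data': say so, and show what the API sent back.
        raise MissingMetadataError(
            'No metadata returned by API call. Response was:\n'
            + json.dumps(response, indent=2, sort_keys=True))
    # Separate statements, so a traceback shows which lookup failed.
    first_video = data[0]
    return first_video['attributes']
```

With the empty response shown earlier in this thread, this raises an error that names the missing metadata and includes the JSON body, instead of a bare IndexError.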

In any event, this could be a separate PR. Perhaps the committers could offer some guidance assuming they merge #22701.

@bonacker1 bonacker1 commented Oct 14, 2019

This is just a temporary work-around for downloading NBC and NBC News videos until the programmers update the YT-DL code. This worked for me in downloading both today's and last week's Meet The Press and last night's SNL.

Using the following as a template

https://player.theplatform.com/p/HNK2IC/uW4uIUm_KHR6/select/media/guid/2410887629/xxxxxxx

replace the seven x's with the number of the video to be downloaded. The video's number, currently, should begin with "404." (The "404" has nothing to do with a 404 webpage.)

To get the video number, refer to the URL of the webpage containing the video. Today's Meet The Press is:

https://www.nbc.com/meet-the-press/video/meet-the-press-101319/4047204

Note that the last number is 4047204

Replacing the 7 x's in the template with 4047204, we have:

https://player.theplatform.com/p/HNK2IC/uW4uIUm_KHR6/select/media/guid/2410887629/4047204

The above URL will download (at the moment) in YT-DL. This may not last but might be helpful until the programmers update YT-DL.
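The substitution can also be scripted. This is just a small sketch in Python 3, assuming the video number is always the last path component of the nbc.com page URL; `player_url_from_page_url` is a made-up helper name, and the template is the one given above.

```python
import re

PLAYER_TEMPLATE = ('https://player.theplatform.com/p/HNK2IC/uW4uIUm_KHR6/'
                   'select/media/guid/2410887629/%s')


def player_url_from_page_url(page_url):
    """Turn an nbc.com video page URL into the player URL from the template."""
    # Assumption: the video number is the trailing run of digits in the URL.
    match = re.search(r'/(\d+)/?$', page_url)
    if match is None:
        raise ValueError('no trailing video number in %r' % page_url)
    return PLAYER_TEMPLATE % match.group(1)
```

Passing the Meet The Press page URL above yields the same player URL as the manual substitution, which can then be handed to YT-DL.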

@ryanhilton ryanhilton commented Oct 14, 2019

Thank you @bonacker1 !! This worked for me!

Contributor

@johnhawkinson johnhawkinson commented Nov 24, 2019

@remitamine: I hesitate to open a new issue, but as of a few minutes ago, https://www.nbc.com/saturday-night-live/video/november-23-will-ferrell/4069408 doesn't work in the current release:

"errors":[{"message":"No video found with that id."

but I tried an old checkout of 2017.11.06 and it worked fine. This suggests to me that graphql may not have been the right choice for all situations.

Collaborator

@remitamine remitamine commented Nov 24, 2019

not accessible for me in the browser.

Well, this is awkward
Sorry, it looks like the video or page you're looking for seems to have disappeared - or maybe it never existed to begin with.
In the meantime, we invite you to visit the NBC home page.

Contributor

@johnhawkinson johnhawkinson commented Nov 24, 2019

> not accessible for me in the browser.

Agreed. But it is linked[*] from https://www.nbc.com/saturday-night-live/episodes and present in the v3.14 API and downloadable with older youtube-dl versions.

Edit: [*] well, it's some kind of a javascript click event handler rather than a link per se, but...

Collaborator

@remitamine remitamine commented Nov 24, 2019

I see this as a problem with the website, not with the use of the GraphQL API.
If the page were accessible in the browser but not through the GraphQL API, then it would be considered a bug that needs to be fixed.
I preferred the GraphQL API because it eliminates the problem where some videos are only accessible on certain API versions, which would require updating the API code frequently.
