NBC.com URL failed #18236
I had this problem occur again this morning, with the following URL: Again, after a few hours something was changed on NBC's end, and youtube-dl works again. In both cases there is no problem viewing the video immediately, but stock youtube-dl breaks. I recorded the terminal output, writing downloaded pages to file:
The request saved to 3836786_
Again, the reason this fails is that the 'data' element is empty. But if you apply the above patch in #18233, which retries the api.nbc.com request with the URL in request['links']['self'], it fixes the problem (see the result of the second request: request_links_self.txt):
(the same result could be observed without the
The patch above was rejected for just repeating the same request, but that is an incorrect assessment: the URL changes. I don't know how many shows or pages it applies to, but I would bet that if you try youtube-dl on a new episode of Saturday Night Live, via the NBC webpage the morning after it airs (for the next episode that would be December 9, sometime before noon EST), you will likely run into this bug, and the above patch will fix it.
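To make the retry idea concrete, here is a minimal sketch of the logic described above. It is not the patch from #18233 and not the actual code in nbc.py. The function name `fetch_video_entries` and the injected `fetch_json` callable are hypothetical, chosen so the example runs without network access.

```python
from typing import Any, Callable, Dict, List


def fetch_video_entries(
    fetch_json: Callable[[str], Dict[str, Any]],  # hypothetical: any "URL -> parsed JSON" helper
    api_url: str,
) -> List[Any]:
    """Return the 'data' list, retrying once via links.self if it comes back empty."""
    response = fetch_json(api_url)
    data = response.get('data') or []
    if not data:
        # The empty response still carries a canonical URL under links.self.
        self_url = (response.get('links') or {}).get('self')
        # Assumption: only retry when that URL differs from the one just
        # requested, so the same request is never simply repeated.
        if self_url and self_url != api_url:
            response = fetch_json(self_url)
            data = response.get('data') or []
    return data
```

Inside the extractor, the role of `fetch_json` would presumably be played by the extractor's own JSON download helper. Videos whose first response already has a non-empty 'data' list take the normal path and trigger no second request.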
This was originally posted in response to a different issue where the NBC.com extractor failed on the same line (see #18202), but I think I actually hit a different bug. For http://www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729 I received a trace that failed on line 96 of nbc.py. (When I try it now it no longer appears to be failing.)
Looking at the _real_extract function that was failing:
The function was failing because the data list is empty.
If I ran youtube-dl http://www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729 --dump-pages it dumped out the following for the response object:
{"data":[],"meta":{"count":0,"version":"v3.0.0"},"links":{"self":"https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1"}}
If I curl'd the URL in response['links']['self'] it appeared to return the object that was actually required by the video_data declaration.
There are other NBC videos that still worked (e.g. https://www.nbc.com/the-good-place/video/the-ballad-of-donkey-doug/3814933). The fix shouldn't break those.
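For anyone who wants to check a dumped page for the same symptom, the following is a small sketch that parses the response quoted above and pulls apart the links.self URL. The `summarize` helper is a hypothetical name used only for illustration. The JSON string is copied from the dump, and everything else is standard library.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Response body copied from the --dump-pages output quoted above.
DUMPED = '{"data":[],"meta":{"count":0,"version":"v3.0.0"},"links":{"self":"https://api.nbc.com/v3/videos?filter%5Bpermalink%5D=http%3A//www.nbc.com/saturday-night-live/video/november-17-steve-carell/3828729&include=show%2Cshow.shortTitle&page%5Bnumber%5D=1"}}'


def summarize(raw: str) -> dict:
    """Report whether 'data' is empty and decode the query of links.self."""
    response = json.loads(raw)
    self_url = response['links']['self']
    # parse_qs percent-decodes both keys and values.
    query = parse_qs(urlsplit(self_url).query)
    return {
        'data_is_empty': not response['data'],
        'count': response['meta']['count'],
        'self_url': self_url,
        'permalink': query['filter[permalink]'][0],
        'include': query['include'][0],
    }


if __name__ == '__main__':
    for key, value in summarize(DUMPED).items():
        print(key, '=', value)
```

The decoded permalink filter is the original page URL, which is consistent with the observation that requesting links.self returns the object the extractor expected.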
I added a line to the above function as shown in the following code, and it appeared to fix my problem.
Originally posted by @josejuan05 in #18202 (comment)