Please print the URLs of failing downloads #12368

Open
johnhawkinson opened this issue Mar 5, 2017 · 0 comments

I'm filing this as an issue to get some direction, because it is similar to #11053, where I implemented a solution to a related but narrower problem, and it was rejected. I don't want to code something up only to have it rejected.


While thinking about #12364, I wondered how many SNL videos exhibited that failure. So I saved the set of links to a file and fed the file to youtube-dl -s -i -a snllinks to see which succeeded and which failed. To my surprise, youtube-dl does not clearly identify which URL it is downloading when it prints a message, whether informational, a warning, or a (potentially fatal) error. For instance:

pb3:x jhawk$ youtube-dl -is -a ~/Desktop/snllinks
[NBC] 3477503: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[NBC] 3478272: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[NBC] 3478269: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[NBC] 3480398: Downloading webpage
[ThePlatform] 3480398: Downloading webpage
[ThePlatform] 3480398: Downloading SMIL data
[ThePlatform] 3480398: Downloading m3u8 information
[ThePlatform] 3480398: Downloading JSON metadata
[NBC] 3480409: Downloading webpage
[ThePlatform] 3480409: Downloading webpage
[ThePlatform] 3480409: Downloading SMIL data
[ThePlatform] 3480409: Downloading m3u8 information
[ThePlatform] 3480409: Downloading JSON metadata
...

It's notable to me that in the case of success, it prints 5 lines of information across two InfoExtractors, and none of that information is as useful as the URL would be. (I would support each of those lines also printing the URL, although given #11053 I gather that is disfavored.)
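
As a rough illustration of what "each line also printing the URL" could look like, here is a minimal sketch using Python's standard logging.LoggerAdapter. It is not youtube-dl's actual output code; URLPrefixAdapter and make_url_logger are hypothetical names chosen for the example.

```python
import logging


class URLPrefixAdapter(logging.LoggerAdapter):
    """Prepend the URL currently being processed to every message it logs."""

    def process(self, msg, kwargs):
        # self.extra is the dict passed to the constructor in make_url_logger.
        return "[%s] %s" % (self.extra["url"], msg), kwargs


def make_url_logger(base_logger, url):
    # Hypothetical helper: one adapter per URL read from the batch file.
    return URLPrefixAdapter(base_logger, {"url": url})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    url = "http://www.nbc.com/saturday-night-live/video/drug-company-hearing/3480398"
    log = make_url_logger(logging.getLogger("batch"), url)
    log.info("Downloading webpage")
    log.error("Unable to extract theplatform url")
```

Because the prefix is applied in one place, every informational, warning, or error line for a given URL carries that URL without each extractor having to remember to add it.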

It also seems wrong to me that a separate line is printed for the SMIL, m3u8, and JSON downloads in the absence of --verbose. I don't think most users know about or need to see that, although it does serve as a useful, if extremely crude, progress indication.
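
One way to separate the two kinds of output is a message level, sketched below under the assumption of two hypothetical levels (MILESTONE and STEP) that youtube-dl does not define: milestone lines are always printed, while per-fetch step lines appear only when verbose output is on.

```python
# Hypothetical message levels for illustration; youtube-dl does not define these.
MILESTONE = "milestone"  # always shown: which URL started, finished, or failed
STEP = "step"            # fine-grained fetches (SMIL, m3u8, JSON): --verbose only


def report(msg, level=MILESTONE, verbose=False, write=print):
    """Emit msg unless it is a STEP message and verbose output is off."""
    if level == STEP and not verbose:
        return
    write(msg)


if __name__ == "__main__":
    report("[NBC] 3480398: Downloading webpage")
    report("[ThePlatform] 3480398: Downloading SMIL data", level=STEP)  # suppressed
    report("[ThePlatform] 3480398: Downloading SMIL data", level=STEP, verbose=True)
```

The write parameter is only there so the output can be captured; it defaults to print.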

Some might argue that printing the video_id is sufficient, because it is often extracted from the URL and so the two can be correlated. I disagree. video_ids are opaque, and while most users know what a URL is, I don't think many understand what a video_id is, or that that is what youtube-dl is printing. In the above example, the initial id 3477503 is associated with http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503, and of course it matches the last path element. But that is not obvious unless you know to look for it, and it is not user-friendly. It also breaks down when the video_id does not correlate with the URL, or when all the video_ids are the same, as in #12099 (comment), where they are all "story". And in this particular NBC SNL case,
http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503
fails, but
http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503?snl=0
works, and both have the same video_id.
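
To make the ambiguity concrete, the sketch below applies a simplified "last path element" rule to the two URLs above. The naive_video_id function is a hypothetical stand-in, not the NBC extractor's real regex; it only mirrors the pattern described here, where the id matches the final path component and the query string is ignored.

```python
from urllib.parse import urlsplit


def naive_video_id(url):
    # Simplified stand-in for an extractor's id logic: the last path element.
    # The query string (e.g. ?snl=0) is not part of the path, so it is dropped.
    return urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]


failing = ("http://www.nbc.com/saturday-night-live/video/"
           "creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503")
working = failing + "?snl=0"

# Two distinct inputs, one id: a message keyed only by the id cannot tell them apart.
print(naive_video_id(failing), naive_video_id(working), failing == working)
# 3477503 3477503 False
```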


Some might argue that URLs should be displayed only with --verbose. Again, that seems wrong to me, since other, much less useful information (SMIL etc. progress) is shown without it. But even with --verbose, the URLs are not shown alongside the per-download messages. For instance:

pb3:x jhawk$ youtube-dl -sv -a ~/Desktop/snllinks -i
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-sv', u'-a', u'/Users/jhawk/Desktop/snllinks', u'-i']
[debug] Batch file urls: [u'http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503', u'http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-the-music-department/3478272', u'http://www.nbc.com/saturday-night-live/video/creating-saturday-night-live-the-photo-department/3478269', u'http://www.nbc.com/saturday-night-live/video/drug-company-hearing/3480398', u'http://www.nbc.com/saturday-night-live/video/father-john-misty-pure-comedy/3480409', u'http://www.nbc.com/saturday-night-live/video/father-john-misty-total-entertainment-forever/3480402', u'http://www.nbc.com/saturday-night-live/video/girl-at-a-bar/3480399', u'http://www.nbc.com/saturday-night-live/video/jeff-sessions-gump-cold-open/3480395', u'http://www.nbc.com/saturday-night-live/video/octavia-spencer-and-father-john-misty-are-at-snl-and-its-on/3478950', u'http://www.nbc.com/saturday-night-live/video/octavia-spencer-monologue/3480396', u'http://www.nbc.com/saturday-night-live/video/republican-movie-trailer/3480397', u'http://www.nbc.com/saturday-night-live/video/sean-spicer-press-conference-cold-open/3468882', u'http://www.nbc.com/saturday-night-live/video/snl-host-octavia-spencer-finds-studio-8h/3477499', u'http://www.nbc.com/saturday-night-live/video/spencers-gifts-hq/3480411', u'http://www.nbc.com/saturday-night-live/video/sticky-bun/3480407', u'http://www.nbc.com/saturday-night-live/video/the-chocolate-man/3480410', u'http://www.nbc.com/saturday-night-live/video/weekend-update-eric-and-donald-trump-jr/3480405', u'http://www.nbc.com/saturday-night-live/video/weekend-update-laura-parsons-on-the-2017-oscars-and-trans-rights/3480406', u'http://www.nbc.com/saturday-night-live/video/weekend-update-on-churchs-celibacy-rule/3480404', u'http://www.nbc.com/saturday-night-live/video/weekend-update-on-donald-trumps-wiretapping-accusation/3480403', u'http://www.nbc.com/saturday-night-live/video/wine-bar/3480408', u'http://www.nbc.com/saturday-night-live/video/youngblood/3480401', u'http://www.nbc.com/saturday-night-live/video/zooopalis-voice-actors/3480400']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.03.05
[debug] Python version 2.7.10 - Darwin-14.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg git-2017-02-28-7f62368, ffprobe git-2017-02-28-7f62368, rtmpdump 2.4
[debug] Proxy map: {}
[NBC] 3477503: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 761, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 427, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/nbc.py", line 139, in _real_extract
    webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 768, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 759, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

[NBC] 3478272: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 761, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 427, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/nbc.py", line 139, in _real_extract
    webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 768, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 759, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

[NBC] 3478269: Downloading webpage
ERROR: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 761, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 427, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/nbc.py", line 139, in _real_extract
    webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 768, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 759, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract theplatform url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

[NBC] 3480398: Downloading webpage
[ThePlatform] 3480398: Downloading webpage
[ThePlatform] 3480398: Downloading SMIL data
[ThePlatform] 3480398: Downloading m3u8 information
[ThePlatform] 3480398: Downloading JSON metadata
[NBC] 3480409: Downloading webpage
[ThePlatform] 3480409: Downloading webpage
[ThePlatform] 3480409: Downloading SMIL data
[ThePlatform] 3480409: Downloading m3u8 information
[ThePlatform] 3480409: Downloading JSON metadata
...

It's true that the URLs are printed initially in the batch file listing, but again that is not sufficiently helpful: it requires the user to correlate them with the later messages (probably manually), and all the reasons from the non-verbose case still apply.
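
A minimal sketch of the kind of end-of-run summary this issue asks for, assuming a hypothetical extract(url) callable that raises on failure (standing in for youtube-dl's per-URL extraction). The names run_batch and print_failures are made up for illustration. Keeping each URL paired with its error means nobody has to correlate the batch listing against the messages by hand.

```python
def run_batch(urls, extract):
    """Call extract(url) for each URL, continuing past errors (in the spirit
    of -i / --ignore-errors), and return (url, message) pairs for failures."""
    failures = []
    for url in urls:
        try:
            extract(url)
        except Exception as exc:  # deliberately broad: any failure is recorded
            failures.append((url, str(exc)))
    return failures


def print_failures(failures, write=print):
    # One line per failure, URL first so it can be copied straight back out.
    for url, message in failures:
        write("FAILED %s: %s" % (url, message))


if __name__ == "__main__":
    def fake_extract(url):
        # Stand-in extractor: pretend URLs without ?snl=0 fail, as for 3477503 above.
        if "snl=0" not in url:
            raise RuntimeError("Unable to extract theplatform url")

    base = ("http://www.nbc.com/saturday-night-live/video/"
            "creating-saturday-night-live-steve-higgins-makes-sound-effects-for-gym-class/3477503")
    print_failures(run_batch([base, base + "?snl=0"], fake_extract))
```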


Thank you.
