Generic extractor with default settings crashes on long URLs #21944

Closed
hseg opened this issue Jul 30, 2019 · 0 comments

@hseg hseg commented Jul 30, 2019

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2019.07.30
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl -v https://www.omnycontent.com/d/playlist/2b205f04-5ead-4e7a-9f30-a8d9007aa8c1/d54a48ab-647c-4266-b9fa-a9f300774a28/83de6c47-68ba-4351-901d-a9f3007a6593/podcast.rss --playlist-end 1
[debug] System config: []
[debug] User config: ['--external-downloader', 'aria2c', '--external-downloader-args', '--dir . -j 5 -x 5 -s 5 -c -k 1MB', '--format', 'best[filesize<300MB]/best']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.omnycontent.com/d/playlist/2b205f04-5ead-4e7a-9f30-a8d9007aa8c1/d54a48ab-647c-4266-b9fa-a9f300774a28/83de6c47-68ba-4351-901d-a9f3007a6593/podcast.rss', '--playlist-end', '1']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.07.30
[debug] Python version 3.7.4 (CPython) - Linux-5.2.3-arch1-1-ARCH-x86_64-with-arch
[debug] exe versions: ffmpeg 4.1.4, ffprobe 4.1.4
[debug] Proxy map: {}
[generic] podcast: Requesting header
WARNING: Falling back on generic information extractor.
[generic] podcast: Downloading webpage
[generic] podcast: Extracting information
[download] Downloading playlist: המשרוקית של גלובס
[generic] playlist המשרוקית של גלובס: Collected 14 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[generic] audio: Requesting header
[redirect] Following redirect to https://d23hjgczzncdts.cloudfront.net/organization-2b205f04-5ead-4e7a-9f30-a8d9007aa8c1-private/programs/d54a48ab-647c-4266-b9fa-a9f300774a28/clips/954c32a2-511d-49e6-8177-aa9200f7baa3/published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo%2BKZIJA%2BkEsoBHQ2FjCag4C%2B40lJ%2Bw%3D&se=2019-08-22T04%3A01%3A37Z&rscd=attachment%3B%20filename%2A%3DUTF-8%27%2714.%2520%25D7%2594%25D7%2598%25D7%25A7%25D7%2598%25D7%2599%25D7%25A7%25D7%2594%2520%25D7%25A9%25D7%259C%2520%25D7%259E%25D7%25A4%25D7%2599%25D7%25A6%25D7%2599%2520%25D7%2594%25D7%25A4%25D7%2599%25D7%2599%25D7%25A7%2520%25D7%25A0%25D7%2599%25D7%2595%25D7%2596.mp3&sessionId=95c110dd-70cf-5125-81d1-59312a32eb85&utm_source=Podcast
[generic] published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04:01:37Z&rscd=attachment; filename*=UTF-8''14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96: Requesting header
[debug] Invoking downloader on 'https://d23hjgczzncdts.cloudfront.net/organization-2b205f04-5ead-4e7a-9f30-a8d9007aa8c1-private/programs/d54a48ab-647c-4266-b9fa-a9f300774a28/clips/954c32a2-511d-49e6-8177-aa9200f7baa3/published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo%2BKZIJA%2BkEsoBHQ2FjCag4C%2B40lJ%2Bw%3D&se=2019-08-22T04%3A01%3A37Z&rscd=attachment%3B%20filename%2A%3DUTF-8%27%2714.%2520%25D7%2594%25D7%2598%25D7%25A7%25D7%2598%25D7%2599%25D7%25A7%25D7%2594%2520%25D7%25A9%25D7%259C%2520%25D7%259E%25D7%25A4%25D7%2599%25D7%25A6%25D7%2599%2520%25D7%2594%25D7%25A4%25D7%2599%25D7%2599%25D7%25A7%2520%25D7%25A0%25D7%2599%25D7%2595%25D7%2596.mp3&sessionId=95c110dd-70cf-5125-81d1-59312a32eb85&utm_source=Podcast'
[download] Destination: 14. הטקטיקה של מפיצי הפייק ניוז-published.mp3sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04_01_37Z&rscd=attachment; filename_=UTF-8''14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96.mp3
[debug] aria2c command line: aria2c -c --dir . -j 5 -x 5 -s 5 -c -k 1MB --out '14. הטקטיקה של מפיצי הפייק ניוז-published.mp3sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04_01_37Z&rscd=attachment; filename_=UTF-8'"'"''"'"'14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96.mp3.part' --header 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3776.1 Safari/537.36' --header 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Encoding: gzip, deflate' --header 'Accept-Language: en-us,en;q=0.5' --check-certificate=true -- 'https://d23hjgczzncdts.cloudfront.net/organization-2b205f04-5ead-4e7a-9f30-a8d9007aa8c1-private/programs/d54a48ab-647c-4266-b9fa-a9f300774a28/clips/954c32a2-511d-49e6-8177-aa9200f7baa3/published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo%2BKZIJA%2BkEsoBHQ2FjCag4C%2B40lJ%2Bw%3D&se=2019-08-22T04%3A01%3A37Z&rscd=attachment%3B%20filename%2A%3DUTF-8%27%2714.%2520%25D7%2594%25D7%2598%25D7%25A7%25D7%2598%25D7%2599%25D7%25A7%25D7%2594%2520%25D7%25A9%25D7%259C%2520%25D7%259E%25D7%25A4%25D7%2599%25D7%25A6%25D7%2599%2520%25D7%2594%25D7%25A4%25D7%2599%25D7%2599%25D7%25A7%2520%25D7%25A0%25D7%2599%25D7%2595%25D7%2596.mp3&sessionId=95c110dd-70cf-5125-81d1-59312a32eb85&utm_source=Podcast'

07/30 18:11:26 [NOTICE] Downloading 1 item(s)

07/30 18:11:26 [ERROR] CUID#7 - Download aborted. URI=https://d23hjgczzncdts.cloudfront.net/organization-2b205f04-5ead-4e7a-9f30-a8d9007aa8c1-private/programs/d54a48ab-647c-4266-b9fa-a9f300774a28/clips/954c32a2-511d-49e6-8177-aa9200f7baa3/published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo%2BKZIJA%2BkEsoBHQ2FjCag4C%2B40lJ%2Bw%3D&se=2019-08-22T04%3A01%3A37Z&rscd=attachment%3B%20filename%2A%3DUTF-8%27%2714.%2520%25D7%2594%25D7%2598%25D7%25A7%25D7%2598%25D7%2599%25D7%25A7%25D7%2594%2520%25D7%25A9%25D7%259C%2520%25D7%259E%25D7%25A4%25D7%2599%25D7%25A6%25D7%2599%2520%25D7%2594%25D7%25A4%25D7%2599%25D7%2599%25D7%25A7%2520%25D7%25A0%25D7%2599%25D7%2595%25D7%2596.mp3&sessionId=95c110dd-70cf-5125-81d1-59312a32eb85&utm_source=Podcast
Exception: [AbstractCommand.cc:403] errorCode=16 URI=https://d23hjgczzncdts.cloudfront.net/organization-2b205f04-5ead-4e7a-9f30-a8d9007aa8c1-private/programs/d54a48ab-647c-4266-b9fa-a9f300774a28/clips/954c32a2-511d-49e6-8177-aa9200f7baa3/published.mp3?sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo%2BKZIJA%2BkEsoBHQ2FjCag4C%2B40lJ%2Bw%3D&se=2019-08-22T04%3A01%3A37Z&rscd=attachment%3B%20filename%2A%3DUTF-8%27%2714.%2520%25D7%2594%25D7%2598%25D7%25A7%25D7%2598%25D7%2599%25D7%25A7%25D7%2594%2520%25D7%25A9%25D7%259C%2520%25D7%259E%25D7%25A4%25D7%2599%25D7%25A6%25D7%2599%2520%25D7%2594%25D7%25A4%25D7%2599%25D7%2599%25D7%25A7%2520%25D7%25A0%25D7%2599%25D7%2595%25D7%2596.mp3&sessionId=95c110dd-70cf-5125-81d1-59312a32eb85&utm_source=Podcast
  -> [RequestGroup.cc:760] errorCode=16 Download aborted.
  -> [AbstractDiskWriter.cc:221] errNum=36 errorCode=16 Failed to open the file ./14. הטקטיקה של מפיצי הפייק ניוז-published.mp3sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04_01_37Z&rscd=attachment; filename_=UTF-8''14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96.mp3.part, cause: File name too long

07/30 18:11:26 [NOTICE] Download GID#e74f7a7c5ef443d0 not complete: ./14. הטקטיקה של מפיצי הפייק ניוז-published.mp3sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04_01_37Z&rscd=attachment; filename_=UTF-8''14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96.mp3.part

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
e74f7a|ERR |       0B/s|./14. הטקטיקה של מפיצי הפייק ניוז-published.mp3sv=2018-03-28&sr=b&si=private&sig=B1q4rv9ydyYJGo+KZIJA+kEsoBHQ2FjCag4C+40lJ+w=&se=2019-08-22T04_01_37Z&rscd=attachment; filename_=UTF-8''14.%20%D7%94%D7%98%D7%A7%D7%98%D7%99%D7%A7%D7%94%20%D7%A9%D7%9C%20%D7%9E%D7%A4%D7%99%D7%A6%D7%99%20%D7%94%D7%A4%D7%99%D7%99%D7%A7%20%D7%A0%D7%99%D7%95%D7%96.mp3.part

Status Legend:
(ERR):error occurred.

aria2 will resume download if the transfer is restarted.
If there are any errors, then see the log file. See '-l' option in help/man page for details.



ERROR: aria2c exited with code 16
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2019.7.30', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3.7/site-packages/youtube_dl/__init__.py", line 474, in main
    _real_main(argv)
  File "/usr/lib/python3.7/site-packages/youtube_dl/__init__.py", line 464, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 2008, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 807, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 1006, in process_ie_result
    extra_info=extra)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 899, in process_ie_result
    new_result, download=download, extra_info=extra_info)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 899, in process_ie_result
    new_result, download=download, extra_info=extra_info)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 861, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 1642, in process_video_result
    self.process_info(new_info)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 1915, in process_info
    success = dl(filename, info_dict)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 1854, in dl
    return fd.download(name, info)
  File "/usr/lib/python3.7/site-packages/youtube_dl/downloader/common.py", line 366, in download
    return self.real_download(filename, info_dict)
  File "/usr/lib/python3.7/site-packages/youtube_dl/downloader/external.py", line 64, in real_download
    self.get_basename(), retval))
  File "/usr/lib/python3.7/site-packages/youtube_dl/downloader/common.py", line 165, in report_error
    self.ydl.report_error(*args, **kargs)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 624, in report_error
    self.trouble(error_message, tb)
  File "/usr/lib/python3.7/site-packages/youtube_dl/YoutubeDL.py", line 586, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())

Description

The generic extractor (which is called on RSS enclosures) sets the video ID to
$slug$query from the original URL. Normally this poses no problem, but some
feed syndicators (notably omny.fm, as seen above) use long query strings.
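
To illustrate how a long query string ends up in the ID, here is a minimal sketch that reproduces the ID shape seen in the log above (`published.mp3?sv=...`). It is an approximation inferred from that log output, not necessarily the extractor's actual code; the function name `generic_id` and the example URL are assumptions made for illustration.

```python
import os
from urllib.parse import unquote


def generic_id(url):
    """Hypothetical helper: derive an ID from the last path segment of a URL.

    The query string is not stripped, so it becomes part of the ID. Only the
    text after the last dot is dropped as an "extension".
    """
    last_segment = url.rstrip("/").split("/")[-1]
    stem, _ext = os.path.splitext(last_segment)
    return unquote(stem)


# Assumed example URL, much shorter than the one in the log.
url = ("https://example.com/clips/published.mp3"
       "?sig=a%2Bb&name=x%2520y.mp3&utm_source=Podcast")
print(generic_id(url))
```

Every extra query parameter lengthens the ID, and with it the filename built from the ID.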

The video ID becomes part of the output filename, so the downloader tries to open a file
whose name is too long and rightly fails with "File name too long" (errNum=36 in the log above).
Solution: truncate the generated filename to NAME_MAX, minus e.g. 5 characters for the
.part extension.
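
A minimal sketch of the proposed truncation, assuming the limit is counted in bytes of the UTF-8 encoded name (which matters here because the title is in Hebrew). The names `get_name_max` and `truncate_filename` are hypothetical and are not part of youtube-dl; the fallback of 255 is an assumption for platforms where the limit cannot be queried.

```python
import os

PART_SUFFIX = ".part"  # 5 bytes reserved for the temporary-file suffix


def get_name_max(directory=".", fallback=255):
    """Ask the filesystem for NAME_MAX; fall back to an assumed 255."""
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        # os.pathconf is missing on some platforms or may reject the name.
        return fallback


def truncate_filename(filename, name_max=255, reserve=len(PART_SUFFIX)):
    """Shorten filename so that filename + '.part' fits in name_max bytes.

    The extension is kept when it fits; the stem is cut on a character
    boundary so the result is still valid UTF-8.
    """
    stem, ext = os.path.splitext(filename)
    ext_bytes = len(ext.encode("utf-8"))
    if ext_bytes >= name_max - reserve:
        # The "extension" alone would not fit: treat the whole name as stem.
        stem, ext, ext_bytes = filename, "", 0
    budget = name_max - reserve - ext_bytes
    encoded = stem.encode("utf-8")
    if len(encoded) <= budget:
        return filename
    # errors="ignore" drops a multi-byte character cut in half at the end.
    return encoded[:budget].decode("utf-8", errors="ignore") + ext


if __name__ == "__main__":
    long_name = "א" * 200 + ".mp3"  # 404 bytes in UTF-8
    safe = truncate_filename(long_name, name_max=get_name_max())
    print(len((safe + PART_SUFFIX).encode("utf-8")))
```

Counting characters instead of bytes would still overflow for non-ASCII titles, which is why the sketch works on the encoded form.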

@dstftw dstftw closed this Jul 30, 2019
@dstftw dstftw added the duplicate label Jul 30, 2019