Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] aria2 ignores whitespace around filenames, causing postprocessing to fail #276

Closed
6 tasks done
Zirro opened this issue Apr 26, 2021 · 8 comments
Closed
6 tasks done
Labels
bug Bug that is not site-specific

Comments

@Zirro
Copy link
Contributor

Zirro commented Apr 26, 2021

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running yt-dlp version 2021.04.22
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

yt-dlp 'https://www.youtube.com/watch?v=8_SepdI_RmU' --external-downloader aria2c --verbose
[debug] Command-line config: ['https://www.youtube.com/watch?v=8_SepdI_RmU', '--external-downloader', 'aria2c', '--verbose']
[debug] Loading archive file None
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] yt-dlp version 2021.04.22 (zip)
[debug] Python version 3.9.4 (CPython 64bit) - macOS-10.15.7-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] 8_SepdI_RmU: Downloading webpage
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 8_SepdI_RmU: Downloading format(s) 248+251
[debug] Invoking downloader on 'https://r1---sn-gvopm-vu2e.googlevideo.com/videoplayback?expire=1619485722&ei=ug-HYLi7F8bb1gKVhoKoAw&ip=37.120.206.166&id=o-AGW5PCJeeRM6kj7jM2C3JFH_imQYi81noWOPCyhr5kQM&itag=248&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278&source=youtube&requiressl=yes&mh=dt&mm=31%2C29&mn=sn-gvopm-vu2e%2Csn-5hne6nsz&ms=au%2Crdu&mv=m&mvi=1&pcm2cms=yes&pl=24&initcwndbps=406250&vprv=1&mime=video%2Fwebm&ns=g7yxBwfmLHYDAob0GrXOVUIF&gir=yes&clen=97591705&dur=1266.514&lmt=1537344479149721&mt=1619463662&fvip=1&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5533332&n=Boca52OnAodY86cD2H&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAJJ-WFhlwzkNBLtaTxewJrX6gM6V0VdsQBr26OTc1i_kAiEAzjGLLnQ91luxsrRgCPk9gUKmsB__jc4sbAvSGlWIyc4%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgW0Cpu34silG65TbzObrgBKJ1VZazuivGyGIx3YTFozkCIQCROygk-pwWVmbFHCRldSerzvbaH76Dg9P7VjLVrrIMbg%3D%3D'
[download] Destination:  -・゚✧ ASMR Tag ✧゚・ -  [8_SepdI_RmU].f248.webm
[debug] aria2c command line: aria2c -c --console-log-level=warn --summary-interval=0 --download-result=hide --file-allocation=none -x16 -j16 -s16 --header 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3647.0 Safari/537.36' --header 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Encoding: gzip, deflate' --header 'Accept-Language: en-us,en;q=0.5' --check-certificate=true --remote-time=true --out ' -・゚✧ ASMR Tag ✧゚・ -  [8_SepdI_RmU].f248.webm.part' --auto-file-renaming=false -- 'https://r1---sn-gvopm-vu2e.googlevideo.com/videoplayback?expire=1619485722&ei=ug-HYLi7F8bb1gKVhoKoAw&ip=37.120.206.166&id=o-AGW5PCJeeRM6kj7jM2C3JFH_imQYi81noWOPCyhr5kQM&itag=248&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278&source=youtube&requiressl=yes&mh=dt&mm=31%2C29&mn=sn-gvopm-vu2e%2Csn-5hne6nsz&ms=au%2Crdu&mv=m&mvi=1&pcm2cms=yes&pl=24&initcwndbps=406250&vprv=1&mime=video%2Fwebm&ns=g7yxBwfmLHYDAob0GrXOVUIF&gir=yes&clen=97591705&dur=1266.514&lmt=1537344479149721&mt=1619463662&fvip=1&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5533332&n=Boca52OnAodY86cD2H&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAJJ-WFhlwzkNBLtaTxewJrX6gM6V0VdsQBr26OTc1i_kAiEAzjGLLnQ91luxsrRgCPk9gUKmsB__jc4sbAvSGlWIyc4%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgW0Cpu34silG65TbzObrgBKJ1VZazuivGyGIx3YTFozkCIQCROygk-pwWVmbFHCRldSerzvbaH76Dg9P7VjLVrrIMbg%3D%3D'
[#123d7f 93MiB/93MiB(99%) CN:1 DL:671KiB]ERROR: unable to download video data: [Errno 2] No such file or directory: ' -・゚✧ ASMR Tag ✧゚・ -  [8_SepdI_RmU].f248.webm.part'
Traceback (most recent call last):
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2460, in process_info
    partial_success, real_download = dl(fname, new_info)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2240, in dl
    return fd.download(name, new_info, subtitle)
  File "/usr/local/bin/yt-dlp/yt_dlp/downloader/common.py", line 381, in download
    return self.real_download(filename, info_dict), True
  File "/usr/local/bin/yt-dlp/yt_dlp/downloader/external.py", line 63, in real_download
    fsize = os.path.getsize(encodeFilename(tmpfilename))
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: ' -・゚✧ ASMR Tag ✧゚・ -  [8_SepdI_RmU].f248.webm.part'

Description

aria2 has a long-standing issue causing it to ignore spaces at the beginning and end of filenames provided to its --out flag even when the value is quoted. Given that the title of some YouTube videos begin with a space, this causes issues with the default naming pattern.

There's an open issue for it in the aria2 repo but given that its not seen much development lately and there are conflicting opinions on whether it should be fixed at all, it seems unlikely to get resolved any time soon. There have been issues created for this in the youtube-dl issue tracker as well, but these have been closed with the stance that aria2 should fix it on their end. I hope the yt-dlp project can be a bit more pragmatic.

I see two ways to handle it. Either by making trimming whitespace a normal step of the filename sanitization process or by renaming the files right after they have been processed by aria2 to restore the lost spaces.

Here's an example of a video with a title displaying this issue when downloaded: https://www.youtube.com/watch?v=8_SepdI_RmU

@pukkandan
Copy link
Member

yt-dlp also strips out any explicitly given spaces at the start/end of filenames. So it would make sense to do the same again after the template is applied.

$ yt-dlp -o " test.%(ext)s " YBHQbu5rbdQ --get-filename
test.webm

@pukkandan
Copy link
Member

Actually, this behavior is inconsistent.

$ yt-dlp ".\ test.%(ext)s " YBHQbu5rbdQ --get-filename
 test.webm

The stripping is done only when the argument is taken from the command line.

The best solution probably is to "fix" the stripping from command-line and to do a renaming when passing through aria2c. We already do similar renaming for ffmpeg since ffmpeg doesn't work well with some special characters

@pukkandan pukkandan added the bug Bug that is not site-specific label Apr 27, 2021
@Zirro
Copy link
Contributor Author

Zirro commented Apr 29, 2021

Adding a rename step to the aria2 extractor sounds fine to me (along with restoring the previous ctime for systems where renaming files may update it).

@fstirlitz I noticed that you added a thumbs down to the initial post. Was this perhaps accidental, or do you have another opinion on the issue/proposed solutions?

@fstirlitz
Copy link
Contributor

My stance is just like youtube-dl’s: it’s bug in aria2, so let them fix it. Send a patch their way (apparently someone has already figured out how to fix it); if they refuse it, send it to downstream packagers like Debian.

@Zirro
Copy link
Contributor Author

Zirro commented Apr 30, 2021

it’s bug in aria2, so let them fix it

I see. For the reasons given in my post (with aria2 barely seeing maintenance as it is) this seems unviable.

if they refuse it, send it to downstream packagers like Debian.

Even if Debian were to accept it (I'm not familiar with their policy on disputed patches), this would produce a worse situation than before. Now we'd see diverging behaviour - not just between different versions of aria2 - but depending on how the user obtained the utility. With a package manager vs self-compiled vs official binary, on certain flavours of Linux vs other flavours of Linux vs macOS vs Windows.

We'd almost certainly have to end up fixing it on our end eventually as well for non-Debian/non-package-manager users. If this had involved a more commonly reported bug, I'm sure we'd also strongly prefer the ability to diagnose a known issue right away by looking at the version number instead of having to ask the reporting user how they obtained their installation of aria2 and whether it includes unofficial patches.

@pukkandan
Copy link
Member

pukkandan commented May 1, 2021

It turns out a fix is actually much simpler than I initially thought. It is possible to make aria2c accept the whitespaces as long as they are not the first/last characters in the given argument. So we can simply pass --dir './path/' --out './filename.ext' instead of --dir 'path' --out 'filename.ext'.

PS: ofc, there needs to be a check for absolute path and / will actually be replaced by os.path.sep

nixxo pushed a commit to nixxo/yt-dlp that referenced this issue Nov 22, 2021
nixxo pushed a commit to nixxo/yt-dlp that referenced this issue Nov 22, 2021
@glenn-slayden
Copy link
Contributor

For what it's worth, all of this is either far worse--or not an issue at all--on Windows, because it basically doesn't allow filenames to begin or end with the space character at all:

docs.microsoft.com:
"File and Folder names that begin or end with the ASCII Space (0x20) will be saved without these characters. File and Folder names that end with the ASCII Period (0x2E) character will also be saved without this character. All other trailing or leading whitespace characters are retained."

Although you can forcibly give files such names by tampering with low-level disk APIs, attempting to access them will likely rampantly fail throughout the OS. So this is already a "diverging behaviour" in yt-dlp--this time between Windows and Linux--for anyone using an output template that exposes uncontrolled data such as %(title)s in first or last position. Frankly, I don't think I'd want any possiblity of generating filenames that begin or end with spaces, even if I were on Linux. It just seems like begging for unnecessary trouble.

But seriously, all this is burying the lede... beyond the platform inconsistency there's actually a full-blown bug in yt-dlp here, which is that, according to the above citation, the default output template %(title)s [%(id)s].%(ext)s, by its very definition, can and will generate invalid Win32 filenames. Now maybe yt-dlp currently works by subsequently applying hacks to remove leading or trailing spaces (and dots?) when they do manifest (I know there's other unavoidable filename cleaning as well); but wouldn't it be better to simply avoid adding more mess in the first place, especially when it's so easy to do:

'default': '%(title)s [%(id)s].%(ext)s',

                      ↓         ↓      
          'default': '[%(title)s] %(id)s.%(ext)s',

While I realize this will never happen (because it would break just about every user in the world), we should at least acknowledge that changing the specification of the default output template (as shown, or anything else that trivally eliminates the problem) is actually the correct fix here. Of course, if the yt-dlp default was changed, users would always be able to change it back at their own risk, but at least then any problems would be their own fault.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug that is not site-specific
Projects
None yet
Development

No branches or pull requests

5 participants
@Zirro @fstirlitz @glenn-slayden @pukkandan and others