[Feature request] Handle Long filenames in default template and temporary files #1136

Open
7 tasks done
tylerszabo opened this issue Oct 1, 2021 · 42 comments
Labels
enhancement New feature or request

Comments

@tylerszabo
Contributor

Checklist

  • I'm reporting a bug unrelated to a specific site
  • I've verified that I'm running yt-dlp version 2021.09.25
  • I've checked that all provided URLs are alive and playable in a browser
  • The provided URLs do not contain any DRM to the best of my knowledge
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

[debug] Command-line config: ['--verbose', 'https://twitter.com/NASA/status/1443572363757559808', '-o', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] yt-dlp version 2021.09.25 (source)
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Git HEAD: ad095c428
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19043-SP0
[debug] exe versions: ffmpeg n4.4-80-gbf87bdd3f6-20210811, ffprobe n4.4-80-gbf87bdd3f6-20210811, phantomjs 2.1.1
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [twitter] Extracting URL: https://twitter.com/NASA/status/1443572363757559808
[twitter] 1443572363757559808: Downloading guest token
[twitter] 1443572363757559808: Downloading JSON metadata
[twitter] 1443572363757559808: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 1443572363757559808: Downloading 1 format(s): hls-2176
[debug] Invoking downloader on "https://video.twimg.com/amplify_video/1443570535904935945/pl/1280x720/eB7trHC2QS5NrGUL.m3u8"
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 22
[download] Destination: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[download]  40.9% of ~5.05MiB at  9.73MiB/s ETA 00:01 ERROR: unable to open for writing: [Errno 22] Invalid argument: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.part-Frag10.part'
Traceback (most recent call last):
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\http.py", line 262, in download
    ctx.stream, ctx.tmpfilename = sanitize_open(
  File "F:\source\repos\yt-dlp\yt_dlp\utils.py", line 2068, in sanitize_open
    stream = open(encodeFilename(filename), open_mode)
OSError: [Errno 22] Invalid argument: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.part-Frag10.part'

ERROR: unable to download video data: [Errno 2] No such file or directory: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.part-Frag10'
Traceback (most recent call last):
  File "F:\source\repos\yt-dlp\yt_dlp\YoutubeDL.py", line 2758, in process_info
    success, real_download = self.dl(temp_filename, info_dict)
  File "F:\source\repos\yt-dlp\yt_dlp\YoutubeDL.py", line 2475, in dl
    return fd.download(name, new_info, subtitle)
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\common.py", line 408, in download
    return self.real_download(filename, info_dict), True
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\hls.py", line 350, in real_download
    return self.download_and_append_fragments(ctx, fragments, info_dict)
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\fragment.py", line 478, in download_and_append_fragments
    frag_content, frag_index = download_fragment(fragment, ctx)
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\fragment.py", line 418, in download_fragment
    success, frag_content = self._download_fragment(ctx, fragment['url'], info_dict, headers)
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\fragment.py", line 132, in _download_fragment
    return True, self._read_fragment(ctx)
  File "F:\source\repos\yt-dlp\yt_dlp\downloader\fragment.py", line 135, in _read_fragment
    down, frag_sanitized = sanitize_open(ctx['fragment_filename_sanitized'], 'rb')
  File "F:\source\repos\yt-dlp\yt_dlp\utils.py", line 2068, in sanitize_open
    stream = open(encodeFilename(filename), open_mode)
FileNotFoundError: [Errno 2] No such file or directory: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.part-Frag10'

Description

The defaults can produce filenames that are too long and, confusingly, the same failure can occur for temporary files. The example here shows a 239-character filename failing once the .part-Frag10.part suffix pushes the temporary filename past 255 characters.

The workaround is to explicitly specify an output template that will not become too long even when suffixes are added. The default %(title)s [%(id)s].%(ext)s and fallback %(title)s-%(id)s.%(ext)s can both exceed 255 characters, especially with videos in long tweets.

In the example provided, the output template is explicitly set to a filename that first exceeds 255 characters on fragment 10, to better illustrate the issue. However, omitting the template or explicitly using the default %(title)s [%(id)s].%(ext)s still results in an immediate failure on fragment 1.
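
The arithmetic behind this failure can be checked directly. This is a minimal sketch, assuming a 255-character per-name limit and the `.part-Frag<N>.part` suffix shape seen in the log above; `temp_name_length` is a hypothetical helper for illustration, not part of yt-dlp.

```python
NAME_LIMIT = 255  # assumed per-name limit; the real value depends on the filesystem


def temp_name_length(stem_length: int, fragment_index: int) -> int:
    """Length of a fragment's temporary name, using the suffix shape from the log."""
    suffix = f".part-Frag{fragment_index}.part"
    return stem_length + len(suffix)


for index in (9, 10):
    length = temp_name_length(239, index)
    status = "ok" if length <= NAME_LIMIT else "too long"
    print(f"fragment {index}: {length} characters ({status})")
```

With a 239-character stem, fragment 9 lands exactly on the limit and fragment 10 is the first to go over, which matches the point where the log fails.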

This is related to #1003 and appears to be a cross-platform issue (it depends on the filesystem of the target files rather than on the OS or runtime).

Work published by NASA is in the public domain so there are no licensing concerns with testing using this URL.

There are various codepaths that expect to be able to add a suffix to a temporary file, and the temporary filename is based on the destination filename. By adding an option to specify a tempfile format such as %(extractor)s-%(id)s.%(ext)s, and altering the default file template to set some high but sane limits, this could be mitigated for most users while still allowing advanced users to explicitly specify long filenames.

@pukkandan
Member

pukkandan commented Oct 1, 2021

Are you asking only about the default, or about user-provided templates too? For user-provided templates, you can do something like -o %(title).200B.%(ext)s

pukkandan changed the title from "Long titles and IDs can result in filenames that are too long for the OS" to "[Feature request] Handle Long filenames in default template" on Oct 1, 2021
pukkandan added the enhancement (New feature or request) label on Oct 1, 2021
@pukkandan
Member

Related: ytdl-org/youtube-dl#29989

@tylerszabo
Contributor Author

Improved defaults would be nice, but I thought the effect of temp files (which will always be longer than the main output file) was the more confusing aspect, and there is currently no mechanism for specifying a template for temp files (they just inherit the output template).

I experimented with a number of techniques for deterministically shortening names, but they were not ideal: any format with length limits would ultimately be a guess. Still, some limits as you suggested, with maybe ~160 characters for the title and ~40 for the ID, might be sensible. Then, if temp files could have a different format, maybe as high as ~180 for the title would be okay without risking a sudden error at chunk 10000.
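
To make the budget concrete, here is a rough sketch of how much room is left for the final filename once the worst-case temporary suffix is reserved. It assumes a 255-byte limit, the `.part-Frag<N>.part` suffix shape from the log, a 40-byte ID, and a `.mp4` extension; the helper names are hypothetical and none of this is yt-dlp code.

```python
NAME_LIMIT = 255  # assumed limit in bytes; filesystem dependent


def worst_case_suffix(max_fragments: int) -> str:
    """Longest temporary suffix, assuming the '.part-Frag<N>.part' shape from the log."""
    return f".part-Frag{max_fragments}.part"


def final_name_budget(max_fragments: int) -> int:
    """Bytes left for the final filename (extension included) after reserving the suffix."""
    return NAME_LIMIT - len(worst_case_suffix(max_fragments))


budget = final_name_budget(10000)
# Assumed final name layout: <title> + " [" + <id> + "]" + ".mp4"
fixed_parts = len(" [") + len("]") + len(".mp4")
title_budget = budget - fixed_parts - 40  # 40 bytes reserved for the ID (assumption)
print(budget, title_budget)
```

Under these assumptions the title budget comes out somewhat above the ~160 suggested above, so that figure leaves some headroom for longer IDs or extensions.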

pukkandan changed the title from "[Feature request] Handle Long filenames in default template" to "[Feature request] Handle Long filenames in default template and temporary files" on Oct 1, 2021
@GenericGoose

Annoyingly, when using yt-dlp I've had an issue where the filename, which was video title + chapter name, was too long, and it only gave a confusing 'no such file or directory' error. I suppose this is a separate issue.

@tylerszabo
Contributor Author

@GenericGoose I think that may fall under this issue as I see it. I don't think there's a reliable and deterministic way to prevent invalid names in all cases, since the limitations come down to filesystems (not even just OS differences), but the way I see this problem is in terms of defaults. The current defaults have no built-in size limits (even for common limits such as 255 chars), and since they're composed from multiple properties there are many combinations that can exceed the limits at different stages.

In your case the defaults that include chapter names can also encounter this issue. A workaround is to override the default templates unless a more conservative default is introduced.

If we consider this issue to have 2 parts:

  1. The default templates have no limits of any kind and can exceed common limits.
  2. The temp files append additional suffixes which exacerbates this limitation (and makes a sane template require even stricter limits)

Then your issue would fall clearly under part 1. While I think part 2 is a bigger concern (because it causes confusing errors partway into a download rather than giving a clearer error and failing fast), part 1 is certainly a component of this issue in my view :)

@InconsolableCellist

Re: cross-platform compatibility, you can allow the user to specify a maximum file length as a flag. Users can then alias yt-dlp to automatically include that flag at their leisure.

Alternatively, you can determine if the attempted path exceeds the maximum file length by checking to see if the file was successfully created or not. File APIs for all major platforms will provide this information. You can then truncate the requested filename and inform the user. This is how things were handled all the way back in FAT16 with the "FILENA~1.EXT" convention.
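
A rough sketch of that "try, then truncate" idea follows. It assumes the over-long name is reported as errno 36 (`ENAMETOOLONG`) or errno 22 (`EINVAL`), the two codes that appear in this thread; `open_with_truncation` is a hypothetical helper and not how yt-dlp behaves today.

```python
import errno
import os
import tempfile

# Error codes seen in this thread for over-long names (36 on Linux, 22 on Windows).
TOO_LONG_ERRNOS = {errno.ENAMETOOLONG, errno.EINVAL}


def open_with_truncation(directory: str, name: str, min_stem: int = 1):
    """Try to create `name`; if the OS rejects it as too long, shorten the stem and retry.

    Returns (open file object, name actually used). The extension is preserved.
    """
    stem, ext = os.path.splitext(name)
    while True:
        candidate = stem + ext
        try:
            return open(os.path.join(directory, candidate), "wb"), candidate
        except OSError as exc:
            if exc.errno not in TOO_LONG_ERRNOS or len(stem) <= min_stem:
                raise
            stem = stem[:-1]  # drop one trailing character and try again


with tempfile.TemporaryDirectory() as tmp:
    handle, used = open_with_truncation(tmp, "x" * 300 + ".mp4")
    handle.close()
    print(f"requested 304 characters, used {len(used)}")
```

Shortening one character at a time is simple but slow for very long names; a real implementation would more likely query the limit first, and would also need to leave room for any temporary suffixes.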

I run into this bug all the time when archiving tweets.

@chapmanjacobd

chapmanjacobd commented May 1, 2022

It should also be noted that the limitation is often not 255 characters but 255 bytes, so for non-ASCII languages you are limited to a lot fewer than 255 characters. "%(uploader)s/%(title).100s [%(id)s].%(ext)s" is pretty safe, but I doubt it is a complete solution.

@pukkandan
Member

It should also be noted that the limitation is often not 255 characters but 255 bytes so for non-ASCII languages you are limited to a lot fewer than 255 char.

This is why I added the B formatter. Eg: %(title).200B to limit to 200 bytes. See "output template" section of readme for a list of the custom formatters yt-dlp provides

@InconsolableCellist

This comment was marked as resolved.

@chapmanjacobd

This comment was marked as resolved.

@InconsolableCellist

This comment was marked as resolved.

@pukkandan
Member

@InconsolableCellist Read the comments literally just above your question! #1136 (comment) #1136 (comment)

@nick-s-b

ERROR: unable to open for writing: [Errno 36] File name too long

This is THE MOST ANNOYING bug in yt-dlp I can think of. Almost every time I try to save a Twitter video, yt-dlp fails to save a file. Why can't it just truncate the filename?

@rebane2001
Contributor

It should also be noted that the limitation is often not 255 characters but 255 bytes so for non-ASCII languages you are limited to a lot fewer than 255 char.

This is why I added the B formatter. Eg: %(title).200B to limit to 200 bytes. See "output template" section of readme for a list of the custom formatters yt-dlp provides

Thank you for this feature. I figured I'd leave a comment with some things I found unclear myself and had to check so if anybody else comes across this thread they don't need to check it for themselves:

  1. The readme states yt-dlp additionally supports converting to B = Bytes, but the end result is still a filename with correct unescaped unicode in it, not something like \xf0\x9f\xa6\x84. The readme isn't wrong, but the "converts" part can be interpreted in multiple ways.
  2. The bytes are truncated up to the last valid byte, so a string such as 🦄🦄 with a formatting of 6B will only take the first emoji (4 bytes) and result in 🦄 without leaving half of the other one like 🦄\xf0\x9f.
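
The behaviour described in point 2 can be reproduced in plain Python. This is only a sketch of the idea and not yt-dlp's actual implementation; `truncate_utf8` is a hypothetical name.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut `text` to at most `max_bytes` UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the incomplete trailing sequence left by the byte slice
    return encoded.decode("utf-8", errors="ignore")


print(truncate_utf8("🦄🦄", 6))  # keeps only the first emoji (4 bytes)
```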

@erlenmayr

This happens with Twitter videos often because it uses the whole tweet as filename. How about just using the Twitter handle plus tweet ID and then crop the Tweet?

@alrepin

This comment was marked as off-topic.

@rpdelaney

Thanks to comments in this thread I got this working:

$ yt-dlp -o '%(title).200B.%(ext)s' '<url>'

Hypothetically, if I wanted to be able to tell which files have truncated names, is it possible to add something to indicate that? Could be an … (0x2026) or whatever, doesn't matter.

@pukkandan
Member

Add a %(title.201&…|)s - read as "if title.201 (201'th char) exists, then (&) add …, else (|) add nothing"

@rpdelaney

rpdelaney commented Apr 11, 2023

Awesome. At first I misunderstood you, but this seems to be working great!

yt-dlp --output '%(title).200B%(title.201&…|)s.%(ext)s'

Edit:
This seems better: count bytes not characters when adding the ellipsis too. (See below)

yt-dlp --output '%(title).200B%(title.201B&…|)s.%(ext)s'

@chrizilla

chrizilla commented Apr 26, 2023

yt-dlp --output '%(title).200B%(title.201&…|)s.%(ext)s'

@pukkandan : isn't this mixing apples and oranges (200 bytes and 201st character) ?

@rpdelaney

@chrizilla possibly. Do you have any suggestions?

In #6882 this was suggested, but I haven't tried it myself: -o "%(title).150B [%(id)s].%(ext)s". I suppose it's unlikely that the id or the extension would have many surprises, though.

@chrizilla

chrizilla commented Apr 27, 2023

@rpdelaney : If your title consists of ASCII characters only, I guess the 201st character is the 201st byte. But for other Unicode characters, we probably need to ask whether the 201st byte is present, not the 201st character. I haven't looked into if and how this can be done, though. Maybe %(title.201B&…|)s ?
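
The mismatch between counting characters and counting bytes is easy to see outside the template language. Below is a plain-Python sketch, not yt-dlp template syntax; `shorten_with_marker` is a hypothetical helper that decides on the ellipsis by byte length.

```python
def shorten_with_marker(title: str, max_bytes: int = 200, marker: str = "…") -> str:
    """Truncate to `max_bytes` UTF-8 bytes and append `marker` only if something was cut."""
    encoded = title.encode("utf-8")
    if len(encoded) <= max_bytes:
        return title
    kept = encoded[:max_bytes].decode("utf-8", errors="ignore")
    return kept + marker


title = "日本語" * 40  # 120 characters, but 360 bytes in UTF-8
print(len(title), len(title.encode("utf-8")))
print(shorten_with_marker(title))
```

This title has no 201st character, so a character-based check would never add the ellipsis even though the byte limit cuts the title. Note that the marker itself adds 3 more bytes on top of `max_bytes`.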

I suppose it's unlikely that id or ext would have many surprises, though.

Depends. For example, the id for this CNN video is 91 bytes/chars long: 😮
id = world/2023/05/03/russia-attempted-drone-attack-kremlin-putin-video-ukraine-reaction-vpx.cnn

@rpdelaney

At least on Linux, I know that the filename length limits are counted in raw bytes, since there is no enforcement of a specific character encoding for a filesystem. Counting bytes would be safest.
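
On POSIX systems the per-name byte limit can be queried rather than guessed. A minimal sketch using Python's `os.pathconf` (Unix only); the helper names `name_max` and `fits` are made up for illustration.

```python
import os


def name_max(directory: str = ".") -> int:
    """Maximum filename length in bytes on the filesystem holding `directory` (POSIX only)."""
    return os.pathconf(directory, "PC_NAME_MAX")


def fits(name: str, directory: str = ".") -> bool:
    """True if `name`, encoded the way the OS will see it, is within the limit."""
    return len(os.fsencode(name)) <= name_max(directory)


print(name_max("."))
print(fits("日本語" * 100 + ".mp4"))
```

A check like this would still need to account for any temporary suffix appended later, and it does not apply on Windows, where `os.pathconf` is unavailable.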

@rpdelaney

@chrizilla How about this then?

yt-dlp --output '%(title).200B%(title.201B&…|)s.%(ext)s'

@chrizilla

chrizilla commented Apr 28, 2023

@rpdelaney : Does it work for you? Yes, this was my suggestion, but upon closer inspection it doesn't work for me. I am not sure this is really on-topic here, so I opened #6983.

@kenorb

kenorb commented Dec 13, 2023

Same for https://twitter.com/BrainStorm_Joe/status/1734386440706953260.

ERROR: unable to open for writing: [Errno 36] File name too long

@bashonly bashonly mentioned this issue Dec 19, 2023
11 tasks
@kenorb

This comment was marked as duplicate.

@keithstellyes

This comment was marked as duplicate.

@pukkandan
Member

Everyone agrees this is a good idea. The reason this is not implemented is mostly due to technical reasons and partly due to compatibility reasons. More examples aren't helping.
