Align with GNU Tar when a file name is too long #130819

Open
gdh1995 opened this issue Mar 4, 2025 · 1 comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

gdh1995 commented Mar 4, 2025

Bug report

Bug description:

Recently I found that tarfile may generate an archive that differs slightly from the one produced by GNU Tar (https://www.gnu.org/software/tar/), especially when a path name is longer than 100 bytes.

Here's the test code:

# py-tar.py
import tarfile, io
memory_file = io.BytesIO()
tar_obj = tarfile.open(name=None, mode="w", fileobj=memory_file, format=tarfile.GNU_FORMAT)
tar_info = tarfile.TarInfo("abcdef" * 20)
tar_info.type = tarfile.DIRTYPE
tar_info.mode = 0o755
tar_info.mtime = 1609459200  # UTC 2021-01-01
tar_info.uid = 1000
tar_info.gid = 1000
tar_info.uname = "ubuntu"
tar_info.gname = "ubuntu"
tar_obj.addfile(tar_info, None)
tar_obj.close()
memory_file.seek(0)
binary_data = memory_file.read()
# import binascii
# hex_data = binascii.hexlify(binary_data)
# sep = 16
# for i in range(0, len(hex_data), sep * 2):
#     part = hex_data[i:i + sep * 2]
#     print(*(part[i:i+2].decode() for i in range(0, len(part), 2)), binary_data[i//2:][:sep], sep=" ")
with open("py.tar", "wb") as fp:
    fp.write(binary_data)

$ mkdir -m 755 abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef
$ tar cf gnu.tar --sort=name --owner=ubuntu:1000 --group=ubuntu:1000 --mtime='UTC 2021-01-01' abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdef/
$ python ./py-tar.py

As a result, when comparing the generated py.tar and gnu.tar, we get a difference like this:

[Image: screenshot of the difference between py.tar and gnu.tar]
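One way to make such a difference visible as text is to decode the first 512-byte header block of each archive field by field and compare the results. The following is only a minimal sketch: the field offsets are the standard ustar/GNU header layout (not something stated in this issue), and `parse_header`, `diff_headers` and `build_sample` are hypothetical helper names introduced for illustration. The sample archive is built in memory the same way as in py-tar.py so the block runs on its own.

```python
import io
import tarfile

# (field name, offset, length) within one 512-byte tar header block.
# Assumption: standard ustar/GNU header layout.
HEADER_FIELDS = [
    ("name", 0, 100),
    ("mode", 100, 8),
    ("uid", 108, 8),
    ("gid", 116, 8),
    ("size", 124, 12),
    ("mtime", 136, 12),
    ("chksum", 148, 8),
    ("typeflag", 156, 1),
    ("linkname", 157, 100),
    ("magic", 257, 8),  # magic and version together
    ("uname", 265, 32),
    ("gname", 297, 32),
]


def parse_header(block):
    """Split one header block into raw field values with trailing NULs removed."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes long")
    return {
        name: block[offset:offset + length].rstrip(b"\0")
        for name, offset, length in HEADER_FIELDS
    }


def diff_headers(block_a, block_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    fields_a = parse_header(block_a)
    fields_b = parse_header(block_b)
    return {
        name: (fields_a[name], fields_b[name])
        for name in fields_a
        if fields_a[name] != fields_b[name]
    }


def build_sample():
    """Build the same kind of archive as py-tar.py, but return it as bytes."""
    memory_file = io.BytesIO()
    with tarfile.open(mode="w", fileobj=memory_file,
                      format=tarfile.GNU_FORMAT) as tar_obj:
        tar_info = tarfile.TarInfo("abcdef" * 20)
        tar_info.type = tarfile.DIRTYPE
        tar_info.mode = 0o755
        tar_info.mtime = 1609459200
        tar_obj.addfile(tar_info)
    return memory_file.getvalue()


sample = build_sample()
first_header = parse_header(sample[:512])
for field, value in first_header.items():
    print(f"{field:>8}: {value!r}")
```

To compare real archives, read the first 512 bytes of py.tar and of gnu.tar and pass both blocks to `diff_headers`. With a name longer than 100 bytes, the first block is the extra long-name header that precedes the entry's own header.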

So I wonder whether Python could align this detail of tarfile.GNU_FORMAT with the output of GNU tar?

BTW, here's my environment (I'm on Ubuntu 24.04). The main branch of CPython has a similar Lib/tarfile.py, so it should show the same behavior difference:

$ LANG=C tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
$ LANG=C python --version
Python 3.12.3

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

@gdh1995 gdh1995 added the type-bug An unexpected behavior, bug, or error label Mar 4, 2025
@encukou encukou added the stdlib Python modules in the Lib dir label Mar 5, 2025
@picnixz picnixz added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 8, 2025
Member

picnixz commented Mar 8, 2025

Reposting the comment from the PR:

Can you motivate the choice for this? Namely, is there a real benefit to having an explicit user+mode rather than leaving the "defaults"? And more importantly, can you cite the relevant manpage / specs where we can find this?

Note: whether this is accepted or not, this should be treated as a feature request and not a bug IMO. As such, a What's New entry will need to be created, unless the motivation behind this change is not sufficient (in which case we would close the issue as "not planned").
