Warn on tarball truncation #6

Closed
vasi opened this issue Oct 12, 2012 · 1 comment

Comments

vasi (Owner) commented Oct 12, 2012

In tarballs, 1024 zero bytes (two consecutive 512-byte zero blocks) at the position where a header would start indicate End Of Archive (EOA). Since pixz always tries to interpret its input as tar formatted, it will erroneously truncate its input if it finds this sequence. Users who don't know better lose their data!
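As a rough illustration of the check described above, here is a minimal Python sketch. It is not pixz's code (pixz is written in C); the function names are hypothetical, and treating only non-zero bytes as "data after EOA" is an assumption made here because tar writers commonly pad archives with extra zero blocks.

```python
BLOCK_SIZE = 512            # tar works in 512-byte blocks
EOA_SIZE = 2 * BLOCK_SIZE   # End Of Archive: two zero blocks, 1024 bytes


def is_end_of_archive(data: bytes, offset: int = 0) -> bool:
    """Return True if 1024 zero bytes sit at `offset`, where a header would start."""
    chunk = data[offset:offset + EOA_SIZE]
    return len(chunk) == EOA_SIZE and chunk == bytes(EOA_SIZE)


def has_data_after_eoa(data: bytes, eoa_offset: int) -> bool:
    """Return True if any non-zero byte follows the EOA marker.

    Assumption: trailing zero padding does not count as data.
    """
    return any(data[eoa_offset + EOA_SIZE:])
```

A compressor that stops reading as soon as `is_end_of_archive` is true, without checking `has_data_after_eoa`, would silently drop whatever follows.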

There are two distinct cases:

  1. The input starts with EOA, and contains more data afterwards. This is almost certainly not a tarball, and should be interpreted as non-tarball data with no warning.
  2. The input is a non-empty tarball, but contains data after EOA. This could occur due to user error (e.g. tar files concatenated with 'cat') or because some program is storing useful data after EOA. There are several reasonable courses of action to take:
    • a: Truncate the file at EOA. This loses any following data, but the pixz indexing works fine.
    • b: Continue compressing data after EOA, but turn off pixz indexing. This preserves all data, but loses the advantages of indexing, like fast listing and extraction.
    • c: Continue compressing after EOA, and leave indexing on. This preserves all data. However, if the archive is decompressed with plain xz, whatever other program puts data after EOA may be confused by the output.
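The two cases above can be told apart by walking the tar headers and counting members before EOA. The following Python sketch shows one way to do that. It is only an illustration under simplifying assumptions: `classify` and its return labels are hypothetical, sizes are read from the plain octal size field at bytes 124-135 of each header (GNU base-256 sizes are not handled), and trailing zero padding is not counted as data.

```python
BLOCK = 512  # tar block size in bytes


def classify(data: bytes) -> str:
    """Label input as 'case1', 'case2', 'clean', 'no-eoa', or 'not-tar'."""
    offset = 0
    members = 0
    while offset + 2 * BLOCK <= len(data):
        if data[offset:offset + 2 * BLOCK] == bytes(2 * BLOCK):
            # Found EOA: look for non-zero bytes after it.
            if not any(data[offset + 2 * BLOCK:]):
                return "clean"
            # Data follows EOA: case 1 if nothing preceded it, else case 2.
            return "case1" if members == 0 else "case2"
        # Size field: octal ASCII, padded with NULs or spaces.
        size_field = data[offset + 124:offset + 136].rstrip(b"\0 ")
        try:
            size = int(size_field.decode("ascii"), 8)
        except ValueError:
            return "not-tar"
        if size < 0:
            return "not-tar"
        # Skip the header block plus the member data, rounded up to a block.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
        members += 1
    return "no-eoa"
```

With a classifier like this, case 1 could fall back to non-tarball handling silently, while case 2 could at least trigger a warning.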

Case 1 should be implemented with high priority. Case 2 is less common, but perhaps there should at least be a warning until I decide what to do.

vasi (Owner, Author) commented Oct 14, 2012

The easiest solution seems to be to include all data after the tar EOA. This handles case 1 and amounts to case 2c. If the user wants to turn off indexing, they can use the -t flag, and I see no reason why 2a would be preferred to 2c.
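To make the cost of truncating at EOA concrete, the sketch below reproduces the 'cat' scenario using Python's standard `tarfile` module rather than pixz. The helpers `make_tar` and `member_names` are hypothetical names introduced for this example. By default `tarfile` stops at the first EOA, so the second archive's member is invisible; with `ignore_zeros=True` it skips the zero blocks and keeps reading.

```python
import io
import tarfile


def make_tar(name: str, payload: bytes) -> bytes:
    """Build a one-member tarball in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data: bytes, ignore_zeros: bool = False) -> list[str]:
    """List member names, optionally reading past zero blocks."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tf:
        return tf.getnames()


# Equivalent to: cat first.tar second.tar > combined.tar
combined = make_tar("a.txt", b"first") + make_tar("b.txt", b"second")

print(member_names(combined))                     # ['a.txt']
print(member_names(combined, ignore_zeros=True))  # ['a.txt', 'b.txt']
```

Keeping the bytes after EOA, as in 2c, means a reader that knows to look past the marker can still recover the second member.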
