In tarballs, 1024 zero bytes at the start of a header indicate End Of Archive (EOA). Since pixz always tries to interpret its input as tar-formatted, it will erroneously truncate its input if it finds this sequence. Users who don't know better lose their data!
There are two distinct cases:
1. The input starts with EOA and contains more data afterwards. This is almost certainly not a tarball, and should be treated as non-tarball data, with no warning.
2. The input is a non-empty tarball, but contains data after EOA. This could occur due to user error (e.g. tar files concatenated with 'cat') or because some program is storing useful data after EOA. There are several reasonable courses of action:
a: Truncate the file at EOA. This loses any following data, but the pixz indexing works fine.
b: Continue compressing data after EOA, but turn off pixz indexing. This preserves all data, but loses the advantages of indexing, like fast listing and extraction.
c: Continue compressing after EOA, and leave indexing on. This preserves all data. However, if the archive is decompressed with plain xz, whatever other program puts data after EOA may be confused by the output.
Case 1 should be implemented with high priority. Case 2 is less common, but perhaps there should at least be a warning until I decide what to do.
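To make the two cases concrete, here is a minimal Python sketch of how an input could be classified. It is not pixz's code (pixz is written in C), and the name `classify_eoa` is hypothetical. It assumes the whole input fits in memory and that every size field is plain octal; tar's base-256 size encoding and non-tar input that does not start with EOA are not handled.

```python
BLOCK = 512                 # tar block size in bytes
EOA = bytes(2 * BLOCK)      # End Of Archive: 1024 zero bytes at a header position


def classify_eoa(data: bytes) -> str:
    """Classify input by where the EOA marker sits and what follows it.

    Returns "case1" (input starts with EOA, more data follows),
    "case2" (non-empty tarball with data after EOA), or
    "clean" (only zero padding after EOA, or no EOA found).
    """
    pos = 0
    while pos + len(EOA) <= len(data):
        if data[pos:pos + len(EOA)] == EOA:
            trailing = data[pos + len(EOA):]
            if not trailing.strip(b"\0"):
                return "clean"      # only zero padding remains
            return "case1" if pos == 0 else "case2"
        # Not EOA: treat this block as a member header and skip its data.
        # The size is an octal string at offset 124, 12 bytes wide.
        size_field = data[pos + 124:pos + 136].strip(b"\0 ")
        size = int(size_field, 8) if size_field else 0
        pos += BLOCK + -(-size // BLOCK) * BLOCK   # header + data rounded up
    return "clean"
```

The sketch walks from header to header rather than scanning every block, because a member whose contents are all zeros would otherwise look like EOA. Trailing zero bytes are treated as padding, since tar writers commonly pad the archive to a larger record size.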
The easiest solution seems to be just including all data after the tar EOA marker. This covers case 1 and option 2c. If the user wants to turn off indexing, they can use the -t flag, and I see no reason why 2a might be preferred to 2c.