What happened?
#398 added a valuable feature to set the compression level. One important use is to be able to specify compression-level: 0 to avoid wasting computation for no benefit when the files being added to the zip are themselves already compressed.
However, upload-artifact with compression-level: 0 produces a zip archive that cannot be unzipped with a streaming implementation such as java.util.zip.ZipInputStream.
That means the entire archive must first be downloaded and then unzipped with a non-streaming, seekable-file implementation such as java.util.zip.ZipFile.
The streaming failure reported by Java's ZipInputStream is:
Exception java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
What did you expect to happen?
I expected the zip archive created by upload-artifact to be unzippable equally well with a seekable-file implementation like ZipFile and with a streaming implementation like ZipInputStream.
How can we reproduce it?
Create a zip archive foo.zip with upload-artifact and compression-level: 0. Then attempt to list its contents as in this jshell script:
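A minimal sketch of such a script (reconstructed here from the description; it assumes foo.zip is in the current directory):

```java
// Run with: jshell listzip.jsh
// Assumes foo.zip, produced by upload-artifact with compression-level: 0,
// is in the working directory.
import java.io.FileInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Seekable-file consumer: reads sizes from the central directory, so it works.
try (ZipFile zf = new ZipFile("foo.zip")) {
    zf.stream().forEach(e -> System.out.println("ZipFile: " + e.getName()));
}

// Streaming consumer: relies on the local headers, so on these archives it
// throws "only DEFLATED entries can have EXT descriptor".
try (ZipInputStream zis = new ZipInputStream(new FileInputStream("foo.zip"))) {
    ZipEntry entry;
    while ((entry = zis.getNextEntry()) != null) {
        System.out.println("ZipInputStream: " + entry.getName());
    }
}
```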
Observe that the first try block, using ZipFile, successfully lists the archive, and the second try block, using ZipInputStream, throws the only DEFLATED entries can have EXT descriptor exception.
Retry using a zip file created locally with zip -c -0 on Linux and observe that both methods successfully list the archive.
Anything else we need to know?
There is a related report at bugs.openjdk.org that helps explain what is happening.
A zip archive has a "local header" preceding each member, and a "central directory" at the end that repeats some of the same information, such as the member's name, its uncompressed length, its compressed length in the archive, and its CRC. It is also possible to have a "data descriptor" (or "EXT descriptor") record following a member.
The data descriptor is used when the utility creating the archive doesn't know those values at the time it starts writing the member. For example, when compressing, the compressed size won't be known until the member has been written. In that case, bit 3 (0x08) in the "local header" is set, telling any consumer that the size and CRC values in the local header aren't valid, and that it should read the member to the end and take those values from the data descriptor that follows it.
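As a quick check, a short jshell sketch (assuming the foo.zip from the reproduction above) can read the fixed 30-byte portion of the first local file header and inspect bit 3 directly; the offsets follow the zip format specification:

```java
// Peek at the first local file header of foo.zip. Per the zip spec, the
// fixed part of the header is 30 bytes: bytes 6-7 hold the little-endian
// general-purpose bit flag, bytes 8-9 the compression method.
import java.io.DataInputStream;
import java.io.FileInputStream;

try (DataInputStream in = new DataInputStream(new FileInputStream("foo.zip"))) {
    byte[] h = new byte[30];              // fixed part of a local file header
    in.readFully(h);
    int flags  = (h[6] & 0xff) | (h[7] & 0xff) << 8;  // general-purpose bit flag
    int method = (h[8] & 0xff) | (h[9] & 0xff) << 8;  // 0 = STORED, 8 = DEFLATE
    System.out.println("method=" + method
            + ", bit 3 set (data descriptor follows)=" + ((flags & 0x08) != 0));
}
```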
That works for any compression level other than 0, because the DEFLATE compressed-data format has enough internal structure that the consumer can tell where the member ends and find the data descriptor after it. But at compression level 0 the member is stored uncompressed (the STORED method): there is no structure, just the arbitrary bytes of the original data, which could themselves contain byte sequences that look like a data descriptor record. A streaming consumer has no way to know where a level-0 member ends unless its length was given in the "local header" at the front.
A seekable-file consumer can still succeed, because the member lengths are all repeated in the "central directory" at the end of the archive, which the consumer knows how to find.
But to make the archive usable by a streaming consumer, any producer creating a zip file with level-0 members must write the correct sizes and CRC in the local header preceding each member; it cannot defer them to a data descriptor after the member, because a streaming consumer has no dependable way to recover that. (With no compression, the producer already knows the size in the archive in advance, since it equals the original size of the file.)
The zip-writing implementation being used by upload-artifact is setting bit 3 in the local header and putting data descriptors after the members, even for compression-level 0.
As a workaround, where compression-level: 0 might be ideal (such as for archiving already-compressed files), compression-level: 1 can be used instead. It wastes as little compression effort as possible while still creating a zip file with the structure needed for recovery both by seekable-file and by streaming consumers.
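For illustration, the workaround in a workflow step might look like the following sketch (the name and path values are placeholders):

```yaml
# Use compression-level: 1 instead of 0 so the archive remains streamable.
- uses: actions/upload-artifact@v4
  with:
    name: my-artifact        # placeholder
    path: dist/              # placeholder
    compression-level: 1
```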
If it is not easy to fix the behavior, then the documentation should warn about it and suggest the workaround.
What version of the action are you using?
v4
What are your runner environments?
linux
Are you on GitHub Enterprise Server? If so, what version?
No response