[Question] Are encrypted zip files using Deflate64 as the compression method supported? #29

Closed
raychanks opened this issue Sep 10, 2022 · 10 comments

Comments

@raychanks

Hi there.
From this PR, it seems that encrypted zip files are supported by now, but I found that I cannot unzip a password protected zip file using Deflate64 as the compression method.
I am not sure whether it is not implemented yet or it is an unexpected bug. Thanks.

@michalc
Member

michalc commented Sep 10, 2022

So far it's unexpected - I would have thought they should work

So a few questions:

  • What happens when you try to unzip such a file?
  • What sort of encryption is used in the file?
  • Can you post the zip file somewhere? (Understood if this is not possible)
  • Can you post details on how the file was created?

@michalc
Member

michalc commented Sep 10, 2022

And one more

  • Can you post the Python code you're using to unzip?

@michalc
Member

michalc commented Sep 10, 2022

Also - I've now tested 4 zip files, each a Deflate64 zip containing one file encrypted with one of AES256, AES192, AES128 or ZipCrypto. Each was created with:

echo "some contents" | 7z a -mm=Deflate64 -mem=AES256 -ppass test.zip -sicontents.txt

(or similarly for each of the other mechanisms)

and unzipping with:

from stream_unzip import stream_unzip

with open('test.zip', 'rb') as f:
    for name, size, chunks in stream_unzip(iter(lambda: f.read(65536), b''), password=b'pass'):
        print(b''.join(chunks))

and in each case it seems to work. So far, I think that yes, it should be supported.

@raychanks
Author

raychanks commented Sep 11, 2022

Thanks for the prompt reply. After fiddling a little bit more, it seems that not only Deflate64 has this issue, but also Deflate.

What happens when you try to unzip such a file?

  • I got IncorrectZipCryptoPasswordError()

What sort of encryption is used in the file?

  • I was using the default ZipCrypto encryption. No problem if I use AES.

I found that the problem only occurs when I zip a file directly from the file system; there is no issue if I zip the file from stdin. I replicated the issue with the following script:

echo "Lorem Ipsum is simply dummy text of the printing and typesetting industry." > testing.txt

# The one that has problem
7z a -ppass testing-fs.zip testing.txt

# No problem for this one
7z a -ppass -sitesting.txt testing-stdin.zip < testing.txt

Not sure whether this matters, but I am using an M1 chip.
Python version: 3.10.4

I tried to use both 7-Zip [64] 17.04 (through brew install p7zip) and 7-Zip (z) 22.01 (arm64) (through brew install sevenzip) with the above script to generate the zip files, and got the same result.

Can you post the Python code you're using to unzip?

The original code I used is like this:

from stream_unzip import stream_unzip

with open("testing-fs.zip", "rb") as f:
    for name, size, chunks in stream_unzip(f, password=b"pass"):
        print(b"".join(chunks))

But I got the same result if I use the one like yours:

with open("testing-fs.zip", "rb") as f:
    for name, size, chunks in stream_unzip(
        iter(lambda: f.read(65536), b""), password=b"pass"
    ):
        print(b"".join(chunks))

These are the two zip files I generated.

@michalc
Member

michalc commented Sep 11, 2022

Thanks! I can now re-create the issue, and can also confirm that Python's zipfile module appears to decrypt+unzip such files fine (although just Deflate, not Deflate64)

Investigating further
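
For anyone wanting to repeat the zipfile cross-check mentioned above, a minimal sketch follows. It assumes the archive, member name and password from the reproduction script earlier in the thread; the helper name read_member_with_zipfile is hypothetical. The standard-library zipfile module can decrypt ZipCrypto and decompress Deflate, but it does not handle AES encryption or Deflate64, so this only covers the Deflate + ZipCrypto case.

```python
import zipfile


def read_member_with_zipfile(zip_file, member_name, password):
    """Return the decrypted, decompressed bytes of one member.

    zip_file may be a path or a seekable binary file object.
    The password is only used if the member is encrypted.
    """
    with zipfile.ZipFile(zip_file) as zf:
        return zf.read(member_name, pwd=password)
```

With the files from the script above, this would be called as read_member_with_zipfile("testing-fs.zip", "testing.txt", b"pass"). Unlike stream_unzip, zipfile needs a seekable file, because it reads the central directory at the end of the archive first.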

@michalc
Member

michalc commented Sep 11, 2022

Just one thing, by-the-by

with open("testing-fs.zip", "rb") as f:
    for name, size, chunks in stream_unzip(f, password=b"pass"):

I suspect this doesn't stream the file. stream_unzip attempts to iterate over its first argument, in this case f, but I think such a file object, created with open in binary mode, ends up being read completely into memory on first iteration. I've not done extensive testing into this, and this was just one version of Python etc.

So all fine with small files, but less good with larger ones.
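
To make the difference between the two calling styles concrete, here is a small sketch using an in-memory file rather than a real zip. Iterating a binary file object in Python yields "lines" split at b'\n' bytes, so the chunk sizes depend on where those bytes happen to fall in the data and are not bounded. The iter(lambda: f.read(n), b'') form caps every chunk at n bytes. The helper name fixed_size_chunks and the sample data are made up for illustration.

```python
import io


def fixed_size_chunks(f, chunk_size=65536):
    """Yield successive reads of at most chunk_size bytes until EOF."""
    return iter(lambda: f.read(chunk_size), b'')


# Sample bytes: two newline-terminated runs, then a run with no newline
data = b'abc\ndefgh\n' + b'x' * 10

# Direct iteration splits at newline bytes, whatever the distance between them
line_chunks = list(io.BytesIO(data))

# Explicit reads give chunks of a bounded, predictable size
sized_chunks = list(fixed_size_chunks(io.BytesIO(data), chunk_size=8))
```

Both approaches yield the same bytes overall; only the chunk boundaries differ. For a compressed or encrypted file the positions of newline bytes are effectively arbitrary, which is why the explicit read size is the safer choice when memory use matters.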

@michalc
Member

michalc commented Sep 11, 2022

So (and this is just me thinking out loud) I wonder if it's something to do with the "data descriptor" - a difference between files created from stdin and those that are not is often the presence of a data descriptor: metadata that goes after the file data when generating from stdin, but is often not present otherwise.
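
One way to check whether two archives differ in this respect is to look at the general purpose bit flags in the first local file header: per the ZIP format, bit 0 marks an encrypted member and bit 3 marks that a data descriptor follows the member's data. The following is a rough sketch, not code from stream_unzip; the function name first_member_flags is hypothetical, and it assumes the archive starts with a local file header, which is typical but not guaranteed (for example, self-extracting archives do not).

```python
import struct

# Fixed-size part of a ZIP local file header (30 bytes, little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct('<4sHHHHHIIIHH')


def first_member_flags(zip_bytes):
    """Report the encryption and data descriptor flags of the first member."""
    signature, _, flags, *_rest = LOCAL_HEADER.unpack_from(zip_bytes, 0)
    if signature != b'PK\x03\x04':
        raise ValueError('Data does not start with a local file header')
    return {
        'encrypted': bool(flags & 0x0001),        # bit 0
        'data_descriptor': bool(flags & 0x0008),  # bit 3
    }
```

Only the first 30 bytes are needed, so something like first_member_flags(open("testing-fs.zip", "rb").read(30)) would be enough to compare the two test files from this thread.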

@michalc
Member

michalc commented Sep 11, 2022

So I think yes, it was to do with the data descriptor. WIP PR at #30 to address it (just need to add a test really)

@michalc
Member

michalc commented Sep 12, 2022

#30 now merged, and part of v0.0.70

Will release onto pypi in the next few days

@michalc
Member

michalc commented Sep 15, 2022

Now released

michalc closed this as completed Sep 15, 2022