New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix n² directory checks during package installation #1472
Conversation
Dramatically improves install times for packages with large amounts of files. Takes it from 1+ hours to <1 minute.
065c2a2
to
486fdfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love sets. (#942)
for path in package_zip.namelist(): | ||
extracted_paths = set() | ||
for info in package_zip.infolist(): | ||
path = info.filename | ||
dest = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be shortened to dest = path = info.filename
, but really does't matter.
extracted_paths = [] | ||
for path in package_zip.namelist(): | ||
extracted_paths = set() | ||
for info in package_zip.infolist(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a performance difference in the call to infolist()
versus namelist()
, or is it just in the usage of set()
?
If it is important to call infolist()
, it would be good to add a comment explaining why we use it so it isn't refactored away in the future to be more "pythonic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It simply avoids creating a whole new list from infolist()
. It's implemented as [info.filename for info in infolist()]
. So it's not very important, but I felt that if the zip info
is ever going to be required in this loop it'll be done right if it's already available - rather than someone querying the info by filename for every file.
Awesome, thanks for the fix! |
Dramatically improves install times for packages with large amounts of
files. Takes it from 1+ hours to <1 minute.