Fix n² directory checks during package installation #1472

Merged (1 commit) on Jul 29, 2020

BenjaminSchaaf (Contributor)

Dramatically improves install times for packages with large numbers of
files, cutting installation from 1+ hours to under 1 minute.
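This page does not show the exact check that made installs quadratic. The following is a minimal sketch that assumes, hypothetically, that the install loop records every parent directory of each extracted file and tests `dir_ not in extracted_paths` before adding it. With a list, each `in` test scans the recorded entries linearly, so n such tests against a growing list cost O(n²) comparisons in the worst case. With a set, each test is an average O(1) hash lookup. The helper names `record_parent_dirs_list` and `record_parent_dirs_set` are illustrative only and are not Package Control's API.

```python
import posixpath
from typing import Iterable, List, Set


def record_parent_dirs_list(files: Iterable[str], root: str) -> List[str]:
    """Hypothetical quadratic variant: `not in` on a list scans its entries."""
    extracted_paths: List[str] = []
    for path in files:
        dir_ = posixpath.dirname(path)
        # Walk upwards until reaching the package root or a directory
        # that is already recorded; each `not in` here is a linear scan.
        while dir_ != root and dir_ not in extracted_paths:
            extracted_paths.append(dir_)
            dir_ = posixpath.dirname(dir_)
    return extracted_paths


def record_parent_dirs_set(files: Iterable[str], root: str) -> Set[str]:
    """Hypothetical linear variant: `not in` on a set is a hash lookup."""
    extracted_paths: Set[str] = set()
    for path in files:
        dir_ = posixpath.dirname(path)
        # Same walk as above, but each membership test is O(1) on average.
        while dir_ != root and dir_ not in extracted_paths:
            extracted_paths.add(dir_)
            dir_ = posixpath.dirname(dir_)
    return extracted_paths
```

Both functions record the same directories. For example, given `["pkg/a/b/one.py", "pkg/a/three.py"]` and root `"pkg"`, both collect `pkg/a/b` and `pkg/a`. Only the cost of the membership test differs.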

FichteFoll (Collaborator) left a comment

I love sets. (#942)

```diff
-for path in package_zip.namelist():
+extracted_paths = set()
+for info in package_zip.infolist():
+    path = info.filename
     dest = path
```

Collaborator

Could be shortened to dest = path = info.filename, but it really doesn't matter.
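Here is a minimal sketch of how the revised loop fits together, assuming a heavily simplified extraction step. It iterates `infolist()`, uses the chained assignment suggested in the comment above, and collects written paths in a set. The function name `extract_package` is hypothetical. The sketch leaves out whatever extra path handling the real installer performs. The `commonpath` guard is an added precaution and is not part of this PR.

```python
import os
import zipfile
from typing import Set


def extract_package(package_zip: zipfile.ZipFile, package_dir: str) -> Set[str]:
    """Hypothetical sketch: extract every archive entry into package_dir and
    return the set of absolute paths that were written."""
    root = os.path.abspath(package_dir)
    extracted_paths: Set[str] = set()

    # Iterate the ZipInfo objects directly; namelist() would only build a
    # second list holding their .filename attributes.
    for info in package_zip.infolist():
        # `path` keeps the archive name; `dest` becomes the on-disk target.
        dest = path = info.filename
        dest = os.path.abspath(os.path.join(root, dest))

        # Added precaution (not from the PR): skip entries escaping package_dir.
        if os.path.commonpath([root, dest]) != root:
            continue

        if path.endswith("/"):
            os.makedirs(dest, exist_ok=True)
        else:
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with package_zip.open(info) as src, open(dest, "wb") as out:
                out.write(src.read())

        # A set keeps any later `dest in extracted_paths` check O(1) on average.
        extracted_paths.add(dest)
    return extracted_paths
```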

```diff
-extracted_paths = []
-for path in package_zip.namelist():
+extracted_paths = set()
+for info in package_zip.infolist():
```

Owner

Is there a performance difference in the call to infolist() versus namelist(), or is it just in the usage of set()?

If it is important to call infolist(), it would be good to add a comment explaining why we use it so it isn't refactored away in the future to be more "pythonic".

Contributor Author

It simply avoids creating a whole new list: namelist() is implemented as [info.filename for info in infolist()]. So it's not very important, but I felt that if the zip info is ever needed in this loop, it'll be done right if it's already available, rather than someone querying the info by filename for every file.
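The author's point can be checked against a small in-memory archive. This sketch uses made-up contents (`plugin.py` and `messages/install.txt` are hypothetical). It confirms that `namelist()` yields the same names as `infolist()` and shows that a loop over `infolist()` keeps each entry's metadata at hand. A `namelist()` loop, by contrast, needs a `getinfo()` call per name to reach the same data. In CPython, `getinfo()` is a dictionary lookup, so, as the author says, the gain is small: it mostly avoids building an extra list.

```python
import io
import zipfile


def build_demo_zip() -> zipfile.ZipFile:
    """Create a small in-memory archive with hypothetical contents."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("plugin.py", "print('hi')\n")
        zf.writestr("messages/install.txt", "Thanks!\n")
    buf.seek(0)
    return zipfile.ZipFile(buf)


package_zip = build_demo_zip()

# namelist() is derived from the same ZipInfo objects that infolist() returns.
assert package_zip.namelist() == [info.filename for info in package_zip.infolist()]

# Looping over infolist() keeps each entry's metadata at hand...
sizes_direct = {info.filename: info.file_size for info in package_zip.infolist()}

# ...whereas looping over namelist() needs a getinfo() lookup per name.
sizes_lookup = {name: package_zip.getinfo(name).file_size for name in package_zip.namelist()}

assert sizes_direct == sizes_lookup
```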

wbond (Owner) commented Jul 29, 2020

Awesome, thanks for the fix!

wbond merged commit 33672c0 into wbond:master on Jul 29, 2020
4 participants