ZipReadBinaryProvider: Decompress largest zip archives first #1715


Merged

Conversation

@autoantwort (Contributor) commented on Jun 26, 2025

This results in lower total time needed to decompress all zip files.

For example, total decompression time in the daily job dropped from 4.4 min to 3.8 min.

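In outline, the change amounts to sorting the work list by archive size before dispatching it. A minimal self-contained sketch, not vcpkg's actual code (`decompress_all` and `extract_one` are hypothetical names, and `std::for_each` with `std::execution::par` stands in for vcpkg's own worker pool):

```cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Placeholder for the real unzip call.
void extract_one(const fs::path& zip)
{
    std::printf("extracting %s\n", zip.string().c_str());
}

// Order the archives largest-first so the longest extractions start
// immediately and the short ones fill the remaining core time.
void decompress_all(std::vector<fs::path> archives)
{
    std::sort(archives.begin(), archives.end(),
              [](const fs::path& a, const fs::path& b) {
                  return fs::file_size(a) > fs::file_size(b);
              });

    // Stand-in for the provider's parallel worker pool.
    std::for_each(std::execution::par, archives.begin(), archives.end(),
                  extract_one);
}
```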
@BillyONeal (Member)

... what? Do we know why that order is faster?

@autoantwort (Contributor, Author)

Suppose you have 2 cores and the tasks aaa, bb, and ccccc. With an arbitrary ordering:

1: aaa
2: bbccccc

you need 7 time slots. If you start with the largest:

1: ccccc
2: aaabb

you only need 5 time slots.
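To see the scheduling effect concretely, here is a small self-contained simulation (hypothetical code, not from vcpkg) of greedy assignment to the least-loaded worker; it reproduces the 7-slot and 5-slot makespans above:

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

int makespan(const std::vector<int>& tasks, int workers)
{
    std::vector<int> load(workers, 0);
    for (int t : tasks)
    {
        // Greedy list scheduling: each task goes to the currently
        // least-loaded worker.
        *std::min_element(load.begin(), load.end()) += t;
    }
    // The makespan is the busiest worker's total load.
    return *std::max_element(load.begin(), load.end());
}

int main()
{
    std::vector<int> tasks = {3, 2, 5}; // aaa, bb, ccccc

    std::printf("as given:      %d time slots\n", makespan(tasks, 2)); // 7

    std::sort(tasks.begin(), tasks.end(), std::greater<int>());
    std::printf("largest first: %d time slots\n", makespan(tasks, 2)); // 5
}
```

Sorting descending before greedy assignment is the classic longest-processing-time-first (LPT) heuristic, whose makespan is within a factor of 4/3 - 1/(3m) of optimal for m workers (Graham's bound).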

@BillyONeal (Member)

I see: because we are parallelizing, it's the bin packing problem, and we're using file size as a proxy for how long each task takes.

@BillyONeal (Member) left a review comment:

Thanks for the improvement!

@BillyONeal BillyONeal merged commit b6a74be into microsoft:main Jun 30, 2025
7 checks passed