
Optimize core loop of _Count_vbool #5640


Open
wants to merge 2 commits into
base: main
from

Conversation

localspook

To handle the difference between counting ones and counting zeros, _Count_vbool's core loop conditionally complements every block it processes: that's O(n) work. We can do better: we can unconditionally count ones, then, after the loop, if we actually wanted zeros, subtract the number of ones from the number of bits we processed: O(1) work.
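A minimal sketch of the idea, not the actual `_Count_vbool` code: the names `popcount32`, `count_bits_before`, and `count_bits_after` are hypothetical, the blocks are assumed to be fully populated 32-bit words, and `popcount32` is a portable stand-in for whatever popcount implementation the library selects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for a hardware popcount (hypothetical helper).
inline std::size_t popcount32(std::uint32_t x) {
    std::size_t n = 0;
    while (x != 0) {
        x &= x - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// Before: every block is conditionally complemented inside the loop.
std::size_t count_bits_before(const std::vector<std::uint32_t>& blocks, bool value) {
    std::size_t count = 0;
    for (std::uint32_t block : blocks) {
        count += popcount32(value ? block : ~block);
    }
    return count;
}

// After: always count ones, then fix up once after the loop.
std::size_t count_bits_after(const std::vector<std::uint32_t>& blocks, bool value) {
    std::size_t ones = 0;
    for (std::uint32_t block : blocks) {
        ones += popcount32(block);
    }
    const std::size_t total_bits = blocks.size() * 32;
    assert(ones <= total_bits); // invariant: cannot have more ones than bits
    return value ? ones : total_bits - ones;
}
```

Both functions should return the same result for any input; the second simply moves the ones-versus-zeros decision out of the per-block work.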

@localspook localspook requested a review from a team as a code owner July 9, 2025 17:18
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jul 9, 2025
@localspook
Author

All CI jobs are failing with "The agent did not connect within the alloted time of 45 minute(s)." 🤔

@StephanTLavavej
Member

If that happens, ping me on #stalled-checks on the STL Discord and I can rerun the checks. Unfortunately, while our CI is highly reliable when it runs, we're at the mercy of Azure deciding to give us any VMs, and it's been rough lately.

@StephanTLavavej StephanTLavavej self-assigned this Jul 9, 2025
@StephanTLavavej StephanTLavavej added the performance Must go faster label Jul 9, 2025
@AlexGuteniev
Contributor

Can you post benchmark results before / after?
(I guess such a benchmark already exists.)
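As a rough illustration of what such a measurement could look like, here is a hand-rolled timing sketch around `std::count` on a `std::vector<bool>`. It is not the repository's benchmark: `CountTiming` and `time_count` are hypothetical names, and a proper benchmark harness would guard against the optimizer and timer noise far more carefully.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type: the count found and the average time per call.
struct CountTiming {
    std::ptrdiff_t count;
    double ns_per_call;
};

// Times `repetitions` calls of std::count over `bits` looking for `value`.
CountTiming time_count(const std::vector<bool>& bits, bool value, int repetitions) {
    assert(repetitions > 0);
    volatile std::ptrdiff_t sink = 0; // discourages the compiler from dropping the calls
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        sink = std::count(bits.begin(), bits.end(), value);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    const std::ptrdiff_t result = sink;
    return CountTiming{result, elapsed.count() / repetitions};
}
```

Running this for both `true` and `false` on a large vector, before and after the change, would show whether counting zeros is measurably slower than counting ones.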

@AlexGuteniev
Contributor

I mean intuitively it looks like an improvement, but without benchmarking I wouldn't be sure that the compiler doesn't do something clever on its own.

The optimization in this PR is not something the compiler is likely to do, but it might be able to achieve the same effect by duplicating the loop for *_VbFirst and ~*_VbFirst and hoisting the condition out of the loop.

I don't think that is very likely to happen, and that's why we implemented the _Select_popcount_impl thing, but it would be good to prove it with measurements.
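For reference, the loop duplication described above (often called loop unswitching) would look roughly like this if written by hand. This is only a sketch: `popcount_u32` and `count_bits_unswitched` are hypothetical names, and whether a given compiler performs this transformation on the real code is exactly what a measurement would have to show.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for a hardware popcount (hypothetical helper).
inline std::size_t popcount_u32(std::uint32_t x) {
    std::size_t n = 0;
    while (x != 0) {
        x &= x - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// The loop-invariant condition is tested once, and each branch gets its own
// copy of the loop: one counts ones in the block, the other in its complement.
std::size_t count_bits_unswitched(const std::vector<std::uint32_t>& blocks, bool value) {
    std::size_t count = 0;
    if (value) {
        for (std::uint32_t block : blocks) {
            count += popcount_u32(block);
        }
    } else {
        for (std::uint32_t block : blocks) {
            count += popcount_u32(~block);
        }
    }
    return count;
}
```

Note that the zeros branch here still complements every block, so even a perfectly unswitched loop does slightly more per-block work than counting ones and subtracting afterwards.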

Labels
performance Must go faster
Projects
Status: Initial Review

3 participants