Make Int{Map,Set} folds friendlier to optimizations #1149

Merged: meooow25 merged 1 commit into haskell:master from intmap-folds on Jun 21, 2025


meooow25 (Contributor) commented Jun 19, 2025

Move the Nil branch to the top-level and error on Nil in go.
The Note [IntMap folds] added in Data.IntMap.Internal explains the details.
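
As a rough illustration of the shape of this change, here is a minimal sketch on a toy tree. The `Tree` type and the names `foldrBefore`, `foldrAfter`, and `sample` are hypothetical stand-ins, not the definitions in `Data.IntMap.Internal`; the only property assumed from IntMap is the invariant that `Nil` never appears as a child of `Bin`. The actual code and the reasoning behind it are in the Note [IntMap folds].

```haskell
module Main (main) where

-- Toy stand-in for the IntMap tree (hypothetical, not the containers type).
-- Assumed invariant, as in IntMap: Nil never appears as a child of Bin.
data Tree a
  = Bin (Tree a) (Tree a)
  | Tip Int a
  | Nil

-- Before: the recursive worker handles Nil itself.
foldrBefore :: (a -> b -> b) -> b -> Tree a -> b
foldrBefore f z = \t -> go z t
  where
    go z' Nil       = z'
    go z' (Tip _ x) = f x z'
    go z' (Bin l r) = go (go z' r) l

-- After: Nil is matched once at the top level. Inside go it is unreachable
-- under the invariant, so that branch is an error.
foldrAfter :: (a -> b -> b) -> b -> Tree a -> b
foldrAfter f z = \t ->
  case t of
    Nil -> z
    _   -> go z t
  where
    go _  Nil       = error "foldrAfter.go: Nil"
    go z' (Tip _ x) = f x z'
    go z' (Bin l r) = go (go z' r) l

-- A small well-formed tree: no Nil below a Bin.
sample :: Tree Int
sample = Bin (Bin (Tip 1 10) (Tip 2 20)) (Tip 3 30)

main :: IO ()
main = do
  print (foldrBefore (:) [] sample)
  print (foldrAfter (:) [] sample)
  print (foldrAfter (+) 0 (Nil :: Tree Int))
```

Both versions return the same result on any tree that respects the invariant; the difference is only in what the optimizer sees inside `go`.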


Benchmarks on GHC 9.10.1:

IntMap:

Name                                  Time - - - - - - - -    Allocated - - - - -
                                           A       B     %         A       B     %
folds with key.foldMap_elem            14 μs   14 μs   +0%     34 B    33 B    -2%
folds with key.foldMap_traverseSum     17 μs   12 μs  -30%     64 KB   42 B   -99%
folds with key.foldl'_maximum          30 μs   18 μs  -41%    128 KB   64 KB  -49%
folds with key.foldl'_sum              12 μs   12 μs   +0%     45 B    44 B    -2%
folds with key.foldl_cpsOneShotSum     36 μs   34 μs   -5%    320 KB  320 KB   +0%
folds with key.foldl_cpsSum            75 μs   34 μs  -54%    608 KB  320 KB  -47%
folds with key.foldl_elem              18 μs   19 μs   +2%    128 KB  128 KB   +0%
folds with key.foldl_traverseSum       66 μs   24 μs  -63%    480 KB  160 KB  -66%
folds with key.foldr'_maximum          26 μs   14 μs  -46%     65 B    60 B    -7%
folds with key.foldr'_sum              12 μs   12 μs   +0%     45 B    44 B    -2%
folds with key.foldr_cpsOneShotSum     36 μs   35 μs   -2%    320 KB  320 KB   +0%
folds with key.foldr_cpsSum            75 μs   34 μs  -54%    608 KB  320 KB  -47%
folds with key.foldr_elem              18 μs   18 μs   +0%    128 KB  128 KB   +0%
folds with key.foldr_traverseSum       66 μs   24 μs  -63%    480 KB  160 KB  -66%
folds.foldMap_elem                     14 μs   14 μs   +1%     36 B    28 B   -22%
folds.foldMap_traverseSum              17 μs   10 μs  -42%     64 KB   44 B   -99%
folds.foldl'_maximum                   28 μs   14 μs  -51%     64 KB   44 B   -99%
folds.foldl'_sum                       10 μs   10 μs   -2%     44 B    44 B    +0%
folds.foldl_cpsOneShotSum              35 μs   34 μs   -4%    288 KB  288 KB   +0%
folds.foldl_cpsSum                     61 μs   34 μs  -44%    416 KB  288 KB  -30%
folds.foldl_elem                       18 μs   18 μs   +2%    128 KB  128 KB   +0%
folds.foldl_traverseSum                51 μs   24 μs  -52%    288 KB  160 KB  -44%
folds.foldr'_maximum                   26 μs   12 μs  -51%     49 B    44 B   -10%
folds.foldr'_sum                       10 μs   10 μs   +0%     41 B    44 B    +7%
folds.foldr_cpsOneShotSum              35 μs   35 μs   +0%    288 KB  288 KB   +0%
folds.foldr_cpsSum                     61 μs   35 μs  -43%    416 KB  288 KB  -30%
folds.foldr_elem                       17 μs   18 μs   +4%    128 KB  128 KB   +0%
folds.foldr_traverseSum                51 μs   24 μs  -52%    288 KB  160 KB  -44%

IntSet:

Name                                Time - - - - - - - -    Allocated - - - - -
                                         A       B     %         A       B     %
folds:dense.foldMap_elem            5.3 μs  5.4 μs   +1%     26 B    26 B    +0%
folds:dense.foldMap_traverseSum     4.5 μs  4.4 μs   -2%    1.0 KB   42 B   -96%
folds:dense.foldl'_maximum          5.1 μs  4.9 μs   -3%    2.1 KB  2.1 KB   +0%
folds:dense.foldl'_sum              4.3 μs  4.4 μs   +1%     42 B    42 B    +0%
folds:dense.foldl_cpsOneShotSum     5.7 μs  6.1 μs   +7%    2.5 KB  2.5 KB   +0%
folds:dense.foldl_cpsSum            6.2 μs  6.1 μs   +0%    5.1 KB  2.5 KB  -49%
folds:dense.foldl_elem              5.6 μs  5.6 μs   +0%    2.0 KB  2.0 KB   +0%
folds:dense.foldl_traverseSum       5.9 μs  5.8 μs   -2%    5.1 KB  2.6 KB  -49%
folds:dense.foldr'_maximum          6.1 μs  5.9 μs   -4%    2.1 KB  2.1 KB   +0%
folds:dense.foldr'_sum              5.5 μs  5.5 μs   +1%     42 B    42 B    +0%
folds:dense.foldr_cpsOneShotSum     5.6 μs  5.6 μs   +0%    2.5 KB  2.5 KB   +0%
folds:dense.foldr_cpsSum            5.8 μs  5.6 μs   -4%    5.1 KB  2.5 KB  -49%
folds:dense.foldr_elem              4.8 μs  4.8 μs   +0%    2.0 KB  2.0 KB   +0%
folds:dense.foldr_traverseSum       4.8 μs  4.6 μs   -4%    5.1 KB  2.6 KB  -49%
folds:sparse.foldMap_elem            18 μs   19 μs   +7%     33 B    33 B    +0%
folds:sparse.foldMap_traverseSum     19 μs   14 μs  -28%     64 KB   49 B   -99%
folds:sparse.foldl'_maximum          34 μs   23 μs  -31%    128 KB  128 KB   +0%
folds:sparse.foldl'_sum              14 μs   14 μs   +1%     44 B    44 B    +0%
folds:sparse.foldl_cpsOneShotSum     38 μs   35 μs   -8%    160 KB  160 KB   +0%
folds:sparse.foldl_cpsSum            60 μs   34 μs  -42%    320 KB  160 KB  -49%
folds:sparse.foldl_elem              29 μs   30 μs   +1%    128 KB  128 KB   +0%
folds:sparse.foldl_traverseSum       54 μs   34 μs  -36%    320 KB  160 KB  -49%
folds:sparse.foldr'_maximum          40 μs   30 μs  -24%     75 B    64 KB  +87346%
folds:sparse.foldr'_sum              24 μs   24 μs   +0%     49 B    49 B    +0%
folds:sparse.foldr_cpsOneShotSum     29 μs   25 μs  -15%    160 KB  160 KB   +0%
folds:sparse.foldr_cpsSum            54 μs   25 μs  -53%    320 KB  160 KB  -50%
folds:sparse.foldr_elem              20 μs   20 μs   +2%    128 KB  128 KB   +0%
folds:sparse.foldr_traverseSum       45 μs   24 μs  -46%    320 KB  160 KB  -49%

The maximum benchmarks are new. I was not expecting it, but the change also benefits the cpsSum and traverseSum benchmarks.

Move the Nil branch to the top-level and error on Nil in go.
The Note [IntMap folds] added in Data.IntMap.Internal explains the
details.

For the "maximum" benchmarks added here, the time improves by ~40% for
IntMap and ~30% for sparse IntSets. "traverseSum" and "cpsSum" benchmarks
also improve by 30-60% for IntMaps and sparse IntSets. Dense IntSets are
barely affected.

meooow25 merged commit 17d354f into haskell:master on Jun 21, 2025 (14 checks passed)
meooow25 deleted the intmap-folds branch on June 21, 2025 09:26

meooow25 (Contributor, Author) commented Jun 21, 2025

Apparently SpecConstr does not trigger for IntSet in the maximum benchmarks like it does for IntMap. I should have checked to make sure. But I don't understand why, so I've opened GHC #26141 about this.
