Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement internal iteration for cartesian_product #92

Merged

Conversation

brycelelbach
Copy link
Collaborator

@brycelelbach brycelelbach commented Jul 11, 2023

This PR implements internal iteration for cartesian_product[_with], which enables compiler vectorization of multidimensional iteration patterns constructed with cartesian_product[_with] (such as this one).

Core tasks:

  • Add internal iteration to cartesian_product.
    • Correct cartesian_product internal iteration to invoke the predicate with a tuple of elements, not a tuple of indices (review feedback).
  • Add internal iteration to cartesian_product_with.
  • Fix cartesian_product[_with] random access indexing.
    • Don't treat cursors as arbitrary numeric types (review feedback).
  • Improve testing for cartesian_product[_with] in general.
    • Test cartesian_product with 3 underlying sequences.
    • Test cartesian_product with less than 2 and more than 3 underlying sequences.
    • Test cartesian_product random access indexing.
    • Test cartesian_product for_each_while short circuiting (review feedback).
    • Test cartesian_product[_with] with for_each[_while].
    • Correct element type tests for cartesian_product of ints to fix MSVC build (review feedback).
  • Add benchmarks and examples for multi-dimensional iteration with cartesian_product and iota.
  • Add descriptive alias for cartesian_product of iota multi-dimensional iteration pattern.
  • Miscellaneous drive-by fixes and improvements.
    • Change STATIC_CHECK to throw instead of returning to improve compile-time diagnostics.
    • Disable word_count example CTest as it will always hang waiting for stdin.

@codecov
Copy link

codecov bot commented Jul 11, 2023

Codecov Report

Patch coverage: 88.63% and project coverage change: -0.09% ⚠️

Comparison is base (1864f09) 97.67% compared to head (0d3116f) 97.58%.
Report is 3 commits behind head on main.

❗ Current head 0d3116f differs from pull request most recent head abba492. Consider uploading reports for the commit abba492 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #92      +/-   ##
==========================================
- Coverage   97.67%   97.58%   -0.09%     
==========================================
  Files          66       66              
  Lines        2236     2278      +42     
==========================================
+ Hits         2184     2223      +39     
- Misses         52       55       +3     
Files Changed Coverage Δ
include/flux/core/numeric.hpp 83.33% <66.66%> (-6.67%) ⬇️
include/flux/op/cartesian_product.hpp 98.87% <96.87%> (-1.13%) ⬇️

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tcbrindle
Copy link
Owner

Thanks very much for the drive-by fixes, they look great!

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Other than that I think this looks like a reasonable approach -- I wish the keep_going variable wasn't necessary at every level, but I can't think of a way to get rid of it, and if it doesn't affect vectorisation then I guess we're okay.

@brycelelbach
Copy link
Collaborator Author

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

@brycelelbach
Copy link
Collaborator Author

Other than that I think this looks like a reasonable approach -- I wish the keep_going variable wasn't necessary at every level, but I can't think of a way to get rid of it, and if it doesn't affect vectorisation then I guess we're okay.

I was a little bit concerned that the control flow of this would cause issues, but it seems to be fine.

The one idea I had for working around this was to not use flux::for_each_while to iterate each of the underlying sequences, however that seems like a gross violation of the basic principles here. If we did that, it would break more deeply nested internal iteration (for example, cartesian_product(filter(...), ...)).

@brycelelbach
Copy link
Collaborator Author

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

Ironically, the unpack test that you added does a for_each on cartesian_product, but it's on ints so it doesn't catch this, and I didn't run it before committing anyways.

@brycelelbach
Copy link
Collaborator Author

brycelelbach commented Jul 12, 2023

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

I'm having trouble writing a test that covers this issue. Any ideas?

EDIT: All the cartesian_product tests currently use things that are convertible to the index types, which is probably why I'm not seeing the failures. I can probably engineer something with a mock type.

example/word_count.cpp Outdated Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
@brycelelbach
Copy link
Collaborator Author

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Tests added in a240dcc, fixed in 02e73b3.

@tcbrindle
Copy link
Owner

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.
I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

I'm having trouble writing a test that covers this issue. Any ideas?

I think what you want is some sequences where the element type and cursor type are distinct and can't be converted to one another? What about something like:

auto seq1 = flux::ints(1).take(3);
auto seq2 = std::array<std::string_view, 2>{"aaa", "bbb"};

auto cart = flux::cartesian_product(seq1, seq2);

auto cur = flux::find_if(cart, flux::unpack([](auto i, auto s) { return i == 2 && s == "bbb"; }));

STATIC_CHECK(std::get<0>(cart[cur]) == 2);
STATIC_CHECK(std::get<1>(cart[cur]) == "bbb");

@brycelelbach brycelelbach force-pushed the cartesian_product_internal_iteration branch from 02e73b3 to d1e7921 Compare July 12, 2023 19:31
include/flux/core/numeric.hpp Outdated Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
test/test_cartesian_product.cpp Outdated Show resolved Hide resolved
benchmark/multidimensional_memset_benchmark_kernels.cpp Outdated Show resolved Hide resolved
@brycelelbach brycelelbach force-pushed the cartesian_product_internal_iteration branch 2 times, most recently from f9fb22c to 0d3116f Compare July 13, 2023 15:56
Copy link
Owner

@tcbrindle tcbrindle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbers on my laptop:

With Clang 16:

relative ns/op op/s err% total benchmark
100.0% 637,413.32 1,568.84 5.8% 0.30 〰️ memset_2d_reference (Unstable with ~43.5 iters. Increase minEpochIterations to e.g. 435)
16.1% 3,953,071.73 252.97 0.8% 1.89 memset_2d_std_cartesian_product_iota
117.2% 543,979.24 1,838.31 1.0% 0.26 memset_2d_flux_cartesian_product_iota
100.0% 434,128.10 2,303.47 2.2% 0.21 memset_diagonal_2d_reference
16.3% 2,659,737.14 375.98 0.3% 1.28 memset_diagonal_2d_std_cartesian_product_iota_filter
100.3% 432,654.76 2,311.31 1.9% 0.21 memset_diagonal_2d_flux_cartesian_product_iota_filter

With GCC 13.1:

relative ns/op op/s err% total benchmark
100.0% 910,021.74 1,098.87 5.9% 0.44 〰️ memset_2d_reference (Unstable with ~43.5 iters. Increase minEpochIterations to e.g. 435)
100.6% 904,318.18 1,105.81 3.2% 0.44 memset_2d_std_cartesian_product_iota
105.0% 866,658.54 1,153.86 1.7% 0.41 memset_2d_flux_cartesian_product_iota
100.0% 585,775.00 1,707.14 1.7% 0.28 memset_diagonal_2d_reference
26.1% 2,246,652.17 445.11 0.5% 1.07 memset_diagonal_2d_std_cartesian_product_iota_filter
34.0% 1,723,046.51 580.37 1.0% 0.83 memset_diagonal_2d_flux_cartesian_product_iota_filter

It looks like GCC isn't able to vectorise the filter case, but Clang is doing great.

include/flux/op/cartesian_product.hpp Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved
`distance_t` instead of `long`, as `distance_t` isn't always `long` (for
example on MSVC).
…s that a

tuple of elements not a tuple of cursors is passed to the `for_each` operation.
…nstead of

a tuple of cursors to the predicate.
brycelelbach and others added 9 commits July 31, 2023 08:48
* Change `cartesian_product::ra_inc` to not treat cursors as numeric types.
* Add new `num::checked_(div|mod)` and associated divide-by-zero policies.
… builds the

tuple in one pass, as suggested by @brevzin.
* Add 2D diagonal memset benchmark, which evaluates the performance of
  the composition of `cartesian_product` of `iota`s and `filter`ing.
* Add a reference implementation of C++23's `cartesian_product`.
* Add comparisons against C++ Standard Library ranges to all of the memset
  benchmarks.
…lementation

details to avoid confusing with `flux::drop`.
Co-authored-by: Tristan Brindle <t.c.brindle@gmail.com>
…or and

`keep_going` variable as parameters instead of returning them.
@brycelelbach brycelelbach force-pushed the cartesian_product_internal_iteration branch from 0d3116f to abba492 Compare July 31, 2023 17:37
include/flux/op/cartesian_product.hpp Show resolved Hide resolved
include/flux/op/cartesian_product.hpp Show resolved Hide resolved
@brycelelbach brycelelbach merged commit d906839 into tcbrindle:main Aug 2, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants