Skip to content

Parallel and more aggressive strided assigner#2660

Merged
JohanMabille merged 33 commits intoxtensor-stack:masterfrom
ewoudwempe:strided_assign_improvements
Mar 17, 2023
Merged

Parallel and more aggressive strided assigner#2660
JohanMabille merged 33 commits intoxtensor-stack:masterfrom
ewoudwempe:strided_assign_improvements

Conversation

@ewoudwempe
Copy link
Copy Markdown
Contributor

@ewoudwempe ewoudwempe commented Feb 27, 2023

Checklist

  • The title and commit message(s) are descriptive.
  • Small commits made to fix your PR have been squashed to avoid history pollution.
  • Tests have been added for new features or bug fixes.
  • API of new functions and classes are documented.

Description

I needed some faster strided assigner, and continued the work of @starboerg, who made a first working version (for OPENMP) in
#1973. I fixed a few bugs, made it work a bit more generally, and furthermore refactored things to make it testable. @starboerg made a first the initial working version (for OPENMP), thanks! @starboerg, I haven't asked, but hope you're okay with this PR as well.
The major changes

  • Add the parallel strided assigner
  • Use TBB and OPENMP in the benchmarking build configuration
  • Add a benchmark for stencil-like operations (partly from OpenMP parallelisation of strided assigner #1973)
  • Make is_contiguous check work for broadcasted containers
  • Specifically check the strides to decide whether to use the strided assigner and which direction (row-major and column-major), so that layout_type::dynamic can be supported with the strided assigner.
  • Add some tests verifying that the expected assigner gets used

Can you let me know what needs to be done for this to be merged?

Some quick benchmarks of doing stencil-like operators are (run benchmark/benchmark_xtensor --benchmark_filter=stencil_)

Test CPU (old) CPU (new) Speedup
stencil_threedirections/stencil_threedirections_50 2489376 189548 13.13322219
stencil_threedirections/stencil_threedirections_100 20367264 1076572 18.9186269
stencil_threedirections/stencil_threedirections_200 163588321 7356985 22.23578286
stencil_threedirections/stencil_threedirections_300 554954006 24373148 22.76907382
stencil_threedirections/stencil_threedirections_500 2689852264 112058541 24.00399149
stencil_twodirections/stencil_twodirections_50 1653291 28051 58.93875441
stencil_twodirections/stencil_twodirections_100 15497451 2734497 5.66738636
stencil_twodirections/stencil_twodirections_200 122158662 10555985 11.57245506
stencil_twodirections/stencil_twodirections_300 419578316 33222766 12.62924092
stencil_twodirections/stencil_twodirections_500 1952272879 121129696 16.11721108
stencil_onedirection/stencil_onedirections_50 875264 17200 50.88744186
stencil_onedirection/stencil_onedirections_100 8649378 2504468 3.453578964
stencil_onedirection/stencil_onedirections_200 68331792 10678499 6.399007201
stencil_onedirection/stencil_onedirections_300 222922728 33437123 6.666923108
stencil_onedirection/stencil_onedirections_500 1056342901 120677104 8.753465786

@wolfv
Copy link
Copy Markdown
Member

wolfv commented Feb 27, 2023

Wow, very impressive work. Quick question regarding the benchmarks -- is the CPU(new) single threaded or multithreaded using OpenMP / TBB?

@ewoudwempe
Copy link
Copy Markdown
Contributor Author

Thank you :)
It's with using multithreading (TBB), although to be honest the multithreading only helps for a certain array size, I guess because this benchmark is memory bound after some point. Here's a more complete benchmark, with on the x-axis the size of the 3d-array on each side, and the y-axis the CPU time needed for a stencil-like computation.
bench

@starboerg
Copy link
Copy Markdown
Contributor

starboerg commented Feb 28, 2023

Thanks @ewoudwempe for picking up my work and feeding it back to xtensor. I am really happy to see this landing in xtensor.

Apparently you also solved the remaining issue that the strided assigner is properly picked for 'xview' stencils. This issue kept me back from working on a PR. In particular, the xview degraded the layout to dynamic for sub-volumes that are not contiguous in all dimensions, though the strided assigner just requires the first dimension to be contiguous.

Good work, and the benchmarks might look even better on HPC hardware with multiple memory controllers and NUMA's first-touch policy (if initialized properly).

@tdegeus
Copy link
Copy Markdown
Member

tdegeus commented Feb 28, 2023

I just scrolled passed this. Very impressive! A preliminary huge thanks for taking the effort!

@JohanMabille
Copy link
Copy Markdown
Member

Wow very impressive, thanks for the hard work! I will review it in the next few days, hopefully we can get it in the next release!

@JohanMabille JohanMabille self-requested a review March 14, 2023 21:10
@tdegeus
Copy link
Copy Markdown
Member

tdegeus commented Mar 16, 2023

If we don't manage to review this before next Friday, please ping us. Indeed it would be nice to get it in the next release.

@ewoudwempe
Copy link
Copy Markdown
Contributor Author

Thanks, I'd be happy to see this in soon as well!
And @starboerg, thanks! Yes, the original code did not use the strided assigner when the layout was dynamic. But since apart from xview there are more cases where a dynamic layout can still have a strided assign (in my case, xadapt with strides, but I suspect also xstrided_view), to me the easiest thing seemed to remove the restriction that dynamic layouts use the fallback assigner from the assigning code. In practice that means that the strides get tested explicitly for all types of input.

Copy link
Copy Markdown
Member

@JohanMabille JohanMabille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really neat implementation, thanks for this. I think we could mutualise the parts of the code about the index minpulation (we do similar things in the stepper tools, but a bit differently), but this should be done in a dedicated PR.

@JohanMabille JohanMabille merged commit c51af85 into xtensor-stack:master Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants