Parallel and more aggressive strided assigner by ewoudwempe · Pull Request #2660 · xtensor-stack/xtensor

ewoudwempe · 2023-02-27T13:14:50Z

Checklist

The title and commit message(s) are descriptive.
Small commits made to fix your PR have been squashed to avoid history pollution.
Tests have been added for new features or bug fixes.
API of new functions and classes are documented.

Description

I needed a faster strided assigner, so I continued the work of @starboerg, who made the first working version (for OpenMP) in #1973. I fixed a few bugs, made it work more generally, and refactored things to make it testable. Thanks, @starboerg! I haven't asked, but I hope you're okay with this PR as well.
The major changes

Add the parallel strided assigner
Use TBB and OPENMP in the benchmarking build configuration
Add a benchmark for stencil-like operations (partly from OpenMP parallelisation of strided assigner #1973)
Make is_contiguous check work for broadcasted containers
Explicitly check the strides to decide whether to use the strided assigner and in which direction (row-major or column-major), so that layout_type::dynamic can be supported by the strided assigner.
Add some tests verifying that the expected assigner gets used
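
To illustrate the stride check described in the list above, here is a minimal sketch using only the standard library. It is not the code from this PR: the names `inner_layout` and `contiguous_direction` are hypothetical, strides are assumed to be counted in elements, and the rule "a unit stride on the outermost or innermost non-trivial dimension" is a simplification of whatever xtensor actually tests.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type, not part of xtensor's API.
enum class inner_layout { none, row_major, column_major };

// Reports whether the expression has a unit-stride dimension at either end
// of its shape, which is what a strided loop needs for its linear inner loop.
// Dimensions of extent 1 are skipped because their stride never matters.
inline inner_layout contiguous_direction(const std::vector<std::size_t>& shape,
                                         const std::vector<std::ptrdiff_t>& strides)
{
    const std::size_t n = shape.size();
    if (n == 0 || strides.size() != n)
    {
        return inner_layout::none;
    }
    // Row-major candidate: the last non-trivial dimension has stride 1.
    for (std::size_t i = n; i-- > 0;)
    {
        if (shape[i] == 1) continue;
        if (strides[i] == 1) return inner_layout::row_major;
        break;
    }
    // Column-major candidate: the first non-trivial dimension has stride 1.
    for (std::size_t i = 0; i < n; ++i)
    {
        if (shape[i] == 1) continue;
        if (strides[i] == 1) return inner_layout::column_major;
        break;
    }
    return inner_layout::none;
}
```

With this sketch, a 4x5x6 sub-volume of a row-major 10x10x10 array (strides 100, 10, 1) is reported as `row_major` even though it is not contiguous as a whole, which is the kind of case a compile-time dynamic layout would otherwise send to the fallback assigner.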

Can you let me know what needs to be done for this to be merged?

Some quick benchmarks of stencil-like operations (run benchmark/benchmark_xtensor --benchmark_filter=stencil_):

Test	CPU (old)	CPU (new)	Speedup
stencil_threedirections/stencil_threedirections_50	2489376	189548	13.13322219
stencil_threedirections/stencil_threedirections_100	20367264	1076572	18.9186269
stencil_threedirections/stencil_threedirections_200	163588321	7356985	22.23578286
stencil_threedirections/stencil_threedirections_300	554954006	24373148	22.76907382
stencil_threedirections/stencil_threedirections_500	2689852264	112058541	24.00399149
stencil_twodirections/stencil_twodirections_50	1653291	28051	58.93875441
stencil_twodirections/stencil_twodirections_100	15497451	2734497	5.66738636
stencil_twodirections/stencil_twodirections_200	122158662	10555985	11.57245506
stencil_twodirections/stencil_twodirections_300	419578316	33222766	12.62924092
stencil_twodirections/stencil_twodirections_500	1952272879	121129696	16.11721108
stencil_onedirection/stencil_onedirections_50	875264	17200	50.88744186
stencil_onedirection/stencil_onedirections_100	8649378	2504468	3.453578964
stencil_onedirection/stencil_onedirections_200	68331792	10678499	6.399007201
stencil_onedirection/stencil_onedirections_300	222922728	33437123	6.666923108
stencil_onedirection/stencil_onedirections_500	1056342901	120677104	8.753465786

wolfv · 2023-02-27T21:29:02Z

Wow, very impressive work. Quick question regarding the benchmarks -- is the CPU(new) single threaded or multithreaded using OpenMP / TBB?

ewoudwempe · 2023-02-28T10:37:14Z

Thank you :)
It uses multithreading (TBB), although to be honest the multithreading only helps for certain array sizes, I guess because this benchmark becomes memory bound after some point. Here's a more complete benchmark, with the size of the 3D array on each side on the x-axis and the CPU time needed for a stencil-like computation on the y-axis.

starboerg · 2023-02-28T15:29:12Z

Thanks @ewoudwempe for picking up my work and feeding it back to xtensor. I am really happy to see this landing in xtensor.

Apparently you also solved the remaining issue of getting the strided assigner properly picked for 'xview' stencils. This issue held me back from working on a PR. In particular, the xview degraded the layout to dynamic for sub-volumes that are not contiguous in all dimensions, though the strided assigner just requires the first dimension to be contiguous.
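
As a rough illustration of why one contiguous dimension is enough, the sketch below copies a 3-D sub-volume with a linear inner loop and an outer loop over the remaining indices. It is not xtensor's implementation: `index3` and `strided_copy_3d` are hypothetical names, strides are in elements, and the block is assumed to be row-major with unit stride on the last (fastest-varying) dimension. The OpenMP pragma shows one plausible way to parallelise the outer loop and is only active when the compiler enables OpenMP.

```cpp
#include <array>
#include <cstddef>

// Hypothetical alias for a 3-D shape or stride triple (in elements).
using index3 = std::array<std::ptrdiff_t, 3>;

// Copies a 3-D block from src to dst. Both blocks have the same shape and
// are assumed to have stride 1 along their last dimension, so the innermost
// loop walks memory linearly even if the block as a whole is not contiguous.
inline void strided_copy_3d(double* dst, const index3& dst_strides,
                            const double* src, const index3& src_strides,
                            const index3& shape)
{
    // Flatten the two outer dimensions into one loop that can be split
    // across threads; each iteration handles one contiguous row.
    const std::ptrdiff_t outer = shape[0] * shape[1];
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t k = 0; k < outer; ++k)
    {
        const std::ptrdiff_t i = k / shape[1];
        const std::ptrdiff_t j = k % shape[1];
        double* d = dst + i * dst_strides[0] + j * dst_strides[1];
        const double* s = src + i * src_strides[0] + j * src_strides[1];
        for (std::ptrdiff_t x = 0; x < shape[2]; ++x)
        {
            d[x] = s[x];
        }
    }
}
```

For example, copying the interior 2x2x2 block of a row-major 4x4x4 array would pass the source pointer offset to element (1, 1, 1), source strides {16, 4, 1}, and destination strides {4, 2, 1}.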

Good work, and the benchmarks might look even better on HPC hardware with multiple memory controllers and NUMA's first-touch policy (if initialized properly).

tdegeus · 2023-02-28T16:18:43Z

I just scrolled past this. Very impressive! A preliminary huge thanks for taking the effort!

JohanMabille · 2023-03-14T21:09:55Z

Wow, very impressive, thanks for the hard work! I will review it in the next few days; hopefully we can get it into the next release!

tdegeus · 2023-03-16T15:00:30Z

If we don't manage to review this before next Friday, please ping us. Indeed it would be nice to get it in the next release.

ewoudwempe · 2023-03-17T09:56:15Z

Thanks, I'd be happy to see this in soon as well!
And @starboerg, thanks! Yes, the original code did not use the strided assigner when the layout was dynamic. But apart from xview there are more cases where a dynamic layout can still use a strided assign (in my case, xadapt with strides, but I suspect also xstrided_view), so the easiest thing seemed to be removing from the assigning code the restriction that dynamic layouts use the fallback assigner. In practice that means that the strides get tested explicitly for all types of input.

JohanMabille

Really neat implementation, thanks for this. I think we could factor out the parts of the code that deal with index manipulation (we do similar things in the stepper tools, but a bit differently), but this should be done in a dedicated PR.

Jörn Starruß and others added 30 commits February 24, 2023 19:28

initial thoughts parallel strided assign

e3ab517

trigger strided assign properly

2087ed0

Fix openMP parallel linear assign

643b601

Fix typo on xassign.hpp

8e75d92

initial thoughts parallel strided assign

63381a8

trigger strided assign properly

01e156e

Fix openMP parallel linear assign

da29eda

Fix typo on xassign.hpp

1531f81

Fix wrong merge

198b385

Try at fixed strided loop assignment

aa9d078

Compiles and maybe works now..

68ba3de

Use static partitioner instead

0a62ded

Remove unused variable

3c4e93a

Make explicitly the steppers in teh lambda instead of a copy

d1209f2

Fix the strided assigner

01c8b4c

Add a test, and fix a few bugs for the strided (parallel) assigner.

e4558a6

Revert some unnecessary changes again

1cc5b59

Add tbb to benchmark

011d843

Fix typo

7c179a5

Fix a few things and refactor to amke things backward-compatible

4c49f91

Use tbb and opnemp also in benchmark

0b3caff

Use the various THRESHOLDs for the strided loops

609d848

Add benchmark for stencil operations

6a9d205

Add two-direction stencil too

4299283

Add more elaborate view test

f66ab3f

Remove some buggy checks

32595af

Fix a few more bugs

df7abfb

run clang-format on xcontainer

b2cea00

Run clang-format on xassign & test_xassign

d84724e

Fix for column layout default

68a5a9e

ewoudwempe added 2 commits February 27, 2023 14:55

Fix openmp version

36483d6

Try to fix the VS2015 test on appveyor

2f678c1

JohanMabille self-requested a review March 14, 2023 21:10

Remove no longer relevant comment and refactor a few lines

3bf9d1b

JohanMabille approved these changes Mar 17, 2023

JohanMabille merged commit c51af85 into xtensor-stack:master Mar 17, 2023

JohanMabille merged 33 commits into xtensor-stack:master from ewoudwempe:strided_assign_improvements
