Skip to content

Fix template argument order for batch_bool loads#1181

Merged
serge-sans-paille merged 2 commits into
xtensor-stack:masterfrom
kalenedrael:booldispatch
Nov 3, 2025
Merged

Fix template argument order for batch_bool loads#1181
serge-sans-paille merged 2 commits into
xtensor-stack:masterfrom
kalenedrael:booldispatch

Conversation

@kalenedrael
Copy link
Copy Markdown
Contributor

batch_bool<T, A>::load_(un)aligned dispatches to kernel::load_(un)aligned<A>, which requires that the architecture be the first template parameter.

@serge-sans-paille
Copy link
Copy Markdown
Contributor

mmmh, it means that it was mostly untested, would you mind adding some tests in test/test_batch.cpp ?

@kalenedrael
Copy link
Copy Markdown
Contributor Author

I'm not really sure how to test this. We need to check that the "highest" overload is called for a given architecture, but I don't know how to do that without just checking each overload and architecture individually.

@DiamonDinoia
Copy link
Copy Markdown
Contributor

Hi @kalenedrael,

Just my two cents, feel free to ignore if not accurate.
If this works as the other tests it is enough to call batch_bool<T, A>::load_(un)aligned with the T, A provided by the test class and verify the result against reference answers. Then these tests are compiled for a wide variety of archs in CI with means for that for sse2 only the sse2 overload exsist and only that arch is tested. For avx, it calles the highest overload as you said and so avx is tested even if sse is available and so on.

Cheers,
Marco

@kalenedrael
Copy link
Copy Markdown
Contributor Author

The test coverage for batch_bool<T, A>::load_(un)aligned seems reasonably complete. The problem is that because the template parameters are swapped, the compiler ignores the architecture-specific overloads for AVX2 and Neon due to incompatible types and falls back to the common implementation. It still compiles and isn't expected to cause any test failures; I only noticed because I was using batch_bool and performance was much worse than I expected.

The dispatching is what I don't know how to test. In general, it seems like you more or less have to hard-code the expected overloads for each architecture, and find a way to disable the other ones so they can't be used as a fallback. This is both convoluted and the worst kind of test, because it's essentially copying the code itself.

`batch_bool<T, A>::load_(un)aligned` dispatches to
`kernel::load_(un)aligned<A>`, which requires that the architecture be
the first template parameter.
@DiamonDinoia
Copy link
Copy Markdown
Contributor

Oh yes, that would be hard to test. My previous suggestion missed the point. I apologize.

Then I guess the only way to make this impossible is to either use static_asserts i.e std::is_base_of or sfinae on the common implementation making sure that the type are what they should be.

@serge-sans-paille serge-sans-paille merged commit 21d9634 into xtensor-stack:master Nov 3, 2025
57 of 59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants