Unify scalar, batch, and batch_bool in some places#1205
Unify scalar, batch, and batch_bool in some places#1205serge-sans-paille merged 4 commits intoxtensor-stack:masterfrom
batch, and batch_bool in some places#1205Conversation
|
FWIW, the goal is to make it easier to write generic code that works on scalar and batch types. For instance, this enables something like: template <typename VecF, VecI, VecB>
VecF DoStuff(VecF fv, VecI iv, VecB cond) {
using F = typename batch_traits<VecF>::scalar_type;
auto cond_i = iv > 42;
return select(batch_cast<F>(cond_i) & cond, fv + batch_cast<F>(iv), 0.0));
}which can be used as Unifying template <typename Vec>
auto LoadSelect(bool *b, Vec x, Vec y) {
using VecB = typename batch_traits<Vec>::mask_type;
return select(VecB::load_unaligned(b), x, y);
}Tangentially related: I wonder if, in general, |
I like that!
I 'm not quite sure about this one. We currently have a conversion operator from |
|
True. Do you think it's better to allow implicit conversion between |
I'm not a big fan of this either :-/ |
|
I'll remove it from this PR; I think the other features are enough for what I want at the moment. The problem I'm trying to solve is that |
batch, and batch_bool in some places
|
Ok, |
| template <class A, class T, bool... Values> | ||
| XSIMD_INLINE batch_bool<T, A> select(batch_bool_constant<T, A, Values...> const&, batch_bool<T, A> const& true_br, batch_bool<T, A> const& false_br, requires_arch<common>) | ||
| { | ||
| return select<A>(batch_bool<T, A> { Values... }, true_br, false_br, A {}); |
There was a problem hiding this comment.
This seems sub-optimal in the case where we don't have access to a fast select, see https://godbolt.org/z/3GqaxTsbd
There was a problem hiding this comment.
There's no earthly reason GCC should generate different code for the two implementations. Maybe it's hung up on generating andn and refuses to const-prop through it, so it has to copy the mask as well as the inputs. Clang does just fine: https://godbolt.org/z/6766ExWMK
I tested both versions in a loop as well (https://godbolt.org/z/Gxe4E4fK5). GCC uses and and andn with one mask with the current implementation but two masks for your version. It's a tradeoff of one movaps r, r vs using another (architectural) register; which version is better probably depends on the context. I'm happy to change it if you can explain why one option is clearly better than the other, but this feels like a rather brittle optimization.
There was a problem hiding this comment.
you're correct, it's very compiler dependent, and probably also context dependent. Let's keep the code simple, and change later if it creates an issue for someone else.
There was a problem hiding this comment.
I'll use your version as it seems simpler for now, rather than dispatching to the batch<T> version.
|
Would you mind rewriting history to have (at least) one commit for |
`xsimd::batch_traits` with mask type, scalar type, and some properties for scalar, `batch`, and `batch_bool`.
|
Sure thing, that's a good idea. |
|
thanks! 🙇 |
batch_traitsthat unifies traits for scalar,batch, andbatch_bool.batch_boolas its own mask type.select()forbatch_bool.absimplementation.