Cleaned up version of the fallback batch type #100
Alright, I think it's time to un-WIP this! There are two points that we should discuss at some point, either as part of this MR or as follow-up work:
Oh, and another question: in order to avoid accidental slowdowns, I have decided to make the scalar fallback only available on demand, via the XSIMD_ENABLE_FALLBACK preprocessor define. Where should I document this (and the rest of the feature)?
How about adding an entry to the Wrapper Types docs section with "fallback batch" as a title or something similar, that also documents all the drawbacks? I think that could also be the place to document how to enable it ... Btw. it might turn out to be super-useful work here, once we have to support the SVE instructions we can already be sure that our algos work for any vector length! :)
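For readers unfamiliar with the opt-in pattern discussed above, here is a minimal, self-contained sketch of how a generic implementation can be gated behind a preprocessor define. It is not xsimd's actual code: `demo::batch` and `DEMO_ENABLE_FALLBACK` are hypothetical stand-ins for `batch<T, N>` and `XSIMD_ENABLE_FALLBACK`, and the macro is defined inline only so that the snippet compiles on its own (in a real build it would presumably be passed on the compiler command line).

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for XSIMD_ENABLE_FALLBACK; defined here only so that
// the sketch is self-contained.
#define DEMO_ENABLE_FALLBACK

namespace demo
{
#ifdef DEMO_ENABLE_FALLBACK
    // Generic scalar fallback: usable for any (T, N), but only on demand.
    template <class T, std::size_t N>
    struct batch
    {
        std::array<T, N> data;
        static constexpr bool is_fallback() { return true; }
    };
#else
    // Without the opt-in, unsupported (T, N) pairs are declared but never
    // defined, so using one is a compile-time error rather than a silent
    // slowdown.
    template <class T, std::size_t N>
    struct batch;
#endif

    // Stand-in for a hardware-backed type, modeled as a full specialization.
    template <>
    struct batch<float, 4>
    {
        std::array<float, 4> data;
        static constexpr bool is_fallback() { return false; }
    };
}
```

With the macro defined, `demo::batch<float, 3>` resolves to the generic definition while `demo::batch<float, 4>` still picks the specialization.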
Also, I just had to implement a no-op bitwise cast (e.g. |
@wolfv I added documentation for the scalar fallback in 0b18fc4. Regarding the no-op bitwise_cast, I think the new framework should make it quite easy. One possibility would be to implement a variant of the XSIMD_BITWISE_CAST_INTRINSIC macro which constrains the input and output (T, N) to be the same and which is implemented as a no-op. Here is an example:

// Shorthand for defining a no-op bitwise_cast implementation
#define XSIMD_BITWISE_CAST_IDENTITY(T, N) \
template <> \
struct bitwise_cast_impl<batch<T, N>, batch<T, N>> \
{ \
    static inline batch<T, N> run(const batch<T, N>& x) \
    { \
        return x; \
    } \
};

I'm not sure if the C++ specialization rules would allow you to go one step further and implement this as a template, like so:

template <class T, std::size_t N>
struct bitwise_cast_impl<batch<T, N>, batch<T, N>>
{
    static batch<T, N> run(const batch<T, N>& x) {
        return x;
    }
};

My main worry is that this specialization could be considered to collide with the more general implementation provided by the fallback:

template <class T_in, class T_out, std::size_t N_in>
struct bitwise_cast_impl<batch<T_in, N_in>,
                         batch<T_out, sizeof(T_in)*N_in/sizeof(T_out)>>;

I'm not sure if that is the case or not, I would need to check the C++ specialization rules in order to figure it out.

EDIT: After checking the relevant rules, I think the second solution should work (and compile more efficiently due to reduced code bloat).
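To make the identity-cast idea concrete outside of xsimd, here is a small self-contained sketch. Everything in the `demo` namespace is hypothetical. Note that it deliberately sidesteps the partial-ordering question raised above: the generic byte-copying cast is written as the primary template rather than as a second partial specialization, so it only illustrates the no-op specialization itself, not how the two specializations in this PR interact.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace demo
{
    // Hypothetical minimal batch type, not xsimd's.
    template <class T, std::size_t N>
    struct batch
    {
        std::array<T, N> data;
    };

    // Generic case (primary template): reinterpret by copying the bytes.
    template <class B_in, class B_out>
    struct bitwise_cast_impl
    {
        static_assert(sizeof(B_in) == sizeof(B_out),
                      "bitwise_cast requires batches of equal byte size");

        static constexpr bool is_noop() { return false; }

        static B_out run(const B_in& x)
        {
            B_out out;
            std::memcpy(&out, &x, sizeof(B_out));
            return out;
        }
    };

    // Identity case: same T and N on both sides, so the input is returned as-is.
    template <class T, std::size_t N>
    struct bitwise_cast_impl<batch<T, N>, batch<T, N>>
    {
        static constexpr bool is_noop() { return true; }

        static batch<T, N> run(const batch<T, N>& x)
        {
            return x;
        }
    };

    // Convenience wrapper: output type is given explicitly, input is deduced.
    template <class B_out, class B_in>
    B_out bitwise_cast(const B_in& x)
    {
        return bitwise_cast_impl<B_in, B_out>::run(x);
    }
}
```

In this sketch, `demo::bitwise_cast<demo::batch<float, 4>>(x)` with a `demo::batch<float, 4>` argument selects the identity specialization, while casting to `demo::batch<std::uint32_t, 4>` goes through the byte copy.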
@@ -108,7 +110,7 @@ namespace xsimd
 for (std::size_t count = 0; count < number; ++count)
 {
     auto start = std::chrono::steady_clock::now();
-    for (std::size_t i = 0; i < s; i += B::size)
+    for (std::size_t i = 0; i <= (s - B::size); i += B::size)
Why not keep i < s?
Doing so assumes that s is a multiple of B::size. This is not true when using an odd, non-power-of-two batch size, which I am currently doing in order to make sure that the fallback will be used (remember that the hardware types are a specialization of the fallback).
If you have a cleaner idea for forcing use of the fallback, I am interested :)
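As a side note on the loop bound: because `std::size_t` is unsigned, an expression like `s - B::size` wraps around to a huge value when `s` is smaller than the batch size, so the `i <= (s - B::size)` form relies on `s` being at least one batch long. One possible alternative, shown below as a sketch with hypothetical helper names (this is not what the PR does), is to round `s` down to a multiple of the batch size and keep a plain `<` comparison.

```cpp
#include <cstddef>

namespace demo
{
    // Index one past the last element that belongs to a full batch.
    // Never underflows, even when s < batch_size (the result is then 0).
    inline std::size_t full_batch_end(std::size_t s, std::size_t batch_size)
    {
        return s - s % batch_size;
    }

    // Counts how many full batches a loop over [0, s) would process.
    inline std::size_t count_full_batches(std::size_t s, std::size_t batch_size)
    {
        std::size_t count = 0;
        const std::size_t end = full_batch_end(s, batch_size);
        for (std::size_t i = 0; i < end; i += batch_size)
        {
            ++count; // a real loop would load, compute and store one batch here
        }
        return count;
    }
}
```

With a batch size of 3 and `s == 10`, the loop visits indices 0, 3 and 6 and leaves the last element to a scalar tail (or ignores it, as the benchmark appears to).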
I don't; I just didn't get it, hence my question ;) Thanks for the explanation.
@@ -130,7 +132,7 @@ namespace xsimd
 for (std::size_t count = 0; count < number; ++count)
 {
     auto start = std::chrono::steady_clock::now();
-    for (std::size_t i = 0; i < s; i += inc)
+    for (std::size_t i = 0; i <= (s - inc); i += inc)
Same question here
Same answer :)
include/xsimd/types/xsimd_base.hpp
Outdated
/**************************
 * bitwise cast functions *
 **************************/
This section should go after the declaration of the generic batch operators
Done
include/xsimd/types/xsimd_base.hpp
Outdated
// Backwards-compatible interface to bitwise_cast_impl
template <class B, std::size_t N = simd_batch_traits<B>::size>
B bitwise_cast(const batch<float, N>& x)
coding style: it would be nicer to split the declaration and the definitions of these casting functions, just like it has been done for operators.
Done
// Tools for reinterpreting stuff as an unsigned integer
template <typename T>
struct as_unsigned;
Some of these structures are already defined in xsimd/types/xsimd_utils.hpp (see as_integer and as_unsigned_integer), so it would be better to move some of these definitions there (and the names should be unified).
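For context, a trait of this kind and the matching conversion functions could look roughly like the sketch below. This is an illustration under assumptions, not the definitions from xsimd_utils.hpp: the `demo` namespace and the `sized_unsigned` helper are hypothetical, and only the names `as_unsigned`, `to_unsigned` and `from_unsigned` are borrowed from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace demo
{
    // Hypothetical helper: maps a byte size to an unsigned integer type.
    template <std::size_t Size>
    struct sized_unsigned;
    template <> struct sized_unsigned<1> { using type = std::uint8_t; };
    template <> struct sized_unsigned<2> { using type = std::uint16_t; };
    template <> struct sized_unsigned<4> { using type = std::uint32_t; };
    template <> struct sized_unsigned<8> { using type = std::uint64_t; };

    // Unsigned integer type with the same size as T.
    template <class T>
    struct as_unsigned
    {
        using type = typename sized_unsigned<sizeof(T)>::type;
    };

    // Reinterpret the bits of x as an unsigned integer of the same size.
    template <class T>
    typename as_unsigned<T>::type to_unsigned(T x)
    {
        typename as_unsigned<T>::type u;
        std::memcpy(&u, &x, sizeof(T));
        return u;
    }

    // Inverse operation; T must be given explicitly since it cannot be deduced.
    template <class T>
    T from_unsigned(typename as_unsigned<T>::type u)
    {
        T x;
        std::memcpy(&x, &u, sizeof(T));
        return x;
    }
}
```

The sketch uses `std::memcpy` for the reinterpretation, since it is well defined for trivially copyable types, unlike a pointer cast.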
Done
@HadrienG2 thanks for this very involved work! In addition to my comments, could you reduce the number of commits (by squashing) in order to make the history tree cleaner? This work definitely deserves more than one commit (so no need to squash everything), but the commits may be gathered by feature / theme. Besides, regarding the stuff in the namespace |
Regarding the CI, we should definitely add a configuration with fallback enabled. |
The introduction of a generic fallback requires a revamp of the bitwise_cast infrastructure because we effectively need a generic bitwise_cast implementation which is specialized by hardware types.
This requires generalizing the boilerplate generators, which are cleaned up along the way to make their interface more consistent.
- Allow fallback batches to be created from a list of scalars
- Merge implementation details into xsimd_utils.hpp
- Use default constructor instead of 0 in broadcasting array constructor
- Turn to_unsigned and from_unsigned into functions
@JohanMabille I squashed the history and moved most of the implementation details to xsimd_utils.hpp. What do you think?
Also, do you have any suggestions concerning CI changes? It would be expensive to run each build configuration with and without the fallback enabled, so I assume that we should pick one configuration (or a couple of them) and build/test it with and without fallback. But which configuration(s) should that be?
Indeed my idea was to pick a gcc and a neon configuration to test the fallback (we can also add a configuration on appveyor). So let's test fallback on gcc6 and the last clang 3.9 used for neon. EDIT: we can activate the CI in a separate PR.
This MR strives to be a hack-free version of #88. It is mostly a rewrite, using the existing branch as inspiration when things go wrong. The end goal is to get a fallback version of batch<T, N> which compiles down to scalar loops that may or may not be autovectorized by the compiler.
While I was at it, I also cleaned up a couple of minor things along the way, such as the inability to print or query batch_bool.
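To illustrate what "compiles down to scalar loops" means here, the following is a minimal sketch of a fallback-style batch whose arithmetic is written as plain element-wise loops. The `demo::scalar_batch` type is hypothetical and far simpler than the real `batch<T, N>`; whether a compiler autovectorizes such loops depends on the compiler, flags and target.

```cpp
#include <array>
#include <cstddef>

namespace demo
{
    // Hypothetical fallback-style batch: N scalars stored in an array.
    template <class T, std::size_t N>
    struct scalar_batch
    {
        std::array<T, N> data;
    };

    // Element-wise addition as a plain loop over the N lanes.
    template <class T, std::size_t N>
    scalar_batch<T, N> operator+(const scalar_batch<T, N>& lhs,
                                 const scalar_batch<T, N>& rhs)
    {
        scalar_batch<T, N> out;
        for (std::size_t i = 0; i < N; ++i)
        {
            out.data[i] = lhs.data[i] + rhs.data[i];
        }
        return out;
    }

    // Element-wise multiplication, same structure.
    template <class T, std::size_t N>
    scalar_batch<T, N> operator*(const scalar_batch<T, N>& lhs,
                                 const scalar_batch<T, N>& rhs)
    {
        scalar_batch<T, N> out;
        for (std::size_t i = 0; i < N; ++i)
        {
            out.data[i] = lhs.data[i] * rhs.data[i];
        }
        return out;
    }
}
```

Because N is a template parameter, nothing in this sketch requires it to be a power of two, which matches the odd batch sizes used in the benchmark discussion above.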
Current status: