
Cleaned up version of the fallback batch type #100

Merged — 7 commits, merged into xtensor-stack:master on May 25, 2018

Conversation

@HadrienG2 (Contributor) commented May 3, 2018

This MR strives to be a hack-free version of #88. It is mostly a rewrite, using the existing branch as inspiration when things went wrong. The end goal is a fallback version of batch<T, N> that compiles down to scalar loops, which may or may not be autovectorized by the compiler.

While I was at it, I also cleaned up a couple of minor things along the way, such as the inability to print or query batch_bool.

Current status:

  • Write the "type" header
  • Rework the bitwise_cast API (in a backwards-compatible way) to allow genericity over vector length
  • Generalize the math functions to arbitrary vector lengths
  • Integrate the fallback in the benchmarks
  • Integrate it in the tests
  • Get the thing to compile, link, run, and pass tests :)
  • Resolve memory corruption in xbenchmark
  • Check that there is no non-fallback performance regression
  • Add a way to disable the fallback or detect if it is used
  • Discuss what should be done with stuff in the detail namespace
  • Discuss how to best document this
  • Discuss running CI with and without the fallback

@HadrienG2 HadrienG2 changed the title WIP: Cleaned up version of the fallback SIMD vector type WIP: Cleaned up version of the fallback batch type May 3, 2018
@HadrienG2 HadrienG2 changed the title WIP: Cleaned up version of the fallback batch type Cleaned up version of the fallback batch type May 3, 2018
@HadrienG2 (Contributor Author)

Alright, I think it's time to un-WIP this! There are two points that we should discuss at some point, either as part of this MR or as follow-up work:

  • The utilities in the fallback's detail namespace are quite general in nature, and could probably be put at a more general scope (xtl?) or be replaced with other similar utilities in the QuantStack codebase. What do you think about this matter?
  • We should probably have CI configurations for both ENABLE_FALLBACK=ON and ENABLE_FALLBACK=OFF. If we only do the former, we don't test that the core library does not depend on the fallback. If we only do the latter, we don't test that the fallback itself is working :)

@HadrienG2 (Contributor Author)

Oh, and another question: in order to avoid accidental slowdowns, I have decided to make the scalar fallback only available on demand, via the XSIMD_ENABLE_FALLBACK preprocessor define. Where should I document this (and the rest of the feature)?

@wolfv (Member) commented May 7, 2018

How about adding an entry to the Wrapper Types docs section with "fallback batch" as a title or something similar, that also documents all the drawbacks? I think that could also be the place to document how to enable it ...

Btw, this might turn out to be super-useful work: once we have to support the SVE instructions, we can already be sure that our algorithms work for any vector length! :)

@wolfv (Member) commented May 7, 2018

Also, I just had to implement a no-op bitwise cast (e.g. bitwise_cast<batch<int, 8>>(batch<int, 8>&, ...)) so that I only need one implementation for the select/interleave/... stuff.
That should be easy with your new framework, right?

@HadrienG2 (Contributor Author) commented May 9, 2018

@wolfv I added documentation for the scalar fallback in 0b18fc4.

Regarding no-op bitwise_cast, I think the new framework should make it quite easy. One possibility would be to implement a variant of the XSIMD_BITWISE_CAST_INTRINSIC macro which constrains the input and output (T, N) to be the same and which is implemented as a no-op. Here is an example:

    // Shorthand for defining a no-op bitwise_cast implementation
    #define XSIMD_BITWISE_CAST_IDENTITY(T, N)                       \
        template <>                                                 \
        struct bitwise_cast_impl<batch<T, N>, batch<T, N>>          \
        {                                                           \
            static inline batch<T, N> run(const batch<T, N>& x)     \
            {                                                       \
                return x;                                           \
            }                                                       \
        };

I'm not sure if the C++ specialization rules would allow you to go one step further and implement this as a template, like so:

    template <class T, std::size_t N>
    struct bitwise_cast_impl<batch<T, N>, batch<T, N>>
    {
        static batch<T, N> run(const batch<T, N>& x) {
            return x;
        }
    };

My main worry is that this specialization could be considered to collide with the more general implementation provided by the fallback:

    template <class T_in, class T_out, std::size_t N_in>
    struct bitwise_cast_impl<batch<T_in, N_in>,
                             batch<T_out, sizeof(T_in)*N_in/sizeof(T_out)>>;

I'm not sure if that is the case or not, I would need to check the C++ specialization rules in order to figure it out.

EDIT: After checking the relevant rules, I think the second solution should work (and compile more efficiently due to reduced code bloat).

@@ -108,7 +110,7 @@ namespace xsimd
     for (std::size_t count = 0; count < number; ++count)
     {
         auto start = std::chrono::steady_clock::now();
-        for (std::size_t i = 0; i < s; i += B::size)
+        for (std::size_t i = 0; i <= (s - B::size); i += B::size)
Member:

Why not keep i < s?

Contributor Author:

Doing so assumes that s is a multiple of B::size. This is not true when using an odd, non-power-of-two batch size, which I am currently doing in order to make sure that the fallback will be used (remember that the hardware types are a specialization of the fallback).

If you have a cleaner idea for forcing use of the fallback, I am interested :)

Member:

I don't; I just didn't get it, hence my question ;) Thanks for the explanation.

@@ -130,7 +132,7 @@ namespace xsimd
     for (std::size_t count = 0; count < number; ++count)
     {
         auto start = std::chrono::steady_clock::now();
-        for (std::size_t i = 0; i < s; i += inc)
+        for (std::size_t i = 0; i <= (s - inc); i += inc)
Member:

Same question here

Contributor Author:

Same answer :)


    /**************************
     * bitwise cast functions *
     **************************/
Member:

This section should go after the declaration of the generic batch operators

Contributor Author:

Done


    // Backwards-compatible interface to bitwise_cast_impl
    template <class B, std::size_t N = simd_batch_traits<B>::size>
    B bitwise_cast(const batch<float, N>& x)
Member:

Coding style: it would be nicer to split the declarations and the definitions of these casting functions, just like it has been done for the operators.

Contributor Author:

Done


    // Tools for reinterpreting stuff as an unsigned integer
    template <typename T>
    struct as_unsigned;
@JohanMabille (Member) commented May 22, 2018:

Some of these structures are already defined in xsimd/types/xsimd_utils.hpp (see as_integer and as_unsigned_integer), so it would be better to move some of these definitions there (and the names should be unified).

Contributor Author:

Done

@JohanMabille (Member) commented May 22, 2018

@HadrienG2 thanks for this very involved work!

In addition to my comments, could you reduce the number of commits (by squashing) in order to make the history tree cleaner? This work definitely deserves more than one commit (so no need to squash everything), but the commits could be grouped by feature / theme.

Besides, regarding the stuff in the detail namespace: we'll move to C++14 in the next backward-incompatible release of xsimd, so we can keep the definition of integer_sequence for now and remove it later. The definition of is_array_initializer should be available in xtl too; however, I'm not sure we want to make xsimd depend on xtl just for that metafunction, so it should be moved to xsimd_utils.hpp for now.

@JohanMabille (Member)

Regarding the CI, we should definitely add a configuration with fallback enabled.

The introduction of a generic fallback requires a revamp of the
bitwise_cast infrastructure because we effectively need a generic
bitwise_cast implementation which is specialized by hardware types.
This requires generalizing the boilerplate generators, which are cleaned
up along the way to make their interface more consistent.
- Allow fallback batches to be created from a list of scalars
- Merge implementation details into xsimd_utils.hpp
- Use default constructor instead of 0 in broadcasting array constructor
- Turn to_unsigned and from_unsigned into functions
@HadrienG2 (Contributor Author)

@JohanMabille I squashed the history and moved most of the implementation details to xsimd_utils.hpp. What do you think?

@HadrienG2 (Contributor Author)

Also, do you have any suggestions concerning CI changes?

It would be expensive to run each build configuration with and without the fallback enabled, so I assume that we should pick one configuration (or a couple of them) and build/test it with and without fallback. But which configuration(s) should that be?

@JohanMabille (Member) commented May 24, 2018

Indeed, my idea was to pick a gcc and a neon configuration to test the fallback (we can also add a configuration on AppVeyor). So let's test the fallback on gcc6 and on the clang 3.9 configuration used for neon.

EDIT: we can activate the CI in a separate PR.

@JohanMabille JohanMabille merged commit 97da31c into xtensor-stack:master May 25, 2018