
Cleaned up version of the fallback batch type #100

Merged — 7 commits, merged into xtensor-stack:master on May 25, 2018

Conversation

@HadrienG2 (Contributor) commented May 3, 2018

This MR strives to be a hack-free version of #88. It is mostly a rewrite, using the existing branch as inspiration when things went wrong. The end goal is a fallback version of batch<T, N> that compiles down to scalar loops, which may or may not be autovectorized by the compiler.

While I was at it, I also cleaned up a couple of minor things along the way, such as the inability to print or query batch_bool.

Current status:

  • Write the "type" header
  • Rework the bitwise_cast API (in a backwards-compatible way) to allow genericity over vector length
  • Generalize the math functions to arbitrary vector lengths
  • Integrate the fallback in the benchmarks
  • Integrate it in the tests
  • Get the thing to compile, link, run, and pass tests :)
  • Resolve memory corruption in xbenchmark
  • Check that there is no non-fallback performance regression
  • Add a way to disable the fallback or detect if it is used
  • Discuss what should be done with stuff in the detail namespace
  • Discuss how to best document this
  • Discuss running CI with and without the fallback

@HadrienG2 HadrienG2 changed the title WIP: Cleaned up version of the fallback SIMD vector type WIP: Cleaned up version of the fallback batch type May 3, 2018
@HadrienG2 HadrienG2 changed the title WIP: Cleaned up version of the fallback batch type Cleaned up version of the fallback batch type May 3, 2018
@HadrienG2 (Contributor Author)

Alright, I think it's time to un-WIP this! There are two points that we should discuss at some point, either as part of this MR or as follow-up work:

  • The utilities in the fallback's detail namespace are quite general in nature, and could probably be put at a more general scope (xtl?) or be replaced with other similar utilities in the QuantStack codebase. What do you think about this matter?
  • We should probably have CI configurations for both ENABLE_FALLBACK=ON and ENABLE_FALLBACK=OFF. If we only do the former, we don't test that the core library does not depend on the fallback. If we only do the latter, we don't test that the fallback itself is working :)

@HadrienG2 (Contributor Author)

Oh, and another question: in order to avoid accidental slowdowns, I have decided to make the scalar fallback only available on demand, via the XSIMD_ENABLE_FALLBACK preprocessor define. Where should I document this (and the rest of the feature)?

@wolfv (Member) commented May 7, 2018

How about adding an entry to the Wrapper Types docs section with "fallback batch" as a title or something similar, that also documents all the drawbacks? I think that could also be the place to document how to enable it ...

Btw, this might turn out to be super-useful work: once we have to support the SVE instructions, we can already be sure that our algorithms work for any vector length! :)

@wolfv (Member) commented May 7, 2018

Also, I just had to implement a no-op bitwise cast (e.g. bitwise_cast<batch<int, 8>>(batch<int, 8>&, ...)) so that I only need one implementation for the select/interleave/... stuff.
That should be easy with your new framework, right?

@HadrienG2 (Contributor Author) commented May 9, 2018

@wolfv I added documentation for the scalar fallback in 0b18fc4.

Regarding no-op bitwise_cast, I think the new framework should make it quite easy. One possibility would be to implement a variant of the XSIMD_BITWISE_CAST_INTRINSIC macro which constrains the input and output (T, N) to be the same and which is implemented as a no-op. Here is an example:

    // Shorthand for defining a no-op bitwise_cast implementation
    #define XSIMD_BITWISE_CAST_IDENTITY(T, N)                       \
        template <>                                                 \
        struct bitwise_cast_impl<batch<T, N>, batch<T, N>>          \
        {                                                           \
            static inline batch<T, N> run(const batch<T, N>& x)     \
            {                                                       \
                return x;                                           \
            }                                                       \
        };

I'm not sure if the C++ specialization rules would allow you to go one step further and implement this as a template, like so:

    template <class T, std::size_t N>
    struct bitwise_cast_impl<batch<T, N>, batch<T, N>>
    {
        static batch<T, N> run(const batch<T, N>& x) {
            return x;
        }
    };

My main worry is that this specialization could be considered to collide with the more general implementation provided by the fallback:

    template <class T_in, class T_out, std::size_t N_in>
    struct bitwise_cast_impl<batch<T_in, N_in>,
                             batch<T_out, sizeof(T_in)*N_in/sizeof(T_out)>>;

I'm not sure if that is the case or not, I would need to check the C++ specialization rules in order to figure it out.

EDIT: After checking the relevant rules, I think the second solution should work (and compile more efficiently due to reduced code bloat).

@@ -108,7 +110,7 @@ namespace xsimd
     for (std::size_t count = 0; count < number; ++count)
     {
         auto start = std::chrono::steady_clock::now();
-        for (std::size_t i = 0; i < s; i += B::size)
+        for (std::size_t i = 0; i <= (s - B::size); i += B::size)
Member:

Why not keep i < s?

Contributor Author:

Doing so assumes that s is a multiple of B::size. This is not true when using an odd, non-power-of-two batch size, which I am currently doing in order to make sure that the fallback will be used (remember that the hardware types are a specialization of the fallback).

If you have a cleaner idea for forcing use of the fallback, I am interested :)

Member:

I don't; I just didn't get it, hence my question ;) Thanks for the explanation.

@@ -130,7 +132,7 @@ namespace xsimd
     for (std::size_t count = 0; count < number; ++count)
     {
         auto start = std::chrono::steady_clock::now();
-        for (std::size_t i = 0; i < s; i += inc)
+        for (std::size_t i = 0; i <= (s - inc); i += inc)
Member:

Same question here

Contributor Author:

Same answer :)


    /**************************
     * bitwise cast functions *
     **************************/
Member:

This section should go after the declaration of the generic batch operators

Contributor Author:

Done


    // Backwards-compatible interface to bitwise_cast_impl
    template <class B, std::size_t N = simd_batch_traits<B>::size>
    B bitwise_cast(const batch<float, N>& x)
Member:

Coding style: it would be nicer to split the declarations and the definitions of these casting functions, just like it has been done for the operators.

Contributor Author:

Done


    // Tools for reinterpreting stuff as an unsigned integer
    template <typename T>
    struct as_unsigned;
@JohanMabille (Member) commented May 22, 2018:

Some of these structures are already defined in xsimd/types/xsimd_utils.hpp (see as_integer and as_unsigned_integer), so it would be better to move some of these definitions there (and the names should be unified).

Contributor Author:

Done

@JohanMabille (Member) commented May 22, 2018

@HadrienG2 thanks for this very involved work!

In addition to my comments, could you reduce the number of commits (by squashing) in order to make the history tree cleaner? This work definitely deserves more than one commit (so no need to squash everything), but the commits could be grouped by feature / theme.

Besides, regarding the stuff in the detail namespace: we'll move to C++14 in the next backward-incompatible release of xsimd, so we can keep the definition of integer_sequence for now and remove it later. The definition of is_array_initializer should be available in xtl too; however, I'm not sure we want to make xsimd depend on xtl just for that metafunction, so it should be moved to xsimd_utils.hpp for now.

@JohanMabille (Member)

Regarding the CI, we should definitely add a configuration with fallback enabled.

The introduction of a generic fallback requires a revamp of the
bitwise_cast infrastructure because we effectively need a generic
bitwise_cast implementation which is specialized by hardware types.
This requires generalizing the boilerplate generators, which are cleaned
up along the way to make their interface more consistent.
- Allow fallback batches to be created from a list of scalars
- Merge implementation details into xsimd_utils.hpp
- Use default constructor instead of 0 in broadcasting array constructor
- Turn to_unsigned and from_unsigned into functions
@HadrienG2 (Contributor Author)

@JohanMabille I squashed the history and moved most of the implementation details to xsimd_utils.hpp. What do you think?

@HadrienG2 (Contributor Author)

Also, do you have any suggestions concerning CI changes?

It would be expensive to run each build configuration with and without the fallback enabled, so I assume that we should pick one configuration (or a couple of them) and build/test it with and without fallback. But which configuration(s) should that be?

@JohanMabille (Member) commented May 24, 2018

Indeed, my idea was to pick a gcc and a neon configuration to test the fallback (we can also add a configuration on AppVeyor). So let's test the fallback on gcc6 and on the clang 3.9 configuration used for neon.

EDIT: we can activate the CI in a separate PR.

@JohanMabille JohanMabille merged commit 97da31c into xtensor-stack:master May 25, 2018