Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider the use of inheritance for compile time constants in SYCL backend parallel-for bricks #2023

Open
mmichel11 opened this issue Jan 27, 2025 · 1 comment
Assignees
Milestone

Comments

@mmichel11
Copy link
Contributor

@MikeDvorskiy made a very good design suggestion in the review of #1976 in the following comment: https://github.com/uxlfoundation/oneDPL/pull/1976/files#r1929054989. Due to the closeness of the 2022.8.0 milestone and amount of change this entails, I suggested we defer this potential design change as a refactoring in the next milestone if it proves beneficial which was agreed upon offline.

For context, the PR introduces the use of two tuning classes, walk_scalar_base and walk_vector_or_scalar_base, which define the following compile-time parameters, __can_vectorize, __preferred_iters_per_item, __preferred_vector_size based on a set of ranges provided through the class' template parameters. These compile-time parameters are queried by the parallel-for implementation to define strides and access patterns for use in the kernel. Individual bricks inherit from these base classes and provide a set of ranges, so they do not need to manually define these tuning parameters themselves.

This design approach has a downside of requiring that the types of the underlying ranges are provided through the class-template when they were previously deduced from the function call operator. This results in more code at the hetero pattern level as more template parameters must be provided by the caller. For instance, __pattern_walk_3's call to parallel for has changed from this:

    auto __future =
        oneapi::dpl::__par_backend_hetero::__parallel_for(_BackendTag{}, ::std::forward<_ExecutionPolicy>(__exec),
                                                          unseq_backend::walk_n<_ExecutionPolicy, _Function>{__f}, __n,
                                                          __buf1.all_view(), __buf2.all_view(), __buf3.all_view());

to

    auto __future = oneapi::dpl::__par_backend_hetero::__parallel_for(
        _BackendTag{}, std::forward<_ExecutionPolicy>(__exec),
        unseq_backend::walk3_vectors_or_scalars<_ExecutionPolicy, _Function, decltype(__buf1.all_view()),
                                                decltype(__buf2.all_view()), decltype(__buf3.all_view())>{
            __f, static_cast<size_t>(__n)},
        __n, __buf1.all_view(), __buf2.all_view(), __buf3.all_view());

which results in more code and decreased readability.

The proposed suggestion would remove the inheritance relationship entirely and instead make the members __can_vectorize, __preferred_iters_per_item, and __preferred_vector_size templated which would remove the need to pass range types through class templates. The effect on the parallel-for implementation would be calls such as:

auto [__idx, __stride, __is_full] = __stride_recommender( 
	     __item, __count, __iters_per_work_item, _Fp::__preferred_vector_size, __work_group_size); 

would simply need to be changed to:

auto [__idx, __stride, __is_full] = __stride_recommender( 
	     __item, __count, __iters_per_work_item, _Fp::__preferred_vector_size<_Ranges...>, __work_group_size); 

This design is easily implementable through variable templates, which could still make use of the walk_vector_or_scalar_base and walk_scalar_base classes just with no inheritance relationship. Bricks would need to define these members themselves, such as:

template <typename... _Rngs>
constexpr static std::uint8_t __preferred_vector_size = walk_vector_or_scalar_base<_Rngs...>::__preferred_vector_size;

The only downside I see of this approach would be duplicated member definitions across some bricks, but I think it would be simpler than the current approach and is certainly cleaner from the caller side.

The design suggestion made should be further explored and adopted if it results in cleaner, more readable code than the current state. I expect this refactoring work to be rather straightforward and achievable in the next milestone as existing infrastructure only needs to be repurposed.

@mmichel11
Copy link
Contributor Author

mmichel11 commented Mar 5, 2025

I've started an approach to address this in https://github.com/uxlfoundation/oneDPL/tree/dev/mmichel11/pfor_remove_ranges_class_template.

Several changes are made:

  1. The inheritance relationship between each brick and some base class to obtain __can_vectorize, __preferred_iters_per_item, and __preferred_vector_size has been removed.
  2. Each brick defines two boolean constexpr variables: __can_vectorize and __can_process_multiple_iters. These fields are independent of the range type and simply specify whether the brick itself has this capability assuming the ranges meet additional vectorization conditions.
  3. Some new class, __pfor_params, is introduced which the parallel-for implementation uses to define iters-per-item and vector size. The parallel-for implementation queries the two static members mentioned above in the brick and passes them to this parameter class through templates.
  4. An instance of __pfor_params is passed through the strided for-loop utility and into the brick function call overload which the brick uses to determine if vectorization is enabled and the vector size to use. __scalar_path_impl and __vector_path_impl are replaced with two operator() implementations which use SFINAE to select the scalar or vector path.

The approach simplifies creation of brick objects as ranges are no longer passed through class templates, removes the previous implementation of operator() that dispatches to the scalar / vector paths, and condenses the brick implementations by about ~100 lines in total.

@mmichel11 mmichel11 self-assigned this Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant