New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
variadic template implementation of thrust::tuple #695
Comments
See also the discussion on the mailing list [1]. [1] https://groups.google.com/d/msg/thrust-users/FKf5n11OuAQ/1tmoQABfqNwJ |
I've begun a variadic branch in my repository. There are numerous tasks that must be accomplished to have a variadic First, the rest of Thrust, needs to be made compatible with the public interface of tuple, and no longer rely on the implementation details of
Then, an equivalent to
I have a question: should
Finally, I want to mention, that in my branch I am simply replacing the non-variadic code. I don't expect that approach to be taken with mainline Thrust. If and when this code is ready to be merged, I would only expect it to be experimental functionality guarded by |
I have fixed all the relevant code in the rest of Thrust to support variadic tuples. I then tried the two approaches to replace non-variadic
I tried various things to define
|
Thrust now has variadic tuple support. The tuple implementation from agency did the trick. You need to define THRUST_VARIADIC_TUPLE to use the variadic implementation. Example:
Gives output:
Known issues:
|
I have updated my variadic branch [1] using the latest non-recursive, variadic tuple code from [2]. Using the new variadic tuple, my code sped up by 50% when using the cpp backend with clang. I encourage other Thrust users to give this a try. You might be pleasantly surprised by a similar performance boost in your code if you use tuple a lot like my code does! Known issues:
[1] https://github.com/andrewcorrigan/thrust-multi-permutation-iterator/tree/variadic |
When will this variadic template be merged into the thrust master branch? |
I've been wondering the same thing for a long time. @jaredhoberock any word on a roadmap for C++11, including variadic tuple, support in Thrust? |
If thrust::tuple was variadic, metaprogramming would be a lot easier. @andrewcorrigan I am currently using your variadic branch, it is really working great. |
I should mention, I haven't updated the variadic branch since Feb. 5. I am pretty sure that there have been changes and bug fixes since then in my local copy of Thrust based on the latest https://github.com/jaredhoberock/tuple. I should get you the latest version. If you don't see an update from me in the next few days, please ping me. |
@jaredhoberock Will you entertain a pull request for variadic tuple as well? |
Variadic tuple is a huge, risky change. Is there any evidence that it is ready or that the compiler is ready for it? Here are some questions I'd like to see answered before contemplating how to merge variadic tuple:
|
Quick response off the top of my head.
What's risky about it? You already implemented variadic tuple, I already got it integrated. We can just leave it optional if you don't want it be the default behavior right away.
Yes... See above. I demonstrated an example above. We are both aware of a compiler workaround to get this compiling with CUDA 7, and that the compiler bug is fixed in CUDA 7.5.
I already have them merged in my local copy. If you are interested in moving forward I will push the latest very soon (otherwise, I have limited time to work on stuff that you may or may not be interested in pursuing).
Compilation performance is almost exclusively a function of compiler frontend (clang is great, gcc is good, icpc and nvcc are terrible). Qualitatively: It's atrocious either way with nvcc. It's fine either way with clang or gcc on the host side.
My Thrust-based code sped up 50%. That is huge... I am guessing you want more fine-grained comparisons. As per my post here NVIDIA/thrust#488 (comment), myself and others I work with have tried to run the performance tests built-in to Thrust, but with no luck. Would you be willing to help with this? I can get you precise error messages and description of what we tried.
As per my post here NVIDIA/thrust#488 (comment), I have not had success getting the performance / unit tests to actually run. Would you be willing to help with this? I can get you precise error messages and description of what we tried.
Net decrease. There is a lot of repetitive code in master due to the lack of variadic templates. See for example this commit: [https://github.com/andrewcorrigan/thrust-multi-permutation-iterator/commit/d000fcf7f1302cdaa7b615dc3c903fadb0cbc18f. Variadic tuple [https://github.com/andrewcorrigan/thrust-multi-permutation-iterator/blob/variadic/thrust/detail/tuple/variadic_tuple.h] itself is ~700 lines. This replaces, for example [https://github.com/thrust/thrust/blob/master/thrust/detail/tuple.inl] which is larger. |
If you would care to file a new issue for getting scons to work on Mac, we can discuss the problems you encountered with unit tests there. Besides the functional correctness of the variadic tuple implementation, I am concerned about the compile time and runtime performance of algorithms which use tuples internally in their implementation. Tuple is ubiquitous within Thrust's implementation. That means that casual users of Thrust who are uninterested in using variadic tuple will be indirectly affected by them. For example, if introducing variadic tuple were to increase the compile time of, say, In order to be convinced that it is safe to do this merge, we need to convince ourselves that this change does no harm. |
I want to say that it's wonderful to finally get some form of feedback on all this. I was confused when I went through the trouble of getting this new significant functionality working, observed a significant performance boost on top of that, and yet there was complete silence in response.
|
|
Thanks. I will start using file suffixes to handle the differences. If there is anything else you already don't like that we haven't already discussed, or anything you discover if/when you look at the code, can you please let me know? Aside from any possible performance issues putting things on hold, I want this code to be as ready-to-merge as possible. |
I don't think we've discussed unit tests -- there will need to be new tests which test the functionality of tuples larger than 10 elements. These should go in the tuple-specific unit tests and also the tests for |
What about |
Let's keep the scope of this effort focused on variadic tuple for now in order to maximize the chance of its success. |
So then are you going to merge the older PR first? If you don't merge that first, I have to redo a lot of variadic tuple work since variadic tuple was done according to the new directory structure you asked me to follow in the tuple_io PR. In terms of duplicating things on a file-by-file basis, what about |
@DachZiegel I just pushed to https://github.com/andrewcorrigan/thrust-multi-permutation-iterator/tree/variadic, which includes the latest from https://github.com/jaredhoberock/tuple, as well as a workaround for a bug in CUDA 7.0. See https://github.com/andrewcorrigan/thrust-multi-permutation-iterator/commits/variadic for details. Please let me know if it works for you or not. |
Thanks Andrew, I will try it. |
Should this also work with THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP? I tried this code: thrust_variadic_tuple_test.cpp It fails to compile for me using
However, using |
Yes, I agree that integrating variadic tuple will require quite a bit of work. Nonetheless, I do not wish to condition this pull request on a separate tangentially related request. The |
Can you please clarify what you want to do about the request you made in NVIDIA/thrust#488 (comment)? Which directory structure do you want now? If you still want the new directory structure, I am going to do a PR with just that reorganization. If not, then I am going to move the contents of |
That directory structure I suggested looks good to me. Please keep everything confined to this pull request. Thanks. |
Thanks for testing it. It should absolutely work with Here's what I did:
It's pretty hard for me to diagnose without reproducing the error. The only thing I can say is that you are getting errors in code that would be ignored if you had defined |
Are you sure your test code is correct? See the error:
It seems like your code is calling |
Sorry for the false alarm, I thought that setting When setting this define, my code compiles fine for me. |
It seems to me that if it works with variadic tuple, then it's by accident. get<3> of a tuple of size 3 should be an error. |
The |
Sorry for asking, as I know all of this is a lot of work. But do you know when/if this variadic template will be merged into the thrust master branch? |
so.... I'm guessing at this point this will never actually be merged into master, How would I get access to this functionality when using master? Is there another person's include that replaces thrust::tuple that I can use instead, or do I have to switch to someone elses branch entirely? |
Thrust is currently under heavy reorganization, I believe, and so this is probably waiting until then. This is a key feature that the Hydra library is built on, and that has it's own maintained Thrust fork (but heavily customized and renamed to avoid interfering with a system Thrust). |
And gtc 2018 has come and gone with no mention of any of this. I saw your link on hydra, I'm only really concerned about variadic templates and not the additional functionality, I believe that is located here. I do have CUDA 9.1 however and it looks like from the discussion you linked it uses a different version of thrust from what hosted on github currently? so maybe it is already there aswell? |
He didn't actually say Thrust would be at GTC, just their plans for CUDA... The version built into CUDA 9.1 is mostly stripped down and split into parts (CUB, etc), and of course patched for CUDA 9. I don't think it has any new functionality. I'm not sure how easy that is to extract from Hydra, but that is the best current version that I know of. You might be able to ask @AAAlvesJr for a recommendation for the fork he started from along with key changes necessary for CUDA 9. |
Hey @Cazadorro - I didn't give a talk at GTC, although I was around to chat with folks about Thrust. This PR will likely be merged for the next release of CUDA that we are working on now (not CUDA 9.2, but the one after that). |
@brycelelbach so in like 3 - 4 months ish? |
@Cazadorro I can't commit firmly to a time frame for CUDA releases. For just Thrust, I think we will have an updated version on Github and this integrated in 2 - 4 months. |
finally.. |
@andrewcorrigan I'm interested in this as well. I'm using cuda 10 but I can't find the implementation of variadic tuple. Did this ever make it to either CUDA release thrust or the github thrust? |
Not that I'm aware of. I've been migrating my code away from relying on Thrust for this sort of thing anyway, so I haven't really kept up with more recent Thrust versions. |
@andrewcorrigan How have you been able to do this? I've found thrust is convenient until it isn't (like in these scenarios). What do you do to migrate away? |
It's still on our TODO list - probably for this winter.
…------------------------------------------------------
Bryce Adelstein Lelbach aka wash
ISO C++ LEWGI Chair
CppCon and C++Now Program Chair
Thrust Maintainer, HPX Developer
CUDA Convert and Reformed AVX Junkie
Ask "Dumb" Questions
------------------------------------------------------
________________________________
From: Andrew Corrigan <notifications@github.com>
Sent: Wednesday, December 19, 2018 10:19 PM
To: thrust/thrust
Cc: Bryce Lelbach; Mention
Subject: Re: [thrust/thrust] variadic template implementation of thrust::tuple (#524)
Not that I'm aware of. I've been migrating my code away from relying on Thrust for this sort of thing anyway.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<https://github.com/thrust/thrust/issues/524#issuecomment-448845128>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAYTco9Ssul8F_yGCVI82dO3TTPUysF2ks5u6wG2gaJpZM4CL8QU>.
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------
|
@Cazadorro When thrust was initially developed it had to duplicate functionality from the (pre-standard) C++11 standard library and Boost to be Another change is that much of Thrust has been standardized as part of the C++17 parallel algorithms library, so now there are alternatives like parallelstl that cover many of the parallel algorithms provided by Thrust. |
@andrewcorrigan Oh I didn't realize that! so it would be better now for thrust to just not duplicate that functionality because it all works with standard. I knew it was odd when I was able to just use std::get on thrust::tuples. Is cuda an actual backend to parrallel algorithms? So far it seems GCC is not up to date with that part of the standard and has only experimentally merged intels parallel implementation. |
There are indications [1] that there will be CUDA support in the parallel algorithms library implementation, but I don't know anything other than what I read on public mailing lists. [1] http://lists.llvm.org/pipermail/llvm-dev/2018-October/126998.html |
Hi @andrewcorrigan @brycelelbach @allisonvacanti Just checking in: Any progress for improving (We can work around on our side so it's ok if this is not on the near-term roadmap, but it's still nice to have.) #include <tuple>
#include <thrust/tuple.h>
#include <thrust/type_traits/integer_sequence.h>
template <typename F, size_t... Is>
auto gen_tuple_impl(F func, std::index_sequence<Is...> ) {
return std::make_tuple(func(Is)...);
}
template <size_t N, typename F>
auto gen_tuple(F func) {
return gen_tuple_impl(func, std::make_index_sequence<N>{} );
}
template <typename F, size_t... Is>
auto gen_thrust_tuple_impl(F func, thrust::index_sequence<Is...> ) {
return thrust::make_tuple(func(Is)...);
}
template <size_t N, typename F>
auto gen_thrust_tuple(F func) {
return gen_thrust_tuple_impl(func, thrust::make_index_sequence<N>{} );
}
int main() {
constexpr int NDIM = 11;
int data[NDIM];
auto tuple_std = gen_tuple<NDIM>([&](int i){ return data[i]; }); // this works
auto tuple_thrust = gen_thrust_tuple<NDIM>([&](int i){ return data[i]; }); // this does not
auto tuple_std2 = std::make_tuple(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11); // this works
auto tuple_thrust2 = thrust::make_tuple(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11); // this does not
return 0;
} cc: @kmaehashi @asi1024 |
The only other info on this that I'm aware of is here: #742. Of course, I don't speak for NVIDIA / Thrust. |
Ah yes, thanks for pointer Andrew. I've seen it but forgot about it... |
We're slowly progressing on this -- We'll be adding a libcu++ dependency in the near future, which will enable us to start using their variadic tuple implementation. |
Closed by #262 |
thrust::tuple does not use variadic templates. A variadic template implementation of thrust::tuple would not just make the internal implementation more elegant, there are serious practical issues with the current thrust::tuple implementation:
In consideration of the fact that C++11 support in nvcc is still in the release candidate, undocumented, and requires host compiler support, I don't expect C++03 support to be dropped anytime soon. Instead, can a user-supplied compile-time macro (e.g. THRUST_USE_CXX11) or auto-detection of __cplusplus >= 201103L please be considered to switch to an alternate implementation?
[1] http://www.jot.fm/issues/issue_2008_02/article2.pdf
The text was updated successfully, but these errors were encountered: