Integrate improved CUDA radix sort implementations #136

Closed
jaredhoberock opened this Issue Jun 4, 2012 · 1 comment

@jaredhoberock
A Parallel Algorithms Library member

B40C has a faster radix sort implementation.

@jaredhoberock
A Parallel Algorithms Library member

This branch almost works, but it fails in some corner cases involving multi-GPU setups. We'll have to push this back.