
# Thrust's algorithms

Based on the Thrust algorithms API documentation:
https://nvidia.github.io/cccl/thrust/api_docs/algorithms.html

Nine groups of algorithms:

1. Copying
2. Merging
3. Prefix sums
4. Reductions
5. Reordering
6. Searching
7. Set Operations
8. Sorting
9. Transformations

## 1. Copying

a. gather

b. scatter

c. swap_ranges

d. copy

e. copy_n

f. uninitialized_copy



### 1.a. gather

gather copies elements from a source array into a destination range according to a map. For each iterator i in the range [map_first, map_last), the value input_first[*i] is assigned to *(result + (i - map_first)). The source iterator input_first (of type RandomAccessIterator) must permit random access, because the map can point anywhere in the source.
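To make the index arithmetic concrete, here is a sequential, host-only C++ model of what gather computes. It is a sketch for reading the formula, not Thrust's implementation, and the helper name gather_ref is hypothetical. Because gather only reads from the source, repeated entries in the map are allowed; they simply read the same element more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of gather (hypothetical helper, not Thrust's code):
// output[j] = input[map[j]] for every position j of the map.
std::vector<int> gather_ref(const std::vector<int>& map,
                            const std::vector<int>& input) {
  std::vector<int> output(map.size());
  for (std::size_t j = 0; j < map.size(); ++j) {
    // The source needs random access: map[j] may point anywhere in it.
    assert(map[j] >= 0 && static_cast<std::size_t>(map[j]) < input.size());
    output[j] = input[map[j]];
  }
  return output;
}
```

Called with the values {1, 0, 1, 0, 1, 0, 1, 0, 1, 0} and the map {0, 2, 4, 6, 8, 1, 3, 5, 7, 9} used in the program below, gather_ref returns {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}, the same result the CUDA version prints.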

```cpp
%%writefile gather.cu
#include <thrust/gather.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <iostream>

int main() {
  // mark even indices with a 1; odd indices with a 0
  int values[10] = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
  thrust::device_vector<int> d_values(values, values + 10);

  // gather all even indices into the first half of the range
  // and odd indices to the last half of the range
  int map[10]   = {0, 2, 4, 6, 8, 1, 3, 5, 7, 9};
  thrust::device_vector<int> d_map(map, map + 10);

  thrust::device_vector<int> d_output(10);
  thrust::gather(d_map.begin(), d_map.end(),
                 d_values.begin(),
                 d_output.begin());
  // d_output is now {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}
  thrust::host_vector<int> h_output(10);
  thrust::copy(d_output.begin(), d_output.end(), h_output.begin());
  for(int value : h_output) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  return 0;
}
```


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub gather.cu -o gather -arch sm_75
```

```
!./gather
```

Output:

```
1 1 1 1 1 0 0 0 0 0 
```


### 1.b. scatter

scatter copies elements from a source range into an output array according to a map. For each iterator i in the range [first, last), the value *i is assigned to output[*(map + (i - first))]. The output iterator must permit random access. If the same index appears more than once in the range [map, map + (last - first)), the result is undefined.
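A matching sequential sketch of scatter (again a hypothetical helper, scatter_ref, not Thrust's code) writes each input element to the position the map names. The duplicate-index rule follows from this form: if two entries of map were equal, two inputs would target the same output slot, and with parallel writes there is no defined winner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of scatter (hypothetical helper, not Thrust's code):
// output[map[i]] = input[i] for every position i of the input.
void scatter_ref(const std::vector<int>& input, const std::vector<int>& map,
                 std::vector<int>& output) {
  assert(map.size() == input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // The output needs random access: map[i] may point anywhere in it.
    assert(map[i] >= 0 && static_cast<std::size_t>(map[i]) < output.size());
    output[map[i]] = input[i];
  }
}
```

The scatter map used in the program below, {0, 5, 1, 6, 2, 7, 3, 8, 4, 9}, is the inverse permutation of the gather map in the previous example, {0, 2, 4, 6, 8, 1, 3, 5, 7, 9}, which is why the two programs print the same result.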

```cpp
%%writefile scatter.cu
#include <thrust/scatter.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <iostream>

int main() {
  // mark even indices with a 1; odd indices with a 0
  int values[10] = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
  thrust::device_vector<int> d_values(values, values + 10);

  // scatter all even indices into the first half of the
  // range, and odd indices into the second half
  int map[10]   = {0, 5, 1, 6, 2, 7, 3, 8, 4, 9};
  thrust::device_vector<int> d_map(map, map + 10);

  thrust::device_vector<int> d_output(10);
  thrust::scatter(d_values.begin(), d_values.end(),
                  d_map.begin(), d_output.begin());
  // d_output is now {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}
  thrust::host_vector<int> h_output(10);
  thrust::copy(d_output.begin(), d_output.end(), h_output.begin());
  for(int value : h_output) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  return 0;
}
```


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub scatter.cu -o scatter -arch sm_75
```

```
!./scatter
```

Output:

```
1 1 1 1 1 0 0 0 0 0 
```


### 1.c. swap_ranges

swap_ranges swaps each of the elements in the range [first1, last1) with the corresponding element in the range [first2, first2 + (last1 - first1)). That is, for each integer n such that 0 <= n < (last1 - first1), it swaps *(first1 + n) and *(first2 + n). The return value is first2 + (last1 - first1).
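The following sequential C++ sketch (hypothetical helper swap_ranges_ref, written against std::vector for brevity rather than generic iterators) mirrors the definition element by element and returns the position just past the last swapped element of the second range, i.e. first2 + (last1 - first1).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential model of swap_ranges (hypothetical helper, not Thrust's code):
// swaps a[n] and b[n] for 0 <= n < a.size(), then returns the iterator
// b.begin() + a.size(), i.e. first2 + (last1 - first1).
std::vector<int>::iterator swap_ranges_ref(std::vector<int>& a,
                                           std::vector<int>& b) {
  assert(b.size() >= a.size());  // second range must be at least as long
  for (std::size_t n = 0; n < a.size(); ++n) {
    std::swap(a[n], b[n]);
  }
  return b.begin() + static_cast<std::ptrdiff_t>(a.size());
}
```

When the second vector is longer than the first, its tail is left untouched and the returned iterator points at its first unswapped element.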

```cpp
%%writefile swap_ranges.cu
#include <thrust/swap.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <iostream>

int main() {
  thrust::device_vector<int> v1(2), v2(2);
  v1[0] = 1;
  v1[1] = 2;
  v2[0] = 3;
  v2[1] = 4;

  thrust::swap_ranges(v1.begin(), v1.end(), v2.begin());
  // v1[0] == 3, v1[1] == 4, v2[0] == 1, v2[1] == 2
  thrust::host_vector<int> h_v1(2);
  thrust::host_vector<int> h_v2(2);
  thrust::copy(v1.begin(), v1.end(), h_v1.begin());
  thrust::copy(v2.begin(), v2.end(), h_v2.begin());
  std::cout << "v1: ";
  for(int value : h_v1) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  std::cout << "v2: ";
  for(int value : h_v2) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  return 0;
}
```


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub swap_ranges.cu -o swap_ranges -arch sm_75
```

```
!./swap_ranges
```

Output:

```
v1: 3 4 
v2: 1 2 
```


### 1.d. copy

copy copies elements from the range [first, last) to the range [result, result + (last - first)). That is, it performs the assignments *result = *first, *(result + 1) = *(first + 1), and so on. In general, for every integer n with 0 <= n < last - first, copy performs the assignment *(result + n) = *(first + n). Unlike std::copy, copy offers no guarantee on order of operation. As a result, calling copy with overlapping source and destination ranges has undefined behavior.

The return value is result + (last - first).
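The ban on overlapping ranges is easiest to see by carrying out the same overlapping copy in two different orders. The sketch below is an illustrative model (hypothetical helpers copy_front_to_back and copy_back_to_front, not Thrust internals): since copy promises no order, either order is a legitimate way to perform the assignments, and for overlapping ranges the two disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model only (hypothetical helpers, not Thrust internals):
// two sequential orders for the assignments buf[dst + n] = buf[src + n],
// 0 <= n < count. A copy with no ordering guarantee may behave like either.
void copy_front_to_back(std::vector<int>& buf, std::size_t src,
                        std::size_t dst, std::size_t count) {
  assert(src + count <= buf.size() && dst + count <= buf.size());
  for (std::size_t n = 0; n < count; ++n) {
    buf[dst + n] = buf[src + n];
  }
}

void copy_back_to_front(std::vector<int>& buf, std::size_t src,
                        std::size_t dst, std::size_t count) {
  assert(src + count <= buf.size() && dst + count <= buf.size());
  for (std::size_t n = count; n-- > 0;) {
    buf[dst + n] = buf[src + n];
  }
}
```

Copying positions [0, 4) onto [1, 5) of {0, 1, 2, 3, 4, 5} gives {0, 0, 0, 0, 0, 5} front to back but {0, 0, 1, 2, 3, 5} back to front, while non-overlapping ranges produce the same result in either order. When the source and destination might overlap, copying through a separate temporary buffer avoids the problem.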

```cpp
%%writefile copy.cu
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <iostream>

int main() {
  thrust::device_vector<int> vec0(10);
  thrust::device_vector<int> vec1(10);
  for(int i = 0; i < 10; ++i) {
    vec0[i] = i;
  }

  thrust::copy(vec0.begin(), vec0.end(),
               vec1.begin());

  // vec1 is now a copy of vec0
  thrust::host_vector<int> h_v0(10);
  thrust::host_vector<int> h_v1(10);
  thrust::copy(vec0.begin(), vec0.end(), h_v0.begin());
  thrust::copy(vec1.begin(), vec1.end(), h_v1.begin());
  std::cout << "vec0: ";
  for(int value : h_v0) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  std::cout << "vec1: ";
  for(int value : h_v1) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  return 0;
}
```


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub copy.cu -o copy -arch sm_75
```

```
!./copy
```

Output:

```
vec0: 0 1 2 3 4 5 6 7 8 9 
vec1: 0 1 2 3 4 5 6 7 8 9 
```


### 1.e. copy_n

copy_n copies elements from the range [first, first + n) to the range [result, result + n). That is, it performs the assignments *result = *first, *(result + 1) = *(first + 1), and so on. In general, for every integer i with 0 <= i < n, copy_n performs the assignment *(result + i) = *(first + i). Unlike std::copy_n, copy_n offers no guarantee on order of operation. As a result, calling copy_n with overlapping source and destination ranges has undefined behavior.

The return value is result + n.

The algorithm's execution is parallelized as determined by the execution policy exec, passed as the first argument (thrust::device in the example below).
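A sequential sketch of copy_n's contract (hypothetical helper copy_n_ref, not Thrust's code) shows that it amounts to copy over [first, first + n) and that the returned iterator is result + n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of copy_n (hypothetical helper, not Thrust's code):
// out[i] = in[i] for 0 <= i < n, i.e. copy over [first, first + n).
// Returns out.begin() + n, matching "the return value is result + n".
std::vector<int>::iterator copy_n_ref(const std::vector<int>& in,
                                      std::size_t n, std::vector<int>& out) {
  assert(n <= in.size() && n <= out.size());
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i];
  }
  return out.begin() + static_cast<std::ptrdiff_t>(n);
}
```

With the same inputs as the program below (in = 0..9, n = 5, out initially all zeros), out becomes {0, 1, 2, 3, 4, 0, 0, 0, 0, 0} and the returned iterator points at index 5.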

The following code snippet demonstrates how to use copy_n with the thrust::device execution policy to copy the first n = 5 elements of one device_vector into another.

```cpp
%%writefile copy_n.cu
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/execution_policy.h>
#include <iostream>

int main() {
  thrust::device_vector<int> vec0(10);
  thrust::device_vector<int> vec1(10);
  for(int i = 0; i < 10; ++i) {
    vec0[i] = i;
  }
  int n = 5;
  thrust::copy_n(thrust::device, vec0.begin(), n,
                 vec1.begin());

  // vec1 now contains the first 5 elements of vec0
  thrust::host_vector<int> h_v0(10);
  thrust::host_vector<int> h_v1(10);
  thrust::copy(vec0.begin(), vec0.end(), h_v0.begin());
  thrust::copy(vec1.begin(), vec1.end(), h_v1.begin());
  std::cout << "vec0: ";
  for(int value : h_v0) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  std::cout << "vec1: ";
  for(int value : h_v1) {
    std::cout << value << " ";
  }
  std::cout << std::endl;
  return 0;
}
```


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub copy_n.cu -o copy_n -arch sm_75
```

```
!./copy_n
```

Output:

```
vec0: 0 1 2 3 4 5 6 7 8 9 
vec1: 0 1 2 3 4 0 0 0 0 0 
```


### 1.f. uninitialized_copy

In Thrust, the function thrust::device_new allocates memory for an object and then creates an object at that location by calling a constructor. Occasionally, however, it is useful to separate those two operations. If each iterator in the range [result, result + (last - first)) points to uninitialized memory, then uninitialized_copy creates a copy of [first, last) in that range. That is, for each iterator i in the input, uninitialized_copy creates a copy of *i in the location pointed to by the corresponding iterator in the output range, using the copy constructor of ForwardIterator's value_type with *i as its argument.

As with copy_n, the execution policy exec (here thrust::device) determines how the algorithm is parallelized.
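The same separation between allocation and construction exists on the host in standard C++. As an illustrative analogue (not Thrust code), the sketch below uses ::operator new in place of thrust::device_malloc and std::uninitialized_copy in place of thrust::uninitialized_copy; the helper name copy_into_raw_storage is hypothetical. Int has no default constructor, so new Int[n] would not compile; allocating raw storage first and constructing into it afterwards sidesteps that.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Mirrors the notebook's Int: there is no default constructor,
// so `new Int[n]` would not compile.
struct Int {
  Int(int x) : val(x) {}
  int val;
};

// Host analogue of thrust::device_malloc + thrust::uninitialized_copy
// (hypothetical helper). Returns the stored values so they can be checked.
std::vector<int> copy_into_raw_storage(const std::vector<Int>& input) {
  const std::size_t n = input.size();
  // Step 1: allocate raw memory; no Int objects live there yet.
  Int* raw = static_cast<Int*>(::operator new(n * sizeof(Int)));
  // Step 2: copy-construct one Int per slot from the corresponding input.
  std::uninitialized_copy(input.begin(), input.end(), raw);

  std::vector<int> vals;
  vals.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(raw[i].val == input[i].val);  // each slot now holds a copy
    vals.push_back(raw[i].val);
  }

  // Tear down in reverse order of setup: destroy objects, then free memory.
  for (std::size_t i = 0; i < n; ++i) raw[i].~Int();
  ::operator delete(raw);
  return vals;
}
```

For a trivially destructible type like Int the explicit destructor calls do nothing, but keeping them makes the pattern correct for types that own resources.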


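```cpp
%%writefile unitialized_copy.cu
#include <thrust/uninitialized_copy.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <iostream>
#include <vector>

struct Int
{
  __host__ __device__
  Int(int x) : val(x) {}
  int val;
};

const int N = 137;

int main() {
  Int val(46);
  thrust::device_vector<Int> input(N, val);
  // allocate raw device memory for N Ints without constructing them
  thrust::device_ptr<Int> array = thrust::device_malloc<Int>(N);
  // copy-construct each element of input into the uninitialized memory
  thrust::uninitialized_copy(thrust::device, input.begin(), input.end(), array);

  // array[i].val == 46 for all 0 <= i < N; copy back to the host to check
  std::vector<Int> h_array(N, Int(0));
  thrust::copy(array, array + N, h_array.begin());
  int matches = 0;
  for (const Int& x : h_array) {
    if (x.val == 46) ++matches;
  }
  std::cout << matches << " of " << N << " elements equal 46" << std::endl;

  // Int has a trivial destructor, so releasing the memory is enough
  thrust::device_free(array);
  return 0;
}
```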


```
!nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub unitialized_copy.cu -o unitialized_copy -arch sm_75
```

```
!./unitialized_copy
```