# Basic Kokkos tutorial

Declare the include path for the Kokkos header files and load the Kokkos core library.

In [1]:
.I /root/install/do-conf-tp-serial/include/



In [2]:
.L libkokkoscore



Include necessary header files

In [3]:
#include <Kokkos_Core.hpp>



In [4]:
#include <typeinfo>



## parallel_for

Define basic functor

In [5]:
struct hello_world {
  // If a functor has an "execution_space" (or "device_type", for
  // backwards compatibility) public typedef, parallel_* will only run
  // the functor in that execution space.  That's a good way to mark a
  // functor as specific to an execution space.  If the functor lacks
  // this typedef, parallel_for will run it in the default execution
  // space, unless you tell it otherwise (that's an advanced topic;
  // see "execution policies").

  // The functor's operator() defines the loop body.  It takes an
  // integer argument which is the parallel for loop index.  Other
  // arguments are possible; see the "hierarchical parallelism" part
  // of the tutorial.
  //
  // The operator() method must be const, and must be marked with the
  // KOKKOS_INLINE_FUNCTION macro.  If building with CUDA, this macro
  // will mark your method as suitable for running on the CUDA device
  // (as well as on the host).  If not building with CUDA, the macro
  // is unnecessary but harmless.
  KOKKOS_INLINE_FUNCTION
  void operator() (const int i) const {
    //printf ("Hello from i = %i \n", i);
    std::cout << "Hello from i = " << i << std::endl;
  }
};
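As a rough mental model of how `parallel_for` uses such a functor, the sketch below calls `operator()` once per index in `[0, n)`. It is a serial stand-in in plain C++, not Kokkos code: `serial_for` and `record_index` are hypothetical names introduced here, and the real `parallel_for` distributes the indices over threads in no guaranteed order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a parallel loop over [0, n):
// the functor's operator() is the loop body, called once per index.
template <class Functor>
void serial_for(const std::size_t n, const Functor& functor) {
  for (std::size_t i = 0; i < n; ++i) {
    functor(static_cast<int>(i));
  }
}

// Illustrative functor with the same call signature as hello_world.
// operator() is const, so results are written through a pointer
// rather than into a member of the functor itself.
struct record_index {
  std::vector<int>* seen;
  void operator() (const int i) const { seen->push_back(i); }
};
```

Calling `serial_for(4, record_index{&seen})` with an empty `std::vector<int> seen` leaves the indices 0 to 3 in `seen`. Because `serial_for` is a template, it accepts any callable that takes an `int`, which is the same reason a functor struct can be handed to a generic loop dispatcher.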



Run `parallel_for` on two threads

In [6]:
Kokkos::InitArguments args;
args.num_threads = 2;
// 1 (CPU) NUMA region per process
args.num_numa = 1;
// If Kokkos was built with CUDA enabled, use the GPU with device ID 1.
args.device_id = 1;
Kokkos::initialize(args);
Kokkos::parallel_for (15, hello_world ());
Kokkos::finalize();

IncrementalExecutor::executeFunction: symbol '__emutls_v._ZN6Kokkos4Impl22SharedAllocationRecordIvvE18t_tracking_enabledE' unresolved while linking function '_GLOBAL__sub_I_cling_module_3'!


Hello from i = 8
Hello from i = Hello from i = 90

Hello from i = 10
Hello from i = 11
Hello from i = 12
Hello from i = 13
Hello from i = 14
Hello from i = 1
Hello from i = 2
Hello from i = 3
Hello from i = 4
Hello from i = 5
Hello from i = 6
Hello from i = 7


(void) @0x7fc6d152aa28
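The fused line `Hello from i = Hello from i = 90` in the output above is what two threads produce when each message is written with several separate `<<` calls: the pieces for `i = 0` and `i = 9` interleave. One way to make this less likely on the host is to build the whole line first and write it with a single insertion. The helper below is a minimal sketch in plain C++; `format_hello` is a hypothetical name, and a single insertion only reduces interleaving in practice, since the C++ standard does not guarantee that the line is written atomically.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: assemble the complete message, including the
// newline, so the caller can emit it with one stream insertion.
std::string format_hello(const int i) {
  std::ostringstream line;
  line << "Hello from i = " << i << '\n';
  return line.str();
}
```

Inside the functor body this would be used as `std::cout << format_hello(i);`. Like the `std::cout` call in the original functor, it is host-only code.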


<b>Note:</b>
* This only works if the `Kokkos::InitArguments` struct is defined in the same cell that calls `Kokkos::initialize`. The `parallel_for` call must also be in that cell.
* A warning about the unresolved symbol `Kokkos::Impl::SharedAllocationRecord<void, void>::t_tracking_enabled` always appears.
* The C++ kernel sometimes crashes, and the Jupyter notebook then needs to be restarted.
* Lambdas have not been tested.

## parallel_reduce

In [7]:
struct sum_up_doubles {
  KOKKOS_INLINE_FUNCTION
  void operator() (const int i, double& update) const {
    // Add the loop index i to this thread's running partial sum.
    update += static_cast<double>(i);
  }
};



In [8]:
Kokkos::InitArguments args2;
args2.num_threads = 2;
// 1 (CPU) NUMA region per process
args2.num_numa = 1;
// If Kokkos was built with CUDA enabled, use the GPU with device ID 1.
args2.device_id = 1;
double sum = 0.0;
size_t N = 5;
Kokkos::initialize(args2);
Kokkos::parallel_reduce (N, sum_up_doubles(), sum);
std::cout << sum << std::endl;
Kokkos::finalize();

IncrementalExecutor::executeFunction: symbol '__emutls_v._ZN6Kokkos4Impl22SharedAllocationRecordIvvE18t_tracking_enabledE' unresolved while linking function '_GLOBAL__sub_I_cling_module_7'!


10


(void) @0x7fc6d152aa28
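The printed value 10 is 0 + 1 + 2 + 3 + 4, the sum of the indices for `N = 5`. The sketch below models a sum reduction in plain C++: each worker accumulates into its own partial sum that starts at zero, and the partial sums are added at the end. It is a serial illustration under that assumption, not the Kokkos implementation; `model_sum_reduce` and `sum_up_doubles_model` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Same call signature as the notebook's sum_up_doubles functor,
// without the Kokkos macro.
struct sum_up_doubles_model {
  void operator() (const int i, double& update) const {
    update += static_cast<double>(i);
  }
};

// Hypothetical serial model of a sum reduction over [0, n), split
// across num_workers workers (num_workers must be at least 1).
template <class Functor>
double model_sum_reduce(const std::size_t n, const std::size_t num_workers,
                        const Functor& functor) {
  // One private partial sum per worker, initialised to zero.
  std::vector<double> partial(num_workers, 0.0);
  for (std::size_t w = 0; w < num_workers; ++w) {
    const std::size_t begin = n * w / num_workers;
    const std::size_t end = n * (w + 1) / num_workers;
    for (std::size_t i = begin; i < end; ++i) {
      functor(static_cast<int>(i), partial[w]);
    }
  }
  // Join step: combine the per-worker partial sums.
  double total = 0.0;
  for (const double p : partial) {
    total += p;
  }
  return total;
}
```

With `n = 5` and two workers, the first worker handles indices 0 and 1 (partial sum 1) and the second handles 2, 3 and 4 (partial sum 9), giving 10 as in the cell output. This is also why the functor receives `update` by reference instead of writing to a shared variable directly.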


## Execution space

In [9]:
  printf ("Hello World on Kokkos execution space %s\n",
          typeid (Kokkos::DefaultExecutionSpace).name ());

Hello World on Kokkos execution space N6Kokkos7ThreadsE


(int) 56
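The string `N6Kokkos7ThreadsE` is the mangled form of the type name as returned by `typeid(...).name()`. On GCC and Clang it can be turned into a readable name with `abi::__cxa_demangle` from `<cxxabi.h>`. The helper below is a minimal sketch assuming such a compiler; `demangle` is a hypothetical name, and the header is not available on every toolchain.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: return the demangled form of a name produced by
// typeid(...).name(), or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result
  // with malloc, so it must be released with free.
  char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  const std::string result =
      (status == 0 && readable != nullptr) ? readable : mangled;
  std::free(readable);
  return result;
}
```

For the output above, `demangle("N6Kokkos7ThreadsE")` yields `Kokkos::Threads`, which shows that the default execution space in this build is the Threads backend.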
