
C++ interop #12

Open
mratsim opened this issue Mar 24, 2018 · 3 comments

mratsim commented Mar 24, 2018

Not sure what Calypso is, but I have no problem doing C++/CUDA integration in Nim.

Example: applying an element-wise operation to two tensors, dst and src.

Cuda C++ kernels

# Assignment op
# Does element-wise A[i] `op=` B[i]
template cuda_assign_op(op_name, op_symbol: string)=
  {.emit: ["""
  template<typename T>
  struct """,op_name,"""{
  __device__ __forceinline__ void operator()(
      T *  __restrict__ dst,
      const T *  __restrict__ src){
      *dst """,op_symbol,""" __ldg(src);
      }
  };
  """].}
cuda_assign_op("CopyOp", "=")
cuda_assign_op("mAddOp", "+=")
cuda_assign_op("mSubOp", "-=")
cuda_assign_op("mMulOp", "*=")
cuda_assign_op("mDivOp", "/=")

Apply the operation on a whole tensor

CUDA C++ higher-order function:

{.emit:["""
  template<typename T, typename Op>
  __global__ void cuda_apply2(const int rank,
                              const int len,
                              const int *  __restrict__ dst_shape,
                              const int *  __restrict__ dst_strides,
                              const int dst_offset,
                              T * __restrict__ dst_data,
                              Op f,
                              const int *  __restrict__ src_shape,
                              const int *  __restrict__ src_strides,
                              const int src_offset,
                              const T * __restrict__ src_data){
    for (int elemID = blockIdx.x * blockDim.x + threadIdx.x;
         elemID < len;
         elemID += blockDim.x * gridDim.x) {
      // ## we can't instantiate the variable outside the loop
      // ## each threads will store its own in parallel
      const int dst_real_idx = cuda_getIndexOfElementID(
                               rank,
                               dst_shape,
                               dst_strides,
                               dst_offset,
                               elemID);
      const int src_real_idx = cuda_getIndexOfElementID(
                               rank,
                               src_shape,
                               src_strides,
                               src_offset,
                               elemID);
      f(&dst_data[dst_real_idx], &src_data[src_real_idx]);
    }
  }
"""].}

Generate the Nim <-> C++ bindings

template cuda_assign_binding(kernel_name: string, binding_name: untyped)=
  # Generate a Nim proc that wraps the C++/Cuda kernel proc

  const import_string: string = kernel_name & "<'*8>(@)"
  # In the importcpp pattern, '8 refers to the type of the 8th parameter;
  # the "*" in '*8 strips one pointer level (ptr T -> T), so the C++
  # template is instantiated as kernel_name<T>(all arguments).

  # We create an new identifier on the fly with backticks
  proc `binding_name`[T: SomeReal](
    blocksPerGrid, threadsPerBlock: cint,
    rank, len: cint,
    dst_shape, dst_strides: ptr cint, dst_offset: cint, dst_data: ptr T,
    src_shape, src_strides: ptr cint, src_offset: cint, src_data: ptr T
  ) {.importcpp: import_string, noSideEffect.}



template cuda_assign_glue*(
  kernel_name, op_name: string, binding_name: untyped): untyped =
  # Input
  #   - kernel_name and the Cuda function object
  # Result
  #   - Auto-generate cuda kernel based on the function object
  #   - Bindings with name "kernel_name" that can be called directly
  #   or with the convenience function ``cuda_assign_call``

  {.emit:["""
  template<typename T>
  inline void """, kernel_name,"""(
    const int blocksPerGrid, const int threadsPerBlock,
    const int rank,
    const int len,
    const int * __restrict__ dst_shape,
    const int * __restrict__ dst_strides,
    const int dst_offset,
    T * __restrict__ dst_data,
    const int * __restrict__ src_shape,
    const int * __restrict__ src_strides,
    const int src_offset,
    const T * __restrict__ src_data){
      cuda_apply2<<<blocksPerGrid, threadsPerBlock>>>(
        rank, len,
        dst_shape, dst_strides, dst_offset, dst_data,
        """,op_name,"""<T>(),
        src_shape, src_strides, src_offset, src_data
      );
    }
    """].}

  cuda_assign_binding(kernel_name, binding_name)

template cuda_assign_call*[T: SomeReal](
  kernel_name: untyped, destination: var CudaTensor[T], source: CudaTensor[T]): untyped =
  ## Does the heavy-lifting to format the tensors for the cuda call
  #
  # TODO: why doesn't this template work with "cudaLaunchKernel" instead
  # of the triple-chevron notation kernel<<<blocksPerGrid, threadsPerBlock>>>(params)?
  # That would avoid an intermediate function call.

  let dst = layoutOnDevice destination
  let src = layoutOnDevice source

  kernel_name[T](
    CUDA_HOF_TPB, CUDA_HOF_BPG,
    src.rank, dst.len, # Note: small shortcut, in this case len and size are the same
    dst.shape[], dst.strides[],
    dst.offset, dst.data,
    src.shape[], src.strides[],
    src.offset, src.data
  )

Usage:

cuda_assign_glue("cuda_mAdd", "mAddOp", cuda_mAdd)

proc `+=`*[T: SomeReal](a: var CudaTensor[T], b: CudaTensor[T]) =
  ## CudaTensor in-place addition
  when compileOption("boundChecks"):
    check_elementwise(a,b)

  cuda_assign_call(cuda_mAdd, a, b)

cuda_assign_glue("cuda_mSub", "mSubOp", cuda_mSub)
proc `-=`*[T: SomeReal](a: var CudaTensor[T], b: CudaTensor[T]) =
  ## CudaTensor in-place subtraction

  when compileOption("boundChecks"):
    check_elementwise(a,b)

  cuda_assign_call(cuda_mSub, a, b)

...
ghost commented Mar 24, 2018

@mratsim the author probably meant that both D (with this special compiler) and Nim can use C++ directly.


timotheecour commented Jul 11, 2019

comment from @Laeeth which I'm moving to here:

I guess you should mention extern(C++). You can use dpp to #include C++ headers if you blacklist STL and Boost types and it will work for quite a lot. It's still not complete yet but it keeps getting better and in a year or two STL should be fine. It's used in production but it's not something I would say is easy for just anyone to use and expect to work immediately. For C++ - C mostly just works.
Can you do that now using Nim? Just include c++ headers and use them?
How about generating c++ headers from Nim code? We do that in D but the available options aren't yet polished.

@timotheecour

PRs welcome to update this section (especially with the latest D work on extern(C++)), but it deserves a whole entry of its own. This is something I've been very interested in myself, and to which I've contributed both in D (see https://github.com/Syniurge/Calypso) and in Nim (see https://github.com/nimterop/nimterop: Nimterop is a Nim package that aims to make C/C++ interop seamless).

The answer is a lot more complicated and more nuanced than with C (#30).

For example, you can do the same as for #30 but with C++ (importcpp and emit still apply to C++):

# search for other such examples in Nim repo
proc make_unique_str(a: cstring): stdUniquePtr[stdString] {.importcpp: "std::make_unique<std::string>(#)", header: "<string>".}
proc foobar(a: cint) {.importcpp.} # untested, but something like this works; at the very least by providing the mangled name
{.emit:"""
#include <stdio.h>
template<typename T> void foobar(T a){
  printf("in foobar\n");
}
template void foobar<int>(int a); // explicit instantiation required otherwise link error
""".}

However, this won't work with templates (unless they are explicitly instantiated), and it still requires writing the header declarations manually.

This is what nimterop (and to a lesser extent the "official" c2nim) aim to fix: wrapper-free interop. Templates that aren't instantiated aren't yet supported in nimterop. My understanding is that this isn't supported in D either (unless you use Calypso).

relevant links
