## Homework 10: GPUs

## Due Date: May 3, 2021, 04:00pm

#### Firstname Lastname: Yuhan Liu

#### E-mail: yl7576@nyu.edu

#### Enter your solutions and submit this notebook

---

**Problem 1 (100p)**


Write two programs which will be able to run in parallel on a GPU, one using Numba/CUDA (50p), one using PyOpenCL (50p).


Each program will:

- draw two random vectors $\vec u$ and $\vec v$ from $[0,1]^N$ where $N = 10^7$;


- calculate and output similarity between $\vec u$ and $\vec v$.




The similarity between two vectors $\vec u$ and $\vec v$ is defined here as the cosine of the angle between them, $\measuredangle \left( \vec u, \vec v \right)$. That is, the program returns:

$$\cos \left( \measuredangle \left( \vec u, \vec v \right) \right).$$


Note that the output is a real value and must belong to $[-1, 1]$.
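As a point of comparison for the GPU programs, the cosine can be computed on the CPU from the standard identity $\cos \measuredangle(\vec u, \vec v) = \frac{\vec u \cdot \vec v}{\|\vec u\| \, \|\vec v\|}$. The following is a minimal NumPy sketch and not part of the assignment text: the helper name `cosine_similarity`, the fixed seed, and the smaller $N = 10^6$ are assumptions chosen only to keep the check quick and reproducible.

```python
import numpy as np

def cosine_similarity(u, v):
    """Return the cosine of the angle between 1-D arrays u and v (CPU reference)."""
    # Accumulate in float64 so that summing millions of float32 terms stays accurate.
    u64 = np.asarray(u, dtype=np.float64)
    v64 = np.asarray(v, dtype=np.float64)
    return float(np.dot(u64, v64) / (np.linalg.norm(u64) * np.linalg.norm(v64)))

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
N = 10**6                       # smaller than the assignment's 10**7 to keep this fast
u = rng.random(N, dtype=np.float32)
v = rng.random(N, dtype=np.float32)
print("CPU reference similarity:", cosine_similarity(u, v))
```

For independent entries uniform on $[0,1]$, the mean of $u_i v_i$ is $1/4$ and the mean of $u_i^2$ is $1/3$, so for large $N$ the result should be close to $(1/4)/(1/3) = 0.75$. A GPU result far from that value would suggest a bug.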

In [15]:
from numba import cuda
print(cuda.gpus)

cuda.select_device(0)

<Managed Device 0>


<weakproxy at 0x7f3debd2c650 to Device at 0x7f3dedb5b490>

In [20]:
from numba import cuda
import numpy as np
import math

@cuda.jit
def cos_sim(u, v, uv, u_norm, v_norm):
    # Each thread handles one index and writes the three elementwise products.
    x = cuda.grid(1)
    if x >= u.size:  # guard threads that fall past the end of the arrays
        return
    # Plain assignment: cuda.device_array memory is uninitialized, so += would read garbage.
    u_norm[x] = u[x] * u[x]
    v_norm[x] = v[x] * v[x]
    uv[x] = u[x] * v[x]

N = 10000000
u = np.random.rand(N).astype(np.float32)
v = np.random.rand(N).astype(np.float32)
du, dv = cuda.to_device(u), cuda.to_device(v)
# Output buffers default to float64, which keeps the host-side sums accurate.
duv, du_norm, dv_norm = cuda.device_array(N), cuda.device_array(N), cuda.device_array(N)
threadsperblock = 1000
blockspergrid = math.ceil(N / threadsperblock)

cos_sim[blockspergrid, threadsperblock](du, dv, duv, du_norm, dv_norm)
# The final reduction (three sums) is done on the host with NumPy.
uv = duv.copy_to_host()
u_norm = du_norm.copy_to_host()
v_norm = dv_norm.copy_to_host()
similarity = np.sum(uv) / (np.sqrt(np.sum(u_norm)) * np.sqrt(np.sum(v_norm)))
print('similarity of u and v is', similarity)

similarity of u and v is 0.7500517491664885


In [None]:
!pip install pyopencl

In [25]:
# Use PyOpenCL to compute the cosine similarity of two random vectors
import pyopencl as cl  # Import the OpenCL GPU computing API
import pyopencl.array as pycl_array  # Import PyOpenCL Array
# (a NumPy array plus an OpenCL buffer object)

import numpy as np  # Import Numpy number tools

np.set_printoptions(suppress=True)
context = cl.create_some_context()  # Initialize the Context
queue = cl.CommandQueue(context)  # Instantiate a Queue

### Create two random pyopencl arrays
N = 10000000
u = pycl_array.to_device(queue, np.random.rand(N).astype(np.float32))
v = pycl_array.to_device(queue, np.random.rand(N).astype(np.float32))  

### Create uninitialized pyopencl destination arrays (the kernel overwrites every element)
uv, u_norm, v_norm = pycl_array.empty_like(u), pycl_array.empty_like(u), pycl_array.empty_like(u)

program = cl.Program(context, """
__kernel void elementwise_products(__global const float *u, __global const float *v,
                                   __global float *uv, __global float *u_norm, __global float *v_norm)
{
  int i = get_global_id(0);  // one work-item per vector element
  uv[i] = u[i] * v[i];
  u_norm[i] = u[i] * u[i];
  v_norm[i] = v[i] * v[i];
}""").build()  # Create the OpenCL program

### Enqueue the kernel with one work-item per element; it fills uv, u_norm and v_norm
program.elementwise_products(queue, u.shape, None, u.data, v.data, uv.data, u_norm.data, v_norm.data)
### Reduce on the host: .get() copies each result back; accumulate the sums in float64
dot = np.sum(uv.get(), dtype=np.float64)
norm_u = np.sqrt(np.sum(u_norm.get(), dtype=np.float64))
norm_v = np.sqrt(np.sum(v_norm.get(), dtype=np.float64))
similarity = dot / (norm_u * norm_v)
print('similarity of u and v is', similarity)

similarity of u and v is 0.7499180906446112
