## GPU Demos

### Install and Imports

Install the [numba](https://numba.pydata.org/) library with `pip`, then search the filesystem for the two CUDA components Numba needs: the `libdevice` directory and the `libnvvm.so` shared library. If both turn up, the CUDA toolkit is likely installed.

In [4]:
!pip install numba
!find / -iname 'libdevice'
!find / -iname 'libnvvm.so'

/usr/local/cuda-10.1/nvvm/libdevice
/usr/local/cuda-10.0/nvvm/libdevice
/usr/local/cuda-10.1/nvvm/lib64/libnvvm.so
/usr/local/cuda-10.0/nvvm/lib64/libnvvm.so


Import the libraries used in the demos.

In [12]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import math
from numba import jit, njit, vectorize, cuda, uint32, f8, uint8, float32
from pylab import imshow, show
from timeit import default_timer as timer

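### Matrix Multiplication

Compare a plain Python triple loop on the CPU with a Numba CUDA kernel in which each thread computes one element of `C`.

In [21]: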

def matmul_cpu(A, B, C):
    """Compute the matrix product C = A @ B on the CPU with plain Python loops.

    A is (n, m), B is (m, p), and C is a preallocated (n, p) output array.
    """
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            tmp = 0.
            for k in range(A.shape[1]):
                tmp += A[i, k] * B[k, j]
            C[i, j] = tmp


# Reference: https://numba.pydata.org/numba-doc/dev/cuda/examples.html
@cuda.jit
def matmul_gpu(A, B, C):
    """Compute the matrix product C = A @ B on the GPU.

    Each thread computes one element of C.
    """
    # Absolute (row, column) position of this thread in the 2D launch grid
    i, j = cuda.grid(2)
    if i < C.shape[0] and j < C.shape[1]:
        tmp = 0.
        for k in range(A.shape[1]):
            tmp += A[i, k] * B[k, j]
        C[i, j] = tmp







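A = np.random.rand(400, 200)
B = np.random.rand(200, 100)
C = np.zeros((400, 100))  # output buffer; every entry is overwritten

start = timer()
matmul_cpu(A, B, C)
print(timer() - start)

# A CUDA kernel must be launched with an explicit grid/block configuration.
threadsperblock = (16, 16)
blockspergrid = (math.ceil(C.shape[0] / threadsperblock[0]),
                 math.ceil(C.shape[1] / threadsperblock[1]))

# The first call also includes JIT compilation and host/device copies.
start = timer()
matmul_gpu[blockspergrid, threadsperblock](A, B, C)
print(timer() - start)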



3.7225886399999126
0.16488140299998122
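
Numba compiles a `@cuda.jit` function the first time it is called, so a single timed call like the one above includes compilation and, because NumPy arrays are passed in, host/device copies. A fairer comparison runs an untimed warm-up call first and keeps the best of several repeats. This is a minimal sketch of that idea; `time_call` is a hypothetical helper, not part of Numba, and the workload is a stand-in so the block runs without a GPU.

```python
from timeit import default_timer as timer


def time_call(fn, *args, warmup=1, repeat=3):
    """Return the best wall-clock time in seconds of fn(*args) over `repeat` runs."""
    for _ in range(warmup):
        fn(*args)  # untimed: absorbs one-time costs such as JIT compilation
    best = float("inf")
    for _ in range(repeat):
        start = timer()
        fn(*args)
        best = min(best, timer() - start)
    return best


# Stand-in workload so the sketch runs anywhere.
print(time_call(sum, range(100_000)))
```

In the notebook, `fn` could be a small wrapper that launches `matmul_gpu` and, if device arrays are used, calls `cuda.synchronize()` before returning so that the kernel has finished when the timer stops.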

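
The bounds check `if i < C.shape[0] and j < C.shape[1]` in `matmul_gpu` matters because a launch grid is normally rounded up to a whole number of blocks, so some threads fall outside the matrix. This is a minimal sketch of that arithmetic, assuming a 16×16 block (an illustrative choice, not taken from the source); `launch_config` is a hypothetical helper.

```python
import math


def launch_config(shape, threadsperblock=(16, 16)):
    """Return (blockspergrid, threadsperblock) covering a 2D array of `shape`."""
    rows, cols = shape
    blockspergrid = (
        math.ceil(rows / threadsperblock[0]),  # blocks along rows, rounded up
        math.ceil(cols / threadsperblock[1]),  # blocks along columns, rounded up
    )
    return blockspergrid, threadsperblock


blocks, threads = launch_config((400, 100))
covered = (blocks[0] * threads[0], blocks[1] * threads[1])
print(blocks, threads)  # (25, 7) (16, 16)
print(covered)          # (400, 112): 12 spare columns of threads do no work
```

With these values, the kernel would be launched as `matmul_gpu[blocks, threads](A, B, C)`. The threads in the spare columns fail the bounds check and return without touching `C`.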