## 1. Environment Sanity Check

**If you are using DGX-1 and a RAPIDS Docker image, you can skip this!**

We need to make sure that we have been allocated a **Tesla T4** GPU so that the RAPIDS libraries run successfully. Sometimes Google Colab allocates a Tesla K80 instead of a **Tesla T4**. If you get a K80 GPU, select **Runtime -> Reset all runtimes** and repeat until you are allocated a **Tesla T4** GPU.

In [0]:
!nvidia-smi

Sat Sep 28 23:08:33 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P8    12W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
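
The same check can also be done programmatically rather than by reading the table. The following is a minimal sketch, not part of the original notebook: it assumes `nvidia-smi` is on the `PATH`, and `gpu_names` and `is_tesla_t4` are hypothetical helper names introduced here for illustration.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def is_tesla_t4(name):
    """Hypothetical helper: True if a reported GPU name contains the token 'T4'."""
    return "T4" in name.upper().split()


names = gpu_names()
if any(is_tesla_t4(n) for n in names):
    print("Tesla T4 detected:", names)
else:
    print("No Tesla T4 found (got %s); consider resetting the runtime." % (names or "no GPU"))
```

If the message reports a K80 or no GPU, reset the runtime as described above and run the cell again.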

## 2. Setup


**If you are using DGX-1 and a RAPIDS Docker image, you can skip this!**

Let's install the following:

1. Miniconda for Google Colab's Python install
2. RAPIDS libraries
3. Related Python packages

Then, we are going to set up the environment as follows:

4. Set environment variables
5. Copy .so files to the current working directory

In [0]:
# install miniconda
!wget -c https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

# install RAPIDS packages
!conda install -q -y --prefix /usr/local -c conda-forge \
  -c rapidsai-nightly/label/cuda10.0 -c nvidia/label/cuda10.0 \
  cudf cuml

# install related Python packages
!conda install -y nvstrings
!pip install dropbox

# set environment vars
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# copy .so files to current working dir
for fn in ['libcudf.so', 'librmm.so']:
  shutil.copy('/usr/local/lib/'+fn, os.getcwd())

--2019-09-28 23:08:39--  https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58468498 (56M) [application/x-sh]
Saving to: ‘Miniconda3-4.5.4-Linux-x86_64.sh’


2019-09-28 23:08:41 (65.4 MB/s) - ‘Miniconda3-4.5.4-Linux-x86_64.sh’ saved [58468498/58468498]

PREFIX=/usr/local
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-hdf63c60_3 ...
installing: libstdcxx-ng-7.2.0-hdf63c60_3 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-hf484d3e_0 ...
installing: openssl-1.0.2o-h20670df_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7
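
Once the setup cell has finished, it can help to confirm that steps 4 and 5 actually took effect. The sketch below is an illustration under stated assumptions rather than part of the original notebook: `missing_setup_items` is a hypothetical helper, and it only checks that the environment variables point at existing paths and that the copied `.so` files are present in the working directory.

```python
import os


def missing_setup_items(env_vars, files, environ=None, directory="."):
    """List env vars whose path does not exist and files absent from `directory`."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in env_vars:
        path = environ.get(name)
        if not path or not os.path.exists(path):
            problems.append("%s -> %r" % (name, path))
    for fn in files:
        if not os.path.isfile(os.path.join(directory, fn)):
            problems.append(fn)
    return problems


# Names taken from the setup cell above.
problems = missing_setup_items(
    ["NUMBAPRO_NVVM", "NUMBAPRO_LIBDEVICE"],
    ["libcudf.so", "librmm.so"],
)
print("Setup looks complete." if not problems else "Missing: %s" % ", ".join(problems))
```

An empty result does not prove that RAPIDS imports correctly; it only rules out the most common path problems.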

## 3. Get Datasets

We are going to use five datasets in this bonus problem. Each dataset contains 2D coordinates, where:
- each row is a data point (a 2D coordinate)
- the first column holds the x coordinates
- the second column holds the y coordinates
- there is no column header

**If you are using DGX-1 and a RAPIDS Docker image, you can skip this!**
The datasets are already available at:
- /app/sample_data/data0.csv
- /app/sample_data/data1.csv
- /app/sample_data/data2.csv
- /app/sample_data/data3.csv
- /app/sample_data/data4.csv

If you are using Google Colab, you need to run the following cell to get the datasets. They will be available at:
- ./sample_data/data0.csv
- ./sample_data/data1.csv
- ./sample_data/data2.csv
- ./sample_data/data3.csv
- ./sample_data/data4.csv
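
Because the datasets live in different directories on the two platforms, later cells may want to resolve the paths once instead of hard-coding them. Here is a minimal sketch, assuming the two locations listed above; `dataset_paths` is a hypothetical helper that is not part of the original notebook.

```python
import os


def dataset_paths(candidates=("/app/sample_data", "./sample_data"), count=5):
    """Return data0.csv .. data{count-1}.csv from the first directory that has them all."""
    for directory in candidates:
        paths = [os.path.join(directory, "data%d.csv" % i) for i in range(count)]
        if all(os.path.isfile(p) for p in paths):
            return paths
    return []


paths = dataset_paths()
print(paths if paths else "Datasets not found yet; on Google Colab, run the download cell first.")
```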






In [0]:
%%time 

import dropbox
import numpy as np

# Get access into the dropbox directory
token = 'sKC7vZAuXmAAAAAAAAABVKm0AgbOagLq8a11wNL71NoP0DbfCnj-KZlxda-7n55-'
dbx = dropbox.Dropbox(token)

# Parse the datasets and save them
for i in range(5):
  
  # Get the dataset
  filename = "/sample_data/data%d.csv" % i
  metadata, res = dbx.files_download(path=filename)
  content = res.content.decode("utf-8")
  
  # Parse the dataset, skipping blank lines such as the one after the final newline
  rows = [line.split(',') for line in content.splitlines() if line.strip()]
  arr = np.array(rows, dtype=float)
  
  # Save the dataset
  np.savetxt("." + filename, arr, fmt='%.2lf', delimiter=",")

CPU times: user 1.46 s, sys: 16 ms, total: 1.48 s
Wall time: 11.4 s


In [0]:
!ls -alh sample_data/data*.csv

-rw-r--r-- 1 root root 479K Sep 17 04:47 sample_data/data0.csv
-rw-r--r-- 1 root root 609K Sep 17 04:47 sample_data/data1.csv
-rw-r--r-- 1 root root 1.2M Sep 17 04:47 sample_data/data2.csv
-rw-r--r-- 1 root root 1.5M Sep 17 04:47 sample_data/data3.csv
-rw-r--r-- 1 root root 2.0M Sep 17 04:47 sample_data/data4.csv
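
The file listing confirms that the files exist, but not that they match the format described at the start of this section. The following sketch checks that every row has exactly two numeric columns and that there is no header. It uses only the standard library, and `check_dataset` is a hypothetical helper introduced here for illustration.

```python
import csv
import os


def check_dataset(path):
    """Return the row count if every non-empty row is two floats; raise ValueError otherwise."""
    n = 0
    with open(path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f), start=1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 2:
                raise ValueError("%s:%d: expected 2 columns, got %d" % (path, lineno, len(row)))
            # float() raises ValueError on a header or any non-numeric cell
            x, y = float(row[0]), float(row[1])
            n += 1
    return n


# Assumes the Google Colab location; files that are not present are skipped.
for i in range(5):
    path = "./sample_data/data%d.csv" % i
    if os.path.isfile(path):
        print(path, check_dataset(path), "rows")
```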


## Everything is now set up! You can start on your homework.