<a id="introduction"></a>
## Supervised Learning with cuML
#### By Paul Hendricks
-------

In this notebook, we will show how to train supervised learning models with cuML, using cuDF to hold the data on the GPU.

**Table of Contents**

* [Supervised Learning with cuML](#introduction)
* [Setup](#setup)
* [Linear Regression](#linearregression)
* [Ridge Regression](#ridgeregression)
* [Stochastic Gradient Descent](#sgd)
* [K Nearest Neighbors](#knn)
* [Conclusion](#conclusion)

Before going any further, let's make sure we have access to `matplotlib`, a popular Python library for data visualization.

In [None]:
import os

try:
    import matplotlib; print('Matplotlib Version:', matplotlib.__version__)
except ModuleNotFoundError:
    os.system('conda install -y matplotlib')

<a id="setup"></a>
## Setup

This notebook was tested using the following Docker containers:

* `rapidsai/rapidsai:0.6-cuda10.0-devel-ubuntu18.04-gcc7-py3.7` from [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai)
* `rapidsai/rapidsai-nightly:0.6-cuda10.0-devel-ubuntu18.04-gcc7-py3.7` from [DockerHub](https://hub.docker.com/r/rapidsai/rapidsai-nightly)

This notebook was run on the NVIDIA Tesla V100 GPU. Please be aware that your system may be different, and you may need to modify the code or install packages to run the examples below.

If you think you have found a bug or an error, please file an issue here: https://github.com/rapidsai/notebooks/issues

Before we begin, let's check out our hardware setup by running the `nvidia-smi` command.

In [None]:
!nvidia-smi

Next, let's see what CUDA version we have:

In [None]:
!nvcc --version
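
If either command is missing from your environment, the cells above will simply report an error. As a minimal sketch, assuming you would rather check from Python first, the hypothetical helper `run_if_available` below (not part of RAPIDS) looks a tool up on the `PATH` before running it:

```python
import shutil
import subprocess


def run_if_available(tool, *args):
    """Return the tool's stdout, or None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    # No check=True here: a non-zero exit code still returns whatever was printed
    result = subprocess.run([path, *args], capture_output=True, text=True)
    return result.stdout


for tool, args in [("nvidia-smi", []), ("nvcc", ["--version"])]:
    output = run_if_available(tool, *args)
    print(tool, "->", "not found on PATH" if output is None else output.strip())
```

A `None` result only tells you the executable was not found; it says nothing about whether a GPU or the CUDA toolkit is installed elsewhere on the system.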

Next, let's load some helper functions from `matplotlib` and configure the Jupyter Notebook for visualization.

In [None]:
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt


%matplotlib inline

<a id="linearregression"></a>
## Linear Regression

There are a few differences to note between cuML's linear regression and standard OLS as found in Scikit-Learn. Although cuML's OLS interface is very similar to Scikit-Learn's implementation, cuML doesn't use some of its parameters, such as `copy` and `n_jobs`. cuML also includes two implementations of OLS, one using SVD and one using eigendecomposition. The eigendecomposition-based implementation is very fast but introduces very small errors in the coefficients, which are negligible for most applications. The SVD-based implementation is stable but slower. You may never notice the difference, but keep it in mind if you need a highly precise analysis.
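
To build intuition for the two approaches, here is a minimal NumPy sketch that runs on the CPU. It is not cuML's implementation: the synthetic data and the function names `ols_eig` and `ols_svd` are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration; not this notebook's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 0.01 * rng.normal(size=200)


def ols_eig(X, y):
    """Solve the normal equations via an eigendecomposition of X^T X."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    # For full-rank X, (X^T X)^-1 = V diag(1 / lambda) V^T
    return eigvecs @ ((eigvecs.T @ (X.T @ y)) / eigvals)


def ols_svd(X, y):
    """Solve least squares via the thin SVD X = U S V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / s)


coef_eig = ols_eig(X, y)
coef_svd = ols_svd(X, y)
max_diff = np.max(np.abs(coef_eig - coef_svd))

print("eig coefficients:", coef_eig)
print("svd coefficients:", coef_svd)
print("max abs difference:", max_diff)
```

On well-conditioned data like this, the two solutions agree closely. Forming `X.T @ X` squares the condition number of the problem, which is one reason an eigendecomposition-based solver can lose a little accuracy relative to SVD when the features are nearly collinear.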

### Get your data

In [None]:
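%%time
import cudf

# X and y are assumed to be pandas DataFrames defined in an earlier cell
X_cudf = cudf.DataFrame.from_pandas(X)
# Take the first column of y as a 1-D array, then move it to the GPU
y_cudf = cudf.Series(y.values[:, 0])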

### Linear regression parameters

In [None]:
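from cuml import LinearRegression as cumlOLS

# fit_intercept, normalize, and algorithm are assumed to be set in an earlier cell
reg_cuml = cumlOLS(fit_intercept=fit_intercept, normalize=normalize, algorithm=algorithm)
result_cuml = reg_cuml.fit(X_cudf, y_cudf)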

In [None]:
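from sklearn.metrics import mean_squared_error

# to_nparray is assumed to be a helper defined earlier that converts cuDF objects to NumPy arrays
y_cuml = reg_cuml.predict(X_cudf)
y_cuml = to_nparray(y_cuml).ravel()
error_cuml = mean_squared_error(y, y_cuml)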
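
The error computed above is the mean squared error: the average of the squared differences between the true and predicted values. As a minimal pure-Python sketch of that formula, the hypothetical helper `mse` below (an illustration, not Scikit-Learn's implementation) works on plain lists with made-up values:

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only
example = mse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(example)  # 0.375
```

A lower value means the predictions are closer to the targets; a value of zero means they match exactly.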

<a id="conclusion"></a>
## Conclusion

To learn more about RAPIDS, be sure to check out: 

* [Open Source Website](http://rapids.ai)
* [GitHub](https://github.com/rapidsai/)
* [Press Release](https://nvidianews.nvidia.com/news/nvidia-introduces-rapids-open-source-gpu-acceleration-platform-for-large-scale-data-analytics-and-machine-learning)
* [NVIDIA Blog](https://blogs.nvidia.com/blog/2018/10/10/rapids-data-science-open-source-community/)
* [Developer Blog](https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/)
* [NVIDIA Data Science Webpage](https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/)
