GPU Containers

There are two variations of GPU containers that can be used, depending on the CUDA version you wish to use:

  • cuda-10.1 contains the layers which install CUDA 10.1.
  • cuda-11 contains the layers which install CUDA 11.
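
For example, building the CUDA 11 variant might look like the following; the directory path, registry, and tag are assumptions about your checkout, not values this repo prescribes:

    # Build the CUDA 11 variant of the base layer (path and tag are illustrative)
    docker build -t <your-registry>/gpu-base:cuda-11 cuda-11/base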

Example base layers to build your own container:

  • gpu-base provides the CUDA base image and the system packages required by Databricks.
  • gpu-conda extends gpu-base by installing conda at /databricks/conda/.
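
A minimal sketch of a Dockerfile that builds on these layers; the image tag is an assumption for illustration:

    # Start from the conda-enabled base layer (tag is illustrative)
    FROM <your-registry>/gpu-conda:cuda-11

    # Layer additional tools on top of the base
    RUN apt-get update && apt-get install -y --no-install-recommends git \
        && rm -rf /var/lib/apt/lists/*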

Example containers for common GPU use cases:

  • gpu-tensorflow extends gpu-conda by creating a conda environment that contains TensorFlow.
  • gpu-pytorch extends gpu-conda by creating a conda environment that contains PyTorch.
  • gpu-rapids extends gpu-conda by creating a conda environment that contains RAPIDS.
    • RAPIDS requires an NVIDIA Pascal GPU or later. If you receive a cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device error, you are likely using incompatible GPUs, e.g., the K80 on EC2 P2 instances. Try switching to newer instance types.
    • The Dockerfile provides an example of creating the root conda environment from an environment spec file, which does not require dependency resolution (see the sketch after this list).
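
A minimal sketch of the spec-file workflow; the environment name is illustrative. Because conda list --explicit pins exact package URLs, recreating an environment from the file skips the dependency solver entirely:

    # Export an explicit spec from an existing conda environment (name is hypothetical)
    conda list --name rapids-env --explicit > spec-file.txt

    # Recreate the environment from the pinned spec; no dependency resolution is performed
    conda create --name rapids-env --file spec-file.txt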

Launching GPU Clusters

  • When launching a GPU cluster with a custom container that uses conda, we recommend setting the Spark conf spark.databricks.libraryIsolation.enabled false. This disables notebook-scoped libraries, which do not support conda. The example images use conda for environment creation. (See the cluster spec sketch after this list.)

  • After the cluster is ready, you can run %sh nvidia-smi to view GPU devices and confirm that they are available.
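
A minimal sketch of the relevant fields in a Databricks REST API clusters/create request; the runtime version, node type, and image URL are placeholders, not recommendations:

    {
      "cluster_name": "gpu-custom-container",
      "spark_version": "<gpu-runtime-version>",
      "node_type_id": "<gpu-node-type>",
      "num_workers": 2,
      "spark_conf": {
        "spark.databricks.libraryIsolation.enabled": "false"
      },
      "docker_image": {
        "url": "<your-registry>/gpu-conda:cuda-11"
      }
    }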

Creating Custom Dockerfiles

  • You can modify the gpu-base Dockerfile to add system packages and NVIDIA libraries, for example, TensorRT (libnvinfer). You can also change the base image (FROM) to use CUDA 10.0 or 10.2. (See the Dockerfile sketch after this list.)

  • You cannot change the NVIDIA driver version, because it must match the driver version on the host machine, which is 450.80.

  • You must install conda at /databricks/conda/ if you are replacing the gpu-conda layer.

  • You must set ENV DEFAULT_DATABRICKS_ROOT_CONDA_ENV in your Dockerfile. This environment variable is used by Databricks to determine which conda environment to activate by default.

  • The gpu-tensorflow and gpu-pytorch Dockerfiles provide examples of creating the root conda environment from an environment.yml file. The following packages are required for Python notebooks and PySpark to work: python, ipython, numpy, pandas, pyarrow, six, and ipykernel. (An example environment.yml sketch follows below.)
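
Putting the points above together, a minimal sketch of a custom Dockerfile; the CUDA base tag, TensorRT package name, Miniconda installer, and environment name are assumptions for illustration, not values this repo prescribes:

    # The base image choice controls the CUDA version (tag is illustrative)
    FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04

    # Optional extra NVIDIA libraries, e.g. TensorRT (package name assumed;
    # may require NVIDIA's apt repository to be configured)
    RUN apt-get update && apt-get install -y --no-install-recommends \
        wget libnvinfer7 \
        && rm -rf /var/lib/apt/lists/*

    # Conda must be installed under /databricks/conda/ when replacing the gpu-conda layer
    RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh \
        && bash /tmp/miniconda.sh -b -p /databricks/conda \
        && rm /tmp/miniconda.sh

    # Create the default environment from a spec file (file name assumed)
    COPY environment.yml /tmp/environment.yml
    RUN /databricks/conda/bin/conda env create -f /tmp/environment.yml

    # Databricks activates this conda environment by default (name is illustrative
    # and must match the name in environment.yml)
    ENV DEFAULT_DATABRICKS_ROOT_CONDA_ENV=dcs-gpu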
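
And a matching environment.yml sketch containing the required packages listed above; versions are deliberately unpinned here, and the environment name is illustrative:

    # environment.yml (name must match DEFAULT_DATABRICKS_ROOT_CONDA_ENV)
    name: dcs-gpu
    channels:
      - defaults
    dependencies:
      - python
      - ipython
      - numpy
      - pandas
      - pyarrow
      - six
      - ipykernel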