nvidia-pytorch-tensorflow-conda-jupyter-ssh

This is a Dockerfile for an environment that integrates CUDA, Conda, Jupyter (Notebook and JupyterLab), SSH, PyTorch, and TensorFlow (plus Code Server). The Dockerfile is mainly designed for creating separate environments for different users when you share a server with other users.

Summary

The key features include:

  • Flexible environments with Conda: Using Conda, you can easily maintain different versions of Python and packages. You can also create new environments and register them as Jupyter kernels. For example, to register a new env (Python 3.10 + TensorFlow 2.11), use the following commands:
conda create -n "name_of_env" python=3.10 ipykernel # build new env
conda activate "name_of_env" # switch to the new env
python -m ipykernel install --user --name "name_of_env" --display-name "name shown in Jupyter"
  • Easy access with SSH and Jupyter: SSH and Jupyter are integrated into the environment, allowing you to access it through ports 22 and 8888, respectively (see the example after this list).

  • File permissions: The Dockerfile creates a new user whose user ID (UID) and group ID (GID) match those on the host. This avoids many file permission issues.

  • Computing resource sharing and limiting: Docker ensures that a single user does not monopolize the machine's computing resources, and you can easily set resource limits when setting up the containers.

  • Separation and persistence of file systems: Each user's container has its own file system. When setting up the containers, you can mount key folders from the host machine, enabling persistence.
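
For example, assuming the container's port 22 is published to host port 2222 and port 8888 to host port 8888 (placeholder values; see the docker-compose example below), access looks roughly like this:

ssh -p 2222 <user>@<server_ip> # log into the container as the user created at build time
# Jupyter is then reachable in a browser at http://<server_ip>:8888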

How to build a Docker image from a Dockerfile

Suppose you want to build from Dockerfile.combined_pt_tf. Navigate to the directory and use the following command:

sudo docker build -t <image_name> -f Dockerfile.combined_pt_tf --build-arg USER=<user> --build-arg PASSWORD=<password> --build-arg UID=<UID> --build-arg GID=<GID> .

Replace <image_name>, <user>, and <password> with your desired values. Set <UID> and <GID> to the UID and GID of your user on the host machine (obtained by running id on the command line). Also, feel free to change the variables (e.g., TZ) and the versions of packages (Python, TensorFlow, PyTorch, etc.) in the Dockerfile.

To enable the pip cache, you need to enable BuildKit: run export DOCKER_BUILDKIT=1 on the host before building, or set it inline on the build command, i.e., sudo DOCKER_BUILDKIT=1 docker build ... (placing the assignment after sudo so it is not stripped by sudo's environment reset).
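
As a concrete sketch (the image tag, user name, and password below are placeholders), a full build command might look like:

sudo DOCKER_BUILDKIT=1 docker build \
  -t pt-tf-env \
  -f Dockerfile.combined_pt_tf \
  --build-arg USER=alice \
  --build-arg PASSWORD=change_me \
  --build-arg UID=$(id -u) \
  --build-arg GID=$(id -g) \
  .

The $(id -u) and $(id -g) substitutions expand in your own (non-root) shell, so the user created inside the image gets the same UID/GID as your account on the host.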

Dockerfile.combined vs. Dockerfile.separate

These two versions exist because, for some versions of CUDA (e.g., 11.2), we are unable to find compatible versions of PyTorch and TensorFlow. Refer to the TensorFlow and PyTorch compatibility documentation.

docker-compose.yaml for starting a Docker container

version: "3"

services:
  pt-tf:
    container_name: <to_replace> # name
    image: <image_name>
    hostname: Ubuntu
    restart: always
    # tty: true # useful in the debugging stage of Dockerfile
    # privileged: true # this can avoid the nvidia-smi error (Failed to initialize NVML: Unknown Error) after calling systemctl daemon-reload. However, this also gives all capabilities to the container. A better way is to disable privileged and mount the devices manually (see below).
    init: true # Run an init inside the container that forwards signals and reaps processes. See https://docs.docker.com/compose/compose-file/compose-file-v3/
    pid: "host" # share the PID namespace with the host so nvidia-smi inside the container can show the processes occupying the GPUs
    volumes:
      - ./data:/home/sjtu/workspace:rw
      # - /data/<to_replace>:/home/sjtu/data:rw # uncomment it when you need to mount more folders
    ports:
      - "<to_replace>:8888" # port for jupyter
      - "<to_replace>:22" # port for ssh
      - "<to_replace>:1000" # reserved some ports
      - "<to_replace>:1001" # reserved some ports
      - "<to_replace>:1002" # reserved some ports
    devices:
      - /dev/nvidia0:/dev/nvidia0
      #- /dev/nvidia1:/dev/nvidia1 # if you got multiple gpus
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-caps:/dev/nvidia-caps
      - /dev/nvidia-modeset:/dev/nvidia-modeset
      - /dev/nvidia-uvm:/dev/nvidia-uvm
      - /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
    deploy:
      resources:
        reservations: # whether to use the GPU or not
          devices:
            - capabilities: ["gpu"]
              device_ids: ["0"]
        # limits: # set CPU and memory limits
          # cpus: "48"
          # memory: 40960M
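
A minimal sketch for bringing the container up and checking that the GPU is visible inside it (the container name is the <to_replace> placeholder from the file above):

sudo docker compose up -d                     # or: sudo docker-compose up -d on older installs
sudo docker exec -it <to_replace> nvidia-smi  # should list the GPU(s) passed in via the devices section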

After starting the container, you can access it via SSH or Jupyter through the host ports mapped to 22 and 8888, respectively.
