username: ulsee
password: ulsee!
username: root
password: **********
To reduce the operations engineers' workload, we use Docker to manage development and operations. The server runs CentOS 7 with the NVIDIA 375 driver installed; using the nvidia-docker tool we can pass the GPU devices through into Docker containers, which gives both good performance and ease of use.
Item | Detail |
---|---|
OS | CentOS 7 |
Driver | nvidia-375 |
Item | Detail |
---|---|
CPU | Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz x 2 |
GPU | NVIDIA Titan X Pascal x 8 |
Memory | Samsung DDR4 128GB |
SSD | Intel PCIe x4 1TB (mounted on /) |
HardDisk | 7TB RAID 5 (mounted on /raid5) |
If you have no experience with Docker, please look up some introductory material; a Docker tutorial is also referenced here.
Do not worry about the difference between docker and nvidia-docker; the good news is that nvidia-docker tries to stay as close as possible to the Docker philosophy, so you can use nvidia-docker as a drop-in alternative to the Docker CLI.
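As a quick illustration (assuming the nvidia-docker v1 wrapper, which forwards ordinary subcommands straight to docker), the two CLIs can be used interchangeably:

```bash
# Non-GPU subcommands behave the same through either CLI
docker ps
nvidia-docker ps

# Only run/create need the nvidia-docker wrapper to get GPU access
nvidia-docker run --rm nvidia/cuda nvidia-smi
```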
For our custom requirements, we need to define our own Docker images via a Dockerfile. Many Caffe/MXNet/TensorFlow images can be found online, e.g. on Docker Hub or GitHub.
Writing a Dockerfile is a very straightforward procedure; explore existing ones and pick out the Dockerfile lines you like to add to your own.
Referring to the nvidia/cuda and caffe Dockerfiles, we have created some base images, located at
- /raid5/SourceCode/ulsDeepLearningDocker
You can start from these local images: add the line below as the first line of your Dockerfile, then fill in your own commands. Building on a local base image saves a lot of build time, especially for the many download steps, since you can skip waiting for downloads.
FROM ulsee/cuda8.0-cudnnv5.1-caffe-centos7
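A minimal Dockerfile sketch along these lines (the RUN/COPY steps are only hypothetical placeholders for your own commands):

```dockerfile
# Start from the shared local base image to skip the long downloads
FROM ulsee/cuda8.0-cudnnv5.1-caffe-centos7

# Hypothetical example steps -- replace with your own
RUN yum install -y vim && yum clean all
COPY . /workspace
WORKDIR /workspace
```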
To pass the GPU devices through into a Docker container, you can run
nvidia-docker run --rm ulsee/cuda8.0-cudnn5.1-caffe-tensorflow nvidia-smi
Note: if you want to use the network in the container, please add --net=host to the command, i.e.
nvidia-docker run --rm --net=host ulsee/cuda8.0-cudnn5.1-caffe-tensorflow nvidia-smi
Note: --rm automatically removes the container when it exits, so any changes made during the run are discarded.
To start an interactive, named container with host networking:
nvidia-docker run --rm -ti --net=host --name ulsee_test ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
If you have installed something new (e.g. with apt-get or yum), please remember to docker commit your container to the image; otherwise, all changes made inside the container will be lost.
docker commit f14dc ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
f14dc is the container ID (it can be abbreviated to its first few characters, as long as they are unique), and ulsee/cuda8.0-cudnn5.1-caffe-tensorflow is the Docker image to be updated. The container ID can be obtained by:
docker ps -l
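A sketch of the whole commit workflow (note the container is started without --rm here, since a removed container can no longer be committed; the name and package are just examples):

```bash
# Start a container to modify (no --rm, so it survives exit)
nvidia-docker run -ti --name my_change ulsee/cuda8.0-cudnn5.1-caffe-tensorflow bash
#   ... inside the container: e.g. yum install -y htop ... then exit

# Back on the host: check the last container, then commit it by ID or name
docker ps -l
docker commit my_change ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
```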
For more options, run nvidia-docker run --help or docker run --help. Here are some tips for running Docker containers in the background and bringing them back to the foreground freely.
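For example, a minimal background/foreground round trip might look like this (the name bg_train is arbitrary):

```bash
# Start detached (-d) but keep a TTY so you can re-attach later
nvidia-docker run -d -ti --net=host --name bg_train ulsee/cuda8.0-cudnn5.1-caffe-tensorflow bash

# Bring it to the foreground; detach again with Ctrl-p Ctrl-q
docker attach bg_train

# Or open an extra shell without disturbing the main process
docker exec -ti bg_train bash
```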
You can add a data volume to a container using the -v flag with the docker create and docker run commands. You can use -v multiple times to mount multiple data volumes.
nvidia-docker run --rm -ti -v /Users/<path>:/<container path> ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
Note: both paths must be absolute. You can also use the VOLUME instruction in a Dockerfile to add one or more new volumes to any container created from that image. For more, please refer to the tips.
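For instance, a run with two volumes might look like this (the host directories are hypothetical; adjust them to your own, and note the :ro suffix mounts a volume read-only):

```bash
# Mount a dataset read-only and a scratch directory read-write
nvidia-docker run --rm -ti \
  -v /raid5/datasets:/data:ro \
  -v /home/$USER/experiments:/workspace \
  ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
```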
You can run MATLAB by mounting the host's MATLAB installation into the container:
nvidia-docker run --rm -ti -v /usr/local/MATLAB:/usr/local/MATLAB ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
Then:
cd /usr/local/MATLAB/R2016a/bin
and execute MATLAB:
./matlab
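The same thing can be done in one shot (a sketch; -nodisplay starts MATLAB's console UI, since the container has no X display):

```bash
nvidia-docker run --rm -ti -v /usr/local/MATLAB:/usr/local/MATLAB \
  ulsee/cuda8.0-cudnn5.1-caffe-tensorflow \
  /usr/local/MATLAB/R2016a/bin/matlab -nodisplay
```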
We have 8 NVIDIA Titan X Pascal GPUs in total on our server. To get the greatest value out of our GPUs and let everyone use the resource fairly, there are some policies we should observe:
- Before selecting the GPUs to start a training process, type nvidia-smi to query the GPU device states, then choose an idle one. We should avoid sharing the same GPU for computing with others.
- In most cases, you should use one GPU card for training if it is sufficient. You can use at most two cards for training; if that still does not satisfy your requirements, please contact Chris to get authorisation for more cards.
- Once you are sure which GPUs will be used, pass their indices via the environment variable NV_GPU to isolate yourself from others: NV_GPU=5 nvidia-docker run -ti nvidia/cuda nvidia-smi. You will then only see GPU 5 in your container. (A multi-GPU example is sketched below.)
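NV_GPU also accepts a comma-separated list, so a two-card job (within the policy above) might look like this; the indices 2 and 3 are just an example:

```bash
# Check which GPUs are idle first
nvidia-smi

# Expose only GPUs 2 and 3 to the container
NV_GPU=2,3 nvidia-docker run --rm -ti nvidia/cuda nvidia-smi
```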
Your own custom Docker images should be named like ${USERNAME}/${ImageName}, e.g. shikun/abc.
The public base Docker images shared by us should be named like ulsee/${ImageName}, e.g. ulsee/faceapi:v1.0.
We should use the -m flag to specify the maximum amount of memory a container can use. The maximum we allow is 30GB; if that does not satisfy your requirements, please contact Chris for more memory resources. The command looks like:
docker run -m 30g ...
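Putting the pieces together, a typical policy-compliant training run might look like the sketch below (paths and names are placeholders following the examples above):

```bash
# GPU isolation, host networking, 30GB memory cap, data volume, and a name
NV_GPU=0 nvidia-docker run --rm -ti --net=host -m 30g \
  -v /home/$USER/data:/data \
  --name ${USER}_train \
  ulsee/cuda8.0-cudnn5.1-caffe-tensorflow
```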
If you have read all the content before this chapter, you can contact me (Shikun) to create an account for you. You will then have your own home directory on the server; please place your train/test data in that directory. DO NOT put it inside the container or anywhere else.
Enjoy!!!