Description
I want to train CifarNet on a single machine with 4 GPUs, but training is slower than with only one GPU.
[slim] Train cifarnet using the default script slim/scripts/train_cifarnet_on_cifar10.sh
When using the default script, the speed is as follows:
INFO:tensorflow:global step 13900: loss = 0.7609 (0.06 sec/step)
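For context, the default script boils down to roughly the following command (a sketch from my local checkout of scripts/train_cifarnet_on_cifar10.sh; flag values may differ slightly):

# Single-GPU baseline: train CifarNet on CIFAR-10 with TF-Slim's
# train_image_classifier.py (TRAIN_DIR / DATASET_DIR are shell variables
# set earlier in the script).
python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=cifar10 \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=cifarnet \
  --preprocessing_name=cifarnet \
  --max_number_of_steps=100000 \
  --batch_size=128 \
  --optimizer=sgd \
  --learning_rate=0.1 \
  --weight_decay=0.004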
Modify slim/scripts/train_cifarnet_on_cifar10.sh by setting num_clones=4.
Training becomes noticeably slower (I also tried num_preprocessing_threads = 1/2/4/8/16 and num_readers = 4/8, with no effect); the modified command is sketched below the log lines.
INFO:tensorflow:global step 14000: loss = 0.7438 (0.26 sec/step)
INFO:tensorflow:global step 14100: loss = 0.6690 (0.26 sec/step)
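The change is only the extra flags passed to the same command, roughly as follows (the num_preprocessing_threads and num_readers values shown are examples of what I experimented with; none of them helped):

# Multi-GPU run: same command as the baseline above, but with 4 model clones
# (one per GPU) and the reader/preprocessing knobs I tried tuning.
python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=cifar10 \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=cifarnet \
  --preprocessing_name=cifarnet \
  --num_clones=4 \
  --num_preprocessing_threads=8 \
  --num_readers=4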
Hardware
Four Titan X GPUs (lspci output):
02:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
+------------------------------------------------------+
| NVIDIA-SMI 352.30     Driver Version: 352.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:02:00.0     Off |                  N/A |
| 28%   67C    P2    75W / 250W |    228MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
...
The machine has 32 logical processors, each reported as follows:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Finally
Could anyone give some advice?
I have read some of the multi-GPU issues in this repo, but still can't solve this.
I think it is caused by I/O, because I notice that when training on a single GPU the GPU-Util is above 90%, while when training with 4 GPUs it is only about 20%.
I don't think it's due to the hardware performance of my machine.
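For reference, the GPU-Util numbers above were read from nvidia-smi while training was running, e.g.:

# Refresh nvidia-smi once per second during training to watch GPU-Util.
watch -n 1 nvidia-smi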