
[slim] Performance drops when training cifarnet with multiple GPUs #490

@D-X-Y

Description

I want to train cifarnet on a single machine with 4 GPUs, but performance drops compared with training on only one GPU.

[slim] Train cifarnet using the default script slim/scripts/train_cifarnet_on_cifar10.sh

When using the default script, the speed is as follows:

INFO:tensorflow:global step 13900: loss = 0.7609 (0.06 sec/step)

Modify slim/scripts/train_cifarnet_on_cifar10.sh by setting num_clones=4.

The speed becomes much slower (I also tried num_preprocessing_threads = 1/2/4/8/16 and num_readers = 4/8; none of it helped):

INFO:tensorflow:global step 14000: loss = 0.7438 (0.26 sec/step)
INFO:tensorflow:global step 14100: loss = 0.6690 (0.26 sec/step)
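
For reference, the change boils down to passing --num_clones=4 through to train_image_classifier.py. A minimal sketch of the modified invocation (the flag names are real flags of train_image_classifier.py; the other values are just the script's defaults as I recall them):

# slim/scripts/train_cifarnet_on_cifar10.sh, with num_clones raised to 4
python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=cifar10 \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=cifarnet \
  --preprocessing_name=cifarnet \
  --batch_size=128 \
  --num_clones=4 \
  --num_preprocessing_threads=4 \
  --num_readers=4

Note that, if I read model_deploy correctly, each clone dequeues its own full batch, so with batch_size=128 the numbers above work out to 128 / 0.06 ≈ 2133 images/sec on one GPU versus 4 * 128 / 0.26 ≈ 1969 images/sec with four clones: four GPUs moving no more data than one, which is what an input-pipeline bottleneck looks like.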

Hardware

Four Titan X
02:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation Device 17c2 (rev a1)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 352.30                 Driver Version: 352.30                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...    On | 0000:02:00.0     Off |                  N/A |
| 28%   67C    P2    75W / 250W |    228MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
... (rows for the other three GPUs truncated)

32 processors (from /proc/cpuinfo), each as follows:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

Finally

Could anyone give some advice?
I have read some issues about multi-GPU training in this repo, but still can't solve this.
I think it is caused by I/O, because when training on a single GPU the GPU utilization is above 90%, while with 4 GPUs it is only about 20%.
And I don't think it's due to the hardware performance of my machine.
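
One way to confirm the I/O theory (my suggestion, not something from the script): in TF 1.x, tf.train.batch records a fraction_of_<capacity>_full scalar summary for its internal queue, so TensorBoard pointed at the train_dir already shows whether that queue is running dry. A self-contained sketch of the mechanism, with hypothetical stand-ins for the decoded CIFAR-10 example:

import tensorflow as tf  # TF 1.x, as used by the slim scripts

# Hypothetical stand-ins for one decoded, preprocessed CIFAR-10 example;
# in train_image_classifier.py these come from the slim dataset provider.
image = tf.random_uniform([32, 32, 3])
label = tf.constant(0, dtype=tf.int64)

# tf.train.batch builds an internal queue and exports a
# "fraction_of_<capacity>_full" scalar summary for it (visible in
# TensorBoard once queue runners are started in the training session).
images, labels = tf.train.batch(
    [image, label],
    batch_size=128,
    num_threads=8,      # cf. --num_preprocessing_threads
    capacity=5 * 128)

# If that summary sits near 0 while 4 clones train, the readers and
# preprocessing threads cannot feed the GPUs fast enough.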
