Why do some benchmarks not show a speedup when running on multiple devices? #55

gareth-ferneyhough · 2016-11-08T22:21:40Z

I expected to see a speedup when running either an EP (embarrassingly parallel) or TP (true parallel) benchmark on multiple devices, but in most cases I do not.
The Stencil2D benchmark does show a speedup when I use multiple devices:
./shocdriver -d 0 -cuda -s 4 -benchmark Stencil2D
result for stencil: 141.2280 GFLOPS
vs.
./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Stencil2D
result for stencil: 406.1190 GFLOPS

However, this is the only benchmark I have found (so far) that shows a speedup. For example:
./shocdriver -d 0 -cuda -s 4 -benchmark Scan
result for scan: 46.8924 GB/s
vs.
./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Scan
result for scan: 46.8561 GB/s
Reduction and GEMM show no improvement either.
Am I missing something here? I am running version 1.1.5.
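
To put the figures quoted above on a common scale, here is a minimal Python sketch that computes the speedup and parallel efficiency for the two benchmarks. The numbers are copied from the shocdriver output in this post; the helper names `speedup` and `efficiency` are illustrative and are not part of SHOC.

```python
# Figures copied from the shocdriver output quoted in this post:
# (result on 1 device, result on 4 devices).
results = {
    "Stencil2D (GFLOPS)": (141.2280, 406.1190),
    "Scan (GB/s)": (46.8924, 46.8561),
}
NUM_DEVICES = 4  # devices 0,1,2,3 in the multi-device runs


def speedup(single, multi):
    """Ratio of the multi-device result to the single-device result."""
    return multi / single


def efficiency(single, multi, num_devices):
    """Speedup divided by device count; 1.0 would be ideal linear scaling."""
    return speedup(single, multi) / num_devices


for name, (single, multi) in results.items():
    s = speedup(single, multi)
    e = efficiency(single, multi, NUM_DEVICES)
    print(f"{name}: speedup {s:.2f}x, parallel efficiency {e:.0%}")
```

On these numbers Stencil2D scales by a little under 3x across four devices, while the Scan result is essentially unchanged.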


cponder · 2021-01-19T23:06:14Z

I see increased performance with the MaxFlops & QTC benchmarks.

In my runs, at least, the Stencil2D GFLOPS metric holds steady as devices are added, and so do the Scan results.
It may be that the runs on each GPU are done in sequence. In that case the total time grows in proportion to the number of GPUs, and the time-normalized performance metrics average out to the same value.
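
The arithmetic behind that guess can be sketched with a toy model in Python. This is not SHOC's implementation, and whether shocdriver actually runs the devices one after another is not confirmed here. The function name `aggregate_rate` and the work and time values are hypothetical, chosen only to show why a rate metric would stay flat under sequential execution.

```python
def aggregate_rate(work_per_device, time_per_device, num_devices, concurrent):
    """Total work divided by wall-clock time for a toy multi-device run.

    Assumes every device does the same amount of work in the same time.
    If the devices run concurrently, wall time is one device's time;
    if they run one after another, wall time is the sum over devices.
    """
    total_work = work_per_device * num_devices
    if concurrent:
        wall_time = time_per_device
    else:
        wall_time = time_per_device * num_devices
    return total_work / wall_time


# Hypothetical values: each device moves 100 GB in 2 s (50 GB/s per device).
WORK_GB = 100.0
TIME_S = 2.0

for n in (1, 2, 4):
    seq = aggregate_rate(WORK_GB, TIME_S, n, concurrent=False)
    par = aggregate_rate(WORK_GB, TIME_S, n, concurrent=True)
    print(f"{n} device(s): sequential {seq:.1f} GB/s, concurrent {par:.1f} GB/s")
```

In this model the sequential rate is the same for any device count, because work and time both scale with the number of devices, while the concurrent rate grows linearly. A flat Scan result would be consistent with the sequential case, but the model alone cannot tell that apart from other causes.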

cponder mentioned this issue Dec 19, 2020

Where can I download version 1.1.5 ? #70

Open
