Why do some benchmarks not show a speedup when running on multiple devices? #55

Open
gareth-ferneyhough opened this issue Nov 8, 2016 · 1 comment

@gareth-ferneyhough

I expected to see a speedup when running either an EP or a TP benchmark on multiple devices, but for most benchmarks I do not.
The Stencil2D benchmark does show a speedup when I use multiple devices:
./shocdriver -d 0 -cuda -s 4 -benchmark Stencil2D
result for stencil: 141.2280 GFLOPS
vs.
./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Stencil2D
result for stencil: 406.1190 GFLOPS

However, this is the only benchmark I have found so far that shows a speedup. Scan, for example, does not:
./shocdriver -d 0 -cuda -s 4 -benchmark Scan
result for scan: 46.8924 GB/s
vs.
./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Scan
result for scan: 46.8561 GB/s
Reduction and GEMM show no improvement either.
Am I missing something here? I am running SHOC version 1.1.5.
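
To put numbers on the results above, here is a minimal sketch that computes the speedup and the parallel efficiency from the reported figures. The helper names `speedup` and `parallel_efficiency` are made up for illustration and are not part of SHOC; the input values are copied from the runs quoted above.

```python
def speedup(single_device, multi_device):
    """Ratio of the multi-device result to the single-device result."""
    return multi_device / single_device


def parallel_efficiency(single_device, multi_device, n_devices):
    """Speedup divided by the device count (1.0 would be ideal scaling)."""
    return speedup(single_device, multi_device) / n_devices


# Figures copied from the shocdriver runs above (1 device vs. 4 devices).
stencil_speedup = speedup(141.2280, 406.1190)
stencil_eff = parallel_efficiency(141.2280, 406.1190, 4)
scan_speedup = speedup(46.8924, 46.8561)

print(f"Stencil2D: {stencil_speedup:.2f}x speedup, {stencil_eff:.0%} efficiency")
print(f"Scan:      {scan_speedup:.3f}x speedup")
```

By this measure Stencil2D scales to a bit under 3x on four devices, while Scan stays at essentially 1x.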

@cponder

cponder commented Jan 19, 2021

I see increased performance with the MaxFlops and QTC benchmarks.

In my runs, at least, the Stencil2D GFLOPS metric holds steady, and so do the Scan results.
It may be that the runs on each GPU are executed in sequence. In that case the total time grows in proportion to the number of GPUs, and the time-normalized performance metrics come out the same.
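
The arithmetic behind that guess can be sketched with a toy model. This is only an illustration of the hypothesis, not a description of how SHOC actually schedules or reports its runs; the function `aggregate_rate` and the numbers passed to it are hypothetical.

```python
def aggregate_rate(work_per_device, rate_per_device, n_devices, concurrent):
    """Time-normalized rate for n identical devices (toy model, not SHOC code).

    concurrent=True  : all devices run at once, so wall time is one run.
    concurrent=False : devices run one after another, so wall time is n runs.
    """
    time_one_run = work_per_device / rate_per_device
    total_time = time_one_run if concurrent else time_one_run * n_devices
    total_work = work_per_device * n_devices
    return total_work / total_time


# Hypothetical figures: 100 units of work per device at 50 units/s each.
sequential = aggregate_rate(100.0, 50.0, 4, concurrent=False)
overlapped = aggregate_rate(100.0, 50.0, 4, concurrent=True)

print(f"sequential: {sequential} units/s")  # same as a single device
print(f"concurrent: {overlapped} units/s")  # scales with the device count
```

If the per-device runs were serialized, the work and the elapsed time would both scale by the device count, so the reported rate would match the single-device number, which is the pattern seen for Scan.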
