Misc. bug: performance drop with 2x SYCL GPUs #12575

ky438 · 2025-03-25T22:27:44Z

Name and Version

version: 4956 (e2f56017)
built with Intel(R) oneAPI DPC++/C++ Compiler 2025.1.0 (2025.1.0.20250317) for x86_64-unknown-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

bin/llama-bench -m models/llama-2-7b.Q4_0.gguf -mmp 0
bin/llama-bench -m models/llama-2-7b.Q4_0.gguf -mmp 0 -sm none

Problem description & steps to reproduce

With two Intel Arc B580 GPUs instead of one, token generation (tg128) drops from 41.76 t/s to 18.81 t/s and its standard deviation rises from 0.04 to 13.86. Prompt processing (pp512) is only slightly slower:

2x GPUs:

| model                          |       size |     params | backend    | ngl | mmap |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |    0 |         pp512 |      2114.89 ± 19.10 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |    0 |         tg128 |        18.81 ± 13.86 |

1x GPU with -sm none:

| model                          |       size |     params | backend    | ngl |    sm | mmap |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ---: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |    0 |         pp512 |       2233.09 ± 2.90 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |    0 |         tg128 |         41.76 ± 0.04 |

Why is this?
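To put numbers on the gap, here is a minimal sketch that computes the 2x/1x throughput ratio and the coefficient of variation (standard deviation divided by mean) for each test. The figures are copied from the two tables above; the `results` layout and the `summarize` helper are illustrative names, not part of llama-bench.

```python
# Figures copied from the two llama-bench tables above: (mean t/s, std dev).
results = {
    "pp512": {"2x GPU": (2114.89, 19.10), "1x GPU": (2233.09, 2.90)},
    "tg128": {"2x GPU": (18.81, 13.86), "1x GPU": (41.76, 0.04)},
}


def summarize(test):
    """Return the 2x/1x throughput ratio and per-setup coefficient of variation."""
    dual_mean, dual_sd = results[test]["2x GPU"]
    single_mean, single_sd = results[test]["1x GPU"]
    return {
        "ratio": dual_mean / single_mean,      # below 1.0 means 2x GPUs are slower
        "cv_dual": dual_sd / dual_mean,        # relative spread with two GPUs
        "cv_single": single_sd / single_mean,  # relative spread with one GPU
    }


for test in results:
    s = summarize(test)
    print(
        f"{test}: 2x/1x = {s['ratio']:.2f}, "
        f"CV 2x = {s['cv_dual']:.1%}, CV 1x = {s['cv_single']:.1%}"
    )
```

By this measure pp512 keeps about 95% of the single-GPU throughput, while tg128 falls to about 45% with a coefficient of variation near 74%, so the regression is concentrated in token generation.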

First Bad Commit

No response

Relevant log output

NeoZhangJianyu · 2025-03-26T01:16:07Z

Could you share the whole log?

NeoZhangJianyu · 2025-03-26T02:23:48Z

@ky438
I can't see any profile info for this GitHub account.
I see several comments on different PRs/issues created by this account on the same day.

Could you share the background of this issue?

ky438 · 2025-03-26T03:16:43Z

I'm a software engineer working in mechanical engineering, and I am setting up LLMs for local use.

NeoZhangJianyu · 2025-03-26T05:31:18Z

OK! Please provide the whole log for this issue so that we can help you.

ky438 added the bug-unconfirmed label Mar 25, 2025
