
ggml-qnn: refine ggml backend subsystem #216

Merged · 5 commits · May 31, 2024
Conversation

zhouwg (Owner) commented May 31, 2024

This PR is equivalent to a PR in the upstream GGML community:

ggerganov/llama.cpp#7641

Unfortunately, the upstream PR was closed by the maintainer of the ggml backend subsystem almost immediately, less than one minute after I submitted it to upstream llama.cpp.

I completely disagree with what the maintainer of the ggml backend subsystem said (because some special backends only need system memory):

"There are too many things wrong here to list. At the most basic level, this approach will not work because backends typically have a memory that is not accessible from other backends, and when switching to a different backend it is necessary to ensure that all the tensors required to evaluate the graph are available in the backend memory. This is the main job of ggml_backend_sched."
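For reference, here is a minimal sketch of the point under dispute, assuming the ggml-backend C interface of mid-2024 (the `ggml_backend_qnn_*` name is hypothetical, not the PR's actual code): a backend whose tensors live entirely in system memory can report the host (CPU) buffer type as its default, so its data remains accessible to other backends without the copies that `ggml_backend_sched` manages:

```c
#include "ggml-backend.h"

// Hypothetical sketch: a QNN-style backend that keeps all tensors in
// ordinary system memory. Reporting the CPU buffer type as its default
// means every tensor it computes on stays host-accessible, so no
// cross-backend transfer is needed when ops fall back to the CPU.
static ggml_backend_buffer_type_t
ggml_backend_qnn_get_default_buffer_type(ggml_backend_t backend) {
    (void) backend; // unused in this sketch
    return ggml_backend_cpu_buffer_type();
}
```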

zhouwg (Owner, Author) commented May 31, 2024

Whisper, LLM, and MiniCPM-V inference all work fine with this PR (mixed inference between Qualcomm's CPU & GPU / CPU & NPU).

next steps:

(1) bug fixes in the JNI layer; this is not critical during the development stage (there are three known bugs in the JNI layer);

(2) QNN performance fine-tuning, focused on matmul: the performance of QNN's matmul is 2x-10x that of the original GGML matmul, so only matmul is offloaded to the QNN side, while all other GGML ops are still computed by the original GGML on the CPU side (see the sketch after this list).
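A minimal sketch of that split, assuming the mid-2024 ggml-backend `supports_op` hook (the function name and the F32/F16 restriction are assumptions, not the PR's actual code): the backend claims only `GGML_OP_MUL_MAT` and leaves every other op to the original GGML CPU path:

```c
#include <stdbool.h>
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical sketch of selective offload: the backend claims only
// GGML_OP_MUL_MAT, where QNN is reported to be 2x-10x faster; every
// other op is left to the original GGML CPU implementation.
static bool ggml_backend_qnn_supports_op(ggml_backend_t backend,
                                         const struct ggml_tensor * op) {
    (void) backend; // unused in this sketch
    switch (op->op) {
        case GGML_OP_MUL_MAT:
            // Offload matmul to QNN (F32/F16 operands assumed here).
            return op->src[0]->type == GGML_TYPE_F32 ||
                   op->src[0]->type == GGML_TYPE_F16;
        default:
            // Everything else runs on the original GGML CPU backend.
            return false;
    }
}
```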
