
ggml-qnn: refine ggml backend subsystem #216

Merged · 5 commits · May 31, 2024
Conversation

zhouwg (Owner) commented May 31, 2024

This PR is equivalent to a PR in the upstream GGML community:

ggerganov/llama.cpp#7641

Unfortunately, the upstream PR was closed by the maintainer of the ggml backend subsystem almost immediately, less than one minute after I submitted it to upstream llama.cpp.

I completely disagree with what the maintainer of the ggml backend subsystem said (because some special backends only need system memory):

"There are too many things wrong here to list. At the most basic level, this approach will not work because backends typically have a memory that is not accessible from other backends, and when switching to a different backend it is necessary to ensure that all the tensors required to evaluate the graph are available in the backend memory. This is the main job of ggml_backend_sched."
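For reference, here is a minimal sketch of the point under dispute, assuming the ggml-backend C interface of mid-2024 (the `ggml_backend_qnn_*` name is hypothetical, not the PR's actual code): a backend whose tensors live entirely in system memory can report the host (CPU) buffer type as its default, so its data remains accessible to other backends without the copies that `ggml_backend_sched` manages:

```c
#include "ggml-backend.h"

// Hypothetical sketch: a QNN-style backend that keeps all tensors in
// ordinary system memory. Reporting the CPU buffer type as its default
// means every tensor it computes on stays host-accessible, so no
// cross-backend transfer is needed when ops fall back to the CPU.
static ggml_backend_buffer_type_t
ggml_backend_qnn_get_default_buffer_type(ggml_backend_t backend) {
    (void) backend; // unused in this sketch
    return ggml_backend_cpu_buffer_type();
}
```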

zhouwg (Owner, Author) commented May 31, 2024

Whisper, LLM, and MiniCPM-V inference all work fine with this PR (mixed inference between Qualcomm's CPU & GPU / CPU & NPU).

next steps:

(1) bug fixes in the JNI layer; this is not critical during the development stage (there are three known bugs in the JNI layer);

(2) QNN performance fine-tuning, focused on matmul: the performance of QNN's matmul is 2x-10x that of the original GGML matmul, so only matmul is offloaded to the QNN side, while all other GGML ops are still computed by the original GGML on the CPU side (see the sketch after this list).
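A minimal sketch of that split, assuming the mid-2024 ggml-backend `supports_op` hook (the function name and the F32/F16 restriction are assumptions, not the PR's actual code): the backend claims only `GGML_OP_MUL_MAT` and leaves every other op to the original GGML CPU path:

```c
#include <stdbool.h>
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical sketch of selective offload: the backend claims only
// GGML_OP_MUL_MAT, where QNN is reported to be 2x-10x faster; every
// other op is left to the original GGML CPU implementation.
static bool ggml_backend_qnn_supports_op(ggml_backend_t backend,
                                         const struct ggml_tensor * op) {
    (void) backend; // unused in this sketch
    switch (op->op) {
        case GGML_OP_MUL_MAT:
            // Offload matmul to QNN (F32/F16 operands assumed here).
            return op->src[0]->type == GGML_TYPE_F32 ||
                   op->src[0]->type == GGML_TYPE_F16;
        default:
            // Everything else runs on the original GGML CPU backend.
            return false;
    }
}
```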
