common: detect and prefer big cores on AArch64 hybrid CPU on linux #14532
+127
−0
common : improve CPU detection on AArch64 hybrid systems
This PR enhances CPU detection logic for Linux on AArch64 platforms (excluding Android) by adding support for identifying big.LITTLE hybrid architectures. It introduces the following changes:
Features
• Hybrid CPU detection via is_hybrid_cpu(): determines whether the system uses heterogeneous cores by comparing the maximum frequencies of its CPUs.
• Efficiency core check via is_running_on_efficiency_core(): identifies whether the current thread is running on a lower-performance core.
• Big core counting in cpu_count_math_cpus(): identifies high-performance ("big") cores based on frequency and returns their count.
• Diagnostic logging: reports the total number of logical threads (n_threads) and the big core detection results.
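The frequency-based checks listed above could be sketched roughly as follows. This is not the code from the PR: the helper names (read_max_freqs_khz, freqs_look_hybrid, count_big_cores, is_efficiency_core) are hypothetical, and the sketch assumes that each CPU's maximum frequency is read from the Linux sysfs file cpufreq/cpuinfo_max_freq and that a "big" core is one whose maximum frequency equals the highest value seen. The actual patch may use a different source or threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read the max frequency (kHz) of each logical CPU from
// sysfs. A CPU whose cpufreq entry is missing or unreadable is recorded as 0.
inline std::vector<uint64_t> read_max_freqs_khz(unsigned n_cpus) {
    std::vector<uint64_t> freqs;
    for (unsigned i = 0; i < n_cpus; ++i) {
        std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(i) +
                        "/cpufreq/cpuinfo_max_freq");
        uint64_t khz = 0;
        if (!(f >> khz)) {
            khz = 0; // treat unreadable entries as "unknown"
        }
        freqs.push_back(khz);
    }
    return freqs;
}

// Assumption: the system is hybrid if at least two distinct known (non-zero)
// max frequencies are present.
inline bool freqs_look_hybrid(const std::vector<uint64_t> & freqs) {
    uint64_t lo = UINT64_MAX;
    uint64_t hi = 0;
    for (uint64_t f : freqs) {
        if (f == 0) continue; // skip unknown entries
        lo = std::min(lo, f);
        hi = std::max(hi, f);
    }
    return hi != 0 && lo != hi;
}

// Assumption: a "big" core is one whose max frequency equals the highest
// max frequency on the system. Returns 0 when no frequency is known.
inline int count_big_cores(const std::vector<uint64_t> & freqs) {
    uint64_t hi = 0;
    for (uint64_t f : freqs) hi = std::max(hi, f);
    if (hi == 0) return 0;
    return (int) std::count(freqs.begin(), freqs.end(), hi);
}

// Assumption: a CPU is an efficiency core if its known max frequency is below
// the system-wide maximum. On Linux, the index of the CPU the calling thread
// is currently on could be obtained with sched_getcpu().
inline bool is_efficiency_core(const std::vector<uint64_t> & freqs, size_t cpu) {
    if (cpu >= freqs.size() || freqs[cpu] == 0) return false;
    uint64_t hi = 0;
    for (uint64_t f : freqs) hi = std::max(hi, f);
    return freqs[cpu] < hi;
}
```

With illustrative (not measured) frequencies of 1800000 kHz for four cores and 2400000 kHz for four others, freqs_look_hybrid returns true and count_big_cores returns 4. Keeping the classification separate from the sysfs read makes the logic testable without the target hardware.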
Motivation
Some AArch64 systems (e.g., ARMv8 SoCs such as the RK3588) combine high-performance and power-efficient cores. Using all cores indiscriminately can lead to suboptimal performance. This patch aims to:
• Prefer high-performance cores when selecting threads for computation.
• Retain compatibility with existing systems by falling back to physical core count when necessary.
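The fallback behavior described above might look like the following minimal sketch. The function choose_n_threads and its parameters are hypothetical names introduced for illustration, not identifiers from the patch, and the final fallback to std::thread::hardware_concurrency() is an assumption added so the function always returns a usable value.

```cpp
#include <thread>

// Hypothetical helper: prefer the big-core count when detection succeeded,
// otherwise fall back to the physical core count.
inline int choose_n_threads(int n_big_cores, int n_physical_cores) {
    if (n_big_cores > 0) {
        return n_big_cores;      // hybrid system with detected big cores
    }
    if (n_physical_cores > 0) {
        return n_physical_cores; // non-hybrid system or detection failed
    }
    // Assumption: last resort when neither count is known.
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? (int) hw : 1;
}
```

For example, choose_n_threads(4, 8) yields 4, while choose_n_threads(0, 8) yields 8, so systems where no big cores are detected behave as before.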
Testing
• Verified functional behavior on an RK3588 running Ubuntu: throughput for the Qwen3-30bA3b model rose from 3.44 tokens/s to 5.66 tokens/s on CPU only.
• Confirmed llama-bench performance improvements when using only big cores.
• No changes to inference accuracy or perplexity.
• No impact on other architectures or platforms (code is gated behind aarch64 && linux && !ANDROID).
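A compile-time gate of the kind described in the last point is commonly written with the predefined compiler macros __aarch64__, __linux__, and __ANDROID__. The sketch below assumes that form. The macro HYBRID_DETECTION_ENABLED and the function hybrid_detection_compiled_in are hypothetical names, and the patch itself may spell the condition differently.

```cpp
// Hypothetical gate: enable the hybrid-CPU code path only on AArch64 Linux
// builds that do not target Android.
#if defined(__aarch64__) && defined(__linux__) && !defined(__ANDROID__)
#define HYBRID_DETECTION_ENABLED 1
#else
#define HYBRID_DETECTION_ENABLED 0
#endif

// Reports whether the gated code path was compiled into this build.
inline bool hybrid_detection_compiled_in() {
    return HYBRID_DETECTION_ENABLED != 0;
}
```

Because the condition is evaluated by the preprocessor, builds for other architectures or platforms contain none of the detection code.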