@akapoor3518 akapoor3518 commented Oct 3, 2025

POSIX validation with the GGML_PERF compilation flag enabled:

[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a

llama_perf_sampler_print: sampling time = 27.81 ms / 16 runs ( 1.74 ms per token, 575.27 tokens per second)
llama_perf_context_print: load time = 3417.34 ms
llama_perf_context_print: prompt eval time = 2660.66 ms / 6 tokens ( 443.44 ms per token, 2.26 tokens per second)
llama_perf_context_print: eval time = 4752.34 ms / 9 runs ( 528.04 ms per token, 1.89 tokens per second)
llama_perf_context_print: total time = 8200.33 ms / 15 tokens

=== GGML Perf Summary ===
Op            Runs     Total us     Avg us
ADD           2024      3816682    1885.71
MUL           2070      1540061     743.99
RMS_NORM      2070      1727102     834.35
MUL_MAT      36345     55148271    1517.36
CONT          7767       417007      53.69
RESHAPE      11309         6838       0.60
VIEW         17574         2930       0.17
PERMUTE      13831         2767       0.20
TRANSPOSE     3304          738       0.22
GET_ROWS       373         3778      10.13
SET_ROWS      7115         6055       0.85
SOFT_MAX      3604       327305      90.82
ROPE          7761        42109       5.43
GLU           1012      1964278    1940.99
cat

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

POSIX validation with the GGML_PERF_DETAIL compilation flag enabled:

[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 1 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is

llama_perf_sampler_print: sampling time = 2.00 ms / 7 runs ( 0.29 ms per token, 3503.50 tokens per second)
llama_perf_context_print: load time = 2893.11 ms
llama_perf_context_print: prompt eval time = 2260.28 ms / 6 tokens ( 376.71 ms per token, 2.65 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 2895.79 ms / 7 tokens

=== GGML Perf Summary ===
Op          Target     Runs     Total us     Avg us
ADD         OPU          44       163071    3706.16
MUL         OPU          45       137723    3060.51
RMS_NORM    OPU          45        42687     948.60
MUL_MAT     CPU         696      5810462    8348.36
CONT        CPU         173         8377      48.42
RESHAPE     CPU         233          165       0.71
VIEW        CPU         392           36       0.09
PERMUTE     CPU         298           85       0.29
TRANSPOSE   CPU          74           21       0.28
GET_ROWS    CPU           9           67       7.44
SET_ROWS    CPU         173          274       1.58
SOFT_MAX    CPU          88        37450     425.57
ROPE        CPU         174         3023      17.37
GLU         OPU          22       121001    5500.05
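
For anyone comparing runs, the summary rows can be post-processed to see how the total time splits across ops and targets. The following is only a rough sketch, not part of this PR: the helper names (`parse_perf_summary`, `time_share`, `total_by_target`) are made up for illustration, and it assumes each row has the form `<op> [<target>] <runs> <total_us> <avg_us>` as printed above. The sample rows are copied from the GGML_PERF_DETAIL run.

```python
from collections import defaultdict

# A few rows copied from the GGML_PERF_DETAIL summary above.
SAMPLE = """\
=== GGML Perf Summary ===
Op Target Runs Total us Avg us
ADD OPU 44 163071 3706.16
MUL OPU 45 137723 3060.51
RMS_NORM OPU 45 42687 948.60
MUL_MAT CPU 696 5810462 8348.36
"""

def parse_perf_summary(text):
    """Parse summary rows into dicts; the target column is optional."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed row shape: <op> [<target>] <runs> <total_us> <avg_us>
        if len(parts) not in (4, 5):
            continue
        try:
            runs = int(parts[-3])
            total_us = int(parts[-2])
            avg_us = float(parts[-1])
        except ValueError:
            continue  # banner, header, or unrelated line
        target = parts[1] if len(parts) == 5 else None
        rows.append({"op": parts[0], "target": target, "runs": runs,
                     "total_us": total_us, "avg_us": avg_us})
    return rows

def time_share(rows):
    """Fraction of the summed 'Total us' spent in each op."""
    total = sum(r["total_us"] for r in rows)
    return {r["op"]: r["total_us"] / total for r in rows}

def total_by_target(rows):
    """Sum 'Total us' per target (e.g. OPU vs CPU)."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["target"]] += r["total_us"]
    return dict(totals)

rows = parse_perf_summary(SAMPLE)
shares = time_share(rows)
targets = total_by_target(rows)
```

Sorting `shares` by value gives a quick ranking of where the time goes, and `targets` shows the OPU/CPU split when the Target column is present.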

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

686 130.7100 0.1905 21.7140 [ 2.64%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
1372 108.7940 0.0793 108.7940 └─ [ 2.20%] tsi::runtime::executeWithTimeout
686 0.2020 2.94e-04 0.2020 └─ [4.08e-03%] LOAD_BLOB Command Execution
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148032128[0x800...
2 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148054656[0x800...
19 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148033152[0x800...
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148034176[0x800...
19 0.0000 0.0000 0.0000 └─

@atrivedi-tsavoritesi
Approved

@akapoor3518 akapoor3518 merged commit 8cbd5a4 into master Oct 4, 2025