@FIR-1001 - GGML: Tsavorite Performance OPs data #60

akapoor3518 · 2025-10-03T22:21:15Z

POSIX validation with the GGML_PERF compilation flag enabled

[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a

llama_perf_sampler_print: sampling time = 27.81 ms / 16 runs ( 1.74 ms per token, 575.27 tokens per second)
llama_perf_context_print: load time = 3417.34 ms
llama_perf_context_print: prompt eval time = 2660.66 ms / 6 tokens ( 443.44 ms per token, 2.26 tokens per second)
llama_perf_context_print: eval time = 4752.34 ms / 9 runs ( 528.04 ms per token, 1.89 tokens per second)
llama_perf_context_print: total time = 8200.33 ms / 15 tokens
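
The per-token figures in the timing lines above can be cross-checked by hand. The sketch below assumes they are plain ratios of elapsed time to token count. The helper name `per_token_stats` is hypothetical and is not part of llama.cpp. The zero-time guard is also an assumption about how a 0.00 ms measurement ends up reported as `inf` tokens per second.

```python
def per_token_stats(total_ms: float, n: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for one timing line."""
    ms_per_token = total_ms / n
    # Assumption: a zero elapsed time is reported as infinite throughput.
    tokens_per_second = float("inf") if total_ms == 0 else 1e3 * n / total_ms
    return ms_per_token, tokens_per_second


# Values copied from the prompt eval and eval lines of the log above.
prompt = per_token_stats(2660.66, 6)
decode = per_token_stats(4752.34, 9)

print(f"prompt eval: {prompt[0]:.2f} ms/token, {prompt[1]:.2f} tokens/s")
print(f"eval:        {decode[0]:.2f} ms/token, {decode[1]:.2f} tokens/s")
```

Both results should agree with the figures printed in the log, to two decimal places.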

=== GGML Perf Summary ===
Op          Runs    Total us     Avg us
ADD         2024     3816682    1885.71
MUL         2070     1540061     743.99
RMS_NORM    2070     1727102     834.35
MUL_MAT    36345    55148271    1517.36
CONT        7767      417007      53.69
RESHAPE    11309        6838       0.60
VIEW       17574        2930       0.17
PERMUTE    13831        2767       0.20
TRANSPOSE   3304         738       0.22
GET_ROWS     373        3778      10.13
SET_ROWS    7115        6055       0.85
SOFT_MAX    3604      327305      90.82
ROPE        7761       42109       5.43
GLU         1012     1964278    1940.99
cat

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

########
##########
POSIX validation with the GGML_PERF_DETAIL compilation flag enabled
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 1 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is

llama_perf_sampler_print: sampling time = 2.00 ms / 7 runs ( 0.29 ms per token, 3503.50 tokens per second)
llama_perf_context_print: load time = 2893.11 ms
llama_perf_context_print: prompt eval time = 2260.28 ms / 6 tokens ( 376.71 ms per token, 2.65 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 2895.79 ms / 7 tokens

=== GGML Perf Summary ===
Op        Target   Runs    Total us     Avg us
ADD       OPU        44      163071    3706.16
MUL       OPU        45      137723    3060.51
RMS_NORM  OPU        45       42687     948.60
MUL_MAT   CPU       696     5810462    8348.36
CONT      CPU       173        8377      48.42
RESHAPE   CPU       233         165       0.71
VIEW      CPU       392          36       0.09
PERMUTE   CPU       298          85       0.29
TRANSPOSE CPU        74          21       0.28
GET_ROWS  CPU         9          67       7.44
SET_ROWS  CPU       173         274       1.58
SOFT_MAX  CPU        88       37450     425.57
ROPE      CPU       174        3023      17.37
GLU       OPU        22      121001    5500.05
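
To see how the time in the GGML_PERF_DETAIL summary splits between the two targets, the rows can be totalled per target. This is only a rough sketch: the rows are pasted in by hand, the helper name `total_us_by_target` is hypothetical, and it assumes the whitespace-separated column order Op, Target, Runs, Total us, Avg us shown in the header.

```python
from collections import defaultdict

# Rows copied from the GGML_PERF_DETAIL summary above.
PERF_DETAIL = """\
ADD OPU 44 163071 3706.16
MUL OPU 45 137723 3060.51
RMS_NORM OPU 45 42687 948.60
MUL_MAT CPU 696 5810462 8348.36
CONT CPU 173 8377 48.42
RESHAPE CPU 233 165 0.71
VIEW CPU 392 36 0.09
PERMUTE CPU 298 85 0.29
TRANSPOSE CPU 74 21 0.28
GET_ROWS CPU 9 67 7.44
SET_ROWS CPU 173 274 1.58
SOFT_MAX CPU 88 37450 425.57
ROPE CPU 174 3023 17.37
GLU OPU 22 121001 5500.05
"""


def total_us_by_target(text: str) -> dict[str, int]:
    """Sum the 'Total us' column for each target (CPU or OPU)."""
    totals: dict[str, int] = defaultdict(int)
    for line in text.strip().splitlines():
        _op, target, _runs, total_us, _avg_us = line.split()
        totals[target] += int(total_us)
    return dict(totals)


totals = total_us_by_target(PERF_DETAIL)
grand_total = sum(totals.values())
for target, us in sorted(totals.items()):
    print(f"{target}: {us} us ({100 * us / grand_total:.1f}% of measured op time)")
```

The percentages are shares of the summed op times in this table only. They do not account for load time or anything else outside the listed ops.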

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

686 130.7100 0.1905 21.7140 [ 2.64%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
1372 108.7940 0.0793 108.7940 └─ [ 2.20%] tsi::runtime::executeWithTimeout
686 0.2020 2.94e-04 0.2020 └─ [4.08e-03%] LOAD_BLOB Command Execution
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148032128[0x800...
2 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148054656[0x800...
19 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148033152[0x800...
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148034176[0x800...
19 0.0000 0.0000 0.0000 └─

src/llama-context.cpp

atrivedi-tsavoritesi · 2025-10-03T23:45:00Z

Approved

@FIR-1001 - GGML: Tsavorite Performance OPs data

4b00537

akapoor3518 requested review from Nithyanand-G, atrivedi-tsavoritesi, dineshReddy6381, dmpatra, gkethamallax, mikeuhler and mmankal as code owners October 3, 2025 22:21

atrivedi-tsavoritesi reviewed Oct 3, 2025

src/llama-context.cpp

atrivedi-tsavoritesi approved these changes Oct 3, 2025

Extra llama-cpp model related message coming when enable compilation flag GGML_PERF_DETAIL. Fix this issue

b6fd008

akapoor3518 merged commit 8cbd5a4 into master Oct 4, 2025
