
@akapoor3518

POSIX result
[akapoor@wssw01 llama.cpp]$
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a

llama_perf_sampler_print: sampling time = 25.53 ms / 16 runs ( 1.60 ms per token, 626.66 tokens per second)
llama_perf_context_print: load time = 3486.81 ms
llama_perf_context_print: prompt eval time = 2823.51 ms / 6 tokens ( 470.59 ms per token, 2.13 tokens per second)
llama_perf_context_print: eval time = 4913.36 ms / 9 runs ( 545.93 ms per token, 1.83 tokens per second)
llama_perf_context_print: total time = 8428.57 ms / 15 tokens
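The per-token figures in the llama_perf_context_print lines appear to follow directly from the totals: ms per token = total ms / count, and tokens per second = 1000 × count / total ms. The sketch below reproduces them under that assumption; the log itself does not state the formulas. It uses the prompt-eval and eval values from the run above, and any last-digit differences come from the printed totals already being rounded.

```python
def rates(total_ms: float, count: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for one timed span."""
    ms_per_token = total_ms / count
    tokens_per_second = 1000.0 * count / total_ms
    return ms_per_token, tokens_per_second


# Totals copied from the llama_perf_context_print lines above.
prompt = rates(2823.51, 6)  # prompt eval: 6 tokens
decode = rates(4913.36, 9)  # eval: 9 runs

print(f"prompt eval: {prompt[0]:.2f} ms/token, {prompt[1]:.2f} tokens/s")
print(f"eval:        {decode[0]:.2f} ms/token, {decode[1]:.2f} tokens/s")
```

The same check works for the sampler line, although its printed rate differs slightly from the recomputed one because 25.53 ms is itself a rounded value.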

=== GGML Perf Summary ===
Op        Target   Runs   Total us    Avg us
ADD       OPU      2024    5320381   2628.65
MUL       OPU      2070    1538497    743.24
RMS_NORM  OPU      2070    1757942    849.25
MUL_MAT   CPU     36427   55584308   1525.91
CONT      CPU      7723     437475     56.65
RESHAPE   CPU     11372       6246      0.55
VIEW      CPU     17813       2351      0.13
PERMUTE   CPU     13764       3058      0.22
TRANSPOSE CPU      3506        926      0.26
GET_ROWS  CPU       409       1006      2.46
SET_ROWS  CPU      7121       6517      0.92
SOFT_MAX  CPU      3463     330536     95.45
ROPE      CPU      7709      41772      5.42
GLU       OPU      1012    2989455   2954.01
cat
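To see how the work splits between the OPU and the CPU, the GGML Perf Summary can be parsed and its totals summed per target. The sketch below also checks that Avg us equals Total us / Runs. It assumes every row has exactly five whitespace-separated fields, as in the run above. The summed totals (about 68 million us) are much larger than the 8428.57 ms wall-clock total. That suggests, though the log does not confirm it, that op times accumulate across threads or calls, so the percentages are shares of accumulated op time, not of wall-clock time.

```python
from collections import defaultdict

# Rows copied from the GGML Perf Summary above: op, target, runs, total us, avg us.
SUMMARY = """\
ADD OPU 2024 5320381 2628.65
MUL OPU 2070 1538497 743.24
RMS_NORM OPU 2070 1757942 849.25
MUL_MAT CPU 36427 55584308 1525.91
CONT CPU 7723 437475 56.65
RESHAPE CPU 11372 6246 0.55
VIEW CPU 17813 2351 0.13
PERMUTE CPU 13764 3058 0.22
TRANSPOSE CPU 3506 926 0.26
GET_ROWS CPU 409 1006 2.46
SET_ROWS CPU 7121 6517 0.92
SOFT_MAX CPU 3463 330536 95.45
ROPE CPU 7709 41772 5.42
GLU OPU 1012 2989455 2954.01
"""


def parse(text):
    """Split each summary row into (op, target, runs, total_us, avg_us)."""
    rows = []
    for line in text.strip().splitlines():
        op, target, runs, total_us, avg_us = line.split()
        rows.append((op, target, int(runs), int(total_us), float(avg_us)))
    return rows


rows = parse(SUMMARY)

# The printed average should be Total us / Runs, rounded to two decimals.
for op, _, runs, total_us, avg_us in rows:
    assert abs(total_us / runs - avg_us) < 0.01, op

# Accumulate total time per execution target (OPU or CPU).
by_target = defaultdict(int)
for _, target, _, total_us, _ in rows:
    by_target[target] += total_us

grand = sum(by_target.values())
for target, total_us in sorted(by_target.items()):
    print(f"{target}: {total_us} us ({100 * total_us / grand:.1f}% of summed op time)")
```

On these numbers, CPU MUL_MAT alone accounts for most of the summed time. That is consistent with matrix multiplication not yet running on the OPU in this build.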

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

2090 1292.2790 0.6183 117.3370 [12.33%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
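To compare rows of the OPU Profiling Results across runs, each row can be split into fields. The sketch below uses a hypothetical regular expression that assumes the column order shown in the header: Calls, Total(ms), T/call, Self(ms), a bracketed percentage, a bracketed tag, and the function name. The log does not say what the percentage is relative to, so the sketch only parses it and does not interpret it.

```python
import re

# Hypothetical row pattern, assuming the header order:
# Calls  Total(ms)  T/call  Self(ms)  [pct%]  [tag]  Function
ROW = re.compile(
    r"^\s*(?P<calls>\d+)\s+(?P<total_ms>[\d.]+)\s+(?P<per_call_ms>[\d.]+)\s+"
    r"(?P<self_ms>[\d.]+)\s+\[(?P<pct>[\d.]+)%\]\s+\[(?P<tag>[^\]]+)\]\s+(?P<func>\S+)\s*$"
)


def parse_row(line: str) -> dict:
    """Parse one profiler row into typed fields, or raise ValueError."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised profiler row: {line!r}")
    d = m.groupdict()
    return {
        "calls": int(d["calls"]),
        "total_ms": float(d["total_ms"]),
        "per_call_ms": float(d["per_call_ms"]),
        "self_ms": float(d["self_ms"]),
        "pct": float(d["pct"]),
        "tag": d["tag"],
        "func": d["func"],
    }


row = parse_row(
    "2090 1292.2790 0.6183 117.3370 [12.33%] [Thread] tsi::runtime::TsavRTPosix::loadBlob"
)
# T/call should match Total(ms) / Calls to the printed precision.
print(row["func"], row["total_ms"] / row["calls"])
```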


@atrivedi-tsavoritesi left a comment


You need to make changes to ggml-kernel as well.

@akapoor3518 merged commit 272b85c into master on Oct 10, 2025