@akapoor3518

#########
POSIX validation

[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a cat

llama_perf_sampler_print: sampling time = 25.69 ms / 16 runs ( 1.61 ms per token, 622.86 tokens per second)
llama_perf_context_print: load time = 3649.58 ms
llama_perf_context_print: prompt eval time = 2890.32 ms / 6 tokens ( 481.72 ms per token, 2.08 tokens per second)
llama_perf_context_print: eval time = 4057.99 ms / 9 runs ( 450.89 ms per token, 2.22 tokens per second)
llama_perf_context_print: total time = 7737.19 ms / 15 tokens
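
As a sanity check on the timing lines above, the per-token latency and throughput can be recomputed from the raw totals. The sketch below is only an illustration: the regular expression and the helper name `throughput` are assumptions made for this example and are not part of llama.cpp.

```python
import re

# Two timing lines copied verbatim from the log above.
LOG = """\
llama_perf_context_print: prompt eval time = 2890.32 ms / 6 tokens ( 481.72 ms per token, 2.08 tokens per second)
llama_perf_context_print: eval time = 4057.99 ms / 9 runs ( 450.89 ms per token, 2.22 tokens per second)
"""

# Hypothetical pattern: stage name, total milliseconds, and item count.
PATTERN = re.compile(
    r"(?P<stage>[a-z ]+?) time = (?P<ms>[\d.]+) ms / (?P<n>\d+) (?:tokens|runs)"
)


def throughput(log):
    """Return {stage: (ms_per_token, tokens_per_second)} for each matched line."""
    result = {}
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue
        total_ms = float(match["ms"])
        count = int(match["n"])
        result[match["stage"].strip()] = (total_ms / count, 1000.0 * count / total_ms)
    return result


for stage, (ms_per_token, tokens_per_s) in throughput(LOG).items():
    print(f"{stage}: {ms_per_token:.2f} ms per token, {tokens_per_s:.2f} tokens per second")
```

The recomputed figures should agree with the values printed in parentheses in the log, up to rounding.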

=== GGML Perf Summary ===
Op        Target   Runs   Total us    Avg us
ADD       OPU      4048    7484834   1849.02
MUL       OPU      4140    3206684    774.56
RMS_NORM  OPU      4140    3341530    807.13
MUL_MAT   CPU     72790  104568500   1436.58
CONT      CPU     15674     827400     52.79
RESHAPE   CPU     22692      13660      0.60
VIEW      CPU     35074       5018      0.14
PERMUTE   CPU     27582       5844      0.21
TRANSPOSE CPU      6652       1748      0.26
GET_ROWS  CPU       766       2180      2.85
SET_ROWS  CPU     15056      12454      0.83
SOFT_MAX  CPU      7606     648106     85.21
ROPE      CPU     15752      82048      5.21
GLU       OPU      2024    3616792   1786.95
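
To see how the measured op time splits between the CPU and the OPU, the summary rows can be aggregated by target. This is a rough sketch, assuming each data row has exactly five whitespace-separated fields (op, target, runs, total us, avg us); the helper name `total_us_by_target` is made up for this example.

```python
from collections import defaultdict

# Data rows copied from the GGML Perf Summary above (header row omitted).
SUMMARY = """\
ADD OPU 4048 7484834 1849.02
MUL OPU 4140 3206684 774.56
RMS_NORM OPU 4140 3341530 807.13
MUL_MAT CPU 72790 104568500 1436.58
CONT CPU 15674 827400 52.79
RESHAPE CPU 22692 13660 0.60
VIEW CPU 35074 5018 0.14
PERMUTE CPU 27582 5844 0.21
TRANSPOSE CPU 6652 1748 0.26
GET_ROWS CPU 766 2180 2.85
SET_ROWS CPU 15056 12454 0.83
SOFT_MAX CPU 7606 648106 85.21
ROPE CPU 15752 82048 5.21
GLU OPU 2024 3616792 1786.95
"""


def total_us_by_target(summary):
    """Sum the 'Total us' column per target, skipping malformed rows."""
    totals = defaultdict(int)
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # not a data row
        _op, target, _runs, total_us, _avg_us = fields
        totals[target] += int(total_us)
    return dict(totals)


totals = total_us_by_target(SUMMARY)
grand_total = sum(totals.values())
for target, micros in sorted(totals.items()):
    print(f"{target}: {micros} us ({100 * micros / grand_total:.1f}% of summed op time)")
```

Note that this sums per-op totals only; it says nothing about overlap between ops or time spent outside the listed ops.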

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

2090 871.9760 0.4172 107.6070 [ 8.91%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
4180 763.4950 0.1827 763.4950 └─ [ 7.80%] tsi::runtime::executeWithTimeout
2090 0.8740 4.18e-04 0.8740 └─ [8.93e-03%] LOAD_BLOB Command Execution
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148311680[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148324992[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148312704[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148313728[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148314752[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148315776[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148316800[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148317824[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148318848[0x800...

##########
FPGA validation (in progress; logs will be added shortly)

@atrivedi-tsavoritesi

@akapoor3518 please add the logs from POSIX and FPGA.

@akapoor3518 akapoor3518 merged commit c806b46 into master Oct 3, 2025