@FIR1016 - ggml: Release change with new compiler SDK release #62
POSIX result
[akapoor@wssw01 llama.cpp]$
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a
llama_perf_sampler_print: sampling time = 25.53 ms / 16 runs ( 1.60 ms per token, 626.66 tokens per second)
llama_perf_context_print: load time = 3486.81 ms
llama_perf_context_print: prompt eval time = 2823.51 ms / 6 tokens ( 470.59 ms per token, 2.13 tokens per second)
llama_perf_context_print: eval time = 4913.36 ms / 9 runs ( 545.93 ms per token, 1.83 tokens per second)
llama_perf_context_print: total time = 8428.57 ms / 15 tokens
=== GGML Perf Summary ===
Op        Target    Runs   Total us    Avg us
ADD       OPU       2024    5320381   2628.65
MUL       OPU       2070    1538497    743.24
RMS_NORM  OPU       2070    1757942    849.25
MUL_MAT   CPU      36427   55584308   1525.91
CONT      CPU       7723     437475     56.65
RESHAPE   CPU      11372       6246      0.55
VIEW      CPU      17813       2351      0.13
PERMUTE   CPU      13764       3058      0.22
TRANSPOSE CPU       3506        926      0.26
GET_ROWS  CPU        409       1006      2.46
SET_ROWS  CPU       7121       6517      0.92
SOFT_MAX  CPU       3463     330536     95.45
ROPE      CPU       7709      41772      5.42
GLU       OPU       1012    2989455   2954.01
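
For anyone who wants to compare OPU and CPU time from the summary above, here is a minimal Python sketch. It is not part of ggml or of this PR. The `ROWS` list is copied by hand from the table, and the helper names `total_us_by_target` and `avg_us` are hypothetical. It assumes that `Avg us` is simply `Total us` divided by `Runs`, which is consistent with the printed rows.

```python
# Rows copied by hand from the GGML Perf Summary: (op, target, runs, total_us).
ROWS = [
    ("ADD", "OPU", 2024, 5320381),
    ("MUL", "OPU", 2070, 1538497),
    ("RMS_NORM", "OPU", 2070, 1757942),
    ("MUL_MAT", "CPU", 36427, 55584308),
    ("CONT", "CPU", 7723, 437475),
    ("RESHAPE", "CPU", 11372, 6246),
    ("VIEW", "CPU", 17813, 2351),
    ("PERMUTE", "CPU", 13764, 3058),
    ("TRANSPOSE", "CPU", 3506, 926),
    ("GET_ROWS", "CPU", 409, 1006),
    ("SET_ROWS", "CPU", 7121, 6517),
    ("SOFT_MAX", "CPU", 3463, 330536),
    ("ROPE", "CPU", 7709, 41772),
    ("GLU", "OPU", 1012, 2989455),
]


def total_us_by_target(rows):
    """Sum the 'Total us' column per target (hypothetical helper)."""
    totals = {}
    for _op, target, _runs, total_us in rows:
        totals[target] = totals.get(target, 0) + total_us
    return totals


def avg_us(runs, total_us):
    """Recompute 'Avg us', assuming it is total / runs rounded to 2 places."""
    return round(total_us / runs, 2)


if __name__ == "__main__":
    totals = total_us_by_target(ROWS)
    grand_total = sum(totals.values())
    for target, us in sorted(totals.items()):
        share = 100 * us / grand_total
        print(f"{target}: {us} us ({share:.1f}% of op time)")
```

Running the script prints one line per target with its summed time and share of total op time. The per-op totals are wall-clock sums reported by the perf summary, so the shares are only a rough guide to where time goes.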
cat
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
2090 1292.2790 0.6183 117.3370 [12.33%] [Thread] tsi::runtime::TsavRTPosix::loadBlob