@FIR-1001 - GGML: Tsavorite Performance OPs data #60
POSIX validation with the GGML_PERF compilation flag enabled
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a
llama_perf_sampler_print: sampling time = 27.81 ms / 16 runs ( 1.74 ms per token, 575.27 tokens per second)
llama_perf_context_print: load time = 3417.34 ms
llama_perf_context_print: prompt eval time = 2660.66 ms / 6 tokens ( 443.44 ms per token, 2.26 tokens per second)
llama_perf_context_print: eval time = 4752.34 ms / 9 runs ( 528.04 ms per token, 1.89 tokens per second)
llama_perf_context_print: total time = 8200.33 ms / 15 tokens
=== GGML Perf Summary ===
Op          Runs   Total us    Avg us
ADD         2024    3816682   1885.71
MUL         2070    1540061    743.99
RMS_NORM    2070    1727102    834.35
MUL_MAT    36345   55148271   1517.36
CONT        7767     417007     53.69
RESHAPE    11309       6838      0.60
VIEW       17574       2930      0.17
PERMUTE    13831       2767      0.20
TRANSPOSE   3304        738      0.22
GET_ROWS     373       3778     10.13
SET_ROWS    7115       6055      0.85
SOFT_MAX    3604     327305     90.82
ROPE        7761      42109      5.43
GLU         1012    1964278   1940.99
cat
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
########
##########
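For reviewers who want to see which ops dominate, the GGML_PERF summary above can be post-processed with a few lines of Python. This is only a sketch: `SUMMARY`, `parse_summary`, and `time_shares` are hypothetical names introduced here and are not part of ggml or of this PR. The rows are copied from the summary above, and the script assumes each row has exactly four whitespace-separated fields (op, runs, total us, avg us).

```python
# Rows copied from the GGML_PERF summary above: op, runs, total_us, avg_us.
SUMMARY = """\
ADD 2024 3816682 1885.71
MUL 2070 1540061 743.99
RMS_NORM 2070 1727102 834.35
MUL_MAT 36345 55148271 1517.36
CONT 7767 417007 53.69
RESHAPE 11309 6838 0.60
VIEW 17574 2930 0.17
PERMUTE 13831 2767 0.20
TRANSPOSE 3304 738 0.22
GET_ROWS 373 3778 10.13
SET_ROWS 7115 6055 0.85
SOFT_MAX 3604 327305 90.82
ROPE 7761 42109 5.43
GLU 1012 1964278 1940.99
"""


def parse_summary(text):
    """Return {op: (runs, total_us)} for every four-field row in text."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip headers, blank lines, anything unexpected
        op, runs, total_us, _avg_us = parts
        rows[op] = (int(runs), int(total_us))
    return rows


def time_shares(rows):
    """Return {op: fraction of the summed 'Total us' column}."""
    grand_total = sum(total_us for _runs, total_us in rows.values())
    return {op: total_us / grand_total for op, (_runs, total_us) in rows.items()}


rows = parse_summary(SUMMARY)
shares = time_shares(rows)

# Print the three ops with the largest share of accumulated op time.
top = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]
for op, share in top:
    print(f"{op:<10} {share:6.1%}")
```

On this run MUL_MAT accounts for the large majority of the accumulated op time, which makes it the first place to look when comparing builds.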
POSIX validation with the GGML_PERF_DETAIL compilation flag enabled
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 1 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is
llama_perf_sampler_print: sampling time = 2.00 ms / 7 runs ( 0.29 ms per token, 3503.50 tokens per second)
llama_perf_context_print: load time = 2893.11 ms
llama_perf_context_print: prompt eval time = 2260.28 ms / 6 tokens ( 376.71 ms per token, 2.65 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 2895.79 ms / 7 tokens
=== GGML Perf Summary ===
Op        Target  Runs   Total us    Avg us
ADD       OPU       44     163071   3706.16
MUL       OPU       45     137723   3060.51
RMS_NORM  OPU       45      42687    948.60
MUL_MAT   CPU      696    5810462   8348.36
CONT      CPU      173       8377     48.42
RESHAPE   CPU      233        165      0.71
VIEW      CPU      392         36      0.09
PERMUTE   CPU      298         85      0.29
TRANSPOSE CPU       74         21      0.28
GET_ROWS  CPU        9         67      7.44
SET_ROWS  CPU      173        274      1.58
SOFT_MAX  CPU       88      37450    425.57
ROPE      CPU      174       3023     17.37
GLU       OPU       22     121001   5500.05
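The Target column added by GGML_PERF_DETAIL makes it possible to total the time spent on each backend. The following is a minimal sketch of that aggregation, assuming five whitespace-separated fields per row (op, target, runs, total us, avg us); `DETAIL` and `totals_by_target` are hypothetical names used only for illustration, and the rows are copied from the summary above.

```python
from collections import defaultdict

# Rows copied from the GGML_PERF_DETAIL summary above:
# op, target, runs, total_us, avg_us.
DETAIL = """\
ADD OPU 44 163071 3706.16
MUL OPU 45 137723 3060.51
RMS_NORM OPU 45 42687 948.60
MUL_MAT CPU 696 5810462 8348.36
CONT CPU 173 8377 48.42
RESHAPE CPU 233 165 0.71
VIEW CPU 392 36 0.09
PERMUTE CPU 298 85 0.29
TRANSPOSE CPU 74 21 0.28
GET_ROWS CPU 9 67 7.44
SET_ROWS CPU 173 274 1.58
SOFT_MAX CPU 88 37450 425.57
ROPE CPU 174 3023 17.37
GLU OPU 22 121001 5500.05
"""


def totals_by_target(text):
    """Sum the 'Total us' column per target (e.g. OPU, CPU)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip headers, blank lines, anything unexpected
        _op, target, _runs, total_us, _avg_us = parts
        totals[target] += int(total_us)
    return dict(totals)


totals = totals_by_target(DETAIL)
for target, total_us in sorted(totals.items()):
    print(f"{target}: {total_us / 1000:.1f} ms")
```

In this run, the CPU total is dominated by MUL_MAT, while the OPU total comes from ADD, MUL, RMS_NORM, and GLU.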
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
686 130.7100 0.1905 21.7140 [ 2.64%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
1372 108.7940 0.0793 108.7940 └─ [ 2.20%] tsi::runtime::executeWithTimeout
686 0.2020 2.94e-04 0.2020 └─ [4.08e-03%] LOAD_BLOB Command Execution
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148032128[0x800...
2 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148054656[0x800...
19 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148033152[0x800...
13 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148034176[0x800...
19 0.0000 0.0000 0.0000 └─