@FIR-810: Update llama.cpp and ggml-tsi-kernel to use sdk-0.1.6 #39
The test results are as follows:
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv32_06_20_2025/bin# ./run_llama_cli.sh
my cat's name is Luna.
llama_perf_sampler_print: sampling time = 112.13 ms / 11 runs ( 10.19 ms per token, 98.10 tokens per second)
llama_perf_context_print: load time = 94205.21 ms
llama_perf_context_print: prompt eval time = 71666.10 ms / 6 tokens (11944.35 ms per token, 0.08 tokens per second)
llama_perf_context_print: eval time = 133802.66 ms / 4 runs (33450.66 ms per token, 0.03 tokens per second)
llama_perf_context_print: total time = 228145.18 ms / 10 tokens
=== GGML Perf Summary ===
Op             :  Runs     Total us       Avg us
ADD            :   220     11355075     51613.98
  [TSAVORITE ] :   220     11355075     51613.98
MUL            :   335     24230290     72329.22
  [TSAVORITE ] :   335     24230290     72329.22
RMS_NORM       :   750        57746        76.99
  [CPU ]       :   750        57746        76.99
MUL_MAT        :  3437    408046717    118721.77
  [CPU ]       :  3437    408046717    118721.77
CPY            :   662        25750        38.90
  [CPU ]       :   662        25750        38.90
CONT           :   245         2654        10.83
  [CPU ]       :   245         2654        10.83
RESHAPE        :   795         9050        11.38
  [CPU ]       :   795         9050        11.38
VIEW           :   714         1182         1.66
  [CPU ]       :   714         1182         1.66
PERMUTE        :   727         1128         1.55
  [CPU ]       :   727         1128         1.55
TRANSPOSE      :   187          491         2.63
  [CPU ]       :   187          491         2.63
GET_ROWS       :    49        54016      1102.37
  [CPU ]       :    49        54016      1102.37
SOFT_MAX       :   320        67989       212.47
  [CPU ]       :   320        67989       212.47
ROPE           :   756        63382        83.84
  [CPU ]       :   756        63382        83.84
UNARY          :   110     14492686    131751.69
  [TSAVORITE ] :   110     14492686    131751.69
    -> SILU    :   110     14492686    131751.69
GGML Tsavorite Profiling Results:
Calls   Total(ms)  T/call    Self(ms)  Function
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
65680  18552.9420  0.2825      0.0000  [ 8.89%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
65680    1.95e+05  2.9660    1.95e+05    └─ [93.39%] TXE 0 Idle
33000   4778.1774  0.1448   4778.1774    └─ [ 2.29%] [ txe_mult ]
18920   2787.2660  0.1473   2787.2660    └─ [ 1.34%] [ txe_silu ]
13760   1998.9862  0.1453   1998.9862    └─ [9.58e-01%] [ txe_add ]
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
65680   7230.2060  0.1101   6762.2310  [ 3.47%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
65680    467.9750  0.0071    467.9750    └─ [2.24e-01%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
65680  18425.6450  0.2805   1552.1490  [ 8.83%] [Thread] tsi::runtime::TsavRT::processResponses
65680  16873.4960  0.2569  16873.4960    └─ [ 8.09%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
65681    654.6860  0.0100    654.6860  [3.14e-01%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)
65680   9682.9330  0.1474   9682.9330  [ 4.64%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
65680   2237.1790  0.0341   2237.1790  [ 1.07%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)
65680   4743.3790  0.0722   4743.3790  [ 2.27%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
65680    665.7930  0.0101    665.7930  [3.19e-01%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
Counter Metrics:
Metric                Min     Max     Avg
Queue_0_Occupancy  0.0000  1.0000  0.9988
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv32_06_20_2025/bin#
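
To make the GGML Perf Summary above easier to compare across SDK versions, here is a minimal Python sketch that computes each op's share of the summed op time. The helper names `parse_ops` and `time_shares` are hypothetical and not part of llama.cpp or the TSI SDK; the rows are copied from the op-level lines of this run. With these numbers, MUL_MAT (which ran on the CPU backend here) accounts for roughly 89% of the summed op time.

```python
import re

# Op-level rows copied from the "GGML Perf Summary" of this run.
# Columns: op name, runs, total microseconds, average microseconds.
SUMMARY = """\
ADD : 220 11355075 51613.98
MUL : 335 24230290 72329.22
RMS_NORM : 750 57746 76.99
MUL_MAT : 3437 408046717 118721.77
CPY : 662 25750 38.90
CONT : 245 2654 10.83
RESHAPE : 795 9050 11.38
VIEW : 714 1182 1.66
PERMUTE : 727 1128 1.55
TRANSPOSE : 187 491 2.63
GET_ROWS : 49 54016 1102.37
SOFT_MAX : 320 67989 212.47
ROPE : 756 63382 83.84
UNARY : 110 14492686 131751.69
"""

# Matches op-level rows only; backend rows ("[CPU ] : ...") and
# sub-op rows ("-> SILU : ...") do not start with a word character.
ROW = re.compile(r"^(\w+)\s*:\s*(\d+)\s+(\d+)\s+([\d.]+)$")


def parse_ops(text):
    """Return {op_name: total_us} for every op-level row in text."""
    ops = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            ops[match.group(1)] = int(match.group(3))
    return ops


def time_shares(ops):
    """Return {op_name: fraction of the summed op time}."""
    total = sum(ops.values())
    return {name: us / total for name, us in ops.items()}


if __name__ == "__main__":
    shares = time_shares(parse_ops(SUMMARY))
    # Print ops from most to least expensive.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:<10} {share:7.2%}")
```

The shares are relative to the sum of the per-op totals in the summary, not to the wall-clock total reported by `llama_perf_context_print`, so they are best used to compare runs against each other.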