@FIR-895 - llama.cpp: updating the MLIR SDK Version to 1.8 #41
#######
FPGA log for the llama-cli run command with MLIR SDK version 1.8
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin# ls *sh
run_llama_cli.sh tnApcMgr_run.sh tsi_shutdown.sh
run_platform_test.sh tsi_env.sh
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin# ./run_llama_cli.sh
is Luna.
llama_perf_sampler_print: sampling time = 111.02 ms / 11 runs ( 10.09 ms per token, 99.08 tokens per second)
llama_perf_context_print: load time = 77720.19 ms
llama_perf_context_print: prompt eval time = 64469.37 ms / 6 tokens (10744.89 ms per token, 0.09 tokens per second)
llama_perf_context_print: eval time = 74661.68 ms / 4 runs (18665.42 ms per token, 0.05 tokens per second)
llama_perf_context_print: total time = 152525.13 ms / 10 tokens
=== GGML Perf Summary ===
Op : Runs Total us Avg us
ADD : 220 9480048 43091.13
[TSAVORITE ] : 220 9480048 43091.13
MUL : 335 20925728 62464.86
[TSAVORITE ] : 335 20925728 62464.86
RMS_NORM : 723 55503 76.77
[CPU ] : 723 55503 76.77
MUL_MAT : 3452 270573306 78381.61
[CPU ] : 3452 270573306 78381.61
CPY : 633 32637 51.56
[CPU ] : 633 32637 51.56
CONT : 264 2510 9.51
[CPU ] : 264 2510 9.51
RESHAPE : 803 8719 10.86
[CPU ] : 803 8719 10.86
VIEW : 739 1058 1.43
[CPU ] : 739 1058 1.43
PERMUTE : 716 987 1.38
[CPU ] : 716 987 1.38
TRANSPOSE : 176 371 2.11
[CPU ] : 176 371 2.11
GET_ROWS : 48 22389 466.44
[CPU ] : 48 22389 466.44
SOFT_MAX : 318 65230 205.13
[CPU ] : 318 65230 205.13
ROPE : 742 69131 93.17
[CPU ] : 742 69131 93.17
UNARY : 110 12170489 110640.81
[TSAVORITE ] : 110 12170489 110640.81
-> SILU : 110 12170489 110640.81
GGML Tsavorite Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
65680 12846.4770 0.1956 0.0000 [ 9.03%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
65680 2.67e+05 4.0655 2.67e+05 └─ [187.59%] TXE 0 Idle
33000 5242.8418 0.1589 5242.8418 └─ [ 3.68%] [ txe_mult ]
18920 3041.5815 0.1608 3041.5815 └─ [ 2.14%] [ txe_silu ]
13760 2190.1478 0.1592 2190.1478 └─ [ 1.54%] [ txe_add ]
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
65680 7172.3360 0.1092 6698.2220 [ 5.04%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
65680 474.1140 0.0072 474.1140 └─ [3.33e-01%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
65680 12796.2930 0.1948 1650.2110 [ 8.99%] [Thread] tsi::runtime::TsavRT::processResponses
65680 11146.0820 0.1697 11146.0820 └─ [ 7.83%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
65681 637.6180 0.0097 637.6180 [4.48e-01%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)
65680 8800.0570 0.1340 8800.0570 [ 6.18%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
65680 2210.1040 0.0336 2210.1040 [ 1.55%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)
65680 4751.4070 0.0723 4751.4070 [ 3.34%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
65680 655.7580 0.0100 655.7580 [4.61e-01%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.9539
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin#
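
To make the numbers in the log above easier to compare across SDK versions, here is a minimal Python sketch that sums the GGML Perf Summary per backend and recomputes per-token throughput from the `llama_perf_context_print` lines. It is not part of llama.cpp or the TSI tooling: the function names `backend_totals` and `throughput` and the regular expressions are assumptions for illustration, and they are only matched against the line layout shown in this log.

```python
import re
from collections import defaultdict

# Rows copied verbatim from the log above.
PERF_SUMMARY = """\
[TSAVORITE ] : 220 9480048 43091.13
[TSAVORITE ] : 335 20925728 62464.86
[CPU ] : 723 55503 76.77
[CPU ] : 3452 270573306 78381.61
[CPU ] : 633 32637 51.56
[CPU ] : 264 2510 9.51
[CPU ] : 803 8719 10.86
[CPU ] : 739 1058 1.43
[CPU ] : 716 987 1.38
[CPU ] : 176 371 2.11
[CPU ] : 48 22389 466.44
[CPU ] : 318 65230 205.13
[CPU ] : 742 69131 93.17
[TSAVORITE ] : 110 12170489 110640.81
"""

TIMINGS = """\
llama_perf_context_print: prompt eval time = 64469.37 ms / 6 tokens (10744.89 ms per token, 0.09 tokens per second)
llama_perf_context_print: eval time = 74661.68 ms / 4 runs (18665.42 ms per token, 0.05 tokens per second)
"""

# Hypothetical patterns, written only for the layout seen in this log.
BACKEND_ROW = re.compile(r"^\[(\w+)\s*\]\s*:\s*(\d+)\s+(\d+)\s+[\d.]+$")
TIMING_ROW = re.compile(
    r"(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) (?:tokens|runs)"
)


def backend_totals(summary):
    """Sum the 'Total us' column for each backend tag, e.g. CPU or TSAVORITE."""
    totals = defaultdict(int)
    for line in summary.splitlines():
        match = BACKEND_ROW.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(3))
    return dict(totals)


def throughput(timings):
    """Return {phase: (ms per token, tokens per second)} for each timing line."""
    result = {}
    for match in TIMING_ROW.finditer(timings):
        phase = match.group(1)
        total_ms = float(match.group(2))
        count = int(match.group(3))
        result[phase] = (total_ms / count, count / (total_ms / 1000.0))
    return result


if __name__ == "__main__":
    totals = backend_totals(PERF_SUMMARY)
    grand_total = sum(totals.values())
    for backend, micros in sorted(totals.items()):
        share = 100.0 * micros / grand_total
        print(f"{backend:10s} {micros / 1e6:9.2f} s {share:5.1f}%")
    for phase, (ms_per_token, tokens_per_s) in throughput(TIMINGS).items():
        print(f"{phase:12s} {ms_per_token:10.2f} ms/token {tokens_per_s:.4f} tok/s")
```

On this run the summed op time is about 13.6% on TSAVORITE (ADD, MUL, SILU) and about 86.4% on CPU, almost all of it MUL_MAT. These are sums of per-op totals from the summary, so they may not correspond exactly to wall-clock time.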