@akapoor3518 akapoor3518 commented Aug 15, 2025

#######
FPGA log for the llama-cli run command with MLIR version 1.8
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin# ls *sh
run_llama_cli.sh tnApcMgr_run.sh tsi_shutdown.sh
run_platform_test.sh tsi_env.sh
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin# ./run_llama_cli.sh
is Luna.

llama_perf_sampler_print: sampling time = 111.02 ms / 11 runs ( 10.09 ms per token, 99.08 tokens per second)

llama_perf_context_print: load time = 77720.19 ms
llama_perf_context_print: prompt eval time = 64469.37 ms / 6 tokens (10744.89 ms per token, 0.09 tokens per second)
llama_perf_context_print: eval time = 74661.68 ms / 4 runs (18665.42 ms per token, 0.05 tokens per second)
llama_perf_context_print: total time = 152525.13 ms / 10 tokens
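
For anyone cross-checking the numbers above, the per-token figures appear to follow directly from the totals: milliseconds per token is the phase time divided by the token (or run) count, and tokens per second is 1000 divided by that. A minimal sketch in Python, assuming that relationship; the helper name `per_token_stats` is hypothetical and is not part of llama.cpp:

```python
# Figures copied from the llama_perf_context_print lines above.
prompt_eval_ms, prompt_tokens = 64469.37, 6
eval_ms, eval_runs = 74661.68, 4


def per_token_stats(total_ms, count):
    """Return (ms per token, tokens per second) for one phase.

    Hypothetical helper for illustration only.
    """
    ms_per_token = total_ms / count
    tokens_per_second = 1000.0 / ms_per_token
    return ms_per_token, tokens_per_second


prompt_stats = per_token_stats(prompt_eval_ms, prompt_tokens)
eval_stats = per_token_stats(eval_ms, eval_runs)

print(f"prompt eval: {prompt_stats[0]:.2f} ms/token, {prompt_stats[1]:.2f} tokens/s")
print(f"eval:        {eval_stats[0]:.2f} ms/token, {eval_stats[1]:.2f} tokens/s")
```

Both phases come out well under one token per second, which matches the 0.09 and 0.05 tokens per second reported in the log.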

=== GGML Perf Summary ===
Op : Runs Total us Avg us
ADD : 220 9480048 43091.13
[TSAVORITE ] : 220 9480048 43091.13
MUL : 335 20925728 62464.86
[TSAVORITE ] : 335 20925728 62464.86
RMS_NORM : 723 55503 76.77
[CPU ] : 723 55503 76.77
MUL_MAT : 3452 270573306 78381.61
[CPU ] : 3452 270573306 78381.61
CPY : 633 32637 51.56
[CPU ] : 633 32637 51.56
CONT : 264 2510 9.51
[CPU ] : 264 2510 9.51
RESHAPE : 803 8719 10.86
[CPU ] : 803 8719 10.86
VIEW : 739 1058 1.43
[CPU ] : 739 1058 1.43
PERMUTE : 716 987 1.38
[CPU ] : 716 987 1.38
TRANSPOSE : 176 371 2.11
[CPU ] : 176 371 2.11
GET_ROWS : 48 22389 466.44
[CPU ] : 48 22389 466.44
SOFT_MAX : 318 65230 205.13
[CPU ] : 318 65230 205.13
ROPE : 742 69131 93.17
[CPU ] : 742 69131 93.17
UNARY : 110 12170489 110640.81
[TSAVORITE ] : 110 12170489 110640.81
-> SILU : 110 12170489 110640.81
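
To see how the time in the summary above splits between backends, the bracketed per-backend rows can be summed. A rough sketch in Python, assuming the line layout shown in this log (`[BACKEND ] : runs total_us avg_us`); the function name `total_us_by_backend` and the regular expression are illustrative assumptions, not part of ggml, and the sample holds only a few of the rows above:

```python
import re
from collections import defaultdict

# Matches backend rows such as "[TSAVORITE ] : 220 9480048 43091.13".
# The layout is assumed from the log above, not from a documented format.
BACKEND_LINE = re.compile(r"^\[(\w+)\s*\]\s*:\s*(\d+)\s+(\d+)\s+([\d.]+)$")


def total_us_by_backend(summary_text):
    """Sum the 'Total us' column per backend, ignoring op and sub-op rows."""
    totals = defaultdict(int)
    for line in summary_text.splitlines():
        match = BACKEND_LINE.match(line.strip())
        if match:
            backend, _runs, total_us, _avg_us = match.groups()
            totals[backend] += int(total_us)
    return dict(totals)


# A subset of the rows above, copied verbatim.
SAMPLE = """\
ADD : 220 9480048 43091.13
[TSAVORITE ] : 220 9480048 43091.13
MUL : 335 20925728 62464.86
[TSAVORITE ] : 335 20925728 62464.86
MUL_MAT : 3452 270573306 78381.61
[CPU ] : 3452 270573306 78381.61
UNARY : 110 12170489 110640.81
[TSAVORITE ] : 110 12170489 110640.81
-> SILU : 110 12170489 110640.81
"""

totals = total_us_by_backend(SAMPLE)
grand_total = sum(totals.values())
for backend, total_us in sorted(totals.items()):
    print(f"{backend}: {total_us} us ({100 * total_us / grand_total:.1f}%)")
```

Only the bracketed rows are counted, so the op rows and the `-> SILU` breakdown are not double-counted. On these rows, MUL_MAT on the CPU accounts for most of the measured time.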

GGML Tsavorite Profiling Results:

Calls Total(ms) T/call Self(ms) Function

1   138.8340  138.8340     33.5960  [9.75e-02%] [Thread] GGML Tsavorite 
1   105.2380  105.2380     85.3620  └─ [7.39e-02%] tsi::runtime::TsavRTFPGA::initialize
1     8.5770    8.5770      8.5770    └─ [6.03e-03%] tsi::runtime::TsavRTFPGA::initializeQueues
1     8.1190    8.1190      8.1190    └─ [5.70e-03%] tsi::runtime::TsavRT::initialize
1     3.1800    3.1800      2.6980    └─ [2.23e-03%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
2     0.4820    0.2410      0.4820      └─ [3.39e-04%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

65680 12846.4770 0.1956 0.0000 [ 9.03%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
65680 2.67e+05 4.0655 2.67e+05 └─ [187.59%] TXE 0 Idle
33000 5242.8418 0.1589 5242.8418 └─ [ 3.68%] [ txe_mult ]
18920 3041.5815 0.1608 3041.5815 └─ [ 2.14%] [ txe_silu ]
13760 2190.1478 0.1592 2190.1478 └─ [ 1.54%] [ txe_add ]

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

65680 7172.3360 0.1092 6698.2220 [ 5.04%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
65680 474.1140 0.0072 474.1140 └─ [3.33e-01%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

65680 12796.2930 0.1948 1650.2110 [ 8.99%] [Thread] tsi::runtime::TsavRT::processResponses
65680 11146.0820 0.1697 11146.0820 └─ [ 7.83%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)

1    54.0620   54.0620     43.8350  [3.80e-02%] [Thread] tsi::runtime::TsavRTFPGA::finalize
1    10.2270   10.2270     10.2270  └─ [7.18e-03%] tsi::runtime::TsavRTFPGA::releaseTxes

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

65681 637.6180 0.0097 637.6180 [4.48e-01%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)

65680 8800.0570 0.1340 8800.0570 [ 6.18%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

65680 2210.1040 0.0336 2210.1040 [ 1.55%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)

65680 4751.4070 0.0723 4751.4070 [ 3.34%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

65680 655.7580 0.0100 655.7580 [4.61e-01%] [Thread] tsi::runtime::TsavRT::deallocate

-   1.42e+05    0.0000    1.42e+05  [100.00%] TOTAL

========================================================================================================================

Counter Metrics:

Metric Min Max Avg

Queue_0_Occupancy 0.0000 1.0000 0.9539

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv34_08_14_2025/bin#

@atrivedi-tsavoritesi
@akapoor3518 did you have to update the submodule?

@akapoor3518 akapoor3518 merged commit 61fb7ab into master Aug 15, 2025