@atrivedi-tsavoritesi

The test results are as follows:

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv32_06_20_2025/bin# ./run_llama_cli.sh
my cat's name is Luna.

llama_perf_sampler_print: sampling time = 112.13 ms / 11 runs ( 10.19 ms per token, 98.10 tokens per second)
llama_perf_context_print: load time = 94205.21 ms
llama_perf_context_print: prompt eval time = 71666.10 ms / 6 tokens (11944.35 ms per token, 0.08 tokens per second)
llama_perf_context_print: eval time = 133802.66 ms / 4 runs (33450.66 ms per token, 0.03 tokens per second)
llama_perf_context_print: total time = 228145.18 ms / 10 tokens

=== GGML Perf Summary ===
Op            :  Runs    Total us      Avg us
ADD           :   220    11355075    51613.98
[TSAVORITE ]  :   220    11355075    51613.98
MUL           :   335    24230290    72329.22
[TSAVORITE ]  :   335    24230290    72329.22
RMS_NORM      :   750       57746       76.99
[CPU ]        :   750       57746       76.99
MUL_MAT       :  3437   408046717   118721.77
[CPU ]        :  3437   408046717   118721.77
CPY           :   662       25750       38.90
[CPU ]        :   662       25750       38.90
CONT          :   245        2654       10.83
[CPU ]        :   245        2654       10.83
RESHAPE       :   795        9050       11.38
[CPU ]        :   795        9050       11.38
VIEW          :   714        1182        1.66
[CPU ]        :   714        1182        1.66
PERMUTE       :   727        1128        1.55
[CPU ]        :   727        1128        1.55
TRANSPOSE     :   187         491        2.63
[CPU ]        :   187         491        2.63
GET_ROWS      :    49       54016     1102.37
[CPU ]        :    49       54016     1102.37
SOFT_MAX      :   320       67989      212.47
[CPU ]        :   320       67989      212.47
ROPE          :   756       63382       83.84
[CPU ]        :   756       63382       83.84
UNARY         :   110    14492686   131751.69
[TSAVORITE ]  :   110    14492686   131751.69
-> SILU       :   110    14492686   131751.69

GGML Tsavorite Profiling Results:

Calls   Total(ms)    T/call    Self(ms)  Function

    1    140.6560  140.6560     44.7560  [6.74e-02%] [Thread] GGML Tsavorite
    1     95.9000   95.9000     76.0890  └─ [4.60e-02%] tsi::runtime::TsavRTFPGA::initialize
    1      9.3800    9.3800      9.3800    └─ [4.50e-03%] tsi::runtime::TsavRTFPGA::initializeQueues
    1      7.7450    7.7450      7.7450    └─ [3.71e-03%] tsi::runtime::TsavRT::initialize
    1      2.6860    2.6860      1.8410    └─ [1.29e-03%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
    2      0.8450    0.4225      0.8450      └─ [4.05e-04%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

65680  18552.9420    0.2825      0.0000  [ 8.89%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
65680    1.95e+05    2.9660    1.95e+05  └─ [93.39%] TXE 0 Idle
33000   4778.1774    0.1448   4778.1774  └─ [ 2.29%] [ txe_mult ]
18920   2787.2660    0.1473   2787.2660  └─ [ 1.34%] [ txe_silu ]
13760   1998.9862    0.1453   1998.9862  └─ [9.58e-01%] [ txe_add ]

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

65680   7230.2060    0.1101   6762.2310  [ 3.47%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
65680    467.9750    0.0071    467.9750  └─ [2.24e-01%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

65680  18425.6450    0.2805   1552.1490  [ 8.83%] [Thread] tsi::runtime::TsavRT::processResponses
65680  16873.4960    0.2569  16873.4960  └─ [ 8.09%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)

    1     38.9070   38.9070     37.2570  [1.87e-02%] [Thread] tsi::runtime::TsavRTFPGA::finalize
    1      1.6500    1.6500      1.6500  └─ [7.91e-04%] tsi::runtime::TsavRTFPGA::releaseTxes

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

65681    654.6860    0.0100    654.6860  [3.14e-01%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)

65680   9682.9330    0.1474   9682.9330  [ 4.64%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

65680   2237.1790    0.0341   2237.1790  [ 1.07%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)

65680   4743.3790    0.0722   4743.3790  [ 2.27%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

65680    665.7930    0.0101    665.7930  [3.19e-01%] [Thread] tsi::runtime::TsavRT::deallocate

    -    2.09e+05    0.0000    2.09e+05  [100.00%] TOTAL

========================================================================================================================

Counter Metrics:

Metric                 Min     Max     Avg

Queue_0_Occupancy   0.0000  1.0000  0.9988

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv32_06_20_2025/bin#

@LewisLui777 left a comment

Thank you, Ashish, this looks great.

@atrivedi-tsavoritesi merged commit 2e5b057 into master on Jul 11, 2025
@atrivedi-tsavoritesi deleted the FIR-810 branch on July 11, 2025 at 16:25

4 participants