@atrivedi-tsavoritesi
Make sure to read the contributing guidelines before submitting a PR

The test results from ./run_llama_cli.sh with 5 tokens are as follows:

+++
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin# ./run_llama_cli.sh
 my cat's name is Max. He'

llama_perf_sampler_print:    sampling time =     111.70 ms /    11 runs   (   10.15 ms per token,    98.47 tokens per second)
llama_perf_context_print:        load time =  132926.48 ms
llama_perf_context_print: prompt eval time =  109957.33 ms /     6 tokens (18326.22 ms per token,     0.05 tokens per second)
llama_perf_context_print:        eval time =  195682.91 ms /     4 runs   (48920.73 ms per token,     0.02 tokens per second)
llama_perf_context_print:       total time =  328764.01 ms /    10 tokens

GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls  Total(ms)    T/call  Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
33160 100086.000     3.018 47907.157  [32%] RuntimeHostShim::awaitCommandListCompletion
18920  29912.952     1.581 29912.952  └─ [10%] [ txe_silu ]
14080  22010.102     1.563 22010.102  └─ [ 7%] [ txe_mult ]
  160    253.071     1.582   253.071  └─ [ 0%] [ txe_add ]
33160      1.178     0.000     1.178  └─ [ 0%] TXE 0 Idle
    1    114.000   114.000    18.000  [ 0%] GGML Tsavorite
    1     96.000    96.000    96.000  └─ [ 0%] RuntimeHostShim::initialize
    1     52.000    52.000    52.000  [ 0%] RuntimeHostShim::finalize
33160     26.000     0.001    26.000  [ 0%] RuntimeHostShim::loadBlob
33160     23.000     0.001    23.000  [ 0%] RuntimeHostShim::finalizeCommandList
33160      5.000     0.000     5.000  [ 0%] RuntimeHostShim::addCommandToList
33161      3.000     0.000     3.000  [ 0%] RuntimeHostShim::allocate
33160      3.000     0.000     3.000  [ 0%] RuntimeHostShim::createCommandList
113720      0.000     0.000     0.000  [ 0%] RuntimeHostShim::getShmemManager
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::launchBlob
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::unloadBlob
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
412163 308849.000     0.749 308849.000  [100%] TOTAL
========================================================================================================================

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin#
+++
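
For anyone checking the table, the derived columns appear to follow `T/call = Total(ms) / Calls` and `[%] = Total(ms) / TOTAL`. This is inferred from the numbers above, not from the profiler source, so treat the sketch below as an assumption. The helper name `derive` is hypothetical; the row values are copied from the table.

```python
# Rows copied from the profiling table above: (function, calls, total_ms).
rows = [
    ("RuntimeHostShim::awaitCommandListCompletion", 33160, 100086.000),
    ("txe_silu", 18920, 29912.952),
    ("txe_mult", 14080, 22010.102),
    ("txe_add", 160, 253.071),
]
GRAND_TOTAL_MS = 308849.000  # Total(ms) of the TOTAL row


def derive(calls, total_ms, grand_total_ms=GRAND_TOTAL_MS):
    """Return (ms per call, percent of the grand total).

    Assumed formulas, inferred from the printed table only.
    """
    return total_ms / calls, 100.0 * total_ms / grand_total_ms


for name, calls, total_ms in rows:
    per_call, share = derive(calls, total_ms)
    print(f"{name:45s} {per_call:8.3f} ms/call {share:5.1f}%")
```

Under these assumptions, the three `txe_*` kernels each cost roughly 1.6 ms per call, and the wait on `awaitCommandListCompletion` accounts for about a third of the profiled time.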
@LewisLui777 left a comment

This looks great. Good job.

@akapoor3518 left a comment

lgtm

@atrivedi-tsavoritesi merged commit 15e7365 into master on Jun 18, 2025
@atrivedi-tsavoritesi deleted the FIR-757 branch on June 18, 2025 18:59