@akapoor3518

No more crash when running:
build-posix/bin/llama-cli -p "ヨーロッパには何人の国がありますか" -m /proj/rel/sw/ggml/models/SakanaAI-TinySwallow-1.5B-Instruct-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

1    11.4190   11.4190      2.8230  [2.35e-01%] [Thread] tsi::runtime::TsavRTPosix::initialize
1     8.4450    8.4450      0.2720  └─ [1.74e-01%] tsi::runtime::TsavRTPosix::initializeQueues
1     7.2190    7.2190      7.2190    └─ [1.48e-01%] tsi::runtime::TsavRT::awaitCommandListCompletion
1     0.8190    0.8190      0.8190    └─ [1.68e-02%] tsi::runtime::TsavRTPosix::requestTXEDevice
1     0.1350    0.1350      0.0750    └─ [2.77e-03%] tsi::runtime::TsavRT::finalizeCommandList
1     0.0600    0.0600      0.0600      └─ [1.23e-03%] tsi::runtime::executeWithTimeout
1     0.1510    0.1510      0.1510  └─ [3.10e-03%] tsi::runtime::TsavRT::initialize

[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)

1    12.9040   12.9040     12.3210  [2.65e-01%] [Thread] tsi::runtime::TsavRT::finalize
1     0.5730    0.5730      0.0590  └─ [1.18e-02%] tsi::runtime::TsavRTPosix::detachFromTXEDevice
1     0.5140    0.5140      0.0940    └─ [1.06e-02%] tsi::runtime::TsavRT::executeSyncCommand
1     0.3630    0.3630      0.3630      └─ [7.46e-03%] tsi::runtime::TsavRT::awaitCommandListCompletion
1     0.0570    0.0570      0.0520      └─ [1.17e-03%] tsi::runtime::TsavRT::finalizeCommandList
1     0.0050    0.0050      0.0050        └─ [1.03e-04%] tsi::runtime::executeWithTimeout
2     0.0100    0.0050      0.0100  └─ [2.05e-04%] tsi::runtime::TsavRT::deallocate

[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)

39    8.7490    0.2243      1.1770  [1.80e-01%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
78    7.5610    0.0969      7.5610  └─ [1.55e-01%] tsi::runtime::executeWithTimeout
39    0.0110  2.82e-04      0.0110  └─ [2.26e-04%] LOAD_BLOB Command Execution
39    0.0000    0.0000      0.0000  └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148533888[0x801...
39    0.0000    0.0000      0.0000  └─ [0.00e+00%] TXE 0 Idle

[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)

39    6.2130    0.1593      1.3430  [1.28e-01%] [Thread] tsi::runtime::TsavRTPosix::unloadBlob
78    4.8570    0.0623      4.8570  └─ [9.98e-02%] tsi::runtime::executeWithTimeout
39    0.0130  3.33e-04      0.0130  └─ [2.67e-04%] UNLOAD_BLOB Command Execution
39    0.0000    0.0000      0.0000  └─ [0.00e+00%] Command{command=3 (UNLOAD_BLOB), blob_args=[2148533888[0x8...
39    0.0000    0.0000      0.0000  └─ [0.00e+00%] TXE 0 Idle

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

41   14.5470    0.3548      0.4390  [2.99e-01%] [Thread] tsi::runtime::TsavRT::processResponses
41   14.1080    0.3441     14.1080  └─ [2.90e-01%] tsi::runtime::executeWithTimeout

[Thread] OPU (cumulative over all threads)

1     0.0860    0.0860      0.0550  [1.77e-03%] [Thread] OPU 
1     0.0310    0.0310      0.0310  └─ [6.37e-04%] tsi::runtime::TsavRT::allocate

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

39    0.6400    0.0164      0.5630  [1.32e-02%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
39    0.0770    0.0020      0.0770  └─ [1.58e-03%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

41    0.1720    0.0042      0.1720  [3.53e-03%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

39    0.1730    0.0044      0.1730  [3.55e-03%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

39    7.8040    0.2001      7.8040  [1.60e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

39    0.0530    0.0014      0.0530  [1.09e-03%] [Thread] tsi::runtime::TsavRT::deallocate

-  4866.4410    0.0000   4866.4410  [100.00%] TOTAL

========================================================================================================================

Counter Metrics:

Metric                Min      Max      Avg

Queue_0_Occupancy  0.0000   1.0000   0.9917
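
For anyone reading the profile above: the bracketed percentage on each row appears to be that row's Total(ms) divided by the TOTAL row (4866.4410 ms), printed in scientific notation. This is an inference from the numbers, not something the profiler output states. A minimal Python sketch to check it, where `percent_of_total` is a hypothetical helper written for this note and not part of the runtime:

```python
# Sanity check, assuming the bracketed column is Total(ms) / TOTAL * 100.
# `percent_of_total` is a hypothetical helper, not a profiler API.

GRAND_TOTAL_MS = 4866.4410  # value of the TOTAL row in the table


def percent_of_total(total_ms: float, grand_total_ms: float = GRAND_TOTAL_MS) -> str:
    """Format a row's share of the whole run in the table's notation."""
    return f"{100.0 * total_ms / grand_total_ms:.2e}%"


# Total(ms) values copied from rows of the table.
rows = {
    "TsavRTPosix::initialize": 11.4190,
    "TsavRT::processResponses": 14.5470,
    "TsavRT::awaitCommandListCompletion": 7.8040,
}

for name, total_ms in rows.items():
    print(f"{name}: {percent_of_total(total_ms)}")
```

Under that reading, each runtime function listed accounts for well under 1% of the 4866.4410 ms total.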

@mikeuhler removed their request for review on October 20, 2025, 20:41
@akapoor3518 merged commit fa1d772 into master on October 20, 2025