@FIR-979 - llama.cpp update to latest SDK(sdk-r.0.1.9) #54

akapoor3518 · 2025-09-24T17:35:42Z

Tested on FPGA and POSIX.

FPGA Result
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin# tar -zxvf tsi-ggml-0.0.8.tz
tsi-ggml/blobs/
tsi-ggml/blobs/txe_add.blob
tsi-ggml/blobs/txe_sub.blob
tsi-ggml/blobs/txe_sqrt.blob
tsi-ggml/blobs/txe_sqr.blob
tsi-ggml/blobs/txe_inv.blob
tsi-ggml/blobs/txe_sin.blob
tsi-ggml/blobs/txe_sigmoid.blob
tsi-ggml/blobs/txe_silu.blob
tsi-ggml/blobs/txe_mult.blob
tsi-ggml/blobs/txe_div.blob
tsi-ggml/blobs/txe_abs.blob
tsi-ggml/blobs/txe_neg.blob
tsi-ggml/ggml.sh
tsi-ggml/libggml-base.so
tsi-ggml/libggml-cpu.so
tsi-ggml/libggml.so
tsi-ggml/libggml-tsavorite.so
tsi-ggml/libllama.so
tsi-ggml/llama-cli
tsi-ggml/simple-backend-tsi
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin# cd /tsi-ggml
-sh: cd: /tsi-ggml: No such file or directory
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin# cd tsi-ggml
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin/tsi-ggml# ls
blobs libggml-cpu.so libllama.so
ggml.sh libggml-tsavorite.so llama-cli
libggml-base.so libggml.so simple-backend-tsi
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin/tsi-ggml# ./ggml.sh
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin/tsi-ggml# ./simple-backend-tsi
load_model: using TSavorite backend

Calculating mem_size 384 2 and creating ggml context

Creating input Tensor

Creating Backend Buffer

Loading Input Tensor Data to Backend Buffer

Bringing tensor data from Backend buffer and printing 32 tensor data:
[ 1.10 2.30 3.20 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 ]
main: compute buffer size: 0.2500 KB

Under Test case for compute API creating build_graph

Compute Done

operation type: add, num of elements 32

compute is also done
Index 0: expected bits 400ccccd, actual bits 400ccccd
Index 1: expected bits 40900000, actual bits 40900000
Index 2: expected bits 40d00000, actual bits 40d00000
Index 3: expected bits 41000000, actual bits 41000000
Index 4: expected bits 41200000, actual bits 41200000
Index 5: expected bits 41400000, actual bits 41400000
Index 6: expected bits 41600000, actual bits 41600000
Index 7: expected bits 41800000, actual bits 41800000
Index 8: expected bits 41900000, actual bits 41900000
Index 9: expected bits 41a00000, actual bits 41a00000
Index 10: expected bits 41b00000, actual bits 41b00000
Index 11: expected bits 41c00000, actual bits 41c00000
Index 12: expected bits 41d00000, actual bits 41d00000
Index 13: expected bits 41e00000, actual bits 41e00000
Index 14: expected bits 41f00000, actual bits 41f00000
Index 15: expected bits 42000000, actual bits 42000000
Index 16: expected bits 42080000, actual bits 42080000
Index 17: expected bits 42100000, actual bits 42100000
Index 18: expected bits 42180000, actual bits 42180000
Index 19: expected bits 42200000, actual bits 42200000
Index 20: expected bits 42280000, actual bits 42280000
Index 21: expected bits 42300000, actual bits 42300000
Index 22: expected bits 42380000, actual bits 42380000
Index 23: expected bits 42400000, actual bits 42400000
Index 24: expected bits 42480000, actual bits 42480000
Index 25: expected bits 42500000, actual bits 42500000
Index 26: expected bits 42580000, actual bits 42580000
Index 27: expected bits 42600000, actual bits 42600000
Index 28: expected bits 42680000, actual bits 42680000
Index 29: expected bits 42700000, actual bits 42700000
Index 30: expected bits 42780000, actual bits 42780000
Index 31: expected bits 42800000, actual bits 42800000

TEST CASE PASSED

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

1    19.5840   19.5840     18.4450  [9.41e-01%] [Thread] tsi::runtime::TsavRTFPGA::initialize
1     0.5580    0.5580      0.5580  └─ [2.68e-02%] tsi::runtime::TsavRTFPGA::initializeQueues
1     0.4340    0.4340      0.4340  └─ [2.09e-02%] tsi::runtime::TsavRT::initialize
1     0.1470    0.1470      0.0980  └─ [7.07e-03%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
2     0.0490    0.0245      0.0490    └─ [2.36e-03%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

1     0.1860    0.1860      0.0000  [8.94e-03%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
1     5.0024    5.0024      5.0024  └─ [2.40e-01%] TXE 0 Idle
1     0.0959    0.0959      0.0959  └─ [4.61e-03%] [ txe_add ]

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

1     0.1260    0.1260      0.1180  [6.06e-03%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
1     0.0080    0.0080      0.0080  └─ [3.85e-04%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

1     0.2430    0.2430      0.2330  [1.17e-02%] [Thread] tsi::runtime::TsavRT::processResponses
1     0.0100    0.0100      0.0100  └─ [4.81e-04%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)

1    53.9940   53.9940     53.3080  [ 2.60%] [Thread] tsi::runtime::TsavRTFPGA::finalize
1     0.6860    0.6860      0.6860  └─ [3.30e-02%] tsi::runtime::TsavRTFPGA::releaseTxes

[Thread] OPU (cumulative over all threads)

1     0.3420    0.3420      0.3420  [1.64e-02%] [Thread] OPU

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

3     0.0980    0.0327      0.0980  [4.71e-03%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)

1     0.7790    0.7790      0.7790  [3.75e-02%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

1     0.1990    0.1990      0.1990  [9.57e-03%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)

1     0.0370    0.0370      0.0370  [1.78e-03%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

1     0.0210    0.0210      0.0210  [1.01e-03%] [Thread] tsi::runtime::TsavRT::deallocate

========================================================================================================================
-  2080.0950    0.0000   2080.0950  [100.00%] TOTAL

Counter Metrics:

Metric Min Max Avg

Queue_0_Occupancy 0.0000 1.0000 0.3333
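
Note on the `expected bits` / `actual bits` lines above: the values look like IEEE-754 single-precision patterns printed as hex, so they can be decoded and compared with a few lines of Python. This is only a sketch for checking the log by hand. The helper names `float_to_bits`, `bits_to_float` and `bits_match` are made up for illustration and are not taken from `simple-backend-tsi`, whose comparison code is not shown in this PR.

```python
import struct


def float_to_bits(value: float) -> str:
    """Round value to IEEE-754 binary32 and return its bits as 8 hex digits."""
    return struct.pack(">f", value).hex()


def bits_to_float(bits: str) -> float:
    """Decode 8 hex digits as an IEEE-754 binary32 value."""
    return struct.unpack(">f", bytes.fromhex(bits))[0]


def bits_match(expected: float, actual: float) -> bool:
    """Bit-exact comparison: stricter than ==, e.g. 0.0 and -0.0 differ."""
    return float_to_bits(expected) == float_to_bits(actual)


# Patterns copied from the "expected bits" column (indices 0, 1, 2 and 31).
samples = {0: "400ccccd", 1: "40900000", 2: "40d00000", 31: "42800000"}

for index, bits in samples.items():
    print(f"Index {index}: {bits} -> {bits_to_float(bits):.2f}")
```

Decoded this way, index 0 is approximately 2.2 and index 31 is 64.0, which fits an `add` over 32 elements.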

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin/tsi-ggml# cd ..
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin# ./run_llama_cli.sh
is Luna.

llama_perf_sampler_print: sampling time = 109.04 ms / 11 runs ( 9.91 ms per token, 100.88 tokens per second)
llama_perf_context_print: load time = 24766.44 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 49392.86 ms / 4 runs (12348.22 ms per token, 0.08 tokens per second)
llama_perf_context_print: total time = 61805.17 ms / 5 tokens

=== GGML Perf Summary ===
Op Runs Total us Avg us

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

1    20.8360   20.8360     19.4370  [3.26e-02%] [Thread] tsi::runtime::TsavRTFPGA::initialize
1     0.6810    0.6810      0.6810  └─ [1.06e-03%] tsi::runtime::TsavRTFPGA::initializeQueues
1     0.5200    0.5200      0.5200  └─ [8.13e-04%] tsi::runtime::TsavRT::initialize
1     0.1980    0.1980      0.1480  └─ [3.10e-04%] tsi::runtime::TsavRTFPGA::sendNOPTestCommand
2     0.0500    0.0250      0.0500    └─ [7.82e-05%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

356   243.2140    0.6832      0.0000  [3.80e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
356 61035.1231  171.4470  61035.1231  └─ [95.43%] TXE 0 Idle
180    48.7143    0.2706     48.7143  └─ [7.62e-02%] [ txe_mult ]
176    47.9831    0.2726     47.9831  └─ [7.50e-02%] [ txe_add ]

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

356   288.4650    0.8103    278.2360  [4.51e-01%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
356    10.2290    0.0287     10.2290  └─ [1.60e-02%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

356   408.2510    1.1468     23.7530  [6.38e-01%] [Thread] tsi::runtime::TsavRT::processResponses
356   384.4980    1.0801    384.4980  └─ [6.01e-01%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)

1    76.3370   76.3370     57.5970  [1.19e-01%] [Thread] tsi::runtime::TsavRTFPGA::finalize
1    18.7400   18.7400     18.7400  └─ [2.93e-02%] tsi::runtime::TsavRTFPGA::releaseTxes

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

358    37.2500    0.1041     37.2500  [5.82e-02%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] OPU (cumulative over all threads)

1     5.0620    5.0620      5.0620  [7.91e-03%] [Thread] OPU

[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)

356   147.2580    0.4136    147.2580  [2.30e-01%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

356    31.5090    0.0885     31.5090  [4.93e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)

356    28.4240    0.0798     28.4240  [4.44e-02%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

356     5.4670    0.0154      5.4670  [8.55e-03%] [Thread] tsi::runtime::TsavRT::deallocate

- 63956.4440    0.0000  63956.4440  [100.00%] TOTAL

========================================================================================================================

Counter Metrics:

Metric Min Max Avg

Queue_0_Occupancy 0.0000 1.0000 0.6313

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin#
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin#
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv37_09_25_2025/bin#
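
The `T/call` and percentage columns in the FPGA `llama-cli` profile above can be re-derived from `Calls`, `Total(ms)` and the `TOTAL` row. The sketch below assumes `T/call = Total / Calls` and `percentage = 100 * Total / TOTAL`. That assumption reproduces the printed figures, but the profiler's own implementation is not part of this PR, and the function names here are hypothetical.

```python
def percent_of_total(total_ms: float, grand_total_ms: float) -> float:
    """Share of the profile's TOTAL row taken by one entry, in percent."""
    return 100.0 * total_ms / grand_total_ms


def ms_per_call(total_ms: float, calls: int) -> float:
    """Average time per call in milliseconds (the T/call column)."""
    return total_ms / calls


# TOTAL row of the FPGA llama-cli profile, in milliseconds.
GRAND_TOTAL_MS = 63956.4440

# (label, calls, total ms) copied from the profile rows.
rows = [
    ("TXE 0 Idle", 356, 61035.1231),
    ("txe_mult", 180, 48.7143),
    ("txe_add", 176, 47.9831),
    ("loadBlob", 356, 147.2580),
]

for name, calls, total_ms in rows:
    print(
        f"{name:12s} {ms_per_call(total_ms, calls):9.4f} ms/call "
        f"{percent_of_total(total_ms, GRAND_TOTAL_MS):7.3f}% of total"
    )
```

Under that assumption, `TXE 0 Idle` accounts for about 95.43% of the reported total, while `txe_mult` and `txe_add` together stay below 0.2%.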

POSIX Result
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a cat

llama_perf_sampler_print: sampling time = 28.20 ms / 16 runs ( 1.76 ms per token, 567.40 tokens per second)
llama_perf_context_print: load time = 13464.16 ms
llama_perf_context_print: prompt eval time = 3526.06 ms / 6 tokens ( 587.68 ms per token, 1.70 tokens per second)
llama_perf_context_print: eval time = 7123.48 ms / 9 runs ( 791.50 ms per token, 1.26 tokens per second)
llama_perf_context_print: total time = 20618.80 ms / 15 tokens

=== GGML Perf Summary ===
Op Runs Total us Avg us

OPU Profiling Results:

Calls Total(ms) T/call Self(ms) Function

1    15.8340   15.8340      7.7810  [6.98e-02%] [Thread] tsi::runtime::TsavRTPosix::initialize
1     7.9120    7.9120      0.2610  └─ [3.49e-02%] tsi::runtime::TsavRTPosix::initializeQueues
1     6.6080    6.6080      6.6080    └─ [2.91e-02%] tsi::runtime::TsavRT::awaitCommandListCompletion
1     0.9050    0.9050      0.9050    └─ [3.99e-03%] tsi::runtime::TsavRTPosix::requestTXEDevice
1     0.1380    0.1380      0.1280    └─ [6.09e-04%] tsi::runtime::TsavRT::finalizeCommandList
1     0.0100    0.0100      0.0100      └─ [4.41e-05%] tsi::runtime::executeWithTimeout
1     0.1410    0.1410      0.1410  └─ [6.22e-04%] tsi::runtime::TsavRT::initialize

[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)

1     4.5580    4.5580      2.9040  [2.01e-02%] [Thread] tsi::runtime::TsavRT::finalize
1     1.6440    1.6440      0.0570  └─ [7.25e-03%] tsi::runtime::TsavRTPosix::detachFromTXEDevice
1     1.5870    1.5870      0.0950    └─ [7.00e-03%] tsi::runtime::TsavRT::executeSyncCommand
1     1.4340    1.4340      1.4340      └─ [6.32e-03%] tsi::runtime::TsavRT::awaitCommandListCompletion
1     0.0580    0.0580      0.0420      └─ [2.56e-04%] tsi::runtime::TsavRT::finalizeCommandList
1     0.0160    0.0160      0.0160        └─ [7.06e-05%] tsi::runtime::executeWithTimeout
2     0.0100    0.0050      0.0100  └─ [4.41e-05%] tsi::runtime::TsavRT::deallocate

[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)

1315  1513.7180    1.1511     50.9540  [ 6.68%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
2630  1462.0330    0.5559   1462.0330  └─ [ 6.45%] tsi::runtime::executeWithTimeout
1315     0.7310  5.56e-04      0.7310  └─ [3.22e-03%] LOAD_BLOB Command Execution
1315     0.0000    0.0000      0.0000  └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148008576[0x800...
1315     0.0000    0.0000      0.0000  └─ [0.00e+00%] TXE 0 Idle

[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)

1315   413.5820    0.3145     47.6210  [ 1.82%] [Thread] tsi::runtime::TsavRTPosix::unloadBlob
2630   365.3530    0.1389    365.3530  └─ [ 1.61%] tsi::runtime::executeWithTimeout
1315     0.6080  4.62e-04      0.6080  └─ [2.68e-03%] UNLOAD_BLOB Command Execution
1315     0.0000    0.0000      0.0000  └─ [0.00e+00%] Command{command=3 (UNLOAD_BLOB), blob_args=[2148008576[0x8...
1315     0.0000    0.0000      0.0000  └─ [0.00e+00%] TXE 0 Idle

[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)

1317   504.8030    0.3833      9.7470  [ 2.23%] [Thread] tsi::runtime::TsavRT::processResponses
1317   495.0560    0.3759    495.0560  └─ [ 2.18%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)

1315    19.6310    0.0149     17.8480  [8.66e-02%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
1315     1.7830    0.0014      1.7830  └─ [7.86e-03%] tsi::runtime::executeWithTimeout

[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)

1317     6.7330    0.0051      6.7330  [2.97e-02%] [Thread] tsi::runtime::TsavRT::allocate

[Thread] OPU (cumulative over all threads)

1     0.0530    0.0530      0.0530  [2.34e-04%] [Thread] OPU

[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)

1315     5.9530    0.0045      5.9530  [2.63e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList

[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)

1315   942.2370    0.7165    942.2370  [ 4.16%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion

[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)

1315     2.3790    0.0018      2.3790  [1.05e-02%] [Thread] tsi::runtime::TsavRT::deallocate

- 22672.1370    0.0000  22672.1370  [100.00%] TOTAL

========================================================================================================================

Counter Metrics:

Metric Min Max Avg

Queue_0_Occupancy 0.0000 1.0000 0.9987

[akapoor@wssw01 llama.cpp]$
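
The per-token figures in the `llama_perf_context_print` lines of both runs follow from the reported totals. A minimal sketch of that arithmetic, assuming the usual `total / count` and `1000 * count / total` relations; `per_token_stats` is a hypothetical helper, not a llama.cpp function. The two runs are not directly comparable: the FPGA run went through `run_llama_cli.sh`, whose arguments are not shown here, and it evaluated 4 tokens against 9 on POSIX.

```python
import math


def per_token_stats(total_ms: float, count: int) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for a reported total.

    A total of 0 ms gives an infinite rate, as in the FPGA prompt eval line.
    """
    ms_per_token = total_ms / count
    tokens_per_second = math.inf if total_ms == 0 else 1000.0 * count / total_ms
    return ms_per_token, tokens_per_second


# (total ms, token or run count) copied from the perf lines of both runs.
reported = {
    "FPGA prompt eval": (0.00, 1),
    "FPGA eval": (49392.86, 4),
    "POSIX prompt eval": (3526.06, 6),
    "POSIX eval": (7123.48, 9),
}

for label, (total_ms, count) in reported.items():
    ms_tok, tok_s = per_token_stats(total_ms, count)
    print(f"{label}: {ms_tok:.2f} ms per token, {tok_s:.2f} tokens per second")
```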

dineshReddy6381

Approved

@FIR-979 - llama.cpp update to latest SDK(sdk-r.0.1.9)

74be9a5

akapoor3518 requested a review from Nithyanand-G as a code owner September 24, 2025 17:35

akapoor3518 requested review from atrivedi-tsavoritesi, dineshReddy6381, dmpatra and mmankal and removed request for Nithyanand-G September 24, 2025 17:36

dineshReddy6381 approved these changes Sep 24, 2025

atrivedi-tsavoritesi approved these changes Sep 24, 2025

akapoor3518 merged commit d799844 into master Sep 24, 2025

akapoor3518 deleted the FIR-979 branch September 24, 2025 17:42
