@FIR-1039 - llama.cpp: release 0.2.0, synced with MLIR SDK 0.2.0 #69
Validated on POSIX and FPGA.
[akapoor@wssw01 ggml]$ ls -lrt
total 1373444
drwxr-xr-x 3 akapoor tsiusers 4096 May 5 13:06 vendor
drwxr-xr-x 2 akapoor tsiusers 8192 May 12 13:46 lib
-rw-r--r-- 1 atrivedi tsiusers 10930369 May 25 12:04 tsi-ggml-0.0.1.tz
-rw-r--r-- 1 atrivedi tsiusers 14485488 Jun 5 11:11 tsi-ggml-0.0.2.tz
-rw-r--r-- 1 atrivedi tsiusers 14476846 Jun 18 12:10 tsi-ggml-0.0.3.tz
-rw-r--r-- 1 akapoor tsiusers 14481013 Jul 2 10:18 tsi-ggml-0.0.4.tz
-rw-r--r-- 1 atrivedi tsiusers 14066216 Jul 11 09:47 tsi-ggml-0.0.5.tz
-rw-r--r-- 1 atrivedi tsiusers 14066294 Aug 15 15:17 tsi-ggml-0.0.6.tz
-rw-r--r-- 1 atrivedi tsiusers 14067550 Sep 12 15:26 tsi-ggml-0.0.7.tz
-rw-r--r-- 1 akapoor tsiusers 16576955 Sep 24 12:31 tsi-ggml-0.0.8.tz
drwxrwxrwx 6 atrivedi tsiusers 4096 Oct 2 10:01 models_bf16
-rw-r--r-- 1 akapoor tsiusers 16593627 Oct 6 22:48 tsi-ggml-0.0.9.tz
-rw-r--r-- 1 akapoor tsiusers 16594215 Oct 10 16:19 tsi-ggml-0.0.10.tz
drwxrwxrwx 2 akapoor tsiusers 4096 Oct 16 14:47 models
-rw-r--r-- 1 kraza tsiusers 1237843968 Oct 16 14:56 modelsclear
-rw-r--r-- 1 akapoor tsiusers 16599289 Oct 22 16:24 tsi-ggml-0.2.0.tz
lrwxrwxrwx 1 akapoor tsiusers 39 Oct 22 16:24 tsi-ggml-aws-latest.tz -> /aws/proj/rel/sw/ggml/tsi-ggml-0.2.0.tz
lrwxrwxrwx 1 akapoor tsiusers 35 Oct 22 16:24 tsi-ggml-latest.tz -> /proj/rel/sw/ggml/tsi-ggml-0.2.0.tz
[akapoor@wssw01 ggml]$
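The listing above shows both `tsi-ggml-latest.tz` and `tsi-ggml-aws-latest.tz` pointing at the 0.2.0 tarball. A minimal sketch of how that could be checked from a script follows; the helper names `resolve_release` and `check_latest` are hypothetical and are not part of the release tooling.

```python
import os


def resolve_release(link_path):
    """Return the file name that the symlink ultimately resolves to."""
    # realpath follows the link chain; it does not require the target to exist.
    return os.path.basename(os.path.realpath(link_path))


def check_latest(link_path, expected_name):
    """True if link_path is a symlink whose target file name is expected_name."""
    return os.path.islink(link_path) and resolve_release(link_path) == expected_name


if __name__ == "__main__":
    # Path taken from the listing above; adjust for the local release area.
    link = "/proj/rel/sw/ggml/tsi-ggml-latest.tz"
    print(link, "->", resolve_release(link))
    print("points at 0.2.0:", check_latest(link, "tsi-ggml-0.2.0.tz"))
```

A regular file (or a missing path) makes `check_latest` return `False`, so a stale or overwritten link is flagged rather than silently accepted.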
###########
POSIX LOG
[akapoor@wssw01 llama.cpp]$
[akapoor@wssw01 llama.cpp]$
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I
llama_perf_sampler_print: sampling time = 15.42 ms / 12 runs ( 1.29 ms per token, 778.01 tokens per second)
llama_perf_context_print: load time = 5562.28 ms
llama_perf_context_print: prompt eval time = 4478.28 ms / 6 tokens ( 746.38 ms per token, 1.34 tokens per second)
llama_perf_context_print: eval time = 3905.12 ms / 5 runs ( 781.02 ms per token, 1.28 tokens per second)
llama_perf_context_print: total time = 9484.59 ms / 11 tokens
=== GGML Perf Summary ===
Op Target Runs TSI_KERNEL-RUN Total us Avg us
ADD OPU 704 914 957122 1359.55
MUL OPU 720 935 507975 705.52
RMS_NORM OPU 720 720 435028 604.21
MUL_MAT CPU 12663 0 2093600 1653.32
CONT CPU 2660 0 138887 52.21
RESHAPE CPU 4044 0 1669 0.41
VIEW CPU 5995 0 678 0.11
PERMUTE CPU 4734 0 844 0.18
TRANSPOSE CPU 1088 0 269 0.25
GET_ROWS CPU 134 0 324 2.42
SET_ROWS CPU 2502 0 1921 0.77
SOFT_MAX OPU 352 14784 8219517 23350.90
ROPE CPU 2650 0 13404 5.06
GLU OPU 352 457 560127 1591.27
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)
9210 1615.4930 0.1754 229.6150 [13.95%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
18420 1384.8900 0.0752 1384.8900 └─ [11.96%] tsi::runtime::executeWithTimeout
9210 0.9880 1.07e-04 0.9880 └─ [8.53e-03%] LOAD_BLOB Command Execution
9210 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148009600[0x800...
9210 0.0000 0.0000 0.0000 └─ [0.00e+00%] TXE 0 Idle
[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)
9210 1514.3190 0.1644 272.3920 [13.07%] [Thread] tsi::runtime::TsavRTPosix::unloadBlob
18420 1240.2900 0.0673 1240.2900 └─ [10.71%] tsi::runtime::executeWithTimeout
9210 1.6370 1.78e-04 1.6370 └─ [1.41e-02%] UNLOAD_BLOB Command Execution
9210 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=3 (UNLOAD_BLOB), blob_args=[2148009600[0x8...
9210 0.0000 0.0000 0.0000 └─ [0.00e+00%] TXE 0 Idle
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
9212 1680.7330 0.1825 58.8010 [14.51%] [Thread] tsi::runtime::TsavRT::processResponses
9212 1621.9320 0.1761 1621.9320 └─ [14.00%] tsi::runtime::executeWithTimeout
[Thread] OPU (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
###########
FPGA LOG
drwx------ 7 101006 100003 504 Jan 1 1970 aot-tests
drwxr-xr-x 4 root root 952 Mar 9 12:35 tsi-ggml
drwxr-xr-x 4 root root 952 Mar 9 12:36 tsi-ggml-orig
-rw-r--r-- 1 root root 16598931 Oct 22 2025 tsi-ggml-0.2.0.tz
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv39_10_19_2025/bin# ./run_llama_cli.sh
is Luna.
llama_perf_sampler_print: sampling time = 108.05 ms / 11 runs ( 9.82 ms per token, 101.80 tokens per second)
llama_perf_context_print: load time = 53827.25 ms
llama_perf_context_print: prompt eval time = 42797.28 ms / 6 tokens ( 7132.88 ms per token, 0.14 tokens per second)
llama_perf_context_print: eval time = 48400.95 ms / 4 runs (12100.24 ms per token, 0.08 tokens per second)
llama_perf_context_print: total time = 102362.41 ms / 10 tokens
=== GGML Perf Summary ===
Op Target Runs TSI_KERNEL-RUN Total us Avg us
ADD OPU 484 694 1528861 3158.80
MUL OPU 495 710 966104 1951.73
RMS_NORM OPU 495 495 1120119 2262.87
MUL_MAT CPU 8227 0 510991926 62111.57
CONT CPU 1329 0 1720169 1294.33
RESHAPE CPU 1329 0 15171 11.42
VIEW CPU 1847 0 2506 1.36
PERMUTE CPU 1552 0 2970 1.91
TRANSPOSE CPU 307 0 696 2.27
GET_ROWS CPU 83 0 18355 221.14
SET_ROWS CPU 1644 0 538218 327.38
SOFT_MAX CPU 629 0 1007642 1601.97
ROPE CPU 1580 0 99573 63.02
GLU OPU 242 347 1051099 4343.38
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
1310 1007.8860 0.7694 0.0000 [9.64e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
1310 1.01e+05 77.1745 1.01e+05 └─ [96.71%] TXE 0 Idle
215 201.5619 0.9375 201.5619 └─ [1.93e-01%] [ txe_swiglu ]
225 136.1007 0.6049 136.1007 └─ [1.30e-01%] [ txe_rms_norm ]
440 128.8312 0.2928 128.8312 └─ [1.23e-01%] [ txe_mult ]
430 124.7104 0.2900 124.7104 └─ [1.19e-01%] [ txe_add ]
[Thread] OPU (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
1310 781.7130 0.5967 765.6000 [7.48e-01%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
1310 16.1130 0.0123 16.1130 └─ [1.54e-02%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
1310 1485.2700 1.1338 46.5140 [ 1.42%] [Thread] tsi::runtime::TsavRT::processResponses
1310 1438.7560 1.0983 1438.7560 └─ [ 1.38%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
1312 66.6430 0.0508 66.6430 [6.38e-02%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)
1310 339.9690 0.2595 339.9690 [3.25e-01%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
1310 68.7070 0.0524 68.7070 [6.57e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)
1310 98.5160 0.0752 98.5160 [9.42e-02%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
1310 14.8400 0.0113 14.8400 [1.42e-02%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.7774
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv39_10_19_2025/bin#
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv39_10_19_2025/bin#
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv39_10_19_2025/bin#
Terminating...
Thanks for using picocom
akapoor@fpga4:/proj/work/akapoor$
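
For comparing the POSIX and FPGA runs, the `=== GGML Perf Summary ===` rows can be totalled per target. The following is a rough sketch that assumes each row has the six whitespace-separated columns shown in the logs above (Op, Target, Runs, TSI_KERNEL-RUN, Total us, Avg us); the function names are hypothetical, and `SAMPLE` is three rows copied from the FPGA log.

```python
import re

# One summary row: op name, target (OPU or CPU), then four numeric columns.
ROW = re.compile(r"^(\w+)\s+(OPU|CPU)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)$")

# Three rows copied from the FPGA log, plus the header line (which is skipped).
SAMPLE = """\
Op Target Runs TSI_KERNEL-RUN Total us Avg us
ADD OPU 484 694 1528861 3158.80
MUL_MAT CPU 8227 0 510991926 62111.57
SOFT_MAX CPU 629 0 1007642 1601.97
"""


def parse_perf_summary(text):
    """Return one dict per summary row; lines that do not match are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        op, target, runs, kernel_runs, total_us, avg_us = m.groups()
        rows.append({
            "op": op,
            "target": target,
            "runs": int(runs),
            "kernel_runs": int(kernel_runs),
            "total_us": int(total_us),
            "avg_us": float(avg_us),
        })
    return rows


def total_us_by_target(rows):
    """Sum the 'Total us' column separately for each target."""
    totals = {}
    for row in rows:
        totals[row["target"]] = totals.get(row["target"], 0) + row["total_us"]
    return totals


if __name__ == "__main__":
    rows = parse_perf_summary(SAMPLE)
    for target, total in sorted(total_us_by_target(rows).items()):
        print(f"{target}: {total} us")
```

Feeding the full POSIX and FPGA logs through `parse_perf_summary` gives per-op rows that can be diffed directly, for example to see which ops moved between CPU and OPU between the two runs.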