@FIR-1032 - llama.cpp: wrong model causes the llama-cli process to crash #65
POSIX Validation
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F321.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 1 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
========================================================================================================================
- 2114.9780 0.0000 2114.9780 [100.00%] TOTAL
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.6667
[akapoor@wssw01 llama.cpp]$
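In the POSIX run above, llama-cli prints only the profiling summary and returns to the prompt instead of crashing. A minimal Python sketch of how that expectation could be checked automatically, assuming the only requirement is that the process is not killed by a signal (such as SIGSEGV). The helpers `classify_exit` and `run_and_classify` are hypothetical, and the exact non-zero exit code llama-cli uses for a failed model load is not shown in this log.

```python
import signal
import subprocess
from typing import List


def classify_exit(returncode: int) -> str:
    """Map a subprocess return code to a coarse outcome.

    On POSIX, subprocess.run reports -N when the child was killed by
    signal N (for example -11 for SIGSEGV), which is what a crash looks like.
    """
    if returncode < 0:
        return f"crashed ({signal.Signals(-returncode).name})"
    if returncode == 0:
        return "ok"
    return f"error exit ({returncode})"


def run_and_classify(cmd: List[str], timeout_s: float = 600.0) -> str:
    """Run cmd to completion (hypothetical test harness) and classify its exit."""
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    return classify_exit(proc.returncode)


# Example (binary, model path and flags taken from the POSIX run above);
# with a wrong model the result should never be "crashed (...)":
# run_and_classify(["build-posix/bin/llama-cli", "-m",
#                   "/proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F321.gguf",
#                   "-p", "my cat's name", "--n-predict", "1", "--no-warmup"])
```

Treating "error exit" as acceptable keeps the check focused on the bug in this PR (a crash) rather than on which error code a bad model should produce.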
FPGA Validation
With a good model and a bad model (run_llama_cli.sh is edited between the two runs)
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv38_10_12_2025/bin# ./run_llama_cli.sh
is Luna.
llama_perf_sampler_print: sampling time = 110.51 ms / 11 runs ( 10.05 ms per token, 99.54 tokens per second)
llama_perf_context_print: load time = 24828.69 ms
llama_perf_context_print: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: eval time = 51249.24 ms / 4 runs (12812.31 ms per token, 0.08 tokens per second)
llama_perf_context_print: total time = 63245.64 ms / 5 tokens
=== GGML Perf Summary ===
Op        Target  Runs  TSI_KERNEL-RUN    Total us     Avg us
ADD       OPU      440             440     1287759    2926.72
MUL       OPU      450             450      683687    1519.30
RMS_NORM  OPU      450             450     1174534    2610.08
MUL_MAT   CPU     7881               0   465226500   59031.40
CONT      CPU     1208               0     1153354     954.76
RESHAPE   CPU     1148               0       31816      27.71
VIEW      CPU     1770               0        2891       1.63
PERMUTE   CPU     1423               0        2425       1.70
TRANSPOSE CPU      294               0         741       2.52
GET_ROWS  CPU       79               0       15513     196.37
SET_ROWS  CPU     1525               0       28783      18.87
SOFT_MAX  CPU      584               0      660988    1131.83
ROPE      CPU     1446               0      130146      90.00
GLU       OPU      220             220      840087    3818.58
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
624 559.2910 0.8963 0.0000 [8.55e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
624 62293.2020 99.8288 62293.2020 └─ [95.23%] TXE 0 Idle
88 79.3556 0.9018 79.3556 └─ [1.21e-01%] [ txe_swiglu ]
180 68.6981 0.3817 68.6981 └─ [1.05e-01%] [ txe_rms_norm ]
180 55.1381 0.3063 55.1381 └─ [8.43e-02%] [ txe_mult ]
176 50.7105 0.2881 50.7105 └─ [7.75e-02%] [ txe_add ]
[Thread] OPU (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
624 407.4300 0.6529 389.5260 [6.23e-01%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
624 17.9040 0.0287 17.9040 └─ [2.74e-02%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
624 734.5330 1.1771 26.1010 [ 1.12%] [Thread] tsi::runtime::TsavRT::processResponses
624 708.4320 1.1353 708.4320 └─ [ 1.08%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
626 67.4180 0.1077 67.4180 [1.03e-01%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::TsavRTFPGA::loadBlob (cumulative over all threads)
624 239.5510 0.3839 239.5510 [3.66e-01%] [Thread] tsi::runtime::TsavRTFPGA::loadBlob
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
624 46.8340 0.0751 46.8340 [7.16e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRTFPGA::unloadBlob (cumulative over all threads)
624 47.4160 0.0760 47.4160 [7.25e-02%] [Thread] tsi::runtime::TsavRTFPGA::unloadBlob
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
624 8.3380 0.0134 8.3380 [1.27e-02%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.7556
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv38_10_12_2025/bin#
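The counters from the good-model run can be cross-checked with a little arithmetic. A minimal Python sketch, with figures copied by hand from this run, that recomputes "Avg us" as Total us / Runs, splits the summed op time between CPU and OPU, compares TXE 0 kernel time with idle time, and derives the per-token eval rate. The helper names are hypothetical; nothing here reads from llama.cpp or the TSI runtime.

```python
# Figures copied by hand from the first FPGA run above.
# op -> (target, runs, total_us), from the "GGML Perf Summary" table
GGML_PERF = {
    "ADD": ("OPU", 440, 1287759),
    "MUL": ("OPU", 450, 683687),
    "RMS_NORM": ("OPU", 450, 1174534),
    "MUL_MAT": ("CPU", 7881, 465226500),
    "CONT": ("CPU", 1208, 1153354),
    "RESHAPE": ("CPU", 1148, 31816),
    "VIEW": ("CPU", 1770, 2891),
    "PERMUTE": ("CPU", 1423, 2425),
    "TRANSPOSE": ("CPU", 294, 741),
    "GET_ROWS": ("CPU", 79, 15513),
    "SET_ROWS": ("CPU", 1525, 28783),
    "SOFT_MAX": ("CPU", 584, 660988),
    "ROPE": ("CPU", 1446, 130146),
    "GLU": ("OPU", 220, 840087),
}

# Self(ms) of the TXE 0 rows in the "OPU Profiling Results" tree
TXE_IDLE_MS = 62293.2020
TXE_KERNEL_MS = {
    "txe_swiglu": 79.3556,
    "txe_rms_norm": 68.6981,
    "txe_mult": 55.1381,
    "txe_add": 50.7105,
}


def avg_us(runs: int, total_us: int) -> float:
    """The 'Avg us' column: total time divided by number of runs."""
    return total_us / runs


def time_by_target(perf: dict) -> dict:
    """Sum the 'Total us' column per execution target (CPU or OPU)."""
    totals: dict = {}
    for target, _runs, total_us in perf.values():
        totals[target] = totals.get(target, 0) + total_us
    return totals


def per_token(eval_ms: float, n_runs: int) -> tuple:
    """Return (ms per token, tokens per second) for an eval-time line."""
    ms_per_token = eval_ms / n_runs
    return ms_per_token, 1000.0 / ms_per_token


if __name__ == "__main__":
    totals = time_by_target(GGML_PERF)
    grand_total = sum(totals.values())
    for target, t in sorted(totals.items()):
        print(f"{target}: {t} us ({100 * t / grand_total:.2f}% of summed op time)")
    mul_mat_share = GGML_PERF["MUL_MAT"][2] / grand_total
    print(f"MUL_MAT share: {100 * mul_mat_share:.1f}%")
    busy = sum(TXE_KERNEL_MS.values())
    print(f"TXE 0 kernels: {busy:.4f} ms vs idle {TXE_IDLE_MS:.4f} ms")
    ms, tps = per_token(51249.24, 4)  # "eval time = 51249.24 ms / 4 runs"
    print(f"eval: {ms:.2f} ms/token, {tps:.2f} tokens/s")
```

With these figures, MUL_MAT (on the CPU in this run) accounts for about 98.7% of the summed Total us column, and the four TXE kernels together account for roughly 254 ms against about 62.3 s of TXE 0 idle time.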
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv38_10_12_2025/bin# vi run_llama_cli.sh
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv38_10_12_2025/bin# ./run_llama_cli.sh
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRTFPGA::finalize (cumulative over all threads)
========================================================================================================================
- 2171.7140 0.0000 2171.7140 [100.00%] TOTAL
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 0.0000 0.0000
root@agilex7_dk_si_agf014eb:/usr/bin/tsi/v0.1.1.tsv38_10_12_2025/bin#
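The bad-model run returns to the prompt with only the profiling summary and a Queue_0_Occupancy of 0.0000, which suggests no work reached the OPU queue. One way to reject an obviously wrong file even earlier is to check its header: GGUF files begin with the four ASCII bytes `GGUF`, followed (in little-endian files) by a uint32 format version. The Python sketch below checks only that much. `gguf_version` and `looks_like_gguf` are hypothetical pre-flight helpers, not the check this PR adds, and a file with a valid header can still fail to load for other reasons.

```python
import os
import struct
from typing import Optional

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def gguf_version(path: str) -> Optional[int]:
    """Return the GGUF format version stored in the file header, or None.

    Only the 4-byte magic and the little-endian uint32 version that follows
    it are checked; metadata and tensor data are not validated.
    """
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version


def looks_like_gguf(path: str) -> bool:
    """Hypothetical check to run before handing a model path to llama-cli."""
    return gguf_version(path) is not None
```

A wrapper such as run_llama_cli.sh could call a check like this and exit with a clear message before the runtime is initialized; the authoritative error handling still has to live in the loader itself.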