Conversation

@akapoor3518

Validated all Gemma models (q4_K_M, fp16, and F32).

[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/gemma3:1b-it-q4_K_M --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
I understand you're trying

quit

llama_perf_sampler_print: sampling time = 0.01 ms / 12 runs ( 0.00 ms per token, 857142.86 tokens per second)
llama_perf_context_print: load time = 11005.97 ms
llama_perf_context_print: prompt eval time = 8869.34 ms / 15 tokens ( 591.29 ms per token, 1.69 tokens per second)
llama_perf_context_print: eval time = 5165.31 ms / 5 runs ( 1033.06 ms per token, 0.97 tokens per second)
llama_perf_context_print: total time = 21792.62 ms / 20 tokens

=== GGML Perf Summary ===
Op         Target    Runs  TSI_KERNEL-RUN  Total (us)  Avg (us)
ADD        OPU       1144            1844      947503    828.24
MUL        OPU       3454            8404     4266409   1235.21
RMS_NORM   OPU       3454            3454     2435789    705.21
MUL_MAT    CPU      20598               0    78091817   3791.23
SCALE      CPU       2225               0        2710      1.22
CONT       CPU       4181               0      230922     55.23
RESHAPE    CPU       6551               0        2577      0.39
VIEW       CPU       9772               0        1294      0.13
PERMUTE    CPU       8016               0        1288      0.16
TRANSPOSE  CPU       2095               0         298      0.14
GET_ROWS   CPU        129               0         454      3.52
SET_ROWS   CPU       3883               0        2705      0.70
SOFT_MAX   OPU        572            3744     2125329   3715.61
ROPE       CPU       4395               0       36985      8.42
GLU        CPU       2156               0      296249    137.41
Interrupted by user
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/gemma3:1b-it-fp16 --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
I understand you're looking

llama_perf_sampler_print: sampling time = 99.35 ms / 21 runs ( 4.73 ms per token, 211.38 tokens per second)
llama_perf_context_print: load time = 17071.58 ms
llama_perf_context_print: prompt eval time = 6913.49 ms / 15 tokens ( 460.90 ms per token, 2.17 tokens per second)
llama_perf_context_print: eval time = 3696.45 ms / 5 runs ( 739.29 ms per token, 1.35 tokens per second)
llama_perf_context_print: total time = 27211.75 ms / 20 tokens

=== GGML Perf Summary ===
Op         Target    Runs  TSI_KERNEL-RUN  Total (us)  Avg (us)
ADD        OPU       1144            1844      954084    833.99
MUL        OPU       3454            8404     4300667   1245.13
RMS_NORM   OPU       3454            3454     2227842    645.00
MUL_MAT    CPU      20475               0    47999475   2344.30
SCALE      CPU       2226               0        2757      1.24
CONT       CPU       4213               0      234803     55.73
RESHAPE    CPU       6561               0        2271      0.35
VIEW       CPU       9892               0        1445      0.15
PERMUTE    CPU       7844               0        1228      0.16
TRANSPOSE  CPU       2000               0         351      0.18
GET_ROWS   CPU        117               0         353      3.02
SET_ROWS   CPU       3777               0        2629      0.70
SOFT_MAX   OPU        572            3744     2036892   3561.00
ROPE       CPU       4376               0       37872      8.65
GLU        CPU       2198               0      321821    146.42
Interrupted by user
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/google-gemma-3-1b-it-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
Please tell me your cat'

llama_perf_sampler_print: sampling time = 102.87 ms / 20 runs ( 5.14 ms per token, 194.43 tokens per second)
llama_perf_context_print: load time = 7667.50 ms
llama_perf_context_print: prompt eval time = 5957.95 ms / 14 tokens ( 425.57 ms per token, 2.35 tokens per second)
llama_perf_context_print: eval time = 3045.03 ms / 5 runs ( 609.01 ms per token, 1.64 tokens per second)
llama_perf_context_print: total time = 15241.09 ms / 19 tokens

=== GGML Perf Summary ===
Op         Target    Runs  TSI_KERNEL-RUN  Total (us)  Avg (us)
ADD        OPU       1144            1794      996862    871.38
MUL        OPU       3454            8173     4397025   1273.02
RMS_NORM   OPU       3454            3454     2343316    678.44
MUL_MAT    CPU      20520               0    31297671   1525.23
SCALE      CPU       2188               0        3248      1.48
CONT       CPU       4227               0      244907     57.94
RESHAPE    CPU       6549               0        2715      0.41
VIEW       CPU       9830               0        1484      0.15
PERMUTE    CPU       7972               0        1506      0.19
TRANSPOSE  CPU       2017               0         314      0.16
GET_ROWS   CPU        115               0         356      3.10
SET_ROWS   CPU       3779               0        2909      0.77
SOFT_MAX   OPU        572            3640     2109735   3688.35
ROPE       CPU       4405               0       38399      8.72
GLU        CPU       2161               0      332427    153.83
Interrupted by user
[akapoor@wssw01 llama.cpp]$

@atrivedi-tsavoritesi left a comment

Approving, but it is not clear why this cleanup triggers a crash. Is it because in interactive mode, instead of waiting, we are cleaning up the backend?

@akapoor3518
Author

Actually, this wasn't an error condition. What I misread as a regular return in the middle of the program was in fact a return from inside a lambda, and that misreading led me to trigger cleanup prematurely. During cleanup I freed memory that a later memcpy still accessed, which caused the crash.
You're not seeing this on the FPGA because you weren't running in interactive mode.
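
For illustration, here is a minimal C++ sketch of that pitfall, with hypothetical names (backend_ctx, backend_free, run_interactive are stand-ins, not the actual llama.cpp code): a `return` inside a lambda exits only the lambda, so mistaking it for a return from the enclosing function can lead to freeing backend memory mid-loop, after which a later memcpy touches freed memory.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for backend state (not the real llama.cpp types).
struct backend_ctx {
    float * scratch = nullptr;
    size_t  size    = 0;
};

static void backend_free(backend_ctx & ctx) {
    free(ctx.scratch);   // cleanup releases the scratch buffer
    ctx.scratch = nullptr;
}

static void run_interactive(backend_ctx & ctx) {
    auto handle_token = [](int token) {
        if (token < 0) {
            return;      // exits ONLY this lambda, not run_interactive()
        }
        // ... process the token ...
    };

    for (int token : {1, 2, -1, 3}) {
        handle_token(token);
        // BUG (the misreading): treating the lambda's `return` above as a
        // return from run_interactive() and freeing the backend here:
        //   backend_free(ctx);
    }

    // Later code still assumes ctx.scratch is alive. If cleanup already
    // ran inside the loop, this memcpy is a use-after-free.
    float out[4] = {0};
    memcpy(out, ctx.scratch, sizeof(out));
}

int main() {
    backend_ctx ctx;
    ctx.size    = 1024;
    ctx.scratch = (float *) calloc(ctx.size, sizeof(float));

    run_interactive(ctx);
    backend_free(ctx);   // correct: free once, after interactive use ends
    return 0;
}
```

In interactive mode the loop keeps running past the point where cleanup was triggered, which would explain why the crash only showed up there.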

@atrivedi-tsavoritesi

Got it, thanks for the clarification.

@akapoor3518 merged commit 11b4019 into master on Oct 25, 2025.