@FIR-1033 - llama.cpp crash with below prompt for model SakanaAI-Tiny… #66
The crash no longer occurs. Verified with the following command:
build-posix/bin/llama-cli -p "ヨーロッパには何人の国がありますか" -m /proj/rel/sw/ggml/models/SakanaAI-TinySwallow-1.5B-Instruct-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 6 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)
39 8.7490 0.2243 1.1770 [1.80e-01%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
78 7.5610 0.0969 7.5610 └─ [1.55e-01%] tsi::runtime::executeWithTimeout
39 0.0110 2.82e-04 0.0110 └─ [2.26e-04%] LOAD_BLOB Command Execution
39 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148533888[0x801...
39 0.0000 0.0000 0.0000 └─ [0.00e+00%] TXE 0 Idle
[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)
39 6.2130 0.1593 1.3430 [1.28e-01%] [Thread] tsi::runtime::TsavRTPosix::unloadBlob
78 4.8570 0.0623 4.8570 └─ [9.98e-02%] tsi::runtime::executeWithTimeout
39 0.0130 3.33e-04 0.0130 └─ [2.67e-04%] UNLOAD_BLOB Command Execution
39 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=3 (UNLOAD_BLOB), blob_args=[2148533888[0x8...
39 0.0000 0.0000 0.0000 └─ [0.00e+00%] TXE 0 Idle
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
41 14.5470 0.3548 0.4390 [2.99e-01%] [Thread] tsi::runtime::TsavRT::processResponses
41 14.1080 0.3441 14.1080 └─ [2.90e-01%] tsi::runtime::executeWithTimeout
[Thread] OPU (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
39 0.6400 0.0164 0.5630 [1.32e-02%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
39 0.0770 0.0020 0.0770 └─ [1.58e-03%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
41 0.1720 0.0042 0.1720 [3.53e-03%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
39 0.1730 0.0044 0.1730 [3.55e-03%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
39 7.8040 0.2001 7.8040 [1.60e-01%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
39 0.0530 0.0014 0.0530 [1.09e-03%] [Thread] tsi::runtime::TsavRT::deallocate
========================================================================================================================
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.9917
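For readers checking the profile by hand, here is a minimal Python sketch of how the columns appear to relate. These relationships are inferred from the numbers in the log above, not from any profiler documentation: `T/call` looks like `Total(ms) / Calls`, and a thread row's `Self(ms)` looks like its total minus the totals of the `└─` rows beneath it. The function names `parse_row` and `self_from_children` are hypothetical helpers, and the original indentation depth of the `└─` rows is not recoverable from the pasted text, so the sketch treats every `└─` row as a direct child.

```python
import re

# Sample rows copied verbatim from the loadBlob section of the log above.
ROWS = """\
39 8.7490 0.2243 1.1770 [1.80e-01%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
78 7.5610 0.0969 7.5610 └─ [1.55e-01%] tsi::runtime::executeWithTimeout
39 0.0110 2.82e-04 0.0110 └─ [2.26e-04%] LOAD_BLOB Command Execution
"""

# Calls, Total(ms), T/call, Self(ms), optional child marker, [percent%], name.
ROW_RE = re.compile(
    r"^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(└─\s+)?\[(\S+)%\]\s+(.*)$"
)


def parse_row(line):
    """Parse one data row; return None for headers and other lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    calls, total, per_call, self_ms, child, pct, name = m.groups()
    return {
        "calls": int(calls),
        "total_ms": float(total),
        "per_call_ms": float(per_call),
        "self_ms": float(self_ms),
        "is_child": child is not None,
        "percent": float(pct),
        "name": name,
    }


def self_from_children(parent, children):
    """Assumed relation: self time = total minus the children's totals."""
    return parent["total_ms"] - sum(c["total_ms"] for c in children)


if __name__ == "__main__":
    rows = [parse_row(line) for line in ROWS.splitlines()]
    parent, children = rows[0], rows[1:]
    for r in rows:
        # Compare the reported T/call with Total / Calls.
        print(r["name"], r["per_call_ms"], r["total_ms"] / r["calls"])
    # Compare the reported Self(ms) with the derived value.
    print(parent["self_ms"], self_from_children(parent, children))
```

The same check holds for the other sections in the log (for example `unloadBlob`: 6.2130 - 4.8570 - 0.0130 = 1.3430), which is what suggests the relation, but it remains an inference from this one run.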