@FIR-754: Added all parameter parsing for the llama-cli #18
The test results are as follows
Model Response
cd /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/; ./run_llama_cli.sh "My cat's name" " 50 tinyllama-vo-5m-para.gguf tSavorite 1.5 1024 50 0.9 5 12288 0.0
[2018-03-09 13:03:17.788243] 271:272 [info] :: </proj/work/mmankali/bld-setuptest/tsirel-31/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
My cat's name was Tim. He loved to play with his toy car. He would run and jump in the park, making loud noises. Tim was very happy with his new toy car.
One day, Tim's mom said, "Tim. You
llama_perf_sampler_print: sampling time = 999.96 ms / 56 runs ( 17.86 ms per token, 56.00 tokens per second)
llama_perf_context_print: load time = 1713.55 ms
llama_perf_context_print: prompt eval time = 603.51 ms / 6 tokens ( 100.58 ms per token, 9.94 tokens per second)
llama_perf_context_print: eval time = 7069.36 ms / 49 runs ( 144.27 ms per token, 6.93 tokens per second)
llama_perf_context_print: total time = 10046.17 ms / 55 tokens
[2018-03-09 13:03:28.875126] 271:272 [info] :: </proj/work/mmankali/bld-setuptest/tsirel-31/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.
GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
2715 2720.000 1.002 0.000 [25%] RuntimeHostShim::awaitCommandListCompletion
1740 2635.984 1.515 2635.984 └─ [24%] [ txe_silu ]
925 1379.715 1.492 1379.715 └─ [12%] [ txe_mult ]
50 74.450 1.489 74.450 └─ [ 1%] [ txe_add ]
2715 0.448 0.000 0.448 └─ [ 0%] TXE 0 Idle
1 34.000 34.000 34.000 [ 0%] RuntimeHostShim::finalize
1 16.000 16.000 1.000 [ 0%] GGML Tsavorite
1 15.000 15.000 15.000 └─ [ 0%] RuntimeHostShim::initialize
2716 0.000 0.000 0.000 [ 0%] RuntimeHostShim::allocate
9120 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::createCommandList
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::loadBlob
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::addCommandToList
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::finalizeCommandList
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
2715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
33558 11098.000 0.331 11098.000 [100%] TOTAL
========================================================================================================================
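The T/call and percentage columns in the profiling table can be cross-checked against the Calls and Total(ms) columns. The following is a minimal sketch, assuming each row has the layout shown above (four numeric columns, an optional `└─` marker, a bracketed percentage, then the function name). `parse_row` and `share_pct` are hypothetical helper names, not part of the Tsavorite profiler.

```python
import re

# Assumed row layout, taken from the table above:
#   Calls  Total(ms)  T/call  Self(ms)  [└─] [NN%]  Function
ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+"
    r"(?:└─\s*)?\[\s*(\d+)%\]\s+(.*\S)\s*$"
)

def parse_row(line):
    """Parse one profiling row into a dict, or return None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    calls, total, per_call, self_ms, pct, name = m.groups()
    return {
        "calls": int(calls),
        "total_ms": float(total),
        "per_call_ms": float(per_call),
        "self_ms": float(self_ms),
        "pct": int(pct),
        # Kernel rows are printed as "[ txe_silu ]"; drop the brackets.
        "name": name.strip("[] "),
    }

def share_pct(row, grand_total_ms):
    """Share of the TOTAL row's time, rounded to a whole percent."""
    return round(100.0 * row["total_ms"] / grand_total_ms)

row = parse_row("1740 2635.984 1.515 2635.984 └─ [24%] [ txe_silu ]")
per_call = row["total_ms"] / row["calls"]   # compare with the T/call column
share = share_pct(row, 11098.0)             # compare with the [NN%] column
```

For the `txe_silu` row of the 50-token run, the recomputed per-call time and share agree with the printed 1.515 ms and 24%.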
The URL used is as follows
http://10.50.0.124:5003/llama-cli?model=tiny-llama&backend=tSavorite&tokens=10&prompt=My+cat%27s+name&repeat-penalty=1.5&batch-size=1024&top-k=50&top-p=0.9&last-n=5&context-length=12288&temp=0.0
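The query parameters in this URL line up with the positional values passed to `run_llama_cli.sh` in the logged command. The sketch below shows one way that mapping could be done; it is not the code from this PR. The ordering in `ARG_ORDER` is inferred from the logged command line, and `MODEL_FILES` and `build_cli_args` are hypothetical names introduced only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Positional order inferred from the logged command line (an assumption):
#   <prompt> <tokens> <model file> <backend> <repeat-penalty> <batch-size>
#   <top-k> <top-p> <last-n> <context-length> <temp>
ARG_ORDER = ["tokens", "model", "backend", "repeat-penalty", "batch-size",
             "top-k", "top-p", "last-n", "context-length", "temp"]

# Hypothetical lookup from the URL's model key to the GGUF file seen in the log.
MODEL_FILES = {"tiny-llama": "tinyllama-vo-5m-para.gguf"}

def build_cli_args(url):
    """Turn the query string into the list of positional script arguments."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in query.items()}
    missing = [key for key in ["prompt"] + ARG_ORDER if key not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    params["model"] = MODEL_FILES[params["model"]]
    return [params["prompt"]] + [params[key] for key in ARG_ORDER]

url = ("http://10.50.0.124:5003/llama-cli?model=tiny-llama&backend=tSavorite"
       "&tokens=10&prompt=My+cat%27s+name&repeat-penalty=1.5&batch-size=1024"
       "&top-k=50&top-p=0.9&last-n=5&context-length=12288&temp=0.0")
args = build_cli_args(url)
```

Returning a list rather than a single joined string leaves quoting to whatever launches the script, which avoids shell-escaping problems with prompts that contain apostrophes.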
Updated test results
cd /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/; ./run_llama_cli.sh "My cat's name" " 10 tinyllama-vo-5m-para.gguf tSavorite 1.5 1024 50 0.9 5 12288 0.0
[2018-03-09 14:27:48.730593] 271:272 [info] :: TXE resource allocation request processed successfully.
My cat's name was Tim. He loved to play with his toy
llama_perf_sampler_print: sampling time = 200.83 ms / 16 runs ( 12.55 ms per token, 79.67 tokens per second)
llama_perf_context_print: load time = 1717.23 ms
llama_perf_context_print: prompt eval time = 605.29 ms / 6 tokens ( 100.88 ms per token, 9.91 tokens per second)
llama_perf_context_print: eval time = 1292.54 ms / 9 runs ( 143.62 ms per token, 6.96 tokens per second)
llama_perf_context_print: total time = 3260.48 ms / 15 tokens
[2018-03-09 14:27:53.028716] 271:272 [info] :: TXE resource release request processed successfully.
GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
715 717.000 1.003 0.000 [17%] RuntimeHostShim::awaitCommandListCompletion
460 696.899 1.515 696.899 └─ [16%] [ txe_silu ]
245 365.414 1.491 365.414 └─ [ 8%] [ txe_mult ]
10 14.900 1.490 14.900 └─ [ 0%] [ txe_add ]
715 0.458 0.001 0.458 └─ [ 0%] TXE 0 Idle
1 34.000 34.000 34.000 [ 1%] RuntimeHostShim::finalize
1 15.000 15.000 1.000 [ 0%] GGML Tsavorite
1 14.000 14.000 14.000 └─ [ 0%] RuntimeHostShim::initialize
716 0.000 0.000 0.000 [ 0%] RuntimeHostShim::allocate
2400 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::createCommandList
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::loadBlob
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::addCommandToList
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::finalizeCommandList
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
715 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
8838 4310.000 0.488 4310.000 [100%] TOTAL
========================================================================================================================
0dab042 to 98e7cef
akapoor3518
left a comment
lgtm
dineshReddy6381
left a comment
Approved