@FIR-827 -llama.cpp: python script to run model with different prompt to measure performance #40

akapoor3518 · 2025-07-18T04:44:10Z

This script currently runs the model with prompts of different sizes. It is used for llama.cpp/ggml profiling.
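As a rough illustration of the prompt scaling, here is a minimal sketch. It assumes the larger prompts are built by repeating a base prompt, which matches the 825, 1650, 2475, ... character sizes in the output. The names `scaled_prompts` and `build_command` and the `./build/bin/llama-cli` path are hypothetical and are not taken from the script in this PR; `-m`, `-p` and `-n` are the usual llama-cli options for model, prompt and number of tokens to predict.

```python
from typing import List


def scaled_prompts(base: str, max_multiplier: int = 5) -> List[str]:
    """Return prompts of 1x .. max_multiplier x the base length.

    Assumption: scaling is done by simple repetition of the base prompt.
    """
    if max_multiplier < 1:
        raise ValueError("max_multiplier must be >= 1")
    return [base * n for n in range(1, max_multiplier + 1)]


def build_command(model_path: str, prompt: str, n_predict: int = 32) -> List[str]:
    """Build an argument list for one llama-cli run.

    The binary location is a placeholder; adjust it to the local build.
    """
    return [
        "./build/bin/llama-cli",  # hypothetical path to the binary
        "-m", model_path,         # model file (.gguf)
        "-p", prompt,             # prompt text
        "-n", str(n_predict),     # number of tokens to generate
    ]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` so that the timing lines printed by llama-cli are available for parsing afterwards.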

The following is the output:
python3 model-rerun-latest.py /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf

🔄 Run 1: Testing with prompt size 1x, actual size = 825 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 2: Testing with prompt size 2x, actual size = 1650 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 3: Testing with prompt size 3x, actual size = 2475 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 4: Testing with prompt size 4x, actual size = 3300 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 5: Testing with prompt size 5x, actual size = 4125 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

📊 Benchmark Summary:
Run  Prompt Size  Load Time (ms)  Prompt Eval Time (ms)  Eval Time (ms)
1    1x           175857.14       76355.68               76355.68
2    2x           158176.25       155966.18              155966.18
3    3x           242583.75       241903.71              241903.71
4    4x           333449.07       332706.51              332706.51
5    5x           422943.94       419110.52              419110.52
[akapoor@wssw01 llama.cpp]$
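The metric parsing step could look roughly like the sketch below. It assumes llama-cli prints timing lines of the form `<prefix>: load time = 123.45 ms`, `<prefix>: prompt eval time = ...` and `<prefix>: eval time = ...`; the exact prefix and spacing vary between llama.cpp versions, so the patterns are an assumption and `parse_timings` is a hypothetical helper, not the function used in this PR. Note that the Prompt Eval Time and Eval Time columns above are identical in every run. One possible cause (a guess, since the script is not shown here) is an unanchored `eval time` pattern that also matches the `prompt eval time` line, so the sketch anchors each pattern to the start of the line.

```python
import re
from typing import Dict, Optional

# Each pattern is anchored at the line start and expects the metric name
# right after the "<prefix>:" part, so "eval time" cannot match inside
# "prompt eval time". The line format itself is an assumption.
TIMING_PATTERNS = {
    "load_ms": re.compile(r"^\S+:\s+load time\s*=\s*([\d.]+)\s*ms", re.M),
    "prompt_eval_ms": re.compile(r"^\S+:\s+prompt eval time\s*=\s*([\d.]+)\s*ms", re.M),
    "eval_ms": re.compile(r"^\S+:\s+eval time\s*=\s*([\d.]+)\s*ms", re.M),
}


def parse_timings(log_text: str) -> Dict[str, Optional[float]]:
    """Extract load, prompt eval and eval times (ms) from llama-cli output.

    A metric that is not found is reported as None instead of raising.
    """
    timings: Dict[str, Optional[float]] = {}
    for name, pattern in TIMING_PATTERNS.items():
        match = pattern.search(log_text)
        timings[name] = float(match.group(1)) if match else None
    return timings
```

Returning `None` for a missing metric keeps a single failed run from aborting the whole benchmark; the summary table can then show a blank for that cell.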

atrivedi-tsavoritesi · 2025-07-18T04:46:41Z

@akapoor3518 does this work with FPGA as well?

akapoor3518 · 2025-07-18T17:46:33Z

We're currently focused on POSIX. A separate pull request will follow to enable FPGA support and introduce enhancements. This is the initial seed version, with more changes planned to improve profiling and performance test results.

@FIR-827 - llama.cpp: python script to run model with different prompt to measure performance

fa2243b

akapoor3518 requested a review from atrivedi-tsavoritesi July 18, 2025 04:45

akapoor3518 requested a review from LewisLui777 July 18, 2025 04:49

atrivedi-tsavoritesi approved these changes Jul 18, 2025

akapoor3518 merged commit 7d0eb95 into master Jul 18, 2025
