Conversation

@akapoor3518

This script is used for llama.cpp/ggml profiling: it runs the model against prompts of increasing size and collects the timing metrics.

The following is sample output:
python3 model-rerun-latest.py /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf

🔄 Run 1: Testing with prompt size 1x, actual size = 825 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 2: Testing with prompt size 2x, actual size = 1650 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 3: Testing with prompt size 3x, actual size = 2475 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 4: Testing with prompt size 4x, actual size = 3300 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

🔄 Run 5: Testing with prompt size 5x, actual size = 4125 characters
🚀 Executing llama-cli...
✅ Execution complete.
🔍 Parsing performance metrics...
📦 Metrics captured.

📊 Benchmark Summary:

Run  Prompt Size  Load Time (ms)  Prompt Eval Time (ms)  Eval Time (ms)
1    1x           175857.14       76355.68               76355.68
2    2x           158176.25       155966.18              155966.18
3    3x           242583.75       241903.71              241903.71
4    4x           333449.07       332706.51              332706.51
5    5x           422943.94       419110.52              419110.52
[akapoor@wssw01 llama.cpp]$
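
For reference, below is a minimal sketch of what a benchmark driver like this could look like. It is not the actual `model-rerun-latest.py`: the base prompt text (the real script's 825-character seed is not shown here), the `llama-cli` flags beyond `-m`/`-p`/`-n`, and the exact timing-line labels, which vary across llama.cpp versions (`llama_print_timings` vs. `llama_perf_context_print`), are all assumptions that may need adjusting for a given build.

```python
#!/usr/bin/env python3
# Minimal sketch of a prompt-scaling benchmark for llama.cpp.
# Assumptions (not taken from this PR): llama-cli is on PATH, the base
# prompt text, and the timing-line labels, which differ between
# llama.cpp versions (llama_print_timings vs. llama_perf_context_print).
import re
import subprocess
import sys

# Hypothetical base prompt; the real script's 825-character seed prompt
# is not shown in this PR.
BASE_PROMPT = "The quick brown fox jumps over the lazy dog. " * 18

def parse_metrics(text: str) -> dict:
    """Pull load / prompt-eval / eval times (ms) out of llama-cli's log."""
    metrics = {"load": float("nan"), "prompt_eval": float("nan"), "eval": float("nan")}
    ms = re.compile(r"=\s*([\d.]+)\s*ms")
    for line in text.splitlines():
        m = ms.search(line)
        if not m:
            continue
        value = float(m.group(1))
        # Check the longer label first so a plain "eval time" substring
        # match does not swallow the "prompt eval time" line.
        if "prompt eval time" in line:
            metrics["prompt_eval"] = value
        elif "eval time" in line:
            metrics["eval"] = value
        elif "load time" in line:
            metrics["load"] = value
    return metrics

def run_once(model_path: str, multiplier: int) -> dict:
    prompt = BASE_PROMPT * multiplier
    print(f"🔄 Run {multiplier}: Testing with prompt size {multiplier}x, "
          f"actual size = {len(prompt)} characters")
    print("🚀 Executing llama-cli...")
    # -n 32 caps generation so eval time covers a fixed number of new
    # tokens; the value is an assumed choice, not taken from this PR.
    result = subprocess.run(
        ["llama-cli", "-m", model_path, "-p", prompt, "-n", "32"],
        capture_output=True, text=True)
    print("✅ Execution complete.")
    print("🔍 Parsing performance metrics...")
    # llama.cpp writes its timing summary to stderr; scan both streams.
    metrics = parse_metrics(result.stdout + result.stderr)
    print("📦 Metrics captured.\n")
    return metrics

def main() -> None:
    model_path = sys.argv[1]
    rows = [run_once(model_path, mult) for mult in range(1, 6)]
    print("📊 Benchmark Summary:")
    print(f"{'Run':<5}{'Prompt Size':<13}{'Load Time (ms)':<17}"
          f"{'Prompt Eval Time (ms)':<24}{'Eval Time (ms)'}")
    for i, r in enumerate(rows, start=1):
        print(f"{i:<5}{str(i) + 'x':<13}{r['load']:<17.2f}"
              f"{r['prompt_eval']:<24.2f}{r['eval']:.2f}")

if __name__ == "__main__":
    main()
```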

@atrivedi-tsavoritesi

@akapoor3518 does this work with FPGA as well?

@akapoor3518 akapoor3518 requested a review from LewisLui777 on July 18, 2025 at 04:49
@akapoor3518
Author

We're currently focused on POSIX. A separate pull request will follow to enable FPGA support and introduce further enhancements. This is the initial seed version, with more changes planned to improve profiling and performance test results.

@akapoor3518 akapoor3518 merged commit 7d0eb95 into master Jul 18, 2025