@akapoor3518

Perf Summary:
=== GGML Perf Summary ===
ADD : 172 runs, 63454 us total, avg 368.92 us
MUL : 133 runs, 6807770 us total, avg 51186.24 us
RMS_NORM : 178 runs, 3009 us total, avg 16.90 us
MUL_MAT : 676 runs, 6675856 us total, avg 9875.53 us
CPY : 168 runs, 1123 us total, avg 6.68 us
CONT : 87 runs, 344 us total, avg 3.95 us
RESHAPE : 316 runs, 284 us total, avg 0.90 us
VIEW : 294 runs, 67 us total, avg 0.23 us
PERMUTE : 294 runs, 105 us total, avg 0.36 us
TRANSPOSE : 68 runs, 31 us total, avg 0.46 us
GET_ROWS : 9 runs, 127 us total, avg 14.11 us
SOFT_MAX : 88 runs, 5991 us total, avg 68.08 us
ROPE : 175 runs, 3593 us total, avg 20.53 us
UNARY : 22 runs, 8535113 us total, avg 387959.68 us
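
For anyone who wants to check the summary numbers, here is a minimal Python sketch that parses the lines above and ranks ops by total time. It is not part of this PR; the function names `parse_summary` and `rank_by_total` are hypothetical, and the regex assumes the summary keeps exactly the line format shown here.

```python
import re

# Summary lines copied verbatim from the output above.
SUMMARY = """\
ADD : 172 runs, 63454 us total, avg 368.92 us
MUL : 133 runs, 6807770 us total, avg 51186.24 us
RMS_NORM : 178 runs, 3009 us total, avg 16.90 us
MUL_MAT : 676 runs, 6675856 us total, avg 9875.53 us
CPY : 168 runs, 1123 us total, avg 6.68 us
CONT : 87 runs, 344 us total, avg 3.95 us
RESHAPE : 316 runs, 284 us total, avg 0.90 us
VIEW : 294 runs, 67 us total, avg 0.23 us
PERMUTE : 294 runs, 105 us total, avg 0.36 us
TRANSPOSE : 68 runs, 31 us total, avg 0.46 us
GET_ROWS : 9 runs, 127 us total, avg 14.11 us
SOFT_MAX : 88 runs, 5991 us total, avg 68.08 us
ROPE : 175 runs, 3593 us total, avg 20.53 us
UNARY : 22 runs, 8535113 us total, avg 387959.68 us
"""

# Assumed line format: "<OP> : <runs> runs, <total> us total, avg <avg> us"
LINE_RE = re.compile(
    r"^\s*(\w+)\s*:\s*(\d+)\s+runs,\s*(\d+)\s+us total,\s*avg\s+([\d.]+)\s+us"
)


def parse_summary(text):
    """Return {op_name: {"runs", "total_us", "avg_us"}} for each matching line."""
    ops = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            name, runs, total_us, avg_us = m.groups()
            ops[name] = {
                "runs": int(runs),
                "total_us": int(total_us),
                "avg_us": float(avg_us),
            }
    return ops


def rank_by_total(ops):
    """Return (op, total_us, percent_of_all_ops) tuples, largest total first."""
    grand_total = sum(o["total_us"] for o in ops.values())
    ordered = sorted(ops.items(), key=lambda kv: kv[1]["total_us"], reverse=True)
    return [(name, o["total_us"], 100.0 * o["total_us"] / grand_total)
            for name, o in ordered]


if __name__ == "__main__":
    for name, total_us, pct in rank_by_total(parse_summary(SUMMARY)):
        print(f"{name:<10} {total_us:>10} us  {pct:6.2f}%")
```

With the numbers above, the per-op totals sum to 22096867 us, which agrees with the 22096.867 ms total compute time reported in the log file. UNARY, MUL and MUL_MAT together account for more than 99% of that time.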

Detailed per-node timings, including the backend and the shape of each tensor node, are written to a log file (ggml_perf.log). Sample output:
cat ggml_perf.log
ggml_graph_compute_perf: total compute time: 22096.867 ms

  • BACKEND:CPU OP:GET_ROWS: total 0.110 ms over 4 runs (avg 0.028 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:RMS_NORM: total 0.148 ms over 4 runs (avg 0.037 ms) [shape=2048,6,1]
  • BACKEND:TSAVORITE OP:MUL: total 143.046 ms over 1 runs (avg 143.046 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:MUL_MAT: total 34.345 ms over 4 runs (avg 8.586 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:RESHAPE: total 0.005 ms over 2 runs (avg 0.003 ms) [shape=64,32,6]
  • BACKEND:CPU OP:ROPE: total 0.329 ms over 4 runs (avg 0.082 ms) [shape=64,32,6]
  • BACKEND:CPU OP:MUL_MAT: total 4.916 ms over 4 runs (avg 1.229 ms) [shape=256,6,1]
  • BACKEND:CPU OP:RESHAPE: total 0.002 ms over 4 runs (avg 0.001 ms) [shape=64,4,6]
  • BACKEND:CPU OP:ROPE: total 0.032 ms over 4 runs (avg 0.008 ms) [shape=64,4,6]
  • BACKEND:CPU OP:MUL_MAT: total 4.840 ms over 4 runs (avg 1.210 ms) [shape=256,6,1]
  • BACKEND:CPU OP:RESHAPE: total 0.002 ms over 3 runs (avg 0.001 ms) [shape=64,4,6]
  • BACKEND:CPU OP:VIEW: total 0.007 ms over 4 runs (avg 0.002 ms) [shape=1536,1,1]
  • BACKEND:CPU OP:CPY: total 0.021 ms over 4 runs (avg 0.005 ms) [shape=1536,1,1]
  • BACKEND:CPU OP:RESHAPE: total 0.000 ms over 3 runs (avg 0.000 ms) [shape=256,6,1]
  • BACKEND:CPU OP:TRANSPOSE: total 0.002 ms over 4 runs (avg 0.001 ms) [shape=6,256,1]
  • BACKEND:CPU OP:VIEW: total 0.000 ms over 4 runs (avg 0.000 ms) [shape=6,256,1]
  • BACKEND:CPU OP:CPY: total 0.040 ms over 4 runs (avg 0.010 ms) [shape=6,256,1]
  • BACKEND:CPU OP:VIEW: total 0.001 ms over 4 runs (avg 0.000 ms) [shape=32,4,64]
  • BACKEND:CPU OP:PERMUTE: total 0.004 ms over 4 runs (avg 0.001 ms) [shape=32,64,4]
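
To aggregate the per-node entries by backend and op, something like the following could be used. This is a rough sketch under the assumption that every entry follows the format in the sample above; `parse_nodes` and `total_by_backend_op` are hypothetical helper names, not functions from this PR.

```python
import re
from collections import defaultdict

# A few entries copied from the sample log above.
SAMPLE = """\
  • BACKEND:CPU OP:GET_ROWS: total 0.110 ms over 4 runs (avg 0.028 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:RMS_NORM: total 0.148 ms over 4 runs (avg 0.037 ms) [shape=2048,6,1]
  • BACKEND:TSAVORITE OP:MUL: total 143.046 ms over 1 runs (avg 143.046 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:MUL_MAT: total 34.345 ms over 4 runs (avg 8.586 ms) [shape=2048,6,1]
  • BACKEND:CPU OP:MUL_MAT: total 4.916 ms over 4 runs (avg 1.229 ms) [shape=256,6,1]
"""

# Assumed entry format, taken from the sample lines only.
NODE_RE = re.compile(
    r"BACKEND:(?P<backend>\w+)\s+OP:(?P<op>\w+):\s+"
    r"total\s+(?P<total>[\d.]+)\s+ms\s+over\s+(?P<runs>\d+)\s+runs\s+"
    r"\(avg\s+(?P<avg>[\d.]+)\s+ms\)\s+\[shape=(?P<shape>[\d,]+)\]"
)


def parse_nodes(text):
    """Return one dict per log entry that matches the assumed format."""
    nodes = []
    for line in text.splitlines():
        m = NODE_RE.search(line)
        if m:
            nodes.append({
                "backend": m.group("backend"),
                "op": m.group("op"),
                "total_ms": float(m.group("total")),
                "runs": int(m.group("runs")),
                "avg_ms": float(m.group("avg")),
                "shape": tuple(int(d) for d in m.group("shape").split(",")),
            })
    return nodes


def total_by_backend_op(nodes):
    """Sum total_ms over all entries sharing the same (backend, op) pair."""
    totals = defaultdict(float)
    for n in nodes:
        totals[(n["backend"], n["op"])] += n["total_ms"]
    return dict(totals)


if __name__ == "__main__":
    totals = total_by_backend_op(parse_nodes(SAMPLE))
    for (backend, op), ms in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{backend:<10} {op:<10} {ms:10.3f} ms")
```

Grouping by (backend, op) keeps ops that run on TSAVORITE separate from the same op type running on the CPU, so the two can be compared directly.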

@LewisLui777 left a comment

Great job, this looks good.

@atrivedi-tsavoritesi left a comment

Most of the changes look good; there are just a few minor indentation issues to take care of. Feel free to push the changes once the indentation is fixed.

@akapoor3518 akapoor3518 merged commit 83de276 into master Jun 30, 2025
4 participants