@FIR-770 - LLama.cpp: Adding Perf Status for all GGML Operation #26
Conversation
LewisLui777 left a comment
Great job, this looks good.
atrivedi-tsavoritesi left a comment
Most of the changes look good; there are just a few minor indentation issues to be taken care of. Feel free to push the changes once the indentation is addressed.
Perf Summary:
=== GGML Perf Summary ===
ADD : 172 runs, 63454 us total, avg 368.92 us
MUL : 133 runs, 6807770 us total, avg 51186.24 us
RMS_NORM : 178 runs, 3009 us total, avg 16.90 us
MUL_MAT : 676 runs, 6675856 us total, avg 9875.53 us
CPY : 168 runs, 1123 us total, avg 6.68 us
CONT : 87 runs, 344 us total, avg 3.95 us
RESHAPE : 316 runs, 284 us total, avg 0.90 us
VIEW : 294 runs, 67 us total, avg 0.23 us
PERMUTE : 294 runs, 105 us total, avg 0.36 us
TRANSPOSE : 68 runs, 31 us total, avg 0.46 us
GET_ROWS : 9 runs, 127 us total, avg 14.11 us
SOFT_MAX : 88 runs, 5991 us total, avg 68.08 us
ROPE : 175 runs, 3593 us total, avg 20.53 us
UNARY : 22 runs, 8535113 us total, avg 387959.68 us
Detailed information, including the size (shape) of each tensor node, is written to the log file. Below is a sample of the output:
cat ggml_perf.log
ggml_graph_compute_perf: total compute time: 22096.867 ms