@FIR-998 - Create GLU/SWIGLU Support for posix and fpga #59
#########
POSIX validation
[akapoor@wssw01 llama.cpp]$ build-posix/bin/llama-cli -p "my cat's name" -m /proj/rel/sw/ggml/models/Tiny-Llama-v0.3-FP32-1.1B-F32.gguf --device tSavorite -c 12288 --temp 0.0 --n-predict 10 --repeat-penalty 1.5 -b 1024 --top-k 50 --top-p 0.9 --repeat-last-n 5 --no-warmup --no-display-prompt
is Luna.
I'm a cat
llama_perf_sampler_print: sampling time = 25.69 ms / 16 runs ( 1.61 ms per token, 622.86 tokens per second)
llama_perf_context_print: load time = 3649.58 ms
llama_perf_context_print: prompt eval time = 2890.32 ms / 6 tokens ( 481.72 ms per token, 2.08 tokens per second)
llama_perf_context_print: eval time = 4057.99 ms / 9 runs ( 450.89 ms per token, 2.22 tokens per second)
llama_perf_context_print: total time = 7737.19 ms / 15 tokens
=== GGML Perf Summary ===
Op          Target     Runs     Total us     Avg us
ADD         OPU        4048      7484834    1849.02
MUL         OPU        4140      3206684     774.56
RMS_NORM    OPU        4140      3341530     807.13
MUL_MAT     CPU       72790    104568500    1436.58
CONT        CPU       15674       827400      52.79
RESHAPE     CPU       22692        13660       0.60
VIEW        CPU       35074         5018       0.14
PERMUTE     CPU       27582         5844       0.21
TRANSPOSE   CPU        6652         1748       0.26
GET_ROWS    CPU         766         2180       2.85
SET_ROWS    CPU       15056        12454       0.83
SOFT_MAX    CPU        7606       648106      85.21
ROPE        CPU       15752        82048       5.21
GLU         OPU        2024      3616792    1786.95
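To make the summary easier to read, here is a small Python sketch (not part of this PR) that recomputes the `Avg us` column as `Total us / Runs` and sums the reported time per target. The rows are copied from the table above; the helper names `avg_us` and `time_by_target` are hypothetical and exist only for this illustration. The totals are used as reported, so they are accumulated op times and need not match the wall-clock figures printed by `llama_perf_context_print`.

```python
# Rows copied from the GGML Perf Summary: (op, target, runs, total_us).
ROWS = [
    ("ADD", "OPU", 4048, 7484834),
    ("MUL", "OPU", 4140, 3206684),
    ("RMS_NORM", "OPU", 4140, 3341530),
    ("MUL_MAT", "CPU", 72790, 104568500),
    ("CONT", "CPU", 15674, 827400),
    ("RESHAPE", "CPU", 22692, 13660),
    ("VIEW", "CPU", 35074, 5018),
    ("PERMUTE", "CPU", 27582, 5844),
    ("TRANSPOSE", "CPU", 6652, 1748),
    ("GET_ROWS", "CPU", 766, 2180),
    ("SET_ROWS", "CPU", 15056, 12454),
    ("SOFT_MAX", "CPU", 7606, 648106),
    ("ROPE", "CPU", 15752, 82048),
    ("GLU", "OPU", 2024, 3616792),
]


def avg_us(runs, total_us):
    """Average microseconds per run (hypothetical helper)."""
    return total_us / runs


def time_by_target(rows):
    """Sum the reported total microseconds per target (hypothetical helper)."""
    totals = {}
    for _op, target, _runs, total_us in rows:
        totals[target] = totals.get(target, 0) + total_us
    return totals


if __name__ == "__main__":
    # Per-op averages, recomputed from the Runs and Total us columns.
    for op, target, runs, total_us in ROWS:
        print(f"{op:<10} {target:<4} {avg_us(runs, total_us):10.2f} us/run")

    # Share of the reported op time spent on each target.
    totals = time_by_target(ROWS)
    grand_total = sum(totals.values())
    for target, us in sorted(totals.items()):
        print(f"{target}: {us / 1000:.1f} ms ({100 * us / grand_total:.1f}%)")
```

Running the script prints one line per op followed by one line per target, which is a quick way to check that the new GLU op is accounted to the OPU alongside ADD, MUL and RMS_NORM.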
OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
2090 871.9760 0.4172 107.6070 [ 8.91%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
4180 763.4950 0.1827 763.4950 └─ [ 7.80%] tsi::runtime::executeWithTimeout
2090 0.8740 4.18e-04 0.8740 └─ [8.93e-03%] LOAD_BLOB Command Execution
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148311680[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148324992[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148312704[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148313728[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148314752[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148315776[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148316800[0x800...
4 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148317824[0x800...
3 0.0000 0.0000 0.0000 └─ [0.00e+00%] Command{command=2 (LOAD_BLOB), blob_args=[2148318848[0x800...
##########
FPGA validation (in progress; results will be added shortly)