main.cpp depends on common.cpp and sampling.cpp, both of which I consider out of scope for this project to maintain bindings for. (We wrote our own version of grammar.cpp to avoid extra bindings.)
There is a plan on the llama.cpp side to move sampling.cpp behind llama.h, in which case I imagine the sampling params would align a lot better. If there are specific context, model, or sampling params you want to tune, I'd be happy to add them one by one.
I've created #109 to start moving us towards being able to replicate main.cpp in Rust.
When compiling llama.cpp "out of the box" and prompting it as follows (in this case on a Mac M1)...
./main -p "Write a rhyme haiku about a rabbit and a cube." -m llama-2-7b-chat.Q4_0.gguf -n 128 -ngl 33 --mlock --threads 8
We can see that llama.cpp uses the following sampling settings and order...
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 1
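For reference, the truncation stages in that chain are simple transforms over the logits, so they can be mirrored on the Rust side without any new bindings. Below is a minimal, self-contained sketch (all names are illustrative, not part of this crate's API; it assumes a plain Vec of (token_id, logit) candidates) that applies the stages in the order llama.cpp reports. tfs_z and typical_p are no-ops at their 1.000 defaults, and the penalty stage is left out for brevity.

```rust
// Hypothetical stand-alone sketch of llama.cpp's default sampler chain:
// top_k -> top_p -> min_p -> temperature. Candidates are (token_id, logit).

fn top_k(cands: &mut Vec<(u32, f32)>, k: usize) {
    // Keep only the k highest-logit candidates, sorted descending.
    cands.sort_by(|a, b| b.1.total_cmp(&a.1));
    cands.truncate(k);
}

fn top_p(cands: &mut Vec<(u32, f32)>, p: f32) {
    // Nucleus sampling: keep the smallest prefix whose cumulative
    // probability reaches p. Assumes `cands` is non-empty and sorted
    // descending by logit (true after top_k above).
    let max = cands[0].1;
    let probs: Vec<f32> = cands.iter().map(|(_, l)| (l - max).exp()).collect();
    let sum: f32 = probs.iter().sum();
    let mut cum = 0.0;
    let mut keep = cands.len();
    for (i, q) in probs.iter().enumerate() {
        cum += q / sum;
        if cum >= p {
            keep = i + 1;
            break;
        }
    }
    cands.truncate(keep);
}

fn min_p(cands: &mut Vec<(u32, f32)>, p: f32) {
    // Drop candidates whose probability is below p * P(most likely token).
    // In logit space that is a threshold of max_logit + ln(p).
    let max = cands.iter().map(|c| c.1).fold(f32::NEG_INFINITY, f32::max);
    let threshold = max + p.ln();
    cands.retain(|&(_, l)| l >= threshold);
}

fn temperature(cands: &mut [(u32, f32)], t: f32) {
    // Sharpen (t < 1) or flatten (t > 1) the distribution.
    for c in cands.iter_mut() {
        c.1 /= t;
    }
}

fn main() {
    // Fake logits for 100 tokens, decreasing with token id.
    let mut cands: Vec<(u32, f32)> = (0u32..100).map(|i| (i, -(i as f32) * 0.1)).collect();
    // Apply the default values from the log above, in the reported order.
    top_k(&mut cands, 40);
    top_p(&mut cands, 0.95);
    min_p(&mut cands, 0.05);
    temperature(&mut cands, 0.8);
    println!("{} candidates survive truncation", cands.len());
}
```

With the params surfaced one by one as proposed above, a chain like this could then be compared token-for-token against main.cpp's output.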
The ability to replicate these settings and the sampling order would be very useful when comparing results with llama.cpp.
Also, several of these are key to adjusting LLM behaviour, like temperature and the repeat penalties.