
Fix for silent llama #279

Merged — 3 commits merged into yoheinakajima:main from dibrale:patch-1 on May 10, 2023

Conversation

@dibrale (Contributor) commented Apr 28, 2023

Fix for the prioritization agent deleting the priority list when it returns empty output. Prompt-engineering and parameter improvements to reduce the number of empty responses from llama models. More debug output.

@jmtatsch - This should fix the issue you experienced with my prior PR.
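
For readers who want a concrete picture of the guard described in this PR, here is a minimal sketch. It mirrors the general shape of babyagi's prioritization agent, but the function signature and the llm_call wrapper are assumptions for illustration, not code taken from this diff.

```python
# Hypothetical sketch of the "don't delete the priority list on empty output" guard.
# Names (prioritization_agent, task_list) mirror babyagi's conventions; llm_call
# stands in for whatever model wrapper is in use.
from collections import deque
from typing import Callable

def prioritization_agent(this_task_id: int, task_list: deque, objective: str,
                         llm_call: Callable[[str], str]) -> deque:
    task_names = [t["task_name"] for t in task_list]
    prompt = (
        f"You are a task prioritization AI. Clean up and reprioritize these tasks: {task_names}. "
        f"Consider the ultimate objective: {objective}. "
        f"Return one task per line, numbered starting with {this_task_id + 1}."
    )
    response = llm_call(prompt)

    # The fix: if the model returns an empty string, keep the existing
    # priority list instead of replacing it with an empty one.
    if not response.strip():
        print("Prioritization agent returned empty output; keeping current task list.")
        return task_list

    new_list = deque()
    for line in response.split("\n"):
        parts = line.strip().split(".", 1)
        if len(parts) == 2:
            new_list.append({"task_id": parts[0].strip(), "task_name": parts[1].strip()})

    # Fall back to the old list if nothing could be parsed from the response.
    return new_list if new_list else task_list
```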

@dibrale (Contributor, Author) commented Apr 28, 2023

Attached is a rather lengthy example of the sort of output I get with this PR. More than anything, it illustrates the difficulties that my local model has staying on task and how these issues get propagated and/or handled internally with subsequent iteration. The propensity to produce empty output from time to time may be a context length issue, and I believe that further optimization in this regard is possible. Nonetheless, I feel that output has gotten much better. As ever, comments, criticisms and suggestions are greatly appreciated!

sample_output.txt

@cornpo (Contributor) commented Apr 28, 2023

This fixed my mlock issue. Currently running for the first time.

@cornpo mentioned this pull request on Apr 28, 2023
@jmtatsch (Contributor) commented:

@dibrale Thanks, that solves the issue for me.
However, there are a couple of changes that don't seem necessary to me.
Maybe you can explain a bit why you chose to make them.
Why did you disable memlock?
Why did you halve CTX_MAX to 1024?
Why did you reduce max_tokens to 200?

@dibrale (Contributor, Author) commented Apr 28, 2023

@jmtatsch - To answer your questions:

1. mlock was not working properly for me anyway, and I believe it causes crashes on some machines. It is disabled by default in llama_cpp, and keeping it disabled in my PR is probably what fixed the issue @cornpo was having. Perhaps enabling mlock can be a .env option at some point.
2. At least some of the empty output problems seem to have been caused by prompt input exceeding what a llama model can work with. Reducing CTX_MAX to a fixed value is a quick and dirty way to fit more new input into the prompt without causing this issue. Prompt allowance utilization can probably be improved by some sort of dynamic adjustment to context length later on.
3. A modest max_tokens keeps the potential amount of context the LLM can add in any one step to a manageable amount. Since context_agent returns the top 5 results for the execution agent to work with, keeping each of these results to a smaller size prevents the execution agent from ingesting an overwhelmingly large prompt.

I hope this clarifies my reasoning a bit. Please let me know if you have any other questions or suggestions.
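
For illustration, here is a minimal llama-cpp-python sketch of the three settings discussed in the points above. The model path, prompt, and temperature are placeholders, and the exact wiring inside babyagi may differ from this.

```python
# Minimal llama-cpp-python sketch of the parameters discussed above.
# Model path and prompt are placeholders; babyagi's actual wiring may differ.
from llama_cpp import Llama

CTX_MAX = 1024  # smaller fixed context window so more fresh input fits in the prompt

llm = Llama(
    model_path="models/llama-7b.ggml.bin",  # placeholder path
    n_ctx=CTX_MAX,
    use_mlock=False,  # library default; mlock caused problems on some machines
)

result = llm(
    "You are an AI who performs one task based on the following objective: ...",
    max_tokens=200,   # keep each completion small so downstream prompts stay manageable
    temperature=0.7,  # illustrative value, not taken from the PR
)
print(result["choices"][0]["text"])
```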

@francip (Collaborator) commented May 1, 2023

Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR, if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

@dibrale (Contributor, Author) commented May 1, 2023

> Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR, if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

CTX_MAX only gets set for Llama models anyway, unless I am mistaken. Its assignment occurs behind a branch that checks for llama, and this behavior does not change in my PR. max_tokens is likewise only set to 200 when calling a llama model. Please correct me if I'm wrong on either of these points, and I will adjust the code if required. I'll work on reverting the prompt changes in the meantime.
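
As a rough illustration of the branching described above, the sketch below keeps CTX_MAX and the 200-token cap on the llama path only. The LLM_MODEL and LLAMA_MODEL_PATH variable names follow babyagi's conventions but are assumptions here, not a verbatim excerpt from the repository.

```python
# Rough illustration of the llama-only branch described above; not a verbatim
# excerpt from the repository.
import os

LLM_MODEL = os.getenv("LLM_MODEL", "gpt-3.5-turbo").lower()

if LLM_MODEL.startswith("llama"):
    from llama_cpp import Llama

    CTX_MAX = 1024  # only defined on the llama path
    llm = Llama(
        model_path=os.getenv("LLAMA_MODEL_PATH", "models/llama-7b.ggml.bin"),
        n_ctx=CTX_MAX,
        use_mlock=False,
    )

def llm_call(prompt: str, temperature: float = 0.5, max_tokens: int = 100) -> str:
    if LLM_MODEL.startswith("llama"):
        # The 200-token cap only applies when a llama model is in use.
        result = llm(prompt, max_tokens=200, temperature=temperature)
        return result["choices"][0]["text"].strip()
    # OpenAI branch elided; this sketch only covers the llama path.
    raise NotImplementedError("Only the llama branch is sketched here.")
```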

@dibrale (Contributor, Author) commented May 1, 2023

@francip - I've reverted the prompt changes, so hopefully this PR meets your needs. I'll submit prompt changes for separate consideration.

@dibrale (Contributor, Author) commented May 1, 2023

Prompt changes are now in a separate PR as requested. Please let me know if there are any further issues.

@francip merged commit d0a8400 into yoheinakajima:main on May 10, 2023
@dibrale deleted the patch-1 branch on May 10, 2023 at 07:14