
Fix for silent llama #279

Merged — 3 commits merged into yoheinakajima:main from dibrale:patch-1 on May 10, 2023

Conversation

@dibrale (Contributor) commented Apr 28, 2023

Fix for the prioritization agent deleting the priority list when it returns empty output. Prompt-engineering and parameter improvements to reduce the number of empty responses from llama models. More debug output.

@jmtatsch - This should fix the issue you experienced with my prior PR.
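
For readers who want a concrete picture of the guard described in this PR, here is a minimal sketch. It mirrors the general shape of babyagi's prioritization agent, but the function signature and the llm_call wrapper are assumptions for illustration, not code taken from this diff.

```python
# Hypothetical sketch of the "don't delete the priority list on empty output" guard.
# Names (prioritization_agent, task_list) mirror babyagi's conventions; llm_call
# stands in for whatever model wrapper is in use.
from collections import deque
from typing import Callable

def prioritization_agent(this_task_id: int, task_list: deque, objective: str,
                         llm_call: Callable[[str], str]) -> deque:
    task_names = [t["task_name"] for t in task_list]
    prompt = (
        f"You are a task prioritization AI. Clean up and reprioritize these tasks: {task_names}. "
        f"Consider the ultimate objective: {objective}. "
        f"Return one task per line, numbered starting with {this_task_id + 1}."
    )
    response = llm_call(prompt)

    # The fix: if the model returns an empty string, keep the existing
    # priority list instead of replacing it with an empty one.
    if not response.strip():
        print("Prioritization agent returned empty output; keeping current task list.")
        return task_list

    new_list = deque()
    for line in response.split("\n"):
        parts = line.strip().split(".", 1)
        if len(parts) == 2:
            new_list.append({"task_id": parts[0].strip(), "task_name": parts[1].strip()})

    # Fall back to the old list if nothing could be parsed from the response.
    return new_list if new_list else task_list
```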

@dibrale (Contributor, Author) commented Apr 28, 2023

Attached is a rather lengthy example of the sort of output I get with this PR. More than anything, it illustrates the difficulties that my local model has staying on task and how these issues get propagated and/or handled internally with subsequent iteration. The propensity to produce empty output from time to time may be a context length issue, and I believe that further optimization in this regard is possible. Nonetheless, I feel that output has gotten much better. As ever, comments, criticisms and suggestions are greatly appreciated!

sample_output.txt

@cornpo (Contributor) commented Apr 28, 2023

This fixed my mlock issue. Currently running for the first time.

@cornpo mentioned this pull request on Apr 28, 2023
@jmtatsch (Contributor) commented:

@dibrale Thanks, that solves the issue for me.
However, there are a couple of changes that don't seem necessary to me.
Maybe you can explain a bit why you chose to make them.
Why did you disable memlock?
Why did you halve CTX_MAX to 1024?
Why did you reduce max_tokens to 200?

@dibrale (Contributor, Author) commented Apr 28, 2023

@jmtatsch - To answer your questions:

1. mlock was not working properly for me anyway, and I believe it causes crashes on some machines. It is disabled by default in llama_cpp, and keeping it disabled in my PR is probably what fixed the issue @cornpo was having. Perhaps enabling mlock can be a .env option at some point.
2. At least some of the empty output problems seem to have been caused by prompt input exceeding what a llama model can work with. Reducing CTX_MAX to a fixed value is a quick and dirty way to fit more new input into the prompt without causing this issue. Prompt allowance utilization can probably be improved by some sort of dynamic adjustment to context length later on.
3. A modest max_tokens keeps the potential amount of context the LLM can add in any one step to a manageable amount. Since context_agent returns the top 5 results for the execution agent to work with, keeping each of these results to a smaller size prevents the execution agent from ingesting an overwhelmingly large prompt.

I hope this clarifies my reasoning a bit. Please let me know if you have any other questions or suggestions.
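
For illustration, here is a minimal llama-cpp-python sketch of the three settings discussed in the points above. The model path, prompt, and temperature are placeholders, and the exact wiring inside babyagi may differ from this.

```python
# Minimal llama-cpp-python sketch of the parameters discussed above.
# Model path and prompt are placeholders; babyagi's actual wiring may differ.
from llama_cpp import Llama

CTX_MAX = 1024  # smaller fixed context window so more fresh input fits in the prompt

llm = Llama(
    model_path="models/llama-7b.ggml.bin",  # placeholder path
    n_ctx=CTX_MAX,
    use_mlock=False,  # library default; mlock caused problems on some machines
)

result = llm(
    "You are an AI who performs one task based on the following objective: ...",
    max_tokens=200,   # keep each completion small so downstream prompts stay manageable
    temperature=0.7,  # illustrative value, not taken from the PR
)
print(result["choices"][0]["text"])
```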

@francip (Collaborator) commented May 1, 2023

Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR, if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

@dibrale (Contributor, Author) commented May 1, 2023

> Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR, if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

CTX_MAX only gets set for Llama models anyway, unless I am mistaken. Its assignment occurs behind a branch that checks for llama, and this behavior does not change in my PR. max_tokens is likewise only set to 200 when calling a llama model. Please correct me if I'm wrong on either of these points, and I will adjust the code if required. I'll work on reverting the prompt changes in the meantime.
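
As a rough illustration of the branching described above, the sketch below keeps CTX_MAX and the 200-token cap on the llama path only. The LLM_MODEL and LLAMA_MODEL_PATH variable names follow babyagi's conventions but are assumptions here, not a verbatim excerpt from the repository.

```python
# Rough illustration of the llama-only branch described above; not a verbatim
# excerpt from the repository.
import os

LLM_MODEL = os.getenv("LLM_MODEL", "gpt-3.5-turbo").lower()

if LLM_MODEL.startswith("llama"):
    from llama_cpp import Llama

    CTX_MAX = 1024  # only defined on the llama path
    llm = Llama(
        model_path=os.getenv("LLAMA_MODEL_PATH", "models/llama-7b.ggml.bin"),
        n_ctx=CTX_MAX,
        use_mlock=False,
    )

def llm_call(prompt: str, temperature: float = 0.5, max_tokens: int = 100) -> str:
    if LLM_MODEL.startswith("llama"):
        # The 200-token cap only applies when a llama model is in use.
        result = llm(prompt, max_tokens=200, temperature=temperature)
        return result["choices"][0]["text"].strip()
    # OpenAI branch elided; this sketch only covers the llama path.
    raise NotImplementedError("Only the llama branch is sketched here.")
```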

@dibrale (Contributor, Author) commented May 1, 2023

@francip - I've reverted the prompt changes, so hopefully this PR meets your needs. I'll submit prompt changes for separate consideration.

@dibrale (Contributor, Author) commented May 1, 2023

Prompt changes are now in a separate PR as requested. Please let me know if there are any further issues.

@francip merged commit d0a8400 into yoheinakajima:main on May 10, 2023
@dibrale deleted the patch-1 branch on May 10, 2023 at 07:14