The context window seems to "reset" to earlier points, rather than later points in time #7175
I saw something like this on the remote hosted version of OpenHands. Something does seem wrong. It's like, instead of dropping or summarizing the old half, it drops or summarizes the new half. (Or whatever the percentage actually is; it's not half for the condenser, but the point is which messages or which info remains in context.)
Something is definitely strange. I can't identify any issues with the condenser code (this test is not the most robust, but I can't square it with dropping the new half). I'm currently running a few experiments to improve the condenser for Claude 3.7 -- I'm hoping it'll be enough to tweak the context window size and prompt, but that would also suggest we'll (I'll) have to fine-tune the condenser prompt whenever a new SotA model comes out.
I just reloaded a conversation I started on the remote version last night. My guess is it must have gotten to 500 iterations on its own and stopped, waiting. Then the session was closed. When I sent a new message, I saw:
So it seems the controller truncation went into effect three times on reload.
If the LLM is to be believed, it only had the ~12 newest iterations, plus the original user message, in context. That must have been much, much less than ~100k tokens. 🤔 (The 12 steps were reading files; the max was 590 lines of code, and several of those 12 had ~100 lines of code...) Please correct me if I'm wrong: as far as the condenser goes, it's not saving its state, and it should have been reset three times in a row, I guess? And it must have had nothing to do in 12 steps?
I'm looking into the controller issue. There is at least this strange thing where it doesn't seem to save the state, though it should.
Hmm, it's possible one of those
The check happens here, so it won't happen until the agent actually calls out to the condenser. The default configuration requires 40 events in the history before enabling summarization, so that shouldn't happen in 12 steps. EDIT: I thought there was a unit test for the condenser reset behavior, but I guess there wasn't. Added one in #7186.
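For illustration, here's a rough sketch of that threshold guard; the names and shapes below are assumptions for this comment, not the actual OpenHands condenser API:

```python
MAX_EVENTS_BEFORE_CONDENSING = 40  # the default threshold mentioned above


def summarize(history: list[str]) -> list[str]:
    # Stand-in for the LLM-backed summarization step.
    return [history[0], f"<summary of {len(history) - 1} events>"]


def condensed_history(history: list[str]) -> list[str]:
    # Below the threshold the history passes through untouched, so a
    # 12-step session should never trigger a summarizing condensation.
    if len(history) <= MAX_EVENTS_BEFORE_CONDENSING:
        return history
    return summarize(history)
```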
I found another issue in the truncation; this must be wrong. I'm making a PR once I get a better, more detailed test that captures both truncation and session restore. Yes... yes, it's possible:
If the whole history did get sent to the agent when restored from the stream, the condenser must have had a lot to work through to begin with. We should maybe catch this in the condenser and have it figure out the right events. 🤔 If it falls back to the controller, it will truncate everything down to the limit, losing all the benefit of the summaries.
I cannot seem to recreate this behavior using our benchmarking infrastructure (a subset of SWE-bench Verified, 250 max iterations, Claude 3.7, the default condenser settings, OpenHands v0.28.1). The benchmarks also dump completion logs, so at every step of the way I can look at the messages sent and the LLM responses.

The first thing to look at is the number of events in the context (here, that means the number of messages sent to the LLM). That strong sawtooth pattern is what we'd expect: every spike downwards is a summarizing condensation, then the context slowly grows until there are too many events and we get another spike downwards.

So what happens during that condensation? Well, I can look at the messages before/after and see which messages "survive" and make it through to the next context. Here are the results: the first event and approximately the last half always make it through. Also expected. Every event in between has a ~20% chance of surviving the condensation -- I believe these come from the action/observation pairing handled by the

Notably, I see no signs of the context truncation from the agent controller. Did we fix that in v0.28.1? Is 250 iterations not enough to illustrate this behavior? Or am I just not seeing it because I'm using an All Hands API key with huge rate/token limits? (I don't think that last case matters; both Xingyao and I have seen the mentioned behavior while using the app with that same key.)
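To make that survival pattern concrete, here's a minimal sketch; the `condense` helper below is an illustration of the observed behavior, not the OpenHands implementation. The first event always survives, roughly the newest half survives, and the middle is collapsed into a summary:

```python
def condense(history: list[str], summarize) -> list[str]:
    mid = len(history) // 2
    head = history[:1]       # the first event always makes it through
    middle = history[1:mid]  # the middle is what gets summarized away
    tail = history[mid:]     # roughly the newest half always survives
    return head + [summarize(middle)] + tail


# With a stub summarizer, a 10-event history collapses to 7 entries:
events = [f"event-{i}" for i in range(10)]
print(condense(events, lambda evs: f"<summary of {len(evs)} events>"))
```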
@csmith49 I think you're not seeing it because these experiments above are on the initial session. Isn't that right? When running evals, there is no restore of an older session, which is what happens in normal use. (SWE-bench and all only go from the first user message to a FinishAction = only one session.) I think the condenser does its job very well during the initial session, and I have rarely, if ever, seen the controller truncation go into effect at all. But it doesn't save its state. So in our conversations, when the session is reloaded for whatever reason (runtime errors, staying overnight, or even simply the user tabbing out for some time):
Could I ask if you have other concerns with PR 6909, the microagents stuff / memory architecture? I would love your help to get that done, so if it's fine with you and Xingyao as it is now, I'll just do the renaming and merge it, and then focus on what we need next: the condenser's internal implementation is fine, we just need to integrate it better with the rest of the system... e.g. we talked about focusing on FE visibility, but visibility in the stream also means we can reload the condenser properly! So we can restore the right context:
I feel like the explanation was staring us in the face and I still didn't see it for a while. You know it, Calvin, you drew my attention to the interplay between these two. 😅 The condenser is fine, taken in isolation, we just... can't take it in isolation anymore, once it's in real use. 😂
Oh no. I think you're absolutely right -- we restore the session, the history gets rebuilt from the event stream, and that triggers truncation until all that's left is some initial messages and whatever else will fit in the context buffer, so it looks like the agent just restarts the conversation.
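A hedged sketch of that failure mode (the names and the halving rule here are assumptions for illustration, not the actual controller code): on restore, the condenser's summaries are gone, the raw event stream becomes the history, and the fallback truncation fires repeatedly until the history fits.

```python
def restore_history(event_stream: list[str], fits_in_context) -> list[str]:
    # The history is rebuilt verbatim from the stream; any summaries the
    # condenser produced during the live session are not part of it.
    history = list(event_stream)
    while len(history) > 2 and not fits_in_context(history):
        # Fallback truncation: keep the first (user) message plus the
        # newest half of the rest. This can fire several times in a row,
        # matching the "truncated three times on reload" observation.
        rest = history[1:]
        history = history[:1] + rest[len(rest) // 2:]
    return history
```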
I think that's the conclusion I was coming to from staring at my data 😢
But you still got there. Great sleuthing!
Originally posted by @amirshawn in #7023