
The context window seems to "reset" to earlier points, rather than later points in time #7175

Open
enyst opened this issue Mar 10, 2025 · 11 comments · May be fixed by #7353
Labels
bug Something isn't working

Comments

@enyst
Collaborator

enyst commented Mar 10, 2025

@enyst I want you to see what I mean about the condenser. It's like it forgets what it just did.

I'll help you set up the environment for the marketplace platform. Let me start by configuring the ports in the .env.docker file based on the runtime information provided.

Read the contents of a file
Now I'll update the .env.docker file with the correct port configuration based on the runtime information:

Edited the contents of a file
Now I'll also update the NUXT_PUBLIC URLs to match the new port configuration:

Edited the contents of a file
Now I'll run the initialization script in the background as instructed:

Ran a bash command
As per the instructions, I will not check the setup.log file. Instead, I'll provide you with the frontend and backend URLs based on the port configuration:

Frontend URL: http://localhost:54425/
Backend URL: http://localhost:58450/

The initialization process is running in the background. These URLs will be accessible once the setup is complete.

Is there anything specific you'd like me to help you with regarding the marketplace platform? I'm ready to assist with any further instructions or tasks.

can you run setup with sudo

I'll run the setup script with sudo privileges as requested:

Ran a bash command
I notice that the previous setup process has exited with code 1, which indicates there might have been an issue. Now I've started a new setup process with sudo privileges. This is running in the background with the process ID 902.

As per the instructions, I will not check the setup.log file. The frontend and backend URLs remain the same:

Frontend URL: http://localhost:54425/
Backend URL: http://localhost:58450/

These URLs will be accessible once the setup with sudo privileges is complete. The system is now initializing in the background.

Is there anything else you'd like me to help you with while we wait for the setup to complete?

instructions/hybrid-page-builder can your read everything in here to see what we're working on

I'll read the contents of the instructions/hybrid-page-builder directory to understand what we're working on.

Read the contents of a file
Let me start by reading the overview to understand the project:

Read the contents of a file
Now let's look at the Phase 1 steps to understand the current implementation stage:

Read the contents of a file
Read the contents of a file
Read the contents of a file
Read the contents of a file
Let me also check the Phase 2 steps to understand the frontend integration part:

Read the contents of a file
Let me check the progress directory to see what has been completed so far:

Read the contents of a file
Let's check the progress of the first step:

Read the contents of a file
Let me check the progress of the second step:

Read the contents of a file
Let me check the progress of the third step:

Read the contents of a file
Let me check the progress of the fourth step:

Read the contents of a file
Let me check the progress of Phase 2 Step 1:

Read the contents of a file
Now I'll update the .env.docker file with the correct ports:

Edited the contents of a file
Let me try updating the ports individually:

Edited the contents of a file
Let me check the exact content of the file again:

Ran a bash command
I see that the ports are already updated correctly. Let me check if the NUXT_PUBLIC URLs are also updated:

Ran a bash command
Great! The ports are already correctly configured in the .env.docker file. Now let's run the setup script:

Ran a bash command
Now I'll provide you with the frontend and backend URLs as requested:

The environment is now being set up in the background. Here are the access points for the marketplace platform:

Frontend: http://localhost:54425/
Backend: http://localhost:58450/

The setup-docker.sh script is running in the background (PID: 1247). Please let me know when you're ready for the next steps, and I'll help you with implementing the hybrid page builder.

It's basically unusable for me when it resets my server every 10 minutes. This is the most noticeable thing, it gets confused about way more than that. I don't know if you've tested it extensively but I would be worried about people having a bad experience with OpenHands because of it.

Originally posted by @amirshawn in #7023

@mamoodi added the bug label Mar 10, 2025
@enyst
Collaborator Author

enyst commented Mar 10, 2025

I saw something like this on the remote hosted version of openhands. I think the explanation here might be why or related to why:
#7132 (comment)

Something does seem wrong. It's like, instead of dropping or summarizing the old half, it drops or summarizes the new half.

(or whatever percentage that is, it's not half for the condenser, but the point is which messages or which info remains in context)

@csmith49
Collaborator

Something is definitely strange. I can't identify any issues with the condenser code (this test is not the most robust, but I can't square it with dropping the new half).

I'm currently running a few experiments to improve the condenser for 3.7 -- I'm hoping it'll be enough to tweak the context window size and prompt, but that would also suggest we'll (I'll) have to fine-tune the condenser prompt whenever a new SotA model comes out.

@enyst
Collaborator Author

enyst commented Mar 11, 2025

I just reloaded a conversation I started on the remote version last night. My guess is it must have got to 500 iterations on its own, and stopped, waiting. Then the session was closed.

When I sent a new message, I saw:

Trimming prompt to meet context window limitations
Trimming prompt to meet context window limitations
Trimming prompt to meet context window limitations

So it seems the controller truncation went into effect three times on reload. That's 400-500k+ tokens I think.

  • so the truncation is not working as expected, because it should have saved its state
  • the fallback behavior worked; it can run repeatedly when it needs to.

If the LLM is to be believed, it only had the 12 or so newest iterations, and the original user message, in context. That must have been much, much less than ~100k tokens. 🤔

(The 12 steps were reading files. Max was 590 lines of code, several of those 12 had 100 lines of code...)

Please correct me if wrong, as far as the condenser goes, it's not saving its state, and it should have been reset three times in a row, I guess? And it must have had nothing to do on 12 steps?

@enyst
Collaborator Author

enyst commented Mar 11, 2025

I'm looking into the controller issue. There is at least this strange thing where it doesn't seem to save the state, though it should.

@csmith49
Collaborator

csmith49 commented Mar 11, 2025

Hmm, it's possible one of those Trimming prompt to meet context window limitations messages came from the condenser itself. More precisely, we call out to the condenser during CodeActAgent.step, and if there's a context window exception raised during the summarization event it would get caught by the controller just the same.

Please correct me if wrong, as far as the condenser goes, it's not saving its state, and it should have been reset three times in a row, I guess? And it must have had nothing to do on 12 steps?

The check happens here, so it won't happen until the agent actually calls out to the condenser. The default configuration requires 40 events in the history before enabling summarization; that shouldn't happen in 12 steps.
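
That gating check can be sketched as follows. This is a hypothetical illustration, not the real OpenHands API: the function and parameter names (`should_condense`, `max_size`) are assumptions standing in for whatever the actual configuration key is.

```python
# Hypothetical sketch of the condenser gating check described above.
# "max_size" is an assumed name for the configured event budget (default 40).

def should_condense(history, max_size=40):
    """Condensation only kicks in once the history holds more than
    max_size events, so a short session never triggers it."""
    return len(history) > max_size

short_session = list(range(12))  # 12 events: well below the default threshold
long_session = list(range(41))   # 41 events: condensation would now trigger
```

Under these assumptions, a 12-step session like the one described above would never reach the condenser at all.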

EDIT: I thought there was a unit test for the condenser reset behavior, but I guess there wasn't. Added one in #7186.

@enyst
Collaborator Author

enyst commented Mar 11, 2025

I found another issue in the truncation; this must be wrong. I'll make a PR once I have a better, more detailed test that captures both truncation and session restore.

Yes... yes it's possible:

Hmm, it's possible one of those Trimming prompt to meet context window limitations messages came from the condenser itself.

If the whole history did get sent to the agent when restored from the stream, the condenser must have had a lot to begin with. We should maybe catch this in the condenser, and have it figure out the right events. 🤔 If it falls back to the controller it will truncate everything until the limit, thus losing all benefit of summaries.

@csmith49
Collaborator

I cannot seem to recreate this behavior using our benchmarking infrastructure (subset of SWE-bench Verified, 250 max iterations, Claude 3.7 and the default condenser settings, OpenHands v0.28.1). The benchmarks also dump completion logs, so at every step of the way I can look at messages sent and LLM responses.

First thing to look at are the number of events in the context (here, that means the number of messages sent to the LLM):

[Image: number of events in the context at each step]

That strong sawtooth pattern is what we'd expect. Every spike downwards is a summarizing condensation, then the context slowly grows until there are too many events and we get another spike downwards.

So what happens during that condensation? Well, I can look at the messages before/after and see which messages "survive" and make it through to the next context. Here are the results:

[Image: which messages survive each condensation]

The first event and approximately the last half always make it through. Also expected. Every event in between has a ~20% chance of surviving the condensation -- I believe these come from the action/observation pairing handled by the ConversationManager, or they just happen to have the same content as a surviving message (I'm using value-based hashing equality to see what survives).
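
The survival pattern described here can be sketched in a few lines. This is an illustrative model only, not the actual condenser implementation; `condense` and its keep-first/keep-last-half split are assumptions drawn from the observed pattern.

```python
# Illustrative sketch of the observed survival pattern (not the real
# OpenHands condenser): keep the first event, summarize the middle,
# keep roughly the last half.

def condense(events):
    keep_last = len(events) // 2
    head = events[:1]                          # first event always survives
    tail = events[len(events) - keep_last:]    # last half always survives
    forgotten = events[1:len(events) - keep_last]
    return head + [f"<summary of {len(forgotten)} events>"] + tail

history = [f"event-{i}" for i in range(10)]
condensed = condense(history)
# first event, a summary placeholder, then the newest five events
```

A condenser behaving like this would produce exactly the sawtooth and survival plots above: sudden drops at each summarization, with the head and recent tail intact.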

Notably, I see no signs of the context truncation from the agent controller. Did we fix that in v0.28.1? Is 250 iterations not enough to illustrate this behavior? Or am I just not seeing it because I'm using an All Hands API key with huge rate/token limits? (I don't think that last case matters, both Xingyao and I have seen the mentioned behavior while using the app with that same key).

@mamoodi
Collaborator

mamoodi commented Mar 13, 2025

I was told to post here. Basically, I had a conversation going with condensation enabled. And then this happened.

[Image: screenshot of the conversation]

Then, after I told it to continue 3 times, it went back to the original question I asked when I started the session.
I can't share the feedback because I get a 500 internal error.

@enyst
Collaborator Author

enyst commented Mar 13, 2025

Notably, I see no signs of the context truncation from the agent controller. Did we fix that in v0.28.1? Is 250 iterations not enough to illustrate this behavior? Or am I just not seeing it because I'm using an All Hands API key with huge rate/token limits? (I don't think that last case matters, both Xingyao and I have seen the mentioned behavior while using the app with that same key).

@csmith49 I think you're not seeing it because these experiments above are on the initial session. Isn't that right? When running evals, there is no restore of an older session, which is what happens in normal use.

(SWE-bench and all, only go from the first user message to a FinishAction = only one session)

I think the condenser does its job very well during the initial session, and I have rarely if ever seen the controller truncation go into effect at all.

But it doesn't save its state. So in our conversations, when the session is reloaded for whatever reason (runtime errors, staying overnight, or even simply the user is tabbing out for some time):

  • the controller re-initializes state.history from the event stream
  • if it saved its own state with truncation_id, it uses it to only initialize the remaining part of history after truncation
    • but as you say, it probably didn't happen, because the condenser took care of the events previously
    • (I said above that it seemed bugged to me that it didn't save its own state, but in reality, it must have had nothing to save because the condenser worked!)
  • so the controller loads full history
    • all 900k tokens of it, if that's what it is in the stream
  • it gets ContextWindowExceededError
    • probably from the condenser completion call, as you noted
    • which doesn't handle it, so it falls back to the controller's truncation procedure anyway
  • so it truncates over and over, as the last line of defense, until an LLM call is successful again.
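
The restore-then-truncate interplay enumerated above can be sketched roughly as follows. All names here (`restore_and_fit`, `fits`) are illustrative assumptions, not the real controller API: on reload the full history is rebuilt from the event stream, the condenser's summaries were never persisted, and the only fallback is repeated head truncation until the prompt fits.

```python
# Rough sketch of the failure mode: restore loads everything, then the
# controller's last-line-of-defense truncation runs repeatedly.

def restore_and_fit(event_stream, fits):
    history = list(event_stream)  # full history reloaded; summaries lost
    while len(history) > 2 and not fits(history):
        # each pass would log "Trimming prompt to meet context window limitations"
        half = len(history) // 2
        history = history[:1] + history[half:]  # keep first message, drop oldest half
    return history

stream = [f"event-{i}" for i in range(16)]
# Pretend the context window only fits 5 events: several trims are needed,
# and only the first message plus the newest few events survive.
result = restore_and_fit(stream, lambda h: len(h) <= 5)
```

Which matches the symptom users report: after a reload, the agent appears to remember only the original question and the very latest steps, with every intermediate summary gone.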

Could I know if you have other concerns with PR 6909, the microagents stuff / memory architecture? I would love your help to get that done. If it's fine with you and Xingyao as it is now, I'll just do the renaming and merge it, and then focus on what we need next: the condenser's internal implementation is fine, we just need to integrate it better with the rest of the system. E.g. we talked about this with a focus on FE visibility, but visibility in the stream also means we can reload the condenser properly, and so restore the right context.

@enyst
Collaborator Author

enyst commented Mar 13, 2025

I feel like the explanation was staring us in the face and I still didn't see it for a while. You know it, Calvin; you drew my attention to the interplay between these two. 😅

The condenser is fine, taken in isolation, we just... can't take it in isolation anymore, once in real use. 😂

@csmith49
Collaborator

@csmith49 I think you're not seeing it because these experiments above are on the initial session. Isn't that right? When running evals, there is no restore of an older session, which is what happens in normal use.

Oh no.

I think you're absolutely right -- we restore the session, the history gets rebuilt from the event stream, triggers truncation until all that's left is some initial messages and whatever else will fit in the context buffer so it looks like the agent just restarts the conversation.

The condenser is fine, taken in isolation, we just... can't take it in isolation anymore, once in real use. 😂

I think that's the conclusion I was coming to from staring at my data 😢

I feel like the explanation was staring us in the face and I still didn't see it for a while.

But you still got there. Great sleuthing!
