fix(codex): filter synthetic AGENTS startup messages by voarsh2 · Pull Request #1346 · vectorize-io/hindsight

voarsh2 · 2026-04-29T20:46:11Z

Summary

Filters Codex synthetic AGENTS.md startup messages out of the Codex integration transcript reader.

Codex can persist project startup instructions as a normal response_item with role: "user" and content shaped like:

# AGENTS.md instructions for ...
<INSTRUCTIONS>...</INSTRUCTIONS>

The Codex integration previously treated that message as normal user conversation, so auto-retain and recall transcript composition could include agent rules and setup context rather than only the actual user/assistant session.

Why

This can pollute retained memory with hook rules, memory policy, MCP setup instructions, and other agent-control text that the user did not actually say as part of the working conversation.

In long-running Codex sessions, this can also contribute to oversized auto-retain payloads because the retained transcript includes synthetic setup context before the real conversation starts.

Implementation

Adds a small detector for Codex AGENTS.md startup instruction messages.
Skips those synthetic user messages in the text transcript reader.
Skips the same synthetic messages in the rich transcript reader used when tool calls are retained.
Leaves normal user discussion of AGENTS.md untouched.
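
As an illustration of the detection described above, here is a minimal sketch. It is not the code merged in this PR: the function names, the regular expression, and the simplified message shape (a dict with `role` and `text` keys) are all assumptions made for the example. It treats a message as synthetic only when it is a user message that starts with the AGENTS.md header and also contains an `<INSTRUCTIONS>` block, so ordinary discussion of AGENTS.md passes through.

```python
import re

# Assumed header shape, taken from the example in the PR description.
# The real detector in the integration may match differently.
_AGENTS_HEADER = re.compile(r"^# AGENTS\.md instructions for\b")


def is_synthetic_agents_message(role: str, text: str) -> bool:
    """Return True for a user message that looks like a persisted AGENTS.md startup block."""
    if role != "user":
        return False
    stripped = text.lstrip()
    if not _AGENTS_HEADER.match(stripped):
        return False
    # Require the instruction wrapper too, so a user who merely types the
    # header phrase is not filtered out.
    return "<INSTRUCTIONS>" in stripped and "</INSTRUCTIONS>" in stripped


def filter_messages(messages: list[dict]) -> list[dict]:
    """Drop synthetic startup messages from a hypothetical list of {'role', 'text'} dicts."""
    return [
        m for m in messages
        if not is_synthetic_agents_message(m["role"], m["text"])
    ]
```

Requiring both the header and the wrapper tags keeps the check narrow, which matches the stated goal of leaving normal user discussion of AGENTS.md untouched.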

Validation

pytest hindsight-integrations/codex/tests/test_content.py
pytest hindsight-integrations/codex/tests/test_hooks.py
Also tested against a real Codex rollout transcript in which the AGENTS startup block was persisted as a user message. After this change, the startup block no longer appears in the parsed transcript output or in the retained chunk content.

voarsh2 · 2026-04-29T20:46:47Z

This is also relevant to my issue #1345.

fix(codex): filter synthetic AGENTS startup messages

d29e9ff

nicoloboschi approved these changes Apr 30, 2026

nicoloboschi merged commit b41e5e3 into vectorize-io:main Apr 30, 2026

voarsh2 mentioned this pull request Apr 30, 2026

Codex auto-retain includes synthetic setup context and can create huge full-session retain jobs #1345

Closed

nicoloboschi merged 1 commit into vectorize-io:main from voarsh2:pr/codex-filter-synthetic-startup-context
