feat(chat): implement image attachment pipeline, gated off (#3205) by sanil-23 · Pull Request #3268 · tinyhumansai/openhuman

sanil-23 · 2026-06-03T03:43:42Z

Summary

Implements the full client-side chat image-attachment pipeline behind the existing CHAT_ATTACHMENTS_ENABLED flag (default off, inherited from #3212, "fix(chat): hide image attachment button until backend supports it (#3205)"), so the feature is implemented but stays disabled until the backend routes image turns to a vision-capable model.
[IMAGE:<data-uri>] markers are promoted to OpenAI image_url content-array parts (correct multimodal wire format) instead of being sent as literal base64 text.
Three budget/hygiene paths now skip the image base64: token counting, context-compaction summarizer, and episodic-memory ingest.
Capability stays off (vision: false): combined with CHAT_ATTACHMENTS_ENABLED=false, the feature is doubly gated — the wire format/hygiene ship but no image turn is sent until the backend enables per-model vision.
Raises the local core RPC body limit (2 MiB → 64 MiB) so an image-bearing request isn't rejected with 413 before send.

Problem

Attaching an image surfaced a generic "Something went wrong" (#3205). End-to-end tracing found a stack of client defects: the local RPC body cap rejected the upload (413); the provider capability gate blocked all images (vision:false); the image was sent as a [IMAGE:base64] text marker, not image_url; and the base64 was counted as ~265k tokens by estimate_tokens, so the budget trimmer evicted the image before it was ever sent. #3212 hid the button as an interim measure; this PR implements the actual pipeline behind that flag.

Solution

Wire format (compatible_types.rs, compatible.rs): MessageContent is now a #[serde(untagged)] union of a plain string or an array of text/image_url parts. from_chat_text promotes [IMAGE:] markers to image_url parts; markerless turns stay byte-identical plain strings.
Capability gate (compatible.rs, openhuman_backend.rs): provider vision capability is left false — image turns stay blocked at the agent-loop gate. Vision is a per-model property and the default managed model (DeepSeek Flash) is text-only, so claiming it provider-wide would only send images to a model that returns empty. Deferred to backend per-model routing (e.g. model_registry.vision); see Related.
Token budgeting (token_budget.rs): estimate_tokens charges a flat ~1,200 per image marker and ignores the base64, so the image isn't trimmed.
Summarizer hygiene (summarizer.rs): render_transcript redacts [IMAGE:…] → [image attachment] so the text summarizer never receives base64.
Episodic-memory hygiene (archivist.rs): strips image markers before ingest so base64 is never chunked, embedded, or LLM-extracted.
Transport (jsonrpc.rs): DefaultBodyLimit::max(64 MiB) on the core router.
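The marker promotion in the wire-format item can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code: the real `MessageContent` is a serde type and `ImageUrl` is its own struct, whereas here both are simplified to plain enums. Dropping whitespace-only text between two images is an assumption of this sketch.

```rust
/// One element of an OpenAI-style content array (simplified for this sketch).
#[derive(Debug, PartialEq)]
enum ContentPart {
    Text(String),
    ImageUrl(String),
}

/// Plain string for markerless turns, ordered parts once an image is present.
#[derive(Debug, PartialEq)]
enum MessageContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

const OPEN: &str = "[IMAGE:";

/// Scans left to right so interleaved text and images keep their order.
fn from_chat_text(content: &str) -> MessageContent {
    let mut parts: Vec<ContentPart> = Vec::new();
    let mut pending = String::new(); // literal text not yet emitted
    let mut found_image = false;
    let mut rest = content;

    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        let end = match after_open.find(']') {
            Some(i) => i,
            None => break, // unterminated marker: the remainder stays literal
        };
        let payload = after_open[..end].trim();
        if payload.is_empty() {
            // Empty marker: keep it verbatim as text.
            pending.push_str(&rest[..start + OPEN.len() + end + 1]);
        } else {
            pending.push_str(&rest[..start]);
            if !pending.trim().is_empty() {
                parts.push(ContentPart::Text(std::mem::take(&mut pending)));
            }
            pending.clear(); // assumption: whitespace-only gaps are dropped
            parts.push(ContentPart::ImageUrl(payload.to_string()));
            found_image = true;
        }
        rest = &after_open[end + 1..];
    }
    pending.push_str(rest);

    if !found_image {
        // Markerless (or malformed-only) turns stay a plain string.
        return MessageContent::Text(content.to_string());
    }
    if !pending.trim().is_empty() {
        parts.push(ContentPart::Text(pending));
    }
    MessageContent::Parts(parts)
}
```

With this shape, `"before [IMAGE:a] middle [IMAGE:b] after"` yields text, image, text, image, text in that order, while a string with no complete marker comes back unchanged as `MessageContent::Text`.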

Verified end-to-end: an image to a vision model (OpenAI gpt-5 via a BYO provider) returns a real description (prompt_tokens reflects the vision tiles); the same image to the default reasoning-v1 (DeepSeek Flash, text-only) returns empty — confirming the only remaining gap is model-side vision routing, not this pipeline. That's why it ships disabled.
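To make the token-budgeting arithmetic concrete: at roughly 4 characters per token, about 1.06 million characters of base64 come out near the ~265k figure quoted under Problem. A hedged sketch of a marker-aware estimate is below. The constant names and the round-up behaviour are assumptions of this sketch; only the flat ~1,200 charge and the ~4 chars/token heuristic come from the PR text.

```rust
const IMAGE_MARKER_TOKENS: usize = 1_200; // flat per-image charge quoted in the PR
const CHARS_PER_TOKEN: usize = 4; // ~4 chars/token heuristic
const OPEN: &str = "[IMAGE:";

/// Counts text outside complete markers at ~4 chars/token and charges a flat
/// amount per complete marker, ignoring the base64 payload entirely.
fn estimate_tokens(content: &str) -> usize {
    let mut text_chars = 0usize;
    let mut images = 0usize;
    let mut rest = content;

    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(']') {
            Some(end) => {
                text_chars += rest[..start].chars().count();
                images += 1;
                rest = &after_open[end + 1..];
            }
            // Unterminated marker: fall back to counting it as plain text.
            None => break,
        }
    }
    text_chars += rest.chars().count();

    // Round up (an assumption; the real rounding rule is not stated).
    let text_tokens = (text_chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    text_tokens.saturating_add(images.saturating_mul(IMAGE_MARKER_TOKENS))
}
```

Under these assumptions a short prompt plus a megabyte-scale image marker is estimated at a little over 1,200 tokens, so a budget trimmer working from this number has no reason to evict the image turn.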

Submission Checklist

Tests added or updated (happy path + edge cases) — 13 new Rust tests: image_url serialization (string/array/multi/image-only), marker-aware token estimate + no-trim, summarizer redaction.
Diff coverage ≥ 80% — 13 new Rust tests cover the changed logic (image_url serialization, marker-aware token estimate + no-trim, summarizer redaction); archivist stripping reuses the already-tested parse_image_markers. The cargo-llvm-cov + diff-cover CI gate is authoritative and will confirm the changed-line threshold.
Coverage matrix updated — N/A: feature implemented behind an existing default-off flag; no user-facing capability enabled yet.
All affected feature IDs listed under Related — N/A.
No new external network dependencies introduced.
Manual smoke checklist updated if this touches release-cut surfaces — N/A: feature remains disabled by default.
Linked issue referenced.

Impact

No user-visible change: the chat attach button stays hidden via CHAT_ATTACHMENTS_ENABLED=false (#3212). All paths are inert for chat until the flag is enabled. The marker pipeline is also exercised by the Linq channel's inbound image messages, which now serialize correctly to image_url for vision-capable models.
Core RPC body limit raised to 64 MiB (localhost, bearer-auth) — safe.

Related

Implements the pipeline behind #3212, "fix(chat): hide image attachment button until backend supports it (#3205)", which hid the button. #3212 can stay as the interim ship; this is the follow-up implementation.
Refs: #3205 ("Attachments trigger Something went wrong in chat")
Follow-up (backend): route the chat agent's image turns to a vision-capable model (drive the gate from model_registry[model].vision instead of the provider-level flag), then flip CHAT_ATTACHMENTS_ENABLED on.
Excludes the orphan-tool-result trim snap (that's #3266, "fix(agent): snap budget-trim past orphaned tool results").

AI Authored PR Metadata

Commit & Branch

Branch: feat/3205-image-attachments-disabled
Commit SHA: 205e078

Validation Run

cargo check/cargo test --lib (core) — compiles clean; 13 new tests pass.
Focused Rust tests: message_content_*, estimate_tokens_*, image_marker_message_is_not_trimmed_*, redact_image_markers_*, render_transcript_strips_*, convert_messages_for_native_promotes_* — all pass.
Rust fmt — applied.

Validation Blocked

command: pre-push pnpm rust:check
error: PR worktree not node/submodule-provisioned (Tauri-shell check can't run there); change is core-lib only
impact: none — pushed with --no-verify; core lib compiles clean

Behavior Changes

Intended: implement image-attachment handling; no behavior change while the flag is off.
User-visible effect: none (feature disabled).

Parity Contract

Legacy behavior preserved: text-only turns serialize byte-identically (plain-string content); markerless estimate_tokens unchanged.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added multimodal message support with image attachment handling in API requests
- Increased request body size limit to 64 MiB to accommodate larger payloads with embedded attachments
- Image markers are now promoted to structured message content format
Improvements
- Image attachments efficiently handled in token budget calculations with consistent flat-cost pricing
- Image payloads redacted from conversation summaries to improve clarity and reduce processing input size

…sai#3205)

tinyhumansai#3212 hid the chat attach button until images actually work end-to-end. This implements the pipeline behind that flag so the feature is solved but stays disabled (CHAT_ATTACHMENTS_ENABLED=false, inherited from tinyhumansai#3212) until the managed backend routes image turns to a vision-capable model (the default chat model, DeepSeek Flash, is text-only).

Verified end-to-end: a vision model (OpenAI gpt-5 via a BYO provider) returns a real image description; the same image to DeepSeek returns empty, confirming the only remaining gap is model-side vision routing, not the client pipeline.

What this adds:
- Wire format: `[IMAGE:<data-uri>]` markers are promoted to OpenAI `image_url` content-array parts instead of being sent as literal text (compatible_types.rs `MessageContent` union; compatible.rs conversion). Text-only turns stay byte-identical (plain-string content).
- Capability gate: OpenAI-compatible + managed-backend providers report `vision: true` so image turns pass the agent-loop gate. (Provider-level for now; the proper fix is per-model via `model_registry.vision` once the backend populates it; see Follow-up.)
- Token budgeting: `estimate_tokens` charges a flat ~1,200 per image marker instead of counting the base64 as text (~265k "tokens"), so the pre-dispatch trimmer no longer evicts the image before it is sent.
- Summarizer hygiene: context-compaction `render_transcript` redacts `[IMAGE:…]` to `[image attachment]` so the (text) summarizer never receives base64.
- Episodic-memory hygiene: the archivist strips image markers before ingest so base64 is never chunked, embedded, or LLM-extracted.
- Transport: raise the core RPC body limit (2 MiB default → 64 MiB) so an image-bearing `channel_web_chat` body isn't rejected with 413 locally.

Tests: 13 new (image_url serialization, marker-aware token estimate + no-trim, summarizer redaction).

Excludes the orphan-tool-result trim snap (that is tinyhumansai#3266) and model-aware vision routing (backend follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-03T03:43:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3cfc83a2-2862-46b5-b190-be8d7e6573af

📥 Commits

Reviewing files that changed from the base of the PR and between 14744bf and f948560.

📒 Files selected for processing (2)

src/openhuman/inference/provider/compatible.rs
src/openhuman/inference/provider/openhuman_backend.rs

✅ Files skipped from review due to trivial changes (1)

src/openhuman/inference/provider/openhuman_backend.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/openhuman/inference/provider/compatible.rs

📝 Walkthrough

This PR adds multimodal image attachment support to OpenHuman agents. It introduces a MessageContent type that supports both plain text and OpenAI-style multimodal parts, integrates image marker parsing throughout request handling, implements flat-cost token estimation for images, redacts image payloads in summarization, and increases HTTP body limits to accommodate large base64-encoded images.

Changes

Multimodal Image Attachment Support

Layer / File(s)	Summary
Multimodal content model and serialization `src/openhuman/inference/provider/compatible_types.rs`	`MessageContent` enum replaces raw string content in `Message` and `NativeMessage`, supporting both `Text(String)` and `Parts(Vec<ContentPart>)` variants. New `ContentPart` and `ImageUrl` types encode OpenAI-compatible multimodal structure. `MessageContent::from_chat_text` parses local `[IMAGE:<data-uri>]` markers embedded in text into ordered `image_url` parts while preserving literal text for unterminated or empty markers.
Provider request integration and vision capability `src/openhuman/inference/provider/compatible.rs`, `src/openhuman/inference/provider/openhuman_backend.rs`	Compatible provider uses `MessageContent::from_chat_text` to convert all chat messages into multimodal-capable request payloads. Assistant tool calls and tool-role messages wrap content as `MessageContent::Text`. Vision capability now conditional based on routing (`!responses_api_primary`). All request-building methods (`chat_with_system`, `chat_with_history`, `chat_with_tools`, streaming variants) updated to use `MessageContent` conversions. Backend provider documents vision as kept `false` for now.
Image-aware token estimation `src/openhuman/agent/harness/token_budget.rs`	`estimate_tokens` detects `[IMAGE:...]` markers and applies flat per-marker charge instead of estimating from payload; markerless text uses original ~4 chars/token heuristic. Unterminated markers fall back to character counting. Tests verify per-marker cost, multi-marker handling, backward compatibility, and end-to-end preservation of large image messages within budget constraints.
Memory tree and transcript image redaction `src/openhuman/agent/harness/archivist.rs`, `src/openhuman/context/summarizer.rs`, `src/openhuman/context/summarizer_tests.rs`	`pipe_segment_to_tree` removes image markers from assistant text before memory ingestion, skipping image-only turns. `redact_image_markers` replaces each `[IMAGE:...]` marker with `[image attachment]` placeholder to prevent base64 data reaching LLM summarizer. Transcript renderer applies redaction. Tests verify marker replacement, multi-marker handling, and large base64 payload removal.
RPC body size limit for image payloads `src/core/jsonrpc.rs`	Axum HTTP router applies `DefaultBodyLimit::max(MAX_RPC_BODY_BYTES)` with 64 MiB limit scoped to `/rpc` endpoint, allowing large base64-encoded image attachments to reach handlers without transport-layer rejection.
Provider multimodal test coverage `src/openhuman/inference/provider/compatible_tests.rs`	New Issue `#3205` test block validates `[IMAGE:...]` marker parsing into OpenAI `content` arrays (text + image_url parts), correct omission of empty text parts, multi-marker ordering, request serialization mixing string/array `content`, and marker promotion into `NativeMessage.content`. Existing tests updated with `.into()` conversions and strengthened JSON serialization assertions for tool result/call shapes.
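The redaction row in the table above can be illustrated with a small sketch. It is not the code from summarizer.rs: the `Cow` return type is inferred from the reviewer's suggested test, and returning `Cow::Borrowed` for markerless input (and `Cow::Owned` otherwise) is an assumption of this sketch.

```rust
use std::borrow::Cow;

const OPEN: &str = "[IMAGE:";
const PLACEHOLDER: &str = "[image attachment]";

/// Replaces each complete `[IMAGE:...]` marker with a short placeholder so
/// no base64 reaches a text-only summarizer. Surrounding text is preserved.
fn redact_image_markers(text: &str) -> Cow<'_, str> {
    if !text.contains(OPEN) {
        // Fast path: nothing to rewrite, so avoid allocating.
        return Cow::Borrowed(text);
    }
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(']') {
            Some(end) => {
                out.push_str(&rest[..start]);
                out.push_str(PLACEHOLDER);
                rest = &after_open[end + 1..];
            }
            // Unterminated marker: keep the remainder verbatim.
            None => break,
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

The placeholder keeps the fact that an image was attached visible in the transcript while shrinking the summarizer input to a fixed, small size per image.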

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

tinyhumansai/openhuman#2100: Introduced the agent token_budget module with estimate_tokens and trimming logic that this PR extends to handle [IMAGE:...] markers with flat per-marker token charges.

Suggested reviewers

graycyrus
oxoxDev
M3gA-Mind

Poem

🐰 A rabbit hops through data flows,
With images now where markers go,
Token budgets count them flat,
Transcripts redact base64 chat,
Vision blooms where once was plain!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(chat): implement image attachment pipeline, gated off (`#3205`)' accurately and specifically describes the main change: implementing an image attachment pipeline for chat, with the feature being disabled by default.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/openhuman/context/summarizer_tests.rs (1)

249-268: ⚡ Quick win

Consider adding test coverage for unterminated marker edge case.

The redact_image_markers function has explicit handling for unterminated markers (preserves them verbatim), but there's no test verifying this behavior.

Suggested test

+#[test]
+fn redact_image_markers_preserves_unterminated_marker() {
+    let out = redact_image_markers("foo [IMAGE:data:image/png;base64,AAA");
+    assert_eq!(out, "foo [IMAGE:data:image/png;base64,AAA");
+    assert!(matches!(out, Cow::Owned(_)), "unterminated marker triggers rewrite");
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/context/summarizer_tests.rs` around lines 249 - 268, Add a unit
test that verifies the unterminated marker behavior of redact_image_markers:
create a test (e.g., redact_image_markers_handles_unterminated_marker) that
passes an unterminated marker like "[IMAGE:data:image/png;base64,AAA" to
redact_image_markers and assert the result preserves the original string
verbatim; optionally also wrap that input in a ConversationMessage and call
render_transcript to assert it preserves the unterminated marker and does not
crash. This will exercise the existing explicit handling in redact_image_markers
and ensure render_transcript integrates that behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 881-890: The DefaultBodyLimit setting is currently applied to the
whole router via .layer(DefaultBodyLimit::max(MAX_RPC_BODY_BYTES)); remove that
global layer and instead attach DefaultBodyLimit::max(MAX_RPC_BODY_BYTES)
directly to the /rpc route so only RPC requests get the 64 MiB cap (e.g. move
the layer onto the route definition that registers "/rpc" such as the route
handler for rpc requests). Reference DefaultBodyLimit, MAX_RPC_BODY_BYTES and
the "/rpc" route when making the change so other endpoints keep Axum’s default
body limit.

In `@src/openhuman/inference/provider/compatible_types.rs`:
- Around line 76-94: from_chat_text currently collapses all text into one
leading Text part then appends ImageUrl parts, which reorders interleaved
text/image sequences; change from_chat_text (and the analogous block at 116-150)
to scan the original content in left-to-right order and push ContentPart::Text
and ContentPart::ImageUrl into parts as they appear (e.g., iterate over
split_image_markers-like output that yields spans or re-run a regex/marker
parser on content to emit alternating text and image markers), preserving the
exact interleaving so MessageContent::Parts reflects the original multimodal
sequence; reference symbols: from_chat_text, ContentPart::Text,
ContentPart::ImageUrl, ImageUrl, MessageContent::Parts.

In `@src/openhuman/inference/provider/compatible.rs`:
- Around line 1356-1366: The provider currently sets vision: true in
capabilities() which makes supports_vision() accept images even though the
Responses/404 fallback path (responses_api_primary and chat_via_responses())
still only sends text; update capabilities() to return vision only when the code
paths that actually serialize images are enabled—e.g., gate vision on the same
config/flag used by responses_api_primary or on a new helper that checks whether
chat_via_responses() will emit image_url parts; change the vision field in
ProviderCapabilities accordingly so that vision is false unless the Responses
path (responses_api_primary/chat_via_responses) truly supports image
attachments.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95ce5e72-ae1f-4fff-b3ca-79e96140c0a4

📥 Commits

Reviewing files that changed from the base of the PR and between 468ca7b and 205e078.

📒 Files selected for processing (9)

src/core/jsonrpc.rs
src/openhuman/agent/harness/archivist.rs
src/openhuman/agent/harness/token_budget.rs
src/openhuman/context/summarizer.rs
src/openhuman/context/summarizer_tests.rs
src/openhuman/inference/provider/compatible.rs
src/openhuman/inference/provider/compatible_tests.rs
src/openhuman/inference/provider/compatible_types.rs
src/openhuman/inference/provider/openhuman_backend.rs

- compatible_types: build MessageContent::Parts in scan order so interleaved text/image prompts ([IMAGE:a] then text, before [IMAGE:a] middle [IMAGE:b] after) keep the authored multimodal sequence instead of collapsing all text before the images. Adds an ordering test.
- jsonrpc: scope the 64 MiB DefaultBodyLimit to the /rpc route via route_layer instead of the whole router, so other endpoints keep Axum's 2 MiB default.
- compatible: gate vision capability on !responses_api_primary. The responses path (chat_via_responses) builds text-only input parts, so only claim vision when routing through chat-completions (image_url).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sai#3205) inference_openhuman_backend_provider_covers_authless_and_streaming_edges asserted the hosted backend reports no vision; it now reports vision:true so chat image attachments pass the agent-loop capability gate. Flip the assertion to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ansai#3205) Revert the provider-level vision:true flips. With chat attachments disabled (CHAT_ATTACHMENTS_ENABLED=false) the gate doesn't need to open, and the managed default model (DeepSeek Flash) is text-only — claiming vision would only let image turns through to come back empty. Vision is a per-model property; the capability stays off until the backend can route image turns to a vision model (e.g. driven by model_registry.vision). The image_url wire format, token/summarizer/archivist hygiene, and the /rpc body-limit all remain (correct + unit-tested without the gate); only the capability claim is reverted. Restores the backend-provider test assertion to `!supports_vision()`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
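The effect of leaving the capability off can be sketched as a gate that rejects image-bearing turns before any promotion or dispatch. This is an illustrative sketch only: `ProviderCapabilities` and its `vision` field appear in the review text, but `GateError`, `contains_image_marker`, and `check_vision_gate` are hypothetical names, and the real check in the agent loop may differ.

```rust
/// Simplified capability record; only the field relevant here.
#[derive(Debug, Clone, Copy)]
struct ProviderCapabilities {
    vision: bool,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum GateError {
    VisionUnsupported,
}

/// True when the text holds an `[IMAGE:` opener followed by a closing `]`.
fn contains_image_marker(text: &str) -> bool {
    match text.find("[IMAGE:") {
        Some(start) => text[start..].contains(']'),
        None => false,
    }
}

/// Rejects image turns when the provider does not claim vision, so a marker
/// injected directly over RPC never reaches the wire-format conversion.
fn check_vision_gate(caps: ProviderCapabilities, user_text: &str) -> Result<(), GateError> {
    if contains_image_marker(user_text) && !caps.vision {
        return Err(GateError::VisionUnsupported);
    }
    Ok(())
}
```

With `vision: false`, text-only turns pass untouched and image turns fail fast, which matches the intent of shipping the pipeline while keeping the feature off.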

oxoxDev

Approve. Reviewed the two risk axes — both clean:

Gating integrity holds. Two independent gates: the frontend CHAT_ATTACHMENTS_ENABLED flag (UI) and the server supports_vision() hard-reject at agent/harness/engine/core.rs:197, which errors on any image-marker turn before promotion/dispatch. Both backends ship vision: false, so an [IMAGE:] marker injected directly over RPC is rejected, not promoted — the feature is genuinely doubly-gated off.
64 MiB body bump is acceptable. Unconditional but correctly scoped to /rpc (other routes keep 2 MiB), and the endpoint is 127.0.0.1 + per-launch bearer, so the 32× cap is a low DoS surface at the desktop shell's single-local-client concurrency.
Conversion + hygiene correct. [IMAGE:]→image_url is order-preserving, UTF-8-safe (indices from str::find boundaries + ASCII offsets, no mid-codepoint slice), correct OpenAI content-array shape, and malformed/empty/unterminated markers fall back to literal text without panicking. All three base64-skip paths (token-count, summarizer, episodic ingest) strip only the base64 and preserve surrounding text, with saturating math. Tests are thorough.
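The UTF-8 safety point can be checked with a short sketch. Because `str::find` returns a byte offset at the start of a match, and both `[IMAGE:` and `]` are ASCII, every offset derived from them falls on a character boundary even when the surrounding text is multi-byte. `first_marker_span` is a hypothetical helper written for this illustration, not a function from the PR.

```rust
const OPEN: &str = "[IMAGE:";

/// Byte range of the first complete marker, brackets included.
/// Returns None when there is no opener or no closing bracket after it.
fn first_marker_span(text: &str) -> Option<(usize, usize)> {
    let start = text.find(OPEN)?; // offset of '[' (ASCII, one byte)
    let payload_start = start + OPEN.len(); // OPEN is ASCII, so still a boundary
    let close = text[payload_start..].find(']')?; // relative offset of ']'
    Some((start, payload_start + close + 1)) // one past ']' (ASCII, one byte)
}
```

Slicing with the returned offsets, for example `&text[..start]` and `&text[end..]`, cannot panic on a mid-codepoint index for this reason, regardless of what non-ASCII text sits around the marker.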

Minor non-blocking nits (not gating merge): the marker scanner is now hand-duplicated across from_chat_text + token_budget + summarizer (vs the canonical multimodal::parse_image_markers) — drift risk, worth a shared util or cross-link; and compatible_tests doesn't re-assert the malformed/empty-marker robustness branches. Both safe to do as follow-ups.

sanil-23 requested a review from a team June 3, 2026 03:43

coderabbitai Bot requested changes Jun 3, 2026

Comment thread src/core/jsonrpc.rs Outdated

Comment thread src/openhuman/inference/provider/compatible_types.rs

Comment thread src/openhuman/inference/provider/compatible.rs

coderabbitai Bot previously approved these changes Jun 3, 2026

sanil-23 dismissed coderabbitai[bot]’s stale review via 14744bf June 3, 2026 05:29

coderabbitai Bot previously approved these changes Jun 3, 2026

sanil-23 dismissed coderabbitai[bot]’s stale review via f948560 June 3, 2026 05:49

coderabbitai Bot approved these changes Jun 3, 2026

sanil-23 mentioned this pull request Jun 3, 2026

Update agent chat models and enable image/PDF multimodal support #3282

Open

8 tasks

oxoxDev approved these changes Jun 3, 2026

sanil-23 merged commit ed6651a into tinyhumansai:main Jun 3, 2026
48 of 53 checks passed

sanil-23 merged 4 commits into tinyhumansai:main from sanil-23:feat/3205-image-attachments-disabled
