Handle ImageContent blocks in user messages for Anthropic (base64 image block), OpenAI completions (image_url content part), and OpenAI responses (input_image with data URI). Text-only messages still use the efficient plain string path. Part of #29
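For reference, the three provider wire shapes named in this commit can be sketched like this. The `ImageContent` struct here is a hypothetical stand-in (the PR's real type is not shown), and `DataURI()` mirrors the helper a later commit describes. The JSON layouts follow the public Anthropic Messages and OpenAI Chat Completions / Responses formats as we understand them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageContent is a hypothetical stand-in for the PR's image block:
// base64-encoded bytes (no "data:" prefix) plus their MIME type.
type ImageContent struct {
	Data     string
	MimeType string
}

// DataURI renders the image as a base64 data URI, the form both OpenAI paths use.
func (img ImageContent) DataURI() string {
	return fmt.Sprintf("data:%s;base64,%s", img.MimeType, img.Data)
}

// anthropicImage builds an Anthropic-style base64 image block.
func anthropicImage(img ImageContent) map[string]any {
	return map[string]any{
		"type": "image",
		"source": map[string]any{
			"type":       "base64",
			"media_type": img.MimeType,
			"data":       img.Data,
		},
	}
}

// openAIChatImage builds a Chat Completions "image_url" content part.
func openAIChatImage(img ImageContent) map[string]any {
	return map[string]any{
		"type":      "image_url",
		"image_url": map[string]any{"url": img.DataURI()},
	}
}

// openAIResponsesImage builds a Responses API "input_image" part.
func openAIResponsesImage(img ImageContent) map[string]any {
	return map[string]any{
		"type":      "input_image",
		"image_url": img.DataURI(),
	}
}

// mustJSON marshals v; json.Marshal sorts map keys, so output is deterministic.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```

Keeping a single `DataURI()` on the image type means both OpenAI converters share one formatting path instead of duplicating `fmt.Sprintf` calls.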
Change Runner.Chat and Pool.Chat to accept `any` message (string or []ContentBlock). Add RPCEvent.Content field for persisting image blocks in session history. Add UserMessageToRPCEvent and MessageText helpers. Update convertHistory to reconstruct multimodal user messages. Part of #29
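A minimal sketch of the `UserMessageToRPCEvent` idea, assuming hypothetical `blockJSON` and `rpcEvent` types (the PR's real structs are not shown here). A plain string stays on the cheap Summary-only path; a block slice gets a text summary plus the full blocks serialized into `Content`. The newline separator for joined text is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// blockJSON is a hypothetical flat wire form for a content block,
// discriminated by Type ("text" or "image").
type blockJSON struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	Data     string `json:"data,omitempty"`
	MimeType string `json:"mimeType,omitempty"`
}

// rpcEvent keeps only the fields discussed in the PR: a text Summary
// and optional raw JSON Content carrying the full multimodal blocks.
type rpcEvent struct {
	Type    string
	Summary string
	Content json.RawMessage
}

// userMessageToRPCEvent (hypothetical name) accepts either a string or a
// []blockJSON, mirroring Runner.Chat taking `any`.
func userMessageToRPCEvent(msg any) rpcEvent {
	evt := rpcEvent{Type: "user_message"}
	switch m := msg.(type) {
	case string:
		// Text-only: no Content, just the summary.
		evt.Summary = m
	case []blockJSON:
		var texts []string
		for _, b := range m {
			if b.Type == "text" {
				texts = append(texts, b.Text)
			}
		}
		evt.Summary = strings.Join(texts, "\n")
		raw, err := json.Marshal(m)
		if err != nil {
			// Log instead of silently dropping the blocks.
			log.Printf("marshal user content: %v", err)
			break
		}
		evt.Content = raw
	default:
		evt.Summary = fmt.Sprint(m)
	}
	return evt
}
```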
Add tele.OnPhoto handler that downloads the photo, base64-encodes it, detects MIME type, and sends it as a multimodal message (ImageContent + optional TextContent caption) through the agent pipeline. Refactor handleText into a shared handleMessage flow. Part of #29
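The encode-and-detect step of the photo handler might look roughly like this, assuming the photo bytes have already been downloaded. `photoMessage` and `block` are hypothetical names. MIME detection uses the standard library's `http.DetectContentType` sniffer; the PR may detect the type differently. The caption goes first, matching the text-first ordering adopted in a later commit.

```go
package main

import (
	"encoding/base64"
	"net/http"
	"strings"
)

// block is a hypothetical content block: Type is "text" or "image".
type block struct {
	Type     string
	Text     string
	Data     string // base64 for images
	MimeType string
}

// photoMessage turns downloaded photo bytes and an optional caption into
// content blocks: text first (if any), then the image.
func photoMessage(photo []byte, caption string) []block {
	mime := http.DetectContentType(photo)
	// Drop any parameters such as "; charset=utf-8" that the sniffer may add.
	if i := strings.IndexByte(mime, ';'); i >= 0 {
		mime = mime[:i]
	}
	img := block{
		Type:     "image",
		Data:     base64.StdEncoding.EncodeToString(photo),
		MimeType: mime,
	}
	caption = strings.TrimSpace(caption)
	if caption == "" {
		return []block{img}
	}
	return []block{{Type: "text", Text: caption}, img}
}
```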
Add ImageEvent to runner.Event for carrying base64 images through the stream. Collect images during streaming and send them as tele.Photo after the text response. Plumbs through streamDraft and streamEditEvents. Part of #29
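A sketch of "collect images during streaming, send after the text", assuming a hypothetical `event` type in which `Image` is non-nil for image events. Text is accumulated for the live draft while images are buffered until the stream ends.

```go
package main

import "strings"

// imageEvent is a hypothetical base64 image carried through the stream.
type imageEvent struct {
	Data     string
	MimeType string
}

// event is a hypothetical stream item: either a text delta or an image.
type event struct {
	Text  string
	Image *imageEvent
}

// drain consumes the stream, concatenating text deltas and buffering
// images so they can be sent as photos after the text response.
func drain(events <-chan event) (string, []imageEvent) {
	var sb strings.Builder
	var images []imageEvent
	for evt := range events {
		if evt.Image != nil {
			images = append(images, *evt.Image)
			continue
		}
		sb.WriteString(evt.Text)
	}
	return sb.String(), images
}
```

Deferring images keeps the draft-editing loop text-only; the photos are sent once the full reply is known.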
Add image support section to telegram.md, update README feature list, and note multimodal message support in architecture.md. Part of #29
📊 Coverage Report
Total coverage: 54.9%
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa8050e737
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
 // Store user message so stateless runners can reconstruct the conversation.
-userEvt := runner.RPCEvent{Type: "user_message", Summary: message}
+userEvt := runner.UserMessageToRPCEvent(message)
Persist multimodal user content to session store
This now creates user events via UserMessageToRPCEvent(message), but the storage pipeline still writes user messages from evt.Summary only (agent/store/store.go rpcEventToEntry) and reads them back as summary text (entryToRPCEvents), so RPCEvent.Content (image blocks) is dropped on disk round-trips. In practice, photo inputs disappear after restart/cold load and auto-compaction token estimates undercount image-heavy sessions because the base64 payload never reaches the persisted log.
- Add MessageContent type alias for type-safe message passing
- Extract shared HasImage/FlattenText helpers to ai/types
- Cap photo download at 20MB via io.LimitReader
- Reorder multimodal content blocks to text-first for better model results
- Log JSON marshal errors in UserMessageToRPCEvent
- Fix MessageText to join all text blocks instead of returning only the first
- Add comprehensive tests for multimodal message handling and round-tripping
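Shared `HasImage`/`FlattenText`-style helpers could be as small as the following sketch. The `Block` type, the kind constants, and the newline separator are assumptions; the point is the `MessageText` fix, where every text block is joined rather than only the first being returned.

```go
package main

import "strings"

// Block is a hypothetical content block with a kind discriminator.
type Block struct {
	Kind string // kindText or kindImage
	Text string
}

const (
	kindText  = "text"
	kindImage = "image"
)

// hasImage reports whether any block is an image.
func hasImage(blocks []Block) bool {
	for _, b := range blocks {
		if b.Kind == kindImage {
			return true
		}
	}
	return false
}

// flattenText joins ALL non-empty text blocks (not just the first),
// skipping images.
func flattenText(blocks []Block) string {
	var parts []string
	for _, b := range blocks {
		if b.Kind == kindText && b.Text != "" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}
```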
The store layer was only writing evt.Summary (text) for user messages, dropping the Content field that carries base64 image blocks. After a restart or cold load, photo inputs disappeared from session history.

- Add piImageContent type for Pi-format image serialization
- Write multimodal content blocks (text + image) when evt.Content is set
- Restore RPCEvent.Content on load when image blocks are detected
- Remove unused extractUserText (replaced by extractUserContent)
- Add round-trip tests for multimodal and text-only user messages
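The round-trip this fix describes can be sketched as follows. The on-disk Pi format is not shown in this PR, so `piBlock`, `entry`, and their JSON field names are assumptions. Writes keep the full blocks when `Content` is set, and reads restore `Content` only when an image block is present, so text-only messages stay on the Summary path.

```go
package main

import (
	"encoding/json"
	"strings"
)

// piBlock is a hypothetical persisted content block.
type piBlock struct {
	Type     string `json:"type"` // "text" or "image"
	Text     string `json:"text,omitempty"`
	Data     string `json:"data,omitempty"`
	MimeType string `json:"mimeType,omitempty"`
}

// entry is a hypothetical persisted user-message line.
type entry struct {
	Role    string    `json:"role"`
	Content []piBlock `json:"content"`
}

// rpcEvent keeps only the fields relevant to this fix.
type rpcEvent struct {
	Type    string
	Summary string
	Content []piBlock
}

// toEntry writes full blocks when present, else a single text block.
func toEntry(evt rpcEvent) entry {
	if len(evt.Content) > 0 {
		return entry{Role: "user", Content: evt.Content}
	}
	return entry{Role: "user", Content: []piBlock{{Type: "text", Text: evt.Summary}}}
}

// fromEntry rebuilds the summary from text blocks and restores Content
// only when an image block is found.
func fromEntry(e entry) rpcEvent {
	evt := rpcEvent{Type: "user_message"}
	var texts []string
	sawImage := false
	for _, b := range e.Content {
		switch b.Type {
		case "text":
			texts = append(texts, b.Text)
		case "image":
			sawImage = true
		}
	}
	evt.Summary = strings.Join(texts, "\n")
	if sawImage {
		evt.Content = e.Content
	}
	return evt
}

// roundTrip simulates a disk write and a cold load through JSON.
func roundTrip(evt rpcEvent) (rpcEvent, error) {
	raw, err := json.Marshal(toEntry(evt))
	if err != nil {
		return rpcEvent{}, err
	}
	var e entry
	if err := json.Unmarshal(raw, &e); err != nil {
		return rpcEvent{}, err
	}
	return fromEntry(e), nil
}
```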
@codex review
💡 Codex Review
Reviewed commit: 6aaf2759a4
b.bot.Handle(tele.OnPhoto, b.guard(func(c tele.Context) error {
	return b.handlePhoto(c)
}))
Honor caption mentions for photo messages
Registering tele.OnPhoto behind b.guard means group photo updates now go through shouldRespondInGroup, but isMentionedOrReplied only checks c.Message().Text (not Caption). In group_mode=mention, a non-reply photo with @botname in the caption will be silently ignored, so the new image-input path does not work in the default group configuration unless users reply to a bot message. Please make mention detection include captions (or bypass this filter for photo captions) so photo handling is consistent with text mention semantics.
Fixed in 0e66a5c. isMentionedOrReplied now checks Caption in addition to Text, and handlePhoto strips the bot mention from captions in groups (consistent with handleText).
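The mention fix reduces to scanning both fields and stripping the handle before the caption reaches the model. This is a plain-string sketch: `isMentioned` and `stripMention` are hypothetical names, not the bot's real functions. A production check might rely on Telegram's message entities instead of substring matching, which would also avoid false matches such as `@mybot_extra`.

```go
package main

import "strings"

// isMentioned reports whether "@botUsername" appears in either the
// message text or the media caption.
func isMentioned(text, caption, botUsername string) bool {
	mention := "@" + botUsername
	return strings.Contains(text, mention) || strings.Contains(caption, mention)
}

// stripMention removes the bot handle so the model only sees the user's words.
func stripMention(s, botUsername string) string {
	return strings.TrimSpace(strings.ReplaceAll(s, "@"+botUsername, ""))
}
```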
…dge cases

- Export ContentBlockJSON and add BlockKindText/BlockKindImage constants
- Merge double iteration in extractUserContent into single pass
- Add DataURI() method on ImageContent, deduplicate across providers
- Inline trivial flattenToolResult wrappers in both OpenAI providers
- Use FlattenText in ToolResultToRPCEvent for consistency
- Fix truncate() to respect UTF-8 rune boundaries
- Fix io.LimitReader silent truncation for oversized photos
Summary
- `ImageContent` support to all 3 AI providers (Anthropic, OpenAI completions, OpenAI responses)
- `Runner.Chat` and `Pool.Chat` to accept multimodal messages (`string` or `[]ContentBlock`)
- `tele.OnPhoto` handler in Telegram: downloads the photo, base64-encodes it, and sends it as a multimodal message
- `ImageEvent` plumbing through the stream for sending images back to Telegram
- `RPCEvent.Content` field

Test plan

- `go test -race ./...`

Closes #29