Telegram channel: image input support #29

@vaayne

Summary

Anna's Telegram channel currently only handles text messages. Users should be able to send images to Anna via Telegram and have the image content passed to the underlying model for vision-based understanding.

Motivation

Modern AI models like Claude 3 support multimodal input (text + images). Anna's Telegram channel does not yet forward image attachments to the model, so users who send photos, screenshots, or diagrams get no response or an error.

Supporting image input would unlock a wide range of practical use cases:

  • Ask Anna to describe or analyze a screenshot
  • Send a photo of a document and ask questions about it
  • Share a diagram and get feedback

Goals

  • Accept image messages (photo attachments) sent via Telegram
  • Download the image from Telegram's file API
  • Pass the image to the underlying model as a multimodal message (base64 or URL, depending on provider support)
  • Return a meaningful response based on image content

Non-goals

  • Video or audio input support
  • Image generation
  • Storing images persistently

Proposed approach

Telegram side

  • Handle tele.OnPhoto message events in the Telegram channel handler
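  • Download the highest-resolution version of the photo using the Telegram Bot API (Telegram delivers each photo in several sizes; resolve the largest one with getFile, then fetch the returned file path)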
  • Encode the image as base64 or pass as a URL depending on the AI provider's multimodal API requirements
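
The Telegram-side steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the channel's actual code: `PhotoSize`, `largestPhoto`, `fileDownloadURL`, and `encodeImage` are hypothetical names, and the real handler would get the photo sizes and file path from the bot library rather than from literals. The URL shape follows the Bot API's documented `https://api.telegram.org/file/bot<token>/<file_path>` pattern.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// PhotoSize mirrors the fields of Telegram's PhotoSize object used here
// (hypothetical local type; a bot library would supply its own).
type PhotoSize struct {
	FileID string
	Width  int
	Height int
}

// largestPhoto returns the variant with the greatest pixel area.
// ok is false when the message carried no photo sizes.
func largestPhoto(sizes []PhotoSize) (best PhotoSize, ok bool) {
	for _, s := range sizes {
		if !ok || s.Width*s.Height > best.Width*best.Height {
			best, ok = s, true
		}
	}
	return best, ok
}

// fileDownloadURL builds the download URL for a file_path returned by getFile.
func fileDownloadURL(token, filePath string) string {
	return "https://api.telegram.org/file/bot" + token + "/" + filePath
}

// encodeImage sniffs the media type from the leading bytes and
// base64-encodes the raw image for providers that expect inline data.
func encodeImage(data []byte) (mediaType, b64 string) {
	return http.DetectContentType(data), base64.StdEncoding.EncodeToString(data)
}

func main() {
	sizes := []PhotoSize{
		{FileID: "small", Width: 90, Height: 67},
		{FileID: "large", Width: 1280, Height: 960},
		{FileID: "medium", Width: 320, Height: 240},
	}
	if best, ok := largestPhoto(sizes); ok {
		fmt.Println("file_id to resolve via getFile:", best.FileID)
	}
	// "<token>" and the file path are placeholders, not real values.
	fmt.Println(fileDownloadURL("<token>", "photos/file_1.jpg"))

	mediaType, b64 := encodeImage([]byte{0xFF, 0xD8, 0xFF, 0xE0})
	fmt.Println(mediaType, b64)
}
```

Selecting by pixel area rather than by slice position avoids relying on the order in which sizes arrive. Because the download URL embeds the bot token, it is probably safer to download the bytes server-side and send base64 than to hand that URL to a model provider.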

Model side

  • Extend the message content type to support image parts alongside text parts
  • Pass image content to the model in the format the provider expects (e.g. Anthropic's image content block)
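
One possible shape for the extended message content is sketched below. `ContentPart`, `ImageData`, and `anthropicBlocks` are hypothetical names chosen for illustration, since Anna's real message types are not shown in this issue. The JSON layout follows Anthropic's documented base64 image content block (`type: "image"` with a `source` object); other providers would need their own converter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageData holds an inline image (hypothetical type).
type ImageData struct {
	MediaType string // e.g. "image/jpeg"
	Base64    string // base64-encoded image bytes
}

// ContentPart is a hypothetical union: a part is an image when Image is
// non-nil, otherwise it is plain text.
type ContentPart struct {
	Text  string
	Image *ImageData
}

// anthropicBlocks converts provider-neutral parts into the content blocks
// expected by Anthropic's Messages API.
func anthropicBlocks(parts []ContentPart) []map[string]any {
	blocks := make([]map[string]any, 0, len(parts))
	for _, p := range parts {
		if p.Image != nil {
			blocks = append(blocks, map[string]any{
				"type": "image",
				"source": map[string]any{
					"type":       "base64",
					"media_type": p.Image.MediaType,
					"data":       p.Image.Base64,
				},
			})
			continue
		}
		blocks = append(blocks, map[string]any{"type": "text", "text": p.Text})
	}
	return blocks
}

func main() {
	// A photo with a caption becomes one image part plus one text part.
	parts := []ContentPart{
		{Image: &ImageData{MediaType: "image/jpeg", Base64: "/9j/"}},
		{Text: "What does this screenshot show?"},
	}
	out, err := json.MarshalIndent(anthropicBlocks(parts), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping `ContentPart` provider-neutral and converting at the edge means the Telegram handler does not need to know which provider is configured.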

Fallback

  • If the current model does not support vision, return a clear message to the user explaining that image input is not supported with the current model
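
The fallback could be a simple capability gate in front of the model call. The sketch below assumes a hypothetical `Model` type with a `SupportsVision` flag; how Anna actually records model capabilities is not specified here, and the message wording is only a placeholder.

```go
package main

import "fmt"

// Model is a hypothetical descriptor of the currently configured model.
type Model struct {
	Name           string
	SupportsVision bool
}

// checkImageSupport returns a user-facing fallback message and false when
// the message contains an image but the model cannot process images.
func checkImageSupport(m Model, hasImage bool) (fallback string, ok bool) {
	if hasImage && !m.SupportsVision {
		return fmt.Sprintf(
			"Image input is not supported by the current model (%s). "+
				"Please switch to a vision-capable model or send text.", m.Name), false
	}
	return "", true
}

func main() {
	textOnly := Model{Name: "text-only-model", SupportsVision: false}
	if msg, ok := checkImageSupport(textOnly, true); !ok {
		fmt.Println(msg) // reply to the user instead of calling the model
	}
}
```

Checking before the download step would also avoid fetching image bytes that can never be used.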

Acceptance criteria

  • User sends a photo to Anna via Telegram
  • Anna receives and processes the image
  • Anna responds with a relevant description or answer based on image content
  • If the model does not support vision, Anna responds with a clear fallback message
