0.3.0

@bayger released this 10 Apr 15:34
Commit 1ffd75a

Breaking Changes

  • conversationId removed from startConversation WebSocket message — the field was previously accepted (and ignored) in the session start request; it has been removed from the schema entirely.
  • Action effect events deprecated — individual action effect events are replaced by a single executionPlan event emitted at the start of action execution. The old per-effect events are still emitted for now but are considered deprecated and will be removed in a future release.
  • linear16 audio format removed — the audio format option linear16 has been removed from the TTS provider schemas and channel configuration. Use pcm_16000 instead.
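For clients upgrading to 0.3.0, the first and third breaking changes amount to two small payload rewrites. The sketch below assumes that messages and channel configuration are plain JSON objects. The helper names and the audioFormat key are hypothetical, because the release notes do not name the configuration field that holds the format.

```typescript
// Minimal client-side migration sketch for 0.3.0.
// Assumption: payloads are plain JSON objects; "audioFormat" is a hypothetical key name.
type JsonObject = Record<string, unknown>;

// conversationId is no longer part of the startConversation schema, so drop it.
function stripConversationId(startMessage: JsonObject): JsonObject {
  const copy: JsonObject = { ...startMessage };
  delete copy["conversationId"];
  return copy;
}

// linear16 was removed; pcm_16000 is the documented replacement.
function migrateAudioFormat(config: JsonObject): JsonObject {
  if (config["audioFormat"] === "linear16") {
    return { ...config, audioFormat: "pcm_16000" };
  }
  return config;
}
```

Both helpers return new objects and leave every other field untouched, so they can be applied unconditionally before a message or configuration is sent.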

Added

  • Communication channel architecture — a new pluggable channel system (IClientConnection) decouples the conversation engine from transport. Channels supported in this release:
    • WebSocket channel — existing real-time WebSocket transport, now a first-class channel.
    • WebRTC channel — new channel for real-time audio communication over WebRTC.
    • Twilio Messaging channel — inbound/outbound SMS and WhatsApp messaging via the Twilio API with webhook signature validation.
    • Twilio Voice channel — inbound phone calls via Twilio Media Streams with audio streaming and DTMF support.
  • Server-side Voice Activity Detection (VAD) — experimental server-side VAD mode for voice conversations, with pre-warming of ASR sessions and improved handling of audio that arrives during the awaiting-user-input state.
  • Sample Copy system — a new content distribution mechanism that lets you define a pool of sample AI response copies:
    • SampleCopy entity with CRUD API (/api/projects/:projectId/sample-copies).
    • CopyDecorator entity for decorating selected copies with additional instructions (/api/projects/:projectId/copy-decorators).
    • Project-level sample copy settings (classifier assignment, distribution weights).
    • Forced-mode support: a sample copy can force-replace the LLM response.
    • Sample copy selection is tracked as a conversation event.
  • Slice-and-dice analytics query engine — a new flexible analytics subsystem:
    • Data sources: conversations, tool calls, classifications, context transformations, LLM events.
    • Dimensions include stageName, provider, model, and more.
    • normalizeBy parameter for two-phase aggregation.
    • Relative time range support (e.g. last_7_days).
    • Saved Slice Queries — persist and manage named slice queries with metadata via /api/projects/:projectId/saved-slice-queries.
  • Token usage statistics and trend endpoints — new analytics endpoints exposing LLM token consumption and trends.
  • User banning — admins can ban users; a banUser action effect is available in stage actions.
  • Audio format negotiation and conversion — the server now negotiates the optimal audio format with the client and performs on-the-fly conversion (via ffmpeg / SpeexResampler) when necessary. Conversion adds latency, so a format that both sides support natively avoids that cost.
  • Content moderation execution modes: the moderation pipeline now supports strict (block on any hit) and standard (block only on high-confidence hits) modes.
  • Execution plan event — a single executionPlan WebSocket event is emitted at the start of action execution, replacing the previous per-effect event stream.
  • Detailed timing metrics — new timing fields throughout the analytics and conversation event pipeline: TTS connection time, stage-transition duration, prompt-render time, turn-end timestamp, and more.
  • API key channel and feature permission types — the ApiKeyChannel enum is extended with Twilio Voice and Twilio Messaging, and new feature-level permission scopes are available for channels.
  • Permissions in authentication responses: POST /api/auth/login and POST /api/auth/refresh now include the resolved permissions list for the authenticated user alongside roles.
  • Schema validation for WebSocket message handlers — all incoming WebSocket messages are validated against the contract schema before dispatch.
  • Channel catalog — a ChannelCatalog class exposes metadata and JSON schemas for all registered channel types.
  • version field in capability responses — channel versions are surfaced through the channel descriptors.
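To make the conversion step in audio format negotiation concrete: Twilio Media Streams deliver 8 kHz G.711 µ-law audio, while pcm_16000 is assumed here to mean 16-bit signed mono PCM at 16 kHz. The sketch decodes µ-law with the standard G.711 expansion and upsamples 2x by linear interpolation. It is a naive illustration with hypothetical function names, not the server's pipeline, which the release notes say uses ffmpeg / SpeexResampler; a production resampler would also apply proper low-pass filtering.

```typescript
// Naive illustration only: hypothetical helpers, not the server's ffmpeg / SpeexResampler pipeline.
// Assumption: pcm_16000 = 16-bit signed PCM, mono, 16 kHz.

// Standard G.711 µ-law expansion of one byte to a 16-bit linear sample.
function muLawDecode(byte: number): number {
  const u = ~byte & 0xff; // µ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the G.711 bias added before compression and removed here.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode 8 kHz µ-law, then upsample 2x by linear interpolation to 16 kHz.
function muLaw8kToPcm16k(mulaw: Uint8Array): Int16Array {
  const pcm8k = Array.from(mulaw, muLawDecode);
  const out = new Int16Array(pcm8k.length * 2);
  for (let i = 0; i < pcm8k.length; i++) {
    const cur = pcm8k[i];
    // Repeat the final sample when there is no next one to interpolate toward.
    const next = i + 1 < pcm8k.length ? pcm8k[i + 1] : cur;
    out[2 * i] = cur;
    out[2 * i + 1] = Math.round((cur + next) / 2);
  }
  return out;
}
```

Int16Array uses the platform's byte order, so if the consumer expects little-endian bytes on the wire, serialize each sample explicitly (for example with a DataView) before sending.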

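The difference between the two moderation execution modes fits in a single decision function. This is a sketch under stated assumptions: the hit shape, the 0 to 1 confidence scale, and the 0.8 threshold that defines a "high-confidence" hit are all hypothetical, since the release notes only name the modes and their blocking rules.

```typescript
// Hypothetical shapes: the release notes name the modes, not the hit structure.
type ModerationMode = "strict" | "standard";

interface ModerationHit {
  category: string;
  confidence: number; // assumed to lie in [0, 1]
}

// Assumed cut-off for a "high-confidence" hit; not taken from the release notes.
const HIGH_CONFIDENCE_THRESHOLD = 0.8;

function shouldBlock(hits: readonly ModerationHit[], mode: ModerationMode): boolean {
  if (mode === "strict") {
    // strict: any hit at all blocks the message.
    return hits.length > 0;
  }
  // standard: only hits at or above the threshold block.
  return hits.some((hit) => hit.confidence >= HIGH_CONFIDENCE_THRESHOLD);
}
```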
Fixed

  • OpenAI TTS provider was not sending the last sentence of a response.
  • Forced copy and filler responses could be silently overwritten by subsequent LLM output.
  • Sample copy classifier could produce an empty candidate list, causing an unhandled error.
  • Duplicate entries in round-robin copy distribution.
  • Empty messages were being added to conversation history.
  • No speech was generated when a conversation ended or was aborted mid-turn.
  • Entity audit logs were not filtered by projectId, leaking entries across projects.
  • Deleting a project failed when associated user or guardrail entities existed (foreign key constraint violation).
  • Tool validation was incorrectly allowing null values in required fields.
  • firstTokenMs timing metric was recorded even when llmStartMs was absent, producing invalid deltas.
  • Turn data was not reset when a client-initiated action started a new turn, causing skewed timing metrics in analytics.
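The last two fixes concern when a timing baseline is valid. A minimal sketch of the corrected behavior, assuming a hypothetical per-turn timer object: the first-token delta is recorded only when an LLM start timestamp exists, and all per-turn state is cleared whenever a new turn begins, including a turn started by a client-initiated action.

```typescript
// Hypothetical per-turn timer illustrating the two timing fixes; names other
// than llmStartMs and firstTokenMs are assumptions.
class TurnTimer {
  private llmStartMs: number | undefined = undefined;
  private firstTokenMs: number | undefined = undefined;

  // Called for every new turn, including turns started by a client-initiated action.
  startTurn(): void {
    this.llmStartMs = undefined;
    this.firstTokenMs = undefined;
  }

  markLlmStart(nowMs: number): void {
    this.llmStartMs = nowMs;
  }

  markFirstToken(nowMs: number): void {
    // Without a start timestamp there is no valid baseline, so record nothing.
    if (this.llmStartMs === undefined) return;
    // Only the first token of the turn counts.
    if (this.firstTokenMs === undefined) {
      this.firstTokenMs = nowMs - this.llmStartMs;
    }
  }

  getFirstTokenMs(): number | undefined {
    return this.firstTokenMs;
  }
}
```

Resetting in startTurn, rather than only at the end of a normal turn, is what prevents a stale llmStartMs from leaking into the next turn's delta.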