
Release 0.4.0#59

Merged
ashish-spext merged 98 commits into main from release-0-4-0
Feb 12, 2026

ankit-v2-3 commented Jan 6, 2026

Pull Request

Description:
Version 0.4.0 introduces real-time streaming and desktop capture capabilities to the VideoDB Python SDK. This release adds support for capturing audio/video from local devices, managing capture sessions in the cloud, and real-time transcription for live streams. WebSocket support enables real-time event handling for live workflows.


Major Features

Desktop Capture SDK

  • New CaptureClient class for local desktop recording with async/await support

    • request_permission() - Request microphone and screen capture permissions
    • list_channels() - Discover available audio/video input devices
    • start_session() / stop_session() - Control recording sessions
    • pause() / resume() - Pause and resume individual channels
  • Channel Management

    • Channel, AudioChannel, VideoChannel classes
    • ChannelList with .default property for easy access to default channel
    • Channels container with grouped access:
      • channels.mics.default / channels.displays.default / channels.system_audio.default
      • channels.all() - Get all channels as flat list
    • Configurable store property on channels
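
The grouped channel access described above can be pictured with a small stand-in model. This is only a sketch and not the SDK's implementation: the `Channel` fields (`id`, `name`), the sample channel IDs, and the rule that `.default` returns the first channel in a group (or `None` when the group is empty) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Stand-in for a capture channel; the fields are hypothetical."""
    id: str
    name: str
    store: bool = False  # configurable, off unless explicitly enabled


class ChannelList(list):
    """A list of channels with a convenience accessor."""

    @property
    def default(self) -> Optional[Channel]:
        # Assumption: "default" means the first channel, or None if empty.
        return self[0] if self else None


@dataclass
class Channels:
    """Container that groups channels by kind."""
    mics: ChannelList = field(default_factory=ChannelList)
    displays: ChannelList = field(default_factory=ChannelList)
    system_audio: ChannelList = field(default_factory=ChannelList)

    def all(self) -> List[Channel]:
        # Flatten the three groups into one list.
        return [*self.mics, *self.displays, *self.system_audio]


channels = Channels(
    mics=ChannelList([Channel("mic:0", "Built-in Microphone")]),
    displays=ChannelList([Channel("display:0", "Main Display")]),
)

mic = channels.mics.default
mic.store = True  # opt in to storing this channel's recording
print(mic.name, mic.store)
print(channels.system_audio.default)  # empty group in this sketch
print(len(channels.all()))
```

Returning `None` for an empty group is one possible design; the real SDK may behave differently when no device of a given kind is present.
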

Capture Session Management

  • New CaptureSession class for cloud-side session handling

    • Track session status, RTStreams, and metadata
    • get_rtstream(category) - Filter RTStreams by channel type (mic, screen, system_audio)
  • Connection methods:

    • create_capture_session() - Initialize new capture sessions
    • get_capture_session() / list_capture_sessions() - Retrieve sessions
    • generate_client_token() - Generate time-limited tokens for capture operations
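
Filtering a session's RTStreams by channel type might look roughly like the sketch below. The `FakeRTStream` class, the `filter_rtstreams` helper, and the sample IDs are hypothetical stand-ins; only the three category names (mic, screen, system_audio) come from this release, and the assumption that each constant's value equals its name is not confirmed by the PR.

```python
from dataclasses import dataclass
from typing import List


class RTStreamChannelType:
    """Stand-in constants; values are assumed to match their names."""
    mic = "mic"
    screen = "screen"
    system_audio = "system_audio"


@dataclass
class FakeRTStream:
    """Hypothetical minimal RTStream record for illustration."""
    id: str
    category: str


def filter_rtstreams(rtstreams: List[FakeRTStream], category: str) -> List[FakeRTStream]:
    # Keep only the streams whose category matches the requested channel type.
    return [stream for stream in rtstreams if stream.category == category]


session_rtstreams = [
    FakeRTStream("rts-1", RTStreamChannelType.mic),
    FakeRTStream("rts-2", RTStreamChannelType.screen),
    FakeRTStream("rts-3", RTStreamChannelType.mic),
]

mic_streams = filter_rtstreams(session_rtstreams, RTStreamChannelType.mic)
print([stream.id for stream in mic_streams])
```
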

WebSocket Support

  • New WebSocketConnection class for real-time event streaming
    • connect_websocket() on Connection for establishing WebSocket connections
    • Async message sending and receiving
    • Context manager support (async with)

Authentication

  • New session_token parameter for videodb.connect()
    • Alternative authentication method alongside api_key
    • Enables temporary/scoped access for client applications

RTStream Enhancements

  • Transcript support:

    • start_transcript() / stop_transcript() - Control real-time transcription
    • get_transcript() - Retrieve transcription with pagination
  • Specialized indexing methods:

    • index_audio() - Index audio content
    • index_visuals() - Index visual content
  • Export functionality:

    • export() - Export RTStream recording to video/audio asset
    • RTStreamExportResult - Result class with video_id, stream_url, player_url
  • WebSocket integration:

    • ws_connection_id parameter support across RTStream methods for real-time updates
  • connect_rtstream() improvements:

    • New media_types parameter (replaces audio/video boolean flags)
    • New store parameter to enable recording storage for export

Video Enhancements

  • New clip() method - Generate AI-powered clips from video using prompts
  • New index_visuals() method - Index visual scenes with configurable batch extraction
  • New index_audio() method - Index audio content (alias for spoken word indexing)

Improvements

  • Added RTStreamChannelType constants: mic, screen, system_audio
  • Enhanced Shot objects with scene_index_id, scene_index_name, and metadata fields
  • Added segmentation_type parameter to index_spoken_words() with LLM segmentation support
  • Fixed upload from collection objects defaulting to wrong collection
  • Corrected zero-score threshold behavior in search defaults
  • Timeline editor handles large payloads via file upload
  • Windows console encoding support (non-UTF-8 handling)

Dependencies

  • Added: websockets>=11.0.3 as a mandatory dependency
  • New optional extra: videodb[capture] for desktop capture (videodb-capture-bin>=0.2.7)

0xrohitgarg and others added 26 commits January 22, 2026 11:43
Rename the CaptureClient parameter from upload_token to client_token for consistency with the generate_client_token() method. This improves API naming consistency and matches user expectations.

- Rename __init__ parameter: upload_token -> client_token
- Update docstring to reflect new parameter name
- Keep "uploadToken" in binary protocol payload (required by recorder)

- Add session_token parameter to connect() function as alternative to api_key
- Update Connection class to accept and handle session_token authentication
- Update .gitignore to exclude videodb-recorder binary
- Update capture_bin package manifest

This enables frontend clients to create WebSocket connections using
time-bound session tokens for capture operations.

Changed store from hardcoded True to a configurable property that defaults to False.

- Reorganize capture binaries into platform folders (darwin_arm64, darwin_x86_64, win_amd64)
- Update package_data glob pattern to include subfolders (bin/**/*)
- Add platform detection logic to select correct binary at runtime
- Export RTStreamChannelType from videodb package
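
Runtime selection of a platform folder could be sketched as follows. The folder names are the ones listed in this commit; the `resolve_binary_dir` helper, its parameters, and the error message are hypothetical, and the actual detection logic in the package may differ.

```python
import platform
from pathlib import Path
from typing import Optional

# Folder names taken from the commit message; keys are lowercased
# (platform.system(), platform.machine()) pairs.
_PLATFORM_FOLDERS = {
    ("darwin", "arm64"): "darwin_arm64",
    ("darwin", "x86_64"): "darwin_x86_64",
    ("windows", "amd64"): "win_amd64",
}


def resolve_binary_dir(
    bin_root: str,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> Path:
    """Return the platform-specific folder under bin_root (hypothetical helper)."""
    # Fall back to the running interpreter's platform when not overridden.
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    try:
        folder = _PLATFORM_FOLDERS[(system, machine)]
    except KeyError:
        raise RuntimeError(
            f"No capture binary bundled for {system}/{machine}"
        ) from None
    return Path(bin_root) / folder


# Explicit arguments keep the demo independent of the host machine.
print(resolve_binary_dir("bin", "Darwin", "arm64"))
print(resolve_binary_dir("bin", "Windows", "AMD64"))
```

Accepting `system` and `machine` as optional overrides makes the lookup easy to test without running on each platform.
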

Use errors="replace" when decoding stdout/stderr from the recorder binary
to handle Windows console encoding (CP1252) gracefully.
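
The effect of `errors="replace"` can be seen in a short sketch. It assumes the output is decoded as UTF-8, which the commit message does not state; the `decode_output` helper and the sample bytes are illustrative only.

```python
def decode_output(raw: bytes) -> str:
    """Decode process output without raising on undecodable bytes.

    Assumption: the bytes are decoded as UTF-8. Invalid sequences become
    U+FFFD (the Unicode replacement character) instead of raising.
    """
    return raw.decode("utf-8", errors="replace")


# "café ready" encoded as CP1252: the byte 0xE9 is not valid UTF-8 here.
cp1252_bytes = "café ready".encode("cp1252")

try:
    cp1252_bytes.decode("utf-8")  # strict decoding fails
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

decoded = decode_output(cp1252_bytes)
print(strict_failed)
print(decoded == "caf\ufffd ready")
```

The trade-off is that the offending character is lost, but the surrounding log text survives and the SDK does not crash on a non-UTF-8 console.
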

The second __all__ assignment was overwriting the first, causing capture classes
(CaptureClient, Channel, etc.) to be missing from exports.

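
The failure mode is easy to reproduce in isolation. The sketch below builds two throwaway modules from source strings; the export names are examples, and extending `__all__` with `+=` is one common way to avoid the overwrite, not necessarily the change made in this commit.

```python
import types


def load_module(name: str, source: str) -> types.ModuleType:
    """Execute source text inside a fresh module object (demo helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Two plain assignments: the later one silently replaces the earlier one.
BROKEN_SOURCE = """
__all__ = ["CaptureClient", "Channel"]
__all__ = ["connect"]
"""

# Extending the existing list keeps both groups of names.
FIXED_SOURCE = """
__all__ = ["connect"]
__all__ += ["CaptureClient", "Channel"]
"""

broken = load_module("broken_pkg", BROKEN_SOURCE)
fixed = load_module("fixed_pkg", FIXED_SOURCE)

print(broken.__all__)
print(fixed.__all__)
```
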
- Remove capture_bin/ from main SDK repo (will live in separate repo)
- Add ChannelList class with .default property for cleaner API
- Change API: channels.default_mic -> channels.mics.default
- Export ChannelList from package

Breaking change: channels.default_mic, channels.default_display,
channels.default_system_audio replaced with channels.mics.default,
channels.displays.default, channels.system_audio.default
- Bump videodb-capture-bin from >=0.2.4 to >=0.2.5
- Remove capture_bin from .gitignore (moved to separate repo)
…_rtstream

- Convert MediaType to str Enum for strict type enforcement
- Replace video: bool and audio: bool params with media_types: List[str]
- Default media_types to [MediaType.video] when not specified
- Add validation to reject invalid media type values
- Bump videodb-capture-bin dependency to >=0.2.7
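
A str-based Enum with validation, as described in the bullets above, might be sketched like this. The `audio` member is inferred from the replaced audio/video flags, and the `normalize_media_types` helper and its error message are hypothetical; the SDK's own validation may be structured differently.

```python
from enum import Enum
from typing import List, Optional


class MediaType(str, Enum):
    """Stand-in enum; members compare equal to their string values."""
    video = "video"
    audio = "audio"  # assumed member, inferred from the old audio flag


def normalize_media_types(media_types: Optional[List[str]] = None) -> List[str]:
    """Validate media types and apply the video-only default (hypothetical)."""
    if media_types is None:
        # Default described in the commit: video only.
        return [MediaType.video.value]
    normalized = []
    for item in media_types:
        try:
            # Accepts both plain strings and MediaType members.
            normalized.append(MediaType(item).value)
        except ValueError:
            allowed = ", ".join(member.value for member in MediaType)
            raise ValueError(
                f"Invalid media type {item!r}; expected one of: {allowed}"
            ) from None
    return normalized


print(normalize_media_types())
print(normalize_media_types(["audio", MediaType.video]))
print(MediaType.video == "video")
```

Because the enum mixes in `str`, callers that still pass plain strings keep working while typos are rejected early.
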
- Add export() method to RTStream for exporting recordings as video/audio assets
- Add RTStreamExportResult class to hold export metadata
- Add store parameter to create_rtstream() for enabling recording storage
- Rename capture methods for consistency:
  - start_capture_session -> start_session
  - stop_capture -> stop_session
ashish-spext merged commit 4800657 into main Feb 12, 2026
4 checks passed