Releases: KoljaB/RealtimeSTT

v0.3.100

23 Mar 11:03

RealtimeSTT 0.3.100

New VAD callbacks on_vad_start and on_vad_stop

  • Triggered when voice activity starts and stops.
  • Reverted on_vad_detect_start and on_vad_detect_stop to their previous behavior: they are triggered when the system starts or stops listening for voice activity.
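
A minimal sketch of wiring up both callback pairs side by side. It assumes the callbacks are passed as keyword arguments to the AudioToTextRecorder constructor and that they are called without arguments; neither detail is stated in these notes. The names events, make_logger, recorder_kwargs and build_recorder are hypothetical and exist only for this illustration.

```python
# Hypothetical event log so the firing order of the callbacks can be inspected.
events = []


def make_logger(name):
    """Return a callback that records its own name each time it fires."""
    def callback(*args):
        # *args keeps the sketch working even if the library passes arguments.
        events.append(name)
    return callback


# Assumption: all four callbacks are constructor keyword arguments.
recorder_kwargs = {
    # Voice activity actually begins / ends.
    "on_vad_start": make_logger("vad_start"),
    "on_vad_stop": make_logger("vad_stop"),
    # The system starts / stops listening for voice activity.
    "on_vad_detect_start": make_logger("vad_detect_start"),
    "on_vad_detect_stop": make_logger("vad_detect_stop"),
}


def build_recorder():
    """Create the recorder. Imported lazily because construction loads
    models and opens an audio device."""
    from RealtimeSTT import AudioToTextRecorder
    return AudioToTextRecorder(**recorder_kwargs)
```

Calling build_recorder() and then speaking into the microphone should fill events in the order the callbacks fire, which makes the difference between the two pairs easy to see.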

v0.3.99

21 Mar 19:10

RealtimeSTT 0.3.99

1. Enhanced Logging Configuration

  • Introduced a dedicated named logger realtimestt instead of using the root logger.
  • Added structured logging with handlers for both console (level set by user) and file (always DEBUG).
  • Logging no longer propagates to the root logger by default (logger.propagate = False).
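
The logging setup described above can be sketched with the standard library alone. This is an illustration of the pattern, not the library's actual code: the function name configure_logger, the log file location and the format string are assumptions made for the example.

```python
import logging
import os
import tempfile


def configure_logger(console_level=logging.WARNING, log_path=None):
    """Set up a named logger with a console handler and a DEBUG file handler."""
    logger = logging.getLogger("realtimestt")
    # The logger itself lets everything through; each handler filters.
    logger.setLevel(logging.DEBUG)
    # Keep records away from the root logger and its handlers.
    logger.propagate = False

    # Remove handlers left over from an earlier call to avoid duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setLevel(console_level)  # level chosen by the user
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_path is None:
        # Hypothetical file location used only for this example.
        log_path = os.path.join(tempfile.gettempdir(), "realtimestt_example.log")
    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)  # the file always receives DEBUG
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger


logger = configure_logger(console_level=logging.INFO)
logger.debug("goes to the file only")
logger.info("goes to the console and the file")
```

Because propagate is False, an application that wants these records in its own root handlers has to attach a handler to the realtimestt logger or re-enable propagation itself.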

2. Option to Disable the Faster-Whisper VAD Filter

  • Added faster_whisper_vad_filter parameter (default: True) to enable voice activity detection (VAD) from the faster_whisper library.
  • When enabled, the filter improves robustness against background noise at the cost of additional GPU resources.
  • Integrated into both real-time and main transcription workflows.
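
A small sketch of choosing the setting based on the trade-off described above. It assumes faster_whisper_vad_filter is a constructor keyword argument of AudioToTextRecorder; the helper names recorder_options and build_recorder and the noisy_environment flag are hypothetical.

```python
def recorder_options(noisy_environment):
    """Return constructor options for the VAD filter trade-off.

    Keep the filter on (the default) when background noise is expected;
    turn it off to save GPU resources in a quiet setting.
    """
    return {"faster_whisper_vad_filter": bool(noisy_environment)}


def build_recorder(noisy_environment=True):
    """Create the recorder. Imported lazily because construction loads
    models and opens an audio device."""
    from RealtimeSTT import AudioToTextRecorder
    return AudioToTextRecorder(**recorder_options(noisy_environment))
```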

3. Audio Worker Improvements

  • Added improved, detailed debug logging for audio device initialization, sample rate handling, and resampling.

4. VAD Callback Adjustments

  • Fixes #215.
  • Moved on_vad_detect_start and on_vad_detect_stop callbacks to trigger directly during voice activity checks instead of state transitions.
  • Ensures callbacks align more accurately with actual speech/silence events.

v0.3.98

10 Mar 22:42

RealtimeSTT 0.3.98

  • Minor fix for the PyPI wheel

v0.3.97

10 Mar 20:35

RealtimeSTT 0.3.97

v0.3.95

15 Feb 16:38

RealtimeSTT 0.3.95

  • better warmup (using audio file)
  • merged #200

v0.3.94

23 Jan 20:26

RealtimeSTT 0.3.94

  • New parameters for the stop() method of AudioToTextRecorder:
    • backdate_stop_seconds (float, default=0.0):
      • Description: Specifies the number of seconds to backdate the stop time when ending a recording.
      • Usage: When invoking stop() due to a wake word detection or a speaker diarization change event, this parameter compensates for any latency, ensuring that only relevant audio is included in the recording and transcription.
    • backdate_resume_seconds (float, default=0.0):
      • Description: Specifies the number of seconds to backdate the resume time when restarting listening after a recording has stopped.
      • Usage: Typically set to the same value as backdate_stop_seconds; it is kept as a separate parameter to allow fine-tuning.
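
To make the idea of backdating concrete, here is a standalone arithmetic sketch. It is not the library's implementation: trim_backdated is a hypothetical helper that drops the trailing portion of a sample buffer, which is the effect backdating the stop time is described as having. The 16 kHz sample rate and 0.5 s value are arbitrary.

```python
def trim_backdated(samples, sample_rate, backdate_stop_seconds=0.0):
    """Return samples with the last backdate_stop_seconds of audio removed."""
    if backdate_stop_seconds < 0:
        raise ValueError("backdate_stop_seconds must be non-negative")
    n_drop = int(round(backdate_stop_seconds * sample_rate))
    if n_drop == 0:
        return list(samples)
    if n_drop >= len(samples):
        return []
    return list(samples[:-n_drop])


sample_rate = 16000                 # arbitrary example rate
samples = [0] * (2 * sample_rate)   # two seconds of silence
kept = trim_backdated(samples, sample_rate, backdate_stop_seconds=0.5)
print(len(kept) / sample_rate)      # 1.5 seconds remain

# With a real recorder the corresponding call would be:
# recorder.stop(backdate_stop_seconds=0.5, backdate_resume_seconds=0.5)
```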

v0.3.93

18 Dec 18:19
  • Fixed stt-server, which was broken by an API change in the upgraded webservers dependency
  • Added initial_prompt_realtime to AudioToTextRecorder so that the final and realtime models can be given different prompts
  • Added new parameters to client/server (download root, batch sizes)

v0.3.92

13 Dec 14:30
  • fixed dependencies (causing "ImportError: cannot import name 'BatchedInferencePipeline' from 'faster_whisper'")

v0.3.91

12 Dec 07:59
  • Upgraded to 0.3.91 because 0.3.9 had issues on PyPI

v0.3.9

11 Dec 11:45

RealtimeSTT v0.3.9 Release Notes

🚀 New Features

Batched Transcription

  • Added support for batched transcription in both main and real-time models, which improves performance and efficiency
  • New parameters introduced:
    • batch_size: Controls the batch size for main transcription tasks.
    • realtime_batch_size: Configures batch size for real-time transcription.

This feature is designed to speed up processing. I can't say yet whether there are cases where batching overhead hurts performance. It looked promising to me in initial tests, but I need your feedback! Please report if you run into any issues or notice slower transcription due to batching.
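
For anyone who wants to report timings, a rough sketch of a measurement helper follows. It assumes batch_size and realtime_batch_size are constructor keyword arguments of AudioToTextRecorder; the value 16 is arbitrary, and average_seconds and the stand-in workload are hypothetical names used only to show the pattern.

```python
import time


def average_seconds(fn, repeats=3):
    """Average wall-clock seconds per call of fn over the given number of runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Assumed constructor options; the values are arbitrary examples.
batched_kwargs = {"batch_size": 16, "realtime_batch_size": 16}


def stand_in_workload():
    """Placeholder for a real transcription call on a fixed audio clip."""
    return sum(range(10000))


elapsed = average_seconds(stand_in_workload)
print(f"average per call: {elapsed:.6f} s")
```

Replacing stand_in_workload with a transcription of the same clip under different batch sizes gives numbers that can be compared directly in a bug report.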