Releases: trsdn/openwritr-windows

v0.3.0 — Parakeet on Hexagon NPU

03 Jun 09:24

📥 Which file do I download?

| Your machine | Download this |
| --- | --- |
| Snapdragon / ARM (e.g. Surface Pro 11, Surface Laptop 7) | openwritr-windows-arm64-v0.3.0-setup.exe |
| Intel / AMD (most other laptops & desktops) | openwritr-windows-x64-v0.3.0-setup.exe |

Not sure which you have? Open Settings → System → About and check "System type": if it says "ARM-based processor", take arm64; otherwise take x64.

Install

  1. Run the setup .exe. Windows SmartScreen will warn ("Windows protected your PC") because the binaries are not code-signed yet — click More info → Run anyway. You can verify your download against SHA256SUMS.txt: Get-FileHash .\openwritr-windows-<arch>-v0.3.0-setup.exe in PowerShell.
  2. The installer creates everything itself (Start Menu entry, optional autostart, uninstaller under Settings → Apps). No folders to prepare.
  3. First launch downloads the speech model (~0.6–1.2 GB) — one-time, takes a couple of minutes. A microphone icon appears in the system tray when ready.
  4. Hold Ctrl + Win, speak, release — the text is pasted at your cursor.
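The checksum comparison from step 1 can also be scripted. The following is a minimal sketch, assuming SHA256SUMS.txt uses the common sha256sum layout (one "<hex digest>  <filename>" pair per line) and is UTF-8 encoded; neither is confirmed by the release notes, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def expected_digest(sums_text: str, filename: str) -> Optional[str]:
    """Find a filename in sha256sum-style text (assumed format, see lead-in)."""
    for line in sums_text.splitlines():
        parts = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading "*" on the name.
        if len(parts) == 2 and parts[1].strip().lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify(download: Path, sums_file: Path) -> bool:
    """True only when the file is listed and its digest matches the listing."""
    listed = expected_digest(sums_file.read_text(encoding="utf-8"), download.name)
    return listed is not None and sha256_of(download) == listed
```

Calling `verify(Path("openwritr-windows-x64-v0.3.0-setup.exe"), Path("SHA256SUMS.txt"))` returns False both for a mismatched digest and for a file that is not listed at all, so a missing entry is never treated as a pass.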
Manual install (portable zip, no installer)

Download the .zip for your architecture instead, unzip it into any folder you like (e.g. C:\Tools\OpenWritr\), and run openwritr.exe from there. Same binaries — just no Start Menu entry, no autostart, no uninstaller. User data (settings, models, logs) goes to %LOCALAPPDATA%\OpenWritr\ automatically either way.


First release with Parakeet TDT v3 running on the Snapdragon X Elite Hexagon NPU (arm64) — plus a CPU-only build for Intel/AMD (x64).

Push-to-talk transcription on the NPU typically takes 200–400 ms of total decode time for a 5-second utterance, with the encoder itself at ~67 ms steady-state per 8-second window. Long-form audio is chunked transparently (8 s window, 1 s overlap; the decoder runs once over the stitched feature stream).
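To make the chunking arithmetic concrete, here is a small sketch of how an 8 s window with 1 s overlap (a 7 s stride) splits an utterance. It is an illustration only: the function name is hypothetical, and the actual splitting and padding logic in the app may differ.

```python
import math


def chunk_spans(duration_s, window_s=8.0, overlap_s=1.0):
    """Return (start, end) spans in seconds for overlapping fixed-size windows.

    Assumption: the last window is simply cut at the end of the audio here;
    a static-shape encoder would presumably pad it back up to window_s.
    """
    if duration_s <= 0:
        return []
    if duration_s <= window_s:
        return [(0.0, duration_s)]
    stride = window_s - overlap_s
    count = 1 + math.ceil((duration_s - window_s) / stride)
    return [
        (i * stride, min(i * stride + window_s, duration_s))
        for i in range(count)
    ]


for seconds in (3.0, 5.8, 16.4, 23.0):
    print(seconds, len(chunk_spans(seconds)))
```

With these parameters the sketch yields 1, 1, 3 and 4 chunks for 3 s, 5.8 s, 16.4 s and 23.0 s of audio, which agrees with the chunk counts reported in the performance measurements.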

Performance

Measured on Snapdragon X Elite (X1E80100):

| Audio length | Decode (preproc + NPU encode + TDT) | × Realtime | Chunks |
| --- | --- | --- | --- |
| 3 s | 128 ms | 23× | 1 |
| 5.8 s | 221 ms | 26× | 1 |
| 16.4 s | 375 ms | 44× | 3 |
| 23.0 s | 626 ms | 37× | 4 |
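The "× Realtime" column is audio length divided by decode time. A short sketch that recomputes it from the measurements above (the function name is illustrative):

```python
def realtime_factor(audio_s, decode_ms):
    """How many seconds of audio are processed per second of decode time."""
    return audio_s / (decode_ms / 1000.0)


# (audio length in seconds, decode time in milliseconds) from the table above.
ROWS = [(3.0, 128), (5.8, 221), (16.4, 375), (23.0, 626)]

for audio_s, decode_ms in ROWS:
    factor = realtime_factor(audio_s, decode_ms)
    print(f"{audio_s:5.1f} s  {decode_ms:4d} ms  {factor:.0f}x")
```

Rounded to whole numbers, this reproduces the 23×, 26×, 44× and 37× figures.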

The x64 build runs the same pipeline on the CPU (no Hexagon NPU on Intel/AMD) at roughly 25× realtime on a modern CPU.

Switch engines (arm64 only)

Right-click the tray icon → Settings → Transcription engine. NPU is used when available, with automatic CPU fallback.

Companion model

The NPU encoder is hosted at trsdn/parakeet-tdt-0.6b-v3-htp-int8-8s (632 MB QAIRT context binary). CC-BY-4.0, attribution NVIDIA Parakeet. Downloaded automatically on first NPU launch.

What landed

  • NPU pipeline: Direct ort_sys FFI (src/asr/qnn_ffi.rs) loading an AI-Hub-compiled QNN context binary; chunked long-audio with seam stitching.
  • Focus-robust hotkey: global low-level keyboard hook — recording survives focus steals (popups, UAC, shortcuts).
  • Audio idle fix: capture stream rebuilt per recording — no more dead mic after long idle.
  • App icon, settings links, included-Copilot-model markers.
  • x64 build for Intel/AMD (CPU INT8, 9 MB zip).
  • CI: every release is built reproducibly on GitHub Actions (windows-11-arm).

Known limits

  • arm64 NPU model is device-gated to Snapdragon X Elite; other Snapdragons fall back to CPU.
  • Static 8 s NPU window; longer audio is chunked transparently.
  • Binaries unsigned (SmartScreen warning) — Store/signing in progress.

🤖 Generated with Claude Code

v0.2.2 — modifiers-only hotkey, model dropdown, licence compliance

02 Jun 10:44

Hotkey is now modifier-only by default: hold Ctrl+Win to record, Ctrl+Shift+Win to also run an LLM cleanup pass. Detection runs continuously via GetAsyncKeyState, so it keeps working even if Windows reserves the combo elsewhere.
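A rough sketch of the modifier-only detection described above: poll the modifier keys and map the held combination to an action. The `Action` enum and function names are hypothetical, and how the app treats Shift pressed partway through a recording is not stated in the notes. `is_down` uses the Win32 `GetAsyncKeyState` call and therefore only works on Windows; `classify` is plain logic.

```python
import ctypes
from enum import Enum

# Win32 virtual-key codes for the modifiers involved.
VK_SHIFT, VK_CONTROL, VK_LWIN, VK_RWIN = 0x10, 0x11, 0x5B, 0x5C


class Action(Enum):
    IDLE = "idle"
    RECORD = "record"
    RECORD_WITH_CLEANUP = "record+cleanup"


def is_down(vk):
    """Windows only: the high bit of GetAsyncKeyState is set while the key is held."""
    return bool(ctypes.windll.user32.GetAsyncKeyState(vk) & 0x8000)


def classify(ctrl, win, shift):
    """Map the currently held modifiers to an action (illustrative mapping)."""
    if ctrl and win:
        return Action.RECORD_WITH_CLEANUP if shift else Action.RECORD
    return Action.IDLE


def poll_once():
    """Windows only: read the modifiers right now and classify them."""
    win = is_down(VK_LWIN) or is_down(VK_RWIN)
    return classify(is_down(VK_CONTROL), win, is_down(VK_SHIFT))
```

Because the state is read by polling rather than by registering a hotkey, the check does not depend on Windows granting the combination to the app.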

Settings now exposes a model dropdown (Claude Haiku 4.5 default, GPT-5 Mini, GPT-4.1) matching the macOS app's enhanced-model menu. Custom model names are still editable via a text field.

gh auth token now runs with the no-window flag and caches the result for 10 minutes — no more CLI flash on every enhanced recording.
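The caching behaviour can be sketched as a small time-to-live cache around the CLI call. This is an illustration in Python rather than the app's own code: `TokenCache` and `run_gh_auth_token` are hypothetical names, and `CREATE_NO_WINDOW` is looked up defensively because the constant only exists on Windows.

```python
import subprocess
import time

CACHE_TTL_S = 600  # 10 minutes, as stated in the release note


def run_gh_auth_token():
    """Run `gh auth token` without flashing a console window on Windows."""
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)  # 0 on other platforms
    result = subprocess.run(
        ["gh", "auth", "token"],
        capture_output=True, text=True, check=True, creationflags=flags,
    )
    return result.stdout.strip()


class TokenCache:
    """Cache a fetched token and refetch only after the TTL has elapsed."""

    def __init__(self, fetch=run_gh_auth_token, ttl_s=CACHE_TTL_S, clock=time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._ttl_s
        return self._token
```

Passing the fetch function and clock as parameters keeps the cache testable without invoking the real CLI.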

Third-party licence compliance: each release zip now ships a third-party-licenses/ folder containing the Qualcomm AI Engine Direct redistributable licence, Microsoft's ThirdPartyNotices for the onnxruntime-qnn package, and the onnxruntime-qnn LICENSE + Privacy notices. The QNN DLLs were already being bundled; only the licence text was missing.

README rewritten from scratch to reflect v0.2.x reality.

v0.2.1 — settings + overlay + multi-channel audio fixes

02 Jun 09:47

Fixes the three biggest issues users hit in v0.2.0:

  • Transcription no longer returns empty strings on multi-channel mics. The Qualcomm Aqstic mic array on Surface Pro exposes 4-8 interleaved channels at 48 kHz; the recorder was feeding the raw interleaved buffer to the resampler as if it were mono. Now downmixed properly inside the audio callback.
  • Settings UI no longer freezes the tray app. CreateProcessW on Windows ARM64 (especially with Defender real-time scanning) can block the calling thread for several seconds. Spawning it inline from the winit tick handler stalled the tray's message pump → app went 'Not Responding' → hotkey died. Now spawned from a worker thread; main pump keeps draining.
  • Visual recording indicator — small dark pill at bottom-center with 22 vertical white bars that breathe with the audio level. Custom Win32 layered top-most window on its own thread, color-key transparent for a clean shape, painted with double-buffered GDI.
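To illustrate the multi-channel fix in the first item above: an interleaved buffer stores one sample per channel for each frame, so it has to be collapsed to one value per frame before resampling. The sketch below averages the channels, which is an assumption; the notes do not say whether the recorder averages, weights, or picks a single channel.

```python
def downmix_to_mono(interleaved, channels):
    """Average each frame of an interleaved buffer into a single mono sample.

    interleaved: flat sequence [f0c0, f0c1, ..., f1c0, f1c1, ...]
    channels:    number of interleaved channels (e.g. 4 to 8 on a mic array)
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    if len(interleaved) % channels != 0:
        raise ValueError("buffer length is not a whole number of frames")
    return [
        sum(interleaved[i:i + channels]) / channels
        for i in range(0, len(interleaved), channels)
    ]


# Two stereo frames: (1.0, 3.0) and (2.0, 4.0).
print(downmix_to_mono([1.0, 3.0, 2.0, 4.0], channels=2))
```

Feeding the raw interleaved buffer to a mono resampler instead treats the channels of one frame as consecutive samples in time, which is the kind of input that can leave the recogniser with nothing usable.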

Other improvements: ControlFlow::Wait + EventLoopProxy so the main pump sleeps when idle (no more 16 ms spin in about_to_wait); settings.json hot-reload via mtime polling (engine + hotkey changes take effect live, no restart).
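Hot-reload via mtime polling can be sketched as follows: remember the file's last modification time and re-read it only when that value changes. The class name and the "engine" key used in the usage note are hypothetical, and the polling interval is left to the caller.

```python
import json
import os


class SettingsWatcher:
    """Reload a JSON settings file whenever its modification time changes."""

    def __init__(self, path):
        self.path = path
        self.settings = {}
        self._last_mtime_ns = None

    def poll(self):
        """Return True if the settings were reloaded on this call."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime_ns == self._last_mtime_ns:
            return False
        try:
            with open(self.path, encoding="utf-8") as fh:
                loaded = json.load(fh)
        except json.JSONDecodeError:
            # The file may be half-written; keep the old settings and retry
            # on the next poll (the stored mtime is deliberately not updated).
            return False
        self.settings = loaded
        self._last_mtime_ns = mtime_ns
        return True
```

A caller would invoke `poll()` on a timer and, when it returns True, apply values such as `watcher.settings.get("engine")` to the running app.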

OpenWritr v0.2.0 - Native Rust Build

02 Jun 08:57

OpenWritr v0.2.0 — Native Rust Build for Windows on ARM

Single ~22 MB distribution, no Python runtime, models downloaded on first launch from Hugging Face.

Install

  1. Download openwritr-windows-arm64-v0.2.0-dev.zip below.
  2. Extract anywhere (e.g. C:\Program Files\OpenWritr\).
  3. Run openwritr.exe. First launch downloads the Parakeet TDT v3 model (~670 MB) from Hugging Face into %LOCALAPPDATA%\OpenWritr\models\.
  4. Hold Ctrl + Win + Space, speak, release. Text is pasted at the cursor. Right-click the tray icon for Settings….

Highlights

  • ~6.4 MB openwritr.exe (was ~300 MB Python in v0.1)
  • ~30 MB cold-start RSS (was ~700 MB)
  • 25 languages via NVIDIA Parakeet TDT 0.6B v3 (INT8 CPU, ~140 ms / 11 s)
  • egui Settings dialog with hotkey + engine + LLM cleanup configuration
  • Warm start/stop tone cues
  • Optional cleanup pass via GitHub Copilot (uses gh auth token) or any OpenAI-compatible endpoint (hold Alt with hotkey)
  • Reliable hotkey FSM with safety auto-stop on modifier release

Known limitations vs v0.1 (Python)

  • NPU backends fall back to CPU INT8 in the native build. The ort Rust crate 2.0-rc.10 does not expose RegisterExecutionProviderLibrary, so we cannot load Qualcomm's onnxruntime_providers_qnn.dll the way Python ORT does. The Python v0.1 app (python/ folder) remains the NPU-capable reference.
  • No animated overlay yet — tray icon colour change is the visual state feedback. Coming in v0.3.

Licenses

  • OpenWritr source: MIT
  • NVIDIA Parakeet TDT 0.6B v3: CC-BY-4.0
  • istupakov Parakeet ONNX export: CC-BY-4.0
  • ONNX Runtime: MIT
  • Qualcomm QNN runtime DLLs (bundled): Qualcomm proprietary

OpenWritr for Windows v0.1.0

02 Jun 06:41

OpenWritr for Windows v0.1.0 — First Public Release

Push-to-talk voice-to-text tray app for Windows on ARM (Snapdragon X).
Hold a hotkey, speak, release — the transcript is pasted at your cursor.
Everything runs locally on your machine; nothing leaves the device unless
you turn on optional LLM cleanup.

Three transcription engines, all running locally

| Engine | Where it runs | Languages | Latency on 11 s English |
| --- | --- | --- | --- |
| Parakeet TDT v3 — CPU INT8 (default) | CPU | 25 | ~160 ms (70× realtime) |
| Parakeet TDT v3 — NPU INT8 | Hexagon NPU | 25 | ~110 ms (105× realtime) |
| Whisper Large v3 Turbo — NPU | Hexagon NPU | 99 | ~800 ms / 30 s window |

NPU runs use Qualcomm AI Engine Direct via onnxruntime-qnn 2.1, with the
encoder context cached on first use. Statically INT8-QDQ-quantized
Parakeet encoder runs entirely on the NPU; transcripts match the CPU
baseline exactly on validation clips.

Features

  • Native Windows tray app with Fluent-styled WPF settings dialog
  • Animated overlay with live audio-level meter (HiDPI-correct, Mica backdrop)
  • Configurable hotkey: any combination of Ctrl / Shift / Alt / Win plus a
    trigger key from Space / Tab / Caps Lock / Scroll Lock / Pause / Insert
    / Right Ctrl / F13..F20
  • Auto-paste at cursor with clipboard save/restore
  • Soft, warm start/stop audio cues
  • Optional LLM cleanup pass via GitHub Copilot (gh auth token) or any
    OpenAI-compatible endpoint — toggle with Alt held during recording
  • Reliable hotkey FSM: stops cleanly when any required modifier is
    released; safety auto-stop after max_record_seconds (60 s default)
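The hotkey state machine in the last item can be sketched as a two-state FSM driven by periodic ticks. This is an illustrative model, not the app's implementation: `hotkey_held` stands for "every required modifier and the trigger key are down", and the rule that the keys must be released before recording can start again after a timeout is an added assumption.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


class HotkeyFsm:
    def __init__(self, max_record_seconds=60.0):
        self.max_record_seconds = max_record_seconds
        self.state = State.IDLE
        self._started_at = 0.0
        self._needs_release = False  # set after a timeout while keys stay held

    def tick(self, hotkey_held, now):
        """Advance the FSM; return "start", "stop", or None."""
        if self.state is State.IDLE:
            if not hotkey_held:
                self._needs_release = False
            elif not self._needs_release:
                self.state = State.RECORDING
                self._started_at = now
                return "start"
            return None
        # RECORDING: stop on release of any required key or on timeout.
        timed_out = now - self._started_at >= self.max_record_seconds
        if not hotkey_held or timed_out:
            self.state = State.IDLE
            self._needs_release = hotkey_held
            return "stop"
        return None
```

Without the release latch, a key combination held past the limit would trigger a new recording on the very next tick after the safety stop.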

Install (from source)

git clone https://github.com/trsdn/openwritr-windows.git
cd openwritr-windows
py -3.11-arm64 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r python\requirements.txt

# 640 MB Parakeet INT8 ONNX (default engine)
python python\fetch_model.py

# Optional: 1.6 GB Whisper Large v3 Turbo NPU build from Qualcomm AI Hub
python python\fetch_whisper.py

python python\openwritr.py

A blue microphone icon appears in your system tray. Default hotkey is
Ctrl + Win + Space — hold, speak, release.

What's not in this release

  • No pre-built .exe installer yet; the app runs from a Python venv.
    PyInstaller packaging is planned for the next release.
  • No auto-update mechanism.
  • Code signing: the source is MIT; no signed binaries to distribute yet.
  • The 50-sample FLEURS-calibrated Parakeet NPU model is not used at
    runtime — the 8-sample calibration variant proved more stable for
    long-form audio. See scripts/quantize_qdq*.py for the toolchain.

Acknowledgements