Releases: trsdn/openwritr-windows
v0.3.0 — Parakeet on Hexagon NPU
📥 Which file do I download?
| Your machine | Download this |
|---|---|
| Snapdragon / ARM (e.g. Surface Pro 11, Surface Laptop 7) | openwritr-windows-arm64-v0.3.0-setup.exe |
| Intel / AMD (most other laptops & desktops) | openwritr-windows-x64-v0.3.0-setup.exe |
Not sure which you have? Open Settings → System → About → "System type": if it says "ARM-based processor", take arm64; otherwise take x64.
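The same decision can be scripted. The sketch below is only an illustration and is not part of OpenWritr: `installer_name` is a hypothetical helper that maps the string returned by Python's `platform.machine()` to the setup file names from the table above. One caveat: an x64 Python interpreter running under emulation on a Snapdragon machine may report `AMD64`, so the Settings page remains the more reliable check.

```python
import platform

VERSION = "v0.3.0"  # release tag used in the file names above


def installer_name(machine: str, version: str = VERSION) -> str:
    """Map a platform.machine() string to the matching setup file name.

    Anything that is not recognisably ARM64 is treated as x64, mirroring
    the "otherwise x64" rule in the release notes.
    """
    arch = "arm64" if machine.strip().lower() in {"arm64", "aarch64"} else "x64"
    return f"openwritr-windows-{arch}-{version}-setup.exe"


if __name__ == "__main__":
    print(installer_name(platform.machine()))
```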
Install
- Run the setup .exe. Windows SmartScreen will warn ("Windows protected your PC") because the binaries are not code-signed yet; click More info → Run anyway. You can verify your download against SHA256SUMS.txt: run Get-FileHash .\openwritr-windows-<arch>-v0.3.0-setup.exe in PowerShell.
- The installer creates everything itself (Start Menu entry, optional autostart, uninstaller under Settings → Apps). No folders to prepare.
- First launch downloads the speech model (~0.6–1.2 GB) — one-time, takes a couple of minutes. A microphone icon appears in the system tray when ready.
- Hold Ctrl + Win, speak, release — the text is pasted at your cursor.
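For anyone who prefers to script the checksum comparison from the first install step, here is a minimal sketch. It assumes SHA256SUMS.txt uses the common `<hex digest>  <file name>` layout, one entry per line; the release notes do not state the format, so adjust the parser if the real file differs. All function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def expected_hash(sums_text: str, file_name: str) -> Optional[str]:
    """Find the digest listed for file_name (assumed '<hex>  <name>' lines)."""
    for line in sums_text.splitlines():
        parts = line.split()
        # sha256sum-style files may prefix the name with '*' in binary mode.
        if len(parts) >= 2 and parts[-1].lstrip("*") == file_name:
            return parts[0].lower()
    return None


def verify(download: Path, sums_file: Path) -> bool:
    """True only if the download is listed and its digest matches."""
    want = expected_hash(sums_file.read_text(encoding="utf-8"), download.name)
    return want is not None and want == sha256_of(download)
```

Digests are compared in lowercase because Get-FileHash prints uppercase hex while many checksum files use lowercase.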
Manual install (portable zip, no installer)
Download the .zip for your architecture instead, unzip it into any folder you like (e.g. C:\Tools\OpenWritr\), and run openwritr.exe from there. Same binaries — just no Start Menu entry, no autostart, no uninstaller. User data (settings, models, logs) goes to %LOCALAPPDATA%\OpenWritr\ automatically either way.
First release with Parakeet TDT v3 running on the Snapdragon X Elite Hexagon NPU (arm64) — plus a CPU-only build for Intel/AMD (x64).
Push-to-talk transcription on the NPU typically takes 200–400 ms total decode for a 5-second utterance, with the encoder itself at ~67 ms steady-state per 8-second window. Long-form audio is chunked transparently (8 s window, 1 s overlap; the decoder runs once over the stitched feature stream).
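To make the chunking concrete, the sketch below computes window boundaries for a given audio length. It assumes each window starts 7 s after the previous one (window minus overlap), which reproduces the chunk counts in the Performance table; the real implementation lives in the Rust code and may differ in detail. How the final, shorter window is padded to the static 8 s NPU input is not stated in these notes and is not modelled here.

```python
from typing import List, Tuple

WINDOW_S = 8.0   # static NPU window, from the release notes
OVERLAP_S = 1.0  # overlap between neighbouring windows


def chunk_windows(duration_s: float,
                  window_s: float = WINDOW_S,
                  overlap_s: float = OVERLAP_S) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole recording."""
    if duration_s <= 0:
        return []
    stride = window_s - overlap_s  # assumed hop between window starts
    windows = []
    start = 0.0
    while True:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if start + window_s >= duration_s:
            break  # this window already reaches the end of the audio
        start += stride
    return windows


if __name__ == "__main__":
    for seconds in (3.0, 5.8, 16.4, 23.0):
        print(seconds, len(chunk_windows(seconds)))
```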
Performance
Measured on Snapdragon X Elite (X1E80100):
| Audio length | Decode (preproc + NPU encode + TDT) | × Realtime | Chunks |
|---|---|---|---|
| 3 s | 128 ms | 23× | 1 |
| 5.8 s | 221 ms | 26× | 1 |
| 16.4 s | 375 ms | 44× | 3 |
| 23.0 s | 626 ms | 37× | 4 |
The x64 build runs the same pipeline on the CPU (no Hexagon NPU on Intel/AMD) at roughly 25× realtime on a modern CPU.
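The "× Realtime" column is audio length divided by decode time. A small sketch that recomputes it from the measured rows above (`realtime_factor` is an illustrative name only):

```python
def realtime_factor(audio_s: float, decode_ms: float) -> float:
    """Seconds of audio processed per second of decode time."""
    return audio_s / (decode_ms / 1000.0)


# (audio length in s, decode time in ms) copied from the table above
ROWS = [(3.0, 128), (5.8, 221), (16.4, 375), (23.0, 626)]

if __name__ == "__main__":
    for audio_s, decode_ms in ROWS:
        factor = realtime_factor(audio_s, decode_ms)
        print(f"{audio_s:>5.1f} s  {decode_ms:>4d} ms  {factor:.0f}x")
```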
Switch engines (arm64 only)
Right-click the tray icon → Settings → Transcription engine. NPU is used when available, with automatic CPU fallback.
Companion model
The NPU encoder is hosted at trsdn/parakeet-tdt-0.6b-v3-htp-int8-8s (632 MB QAIRT context binary). CC-BY-4.0, attribution NVIDIA Parakeet. Downloaded automatically on first NPU launch.
What landed
- NPU pipeline: direct ort_sys FFI (src/asr/qnn_ffi.rs) loading an AI-Hub-compiled QNN context binary; chunked long-audio with seam stitching.
- Focus-robust hotkey: global low-level keyboard hook; recording survives focus steals (popups, UAC, shortcuts).
- Audio idle fix: capture stream rebuilt per recording — no more dead mic after long idle.
- App icon, settings links, included-Copilot-model markers.
- x64 build for Intel/AMD (CPU INT8, 9 MB zip).
- CI: every release is built reproducibly on GitHub Actions (windows-11-arm).
Known limits
- arm64 NPU model is device-gated to Snapdragon X Elite; other Snapdragons fall back to CPU.
- Static 8 s NPU window; longer audio is chunked transparently.
- Binaries unsigned (SmartScreen warning) — Store/signing in progress.
🤖 Generated with Claude Code
v0.2.2 — modifiers-only hotkey, model dropdown, licence compliance
Hotkey is now modifier-only by default: hold Ctrl+Win to record, Ctrl+Shift+Win to also run an LLM cleanup pass. Detection runs continuously via GetAsyncKeyState, so it keeps working even if Windows reserves the combo elsewhere.
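The modifier-only detection can be pictured as a pure function over "is this key down?" queries. The sketch below is an illustration, not the app's Rust code: `classify` and `Mode` are hypothetical names, while the virtual-key codes are the standard Win32 values. On Windows, a suitable `is_down` could wrap `ctypes.windll.user32.GetAsyncKeyState(vk) & 0x8000`; here it is injected so the logic can be exercised anywhere.

```python
from enum import Enum
from typing import Callable

# Standard Win32 virtual-key codes
VK_SHIFT, VK_CONTROL, VK_LWIN, VK_RWIN = 0x10, 0x11, 0x5B, 0x5C


class Mode(Enum):
    IDLE = "idle"
    RECORD = "record"                        # Ctrl + Win
    RECORD_WITH_CLEANUP = "record+cleanup"   # Ctrl + Shift + Win


def classify(is_down: Callable[[int], bool]) -> Mode:
    """Decide the current mode from the modifier keys that are held."""
    ctrl = is_down(VK_CONTROL)
    win = is_down(VK_LWIN) or is_down(VK_RWIN)  # either Win key counts
    if not (ctrl and win):
        return Mode.IDLE
    return Mode.RECORD_WITH_CLEANUP if is_down(VK_SHIFT) else Mode.RECORD
```

Calling `classify` on every poll tick and acting on transitions between modes gives the "runs continuously" behaviour described above.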
Settings now exposes a model dropdown (Claude Haiku 4.5 default, GPT-5 Mini, GPT-4.1) matching the macOS app's enhanced-model menu. Custom model names are still editable via a text field.
gh auth token now runs with a no-window flag and caches the result for 10 minutes, so there is no more CLI flash on every enhanced recording.
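As a rough Python analogue of this change (the app itself is Rust, so every name here is illustrative), the sketch below wraps a token fetch in a 10-minute cache and launches `gh auth token` without a console window. `subprocess.CREATE_NO_WINDOW` exists only on Windows, hence the `getattr` fallback to 0 on other platforms.

```python
import subprocess
import time
from typing import Callable, Optional


def fetch_gh_token() -> str:
    """Run `gh auth token` without flashing a console window on Windows."""
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)  # 0 on non-Windows
    result = subprocess.run(["gh", "auth", "token"], capture_output=True,
                            text=True, check=True, creationflags=flags)
    return result.stdout.strip()


class CachedToken:
    """Cache a fetched value for ttl_s seconds (600 s = 10 minutes)."""

    def __init__(self, fetch: Callable[[], str], ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()          # refresh on miss or expiry
            self._expires_at = now + self._ttl_s
        return self._value
```

Typical use would be `token = CachedToken(fetch_gh_token)` created once, then `token.get()` per enhanced recording.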
Third-party licence compliance: each release zip now ships a third-party-licenses/ folder containing the Qualcomm AI Engine Direct redistributable licence, Microsoft's ThirdPartyNotices for the onnxruntime-qnn package, and the onnxruntime-qnn LICENSE + Privacy notices. The QNN DLLs were already being bundled; only the licence text was missing.
README rewritten from scratch to reflect v0.2.x reality.
v0.2.1 — settings + overlay + multi-channel audio fixes
Fixes the three biggest issues users hit in v0.2.0:
- Transcription no longer returns empty strings on multi-channel mics. The Qualcomm Aqstic mic array on Surface Pro exposes 4-8 interleaved channels at 48 kHz; the recorder was feeding the raw interleaved buffer to the resampler as if it were mono. Now downmixed properly inside the audio callback.
- Settings UI no longer freezes the tray app. CreateProcessW on Windows ARM64 (especially with Defender real-time scanning) can block the calling thread for several seconds. Spawning it inline from the winit tick handler stalled the tray's message pump → app went 'Not Responding' → hotkey died. Now spawned from a worker thread; main pump keeps draining.
- Visual recording indicator: small dark pill at bottom-center with 22 vertical white bars that breathe with the audio level. Custom Win32 layered top-most window on its own thread, color-key transparent for a clean shape, painted with double-buffered GDI.
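The multi-channel fix in the first bullet comes down to collapsing each interleaved frame to a single sample before resampling. A minimal sketch follows, assuming a plain average across channels; the notes only say the audio is "downmixed properly", so the exact weighting used by the app is an assumption here.

```python
from typing import List, Sequence


def downmix_to_mono(interleaved: Sequence[float], channels: int) -> List[float]:
    """Average each frame of an interleaved buffer into one mono sample.

    Interleaved layout for 2 channels: [L0, R0, L1, R1, ...].
    A trailing partial frame, if any, is ignored.
    """
    if channels < 1:
        raise ValueError("channels must be >= 1")
    frames = len(interleaved) // channels
    return [
        sum(interleaved[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]
```

Feeding the raw interleaved buffer to a mono resampler, as v0.2.0 did, treats samples from different channels as consecutive points in time, which garbles the signal.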
Other improvements: ControlFlow::Wait + EventLoopProxy so the main pump sleeps when idle (no more 16 ms spin in about_to_wait); settings.json hot-reload via mtime polling (engine + hotkey changes take effect live, no restart).
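Hot-reload via mtime polling can be sketched as a small watcher that remembers the last modification time it saw. This is an illustrative Python version with hypothetical names, not the app's Rust code; the caller decides how often to poll and what to reload.

```python
import os
from typing import Optional


class MtimeWatcher:
    """Report when a file's modification time differs from the last poll."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._last = self._stat()

    def _stat(self) -> Optional[int]:
        try:
            return os.stat(self._path).st_mtime_ns
        except FileNotFoundError:
            return None  # a missing file counts as its own state

    def changed(self) -> bool:
        """True once per observed change; False while nothing moved."""
        current = self._stat()
        if current != self._last:
            self._last = current
            return True
        return False
```

A main loop would call `changed()` on a timer and re-read settings.json only when it returns True.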
OpenWritr v0.2.0 - Native Rust Build
OpenWritr v0.2.0 — Native Rust Build for Windows on ARM
Single ~22 MB distribution, no Python runtime, models downloaded on first launch from Hugging Face.
Install
- Download openwritr-windows-arm64-v0.2.0-dev.zip below.
- Extract anywhere (e.g. C:\Program Files\OpenWritr\).
- Run openwritr.exe. First launch downloads the Parakeet TDT v3 model (~670 MB) from Hugging Face into %LOCALAPPDATA%\OpenWritr\models\.
- Hold Ctrl + Win + Space, speak, release. Text is pasted at the cursor. Right-click the tray icon for Settings….
Highlights
- ~6.4 MB openwritr.exe (was ~300 MB Python in v0.1)
- ~30 MB cold-start RSS (was ~700 MB)
- 25 languages via NVIDIA Parakeet TDT 0.6B v3 (INT8 CPU, ~140 ms / 11 s)
- egui Settings dialog with hotkey + engine + LLM cleanup configuration
- Warm start/stop tone cues
- Optional cleanup pass via GitHub Copilot (uses gh auth token) or any OpenAI-compatible endpoint (hold Alt with hotkey)
- Reliable hotkey FSM with safety auto-stop on modifier release
Known limitations vs v0.1 (Python)
- NPU backends fall back to CPU INT8 in the native build. The ort Rust crate 2.0-rc.10 does not expose RegisterExecutionProviderLibrary, so we cannot load Qualcomm's onnxruntime_providers_qnn.dll the way Python ORT does. The Python v0.1 app (python/ folder) remains the NPU-capable reference.
- No animated overlay yet; tray icon colour change is the visual state feedback. Coming in v0.3.
Licenses
- OpenWritr source: MIT
- NVIDIA Parakeet TDT 0.6B v3: CC-BY-4.0
- istupakov Parakeet ONNX export: CC-BY-4.0
- ONNX Runtime: MIT
- Qualcomm QNN runtime DLLs (bundled): Qualcomm proprietary
OpenWritr for Windows v0.1.0
OpenWritr for Windows v0.1.0 — First Public Release
Push-to-talk voice-to-text tray app for Windows on ARM (Snapdragon X).
Hold a hotkey, speak, release — the transcript is pasted at your cursor.
Everything runs locally on your machine; nothing leaves the device unless
you turn on optional LLM cleanup.
Three transcription engines, all running locally
| Engine | Where it runs | Languages | Latency on 11 s English |
|---|---|---|---|
| Parakeet TDT v3 — CPU INT8 (default) | CPU | 25 | ~160 ms (70× realtime) |
| Parakeet TDT v3 — NPU INT8 | Hexagon NPU | 25 | ~110 ms (105× realtime) |
| Whisper Large v3 Turbo — NPU | Hexagon NPU | 99 | ~800 ms / 30 s window |
NPU runs use Qualcomm AI Engine Direct via onnxruntime-qnn 2.1, with the
encoder context cached on first use. Statically INT8-QDQ-quantized
Parakeet encoder runs entirely on the NPU; transcripts match the CPU
baseline exactly on validation clips.
Features
- Native Windows tray app with Fluent-styled WPF settings dialog
- Animated overlay with live audio-level meter (HiDPI-correct, Mica backdrop)
- Configurable hotkey: any combination of Ctrl / Shift / Alt / Win plus a
  trigger key from Space / Tab / Caps Lock / Scroll Lock / Pause / Insert
  / Right Ctrl / F13..F20
- Auto-paste at cursor with clipboard save/restore
- Soft, warm start/stop audio cues
- Optional LLM cleanup pass via GitHub Copilot (gh auth token) or any
  OpenAI-compatible endpoint; toggle with Alt held during recording
- Reliable hotkey FSM: stops cleanly when any required modifier is
  released; safety auto-stop after max_record_seconds (60 s default)
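The hotkey FSM in the last bullet can be sketched as a two-state machine driven by the set of currently pressed keys. Everything below is a hypothetical illustration; only `max_record_seconds` and its 60 s default come from the notes. The re-arm latch after an auto-stop (the combo must be released before a new recording starts) is an added assumption, not documented behaviour.

```python
from typing import Iterable, Set


class HotkeyFsm:
    """Start while the full combo is held; stop on release or timeout."""

    def __init__(self, required: Iterable[str],
                 max_record_seconds: float = 60.0) -> None:
        self.required = frozenset(required)
        self.max_record_seconds = max_record_seconds
        self.recording = False
        self._started_at = 0.0
        self._latched = False  # assumption: block restart until release

    def update(self, pressed: Set[str], now: float) -> str:
        """Feed the current key set; returns start/stop/auto_stop/none."""
        combo_held = self.required <= pressed
        if not self.recording:
            if not combo_held:
                self._latched = False      # combo released, re-arm
            elif not self._latched:
                self.recording = True
                self._started_at = now
                return "start"
            return "none"
        if not combo_held:                 # any required key released
            self.recording = False
            return "stop"
        if now - self._started_at >= self.max_record_seconds:
            self.recording = False
            self._latched = True
            return "auto_stop"
        return "none"
```

For the default hotkey, `HotkeyFsm({"ctrl", "win", "space"})` would be fed the pressed-key set on every keyboard event or poll tick.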
Install (from source)
git clone https://github.com/trsdn/openwritr-windows.git
cd openwritr-windows
py -3.11-arm64 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r python\requirements.txt
# 640 MB Parakeet INT8 ONNX (default engine)
python python\fetch_model.py
# Optional: 1.6 GB Whisper Large v3 Turbo NPU build from Qualcomm AI Hub
python python\fetch_whisper.py
python python\openwritr.py
A blue microphone icon appears in your system tray. Default hotkey is
Ctrl + Win + Space — hold, speak, release.
What's not in this release
- No pre-built .exe installer yet; the app runs from a Python venv.
  PyInstaller packaging is planned for the next release.
- No auto-update mechanism.
- Code signing: the source is MIT; no signed binaries to distribute yet.
- The 50-sample FLEURS-calibrated Parakeet NPU model is not used at
runtime — the 8-sample calibration variant proved more stable for
  long-form audio. See scripts/quantize_qdq*.py for the toolchain.
Acknowledgements
- macOS original: trsdn/OpenWritr
- Parakeet TDT 0.6B v3 ONNX export: istupakov/parakeet-tdt-0.6b-v3-onnx
- Whisper Large v3 Turbo NPU build: qualcomm/Whisper-Large-V3-Turbo
- onnx-asr for the Parakeet decoding pipeline