v0.3.0 — Parakeet on Hexagon NPU

@trsdn released this 03 Jun 09:24
· 2 commits to experiment/aihub-npu-8s since this release

📥 Which file do I download?

| Your machine | Download this |
| --- | --- |
| Snapdragon / ARM (e.g. Surface Pro 11, Surface Laptop 7) | openwritr-windows-arm64-v0.3.0-setup.exe |
| Intel / AMD (most other laptops & desktops) | openwritr-windows-x64-v0.3.0-setup.exe |

Not sure which you have? Open Settings → System → About and look at "System type": if it says "ARM-based processor", take the arm64 installer; otherwise take x64.
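If Python happens to be installed, the same choice can be sketched in a few lines. This is only an illustration: `installer_for` is a hypothetical helper, and an x64 Python running under emulation on an ARM device may report AMD64, so the Settings check above remains the reference.

```python
import platform

# Installer names as listed in these release notes.
INSTALLERS = {
    "arm64": "openwritr-windows-arm64-v0.3.0-setup.exe",
    "x64": "openwritr-windows-x64-v0.3.0-setup.exe",
}


def installer_for(machine: str) -> str:
    """Map a machine string (e.g. 'ARM64', 'AMD64') to an installer name.

    Mirrors the rule in the notes: ARM-based means arm64, anything else x64.
    """
    is_arm = machine.strip().lower() in {"arm64", "aarch64"}
    return INSTALLERS["arm64" if is_arm else "x64"]


if __name__ == "__main__":
    # platform.machine() reports the architecture seen by this Python process.
    print(installer_for(platform.machine()))
```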

Install

  1. Run the setup .exe. Windows SmartScreen will warn ("Windows protected your PC") because the binaries are not code-signed yet; click More info → Run anyway. To verify your download first, run Get-FileHash .\openwritr-windows-<arch>-v0.3.0-setup.exe in PowerShell and compare the SHA256 hash it prints with the matching entry in SHA256SUMS.txt.
  2. The installer creates everything itself (Start Menu entry, optional autostart, uninstaller under Settings → Apps). No folders to prepare.
  3. First launch downloads the speech model (~0.6–1.2 GB) — one-time, takes a couple of minutes. A microphone icon appears in the system tray when ready.
  4. Hold Ctrl + Win, speak, release — the text is pasted at your cursor.
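The checksum comparison from step 1 can also be scripted instead of compared by eye. The sketch below assumes SHA256SUMS.txt is UTF-8 text in the common `<hex digest>  <filename>` layout with one entry per line; the actual file may be laid out differently, and the helper names are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def expected_hash(sums_file: Path, filename: str) -> Optional[str]:
    """Look up `filename` in the checksum list (assumed layout, see above)."""
    for line in sums_file.read_text(encoding="utf-8").splitlines():
        parts = line.split()
        # Some tools prefix the name with '*' to mark binary mode.
        if len(parts) >= 2 and parts[-1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify(download: Path, sums_file: Path) -> bool:
    """True only if the file's digest matches its entry in the checksum list."""
    want = expected_hash(sums_file, download.name)
    return want is not None and sha256_of(download) == want
```

Calling `verify(Path("openwritr-windows-x64-v0.3.0-setup.exe"), Path("SHA256SUMS.txt"))` returns False both for a mismatched hash and for a file that is not listed, so a missing entry is never treated as a pass.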
Manual install (portable zip, no installer)

Download the .zip for your architecture instead, unzip it into any folder you like (e.g. C:\Tools\OpenWritr\), and run openwritr.exe from there. Same binaries — just no Start Menu entry, no autostart, no uninstaller. User data (settings, models, logs) goes to %LOCALAPPDATA%\OpenWritr\ automatically either way.


First release with Parakeet TDT v3 running on the Snapdragon X Elite Hexagon NPU (arm64) — plus a CPU-only build for Intel/AMD (x64).

Push-to-talk transcription on the NPU typically takes 200–400 ms of total decode time for a 5-second utterance; the encoder itself runs at ~67 ms steady-state per 8-second window. Long-form audio is chunked transparently: 8 s windows with 1 s overlap, with the decoder running once over the stitched feature stream.
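To make the chunking concrete, here is a minimal sketch of how window boundaries could be derived from the 8 s window and 1 s overlap. It assumes a fixed 7 s stride and a final window that is simply cut at the end of the audio; the real pipeline may pad or align differently, and `chunk_windows` is a hypothetical name. Under these assumptions it yields 1 window for 3 s and 5.8 s, 3 for 16.4 s, and 4 for 23.0 s, which agrees with the chunk counts in the Performance section.

```python
import math

WINDOW_S = 8.0   # static NPU window length, from the release notes
OVERLAP_S = 1.0  # overlap between consecutive windows, from the release notes


def chunk_windows(duration_s, window_s=WINDOW_S, overlap_s=OVERLAP_S):
    """Return (start, end) times in seconds that cover `duration_s` of audio.

    Assumption: windows advance by a fixed stride of window - overlap and the
    last window is truncated at the end of the audio.
    """
    if duration_s <= 0:
        return []
    if duration_s <= window_s:
        return [(0.0, duration_s)]
    stride = window_s - overlap_s
    count = 1 + math.ceil((duration_s - window_s) / stride)
    return [
        (i * stride, min(i * stride + window_s, duration_s))
        for i in range(count)
    ]


if __name__ == "__main__":
    for seconds in (3.0, 5.8, 16.4, 23.0):
        print(seconds, chunk_windows(seconds))
```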

Performance

Measured on Snapdragon X Elite (X1E80100):

| Audio length | Decode (preproc + NPU encode + TDT) | × Realtime | Chunks |
| --- | --- | --- | --- |
| 3 s | 128 ms | 23× | 1 |
| 5.8 s | 221 ms | 26× | 1 |
| 16.4 s | 375 ms | 44× | 3 |
| 23.0 s | 626 ms | 37× | 4 |
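The realtime factor is the audio length divided by the decode time. A short sketch that recomputes it from the measurements above (the function name is illustrative only):

```python
def realtime_factor(audio_s: float, decode_ms: float) -> float:
    """Seconds of audio processed per second of decode time."""
    return audio_s / (decode_ms / 1000.0)


# (audio length in seconds, decode time in milliseconds) from the table above.
MEASUREMENTS = [(3.0, 128), (5.8, 221), (16.4, 375), (23.0, 626)]

for audio_s, decode_ms in MEASUREMENTS:
    factor = realtime_factor(audio_s, decode_ms)
    print(f"{audio_s:>5.1f} s  {decode_ms:>4d} ms  {factor:.0f}x realtime")
```

Rounded to whole numbers this gives 23, 26, 44 and 37, matching the reported figures.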

The x64 build runs the same pipeline on the CPU (no Hexagon NPU on Intel/AMD) at roughly 25× realtime on a modern CPU.

Switch engines (arm64 only)

Right-click the tray icon → Settings → Transcription engine. NPU is used when available, with automatic CPU fallback.
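The fallback behaviour can be pictured as trying engines in order of preference and keeping the first one that loads. The sketch below illustrates that pattern only and is not OpenWritr's actual code; `load_npu_engine` and `load_cpu_engine` are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(
    loaders: Sequence[Tuple[str, Callable[[], T]]]
) -> Tuple[str, T]:
    """Try each (name, loader) in order and return the first that succeeds."""
    errors: List[str] = []
    for name, load in loaders:
        try:
            return name, load()
        except Exception as exc:  # any load failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no engine could be loaded: " + "; ".join(errors))


def load_npu_engine() -> str:
    """Hypothetical stand-in: pretend the NPU is not present."""
    raise OSError("Hexagon NPU not available")


def load_cpu_engine() -> str:
    """Hypothetical stand-in: the CPU path always loads."""
    return "cpu-session"


if __name__ == "__main__":
    name, engine = load_first_available(
        [("npu", load_npu_engine), ("cpu", load_cpu_engine)]
    )
    print(f"using {name} engine")
```

Collecting the individual errors makes it possible to report why each engine was skipped if none of them loads.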

Companion model

The NPU encoder is hosted at trsdn/parakeet-tdt-0.6b-v3-htp-int8-8s (632 MB QAIRT context binary). It is licensed CC-BY-4.0, with attribution to NVIDIA Parakeet, and is downloaded automatically on first NPU launch.

What landed

  • NPU pipeline: direct ort_sys FFI (src/asr/qnn_ffi.rs) loading an AI-Hub-compiled QNN context binary; chunked long-audio processing with seam stitching.
  • Focus-robust hotkey: global low-level keyboard hook — recording survives focus steals (popups, UAC, shortcuts).
  • Audio idle fix: capture stream rebuilt per recording — no more dead mic after long idle.
  • App icon, settings links, included-Copilot-model markers.
  • x64 build for Intel/AMD (CPU INT8, 9 MB zip).
  • CI: every release is built reproducibly on GitHub Actions (windows-11-arm).
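The notes do not say how the seams between overlapping windows are stitched. One common approach, shown here purely as an assumption, is to split each overlap down the middle so that every frame in the stitched stream comes from exactly one window. `stitch` is a hypothetical helper and frames are represented as plain list items.

```python
def stitch(chunks, overlap_frames):
    """Concatenate per-window frame lists, trimming the shared overlap.

    Assumption: at each seam the earlier window drops its last
    overlap_frames // 2 frames and the later window drops the remaining
    overlap frames from its start. The actual pipeline may differ.
    """
    if not chunks:
        return []
    drop_tail = overlap_frames // 2
    drop_head = overlap_frames - drop_tail
    last = len(chunks) - 1
    stitched = []
    for i, frames in enumerate(chunks):
        start = drop_head if i > 0 else 0            # first window keeps its head
        end = len(frames) - drop_tail if i < last else len(frames)  # last keeps its tail
        stitched.extend(frames[start:end])
    return stitched


if __name__ == "__main__":
    # Three windows of frame indices with a 2-frame overlap; the last is shorter.
    windows = [list(range(0, 8)), list(range(6, 14)), list(range(12, 16))]
    print(stitch(windows, overlap_frames=2))
```

With the sample windows the result is the frame indices 0 through 15 in order, each appearing once.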

Known limits

  • arm64 NPU model is device-gated to Snapdragon X Elite; other Snapdragons fall back to CPU.
  • Static 8 s NPU window; longer audio is chunked transparently.
  • Binaries unsigned (SmartScreen warning) — Store/signing in progress.

🤖 Generated with Claude Code