Press a global hotkey, speak, and your words are transcribed into whatever is focused — any app, any input.
Open source (MIT). No account, no telemetry, no server of ours. Your Groq API key goes directly to Groq.
Free alternative to SuperWhisper.
macOS
brew install --cask tonyyun/tap/audio-input
Or grab the .dmg from Releases. First launch: right-click → Open to bypass Gatekeeper.
Windows
Download Audio.Input_x.x.x_x64-setup.exe from Releases and run it.
First launch: Windows SmartScreen may say "Windows protected your PC". Click More info → Run anyway.
- Get a free API key at console.groq.com (no credit card required)
- Right-click the system tray mic icon → Configure API Key
- Press Ctrl+Shift+Space (Windows) or ⌘⇧Space (macOS) anywhere and start talking
- Global hotkey — default ⌘⇧Space, fully customizable
- Works everywhere — injects text into any focused input via Accessibility API
- 50+ languages — Whisper large-v3-turbo auto-detects your language
- AI polish — optional LLM pass to clean up filler words and punctuation (toggle from menu bar). When it is on, the screenshot taken at recording start is sent as context to a vision LLM (llama-4-scout on Groq) to improve accuracy on technical and domain-specific terms.
- Tiny footprint — ~20 MB RAM, built with Rust + Tauri
Powered by Groq's Whisper large-v3-turbo — the fastest Whisper inference available.
$0.04 per hour of audio (~$0.00067/minute).
For typical use — a few minutes of voice input per day — that's well under $0.10/month. The Groq free tier alone covers most personal use.
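The monthly figure follows directly from the hourly rate. Below is a minimal sketch of that arithmetic, assuming 3 minutes of dictation per day over a 30-day month. Both numbers are illustrative, not measurements from the app, and `monthly_cost_usd` is a hypothetical helper that is not part of the codebase.

```rust
/// Price per hour of audio for Whisper large-v3-turbo, as quoted above.
const USD_PER_AUDIO_HOUR: f64 = 0.04;

/// Estimated cost in USD for `days` days of dictation.
/// `minutes_per_day` and `days` are illustrative inputs, not app settings.
fn monthly_cost_usd(minutes_per_day: f64, days: u32) -> f64 {
    // Convert total minutes to hours, then apply the hourly rate.
    let hours = minutes_per_day * f64::from(days) / 60.0;
    hours * USD_PER_AUDIO_HOUR
}
```

With these inputs, `monthly_cost_usd(3.0, 30)` covers 90 minutes (1.5 hours) of audio and comes to about $0.06.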
- Press the global hotkey — a screenshot of the active screen is captured immediately
- Speak; audio is recorded locally while you hold (or toggle) the hotkey
- Audio is sent to Groq's Whisper large-v3-turbo for transcription
- If AI polish is enabled, the transcript + screenshot are sent to a vision LLM (llama-4-scout) to fix technical terms, proper nouns, and punctuation
- The final text is injected into whatever input is focused via the Accessibility API
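The steps above can be sketched as a single function. This is an illustration of the control flow only, not the app's actual code: `Settings`, `dictate`, and the two closures are hypothetical names, and the closures stand in for the Groq transcription and vision calls, which are not shown.

```rust
/// Hypothetical settings; the real app's configuration may differ.
struct Settings {
    ai_polish: bool,
}

/// Runs one dictation pass and returns the text to inject.
/// `transcribe` and `polish` are placeholders for the network calls.
fn dictate<T, P>(
    settings: &Settings,
    audio: &[u8],
    screenshot: &[u8],
    transcribe: T,
    polish: P,
) -> Result<String, String>
where
    T: Fn(&[u8]) -> Result<String, String>,
    P: Fn(&str, &[u8]) -> Result<String, String>,
{
    // Transcription always runs on the recorded audio.
    let transcript = transcribe(audio)?;
    if settings.ai_polish {
        // Only in this branch is the screenshot passed on to the polish step.
        polish(&transcript, screenshot)
    } else {
        Ok(transcript)
    }
}
```

Passing the two network steps in as closures keeps the sketch runnable without credentials and makes the polish toggle easy to check with stub functions.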
Audio is sent to Groq for transcription — Groq's data retention policy applies. Screenshots are taken locally and sent to Groq's vision API only when AI polish is enabled; neither audio nor screenshots are stored by this app. No analytics, no telemetry, no account required. See PRIVACY.md for full details.
| Icon | State |
|---|---|
| Black mic | Idle |
| Red mic | Recording |
| Blue mic | Transcribing |
| Orange mic | Error |
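The table maps naturally onto an enum. The sketch below is one way to model it; `TrayState` and `icon_color` are hypothetical names chosen for illustration and may not match the app's source.

```rust
/// Tray states from the table above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrayState {
    Idle,
    Recording,
    Transcribing,
    Error,
}

impl TrayState {
    /// Color of the mic icon shown for this state.
    fn icon_color(self) -> &'static str {
        match self {
            TrayState::Idle => "black",
            TrayState::Recording => "red",
            TrayState::Transcribing => "blue",
            TrayState::Error => "orange",
        }
    }
}
```

Because the `match` is exhaustive, adding a new state without choosing an icon for it fails to compile.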
Prerequisites: Node 20+, Rust stable
# Install Rust if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/tyun08/audio-input
cd audio-input
npm install
npm run tauri dev # dev mode
npm run tauri build # release build → produces .dmg + .app in src-tauri/target/release/bundle/
Prerequisites:
- Node.js 20+ — https://nodejs.org (LTS)
- Rust — https://rustup.rs
- Microsoft C++ Build Tools — https://visualstudio.microsoft.com/visual-cpp-build-tools/
  - In the installer, select "Desktop development with C++"
- WebView2 Runtime — pre-installed on Windows 11; on Windows 10 get it from https://developer.microsoft.com/microsoft-edge/webview2/
git clone https://github.com/tyun08/audio-input
cd audio-input
npm install
npm run tauri dev
Tauri 2 · Rust (cpal, reqwest) · Svelte · Groq API (Whisper large-v3-turbo + LLM polish)
MIT