Audio Input

Press a global hotkey, speak, and your words are transcribed into whatever is focused — any app, any input.

Open source (MIT). No account, no telemetry, no server of ours. Your Groq API key goes directly to Groq.

Free alternative to SuperWhisper.



Install

macOS

brew install --cask tonyyun/tap/audio-input

Or grab the .dmg from Releases. First launch: right-click → Open to bypass Gatekeeper.

Windows

Download Audio.Input_x.x.x_x64-setup.exe from Releases and run it.

First launch: Windows SmartScreen may say "Windows protected your PC". Click More info → Run anyway.


Setup

  1. Get a free API key at console.groq.com (no credit card required)
  2. Right-click the mic icon in the menu bar (macOS) or system tray (Windows) → Configure API Key
  3. Press Ctrl+Shift+Space (Windows) or ⌘⇧Space (macOS) anywhere and start talking

Features

  • Global hotkey: default ⌘⇧Space (macOS) or Ctrl+Shift+Space (Windows), fully customizable
  • Works everywhere: injects text into any focused input via the Accessibility API
  • 50+ languages: Whisper large-v3-turbo auto-detects your language
  • AI polish: optional LLM pass that cleans up filler words and punctuation (toggle from the menu bar). When it is enabled, the screenshot captured at recording start is sent as context to a vision LLM (llama-4-scout on Groq) to improve accuracy on technical and domain-specific terms.
  • Tiny footprint: ~20 MB RAM, built with Rust + Tauri

Cost

Powered by Groq's Whisper large-v3-turbo — the fastest Whisper inference available.

$0.04 per hour of audio (~$0.00067/minute).

For typical use — a few minutes of voice input per day — that's well under $0.10/month. The Groq free tier alone covers most personal use.
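
To make the arithmetic above concrete, here is a minimal sketch that turns minutes of dictation per day into an estimated monthly bill. It assumes only the $0.04/hour transcription price quoted above; the function names and the 30-day month are illustrative assumptions, and any cost for the optional AI polish pass is not included.

```rust
/// Transcription price quoted in this README, in USD per hour of audio.
const USD_PER_AUDIO_HOUR: f64 = 0.04;

/// Price of a single minute of audio in USD.
fn per_minute_usd() -> f64 {
    USD_PER_AUDIO_HOUR / 60.0
}

/// Estimated monthly transcription cost in USD.
/// `days_per_month` is a caller-supplied assumption (for example 30.0).
fn monthly_cost_usd(minutes_per_day: f64, days_per_month: f64) -> f64 {
    let hours_per_month = minutes_per_day * days_per_month / 60.0;
    hours_per_month * USD_PER_AUDIO_HOUR
}
```

With these numbers, `monthly_cost_usd(3.0, 30.0)` is 1.5 hours of audio, or about $0.06, and 5 minutes per day comes to about $0.10 per month.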


How It Works

  1. Press the global hotkey — a screenshot of the active screen is captured immediately
  2. Speak; audio is recorded locally while you hold (or toggle) the hotkey
  3. Audio is sent to Groq's Whisper large-v3-turbo for transcription
  4. If AI polish is enabled, the transcript + screenshot are sent to a vision LLM (llama-4-scout) to fix technical terms, proper nouns, and punctuation
  5. The final text is injected into whatever input is focused via the Accessibility API
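
Steps 3 to 5 above can be sketched as a small pipeline with an optional polish stage. This is not the app's actual code: the trait names, `run_pipeline`, and the stub implementations are hypothetical and stand in for the real Groq requests, so the control flow can be read without any network calls.

```rust
/// Hypothetical seam for the speech-to-text call (step 3).
trait Transcriber {
    fn transcribe(&self, audio: &[u8]) -> Result<String, String>;
}

/// Hypothetical seam for the vision-LLM polish call (step 4).
trait Polisher {
    fn polish(&self, transcript: &str, screenshot: &[u8]) -> Result<String, String>;
}

/// Returns the text that would be injected into the focused input (step 5).
/// The screenshot is only handed to the polisher, so it goes unused when
/// polish is disabled (`polisher` is `None`).
fn run_pipeline(
    audio: &[u8],
    screenshot: &[u8],
    transcriber: &dyn Transcriber,
    polisher: Option<&dyn Polisher>,
) -> Result<String, String> {
    let transcript = transcriber.transcribe(audio)?;
    match polisher {
        Some(p) => p.polish(&transcript, screenshot),
        None => Ok(transcript),
    }
}

/// Stub transcriber that returns a fixed, slightly wrong transcript.
struct FakeWhisper;
impl Transcriber for FakeWhisper {
    fn transcribe(&self, _audio: &[u8]) -> Result<String, String> {
        Ok("um open the tory config".to_string())
    }
}

/// Stub polisher that drops a filler word and fixes one technical term.
struct FakePolish;
impl Polisher for FakePolish {
    fn polish(&self, transcript: &str, _screenshot: &[u8]) -> Result<String, String> {
        Ok(transcript.replace("um ", "").replace("tory", "Tauri"))
    }
}
```

In this sketch a polish failure is returned as an error; falling back to the raw transcript instead would be an equally reasonable design choice.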

Privacy

Audio is sent to Groq for transcription — Groq's data retention policy applies. Screenshots are taken locally and sent to Groq's vision API only when AI polish is enabled; neither audio nor screenshots are stored by this app. No analytics, no telemetry, no account required. See PRIVACY.md for full details.


Menu bar states

Icon         State
Black mic    Idle
Red mic      Recording
Blue mic     Transcribing
Orange mic   Error
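
One way to model the states listed above is a plain enum with a lookup for the icon color. This is a sketch only; the `Status` type and `icon_color` method are hypothetical names, not taken from the project's source.

```rust
/// The four states shown by the mic icon (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Idle,
    Recording,
    Transcribing,
    Error,
}

impl Status {
    /// Icon color for each state, as listed in the table above.
    fn icon_color(self) -> &'static str {
        match self {
            Status::Idle => "black",
            Status::Recording => "red",
            Status::Transcribing => "blue",
            Status::Error => "orange",
        }
    }
}
```

Because the `match` is exhaustive, adding a new state without choosing a color for it would fail to compile.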

Build from source

macOS

Prerequisites: Node 20+, Rust stable

# Install Rust if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

git clone https://github.com/tyun08/audio-input
cd audio-input
npm install
npm run tauri dev    # dev mode
npm run tauri build  # release build → produces .dmg + .app in src-tauri/target/release/bundle/

Windows

Prerequisites:

  1. Node.js 20+ (LTS): https://nodejs.org
  2. Rust: https://rustup.rs
  3. Microsoft C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
    • In the installer select "Desktop development with C++"
  4. WebView2 Runtime: pre-installed on Windows 11; on Windows 10 get it from https://developer.microsoft.com/microsoft-edge/webview2/

git clone https://github.com/tyun08/audio-input
cd audio-input
npm install
npm run tauri dev

Stack

Tauri 2 · Rust (cpal, reqwest) · Svelte · Groq API (Whisper large-v3-turbo + LLM polish)

License

MIT
