WhisperVoiceInput

A cross-platform desktop application that records audio and transcribes it to text using OpenAI's Whisper API or compatible services. Perfect for dictation, note-taking, and accessibility.

Disclaimer

This project is a tool I built for my own needs. I use Linux with Wayland, and the tool has been tested only on that platform.

It supports only OpenAI-compatible Whisper APIs. The supported output methods are listed below.

Feel free to fork the project and make it compatible with your needs. PRs are welcome.

Features

Audio Recording: Capture audio from your system's default microphone
Speech-to-Text Transcription: Convert speech to text using OpenAI's Whisper API or compatible services
Multiple Output Options:
- Copy to clipboard - The splash screen shown at startup is a workaround for a clipboard issue (it will be removed as soon as I find a proper fix)
- Use wl-copy for Wayland systems
- Type text directly using ydotool
- Type text directly using wtype
System Tray Integration: Monitor recording status with color-coded tray icon
Unix Socket Control: Control the application via command line scripts
Configurable Settings:
- API endpoint and key
- Whisper model selection
- Language preference
- Custom prompts for better recognition

Roadmap

Remove the splash screen once the clipboard issue is fixed
Add shortcut support
Add post-processing options

Requirements

.NET 9.0 or higher
For Wayland clipboard support: wl-copy (part of the wl-clipboard package)
For typing output: ydotool or wtype
OpenAL compatible sound card/drivers
OpenAI API key or compatible Whisper API endpoint
- Default OpenAI base URL: https://api.openai.com
- Default OpenAI whisper model name: whisper-1
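
A quick way to see which of the external tools above are already installed is a small shell check. This is only a sketch: the tool names come from this README, it assumes the lame package provides a `lame` executable, and `check_tools` is a hypothetical helper, not something shipped with the project. Only the tool for your chosen output method is actually needed.

```shell
#!/bin/bash
# Report which external tools named in this README are on PATH.

check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every listed tool was found.
    [ "$missing" -eq 0 ]
}

# wl-copy, ydotool and wtype are alternatives; socat is used by the toggle scripts.
check_tools dotnet wl-copy ydotool wtype socat lame || true
```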

Installation

Prerequisites

For Linux: Install lame from your package manager.

From Source

Clone the repository:

git clone https://github.com/yourusername/WhisperVoiceInput.git
cd WhisperVoiceInput

Build the application:
```
dotnet build -c Release
```

Run the application:

dotnet run --project WhisperVoiceInput/WhisperVoiceInput.csproj

Pre-built Binaries

Download the latest release from the Releases page.

Configuration

On first run, the application creates a configuration directory at:

~/.config/WhisperVoiceInput/ (Linux/macOS)
%APPDATA%\WhisperVoiceInput\ (Windows)

API Configuration

Open the settings window by clicking on the tray icon
Enter your OpenAI API key or configure a compatible endpoint
Select the Whisper model (default: whisper-large; the OpenAI API itself expects whisper-1)
Set your preferred language (e.g., "en" for English)
Optionally add a prompt to guide the transcription
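
To check an endpoint, key, and model before entering them in the settings window, you can send one audio file by hand. The sketch below uses the standard OpenAI transcription route (`/v1/audio/transcriptions`) with its `model`, `language`, and `prompt` form fields. The environment variable names `WHISPER_BASE_URL`, `WHISPER_MODEL`, and `WHISPER_API_KEY` are assumptions made for this example only; the application does not read them.

```shell
#!/bin/bash
# Send one audio file to an OpenAI-compatible transcription endpoint.

# Join a base URL and the transcription path, tolerating a trailing slash.
transcription_url() {
    local base="${1%/}"
    echo "$base/v1/audio/transcriptions"
}

# Usage: transcribe <audio-file> [language] [prompt]
# WHISPER_BASE_URL, WHISPER_MODEL and WHISPER_API_KEY are hypothetical
# variable names used only by this sketch.
transcribe() {
    local file="$1"
    local language="${2:-en}"
    local prompt="${3:-}"
    # The two ${prompt:+...} expansions add "-F prompt=..." only when a
    # prompt was given, so no empty form field is sent.
    curl --silent --show-error --fail \
        -H "Authorization: Bearer ${WHISPER_API_KEY}" \
        -F "file=@${file}" \
        -F "model=${WHISPER_MODEL:-whisper-1}" \
        -F "language=${language}" \
        ${prompt:+-F} ${prompt:+"prompt=${prompt}"} \
        "$(transcription_url "${WHISPER_BASE_URL:-https://api.openai.com}")"
}
```

For example, `WHISPER_API_KEY=sk-... transcribe sample.mp3 en` should print a JSON object with a `text` field if the endpoint and key are valid.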

Output Configuration

Choose your preferred output method:

Clipboard: Standard clipboard (uses AvaloniaUI API and works on most systems)
wl-copy: For Wayland systems (requires wl-copy to be installed)
ydotool: Types the text directly (requires ydotool to be installed and configured)
wtype: Types the text directly (requires wtype to be installed and configured)

Self-Hosted Whisper API

I personally use Speaches as a self-hosted Whisper API.

An example docker-compose service definition for the GPU-enabled (CUDA) version of Speaches:

  speaches:
    image: ghcr.io/speaches-ai/speaches:0.7.0-cuda # https://github.com/speaches-ai/speaches/pkgs/container/speaches/versions?filters%5Bversion_type%5D=tagged
    container_name: speaches
    restart: unless-stopped
    ports:
      - "1264:8000"
    volumes:
      - ./speaches_cache:/home/ubuntu/.cache/huggingface/hub
    environment:
      - ENABLE_UI=false
      - WHISPER__TTL=-1 # default TTL is 300 (5min), -1 to disable, 0 to unload directly, 43200=12h
      - WHISPER__INFERENCE_DEVICE=cuda
      - WHISPER__COMPUTE_TYPE=float16
      - WHISPER__MODEL=deepdml/faster-whisper-large-v3-turbo-ct2 # uses ~2.5 GB of VRAM in the CUDA version
      #- WHISPER__MODEL=Systran/faster-whisper-large-v3
      - WHISPER__DEVICE_INDEX=1
      - ALLOW_ORIGINS=[ "*", "app://obsidian.md" ]
      - API_KEY=sk-1234567890
      - LOOPBACK_HOST_URL=yourdomain.com
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
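
After starting the container it can take a moment before the server answers requests. The following sketch polls the host port from the mapping above (1264) until any HTTP response comes back. It assumes the fragment sits under a top-level `services:` key in your compose file, and `wait_for_http` is a hypothetical helper written for this example. An HTTP response only shows that the server is up, not that the model is already loaded.

```shell
#!/bin/bash
# Poll a URL until the server returns any HTTP response or attempts run out.

# Usage: wait_for_http <url> [attempts]
wait_for_http() {
    local url="$1"
    local attempts="${2:-30}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # Without --fail, curl exits 0 on any HTTP response, even a 404.
        if curl --silent --output /dev/null --max-time 2 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example (not executed here):
#   docker compose up -d speaches
#   wait_for_http http://localhost:1264/ 60 && echo "Speaches is responding"
```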

Usage

GUI Usage

Click the tray icon to start/stop recording
When recording, the icon turns yellow
During transcription processing, the icon turns light blue
On success, the icon briefly turns green and the transcribed text is output according to your settings
On error, the icon turns red

Command Line Control

The application can be controlled via Unix socket commands. Two scripts are provided:

Simple Toggle Script (toggle.sh)

#!/bin/bash

MESSAGE="transcribe_toggle"
PIPE_PATH="/tmp/WhisperVoiceInput/pipe"

echo "$MESSAGE" | socat - "UNIX-CONNECT:$PIPE_PATH"

Enhanced Toggle Script (transcribe_toggle.sh)

#!/bin/bash

MESSAGE="transcribe_toggle"
PIPE_PATH="/tmp/WhisperVoiceInput/pipe"

# Check if socat is installed
if ! command -v socat &> /dev/null; then
    echo "Error: socat is not installed. Please install it with your package manager."
    echo "For example: sudo apt install socat"
    exit 1
fi

# Check if the socket exists
if [ ! -S "$PIPE_PATH" ]; then
    echo "Error: Socket $PIPE_PATH does not exist."
    echo "Make sure WhisperVoiceInput is running."
    exit 1
fi

echo "Sending '$MESSAGE' command to WhisperVoiceInput..."
echo "$MESSAGE" | socat - "UNIX-CONNECT:$PIPE_PATH"
echo "Command sent."

Make the scripts executable:

chmod +x toggle.sh transcribe_toggle.sh

Run the script to toggle recording:

./toggle.sh

Keyboard Shortcuts

You can bind the toggle script to a keyboard shortcut in your desktop environment for quick access:

GNOME Example:

gsettings set org.gnome.settings-daemon.plugins.media-keys custom-keybindings "['/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/']"
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ name "Toggle WhisperVoiceInput"
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ command "/path/to/toggle.sh"
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ binding "<Ctrl><Alt>w"

KDE Example:

System Settings > Shortcuts > Custom Shortcuts
Add a new shortcut
Set the command to /path/to/toggle.sh
Assign a keyboard shortcut

Troubleshooting

On Linux, logs are stored in ~/.config/WhisperVoiceInput/logs. On Windows, logs are stored in %APPDATA%\WhisperVoiceInput\logs.

A local Seq server is supported for log collection. It should be running on localhost on the default port 5341.

Recording Issues

Ensure your microphone is properly connected and set as the default input device
Check system permissions for microphone access
Verify OpenAL is properly installed and configured

Transcription Issues

Verify your API key is correct
Check your internet connection
Ensure the server address is correct
Try a different Whisper model (smaller models may be faster but less accurate)

Socket Control Issues

Ensure the application is running
Check if the socket file exists at /tmp/WhisperVoiceInput/pipe
Verify socat is installed (for example, on Debian/Ubuntu: sudo apt install socat)

Logs

Logs are stored in:

~/.config/WhisperVoiceInput/logs/ (Linux/macOS)
%APPDATA%\WhisperVoiceInput\logs\ (Windows)
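
When reproducing a problem on Linux it helps to follow the newest log file. The log file naming scheme is not documented here, so this sketch simply picks the most recently modified file in the log directory; `latest_log` is a hypothetical helper, not part of the project.

```shell
#!/bin/bash
# Print the path of the most recently modified file in the log directory.

# Usage: latest_log [log-directory]
latest_log() {
    local dir="${1:-$HOME/.config/WhisperVoiceInput/logs}"
    local newest
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] && echo "$dir/$newest"
}

# Example (not executed here):
#   tail -f "$(latest_log)"
```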

License

MIT License

Acknowledgements

OpenAI Whisper - Speech recognition model
Avalonia UI - Cross-platform UI framework
ReactiveUI - MVVM framework
NAudio - Audio library for .NET
OpenTK.OpenAL - OpenAL bindings for .NET

Diagrams

sequenceDiagram
    participant User as User
    participant Tray as Tray Icon
    participant Recorder as Recording Module
    participant Cloud as Cloud Server
    participant Clipboard as Clipboard
    participant UDS as Unix Domain Socket

    alt Trigger by User
        User->>Tray: Click tray icon (start recording)
    else Trigger by Command
        UDS->>Tray: Send record command
    end

    Tray->>Recorder: Start recording
    Recorder-->>Tray: Recording started (icon turns yellow)
    Recorder-->>Tray: Recording finished (audio data)
    Tray->>Tray: Change icon to light blue (processing)
    Tray->>Cloud: Send audio data (API request)
    alt Success
        Cloud-->>Tray: Transcribed text
        Tray->>Clipboard: Copy text to clipboard
        Tray->>Tray: Change icon to green (5 seconds)
    else Error
        Cloud-->>Tray: Transcription error
        Tray->>Tray: Change icon to red (5 seconds)
        Note over Tray: Display error tooltip on hover
    end
    Tray->>Tray: Revert icon to white (idle)

Designed with Mermaid
