Super simple python script to start recording sound, send it to whisper then have it type for you anywhere.
- Can also modify text according to voice commands.
- Latency is as low as I could (instant if deepgram is used, <1s for openai's whisper).
- It can be seen as a minimalist alternative to AquaVoice
- starts recording
- when you're done press shift (escape or spacebar to cancel)
- whisper will transcribe your speech
4.a if
--auto_paste
is True: your current clipboard will be saved, replaced by the transcription, "ctrl+v" will automatically be pressed, then your old clipboard will replace again like nothing happened. 4.b if--auto_paste
is False: your clipboard will be replaced by the transcription
- starts recording
- when you're done press shift (escape or spacebar to cancel)
- whisper will transcribe your speech
- the transcription will be interpreted as an instruction for
--llm_model
on how to transform the text found in your clipboard - the result will either be pasted or stored in the clipboard like for
--task=write
- starts recording
- when you're done press shift (escape or spacebar to cancel)
- whisper will transcribe your speech
- the transcription will be interpreted as the first user message in a conversation with
--llm_model
- the result will either be pasted or stored in the clipboard like for
--task=write
, and optionaly read aloud if--voice_engine
is set - To continue the conversation, use the task
--task=continue_voice_chat
- I want to write text:
python quick_whisper_typer.py --task=write --auto_paste
- I want to translate text: copy the text in to the clipboard then
python quick_whisper_typer.py --task=transform_clipboard --auto_paste
- I want to start a vocal conversation:
python quick_whisper_typer.py --task="new_voice_chat" --voice_engine='openai'
- I want to continue the conversation:
python quick_whisper_typer.py --task="continue_voice_chat" --voice_engine='openai'
- I want to call it from anywhere without setting up keybindings, use
--loop
then pressshift
key several times from anywhere and you'll see a notification appear to trigger the tasks.
- Supports any spoken languages supported by whisper
- Supports both openai's whisper and deepgram's whisper
- Minimalist code
- Low latency: it starts as fast as possible to be ready to listen to you
- Four supported voice_engine: openai, piper, deepgram, espeak (fallback if any of the other fails)
- Optional audio cleanup and long silence removal via sox
--loop
to trigger the script from anywhere just by pressing shift multiple times. You can define any king of argument to customize your loop shortcuts by passing a dict to--loop_tasks
- Support virtually any type of LLM (ChatGPT, Claude, Huggingface, Llama, etc) thanks to litellm.
- Supposedly multiplatform, but I can't test it on anything else than Linux so please open an issue to tell me how it went!
- Make sure your environment contains the appropriate api keys (eg as OPENAI_API_KEY, MISTRAL_API_KEY, DEEPGRAM_API_KEY etc)
- optional: add a keyboard shortcut to call this script. See my i3 bindings below.
- If using deepgram: make sure you are on python 3.10+
chmod +x ./quick_whisper_typer.py
mode "$mode_launch_microphone" {
# enter text
bindsym f exec /PATH/TO/quick_whisper_typer.py --task write, mode "default
# edit clipboard
bindsym e exec /PATH/TO/quick_whisper_typer.py --task=transform_clipboard, mode "default"
bindsym v exec /PATH/TO/quick_whisper_typer.py --task=continue_voice_chat, mode "default"
bindsym shift+V exec /PATH/TO/quick_whisper_typer.py --task=new_voice_chat, mode "default"
bindsym Return mode "default"
bindsym Escape mode "default"
}
.ogg
files were in my/usr/share/sounds/ubuntu/notifications
folder.