Skip to content

zhangchaodesign/LiveSynth

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LiveSynth

LiveSynth is a program that allows you to generate a voice with AI using OpenAI's Whisper and the ElevenLabs API. By holding a key, you can record your voice, have it transcribed, and used to generate an AI voice from what you said. I use this to give myself an interesting voice in games. Windows is not and will not be supported lol.

Requirements

  • Linux
  • PipeWire or PulseAudio
  • CUDA

Usage

python whisper.py --key alt_r --voice 21m00Tcm4TlvDq8ikWAM --api-key <your xi-api-key>

The --key option expects the name of an x11 keysym, those are listed here.

The default whisper model requires 5GB of vram. This can be changed with --model or CPU transcription can be used with --cpu. The available whisper models are documented here.

To output to a custom sink (useful for using as an input device) you can use the --output-sink option which expects the name or serial of the sink.

--input-source is also available to choose a specific input.

Creating a virtual sink (named LiveSynth) and source (named LiveSynthSource) can be done with a couple commands:

pactl load-module module-null-sink sink_name="LiveSynth" sink_properties=device.description="LiveSynth Sink"
pactl load-module module-remap-source master="LiveSynth.monitor" source_name="LiveSynthSource" source_properties=device.description="LiveSynth Source"

Caveats

  • Whisper uses a lot of VRAM (5GB for medium)
  • The ElevenLabs API often takes a while to return a result. Hopefully open source AI voice will catch up soon because this sucks.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 84.3%
  • Nix 15.7%