This is currently a proof of concept, not a final product.
Mentalese is a real-time, two-way voice translation application that runs on your local machine. It sits between you and a voice chat application (such as Discord, Skype, or Google Meet) and translates both sides of the conversation as you speak.
- Real-time, two-way translation: Translates your voice for the other person and their voice for you.
- Voice cloning: Can clone your voice and the other person's voice for more natural-sounding translations.
- Multiple languages supported: Leverages Facebook's NLLB model for translation between many languages and Coqui TTS for voice synthesis in several languages.
- GUI for configuration and monitoring: A simple interface to control the engine, view transcriptions and translations, and configure settings.
- Local and private: All processing is done on your machine, ensuring privacy.
Mentalese routes audio through a virtual audio cable (VB-CABLE) so that it can intercept and process both audio streams.
- Your Voice:
  - Mentalese captures audio from your microphone.
  - It transcribes your speech to text using Whisper.
  - It translates the text to the target language using NLLB.
  - It synthesizes the translated text to speech using Coqui TTS (optionally cloning your voice).
  - The translated audio is sent to the Virtual Cable Input, which should be set as the microphone in your chat application.
- Their Voice:
  - You must set your chat application's audio output to the Virtual Cable Output.
  - Mentalese captures the other person's audio from the Virtual Cable Output.
  - It transcribes, translates, and synthesizes their speech, using the same steps as the pipeline for your voice.
  - The translated audio is sent directly to your headphones.
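The stages above can be sketched as a small chain of interchangeable callables. This is a minimal illustration, not Mentalese's actual code: the `SpeechPipeline` class and its field names are hypothetical, and the stages here are stand-in functions where the real application would call Whisper, NLLB, and Coqui TTS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Hypothetical transcribe -> translate -> synthesize chain."""
    transcribe: Callable[[bytes], str]   # audio in, source-language text out (Whisper's role)
    translate: Callable[[str], str]      # source text in, target-language text out (NLLB's role)
    synthesize: Callable[[str], bytes]   # target text in, audio out (Coqui TTS's role)

    def process(self, audio_chunk: bytes) -> bytes:
        text = self.transcribe(audio_chunk)
        if not text.strip():
            return b""  # nothing recognized, so emit no audio
        return self.synthesize(self.translate(text))


# Stand-in stages so the sketch runs without any models installed.
demo = SpeechPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"hello": "hola"}.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
```

Under this design, two such pipelines would run side by side: one reading from the microphone and writing to the virtual cable, the other reading from the virtual cable and writing to the headphones.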
- Python 3.10+
- PyTorch
- A CUDA-enabled GPU is highly recommended for decent performance.
- VB-CABLE Virtual Audio Device: You need to have VB-CABLE installed.
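Before installing anything else, it can help to confirm the Python and GPU prerequisites. The following is a sketch only; `check_environment` is a hypothetical helper and not part of Mentalese. It relies on `torch.cuda.is_available()`, which PyTorch provides, and reports CUDA as unavailable when PyTorch is not installed.

```python
import sys


def check_environment(min_python=(3, 10)):
    """Return a dict describing whether the listed prerequisites are met."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report
```

Calling `check_environment()` and printing the result shows at a glance which prerequisite, if any, still needs attention.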
- Install VB-CABLE:
  - If you don't have it, you can find the installer in the project's root directory (`VBCABLE_Setup_x64.exe`).
  - Right-click `VBCABLE_Setup_x64.exe` and select "Run as administrator".
- Install Python dependencies:
  `pip install -r requirements.txt`
- Download AI Models:
  - The first time you run the application, the necessary AI models (Whisper, NLLB, and TTS) will be downloaded and cached in the `models` directory. This may take a while and requires a good internet connection.
- Configure `config.json`:
  - Open `config.json` and configure your audio devices and language settings. See the Configuration section below for more details.
- Run the application:
  `python gui.py`
- Configure your chat application:
  - Input Device: Set to `CABLE Input (VB-Audio Virtual Cable)`.
  - Output Device: Set to `CABLE Output (VB-Audio Virtual Cable)`.
- Start the engine:
  - Click the "Start Engine" button in the Mentalese GUI.
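The device names entered in `config.json` have to match what the operating system reports. One way to look them up is sketched below. It assumes the `sounddevice` package, which this README does not confirm as Mentalese's audio backend; `find_device` and `list_system_devices` are hypothetical helpers. `sounddevice.query_devices()` returns entries with `name`, `max_input_channels`, and `max_output_channels` fields, which is the shape `find_device` expects.

```python
def find_device(devices, name_fragment, kind):
    """Return the index of the first device whose name contains name_fragment.

    devices: a sequence of dicts with 'name', 'max_input_channels' and
             'max_output_channels' keys.
    kind:    "input" (recording) or "output" (playback).
    """
    channel_key = f"max_{kind}_channels"
    for index, device in enumerate(devices):
        name_matches = name_fragment.lower() in device["name"].lower()
        if name_matches and device[channel_key] > 0:
            return index
    raise LookupError(f"no {kind} device matching {name_fragment!r}")


def list_system_devices():
    """Query the real devices; needs `pip install sounddevice` (an assumption)."""
    import sounddevice as sd
    return list(sd.query_devices())
```

For example, `find_device(list_system_devices(), "CABLE Input", "output")` would locate the playback endpoint that Mentalese writes translated audio to.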
The config.json file is the main configuration file for the application.
- `models`: Specifies the AI models to use.
- `devices`: The names of your audio devices.
  - `user_mic_name`: Your actual microphone.
  - `headphones_name`: Your headphones or speakers.
  - `cable_input_name`: Should be `CABLE Input (VB-Audio Virtual Cable)`.
  - `cable_output_name`: Should be `CABLE Output (VB-Audio Virtual Cable)`.
- `vad_settings`: Voice Activity Detection settings.
- `user_pipeline`: Settings for your voice's translation.
  - `clone_voice`: Set to `true` to enable voice cloning. The first time you speak, it will register your voice.
  - `target_language_nllb`: The language to translate your voice to for the other person (using NLLB language codes).
  - `target_language_tts`: The language for the synthesized voice (using TTS language codes).
- `other_pipeline`: Settings for the other person's voice translation.
  - `clone_voice`: Set to `true` to enable voice cloning for the other person.
  - `source_language_nllb`: The language the other person is speaking (using NLLB language codes).
  - `source_language_tts`: The language the other person is speaking (for Whisper).
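A quick way to catch a mistyped or missing option is to check the file against the keys listed above before starting the engine. The sketch below is illustrative only: `missing_keys` and `load_config` are hypothetical helpers, the nesting of keys under `devices`, `user_pipeline`, and `other_pipeline` is assumed from the descriptions above, and the contents of `models` and `vad_settings` are not checked because they are not documented here. The example values use `spa_Latn` (an NLLB code) and `es` as placeholders, and the microphone and headphone names are invented.

```python
import json

# Keys taken from the option list above; the nesting is an assumption.
REQUIRED_KEYS = {
    "devices": ["user_mic_name", "headphones_name",
                "cable_input_name", "cable_output_name"],
    "user_pipeline": ["clone_voice", "target_language_nllb", "target_language_tts"],
    "other_pipeline": ["clone_voice", "source_language_nllb", "source_language_tts"],
}


def missing_keys(config):
    """Return dotted names of documented keys that are absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in values)
    return missing


def load_config(path="config.json"):
    """Load the file and fail early if a documented key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    problems = missing_keys(config)
    if problems:
        raise ValueError("config.json is missing: " + ", ".join(problems))
    return config


# Placeholder values for illustration; device names differ per machine.
example = {
    "devices": {
        "user_mic_name": "Microphone (USB Audio)",
        "headphones_name": "Headphones (USB Audio)",
        "cable_input_name": "CABLE Input (VB-Audio Virtual Cable)",
        "cable_output_name": "CABLE Output (VB-Audio Virtual Cable)",
    },
    "user_pipeline": {"clone_voice": True,
                      "target_language_nllb": "spa_Latn",
                      "target_language_tts": "es"},
    "other_pipeline": {"clone_voice": False,
                       "source_language_nllb": "spa_Latn",
                       "source_language_tts": "es"},
}
```

Failing at load time with a list of missing keys is usually easier to debug than an error raised later from inside the audio or model code.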