A tool for converting difficult-to-understand speech (such as heavily accented English) into a clear, pleasant voice, with disfluencies removed along the way. The tool can:
- Extract audio from YouTube videos or local files
- Transcribe speech with high accuracy using OpenAI's Whisper (locally)
- Remove speech disfluencies (um, uh, false starts, repetitions)
- Convert to natural-sounding speech using high-quality neural TTS
- Maintain the original timing and pacing of the conversation
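As a rough illustration of the middle of that pipeline (the helper function and file names below are hypothetical, not the tool's actual API), transcription and disfluency removal look roughly like this:

```python
# Illustrative sketch only; hypothetical helper, not the tool's real API.
import re
import whisper  # openai-whisper, runs locally

def remove_disfluencies(text: str) -> str:
    """Drop common fillers such as 'um', 'uh', 'erm' with a simple regex pass."""
    return re.sub(r"\b(um+|uh+|erm+)\b[,.]?\s*", "", text, flags=re.IGNORECASE).strip()

model = whisper.load_model("base")       # same default model as the config example below
result = model.transcribe("input.mp3")   # hypothetical input file

# Keep each segment's start/end so synthesized speech can follow the original pacing.
segments = [
    {"start": s["start"], "end": s["end"], "text": remove_disfluencies(s["text"])}
    for s in result["segments"]
]
```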
To run the tool you will need:
- Python 3.8 or higher
- FFmpeg
- 4GB+ RAM for transcription
- 500MB+ disk space for models
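Before installing, you can check that the basics are already in place:

```bash
# Verify Python and FFmpeg are installed and on the PATH
python3 --version   # should report 3.8 or higher
ffmpeg -version
```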
Run the provided setup script, which will install all required dependencies and download the necessary models:

```bash
# Make the setup script executable
chmod +x setup.sh

# Run the setup script
./setup.sh
```
If you prefer to install manually:

1. Create a virtual environment:

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

2. Install the Python dependencies:

   ```bash
   pip install yt-dlp pydub tqdm openai-whisper piper-tts torch
   ```

3. Download the TTS voice model:

   ```bash
   piper-download --voice en_US-lessac-medium
   ```
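As an optional sanity check (not part of the tool itself), confirm that Whisper imports and can list its models:

```bash
# Should print the available Whisper model names (tiny, base, small, ...)
python -c "import whisper; print(whisper.available_models())"
```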
To enhance a YouTube video:

```bash
python enhanced_speech_tool.py -yt "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
```

To enhance a local audio file:

```bash
python enhanced_speech_tool.py -f "path/to/audio/file.mp3"
```
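A direct audio URL can also be passed with `-u` (the URL below is only a placeholder):

```bash
python enhanced_speech_tool.py -u "https://example.com/talk.mp3"
```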
The following command-line options are available:

- `-yt`, `--youtube`: YouTube URL
- `-f`, `--file`: Local audio file path
- `-u`, `--url`: Direct audio URL
- `-c`, `--config`: Path to configuration file
- `-v`, `--voice`: Voice to use for synthesis
- `--no-disfluencies`: Remove disfluencies (um, uh, etc.)
- `--simplify`: Simplify language for easier understanding
- `-o`, `--output-dir`: Output directory
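Options can be combined. For example, to strip fillers, use a different voice, and set the output directory:

```bash
python enhanced_speech_tool.py -yt "https://www.youtube.com/watch?v=YOUR_VIDEO_ID" \
    --no-disfluencies -v en_GB-alba-medium -o enhanced_audio
```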
Several high-quality voices are available through Piper TTS:
- `en_US-lessac-medium`: Clear American English (default)
- `en_GB-alba-medium`: British English
- `en_US-ryan-high`: Male American voice
- `en_AU-sydney-medium`: Australian English
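Non-default voices need their models downloaded before use. Assuming the same `piper-download` command from the setup step accepts any of the voice names above:

```bash
piper-download --voice en_GB-alba-medium
python enhanced_speech_tool.py -f "path/to/audio/file.mp3" -v en_GB-alba-medium
```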
You can customize the tool's behavior by creating a configuration file. Example:
```json
{
  "output_dir": "enhanced_audio",
  "whisper_model": "base",
  "device": null,
  "remove_disfluencies": true,
  "simplify_language": false,
  "tts_engine": "piper",
  "voice": "en_US-lessac-medium",
  "maintain_timing": true
}
```
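Save the file anywhere and point the tool at it with `-c` (the path below is just an example):

```bash
python enhanced_speech_tool.py -f "path/to/audio/file.mp3" -c config/custom.json
```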
The project is laid out as follows:

```
enhanced_speech_tool/
├── enhanced_speech_tool.py   # Main script
├── setup.sh                  # Setup script
├── config/                   # Configuration files
│   └── default.json          # Default configuration
├── src/                      # Source code
│   ├── audio_extractor.py    # Audio extraction module
│   ├── transcriber.py        # Speech transcription module
│   ├── text_processor.py     # Text processing module
│   ├── speech_synthesizer.py # Speech synthesis module
│   ├── audio_mixer.py        # Audio mixing module
│   └── config.py             # Configuration module
└── enhanced_audio/           # Output directory
```
Known limitations:
- Whisper transcription may be imperfect for heavy accents or poor-quality audio
- Neural TTS voices require downloading models (~200MB per voice)
- Processing long audio files (>30 minutes) can take significant time
This tool uses the following open-source libraries:
- OpenAI Whisper for transcription
- Piper TTS for speech synthesis
- yt-dlp for YouTube audio extraction
- PyDub for audio processing