Speech-to-Text Transcription Tool

A powerful and flexible toolkit that preprocesses audio files, transcribes them with OpenAI's Whisper model, and applies intelligent text formatting to produce high-quality transcripts.

📋 Overview

This repository contains tools for automated transcription of audio recordings, with specific optimizations for meeting audio. It solves several common challenges:

  • Processing large audio files that exceed API size limits
  • Handling long recordings efficiently through parallel processing
  • Correcting technical terms and framework names in transcriptions
  • Maintaining natural language flow while improving text readability

🔑 Key Features

  • Audio compression and format conversion
  • Automatic bitrate optimization to meet size constraints
  • Concurrent audio transcription for faster processing
  • Intelligent correction of technical terms and framework names
  • Comprehensive error handling and logging

🗂️ Repository Structure

  • transcription.py: Core transcription tool for general audio files
  • meet_transcription.py: Specialized tool for meeting recordings, with audio chunking, parallel transcription, and technical term correction
  • config.ini: Configuration file for API keys and settings (not tracked in git)

🧠 How It Works

Audio Preprocessing

  1. Compression & Conversion: Audio files are compressed and converted to compatible formats (MP3/MP4)
  2. Size Optimization: Files are automatically optimized to meet API size constraints (see the sketch after this list) by adjusting:
    • Bitrate
    • Sample rate
    • Audio channels
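
A minimal sketch of this size-optimization step, assuming pydub and a 25 MB target; the exact limits, bitrates, and helper names in the scripts may differ:

    from pathlib import Path
    from pydub import AudioSegment

    MAX_SIZE_BYTES = 25 * 1024 * 1024  # assumed API upload limit

    def compress_to_limit(input_path: str, output_path: str = "compressed.mp3") -> str:
        """Re-export the audio at progressively lower bitrates until it fits the size limit."""
        audio = AudioSegment.from_file(input_path)
        audio = audio.set_channels(1).set_frame_rate(16000)  # mono, 16 kHz is enough for speech
        for bitrate in ("64k", "48k", "32k", "24k"):
            audio.export(output_path, format="mp3", bitrate=bitrate)
            if Path(output_path).stat().st_size <= MAX_SIZE_BYTES:
                return output_path
        return output_path  # smallest attempt; the caller may still need to chunk the file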

Transcription Process

  1. Audio Chunking: Larger files are split into manageable chunks
  2. Parallel Processing: Chunks are processed concurrently for improved speed (see the sketch after this list)
  3. Text Assembly: Transcribed chunks are assembled in the correct order
  4. Text Correction: Framework and library names are automatically corrected
  5. Formatting: Grammar and coherence are improved for better readability
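
A sketch of the chunk-and-reassemble flow, assuming pydub for splitting; the chunk count, worker count, and the transcribe_fn callable (which would wrap the Whisper API call shown later under Usage) are illustrative, not the exact structure of the scripts:

    from concurrent.futures import ThreadPoolExecutor
    from typing import Callable, List
    from pydub import AudioSegment

    def transcribe_in_chunks(path: str, transcribe_fn: Callable[[str], str],
                             num_chunks: int = 10, max_workers: int = 4) -> str:
        """Split the audio, transcribe the pieces concurrently, and join the texts in order."""
        audio = AudioSegment.from_file(path)
        chunk_ms = len(audio) // num_chunks + 1      # pydub lengths and slices are in milliseconds
        chunk_files: List[str] = []
        for i in range(num_chunks):
            piece = audio[i * chunk_ms:(i + 1) * chunk_ms]
            if len(piece) == 0:
                break
            name = f"chunk_{i}.mp3"
            piece.export(name, format="mp3", bitrate="64k")
            chunk_files.append(name)

        # executor.map yields results in submission order, so the transcript reassembles correctly
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            texts = list(executor.map(transcribe_fn, chunk_files))
        return " ".join(texts)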

🚀 Getting Started

Prerequisites

  • Python 3.7+
  • OpenAI API key
  • Required Python packages (see Installation)

Installation

  1. Clone this repository:

    git clone https://github.com/yourusername/speech-to-text.git
    cd speech-to-text
  2. Install dependencies:

    pip install openai pydub
  3. Create a config.ini file with your OpenAI API key (loaded as sketched below):

    [OPENAI]
    api_key = your_openai_api_key_here
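
The scripts can read this file with Python's built-in configparser; a minimal sketch using the section and key names shown above:

    import configparser

    config = configparser.ConfigParser()
    config.read("config.ini")
    api_key = config["OPENAI"]["api_key"]  # matches the [OPENAI] section above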

Usage

Basic Audio Transcription

For simple audio file transcription:

python transcription.py

This will:

  1. Load the audio file specified in the script (default: audio.m4a)
  2. Compress and convert it if needed
  3. Transcribe the audio using OpenAI's Whisper model (see the API sketch after this list)
  4. Format the transcription for readability
  5. Save the result to transcricao_formatada.txt
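
The transcription step itself comes down to a single Whisper API call. A minimal sketch using the openai Python client; the compressed file name is a placeholder, and the Portuguese language hint reflects the default mentioned under Customization:

    import configparser
    from openai import OpenAI

    config = configparser.ConfigParser()
    config.read("config.ini")
    client = OpenAI(api_key=config["OPENAI"]["api_key"])

    # "audio_compressed.mp3" is a placeholder for the compressed file produced earlier
    with open("audio_compressed.mp3", "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",   # OpenAI's hosted Whisper model
            file=audio_file,
            language="pt",       # assumed default; see Customization
        )
    print(result.text)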

Meeting Transcription with Technical Term Correction

For meeting recordings with technical discussions:

python meet_transcription.py

This advanced script will:

  1. Process the audio file with optimized settings for speech
  2. Split the audio into multiple chunks for parallel processing
  3. Transcribe all chunks concurrently
  4. Correct technical terms, framework names, and libraries (a correction sketch follows this list)
  5. Save both the raw and corrected transcriptions
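
The correction pass can be implemented as a chat-completion call that rewrites the raw transcript. A hedged sketch: the model name and prompt wording are assumptions, not necessarily what meet_transcription.py uses:

    import configparser
    from openai import OpenAI

    config = configparser.ConfigParser()
    config.read("config.ini")
    client = OpenAI(api_key=config["OPENAI"]["api_key"])

    def correct_technical_terms(raw_text: str) -> str:
        """Ask a chat model to fix misheard framework and library names without rewording the content."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            messages=[
                {"role": "system",
                 "content": ("Fix misspelled framework, library, and technical terms in this "
                             "meeting transcript. Keep the original wording otherwise.")},
                {"role": "user", "content": raw_text},
            ],
        )
        return response.choices[0].message.content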

⚙️ Customization

You can customize the behavior by modifying the following (example values are sketched after the list):

  • Audio file paths in the main functions
  • Maximum file size limits (default: 20-25MB)
  • Number of processing chunks and workers for parallel processing
  • Language settings for transcription (default: Portuguese)
  • Formatting prompts for text correction
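
For example, these settings typically live near the top of each script as simple constants; the names below are illustrative, not the exact identifiers used:

    AUDIO_PATH = "audio.m4a"      # input file
    MAX_FILE_SIZE_MB = 25         # API upload limit to target
    NUM_CHUNKS = 10               # pieces for parallel processing
    MAX_WORKERS = 4               # concurrent transcription workers
    LANGUAGE = "pt"               # Whisper language hint (Portuguese)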

📝 Logging

Both scripts include comprehensive logging that records:

  • Process steps and completion status
  • File sizes and compression details
  • Errors and exceptions
  • Processing times and results

Logs are saved to transcription.log and also output to the console.
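
A typical setup that produces this dual output with Python's standard logging module (a sketch; the exact format string in the scripts may differ):

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[
            logging.FileHandler("transcription.log"),  # persistent log file
            logging.StreamHandler(),                   # console output
        ],
    )
    logger = logging.getLogger(__name__)
    logger.info("Starting transcription pipeline")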

🔧 Technical Implementation

Audio Processing

The project uses pydub for audio manipulation, which supports the following (sketched after this list):

  • Reading various audio formats
  • Bitrate adjustment
  • Sample rate conversion
  • Audio chunking
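
These operations map directly onto the pydub API; a brief sketch with placeholder file names:

    from pydub import AudioSegment

    audio = AudioSegment.from_file("meeting.m4a")             # read various formats via ffmpeg
    audio = audio.set_frame_rate(16000).set_channels(1)       # sample rate conversion, mono
    audio.export("meeting.mp3", format="mp3", bitrate="48k")  # bitrate adjustment

    first_minute = audio[:60_000]                             # chunking: slices are in milliseconds
    first_minute.export("chunk_0.mp3", format="mp3")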

Transcription Engine

Transcription is performed using OpenAI's Whisper model through the OpenAI API, with optimized parameters for accuracy.

Concurrent Processing

The concurrent.futures module enables efficient parallel processing of audio chunks, significantly reducing processing time for large files.

📄 License

[Add your license information here]

🤝 Contributing

Contributions, issues, and feature requests are welcome!

🙏 Acknowledgments

  • OpenAI for providing the Whisper transcription model
  • Contributors and developers of the pydub library
