# Whisper Audio/Video Transcription Tool for Google Colab

This notebook provides an easy-to-use interface for transcribing audio and video files using OpenAI's Whisper model.

## Features
- Support for multiple audio/video formats (m4a, mp3, wav, mp4, avi, mov, mkv)
- Multiple output formats (txt, srt, vtt, json)
- Optional email notifications
- Batch processing capability

## Instructions
1. Upload your audio/video files to a folder on Google Drive
2. Run each cell in order, starting with Step 1
3. Fill in your configuration in Step 3
4. Run the Step 5 cell that matches your chosen `PROCESS_MODE`

## Step 1: Mount Google Drive

In [None]:
# Mount Google Drive to access your files
from google.colab import drive
drive.mount('/content/drive')
print("✅ Google Drive mounted successfully!")

## Step 2: Install Dependencies

In [None]:
# Install Whisper and dependencies
print("📦 Installing Whisper and dependencies...")
!pip install -q git+https://github.com/openai/whisper.git
!apt-get -qq update && apt-get -qq install ffmpeg
print("✅ Installation complete!")
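
Whisper decodes audio by calling the `ffmpeg` executable, so a quick check after installation can catch a failed install before a long transcription run. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper name, not part of Whisper or this notebook.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(["ffmpeg"])
if missing:
    print(f"Missing command-line tools: {', '.join(missing)}")
else:
    print("ffmpeg found on PATH")
```

If `ffmpeg` is reported missing, re-running the install cell is a reasonable first step.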

## Step 3: Configuration Setup

Fill in your settings below. You can change these values as needed.

In [None]:
# Configuration Settings
# Edit these values according to your needs

# Path to your audio/video files on Google Drive
# Example: "/content/drive/MyDrive/audio_files"
AUDIO_DIR = "/content/drive/MyDrive/audio_files"  # @param {type:"string"}

# Path where transcription results will be saved
# Example: "/content/drive/MyDrive/transcriptions"
OUTPUT_DIR = "/content/drive/MyDrive/transcriptions"  # @param {type:"string"}

# Whisper model size
# Options: "tiny", "base", "small", "medium", "large"
# Larger models are more accurate but slower
MODEL_SIZE = "medium"  # @param ["tiny", "base", "small", "medium", "large"]

# Language of the audio
# Use "auto" for automatic detection or specify language code (e.g., "en", "zh", "es")
LANGUAGE = "auto"  # @param {type:"string"}

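# Email notification settings (optional)
# Notifications are sent through Gmail's SMTP server; use an app-specific password
ENABLE_EMAIL = False  # @param {type:"boolean"}
EMAIL_FROM = ""  # @param {type:"string"}
EMAIL_PASSWORD = ""  # @param {type:"string"}
EMAIL_TO = ""  # @param {type:"string"}

# Processing mode
# "latest": only the most recently modified file
# "all": every supported file in AUDIO_DIR
# "specific": the single file named in SPECIFIC_FILE (a file name inside AUDIO_DIR)
PROCESS_MODE = "latest"  # @param ["latest", "all", "specific"]
SPECIFIC_FILE = ""  # @param {type:"string"}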

print("✅ Configuration saved!")
print(f"📁 Audio directory: {AUDIO_DIR}")
print(f"📁 Output directory: {OUTPUT_DIR}")
print(f"🤖 Model: {MODEL_SIZE}")
print(f"🌐 Language: {LANGUAGE}")
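
Since a typo in these settings only surfaces later, it may help to check them right away. The following is a minimal sketch, assuming the setting names used above; `config_problems` and `VALID_MODELS` are hypothetical names introduced here for illustration.

```python
import os

# Model names offered in the configuration form above
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def config_problems(audio_dir, model_size, process_mode, specific_file=""):
    """Return a list of readable problems; an empty list means the settings look usable."""
    problems = []
    if not os.path.isdir(audio_dir):
        problems.append(f"Audio directory does not exist: {audio_dir}")
    if model_size not in VALID_MODELS:
        problems.append(f"Unknown model size: {model_size}")
    if process_mode not in ("latest", "all", "specific"):
        problems.append(f"Unknown processing mode: {process_mode}")
    if process_mode == "specific" and not specific_file:
        problems.append("PROCESS_MODE is 'specific' but SPECIFIC_FILE is empty")
    return problems


# Example with made-up values; in the notebook you would pass
# AUDIO_DIR, MODEL_SIZE, PROCESS_MODE and SPECIFIC_FILE instead.
for problem in config_problems("/no/such/folder", "medium", "specific"):
    print("Problem:", problem)
```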

## Step 4: Core Functions

In [None]:
import os
import glob
import json
import whisper
from datetime import datetime
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

class WhisperTranscriber:
    def __init__(self):
        """Initialize the transcriber with configuration"""
        self.audio_dir = AUDIO_DIR
        self.output_dir = OUTPUT_DIR
        self.model_size = MODEL_SIZE
        self.language = LANGUAGE if LANGUAGE != "auto" else None
        self.model = None
        
        # Supported file formats
        self.supported_formats = [".m4a", ".mp3", ".wav", ".mp4", ".avi", ".mov", ".mkv"]
        
        # Create output directory if it doesn't exist
        os.makedirs(self.output_dir, exist_ok=True)
        
    def load_model(self):
        """Load the Whisper model"""
        if self.model is None:
            print(f"🔄 Loading Whisper {self.model_size} model...")
            self.model = whisper.load_model(self.model_size)
            print("✅ Model loaded successfully!")
        return self.model
    
    def find_files(self, specific_file=None):
        """Find audio/video files in the input directory"""
        if specific_file:
            file_path = os.path.join(self.audio_dir, specific_file)
            if os.path.exists(file_path):
                return [file_path]
            else:
                print(f"❌ File not found: {file_path}")
                return []
        
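        # Match extensions case-insensitively so files such as "TALK.MP3" are found too
        all_files = [
            path
            for path in glob.glob(os.path.join(self.audio_dir, "*"))
            if os.path.isfile(path)
            and os.path.splitext(path)[1].lower() in self.supported_formats
        ]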
        
        if not all_files:
            print(f"❌ No supported files found in {self.audio_dir}")
            print(f"   Supported formats: {', '.join(self.supported_formats)}")
            return []
        
        # Sort by modification time (newest first)
        all_files.sort(key=os.path.getmtime, reverse=True)
        return all_files
    
    def transcribe_file(self, file_path):
        """Transcribe a single audio/video file"""
        # Load model if not already loaded
        model = self.load_model()
        
        # Get file info
        file_name = os.path.basename(file_path)
        file_base = os.path.splitext(file_name)[0]
        
        # Create output subdirectory for this file
        output_subdir = os.path.join(self.output_dir, file_base)
        os.makedirs(output_subdir, exist_ok=True)
        
        print(f"\n🎯 Processing: {file_name}")
        print(f"📁 Output folder: {output_subdir}")
        
        try:
            # Transcribe with Whisper
            print(f"🔄 Transcribing... (this may take a while)")
            start_time = datetime.now()
            
            result = model.transcribe(
                file_path,
                language=self.language,
                verbose=False
            )
            
            end_time = datetime.now()
            duration = (end_time - start_time).total_seconds()
            
            # Save outputs
            self.save_outputs(result, output_subdir, file_base)
            
            print(f"✅ Transcription complete! (Time: {duration:.1f}s)")
            print(f"📄 Text saved to: {output_subdir}/{file_base}.txt")
            print(f"📄 Subtitles saved to: {output_subdir}/{file_base}.srt")
            
            return {
                'status': 'success',
                'file': file_name,
                'output_dir': output_subdir,
                'duration': duration,
                'text_preview': result['text'][:200] + '...' if len(result['text']) > 200 else result['text']
            }
            
        except Exception as e:
            print(f"❌ Error transcribing {file_name}: {str(e)}")
            return {
                'status': 'failed',
                'file': file_name,
                'error': str(e)
            }
    
    def save_outputs(self, result, output_dir, base_name):
        """Save transcription results in multiple formats"""
        # Save plain text
        with open(os.path.join(output_dir, f"{base_name}.txt"), "w", encoding="utf-8") as f:
            f.write(result["text"])
        
        # Save JSON with timestamps
        with open(os.path.join(output_dir, f"{base_name}.json"), "w", encoding="utf-8") as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
        
        # Save SRT subtitles
        self.save_srt(result["segments"], os.path.join(output_dir, f"{base_name}.srt"))
        
        # Save VTT subtitles
        self.save_vtt(result["segments"], os.path.join(output_dir, f"{base_name}.vtt"))
    
    def save_srt(self, segments, output_path):
        """Save subtitles in SRT format"""
        with open(output_path, "w", encoding="utf-8") as f:
            for i, segment in enumerate(segments, 1):
                start = self.format_timestamp(segment["start"], "srt")
                end = self.format_timestamp(segment["end"], "srt")
                text = segment["text"].strip()
                f.write(f"{i}\n{start} --> {end}\n{text}\n\n")
    
    def save_vtt(self, segments, output_path):
        """Save subtitles in WebVTT format"""
        with open(output_path, "w", encoding="utf-8") as f:
            f.write("WEBVTT\n\n")
            for segment in segments:
                start = self.format_timestamp(segment["start"], "vtt")
                end = self.format_timestamp(segment["end"], "vtt")
                text = segment["text"].strip()
                f.write(f"{start} --> {end}\n{text}\n\n")
    
    def format_timestamp(self, seconds, format_type="srt"):
        """Format timestamp for subtitle files"""
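        # Work in whole milliseconds: truncating (seconds % 1) * 1000 can lose
        # a millisecond to floating-point error (1.001 s would become ,000)
        total_ms = int(round(seconds * 1000))
        hours, remainder = divmod(total_ms, 3_600_000)
        minutes, remainder = divmod(remainder, 60_000)
        secs, millis = divmod(remainder, 1000)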
        
        if format_type == "srt":
            return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
        else:  # vtt
            return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
    
    def send_notification(self, results):
        """Send email notification with results"""
        if not ENABLE_EMAIL or not EMAIL_FROM or not EMAIL_TO:
            return
        
        try:
            # Prepare email content
            subject = "Whisper Transcription Complete"
            
            body = "Transcription Results:\n\n"
            for result in results:
                if result['status'] == 'success':
                    body += f"✅ {result['file']}\n"
                    body += f"   Output: {result['output_dir']}\n"
                    body += f"   Time: {result['duration']:.1f}s\n\n"
                else:
                    body += f"❌ {result['file']}\n"
                    body += f"   Error: {result['error']}\n\n"
            
            # Send email
            msg = MIMEMultipart()
            msg['From'] = EMAIL_FROM
            msg['To'] = EMAIL_TO
            msg['Subject'] = subject
            msg.attach(MIMEText(body, 'plain'))
            
            with smtplib.SMTP('smtp.gmail.com', 587) as server:
                server.starttls()
                server.login(EMAIL_FROM, EMAIL_PASSWORD)
                server.send_message(msg)
            
            print("📧 Email notification sent!")
        except Exception as e:
            print(f"⚠️ Failed to send email: {e}")

# Create transcriber instance
transcriber = WhisperTranscriber()
print("✅ Transcriber initialized!")
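
To make the subtitle output concrete, here is a standalone sketch of how a list of segments maps to SRT text. The two segments are made up for illustration, and `srt_timestamp` and `segments_to_srt` are hypothetical helpers that mirror the idea of `save_srt` without writing to disk.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm using whole milliseconds."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text: a counter, a time range, the text, then a blank line."""
    blocks = []
    for index, seg in enumerate(segments, 1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up segments shaped like the entries in result["segments"]
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3725.25, "text": " Goodbye."},
]
print(segments_to_srt(sample))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello there.
#
# 2
# 00:00:02,500 --> 01:02:05,250
# Goodbye.
```

WebVTT differs mainly in the `WEBVTT` header line and in using a period instead of a comma before the milliseconds.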

## Step 5: Run Transcription

Choose one of the following options based on your needs:

### Option A: Process Latest File
Run this cell to transcribe the most recently modified file in your audio directory.

In [None]:
# Process the latest file
if PROCESS_MODE == "latest" or PROCESS_MODE == "":
    files = transcriber.find_files()
    if files:
        print(f"📋 Found {len(files)} file(s) in total")
        print(f"🎯 Processing latest file: {os.path.basename(files[0])}")
        
        result = transcriber.transcribe_file(files[0])
        
        if result['status'] == 'success':
            print("\n" + "="*50)
            print("🎉 SUCCESS! Transcription complete.")
            print("="*50)
            print(f"\n📝 Preview of transcription:")
            print(result['text_preview'])
            
            # Send notification
            transcriber.send_notification([result])
    else:
        print("❌ No files found to process")

### Option B: Process All Files
Run this cell to transcribe ALL files in your audio directory.

In [None]:
# Process all files
if PROCESS_MODE == "all":
    files = transcriber.find_files()
    if files:
        print(f"📋 Found {len(files)} file(s) to process")
        print("="*50)
        
        results = []
        for i, file_path in enumerate(files, 1):
            print(f"\n[{i}/{len(files)}] Processing...")
            result = transcriber.transcribe_file(file_path)
            results.append(result)
        
        # Summary
        print("\n" + "="*50)
        print("📊 BATCH PROCESSING COMPLETE")
        print("="*50)
        
        success_count = sum(1 for r in results if r['status'] == 'success')
        print(f"✅ Successful: {success_count}/{len(results)}")
        print(f"❌ Failed: {len(results) - success_count}/{len(results)}")
        
        # Send notification
        transcriber.send_notification(results)
    else:
        print("❌ No files found to process")
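
Colab sessions can disconnect partway through a long batch, and re-running the cell above starts again from the first file. One possible way to resume is to skip files that already have a transcript. This is a sketch, not part of the notebook: `pending_files` is a hypothetical helper, and it assumes the output layout used by `transcribe_file` (`OUTPUT_DIR/<name>/<name>.txt`).

```python
import os


def pending_files(files, output_dir):
    """Keep only the files whose expected .txt transcript does not exist yet."""
    todo = []
    for path in files:
        base = os.path.splitext(os.path.basename(path))[0]
        transcript = os.path.join(output_dir, base, f"{base}.txt")
        if not os.path.exists(transcript):
            todo.append(path)
    return todo


# Possible use in the batch cell (names from the notebook):
# files = pending_files(transcriber.find_files(), OUTPUT_DIR)
```

Note that two inputs with the same base name, such as `talk.mp3` and `talk.mp4`, share one output folder, so the second would be treated as already done.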

### Option C: Process Specific File
Run this cell to transcribe a specific file by name.

In [None]:
# Process specific file
if PROCESS_MODE == "specific" and SPECIFIC_FILE:
    print(f"🎯 Looking for file: {SPECIFIC_FILE}")
    
    files = transcriber.find_files(SPECIFIC_FILE)
    if files:
        result = transcriber.transcribe_file(files[0])
        
        if result['status'] == 'success':
            print("\n" + "="*50)
            print("🎉 SUCCESS! Transcription complete.")
            print("="*50)
            print(f"\n📝 Preview of transcription:")
            print(result['text_preview'])
            
            # Send notification
            transcriber.send_notification([result])
    else:
        print(f"❌ File '{SPECIFIC_FILE}' not found in {AUDIO_DIR}")

## Step 6: View Results

Run this cell to see all transcription results in your output directory.

In [None]:
# Display output directory structure
import os

def show_directory_tree(path, prefix="", max_depth=3, current_depth=0):
    """Display directory tree structure"""
    if current_depth >= max_depth:
        return
    
    items = os.listdir(path)
    items.sort()
    
    for i, item in enumerate(items):
        item_path = os.path.join(path, item)
        is_last = i == len(items) - 1
        
        # Print current item
        print(prefix + ("└── " if is_last else "├── ") + item)
        
        # Recursively print subdirectories
        if os.path.isdir(item_path):
            extension = "    " if is_last else "│   "
            show_directory_tree(item_path, prefix + extension, max_depth, current_depth + 1)

print(f"📁 Output Directory: {OUTPUT_DIR}")
print("="*50)

if os.path.exists(OUTPUT_DIR):
    show_directory_tree(OUTPUT_DIR)
else:
    print("Output directory is empty or doesn't exist yet.")

# Count total files
total_files = sum(len(files) for _, _, files in os.walk(OUTPUT_DIR))
print("\n" + "="*50)
print(f"📊 Total files: {total_files}")

## Additional Tools

### Download Results as ZIP
Run this cell to create a ZIP file of all transcriptions for easy download.

In [None]:
# Create ZIP file of all transcriptions
import zipfile
from datetime import datetime

zip_filename = f"transcriptions_{datetime.now().strftime('%Y%m%d_%H%M%S')}.zip"
zip_path = f"/content/{zip_filename}"

with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for root, dirs, files in os.walk(OUTPUT_DIR):
        for file in files:
            file_path = os.path.join(root, file)
            arcname = os.path.relpath(file_path, OUTPUT_DIR)
            zipf.write(file_path, arcname)

print(f"✅ ZIP file created: {zip_filename}")
print(f"📥 Download it from the Files panel on the left")

# Also save to Google Drive
drive_zip_path = os.path.join(os.path.dirname(OUTPUT_DIR), zip_filename)
!cp "{zip_path}" "{drive_zip_path}"
print(f"💾 Also saved to Google Drive: {drive_zip_path}")

### Clean Up Output Directory
Run this cell to remove all transcription files (use with caution!).

In [None]:
# WARNING: This will delete all transcription files!
import shutil

confirm = input("⚠️ This will delete all transcription files. Type 'DELETE' to confirm: ")

if confirm == "DELETE":
    if os.path.exists(OUTPUT_DIR):
        shutil.rmtree(OUTPUT_DIR)
        os.makedirs(OUTPUT_DIR)
        print("✅ Output directory cleaned")
    else:
        print("Output directory doesn't exist")
else:
    print("❌ Cleanup cancelled")

## Tips & Troubleshooting

### Model Selection Guide
- **tiny**: Fastest, least accurate (good for quick tests)
- **base**: Fast, reasonable accuracy
- **small**: Balanced speed and accuracy
- **medium**: Good accuracy, slower (recommended for most uses)
- **large**: Best accuracy, very slow (for important transcriptions)
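
The guide above can be turned into a small lookup that picks the largest model likely to fit in GPU memory. This is a rough sketch: the memory figures are the approximate values listed in the Whisper README and should be treated as guidance only, and `largest_model_that_fits` is a hypothetical helper.

```python
# Approximate VRAM needs in GB per model, as listed in the Whisper README (rough guidance)
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Models ordered from smallest to largest
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def largest_model_that_fits(available_gb):
    """Return the largest model whose approximate VRAM need fits, or None if none fits."""
    fitting = [name for name in MODEL_ORDER if APPROX_VRAM_GB[name] <= available_gb]
    return fitting[-1] if fitting else None


print(largest_model_that_fits(4))   # small
print(largest_model_that_fits(15))  # large
```

Fitting in memory says nothing about speed, so a smaller model may still be the better choice for drafts.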

### Common Issues

1. **Out of memory error**: Try using a smaller model (tiny or base)
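2. **File not found**: Check that Google Drive is mounted and that `AUDIO_DIR` and `SPECIFIC_FILE` match the folder and file names exactly, including upper and lower case. Special characters in file names can also cause problems
3. **Slow processing**: This is normal for longer files. Consider switching to a GPU runtime (Runtime > Change runtime type > GPU)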
4. **Email not sending**: Make sure you're using an app-specific password for Gmail, not your regular password
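
For "file not found" problems, listing what is actually in the folder often shows the cause. The sketch below reports, for each entry, whether its extension is in the supported list; `describe_directory` and `SUPPORTED` are hypothetical names, with the extension list copied from `supported_formats`.

```python
import os

# Same extensions as supported_formats in the transcriber class
SUPPORTED = {".m4a", ".mp3", ".wav", ".mp4", ".avi", ".mov", ".mkv"}


def describe_directory(path):
    """Return (name, status) pairs describing each entry in the folder."""
    if not os.path.isdir(path):
        return [(path, "directory not found (is Google Drive mounted?)")]
    report = []
    for name in sorted(os.listdir(path)):
        ext = os.path.splitext(name)[1]
        if os.path.isdir(os.path.join(path, name)):
            status = "subfolder (not searched)"
        elif ext in SUPPORTED:
            status = "supported"
        elif ext.lower() in SUPPORTED:
            status = "extension case differs from the supported list"
        else:
            status = "unsupported extension"
        report.append((name, status))
    return report


# Example with a made-up path; in the notebook you would pass AUDIO_DIR
for name, status in describe_directory("/no/such/folder"):
    print(f"{name}: {status}")
```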

### Performance Tips
- Enable a GPU runtime in Colab; Whisper generally runs much faster on a GPU than on a CPU
- Process files in batches during off-peak hours
- Use smaller models for draft transcriptions
- Split very long files into smaller segments
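
For the last tip, one way to split a long recording is ffmpeg's `segment` muxer, which cuts the input into fixed-length pieces without re-encoding. The sketch below only builds the command; `build_split_command`, the 600-second default, and the example paths are assumptions made for illustration.

```python
import os


def build_split_command(input_path, output_dir, segment_seconds=600):
    """Build an ffmpeg command that cuts input_path into fixed-length pieces."""
    base, ext = os.path.splitext(os.path.basename(input_path))
    # %03d is filled in by ffmpeg with the piece number (000, 001, ...)
    pattern = os.path.join(output_dir, f"{base}_%03d{ext}")
    return [
        "ffmpeg", "-i", input_path,
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-c", "copy",  # copy the streams instead of re-encoding
        pattern,
    ]


cmd = build_split_command("/content/drive/MyDrive/audio_files/lecture.mp3", "/content/chunks")
print(" ".join(cmd))

# To run it (requires ffmpeg and an existing output folder):
# import subprocess
# subprocess.run(cmd, check=True)
```

Each piece is then transcribed as its own file, so subtitle timestamps restart at zero for every piece, and a cut can fall in the middle of a sentence.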