# 🎵 A2SB: Audio-to-Audio Schrödinger Bridge - Complete Edition

**High-Quality Audio Restoration with NVIDIA A2SB**

This notebook includes **everything** you need: model download, setup, and Gradio web interface!

## ⚠️ IMPORTANT: Colab Pro Required

**This notebook requires Google Colab Pro or Colab Pro+ subscription!**

### Why Colab Pro?
- 🎯 **GPU Memory**: Requires 15GB+ GPU VRAM (T4 or better)
- ⏱️ **Processing Time**: Long audio files need extended runtime
- 💾 **RAM**: Minimum 25GB system RAM recommended
- 🚀 **Performance**: Better GPUs (V100/A100) for faster processing

### Free Colab Limitations:
- ❌ Limited GPU memory (may cause Out of Memory errors)
- ❌ Short session timeouts
- ❌ Lower priority GPU access
- ❌ May disconnect during long processing

### Get Colab Pro:
👉 [Subscribe to Colab Pro](https://colab.research.google.com/signup) - Starting at $9.99/month

---

## 🌟 Features
- ✅ 44.1kHz high-resolution music restoration
- ✅ Bandwidth extension (high-frequency prediction)
- ✅ Audio inpainting (reconstruct missing segments)
- ✅ Support for long audio files (hours)
- ✅ End-to-end, no vocoder required
- ✅ **Gradio Web Interface** - User-friendly UI

## 📚 Resources
- 📄 [Paper](https://arxiv.org/abs/2501.11311)
- 💻 [GitHub Repository](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🎬 [Original NVIDIA Demo](https://research.nvidia.com/labs/adlr/A2SB/)
- 🤗 [Models](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

---

**Usage:** Run cells in order. The last cell will launch the Gradio interface!

## 🔍 Step 0: Check Your Colab Subscription

**Run this cell to check your GPU and system RAM.** It cannot read your subscription tier directly; the GPU model it reports is only a hint.

In [None]:
import subprocess
import sys

print("="*60)
print("🔍 CHECKING COLAB ENVIRONMENT")
print("="*60)

# Check GPU
try:
    gpu_info = subprocess.check_output(['nvidia-smi', '--query-gpu=name,memory.total', '--format=csv,noheader'], text=True)
    # nvidia-smi prints one line per GPU; use the first one
    gpu_name, gpu_memory = gpu_info.strip().splitlines()[0].split(', ')
    gpu_memory_gb = int(gpu_memory.split()[0]) / 1024
    
    print(f"\n✓ GPU Detected: {gpu_name}")
    print(f"✓ GPU Memory: {gpu_memory_gb:.1f} GB")
    
    # Check if sufficient
    if gpu_memory_gb < 14:
        print("\n" + "="*60)
        print("⚠️  WARNING: INSUFFICIENT GPU MEMORY")
        print("="*60)
        print(f"Your GPU has {gpu_memory_gb:.1f} GB memory.")
        print("This notebook requires at least 15GB GPU memory.")
        print("\n🔴 RECOMMENDATION: Upgrade to Colab Pro for better GPUs!")
        print("\nYou may experience:")
        print("  - Out of Memory errors")
        print("  - Failed inference")
        print("  - Slow processing")
        print("\n👉 Get Colab Pro: https://colab.research.google.com/signup")
        print("="*60)
    else:
        print("\n✅ GPU memory is sufficient!")
        if 'V100' in gpu_name or 'A100' in gpu_name or 'P100' in gpu_name:
            print("🎉 You have a premium GPU! (typical of Colab Pro)")
        elif 'T4' in gpu_name:
            print("✓ T4 GPU detected - Good for this notebook")
            
except Exception as e:
    # Any failure here (missing nvidia-smi, unexpected output) is treated as "no GPU"
    print(f"\n❌ ERROR: No GPU detected! ({e})")
    print("\nPlease enable GPU:")
    print("1. Go to Runtime > Change runtime type")
    print("2. Select 'GPU' as Hardware accelerator")
    print("3. Click Save")
    sys.exit(1)

# Check RAM
try:
    with open('/proc/meminfo', 'r') as f:
        meminfo = f.read()
    mem_total = int([line for line in meminfo.split('\n') if 'MemTotal' in line][0].split()[1]) / 1024 / 1024
    print(f"\n✓ System RAM: {mem_total:.1f} GB")
    
    if mem_total < 20:
        print("⚠️  Low RAM detected. Consider upgrading to Colab Pro for 25GB+ RAM.")
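except (OSError, ValueError, IndexError):
    # /proc/meminfo is missing or has an unexpected format; the RAM check is optional
    print("\n⚠️  Could not read system RAM; skipping RAM check.")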

print("\n" + "="*60)
print("✅ Environment check complete!")
print("="*60)
print("\nIf you see warnings above, we strongly recommend:")
print("👉 Upgrade to Colab Pro: https://colab.research.google.com/signup")
print("\nOtherwise, you may proceed at your own risk.")
print("="*60)
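
`nvidia-smi` prints one line per GPU, so on a runtime with more than one GPU it can help to parse every line. The following is a minimal sketch, not part of the original notebook; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess

def parse_gpu_csv(csv_text):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output.

    Returns a list of (name, memory_gb) tuples, one per GPU line.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        # Each line looks like "Tesla T4, 15360 MiB"
        name, memory = line.rsplit(", ", 1)
        gpus.append((name.strip(), int(memory.split()[0]) / 1024))
    return gpus

def query_gpus():
    """Run nvidia-smi and return the parsed list, or [] if it is unavailable."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            text=True,
        )
        return parse_gpu_csv(out)
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        return []
```

Calling `query_gpus()` returns an empty list when no GPU is attached, so the caller can decide how to report that case.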

## 📦 1. Setup and Dependencies

**This will take 5-10 minutes. Please be patient!**

In [None]:
# Clone the optimized repository
print("📥 Cloning repository...\n")
!git clone https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git
%cd diffusion-audio-restoration-colab-Kaggle-
print("\n✅ Repository cloned successfully!")

In [None]:
# Install required libraries using requirements.txt
print("📦 Installing dependencies from requirements.txt...\n")
print("⏱️  This will take 5-10 minutes. Please wait...\n")

# Install from requirements.txt
!pip install -q -r requirements.txt

# Install PyTorch with CUDA support (if not already installed)
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install nest-asyncio for Gradio
!pip install -q nest-asyncio

print("\n" + "="*60)
print("✅ Installation complete! Verifying...")
print("="*60)

import torch
import lightning
import gradio as gr
import nest_asyncio

# Fix event loop issue
nest_asyncio.apply()

print(f"\n✓ PyTorch: {torch.__version__}")
print(f"✓ Lightning: {lightning.__version__}")
print(f"✓ Gradio: {gr.__version__}")
print(f"✓ CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"✓ CUDA Version: {torch.version.cuda}")
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# SSR Eval check
try:
    import ssr_eval
    print("✓ SSR Eval: Installed")
except ImportError:
    print("⚠️ SSR Eval: Not installed (optional)")

print("\n🎉 All libraries successfully installed!\n")
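
If an import fails, it can be quicker to list which packages are present than to re-run the whole install. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are assumed from the imports used in this notebook.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names are assumptions based on the imports in this notebook
for name, version in installed_versions(
    ["torch", "lightning", "gradio", "nest-asyncio", "PyYAML"]
).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```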

## 📥 2. Download Model Files

We'll download two model checkpoints:
- **One-split (0.0-1.0)**: Covers the full time range (~1.5GB)
- **Two-split (0.5-1.0)**: Covers the second half of the time range (~1.5GB)

**Total download: ~3GB. This will take 5-10 minutes.**
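
Before starting a 3GB download it may be worth confirming that the runtime has enough free disk space. This is a minimal sketch using the standard library; the helper names and the 1 GB safety margin are assumptions, not values from the original notebook.

```python
import shutil

def free_space_gb(path="."):
    """Return free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 1024**3

def enough_space(required_gb, path="."):
    """True if at least `required_gb` GiB are free at `path`."""
    return free_space_gb(path) >= required_gb

# ~3 GB of checkpoints plus a 1 GB safety margin (the margin is an assumption)
REQUIRED_GB = 4
if enough_space(REQUIRED_GB):
    print(f"OK: {free_space_gb():.1f} GB free")
else:
    print(f"Only {free_space_gb():.1f} GB free; about {REQUIRED_GB} GB needed")
```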

In [None]:
# Create checkpoint directory
!mkdir -p ckpt
print("✓ Checkpoint directory created")

In [None]:
import os

print("📥 Downloading model checkpoints...\n")
print("⏱️  This will take 5-10 minutes depending on your connection.\n")

# Model files
models = {
    'onesplit': {
        'path': 'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt',
        'url': 'https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_onesplit_0.0_1.0_release.ckpt'
    },
    'twosplit': {
        'path': 'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt',
        'url': 'https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
    }
}

# Check and download each model
for name, info in models.items():
    # wget -O creates the output file even when the download fails,
    # so an empty file is treated as missing
    if os.path.exists(info['path']) and os.path.getsize(info['path']) > 0:
        size_mb = os.path.getsize(info['path']) / (1024 * 1024)
        print(f"✓ {name} model already exists ({size_mb:.2f} MB)")
    else:
        print(f"⬇️  Downloading {name} model (~1.5GB)...")
        !wget -q --show-progress -O {info['path']} {info['url']}
        if os.path.exists(info['path']) and os.path.getsize(info['path']) > 0:
            size_mb = os.path.getsize(info['path']) / (1024 * 1024)
            print(f"✅ {name} model downloaded ({size_mb:.2f} MB)")
        else:
            print(f"❌ Failed to download {name} model!")
            print("Please check your internet connection and try again.")

print("\n" + "="*50)
print("✅ Model download step finished! Check the messages above for any failures.")
print("="*50)

## ⚙️ 3. Configuration

Update the configuration file with the correct model paths.
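
Indexing `config['model']` directly fails if the YAML file is empty or has no `model` section. A more defensive update could look like the sketch below; the helper name `with_checkpoints` and the example keys are hypothetical and only illustrate the idea.

```python
import copy

def with_checkpoints(config, checkpoint_paths):
    """Return a copy of `config` with model.pretrained_checkpoints replaced.

    Tolerates an empty file (yaml.safe_load returns None) and a missing or
    empty 'model' section, and leaves every other key untouched.
    """
    updated = copy.deepcopy(config) if config else {}
    model = updated.get("model") or {}
    model["pretrained_checkpoints"] = list(checkpoint_paths)
    updated["model"] = model
    return updated

# Hypothetical config contents, for illustration only
example = {"model": {"other_key": 1}, "sampling": {"steps": 50}}
print(with_checkpoints(example, ["ckpt/a.ckpt", "ckpt/b.ckpt"]))
```

Because the function returns a copy, the loaded config stays unchanged until the result is written back to disk.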

In [None]:
import yaml

print("⚙️  Updating configuration...\n")

# Update config file
config_path = 'configs/ensemble_2split_sampling.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

config['model']['pretrained_checkpoints'] = [
    'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt',
    'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
]

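with open(config_path, 'w') as f:
    # sort_keys=False keeps the original key order (comments in the file are not preserved)
    yaml.dump(config, f, sort_keys=False)

print("✅ Configuration updated successfully!")
print("\nModel paths:")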
for i, path in enumerate(config['model']['pretrained_checkpoints'], 1):
    print(f"  {i}. {path}")

## 🎨 4. Launch Gradio Web Interface

### 🚀 Ready to restore audio!

**Features:**
- 📤 Drag-and-drop file upload
- 🎤 Microphone recording
- ⚙️ Advanced settings (sampling steps, cutoff frequency)
- 📊 Real-time progress tracking
- 🔊 Instant playback and comparison
- 📈 Spectral analysis visualization

**How to use:**
1. **Run the cell below** - Wait for the Gradio link to appear
2. **Click the link** (usually ends with `.gradio.live`)
3. **Upload audio** or record from microphone
4. **Choose mode:**
   - **Bandwidth Extension**: Restore high frequencies (for low-quality MP3s)
   - **Inpainting**: Fill in missing audio segments
5. **Adjust settings** (optional):
   - Sampling Steps: 25-100 (higher = better quality, slower)
   - Auto Cutoff: Automatically detect cutoff frequency
   - Inpainting Length: 0.1-1.0 seconds
6. **Click "🚀 Restore"** and wait for processing
7. **Listen & Download** the restored audio

**Tips:**
- Start with default settings (50 steps, auto cutoff)
- For faster results: 25-30 steps
- For best quality: 75-100 steps
- Processing time: ~2-3 minutes per 10 seconds of audio
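
The rule of thumb above can be turned into a rough estimate for a clip of any length. The sketch assumes that time grows linearly with audio length at the default 50 steps, which is a simplification; the function name `estimate_minutes` is hypothetical.

```python
def estimate_minutes(duration_s, low_per_10s=2.0, high_per_10s=3.0):
    """Rough (low, high) processing time in minutes at the default 50 steps.

    Assumes time scales linearly with audio length (a simplification).
    """
    blocks = duration_s / 10.0  # number of 10-second blocks
    return blocks * low_per_10s, blocks * high_per_10s

low, high = estimate_minutes(45)
print(f"45 s clip: roughly {low:.1f} to {high:.1f} minutes")
```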

**⚠️ Important:**
- Keep this notebook tab open during processing
- Don't close the Colab session
- If you get "Out of Memory" error, reduce sampling steps or audio length

In [None]:
# Launch Gradio interface
print("🚀 Launching Gradio interface...\n")
print("⏱️  Please wait for the link to appear below.\n")
print("="*60)

# This call blocks for as long as the app is running;
# the lines after it run only once the app has stopped
!python gradio_app.py

print("\n" + "="*60)
print("ℹ️  Gradio interface stopped.")
print("Re-run this cell to launch it again.")
print("="*60)

## 📚 5. Tips and Troubleshooting

### ⚡ Performance Optimization

**GPU Requirements:**
- ✅ **Colab Pro**: T4 (16GB), V100 (16GB), or A100 (40GB)
- ⚠️ **Free Colab**: May work but expect Out of Memory errors

**Processing Times (on T4 GPU):**
- 10 seconds audio, 50 steps: ~2-3 minutes
- 30 seconds audio, 50 steps: ~5-7 minutes
- 60 seconds audio, 50 steps: ~10-15 minutes

### 🎯 Quality Settings

**Sampling Steps:**
- **25-30:** Fast (good quality)
- **50-75:** Balanced (excellent quality) ⭐ Recommended
- **75-100:** Best (outstanding quality)

**Cutoff Frequency (Bandwidth Extension):**
- **Auto-detect**: Usually best ⭐ Recommended
- **Manual adjustment:**
  - Low-quality MP3: 2000-4000 Hz
  - Medium quality: 4000-8000 Hz
  - High quality: 8000+ Hz
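
To get a feel for these cutoff values, it helps to compare them with the Nyquist frequency (22050 Hz for 44.1kHz audio): the lower the cutoff, the larger the share of the band that has to be regenerated. The following is a small arithmetic sketch; the function name is hypothetical.

```python
def missing_band_fraction(cutoff_hz, sample_rate=44100):
    """Fraction of the 0..Nyquist band that lies above `cutoff_hz`."""
    nyquist = sample_rate / 2
    if not 0 <= cutoff_hz <= nyquist:
        raise ValueError(f"cutoff_hz must be between 0 and {nyquist:.0f}")
    return 1 - cutoff_hz / nyquist

for cutoff in (2000, 4000, 8000):
    share = missing_band_fraction(cutoff)
    print(f"{cutoff} Hz cutoff: {share:.0%} of the band lies above it")
```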

**Inpainting Length:**
- 0.1-0.3s: Small gaps or clicks
- 0.3-0.5s: Medium gaps
- 0.5-1.0s: Large missing segments

### 🔧 Troubleshooting

#### ❌ CUDA Out of Memory Error

**Solutions:**
1. **Reduce sampling steps** to 25-30
2. **Split audio** into shorter segments (10-20 seconds)
3. **Restart runtime**: Runtime > Restart runtime
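4. **Clear GPU memory**: Run the snippet below (it frees memory cached by the notebook kernel; memory held by a running `gradio_app.py` process is released only when that process stops)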
5. **Upgrade to Colab Pro** for better GPUs

```python
# Clear GPU memory
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
print("✅ GPU memory cleared")
```
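
For solution 2, segment boundaries can be computed before slicing the audio array. This is a minimal sketch with a hypothetical helper name; it makes hard cuts, so the restored segments may need a short crossfade when they are joined again.

```python
def segment_bounds(total_samples, sample_rate=44100, segment_seconds=15.0):
    """Return (start, end) sample indices covering the audio in fixed-length chunks.

    The last chunk may be shorter than the others.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    step = int(round(segment_seconds * sample_rate))
    return [(start, min(start + step, total_samples))
            for start in range(0, total_samples, step)]

# A 40-second clip at 44.1 kHz split into 15-second segments
for start, end in segment_bounds(40 * 44100):
    print(f"{start / 44100:.1f}s - {end / 44100:.1f}s")
```

Each `(start, end)` pair can then be used to slice the loaded samples, for example `y[start:end]`.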

#### ❌ Model Not Found Error

**Solutions:**
1. Re-run the model download cells (Section 2)
2. Check your internet connection
3. Verify files exist:
```python
!ls -lh ckpt/
```
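
A file can exist and still be unusable if the download was interrupted. The sketch below flags suspiciously small checkpoints; the helper name and the 100 MB threshold are assumptions chosen only because the released checkpoints are roughly 1.5GB each.

```python
from pathlib import Path

def checkpoint_report(directory="ckpt", min_mb=100):
    """Return {filename: (size_mb, looks_ok)} for every .ckpt file in `directory`.

    `min_mb` is an arbitrary threshold (an assumption) for spotting truncated
    or empty downloads.
    """
    report = {}
    for path in sorted(Path(directory).glob("*.ckpt")):
        size_mb = path.stat().st_size / (1024 * 1024)
        report[path.name] = (size_mb, size_mb >= min_mb)
    return report

for name, (size_mb, ok) in checkpoint_report().items():
    print(f"{'OK ' if ok else 'BAD'} {name}: {size_mb:.1f} MB")
```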

#### ❌ Gradio Interface Not Loading

**Solutions:**
1. Wait 30-60 seconds for the link to appear
2. Check if the cell is still running
3. Restart runtime and run all cells again

#### ❌ Audio Format Error

**Solution:** Convert to WAV format
```python
import librosa
import soundfile as sf

# Convert any audio to 44.1 kHz WAV
# Note: librosa.load downmixes to mono by default
y, sr = librosa.load('input.mp3', sr=44100)
sf.write('input.wav', y, sr)
```
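
After converting, the result can be checked with Python's built-in `wave` module, which reads PCM WAV files. This is a minimal sketch; `wav_info` and the demo file name are hypothetical.

```python
import wave

def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        duration = wf.getnframes() / sample_rate
    return channels, sample_rate, duration

# Write one second of 44.1 kHz mono silence, then read it back
with wave.open("silence_demo.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit samples
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 44100)

print(wav_info("silence_demo.wav"))
```

A sample rate other than 44100 in the result suggests the file was not resampled as intended.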

#### ⚠️ Session Timeout

**Solutions:**
1. **Colab Pro**: Longer session times
2. Keep the tab active
3. Process shorter audio files
4. Save intermediate results
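
One way to save intermediate results is to copy finished files to a location that survives the session. The sketch below is an illustration only: the helper name is hypothetical, and the directory you pass as `src_dir` depends on where `gradio_app.py` writes its output, which is not specified here.

```python
import shutil
from pathlib import Path

def backup_outputs(src_dir, dest_dir, pattern="*.wav"):
    """Copy files matching `pattern` from src_dir to dest_dir; return copied names.

    Files already present in dest_dir with the same size are skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob(pattern)):
        target = dest / src.name
        if target.exists() and target.stat().st_size == src.stat().st_size:
            continue  # already backed up
        shutil.copy2(src, target)
        copied.append(src.name)
    return copied
```

In Colab, `dest_dir` could be a folder under `/content/drive/MyDrive` after mounting Google Drive with `google.colab.drive.mount('/content/drive')`.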

### 💡 Best Practices

1. **Start small**: Test with 10-20 second clips first
2. **Use defaults**: 50 steps, auto cutoff works well
3. **Monitor GPU**: Check `nvidia-smi` if issues occur
4. **Save outputs**: Download restored audio immediately
5. **Batch processing**: Process multiple files one at a time

### 📖 License and Usage

- **Model:** NVIDIA OneWay NonCommercial License
- **Code:** NVIDIA Source Code License - Non Commercial
- **Commercial Use:** Contact NVIDIA for licensing
- **Research Use:** Free for academic and research purposes

### 🔗 Additional Resources

- **Paper:** [arXiv:2501.11311](https://arxiv.org/abs/2501.11311)
- **GitHub:** [test4373/diffusion-audio-restoration](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- **Original NVIDIA Repo:** [NVIDIA/diffusion-audio-restoration](https://github.com/NVIDIA/diffusion-audio-restoration)
- **Demo:** [NVIDIA Research](https://research.nvidia.com/labs/adlr/A2SB/)
- **Models:** [HuggingFace](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

### 📧 Support

- **Issues:** [GitHub Issues](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-/issues)
- **Original NVIDIA Issues:** [NVIDIA GitHub](https://github.com/NVIDIA/diffusion-audio-restoration/issues)

---

### 🎉 Thank You!

Thank you for using this notebook!

**Citation:**
```bibtex
@article{kong2025a2sb,
  title={A2SB: Audio-to-Audio Schrodinger Bridges},
  author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2501.11311},
  year={2025}
}
```

### ⭐ Support This Project

If you find this project useful:
- ⭐ Star the [GitHub repository](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🐛 Report bugs or suggest features
- 📢 Share with others who might benefit

---

**Made with ❤️ for the audio restoration community**

**Optimized for Google Colab Pro with GPU memory management and user-friendly interface**