# 🎵 A2SB: Audio Restoration - Google Colab

**High-Quality Audio Restoration with NVIDIA A2SB**

Restore your audio files with AI!

## 🚀 Features
- ✅ 44.1kHz high-resolution audio restoration
- ✅ Bandwidth extension (high-frequency prediction)
- ✅ Audio inpainting (fill missing parts)
- ✅ User-friendly Gradio interface

## 📋 Requirements
- 🎯 **GPU**: T4 or better (Colab Pro recommended)
- 💾 **RAM**: Minimum 12GB
- ⏱️ **Time**: ~2-3 minutes for 10 seconds of audio (T4 GPU, 50 sampling steps)

## 📚 Resources
- 📄 [Paper](https://arxiv.org/abs/2501.11311)
- 💻 [GitHub](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🤗 [Models](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

---

**Usage:** Run cells in order. The last cell will launch the Gradio interface!

## 📦 Step 1: Setup (5-10 minutes)

Clone the repository and install required libraries.

In [None]:
# Clone repository
print("📥 Cloning repository...\n")
!git clone https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git
%cd diffusion-audio-restoration-colab-Kaggle-

# Install libraries
print("\n📦 Installing libraries... (5-10 minutes)\n")
!pip install -q -r requirements.txt
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q nest-asyncio

# Verification
import torch
import gradio as gr
import nest_asyncio
nest_asyncio.apply()

print("\n" + "="*60)
print("✅ Setup complete!")
print("="*60)
print(f"\n✓ PyTorch: {torch.__version__}")
print(f"✓ Gradio: {gr.__version__}")
print(f"✓ CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
print("\n🎉 Ready!\n")

## 📥 Step 2: Download Models (5-10 minutes)

Download two model checkpoints (~4.5GB total).

In [None]:
import os

# Create checkpoint directory
!mkdir -p ckpt

print("📥 Downloading models... (5-10 minutes)\n")

# Model 1: One-split (0.0-1.0)
model1 = 'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt'
if not os.path.exists(model1):
    print("⬇️  Downloading Model 1... (~2.3GB)")
    !wget -q --show-progress https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_onesplit_0.0_1.0_release.ckpt -O {model1}
    print("✅ Model 1 downloaded!\n")
else:
    print("✓ Model 1 already exists\n")

# Model 2: Two-split (0.5-1.0)
model2 = 'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
if not os.path.exists(model2):
    print("⬇️  Downloading Model 2... (~2.3GB)")
    !wget -q --show-progress https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_twosplit_0.5_1.0_release.ckpt -O {model2}
    print("✅ Model 2 downloaded!\n")
else:
    print("✓ Model 2 already exists\n")

print("="*60)
print("✅ All models ready!")
print("="*60)
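
The cell above skips a download whenever the file already exists, so an interrupted `wget` can leave a truncated checkpoint that is never re-fetched. A minimal sketch of a size check follows. The helper name `checkpoint_looks_complete` and the 2 GB threshold are assumptions for illustration (the threshold is derived from the ~2.3GB figure quoted above, not from the model repository).

```python
import os

# Assumed threshold: the notebook quotes ~2.3GB per checkpoint, so anything
# under 2 GB is treated as a truncated download. Adjust if the files change.
MIN_CHECKPOINT_BYTES = 2 * 10**9


def checkpoint_looks_complete(path, min_bytes=MIN_CHECKPOINT_BYTES):
    """Return True if the file exists and is at least min_bytes long."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


for ckpt in (
    "ckpt/A2SB_onesplit_0.0_1.0_release.ckpt",
    "ckpt/A2SB_twosplit_0.5_1.0_release.ckpt",
):
    ok = checkpoint_looks_complete(ckpt)
    print(f"{ckpt}: {'looks complete' if ok else 'missing or truncated'}")
```

If a file is reported as missing or truncated, deleting it and re-running the download cell should fetch it again.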

## ⚙️ Step 3: Configuration

Set up model paths.

In [None]:
import yaml

print("⚙️  Updating configuration...\n")

config_path = 'configs/ensemble_2split_sampling.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

config['model']['pretrained_checkpoints'] = [
    'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt',
    'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
]

with open(config_path, 'w') as f:
    # sort_keys=False keeps the original key order instead of alphabetizing it
    yaml.dump(config, f, sort_keys=False)

print("✅ Configuration updated!")
print("\nModel paths:")
for i, path in enumerate(config['model']['pretrained_checkpoints'], 1):
    print(f"  {i}. {path}")
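
Before launching the interface, it can help to confirm that every path written to the config actually points to a file. The sketch below assumes only the `model.pretrained_checkpoints` structure used in the cell above. `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
import os


def missing_checkpoints(config):
    """Return the configured checkpoint paths that are not files on disk."""
    paths = config["model"]["pretrained_checkpoints"]
    return [p for p in paths if not os.path.isfile(p)]


# Example dict with the same structure the configuration cell writes.
example_config = {
    "model": {
        "pretrained_checkpoints": [
            "ckpt/A2SB_onesplit_0.0_1.0_release.ckpt",
            "ckpt/A2SB_twosplit_0.5_1.0_release.ckpt",
        ]
    }
}

for path in missing_checkpoints(example_config):
    print(f"Missing checkpoint: {path}")
```

In the notebook, the same function could be called on the `config` dict loaded from the YAML file.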

## 🎨 Step 4: Launch Gradio Interface

### 🚀 Ready to restore audio!

**How to use:**
1. Run the cell below
2. Click the link (ends with `.gradio.live`)
3. Upload an audio file or record from the microphone
4. Choose mode:
   - **Bandwidth Extension**: Restore high frequencies (for low-quality MP3s)
   - **Inpainting**: Fill missing parts
5. Click "🚀 Restore"
6. Listen and download the result!

**Settings:**
- **Sampling Steps**: 25-100 (higher = better quality, slower)
  - Fast: 25-30 steps
  - Balanced: 50 steps ⭐ Recommended
  - Best: 75-100 steps
- **Auto Cutoff**: Automatically detect cutoff frequency ⭐ Recommended
- **Inpainting Length**: 0.1-1.0 seconds

**Processing Times (T4 GPU):**
- 10s audio, 50 steps: ~2-3 minutes
- 30s audio, 50 steps: ~5-7 minutes
- 60s audio, 50 steps: ~10-15 minutes
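
For lengths and step counts not listed above, a rough estimate can be interpolated from these three figures. The sketch below uses the midpoint of each quoted range and assumes that processing time scales linearly with the number of sampling steps. That scaling is an assumption rather than a measured result, and `estimate_minutes` is a hypothetical helper.

```python
# Midpoints of the T4 timings quoted above, at 50 sampling steps:
# (audio length in seconds, processing time in minutes)
REFERENCE_POINTS = [(10, 2.5), (30, 6.0), (60, 12.5)]
REFERENCE_STEPS = 50


def estimate_minutes(audio_seconds, steps=REFERENCE_STEPS):
    """Linearly interpolate (or extrapolate) the processing time in minutes."""
    # Use the first segment up to 30s and the second segment beyond it.
    if audio_seconds <= REFERENCE_POINTS[1][0]:
        (x0, y0), (x1, y1) = REFERENCE_POINTS[0], REFERENCE_POINTS[1]
    else:
        (x0, y0), (x1, y1) = REFERENCE_POINTS[1], REFERENCE_POINTS[2]
    base = y0 + (y1 - y0) * (audio_seconds - x0) / (x1 - x0)
    # Assumption: time is proportional to the number of sampling steps.
    return max(base, 0.0) * steps / REFERENCE_STEPS


for seconds, steps in [(20, 50), (45, 50), (30, 100)]:
    print(f"{seconds}s at {steps} steps: ~{estimate_minutes(seconds, steps):.1f} min")
```

Treat the output as a planning aid only. Actual times depend on the GPU assigned to the session and on model loading overhead.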

**⚠️ Important:**
- Keep the notebook open during processing
- If you get an "Out of Memory" error, reduce the number of sampling steps
- Test with short audio files first

In [None]:
# Launch Gradio interface
print("🚀 Launching Gradio interface...\n")
print("⏱️  Please wait for the link to appear.\n")
print("="*60)

!python gradio_app.py

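# The command above blocks while the server is running, so the lines
# below only execute after the app has been stopped.
print("\n" + "="*60)
print("⏹️ Gradio interface stopped.")
print("Re-run this cell to launch it again.")
print("="*60)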

## 💡 Tips and Troubleshooting

### ⚡ Performance

**GPU Requirements:**
- ✅ T4 (16GB) - Colab Pro
- ✅ V100 (32GB) - Colab Pro+
- ✅ A100 (40GB) - Colab Pro+
- ⚠️ Free Colab - May get memory errors

### 🎯 Quality Settings

**Sampling Steps:**
- 25-30: Fast (good quality)
- 50: Balanced (excellent quality) ⭐
- 75-100: Best (outstanding quality)

**Cutoff Frequency (Bandwidth Extension):**
- Auto-detect: Usually best ⭐
- Manual:
  - Low-quality MP3: 2000-4000 Hz
  - Medium quality: 4000-8000 Hz
  - High quality: 8000+ Hz
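
This notebook does not describe how auto-detection works internally. As an illustration only, one common heuristic is to take the highest frequency bin whose magnitude is still within some margin of the spectral peak. The sketch below implements that heuristic in pure Python with a naive DFT. `estimate_cutoff_hz` and the -60 dB threshold are assumptions, not the repository's method.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def estimate_cutoff_hz(samples, sample_rate, rel_threshold_db=-60.0):
    """Return the frequency of the highest bin within rel_threshold_db of the peak."""
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0:
        return 0.0  # silent input: no content to measure
    threshold = peak * 10 ** (rel_threshold_db / 20)
    highest_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return highest_bin * sample_rate / len(samples)


# Synthetic frame: two tones whose frequencies fall exactly on DFT bins.
sample_rate, n = 16000, 256
frame = [
    math.sin(2 * math.pi * 1000 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 3000 * t / sample_rate)
    for t in range(n)
]
print(f"Estimated cutoff: {estimate_cutoff_hz(frame, sample_rate):.0f} Hz")
```

Real recordings would need windowing and averaging over many frames, and in practice an FFT library rather than this quadratic loop.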

### ❌ Common Errors

**"Out of Memory" Error:**
- Reduce sampling steps (25-30)
- Use shorter audio files
- Upgrade to Colab Pro

**Session Timeout:**
- Use Colab Pro (longer sessions)
- Keep tab active
- Process shorter audio files
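
Both fixes above suggest working with shorter audio. A minimal sketch of how a long file could be divided into consecutive pieces is shown below. `segment_bounds` is a hypothetical helper that only computes time ranges, and the actual cutting would be done with whatever audio tool is at hand. Restoring pieces independently may leave audible seams at the boundaries, so this is a workaround rather than a recommended workflow.

```python
def segment_bounds(total_seconds, max_segment_seconds=10.0):
    """Split [0, total_seconds] into consecutive (start, end) ranges in seconds."""
    total_seconds = float(total_seconds)
    max_segment_seconds = float(max_segment_seconds)
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last piece is shortened so it ends exactly at the file's end.
        end = min(start + max_segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


for start, end in segment_bounds(25, max_segment_seconds=10):
    print(f"{start:.1f}s to {end:.1f}s")
```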

### 📖 License

- **Model**: NVIDIA OneWay NonCommercial License
- **Code**: NVIDIA Source Code License - Non Commercial
- **Commercial Use**: Contact NVIDIA for licensing
- **Research**: Free for academic and research purposes

### 🔗 Additional Resources

- **Paper**: [arXiv:2501.11311](https://arxiv.org/abs/2501.11311)
- **GitHub**: [test4373/diffusion-audio-restoration](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- **NVIDIA Demo**: [research.nvidia.com](https://research.nvidia.com/labs/adlr/A2SB/)
- **Models**: [HuggingFace](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

---

### 🎉 Thank You!

Thank you for using this notebook!

**Citation:**
```bibtex
@article{kong2025a2sb,
  title={A2SB: Audio-to-Audio Schrodinger Bridges},
  author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2501.11311},
  year={2025}
}
```

### ⭐ Support This Project

If you find this useful:
- ⭐ Star the [GitHub repository](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🐛 Report bugs or suggest features
- 📢 Share with others

---

**❤️ Made for the audio restoration community**