# 🎵 A2SB: Audio-to-Audio Schrödinger Bridge - Complete Edition

**High-Quality Audio Restoration with NVIDIA A2SB**

This notebook covers the full workflow: environment setup, model download, configuration, and a Gradio web interface.

## 🌟 Features
- ✅ 44.1kHz high-resolution music restoration
- ✅ Bandwidth extension (high-frequency prediction)
- ✅ Audio inpainting (reconstruct missing segments)
- ✅ Support for long audio files (hours)
- ✅ End-to-end, no vocoder required
- ✅ **Gradio Web Interface** - User-friendly UI

## 📚 Resources
- 📄 [Paper](https://arxiv.org/abs/2501.11311)
- 💻 [GitHub](https://github.com/NVIDIA/diffusion-audio-restoration)
- 🎬 [Demo](https://research.nvidia.com/labs/adlr/A2SB/)
- 🤗 [Models](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

---

**Usage:** Run cells in order. The last cell will launch the Gradio interface!

## 📦 1. Setup and Dependencies

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Clone the repository
!git clone https://github.com/NVIDIA/diffusion-audio-restoration.git
%cd diffusion-audio-restoration

In [None]:
# Install required libraries
print("📦 Starting installation...\n")

# 1. Update basic packages
print("1️⃣ Updating basic packages...")
!pip install -q --upgrade pip setuptools wheel

# 2. PyTorch and torchaudio (CUDA-enabled)
print("2️⃣ Installing PyTorch (CUDA 11.8)...")
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# 3. Audio processing libraries
print("3️⃣ Installing audio processing libraries...")
!pip install -q numpy scipy matplotlib librosa soundfile

# 4. Lightning (important!)
print("4️⃣ Installing PyTorch Lightning...")
!pip install -q lightning "pytorch-lightning>=2.0.0"

# 5. Other deep learning libraries
print("5️⃣ Installing other deep learning libraries...")
!pip install -q einops

# 6. Gradio and jsonargparse
print("6️⃣ Installing Gradio and jsonargparse...")
!pip install -q "jsonargparse[signatures]>=4.0.0" gradio

# 7. Rotary embedding (optional)
print("7️⃣ Installing optional packages...")
!pip install -q rotary-embedding-torch 2>/dev/null || echo "⚠️ rotary-embedding-torch skipped (optional)"

# 8. SSR Eval (a failed install is reported but does not stop the cell; the version check below treats it as optional)
print("8️⃣ Installing SSR Eval...")
!pip install -q ssr-eval || pip install -q git+https://github.com/speechbrain/ssr-eval.git || echo "⚠️ ssr_eval could not be installed, continuing..."

# 9. Nest-asyncio (for Gradio event loop fix)
print("9️⃣ Installing nest-asyncio...")
!pip install -q nest-asyncio

# 10. Check versions
print("\n" + "="*60)
print("✅ Installation complete! Version check:")
print("="*60)

import torch
import lightning
import gradio as gr
import nest_asyncio

# Fix event loop issue
nest_asyncio.apply()

print(f"✓ PyTorch: {torch.__version__}")
print(f"✓ Lightning: {lightning.__version__}")
print(f"✓ Gradio: {gr.__version__}")
print(f"✓ CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ CUDA Version: {torch.version.cuda}")
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

# SSR Eval check
try:
    import ssr_eval
    print("✓ SSR Eval: Installed")
except ImportError:
    print("⚠️ SSR Eval: Not installed (optional)")

print("\n🎉 Setup cell finished. Review the output above for any skipped packages.\n")

## 📥 2. Download Model Files

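We'll download the two model checkpoints used by the ensemble configuration. The numbers in each file name give the interval of the sampling time `t` that the model covers:
- **One-split (0.0-1.0)**: covers the full time range, 0.0 to 1.0
- **Two-split (0.5-1.0)**: covers the second half of the time range, 0.5 to 1.0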

**Note:** Each model is approximately 1.5GB. Download may take 5-10 minutes depending on your connection.

In [None]:
# Create checkpoint directory
!mkdir -p ckpt

In [None]:
import os

# Model files
models = {
    'onesplit': {
        'path': 'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt',
        'url': 'https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_onesplit_0.0_1.0_release.ckpt'
    },
    'twosplit': {
        'path': 'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt',
        'url': 'https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge/resolve/main/ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
    }
}

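def is_downloaded(path):
    # wget -O creates the output file even when the transfer fails,
    # so an empty file does not count as a finished download.
    return os.path.exists(path) and os.path.getsize(path) > 0

# Check and download each model
failed = []
for name, info in models.items():
    if is_downloaded(info['path']):
        size_mb = os.path.getsize(info['path']) / (1024 * 1024)
        print(f"✓ {name} model already exists ({size_mb:.2f} MB)")
        continue
    print(f"⬇ Downloading {name} model...")
    !wget -O {info['path']} {info['url']}
    if is_downloaded(info['path']):
        size_mb = os.path.getsize(info['path']) / (1024 * 1024)
        print(f"✓ {name} model downloaded ({size_mb:.2f} MB)")
    else:
        print(f"❌ Failed to download {name} model!")
        failed.append(name)

print("\n" + "="*50)
if failed:
    print(f"Download failed for: {', '.join(failed)}")
else:
    print("Model download complete!")
print("="*50)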
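
An interrupted transfer can leave a truncated file in `ckpt/`, which a simple existence check will not catch. The sketch below is one way to flag such files using only the standard library. The helper name `find_incomplete` and the 1 GB lower bound are assumptions derived from the approximate 1.5GB size quoted above, not values taken from the repository.

```python
import os

# Assumed lower bound: the note above quotes roughly 1.5GB per checkpoint.
MIN_BYTES = 1_000_000_000

def find_incomplete(paths, min_bytes=MIN_BYTES):
    """Return the paths that are missing or smaller than min_bytes."""
    incomplete = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
            incomplete.append(path)
    return incomplete

checkpoints = [
    "ckpt/A2SB_onesplit_0.0_1.0_release.ckpt",
    "ckpt/A2SB_twosplit_0.5_1.0_release.ckpt",
]
bad = find_incomplete(checkpoints)
if bad:
    print("Missing or truncated checkpoints:", bad)
else:
    print("All checkpoints look complete.")
```

If a file is reported, deleting it and re-running the download cell is the simplest remedy.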

## ⚙️ 3. Configuration

Update the configuration file with the correct model paths.
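
The update cell rewrites the whole YAML file, and PyYAML does not preserve comments when it dumps a loaded document. One cautious option, sketched here under the assumption that you want to keep the repository's original file, is to copy it once before running the update. `backup_once` and the `.orig` suffix are hypothetical names chosen for this example.

```python
import os
import shutil

def backup_once(path, suffix=".orig"):
    """Copy path to path + suffix unless that backup already exists.

    Returns the backup path, so repeated runs never overwrite the first copy.
    """
    backup_path = path + suffix
    if not os.path.exists(backup_path):
        shutil.copy2(path, backup_path)
    return backup_path

config_path = "configs/ensemble_2split_sampling.yaml"
if os.path.exists(config_path):
    print("Backup stored at", backup_once(config_path))
```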

In [None]:
import yaml

# Update config file
config_path = 'configs/ensemble_2split_sampling.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

config['model']['pretrained_checkpoints'] = [
    'ckpt/A2SB_onesplit_0.0_1.0_release.ckpt',
    'ckpt/A2SB_twosplit_0.5_1.0_release.ckpt'
]

with open(config_path, 'w') as f:
    # sort_keys=False keeps the original key order (YAML comments are still lost)
    yaml.dump(config, f, sort_keys=False)

print("✓ Configuration updated")
print(f"Model paths: {config['model']['pretrained_checkpoints']}")

## 🎨 4. Launch Gradio Web Interface

### User-friendly web interface for audio restoration!

**Features:**
- 📤 Drag-and-drop file upload
- 🎤 Microphone recording
- ⚙️ Advanced settings
- 📊 Real-time progress
- 🔊 Instant playback and comparison

**How to use:**
1. Run the cell below
2. Click on the generated link (usually ends with `.gradio.live`)
3. Upload your audio file or record from microphone
4. Choose restoration mode:
   - **Bandwidth Extension**: Restore high frequencies (for low-quality audio)
   - **Inpainting**: Fill in missing audio segments
5. Adjust settings if needed (or use defaults)
6. Click "🚀 Restore" and wait for processing
7. Listen to the restored audio and download if satisfied

**Tips:**
- Start with default settings (50 steps, auto cutoff)
- For faster results, reduce sampling steps to 25-30
- For best quality, increase sampling steps to 75-100
- Processing time: roughly 2-3 minutes per 10 seconds of audio at 50 steps (varies with the GPU)

In [None]:
# Launch Gradio interface
!python gradio_app.py

## 📚 5. Tips and Troubleshooting

### ⚡ Performance Optimization

**GPU Requirements:**
- CUDA-enabled NVIDIA GPU required
- Minimum 8GB GPU memory recommended
- 16GB+ GPU memory for long audio files

**Processing Times (Approximate):**
- 10 seconds audio, 50 steps: ~2-3 minutes
- 30 seconds audio, 50 steps: ~5-7 minutes
- 60 seconds audio, 50 steps: ~10-15 minutes
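
The figures above can be turned into a rough planning estimate. The sketch below interpolates linearly between the three listed reference points and, as an assumption not stated in this notebook, scales the result in proportion to the number of sampling steps. `estimate_minutes` is a hypothetical helper, and real timings depend on the GPU.

```python
# Reference points copied from the list above, all at 50 sampling steps:
# (audio seconds, low estimate in minutes, high estimate in minutes)
REFERENCE = [(10, 2.0, 3.0), (30, 5.0, 7.0), (60, 10.0, 15.0)]
REFERENCE_STEPS = 50

def _interpolate(x, x0, y0, x1, y1):
    """Linear interpolation (or extrapolation) through two points."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def estimate_minutes(audio_seconds, steps=REFERENCE_STEPS):
    """Return a (low, high) processing-time estimate in minutes."""
    if audio_seconds <= 0 or steps <= 0:
        raise ValueError("audio_seconds and steps must be positive")
    # Use the first segment up to 30 s and the second one beyond it;
    # values outside 10-60 s are extrapolated from the nearest segment.
    if audio_seconds <= REFERENCE[1][0]:
        a, b = REFERENCE[0], REFERENCE[1]
    else:
        a, b = REFERENCE[1], REFERENCE[2]
    low = _interpolate(audio_seconds, a[0], a[1], b[0], b[1])
    high = _interpolate(audio_seconds, a[0], a[2], b[0], b[2])
    scale = steps / REFERENCE_STEPS  # assumption: cost grows linearly with steps
    return (low * scale, high * scale)

low, high = estimate_minutes(45, steps=50)
print(f"45 s at 50 steps: about {low:.1f}-{high:.1f} minutes")
```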

### 🎯 Quality Settings

**Sampling Steps:**
- **25-50:** Fast prototyping
- **50-75:** Production quality (recommended)
- **75-100:** Maximum quality

**Cutoff Frequency (for Bandwidth Extension):**
- Auto-detection usually works best
- Manual adjustment guide:
  - Low-quality MP3: 2000-4000 Hz
  - Medium quality: 4000-8000 Hz
  - High quality: 8000+ Hz
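
This notebook does not describe how auto-detection is implemented. As an illustration of the general idea only, the sketch below estimates a cutoff by probing the signal at fixed frequency steps with the Goertzel algorithm and reporting the highest frequency that still carries noticeable energy. The function names, the 500 Hz step, and the relative threshold are all assumptions. It uses only the standard library; an FFT-based version would be faster on real recordings.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Signal power at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def estimate_cutoff_hz(samples, sample_rate, step_hz=500, rel_threshold=1e-3):
    """Highest probed frequency whose power reaches rel_threshold * peak power.

    Returns 0 for silent input.
    """
    probes = range(step_hz, int(sample_rate // 2), step_hz)
    powers = [(f, goertzel_power(samples, sample_rate, f)) for f in probes]
    peak = max(p for _, p in powers)
    if peak <= 0.0:
        return 0
    return max(f for f, p in powers if p >= rel_threshold * peak)

# Synthetic check: 0.1 s containing only a 1 kHz and a 3 kHz tone.
sr = 44100
tone = [
    math.sin(2 * math.pi * 1000 * n / sr) + 0.5 * math.sin(2 * math.pi * 3000 * n / sr)
    for n in range(sr // 10)
]
print("Estimated cutoff:", estimate_cutoff_hz(tone, sr), "Hz")
```

On real audio, analysing a short excerpt rather than the whole file keeps this pure-Python loop fast enough to be practical.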

**Inpainting Length:**
- 0.1-0.3s: Small gaps or clicks
- 0.3-0.5s: Medium gaps
- 0.5-1.0s: Large missing segments

### 🔧 Troubleshooting

**CUDA Out of Memory Error:**
- Split the audio into shorter segments
- Reduce the number of sampling steps
- Use a runtime with more RAM/GPU memory (for example, Colab Pro)
- Restart the runtime to clear GPU memory
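
Splitting a long recording before restoration can be sketched as follows. The helper below only computes overlapping segment boundaries in samples; `segment_bounds`, the 10-second segment length, and the 0.5-second overlap are illustrative assumptions, and reading, restoring, and crossfading the segments is left to the caller.

```python
def segment_bounds(total_samples, sample_rate, segment_s=10.0, overlap_s=0.5):
    """Return (start, end) sample indices that cover the signal with overlap."""
    if overlap_s < 0 or overlap_s >= segment_s:
        raise ValueError("overlap_s must be non-negative and shorter than segment_s")
    segment = int(round(segment_s * sample_rate))
    hop = segment - int(round(overlap_s * sample_rate))
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + segment, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the final segment reaches the end of the signal
        start += hop
    return bounds

# Example: a 25-second recording at 44.1 kHz
sr = 44100
for start, end in segment_bounds(25 * sr, sr):
    print(f"{start / sr:.2f}s to {end / sr:.2f}s")
```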

**Model Not Found Error:**
- Re-run the model download cells
- Check your internet connection
- Verify files exist in `ckpt/` directory

**Audio Format Error:**
```python
# Convert audio to 44.1 kHz WAV format
import soundfile as sf
import librosa

# librosa.load resamples to `sr` and, by default, downmixes to mono
y, sr = librosa.load('input.mp3', sr=44100)
sf.write('input.wav', y, sr)
```

**Gradio Interface Not Loading:**
```python
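# Fix event loop issue.
# nest_asyncio patches the event loop of the current notebook kernel only,
# so it matters when Gradio is launched from a cell rather than via `!python`.
import nest_asyncio
nest_asyncio.apply()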
```

**Slow Processing:**
- Ensure GPU is being used (check `nvidia-smi`)
- Close other GPU-intensive applications
- Reduce audio length or sampling steps
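
To check GPU use from Python instead of reading the `nvidia-smi` table by eye, the tool's query mode can be parsed into numbers. This is a minimal sketch; `parse_gpu_rows` and `query_gpus` are hypothetical helper names, and `query_gpus` returns an empty list when `nvidia-smi` is not on the PATH.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def _to_int(field):
    """Convert a CSV field to int; return None for values such as "[N/A]"."""
    try:
        return int(field)
    except ValueError:
        return None

def parse_gpu_rows(text):
    """Parse the CSV rows printed by the query above into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name is harmless.
        name, util, used, total = [f.strip() for f in line.rsplit(",", 3)]
        rows.append({
            "name": name,
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return rows

def query_gpus():
    """Return parsed GPU rows, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)

for gpu in query_gpus():
    print(gpu)
```

Utilization that stays near zero while a restoration is running suggests the job is not using the GPU.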

### 📖 License and Usage

- **Model:** NVIDIA OneWay NonCommercial License
- **Code:** NVIDIA Source Code License - Non Commercial
- **Commercial Use:** Contact NVIDIA for licensing
- **Research Use:** Free for academic and research purposes

### 📧 Support

- **Issues:** [GitHub Issues](https://github.com/NVIDIA/diffusion-audio-restoration/issues)
- **Research Inquiries:** [NVIDIA Research](https://www.nvidia.com/en-us/research/inquiries/)

---

### 🎉 Thank You!

Thank you for using this notebook. For questions or feedback, please use GitHub Issues.

**Citation:**
```bibtex
@article{kong2025a2sb,
  title={A2SB: Audio-to-Audio Schrodinger Bridges},
  author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2501.11311},
  year={2025}
}
```

### 🌟 Star the Repository!

If you find this project useful, please consider starring the [GitHub repository](https://github.com/NVIDIA/diffusion-audio-restoration)!