# 🎵 A2SB: Audio Restoration - Kaggle

**High-Quality Audio Restoration with NVIDIA A2SB**

Restore your audio files with AI!

## 🚀 Features
- ✅ 44.1kHz high-resolution audio restoration
- ✅ Bandwidth extension (high-frequency prediction)
- ✅ Audio inpainting (fill missing parts)
- ✅ User-friendly Gradio interface

## 📋 Requirements
- 🎯 **GPU**: P100 (16GB) recommended
- 💾 **RAM**: Minimum 25GB
- ⏱️ **Time**: ~2-3 minutes per 10 seconds of audio (50 sampling steps on a P100)

## ⚙️ How to Enable GPU?
1. Click **Settings** on the right
2. Under **Accelerator**, select **GPU P100**
3. Click **Save**
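
Before starting the long install in Step 1, it can help to confirm that the accelerator is actually attached. The sketch below is not part of the original notebook: it assumes the `nvidia-smi` command-line tool is on the PATH whenever a GPU is attached, and the helper names `parse_gpu_line` and `list_gpus` are illustrative.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one 'name, memory' CSV line from nvidia-smi into (name, MiB)."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return name, int(memory.split()[0])


def list_gpus():
    """Return [(name, total_MiB), ...], or [] when no GPU can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    try:
        return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]
    except ValueError:
        # Unexpected output format: treat as "nothing usable detected".
        return []


gpus = list_gpus()
if gpus:
    for name, mib in gpus:
        print(f"GPU attached: {name} ({mib} MiB)")
else:
    print("No GPU detected. Enable one under Settings > Accelerator.")
```

This check does not import PyTorch, so it is unaffected by the version mismatch that Step 1 repairs.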

## 📚 Resources
- 📄 [Paper](https://arxiv.org/abs/2501.11311)
- 💻 [GitHub](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🤗 [Models](https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge)

---

**Usage:** Run the cells in order. Restart the session once after Step 1, then continue from Step 2.

## 🔧 Step 1: Fix CUDA Compatibility (CRITICAL!)

**⚠️ RUN THIS FIRST!**

Kaggle's preinstalled PyTorch is built for CUDA 11.8, while its torchvision is built for CUDA 12.4. Using the two together raises a version mismatch error.

**This cell will:**
1. Uninstall the preinstalled, incompatible packages
2. Install PyTorch 2.1.2, torchvision 0.16.2, and torchaudio 2.1.2 built for CUDA 11.8
3. Install torchmetrics 1.2.0
4. Install Lightning 2.1.0

**⏱️ Takes 2-3 minutes**

In [None]:
print("="*60)
print("🔧 FIXING CUDA VERSION MISMATCH")
print("="*60)
print("\n⚠️ This is CRITICAL for Kaggle compatibility!")
print("⏱️  This will take 2-3 minutes...\n")

# Step 1: Uninstall incompatible packages
print("📦 Step 1/4: Removing incompatible packages...")
!pip uninstall -y torch torchvision torchaudio torchmetrics transformers torchao -q
print("✓ Old packages removed\n")

# Step 2: Install PyTorch with CUDA 11.8
print("📦 Step 2/4: Installing PyTorch with CUDA 11.8...")
!pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 \
    --index-url https://download.pytorch.org/whl/cu118 -q
print("✓ PyTorch installed\n")

# Step 3: Install compatible torchmetrics
print("📦 Step 3/4: Installing torchmetrics...")
!pip install torchmetrics==1.2.0 --no-deps -q
!pip install numpy packaging -q
print("✓ Torchmetrics installed\n")

# Step 4: Install Lightning
print("📦 Step 4/4: Installing Lightning...")
!pip install lightning==2.1.0 pytorch-lightning==2.1.0 -q
print("✓ Lightning installed\n")

print("="*60)
print("✅ CUDA FIX COMPLETE!")
print("="*60)
print("\n⚠️ IMPORTANT: You MUST restart the kernel now!")
print("\n👉 Go to: Session > Restart Session")
print("\nAfter restart, continue with Step 2 (skip this cell).")
print("="*60)

## ⚠️ RESTART KERNEL NOW!

**After running the cell above:**
1. Go to **Session > Restart Session**
2. Wait for the kernel to restart
3. Continue with Step 2 below
4. **DO NOT** run Step 1 again

---

## ✅ Step 2: Verify Installation (After Kernel Restart)

**Run this cell to verify the CUDA fix worked.**
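
The cell for this step inspects `torch` only, while the original error involved torchvision as well. As a complementary check, the sketch below compares the build tags of all three packages using `importlib.metadata`. It assumes that wheels from the PyTorch index carry a local version suffix such as `+cu118`; wheels from other sources may carry no suffix, in which case the tag is reported as `None`. The helper names are illustrative.

```python
from importlib import metadata


def cuda_tag(version):
    """Return the local build tag of a wheel version, e.g. 'cu118', else None."""
    _, sep, local = version.partition("+")
    return local if sep else None


def installed_cuda_tags(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its build tag, or 'not installed'."""
    tags = {}
    for name in packages:
        try:
            tags[name] = cuda_tag(metadata.version(name))
        except metadata.PackageNotFoundError:
            tags[name] = "not installed"
    return tags


tags = installed_cuda_tags()
for name, tag in tags.items():
    print(f"{name}: {tag}")
if len(set(tags.values())) > 1:
    print("Build tags differ: re-run Step 1 and restart the session.")
```

Reading package metadata does not import the packages, so this runs even when `import torchvision` itself would fail.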

In [None]:
import torch
import sys

print("="*60)
print("🔍 VERIFYING INSTALLATION")
print("="*60)

print(f"\n✓ PyTorch: {torch.__version__}")
print(f"✓ PyTorch CUDA: {torch.version.cuda}")
print(f"✓ CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"\n✓ GPU: {torch.cuda.get_device_name(0)}")
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✓ GPU Memory: {gpu_memory:.1f} GB")
    
    if 'P100' in torch.cuda.get_device_name(0):
        print("\n🎉 Perfect! You have a P100 GPU!")
    
    if gpu_memory < 14:
        print(f"\n⚠️ Warning: GPU only has {gpu_memory:.1f} GB memory")
        print("Recommended: Enable P100 GPU (16GB) in Settings")
    else:
        print(f"\n✅ GPU memory is sufficient ({gpu_memory:.1f} GB)")
else:
    print("\n❌ ERROR: CUDA not available!")
    print("Please enable GPU in Settings > Accelerator > GPU P100")
    sys.exit(1)

if torch.version.cuda == "11.8":
    print(f"\n✅ CUDA version is correct: {torch.version.cuda}")
    print("✅ No version mismatch - Ready to proceed!")
else:
    print(f"\n⚠️ Warning: PyTorch CUDA version is {torch.version.cuda}")
    print("Expected: 11.8")

print("\n" + "="*60)
print("✅ VERIFICATION COMPLETE!")
print("="*60)

## 📦 Step 3: Install Dependencies

**Install remaining packages needed for audio restoration.**

**⏱️ Takes 3-5 minutes**
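
Because the installs in this step run with `-q`, a failed install can be easy to miss. Once the install cell for this step has run, a quick check like the following lists any module that still cannot be found. This is a sketch, not part of the original notebook; the list of import names is inferred from the pip commands in Steps 1 and 3 (for example, `pyyaml` is imported as `yaml`).

```python
import importlib.util

# Import names (not pip names) for the packages installed in Steps 1 and 3.
REQUIRED_MODULES = [
    "torch", "lightning", "librosa", "soundfile", "scipy", "einops",
    "jsonargparse", "yaml", "gradio", "huggingface_hub", "nest_asyncio",
    "tqdm", "rotary_embedding_torch", "pyngrok",
]


def missing_modules(names):
    """Return the names that cannot be located on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.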

In [None]:
print("📦 Installing dependencies...\n")

# Audio processing libraries
print("📥 Installing audio processing libraries...")
!pip install librosa soundfile scipy einops -q
print("✓ Audio libraries installed\n")

# Configuration tools
print("📥 Installing configuration tools...")
# Quote the extras spec so the shell does not treat [signatures] as a glob pattern
!pip install 'jsonargparse[signatures]' pyyaml -q
print("✓ Configuration tools installed\n")

# Gradio and utilities
print("📥 Installing Gradio and utilities...")
!pip install 'huggingface_hub>=0.19.0,<1.0' -q
!pip install gradio==4.44.0 -q
!pip install nest-asyncio tqdm rotary-embedding-torch pyngrok -q
print("✓ Gradio installed\n")

# Optional: SSR Eval
!pip install ssr-eval -q 2>/dev/null || echo "⚠️ SSR Eval skipped (optional)"

print("\n" + "="*60)
print("✅ ALL DEPENDENCIES INSTALLED!")
print("="*60)

# Verification
import torch
import lightning
import gradio as gr
import librosa
import nest_asyncio
nest_asyncio.apply()

print(f"\n✓ PyTorch: {torch.__version__}")
print(f"✓ Lightning: {lightning.__version__}")
print(f"✓ Gradio: {gr.__version__}")
print(f"✓ Librosa: {librosa.__version__}")
print("\n🎉 Ready to proceed!")

## 📥 Step 4: Clone Repository

**Download the audio restoration code.**
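
The cell for this step uses shell commands. A pure-Python equivalent, sketched below, raises an exception if the clone fails instead of continuing silently. Note two deviations from the original cell, both assumptions of this sketch: it uses a shallow clone (`--depth 1`) to skip the commit history, and the helper names `repo_dir_name` and `fresh_clone` are illustrative.

```python
import shutil
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git"


def repo_dir_name(url):
    """Derive the directory name git would create for a clone URL."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def fresh_clone(url, parent="/kaggle/working"):
    """Delete any previous checkout, shallow-clone the repo, return its path."""
    target = Path(parent) / repo_dir_name(url)
    shutil.rmtree(target, ignore_errors=True)
    # --depth 1 skips history; check=True raises if the clone fails.
    subprocess.run(["git", "clone", "--depth", "1", url, str(target)], check=True)
    return target


print(repo_dir_name(REPO_URL))
```

Calling `fresh_clone(REPO_URL)` followed by `os.chdir(...)` on the returned path would mirror what the shell version does.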

In [None]:
import os

print("📥 Cloning repository...\n")

os.chdir('/kaggle/working')
!rm -rf diffusion-audio-restoration-colab-Kaggle-
!git clone https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git
os.chdir('diffusion-audio-restoration-colab-Kaggle-')

print(f"\n✅ Repository cloned!")
print(f"✓ Directory: {os.getcwd()}")

## 🤗 Step 5: Download Models

**Download two model checkpoints from Hugging Face:**
- One-split (0.0-1.0): ~1.5GB
- Two-split (0.5-1.0): ~1.5GB

**Total: ~3GB. Takes 5-10 minutes.**
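
Once the download has finished, a size check is a cheap way to spot an interrupted transfer. The sketch below sums the files under the model directory and compares the result with the ~3GB total stated above. The checkpoint file names are not listed here, so the check deliberately looks at total size only; the 20% tolerance and the helper names are assumptions.

```python
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (0 if it is missing)."""
    root = Path(path)
    if not root.is_dir():
        return 0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())


def looks_complete(path, expected_gb=3.0, tolerance=0.2):
    """True when the directory holds at least (1 - tolerance) of the expected size."""
    return dir_size_bytes(path) >= expected_gb * 1e9 * (1 - tolerance)


size_gb = dir_size_bytes("./pretrained_models") / 1e9
print(f"pretrained_models: {size_gb:.2f} GB")
if not looks_complete("./pretrained_models"):
    print("Smaller than expected: the download may be incomplete.")
```

If the directory turns out to be too small, re-running the download cell is usually enough, since `snapshot_download` skips files that are already complete.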

In [None]:
import os
from huggingface_hub import snapshot_download

print("🤗 Downloading models from Hugging Face...")
print("⏱️  This will take 5-10 minutes...\n")

model_dir = './pretrained_models'

try:
    snapshot_download(
        repo_id='nvidia/audio_to_audio_schrodinger_bridge',
        local_dir=model_dir,
        local_dir_use_symlinks=False
    )
    print(f"\n✅ Models downloaded: {model_dir}")
    print(f"\n📂 Model files:")
    !ls -lh {model_dir}
    print("\n✅ Model download complete!")
except Exception as e:
    print(f"❌ Model download error: {e}")
    print("\nTroubleshooting:")
    print("1. Check your internet connection")
    print("2. Verify Hugging Face is accessible")
    print("3. Try running this cell again")

## 🔑 Step 6: Ngrok Token (Optional but Recommended)

**Ngrok provides a stable public URL for your Gradio interface.**

### How to get an ngrok token:
1. Go to [ngrok.com](https://ngrok.com/)
2. Sign up for a free account
3. Go to [Dashboard](https://dashboard.ngrok.com/get-started/your-authtoken)
4. Copy your authtoken
5. Paste it below

**Note:** Without an ngrok token, Gradio falls back to its default share link, which also works.
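
How `gradio_app_kaggle.py` consumes the token is not shown in this notebook. For orientation, the sketch below shows a typical flow with `pyngrok` (installed in Step 3): resolve the token, register it, and open a tunnel. The port `7860` is Gradio's default and is an assumption about the app, and both helper names are illustrative.

```python
import os


def resolve_ngrok_token(explicit="", env=None):
    """Prefer an explicitly pasted token, then the NGROK_TOKEN variable."""
    env = os.environ if env is None else env
    token = (explicit or env.get("NGROK_TOKEN", "")).strip()
    return token or None


def open_tunnel(token, port=7860):
    """Open an ngrok tunnel with pyngrok and return its public URL."""
    from pyngrok import ngrok  # imported lazily; installed in Step 3

    ngrok.set_auth_token(token)
    return ngrok.connect(str(port)).public_url


token = resolve_ngrok_token()
print("ngrok token found" if token else "no ngrok token; the Gradio share link will be used")
```

Reading the token from the environment rather than hard-coding it keeps the secret out of a notebook that may later be shared.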

In [None]:
import os

# Paste your Ngrok token here (optional)
NGROK_TOKEN = ""  # Example: "2abc123def456ghi789jkl"

if NGROK_TOKEN:
    os.environ['NGROK_TOKEN'] = NGROK_TOKEN
    print("✅ Ngrok token set!")
    print("🔗 Gradio will launch with a stable public URL")
else:
    print("⚠️ Ngrok token not set")
    print("📝 Gradio will use default sharing (still works)")
    print("\n💡 For stable URL, get token from ngrok.com")

## 🎨 Step 7: Launch Gradio Interface

### 🚀 Ready to restore audio!

**How to use:**
1. Run the cell below
2. Click the public link
3. Upload an audio file
4. Choose mode:
   - **Bandwidth Extension**: Restore high frequencies
   - **Inpainting**: Fill missing parts
5. Click "🚀 Restore"
6. Listen and download the result!

**Settings:**
- **Sampling Steps**: 25-100
  - Fast: 25-30
  - Balanced: 50 ⭐
  - Best: 75-100
- **Auto Cutoff**: Recommended ⭐

**Processing Times (P100 GPU):**
- 10s audio, 50 steps: ~2-3 minutes
- 30s audio, 50 steps: ~5-7 minutes
- 60s audio, 50 steps: ~10-15 minutes
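
For clip lengths between the listed ones, a rough estimate can be interpolated from the figures above. The sketch below uses the midpoint of each listed range and linear interpolation. It also assumes that processing time scales proportionally with the number of sampling steps, which is an assumption of this sketch and not a measured result.

```python
# Midpoints of the ranges listed above for 50 steps on a P100:
# (audio seconds, processing minutes)
REFERENCE_POINTS = [(10, 2.5), (30, 6.0), (60, 12.5)]


def estimate_minutes(audio_seconds, steps=50):
    """Rough processing time by linear interpolation between the listed points.

    ASSUMPTION: time scales proportionally with the number of sampling steps.
    """
    pts = REFERENCE_POINTS
    # Pick the segment containing audio_seconds; extrapolate from the nearest
    # segment when the value lies outside the listed range.
    if audio_seconds <= pts[1][0]:
        (x0, y0), (x1, y1) = pts[0], pts[1]
    else:
        (x0, y0), (x1, y1) = pts[1], pts[2]
    base = y0 + (y1 - y0) * (audio_seconds - x0) / (x1 - x0)
    return max(base, 0.0) * steps / 50


for seconds in (10, 20, 45):
    print(f"{seconds}s at 50 steps: ~{estimate_minutes(seconds):.1f} min")
```

Treat the result as a planning aid only; actual times depend on the GPU assigned and on session load.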

In [None]:
# Launch Gradio interface
print("🚀 Launching Gradio interface...\n")
print("⏱️  Please wait for the link to appear.\n")
print("="*60)

import os
if os.path.exists('gradio_app_kaggle.py'):
    print("✓ Using Kaggle-optimized Gradio app")
    !python gradio_app_kaggle.py --ngrok
else:
    print("⚠️ Using default Gradio app")
    !python gradio_app.py --share

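# The launch command above blocks while the app is running,
# so these lines print only after the Gradio process has stopped.
print("\n" + "="*60)
print("ℹ️ Gradio process has stopped.")
print("Re-run this cell to launch the interface again.")
print("="*60)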

## 💡 Tips and Troubleshooting

### ❌ CUDA Out of Memory

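**Run this cell to release cached GPU memory held by the notebook kernel.** It cannot free memory held by a separate process, such as a Gradio app that is still running.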

In [None]:
import gc
import torch

print("🧹 Clearing GPU memory...")

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    
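    total = torch.cuda.get_device_properties(0).total_memory / 1e9
    allocated = torch.cuda.memory_allocated(0) / 1e9
    # mem_get_info returns (free, total) in bytes for the whole device,
    # so it also accounts for memory used by other processes.
    free = torch.cuda.mem_get_info(0)[0] / 1e9

    print("\n✅ GPU memory cleared!")
    print("\nMemory Status:")
    print(f"  Total: {total:.2f} GB")
    print(f"  Allocated (this kernel): {allocated:.2f} GB")
    print(f"  Free (device-wide): {free:.2f} GB")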

### 🎯 Best Practices

1. **Start small**: Test with 10-20 second audio
2. **Use defaults**: 50 steps, auto cutoff
3. **Save outputs**: Download immediately
4. **Process one at a time**: Wait for each file to finish before submitting the next
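
To follow the "start small" advice, a long recording can be cut down to a short test clip before uploading. The sketch below uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files only; other formats would need a library such as `soundfile`. The function name and the 15-second default are illustrative.

```python
import wave


def trim_wav(src, dst, max_seconds=15.0):
    """Copy at most the first `max_seconds` of a PCM WAV file to `dst`.

    Returns the duration of the written clip in seconds.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        n_frames = min(reader.getnframes(), int(max_seconds * params.framerate))
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # header frame count is corrected on close
        writer.writeframes(frames)
    return n_frames / params.framerate
```

For example, `trim_wav("song.wav", "song_15s.wav")` would write the first 15 seconds of a hypothetical `song.wav` to a new file that can then be uploaded through Gradio.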

### ❌ Common Issues

**CUDA Out of Memory:**
- Reduce sampling steps to 25-30
- Use shorter audio
- Run the memory clearing cell above

**Import Errors:**
- Make sure you ran Step 1 and restarted the kernel
- Re-run Step 3 (dependencies)

### 📁 File Locations

- **Input**: Upload via Gradio
- **Output**: `/kaggle/working/gradio_outputs/`
- **Models**: `/kaggle/working/diffusion-audio-restoration-colab-Kaggle-/pretrained_models/`
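
Files under `/kaggle/working` do not persist unless the session output is saved, so it can be convenient to bundle all restored files into one archive for download. The sketch below zips the output directory listed above; the archive name `restored_audio` and the function name are assumptions.

```python
import shutil
from pathlib import Path


def archive_outputs(output_dir="/kaggle/working/gradio_outputs",
                    archive_base="/kaggle/working/restored_audio"):
    """Zip the output directory; return the archive path, or None if empty."""
    out = Path(output_dir)
    if not out.is_dir() or not any(out.iterdir()):
        return None
    # make_archive appends ".zip" to archive_base and returns the full path.
    return shutil.make_archive(archive_base, "zip", root_dir=str(out))


archive = archive_outputs()
print(archive or "No outputs found yet.")
```

The archive is written next to the output directory and not inside it, so repeated runs do not pack earlier archives into new ones.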

---

## 🎉 Thank You!

**Citation:**
```bibtex
@article{kong2025a2sb,
  title={A2SB: Audio-to-Audio Schrodinger Bridges},
  author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2501.11311},
  year={2025}
}
```

### ⭐ Support This Project

- ⭐ Star the [GitHub repository](https://github.com/test4373/diffusion-audio-restoration-colab-Kaggle-.git)
- 🐛 Report bugs
- 📢 Share with others
- 👍 Upvote this notebook

---

**❤️ Made for the audio restoration community**