<a href="https://colab.research.google.com/github/vikbol112/sd-webui-faceswaplab/blob/main/Copy_of_AudioSR_Colab_Fork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **AudioSR-Colab-Fork v0.6**



Colab adaptation of AudioSR, with modifications by Sweyei & jarredou.

- v0.6 (Latest)
  - Fixed: Dependency conflicts with transformers, sentence-transformers, and audiosr, ensuring proper compatibility.
  - Fixed: pip resolver issues by enforcing stricter package versions to prevent installation failures.
  - Fixed: CUDA compatibility issues, optimizing model performance for GPU acceleration.
  - Fixed: Hugging Face resume_download deprecation warning.
  - Fixed: timm.models.layers import warning (updated to timm.layers).
  - Improved: Model loading efficiency for faster initialization.
  - Improved: Audio processing stability, reducing potential errors during inference.

- v0.5
  - Resampled input audio based on input_cutoff instead of lowpass filtering.
  - Normalized each processed chunk to the same LUFS level as the input chunk (fixes volume drop issue).

- v0.4
  - Code rework: inference.py created for local CLI usage.

- v0.3
  - Added: Multiband ensemble option to merge original audio below the cutoff frequency with generated audio above.
  - Fixed: Non-WAV input error when saving the final audio.

- v0.2
  - Added: Chunking feature to process audio of any length.
  - Added: Stereo handling: each stereo channel is processed independently (dual mono) and reconstructed as stereo audio.
  - Added: Overlap feature to smooth chunk transitions (avoid high values as AudioSR is not fully phase-accurate, which may cause phase cancellations).

- Modifications & Changes by Sweyei & [jarredou](https://github.com/jarredou/)

Original work [AudioSR: Versatile Audio Super-resolution at Scale](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley
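
The chunking and overlap features listed under v0.2 can be illustrated with a minimal sketch. It is not taken from inference.py: the function names `chunk_bounds` and `overlap_add`, the rounding of the hop size, and the linear fade weights are all assumptions chosen for illustration. Plain Python lists are used so the sketch runs without extra packages.

```python
def chunk_bounds(total_samples, chunk_samples, overlap):
    """Return (start, end) sample ranges that cover the audio with overlap.

    `overlap` is the fraction of each chunk shared with the next one,
    in the same 0 to 0.96 range as the notebook's slider.
    """
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Step between chunk starts (assumed rounding, at least one sample).
    hop = max(1, round(chunk_samples * (1 - overlap)))
    bounds, start = [], 0
    while True:
        end = min(start + chunk_samples, total_samples)
        bounds.append((start, end))
        if end >= total_samples:
            break
        start += hop
    return bounds


def overlap_add(chunks, bounds, total_samples, fade):
    """Blend processed chunks with linear fades, normalised by total weight."""
    mixed = [0.0] * total_samples
    weights = [0.0] * total_samples
    for chunk, (start, end) in zip(chunks, bounds):
        n = end - start
        for i in range(n):
            if fade > 0:
                # Ramp up at the chunk start and down at the chunk end.
                w = min(1.0, (i + 1) / (fade + 1), (n - i) / (fade + 1))
            else:
                w = 1.0
            mixed[start + i] += w * chunk[i]
            weights[start + i] += w
    return [m / w for m, w in zip(mixed, weights)]


# Toy signal: 25 samples, chunks of 10 samples, 20% overlap (2 samples).
signal = [float(i) for i in range(25)]
bounds = chunk_bounds(len(signal), chunk_samples=10, overlap=0.2)
chunks = [signal[s:e] for s, e in bounds]  # stand-in for per-chunk model output
restored = overlap_add(chunks, bounds, len(signal), fade=2)
print(bounds)  # [(0, 10), (8, 18), (16, 25)]
```

Because the weights are normalised, chunks that agree in their overlapping region reconstruct the input unchanged. With real model output the overlapping regions differ, which is why the changelog advises against high overlap values.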



In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os

# Clone repo if it doesn't exist
if not os.path.exists("versatile_audio_super_resolution"):
    !git clone https://github.com/haoheliu/versatile_audio_super_resolution.git
%cd versatile_audio_super_resolution

# Install dependencies
!pip install -r requirements.txt
!pip install --upgrade pip
!pip uninstall -y transformers sentence-transformers audiosr
!pip install transformers==4.30.2 audiosr==0.0.7 --no-deps
!pip install cog huggingface_hub==0.29.2 unidecode phonemizer einops torchlibrosa ftfy timm librosa pyloudnorm soundfile progressbar

# Download inference script, overwrite if it exists
!wget -O inference.py https://raw.githubusercontent.com/jarredou/AudioSR-Colab-Fork/main/inference.py
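
Since the install cell pins specific versions, it can help to confirm what actually ended up installed before running inference. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINS = {
    "transformers": "4.30.2",
    "audiosr": "0.0.7",
    "huggingface_hub": "0.29.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package matches. Any printed line points at a package that pip resolved differently or that is missing.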


### **IMPORTANT NOTE**

#### If the inference cell crashes, restart the runtime (restart it, do not disconnect); otherwise later runs will hit memory errors!

*If you are doing multiple runs, also restart the runtime every 4 or 5 files to free memory.*

In [None]:
#@markdown # **Change Directory & Import Dependencies**
%cd /content/versatile_audio_super_resolution

import os
import torch
import warnings

# Suppress warnings
warnings.filterwarnings("ignore")

# Set precision for better performance
torch.set_float32_matmul_precision("high")

# Select device: Use GPU if available, otherwise fallback to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🚀 Using device: {device.upper()}")

#@markdown # **File Paths**
input_file_path = "/content/drive/MyDrive/AudioSR/jotc_remaster.mp3" #@param {type:"string"}
output_folder = "/content/drive/MyDrive/output" #@param {type:"string"}

# Check if input file exists
if not os.path.exists(input_file_path):
    raise FileNotFoundError(f"❌ Error: Input file '{input_file_path}' not found!")

#@markdown ---
#@markdown # **Inference Parameters**
ddim_steps = 20 #@param {type:"slider", min:20, max:200, step:10}
overlap = 0.04 #@param {type:"slider", min:0, max:0.96, step:0.04}
guidance_scale = 3.5 #@param {type:"slider", min:1, max:15, step:0.5}
seed = 0 #@param {type:"integer"}
chunk_size = 10.24 #@param [5.12, 10.24, 20.48] {type:"raw"}
multiband_ensemble = True #@param {type:"boolean"}
input_cutoff = "14000" #@param [20000, 19000, 18000, 17000, 16000, 14000, 13000, 12000, 11000, 10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000]
input_cutoff = int(input_cutoff)

#@markdown ---
#@markdown # **Check GPU Status**
if device == "cuda":
    print("🚀 CUDA Available:", torch.cuda.is_available())
    print("🖥️ Current Device:", torch.cuda.current_device())
    print(f"🔋 Memory Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"💾 Memory Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
else:
    print("⚡ Running on CPU (CUDA not available)")

#@markdown ---
#@markdown # **Run Inference**
!python inference.py --input "{input_file_path}" \
                     --output "{output_folder}" \
                     --ddim_steps {ddim_steps} \
                     --guidance_scale {guidance_scale} \
                     --seed {seed} \
                     --chunk_size {chunk_size} \
                     --overlap {overlap} \
                     --multiband_ensemble {str(multiband_ensemble).lower()} \
                     --input_cutoff {input_cutoff} \
                     --device {device}
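
The `!python inference.py ...` call above relies on Colab's shell interpolation. As an alternative sketch, the same flags can be assembled as an argument list in plain Python, which avoids quoting problems with paths that contain spaces. The helper name `build_inference_command` and the example input path are hypothetical, and the flag names are simply copied from the cell above; whether inference.py accepts each of them is not checked here.

```python
import shlex


def build_inference_command(input_path, output_dir, *, ddim_steps=20,
                            guidance_scale=3.5, seed=0, chunk_size=10.24,
                            overlap=0.04, multiband_ensemble=True,
                            input_cutoff=14000, device="cpu"):
    """Hypothetical helper: mirror the notebook's flags as an argv list."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    return [
        "python", "inference.py",
        "--input", str(input_path),
        "--output", str(output_dir),
        "--ddim_steps", str(ddim_steps),
        "--guidance_scale", str(guidance_scale),
        "--seed", str(seed),
        "--chunk_size", str(chunk_size),
        "--overlap", str(overlap),
        "--multiband_ensemble", str(multiband_ensemble).lower(),
        "--input_cutoff", str(input_cutoff),
        "--device", device,
    ]


# Example path with a space in it (hypothetical file name).
cmd = build_inference_command("/content/drive/MyDrive/my song.mp3",
                              "/content/drive/MyDrive/output",
                              device="cuda")
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the script exits with a non-zero status.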
