<a href="https://colab.research.google.com/github/wyattowalsh/soundlab/blob/main/soundlab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎵 SoundLab v1.0

**Production-ready audio processing toolkit for stem separation, vocal isolation, drum transcription, and MIDI conversion.**

## Features
- 🎚️ **Stem Separation**: Split any song into vocals, drums, bass, and more
- 🎤 **Vocal Isolation**: Extract vocals or create instrumentals with one command
- 🥁 **Drum-to-MIDI**: Transcribe drum patterns with kick/snare/hihat detection
- 🎹 **Audio-to-MIDI**: Convert melodies to MIDI using neural transcription
- 🎨 **Effects Processing**: Apply compression, EQ, reverb, and more

---

## 🚀 Setup

Install SoundLab and verify GPU availability for faster processing.

In [None]:
# Install SoundLab from GitHub with all features
!pip install -q "soundlab[separation,transcription] @ git+https://github.com/wyattowalsh/soundlab.git"

# Check GPU availability
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✅ SoundLab installed | Device: {device.upper()}")

if device == "cpu":
    print("⚠️  GPU recommended for faster processing. Go to Runtime > Change runtime type > GPU")
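
The check above assumes `torch` imports cleanly. A minimal sketch of a more defensive variant follows; it uses the standard library to test whether PyTorch is installed before importing it. The helper name `pick_device` is hypothetical and not part of SoundLab.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    # find_spec returns None when the package is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily, only after we know it is installed

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Device: {pick_device().upper()}")
```

This falls back to CPU instead of raising `ImportError`, which can be handy when the same cell is run outside Colab.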

In [None]:
# Core imports
from pathlib import Path
from IPython.display import Audio, display, HTML

from soundlab.io import load_audio, save_audio
from soundlab.separation import StemSeparator, SeparationConfig, DemucsModel
from soundlab.transcription import DrumTranscriber, MIDITranscriber, TranscriptionConfig
from soundlab.transcription import DrumTranscriptionConfig, get_transcriber
from soundlab.io.midi_io import load_midi, save_midi

print("✅ All modules imported successfully!")

## 📁 Upload Your Audio

Upload an audio file (MP3, WAV, FLAC) or use the sample.

In [None]:
# @title Upload Audio File
# @markdown Upload your own audio or use a sample file.

USE_SAMPLE = True  # @param {type: "boolean"}

if USE_SAMPLE:
    # Download a sample audio file
    !wget -q -O sample.mp3 "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3"
    AUDIO_PATH = "sample.mp3"
    print("📥 Downloaded sample audio")
else:
    from google.colab import files
    uploaded = files.upload()
    AUDIO_PATH = list(uploaded.keys())[0]
    print(f"📤 Uploaded: {AUDIO_PATH}")

# Load and preview
audio = load_audio(AUDIO_PATH)
print("\n📊 Audio Info:")
print(f"   Duration: {audio.duration:.1f}s")
print(f"   Sample Rate: {audio.sample_rate} Hz")
print(f"   Channels: {audio.channels}")

print("\n🎧 Preview:")
display(Audio(audio.samples.T, rate=audio.sample_rate))
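
Before loading an uploaded file, it can help to confirm that its extension is one of the formats named above. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names, and the set mirrors this section's text rather than whatever list SoundLab checks internally.

```python
from pathlib import Path

# Extensions named in this section; an assumption, not SoundLab's own list.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac"}


def is_supported_audio(path: str) -> bool:
    """Check the file extension, ignoring case."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["song.MP3", "take1.flac", "notes.txt"]:
    status = "ok" if is_supported_audio(name) else "unsupported"
    print(f"{name}: {status}")
```

A check like this only looks at the file name, so a mislabeled file can still fail when `load_audio` tries to decode it.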

---
## 🎚️ Stem Separation

Split your audio into individual stems using Demucs neural network models.

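| Model | Stems | Quality | Speed |
|-------|-------|---------|-------|
| `htdemucs` | 4 (vocals, drums, bass, other) | Good | Fast |
| `htdemucs_ft` | 4 (vocals, drums, bass, other) | Best | Slower (fine-tuned; roughly 4x the run time of `htdemucs`) |
| `htdemucs_6s` | 6 (+piano, guitar) | Best | Slower |
| `mdx_extra` | 4 (vocals, drums, bass, other) | Good | Fastest |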

In [None]:
# @title Configure Stem Separation
# @markdown Choose your model and settings.

MODEL = "htdemucs_ft"  # @param ["htdemucs", "htdemucs_ft", "htdemucs_6s", "mdx_extra"]
DEVICE = "auto"  # @param ["auto", "cuda", "cpu"]

# Create separator
config = SeparationConfig(
    model=DemucsModel(MODEL),
    device=DEVICE,
)
separator = StemSeparator(config)

print(f"🎛️ Model: {MODEL}")
print(f"   Device: {DEVICE}")
print(f"   Stems: {config.model.stems}")
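
Which stems exist depends on the model, which matters later when you pick a stem to transcribe. The lookup below restates the table as a plain dictionary. It is a sketch for illustration: `MODEL_STEMS` and `has_stem` are hypothetical names, and the real list comes from `config.model.stems`.

```python
# Stem names per model, restating the table above (illustrative only).
FOUR_STEMS = ["vocals", "drums", "bass", "other"]
MODEL_STEMS = {
    "htdemucs": FOUR_STEMS,
    "htdemucs_ft": FOUR_STEMS,
    "htdemucs_6s": FOUR_STEMS + ["piano", "guitar"],
    "mdx_extra": FOUR_STEMS,
}


def has_stem(model: str, stem: str) -> bool:
    """Return True when the given model is expected to produce the stem."""
    return stem in MODEL_STEMS.get(model, [])


for model in MODEL_STEMS:
    print(f"{model}: piano available = {has_stem(model, 'piano')}")
```

Under these assumptions, a piano stem is only produced by `htdemucs_6s`.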

In [None]:
# @title Run Separation
# @markdown This may take 1-5 minutes depending on audio length and GPU.

OUTPUT_DIR = Path("stems")
OUTPUT_DIR.mkdir(exist_ok=True)

print("🎶 Separating stems...")
result = separator.separate(AUDIO_PATH, OUTPUT_DIR)

print(f"\n✅ Separation complete in {result.processing_time:.1f}s")
print("\n📂 Output stems:")
for stem_name, stem_path in result.stems.items():
    print(f"   {stem_name}: {stem_path}")

In [None]:
# @title Preview Separated Stems
# @markdown Listen to each stem individually.

import soundfile as sf

for stem_name, stem_path in result.stems.items():
    audio_data, sr = sf.read(stem_path)
    print(f"\n🎧 {stem_name.upper()}")
    display(Audio(audio_data.T if audio_data.ndim > 1 else audio_data, rate=sr))

---
## 🎤 Vocal Isolation

**New in v1.0!** Extract just vocals and instrumental with two-stem mode.

In [None]:
# @title Isolate Vocals
# @markdown Extract vocals and create an instrumental track.

# Configure for vocal isolation
vocal_config = SeparationConfig(
    model=DemucsModel.HTDEMUCS_FT,
    two_stems="vocals",  # Key setting for vocal isolation
)
vocal_separator = StemSeparator(vocal_config)

VOCAL_OUTPUT = Path("vocal_isolation")
VOCAL_OUTPUT.mkdir(exist_ok=True)

print("🎤 Isolating vocals...")
vocal_result = vocal_separator.separate(AUDIO_PATH, VOCAL_OUTPUT)

print("\n✅ Vocal isolation complete!")
print("\n📂 Output:")
for stem_name, stem_path in vocal_result.stems.items():
    print(f"   {stem_name}: {stem_path}")

In [None]:
# @title Preview Vocals vs Instrumental

import soundfile as sf  # re-imported so this cell also runs on its own

# Load and play vocals
vocals_path = vocal_result.stems.get("vocals")
if vocals_path and Path(vocals_path).exists():
    vocals_audio, sr = sf.read(vocals_path)
    print("🎤 VOCALS")
    display(Audio(vocals_audio.T if vocals_audio.ndim > 1 else vocals_audio, rate=sr))

# Load and play instrumental (no_vocals)
instrumental_path = vocal_result.stems.get("no_vocals")
if instrumental_path and Path(instrumental_path).exists():
    instrumental_audio, sr = sf.read(instrumental_path)
    print("\n🎸 INSTRUMENTAL")
    display(Audio(instrumental_audio.T if instrumental_audio.ndim > 1 else instrumental_audio, rate=sr))
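
In Demucs, two-stem mode keeps the chosen stem and sums the remaining ones into a single `no_<stem>` track. The toy sketch below shows that idea on short Python lists instead of real sample arrays. The function `two_stem_mix` is hypothetical, and SoundLab's internals may differ.

```python
def two_stem_mix(stems: dict[str, list[float]], keep: str) -> dict[str, list[float]]:
    """Collapse a multi-stem result into {keep, no_<keep>} by summing the rest."""
    others = [samples for name, samples in stems.items() if name != keep]
    # Add the remaining stems together, sample by sample.
    rest = [sum(frame) for frame in zip(*others)]
    return {keep: stems[keep], f"no_{keep}": rest}


toy_stems = {
    "vocals": [0.5, 0.25, 0.0],
    "drums": [0.25, 0.0, 0.5],
    "bass": [0.25, 0.25, 0.25],
    "other": [0.0, 0.5, 0.25],
}
toy_result = two_stem_mix(toy_stems, keep="vocals")
print(sorted(toy_result))
print(toy_result["no_vocals"])
```

This is why the instrumental is looked up under the `no_vocals` key in the preview cell.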

---
## 🥁 Drum-to-MIDI Transcription

**New in v1.0!** Convert drum tracks to MIDI with automatic kick/snare/hihat detection.

In [None]:
# @title Configure Drum Transcription
# @markdown Tune parameters for your drum audio.

ONSET_THRESHOLD = 0.3  # @param {type: "slider", min: 0.1, max: 0.9, step: 0.1}
MIN_NOTE_LENGTH = 0.02  # @param {type: "slider", min: 0.01, max: 0.1, step: 0.01}
VELOCITY_SCALE = 1.2  # @param {type: "slider", min: 0.5, max: 2.0, step: 0.1}

drum_config = DrumTranscriptionConfig(
    onset_threshold=ONSET_THRESHOLD,
    min_note_length=MIN_NOTE_LENGTH,
    velocity_scale=VELOCITY_SCALE,
)

print("🥁 Drum Transcription Config:")
print(f"   Onset threshold: {drum_config.onset_threshold}")
print(f"   Min note length: {drum_config.min_note_length}s")
print(f"   Velocity scale: {drum_config.velocity_scale}")
print(f"\n   MIDI Mapping:")
print(f"   Kick: {drum_config.kick_note} | Snare: {drum_config.snare_note} | Hi-hat: {drum_config.hihat_closed_note}")
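
To build intuition for the three sliders, here is a simplified sketch of how an onset threshold, a minimum note length, and a velocity scale might interact. It is not SoundLab's algorithm: `pick_onsets` and the toy strength envelope are hypothetical, and the example passes its own `min_note_length` to keep the numbers easy to follow.

```python
def pick_onsets(strengths, frame_time, onset_threshold=0.3,
                min_note_length=0.02, velocity_scale=1.2):
    """Return (time_seconds, velocity) pairs from an onset-strength envelope in [0, 1]."""
    hits = []
    last_time = None
    for i, strength in enumerate(strengths):
        prev_s = strengths[i - 1] if i > 0 else 0.0
        next_s = strengths[i + 1] if i + 1 < len(strengths) else 0.0
        is_peak = strength >= prev_s and strength > next_s
        # Keep only local peaks that clear the threshold.
        if not is_peak or strength < onset_threshold:
            continue
        t = i * frame_time
        # Drop hits that follow the previous accepted hit too closely.
        if last_time is not None and t - last_time < min_note_length:
            continue
        # Scale strength to a MIDI velocity and clamp to the valid 1-127 range.
        velocity = max(1, min(127, round(strength * 127 * velocity_scale)))
        hits.append((t, velocity))
        last_time = t
    return hits


envelope = [0.0, 0.9, 0.1, 0.2, 0.1, 0.6, 0.0, 0.5, 0.0]
demo_hits = pick_onsets(envelope, frame_time=0.01, min_note_length=0.03)
print(demo_hits)
```

In this toy run the peak at 0.2 is rejected by the threshold, and the final peak is rejected because it falls too soon after the previous hit. Raising the threshold removes quiet hits, while a larger velocity scale pushes more hits to the 127 ceiling.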

In [None]:
# @title Transcribe Drums to MIDI
# @markdown Uses the separated drum stem from earlier.

# Create the MIDI output folder up front so later cells can rely on it
MIDI_OUTPUT = Path("midi_output")
MIDI_OUTPUT.mkdir(exist_ok=True)

# Get drum stem path
drum_stem = result.stems.get("drums")

if drum_stem and Path(drum_stem).exists():
    # Pass the config from the previous cell; this assumes DrumTranscriber
    # accepts its config the same way StemSeparator does above
    drum_transcriber = DrumTranscriber(drum_config)

    print("🎹 Transcribing drums to MIDI...")
    midi_result = drum_transcriber.transcribe(drum_stem, MIDI_OUTPUT)

    print("\n✅ Transcription complete!")
    print(f"   Notes detected: {len(midi_result.notes)}")
    print(f"   Output: {midi_result.path}")
    print(f"   Processing time: {midi_result.processing_time:.2f}s")
else:
    print("⚠️ Run stem separation first to get a drum track")

In [None]:
# @title Analyze Drum Hits
# @markdown View detected drum events by type.

from collections import Counter

if "midi_result" in globals() and midi_result.notes:
    # Count by pitch (drum type)
    pitch_counts = Counter(n.pitch for n in midi_result.notes)

    DRUM_NAMES = {36: "Kick", 38: "Snare", 42: "Hi-hat (closed)", 46: "Hi-hat (open)"}

    print("📊 Drum Hit Analysis:")
    print(f"   Total hits: {len(midi_result.notes)}")
    print("\n   By type:")
    for pitch, count in sorted(pitch_counts.items()):
        name = DRUM_NAMES.get(pitch, f"MIDI {pitch}")
        print(f"   {name}: {count} hits")

    # Show first few events
    print("\n   First 10 events:")
    for note in midi_result.notes[:10]:
        name = DRUM_NAMES.get(note.pitch, f"MIDI {note.pitch}")
        print(f"   {note.start:.3f}s - {name} (vel: {note.velocity})")
else:
    print("⚠️ Run drum transcription first")

---
## 🎹 Melodic Audio-to-MIDI

Transcribe melodic content (vocals, piano, bass) to MIDI using neural pitch detection.

In [None]:
# @title Transcribe Melody to MIDI
# @markdown Works best on isolated stems (vocals, bass, piano).

STEM_TO_TRANSCRIBE = "bass"  # @param ["vocals", "bass", "other"]

stem_path = result.stems.get(STEM_TO_TRANSCRIBE)

# Make sure the MIDI folder exists even if the drum cell was skipped
MIDI_OUTPUT = Path("midi_output")
MIDI_OUTPUT.mkdir(exist_ok=True)

if stem_path and Path(stem_path).exists():
    # Get the best available transcriber for your Python version
    melodic_transcriber = get_transcriber(backend="auto")

    print(f"🎹 Transcribing {STEM_TO_TRANSCRIBE} to MIDI...")
    print(f"   Using: {type(melodic_transcriber).__name__}")

    melodic_result = melodic_transcriber.transcribe(stem_path, MIDI_OUTPUT)

    print("\n✅ Transcription complete!")
    print(f"   Notes detected: {len(melodic_result.notes)}")
    print(f"   Output: {melodic_result.path}")
else:
    print(f"⚠️ Stem '{STEM_TO_TRANSCRIBE}' not found. Run separation first.")
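
Whichever backend is selected, the last step of pitch-based transcription is mapping a detected frequency to a MIDI note number. The sketch below uses the standard equal-temperament formula with A4 = 440 Hz = MIDI 69. The helper names are hypothetical, and a real transcriber does much more than this, such as note segmentation and pitch-bend handling.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(note: int) -> str:
    """Pitch name using the convention that MIDI note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


for freq in [41.2, 261.63, 440.0]:
    note = freq_to_midi(freq)
    print(f"{freq} Hz -> MIDI {note} ({midi_to_name(note)})")
```

The same mapping is useful for sanity-checking results: a bass stem should mostly produce low note numbers, so a transcription full of high notes suggests the wrong stem or octave errors.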

---
## 💾 Download Results

Download all separated stems and MIDI files as a ZIP archive.

In [None]:
# @title Create Download Package
# @markdown Bundle all outputs into a ZIP file.

import shutil
from google.colab import files

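# Create output bundle (start clean so files from earlier runs are not zipped)
BUNDLE_DIR = Path("soundlab_output")
shutil.rmtree(BUNDLE_DIR, ignore_errors=True)
BUNDLE_DIR.mkdir()

# Copy stems and vocal isolation into separate subfolders so files with the
# same name (for example the vocals stem from both runs) do not overwrite each other
for label, src_dir in {"stems": OUTPUT_DIR, "vocal_isolation": VOCAL_OUTPUT}.items():
    if src_dir.exists():
        shutil.copytree(src_dir, BUNDLE_DIR / label, dirs_exist_ok=True)

# Copy MIDI (the folder only exists if a transcription cell was run)
midi_dir = Path("midi_output")
if midi_dir.exists():
    (BUNDLE_DIR / "midi").mkdir(exist_ok=True)
    for f in midi_dir.glob("*.mid"):
        shutil.copy(f, BUNDLE_DIR / "midi")

# Create ZIP
zip_path = shutil.make_archive("soundlab_output", "zip", BUNDLE_DIR)
print(f"📦 Created: {zip_path}")
print("\n📂 Contents:")
for f in sorted(BUNDLE_DIR.rglob("*")):
    if f.is_file():
        print(f"   {f.relative_to(BUNDLE_DIR)}")

# Download
print("\n⬇️ Starting download...")
files.download(zip_path)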
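
If you want to confirm what went into the archive before downloading, the standard library's `zipfile` module can list and integrity-check its members. The sketch below is self-contained: it builds a throwaway archive with placeholder file names. `list_archive` is a hypothetical helper, not part of SoundLab.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return sorted member names, raising if any member fails its CRC check."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"Corrupt member: {bad}")
        return sorted(zf.namelist())


# Demo on a temporary archive with placeholder contents
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("stems/vocals.wav", b"placeholder")
        zf.writestr("midi/drums.mid", b"placeholder")
    demo_members = list_archive(demo_zip)
    print(demo_members)
```

In the notebook, calling `list_archive(zip_path)` on the archive created above would give the same kind of listing.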

---
## 📚 What's Next?

**More Examples:**
- [Stem Separation Deep Dive](./notebooks/examples/stem_separation.ipynb)
- [MIDI Transcription Guide](./notebooks/examples/midi_transcription.ipynb)
- [Voice Conversion](./notebooks/examples/voice_conversion.ipynb)

**Resources:**
- 📖 [Documentation](https://github.com/wyattowalsh/soundlab)
- 🐛 [Report Issues](https://github.com/wyattowalsh/soundlab/issues)
- ⭐ [Star on GitHub](https://github.com/wyattowalsh/soundlab)

---

**SoundLab v1.0**: Production-ready audio processing for stem separation, vocal isolation, and MIDI transcription.