# 🎹 MIDI Transcription with SoundLab

This notebook demonstrates how to transcribe audio to MIDI using Basic Pitch, then visualize and export the results.

**What you'll learn:**
- Loading piano or instrument recordings
- Configuring the transcription engine
- Generating MIDI from audio
- Visualizing with piano roll
- Exporting MIDI files

## Setup

Install SoundLab and import the necessary modules.

In [None]:
# Install SoundLab (uncomment if running in Colab)
# !pip install soundlab[transcription]


import numpy as np

from soundlab.io import load_audio
from soundlab.io.midi_io import save_midi
from soundlab.transcription import BasicPitchTranscriber, TranscriptionConfig
from soundlab.transcription.viz import plot_piano_roll

print("✅ SoundLab imported successfully!")

## 1. Load Your Audio

Load a piano or instrument recording. For best results, use:
- Clean recordings without background noise
- Monophonic or simple polyphonic content
- Isolated stems (use stem separation first for mixed tracks)

In [None]:
# @title Upload or specify audio path
# @markdown Provide a path to a piano or instrument recording.

AUDIO_PATH = "piano_recording.wav"  # @param {type: "string"}

# For Colab: uncomment to upload
# from google.colab import files
# uploaded = files.upload()
# AUDIO_PATH = list(uploaded.keys())[0]

# Load the audio
audio = load_audio(AUDIO_PATH)

print(f"📁 Loaded: {AUDIO_PATH}")
print(f"   Duration: {audio.duration:.2f}s")
print(f"   Sample rate: {audio.sample_rate} Hz")
print(f"   Channels: {audio.channels}")

In [None]:
# Preview the audio
from IPython.display import Audio, display

print("🎧 Original Audio:")
display(Audio(audio.samples.T, rate=audio.sample_rate))

## 2. Configure Transcription

Basic Pitch uses neural networks to detect pitch and notes. You can tune several parameters:

| Parameter | Description | Range |
|-----------|-------------|-------|
| `onset_threshold` | Note start sensitivity | 0.1 - 0.9 |
| `frame_threshold` | Note continuation sensitivity | 0.1 - 0.9 |
| `min_note_length` | Minimum note duration (seconds) | 0.01 - 0.5 |
| `min_freq` | Minimum frequency to detect (Hz) | 20 - 500 |
| `max_freq` | Maximum frequency to detect (Hz) | 1000 - 8000 |
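
To make the frequency limits easier to reason about, it can help to translate them into MIDI note numbers. The sketch below uses the standard equal-temperament relation with A4 = 440 Hz = MIDI 69. The helper names `hz_to_midi` and `midi_to_hz` are illustrative and are not part of SoundLab.

```python
import math

A4_HZ = 440.0  # reference tuning for MIDI note 69
A4_MIDI = 69


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number back to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


# The default frequency limits used in this notebook.
for label, freq in [("min_freq", 32), ("max_freq", 2000)]:
    print(f"{label} = {freq} Hz -> MIDI {hz_to_midi(freq):.1f}")
```

With the defaults used here (32 Hz and 2000 Hz), the detectable range works out to roughly MIDI notes 24 through 95, so fundamentals outside that span would be ignored.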

In [None]:
# @title Transcription Configuration
# @markdown Tune the transcription parameters for your audio.

ONSET_THRESHOLD = 0.5  # @param {type: "slider", min: 0.1, max: 0.9, step: 0.05}
FRAME_THRESHOLD = 0.3  # @param {type: "slider", min: 0.1, max: 0.9, step: 0.05}
MIN_NOTE_LENGTH = 0.058  # @param {type: "slider", min: 0.01, max: 0.5, step: 0.01}
MIN_FREQ = 32  # @param {type: "slider", min: 20, max: 500, step: 10}
MAX_FREQ = 2000  # @param {type: "slider", min: 1000, max: 8000, step: 100}

# Create configuration
config = TranscriptionConfig(
    onset_threshold=ONSET_THRESHOLD,
    frame_threshold=FRAME_THRESHOLD,
    min_note_length=MIN_NOTE_LENGTH,
    min_freq=MIN_FREQ,
    max_freq=MAX_FREQ,
)

print("🎛️ Configuration:")
print(f"   Onset threshold: {config.onset_threshold}")
print(f"   Frame threshold: {config.frame_threshold}")
print(f"   Min note length: {config.min_note_length}s")
print(f"   Frequency range: {config.min_freq} - {config.max_freq} Hz")

## 3. Run Transcription

Now let's transcribe the audio to MIDI using Basic Pitch.

In [None]:
# Create the transcriber
transcriber = BasicPitchTranscriber(config)

# Run transcription
print("🎹 Transcribing audio to MIDI...")
result = transcriber.transcribe(audio)

print("\n✅ Transcription complete!")
print(f"   Notes detected: {len(result.notes)}")
print(f"   Duration: {result.duration:.2f}s")
print(f"   Processing time: {result.processing_time:.2f}s")

## 4. Explore the Results

Let's look at the detected notes and their properties.
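
The `pitch` values are MIDI note numbers, which are hard to read at a glance. A minimal sketch for turning them into note names follows, assuming the common convention that MIDI 60 is C4 (some tools label it C3). `midi_to_name` is a hypothetical helper, not a SoundLab function.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch: int) -> str:
    """Return a note name such as 'C4' for a MIDI pitch in 0-127.

    Assumes the convention where MIDI 60 is C4 (middle C).
    """
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch out of range: {pitch}")
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"


# Lowest piano key, middle C, concert A, highest piano key.
for pitch in (21, 60, 69, 108):
    print(pitch, midi_to_name(pitch))
```

In the notes table, something like `midi_to_name(note.pitch)` could be printed next to the raw pitch number.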

In [None]:
# Show first 10 notes
print("📋 First 10 notes:")
print(f"{'#':<4} {'Pitch':<6} {'Start':<8} {'Duration':<10} {'Velocity':<10}")
print("-" * 45)

for i, note in enumerate(result.notes[:10]):
    print(
        f"{i + 1:<4} {note.pitch:<6} {note.start_time:<8.3f} {note.duration:<10.3f} {note.velocity:<10}"
    )

In [None]:
# Statistics (guarded: min() and max() raise ValueError on an empty list)
if result.notes:
    pitches = [n.pitch for n in result.notes]
    durations = [n.duration for n in result.notes]
    velocities = [n.velocity for n in result.notes]

    print("📊 Note Statistics:")
    print(f"   Pitch range: {min(pitches)} - {max(pitches)} (MIDI)")
    print(f"   Duration range: {min(durations):.3f} - {max(durations):.3f}s")
    print(f"   Avg velocity: {np.mean(velocities):.1f}")
    print(f"   Total note time: {sum(durations):.2f}s")
else:
    print("No notes detected. Try lowering ONSET_THRESHOLD or FRAME_THRESHOLD.")

## 5. Visualize with Piano Roll

The piano roll visualization shows pitch on the Y-axis and time on the X-axis.
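
To make the idea concrete, here is a rough sketch of how notes can be rasterized into a pitch-by-time grid, which is the data structure behind a piano roll. It is not how `plot_piano_roll` is implemented. The `Note` tuple is a hypothetical stand-in that mirrors the fields used in this notebook, and the frame size is an arbitrary choice.

```python
import math
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical stand-in mirroring the note fields used in this notebook."""
    pitch: int
    start_time: float
    duration: float


def rasterize(notes, frame_seconds=0.25):
    """Return (lowest_pitch, grid); grid[row][col] is True while a note sounds.

    Row 0 is the lowest pitch and each column spans `frame_seconds`.
    """
    if not notes:
        return 0, []
    low = min(n.pitch for n in notes)
    high = max(n.pitch for n in notes)
    end = max(n.start_time + n.duration for n in notes)
    n_cols = max(1, math.ceil(end / frame_seconds))
    grid = [[False] * n_cols for _ in range(high - low + 1)]
    for n in notes:
        first = int(n.start_time / frame_seconds)
        # Every note occupies at least one column, however short it is.
        last = max(first + 1, math.ceil((n.start_time + n.duration) / frame_seconds))
        for col in range(first, min(last, n_cols)):
            grid[n.pitch - low][col] = True
    return low, grid


# A made-up C major arpeggio: C4, E4, then a longer G4.
demo_notes = [Note(60, 0.0, 0.5), Note(64, 0.5, 0.5), Note(67, 1.0, 1.0)]
low, grid = rasterize(demo_notes)

# Print the highest pitch first so the text matches the usual piano roll layout.
for row in range(len(grid) - 1, -1, -1):
    cells = "".join("#" if on else "." for on in grid[row])
    print(f"{low + row:>3} {cells}")
```

Each `#` marks a frame in which that pitch is sounding. A plotting function draws the same information as rectangles.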

In [None]:
import matplotlib.pyplot as plt

# Plot piano roll
fig = plot_piano_roll(
    result.notes,
    title="Transcribed Piano Roll",
    figsize=(14, 6),
)
plt.tight_layout()
plt.show()

In [None]:
# @title Zoom into a specific time range
# @markdown Adjust to focus on a specific section.

START_TIME = 0.0  # @param {type: "number"}
END_TIME = 10.0  # @param {type: "number"}

# Keep every note that sounds inside the window, including notes that start before START_TIME
notes_in_range = [
    n for n in result.notes
    if n.start_time < END_TIME and n.start_time + n.duration > START_TIME
]

fig = plot_piano_roll(
    notes_in_range,
    title=f"Piano Roll ({START_TIME:.1f}s - {END_TIME:.1f}s)",
    figsize=(14, 6),
)
plt.tight_layout()
plt.show()

print(f"🎵 Notes in range: {len(notes_in_range)}")

## 6. Export MIDI

Save the transcription as a standard MIDI file.
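
Standard MIDI files store time in ticks relative to a tempo, not in seconds, which is why the export takes a tempo. The sketch below shows the usual conversion. How `save_midi` handles this internally is not shown in this notebook, and the resolution of 480 ticks per quarter note is an assumed, commonly used value.

```python
def seconds_to_ticks(seconds: float, bpm: float, ppq: int = 480) -> int:
    """Convert a time in seconds to MIDI ticks.

    `ppq` is the file resolution in ticks per quarter note (480 is an assumption).
    """
    beats = seconds * bpm / 60.0
    return round(beats * ppq)


def tempo_in_microseconds(bpm: float) -> int:
    """Microseconds per quarter note, the unit used by the MIDI set-tempo event."""
    return round(60_000_000 / bpm)


TEMPO_BPM = 120  # same default as the export cell
for t in (0.5, 1.0, 2.5):
    print(f"{t:.1f}s -> {seconds_to_ticks(t, TEMPO_BPM)} ticks")
print(f"{TEMPO_BPM} BPM -> {tempo_in_microseconds(TEMPO_BPM)} microseconds per quarter note")
```

At 120 BPM one second is two beats, so a note starting at 1.0 s lands on tick 960 at this resolution. The same onset in seconds maps to a different tick position if the tempo changes.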

In [None]:
# @title Export Settings
# @markdown Configure the MIDI export.

OUTPUT_PATH = "transcribed.mid"  # @param {type: "string"}
TEMPO = 120  # @param {type: "integer"}

# Save MIDI
save_midi(result, OUTPUT_PATH, tempo=TEMPO)

print(f"💾 Saved MIDI to: {OUTPUT_PATH}")
print(f"   Tempo: {TEMPO} BPM")
print(f"   Notes: {len(result.notes)}")

In [None]:
# For Colab: Download the MIDI file
# from google.colab import files
# files.download(OUTPUT_PATH)

## 7. Compare Parameters

Try different threshold values to see how they affect transcription quality.
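
As intuition for why the note count changes with the threshold, consider a toy onset detector that keeps local peaks at or above a cutoff. This is only an illustration with made-up activation values; it is not Basic Pitch's actual post-processing, and `pick_onsets` is a hypothetical helper.

```python
def pick_onsets(activations, threshold):
    """Return the indices of local maxima whose value is at least `threshold`."""
    onsets = []
    for i, value in enumerate(activations):
        left = activations[i - 1] if i > 0 else float("-inf")
        right = activations[i + 1] if i < len(activations) - 1 else float("-inf")
        if value >= threshold and value > left and value >= right:
            onsets.append(i)
    return onsets


# Made-up onset activations for a single pitch, one value per frame.
activations = [0.05, 0.82, 0.10, 0.35, 0.12, 0.61, 0.20, 0.45, 0.08]

for threshold in (0.3, 0.5, 0.7):
    frames = pick_onsets(activations, threshold)
    print(f"threshold={threshold}: {len(frames)} onsets at frames {frames}")
```

The strong peak survives every setting, while the weaker peaks drop out as the threshold rises. The real comparison shows the same trade-off between catching soft notes and admitting false positives.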

In [None]:
# @title Parameter Comparison
# @markdown Compare transcription with different settings.

COMPARE_ONSET = [0.3, 0.5, 0.7]  # Different onset thresholds

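results = {}
for onset in COMPARE_ONSET:
    # Vary only the onset threshold; keep the other settings from section 2
    cfg = TranscriptionConfig(
        onset_threshold=onset,
        frame_threshold=FRAME_THRESHOLD,
        min_note_length=MIN_NOTE_LENGTH,
        min_freq=MIN_FREQ,
        max_freq=MAX_FREQ,
    )
    res = BasicPitchTranscriber(cfg).transcribe(audio)
    results[onset] = res
    print(f"Onset={onset}: {len(res.notes)} notes detected")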

print("\n📊 Lower threshold = more notes (possibly more false positives)")
print("📊 Higher threshold = fewer notes (possibly missing soft notes)")

In [None]:
# Visualize comparison
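fig, axes = plt.subplots(
    len(COMPARE_ONSET), 1, figsize=(14, 4 * len(COMPARE_ONSET)), sharex=True, squeeze=False
)
axes = axes.ravel()  # always a 1-D array, even when only one threshold is compared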

for ax, onset in zip(axes, COMPARE_ONSET, strict=True):
    notes = results[onset].notes
    for note in notes:
        ax.barh(
            note.pitch,
            note.duration,
            left=note.start_time,
            height=0.8,
            alpha=note.velocity / 127,
            color="steelblue",
        )
    ax.set_ylabel("MIDI Pitch")
    ax.set_title(f"Onset Threshold = {onset} ({len(notes)} notes)")

axes[-1].set_xlabel("Time (s)")
plt.tight_layout()
plt.show()

## 🎉 Done!

You've successfully transcribed audio to MIDI using SoundLab.

**Next steps:**
- Import the MIDI file into your DAW
- Try the [Stem Separation](./stem_separation.ipynb) notebook to isolate instruments first
- Explore the [Voice Conversion](./voice_conversion.ipynb) notebook for TTS
- Check out the [SoundLab Studio](../soundlab_studio.ipynb) for the full pipeline

**Tips:**
- For mixed, multi-instrument audio, separate the stems first, then transcribe each stem on its own
- Lower the thresholds for soft or quiet recordings
- Raise the thresholds to suppress spurious notes in noisy recordings
- Increase `min_note_length` to filter out very short, spurious notes
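
The effect of `min_note_length` can also be approximated after the fact by filtering an existing note list on duration. The sketch below assumes notes expose a `duration` in seconds, as they do in this notebook. The `Note` dataclass and `drop_short_notes` are hypothetical stand-ins, not SoundLab APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical stand-in mirroring the note fields used in this notebook."""
    pitch: int
    start_time: float
    duration: float
    velocity: int


def drop_short_notes(notes, min_note_length):
    """Keep only the notes that last at least `min_note_length` seconds."""
    return [n for n in notes if n.duration >= min_note_length]


# Made-up notes: two plausible ones and two very short blips.
demo_notes = [
    Note(60, 0.00, 0.020, 40),
    Note(64, 0.10, 0.300, 90),
    Note(67, 0.45, 0.050, 35),
    Note(72, 0.60, 0.600, 100),
]

kept = drop_short_notes(demo_notes, min_note_length=0.058)
print(f"Kept {len(kept)} of {len(demo_notes)} notes: {[n.pitch for n in kept]}")
```

Filtering afterwards avoids re-running the model, though the transcriber's own setting may act earlier in its pipeline and give slightly different results.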