**Content-only music recommender that uses STFT-derived features and MFCCs to return nearest neighbors.**


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**COMPILING AUDIO DATASET USING FMA DATASET**

Installs and imports all required libraries for data and audio processing. The !pip command ensures the correct versions of pandas, librosa, and soundfile are installed for consistent behavior. The imports load general utilities (os, io, zipfile, random, csv, math), numerical and data tools (numpy, pandas), and audio libraries (librosa, soundfile) used for reading, analyzing, and saving WAV files.

In [None]:
# FMA-small → WAV mini-dataset builder
# Defaults: N_SAMPLES = 40, SR = 22050, MONO WAV, 30 s clips

!pip -q install pandas==2.2.2 librosa==0.10.2.post1 soundfile==0.12.1

import os, io, zipfile, random, csv, math
import numpy as np
import pandas as pd
import librosa, soundfile as sf

Defines key settings for the mini audio dataset. The code specifies how many tracks to sample (N_SAMPLES), sets a uniform sampling rate of 22,050 Hz, fixes each clip length at 30 seconds, and locks randomness with a seed for reproducibility. It also prepares output folders—project_dataset/clips—ensuring they exist before any files are saved.

In [None]:
# Settings

N_SAMPLES   = 40           # choose any 30–80
SR          = 22050        # target sample rate
CLIP_SEC    = 30           # enforce 30 s
SEED        = 123

OUT_DIR     = "project_dataset"
CLIPS_DIR   = os.path.join(OUT_DIR, "clips")
os.makedirs(CLIPS_DIR, exist_ok=True)

Specifies download links for the FMA-small audio dataset and its metadata, then defines a helper function wget() to fetch them safely. The function uses Python’s subprocess to run a shell command that quietly downloads a file while showing progress. If the download fails, it raises an error to alert the user, ensuring that missing or corrupted files are caught early.

In [None]:
# FMA-small audio and metadata zips

AUDIO_ZIP_URL = "https://os.unil.cloud.switch.ch/fma/fma_small.zip"
META_ZIP_URL  = "https://os.unil.cloud.switch.ch/fma/fma_metadata.zip"

def wget(url, out):
    import subprocess, shlex, textwrap
    cmd = f"wget -q --show-progress -O {shlex.quote(out)} {shlex.quote(url)}"
    print(f"Downloading: {url}")
    r = subprocess.run(cmd, shell=True)
    if r.returncode != 0:
        raise RuntimeError(f"wget failed for {url}")
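The same fail-fast pattern can also be written without a shell by passing the command as a list of arguments, which removes the need for quoting. The sketch below is illustrative only: `run_checked` is a hypothetical helper name, and a Python one-liner stands in for `wget` so the example runs without network access.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command given as a list of arguments; raise if it exits non-zero."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"command failed with exit code {result.returncode}: {cmd[0]}")
    return result.returncode

# Stand-in commands (hypothetical). A real download call would look like:
# run_checked(["wget", "-q", "--show-progress", "-O", out, url])
ok_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

status = run_checked(ok_cmd)      # succeeds and returns 0
try:
    run_checked(bad_cmd)          # exits with code 3, so this raises
    failed = False
except RuntimeError:
    failed = True
```

Checking the return code right after the call surfaces a failed download at the point where it happens, rather than later as a corrupt ZIP file.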

Downloads the FMA-small dataset directly from its online source using the wget command. The exclamation mark runs the command in Colab’s shell environment, fetching the ZIP file so it can later be extracted and processed for feature generation.

In [None]:
!wget https://os.unil.cloud.switch.ch/fma/fma_small.zip

--2025-10-31 14:25:56--  https://os.unil.cloud.switch.ch/fma/fma_small.zip
Resolving os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)... 86.119.28.16, 2001:620:5ca1:201::214
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7679594875 (7.2G) [application/zip]
Saving to: ‘fma_small.zip’


2025-10-31 14:30:00 (30.1 MB/s) - ‘fma_small.zip’ saved [7679594875/7679594875]



Checks whether the FMA-small and metadata ZIP files already exist in the workspace. If they don’t, the wget() function is called to download them. This prevents redundant downloads and ensures both the audio dataset and its metadata are available for later processing.

In [None]:
# Download missing zips

if not os.path.exists("fma_small.zip"):
    wget(AUDIO_ZIP_URL, "fma_small.zip")
if not os.path.exists("fma_metadata.zip"):
    wget(META_ZIP_URL, "fma_metadata.zip")

Downloading: https://os.unil.cloud.switch.ch/fma/fma_metadata.zip


Handles extraction of the downloaded audio dataset. If the fma_small folder doesn’t exist, it opens the ZIP file and extracts its contents to the current directory, displaying a short progress message. If the folder is already present, it simply skips extraction to save time and avoid overwriting existing files.

In [None]:
# Unzip Audio to ./fma_small/

if not os.path.exists("fma_small"):
    print("Extracting fma_small.zip ...")
    with zipfile.ZipFile("fma_small.zip", "r") as z:
        z.extractall(".")
else:
    print("Audio already extracted.")

Audio already extracted.
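The extract-if-missing guard can be tried on a throwaway archive. This is a minimal sketch under assumed names: `extract_once` and the `demo` folder are hypothetical and are not part of the notebook.

```python
import os
import tempfile
import zipfile

def extract_once(zip_path, target_dir, dest="."):
    """Extract zip_path into dest unless target_dir already exists. Returns True if extracted."""
    if os.path.exists(target_dir):
        return False
    with zipfile.ZipFile(zip_path, "r") as z:
        z.extractall(dest)
    return True

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive that mimics the nested folder layout
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as z:
        z.writestr("demo/000/000002.txt", "placeholder")

    target = os.path.join(tmp, "demo")
    first = extract_once(zip_path, target, dest=tmp)    # folder missing: extracts
    second = extract_once(zip_path, target, dest=tmp)   # folder present: skipped
    extracted_file_exists = os.path.exists(os.path.join(target, "000", "000002.txt"))
```

One limitation of this kind of guard is that it only checks for the folder, so a partially extracted folder from an interrupted run also counts as present.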


Extracts the metadata ZIP file containing track information such as genre labels and artist details. If the fma_metadata folder doesn’t exist, it unzips the contents into the current directory; otherwise, it skips extraction and confirms that the metadata is already available.

In [None]:
# Unzip metadata to ./fma_metadata/

if not os.path.exists("fma_metadata"):
    print("Extracting fma_metadata.zip ...")
    with zipfile.ZipFile("fma_metadata.zip", "r") as z:
        z.extractall(".")
else:
    print("Metadata already extracted.")

Extracting fma_metadata.zip ...


Loads the main metadata file tracks.csv using pandas, which contains detailed information about each track in the FMA dataset. The CSV has a hierarchical column structure, so the code flattens it by joining each level (e.g., ('track', 'title')) into a single string like track.title. This makes the DataFrame easier to work with when selecting or filtering specific attributes later in the workflow.

In [None]:
# tracks.csv is a multi-index column CSV so flatten it for easy access.

tracks_csv_path = "fma_metadata/tracks.csv"
tracks = pd.read_csv(tracks_csv_path, header=[0,1], index_col=0)

# Flatten multiindex columns like ('track','title') → 'track.title'
tracks.columns = ['{}.{}'.format(a,b) for a,b in tracks.columns.values]

Selects only the relevant metadata fields needed for the recommender—track title, main genre, and artist name. It first checks that these columns exist in the dataset and raises an error if any are missing. Then it copies those columns into a new DataFrame called meta and renames them to simpler labels (title, artist, genre) for easier reference in later processing and output.

In [None]:
# Keeping only the fields we need

keep_cols = [
    'track.title',
    'track.genre_top',
    'artist.name'
]
missing_cols = [c for c in keep_cols if c not in tracks.columns]
if missing_cols:
    raise ValueError(f"Missing expected columns in tracks.csv: {missing_cols}")

meta = tracks[keep_cols].copy()
meta.rename(columns={
    'track.title': 'title',
    'artist.name': 'artist',
    'track.genre_top': 'genre'
}, inplace=True)

Fills any missing metadata values with the placeholder 'unknown'. This ensures that every track has a defined title, artist, and genre, preventing errors or empty fields during later stages like sampling, labeling, or displaying recommendations.

In [None]:
# Replace NaNs in the metadata fields with 'unknown'

meta['title']  = meta['title'].fillna('unknown')
meta['artist'] = meta['artist'].fillna('unknown')
meta['genre']  = meta['genre'].fillna('unknown')

Defines a helper function to locate each track’s MP3 file within the nested FMA-small folder structure, where filenames follow a six-digit ID format. It then loops through all track IDs in the metadata, checks if each corresponding audio file exists, and collects the valid ones in available. Finally, it prints how many tracks actually have audio files available for processing.

In [None]:
# discover available MP3 files
# Audio files are named by 6-digit track id in nested folders:
# fma_small/000/000002.mp3 for track_id 2, etc.

def id_to_mp3_path(tid: int) -> str:
    tid6 = f"{tid:06d}"
    folder = tid6[:3]
    return os.path.join("fma_small", folder, f"{tid6}.mp3")

available = []
for tid in meta.index.tolist():
    p = id_to_mp3_path(int(tid))
    if os.path.exists(p):
        available.append(int(tid))

print(f"Tracks with audio present: {len(available)}")


Tracks with audio present: 5230


Selects a random subset of tracks from the available audio files to create a manageable mini-dataset. The random seed ensures the same selection each time for reproducibility. If no audio files are found, it raises an error to flag the issue. Otherwise, it randomly picks up to N_SAMPLES tracks and prints how many were selected.

In [None]:
# Sampling a subset

random.seed(SEED)
if len(available) == 0:
    raise RuntimeError("No audio files found; check that fma_small extracted correctly.")
pick = random.sample(available, k=min(N_SAMPLES, len(available)))
print(f"Sampling {len(pick)} tracks.")

Sampling 40 tracks.
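Seeded sampling without replacement can be sketched in isolation. The example below assumes a local `random.Random` instance instead of the global seed used in the notebook, and the track IDs are made up for illustration.

```python
import random

def sample_ids(ids, k, seed):
    """Return up to k ids chosen without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)   # local generator, leaves the global random state alone
    return rng.sample(ids, k=min(k, len(ids)))

ids = list(range(1000, 1100))   # hypothetical track IDs
a = sample_ids(ids, 40, seed=123)
b = sample_ids(ids, 40, seed=123)
c = sample_ids(ids[:10], 40, seed=123)   # fewer candidates than requested
```

Here `a` and `b` are identical because they use the same seed and the same input order. The selection does depend on the order of `ids`, so a change in which files are present on disk changes the sample even with a fixed seed.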


Creates 30-second WAV clips and builds a matching metadata table. For each sampled track, it loads the MP3 with librosa, resamples it to 22,050 Hz, converts it to mono, and pads or truncates the waveform to exactly 30 seconds. The processed audio is saved as a 16-bit PCM WAV file in the clips folder. Alongside this, the key metadata (track ID, title, artist, genre, and relative file path) is collected in a rows list that later becomes metadata.csv.

In [None]:
# Export WAV + build metadata.csv

rows = [("track_id","title","artist","genre","path")]
target_len = SR * CLIP_SEC

for i, tid in enumerate(pick, start=1):
    mp3_path = id_to_mp3_path(tid)
    # Load MP3, resample to SR, mono, duration 30 s
    # librosa.load will trim to duration, but we ensure exact length by pad/truncate
    y, sr = librosa.load(mp3_path, sr=SR, mono=True, duration=CLIP_SEC)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    elif len(y) > target_len:
        y = y[:target_len]

    # Write as WAV PCM_16
    out_name = f"{i:03d}.wav"
    out_path = os.path.join(CLIPS_DIR, out_name)
    sf.write(out_path, y.astype(np.float32), SR, subtype="PCM_16")

    # Metadata fields
    m = meta.loc[tid]
    title  = str(m['title'])
    artist = str(m['artist'])
    genre  = str(m['genre']) if pd.notna(m['genre']) else 'unknown'

    rows.append((str(tid), title, artist, genre, f"clips/{out_name}"))
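The pad-or-truncate step can be checked on synthetic arrays. This is a small sketch; `fix_length` here is a local helper written for illustration (librosa also ships a `librosa.util.fix_length` utility with a similar purpose).

```python
import numpy as np

def fix_length(y, target_len):
    """Zero-pad at the end or truncate so that len(result) == target_len."""
    if len(y) < target_len:
        return np.pad(y, (0, target_len - len(y)))
    return y[:target_len]

sr, clip_sec = 22050, 30
target_len = sr * clip_sec   # 661500 samples for a 30 s clip at 22,050 Hz

short = fix_length(np.ones(1000, dtype=np.float32), target_len)            # padded with zeros
long_ = fix_length(np.ones(target_len + 5, dtype=np.float32), target_len)  # truncated
```

Every clip then has the same number of samples, so later frame-based features have the same number of frames per track.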


Writes all collected track information to a CSV file for easy reference. It saves the rows list containing each clip’s ID, title, artist, genre, and relative path to metadata.csv inside the project folder. The print statements confirm successful completion, showing where the WAV clips and metadata file were stored.

In [None]:
# Save CSV
meta_csv_path = os.path.join(OUT_DIR, "metadata.csv")
with open(meta_csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

print("\nDone.")
print(f"Saved WAV clips → {CLIPS_DIR}")
print(f"Metadata CSV   → {meta_csv_path}")


Done.
Saved WAV clips → project_dataset/clips
Metadata CSV   → project_dataset/metadata.csv


Displays the first few entries of the rows list to verify that metadata and file paths were recorded correctly. This quick check helps confirm that each WAV file has its corresponding track ID, title, artist, and genre before proceeding to feature extraction or further analysis.

In [None]:
# Peek at first ten rows
for r in rows[:10]:
    print(r)

('track_id', 'title', 'artist', 'genre', 'path')
('10440', 'No Gravity Blues', 'Big Blood', 'Folk', 'clips/001.wav')
('53576', 'River', 'Mary Lorson', 'Folk', 'clips/002.wav')
('14653', 'Bad Sign', 'Brad Sucks', 'Rock', 'clips/003.wav')
('73761', 'Weapons', 'Dad Rocks!', 'International', 'clips/004.wav')
('53229', 'Cocaine', 'Fruit Flesh', 'Pop', 'clips/005.wav')
('20667', 'surprisingly upbeat about it', 'Vim', 'Electronic', 'clips/006.wav')
('6603', 'The Woods', 'Height With Friends', 'Hip-Hop', 'clips/007.wav')
('69833', 'Dreamflower (Keep The Dream Alive) (Instrumental)', 'The Honorable Sleaze', 'Hip-Hop', 'clips/008.wav')
('98205', 'Mankind', 'Ezylohm_tek', 'Electronic', 'clips/009.wav')


**Short-Time Fourier Transform (STFT)**

Compute STFT magnitude spectrogram |X(n,k)|, with typical parameters: FFT size 2048, hop length 512, Hann window. Derive summary features like spectral centroid, roll-off, flatness, and zero-crossing rate from time frames, then aggregate with mean and standard deviation across time.
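As an illustration of one of these descriptors, the spectral centroid of a frame is the magnitude-weighted mean frequency. The sketch below uses a made-up single-frame spectrum and made-up per-frame values; it is meant to show the arithmetic, not to reproduce librosa's implementation.

```python
import numpy as np

sr, n_fft = 22050, 2048
freqs = np.arange(n_fft // 2 + 1) * sr / n_fft   # centre frequency of each bin, 0 .. sr/2

# Toy magnitude spectrum for one frame: two equal peaks at bins 100 and 300
mag = np.zeros(n_fft // 2 + 1)
mag[100] = 1.0
mag[300] = 1.0

# Centroid = sum(f * |X|) / sum(|X|); it lands midway between the two peaks
centroid_hz = float(np.sum(freqs * mag) / np.sum(mag))

# Toy per-frame centroid values aggregated into clip-level statistics
per_frame = np.array([1000.0, 2000.0, 3000.0])
summary = {
    "centroid_mean": float(per_frame.mean()),
    "centroid_std": float(per_frame.std()),
}
```

Aggregating with mean and standard deviation discards the ordering of frames, which is what makes the resulting vector a fixed-length description of the whole clip.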

Installs and imports the libraries for the feature extraction and recommendation phase. The !pip command pins librosa and soundfile and installs numpy, pandas, scikit-learn, and tqdm at whatever version pip resolves. Note that librosa is pinned to 0.10.1 here but to 0.10.2.post1 in the dataset-building cell; using a single version throughout avoids a mid-session reinstall. Core modules (os, math, json, random, warnings) handle file operations and configuration, while dataclasses helps structure feature records. NumPy and pandas support numerical and tabular processing, tqdm provides progress bars, and librosa handles audio analysis. StandardScaler and cosine_similarity from scikit-learn are used later to normalize features and compute track similarity. The last line suppresses warning messages for cleaner notebook output.

In [None]:
!pip install librosa==0.10.1 soundfile==0.12.1 numpy pandas scikit-learn tqdm --quiet

import os
import math
import json
import random
import warnings
from dataclasses import dataclass

import numpy as np
import pandas as pd
from tqdm import tqdm

import librosa
import librosa.display

from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings("ignore")

Sets up reproducibility and file paths for the feature extraction process. The seed values ensure that any random sampling or ordering produces identical results across runs. It then defines key directory paths: DATA_DIR for the project’s main data folder, AUDIO_DIR for the audio clips, and TRACKS_CSV and FEATURES_CSV for storing track metadata and extracted audio features respectively.

In [None]:
# Reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Paths
DATA_DIR = "/content/data"
AUDIO_DIR = os.path.join(DATA_DIR, "audio")
TRACKS_CSV = os.path.join(DATA_DIR, "tracks.csv")
FEATURES_CSV = os.path.join(DATA_DIR, "features.csv")

Defines the core audio analysis and evaluation settings for the recommender system. The sample rate (SR) and clip duration determine how audio is processed, while N_FFT, HOP_LENGTH, and WIN configure the Short-Time Fourier Transform used to extract spectral features. Parameters like N_MELS and N_MFCC set the number of mel bands and MFCC coefficients to compute. Flags such as INCLUDE_DELTAS, COMPUTE_TEMPO, and COMPUTE_CHROMA control optional feature extraction steps. TOPK specifies how many similar tracks to return, and the final two flags toggle whether to evaluate similarity by genre or artist.

In [None]:
# Audio/Feature parameters

SR = 22050
CLIP_SECONDS = 30

N_FFT = 2048
HOP_LENGTH = 512
WIN = "hann"

N_MELS = 128
N_MFCC = 13

INCLUDE_DELTAS = False # Set True to append Δ and ΔΔ MFCCs

# Optional add-ons
COMPUTE_TEMPO = True
COMPUTE_CHROMA = False # Set True to add key-aware scalar features

TOPK = 5 # Number of neighbors to return

# Evaluation
EVAL_GENRE = True
EVAL_ARTIST = True
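To get a feel for what these STFT settings imply, the resulting spectrogram shape and resolution can be worked out by hand. The sketch below assumes a full 30-second clip and centered (padded) framing, which is librosa's default for `librosa.stft`; clips shortened by silence trimming will have fewer frames.

```python
SR = 22050
CLIP_SECONDS = 30
N_FFT = 2048
HOP_LENGTH = 512

n_samples = SR * CLIP_SECONDS             # samples in a full-length clip
n_bins = N_FFT // 2 + 1                   # frequency bins per frame
n_frames = 1 + n_samples // HOP_LENGTH    # frames, assuming centered framing
bin_hz = SR / N_FFT                       # spacing between adjacent bins in Hz
hop_ms = 1000 * HOP_LENGTH / SR           # time step between frames in ms
window_ms = 1000 * N_FFT / SR             # duration covered by one window in ms

print(n_samples, n_bins, n_frames, round(bin_hz, 2), round(hop_ms, 2), round(window_ms, 2))
# prints: 661500 1025 1292 10.77 23.22 92.88
```

So each clip becomes roughly a 1025 x 1292 matrix, with about 10.8 Hz between bins and about 23 ms between frames.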

Defines lightweight data structures and helpers for getting audio into a consistent form and extracting core spectral features. TrackMeta is a small record that holds the fields the pipeline needs per track (ID, title, artist, genre, path). load_tracks_csv reads the tracks CSV, validates the required columns, and returns a list of TrackMeta objects with every field cast to a string (a missing genre becomes an empty string).

load_audio_centered loads mono audio at the target sample rate and, if the file is longer than the desired clip length, keeps a centered window. It then trims leading and trailing silence (30 dB threshold) and peak-normalizes the signal so that loud files don't dominate feature scales. Because trimming happens after cropping, the returned clip can be shorter than 30 seconds. The function returns the waveform and sample rate, ready for analysis.

spectral_features computes a magnitude STFT, derives a power spectrogram, and summarizes four framewise descriptors (spectral centroid, 85% roll-off, flatness, and zero-crossing rate) by their mean and standard deviation, giving fixed-length features per clip. If tempo is enabled, it builds an onset envelope, estimates a global tempo (returning NaN when the envelope is empty or estimation fails), and adds it to the feature dictionary. It returns both the feature dictionary and the power spectrogram, which is reused for the mel and MFCC calculations.

In [None]:
# Preprocessing, feature extraction, aggregation, and I/O

@dataclass
class TrackMeta:
    track_id: str
    title: str
    artist: str
    genre: str
    path: str


def load_tracks_csv(csv_path: str) -> list:
    df = pd.read_csv(csv_path)
    required = {"track_id", "title", "artist", "genre", "path"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"tracks.csv missing columns: {missing}")
    rows = []
    for _, r in df.iterrows():
        rows.append(TrackMeta(
            track_id=str(r["track_id"]),
            title=str(r["title"]),
            artist=str(r["artist"]),
            genre=str(r["genre"]) if not pd.isna(r["genre"]) else "",
            path=str(r["path"]) ))
    return rows


def load_audio_centered(path: str, sr: int = SR, clip_seconds: int = CLIP_SECONDS):
    """Load mono audio and take a centered window of length clip_seconds if audio is longer."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    target_len = clip_seconds * sr
    if len(y) > target_len:
        start = (len(y) - target_len) // 2
        y = y[start:start+target_len]
    y, _ = librosa.effects.trim(y, top_db=30)   # Simple silence trimming at edges
    peak = np.max(np.abs(y)) if len(y) else 1.0   # Peak normalize to prevent loudness dominance
    if peak > 0:
        y = y / peak
    return y, sr


def spectral_features(y: np.ndarray, sr: int):
    S = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP_LENGTH, window=WIN))
    # Power spectrogram for mel and contrast
    S_power = (S ** 2)

    # Spectral stats over time
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr, roll_percent=0.85)
    flatness = librosa.feature.spectral_flatness(S=S)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=N_FFT, hop_length=HOP_LENGTH)

    feats = {
        "centroid_mean": float(np.mean(centroid)), "centroid_std": float(np.std(centroid)),
        "rolloff_mean": float(np.mean(rolloff)), "rolloff_std": float(np.std(rolloff)),
        "flatness_mean": float(np.mean(flatness)), "flatness_std": float(np.std(flatness)),
        "zcr_mean": float(np.mean(zcr)), "zcr_std": float(np.std(zcr)),
    }

    # Tempo (optional): a single global BPM estimate per clip
    if COMPUTE_TEMPO:
        try:
            onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=HOP_LENGTH)
            if onset_env is None or len(onset_env) == 0:
                tempo_val = np.nan
            else:
                # librosa.beat.tempo is deprecated since librosa 0.10; librosa.feature.tempo replaces it
                tempo_est = librosa.feature.tempo(onset_envelope=onset_env, sr=sr, hop_length=HOP_LENGTH, aggregate=np.median)
                # tempo_est can be a scalar or ndarray([value])
                tempo_arr = np.atleast_1d(tempo_est)
                tempo_val = float(tempo_arr.flat[0]) if tempo_arr.size > 0 else np.nan
        except Exception:
            tempo_val = np.nan  # estimation failed; record as missing
        feats["tempo"] = tempo_val

    return feats, S_power
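The centered-window and peak-normalization steps can be checked on a tiny array. This sketch uses two hypothetical helpers, `center_crop` and `peak_normalize`, that isolate those two steps; silence trimming is left out.

```python
import numpy as np

def center_crop(y, target_len):
    """Keep the middle target_len samples if y is longer; otherwise return y unchanged."""
    if len(y) <= target_len:
        return y
    start = (len(y) - target_len) // 2
    return y[start:start + target_len]

def peak_normalize(y):
    """Scale so the largest absolute sample is 1.0; leave silent or empty input untouched."""
    peak = np.max(np.abs(y)) if len(y) else 0.0
    return y / peak if peak > 0 else y

y = np.arange(10, dtype=np.float64) - 4.0   # toy signal: -4, -3, ..., 5
cropped = center_crop(y, 4)                 # middle four samples: -1, 0, 1, 2
normalized = peak_normalize(cropped)        # divided by the peak value 2
silent = peak_normalize(np.zeros(3))        # all zeros stay all zeros
```

Peak normalization removes overall level differences between recordings, but it is sensitive to a single loud transient; that trade-off is worth keeping in mind when comparing features across tracks.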

Implements an auxiliary feature extractor that focuses on pitch related information. It loads the audio file with librosa, computes its Short-Time Fourier Transform, and squares the magnitude to get a power spectrogram. If chroma computation is enabled, it derives a 12-dimensional chroma vector (one per semitone) averaged across time to capture harmonic and key characteristics of the track. In case of errors or missing values, it fills them with zeros for numerical stability. The function returns both the cleaned feature dictionary and the power spectrogram for any later processing.

In [None]:
def extract_features(filepath, COMPUTE_CHROMA=True):
    feats = {}

    y, sr = librosa.load(filepath)
    S = np.abs(librosa.stft(y))
    S_power = S**2

    # Chroma/key
    if COMPUTE_CHROMA:
        try:
            chroma = librosa.feature.chroma_stft(S=S_power, sr=sr)
            chroma_mean = np.mean(chroma, axis=1)
            for i, v in enumerate(chroma_mean):
                feats[f"chroma_mean_{i}"] = float(v)
        except Exception:
            for i in range(12):
                feats[f"chroma_mean_{i}"] = np.nan

    # Clean up NaN/Inf
    for k, v in list(feats.items()):
        if v is None or (isinstance(v, float) and (np.isnan(v) or np.isinf(v))):
            feats[k] = 0.0

    return feats, S_power


Creates a data directory and downloads the FMA metadata ZIP file into it. The !mkdir -p command ensures the /content/data folder exists, wget fetches the metadata archive from the official source, and unzip -o extracts its contents into the same directory, overwriting old files if needed. This prepares the metadata for use in feature extraction and analysis.

In [None]:
!mkdir -p /content/data
!wget https://os.unil.cloud.switch.ch/fma/fma_metadata.zip -O /content/data/fma_metadata.zip
!unzip -o /content/data/fma_metadata.zip -d /content/data


--2025-10-31 14:52:52--  https://os.unil.cloud.switch.ch/fma/fma_metadata.zip
Resolving os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)... 86.119.28.16, 2001:620:5ca1:201::214
Connecting to os.unil.cloud.switch.ch (os.unil.cloud.switch.ch)|86.119.28.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 358412441 (342M) [application/zip]
Saving to: ‘/content/data/fma_metadata.zip’


2025-10-31 14:53:06 (25.6 MB/s) - ‘/content/data/fma_metadata.zip’ saved [358412441/358412441]

Archive:  /content/data/fma_metadata.zip
 bunzipping: /content/data/fma_metadata/README.txt  
 bunzipping: /content/data/fma_metadata/checksums  
 bunzipping: /content/data/fma_metadata/not_found.pickle  
 bunzipping: /content/data/fma_metadata/raw_genres.csv  
 bunzipping: /content/data/fma_metadata/raw_albums.csv  
 bunzipping: /content/data/fma_metadata/raw_artists.csv  
 bunzipping: /content/data/fma_metadata/raw_tracks.csv  
 bunzipping: /content/data/fma_metadata/tracks.csv  
 bunzip

Lists the tracks.csv file inside the metadata folder to confirm that it was successfully extracted. The -lh flag displays the file’s size and permissions in a readable format, while grep tracks.csv filters the output to show only that specific file entry.

In [None]:
!ls -lh /content/data/fma_metadata/tracks.csv | grep tracks.csv

-r--r--r-- 1 root root 249M Apr  1  2017 /content/data/fma_metadata/tracks.csv


Loads detailed track metadata and links each entry to its corresponding MP3 file in the FMA-small dataset. The function reads the multi-index CSV structure, extracts each track’s title, top genre, and artist, then constructs the correct audio file path using the track ID format (000/000002.mp3). It returns a list of dictionaries holding all relevant info. The final line filters this list to include only records whose audio files actually exist, ensuring later processing runs only on valid data.

In [None]:
AUDIO_ROOT = "/content/fma_small"

def load_tracks_csv(csv_path: str):
    # Load multi-index CSV
    df = pd.read_csv(csv_path, header=[0, 1], index_col=0)

    rows = []
    for track_id, row in df.iterrows():
        title = row[('track', 'title')]
        genre = row[('track', 'genre_top')]
        artist = row[('artist', 'name')]

        # Build path to audio file for FMA small
        track_id_int = int(track_id)
        filename = f"{track_id_int:06d}.mp3"
        folder = filename[:3]  # first three digits of the 6-digit ID, e.g. "000" for track 2
        audio_path = os.path.join(AUDIO_ROOT, folder, filename)

        rows.append({
            "track_id": track_id,
            "title": title,
            "artist": artist,
            "genre": genre,
            "path": audio_path
        })
    return rows

TRACKS_CSV = "/content/data/fma_metadata/tracks.csv"
metas = load_tracks_csv(TRACKS_CSV)
metas_existing = [m for m in metas if os.path.exists(m["path"])]

Creates a dedicated data directory if it doesn’t already exist, ensuring a clean location for outputs. It then defines the full path for features_parallel.csv, which will store all extracted audio features once the feature extraction process is completed.

In [None]:
DATA_DIR = "/content/data"
os.makedirs(DATA_DIR, exist_ok=True)
FEATURES_CSV = os.path.join(DATA_DIR, "features_parallel.csv")

Implements the main MFCC-based feature extraction pipeline for each audio track. The _mget helper retrieves a value from either a dictionary or an object attribute, so metadata can be stored in either form.

mfcc_features() converts a power spectrogram into a mel-scaled representation and then converts it to decibels, a log scale that roughly matches perceived loudness. From this log-mel matrix, it computes 13 Mel-Frequency Cepstral Coefficients (MFCCs), which summarize the spectral envelope of the sound. Each coefficient's mean and standard deviation are calculated across time, forming a compact, fixed-length descriptor for the clip. If enabled, delta and delta-delta features capture how these coefficients change over time. Finally, NaN or infinite values are replaced with zeros for numerical stability.

extract_features_for_track() loads a centered clip of up to 30 seconds, extracts spectral and MFCC features, merges them with the track metadata, and returns the complete feature dictionary. If an audio file is missing or any error occurs, the function prints a warning and returns None, so a single bad file does not stop a batch run.

In [None]:
# MFCC FEATURES

from typing import Dict, Optional, Any

def _mget(meta, key, default=None):
    """Helper to get key from dict or attribute from object."""
    if isinstance(meta, dict):
        return meta.get(key, default)
    return getattr(meta, key, default)


def mfcc_features(S_power: np.ndarray, sr: int) -> Dict[str, float]:
    """
    Compute MFCC stats from a power spectrogram S_power (|STFT|^2).
    """
    # Mel-spectrogram (power)
    mel = librosa.feature.melspectrogram(S=S_power, sr=sr, n_mels=N_MELS)

    # Log-mel in dB
    log_mel = librosa.power_to_db(mel + 1e-10)

    # MFCC directly from log-mel
    mfcc = librosa.feature.mfcc(S=log_mel, sr=sr, n_mfcc=N_MFCC)

    feats: Dict[str, float] = {}

    # Aggregate mean and std across time for each coefficient
    for i in range(N_MFCC):
        coeff = mfcc[i]
        feats[f"mfcc_{i+1}_mean"] = float(np.nanmean(coeff))
        feats[f"mfcc_{i+1}_std"]  = float(np.nanstd(coeff))

    if INCLUDE_DELTAS:
        d1 = librosa.feature.delta(mfcc)
        d2 = librosa.feature.delta(mfcc, order=2)
        for i in range(N_MFCC):
            feats[f"mfcc_delta_{i+1}_mean"]       = float(np.nanmean(d1[i]))
            feats[f"mfcc_delta_{i+1}_std"]        = float(np.nanstd(d1[i]))
            feats[f"mfcc_deltadelta_{i+1}_mean"]  = float(np.nanmean(d2[i]))
            feats[f"mfcc_deltadelta_{i+1}_std"]   = float(np.nanstd(d2[i]))

    # Final cleanup: replace NaN/Inf with 0.0
    for k, v in list(feats.items()):
        if v is None or (isinstance(v, float) and (np.isnan(v) or np.isinf(v))):
            feats[k] = 0.0

    return feats

def extract_features_for_track(meta: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    try:
        # Load a centered clip
        path = _mget(meta, "path")
        if not path:
            raise ValueError("Missing audio path in meta")
        if not os.path.exists(path):
            raise FileNotFoundError(f"Audio not found at {path}")

        y, sr = load_audio_centered(path, sr=SR, clip_seconds=CLIP_SECONDS)

        # Your spectral features function should return (dict, S_power)
        spec_feats, S_power = spectral_features(y, sr)

        # MFCC features from S_power
        mfcc_feats = mfcc_features(S_power, sr)

        # Merge and attach metadata
        feats: Dict[str, Any] = {**spec_feats, **mfcc_feats}
        feats.update({
            "track_id": _mget(meta, "track_id"),
            "title": _mget(meta, "title"),
            "artist": _mget(meta, "artist"),
            "genre": _mget(meta, "genre"),
            "path": path,
        })
        return feats

    except Exception as e:
        print(f"[WARN] Failed {_mget(meta, 'track_id', 'unknown')}: {e}")
        return None
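The aggregation step, which collapses a coefficients-by-frames matrix into per-coefficient statistics, can be sketched on its own. The helper name `summarize` and the toy matrix are assumptions for illustration; a real MFCC matrix would come from librosa.

```python
import numpy as np

def summarize(matrix, prefix):
    """Collapse a (n_coeffs, n_frames) matrix into per-coefficient mean/std features."""
    feats = {}
    for i, row in enumerate(matrix, start=1):
        feats[f"{prefix}_{i}_mean"] = float(np.nanmean(row))
        feats[f"{prefix}_{i}_std"] = float(np.nanstd(row))
    # Replace any non-finite statistic with 0.0, mirroring the cleanup in the pipeline
    return {k: (v if np.isfinite(v) else 0.0) for k, v in feats.items()}

# Toy stand-in for an MFCC matrix: 2 coefficients x 4 frames, with one missing value
toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [5.0, 5.0, np.nan, 5.0]])
feats = summarize(toy, "mfcc")
```

With 13 coefficients this scheme yields 26 values per clip, or 78 when delta and delta-delta statistics are enabled.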

Reloads the full FMA metadata (all 106,574 tracks, not only those in FMA-small) and builds a reference table linking each track to an expected MP3 path. The function reads the multi-index CSV, extracts the title, top genre, and artist, and derives an audio path from the track ID. It returns a list of dictionaries, one per track. The print statements report how many tracks were loaded and show the first entry. Note that this version takes the folder name from a 3-digit zero-padded ID, so track 2 is mapped to folder 002 rather than 000; the following cell rebuilds the paths with the correct 6-digit prefix.

In [None]:
# Feature Table

AUDIO_ROOT = "/content/fma_small"

def load_tracks_csv(csv_path: str):
    # Load multi-index CSV
    df = pd.read_csv(csv_path, header=[0, 1], index_col=0)

    rows = []
    for track_id, row in df.iterrows():
        title = row[('track', 'title')]
        genre = row[('track', 'genre_top')]
        artist = row[('artist', 'name')]

        # Build path to audio file for FMA small
        track_id_int = int(track_id)
        folder = f"{track_id_int:03d}"[:3]
        filename = f"{track_id_int:06d}.mp3"
        audio_path = os.path.join("/content/fma_small", folder, filename)

        rows.append({
            "track_id": track_id,
            "title": title,
            "artist": artist,
            "genre": genre,
            "path": audio_path
        })
    return rows

TRACKS_CSV = "/content/data/fma_metadata/tracks.csv"
metas = load_tracks_csv(TRACKS_CSV)
print(f"Loaded {len(metas)} tracks.")
print(metas[0])

Loaded 106574 tracks.
{'track_id': 2, 'title': 'Food', 'artist': 'AWOL', 'genre': 'Hip-Hop', 'path': '/content/fma_small/002/000002.mp3'}
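The folder 002 in the printed path comes from the width of the zero padding. The short sketch below compares the two formatting choices; `folder_3d` and `folder_6d` are hypothetical names used only for this comparison.

```python
import posixpath

def folder_3d(tid):
    """Folder taken from a 3-digit zero-padded ID (the behaviour seen in the output above)."""
    return f"{tid:03d}"[:3]

def folder_6d(tid):
    """Folder taken from the first three digits of the 6-digit zero-padded ID."""
    return f"{tid:06d}"[:3]

for tid in (2, 10440, 108925):
    print(tid, folder_3d(tid), folder_6d(tid))

# Path for track 2 with the 6-digit prefix
path = posixpath.join("fma_small", folder_6d(2), f"{2:06d}.mp3")
```

The two versions agree only for IDs with six digits, so with the 3-digit padding most smaller IDs point at a folder that does not contain their file.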


Reconstructs valid audio paths for every track and filters out missing files. The helper function build_fma_small_path() formats each track ID into the folder and filename structure used in the FMA-small dataset (e.g., fma_small/000/000123.mp3). It then updates the path field for every metadata entry and keeps only those whose audio files actually exist. Finally, it prints the total number of metadata rows and how many audio files were located, as a quick integrity check. In this run only 5,230 files were found, well short of the roughly 8,000 clips in FMA-small, which suggests the archive was not fully extracted.

In [None]:
AUDIO_ROOT = "/content/fma_small"

def build_fma_small_path(track_id: int, audio_root=AUDIO_ROOT) -> str:
    tid = int(track_id)
    folder = f"{tid:06d}"[:3]
    filename = f"{tid:06d}.mp3"
    return os.path.join(audio_root, folder, filename)


for m in metas:
    m["path"] = build_fma_small_path(m["track_id"])

# Filter to tracks whose audio actually exists
metas_existing = [m for m in metas if os.path.exists(m["path"])]

print(f"Total metadata rows: {len(metas)}")
print(f"Audio files found:   {len(metas_existing)} (should be ~8000 for fma_small)")
print(metas_existing[0] if metas_existing else "No audio files found; check unzip path.")

Total metadata rows: 106574
Audio files found:   5230 (should be ~8000 for fma_small)
{'track_id': 2, 'title': 'Food', 'artist': 'AWOL', 'genre': 'Hip-Hop', 'path': '/content/fma_small/000/000002.mp3'}


Sets up parallel feature extraction with joblib and checkpoints results to disk. Environment variables cap BLAS threads to avoid CPU oversubscription. run_parallel picks n_jobs (one fewer than the CPU count by default), slices the metadata into batches, and runs extract_features_for_track in separate processes (backend="loky"). After each batch, successful rows are appended to out_csv, so progress is saved even if the run stops midway. Because the in-memory rows list is cleared after each write to limit RAM, the return value is empty whenever out_csv is set; treat the CSV as the canonical output in that case.

In [None]:
!pip -q install joblib

import os, pandas as pd, multiprocessing
from joblib import Parallel, delayed
from tqdm import tqdm

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

def process_one(meta):
    return extract_features_for_track(meta)  # already defined earlier

def run_parallel(metas, n_jobs=None, batch_size=100, out_csv=None):
    """
    Parallel feature extraction with joblib and periodic CSV saving.
    """
    if n_jobs is None:
        n_jobs = max(1, multiprocessing.cpu_count() - 1)

    rows = []
    for start in tqdm(range(0, len(metas), batch_size), desc="Extracting (parallel)"):
        chunk = metas[start:start+batch_size]
        results = Parallel(n_jobs=n_jobs, backend="loky", prefer="processes")(
            delayed(process_one)(m) for m in chunk
        )
        good = [r for r in results if r is not None]
        rows.extend(good)

        # Save partial results
        if out_csv and len(good) > 0:
            mode = "a" if os.path.exists(out_csv) else "w"
            header = not os.path.exists(out_csv)
            pd.DataFrame(good).to_csv(out_csv, mode=mode, header=header, index=False)
            rows = []

    return rows
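
The append-and-write-header-once pattern can be tried in isolation. The sketch below is a minimal stand-in, assuming a hypothetical helper `append_batch`, a hypothetical column name `zcr_mean`, and toy rows instead of real features. It also shows that a second run against the same file adds the rows again.

```python
import os
import tempfile
import pandas as pd

def append_batch(rows, out_csv):
    """Append a batch of dict rows to out_csv, writing the header only once."""
    if not rows:
        return
    first_write = not os.path.exists(out_csv)
    pd.DataFrame(rows).to_csv(
        out_csv,
        mode="w" if first_write else "a",
        header=first_write,
        index=False,
    )

# Toy batches standing in for extracted feature rows (hypothetical values).
batches = [
    [{"track_id": 1, "zcr_mean": 0.10}, {"track_id": 2, "zcr_mean": 0.20}],
    [{"track_id": 3, "zcr_mean": 0.30}],
]

with tempfile.TemporaryDirectory() as tmp:
    out_csv = os.path.join(tmp, "features.csv")
    for batch in batches:
        append_batch(batch, out_csv)
    first_pass = pd.read_csv(out_csv)

    # A second run against the same file appends the same rows again.
    for batch in batches:
        append_batch(batch, out_csv)
    second_pass = pd.read_csv(out_csv)

print(len(first_pass), len(second_pass))
```

Deleting the CSV before a fresh run, or skipping track IDs that are already present, would avoid the repeated rows.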

Runs the full feature extraction pipeline on all existing tracks in parallel. run_parallel processes the audio files in batches of 100, extracts their spectral and MFCC features, and appends each batch to the CSV defined by FEATURES_CSV. Because the in-memory rows are cleared after every write, the returned list is empty when out_csv is set, so the feature table feat_df is read back from the CSV rather than built from the return value.

In [None]:
run_parallel(metas_existing, n_jobs=None, batch_size=100, out_csv=FEATURES_CSV)
feat_df = pd.read_csv(FEATURES_CSV)  # results live in the CSV; the returned list is empty

Extracting (parallel):  83%|████████▎ | 44/53 [31:17<06:16, 41.89s/it]

[WARN] Failed 99134: 


Extracting (parallel):  92%|█████████▏| 49/53 [34:44<02:45, 41.43s/it]

[WARN] Failed 108925: 


Extracting (parallel): 100%|██████████| 53/53 [36:58<00:00, 41.85s/it]

[WARN] Failed 113063: 





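Loads the previously saved feature dataset from features_parallel.csv into a pandas DataFrame. print(feat_df.shape) then reports the number of rows and columns: one row per saved track and 40 columns (35 feature columns plus the five metadata fields). The 7327 rows reported here exceed the 5230 audio files found earlier, which is consistent with the CSV having been appended to by more than one run. If that is the case, the table contains repeated track_id values and should be deduplicated before similarities are computed.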

In [None]:
FEATURES_CSV = "/content/data/features_parallel.csv"
feat_df = pd.read_csv(FEATURES_CSV)
print(feat_df.shape)


(7327, 40)
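
If the CSV has been appended to by more than one run, it can contain repeated track_id values. That is an assumption here, since the notebook does not check for it. The sketch below shows one way to detect and drop such repeats with pandas, using a toy table and a hypothetical column name `mfcc_0_mean`.

```python
import pandas as pd

# Toy feature table with a repeated track_id, as could result from appending twice.
feat_toy = pd.DataFrame({
    "track_id": [2, 5, 2, 10],
    "mfcc_0_mean": [1.0, 2.0, 1.0, 3.0],
})

n_before = len(feat_toy)
deduped = (
    feat_toy.drop_duplicates(subset="track_id", keep="last")
            .reset_index(drop=True)   # keep row positions contiguous for sim_mat
)
print(f"{n_before} rows -> {len(deduped)} unique tracks")
```

Resetting the index matters because later cells address tracks by row position.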


Performs sequential (non-parallel) feature extraction and builds a complete feature table. The helper _mget retrieves a metadata field whether the entry is stored as a dictionary or as an object. The main function extract_features_for_track() loads each track’s audio, extracts its spectral and MFCC features, merges them with metadata such as title and genre, and returns a combined dictionary; on any error it prints a warning and returns None.
The loop iterates over all valid tracks, showing progress with tqdm and collecting the feature dictionaries in a list. The list is then converted into a pandas DataFrame, feat_df, and the final print reports how many tracks were processed successfully and how many columns the table has.

In [None]:
def _mget(meta, key, default=None):

    if isinstance(meta, dict):
        return meta.get(key, default)
    return getattr(meta, key, default)

def extract_features_for_track(meta):
    try:
        path = _mget(meta, "path")
        if not path:
            raise ValueError("Missing audio path in meta")
        if not os.path.exists(path):
            raise FileNotFoundError(f"Audio not found at {path}")

        # Load audio
        y, sr = load_audio_centered(path, sr=SR, clip_seconds=CLIP_SECONDS)

        # Feature blocks
        spec_feats, S_power = spectral_features(y, sr)
        mfcc_feats = mfcc_features(S_power, sr)

        # Merge + attach metadata
        feats = {**spec_feats, **mfcc_feats}
        feats.update({
            "track_id": _mget(meta, "track_id"),
            "title": _mget(meta, "title"),
            "artist": _mget(meta, "artist"),
            "genre": _mget(meta, "genre"),
            "path": path,
        })
        return feats

    except Exception as e:
        tid = _mget(meta, "track_id", "unknown")
        print(f"[WARN] Failed {tid}: {e}")
        return None

rows = []
for m in tqdm(metas_existing, desc="Extracting features"):
    feats = extract_features_for_track(m)
    if feats is not None:
        rows.append(feats)

feat_df = pd.DataFrame(rows)
print(f"Tracks with features: {len(feat_df)}, Columns: {feat_df.shape[1] if len(feat_df)>0 else 0}")

Extracting features:  85%|████████▌ | 4471/5230 [31:26<03:45,  3.36it/s]

[WARN] Failed 99134: 


Extracting features:  94%|█████████▍| 4904/5230 [34:27<01:37,  3.35it/s]

[WARN] Failed 108925: 


Extracting features: 100%|██████████| 5230/5230 [36:40<00:00,  2.38it/s]

[WARN] Failed 113063: 
Tracks with features: 5227, Columns: 40





Prepares the extracted features for recommendation by standardizing them and computing pairwise similarity. Metadata columns such as title and genre are excluded, leaving only the numeric feature columns. Infinite and missing values are replaced with column means so that every track has a complete feature vector. StandardScaler then rescales each feature to zero mean and unit variance, so no single feature dominates the similarity purely because of its units. Finally, cosine_similarity produces a similarity matrix (sim_mat) in which higher values indicate greater resemblance. The diagonal is set to zero so that a track does not rank itself first; note that cosine similarity on standardized features ranges from -1 to 1, so zero is not the lowest possible score.

In [None]:
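# Standardize + Similarity Index

from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Select numeric columns for modeling
meta_cols = ["track_id", "title", "artist", "genre", "path"]
num_cols = [c for c in feat_df.columns if c not in meta_cols]

# Replace infinities with NaN first, then fill NaNs with means computed on the cleaned data
clean = feat_df[num_cols].replace([np.inf, -np.inf], np.nan)
feat_df[num_cols] = clean.fillna(clean.mean())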

scaler = StandardScaler()
X = scaler.fit_transform(feat_df[num_cols].values)

# Precompute cosine similarity matrix
sim_mat = cosine_similarity(X)
np.fill_diagonal(sim_mat, 0.0) # ignore self-similarity in retrieval
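
The standardize-then-cosine step can be followed on a small matrix. The sketch below is a NumPy-only stand-in for StandardScaler plus cosine_similarity, assuming hypothetical helpers `standardize` and `cosine_sim_matrix` and made-up feature values. It is meant to show the arithmetic, not to replace the scikit-learn calls.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score using the population std; constant columns become zero."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma

def cosine_sim_matrix(Z):
    """Pairwise cosine similarity between the rows of Z."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)
    U = Z / norms
    return U @ U.T

# Toy feature matrix: 4 tracks x 3 features on very different scales.
X_toy = np.array([
    [0.10, 2000.0, 5.0],
    [0.12, 2100.0, 5.5],
    [0.90,  500.0, 1.0],
    [0.85,  450.0, 1.2],
])

Z = standardize(X_toy)
sim_toy = cosine_sim_matrix(Z)
np.fill_diagonal(sim_toy, 0.0)        # same self-similarity handling as the cell above
print(np.round(sim_toy, 3))
```

Tracks 0 and 1 come out highly similar, while tracks 0 and 2 receive a negative score, which illustrates why zero on the diagonal is not a guaranteed minimum.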

Implements the retrieval stage of the recommender, where similar tracks are ranked for a given seed song. It first creates two lookup tables: idx2meta maps list positions in metas_existing to track metadata, and title_artist2idx maps a lowercased (title, artist) pair to its position.
The function topk_for_index() takes one row of the similarity matrix, sorts it in descending order, and returns the top-K row indices together with their similarity values. The spot-check loop then uses the first two rows of feat_df as seeds and displays each seed's neighbors as a ranked table with title, artist, genre, and similarity score. The column labeled "Track ID" in these tables holds the neighbor's row position in feat_df, not its FMA track_id.

In [None]:
# Top-K Neighbors for given index

TOPK = 10

# Build lookup tables
idx2meta = {i: m for i, m in enumerate(metas_existing)}
title_artist2idx = {(m['title'].strip().lower(), m['artist'].strip().lower()): i
                    for i, m in enumerate(metas_existing)}

def topk_for_index(idx: int, k: int = TOPK):
    sims = sim_mat[idx]
    nn_idx = np.argsort(-sims)[:k]
    return [(int(j), float(sims[j])) for j in nn_idx]

# spot-check
for idx in [0, min(1, len(feat_df)-1)]:
    seed = feat_df.iloc[idx]
    nns = topk_for_index(idx, k=TOPK)

    print(f"\nSeed: {seed['title']} — {seed['artist']}")

    # Convert neighbors to table rows
    table = []
    for rank, (j, s) in enumerate(nns, start=1):
        row = feat_df.iloc[j]
        table.append({
            "Rank": rank,
            "Track ID": j,
            "Title": row['title'],
            "Artist": row['artist'],
            "Genre": row.get('genre', 'Unknown'),
            "Similarity": round(s, 3)
        })

    df_neighbors = pd.DataFrame(table)
    display(df_neighbors)


Seed: Food — AWOL


Unnamed: 0,Rank,Track ID,Title,Artist,Genre,Similarity
0,1,664,Must Be a Hit,Laws,Hip-Hop,0.908
1,2,5001,Cold Spot,Waylon Thornton,Rock,0.895
2,3,3426,Oblivion,The Pleasure Kills,Rock,0.886
3,4,3164,For You,Cvees,Hip-Hop,0.875
4,5,2931,Best Friend,Seek Six Sick,Rock,0.858
5,6,3439,Sadness,Gurdonark,Experimental,0.852
6,7,1227,TIME/Leap,St. LaRok,Hip-Hop,0.843
7,8,3437,Ritual,Ambient Fabric,Experimental,0.839
8,9,666,110% (radio edit),Laws,Hip-Hop,0.837
9,10,2100,Food,AWOL,Hip-Hop,0.836



Seed: This World — AWOL


Unnamed: 0,Rank,Track ID,Title,Artist,Genre,Similarity
0,1,3558,Beloved Girlfriend,Kraus,Rock,0.861
1,2,5003,Come On Down,Waylon Thornton,Rock,0.851
2,3,4436,1967,et_,Rock,0.845
3,4,4430,'92,BADLUCK,Hip-Hop,0.833
4,5,667,Deranged Barber,Majik Most,Hip-Hop,0.823
5,6,665,Speak Your Mind feat. Hard Target,Laws,Hip-Hop,0.816
6,7,4841,Ticking eye,I Was A Teenage Strangler Soundtrack,Instrumental,0.815
7,8,4526,Falcon Hood,Podington Bear,Electronic,0.812
8,9,5020,The Great Cape,Kid Flicks,Pop,0.807
9,10,1883,Believe In Me (Reprise),13adluck & uh-oh,Hip-Hop,0.802
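
Because cosine scores can be negative, a diagonal of 0.0 does not always keep the seed out of its own neighbor list. The sketch below uses a hypothetical helper `topk_neighbors` and a toy matrix to show the edge case. Masking the seed with negative infinity is one possible alternative; it is not what the notebook does.

```python
import numpy as np

def topk_neighbors(sim, idx, k):
    """Return (row, score) pairs for the k most similar rows to idx, never idx itself."""
    scores = sim[idx].astype(float).copy()
    scores[idx] = -np.inf                      # the seed can no longer be selected
    k = min(k, len(scores) - 1)                # at most n-1 real neighbors exist
    order = np.argsort(-scores)[:k]
    return [(int(j), float(scores[j])) for j in order]

# Toy similarity matrix in which track 0 is dissimilar to every other track.
sim_toy = np.array([
    [ 1.0, -0.2, -0.5, -0.1],
    [-0.2,  1.0,  0.8,  0.3],
    [-0.5,  0.8,  1.0,  0.4],
    [-0.1,  0.3,  0.4,  1.0],
])

# With the diagonal set to 0.0, the seed outranks its negative-scoring neighbors.
zeroed = sim_toy.copy()
np.fill_diagonal(zeroed, 0.0)
zero_diag_top2 = np.argsort(-zeroed[0])[:2]
print(zero_diag_top2)                          # row 0 (the seed) appears first

masked_top2 = topk_neighbors(sim_toy, 0, k=2)
print(masked_top2)                             # only rows other than 0
```

In the recorded runs the top scores are all well above zero, so the edge case would only matter for seeds with no positively correlated neighbors.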


Defines quick evaluation utilities for the recommender system.
feature_stability_v2 measures how consistent a track’s timbre is by taking two random 5-second segments from the same clip, computing MFCC means for each, and returning their Pearson correlation (higher means more stable features; it returns NaN for clips that are too short or whose MFCC-mean vector has zero variance). The trials argument is accepted but not used; averaging over several segment pairs would make the estimate less noisy.
evaluate_artist_at_k reports the share of seed tracks whose Top-K neighbors include at least one song by the same artist (a sanity check that neighbors aren’t random).
evaluate_genre_purity_at_k averages, across seeds, the fraction of Top-K neighbors that share the seed’s genre (higher purity suggests genre-coherent neighborhoods).
evaluate_mrr_same_artist computes the Mean Reciprocal Rank of the first same-artist neighbor within the top max(50, k) results (higher means same-artist tracks appear nearer the top). Seeds with no same-artist neighbor in that window are skipped rather than counted as zero, and the mean is returned only after the loop over all seeds has finished, since a return inside the loop would report the first seed alone.

In [None]:
# Evaluation
# Implement: Feature Stability, Artist@K, Genre Purity@K, and MRR for same-artist positives

def _mget(meta, key, default=None):
    return meta.get(key, default) if isinstance(meta, dict) else getattr(meta, key, default)

def _mfcc_means_from_segment(y, sr, seg_seconds=5, n_mfcc=20):
    seg_len = int(seg_seconds * sr)
    if len(y) < seg_len + 1:
        return None
    start = np.random.randint(0, max(1, len(y) - seg_len))
    seg = y[start:start + seg_len]

    S_power = np.abs(librosa.stft(seg))**2
    mel = librosa.feature.melspectrogram(S=S_power, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel + 1e-10)
    mfcc = librosa.feature.mfcc(S=log_mel, sr=sr, n_mfcc=n_mfcc)
    return np.nanmean(mfcc, axis=1)

def feature_stability_v2(meta, trials: int = 2):
    """Correlation between two MFCC-mean vectors from different segments of the same track.

    Note: `trials` is currently unused; a single pair of segments is compared.
    """
    path = _mget(meta, "path")
    if not path:
        raise ValueError("Meta missing 'path'")
    y, sr = load_audio_centered(path, sr=SR, clip_seconds=CLIP_SECONDS)
    if y is None or len(y) < 4 * sr:
        return np.nan

    x1 = _mfcc_means_from_segment(y, sr)
    x2 = _mfcc_means_from_segment(y, sr)
    if x1 is None or x2 is None:
        return np.nan

    if np.allclose(np.std(x1), 0) or np.allclose(np.std(x2), 0):
        return np.nan
    return float(np.corrcoef(x1, x2)[0, 1])


def evaluate_artist_at_k(seed_indices: list, k: int = TOPK):
    if not EVAL_ARTIST:
        return None
    hits = 0
    denom = 0
    for i in seed_indices:
        seed_artist = str(feat_df.iloc[i]["artist"]).strip().lower()
        if len(seed_artist) == 0:
            continue
        denom += 1
        nns = topk_for_index(i, k)
        found = False
        for j, _ in nns:
            if str(feat_df.iloc[j]["artist"]).strip().lower() == seed_artist:
                found = True
                break
        hits += int(found)
    return hits / denom if denom > 0 else np.nan

def evaluate_genre_purity_at_k(seed_indices: list, k: int = TOPK):
    if not EVAL_GENRE:
        return None
    purities = []
    for i in seed_indices:
        seed_genre = str(feat_df.iloc[i]["genre"]).strip().lower()
        if len(seed_genre) == 0:
            continue
        nns = topk_for_index(i, k)
        same = 0
        for j, _ in nns:
            if str(feat_df.iloc[j]["genre"]).strip().lower() == seed_genre:
                same += 1
        purities.append(same / k)
    return float(np.mean(purities)) if purities else np.nan


def evaluate_mrr_same_artist(seed_indices: list, k: int = TOPK):
    ranks = []
    for i in seed_indices:
        seed_artist = str(feat_df.iloc[i]["artist"]).strip().lower()
        if len(seed_artist) == 0:
            continue
        sims = sim_mat[i]
        order = np.argsort(-sims)
        # find first index with same artist
        rr = 0.0
        for rank, j in enumerate(order[:max(50, k)], start=1):
            if str(feat_df.iloc[j]["artist"]).strip().lower() == seed_artist:
                rr = 1.0 / rank
                break
        if rr > 0:
            ranks.append(rr)
    # return after the loop so the mean covers every seed, not just the first
    return float(np.mean(ranks)) if ranks else np.nan
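
The three retrieval metrics can be checked by hand on a four-track toy example. The sketch below assumes hypothetical helper names and made-up artists, genres, and similarities. Unlike the notebook's MRR, which skips seeds without a match, this version counts a missing match as a reciprocal rank of zero, which is the more common convention.

```python
import numpy as np

def neighbor_order(sim, i):
    """Row indices sorted by decreasing similarity to row i, excluding i."""
    order = np.argsort(-sim[i])
    return [int(j) for j in order if j != i]

def artist_hit_at_k(sim, artists, i, k):
    """1 if any of the top-k neighbors of i shares its artist, else 0."""
    return int(any(artists[j] == artists[i] for j in neighbor_order(sim, i)[:k]))

def genre_purity_at_k(sim, genres, i, k):
    """Fraction of the top-k neighbors of i that share its genre."""
    top = neighbor_order(sim, i)[:k]
    return sum(genres[j] == genres[i] for j in top) / k

def reciprocal_rank(sim, artists, i):
    """1/rank of the first same-artist neighbor of i, or 0.0 if there is none."""
    for rank, j in enumerate(neighbor_order(sim, i), start=1):
        if artists[j] == artists[i]:
            return 1.0 / rank
    return 0.0

artists = ["a", "a", "b", "b"]
genres = ["rock", "rock", "rock", "pop"]
sim_toy = np.array([
    [0.0, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.4, 0.2],
    [0.5, 0.4, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
])
seeds = range(len(artists))
k = 2

artist_at_k_toy = float(np.mean([artist_hit_at_k(sim_toy, artists, i, k) for i in seeds]))
purity_toy = float(np.mean([genre_purity_at_k(sim_toy, genres, i, k) for i in seeds]))
mrr_toy = float(np.mean([reciprocal_rank(sim_toy, artists, i) for i in seeds]))
print(artist_at_k_toy, purity_toy, round(mrr_toy, 4))
```

Track 2 is the only seed without a same-artist track in its top 2 (its match sits at rank 3), so Artist@2 is 3/4 and the MRR is (1 + 1 + 1/3 + 1) / 4.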

Prints diagnostic information to verify dataset integrity and feature stability. The first line confirms that each metadata entry is stored as a dictionary and shows its available keys (like track_id, title, artist, genre, path). The next line displays a sample audio file path to confirm proper file structure. Finally, it runs feature_stability_v2 on one track to compute the correlation between MFCC means from two random segments, giving a quick sense of how stable that track’s spectral features are.

In [None]:
print(type(metas_existing[0]), list(metas_existing[0].keys()))
print("Sample path:", metas_existing[0]["path"])
print("One stability value (v2):", feature_stability_v2(metas_existing[0], trials=2))

<class 'dict'> ['track_id', 'title', 'artist', 'genre', 'path']
Sample path: /content/fma_small/000/000002.mp3
One stability value (v2): 0.9888729527098878
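
A single pair of random segments gives a noisy stability estimate. The sketch below shows one way to average over several pairs. It is an illustration under stated assumptions: `stability_over_trials` and `band_log_energy` are hypothetical names, the band-energy vector is only a NumPy stand-in for MFCC means, and the input is a synthetic tone rather than music.

```python
import numpy as np

def band_log_energy(segment, n_bands=8):
    """Stand-in feature vector: log energy in n_bands equal-width FFT bands."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-10)

def stability_over_trials(y, seg_len, trials=5, seed=0, feature_fn=band_log_energy):
    """Mean Pearson correlation between feature vectors of random segment pairs."""
    if len(y) <= seg_len:
        return float("nan")
    rng = np.random.default_rng(seed)
    corrs = []
    for _ in range(trials):
        s1, s2 = rng.integers(0, len(y) - seg_len, size=2)
        x1 = feature_fn(y[s1:s1 + seg_len])
        x2 = feature_fn(y[s2:s2 + seg_len])
        if np.std(x1) == 0 or np.std(x2) == 0:
            continue                      # correlation is undefined for constant vectors
        corrs.append(np.corrcoef(x1, x2)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")

# Synthetic 4 s test tone with a little noise (not real music).
sr = 8000
t = np.arange(4 * sr) / sr
noise = np.random.default_rng(1).normal(scale=0.01, size=t.size)
y_toy = np.sin(2 * np.pi * 440 * t) + noise

stability = stability_over_trials(y_toy, seg_len=sr // 2, trials=5)
print(round(stability, 3))
```

Passing a seeded generator instead of relying on global NumPy state also makes the score reproducible from run to run.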


Updates the reference for the feature stability function. It deletes any old version of feature_stability from memory (ignoring errors if it doesn’t exist) and then reassigns the name to feature_stability_v2. This ensures that any later calls to feature_stability() use the updated implementation without having to rename it elsewhere in the notebook.

In [None]:
try:
    del feature_stability  # remove any previous version from memory
except NameError:
    pass
feature_stability = feature_stability_v2  # point the name to the new function


Randomly selects a subset of 500 tracks from the available metadata to serve as the evaluation sample. The fixed random seed keeps the sample consistent across runs. Each sampled track is then mapped by track_id to its row position in feat_df, because the rows of sim_mat follow feat_df rather than metas_existing; tracks that failed extraction have no row and are skipped. Finally, it prints how many tracks were chosen, confirming that the evaluation subset has been prepared.

In [None]:
random.seed(42)

EVAL_SIZE = 500
metas_eval = random.sample(metas_existing, EVAL_SIZE)  # subset of dicts

# map the subset to feat_df row positions by track_id (sim_mat rows follow feat_df)
tid_to_row = {int(t): i for i, t in enumerate(feat_df["track_id"])}
seed_indices = [tid_to_row[int(m["track_id"])] for m in metas_eval
                if int(m["track_id"]) in tid_to_row]
print(f"Using {len(metas_eval)} tracks for evaluation")

Using 500 tracks for evaluation
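
Sampled metadata entries and feature-table rows only line up if they are joined on a shared key. The sketch below uses toy IDs and hypothetical names (`metas_toy`, `feature_track_ids`) to show a seeded sample being mapped to row positions by track_id, with one track assumed to have failed extraction.

```python
import random

# Toy metadata; track 10 is assumed to have failed extraction and has no feature row.
metas_toy = [{"track_id": tid} for tid in (2, 5, 10, 140, 141)]
feature_track_ids = [2, 5, 140, 141]          # row order of the feature table

rng = random.Random(42)                       # local RNG, leaves global state untouched
sample = rng.sample(metas_toy, 3)

# Map by track_id so each seed points at its own row in the feature table.
row_of = {tid: row for row, tid in enumerate(feature_track_ids)}
seed_rows = [row_of[m["track_id"]] for m in sample if m["track_id"] in row_of]

print(seed_rows)
```

Mapping by list position instead would shift every row after the failed track by one.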


Evaluates the recommender system’s overall performance using stability and similarity-based metrics. For each track in the 500-sample evaluation subset, it computes feature stability by correlating MFCC mean vectors from two random segments, collecting the results in stabs. The mean stability score reflects how consistently the system represents a track’s timbre across time.
Next, three retrieval metrics are computed: Artist@K (how often a same-artist song appears among the top-K recommendations), Genre Purity@K (average proportion of neighbors sharing the same genre), and MRR (Mean Reciprocal Rank for the first same-artist match). The fmt() helper formats these metrics to three decimal places or prints “nan” for invalid results.
Finally, all scores are printed together, giving a concise summary of how stable, artist-consistent, and genre-coherent the recommender’s retrieved results are.

In [None]:
# Stability across tracks

stabs = []
for m in tqdm(metas_eval, desc="Stability (MFCC means corr)"):
    s = feature_stability(m, trials=2)
    try:
        stabs.append(float(s))
    except Exception:
        stabs.append(np.nan)

stab_arr = np.asarray(stabs, dtype=float)
stability_mean = float(np.nanmean(stab_arr)) if np.isfinite(stab_arr).any() else np.nan

artist_at_k  = evaluate_artist_at_k(seed_indices, k=TOPK)
genre_purity = evaluate_genre_purity_at_k(seed_indices, k=TOPK)
mrr          = evaluate_mrr_same_artist(seed_indices, k=TOPK)

def fmt(x):
    return "nan" if not (isinstance(x, (int, float)) and np.isfinite(x)) else f"{x:.3f}"

print("\n=== Evaluation ===")
print(f"Feature Stability (MFCC means corr): {fmt(stability_mean)}")
print(f"Artist@{TOPK}: {fmt(artist_at_k)}")
print(f"Genre Purity@{TOPK}: {fmt(genre_purity)}")
print(f"MRR (same-artist within top ~50): {fmt(mrr)}")

Stability (MFCC means corr): 100%|██████████| 500/500 [02:03<00:00,  4.04it/s]



=== Evaluation ===
Feature Stability (MFCC means corr): 0.966
Artist@10: 0.226
Genre Purity@10: 0.218
MRR (same-artist within top ~50): nan


Separates MFCC features from other spectral features to test their individual contribution to similarity results. The code first identifies all MFCC-related columns by their mfcc_ prefix and groups the remaining numeric ones as general spectral features. It then builds an MFCC-only feature matrix, standardizes it with StandardScaler, and computes pairwise cosine similarity between tracks based solely on their MFCC vectors. Self-similarity values are set to zero, preparing this matrix for comparison or retrieval evaluation focused only on timbral characteristics.

In [None]:
# Identify columns
mfcc_cols = [c for c in feat_df.columns if c.startswith("mfcc_")]
spectral_cols = [c for c in feat_df.columns if c not in meta_cols + mfcc_cols]

# MFCC-only pipeline
X_mfcc = StandardScaler().fit_transform(feat_df[mfcc_cols].values)
sim_mfcc = cosine_similarity(X_mfcc)
np.fill_diagonal(sim_mfcc, 0.0)

Performs an ablation test comparing MFCC-only features with the full feature set (MFCC + spectral). The helper eval_artist_at_k_with_sim() iterates over every row of feat_df, checks whether a same-artist song appears among the top-K most similar tracks in a given similarity matrix, and returns the ratio of hits to evaluated tracks as the Artist@K score.
Two evaluations are run: the MFCC-only matrix (sim_mfcc) is scored over all tracks, while the combined matrix (sim_mat) is scored with evaluate_artist_at_k over the sampled seed_indices only. Because the two scores are computed on different seed sets, the printed numbers are indicative rather than a like-for-like comparison; scoring both matrices with the same function and the same seeds would isolate the effect of the feature set.

In [None]:
# Evaluate Artist@K for a sim matrix

def eval_artist_at_k_with_sim(sim_matrix, k: int = TOPK):
  hits = 0
  denom = 0
  for i in range(len(feat_df)):
      seed_artist = str(feat_df.iloc[i]["artist"]).strip().lower()
      if len(seed_artist) == 0:
          continue
      denom += 1
      sims = sim_matrix[i]
      order = np.argsort(-sims)[:k]
      found = False
      for j in order:
          if str(feat_df.iloc[j]["artist"]).strip().lower() == seed_artist:
              found = True
              break
      hits += int(found)
  return hits / denom if denom > 0 else np.nan

artist_at_k_mfcc = eval_artist_at_k_with_sim(sim_mfcc, k=TOPK)
artist_at_k_combined = evaluate_artist_at_k(seed_indices, k=TOPK)

print("\n=== Ablation (Artist@K) ===")
print(f"MFCC-only: {artist_at_k_mfcc:.3f}")
print(f"MFCC+Spectral: {artist_at_k_combined:.3f}")


=== Ablation (Artist@K) ===
MFCC-only: 0.717
MFCC+Spectral: 0.226
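
A like-for-like ablation scores both similarity matrices with the same function and the same seeds. The sketch below illustrates that setup on toy data. The names `artist_at_k` and `cosine_sim_matrix` are hypothetical, and the feature values are constructed so that the extra column is unrelated to artist; the toy is not meant to reproduce the notebook's numbers.

```python
import numpy as np

def artist_at_k(sim, artists, seeds, k):
    """Share of seeds with at least one same-artist track among their top-k neighbors."""
    hits = 0
    for i in seeds:
        scores = sim[i].astype(float).copy()
        scores[i] = -np.inf                       # exclude the seed itself
        top = np.argsort(-scores)[:k]
        hits += int(any(artists[j] == artists[i] for j in top))
    return hits / len(seeds)

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the (non-zero) rows of X."""
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    return U @ U.T

artists = ["a", "a", "b", "b", "c", "c"]
# Hypothetical standardized features: the first two columns play the role of MFCCs,
# the last column is an extra descriptor that is unrelated to artist.
X_toy = np.array([
    [ 1.0,  0.1,  2.0],
    [ 0.9,  0.2, -2.0],
    [-1.0,  0.8,  2.0],
    [-0.9,  1.0, -2.0],
    [ 0.1, -1.0,  2.0],
    [ 0.2, -0.9, -2.0],
])

seeds = [0, 2, 4]                                 # the same seeds for both variants
score_subset = artist_at_k(cosine_sim_matrix(X_toy[:, :2]), artists, seeds, k=1)
score_full = artist_at_k(cosine_sim_matrix(X_toy), artists, seeds, k=1)
print(score_subset, score_full)
```

With identical seeds and scoring code, any difference between the two scores can be attributed to the feature set alone.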


Implements a simple retrieval interface for generating recommendations by track ID. It first builds a dictionary mapping each track_id to its row index in the feature DataFrame for quick lookup. The recommend_by_track_id() function takes a given track ID, retrieves its index, and uses the precomputed similarity matrix to find its top-K most similar tracks.
For each recommended track, it collects details such as title, artist, and similarity score into a small DataFrame for easy viewing. The printed message confirms which seed track the recommendations are based on, while the returned DataFrame presents the final ranked list of similar songs, forming the user-facing output of the recommender system.



In [None]:
# Retrieval API for report

INDEX_BY_ID = {int(tid): i for i, tid in enumerate(feat_df["track_id"])}  # keys as plain ints

def recommend_by_track_id(track_id: int, k: int = TOPK):
    idx = INDEX_BY_ID.get(int(track_id), None)
    if idx is None:
        raise KeyError(f"Unknown track_id: {track_id}")
    nns = topk_for_index(idx, k=k)
    rows = []
    seed = feat_df.iloc[idx]
    for j, s in nns:
        row = feat_df.iloc[j]
        rows.append({
            "seed_track_id": seed["track_id"],
            "seed_title": seed["title"],
            "seed_artist": seed["artist"],
            "rec_track_id": row["track_id"],
            "rec_title": row["title"],
            "rec_artist": row["artist"],
            "similarity": round(float(s), 4),
            })
    df = pd.DataFrame(rows)
    print(f"✅ Top {k} recommendations for track ID {track_id} — {seed['title']} by {seed['artist']}")
    return df

Retrieves and displays the top 5 recommended tracks for the first entry in the feature dataset. It takes that track’s ID, passes it to recommend_by_track_id(), and gets back a small DataFrame of similar tracks ranked by cosine similarity. Each row lists the seed track’s details alongside a recommended track’s ID, title, artist, and similarity score, showing how the recommender suggests acoustically similar songs from the dataset.

In [None]:
some_id = int(feat_df["track_id"].iloc[0])
recommend_by_track_id(some_id, k=5)

✅ Top 5 recommendations for track ID 2 — Food by AWOL


Unnamed: 0,seed_track_id,seed_title,seed_artist,rec_track_id,rec_title,rec_artist,similarity
0,2,Food,AWOL,18887,Best Friend,Seek Six Sick,0.9154
1,2,Food,AWOL,66538,Pass The Ring,Waylon Thornton,0.9065
2,2,Food,AWOL,49070,Bologna Soundcheck 031105,Justice Yeldham,0.8982
3,2,Food,AWOL,37423,Crimson Snow,Church Of When The Shit Hits The Fan,0.8822
4,2,Food,AWOL,33446,Mad About You,Yair Yona,0.8722
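
Lookups by track ID are easier to reason about when every key is a built-in int. The sketch below uses toy IDs and a hypothetical helper `row_for_track` to normalize both the dictionary keys and the incoming argument, so an ID passed as a NumPy scalar or a numeric string resolves to the same row.

```python
import numpy as np

# Toy track_id column as it might come out of a DataFrame (NumPy integer scalars).
track_ids = np.array([2, 5, 10, 140], dtype=np.int64)

# Normalize keys to built-in ints so lookups do not depend on the NumPy scalar type.
index_by_id = {int(tid): row for row, tid in enumerate(track_ids)}

def row_for_track(track_id):
    """Return the row position for a track ID given as an int or a numeric string."""
    key = int(track_id)
    if key not in index_by_id:
        raise KeyError(f"Unknown track_id: {track_id}")
    return index_by_id[key]

print(row_for_track(10), row_for_track("140"))
```

An unknown ID still raises KeyError, matching the behavior of recommend_by_track_id().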


Prints a concise summary of all evaluation metrics in a readable report format. It displays the average feature stability score (MFCC correlation across segments), Artist@K, Genre Purity@K, and Mean Reciprocal Rank (MRR), indicating how consistent, artist-aware, and genre-coherent the recommender is. The final line prints the MFCC-only and combined Artist@K scores side by side; as computed in the ablation cell they use different seed sets, so the gap between them should be read with that in mind. This block serves as the system’s final quantitative performance summary.

In [None]:
print(f"Feature stability (MFCC means corr): {stability_mean:.3f}")
print(f"Artist@{TOPK}: {artist_at_k:.3f}")
print(f"Genre Purity@{TOPK}: {genre_purity:.3f}")
print(f"MRR (same-artist within top ~50): {mrr:.3f}")
print(f"Ablation Artist@{TOPK} — MFCC-only: {artist_at_k_mfcc:.3f} | Combined: {artist_at_k_combined:.3f}")

Feature stability (MFCC means corr): 0.969
Artist@10: 0.226
Genre Purity@10: 0.218
MRR (same-artist within top ~50): nan
Ablation Artist@10 — MFCC-only: 0.717 | Combined: 0.226
