
Release v3.6.0: Audio Anomaly Detection Modality #694

Merged: yzhao062 merged 2 commits into master from development on Jun 4, 2026
yzhao062 (Owner) commented Jun 4, 2026

v3.6.0: Audio Anomaly Detection Modality

Adds audio as a first-class modality on the agentic and multimodal line, entirely additively (no change to existing tabular, text, or image paths).

New in v3.6.0

  • AudioFeatureEncoder (pyod/utils/encoders/audio.py): each clip becomes a 74-dim handcrafted acoustic vector (20 MFCC, 12 chroma, 5 spectral descriptors, each as mean and std over frames, via librosa). Registered as the audio-mfcc encoder.
  • EmbeddingOD.for_audio(quality=...): presets fast=IForest, balanced=KNN, best=LUNAR over the audio encoder, so any classical detector runs on audio (embed then detect).
  • AudioAE (pyod/models/audio_ae.py): DCASE-style log-mel reconstruction autoencoder that reuses the PyOD AutoEncoder with per-clip mean reconstruction error. Torch-gated.
  • ADEngine: audio file-path profiling (_sniff_data_type, profile_data) and routing (for_audio as default, AudioAE as the deep alternative).
  • Knowledge base: new AudioAE entry; audio added to EmbeddingOD and MultiModalOD.
  • Packaging: new optional extra pyod[audio] (librosa, soundfile).
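
The 74-dim figure for AudioFeatureEncoder follows from pooling 20 + 12 + 5 = 37 frame-level features by both mean and std. A minimal NumPy sketch of that pooling step, assuming random placeholder frames instead of real librosa output; the helper name `pool_frames` and the ordering of the output vector are illustrative assumptions, not PyOD's actual code:

```python
import numpy as np

def pool_frames(frame_features):
    """Collapse a (n_features, n_frames) matrix into per-feature mean and std."""
    return np.concatenate([frame_features.mean(axis=1),
                           frame_features.std(axis=1)])

rng = np.random.default_rng(0)
n_frames = 120  # arbitrary number of frames for one clip (placeholder)

# Placeholder frame-level features; a real encoder would compute these from audio.
mfcc = rng.normal(size=(20, n_frames))      # 20 MFCC coefficients per frame
chroma = rng.normal(size=(12, n_frames))    # 12 chroma bins per frame
spectral = rng.normal(size=(5, n_frames))   # 5 spectral descriptors per frame

frames = np.vstack([mfcc, chroma, spectral])  # shape (37, n_frames)
clip_vector = pool_frames(frames)             # shape (74,)
print(clip_vector.shape)  # (74,)
```

Because the pooled vector has a fixed length regardless of clip duration, any tabular detector can consume it, which is the "embed then detect" idea described above.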

Counts

Buildable detector count rises from 60 to 61. pyod info: 61 total (43 tabular, 7 time-series, 8 graph, 2 text, 2 image, 1 multimodal, 3 audio).

Tests and Review

18 new audio tests (synthetic waveforms; torch-gated deep tests skip without torch). KB count-consistency checks and regen_skill --check pass. Reviewed via /implement-review (Codex, no High findings; one Medium and two Low fixed). References the public methods (DCASE 2020 Task 2 baseline; MFCC, chroma, and spectral features via librosa).

No breaking API changes.

yzhao062 added 2 commits June 4, 2026 14:50
New AudioFeatureEncoder (74-dim handcrafted acoustic features: 20 MFCC, 12 chroma, 5 spectral descriptors via librosa) registered as the 'audio-mfcc' encoder and exposed through EmbeddingOD.for_audio(); new AudioAE detector (DCASE-style log-mel reconstruction autoencoder reusing the PyOD AutoEncoder with clip-level aggregation). ADEngine profiles and routes audio file paths; the knowledge base gains an AudioAE entry plus audio support on EmbeddingOD and MultiModalOD. New optional extra pyod[audio] (librosa, soundfile). Buildable detector count rises from 60 to 61. References the public methods (DCASE 2020 Task 2 baseline; MFCC, chroma, and spectral features via librosa), not any specific manuscript. Reviewed via /implement-review (Codex, no High findings).

18 new audio tests; KB count check and pyod info confirm 61 detectors (3 audio).
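
The clip-level aggregation mentioned in the commit message can be illustrated as averaging per-window reconstruction errors back to one score per clip. This is a sketch under assumptions: `clip_scores`, `window_errors`, and `clip_ids` are hypothetical names, and the error values are made up for illustration rather than produced by AudioAE.

```python
import numpy as np

def clip_scores(window_errors, clip_ids, n_clips):
    """Average per-window errors into one anomaly score per clip."""
    sums = np.bincount(clip_ids, weights=window_errors, minlength=n_clips)
    counts = np.bincount(clip_ids, minlength=n_clips)
    # Guard against clips that contributed no windows.
    return sums / np.maximum(counts, 1)

# Five windows: the first three belong to clip 0, the last two to clip 1.
window_errors = np.array([0.10, 0.20, 0.30, 0.90, 1.10])
clip_ids = np.array([0, 0, 0, 1, 1])

scores = clip_scores(window_errors, clip_ids, n_clips=2)
print(scores)  # [0.2 1. ]
```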

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ccb8021b1


Comment thread pyod/models/audio_ae.py
Comment on lines +36 to +37
pad = np.zeros((n_mels, context - n_frames), dtype=spec.dtype)
spec = np.concatenate([spec, pad], axis=1)

P2: Pad short spectrograms at the silence floor

For clips shorter than context, this pads after librosa.power_to_db with 0 dB columns. In the resulting log-mel representation, 0 dB is the reference/max level rather than silence, so short audio clips get artificial high-energy frames during both training and scoring. This affects the documented short-clip path; pad before dB conversion or fill with the spectrogram floor/minimum instead.
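
One way to apply the second suggestion (fill with the spectrogram minimum) is sketched below. The `to_db` helper is a simplified stand-in for a power-to-dB conversion referenced to the clip maximum, and `pad_to_context` is a hypothetical name; neither is taken from `audio_ae.py`.

```python
import numpy as np

def to_db(power, amin=1e-10, top_db=80.0):
    """Simplified power-to-dB conversion with the clip maximum as 0 dB."""
    power = np.maximum(power, amin)
    db = 10.0 * np.log10(power / power.max())
    return np.maximum(db, -top_db)  # clip everything below -top_db

def pad_to_context(spec_db, context):
    """Right-pad a (n_mels, n_frames) dB spectrogram with its own minimum."""
    n_mels, n_frames = spec_db.shape
    if n_frames >= context:
        return spec_db
    floor = spec_db.min()  # quietest observed level, not 0 dB
    pad = np.full((n_mels, context - n_frames), floor, dtype=spec_db.dtype)
    return np.concatenate([spec_db, pad], axis=1)

rng = np.random.default_rng(0)
power = rng.random((8, 3)) + 1e-3      # short clip: 8 mel bands, 3 frames
spec_db = to_db(power)
padded = pad_to_context(spec_db, context=5)

print(padded.shape)           # (8, 5)
print(padded[:, 3:].max() < 0)  # True: padded frames sit below the 0 dB peak
```

Padding the power spectrogram with zeros before the dB conversion would be the other option named in the comment; with a floor such as `amin`, those columns would also land at the bottom of the dB range.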


Comment thread pyod/models/audio_ae.py
return np.stack(windows).astype(np.float32)


class AudioAE(BaseDetector):

P2: Add a list-safe predict_proba override

AudioAE documents and accepts list inputs such as waveforms and file paths, but it inherits BaseDetector.predict_proba, which allocates probabilities with X.shape[0]. After AudioAE().fit(clips), calling predict_proba(clips) on those documented list inputs raises AttributeError; this class needs the same kind of list-aware override that EmbeddingOD provides.
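
A rough sketch of what a list-aware override could look like, sizing the output from the number of clips instead of `X.shape[0]`. `ToyClipDetector` is a hypothetical stand-in that scores clips by mean power; it is not AudioAE, and the min-max scaling against training scores is an assumption about the intended probability conversion.

```python
import numpy as np

class ToyClipDetector:
    """Hypothetical stand-in detector: scores each clip by its mean power."""

    def fit(self, clips):
        self.decision_scores_ = self.decision_function(clips)
        return self

    def decision_function(self, clips):
        # Works on a list of unequal-length waveforms; never touches clips.shape.
        return np.array([float(np.mean(np.square(np.asarray(c, dtype=float))))
                         for c in clips])

    def predict_proba(self, clips):
        scores = self.decision_function(clips)
        lo = self.decision_scores_.min()
        hi = self.decision_scores_.max()
        # Min-max scale against the training scores, then clip to [0, 1].
        outlier = np.clip((scores - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
        proba = np.zeros((len(clips), 2))  # len() is valid for lists and arrays
        proba[:, 1] = outlier
        proba[:, 0] = 1.0 - outlier
        return proba

clips = [np.ones(100) * 0.1, np.ones(250) * 0.5, np.ones(40) * 2.0]
proba = ToyClipDetector().fit(clips).predict_proba(clips)
print(proba.shape)  # (3, 2)
```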


Comment thread pyod/utils/ad_engine.py
Comment on lines +158 to +159
if self._looks_like_audio_paths(sample[:5]):
return 'audio'

P2: Detect waveform audio before tabular fallback

This new audio sniffing only runs inside the all-strings branch, so the other documented audio inputs added here—lists of waveform arrays or (waveform, sample_rate) tuples accepted by AudioFeatureEncoder and AudioAE—still fall through to tabular. In ADEngine's default flow, profile_data([waveform1, waveform2, ...]) therefore plans tabular detectors instead of EmbeddingOD.for_audio/AudioAE, and unequal-length clips can fail during the numeric np.asarray profiling step; add a conservative waveform/tuple check before the tabular fallback.
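
A conservative check of the kind suggested here might look like the sketch below. The function names and the `MIN_SAMPLES` threshold are hypothetical, and the rule (only lists of 1-D float arrays, optionally paired with an integer sample rate) is an assumption about what "conservative" should mean, not ADEngine's behavior.

```python
import numpy as np

MIN_SAMPLES = 256  # hypothetical: shorter arrays are treated as tabular rows

def _is_waveform(item):
    """True for a 1-D floating-point array long enough to be audio."""
    return (isinstance(item, np.ndarray)
            and item.ndim == 1
            and item.size >= MIN_SAMPLES
            and np.issubdtype(item.dtype, np.floating))

def _is_waveform_with_rate(item):
    """True for a (waveform, sample_rate) tuple with a positive integer rate."""
    return (isinstance(item, tuple)
            and len(item) == 2
            and _is_waveform(item[0])
            and isinstance(item[1], (int, np.integer))
            and not isinstance(item[1], bool)
            and item[1] > 0)

def looks_like_waveforms(sample):
    """Accept only non-empty lists/tuples whose items all look like audio."""
    if not isinstance(sample, (list, tuple)) or len(sample) == 0:
        return False
    return all(_is_waveform(x) or _is_waveform_with_rate(x) for x in sample)

waves = [np.zeros(16000, dtype=np.float32), np.zeros(8000, dtype=np.float32)]
pairs = [(np.zeros(16000), 16000)]
print(looks_like_waveforms(waves), looks_like_waveforms(pairs))  # True True
print(looks_like_waveforms(np.zeros((10, 4))))                   # False
```

A list of long, equal-length float arrays remains ambiguous between audio and wide tabular rows, so a real implementation would likely need an explicit override for that case.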


coveralls commented Jun 4, 2026

Coverage Report for CI Build 26982041698

Coverage decreased (-1.2%) to 92.647%

Details

  • Coverage decreased (-1.2%) from the base build.
  • Patch coverage: 264 uncovered changes across 6 files (14 of 278 lines covered, 5.04%).
  • 1 coverage regression across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| pyod/test/test_audio.py | 121 | 3 | 2.48% |
| pyod/models/audio_ae.py | 75 | 0 | 0.0% |
| pyod/utils/encoders/audio.py | 61 | 0 | 0.0% |
| pyod/models/embedding.py | 7 | 2 | 28.57% |
| pyod/utils/ad_engine.py | 12 | 9 | 75.0% |
| pyod/utils/_detector_factory.py | 2 | 0 | 0.0% |
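
The patch-coverage headline (14 of 278 lines, 5.04%) can be cross-checked against the per-file rows. The snippet below is only an arithmetic check on the numbers reported above; the variable names are illustrative.

```python
# (changed, covered) per file, copied from the Uncovered Changes table.
rows = {
    "pyod/test/test_audio.py": (121, 3),
    "pyod/models/audio_ae.py": (75, 0),
    "pyod/utils/encoders/audio.py": (61, 0),
    "pyod/models/embedding.py": (7, 2),
    "pyod/utils/ad_engine.py": (12, 9),
    "pyod/utils/_detector_factory.py": (2, 0),
}

changed = sum(c for c, _ in rows.values())
covered = sum(k for _, k in rows.values())
uncovered = changed - covered
patch_pct = round(100 * covered / changed, 2)

print(changed, covered, uncovered, patch_pct)  # 278 14 264 5.04
```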

Coverage Regressions

1 previously-covered line in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| pyod/utils/_detector_factory.py | 1 | 84.78% |

Coverage Stats

Relevant Lines: 20113
Covered Lines: 18634
Line Coverage: 92.65%
Coverage Strength: 10.16 hits per line

💛 - Coveralls

yzhao062 merged commit fb0d774 into master on Jun 4, 2026
15 of 26 checks passed