Release v3.6.0: Audio Anomaly Detection Modality by yzhao062 · Pull Request #694 · yzhao062/pyod

yzhao062 · 2026-06-04T21:57:26Z

v3.6.0: Audio Anomaly Detection Modality

This release adds audio as a first-class modality on the agentic and multimodal line. The change is purely additive: the existing tabular, text, and image paths are unchanged.

New in v3.6.0

AudioFeatureEncoder (pyod/utils/encoders/audio.py): each clip becomes a 74-dim handcrafted acoustic vector (20 MFCC, 12 chroma, 5 spectral descriptors, each as mean and std over frames, via librosa). Registered as the audio-mfcc encoder.
EmbeddingOD.for_audio(quality=...): presets fast=IForest, balanced=KNN, best=LUNAR over the audio encoder, so any classical detector runs on audio (embed then detect).
AudioAE (pyod/models/audio_ae.py): DCASE-style log-mel reconstruction autoencoder that reuses the PyOD AutoEncoder with per-clip mean reconstruction error. Torch-gated.
ADEngine: audio file-path profiling (_sniff_data_type, profile_data) and routing (for_audio as default, AudioAE as the deep alternative).
Knowledge base: new AudioAE entry; audio added to EmbeddingOD and MultiModalOD.
Packaging: new optional extra pyod[audio] (librosa, soundfile).
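
The 74-dim figure follows from pooling 37 per-frame features (20 MFCC + 12 chroma + 5 spectral descriptors) with both mean and std over frames. A minimal NumPy sketch of that pooling step, assuming the per-frame features are already computed; `pool_frames` and the random inputs are hypothetical stand-ins for illustration, not the code in pyod/utils/encoders/audio.py:

```python
import numpy as np


def pool_frames(mfcc, chroma, spectral):
    """Collapse per-frame features into one fixed-length clip vector.

    Each input has shape (n_features, n_frames). The output concatenates
    the per-feature mean and standard deviation over frames.
    """
    frames = np.vstack([mfcc, chroma, spectral])  # (37, n_frames)
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])


# Stand-in feature matrices for a clip with 100 frames (random, shape only).
rng = np.random.default_rng(0)
vec = pool_frames(
    rng.normal(size=(20, 100)),  # 20 MFCC
    rng.normal(size=(12, 100)),  # 12 chroma
    rng.normal(size=(5, 100)),   # 5 spectral descriptors
)
print(vec.shape)
```

Because the pooling runs over the frame axis, clips of different durations map to vectors of the same length, which is what lets a classical detector consume them.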

Counts

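Buildable detector count rises from 60 to 61. pyod info reports 61 total (43 tabular, 7 time-series, 8 graph, 2 text, 2 image, 1 multimodal, 3 audio). The per-modality figures sum to more than 61, presumably because a detector that supports several modalities is counted under each of them.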

Tests and Review

18 new audio tests (synthetic waveforms; torch-gated deep tests skip without torch). KB count-consistency checks and regen_skill --check pass. Reviewed via /implement-review (Codex, no High findings; one Medium and two Low fixed). The implementation references public methods (DCASE 2020 Task 2 baseline; MFCC, chroma, and spectral features via librosa).

No breaking API changes.

New AudioFeatureEncoder (74-dim handcrafted acoustic features: 20 MFCC, 12 chroma, 5 spectral descriptors via librosa) registered as the 'audio-mfcc' encoder and exposed through EmbeddingOD.for_audio(); new AudioAE detector (DCASE-style log-mel reconstruction autoencoder reusing the PyOD AutoEncoder with clip-level aggregation). ADEngine profiles and routes audio file paths; the knowledge base gains an AudioAE entry plus audio support on EmbeddingOD and MultiModalOD. New optional extra pyod[audio] (librosa, soundfile). Buildable detector count rises from 60 to 61. References the public methods (DCASE 2020 Task 2 baseline; MFCC, chroma, and spectral features via librosa), not any specific manuscript. Reviewed via /implement-review (Codex, no High findings). 18 new audio tests; KB count check and pyod info confirm 61 detectors (3 audio).
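
The clip-level aggregation mentioned for AudioAE can be pictured as averaging window-level reconstruction errors per clip. A minimal sketch under that assumption; `clip_scores` and the sample numbers are hypothetical and only illustrate the aggregation, not the actual AudioAE internals:

```python
import numpy as np


def clip_scores(window_errors, clip_ids):
    """Average per-window reconstruction errors into one score per clip.

    window_errors: one error per spectrogram window.
    clip_ids: same length, mapping each window to its clip (0..n_clips-1).
    Assumes every clip contributes at least one window.
    """
    window_errors = np.asarray(window_errors, dtype=float)
    clip_ids = np.asarray(clip_ids)
    sums = np.bincount(clip_ids, weights=window_errors)
    counts = np.bincount(clip_ids)
    return sums / counts


# Clip 0 has three windows, clip 1 has two.
scores = clip_scores([0.1, 0.3, 0.2, 0.9, 1.1], [0, 0, 0, 1, 1])
print(scores)
```

Taking the mean rather than the sum keeps scores comparable between short and long clips.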

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ccb8021b1


chatgpt-codex-connector · 2026-06-04T22:03:38Z

+        pad = np.zeros((n_mels, context - n_frames), dtype=spec.dtype)
+        spec = np.concatenate([spec, pad], axis=1)


Pad short spectrograms at the silence floor

For clips shorter than context, this pads after librosa.power_to_db with 0 dB columns. In the resulting log-mel representation, 0 dB is the reference/max level rather than silence, so short audio clips get artificial high-energy frames during both training and scoring. This affects the documented short-clip path; pad before dB conversion or fill with the spectrogram floor/minimum instead.
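
One way to apply the second option from this suggestion (fill with the spectrogram minimum) is sketched below. `pad_to_context` is a hypothetical helper written for illustration, not the function in pyod/models/audio_ae.py, and it assumes the input is already in dB:

```python
import numpy as np


def pad_to_context(spec_db, context):
    """Right-pad a log-mel spectrogram (n_mels, n_frames) to `context` frames.

    Pads with the spectrogram's own minimum (its quietest bin) instead of
    0 dB, so the padding does not look like high-energy audio.
    """
    n_mels, n_frames = spec_db.shape
    if n_frames >= context:
        return spec_db
    floor = spec_db.min()
    pad = np.full((n_mels, context - n_frames), floor, dtype=spec_db.dtype)
    return np.concatenate([spec_db, pad], axis=1)


spec = np.array([[-80.0, -20.0, 0.0],
                 [-60.0, -40.0, -10.0]], dtype=np.float32)
padded = pad_to_context(spec, context=5)
print(padded.shape)
```

The other option, padding the power spectrogram with zeros before dB conversion, would let the dB conversion place the padding at its own floor.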


chatgpt-codex-connector · 2026-06-04T22:03:38Z

+    return np.stack(windows).astype(np.float32)
+
+
+class AudioAE(BaseDetector):


Add a list-safe predict_proba override

AudioAE documents and accepts list inputs such as waveforms and file paths, but it inherits BaseDetector.predict_proba, which allocates probabilities with X.shape[0]. After AudioAE().fit(clips), calling predict_proba(clips) on those documented list inputs raises AttributeError; this class needs the same kind of list-aware override that EmbeddingOD provides.
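
A list-aware override could size its output from the scores instead of from X.shape[0]. The sketch below assumes simple min-max scaling against the training scores; `ListSafeProbaMixin` and `ToyClipDetector` are hypothetical names for illustration and do not reproduce the EmbeddingOD override:

```python
import numpy as np


class ListSafeProbaMixin:
    """Hypothetical mixin: a predict_proba that never touches X.shape."""

    def predict_proba(self, X):
        train = np.asarray(self.decision_scores_, dtype=float)
        test = np.asarray(self.decision_function(X), dtype=float)
        lo, hi = train.min(), train.max()
        span = hi - lo if hi > lo else 1.0
        outlier = np.clip((test - lo) / span, 0.0, 1.0)
        # Size the output from the scores (one per clip), not from X.
        proba = np.zeros((len(test), 2))
        proba[:, 1] = outlier
        proba[:, 0] = 1.0 - outlier
        return proba


class ToyClipDetector(ListSafeProbaMixin):
    """Toy stand-in for an audio detector: scores a clip by mean energy."""

    def fit(self, clips):
        self.decision_scores_ = self.decision_function(clips)
        return self

    def decision_function(self, clips):
        return np.array([float(np.mean(np.square(c))) for c in clips])


# A plain Python list of unequal-length waveforms, which has no .shape.
clips = [np.zeros(8), 0.5 * np.ones(12), np.ones(5)]
proba = ToyClipDetector().fit(clips).predict_proba(clips)
print(proba.shape)
```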


chatgpt-codex-connector · 2026-06-04T22:03:38Z

+                if self._looks_like_audio_paths(sample[:5]):
+                    return 'audio'


Detect waveform audio before tabular fallback

This new audio sniffing only runs inside the all-strings branch, so the other documented audio inputs added here, namely lists of waveform arrays or (waveform, sample_rate) tuples accepted by AudioFeatureEncoder and AudioAE, still fall through to tabular. In ADEngine's default flow, profile_data([waveform1, waveform2, ...]) therefore plans tabular detectors instead of EmbeddingOD.for_audio/AudioAE, and unequal-length clips can fail during the numeric np.asarray profiling step. Add a conservative waveform/tuple check before the tabular fallback.
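
A conservative check of the kind suggested here might look like the sketch below. `looks_like_waveforms` and its `min_samples` threshold are hypothetical; a list of 1-D float arrays is inherently ambiguous with tabular rows, so the length threshold is only an assumed heuristic for telling the two apart:

```python
import numpy as np


def looks_like_waveforms(sample, min_samples=1024):
    """Guess whether items are raw audio clips rather than tabular rows.

    Accepts 1-D float arrays or (waveform, sample_rate) tuples. Any item
    that fails a check makes the whole sample non-audio.
    """
    if len(sample) == 0:
        return False
    for item in sample:
        if isinstance(item, tuple):
            if len(item) != 2 or not isinstance(item[1], (int, np.integer)):
                return False
            item = item[0]
        if not isinstance(item, np.ndarray):
            return False
        if item.ndim != 1 or not np.issubdtype(item.dtype, np.floating):
            return False
        if item.shape[0] < min_samples:
            return False
    return True


waves = [np.zeros(16000, dtype=np.float32), np.zeros(8000, dtype=np.float32)]
print(looks_like_waveforms(waves))
print(looks_like_waveforms([[1.0, 2.0], [3.0, 4.0]]))
```

Running a check like this before the np.asarray profiling step would also avoid the unequal-length failure described above, since ragged clips never reach the numeric conversion.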


coveralls · 2026-06-04T22:17:32Z

Coverage Report for CI Build 26982041698

Coverage decreased (-1.2%) to 92.647%

Details

Coverage decreased (-1.2%) from the base build.
Patch coverage: 264 uncovered changes across 6 files (14 of 278 lines covered, 5.04%).
1 coverage regression across 1 file.

Uncovered Changes

File	Changed	Covered	%
pyod/test/test_audio.py	121	3	2.48%
pyod/models/audio_ae.py	75	0	0.0%
pyod/utils/encoders/audio.py	61	0	0.0%
pyod/models/embedding.py	7	2	28.57%
pyod/utils/ad_engine.py	12	9	75.0%
pyod/utils/_detector_factory.py	2	0	0.0%

Coverage Regressions

1 previously-covered line in 1 file lost coverage.

File	Lines Losing Coverage	Coverage
pyod/utils/_detector_factory.py	1	84.78%

Coverage Stats


Relevant Lines:	20113
Covered Lines:	18634
Line Coverage:	92.65%
Coverage Strength:	10.16 hits per line
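
The headline percentages can be re-derived from the figures in this report. A small arithmetic check, with the per-file rows copied from the Uncovered Changes table above:

```python
# (file, changed lines, covered lines) from the Uncovered Changes table.
rows = [
    ("pyod/test/test_audio.py", 121, 3),
    ("pyod/models/audio_ae.py", 75, 0),
    ("pyod/utils/encoders/audio.py", 61, 0),
    ("pyod/models/embedding.py", 7, 2),
    ("pyod/utils/ad_engine.py", 12, 9),
    ("pyod/utils/_detector_factory.py", 2, 0),
]

changed = sum(r[1] for r in rows)
covered = sum(r[2] for r in rows)
patch_pct = round(100 * covered / changed, 2)

# Overall line coverage from the Coverage Stats block.
line_pct = round(100 * 18634 / 20113, 3)

print(changed, covered, changed - covered, patch_pct, line_pct)
```

The totals match the report: 278 changed lines, 14 covered, 264 uncovered, 5.04% patch coverage, and 92.647% overall line coverage.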

💛 - Coveralls

yzhao062 added 2 commits June 4, 2026 14:50

bump version to 3.6.0 (P1 of release)

5ccb802

chatgpt-codex-connector Bot reviewed Jun 4, 2026


yzhao062 merged commit fb0d774 into master Jun 4, 2026
15 of 26 checks passed

yzhao062 merged 2 commits into master from development
