Make real metric caches explicit by ftshijt · Pull Request #76 · wavlab-speech/versa

ftshijt · 2026-05-18T22:10:04Z

Summary

Add a visible Hugging Face/discrete-speech cache setup flow for real model-backed metric tests via tools/setup_huggingface_cache.sh.
Wire discrete speech and EmoVAD loaders to explicit cache paths so model/checkpoint assets are reused from repo-visible cache directories instead of hidden or incomplete defaults.
Make discrete-speech loader patching idempotent so repeated metric setup can refresh the active cache directory without stacking wrappers or retaining stale paths.
Tighten metric pipeline checks so expected keys are asserted directly and missing metric outputs fail instead of silently passing.
Add focused regression coverage for registry defaults, duplicate nested basenames, metric-specific minimum audio length, safe WER parsing, and mocked WER pipeline behavior.
Document the real-model cache workflow in the README and CI/testing docs.

Diff Notes

README.md, docs/ci.md, docs/metric_migration.md: document real-model cache preparation and explicit env vars.
tools/setup_huggingface_cache.sh, versa/huggingface_cache.py: add shared cache configuration, cached-file detection, and offline loading helpers.
versa/utterance_metrics/discrete_speech.py, versa/utterance_metrics/emo_vad.py, versa/utterance_metrics/nomad.py: update optional model-backed metric setup paths and availability checks.
versa/scorer_shared.py, versa/utils_shared.py, versa/bin/scorer_chunk.py: tighten scoring validation and preserve nested duplicate file basenames.
scripts/survey/get_wer.py: replace eval parsing with JSON / ast.literal_eval and fix ESPnet fallback selection.
tools/install_fairseq.sh, tools/install_scoreq.sh: respect explicit PYTHON=... and fall back to python3 when python is unavailable.
test/test_metrics/*, test/test_pipeline/*: assert expected summary keys and cover the new behavior.
pyproject.toml: include matplotlib in the audio extra used by metric paths.
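
As a minimal sketch of the eval replacement noted for scripts/survey/get_wer.py: the function name `parse_record` and the input strings are hypothetical, and the actual file format handled by that script is not shown here. The idea is to try JSON first and fall back to `ast.literal_eval`, which accepts Python literals only and never executes code.

```python
import ast
import json


def parse_record(text):
    """Parse a serialized dict or list without executing arbitrary code."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax, e.g. a single-quoted dict repr.
        # literal_eval raises ValueError or SyntaxError on anything that is
        # not a plain literal, such as a function call.
        return ast.literal_eval(text)
```

With this shape, a string such as `__import__('os').system('...')` is rejected with an exception rather than run, which is the main safety gain over `eval`.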

Root Cause

The real-model failures came from package availability checks passing while required model/checkpoint assets were missing or stored in hidden/inconsistent cache locations. Some pipeline tests also iterated over returned summary keys, so an empty or incomplete metric output could look successful.
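
The second failure mode can be shown with a small sketch. The metric names and helper functions below are hypothetical and are not taken from the test suite. A check that loops over whatever keys came back passes vacuously on an empty summary, while a check driven by the expected keys fails as intended.

```python
# Hypothetical metric names used only for this illustration.
EXPECTED_KEYS = {"metric_a", "metric_b"}


def check_by_iteration(summary):
    """Loop over returned keys: passes vacuously when summary is empty."""
    for key, value in summary.items():
        if value is None:
            raise AssertionError(f"{key} has no value")
    return True


def check_by_expected_keys(summary, expected=EXPECTED_KEYS):
    """Require every expected key to be present and populated."""
    missing = set(expected) - set(summary)
    if missing:
        raise AssertionError(f"missing metric outputs: {sorted(missing)}")
    for key in expected:
        if summary[key] is None:
            raise AssertionError(f"{key} has no value")
    return True
```

Given an empty dict, `check_by_iteration` returns `True` and `check_by_expected_keys` raises `AssertionError`.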

CI Fix

The latest CI failure on c66ba0f was the code-quality job's Black check. GitHub reported five unformatted files:

test/test_metrics/test_asr_matching.py
test/test_pipeline/test_base_metrics_pipeline.py
versa/huggingface_cache.py
versa/utils_shared.py
versa/utterance_metrics/discrete_speech.py

Commit c4edd48 formats those files and fixes the stale-cache review issue in the discrete-speech loader patching.

The follow-up core-tests failure was a packaging/import issue: test_definition.py imported scripts.survey.get_wer, but scripts is not importable after pip install -e .[test] in CI. Commit 2341953 keeps the regression coverage and loads scripts/survey/get_wer.py by file path instead.
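
Loading a script by file path can be done with the standard importlib machinery. The sketch below shows the general technique under the assumption that the test only needs the module object. The helper name `load_module_from_path` is hypothetical and may differ from what test_definition.py actually does.

```python
import importlib.util


def load_module_from_path(name, path):
    """Import a Python file by location, without needing its package on sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Execute the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

A test could then call `load_module_from_path("get_wer", "scripts/survey/get_wer.py")` relative to the repository root, which works whether or not `scripts` is an installed package.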

Commit 4af57bc keeps the optional fairseq/scoreq installers portable on environments that only expose python3.

Validation

.codex-test-venv/bin/python -m black --check $(git ls-files '*.py') -> pass
.codex-test-venv/bin/python -m flake8 versa scripts test setup.py --count --select=E9,F63,F7,F82 --show-source --statistics -> 0
.codex-test-venv/bin/python -m pytest -q test/test_metrics/test_definition.py -> 7 passed
.codex-test-venv/bin/python -m pytest -q test/test_metrics/test_discrete_speech.py test/test_metrics/test_emo_vad.py test/test_metrics/test_definition.py test/test_pipeline/test_base_metrics_pipeline.py -> 40 passed
.codex-test-venv/bin/python -c "import versa; print(versa.__version__)" -> 1.0.0
bash -n tools/setup_huggingface_cache.sh -> pass
bash -n tools/install_fairseq.sh tools/install_scoreq.sh -> pass
git diff --check -> pass

Notes

The broader CI flake8 command is configured with --exit-zero; it still reports existing style warnings across the repo but does not fail the job. The remaining skipped real-model item in local full-suite validation is scoreq_versa when that optional package is not installed.

ftshijt added 4 commits May 18, 2026 14:58

Make real metric caches explicit (c66ba0f)
Fix PR 76 CI formatting (c4edd48)
Fix core test script import (2341953)
Use python3 fallback in optional installers (4af57bc)

ftshijt merged commit c398480 into wavlab-speech:main May 19, 2026
6 checks passed

ftshijt deleted the codex-move-fastdtw-to-audio-extra branch May 19, 2026 21:19
