Make real metric caches explicit #76

Merged: ftshijt merged 4 commits into wavlab-speech:main from ftshijt:codex-move-fastdtw-to-audio-extra on May 19, 2026
ftshijt commented May 18, 2026

Summary

  • Add a visible Hugging Face/discrete-speech cache setup flow for real model-backed metric tests via tools/setup_huggingface_cache.sh.
  • Wire discrete speech and EmoVAD loaders to explicit cache paths so model/checkpoint assets are reused from repo-visible cache directories instead of hidden or incomplete defaults.
  • Make discrete-speech loader patching idempotent so repeated metric setup can refresh the active cache directory without stacking wrappers or retaining stale paths.
  • Tighten metric pipeline checks so expected keys are asserted directly and missing metric outputs fail instead of silently passing.
  • Add focused regression coverage for registry defaults, duplicate nested basenames, metric-specific minimum audio length, safe WER parsing, and mocked WER pipeline behavior.
  • Document the real-model cache workflow in the README and CI/testing docs.
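
The idempotent patching described in the third bullet can be illustrated with a minimal sketch. It is not the code in versa/utterance_metrics/discrete_speech.py; the names `patch_loader`, `load_model`, and `_original_loader` are hypothetical, and the patched module is a stand-in object. The idea is that the wrapper remembers the unwrapped function, so patching again replaces the wrapper instead of nesting a new one around it.

```python
import types


def patch_loader(module, cache_dir):
    """Wrap module.load_model so it defaults to the given cache_dir.

    Hypothetical helper: calling it repeatedly swaps the cache directory
    without stacking wrappers on top of each other.
    """
    current = module.load_model
    # If a previous call already wrapped the loader, recover the original.
    original = getattr(current, "_original_loader", current)

    def wrapper(*args, **kwargs):
        # Only fill in cache_dir when the caller did not pass one.
        kwargs.setdefault("cache_dir", cache_dir)
        return original(*args, **kwargs)

    # Remember the unwrapped function for the next patch call.
    wrapper._original_loader = original
    module.load_model = wrapper


# Stand-in for a third-party module with a loader function (assumption).
fake = types.SimpleNamespace(
    load_model=lambda name, cache_dir=None: (name, cache_dir)
)

patch_loader(fake, "/tmp/cache_a")
patch_loader(fake, "/tmp/cache_b")  # refreshes the path, no second layer
result = fake.load_model("m")
```

After the second call, `result` carries the refreshed directory and the stored original is still the unwrapped loader, so neither a stale path nor a wrapper chain survives.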

Diff Notes

  • README.md, docs/ci.md, docs/metric_migration.md: document real-model cache preparation and explicit env vars.
  • tools/setup_huggingface_cache.sh, versa/huggingface_cache.py: add shared cache configuration, cached-file detection, and offline loading helpers.
  • versa/utterance_metrics/discrete_speech.py, versa/utterance_metrics/emo_vad.py, versa/utterance_metrics/nomad.py: update optional model-backed metric setup paths and availability checks.
  • versa/scorer_shared.py, versa/utils_shared.py, versa/bin/scorer_chunk.py: tighten scoring validation and preserve nested duplicate file basenames.
  • scripts/survey/get_wer.py: replace eval parsing with JSON / ast.literal_eval and fix ESPnet fallback selection.
  • tools/install_fairseq.sh, tools/install_scoreq.sh: respect explicit PYTHON=... and fall back to python3 when python is unavailable.
  • test/test_metrics/*, test/test_pipeline/*: assert expected summary keys and cover the new behavior.
  • pyproject.toml: include matplotlib in the audio extra used by metric paths.
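
For the scripts/survey/get_wer.py change, a rough sketch of parsing without `eval` might look like the following. The function name `parse_record` and the input strings are assumptions for illustration, not the script's actual interface; the sketch only shows the general pattern of trying JSON first and falling back to `ast.literal_eval`, which accepts Python literals but refuses to execute calls or other expressions.

```python
import ast
import json


def parse_record(text):
    """Parse a serialized record without executing arbitrary code.

    Hypothetical helper: tries strict JSON first, then Python literal
    syntax (for example, dicts written with single quotes).
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"unparseable record: {text!r}") from exc


from_json = parse_record('{"wer": 0.1}')
from_literal = parse_record("{'wer': 0.1}")
```

A string such as `__import__('os').system('true')` would have run under `eval`; here it is rejected with a `ValueError`.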

Root Cause

The real-model failures came from package availability checks passing while required model/checkpoint assets were missing or stored in hidden/inconsistent cache locations. Some pipeline tests also iterated over returned summary keys, so an empty or incomplete metric output could look successful.
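
The second failure mode can be shown with a small sketch. The helper `check_summary` and the key `metric_a` are hypothetical and do not come from the test suite; the point is only that a loop over the returned keys never runs on an empty summary, while a check driven by the expected keys fails.

```python
def check_by_returned_keys(summary):
    """Weak check: iterates over whatever the pipeline returned."""
    for key, value in summary.items():
        assert isinstance(value, (int, float)), key
    return True  # also reached when summary is empty


def check_summary(summary, expected_keys):
    """Stricter check: every expected key must be present and numeric."""
    missing = sorted(set(expected_keys) - set(summary))
    assert not missing, f"missing metric outputs: {missing}"
    for key in expected_keys:
        assert isinstance(summary[key], (int, float)), key
    return True


# An empty summary slips through the weak check.
weak_passes_on_empty = check_by_returned_keys({})

# The strict check rejects it.
try:
    check_summary({}, ["metric_a"])
    strict_fails_on_empty = False
except AssertionError:
    strict_fails_on_empty = True
```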

CI Fix

The latest CI failure on c66ba0f was the code-quality job's Black check. GitHub reported five unformatted files:

  • test/test_metrics/test_asr_matching.py
  • test/test_pipeline/test_base_metrics_pipeline.py
  • versa/huggingface_cache.py
  • versa/utils_shared.py
  • versa/utterance_metrics/discrete_speech.py

Commit c4edd48 formats those files and fixes the stale-cache review issue in the discrete-speech loader patching.

The follow-up core-tests failure was a packaging/import issue: test_definition.py imported scripts.survey.get_wer, but scripts is not importable after pip install -e .[test] in CI. Commit 2341953 keeps the regression coverage and loads scripts/survey/get_wer.py by file path instead.
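
Loading a script by file path, as that commit does, can be sketched with the standard `importlib.util` machinery. This is an illustration under assumptions rather than the code in test_definition.py: the helper name `load_module_from_path` is hypothetical, and a temporary file stands in for scripts/survey/get_wer.py.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Import a Python file that is not part of an installed package."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demo with a throwaway script instead of the real file.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "standalone_script.py"
    script.write_text("def double(x):\n    return 2 * x\n")
    mod = load_module_from_path("standalone_script", script)
    doubled = mod.double(21)
```

Because the module is located by path, the test no longer depends on `scripts` being importable after an editable install.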

Commit 4af57bc keeps the optional fairseq/scoreq installers portable on environments that only expose python3.

Validation

  • .codex-test-venv/bin/python -m black --check $(git ls-files '*.py') -> pass
  • .codex-test-venv/bin/python -m flake8 versa scripts test setup.py --count --select=E9,F63,F7,F82 --show-source --statistics -> 0
  • .codex-test-venv/bin/python -m pytest -q test/test_metrics/test_definition.py -> 7 passed
  • .codex-test-venv/bin/python -m pytest -q test/test_metrics/test_discrete_speech.py test/test_metrics/test_emo_vad.py test/test_metrics/test_definition.py test/test_pipeline/test_base_metrics_pipeline.py -> 40 passed
  • .codex-test-venv/bin/python -c "import versa; print(versa.__version__)" -> 1.0.0
  • bash -n tools/setup_huggingface_cache.sh -> pass
  • bash -n tools/install_fairseq.sh tools/install_scoreq.sh -> pass
  • git diff --check -> pass

Notes

The broader CI flake8 command is configured with --exit-zero; it still reports existing style warnings across the repo but does not fail the job. The remaining skipped real-model item in local full-suite validation is scoreq_versa when that optional package is not installed.

ftshijt merged commit c398480 into wavlab-speech:main May 19, 2026
6 checks passed
ftshijt deleted the codex-move-fastdtw-to-audio-extra branch May 19, 2026 21:19