docs(audio-lab): README updates for ASR (M3) by wavekat-eason · Pull Request #48 · wavekat/wavekat-lab

wavekat-eason · 2026-05-14T23:23:57Z

Summary

M3 polish for the ASR integration — docs only. Stacks on top of M2 (#47); base = `feat/asr-frontend`.

New "ASR" subsection in `tools/audio-lab/README.md` under Supported Backends, with the sherpa-onnx preset table and a NOTE about the first-run model download (~75 MB to `$HF_HOME`).
"Live transcripts" added to the audio-lab What It Does list.
Top-level `README.md`: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

The audio-lab release (`v0.0.x`) will get cut automatically by release-please when the merged PRs land on main; no manual step needed.

Not in this PR (intentional):

Loom / screenshot for the README's video table — needs a recording session.
Frontend version bump — release-please owns that.

Test plan

`git diff` reviewed
Renders correctly on GitHub once merged

🤖 Generated with Claude Code

- New "ASR" subsection under Supported Backends with the sherpa-onnx preset table and a NOTE about the first-run model download (~75 MB to `$HF_HOME`).
- Mention "live transcripts" in the audio-lab What It Does list.
- Top-level README: include ASR in the audio-lab description + tool layout blurb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Makefile invoked `nvm use` from `audio-lab/`, where no `.nvmrc` exists, causing nvm to print help and exit 127. Move `nvm use` after `cd frontend` so it picks up `frontend/.nvmrc`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wavekat-eason · 2026-05-14T23:40:13Z

Superseded by #49 (consolidated ASR work).

## Summary

Implements the [ASR plan](https://github.com/wavekat/wavekat-lab/blob/main/docs/05-plan-asr.md) end-to-end. Supersedes the original stacked PRs (#46, #47, #48, all closed).

### Backend

- New `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`, `AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the audio broadcast in, and `blocking_send` bridges transcript events back onto a tokio mpsc.
- New WS messages: `ListAsrBackends` / `SetAsrConfigs` (client) and `AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator: `ready` / `speech_started` / `speech_ended` / `partial` / `final` / `warning`, with optional `ts_ms` / `end_ms` / `text` / `confidence` / `message` fields populated per kind.
- Wired into both `StartRecording` (live mic) and `LoadFile` (WAV upload) paths.
- Adds `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.

### Frontend

- New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset dropdowns, editable label, add / clone / remove).
- New `AsrTranscript` card per active ASR config: committed finals with `[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets overwritten until the final lands, footer with last confidence / count / avg segment duration. Shows `loading model…` until the backend's `ready` event arrives.
- `websocket.ts` types + log-panel batching of `asr.partial` messages so the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage` (`lab-asr-configs`), pushed to backend on change + before every start / load_file, transcripts reset on each new session.
- **2-column layout**: all config panels (VAD / Turn / Pipeline / ASR) moved into a left aside (`w-80` on lg+); waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on narrower screens.
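The `Asr` message and the transcript-card behaviour described above could be modelled on the client roughly as follows. This is only a sketch, not the code in `websocket.ts` or `AsrTranscript`: the names `AsrEvent`, `TranscriptState`, `formatMs`, and `applyAsrEvent` are hypothetical, and which optional fields accompany which `kind` is an assumption inferred from the field list in the backend notes.

```typescript
// Hypothetical client-side model of the server's `Asr` message.
// Field names come from the PR description; the per-kind field
// assignment below is an assumption, not the actual wire schema.
type AsrEvent =
  | { kind: "ready" }
  | { kind: "speech_started"; ts_ms: number }
  | { kind: "speech_ended"; ts_ms: number }
  | { kind: "partial"; text: string }
  | { kind: "final"; ts_ms: number; end_ms: number; text: string; confidence?: number }
  | { kind: "warning"; message: string };

// Hypothetical state behind one transcript card.
interface TranscriptState {
  ready: boolean;      // false while the card would show "loading model…"
  finals: string[];    // committed lines, each with a time-range prefix
  partial: string;     // trailing partial, overwritten until a final lands
  warnings: string[];
}

// Format milliseconds as mm:ss.s. Rounding to tenths of a second first
// avoids outputs such as "00:60.0" near a minute boundary.
function formatMs(ms: number): string {
  const tenths = Math.round(ms / 100);
  const minutes = Math.floor(tenths / 600);
  const rem = tenths % 600; // tenths of a second within the minute
  const mm = (minutes < 10 ? "0" : "") + minutes;
  const ss = (rem < 100 ? "0" : "") + (rem / 10).toFixed(1);
  return mm + ":" + ss;
}

// Pure reducer: returns the next state for one incoming event.
function applyAsrEvent(state: TranscriptState, ev: AsrEvent): TranscriptState {
  switch (ev.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      // Each partial replaces the previous one rather than appending.
      return { ...state, partial: ev.text };
    case "final": {
      const line = `[${formatMs(ev.ts_ms)}–${formatMs(ev.end_ms)}] ${ev.text}`;
      // Committing a final clears the trailing partial.
      return { ...state, finals: [...state.finals, line], partial: "" };
    }
    case "warning":
      return { ...state, warnings: [...state.warnings, ev.message] };
    default:
      // speech_started / speech_ended do not change the transcript text.
      return state;
  }
}

// Usage: feed a short, made-up event sequence through the reducer.
const demoEvents: AsrEvent[] = [
  { kind: "ready" },
  { kind: "partial", text: "hel" },
  { kind: "partial", text: "hello" },
  { kind: "final", ts_ms: 1500, end_ms: 3200, text: "hello world" },
];
let demoState: TranscriptState = { ready: false, finals: [], partial: "", warnings: [] };
for (const ev of demoEvents) {
  demoState = applyAsrEvent(demoState, ev);
}
console.log(demoState.finals.join("\n"));
```

Keeping the reducer pure makes the "partial is overwritten until the final lands" rule easy to unit-test without a WebSocket or a React tree.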
### Docs

- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx preset table and a NOTE about the first-run ~75 MB HF model download. "Live transcripts" added to What It Does.
- Top-level `README.md`: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

### Out of scope (follow-up)

- Loom / screenshot in the README video table: needs a recording session.
- Transcript ticks on `VadTimeline` / `PipelineTimeline` at each `final`.
- Two-channel ASR (`Channel::Remote`).
- WER / latency benchmarking: wait for a second ASR backend.
- Audio-lab release tag: release-please will cut it automatically on merge.

## Test plan

- [x] `cargo check --workspace` (backend)
- [x] `cargo clippy --workspace -- -D warnings` (backend, when M1 landed)
- [x] `cargo test --workspace` (5 pre-existing tests still pass)
- [x] `npm run lint` (no new warnings beyond pre-existing 7 in `FrequencySpectrum` / `Waveform`)
- [x] `npm run build` (clean)
- [ ] Manual smoke test: `make dev`, add an ASR config (sherpa-onnx · bilingual), record / load a WAV, confirm partials roll in and finals commit; toggle preset between bilingual / en / zh and verify model reload.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wavekat-eason and others added 2 commits May 15, 2026 11:23

wavekat-eason deleted the branch feat/asr-frontend May 14, 2026 23:40

wavekat-eason closed this May 14, 2026

wavekat-eason deleted the docs/asr-readme branch May 14, 2026 23:40

wavekat-eason mentioned this pull request May 14, 2026

feat(audio-lab): live ASR backend + transcript UI #50

Merged

6 tasks

wavekat-eason wants to merge 2 commits into `feat/asr-frontend` from `docs/asr-readme`