feat(audio-lab): live ASR transcript panel (M2)#47
Closed
wavekat-eason wants to merge 1 commit into
Frontend half of the ASR integration:

- New AsrConfigPanel mirrors TurnConfigPanel: backend + preset + label.
- New AsrTranscript card renders finals (with [mm:ss.s–mm:ss.s] prefix) plus a dimmed trailing partial that overwrites until the final lands. Footer shows last confidence, count of finals, average segment duration. "loading model…" until the backend's `ready` event arrives. Copy-all button concatenates final text to the clipboard.
- App.tsx wires list_asr_backends on connect, persists asr configs to localStorage, pushes set_asr_configs on change + before start / load_file, resets transcripts on new session.
- websocket.ts: new AsrConfig / AsrEventKind types, asr_backends + asr server messages, list_asr_backends + set_asr_configs client messages. Log panel batches `partial` events (matching how `vad` is batched) and inlines finals / warnings.

cargo isn't touched; the backend is already merged on feat/asr-backend.
npm run lint clean (no new warnings); npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
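The partial-overwrite behaviour described for the AsrTranscript card can be sketched as a small reducer. This is an illustration only, not the component's actual code: the names `TranscriptEvent`, `TranscriptState`, `applyEvent`, `formatTime` and `renderFinals` are hypothetical, the event fields are assumed from the field names used elsewhere in this PR, and truncating (rather than rounding) to tenths of a second is an assumption.

```typescript
// Hypothetical event shape; only the kinds the transcript card needs.
type TranscriptEvent =
  | { kind: "ready" }
  | { kind: "partial"; text: string }
  | { kind: "final"; ts_ms: number; end_ms: number; text: string; confidence?: number };

interface FinalSegment {
  ts_ms: number;
  end_ms: number;
  text: string;
  confidence?: number;
}

interface TranscriptState {
  ready: boolean;         // false => show "loading model…"
  finals: FinalSegment[]; // committed segments
  partial: string;        // dimmed trailing text, overwritten on each partial
}

const emptyTranscript: TranscriptState = { ready: false, finals: [], partial: "" };

// Format milliseconds as mm:ss.s, truncating to tenths (assumption).
function formatTime(ms: number): string {
  const tenths = Math.floor(ms / 100);
  const minutes = Math.floor(tenths / 600);
  const rest = tenths - minutes * 600;
  const seconds = Math.floor(rest / 10);
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return pad(minutes) + ":" + pad(seconds) + "." + (rest % 10);
}

// Pure state transition: partials overwrite, a final commits and clears the partial.
function applyEvent(state: TranscriptState, ev: TranscriptEvent): TranscriptState {
  switch (ev.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      return { ...state, partial: ev.text };
    case "final": {
      const seg: FinalSegment = {
        ts_ms: ev.ts_ms,
        end_ms: ev.end_ms,
        text: ev.text,
        confidence: ev.confidence,
      };
      return { ...state, finals: [...state.finals, seg], partial: "" };
    }
  }
}

// One display line per committed final, with the [mm:ss.s–mm:ss.s] prefix.
function renderFinals(state: TranscriptState): string[] {
  return state.finals.map(
    (f) => "[" + formatTime(f.ts_ms) + "–" + formatTime(f.end_ms) + "] " + f.text
  );
}
```

Keeping the transition pure makes the "reset transcripts on new session" step a matter of swapping the state back to `emptyTranscript`.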
Superseded by #49 (consolidated ASR work).
wavekat-eason added a commit that referenced this pull request May 15, 2026
## Summary

Implements the [ASR plan](https://github.com/wavekat/wavekat-lab/blob/main/docs/05-plan-asr.md) end-to-end. Supersedes the original stacked PRs (#46, #47, #48; all closed).

### Backend

- New `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`, `AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the audio broadcast in, and `blocking_send` bridges transcript events back onto a tokio mpsc.
- New WS messages: `ListAsrBackends` / `SetAsrConfigs` (client) and `AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator: `ready` / `speech_started` / `speech_ended` / `partial` / `final` / `warning`, with optional `ts_ms` / `end_ms` / `text` / `confidence` / `message` fields populated per kind.
- Wired into both `StartRecording` (live mic) and `LoadFile` (WAV upload) paths.
- Adds `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.

### Frontend

- New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset dropdowns, editable label, add / clone / remove).
- New `AsrTranscript` card per active ASR config: committed finals with `[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets overwritten until the final lands, footer with last confidence / count / avg segment duration. Shows `loading model…` until the backend's `ready` event arrives.
- `websocket.ts` types + log-panel batching of `asr.partial` messages so the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage` (`lab-asr-configs`), pushed to backend on change + before every start / load_file, transcripts reset on each new session.
- **2-column layout**: all config panels (VAD / Turn / Pipeline / ASR) moved into a left aside (`w-80` on lg+); waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on narrower screens.

### Docs

- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx preset table and a NOTE about the first-run ~75 MB HF model download. "Live transcripts" added to What It Does.
- Top-level `README.md`: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

### Out of scope (follow-up)

- Loom / screenshot in the README video table: needs a recording session.
- Transcript ticks on `VadTimeline` / `PipelineTimeline` at each `final`.
- Two-channel ASR (`Channel::Remote`).
- WER / latency benchmarking: wait for a second ASR backend.
- Audio-lab release tag: release-please will cut it automatically on merge.

## Test plan

- [x] `cargo check --workspace` (backend)
- [x] `cargo clippy --workspace -- -D warnings` (backend, when M1 landed)
- [x] `cargo test --workspace` (5 pre-existing tests still pass)
- [x] `npm run lint` (no new warnings beyond pre-existing 7 in `FrequencySpectrum` / `Waveform`)
- [x] `npm run build` (clean)
- [ ] Manual smoke test: `make dev`, add an ASR config (sherpa-onnx · bilingual), record / load a WAV, confirm partials roll in and finals commit; toggle preset between bilingual / en / zh and verify model reload.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
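The log-panel batching described in the Frontend section above could look roughly like the sketch below. It is not the code from `websocket.ts`: the `AsrMessage` interface is inferred from the `kind` values and optional fields listed in the Backend section, and `LogEntry` and `appendToLog` are hypothetical names introduced for illustration. Consecutive partials collapse into one counted entry, while every other kind is logged inline.

```typescript
// Kinds as listed for the `Asr` server message.
type AsrEventKind =
  | "ready"
  | "speech_started"
  | "speech_ended"
  | "partial"
  | "final"
  | "warning";

// Assumed wire shape: optional fields are populated per kind.
interface AsrMessage {
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

// Hypothetical log row: a label plus how many messages it stands for.
interface LogEntry {
  label: string;
  count: number;
}

const PARTIAL_LABEL = "asr.partial";

// Returns a new log array; a run of partials is folded into the last row.
function appendToLog(log: LogEntry[], msg: AsrMessage): LogEntry[] {
  const last = log.length > 0 ? log[log.length - 1] : undefined;
  if (msg.kind === "partial" && last !== undefined && last.label === PARTIAL_LABEL) {
    const merged: LogEntry = { label: PARTIAL_LABEL, count: last.count + 1 };
    return [...log.slice(0, -1), merged];
  }
  // Finals and warnings carry their payload inline; other kinds log the kind only.
  let detail = "";
  if (msg.kind === "final") detail = ": " + (msg.text ?? "");
  if (msg.kind === "warning") detail = ": " + (msg.message ?? "");
  return [...log, { label: "asr." + msg.kind + detail, count: 1 }];
}
```

Because a final or warning ends the current run, the next partial after it starts a fresh counted row, so the ordering of events in the log is preserved.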
Summary
M2 of the ASR plan — the frontend that makes ASR visible to a user. Stacks on top of M1 (#46); base = `feat/asr-backend`.
Out of scope (M3 follow-up):
Test plan
🤖 Generated with Claude Code