feat(audio-lab): live ASR transcript panel (M2)#47

Closed
wavekat-eason wants to merge 1 commit into feat/asr-backend from feat/asr-frontend
Conversation

@wavekat-eason
Contributor

Summary

M2 of the ASR plan — the frontend that makes ASR visible to a user. Stacks on top of M1 (#46); base = `feat/asr-backend`.

  • New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend dropdown, preset dropdown, editable label, add / clone / remove).
  • New `AsrTranscript` card per active config: committed finals with `[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets overwritten until the final lands, footer with last confidence / count of finals / avg segment duration. Shows "loading model…" until the backend's `ready` event arrives, and "⚠" for warnings. "Copy all" concatenates final text to the clipboard.
  • Auto-scrolls the transcript list to the bottom unless the user has scrolled up.
  • `websocket.ts`: new `AsrConfig` / `AsrEventKind` types, `asr_backends` + `asr` server messages, `list_asr_backends` + `set_asr_configs` client messages. Log panel batches the spammy `partial` events (matching how `vad` is batched) and inlines finals / warnings verbatim.
  • `App.tsx`: `asrConfigs` persisted to `localStorage` (key `lab-asr-configs`), pushed to backend on change + before every `start_recording` / `load_file`, transcripts reset on each new session.
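
The transcript behaviour described above (partials overwrite each other, a final commits the segment and clears the partial) can be sketched as a small pure reducer. This is an illustration only, not the code in this PR: `TranscriptState`, `applyAsrEvent`, `formatTs`, `formatFinal`, `copyAllText` and `footerStats` are hypothetical names, the event field names are borrowed from the `asr` message fields (`ts_ms`, `end_ms`, `text`, `confidence`, `message`), and joining finals with a newline for "Copy all" is an assumption.

```typescript
// Hypothetical event shape; the real `asr` message may carry more fields.
type AsrEventKind =
  | "ready" | "speech_started" | "speech_ended" | "partial" | "final" | "warning";

interface AsrEvent {
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

interface FinalSegment {
  startMs: number;
  endMs: number;
  text: string;
  confidence: number | null;
}

interface TranscriptState {
  ready: boolean;          // false => the card shows "loading model…"
  finals: FinalSegment[];  // committed segments, oldest first
  partial: string | null;  // dimmed trailing text, replaced on every partial
  warnings: string[];
}

const emptyTranscript = (): TranscriptState => ({
  ready: false, finals: [], partial: null, warnings: [],
});

// Pure reducer: returns a new state and never mutates the input.
function applyAsrEvent(state: TranscriptState, ev: AsrEvent): TranscriptState {
  switch (ev.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      // Each partial replaces the previous one instead of appending.
      return { ...state, partial: ev.text ?? "" };
    case "final": {
      const startMs = ev.ts_ms ?? 0;
      const seg: FinalSegment = {
        startMs,
        endMs: ev.end_ms ?? startMs,
        text: ev.text ?? "",
        confidence: ev.confidence ?? null,
      };
      // Committing a final clears the trailing partial.
      return { ...state, partial: null, finals: [...state.finals, seg] };
    }
    case "warning":
      return { ...state, warnings: [...state.warnings, ev.message ?? ""] };
    default:
      return state; // speech_started / speech_ended do not change the transcript
  }
}

const pad2 = (n: number): string => (n < 10 ? "0" + n : String(n));

// Formats milliseconds as mm:ss.s, truncating to tenths of a second.
function formatTs(ms: number): string {
  const tenths = Math.floor(ms / 100);
  const minutes = Math.floor(tenths / 600);
  const rest = tenths % 600;
  return `${pad2(minutes)}:${pad2(Math.floor(rest / 10))}.${rest % 10}`;
}

function formatFinal(seg: FinalSegment): string {
  return `[${formatTs(seg.startMs)}–${formatTs(seg.endMs)}] ${seg.text}`;
}

// "Copy all": final text only, no timestamps (newline separator is an assumption).
function copyAllText(state: TranscriptState): string {
  return state.finals.map((f) => f.text).join("\n");
}

// Footer values: last confidence, count of finals, average segment duration.
function footerStats(state: TranscriptState) {
  const count = state.finals.length;
  const totalMs = state.finals.reduce((sum, f) => sum + (f.endMs - f.startMs), 0);
  const last = count > 0 ? state.finals[count - 1] : undefined;
  return {
    count,
    lastConfidence: last ? last.confidence : null,
    avgDurationMs: count > 0 ? totalMs / count : 0,
  };
}
```

Keeping the reducer pure makes "transcripts reset on each new session" a matter of swapping in `emptyTranscript()`, and lets the card render directly from `finals` plus the optional `partial`.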

Out of scope (M3 follow-up):

  • README "ASR" section + first-run download note
  • Loom / screenshot in the lab README's video table
  • Cut a `v0.0.x` audio-lab release
  • Optional v2: transcript ticks on `VadTimeline` / `PipelineTimeline` at each `final`

Test plan

  • `npm run lint` (clean — no new warnings beyond the 7 pre-existing ones in `FrequencySpectrum` / `Waveform`)
  • `npm run build` (clean, 1.1s)
  • Manual smoke test: `make dev`, add an ASR config, record / load a WAV, confirm a transcript appears

🤖 Generated with Claude Code

Frontend half of the ASR integration:
- New AsrConfigPanel mirrors TurnConfigPanel — backend + preset + label.
- New AsrTranscript card renders finals (with [mm:ss.s–mm:ss.s] prefix)
  plus a dimmed trailing partial that overwrites until the final lands.
  Footer shows last confidence, count of finals, average segment
  duration. "loading model…" until the backend's `ready` event arrives.
  Copy-all button concatenates final text to the clipboard.
- App.tsx wires list_asr_backends on connect, persists asr configs to
  localStorage, pushes set_asr_configs on change + before start /
  load_file, resets transcripts on new session.
- websocket.ts: new AsrConfig / AsrEventKind types, asr_backends + asr
  server messages, list_asr_backends + set_asr_configs client messages.
  Log panel batches `partial` events (matching how `vad` is batched)
  and inlines finals / warnings.

cargo isn't touched — backend already merged on feat/asr-backend.
npm run lint clean (no new warnings); npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wavekat-eason
Contributor Author

Superseded by #49 (consolidated ASR work).

@wavekat-eason wavekat-eason deleted the branch feat/asr-backend May 14, 2026 23:40
@wavekat-eason wavekat-eason deleted the feat/asr-frontend branch May 14, 2026 23:40
wavekat-eason added a commit that referenced this pull request May 15, 2026
## Summary

Implements the [ASR
plan](https://github.com/wavekat/wavekat-lab/blob/main/docs/05-plan-asr.md)
end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all
closed).

### Backend

- New `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`,
`AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS
worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the
audio broadcast in, and `blocking_send` bridges transcript events back
onto a tokio mpsc.
- New WS messages: `ListAsrBackends` / `SetAsrConfigs` (client) and
`AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator:
`ready` / `speech_started` / `speech_ended` / `partial` / `final` /
`warning`, with optional `ts_ms` / `end_ms` / `text` / `confidence` /
`message` fields populated per kind.
- Wired into both `StartRecording` (live mic) and `LoadFile` (WAV
upload) paths.
- Adds `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.
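
As an illustration of how a client might consume the `Asr` message described above, here is a minimal TypeScript sketch that validates the `kind` discriminator and copies only well-typed optional fields. The names `AsrEventMessage` and `parseAsrEvent` are hypothetical, and the JSON envelope is an assumption: only the `kind` values and the optional field names come from the description, and the real message may carry additional fields (for example, something identifying the config).

```typescript
// The six `kind` values listed in the description.
const ASR_EVENT_KINDS = [
  "ready", "speech_started", "speech_ended", "partial", "final", "warning",
] as const;

type AsrEventKind = (typeof ASR_EVENT_KINDS)[number];

// Hypothetical client-side shape: every payload field is optional because
// the backend only populates the ones relevant to each kind.
interface AsrEventMessage {
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

function isAsrEventKind(value: unknown): value is AsrEventKind {
  return (
    typeof value === "string" &&
    (ASR_EVENT_KINDS as readonly string[]).indexOf(value) !== -1
  );
}

// Returns null for anything that is not an object with a known `kind`.
// Optional fields with the wrong type are dropped rather than trusted.
function parseAsrEvent(raw: unknown): AsrEventMessage | null {
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;

  const kind = obj["kind"];
  if (!isAsrEventKind(kind)) return null;

  const ev: AsrEventMessage = { kind };

  const tsMs = obj["ts_ms"];
  if (typeof tsMs === "number") ev.ts_ms = tsMs;

  const endMs = obj["end_ms"];
  if (typeof endMs === "number") ev.end_ms = endMs;

  const text = obj["text"];
  if (typeof text === "string") ev.text = text;

  const confidence = obj["confidence"];
  if (typeof confidence === "number") ev.confidence = confidence;

  const message = obj["message"];
  if (typeof message === "string") ev.message = message;

  return ev;
}
```

A caller would typically run `parseAsrEvent(JSON.parse(frame))` on each incoming frame and ignore `null` results, so an unknown `kind` added by a newer backend does not break an older client.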

### Frontend

- New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset
dropdowns, editable label, add / clone / remove).
- New `AsrTranscript` card per active ASR config: committed finals with
`[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets
overwritten until the final lands, footer with last confidence / count /
avg segment duration. Shows `loading model…` until the backend's `ready`
event arrives.
- `websocket.ts` types + log-panel batching of `asr.partial` messages so
the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage`
(`lab-asr-configs`), pushed to backend on change + before every start /
load_file, transcripts reset on each new session.
- **2-column layout**: all config panels (VAD / Turn / Pipeline / ASR)
moved into a left aside (`w-80` on lg+); waveform / spectrum / timelines
/ ASR transcript / preprocessed sections fill a flex-1 main column.
Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on
narrower screens.
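
The log-panel batching mentioned above can be sketched as a pure append function that folds consecutive partials into a single counted entry while finals and warnings get their own lines. This is a hedged illustration, not the implementation in `websocket.ts`: `LogEntry`, `appendAsrLog` and `renderLogEntry` are hypothetical names, and the rendered text format is invented for the example.

```typescript
// Hypothetical log model: plain lines plus one collapsible entry per run
// of consecutive partials.
type LogEntry =
  | { type: "line"; text: string }
  | { type: "partial_batch"; count: number; lastText: string };

interface AsrLogEvent {
  kind: string;
  text?: string;
  message?: string;
}

// Appends an ASR event to the log without mutating the input array.
function appendAsrLog(log: readonly LogEntry[], ev: AsrLogEvent): LogEntry[] {
  if (ev.kind === "partial") {
    const last = log.length > 0 ? log[log.length - 1] : undefined;
    if (last !== undefined && last.type === "partial_batch") {
      // Extend the current run instead of adding another line.
      const merged: LogEntry = {
        type: "partial_batch",
        count: last.count + 1,
        lastText: ev.text ?? "",
      };
      return [...log.slice(0, log.length - 1), merged];
    }
    return [...log, { type: "partial_batch", count: 1, lastText: ev.text ?? "" }];
  }
  if (ev.kind === "final") {
    return [...log, { type: "line", text: `asr.final: ${ev.text ?? ""}` }];
  }
  if (ev.kind === "warning") {
    return [...log, { type: "line", text: `asr.warning: ${ev.message ?? ""}` }];
  }
  return [...log, { type: "line", text: `asr.${ev.kind}` }];
}

function renderLogEntry(entry: LogEntry): string {
  return entry.type === "line"
    ? entry.text
    : `asr.partial x${entry.count} (latest: "${entry.lastText}")`;
}
```

Because any non-partial event ends the current run, a stream of partials followed by a final produces two log entries regardless of how many partials arrived.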

### Docs

- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx
preset table and a NOTE about the first-run ~75 MB HF model download.
"Live transcripts" added to What It Does.
- Top-level `README.md`: ASR mentioned in the audio-lab one-liner +
tool-layout blurb.

### Out of scope (follow-up)

- Loom / screenshot in the README video table — needs a recording
session.
- Transcript ticks on `VadTimeline` / `PipelineTimeline` at each
`final`.
- Two-channel ASR (`Channel::Remote`).
- WER / latency benchmarking — wait for a second ASR backend.
- Audio-lab release tag — release-please will cut it automatically on
merge.

## Test plan

- [x] `cargo check --workspace` (backend)
- [x] `cargo clippy --workspace -- -D warnings` (backend; run when M1
landed)
- [x] `cargo test --workspace` (5 pre-existing tests still pass)
- [x] `npm run lint` (no new warnings beyond pre-existing 7 in
`FrequencySpectrum` / `Waveform`)
- [x] `npm run build` (clean)
- [ ] Manual smoke test: `make dev`, add an ASR config (sherpa-onnx ·
bilingual), record / load a WAV, confirm partials roll in and finals
commit; toggle preset between bilingual / en / zh and verify model
reload.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>