feat(audio-lab): wire wavekat-asr backend (M1)#46

Closed
wavekat-eason wants to merge 1 commit into main from feat/asr-backend

@wavekat-eason
Contributor

Summary

M1 of the ASR plan — backend wiring only.

  • New backend/src/asr.rs module: AsrConfig, AsrServerEvent, run_asr_pipeline. Per-config OS thread owns a SherpaOnnxAsr; tokio task bridges the existing audio broadcast in, blocking_send bridges transcript events back.
  • WS surface: new ListAsrBackends / SetAsrConfigs client messages, AsrBackends + Asr server messages. Asr carries a kind discriminator (ready / speech_started / speech_ended / partial / final / warning) with the relevant optional payload fields.
  • Wired into both StartRecording (live mic) and LoadFile (WAV upload) paths so the same runner serves both flows.
  • Adds wavekat-asr = "0.0.4" with the sherpa-onnx feature. The first recording after pulling downloads the ~75 MB bilingual Zipformer model to $HF_HOME; the worker emits a ready event once the model is loaded.
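
The threading shape described in the first bullet can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in backend/src/asr.rs: `std::sync::mpsc` stands in for the tokio broadcast and mpsc channels, and `FakeRecognizer`, `Event`, `spawn_worker`, `run_demo`, and the `cfg-a` / `cfg-b` labels are hypothetical names invented for this example.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the transcript events sent back to the websocket.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Ready,
    Final { text: String },
}

/// Hypothetical stand-in for a synchronous recognizer that holds model state.
struct FakeRecognizer {
    samples_seen: usize,
}

impl FakeRecognizer {
    fn load() -> Self {
        FakeRecognizer { samples_seen: 0 }
    }

    /// Pretend to decode: report a "final" once 16,000 samples have arrived.
    fn accept(&mut self, frame: &[i16]) -> Option<Event> {
        self.samples_seen += frame.len();
        if self.samples_seen >= 16_000 {
            let text = format!("{} samples decoded", self.samples_seen);
            self.samples_seen = 0;
            Some(Event::Final { text })
        } else {
            None
        }
    }
}

/// Spawn one OS thread per config. The thread owns the recognizer, so the
/// blocking decode calls stay off whatever runtime feeds it audio.
fn spawn_worker(
    label: String,
    events: Sender<(String, Event)>,
) -> (Sender<Vec<i16>>, JoinHandle<()>) {
    let (audio_tx, audio_rx) = mpsc::channel::<Vec<i16>>();
    let handle = thread::spawn(move || {
        let mut asr = FakeRecognizer::load();
        // Announce that the (possibly slow) model load has finished.
        if events.send((label.clone(), Event::Ready)).is_err() {
            return;
        }
        // This loop ends once every audio sender has been dropped.
        for frame in audio_rx {
            if let Some(event) = asr.accept(&frame) {
                if events.send((label.clone(), event)).is_err() {
                    return;
                }
            }
        }
    });
    (audio_tx, handle)
}

/// Fan the same audio out to two workers and collect everything they emit.
fn run_demo() -> Vec<(String, Event)> {
    let (event_tx, event_rx) = mpsc::channel();
    let workers: Vec<_> = ["cfg-a", "cfg-b"]
        .iter()
        .map(|label| spawn_worker(label.to_string(), event_tx.clone()))
        .collect();
    drop(event_tx); // only the workers hold event senders now

    for _ in 0..50 {
        let frame = vec![0i16; 320]; // 50 frames x 320 samples = 16,000 samples
        for (audio_tx, _) in &workers {
            audio_tx.send(frame.clone()).expect("worker exited early");
        }
    }
    for (audio_tx, handle) in workers {
        drop(audio_tx); // closing the audio channel lets the worker finish
        handle.join().expect("worker panicked");
    }
    event_rx.iter().collect()
}

fn main() {
    for (label, event) in run_demo() {
        println!("[{label}] {event:?}");
    }
}
```

Each worker emits its events in order, but events from different workers can interleave, so a consumer would need to key on the config label rather than on arrival order.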

Out of scope (later milestones, per the plan):

  • Frontend AsrConfigPanel and AsrTranscript (M2)
  • README "ASR" section, log-panel batching, cold-start loading UI (M3)
  • Two-channel ASR, per-config preprocessing, benchmark table

Test plan

  • cargo check --workspace
  • cargo clippy --workspace -- -D warnings
  • cargo test --workspace (5 existing tests pass)
  • Manual: make dev-backend, send set_asr_configs + start_recording via wscat, confirm partials/finals print in logs

🤖 Generated with Claude Code

Adds a sherpa-onnx ASR backend that fans out alongside the existing VAD
and turn-detection pipelines. Each AsrConfig runs in its own worker
thread (sherpa-onnx is sync + holds model state); a tokio task bridges
the audio broadcast in, and a blocking_send loop bridges transcript
events back to the websocket.

WS surface: ListAsrBackends / SetAsrConfigs client messages,
AsrBackends + Asr server messages. Asr events carry a `kind` field
(ready, speech_started, speech_ended, partial, final, warning) with
optional ts_ms/end_ms/text/confidence/message.
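
One way to picture the flat "kind plus optional fields" shape is the sketch below. The six kind names and the five field names come from the description above; everything else is an assumption made for illustration, including the Rust field types, the constructor names, and which fields each kind populates. The real `AsrServerEvent` may differ.

```rust
/// The six `kind` values listed in the description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AsrKind {
    Ready,
    SpeechStarted,
    SpeechEnded,
    Partial,
    Final,
    Warning,
}

impl AsrKind {
    /// Wire name of the discriminator.
    fn as_str(self) -> &'static str {
        match self {
            AsrKind::Ready => "ready",
            AsrKind::SpeechStarted => "speech_started",
            AsrKind::SpeechEnded => "speech_ended",
            AsrKind::Partial => "partial",
            AsrKind::Final => "final",
            AsrKind::Warning => "warning",
        }
    }
}

/// Flat event: one discriminator plus optional payload fields.
/// Field types are assumptions; the description only names the fields.
#[derive(Debug, Clone, PartialEq)]
struct AsrEvent {
    kind: AsrKind,
    ts_ms: Option<u64>,
    end_ms: Option<u64>,
    text: Option<String>,
    confidence: Option<f32>,
    message: Option<String>,
}

impl AsrEvent {
    /// Event with no payload (used here for `ready`).
    fn bare(kind: AsrKind) -> Self {
        AsrEvent { kind, ts_ms: None, end_ms: None, text: None, confidence: None, message: None }
    }

    /// Assumed payload for a committed segment: start, end, text, confidence.
    fn final_segment(ts_ms: u64, end_ms: u64, text: &str, confidence: f32) -> Self {
        AsrEvent {
            ts_ms: Some(ts_ms),
            end_ms: Some(end_ms),
            text: Some(text.to_string()),
            confidence: Some(confidence),
            ..AsrEvent::bare(AsrKind::Final)
        }
    }

    /// Assumed payload for a warning: just a message.
    fn warning(message: &str) -> Self {
        AsrEvent { message: Some(message.to_string()), ..AsrEvent::bare(AsrKind::Warning) }
    }

    /// Names of the optional fields that are actually set on this event.
    fn populated_fields(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        if self.ts_ms.is_some() { names.push("ts_ms"); }
        if self.end_ms.is_some() { names.push("end_ms"); }
        if self.text.is_some() { names.push("text"); }
        if self.confidence.is_some() { names.push("confidence"); }
        if self.message.is_some() { names.push("message"); }
        names
    }
}

fn main() {
    let events = [
        AsrEvent::bare(AsrKind::Ready),
        AsrEvent::final_segment(1200, 2450, "hello", 0.93),
        AsrEvent::warning("example warning"),
    ];
    for event in &events {
        println!("{} -> {:?}", event.kind.as_str(), event.populated_fields());
    }
}
```

A flat struct keeps the websocket message easy to consume on the client at the cost of weaker typing; an internally tagged enum with per-kind payloads would be the stricter alternative.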

M1 scope: backend only — no frontend yet. cargo check + clippy clean,
existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wavekat-eason
Contributor Author

Superseded by #49 (consolidated ASR work).

@wavekat-eason wavekat-eason deleted the feat/asr-backend branch May 14, 2026 23:40
wavekat-eason added a commit that referenced this pull request May 15, 2026
## Summary

Implements the [ASR
plan](https://github.com/wavekat/wavekat-lab/blob/main/docs/05-plan-asr.md)
end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all
closed).

### Backend

- New `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`,
`AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS
worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the
audio broadcast in, and `blocking_send` bridges transcript events back
onto a tokio mpsc.
- New WS messages: `ListAsrBackends` / `SetAsrConfigs` (client) and
`AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator:
`ready` / `speech_started` / `speech_ended` / `partial` / `final` /
`warning`, with optional `ts_ms` / `end_ms` / `text` / `confidence` /
`message` fields populated per kind.
- Wired into both `StartRecording` (live mic) and `LoadFile` (WAV
upload) paths.
- Adds `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.
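
For reference, a dependency entry matching the bullet above would look roughly like this in the backend crate's `Cargo.toml`. The version and feature name are taken from this description; which manifest it lives in and any other options on the entry are assumptions.

```toml
[dependencies]
# Version and feature name as stated in the PR description.
wavekat-asr = { version = "0.0.4", features = ["sherpa-onnx"] }
```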

### Frontend

- New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset
dropdowns, editable label, add / clone / remove).
- New `AsrTranscript` card per active ASR config: committed finals with
`[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets
overwritten until the final lands, footer with last confidence / count /
avg segment duration. Shows `loading model…` until the backend's `ready`
event arrives.
- `websocket.ts` types + log-panel batching of `asr.partial` messages so
the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage`
(`lab-asr-configs`), pushed to backend on change + before every start /
load_file, transcripts reset on each new session.
- **2-column layout**: all config panels (VAD / Turn / Pipeline / ASR)
moved into a left aside (`w-80` on lg+); waveform / spectrum / timelines
/ ASR transcript / preprocessed sections fill a flex-1 main column.
Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on
narrower screens.

### Docs

- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx
preset table and a NOTE about the first-run ~75 MB HF model download.
"Live transcripts" added to What It Does.
- Top-level `README.md`: ASR mentioned in the audio-lab one-liner +
tool-layout blurb.

### Out of scope (follow-up)

- Loom / screenshot in the README video table — needs a recording
session.
- Transcript ticks on `VadTimeline` / `PipelineTimeline` at each
`final`.
- Two-channel ASR (`Channel::Remote`).
- WER / latency benchmarking — wait for a second ASR backend.
- Audio-lab release tag — release-please will cut it automatically on
merge.

## Test plan

- [x] `cargo check --workspace` (backend)
- [x] `cargo clippy --workspace -- -D warnings` (backend, when M1
landed)
- [x] `cargo test --workspace` (5 pre-existing tests still pass)
- [x] `npm run lint` (no new warnings beyond pre-existing 7 in
`FrequencySpectrum` / `Waveform`)
- [x] `npm run build` (clean)
- [ ] Manual smoke test: `make dev`, add an ASR config (sherpa-onnx ·
bilingual), record / load a WAV, confirm partials roll in and finals
commit; toggle preset between bilingual / en / zh and verify model
reload.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>