docs(audio-lab): README updates for ASR (M3)#48

Closed
wavekat-eason wants to merge 2 commits into `feat/asr-frontend` from `docs/asr-readme`

@wavekat-eason
Contributor

Summary

M3 polish for the ASR integration — docs only. Stacks on top of M2 (#47); base = `feat/asr-frontend`.

  • New "ASR" subsection in `tools/audio-lab/README.md` under Supported Backends, with the sherpa-onnx preset table and a NOTE about the first-run model download (~75 MB to `$HF_HOME`).
  • "Live transcripts" added to the audio-lab What It Does list.
  • Top-level `README.md`: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

The audio-lab release (`v0.0.x`) will get cut automatically by release-please when the merged PRs land on main; no manual step needed.

Not in this PR (intentional):

  • Loom / screenshot for the README's video table — needs a recording session.
  • Frontend version bump — release-please owns that.

Test plan

  • `git diff` reviewed
  • Renders correctly on GitHub once merged

🤖 Generated with Claude Code

wavekat-eason and others added 2 commits May 15, 2026 11:23
- New "ASR" subsection under Supported Backends with the sherpa-onnx
  preset table and a NOTE about the first-run model download (~75 MB
  to $HF_HOME).
- Mention "live transcripts" in the audio-lab What It Does list.
- Top-level README: include ASR in the audio-lab description + tool
  layout blurb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Makefile invoked `nvm use` from audio-lab/ where no .nvmrc exists,
causing nvm to print help and exit 127. Move `nvm use` after `cd frontend`
so it picks up frontend/.nvmrc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wavekat-eason wavekat-eason deleted the branch feat/asr-frontend May 14, 2026 23:40
@wavekat-eason
Contributor Author

Superseded by #49 (consolidated ASR work).

@wavekat-eason wavekat-eason deleted the docs/asr-readme branch May 14, 2026 23:40
wavekat-eason added a commit that referenced this pull request May 15, 2026
## Summary

Implements the [ASR
plan](https://github.com/wavekat/wavekat-lab/blob/main/docs/05-plan-asr.md)
end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all
closed).

### Backend

- New `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`,
`AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS
worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the
audio broadcast in, and `blocking_send` bridges transcript events back
onto a tokio mpsc.
- New WS messages: `ListAsrBackends` / `SetAsrConfigs` (client) and
`AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator:
`ready` / `speech_started` / `speech_ended` / `partial` / `final` /
`warning`, with optional `ts_ms` / `end_ms` / `text` / `confidence` /
`message` fields populated per kind.
- Wired into both `StartRecording` (live mic) and `LoadFile` (WAV
upload) paths.
- Adds `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.
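
To make the `Asr` message shape above concrete, here is a minimal client-side sketch. The kind names and optional field names are taken from the description; the interface name `AsrEvent`, the helper `describeAsrEvent`, and the choice of which fields accompany which kind are assumptions for illustration, not the actual wire format.

```typescript
// Hypothetical client-side model of the server `Asr` message.
// Kind and field names follow the PR description; everything else is assumed.
type AsrKind =
  | "ready"
  | "speech_started"
  | "speech_ended"
  | "partial"
  | "final"
  | "warning";

interface AsrEvent {
  kind: AsrKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

// Render one event as a short log line, tolerating missing optional fields.
function describeAsrEvent(ev: AsrEvent): string {
  switch (ev.kind) {
    case "ready":
      return "ready";
    case "speech_started":
    case "speech_ended":
      // Assumption: boundary events carry a timestamp in `ts_ms`.
      return `${ev.kind} @ ${ev.ts_ms ?? "?"} ms`;
    case "partial":
      return `partial: ${ev.text ?? ""}`;
    case "final": {
      const conf =
        ev.confidence !== undefined
          ? ` (conf ${ev.confidence.toFixed(2)})`
          : "";
      return `final: ${ev.text ?? ""}${conf}`;
    }
    case "warning":
      return `warning: ${ev.message ?? ""}`;
  }
}
```

Switching on `kind` keeps every optional field access behind an explicit fallback, so a message with fewer fields than expected still renders.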

### Frontend

- New `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset
dropdowns, editable label, add / clone / remove).
- New `AsrTranscript` card per active ASR config: committed finals with
`[mm:ss.s–mm:ss.s]` prefix, dimmed trailing partial that gets
overwritten until the final lands, footer with last confidence / count /
avg segment duration. Shows `loading model…` until the backend's `ready`
event arrives.
- `websocket.ts` types + log-panel batching of `asr.partial` messages so
the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage`
(`lab-asr-configs`), pushed to backend on change + before every start /
load_file, transcripts reset on each new session.
- **2-column layout**: all config panels (VAD / Turn / Pipeline / ASR)
moved into a left aside (`w-80` on lg+); waveform / spectrum / timelines
/ ASR transcript / preprocessed sections fill a flex-1 main column.
Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on
narrower screens.
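
The transcript behaviour described for `AsrTranscript` (finals committed with a `[mm:ss.s–mm:ss.s]` prefix, a trailing partial overwritten until the final lands) could be sketched roughly as below. `TranscriptState`, `formatMmSs`, and `applyAsrText` are hypothetical names, and the real component may track more state (confidence, counts, durations).

```typescript
// Hypothetical transcript reducer; names are illustrative, not from the PR.
interface TranscriptState {
  finals: string[]; // committed lines, each with a time-range prefix
  partial: string | null; // trailing in-progress text (rendered dimmed)
}

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Format milliseconds as mm:ss.s, truncating to tenths of a second.
function formatMmSs(ms: number): string {
  const tenths = Math.floor(ms / 100); // integer tenths avoid float rounding
  const minutes = Math.floor(tenths / 600);
  const secTenths = tenths % 600;
  const seconds = Math.floor(secTenths / 10);
  const frac = secTenths % 10;
  return `${pad2(minutes)}:${pad2(seconds)}.${frac}`;
}

// A partial replaces the previous partial; a final commits a prefixed
// line and clears the partial.
function applyAsrText(
  state: TranscriptState,
  kind: "partial" | "final",
  text: string,
  tsMs: number,
  endMs: number
): TranscriptState {
  if (kind === "partial") {
    return { finals: state.finals, partial: text };
  }
  const line = `[${formatMmSs(tsMs)}–${formatMmSs(endMs)}] ${text}`;
  return { finals: [...state.finals, line], partial: null };
}
```

Returning a new state object on each event, rather than mutating in place, is an assumption that fits a React-style component.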

### Docs

- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx
preset table and a NOTE about the first-run ~75 MB HF model download.
"Live transcripts" added to What It Does.
- Top-level `README.md`: ASR mentioned in the audio-lab one-liner +
tool-layout blurb.

### Out of scope (follow-up)

- Loom / screenshot in the README video table — needs a recording
session.
- Transcript ticks on `VadTimeline` / `PipelineTimeline` at each
`final`.
- Two-channel ASR (`Channel::Remote`).
- WER / latency benchmarking — wait for a second ASR backend.
- Audio-lab release tag — release-please will cut it automatically on
merge.

## Test plan

- [x] `cargo check --workspace` (backend)
- [x] `cargo clippy --workspace -- -D warnings` (backend, when M1
landed)
- [x] `cargo test --workspace` (5 pre-existing tests still pass)
- [x] `npm run lint` (no new warnings beyond pre-existing 7 in
`FrequencySpectrum` / `Waveform`)
- [x] `npm run build` (clean)
- [ ] Manual smoke test: `make dev`, add an ASR config (sherpa-onnx ·
bilingual), record / load a WAV, confirm partials roll in and finals
commit; toggle preset between bilingual / en / zh and verify model
reload.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>