LiveKit Agents plugin for Techmo ASR — a high-accuracy Polish and multilingual automatic speech recognition service exposed via gRPC.
- Streaming speech recognition (bidirectional gRPC)
- Batch recognition
- Interim (partial) results
- Word-level timestamps
- TLS and mutual-TLS support
- Multiple language groups / model selection
- MRCP timeout controls (no-input, recognition, speech-complete, speech-incomplete)
- Compatible with LiveKit Agents >= 1.5
- Python >= 3.10
- livekit-agents >= 1.5
- grpcio >= 1.63
- protobuf >= 5.0
- Access to a running Techmo ASR gRPC server
```bash
git clone https://github.com/techmo-pl/livekit-plugins-techmo
cd livekit-plugins-techmo

# Install build tools
pip install grpcio-tools hatchling

# Install the plugin (stubs are generated at build time)
pip install --no-build-isolation .
```

Note: The `--no-build-isolation` flag is required because the build hook generates gRPC Python stubs from the `.proto` files in `proto/` at install time. The stubs are placed in `livekit/plugins/techmo/_proto/`.
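For orientation, stub generation of this kind can be approximated with `grpc_tools.protoc`. This is a minimal sketch, not the contents of `hatch_build.py`: the helper names `protoc_args` and `generate_stubs` and the exact flag set are assumptions, and the real build hook may do more (for example, adjusting imports in the generated files). Only the `proto/` input directory and the `livekit/plugins/techmo/_proto/` output directory come from the note above.

```python
from pathlib import Path


def protoc_args(proto_dir: Path, out_dir: Path) -> list[str]:
    """Build a grpc_tools.protoc argument list for every .proto file under proto_dir."""
    protos = sorted(str(p) for p in proto_dir.rglob("*.proto"))
    return [
        "grpc_tools.protoc",  # argv[0] placeholder expected by protoc.main
        f"-I{proto_dir}",  # root used to resolve proto imports
        f"--python_out={out_dir}",  # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # service stubs (*_pb2_grpc.py)
        *protos,
    ]


def generate_stubs(proto_dir: Path, out_dir: Path) -> int:
    """Run protoc over proto_dir and return its exit code (0 on success)."""
    # Imported lazily so the argument builder works without grpcio-tools installed.
    from grpc_tools import protoc

    out_dir.mkdir(parents=True, exist_ok=True)
    return protoc.main(protoc_args(proto_dir, out_dir))
```

Under these assumptions, calling `generate_stubs(Path("proto"), Path("livekit/plugins/techmo/_proto"))` from the repository root would write the generated modules into the package directory.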
To regenerate stubs manually after changing .proto files:
```bash
pip install grpcio-tools
python hatch_build.py
```

Basic usage:

```python
from livekit.plugins.techmo import STT

stt = STT(
    service_address="asr.example.com:5555",
    language_group="pl",  # Polish; omit to use server default
    interim_results=True,
)
```

Or set the address via environment variable:
```bash
export TECHMO_ASR_ADDRESS=asr.example.com:5555
```

```python
from livekit.plugins.techmo import STT

stt = STT()  # reads TECHMO_ASR_ADDRESS automatically
```

With TLS:

```python
stt = STT(
    service_address="asr.example.com:443",
    tls=True,
    ca_cert=open("ca.crt", "rb").read(),
    # For mutual TLS:
    # client_cert=open("client.crt", "rb").read(),
    # client_key=open("client.key", "rb").read(),
)
```

With a LiveKit agent:

```python
from livekit.agents import Agent, AgentSession, JobContext, RoomInputOptions, WorkerOptions, cli
from livekit.plugins import silero
from livekit.plugins.techmo import STT


async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=STT(
            language_group="pl",
            interim_results=True,
            mrcp_speech_complete_timeout=1000,
        ),
        # llm=..., tts=...
    )

    await session.start(room=ctx.room, agent=Agent(), room_input_options=RoomInputOptions())


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `service_address` | `str` | `TECHMO_ASR_ADDRESS` env | gRPC server address (host:port) |
| `sample_rate` | `int` | `16000` | Audio sample rate in Hz |
| `language_group` | `str \| None` | `None` | Language group name (server default if unset) |
| `model_name` | `str \| None` | `None` | Model name (language group default if unset) |
| `interim_results` | `bool` | `True` | Return partial transcripts during speech |
| `single_utterance` | `bool` | `False` | Stop after first complete utterance |
| `max_alternatives` | `int` | `1` | Maximum recognition alternatives |
| `enable_word_timing` | `bool` | `False` | Include word-level timestamps |
| `tls` | `bool` | `False` | Use TLS for connection |
| `ca_cert` | `bytes \| None` | `None` | PEM CA certificate for TLS |
| `client_cert` | `bytes \| None` | `None` | PEM client certificate (mutual TLS) |
| `client_key` | `bytes \| None` | `None` | PEM client private key (mutual TLS) |
| `grpc_timeout` | `float \| None` | `None` | Overall gRPC deadline in seconds |
| `mrcp_no_input_timeout` | `int \| None` | `None` | ms of silence before NO_INPUT_TIMEOUT (server default if unset) |
| `mrcp_recognition_timeout` | `int \| None` | `None` | Maximum total utterance duration in ms (server default if unset) |
| `mrcp_speech_complete_timeout` | `int \| None` | `None` | Silence after speech (match expected) in ms (server default if unset) |
| `mrcp_speech_incomplete_timeout` | `int \| None` | `None` | Silence after speech (no match yet) in ms (server default if unset) |
The four `mrcp_*` parameters map directly to MRCP speech recognition resource headers:

- `mrcp_no_input_timeout`: how long to wait for the user to start speaking before giving up
- `mrcp_recognition_timeout`: hard cap on total recognition time; set large (e.g. `600000`) for long utterances
- `mrcp_speech_complete_timeout`: silence duration after speech that ends the utterance when a grammar match is possible; smaller values make recognition feel more responsive (e.g. `1000`)
- `mrcp_speech_incomplete_timeout`: silence duration when no match is possible yet; typically larger than `speech_complete_timeout`
The plugin uses the livekit.plugins.techmo logger. To see interim and final transcript events, enable DEBUG level logging. With LiveKit Agents this is done via the LIVEKIT_LOG_LEVEL environment variable:
```bash
LIVEKIT_LOG_LEVEL=DEBUG python my_agent.py dev
```

Development commands:

```bash
# Generate gRPC stubs
python hatch_build.py

# Run linter
ruff check .

# Run formatter
ruff format .

# Run unit tests (no server required)
pytest tests/ -v

# Run integration tests (requires TECHMO_ASR_ADDRESS)
TECHMO_ASR_ADDRESS=localhost:5555 pytest tests/test_integration.py -v
```

This plugin uses the Techmo ASR v1p1 gRPC API. The `.proto` definition is located at `proto/techmo/asr/api/v1p1/asr.proto`.