An example of a gRPC server implementation in Rust.
The server performs audio transcription (speech-to-text) and voice activity detection without using external speech APIs.
The project uses:
- Tokio asynchronous runtime environment.
- Tonic to implement gRPC.
- Prost to implement Protocol Buffers.
- Vosk, an open model for speech recognition.
- Silero, an open model for voice activity detection.
- Ort, bindings for ONNX Runtime.
- ...
If needed, you can replace the Vosk model with any model from the list of available ones.
To do this, download the model and put it in the ./model directory. Do not forget to specify its path in config.yaml.
By default, the portable vosk-model-small-ru-0.22 model is used.
In config.yaml you can also set the following fields:
- vosk.model_path - the path to the Vosk model.
- vosk.pause_threshold - the pause duration used to divide speech into phrases. The default is 500ms.
- vad.model_path - the path to the Silero model.
- vad.sessions_num - the number of sessions.
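As a rough illustration, the four fields above could be mirrored by plain Rust structs like the ones below. This is only a sketch using the standard library: the type names, the field types, and the `validate` check are assumptions made for the example and are not taken from the project. Only the field names and the 500ms default come from the description above.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Settings under the `vosk.*` keys (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct VoskConfig {
    pub model_path: PathBuf,
    /// Pause length that separates two phrases.
    pub pause_threshold: Duration,
}

/// Settings under the `vad.*` keys (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct VadConfig {
    pub model_path: PathBuf,
    pub sessions_num: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub vosk: VoskConfig,
    pub vad: VadConfig,
}

impl Config {
    /// Builds a config that uses the documented 500ms default pause threshold.
    pub fn new(vosk_model: &str, vad_model: &str, sessions_num: usize) -> Self {
        Config {
            vosk: VoskConfig {
                model_path: PathBuf::from(vosk_model),
                pause_threshold: Duration::from_millis(500),
            },
            vad: VadConfig {
                model_path: PathBuf::from(vad_model),
                sessions_num,
            },
        }
    }

    /// Hypothetical sanity check: both paths are set and at least one session exists.
    pub fn validate(&self) -> Result<(), String> {
        if self.vosk.model_path.as_os_str().is_empty() {
            return Err("vosk.model_path is empty".to_string());
        }
        if self.vad.model_path.as_os_str().is_empty() {
            return Err("vad.model_path is empty".to_string());
        }
        if self.vad.sessions_num == 0 {
            return Err("vad.sessions_num must be at least 1".to_string());
        }
        Ok(())
    }
}
```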
The configuration passed in the request contains the following fields:
- sample_rate - the sampling rate of the audio to be recognized. The supported sampling rates are limited by the selected Vosk model. For vosk-model-small-ru-0.22, a rate of 16000 Hz is recommended.
- split_into_phrases - enables or disables splitting the transcribed text into phrases (based on the value of pause_threshold).
- max_alternatives - the maximum number of alternatives. If set to a non-zero value, the most plausible result from the given number of alternatives will be returned.
- audio_type - the audio format. At this stage, WAV_PCM_S16LE, RAW_PCM_S16LE, and RAW_PCM_S16BE are available for recognition.
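The two RAW formats differ only in byte order. The sketch below shows how raw 16-bit PCM bytes can be turned into samples for each of them. The `AudioType` enum and the `decode_raw_pcm` function are hypothetical names introduced for this example, not the project's actual types.

```rust
/// The audio formats listed above (the Rust names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioType {
    WavPcmS16le,
    RawPcmS16le,
    RawPcmS16be,
}

/// Decodes raw 16-bit PCM bytes into samples using the byte order of `audio_type`.
/// Returns an error for an odd byte count, or for WAV input, which has to be
/// unwrapped from its container before the samples can be read.
pub fn decode_raw_pcm(bytes: &[u8], audio_type: AudioType) -> Result<Vec<i16>, String> {
    if bytes.len() % 2 != 0 {
        return Err(format!("odd byte count: {}", bytes.len()));
    }
    let convert: fn([u8; 2]) -> i16 = match audio_type {
        AudioType::RawPcmS16le => i16::from_le_bytes,
        AudioType::RawPcmS16be => i16::from_be_bytes,
        AudioType::WavPcmS16le => {
            return Err("WAV data must be unwrapped from its container first".to_string())
        }
    };
    Ok(bytes
        .chunks_exact(2)
        .map(|pair| convert([pair[0], pair[1]]))
        .collect())
}
```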
When using your own audio file, make sure it is in the correct format:
- For WAV - PCM_S16LE mono (frequency 16000Hz)
- For RAW - PCM_S16LE / PCM_S16BE mono (16000Hz)
Otherwise, you will get an error, as the service checks the audio file for consistency.
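To check a WAV file before sending it, something like the following can be used. It is a minimal sketch, not the service's own validation: it assumes the canonical 44-byte header layout in which the `fmt ` chunk directly follows `WAVE`, so files with extra chunks before `fmt ` are rejected. The names `WavInfo`, `parse_wav_header`, and `check_wav` are made up for the example.

```rust
/// Format fields read from a canonical 44-byte WAV header.
#[derive(Debug, PartialEq, Eq)]
pub struct WavInfo {
    pub audio_format: u16, // 1 means integer PCM
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Reads a little-endian u16 at `offset`.
fn u16_at(data: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([data[offset], data[offset + 1]])
}

/// Reads a little-endian u32 at `offset`.
fn u32_at(data: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([data[offset], data[offset + 1], data[offset + 2], data[offset + 3]])
}

/// Parses the header, assuming `fmt ` immediately follows `RIFF....WAVE`.
pub fn parse_wav_header(data: &[u8]) -> Result<WavInfo, String> {
    if data.len() < 44 {
        return Err("input is shorter than a 44-byte WAV header".to_string());
    }
    if &data[0..4] != b"RIFF" || &data[8..12] != b"WAVE" || &data[12..16] != b"fmt " {
        return Err("not a canonical RIFF/WAVE header".to_string());
    }
    Ok(WavInfo {
        audio_format: u16_at(data, 20),
        channels: u16_at(data, 22),
        sample_rate: u32_at(data, 24),
        bits_per_sample: u16_at(data, 34),
    })
}

/// Checks that the data is 16-bit PCM, mono, at `expected_rate` Hz.
pub fn check_wav(data: &[u8], expected_rate: u32) -> Result<(), String> {
    let info = parse_wav_header(data)?;
    if info.audio_format != 1 || info.bits_per_sample != 16 {
        return Err(format!(
            "expected 16-bit PCM, got format {} with {} bits",
            info.audio_format, info.bits_per_sample
        ));
    }
    if info.channels != 1 {
        return Err(format!("expected mono, got {} channels", info.channels));
    }
    if info.sample_rate != expected_rate {
        return Err(format!(
            "expected {} Hz, got {} Hz",
            expected_rate, info.sample_rate
        ));
    }
    Ok(())
}
```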
If you have ffmpeg installed, you can use it to convert your audio.
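One possible way to drive the conversion from Rust is sketched below. It builds an ffmpeg argument list for mono 16-bit PCM output and runs it with `std::process::Command`. The `Target` enum and both function names are illustrative, and the file names in the usage note are placeholders. The ffmpeg options used are `-ac` (channel count), `-ar` (sample rate), `-c:a` (audio codec), and `-f` (output format).

```rust
use std::process::Command;

/// Output encodings accepted by the service (the Rust names are illustrative).
pub enum Target {
    Wav,
    RawLe,
    RawBe,
}

/// Builds ffmpeg arguments that convert `input` to mono 16-bit PCM at `sample_rate` Hz.
pub fn ffmpeg_args(input: &str, output: &str, target: Target, sample_rate: u32) -> Vec<String> {
    // ffmpeg output format name and audio codec for each target.
    let (format, codec) = match target {
        Target::Wav => ("wav", "pcm_s16le"),
        Target::RawLe => ("s16le", "pcm_s16le"),
        Target::RawBe => ("s16be", "pcm_s16be"),
    };
    let rate = sample_rate.to_string();
    [
        "-i", input, "-ac", "1", "-ar", rate.as_str(), "-c:a", codec, "-f", format, output,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Runs ffmpeg (which must be on PATH) and reports whether it exited successfully.
pub fn convert(input: &str, output: &str, target: Target, sample_rate: u32) -> std::io::Result<bool> {
    let status = Command::new("ffmpeg")
        .args(ffmpeg_args(input, output, target, sample_rate))
        .status()?;
    Ok(status.success())
}
```

For example, `convert("input.mp3", "output.wav", Target::Wav, 16000)` corresponds to running `ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le -f wav output.wav`.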
To send a request to the server, take transcribe.proto (from the ./grpc_server/proto directory) and use it in your client.
You can check if it works, for example, via Postman.
Request structure for rpc Transcribation:
{
  "config": {
    "sample_rate": 16000,
    "max_alternatives": 0,
    "split_into_phrases": true,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}
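When testing with a tool that takes the request as JSON, the audio bytes have to be base64-encoded for the "content" field. The sketch below does this with the standard library only. A real client would more likely use an existing base64 crate and a JSON library, so both functions here are illustrative helpers. The `audio_type` string is inserted without escaping, on the assumption that it is one of the format names listed earlier.

```rust
const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes bytes as standard padded base64 (RFC 4648).
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, padding with zeros.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Builds the JSON body shown above for the given audio bytes.
pub fn transcribe_request_json(
    audio: &[u8],
    sample_rate: u32,
    max_alternatives: u32,
    split_into_phrases: bool,
    audio_type: &str,
) -> String {
    format!(
        "{{\"config\":{{\"sample_rate\":{},\"max_alternatives\":{},\"split_into_phrases\":{},\"audio_type\":\"{}\"}},\"content\":\"{}\"}}",
        sample_rate,
        max_alternatives,
        split_into_phrases,
        audio_type,
        base64_encode(audio)
    )
}
```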
As a result of the recognition, the server will return JSON of the form:
{
  "phrases": [
    {
      "words": [
        {
          "word": "hello",
          "startMs": 60,
          "endMs": 480
        },
        {
          "word": "world",
          "startMs": 540,
          "endMs": 715
        }
      ],
      "text": "hello world"
    },
    {
      "words": [
        {
          "word": "you",
          "startMs": 1224,
          "endMs": 1450
        },
        {
          "word": "are",
          "startMs": 1590,
          "endMs": 1710
        },
        {
          "word": "beautiful",
          "startMs": 1810,
          "endMs": 2143
        }
      ],
      "text": "you are beautiful"
    }
  ],
  "text": "hello world you are beautiful"
}
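The grouping in this response can be reproduced with a simple rule: start a new phrase whenever the silence between two consecutive words reaches the pause threshold. The following sketch implements that rule. It is not the project's code, and the `Word` and `Phrase` types as well as the choice of `>=` for the comparison are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub word: String,
    pub start_ms: u64,
    pub end_ms: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Phrase {
    pub words: Vec<Word>,
    pub text: String,
}

/// Groups words into phrases, starting a new phrase when the gap between the
/// end of the previous word and the start of the next one is at least
/// `pause_threshold_ms`.
pub fn split_into_phrases(words: &[Word], pause_threshold_ms: u64) -> Vec<Phrase> {
    let mut groups: Vec<Vec<Word>> = Vec::new();
    for word in words {
        let starts_new = match groups.last().and_then(|group| group.last()) {
            Some(prev) => word.start_ms.saturating_sub(prev.end_ms) >= pause_threshold_ms,
            None => true, // the first word always opens a phrase
        };
        if starts_new {
            groups.push(Vec::new());
        }
        if let Some(group) = groups.last_mut() {
            group.push(word.clone());
        }
    }
    groups
        .into_iter()
        .map(|words| {
            let text = words
                .iter()
                .map(|w| w.word.as_str())
                .collect::<Vec<_>>()
                .join(" ");
            Phrase { words, text }
        })
        .collect()
}
```

With the word timings from the response above and the default threshold of 500ms, the only gap that is long enough is the one between "world" (ends at 715) and "you" (starts at 1224), which gives the two phrases shown.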
Request structure for rpc Vad:
{
  "config": {
    "sample_rate": 16000,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}
As a result of the detection, the server will return JSON of the form:
{
  "intervals": [
    {
      "start_s": 0.068,
      "end_s": 0.715
    },
    {
      "start_s": 1.224,
      "end_s": 2.143
    }
  ]
}
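On the client side, the returned intervals can be mapped back to sample positions, for example to cut the speech segments out of the original audio. The sketch below assumes mono samples and rounds each boundary to the nearest sample. The `Interval` type and both helper functions are hypothetical and exist only for this example.

```rust
use std::ops::Range;

/// One speech interval in seconds, as in the response above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub start_s: f64,
    pub end_s: f64,
}

/// Converts an interval to a sample index range, clamped to `total` samples.
pub fn to_sample_range(interval: Interval, sample_rate: u32, total: usize) -> Range<usize> {
    let to_index = |seconds: f64| ((seconds * sample_rate as f64).round() as usize).min(total);
    let start = to_index(interval.start_s);
    let end = to_index(interval.end_s).max(start);
    start..end
}

/// Concatenates the samples that fall inside the given speech intervals.
pub fn extract_speech(samples: &[i16], intervals: &[Interval], sample_rate: u32) -> Vec<i16> {
    let mut speech = Vec::new();
    for interval in intervals {
        let range = to_sample_range(*interval, sample_rate, samples.len());
        speech.extend_from_slice(&samples[range]);
    }
    speech
}
```

At 16000 Hz, the first interval above (0.068 s to 0.715 s) corresponds to samples 1088 to 11440.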
- To install Rust on Unix-like systems (macOS, Linux, ...), run the following command in the terminal. After the download is complete, you will have the latest stable version of Rust for your platform, as well as the latest version of Cargo.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Run the following command in the terminal to verify the installation.
cargo --version
- Open the project and run the commands.
Check the code to see if it can be compiled (without running it).
cargo check
Build + run the project (in release mode with optimizations).
cargo run --release
Update: If you are on Windows, see the instructions here.
- You will most likely need to install Vosk.
In the simplest case, just run the following command.
pip3 install vosk
- You may also need to install ONNX Runtime.
In the simplest case (with Homebrew), it is enough to run the following command.
brew install onnxruntime