GitHub - Ave-Sergeev/Dictator: Speech-to-Text translation service (Rust, Tonic) (2025)

Dictator

Description

An example of a gRPC server implementation in Rust.

The server performs audio transcription (speech-to-text) and voice activity detection (VAD) without relying on external speech APIs.

The project uses:

Tokio, an asynchronous runtime.
Tonic, to implement gRPC.
Prost, to implement Protocol Buffers.
Vosk, open models for speech recognition.
Silero, an open model for voice activity detection.
Ort, Rust bindings for ONNX Runtime.
...

If needed, you can replace the Vosk model with any model from the list of available models. To do this, download it, put it in the ./model directory, and remember to set its path in config.yaml. By default, the lightweight vosk-model-small-ru-0.22 is used.

In config.yaml you can also set the following fields:

vosk.model_path - the path to the model (Vosk).
vosk.pause_threshold - the pause duration used to split speech into phrases. The default is 500 ms.
vad.model_path - the path to the model (Silero).
vad.sessions_num - the number of VAD model sessions.
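To make the shape of these settings concrete, here is a minimal sketch of how they might be mirrored as Rust types once config.yaml is parsed. The field names come from the list above; the struct names, the field types, storing pause_threshold as whole milliseconds, and the rule that sessions_num must be at least 1 are all assumptions, not the project's actual code.

```rust
use std::path::PathBuf;
use std::time::Duration;

// Hypothetical mirror of the `vosk` section of config.yaml.
#[derive(Debug, Clone)]
struct VoskConfig {
    model_path: PathBuf,
    pause_threshold_ms: u64, // assumption: stored as milliseconds (default 500)
}

// Hypothetical mirror of the `vad` section of config.yaml.
#[derive(Debug, Clone)]
struct VadConfig {
    model_path: PathBuf,
    sessions_num: usize,
}

#[derive(Debug, Clone)]
struct Config {
    vosk: VoskConfig,
    vad: VadConfig,
}

impl VoskConfig {
    /// The pause threshold as a `Duration`, convenient for timestamp comparisons.
    fn pause_threshold(&self) -> Duration {
        Duration::from_millis(self.pause_threshold_ms)
    }
}

impl Config {
    /// Sanity check before loading models (the rule itself is an assumption).
    fn validate(&self) -> Result<(), String> {
        if self.vad.sessions_num == 0 {
            return Err("vad.sessions_num must be at least 1".into());
        }
        Ok(())
    }
}
```

Keeping the threshold as an integer and converting it at the point of use avoids float rounding when comparing millisecond timestamps.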

The configuration passed in the request contains fields:

sample_rate - the sample rate of the audio to be recognized. Supported sample rates depend on the selected Vosk model; for vosk-model-small-ru-0.22, audio at 16000 Hz is recommended.
split_into_phrases - enables or disables splitting the transcribed text into phrases (based on the value of pause_threshold).
max_alternatives - the maximum number of alternatives. If set to a non-zero value, the most plausible result among that many alternatives is returned.
audio_type - the audio file format. Currently, the supported formats are WAV_PCM_S16LE, RAW_PCM_S16LE and RAW_PCM_S16BE.
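One plausible reading of split_into_phrases is sketched below: consecutive words are grouped into one phrase, and a new phrase starts whenever the silence between the end of one word and the start of the next exceeds pause_threshold. This is an illustration, not the service's code; the `Word` type mirrors the word/startMs/endMs fields of the response, and the strict `>` comparison is an assumption.

```rust
/// A recognized word with timestamps in milliseconds (mirrors word/startMs/endMs).
#[derive(Debug, Clone, PartialEq)]
struct Word {
    word: String,
    start_ms: u64,
    end_ms: u64,
}

/// Groups consecutive words into phrases: a new phrase begins when the gap
/// between the previous word's end and the current word's start is greater
/// than `pause_threshold_ms`.
fn split_into_phrases(words: &[Word], pause_threshold_ms: u64) -> Vec<Vec<Word>> {
    let mut phrases: Vec<Vec<Word>> = Vec::new();
    for w in words {
        let starts_new = match phrases.last().and_then(|p| p.last()) {
            // saturating_sub guards against overlapping timestamps
            Some(prev) => w.start_ms.saturating_sub(prev.end_ms) > pause_threshold_ms,
            None => true,
        };
        if starts_new {
            phrases.push(vec![w.clone()]);
        } else {
            phrases.last_mut().unwrap().push(w.clone());
        }
    }
    phrases
}

/// Builds a phrase's `text` field by joining its words with spaces.
fn phrase_text(phrase: &[Word]) -> String {
    phrase.iter().map(|w| w.word.as_str()).collect::<Vec<_>>().join(" ")
}
```

With the timestamps from the sample response in the Usage section and the default 500 ms threshold, the only gap above the threshold is the 509 ms pause between "world" (ends at 715) and "you" (starts at 1224), which yields the phrases "hello world" and "you are beautiful".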

When using your own audio file, make sure it is in the correct format:

For WAV - PCM_S16LE, mono, 16000 Hz
For RAW - PCM_S16LE or PCM_S16BE, mono, 16000 Hz
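The only difference between RAW_PCM_S16LE and RAW_PCM_S16BE is the byte order of each 16-bit sample. The sketch below (an illustration, not the project's decoder; `Endian`, `decode_pcm_s16` and `duration_ms` are hypothetical names) shows how raw bytes become signed samples and how the sample count relates to duration at a given sample rate.

```rust
/// Byte order of 16-bit signed PCM samples.
#[derive(Debug, Clone, Copy)]
enum Endian {
    Little, // PCM_S16LE
    Big,    // PCM_S16BE
}

/// Decodes raw 16-bit signed PCM bytes into samples.
/// An odd byte count is rejected, since every sample takes exactly two bytes.
fn decode_pcm_s16(bytes: &[u8], endian: Endian) -> Result<Vec<i16>, String> {
    if bytes.len() % 2 != 0 {
        return Err(format!("odd byte count {} for 16-bit PCM", bytes.len()));
    }
    Ok(bytes
        .chunks_exact(2)
        .map(|c| match endian {
            Endian::Little => i16::from_le_bytes([c[0], c[1]]),
            Endian::Big => i16::from_be_bytes([c[0], c[1]]),
        })
        .collect())
}

/// Duration in milliseconds of mono audio with `num_samples` samples.
fn duration_ms(num_samples: usize, sample_rate: u32) -> u64 {
    num_samples as u64 * 1000 / sample_rate as u64
}
```

For example, the bytes 0x00 0x01 decode to 1 as big-endian but to 256 as little-endian, so sending a file with the wrong audio_type produces noise rather than speech.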

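Otherwise, you will get an error, because the service checks that the audio file is consistent with the declared format. If you have ffmpeg installed, you can use it to convert your file, for example:

ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le output.wav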
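The consistency check can be illustrated with a sketch that reads the "fmt " chunk of a WAV (RIFF) file and verifies the requirements above: integer PCM, 16 bits per sample, mono, and the expected sample rate. It is a hypothetical stand-in for whatever check the service actually performs; only the standard RIFF/WAVE layout is assumed.

```rust
/// Format fields read from a WAV file's "fmt " chunk.
#[derive(Debug, PartialEq)]
struct WavFormat {
    audio_format: u16, // 1 = integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

fn u16_le(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn u32_le(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Walks the RIFF chunks until it finds "fmt " and reads the format fields.
fn read_wav_format(bytes: &[u8]) -> Result<WavFormat, String> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32_le(bytes, pos + 4) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return Err("truncated fmt chunk".into());
            }
            return Ok(WavFormat {
                audio_format: u16_le(bytes, body),
                channels: u16_le(bytes, body + 2),
                sample_rate: u32_le(bytes, body + 4),
                bits_per_sample: u16_le(bytes, body + 14),
            });
        }
        // Skip this chunk; chunks are padded to an even number of bytes.
        pos = body + size + (size % 2);
    }
    Err("no fmt chunk found".into())
}

/// Accepts only PCM_S16LE mono audio at `expected_rate` Hz.
fn check_wav(bytes: &[u8], expected_rate: u32) -> Result<(), String> {
    let f = read_wav_format(bytes)?;
    if f.audio_format != 1 || f.bits_per_sample != 16 {
        return Err("expected 16-bit integer PCM".into());
    }
    if f.channels != 1 {
        return Err(format!("expected mono, got {} channels", f.channels));
    }
    if f.sample_rate != expected_rate {
        return Err(format!("expected {} Hz, got {} Hz", expected_rate, f.sample_rate));
    }
    Ok(())
}
```

Walking the chunks instead of assuming a fixed 44-byte header matters because some encoders insert extra chunks (such as "LIST") before "fmt " or "data".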

Usage

To send a request to the server, use transcribe.proto (from the ./grpc_server/proto directory) in your client. You can test the service with, for example, Postman.

Request structure for rpc Transcribation:

{
  "config": {
    "sample_rate": 16000,
    "max_alternatives": 0,
    "split_into_phrases": true,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}
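In the proto3 JSON mapping, bytes fields are written as base64 strings, which is presumably why content must be base64 when the request is sent as JSON (for example from Postman). The sketch below is a dependency-free standard base64 encoder (RFC 4648, with padding) for producing that string from a file's bytes; the function name is hypothetical, and a real client would more likely use an existing crate.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 encoding with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into one 24-bit group (missing bytes are zero).
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // n input bytes yield n + 1 characters; the rest of the 4 are padding.
        for i in 0..4 {
            if i <= chunk.len() {
                let idx = (group >> (18 - 6 * i)) & 0x3f;
                out.push(ALPHABET[idx as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}
```

If content is declared as bytes in transcribe.proto (an assumption here), a Rust client using Prost-generated types would pass the raw Vec<u8> directly; base64 is only needed for the JSON form.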

After recognition, the server returns JSON of the form:

{
  "phrases": [
    {
      "words": [
        {
          "word": "hello",
          "startMs": 60,
          "endMs": 480
        },
        {
          "word": "world",
          "startMs": 540,
          "endMs": 715
        }
      ],
      "text": "hello world"
    },
    {
      "words": [
        {
          "word": "you",
          "startMs": 1224,
          "endMs": 1450
        },
        {
          "word": "are",
          "startMs": 1590,
          "endMs": 1710
        },
        {
          "word": "beautiful",
          "startMs": 1810,
          "endMs": 2143
        }
      ],
      "text": "you are beautiful"
    }
  ],
  "text": "hello world you are beautiful"
}

Request structure for rpc Vad:

{
  "config": {
    "sample_rate": 16000,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}

After voice activity detection, the server returns JSON of the form:

{
  "intervals": [
    {
      "start_s": 0.068,
      "end_s": 0.715
    },
    {
      "start_s": 1.224,
      "end_s": 2.143
    }
  ]
}
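Silero VAD scores short audio frames with a speech probability; one common way to turn such scores into intervals like the ones above is to threshold each frame, join consecutive speech frames, and bridge short pauses. The sketch below shows that technique under stated assumptions (the frame length, the threshold and the minimum-silence value are parameters you would choose); it is not taken from the service's implementation.

```rust
/// A speech interval in seconds (mirrors start_s / end_s).
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    start_s: f64,
    end_s: f64,
}

/// Converts per-frame speech probabilities into speech intervals.
/// A frame counts as speech when its probability is >= `threshold`;
/// pauses shorter than `min_silence_s` between speech runs are bridged.
fn probs_to_intervals(probs: &[f32], frame_s: f64, threshold: f32, min_silence_s: f64) -> Vec<Interval> {
    let mut out: Vec<Interval> = Vec::new();
    let mut start: Option<usize> = None;
    // Iterate one step past the end so a trailing speech run is closed.
    for i in 0..=probs.len() {
        let speech = i < probs.len() && probs[i] >= threshold;
        match (speech, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                let iv = Interval { start_s: s as f64 * frame_s, end_s: i as f64 * frame_s };
                let merge = matches!(out.last(), Some(prev) if iv.start_s - prev.end_s < min_silence_s);
                if merge {
                    // Short pause: extend the previous interval instead of starting a new one.
                    out.last_mut().unwrap().end_s = iv.end_s;
                } else {
                    out.push(iv);
                }
                start = None;
            }
            _ => {}
        }
    }
    out
}
```

For instance, with 512-sample frames at 16000 Hz, frame_s would be 0.032; raising min_silence_s produces fewer, longer intervals.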

Local startup

To install Rust on Unix-like systems (macOS, Linux, ...), run the following command in the terminal. After the download completes, you will have the latest stable version of Rust for your platform, as well as the latest version of Cargo.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

To verify the installation, run the following command in the terminal.

cargo --version

Open the project directory and run the following commands.

Check that the code compiles (without running it).

cargo check

Build and run the project (in release mode, with optimizations).

cargo run --release

Note: if you are on Windows, see the instructions here.

You will most likely need to install Vosk.

In the simplest case, run the following command.

pip3 install vosk

You may also need to install ONNX Runtime.

In the simplest case (with Homebrew), run the following command.

brew install onnxruntime
