Speech-to-Text translation service (Rust, Tonic) (2025)

Ave-Sergeev/Dictator


Dictator


Russian version

Description

An example of a gRPC server implemented in Rust.

The server performs audio transcription (speech-to-text) and voice activity detection without relying on external speech APIs.

The project uses:

  1. Tokio, an asynchronous runtime.
  2. Tonic, a gRPC implementation.
  3. Prost, a Protocol Buffers implementation.
  4. Vosk, an open model for speech recognition.
  5. Silero, an open model for voice activity detection.
  6. Ort, bindings for ONNX Runtime.
  7. ...

If needed, you can replace the Vosk model with any other from the list of available models. To do this, download the model and put it in the ./model directory. Do not forget to specify its path in config.yaml. By default, the small, portable vosk-model-small-ru-0.22 is used.

You can also set the following fields in config.yaml:

  • vosk.model_path - the path to the model (Vosk).
  • vosk.pause_threshold - the pause duration used to split speech into phrases. The default is 500 ms.
  • vad.model_path - the path to the model (Silero).
  • vad.sessions_num - number of sessions.
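
As an illustration of how these fields could be represented on the Rust side, here is a minimal sketch using only the standard library. The type names (`AppConfig`, `VoskConfig`, `VadConfig`), the `validate` method, and the exact default model path are assumptions made for this example and are not taken from the project's source; only the field names and the 500 ms default come from the description above.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical mirror of the `vosk` section of config.yaml.
#[derive(Debug, Clone, PartialEq)]
pub struct VoskConfig {
    pub model_path: PathBuf,
    pub pause_threshold: Duration,
}

/// Hypothetical mirror of the `vad` section of config.yaml.
#[derive(Debug, Clone, PartialEq)]
pub struct VadConfig {
    pub model_path: PathBuf,
    pub sessions_num: usize,
}

/// Hypothetical top-level configuration holding both sections.
#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub vosk: VoskConfig,
    pub vad: VadConfig,
}

impl Default for VoskConfig {
    fn default() -> Self {
        Self {
            // Assumed location: the README only says models live in ./model.
            model_path: PathBuf::from("./model/vosk-model-small-ru-0.22"),
            // Documented default pause threshold.
            pause_threshold: Duration::from_millis(500),
        }
    }
}

impl AppConfig {
    /// Basic sanity checks that can run before any model is loaded.
    pub fn validate(&self) -> Result<(), String> {
        if self.vosk.model_path.as_os_str().is_empty() {
            return Err("vosk.model_path is empty".to_string());
        }
        if self.vad.model_path.as_os_str().is_empty() {
            return Err("vad.model_path is empty".to_string());
        }
        if self.vad.sessions_num == 0 {
            return Err("vad.sessions_num must be at least 1".to_string());
        }
        Ok(())
    }
}
```

Storing `pause_threshold` as a `Duration` rather than a bare integer keeps the unit explicit wherever the value is used.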

The configuration passed in the request contains the following fields:

  • sample_rate - the sampling rate of the audio to be recognized. The supported rates depend on the selected Vosk model. For vosk-model-small-ru-0.22, audio sampled at 16000 Hz is recommended.
  • split_into_phrases - enables or disables splitting the transcribed text into phrases (based on the pause_threshold value).
  • max_alternatives - the maximum number of alternatives. If set to a non-zero value, the most plausible result among that number of alternatives is returned.
  • audio_type - the audio format. Currently WAV_PCM_S16LE, RAW_PCM_S16LE, and RAW_PCM_S16BE are supported.
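
The difference between the two RAW formats is only the byte order of each 16-bit sample. The sketch below shows one way to decode such bytes into samples with the standard library. The `AudioType` enum and `decode_raw_pcm` function are hypothetical names for this example; the project's generated Prost types and its actual decoding code may look different.

```rust
/// Hypothetical Rust-side counterpart of the `audio_type` request field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioType {
    WavPcmS16le,
    RawPcmS16le,
    RawPcmS16be,
}

/// Decodes headerless signed 16-bit PCM bytes into samples.
/// WAV input is rejected here because its header has to be parsed first.
pub fn decode_raw_pcm(bytes: &[u8], audio_type: AudioType) -> Result<Vec<i16>, String> {
    if audio_type == AudioType::WavPcmS16le {
        return Err("WAV input needs its header parsed before decoding".to_string());
    }
    if bytes.len() % 2 != 0 {
        return Err("16-bit PCM data must contain an even number of bytes".to_string());
    }
    let big_endian = audio_type == AudioType::RawPcmS16be;
    let samples = bytes
        .chunks_exact(2)
        .map(|pair| {
            let raw = [pair[0], pair[1]];
            if big_endian {
                i16::from_be_bytes(raw)
            } else {
                i16::from_le_bytes(raw)
            }
        })
        .collect();
    Ok(samples)
}
```

For example, the bytes `01 00` decode to the sample 1 as RAW_PCM_S16LE, while the same sample in RAW_PCM_S16BE is stored as `00 01`.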

When using your own audio file, make sure it is in the correct format:

  • For WAV: PCM_S16LE, mono, 16000 Hz
  • For RAW: PCM_S16LE or PCM_S16BE, mono, 16000 Hz

Otherwise you will get an error, because the service validates the audio file. If you have ffmpeg installed, you can use it to convert the file.
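
To check a WAV file before sending it, a client-side test along these lines may help. This is a simplified sketch, not the validation the service performs: it assumes the canonical 44-byte RIFF/WAVE layout in which the `fmt ` chunk immediately follows the `WAVE` tag, so files with extra chunks before `fmt ` would be rejected. The function names `check_wav_header` and `minimal_wav_header` are hypothetical.

```rust
/// Verifies that `bytes` starts with a canonical PCM WAV header describing
/// 16-bit mono audio at `expected_rate` Hz.
pub fn check_wav_header(bytes: &[u8], expected_rate: u32) -> Result<(), String> {
    if bytes.len() < 44 {
        return Err("shorter than a canonical 44-byte WAV header".to_string());
    }
    if &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" || &bytes[12..16] != b"fmt " {
        return Err("missing RIFF/WAVE/fmt markers at the canonical offsets".to_string());
    }
    // All multi-byte header fields are little-endian.
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
    let u32_at =
        |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);

    if u16_at(20) != 1 {
        return Err("audio format is not PCM".to_string());
    }
    if u16_at(22) != 1 {
        return Err(format!("expected mono, found {} channels", u16_at(22)));
    }
    if u32_at(24) != expected_rate {
        return Err(format!("expected {} Hz, found {} Hz", expected_rate, u32_at(24)));
    }
    if u16_at(34) != 16 {
        return Err(format!("expected 16 bits per sample, found {}", u16_at(34)));
    }
    Ok(())
}

/// Builds a canonical 44-byte PCM header, handy for experimenting with the check.
pub fn minimal_wav_header(sample_rate: u32, channels: u16, bits: u16, data_len: u32) -> Vec<u8> {
    let block_align = channels * bits / 8;
    let mut h = Vec::with_capacity(44);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes());
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    h.extend_from_slice(&1u16.to_le_bytes()); // 1 = PCM
    h.extend_from_slice(&channels.to_le_bytes());
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&(sample_rate * u32::from(block_align)).to_le_bytes()); // byte rate
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}
```

A header built with `minimal_wav_header(16000, 1, 16, 0)` passes `check_wav_header(&header, 16000)`, while a stereo or 8000 Hz header is reported as an error.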

Usage

To send a request to the server, take transcribe.proto (from the ./grpc_server/proto directory) and use it in your client. You can test it with Postman, for example.

Request structure for rpc Transcribation:

{
  "config": {
    "sample_rate": 16000,
    "max_alternatives": 0,
    "split_into_phrases": true,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}
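
When the request is written as JSON (for example in Postman), binary data is given as a base64 string, which is how the proto3 JSON mapping represents `bytes` fields. Assuming `content` is such a field, the string can be produced from the audio file's bytes as sketched below. `base64_encode` is a hypothetical helper written against the standard library only; a client generated from transcribe.proto would normally take the raw bytes directly and not need it.

```rust
/// Standard base64 alphabet (RFC 4648), used with `=` padding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes arbitrary bytes as padded base64 text.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols, replacing unused ones with padding.
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[((n >> 6) & 63) as usize] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[(n & 63) as usize] as char
        } else {
            '='
        });
    }
    out
}
```

Reading a file with `std::fs::read` and passing the result to `base64_encode` yields a string that can be pasted into the "content" field of the JSON request.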

The server returns the recognition result as JSON of the form:

{
  "phrases": [
    {
      "words": [
        {
          "word": "hello",
          "startMs": 60,
          "endMs": 480
        },
        {
          "word": "world",
          "startMs": 540,
          "endMs": 715
        }
      ],
      "text": "hello world"
    },
    {
      "words": [
        {
          "word": "you",
          "startMs": 1224,
          "endMs": 1450
        },
        {
          "word": "are",
          "startMs": 1590,
          "endMs": 1710
        },
        {
          "word": "beautiful",
          "startMs": 1810,
          "endMs": 2143
        }
      ],
      "text": "you are beautiful"
    }
  ],
  "text": "hello world you are beautiful"
}
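
The phrase boundaries in this response are consistent with the default 500 ms pause threshold: the gap between "world" (ends at 715 ms) and "you" (starts at 1224 ms) is 509 ms, while every other gap is 140 ms or less. The sketch below reproduces that grouping. It is an illustration of the idea, not the service's code; the `Word` and `Phrase` types, the function names, and the use of a strict "greater than" comparison are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub word: String,
    pub start_ms: u64,
    pub end_ms: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Phrase {
    pub words: Vec<Word>,
    pub text: String,
}

/// Groups time-ordered words into phrases, starting a new phrase whenever the
/// silence before a word is longer than `pause_threshold_ms`.
pub fn split_into_phrases(words: &[Word], pause_threshold_ms: u64) -> Vec<Phrase> {
    let mut groups: Vec<Vec<Word>> = Vec::new();
    for w in words {
        let starts_new = match groups.last().and_then(|g| g.last()) {
            Some(prev) => w.start_ms.saturating_sub(prev.end_ms) > pause_threshold_ms,
            None => true, // the very first word always opens a phrase
        };
        if starts_new {
            groups.push(Vec::new());
        }
        if let Some(current) = groups.last_mut() {
            current.push(w.clone());
        }
    }
    groups
        .into_iter()
        .map(|ws| {
            let text = ws.iter().map(|w| w.word.as_str()).collect::<Vec<_>>().join(" ");
            Phrase { words: ws, text }
        })
        .collect()
}

/// The five words from the sample response above.
pub fn example_words() -> Vec<Word> {
    [
        ("hello", 60, 480),
        ("world", 540, 715),
        ("you", 1224, 1450),
        ("are", 1590, 1710),
        ("beautiful", 1810, 2143),
    ]
    .iter()
    .map(|&(w, s, e)| Word { word: w.to_string(), start_ms: s, end_ms: e })
    .collect()
}
```

Calling `split_into_phrases(&example_words(), 500)` gives two phrases, "hello world" and "you are beautiful". Raising the threshold above 509 ms merges them into one.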

Request structure for rpc Vad:

{
  "config": {
    "sample_rate": 16000,
    "audio_type": "WAV_PCM_S16LE"
  },
  "content": "audio in base64 format"
}

The server returns the detected speech intervals as JSON of the form:

{
  "intervals": [
    {
      "start_s": 0.068,
      "end_s": 0.715
    },
    {
      "start_s": 1.224,
      "end_s": 2.143
    }
  ]
}
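
A client will often want to turn these intervals into something directly usable, such as the total amount of speech or the sample positions to cut from the original audio. The following sketch shows both conversions; the `Interval` type and the function names are hypothetical, and only the `start_s` and `end_s` fields come from the response above.

```rust
/// One speech interval from the Vad response, in seconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub start_s: f64,
    pub end_s: f64,
}

/// Sums the durations of all intervals, ignoring any with end before start.
pub fn total_speech_seconds(intervals: &[Interval]) -> f64 {
    intervals
        .iter()
        .map(|i| (i.end_s - i.start_s).max(0.0))
        .sum()
}

/// Converts an interval to (first, one-past-last) sample indices at the given rate.
pub fn sample_range(interval: Interval, sample_rate: u32) -> (usize, usize) {
    let rate = f64::from(sample_rate);
    // Rounding avoids off-by-one results caused by floating-point error.
    let start = (interval.start_s * rate).round() as usize;
    let end = (interval.end_s * rate).round() as usize;
    (start, end)
}
```

For the sample response, the total is about 1.566 seconds of speech, and at 16000 Hz the first interval covers samples 1088 to 11440.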

Local startup

  1. To install Rust on Unix-like systems (macOS, Linux, ...), run the following command in the terminal. When the download completes, you will have the latest stable version of Rust for your platform, as well as the latest version of Cargo.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

  2. Run the following command in the terminal to verify the installation.

cargo --version

  3. Open the project and run the following commands.

Check that the code compiles (without running it).

cargo check

Build and run the project (in release mode with optimizations).

cargo run --release

Update: If you are on Windows, see the instructions here.

  4. You will most likely need to install Vosk.

In the simplest case, just run the following command.

pip3 install vosk

  5. You may also need to install ONNX Runtime.

In the simplest case (macOS with Homebrew), the following command is enough.

brew install onnxruntime
