gtk-audio-transcribe

Simple user interface for transcribing audio into text (STT).

The user interface records audio, publishes it on a Redis channel, and displays the transcript it receives on a second Redis channel. The interface is therefore not tied to a particular STT engine: any engine that can communicate via Redis channels will work.
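To illustrate the engine side of this contract, here is a minimal sketch of a stand-in "engine" that consumes audio messages and publishes a transcript for each one. The function names `fake_transcribe`, `serve` and `main` are hypothetical, and the payload format (raw bytes in, plain text out) is an assumption rather than something this README specifies. The Redis wiring in `main` uses the redis-py package and the channel names from the example further down.

```python
def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a real STT engine."""
    return f"received {len(audio)} bytes of audio"


def serve(messages, publish, transcribe=fake_transcribe, channel_out="transcript"):
    """Transcribe every audio message and publish the result.

    messages: iterable of dicts shaped like redis-py pub/sub messages
    publish:  callable taking (channel, payload), e.g. redis.Redis.publish
    """
    for message in messages:
        # redis-py also yields subscribe confirmations; skip those
        if message.get("type") != "message":
            continue
        publish(channel_out, transcribe(message["data"]))


def main():
    # Imported lazily so the functions above work without redis-py installed.
    import redis

    client = redis.Redis(host="localhost", port=6379, db=0)
    pubsub = client.pubsub()
    pubsub.subscribe("audio")  # the channel the interface publishes to
    serve(pubsub.listen(), client.publish)
```

Calling `main()` while the interface is running should make every recording come back as a short "received N bytes" message, which is a quick way to check the Redis wiring before involving a real engine.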

Uses python-sounddevice under the hood for recording the audio (stumbled across the library in this post).

Installation

Install Redis on your machine (if not already present):

sudo apt install redis-server
sudo systemctl restart redis

Set up the virtual environment and install the application:

virtualenv -p /usr/bin/python3 venv
./venv/bin/pip install "git+ssh://git@github.com/waikato-ufdl/gtk-audio-transcribe.git"

Coqui STT Example

The following example uses Coqui STT via Redis and Docker.

Docker

Download the English tflite model into the current directory (the command below expects it to be named full.tflite) and start the Coqui container from the same directory:

docker run \
    --net=host \
    -v `pwd`:/workspace \
    -it waikatodatamining/tf_coqui_stt:1.15.2_0.10.0a10_cpu \
    stt_transcribe_redis \
    --redis_in audio \
    --redis_out transcript \
    --model /workspace/full.tflite \
    --verbose
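Once the container is running, a small round-trip check can confirm that it answers on the `transcript` channel without starting the GUI. The sketch below is only an illustration: `request_transcript` is a hypothetical helper, the client is expected to be a redis-py `redis.Redis` instance, and it assumes the engine accepts the bytes of a WAV file and replies with a text payload, none of which is stated in this README.

```python
import time


def request_transcript(client, wav_bytes, channel_out="audio",
                       channel_in="transcript", timeout=10.0):
    """Publish audio and wait for one reply; returns None on timeout."""
    # Subscribe first so the reply cannot arrive before we are listening.
    pubsub = client.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(channel_in)
    client.publish(channel_out, wav_bytes)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        message = pubsub.get_message(timeout=1.0)
        if message and message["type"] == "message":
            data = message["data"]
            # redis-py returns bytes unless decode_responses=True is set
            return data.decode("utf-8") if isinstance(data, bytes) else data
    return None
```

With redis-py installed, it could be called as `request_transcript(redis.Redis(host="localhost", port=6379, db=0), open("sample.wav", "rb").read())`, where `sample.wav` is a placeholder for any 16 kHz mono recording.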

Config

Create a YAML config file called config.yaml in the current directory with the following content:

# the Redis connection and channels
redis:
  host: "localhost"
  port: 6379
  db: 0
  channel_out: "audio"
  channel_in: "transcript"

recording:
  # the device to use for recording
  device: "pulse"
  # maximum length in seconds
  max_duration: 3.0
  # the number of channels
  num_channels: 1
  # the sample rate in Hz
  sample_rate: 16000
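To make the recording settings concrete, the sketch below works out how much audio they describe and wraps raw samples in a WAV container using only the Python standard library. The helper names `max_frames` and `pcm_to_wav` are hypothetical, and both the 16-bit sample width and the WAV format are assumptions; the README does not say how the application encodes the audio it publishes.

```python
import io
import wave


def max_frames(max_duration: float, sample_rate: int) -> int:
    """Number of frames in a recording of maximum length."""
    return int(max_duration * sample_rate)


def pcm_to_wav(pcm: bytes, num_channels: int = 1, sample_rate: int = 16000,
               sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (sample_width is in bytes)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(sample_width)  # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


# Values from the config above: 3.0 s at 16000 Hz, one channel.
frames = max_frames(3.0, 16000)
silence = b"\x00\x00" * frames  # 2 bytes per 16-bit mono frame
wav_bytes = pcm_to_wav(silence)
print(frames, len(silence), len(wav_bytes))
```

Under these assumptions a full-length recording is 48000 frames, or 96000 bytes of sample data plus a small WAV header, so each published message stays well under 100 kB.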

Application

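Start the application with the console script from the virtual environment, passing the config file via -c (the script name below is assumed to match the package name; python3 -c would try to execute "config.yaml" as Python code):

./venv/bin/gtk-audio-transcribe -c config.yaml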