xinjli/alqalign

ALQAlign

alqalign is a phoneme-based multilingual speech alignment toolkit.

It is designed to handle roughly 8,000 languages, at least in theory. See the full list of supported languages.

Install

python setup.py install

Usage

For details on usage, see the instruction page.

The basic command-line usage with the main options is as follows:

python -m alqalign.run  --lang   <your target language> \
                        --audio  <path to your audio file> \
                        --text   <path to your text file> \
                        --output <path to an output directory>
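
To script the command above, for example over many recordings, it can be launched from Python. The following is a minimal sketch using only the four flags shown above. The helper names build_alqalign_command and run_alqalign are hypothetical and not part of alqalign.

```python
import subprocess
import sys


def build_alqalign_command(lang, audio, text, output):
    """Build the argument list for `python -m alqalign.run` (hypothetical helper)."""
    return [
        sys.executable, "-m", "alqalign.run",
        "--lang", lang,
        "--audio", str(audio),
        "--text", str(text),
        "--output", str(output),
    ]


def run_alqalign(lang, audio, text, output):
    """Run the aligner as a subprocess; raises CalledProcessError on failure."""
    command = build_alqalign_command(lang, audio, text, output)
    return subprocess.run(command, check=True)


# Paths taken from the English sample shipped in the repository.
command = build_alqalign_command(
    "eng", "./samples/eng/utt.wav", "./samples/eng/utt.txt", "./samples/output/eng"
)
print(" ".join(command[1:]))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.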

You can also call it directly from Python:

In [1]: from alqalign.app import align

In [2]: align('./samples/eng/utt.wav', './samples/eng/utt.txt', 'eng')
Out[2]:
[{'utt_id': 'utt-00000-0000028-0000367',
  'start': 0.28,
  'end': 3.67,
  'text': 'A PROGRAMMER WALKS TO THE BUTCHER SHOP AND BUYS A KILO OF MEAT.',
  'score': -0.14},
 {'utt_id': 'utt-00001-0000384-0000917',
  'start': 3.85,
  'end': 9.17,
  'text': 'AN HOUR LATER HE COMES BACK UPSET THAT THE BUTCHER SHORTCHANGED HIM BY 24 GRAMS.',
  'score': -0.13}]
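
The returned list of dictionaries is easy to post-process. As an illustration, the sketch below converts it to SubRip (SRT) subtitle text. It assumes only the keys shown in the output above (start, end, text). The functions to_srt_timestamp and segments_to_srt are hypothetical helpers, not part of alqalign.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render alignment segments as numbered SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = to_srt_timestamp(seg["start"])
        end = to_srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


# Sample data copied from the output shown above; in practice this would be
# the return value of align(...).
segments = [
    {"utt_id": "utt-00000-0000028-0000367", "start": 0.28, "end": 3.67,
     "text": "A PROGRAMMER WALKS TO THE BUTCHER SHOP AND BUYS A KILO OF MEAT.",
     "score": -0.14},
    {"utt_id": "utt-00001-0000384-0000917", "start": 3.85, "end": 9.17,
     "text": "AN HOUR LATER HE COMES BACK UPSET THAT THE BUTCHER SHORTCHANGED HIM BY 24 GRAMS.",
     "score": -0.13},
]

print(segments_to_srt(segments))
```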

Tutorial

There is one English sample in the samples/eng directory. It contains two files:

  • utt.wav: a WAV file containing about 10 seconds of speech.
  • utt.txt: a text file containing two lines as follows:
A programmer walks to the butcher shop and buys a kilo of meat.
An hour later he comes back upset that the butcher shortchanged him by 24 grams.

To align each line of the text with the audio, run the following command:

python -m alqalign.run --lang=eng --audio=./samples/eng/utt.wav --text=./samples/eng/utt.txt --output=./samples/output/eng

The output is written to the ./samples/output/eng directory. It contains several files, including:

  • segments: containing timestamps
  • text: containing the aligned text

In this sample, the segments file contains

utt-00000-0000028-0000367 utt 0.28 3.67
utt-00001-0000384-0000917 utt 3.85 9.17

and the text file contains

utt-00000-0000028-0000367 A PROGRAMMER WALKS TO THE BUTCHER SHOP AND BUYS A KILO OF MEAT.
utt-00001-0000384-0000917 AN HOUR LATER HE COMES BACK UPSET THAT THE BUTCHER SHORTCHANGED HIM BY 24 GRAMS.
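
The two output files can be joined on the utterance ID to recover one record per aligned line. The sketch below assumes the whitespace-separated layout shown above: each segments line is an utterance ID, a second field (read here as a recording ID, which is an assumption), then start and end times in seconds, and each text line is an utterance ID followed by the transcript. The function names are hypothetical.

```python
def parse_segments(lines):
    """Map utt_id -> (recording_id, start, end) from segments-file lines."""
    segments = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, recording_id, start, end = line.split()
        segments[utt_id] = (recording_id, float(start), float(end))
    return segments


def parse_text(lines):
    """Map utt_id -> transcript from text-file lines."""
    texts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, _, transcript = line.partition(" ")
        texts[utt_id] = transcript
    return texts


def join_alignment(segment_lines, text_lines):
    """Combine both files into a list of dicts, in segments-file order."""
    texts = parse_text(text_lines)
    return [
        {"utt_id": utt_id, "start": start, "end": end, "text": texts.get(utt_id, "")}
        for utt_id, (_, start, end) in parse_segments(segment_lines).items()
    ]


# Lines copied from the sample output above.
segment_lines = [
    "utt-00000-0000028-0000367 utt 0.28 3.67",
    "utt-00001-0000384-0000917 utt 3.85 9.17",
]
text_lines = [
    "utt-00000-0000028-0000367 A PROGRAMMER WALKS TO THE BUTCHER SHOP AND BUYS A KILO OF MEAT.",
    "utt-00001-0000384-0000917 AN HOUR LATER HE COMES BACK UPSET THAT THE BUTCHER SHORTCHANGED HIM BY 24 GRAMS.",
]

for record in join_alignment(segment_lines, text_lines):
    print(record["start"], record["end"], record["text"])
```

To read the real files, pass open file objects instead of the lists, for example join_alignment(open(segments_path), open(text_path)) with paths pointing into the output directory.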
