Transcript Processing!
tpro takes transcripts produced by various speech-to-text services and converts them to standardized formats.
- download and unzip Stanford NER, which provides the files listed below
- put these files in /usr/local/bin/:
- stanford-ner.jar
- classifiers/english.all.3class.distsim.crf.ser.gz
- you might have to update Java on Linux
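Before running tpro, it can help to confirm that the two files are where the steps above put them. This is a minimal sketch, assuming both paths are relative to /usr/local/bin/ as listed; the helper name missing_ner_files is hypothetical and not part of tpro.

```python
from pathlib import Path

# File names copied from the list above; the default directory is the one
# named in the setup steps.
NER_FILES = (
    "stanford-ner.jar",
    "classifiers/english.all.3class.distsim.crf.ser.gz",
)


def missing_ner_files(base_dir="/usr/local/bin"):
    """Return the expected files that are not present under base_dir."""
    base = Path(base_dir)
    return [name for name in NER_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_ner_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files found.")
```

An empty list means both files were found; anything else names what still needs to be copied.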
$ pip install tpro
$ tpro --help
Usage: tpro [OPTIONS] TRANSCRIPT_DATA_PATH OUTPUT_PATH
            [amazon|gentle|speechmatics|google] [universal|vo]

Options:
  -p, --print-output    pretty print the transcript, breaks pipeability
  --language-code TEXT  specify language, defaults to en-US.
  --help                Show this message and exit.
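The usage line above can also be driven from a script. The following is a minimal sketch, assuming tpro is installed and on PATH; the function names build_tpro_command and run_tpro are hypothetical, and only the arguments and options shown in the help text are used.

```python
import subprocess

# Choices copied from the usage line above.
SERVICES = ("amazon", "gentle", "speechmatics", "google")
FORMATS = ("universal", "vo")


def build_tpro_command(transcript_path, output_path, service, output_format,
                       language_code=None):
    """Build the argument list for a tpro call, validating the two choices."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    command = ["tpro"]
    # Only pass --language-code when the caller overrides the en-US default.
    if language_code is not None:
        command += ["--language-code", language_code]
    command += [transcript_path, output_path, service, output_format]
    return command


def run_tpro(*args, **kwargs):
    """Run tpro and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_tpro_command(*args, **kwargs), check=True)
```

For instance, run_tpro("amazon.json", "out.json", "amazon", "universal") would correspond to running tpro amazon.json out.json amazon universal; both file names are placeholders.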
- Universal Transcript (JSON)
- viraloverlay (JSON)
- Draft.js JSON
- Word (.doc, .docx)
- text files
- SRT (subtitles)