GitHub - tongjinle123/non_autoregressive_stream_asr

Non-Autoregressive streaming speech recognition

Model backbone: ReZero Transformer with directional mask
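As a rough illustration of the ReZero idea named above: each residual branch is scaled by a learnable scalar initialised to zero, so every layer starts as the identity. The class name `ReZeroResidual` and the wrapped `nn.Linear` are assumptions made for this sketch; it is not code taken from this repository.

```python
import torch
from torch import nn


class ReZeroResidual(nn.Module):
    """Hypothetical wrapper: y = x + alpha * sublayer(x), with alpha starting at 0."""

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # Learnable residual gate; zero at initialisation, so the block is the identity.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)


# Usage: wrap any shape-preserving sublayer (attention, feed-forward, ...).
layer = ReZeroResidual(nn.Linear(4, 4))
x = torch.randn(2, 5, 4)  # [batch, time, features]
y = layer(x)
```

Because `alpha` is zero at the start, `y` equals `x` until training moves the gate away from zero.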

Core streaming module: CTC trigger
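The README does not describe how the trigger works. One common reading of a "CTC trigger" is that a streaming decoder fires whenever the frame-level greedy CTC output emits a new non-blank token. The sketch below shows that idea in plain Python; the function name, the blank index of 0, and the toy posteriors are all assumptions.

```python
from typing import List, Tuple

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def ctc_trigger_frames(
    frame_posteriors: List[List[float]], blank_id: int = BLANK_ID
) -> List[Tuple[int, int]]:
    """Return (frame_index, token_id) for every frame where greedy CTC emits a new token."""
    triggers = []
    prev = blank_id
    for t, probs in enumerate(frame_posteriors):
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax over the vocabulary
        # Standard CTC collapse: skip blanks and repeats of the previous frame's label.
        if best != blank_id and best != prev:
            triggers.append((t, best))
        prev = best
    return triggers


# Toy posteriors over {blank, token 1, token 2} for six frames.
posteriors = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.1, 0.8],
]
triggers = ctc_trigger_frames(posteriors)
```

In a streaming setting the loop body would run once per incoming frame, with `prev` kept as decoder state between calls.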

Training tools: AMP and PyTorch Lightning

The core modules are written as batched tensor operations rather than for-loops, for speed: see src/model/module/mask.py and src/model/layer/spec_augment.py
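To illustrate what "batched tensor operations rather than for-loops" can look like for an attention mask, here is a minimal sketch that builds a directional (left-to-right, optional look-ahead) mask for a whole padded batch using broadcasting. It is not the contents of src/model/module/mask.py; the function name and the `right_context` parameter are assumptions.

```python
import torch


def directional_mask(
    lengths: torch.Tensor, max_len: int, right_context: int = 0
) -> torch.Tensor:
    """Boolean mask of shape [batch, max_len, max_len]; True means query i may attend to key j."""
    pos = torch.arange(max_len, device=lengths.device)
    # Direction constraint: key j is visible to query i when j <= i + right_context.
    direction = pos.unsqueeze(0) <= pos.unsqueeze(1) + right_context  # [L, L]
    # Padding constraint: key j must lie inside the utterance of each batch item.
    valid_keys = pos.unsqueeze(0) < lengths.unsqueeze(1)  # [B, L]
    # Broadcasting combines both constraints without looping over batch or time.
    return direction.unsqueeze(0) & valid_keys.unsqueeze(1)  # [B, L, L]


lengths = torch.tensor([3, 2])  # two utterances padded to 3 frames
mask = directional_mask(lengths, max_len=3)
```

The second utterance has only two valid frames, so its last key column is masked out for every query.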

Main model structure: see src/model/lightning_module/

Model training and inference run in half precision (APEX)

The dataset module reads preprocessed HDF files (see preprocess_hdf.py) rather than the original dataset, for faster data preprocessing and training

Some tasks remain to be done. Add a gRPC client, a gRPC server, and a Streamlit web app, as in my former streaming ASR project, to make this truly usable. Test the model with TensorRT and ONNX/ONNX Runtime and convert it; some operators may need to be written in CUDA C and bound into PyTorch. Try other transformer backbones, such as Sparse Transformer or Reformer, for faster inference.

Uses four losses for the Chinese-English code-switching task, combined as a weighted sum
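A weighted combination of several losses can be sketched as below. The README does not say which four losses are used or what their weights are, so the names `loss_1` to `loss_4` and all numeric values here are placeholders.

```python
from typing import Dict


def combine_losses(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of the named losses; every loss must have a weight."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Placeholder loss values and weights (hypothetical, not taken from config.py).
losses = {"loss_1": 2.0, "loss_2": 1.0, "loss_3": 0.5, "loss_4": 4.0}
weights = {"loss_1": 0.5, "loss_2": 0.25, "loss_3": 0.2, "loss_4": 0.05}
total = combine_losses(losses, weights)
```

The same function also works when the values are scalar PyTorch tensors, since it only relies on multiplication and addition.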

Environment: Docker. For the Dockerfile, see my GitHub repo "docker_image"

Files (history: 5 commits):
src
.gitignore
README.MD
config.py
ctc_output.py
main.py
mean_std.pth
preprocess_hdf.py
preprocess_wav.py
process_corpus.py
start_experiment.py
vocab.model
vocab.vocab
