We optimized the EmotiVoice inference engine, achieving an ~8x speedup on the backend. The optimization is built on TensorRT 8.x and above.
First, update the EmotiVoice submodule:
```bash
git submodule init
git submodule update
```
Next, convert the PyTorch model to an ONNX model:
```bash
python th2onnx.py
```
This writes the ONNX model to the outputs/onnx directory.
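For reference, below is a minimal sketch of what an export step like th2onnx.py typically does; the placeholder module, tensor shapes, and input/output names are illustrative, not the script's actual ones:

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the EmotiVoice backend; the real model
# and its input signature are defined in th2onnx.py.
model = nn.Linear(256, 80).eval()
dummy_input = torch.randn(1, 100, 256)  # (batch, seq_len, feature) placeholder

torch.onnx.export(
    model,
    dummy_input,
    "outputs/onnx/backend.onnx",      # illustrative output path
    opset_version=17,
    input_names=["input"],
    output_names=["mel"],
    # Mark batch and sequence dimensions as dynamic so one engine can
    # serve inputs of varying length.
    dynamic_axes={"input": {0: "batch", 1: "seq_len"},
                  "mel": {0: "batch", 1: "seq_len"}},
)
```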
Then generate the .trt engine file for TensorRT inference:
```bash
python onnx2trt.py
```
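For reference, a minimal TensorRT 8.x build sketch of the kind of work onnx2trt.py does; the file paths and builder settings below are illustrative, not the script's actual values:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model exported in the previous step.
with open("outputs/onnx/backend.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
# set_memory_pool_limit requires TensorRT >= 8.4; older 8.x versions use
# config.max_workspace_size instead.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
# Note: a model exported with dynamic input shapes also needs an
# optimization profile added to the config before building.

# Build and serialize the engine to a .trt file for later inference.
engine_bytes = builder.build_serialized_network(network, config)
with open("backend.trt", "wb") as f:
    f.write(engine_bytes)
```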
Finally, try it in a TTS application:
```bash
python main.py
```
Or run it as an OpenAI-compatible HTTP API:
```bash
python openai_api.py
```
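Assuming the server mirrors OpenAI's /v1/audio/speech endpoint, a client request could look like the following; the host, port, model name, and voice ID are placeholders, and the fields actually accepted depend on openai_api.py:

```python
import requests

# Hypothetical call against a locally running openai_api.py server.
resp = requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={"model": "emoti-voice", "input": "Hello, world!", "voice": "8051"},
)
resp.raise_for_status()

# Save the returned audio bytes to a file.
with open("speech.wav", "wb") as f:
    f.write(resp.content)
```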
Enjoy!
We tested our optimization on a single NVIDIA GeForce RTX 2070 GPU.
| component | PyTorch | ours (TensorRT) | speedup |
|---|---|---|---|
| backend | 85 ms | 11 ms | ~8x |
| frontend + backend | 967 ms | 867 ms | ~10% |
We suggest that @Netease optimize the frontend. A C++ frontend may be useful for production deployments.
If this work is helpful to you, please give it a star.
EmotiVoice-TensorRT is provided under the Apache-2.0 License - see the LICENSE file for details.