ViiTorVoice-NAR

ViiTorVoice is a non-autoregressive speech generation system for voice cloning and local speech editing. The current deployment path uses split gRPC v2 services, with an HTTP gateway for end-to-end calls.

Core capabilities:

Voice cloning: provide prompt audio or a prompt audio codebook, and synthesize speech for the target text.
Local editing: provide source audio, original text, and edited text; the system locates the changed region and resynthesizes only the local segment.
Emotion and paralinguistic control: insert emotion and paralinguistic tags into the input text, then strengthen their effect with classifier-free guidance (CFG).
Low-latency inference: supports first-block inference, with an end-to-end first-frame latency of about 60 ms.

For model architecture, features, and technical details, see Technical Notes.

Inference Environment Setup

Run the initialization script from the repository root:

bash init_env.sh

The script creates .venv and installs the dependencies required for inference. Service startup uses this virtual environment by default.

Model Download

Download the model files into local_models/ under the repository root. Do not use symlinks; the actual model files must reside in local_models/ itself.

Model page:

https://huggingface.co/ZzWater/ViiTorVoice-NAR

mkdir -p local_models
huggingface-cli download ZzWater/ViiTorVoice-NAR \
  --local-dir local_models \
  --local-dir-use-symlinks False

If you use another download tool, keep the same rule: place the downloaded files under local_models/, and do not use symlinks.
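The no-symlink rule above can be checked before starting the services. The following is a minimal sketch, not part of the repository: the helper name find_model_dir_problems is hypothetical, and it only checks that the directory exists, contains regular files, and holds no symlinks. It does not verify which model files are expected.

```python
import os
from pathlib import Path


def find_model_dir_problems(model_dir: str = "local_models") -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine.

    Hypothetical helper: it knows nothing about which files the model needs,
    it only enforces "real files, no symlinks".
    """
    root = Path(model_dir)
    if root.is_symlink():
        return [f"{root} is itself a symlink"]
    if not root.is_dir():
        return [f"{root} is missing or is not a directory"]

    problems = []
    regular_files = 0
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them, so they are reported below.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = Path(dirpath) / name
            if path.is_symlink():
                problems.append(f"{path} is a symlink")
            elif path.is_file():
                regular_files += 1

    if regular_files == 0 and not problems:
        problems.append(f"{root} contains no regular files")
    return problems
```

Calling find_model_dir_problems() from the repository root and printing the result gives a quick pre-flight check; an empty list means no symlinks were found.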

Service Startup And Management

Services are managed by run_grpc_v2.sh. The default is the all-in-one startup path: the "all" target starts the encoder, llm, decoder, orchestrator, and http services.

./run_grpc_v2.sh start all
./run_grpc_v2.sh status all
./run_grpc_v2.sh logs orchestrator
./run_grpc_v2.sh stop all

The HTTP service listens on 0.0.0.0:7861 by default. Local access uses http://127.0.0.1:7861. For other ports, model paths, GPU settings, log directories, and environment variables, see viitorvoice/grpc_server/deploy.env.

HTTP Inference Examples

Default local HTTP endpoint:

BASE_URL="http://127.0.0.1:7861"

Health Check

curl "$BASE_URL/health"
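Because the services may take a while to load, a script may want to wait for the health endpoint before sending requests. The sketch below uses only the Python standard library. It assumes that an HTTP 200 from /health means the gateway is ready; the response body format is not documented here, so it is ignored. The function names are illustrative.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:7861"


def http_status(url, timeout=2.0):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return None


def wait_until_healthy(base_url=BASE_URL, attempts=30, delay=1.0,
                       probe=http_status, sleep=time.sleep):
    """Poll GET {base_url}/health until it returns 200 or attempts run out.

    Assumption: status 200 means "ready". `probe` and `sleep` are injectable
    so the loop can be exercised without a running server.
    """
    for attempt in range(attempts):
        if probe(f"{base_url}/health") == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical use is to call wait_until_healthy() right after ./run_grpc_v2.sh start all and abort if it returns False.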

Voice Cloning

To clone a voice without a reference transcript, omit ref_text and set allow_missing_ref_text=true:

curl -X POST "$BASE_URL/v1/voice-clone" \
  -F 'ref_audio=@prompt.wav' \
  -F 'text=今天天气不错，我们下午一起去公园散步吧。' \
  -F 'language=zh' \
  -F 'allow_missing_ref_text=true' \
  --output clone_no_ref_text.wav
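The same request can be sent from Python. This is a rough standard-library equivalent of the curl call above, not the project's official client: the helper names encode_multipart and clone_voice are hypothetical, and the file part is sent as application/octet-stream on the assumption that the server does not depend on the part's content type.

```python
import uuid
import urllib.request
from pathlib import Path


def encode_multipart(fields, files):
    """Encode text fields and file paths as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, path in files.items():
        path = Path(path)
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{path.name}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_voice(ref_audio, text, language, out_path,
                base_url="http://127.0.0.1:7861"):
    """POST to /v1/voice-clone without ref_text and save the returned audio."""
    body, content_type = encode_multipart(
        {"text": text, "language": language, "allow_missing_ref_text": "true"},
        {"ref_audio": ref_audio},
    )
    request = urllib.request.Request(
        f"{base_url}/v1/voice-clone",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # The timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=120) as resp:
        Path(out_path).write_bytes(resp.read())
```

For example, clone_voice("prompt.wav", "今天天气不错，我们下午一起去公园散步吧。", "zh", "clone_no_ref_text.wav") mirrors the curl command, provided the service is running locally.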

Emotion And Paralinguistic Control

After adding emotion or paralinguistic tags to the text, use CFG parameters to strengthen the control effect:

curl -X POST "$BASE_URL/v1/voice-clone" \
  -F 'ref_audio=@prompt.wav' \
  -F 'text=<|emotion-happy|>I finally finished the project, and I feel really happy.' \
  -F 'language=en' \
  -F 'emotion_guidance_scale=6.0' \
  -F 'nvv_guidance_scale=2.0' \
  --output clone_emotion.wav

The available tag set depends on the training data and model configuration. If the text contains no corresponding tag, the related CFG parameters have no effect.
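Since a guidance scale has no effect without a matching tag, a client can leave it out of the request in that case. The sketch below builds the text form fields for /v1/voice-clone under two assumptions: emotion tags follow the <|emotion-NAME|> pattern seen in the example above, and the syntax of paralinguistic tags is not known here, so nvv_guidance_scale is passed through unchecked. The helper name build_clone_fields is hypothetical.

```python
import re

# Assumed pattern, inferred from the single example tag <|emotion-happy|>.
EMOTION_TAG = re.compile(r"<\|emotion-[^|>]+\|>")


def build_clone_fields(text, language,
                       emotion_guidance_scale=None,
                       nvv_guidance_scale=None):
    """Return the text form fields for a voice-clone request.

    emotion_guidance_scale is only included when the text carries an
    emotion tag, because it would have no effect otherwise.
    """
    fields = {"text": text, "language": language}
    if emotion_guidance_scale is not None and EMOTION_TAG.search(text):
        fields["emotion_guidance_scale"] = str(emotion_guidance_scale)
    if nvv_guidance_scale is not None:
        # Paralinguistic tag syntax is not documented here, so no check is made.
        fields["nvv_guidance_scale"] = str(nvv_guidance_scale)
    return fields
```

The returned dictionary holds only the text fields; the ref_audio file still has to be attached as a multipart file part by whatever HTTP client sends the request.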

Local Editing

Upload source audio, original text, and the complete edited text:

curl -X POST "$BASE_URL/v1/text-local-edit" \
  -F 'source_audio=@source.wav' \
  -F 'original_text=Please send the meeting notes before Friday.' \
  -F 'edited_text=Please send the meeting notes before Monday.' \
  -F 'language=en' \
  -F 'align_granularity=word' \
  -F 'expand_mask_ratio=1.5' \
  -F 'output_format=wav' \
  --output edited.wav
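To preview which words differ between original_text and edited_text before sending a request, a word-level diff is enough. The sketch below uses Python's difflib and is only an illustration of the idea of locating the changed region. It is not the alignment method the service uses, and whitespace tokenization is an assumption that does not suit languages written without spaces, such as Chinese.

```python
import difflib


def changed_word_spans(original_text, edited_text):
    """Return (operation, original_words, edited_words) for each changed span."""
    original_words = original_text.split()
    edited_words = edited_text.split()
    matcher = difflib.SequenceMatcher(
        a=original_words, b=edited_words, autojunk=False
    )
    return [
        (tag, original_words[i1:i2], edited_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(changed_word_spans(
    "Please send the meeting notes before Friday.",
    "Please send the meeting notes before Monday.",
))
# prints [('replace', ['Friday.'], ['Monday.'])]
```

An empty result means the two texts are identical at word level, so there is nothing to edit.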

For more HTTP parameters, codebook input, base64 input, and Python examples, see HTTP API Usage. To call the orchestrator gRPC service directly, see gRPC API Usage.

Acknowledgements

The model architecture and training ideas in this project are inspired by:
