
SqueezeWave Implementation #53

Closed
sujeendran opened this issue Jun 21, 2020 · 12 comments

@sujeendran

Hi,
Will it be possible to add a TF2 implementation of the SqueezeWave vocoder to this system? The performance is really fast and promising. I'm working on it myself, but I'm not well versed in TF2 yet. I had quite a struggle trying to train the authors' PyTorch implementation on my custom dataset, even though it has almost the same characteristics as LJSpeech but is double the size. I believe TF2 is more suitable for post-training optimization and deployment.
Original Repo: https://github.com/tianrengao/SqueezeWave

@dathudeptrai
Collaborator

dathudeptrai commented Jun 22, 2020

@sujeendran If you can show that SqueezeWave performs better than MB-MelGAN, I will implement it :)). There is no reason to add a new model to this framework that is neither faster nor stronger than what is already there. Having listened to the audio samples and glanced at the paper, I don't think SqueezeWave beats MB-MelGAN on either inference time or quality.

@dathudeptrai dathudeptrai self-assigned this Jun 22, 2020
@dathudeptrai dathudeptrai added Discussion 😁 Discuss new feature Feature Request 🤗 Feature support labels Jun 22, 2020
@dathudeptrai
Collaborator

@sujeendran

@dathudeptrai dathudeptrai added the stat:awaiting response ☏ Waiting Response label Jun 22, 2020
@sujeendran
Author

sujeendran commented Jun 22, 2020

@dathudeptrai I will need some time to test MB-MelGAN on my target platform. I suggested SqueezeWave mostly for the speed and the possibility of running on CPU on resource-restricted edge devices; TFLite and TF Micro are favorable for such solutions. In my case, I was able to run a combined FastSpeech + SqueezeWave synthesis on the Jetson Nano platform in 0.5 seconds with PyTorch. The quality was not bad, but could have been better. I will update here if I'm successful with MB-MelGAN.

@manmay-nakhashi

@dathudeptrai @sujeendran It is fast, but the audio quality is not so good. Tested on:
Intel® Core™ i5-6300U CPU

example 1

taskset --cpu-list 1 python3 synthesis.py "Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu"

Speech synthesis time: 1.7220683097839355

soxi out:
Input File : 'results/Fastspeech with Squeezewave vocoder in pytorch , very fast inference on cpu_112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:05.96 = 131328 samples ~ 446.694 CDDA sectors
File Size : 263k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
Approx. 6 sec of audio output generated in 1.72 sec on a single CPU core.

example 2
taskset --cpu-list 0 python3 synthesis.py "How are you"
Speech synthesis time: 0.3431851863861084
soxi out:
Input File : 'results/How are you _112000_squeezewave.wav'
Channels : 1
Sample Rate : 22050
Precision : 16-bit
Duration : 00:00:00.85 = 18688 samples ~ 63.5646 CDDA sectors
File Size : 37.4k
Bit Rate : 353k
Sample Encoding: 16-bit Signed Integer PCM
0.85 sec of audio output generated in 0.34 sec on a single CPU core.

@dathudeptrai
Collaborator

@sujeendran Any update?

@sujeendran
Author

sujeendran commented Jul 15, 2020

@dathudeptrai Hi, I haven't worked on SqueezeWave for a while, as I am working on TFLite C++ inference of FastSpeech and MB-MelGAN. As manmay noted, the quality of SqueezeWave is not as good as MB-MelGAN, but in my tests it is definitely faster: running FastSpeech + SqueezeWave on the Jetson Nano CPU/GPU with PyTorch beats running FastSpeech + MB-MelGAN on CPU/GPU with TensorFlow 2.x. On the Jetson, the TensorFlow 2.x GPU pipeline above takes 2+ seconds (even after warmup) for tiny sentences (the CPU runs faster, but its inference time increases linearly with sentence length), whereas the PyTorch GPU implementation of FastSpeech + SqueezeWave does this in ~0.5 seconds irrespective of sentence length and with no warmup.

@dathudeptrai
Collaborator

@sujeendran On Jetson, I think you can run inference directly by installing our framework, without converting to pb or TFLite; I noticed that running inference with @tf.function and an input_signature needs no warmup, compared with a pb. Overall, I think FastSpeech + MB-MelGAN is fast enough to run in real time in streaming mode. BTW, did you use 8-bit or 32-bit for TFLite? And is the Jetson Nano ARM?
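To illustrate what I mean by @tf.function with an input_signature, here is a minimal sketch (the mb_melgan stand-in and the [1, None, 80] mel shape are placeholders, not the exact framework API):

import tensorflow as tf

# Stand-in for a real vocoder: any callable mapping [1, T, 80] mels to [1, T, 1] samples.
mb_melgan = tf.keras.layers.Conv1D(1, 3, padding="same")

# Tracing once with a fixed input_signature lets variable-length mels reuse the same
# concrete function, so there is no per-shape retrace ("warmup") at inference time.
@tf.function(input_signature=[tf.TensorSpec(shape=[1, None, 80], dtype=tf.float32)])
def vocoder_infer(mel):
    return mb_melgan(mel)

audio = vocoder_infer(tf.random.uniform([1, 250, 80]))  # any length hits the same traced graph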

@sujeendran
Author

@dathudeptrai You are right about using @tf.function directly on Jetson for faster inference, but I was trying to reduce the size taken by the model files and avoid keeping the source code on the target device. GPU inference is still 2+ seconds at least, and I need something below 1 second.
In the case of TFLite, allowing the supported type tf.float16 increased the speed by around 16x, I would say. But I couldn't do the same with FastSpeech: the conversion to TFLite failed when I set the supported type to tf.float16. The Jetson Nano is ARM64. Can you help me out with 8-bit TFLite as you mentioned?
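For reference, the float16 path I'm describing is roughly the standard TFLite post-training float16 conversion (a sketch only; the SavedModel path and output filename are placeholders, not my exact export):

import tensorflow as tf

saved_model_dir = "path/to/mb_melgan_saved_model"  # placeholder path

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_model = converter.convert()

with open("mb_melgan_fp16.tflite", "wb") as f:
    f.write(tflite_model)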

@manmay-nakhashi

manmay-nakhashi commented Jul 20, 2020

@sujeendran Use TFLITE_BUILTINS_INT8 as the op set during TFLite conversion. Also, can you share your C++ inference code?
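In code form that is roughly the following (only a sketch; the SavedModel path and the representative inputs are placeholders, and FastSpeech actually takes several inputs, each of which would need an entry in the yielded list):

import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/fastspeech_saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

def representative_dataset():
    # Feed a handful of realistic inputs so the converter can calibrate activation ranges.
    for _ in range(100):
        yield [np.random.randint(0, 100, size=(1, 50)).astype(np.int32)]  # e.g. token ids

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()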

@sujeendran
Author

sujeendran commented Jul 21, 2020

@manmay-nakhashi Thanks for the tip, I will try that. I'm not at liberty to share the complete C++ code, but I can share a minimal MB-MelGAN inference sample for after the interpreter is loaded. The same pattern can be used for FastSpeech; you just need to set the other input tensor buffers too, and inputtensor will be of type int32_t.
Hope this helps!

// MB-MelGAN
// Input signature  -> [1 -1 80] float32
// Output signature -> [1 -1  1] float32
// Assumes interpreter, inputs (interpreter->inputs()), outputs (interpreter->outputs())
// and currentDim are members initialized when the model was loaded.
void infer(float *inputtensor, int N, float *&output, int &outsize)
{
  // Resize and reallocate tensor buffers only if the input dimension has changed.
  if (currentDim != N)
  {
    const std::vector<int> newDim{1, N, 80};
    interpreter->ResizeInputTensor(0, newDim);
    // Allocate tensor buffers for the new shape.
    interpreter->AllocateTensors();
    currentDim = N;  // remember the new length so we only reallocate when it changes
  }

  // Fill the input buffer with the [1, N, 80] mel spectrogram.
  float *inputptr = interpreter->typed_tensor<float>(inputs[0]);
  memcpy((void *)inputptr, inputtensor, sizeof(float) * N * 80);

  // Run inference
  interpreter->Invoke();

  // Read the output buffer; the output shape is [1, T, 1], so the sample count
  // is the second-to-last dimension.
  TfLiteIntArray *output_dims = interpreter->tensor(outputs[0])->dims;
  int output_size = output_dims->data[output_dims->size - 2];
  printf("Output shape: [1 %d 1]\n", output_size);

  float *outputptr = interpreter->typed_tensor<float>(outputs[0]);
  output = outputptr;
  outsize = output_size;
}

EDIT: I just removed the kTfLiteOk checks on the AllocateTensors and Invoke calls. They were part of an error-check function call I forgot to remove before posting.

@sujeendran
Author

@manmay-nakhashi Can you show your code for the INT8 conversion of the FastSpeech model? I tried several configurations but couldn't get INT8 to work. Did you provide a representative dataset while converting? And how is the inference quality with INT8?
