Fast WaveNet generation using queues (NSynth) (CLA) #669
Here is an implementation of using queues for the WaveNet decoder in NSynth as described in:
This should let you encode using the existing NSynth model and then synthesize from any encoding much faster than the current approach. I get about 100 samples per second with this method (not at all an accurate measurement), so a 4-second clip at 16 kHz (64,000 samples) takes roughly 10 minutes to synthesize, which isn't terrible. You can also use this to explore encodings produced by interpolation, or to encode your own sounds and explore their syntheses, much more easily than before.
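To illustrate why queues help, here is a minimal sketch of the general idea, not the code in this PR. It assumes a toy stack of width-2 dilated causal convolutions with scalar weights, and it leaves out gating, nonlinearities, and conditioning on the encoding. All names (`naive_stack`, `QueuedStack`, the layer tuples) are hypothetical. Each layer keeps a FIFO queue as long as its dilation, so producing one new sample costs one multiply-add per layer instead of a pass over the whole history.

```python
from collections import deque


def naive_layer(xs, dilation, w_prev, w_cur):
    """Width-2 dilated causal convolution over a full sequence (zero left-padding)."""
    return [
        w_prev * (xs[t - dilation] if t >= dilation else 0.0) + w_cur * xs[t]
        for t in range(len(xs))
    ]


def naive_stack(xs, layers):
    """Run every layer over the entire sequence, as a non-cached generator would."""
    for dilation, w_prev, w_cur in layers:
        xs = naive_layer(xs, dilation, w_prev, w_cur)
    return xs


class QueuedStack:
    """Same toy stack, but each layer caches its past inputs in a FIFO queue."""

    def __init__(self, layers):
        self.layers = layers
        # One queue per layer, pre-filled with zeros to stand in for left padding.
        self.queues = [deque([0.0] * dilation) for dilation, _, _ in layers]

    def step(self, x):
        """Consume one input sample and return one output sample."""
        for (dilation, w_prev, w_cur), queue in zip(self.layers, self.queues):
            past = queue.popleft()  # this layer's input from `dilation` steps ago
            queue.append(x)         # remember the current input for later steps
            x = w_prev * past + w_cur * x
        return x


# Hypothetical layer settings: (dilation, weight on past input, weight on current input).
layers = [(1, 0.5, 1.0), (2, -0.25, 0.75), (4, 0.1, 0.9)]
signal = [float(i % 5) - 2.0 for i in range(16)]

full = naive_stack(signal, layers)
stack = QueuedStack(layers)
streamed = [stack.step(x) for x in signal]

# The sample-by-sample outputs match the full-sequence computation.
assert all(abs(a - b) < 1e-9 for a, b in zip(full, streamed))
```

In real autoregressive generation each output sample becomes the next input, so the non-cached approach would redo work over the receptive field at every step, while the queued version keeps a constant cost per sample.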
There is no CLI tool, I'm afraid, but I'm hoping someone else can develop one to make this easier for others! The PR just includes a simple Python module, magenta.models.nsynth.wavenet.generate, with a function synthesize that shows how to use FastGenerationConfig to load an audio file, encode it, and then synthesize from the encoding.
Lastly, I wasn't familiar with the BUILD system so please let me know if that looks okay.
left a comment
Awesome submission! Just a first comment, I think a couple of the functions could be moved over to utils.py. I'm going to run this PR through our internal linters and let you know if anything needs to be changed.
You should probably add a py_binary to run the program from the command line. It can be super simple, something like...
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""DOC STRING HERE
"""
# internal imports
import tensorflow as tf

# Module path as given in the PR description (magenta.models.nsynth.wavenet.generate).
from magenta.models.nsynth.wavenet.generate import synthesize

FLAGS = tf.app.flags.FLAGS

# No sensible default exists for the input file, so it is left empty.
tf.app.flags.DEFINE_string("wav_file", "", "Path to input file.")
tf.app.flags.DEFINE_string("out_file", "synthesis.wav", "Path to output file.")
tf.app.flags.DEFINE_string("ckpt_path", "model.ckpt-200000",
                           "Path to checkpoint.")
tf.app.flags.DEFINE_integer("sample_length", 64000,
                            "Input file size in samples.")
# Each flag name must be unique; the output length gets its own flag.
tf.app.flags.DEFINE_integer("synth_length", 64000,
                            "Output file size in samples.")
tf.app.flags.DEFINE_string("log", "INFO",
                           "The threshold for what messages will be logged: "
                           "DEBUG, INFO, WARN, ERROR, or FATAL.")


def main(unused_argv=None):
  tf.logging.set_verbosity(FLAGS.log)
  # Pass the flag values through instead of hard-coding them.
  synthesize(wav_file=FLAGS.wav_file,
             ckpt_path=FLAGS.ckpt_path,
             out_file=FLAGS.out_file,
             sample_length=FLAGS.sample_length,
             synth_length=FLAGS.synth_length)


if __name__ == "__main__":
  tf.app.run()