Fast WaveNet generation using queues (NSynth) (CLA) #669

Merged
jesseengel merged 25 commits into tensorflow:master on Jun 12, 2017

pkmital commented May 23, 2017

Here is an implementation of using queues for the WaveNet decoder in NSynth as described in:
Ramachandran, P., Le Paine, T., Khorrami, P., Babaeizadeh, M., Chang, S., Zhang, Y., … Huang, T. (2017). Fast Generation For Convolutional Autoregressive Models, 1–5.
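As a rough illustration of the queue idea from the paper cited above (not the code in this PR), here is a minimal pure-Python sketch. It assumes scalar channels, filter width 2, tanh activations and made-up weights; `naive_output` and `FastGenerator` are hypothetical names. Each layer keeps a FIFO queue as long as its dilation, so one new sample costs one small update per layer instead of a recomputation over the whole receptive field.

```python
import math
from collections import deque


def naive_output(weights, dilations, inputs):
    """Recompute the newest output from scratch over the whole receptive field."""

    def layer_out(layer, t):
        if t < 0:
            return 0.0  # zero padding before the start of the signal
        if layer < 0:
            return inputs[t]  # below the first layer is the raw input
        w_prev, w_cur = weights[layer]
        past = layer_out(layer - 1, t - dilations[layer])
        cur = layer_out(layer - 1, t)
        return math.tanh(w_prev * past + w_cur * cur)

    return layer_out(len(weights) - 1, len(inputs) - 1)


class FastGenerator(object):
    """Caches each layer's past inputs in a FIFO queue as long as its dilation."""

    def __init__(self, weights, dilations):
        self.weights = weights
        self.queues = [deque([0.0] * d) for d in dilations]

    def step(self, x):
        h = x
        for (w_prev, w_cur), queue in zip(self.weights, self.queues):
            past = queue.popleft()  # this layer's input from `dilation` steps ago
            queue.append(h)         # remember the current input for later steps
            h = math.tanh(w_prev * past + w_cur * h)
        return h


# Hypothetical toy weights (w_prev, w_cur) for three dilated layers.
WEIGHTS = [(0.5, -0.3), (0.8, 0.1), (-0.4, 0.7)]
DILATIONS = [1, 2, 4]
SIGNAL = [math.sin(0.3 * t) for t in range(20)]

generator = FastGenerator(WEIGHTS, DILATIONS)
fast = [generator.step(x) for x in SIGNAL]
naive = [naive_output(WEIGHTS, DILATIONS, SIGNAL[:t + 1])
         for t in range(len(SIGNAL))]
max_diff = max(abs(a - b) for a, b in zip(fast, naive))
```

Both paths should produce the same outputs; the difference is that the naive recursion touches every node in the receptive field for each sample, while the queued version does a fixed amount of work per layer. In real autoregressive generation the sampled output would be fed back in as the next input, which this sketch leaves out.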

This should let you encode using the existing NSynth model and then synthesize from any encoding much faster than the current approach. I get about 100 samples per second with this method (not at all an accurate measurement), which means a 4 second clip at 16 kHz (64,000 samples) can be synthesized in about 10 minutes, which isn't terrible. You can also use this to explore different encodings from interpolation, or encode your own sounds and explore their syntheses, much more easily than before.
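For reference, the timing estimate can be worked through in a few lines. This is only a back-of-the-envelope sketch using the rough 100 samples per second figure quoted above; `synthesis_minutes` is a hypothetical helper name.

```python
def synthesis_minutes(clip_seconds, sample_rate_hz, samples_per_second):
    """Estimate wall-clock minutes needed to generate a clip sample by sample."""
    total_samples = clip_seconds * sample_rate_hz
    return total_samples / float(samples_per_second) / 60.0


total_samples = 4 * 16000                      # 4 second clip at 16 kHz
minutes = synthesis_minutes(4, 16000, 100)     # roughly 10.7 minutes
```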

There is no CLI tool, I'm afraid, but I'm hoping someone else can develop one to make this easier for others! This PR just includes a simple Python module, magenta.models.nsynth.wavenet.generate, with a function synthesize that shows how to use the FastGenerationConfig to load an audio file, encode it, and then synthesize from the encoding.

Lastly, I wasn't familiar with the BUILD system so please let me know if that looks okay.

@jesseengel jesseengel self-requested a review May 23, 2017

@jesseengel jesseengel self-assigned this May 23, 2017


jesseengel left a comment

Awesome submission! Just a first comment, I think a couple of the functions could be moved over to utils.py. I'm going to run this PR through our internal linters and let you know if anything needs to be changed.

You should probably add a py_binary to run the program from the command line. It can be super simple, something like...

# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""DOC STRING HERE
"""
# internal imports
import tensorflow as tf

from magenta.models.nsynth.wavenet.generate import synthesize

FLAGS = tf.app.flags.FLAGS

# There is no sensible default input file, so leave it unset.
tf.app.flags.DEFINE_string("wav_file", None, "Path to input file.")
tf.app.flags.DEFINE_string("out_file", "synthesis.wav", "Path to output file.")
tf.app.flags.DEFINE_string("ckpt_path", "model.ckpt-200000", "Path to checkpoint.")
tf.app.flags.DEFINE_integer("sample_length", 64000, "Input file size in samples.")
tf.app.flags.DEFINE_integer("synth_length", 64000, "Output file size in samples.")
tf.app.flags.DEFINE_string("log", "INFO",
                           "The threshold for what messages will be logged: "
                           "DEBUG, INFO, WARN, ERROR, or FATAL.")


def main(unused_argv=None):
  tf.logging.set_verbosity(FLAGS.log)
  # Pass the parsed flag values through instead of hard-coding them.
  synthesize(wav_file=FLAGS.wav_file,
             ckpt_path=FLAGS.ckpt_path,
             out_file=FLAGS.out_file,
             sample_length=FLAGS.sample_length,
             synth_length=FLAGS.synth_length)


if __name__ == "__main__":
  tf.app.run()
import numpy as np


def inv_mu_law(x, mu=255.0):

jesseengel May 23, 2017

Collaborator

These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.
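To make the suggestion concrete, here is a minimal sketch of what NumPy variants in utils.py might look like. It assumes the standard continuous mu-law companding formulas with no 8-bit quantization step, so it may differ from the inv_mu_law in this PR; `mu_law_numpy` is a hypothetical companion name.

```python
import numpy as np


def mu_law_numpy(x, mu=255.0):
    """Compress audio in [-1, 1] with the standard mu-law curve."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def inv_mu_law_numpy(y, mu=255.0):
    """Expand mu-law compressed values in [-1, 1] back to linear audio."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


audio = np.linspace(-1.0, 1.0, 11)
round_trip = inv_mu_law_numpy(mu_law_numpy(audio))
```

Keeping the `_numpy` suffix makes it clear at the call site which version runs outside the TensorFlow graph.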

return out


def load_audio(wav_file, sample_length=64000):

jesseengel May 23, 2017

Collaborator

These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.

jesseengel and others added some commits May 24, 2017


jesseengel left a comment

LGTM, after my commits ;).

@jesseengel jesseengel merged commit bd5f28b into tensorflow:master Jun 12, 2017

1 check passed

cla/google All necessary CLAs are signed