Wavenet model for audio generation #624

akshaan · 2020-07-06T05:08:04Z

This implements a first-cut WaveNet model for the audio generation task on the VCTK dataset. It uses the Python + TensorFlow version here as a reference implementation.

At the moment only the training loop works, and there are some limitations / missing features that still need to be addressed:

Add PaddingFIFOQueue to correctly produce batches from data files
Add generation step to generate new audio samples
Add preprocessing function to trim silent sections of audio
Make model configurable with command line args
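
As a rough illustration of two of the items above (silence trimming and padded batching), here is a minimal pure-Python sketch. It is not the code in this PR or the reference implementation: the function names `trim_silence` and `pad_batch`, the amplitude threshold, and the zero pad value are all assumptions made for the example. It only shows the general idea of dropping quiet samples at the edges of a clip and right-padding variable-length clips to a common length, which is the kind of batching a padding queue provides.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below `threshold`.

    Hypothetical helper: a simple amplitude test, not an energy-based trimmer.
    """
    start = 0
    end = len(samples)
    # Advance past the quiet samples at the start of the clip.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Walk back past the quiet samples at the end of the clip.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


def pad_batch(clips: Sequence[Sequence[float]], pad_value: float = 0.0) -> List[List[float]]:
    """Right-pad every clip to the length of the longest clip in the batch.

    Hypothetical helper: pads with `pad_value` so all rows have equal length.
    """
    max_len = max((len(clip) for clip in clips), default=0)
    return [list(clip) + [pad_value] * (max_len - len(clip)) for clip in clips]


if __name__ == "__main__":
    clips = [
        trim_silence([0.0, 0.0, 0.5, -0.3, 0.0]),
        trim_silence([0.2, 0.0, 0.4]),
    ]
    print(pad_batch(clips))
```

Note that only the edges are trimmed; a quiet sample in the middle of a clip is kept, so the timing of the remaining audio is unchanged.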

Additionally, I have some ideas for future improvements / enhancements that I can include as follow up PRs:

Use librosa instead of the simplistic pydub library in Python for reading and processing audio. Initial attempts resulted in the following error:

Assertion failed: (PassInf && "Expected all immutable passes to be initialized"), function addImmutablePass, file /Users/buildbot/miniconda3/conda-bld/llvmdev_1556270736866/work/lib/IR/LegacyPassManager.cpp, line 849.
Abort trap: 6

Possibly use Swift-native audio libraries like AVFoundation? Initial attempts at this caused linker issues:

dyld: Symbol not found $<symbol-name>
expected in <path-to-swift-toolchain>/AVFoundation.swift

Add XLA support
Add support for more tasks and datasets
Add scalar input mode (non one-hot inputs)
Add L2 regularization for weights
Factor out some of the dataset and audio reading code into common utils
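
To make the "scalar input mode" item concrete, the sketch below contrasts a one-hot input with a scalar input for a single audio sample. It assumes 256-level mu-law quantization as described in the WaveNet paper; whether this PR quantizes the same way is not stated here. The names `mu_law_encode`, `one_hot_input`, and `scalar_input` are hypothetical, and feeding the clipped raw sample in scalar mode is likewise an assumption for illustration.

```python
import math
from typing import List

QUANTIZATION_CHANNELS = 256  # assumption: 8-bit mu-law, as in the WaveNet paper


def mu_law_encode(sample: float, channels: int = QUANTIZATION_CHANNELS) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, channels - 1]."""
    mu = channels - 1
    clipped = max(-1.0, min(1.0, sample))
    # Mu-law companding: sign(x) * ln(1 + mu * |x|) / ln(1 + mu).
    companded = math.copysign(
        math.log1p(mu * abs(clipped)) / math.log1p(mu), clipped
    )
    # Shift from [-1, 1] to [0, mu] and round to the nearest class.
    return int((companded + 1.0) / 2.0 * mu + 0.5)


def one_hot_input(sample: float, channels: int = QUANTIZATION_CHANNELS) -> List[float]:
    """One-hot mode: a `channels`-wide vector with a single 1.0 at the class index."""
    vector = [0.0] * channels
    vector[mu_law_encode(sample, channels)] = 1.0
    return vector


def scalar_input(sample: float) -> List[float]:
    """Scalar mode (assumed form): one input channel holding the clipped sample."""
    return [max(-1.0, min(1.0, sample))]


if __name__ == "__main__":
    print(mu_law_encode(0.0), len(one_hot_input(0.0)), scalar_input(0.25))
```

The practical difference is the width of the first layer's input: 256 channels per timestep in one-hot mode versus a single channel in scalar mode.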

I'd love to get some early feedback on this!

BradLarson · 2020-07-08T17:24:02Z

This looks great, what kind of feedback were you looking for in the draft stage? The external prerequisites are understandable, and it doesn't introduce any new build dependencies, so that looks good to me.

The model itself looks solid, although I haven't tried it yet. Would you like a more thorough review of it now, or wait until you've completed work on the model?

Thanks for working on this, it would be a nice model to have in our examples.

akshaan · 2020-07-08T20:35:18Z

@BradLarson Just wanted to make sure the general direction looks reasonable. Detailed review can definitely wait till after pending work is done. Thanks!

BradLarson · 2020-10-28T14:49:11Z

It's been a little while, so I'm just checking back in on this model. It would be great to have in, if you still had the time to drive it to completion. If not, I totally understand.

akshaan · 2020-11-11T15:25:06Z

> It's been a little while, so I'm just checking back in on this model. It would be great to have in, if you still had the time to drive it to completion. If not, I totally understand.

Hey Brad, sorry about the delay here. I'll be picking this back up soon. Once I get the PaddingFIFOQueue ops into swift-apis, I'll run some tests on this and it should be ready for review.

BradLarson · 2020-12-16T17:45:18Z

We're doing another pass on outstanding pull requests, so I just wanted to confirm that you still were planning to move forward with this. Will this need the swift-apis additions first, in order to make this viable?

akshaan · 2020-12-19T00:11:52Z

Hi @BradLarson, sorry for the delay. I've been caught up with other stuff. I definitely plan to return to this at some point, but I might not get to it soon. It will require some changes in swift-apis first, unless the PaddingFIFOQueue has already been added there. I'm happy to close this out and re-open it later if that's more convenient.

akshaan · 2021-01-27T16:52:10Z

Closing this out since it's outdated. I'll pull in updates from swift-apis etc. and reopen shortly!

akshaan changed the title ~~First cut Wavenet model~~ Wavenet model for audio generation Jul 6, 2020

First cut Wavenet model

1edddf0

akshaan force-pushed the wavenet branch from 2cddcde to 1edddf0 Compare July 6, 2020 05:18

marcrasi requested a review from BradLarson July 8, 2020 17:13

BradLarson changed the base branch from master to main December 1, 2020 16:04

akshaan closed this Jan 27, 2021
