Wavenet model for audio generation #624

Closed

@akshaan akshaan commented Jul 6, 2020

This implements a first-cut Wavenet model for the audio generation task on the VCTK dataset. This uses the Python + TensorFlow version here as a reference implementation.

At the moment, only the training loop works, and there are some limitations / missing features that need to be addressed:

  • Add PaddingFIFOQueue to correctly produce batches from data files
  • Add generation step to generate new audio samples
  • Add preprocessing function to trim silent sections of audio
  • Make model configurable with command line args
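
As a rough illustration of the batching and silence-trimming items above, here is a minimal sketch in plain Python. It is not code from this PR, and the names `trim_silence` and `pad_batch` are hypothetical. It assumes silence can be detected with a simple amplitude threshold and that batches are formed by right-padding clips with zeros; the reference implementation may use different criteria.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    # Indices of every sample that counts as "loud" under the assumed threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is silent
    return list(samples[loud[0]:loud[-1] + 1])


def pad_batch(clips: Sequence[Sequence[float]], pad_value: float = 0.0) -> List[List[float]]:
    """Right-pad every clip to the length of the longest clip in the batch."""
    longest = max((len(c) for c in clips), default=0)
    return [list(c) + [pad_value] * (longest - len(c)) for c in clips]


# Two toy "clips" of different lengths with quiet edges.
clips = [
    [0.0, 0.001, 0.5, -0.3, 0.002],
    [0.2, 0.0, 0.4],
]
batch = pad_batch([trim_silence(c) for c in clips])
print(batch)
```

Trimming only the edges keeps quiet samples inside a clip, and padding after trimming means every row in the batch has the same length, which is roughly what a padding queue would provide.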

Additionally, I have some ideas for future improvements / enhancements that I can include as follow-up PRs:

  • Use librosa instead of the simplistic pydub library in Python for reading and processing audio. Initial attempts resulted in
    the following error:
Assertion failed: (PassInf && "Expected all immutable passes to be initialized"), function addImmutablePass, file /Users/buildbot/miniconda3/conda-bld/llvmdev_1556270736866/work/lib/IR/LegacyPassManager.cpp, line 849.
Abort trap: 6
  • Possibly use Swift-native audio libraries like AVFoundation? Initial attempts at this caused linker issues:
dyld: Symbol not found $<symbol-name>
expected in <path-to-swift-toolchain>/AVFoundation.swift
  • Add XLA support
  • Add support for more tasks and datasets
  • Add scalar input mode (non one-hot inputs)
  • Add L2 regularization for weights
  • Factor out some of the dataset and audio reading code into common utils
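
To make the scalar input item above concrete, here is a small plain-Python sketch of the two input encodings. It is not code from this PR. It assumes 256-level mu-law quantization as described in the WaveNet paper, and the helper names (`mu_law_encode`, `one_hot_input`, `scalar_input`) are hypothetical.

```python
import math
from typing import List


def mu_law_encode(sample: float, channels: int = 256) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, channels - 1]."""
    mu = channels - 1
    clipped = max(-1.0, min(1.0, sample))
    # Mu-law companding: compress the magnitude logarithmically, keep the sign.
    magnitude = math.log1p(mu * abs(clipped)) / math.log1p(mu)
    compressed = math.copysign(magnitude, clipped)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class.
    return int((compressed + 1) / 2 * mu + 0.5)


def one_hot_input(sample: float, channels: int = 256) -> List[float]:
    """One-hot mode: each time step is a `channels`-wide indicator vector."""
    vector = [0.0] * channels
    vector[mu_law_encode(sample, channels)] = 1.0
    return vector


def scalar_input(sample: float) -> List[float]:
    """Scalar mode: each time step is the raw sample in a single channel."""
    return [sample]


print(len(one_hot_input(0.25)), len(scalar_input(0.25)))
```

In one-hot mode the first layer sees one channel per quantization level, while in scalar mode it sees a single channel per time step, so the input layer's shape would need to change accordingly.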

I'd love to get some early feedback on this!

@akshaan akshaan changed the title from "First cut Wavenet model" to "Wavenet model for audio generation" Jul 6, 2020
@BradLarson
Contributor

This looks great, what kind of feedback were you looking for in the draft stage? The external prerequisites are understandable, and it doesn't introduce any new build dependencies, so that looks good to me.

The model itself looks solid, although I haven't tried it yet. Would you like a more thorough review of it now, or wait until you've completed work on the model?

Thanks for working on this, it would be a nice model to have in our examples.

@akshaan
Author

akshaan commented Jul 8, 2020

@BradLarson Just wanted to make sure the general direction looks reasonable. Detailed review can definitely wait till after pending work is done. Thanks!

@BradLarson
Contributor

It's been a little while, so I'm just checking back in on this model. It would be great to have in, if you still had the time to drive it to completion. If not, I totally understand.

@akshaan
Author

akshaan commented Nov 11, 2020

> It's been a little while, so I'm just checking back in on this model. It would be great to have in, if you still had the time to drive it to completion. If not, I totally understand.

Hey Brad, sorry about the delay here. I'll be picking this back up soon. Once I get the PaddingFIFOQueue ops into swift-apis, I'll run some tests on this and it should be ready for review.

@BradLarson BradLarson changed the base branch from master to main December 1, 2020 16:04
@BradLarson
Contributor

We're doing another pass on outstanding pull requests, so I just wanted to confirm that you still were planning to move forward with this. Will this need the swift-apis additions first, in order to make this viable?

@akshaan
Author

akshaan commented Dec 19, 2020

Hi @BradLarson, sorry for the delay. I've been caught up with other stuff. I definitely plan to return to this at some point, but I might not get to it soon. It will require some changes in swift-apis first, unless the PaddingFIFOQueue has already been added there. I'm happy to close this out and re-open it later if that's more convenient.

@akshaan
Author

akshaan commented Jan 27, 2021

Closing this out since it's outdated. I'll pull in updates from swift-apis etc. and reopen shortly!

@akshaan akshaan closed this Jan 27, 2021