
Issues with training on audio (not a bug with this repo) #46

Closed
lostmsu opened this issue Jun 7, 2021 · 5 comments

Comments

lostmsu commented Jun 7, 2021

I reimplemented Siren in TensorFlow 2.5. The network easily learns images, but I cannot reproduce the results with audio. On the sample file from the paper, the loss gets stuck at a relatively high value (~0.0242), and the network's output becomes very quiet (max(abs(x)) ~= 0.012). Just curious whether anyone has faced the same issue when reimplementing Siren on their own.

What I've tried so far:

  1. Double-checked omega: it is set to 3000.0 for the input layer and 30.0 for each of the three inner layers
  2. Changed the batch size to the full length of the sample (I had been using randomized batches of 8*1024)
  3. Used float64 to avoid potential numerical overflow/underflow
  4. Checked the network weights: all are finite
  5. Switched to SGD as a more stable optimizer
  6. Increased the network width and added more layers

Essentially, all of the above still led to the same result, with the loss stuck at ~0.0242.
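For anyone comparing reimplementations, here is a minimal numpy sketch of the layer setup described above: the sine forward pass and the initialization scheme from the SIREN paper (first layer uniform in [-1/fan_in, 1/fan_in], later layers in [-sqrt(6/fan_in)/omega, sqrt(6/fan_in)/omega]). The function names are mine, and the bias init is simplified:

```python
import numpy as np

def siren_layer_init(fan_in, fan_out, omega, is_first, rng):
    # SIREN paper init: first layer uniform in [-1/fan_in, 1/fan_in];
    # later layers uniform in [-sqrt(6/fan_in)/omega, sqrt(6/fan_in)/omega].
    if is_first:
        bound = 1.0 / fan_in
    else:
        bound = np.sqrt(6.0 / fan_in) / omega
    w = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = rng.uniform(-bound, bound, size=(fan_out,))  # simplified bias init
    return w, b

def siren_layer(x, w, b, omega):
    # Forward pass: sin(omega * (x @ w + b)), as in the official SIREN code.
    return np.sin(omega * (x @ w + b))
```

This is only a reference point for checking weight magnitudes against a broken run, not a training recipe.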

schreon commented Jun 7, 2021 via email

lostmsu commented Jun 7, 2021

@schreon I used the learning rate from the paper: 5e-5.

But never mind: I figured out why it was not training on audio, and it was entirely my fault: I had set the wrong shuffling mode. In my TensorFlow setup, model.fit was not shuffling the data, so I assume feeding the audio stream sequentially threw the optimizer off course on each pass due to forgetting.
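Independent of framework defaults, the behavior that was missing can be sketched in plain numpy (hypothetical helper, assuming the coordinates and amplitude targets are arrays): draw a fresh random minibatch order on every epoch instead of walking the waveform front to back:

```python
import numpy as np

def shuffled_batches(coords, samples, batch_size, rng):
    # Yield (coords, samples) minibatches in a fresh random order each call,
    # so every optimizer step sees points from across the whole waveform
    # rather than one contiguous chunk of audio.
    order = rng.permutation(len(coords))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield coords[idx], samples[idx]
```

With sequential feeding, each step fits one local stretch of the signal and partly overwrites what was learned for earlier stretches; shuffling removes that correlation between consecutive steps.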

@lostmsu lostmsu closed this as completed Jun 7, 2021
@lostmsu lostmsu reopened this Jun 7, 2021
lostmsu commented Jun 7, 2021

It also appears that you need to scale the input layer's omega for longer audio clips.

@lostmsu lostmsu closed this as completed Jun 7, 2021
schreon commented Jun 7, 2021

Yes. Did you find a good heuristic for scaling omega with differing input sizes yet? I believe we can scale it linearly per domain. For example, if you squeeze an audio clip twice as long as the one in the paper into [-1, 1], you end up with twice the frequency, so doubling omega to omega_input = 6000 would make sense. If this works consistently, we would only have to find one "base omega" for each domain once.
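The linear heuristic above can be written down directly. The base values are the ones from this thread; the scaling rule itself is still the conjecture being discussed:

```python
def scaled_input_omega(duration_s, base_duration_s, base_omega=3000.0):
    # If a clip of base_duration_s maps well to base_omega when squeezed
    # into [-1, 1], a clip k times longer packs k times the frequency
    # content into the same interval, so scale omega by k.
    return base_omega * (duration_s / base_duration_s)
```

For example, a clip twice the reference length would get omega_input = 6000, matching the doubling argument above.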

lostmsu commented Jun 7, 2021

Yes, I noticed that.
I now wonder whether it makes sense to make omega itself a trainable parameter on a log scale.
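The log-scale parameterization can be sketched like this (plain numpy, an illustration of the idea rather than a tested recipe): store log_omega as the trainable value and recover omega = exp(log_omega), which keeps omega positive and makes a fixed optimizer step rescale omega by a fixed factor.

```python
import numpy as np

def sine_activation(x, log_omega):
    # omega parameterized on a log scale: always positive, and a step of
    # size d in log_omega multiplies omega by exp(d).
    omega = np.exp(log_omega)
    return np.sin(omega * x)

def grad_log_omega(x, log_omega):
    # By the chain rule:
    # d/d(log_omega) sin(exp(log_omega) * x) = omega * x * cos(omega * x)
    omega = np.exp(log_omega)
    return omega * x * np.cos(omega * x)
```

In a real TensorFlow model the gradient would come from autodiff; the explicit derivative is here only to show that the update to omega is proportional to omega itself, which is the point of training in log space.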
