
Issues with training on audio (not a bug with this repo) #46

Closed
lostmsu opened this issue Jun 7, 2021 · 5 comments

Comments

lostmsu commented Jun 7, 2021

I reimplemented Siren in TensorFlow 2.5. The network easily learns images, but I cannot reproduce the results with audio. On the sample file from the paper, the loss gets stuck at a relatively high value (~0.0242), and the network's output becomes very quiet (max(abs(x)) ~= 0.012). Just curious whether anyone has faced the same issue when reimplementing Siren on their own.

What I've tried so far:

  1. Double-checked omega: it is set to 3000.0 for the input layer and 30.0 for each of the three inner layers
  2. Changed the batch size to the full length of the sample (I had been using randomized batches of 8*1024)
  3. Used float64 to avoid potential numerical overflow/underflow
  4. Checked the network weights: all are finite
  5. Switched to SGD as a more stable optimizer
  6. Increased the network width and added more layers

Essentially, all of the above still led to the same result, with the loss stuck at ~0.0242.
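For anyone comparing reimplementations, here is a minimal numpy sketch of the layer setup described above: the sine forward pass and the initialization scheme from the SIREN paper (first layer uniform in [-1/fan_in, 1/fan_in], later layers in [-sqrt(6/fan_in)/omega, sqrt(6/fan_in)/omega]). The function names are mine, and the bias init is simplified:

```python
import numpy as np

def siren_layer_init(fan_in, fan_out, omega, is_first, rng):
    # SIREN paper init: first layer uniform in [-1/fan_in, 1/fan_in];
    # later layers uniform in [-sqrt(6/fan_in)/omega, sqrt(6/fan_in)/omega].
    if is_first:
        bound = 1.0 / fan_in
    else:
        bound = np.sqrt(6.0 / fan_in) / omega
    w = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = rng.uniform(-bound, bound, size=(fan_out,))  # simplified bias init
    return w, b

def siren_layer(x, w, b, omega):
    # Forward pass: sin(omega * (x @ w + b)), as in the official SIREN code.
    return np.sin(omega * (x @ w + b))
```

This is only a reference point for checking weight magnitudes against a broken run, not a training recipe.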

schreon commented Jun 7, 2021 via email

lostmsu commented Jun 7, 2021

@schreon I used the learning rate from the paper: 5e-5.

But never mind: I figured out why it was not training on audio, and it was entirely my fault: I had set the wrong shuffling mode. In my TensorFlow setup, model.fit was not shuffling the data, so I assume feeding the audio stream sequentially threw the optimizer off course on each pass due to forgetting.
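Independent of framework defaults, the behavior that was missing can be sketched in plain numpy (hypothetical helper, assuming the coordinates and amplitude targets are arrays): draw a fresh random minibatch order on every epoch instead of walking the waveform front to back:

```python
import numpy as np

def shuffled_batches(coords, samples, batch_size, rng):
    # Yield (coords, samples) minibatches in a fresh random order each call,
    # so every optimizer step sees points from across the whole waveform
    # rather than one contiguous chunk of audio.
    order = rng.permutation(len(coords))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield coords[idx], samples[idx]
```

With sequential feeding, each step fits one local stretch of the signal and partly overwrites what was learned for earlier stretches; shuffling removes that correlation between consecutive steps.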

@lostmsu lostmsu closed this as completed Jun 7, 2021
@lostmsu lostmsu reopened this Jun 7, 2021
lostmsu commented Jun 7, 2021

It also appears that you need to scale the input layer's omega for longer audio clips.

@lostmsu lostmsu closed this as completed Jun 7, 2021
schreon commented Jun 7, 2021

Yes. Did you find a good heuristic for scaling omega with differing input sizes yet? I believe we can scale it linearly per domain. For example, if you squeeze an audio clip twice as long as the one in the paper into [-1, 1], you end up with twice the frequency, so doubling omega to omega_input = 6000 would make sense. If this works consistently, we would only have to find one "base omega" for each domain once.
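The linear heuristic above can be written down directly. The base values are the ones from this thread; the scaling rule itself is still the conjecture being discussed:

```python
def scaled_input_omega(duration_s, base_duration_s, base_omega=3000.0):
    # If a clip of base_duration_s maps well to base_omega when squeezed
    # into [-1, 1], a clip k times longer packs k times the frequency
    # content into the same interval, so scale omega by k.
    return base_omega * (duration_s / base_duration_s)
```

For example, a clip twice the reference length would get omega_input = 6000, matching the doubling argument above.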

lostmsu commented Jun 7, 2021

Yes, I noticed that.
I now wonder whether it makes sense to make omega itself a trainable parameter on a log scale.
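The log-scale parameterization can be sketched like this (plain numpy, an illustration of the idea rather than a tested recipe): store log_omega as the trainable value and recover omega = exp(log_omega), which keeps omega positive and makes a fixed optimizer step rescale omega by a fixed factor.

```python
import numpy as np

def sine_activation(x, log_omega):
    # omega parameterized on a log scale: always positive, and a step of
    # size d in log_omega multiplies omega by exp(d).
    omega = np.exp(log_omega)
    return np.sin(omega * x)

def grad_log_omega(x, log_omega):
    # By the chain rule:
    # d/d(log_omega) sin(exp(log_omega) * x) = omega * x * cos(omega * x)
    omega = np.exp(log_omega)
    return omega * x * np.cos(omega * x)
```

In a real TensorFlow model the gradient would come from autodiff; the explicit derivative is here only to show that the update to omega is proportional to omega itself, which is the point of training in log space.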
