Issues with training on audio (not a bug with this repo) #46
Comments
I also experienced instability during training until I switched to a very
small learning rate (1e-5) from start to finish. You then have to train for
a lot of epochs, because training is much slower at such a small learning rate.
Did you try something like that already?
…On Mon, Jun 7, 2021 at 8:47 AM Victor ***@***.***> wrote:
I reimplemented Siren in TensorFlow 2.5. The network easily learns images,
but I cannot reproduce the result with audio. On the sample file from the
paper, the loss gets stuck at a relatively high value (~0.0242), and the
network's output turns very quiet (max(abs(x)) ~= 0.012). Just curious if
anyone has faced the same issue when reimplementing Siren on their own.
What I've tried so far:
1. Double-checked omega: it is set to 3000.0 for the input layer and
30.0, 30.0, 30.0 for the inner layers
2. Changed the batch size to the full length of the sample (I had been
using randomized batches of 8*1024)
3. Used float64 to avoid potential issues with numerical
overflows/underflows
4. Checked the network weights: all are finite numbers
5. Used SGD as a more stable optimizer
6. Increased the network width and added more layers
Essentially, all of the above still led to the same result, with the loss
at ~0.0242
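One way to read the two symptoms above (a stuck loss and a near-silent output) is to compare the stuck loss against the MSE of an all-zero prediction, which equals the mean power of the target signal. The sketch below is only a diagnostic idea, not something taken from the thread: the helper names `mse`, `silence_baseline` and `peak` are hypothetical, and the target is a synthetic sine standing in for the real audio file.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def silence_baseline(target):
    """MSE of predicting 0.0 everywhere, i.e. the mean power of the target."""
    return mse([0.0] * len(target), target)


def peak(signal):
    """Largest absolute sample value, the max(abs(x)) quoted in the post."""
    return max(abs(s) for s in signal)


# Assumption: a synthetic stand-in for the audio clip (440 Hz sine,
# amplitude 0.5, one second at 8 kHz). Replace with the real samples.
target = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

# Assumption: a hypothetical near-silent network output.
quiet_pred = [0.01 * t for t in target]

baseline = silence_baseline(target)
stuck_loss = mse(quiet_pred, target)

print("silence baseline:", baseline)
print("loss of near-silent prediction:", stuck_loss)
print("peak of prediction:", peak(quiet_pred))
```

If the loss reported during training sits very close to the silence baseline of the actual clip, the network is effectively predicting zero rather than fitting the waveform, which points at the data pipeline or the input scaling rather than at numerical precision.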
@schreon I used the learning rate from the paper: 5e-5. But NVM, I figured out why it was not training on audio, and it was completely my fault: I set the wrong shuffling mode. In TensorFlow when you do …
It also appears that you need to scale …
Yes. Did you find a good heuristic for scaling omega with differing input sizes yet? I believe we can scale it linearly per domain. For example, if you squeeze an audio clip twice as long as the one in the paper into [-1, 1], you end up with double the frequency, hence doubling omega to …
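The linear-scaling idea in this comment can be sketched as follows. This is a minimal illustration of the proposed heuristic, not a verified rule: `scale_omega` and `cycles_per_unit_coordinate` are hypothetical helpers, and the reference clip length is a made-up placeholder, since the thread does not state the length of the paper's sample.

```python
def scale_omega(ref_omega, ref_num_samples, num_samples):
    """Scale the first-layer omega linearly with clip length.

    ref_omega is the value that worked for a clip of ref_num_samples
    samples; both are assumptions supplied by the caller.
    """
    if ref_num_samples <= 0 or num_samples <= 0:
        raise ValueError("sample counts must be positive")
    return ref_omega * num_samples / ref_num_samples


def cycles_per_unit_coordinate(freq_hz, sample_rate, num_samples):
    """Cycles of a freq_hz tone per unit of input coordinate after the
    whole clip is mapped onto the interval [-1, 1] (length 2)."""
    duration_s = num_samples / sample_rate
    return freq_hz * duration_s / 2.0


# Placeholder numbers: a reference clip of 8000 samples and one twice as long.
REF_SAMPLES = 8000
short = cycles_per_unit_coordinate(440.0, 8000, REF_SAMPLES)
long_ = cycles_per_unit_coordinate(440.0, 8000, 2 * REF_SAMPLES)

# The same tone looks twice as "fast" in coordinate space for the longer clip.
print("frequency ratio in coordinate space:", long_ / short)

# Input-layer omega from the original post (3000.0), rescaled for the longer clip.
print("rescaled omega:", scale_omega(3000.0, REF_SAMPLES, 2 * REF_SAMPLES))
```

The second helper is there only to show why the heuristic is plausible: the number of oscillations the network has to represent per unit of input grows in proportion to the clip length when the sample rate is fixed.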
Yes, I noticed that.