
tensorflow.keras behaves wrongly in autoencoder setup compared to keras #39432

Closed
fedxa opened this issue May 11, 2020 · 13 comments

Labels: comp:keras (Keras related issues), stale (marks the issue/PR stale, to be closed automatically if no activity), stat:awaiting response (awaiting response from author), type:feature (feature requests)

fedxa commented May 11, 2020

An attempt to recreate the simplest autoencoder works properly if keras (2.3.1) is used, but fails to converge if tensorflow.keras (2.3.0-tf) is used instead.

The basic implementation can be found at https://gist.github.com/fedxa/45eb1a412964ddf19820fff347c5b2de

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# from keras.layers import Input, Dense
# from keras.models import Model

# 784 = 28 * 28 flattened MNIST pixels, compressed to a 32-dimensional code
input_img = Input(shape=(784,))
encoded = Dense(32, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# Load MNIST, scale to [0, 1], and flatten each image to a 784-vector
from tensorflow.keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

autoencoder.fit(x_train, x_train,
                epochs=20,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

If standalone keras 2.3.1 is used, the example converges quickly; if tensorflow.keras 2.3.0-tf is used, no convergence is observed (the loss function stays at ~0.6 throughout and the autoencoder encodes only noise).

The problem is present in Google Colab, and on Linux and macOS, with TensorFlow versions 2.0, 2.1, and 2.2.

iobtl commented May 12, 2020

It seems like the problem may lie with optimizer='adadelta'.
Running the code example provided above with the following modification seems to run fine:

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

Try using this instead for now.

@Saduf2019 (Contributor)

@fedxa
Please update as per the comment above.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label May 12, 2020
fedxa (Author) commented May 12, 2020

@Saduf2019 Oh, yes! The difference is in the default learning rate for the Adadelta optimizer in keras and tensorflow.keras.

The keras version has the defaults:

class Adadelta(Optimizer):
    def __init__(self, learning_rate=1.0, rho=0.95, **kwargs):

while the tensorflow.keras version has

class Adadelta(optimizer_v2.OptimizerV2):
    def __init__(self,
                 learning_rate=0.001,
                 rho=0.95,
                 epsilon=1e-7,
                 name='Adadelta',
                 **kwargs):

Initialising the optimiser explicitly with the same learning rate makes the example behave in the same way for both keras versions.
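
For example, a minimal sketch of the explicit initialisation (assuming the tf.keras 2.x optimizer API shown above):

from tensorflow.keras.optimizers import Adadelta

# Match standalone keras's default of learning_rate=1.0 explicitly,
# instead of relying on tf.keras's default of 0.001
autoencoder.compile(optimizer=Adadelta(learning_rate=1.0),
                    loss='binary_crossentropy')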

Seems it is a feature, not a bug.

@Saduf2019 Saduf2019 added type:feature Feature requests and removed stat:awaiting response Status - Awaiting response from author type:bug Bug labels May 12, 2020
@Saduf2019 Saduf2019 assigned gowthamkpr and unassigned Saduf2019 May 12, 2020
@gowthamkpr

@fedxa This has been clearly mentioned in the docs here:

"learning_rate: A Tensor, floating point value, or a schedule that is a tf.keras.optimizers.schedules.LearningRateSchedule. The learning rate. To match the exact form in the original paper, use 1.0."

Moreover, the learning rate of all the optimizers has been set to 0.001, so this is expected behavior.

@gowthamkpr gowthamkpr added the stat:awaiting response Status - Awaiting response from author label May 12, 2020
fedxa (Author) commented May 12, 2020

@gowthamkpr Yes, for sure, missed this bit!
By the way, for the documentation writers -- it may be sensible to change, on the mentioned page,

x_t := x_{t-1} + \Delta x_t

to

x_t := x_{t-1} + \lambda \Delta x_t

with \lambda being the learning rate.

iobtl commented May 12, 2020

@fedxa The algorithm is correct. The learning rate is contained within the term \Delta x_t, since it employs an adaptive learning rate. Look at the Adadelta paper for more information.

fedxa (Author) commented May 13, 2020

@iobtl Hmm, it seems the actual code in tensorflow/python/keras/optimizers.py#L448 does not really correspond to Algorithm 1 in the paper, but is additionally multiplied by the learning rate. Cf. also the line-by-line comparison in https://stackoverflow.com/questions/56730888/what-is-the-learning-rate-parameter-in-adadelta-optimiser-for-in-keras
So it seems there is an additional learning rate introduced in the code, on top of the "automatic" learning rate RMS[Δx]_{t-1}/RMS[g]_t computed by the algorithm as given in the paper and in the documentation.
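
To make the difference concrete, here is a minimal NumPy sketch of a single Adadelta step (an illustration of the update form, not the actual TensorFlow code). The paper's Algorithm 1 applies x_t := x_{t-1} + \Delta x_t directly (effectively lr = 1.0), while the Keras implementations scale the update by an extra learning-rate factor:

import numpy as np

def adadelta_step(x, grad, acc_grad, acc_delta,
                  lr=1.0, rho=0.95, eps=1e-7):
    # Accumulate the squared gradient: E[g^2]_t
    acc_grad = rho * acc_grad + (1 - rho) * grad ** 2
    # "Automatic" step size from the paper: RMS[Δx]_{t-1} / RMS[g]_t
    delta = -np.sqrt(acc_delta + eps) / np.sqrt(acc_grad + eps) * grad
    # Accumulate the squared updates: E[Δx^2]_t
    acc_delta = rho * acc_delta + (1 - rho) * delta ** 2
    # Paper: x = x + delta; the Keras code additionally multiplies by lr,
    # so lr = 1.0 matches the paper exactly
    x = x + lr * delta
    return x, acc_grad, acc_delta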

@gowthamkpr gowthamkpr assigned omalleyt12 and unassigned gowthamkpr May 14, 2020
@gowthamkpr gowthamkpr added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels May 14, 2020
@short-circuitt

Similar observation here. I believe it is related to the tf.keras.losses binary_crossentropy implementation.

I have run 3 versions of the Keras VAE tutorial in Colab:
https://keras.io/examples/variational_autoencoder/

Version 1 using Keras
Version 2 using tf.keras
Version 3 using tf.keras + a non-standard implementation of binary_crossentropy

Versions 1 and 3 converge; version 2 does not.

Results:
Version 1:
loss: 156.5520 - val_loss: 156.6273

Version 2:
loss: 460.1004 - val_loss: 459.4628

Version 3:
loss: 157.9199 - val_loss: 158.2236

Colab notebook:
https://gist.github.com/short-circuitt/3f2a004f6726d03f06785b9d2accfa23
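
(For reference, a hypothetical sketch of the kind of reduction difference that can produce loss-scale gaps like the above; this is an assumption for illustration, not the code from the linked notebook. tf.keras.losses.binary_crossentropy averages over the last axis, whereas a VAE reconstruction term is typically summed over all 784 pixels:)

import tensorflow as tf

def summed_binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Hypothetical "non-standard" variant: per-pixel binary crossentropy
    # summed over the 784 pixels instead of averaged over them
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    bce = -(y_true * tf.math.log(y_pred)
            + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    return tf.reduce_sum(bce, axis=-1)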

fedxa (Author) commented May 29, 2020

Looks like a separate problem to me... Mine was clearly the initial learning rate of the Adadelta optimizer; yours looks like differing binary_crossentropy implementations.

@short-circuitt

Hi, thanks for your message. It does look like a separate problem, so I will open a different issue for it.

@jvishnuvardhan jvishnuvardhan self-assigned this Mar 1, 2022
@jvishnuvardhan (Contributor)

@fedxa This is a stale issue. The Keras code has since moved to the separate repo keras-team/keras, and the code is the same whether it is imported from keras or tf.keras.

Can we close this issue? Thanks!

@jvishnuvardhan jvishnuvardhan added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Mar 1, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Mar 8, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
