Difference in training accuracy and loss using gradientTape vs model.fit with binary_accuracy: A bug? #35585
Comments
Same problem: #35533
@NLP-ZY is this a known problem? I didn't understand your comment about it being caused by the relu activation. Is this a known bug, and if so, is there a solution?
@amjass12 There are two problems. The first is caused by the relu activation: when I use model.fit, the results are good for acc, loss, val_acc, and val_loss, but when I use GradientTape, the acc and loss are unusual; after about 8 epochs, the predictions for every batch are identical. Second, when I remove the relu activation, the batch predictions become normal and the acc and loss are as expected, but the val_acc and val_loss are still bad. Either way, with the same network, model.fit gives much better val_acc and val_loss than GradientTape.
The network is a 13-class classifier; my code is in #35533.
Hi @NLP-ZY, thank you for the reply and detailed information. This is very interesting. Is this likely a bug? Regarding removing the relu, can you expand on this please? Do you mean you use a different activation function? I would really like to fix this as soon as possible.
Sorry for the slow response, I have been busy recently. Removing relu means using activation=None. I have tried many times, and I think GradientTape may have some bugs; the results from GradientTape are worse than from model.fit.
@amjass12 this is my test code; with the same network, the result of model.fit is better than GradientTape.
@gowthamkpr can you help us figure this out? Is there some implicit operation in model.fit? Why is there such a big difference between model.fit and GradientTape?
Hi @NLP-ZY, no worries! Yes, this is interesting behaviour indeed! I assume specifying activation=None means it uses no activation function and is therefore a linear model? I would really like to understand this behaviour and get to the bottom of it, as I need GradientTape for a shared dense layer!
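For illustration (a minimal sketch, not code from this issue): activation=None leaves the layer as a plain affine transform, so the model outputs raw logits, and a binary cross-entropy loss then needs from_logits=True to interpret them correctly.

```python
import numpy as np
import tensorflow as tf

# A Dense layer with activation=None computes a plain affine transform: y = xW + b.
layer = tf.keras.layers.Dense(1, activation=None)
x = tf.ones((1, 3))
layer(x)  # call once to build the layer so its weights exist
layer.set_weights([np.ones((3, 1), dtype=np.float32),
                   np.zeros((1,), dtype=np.float32)])
print(float(layer(x)[0, 0]))  # 1 + 1 + 1 + 0 = 3.0

# With no sigmoid on the output, the model emits logits, so a binary
# cross-entropy loss must be constructed with from_logits=True:
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```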
I also ran into the same problem. The loss is far larger when I update the weights using tf.GradientTape than when calling model.fit. I created a reproducible example in Colab. Could anyone have a look? It takes about 2 minutes to reproduce the problem.
Do you have any updates on this issue, like what's causing it? Maybe a quick fix?
I have the exact same problem: consistently much worse results using GradientTape as opposed to model.fit, for the exact same network and training. The network I'm using has batch norm, relu, and dropout.
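That combination of layers may matter (an observation about documented TensorFlow behaviour, not a confirmed diagnosis of this issue): Dropout and BatchNormalization change behaviour with the training flag, and model.fit sets training=True during its train step, while a manual loop that calls model(x) without the flag runs them in inference mode.

```python
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((2, 4))

# In inference mode Dropout is the identity: nothing is dropped.
# model.fit runs its train step with training=True, so a manual
# GradientTape loop must pass the flag explicitly:
#     preds = model(x_batch, training=True)
out = drop(x, training=False)
print(bool(tf.reduce_all(tf.equal(out, x))))  # True
```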
I am glad this problem is reproducible; maybe someone can look into it?
Yes, right now because of this I have to give up on TensorFlow 2 and revert to 1.x; I may consider PyTorch as well.
Any update on this? It would be nice to make some progress, as I will soon need this for some unequal-length inputs and merged layers :D Thanks!
Experiencing the same here. Any updates on when to expect a fix?
Please take a look at issue #38596, linked above. I believe this is the underlying issue for why the Keras
@NLP-ZY I can reproduce your error. Gist is here. The reasons behind the discrepancy are mentioned here. In short,
I made those changes and ran your code for 25 epochs; I get 93.24% training accuracy. Please take a look at the gist, verify, and close the issue if this resolved it for you. Thanks!
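The specific changes from the gist aren't quoted in this thread, so the following is an assumption, not the maintainer's code: the fixes most commonly behind this class of model.fit-vs-GradientTape gap are (a) calling the model with training=True inside the tape and (b) adding the model's regularization losses to the loss, both of which model.fit does automatically. A minimal sketch, with placeholder model and data:

```python
import tensorflow as tf

# Placeholder model: a layer with a regularizer so model.losses is non-empty.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(x, y):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)   # (a) training mode, like model.fit
        loss = loss_fn(y, preds)
        loss += tf.add_n(model.losses)    # (b) regularization losses, like model.fit
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

x = tf.random.normal((8, 4))
y = tf.cast(tf.random.uniform((8, 1)) > 0.5, tf.float32)
print(float(train_step(x, y)))
```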
@amjass12 Can you please follow the steps mentioned above and verify whether they resolve the issue for you? If not, can you please post complete standalone code to reproduce your issue? If the issue was resolved, then please close it. Thanks!
@jvishnuvardhan Thanks for your reply. I have tried to use
@jvishnuvardhan I tried setting epochs=100 for GradientTape and epochs=5 for model.fit. I get loss: 0.0030 - acc: 1.0000 at epoch 5 (model.fit), and loss: 1.4832 - acc: 0.9774 at epoch 100 (GradientTape). You can see my results in the gist. Please help us figure out why GradientTape doesn't work as well as model.fit.
Does using tf.gradients instead of GradientTape make a difference in your cases?
@amalroy2016 Thanks, I gave it a try, but it doesn't help.
Many thanks for the message, I will do this and report back ASAP!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
Hi all,
I am running a training loop using GradientTape, which works well; however, I am getting different training accuracy metrics when training with the GradientTape loop versus a straight model.fit call. I apologise if this should be a question for Stack Overflow; however, to the best of my knowledge the parameters are the same and should therefore produce exactly the same results (or at least very close ones). I therefore think there may be a bug, and if anyone can help me elucidate this I would really appreciate it!
I have prepared a sequential model as follows:
For the model.fit method, I fit as follows. This works well and produces the results below (please note 100 epochs is overkill and the model overfits; this is just to keep the same number of epochs as the GradientTape loop, as otherwise there would normally be an early-stopping callback).
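The model/compile/fit snippet from the original report did not survive in this thread. Purely as a stand-in consistent with the surrounding description (sequential model, adam, binary_crossentropy, an 'accuracy' metric, validation data), it might have looked something like the following; every layer size and data shape here is a placeholder, not the author's:

```python
import numpy as np
import tensorflow as tf

# Placeholder data; the original dataset is not shown in the issue.
X_train = np.random.rand(256, 20).astype("float32")
y_train = (np.random.rand(256, 1) > 0.5).astype("float32")
X_val = np.random.rand(64, 20).astype("float32")
y_val = (np.random.rand(64, 1) > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The issue trained for 100 epochs; 10 keeps this sketch quick.
history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                    validation_data=(X_val, y_val), verbose=0)
```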
The model metrics are as follows:
This is the expected behaviour (minus the overfitting). Now, when I create the GradientTape loop as follows, the accuracy metrics are off by about 4-5% over the same 100 epochs, and the reason I suspect a bug is that I believe I am using the appropriate metrics:
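The loop code itself is also missing from this thread. As a sketch of the usual shape of such a loop, with placeholder model and data: note the per-epoch metric resets, since model.fit resets metric state at each epoch boundary, and a manual loop that omits them reports a running average over all epochs rather than a per-epoch figure.

```python
import tensorflow as tf

# Placeholder model and data; the original code is not preserved here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()
train_loss = tf.keras.metrics.Mean()
train_acc = tf.keras.metrics.BinaryAccuracy()

x = tf.random.normal((64, 20))
y = tf.cast(tf.random.uniform((64, 1)) > 0.5, tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

for epoch in range(3):  # the issue ran 100 epochs
    # model.fit clears metric state every epoch; without these resets the
    # printed numbers are running averages over all epochs so far.
    train_loss.reset_state()  # reset_states() in TF <= 2.4
    train_acc.reset_state()
    for xb, yb in dataset:
        with tf.GradientTape() as tape:
            preds = model(xb, training=True)  # training mode, like model.fit
            loss = loss_fn(yb, preds)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        train_loss.update_state(loss)
        train_acc.update_state(yb, preds)
    print(f"Epoch {epoch + 1}, Training loss: {float(train_loss.result()):.4f}, "
          f"Training accuracy: {float(train_acc.result()) * 100:.2f}")
```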
When I run this code, it works fine, and the iterations update the accuracy and loss states; however, the training accuracy is much lower than with the model.fit method, also after running for 100 epochs. The final epoch's printed results (each epoch iterates over every batch):
Epoch 100, Training loss: 0.027735430747270584, Training accuracy: 93.6534423828125
Epoch 100, Training loss: 0.03832387551665306, Training accuracy: 93.67249298095703
Epoch 100, Training loss: 0.035500235855579376, Training accuracy: 93.69097900390625
Validation loss: 0.3204055726528168 Validation acc: 90.36458587646484
Validation loss: 0.32066160440444946 Validation acc: 89.71354675292969
Validation loss: 0.32083287835121155 Validation acc: 90.49479675292969
Validation loss: 0.3209479749202728 Validation acc: 90.10416412353516
Validation loss: 0.32088229060173035 Validation acc: 90.625
As you can see, the training accuracy is ~4-5% lower compared to the model.fit method. The loss records fine, and the validation numbers look pretty much like the validation numbers from the model.fit method.
Additionally, when I plot accuracy and loss for both the model.fit and GradientTape methods, the shapes of the curves look pretty much the same, and they both begin to overfit at similar points! But again, there is a huge discrepancy in the training accuracy.
I have specified the adam optimizer as well as binary_crossentropy loss in both model.fit and GradientTape. For model.fit, when I specify 'accuracy' or 'acc' for metrics, my understanding is that it will call binary_accuracy to calculate the accuracy. So as far as I am aware, the parameters are similar enough that the results should be fairly similar.
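That understanding can be checked against the metric in isolation: with a binary_crossentropy loss and a single sigmoid output, metrics=['accuracy'] resolves to BinaryAccuracy, which thresholds predictions at 0.5 by default and is stateful, accumulating across update_state calls until reset.

```python
import tensorflow as tf

m = tf.keras.metrics.BinaryAccuracy()  # default threshold=0.5

# 0.2 -> 0 (correct), 0.8 -> 1 (correct), 0.4 -> 0 (wrong): 2 of 3 correct.
m.update_state([[0.0], [1.0], [1.0]], [[0.2], [0.8], [0.4]])
print(float(m.result()))  # 0.666...

# Because the metric is stateful, a further update_state call folds new
# samples into the same running total rather than starting fresh.
```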
Additionally, when I call model.compile after training the model with GradientTape, just to confirm evaluation, the results are slightly different again and look more like the model.fit method: model.evaluate now shows a loss and accuracy very similar to the model.fit run when I evaluate on X_train and y_train. This is why I suspect a bug. Interestingly, model.evaluate on the validation data looks similar to the GradientTape loop, which leaves me really confused, as I am therefore unsure of the true training accuracy and loss!
If anyone can help I would really appreciate it. I am happy to provide further code upstream of the model, etc. Again, apologies if this is not a bug, but this really does look like incorrect behaviour to me.