Unable to reproduce WMT En2De results #539
Read this: #444
There has been a problem since v1.3.0: see #529. It seems it can be worked around by increasing the warmup steps (and training long enough; see the sketch below), but in any case the optimal hyperparameters changed in 1.3.0.
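For anyone trying this workaround, a minimal sketch of raising the warmup steps via an hparams override, assuming a recent t2t-trainer; the warmup value, paths, and step count are illustrative placeholders, not a known-good setting:

```bash
# Hypothetical workaround: raise learning_rate_warmup_steps above the default
# via an --hparams override and train longer. All values are placeholders.
t2t-trainer \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --hparams='learning_rate_warmup_steps=16000' \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=500000
```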
Will close in favor of #529. Let's continue the discussion there.
I'm reproducing the WMT14 EN-DE experiment.
Here is my training result:
Here is my decoding result:
I've referred to issue #317 and found that my training loss at step 250000 is larger than the 1.56711 reported there. Is there anything wrong with my training process?

My Environment:
Using 4 K40 GPU cards
Training code:
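(The exact command is not shown here. For context, a typical t2t-trainer invocation for this problem on 4 GPUs would look roughly like the sketch below; the hparams set, step count, and paths are assumptions, not the command actually used.)

```bash
# Illustrative only: a common t2t-trainer setup for WMT14 EN-DE on 4 GPUs.
t2t-trainer \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4 \
  --train_steps=250000
```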
Testing code:
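(Likewise, a sketch of how decoding is usually run with t2t-decoder; the beam size, alpha, and file names are placeholder assumptions:)

```bash
# Illustrative only: decode a newstest source file with beam search.
t2t-decoder \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --decode_hparams='beam_size=4,alpha=0.6' \
  --decode_from_file=newstest2014.en \
  --decode_to_file=translation.de
```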