
Setting max_length low makes BLEU unexpectedly worse #582

Open
martinpopel opened this issue Feb 13, 2018 · 4 comments
@martinpopel
Contributor

Sentences longer than the parameter max_length are excluded from training; lowering this parameter helps prevent OOM errors and makes it possible to use a higher batch_size, so it is quite useful.
Unfortunately, setting this parameter too low results in low BLEU and degraded learning curves. The graph below shows curves (evaluated on the dev set) for max_length 25, 50, 70, 150, 200 and 400:
[Graph 1gpu-max_length-b1500: dev-set BLEU learning curves for the six max_length values, 1 GPU, batch_size=1500]
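For context, a minimal sketch of how such runs can be configured, assuming the stock transformer_base hparams set (the actual runs behind the graph may have used a different configuration):

```python
# Assumed setup, not necessarily the one behind the graph: start from the
# stock transformer_base hparams and override the two parameters at issue.
from tensor2tensor.models import transformer

hparams = transformer.transformer_base()
hparams.batch_size = 1500  # measured in subword tokens per batch on GPU
hparams.max_length = 70    # training examples longer than this are dropped
```

The same overrides can also be passed to t2t-trainer on the command line via --hparams='batch_size=1500,max_length=70'.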

There are two possible explanations, but I think both of them are false:

  • Setting max_length too low makes the training data smaller. However, with max_length=70 only 2.1% of my training sentences are excluded. Moreover, the "70" BLEU curve starts decreasing after the first hour of training, while processing the whole training data (one epoch) takes more than two days of training.
  • A model trained only on short sentences does not achieve good results when applied to long sentences. However, only 2.2% of the sentences in my dev set are longer than 70 subwords (and only 0.3% are longer than 100 subwords), so this does not seem to be the cause either.

When I increased the batch_size from 1500 to 2000, the results improved: the "25" and "50" curves still lagged behind, but "70" and higher achieved the same result as training without any max_length restriction.
Can someone explain this? Or even fix it if it is a bug?

@noe
Contributor

noe commented Feb 14, 2018

@martinpopel are these numbers from tensor2tensor 1.2.9 or from a more recent version? (I ask in relation to bug #529, as 1.2.9 is the version some of us are working with.)

@martinpopel
Contributor Author

@noe: Yes, these numbers (and the graph) are from 1.2.9.

@mehmedes

@martinpopel How did you find out how many subwords your sentences have?

@martinpopel
Contributor Author

@mehmedes: using this ad-hoc script.
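The linked script itself is not reproduced here; a minimal sketch of the same idea, counting subwords per line with tensor2tensor's SubwordTextEncoder (the vocabulary filename and threshold are placeholders):

```python
# Hypothetical sketch: count how many input sentences exceed a given number
# of subwords under a trained t2t subword vocabulary. Reads text from stdin.
import sys
from tensor2tensor.data_generators.text_encoder import SubwordTextEncoder

encoder = SubwordTextEncoder("vocab.ende.32768")  # placeholder vocab file
threshold = 70

total, too_long = 0, 0
for line in sys.stdin:
    total += 1
    if len(encoder.encode(line.strip())) > threshold:
        too_long += 1

print("%d/%d = %.1f%% sentences longer than %d subwords"
      % (too_long, total, 100.0 * too_long / max(1, total), threshold))
```

Run it as e.g. `python count_subwords.py < train.en` (the script name is hypothetical).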
