
NMT Learning Curve

Re-iteration of the famous learning curve experiment from Koehn and Knowles (2017)

Figure 1. NMT learning curve (Koehn and Knowles, 2017)

Aim

Neural / deep learning models are known to perform poorly when training data is limited, as demonstrated in Figure 1. In this work, newer neural MT approaches such as the Transformer are compared with non-neural and earlier neural predecessors to see how much improvement has been made.

Setup

Train NMT models at different training corpus sizes and track their performance on a test set (BLEU). Use the same data sets and splits as Koehn and Knowles (2017), and compare the results with theirs.
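
A minimal sketch of this learning-curve loop is shown below. It assumes a parallel corpus in `train.src` / `train.tgt` and a fixed test set; the `train_and_translate()` helper is a hypothetical placeholder for whatever NMT training/decoding pipeline is used, and the corpus sizes are illustrative, not the exact points from the experiment.

```python
import random
import sacrebleu

def subsample(src_path, tgt_path, n, seed=42):
    """Draw a random parallel subsample of n sentence pairs."""
    with open(src_path) as fs, open(tgt_path) as ft:
        pairs = list(zip(fs, ft))
    random.Random(seed).shuffle(pairs)
    return pairs[:n]

def train_and_translate(train_pairs, test_src):
    """Placeholder: train an NMT model on train_pairs and translate test_src."""
    raise NotImplementedError("plug in your Transformer training/decoding here")

sizes = [100_000, 200_000, 400_000, 800_000, 1_600_000]  # example corpus sizes
with open("test.src") as f:
    test_src = [line.strip() for line in f]
with open("test.ref") as f:
    test_ref = [line.strip() for line in f]

for n in sizes:
    train_pairs = subsample("train.src", "train.tgt", n)
    hyps = train_and_translate(train_pairs, test_src)
    bleu = sacrebleu.corpus_bleu(hyps, [test_ref])  # BLEU on the same test set for every size
    print(f"{n}\t{bleu.score:.2f}")
```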

Figure 2. NMT learning curve revisited

Summary:

  • Transformer NMT requires less training data than the RNN NMT used by Koehn and Knowles (2017). See Transformer base in Figure 2.

  • The Transformer base is already consistently better than the prior neural model, and it can be improved further by tuning a few hyperparameters such as batch size and vocabulary size (Transformer varbatch in Figure 2); a sketch of one such adjustment follows this list.
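
For illustration only, one way to adapt the subword vocabulary size to the training corpus size using SentencePiece is sketched below; the specific vocabulary sizes and thresholds are assumptions, not the values used in this repository's experiments.

```python
import sentencepiece as spm

def pick_vocab_size(num_sentences: int) -> int:
    """Smaller corpora get smaller subword vocabularies (assumed heuristic)."""
    if num_sentences < 100_000:
        return 8_000
    if num_sentences < 1_000_000:
        return 16_000
    return 32_000

# Count training sentences and train a BPE model sized accordingly.
num_sentences = sum(1 for _ in open("train.src"))
spm.SentencePieceTrainer.train(
    input="train.src,train.tgt",
    model_prefix="bpe",
    vocab_size=pick_vocab_size(num_sentences),
    model_type="bpe",
)
```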

Takeaways

  • Neural models are parametric models, and parametric models need their hyperparameters to be carefully chosen.

  • To achieve good performance in low-resource / limited training data scenarios, hyperparameter values need to be carefully set.
