Poor results when training Estimator with parallel data and TER scores #92

Open

lluisg opened this issue Mar 1, 2021 · 1 comment
Labels: question (Further information is requested)

lluisg commented Mar 1, 2021

Hi!

We are trying to train an EN-FR sentence-level QE model by using a predictor-estimator model with parallel data.

We are using OpenKiwi 0.1.3 to train it.

The procedure was as follows:

  1. Train the Predictor using parallel data (EN-FR)
  2. Train the Estimator on top of the Predictor from step 1, using the following data (as suggested in the thread Is it possible to train with just src, mt, ter? #46):
    a. the English source sentences
    b. the French translations produced by a pretrained MT model
    c. the TER score for each translated French sentence
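For reference, the sentence scores in step 2c can be approximated with a simplified HTER-style metric: token-level edit distance divided by reference length. This is only a sketch (it ignores TER's block-shift operation; a tool such as tercom or sacrebleu would normally be used to produce the real scores):

```python
def token_edit_distance(hyp, ref):
    """Classic dynamic-programming Levenshtein distance over token lists."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all hypothesis tokens
    for j in range(n + 1):
        d[0][j] = j  # inserting all reference tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def hter(hypothesis, reference):
    """Edit operations divided by reference length (shifts not modeled)."""
    hyp, ref = hypothesis.split(), reference.split()
    return token_edit_distance(hyp, ref) / max(len(ref), 1)
```

A perfect translation scores 0.0, and one substitution in a three-token reference scores 1/3.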

The results obtained were a Pearson correlation of 0.32 and a Spearman correlation of 0.36, which are below the 0.5018 and 0.5566 reported in the OpenKiwi paper (https://www.aclweb.org/anthology/P19-3020.pdf).
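For clarity on how these numbers are computed, here is a plain-Python sketch of the two correlations between predicted and gold TER scores (in practice `scipy.stats.pearsonr` and `spearmanr` would be used; ties in the Spearman ranking are ignored for brevity):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks (ties ignored)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))
```

Spearman only cares about the ordering of the scores, which is why it can differ noticeably from Pearson on skewed TER distributions.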

My question is: is it possible to obtain similar results using only parallel data? If so, is there something wrong with our procedure?

The configuration files used to train are the following:

#predictor_config-enfr.yml
checkpoint-early-stop-patience: 0
checkpoint-keep-only-best: 2
checkpoint-save: true
checkpoint-validation-steps: 50000
dropout-pred: 0.5
embedding-sizes: 200
epochs: 5
experiment-name: Pretrain Predictor
gpu-id: 0
hidden-pred: 400
learning-rate: 2e-3
learning-rate-decay: 0.6
learning-rate-decay-start: 2
log-interval: 100
model: predictor
optimizer: adam
out-embeddings-size: 200
output-dir: runs/predictor-enfr
predict-inverse: false
rnn-layers-pred: 2
source-embeddings-size: 200
source-max-length: 50
source-min-length: 1
source-vocab-min-frequency: 1
source-vocab-size: 45000
split: 0.9
target-embeddings-size: 200
target-max-length: 50
target-min-length: 1
target-vocab-min-frequency: 1
target-vocab-size: 45000
train-batch-size: 16
train-source: custom_data/train-enfr.src
train-target: custom_data/train-enfr.tgt
valid-batch-size: 16
valid-source: custom_data/dev-enfr.src
valid-target: custom_data/dev-enfr.tgt

#estimator_config-enfr.yml
binary-level: false
checkpoint-early-stop-patience: 0
checkpoint-keep-only-best: 2
checkpoint-save: true
checkpoint-validation-steps: 0
dropout-est: 0.0
epochs: 5
experiment-name: Train Estimator
gpu-id: 0
hidden-est: 125
learning-rate: 2e-3
load-pred-target: runs/predictor-enfr/best_model.torch
log-interval: 100
mlp-est: true
model: estimator
output-dir: runs/estimator-enfr
predict-gaps: false
predict-source: false
predict-target: false
rnn-layers-est: 1
sentence-level: true
sentence-ll: false
source-bad-weight: 2.5
target-bad-weight: 2.5
token-level: false
train-batch-size: 16
train-sentence-scores: custom_data/train-enfr.ter
train-source: custom_data/train-enfr.src
train-target: custom_data/train-enfr.pred
valid-batch-size: 16
valid-sentence-scores: custom_data/dev-enfr.ter
valid-source: custom_data/dev-enfr.src
valid-target: custom_data/dev-enfr.pred
wmt18-format: false

#predictions_config-enfr.yml
gpu-id: 0
load-model: runs/estimator-enfr/best_model.torch
model: estimator
output-dir: predictions/predest-enfr
seed: 42
test-source: custom_data/test-enfr.src
test-target: custom_data/test-enfr.pred
valid-batch-size: 64
wmt18-format: false
@captainvera (Contributor)

Hello @lluisg,
sorry for the (very) late response!

Everything seems alright in your settings and proposed setup. You could play a bit more with the hyper-params, but nothing jumps out at me as obviously wrong.

As for your original question, "is it possible to obtain a similar result using only parallel data?", I do not know! It is definitely an interesting research question!

Traditionally the community has believed that the multi-task nature of the normal QE setup helps with both tasks, since HTER and tag creation are inherently correlated. But who knows, maybe it is possible to get equally good results with just parallel data? I would be interested in hearing about your results!

P.S. Is there any specific reason why you are using OpenKiwi 0.1.3 instead of OpenKiwi >2.0?

@captainvera captainvera added the question Further information is requested label Jun 2, 2021