As per the email we sent to all of the paper authors, we (@zouharvi, @obo) trained the predictor and then the estimator on our custom data, but the results were almost random.
Describe the bug
While trying to find the problem, we tried to reproduce the WMT results using your pre-trained models, as mentioned in the documentation. There must be some systematic mistake we're making, because the pre-trained estimator produces almost random results.
To Reproduce
Run in an empty directory. The script downloads the model and then tries to estimate the quality of the first sentence of the WMT18 training dataset.
wget https://github.com/Unbabel/OpenKiwi/releases/download/0.1.1/en_de.nmt_models.zip
unzip -n en_de.nmt_models.zip
mkdir output input
echo "the part of the regular expression within the forward slashes defines the pattern ." > ./input/test.src
echo "der Teil des regulären Ausdrucks innerhalb der umgekehrten Schrägstrich definiert das Muster ." > ./input/test.trg
kiwi predict \
--config ./en_de.nmt_models/estimator/target_1/predict.yaml \
--load-model ./en_de.nmt_models/estimator/target_1/model.torch \
--experiment-name "Single line test" \
--output-dir output \
--gpu-id -1 \
--test-source ./input/test.src \
--test-target ./input/test.trg
cat output/tags
Expected result
OK OK OK OK OK OK OK OK OK OK OK OK OK BAD OK OK OK BAD OK OK OK OK OK OK OK OK OK
Of course, the gold annotation also contains the extra gap tags, but even so, most of the sentence is annotated as OK, which seems to contradict the model output (lots of values that are almost zero).
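For reference, this is roughly how the word tags can be separated from the gold sequence for comparison (a rough sketch, assuming the sequence alternates gap and word tags and both starts and ends with a gap tag; gold.tags is a hypothetical file containing the line above):
# Keep only the word tags (even positions), dropping the interleaved gap tags
tr ' ' '\n' < gold.tags | awk 'NR % 2 == 0' | tr '\n' ' '; echo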
Hey @zouharvi, thanks for your interest in OpenKiwi and the detailed issue!
I believe it is not an error that you're making but a misinterpretation of the results.
What we model is the probability of a word being BAD, not the probability of it being OK. With that in mind, the results you're getting are completely expected 🙂
See below:
Gold tags (gaps removed)
OK OK OK OK OK OK BAD OK BAD OK OK OK OK
Your results
OK OK OK OK OK OK BAD BAD BAD OK OK OK OK
The model actually gets only one tag wrong, so it's at around 92% accuracy (12 out of 13). Not too bad!
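If you want hard OK/BAD labels instead of the raw probabilities, you can threshold the output yourself. A minimal sketch, assuming output/tags holds one space-separated P(BAD) value per word; the 0.5 cut-off here is just an illustrative choice, not an official decision rule:
# Map each BAD probability to a hard tag using a 0.5 threshold
tr ' ' '\n' < output/tags | awk '{ if ($1 > 0.5) print "BAD"; else print "OK" }' | tr '\n' ' '; echo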