
Failed to conduct Predictor-Estimator predicting #22

Closed
lihongzheng-nlp opened this issue Apr 16, 2019 · 13 comments
Labels
bug Something isn't working

@lihongzheng-nlp
After training zh-en data with a predictor model, I continued with the predict step using the following command:
kiwi predict --model estimator --test-source /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.source --test-target /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.target --sentence-level True --gpu-id 0 --output-dir /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/
I got the following error:
[kiwi.lib.predict setup:159] {'batch_size': 64,
'config': None,
'debug': False,
'experiment_name': None,
'gpu_id': 0,
'load_data': None,
'load_model': None,
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': '/home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/',
'quiet': False,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}

Traceback (most recent call last):
File "/home/hzli/anaconda3/bin/kiwi", line 11, in
sys.exit(main())
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/main.py", line 22, in main
return kiwi.cli.main.cli()
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
predict.main(extra_args)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
predict.predict_from_options(options)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
run(options.model_api, output_dir, options.pipeline, options.model)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 113, in run
model = Model.create_from_file(pipeline_opts.load_model)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/model.py", line 210, in create_from_file
str(path), map_location=lambda storage, loc: storage
File "/home/hzli/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 356, in load
f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'None'
I spent a lot of time trying to track down the error, but without success.
Could you please give me some advice on solving it? Thank you very much!

@captainvera
Contributor

Hey @VictorLi2017,
The way the predict pipeline works is by loading a pre-trained model and creating predictions for data where you don't have tags. Here, you forgot the step of loading the pre-trained model.
You need to pass a --load-model [Path to model] flag to the predict pipeline.
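
For illustration, a full invocation combining the flags from the original command with --load-model might look like this (all paths here are placeholders for your own files and trained estimator checkpoint):

```bash
kiwi predict --model estimator \
    --load-model /path/to/runs/<run-id>/best_model.torch \
    --test-source dev.source \
    --test-target dev.target \
    --sentence-level True \
    --gpu-id 0 \
    --output-dir predictions/
```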

I also realised that this isn't addressed in the documentation and will update it 👍

Note: As a friendly reminder, to use a predictor-estimator you need to first pre-train the predictor on a large parallel corpus and then the estimator on QE data (with tags).

I'm closing the issue, feel free to re-open it if the problem persists!

@lihongzheng-nlp
Author

Hello @captainvera, following your guide, I added --load-model to the above full command:
kiwi predict --model estimator --test-source /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.source --test-target /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.target --sentence-level True --gpu-id 0 --output-dir /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/ **--load-model /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/runs/0/de596b315f7a4428bd881224376158bc/best_model.torch**
After a while, there were no prediction results, only an output.log file under the output dir, as attached. I guess there should be some prediction results, right?
output.log

Would you please check it for me, and give me further instructions? Thank you very much!

@captainvera captainvera reopened this Apr 17, 2019
@trenous
Contributor

trenous commented Apr 17, 2019

Hey @VictorLi2017 ,

I believe the issue is that the model you are loading is a Predictor, not an Estimator. Is that possible?
If so, train an Estimator with your pretrained Predictor (see this section in the docs, example config) and then run the prediction pipeline again.

Does this solve your issue?

A bit more detail about what happened:
The Predictor model itself does not do quality estimation; it is a conditional language model that predicts words in the target given the source.
Now, when calling the Model.predict method, only QE predictions are generated, which explains why you did not see any outputs.

And thanks for reporting these problems; you are pointing out some important flaws in our handling of flags and incorrect inputs. This should have generated an informative error message. Improving parameter parsing and validation is one of our main priorities moving forward.
Best,
Sony

@lihongzheng-nlp
Author

Hello @trenous, yes, I think what I have trained is a Predictor. If I'm not mistaken, the QE pipeline includes three main stages: training, predicting, and evaluation, right?
I want to try the Predictor-Estimator model with official Chinese-English data. In the training step, I used kiwi train --model predictor with the corresponding parameters, and after 50 epochs I got the best_model.torch in the output dir.
Then, in the predicting step, I ran kiwi predict --model estimator --load-model best_model.torch and got the problems described above: only an output.log, with no prediction results at all. I'm not sure whether I used the correct model name in the two steps.

By the way, I checked the training output.log; the records in most epochs are as follows:
target_PERP: nan, target_CORRECT: 0.0000, target_ExpErr: nan
target_PERP: nan, target_CORRECT: 0.0000, target_ExpErr: nan
EVAL_target_PERP: nan, EVAL_target_CORRECT: 0.0415, EVAL_target_ExpErr: nan
I guess there must be some problem with the data, right?
I'll retry the whole pipeline with the WMT18 data once again and will update you soon. Thank you!

@trenous
Contributor

trenous commented Apr 18, 2019

Hey,
The predictor-estimator model relies on pretraining of its component model, the predictor. This is what you did with the command kiwi train --model predictor. The resulting best_model.torch is not a QE model, but it can be used to initialize an estimator model like so:

kiwi train --model estimator --load-pred-target best_model.torch

The pretraining step allows you to make use of any parallel corpus in your target language. This can make a significant difference, as public QE corpora are usually very small.
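
Equivalently, these options can go in a config file. A minimal sketch of what a train_estimator.yaml could look like is below; the key names for the training data are assumptions based on the CLI flags seen in this thread, so double-check them against the example estimator config:

```yaml
model: estimator
load-pred-target: /path/to/predictor/best_model.torch  # pre-trained Predictor checkpoint
sentence-level: true
gpu-id: 0
output-dir: runs/estimator

# Training data (key names assumed by analogy with --test-source/--test-target):
train-source: train.source
train-target: train.target
train-sentence-scores: train.hter
```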

Indeed, it seems something went wrong with your training. Would you mind sharing the config file and the data you used?

@lihongzheng-nlp
Author

lihongzheng-nlp commented Apr 20, 2019

Hello @trenous, I am training sentence-level QE with the predictor-estimator. Following your last guide, I ran
kiwi train --config experiments/train_predictor.yaml
successfully and got the best_model.torch.
Then I ran kiwi train --config experiments/train_estimator.yaml, but it failed once again.
Here is the error:
Traceback (most recent call last):
File "/home/hzli/anaconda3/bin/kiwi", line 11, in
sys.exit(main())
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/main.py", line 22, in main
return kiwi.cli.main.cli()
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 71, in cli
train.main(extra_args)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
train.train_from_options(options)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 123, in train_from_options
trainer = run(ModelClass, output_dir, pipeline_options, model_options)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 204, in run
trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 75, in run
self.train_epoch(train_iterator, valid_iterator)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 95, in train_epoch
outputs = self.train_step(batch)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 139, in train_step
model_out = self.model(batch)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 349, in forward
sentence_input = self.make_sentence_input(h_tgt, h_src)
File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 418, in make_sentence_input
h = h_tgt[0] if h_tgt else h_src[0]
TypeError: 'NoneType' object is not subscriptable

Attached is the train_estimator.yaml config file for your reference. Quite strangely, with exactly the same config file, my colleague ran it successfully on his machine. The data is the official data used for QE by the China Workshop on Machine Translation (CWMT), so I think the data should be fine. Thank you!

train_estimator_yaml.txt

@lihongzheng-nlp
Author

@trenous PS: the train/dev data each consist of 4 files: train.source, train.target, train.pe, and train.hter, with naming similar to the WMT sentence-level data.

@captainvera
Contributor

captainvera commented Apr 22, 2019

Hello @VictorLi2017, it is indeed extremely weird that your colleague can run it successfully on his machine. From the error message, it seems there was a problem with data loading.
As a first step I would make sure the path to your data is correct and that there is no typo.

This issue is hard to diagnose based on the error message since the only information we're getting is that there was an error in data loading. As @trenous mentioned earlier, our handling of flags and inputs is not the safest. As such, it is hard to conclude the exact problem solely from the error message.

If you are sure there is no issue in your path to the files, would you mind running with the --debug flag and posting the output log here (or the console output with timestamps if possible)?
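
For example, keeping the rest of your setup unchanged, something like this should produce a more verbose log:

```bash
kiwi train --config experiments/train_estimator.yaml --debug
```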

@lihongzheng-nlp
Author

Hello @captainvera, I'm sure that the path to the data is correct. I've already finished the train_predictor step once again, but the train_estimator step always fails with the same error: TypeError: 'NoneType' object is not subscriptable.
Attached is the train_estimator.log produced with --debug; please check it. Thank you!
train_estimator.log

@trenous
Contributor

trenous commented Apr 23, 2019

@VictorLi2017 Can you run git pull and let us know if the error persists? We recently fixed a bug related to training sentence-level-only models.

@lihongzheng-nlp
Author

Hello @trenous, the repo I used yesterday was already the latest version. I tried the zh-en and en-zh pairs, and even the official WMT18 sentence-level data; all resulted in the same error TypeError: 'NoneType' object is not subscriptable in the train_estimator step.

@captainvera captainvera added bug Something isn't working and removed good first issue Good for newcomers labels Apr 26, 2019
@trenous
Contributor

trenous commented May 25, 2019

Hello VictorLi,

Sorry for the long response time; our team was working on a deadline.

The line numbers in your log file don't match the current version, e.g.:

File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 349, in forward:
    sentence_input = self.make_sentence_input(h_tgt, h_src)

If you look at the changes introduced in this commit - which addresses the bug you encountered - you'll see that that line was number 349 before the fix and 357 after.

Can you do a fresh checkout of the repo? That should solve your problem.
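
For example, one way to start from a clean copy (the install step is just one option and may differ depending on how you set up your environment):

```bash
git clone https://github.com/Unbabel/OpenKiwi.git
cd OpenKiwi
pip install .
```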

@trenous
Contributor

trenous commented Jun 18, 2019

I am closing this as it seems to be solved.

@trenous trenous closed this as completed Jun 18, 2019