
Error occurred while using sentence-level Predictor-Estimator to predict #23

Closed
Zachary-YL opened this issue Apr 22, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@Zachary-YL

After successfully training the sentence-level Predictor and Estimator models, an error occurred while using the Estimator model to predict on sentence-level data.

The command is:
kiwi predict --config experiments_sl/predict_estimator.yaml

And the error is:
2019-04-22 07:19:37.521 [kiwi.lib.predict setup:159] {'batch_size': 64,
'config': 'experiments_sl/predict_estimator.yaml',
'debug': False,
'experiment_name': 'EN-ZH Pretrain Predictor',
'gpu_id': None,
'load_data': None,
'load_model': 'runs/0/464dc10bfc174ac79ca082eae0dea352/best_model.torch',
'load_vocab': None,
'log_interval': 100,
'mlflow_always_log_artifacts': False,
'mlflow_tracking_uri': 'mlruns/',
'model': 'estimator',
'output_dir': 'predictions/predest/ccmt/en_zh',
'quiet': False,
'run_uuid': None,
'save_config': None,
'save_data': None,
'seed': 42}
2019-04-22 07:19:37.521 [kiwi.lib.predict setup:160] Local output directory is: predictions/predest/ccmt/en_zh
2019-04-22 07:19:37.521 [kiwi.lib.predict run:100] Predict with the PredEst (Predictor-Estimator) model
Traceback (most recent call last):
File "/home2/zyl/anaconda3/envs/openkiwi/bin/kiwi", line 10, in <module>
sys.exit(main())
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/main.py", line 22, in main
return kiwi.cli.main.cli()
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
predict.main(extra_args)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
predict.predict_from_options(options)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
run(options.model_api, output_dir, options.pipeline, options.model)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 113, in run
model = Model.create_from_file(pipeline_opts.load_model)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 214, in create_from_file
model = Model.subclasses[model_name].from_dict(model_dict)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 235, in from_dict
model.load_state_dict(class_dict[const.STATE_DICT])
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Estimator:
Unexpected key(s) in state_dict: "predictor_tgt.W2", "predictor_tgt.V", "predictor_tgt.C", "predictor_tgt.S", "predictor_tgt.attention.scorer.layers.0.0.weight", "predictor_tgt.attention.scorer.layers.0.0.bias", "predictor_tgt.attention.scorer.layers.1.0.weight", "predictor_tgt.attention.scorer.layers.1.0.bias", "predictor_tgt.embedding_source.weight", "predictor_tgt.embedding_target.weight", "predictor_tgt.lstm_source.weight_ih_l0", "predictor_tgt.lstm_source.weight_hh_l0", "predictor_tgt.lstm_source.bias_ih_l0", "predictor_tgt.lstm_source.bias_hh_l0", "predictor_tgt.lstm_source.weight_ih_l0_reverse", "predictor_tgt.lstm_source.weight_hh_l0_reverse", "predictor_tgt.lstm_source.bias_ih_l0_reverse", "predictor_tgt.lstm_source.bias_hh_l0_reverse", "predictor_tgt.lstm_source.weight_ih_l1", "predictor_tgt.lstm_source.weight_hh_l1", "predictor_tgt.lstm_source.bias_ih_l1", "predictor_tgt.lstm_source.bias_hh_l1", "predictor_tgt.lstm_source.weight_ih_l1_reverse", "predictor_tgt.lstm_source.weight_hh_l1_reverse", "predictor_tgt.lstm_source.bias_ih_l1_reverse", "predictor_tgt.lstm_source.bias_hh_l1_reverse", "predictor_tgt.forward_target.weight_ih_l0", "predictor_tgt.forward_target.weight_hh_l0", "predictor_tgt.forward_target.bias_ih_l0", "predictor_tgt.forward_target.bias_hh_l0", "predictor_tgt.forward_target.weight_ih_l1", "predictor_tgt.forward_target.weight_hh_l1", "predictor_tgt.forward_target.bias_ih_l1", "predictor_tgt.forward_target.bias_hh_l1", "predictor_tgt.backward_target.weight_ih_l0", "predictor_tgt.backward_target.weight_hh_l0", "predictor_tgt.backward_target.bias_ih_l0", "predictor_tgt.backward_target.bias_hh_l0", "predictor_tgt.backward_target.weight_ih_l1", "predictor_tgt.backward_target.weight_hh_l1", "predictor_tgt.backward_target.bias_ih_l1", "predictor_tgt.backward_target.bias_hh_l1", "predictor_tgt.W1.weight".

Could you give some advice for solving this error? Thanks a lot!
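(For reference, not an official OpenKiwi fix: a generic PyTorch-style workaround for "Unexpected key(s) in state_dict" is either to pass `strict=False` to `load_state_dict`, or to filter the unexpected `predictor_tgt.*` entries out of the checkpoint dict before loading. A minimal sketch of the filtering with a toy dict; the helper name `strip_prefixed_keys` is hypothetical, and real values would be tensors:)

```python
# Sketch: drop the extra "predictor_tgt.*" entries from a checkpoint's
# state_dict before calling load_state_dict. The key names mirror the
# error message above; the filtering itself is plain dict manipulation.

def strip_prefixed_keys(state_dict, prefix):
    """Return a copy of state_dict without keys starting with prefix."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

# Toy stand-in for the checkpoint's state_dict (values would be tensors).
checkpoint = {
    "predictor_tgt.W1.weight": "tensor",
    "predictor_tgt.embedding_source.weight": "tensor",
    "mlp.0.weight": "tensor",
    "lstm.weight_ih_l0": "tensor",
}

filtered = strip_prefixed_keys(checkpoint, "predictor_tgt.")
print(sorted(filtered))  # ['lstm.weight_ih_l0', 'mlp.0.weight']
```

Note that silently dropping keys only papers over the mismatch; it is still worth finding out why the checkpoint carries weights the model class does not expect.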

@Zachary-YL Zachary-YL added the bug label Apr 22, 2019
@trenous
Contributor

trenous commented Apr 23, 2019

Hello Zachary, I was not able to reproduce your bug. Can you verify that you are using the newest version of the repository? If you already are, or updating does not solve your problem, could you provide the trained model so we can analyze what the problem is?

Best

@Zachary-YL
Author

Thank you for your reply.
Here is the trained Estimator model.

https://www.dropbox.com/s/ce9akwcvhs4tcbo/best_model.torch?dl=0

Here are my config file for training the Estimator model and my config file for predicting:

train_estimator_yaml.txt
predict_estimator_yaml.txt

It's worth noting that I trained the Estimator model on a CPU, because training on a GPU fails with the following error:

2019-04-24 04:03:26.789 [root setup:380] This is run ID: 62e6dc469e3a4971bbce19bc119487c5
2019-04-24 04:03:26.790 [root setup:383] Inside experiment ID: 0 (None)
2019-04-24 04:03:26.790 [root setup:386] Local output directory is: runs/0/62e6dc469e3a4971bbce19bc119487c5
2019-04-24 04:03:26.790 [root setup:389] Logging execution to MLflow at: None
2019-04-24 04:03:26.872 [root setup:395] Using GPU: 0
2019-04-24 04:03:26.873 [root setup:400] Artifacts location: None
2019-04-24 04:03:26.886 [kiwi.lib.train run:154] Training the PredEst (Predictor-Estimator) model
2019-04-24 04:03:27.666 [kiwi.data.utils load_vocabularies_to_fields:126] Loaded vocabularies from runs/predictor/best_model.torch
2019-04-24 04:03:38.657 [kiwi.lib.train run:187] Estimator(
(predictor_tgt): Predictor(
(attention): Attention(
(scorer): MLPScorer(
(layers): ModuleList(
(0): Sequential(
(0): Linear(in_features=1600, out_features=800, bias=True)
(1): Tanh()
)
(1): Sequential(
(0): Linear(in_features=800, out_features=1, bias=True)
(1): Tanh()
)
)
)
)
(embedding_source): Embedding(9300, 200, padding_idx=1)
(embedding_target): Embedding(3845, 200, padding_idx=1)
(lstm_source): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
(forward_target): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5)
(backward_target): LSTM(200, 400, num_layers=2, batch_first=True, dropout=0.5)
(W1): Embedding(3845, 200, padding_idx=1)
(_loss): CrossEntropyLoss()
)
(mlp): Sequential(
(0): Linear(in_features=1000, out_features=125, bias=True)
(1): Tanh()
)
(lstm): LSTM(125, 125, batch_first=True, bidirectional=True)
(embedding_out): Linear(in_features=250, out_features=2, bias=True)
(sentence_pred): Sequential(
(0): Linear(in_features=250, out_features=125, bias=True)
(1): Sigmoid()
(2): Linear(in_features=125, out_features=62, bias=True)
(3): Sigmoid()
(4): Linear(in_features=62, out_features=1, bias=True)
)
(xents): ModuleDict(
(tags): CrossEntropyLoss()
)
(mse_loss): MSELoss()
)
2019-04-24 04:03:38.658 [kiwi.lib.train run:188] 16202078 parameters
2019-04-24 04:03:38.670 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 0%| | 1/232 [00:02<09:19, 2.42s/ batches]
Traceback (most recent call last):
File "estimator_train_sl.py", line 4, in <module>
kiwi.train(estimator_config)
File "/home2/zyl/code/OpenKiwi-master/kiwi/lib/train.py", line 79, in train_from_file
return train_from_options(options)
File "/home2/zyl/code/OpenKiwi-master/kiwi/lib/train.py", line 123, in train_from_options
trainer = run(ModelClass, output_dir, pipeline_options, model_options)
File "/home2/zyl/code/OpenKiwi-master/kiwi/lib/train.py", line 204, in run
trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
File "/home2/zyl/code/OpenKiwi-master/kiwi/trainers/trainer.py", line 75, in run
self.train_epoch(train_iterator, valid_iterator)
File "/home2/zyl/code/OpenKiwi-master/kiwi/trainers/trainer.py", line 95, in train_epoch
outputs = self.train_step(batch)
File "/home2/zyl/code/OpenKiwi-master/kiwi/trainers/trainer.py", line 139, in train_step
model_out = self.model(batch)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/code/OpenKiwi-master/kiwi/models/predictor_estimator.py", line 324, in forward
model_out_tgt = self.predictor_tgt(batch)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/code/OpenKiwi-master/kiwi/models/predictor.py", line 275, in forward
for i in range(target_len - 2)
File "/home2/zyl/code/OpenKiwi-master/kiwi/models/predictor.py", line 275, in
for i in range(target_len - 2)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/code/OpenKiwi-master/kiwi/models/modules/attention.py", line 36, in forward
scores = self.scorer(query, keys)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/code/OpenKiwi-master/kiwi/models/modules/scorer.py", line 60, in forward
layer_in = layer(layer_in)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home2/zyl/anaconda3/envs/openkiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 57.62 MiB (GPU 0; 10.92 GiB total capacity; 6.78 GiB already allocated; 31.50 MiB free; 109.37 MiB cached)

Thanks a lot!

@captainvera
Contributor

Hi @Zachary-YL, we will look into what is happening in the predict pipeline.

Meanwhile, the error you're getting when training on a GPU just means that OpenKiwi is trying to allocate more memory than is available on your GPU. This happens when the combination of batch size and number of tokens per sentence is too large.

You can easily train on the GPU if you do one of two things (or both):

  • Reduce the batch size
  • Set the source-max-length and target-max-length flags in the training yaml
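The two tweaks above might look like this in the training yaml (only `source-max-length` and `target-max-length` are quoted from the advice; the batch-size key names and all values are assumptions that may differ across OpenKiwi versions):

```yaml
# Sketch of the relevant training-yaml entries.
train-batch-size: 32        # reduced batch size (assumed key name)
valid-batch-size: 32        # assumed key name
source-max-length: 50       # cap source sentence length in tokens
target-max-length: 50       # cap target sentence length in tokens
```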

@trenous
Contributor

trenous commented May 25, 2019

Hello Zachary,

Sorry for the long delay in response, our team was busy with the WMT shared task.

I ran your predict yaml with the model you provided (changing source and target to a toy file) and it worked fine, without error.
Are you sure it is not a version issue? The first release of OpenKiwi broke when training for sentence level only.

@trenous
Contributor

trenous commented Jun 18, 2019

I am closing this as it seems to be solved.

@trenous trenous closed this as completed Jun 18, 2019