predictor-estimator crashes with Russian data #32

Closed
ninalopatina opened this issue Jun 24, 2019 · 1 comment
Labels
bug Something isn't working

Describe the bug
Estimator training crashes with a CUDA out-of-memory error on the WMT19 Russian data.

To Reproduce
Steps to reproduce the behavior:

  1. Switch the data to the WMT19 Russian data
  2. Train the predictor
  3. Train the estimator
  4. The error appears 22% of the way through the first epoch (batch 53 of 236)

Expected behavior
I expected the estimator to train the same way it did on the German datasets.

Screenshots
2019-06-24 21:07:25.075 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 22%|██████ | 53/236 [00:27<00:58, 3.11 batches/s]
Traceback (most recent call last):
  File "/home/nlopatina/.virtualenvs/OpenKiwi/bin/kiwi", line 11, in <module>
    load_entry_point('openkiwi', 'console_scripts', 'kiwi')()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 139, in train_step
    model_out = self.model(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in forward
    for i in range(target_len - 2)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in <listcomp>
    for i in range(target_len - 2)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/attention.py", line 36, in forward
    scores = self.scorer(query, keys)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/scorer.py", line 60, in forward
    layer_in = layer(layer_in)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
    return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)

Environment (please complete the following information):
OS: Linux
OpenKiwi version 0.1.1
Python version 3.6.5

Additional context

  • I did not get this error with the same hyperparameters on the German dataset.
  • I tried smaller batch sizes; a batch size of 2 runs for a while, but then crashes with a different error message.
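
The traceback fails inside a loop over target positions (`for i in range(target_len - 2)` in `predictor.py`), so one plausible but unconfirmed cause is that the Russian data contains longer sentences than the German data. The sketch below is a hypothetical diagnostic, not part of OpenKiwi: `length_stats` and the `max_len` threshold are names made up for illustration, and it assumes the data is whitespace-tokenized with one sentence per line.

```python
from typing import Dict, Iterable


def length_stats(lines: Iterable[str], max_len: int) -> Dict[str, float]:
    """Summarize token counts for whitespace-tokenized sentences.

    `max_len` is an arbitrary threshold chosen by the caller; it is not
    an OpenKiwi option.
    """
    # Count tokens per non-empty line.
    lengths = [len(line.split()) for line in lines if line.strip()]
    if not lengths:
        return {"count": 0, "max": 0, "mean": 0.0, "over_limit": 0}
    return {
        "count": len(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        # Number of sentences longer than the chosen threshold.
        "over_limit": sum(1 for n in lengths if n > max_len),
    }


if __name__ == "__main__":
    # Toy input; in practice, pass an open file handle for each dataset.
    sample = ["a short sentence", "a much longer sentence with many more tokens"]
    print(length_stats(sample, max_len=5))
```

Running this on the German and Russian training files and comparing the `max` and `over_limit` values would show whether sequence length differs enough between the two datasets to matter.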

@ninalopatina (Author) commented:
Never mind, I fixed this by adding a few specifications to the YAML config.
