predictor-estimator crashes with Russian data #32

ninalopatina · 2019-06-24T21:10:16Z

Describe the bug
Estimator training crashes with a CUDA out-of-memory error on the WMT19 Russian data.

To Reproduce
Steps to reproduce the behavior:

1. Switch the data to the WMT19 Russian data.
2. Train the predictor.
3. Train the estimator.
4. See the error at 22% of the batches in the first epoch (batch 53 of 236).

Expected behavior
I expected the estimator to train the same way it had on the German datasets.

Screenshots
2019-06-24 21:07:25.075 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 22%|██████ | 53/236 [00:27<00:58, 3.11 batches/s]
Traceback (most recent call last):
  File "/home/nlopatina/.virtualenvs/OpenKiwi/bin/kiwi", line 11, in <module>
    load_entry_point('openkiwi', 'console_scripts', 'kiwi')()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/main.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 139, in train_step
    model_out = self.model(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in forward
    for i in range(target_len - 2)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in
    for i in range(target_len - 2)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/attention.py", line 36, in forward
    scores = self.scorer(query, keys)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/scorer.py", line 60, in forward
    layer_in = layer(layer_in)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
    return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)
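The last line of the traceback is plain arithmetic: the allocator needed 75.62 MiB, but only 42.56 MiB was free on GPU 1, with most of the card's 11.93 GiB already held by tensors. The sketch below pulls those figures out of the message and computes the shortfall. `parse_oom` is a hypothetical helper written for illustration; it is not part of OpenKiwi or PyTorch, and it assumes the message keeps the wording shown above.

```python
import re

# Conversion factors to MiB for the units that appear in the OOM message.
_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def _mib(value: str, unit: str) -> float:
    """Convert a (number, unit) pair captured by a regex into MiB."""
    return float(value) * _TO_MIB[unit]


def parse_oom(message: str) -> dict:
    """Extract requested/total/allocated/free/cached figures (in MiB) from a
    'CUDA out of memory' message. Hypothetical helper for illustration."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "cached": r"([\d.]+) (KiB|MiB|GiB) cached",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = _mib(*match.groups())
    return figures


msg = ("CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total "
       "capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)")
f = parse_oom(msg)
shortfall = f["requested"] - f["free"]
print(f"short by {shortfall:.2f} MiB")  # short by 33.06 MiB
```

The request itself is small; the problem is that almost the whole card was already in use when it was made.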

Environment
OS: Linux
OpenKiwi version: 0.1.1
Python version: 3.6.5

Additional context

I did not get this error with exactly the same hyperparameters on the German dataset.
I tried smaller batches: a batch size of 2 runs for a while, but then crashes with a different error message.
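If the memory spikes come from a few unusually long sentence pairs (an assumption; the issue does not say so, though the traceback does show the predictor looping over target positions with `range(target_len - 2)`), a common general technique is to cap sentence length and batch by a padded-token budget instead of a fixed sentence count. The sketch below illustrates that technique in plain Python. `make_token_batches`, `max_tokens`, and `max_len` are hypothetical names, not OpenKiwi options, and the input is assumed to be already tokenized into (source, target) token lists.

```python
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def make_token_batches(pairs: List[Pair], max_tokens: int,
                       max_len: int) -> List[List[Pair]]:
    """Group pairs so each batch's padded size (rows * longest row) stays
    within max_tokens. Pairs with a side longer than max_len are dropped.
    Hypothetical helper illustrating token-budget batching."""
    # Drop over-long pairs, then sort by length so similar lengths share a batch.
    kept = [p for p in pairs if len(p[0]) <= max_len and len(p[1]) <= max_len]
    kept.sort(key=lambda p: max(len(p[0]), len(p[1])))

    batches: List[List[Pair]] = []
    current: List[Pair] = []
    longest = 0
    for pair in kept:
        pair_len = max(len(pair[0]), len(pair[1]))
        # Padded cost if this pair joined the current batch.
        new_longest = max(longest, pair_len)
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)  # close the batch before it overflows
            current, new_longest = [], pair_len
        current.append(pair)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


pairs = [
    ("a b".split(), "x y".split()),
    ("a b c d".split(), "x y z w".split()),
    ("a".split(), "x".split()),
    (["t"] * 100, ["u"] * 100),  # longer than max_len, so it is dropped
]
batches = make_token_batches(pairs, max_tokens=8, max_len=50)
print([len(b) for b in batches])  # [2, 1]
```

Capping `max_len` matters because a single pair is always placed in its own batch even when it alone exceeds `max_tokens`; the cap bounds that worst case.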

ninalopatina · 2019-06-24T23:22:44Z

Never mind, I fixed this by adding a few specifications to the YAML config.
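The comment does not say which specifications were added. If the fix involved length-related settings, one way to compare the Russian and German data before choosing values is to look at their sentence-length distributions. The sketch below computes nearest-rank percentiles of whitespace-token counts; `length_percentiles` is a hypothetical helper, and whitespace splitting is only an approximation of whatever tokenization the model actually uses.

```python
import math
from typing import Dict, Iterable, List, Sequence


def length_percentiles(lines: Iterable[str],
                       percentiles: Sequence[int] = (50, 95, 99, 100)) -> Dict[int, int]:
    """Nearest-rank percentiles of whitespace-token counts per line.
    Hypothetical helper for illustration."""
    lengths: List[int] = sorted(len(line.split()) for line in lines)
    if not lengths:
        return {}
    result = {}
    for p in percentiles:
        # Nearest rank: smallest length with at least p% of lines at or below it.
        rank = max(1, math.ceil(p / 100 * len(lengths)))
        result[p] = lengths[rank - 1]
    return result


sample = ["one two", "one two three", "a", "a b c d e f g h i j"]
print(length_percentiles(sample))  # {50: 2, 95: 10, 99: 10, 100: 10}
```

To compare corpora, the same function can be passed an open file handle for each source and target file, e.g. `length_percentiles(open(path, encoding="utf-8"))`, where `path` is a placeholder for the dataset file.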

ninalopatina added the "bug" (Something isn't working) label Jun 24, 2019

ninalopatina closed this as completed Jun 24, 2019
