Describe the bug
Estimator training crashes with a CUDA out-of-memory error on WMT19 Russian data.
To Reproduce
Steps to reproduce the behavior:
1. Switch the data to the WMT19 Russian data
2. Train the predictor
3. Train the estimator
4. See the error at 22% of batches in the first epoch (batch 53 of 236)
Expected behavior
I expected the estimator to train the same way it had for the German datasets
Screenshots
2019-06-24 21:07:25.075 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 22%|██████ | 53/236 [00:27<00:58, 3.11 batches/s]
Traceback (most recent call last):
File "/home/nlopatina/.virtualenvs/OpenKiwi/bin/kiwi", line 11, in <module>
load_entry_point('openkiwi', 'console_scripts', 'kiwi')()
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/main.py", line 22, in main
return kiwi.cli.main.cli()
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/main.py", line 71, in cli
train.main(extra_args)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/pipelines/train.py", line 141, in main
train.train_from_options(options)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 123, in train_from_options
trainer = run(ModelClass, output_dir, pipeline_options, model_options)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 204, in run
trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 75, in run
self.train_epoch(train_iterator, valid_iterator)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 95, in train_epoch
outputs = self.train_step(batch)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 139, in train_step
model_out = self.model(batch)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor_estimator.py", line 324, in forward
model_out_tgt = self.predictor_tgt(batch)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in forward
for i in range(target_len - 2)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in
for i in range(target_len - 2)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/attention.py", line 36, in forward
scores = self.scorer(query, keys)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/scorer.py", line 60, in forward
layer_in = layer(layer_in)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)
Environment (please complete the following information):
OS: Linux
OpenKiwi version 0.1.1
Python version 3.6.5
Additional context
I did not get this error with the same hyperparameters on the German dataset.
I tried running smaller batches; a batch size of 2 works for some time, but then crashes with a different error message.
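One possible explanation, which is only an assumption and is not confirmed by the traceback, is that the Russian data contains a few sentences that are much longer than anything in the German data, so a batch padded to that length no longer fits in GPU memory. A minimal sketch for checking this is below. It assumes one whitespace-tokenized sentence per line; the helper `longest_lines` and the file paths passed on the command line are hypothetical and not part of OpenKiwi.

```python
import heapq
import sys
from typing import Iterable, List, Tuple


def longest_lines(lines: Iterable[str], top_k: int = 5) -> List[Tuple[int, int]]:
    """Return (token_count, 1-based line number) for the top_k longest lines.

    Tokens are counted by splitting on whitespace, which is an assumption
    about how the data files are tokenized.
    """
    counts = (
        (len(line.split()), line_no)
        for line_no, line in enumerate(lines, start=1)
    )
    # nlargest sorts by token count first, then by line number.
    return heapq.nlargest(top_k, counts)


if __name__ == "__main__":
    # Hypothetical usage: pass the source/target training files as arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path)
            for n_tokens, line_no in longest_lines(handle):
                print(f"  line {line_no}: {n_tokens} tokens")
```

If the longest Russian lines turn out to be far longer than the longest German ones, filtering or truncating them before training would be one thing to try.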