
RuntimeError: Input, output and indices must be on the current device #13

Closed
super-buster opened this issue Sep 17, 2021 · 4 comments
@super-buster

Hi, I hit a RuntimeError while training a prefix model. Do you have any suggestions?

Here is the environment:
certifi (2021.5.30)
charset-normalizer (2.0.4)
click (8.0.1)
dataclasses (0.8)
filelock (3.0.12)
idna (3.2)
importlib-metadata (4.8.1)
itsdangerous (2.0.1)
Jinja2 (3.0.1)
joblib (1.0.1)
MarkupSafe (2.0.1)
nltk (3.6.2)
numpy (1.19.5)
packaging (21.0)
Pillow (8.3.2)
pip (9.0.3)
pyparsing (2.4.7)
Python-dev (2.0.0.dev0)
regex (2021.8.28)
requests (2.26.0)
sacremoses (0.0.45)
sentencepiece (0.1.96)
setuptools (39.2.0)
six (1.16.0)
tokenizers (0.8.1rc2)
torch (1.8.0+cu111)
torchvision (0.9.0+cu111)
tqdm (4.62.2)
transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
typing-extensions (3.10.0.2)
urllib3 (1.26.6)
Werkzeug (2.0.1)
zipp (3.5.0)

Here is the command line:
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache

Here is the error output:

webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
python run_language_modeling.py         --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --model_type=gpt2         --model_name_or_path=gpt2-medium         --tokenizer_name=gpt2-medium         --per_device_train_batch_size 5         --per_device_eval_batch_size 5         --save_steps 500000         --num_train_epochs 5         --do_train         --train_data_file=../data/webnlg_challenge_2017/train.json         --do_eval         --line_by_line         --save_total_limit 1         --overwrite_output_dir         --task_mode webnlg         --eval_data_file=../data/webnlg_challenge_2017/dev.json          --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1         --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000  --cache_dir cache/gpt2-medium-s3 
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
  FutureWarning,
09/16/2021 10:22:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
09/16/2021 10:22:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
objective is 1
False
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
  FutureWarning,
prefixtune
adapting the size of the model embedding to include [PAD]
len(tokenizer) =  50257
len(tokenizer) =  50258
<|endoftext|> 50256
<|endoftext|> 50256
loading the prefix model from  None
training the prefix model from scratch. 
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
webnlg
tgt_avg:  30.665242718446603
src_avg:  49.62568654646324
ratios:  1.6183040519881826
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
  | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
  | Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
webnlg
tgt_avg:  31.644375553587246
src_avg:  51.023914968999115
ratios:  1.6124165535386898
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
  | Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
[50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[3554, 5376]

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
  | Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
[50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[23443, 24539]
FORMAT MODE IS  cat
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
  FutureWarning,
09/16/2021 10:22:53 - WARNING - trainer_prefix -   You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
  warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
{'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
09/16/2021 10:22:53 - INFO - trainer_prefix -   ***** Running training *****
09/16/2021 10:22:53 - INFO - trainer_prefix -     Num examples = 18025
09/16/2021 10:22:53 - INFO - trainer_prefix -     Num Epochs = 5
09/16/2021 10:22:53 - INFO - trainer_prefix -     Instantaneous batch size per device = 5
09/16/2021 10:22:53 - INFO - trainer_prefix -     Total train batch size (w. parallel, distributed & accumulation) = 40
09/16/2021 10:22:53 - INFO - trainer_prefix -     Gradient Accumulation steps = 1
09/16/2021 10:22:53 - INFO - trainer_prefix -     Total optimization steps = 2255
Traceback (most recent call last):
  File "run_language_modeling.py", line 1159, in <module>
    main()
  File "run_language_modeling.py", line 993, in main
    trainer.train(model_path=model_path)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
    tr_loss += self.training_step(model, inputs)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
    loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
    outputs = model(**inputs, gpt2_model=gpt2_model)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
    return_dict=return_dict, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
    return_dict=return_dict,
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
    inputs_embeds = self.wte(input_ids)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device

@XiangLi1999
Owner

Thanks for the question. When I run my experiments I am only using a single GPU;
could you try including CUDA_VISIBLE_DEVICES=0 and see if the problem persists?
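
(A minimal sketch of how to apply this, for later readers; not code from the PrefixTuning repo. CUDA_VISIBLE_DEVICES has to be in place before torch initializes CUDA, either as a shell prefix, e.g. CUDA_VISIBLE_DEVICES=0 python train_e2e.py ..., or at the very top of the entry script:)

# Sketch only, not code from the repo: make a single GPU visible so the
# Hugging Face Trainer sees n_gpu == 1 and skips nn.DataParallel.
# This must run before the first CUDA call (safest: before importing torch).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # expected to print 1 if the override took effect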

@super-buster
Author

super-buster commented Sep 20, 2021

Thanks for the question. When I run my experiments I am only using a single GPU;
could you try including CUDA_VISIBLE_DEVICES=0 and see if the problem persists?

Thank you for the advice. However, it still fails after adding CUDA_VISIBLE_DEVICES=0.
Notice that the log reports Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False. I think some tensors may be loaded on the CPU, which triggers this problem. The code is untouched except for the data paths.
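
For reference, a hypothetical minimal repro of this class of error (not code from the PrefixTuning repo; it needs at least two visible GPUs) is an embedding lookup whose weight and index tensor sit on different devices; the traceback above fails in the same call (self.wte(input_ids)) inside DataParallel replica 1 on device 1:

# Hypothetical repro sketch (not from the repo): an nn.Embedding whose weight
# lives on one GPU while the index tensor lives on another.
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=50258, embedding_dim=16).to("cuda:1")
ids = torch.tensor([[50256, 383, 317]], device="cuda:0")

try:
    emb(ids)
except RuntimeError as e:
    # On torch 1.8 this surfaces as a device-mismatch RuntimeError similar to
    # "Input, output and indices must be on the current device"; the exact
    # wording varies across torch versions.
    print(e)

print(emb(ids.to("cuda:1")).shape)  # fine once indices sit on the weight's device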

@XiangLi1999
Owner

I think the problem is probably the model being replicated across different GPUs. Since you still have n_gpu: 8, could you set n_gpu to 1 rather than 8?

@super-buster
Author

I think the problem is probably the model being replicated across different GPUs. Since you still have n_gpu: 8, could you set n_gpu to 1 rather than 8?

Thanks. I forced TrainingArguments.n_gpu to 1 and it works!
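
For context on why forcing n_gpu to 1 helps: in this transformers version the Trainer wraps the model in torch.nn.DataParallel whenever args.n_gpu > 1, and the device mismatch happens inside those replicas. A simplified, hypothetical sketch of that logic (mirroring the Trainer's behaviour, not copied from it):

# Hypothetical sketch mirroring the relevant Trainer logic (not copied from
# transformers): DataParallel is only applied when n_gpu > 1, so forcing
# n_gpu = 1 keeps the model and every batch on a single device.
import torch
import torch.nn as nn

def prepare_model(model: nn.Module, n_gpu: int) -> nn.Module:
    model = model.to("cuda:0")
    if n_gpu > 1:
        # Replicates the module onto cuda:0..cuda:{n_gpu-1} and scatters each
        # batch across them; tensors created outside the scatter keep their
        # original device, which is how mismatches like the one above arise.
        model = nn.DataParallel(model, device_ids=list(range(n_gpu)))
    return model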
