RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! #103
Closed
Labels: question (Further information is requested)
Description
Hello, I'm trying to edit some knowledge in Qwen-14B. I have 8 x A100 GPUs available. In the YAML file:
- when I set model_parallel: False, an OOM error occurred;
- when I set model_parallel: True (to use multiple GPUs to run ROME/MEMIT), the following error occurred: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
- qwen-14b.yaml
alg_name: "ROME"
model_name: "/data/weights/trained/sft/qwen-14b_hf/"
stats_dir: "./data/stats"
device: 0
layers: [10]
fact_token: "subject_last"
v_num_grad_steps: 20
v_lr: 5e-1
v_loss_layer: 39
v_weight_decay: 0.5
clamp_norm_factor: 4
kl_factor: 0.0625
mom2_adjustment: false
context_template_length_params: [[5, 10], [10, 10]]
rewrite_module_tmp: "transformer.h.{}.mlp.c_proj"
layer_module_tmp: "transformer.h.{}"
mlp_module_tmp: "transformer.h.{}.mlp"
attn_module_tmp: "transformer.h.{}.attn"
ln_f_module: "transformer.ln_f"
lm_head_module: "lm_head"
mom2_dataset: "wikipedia"
mom2_n_samples: 100000
mom2_dtype: "float32"
model_parallel: true
- output
Rewrite layer is 10
Tying optimization objective to 39
Recording initial value of v*
Traceback (most recent call last):
File "/data/projects/EasyEdit/play_edit.py", line 88, in <module>
test_ROME_Qwen(cfg_path, model_path)
File "/data/projects/EasyEdit/play_edit.py", line 32, in test_ROME_Qwen
metrics, edited_model, _ = editor.edit(
File "/data/projects/EasyEdit/easyeditor/editors/editor.py", line 247, in edit
edited_model, weights_copy = self.apply_algo(
File "/data/projects/EasyEdit/easyeditor/models/rome/rome_main.py", line 41, in apply_rome_to_model
deltas = execute_rome(model, tok, request, hparams)
File "/data/projects/EasyEdit/easyeditor/models/rome/rome_main.py", line 113, in execute_rome
right_vector: torch.Tensor = compute_v(
File "/data/projects/EasyEdit/easyeditor/models/rome/compute_v.py", line 121, in compute_v
logits = model(**input_tok).logits
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1104, in forward
transformer_outputs = self.transformer(
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 934, in forward
outputs = block(
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 655, in forward
mlp_output = self.mlp(layernorm_output)
File "/opt/conda/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1547, in _call_impl
hook_result = hook(self, args, result)
File "/data/projects/EasyEdit/easyeditor/util/nethook.py", line 80, in retain_hook
output = invoke_with_optional_args(
File "/data/projects/EasyEdit/easyeditor/util/nethook.py", line 451, in invoke_with_optional_args
return fn(*pass_args, **pass_kw)
File "/data/projects/EasyEdit/easyeditor/models/rome/compute_v.py", line 98, in edit_output_fn
cur_out[i, idx, :] += delta
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
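The cuda:2 vs cuda:0 split is consistent with model parallelism placing the transformer blocks on different GPUs while the edit itself is driven from device: 0. Below is a minimal sketch for checking which GPU the edited module landed on, assuming the model was loaded with a device map (for example device_map="auto") so that transformers exposes the placement as model.hf_device_map. The helper resolve_module_device and the example map are hypothetical: they are neither EasyEdit code nor the real placement from this issue.

```python
from typing import Dict, Optional, Union

Device = Union[int, str]


def resolve_module_device(device_map: Dict[str, Device], module_name: str) -> Optional[Device]:
    """Hypothetical helper: find the device a (sub)module was placed on.

    `device_map` is assumed to have the shape of transformers'
    `model.hf_device_map`: dotted module names mapped to a GPU index or
    a string such as "cpu".
    """
    parts = module_name.split(".")
    # Try the full name first, then each ancestor, so the most specific
    # entry wins (e.g. "transformer.h.10" covers "transformer.h.10.mlp").
    # Splitting on dots means "transformer.h.1" never matches "transformer.h.10".
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in device_map:
            return device_map[prefix]
    # A "" key, if present, means the whole model sits on one device.
    return device_map.get("")


# Illustrative placement only (not the issue's real map): 40 Qwen-14B blocks
# spread evenly over 8 GPUs, 5 blocks per GPU.
device_map = {"transformer.wte": 0, "transformer.ln_f": 7, "lm_head": 7}
device_map.update({f"transformer.h.{i}": i // 5 for i in range(40)})

hparams_device = 0  # `device: 0` in qwen-14b.yaml
edited_mlp = "transformer.h.{}.mlp".format(10)  # mlp_module_tmp with layers: [10]
layer_device = resolve_module_device(device_map, edited_mlp)
print(layer_device, layer_device == hparams_device)  # 2 False
```

If the device reported for the edited MLP differs from the configured device, any tensor created on the configured device and combined with that layer's output will hit exactly this kind of mismatch. How EasyEdit actually chooses the placement is not shown in the issue.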
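The traceback ends inside the ROME forward hook at cur_out[i, idx, :] += delta. Here cur_out is the layer-10 MLP output, which lives on the GPU hosting that block, while delta presumably lives on hparams.device. Below is a minimal sketch of a device-safe version of that hook, assuming delta is created on hparams.device. make_edit_output_fn is a hypothetical stand-in rather than EasyEdit's actual compute_v code, and other tensors in compute_v (for example the targets used in the loss) may need the same treatment; that part is not verified here.

```python
import torch


def make_edit_output_fn(delta, lookup_idxs, target_layer):
    """Hypothetical hook factory mirroring the failing line in compute_v.py.

    Adds `delta` to the hidden state at each lookup position of the
    `target_layer` output, moving `delta` to that output's device first.
    """

    def edit_output_fn(cur_out, cur_layer):
        if cur_layer == target_layer:
            # Under model parallelism cur_out lives on the GPU hosting this
            # block (cuda:2 in the traceback), while delta is assumed to be
            # on hparams.device (cuda:0). .to() is differentiable, so
            # gradients still flow back to the original delta.
            d = delta.to(cur_out.device)
            for i, idx in enumerate(lookup_idxs):
                cur_out[i, idx, :] += d
        return cur_out

    return edit_output_fn


# CPU-only demo: both tensors share a device, so .to() is a no-op here.
delta = torch.ones(4, requires_grad=True)
fn = make_edit_output_fn(delta, lookup_idxs=[2, 3], target_layer="transformer.h.10.mlp")
edited = fn(torch.zeros(2, 5, 4), "transformer.h.10.mlp")
edited.sum().backward()
print(delta.grad)  # tensor([2., 2., 2., 2.])
```

Moving delta toward the activation, rather than the activation toward delta, leaves the model's own placement untouched and keeps a single optimizable delta on one device.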