
Setting CUDA_VISIBLE_DEVICES to anything other than 0 causes an invalid device ordinal error on CUDA #670

Closed
J4Q8 opened this issue Jun 19, 2024 · 2 comments

Comments


J4Q8 commented Jun 19, 2024

Hi!

I am using mistralai/Mistral-7B-v0.1, and when I set CUDA_VISIBLE_DEVICES=1 (or anything other than 0), I get the error below. After some digging, I think the culprit is this line: it uses the number specified in CUDA_VISIBLE_DEVICES as the device ordinal, but once that variable is set, the visible GPUs are renumbered starting from zero. Perhaps just using "cuda:0" there is sufficient. Could you fix this?
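
To illustrate, here is a minimal repro sketch (my own code, not unsloth's), assuming a machine with at least two GPUs:

```python
# Run as:  CUDA_VISIBLE_DEVICES=1 python repro.py
import torch

# Only one GPU is visible to the process, and it is renumbered as ordinal 0.
print(torch.cuda.device_count())  # -> 1

ok = torch.full((4, 1), -100, device="cuda:0")   # works: physical GPU 1, visible as cuda:0
bad = torch.full((4, 1), -100, device="cuda:1")  # RuntimeError: CUDA error: invalid device ordinal
```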

Thanks in advance!

P.S. I have only tried this with the Mistral model, but it may well affect your other supported models too.

Error trace:

Traceback (most recent call last):
  File "/home/jlucki/s2024-safety-interp/experiments/robustness_evaluation/finetune.py", line 98, in <module>
    model = FastLanguageModel.get_peft_model(
  File "/local/home/jlucki/mambaforge/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 1709, in get_peft_model
    model = FastLlamaModel.patch_peft_model(model, use_gradient_checkpointing)
  File "/local/home/jlucki/mambaforge/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 1908, in patch_peft_model
    extra_ignored_labels = torch.full((max_seq_length, 1), -100, device = device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
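
For reference, here is a hedged sketch of the change I have in mind (the variable names are my assumptions, not the actual code in llama.py): build the device string from the renumbered ordinal of the current device rather than from the raw CUDA_VISIBLE_DEVICES value.

```python
import os
import torch

max_seq_length = 2048  # example value

# Assumed shape of the bug: the raw CUDA_VISIBLE_DEVICES value is used as the
# device ordinal, which is invalid once the visible GPUs are renumbered.
# visible = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
# device = f"cuda:{visible}"  # "cuda:1" -> invalid device ordinal

# Suggested fix: address the (renumbered) current device directly.
device = f"cuda:{torch.cuda.current_device()}"  # "cuda:0" when one GPU is visible
extra_ignored_labels = torch.full((max_seq_length, 1), -100, device=device)
```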
@danielhanchen (Contributor) commented

Oh interesting hmm

@danielhanchen (Contributor) commented

Oh fixed! This actually resolved some issues in #660! Thanks!
