Error supporting Deepseek, Yi #8
Comments
Seconding this, it would be awesome to see support for https://huggingface.co/chargoddard/Yi-6B-200K-Llama, for example!
Working on it!!!
@pszemraj Possibly made it work with Yi, and maybe with the one you mentioned; it would be awesome if you could check!!
Hey thanks! I took a look and am still having an issue (that said, it is a different error than the one I had before, which I unfortunately didn't write down). Unsure if this is Colab-specific or not; let me know if I should reopen this or open a new issue.

version/install

error message: this is after trying to run the SFT trainer

wandb: Currently logged in as: pszemraj. Use `wandb login --relogin` to force relogin
Tracking run with wandb version 0.16.1
Run data is saved locally in /content/wandb/run-20231206_001547-47uo1ewk
Syncing run unsloth-llama7b-knowledge-inoc-yt-audio-default-21e8 to Weights & Biases (docs)
View project at https://wandb.ai/pszemraj/sloth-me-maybe
View run at https://wandb.ai/pszemraj/sloth-me-maybe/runs/47uo1ewk
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-12-3d62c575fcfd> in <cell line: 1>()
----> 1 trainer_stats = trainer.train()
9 frames
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py in train(self, *args, **kwargs)
278 self.model = self._trl_activate_neftune(self.model)
279
--> 280 output = super().train(*args, **kwargs)
281
282 # After training we make sure to retrieve back the original forward pass method
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1544 # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
1545 hf_hub_utils.disable_progress_bars()
-> 1546 return inner_training_loop(
1547 args=args,
1548 resume_from_checkpoint=resume_from_checkpoint,
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1858
1859 with self.accelerator.accumulate(model):
-> 1860 tr_loss_step = self.training_step(model, inputs)
1861
1862 if (
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
2732 scaled_loss.backward()
2733 else:
-> 2734 self.accelerator.backward(loss)
2735
2736 return loss.detach() / self.args.gradient_accumulation_steps
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
1901 return
1902 elif self.scaler is not None:
-> 1903 self.scaler.scale(loss).backward(**kwargs)
1904 else:
1905 loss.backward(**kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
490 inputs=inputs,
491 )
--> 492 torch.autograd.backward(
493 self, gradient, retain_graph, create_graph, inputs=inputs
494 )
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py in apply(self, *args)
286 )
287 user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 288 return user_fn(self, *args)
289
290 def apply_jvp(self, *args):
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
286 " this checkpoint() is not necessary"
287 )
--> 288 torch.autograd.backward(outputs_with_grad, args_with_grad)
289 grads = tuple(
290 inp.grad if isinstance(inp, torch.Tensor) else None
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

replication: I put a sharded checkpoint on the hub to make it easier for Colab testing :) you should just be able to drop https://huggingface.co/pszemraj/Yi-6B-200K-Llama-sharded into your colab demo to replicate
Original issue description: From Reddit, Discord bugs