
Error supporting Deepseek, Yi #8

Closed · danielhanchen opened this issue Dec 2, 2023 · 4 comments

Comments

@danielhanchen (Contributor) commented Dec 2, 2023

From Reddit and Discord bug reports.

pszemraj commented Dec 3, 2023

Seconding this; it would be awesome to see support for https://huggingface.co/chargoddard/Yi-6B-200K-Llama, for example!

@danielhanchen (Contributor, Author)

Working on it!!!

@danielhanchen (Contributor, Author)

@pszemraj Possibly made it work with Yi and maybe the one you mentioned - would be awesome if you could check!!

pszemraj commented Dec 6, 2023

Hey, thanks! I took a look and am still hitting an issue (it's a different error than the one I had before, but unfortunately I didn't write that one down). I'm not sure whether this is Colab-specific; let me know if I should reopen this or open a new issue.

version/install

!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"
!pip install sentencepiece
==((====))==  Unsloth: Fast Llama patching release 2023.12
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB
O^O/ \_/ \    CUDA compute capability = 7.5
\        /    Pytorch version: 2.1.0+cu118. CUDA Toolkit = 11.8
 "-____-"     bfloat16 support = FALSE

Loading checkpoint shards: 100%
5/5 [01:03<00:00, 10.67s/it]
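
(As a sanity check on the banner above, the capability and bfloat16 values can be queried directly; this is plain PyTorch, not an unsloth API, and the expected values in the comments are just what a T4 should report.)

import torch

# Tesla T4s are sm_75: compute capability (7, 5) and no native bfloat16.
print(torch.cuda.get_device_name(0))        # e.g. "Tesla T4"
print(torch.cuda.get_device_capability(0))  # expected (7, 5)
print(torch.cuda.is_bf16_supported())       # expected False on a T4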

error message

This is after trying to run the SFT trainer.

wandb: Currently logged in as: pszemraj. Use `wandb login --relogin` to force relogin
Tracking run with wandb version 0.16.1
Run data is saved locally in /content/wandb/run-20231206_001547-47uo1ewk
Syncing run unsloth-llama7b-knowledge-inoc-yt-audio-default-21e8 to Weights & Biases (docs)
View project at https://wandb.ai/pszemraj/sloth-me-maybe
View run at https://wandb.ai/pszemraj/sloth-me-maybe/runs/47uo1ewk
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-3d62c575fcfd> in <cell line: 1>()
----> 1 trainer_stats = trainer.train()

9 frames
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py in train(self, *args, **kwargs)
    278             self.model = self._trl_activate_neftune(self.model)
    279 
--> 280         output = super().train(*args, **kwargs)
    281 
    282         # After training we make sure to retrieve back the original forward pass method

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1544                 # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
   1545                 hf_hub_utils.disable_progress_bars()
-> 1546                 return inner_training_loop(
   1547                     args=args,
   1548                     resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1858 
   1859                 with self.accelerator.accumulate(model):
-> 1860                     tr_loss_step = self.training_step(model, inputs)
   1861 
   1862                 if (

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
   2732                 scaled_loss.backward()
   2733         else:
-> 2734             self.accelerator.backward(loss)
   2735 
   2736         return loss.detach() / self.args.gradient_accumulation_steps

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
   1901             return
   1902         elif self.scaler is not None:
-> 1903             self.scaler.scale(loss).backward(**kwargs)
   1904         else:
   1905             loss.backward(**kwargs)

/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    490                 inputs=inputs,
    491             )
--> 492         torch.autograd.backward(
    493             self, gradient, retain_graph, create_graph, inputs=inputs
    494         )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249     # some Python versions print out the first line of a multi-line function
    250     # calls in the traceback and some print out the last line
--> 251     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252         tensors,
    253         grad_tensors_,

/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py in apply(self, *args)
    286             )
    287         user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 288         return user_fn(self, *args)
    289 
    290     def apply_jvp(self, *args):

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
    286                 " this checkpoint() is not necessary"
    287             )
--> 288         torch.autograd.backward(outputs_with_grad, args_with_grad)
    289         grads = tuple(
    290             inp.grad if isinstance(inp, torch.Tensor) else None

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249     # some Python versions print out the first line of a multi-line function
    250     # calls in the traceback and some print out the last line
--> 251     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252         tensors,
    253         grad_tensors_,

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
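
For what it's worth, this error usually comes from PyTorch's fused flash-attention backward, which is only implemented for sm_80/sm_90 (A100/H100-class) GPUs; a T4 is sm_75, so the backward pass refuses to run. A minimal workaround sketch, assuming the attention here routes through torch.nn.functional.scaled_dot_product_attention; sdp_kernel is a plain PyTorch 2.1 context manager, not anything unsloth-specific:

import torch

# Disable the flash backend so scaled_dot_product_attention falls back to the
# math / memory-efficient kernels, which do have a backward pass on sm_75 (T4).
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=True
):
    trainer_stats = trainer.train()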

replication

I put a sharded checkpoint on the Hub to make Colab testing easier :) You should just be able to drop https://huggingface.co/pszemraj/Yi-6B-200K-Llama-sharded into your Colab demo to replicate.
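
A rough sketch of the swap I have in mind, assuming the demo loads models via FastLanguageModel.from_pretrained (the class name and arguments are taken from the unsloth README, so treat them as assumptions rather than the exact call in this release):

from unsloth import FastLanguageModel

# Hypothetical: point the Colab demo's loader at the sharded Yi checkpoint above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "pszemraj/Yi-6B-200K-Llama-sharded",
    max_seq_length = 2048,   # demo default (assumed); adjust as needed
    dtype = None,            # auto-detect; a T4 will pick float16
    load_in_4bit = True,
)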
