
Error supporting Deepseek, Yi #8

Closed · danielhanchen opened this issue Dec 2, 2023 · 4 comments

Comments

@danielhanchen (Contributor) commented Dec 2, 2023

From Reddit and Discord bug reports.

pszemraj commented Dec 3, 2023

Seconding this; it would be awesome to see support for https://huggingface.co/chargoddard/Yi-6B-200K-Llama, for example!

@danielhanchen (Contributor, Author)

Working on it!!!

@danielhanchen (Contributor, Author)

@pszemraj Possibly made it work with Yi and maybe the one you mentioned - would be awesome if you could check!!

pszemraj commented Dec 6, 2023

Hey, thanks! I took a look and am still hitting an issue (it's a different error than the one I had before, but unfortunately I didn't write that one down). I'm not sure whether this is Colab-specific; let me know if I should reopen this or open a new issue.

version/install

!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"
!pip install sentencepiece
==((====))==  Unsloth: Fast Llama patching release 2023.12
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB
O^O/ \_/ \    CUDA compute capability = 7.5
\        /    Pytorch version: 2.1.0+cu118. CUDA Toolkit = 11.8
 "-____-"     bfloat16 support = FALSE

Loading checkpoint shards: 100%
5/5 [01:03<00:00, 10.67s/it]
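
(As a sanity check on the banner above, the capability and bfloat16 values can be queried directly; this is plain PyTorch, not an unsloth API, and the expected values in the comments are just what a T4 should report.)

import torch

# Tesla T4s are sm_75: compute capability (7, 5) and no native bfloat16.
print(torch.cuda.get_device_name(0))        # e.g. "Tesla T4"
print(torch.cuda.get_device_capability(0))  # expected (7, 5)
print(torch.cuda.is_bf16_supported())       # expected False on a T4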

error message

This is after trying to run the SFT trainer.

wandb: Currently logged in as: pszemraj. Use `wandb login --relogin` to force relogin
Tracking run with wandb version 0.16.1
Run data is saved locally in /content/wandb/run-20231206_001547-47uo1ewk
Syncing run unsloth-llama7b-knowledge-inoc-yt-audio-default-21e8 to Weights & Biases (docs)
View project at https://wandb.ai/pszemraj/sloth-me-maybe
View run at https://wandb.ai/pszemraj/sloth-me-maybe/runs/47uo1ewk
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-3d62c575fcfd> in <cell line: 1>()
----> 1 trainer_stats = trainer.train()

9 frames
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py in train(self, *args, **kwargs)
    278             self.model = self._trl_activate_neftune(self.model)
    279 
--> 280         output = super().train(*args, **kwargs)
    281 
    282         # After training we make sure to retrieve back the original forward pass method

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1544                 # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
   1545                 hf_hub_utils.disable_progress_bars()
-> 1546                 return inner_training_loop(
   1547                     args=args,
   1548                     resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1858 
   1859                 with self.accelerator.accumulate(model):
-> 1860                     tr_loss_step = self.training_step(model, inputs)
   1861 
   1862                 if (

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
   2732                 scaled_loss.backward()
   2733         else:
-> 2734             self.accelerator.backward(loss)
   2735 
   2736         return loss.detach() / self.args.gradient_accumulation_steps

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
   1901             return
   1902         elif self.scaler is not None:
-> 1903             self.scaler.scale(loss).backward(**kwargs)
   1904         else:
   1905             loss.backward(**kwargs)

/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    490                 inputs=inputs,
    491             )
--> 492         torch.autograd.backward(
    493             self, gradient, retain_graph, create_graph, inputs=inputs
    494         )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249     # some Python versions print out the first line of a multi-line function
    250     # calls in the traceback and some print out the last line
--> 251     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252         tensors,
    253         grad_tensors_,

/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py in apply(self, *args)
    286             )
    287         user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 288         return user_fn(self, *args)
    289 
    290     def apply_jvp(self, *args):

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
    286                 " this checkpoint() is not necessary"
    287             )
--> 288         torch.autograd.backward(outputs_with_grad, args_with_grad)
    289         grads = tuple(
    290             inp.grad if isinstance(inp, torch.Tensor) else None

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249     # some Python versions print out the first line of a multi-line function
    250     # calls in the traceback and some print out the last line
--> 251     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252         tensors,
    253         grad_tensors_,

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
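
For what it's worth, this error usually comes from PyTorch's fused flash-attention backward, which is only implemented for sm_80/sm_90 (A100/H100-class) GPUs; a T4 is sm_75, so the backward pass refuses to run. A minimal workaround sketch, assuming the attention here routes through torch.nn.functional.scaled_dot_product_attention; sdp_kernel is a plain PyTorch 2.1 context manager, not anything unsloth-specific:

import torch

# Disable the flash backend so scaled_dot_product_attention falls back to the
# math / memory-efficient kernels, which do have a backward pass on sm_75 (T4).
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=True
):
    trainer_stats = trainer.train()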

replication

I put a sharded checkpoint on the Hub to make Colab testing easier :) You should just be able to drop https://huggingface.co/pszemraj/Yi-6B-200K-Llama-sharded into your Colab demo to replicate.
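
A rough sketch of the swap I have in mind, assuming the demo loads models via FastLanguageModel.from_pretrained (the class name and arguments are taken from the unsloth README, so treat them as assumptions rather than the exact call in this release):

from unsloth import FastLanguageModel

# Hypothetical: point the Colab demo's loader at the sharded Yi checkpoint above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "pszemraj/Yi-6B-200K-Llama-sharded",
    max_seq_length = 2048,   # demo default (assumed); adjust as needed
    dtype = None,            # auto-detect; a T4 will pick float16
    load_in_4bit = True,
)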
