
kaggle :: GPU P100 :: TypeError: LoraLayer_update_layer() got an unexpected keyword argument 'use_dora' #201

Open
dsbyprateekg opened this issue Feb 28, 2024 · 23 comments

@dsbyprateekg

Hi,

I am trying to run the Alpaca + Gemma 7b full example.ipynb notebook in the Kaggle environment and am getting the following error:

[screenshot of the error]

while running the code below:

[screenshot of the code cell]

The installed library versions are: langchain-0.1.9, langchain-community-0.0.24, langchain-core-0.1.27, sentence-transformers-2.4.0.
Please have a look at this issue.

@Jonaskouwenhoven

Just encountered the same error on Colab. Seems to be a new issue.

@DeanChugall

Just downgrade HF PEFT to 0.8.2 until the Unsloth team fixes the new DoRA support from HF PEFT.

!pip install --force-reinstall --no-cache-dir peft==0.8.2
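As a quick sanity check, you can confirm the downgrade took effect before re-running the notebook (restart the runtime first so the reinstalled package is loaded); a minimal check, assuming a standard Python environment:

import importlib.metadata as metadata

# Confirm the pinned PEFT version after the forced reinstall.
peft_version = metadata.version("peft")
print("peft version:", peft_version)
assert peft_version == "0.8.2", "Restart the runtime so the downgraded PEFT is loaded."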

@danielhanchen
Contributor

Oh my, I will get this fixed ASAP.

@danielhanchen danielhanchen added the URGENT BUG Urgent bug label Feb 28, 2024
@RonanKMcGovern

Yeah, it's because HuggingFace just merged their DoRA branch to main in the last few days. That new argument is probably slipping through.

@DeanChugall

It would be great if we could integrate PEFT internally in Unsloth to prevent breaking changes like this coming from external packages.

@BenjaminBossan

Thanks @RonanKMcGovern for sending me here.

Let's set up CI using PEFT and unsloth main to prevent this in the future. Do you want to set it up on your side or should we look into adding it to PEFT?

Regarding this specific error, if possible, add **kwargs to the method so that future additions won't lead to the same kind of error.
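For illustration, a minimal sketch of the idea (hypothetical function names, not the actual PEFT or Unsloth code): an explicit parameter list breaks as soon as the caller passes a new keyword such as use_dora, while a **kwargs signature absorbs it.

# A patched method that spells out every parameter fails as soon as the
# upstream caller passes a keyword it does not know about.
def update_layer_strict(adapter_name, r, lora_alpha):
    return adapter_name, r, lora_alpha

# Accepting **kwargs lets keywords added by newer PEFT releases (e.g. use_dora)
# be ignored or forwarded instead of raising a TypeError.
def update_layer_tolerant(adapter_name, r, lora_alpha, **kwargs):
    return adapter_name, r, lora_alpha

try:
    update_layer_strict("default", 16, 16, use_dora=False)
except TypeError as e:
    print("strict signature:", e)  # ... got an unexpected keyword argument 'use_dora'

update_layer_tolerant("default", 16, 16, use_dora=False)  # works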

@danielhanchen
Contributor

@BenjaminBossan Should be fine in the future hopefully - I rewrote the code to use inspect.getsource to patch it internally :) I used to have 1 custom function, but now it's dynamic patching
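Roughly, here is a minimal sketch of what inspect.getsource-based dynamic patching looks like (illustrative only, with my own simplifications, not the actual Unsloth implementation; inspect.getsource needs the function's source file, so run this as a script rather than in a bare REPL):

import inspect

def greet(name):
    return "Hello, " + name

# Fetch the live source text of the function to patch.
source = inspect.getsource(greet)

# Rewrite the source as plain text; a real patch would be more surgical.
patched_source = source.replace("Hello", "Hi")

# Re-execute the edited source and rebind the name to the patched function.
namespace = {}
exec(patched_source, namespace)
greet = namespace["greet"]

print(greet("world"))  # prints "Hi, world"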

@danielhanchen
Contributor

Doing some tests on my end and will push it asap!! Sorry everyone for the issue and also thanks for notifying me!

@danielhanchen
Contributor

@DeanChugall @dsbyprateekg @Jonaskouwenhoven Again sorry - just fixed it!! On Kaggle / Colab, a reinstall of Unsloth will have to take place - no need to disconnect - just press restart and run all.

For local machines: pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

Again sorry and also thanks for notifying me!!

@danielhanchen danielhanchen added the fixed - pending confirmation Fixed, waiting for confirmation from poster label Feb 28, 2024
@dsbyprateekg
Author

@danielhanchen Thanks a lot for the quick response and the fix.
It's working, but I'm now facing another error:

[screenshot of the error]

ValueError: Invalid pattern: '**' can only be an entire path component

Can you please check and help me to resolve this as well?

@danielhanchen
Contributor

@dsbyprateekg That's a weird bug - do u have a more complete error trace - ie are u just using our notebook?

@dsbyprateekg
Author

@dsbyprateekg That's a weird bug - do u have a more complete error trace - ie are u just using our notebook?

It's my bad, I forgot to attach the logs.
Please find attached the complete logs of the error-
logs_kaggle.txt

@danielhanchen
Contributor

@dsbyprateekg Is ur Kaggle instance connected to the internet?

@dsbyprateekg
Author

@dsbyprateekg Is ur Kaggle instance connected to the internet?

Yes.

@danielhanchen
Contributor

Hmm, weird bug indeed.

@danielhanchen
Contributor

@dsbyprateekg Oh, try pip install --upgrade datasets - I might have to change the datasets version.
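For context, the Invalid pattern: '**' can only be an entire path component error usually comes from an older datasets release paired with a newer fsspec, so upgrading both typically clears it (this is my reading of the error message, not something verified beyond the datasets upgrade above):

!pip install --upgrade datasets fsspec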

@dsbyprateekg
Author

@DeanChugall Thanks again! It solved my issue and I am able to proceed.

@danielhanchen
Contributor

@dsbyprateekg Oh the datasets issue is fine as well? Also I'll reopen this temporarily for people who might have the same issue!! I'll close this in a few days :)

@danielhanchen danielhanchen reopened this Feb 28, 2024
@danielhanchen danielhanchen added fixed Fixed! and removed URGENT BUG Urgent bug fixed - pending confirmation Fixed, waiting for confirmation from poster labels Feb 28, 2024
@dsbyprateekg
Author

@danielhanchen Yes, the datasets issue was also resolved, but now I'm facing another error:
TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens while running the training command:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        max_steps = None,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Logs are attached.
logs_kaggle.txt

@dsbyprateekg
Author

So the issue was resolved once I commented out the line max_steps = None.
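For reference, a minimal sketch of the working arguments (illustrative only; the key point is that max_steps must either be omitted, which defaults to -1, or be an integer, since the Trainer compares it with '>' internally):

from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 5,
    num_train_epochs = 1,
    # max_steps = None,  # leave this out (default -1) or pass an int; None triggers the '>' error
    learning_rate = 2e-4,
    seed = 3407,
    output_dir = "outputs",
)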

The next error is with the command trainer_stats = trainer.train(), and it's related to wandb login. Although I have not used wandb anywhere in the code, it seems to be picked up internally.
UsageError Traceback (most recent call last)
Cell In[11], line 1
----> 1 trainer_stats = trainer.train()

File /opt/conda/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:331, in SFTTrainer.train(self, *args, **kwargs)
328 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
329 self.model = self._trl_activate_neftune(self.model)
--> 331 output = super().train(*args, **kwargs)
333 # After training we make sure to retrieve back the original forward pass method
334 # for the embedding layer by removing the forward post hook.
335 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1622 hf_hub_utils.enable_progress_bars()
1623 else:
-> 1624 return inner_training_loop(
1625 args=args,
1626 resume_from_checkpoint=resume_from_checkpoint,
1627 trial=trial,
1628 ignore_keys_for_eval=ignore_keys_for_eval,
1629 )

File :272, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:370, in CallbackHandler.on_train_begin(self, args, state, control)
368 def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
369 control.should_training_stop = False
--> 370 return self.call_event("on_train_begin", args, state, control)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py:414, in CallbackHandler.call_event(self, event, args, state, control, **kwargs)
412 def call_event(self, event, args, state, control, **kwargs):
413 for callback in self.callbacks:
--> 414 result = getattr(callback, event)(
415 args,
416 state,
417 control,
418 model=self.model,
419 tokenizer=self.tokenizer,
420 optimizer=self.optimizer,
421 lr_scheduler=self.lr_scheduler,
422 train_dataloader=self.train_dataloader,
423 eval_dataloader=self.eval_dataloader,
424 **kwargs,
425 )
426 # A Callback can skip the return of control if it doesn't change it.
427 if result is not None:

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:767, in WandbCallback.on_train_begin(self, args, state, control, model, **kwargs)
765 args.run_name = None
766 if not self._initialized:
--> 767 self.setup(args, state, model, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/integration_utils.py:740, in WandbCallback.setup(self, args, state, model, **kwargs)
737 init_args["name"] = args.run_name
739 if self._wandb.run is None:
--> 740 self._wandb.init(
741 project=os.getenv("WANDB_PROJECT", "huggingface"),
742 **init_args,
743 )
744 # add config parameters (run may have been created manually)
745 self._wandb.config.update(combined_dict, allow_val_change=True)

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1195, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1193 if logger is not None:
1194 logger.exception(str(e))
-> 1195 raise e
1196 except KeyboardInterrupt as e:
1197 assert logger

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:1172, in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
1170 try:
1171 wi = _WandbInit()
-> 1172 wi.setup(kwargs)
1173 assert wi.settings
1174 except_exit = wi.settings._except_exit

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_init.py:306, in _WandbInit.setup(self, kwargs)
303 settings.update(init_settings, source=Source.INIT)
305 if not settings._offline and not settings._noop:
--> 306 wandb_login._login(
307 anonymous=kwargs.pop("anonymous", None),
308 force=kwargs.pop("force", None),
309 _disable_warning=True,
310 _silent=settings.quiet or settings.silent,
311 _entity=kwargs.get("entity") or settings.entity,
312 )
314 # apply updated global state after login was handled
315 wl = wandb.setup()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:317, in _login(anonymous, key, relogin, host, force, timeout, _backend, _silent, _disable_warning, _entity)
314 return logged_in
316 if not key:
--> 317 wlogin.prompt_api_key()
319 # make sure login credentials get to the backend
320 wlogin.propogate_login()

File /opt/conda/lib/python3.10/site-packages/wandb/sdk/wandb_login.py:247, in _WandbLogin.prompt_api_key(self)
241 if status == ApiKeyStatus.NOTTY:
242 directive = (
243 "wandb login [your_api_key]"
244 if self._settings._cli_only_mode
245 else "wandb.login(key=[your_api_key])"
246 )
--> 247 raise UsageError("api_key not configured (no-tty). call " + directive)
249 self.update_session(key, status=status)
250 self._key = key

UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

@danielhanchen
Contributor

@dsbyprateekg On wandb:

import os
os.environ["WANDB_DISABLED"] = "true"

then for TrainingArgs:

  seed = 3407,
  output_dir = "outputs",
  report_to = "none",

@dsbyprateekg
Author

dsbyprateekg commented Feb 29, 2024

@danielhanchen I have added my wandb login, but now I am facing a nbclient.exceptions.DeadKernelError: Kernel died error while running the training with trainer_stats = trainer.train().

Please check the logs and see if you find something wrong here.
logs_kaggle.txt

@danielhanchen
Contributor

@dsbyprateekg Oh, on the topic of Kaggle - would the Mistral notebook we have help? https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook I tested that vigorously, so hopefully that one doesn't have any issues.
