Unable to save the model #166
Comments
My peft is at commit id 13e53fc, installed exactly as the installation instructions require.
It is probably a transformers version problem; I have the same issue.
My transformers is 4.31.0, as the author requires. How did you solve it?
This happens because you launched with --deepspeed ${deepspeed_config_file} but are running on a single machine with a single GPU. On my side, either of the following two fixes works: 1. the first is to remove … 2. the second is to add …
Thanks, that approach solved it for me.
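(The inline code in the answer above was stripped during extraction, so the two options are truncated. A minimal sketch of what they plausibly mean, assuming option 1 is dropping the --deepspeed flag and option 2 is starting the script through a distributed launcher; the torchrun flags and the ${other_args} placeholder are illustrative, not taken from the thread:)

# Option 1 (sketch): drop --deepspeed and launch with plain python,
# keeping every other flag from the original command.
python run_clm_sft_with_peft.py \
    --model_name_or_path ${pretrained_model} \
    ${other_args}    # hypothetical placeholder for the rest of the original flags

# Option 2 (sketch): keep --deepspeed but launch through torchrun so the
# distributed environment DeepSpeed expects is actually initialized.
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    ${other_args}    # hypothetical placeholder for the rest of the original flags

Either way, this avoids the failure visible in the traceback below: at trainer.py line 2298 the Trainer calls model_wrapped.save_checkpoint(output_dir), a DeepSpeed engine method, but without a properly initialized engine the lookup falls through PeftModelForCausalLM, LoraModel, and LlamaForCausalLM, none of which define save_checkpoint.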
Checklist completed before submitting
Issue type
Model training and fine-tuning
Base model
Alpaca-2-7B
Operating system
Linux
Detailed description of the problem
python run_clm_sft_with_peft.py --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 3 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 5 \
    --save_steps 5 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length 1024 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float32 \
    --validation_file ${validation_file} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
Dependencies (must be provided for code-related issues)
peft 0.3.0.dev0 /data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src
torch 2.0.1
transformers 4.31.0
Run logs or screenshots
Traceback (most recent call last):
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/peft_model.py", line 287, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PeftModelForCausalLM' object has no attribute 'save_checkpoint'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/tuners/lora.py", line 211, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LoraModel' object has no attribute 'save_checkpoint'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_clm_sft_with_peft.py", line 442, in <module>
    main()
  File "run_clm_sft_with_peft.py", line 414, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1901, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2237, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2298, in _save_checkpoint
    self.model_wrapped.save_checkpoint(output_dir)
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/peft_model.py", line 289, in __getattr__
    return getattr(self.base_model, name)
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/tuners/lora.py", line 213, in __getattr__
    return getattr(self.model, name)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlamaForCausalLM' object has no attribute 'save_checkpoint'