Unable to save the model #166

Closed
3 tasks done
luhairong11 opened this issue Aug 22, 2023 · 6 comments

Comments

@luhairong11

Checklist required before submitting

  • Please make sure you are using the latest code from the repository (git pull); some problems have already been resolved and fixed.
  • I have read the FAQ section of the project documentation and searched the existing issues, and found no similar problem or solution.
  • Third-party plugin problems: e.g. llama.cpp, text-generation-webui, etc.; it is also recommended to look for solutions in the corresponding projects.

Issue type

Model training and fine-tuning

Base model

Alpaca-2-7B

Operating system

Linux

Detailed description of the problem

python run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 3 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 5 \
    --save_steps 5 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length 1024 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float32 \
    --validation_file ${validation_file} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False

Dependencies (must be provided for code-related issues)

peft 0.3.0.dev0 /data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src
torch 2.0.1
transformers 4.31.0

Run logs or screenshots

Traceback (most recent call last):
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/peft_model.py", line 287, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PeftModelForCausalLM' object has no attribute 'save_checkpoint'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/tuners/lora.py", line 211, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LoraModel' object has no attribute 'save_checkpoint'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_clm_sft_with_peft.py", line 442, in <module>
    main()
  File "run_clm_sft_with_peft.py", line 414, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1901, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2237, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2298, in _save_checkpoint
    self.model_wrapped.save_checkpoint(output_dir)
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/peft_model.py", line 289, in __getattr__
    return getattr(self.base_model, name)
  File "/data/luhairong/deeplearn/NLP/Chinese-LLaMA-Alpaca-2/3rd/peft/src/peft/tuners/lora.py", line 213, in __getattr__
    return getattr(self.model, name)
  File "/data/luhairong/anaconda3/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlamaForCausalLM' object has no attribute 'save_checkpoint'
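
For context, save_checkpoint is a method of DeepSpeed's engine wrapper, not of nn.Module, so when no engine wrapping has taken place the lookup falls through the PEFT wrappers to the bare LlamaForCausalLM, which is exactly what the chained tracebacks show. A minimal runnable sketch of that attribute-delegation chain (hypothetical stand-in classes, not the actual peft code):

import torch.nn as nn

class WrapperSketch(nn.Module):
    # Stands in for PeftModelForCausalLM / LoraModel: defer unknown
    # attributes first to nn.Module, then to the wrapped model.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)  # defer to nn.Module's logic
        except AttributeError:
            return getattr(self.model, name)  # then to the wrapped model

base = nn.Linear(4, 4)         # stands in for LlamaForCausalLM
wrapped = WrapperSketch(base)  # stands in for the PEFT wrappers

try:
    wrapped.save_checkpoint    # defined only on a DeepSpeed engine
except AttributeError as e:
    print(e)  # 'Linear' object has no attribute 'save_checkpoint'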

@luhairong11
Author

The peft commit id is 13e53fc, which was installed as required by the setup instructions.

@xxw1995

xxw1995 commented Aug 23, 2023

> The peft commit id is 13e53fc, which was installed as required by the setup instructions.

It's probably a transformers version problem; I'm hitting this too.

@luhairong11
Author

> The peft commit id is 13e53fc, which was installed as required by the setup instructions.
>
> It's probably a transformers version problem; I'm hitting this too.

My transformers is version 4.31.0, as the author requires. How did you solve it?

@xxw1995

xxw1995 commented Aug 23, 2023

> The peft commit id is 13e53fc, which was installed as required by the setup instructions.
>
> It's probably a transformers version problem; I'm hitting this too.
>
> My transformers is version 4.31.0, as the author requires. How did you solve it?
Not solved yet. I tried rewriting the Trainer so it doesn't use the SavePeftModelCallback from the code, but it still errors out.
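
For reference, a minimal sketch of such a callback, assuming only the public transformers/peft APIs (the class name and details here are hypothetical, not the repo's actual SavePeftModelCallback):

import os
from transformers import TrainerCallback

class SaveAdapterCallback(TrainerCallback):
    # Hypothetical stand-in: write only the LoRA adapter at each save
    # instead of relying on model_wrapped.save_checkpoint().
    def on_save(self, args, state, control, **kwargs):
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        kwargs["model"].save_pretrained(ckpt_dir)  # adapter_config.json + adapter weights
        return control

It would be passed to the trainer via Trainer(..., callbacks=[SaveAdapterCallback()]).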

@Qznan
Contributor

Qznan commented Aug 24, 2023

This happens because you passed --deepspeed ${deepspeed_config_file} but launched single-machine, single-GPU with a plain python launch. On my side, either of the following two approaches fixes it:

1. Option one: drop the --deepspeed argument. For example:

python run_clm_sft_with_peft.py \
  --model_name_or_path ${pretrained_model} \
  --tokenizer_name_or_path ${chinese_tokenizer_path} \
   ......

2. Option two: add torchrun as the launcher on top. For example:

torchrun --standalone --nnodes 1 --nproc-per-node 1 \
run_clm_sft_with_peft.py \
  --deepspeed ${deepspeed_config_file} \
  --model_name_or_path ${pretrained_model} \
  --tokenizer_name_or_path ${chinese_tokenizer_path} \
  ......
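
A third variant, assuming a standard DeepSpeed install, would be the deepspeed launcher, which, like torchrun, sets up the distributed environment that a plain python launch lacks:

deepspeed --num_gpus 1 run_clm_sft_with_peft.py \
  --deepspeed ${deepspeed_config_file} \
  --model_name_or_path ${pretrained_model} \
  ......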

@luhairong11
Author

> This happens because you passed --deepspeed ${deepspeed_config_file} but launched single-machine, single-GPU with a plain python launch. On my side, either of the following two approaches fixes it:
>
> 1. Option one: drop the --deepspeed argument. For example:
>
> python run_clm_sft_with_peft.py \
>   --model_name_or_path ${pretrained_model} \
>   --tokenizer_name_or_path ${chinese_tokenizer_path} \
>   ......
>
> 2. Option two: add torchrun as the launcher on top. For example:
>
> torchrun --standalone --nnodes 1 --nproc-per-node 1 \
>   run_clm_sft_with_peft.py \
>   --deepspeed ${deepspeed_config_file} \
>   --model_name_or_path ${pretrained_model} \
>   --tokenizer_name_or_path ${chinese_tokenizer_path} \
>   ......

Thanks a lot; your approach solved it for me.
