System Info
- transformers version: 4.49.0
- Platform: Linux-6.6.0-72.0.0.64.oe2403.x86_64-x86_64-with-glibc2.38
- Python version: 3.10.16
- Huggingface_hub version: 0.29.1
- Safetensors version: 0.5.3
- Accelerate version: 1.4.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.2.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA L40
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I fine-tune Qwen2.5 with the following code:
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
dataset = load_dataset("trl-lib/Capybara", split="train")
dataset = dataset.select(range(500))
MODEL_ID = 'Qwen/Qwen2.5-0.5B'
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    modules_to_save=["lm_head", "embed_token"],
    task_type="CAUSAL_LM",
)
args = SFTConfig(
output_dir="Qwen2.5-0.5B-SFT-Capybara", # directory to save and repository id
num_train_epochs=1, # number of training epochs
per_device_train_batch_size=4, # batch size per device during training
gradient_accumulation_steps=4, # number of steps before performing a backward/update pass
gradient_checkpointing=True, # use gradient checkpointing to save memory
optim="adamw_torch_fused", # use fused adamw optimizer
logging_steps=10, # log every 10 steps
save_strategy="epoch", # save checkpoint every epoch
bf16=True, # use bfloat16 precision
tf32=True, # use tf32 precision
learning_rate=2e-4, # learning rate, based on QLoRA paper
max_grad_norm=0.3, # max gradient norm based on QLoRA paper
warmup_ratio=0.03, # warmup ratio based on QLoRA paper
lr_scheduler_type="constant", # use constant learning rate scheduler
push_to_hub=False, # push model to hub
# report_to="tensorboard", # report metrics to tensorboard
)
trainer = SFTTrainer(
    MODEL_ID,
    train_dataset=dataset,
    args=args,
    peft_config=peft_config,
)
trainer.train()
print('end')
and I use the following code for inference:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
peft_model_id = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
# peft_model_id = args.output_dir
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
prompt = "3的5倍是多少"
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=200,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
print(1)
An error occurs when loading the model with AutoPeftModelForCausalLM:
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Traceback (most recent call last):
File "/home/chenjq/.pycharm_helpers/pydev/pydevd.py", line 1500, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/chenjq/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/chenjq/pythonWork/nlp/test14.py", line 11, in <module>
model = AutoPeftModelForCausalLM.from_pretrained(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/auto.py", line 130, in from_pretrained
return cls._target_peft_class.from_pretrained(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 581, in from_pretrained
load_result = model.load_adapter(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 1239, in load_adapter
load_result = set_peft_model_state_dict(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 451, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 896]) from checkpoint, the shape in current model is torch.Size([151665, 896]).
Process finished with exit code 1
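For reference, the two shapes in the error seem to correspond to the base model's padded vocabulary size and the tokenizer length. The snippet below is just my own check (not part of the failing script); I am assuming it prints 151936 and 151665, matching the numbers in the error message:

from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-0.5B"

config = AutoConfig.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Assumption: config.vocab_size is the padded size (151936) that the saved
# lm_head weight uses, while len(tokenizer) (151665) is the size the model
# ends up with when the adapter is loaded.
print("config.vocab_size:", config.vocab_size)
print("len(tokenizer):", len(tokenizer))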
Expected behavior
I expect the model to load with AutoPeftModelForCausalLM and generate predictions normally.
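For comparison, a sketch of a possible workaround: load the base model explicitly (so it keeps its original 151936-row embedding and lm_head) and attach the adapter with PeftModel.from_pretrained instead of AutoPeftModelForCausalLM. This is only a sketch under the assumption that the mismatch comes from a tokenizer-based resize during loading; I have not verified it:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
base_model_id = "Qwen/Qwen2.5-0.5B"

# Load the base model with its original (padded) vocabulary size ...
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# ... then attach the LoRA adapter directly, without any resize step.
model = PeftModel.from_pretrained(base_model, peft_model_id)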