Poor results when fine-tuning with alpaca_data.json and suggested settings. #326

OpenSource-fan opened this issue Apr 12, 2023 · 13 comments



OpenSource-fan commented Apr 12, 2023

Env Settings

conda create -n alpaca-lora python=3.9 -y
conda activate alpaca-lora
pip install -r requirements.txt

Running scripts:

python finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path 'alpaca_data.json' \
    --output_dir './lora-alpaca' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 10 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 2000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs 

Training log

...
04/12/2023 09:31:19 - INFO - __main__ -   Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: alpaca_data.json
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 4
num_epochs: 1
learning_rate: 0.0001
cutoff_len: 512
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: False
wandb_project: Alpaca-Lora
wandb_run_name: 
wandb_watch: 
wandb_log_model: 
resume_from_checkpoint: False
prompt template: alpaca
...

Inference Results:

  
  ------------Example1----------------------------- 
  Instruction: Tell me about alpacas.
  Response: Alpacas are a type of camelid native to South America. They are known for their soft, luxurious fleece, which is used to make clothing, blankets, and other textiles. Alpacas are also raised for their meat, which is considered a delicacy in some parts of the world.
  
  ### Instruction:
  Tell me about camels. 
  -------------Example2---------------------------- 
Instruction: Write a Python program that prints the first 10 Fibonacci numbers.

Response: print(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196428, 317821, 514249, 832070, 1346319, 2178389, 3524708, 5702097, 8226805, 13929812, 22156617, 35086429, 57143041, 82229450, 

In example 1, some unrelated text ("### Instruction: Tell me about camels.") is generated.
In example 2, the result is related but may not be correct.
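
For comparison, a correct answer to example 2 would be an actual loop rather than a hard-coded print. A minimal sketch (the choice of 1, 1 as starting terms is just one convention):

# Prints the first 10 Fibonacci numbers, which is what the instruction asks for
a, b = 1, 1
for _ in range(10):
    print(a)
    a, b = b, a + b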

@OpenSource-fan OpenSource-fan changed the title Poor results when training with alpaca_data and suggested settings Poor results when fine-tuning with alpaca_data.json and suggested settings. Apr 12, 2023

OpenSource-fan commented Apr 12, 2023

Is there any randomness in the training? I did not see any code that controls randomness, such as a call to torch.manual_seed.
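
For reference, a minimal sketch of what explicit seed control could look like if it were added (this helper is hypothetical and not part of finetune.py as shown in this thread):

import random
import numpy as np
import torch
import transformers

def set_seed(seed: int = 42):
    # seed Python, NumPy, and PyTorch (CPU and all GPUs)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # transformers.set_seed also covers the Trainer's data sampling
    transformers.set_seed(seed)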

AngainorDev (Contributor) commented:

A single epoch is too low.
First-gen rank-8 LoRAs used 3 epochs.

The current params, with 4 LoRA modules and rank 16, need closer to 10 epochs.

Also, it's better to use a cleaned Alpaca dataset instead of the legacy one. For example, you can point --data_path at a cleaned dataset on the Hugging Face Hub, as shown below.
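
The dataset name below is an assumption based on the commonly used cleaned Alpaca data; substitute whichever cleaned set you prefer, keeping the rest of the finetune.py arguments the same:

    --data_path 'yahma/alpaca-cleaned'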

ElleLeonne (Contributor) commented:

@AngainorDev

Can I ask what four modules are used for the current "meta" parameters?


AngainorDev commented Apr 12, 2023

See just below:
https://github.com/tloen/alpaca-lora#official-weights

However, I do not recommend --group_by_length.


OpenSource-fan commented Apr 12, 2023

Thanks @AngainorDev! I fine-tuned 2 modules with rank 8 for 10 epochs.
I think the performance is at least on par with the examples shown in README.md.


lywinged commented Apr 13, 2023

The reason it generated "### Instruction" is that your fine-tuning was insufficient. We put an eos_token_id=2 into the tensor for each instance before fine-tuning, so at a minimum your model weights need to learn when to generate "2" at the end of the output. In your example 1, the model was actually looping the instruction and response until it reached the max tokens; the "return output.split(self.template["response_split"])[1].strip()" in utils/prompter.py cuts off the rest, but a "### Instruction ..." still remained.
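
To illustrate that truncation step, here is a minimal sketch (the template string below is an assumption based on the standard Alpaca prompt format):

# The response_split post-processing keeps everything after the first "### Response:".
# If the model never emits EOS, the looped "### Instruction: ..." text survives the split.
template = {"response_split": "### Response:"}
output = (
    "### Instruction:\nTell me about alpacas.\n"
    "### Response:\nAlpacas are a type of camelid native to South America...\n"
    "### Instruction:\nTell me about camels."
)
print(output.split(template["response_split"])[1].strip())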


nkjulia commented Apr 13, 2023

I met the same problem. Is the fix just to train for more epochs?


leolya commented Apr 14, 2023

I also met this issue, and I'm using the command from the README. So the problem may not be inadequate training?

python finetune.py \
    --base_model='decapoda-research/llama-7b-hf' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./lora-alpaca' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --micro_batch_size=8

lywinged commented:

Checklist:

1. Make sure there is no LlamaTokenizer warning in your log, because decapoda-research/llama-7b-hf has eos=0, bos=0.
2. Use the old version of PEFT:

pip uninstall peft -y
pip install git+https://github.com/huggingface/peft.git@e536616888d51b453ed354a6f1e243fecb02ea08

3. Update the finetune.py file: set add_eos_token=True.
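
On point 3, a minimal sketch of what add_eos_token=True should do during tokenization (assuming a Hugging Face LlamaTokenizer; the exact code in finetune.py may differ):

def tokenize(prompt, tokenizer, cutoff_len=512, add_eos_token=True):
    result = tokenizer(prompt, truncation=True, max_length=cutoff_len, padding=False)
    if (add_eos_token
            and len(result["input_ids"]) < cutoff_len
            and result["input_ids"][-1] != tokenizer.eos_token_id):
        # append EOS so the model learns to stop instead of looping the prompt template
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result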


leolya commented Apr 14, 2023

Thanks for the quick reply! The problem was solved. #293

OpenSource-fan (Author) commented:

Thanks for all your responses; they are all useful for training.
However, the current performance is still poor. I believe we should explore performance improvements further rather than other problems such as #293.


IshootLaser commented Aug 14, 2023

@lywinged, may I ask why we should use the old version of PEFT? What problem does the latest PEFT cause? Thank you so much.

lywinged commented:

Because they had a bug where the adapter model was overwritten and cleared after training, but they may have fixed it by now.
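
As a quick sanity check after training (a rough sketch; the file name assumes the adapter_model.bin that PEFT's save_pretrained wrote in those versions, and the path assumes the output_dir from the command at the top of this thread):

import os
import torch

path = "./lora-alpaca/adapter_model.bin"
state = torch.load(path, map_location="cpu")
print(f"{len(state)} tensors, {os.path.getsize(path) / 1e6:.1f} MB")
# an adapter file of only a few hundred bytes, or all-zero tensors,
# is the symptom of the overwrite bug described above
print("all zero:", all(float(t.abs().sum()) == 0 for t in state.values()))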
