Poor performance on llama3 #27
Comments
@jiwooya1000 if you have any pointers and some Llama checkpoints for ORPO, thanks a lot!!
Hello @JasonZhu1313, yes, there were a lot of models in orpo-explorers, but we made the hub private for now since we were running ablations to build a general recipe for ORPO, and there were far too many checkpoints from orpo-explorers 😅 I have not tried Llama-3 + UltraFeedback yet, but I did try the Capybara-Preference dataset. Since I ran with alignment-handbook + 4×A100 + FSDP, here is the .yaml file you could easily try with:
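(The original attachment was not preserved in this thread. The fragment below is a hypothetical sketch of what an alignment-handbook ORPO recipe looks like; the model name, dataset, and every hyperparameter value here are illustrative placeholders, not the values jiwooya1000 actually used.)

```yaml
# Hypothetical alignment-handbook ORPO recipe -- illustrative values only
model_name_or_path: meta-llama/Meta-Llama-3-8B
torch_dtype: bfloat16

# Data (placeholder dataset id)
dataset_mixer:
  argilla/Capybara-Preferences: 1.0
dataset_splits:
  - train

# ORPOTrainer arguments (placeholder values)
beta: 0.05                         # weight on the odds-ratio loss term
learning_rate: 5.0e-6
lr_scheduler_type: cosine
warmup_ratio: 0.1
num_train_epochs: 3
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
max_length: 2048
max_prompt_length: 1792
bf16: true
output_dir: data/llama-3-8b-orpo
```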
As a result, I got 7.19 on MT-Bench. We have not fully studied the optimal setting for Llama-3 yet, but I will let you know if we find some good insights 🙂 Given that Llama-3-8B-Instruct (8.08) was trained on a heavily filtered 10M human preference dataset, we might be able to get close to it with some hyperparameter search!
Thanks a lot for being responsive as always!! A different question, regarding Figure 5 of your paper: what exactly is the x-axis? Each method has a different reward function, so how are they plotted on the same axis? Was any transformation applied? Also, is this analysis done on the chosen responses of a held-out test split of UltraFeedback? Could you please give full details? Any code you can point to would be useful, and details about Figure 11 too. Thanks!
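For readers following the reward-comparison discussion above: ORPO's preference signal is the log odds ratio between the chosen and rejected responses, computed from their length-normalized log-probabilities. A minimal plain-Python sketch (function names are mine, not from the paper's codebase):

```python
import math

def avg_logprob_to_odds(avg_logp):
    """Convert a length-normalized log-probability into odds p / (1 - p)."""
    p = math.exp(avg_logp)  # geometric-mean token probability, always < 1
    return p / (1.0 - p)

def orpo_log_odds_ratio(avg_logp_chosen, avg_logp_rejected):
    """log(odds(chosen) / odds(rejected)) -- the quantity ORPO pushes up."""
    return math.log(avg_logprob_to_odds(avg_logp_chosen)
                    / avg_logprob_to_odds(avg_logp_rejected))

def orpo_ratio_loss(avg_logp_chosen, avg_logp_rejected):
    """L_OR = -log(sigmoid(log-odds-ratio)); added to the NLL loss with a weight."""
    z = orpo_log_odds_ratio(avg_logp_chosen, avg_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

When the model assigns equal average log-probability to both responses, the log odds ratio is 0 and the loss is log 2; it shrinks toward 0 as the chosen response becomes more likely than the rejected one.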
Hey,
I am wondering if you have tried ORPO on Llama-3. I used the same hyperparameters as for Mistral training, but the MT-Bench score is quite low compared to Llama-3-Instruct and a similarly trained Mistral model.
I used to see a lot of Llama-3-based models on the https://huggingface.co/orpo-explorers hub, but they are suddenly all gone. Do you have some reference for me to compare against, and any pointers on what could be wrong with Llama-3? I am using the same chat template as shown in the repo during both training and inference.
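Since template mismatch between training and inference is a common cause of degraded MT-Bench scores, here is a string-level sketch of the Llama-3 instruct chat format for sanity-checking (this is my own assumption of the format, not the repo's code; in practice prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`):

```python
def llama3_chat_format(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts in the
    Llama-3 instruct format. String-level sketch for debugging only."""
    out = "<|begin_of_text|>"
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content']}<|eot_id|>")
    if add_generation_prompt:
        # Open the assistant turn so generation starts in the right place.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```

Comparing this rendering against what your training collator and your inference server actually feed the model is a quick way to spot a mismatch.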