Using pretrained models #90
For validation you can check https://github.com/tatsu-lab/alpaca_farm?tab=readme-ov-file#running-automatic-evaluation, or directly use AlpacaEval. This is the reward model trained from human preferences: https://huggingface.co/tatsu-lab/alpaca-farm-reward-model-human-wdiff
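A note on the model linked above: the `-wdiff` suffix indicates the checkpoint is published as a weight *diff* against the base LLaMA weights (a common practice for LLaMA-derived releases), so the full weights have to be recovered before the model is usable. As a conceptual illustration only (stand-in arrays, not the real checkpoint format or recovery script), the recovery step amounts to adding the diff back onto the base weights:

```python
import numpy as np

# Stand-in tensors: in reality these would be the base LLaMA state dict
# and the released weight-diff state dict, keyed by parameter name.
base = {"w": np.array([1.0, 2.0, 3.0])}    # base model weights (illustrative)
diff = {"w": np.array([0.5, -0.5, 0.25])}  # released weight diff (illustrative)

# Recover the fine-tuned weights by adding the diff to the base weights.
recovered = {name: base[name] + diff[name] for name in diff}
```

Attempting to load the diff checkpoint directly, without this recovery, is one common way to get gibberish output.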
I've been trying to generate text using ppo-human, but I've just been getting gibberish. It works fine when I use Llama 2. Is there an example in AlpacaEval I can refer to?
The paper mentions that you performed end-to-end validation of AlpacaFarm. Do you have the code for that on GitHub? I want to use the LLM pretrained on human preferences to generate some more preferences.