Create an offline dataset for Learning to summarize with human feedback #1

Open
4 tasks
thejaminator opened this issue Feb 26, 2023 · 0 comments
Labels
good first issue Good for newcomers

Comments

Owner

thejaminator commented Feb 26, 2023

Description

We already have a toy example with the imdb reward model.
Let's create one with an actual RLHF task: [Learning to summarize with human feedback](https://arxiv.org/abs/2009.01325).

For this issue, we'll just focus on creating the dataset itself, instead of training our language (policy) model. We'll handle that in another card.

Here's an example of a dataset that I created for the imdb toy scenario: https://huggingface.co/datasets/thejaminator/imdb_rewarded

For summarization, the dataset will be slightly different.
It should have the columns (prompt, completion, reward).
The prompt is the article to summarize.
The completion is the summary of that article written by the labeller.
The reward is the scalar reward assigned by the reward model (RM).
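As a rough illustration of the row layout described above, here is a minimal sketch of one record and how it could be serialized as a JSON line. The class name `RewardedSummary`, the helper `to_jsonl_line`, and the example values are hypothetical and only serve to show the three columns; they are not taken from the existing imdb script.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RewardedSummary:
    """One dataset row (hypothetical name, for illustration only)."""

    prompt: str  # the article to summarize
    completion: str  # the summary written by the labeller
    reward: float  # scalar reward assigned by the reward model


def to_jsonl_line(row: RewardedSummary) -> str:
    """Serialize a row as one JSON object, keeping the column order."""
    return json.dumps(asdict(row), ensure_ascii=False)


# Made-up values, only to show the shape of a record.
example = RewardedSummary(
    prompt="Full text of the article goes here.",
    completion="A short human-written summary.",
    reward=0.5,
)
line = to_jsonl_line(example)
```

Keeping the reward as a plain float means the same file can later be filtered or sorted by reward without any extra parsing.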

For the reward model, I'm not aware of an off-the-shelf pretrained RM that we can use directly.
However, OpenAssistant does have an RM that was trained on multiple preference datasets, including the summary dataset. So perhaps that's good enough.

  • See if the OpenAssistant RM is good enough to use. Otherwise, we may need to train our own.
  • Run the RM on the summary dataset to get the scalar rewards. Create the dataset where we have the columns (prompt, completion, reward).
  • Upload the dataset to Hugging Face. You can see an example of creating jsons for export in examples/imdb/export_imdb_reward_dataset.py and the upload guide at https://huggingface.co/docs/datasets/upload_dataset.
  • Make an MR for the script that you ran to do the above.
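The scoring and export steps in the task list could be sketched roughly as below. This is an assumption-laden outline, not the project's script: `build_rows`, `export_jsonl`, and `placeholder_score` are hypothetical names, and `placeholder_score` is a stand-in that does not call any reward model. In a real run it would be replaced by a function that wraps whichever RM is chosen (for example the OpenAssistant one, if it turns out to be good enough).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# A scorer maps (prompt, completion) to a scalar reward.
ScoreFn = Callable[[str, str], float]


def build_rows(pairs: Iterable[Tuple[str, str]], score_fn: ScoreFn) -> List[Dict]:
    """Score each (prompt, completion) pair and return dataset rows."""
    return [
        {"prompt": prompt, "completion": completion, "reward": float(score_fn(prompt, completion))}
        for prompt, completion in pairs
    ]


def export_jsonl(rows: Iterable[Dict], path: Path) -> Path:
    """Write one JSON object per line, ready to be uploaded as a dataset file."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def placeholder_score(prompt: str, completion: str) -> float:
    """Stand-in scorer (NOT a reward model): penalizes longer completions."""
    return -float(len(completion.split()))


# Made-up pairs; real ones would come from the summary dataset.
pairs = [("Post A text", "Summary of A"), ("Post B text", "B summary")]
rows = build_rows(pairs, placeholder_score)

with tempfile.TemporaryDirectory() as tmp:
    out_path = export_jsonl(rows, Path(tmp) / "summarize_rewarded.jsonl")
    n_lines = len(out_path.read_text(encoding="utf-8").splitlines())
```

Passing the scorer in as a callable keeps the export logic independent of the RM, so swapping in a different reward model later should only touch one function. The resulting JSON Lines file can then be uploaded following the Hugging Face guide linked in the task list.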
thejaminator added the good first issue label Feb 26, 2023