Description
We already have a toy example with the imdb reward model. Let's create one with an actual RLHF task: [Learning to summarize with human feedback](https://arxiv.org/abs/2009.01325).
For this issue, we'll focus only on creating the dataset itself, not on training our language (policy) model; that will be handled in another card.
Here's an example of a dataset that I created for the imdb toy scenario: https://huggingface.co/datasets/thejaminator/imdb_rewarded.
For summarization, the dataset will be slightly different. It should have the columns (prompt, completion, reward), where:

- `prompt` is the article to summarize.
- `completion` is the summary produced by the human labeller.
- `reward` is the score assigned by the reward model (RM).

A sketch of building this schema follows the list.
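As a sketch of how these rows could be built, the snippet below assumes the `openai/summarize_from_feedback` dataset on the Hub (the data released with the paper linked above); the field names (`info`, `post`, `summaries`, `choice`) are my reading of its schema and should be verified against the dataset card:

```python
from datasets import load_dataset

# Assumption: the "comparisons" config of openai/summarize_from_feedback,
# where each example holds the original post plus two candidate summaries
# and the labeller's choice between them.
comparisons = load_dataset("openai/summarize_from_feedback", "comparisons", split="validation")

def to_row(example):
    return {
        "prompt": example["info"]["post"],  # article to summarize
        "completion": example["summaries"][example["choice"]]["text"],  # labeller-preferred summary
        "reward": 0.0,  # placeholder; filled in by the reward model below
    }

dataset = comparisons.map(to_row, remove_columns=comparisons.column_names)
print(dataset[0])
```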
For the reward model, I'm not aware of a pretrained RM built purely for this task that we can just use. However, OpenAssistant does have an RM that was trained on multiple preference datasets, including the summarization dataset, so perhaps that's good enough.
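A minimal sketch of scoring a (prompt, completion) pair with that RM, assuming the `OpenAssistant/reward-model-deberta-v3-large-v2` checkpoint (one of their published reward models); the example strings and the `max_length` choice are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

@torch.no_grad()
def reward(prompt: str, completion: str) -> float:
    # The RM scores a (prompt, completion) text pair with a single logit,
    # which we take directly as the reward.
    inputs = tokenizer(prompt, completion, return_tensors="pt", truncation=True, max_length=512)
    return model(**inputs).logits[0].item()

print(reward("Post: my cat knocked my coffee off the desk...", "TL;DR: cat spilled my coffee."))
```

Filling the `reward` column would then just be a `map` of this function over the rows built in the earlier sketch.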
For reference, see examples/imdb/export_imdb_reward_dataset.py and https://huggingface.co/docs/datasets/upload_dataset.
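Putting it together, uploading would look something like the sketch below, per the `push_to_hub` flow in the upload guide above; the repo id `thejaminator/summarize_rewarded` is hypothetical, and you'd need to be logged in via `huggingface-cli login` first:

```python
from datasets import Dataset

# Tiny stand-in dataset; in practice this would be the mapped dataset from
# the sketches above. The repo id below is hypothetical.
dataset = Dataset.from_dict({
    "prompt": ["<article to summarize>"],
    "completion": ["<human summary>"],
    "reward": [0.42],
})
dataset.push_to_hub("thejaminator/summarize_rewarded")
```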