Currently we make the reward mandatory for all training (and inference examples). This isn't ideal: sometimes we may not want to train with a specified reward.
Perhaps we should take in an optional rewards_mask:
def forward(
    self,
    target_rewards: torch.Tensor,
    rewards_mask: Optional[torch.BoolTensor] = None,  # same length as target_rewards
    ...
We'll then need to modify the attention mask correspondingly, so the model ignores the target_reward for that sequence.
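A minimal sketch of that mask surgery. This assumes the reward occupies a single token at a known position in each sequence (the `reward_pos` parameter and the helper name are illustrative, not the repo's actual API):

```python
import torch
from typing import Optional

def apply_rewards_mask(
    attn_mask: torch.BoolTensor,               # (batch, seq_len), True = attend
    rewards_mask: Optional[torch.BoolTensor],  # (batch,), True = reward is valid
    reward_pos: int = 0,                       # assumed position of the reward token
) -> torch.BoolTensor:
    """Zero out attention to the reward token for sequences whose
    reward should be ignored. If no mask is given, keep old behavior."""
    if rewards_mask is None:
        return attn_mask
    attn_mask = attn_mask.clone()  # don't mutate the caller's mask
    attn_mask[:, reward_pos] &= rewards_mask
    return attn_mask

# Batch of 2: the second sequence has no target reward.
attn = torch.ones(2, 4, dtype=torch.bool)
out = apply_rewards_mask(attn, torch.tensor([True, False]))
```

With this shape, passing `rewards_mask=None` is a no-op, so existing callers that always supply a reward are unaffected.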
Alternatively, maybe we can detect when target_rewards is NaN and modify the attention mask accordingly?
That way we don't need a separate rewards_mask param. I'm not sure whether this could lead to more bugs, though.
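The NaN variant could look like the sketch below (helper name is hypothetical). The mask is derived implicitly, but the NaNs still need to be scrubbed before any arithmetic touches them, which is exactly the kind of silent-propagation hazard worried about above:

```python
import torch

def mask_from_nan(target_rewards: torch.Tensor):
    """Treat NaN as 'no reward for this sequence': derive a boolean mask
    and replace NaNs with a harmless value so downstream ops don't
    propagate them. The mask keeps those positions from being attended to."""
    rewards_mask = ~torch.isnan(target_rewards)
    safe_rewards = torch.nan_to_num(target_rewards, nan=0.0)
    return safe_rewards, rewards_mask

safe, mask = mask_from_nan(torch.tensor([1.0, float("nan"), 0.5]))
```

The trade-off: one fewer parameter in `forward`, at the cost of overloading NaN with meaning; a caller who produces a NaN by accident (e.g. a bad normalization) would silently get the no-reward path instead of an error.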