Currently we make the reward mandatory for all training (and inference examples). This isn't ideal: sometimes we may not want to train with a specified reward.
Perhaps we should take in an optional rewards_mask:
def forward(
    self,
    target_rewards: torch.Tensor,
    rewards_mask: Optional[torch.BoolTensor] = None,  # same length as target_rewards
    ...
We'll then need to modify the attention mask correspondingly, so the model ignores the target_reward for that sequence.
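A minimal sketch of that mask surgery. This assumes the reward occupies a single token at a known position in each sequence (the `reward_pos` parameter and the helper name are illustrative, not the repo's actual API):

```python
import torch
from typing import Optional

def apply_rewards_mask(
    attn_mask: torch.BoolTensor,               # (batch, seq_len), True = attend
    rewards_mask: Optional[torch.BoolTensor],  # (batch,), True = reward is valid
    reward_pos: int = 0,                       # assumed position of the reward token
) -> torch.BoolTensor:
    """Zero out attention to the reward token for sequences whose
    reward should be ignored. If no mask is given, keep old behavior."""
    if rewards_mask is None:
        return attn_mask
    attn_mask = attn_mask.clone()  # don't mutate the caller's mask
    attn_mask[:, reward_pos] &= rewards_mask
    return attn_mask

# Batch of 2: the second sequence has no target reward.
attn = torch.ones(2, 4, dtype=torch.bool)
out = apply_rewards_mask(attn, torch.tensor([True, False]))
```

With this shape, passing `rewards_mask=None` is a no-op, so existing callers that always supply a reward are unaffected.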
Alternatively, maybe we can detect when target_rewards is NaN and modify the attention mask accordingly?
That way we don't need a separate rewards_mask param. I'm not sure whether this could lead to more bugs, though.
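The NaN variant could look like the sketch below (helper name is hypothetical). The mask is derived implicitly, but the NaNs still need to be scrubbed before any arithmetic touches them, which is exactly the kind of silent-propagation hazard worried about above:

```python
import torch

def mask_from_nan(target_rewards: torch.Tensor):
    """Treat NaN as 'no reward for this sequence': derive a boolean mask
    and replace NaNs with a harmless value so downstream ops don't
    propagate them. The mask keeps those positions from being attended to."""
    rewards_mask = ~torch.isnan(target_rewards)
    safe_rewards = torch.nan_to_num(target_rewards, nan=0.0)
    return safe_rewards, rewards_mask

safe, mask = mask_from_nan(torch.tensor([1.0, float("nan"), 0.5]))
```

The trade-off: one fewer parameter in `forward`, at the cost of overloading NaN with meaning; a caller who produces a NaN by accident (e.g. a bad normalization) would silently get the no-reward path instead of an error.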