You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently training the reward model starts at the iteration number of the transformer checkpoint, which is weird. We should just start counting iterations from 0 in the reward model training loop (assuming that the reward model is being trained from scratch, if loading a reward model from a checkpoint then we can count from the checkpointed iteration number of the reward model) regardless of how many iterations the transformer was trained for.
The text was updated successfully, but these errors were encountered:
Currently training the reward model starts at the iteration number of the transformer checkpoint, which is weird. We should just start counting iterations from 0 in the reward model training loop (assuming that the reward model is being trained from scratch, if loading a reward model from a checkpoint then we can count from the checkpointed iteration number of the reward model) regardless of how many iterations the transformer was trained for.
The text was updated successfully, but these errors were encountered: