conditionme/scaling/scaler.py defines the scalers we currently have.
These don't seem to really help, as the reward distributions we get from reward models tend to be very heavily skewed.
We can implement a quantile scaler. Maybe that helps.
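A rough sketch of what a quantile scaler could look like, assuming a scaler only needs to be fit on a list of rewards and then map a single reward to a float. The class name `QuantileScaler` and its `scale` method are hypothetical and are not taken from `conditionme/scaling/scaler.py`. The idea is to replace each reward with the fraction of fitted rewards at or below it, so a heavily skewed distribution becomes roughly uniform on (0, 1].

```python
from bisect import bisect_right
from typing import Sequence


class QuantileScaler:
    """Hypothetical scaler: maps a reward to its empirical quantile."""

    def __init__(self, rewards: Sequence[float]) -> None:
        if not rewards:
            raise ValueError("need at least one reward to fit the scaler")
        # Keep a sorted copy of the rewards seen at fit time.
        self._sorted = sorted(rewards)

    def scale(self, reward: float) -> float:
        # Fraction of fitted rewards that are <= reward.
        return bisect_right(self._sorted, reward) / len(self._sorted)


# One extreme reward no longer squashes the rest towards zero.
scaler = QuantileScaler([0.0, 0.1, 0.2, 100.0])
scaled = [scaler.scale(r) for r in (0.0, 0.1, 0.2, 100.0)]
```

Because only the rank of a reward matters here, the outlier at 100.0 has no effect on how the other three rewards are spaced.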
We can also try binning rewards into 10 buckets, as in Quark.
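A minimal sketch of the bucketing idea, assuming equal-frequency (quantile) buckets; the issue only fixes the bucket count at 10, so the choice of edges is an assumption. The helpers `fit_bucket_edges` and `bucket_index` are hypothetical names, not existing functions in this repo.

```python
from bisect import bisect_right
from typing import List, Sequence


def fit_bucket_edges(rewards: Sequence[float], num_buckets: int = 10) -> List[float]:
    """Return the num_buckets - 1 interior edges of equal-frequency buckets."""
    if not rewards:
        raise ValueError("need at least one reward to fit bucket edges")
    ordered = sorted(rewards)
    n = len(ordered)
    # Edge i is the reward sitting at the i/num_buckets position of the sorted list.
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def bucket_index(reward: float, edges: Sequence[float]) -> int:
    """Return a bucket index in [0, len(edges)]; higher means higher reward."""
    return bisect_right(edges, reward)


# Fit on 100 evenly spaced rewards, then look up a few values.
edges = fit_bucket_edges([float(r) for r in range(100)])
lowest = bucket_index(0.0, edges)
highest = bucket_index(99.0, edges)
```

The bucket index could then be used as the conditioning signal in place of the raw reward, for example as one of 10 discrete reward tokens.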
Thanks @tomekkorbak for this suggestion!