Add a Quantile RewardScaler #6

thejaminator · 2023-02-26T13:25:35Z

conditionme/scaling/scaler.py defines the scalers we currently have.
They don't seem to help much, because the reward distributions we get from reward models tend to be very heavily skewed.

We can implement a quantile scaler. Maybe that helps.

We can try binning rewards into 10 buckets, as in Quark.
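A rough sketch of what the bucketing could look like, to make the idea concrete. The class name `QuantileBucketScaler` and its `fit` / `bucket` / `scale` methods are made up for illustration and are not the existing interface in conditionme/scaling/scaler.py; the sketch also assumes NumPy is available and that mapping bucket indices linearly onto [0, 1] is an acceptable target range.

```python
import numpy as np


class QuantileBucketScaler:
    """Hypothetical sketch of a quantile scaler (not the conditionme API).

    Bucket edges are learned from the empirical reward distribution, so
    each bucket holds roughly the same number of training rewards even
    when the distribution is heavily skewed.
    """

    def __init__(self, n_buckets: int = 10):
        if n_buckets < 2:
            raise ValueError("n_buckets must be at least 2")
        self.n_buckets = n_buckets
        self.edges = None

    def fit(self, rewards):
        rewards = np.asarray(rewards, dtype=float)
        # Interior quantile levels, e.g. 0.1, 0.2, ..., 0.9 for 10 buckets.
        probs = np.linspace(0.0, 1.0, self.n_buckets + 1)[1:-1]
        self.edges = np.quantile(rewards, probs)
        return self

    def bucket(self, rewards):
        """Return the bucket index (0 .. n_buckets - 1) for each reward."""
        if self.edges is None:
            raise ValueError("call fit() before bucket()")
        rewards = np.asarray(rewards, dtype=float)
        # Count how many edges are <= each reward; that count is the bucket.
        return np.searchsorted(self.edges, rewards, side="right")

    def scale(self, rewards):
        """Map rewards to evenly spaced values in [0, 1] via their bucket."""
        return self.bucket(rewards) / (self.n_buckets - 1)


# Example with a heavily skewed (exponentially growing) reward sample.
skewed_rewards = np.exp(np.linspace(0.0, 10.0, 1000))
scaler = QuantileBucketScaler(n_buckets=10).fit(skewed_rewards)
bucket_counts = np.bincount(scaler.bucket(skewed_rewards), minlength=10)
```

With this kind of scaler the scaled value depends only on a reward's rank among the fitted rewards, not on its magnitude, which is the property that should help with skew. Rewards outside the fitted range simply fall into the first or last bucket.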

Thanks @tomekkorbak for this suggestion!


SahilDave04 · 2023-03-30T17:03:13Z

Hey, can you elaborate on your request? I'm not quite able to understand it.

thejaminator added the good first issue Good for newcomers label Feb 26, 2023
