Commit 4f0956b
Fix reward scaler when run on varied episode lengths (#455)
When calling `fit` with a reward scaler on a dataset with varied episode lengths, the `fit_with_trajectory_slicer` method would throw the following error:

```
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions.
```

This commit fixes the issue by flattening the rewards before calculating the mean and std.
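A minimal sketch of the failure and the fix, using invented reward arrays of unequal length to stand in for episodes of varied lengths:

```python
import numpy as np

# Hypothetical per-episode reward arrays with different lengths,
# mimicking a dataset with varied episode lengths.
rewards = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# On recent NumPy, np.mean over this ragged list raises:
#   ValueError: setting an array element with a sequence. The requested
#   array has an inhomogeneous shape after 1 dimensions.
try:
    np.mean(rewards)
except ValueError as exc:
    print("ragged input rejected:", exc)

# Flattening first makes the statistics well-defined,
# which is what the commit does.
flat_rewards = np.concatenate(rewards)
mean = float(np.mean(flat_rewards))
std = float(np.std(flat_rewards))
print(mean, std)
```

Note that with episodes of *equal* length, `np.mean(rewards)` would silently succeed (NumPy stacks the arrays into a 2-D array), which is why the bug only surfaced on datasets with varied episode lengths.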
1 parent 8418d92 commit 4f0956b

File tree

1 file changed: +3 −2 lines changed

d3rlpy/preprocessing/reward_scalers.py

Lines changed: 3 additions & 2 deletions

```diff
@@ -297,8 +297,9 @@ def fit_with_trajectory_slicer(
             ).rewards
             for episode in episodes
         ]
-        self.mean = float(np.mean(rewards))
-        self.std = float(np.std(rewards))
+        flat_rewards = np.concatenate(rewards)
+        self.mean = float(np.mean(flat_rewards))
+        self.std = float(np.std(flat_rewards))

    def transform(self, x: torch.Tensor) -> torch.Tensor:
        assert self.built
```
