Commit 4f0956b
Fix reward scaler when run on varied episode lengths (#455)
When calling `fit` with a reward scaler on a dataset with varied episode lengths, the `fit_with_trajectory_slicer` method would throw the following error:

```
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions.
```

This commit fixes the issue by flattening the rewards before calculating the mean and std.
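A minimal sketch of the failure and the fix, using invented reward arrays of unequal length to stand in for episodes of varied lengths:

```python
import numpy as np

# Hypothetical per-episode reward arrays with different lengths,
# mimicking a dataset with varied episode lengths.
rewards = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# On recent NumPy, np.mean over this ragged list raises:
#   ValueError: setting an array element with a sequence. The requested
#   array has an inhomogeneous shape after 1 dimensions.
try:
    np.mean(rewards)
except ValueError as exc:
    print("ragged input rejected:", exc)

# Flattening first makes the statistics well-defined,
# which is what the commit does.
flat_rewards = np.concatenate(rewards)
mean = float(np.mean(flat_rewards))
std = float(np.std(flat_rewards))
print(mean, std)
```

Note that with episodes of *equal* length, `np.mean(rewards)` would silently succeed (NumPy stacks the arrays into a 2-D array), which is why the bug only surfaced on datasets with varied episode lengths.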
1 parent 8418d92 commit 4f0956b

File tree

1 file changed: +3 −2 lines changed

d3rlpy/preprocessing/reward_scalers.py

Lines changed: 3 additions & 2 deletions

```diff
@@ -297,8 +297,9 @@ def fit_with_trajectory_slicer(
             ).rewards
             for episode in episodes
         ]
-        self.mean = float(np.mean(rewards))
-        self.std = float(np.std(rewards))
+        flat_rewards = np.concatenate(rewards)
+        self.mean = float(np.mean(flat_rewards))
+        self.std = float(np.std(flat_rewards))

    def transform(self, x: torch.Tensor) -> torch.Tensor:
        assert self.built
```
