Open
Description
self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift().fillna(0).to_dict()
Hi, I’d like to suggest a small optimization to this line:
self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift().fillna(0).to_dict()
This can be rewritten as:
self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift(fill_value=0).to_dict()
Using shift(fill_value=0) folds the missing-value fill into the shift itself. It avoids building an intermediate Series that contains a NaN and then making a second pass over it with fillna(), so it should be slightly faster and lighter on memory, and the logic reads as a single step.
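The current .shift().fillna(0) form first produces a new Series with a NaN in the first position and then scans the whole Series to fill it. Both forms give the same start indices, but the extra pass makes the current one slightly slower and less expressive. One difference is worth noting: if examples_per_task_srs holds integers, the NaN forces an upcast to float64, so the current form stores float values (0.0, ...) in the dictionary, whereas shift(fill_value=0) keeps the integer dtype. Switching therefore helps both performance and readability, particularly if this line runs in time-critical or repeated data-processing steps.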
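To make the comparison concrete, here is a minimal sketch of both forms side by side. The task names and counts are made up for illustration, since the real examples_per_task_srs is not shown in this issue; the sketch only assumes it is an integer Series indexed by task.

```python
import pandas as pd

# Hypothetical per-task example counts (illustration only; the real
# examples_per_task_srs comes from the project).
examples_per_task_srs = pd.Series(
    {"task_a": 3, "task_b": 2, "task_c": 4}, dtype="int64"
)

cumulative = examples_per_task_srs.cumsum()

# Current form: shift() inserts a NaN, which upcasts the Series to
# float64; fillna(0) then replaces the NaN in a second pass.
old_starts = cumulative.shift().fillna(0)

# Proposed form: the fill value is supplied as part of the shift, so no
# NaN is created and the integer dtype is preserved.
new_starts = cumulative.shift(fill_value=0)

old_dict = old_starts.to_dict()
new_dict = new_starts.to_dict()

print(old_starts.dtype, new_starts.dtype)
print(old_dict == new_dict)
```

The two dictionaries compare equal because 0.0 == 0 in Python, but the proposed form yields integer start indices, which is usually what downstream indexing code wants.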