Optimization: Replace shift().fillna(0) with shift(fill_value=0) for cleaner and faster Series manipulation

https://github.com/zphang/minimal-llama/blob/22a6bd949ff6fa98620d117177f90251aa168a21/minimal_llama/hyper/data/ref_msft.py#L199
Hi, I’d like to suggest a small optimization to this line:
`self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift().fillna(0).to_dict()`
This can be rewritten as:
`self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift(fill_value=0).to_dict()`
`shift(fill_value=0)` fills the vacated first position as part of the shift itself, so pandas does not need to build an intermediate Series containing a NaN and then make a second pass with `fillna()`. The result is a single-step expression that should also be slightly cheaper in time and memory.
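To make the comparison concrete, here is a minimal sketch that runs both forms on a small made-up Series. The task names and counts are hypothetical stand-ins for `examples_per_task_srs`, not values from the repository.

```python
import pandas as pd

# Hypothetical per-task example counts (assumption: integer counts, no NaNs).
examples_per_task_srs = pd.Series({"task_a": 3, "task_b": 5, "task_c": 2})

# Running total of examples: 3, 8, 10.
cumulative = examples_per_task_srs.cumsum()

# Current form: shift leaves a NaN in the first slot, then fillna replaces it.
old_starts = cumulative.shift().fillna(0)

# Proposed form: the first slot is filled with 0 during the shift.
new_starts = cumulative.shift(fill_value=0)

old_dict = old_starts.to_dict()
new_dict = new_starts.to_dict()

print(old_starts.dtype, old_dict)
print(new_starts.dtype, new_dict)
```

Both dictionaries compare equal, but the first holds floats and the second holds ints, because the intermediate NaN forces an upcast that `fill_value=0` avoids.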

The current `.shift().fillna(0)` form first produces a new Series whose first entry is NaN, which upcasts an integer Series to float64, and then scans it again to fill that entry. Both forms yield the same start indices, with one difference: `shift(fill_value=0)` keeps the integer dtype, so the dictionary values are ints rather than floats. For a Series with one entry per task the speedup is likely small, so the main benefits here are readability and dtype preservation; the performance difference matters more in time-critical or repeated data-processing steps.
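I have not benchmarked this against the repository's data. A rough way to check the difference locally is a micro-benchmark along these lines; `time_both`, the Series size, and the repeat count are arbitrary choices for illustration, and results will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd


def time_both(n_tasks=1000, repeats=200):
    """Time both forms on a synthetic integer Series (hypothetical data)."""
    rng = np.random.default_rng(0)
    counts = pd.Series(rng.integers(1, 100, size=n_tasks))
    cumulative = counts.cumsum()

    # Two-step form: shift, then fill the NaN it introduced.
    t_old = timeit.timeit(lambda: cumulative.shift().fillna(0), number=repeats)
    # One-step form: fill during the shift.
    t_new = timeit.timeit(lambda: cumulative.shift(fill_value=0), number=repeats)
    return t_old, t_new


t_old, t_new = time_both()
print(f"shift().fillna(0):   {t_old:.4f} s")
print(f"shift(fill_value=0): {t_new:.4f} s")
```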



Optimization: Replace shift().fillna(0) with shift(fill_value=0) for cleaner and faster Series manipulation #14
