Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate implementation of calculate_diff #528

Merged
merged 2 commits into from
Feb 26, 2025

Conversation

Swatinem
Copy link
Contributor

@Swatinem Swatinem commented Feb 24, 2025

This cleans up the duplicated implementations of Filtered/Report.calculate_diff and moves it to a new well typed file.


I’m a bit surprised that this does not yield any speedup in the diff calculation benchmarks (compared to #525).

I think the difference between the base and FilteredReport is based on the actual filtering and line-mapping code that happens, instead of the different and less optimized version that was previously there.

                                          Benchmark Results                                          
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃                                       Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃  Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━┩
│            test_report_diff_calculation[Report] │         803ns │        0.9% │    2.92s │ 15,975 │
│    test_report_diff_calculation[FilteredReport] │       1,886ns │        1.1% │    2.92s │ 10,437 │
└─────────────────────────────────────────────────┴───────────────┴─────────────┴──────────┴────────┘

@Swatinem Swatinem requested a review from a team February 24, 2025 12:55
@Swatinem Swatinem self-assigned this Feb 24, 2025
@Swatinem Swatinem force-pushed the swatinem/unify-calculate-diff branch from d008aaf to e454565 Compare February 24, 2025 13:08
Copy link

codspeed-hq bot commented Feb 24, 2025

CodSpeed Performance Report

Merging #528 will create unknown performance changes

Comparing swatinem/unify-calculate-diff (ca08e87) with main (0222015)

Summary

⚠️ No benchmarks were detected in both the base of the PR and the PR.\

@Swatinem Swatinem force-pushed the swatinem/unify-calculate-diff branch from e454565 to ec03cd7 Compare February 24, 2025 13:39
This cleans up the duplicated implementations of `Filtered/Report.calculate_diff` and moves it to a new well typed file.
@Swatinem Swatinem force-pushed the swatinem/unify-calculate-diff branch from ec03cd7 to f664924 Compare February 25, 2025 12:21
Copy link
Contributor

@giovanni-guidini giovanni-guidini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding more types to this part of the code 🥳

As you mentioned no speed improvement, but in the comparison the stddev for the Report class was greatly reduced... that might be a win? Having a more bounded code I guess. The speed report doesn't really report quantiles for us to know, though.

I wonder if using @dataclass instead of TypedDict could improve speed. But my guess is that it would be worse (cause you need to transform from dict into a class instance).


class DiffSegment(TypedDict):
lines: list[str]
header: tuple[int, int, int, int]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here like # (base_start_line, base_length, head_start_line, head_length) would be useful for people that don't know what the numbers represent

Or breaking up the header further, but might be overkill



class DiffFile(TypedDict):
type: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know what the possible types are?

Copy link
Contributor

@joseph-sentry joseph-sentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also lgtm:

to answer one of @giovanni-guidini's questions from above:

def diff_to_json(self, diff):

I got confused because i was looking at the response schemas from the Github API and I couldn't find "type" or "segments"

@Swatinem
Copy link
Contributor Author

Thanks for the suggestions. I improved the types and added doc comments.

On that note, I just hate how python doc comments come after the item they are documenting. This is so unintuitive compared to pretty much every other language that does it differently.

Bildschirmfoto 2025-02-26 um 10 37 12

@Swatinem Swatinem added this pull request to the merge queue Feb 26, 2025
Merged via the queue into main with commit e4eb57b Feb 26, 2025
8 checks passed
@Swatinem Swatinem deleted the swatinem/unify-calculate-diff branch February 26, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants