Convert Split to record #21583

Merged: 1 commit into master from serafin/schedulers-sets on Apr 18, 2024

Conversation

wendigo (Contributor) commented Apr 17, 2024

Since records have generated equals and hashCode methods, we need to switch away from using Set in the scheduler implementation. Using a Set seems incorrect anyway: it suggests that we want to remove duplicates, but hashCode and equals are not part of the ConnectorSplit contract.
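
To illustrate the concern, here is a minimal sketch of how a record's generated equals and hashCode interact with a HashSet. `DemoSplit` and its fields are hypothetical stand-ins, not Trino's `Split`; the point is only that two value-equal records collapse into one Set entry while a List keeps both.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecordSetDemo
{
    // Hypothetical stand-in for a split; records get value-based equals/hashCode for free.
    record DemoSplit(String catalog, int part) {}

    static int sizeInSet()
    {
        Set<DemoSplit> pending = new HashSet<>();
        pending.add(new DemoSplit("hive", 0));
        pending.add(new DemoSplit("hive", 0)); // equal by value, so the Set drops it
        return pending.size();
    }

    static int sizeInList()
    {
        List<DemoSplit> pending = new ArrayList<>();
        pending.add(new DemoSplit("hive", 0));
        pending.add(new DemoSplit("hive", 0)); // a List keeps both entries
        return pending.size();
    }

    public static void main(String[] args)
    {
        System.out.println("set size:  " + sizeInSet());
        System.out.println("list size: " + sizeInList());
    }
}
```

If two splits that compare equal must both be scheduled, the Set variant would silently lose one of them.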

findepi (Member) commented Apr 17, 2024

> but hashCode and equals are not part of the ConnectorSplit contract.

Previously, Split.equals did not invoke ConnectorSplit.equals, so it may not have mattered.
Now it does.
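
A small sketch of why this matters: a record's generated equals compares each component with that component's own equals. `Wrapper`, `IdentitySplit`, and `ValueSplit` below are hypothetical names used only for illustration, not Trino classes.

```java
final class RecordDelegationDemo
{
    // Hypothetical connector split that does not override equals/hashCode (identity semantics).
    static final class IdentitySplit
    {
        final String path;

        IdentitySplit(String path)
        {
            this.path = path;
        }
    }

    // Hypothetical connector split that happens to define value equality.
    record ValueSplit(String path) {}

    // Hypothetical engine-level wrapper; its generated equals delegates to the component.
    record Wrapper(Object connectorSplit) {}

    static boolean equalWithIdentitySplit()
    {
        return new Wrapper(new IdentitySplit("a")).equals(new Wrapper(new IdentitySplit("a")));
    }

    static boolean equalWithValueSplit()
    {
        return new Wrapper(new ValueSplit("a")).equals(new Wrapper(new ValueSplit("a")));
    }

    public static void main(String[] args)
    {
        System.out.println("identity-based component: " + equalWithIdentitySplit());
        System.out.println("value-based component:    " + equalWithValueSplit());
    }
}
```

Under these assumptions, the wrapper's equality depends entirely on what the wrapped connector class chooses to implement, which is the behavior a contract-free ConnectorSplit cannot promise.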

sopel39 (Member) left a comment

Seems OK, but it would be good for another set of eyes to look at it.

Since records have generated equals and hashCode methods, we need to switch away from using Set<Split> in the scheduler
implementation. Using a Set seems incorrect anyway: it suggests that we want to remove duplicates, but hashCode and equals are not
part of the ConnectorSplit contract.
@wendigo wendigo merged commit 12b8e5f into master Apr 18, 2024
103 checks passed
@wendigo wendigo deleted the serafin/schedulers-sets branch April 18, 2024 20:19
@github-actions github-actions bot added this to the 446 milestone Apr 18, 2024
@@ -92,7 +92,7 @@ private enum State
     private final BooleanSupplier anySourceTaskBlocked;
     private final PartitionIdAllocator partitionIdAllocator;
     private final Map<InternalNode, RemoteTask> scheduledTasks;
-    private final Set<Split> pendingSplits = new HashSet<>();
+    private final List<Split> pendingSplits = new ArrayList<>();
Member

See the line below:

splitAssignment.values().forEach(pendingSplits::remove); // AbstractSet.removeAll performs terribly here.

Now that pendingSplits is a List, this calls List.remove(Object), which scans the list linearly for every removed split and is therefore slow.
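
One possible way to avoid the per-element scan, sketched here as an assumption and not as what this PR ended up doing: collect the assigned elements into an identity-based set and filter the list in a single pass. `removeByIdentity` is a hypothetical helper; it deliberately avoids relying on equals/hashCode of the elements.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class PendingRemovalDemo
{
    // Hypothetical helper: removes every element of 'assigned' from 'pending' by reference identity.
    static <T> void removeByIdentity(List<T> pending, Collection<T> assigned)
    {
        // Identity-based set, so element equals/hashCode are never consulted.
        Set<T> toRemove = Collections.newSetFromMap(new IdentityHashMap<>());
        toRemove.addAll(assigned);
        // Single pass over the list instead of one linear scan per removed element.
        pending.removeIf(toRemove::contains);
    }

    public static void main(String[] args)
    {
        Object a = new Object();
        Object b = new Object();
        Object c = new Object();
        List<Object> pending = new ArrayList<>(List.of(a, b, c));
        removeByIdentity(pending, List.of(a, b));
        System.out.println("remaining: " + pending.size());
    }
}
```

Whether identity is the right notion of "same split" for the scheduler is a design question for the maintainers; the sketch only shows that the removal cost does not have to be quadratic.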

Contributor Author

Is removeAll better?
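
For context on that question, the two calls are not equivalent on a List. The sketch below uses plain strings as hypothetical stand-ins for splits: `List.remove(Object)` drops only the first equal element per call, while `List.removeAll(Collection)` drops every element the argument contains, and its cost depends on how fast the argument's `contains` is.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

final class RemoveAllDemo
{
    // Mirrors the forEach(list::remove) pattern: one List.remove(Object) call per assigned element.
    static List<String> removeEach(List<String> pending, List<String> assigned)
    {
        List<String> copy = new ArrayList<>(pending);
        assigned.forEach(copy::remove);
        return copy;
    }

    // Single removeAll call; the HashSet argument gives hash-based contains checks.
    static List<String> removeAllAtOnce(List<String> pending, List<String> assigned)
    {
        List<String> copy = new ArrayList<>(pending);
        copy.removeAll(new HashSet<>(assigned));
        return copy;
    }

    public static void main(String[] args)
    {
        List<String> pending = List.of("a", "a", "b");
        List<String> assigned = List.of("a");
        System.out.println(removeEach(pending, assigned));      // [a, b]
        System.out.println(removeAllAtOnce(pending, assigned)); // [b]
    }
}
```

Note that the removeAll variant relies on equals and hashCode of the elements, which is the very dependency this PR is trying to move away from.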

@colebow colebow added the no-release-notes This pull request does not require release notes entry label Apr 29, 2024
Labels: cla-signed, no-release-notes