Add unit tests for CLUSTER NODES parsing #112

Merged
bjosv merged 3 commits into valkey-io:main from bjosv:cluster-nodes-test
Oct 14, 2024

Conversation

Collaborator

@bjosv bjosv commented Oct 9, 2024

Includes OOM testing of replica parsing and fixes for memory issues when parsing fails.

Fixes #33

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
@bjosv bjosv requested a review from zuiderkwast October 10, 2024 07:00
bjosv added 2 commits October 11, 2024 13:04
Fix memory issues when the parsing of replicas fails.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Collaborator

@zuiderkwast zuiderkwast left a comment

LGTM

Comment thread src/cluster.c
@@ -628,12 +628,12 @@ static int cluster_master_slave_mapping_with_name(valkeyClusterContext *cc,
}

if (node_old->slaves != NULL) {
Collaborator

This is a little odd and maybe fragile.

What about the corner case of primaries without any slots? Does the nodes dict own them too?

Collaborator Author

It is a bit odd.
The nodes dict owns all valkeyClusterNodes once parsing is complete, whether or not they have slots. This dict maps key=address to value=primary node, and each primary owns a list of its replicas.
When an error occurs while parsing, deleting the nodes dict deletes all of its objects: the primaries and their known replicas.

But then there is the temporary nodes_name dict, which holds references to all cluster nodes parsed so far.
The issue found is that this dict is the actual owner of any replica node whose primary has not yet been parsed.
So these are the replica nodes that need to be deleted when parsing fails.

I have an upcoming PR that refactors the cluster nodes parsing.
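
The ownership rules described in this comment can be illustrated with a minimal sketch. It uses simplified stand-in types and helpers (`node`, `parse_state`, `add_parsed_node`, `cleanup_after_failure`), all of which are hypothetical and not the library's actual code; fixed-size arrays play the role of the nodes and nodes_name dicts. It shows only the failure path: replicas whose primary has not been parsed are referenced solely by the name table, so they are freed first, and the primaries with their attached replicas are freed afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 8
#define MAX_REPLICAS 4

/* Hypothetical, simplified stand-in for a cluster node. */
typedef struct node {
    char name[16];
    char primary_name[16];               /* empty string for a primary */
    struct node *replicas[MAX_REPLICAS]; /* owned by the primary once attached */
    int num_replicas;
    int attached;                        /* 1 when a primary owns this replica */
} node;

typedef struct {
    node *primaries[MAX_NODES]; /* role of the nodes dict: owns primaries */
    int num_primaries;
    node *by_name[MAX_NODES];   /* role of nodes_name: every parsed node */
    int num_by_name;
} parse_state;

static int live_nodes = 0; /* nodes allocated but not yet freed */

static node *node_create(const char *name, const char *primary_name) {
    node *n = calloc(1, sizeof(*n));
    if (n == NULL) return NULL;
    snprintf(n->name, sizeof(n->name), "%s", name);
    snprintf(n->primary_name, sizeof(n->primary_name), "%s", primary_name);
    live_nodes++;
    return n;
}

/* Frees a node together with the replicas attached to it. */
static void node_free(node *n) {
    for (int i = 0; i < n->num_replicas; i++) node_free(n->replicas[i]);
    free(n);
    live_nodes--;
}

static int is_replica(const node *n) { return n->primary_name[0] != '\0'; }

/* Records a parsed node; a replica is attached only if its primary is known. */
static void add_parsed_node(parse_state *st, node *n) {
    assert(n != NULL && st->num_by_name < MAX_NODES);
    st->by_name[st->num_by_name++] = n;
    if (!is_replica(n)) {
        st->primaries[st->num_primaries++] = n;
        return;
    }
    for (int i = 0; i < st->num_primaries; i++) {
        node *p = st->primaries[i];
        if (strcmp(p->name, n->primary_name) == 0 &&
            p->num_replicas < MAX_REPLICAS) {
            p->replicas[p->num_replicas++] = n;
            n->attached = 1;
            return;
        }
    }
    /* Primary not parsed yet: only by_name references this replica. */
}

static void cleanup_after_failure(parse_state *st) {
    /* Free unattached replicas first, while every pointer is still valid. */
    for (int i = 0; i < st->num_by_name; i++) {
        node *n = st->by_name[i];
        if (is_replica(n) && !n->attached) node_free(n);
    }
    /* Then free the primaries, which also frees their attached replicas. */
    for (int i = 0; i < st->num_primaries; i++) node_free(st->primaries[i]);
    memset(st, 0, sizeof(*st));
}

/* Simulates a parse that fails after three nodes; returns the leak count. */
static int demo_failed_parse(void) {
    parse_state st = {0};
    add_parsed_node(&st, node_create("p1", ""));
    add_parsed_node(&st, node_create("r1", "p1")); /* attached to p1 */
    add_parsed_node(&st, node_create("r2", "p2")); /* p2 not parsed yet */
    cleanup_after_failure(&st);
    return live_nodes;
}
```

Freeing only the primaries in this sketch would leave `r2` allocated, which is the kind of leak the comment describes.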

// Connect
{
- for (int i = 0; i < 128; ++i) {
+ for (int i = 0; i < 148; ++i) {
Collaborator

Because the stack-allocated iterator uses fewer allocations?

Collaborator Author

I added the config to also parse replicas in this test case. This required a few more allocations to succeed.
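
To illustrate why the loop bound depends on how much work the tested operation does, here is a generic sketch of the out-of-memory test pattern. It does not use the library's API: `test_malloc`, `build_parts`, and `find_required_allocs` are hypothetical names, and the operation under test is a toy that needs one allocation per part plus one for the container. The loop grants a growing allocation budget until the operation succeeds, so an operation that allocates more needs a higher bound.

```c
#include <assert.h>
#include <stdlib.h>

/* Remaining successful allocations; -1 means "never inject a failure". */
static int allocs_left = -1;

/* Hypothetical allocator wrapper that fails once the budget is used up. */
static void *test_malloc(size_t size) {
    if (allocs_left == 0) return NULL; /* injected failure */
    if (allocs_left > 0) allocs_left--;
    return malloc(size);
}

/* Toy operation under test: needs 1 + parts allocations to succeed and
 * releases everything it allocated, whether it succeeds or fails. */
static int build_parts(int parts) {
    assert(parts > 0);
    void **items = test_malloc(sizeof(void *) * (size_t)parts);
    if (items == NULL) return -1;
    int n = 0;
    for (; n < parts; n++) {
        items[n] = test_malloc(32);
        if (items[n] == NULL) break;
    }
    int ok = (n == parts);
    for (int i = 0; i < n; i++) free(items[i]);
    free(items);
    return ok ? 0 : -1;
}

/* Returns the smallest budget below `limit` with which build_parts succeeds,
 * or -1 if no budget below `limit` is enough. */
static int find_required_allocs(int parts, int limit) {
    for (int i = 0; i < limit; i++) {
        allocs_left = i;
        int rc = build_parts(parts);
        allocs_left = -1;
        if (rc == 0) return i;
    }
    return -1;
}
```

In this toy, going from three parts to five raises the required budget by two, in the same way that enabling extra parsing work in a test raises the number of iterations its failure loop has to cover.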

@bjosv bjosv merged commit a97a74e into valkey-io:main Oct 14, 2024
Development

Successfully merging this pull request may close these issues.

Improve tests of parsing CLUSTER NODES response
