Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network-bytes-out metric support for CLUSTER SLOT-STATS command (#20) #771

Closed
wants to merge 2 commits into from

Conversation

kyle-yh-kim
Copy link
Contributor

@kyle-yh-kim kyle-yh-kim commented Jul 10, 2024

Update, July 24th 2024

This PR has been cherry-picked into an existing network-bytes-in PR; #720
Both network-bytes-in and out will be tracked under a single PR. I will now close this PR.


The metric tracks network egress bytes under per-slot context, by hooking onto COB buffer mutations.

The metric can be viewed by calling the CLUSTER SLOT-STATS command, with sample response attached below;

127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
   2) 1) "key-count"
      2) (integer) 1
      3) "network-bytes-out"
      4) (integer) 175

…alkey-io#20).

The metric tracks network egress bytes under per-slot context,
by hooking onto COB buffer mutations.

The metric can be viewed by calling the CLUSTER SLOT-STATS command,
with sample response attached below;

```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
   2) 1) "key-count"
      2) (integer) 1
      3) "network-bytes-out"
      4) (integer) 175
```

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
@kyle-yh-kim
Copy link
Contributor Author

Currently, the PR aggregates network-bytes-out from three different sources;

  1. User client's outgoing RESP response, as a result of command invocation.
  2. Replica client's outgoing replication stream.
  3. Outgoing message due to sharded pubsub publishment.

If this feels too confusing, there are two alternatives;

  1. Drop the replication stream and sharded pubsub accounting altogether, and strictly track only user client’s and valkey-server’s interaction.
  2. Split the network-bytes-out metric into three sub-components;
    127.0.0.1:6379> cluster slot-stats slotsrange 12539 12539
    1) 1) (integer) 12539
      2) 1) "key-count"
         2) (integer) 1
         3) "network-bytes-out-user"
         4) (integer) 10
         5) "network-bytes-out-replication"
         6) (integer) 20
         7) "network-bytes-out-sharded-pubsub"
         8) (integer) 30
    

Personally, I would be opposed to dropping the replication stream and sharded pubsub requirements altogether, as the inclusion of all three components better represents the actual egress bytes. Curious to hear the core team’s thoughts on this.

- Added more integration tests.
- Added a modifiable config, "cluster-slot-stats-enabled", with default
  value false.
- Updated the APIs to separate between 1) user client, 2) replication
  and 3) sharded pub/sub.

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
}

/* Accumulates egress bytes upon sending replication stream. This only applies for primary nodes. */
void clusterSlotStatsAddNetworkBytesOutForReplication(int len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void clusterSlotStatsAddNetworkBytesOutForReplication(int len) {
void clusterSlotStatsAddNetworkBytesOutForReplication(size_t len) {

Comment on lines +163 to +164
int _slot = c->slot;
c->slot = slot;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clearly following this trick and seems dangerous. If we do an early exit on the following statement, we don't restore the value in c->slot. Is that intentional ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this was not done intentionally. The command should stay idempotent, irregardless of its early return condition.

Update has been reflected on the other PR, which replaces this one; #720

@@ -2833,6 +2838,7 @@ int processCommandAndResetClient(client *c) {
client *old_client = server.current_client;
server.current_client = c;
if (processCommand(c) == C_OK) {
clusterSlotStatsAddNetworkBytesOutForUserClient(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also go into afterCommand ?

@kyle-yh-kim
Copy link
Contributor Author

This PR has been cherry-picked into an existing network-bytes-in PR; #720
Both network-bytes-in and out will be tracked under a single PR. I will now close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants