
feat: behaviour penalty when non-priority queue reaches maxNumElementsInNonPriorityQueue #1083

Open · wants to merge 8 commits into base: master
Conversation

@diegomrsantos (Collaborator) commented Apr 9, 2024

This PR applies a behaviour penalty to peers whose non-priority queue reaches the configured maximum, instead of the previous strategy of disconnecting the peer. A conservative penalty of 0.0001 is added to behaviourPenalty for each message we attempt to send while the queue is over the limit, and that message is discarded. This usually results in a behaviourPenalty around [0.1, 0.2] by the time the score is updated, with the score landing around [-0.4, -0.1].

The negative score causes the peer to be pruned. The PR also clears the non-priority queue at that moment.
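
Roughly, the change amounts to the sketch below. This is a simplified illustration rather than the PR's exact diff: the proc name, the queue-length check, and the maxNumElementsInNonPriorityQueue field access are assumptions; only behaviourPenalty, the 0.0001 increment, and the discard-on-full behaviour come from the description above.

proc queueNonPriorityMsg(p: PubSubPeer, msg: seq[byte]): Future[void] =
  # Sketch: when the non-priority queue is at its configured limit, discard
  # the message and bump the behaviour penalty instead of disconnecting.
  if len(p.rpcmessagequeue.nonPriorityQueue) >= p.maxNumElementsInNonPriorityQueue:
    p.behaviourPenalty += 0.0001      # conservative per-message penalty
    Future[void].completed()          # message is discarded
  else:
    p.rpcmessagequeue.nonPriorityQueue.addLastNoWait(msg)
    Future[void].completed()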

@codecov-commenter

Codecov Report

Attention: Patch coverage is 63.63636%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 84.60%. Comparing base (03f67d3) to head (a6237bd).
Report is 2 commits behind head on unstable.

Additional details and impacted files


@@             Coverage Diff              @@
##           unstable    #1083      +/-   ##
============================================
- Coverage     84.82%   84.60%   -0.23%     
============================================
  Files            91       91              
  Lines         15428    15486      +58     
============================================
+ Hits          13087    13102      +15     
- Misses         2341     2384      +43     
Files                                          | Coverage        | Δ
libp2p/protocols/pubsub/gossipsub/behavior.nim | 88.86% <0.00%>  | +0.02% ⬆️
libp2p/protocols/pubsub/pubsubpeer.nim         | 87.59% <70.00%> | +1.45% ⬆️

... and 9 files with indirect coverage changes

@diegomrsantos (Collaborator Author) commented Apr 10, 2024

This log shows the moment when the non-priority queue of peer 16U*23VggQ reaches maxNumElementsInNonPriorityQueue (currently 1024), at 2024-04-10 15:35:48.273+02:00. From that moment on, a behaviour penalty of 0.0001 is applied per discarded message, and it keeps increasing by the same amount until it reaches 0.08230000000000133. At 2024-04-10 15:35:57.562+02:00 the peer score becomes negative (-0.1079571840000035); the peer is pruned and the queue is cleared. At 2024-04-10 16:05:57.471 the peer score becomes 0 again. After that, something similar happens again, but this time, after the peer score becomes negative, the peer is disconnected: DBG 2024-04-10 16:15:14.749+02:00 Received Goodbye message topics="peer_proto" reason="Disconnected (129)" peer=16U*23VggQ. The agent was Prysm; I don't know if it differs from Teku, but in the latter this error means too many peers.
nimbus-log.txt
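
As a back-of-the-envelope check of those numbers: the gossipsub v1.1 spec squares the behaviour-penalty counter and multiplies it by a negative behaviourPenaltyWeight when computing the score. The weight below is an assumed illustration value, not necessarily what this node was configured with.

let
  behaviourPenalty = 0.0823        # roughly 823 discarded messages at 0.0001 each
  behaviourPenaltyWeight = -15.9   # assumed value, for illustration only
  contribution = behaviourPenalty * behaviourPenalty * behaviourPenaltyWeight

echo contribution  # ≈ -0.108, in the ballpark of the -0.1079 score seen in the log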

@diegomrsantos marked this pull request as ready for review April 11, 2024 15:46
  p.closeSendConn(PubSubPeerEventKind.DisconnectionRequested)
else:
  Future[void].completed()
p.behaviourPenalty += 0.0001
Collaborator: I'd suggest moving the value used here to a constant.

Collaborator Author: done
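
For reference, the extracted constant might look something like this; the name is hypothetical and the one actually introduced by the PR may differ.

const
  # Hypothetical name for the per-message penalty applied while the
  # non-priority queue is over maxNumElementsInNonPriorityQueue.
  NonPriorityQueueBehaviourPenalty = 0.0001

# ... later, at the point shown in the hunk above:
p.behaviourPenalty += NonPriorityQueueBehaviourPenalty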

  p.behaviourPenalty += 0.0001
  trace "Peer has reached maxNumElementsInNonPriorityQueue. Discarding message and applying behaviour penalty.", peer = p, score = p.score,
    behaviourPenalty = p.behaviourPenalty, agent = p.getAgent()
  Future[void].completed()
Collaborator: Given that most return paths are Future[void].completed(), how about moving it outside the if statement and doing an explicit return on each arm for the cases that are not Future[void].completed()?

Collaborator Author: Could you please elaborate on what you believe is bad about the current approach?

Collaborator: There's nothing wrong per se. What I was hinting at is that, when you have code akin to this:

if <cond>:
  foo()
  return A  

elif <cond>:
  bar()
  return A

elif <cond>:
  baz()
  return B

...

else:
  foobar()
  return A

Sometimes it's clearer to group/extract the returns:

if <cond>:
  foo()

elif <cond>:
  bar()


elif <cond>:
  baz()
  return B

...

else:
  foobar()

return A

But I just realised that here it's almost 50/50, so there's not a lot of improvement to be made.

Comment on lines +277 to +283
proc clearNonPriorityQueue*(p: PubSubPeer) =
  if len(p.rpcmessagequeue.nonPriorityQueue) > 0:
    p.rpcmessagequeue.nonPriorityQueue.clear()

    when defined(pubsubpeer_queue_metrics):
      libp2p_gossipsub_non_priority_queue_size.set(labelValues = [$p.peerId], value = 0)


Collaborator: How asynchronous is this operation? That is: how feasible is it that, between the clear call and the metrics update, some value is added to the nonPriorityQueue? 🤔

Collaborator Author: This proc isn't async and won't be suspended.

Collaborator: Oh, that's right, never mind then.
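
To spell out the answer: chronos only switches between tasks at await points, and clearNonPriorityQueue contains none, so the clear() call and the metric reset always run back to back. A hypothetical call site, following the PR summary (the score check and its placement are assumptions):

# Hypothetical: when a peer is pruned because its score went negative,
# its queued non-priority messages are dropped as well. Since the proc
# never awaits, nothing can be enqueued between clear() and the metric reset.
if peer.score < 0.0:
  peer.clearNonPriorityQueue()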

@kaiserd self-requested a review May 10, 2024 13:36