RPC subscriptions: client is not pulling messages fast enough #3935

Closed
qustavo opened this issue Sep 3, 2019 · 8 comments · Fixed by #4521
Labels
C:rpc Component: JSON RPC, gRPC T:bug Type Bug (Confirmed)

Comments

qustavo commented Sep 3, 2019

Tendermint version
0.32.3

ABCI app

  • kvstore
  • InProc implementation

Environment:

  • OS linux-5.2.10 (Archlinux)
  • Install tools: git

What happened:
When I subscribe to events via the RPC API, after receiving zero or more events (the number is completely random) I get the following error:

{
  "jsonrpc": "2.0",
  "id": "1#event",
  "error": {
    "code": -32000,
    "message": "Server error",
    "data": "subscription was cancelled (reason: client is not pulling messages fast enough)"
  }
}

What you expected to happen:
I should keep getting the events

Have you tried the latest version: yes

How to reproduce it (as minimally and precisely as possible):

  • step 1
    Start tendermint:
    go run ./cmd/tendermint/main.go node --proxy_app kvstore

  • step 2
    Subscribe to events. I'm using ws, but any WebSocket client should be sufficient:

$  ws ws://localhost:26657/websocket
> {"jsonrpc":"2.0", "id": 1, "method": "subscribe", "params": {"query": "tm.event='Tx'"}}
< {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {}
}

  • step 3
    Broadcast tons of Txs:
```bash
WAIT=0.1
for i in $(seq 10000)
do
  echo $i
  sleep $WAIT
  curl http://localhost:26657/broadcast_tx_sync\?tx\=\"$i\"
done
```

You can increase $WAIT to 1 second or more until the error disappears.

Config:
Generated by tendermint init

qustavo commented Sep 3, 2019

After some investigation, I noted that changing this:

sub, err := eventBus.Subscribe(subCtx, addr, q)

to

sub, err := eventBus.Subscribe(subCtx, addr, q, 9999)

fixed the problem.

Would it make sense to provide an RPC config param (e.g. max_subscription_capacity) that allows passing an arbitrary capacity to the Subscribe function?
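
For illustration only, here is a rough sketch of what such a handler could look like if that option existed. The eventBusSubscriber interface, the subscribeWithCapacity helper, and the fallback value are hypothetical stand-ins, not Tendermint's actual API:

```go
package rpcsketch

import "context"

// eventBusSubscriber is a hypothetical stand-in for the event bus; the real
// EventBus.Subscribe takes a pubsub query rather than a plain string.
type eventBusSubscriber interface {
	Subscribe(ctx context.Context, subscriber, query string, outCapacity ...int) (<-chan interface{}, error)
}

// subscribeWithCapacity passes a user-configurable buffer size (the proposed
// max_subscription_capacity) to the event bus instead of a hard-coded default.
func subscribeWithCapacity(
	ctx context.Context,
	bus eventBusSubscriber,
	addr, query string,
	maxSubscriptionCapacity int,
) (<-chan interface{}, error) {
	if maxSubscriptionCapacity <= 0 {
		maxSubscriptionCapacity = 100 // assumed fallback when the option is unset
	}
	return bus.Subscribe(ctx, addr, query, maxSubscriptionCapacity)
}
```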

melekes commented Sep 3, 2019

"i've found that this error can happen if there are two events from the same subscription on a single block. the second event triggers the error because the first event is still in the queue" - Mark from ShapeShift

qustavo commented Sep 3, 2019

Can you give me an example of that? I can't picture it.

melekes added the T:bug Type Bug (Confirmed) and C:rpc Component: JSON RPC, gRPC labels on Sep 4, 2019
melekes commented Sep 5, 2019

I believe the right thing to do here would be buffering events on the client side. Forcing Tendermint to buffer events for subscriptions would a) further increase memory usage and b) potentially bring more complexity, since we'd have to persist buffered events (so that if Tendermint crashes, we don't lose those events).
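
A minimal sketch of that client-side buffering, assuming a github.com/gorilla/websocket client and an arbitrary buffer size of 1000: the reader goroutine drains the socket as fast as it can, while a separate goroutine does the (possibly slow) processing.

```go
package main

import (
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:26657/websocket", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Subscribe to Tx events (same request as in the reproduction steps).
	req := `{"jsonrpc":"2.0","id":1,"method":"subscribe","params":{"query":"tm.event='Tx'"}}`
	if err := conn.WriteMessage(websocket.TextMessage, []byte(req)); err != nil {
		log.Fatal(err)
	}

	events := make(chan []byte, 1000) // buffer absorbs short bursts

	// Reader: keep this loop free of heavy work so the socket is always drained.
	go func() {
		defer close(events)
		for {
			_, msg, err := conn.ReadMessage()
			if err != nil {
				log.Println("read error:", err)
				return
			}
			events <- msg
		}
	}()

	// Consumer: slow processing happens here, off the read path.
	for msg := range events {
		log.Printf("event: %s", msg)
	}
}
```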

qustavo commented Sep 5, 2019

I understand your concerns, and I agree that buffering events on the client side is the way to go right now.
Now, regarding your concerns: a) although that would increase memory usage, you could let the user control the size (max_subscription_capacity), and I don't think memory usage would increase dramatically; and b) if we buffer in the caller, we (the caller program) still need to implement a synchronization mechanism against Tendermint. Assuming TM crashes and loses messages, the caller SHOULD query TM to fill the gap between the last received event and the newly received ones.
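
As a sketch of that gap-filling step: after a reconnect, the caller could query the node's /tx_search endpoint for the heights it missed. This assumes the node indexes transactions by height and exposes the standard Tendermint /tx_search RPC; the height values below are just an example.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

// backfill fetches transactions for the missed height range via /tx_search.
func backfill(node string, fromHeight, toHeight int64) ([]byte, error) {
	q := fmt.Sprintf(`"tx.height>%d AND tx.height<=%d"`, fromHeight, toHeight)
	u := fmt.Sprintf("%s/tx_search?query=%s&per_page=100", node, url.QueryEscape(q))

	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Example: the last event seen was at height 100, the node is now at 120.
	body, err := backfill("http://localhost:26657", 100, 120)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", body)
}
```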

melekes commented Mar 3, 2020

Solution 1

I would propose allocating a buffer per subscription on the Tendermint (server) side to tolerate some slowness in clients (or short bursts of events). A similar buffer should exist on the client side if processing a single event takes time (alternatively, processing should be done in a separate thread).

The size of this buffer could be configurable (similar to TCP send/receive buffers) or constant (most users will not tweak it, I think). However, I am not sure what the ideal size would be. A toy model of this buffering behaviour is sketched after the pros/cons below.

pros:

  • we don't block Tendermint if a WS client is slow to consume events
  • allows short bursts

cons:

  • can lose events currently in the buffer if TM crashes
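
The toy model referred to above is only a sketch of the "cancel when the buffer is full" behaviour, not the actual tmpubsub implementation; it shows why a larger buffer tolerates short bursts while a persistently slow client is still cut off.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errSlowClient = errors.New("client is not pulling messages fast enough")

type subscription struct {
	out chan string
	err error
}

// publish does a non-blocking send; when the buffer is full the subscription
// is cancelled, mirroring the error reported in this issue.
func (s *subscription) publish(event string) error {
	if s.err != nil {
		return s.err
	}
	select {
	case s.out <- event:
		return nil
	default:
		s.err = errSlowClient
		close(s.out)
		return s.err
	}
}

func main() {
	sub := &subscription{out: make(chan string, 4)} // the configurable capacity

	// Slow consumer.
	go func() {
		for e := range sub.out {
			time.Sleep(50 * time.Millisecond)
			fmt.Println("processed", e)
		}
	}()

	// A rapid burst soon fills the small buffer and cancels the subscription.
	for i := 0; i < 10; i++ {
		if err := sub.publish(fmt.Sprintf("tx-%d", i)); err != nil {
			fmt.Println("cancelled:", err)
			return
		}
	}
}
```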

Solution 2

Allow blocking subscriptions (/subscribe?unbuffered=true).

pros:

  • guaranteed delivery

cons:

  • blocks Tendermint consensus, other parts using the eventBus, and other clients

melekes added a commit that referenced this issue Mar 3, 2020
@erikgrinaker
Contributor

> Allow blocking subscriptions

I don't think this is viable; any RPC client could effectively DoS a node simply by opening a subscription and doing nothing.

> can lose events currently in the buffer if TM crashes

Is this not already a problem? Without being familiar with the event bus, I'm guessing that if a Tendermint node crashes, any in-flight events would be lost. Also, once the node comes back online, I would think it's possible for it to generate events before the client reconnects, unless we have some sort of sequence number to resume from, in which case we could use that to recover events lost from the buffer as well.

I agree that delivery guarantees are fantastic, but if we have the necessary infrastructure to actually give such guarantees then that infrastructure should be able to easily handle buffer loss as well.

feizerl commented Sep 16, 2021

Is there a workaround for this issue? I am currently running into the exact same issue, and I am pretty sure it is not because of my client's slowness (it is basically a tight loop polling messages from the websocket).
