
[blockchain] Riri processor-scheduler peer behaviour #3777

Closed
brapse opened this issue Jul 8, 2019 · 3 comments
brapse commented Jul 8, 2019

The new blockchain reactor design specified in #3753 introduces asynchronous message communication between the block processing logic and the request scheduling logic. This async communication introduces latency between the scheduler perceiving bad peer behaviour and the processor being informed to remove blocks provided by that peer. A malicious peer could use this window to saturate the block processing thread, slowing down the FastSync'ing node.

This issue can be solved by having the block processing thread yield to peerErrors more frequently, ensuring that few, if any, blocks from an errored peer are processed.
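
For illustration, that fix might take roughly the following shape in Go. This is a sketch only; processRoutine, purgePeer, and the event types here are hypothetical stand-ins, not the actual v2 reactor code:

```go
package processor

// Hypothetical event types standing in for the real reactor messages.
type peerError struct{ peerID string }

type blockResponse struct {
	peerID string
	height int64
}

type procState struct{} // block store, peer set, etc. (elided)

func (s *procState) purgePeer(id string)          {} // drop queued blocks from this peer
func (s *procState) processBlock(b blockResponse) {} // validate and apply a single block

// processRoutine returns to its select after every block, so a queued
// peerError is handled before any further blocks from that peer are processed.
func processRoutine(events <-chan interface{}, stop <-chan struct{}) {
	s := &procState{}
	for {
		select {
		case ev := <-events:
			switch ev := ev.(type) {
			case peerError:
				s.purgePeer(ev.peerID)
			case blockResponse:
				s.processBlock(ev) // one block, then yield back to the select
			}
		case <-stop:
			return
		}
	}
}
```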

brapse self-assigned this Jul 8, 2019
ancazamfir commented

Hmmm... not sure. You would be yielding to the messages in a channel that may contain a lot of blocks from the scheduler, requests from other fast-sync peers, etc., before you get to the peerError from the scheduler. Unless we add another channel for errors (this is what v0 and v1 have).
Also, this is not the only issue; we also have to deal carefully with out-of-sync views of the world between the processor and scheduler.
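
For reference, a dedicated error channel can be given strict priority in Go with a nested select. This is a sketch with hypothetical channel and function names, not the v0/v1 code:

```go
package processor

type peerError struct{ peerID string }
type blockResponse struct{ peerID string }

func purgePeer(id string)          {} // drop state associated with the peer
func processBlock(b blockResponse) {} // validate and apply a single block

// processRoutine drains peerErrs before taking the next block: the first
// select with an empty default is non-blocking, so any pending error is
// always handled ahead of the block backlog.
func processRoutine(blocks <-chan blockResponse, peerErrs <-chan peerError, stop <-chan struct{}) {
	for {
		select {
		case e := <-peerErrs:
			purgePeer(e.peerID)
			continue
		default:
		}
		select {
		case e := <-peerErrs:
			purgePeer(e.peerID)
		case b := <-blocks:
			processBlock(b)
		case <-stop:
			return
		}
	}
}
```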

At this point I am not sure anymore that we need to have the processor and scheduler routines. I will try to give a minimal explanation of how v0 and v1 got to having the two routines; links to the issue and PR are below.

We used to have a single goroutine that was doing both the scheduling and the processing. Given the way it was written, and given that we were running timers that triggered in separate goroutines, the following was seen (a reconstruction is sketched after the list):

  • a high number of block requests were sent out, and the resulting blocks were received in channel 1.
  • processing would then start, and after each processed block a signal would be sent to channel 2 in order to keep processing blocks quickly without sitting in a tight loop.
  • because processing blocks takes a long time, some blocks would sit in channel 1 for more than 15 seconds.
  • the timer associated with the peer(s) sending those blocks would trigger, and its handler would run in a separate goroutine, rescheduling the pending requests (those for the blocks already waiting in channel 1).
  • later, when we actually got to reading those blocks from channel 1, the code dropped them, since those requests had by then been made to other peers.
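
A minimal reconstruction of that race in Go, with hypothetical names (the actual v0 code differed):

```go
package fastsync

import (
	"sync"
	"time"
)

type blockResponse struct {
	peerID string
	height int64
}

type pool struct {
	mtx       sync.Mutex
	requester map[int64]string // height -> peer the request is currently assigned to
	blockCh   chan blockResponse
}

// onPeerTimeout runs in the timer's goroutine and reassigns the peer's pending
// requests, even though some of those blocks already sit unread in blockCh.
func (p *pool) onPeerTimeout(timedOut, replacement string) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	for h, id := range p.requester {
		if id == timedOut {
			p.requester[h] = replacement
		}
	}
}

// processLoop reads blocks long after they were queued; if the request was
// rescheduled in the meantime, the already-received block gets dropped.
func (p *pool) processLoop() {
	for b := range p.blockCh {
		p.mtx.Lock()
		owner := p.requester[b.height]
		p.mtx.Unlock()
		if owner != b.peerID {
			continue // dropped: the request now belongs to another peer
		}
		time.Sleep(100 * time.Millisecond) // stand-in for slow block validation
	}
}
```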

With the current approach (use of timestamps and not real timers) this wouldn't happen, right?
Note also that the time spent in the scheduler routine is insignificant compared with that spent in the processing routine. AFAIK, we haven't seen any performance improvement after splitting.
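
For contrast, a sketch of the timestamp-based approach (the types here are hypothetical): the timeout check runs synchronously inside the scheduler routine, so no timer handler races with processing.

```go
package scheduler

import "time"

type request struct {
	peerID string
	sentAt time.Time
}

type scheduler struct {
	pending map[int64]request // height -> outstanding request
	timeout time.Duration
}

// checkTimeouts is called from the scheduler's own routine with the current
// time; peers whose requests have been outstanding too long are reported,
// and nothing fires concurrently with the rest of the scheduler's logic.
func (s *scheduler) checkTimeouts(now time.Time) []string {
	var timedOut []string
	for _, r := range s.pending {
		if now.Sub(r.sentAt) > s.timeout {
			timedOut = append(timedOut, r.peerID)
		}
	}
	return timedOut
}
```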

v0 issue: #3457
v1 PR: #3459


brapse commented Jul 9, 2019

Unless we add another channel for errors (this is what v0 and v1 have).

Yes, I think that could definitely be one solution. It could be done internal to the processRoutine because, in the end, it's the processRoutine that is most concerned about missing a peerError.

At this point I am not sure anymore that we need to have the processor and scheduler routines.

The motivation for having separate routines in v2 was to decompose the problem into components that are easier to test and to comprehend for review purposes. There are additional benefits, but I would say that is by far the largest.

With the current approach (use of timestamps and not real timers) this wouldn't happen, right?

If I understand you correctly, we are talking about thread saturation, where an expensive process like block validation holds control of a thread for too long and either doesn't check for errors or doesn't check the time and issue timeouts. In both cases, the solution seems to be to ensure that threads yield frequently enough to be responsive to external state changes (events).
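
As an illustration of such a yield point (a sketch; the names and batching are hypothetical):

```go
package processor

import "fmt"

type block struct{ height int64 }
type peerError struct{ peerID string }

func validate(b block) error { return nil } // stand-in for expensive validation

// validateBatch polls peerErrs every few blocks, so an expensive validation
// pass stays responsive to external state changes instead of holding the
// goroutine until the whole batch is done.
func validateBatch(blocks []block, peerErrs <-chan peerError) (int, error) {
	for i, b := range blocks {
		if err := validate(b); err != nil {
			return i, err
		}
		if i%8 == 7 { // yield point: non-blocking check for external events
			select {
			case e := <-peerErrs:
				return i + 1, fmt.Errorf("peer %s errored mid-batch", e.peerID)
			default:
			}
		}
	}
	return len(blocks), nil
}
```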


brapse commented Nov 28, 2019

This has been resolved by using a priority-queue routine which ensures peerErrors take priority over blocks.
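
The underlying idea can be sketched as follows (hypothetical names, not the actual implementation): a demux routine buffers incoming events in two queues and always delivers pending peerErrors before pending blocks.

```go
package processor

type peerError struct{ peerID string }

// demuxRoutine forwards events from in to out, letting queued peerErrors jump
// ahead of queued blocks. A send on a nil channel never proceeds, which
// disables the delivery case whenever both queues are empty.
func demuxRoutine(in <-chan interface{}, out chan<- interface{}, stop <-chan struct{}) {
	var errQ, blockQ []interface{}
	for {
		var next interface{}
		var deliver chan<- interface{}
		if len(errQ) > 0 {
			next, deliver = errQ[0], out
		} else if len(blockQ) > 0 {
			next, deliver = blockQ[0], out
		}
		select {
		case ev := <-in:
			if _, ok := ev.(peerError); ok {
				errQ = append(errQ, ev)
			} else {
				blockQ = append(blockQ, ev)
			}
		case deliver <- next:
			if len(errQ) > 0 {
				errQ = errQ[1:]
			} else {
				blockQ = blockQ[1:]
			}
		case <-stop:
			return
		}
	}
}
```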

brapse closed this as completed Nov 28, 2019