To put heavy pressure on the RPC API, first change the Tendermint code: increase the lock time so that even a small number of query requests delays ApplyBlock significantly.
Then write a shell script and run it on the local node:
while true; do
    time curl 127.0.0.1:26657/{abci-query-for something} &
    sleep 0.2
done
Restart the node and wait 2 or 3 minutes; the "peer did not send us anything" errors then appear.
To find the root cause, we modified the source code.
Give BlockRequest a new field, Time:
type BlockRequest struct {
	Time   time.Time // added for testing
	Height int64
	PeerID p2p.ID
}
and initialize it with time.Now() when sending it to pool.requestsCh.
In poolRoutine() of BlockchainReactor, we compare the creation time with the current time.
We then checked the peer that supposedly sent us nothing and looked up the block height it was expected to deliver. It turned out that the request had not even been sent out yet: it was generated at 19min42second but only popped from the channel at 20min10second, 28 seconds later.
A look at the code of poolRoutine shows why: trySyncTicker fires every 10ms and sends an object to didProcessCh. When processing a block takes more than 10ms, didProcessCh always has an object pending, so events from requestsCh are picked up late.
I did a simple test:
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Event records when it was created so the consumer can measure queue delay.
type Event struct {
	Time time.Time
}

func main() {
	eventChannel := make(chan Event, 600)
	didProcessCh := make(chan struct{}, 1)
	trySyncTicker := time.NewTicker(10 * time.Millisecond)

	// Producer: emit an event at random intervals of up to one second.
	go func() {
		for {
			t := rand.Intn(1000)
			time.Sleep(time.Duration(t) * time.Millisecond)
			eventChannel <- Event{time.Now()}
		}
	}()

	for {
		select {
		case e := <-eventChannel:
			// Print how long the event waited before being handled.
			end := time.Now()
			d := end.Sub(e.Time)
			fmt.Println(d)
		case <-trySyncTicker.C: // chan time
			select {
			case didProcessCh <- struct{}{}:
			default:
			}
		case <-didProcessCh:
			// We need both to sync the first block.
			// Try again quickly next loop.
			time.Sleep(2 * time.Second) // simulates slow block processing
			didProcessCh <- struct{}{}
		}
	}
}
unclezoro changed the title from p2p: "peer did not send us anything" in high pressure to p2p: "peer did not send us anything" in 发售in high pressure on Mar 20, 2019
unclezoro changed the title from p2p: "peer did not send us anything" in 发售in high pressure to p2p: "peer did not send us anything" in fast_sync mode when under high pressure on Mar 20, 2019
Fixes tendermint#3457
The issue is this: writing a BlockRequest into the requestsCh channel also creates a timer that stops the peer 15s later if no block has been received. But popping the BlockRequest from requestsCh and sending it out may itself be delayed by more than 15s, so the peer is stopped with the "peer did not send us anything" error even though the request never reached it in time.
Extracting the requestsCh handling into its own goroutine ensures that every BlockRequest is handled promptly.
Instead of the requestsCh handling, we should probably pull the didProcessCh handling into a separate goroutine, since this is the one "starving" the other channel handlers. I believe that, the way it is right now, we still have high delays in errorsCh handling, which might cause requests to be sent to invalid or disconnected peers.