Election timers can lead to resource exhaustion: "Could not create thread: Resource temporarily unavailable" #1216
This seems to happen more often after pre-elections. With pre-elections we don't seem to back off exponentially; instead we try a new pre-election every 3 seconds. One option would be to add a back-off scheme similar to the one used for normal elections.
I0414 03:45:19.108985 24452 raft_consensus.cc:803] T 6388fff4f85d4227b6de5f15dd2154f4 P 07c4d6051edb487c85276c4309e7793b [term 24 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
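A minimal sketch of the back-off idea, assuming the retry delay doubles after each consecutive failed (pre-)election and is capped at some maximum. `BackoffDelay`, its parameters, and the jitter window are hypothetical choices for illustration, not YugabyteDB's actual election-timeout code.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper: delay before the next (pre-)election attempt.
//   base      - the normal election timeout (about 3 s in this report)
//   attempt   - consecutive failed attempts so far (0 for the first)
//   max_delay - upper bound so the back-off cannot grow without limit
// The result is drawn uniformly from [capped / 2, capped], where
// capped = min(base * 2^min(attempt, 20), max_delay), so peers that lost
// their leader at the same time do not retry in lockstep.
std::chrono::milliseconds BackoffDelay(std::chrono::milliseconds base,
                                       int attempt,
                                       std::chrono::milliseconds max_delay,
                                       std::mt19937_64& rng) {
  // Clamp the exponent so the shift below cannot overflow.
  const int exponent = std::clamp(attempt, 0, 20);
  const std::int64_t scaled =
      static_cast<std::int64_t>(base.count()) * (std::int64_t{1} << exponent);
  const std::int64_t capped = std::min<std::int64_t>(
      scaled, static_cast<std::int64_t>(max_delay.count()));
  std::uniform_int_distribution<std::int64_t> dist(capped / 2, capped);
  return std::chrono::milliseconds(dist(rng));
}
```

With a 3 s base and a 60 s cap, attempts 0, 1 and 2 would wait roughly 1.5 to 3 s, 3 to 6 s and 6 to 12 s, so a node that keeps losing contact with its peers retries progressively less often instead of every 3 seconds.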
In this case, over a short period of 6 minutes (03:45 to 03:51), the total number of started threads increased from 88k to 120k, and the number of running threads went from ~850 to 11k, before the process ran out of resources to create new threads, resulting in a FATAL.
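For scale, the figures above work out to roughly the following rates (approximate, since the start and end counts are rounded):

$$
\frac{120\,000 - 88\,000}{6 \times 60\ \mathrm{s}} = \frac{32\,000}{360\ \mathrm{s}} \approx 89\ \text{threads started per second}
$$

$$
\frac{11\,000 - 850}{360\ \mathrm{s}} = \frac{10\,150}{360\ \mathrm{s}} \approx 28\ \text{additional running threads per second}
$$

The net growth in running threads therefore corresponds to about a third of the threads started during the window.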
Addressed in
Seeing this in one of the workloads. Once a node is unable to hear from its peers, it starts firing election timeouts, and it turns out that we create a new thread for each such election timeout (since we do not want to run the election in the timer thread).
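A minimal sketch of one possible mitigation, assuming election timeouts can be keyed by tablet ID: coalesce them so a tablet never has more than one election task queued or running at a time. `ElectionCoalescer`, `MaybeSchedule`, and the submitter callback (a stand-in for whatever pool actually runs the election) are hypothetical names, not part of the YugabyteDB code base.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical sketch: the timer thread only calls MaybeSchedule(); the
// election itself runs wherever `submit_` sends it. Repeated timeouts for a
// tablet whose election is still pending are dropped instead of creating
// more work. The coalescer must outlive every task it submits, and
// run_election is assumed not to throw.
class ElectionCoalescer {
 public:
  using Task = std::function<void()>;
  using Submitter = std::function<void(Task)>;

  explicit ElectionCoalescer(Submitter submit) : submit_(std::move(submit)) {}

  // Returns true if a new election task was submitted, false if one is
  // already pending for this tablet.
  bool MaybeSchedule(const std::string& tablet_id, Task run_election) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (!pending_.insert(tablet_id).second) {
        return false;  // An election for this tablet is already in flight.
      }
    }
    submit_([this, tablet_id, run = std::move(run_election)]() {
      run();
      std::lock_guard<std::mutex> lock(mu_);
      pending_.erase(tablet_id);  // Let the next timeout schedule again.
    });
    return true;
  }

 private:
  Submitter submit_;
  std::mutex mu_;
  std::unordered_set<std::string> pending_;
};
```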
The problem is that the thread pool used by the Raft token is unbounded in size, so if running elections do not keep up with the rate at which new timeouts fire, we can run into resource exhaustion.
https://github.com/YugaByte/yugabyte-db/blob/6262fc7a7e23c2533441b10be774a715d28f804f/src/yb/tserver/ts_tablet_manager.cc#L317
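To make the unbounded-pool concern concrete, here is a minimal sketch of the alternative: a pool with a fixed number of worker threads and a bounded queue that rejects work when saturated, rather than creating a new thread per task. `BoundedPool` and `TrySubmit` are hypothetical and do not reflect the API of the thread pool in the linked code.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded pool: all worker threads are created up front, and at
// most `max_queued` tasks may wait. TrySubmit() rejects work instead of
// spawning more threads when the queue is full.
class BoundedPool {
 public:
  BoundedPool(std::size_t num_threads, std::size_t max_queued)
      : max_queued_(max_queued) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Workers finish any tasks still queued, then exit and are joined.
  ~BoundedPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutting_down_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Returns false (and drops the task) if the queue is full or shutting down.
  bool TrySubmit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (shutting_down_ || queue_.size() >= max_queued_) return false;
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    return true;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return shutting_down_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down with nothing left to run.
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }

  const std::size_t max_queued_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool shutting_down_ = false;
  std::vector<std::thread> workers_;
};
```

The trade-off is that rejected election attempts must be retried later (for example on the next timeout), but the thread count is fixed at construction time, so a burst of timeouts cannot exhaust the process's thread limit.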
I0414 03:50:05.447449 15811 raft_consensus.cc:803] T c6d4a0a0d0e04e4db7a07ea5e0b2cff0 P 07c4d6051edb487c85276c4309e7793b [term 34 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
I0414 03:50:05.452247 25149 raft_consensus.cc:803] T da25186fd14545868286ea6b205383b3 P 07c4d6051edb487c85276c4309e7793b [term 20 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
I0414 03:50:05.453394 25150 raft_consensus.cc:803] T 0b54d0832741467ba6021d16ed1a33f0 P 07c4d6051edb487c85276c4309e7793b [term 22 FOLLOWER]: ReportFailDetected: Starting NORMAL_ELECTION...
I0414 03:50:05.464104 16404 leader_election.cc:215] T a873e7f151d7436b91a04e7ceb085467 P 07c4d6051edb487c85276c4309e7793b [CANDIDATE]: Term 12 pre-election: Requesting vote from peer 7f1eb600612044bfa88cfda14f67fad3
I0414 03:50:05.464107 16886 raft_consensus.cc:481] T a873e7f151d7436b91a04e7ceb085467 P 07c4d6051edb487c85276c4309e7793b [term 11 FOLLOWER]: Fail of leader 7f1eb600612044bfa88cfda14f67fad3 detected. Triggering leader pre-election, mode=NORMAL_ELECTION
W0414 03:50:05.464138 473 threadpool.cc:482] Thread pool failed to create thread: Runtime error (yb/util/thread.cc:586): Could not create thread: Resource temporarily unavailable (error 11)