2.27.0.0-b475

@arpit-saxena arpit-saxena tagged this 21 Aug 17:43
Summary:
This fixes a race condition in `od_router_attach` when warmup mode is enabled. Note that warmup mode
is enabled in Java tests due to settings in BaseYsqlConnMgr.java. In doing this, we also fix the
flakiness we have observed in the test `o.y.ysqlconnmgr.AuthSocketCloseWithLoad`.

Here are the details of the issue:
- The issue arises when `od_router_attach` has to create a new server and breaks out of its main
  loop. In warmup mode, it then tries to create new connections.
- In doing so, we create one or more new servers and add them to the server pool as IDLE. The last
  server created here is the one meant to be attached to the client.
- After that, we release the lock on `route`. At this point, another client can call
  `od_router_attach`, acquire the lock on the same `route`, and get allotted the same server
  that was about to be attached to the first client.

As a result, one server can be allotted to two clients running in two separate coroutines, most
likely on different threads. This leads to problems such as invalid memory accesses, since every
client coroutine assumes exclusive access to the server it has been allotted.
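
To make the interleaving concrete, here is a minimal single-threaded replay of the steps above. It
is only a sketch: `struct server`, `find_idle`, and `replay_race` are hypothetical names invented
for illustration, not the actual Odyssey code, and the lock is represented by comments only.

```c
#include <stddef.h>

/* Hypothetical, simplified server states; not the real Odyssey definitions. */
enum state { UNDEF, IDLE, ACTIVE };

struct server {
    enum state state;
    int owner; /* client id, -1 when unowned */
};

/* Return the first IDLE server in the pool, or NULL if there is none. */
static struct server *find_idle(struct server *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].state == IDLE)
            return &pool[i];
    }
    return NULL;
}

/* Replay the racy interleaving step by step.
 * Returns 1 if both clients end up holding the same server. */
int replay_race(void)
{
    struct server pool[2] = { { UNDEF, -1 }, { UNDEF, -1 } };

    /* Client A (lock held): warmup creates a server as IDLE and
     * remembers it as the one to attach later. */
    pool[0].state = IDLE;
    struct server *a = &pool[0];

    /* Client A releases the lock here. */

    /* Client B (lock held): sees an IDLE server in the pool and takes it. */
    struct server *b = find_idle(pool, 2);
    if (b != NULL) {
        b->state = ACTIVE;
        b->owner = 2;
    }

    /* Client B releases the lock; client A reacquires it and attaches the
     * server it remembered, unaware that B already owns it. */
    a->state = ACTIVE;
    a->owner = 1;

    return a == b;
}
```

In this replay both clients hold a pointer to the same server, which is the double allocation
described above.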

Here is how this issue is fixed:
- Instead of releasing the lock and then reacquiring it, we proceed directly to the attach phase
  while holding the lock.
- This ensures that no other client can acquire this server from the server pool in the meantime.
Jira: DB-18008
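
The shape of the fix can be sketched as follows, assuming a plain pthread mutex stands in for the
lock on `route`. All names here (`struct route`, `route_init`, `route_attach`, `take_idle_locked`)
are hypothetical and chosen for illustration. The actual change lives in `od_router_attach` and may
differ in detail.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real Odyssey definitions. */
enum server_state { SERVER_UNDEF, SERVER_IDLE, SERVER_ACTIVE };

struct server {
    enum server_state state;
    int owner; /* client id, -1 when unowned */
};

#define POOL_SIZE 8

struct route {
    pthread_mutex_t lock;
    struct server pool[POOL_SIZE];
};

void route_init(struct route *r)
{
    pthread_mutex_init(&r->lock, NULL);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        r->pool[i].state = SERVER_UNDEF;
        r->pool[i].owner = -1;
    }
}

/* Take the first IDLE server for client_id. Caller must hold r->lock. */
static struct server *take_idle_locked(struct route *r, int client_id)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (r->pool[i].state == SERVER_IDLE) {
            r->pool[i].state = SERVER_ACTIVE;
            r->pool[i].owner = client_id;
            return &r->pool[i];
        }
    }
    return NULL;
}

/* Attach a server to client_id. If no IDLE server exists, create up to
 * warmup_count servers as IDLE and attach the last one created, all
 * without releasing the lock in between. Returns NULL if the pool is full. */
struct server *route_attach(struct route *r, int client_id, int warmup_count)
{
    pthread_mutex_lock(&r->lock);

    struct server *s = take_idle_locked(r, client_id);
    if (s == NULL) {
        struct server *last = NULL;
        int created = 0;
        for (size_t i = 0; i < POOL_SIZE && created < warmup_count; i++) {
            if (r->pool[i].state == SERVER_UNDEF) {
                r->pool[i].state = SERVER_IDLE;
                last = &r->pool[i];
                created++;
            }
        }
        if (last != NULL) {
            /* Still under the lock, so no other client can have taken it. */
            assert(last->owner == -1);
            last->state = SERVER_ACTIVE;
            last->owner = client_id;
            s = last;
        }
    }

    pthread_mutex_unlock(&r->lock);
    return s;
}
```

Because the server leaves the IDLE state before the mutex is released, a second caller of
`route_attach` can only see the remaining IDLE servers, never the one already handed out.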

Test Plan:
Ran `AuthSocketCloseWithLoad` 150 times locally with an ASAN build and saw no failures. Before this
diff, we were seeing a failure rate of around 2-3%.

Jenkins: all tests

Reviewers: skumar, mkumar, asrinivasan, vikram.damle

Reviewed By: vikram.damle

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D46194