Bind Cluster sessions to single server nodes, and add naive load balancing #183

alexjpwalker · 2021-02-11T19:29:54Z

What is the goal of this PR?

Previously, each Cluster session lazily created Core sessions on demand as Transactions were opened, which delegated too much responsibility to the Transaction. Now, each Cluster session is bound to a single server node, which may be a primary or a secondary node depending on the Options passed to the Session.

We also select secondary nodes in a fairer, more balanced way: the server indicates which node is its first preference for secondary sessions, and the client follows the server's choice, falling back to backup nodes only in the event of a failure.

What are the changes implemented in this PR?

- On creating a Cluster session, a Core session is immediately opened on a server node of the appropriate type.
- Node selection is based on the read_any_replica Option passed to the Session.
- When read_any_replica is true, the client first selects the node that the server has indicated as its "preferred secondary", falling back to backup nodes only in the event of a failure.
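The selection rule above can be sketched as an ordering over candidate replicas. This is a minimal illustration, not the client's code: the `Replica` dataclass, its `is_primary` / `is_preferred_secondary` fields, `candidate_replicas`, and the addresses are all hypothetical stand-ins for whatever `ReplicaInfo` actually exposes. It assumes that with read_any_replica false only the primary is eligible, and that "backup nodes" means every other replica, tried in the order the server listed them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    # Hypothetical stand-in for ReplicaInfo.Replica
    address: str
    is_primary: bool
    is_preferred_secondary: bool


def candidate_replicas(replicas: List[Replica], read_any_replica: bool) -> List[Replica]:
    """Return the replicas to try, in order, when binding a Cluster session."""
    if not read_any_replica:
        # Assumption: without read_any_replica, only the primary may serve the session
        return [r for r in replicas if r.is_primary]
    preferred = [r for r in replicas if r.is_preferred_secondary]
    backups = [r for r in replicas if not r.is_preferred_secondary]
    # Obey the server's preference first; the rest are fallbacks used only on failure
    return preferred + backups


# Hypothetical three-node cluster
replicas = [
    Replica("10.0.0.1:1729", is_primary=True, is_preferred_secondary=False),
    Replica("10.0.0.2:1729", is_primary=False, is_preferred_secondary=True),
    Replica("10.0.0.3:1729", is_primary=False, is_preferred_secondary=False),
]
print([r.address for r in candidate_replicas(replicas, read_any_replica=True)])
# ['10.0.0.2:1729', '10.0.0.1:1729', '10.0.0.3:1729']
```

Keeping the server's preference as the first entry (rather than picking randomly on the client) is what lets the server spread secondary sessions evenly across nodes.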

alexjpwalker · 2021-02-11T19:31:12Z

grakn/client.py

+        self.options = options
+
+    def run(self, replica: ReplicaInfo.Replica):
+        return SessionClusterRPC(self.client, replica.address(), self.database, self.session_type, self.options)


FailsafeTask is a new class that holds the failover/retry logic. It is an abstract base class, requiring that run is implemented, where run represents a unit of work that can fail and should be retried / failed over.
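A minimal sketch of the shape such an abstract base class could take, assuming the replicas to try are supplied as an ordered list and that `ConnectionError` marks a retryable failure. The `execute` method, the exception type, and the `Ping` example task are illustrative assumptions, not the PR's actual API; only the `run` / `rerun` split comes from the description in this PR.

```python
import abc
from typing import Any, Iterable


class FailsafeTask(abc.ABC):
    """Sketch: wrap a unit of work that may fail and should be failed over."""

    @abc.abstractmethod
    def run(self, replica: str) -> Any:
        """First attempt at the work, against the given replica."""

    def rerun(self, replica: str) -> Any:
        """Later attempt after a failure; by default just runs the work again."""
        return self.run(replica)

    def execute(self, replicas: Iterable[str]) -> Any:
        # Hypothetical driver: try each replica in order until one succeeds
        last_error = None
        for attempt, replica in enumerate(replicas):
            try:
                if attempt == 0:
                    return self.run(replica)
                return self.rerun(replica)
            except ConnectionError as error:
                # Assumption: connection failures are the retryable kind
                last_error = error
        raise RuntimeError("no replica could complete the task") from last_error


class Ping(FailsafeTask):
    """Example task that fails on nodes listed as down."""

    def __init__(self, down):
        self.down = set(down)
        self.attempts = []

    def run(self, replica):
        self.attempts.append(replica)
        if replica in self.down:
            raise ConnectionError(f"{replica} unreachable")
        return replica


task = Ping(down={"node-a"})
print(task.execute(["node-a", "node-b", "node-c"]))  # node-b
```

Subclasses only have to say what one attempt does; the ordering of replicas and the decision of when to give up stay in one place.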

grakn/rpc/cluster/failsafe_task.py

alexjpwalker · 2021-02-11T19:34:53Z

grakn/rpc/cluster/session.py

+            self.cluster_session.core_session.close()
+        self.cluster_session.core_client = self.cluster_session.cluster_client.core_client(replica.address())
+        self.cluster_session.core_session = self.cluster_session.core_client.session(self.database, self.cluster_session.session_type(), self.options)
+        return self.cluster_session.core_session.transaction(self.transaction_type, self.options)


In FailsafeTask, run is called on the first attempt, and rerun is called on every subsequent attempt if the first one fails.

In this case, if the first attempt fails, we need to tell the Cluster session to:

1. close its Core session;
2. switch to a different Core client (linked to a different server node);
3. try to open a new Core session through the new client;
4. try to open the transaction using this Core session instead of the original one.

This allows us to fail over the opening of Transactions, while enforcing that each Cluster session has, at any one time, precisely one Core session associated with it.
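The four failover steps can be sketched with in-memory stand-ins. `CoreClient`, `CoreSession`, `ClusterSession`, and `reopen_transaction` here are hypothetical simplifications (no networking, no database or options arguments); only the order of operations mirrors the `rerun` diff shown above. The point is that the old Core session is closed before the new one is assigned, so the Cluster session never holds two.

```python
from typing import Dict


class CoreSession:
    """Hypothetical stand-in for a Core session bound to one server node."""

    def __init__(self, address: str):
        self.address = address
        self.is_open = True

    def close(self) -> None:
        self.is_open = False

    def transaction(self, transaction_type: str) -> str:
        if not self.is_open:
            raise RuntimeError("session is closed")
        return f"{transaction_type} transaction on {self.address}"


class CoreClient:
    """Hypothetical stand-in for a Core client connected to one node."""

    def __init__(self, address: str):
        self.address = address

    def session(self) -> CoreSession:
        return CoreSession(self.address)


class ClusterSession:
    def __init__(self, address: str):
        self._core_clients: Dict[str, CoreClient] = {}
        self.core_client = self._core_client(address)
        self.core_session = self.core_client.session()

    def _core_client(self, address: str) -> CoreClient:
        # Reuse one Core client per node address
        if address not in self._core_clients:
            self._core_clients[address] = CoreClient(address)
        return self._core_clients[address]

    def reopen_transaction(self, address: str, transaction_type: str) -> str:
        # 1. Close the current Core session so at most one is ever open
        self.core_session.close()
        # 2. Switch to the Core client linked to a different server node
        self.core_client = self._core_client(address)
        # 3. Open a new Core session through that client
        self.core_session = self.core_client.session()
        # 4. Open the transaction on the new Core session
        return self.core_session.transaction(transaction_type)


session = ClusterSession("node-a")
old_session = session.core_session
print(session.reopen_transaction("node-b", "READ"))  # READ transaction on node-b
print(old_session.is_open)  # False
```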

grakn/rpc/cluster/replica_info.py

Bind Cluster sessions to nodes; add load balancing

8f60d2c

alexjpwalker added type: feature priority: high labels Feb 11, 2021

alexjpwalker added this to the 2.0.0 milestone Feb 11, 2021

alexjpwalker self-assigned this Feb 11, 2021

alexjpwalker requested review from flyingsilverfin and vmax as code owners February 11, 2021 19:29

grabl assigned vmax and flyingsilverfin Feb 11, 2021

alexjpwalker commented Feb 11, 2021


alexjpwalker requested a review from lolski February 11, 2021 19:40

grabl assigned lolski Feb 11, 2021

vmax reviewed Feb 12, 2021


grakn/rpc/cluster/replica_info.py (resolved)

Make client more helpful when connection fails

aba5a18

vmax approved these changes Feb 12, 2021


alexjpwalker merged commit 5594249 into typedb:master Feb 12, 2021

alexjpwalker deleted the session-first branch February 12, 2021 12:27

4 participants