This repository was archived by the owner on Oct 9, 2023. It is now read-only.

@alexjpwalker
Member

What is the goal of this PR?

Previously, each Cluster session lazily created Core sessions on demand as Transactions were opened, which delegated too much responsibility to the Transaction. Now, each Cluster session is bound to a single server node, which may be a primary or a secondary node depending on the Options passed to the Session.

We also select secondary nodes in a fairer, more balanced way; the server indicates which node is its first preference for secondary sessions, and the client tries to obey the server's decision, with fallback to backup nodes only in the event of a failure.

What are the changes implemented in this PR?

  • On creating a Cluster session, a Core session is immediately opened against a server node of the appropriate type
  • Node selection is based on the read_any_replica Option passed to the Session
  • When read_any_replica is true, the client will first select nodes that the server has indicated are "preferred secondary", with fallback to backup nodes only in the event of a failure
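
The selection order described above can be sketched roughly as follows. This is an illustration only, not the client's actual code: the `Replica` dataclass, its `is_preferred_secondary` flag, and the `candidate_replicas` helper are hypothetical names, and treating every non-preferred node as a "backup" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for the client's replica metadata."""
    address: str
    is_primary: bool
    is_preferred_secondary: bool


def candidate_replicas(replicas: List[Replica], read_any_replica: bool) -> List[Replica]:
    """Return replicas in the order the client would try them (assumed policy)."""
    if not read_any_replica:
        # Without read_any_replica, only a primary node is acceptable.
        return [r for r in replicas if r.is_primary]
    # With read_any_replica, try the server's preferred secondary first,
    # then fall back to the remaining nodes (assumed here to be the backups).
    preferred = [r for r in replicas if r.is_preferred_secondary]
    backups = [r for r in replicas if not r.is_preferred_secondary]
    return preferred + backups
```

A session would then bind to the first candidate that accepts the connection, moving down the list only on failure.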

self.options = options

def run(self, replica: ReplicaInfo.Replica):
    return SessionClusterRPC(self.client, replica.address(), self.database, self.session_type, self.options)
Member Author

FailsafeTask is a new class that holds the failover/retry logic. It is an abstract base class: subclasses must implement run, which represents a unit of work that can fail and should then be retried or failed over to another node.
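
For readers unfamiliar with the pattern, here is a minimal sketch of what such an abstract task could look like. It is not the class from this PR: the `ReplicaUnavailable` exception, the `run_with_failover` driver, the plain-string replica addresses, and the default `rerun` that simply delegates to `run` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class ReplicaUnavailable(Exception):
    """Hypothetical error raised when a replica cannot serve the request."""


class FailsafeTask(ABC):
    """Sketch of a retry/failover wrapper around one unit of work."""

    def __init__(self, replicas: List[str]):
        self.replicas = replicas  # addresses, in the order they should be tried

    @abstractmethod
    def run(self, replica: str):
        """The unit of work; may raise ReplicaUnavailable."""

    def rerun(self, replica: str):
        # Assumed default: a retry is the same work against another replica.
        return self.run(replica)

    def run_with_failover(self):
        last_error = None
        for attempt, replica in enumerate(self.replicas):
            try:
                # First attempt uses run; later attempts use rerun.
                return self.run(replica) if attempt == 0 else self.rerun(replica)
            except ReplicaUnavailable as error:
                last_error = error
        raise ReplicaUnavailable("all replicas failed") from last_error


class OpenSession(FailsafeTask):
    """Toy subclass: 'opening a session' fails on one hard-coded address."""

    def run(self, replica: str):
        if replica == "down:1729":
            raise ReplicaUnavailable(replica)
        return f"session on {replica}"


result = OpenSession(["down:1729", "up:1729"]).run_with_failover()
```

Keeping the loop in the base class means each subclass only describes the work itself, which matches the intent stated in the comment above.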

self.cluster_session.core_session.close()
self.cluster_session.core_client = self.cluster_session.cluster_client.core_client(replica.address())
self.cluster_session.core_session = self.cluster_session.core_client.session(self.database, self.cluster_session.session_type(), self.options)
return self.cluster_session.core_session.transaction(self.transaction_type, self.options)
Member Author

In FailsafeTask, run is called on the first attempt; if that attempt fails, rerun is called on every subsequent attempt.

In this case, if the first run fails, we need to tell the Cluster session to:

  1. close its Core session
  2. switch to a different Core client (linked to a different server node)
  3. try to open a new Core session in the new client
  4. try to open the transaction using this Core session, instead of the original one.

This allows us to fail over the opening of Transactions, while enforcing that each Cluster session has, at any one time, precisely one Core session associated with it.
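
The four steps can be exercised end to end with toy in-memory objects. Everything below is a hypothetical stand-in (the `CoreSession`, `CoreClient`, and `ClusterSession` classes and the `reopen_transaction` function are not the client's real types); the sketch only shows the ordering of the steps and the one-session-at-a-time invariant.

```python
class CoreSession:
    """Toy session that only tracks whether it is open."""

    def __init__(self, address: str):
        self.address = address
        self.open = True

    def close(self):
        self.open = False

    def transaction(self, transaction_type: str):
        # A real client would open an RPC stream; here we return a marker.
        return (self.address, transaction_type)


class CoreClient:
    """Toy client bound to a single server address."""

    def __init__(self, address: str):
        self.address = address

    def session(self) -> CoreSession:
        return CoreSession(self.address)


class ClusterSession:
    """Toy cluster session holding exactly one Core session at a time."""

    def __init__(self, addresses, first: str):
        self.core_clients = {a: CoreClient(a) for a in addresses}
        self.core_client = self.core_clients[first]
        self.core_session = self.core_client.session()


def reopen_transaction(cluster_session: ClusterSession, address: str, transaction_type: str):
    cluster_session.core_session.close()                                  # step 1
    cluster_session.core_client = cluster_session.core_clients[address]   # step 2
    cluster_session.core_session = cluster_session.core_client.session()  # step 3
    return cluster_session.core_session.transaction(transaction_type)     # step 4


cluster_session = ClusterSession(["a:1729", "b:1729"], first="a:1729")
original = cluster_session.core_session
tx = reopen_transaction(cluster_session, "b:1729", "read")
```

Because the old session is closed before the new one is assigned, the Cluster session never references two open Core sessions at once.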

@alexjpwalker alexjpwalker merged commit 5594249 into typedb:master Feb 12, 2021
@alexjpwalker alexjpwalker deleted the session-first branch February 12, 2021 12:27