
Voldemort thin client


Why a Thin Client?

The current (thick) client used in Voldemort's client-side routing mode has a number of issues:

  • The cluster topology and store metadata have to be synchronized on the client side. This makes bootstrapping (the process by which the client obtains this metadata from a random server node) an expensive operation. In addition, it introduces the burden of fetching the latest changes or bouncing the client whenever the metadata changes.
  • Connection management is complicated, which appears to have operational and performance implications.
  • The existing client handles all the replication (including cross-zone replication), which is inefficient.
  • The client design is complex, with multiple levels of nesting.
  • There are far too many client-side configuration parameters (some of them confusing), which adds operational overhead.

Although some of these issues can be fixed, doing so requires rolling out a new release to all the clients. For instance, we recently implemented an auto-bootstrap mechanism that tracks changes to cluster.xml on the cluster and automatically re-bootstraps the client, which helps during cluster maintenance. However, to reap the benefits, every client using the cluster must pick up this new release, which is a very costly operation when a large number of clients are involved.

Architectural details

Isolating the thin client

The different layers of the thick client are broken down into two key pieces:

  1. Thin client
  2. Coordinator

1) Thin client:

The thin client is intended to be a RESTful library that can be incorporated into any application that wants to use Voldemort. Currently we are targeting a thin Java client only, but the beauty of this segregation is that building a RESTful client in any other language becomes very easy. The different layers in this component are listed below, followed by a small request sketch:

  1. Monitoring: For measuring the different metrics over time (e.g., JMX MBeans)
  2. Inconsistency resolver: The default (timestamp-based) or a user-defined resolution mechanism
  3. Serialization / de-serialization: To send / receive data on the wire
  4. Compression / de-compression
  5. HTTP client: To handle the request to and the response from the coordinator (HTTP based)
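
To make the flow concrete, here is a minimal sketch of a thin-client GET issued directly over HTTP. The URL layout (/store-name/key) and the timeout header name are assumptions made purely for illustration; the actual request and response contract is defined in the REST API document linked at the bottom of this page.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ThinClientSketch {

    // Hypothetical coordinator address; in practice this would sit behind a load balancer.
    private static final String COORDINATOR_URL = "http://coordinator.example.com:8080";

    /** Fetches the raw value bytes for a key, or returns null on a 404. */
    public static byte[] get(String storeName, String key) throws Exception {
        // Assumed URL layout: /<store-name>/<key> (see the REST API doc for the real contract).
        URL url = new URL(COORDINATOR_URL + "/" + storeName + "/" + key);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Per-request routing timeout passed as a header; the header name is an assumption.
        conn.setRequestProperty("X-VOLD-Request-Timeout-ms", "100");

        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) {
            return null;
        }
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] value = get("test-store", "hello");
        System.out.println(value == null ? "miss" : new String(value, StandardCharsets.UTF_8));
    }
}
```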

2) Coordinator:

The coordinator is a logical entity which accepts Voldemort requests and handles the main routing logic. It may be deployed as a set of machines in front of the cluster, or as part of each storage node. The coordinators are symmetric, in the sense that each coordinator can accept a request from any client intended for any store. The different layers in this component are:

  1. HTTP handler: To handle requests from and responses to the thin client
  2. Stat tracking store: Similar to the monitoring layer
  3. Routed store: To handle the actual Voldemort routing

Essentially, the coordinator maintains a set of thick clients for the different stores owned by this cluster. After receiving an HTTP request, it determines the target store name and hands the request off to the corresponding thick client.
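
As a rough sketch of this dispatch step, the coordinator can be thought of as a map from store name to an already-bootstrapped fat client. The class below is illustrative only; it uses a single shared factory and String keys/values for brevity, whereas the real design keeps one factory per store with serialization and compression turned off.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

public class CoordinatorDispatchSketch {

    // One fat client per store, created lazily and cached.
    private final Map<String, StoreClient<String, String>> fatClients = new ConcurrentHashMap<>();
    private final StoreClientFactory factory;

    public CoordinatorDispatchSketch(String bootstrapUrl) {
        this.factory = new SocketStoreClientFactory(new ClientConfig().setBootstrapUrls(bootstrapUrl));
    }

    /** Handles a GET that arrived over HTTP by handing it off to the store's fat client. */
    public String handleGet(String storeName, String key) {
        StoreClient<String, String> client =
                fatClients.computeIfAbsent(storeName, name -> factory.getStoreClient(name));
        return client.getValue(key);
    }
}
```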

Routing between the thin client and the coordinator

Routing requests from the thin client to the coordinator has two main challenges:

i) Load balancing
ii) Topology-transparent routing

i) Load balancing:

This is the easier of the two problems. Requests from the different clients should ideally be spread across the available set of coordinators in a round-robin manner. Any of the available (preferably software) load balancers would do.

ii) Topology-transparent routing

The challenge here is that the available set of coordinators might keep changing. We might add more coordinators (to the original set) in the future to handle increased traffic (related to cluster expansion as well). The thin clients should be able to detect such changes transparently, without the application knowing about them.

At LinkedIn, a proprietary solution addresses this open issue. Unfortunately, that solution has not been open sourced (yet). We will keep this page updated. In the meantime, the general assumption is that the load balancers would need to be re-initialized whenever the coordinator membership changes.

Architectural Overview

[Figure: Thin client high-level overview]

Coordinator Design

[Figure: Voldemort coordinator design]

The coordinator has the following components:

Fat client pool

The coordinator needs to maintain one fat client per store to satisfy requests. Here are the general design guidelines (a sketch of the resulting pool follows the list):

  • One fat client per factory per store, with the compression and serialization layers turned off: this gives us isolation in the connection pool, selector threads and failure detector characteristics.
  • We should go with the default store client (and not ZenStoreClient): we do not need an auto-bootstrapper per fat client (very expensive).
  • The fat client needs to be modified to accept a (routing) timeout value per request: this is necessary since it is very expensive to configure this statically per factory (per store).
  • The fat client's bootstrap call also needs to be modified to accept a specific cluster.xml and stores.xml (this is useful during auto-bootstrap).
  • We also need another fat client per factory per store with the inconsistency resolver turned off (in addition to the compression and serialization layers), for REST requests whose X-VOLD-Inconsistency-Resolver header specifies a custom, application-specific resolver. The result from this fat client will therefore contain all the versions of the given key and will be returned to the thin client (and hence to the application) for custom resolution.
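
A minimal sketch of how this pool might be organized is shown below. FatClient, StoreEntry and the routingTimeoutMs parameter are names invented for this example; they are not the actual Voldemort classes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the per-store fat-client pool described above (illustrative names only). */
public class FatClientPoolSketch {

    /** Minimal stand-in for a raw fat client (serialization and compression disabled). */
    interface FatClient {
        /** Returns all stored versions of a key, honouring a per-request routing timeout. */
        List<byte[]> get(byte[] key, long routingTimeoutMs);
    }

    /** The two fat clients kept per store. */
    static final class StoreEntry {
        final FatClient resolving;     // default (timestamp-based) resolution applied
        final FatClient nonResolving;  // resolver off: all versions returned to the thin client

        StoreEntry(FatClient resolving, FatClient nonResolving) {
            this.resolving = resolving;
            this.nonResolving = nonResolving;
        }
    }

    private final Map<String, StoreEntry> pool = new ConcurrentHashMap<>();

    /** Picks the right fat client based on whether the request asked for custom resolution. */
    List<byte[]> get(String store, byte[] key, long timeoutMs, boolean customResolverRequested) {
        StoreEntry entry = pool.get(store);
        FatClient client = customResolverRequested ? entry.nonResolving : entry.resolving;
        return client.get(key, timeoutMs);
    }
}
```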

Asynchronous Metadata Manager thread

Initially this thread was started as part of the ZenStoreClient. However, running this thread in each fat client is (a) wasteful and (b) unnecessary, since they are all monitoring the same cluster.xml and stores.xml.

Instead, we need a single thread per coordinator (sketched after the list) that will:

  • Keep checking for updates to cluster.xml: in case of a change, re-bootstrap all the fat clients and provide this cluster.xml to them (a lightweight bootstrap).
  • Similarly, keep checking for updates to stores.xml, but only for stores using avro-generic-versioned. This will internally use the new system store client functionality to retrieve the metadata versions.
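
Below is a sketch of such a coordinator-wide checker using a scheduled executor. MetadataStore and rebootstrapAllFatClients are placeholders for whatever the coordinator actually uses to fetch metadata versions and to refresh its fat clients.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a single, coordinator-wide asynchronous metadata checker (illustrative names only). */
public class MetadataManagerSketch {

    interface MetadataStore {
        long clusterXmlVersion();   // e.g. obtained via the system-store metadata versions
        long storesXmlVersion();
    }

    private final MetadataStore metadata;
    private final Runnable rebootstrapAllFatClients;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private long lastClusterVersion = -1;
    private long lastStoresVersion = -1;

    MetadataManagerSketch(MetadataStore metadata, Runnable rebootstrapAllFatClients) {
        this.metadata = metadata;
        this.rebootstrapAllFatClients = rebootstrapAllFatClients;
    }

    /** Starts the periodic check with the configured interval. */
    void start(long checkIntervalSeconds) {
        scheduler.scheduleWithFixedDelay(this::checkOnce, 0, checkIntervalSeconds, TimeUnit.SECONDS);
    }

    private void checkOnce() {
        long cluster = metadata.clusterXmlVersion();
        long stores = metadata.storesXmlVersion();
        if (cluster != lastClusterVersion || stores != lastStoresVersion) {
            // A change was detected: re-bootstrap every fat client once, passing the freshly
            // fetched cluster.xml / stores.xml (the lightweight bootstrap mentioned above).
            rebootstrapAllFatClients.run();
            lastClusterVersion = cluster;
            lastStoresVersion = stores;
        }
    }
}
```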

Management API

We need some additional management functionality (a sketch of this interface follows the list):

  • Creating a new factory and a fat client on the fly (for instance, when a new store is created).
  • Bouncing a particular fat client, possibly for a static config change or other reasons.
  • Viewing the current cluster.xml and stores.xml (and any other state / config).
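
As a sketch, the management surface could look like the Java interface below. The interface and method names are illustrative only; the real coordinator could equally expose these operations over JMX or an admin REST endpoint.

```java
/** Illustrative management interface for the coordinator (names are assumptions, not actual code). */
public interface CoordinatorAdminSketch {

    /** Create a new factory plus fat client for a store added after startup. */
    void addFatClient(String storeName);

    /** Close and re-create the fat client for one store (e.g. after a static config change). */
    void bounceFatClient(String storeName);

    /** Return the cluster.xml currently cached by the coordinator. */
    String getClusterXml();

    /** Return the stores.xml currently cached by the coordinator. */
    String getStoresXml();
}
```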

Metadata

The coordinator needs to handle the following pieces of metadata:

i) Stores.xml and cluster.xml

When the coordinator starts, it needs to bootstrap, i.e. fetch stores.xml and cluster.xml from the bootstrap URL specified in the coordinator config.

ii) Fat client config

Different stores have different workloads and need store-specific tuning on the fat client side with respect to the following (an example follows the list):

  • Connection Pool
  • Failure Detector sensitivity (for an ideal FD)
  • Bootstrap URL (although this will be the same for all fat clients)
  • Socket buffer size (though it is not obvious how this will be used)
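
For illustration, here is how such store-specific tuning might look using ClientConfig. The values are arbitrary, and the exact setter names may differ across Voldemort versions.

```java
import java.util.concurrent.TimeUnit;

import voldemort.client.ClientConfig;

public class FatClientConfigSketch {

    /** Example of store-specific fat client tuning; the values are arbitrary. */
    public static ClientConfig configForStore(String bootstrapUrl) {
        return new ClientConfig()
                .setBootstrapUrls(bootstrapUrl)                     // same URL for every fat client
                .setMaxConnectionsPerNode(20)                       // connection pool sizing
                .setConnectionTimeout(500, TimeUnit.MILLISECONDS)   // connection checkout / establishment timeout
                .setSocketBufferSize(64 * 1024);                    // socket buffer size
    }
}
```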

iii) Coordinator config

The coordinator itself has some configurable properties (a sample configuration follows the list):

  • Bootstrap URL (Same as that of fat clients)
  • Number of threads (for handling the REST requests)
  • Netty config
  • Async metadata checker properties (frequency)
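
An illustrative coordinator configuration is sketched below; the property keys are assumptions made for this example, not an actual coordinator config format.

```java
import java.util.Properties;

public class CoordinatorConfigSketch {

    /** Illustrative defaults; the property keys are assumptions made for this sketch. */
    public static Properties defaults() {
        Properties props = new Properties();
        props.setProperty("bootstrap_urls", "tcp://voldemort-node1:6666"); // same as the fat clients
        props.setProperty("num_rest_handler_threads", "16");               // REST request handling
        props.setProperty("netty_server_port", "8080");                    // Netty listener
        props.setProperty("metadata_check_interval_ms", "5000");           // async metadata checker frequency
        return props;
    }
}
```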

Voldemort REST API documentation:

Voldemort-rest-api.pdf