RFC: VTAdmin Architecture #7117

Closed
doeg opened this issue Dec 4, 2020 · 2 comments
Labels
Component: VTAdmin VTadmin interface Type: RFC Request For Comment

doeg commented Dec 4, 2020

🪐 VTAdmin Architecture RFC

Authors

This RFC introduces VTAdmin, a new web interface for managing multi-cluster Vitess deployments in a single browser tab. Long-term, we intend VTAdmin as a replacement for the current Vitess Admin interface.

Readers should be familiar with Vitess’s current architecture and the Vitess Admin interface.

Background

Today, operators manage Vitess deployments using the vtctld interface, also known as the Vitess Admin interface. Use cases include:

  • Browsing the topology of a Vitess deployment.
  • Viewing schemas and schema metadata.
  • Creating and managing VReplication Workflows.

Here's a look at the current Vitess Admin topology browser:
[Screenshot: the Vitess Admin topology browser]

...And here's the Workflows interface:
[Screenshot: the Vitess Admin Workflows interface]

1 Topology Service = 1 browser tab

vtctld servers host both the Vitess Admin interface and the HTTP API it uses. The vtctld's API endpoints in turn issue requests against a Topology Service.

The Vitess documentation shows the architecture of a standard, single-cluster Vitess deployment:
The architecture of a (single-cluster) Vitess deployment, from the Vitess documentation.

Most Vitess deployments have a single Topology Service and a single Vitess Admin interface. We call these single-cluster deployments. Large Vitess deployments can be multi-cluster, consisting of multiple Topology Services.

Multi-cluster deployments are useful for data isolation, since reads and writes never exceed the boundary of a single topology. Operators can deploy separate clusters called prod, dev, and qa, for example, or silo data across several geographic regions to comply with data residency policies.

Since every Topology Service requires its own Vitess Admin interface, operators managing a multi-cluster deployment wind up with a lot of browser tabs. If we try to rewrite Vitess Admin to unify multiple Topology Services in a single tab, we encounter two big problems.

First, we’d need to significantly rearchitect the vtctld server to multiplex requests across several Topology Services instead of just one. A rewrite of this magnitude complicates the API and introduces a large surface area for dangerous bugs. A vtctld could, for example, write to the wrong Topology Service and accidentally delete a keyspace in the production cluster instead of in dev.

Second, we’d need to rearchitect the front-end, which brings us to…

Making front-end changes is really hard

The front-end codebase is largely unchanged since it was first built four years ago. Without regular upkeep, even simple changes are now prohibitively difficult.

Developers will encounter a toolchain circa 2017—Node v8 (current is v15), for example, and Angular 2.0.0-alpha.7-2 (current is Angular 11.0.2)—as well as hundreds of security warnings. To update our dependencies, we need to navigate four years’ worth of major version migrations without breaking the existing UI.

npm outdated shows the diff between current and latest versions:
[Screenshot: npm outdated output]

npm audit lists hundreds of vulnerabilities “requiring manual review”:
[Screenshot: npm audit output]

Building (another) brand new UI is a spicy proposal, but we see several compelling reasons to rearchitect VTAdmin from the ground up:

  • Vitess Admin is hosted on and queries data from the vtctld hosts. Given that rearchitecting vtctlds to be multi-cluster is out of the question, Vitess Admin would need to be deployed independently before it could support multi-cluster queries. Rearchitecting the front-end on top of an unstable, outdated toolchain is intractable.

  • Updating Vitess Admin’s dependencies will take several months, which would bring the current interface up to date but would not bring us closer to multi-cluster support (or any other new features). On the other hand, starting over lets us lean on industry standards (like create-react-app) to obviate tedious scaffolding work like the build toolchain, linting/formatting, test frameworks, and so on. Our codebase will be simpler, more approachable, and more harmonious than it would be otherwise. And, best of all, we can spend our time on building the parts that are unique to VTAdmin.

  • Since Vitess Admin is used in production, “move fast and break things” isn’t an option. Whether upgrading dependencies or adding multi-cluster support, we’d need to be exceedingly careful to maintain backwards compatibility. In writing VTAdmin as a separate component, we isolate it from the rest of the Vitess codebase and allow for opt-in deploys.

  • Finally, and most optimistically, we have an opportunity to revisit our design decisions from four years ago with fresh eyes and (re)set the stage for exciting improvements to come.

Proposal

We propose a new UI and API, together called VTAdmin, to replace the current Vitess Admin interface. We aim to:

  • Architect VTAdmin to work with multi-cluster Vitess deployments.
  • Establish VTAdmin as the next generation of Vitess front-end tooling.

Over the next three months, we will build two new components:

  • vtadmin-web, a front-end for managing a Vitess deployment, starting with a UI for managing VReplication streams
  • vtadmin-api, an HTTP API that aggregates data across multiple clusters.

Architecture

VTAdmin is deployed as two independently-hosted components, vtadmin-web and vtadmin-api, which are mapped to one or more clusters.

If a Vitess deployment has only one cluster, VTAdmin and Vitess Admin have similar architecture: one Topology Service, one interface. VTAdmin’s utility really shines in multi-cluster deployments, consolidating information across n clusters in a single browser tab instead of n Vitess Admin interfaces.

A multi-cluster Vitess deployment

Service discovery

VTAdmin requires a new service discovery layer in order to multiplex its requests to VTGates and vtctlds across multiple clusters.

Like the Vitess Topology Service, VTAdmin will support pluggable service discovery backends, allowing users to use their existing service discovery infrastructure. Initially, we will implement service discovery for at least Consul, while leaving room for other backends like Zookeeper, etcd, and static host lists.
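To make the "pluggable backends" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not VTAdmin's actual design: the names `Endpoint`, `Discovery`, `StaticDiscovery`, `BACKENDS`, and `new_discovery` are all hypothetical, and only the static host list backend is shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Endpoint:
    """Address of one VTGate or vtctld (hypothetical shape)."""
    cluster: str
    host: str
    port: int


class Discovery(ABC):
    """Hypothetical interface that every discovery backend would implement."""

    @abstractmethod
    def discover_vtctlds(self, cluster: str) -> List[Endpoint]:
        """Return the vtctld endpoints known for one cluster."""

    @abstractmethod
    def discover_vtgates(self, cluster: str) -> List[Endpoint]:
        """Return the VTGate endpoints known for one cluster."""


class StaticDiscovery(Discovery):
    """Backend that serves endpoints from a fixed, in-memory host list."""

    def __init__(self, vtctlds: List[Endpoint], vtgates: List[Endpoint]) -> None:
        self._vtctlds = list(vtctlds)
        self._vtgates = list(vtgates)

    def discover_vtctlds(self, cluster: str) -> List[Endpoint]:
        return [e for e in self._vtctlds if e.cluster == cluster]

    def discover_vtgates(self, cluster: str) -> List[Endpoint]:
        return [e for e in self._vtgates if e.cluster == cluster]


# Registry mapping a backend name to its constructor. A Consul or etcd
# backend would be registered here as well (assumption: none is shown).
BACKENDS: Dict[str, Callable[..., Discovery]] = {"static": StaticDiscovery}


def new_discovery(name: str, **kwargs) -> Discovery:
    """Build the named backend, or raise ValueError if it is not registered."""
    try:
        factory = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown discovery backend: {name!r}") from None
    return factory(**kwargs)
```

Callers would depend only on the `Discovery` interface, so swapping a static host list for another backend would not change the code that issues requests.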

The VTAdmin service discovery layer will be discussed in more detail in a separate RFC. (When it's published, we'll update this RFC with the link.) :)

Non-Goals

The following things could reasonably be goals in the medium-to-long term, but we explicitly omit them from our goals for the next three months:

  • We won’t remove the Vitess Admin interface until VTAdmin has acceptable feature parity with Vitess Admin, which will take several months. Eventually, we will deprecate the Vitess Admin interface (and vtctld2 codebase) and remove it after a generous transition period. We enthusiastically support early adoption of VTAdmin and welcome all feedback so the transition is as easy as possible.

  • We won’t add access control (ACLs)… yet. Adding ACLs to a piece of infrastructure that spans multiple clusters requires its own design and consideration of how Vitess users manage identity across a set of clusters. Consequently, we will defer this until a later version.

  • We won’t issue write operations, since we consider ACLs a prerequisite. For example, VTAdmin’s VReplication UI will display VReplication streams but will not create or update them.

Putting it all together

Since we propose starting with a new VReplication UI, let’s look at how this would work in a multi-cluster deployment.

Let’s assume our Vitess deployment has three clusters: prod, dev, and qa. One could also imagine clustering by geographic region, which means our clusters might look more like us-east-1, ap-northeast-1, and eu-north-1.

One VTAdmin to rule them all

When deploying for the first time, a new vtadmin user is added to grpc_auth_static_creds.json to allow vtadmin-api to issue requests against VTGates and vtctlds:

    {
      "Username": "vtadmin",
      "Password": "********"
    }

Deploying VTAdmin is then very straightforward and self-contained:

    ./vtadmin-api --grpc-user vtadmin --clusters "dev,prod,qa" --port 12345
    ./vtadmin-web --port 12346

At this point, we have a front-end at https://vtadmin-web.example.com, which makes requests against the API on https://vtadmin-api.example.com. Together, they unify data across all three Topology Services in a single browser tab. 🎉

Let’s imagine our new VReplication UI in more detail:

  • https://vtadmin-web.example.com/vrep/streams renders a list of VReplication streams across all of the shards on prod, dev, and qa in a single browser tab.

  • The front-end makes an HTTP request to GET https://vtadmin-api.example.com/api/vrep/streams.

    • Alternatively, a cluster parameter can be used to restrict the query to a subset of the clusters: GET /api/vrep/streams?cluster=prod. This can be useful if latency is a concern, as with geographically distant clusters.
  • vtadmin-api uses service discovery to find the primary vtctld in each cluster, and issues a [VReplicationExec](https://vitess.io/docs/reference/features/vreplication/#vreplicationexec) query over gRPC across all of the keyspaces and shards in that cluster. (In practice, doing this performantly can be tricky and requires coordination between the front- and back-end to avoid too many gRPC requests at once.)
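The request flow above can be sketched as a bounded fan-out. This is a rough Python illustration under stated assumptions, not the real vtadmin-api: `parse_clusters` and `get_streams` are hypothetical names, and the `fetch` callable is a stand-in for the per-cluster gRPC call, which is not implemented here. A small thread pool caps how many clusters are queried at once, which is one simple way to avoid too many simultaneous requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def parse_clusters(flag_value: str) -> List[str]:
    """Split a comma-separated list such as "dev,prod,qa" into cluster names."""
    return [name.strip() for name in flag_value.split(",") if name.strip()]


def get_streams(
    clusters: List[str],
    fetch: Callable[[str], List[str]],
    only: Optional[List[str]] = None,
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Query each selected cluster and return {cluster: streams}.

    `fetch` stands in for the per-cluster gRPC request (assumption).
    `only` mirrors the optional cluster query parameter.
    `max_workers` bounds how many clusters are queried concurrently.
    """
    if only is not None:
        unknown = sorted(set(only) - set(clusters))
        if unknown:
            raise ValueError(f"unknown clusters: {unknown}")
    selected = [c for c in clusters if only is None or c in only]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `selected`.
        results = pool.map(fetch, selected)
        return dict(zip(selected, results))
```

In this sketch a failure in any one cluster raises out of `get_streams`. A real API would more likely want to return partial results alongside per-cluster errors, which is a design decision this example leaves open.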

…and making front-end changes is really easy (really!)

So far, we’ve spent a lot of time on the “n browser tabs” problem, and no time at all on the nuts-and-bolts, should-we-use-React-or-Angular decisions that come along with building a new front-end.

This discussion is nuanced enough to deserve its very own RFC and "hello world" demo branch. (When it's published, we'll update this RFC with the link.)

(Spoiler: we should use React.) 😈

🙇‍♀️ Thanks for reading!

@doeg doeg added the Type: RFC Request For Comment label Dec 4, 2020

deepthi commented Dec 4, 2020

Beautifully written! I can't wait to see the "hello world" demo 🍿


doeg commented Mar 16, 2021

VTAdmin is well underway, so I'm going to close out this ticket. ❤️ Our roadmap going forward is here: https://github.com/vitessio/vitess/projects/12
