RFC: VTAdmin Architecture #7117
Comments
> Beautifully written! I can't wait to see the "hello world" demo 🍿
> VTAdmin is well underway, so I'm going to close out this ticket. ❤️ Our roadmap going forward is here: https://github.com/vitessio/vitess/projects/12
🪐 VTAdmin Architecture RFC
Authors
This RFC introduces VTAdmin, a new web interface for managing multi-cluster Vitess deployments in a single browser tab. Long-term, we intend VTAdmin as a replacement for the current Vitess Admin interface.
Readers should be familiar with Vitess’s current architecture and the Vitess Admin interface.
Background
Today, operators manage Vitess deployments using the vtctld interface, also known as the Vitess Admin interface. Use cases include browsing a cluster's topology and inspecting VReplication workflows, as shown below.
Here's a look at the current Vitess Admin topology browser:

...And here's the Workflows interface:

1 Topology Service = 1 browser tab
`vtctld` servers host both the Vitess Admin interface and the HTTP API it uses. The `vtctld`'s API endpoints in turn issue requests against a Topology Service. The Vitess documentation shows the architecture of a standard, single-cluster Vitess deployment:

Most Vitess deployments have a single Topology Service and a single Vitess Admin interface. We call these single-cluster deployments. Large Vitess deployments can be multi-cluster, consisting of multiple Topology Services.
Multi-cluster deployments are useful for data isolation, since reads and writes never cross the boundary of a single topology. Operators can deploy separate clusters called `prod`, `dev`, and `qa`, for example, or silo data across several geographic regions to comply with data residency policies.

Since every Topology Service requires its own Vitess Admin interface, operators managing a multi-cluster deployment wind up with a lot of browser tabs. If we try to rewrite Vitess Admin to unify multiple Topology Services in a single tab, we encounter two big problems.
First, we'd need to significantly rearchitect the `vtctld` server to multiplex requests across several Topology Services instead of just one. A rewrite of this magnitude complicates the API and introduces a large surface area for dangerous bugs. A `vtctld` could, for example, write to the wrong Topology Service and accidentally delete a keyspace in the production cluster instead of in dev.

Second, we'd need to rearchitect the front-end, which brings us to…
Making front-end changes is really hard
The front-end codebase is largely unchanged since it was first built four years ago. Without regular upkeep, even simple changes are now prohibitively difficult.
Developers will encounter a toolchain circa 2017—Node v8 (current is v15), for example, and Angular 2.0.0-alpha.7-2 (current is Angular 11.0.2)—as well as hundreds of security warnings. To update our dependencies, we need to navigate four years’ worth of major version migrations without breaking the existing UI.
`npm outdated` shows the diff between current and latest versions, and `npm audit` lists hundreds of vulnerabilities “requiring manual review”.

Building (another) brand-new UI is a spicy proposal, but we see several compelling reasons to rearchitect VTAdmin from the ground up:
Vitess Admin is hosted on and queries data from the `vtctld` hosts. Given that rearchitecting `vtctld`s to be multi-cluster is out of the question, Vitess Admin would need to be deployed independently before it could support multi-cluster queries. Rearchitecting the front-end on top of an unstable, outdated toolchain is intractable.

Updating Vitess Admin's dependencies would take several months, which would bring the current interface up to date but would not bring us closer to multi-cluster support (or any other new features). Starting over, on the other hand, lets us lean on industry standards (like create-react-app) to obviate tedious scaffolding work: the build toolchain, linting and formatting, test frameworks, and so on. Our codebase will be simpler, more approachable, and more harmonious than it would be otherwise. And, best of all, we can spend our time building the parts that are unique to VTAdmin.
Since Vitess Admin is used in production, “move fast and break things” isn’t an option. Whether upgrading dependencies or adding multi-cluster support, we’d need to be exceedingly careful to maintain backwards compatibility. In writing VTAdmin as a separate component, we isolate it from the rest of the Vitess codebase and allow for opt-in deploys.
Finally, and most optimistically, we have an opportunity to revisit our design decisions from four years ago with fresh eyes and (re)set the stage for exciting improvements to come.
Proposal
We propose a new UI and API, together called VTAdmin, to replace the current Vitess Admin interface. We aim to:

- Manage multiple Vitess clusters from a single browser tab
- Build on a modern, well-maintained front-end toolchain
- Keep adoption strictly opt-in while Vitess Admin remains available
Over the next three months, we will build two new components:
- `vtadmin-web`, a front-end for managing a Vitess deployment, starting with a UI for managing VReplication streams
- `vtadmin-api`, an HTTP API that aggregates data across multiple clusters

Architecture
VTAdmin is deployed as two independently-hosted components, `vtadmin-web` and `vtadmin-api`, which are mapped to one or more clusters.

If a Vitess deployment has only one cluster, VTAdmin and Vitess Admin have similar architectures: one Topology Service, one interface. VTAdmin's utility really shines in multi-cluster deployments, consolidating information across n clusters in a single browser tab instead of n Vitess Admin interfaces.
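As a rough sketch of that mapping (the `Cluster` and `API` types below are our illustration, not a committed design), `vtadmin-api` might hold one handle per configured cluster and fan requests out across all of them:

```go
package vtadmin

// Cluster identifies one Vitess cluster (one Topology Service) that
// vtadmin-api is configured to reach.
type Cluster struct {
	ID   string // e.g. "prod", "dev", or "qa"
	Name string // human-readable display name
}

// API serves the VTAdmin HTTP API. It holds one Cluster per configured
// cluster and consolidates their data behind a single set of endpoints.
type API struct {
	clusters []Cluster
}
```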
Service discovery
VTAdmin requires a new service discovery layer in order to multiplex its requests to VTGates and `vtctld`s across multiple clusters.

Like the Vitess Topology Service, VTAdmin will support pluggable service discovery backends, allowing users to use their existing service discovery infrastructure. Initially, we will implement service discovery for at least Consul, while leaving room for other backends like Zookeeper, etcd, and static host lists.
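To make the pluggability concrete, here is a minimal sketch of what a backend interface might look like in Go; the `Discovery` name and its method signatures are hypothetical, not a committed API:

```go
package discovery

import "context"

// Discovery resolves the addresses of Vitess components within a single
// cluster. Each backend (Consul, Zookeeper, etcd, a static host list)
// would provide its own implementation of this interface.
type Discovery interface {
	// DiscoverVTGateAddr returns the address of a VTGate in the cluster.
	DiscoverVTGateAddr(ctx context.Context) (string, error)
	// DiscoverVtctldAddr returns the address of a vtctld in the cluster.
	DiscoverVtctldAddr(ctx context.Context) (string, error)
}
```

Keeping the interface this small is what leaves room for backends we haven't planned yet.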
The VTAdmin service discovery layer will be discussed in more detail in a separate RFC. (When it's published, we'll update this RFC with the link.) :)
Non-Goals
The following things could reasonably be goals in the medium-to-long term, but we explicitly omit them from our goals for the next three months:
We won’t remove the Vitess Admin interface until VTAdmin has acceptable feature parity with Vitess Admin, which will take several months. Eventually, we will deprecate the Vitess Admin interface (and vtctld2 codebase) and remove it after a generous transition period. We enthusiastically support early adoption of VTAdmin and welcome all feedback so the transition is as easy as possible.
We won't add access control (ACLs)… yet. Adding ACLs to a piece of infrastructure that spans multiple clusters requires its own design, and consideration of how Vitess users manage identity across a set of clusters. Consequently, we will defer this until a later version.
We won’t issue write operations, since we consider ACLs a prerequisite. For example, VTAdmin’s VReplication UI will display VReplication streams but will not create or update them.
Putting it all together
Since we propose starting with a new VReplication UI, let’s look at how this would work in a multi-cluster deployment.
Let’s assume our Vitess deployment has three clusters:
`prod`, `dev`, and `qa`. One could also imagine clustering by geographic region, which means our clusters might look more like `us-east-1`, `ap-northeast-1`, and `eu-north-1`.

When deploying for the first time, a new `vtadmin` user is added to `grpc_auth_static_creds.json` to allow `vtadmin-api` to issue requests against VTGates and `vtctld`s:

```json
{
  "Username": "vtadmin",
  "Password": "********"
}
```

Deploying VTAdmin is then very straightforward and self-contained:
```sh
./vtadmin-api --grpc-user vtadmin --clusters "dev,prod,qa" --port 12345
./vtadmin-web --port 12346
```

At this point, we have a front-end at https://vtadmin-web.example.com, which makes requests against the API on https://vtadmin-api.example.com. Together, they unify data across all three Topology Services in a single browser tab. 🎉

Let's imagine our new VReplication UI in more detail:
https://vtadmin-web.example.com/vrep/streams renders a list of VReplication streams across all of the shards on `prod`, `dev`, and `qa` in a single browser tab.

The front-end makes an HTTP request to `GET https://vtadmin-api.example.com/api/vrep/streams`. An optional `cluster` parameter restricts the query to a subset of the clusters: `GET /api/vreplication/streams?cluster=prod`. This can be useful when latency is a concern, as with geographically distant clusters.

`vtadmin-api` uses service discovery to find the primary `vtctld` in each cluster, and issues a [VReplicationExec](https://vitess.io/docs/reference/features/vreplication/#vreplicationexec) query over gRPC across all of the keyspaces and shards in that cluster. (In practice, doing this performantly can be tricky and requires coordination between the front- and back-end to avoid too many gRPC requests at once.)
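To sketch how that fan-out could look, here is a hypothetical handler in Go. Everything in it (the `Stream` type, the `fetchStreams` helper, and the function shape) is illustrative rather than a committed implementation; it runs one query per cluster concurrently and merges the results:

```go
package vtadmin

import (
	"context"
	"sync"
)

// Stream is a placeholder for one VReplication stream record.
type Stream struct {
	Cluster  string
	Keyspace string
	Shard    string
	ID       int64
}

// fetchStreams stands in for the per-cluster work: discover the primary
// vtctld, run VReplicationExec over gRPC, and parse the rows. Its body is
// omitted here.
func fetchStreams(ctx context.Context, cluster string) ([]Stream, error) {
	return nil, nil
}

// getStreams fans the query out to every requested cluster concurrently
// and merges the results. A real implementation would also bound the
// number of in-flight gRPC requests, per the caveat above.
func getStreams(ctx context.Context, clusters []string) ([]Stream, error) {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		streams []Stream
		lastErr error
	)
	for _, c := range clusters {
		wg.Add(1)
		go func(cluster string) {
			defer wg.Done()
			result, err := fetchStreams(ctx, cluster)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				lastErr = err
				return
			}
			streams = append(streams, result...)
		}(c)
	}
	wg.Wait()
	return streams, lastErr
}
```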
…and making front-end changes is really easy (really!)

So far, we've spent a lot of time on the "n browser tabs" problem, and no time at all on the nuts-and-bolts, should-we-use-React-or-Angular decisions that come along with building a new front-end.
This discussion is nuanced enough to deserve its very own RFC and "hello world" demo branch. (When it's published, we'll update this RFC with the link.)
(Spoiler: we should use React.) 😈
🙇‍♀️ Thanks for reading!