
Initial vtadmin-api, clusters, and service discovery #7187

Merged
merged 40 commits on Dec 23, 2020

Conversation


@ajm188 ajm188 commented Dec 15, 2020

Backport

NO

Status

READY

Description

This is the initial implementation of the vtadmin-api! It doesn't have everything, but this was the, uhh, smallest reasonable place to draw a line of "$thing that works and can do something useful". At a high level, this adds:

  1. a multiplexed gRPC/HTTP server, which vtadmin uses to serve both request types over the same port (see the sketch after this list for the general multiplexing idea). There's no precedent for something like this in Vitess, and to simplify the initial move (no dependency on servenv, global flags that prevent/complicate the use of cobra, etc), this lives in go/vt/vtadmin/grpcserver. I think it makes sense to leave this as-is for now, and figure out how to unify this with the servenv.GRPCServer later. Happy to hear disagreements, though!
  2. A proto definition of some of the vtadmin-api RPCs, with associated types.
  3. Introduction of a cluster abstraction and service discovery interface. More on this below.
  4. An implementation of the RPC interface using (3).
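
For context, here is a minimal sketch of one common way to multiplex gRPC and HTTP over a single port in Go, using github.com/soheilhy/cmux. This is only an illustration of the general idea, not necessarily how go/vt/vtadmin/grpcserver does it, and the port number is arbitrary:

package main

import (
    "net"
    "net/http"

    "github.com/soheilhy/cmux"
    "google.golang.org/grpc"
)

func main() {
    // Listen once; both servers share this port.
    lis, err := net.Listen("tcp", ":14200")
    if err != nil {
        panic(err)
    }

    mux := cmux.New(lis)

    // gRPC requests arrive as HTTP/2 with content-type application/grpc.
    grpcL := mux.Match(cmux.HTTP2HeaderField("content-type", "application/grpc"))
    // Everything else falls through to plain HTTP.
    httpL := mux.Match(cmux.Any())

    grpcServer := grpc.NewServer()
    httpServer := &http.Server{Handler: http.DefaultServeMux}

    go grpcServer.Serve(grpcL)
    go httpServer.Serve(httpL)

    // Serve blocks, dispatching each accepted connection to the first
    // matching listener.
    if err := mux.Serve(); err != nil {
        panic(err)
    }
}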

VTAdmin Clusters

A mini design doc in a PR!

Abstract

One of the bigger benefits of VTAdmin over the vtctld API / admin UI is that VTAdmin provides a single place to view and administer many Vitess clusters, rather than just a single one.

This document proposes designs for the following components for VTAdmin:

  • A cluster abstraction.
  • Cluster-based configuration, to support a wide-ranging set of customer needs.
  • Per-cluster service discovery.

Expected Use Cases

We expect most clusters to be either environments (e.g. dev/prod/qa) or geographic regions, whether AWS (us-east-1, eu-north-1, ap-northeast-1), GCP (us-west1, europe-north1, asia-northeast2), Azure (East US, Norway East, Japan East), or others. In theory, though, clusters can be any arbitrary way a Vitess user chooses to split up their keyspaces, so hard-coding a list of expected names (e.g. AWS regions) would prohibit many use cases.

Different clusters need different configurations, even for the same overall deployment. A small list of things that may differ per-cluster:

  • gRPC credentials for vtgates and vtctlds.
  • Different sets of gates to route to for admin queries (to prevent admin queries from taking resources from gates that power your application traffic).
  • Different discovery service configurations:
    • With consul as an example, you may have different datacenters that vtgates and vtctlds register themselves to.
    • Different service entry cache times for certain clusters, for example if they are geographically further from where your vtadmin deployment runs.
    • As above, possibly different credentials or authentication mechanisms to your service discovery backends.
    • You may, though it's unlikely, use completely different service discovery backends in different clusters. For example, use zookeeper in cluster1 but k8s in cluster2 because you're in the middle of a migration to Kubernetes.

Goal

Define the concept of a generalized cluster. Clusters should provide maximal flexibility in configuration, allowing VTAdmin users to tweak as much behavior as possible without making code changes or rebuilding the vtadmin-api binaries (so -ldflags-based solutions are out).

Non-Goal

Allow VTAdmin users to change the behavior of VTAdmin clusters at runtime. Though this is possible with the proposed design, it is sufficiently complex to warrant a separate design document. We should get a v1 landed and used in production first, and then revisit runtime configuration. It may also turn out it’s not a highly-desired feature.

Proposal

To support a generalized cluster in vtadmin-api, we propose two things:

  1. The definition of a cluster as (discovery service + vtctld interface + vtgate / db interface).
  2. Two DSN flags to support extremely flexible and arbitrary configuration of clusters.
    Additionally, allowing these configurations to be read from YAML files, to provide better ergonomics.

Cluster definition

Clusters will be created via configs (see vtadmin/cluster/config.go). VTAdmin can then maintain a mapping of map[string]*cluster.Cluster and delegate all logic to a specific cluster depending on the name parameter of a given function/http endpoint/gRPC endpoint/etc, and return an error if a cluster with that name was never configured.
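
As a rough illustration only (the type and method names below are hypothetical, not the PR's actual API surface), that delegation could look something like:

package vtadmin

import (
    "fmt"

    "vitess.io/vitess/go/vt/vtadmin/cluster"
)

// API holds one *cluster.Cluster per configured cluster name and
// delegates each request to the cluster named in that request.
type API struct {
    clusters map[string]*cluster.Cluster
}

// getCluster resolves a cluster by name, returning an error if no
// cluster with that name was ever configured.
func (api *API) getCluster(name string) (*cluster.Cluster, error) {
    c, ok := api.clusters[name]
    if !ok {
        return nil, fmt.Errorf("unsupported cluster: %s", name)
    }

    return c, nil
}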

During startup, vtadmin-api parses three flags: -cluster-config, a path to a YAML representation of cluster configs; -cluster-defaults, for setting options common to all or most of your clusters; and -cluster, a repeated flag for specifying per-cluster options and overriding global options. The levels of config are then merged according to the following precedence:

  1. Per-cluster configs set on the command line.
  2. Per-cluster configs set in the YAML file.
  3. Cluster defaults set on the command line.
  4. Cluster defaults set in the YAML file.

Finally, each fully-merged cluster config is used to produce a *cluster.Cluster, which the API uses.
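
As a sketch of that precedence (treating a config as a flat map of flag name to value for simplicity; the real Config type in vtadmin/cluster/config.go is richer), the merge could look like:

package vtadmin

// mergeConfigs merges config layers passed from highest to lowest
// precedence: later layers only fill in keys the earlier layers left
// unset.
func mergeConfigs(layers ...map[string]string) map[string]string {
    merged := map[string]string{}

    for _, layer := range layers {
        for key, value := range layer {
            if _, ok := merged[key]; !ok {
                merged[key] = value
            }
        }
    }

    return merged
}

// Usage, mirroring the precedence list above:
//   cfg := mergeConfigs(perClusterCLI, perClusterYAML, defaultsCLI, defaultsYAML)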

To facilitate flexible, independent configurations, the -cluster and -cluster-defaults flags take a DSN as their argument, loosely described as follows:

id= # Config.ID
name= # Config.Name
discovery= # Config.DiscoveryImpl
discovery-(?P<impl>[a-z][^ -]*)-(?P<flag>\w+)= # Config.DiscoveryFlagsByImpl[impl][flag]
vtsql-(?P<flag>\w+)= # Config.VtSQLFlags

That "discovery-" wildcard provides per-discovery-implementation level configuration. The second segment (going by - as the separator) should be the name of the corresponding discovery implementation (e.g. "``consul``", "``etcd2``"). This value groups the flags, with the discovery-${impl}- prefix removed. These flags are then passed along the given discovery implementation’s factory, which can parse those flags with its own implementation-specific flag.FlagSet.

Note: We also add vtsql-* flags similarly, and expect to add vtctl-* flags, depending on how those cluster-specific services may need to be configured, but discovery is the more complicated and interesting case, so it's the one we discuss here.

This leads to the following example usage (with a bash helper to make it look nice):

cluster_defaults=(
  discovery=consul
  discovery-consul-vtgate-datacenter-tmpl="dev-{{ .Name }}-mydatacenter-{{ .ID }}"
  discovery-consul-vtgate-addr-tmpl="{{ .Name }}.my.cool.website:15000"
)
# via https://stackoverflow.com/a/29637493
function join() {
  local IFS="$1"
  shift
  echo -n "$*"
}
./vtadmin \
  -cluster-defaults="$(join "," "${cluster_defaults[@]}")" \
  -cluster "name=cluster1,id=id1" \
  -cluster "name=cluster2,id=id2,discovery-consul-vtgate-datacenter-tmpl={{ .Name }}-prod-mydatacenter" \
  -cluster "name=cluster3,id=id3,discovery-consul-vtgate-addr-tmpl={{ .Name }}.internal.cool.website:15000"

This approach has some downsides, namely:

  • Validation of DiscoveryImpl and DiscoveryFlagsByImpl is deferred until discovery-creation time. This makes for a subpar parsing experience, in that technically invalid flags don't get caught during the initial flag.Parse() in vtadmin-api's entrypoint. This is also true of VtSQLFlags.
  • Discovery flag parsing and validation is entirely dependent on the specified implementation. This can potentially cause surprises. Therefore, the recommendation, which we cannot enforce, is that each discovery implementation’s factory define a flag.FlagSet and use that to parse the deferred flags.
    • Further, it’s possible to pass flags completely unrelated to a cluster’s discovery implementation (e.g., setting -cluster discovery=consul,discovery-etcd2-foo=bar) and have those silently ignored. This is in fact required to support the -cluster-defaults functionality, as different clusters may use different discovery implementations, and we need to support setting defaults for both implementations. In my opinion, this is fine, and just something to document and be aware of.

For completeness, the YAML representation of this config is:

defaults:
  discovery: consul
  discovery-consul-vtgate-datacenter-tmpl: "dev-{{ .Name }}-mydatacenter-{{ .ID }}"
  discovery-consul-vtgate-addr-tmpl: "{{ .Name }}.my.cool.website:15000"

clusters:
  id1:
    name: cluster1
  id2:
    name: cluster2
    discovery-consul-vtgate-datacenter-tmpl: "{{ .Name }}-prod-mydatacenter"
  id3:
    name: cluster3
    discovery-consul-vtgate-addr-tmpl: "{{ .Name }}.internal.cool.website:15000"

Discovery

VTAdmin will provide several discovery implementations out-of-the-box. These include:

  • consul - this PR
  • staticfile - Tentatively planned for v1
  • etcd2
  • k8s
  • zk

VTAdmin will also support custom discovery implementations via plugin loading. First, though, we will consider the built-in case.

VTAdmin discovery will maintain a private factory registry, similar to many other Vitess components. During cluster initialization, a cluster will call discovery.New(clusterName, impl, args). New will look up the corresponding factory for that discovery implementation and call it with the given cluster name and args; these args are the flags described above, and a factory should parse them using its own FlagSet.

This looks like:

package discovery

import (
    "errors"
    "fmt"
)

// Factory produces a Discovery implementation for the given cluster,
// parsing any implementation-specific flags out of args.
type Factory func(cluster string, args []string) (Discovery, error)

var (
    // ErrImplementationNotRegistered is returned by New when no factory
    // exists for the requested implementation name.
    ErrImplementationNotRegistered = errors.New("no factory registered for implementation")

    registry = map[string]Factory{}
)

// Register adds a factory to the registry, panicking if a factory is
// already registered under that name.
func Register(name string, factory Factory) {
    if _, ok := registry[name]; ok {
        panic("[discovery] factory already registered for " + name)
    }

    registry[name] = factory
}

// New looks up the factory for impl and calls it with the cluster name
// and the deferred, implementation-specific flags.
func New(cluster string, impl string, args []string) (Discovery, error) {
    factory, ok := registry[impl]
    if !ok {
        return nil, fmt.Errorf("%w %s", ErrImplementationNotRegistered, impl)
    }

    return factory(cluster, args)
}

func init() {
    Register("consul", NewConsul)
    Register("etcd2", NewEtcd2)
    // .... etc
}
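
To make the deferred flag parsing concrete, here is a rough sketch of what a factory like the NewConsul referenced in init() might look like, living alongside the registry above (and additionally importing the standard flag package). The flag names and struct fields are hypothetical, not the PR's actual consul implementation:

// ConsulDiscovery is a placeholder consul-backed implementation.
// [implement the rest of the Discovery interface]
type ConsulDiscovery struct {
    cluster        string
    datacenterTmpl string
    addrTmpl       string
    // ... consul client, caches, etc. omitted
}

// NewConsul receives only the flags that carried the discovery-consul-
// prefix, with that prefix stripped, e.g.
// ["-vtgate-datacenter-tmpl=dev-cluster1-mydatacenter-id1"].
func NewConsul(cluster string, args []string) (Discovery, error) {
    fs := flag.NewFlagSet("consul", flag.ContinueOnError)
    datacenterTmpl := fs.String("vtgate-datacenter-tmpl", "", "template for the consul datacenter to query for vtgates")
    addrTmpl := fs.String("vtgate-addr-tmpl", "", "template for building vtgate addresses from service entries")

    if err := fs.Parse(args); err != nil {
        return nil, err
    }

    return &ConsulDiscovery{
        cluster:        cluster,
        datacenterTmpl: *datacenterTmpl,
        addrTmpl:       *addrTmpl,
    }, nil
}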

Discovery Plugins

In the event the builtin discovery implementations do not work for all use cases, users may write their own, which vtadmin will load via package plugin. They should provide a file which exports a function New(cluster string, args []string) (discovery.Discovery, error), compile their code with go build -buildmode=plugin, and make the resulting .so file accessible to vtadmin-api [1].

Then in their command-line flags, specify discovery=plugin:/path/to/my.so, and their plugin will be registered and used. Discovery flags should then be set with discovery-my.so-* (this exact spec may change depending on how much it complicates the implementation).

The New function above then becomes:

func New(cluster string, impl string, args []string) (Discovery, error) {
    factory, ok := registry[impl]
    if !ok {
        if strings.HasPrefix(impl, "plugin:") {
            factory, err := pluginLoad(impl)
            if err != nil {
                return nil, err
            }
            Register(filepath.Base(impl), factory)
            return factory(cluster, args)
        }
        return nil, fmt.Errorf("%w %s", ErrImplementationNotRegistered, impl)
    }
    return factory(cluster, args)
}

func pluginLoad(impl string) (Factory, error) {
    pluginPath := strings.Split(impl, ":")[1]
    if pluginPath == "" {
        return nil, fmt.Errorf("plugin path cannot be empty, have %s", impl)
    }
    p, err := plugin.Open(pluginPath)
    if err != nil {
        return nil, err
    }
    f, err := p.Lookup("New")
    if err != nil {
        return nil, err
    }
    factory, ok := f.(func(string, []string) (Discovery, error))
    if !ok {
        return nil, fmt.Errorf("symbol New in plugin %s was not of type Factory", pluginPath)
    }
    return factory, nil
}

Consul Discovery

This PR includes an implementation of consul-backed service discovery, which is also documented to serve as a reference for future implementations.

One note: The current interface uses tags []string as a parameter to every function because it maps nicely onto the internal implementation details of consul. This may not work as well for other discovery service APIs, in which case our options are probably:

  1. Switch to a map[string]string and accept that each implementation needs to handle arbitrary maps.
  2. Remove the parameter entirely and make it part of the discovery configuration parsed at initialization. The biggest downside here is we would no longer be able to specify tags (search filters) on a per-call basis, but that's actually not needed by vtadmin currently, and it's possible it never will be.
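
For reference, here is a rough sketch contrasting the current tags-based shape with option (1); the method names are illustrative and may not match the exact interface in this PR:

package discovery

import "context"

// Discovery sketches the current shape: search criteria are consul-style tags.
type Discovery interface {
    // DiscoverVTGateAddr returns the address of a vtgate matching the given tags.
    DiscoverVTGateAddr(ctx context.Context, tags []string) (string, error)
    // ... other methods elided
}

// MapDiscovery sketches option (1): arbitrary key/value filters that each
// implementation must interpret for its own backend.
type MapDiscovery interface {
    DiscoverVTGateAddr(ctx context.Context, filters map[string]string) (string, error)
    // ... other methods elided
}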

Appendix

[1]: Sample custom plugin:

package main

import "vitess.io/vitess/go/vt/vtadmin/cluster/discovery"

type mySecretDiscovery struct {
    // ... other fields omitted
}

// [implement discovery interface]

func New(cluster string, args []string) (discovery.Discovery, error) {
    // your constructor here
    return &mySecretDiscovery{...}, nil
}

Related Issue(s)

List related PRs against other branches:

Todos

  • Tests
  • Documentation
  • CODEOWNERS for vtadmin
  • README in go/vt/vtadmin documenting alpha state of the service, "use at your risk, api not subject to backwards-compatibility guarantees" etc etc

Known bugs

  • The call to (vtsql.DB).Dial() returns an error if the user does not provide a credentials flag to vtsql for a cluster. This is because vitessdriver.OpenWithConfiguration only calls RegisterDialer if len(c.GRPCDialOptions) > 0, and without the credentials options that condition is not met, so no dialer gets registered. The fix is either to have vitessdriver.Configuration take an option to always register, or to make the call to vtgateconn.RegisterDialer ourselves before calling OpenWithConfiguration.

Deployment Notes

Notes regarding deployment of the contained body of work. These should note any
db migrations, etc.

Impacted Areas in Vitess

List general components of the application that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build

This will make for greater cohesion at the cluster layer between vtsql and
discovery, I promise.


This adds the following:
- multiplexed gRPC/HTTP server. I'm not using the `servenv` gRPC setup,
  because I want to get a workable version out the door without having
  to work our existing code too closely into the vitess grpc set up
  (plus, vitess doesn't support the multiplexing that I want). We can
  definitely revisit bringing these together later.
- complete implementation of the VTAdminServer interface, as well as an
  HTTP-wrapped interface to it. This requires a bunch of plumbing to do
  in an ergonomic way, which is what `vtadmin/http` is for, as well as
  `vtadmin/errors`.
- Add the CLI entrypoint

1. vtsql needs a discovery
2. mistaken wg.Done() replaced with Wait()

The glog package sets its flags on the global flagset on init, and exposes
no way to attach those flags, and only those flags, to a different
flagset, so we're stuck looking them up by hard-coded name.

@ajm188 ajm188 marked this pull request as ready for review December 15, 2020 23:02
@ajm188 ajm188 requested a review from sougou as a code owner December 15, 2020 23:02
@derekperkins
Member

Plugins are an interesting beast in Go. Historically in Vitess, we've required users to compile their own code directly into the binary, which is definitely a barrier to entry if you just want to use default images for the most part. AFAIK, the stdlib plugin is dead in the water. The most common option I've seen for plugins is probably https://github.com/hashicorp/go-plugin.

For the time being and for v1, I would probably not worry about plugin compatibility, and just make it pluggable with a function like you describe, allowing for users to just compile their code with the binary if they want. We can revisit plugins more broadly another time.

@ajm188
Contributor Author

ajm188 commented Dec 16, 2020

Just pointing out that plugins are described in the overall feature but not actually implemented in this PR. It's something I definitely want to have in the fully-finished vtadmin, but I don't have particularly strong feelings besides "there should be some way to provide additional implementations that aren't in vitess:master". I'll take a look at that repo!

@ajm188
Copy link
Contributor Author

ajm188 commented Dec 16, 2020

Is there a way I can disable the race detector test? It's failing on the grpcserver tests that I wrote, which are .... actually designed that way 😅 I don't want to add a mutex to the Server struct solely to make the test safe from a static analysis perspective. The reason (I believe) it's technically safe is because the s.serving value just needs to be eventually consistent from the perspective of the test code, so even though the read in the test is technically racing with the write in ListenAndServe, it doesn't matter because eventually the test will pick up the change.

Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Just a minor thing I noticed: all new source files should carry the copyright notice; new files should say (c) 2020.

doeg added a commit to tinyspeck/vitess that referenced this pull request Dec 22, 2020
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Looks great.
I will merge it once the codeowners conflict is resolved.

assert.Error(t, err)
}

func TestGetTabet(t *testing.T) {
Contributor

typo: TestGetTabet => TestGetTablet

// return NewJSONResponse(api.Something(ctx))
// }
//
// An unnamed route will get a span named "vtfun:http:<unnamed route>".
Contributor

Nit: Two references to VTFun in the comments :-)

Contributor Author

😅 ✅

@ajm188
Contributor Author

ajm188 commented Dec 23, 2020

@rohit-nayak-ps should be good to go now!
