
Initial vtadmin-api, clusters, and service discovery #7187

Merged
merged 40 commits on Dec 23, 2020

Conversation


@ajm188 ajm188 commented Dec 15, 2020

Backport

NO

Status

READY

Description

This is the initial implementation of the vtadmin-api! It doesn't have everything, but this was the, uhh, smallest reasonable place to draw a line of "$thing that works and can do something useful". At a high level, this adds:

  1. a multiplexed gRPC/HTTP server, which vtadmin uses to serve both request types over the same port (see the sketch after this list for the general multiplexing idea). There's no precedent for something like this in Vitess, and to simplify the initial move (no dependency on servenv, global flags that prevent/complicate the use of cobra, etc), this lives in go/vt/vtadmin/grpcserver. I think it makes sense to leave this as-is for now, and figure out how to unify this with the servenv.GRPCServer later. Happy to hear disagreements, though!
  2. A proto definition of some of the vtadmin-api RPCs, with associated types.
  3. Introduction of a cluster abstraction and service discovery interface. More on this below.
  4. An implementation of the RPC interface using (3).
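
For context, here is a minimal sketch of one common way to multiplex gRPC and HTTP over a single port in Go, using github.com/soheilhy/cmux. This is only an illustration of the general idea, not necessarily how go/vt/vtadmin/grpcserver does it, and the port number is arbitrary:

package main

import (
    "net"
    "net/http"

    "github.com/soheilhy/cmux"
    "google.golang.org/grpc"
)

func main() {
    // Listen once; both servers share this port.
    lis, err := net.Listen("tcp", ":14200")
    if err != nil {
        panic(err)
    }

    mux := cmux.New(lis)

    // gRPC requests arrive as HTTP/2 with content-type application/grpc.
    grpcL := mux.Match(cmux.HTTP2HeaderField("content-type", "application/grpc"))
    // Everything else falls through to plain HTTP.
    httpL := mux.Match(cmux.Any())

    grpcServer := grpc.NewServer()
    httpServer := &http.Server{Handler: http.DefaultServeMux}

    go grpcServer.Serve(grpcL)
    go httpServer.Serve(httpL)

    // Serve blocks, dispatching each accepted connection to the first
    // matching listener.
    if err := mux.Serve(); err != nil {
        panic(err)
    }
}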

VTAdmin Clusters

A mini design doc in a PR!

Abstract

One of the bigger benefits of VTAdmin over the vtctld API / admin UI is that VTAdmin provides a single place to view and administer many Vitess clusters, rather than just a single one.

This document proposes designs for the following components for VTAdmin:

  • A cluster abstraction.
  • Cluster-based configuration, to support a wide-ranging set of customer needs.
  • Per-cluster service discovery.

Expected Use Cases

We expect most clusters to be either environments (e.g. dev/prod/qa) or geographic regions, whether AWS (us-east-1, eu-north-1, ap-northeast-1), GCP (us-west1, europe-north1, asia-northeast2), Azure (East US, Norway East, Japan East), or others. In theory, though, clusters can be any arbitrary way a Vitess user chooses to split up their keyspaces, so hard-coding a list of expected names (e.g. AWS regions) would prohibit many use cases.

Different clusters need different configurations, even for the same overall deployment. A small list of things that may differ per-cluster:

  • gRPC credentials for vtgates and vtctlds.
  • Different sets of gates to route to for admin queries (to prevent admin queries from taking resources from gates that power your application traffic).
  • Different discovery service configurations:
    • With consul as an example, you may have different datacenters that vtgates and vtctlds register themselves to.
    • Different service entry cache times for certain clusters, for example if they are geographically further from where your vtadmin deployment runs.
    • As above, possibly different credentials or authentication mechanisms to your service discovery backends.
    • You may, though it's unlikely, use completely different service discovery backends in different clusters. For example, use zookeeper in cluster1 but k8s in cluster2 because you're in the middle of a migration to Kubernetes.

Goal

Define the concept of a generalized cluster. Clusters should provide maximal flexibility in configuration, allowing VTAdmin users to tweak as much behavior as possible without making code changes or rebuilding the vtadmin-api binaries (so -ldflags-based solutions are out).

Non-Goal

Allow VTAdmin users to change the behavior of VTAdmin clusters at runtime. Though this is possible with the proposed design, it is sufficiently complex to warrant a separate design document. We should get a v1 landed and used in production first, and then revisit runtime configuration. It may also turn out it’s not a highly-desired feature.

Proposal

To support a generalized cluster in vtadmin-api, we propose two things:

  1. The definition of a cluster as (discovery service + vtctld interface + vtgate / db interface).
  2. Two DSN flags to support extremely flexible and arbitrary configuration of clusters.
    Additionally, allowing these configurations to be read from YAML files, to provide better ergonomics.

Cluster definition

Clusters will be created via configs (see vtadmin/cluster/config.go). VTAdmin can then maintain a mapping of map[string]*cluster.Cluster and delegate all logic to a specific cluster depending on the name parameter of a given function/http endpoint/gRPC endpoint/etc, and return an error if a cluster with that name was never configured.
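
As a rough illustration only (the type and method names below are hypothetical, not the PR's actual API surface), that delegation could look something like:

package vtadmin

import (
    "fmt"

    "vitess.io/vitess/go/vt/vtadmin/cluster"
)

// API holds one *cluster.Cluster per configured cluster name and
// delegates each request to the cluster named in that request.
type API struct {
    clusters map[string]*cluster.Cluster
}

// getCluster resolves a cluster by name, returning an error if no
// cluster with that name was ever configured.
func (api *API) getCluster(name string) (*cluster.Cluster, error) {
    c, ok := api.clusters[name]
    if !ok {
        return nil, fmt.Errorf("unsupported cluster: %s", name)
    }

    return c, nil
}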

During startup, vtadmin-api parses three flags: -cluster-config, a path to a YAML representation of cluster configs; -cluster-defaults, for setting options common to all or most of your clusters; and -cluster, a repeated flag for specifying per-cluster options and overriding global options. The levels of config are then merged according to the following precedence:

  1. Per-cluster configs set on the command line.
  2. Per-cluster configs set in the YAML file.
  3. Cluster defaults set on the command line.
  4. Cluster defaults set in the YAML file.

Finally, each fully-merged cluster config is used to produce a *cluster.Cluster, which the API uses.
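
As a sketch of that precedence (treating a config as a flat map of flag name to value for simplicity; the real Config type in vtadmin/cluster/config.go is richer), the merge could look like:

package vtadmin

// mergeConfigs merges config layers passed from highest to lowest
// precedence: later layers only fill in keys the earlier layers left
// unset.
func mergeConfigs(layers ...map[string]string) map[string]string {
    merged := map[string]string{}

    for _, layer := range layers {
        for key, value := range layer {
            if _, ok := merged[key]; !ok {
                merged[key] = value
            }
        }
    }

    return merged
}

// Usage, mirroring the precedence list above:
//   cfg := mergeConfigs(perClusterCLI, perClusterYAML, defaultsCLI, defaultsYAML)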

To facilitate flexible, independent configurations, the -cluster and -cluster-defaults flags take a DSN as their argument, loosely described as follows:

id= # Config.ID
name= # Config.Name
discovery= # Config.DiscoveryImpl
discovery-(?P<impl>[a-z][^ -]*)-(?P<flag>\w+)= # Config.DiscoveryFlagsByImpl[impl][flag]
vtsql-(?P<flag>\w+)= # Config.VtSQLFlags

That "discovery-" wildcard provides per-discovery-implementation level configuration. The second segment (going by - as the separator) should be the name of the corresponding discovery implementation (e.g. "``consul``", "``etcd2``"). This value groups the flags, with the discovery-${impl}- prefix removed. These flags are then passed along the given discovery implementation’s factory, which can parse those flags with its own implementation-specific flag.FlagSet.

Note: We also add vtsql-* flags similarly, and expect to add vtctl-* flags, depending on how those cluster-specific services may need to be configured, but discovery is the more complicated and interesting case, so it's the one we discuss here.

This leads to the following example usage (with a bash helper to make it look nice):

cluster_defaults=(
  discovery=consul
  discovery-consul-vtgate-datacenter-tmpl="dev-{{ .Name }}-mydatacenter-{{ .ID }}"
  discovery-consul-vtgate-addr-tmpl="{{ .Name }}.my.cool.website:15000"
)
# via https://stackoverflow.com/a/29637493
function join() {
  local IFS="$1"
  shift
  echo -n "$*"
}
./vtadmin \
  -cluster-defaults="$(join "," "${cluster_defaults[@]}")" \
  -cluster "name=cluster1,id=id1" \
  -cluster "name=cluster2,id=id2,discovery-consul-vtgate-datacenter-tmpl={{ .Name }}-prod-mydatacenter" \
  -cluster "name=cluster3,id=id3,discovery-consul-vtgate-addr-tmpl={{ .Name }}.internal.cool.website:15000"

This approach has some downsides, namely:

  • Validation of DiscoveryImpl and DiscoveryFlagsByImpl is deferred until discovery-creation time. This makes for a subpar parsing experience, in that technically invalid flags don't get caught during the initial flag.Parse() in vtadmin-api's entrypoint. This is also true of VtSQLFlags.
  • Discovery flag parsing and validation is entirely dependent on the specified implementation. This can potentially cause surprises. Therefore, the recommendation, which we cannot enforce, is that each discovery implementation’s factory define a flag.FlagSet and use that to parse the deferred flags.
    • Further, it’s possible to pass flags completely unrelated to a cluster’s discovery implementation (e.g., setting -cluster discovery=consul,discovery-etcd2-foo=bar) and have those silently ignored. This is in fact required to support the -cluster-defaults functionality, as different clusters may use different discovery implementations, and we need to support setting defaults for both implementations. In my opinion, this is fine, and just something to document and be aware of.

For completeness, the YAML representation of this config is:

defaults:
  discovery: consul
  discovery-consul-vtgate-datacenter-tmpl: "dev-{{ .Name }}-mydatacenter-{{ .ID }}"
  discovery-consul-vtgate-addr-tmpl: "{{ .Name }}.my.cool.website:15000"

clusters:
  id1:
    name: cluster1
  id2:
    name: cluster2
    discovery-consul-vtgate-datacenter-tmpl: "{{ .Name }}-prod-mydatacenter"
  id3:
    name: cluster3
    discovery-consul-vtgate-addr-tmpl: "{{ .Name }}.internal.cool.website:15000"

Discovery

VTAdmin will provide several discovery implementations out-of-the-box. These include:

  • consul - this PR
  • staticfile - Tentatively planned for v1
  • etcd2
  • k8s
  • zk

VTAdmin will also support custom discovery implementations via plugin loading. First, though, we will consider the built-in case.

VTAdmin discovery will maintain a private factory registry, similar to many other Vitess components. During cluster initialization, a cluster will call discovery.New(clusterName, impl, args). New will look up the corresponding factory for that discovery implementation and call it with the given cluster name and args; these args are the flags described above, and a factory should parse them using its own FlagSet.

This looks like:

package discovery

import (
    "errors"
    "fmt"
)

// Factory produces a Discovery implementation for the given cluster,
// parsing any implementation-specific flags out of args.
type Factory func(cluster string, args []string) (Discovery, error)

var (
    // ErrImplementationNotRegistered is returned by New when no factory
    // exists for the requested implementation name.
    ErrImplementationNotRegistered = errors.New("no factory registered for implementation")

    registry = map[string]Factory{}
)

// Register adds a factory to the registry, panicking if a factory is
// already registered under that name.
func Register(name string, factory Factory) {
    if _, ok := registry[name]; ok {
        panic("[discovery] factory already registered for " + name)
    }

    registry[name] = factory
}

// New looks up the factory for impl and calls it with the cluster name
// and the deferred, implementation-specific flags.
func New(cluster string, impl string, args []string) (Discovery, error) {
    factory, ok := registry[impl]
    if !ok {
        return nil, fmt.Errorf("%w %s", ErrImplementationNotRegistered, impl)
    }

    return factory(cluster, args)
}

func init() {
    Register("consul", NewConsul)
    Register("etcd2", NewEtcd2)
    // .... etc
}
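
To make the deferred flag parsing concrete, here is a rough sketch of what a factory like the NewConsul referenced in init() might look like, living alongside the registry above (and additionally importing the standard flag package). The flag names and struct fields are hypothetical, not the PR's actual consul implementation:

// ConsulDiscovery is a placeholder consul-backed implementation.
// [implement the rest of the Discovery interface]
type ConsulDiscovery struct {
    cluster        string
    datacenterTmpl string
    addrTmpl       string
    // ... consul client, caches, etc. omitted
}

// NewConsul receives only the flags that carried the discovery-consul-
// prefix, with that prefix stripped, e.g.
// ["-vtgate-datacenter-tmpl=dev-cluster1-mydatacenter-id1"].
func NewConsul(cluster string, args []string) (Discovery, error) {
    fs := flag.NewFlagSet("consul", flag.ContinueOnError)
    datacenterTmpl := fs.String("vtgate-datacenter-tmpl", "", "template for the consul datacenter to query for vtgates")
    addrTmpl := fs.String("vtgate-addr-tmpl", "", "template for building vtgate addresses from service entries")

    if err := fs.Parse(args); err != nil {
        return nil, err
    }

    return &ConsulDiscovery{
        cluster:        cluster,
        datacenterTmpl: *datacenterTmpl,
        addrTmpl:       *addrTmpl,
    }, nil
}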

Discovery Plugins

In the event the builtin discovery implementations do not work for all use cases, users may write their own, which vtadmin will load via package plugin. They should provide a file which exports a function New(cluster string, args []string) (discovery.Discovery, error), compile their code with go build -buildmode=plugin, and make the resulting .so file accessible to vtadmin-api [1].

Then in their command-line flags, specify discovery=plugin:/path/to/my.so, and their plugin will be registered and used. Discovery flags should then be set with discovery-my.so-* (this exact spec may change depending on how much it complicates the implementation).

The New function above then becomes:

func New(cluster string, impl string, args []string) (Discovery, error) {
    factory, ok := registry[impl]
    if !ok {
        if strings.HasPrefix(impl, "plugin:") {
            factory, err := pluginLoad(impl)
            if err != nil {
                return nil, err
            }
            Register(filepath.Base(impl), factory)
            return factory(cluster, args)
        }
        return nil, fmt.Errorf("%w %s", ErrImplementationNotRegistered, impl)
    }
    return factory(cluster, args)
}

func pluginLoad(impl string) (Factory, error) {
    pluginPath := strings.Split(impl, ":")[1]
    if pluginPath == "" {
        return nil, fmt.Errorf("plugin path cannot be empty, have %s", impl)
    }
    p, err := plugin.Open(pluginPath)
    if err != nil {
        return nil, err
    }
    f, err := p.Lookup("New")
    if err != nil {
        return nil, err
    }
    factory, ok := f.(func(string, []string) (Discovery, error))
    if !ok {
        return nil, fmt.Errorf("symbol New in plugin %s was not of type Factory", pluginPath)
    }
    return factory, nil
}

Consul Discovery

This PR includes an implementation of consul-backed service discovery, which is also documented to serve as a reference for future implementations.

One note: The current interface uses tags []string as a parameter to every function because it maps nicely onto the internal implementation details of consul. This may not work as well for other discovery service APIs, in which case our options are probably:

  1. Switch to a map[string]string and accept that each implementation needs to handle arbitrary maps.
  2. Remove the parameter entirely and make it part of the discovery configuration parsed at initialization. The biggest downside here is we would no longer be able to specify tags (search filters) on a per-call basis, but that's actually not needed by vtadmin currently, and it's possible it never will be.
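
For reference, here is a rough sketch contrasting the current tags-based shape with option (1); the method names are illustrative and may not match the exact interface in this PR:

package discovery

import "context"

// Discovery sketches the current shape: search criteria are consul-style tags.
type Discovery interface {
    // DiscoverVTGateAddr returns the address of a vtgate matching the given tags.
    DiscoverVTGateAddr(ctx context.Context, tags []string) (string, error)
    // ... other methods elided
}

// MapDiscovery sketches option (1): arbitrary key/value filters that each
// implementation must interpret for its own backend.
type MapDiscovery interface {
    DiscoverVTGateAddr(ctx context.Context, filters map[string]string) (string, error)
    // ... other methods elided
}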

Appendix

[1]: Sample custom plugin:

package main

import "vitess.io/vitess/go/vt/vtadmin/cluster/discovery"

type mySecretDiscovery struct {
    // ... other fields omitted
}

// [implement discovery interface]

func New(cluster string, args []string) (discovery.Discovery, error) {
    // your constructor here
    return &mySecretDiscovery{...}, nil
}

Related Issue(s)

List related PRs against other branches:

Todos

  • Tests
  • Documentation
  • CODEOWNERS for vtadmin
  • README in go/vt/vtadmin documenting alpha state of the service, "use at your risk, api not subject to backwards-compatibility guarantees" etc etc

Known bugs

  • The call to (vtsql.DB).Dial() returns an error if the user does not provide a credentials flag to vtsql for a cluster. This is because vitessdriver.OpenWithConfiguration only calls RegisterDialer if len(c.GRPCDialOptions) > 0, and without the credentials options that condition is not met, so no dialer gets registered. The fix is either to have vitessdriver.Configuration take an option to always register, or to make the call to vtgateconn.RegisterDialer ourselves before calling OpenWithConfiguration.

Deployment Notes

Notes regarding deployment of the contained body of work. These should note any
db migrations, etc.

Impacted Areas in Vitess

List general components of the application that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build

This will make for greater cohesion at the cluster layer between vtsql and
discovery, I promise.


This adds the following:
- multiplexed gRPC/HTTP server. I'm not using the `servenv` gRPC setup,
  because I want to get a workable version out the door without having
  to work our existing code too closely into the vitess grpc set up
  (plus, vitess doesn't support the multiplexing that I want). We can
  definitely revisit bringing these together later.
- complete implementation of the VTAdminServer interface, as well as an
  HTTP-wrapped interface to it. This requires a bunch of plumbing to do
  in an ergonomic way, which is what `vtadmin/http` is for, as well as
  `vtadmin/errors`.
- Add the CLI entrypoint

1. vtsql needs a discovery
2. mistaken wg.Done() replaced with Wait()

The glog package sets its flags on the global flagset on init, and exposes
no way to attach those flags, and only those flags, to a different
flagset, so we're stuck looking them up by hard-coded name.

@ajm188 ajm188 marked this pull request as ready for review December 15, 2020 23:02
@ajm188 ajm188 requested a review from sougou as a code owner December 15, 2020 23:02
@derekperkins
Member

Plugins are an interesting beast in Go. Historically in Vitess, we've required users to compile their own code directly into the binary, which is definitely a barrier to entry if you just want to use default images for the most part. AFAIK, the stdlib plugin is dead in the water. The most common option I've seen for plugins is probably https://github.com/hashicorp/go-plugin.

For the time being and for v1, I would probably not worry about plugin compatibility, and just make it pluggable with a function like you describe, allowing for users to just compile their code with the binary if they want. We can revisit plugins more broadly another time.

@ajm188
Contributor Author

ajm188 commented Dec 16, 2020

Just pointing out that plugins are described in the overall feature but not actually implemented in this PR. It's something I definitely want to have in the fully-finished vtadmin, but I don't have particularly strong feelings besides "there should be some way to provide additional implementations that aren't in vitess:master". I'll take a look at that repo!

@ajm188
Copy link
Contributor Author

ajm188 commented Dec 16, 2020

Is there a way I can disable the race detector test? It's failing on the grpcserver tests that I wrote, which are .... actually designed that way 😅 I don't want to add a mutex to the Server struct solely to make the test safe from a static analysis perspective. The reason (I believe) it's technically safe is because the s.serving value just needs to be eventually consistent from the perspective of the test code, so even though the read in the test is technically racing with the write in ListenAndServe, it doesn't matter because eventually the test will pick up the change.

Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Just a minor thing I noticed: all new source files should carry the copyright notice; new files should say (c) 2020.

doeg added a commit to tinyspeck/vitess that referenced this pull request Dec 22, 2020
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Looks great.
I will merge it once the codeowners conflict is resolved.

assert.Error(t, err)
}

func TestGetTabet(t *testing.T) {
Contributor

typo: TestGetTabet => TestGetTablet

// return NewJSONResponse(api.Something(ctx))
// }
//
// An unnamed route will get a span named "vtfun:http:<unnamed route>".
Contributor

Nit: Two references to VTFun in the comments :-)

Contributor Author

😅 ✅

@ajm188
Contributor Author

ajm188 commented Dec 23, 2020

@rohit-nayak-ps should be good to go now!
