Skip to content

Provide docs/standards about client/worker/proxy interactions in redeployable taskcluster#128

Merged
petemoore merged 29 commits intomasterfrom
rfc0128
Nov 6, 2018
Merged

Provide docs/standards about client/worker/proxy interactions in redeployable taskcluster#128
petemoore merged 29 commits intomasterfrom
rfc0128

Conversation

@petemoore
Copy link
Member

@petemoore petemoore commented Sep 25, 2018

Changes are required for clients of taskcluster services when taskcluster becomes redeployable.

This RFC serves to define the standards by which taskcluster deployments will publish a manifest of their service definitions, how client generators will query and interpret this manifest and associated reference and schema documents in order to generate clients, what the architectural changes to generated clients will be, how users of those clients (workers, worker authentication proxies, command line tools, software libraries) will be adapted in order to use the new clients, and how generated clients and client generators will be built, released and deployed.

This entails changes to the way workers will share deployment / authentication proxy configuration information with tasks, versioning of the API / Event reference schemas, versioning of the services definitions based on those API and Event reference schemas, and the possibility for clients to connect to different taskcluster deployments.

@djmitche
Copy link
Contributor

Is this going to include schemas for reference files and so on (like https://bugzilla.mozilla.org/show_bug.cgi?id=1476602)? If so, I'll stop working on that until this RFC is complete.

Copy link
Contributor

@imbstack imbstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far.


```
{
"version" : 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have found it easier to just include a $schema property to indicate the version of a document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

"<serviceName>": {
"api": true|false,
"exchanges": true|false,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to omit the API version.

Why switch from relative URLs to true/false? Both are probably fine, but the URLs are a little more discoverable for people trying to navigate these files by hand.

@djmitche djmitche self-requested a review October 23, 2018 18:32
example:

```
queue := queue.NewFromEnvVars()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just leave env var stuff to the person using the library? I don't think we need to build in any alternative ways of configuring other than passing in creds probably.

Copy link
Member Author

@petemoore petemoore Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for node.js and python, that wouldn't be the end of the world. The problem comes when you have static language clients where the client gets compiled into some other binary that get used in other places (for example, the taskcluster-client is used internally by generic-worker, but also by taskcluster-proxy, for example). In the case of generic worker, for example, it will fetch the root URL from the generic-worker.config file, rather than the TASKCLUSTER_ROOT_URL environment variable. If it were for some reason set in the environment, and then it took it from there, it might not necessarily be the correct TASKCLUSTER_ROOT_URL to use.

Also consider the situation where e.g. somebody writes a tool that syncs role definitions between two taskcluster deployments. If internally the client uses the environment variable TASKCLUSTER_ROOT_URL to discover which cluster it needs to talk to, only one can be specified in that process. It kind of assumes it is only talking to one deployment.

When writing executable code, such as a command line tool or web service, where you know what your tool is and what it is being used for, and have a feeling about the context it might be called from, it seems reasonable to take config from environment variables (e.g. for writing a web service). You also know only one value is required for each parameter (unlike the sync example above where internally in the code multiple clients can be created). But when it isn't a service or command line application you are writing, but just a library which is internally used by other command line tools and web services, (for example), it feels cleaner and more appropriate to take configuration as parameters passed to methods and functions. By providing a utility method to fetch config from env vars, you expose the possibility to the calling code to configure based on env vars, but you don't force it. Then it is up to the author of the command line tool / service / etc that calls your library to decide explicitly whether they want that or not. Then another advantage of being explicit is it is a little less magical and easier to reason about when you are degugging that code and trying to work out where it got the configuration from. I think. :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we all agree here -- libraries shouldn't be reading env variables without explicit instruction from the caller.

We just need to roll out that change gently since it will break every user of the client in possibly-subtle ways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I guess I misread @imbstack's comment, and thought he meant we should support only accepting env vars, but now I see he meant the opposite, so I didn't need to write all that! 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we all agreed that all env vars will no longer be automatically read by the clients, but the text of the RFC still refers only to TASKCLUSTER_ROOT_URL. I think clients should be consistent in which env vars they read, and they currently read TASKCLUSTER_CLIENT_ID etc.

This will be a big breaking change for our users (and I've held @tomprince back from using newer clients in anticipation), so I think we should be explicit about it and make sure they get a chance to comment here. Alternately, we could remove this bit from this RFC and move it to a new one.

Copy link
Contributor

@djmitche djmitche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the renaming (api -> http, events/exchanges -> amqp), I don't feel like the new names make more sense than the old, but the old aren't perfect. It'd be nice to be consistent, whatever we choose, but the fewer things we rename, the easier this work will be. Renaming always seems easy, then you end up breaking things in production because you missed somewhere!


# 3. Details

## 3.1 The development lifecycle of APIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an earlier draft this was labeled as "current" -- I kind of liked that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops that got lost in a rewrite, I'm putting it back in. Nice spot!

currently existing reference schemas are these:

* [HTTP reference schema](https://schemas.taskcluster.net/base/v1/api-reference.json)
* [AMQP 0.9.1 reference schema](https://schemas.taskcluster.net/base/v1/exchanges-reference.json)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have always called these "api" and "events", so let's continue with those names. Actually, we've sometimes called the latter "exchanges", so fixing that to "events" would be good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

versions, and released
6. Released software is tested and deployed

## 3.2 Changes to publishing API manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this section 4, proposed changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

references/schemas that the client reads dynamically should also be a
dependency of the project, which may be either frozen references/schemas,
fetched from a language package, or fetched dynamically during build/CI from a
taskcluster deployment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this means -- how can the references be a dependency?


```go
func HTTPReference(rootURL string, version string) string
func AMQPReference(rootURL string, version string) string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are equivalent to the existing ApiReference and ExchangesReference methods, but omit the service. I'm not sure how that would work? Also, given that the manifest format allows arbitrary URLs, I think these methods are unnecessary?


These will return absolute urls to the `*-reference.json` documents.

## Changes to building docs site
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave these out..

@petemoore
Copy link
Member Author

Regarding the renaming (api -> http, events/exchanges -> amqp), I don't feel like the new names make more sense than the old, but the old aren't perfect. It'd be nice to be consistent, whatever we choose, but the fewer things we rename, the easier this work will be. Renaming always seems easy, then you end up breaking things in production because you missed somewhere!

That is a fair point. And I agree the new names aren't considerably better, so I will go back to the old names. The main naming issue I was trying to solve was that we have no collective term for API meaning HTTP API or AMQP API. We use API for the collective term, but then also sometimes just meaning the HTTP API (e.g. in the name apis.json). For example, does API references mean something that adheres to api-reference.json or exchange-reference.json, or just the former?

Maybe I can just update the docs to make it clear in each place I reference these terms, what I mean...

jhford
jhford previously approved these changes Oct 31, 2018
Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, @petemoore !

## 4.1 Changes to publishing API manifest

1. If a service of a taskcluster deployment provides an API interface, the
cluster may host the API reference document under
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentionally may and not must?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just to cover the case that you have an API but you don't want to publish it and people/clients to start using it...

```

How this list is generated by taskcluster is compiled and served by
taskcluster-references is not the concern of this RFC, but it is recommended
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A well formed manifest to drive client generation seems like a foundational element of the design. I don't want to speak for @petemoore , but I suspect what he's trying to say is that creating the manifest is out of scope for this document, but that the format which this document expects is in scope.

generate the URL paths, rather than requiring the consumer of this library to
use taskcluster-lib-urls. This keeps the involvement of taskcluster-lib-urls as
high up in the stack as possible, which makes the lower parts of the stack more
generic/flexible with fewer concerns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this design consideration!

manifest simply says "these are the APIs I declare, here is where you can fetch
their references, and they are self-describing, so go ask them". It doesn't
burn in any concerns about URL path building, or the types of reference we
support.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach, I think it's the right way to handle the manifest.

## Changes to publication of API references and schemas

The implementation must serve the described resources under the given URLs set
out in this document. The author is not concerned with how a service declares
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful information to clearly mark what is and isn't in scope.

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

`TASKCLUSTER_PROXY_URL` and `TASKCLUSTER_ROOT_URL`. This gives them the
freedom to refer to either the proxy or the target service, as required.
Since they must explicitly configure the root url when using a taskcluster
client, both endpoints are at their disposal, based on what they wish to do.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@petemoore
Copy link
Member Author

Note, I'll squash commits when I land, of course, and give a reasonable commit message!

This is ready for review, I am not intending to land any more changes.

Copy link
Contributor

@djmitche djmitche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. I'm a little worried that some JSON-schema tooling will struggle with the metadata field, but as long as that's not the case it sounds like a really elegant solution, not least because it means we can make "trivial" changes to the reference schema (such as modifying a description) without bumping the version number.

We had earlier planned to have the version number in the reference schemas. That would allow us to have services that are publishing references adhering to different schema versions to co-exist in the same deployment (and would couple nicely with the metadata property, so nobody needs to parse "-v1" out of a filename).

Would the metadata-based versioning apply to the manifest schema (api-manifest.json) as well?

Could we rename reference.json to indicate that it is a metaschema, not a reference or schema? Perhaps metadata-metaschema.json or something like that?

example:

```
queue := queue.NewFromEnvVars()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we all agreed that all env vars will no longer be automatically read by the clients, but the text of the RFC still refers only to TASKCLUSTER_ROOT_URL. I think clients should be consistent in which env vars they read, and they currently read TASKCLUSTER_CLIENT_ID etc.

This will be a big breaking change for our users (and I've held @tomprince back from using newer clients in anticipation), so I think we should be explicit about it and make sure they get a chance to comment here. Alternately, we could remove this bit from this RFC and move it to a new one.

* teams that deploy their own taskcluster environments, are able to include
additional references for programming interfaces not covered by the core
platform (for example, maybe it is desired to provide APIs for talking with
databases, other messaging buses, monitoring tools, ...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I see that you've been careful to indicate what this RFC requires, and what it merely discusses. I'm strongly opposed to a few of the things it discusses, but I'm happy to agree to the RFC only on the basis of what it requires.

Copy link
Contributor

@imbstack imbstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems generally reasonable to me!


* building taskcluster clients, that provide language-level programming
interfaces to taskcluster APIs
* refreshing `taskcluster-raw-docs` AWS S3 bucket, which is used by taskcluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we actually use these for docs. Every service just publishes to that bucket directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK.

Do you know where the docs site fetches the references and schemas from?

@petemoore petemoore merged commit f168db0 into master Nov 6, 2018
@djmitche
Copy link
Contributor

djmitche commented Nov 7, 2018

So, the meaning of the env vars section remains ambiguous, and there appears to have been no final-comment period here in which to catch that issue or check that release engineering and other client consumers are OK with the change. I don't want to delay this by standing on ceremony, and I think we have consensus on most of this, but I'm worried that we've missed discussion of a consumer-facing breaking change. Maybe we should consider pulling out the env-var bit into a separate RFC?

@petemoore
Copy link
Member Author

petemoore commented Nov 7, 2018

there appears to have been no final-comment period here in which to catch that issue or check that release engineering and other client consumers are OK with the change

Apologies, I forgot about applying the tags, but I think we have a general consensus based on the review approvals. It is always plausible for there to be slight changes during the implementation phase, but I think as long as we are vocal about changes coming, and communicate these things well, we should be ok. When users upgrade a major version, they are generally aware there can be breaking changes - the mistake we have made in the past is not being very vocal about it and explicit in release notes. As long as the release notes are very explicit about the breaking changes, and we communicate them well, users can decide if they wish to upgrade, and how long they need for adjusting to the changes. I think this is where we have fallen down in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants