Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename host config option to follow nomenclature from RFC 3986 #1775

Closed
ghost opened this issue Feb 11, 2020 · 8 comments
Closed

Rename host config option to follow nomenclature from RFC 3986 #1775

ghost opened this issue Feb 11, 2020 · 8 comments
Assignees
Labels
domain: config Anything related to configuring Vector domain: networking Anything related to Vector's networking have: should We should have this feature, but is not required. It is medium priority. meta: good first issue Anything that is good for new contributors. type: task Generic non-code related tasks

Comments

@ghost
Copy link

ghost commented Feb 11, 2020

Some of our components support host (or hosts) configuration option. The following table shows examples of this option provided by our docs for components which use it:

Component name Example value
prometheus source http://localhost:9090
aws_ec2_metadata transform http://169.254.169.254
clickhouse sink http://localhost:8123
datadog_metrics sink https://api.datadoghq.com
elasticsearch sink http://10.24.32.122:9000
humio_logs sink http://myhumiohost.com
logdna sink http://127.0.0.1
sematext sink http://example.com
splunk_hec sink http://my-splunk-host.com

From these examples it can be seen that the values are expected to contain schema and (optionally) port number. However, according to RFC 3986 URIs are divided into components like this:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose

and the "host" part is defined in section 3.2.2 as

The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name.

This means that using host as the configuration option name is technically not precise because it contains the schema and can also contain the port number.

Proposal

We need a single name for an expression of form http(s)://<host>(:<port>). I propose to use name endpoint. This name is used in AWS Documentation and we already use name endpoint for the same purpose in aws_* sinks: https://github.com/timberio/vector/blob/1b24dd1833d41a3fdac741a1190033b6d49cb6f0/src/region.rs#L9-L12

An alternative name is base_uri, which is mentioned in section 5.1 of the same RFC 3986.

Transitioning strategy

Because the change is merely cosmetic, I think it would not be fair to break existing configs for mentioned above components. So we can add host as an alias for endpoint/base_uri and update the documentation, so that the host option would still work, but endpoint/base_uri would be recommended for new configs.

It is an interesting question should we keep mentioning host in the documentation as a deprecated option or just make it undocumented.

@ghost ghost added domain: config Anything related to configuring Vector domain: networking Anything related to Vector's networking labels Feb 11, 2020
@ghost ghost changed the title Rename host config option for sinks to follow nomenclature from RFC 3986 Rename host config option to follow nomenclature from RFC 3986 Feb 11, 2020
@binarylogic
Copy link
Contributor

We need a single name for an expression of form http(s)://(:). I propose to use name endpoint.

👍 agree.

Because the change is merely cosmetic, I think it would not be fair to break existing configs for mentioned above components. So we can add host as an alias for endpoint/base_uri and update the documentation, so that the host option would still work, but endpoint/base_uri would be recommended for new configs.

100% agree. This would be a severe breaking change that I would like to avoid. I also wonder if this is better solved by implementing https://github.com/timberio/vector/issues/1291? Then we can build layers that handle backward compatibility. I fear this type of logic will get messy and complicated over time unless we support versioning.

It is an interesting question should we keep mentioning host in the documentation as a deprecated option or just make it undocumented.

I would prefer to remove it for the docs.

@LucioFranco
Copy link
Contributor

This seems like a good idea! I think another thing we could do as well with this change is push endpoint parsing into serde. We already have a small util for this now https://github.com/timberio/vector/blob/master/src/sinks/util/uri.rs but I think this would also provide a much better UX around adding hosts.

@binarylogic binarylogic added the type: task Generic non-code related tasks label Feb 11, 2020
@ghost
Copy link
Author

ghost commented Feb 11, 2020

We already have a small util for this now https://github.com/timberio/vector/blob/master/src/sinks/util/uri.rs but I think this would also provide a much better UX around adding hosts.

We definitely should use it, it would greatly reduce amount of boilerplate. And it can also be used in RegionOrEndpoint.

@binarylogic binarylogic added have: should We should have this feature, but is not required. It is medium priority. meta: good first issue Anything that is good for new contributors. labels Aug 7, 2020
@ktff
Copy link
Contributor

ktff commented Aug 20, 2020

In AWS sinks we have options endpoint = "127.0.0.0:5000/path/to/service", is the path part really used or just the scheme and authority? I haven't found any examples with path in AWS documentation. It would simplify some things if path isn't used.

@binarylogic
Copy link
Contributor

Yes, the path can be used for some services. It’s also helpful when pointing Vector to services that behave like AWS services (ex: many services implement the S3 protocol).

@binarylogic
Copy link
Contributor

@ktff did #3590 close this?

@ktff
Copy link
Contributor

ktff commented Sep 1, 2020

@binarylogic I wanted to add

I think another thing we could do as well with this change is push endpoint parsing into serde

on which I was working on, but it turned out a lot more taxing than the original issue, so I am for closing this.

There is value in pushing endpoint parsing, but there is no clear border on how much should be moved since there are trade-offs going on. If we do want to add this, it would need a separate issue, with listed trade-offs, and explanation where to draw the border. This is mostly complicated by having to think defensively about how the users are using the endpoints to avoid breakage. In all, I don't think the amount of work that needs to be put into it is worth the result at the moment, but it will be even more difficult to do in the future.

@binarylogic
Copy link
Contributor

Thanks. Would you mind closing this and opening an issue for the idea? We can always come back to it in the future. I would elaborate on why it's beneficial and difficult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: config Anything related to configuring Vector domain: networking Anything related to Vector's networking have: should We should have this feature, but is not required. It is medium priority. meta: good first issue Anything that is good for new contributors. type: task Generic non-code related tasks
Projects
None yet
Development

No branches or pull requests

3 participants