Added dns sd to query and rule #575
pkg/dns/dns.go
Outdated
hosts = append(hosts, net.JoinHostPort(ip.String(), port))
}
case "dnssrv":
_, recs, err := s.resolver.LookupSRV(ctx, "", proto, host)
I am curious why we left the service empty for the SRV record. It will be looked up as `_._proto.name`. The standard form for an SRV record is `_service._proto.name`.
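For reference, Go's `net.LookupSRV` builds the query name in the RFC 2782 form `_service._proto.name` and only queries `name` verbatim when both `service` and `proto` are empty. The helper below is a hypothetical restatement of that rule (not the standard library's code), showing why `LookupSRV(ctx, "", proto, host)` ends up querying `_._proto.name`:

```go
package main

// srvQueryName is a hypothetical helper mirroring how net.LookupSRV
// derives the DNS name it queries (RFC 2782 form).
func srvQueryName(service, proto, name string) string {
	// Both empty: the name is used as-is, which lets callers pass a full
	// SRV name such as "_grpc._tcp.example.com".
	if service == "" && proto == "" {
		return name
	}
	// Otherwise "_service._proto.name" is built, even when only one of the
	// two labels is empty, which yields names like "_._tcp.example.com".
	return "_" + service + "._" + proto + "." + name
}
```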
In Kubernetes, we can create a headless service that publishes SRV records of the form `_my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local`; in this case, we cannot use the SRV record to discover the hosts in Thanos. Of course, for a headless service in Kubernetes we should use `dns+` to look up the hosts instead. Just curious.
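The Kubernetes name above is the RFC 2782 form with the cluster-internal service domain appended. A sketch with a hypothetical helper, assuming the caller supplies the cluster domain (commonly `cluster.local`, but configurable per cluster). Passing the resulting full name to `LookupSRV` with empty service and proto would make Go query it verbatim:

```go
package main

import "fmt"

// k8sSRVName is a hypothetical helper that builds the SRV record name
// Kubernetes publishes for a named port of a headless service.
func k8sSRVName(portName, protocol, svc, namespace, clusterDomain string) string {
	return fmt.Sprintf("_%s._%s.%s.%s.svc.%s", portName, protocol, svc, namespace, clusterDomain)
}
```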
As discussed on Slack, this seems to be broken. Will fix it in future commits.
d4eaf04 to 7ac03bc
cmd/thanos/query.go
Outdated
return runutil.Repeat(dnsSDInterval, ctx.Done(), func() error {
addresses := append(fileSDCache.Addresses(), storeAddrs...)
// TODO(ivan): default port... Use a flag maybe?
if err := dnsProvider.Resolve(ctx, addresses, 9090); err != nil {
What should the defaultPort be here? Should it be exposed as a flag?
The defaultPort is used for non-SRV lookups when the address does not specify a port. Same question for rule.go.
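The default-port behaviour described here can be sketched with a hypothetical `withDefaultPort` helper (not the PR's code), which uses `net.SplitHostPort` to detect whether an address already carries a port:

```go
package main

import (
	"net"
	"strconv"
)

// withDefaultPort is a hypothetical helper: addresses that already carry a
// port are returned unchanged; otherwise defaultPort is appended.
// Limitation: a bracketed IPv6 literal without a port, e.g. "[::1]", is not
// handled correctly by this sketch.
func withDefaultPort(addr string, defaultPort int) string {
	host, port, err := net.SplitHostPort(addr)
	switch {
	case err == nil && port != "":
		return addr // already has a port
	case err == nil:
		// "host:" parses, but with an empty port.
		return net.JoinHostPort(host, strconv.Itoa(defaultPort))
	default:
		// No port at all: treat the whole string as the host. JoinHostPort
		// adds brackets around bare IPv6 literals such as "::1".
		return net.JoinHostPort(addr, strconv.Itoa(defaultPort))
	}
}
```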
I think we should just use a variable to represent the default port of the store address. Users can easily set `--grpc-address` for the store components.
There is no default. We should fail if port is not specified.
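Failing instead of defaulting, as suggested here, could look like the following hypothetical `requirePort` check (a sketch, not the code that was merged):

```go
package main

import (
	"fmt"
	"net"
)

// requirePort is a hypothetical validation step: instead of guessing a
// default, reject any address that does not specify a port.
func requirePort(addr string) error {
	_, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid address %q: %v", addr, err)
	}
	if port == "" {
		// SplitHostPort accepts "host:" and returns an empty port.
		return fmt.Errorf("address %q has an empty port", addr)
	}
	return nil
}
```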
And potentially have metrics for failed resolutions and incorrect file SD entries.
I agree. The default port is only used for alertmanagers right now. The resolver shouldn't care about it. I am moving this logic to the alertmanagerSet update function. I will also add metrics.
@bwplotka @domgreen @jojohappy This PR is ready for review.
Thanks, this looks great. Some suggestions after a first review.
Nice, small things only.
Looks legit. Last touches I hope!
ref #492
TODO:
- Extract DNS SD to a separate package
- Write tests for DNS SD
- Add DNS SD to ruler and query
- Resolve DNS periodically and not on every query (add provider/cache)
- Add a flag for the DNS resolution interval
- Clarify assumptions (default port/failure behaviour)
- Update docs explaining how DNS SD can be used in static flags and file SD (PR here: Documented DNS SD and added to the changelog #613)

Changes
Extracted the DNS SD, previously used only for Alertmanagers inside the ruler, into a separate package, and added it to query and rule. This allows any address provided via a static flag or file SD to be a domain name that gets resolved. Prefix the address with `dns+` or `dnssrv+` to have it resolved via an A/AAAA or SRV DNS lookup, respectively.

Verification
Added unit tests for the DNS Resolver and Provider.
Any ideas on how to do an e2e test using some sort of actual DNS configuration?