
Incorrect IP when using dns-resolver #904

Open
@shumvgolove


Describe the bug

Gatus shows the wrong IP in conditions when client.dns-resolver overrides an existing domain name, even though the health check itself is performed against the correct IP.

What do you see?

Gatus reports an incorrect IP:

✓ ~ [STATUS] == 200
X ~ [IP] (142.250.200.110) == any(:1, 127.0.0.1)

What do you expect to see?

Gatus should report the IP resolved through dns-resolver:

✓ ~ [STATUS] == 200
✓ ~ [IP] (127.0.0.1) == any(:1, 127.0.0.1)

List the steps that must be taken to reproduce this issue

  1. Create a DNS server that overrides google.com with custom IPs (for example, using CoreDNS):

    Corefile

    .:54 {
        bind lo
        hosts {
            127.0.0.1 google.com
            ::1 google.com
            fallthrough
        }
        log
    }
    

    drill -p 54 A google.com

    ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 22162
    ;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
    ;; QUESTION SECTION:
    ;; google.com.	IN	A
    
    ;; ANSWER SECTION:
    google.com.	3600	IN	A	127.0.0.1
    
    ;; AUTHORITY SECTION:
    
    ;; ADDITIONAL SECTION:
    
    ;; Query time: 0 msec
    ;; SERVER: ::1
    ;; WHEN: Tue Nov 19 09:57:36 2024
    ;; MSG SIZE  rcvd: 44
    

    drill -p 54 AAAA google.com

    ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 7552
    ;; flags: qr rd ra ; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0 
    ;; QUESTION SECTION:
    ;; google.com.	IN	AAAA
    
    ;; ANSWER SECTION:
    google.com.	3600	IN	AAAA	::1
    
    ;; AUTHORITY SECTION:
    
    ;; ADDITIONAL SECTION:
    
    ;; Query time: 187 msec
    ;; SERVER: 127.0.0.1
    ;; WHEN: Tue Nov 19 10:01:17 2024
    ;; MSG SIZE  rcvd: 56
    
  2. Create the following health check:

    endpoints:
      - name: test
        url: "https://google.com"
        client:
          dns-resolver: "tcp://127.0.0.1:54"
        interval: 30s
        conditions:
          - "[STATUS] == 200"
          - "[IP] == any(::1, 127.0.0.1)"
  3. Observe that the health check fails because of the incorrect IP:

    ✓ ~ [STATUS] == 200
    X ~ [IP] (142.250.200.110) == any(:1, 127.0.0.1)
    

    Here, 142.250.200.110 is Google's actual IP, resolved through the system's global DNS.

  4. Observe that the Gatus process correctly connects to 127.0.0.1 (5848 is the main Gatus PID):

    strace -f -e trace=network -s 10000 -p 5848 2>&1 | grep 'connect' | grep '443'

    [pid  5848] connect(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation in progress)
    [pid  5824] connect(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation in progress)
    ...
    

Version

v5.13.1

Additional information

No response

TwiN (@TwiN, owner) commented on Nov 20, 2024

Hmm... This happens because the value of the [IP] placeholder is only retrieved when the placeholder appears in at least one condition, and when it does, it is retrieved with net.LookupIP, which completely bypasses the client.dns-resolver configuration.

    if e.needsToRetrieveIP() {
        e.getIP(result)
    }

    func (e *Endpoint) getIP(result *Result) {
        if ips, err := net.LookupIP(result.Hostname); err != nil {
            result.AddError(err.Error())
            return
        } else {
            result.IP = ips[0].String()
        }
    }

From a UX perspective, I completely understand why you'd expect client.dns-resolver to be used for this DNS lookup as well, so you raise a good point.

It shouldn't be too difficult to implement, given that the code for the resolver already exists, and that under the hood, net.LookupIP makes a call to DefaultResolver.LookupIPAddr.

From gatus/client/config.go, lines 240 to 260 at commit 0113175:

    if c.HasCustomDNSResolver() {
        dnsResolver, err := c.parseDNSResolver()
        if err != nil {
            // We're ignoring the error, because it should have been validated on startup ValidateAndSetDefaults.
            // It shouldn't happen, but if it does, we'll log it... Better safe than sorry ;)
            logr.Errorf("[client.getHTTPClient] THIS SHOULD NOT HAPPEN. Silently ignoring invalid DNS resolver due to error: %s", err.Error())
        } else {
            dialer := &net.Dialer{
                Resolver: &net.Resolver{
                    PreferGo: true,
                    Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
                        d := net.Dialer{}
                        return d.DialContext(ctx, dnsResolver.Protocol, dnsResolver.Host+":"+dnsResolver.Port)
                    },
                },
            }
            c.httpClient.Transport.(*http.Transport).DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
                return dialer.DialContext(ctx, network, addr)
            }
        }
    }

We'd have to extract the piece of code that creates the resolver, so that it can be reused both to build the HTTP client's dialer and to resolve the IP for the [IP] placeholder.


Labels: bug (Something isn't working)