
Use custom DNS server without adding it to system #36

Conversation

carlosonunez-vmw (Contributor)

Currently, SIVT will fail if the DNS resolver it was configured with cannot
resolve custom Avi FQDNs. This might not work for customers who want to
use FQDNs that will only live on the DNS resolver that they provided to the
UI.

This commit fixes that by using `dns.resolver` and monkey-patching
`urllib.connection.create_connection` to ensure that the resolver that
the user provided is used during environment turn up.

Signed-off-by: Carlos Nunez <ncarlos@vmware.com>

sunshetty and others added 7 commits May 13, 2022 11:56
Signed-off-by: Sunil Shetty <sunilsh@vmware.com>
Signed-off-by: Sunil Shetty <sunilsh@vmware.com>
(cherry picked from commit 3aca59e)
Signed-off-by: Rashi Khandelwal <rashik@vmware.com>
Signed-off-by: Ian Zink <izink@pivotal.io>
Signed-off-by: Tasmiya Bano <tbano@tbano-a01.vmware.com>

Co-authored-by: Tasmiya Bano <tbano@tbano-a01.vmware.com>
Currently, SIVT will fail if the DNS resolver it was configured with cannot
resolve custom Avi FQDNs. This might not work for customers who want to
use FQDNs that will only live on the DNS resolver that they provided to the
UI.

This commit fixes that by using `dns.resolver` and monkey-patching
`urllib.connection.create_connection` to ensure that the resolver that
the user provided is used during environment turn up.

Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
@carlosonunez-vmw (Contributor, Author)

This is also related to #34.

carlosonunez-vmw and others added 4 commits May 26, 2022 14:23
Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
* Fix merge conflicts

Signed-off-by: srinivasme21 <srinivasme@vmware.com>

* Review and fix typos and formatting

Signed-off-by: srinivasme21 <srinivasme@vmware.com>

* Fix format

Signed-off-by: srinivasme21 <srinivasme@vmware.com>

* Update staging URLs to production

Signed-off-by: srinivasme21 <srinivasme@vmware.com>

* Fix release version in title

Signed-off-by: srinivasme21 <srinivasme@vmware.com>
* Fixing Documentation issues

Signed-off-by: Tasmiya Bano <tbano@tbano-a01.vmware.com>

* Documentation fix for AWS

Signed-off-by: Tasmiya Bano <tbano@tbano-a01.vmware.com>

Co-authored-by: Tasmiya Bano <tbano@tbano-a01.vmware.com>
Users get the message above when TMC token validation fails. This pull
request fixes that.

Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
@sunshetty force-pushed the main branch 2 times, most recently from 562acf0 to b5b931f on May 31, 2022 09:30
Signed-off-by: Sunil Shetty <sunilsh@vmware.com>
sunshetty and others added 7 commits May 31, 2022 20:14
…check, wrapped-link check

Signed-off-by: Sunil Shetty <sunilsh@vmware.com>
Signed-Off-By: Carlos Nunez <ncarlos@vmware.com>
The bootstrapping cluster forwards `/etc/resolv.conf` to CoreDNS, which
prevents FQDNs whose records live only on the custom DNS servers from
resolving. This fixes that.

Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
@vmwclabot

@carlosonunez-vmw, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding Signed-off-by: John Doe <john.doe@email.org> to the last line of each Git commit message. The e-mail address used to sign must match the e-mail address of the Git author. Click here to view the Developer Certificate of Origin agreement.

…server

Signed-off-by: Carlos Nunez <ncarlos@vmware.com>
```diff
@@ -7199,14 +7245,15 @@ def getAviIpFqdnDnsMapping(avi_controller_fqdn_ip_dict, dns_server):
     try:
         for dns in dns_server:
             for avi_fqdn, avi_ip in avi_controller_fqdn_ip_dict.items():
-                listOfCmd = ["dig", dns, avi_fqdn, "+short"]
+                listOfCmd = ["dig", f"@{dns}", avi_fqdn, "+short"]
```

Since dnspython is being used elsewhere, what are your thoughts on shoehorning in `Resolver` here? Totally can see it as being out of scope of this PR (this looks to be purely for 🐛 squashing), but it's so much more efficient than shelling out to run `dig` and parsing the output.
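For illustration, a minimal sketch of what that could look like with dnspython (hypothetical nameserver and FQDN values; not code from this PR):

```python
import dns.resolver

# Build a resolver that ignores /etc/resolv.conf and queries only the
# user-supplied nameserver.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["10.220.3.251"]

# One in-process query replaces the dig subprocess and output parsing.
answer = resolver.resolve("avi-01.tkg.local", "A")
print([rdata.to_text() for rdata in answer])
```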

@carlosonunez-vmw (Contributor, Author) commented Jun 1, 2022 via email

@carlosonunez-vmw (Contributor, Author)

Here’s the summary of why this PR is needed up front:

  • It fixes an error that prevents users from validating Avi when using FQDNs whose records are stored in a custom DNS server: link
  • It fixes an error during setup where Arcas fails to resolve Avi FQDNs whose records are stored in a custom DNS server: link,
  • It works around a limitation with the ClusterAPI bootstrapping process wherein AKO is not able to resolve Avi FQDNs whose records are stored in a custom DNS server: link, and
  • It fixes passing the custom nameservers provided in the wizard through to the management and workload clusters: link

Validating Avi Records

The first thing that the SIVT configuration file wizard asks for is any custom DNS nameservers to be used during the deployment, along with a list of search domains and NTP servers:

[Screenshot: SIVT wizard prompting for custom DNS nameservers, search domains, and NTP servers]

Later in the wizard, you are asked to provide DNS records for the Avi controllers to be created. The wizard expects these records to exist already, and they appear to be required in order to continue (i.e., you can't just use static IPs):

[Screenshot: wizard prompting for the Avi controller DNS records]

Since I installed this appliance in an environment that does not come with a DNS server, I created one using dnsmasq at 10.220.3.251 and added the following records for my Avi controllers:

  • avi-01.tkg.local: 10.220.3.10
  • avi-02.tkg.local: 10.220.3.28
  • avi-03.tkg.local: 10.220.3.29
  • avi-cluster.tkg.local: 10.220.3.30

Prior to entering them, I validated that the records were correct by using nslookup on the appliance, located at 10.220.3.240:

[Screenshot: nslookup from the appliance resolving the Avi records against 10.220.3.251]

Before this PR, trying to validate these records by clicking the “Validate Name Resolution” button would yield an error:

[Screenshot: "Validate Name Resolution" failing with an error]

This is clearly incorrect given that I was able to resolve this record from the appliance itself! Upon digging into the source code, I saw that this uses an API endpoint that resolves each record with dig:

[Screenshot: source of the API endpoint resolving each record with dig]

Given the structure of this command, I assumed that the intent was to use the custom DNS server provided earlier to resolve each record. However, what this dig command actually does is query the system's nameservers for two names:

  • The DNS server (which won't resolve, because it is an IP address without a PTR record in a reverse lookup zone), and
  • The FQDN for each Avi node and the FQDN for the cluster VIP.

Since this appliance runs Photon and uses systemd-resolved for DNS, the system's nameservers are listed in /etc/resolv.conf and managed by resolvectl and /etc/systemd/resolved.conf. When I set up this appliance, I only configured it to use the DNS servers for H2O, which are:

  • 10.79.2.5, and
  • 10.79.2.6

Which you can see here:

[Screenshot: resolvectl output showing only the H2O nameservers]

Because the custom DNS server I provided earlier isn't among them, this validation test will always fail. Moreover, the setup instructions for this appliance do not tell users to add their custom DNS servers to the appliance's nameservers (nor would I expect them to).

The third commit in this PR fixes this by putting an @ sign before the DNS server's IP, which tells dig to send its queries to that nameserver directly.
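To make the difference concrete, here is a sketch in the same subprocess style as the patched function (the nameserver and FQDN are the example values above):

```python
import subprocess

# Without "@", dig treats 10.220.3.251 as a second name to look up and
# sends every query to the *system* resolvers from /etc/resolv.conf.
subprocess.run(["dig", "10.220.3.251", "avi-01.tkg.local", "+short"])

# With "@", dig sends the query for the FQDN directly to 10.220.3.251.
subprocess.run(["dig", "@10.220.3.251", "avi-01.tkg.local", "+short"])
```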

Resolving Avi Records

After I was able to get the SIVT frontend to validate my Avi FQDNs, the next problem I faced was getting Arcas to connect to the Avi nodes using these custom records. After provisioning the appliances and turning them on, Arcas connects to the Avi REST endpoints using the FQDNs provided during setup. It does this using Python's requests library. This happens in several places; one example is shown below:

[Screenshot: example of Arcas calling the Avi REST API with requests]

The requests library also uses the system resolvers by default, which is a problem if you've provided a set of custom nameservers to Arcas. Fortunately, Python allows you to override this behavior by monkey-patching the `connection.create_connection` method and providing your own resolver, which requests then uses to establish its underlying TCP connections before initiating HTTP sessions. This commit in this pull request implements that.
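As a rough sketch of the technique (not the PR's exact code; it assumes dnspython and patches urllib3's `create_connection`, which requests uses under the hood, and a nameserver collected from the wizard):

```python
import dns.resolver
import urllib3.util.connection as urllib3_connection

CUSTOM_NAMESERVERS = ["10.220.3.251"]  # assumption: collected from the wizard

_orig_create_connection = urllib3_connection.create_connection

def _patched_create_connection(address, *args, **kwargs):
    # Resolve the hostname against the custom nameserver, then hand the
    # resulting IP to the stock implementation so the TCP socket that
    # requests opens never consults the system resolver.
    host, port = address
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = CUSTOM_NAMESERVERS
    ip = resolver.resolve(host, "A")[0].to_text()
    return _orig_create_connection((ip, port), *args, **kwargs)

urllib3_connection.create_connection = _patched_create_connection
```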

Unfortunately, once I fixed that, I ran into an issue verifying Avi's self-signed certificates for a similar reason: Python's built-in `ssl` library also uses the system resolvers. The same commit overrides that by providing an IP to the `ssl.get_server_certificate` method whenever custom nameservers are provided.
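A sketch of that workaround, with a hypothetical helper standing in for the resolver logic above:

```python
import ssl

import dns.resolver

def resolve_with_custom_dns(host, nameservers=("10.220.3.251",)):
    # Hypothetical helper: ask only the custom nameserver for the A record.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = list(nameservers)
    return resolver.resolve(host, "A")[0].to_text()

# Fetching the certificate by IP keeps the ssl module away from the
# system resolver entirely.
ip = resolve_with_custom_dns("avi-cluster.tkg.local")
cert_pem = ssl.get_server_certificate((ip, 443))
```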

Kind and TKG

Once I resolved these issues, I was finally able to get Arcas to provision a management cluster with `tanzu management-cluster create`. Unfortunately, I ran into two additional issues at this point:

  • The Avi Kubernetes Operator (AKO) couldn't resolve the Avi records in my custom nameservers, and
  • The kubelets in the management cluster control plane were not configured to use the custom nameservers either.

The first issue was tricky to pin down, but I found that the CoreDNS service in the bootstrap cluster created by the Tanzu CLI is configured to forward queries to the nameservers in /etc/resolv.conf, which Docker populates from the system's /etc/resolv.conf by default:

[Screenshot: CoreDNS configuration forwarding queries to /etc/resolv.conf]

The Cluster object for Kind doesn’t expose a way of specifying custom nameservers, and even if it did, tanzu-framework doesn’t leverage that anyway. Consequently, the only way to forward custom nameservers to Kind is by modifying /etc/resolv.conf. However, because (a) the user is not instructed to do this in SIVT’s documentation, and (b) it would be a poor user experience if they were, Arcas should be able to handle modifying /etc/resolv.conf transparently before the Tanzu CLI starts up. This commit implements this functionality.
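A sketch of the idea (assuming the nameserver list comes from the wizard; the actual commit may differ):

```python
from pathlib import Path

def prepend_custom_nameservers(nameservers, resolv_conf="/etc/resolv.conf"):
    # Prepend the custom nameservers before `tanzu management-cluster
    # create` runs, so the Kind bootstrap cluster's CoreDNS inherits
    # them through the container's /etc/resolv.conf.
    path = Path(resolv_conf)
    original = path.read_text()
    entries = "\n".join(f"nameserver {ns}" for ns in nameservers)
    path.write_text(f"{entries}\n{original}")

prepend_custom_nameservers(["10.220.3.251"])
```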

Finally, the configuration file that Arcas generates for the Tanzu CLI does not pass in the custom nameservers that were provided earlier. Consequently, the only way these nodes would learn about the custom nameservers is if they are defined in DHCP option 6 for the subnets they are connected to. The aforementioned commit (and others that follow it) fixes this by:

  • Enabling the `features.management-cluster.custom-nameservers` and `features.clusters.custom-nameservers` feature flags in the Tanzu CLI, and
  • Modifying the configuration file Jinja2 templates to include the custom nameservers whenever any are provided (a sketch of the idea follows).
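A sketch of what the template change could look like (a guess at the shape, not the actual template; `CONTROL_PLANE_NODE_NAMESERVERS` and `WORKER_NODE_NAMESERVERS` are assumed to be the cluster-config variables those feature flags gate):

```python
from jinja2 import Template

# Assumed shape of the fragment added to the Tanzu CLI config template.
fragment = Template(
    "{% if custom_nameservers %}"
    "CONTROL_PLANE_NODE_NAMESERVERS: {{ custom_nameservers | join(',') }}\n"
    "WORKER_NODE_NAMESERVERS: {{ custom_nameservers | join(',') }}\n"
    "{% endif %}"
)
print(fragment.render(custom_nameservers=["10.220.3.251"]))
```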

@sunshetty force-pushed the main branch 3 times, most recently from 7bdea37 to 528dca6 on July 22, 2022 12:51
@sunshetty (Contributor)

@carlosonunez-vmw Can you update the PR and raise it against the newly released version 1.3-1.5.4?

@carlosonunez commented Aug 3, 2022 via email:

Sure, but it will have to wait a month as I'll be out of office for a few weeks.

> On Aug 2, 2022, at 11:44 PM, sunshetty wrote: @carlosonunez-vmw Can you update the PR and raise it against the newly released version 1.3-1.5.4?

@sunshetty (Contributor)

That should be fine Carlos.
