[BUG] WaitForUpdatesEx doesn't return error in response #2724

Closed
vrevelas opened this issue Jan 31, 2022 · 5 comments

@vrevelas

vrevelas commented Jan 31, 2022

Describe the bug
I was using the in-tree vSphere cloud provider in Rancher (v2.6.2) Kubernetes (v1.20.11-rancher1-2) with vSphere 6.7.0.50000 to provision persistent volumes. When I created a new PersistentVolumeClaim, the claim stayed stuck in a 'pending' state. I followed the troubleshooting guide and increased the log level of kube-controller-manager, but the logs still showed no error.

Desperate, I set up an instance of mitmproxy in reverse proxy mode to inspect traffic between Kubernetes and vSphere. This showed that Kubernetes was creating a disk, then using WaitForUpdatesEx requests to wait until the creation was complete. However, the first WaitForUpdatesEx request returned the following response:

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Body>
    <WaitForUpdatesExResponse xmlns="urn:vim25">
      <returnval>
        <version>1</version>
        <filterSet>
          <filter type="PropertyFilter">session[526df172-e9ac-a23e-abbc-a5bac1ea98b3]524eab32-9d88-4dd4-8890-fdccfa73a7be</filter>
          <objectSet>
            <kind>enter</kind>
            <obj type="Task">task-5797302</obj>
            <missingSet>
              <path>info</path>
              <fault>
                <fault xsi:type="NoPermission">
                  <object type="Folder">name-redacted</object>
                  <privilegeId>System.Read</privilegeId>
                </fault>
                <localizedMessage></localizedMessage>
              </fault>
            </missingSet>
          </objectSet>
        </filterSet>
      </returnval>
    </WaitForUpdatesExResponse>
  </soapenv:Body>
</soapenv:Envelope>

This request was immediately followed by a second WaitForUpdatesEx request, as if the client hadn't understood that the response contained an error. The response from vSphere for the second request was "not found". The client then sent a third WaitForUpdatesEx request. The response this time was completely empty - an EOF.

When deleting the PVC, Kubernetes would display an "unexpected EOF" error in the events of the PVC. The real problem though was when creating the PVC, which just hung with no error at all.

I found a few other bug reports related to govmomi and "unexpected EOF" which may or may not be related to this issue: #2611, #1025, and jetbrains-infra/packer-builder-vsphere#87.

Taking a look at the code, it looks like WaitForUpdatesEx doesn't contain any error handling for the contents of the response body - it just checks for connectivity errors from the RoundTrip:

return resBody.Res, nil

I compiled a custom build of kube-controller-manager that vendors govmomi with an additional check for resBody.Fault_ != nil. That check also failed to pick up the NoPermission fault, so it looks like the parsing of the response body will need to be improved. At this point, the third party I am working with gave my account the System.Read privilege at the vCenter level (I previously had it only at the data center level), and I can no longer reproduce the issue.

To Reproduce
Steps to reproduce the behavior:

  1. Create a vSphere user without the System.Read privilege at the vCenter level, but with System.Read at the data center level.
  2. Use the in-tree Kubernetes cloud provider to provision a persistent volume.
  3. The PVC will remain stuck in a 'pending' state with no error message, even after following the instructions in the docs to increase the log level.

Expected behavior
An error message stating that permission was denied due to lack of System.Read on the name-redacted folder should have been displayed when using kubectl describe pvc pvcname, or when viewing the kube-controller-manager logs.

Affected version
This was experienced with govmomi v0.20.3. From reading the code, I believe the issue still exists in the most recent version, which at the time of writing is v0.27.2.

Screenshots/Debug Output
None, but I'm happy to supply any additional detail if required.

Additional context
NA

@dougm
Member

dougm commented Mar 1, 2022

Hi @vrevelas , there is an option you can set to propagate the NoPermission error: #1579

Maybe we ought to revisit that and consider propagating certain fault types by default.

@vrevelas
Author

vrevelas commented Mar 2, 2022

Hi @dougm, thanks for taking a look. You're right, this issue is effectively a duplicate of #1604.

This is a stack trace I pulled from Kubernetes when I hit the issue: https://gist.github.com/vrevelas/ac65eeb5837b320cbe0f4c20f43a94ab

Kubernetes calls into govmomi at govmomi/task/wait:WaitForResult on line 15. Correct me if I'm wrong, but I don't think there's any way to pass {PropagateMissing: true} from that part of the API?

@dougm
Member

dougm commented Mar 2, 2022

Ah, PR #1579 always sets it to true in task.Wait:

filter := &property.WaitFilter{PropagateMissing: true}

That fix was after v0.20.3, so bumping to 0.27.x should fix it.

% git tag --contains e373feb8e90894dfbe39871e72c05946b4cf848f
prerelease-v0.21.0-58-g8d28646
prerelease-v0.22.1-247-g770fcba2
v0.22.0
v0.22.1
v0.22.2
v0.23.0
v0.23.1
v0.24.0
v0.24.1
v0.24.2
v0.25.0
v0.26.0
v0.26.1
v0.27.0
v0.27.1
v0.27.2
v0.27.3
v0.27.4
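
To illustrate what propagating the missing set changes for a waiter, here is a toy model in plain Go. Every type and function in it (change, missing, objectUpdate, deliver, taskDone) is a hypothetical stand-in written for this sketch, not govmomi's implementation; it only assumes that, with the option enabled, a missingSet entry is handed to the callback as a change whose value is the fault.

```go
package main

import (
	"errors"
	"fmt"
)

// All types below are hypothetical stand-ins, not govmomi's types.
type change struct {
	Name string
	Val  interface{}
}

type missing struct {
	Path  string
	Fault error
}

type objectUpdate struct {
	Changes []change
	Missing []missing
}

// deliver models one pass of a wait loop. When propagateMissing is set,
// each missingSet entry is appended as a change carrying the fault, so the
// callback gets a chance to see it.
func deliver(u objectUpdate, propagateMissing bool, onChange func([]change) (bool, error)) (bool, error) {
	changes := append([]change(nil), u.Changes...)
	if propagateMissing {
		for _, m := range u.Missing {
			changes = append(changes, change{Name: m.Path, Val: m.Fault})
		}
	}
	return onChange(changes)
}

// taskDone is a toy callback that waits for the "info" property.
func taskDone(changes []change) (bool, error) {
	for _, c := range changes {
		if c.Name != "info" {
			continue
		}
		if err, ok := c.Val.(error); ok {
			return true, err // the fault surfaces as an error
		}
		return true, nil
	}
	return false, nil // nothing for "info" yet: the caller would wait again
}

var errNoPermission = errors.New("NoPermission: System.Read")

func main() {
	u := objectUpdate{Missing: []missing{{Path: "info", Fault: errNoPermission}}}

	done, err := deliver(u, false, taskDone)
	fmt.Println(done, err) // false <nil>

	done, err = deliver(u, true, taskDone)
	fmt.Println(done, err) // true NoPermission: System.Read
}
```

In this model, the first call corresponds to the behavior reported with v0.20.3: the waiter sees no change for info and simply waits again. The second call corresponds to the behavior with PropagateMissing set, where the fault reaches the caller as an error.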

@vrevelas
Author

vrevelas commented Mar 3, 2022

Ah perfect!

filter := &property.WaitFilter{PropagateMissing: true}
is where I got the {PropagateMissing: true} I used in my previous comment, but it didn't occur to me to check if the call stack included it 😄 I've raised the Kubernetes issue linked above to request they upgrade their version of govmomi. Many thanks for your help!
