[BUG] TserverUI service gets mistakenly deleted if operator reads stale cluster.Spec.Tserver.TserverUIPort #36

Open
srteam2020 opened this issue Sep 1, 2021 · 0 comments


Describe the bug

After restarting from a crash and connecting to a stale apiserver, the operator can mistakenly delete the tserverUI service if it reads a stale value of cluster.Spec.Tserver.TserverUIPort.

Consider the following situation. There are two apiservers, apiserver1 and apiserver2, and the operator initially communicates with apiserver1. The field cluster.Spec.Tserver.TserverUIPort is initially set to -1, so no tserverUI service is running. The user then changes cluster.Spec.Tserver.TserverUIPort to a valid port number, and the operator creates the tserverUI service accordingly. After the service is created, the operator crashes, restarts, and starts communicating with apiserver2. apiserver2 is stale and still holds cluster.Spec.Tserver.TserverUIPort as -1 at that moment. The operator cannot tell that the data is stale, so it deletes the tserverUI service.

To reproduce

  1. Create a YBCluster with cluster.Spec.Tserver.TserverUIPort set to -1.
  2. Change cluster.Spec.Tserver.TserverUIPort to 7000. The operator will create the tserverUI service. Meanwhile, apiserver2 is lagging and still holds cluster.Spec.Tserver.TserverUIPort as -1.
  3. The operator crashes, restarts, and communicates with apiserver2. It then reconciles and deletes the tserverUI service, since cluster.Spec.Tserver.TserverUIPort is -1 on apiserver2.
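
The three steps above can be modelled with a small, self-contained Go sketch. It is a toy model only: `apiServerView`, `liveState`, `reconcile`, and the service name `tserver-ui` are hypothetical stand-ins, not the operator's actual types or code. The sketch assumes the reconcile logic treats a port of -1 as "no tserverUI service should exist", as described in this report.

```go
package main

import "fmt"

// apiServerView is a toy stand-in for what one apiserver reports for
// cluster.Spec.Tserver.TserverUIPort. It is not a real Kubernetes type.
type apiServerView struct {
	tserverUIPort int
}

// liveState is a toy stand-in for the objects that actually exist in the cluster.
type liveState struct {
	services map[string]bool
}

const uiService = "tserver-ui" // hypothetical service name

// reconcile trusts whatever spec the given apiserver view returns:
// a port of -1 means the tserverUI service is removed, otherwise it is created.
func reconcile(view apiServerView, live *liveState) {
	if view.tserverUIPort == -1 {
		delete(live.services, uiService)
		return
	}
	live.services[uiService] = true
}

// runScenario replays the reproduction steps and reports whether the
// tserverUI service still exists at the end.
func runScenario() bool {
	live := &liveState{services: map[string]bool{}}
	apiserver1 := apiServerView{tserverUIPort: -1}
	apiserver2 := apiServerView{tserverUIPort: -1}

	// Step 1: the cluster is created with the port set to -1.
	reconcile(apiserver1, live)

	// Step 2: the user sets the port to 7000; only apiserver1 sees the update.
	apiserver1.tserverUIPort = 7000
	reconcile(apiserver1, live)

	// Step 3: the operator restarts and reconciles against the stale apiserver2.
	reconcile(apiserver2, live)

	return live.services[uiService]
}

func main() {
	fmt.Println("tserverUI service exists after restart:", runScenario())
}
```

In this model the final reconcile removes the service even though the latest spec asks for port 7000, which is the behavior reported here.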

Additional information

This bug is similar to #35. We removed the min value constraint for TserverUIPort in the CRD and found this problem.

Fix

We are willing to send a PR to fix this problem.
A potential fix is to set a UID precondition when deleting the service.
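
As a rough illustration of the idea, the sketch below models a delete guarded by a UID precondition. It is one possible reading of the proposed fix, not the operator's code: `store`, `service`, `deleteWithUID`, `reconcileDelete`, and the UID strings are all hypothetical. The assumption is that the operator only deletes a service it has actually observed in its own (possibly stale) view, and passes that observed UID along so the delete is rejected when the live object is a different one.

```go
package main

import (
	"errors"
	"fmt"
)

// service and store are toy stand-ins, not Kubernetes types.
type service struct{ uid string }

type store struct{ services map[string]service }

const uiService = "tserver-ui" // hypothetical service name

var errConflict = errors.New("precondition failed: UID mismatch")

// deleteWithUID removes the named service only if its UID matches the
// UID the caller observed; otherwise the live object is left untouched.
func (s *store) deleteWithUID(name, uid string) error {
	cur, ok := s.services[name]
	if !ok {
		return nil // already gone
	}
	if cur.uid != uid {
		return errConflict
	}
	delete(s.services, name)
	return nil
}

// reconcileDelete deletes a service only if it appears in the operator's
// own view, and forwards the observed UID as the precondition.
func reconcileDelete(view map[string]service, live *store, name string) error {
	observed, ok := view[name]
	if !ok {
		return nil // never observed in this view, so nothing to delete
	}
	return live.deleteWithUID(name, observed.uid)
}

func main() {
	live := &store{services: map[string]service{uiService: {uid: "uid-2"}}}

	// Stale view that predates the service: no delete is issued.
	fmt.Println(reconcileDelete(map[string]service{}, live, uiService), len(live.services))

	// Stale view holding an older incarnation: the delete is rejected.
	stale := map[string]service{uiService: {uid: "uid-1"}}
	fmt.Println(reconcileDelete(stale, live, uiService), len(live.services))
}
```

In both cases the live service survives. A view that holds the current UID would still be able to delete it, so a legitimate change of the port back to -1 keeps working in this model.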
