Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deploy webhook by yaml #2346

Merged
merged 1 commit into from Aug 12, 2022

Conversation

hwdef
Copy link
Member

@hwdef hwdef commented Jul 11, 2022

ref: #2333

steps:

  • Remove the code that creates the webhook by go
  • Create yaml files for deploying the webhook, set the certificate field blank.
  • Supplementary certificate fields via volcano-admission

a little problem:
There will be a delay in supplementing the certificate fields through volcano-admission. During this period, volcano-scheduler will restart because it cannot connect to the webhook.

I don't know if this is a serious problem, please give me some advice.
@Thor-wl @william-wang @shinytang6 @qiankunli

@volcano-sh-bot volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 11, 2022
@Thor-wl Thor-wl requested review from william-wang, Thor-wl, qiankunli and shinytang6 and removed request for merryzhou and hzxuzhonghu July 11, 2022 08:43
v1 "k8s.io/api/admissionregistration/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"time"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort the packages regularlly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func addCaCertForWebhook(kubeClient *kubernetes.Clientset, caBundle []byte) error {
for _, mutatingWebhookName := range mutatingWebhooksName {
var mutatingWebhook *v1.MutatingWebhookConfiguration
if err := wait.PollInfinite(time.Second, func() (done bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to add timeout and exit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long should the timeout be set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shinytang6
I added a 5 minutes timeout.

klog.V(3).Infof("Registered mutating webhook for path <%s>.", service.Path)
for index := 0; index < len(mutatingWebhook.Webhooks); index++ {
if mutatingWebhook.Webhooks[index].ClientConfig.CABundle == nil ||
bytes.Equal(mutatingWebhook.Webhooks[index].ClientConfig.CABundle, caBundle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!bytes.Equal ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I fixed it.

service.ValidatingConfig.Webhooks[i].AdmissionReviewVersions = reviewVersions
service.ValidatingConfig.Webhooks[i].ClientConfig = clientConfig
service.ValidatingConfig.Webhooks[i].NamespaceSelector = webhookLabelSelector
if _, err := kubeClient.AdmissionregistrationV1().MutatingWebhookConfigurations().Update(context.TODO(), mutatingWebhook, metav1.UpdateOptions{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to compare the updated webhook with original mutatingWebhook here, if no change, we don't need to call Update func

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added a flag to record if the webhook has been changed, and if so. then call the update function. reflection is not used, which may affect performance

@shinytang6
Copy link
Member

shinytang6 commented Jul 21, 2022

During this period, volcano-scheduler will restart because it cannot connect to the webhook.

Will the scheduler connect to the webhook? l am a little confused here

@hwdef
Copy link
Member Author

hwdef commented Jul 22, 2022

During this period, volcano-scheduler will restart because it cannot connect to the webhook.

Will the scheduler connect to the webhook? l am a little confused here

Yes, I got some error log

$ kubectl logs -n volcano-system       volcano-scheduler-5499bd684f-v6c8g  -p 
2022/07/22 06:50:51 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
W0722 06:50:51.150374       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
panic: failed init default queue, with err: Internal error occurred: failed calling webhook "mutatequeue.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.volcano-system.svc:443/queues/mutate?timeout=10s": dial tcp 10.96.139.202:443: connect: connection refused

goroutine 1 [running]:
volcano.sh/volcano/pkg/scheduler/cache.newSchedulerCache(0x7ffcc03f4b15?, {0x1c1c30d, 0x7}, {0x1c1b6db, 0x7}, {0x0, 0x0, 0x0?})
        /home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:402 +0x20fc
volcano.sh/volcano/pkg/scheduler/cache.New(...)
        /home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:87
volcano.sh/volcano/pkg/scheduler.NewScheduler(0x0?, {0x1c1c30d?, 0x7?}, {0x7ffcc03f4b15, 0x29}, 0x3b9aca00, {0x1c1b6db?, 0x7?}, {0x0, 0x0, ...})
        /home/edward/code/go/volcano/pkg/scheduler/scheduler.go:75 +0xf0
volcano.sh/volcano/cmd/scheduler/app.Run(0xc0001fc2d0)
        /home/edward/code/go/volcano/cmd/scheduler/app/server.go:76 +0x1bf
main.main()
        /home/edward/code/go/volcano/cmd/scheduler/main.go:65 +0x190

@william-wang
Copy link
Member

a little problem: There will be a delay in supplementing the certificate fields through volcano-admission. During this period, volcano-scheduler will restart because it cannot connect to the webhook.

I don't know if this is a serious problem, please give me some advice. @Thor-wl @william-wang @shinytang6 @qiankunli

@hwdef Yes, it is a serious problem. Woud you elaborate more on "volcano-scheduler restart because it can not connect to the webhook". scheduler has no direct connection with webhook.

@hwdef
Copy link
Member Author

hwdef commented Jul 28, 2022

a little problem: There will be a delay in supplementing the certificate fields through volcano-admission. During this period, volcano-scheduler will restart because it cannot connect to the webhook.
I don't know if this is a serious problem, please give me some advice. @Thor-wl @william-wang @shinytang6 @qiankunli

@hwdef Yes, it is a serious problem. Woud you elaborate more on "volcano-scheduler restart because it can not connect to the webhook". scheduler has no direct connection with webhook.

After discussing it with @shinytang6 , we have roughly located the problem. I will try to solve it later.

@hwdef hwdef force-pushed the delete-webhook-gen-by-go branch 2 times, most recently from 6c578f3 to 47e1ca6 Compare August 2, 2022 03:43
@hwdef
Copy link
Member Author

hwdef commented Aug 2, 2022

The error mentioned above is caused by the failure to call the webhook when creating the initial queue. For this I use the wait package to retry the creation of the initial queue at a frequency of once per second. Until it is created successfully.

In addition, I also set a timeout of 5 minutes, and when a fatal problem is encountered, the program can be exited by panic.
Not sure if such a solution would work.
@william-wang @shinytang6 What do you think?

@hwdef
Copy link
Member Author

hwdef commented Aug 4, 2022

@shinytang6
I use retry package to create init queue, please review it again.
Thanks!

@hwdef hwdef force-pushed the delete-webhook-gen-by-go branch 2 times, most recently from fc88bca to 68af3dc Compare August 5, 2022 07:23
Copy link
Member

@shinytang6 shinytang6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another questions: due to we apply all the webhook configurations now, will enabled-admission arguments still work as expected?


v1 "k8s.io/api/admissionregistration/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion. I deleted this.

@hwdef
Copy link
Member Author

hwdef commented Aug 5, 2022

Another questions: due to we apply all the webhook configurations now, will enabled-admission arguments still work as expected?

After checking the source of this parameter and related code, I think that deploying webhook and this parameter are two separate functions that have no effect on each other

@hwdef hwdef force-pushed the delete-webhook-gen-by-go branch 4 times, most recently from dd804fb to 1a86782 Compare August 8, 2022 05:55
@@ -79,7 +79,7 @@ func (c *Config) AddFlags(fs *pflag.FlagSet) {
fs.StringVar(&c.WebhookNamespace, "webhook-namespace", "", "The namespace of this webhook")
fs.StringVar(&c.WebhookName, "webhook-service-name", "", "The name of this webhook")
fs.StringVar(&c.WebhookURL, "webhook-url", "", "The url of this webhook")
fs.StringVar(&c.EnabledAdmission, "enabled-admission", defaultEnabledAdmission, "enabled admission webhooks")
fs.StringVar(&c.EnabledAdmission, "enabled-admission", defaultEnabledAdmission, "enabled admission webhooks,if this parameter is modified, please modify the deployed webhook as well. Make sure the same webhook is deployed and enabled.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabled admission webhooks, if this parameter is modified, make sure corresponding webhook configurations are the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed this.

Signed-off-by: hwdef <hwdefcom@outlook.com>
@hwdef
Copy link
Member Author

hwdef commented Aug 12, 2022

@shinytang6 @william-wang @Thor-wl
I think this PR is ready to review. Please take a look.

@shinytang6
Copy link
Member

Generally LGTM.

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2022
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Thor-wl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants