Update Kubernetes and Client-Go for 1.11.0 / 8.0.0 #634

Merged · 3 commits · Jul 17, 2018

Conversation

@marpaia (Contributor) commented on Jun 29, 2018

This PR contains my run at updating the revs for the Kubernetes dependencies. I had to do some refactoring to use the new dynamic client interface, which is the most noteworthy change to the Ark code in my opinion.
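For context, here is a minimal sketch of the client-go 8.0.0 dynamic client interface this PR moves to (the package and listPods helper below are illustrative, not Ark code): Resource() now takes a schema.GroupVersionResource, and namespace scoping is a separate chained Namespace() call.

package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// listPods lists the pods in a namespace via the new dynamic client.
func listPods(config *rest.Config, namespace string) (*unstructured.UnstructuredList, error) {
	// Build a dynamic.Interface once from the rest.Config.
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		return nil, err
	}

	pods := schema.GroupVersionResource{Version: "v1", Resource: "pods"}

	// Resource() returns a cluster-scoped client; Namespace() scopes it.
	return client.Resource(pods).Namespace(namespace).List(metav1.ListOptions{})
}

The old interface instead took the namespace as an argument to Resource(), as the pkg/client/dynamic.go diff later in this conversation shows.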

@@ -27,7 +27,7 @@ type PodVolumeBackupSpec struct {
Node string `json:"node"`

// Pod is a reference to the pod containing the volume to be backed up.
-	Pod corev1api.ObjectReference `json:"pod"`
+	Pod *corev1api.ObjectReference `json:"pod"`
Contributor:

I don't think this is required. We ran into this problem when we were updating the generated code when we didn't have k8s.io/api available on the GOPATH. How did you regenerate things? The kubernetes-1.11.x tags for k8s.io/code-generator now allow us to run the generate groups shell script from anywhere (i.e. within ark), so we can remove the k8s.io/api bind mount (see https://github.com/heptio/ark/blob/32907931e13f38ef4e055652245eeb78d20ac76e/Makefile#L101-L102).

@marpaia (Contributor, Author):

Yeah! So, hack/update-generated-crd-code.sh calls $GOPATH/src/k8s.io/code-generator/generate-groups.sh, which requires k8s.io/apimachinery to be on the GOPATH as well. This is described in kubernetes/code-generator#21.

I have some other projects that use code generation and I add a required stanza to my Gopkg.toml for the k8s.io/code-generator repo so that we're not relying on out-of-tree code. I tried to set this up for Ark but that would be complicated for y'all because you prune non-go files, which prunes out the scripts.

When I did run the generator, however, both code-generator and apimachinery were checked out to the release-1.11 branch.

@marpaia (Contributor, Author) commented on Jun 30, 2018

I had to go to dinner last night before I could get this to work, but I will definitely circle back soon.

@ncdc (Contributor) commented on Jun 30, 2018 via email

@ncdc (Contributor) commented on Jul 2, 2018

@marpaia thanks for doing this! We're working to close out 0.9.0, then we'll pick up reviewing this. When you get a chance (no rush!), please split the changes to Gopkg.* and vendor into a separate commit from the changes to the Ark code. Thanks!

Signed-off-by: Mike Arpaia <mike@arpaia.co>
@skriss (Contributor) left a comment:

Just one minor comment, otherwise LGTM.


# vendor/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go:300:25:
# cannot call non-function spt.Token (type adal.Token)
[[override]]
Contributor:

we already have a [[constraint]] for this package under Cloud provider packages - can you just update the constraint to specify this revision rather than adding an override?

@marpaia (Contributor, Author):

I removed the constraint below. I believe this needs to be an override instead of a constraint because we need to override the version used in k8s.io/client-go.

@skriss (Contributor) commented on Jul 11, 2018

We may also be able to simplify pkg/client/dynamic.go now - IIRC we created a factory/client wrapper in order to be able to fake for unit testing, which may be easier with the updated client. That can be addressed separately, though - I'll add an issue for it.

@ncdc (Contributor) commented on Jul 11, 2018

We created it because the upstream client had some developer UX issues. The new client is much better and we shouldn't have to provide a helper any more, hopefully.
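As an aside, the factory/client wrapper being discussed looks roughly like the sketch below. The names here are hypothetical and only loosely mirror pkg/client/dynamic.go; the point is a narrow interface over the dynamic client so unit tests can substitute an in-memory fake instead of a real API server.

package client

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/dynamic"
)

// Dynamic is the narrow surface the backup code needs; tests can implement
// it with a hand-rolled fake (names are illustrative, not Ark's).
type Dynamic interface {
	Get(name string, opts metav1.GetOptions) (*unstructured.Unstructured, error)
	List(opts metav1.ListOptions) (*unstructured.UnstructuredList, error)
}

// resourceClientWrapper adapts an already-namespaced dynamic.ResourceInterface
// (e.g. dynClient.Resource(gvr).Namespace(ns)) to the Dynamic interface.
type resourceClientWrapper struct {
	resourceClient dynamic.ResourceInterface
}

func (w *resourceClientWrapper) Get(name string, opts metav1.GetOptions) (*unstructured.Unstructured, error) {
	return w.resourceClient.Get(name, opts)
}

func (w *resourceClientWrapper) List(opts metav1.ListOptions) (*unstructured.UnstructuredList, error) {
	return w.resourceClient.List(opts)
}

With the 1.11 client, that indirection may no longer be necessary, which is the simplification discussed above.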

@skriss requested a review from @nrb on July 12, 2018 at 16:19
@skriss (Contributor) commented on Jul 12, 2018

@nrb can you take a look at this too? I'm fine merging as-is and can follow up to address my comments.

@skriss (Contributor) commented on Jul 12, 2018

Thanks @marpaia. Could you add a DCO sign-off to that last commit, or squash it into a previous one?

@marpaia (Contributor, Author) commented on Jul 12, 2018

Whoops, sorry about that :)

@nrb (Contributor) commented on Jul 13, 2018

Testing this branch out, I'm seeing backup failures with a simple example that I don't see with master.

I run ark backup create nginx --include-namespaces nginx-example.

I see this in the server logs:

INFO[0096] Backup completed                              backup=heptio-ark/nginx logSource="pkg/controller/backup_controller.go:404"
ERRO[0097] backup failed                                 error="the server could not find the requested resource" key=heptio-ark/nginx logSource="pkg/controller/backup_controller.go:280"

End of backup logs:
ark backup logs nginx

time="2018-07-13T11:30:40-04:00" level=info msg="Backup completed with errors: the server could not find the requested resource" backup=heptio-ark/nginx logSource="pkg/backup/backup.go:302"

I'll do some more debugging on this, since Steve didn't see this behavior.

@skriss (Contributor) commented on Jul 13, 2018

@nrb what version is your cluster? Do you not get this error when running v0.9.0 in the same cluster?

@nrb (Contributor) commented on Jul 13, 2018

Kube versions:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T11:52:23Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.3", GitCommit:"9b5b719c5f295c99de68ffb5b63101b0e0175376", GitTreeState:"clean", BuildDate:"2018-05-31T18:32:23Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Creating a backup with a pod running v0.9.0:

pod log:

time="2018-07-13T15:50:54Z" level=info msg="Starting backup" backup=heptio-ark/nginx logSource="pkg/controller/backup_controller.go:339"
time="2018-07-13T15:50:57Z" level=info msg="Backup completed" backup=heptio-ark/nginx logSource="pkg/controller/backup_controller.go:401"

backup log:

time="2018-07-13T15:50:56Z" level=info msg="Backup completed successfully" backup=heptio-ark/nginx logSource="pkg/backup/backup.go:300"

I'm going to try with a clean cluster and this branch on top of master.

@nrb (Contributor) commented on Jul 13, 2018

Narrowing this down some, it looks like the nginx pod isn't getting retrieved.

Running ark backup create all on a fresh cluster, I get the following in my backup logs:

x1c in /home/nrb/go/src/github.com/heptio/ark/docs (git) marpaia-k8s-1.11 U
% ark backup logs all | grep error
time="2018-07-13T12:30:12-04:00" level=error msg="Error executing item actions" backup=heptio-ark/all error="the server could not find the requested resource" group=v1 groupResource=pods logSource="pkg/backup/item_backupper.go:208" name=nginx-deployment-99997d74d-2xz8k namespace=nginx-example
time="2018-07-13T12:30:59-04:00" level=info msg="Backup completed with errors: the server could not find the requested resource" backup=heptio-ark/all logSource="pkg/backup/backup.go:307"

Printing some debug info, I see

Error in resourceBackupper.backupResource with the itemBackupper.backupItem call
err = the server could not find the requested resource
unstructured = &{Object:map[kind:Pod apiVersion:v1 metadata:map[generateName:nginx-deployment-99997d74d- selfLink:/api/v1/namespaces/nginx-example/pods/nginx-deployment-99997d74d-2xz8k resourceVersion:679 creationTimestamp:2018-07-13T16:25:21Z labels:map[app:nginx pod-template-hash:555538308] ownerReferences:[map[kind:ReplicaSet name:nginx-deployment-99997d74d uid:56ce88c3-86b9-11e8-8ba2-42010a9600bf controller:true blockOwnerDeletion:true apiVersion:extensions/v1beta1]] name:nginx-deployment-99997d74d-2xz8k namespace:nginx-example uid:56d0fd4a-86b9-11e8-8ba2-42010a9600bf] spec:map[serviceAccountName:default serviceAccount:default securityContext:map[] volumes:[map[name:nginx-logs persistentVolumeClaim:map[claimName:nginx-logs]] map[name:default-token-9w69v secret:map[defaultMode:420 secretName:default-token-9w69v]]] containers:[map[terminationMessagePolicy:File imagePullPolicy:IfNotPresent name:nginx image:nginx:1.7.9 ports:[map[containerPort:80 protocol:TCP]] resources:map[] volumeMounts:[map[name:nginx-logs mountPath:/var/log/nginx] map[name:default-token-9w69v readOnly:true mountPath:/var/run/secrets/kubernetes.io/serviceaccount]] terminationMessagePath:/dev/termination-log]] dnsPolicy:ClusterFirst schedulerName:default-scheduler tolerations:[map[effect:NoExecute tolerationSeconds:300 key:node.kubernetes.io/not-ready operator:Exists] map[tolerationSeconds:300 key:node.kubernetes.io/unreachable operator:Exists effect:NoExecute]] restartPolicy:Always terminationGracePeriodSeconds:30 nodeName:gke-cluster-1-default-pool-4655bde0-17mn] status:map[containerStatuses:[map[image:nginx:1.7.9 imageID:docker-pullable://nginx@sha256:e3456c851a152494c3e4ff5fcc26f240206abac0c9d794affb40e0714846c451 containerID:docker://d41feea34000e758736929861085173ca2134406575410431583807797bd7e51 name:nginx state:map[running:map[startedAt:2018-07-13T16:25:52Z]] lastState:map[] ready:true restartCount:0]] qosClass:BestEffort phase:Running conditions:[map[type:Initialized status:True lastProbeTime:<nil> lastTransitionTime:2018-07-13T16:25:28Z] map[type:Ready status:True lastProbeTime:<nil> lastTransitionTime:2018-07-13T16:25:52Z] map[type:PodScheduled status:True lastProbeTime:<nil> lastTransitionTime:2018-07-13T16:25:28Z]] hostIP:10.150.0.3 podIP:10.28.2.7 startTime:2018-07-13T16:25:28Z]]}

And

Found an error in kbbackupper.Backup with the gb.backupGroup call
err = the server could not find the requested resource
group = &APIResourceList{GroupVersion:v1,APIResources:[{pods  true   Pod [create delete deletecollection get list patch proxy update watch] [po] [all]} {persistentvolumeclaims  true   PersistentVolumeClaim [create delete deletecollection get list patch update watch] [pvc] []} {persistentvolumes  false   PersistentVolume [create delete deletecollection get list patch update watch] [pv] []} {endpoints  true   Endpoints [create delete deletecollection get list patch update watch] [ep] []} {configmaps  true   ConfigMap [create delete deletecollection get list patch update watch] [cm] []} {nodes  false   Node [create delete deletecollection get list patch proxy update watch] [no] []} {resourcequotas  true   ResourceQuota [create delete deletecollection get list patch update watch] [quota] []} {secrets  true   Secret [create delete deletecollection get list patch update watch] [] []} {events  true   Event [create delete deletecollection get list patch update watch] [ev] []} {podtemplates  true   PodTemplate [create delete deletecollection get list patch update watch] [] []} {limitranges  true   LimitRange [create delete deletecollection get list patch update watch] [limits] []} {namespaces  false   Namespace [create delete get list patch update watch] [ns] []} {services  true   Service [create delete get list patch proxy update watch] [svc] [all]} {serviceaccounts  true   ServiceAccount [create delete deletecollection get list patch update watch] [sa] []} {replicationcontrollers  true   ReplicationController [create delete deletecollection get list patch update watch] [rc] [all]}],}

@skriss (Contributor) commented on Jul 13, 2018

Hmm, okay, I'll see if I can reproduce this and dig into it some more.

return &dynamicResourceClient{
-	resourceClient: dynamicClient.Resource(&resource, namespace),
+	resourceClient: f.dynamicClient.Resource(gv.WithResource(resource.Name)),
Contributor:

Aha! Good catch @nrb. This line needs to be:

f.dynamicClient.Resource(gv.WithResource(resource.Name)).Namespace(namespace),
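For illustration, a sketch of the corrected construction in context; the factory type and method names below are assumptions loosely based on the diff, not Ark's exact code.

package client

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

type dynamicFactory struct {
	dynamicClient dynamic.Interface
}

// clientForResource returns a client scoped to one resource in one namespace.
func (f *dynamicFactory) clientForResource(gv schema.GroupVersion, resource metav1.APIResource, namespace string) dynamic.ResourceInterface {
	// Chaining Namespace(namespace) is the fix: without it the client issues
	// cluster-scoped requests, and namespaced operations fail the way the
	// backup above did, with "the server could not find the requested resource".
	return f.dynamicClient.Resource(gv.WithResource(resource.Name)).Namespace(namespace)
}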

Contributor:

We should probably have a test case to catch this kind of thing. Thoughts, @skriss @marpaia?

@marpaia (Contributor, Author):

🤦‍♂️, sorry guys. Yeah, the whole instantiation of the dynamic client is pretty suspect, as I'm not super familiar with the dynamic client or the context in which it's used in Ark. I definitely think that some more tests would be great. I often struggle with how to most effectively test API interactions with an API server. Y'all don't have some sort of pattern for this in Ark, do you?

Contributor:

No worries! I think a basic test that ensures the returned client has a namespace would be sufficient for now. I'm not super familiar with the new client so I'm not sure if there's a method or field on it that we can inspect.

@marpaia (Contributor, Author) commented on Jul 16, 2018

@nrb I dug around the dynamic interface and, as far as I can tell, there is no good way to interrogate the client to determine whether it has a namespace set. I found that when a namespace isn't set, the error message for GETing an API object that doesn't exist is slightly different, which is what the test asserts now, but this is really brittle and not adding much value IMO. I'm happy to back it out if y'all think it's not worth it.

@skriss (Contributor) commented on Jul 17, 2018

I'd say let's leave out that test for now, get this merged, and when one of us circles back to doing some refactoring/simplification of pkg/client/dynamic.go, we can see if there are any useful tests that can be added. @nrb WDYT?

@marpaia if @nrb is OK with that approach, then I'd drop the unit-test commit, squash each of the most recent two (removing constraint and namespace fix) into the first three, and we'll be good to go.

Signed-off-by: Mike Arpaia <mike@arpaia.co>
@nrb (Contributor) commented on Jul 17, 2018

@skriss Works for me - it doesn't look like there's a great test directly on the client at this point in time.

@skriss (Contributor) left a comment:

LGTM.

@skriss merged commit 13f893f into vmware-tanzu:master on Jul 17, 2018
@skriss (Contributor) commented on Jul 17, 2018

Thanks @marpaia!

@marpaia deleted the k8s-1.11 branch on July 17, 2018 at 15:56