Skip to content
This repository has been archived by the owner on Sep 7, 2022. It is now read-only.

Fix issue #390 #402

Closed
wants to merge 703 commits into from
Closed

Fix issue #390 #402

wants to merge 703 commits into from

Conversation

shaominchen
Copy link

Fix issue #390: Failed to dynamically provision volume if VM is removed from inventory

When VM node is removed from vSphere Inventory, the corresponding kubernetes node is unregistered and removed from registeredNodes cache in nodemanager. However, it is not removed from the other node info cache in nodemanager. The fix is to update the other cache accordingly.

Also, fixed getSharedDatastores function to handle the same case properly.

Testing Done:

  1. Removed the node VM from vSphere inventory.
  2. Create storageclass and pvc to provision volume dynamically

xiangpengzhao and others added 30 commits November 22, 2017 15:53
Automatic merge from submit-queue (batch tested with PRs 55103, 56036, 56186). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Removed opaque integer resources (deprecated in v1.8)

**What this PR does / why we need it**:

* Remove opaque integer resources (OIR) support from the code base. This feature was deprecated in v1.8 and replaced by Extended Resources (ER).

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#55102

**Release note**:

```release-note
Remove opaque integer resources (OIR) support (deprecated in v1.8.)
```
Automatic merge from submit-queue (batch tested with PRs 55103, 56036, 56186). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add cleanup-ipvs flag for kube-proxy 

**What this PR does / why we need it**:

There is no way to tell if a given ipvs rule is created by ipvs proxier or not, and some people have complained that iptables/userspace proxier will clean up their ipvs rules when start up - both iptables and userspace proxiers need to clean up legacy proxy rules created by ipvs proxier.

This PR adds a new `--cleanup-ipvs` flag for kube-proxy for the sake of providing users a way to decide if clean up IPVS rules or not when start iptables or userspace proxier.

**Which issue(s) this PR fixes**:
Fixes kubernetes#55857 

**Special notes for your reviewer**:

**Release note**:

```release-note
Add cleanup-ipvs flag for kube-proxy 
```

/sig network
/area ipvs
/king bug
Automatic merge from submit-queue (batch tested with PRs 55103, 56036, 56186). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "Kubelet flags take precedence over config from files/ConfigMaps"

This reverts commit cbebb61.

Per kubernetes#56097 (comment)

```release-note
NONE
```
…ests

abstract out etcd server creation
test/integration/framework: cleanup master_utils.go
kube-apiserver: move StartTestServer tests into test/integration/master
Fix the failing scale test
kube-apiserver's TestServer now returns a struct instead of individual values
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

install ipset in debian-iptables docker image

**What this PR does / why we need it**:

IPVS kube-proxy use ipset doing SNAT and packets filtering. Because IPVS kube-proxy is based on debian-iptables docker image, this PR installs ipset util in the image.

I believe I lost this change in kubernetes#54219 somehow during code rebase.

**Which issue(s) this PR fixes**:
Fixes kubernetes#56116

**Special notes for your reviewer**:

**Release note**:

```release-note
install ipset in debian-iptables docker image
```

/sig network

/kind bug

/area kube-proxy
Automatic merge from submit-queue (batch tested with PRs 56115, 55143, 56179). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use GetVersion() API instead of ver command

**What this PR does / why we need it**:

Should use GetVersion vs Shelling out to ver.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#55083

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
…og-support

Automatic merge from submit-queue (batch tested with PRs 56115, 55143, 56179). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Do not add new field in fluentd CRI log format.

After kubernetes#55922 is merged, the test `Cluster level logging implemented by Stackdriver should ingest logs` starts to fail in cri-containerd cluster e2e test.
https://k8s-testgrid.appspot.com/sig-node-containerd#e2e-gci

I believe the reason is that the GCP fluentd plugin assumes that there are only `timestamp`, `severity`, `stream` and `log|message|msg` fields in the log entry.

If there is any other fields, GCP fluentd plugin will not try to convert the payload to json, even if the log content is json. The plugin deletes `stream`, `timestamp` and `severity`, then assumes that there is only one field left https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/blob/e13c89a1b6e0c33bac35435fe8e41d566ce52687/lib/fluent/plugin/out_google_cloud.rb#L495.

This PR removes the tag field. With this, fluentd GCP plugin should work again.

@yujuhong @crassirostris 
/cc @kubernetes/sig-node-bugs @kubernetes/sig-instrumentation-bugs 
/cc @derekwaynecarr for milestone approve. Thanks!

**Release note**:

```release-note
none
```
Signed-off-by: Mik Vyatskov <vmik@google.com>
…r-crt

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Regenerate API server serving certificates when upgrading.

**What this PR does / why we need it**:
TODO: 
- [x] check the age of crt.
- [x] check the new version number.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#548

**Special notes for your reviewer**:
/cc @luxas 

**Release note**:

```release-note
NONE
```
…_kibana

Automatic merge from submit-queue (batch tested with PRs 55998, 55400). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update of elasticsearch kibana version

**What this PR does / why we need it**:
Updated elasticsearch and kibana version to version 5.6.4
This was motivated by @crassirostris in kubernetes#54215 (comment)

**Release note**:
```release-note
[fluentd-elasticsearch addon] Elasticsearch and Kibana are updated to version 5.6.4
```
Automatic merge from submit-queue (batch tested with PRs 56207, 55950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix long event handler in cloud cidr allocator

Ref kubernetes#52292
…tting-resources-fix

Automatic merge from submit-queue (batch tested with PRs 56207, 55950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix setting resources in fluentd-gcp plugin

Currently if some of the variables are not set, scripts prints error, which is not critical, since the function is executed in a separate process, but it leads to the wrong resulting values

```release-note
NONE
```

/cc @piosz @x13n 
/assign @roberthbailey @mikedanese 
Could you please approve?
Automatic merge from submit-queue (batch tested with PRs 55873, 56156). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubectl: Add Terminating state to PVCs

kubectl should show something when a PVC has a deletion timestamp and is waiting for deletion. This patch follows Pod - it adds Terminating state.

For easier discovery of errors, finalizers are printed in `kubectl describe pvc`.

This is part of [PVC finalizer feature for 1.9](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md) where we will keep PVC waiting for deletion for a longer time than before so users should know what is going on.

/sig cli

**Release note**:
```release-note
NONE
```
…n_etcd

Automatic merge from submit-queue (batch tested with PRs 55873, 56156). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding etcd version for kubeadm upgrade plan

Adding etcd version display to kubeadm upgrade plan subcommand
```release-note
Adding etcd version display to kubeadm upgrade plan subcommand
```
Closes kubernetes/kubeadm#531
This change adds a new flag `kubeadm token create --print-join-command`. When this flag is passed, kubeadm prints the full `kubeadm join [...]` command, including the CA certificate hash which is otherwise annoying to calculate.

Example:
```
$ kubeadm token create --print-join-command
kubeadm join --token 447067.20b55955bd6abe6c 192.168.99.100:8443 --discovery-token-ca-cert-hash sha256:17023a5c90b996e50c514e63e161e46f78be216fd48c0c3df3be67e008b28889
```
Kubernetes Submit Queue and others added 14 commits November 30, 2017 09:24
…ter-resize

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Do not do fs resize on read-only mount

We should not perform file system resize when volume is mounted in read-only mode.

Fixes : kubernetes#56588

```release-note
Do not do file system resize on read-only mounts
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add andyzhangx as azure reviewer

**What this PR does / why we need it**:
add andyzhangx as azure reviewer

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```
none
```

/sig azure
/assign @jdumars @brendandburns
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add wildcard tolerations to kube-proxy

- Add wildcard tolerations to kube-proxy.
- Add `nvidia.com/gpu` toleration to nvidia-gpu-device-plugin.

Related to kubernetes#55080 and kubernetes#44445.

/kind bug
/priority critical-urgent
/sig scheduling

**Release note**:
```release-note
kube-proxy addon tolerates all NoExecute and NoSchedule taints by default.
```

/assign @davidopp @bsalamat @vishh @jiayingz
…fication

Automatic merge from submit-queue (batch tested with PRs 56589, 56503). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

MustRunAsNonRoot should reject a pod if it has non-numeric USER

**What this PR does / why we need it**:
This PR modifies kubelet behavior to reject pods with non-numeric USER instead of showing a warning.

**Special notes for your reviewer**:
Related discussion: kubernetes/community#756 (comment)

**Release note**:
```release-note
kubelet: fix bug where `runAsUser: MustRunAsNonRoot` strategy didn't reject a pod with a non-numeric `USER`.
```

PTAL @pweil- @tallclair @liggitt @Random-Liu
CC @simo5 @adelton
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

fix CreateVolume func: use search mode instead

**What this PR does / why we need it**:
This is a little fall back for CreateVolume func: use search mode for Dedicated kind as @rootfs suggested.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes#52396

**Special notes for your reviewer**:
I reference the implmentation of v1.6 in the same CreateVolume func
https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/cloudprovider/providers/azure/azure_storage.go#L213-L247

**Release note**:

```
fix azure storage account exhausting issue by using azure disk mount
```
/sig azure

@rootfs @feiskyer @karataliu
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Return no volume match if prebound PV node affinity doesn't match node

**What this PR does / why we need it**:
VolumeBindingChecker predicate needs to return false for prebound PVs if the NodeAffinity doesn't match the node.

Also fix log formatting in predicate.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#56596

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

AWS: Support for mounting nvme volumes

Supports mounting nvme volumes

Fixes kubernetes#56155

```release-note
AWS: Detect EBS volumes mounted via NVME and mount them
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update Dashboard addon to version 1.8.0 and align /ui redirect with it

**What this PR does / why we need it**: In Dashboard 1.8.0 we have introduced a couple of changes (security, settings, new resources etc.) and fixed a lot of bugs. You can check release notes at https://github.com/kubernetes/dashboard/releases/tag/v1.8.0.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Updated Dashboard add-on to version 1.8.0.

- The Dashboard add-on now deploys with https enabled
- The Dashboard can be accessed via kubectl proxy at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
- The /ui redirect is deprecated and will be removed in 1.10
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Cluster Autoscaler 1.1.0-beta1

This PR will be shortly followed with one updating Cluster Autoscaler to 1.1.0 (final).
```release-note
NONE
```
…-plugin-update

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update nvidia-gpu-device-plugin addon.

This includes changes from GoogleCloudPlatform/container-engine-accelerators#33

**Release note**:
```release-note
NONE
```

/sig node
/priority critical-urgent
/kind bug
Copy link

@rjog rjog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -239,7 +240,12 @@ func (nm *NodeManager) addNode(node *v1.Node) {

func (nm *NodeManager) removeNode(node *v1.Node) {
nm.registeredNodesLock.Lock()
nm.nodeInfoLock.Lock()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can create a deadlock. Can you please change this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any existing workflow that may hold a nodeInfoLock and meantime try to acquire registeredNodesLock, so deadlock should not really happen. I do agree with you it is not a good idea to acquires two locks here, but unless we make a refactoring to merge these two maps, I don't see an option to avoid this. Do you have any better suggestions?

Copy link

@abrarshivani abrarshivani Dec 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shaominchen Currently we don't have any workflow where deadlock can happen, yet to make it safe let's add common methods so that sequence to acquire and release lock remains same.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we agree it's better to keep these 2 locks independently for now (before we merge those 2 maps into 1). Will update the logic.

glog.V(9).Infof("Getting accessible datastores for node %s", nodeVmDetail.NodeName)
accessibleDatastores, err := getAccessibleDatastores(ctx, &nodeVmDetail, nodeManager)
if err != nil {
return nil, err
if err != vclib.ErrNoVMFound {
return nil, err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a log message here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

…-test-gce-2

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix for the network partition tests

Fix kubernetes#56416

The underlying issue was that after cluster upgrade, the nodes talk to the master using the in-cluster IP.
The IPTables rules used for blocking were thus far only effective when the nodes used the external network interface.

Reasoning: 

api-server.log [from gce upgrade cluster](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-stable1-beta-upgrade-cluster-new/35/artifacts/bootstrap-e2e-master/kube-apiserver.log)

> I1201 13:56:34.287956       5 wrap.go:42] PATCH /api/v1/nodes/bootstrap-e2e-minion-group-hv6p/status: (18.100082ms) 200 [[node-problem-detector/v1.4.0 (linux/amd64) kubernetes/$Format] **10.128.0.4:53766**]
> I1201 13:56:34.287956       5 wrap.go:42] PATCH /api/v1/nodes/bootstrap-e2e-minion-group-hv6p/status: (18.100082ms) 200 [[node-problem-detector/v1.4.0 (linux/amd64) kubernetes/$Format] **10.128.0.4:53766**]
> I1201 13:56:34.515042       5 wrap.go:42] PATCH /api/v1/nodes/bootstrap-e2e-master/status: (4.327563ms) 200 [[kubelet/v1.9.0 (linux/amd64) kubernetes/e067596] **10.128.0.2:41898**]

api-server.log [from gce serial](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-serial/70/artifacts/test-34cf3ed1e3-master/kube-apiserver.log)

> I1201 15:59:46.863961       5 wrap.go:42] GET /api/v1/nodes/test-34cf3ed1e3-minion-group-zr99?resourceVersion=0: (926.753µs) 200 [[kubelet/v1.9.0 (linux/amd64) kubernetes/e067596] **104.154.254.154:40220**]
> I1201 15:59:46.881810       5 wrap.go:42] PATCH /api/v1/nodes/test-34cf3ed1e3-minion-group-zr99/status: (10.157704ms) 200 [[kubelet/v1.9.0 (linux/amd64) kubernetes/e067596] **104.154.254.154:40220**]

The underlying issue is one of cluster setup - but we can make the test more resilient with this change.

cc @krzyzacy @spiffxp @enisoc @jberkus @kubernetes/sig-autoscaling-misc
li-ang pushed a commit to li-ang/kubernetes that referenced this pull request Dec 16, 2017
Automatic merge from submit-queue (batch tested with PRs 56639, 56746, 56715, 56673, 56726). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix issue kubernetes#390

**What this PR does / why we need it**:
When VM node is removed from vSphere Inventory, the corresponding Kubernetes node is unregistered and removed from registeredNodes cache in nodemanager. However, it is not removed from the other node info cache in nodemanager. The fix is to update the other cache accordingly.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes vmware-archive#390

**Special notes for your reviewer**:
Internally review PR here: vmware-archive#402

**Release note**:
```
NONE
```

Testing Done:
1. Removed the node VM from vSphere inventory.
2. Create storageclass and pvc to provision volume dynamically
@rjog rjog closed this Jan 30, 2018
@shaominchen shaominchen deleted the issue_390 branch January 30, 2018 18:57
calebamiles pushed a commit to kubernetes/cloud-provider-vsphere that referenced this pull request May 22, 2018
Automatic merge from submit-queue (batch tested with PRs 56639, 56746, 56715, 56673, 56726). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix issue #390

**What this PR does / why we need it**:
When VM node is removed from vSphere Inventory, the corresponding Kubernetes node is unregistered and removed from registeredNodes cache in nodemanager. However, it is not removed from the other node info cache in nodemanager. The fix is to update the other cache accordingly.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes vmware-archive/kubernetes-archived#390

**Special notes for your reviewer**:
Internally review PR here: vmware-archive/kubernetes-archived#402

**Release note**:
```
NONE
```

Testing Done:
1. Removed the node VM from vSphere inventory.
2. Create storageclass and pvc to provision volume dynamically
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet