
Update Kind to version 0.8.1 (Kubernetes 1.18.0) #871

Closed
wants to merge 4 commits

Conversation

@lalbers (Contributor) commented Mar 19, 2020

It would be nice if the e2e tests would run on a more recent kind version. Since most of the k8s versions are newer, it makes sense to test the operator with a more recent api.

@frittentheke (Contributor) commented Mar 20, 2020

@FxKu looking at the failed Travis CI run, this does not really seem to be related to the change to kind 0.7.0.

I just ran this code on my local machine and it ran just fine (one test tripped its configured timeout though).

@FxKu (Member) commented Mar 20, 2020

@frittentheke, yeah the node_readiness_label test keeps failing from time to time, but here also the subsequent scaling test fails, which is new. Will also run it on my machine now.

@FxKu (Member) commented Mar 31, 2020

@lalbers can you rebase to trigger the pipeline again?

I'm still not able to run the e2e tests successfully on my machine. Before, I was able to "debug" my clusters by setting the kind KUBECONFIG. Now that the K8s config is used, I wonder what I have to set to get access to my cluster. Only getting:

The connection to the server 127.0.0.1:35623 was refused - did you specify the right host or port?

@frittentheke (Contributor)

@FxKu as for running locally ... did you see my reference to kubernetes-sigs/kind#1029 ?
The KUBECONFIG variable is now used to configure where kind shall place the credentials for the new cluster (default is ~/.kube/config)

[...]

If you want kind to use a different config file, you can either set the --kubeconfig option to kind create cluster / kind delete cluster or you can export KUBECONFIG prior to using kind
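Concretely, with kind >= 0.7.0 that looks like the following sketch. The cluster name and config path are examples, and the kind/kubectl calls are commented out because they require a running Docker daemon:

```shell
# Sketch of the kind >= 0.7.0 kubeconfig handling; cluster name and
# config path are examples, and the actual cluster commands are
# commented out since they need Docker.
export KUBECONFIG="$HOME/.kube/kind-e2e-config"

# kind create cluster --name postgres-operator-e2e-tests
#   -> kind writes the cluster credentials into $KUBECONFIG
# kubectl cluster-info --context kind-postgres-operator-e2e-tests

# Alternatively, pass the path explicitly instead of exporting it:
# kind create cluster --kubeconfig "$KUBECONFIG" --name postgres-operator-e2e-tests

echo "credentials would be written to: $KUBECONFIG"
```

Either way, kubectl picks up the same file via KUBECONFIG, which is why a stale or missing entry there produces the "connection refused" errors seen above.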

@FxKu (Member) commented Apr 1, 2020

@frittentheke yes, I could see that with kubectl config view. But running any other kubectl commands against the kind cluster got me these connection refused errors.

@FxKu (Member) commented Apr 16, 2020

@lalbers please merge with master to trigger another travis build. I want to get this PR merged :)

@FxKu (Member) commented Apr 16, 2020

👍

@FxKu FxKu modified the milestones: 1.5, 1.6 Apr 28, 2020
@frittentheke (Contributor)

@FxKu some folks might run a CI that executes its jobs on Kubernetes as pods.
Maybe the article https://d2iq.com/blog/running-kind-inside-a-kubernetes-cluster-for-continuous-integration and the corresponding GitHub repo (https://github.com/jieyu/docker-images/tree/master/kind-cluster) hold some things that could be applied to make it easier to run the end-to-end tests in such environments?

@frittentheke (Contributor)

@FxKu it might make sense to jump to kind >= 0.8.1 right away. There are LOTS of fixes in that version and there is now (experimental) podman support, which might help some folks run the tests in a CI pipeline on Kubernetes.

See 0.8 release notes: https://github.com/kubernetes-sigs/kind/releases/tag/v0.8.0

@FxKu FxKu changed the title Update Kind to version 0.7.0 (Kubernetes 1.17.0) Update Kind to version 0.8.1 (Kubernetes 1.18.0) Jul 30, 2020
@FxKu (Member) commented Jul 30, 2020

@frittentheke that would definitely make sense. Hopefully, the tests will pass immediately this time. Please update. Thanks!

@lalbers (Contributor, Author) commented Jul 31, 2020

Updated the go module to kind@v0.8.1.
The pipeline seems to fail, but I was able to run the tests on my local system without problems.

Creating cluster "postgres-operator-e2e-tests" ...
✓ Ensuring node image (kindest/node:v1.15.3) 🖼
✓ Preparing nodes 📦📦📦
✓ Creating kubeadm config 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="postgres-operator-e2e-tests")"
kubectl cluster-info
serviceaccount/postgres-operator created
clusterrole.rbac.authorization.k8s.io/postgres-operator created
clusterrolebinding.rbac.authorization.k8s.io/postgres-operator created
clusterrole.rbac.authorization.k8s.io/postgres-pod created
configmap/postgres-operator created
deployment.apps/postgres-operator created
postgresql.acid.zalan.do/acid-minimal-cluster created
test_enable_load_balancer (test_e2e.EndToEndTestCase) ... ok
test_logical_backup_cron_job (test_e2e.EndToEndTestCase) ... ok
test_min_resource_limits (test_e2e.EndToEndTestCase) ... ok
test_multi_namespace_support (test_e2e.EndToEndTestCase) ... postgresql.acid.zalan.do/acid-test-cluster created
ok
test_node_readiness_label (test_e2e.EndToEndTestCase) ... ok
test_scaling (test_e2e.EndToEndTestCase) ... ok
test_service_annotations (test_e2e.EndToEndTestCase) ... ok
test_taint_based_eviction (test_e2e.EndToEndTestCase) ... ok


Ran 8 tests in 1155.446s

OK
Tested operator image: registry.opensource.zalan.do/acid/postgres-operator:v1.4.0-10-g9ddee8f
Deleting cluster "postgres-operator-e2e-tests" ...

@Jan-M (Member) commented Aug 4, 2020

Has this PR grown a bit beyond its original scope? It seems unlikely we can mix/merge 88 changed files with so many different changes under "Kind update".

@lalbers (Contributor, Author) commented Aug 8, 2020

Sorry, I think I messed up my local repo somehow :/
Commits should be clean now.

@FxKu (Member) commented Aug 11, 2020

Hm still no luck for me:

./run.sh
Creating cluster "postgres-operator-e2e-tests" ...
 ✓ Ensuring node image (kindest/node:v1.15.3) 🖼
 ✓ Preparing nodes 📦📦📦 
 ✓ Creating kubeadm config 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
 ✓ Joining worker nodes 🚜 
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="postgres-operator-e2e-tests")"
kubectl cluster-info
Error: exit status 1
Deleting cluster "postgres-operator-e2e-tests" ...

Travis also doesn't like it but gives a few more details:

ERROR
======================================================================
ERROR: setUpClass (test_e2e.EndToEndTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/timeout_decorator/timeout_decorator.py", line 81, in new_function
    return function(*args, **kwargs)
  File "/test_e2e.py", line 44, in setUpClass
    k8s.api.core_v1.create_namespace(v1_namespace)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 5316, in create_namespace
    (data) = self.create_namespace_with_http_info(body, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/apis/core_v1_api.py", line 5401, in create_namespace_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 334, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 168, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 377, in request
    body=body)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/rest.py", line 166, in request
    headers=headers)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/request.py", line 80, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/usr/local/lib/python3.6/dist-packages/urllib3/request.py", line 171, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py", line 325, in urlopen
    conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py", line 231, in connection_from_host
    raise LocationValueError("No host specified.")
urllib3.exceptions.LocationValueError: No host specified.
----------------------------------------------------------------------
Ran 0 tests in 0.018s
FAILED (errors=1)
Deleting cluster "postgres-operator-e2e-tests" ...
make[1]: *** [e2etest] Error 1
make[1]: Leaving directory `/home/travis/gopath/src/github.com/zalando/postgres-operator/e2e'
make: *** [e2e] Error 2
The command "make e2e" exited with 2.

@frittentheke (Contributor)

Hm still no luck for me:

./run.sh
Creating cluster "postgres-operator-e2e-tests" ...
 ✓ Ensuring node image (kindest/node:v1.15.3) 🖼
 ✓ Preparing nodes 📦📦📦 

According to https://github.com/kubernetes-sigs/kind/releases/tag/v0.8.0 the node image should be kindest/node:v1.18.2@sha256:7b27a6d0f2517ff88ba444025beae41491b016bc6af573ba467b70c5e8e0d85f

so it's rather strange that your kind seems to be starting up a 1.15.3 node ...

@frittentheke (Contributor)

File "/usr/local/lib/python3.6/dist-packages/urllib3/poolmanager.py", line 231, in connection_from_host
raise LocationValueError("No host specified.")
urllib3.exceptions.LocationValueError: No host specified.

@FxKu This error is caused by run.sh extracting the IPAddress via docker inspect.
Since kind 0.8.x a dedicated Docker network named kind is used (see https://github.com/kubernetes-sigs/kind/releases/tag/v0.8.0 - BREAKING CHANGES section).

So the query should be docker inspect --format "{{ .NetworkSettings.Networks.kind.IPAddress }}"
and then, following this lead, the test container spawned by docker run needs to be placed into the kind network as well.
This is the diff of what I did to make the e2e tests fly:

diff --git a/e2e/run.sh b/e2e/run.sh
index 0a2689d..8ace743 100755
--- a/e2e/run.sh
+++ b/e2e/run.sh
@@ -45,13 +45,13 @@ function set_kind_api_server_ip(){
   # use the actual kubeconfig to connect to the 'kind' API server
   # but update the IP address of the API server to the one from the Docker 'bridge' network
   readonly local kind_api_server_port=6443 # well-known in the 'kind' codebase
-  readonly local kind_api_server=$(docker inspect --format "{{ .NetworkSettings.IPAddress }}:${kind_api_server_port}" "${cluster_name}"-control-plane)
+  readonly local kind_api_server=$(docker inspect --format "{{ .NetworkSettings.Networks.kind.IPAddress }}:${kind_api_server_port}" "${cluster_name}"-control-plane)
   sed -i "s/server.*$/server: https:\/\/$kind_api_server/g" "${kubeconfig_path}"
 }
 
 function run_tests(){
 
-  docker run --rm --mount type=bind,source="$(readlink -f ${kubeconfig_path})",target=/root/.kube/config -e OPERATOR_IMAGE="${operator_image}" "${e2e_test_image}"
+  docker run --rm --network kind --mount type=bind,source="$(readlink -f ${kubeconfig_path})",target=/root/.kube/config -e OPERATOR_IMAGE="${operator_image}" "${e2e_test_image}"
 }
 
 function clean_up(){

@lalbers FYI ^^
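For illustration, the kubeconfig rewrite that set_kind_api_server_ip performs can be exercised on a dummy file. The IP address below is made up; in run.sh it comes from the docker inspect call shown in the diff:

```shell
# Demonstrate the sed rewrite from set_kind_api_server_ip on a dummy
# kubeconfig; 172.18.0.2 is a made-up address on the 'kind' network.
kubeconfig_path=$(mktemp)
cat > "$kubeconfig_path" <<'EOF'
clusters:
- cluster:
    server: https://127.0.0.1:35623
EOF

kind_api_server="172.18.0.2:6443"   # run.sh derives this via docker inspect
sed -i "s/server.*$/server: https:\/\/$kind_api_server/g" "$kubeconfig_path"

cat "$kubeconfig_path"   # the server line now points at the kind network address
```

The test container then reaches that address only if it is attached to the kind network, hence the added --network kind in run_tests.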

@FxKu (Member) commented Aug 13, 2020

Thanks @frittentheke for solving this issue and @lalbers for updating. I've just run the tests on my machine and it "seems" to work (tests fail because of timeouts, but that's likely due to my setup). On Travis, one test has an issue, so I'm rebuilding now. Maybe that's the only thing left to fix then. The K8s 1.15 image appeared in my logs because I did not update kind but ran ./run.sh straight away.

@FxKu (Member) commented Aug 13, 2020

So it's the same as in March: the test_node_readiness_label test fails. Therefore we cannot merge this PR before this is fixed - maybe in a separate PR. There seems to be some breaking change in K8s, so that either the whole feature or just the Python lib we use is not working as expected. Can you update the dependency to 11.0.0 in the requirements?

@FxKu (Member) commented Aug 14, 2020

Testing it manually with kind, I could see that patching a node with the node readiness label led to termination of the pod on the other worker - that's fine. But then it is not assigned to the node that has the label. On kubectl describe pod I got:

Warning  FailedScheduling  43s (x6 over 6m18s)  default-scheduler  0/3 nodes are available: 1 node(s) had volume node affinity conflict, 2 node(s) didn't match node selector.

Maybe this helps us moving further.

@FxKu (Member) commented Aug 19, 2020

Since kind 0.7.0, rancher/local-path-provisioner is used for volume provisioning. And this one seems to only take the node name for the affinity, not the hostname as was done before with the former storage class. E.g. when I kubectl describe the volume referenced by a pending pod that was evicted from a node lacking the node readiness label (kind-m-worker2 in this case):

Node Affinity:     
  Required Terms:  
    Term 0:        kubernetes.io/hostname in [kind-m-worker2]

Maybe we could add the legacy storage class for the e2e tests and specify it in the Postgres manifest to fix this problem. But I do wonder why the tests pass in your setup? Maybe the solution is even simpler 😃
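A sketch of what such a legacy storage class could look like, modeled on the host-path based "standard" class kind shipped before 0.7.0. The name and provisioner here are assumptions for illustration, not the final manifest:

```yaml
# Sketch only: modeled on kind's pre-0.7.0 default storage class, which
# used the in-tree host-path provisioner; name and provisioner are
# assumptions, not the manifest that eventually landed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: e2e-storage-class
provisioner: kubernetes.io/host-path
volumeBindingMode: Immediate
```

A Postgres manifest used by the e2e tests would then reference this class so that volumes are bound by hostname again instead of via local-path-provisioner's node-name affinity.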

@FxKu (Member) commented Aug 25, 2020

@lalbers can you add the storage class mentioned here to the manifest folder and use it in the e2e tests? Maybe call it e2e-storage-class.yaml so that the use case is clear. I guess tests should pass then.

@FxKu (Member) commented Aug 28, 2020

@lalbers @frittentheke I've continued your work in #1121. Exchanging the storage class was not enough. Had to fix another long-standing issue when calling list_node(). Now we are on kind v0.8.1 🥳 Thanks for pushing and contributing.

@FxKu FxKu closed this Aug 28, 2020