Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to cross-compiled docker containers & container build #1571

Closed
wants to merge 5 commits into from

Conversation

holdenk
Copy link
Contributor

@holdenk holdenk commented Jun 29, 2021

This allows for one container name to support multiple archs, and once pushed we can remove more of the arm64 specific installation stuff since the same container names will support both.

docker does "the right thing" and pulls the container associated with the arch.

You can take a look at the containers I built with this change in my own dockerhub at https://hub.docker.com/repository/docker/holdenk/volcanosh-scheduler , https://hub.docker.com/repository/docker/holdenk/volcanosh-controller-manager , https://hub.docker.com/repository/docker/holdenk/volcanosh-webhook-manager-base , etc.

This is in response to #1570 (although it could also solve #1568 since we wouldn't need volcano-development-arm64.yaml anymore).

To preserve backwards compatibility with users who might be developing locally in single arch mode I have that the default. If there are release docs I should update as well let me know.

Signed-off-by: Holden Karau holden@pigscanfly.ca

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 29, 2021
Copy link
Member

@jasonliu747 jasonliu747 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2021
@k82cn
Copy link
Member

k82cn commented Jun 29, 2021

/approve

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2021
@william-wang william-wang requested review from william-wang and k82cn and removed request for merryzhou and hzxuzhonghu June 29, 2021 13:27
Copy link
Member

@william-wang william-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@holdenk
Copy link
Contributor Author

holdenk commented Jun 29, 2021

Looking at the failure it looks like an E2E test failure with insufficient resources on the cluster, could someone with permission run it again?

@william-wang
Copy link
Member

Let me restart it now.

@Thor-wl
Copy link
Member

Thor-wl commented Jun 30, 2021

Looking at the failure it looks like an E2E test failure with insufficient resources on the cluster, could someone with permission run it again?

Don't worry. We will restart CI as soon as we find failure. But github action really runs not so stable, which bothers us for some time.

Copy link
Member

@shinytang6 shinytang6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@holdenk
Copy link
Contributor Author

holdenk commented Jun 30, 2021

Looks like it's still having some E2E failures, I can dig into those in more detail but if this is sort of a known issue with GHA E2E tests being flaky I'll leave that to y'all to restart the tests as needed :)

Thanks :)

@k82cn
Copy link
Member

k82cn commented Jul 1, 2021

Looks like it's still having some E2E failures, I can dig into those in more detail but if this is sort of a known issue with GHA E2E tests being flaky I'll leave that to y'all to restart the tests as needed :)

Thanks :)

Seems the Github action is unstable :(
We're working on that to improve it.

@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 8, 2021
@k82cn
Copy link
Member

k82cn commented Jul 9, 2021

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2021
@Thor-wl
Copy link
Member

Thor-wl commented Jul 9, 2021

CI has been fixed and commits can be rebased and pushed again. refer: #1595

@jasonliu747
Copy link
Member

CI has been fixed and commits can be rebased and pushed again. refer: #1595

@holdenk sorry for any inconvenience. Please follow the instruction above, let CI bot complete all necessary checks. Thanks.

@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2021
@william-wang william-wang added this to the v1.5 milestone Dec 17, 2021
@hwdef
Copy link
Member

hwdef commented Dec 21, 2021

By the way, how about the current experience of using podman to compile cross-platform images? With the abandonment of docker by kubernetes, should we consider using podman to compile images?

@odidev
Copy link

odidev commented Jan 21, 2022

Hi Team, this PR has been awaiting a pending review for a long time. May I know, is there anything stopping this PR from merging?

@@ -12,8 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM golang:1.16.5 AS builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three places where golang mirrors are used. If I want to update the version of go that compiles volcano, I need to update three places. Is there any room for optimization here?

@@ -12,8 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.


FROM golang:1.16.5 AS builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The golang image also used here

@@ -15,8 +15,13 @@

# The base image is created via `Dockerfile.base`, the base image is cached
# since the required packages change very rarely.
FROM golang:1.16 AS builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The golang image also used here

@Thor-wl
Copy link
Member

Thor-wl commented Jan 22, 2022

@Thor-wl @k82cn @jasonliu747

I am trying to install Volcano via helm charts method on the Linux/ARM64 platform. Since the ‘volcanosh’ docker images are named differently for Linux/ARM64, the installation is unsuccessful, as the pods are crashing.

I have tried editing the names of docker images for arm64, in the values.yaml file. Also, I needed to edit CRD_VERSION to ‘v1’ in the same file, for both AMD64 and ARM64 architectures. The volcano installation using helm chart method is successful now on the Linux/ARM64 platform as well, with the above changes.

Hence, I am interested in this PR. It will be helpful if the multi-arch docker images are available with the same name for both AMD64 and ARM64 architectures.

Sorry for not give a response. The only problem is that this PR cannnot pass the CI. We are all OK about the modifications. Is there any update to fix? @odidev

@odidev
Copy link

odidev commented Jan 25, 2022

@holdenk, I have checked the changes in this PR, and I have some confusion in the same.

I can see that you have replaced docker build with docker buildx in the images tag in Makefile. But you are using ${DOCKER_PLATFORMS} to initialize –platform flag. ${DOCKER_PLATFORMS} will be linux/amd64 since REL_OSARCH=linux/$(OSARCH), and OSARCH=$(shell uname -m), which will be AMD64. Since the CI is Github Actions, which has Linux/AMD64 host available, we can only build and release Linux/AMD64 docker images with the docker buildx command using CI. We can never be able to build and release multi-arch Docker images with these changes. Kindly suggest to me if I am not clear in this understanding.

W.R.T the tests failures in the CI, these E2e tests are failing to locate Kubernetes pods, as can be seen here < https://github.com/volcano-sh/volcano/runs/4137023885?check_suite_focus=true#step:7:348 >.

I am not sure why that has happened, but the latest source code passes in those tests, so maybe a rebase can help to pass those tests.

However, I feel if the multi-arch Docker images are not getting released with these changes using the CI, then the basic motive of this PR is not achieved. Kindly provide your suggestions on the same. Please correct me if my understanding seems erroneous.

@Yikun
Copy link
Member

Yikun commented Feb 11, 2022

- name: Login to DockerHub

Looks like we should also add the buildx and qemu deps for github action?

      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

@Yikun
Copy link
Member

Yikun commented Feb 17, 2022

We can never be able to build and release multi-arch Docker images with these changes. Kindly suggest to me if I am not clear in this understanding.

@odidev As I last comments, what we need to do is just add a setup-qemu-action this will use qemu as vm to do a arm64 compile.

@odidev
Copy link

odidev commented Feb 17, 2022

@Yikun, yes you are correct. We need to install qemu before using buildx.

But my concern is, docker buildx will only build images for amd64 itself, since the “DOCKER_PLATFORMS” is initialized with $(shell uname -m), which will read Linux/amd64 on GitHub Actions CI.

To build and release multi-arch docker images for both AMD64 and ARM64, we will have to add “DOCKER_PLATFORMS="linux/amd64,linux/arm64"” with the docker buildx command.

@Yikun
Copy link
Member

Yikun commented Feb 17, 2022

@odidev Ah I got your idea now. But I think it's OK maybe.

?= indicates to set the platform variable only if it's not set/doesn't have a value.

So by default, it will be set to host arch (can reduce time in some level). If user specify the platform, will defer to user choice.

We can get this merged first, and add followup PR to setup qemu/buildx in ci and enable CI.

@odidev
Copy link

odidev commented Feb 18, 2022

@Yikun, yes correct.

I can see that volcanosh dockerhub repo contains separate images for ARM64 and pushed 7 days ago. GitHub Actions only supports Linux/AMD64 native build. So, may I know, are you releasing Linux/ARM64 docker images manually?

Also, are there any plans to release multi-arch docker images (AMD64 and ARM64)?

@Yikun
Copy link
Member

Yikun commented Feb 22, 2022

are you releasing Linux/ARM64 docker images manually?

@william-wang helped to update latest images (volcano-arm), it worked both in my arm env.

@odidev I hope we can deliver this soon, maybe v1.5.1. Could you also validate this if you have time, thanks!

also cc @martin-g

@william-wang
Copy link
Member

are you releasing Linux/ARM64 docker images manually?

@william-wang helps to update latest images (volcano-arm), it works both in my arm env.

No problem. the images will be updated automatically once the pr is merged.

@martin-g
Copy link

martin-g commented Feb 22, 2022

I've just tested the PR locally!

I've made the following minor change locally because I was running out of disk space due to many new Docker images being downloaded:

diff --git Makefile Makefile
index 5ec17dab..082e0a17 100644
--- Makefile
+++ Makefile
@@ -93,7 +93,7 @@ image_bins: init
        fi;
 
 images:
-       for name in controller-manager scheduler webhook-manager; do\
+       for name in controller-manager ; do\
                docker buildx build -t "${IMAGE_PREFIX}-$$name:$(TAG)" . -f ./installer/dockerfile/$$name/Dockerfile --output=type="${BUILDX_OUTPUT_TYPE}" --platform "${DOCKER_PLATFORMS}"; \
        done

The following has been executed on my amd64 laptop:

$ make images DOCKER_PLATFORMS="linux/amd64,linux/arm64" BUILDX_OUTPUT_TYPE=registry IMAGE_PREFIX=ghcr.io/martin-g/volcano

and it created https://github.com/users/martin-g/packages/container/package/volcano-controller-manager

And on my Linux ARM64 machine:

$ make images DOCKER_PLATFORMS="linux/amd64,linux/arm64" BUILDX_OUTPUT_TYPE=registry IMAGE_PREFIX=ghcr.io/martin-g/volcano-from-aarch64

created https://github.com/users/martin-g/packages/container/package/volcano-from-aarch64-controller-manager

Now I will update and test release.yaml Github workflow!

martin-g added a commit to martin-g/volcano that referenced this pull request Feb 22, 2022
Related-to: volcano-sh#1571

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link

martin-g added a commit to martin-g/volcano that referenced this pull request Feb 23, 2022
Related-to: volcano-sh#1571

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link

What could be the reason for this warning:

WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1

I think vcctl e2e tests fails because of it:
https://github.com/martin-g/volcano/runs/5300427754?check_suite_focus=true

@william-wang
Copy link
Member

@holdenk
The pr is urgently needed by spark 3.3, and due to confliction and ci issue resolving, I submit another pr based on this pr. Hoping you can understand. Thanks:)
#2049

@hwdef
Copy link
Member

hwdef commented Apr 2, 2022

@holdenk Any progress?

@k82cn k82cn modified the milestones: v1.5, v1.6 May 7, 2022
@stale
Copy link

stale bot commented Jul 7, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2022
@stale stale bot closed this Jul 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet