tonistiigi/binfmt:latest (qemu 9.2.0) segfault on ubuntu-24.04 / ubuntu-latest #198

Closed
2 of 3 tasks
mayeut opened this issue Feb 9, 2025 · 19 comments · Fixed by vllm-project/vllm-ascend#64

Comments

@mayeut

mayeut commented Feb 9, 2025

Contributing guidelines

I've found a bug, and:

  • The documentation does not mention anything about my problem
  • There are no open or closed issues that are related to my problem

Description

As mentioned in #188, the default image is unusable on ubuntu-24.04 since a kernel update a few weeks back.

As reported in that issue, moving to the qemu 8.1.5 image solved the issue for most users (since then, there has been a 9.2.0 / master image published).

It seems increasingly unlikely that the latest image will be updated, and increasingly likely that setup-qemu-action should change its default image (possibly depending on the runner) to one that works in most cases rather than, as I imagine, almost none. Hence this is a bug report that setup-qemu-action can act on, rather than closing it as an upstream issue.

Multiple reproducers, logs, etc. are already available in that issue, so I have not added new ones here.

Expected behaviour

No random segmentation faults.

Actual behaviour

Random segmentation faults in all the repositories I know of when the action is used with the default image on ubuntu-24.04 (latest)

Repository URL

No response

Workflow run URL

No response

YAML workflow

N/C

Workflow logs

No response

BuildKit logs


Additional info

No response

bettio added a commit to atomvm/AtomVM that referenced this issue Feb 10, 2025
Run qemu on Ubuntu 22.04

See also:
- actions/runner-images#11471
- docker/setup-qemu-action#188
- docker/setup-qemu-action#198

Upgrading to QEMU v8.1.5 doesn't seem to help, so
closes #1529

I ran the CI multiple times and it always worked, so I think this downgrade really "fixes" the issue.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
@kxc171

kxc171 commented Feb 12, 2025

This is impacting us as well, as of this morning most of our builds that use linux/arm64 result in a segmentation error. Pinning to tonistiigi/binfmt:qemu-v7.0.0-28 seems to have resolved the issue.

There is also an issue opened on tonistiigi/binfmt, tonistiigi/binfmt#240
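
A minimal sketch of the pin described above, assuming the action's `image` input is used to override the default binfmt image. The image tag is the one quoted in the comment; the step name and the `platforms` value are illustrative, not taken from anyone's actual workflow.

```yaml
# Sketch only: pin the binfmt image instead of relying on the default tag.
steps:
  - name: Set up QEMU (pinned binfmt image)   # hypothetical step name
    uses: docker/setup-qemu-action@v3
    with:
      # Tag reported as working in the comment above
      image: tonistiigi/binfmt:qemu-v7.0.0-28
      # Assumption: only arm64 emulation is needed for this build
      platforms: arm64
```

Pinning trades automatic QEMU updates for reproducibility, so the pin would presumably be removed once the default image works again.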

@dylanTesouro

This is also affecting us.

@goshander

Same issue for linux/arm64

crazy-max changed the title from "The default settings are buggy on ubuntu-24.04 / ubuntu-latest" to "tonistiigi/binfmt:latest (qemu 9.2.0) segfault on ubuntu-24.04 / ubuntu-latest" on Feb 12, 2025
@crazy-max
Member

This is probably related to a kernel issue on Ubuntu: tonistiigi/binfmt#215 (comment)

@aeggerm

aeggerm commented Feb 12, 2025

A workaround we found is to run our Docker builds on ARM-based VMs, with QEMU emulating AMD64.

Minimal modification was needed for our use case (CI runners).
Hope this will help someone :)
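
One way this inversion might look in a GitHub Actions workflow, as a sketch under stated assumptions: the `ubuntu-24.04-arm` runner label, the job name, and the action versions are not from the comment above and would need to match whatever ARM runners are actually available.

```yaml
# Sketch only: build natively on arm64 and emulate amd64 instead.
jobs:
  build:                              # hypothetical job name
    # Assumption: an arm64 runner is available under this label
    runs-on: ubuntu-24.04-arm
    steps:
      - uses: actions/checkout@v4
      - name: Set up QEMU for the non-native platform only
        uses: docker/setup-qemu-action@v3
        with:
          platforms: amd64
      - uses: docker/setup-buildx-action@v3
      - name: Build both platforms
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: false
```

Later comments in this thread report segfaults on ARM64 runners targeting AMD64 as well, so this may not help in every setup.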

@giovaborgogno

This is impacting us as well, as of this morning most of our builds that use linux/arm64 result in a segmentation error. Pinning to tonistiigi/binfmt:qemu-v7.0.0-28 seems to have resolved the issue.

There is also an issue opened on tonistiigi/binfmt, tonistiigi/binfmt#240

This worked for me, thanks

@bluecamel

Here's a minimalish repro based on our build that was failing. Maybe worth noting is that our runner is ubicloud. Also, the same thing happens on the latest version of this action (even though the example below is v2) as well as the buildx action.

Dockerfile:

FROM ros:iron-ros-base-jammy

RUN apt-get update && \
    apt-get install -y \
        default-jdk \
        ros-iron-rmw-cyclonedds-cpp \
        libasio-dev \
        python3-pip \
        dumb-init && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

RUN apt-get update && \
    apt-get install -y \
        ros-iron-cv-bridge \
        ros-iron-image-transport \
        libusb-1.0-0-dev \
        libgstreamer1.0-dev \
        gstreamer1.0-tools \
        gstreamer1.0-plugins-bad \
        gstreamer1.0-plugins-ugly && \
    rm -rf /var/lib/apt/lists/*

Excerpt from our GitHub workflow:

jobs:
  docker_release:
    permissions:
      contents: read
      id-token: write
    runs-on: ubicloud
    steps:
      - name: Checkout project
        uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-region: ${{ inputs.AWS_REGION }}
          role-to-assume: arn:aws:iam::${{ inputs.AWS_ACCOUNT_ID }}:role/github-actions-${{ inputs.AWS_REGION }}-${{ github.repository_owner }}-${{ github.event.repository.name }}
      - name: Login to Amazon ECR
        id: login_ecr
        uses: aws-actions/amazon-ecr-login@v1
      - name: Setup QEMU
        uses: docker/setup-qemu-action@v2
      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Increment Version and Create Tag
        id: tag
        uses: mathieudutour/github-tag-action@v6.1
        with:
          github_token: ${{ secrets.CI_GITHUB_TOKEN }}
      - name: Get repo name with dashes instead of underscores
        id: get_repo_name_dashes
        run: |
          REPO_NAME_DASHES=${{ github.event.repository.name }}
          REPO_NAME_DASHES="${REPO_NAME_DASHES//_/-}"
          echo "REPO_NAME_DASHES=${REPO_NAME_DASHES}" >> ${GITHUB_OUTPUT}
      - name: Extract Docker Metadata (tags, labels)
        id: docker_metadata
        uses: docker/metadata-action@v4
        with:
          images: ${{ inputs.AWS_ACCOUNT_ID }}.dkr.ecr.${{ inputs.AWS_REGION }}.amazonaws.com/path/to/repo/${{ steps.get_repo_name_dashes.outputs.REPO_NAME_DASHES }}
          tags: |
            type=raw,value=latest,enable={{ is_default_branch }}
            type=semver,pattern={{version}},value=${{ steps.tag.outputs.new_tag }}
      - name: Build and Push Image
        id: container_image_build_push
        uses: docker/build-push-action@v4
        with:
          context: .
          file: docker/Dockerfile
          platforms: linux/amd64,linux/arm64/v8
          push: true
          tags: ${{ steps.docker_metadata.outputs.tags }}
          labels: ${{ steps.docker_metadata.outputs.labels }}

@tejashah88

tejashah88 commented Feb 12, 2025

A workaround we found is to run our Docker builds on ARM-based VMs, with QEMU emulating AMD64.

Minimal modification was needed for our use case (CI runners). Hope this will help someone :)

I mentioned it in another issue, but I found another workaround as well. If you set up your Dockerfile as a multi-stage build with multiarch/qemu-user-static as a base and inject the QEMU binaries into your target image, no segfaults occur. Here's an example:

# Use multiarch/qemu-user-static only for non-native architectures. Ending tag is excluded due to specifying platform
# See https://docs.docker.com/reference/dockerfile/#automatic-platform-args-in-the-global-scope for more info
FROM --platform=$BUILDPLATFORM multiarch/qemu-user-static AS qemu

# Start with base image
FROM luxonis/depthai-library:latest

# Copy the static QEMU user-mode binaries into the target image (needed when building for ARM and ARM64)
COPY --from=qemu /usr/bin/qemu-*-static /usr/bin/

# More build statements...

This was tested on a Windows 10 Pro x64 host machine while building Docker images to be used on Raspberry Pi 3s (linux/arm/v7) and 4s (linux/arm64/v8). To be clear, only the linux/arm64/v8 builds were failing.

I'd like to know why this works, as this seems too simple for my liking but I'm glad this is a decent workaround for now.

eabatalov added a commit to tensorlakeai/indexify that referenced this issue Feb 13, 2025
We're not currently running Server on ARM, so we don't really
need this. This also allows to workaround QEMU segfaulting on ARM
see docker/setup-qemu-action#198.
diptanu pushed a commit to tensorlakeai/indexify that referenced this issue Feb 13, 2025
We're not currently running Server on ARM, so we don't really
need this. This also allows to workaround QEMU segfaulting on ARM
see docker/setup-qemu-action#198.
@shink

shink commented Feb 13, 2025

Same issue

@DoingItNow

At first the failures were intermittent; now it fails every time I attempt a docker buildx build.

j3soon added a commit to j3soon/ros2-essentials that referenced this issue Feb 20, 2025
devinrsmith added a commit to devinrsmith/deephaven-server-docker that referenced this issue Feb 20, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this issue Feb 21, 2025
### What this PR does / why we need it?

Backport vllm-project#64 to
v0.7.1-dev branch

Add container image build ci:
- Enable branch, tag docker image publish
    - branch image: `vllm-ascend:main`, `vllm-ascend:v0.7.1-dev`
    - tag image: `vllm-ascend:v0.7.1rc1`
- Enable PR docker image build check
- other changes:
    - Prepare the `REPO_OWNER` because ghcr requires lowercase
- Add `Free up disk space` step to avoid `No space left on device` like
vllm-project#27
- Setup qemu with image to resolve
docker/setup-qemu-action#198

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
build: CI passed

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
devinrsmith added a commit to deephaven/deephaven-server-docker that referenced this issue Feb 21, 2025
SoulPancake pushed a commit to SoulPancake/gocv that referenced this issue Feb 21, 2025
Signed-off-by: deadprogram <ron@hybridgroup.com>
123marvin123 added a commit to 123marvin123/typst-docker that referenced this issue Feb 21, 2025
zspitzer added a commit to lucee/lucee-dockerfiles that referenced this issue Feb 21, 2025
@booleanbetrayal

We are mysteriously encountering this issue on ARM64 self-hosted GitHub Actions (ghcr.io/actions/actions-runner:2.322.0) based runners, beginning sometime over the past 72 hours, targeting AMD64. None of the workarounds have worked for us. Anyone else in the same boat?

@tair-itzhak

We are mysteriously encountering this issue on ARM64 self-hosted GitHub Actions (ghcr.io/actions/actions-runner:2.322.0) based runners, beginning sometime over the past 72 hours, targeting AMD64. None of the workarounds have worked for us. Anyone else in the same boat?

We experience the same issue with actions-runners

m1k1o added a commit to m1k1o/neko that referenced this issue Feb 23, 2025
m1k1o added a commit to m1k1o/neko that referenced this issue Feb 23, 2025
dc-mak pushed a commit to rems-project/cerberus that referenced this issue Feb 26, 2025
* Bump setup-qemu-action to 3.4.0
* Bump Ubuntu version

docker/setup-qemu-action#198 caused the Ubuntu Docker image build to fail, and this commit fixes that.
@dyrnq

dyrnq commented Feb 27, 2025

Same issue for linux/arm64

@booleanbetrayal

We are mysteriously encountering this issue on ARM64 self-hosted GitHub Actions (ghcr.io/actions/actions-runner:2.322.0) based runners, beginning sometime over the past 72 hours, targeting AMD64. None of the workarounds have worked for us. Anyone else in the same boat?

Specifically, this is happening during an addgroup RUN call in our Dockerfile.

@alphonsekoh

alphonsekoh commented Feb 28, 2025

As a workaround, I downgraded to ubuntu-22.04; however, build time took a huge hit: building my Docker image is now 4x slower. It used to take around 10 minutes when building for linux/arm64; now it takes 40 minutes.
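
The runner downgrade mentioned above amounts to a one-line change in the workflow. A minimal sketch, with a hypothetical job name and illustrative action versions:

```yaml
# Sketch only: pin the runner image rather than using ubuntu-latest.
jobs:
  docker:                             # hypothetical job name
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
```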

@crazy-max
Member

https://github.com/tonistiigi/binfmt/releases/tag/deploy%2Fv9.2.2-52 (QEMU 9.2.2) was released yesterday: tonistiigi/binfmt#215 (comment)

If you encounter something similar, open an issue on https://github.com/tonistiigi/binfmt/issues
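
To confirm which QEMU version a runner actually picks up after this release, a diagnostic step along these lines may help. This is a sketch that assumes the binfmt image accepts a `--version` flag as described in its README; the step names are hypothetical.

```yaml
# Sketch only: print the binfmt/QEMU version used by the default image.
steps:
  - name: Set up QEMU (default image)
    uses: docker/setup-qemu-action@v3
  - name: Print binfmt and QEMU versions   # hypothetical diagnostic step
    # Assumption: --version is supported by the binfmt image entrypoint
    run: docker run --rm --privileged tonistiigi/binfmt:latest --version
```

Workflows that pinned an older image as a workaround would need that pin removed before this check reflects the new release.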

mstorsjo added a commit to mstorsjo/llvm-mingw that referenced this issue Feb 28, 2025
This reverts commit 3dbfd17.

This shouldn't be necessary any longer; QEMU in
tonistiigi/binfmt:latest has been updated to 9.2.2, which has
this bug fixed - see
docker/setup-qemu-action#198 (comment)
tonistiigi/binfmt#215 (comment)
and https://gitlab.com/qemu-project/qemu/-/issues/1913.
espressif-bot pushed a commit to espressif/esp-idf that referenced this issue Feb 28, 2025
Works around issue from
https://github.com/espressif/esp-idf/actions/runs/13531037397/job/37813060700
caused by Qemu segmentation fault.

    Errors were encountered while processing: libc-bin

The workaround is from docker/setup-qemu-action#198.
espressif-bot pushed a commit to espressif/esp-idf that referenced this issue Mar 6, 2025
Works around issue from
https://github.com/espressif/esp-idf/actions/runs/13531037397/job/37813060700
caused by Qemu segmentation fault.

    Errors were encountered while processing: libc-bin

The workaround is from docker/setup-qemu-action#198.