[ci] build aarch64 wheel builds on a real aarch64 machine #6843

Merged 17 commits on Mar 30, 2025
1 change: 0 additions & 1 deletion .ci/setup.sh
@@ -3,7 +3,6 @@
 set -e -E -u -o pipefail

 # defaults
-AZURE=${AZURE:-"false"}
 IN_UBUNTU_BASE_CONTAINER=${IN_UBUNTU_BASE_CONTAINER:-"false"}
 SETUP_CONDA=${SETUP_CONDA:-"true"}

63 changes: 55 additions & 8 deletions .github/workflows/python_package.yml
@@ -18,7 +18,54 @@ env:
   SKBUILD_STRICT_CONFIG: true

 jobs:
-  test:
+  test-linux-aarch64:
+    name: bdist wheel (ubuntu-24.04-arm)
+    runs-on: ubuntu-24.04-arm
+    timeout-minutes: 60
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 5
+          submodules: true
+      - name: Setup and run tests
+        shell: bash
+        # this uses 'docker run' instead of just setting 'container:'
+        # because actions/checkout requires GLIBC 2.28 and that is too
+        # new for manylinux2014
+        env:
+          BUILD_DIRECTORY: /LightGBM
+        run: |
+          cat > ./docker-script.sh <<EOF
+          mkdir -p \$BUILD_ARTIFACTSTAGINGDIRECTORY
+          export CONDA=\$HOME/miniforge
+          export PATH=\$CONDA/bin:\$PATH
+          ${{ env.BUILD_DIRECTORY }}/.ci/setup.sh || exit 1
+          ${{ env.BUILD_DIRECTORY }}/.ci/test.sh || exit 1
+          EOF
+          IMAGE_URI="lightgbm/vsts-agent:manylinux2014_aarch64"
+          docker pull "${IMAGE_URI}" || exit 1
+          docker run \
+            --platform "${PLATFORM}" \
+            --rm \
+            --env BUILD_DIRECTORY=${{ env.BUILD_DIRECTORY }} \
+            --env BUILD_ARTIFACTSTAGINGDIRECTORY=${{ env.BUILD_DIRECTORY }}/artifacts/ \
+            --env COMPILER=gcc \
+            --env METHOD=wheel \
+            --env OS_NAME=linux \
+            --env PRODUCES_ARTIFACTS=true \
+            --env PYTHON_VERSION="3.13" \
+            --env TASK=bdist \
+            -v "${PWD}":"${{ env.BUILD_DIRECTORY }}" \
+            -w ${{ env.BUILD_DIRECTORY }} \
+            "${IMAGE_URI}" \
+            /bin/bash ./docker-script.sh
+      - name: upload wheels
+        uses: actions/upload-artifact@v4
+        with:
+          name: linux-aarch64-wheel
+          path: artifacts/*.whl
+  test-macos:
     name: ${{ matrix.task }} ${{ matrix.method }} (${{ matrix.os }}, Python ${{ matrix.python_version }})
     runs-on: ${{ matrix.os }}
     timeout-minutes: 60
@@ -65,17 +112,13 @@ jobs:
         run: |
           export TASK="${{ matrix.task }}"
           export METHOD="${{ matrix.method }}"
+          export OS_NAME="macos"
           export PYTHON_VERSION="${{ matrix.python_version }}"
           if [[ "${{ matrix.os }}" == "macos-14" ]]; then
               # use clang when creating macOS release artifacts
               export COMPILER="clang"
-              export OS_NAME="macos"
-          elif [[ "${{ matrix.os }}" == "macos-13" ]]; then
+          else
               export COMPILER="gcc"
-              export OS_NAME="macos"
-          elif [[ "${{ matrix.os }}" == "ubuntu-latest" ]]; then
-              export COMPILER="clang"
-              export OS_NAME="linux"
           fi
           export BUILD_DIRECTORY="$GITHUB_WORKSPACE"
           export CONDA=${HOME}/miniforge
@@ -152,7 +195,11 @@ jobs:
   all-python-package-jobs-successful:
     if: always()
     runs-on: ubuntu-latest
-    needs: [test, test-latest-versions, test-old-versions]
+    needs:
+      - test-latest-versions
+      - test-macos
+      - test-linux-aarch64
+      - test-old-versions
     steps:
       - name: Note that all tests succeeded
         uses: re-actors/alls-green@v1.2.2
76 changes: 0 additions & 76 deletions .vsts-ci.yml
@@ -223,81 +223,6 @@ jobs:
     inputs:
       filePath: $(Build.SourcesDirectory)/.ci/test.sh
       targetType: 'filePath'
-##################
-# QEMU_multiarch #
-##################
-- job: QEMU_multiarch
-  variables:
-    BUILD_DIRECTORY: /LightGBM
-    COMPILER: gcc
-    PRODUCES_ARTIFACTS: 'true'
-  pool:
-    vmImage: ubuntu-22.04
-  timeoutInMinutes: 180
-  strategy:
-    matrix:
-      bdist:
-        TASK: bdist
-        ARCH: aarch64
-  steps:
-  - script: |
-      sudo apt-get update
-      sudo apt-get install --no-install-recommends -y \
-        binfmt-support \
-        qemu \
-        qemu-user \
-        qemu-user-static
-    displayName: 'Install QEMU'
-  - script: |
-      docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
-    displayName: 'Enable Docker multi-architecture support'
-  - script: |
-      git clean -d -f -x
-    displayName: 'Clean source directory'
-  # LGBM_SKIP_DASK_TESTS=true is set below only because running the tests under emulation is so slow...
-  # in theory, 'lightgbm.dask' should work without issue on aarch64 Linux systems.
-  # That could probably be removed as part of https://github.com/microsoft/LightGBM/issues/6788
-  - script: |
-      cat > docker-script.sh <<EOF
-      export CONDA=\$HOME/miniforge
-      export PATH=\$CONDA/bin:/opt/rh/llvm-toolset-7.0/root/usr/bin:\$PATH
-      export LD_LIBRARY_PATH=/opt/rh/llvm-toolset-7.0/root/usr/lib64:\$LD_LIBRARY_PATH
Comment on lines -263 to -264
Collaborator Author

I intentionally did not preserve this in the workflow. It shouldn't be necessary.

docker run \
  --rm \
  -it lightgbm/vsts-agent:manylinux2014_aarch64 \
  bash

echo $PATH
# /opt/rh/llvm-toolset-7.0/root/usr/bin:/opt/rh/devtoolset-10/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bi

echo $LD_LIBRARY_PATH
# /opt/rh/llvm-toolset-7.0/root/usr/lib64:/opt/rh/devtoolset-10/root/usr/lib64:/opt/rh/devtoolset-10/root/usr/lib:/opt/rh/devtoolset-10/root/usr/lib64/dyninst:/opt/rh/devtoolset-10/root/usr/lib/dyninst:/usr/local/lib64
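
The manual check above could also be scripted. This is a minimal sketch, not part of LightGBM's CI: the helper name `missing_entries` is hypothetical, and the sample values are abbreviated from the output shown above rather than read from a live container.

```python
def missing_entries(variable_value, required_dirs, sep=":"):
    """Return the required directories that are absent from a path-style variable."""
    # Split on the separator and ignore empty entries (e.g. from a trailing ':').
    present = {entry for entry in variable_value.split(sep) if entry}
    return [d for d in required_dirs if d not in present]


# Sample values, abbreviated from the container output above (assumption:
# only the leading entries matter for this check).
path = (
    "/opt/rh/llvm-toolset-7.0/root/usr/bin"
    ":/opt/rh/devtoolset-10/root/usr/bin"
    ":/usr/local/sbin:/usr/local/bin"
)
ld_library_path = (
    "/opt/rh/llvm-toolset-7.0/root/usr/lib64"
    ":/opt/rh/devtoolset-10/root/usr/lib64"
)

# An empty list means the image already provides the directory,
# so re-exporting it in the CI script would be redundant.
print(missing_entries(path, ["/opt/rh/llvm-toolset-7.0/root/usr/bin"]))
print(missing_entries(ld_library_path, ["/opt/rh/llvm-toolset-7.0/root/usr/lib64"]))
```

Inside the container, the same helper could be called with `os.environ.get("PATH", "")` in place of the sample strings.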

Collaborator Author

I can see that the correct compiler (gcc) is being used:

-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rh/devtoolset-10/root/usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-10/root/usr/bin/g++ - skipped

(build link)
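
A check like this could be automated against a saved build log. The following is a rough sketch under the assumption that CMake prints its usual `-- The <LANG> compiler identification is <ID> <VERSION>` lines, as in the excerpt above; the function name `compiler_ids` is hypothetical and not part of the repository.

```python
import re

# Matches lines such as "-- The C compiler identification is GNU 10.2.1".
COMPILER_ID = re.compile(r"^-- The (C|CXX) compiler identification is (\S+) (\S+)$")


def compiler_ids(log_text):
    """Map each language ("C", "CXX") to the (compiler id, version) found in a CMake log."""
    found = {}
    for line in log_text.splitlines():
        match = COMPILER_ID.match(line.strip())
        if match:
            language, compiler, version = match.groups()
            found[language] = (compiler, version)
    return found


# Log excerpt copied from the build output above.
log = """\
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
"""

ids = compiler_ids(log)
# Fail loudly if anything other than gcc/g++ was picked up.
assert all(compiler == "GNU" for compiler, _ in ids.values())
print(ids)
```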

-      \$BUILD_DIRECTORY/.ci/setup.sh || exit 1
-      \$BUILD_DIRECTORY/.ci/test.sh || exit 1
-      EOF
-      IMAGE_URI="lightgbm/vsts-agent:manylinux2014_aarch64"
-      docker pull "${IMAGE_URI}" || exit 1
-      PLATFORM=$(docker inspect --format='{{.Os}}/{{.Architecture}}' "${IMAGE_URI}") || exit 1
-      echo "detected image platform: ${PLATFORM}"
-      docker run \
-        --platform "${PLATFORM}" \
-        --rm \
-        --env AZURE=true \
-        --env BUILD_ARTIFACTSTAGINGDIRECTORY=$BUILD_ARTIFACTSTAGINGDIRECTORY \
-        --env BUILD_DIRECTORY=$BUILD_DIRECTORY \
-        --env COMPILER=$COMPILER \
-        --env LGBM_SKIP_DASK_TESTS=true \
-        --env METHOD=$METHOD \
-        --env OS_NAME=linux \
-        --env PRODUCES_ARTIFACTS=$PRODUCES_ARTIFACTS \
-        --env PYTHON_VERSION=$PYTHON_VERSION \
-        --env TASK=$TASK \
-        -v "$(Build.SourcesDirectory)":"$BUILD_DIRECTORY" \
-        -v "$(Build.ArtifactStagingDirectory)":"$(Build.ArtifactStagingDirectory)" \
-        "${IMAGE_URI}" \
-        /bin/bash $BUILD_DIRECTORY/docker-script.sh
-    displayName: 'Setup and run tests'
-  - task: PublishBuildArtifacts@1
-    condition: >
-      and(
-        succeeded(),
-        in(variables['TASK'], 'bdist'),
-        not(startsWith(variables['Build.SourceBranch'], 'refs/pull/'))
-      )
-    inputs:
-      pathtoPublish: '$(Build.ArtifactStagingDirectory)'
-      artifactName: PackageAssets
-      artifactType: container
 #########
 # macOS #
 #########
@@ -441,7 +366,6 @@ jobs:
   dependsOn:
   - Linux
   - Linux_latest
-  - QEMU_multiarch
   - macOS
   - Windows
   - R_artifact
4 changes: 0 additions & 4 deletions tests/python_package_test/test_dask.py
@@ -55,10 +55,6 @@
     pytest.mark.skipif(getenv("TASK", "") == "mpi", reason="Fails to run with MPI interface"),
     pytest.mark.skipif(getenv("TASK", "") == "gpu", reason="Fails to run with GPU interface"),
     pytest.mark.skipif(getenv("TASK", "") == "cuda", reason="Fails to run with CUDA interface"),
-    pytest.mark.skipif(
-        getenv("LGBM_SKIP_DASK_TESTS", "") == "true",
-        reason="Skipping lightgbm.dask tests (found env variable LGBM_SKIP_DASK_TESTS=true)",
-    ),
 ]
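
The markers that remain in this list all follow one pattern: read the `TASK` environment variable and skip the Dask tests for certain values. Below is a stdlib-only sketch of that gating logic. The helper `skip_reason` and the `SKIP_REASONS` table are hypothetical illustrations, not code from LightGBM's test suite, which uses `pytest.mark.skipif` directly.

```python
from os import getenv

# TASK values for which the Dask tests are skipped, with the reasons
# copied from the markers shown in the diff above.
SKIP_REASONS = {
    "mpi": "Fails to run with MPI interface",
    "gpu": "Fails to run with GPU interface",
    "cuda": "Fails to run with CUDA interface",
}


def skip_reason(task=None):
    """Return the skip reason for a TASK value, or None if the tests should run."""
    if task is None:
        # Default to the environment, mirroring getenv("TASK", "") in the markers.
        task = getenv("TASK", "")
    return SKIP_REASONS.get(task)


print(skip_reason("mpi"))
print(skip_reason("bdist"))
```

With the `LGBM_SKIP_DASK_TESTS` marker removed, a `TASK=bdist` build such as the aarch64 job no longer has any reason to skip these tests.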

