Support CPython 3.11, 3.12, and aarch64 processors #2331

Open
wants to merge 28 commits into
base: master

@ddelange ddelange commented Jan 20, 2023

Hoi 👋

linux-aarch64 makes up almost 10% of all platforms (ref giampaolo/psutil#2103)

aarch64 has already surpassed windows in terms of downloads for this package. Oracle, Amazon, Google, and Microsoft all offer aarch64 cloud instances at a very competitive price point compared to amd/intel, so demand will most likely only grow.

  • this PR is adapted from Add arm64 mac and linux wheels MagicStack/asyncpg#954
  • uses QEMU emulation for linux arm64 wheels: manylinux takes around 2.5hrs per wheel and alpine arm64 up to 4 hrs 😅
  • manylinux2014 wheels are built with GCC 10, which I think does not guarantee proper functioning of pybind11 (docs).
    • so with this PR, linux wheels are built with GCC 12 (manylinux_2_28).
    • pip will only install these wheels on linux operating systems with glibc >= 2.28 (nearly all 2020+ linux distributions, such as debian 10 buster, ubuntu 20.04 focal, almalinux/rhel 8, ...).
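
The glibc constraint in the last bullet can be illustrated with a minimal sketch. It only reads the `manylinux_X_Y` prefix of a platform tag and compares it with a glibc version string; pip's real tag matching also checks the architecture and legacy aliases, so treat the helper names below as hypothetical illustrations rather than pip's implementation.

```python
import platform
import re


def manylinux_glibc_requirement(platform_tag):
    """Return the (major, minor) glibc version implied by a manylinux_X_Y tag.

    Returns None for tags that do not use the manylinux_X_Y scheme
    (for example musllinux or the legacy manylinux2014 alias).
    """
    match = re.match(r"manylinux_(\d+)_(\d+)_", platform_tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def glibc_satisfies(platform_tag, glibc_version):
    """Check whether a glibc version string such as '2.31' meets the tag's minimum."""
    required = manylinux_glibc_requirement(platform_tag)
    if required is None:
        return False
    have = tuple(int(part) for part in glibc_version.split(".")[:2])
    return have >= required


if __name__ == "__main__":
    # platform.libc_ver() reports e.g. ('glibc', '2.31') on glibc-based Linux
    # and empty strings on systems where it cannot detect glibc.
    libc, version = platform.libc_ver()
    if libc == "glibc" and version:
        print(glibc_satisfies("manylinux_2_28_x86_64", version))
    else:
        print("glibc not detected on this system")
```

Run on the target machine, this gives a quick hint whether a `manylinux_2_28` wheel is even a candidate before pip falls back to building from source.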

the wheels from this PR can be installed with:

# PIP_FIND_LINKS accepts a space-separated list of URLs for --find-links
export PIP_FIND_LINKS=https://github.com/ddelange/vaex/releases/expanded_assets/core-v4.17.1.post4
pip install --force-reinstall vaex

fixes #2366, fixes #2368, fixes #2397

@maartenbreddels
Member

Hoi 👋

exciting, will take a look early next week!

  • manylinux takes around 2.5hrs per wheel and alpine arm64 up to 4 hrs

that worries me a bit.. :)

groeten,

Maarten

@ddelange
Author

ddelange commented Jan 21, 2023

here are all timings: https://github.com/ddelange/vaex/actions/runs/3965720337/usage

depending on how often a month you release vaex, this could eat into the 2k free minutes of GH...

as the parallelization is maximised and the wheels are pushed to PyPI as soon as they're built, most of them will be available soon after a release regardless

here are all the wheels: distributions.zip

@ddelange
Author

interestingly, that was 8260 minutes ^

apparently that's OK? then I don't understand their explanation 🤔 https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#included-storage-and-minutes

@ddelange
Author

ddelange commented Jan 21, 2023

ah there is a fair amount of duplication in that usage table for whatever reason 🤯

@ddelange
Author

a diff of current PyPI vs the zip above:

 vaex_core-4.16.1-cp310-cp310-macosx_10_9_x86_64.whl
 vaex_core-4.16.1-cp310-cp310-macosx_11_0_arm64.whl
-vaex_core-4.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+vaex_core-4.16.1-cp310-cp310-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp310-cp310-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp310-cp310-musllinux_1_1_aarch64.whl
 vaex_core-4.16.1-cp310-cp310-musllinux_1_1_x86_64.whl
 vaex_core-4.16.1-cp310-cp310-win_amd64.whl
+vaex_core-4.16.1-cp311-cp311-macosx_10_9_x86_64.whl
+vaex_core-4.16.1-cp311-cp311-macosx_11_0_arm64.whl
+vaex_core-4.16.1-cp311-cp311-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp311-cp311-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp311-cp311-musllinux_1_1_aarch64.whl
+vaex_core-4.16.1-cp311-cp311-musllinux_1_1_x86_64.whl
+vaex_core-4.16.1-cp311-cp311-win_amd64.whl
 vaex_core-4.16.1-cp36-cp36m-macosx_10_9_x86_64.whl
-vaex_core-4.16.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+vaex_core-4.16.1-cp36-cp36m-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp36-cp36m-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp36-cp36m-musllinux_1_1_aarch64.whl
 vaex_core-4.16.1-cp36-cp36m-musllinux_1_1_x86_64.whl
 vaex_core-4.16.1-cp36-cp36m-win_amd64.whl
 vaex_core-4.16.1-cp37-cp37m-macosx_10_9_x86_64.whl
-vaex_core-4.16.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+vaex_core-4.16.1-cp37-cp37m-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp37-cp37m-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp37-cp37m-musllinux_1_1_aarch64.whl
 vaex_core-4.16.1-cp37-cp37m-musllinux_1_1_x86_64.whl
 vaex_core-4.16.1-cp37-cp37m-win_amd64.whl
 vaex_core-4.16.1-cp38-cp38-macosx_10_9_x86_64.whl
 vaex_core-4.16.1-cp38-cp38-macosx_11_0_arm64.whl
-vaex_core-4.16.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+vaex_core-4.16.1-cp38-cp38-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp38-cp38-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp38-cp38-musllinux_1_1_aarch64.whl
 vaex_core-4.16.1-cp38-cp38-musllinux_1_1_x86_64.whl
 vaex_core-4.16.1-cp38-cp38-win_amd64.whl
 vaex_core-4.16.1-cp39-cp39-macosx_10_9_x86_64.whl
 vaex_core-4.16.1-cp39-cp39-macosx_11_0_arm64.whl
-vaex_core-4.16.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+vaex_core-4.16.1-cp39-cp39-manylinux_2_28_aarch64.whl
+vaex_core-4.16.1-cp39-cp39-manylinux_2_28_x86_64.whl
+vaex_core-4.16.1-cp39-cp39-musllinux_1_1_aarch64.whl
 vaex_core-4.16.1-cp39-cp39-musllinux_1_1_x86_64.whl
 vaex_core-4.16.1-cp39-cp39-win_amd64.whl
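
As a rough aid for reading a listing like the one above, the following sketch counts added and removed wheels per platform tag. It assumes each line starts with a one-character diff marker (`+`, `-`, or a space) followed by a wheel filename, and that the platform tag is the last dash-separated component of the filename. The function names are hypothetical, and only the cp310 rows are used as sample input.

```python
from collections import Counter


def platform_tag(wheel_filename):
    """Return the platform tag, i.e. the last dash-separated part before '.whl'."""
    stem = wheel_filename[: -len(".whl")]
    return stem.split("-")[-1]


def summarize(diff_lines):
    """Count added ('+') and removed ('-') wheels per platform tag."""
    added, removed = Counter(), Counter()
    for line in diff_lines:
        marker, name = line[:1], line[1:].strip()
        if not name.endswith(".whl"):
            continue  # skip anything that is not a wheel filename
        if marker == "+":
            added[platform_tag(name)] += 1
        elif marker == "-":
            removed[platform_tag(name)] += 1
    return added, removed


# The cp310 rows copied from the listing above.
CP310_ROWS = [
    " vaex_core-4.16.1-cp310-cp310-macosx_10_9_x86_64.whl",
    " vaex_core-4.16.1-cp310-cp310-macosx_11_0_arm64.whl",
    "-vaex_core-4.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "+vaex_core-4.16.1-cp310-cp310-manylinux_2_28_aarch64.whl",
    "+vaex_core-4.16.1-cp310-cp310-manylinux_2_28_x86_64.whl",
    "+vaex_core-4.16.1-cp310-cp310-musllinux_1_1_aarch64.whl",
    " vaex_core-4.16.1-cp310-cp310-musllinux_1_1_x86_64.whl",
    " vaex_core-4.16.1-cp310-cp310-win_amd64.whl",
]

added, removed = summarize(CP310_ROWS)
for tag in sorted(added):
    print("added  ", tag, added[tag])
for tag in sorted(removed):
    print("removed", tag, removed[tag])
```

Feeding the full listing into `summarize` would show the same pattern for every interpreter: the manylinux2014 x86_64 wheel is replaced by manylinux_2_28 wheels, and aarch64 variants are added.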

Comment on lines -16 to -23
namespace std {
    template<>
    struct hash<PyObject*> {
        size_t operator()(const PyObject *const &o) const {
            return PyObject_Hash((PyObject*)o);
        }
    };
}
Author

@ddelange ddelange Jan 23, 2023

@maartenbreddels any thoughts on this (incl me updating the pybind11 submodule)?

@@ -183,12 +183,14 @@ def __str__(self):
include_package_data=True,
ext_modules=([extension_vaexfast] if on_rtd else [extension_vaexfast, extension_strings, extension_superutils, extension_superagg]) if not use_skbuild else [],
zip_safe=False,
python_requires=">=3.6",
Author

cibuildwheel parses this to determine which wheels to build
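
To illustrate the effect of that setting, here is a deliberately simplified sketch of how a `python_requires` lower bound narrows the set of interpreters to build for. It handles only the `>=X.Y` form and is not cibuildwheel's actual logic; the function names and the candidate list are assumptions made for the example.

```python
def parse_minimum_python(python_requires):
    """Parse a specifier of the form '>=X.Y' into (X, Y).

    Only this single form is supported in this sketch.
    """
    spec = python_requires.strip()
    if not spec.startswith(">="):
        raise ValueError(f"unsupported specifier in this sketch: {spec!r}")
    major, minor = spec[2:].strip().split(".")[:2]
    return int(major), int(minor)


def selected_versions(python_requires, candidates):
    """Keep the candidate (major, minor) versions that meet the lower bound."""
    minimum = parse_minimum_python(python_requires)
    return [version for version in candidates if version >= minimum]


# Hypothetical list of interpreters a build tool might know about.
CANDIDATES = [(3, 6), (3, 7), (3, 8), (3, 9), (3, 10), (3, 11), (3, 12)]

print(selected_versions(">=3.6", CANDIDATES))  # every candidate is kept
print(selected_versions(">=3.8", CANDIDATES))  # 3.6 and 3.7 are dropped
```

Under this reading, raising the bound in `setup.py` is enough to stop building wheels for older interpreters, without touching the CI matrix.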

Author

cc @franz101

see also the diff above

@ddelange
Author

I'm guessing this is blocked by #2339

@maartenbreddels
Member

Just letting you know I'm very busy and had a vacation.
Yes, I'll try to get #2339 green first!

@ddelange
Author

fwiw there are now third party free minutes on native arm64 machines, to get rid of the slow qemu builds

@ddelange ddelange changed the title from "Build aarch64 wheels" to "Build aarch64 wheels and support python 3.11" on Jul 10, 2023
@maartenbreddels
Member

Could you try rebasing this?

@ddelange
Author

@maartenbreddels already merged in master 👍

@ddelange
Author

    ERROR: Could not find a version that satisfies the requirement vaex-core<4.17,>=4.17.0 (from vaex)
    ERROR: No matching distribution found for vaex-core<4.17,>=4.17.0
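
The requirement `vaex-core<4.17,>=4.17.0` cannot be met because `4.17` and `4.17.0` denote the same release, so the two bounds exclude each other. A minimal sketch of that reasoning, using zero-padded integer tuples instead of a full PEP 440 parser (pre-releases and post-releases are ignored, and the candidate versions are only illustrative):

```python
def release_tuple(version, width=3):
    """Turn '4.17' into (4, 17, 0) so that '4.17' and '4.17.0' compare equal."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= version < upper_exclusive."""
    candidate = release_tuple(version)
    return release_tuple(lower_inclusive) <= candidate < release_tuple(upper_exclusive)


# Illustrative candidate releases, not a list taken from PyPI.
CANDIDATES = ["4.16.1", "4.17.0", "4.17.1"]

# The range from the error message: >=4.17.0 and <4.17 is empty.
print([v for v in CANDIDATES if satisfies(v, "4.17.0", "4.17")])

# A range that was presumably intended, assuming an upper bound of 4.18.
print([v for v in CANDIDATES if satisfies(v, "4.17.0", "4.18")])
```

The first list is empty for any set of final releases, which matches pip reporting that no distribution satisfies the requirement.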

@maartenbreddels
Member

Yeah, a bug/artifact of our release script. Should be good now.

@ddelange
Author

ddelange commented Aug 3, 2023

hoi @maartenbreddels 👋

I pulled master and fixed merge conflicts, but it looks like CI is still not very happy. Seeing errors like hdf file missing on disk, and TypeError: train() got an unexpected keyword argument 'early_stopping_rounds'.

Do you think it might be related to this PR?

ddelange referenced this pull request in rapidfuzz/RapidFuzz Aug 10, 2023
@franz101
Contributor

Just wondering here about the Python packaging. Python 3.6 and 3.7 are now deprecated; on the other hand, can we bump to 3.10 and 3.11?

@to-bee

to-bee commented Aug 28, 2023

Do we have any updates on this MR?

@ddelange
Author

ddelange commented Sep 1, 2023

Hi @maartenbreddels 👋

Was your s3 account deleted by any chance?

vaex.open('s3://vaex/taxi/yellow_taxi_2009_2015_f32.hdf5?anon=true')

raises

FileNotFoundError: [Errno 2] Path does not exist 'vaex/taxi/yellow_taxi_2009_2015_f32.hdf5'. Detail: [errno 2] No such file or directory

@ddelange ddelange force-pushed the build-matrix branch 3 times, most recently from 5680eb9 to 2136629 on September 4, 2023 08:28
@JovanVeljanoski
Member

Pushed some changes that should fix the failing tests in vaex-ml

@maartenbreddels
Member

Thank you @JovanVeljanoski !
This is starting to look good. The files that are now missing still need to be restored, and I'm happy to take care of that.
I could use some help with the Python 3.6 and 3.7 failures with micromamba.

@JovanVeljanoski
Member

Looks like lightgbm>4. is not available via conda-forge for python < 3.8.
I will attempt to install it via pip to see if that helps.

@ddelange
Author

base_url = 's3://vaex'

    @pytest.mark.slow
    @pytest.mark.parametrize("base_url", ["gs://vaex-data", "s3://vaex"])
    def test_cloud_glob(base_url):
>       assert set(vaex.file.glob(f'{base_url}/testing/*.hdf5', fs_options=fs_options)) >= ({f'{base_url}/testing/xys-masked.hdf5', f'{base_url}/testing/xys.hdf5'})
E       AssertionError: assert set() >= {'s3://vaex/testing/xys-masked.hdf5', 's3://vaex/testing/xys.hdf5'}
E        +  where set() = set([])
E        +    where [] = <function glob at 0x7f8447317f28>('s3://vaex/testing/*.hdf5', fs_options={'anonymous': 'true'})
E        +      where <function glob at 0x7f8447317f28> = <module 'vaex.file' from '/home/runner/work/vaex/vaex/packages/vaex-core/vaex/file/__init__.py'>.glob
E        +        where <module 'vaex.file' from '/home/runner/work/vaex/vaex/packages/vaex-core/vaex/file/__init__.py'> = vaex.file

tests/cloud_dataset_test.py:45: AssertionError

@maartenbreddels
Member

The hash issues are due to dask/dask#10876
I think it's easier to pin dask to <2024.2.0 and keep doing that for a while and see what changes in the future (will they keep changing, or will they revert back to having the same result as before this release).

This version gives different results. That is not a
problem in production (although it will invalidate
your cache), but in CI we test that we have stable
keys (fingerprints).
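
To make the idea of a stable key concrete, here is a minimal sketch of a deterministic fingerprint built from the standard library. It is not how vaex or dask compute their keys; the `fingerprint` helper is hypothetical and only shows why a CI check on such keys breaks when the underlying hashing changes.

```python
import hashlib
import json


def fingerprint(*args, **kwargs):
    """Return a hex digest that depends only on the argument values.

    Keyword arguments are serialized with sorted keys so that their
    order does not change the result.
    """
    payload = json.dumps(
        {"args": args, "kwargs": kwargs},
        sort_keys=True,
        default=repr,  # fall back to repr() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = fingerprint("x", bins=10, limits=[0, 1])
key_b = fingerprint("x", limits=[0, 1], bins=10)  # same values, other order
key_c = fingerprint("y", bins=10, limits=[0, 1])  # different input

print(key_a == key_b)  # the key does not depend on keyword order
print(key_a == key_c)  # a different input yields a different key
```

A stability test typically compares such a key against a value recorded earlier. If a dependency changes how it hashes, the recorded value no longer matches, which invalidates caches but does not affect correctness.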
@maartenbreddels
Member

Getting greener, but seeing micromamba failing often, and hanging of tests on OSX.

@ddelange
Author

hmm, looks like micromamba is still flakey. maybe relevant? https://stackoverflow.com/a/77333269/5511061

@ddelange
Author

macos seems to be consistently hanging on https://github.com/vaexio/vaex/blob/master/tests/ml/cluster_test.py

any ideas there @JovanVeljanoski?

@EwoutH

EwoutH commented Feb 23, 2024

Can we make this more manageable by splitting it into multiple smaller PRs? Like:

  • Removing old Python versions
  • Updating all the environments and tools used
  • Adding new Python versions (3.11 and 3.12)
  • Adding new platforms (arm64)

I feel the size and complexity of this PR are now holding this effort back.

@EwoutH

EwoutH commented Mar 4, 2024

With #2417 and #2414 I started with two small steps.

@franz101
Contributor

franz101 commented Mar 8, 2024

I compiled a list of all stable releases during the time the last build was working:
#2417 (comment)

I'm not sure which package is causing the hanging tests:
I noticed we pinned pytest-async to 0.15 (the latest is 0.23.5); catboost may also need to be pinned.

@to-bee

to-bee commented Apr 17, 2024

Hi there. Any plans to release this soonish? Really appreciated!

@ddelange
Author

@to-bee it would be a great help if you can install the wheels (see PR description) and report back your environment info + whether the wheels work in your environment!
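
For anyone reporting back, a small sketch like the following collects the basic environment details using only the standard library. The selection of fields is an assumption about what is useful here, not a format the maintainers asked for.

```python
import platform


def environment_report():
    """Collect interpreter and platform details relevant to wheel selection."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The `machine` field is the one that distinguishes x86_64 from arm64/aarch64 installs, which is the main thing this PR changes.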

@to-bee

to-bee commented Apr 24, 2024

@ddelange yes sure.
The wheels are working fine for me. Could install without any problems.
python 3.12.3, Apple M1, ARM64_T6000 arm64, macOS 14.1.1

9 participants