
ARM wheels #12

Merged
wangyiqiu merged 11 commits into wangyiqiu:master from anivegesana:sklearn-wrapper
Dec 1, 2022

Conversation

@anivegesana
Contributor

@anivegesana anivegesana commented Nov 30, 2022

This PR builds the wheels for the ARM architecture and makes some minor rearrangements in the README to bring the parts that talk about Python into the same section. It also adds an autogenerated _version.py file containing the wheel's version, which can be updated automatically from the Git commit hash or tag.
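
Once installed, the version is then available the usual way (a quick sketch; the attribute is re-exported from the generated _version.py):

import dbscan
print(dbscan.__version__)  # e.g. '0.0.10', derived from the Git tag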

I did not compile the Windows ARM wheels because numpy doesn't ship Windows ARM wheels. The Linux ARM wheels are only compiled on master to save build time; a single Linux ARM build takes 20 minutes. The only ARM build that runs routinely is the Mac one, since M1/M2 Macs are the latest generation of Macs.

@anivegesana
Contributor Author

anivegesana commented Nov 30, 2022

I think the Mac builds still take too long, so it is probably best to drop Mac x86 from non-master branch builds. I do not want to rebuild this before we decide on the subset of builds that should run each time, but here are the changes that would save time.

Drop Mac x86 on debug builds
diff --git a/.github/workflows/build_wheels.yml b/.github/workflows/build_wheels.yml
index e476f70..096c096 100644
--- a/.github/workflows/build_wheels.yml
+++ b/.github/workflows/build_wheels.yml
@@ -19,8 +19,6 @@ jobs:
 
     steps:
       - uses: actions/checkout@v3
-        with:
-          fetch-depth: 0
 
       - name: Build
         run: |
@@ -46,19 +44,22 @@ jobs:
           fetch-depth: 0
 
       - name: Set up QEMU
-        if: runner.os == 'Linux'
+        if: runner.os == 'Linux' && (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/tags/v'))
         uses: docker/setup-qemu-action@v2
         with:
           platforms: all
 
       - name: Build wheels (development)
-        if: github.ref != 'refs/heads/master'
+        if: github.ref != 'refs/heads/master' && !startsWith(github.ref, 'refs/tags/v')
         uses: pypa/cibuildwheel@v2.11.2
+        env:
+          CIBW_ARCHS_MACOS: "arm64"
 
       - name: Build wheels (production)
-        if: github.ref == 'refs/heads/master'
+        if: github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/tags/v')
         uses: pypa/cibuildwheel@v2.11.2
         env:
+          CIBW_ARCHS_MACOS: "x86_64 arm64"
           CIBW_ARCHS_LINUX: "auto aarch64"
 
       - uses: actions/upload-artifact@v3
@@ -68,6 +69,7 @@ jobs:
   build_sdist:
     name: Build source distribution
     runs-on: ubuntu-latest
+    if: github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/tags/v')
     steps:
       - uses: actions/checkout@v3
         with:
diff --git a/pyproject.toml b/pyproject.toml
index 0024880..b82ca36 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -53,8 +53,10 @@ write_to = "pythonmodule/_version.py"
 build = "cp36-* cp38-macosx_arm64" # pp39-* cp39-win_arm64
 skip = "*-win32 *_i686 *-musllinux_*"
 
-[tool.cibuildwheel.macos]
-archs = ["x86_64", "arm64"]
+# We will only build x86_64 when merged into master to save on compilation
+# time.
+# [tool.cibuildwheel.macos]
+# archs = ["x86_64", "arm64"]
 
 # We will only build aarch64 when merged into master to save on compilation
 # time.
diff --git a/pythonmodule/__init__.py b/pythonmodule/__init__.py
index ec09bcc..5ddd39f 100644
--- a/pythonmodule/__init__.py
+++ b/pythonmodule/__init__.py
@@ -4,7 +4,7 @@ from ._dbscan import *
 from . import _dbscan
 __all__ = tuple(v for v in dir(_dbscan) if v.startswith('_'))
 try:
-    from ._version import __version__
+    from ._version import version as __version__
     __all__ += ('__version__',)
 except:
     pass

Which do you think is better: having a separate branch for staging the more time-intensive builds, or working mainly on other branches and merging into master once things work on the easier-to-compile subset (4 minutes per build, amounting to about 1 hour of free credits)?

I can change the destination of this PR to master if you think this is good.

@wangyiqiu
Owner

wangyiqiu commented Nov 30, 2022

Hi @anivegesana good suggestion.

Yes, I agree: let's work mainly on other branches and run the time-intensive builds on master. You can change the destination of this PR to the master branch.

By the way, @anivegesana are you interested in becoming a maintainer for the PyPI repository of this project? If yes, let me know your PyPI username.

@anivegesana anivegesana changed the base branch from 0.0.10-dev to master December 1, 2022 04:50
@anivegesana
Contributor Author

anivegesana commented Dec 1, 2022

Hey @wangyiqiu, I created a PyPI username. It matches my GitHub username: @anivegesana. I would be happy to be a maintainer on dbscan, but I don't know how much maintenance this project will require. Using the Python limited ABI together with CI reduces the effort needed to publish to PyPI: the wheels are built automatically and, unlike Cython extensions, are portable across Python versions, so we do not need to periodically recompile the package for users of new Python versions. We compile dbscan once and it is ready forever.
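
For context, roughly what makes this work, sketched as a hypothetical setup.py fragment (the module name and source path here are illustrative, not necessarily the project's actual layout):

from setuptools import setup, Extension

setup(
    ext_modules=[
        Extension(
            "dbscan._dbscan",                   # illustrative module name
            sources=["src/dbscanmodule.cpp"],   # illustrative path
            # Restrict to the stable ABI of CPython >= 3.6, so one wheel
            # works on all later CPython versions.
            define_macros=[("Py_LIMITED_API", "0x03060000")],
            py_limited_api=True,
        )
    ],
)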

The only real maintenance would be adding features or fixing bugs. I will read the paper and get accustomed to the dbscan folder itself. (I have only looked at the outward-facing Python package up until now.) I will also follow the issues and open pull requests to try to resolve them.

Thank you!

@wangyiqiu wangyiqiu merged commit 46e6028 into wangyiqiu:master Dec 1, 2022
@wangyiqiu
Owner

Great, thanks @anivegesana -- I have sent you an invite on PyPI.

As you may already know, the code stems from one of the methods in this paper, and the goal of this repository is to enable more people to use the code in real applications. I really like your work in making the code more accessible by modifying the API, streamlining the build, and improving portability and compatibility. These changes are very important for making a bigger impact.

I think the short-term goal is to make v0.0.10 work well on most platforms/environments without bugs like the ones you mentioned. Looking forward, it may also be interesting to support dimensionality beyond 20 from a software standpoint. Potential solutions include falling back to other existing DBSCAN implementations when the dimensionality is high, or writing our own fast DBSCAN code for higher-dimensional data.

@anivegesana
Contributor Author

anivegesana commented Dec 4, 2022

@wangyiqiu Maybe it would be a good idea to look into manually triggering builds on master. There is a rather limited budget of free minutes, and it would be irritating not to be able to build things if the minutes run out. In fact, maybe we should figure that out before publishing v0.0.10?

@wangyiqiu
Owner

That's a good idea, though I have not figured out how to do that. For now, I will upgrade to Pro if the minutes run out.

@anivegesana
Contributor Author

I can also look into other CIs like CircleCI and Cirrus CI. If worst comes to worst, we can just disable macOS builds on master and development branches, and I could periodically build them on my MacBook. macOS builds cost the most because Apple makes it hard to reverse-engineer their products and create virtual machines, so the builds run on physical Mac hardware. This additional cost would be the same across all CIs; the only difference would be how much free credit each gives per month. Building locally would be inconvenient, but it would work in the short term.

@anivegesana
Contributor Author

anivegesana commented Dec 10, 2022

@wangyiqiu Considering that we don't seem to use that many builds a month, I don't think it will be a big concern for now. I can look into it a bit later. You can go ahead and push the release.

We can stick to the standard C++ library and get away with testing development branches only on Linux, because we would then expect the code to work on every system that supports C++. If we choose to do that, we should first rewrite example.py as a test case to ensure that the wheels work properly before publishing.

@wangyiqiu
Owner

Sounds good, I agree.

I just pushed a branch called 0.0.11-dev. I thought of a few items that we can include in this update (including the one you mentioned):

  1. Support float32, int32 and/or int64 in the Python API, where the basic idea is to cast non-double input into doubles. I tried to work with this function call in dbscanmodule.cpp but did not get it to work. I tried instead to call a PyArray_CheckFromAny function followed by some type casts, but the data seems to be garbled at the beginning. Are you familiar with how this works? Also, would it be easier to handle the data type conversion in Python rather than C++?
  2. Update example.py like you mentioned. Let's test most/all of the wheels before publishing 0.0.11.
  3. Update the running time diagram in case the performance characteristics changed since 0.0.10.
  4. Use our current README as the description for the PyPI repository.

Any thoughts?

@anivegesana
Contributor Author

anivegesana commented Dec 11, 2022

Hi Yiqiu, thanks for the update! (pun intended)

  1. I could be wrong, but PyArray_FROMANY creates a copy of the original array object if the dtype doesn't match. It isn't explicitly stated in the documentation for that function, but there is an example for the similar np.asarray. In addition, I believe that the example still works when using int64 or the other types you mentioned:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
import dbscan

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

X_labels, X_core = dbscan.DBSCAN(X, eps=0.3, min_samples=10)

Y = (X * 100000).astype(np.int64)
Y_labels, Y_core = dbscan.DBSCAN(Y, eps=30000, min_samples=10)
assert (X_labels == Y_labels).all() and (X_core == Y_core).all()

  2. Well, I believe that example.py is still useful for people to understand how to use the library. I think we should have a separate file test_python_module.py that contains an almost exact copy of example.py but can be run without matplotlib; see the sketch after this list.
  3. Yep. It is a good idea to retime everything and make sure that the figure is accurate and truthful and that nothing got slower. If it did get slower, however, it is not likely due to significant overhead from the limited API, because that should only add a couple of extra function calls, which is many times smaller than the number of operations in the DBSCAN code itself. If there is a big timing difference, we should investigate whether we introduced a bug between versions.
  4. It seems that I didn't read the fine print on something. setuptools doesn't yet support all of the fields in pyproject.toml, even though they are published by the same people, so I shouldn't have moved all of the fields from setup.py to pyproject.toml. I will put the information in both places so that setuptools is happy. We can publish a v0.0.10.1 to fix that problem, or just try to fix the following issue and make a v0.0.11 release.
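
A minimal sketch of what that test file might look like (it assumes the same synthetic data as example.py; the function name and the cluster-count assertion are illustrative):

# test_python_module.py -- an almost exact copy of example.py, minus matplotlib
import dbscan
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

def test_three_blobs():
    centers = [[1, 1], [-1, -1], [1, -1]]
    X, labels_true = make_blobs(n_samples=750, centers=centers,
                                cluster_std=0.4, random_state=0)
    X = StandardScaler().fit_transform(X)
    labels, core_samples = dbscan.DBSCAN(X, eps=0.3, min_samples=10)
    # With these parameters the three blobs should come out as three
    # clusters; noise points are labeled -1.
    assert len(set(labels) - {-1}) == 3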

A bug that I noticed but was unable to track down: after running the DBSCAN function, CPU utilization remains high even after the function terminates. It also happened in 0.0.9, so I think it is probably somewhere in the C++ library itself, likely in pbbs. If you have any ideas, feel free to investigate, because I don't really know where to start.

@anivegesana
Contributor Author

anivegesana commented Dec 11, 2022

Fixed the problems mentioned above (except for the CPU usage bug). The Python test cases are run during the build phase of cibuildwheel. I manually verified that the wheel's METADATA contains the README information, and added a test case that checks the wheel's metadata.
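
For reference, the metadata check can look something like this (a sketch, assuming the distribution is installed under the name dbscan and the long description content type was declared as markdown):

from importlib.metadata import metadata  # Python 3.8+

meta = metadata("dbscan")
# The long description (the README) is carried as the message payload.
assert meta["Description-Content-Type"] == "text/markdown"
assert "DBSCAN" in meta.get_payload()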

Here are the changes: #13

Looking into the CPU usage issue. Let's see if I can tackle that one too. The unnecessary busy wait is on line 283 of scheduler.h. It looks pretty hacky:

std::this_thread::sleep_for(std::chrono::nanoseconds(num_deques * 100));

Why did they not use condition_variables? That would let the OS handle the waiting instead of re-awakening the thread every num_deques * 100 nanoseconds only for it to fall asleep again. Surprisingly enough, pbbs's successor parlaylib still seems to keep this ugly line of code: https://github.com/cmuparlay/parlaylib/blob/master/include/parlay/scheduler.h#L266

I need to find some time to mess around with it to make sure I am not adding any deadlocks, so maybe I should create the PR now for the fixes above and slowly work out how to fix this up later.

@anivegesana
Contributor Author

anivegesana commented Dec 13, 2022

I have a solution here: https://github.com/anivegesana/dbscan-python/tree/atomic_wait

I am trying to use condition_variable so that we don't have to include a backport of the C++20 synchronization primitives (std::atomic_wait). I don't think it can be done without semaphores or shared mutexes (any synchronization primitive that allows multiple threads to access one resource at the same time and wait until that resource is available), both of which came into the standard later, so we might have to include this repo: https://github.com/ogiroux/atomic_wait. I don't know whether you would rather include a copy of it directly in this repo (like pbbs) or have the build system download it during compilation (like gtest). Any thoughts/knowledge/experience you can lend?

@anivegesana anivegesana deleted the sklearn-wrapper branch December 13, 2022 02:23
@wangyiqiu
Owner

Thanks! @anivegesana I'm still looking into it and will get back to you asap.

@anivegesana
Contributor Author

The other option is to drop older platforms and upgrade to C++20. Right now, I use defaults for a lot of the cibuildwheel settings, so for most of the platforms the wheels are compiled on Python 3.6. Changing the settings to compile on 3.8 (so that the versions of GCC match on Linux) wouldn't change anything for the end user, since we never promised to support versions older than that. The problem is with dropping older OS versions (like macOS Catalina and Linux distributions with GCC < 12). We could do it, but I do not know whether people are installing this package on outdated OSes, and it might be a good idea to leave this small buffer for a year or two in case they are.

@anivegesana
Contributor Author

anivegesana commented Dec 16, 2022

I think it might be possible to just stick with C++17 and still remove the busy waiting in the scheduler. This pattern seems to do the same thing as the C++20 atomic_wait, but I have only tried it out on macOS Big Sur and Ubuntu.

#ifdef __cpp_lib_atomic_wait
  std::atomic<int> deque_size;
#else
  // Pre-C++20 fallback: emulate std::atomic's wait/notify interface with a
  // condition_variable_any (requires <atomic> and <condition_variable>).
  struct : std::atomic<int> {
    std::condition_variable_any cv{};

    inline void notify_one() {
      cv.notify_one();
    }
    inline void notify_all() {
      cv.notify_all();
    }
    inline void wait(int val) {
      // condition_variable_any works with any BasicLockable type; there is
      // no state to protect here, so a shared no-op mutex suffices.
      static struct {
        void lock() {}
        void unlock() {}
      } mutex{}; // a dummy mutex

      bool waited = false;
      // Only block if the value still matches, mirroring std::atomic::wait;
      // the predicate makes the thread sleep until exactly one notification.
      if (load() == val)
        cv.wait(mutex, [&]() {
          bool old_waited = waited;
          waited = true;
          return old_waited;
        });
    }
  } deque_size;
#endif

I am trying to verify that this performs the same operation as atomic_wait; if so, that would save the headache of vendoring the atomic_wait backport. I am also now keeping track of the number of jobs available, because that seems to work better (anivegesana@70f2a45). I still do not know whether this is equivalent or even works; I will look into verifying that.

@wangyiqiu
Owner

Hi @anivegesana thanks for finding this issue!
I tested the Python module, and indeed the scheduler does not stop after the DBSCAN computation.

The parlay/pbbs scheduler is designed for C++ programs, such that a unique copy of the scheduler is created globally, and the scheduler is only destroyed at program exit. The problem you identified points to the high utilization of the CPU between the end of the DBSCAN function call and the end of the Python session, and this is because the scheduler is still expecting more work (and the threads are spinning).

The solution you proposed is a good one in the sense that it keeps the scheduler active in the background, but potentially without causing high CPU utilization before the next DBSCAN call. The downside, like you said, is that it may require comprehensive testing of the scheduler, which can be quite a lot of work.

I proposed a quick fix (9a27ef4) by adding a start and stop function for the scheduler, and calling them at the correct places. Essentially, a new copy of the scheduler is responsible for each DBSCAN function call, rather than one scheduler being active during the entire Python session. The downside is that we need to start and stop the scheduler at the correct places in the code, and maybe there is some scheduler initialization overhead.

For the work you have in progress (using condition variables and atomic wait), I think it is great and we can keep it for future releases. I also encourage you to submit a PR to parlaylib to help them fix the same scheduler issue. Let me know what you think.

@anivegesana
Contributor Author

anivegesana commented Jan 3, 2023

Happy new year, @wangyiqiu!

Sounds good. It would be easier to just add a temporary fix and create an issue on the parlaylib repo. That repo probably has more comprehensive tests that would be able to tell if the fix works. Eventually, when they get a general solution, we can upgrade dbscan to use the latest parlaylib version.

@wangyiqiu
Owner

Sounds good, @anivegesana. Happy new year to you, too!

@anivegesana
Contributor Author

Created the issue at cmuparlay/parlaylib#33. I will test my solution against their test cases and create a PR there after I get it to pass. I think the workaround 9a27ef4 that you pushed yesterday is good enough for this package, if you want to push another release.

@wangyiqiu
Owner

Thanks, it looks good.

I published v0.0.11, and it seems to work well. Before publishing it, I encountered some errors that seem to come from the Python test configuration: https://github.com/wangyiqiu/dbscan-python/actions/runs/3858728146
For now I have temporarily disabled the Python tests; I will look into it soon.

I also created a v0.0.12 branch for the next version: https://github.com/wangyiqiu/dbscan-python/tree/0.0.12-dev

@anivegesana
Contributor Author

anivegesana commented Jan 6, 2023

Hmm. That looks odd. I guess the Python tests weren't being run on non-master branches, so I didn't notice that I removed the "py36" extra that the test cases use.

CIBW_TEST_EXTRAS: "test36"

The extras are specified in setup.py. I was supposed to add another extra: 'py36': ['scikit-learn', 'matplotlib', 'pytest'].
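
For reference, a minimal sketch of the relevant setup.py fragment (the extra name is the part that has to match whatever CIBW_TEST_EXTRAS requests; everything else is elided):

from setuptools import setup

setup(
    # ...name, version, ext_modules, etc. elided...
    extras_require={
        'py36': ['scikit-learn', 'matplotlib', 'pytest'],
    },
)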

I will look into it soon and verify that this was the issue.

I will also see if I can get the images to show in PyPI.

@anivegesana
Contributor Author

anivegesana commented Jan 8, 2023

Somewhat related to the issue from yesterday, it might be a good idea to swap Option 1 and Option 2 in the README. Putting the C++ instructions first might make it seem like there is some benefit to rebuilding the pure C++ version over using the prebuilt Python one. Let me know if you think this is a good idea.

Also, I noticed that the large paragraph on how to build the Python module from scratch seems a bit Ubuntu-specific and contains information that isn't relevant anymore now that the build is automated.

To build from scratch for testing, this command seems sufficient (with the current directory set to the repo's root):

pip install -e .

Finally, a small thing that you are probably already doing right: are you running both the C++ and sklearn versions with multiple threads? In case you aren't, there is an n_jobs parameter in the sklearn version that needs to be set to -1 to turn on multithreading:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
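
For instance (a sketch, assuming X is the standardized array from the earlier example):

from sklearn.cluster import DBSCAN

# n_jobs=-1 spreads the neighbor search across all available cores.
labels = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit_predict(X)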

@wangyiqiu
Owner

wangyiqiu commented Jan 9, 2023

Hi @anivegesana
Thanks for the suggestions!

Yes, let's swap Options 1 and 2. I agree we should steer users toward the prebuilt Python package. Like you suggested, let's also replace the Ubuntu-specific description with pip install -e ..

I had actually been testing against the sequential sklearn, since I did not know it could also run in parallel! I will rerun the tests to compare against the parallel sklearn and update the figure/numbers.

@anivegesana
Contributor Author

Maybe having four columns (sequential and parallel for each) would be helpful. I don't know whether the bars would be legible if the two parallel ones are similar in magnitude. Do you think using a log scale would make it clearer?
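
Something like this, perhaps (a sketch with placeholder timings, just to show the four columns and the log-scale axis):

import matplotlib.pyplot as plt

methods = ['dbscan seq', 'dbscan par', 'sklearn seq', 'sklearn par']
seconds = [40.0, 4.0, 300.0, 30.0]  # placeholder numbers, not measurements
plt.bar(methods, seconds)
plt.yscale('log')  # keeps bars of different magnitudes legible
plt.ylabel('running time (s)')
plt.show()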

@wangyiqiu
Owner

Sounds good, I will make four columns. Some of the bars can be quite similar. I will use the log scale when that happens.
