[CI] Add macos build test #3994
Conversation
~/.triton/nvidia
~/.triton/pybind11
~/.triton/json
key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}-nvidia-${{ steps.cache-key.outputs.nvidia }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
Instead of copy-pasting sections like this one, can we use YAML anchors? Replace this section with
- *cache-build-dependencies-step
(Also it seems that we're caching ~/.triton/json here but I don't see it in the CUDA runner. That's part of the reason to use the anchors, so that code that's supposed to be the same is the same.)
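For illustration, a minimal sketch of how an anchor could deduplicate this, assuming the workflow's YAML parser accepts plain-YAML anchors (the step body below is hypothetical, not the PR's actual step):
# Hypothetical sketch: define the shared step once, under an anchor...
.cache-deps: &cache-build-dependencies-step
  name: Cache build dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.triton/nvidia
      ~/.triton/pybind11
      ~/.triton/json
    key: ${{ runner.os }}-${{ runner.arch }}-...
# ...then reference it from every runner that needs it:
steps:
  - *cache-build-dependencies-step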
No problem, I'll do that. Just trying to make the build runner pass first.
ty ty
Sorry that there's some unused cruft in here. I'm happy to remove it after you're done with your changes, if @ThomasRaoux is ok with that.
@@ -155,7 +155,7 @@ jobs:
 runs-on: ${{ matrix.runner }}
 timeout-minutes: 30
 env:
-  CMAKE_BUILD_TYPE: "MIN_SIZE_REL"
+  CMAKE_BUILD_TYPE: "Debug"
Just to over-communicate: I would not expect "Debug" or "MinSizeRel" to be the fastest build; I'd expect -O1 is probably faster than both.
But also, seeing as Debug seems to be the same speed as Release (15m), maybe something else is causing the slowness...
GitHub is not loading the logs for the most recent build, so I wasn't able to investigate this myself, but it may be worth investigating, if you haven't, how much of the 12m30s build step is due to downloading dependencies versus actually building Triton. IIRC downloading the CUDA and nvidia dependencies was slow for Linux, so we started caching them. But I'm not sure if the cache works here yet.
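(One cheap way to get that breakdown, sketched below with hypothetical step names and a placeholder download command: split the single build step in two, so the Actions UI reports a separate duration for each part.)
# Hypothetical split of the build step; each step is timed separately
# in the Actions UI, separating download time from actual build time.
- name: Download build dependencies  # placeholder command below
  run: python3 -m pip download -d /tmp/deps -r python/requirements.txt
- name: Build Triton
  run: python3 -m pip install -e python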
Hi @jlebar, it's ready for re-review. :)
brew install ccache
brew install llvm
Suggested change:
- brew install ccache
- brew install llvm
+ brew install ccache llvm
TRITON_BUILD_WITH_O1: "true"
# macos-latest has a 3-core M1 vcpu
# https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
MAX_JOBS: 3
I don't think this is quite the right comment. ninja -j will read the number of CPUs and launch f(ncpu) jobs, I think 2*ncpu.
You're restricting to 3 jobs because, I presume, you found it's faster? Probably due to limited RAM?
Yes, due to limited RAM. I'd like to launch ncpu jobs instead of 2*ncpu because M1 doesn't have hyperthreading.
Right, I'd just suggest updating the comment to explain the reason, which isn't really that M1 only has 3 cores.
(Also, usually build systems will do 2*ncpu even if there's no hyperthreading, because sometimes jobs are I/O bound. Indeed I think they probably do 2*nvcpu, i.e. if M1 had hyperthreading it would by default launch 12 jobs.)
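Concretely, the comment could say something like the following (my wording, a sketch of the suggestion rather than the PR's actual text):
env:
  # Cap build parallelism: ninja's default of ~2*ncpu jobs runs the
  # 3-vCPU M1 runner out of RAM, which ends up slower than 3 jobs.
  MAX_JOBS: 3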
python3 -m pip install cython setuptools wheel cmake==3.24 ninja pytest-xdist lit
- name: Install Triton
env:
TRITON_BUILD_WITH_CCACHE: "true"
I don't see us caching the ccache directory, in which case ccache is just overhead.
Sorry, forgot the caching step...
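A minimal sketch of such a step, assuming actions/cache and a ccache directory pinned via CCACHE_DIR (the path and key are illustrative, not the PR's actual values):
- name: Cache ccache directory
  uses: actions/cache@v4
  with:
    path: ~/.ccache  # assumes CCACHE_DIR=~/.ccache is exported for the build
    key: ${{ runner.os }}-${{ runner.arch }}-ccache-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-${{ runner.arch }}-ccache-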
@@ -35,7 +35,8 @@ endif()

 # Check if the platform is MacOS
 if(APPLE)
-  set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup -flto")
+  set(CMAKE_SHARED_LIBRARY_SUFFIX ".so")
+  set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
Suggested change:
- set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
+ # Other platforms build with -flto, but we found that this adds significant overhead to our macos CI without providing a major benefit.
+ set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
Woo, I'm excited for more CI.
LGTM!