[CI] Add macos build test #3994
Conversation
~/.triton/nvidia
~/.triton/pybind11
~/.triton/json
key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}-nvidia-${{ steps.cache-key.outputs.nvidia }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
Instead of copy-pasting sections like this one, can we use YAML anchors? Replace this section with
- *cache-build-dependencies-step
(Also it seems that we're caching ~/.triton/json here but I don't see it in the CUDA runner. That's part of the reason to use the anchors, so that code that's supposed to be the same is the same.)
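For illustration, a minimal sketch of how an anchor could deduplicate this, assuming the workflow's YAML parser accepts plain-YAML anchors (the step body below is hypothetical, not the PR's actual step):
# Hypothetical sketch: define the shared step once, under an anchor...
.cache-deps: &cache-build-dependencies-step
  name: Cache build dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.triton/nvidia
      ~/.triton/pybind11
      ~/.triton/json
    key: ${{ runner.os }}-${{ runner.arch }}-...
# ...then reference it from every runner that needs it:
steps:
  - *cache-build-dependencies-step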
No problem, I'll do that. Just trying to make the build runner pass first.
ty ty
Sorry that there's some unused cruft in here. I'm happy to remove it after you're done with your changes, if @ThomasRaoux is ok with that.
@@ -155,7 +155,7 @@ jobs:
 runs-on: ${{ matrix.runner }}
 timeout-minutes: 30
 env:
-  CMAKE_BUILD_TYPE: "MIN_SIZE_REL"
+  CMAKE_BUILD_TYPE: "Debug"
Just to over-communicate: I would not expect "Debug" or "MinSizeRel" to be the fastest build; I'd expect -O1 is probably faster than both.
But also, seeing as Debug seems to be the same speed as Release (15m), maybe something else is causing the slowness...
GitHub is not loading the logs for the most recent build, so I wasn't able to investigate this myself, but it may be worth investigating, if you haven't, how much of the 12m30s build step is due to downloading dependencies versus actually building Triton. IIRC downloading the CUDA and nvidia dependencies was slow for Linux, so we started caching them. But I'm not sure if the cache works here yet.
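(One cheap way to get that breakdown, sketched below with hypothetical step names and a placeholder download command: split the single build step in two, so the Actions UI reports a separate duration for each part.)
# Hypothetical split of the build step; each step is timed separately
# in the Actions UI, separating download time from actual build time.
- name: Download build dependencies  # placeholder command below
  run: python3 -m pip download -d /tmp/deps -r python/requirements.txt
- name: Build Triton
  run: python3 -m pip install -e python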
Hi @jlebar, it's ready for re-review. :)
brew install ccache
brew install llvm
Suggested change:
- brew install ccache
- brew install llvm
+ brew install ccache llvm
TRITON_BUILD_WITH_O1: "true"
# macos-latest has a 3-core M1 vcpu
# https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
MAX_JOBS: 3
I don't think this is quite the right comment. ninja -j will read the number of CPUs and launch f(ncpu) jobs, I think 2*ncpu.
You're restricting to 3 jobs because, I presume, you found it's faster? Probably due to limited RAM?
Yes, due to limited RAM. I'd like to launch ncpu jobs instead of 2*ncpu because M1 doesn't have hyperthreading.
Right, I'd just suggest updating the comment to explain the reason, which isn't really that M1 only has 3 cores.
(Also, usually build systems will do 2*ncpu even if there's no hyperthreading, because sometimes jobs are I/O bound. Indeed I think they probably do 2*nvcpu, i.e. if M1 had hyperthreading it would by default launch 12 jobs.)
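Concretely, the comment could say something like the following (my wording, a sketch of the suggestion rather than the PR's actual text):
env:
  # Cap build parallelism: ninja's default of ~2*ncpu jobs runs the
  # 3-vCPU M1 runner out of RAM, which ends up slower than 3 jobs.
  MAX_JOBS: 3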
python3 -m pip install cython setuptools wheel cmake==3.24 ninja pytest-xdist lit
- name: Install Triton
env:
TRITON_BUILD_WITH_CCACHE: "true"
I don't see us caching the ccache directory, in which case ccache is just overhead.
Sorry, forgot the caching step...
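A minimal sketch of such a step, assuming actions/cache and a ccache directory pinned via CCACHE_DIR (the path and key are illustrative, not the PR's actual values):
- name: Cache ccache directory
  uses: actions/cache@v4
  with:
    path: ~/.ccache  # assumes CCACHE_DIR=~/.ccache is exported for the build
    key: ${{ runner.os }}-${{ runner.arch }}-ccache-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-${{ runner.arch }}-ccache-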
@@ -35,7 +35,8 @@ endif()

 # Check if the platform is MacOS
 if(APPLE)
-  set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup -flto")
+  set(CMAKE_SHARED_LIBRARY_SUFFIX ".so")
+  set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
Suggested change:
- set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
+ # Other platforms build with -flto, but we found that this adds significant overhead to our macos CI without providing a major benefit.
+ set(PROTON_PYTHON_LDFLAGS "-undefined dynamic_lookup")
Woo, I'm excited for more CI.
LGTM!