-
Notifications
You must be signed in to change notification settings - Fork 288
feat(ci): Split Java Gradle CI in many jobs to reduce execution time #1897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.github/workflows/gradle.yml
Outdated
steps: | ||
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 | ||
- name: Set up JDK 21 | ||
- name: Set up JDK 23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the ci before runs whole check, build, and tests on JDK 21 because Polaris is currently guarantee the compatibility with java 21, and then we are only doing selected check with java 23.
If we just update the JDK version to 23, that means we will loose the test coverage for 21, right? if the intension of this PR is to just split the ci into multiple jobs, can we keep the java 21 setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry that wasn't my intention. I can set the "Style check" job to 21, would that be enough? If not, I'm afraid we'd need a matrix job.
Also, tbh we should upgrade 23 to 24 but I don't want to introduce any unwanted changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original CI have the following coverage for 21
run: ./gradlew --continue check
which I believe runs all tests, and on 23 it only does compilation and integration tests. However, with the current change, it seems we are doing full test coverage with java 23, but only style check with 21. Can we still retain the full test coverage with 21, limited test with 23 like before ? so that this is a pure refactoring task
If we also want to introduce full coverage with other jdk version like 23, we can do in follow up PR to introduce a test matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that will limit the amount of parallelism to just 2 jobs, but if that's what you want, OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back to 3 jobs, 2 with 21 and 1 with 23, because 2 jobs only was too slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All jobs on 21 now, as per @snazy suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, for the confusion. i don't really mean still run ./gradlew --continue check. what i mean is when we are breaking it into jobs, make sure the tasks runs on 21 still provides the fully test coverage, include unit test, initTest, style check, publishToMaven etc. For 23 it can remain just have one job with coverage for integration test.
However, it seems we are now removing the coverage on 23 now because 23 is going to be end of like, which i think should be fine.
If we want to further breakdown the job runs with 21 to more jobs, i am also fine with that.
.github/workflows/gradle.yml
Outdated
permissions: | ||
contents: read | ||
steps: | ||
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 | ||
- name: Set up JDK 23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
23 is EOL - and 24 won't work. Probably better to go back to 21?
|
||
unit-tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update .asf.yaml
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to have a problem to merge this PR because the build
job is still marked as required, but it doesn't exist anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the old build
job around to make it possible to merge this PR. We'd need to remove that job in a separate PR.
.github/workflows/gradle.yml
Outdated
- name: Run Quarkus tests | ||
run: | | ||
./gradlew \ | ||
:polaris-quarkus-service:test \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we isolate quarkus test
, but not intTest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume intTest
takes the longest time in general, so it might be preferable to break it out if the goal is to reduce CI time 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we isolate quarkus test, but not intTest?
Because these two jobs are roughly taking 8 minutes each now, so that sounded like a balanced choice. In fact, the intTest
job is not necessarily the longest the job currently 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: the job is named "Quarkus tests" for simplicitly but it actually runs all the tests in polaris-runtime-service
and polaris-admin
.
We could isolate only the tests annotated with @QuarkusTest
or QuarkusMainTest
but that's not a trivial thing to do with Gradle, so I went for simplicity. We can revisit that later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. Please consider my comments optional.
Just figured out that we should have another change to either this PR or a followup around Gradle caches. |
@snazy the caches seem to be working afaict; what makes you say they are "borked"? |
"Last Gradle cache store" wins. The next job will read the state of that "last one". So you either have test, intTest or ... - but not a merged state of all. This is what Nessie CI does - merging the cache state of all jobs. |
OK, I will tackle that in a follow-up since this one was approved already. |
Hey @adutra, did this possibly break the tests? I notice that the new tests are stuck in a pending state and PRs can't be merged. ![]() |
Yes, this was expected unfortunately. If you rebase the PR it should get unstuck. |
Since apache#1897, the jobs in gradle.yaml changed and the "build" job was split into many smaller jobs. But since it was a required job, it couldn't be removed immediately.
Since #1897, the jobs in gradle.yaml changed and the "build" job was split into many smaller jobs. But since it was a required job, it couldn't be removed immediately.
This is a simple attempt to reduce the CI execution time for pull requests. Instead of executing all tests in the same job, it parallelizes the tests in 4 jobs:
This could certainly be improved further, but I observed a total execution time around 10 minutes instead of 20 minutes before (at the cost of more compute power, of course).