-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-45505: [CI][R] Use Ubuntu 22.04 instead of 20.04 as much as possible for nightly jobs #45507
Conversation
@github-actions crossbow submit -g r |
|
@@ -313,8 +313,8 @@ jobs: | |||
# fedora-clang-devel cannot use binaries bc of libc++ (uncomment to see the error) | |||
# - {image: "rhub/fedora-clang-devel", libarrow_binary: "TRUE"} | |||
- {image: "rhub/ubuntu-release"} # currently ubuntu-22.04 | |||
- {image: "rocker/r-ver:4.0.0"} # ubuntu-20.04 | |||
- {image: "rstudio/r-base:4.1-focal"} | |||
- {image: "rstudio/r-base:4.0-jammy"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK?
I assume that R 4.0 (not Ubuntu 20.04) is important here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonkeane Do you have any opinion for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, you're absolutely right that 4.0 is important and not the version — that comment was mostly to call attention to the fact that rocker/r-ver
would end up being ubuntu 20.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming this!
Revision: be959e5 Submitted crossbow builds: ursacomputing/crossbow @ actions-b1cf7ac8c9 |
@@ -89,7 +89,7 @@ jobs: | |||
- { old_arrow_version: '1.0.1', r: '4.0' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove 1.0.1
- 5.0.0
?
It seems that we don't have Arrow C++ binaries for them on Ubuntu 22.04.
https://github.com/ursacomputing/crossbow/actions/runs/13279720095/job/37133898415#step:7:21
Failure ('test-read-files.R:25:3'): Can read the file (parquet)
`df <- read_parquet(pq_file)` threw an error.
Message: Cannot call io___MemoryMappedFile__Open(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries.
Class: simpleError/error/condition
Backtrace:
▆
1. ├─testthat::expect_error(df <- read_parquet(pq_file), NA) at test-read-files.R:25:3
2. │ └─testthat:::quasi_capture(...)
3. │ ├─testthat (local) .capture(...)
4. │ │ └─base::withCallingHandlers(...)
5. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
6. └─arrow::read_parquet(pq_file)
7. └─arrow:::make_readable_file(file)
8. └─arrow::mmap_open(file)
9. └─arrow:::io___MemoryMappedFile__Open(path, mode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, we could, but would it be possible to install them from source instead? Or is that not possible at this point?
It would be great to be able to assert that we are still backwards compatible, though in practice going back this far is probably not super important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It seems that binary "arrow" package (not source "arrow" package) is used:
https://github.com/ursacomputing/crossbow/actions/runs/13279720095/job/37133898415#step:4:275
* installing *binary* package ‘arrow’ ...
Can we install Apache Arrow C++ from source when we use binary "arrow" package...?
@@ -89,7 +89,7 @@ jobs: | |||
- { old_arrow_version: '1.0.1', r: '4.0' } | |||
env: | |||
ARROW_R_DEV: "TRUE" | |||
RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" | |||
RSPM: "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we moved this later and did something like:
- name: Set RSPM
if: ${{ matrix.config.use_rspm }}
run: echo "RSPM =https://packagemanager.rstudio.com/cran/__linux__/jammy/latest" >> $GITHUB_ENV
And then set use_rspm
in the matrix:
- { old_arrow_version: '14.0.2.1', r: '4.3', use_rspm = true }
- { old_arrow_version: '13.0.0.1', r: '4.3', use_rspm = true }
- { old_arrow_version: '12.0.1.1', r: '4.3', use_rspm = true }
- { old_arrow_version: '11.0.0.3', r: '4.2', use_rspm = true }
- { old_arrow_version: '10.0.1', r: '4.2', use_rspm = true }
- { old_arrow_version: '9.0.0', r: '4.2', use_rspm = true }
- { old_arrow_version: '8.0.0', r: '4.2', use_rspm = true }
- { old_arrow_version: '7.0.0', r: '4.1', use_rspm = true }
- { old_arrow_version: '6.0.1', r: '4.1', use_rspm = true }
- { old_arrow_version: '5.0.0', r: '4.1', use_rspm = false }
- { old_arrow_version: '4.0.0', r: '4.0', use_rspm = false }
- { old_arrow_version: '3.0.0', r: '4.0', use_rspm = false }
- { old_arrow_version: '2.0.0', r: '4.0', use_rspm = false }
- { old_arrow_version: '1.0.1', r: '4.0', use_rspm = false }
That should prevent setting RSPM and then it will use source installs when that's not set. This might not work / have other issues, but it's worth trying IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's try it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, it may be better that we do this backward compatibility test in C++ not R...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be better that we do this backward compatibility test in C++ not R...
Yeah, totally. There are some R-specific pieces of metadata that it's nice to test, but if we did have c++ compatibility testing for the underlying parquet implementation, the decision to drop 1.0.1 - 5.0.0 testing for R does become less of a worry (yes, the R-specific metadata might not work, but that's also not so catastrophic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I didn't know that these tests have some R-specific checks.
@github-actions crossbow submit test-r-arrow-backwards-compatibility |
This comment was marked as outdated.
This comment was marked as outdated.
8bdd4ef
to
3636c5c
Compare
@github-actions crossbow submit test-r-arrow-backwards-compatibility |
Revision: 3636c5c Submitted crossbow builds: ursacomputing/crossbow @ actions-4f1a2ba4a1
|
We got a few more versions with that! 1.0.1–3.0.0 look like they fail when trying to build some of our dependencies. That's not totally surprising, I guess. Arrow 3.0.0 was just over 4 years ago, so maybe we do drop those from the matrix. And especially if we had C++ tests for this kind of backwards compatibility like you mention @kou that would give us more confidence too. One other alternative (though I think it's more work than it's worth right here / right now) would be to refactor this action so that the old version of arrow is run on an older OS that it is compatible with. |
We can use Ubuntu 20.04 for 1.0.0-3.0.0 but we need to remove them eventually because Ubuntu 20.04 will reach EOL. Should we keep 1.0.0-3.0.0 while we don't get any error with Ubuntu 20.04? Or can we drop 1.0.0-3.0.0 in this PR? |
Let's keep them for now (with 20.04, and we might / will eventually get an error) and I can look into what we want to do going forward |
3636c5c
to
eb57ce6
Compare
@github-actions crossbow submit test-r-arrow-backwards-compatibility |
Revision: 83c378f Submitted crossbow builds: ursacomputing/crossbow @ actions-288e5d5ed0
|
OK! Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the work on this ❤️
# condition && true-case || false-case | ||
# == | ||
# condition ? true-case : false-case | ||
# | ||
# We use Ubuntu 20.04 for 3.0.0 or earlier because 3.0.0 or | ||
# earlier can't be built on Ubuntu 22.04. We will drop 3.0.0 or | ||
# earlier when Ubuntu 20.04 is unavailable on GitHub Actions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment + the helpful hint about how these work — I always have to go and look these up when I edit them.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 15f1d94. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…possible for nightly jobs (apache#45507) ### Rationale for this change Ubuntu 20.04 will reach EOL on 2025-05. ### What changes are included in this PR? * Use Ubuntu 22.04 instead of Ubuntu 20.04 for Apache Arrow C++ 4.0.0 or later. * Keep using Ubuntu 20.04 for Apache Arrow C++ 3.0.0 or earlier because we can't build Apache Arrow C++ 3.0.0 or earlier on Ubuntu 22.04. We can use pre-built binaries forApache Arrow C++ 3.0.0 or earlier on Ubuntu 20.04. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#45505 Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
Ubuntu 20.04 will reach EOL on 2025-05.
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
No.