GH-46214: [C++] Improve S3 client initialization #46723

Merged
kou merged 1 commit into apache:main from gh46214-s3-client-config
Jun 25, 2025

@pitrou pitrou commented Jun 5, 2025

Rationale for this change

The default constructor of the S3ClientConfiguration class in the AWS SDK issues spurious EC2 metadata requests, even though we later set up the configuration values ourselves.

What changes are included in this PR?

  1. Avoid spurious EC2 metadata calls by disabling IMDS (the EC2 Instance Metadata Service) in the S3ClientConfiguration constructor
  2. Change the smart defaults from "legacy" to "standard" (see https://docs.aws.amazon.com/sdkref/latest/guide/feature-smart-config-defaults.html)
  3. Let the user configure the smart defaults in S3Options
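
For readers on a build that predates this change, one way to sidestep the metadata probing described in item 1 is the AWS SDK's `AWS_EC2_METADATA_DISABLED` environment variable. The sketch below is not part of this PR. The helper name `without_ec2_metadata` is hypothetical, and the sketch assumes the SDK reads the variable when the client configuration is constructed, so it has to be set before the filesystem is created.

```python
import os
from contextlib import contextmanager

ENV_KEY = "AWS_EC2_METADATA_DISABLED"


@contextmanager
def without_ec2_metadata():
    """Temporarily set AWS_EC2_METADATA_DISABLED=true (hypothetical helper)."""
    previous = os.environ.get(ENV_KEY)
    os.environ[ENV_KEY] = "true"
    try:
        yield
    finally:
        # Restore the previous state so other code is not affected.
        if previous is None:
            os.environ.pop(ENV_KEY, None)
        else:
            os.environ[ENV_KEY] = previous


if __name__ == "__main__":
    try:
        from pyarrow import fs
    except ImportError:
        fs = None  # pyarrow is optional for this sketch

    if fs is not None:
        with without_ec2_metadata():
            # Assumption: the SDK sees the variable during client setup.
            s3 = fs.S3FileSystem(anonymous=True)
        print(s3)
```

Whether this has any effect depends on the SDK version bundled with the build, so treat it as something to try rather than a guaranteed fix.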

Benchmarks on my local work machine:

  • on git main:
>>> from pyarrow import fs
>>> %time fs.S3FileSystem(anonymous=True)
CPU times: user 316 μs, sys: 1.01 ms, total: 1.33 ms
Wall time: 2 s
<pyarrow._s3fs.S3FileSystem at 0x7d2182772fb0>
>>> %time fs.S3FileSystem(access_key='key', secret_key='secret')
CPU times: user 1.43 ms, sys: 0 ns, total: 1.43 ms
Wall time: 2 s
<pyarrow._s3fs.S3FileSystem at 0x7d22602337b0>
>>> %time fs.S3FileSystem()
CPU times: user 6.72 ms, sys: 3.27 ms, total: 9.99 ms
Wall time: 12 s
<pyarrow._s3fs.S3FileSystem at 0x7d226003bc30>
  • on this PR:
>>> %time fs.S3FileSystem(anonymous=True)
CPU times: user 199 μs, sys: 0 ns, total: 199 μs
Wall time: 203 μs
<pyarrow._s3fs.S3FileSystem at 0x7d6c401c01b0>
>>> %time fs.S3FileSystem(access_key='key', secret_key='secret')
CPU times: user 198 μs, sys: 10 μs, total: 208 μs
Wall time: 212 μs
<pyarrow._s3fs.S3FileSystem at 0x7d6c4b2e2c30>
>>> %time fs.S3FileSystem()
CPU times: user 13.5 ms, sys: 3.52 ms, total: 17.1 ms
Wall time: 14 s
<pyarrow._s3fs.S3FileSystem at 0x7d6c40158df0>
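
The measurements above use IPython's `%time` magic. To repeat them from a plain Python script, something like the following sketch could be used. The helper name `wall_time` and the case labels are made up for illustration, and the numbers it prints will depend on the machine, the network, and the pyarrow build.

```python
import time


def wall_time(factory, repeat=1):
    """Return the best wall-clock time, in seconds, over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        factory()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    try:
        from pyarrow import fs
    except ImportError:
        raise SystemExit("pyarrow is not installed")

    # The same three constructor calls as in the benchmarks above.
    cases = {
        "anonymous": lambda: fs.S3FileSystem(anonymous=True),
        "explicit keys": lambda: fs.S3FileSystem(
            access_key="key", secret_key="secret"
        ),
        "default credentials": lambda: fs.S3FileSystem(),
    }
    for label, factory in cases.items():
        print(f"{label}: {wall_time(factory):.6f} s")
```

The first call in a process may also include one-time S3 initialization, so running each case in a fresh interpreter gives the fairest comparison.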

Are these changes tested?

Yes, by the existing CI tests and configurations.

Are there any user-facing changes?

The default S3 client settings may change, since the smart defaults mode moves from "legacy" to "standard". Hopefully this will not trigger any regression in behavior.

pitrou commented Jun 5, 2025

cc @apmorton

@pitrou pitrou force-pushed the gh46214-s3-client-config branch from d90e175 to cbf8592 June 9, 2025 06:58
@pitrou pitrou force-pushed the gh46214-s3-client-config branch from cbf8592 to be5865a June 24, 2025 09:54

pitrou commented Jun 24, 2025

@github-actions crossbow submit -g python -g cpp

pitrou commented Jun 24, 2025

@github-actions crossbow submit wheelcp313*

@apache apache deleted a comment from github-actions bot Jun 24, 2025

pitrou commented Jun 24, 2025

@github-actions crossbow submit -g python -g cpp

@pitrou pitrou force-pushed the gh46214-s3-client-config branch from be5865a to 80bf6ee June 24, 2025 12:54

pitrou commented Jun 24, 2025

@github-actions crossbow submit -g python -g cpp

Revision: 80bf6ee

Submitted crossbow builds: ursacomputing/crossbow @ actions-e43f4463ab

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@pitrou pitrou marked this pull request as ready for review June 24, 2025 15:50
@pitrou pitrou requested a review from kou June 24, 2025 15:50

@kou kou left a comment

+1

@kou kou merged commit 6080ca6 into apache:main Jun 25, 2025
38 of 39 checks passed
@kou kou removed the awaiting review label Jun 25, 2025
@github-actions github-actions bot added the awaiting merge label Jun 25, 2025
@pitrou pitrou deleted the gh46214-s3-client-config branch June 25, 2025 07:03

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 6080ca6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
