Skip to content

GH-40754: [Python] Expose tls_ca_file_path to S3FileSystem #45881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2025

Conversation

bw513
Copy link
Contributor

@bw513 bw513 commented Mar 21, 2025

Rationale for this change

Currently, when using the pyarrow.fs.S3FileSystem, it is not immediately obvious how provide an alternative TLS certificate authority when working with (for example) a self-hosted minio using a self-signed certificate. It is possible to do this via environment variables, as per #40754; however, the user may not be able to easily specify these depending on their python execution environment.

What changes are included in this PR?

Adds an additional parameter to S3FileSystem to specify the file path to a custom certificate.

Are these changes tested?

Yes. A unit-test for S3FileSystem pickling is included; and I have manually verified that I can access files from a self-hosted minio with TLS enabled.

Are there any user-facing changes?

An additional parameter tls_ca_file_path is available on the S3FileSystem constructor.

Copy link

⚠️ GitHub issue #40754 has been automatically assigned in GitHub to PR creator.

@bw513 bw513 marked this pull request as draft March 21, 2025 14:38
@kou
Copy link
Member

kou commented Mar 21, 2025

@github-actions crossbow submit -g python

This comment was marked as outdated.

@pitrou pitrou self-requested a review March 24, 2025 16:55
@@ -406,6 +408,8 @@ cdef class S3FileSystem(FileSystem):
retry_strategy.max_attempts)
else:
raise ValueError(f'Invalid retry_strategy {retry_strategy!r}')
if tls_ca_file_path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if tls_ca_file_path:
if tls_ca_file_path is not None:

Comment on lines 260 to 267
tls_ca_file_path : str, default None
Use custom CA certificate for TLS verification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tls_ca_file_path : str, default None
Use custom CA certificate for TLS verification
tls_ca_file_path : str, default None
If set, this should be the path of a file containing TLS certificates
in PEM format which will be used for TLS verification.

@pitrou
Copy link
Member

pitrou commented Mar 27, 2025

Thanks for submitting this @bw513 . Is there a reason this PR is still in draft? I've posted two comments, feel free to update.

@bw513
Copy link
Contributor Author

bw513 commented Jun 13, 2025

Thanks for submitting this @bw513 . Is there a reason this PR is still in draft? I've posted two comments, feel free to update.

Really sorry, I got prioritised onto other work and am now back on this. I made it a draft as there seemed to be some different approaches being suggested on #40754. I'll fix-up based on the suggestions above and remove the draft. Thanks for the comments!

@bw513 bw513 force-pushed the pyarrow_tls_ca_file_path branch from ed19cc0 to 1fc1a46 Compare June 13, 2025 09:46
@bw513 bw513 marked this pull request as ready for review June 13, 2025 09:46
@bw513 bw513 requested review from AlenkaF, raulcd and rok as code owners June 13, 2025 09:46
@bw513 bw513 force-pushed the pyarrow_tls_ca_file_path branch from 1fc1a46 to f477507 Compare June 13, 2025 09:48
@bw513
Copy link
Contributor Author

bw513 commented Jun 20, 2025

If I understand correctly, the failure is due to R format which is probably an unrelated issue. Should I rebase off a stable release such as apache-arrow-20.0.0?

@kou
Copy link
Member

kou commented Jun 20, 2025

Could you rebase on the current main?

@bw513 bw513 force-pushed the pyarrow_tls_ca_file_path branch from f477507 to 2adc66e Compare June 20, 2025 08:25
@bw513 bw513 force-pushed the pyarrow_tls_ca_file_path branch from 2adc66e to 61bc36e Compare June 20, 2025 09:18
@bw513
Copy link
Contributor Author

bw513 commented Jun 20, 2025

That looks like it works, thanks! I have also realised that I missed making the test a bit more specific as suggested, so I have fixed that now too.

@pitrou
Copy link
Member

pitrou commented Jun 25, 2025

@github-actions crossbow submit -g python

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I'll just wait for CI

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 25, 2025
Copy link

Revision: 61bc36e

Submitted crossbow builds: ursacomputing/crossbow @ actions-643d9fc53b

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@pitrou
Copy link
Member

pitrou commented Jun 25, 2025

CI failures are unrelated.

@pitrou pitrou merged commit 4636e8a into apache:main Jun 25, 2025
13 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jun 25, 2025
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 4636e8a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants