Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add glacier skipping logic #21164

Merged
merged 1 commit into from
Apr 23, 2024

Conversation

bangtim
Copy link
Contributor

@bangtim bangtim commented Mar 19, 2024

Description

This PR supersedes this one --> #18237

TrinoS3FileSystem contains logic/configuration for skipping Glacier objects found in S3.

This PR adds the missing functionality/configuration from TrinoS3FileSystem and also adds the capability to configure the filesystem to read restored Glacier objects.

Additional context and related issues

Recently, S3 added the RestoreStatus to the result of ListObjectsV2 - see https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-s3-restore-status-s3-glacier-objects-list-api/ for more details.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive
* Add support for reading S3 objects restored from glacier storage ({issue}`21164`)

Copy link

cla-bot bot commented Mar 19, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim marked this pull request as ready for review March 19, 2024 17:04
@bangtim bangtim marked this pull request as draft March 19, 2024 17:51
Copy link

cla-bot bot commented Mar 19, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

4 similar comments
Copy link

cla-bot bot commented Mar 19, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Mar 19, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Mar 21, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Mar 21, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@electrum
Copy link
Member

Please see my comment on the previous PR: #18237 (comment)

Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 2e75199 to 874fcd0 Compare March 28, 2024 17:38
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 874fcd0 to 3d6e7df Compare March 28, 2024 18:15
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 3d6e7df to 56a50ca Compare March 28, 2024 18:16
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 56a50ca to 4435227 Compare March 28, 2024 18:21
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 4435227 to 45850ca Compare March 28, 2024 18:25
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the hive Hive connector label Mar 28, 2024
@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 45850ca to 1b5a155 Compare March 28, 2024 20:36
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 1b5a155 to bd5cbaa Compare March 28, 2024 21:33
Copy link

cla-bot bot commented Mar 28, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@bangtim bangtim changed the title adding glacier skipping logic to s3filesystem adding glacier skipping logic Mar 28, 2024
@bangtim bangtim changed the title adding glacier skipping logic Add glacier skipping logic Apr 5, 2024
@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 9ae2cf2 to abd0130 Compare April 5, 2024 14:00
@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from abd0130 to aca6c00 Compare April 8, 2024 19:35
Copy link
Member

@pettyjamesm pettyjamesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise LGTM

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from aca6c00 to 32b7d9d Compare April 8, 2024 20:04
@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 32b7d9d to fec1cb4 Compare April 8, 2024 20:40
Copy link
Member

@pettyjamesm pettyjamesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, the test failure appears to be caused by #20326.


@Config("hive.s3.storage-class-filter")
public HiveConfig setS3StorageClassFilter(S3StorageClassFilter s3StorageClassFilter)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this configuration in HiveS3Config along with the other hive.s3.* properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this comment on the previous PR, I believe that HiveConfig is the correct place to put it. Additionally, the trino-hive module (where HiveConfig lies) cannot access the trino-hdfs module (where HiveS3Config lies) directly

@pettyjamesm
Copy link
Member

/test-with-secrets sha=47508e987aa6c4367698854bffdfc2293bd74e3c

Copy link

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8654423316

Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there :)

void testS3FileIteratorFileEntryTags()
throws IOException
{
try (S3Client s3Client = createS3Client()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for the restored object as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe testing for restored glacier object may be a little difficult since glacier restores are extremely slow. Testing glacier objects is more intuitive as we can mock an s3 bucket with a glacier object; however, in the putobject request there is no way we can manipulate the restore status

@bangtim bangtim force-pushed the adding-glacier-skipping-logic branch from 47508e9 to 1695cdf Compare April 17, 2024 00:40
@pettyjamesm pettyjamesm merged commit 9962596 into trinodb:master Apr 23, 2024
68 checks passed
@pettyjamesm
Copy link
Member

Merged, thanks @bangtim!

@github-actions github-actions bot added this to the 446 milestone Apr 23, 2024
@bangtim bangtim deleted the adding-glacier-skipping-logic branch April 25, 2024 19:16
@bangtim bangtim restored the adding-glacier-skipping-logic branch April 25, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

7 participants