Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make createExclusive atomic: fix GcsTransactionLogSynchronizer atomicity with GCS native filesystem implementation #20180

Merged
merged 2 commits into from
Dec 21, 2023

Conversation

findepi
Copy link
Member

@findepi findepi commented Dec 20, 2023

GcsTransactionLogSynchronizer requires that the new transaction log file is created exclusively and atomically. Otherwise concurrent readers of a table may see a file partially written (or not written yet, just empty) and fail. This commit:

  • fixes the GCS native filesystem implementation so that it's atomic
  • changes the method signature to indicate atomic creation and remove default not atomic implementation.
  • makes it clear in-memory buffering occurs (previously it was implicitly done in HdfsOutputFile which could be considered surprising)

Fixes #20168
Required by #19991

@findepi findepi force-pushed the findepi/make-createexclusive-atomic-8dcb09 branch 2 times, most recently from 42075fc to 220dc37 Compare December 20, 2023 15:08
@findepi findepi force-pushed the findepi/make-createexclusive-atomic-8dcb09 branch from 220dc37 to a6e20b0 Compare December 20, 2023 15:11
/**
* Specifies whether implementation supports {@link TrinoOutputFile#create()} is exclusive.
*/
protected boolean isCreateExclusive()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two methods are pretty close in naming which can be rather misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why they got a javadoc. do you want to suggest better names?

When defining test profiles, we don't use properties.
@findepi findepi force-pushed the findepi/make-createexclusive-atomic-8dcb09 branch from 315917c to 6297459 Compare December 20, 2023 20:44
@@ -63,11 +71,22 @@ public abstract class AbstractTestTrinoFileSystem

protected abstract void verifyFileSystemIsEmpty();

protected boolean supportsCreateExclusive()
/**
* Specifies whether implementation supports {@link TrinoOutputFile#create()} is exclusive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop supports

`GcsTransactionLogSynchronizer` requires that the new transaction log file
is created exclusively and atomically. Otherwise concurrent readers of a
table may see a file partially written (or not written yet, just empty)
and fail. This commit:

- fixes the GCS native filesystem implementation so that it's atomic
- changes the method signature to indicate atomic creation and
  remove default not atomic implementation.
- makes it clear in-memory buffering occurs (previously it was
  implicitly done in `HdfsOutputFile` which could be considered
  surprising)
- in `AbstractTestTrinoFileSystem` decouples "is create() exclusive" and
  "supports createExclusive" behaviors. For example local filesystem has
  the former, GCS filesystem has both and S3 filesystem has none.
@losipiuk losipiuk force-pushed the findepi/make-createexclusive-atomic-8dcb09 branch from 6297459 to b23725c Compare December 21, 2023 08:14
@losipiuk losipiuk merged commit 4829f69 into master Dec 21, 2023
2 of 5 checks passed
@losipiuk losipiuk deleted the findepi/make-createexclusive-atomic-8dcb09 branch December 21, 2023 08:14
@github-actions github-actions bot added this to the 436 milestone Dec 21, 2023
@findepi findepi changed the title Make createExclusive atomic Make createExclusive atomic: fix GcsTransactionLogSynchronizer atomicity with GCS native filesystem implementation Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector
Development

Successfully merging this pull request may close these issues.

Delta Lake table metadata corruption during writes on GCS when using native fs
3 participants