Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert FileSource support for empty directories #1622

Merged
merged 1 commit into from
Nov 17, 2016
Merged

Revert FileSource support for empty directories #1622

merged 1 commit into from
Nov 17, 2016

Conversation

isnotinvain
Copy link
Contributor

Unfortunately, there is a pretty serious race condition between Source.createTaps() and Source.validateTaps(). The empty directory support has changed this race condition from being fatal (which is generally OK, as it throws InvalidSourceException) to simply being wrong.

What happens is, createTaps() sees an empty directory and prepares a memory tap for it.
Then, data + success files show up in the dir before validateTaps() is called. When validateTaps() is called, it sees the data + success files and allows the job to proceed. But the job proceeds with an empty tap for the directory. The old behavior of stopping because the data isn't there seems to be preferable.

This PR reverts the support for empty directories in FileSource / SuccessFileSource.

I will follow up with two things after this PR:

  1. A trait AllowsEmptySuccessFileSource for opting in to empty directory support
  2. A more general fix for the race between validateTaps() and createTaps(). This is tricky because it requires more coordinate between these two methods than the API currently requires. It can be fixed by querying HDFS only once, but it probably won't be compatible with the many subclasses of FileSource. I think at best we can make a race-condition-free variant available, but subclasses might need to take care to preserve the fix.

@piyushnarang
Copy link
Collaborator

👍 when we're rolling back the change (as AllowsEmptySuccessFileSource) we should include the fields related fixes & tests as well.

@johnynek
Copy link
Collaborator

👍

can we test for this race in the follow up (knowing that it can be an issue for Twitter)?

@isnotinvain
Copy link
Contributor Author

thanks!

Yes, I'd like a test for the race condition too. I will include that in the follow up (might be tricky we'll see)

@isnotinvain
Copy link
Contributor Author

And yes, we'll want to add the removed tests back in when we add this feature back

@isnotinvain isnotinvain merged commit 3c72dd7 into twitter:develop Nov 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants