fixing parallel ingest single worker case #898

Merged
jortiz16 merged 2 commits into master from CSVFragmentTupleSource-bug-fix
Jun 8, 2017

@jortiz16 jortiz16 commented Jun 8, 2017

Fixes an ingest bug in the case where parallel ingest needs only a single worker to read a file. With the old code, every worker would try to read the entire file, because workerIndex stayed at its initial value of 0. In other words, every worker believed it was the first worker.

Now workerIndex starts at -1 and is set only if the current worker is found in the workerIds array; only then is the worker assigned to read the file. If workerIndex remains -1, we set flagAsIncomplete to true, which ensures the parser is not initialized for the current worker.
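
For readers following along, here is a minimal sketch of the lookup described above. It is not the code from this PR: the class name `FragmentAssignmentSketch`, the helpers `findWorkerIndex` and `byteRange`, the `myId` parameter, and the inclusive end-of-range arithmetic are all assumptions made for illustration. Only `workerIndex`, `workerIds`, and the -1 sentinel come from the description.

```java
// Hypothetical sketch; names other than workerIndex and workerIds are assumed.
class FragmentAssignmentSketch {

  /** Returns the position of myId in workerIds, or -1 if it is not listed. */
  static int findWorkerIndex(int[] workerIds, int myId) {
    int workerIndex = -1; // -1 means "this worker is not assigned to the file"
    for (int i = 0; i < workerIds.length; i++) {
      if (workerIds[i] == myId) {
        workerIndex = i;
        break;
      }
    }
    return workerIndex;
  }

  /**
   * Returns {start, end} (inclusive byte offsets, an assumed convention) for
   * this worker, or null when the worker should not read anything. The null
   * case plays the role that flagAsIncomplete plays in the description.
   */
  static long[] byteRange(long fileSize, int[] workerIds, int myId) {
    int workerIndex = findWorkerIndex(workerIds, myId);
    if (workerIndex < 0) {
      return null; // unlisted worker: skip parser initialization
    }
    long partitionSize = fileSize / workerIds.length;
    long start = partitionSize * workerIndex;
    boolean isLastWorker = (workerIndex == workerIds.length - 1);
    // The last worker also takes any remainder left by integer division.
    long end = isLastWorker ? fileSize - 1 : start + partitionSize - 1;
    return new long[] {start, end};
  }

  public static void main(String[] args) {
    int[] single = {5};
    long[] r = byteRange(100, single, 5);
    System.out.println(r[0] + "-" + r[1]);            // prints "0-99"
    System.out.println(byteRange(100, single, 7));    // prints "null"
  }
}
```

With the old default of 0, the second call would have returned the same 0-99 range as the first, which is the single-worker bug this change addresses.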

@jortiz16 jortiz16 requested a review from senderista June 8, 2017 00:33
@coveralls

Coverage decreased (-0.02%) to 26.966% when pulling 5be0e4d on CSVFragmentTupleSource-bug-fix into 262de2f on master.

@senderista senderista left a comment

LGTM

long startByteRange = currentPartitionSize * workerIndex;
long endByteRange;
if (workerIndex >= 0) {
boolean isLastWorker = workerIndex == workerIds.length - 1;

Parens around the condition would be more readable (I realize this was in the original).
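
As an illustration of the suggestion, the parenthesized form might look like the sketch below. The wrapper class `LastWorkerCheck` and the `workerCount` parameter are hypothetical and exist only to make the snippet self-contained; the expression itself mirrors the line under review.

```java
// Hypothetical wrapper so the expression can be compiled on its own.
class LastWorkerCheck {
  static boolean isLastWorker(int workerIndex, int workerCount) {
    // Parentheses separate the comparison from the assignment visually;
    // behavior is identical to the unparenthesized form.
    boolean isLastWorker = (workerIndex == workerCount - 1);
    return isLastWorker;
  }
}
```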

@coveralls

Coverage decreased (-0.01%) to 26.973% when pulling 8b29a4d on CSVFragmentTupleSource-bug-fix into 262de2f on master.

@jortiz16 jortiz16 merged commit 07ceb18 into master Jun 8, 2017
@jortiz16 jortiz16 deleted the CSVFragmentTupleSource-bug-fix branch June 8, 2017 04:33
3 participants