This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

Checking for job process status inside a map task attempt when it's not ... #126

Conversation

vrushalic
Collaborator

...the first attempt

@vrushalic
Collaborator Author

The idea behind this pull request is to reduce processing time when attempts fail and are restarted. In such cases, a previous attempt has already worked through some of the job records before failing at a later point. The next attempt that is brought up then reprocesses all the records that the previous (failed) attempt had already processed. As a result, a lot of time is spent in this hRaven job and the next hRaven run can't start.
Hence this pull request first checks whether the current attempt is NOT the first attempt. If it is not, it does a get from HBase and checks the job processing status for that record, so that records which were already processed successfully can be skipped.
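
To illustrate the check described above, here is a minimal, self-contained sketch. It is not the code from this pull request: `RetryAwareMapperSketch`, `ProcessedLookup`, `attemptNumber`, and `shouldProcess` are hypothetical names, and an in-memory map stands in for the HBase get. It assumes the attempt number is the last underscore-separated field of a Hadoop task attempt ID, with 0 meaning the first attempt.

```java
import java.util.HashMap;
import java.util.Map;

class RetryAwareMapperSketch {

    /** Hypothetical stand-in for the HBase get that reads the "job processed" flag. */
    interface ProcessedLookup {
        boolean isProcessed(String rowKey);
    }

    /** Returns the attempt number, taken as the last underscore-separated field of the ID. */
    static int attemptNumber(String taskAttemptId) {
        int idx = taskAttemptId.lastIndexOf('_');
        return Integer.parseInt(taskAttemptId.substring(idx + 1));
    }

    /** Decides whether the mapper should process the record identified by rowKey. */
    static boolean shouldProcess(String taskAttemptId, String rowKey, ProcessedLookup lookup) {
        if (attemptNumber(taskAttemptId) == 0) {
            // First attempt: nothing can have been processed yet, so skip the extra lookup.
            return true;
        }
        // Later attempt: process only the records the failed attempt did not finish.
        return !lookup.isProcessed(rowKey);
    }

    public static void main(String[] args) {
        // In-memory map used here in place of the real status column.
        Map<String, Boolean> status = new HashMap<>();
        status.put("job_1", true);
        ProcessedLookup lookup = key -> status.getOrDefault(key, false);

        String first = "attempt_201412160000_0001_m_000003_0";
        String retry = "attempt_201412160000_0001_m_000003_1";
        System.out.println(shouldProcess(first, "job_1", lookup)); // true
        System.out.println(shouldProcess(retry, "job_1", lookup)); // false
        System.out.println(shouldProcess(retry, "job_2", lookup)); // true
    }
}
```

Gating the lookup on the attempt number keeps the common case (a first attempt) free of extra reads; only restarted attempts pay for one get per record.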

@coveralls

Coverage Status

Coverage decreased (-0.01%) when pulling 0a352dd on vrushalic:check_raw_job_status_before_processing_in_mapper into 748cd90 on twitter:twitter_only.

Get get = new Get(row);
get.addColumn(Constants.INFO_FAM_BYTES, Constants.JOB_PROCESSED_SUCCESS_COL_BYTES);

boolean success = false;
Collaborator

Nit: the code might be a bit easier to read if this is named something like "processed" or "alreadyProcessed". "Success" seems a little confusing.

}
}
return processed;
} catch (Exception e) {
Contributor

Probably best to catch only the kinds of exceptions that can be thrown by the code you run.
Let's make sure that we don't catch other exceptions such as InterruptedException, or OOMEs.
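
One way to follow this suggestion is sketched below. It is an illustration, not the merged change: `NarrowCatchSketch`, `StatusReader`, and `alreadyProcessed` are hypothetical names, and the sketch assumes the status lookup can only fail with `IOException`, which is what HBase client reads declare. Treating a failed lookup as "not processed" is also an assumption made here so that the record is simply reprocessed.

```java
import java.io.IOException;

class NarrowCatchSketch {

    /** Hypothetical stand-in for the HBase read of the processed flag. */
    interface StatusReader {
        boolean readProcessedFlag(String rowKey) throws IOException;
    }

    /** Returns true only if the lookup succeeded and reported the record as processed. */
    static boolean alreadyProcessed(String rowKey, StatusReader reader) {
        try {
            return reader.readProcessedFlag(rowKey);
        } catch (IOException e) {
            // Assumption: a failed lookup means "unknown", so fall back to reprocessing.
            return false;
        }
        // Anything else (RuntimeException, Error) propagates to the caller untouched.
    }

    public static void main(String[] args) {
        // An I/O failure is absorbed and the record is treated as not yet processed.
        boolean afterIoFailure = alreadyProcessed("job_1", key -> {
            throw new IOException("simulated read failure");
        });
        System.out.println(afterIoFailure);

        // A programming error is not swallowed; it surfaces to the caller.
        try {
            alreadyProcessed("job_1", key -> {
                throw new IllegalStateException("simulated bug");
            });
        } catch (IllegalStateException expected) {
            System.out.println("propagated: " + expected.getMessage());
        }
    }
}
```

Catching only the declared checked exception keeps unrelated failures visible instead of silently turning them into a "not processed" result.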

@coveralls

Coverage Status

Coverage decreased (-5.42%) when pulling 32245ca on vrushalic:check_raw_job_status_before_processing_in_mapper into 748cd90 on twitter:twitter_only.

@coveralls

Coverage Status

Coverage decreased (-5.4%) when pulling 01b063c on vrushalic:check_raw_job_status_before_processing_in_mapper into 748cd90 on twitter:twitter_only.

jrottinghuis added a commit that referenced this pull request Dec 16, 2014
…ocessing_in_mapper

Checking for job process status inside a map task attempt when it's not ...
@jrottinghuis jrottinghuis merged commit 62364e6 into twitter:twitter_only Dec 16, 2014
4 participants