
Spark history file processing #124

Merged
9 commits merged from vrushalic:preprocess_spark into twitter:twitter_only on Dec 5, 2014

Conversation

vrushalic
Collaborator

Enables the preprocessing, rawloader and processing steps to process Spark history/conf files.

Ensures the "spark" prefix is applied to Spark jobs, while backwards compatibility is maintained for Hadoop jobs. Until now, hRaven presumed the prefix to be "job" for all job ids, so JobIds, JobKeys, and row keys did not store this prefix; we now need to account for the existence of such a prefix in order to include Spark files.
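For illustration, a hypothetical sketch of how such a prefixed id decomposes (the id value and the string split are made up for this example; the accessors mirror JobId's getJobPrefix(), getJobEpoch(), and getJobSequence() seen in the diff below):

    // Hypothetical example id: Spark apps carry a "spark" prefix,
    // while classic Hadoop jobs carry the traditional "job" prefix.
    String idString = "spark_1415725864000_0005";
    String[] parts = idString.split("_");
    String prefix = parts[0];                  // "spark" (or "job" for hadoop)
    long epoch = Long.parseLong(parts[1]);     // epoch portion of the id
    long sequence = Long.parseLong(parts[2]);  // per-epoch sequence number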

Adds the SparkJobDescFactory and JobHistoryFileParserSpark classes.

If the processing step is going to be run on a hadoop1 cluster, then the jackson-1.8.8 libraries need to occur on the classpath before the jackson-1.5.2 libraries, since the jackson-1.5.2 libraries ship with hadoop.
Hence, for hadoop1 clusters, we need to set the -Dmapreduce.task.classpath.user.precedence=true parameter at the processing step and add $(ls /usr/local/hbase/lib/jackson-core-asl-*.jar),$(ls /usr/local/hbase/lib/jackson-mapper-asl-*.jar) to the libjars.
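Putting that together, the hadoop1 invocation of the processing step might look like the following sketch (the jar name, driver class, and trailing arguments are illustrative placeholders, not an exact command from this PR):

    hadoop jar hraven-etl.jar com.twitter.hraven.etl.JobFileProcessor \
      -Dmapreduce.task.classpath.user.precedence=true \
      -libjars $(ls /usr/local/hbase/lib/jackson-core-asl-*.jar),$(ls /usr/local/hbase/lib/jackson-mapper-asl-*.jar) \
      <other processing-step arguments>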

@coveralls

Coverage Status

Coverage increased (+7.76%) when pulling 34f6e78 on vrushalic:preprocess_spark into 0dd1bef on twitter:twitter_only.

Vrushali Channapattan added 2 commits November 11, 2014 12:01
@coveralls

Coverage Status

Coverage increased (+8.0%) when pulling a2dbc93 on vrushalic:preprocess_spark into 0dd1bef on twitter:twitter_only.


return Bytes.add(Bytes.toBytes(jobId.getJobEpoch()),
Bytes.toBytes(jobId.getJobSequence()));
String prefix = jobId.getJobPrefix();
if ((StringUtils.isNotBlank(prefix) && (JobId.JOB_PREFIX.equalsIgnoreCase(prefix)))
Collaborator


I thought that JobId.JOB_PREFIX.equalsIgnoreCase(prefix) was not needed?

Collaborator


Given that JobId never has prefix = JOB_PREFIX, can't we remove the latter check?

@vrushalic changed the title from "Preprocessing step for Spark history file processing" to "Spark history file processing" on Nov 18, 2014
@coveralls

Coverage Status

Coverage increased (+8.22%) when pulling 5176251 on vrushalic:preprocess_spark into 0dd1bef on twitter:twitter_only.


@@ -426,4 +456,6 @@

/** name of the properties file used for cluster to cluster identifier mapping */
public static final String HRAVEN_CLUSTER_PROPERTIES_FILENAME = "hRavenClusters.properties";

public static final int SPARK_JOB_KEY_LENGTH = 21;
Collaborator


Out of curiosity, how is this number (21) derived?

Collaborator Author


(Will add a comment in the code as well.) Since spark job keys have a prefix of "spark", the length is 5 plus the regular job key length, which is 16 (epoch and sequence); hence the spark job key length is 21.
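A minimal sketch of that arithmetic (assuming the epoch and sequence are each serialized as 8-byte longs, as the Bytes.toBytes calls in the diff above suggest):

    import org.apache.hadoop.hbase.util.Bytes;

    // "spark" prefix (5 bytes) + epoch long (8 bytes) + sequence long (8 bytes) = 21
    int sparkJobKeyLength = "spark".length() + Bytes.SIZEOF_LONG + Bytes.SIZEOF_LONG;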

byte[] qualifier;
byte[] valueBytes;

Iterator<Map.Entry<String, JsonNode>> fieldsIterator = rootNode.getFields();
Collaborator


Nit: it can be expressed more concisely as

for (Map.Entry<String,JsonNode> field: rootNode.getFields()) {
// job history keys are in upper case
...

Collaborator Author


Yes, I think so too. But rootNode is a JsonNode object, and its .getFields returns an iterator. The two methods the entries have are .getKey and .getValue. JsonNode does have .get, but that needs an index number to be passed in; it does not have a get function that returns a Map.Entry.
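For reference, a minimal sketch of consuming that iterator with an explicit loop (assuming Jackson 1.x, where JsonNode.getFields() returns an Iterator<Map.Entry<String, JsonNode>> rather than an Iterable):

    import java.util.Iterator;
    import java.util.Map;
    import org.codehaus.jackson.JsonNode;

    Iterator<Map.Entry<String, JsonNode>> fieldsIterator = rootNode.getFields();
    while (fieldsIterator.hasNext()) {
      Map.Entry<String, JsonNode> field = fieldsIterator.next();
      String key = field.getKey();        // job history keys are in upper case
      JsonNode value = field.getValue();
      // process the field here
    }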

Collaborator


Ah, I thought (mistakenly) that getFields returned an Iterable, in which case the enhanced for syntax applies. I should have looked more carefully. NVM.

@coveralls

Coverage Status

Coverage increased (+8.22%) when pulling 2680bc3 on vrushalic:preprocess_spark into 0dd1bef on twitter:twitter_only.

@sjlee
Collaborator

sjlee commented Nov 21, 2014

LGTM. Thanks for your patience @vrushalic! We can merge this after @jrottinghuis gives his +1.

@coveralls

Coverage Status

Coverage increased (+8.22%) when pulling 268305a on vrushalic:preprocess_spark into 0dd1bef on twitter:twitter_only.

jrottinghuis added a commit that referenced this pull request Dec 5, 2014
@jrottinghuis jrottinghuis merged commit 4675812 into twitter:twitter_only Dec 5, 2014