Issue #97, #98: Aggregate job info per day and per week as part of JobPr... #113

vrushalic · 2014-07-01T06:18:04Z

...ocessing Step in the ETL. Also enable a REST API to fetch this aggregated app info, and use check-and-put to store the number of runs, cost and queues.

vrushalic · 2014-07-01T06:19:22Z

Previous pull request for the same change: #99
That one was closed because of merge conflicts on my side, caused by an outdated master on my fork.

vrushalic · 2014-07-01T17:23:01Z

This pull request also updates the columns in the raw table that record the daily and weekly aggregation status of each job. If a job has already been aggregated, aggregation will not be re-attempted unless the re-aggregate flag is turned on. This helps avoid inadvertent re-aggregation, since aggregation is not idempotent.
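To illustrate why the status check matters, here is a minimal sketch of the gating, not the code in this pull request. The class `AggregationGateSketch`, its in-memory maps and the method names are all hypothetical stand-ins for the raw table's status column and the daily aggregation table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the raw table status and a daily total. */
class AggregationGateSketch {
  // jobId -> true once the job's cost has been folded into the daily total
  private final Map<String, Boolean> aggregatedStatus = new HashMap<>();
  private double dailyCost = 0.0;

  /** Returns true if the job was aggregated by this call, false if it was skipped. */
  boolean aggregate(String jobId, double jobCost, boolean reAggregate) {
    boolean alreadyDone = aggregatedStatus.getOrDefault(jobId, false);
    if (alreadyDone && !reAggregate) {
      // Skip: adding the cost a second time would double count it.
      return false;
    }
    dailyCost += jobCost;
    aggregatedStatus.put(jobId, true);
    return true;
  }

  double getDailyCost() {
    return dailyCost;
  }
}
```

In this sketch, forcing `reAggregate` adds the cost again, which is exactly the non-idempotent behaviour the status column guards against by default.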

Also, it implements check-and-put methods, with retries, for the queue list, job cost and number of runs in the info column family. The intention is to update these columns carefully, since multiple tasks/jobs may be updating them for the same app for that day or that week. More details are in the code comments.
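The check-and-put-with-retries idea can be sketched roughly as follows. This is an in-memory illustration only: a `ConcurrentMap` stands in for the HBase table, and `CheckAndPutSketch`, `checkAndPut` and `incrementWithRetries` are hypothetical names, not the methods added in this pull request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory sketch of an optimistic read, modify, check-and-put loop. */
class CheckAndPutSketch {
  // Assumption: one map entry plays the role of one cell in the info column family.
  private final ConcurrentMap<String, Long> cells = new ConcurrentHashMap<>();

  /** Atomically writes newValue only if the cell still holds expected (null means absent). */
  boolean checkAndPut(String cell, Long expected, long newValue) {
    if (expected == null) {
      return cells.putIfAbsent(cell, newValue) == null;
    }
    return cells.replace(cell, expected, newValue);
  }

  /** Reads the cell, adds delta, and retries if another writer got there first. */
  boolean incrementWithRetries(String cell, long delta, int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      Long current = cells.get(cell);
      long updated = (current == null ? 0L : current) + delta;
      if (checkAndPut(cell, current, updated)) {
        return true;
      }
      // The value changed between the read and the write; re-read and try again.
    }
    return false;
  }

  Long get(String cell) {
    return cells.get(cell);
  }
}
```

A write based on a stale read is rejected rather than silently overwriting a concurrent update, which is why the caller loops and re-reads.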

sjlee · 2014-10-06T18:31:22Z

bin/create_schema.rb

+# the s column family has a TTL of 30 days, it's used as a scratch col family
+# it stores the run ids that are seen for that day
+# we assume that a flow will not run for more than 30 days, hence it's fine to "expire" that data
+create 'hraven_agg_daily', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},


Any reason you used "hraven_..." as the names of these new tables? Doesn't look like that is the naming convention followed by other tables?

vrushalic · 2014-11-12

I think using an hraven prefix for table names is the better way ahead, considering that the hbase datastore can contain tables from other applications as well.
For existing tables, renaming them now may be a bit more complex since hbase does not have a rename command. The recommended way has a few steps, as listed here: http://hbase.apache.org/book/table.rename.html
We would need to disable the table and use clone snapshot to do it:

hbase shell> disable 'tableName'
hbase shell> snapshot 'tableName', 'tableSnapshot'
hbase shell> clone_snapshot 'tableSnapshot', 'newTableName'
hbase shell> delete_snapshot 'tableSnapshot'
hbase shell> drop 'tableName'

vrushalic · 2014-11-17

I modified the table names to be prefixed with job_history so that they are more consistent with other hraven tables.

coveralls · 2014-11-12T19:41:54Z

Changes Unknown when pulling ba338ec on vrushalic:etl_aggregate_2 into * on twitter:master*.

sjlee · 2014-11-14T23:52:48Z

hraven-core/src/main/java/com/twitter/hraven/datasource/AppSummaryService.java

+   * in daily or weekly aggregation table
+   * @param {@link JobDetails}
+   */
+  public Boolean aggregateJobDetails(JobDetails jobDetails,


Is returning the object type Boolean necessary? I think it is just fine to use the primitive type (boolean). Using the object would unnecessarily cause boxing and unboxing and sometimes cause subtle bugs.
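One example of the kind of subtle bug meant here, as a minimal sketch: the class and method names below are hypothetical and unrelated to `AppSummaryService`. A method declared to return `Boolean` can return `null`, and the caller's implicit unboxing then fails at runtime.

```java
/** Hypothetical illustration of a pitfall with the boxed Boolean return type. */
class BoxedBooleanPitfalls {
  /** A boxed return type allows null to leak out on an unexpected path. */
  static Boolean boxedResult(boolean fail) {
    return fail ? null : Boolean.TRUE;
  }

  /** A primitive return type cannot be null, so callers never hit this problem. */
  static boolean primitiveResult(boolean fail) {
    return !fail;
  }

  /** Returns true if using the null Boolean in a condition throws. */
  static boolean unboxingNullThrows() {
    Boolean result = boxedResult(true); // null
    try {
      if (result) { // implicit result.booleanValue()
        return false;
      }
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```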

sjlee · 2014-11-15T00:27:12Z

hraven-core/src/main/java/com/twitter/hraven/datasource/AppSummaryService.java

+      newAppsKeys = createNewAppKeysFromResults(scan, startTime, endTime, limit);
+    } catch (IOException e) {
+      LOG.error("Caught exception while trying to scan, returning empty list of flows: "
+          + e.toString());


Although it is existing code, it makes me wonder. If this threw an exception, can you proceed to the next? I would think newAppsKeys is null, and you'd get a NullPointerException in line 117.

vrushalic · 2014-11-17

I looked through the code again. It looks like the function createNewAppKeysFromResults will always return a non-null list (either empty or populated). I also see some "Long" objects there which can be changed to "long".
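For reference, a generic way to guarantee a non-null result even when the scan throws is to initialize the list before the try block. This is a sketch of that pattern only, not the actual `AppSummaryService` code; `EmptyListOnFailureSketch`, `scan` and `fetchKeys` are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: callers always receive a list, never null. */
class EmptyListOnFailureSketch {
  /** Stand-in for a table scan that may fail with an IOException. */
  static List<String> scan(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("simulated scan failure");
    }
    List<String> keys = new ArrayList<>();
    keys.add("app1");
    return keys;
  }

  /** Returns the scanned keys, or an empty list if the scan failed. */
  static List<String> fetchKeys(boolean fail) {
    // Initialized up front, so the variable is non-null even if scan() throws.
    List<String> keys = new ArrayList<>();
    try {
      keys = scan(fail);
    } catch (IOException e) {
      System.err.println("Scan failed, returning empty list: " + e);
    }
    return keys;
  }
}
```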

coveralls · 2014-11-17T23:33:37Z

Changes Unknown when pulling 71d4a5b on vrushalic:etl_aggregate_2 into * on twitter:master*.

coveralls · 2014-11-17T23:46:01Z

Changes Unknown when pulling 71d4a5b on vrushalic:etl_aggregate_2 into * on twitter:master*.

coveralls · 2014-11-17T23:46:37Z

Changes Unknown when pulling 71d4a5b on vrushalic:etl_aggregate_2 into * on twitter:master*.

sjlee · 2014-11-19T23:22:26Z

hraven-core/src/main/java/com/twitter/hraven/AggregationConstants.java

+   * name of the flag that determines whether or not re-aggregate
+   * (overrides aggregation status in raw table for that job)
+   */
+  public static String RE_AGGREGATION_FLAG_NAME = "reaggregate";


sjlee · 2014-11-21T19:36:51Z

LGTM. Thanks for your patience @vrushalic! It seems like the travis build is stuck for some reason. It might be good to get a green build. We can merge this once @jrottinghuis gives his +1.

vrushalic · 2014-11-21T19:49:47Z

Yes, that build isn't even starting up. I tried cancelling and restarting it. I will keep an eye on this. If nothing changes by this evening, I will make a simple checkin (like a comment update or something) to trigger another build on this branch.

coveralls · 2014-11-22T06:05:25Z

Changes Unknown when pulling 9661ad2 on vrushalic:etl_aggregate_2 into * on twitter:master*.

vrushalic · 2014-11-24T18:32:36Z

The travis CI build has passed and is green

Issue #97, #98: Aggregate job info per day and per week as part of JobPr...

Issue twitter#97, twitter#98: Aggregate job info per day and per week…

b05dcf2

… as part of JobProcessing Step in the ETL. Also enable a rest api to fetch this aggregated app info, use check and put to store number of runs, cost and queues,

vrushalic mentioned this pull request Jul 1, 2014

Issue #97, #98: Aggregate job info per day and per week as part of JobPr... #99

Closed

sjlee reviewed Oct 6, 2014

Updating TaskDetails to use ByteUtil functions for getting values fro…

ba338ec

…m navigable map of hbase results

sjlee reviewed Nov 14, 2014

sjlee reviewed Nov 15, 2014

Vrushali Channapattan added 3 commits November 17, 2014 15:01

Updating as per review comments (changing Double to double, Long to l…

bed8dd8

…ong, logging exceptions correctly etc)

[minor] updating if condition

77cd3ef

(minor) updating another Long to long

71d4a5b

sjlee reviewed Nov 19, 2014

updating if checks around trace level logging

9661ad2

jrottinghuis added a commit that referenced this pull request Dec 8, 2014

Merge pull request #113 from vrushalic/etl_aggregate_2

a7dd908

Issue #97, #98: Aggregate job info per day and per week as part of JobPr...

jrottinghuis merged commit a7dd908 into twitter:master Dec 8, 2014

vrushalic mentioned this pull request Jan 9, 2015

Merge master twitter only #131

Closed
