Calculate job cost based on hadoop2 counters of mbmillis #132
Conversation
Calculate job cost based on hadoop2 counters of mbmillis
This should also be merged to master, right?
Yes.
Any help in making master current will be highly appreciated!
We can clone the PR to merge the identical patch to master. Let me know if you want me to do it. I can help.
Yes, that would be great, please go ahead! Thanks!
Hmm. The issue turns out to be a bit bigger than I initially thought. It appears that there is some divergence between the master and twitter_only branches: some changes/commits that were made on master are not present on twitter_only. We need a one-time PR to re-apply any missing changes from master to twitter_only. Also, going forward, we need to make sure to apply any commits made on master over to twitter_only, or the branches will keep drifting apart.
Yes, we didn't miss those; those were being done as part of …
Thanks for the reminder. My apologies for not realizing it. Going forward, it would be good to establish a best practice. IMO, for any generic issue, master should be the first place to commit; then, for each issue, we would promptly merge it over to twitter_only. That way the merges stay sane and the branches stay reasonably in sync. Thoughts?
hRaven should now start calculating job cost based on the hadoop2 counters of mb millis instead of slot millis.
Since earlier hadoop2 releases did not ship with these counters (MB_MILLIS_MAPS and MB_MILLIS_REDUCES in the job counters), hRaven would itself calculate the megabyte-millis of a job and then derive the job cost. Now that hadoop2 emits these counters, we should use those values.
If those counters are not present (for instance in files generated by older versions of hadoop2), hRaven should continue to use the slot millis counters.
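The preference-with-fallback logic described above might be sketched roughly as follows. This is a minimal illustration only, assuming a simple counter map; the class, method, and parameter names are hypothetical and are not hRaven's actual API (hRaven's real counter objects and megabyte-millis computation are more involved):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: prefer hadoop2's mb-millis counters, else fall back. */
public class JobCostSketch {
    // Counter names emitted by hadoop2 job history (see MAPREDUCE-5464).
    static final String MB_MILLIS_MAPS = "MB_MILLIS_MAPS";
    static final String MB_MILLIS_REDUCES = "MB_MILLIS_REDUCES";

    /**
     * Returns the megabyte-millis for a job. If the hadoop2 counters are
     * present, use them directly; otherwise (e.g. history files written by
     * older hadoop2 versions) derive a value from slot millis, as hRaven
     * did before these counters existed.
     */
    static long megaByteMillis(Map<String, Long> counters,
                               long totalSlotMillis, long slotSizeMb) {
        Long maps = counters.get(MB_MILLIS_MAPS);
        Long reduces = counters.get(MB_MILLIS_REDUCES);
        if (maps != null && reduces != null) {
            return maps + reduces;           // counters present: trust hadoop2
        }
        return totalSlotMillis * slotSizeMb; // fallback: compute from slot millis
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new HashMap<>();
        counters.put(MB_MILLIS_MAPS, 1000L);
        counters.put(MB_MILLIS_REDUCES, 500L);
        // Counters present: 1000 + 500 = 1500
        System.out.println(megaByteMillis(counters, 10, 512));
        // Counters absent: fall back to 10 slot-millis * 512 MB = 5120
        System.out.println(megaByteMillis(new HashMap<>(), 10, 512));
    }
}
```

The job cost would then be derived from the returned megabyte-millis value rather than from slot millis directly.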
Also, hRaven was correcting the slot millis emitted by hadoop due to a bug in hadoop. That fix is now irrelevant, and we should deprecate those methods.
Relevant open source Hadoop JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-5464