Fix Rounding Bug in RatioBasedEstimator #1542
Conversation
```diff
     */
-  override def estimateReducers(info: FlowStrategyInfo): Option[Int] =
+  def estimateExactReducers(info: FlowStrategyInfo): Option[Double] = {
```
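The motivation for returning a `Double` instead of an `Int` can be illustrated generically: if each per-step estimate is truncated to an `Int` before aggregation, small fractional values collapse to zero. A minimal sketch, with hypothetical numbers and names rather than scalding's actual code:

```scala
// Illustrative sketch of the rounding pitfall (names and numbers are
// hypothetical, not scalding's actual implementation).
object RoundingSketch {
  // hypothetical per-step estimates, each a fraction of one reducer
  val estimates = Seq(0.4, 0.4, 0.4)

  // Truncating each estimate to Int before averaging: every term becomes 0.
  val roundedEarly: Int = estimates.map(_.toInt).sum / estimates.size // 0

  // Keeping the exact Double and rounding once at the end: ceil(0.4) = 1.
  val roundedLate: Int = math.ceil(estimates.sum / estimates.size).toInt // 1
}
```

Keeping the exact `Double` throughout and converting to a reducer count only at the end avoids the early truncation.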
probably a bad name
Can you add a test that shows this issue and that this fixes it?
@johnynek yes:

Running the new test on the develop branch yields:
I'm adding a cap as well @gerashegalov
```diff
+  val maxEstimatedReducersKey = "scalding.reducer.estimator.max.estimated.reducers"
+
+  // TODO: what's a reasonable default? Int.maxValue? 5k? 100k?
+  val defaultMaxEstimatedReducers = 100 * 1000
```
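How such a key might be consulted can be sketched as follows; a plain `Map` stands in for Hadoop's `JobConf`, and the helper name is illustrative, not the actual scalding code:

```scala
// Sketch of reading the cap from configuration, with a fallback default.
// A Map stands in for Hadoop's JobConf; key and default mirror the diff above.
object MaxReducersSketch {
  val maxEstimatedReducersKey = "scalding.reducer.estimator.max.estimated.reducers"
  val defaultMaxEstimatedReducers = 100 * 1000

  // Use the configured value when present, otherwise the default.
  def configuredMax(conf: Map[String, String]): Int =
    conf.get(maxEstimatedReducersKey).map(_.toInt).getOrElse(defaultMaxEstimatedReducers)
}
```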
100K seems way too many, no? Pig's max is 999.
Yeah I have no idea. I guess I can spot check some jobs in our hadoop cluster.
But if you can have 30k mappers, why not 30k reducers? And if 30k is somewhat normal, then 100k isn't that far off from being "way too much".
I don't know whether pig's max of 999 was chosen with much thought or what size of cluster it was chosen for.
One reason I can think of: too many reducers means too many output files.
Yeah, I don't know the historical reason for the 999-reducer limit. I think Hive has the same one.
I can make the max non-fatal, though I'm not entirely convinced. Having the max act as a cap means it's another tuning dimension that users need to worry about, and one that might cause them to get an unexpected number of reducers (if the cap is set too low). On the other hand, if we set the cap relatively high, at a value where we think "anything that estimates needing this is clearly broken", and treat exceeding it as exceptional, this becomes a faster feedback mechanism for users. I actually think that in many cases fail-fast is a much better user experience than "just try to make it work and hope". The grey area here is whether this is one of those cases. If the estimator estimates needing a ton of reducers, either they really are needed, in which case capping will probably not be great, or the estimator is broken. That's my pitch, but I don't feel too strongly if everyone else wants the cap to not be exceptional.
Thinking as a user, I would not want scalding (at least by default) to fail to run because it would use too many reducers. If I set N as the max and the job runs slow, I should have alerts that I notice, and I can rewrite my job or turn up the max. If it runs too fast because of a bug (and I waste resources), I should have a system to watch for that too, and if not, I can set a lower max so the waste is not that bad. I have a hard time imagining making the default be to fail the job. I can see adding an option to fail when we exceed the max (since we expect a bug, or a totally giant input, in that case), and some users may want it, but I would err on the side of running the job and not changing default behavior.
I'll update to make it non-fatal. I will also add a property to the hadoop config so that tools that monitor hadoop config properties can warn users that the max has been applied. |
OK, this should be good to go. LMK if 5k is not a good default for the cap
```diff
@@ -179,14 +195,27 @@ object ReducerEstimatorStepStrategy extends FlowStepStrategy[JobConf] {
   val info = FlowStrategyInfo(flow, preds.asScala, step)

   // if still None, make it '-1' to make it simpler to log
```
Is this comment still valid? Maybe move it down to just before you log it?
I guess this could be moved down to where it becomes -1.
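The convention being discussed can be sketched as follows (hypothetical helper name, not the actual scalding code):

```scala
// Hypothetical helper: substitute -1 when no estimator produced a value,
// so the "no estimate" case is unambiguous in the logs.
object EstimateLogging {
  def forLogging(estimate: Option[Int]): Int = estimate.getOrElse(-1)
}
```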
LGTM - may want @rubanm to take a look at it too.
""".stripMargin) | ||
} | ||
|
||
n.min(configuredMax) |
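The non-fatal capping discussed above can be sketched like this; the names are illustrative, and the real change also sets a Hadoop config property so monitoring tools can tell when the cap fired:

```scala
// Illustrative sketch: clamp the estimate to the configured max instead of
// failing, and report whether clamping happened so it can be surfaced.
object CapSketch {
  def capEstimate(estimate: Int, configuredMax: Int): (Int, Boolean) = {
    val capped = estimate.min(configuredMax)
    (capped, capped != estimate) // second element: was the cap applied?
  }
}
```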
This could go in an if-else block above?
One minor comment, LGTM!
I will merge once the tests run
Fixes the rounding bug mentioned in #1541, and adds a maximum for reducer estimation as well.