Configurable elimination of FlatMapNode by pushing the flatMap logic into the SourceNode. #675

Closed
NPraneeth opened this Issue Jul 19, 2016 · 3 comments

@NPraneeth
Contributor

Currently, a FlatMapNode is created even for a simple flatMap operation. The intent is to let the spout handle the flatMap operation itself, just as it handles the optionMap operation.

We are considering SourceNode aggregation out of scope for this issue, so topologies without any FlatMapNodes will do all of their aggregation in the SummerNode.
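
To illustrate the idea of folding the flatMap function into the source, here is a minimal sketch in plain Scala. `Source`, `emit`, and the `FusedSourceSketch` object are hypothetical stand-ins invented for this example; they are not Summingbird's or Storm's actual types, and the real planner change may look quite different.

```scala
object FusedSourceSketch {
  // Hypothetical stand-in for a source/spout: it only knows how to emit items.
  final case class Source[T](emit: () => Iterator[T]) {
    // optionMap runs inside the source: each item yields zero or one output.
    def optionMap[U](fn: T => Option[U]): Source[U] =
      Source(() => emit().flatMap(t => fn(t).toList))

    // flatMap handled the same way: the function runs where the data is
    // produced, so no separate downstream node is needed just to apply it.
    def flatMap[U](fn: T => Iterable[U]): Source[U] =
      Source(() => emit().flatMap(fn))
  }

  val lines: Source[String] = Source(() => Iterator("a b", "", "c"))

  // Fan-out case: one input line can produce several words, or none.
  val words: Source[String] =
    lines.flatMap(line => line.split(" ").toList.filter(_.nonEmpty))
}
```

In this sketch, `FusedSourceSketch.words.emit().toList` evaluates the fused function at the source and yields the individual words without any intermediate stage.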

@johnynek
Collaborator

I like this project, but it might be a non-negligible change to the design.

We could imagine making all tuples (K, V) pairs but doing a shuffle grouping between mapping nodes. This means you don't have two different input/output formats. In the case of no key, we can put (), which is serialized in 0 bytes by chill (but still takes a little space in a tuple by making a longer List to wrap).

That might make things easier since there is not this odd-ball node that has to prepare a special wire format in front of the summer.
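
As a rough sketch of that uniform format (plain Scala only; `Wire`, `wrap`, `unwrap`, and `UnitKeySketch` are hypothetical names for illustration, not existing Summingbird code), unkeyed values would get a `()` key on the way out of a node and have it dropped on the way in:

```scala
object UnitKeySketch {
  // Hypothetical wire format: every tuple between nodes is a (key, value) pair.
  type Wire[K, V] = (K, V)

  // Unkeyed values are given the Unit key when they leave a mapping node.
  def wrap[V](v: V): Wire[Unit, V] = ((), v)

  // The receiving node discards the placeholder key again.
  def unwrap[V](kv: Wire[Unit, V]): V = kv._2

  val sent: List[Wire[Unit, Int]] = List(1, 2, 3).map(v => wrap(v))
  val received: List[Int] = sent.map(kv => unwrap(kv))
}
```

Keyed streams would use the same pair shape with a real key, so every node reads and writes one format; the cost is that each node has to know when the key is only a placeholder.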

@jnievelt
Contributor

FWIW, the complexity will largely mimic what's in FlatMapBoltProvider today:

https://github.com/twitter/summingbird/blob/v0.11.0-RC1/summingbird-storm/src/main/scala/com/twitter/summingbird/storm/FlatMapBoltProvider.scala

Shimming everything into (K, V) is an interesting idea, but it does have the complexity of knowing when to add/discard the (), and possibly handling the (contrived?) case where V actually is Unit. So maybe its benefit isn't that much overall?

@NPraneeth
Contributor

@johnynek: I have updated the PR to address all of the review comments. Can you take a look?

@johnynek closed this in #676 on Aug 3, 2016
@johnynek added a commit that referenced this issue on Aug 3, 2016
NPraneeth + johnynek: Issue #675.Configurable elimination of FlatMapNode by enhancing Sourc… (#676)

* Issue #675.Configurable elimination of FlatMapNode by enhancing SourceNode

* Added Tests for FlatMap fanOut case. Corrected the case types in scheduleSpout method. Removed the repeating tests. Changed the names to be descriptive. Many other suggested changes have been done.

* test case refined. indentation corrected.

* Added the property Test. Added fanOut test and validated graph. Comments addressed.

* Refactored the case logic in OnlinePlan, corrected some of the stale/incorrect comments

* Added some more tests. Changes on assert style, map.get. Comments have been added to code.

* variable labelling standards
22db7b0