This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Ability to add named nodes #712

Open
pankajroark opened this issue Jan 28, 2017 · 3 comments

Comments

@pankajroark
Contributor

Automatically generating good names is very hard. It would be great if Summingbird gave users some control over how Heron tasks are named. One way would be to allow named functions, i.e., to accept functions with names in the DSL:

source .flatMap("name") { x => ... }

Then, when generating task names, we could use "name" instead of the "FlatMap" we use now. The same would apply to other components.
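A minimal sketch of what the proposed named overload might look like. It uses a hypothetical toy Pipeline type, not Summingbird's actual Producer API; Pipeline, source, and labels are illustrative assumptions. Each stage records the label that would become its task name: the unnamed flatMap keeps the generated "FlatMap" label, while the named overload substitutes the user's string.

```scala
// Hypothetical toy pipeline, NOT Summingbird's Producer API: each stage records
// the label that would become its task name.
final case class Pipeline[A](data: List[A], labels: List[String]) {
  // Current behaviour: the label is generated from the operator kind.
  def flatMap[B](f: A => List[B]): Pipeline[B] =
    Pipeline(data.flatMap(f), labels :+ "FlatMap")

  // Proposed behaviour: a user-supplied name replaces the generated label.
  def flatMap[B](name: String)(f: A => List[B]): Pipeline[B] =
    Pipeline(data.flatMap(f), labels :+ name)
}

// Hypothetical entry point standing in for a real source.
def source[A](xs: List[A]): Pipeline[A] = Pipeline(xs, List("Source"))

val words = source(List("a b", "c"))
  .flatMap((line: String) => line.split(" ").toList)             // unnamed stage
  .flatMap("tokenize") { (w: String) => List(w.toUpperCase) }   // named stage

println(words.labels) // List(Source, FlatMap, tokenize)
```

Annotating the lambda parameter types keeps overload resolution between the two flatMap alternatives unambiguous: the first argument is either a function or a String, never both.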

@johnynek
Collaborator

johnynek commented Jan 28, 2017 via email

@pankajroark
Contributor Author

pankajroark commented Jan 28, 2017

You're right: if we were simply to replace "FlatMap" with "name" in the above example for the entire topology, then .name is the best choice. That isn't actually what I had in mind; I expressed it poorly, so let me elaborate.

What users want is to identify each node, and what's executing on it, by a name that makes sense to them. .name applies to the entire subgraph. I'd be happy with any good way of doing this, but here's one:

We could keep the current naming scheme, but wherever we find named functions, use their names instead, overriding the name of only the node where they are executed.

For example, say a node is called Tail-FlatMap-Summer and was created using .sumByKey("aggregator").

Then the node could be called aggregator. If we merge multiple methods together, we can use a concatenated name. If a method spans multiple nodes, like the summer above, we name each node by its component; e.g., the map-side reduce component of sumByKey above could be called "mapsideAggregator".
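The override rule above could be sketched as follows. This assumes a hypothetical model in which each node carries the name the current scheme already generated, the operators fused into it (each with an optional user-supplied name), and a role saying which component of a multi-node operator it runs. FusedOp, Role, Main, MapSide, Node, and nodeName are illustrative names, not part of Summingbird.

```scala
// Hypothetical: an operator fused into a node, with an optional user name.
final case class FusedOp(kind: String, userName: Option[String] = None)

// Hypothetical: which component of a multi-node operator (e.g. sumByKey) a node runs.
sealed trait Role
case object Main extends Role
case object MapSide extends Role

// Hypothetical node: generatedName is whatever the current scheme produces today.
final case class Node(generatedName: String, ops: List[FusedOp], role: Role = Main)

def nodeName(n: Node): String = {
  val userNames = n.ops.flatMap(_.userName)
  if (userNames.isEmpty) n.generatedName // no named functions: keep the current name
  else {
    val base = userNames.mkString("-")   // several named ops merged: concatenate
    n.role match {
      case Main    => base
      case MapSide => "mapside" + base.capitalize // e.g. "mapsideAggregator"
    }
  }
}

val summer = Node("Tail-FlatMap-Summer",
  List(FusedOp("FlatMap"), FusedOp("Summer", Some("aggregator"))))
println(nodeName(summer)) // aggregator
```

Nodes without any named function keep their generated name, so in this sketch existing topologies would be named exactly as before.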

Having two ways of naming (i.e., .name and this scheme) could be confusing, and implementing the above could be fairly complicated. But bad names are a big issue for our users. Sometimes names get insane, like:
"Tail-FlatMap-Summer-FlatMap-Summer-FlatMap-Summer-FlatMap-Source"
It's impossible for most users to keep such complicated names in mind, and searching the viz graphs becomes hard. Unlike batch, where tasks are short-lived, in realtime each node is like a service, and operators frequently have to interact with it for debugging.

To be honest, applying settings by .name is unintuitive to users. What they usually want is for a setting to apply to just one node, not the entire subtree. We could instead use the names above for applying settings and keep the scope localized. To avoid confusion with .name, we could require that a topology use only one of the two naming mechanisms, not both.
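To make the difference in scope concrete, here is a sketch on a hypothetical toy graph. GNode, subtreeScoped, and localized are illustrative names, and the integer setting stands in for something like parallelism. It assumes, for illustration only, that a node's subtree is the node plus everything upstream of it; Summingbird's actual .name semantics may differ.

```scala
// Hypothetical toy graph, not Summingbird's planner: each node knows its upstream nodes.
final case class GNode(name: String, upstream: List[GNode] = Nil)

// The node itself plus everything reachable upstream (its "subtree" in this model).
def subtree(n: GNode): List[GNode] = n :: n.upstream.flatMap(subtree)

// Subtree scope: the setting reaches every node under the target.
def subtreeScoped(root: GNode, target: String, value: Int): Map[String, Int] =
  subtree(root).find(_.name == target) match {
    case Some(t) => subtree(t).map(n => n.name -> value).toMap
    case None    => Map.empty
  }

// Localized scope (the proposal): the setting reaches only the node with that name.
def localized(root: GNode, target: String, value: Int): Map[String, Int] =
  subtree(root).collect { case n if n.name == target => n.name -> value }.toMap

val src   = GNode("source")
val parse = GNode("parse", List(src))
val agg   = GNode("aggregator", List(parse))

val wide   = subtreeScoped(agg, "parse", 8) // "parse" and "source" both get 8
val narrow = localized(agg, "parse", 8)     // only "parse" gets 8
```

With localized scope, changing a setting on one node cannot silently affect nodes upstream of it, which is the behaviour users usually expect.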

@johnynek
Collaborator

johnynek commented Jan 29, 2017 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants