Allow different window assigners / time windows in WindowGraphAggregation #25

Closed
wants to merge 5 commits into from

drfloob commented Sep 28, 2016

Fixes #2 (mostly). I did not try tuple-based windowing; I don't think Flink itself can do it very efficiently. Tumbling and sliding windows both work well.

logic extracted directly from toString, as it's generally useful (I needed it).
…tion

Unit test added for Connected Components over a Sliding Window. All
other tests still pass; these changes are backwards compatible.

fixes vasia#2

vasia (Owner) commented Sep 29, 2016

Hi @drfloob,
thank you for the PR. I don't quite understand what the intention is here. Issue #2 refers to supporting the streaming sliding window model or tumbling window model, i.e. apply computations on a stream of graph snapshots as these are created by a sliding or tumbling window.
Regarding the window graph aggregation, this is intentionally implemented using a tumbling window internally, as this makes partial state merging more efficient. I don't see why someone would do single-pass connected components with a sliding window to merge state. Was there something else you had in mind that I'm missing here?
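
For readers following the thread, here is one plausible reading of the efficiency argument, sketched in plain Java with no Flink dependency. Tumbling panes do not overlap, so each edge is folded into exactly one partial summary, and the partial summaries are then merged once into a global one. The class `TumblingMergeSketch`, its `DisjointSet` helper, and the `{src, dst, timestamp}` edge layout are assumptions made for illustration; they are not the library's `WindowGraphAggregation` or its summary types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class TumblingMergeSketch {

    /** Minimal disjoint-set summary over long vertex ids (illustrative only). */
    public static final class DisjointSet {
        private final Map<Long, Long> parent = new HashMap<>();

        long find(long v) {
            parent.putIfAbsent(v, v);
            long root = v;
            while (parent.get(root) != root) {
                root = parent.get(root);
            }
            parent.put(v, root); // shortcut v directly to its root
            return root;
        }

        public void union(long a, long b) {
            long ra = find(a), rb = find(b);
            // The smaller id becomes the root, so a root is always its component's minimum.
            if (ra != rb) {
                parent.put(Math.max(ra, rb), Math.min(ra, rb));
            }
        }

        /** Folds another partial summary into this one. */
        public void merge(DisjointSet other) {
            for (Map.Entry<Long, Long> e : other.parent.entrySet()) {
                union(e.getKey(), e.getValue());
            }
        }

        /** Components sorted by their smallest vertex id. */
        public List<TreeSet<Long>> components() {
            TreeMap<Long, TreeSet<Long>> byRoot = new TreeMap<>();
            for (Long v : new ArrayList<>(parent.keySet())) {
                byRoot.computeIfAbsent(find(v), k -> new TreeSet<>()).add(v);
            }
            return new ArrayList<>(byRoot.values());
        }
    }

    /**
     * Each edge is {src, dst, timestamp}. Every edge lands in exactly one
     * tumbling pane [start, start + size); the panes are then merged once.
     */
    public static DisjointSet aggregate(long[][] edges, long size) {
        TreeMap<Long, DisjointSet> panes = new TreeMap<>();
        for (long[] e : edges) {
            long start = e[2] - Math.floorMod(e[2], size);
            panes.computeIfAbsent(start, k -> new DisjointSet()).union(e[0], e[1]);
        }
        DisjointSet global = new DisjointSet();
        for (DisjointSet partial : panes.values()) {
            global.merge(partial);
        }
        return global;
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2, 0}, {3, 4, 1}, {2, 3, 5}, {7, 8, 6}};
        System.out.println(aggregate(edges, 5).components());
    }
}
```

With overlapping (sliding) windows the same edge would be folded into several partial summaries before the merge, which is extra work when the goal is a single merged result.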

drfloob (Author) commented Sep 29, 2016

@vasia thanks. I'm fairly new to Flink, so maybe there is a simpler way to accomplish what I'm after? I am computing connected components over a sliding window without aggregation: emitting the connected components that exist in each window. The Connected Components example was (apparently) tightly coupled with the WindowGraphAggregation class, so I found modifying it to be the shortest path to a working solution. Can it be done another way?

Cheers,
-aj
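
A minimal sketch of the use case described in this comment, in plain Java with no Flink dependency: each edge is assigned to every sliding window that contains its timestamp, and connected components are computed independently per window, with nothing merged across windows. The window assignment loosely mirrors sliding-window semantics (windows `[start, start + size)` with starts at multiples of the slide); the class `SlidingWindowComponents` and its `Edge` type are hypothetical names, not code from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class SlidingWindowComponents {

    /** A timestamped, undirected edge (hypothetical type, for illustration). */
    public static final class Edge {
        final long src, dst, timestamp;

        public Edge(long src, long dst, long timestamp) {
            this.src = src;
            this.dst = dst;
            this.timestamp = timestamp;
        }
    }

    /** Assigns each edge to every window [start, start + size) containing its timestamp. */
    static TreeMap<Long, List<Edge>> assign(List<Edge> edges, long size, long slide) {
        TreeMap<Long, List<Edge>> windows = new TreeMap<>();
        for (Edge e : edges) {
            long lastStart = e.timestamp - Math.floorMod(e.timestamp, slide);
            for (long start = lastStart; start > e.timestamp - size; start -= slide) {
                windows.computeIfAbsent(start, k -> new ArrayList<>()).add(e);
            }
        }
        return windows;
    }

    /** Union-find lookup; registers unseen vertices as singleton roots. */
    static long find(Map<Long, Long> parent, long v) {
        parent.putIfAbsent(v, v);
        long root = v;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        parent.put(v, root);
        return root;
    }

    /** Connected components of one window's edges, sorted by smallest vertex id. */
    static List<TreeSet<Long>> components(List<Edge> edges) {
        Map<Long, Long> parent = new HashMap<>();
        for (Edge e : edges) {
            long a = find(parent, e.src), b = find(parent, e.dst);
            if (a != b) {
                parent.put(Math.max(a, b), Math.min(a, b));
            }
        }
        TreeMap<Long, TreeSet<Long>> byRoot = new TreeMap<>();
        for (Long v : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(parent, v), k -> new TreeSet<>()).add(v);
        }
        return new ArrayList<>(byRoot.values());
    }

    /** Maps each window start to the components found in that window alone. */
    public static TreeMap<Long, List<TreeSet<Long>>> componentsPerWindow(
            List<Edge> edges, long size, long slide) {
        TreeMap<Long, List<TreeSet<Long>>> out = new TreeMap<>();
        for (Map.Entry<Long, List<Edge>> w : assign(edges, size, slide).entrySet()) {
            out.put(w.getKey(), components(w.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(1, 2, 1), new Edge(2, 3, 3), new Edge(4, 5, 5));
        componentsPerWindow(edges, 4, 2).forEach((start, cc) ->
                System.out.println("[" + start + ", " + (start + 4) + ") -> " + cc));
    }
}
```

Because windows overlap, the same edge contributes to several per-window results, which is the intended behaviour here since each window's components are emitted on their own.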

drfloob (Author) commented Sep 30, 2016

> Issue #2 refers to supporting the streaming sliding window model or tumbling window model, i.e. apply computations on a stream of graph snapshots as these are created by a sliding or tumbling window.

To clarify, I taught WindowGraphAggregation how to do this (with any arbitrary window assigner). See the included test case, which uses sliding windows.

vasia (Owner) commented Sep 30, 2016

Thank you for the explanation @drfloob. The Connected Components example is indeed tightly coupled to the WindowGraphAggregation class. In fact, the idea is to showcase window graph aggregation usage. I think the way to go in your case would be to create a new example or even a new abstraction to expose the contents of a sliding window as a graph snapshot. This way it would be more general than aggregation and would allow us to do any kind of operation on the window contents. What do you think?

drfloob (Author) commented Sep 30, 2016

@vasia That makes a lot of sense; it seems there's no need for this PR. I'm not sure how to build this more general abstraction at the moment, so I'll need to get more familiar with the project. Do you already have an architecture in mind, or any suggestions as to where this new abstraction would live?

Also, if I understand correctly, I believe the changes we're talking about would fix #2: enabling computation over snapshots from arbitrary windowing models. Is that right? If not, there's still some subtlety in #2 that I don't understand.

drfloob closed this Sep 30, 2016

vasia (Owner) commented Oct 1, 2016

Hi @drfloob,
it could be as simple as slice(), e.g. a slidingSlice() method on the GraphStream and wrappers for UDFs. The abstraction could look like GraphWindowStream, which currently works only with tumbling windows. Alternatively, we could generalize slice() and GraphWindowStream to work for both.
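
To make the suggestion concrete, here is a rough sketch in plain Java (no Flink dependency) of what a generalized slice could look like: the edges of each window are exposed as a read-only snapshot, and an arbitrary user-defined function is applied to it. Everything here is hypothetical: `slidingSlice()` does not exist in the library at the time of this thread, and `SlidingSliceSketch`, `Snapshot`, and `vertexCount` are illustrative names, not the `GraphStream` or `GraphWindowStream` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class SlidingSliceSketch {

    /** A timestamped edge (hypothetical type). */
    public static final class Edge {
        final long src, dst, timestamp;

        public Edge(long src, long dst, long timestamp) {
            this.src = src;
            this.dst = dst;
            this.timestamp = timestamp;
        }
    }

    /** Read-only view of the edges that fall into one window [start, end). */
    public static final class Snapshot {
        public final long start, end;
        public final List<Edge> edges;

        Snapshot(long start, long end, List<Edge> edges) {
            this.start = start;
            this.end = end;
            this.edges = Collections.unmodifiableList(edges);
        }
    }

    /** Applies the UDF to every window snapshot, in order of window start. */
    public static <R> List<R> slidingSlice(
            List<Edge> edges, long size, long slide, Function<Snapshot, R> udf) {
        TreeMap<Long, List<Edge>> windows = new TreeMap<>();
        for (Edge e : edges) {
            long lastStart = e.timestamp - Math.floorMod(e.timestamp, slide);
            for (long start = lastStart; start > e.timestamp - size; start -= slide) {
                windows.computeIfAbsent(start, k -> new ArrayList<>()).add(e);
            }
        }
        List<R> results = new ArrayList<>();
        for (Map.Entry<Long, List<Edge>> w : windows.entrySet()) {
            results.add(udf.apply(new Snapshot(w.getKey(), w.getKey() + size, w.getValue())));
        }
        return results;
    }

    /** Tumbling slice as the special case slide == size. */
    public static <R> List<R> slice(List<Edge> edges, long size, Function<Snapshot, R> udf) {
        return slidingSlice(edges, size, size, udf);
    }

    /** Example UDF: number of distinct vertices in a snapshot. */
    public static Integer vertexCount(Snapshot s) {
        Set<Long> vertices = new HashSet<>();
        for (Edge e : s.edges) {
            vertices.add(e.src);
            vertices.add(e.dst);
        }
        return vertices.size();
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(1, 2, 1), new Edge(2, 3, 3), new Edge(4, 5, 5));
        System.out.println(slidingSlice(edges, 4, 2, SlidingSliceSketch::vertexCount));
        System.out.println(slice(edges, 4, SlidingSliceSketch::vertexCount));
    }
}
```

Treating the tumbling case as `slide == size` is one way to let a single code path serve both window models, in line with the idea of generalizing slice() rather than adding a parallel abstraction.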

#2 is an old issue and doesn't really provide helpful information. If you'd like to work on this, maybe it's a good idea to open a new issue and we can discuss details there.

This example shows a non-reducing connected components algorithm,
where the components within each window are emitted independently,
without being merged with other windows.

drfloob (Author) commented Oct 3, 2016

I've added another example as a unit test, along with a WindowConnectedComponents library class that better showcases the specific use case. It spares quite a bit of redundant code compared to the alternative.

drfloob reopened this Oct 3, 2016
drfloob closed this Oct 6, 2016