Allow different window assigners / time windows in WindowGraphAggregation #25

drfloob · 2016-09-28T03:22:20Z

Fixes #2 (mostly). I did not try tuple-based windowing; I don't think Flink itself can do it very efficiently. Tumbling and sliding windows both work well.

vasia · 2016-09-29T20:20:07Z

Hi @drfloob,
thank you for the PR. I don't quite understand what the intention is here. Issue #2 refers to supporting the streaming sliding window model or tumbling window model, i.e. apply computations on a stream of graph snapshots as these are created by a sliding or tumbling window.
Regarding the window graph aggregation, this is intentionally implemented using a tumbling window internally, as this makes partial state merging more efficient. I don't see why someone would do single-pass connected components with a sliding window to merge state. Was there something else you had in mind that I'm missing here?
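
One way to read the efficiency argument above, sketched in plain Python rather than against the Flink API: with tumbling windows every edge lands in exactly one window, so each per-window partial summary is built once and folded into the global summary once. The `DisjointSet` class and `tumbling_window_cc` function below are hypothetical stand-ins, not the project's classes, and the stated rationale is an assumption about why tumbling windows help.

```python
from collections import defaultdict


class DisjointSet:
    """Minimal union-find used as a per-window summary (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root

    def merge(self, other):
        """Fold another partial summary into this one."""
        for v in list(other.parent):
            self.union(v, other.find(v))

    def components(self):
        groups = defaultdict(set)
        for v in list(self.parent):
            groups[self.find(v)].add(v)
        return sorted(sorted(g) for g in groups.values())


def tumbling_window_cc(edges, size):
    """edges: iterable of (timestamp, src, dst). Returns the merged summary."""
    partials = defaultdict(DisjointSet)
    for ts, u, v in edges:
        partials[ts // size].union(u, v)  # each edge belongs to exactly one window
    summary = DisjointSet()
    for window in sorted(partials):
        summary.merge(partials[window])  # each partial state is merged once
    return summary
```

With a sliding assigner the same edge would appear in several overlapping windows, so the same unions would be folded into the merged state repeatedly.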

drfloob · 2016-09-29T20:47:54Z

@vasia thanks. I'm fairly new to Flink, so maybe there is a simpler way to accomplish what I'm after? I am computing connected components over a sliding window without aggregation, emitting the connected components that exist in each window. The Connected Components example was (apparently) tightly coupled with the WindowGraphAggregation class, so I found modifying it to be the shortest path to a working solution. Can it be done another way?

Cheers,
-aj
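
The use case described above (connected components emitted independently for each sliding window, with no merging across windows) can be sketched as follows. This is a minimal, framework-free illustration in Python, not the PR's code; `sliding_windows`, `connected_components`, and `window_connected_components` are hypothetical names, and integer timestamps with windows of the form [start, start + size) are assumed.

```python
from collections import defaultdict


def sliding_windows(ts, size, slide):
    """Start times of every window [start, start + size) that contains ts."""
    last_start = ts - ts % slide
    return list(range(last_start, ts - size, -slide))


def connected_components(edges):
    """Union-find over a list of (src, dst) pairs; returns sorted components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


def window_connected_components(edges, size, slide):
    """edges: iterable of (timestamp, src, dst). One result per window, unmerged."""
    windows = defaultdict(list)
    for ts, u, v in edges:
        for start in sliding_windows(ts, size, slide):
            windows[start].append((u, v))  # an edge may fall into several windows
    return {start: connected_components(es) for start, es in sorted(windows.items())}
```

For edges `[(1, 1, 2), (6, 2, 3), (12, 4, 5)]` with `size=10` and `slide=5`, the window starting at 0 contains the single component `[1, 2, 3]`, while the window starting at 5 contains `[2, 3]` and `[4, 5]` separately.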

drfloob · 2016-09-30T00:45:54Z

> Issue #2 refers to supporting the streaming sliding window model or tumbling window model, i.e. apply computations on a stream of graph snapshots as these are created by a sliding or tumbling window.

To clarify, I taught WindowGraphAggregation how to do this (with any arbitrary window assigner). See the included test case, which uses sliding windows.

vasia · 2016-09-30T08:51:43Z

Thank you for the explanation @drfloob. The Connected Components example is indeed tightly coupled to the WindowGraphAggregation class. In fact, the idea is to showcase window graph aggregation usage. I think the way to go in your case would be to create a new example or even a new abstraction to expose the contents of a sliding window as a graph snapshot. This way it would be more general than aggregation and would allow us to do any kind of operation on the window contents. What do you think?

drfloob · 2016-09-30T17:42:30Z

@vasia That makes a lot of sense; it seems there's no need for this PR. I'm not sure how to build this more general abstraction at the moment, so I'll need to get more familiar with the project. Do you already have an architecture in mind, or any suggestions as to where this new abstraction would live?

Also, if I understand correctly, I believe the changes we're talking about would fix #2: enabling computation over snapshots from arbitrary windowing models. Is that right? If not, there's still some subtlety in #2 that I don't understand.

vasia · 2016-10-01T10:27:43Z

Hi @drfloob,
it could be as simple as slice(), e.g. a slidingSlice() method on the GraphStream and wrappers for UDFs. The abstraction could look like GraphWindowStream, which currently only works with tumbling windows. Alternatively, we could generalize slice() and GraphWindowStream to work for both.

#2 is an old issue and doesn't really provide helpful information. If you'd like to work on this, maybe it's a good idea to open a new issue and we can discuss details there.
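
To make the "generalize slice() to work for both" idea concrete, here is a rough sketch of a slice operation that takes the window assigner as a parameter and exposes each window's contents as a graph snapshot for an arbitrary user function. It is plain Python and does not reflect the real `slice()` or `GraphWindowStream` signatures; `tumbling`, `sliding`, `slice_stream`, `apply_on_snapshots`, and `max_degree` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


def tumbling(size):
    """Assigner: each timestamp maps to exactly one window start."""
    return lambda ts: [ts - ts % size]


def sliding(size, slide):
    """Assigner: each timestamp maps to every overlapping window start."""
    return lambda ts: list(range(ts - ts % slide, ts - size, -slide))


def slice_stream(edges, assigner):
    """Group (timestamp, src, dst) edges into per-window adjacency snapshots."""
    snapshots = defaultdict(lambda: defaultdict(set))
    for ts, u, v in edges:
        for start in assigner(ts):
            snapshots[start][u].add(v)
            snapshots[start][v].add(u)
    return {w: dict(snapshots[w]) for w in sorted(snapshots)}


def apply_on_snapshots(snapshots, udf):
    """Run any user-defined function on each window's snapshot."""
    return {w: udf(adj) for w, adj in snapshots.items()}


def max_degree(adj):
    """Example UDF: largest neighborhood size in the snapshot."""
    return max(len(neighbors) for neighbors in adj.values())
```

Because the assigner is just an argument, the same `slice_stream` call serves both windowing models, and the user function is not limited to aggregation.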

drfloob · 2016-10-03T06:36:43Z

I've added another example as a unit test, along with a WindowConnectedComponents library class that better showcases the specific use case. It spares quite a bit of redundant code compared to the alternative.

drfloob added 3 commits September 27, 2016 15:35

add buildMap to the DisjointSet public API

121a8fa

logic extracted directly from toString, as it's generally useful (I needed it).
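
As an illustration of the refactoring this commit message describes, the sketch below shows a disjoint-set class whose string rendering delegates to a public method that builds the root-to-members map. It is written in Python for brevity and is not the project's Java `DisjointSet`; the method name `build_map` and the exact shape of the returned map are assumptions.

```python
class DisjointSet:
    """Illustrative union-find; not the project's implementation."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root

    def build_map(self):
        """Public helper: map each root to the sorted members of its component."""
        groups = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), []).append(x)
        return {root: sorted(members) for root, members in groups.items()}

    def __str__(self):
        # The string form reuses build_map instead of recomputing the grouping.
        m = self.build_map()
        return ", ".join(f"{root}={m[root]}" for root in sorted(m))
```

Callers that need the components as data (for example, to emit them per window) can then use `build_map()` directly instead of parsing the string form.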

Allow different window assigners / time windows in WindowGraphAggrega…

fe3904c

…tion Unit test added for Connected Components over a Sliding Window. All other tests still pass, these changes are backwards compatible. fixes vasia#2

remove backup file that snuck in

559ff66

drfloob closed this Sep 30, 2016

get connected components to play well with edge-values

539894e

drfloob mentioned this pull request Oct 3, 2016

Adding data-parallel window graph aggregation #26

Closed

add WindowConnectedComponents library and test case

dcac87e

This example shows a non-reducing connected components algorithm, where the components within each window are emitted independently, without being merged with other windows.

drfloob reopened this Oct 3, 2016

drfloob closed this Oct 6, 2016
