Implementation of DISCO and DIMSUM algorithms. #833

Merged
johnynek merged 5 commits into develop from klin_typed_sim on Mar 26, 2014

3 participants
Contributor

reconditesea commented Mar 25, 2014

johnynek commented on an outdated diff Mar 25, 2014

...om/twitter/scalding/mathematics/TypedSimilarity.scala
+ .values
+
+ // Returns all edges with non-zero in-degree
+ def withInDegree[N,E](g: TypedPipe[Edge[N,E]])(implicit ord: Ordering[N]):
+ TypedPipe[Edge[N,(E,InDegree)]] = joinAggregate(g.groupBy { _.to }) { it =>
+ InDegree(it.size)
+ }
+
+ // Returns all edges with non-zero out-degree
+ def withOutDegree[N,E](g: TypedPipe[Edge[N,E]])(implicit ord: Ordering[N]):
+ TypedPipe[Edge[N,(E,OutDegree)]] = joinAggregate(g.groupBy { _.from }) { it =>
+ OutDegree(it.size)
+ }
+
+ // Returns all edges with weights and non-zero norms
+ def withNorm[N,E](g: TypedPipe[Edge[N,Weight]])(implicit ord: Ordering[N]):

johnynek Mar 25, 2014

Collaborator

This should probably be called withInNorm, to distinguish the total in-degree weight of a node from its total out-degree weight.
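
To illustrate the distinction raised here, the following is a minimal in-memory sketch using plain Scala collections rather than TypedPipe. It assumes the norm in question is an L2 norm over edge weights (suggested by the L2Norm class in the diff, but the body of withNorm is not shown in this thread). The helper names inNorms and outNorms are hypothetical.

```scala
// Simplified stand-ins for the types in the diff (no Scalding dependency).
case class Edge[+N, +E](from: N, to: N, data: E)
case class Weight(weight: Double)

object NormExample extends App {
  type Graph = List[Edge[String, Weight]]

  // Assumption: "norm" means the L2 norm of the weights on a set of edges.
  def l2(es: Graph): Double =
    math.sqrt(es.map(e => e.data.weight * e.data.weight).sum)

  // Hypothetical helper: norm of the weights on edges arriving at each node.
  def inNorms(g: Graph): Map[String, Double] =
    g.groupBy(_.to).map { case (n, es) => n -> l2(es) }

  // Hypothetical helper: norm of the weights on edges leaving each node.
  def outNorms(g: Graph): Map[String, Double] =
    g.groupBy(_.from).map { case (n, es) => n -> l2(es) }

  val g: Graph = List(
    Edge("a", "c", Weight(3.0)),
    Edge("b", "c", Weight(4.0)),
    Edge("a", "b", Weight(1.0))
  )

  // Node "c" has two incoming edges (3.0 and 4.0) and no outgoing edges.
  assert(inNorms(g)("c") == 5.0)
  assert(!outNorms(g).contains("c"))
  // Node "a" has two outgoing edges (3.0 and 1.0) and no incoming edges.
  assert(outNorms(g)("a") == math.sqrt(10.0))
  assert(!inNorms(g).contains("a"))
}
```

The two groupings give different values for the same node, which is why a direction-specific name avoids ambiguity.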

johnynek commented on the diff Mar 25, 2014

...om/twitter/scalding/mathematics/TypedSimilarity.scala
+
+import com.twitter.scalding.typed.{ Grouped, TypedPipe, WithReducers }
+
+import java.io.Serializable
+
+/**
+ * Implementation of DISCO and DIMSUM approximation similarity algorithm
+ * @author Oscar Boykin
+ * @author Kevin Lin
+ */
+
+
+/** Represents an Edge in a graph with some edge data
+ */
+case class Edge[+N,+E](from: N, to: N, data: E) {
+ def mapData[F](fn: (E => F)): Edge[N,F] = Edge(from, to, fn(data))

johnynek Mar 25, 2014

Collaborator

Can we add def reverse: Edge[N, E], which swaps from and to? This would make it easier to reverse graphs and reuse the similarity code.
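
A minimal sketch of what the requested method might look like. Only the signature comes from the comment; the body is an assumption, not the code that was merged.

```scala
// Edge as it appears in the diff, plus the proposed `reverse` method.
// The body of `reverse` is an assumed implementation, not the merged code.
case class Edge[+N, +E](from: N, to: N, data: E) {
  def mapData[F](fn: E => F): Edge[N, F] = Edge(from, to, fn(data))

  // Swap the endpoints and keep the edge data unchanged.
  def reverse: Edge[N, E] = Edge(to, from, data)
}

object ReverseExample extends App {
  val e = Edge("a", "b", 1.0)
  assert(e.reverse == Edge("b", "a", 1.0))
  // Reversing twice yields the original edge.
  assert(e.reverse.reverse == e)
}
```

With such a method, mapping reverse over every edge of a graph would turn an in-degree computation into an out-degree one without duplicating the similarity code.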

johnynek commented on an outdated diff Mar 25, 2014

...om/twitter/scalding/mathematics/TypedSimilarity.scala
+ * Implementation of DISCO and DIMSUM approximation similarity algorithm
+ * @author Oscar Boykin
+ * @author Kevin Lin
+ */
+
+
+/** Represents an Edge in a graph with some edge data
+ */
+case class Edge[+N,+E](from: N, to: N, data: E) {
+ def mapData[F](fn: (E => F)): Edge[N,F] = Edge(from, to, fn(data))
+}
+
+abstract sealed trait Degree { val degree: Int }
+case class InDegree(override val degree: Int) extends Degree
+case class OutDegree(override val degree: Int) extends Degree
+case class Weight(val weight: Double)

johnynek Mar 25, 2014

Collaborator

I don't think weight or norm needs val; case class parameters are already public. InDegree needs it in order to override degree.
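
A short sketch of the point about case class parameters, using the two classes as they appear in the later revision of the diff:

```scala
// Case class constructor parameters are public vals by default,
// so an explicit `val` modifier is redundant here.
case class Weight(weight: Double)
case class L2Norm(norm: Double)

object CaseClassValExample extends App {
  // Both fields are readable from outside the class without `val`.
  assert(Weight(2.5).weight == 2.5)
  assert(L2Norm(3.0).norm == 3.0)
}
```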

ianoc and 1 other commented on an outdated diff Mar 25, 2014

...om/twitter/scalding/mathematics/TypedSimilarity.scala
+case class Weight(weight: Double)
+case class L2Norm(norm: Double)
+
+object GraphOperations extends Serializable {
+ /** For each N, aggregate all the edges, and attach Edge state
+ */
+ def joinAggregate[N,E,T](grouped: Grouped[N,Edge[N,E]])(agfn: Iterable[Edge[N,E]] => T):
+ TypedPipe[Edge[N,(E,T)]] =
+ grouped.cogroup(grouped) {
+ (to: N, left: Iterator[Edge[N,E]], right: Iterable[Edge[N,E]]) =>
+ val newState = agfn(right)
+ left.map { _.mapData { e: E => (e, newState) } }
+ }
+ .values
+
+ // Returns all edges with non-zero in-degree

ianoc Mar 25, 2014

Collaborator

Vertices rather than edges, right?

reconditesea Mar 25, 2014

Contributor

Correct, will change this.
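
For readers following the diff, here is a rough in-memory analogue of what joinAggregate followed by withInDegree appears to do: group edges by their target node, compute one aggregate per group, and attach it to every edge in that group. It uses plain List and groupBy instead of Scalding's Grouped and cogroup, so it is a sketch under that assumption and not the pipeline code itself.

```scala
// Simplified stand-ins for the types in the diff (no Scalding dependency).
case class Edge[+N, +E](from: N, to: N, data: E) {
  def mapData[F](fn: E => F): Edge[N, F] = Edge(from, to, fn(data))
}
case class InDegree(degree: Int)

object InDegreeExample extends App {
  // Hypothetical in-memory stand-in for joinAggregate(g.groupBy { _.to }) { ... }.
  def withInDegree[N, E](g: List[Edge[N, E]]): List[Edge[N, (E, InDegree)]] =
    g.groupBy(_.to).values.toList.flatMap { group =>
      // One aggregate per target node, shared by all edges pointing at it.
      val deg = InDegree(group.size)
      group.map(_.mapData(e => (e, deg)))
    }

  val out = withInDegree(List(
    Edge("a", "c", ()),
    Edge("b", "c", ()),
    Edge("a", "b", ())
  ))

  // Every input edge appears once in the output, tagged with the
  // in-degree of its target vertex.
  assert(out.size == 3)
  assert(out.filter(_.to == "c").forall(_.data._2 == InDegree(2)))
  assert(out.filter(_.to == "b").forall(_.data._2 == InDegree(1)))
}
```

Every edge has a target, so the grouping drops no edges; the in-degree it attaches belongs to the target vertex.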

johnynek added a commit that referenced this pull request Mar 26, 2014

johnynek: Merge pull request #833 from twitter/klin_typed_sim
Implementation of DISCO and DIMSUM algorithms.
86ee158

johnynek merged commit 86ee158 into develop Mar 26, 2014

1 check passed

default The Travis CI build passed
Details

johnynek deleted the klin_typed_sim branch Mar 26, 2014