Creating SketchMap #151

Merged
merged 5 commits into from Mar 21, 2013

2 participants

@wlue

A SketchMap is a more generalized version of the Count-Min Sketch that accepts any type of key (K) and any value type (V) that has an ordering and a monoid.
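For readers new to the idea, here is a minimal sketch of the structure described above, not the code from this PR. It assumes a hypothetical `SimpleMonoid` trait in place of Algebird's `Monoid`, plain `Vector`s for the table, and invented names (`TinySketchMap`, `add`, `estimate`). Each key updates one cell per row, and a lookup takes the smallest cell across rows according to the value ordering.

```scala
// Hypothetical stand-in for Algebird's Monoid, so the sketch is self-contained.
trait SimpleMonoid[V] {
  def zero: V
  def plus(a: V, b: V): V
}

object SimpleMonoid {
  implicit val longMonoid: SimpleMonoid[Long] = new SimpleMonoid[Long] {
    def zero: Long = 0L
    def plus(a: Long, b: Long): Long = a + b
  }
}

// A depth x width table of values; row i is addressed through hashes(i).
final class TinySketchMap[K, V](
    hashes: Vector[K => Int],
    width: Int,
    table: Vector[Vector[V]]
)(implicit monoid: SimpleMonoid[V], ordering: Ordering[V]) {

  // Maps a key to a column in the given row, always within [0, width).
  private def column(row: Int, key: K): Int =
    java.lang.Math.floorMod(hashes(row)(key), width)

  // Adds `value` to exactly one cell in every row.
  def add(key: K, value: V): TinySketchMap[K, V] = {
    val updated = table.zipWithIndex.map { case (cells, row) =>
      val col = column(row, key)
      cells.updated(col, monoid.plus(cells(col), value))
    }
    new TinySketchMap(hashes, width, updated)
  }

  // Estimates the summed value for `key` as the minimum cell across rows.
  def estimate(key: K): V =
    table.indices.map(row => table(row)(column(row, key))).min
}

object TinySketchMap {
  def empty[K, V](hashes: Vector[K => Int], width: Int)(
      implicit monoid: SimpleMonoid[V], ordering: Ordering[V]): TinySketchMap[K, V] = {
    require(hashes.nonEmpty && width > 0, "need at least one hash and a positive width")
    new TinySketchMap(hashes, width, Vector.fill(hashes.size, width)(monoid.zero))
  }
}
```

With non-negative values, collisions can only inflate a cell, so taking the minimum across rows picks the least-inflated estimate.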

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+package com.twitter.algebird
+
+/**
+ * A Sketch Map is a generalized version of the Count-Min Sketch that is an
+ * approximation of Map[K, V] that stores reference to top heavy hitters. The
+ * Sketch Map can approximate the sums of any summable value that has a monoid.
+ */
+
+/**
+ * Responsible for creating instances of SketchMap.
+ */
+class SketchMapMonoid[K, V](eps: Double, delta: Double, seed: Int, heavyHittersCount: Int)
+ (implicit serialization: K => Array[Byte], valueOrdering: Ordering[V], monoid: Monoid[V])
+extends Monoid[SketchMap[K, V]] {
+
+ val hashes: Seq[SketchMapHash[K]] = {
@johnynek
johnynek added a line comment Mar 21, 2013

Can we change the type here to Seq[(K) => Int] which abstracts us from the internals a bit better (should just work to change that).
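A small illustration of why the plain function type is enough for consumers, assuming a hypothetical helper `SketchPositions.forKey` that is not part of this PR. Code that only needs to turn a key into table positions can work with any `Seq[K => Int]` and never needs to know how the hashes were built.

```scala
object SketchPositions {
  // Works with any hash functions; how they were constructed is irrelevant here.
  def forKey[K](hashes: Seq[K => Int], width: Int)(key: K): Seq[(Int, Int)] =
    hashes.zipWithIndex.map { case (hash, row) =>
      // floorMod keeps the column in [0, width) even for negative hash values.
      (row, java.lang.Math.floorMod(hash(key), width))
    }
}
```

For example, `SketchPositions.forKey(Seq[String => Int](_.length, _ => -1), 4)("abcde")` yields one `(row, column)` pair per hash function.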

@johnynek johnynek and 1 other commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+ * Returns a new set of sorted and concatenated heavy hitters given an
+ * arbitrary list of keys.
+ */
+ private def updatedHeavyHitters(hitters: Seq[K], table: SketchMapValuesTable[V]): List[K] = {
+ val mapping = calculateHeavyHittersMapping(hitters, table)
+ val specificOrdering = Ordering.by[K, V](mapping(_)).reverse
+
+ hitters.sorted(specificOrdering).take(params.heavyHittersCount).toList
+ }
+}
+
+
+/**
+ * Convenience class for holding constant parameters of a Sketch Map.
+ */
+case class SketchMapParams[K, V](hashes: Seq[SketchMapHash[K]], eps: Double, delta: Double, heavyHittersCount: Int) {
@johnynek
johnynek added a line comment Mar 21, 2013

is V needed here?

Can we make this Seq[(K) => Int]?

@johnynek
johnynek added a line comment Mar 21, 2013

let's just take width and depth here. Again it makes serialization easier. You can convert back at runtime with the SketchMap companion object.

@wlue
wlue added a line comment Mar 21, 2013

I think the problem with storing width/depth is that the reverse conversion isn't accurate, since a width can map to many different eps values due to rounding.
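The rounding concern can be made concrete with a short sketch. It assumes the usual Count-Min Sketch sizing formulas (width = ceil(e / eps), depth = ceil(ln(1 / delta))); the actual `SketchMap.width` and `SketchMap.eps` implementations are not shown in this thread, so `CmsSizing` below is a hypothetical stand-in.

```scala
object CmsSizing {
  // Assumed standard Count-Min Sketch sizing, not copied from this PR.
  def width(eps: Double): Int = math.ceil(math.E / eps).toInt
  def eps(width: Int): Double = math.E / width
  def depth(delta: Double): Int = math.ceil(math.log(1.0 / delta)).toInt
  def delta(depth: Int): Double = 1.0 / math.exp(depth.toDouble)
}
```

Under these formulas `CmsSizing.width(0.01)` and `CmsSizing.width(0.01001)` both give 272, while `CmsSizing.eps(272)` is slightly below 0.01. Converting back from width therefore returns the tightest eps for that width, not necessarily the value the caller originally supplied.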

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package com.twitter.algebird
+
+/**
+ * A Sketch Map is a generalized version of the Count-Min Sketch that is an
+ * approximation of Map[K, V] that stores reference to top heavy hitters. The
+ * Sketch Map can approximate the sums of any summable value that has a monoid.
+ */
+
+/**
+ * Responsible for creating instances of SketchMap.
+ */
+class SketchMapMonoid[K, V](eps: Double, delta: Double, seed: Int, heavyHittersCount: Int)
@johnynek
johnynek added a line comment Mar 21, 2013

Can we store the integer equivalents of eps and delta, i.e. width and depth? With integers it is easy to reason about exactness, etc.

You have the method in the companion object to create a monoid given eps/delta for that, you can just use the functions in the companion object: eps(Int), delta(Int).

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+ val params: SketchMapParams[K, V] = SketchMapParams[K, V](hashes, eps, delta, heavyHittersCount)
+
+ /**
+ * A zero Sketch Map is one with zero elements.
+ */
+ val zero: SketchMap[K, V] = SketchMap[K, V](params, SketchMapValuesTable[V](params.depth, params.width), Nil, monoid.zero)
+
+ /**
+ * We assume the Sketch Map on the left and right use the same hash functions.
+ */
+ def plus(left: SketchMap[K, V], right: SketchMap[K, V]): SketchMap[K, V] = left ++ right
+
+ /**
+ * Create a Sketch Map sketch out of a single key.
+ */
+ def create(key: K, value: V): SketchMap[K, V] = zero + (key, value)
@johnynek
johnynek added a line comment Mar 21, 2013

I think we should take kv: (K,V) here to be consistent with create, and note that the application becomes cleaner: zero + kv.
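A rough illustration of the tuple-taking shape, using a plain `Map[String, Long]` instead of a SketchMap. The object `PairApi` and its method names are hypothetical; the point is only that when every entry point accepts a `(K, V)` pair, a fold over a sequence of pairs needs no destructuring.

```scala
object PairApi {
  type Counts = Map[String, Long]

  val zero: Counts = Map.empty

  // Takes the pair as a single argument, so it can be passed straight to foldLeft.
  def add(acc: Counts, kv: (String, Long)): Counts = {
    val (key, value) = kv
    acc.updated(key, acc.getOrElse(key, 0L) + value)
  }

  def create(kv: (String, Long)): Counts = add(zero, kv)

  def create(data: Seq[(String, Long)]): Counts = data.foldLeft(zero)(add)
}
```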

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+ * Updates the value of a single cell in the table.
+ */
+ def +(pos: (Int, Int), value: V): SketchMapValuesTable[V] = {
+ val (row, col) = pos
+ val currValue: V = getValue(pos)
+ val newValues = values.updated(row, values(row).updated(col, Monoid.plus(currValue, value)))
+
+ SketchMapValuesTable[V](newValues)
+ }
+
+ /**
+ * Adds another values table to this one, through elementwise addition.
+ */
+ def ++(other: SketchMapValuesTable[V]): SketchMapValuesTable[V] = {
+ assert((depth, width) == (other.depth, other.width), "Tables must have the same dimensions.")
+
@johnynek
johnynek added a line comment Mar 21, 2013

Please add a comment about the jank here: scala 2.10 is more strict on recursive implicit resolution.

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+
+
+/**
+ * The 2-dimensional table of values used in the Sketch Map.
+ * Each row corresponds to a particular hash function.
+ */
+object SketchMapValuesTable {
+ /**
+ * Creates a new SketchMapValuesTable with counts initialized to all zeroes.
+ */
+ def apply[V](depth: Int, width: Int)(implicit monoid: Monoid[V]): SketchMapValuesTable[V] = {
+ SketchMapValuesTable(AdaptiveVector.fill(depth)(AdaptiveVector.fill[V](width)(monoid.zero)))
+ }
+}
+
+case class SketchMapValuesTable[V](values: AdaptiveVector[AdaptiveVector[V]])(implicit monoid: Monoid[V]) {
@johnynek
johnynek added a line comment Mar 21, 2013

Can you put this in a separate file and call it: AdaptiveMatrix[V] and remove the implicit monoid from the constructor?

You can add a companion object:

object AdaptiveMatrix {
  def empty[V]: AdaptiveMatrix[V]
  implicit def monoid[V: Monoid]: Monoid[AdaptiveMatrix[V]] = {
    // put the stuff here with the recursive AdaptiveVector wrapped in this type.
  }
}
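One way the suggested shape could look, as a self-contained sketch rather than the code that was eventually merged. It assumes a hypothetical `SimpleMonoid` trait in place of Algebird's `Monoid`, uses plain `Vector` in place of `AdaptiveVector`, and treats the empty matrix as the identity element; all of these are simplifications for illustration.

```scala
// Hypothetical stand-in for Algebird's Monoid.
trait SimpleMonoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

object SimpleMonoid {
  implicit val longMonoid: SimpleMonoid[Long] = new SimpleMonoid[Long] {
    def zero: Long = 0L
    def plus(a: Long, b: Long): Long = a + b
  }
}

// Plain Vector stands in for AdaptiveVector; the class itself holds no monoid.
final case class Matrix[V](rows: Vector[Vector[V]])

object Matrix {
  def fill[V](depth: Int, width: Int)(value: V): Matrix[V] =
    Matrix[V](Vector.fill(depth, width)(value))

  // The monoid lives in the companion and receives the element monoid implicitly.
  implicit def monoid[V](implicit inner: SimpleMonoid[V]): SimpleMonoid[Matrix[V]] =
    new SimpleMonoid[Matrix[V]] {
      // The empty matrix acts as the identity in this sketch.
      def zero: Matrix[V] = Matrix[V](Vector.empty)

      def plus(a: Matrix[V], b: Matrix[V]): Matrix[V] =
        if (a.rows.isEmpty) b
        else if (b.rows.isEmpty) a
        else {
          require(
            a.rows.map(_.size) == b.rows.map(_.size),
            "Matrices must have the same dimensions.")
          // Elementwise addition using the element monoid.
          Matrix[V](a.rows.zip(b.rows).map { case (x, y) =>
            x.zip(y).map { case (l, r) => inner.plus(l, r) }
          })
        }
    }
}
```

Because the instance sits in the companion object, `implicitly[SimpleMonoid[Matrix[Long]]]` resolves without any import, and the case class constructor no longer needs an implicit parameter.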

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+
+ def depth: Int = values.size
+ def width: Int = values(0).size
+
+ def getValue(pos: (Int, Int)): V = {
+ val (row, col) = pos
+
+ assert(row < depth && col < width, "Position must be within the bounds of this table.")
+
+ values(row)(col)
+ }
+
+ /**
+ * Updates the value of a single cell in the table.
+ */
+ def +(pos: (Int, Int), value: V): SketchMapValuesTable[V] = {
@johnynek
johnynek added a line comment Mar 21, 2013

Make this method take an implicit Monoid[V], don't make it global.
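A sketch of what moving the implicit from the constructor to the method might look like. `SimpleMonoid`, `ValuesTable`, and the method name `add` are hypothetical stand-ins (the PR uses Algebird's `Monoid` and an operator named `+`); the idea is that only the operation that combines values asks for a monoid.

```scala
// Hypothetical stand-in for Algebird's Monoid.
trait SimpleMonoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

object SimpleMonoid {
  implicit val longMonoid: SimpleMonoid[Long] = new SimpleMonoid[Long] {
    def zero: Long = 0L
    def plus(a: Long, b: Long): Long = a + b
  }
}

// No implicit in the constructor: building and reading a table needs no monoid.
final case class ValuesTable[V](values: Vector[Vector[V]]) {
  def getValue(pos: (Int, Int)): V = values(pos._1)(pos._2)

  // The monoid is requested only where values are actually combined.
  def add(pos: (Int, Int), value: V)(implicit monoid: SimpleMonoid[V]): ValuesTable[V] = {
    val (row, col) = pos
    val combined = monoid.plus(getValue(pos), value)
    ValuesTable(values.updated(row, values(row).updated(col, combined)))
  }
}
```

A side effect of this layout is that a `ValuesTable[String]` can be constructed and read even though no `SimpleMonoid[String]` exists in this sketch.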

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+
+ /**
+ * Updates the value of a single cell in the table.
+ */
+ def +(pos: (Int, Int), value: V): SketchMapValuesTable[V] = {
+ val (row, col) = pos
+ val currValue: V = getValue(pos)
+ val newValues = values.updated(row, values(row).updated(col, Monoid.plus(currValue, value)))
+
+ SketchMapValuesTable[V](newValues)
+ }
+
+ /**
+ * Adds another values table to this one, through elementwise addition.
+ */
+ def ++(other: SketchMapValuesTable[V]): SketchMapValuesTable[V] = {
@johnynek
johnynek added a line comment Mar 21, 2013

Use the code in the monoid you make above rather than here (that monoid will already have the Monoid[V]).

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+ * Convenience class for holding constant parameters of a Sketch Map.
+ */
+case class SketchMapParams[K, V](hashes: Seq[SketchMapHash[K]], eps: Double, delta: Double, heavyHittersCount: Int) {
+ assert(0 < eps && eps < 1, "eps must lie in (0, 1)")
+ assert(0 < delta && delta < 1, "delta must lie in (0, 1)")
+ assert(0 <= heavyHittersCount, "heavyHittersCount must be non-negative")
+
+ val depth = SketchMap.depth(delta)
+ val width = SketchMap.width(eps)
+}
+
+
+/**
+ * Hashes an arbitrary key type to one that the Sketch Map can use.
+ */
+case class SketchMapHash[T](hasher: CMSHash, seed: Int)
@johnynek
johnynek added a line comment Mar 21, 2013

Can this be a private inner class of SketchMapMonoid?

We've found the smaller the public API surface area is, the easier it is to keep things compatible.
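A sketch of the smaller-surface idea, under stated assumptions: the class name `KeyHashes` is hypothetical, and the standard library's `MurmurHash3.bytesHash` is swapped in for the `CMSHash` used in the PR. The hash wrapper is a private inner class, and callers only ever see `Seq[K => Int]`.

```scala
import scala.util.hashing.MurmurHash3

class KeyHashes[K](depth: Int, seed: Int)(serialize: K => Array[Byte]) {
  // Implementation detail: code outside this class cannot name or construct it.
  private final class SeededHash(rowSeed: Int) extends (K => Int) {
    def apply(key: K): Int = MurmurHash3.bytesHash(serialize(key), rowSeed)
  }

  // Only the function type appears in the public signature.
  val hashes: Seq[K => Int] = (0 until depth).map(row => new SeededHash(seed + row))
}
```

Since `SeededHash` never leaks into a public type, it can later be renamed or replaced without affecting callers.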

@johnynek johnynek commented on an outdated diff Mar 21, 2013
.../main/scala/com/twitter/algebird/AdaptiveMatrix.scala
+ * Use recursive AdaptiveVector monoid.
+ */
+ implicit def monoid[V:Monoid]: Monoid[AdaptiveMatrix[V]] = new Monoid[AdaptiveMatrix[V]] {
+ // Scala 2.10.0 is more strict with recursive implicit resolution, so hint
+ // it with the inner monoid.
+ private implicit val innerMonoid: Monoid[AdaptiveVector[V]] = AdaptiveVector.monoid[V]
+ private val matrixMonoid = AdaptiveVector.monoid[AdaptiveVector[V]]
+
+ override def zero: AdaptiveMatrix[V] = AdaptiveMatrix[V](matrixMonoid.zero)
+ override def plus(left: AdaptiveMatrix[V], right: AdaptiveMatrix[V]): AdaptiveMatrix[V] = {
+ AdaptiveMatrix[V](matrixMonoid.plus(left.contents, right.contents))
+ }
+ }
+}
+
+case class AdaptiveMatrix[V](contents: AdaptiveVector[AdaptiveVector[V]]) {
@johnynek
johnynek added a line comment Mar 21, 2013

Can you rename contents to "rowsByCols" or something that indicates whether the rows are on the outer vector or the columns are?

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+
+ /**
+ * Create a Sketch Map sketch from a sequence of pairs.
+ */
+ def create(data: Seq[(K, V)]): SketchMap[K, V] = {
+ data.foldLeft(zero) { case (acc, (key, value)) =>
+ plus(acc, create(key, value))
+ }
+ }
+}
+
+
+/**
+ * Convenience class for holding constant parameters of a Sketch Map.
+ */
+case class SketchMapParams[K, V](hashes: Seq[K => Int], width: Int, depth: Int, heavyHittersCount: Int) {
@johnynek
johnynek added a line comment Mar 21, 2013

V is not needed here, right? Can't we remove the V type parameter?

@johnynek johnynek commented on an outdated diff Mar 21, 2013
...e/src/main/scala/com/twitter/algebird/SketchMap.scala
+ data.foldLeft(zero) { case (acc, (key, value)) =>
+ plus(acc, create(key, value))
+ }
+ }
+}
+
+
+/**
+ * Convenience class for holding constant parameters of a Sketch Map.
+ */
+case class SketchMapParams[K, V](hashes: Seq[K => Int], width: Int, depth: Int, heavyHittersCount: Int) {
+ assert(0 < width, "width must be greater than 0")
+ assert(0 < depth, "depth must be greater than 0")
+ assert(0 <= heavyHittersCount, "heavyHittersCount must be non-negative")
+
+ val eps = SketchMap.eps(width)
@johnynek
johnynek added a line comment Mar 21, 2013

Can we make this a def so we don't waste the memory on storing it? (same for delta).
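The memory point can be checked with a small experiment. The two case classes and the `FieldCount` helper are hypothetical, and `math.E / width` is the assumed standard Count-Min Sketch formula rather than the PR's `SketchMap.eps`. A `val` in the class body becomes an instance field, while a `def` is recomputed on each call and stores nothing.

```scala
final case class ParamsWithVal(width: Int, depth: Int) {
  // Computed once at construction and stored in every instance.
  val eps: Double = math.E / width
}

final case class ParamsWithDef(width: Int, depth: Int) {
  // Recomputed on each call; adds no field to the instance.
  def eps: Double = math.E / width
}

object FieldCount {
  // Counts non-static fields declared directly on the class.
  def of(c: Class[_]): Int =
    c.getDeclaredFields.count(f => !java.lang.reflect.Modifier.isStatic(f.getModifiers))
}
```

Comparing `FieldCount.of(classOf[ParamsWithVal])` with `FieldCount.of(classOf[ParamsWithDef])` shows one extra field for the `val` version, while both return the same `eps` for the same `width`.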

@johnynek johnynek merged commit b75a6af into develop Mar 21, 2013

1 check passed

default: The Travis build passed
@johnynek

Closes: #43

@johnynek johnynek deleted the feature/sketch-map branch Mar 21, 2013