Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

strawman implementation of AdaptiveSeq #85

Closed
wants to merge 3 commits into from

3 participants

@avibryant
Collaborator

See #84

@johnynek
Collaborator

This looks like an excellent strawman.

Two small points:

1) The Monoid should probably be in the companion object, since Monoid[AdaptiveSeq[T]] is well defined (even group/ring for that matter).
2) I think you need a specialized .updated method which does a merge with a Single.

Also, I'm not sure I like the name AdaptiveSeq. SparseIndexedSeq would be my choice, even thought not all representations are actually sparse (Spare with fallback to dense). But I'm not super concerned either way.

@avibryant
Collaborator

I'm somewhat confused by both of your points.

1) Do you really mean Monoid[AdaptiveSeq[T]], or do you mean something like Monoid[AdaptiveSeq[T].Base], which is what AdaptiveSeq's monoid method returns? If the former, what would that even mean? And either way, what do companion objects have to do with it?

2) do you just mean something like this, on Base?

def updated(idx : Int, value : T) = merge(Single(idx, value))
@avibryant
Collaborator

Oh, as for name, I'm not fussy. SparseIndexedSeq is fine, even though as you point out it's slightly misleading.

@avibryant
Collaborator

Ok, renamed and added updated(), and addressed aaron's point about bounds checking.

@avibryant
Collaborator

Oh, wait. That's not the right semantics for updated() at all.

@johnynek johnynek commented on the diff
...ain/scala/com/twitter/algebird/SparseIndexedSeq.scala
((47 lines not shown))
+ Single(i, v)
+ else
+ Sparse(Map(i -> v, index -> value))
+ }
+ }
+
+ val mapMonoid = new MapMonoid[Int,T]
+ case class Sparse(map : Map[Int,T]) extends Base {
+ def get(idx : Int) = map.getOrElse(idx, tZero)
+
+ def merge(b : Base) : Base = {
+ b match {
+ case Empty => this
+ case Single(i, v) => mergeSingle(i,v)
+ case Sparse(m) => mergeSparse(m)
+ case d:Dense => d.mergeSparse(map)
@johnynek Collaborator

this is reversing the order of the plus, right?

So, if the + is not commutative, this will break, won't it?

@avibryant Collaborator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@johnynek johnynek commented on the diff
...ain/scala/com/twitter/algebird/SparseIndexedSeq.scala
((108 lines not shown))
+ Dense(vec.zipWithIndex.map{case (v,i) =>
+ m.get(i) match {
+ case Some(v2) => tMonoid.plus(v, v2)
+ case None => v
+ }
+ })
+ }
+
+ def mergeDense(other : Vector[T]) = {
+ Dense(vec.zip(other).map{case (v1, v2) => tMonoid.plus(v1, v2)})
+ }
+
+ def updated(i : Int, v : T) : IndexedSeq[T] = Dense(vec.updated(i, v))
+ }
+
+ val monoid = new Monoid[Base] {
@johnynek Collaborator

I don't think scala will be able to find this Monoid implicitly without additional code.

Can you remind me again why you don't lift the SparseIndexedSeq trait up by itself, then have private [algebird] instances that you have here, and use the SparseIndexedSeq companion object to create Empty and get started from there?

With that approach, an implicit def monoid[T:Monoid]: Monoid[SparseIndexedSeq[T]] would be found automatically (and code like Monoid.plus(v1, v2) would just work).

@avibryant Collaborator
@aaron-siegel Collaborator

... and the optimal choice of density cutoff is also sensitive to T. I was envisioning that the user would put

implicit val myMonoid = new SparseIndexedSeq[MyType](myLen, mySparseFactor)

somewhere in scope before using this feature.

@johnynek Collaborator

Good point. A default Monoid on this doesn't make sense.

@johnynek Collaborator

Agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@aaron-siegel
Collaborator

How about SwitchingIndexedSeq? (I agree that AdaptiveSeq is a bit grandiose)

@aaron-siegel
Collaborator

Another idea would be to call it "SwitchingSeqFamily" or some such. Calling the enclosing class a "Seq" is a bit of a misnomer; there's an argument that a class shouldn't be suffixed "IndexedSeq" unless it actually inherits the IndexedSeq trait. One proposal is

SparseIndexedSeq -> SwitchingSeqFamily (or SparseSeqFamily or some such)
Base -> Element
Empty -> EmptyElement
Single -> SingletonElement
Sparse -> SparseElement
Dense -> DenseElement

which is perhaps more self-documenting nomenclature.

@avibryant
Collaborator

I think I like SparseSeqFamily best. I don't see the need to rename the internal classes, but if I did rename them I would prefer the suffix Seq to Element.

Apart from naming, are people happy with the current patch?

@johnynek johnynek commented on the diff
...ain/scala/com/twitter/algebird/SparseIndexedSeq.scala
((25 lines not shown))
+
+ case class Single(index : Int, value : T) extends Base {
+ def get(idx : Int) = if(idx == index) value else tZero
+
+ def merge(b : Base) : Base = {
+ b match {
+ case Empty => this
+ case Single(i, v) => mergeSingle(i, v)
+ case s:Sparse => s.mergeSingle(index, value)
+ case d:Dense => d.mergeSingle(index, value)
+ }
+ }
+
+ def mergeSingle(i : Int, v : T) = {
+ if(i == index)
+ Single(i, tMonoid.plus(v, value))
@johnynek Collaborator

This is also backwards for a non-commutative monoid.

PS: Can you add a test of the monoid laws on this using the T = String? That monoid is not commutative, and it should expose any problems we have there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@johnynek
Collaborator

I'm not crazy about the pattern of keeping this classes as inner classes. I'd rather they not be (I'm a little afraid of how Kryo is going to interact with that, because inner classes of immutable objects give it some problems sometimes).

So, I'd rather the inner classes be lifted out.

That said, if anyone else like's Avi's approach better, then I'll accept the majority rule.

@aaron-siegel
Collaborator

I prefer the naming convention: X for the base trait, and ZeroX, SingleX, SparseX, DenseX for the implementations. I'm less particular about what X is, though I have a mild preference for Element over Seq. I'll explain.

I borrowed the term "Family" from GAP, in which a "Family" is a collection of algebraic elements that can be combined with one another in various ways, but not with elements of another family. (The set of elements of a specific group is the canonical example of a family.) It's a fairly broad concept, but one that maps perfectly to the present scenario. Here each instance of the enclosing class defines a family, whose elements are instances of its nested types.

This approach works so well here that it's inspired me to think about various other uses of this design pattern. For example, we could create a class PermutationFamily(n: Int), whose elements are permutations on n letters, equipped with a corresponding Group structure. I think this would be right at home in algebird.

In this grand generalization of avi's design pattern, there are numerous specialized XFamily implementations, each with a nested XFamily.Element type. I think this convention works better than XFamily.X (or some such) since it highlights the common structure across different Family implementations. It's quite analogous to the Enumeration pattern in scala, in which every class XEnumeration has an XEnumeration.Value type.

Of course, this is all highly speculative and presumes that (1) there really is value to building out other Family implementations in algebird, and drawing attention to their common structure; and (2) Oscar's practical concerns over this design pattern (serialization etc) can be overcome.

But in principle I'm very attracted to avi's proposal here. I think the idea of an enclosing Family instance that binds its elements within the type system (rather than in an ad hoc way at runtime) is really quite beautiful, and has many potential applications beyond this one.

@avibryant
Collaborator

So, assuming Kryo can cope with it, I'm with Aaron that I think this is an elegant and generally applicable pattern. Oscar, how do we make sure Kryo is ok? The way you say "give it some problems sometimes" makes it sound like it's not straightforward to reproduce.

@johnynek
Collaborator

So, Avi, I hope you don't mind that I reimplemented this approach in a way more to my liking when we needed SketchMap.

Can we close the request?

@avibryant
Collaborator

I'm horribly offended, of course, but yes I'll close the request.

@avibryant avibryant closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Dec 19, 2012
  1. strawman implementaiton of AdaptiveSeq

    Avi Bryant authored
Commits on Dec 22, 2012
  1. proper semantics for updated()

    Avi Bryant authored
This page is out of date. Refresh to see the latest.
Showing with 127 additions and 0 deletions.
  1. +127 −0 src/main/scala/com/twitter/algebird/SparseIndexedSeq.scala
View
127 src/main/scala/com/twitter/algebird/SparseIndexedSeq.scala
@@ -0,0 +1,127 @@
+package com.twitter.algebird
+
+class SparseIndexedSeq[T](len : Int, sparseFactor : Double = 0.25)(implicit tMonoid : Monoid[T]) {
+ val tZero = tMonoid.zero
+ val sparseThreshold = len * sparseFactor
+
+ sealed trait Base extends IndexedSeq[T] {
+ val length = len
+ def merge(b : Base) : Base
+ def get(idx : Int) : T
+ def apply(idx : Int) = {
+ if(idx < len && idx >= 0)
+ get(idx)
+ else
+ throw new IndexOutOfBoundsException()
+ }
+ }
+
+ object Empty extends Base {
+ def get(idx : Int) = tZero
+ def merge(b : Base) = b
+
+ def updated(i : Int, v : T) : IndexedSeq[T] = Single(i, v)
+ }
+
+ case class Single(index : Int, value : T) extends Base {
+ def get(idx : Int) = if(idx == index) value else tZero
+
+ def merge(b : Base) : Base = {
+ b match {
+ case Empty => this
+ case Single(i, v) => mergeSingle(i, v)
+ case s:Sparse => s.mergeSingle(index, value)
+ case d:Dense => d.mergeSingle(index, value)
+ }
+ }
+
+ def mergeSingle(i : Int, v : T) = {
+ if(i == index)
+ Single(i, tMonoid.plus(v, value))
@johnynek Collaborator

This is also backwards for a non-commutative monoid.

PS: Can you add a test of the monoid laws on this using the T = String? That monoid is not commutative, and it should expose any problems we have there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ else
+ Sparse(Map(i -> v, index -> value))
+ }
+
+ def updated(i : Int, v : T) : IndexedSeq[T] = {
+ if(i == index)
+ Single(i, v)
+ else
+ Sparse(Map(i -> v, index -> value))
+ }
+ }
+
+ val mapMonoid = new MapMonoid[Int,T]
+ case class Sparse(map : Map[Int,T]) extends Base {
+ def get(idx : Int) = map.getOrElse(idx, tZero)
+
+ def merge(b : Base) : Base = {
+ b match {
+ case Empty => this
+ case Single(i, v) => mergeSingle(i,v)
+ case Sparse(m) => mergeSparse(m)
+ case d:Dense => d.mergeSparse(map)
@johnynek Collaborator

this is reversing the order of the plus, right?

So, if the + is not commutative, this will break, won't it?

@avibryant Collaborator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ }
+ }
+
+ def mergeSingle(i : Int, v : T) = {
+ val newVal = map.get(i) match {
+ case Some(v2) => tMonoid.plus(v, v2)
+ case None => v
+ }
+ sparseOrDense(map + (i -> newVal))
+ }
+
+ def mergeSparse(m : Map[Int, T]) = {
+ sparseOrDense(mapMonoid.plus(map, m))
+ }
+
+ def sparseOrDense(m : Map[Int, T]) = {
+ if(m.size < sparseThreshold)
+ Sparse(m)
+ else {
+ val vector = Vector[T]().padTo(len, tZero)
+ Dense(map.foldLeft(vector){ case (vec, (i,v)) => vec.updated(i, v) })
+ }
+ }
+
+ def updated(i : Int, v : T) : IndexedSeq[T] = sparseOrDense(map + (i -> v))
+ }
+
+ case class Dense(vec : Vector[T]) extends Base {
+ override def apply(idx : Int) = vec(idx)
+ def get(idx : Int) = vec(idx)
+
+ def merge(b : Base) : Base = {
+ b match {
+ case Empty => this
+ case Single(i, v) => mergeSingle(i,v)
+ case Sparse(map) => mergeSparse(map)
+ case Dense(vec) => mergeDense(vec)
+ }
+ }
+
+ def mergeSingle(i : Int, v : T) = {
+ Dense(vec.updated(i, v))
+ }
+
+ def mergeSparse(m : Map[Int, T]) = {
+ Dense(vec.zipWithIndex.map{case (v,i) =>
+ m.get(i) match {
+ case Some(v2) => tMonoid.plus(v, v2)
+ case None => v
+ }
+ })
+ }
+
+ def mergeDense(other : Vector[T]) = {
+ Dense(vec.zip(other).map{case (v1, v2) => tMonoid.plus(v1, v2)})
+ }
+
+ def updated(i : Int, v : T) : IndexedSeq[T] = Dense(vec.updated(i, v))
+ }
+
+ val monoid = new Monoid[Base] {
@johnynek Collaborator

I don't think scala will be able to find this Monoid implicitly without additional code.

Can you remind me again why you don't lift the SparseIndexedSeq trait up by itself, then have private [algebird] instances that you have here, and use the SparseIndexedSeq companion object to create Empty and get started from there?

With that approach, an implicit def monoid[T:Monoid]: Monoid[SparseIndexedSeq[T]] would be found automatically (and code like Monoid.plus(v1, v2) would just work).

@avibryant Collaborator
@aaron-siegel Collaborator

... and the optimal choice of density cutoff is also sensitive to T. I was envisioning that the user would put

implicit val myMonoid = new SparseIndexedSeq[MyType](myLen, mySparseFactor)

somewhere in scope before using this feature.

@johnynek Collaborator

Good point. A default Monoid on this doesn't make sense.

@johnynek Collaborator

Agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ def zero = Empty
+ def plus(a : Base, b : Base) : Base = a.merge(b)
+ }
+}
Something went wrong with that request. Please try again.