New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Library of Binary Tree Algorithms as Mixable Scala Traits #496

Closed
wants to merge 12 commits into
base: develop
from

Conversation

Projects
None yet
3 participants
@erikerlandson
Contributor

erikerlandson commented Oct 24, 2015

This is a PR for tree and map traits supporting some binary tree algorithms, factored out of the PR for implementing t-digest: #495

A Library of Binary Tree Algorithms as Mixable Scala Traits
http://erikerlandson.github.io/blog/2015/09/26/a-library-of-binary-tree-algorithms-as-mixable-scala-traits/

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Oct 29, 2015

Contributor

@johnynek I have tree/map libs in a state that is ready for some more review. They now inherit from SortedMapLike and SortedSetLike. (#495 is not currently rebased onto these latest changes). Unit tests are passing except for a spurious randomized test failure in HLL.

Contributor

erikerlandson commented Oct 29, 2015

@johnynek I have tree/map libs in a state that is ready for some more review. They now inherit from SortedMapLike and SortedSetLike. (#495 is not currently rebased onto these latest changes). Unit tests are passing except for a spurious randomized test failure in HLL.

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Oct 30, 2015

Contributor

Actually I'm going to tweak this a little, it causes some subtle problems when I try to apply it to TDigestMap

Contributor

erikerlandson commented Oct 30, 2015

Actually I'm going to tweak this a little, it causes some subtle problems when I try to apply it to TDigestMap

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Oct 30, 2015

Contributor

@johnynek OK I have #495 and #496 synced up.

Contributor

erikerlandson commented Oct 30, 2015

@johnynek OK I have #495 and #496 synced up.

@@ -1,9 +1,8 @@
language: scala
sudo: false
matrix:

This comment has been minimized.

@ianoc

ianoc Nov 12, 2015

Collaborator

why is this change in here? other than the compile step being before the test step whats the gain/reasoning?

@ianoc

ianoc Nov 12, 2015

Collaborator

why is this change in here? other than the compile step being before the test step whats the gain/reasoning?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 12, 2015

Contributor

The test was timing out. Factoring out the compilation helps keep the actual 'sbt test' command from running too long. This used to be a separate commit with its own message, but it was all squashed when I refactored the PR

@erikerlandson

erikerlandson Nov 12, 2015

Contributor

The test was timing out. Factoring out the compilation helps keep the actual 'sbt test' command from running too long. This used to be a separate commit with its own message, but it was all squashed when I refactored the PR

Show outdated Hide outdated algebird-core/src/main/scala/com/twitter/algebird/maps/increment.scala
import com.twitter.algebird.maps.ordered._
import com.twitter.algebird.maps.ordered.tree.DataMap
object tree {

This comment has been minimized.

@ianoc

ianoc Nov 12, 2015

Collaborator

object names generally are all capitalized

@ianoc

ianoc Nov 12, 2015

Collaborator

object names generally are all capitalized

This comment has been minimized.

@erikerlandson

erikerlandson Nov 12, 2015

Contributor

In this case the tree object is functioning more like a package, and so it seemed apropos to leave it lower case

@erikerlandson

erikerlandson Nov 12, 2015

Contributor

In this case the tree object is functioning more like a package, and so it seemed apropos to leave it lower case

This comment has been minimized.

@ianoc

ianoc Nov 12, 2015

Collaborator

why not make it a package then? this defining things in lower case objects then importing them immediately seems unusual.

@ianoc

ianoc Nov 12, 2015

Collaborator

why not make it a package then? this defining things in lower case objects then importing them immediately seems unusual.

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Nov 13, 2015

Contributor

@ianoc I replaced object with package

Contributor

erikerlandson commented Nov 13, 2015

@ianoc I replaced object with package

new Inject[K, V](keyOrdering, valueMonoid) with INodeInc[K, V] with IncrementMap[K, V] {
// INode[K]
val color = clr
val lsub = ls.asInstanceOf[NodeInc[K, V]]

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

why the casts here? Why not accept NodeInc[K, V] as the type for ls? same for the rest.

@johnynek

johnynek Nov 18, 2015

Collaborator

why the casts here? Why not accept NodeInc[K, V] as the type for ls? same for the rest.

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

Can you put a return type on iNode. What are we using here?

@johnynek

johnynek Nov 18, 2015

Collaborator

Can you put a return type on iNode. What are we using here?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

iNode is an abstract function defined in ...redblack.tree.Node (the very top of the hierarchy). Its return type is INode (the highest level internal tree node type). At that level, it has no knowledge of any sub classes, and so ls and rs have to be Node, and therefore they need to be cast. I also would have preferred not to require casting, but could not discover any way around it. I decided it was at least (a) localized and (b) only relevant to anybody designing a new tree/map subclass.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

iNode is an abstract function defined in ...redblack.tree.Node (the very top of the hierarchy). Its return type is INode (the highest level internal tree node type). At that level, it has no knowledge of any sub classes, and so ls and rs have to be Node, and therefore they need to be cast. I also would have preferred not to require casting, but could not discover any way around it. I decided it was at least (a) localized and (b) only relevant to anybody designing a new tree/map subclass.

Show outdated Hide outdated algebird-core/src/main/scala/com/twitter/algebird/maps/increment.scala
new DataMap[K, V] {
val key = k
val value = iv
}).asInstanceOf[M]

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

how does this cast succeed? Why is it needed?

It seems that DataMap[K, V] <: DataMap[K] <: AnyRef.

I'd really rather not have any casts.

@johnynek

johnynek Nov 18, 2015

Collaborator

how does this cast succeed? Why is it needed?

It seems that DataMap[K, V] <: DataMap[K] <: AnyRef.

I'd really rather not have any casts.

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

It is being used so that the xxxMap classes can return their own type (encoded by self type M). As far as I know, the only way to avoid it would be to extend the self types up into the tree classes. I made a few attempts at that, and could not get any of them to work, and so in the end I decided that casting in a few places was acceptable as a way to get it all working.

The cast succeeds because the class-specific implementation of iNode generates objects of the correct type.

Generally speaking, I share your aesthetic of wishing to avoid asInstanceOf, I just wasn't able to figure out how to avoid it in the places I did use it. The places I do use it, the type casts will succeed.

That said, if I or anybody else can demonstrate a version of the code that doesn't need them, that would be very nice.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

It is being used so that the xxxMap classes can return their own type (encoded by self type M). As far as I know, the only way to avoid it would be to extend the self types up into the tree classes. I made a few attempts at that, and could not get any of them to work, and so in the end I decided that casting in a few places was acceptable as a way to get it all working.

The cast succeeds because the class-specific implementation of iNode generates objects of the correct type.

Generally speaking, I share your aesthetic of wishing to avoid asInstanceOf, I just wasn't able to figure out how to avoid it in the places I did use it. The places I do use it, the type casts will succeed.

That said, if I or anybody else can demonstrate a version of the code that doesn't need them, that would be very nice.

Show outdated Hide outdated algebird-core/src/main/scala/com/twitter/algebird/maps/increment.scala
* val map2 = IncrementMap.key[String].value(mon)
* }}}
*/
def key[K](implicit ord: Ordering[K]) = infra.GetValue(ord)

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

please put a return type on all public and protected methods.

@johnynek

johnynek Nov 18, 2015

Collaborator

please put a return type on all public and protected methods.

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

Sure, that will be no problem.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

Sure, that will be no problem.

Show outdated Hide outdated algebird-core/src/main/scala/com/twitter/algebird/maps/prefixsum.scala
* semantics similar to Scala's seq.aggregate(z)(seqop,combop), where the standard
* Monoid 'plus' corresponds to 'combop' and the 'inc' method corresponds to 'seqop'.
*/
trait IncrementingMonoid[T, E] extends Monoid[T] {

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

why is this needed? What are the laws? Is it any different than:

case class PreparedMonoid[E, T](prepare: E => T, monoid: Monoid[T])

why define a new trait?

@johnynek

johnynek Nov 18, 2015

Collaborator

why is this needed? What are the laws? Is it any different than:

case class PreparedMonoid[E, T](prepare: E => T, monoid: Monoid[T])

why define a new trait?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

PreparedMonoid might be equivalent. I couldn't find anything except Monoid when I looked. If you already have something like it, that would be even better.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

PreparedMonoid might be equivalent. I couldn't find anything except Monoid when I looked. If you already have something like it, that would be even better.

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

I don't actually see any definition of PreparedMonoid, is this a recent addition?

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

I don't actually see any definition of PreparedMonoid, is this a recent addition?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

I have some misgivings about the Algebird use of the prepare concept. I believe it doesn't perform well when it is used on monoids with nontrivial structure.

As one example, consider a monoid defined over Map where m1.plus(m1) (or m1 ++ m2) is defined as something such as "merge maps m1 and m2" Suppose we also want to implement aggregate - like behavior over a list of elements. That is Map[K, V], where (K, V) is the element type. In Scala aggregate, this shows up as combop(m1, m2) (the analog of monoid m1 ++ m2), and seqop(m1, (k,v)) (which I will short-hand as operator m1 + (k,v)). Here, m + (k,v) is defined as "insert (k,v) into map m" (which is efficient).

Now, compare this to Algebird's aggregators, where instead of defining +, the user has to define a prepare function, in this case it would be prepare((k,v)) => Map((k,v)) That is, for every single pair (k,v) it has to instantiate an entire Map object containing a single k/v entry, and then merge it into the result using the monoidal op ++.

I ran into this when defining the monoid for TDigest, where I had to do some gymnastics inside of the plus function to make sure that a singleton-tdigest created by prepare could be merged in without tripping over quadratic inefficiencies. A model closer to Scala's combop and seqop (or ++ and +) would be more efficient (and, IMO, actually easier to think about). My definition of the trait IncrementingMonoid was an attempt to capture this more efficent behavior, where I add inc (aka +) into the trait.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

I have some misgivings about the Algebird use of the prepare concept. I believe it doesn't perform well when it is used on monoids with nontrivial structure.

As one example, consider a monoid defined over Map where m1.plus(m1) (or m1 ++ m2) is defined as something such as "merge maps m1 and m2" Suppose we also want to implement aggregate - like behavior over a list of elements. That is Map[K, V], where (K, V) is the element type. In Scala aggregate, this shows up as combop(m1, m2) (the analog of monoid m1 ++ m2), and seqop(m1, (k,v)) (which I will short-hand as operator m1 + (k,v)). Here, m + (k,v) is defined as "insert (k,v) into map m" (which is efficient).

Now, compare this to Algebird's aggregators, where instead of defining +, the user has to define a prepare function, in this case it would be prepare((k,v)) => Map((k,v)) That is, for every single pair (k,v) it has to instantiate an entire Map object containing a single k/v entry, and then merge it into the result using the monoidal op ++.

I ran into this when defining the monoid for TDigest, where I had to do some gymnastics inside of the plus function to make sure that a singleton-tdigest created by prepare could be merged in without tripping over quadratic inefficiencies. A model closer to Scala's combop and seqop (or ++ and +) would be more efficient (and, IMO, actually easier to think about). My definition of the trait IncrementingMonoid was an attempt to capture this more efficent behavior, where I add inc (aka +) into the trait.

This comment has been minimized.

@erikerlandson

erikerlandson Dec 3, 2015

Contributor

After working through #501, I think I can replace my IncrementingMonoid with MonoidAggregator, since append is the analog of inc

@erikerlandson

erikerlandson Dec 3, 2015

Contributor

After working through #501, I think I can replace my IncrementingMonoid with MonoidAggregator, since append is the analog of inc

Show outdated Hide outdated algebird-test/src/test/scala/com/twitter/algebird/maps/mixed.scala
class Inject[K, V, P](
val keyOrdering: Numeric[K],
val valueMonoid: Monoid[V],
val prefixMonoid: IncrementingMonoid[P, V]) extends Serializable {

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

both valueMonoid and prefixMonoid have monoids on V. What if they are not related? Can we avoid using two monoids and just take a function from P => V here?

@johnynek

johnynek Nov 18, 2015

Collaborator

both valueMonoid and prefixMonoid have monoids on V. What if they are not related? Can we avoid using two monoids and just take a function from P => V here?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

They do different things, and serve orthogonal purposes. So they definitely aren't related. I presume it is only a problem where implicits are in play. Currently one of them is the different type IncrementingMonoid so it isn't generating any type conflict. I assume the same would be true if I can replace it with PreparedMonoid

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

They do different things, and serve orthogonal purposes. So they definitely aren't related. I presume it is only a problem where implicits are in play. Currently one of them is the different type IncrementingMonoid so it isn't generating any type conflict. I assume the same would be true if I can replace it with PreparedMonoid

import com.twitter.algebird.maps.redblack.tree._
import com.twitter.algebird.maps.ordered.tree.DataMap
class Inject[K, V, P](

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

This is a very abstract name. Can you add a comment as to how it used?

@johnynek

johnynek Nov 18, 2015

Collaborator

This is a very abstract name. Can you add a comment as to how it used?

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

Yes, I'll add some doc to explain it

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

Yes, I'll add some doc to explain it

import com.twitter.algebird.maps.ordered.tree.DataMap
class Inject[K, V, P](
val keyOrdering: Numeric[K],

This comment has been minimized.

@johnynek

johnynek Nov 18, 2015

Collaborator

why is it called a keyOrdering but it is a Numeric? Can we avoid Numeric which is a pretty strong type class.

@johnynek

johnynek Nov 18, 2015

Collaborator

why is it called a keyOrdering but it is a Numeric? Can we avoid Numeric which is a pretty strong type class.

This comment has been minimized.

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

keyOrdering is just the general ordering relation for all keys, and in general it is only required to be Ordering. However, in the case of the subclasses that provide nearest-neighbor queries, keys must also support a notion of distance, defined as |k1 - k2|, and more generically obey the idea that when k1 < k2 < k3, then also k2 - k1 < k3 - k1.

In other words, for the nearest-neighbor subclass, the keys aren't just ordered, they are numeric. This works out nice and clean, since Numeric <: Ordering in scals's type system, so that subclass just tightens the type constraint to Numeric

@erikerlandson

erikerlandson Nov 19, 2015

Contributor

keyOrdering is just the general ordering relation for all keys, and in general it is only required to be Ordering. However, in the case of the subclasses that provide nearest-neighbor queries, keys must also support a notion of distance, defined as |k1 - k2|, and more generically obey the idea that when k1 < k2 < k3, then also k2 - k1 < k3 - k1.

In other words, for the nearest-neighbor subclass, the keys aren't just ordered, they are numeric. This works out nice and clean, since Numeric <: Ordering in scals's type system, so that subclass just tightens the type constraint to Numeric

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Dec 11, 2015

Contributor

@johnynek, replacing IncrementingMonoid with the standard and more idiomatic MonoidAggregator works as expected.

Contributor

erikerlandson commented Dec 11, 2015

@johnynek, replacing IncrementingMonoid with the standard and more idiomatic MonoidAggregator works as expected.

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Jan 11, 2016

Contributor

@johnynek @ianoc @avibryant
Any feedback and/or questions on the latest state?

Contributor

erikerlandson commented Jan 11, 2016

@johnynek @ianoc @avibryant
Any feedback and/or questions on the latest state?

@erikerlandson

This comment has been minimized.

Show comment
Hide comment
@erikerlandson

erikerlandson Dec 7, 2016

Contributor

This is now available here:
https://github.com/isarn/isarn-collections

So I'm going to close this out

Contributor

erikerlandson commented Dec 7, 2016

This is now available here:
https://github.com/isarn/isarn-collections

So I'm going to close this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment