Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a non-sparse MapMonoid/Semigroup/Group/etc... #450

Open
johnynek opened this issue Jun 12, 2015 · 17 comments
Open

Add a non-sparse MapMonoid/Semigroup/Group/etc... #450

johnynek opened this issue Jun 12, 2015 · 17 comments

Comments

@johnynek
Copy link
Collaborator

plus(Map("hey" -> 0), Map()) == Map() which surprises people (that zeros are removed). In many cases, you want this, in many cases you don't.

We could change the default, but that might break people, or we could add something like:

import com.twitter.algebird.MapAlgebra.dense._

to opt in to dense versions of all the objects.

@johnynek
Copy link
Collaborator Author

Work around now by doing a .mapValues(Option(_)) and this will make it dense, but that's ugly.

@ianoc
Copy link
Collaborator

ianoc commented Jun 12, 2015

It's also likely slower too I suppose. More allocations of the options. I
think importing the sense would be good

On Thursday, June 11, 2015, P. Oscar Boykin notifications@github.com
wrote:

Work around now by doing a .mapValues(Option(_)) and this will make it
dense, but that's ugly.


Reply to this email directly or view it on GitHub
#450 (comment).

@jnievelt
Copy link
Contributor

I disagree with the use of dense/spare in this context. It's more a matter of determining which keys are valid. Dense and sparse are typically not a functional distinction like that. For example, both dense and sparse vectors will have a size property with the understanding that indices in [0, size) are valid.

One could also imagine a 'sparse' Map-like Monoid (or Map Aggregator) which, instead of storing full associative pairs (x, zero) as they occur, stores a separate Set of keys which map to zero along with the Map of keys that map to non-zero.

@avibryant
Copy link
Contributor

Agree with @jnievelt on the terminology being wrong there; "dense" to me means that the keys have a known, finite domain and that storage is pre-allocated for all of them.

The Option workaround wouldn't work, would it? Because the zero for the Option monoid is None, which means the Nones will get dropped just like any other 0 would.

@jnievelt
Copy link
Contributor

edit: actually I think @avibryant would be right, if the MapGroup is using OptionGroup. Does that mean it doesn't use OptionGroup (because there is a OptionMonoid)? Seems like we want a similar thing to happen for Map (i.e., the Group has to be explicitly requested).

I guess I'm not entirely sure how all these things get pulled in anyway.

@avibryant
Copy link
Contributor

Oh right, of course.

One term that comes to mind, instead of dense, is monotonic - in the sense that the size of the vectors does not ever decrease as the result of an addition (or subtraction).

@jnievelt
Copy link
Contributor

Oh, I see that even MapMonoid has this Group-like behavior (it's just missing negate). So it would use OptionMonoid and everything is good.

@johnynek
Copy link
Collaborator Author

  1. the Option wrapper does work (or seemed to when I tried it at the REPL). I think the reason is because Monoid[Option[T]].isNonZero does not recurse: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Monoid.scala#L61 just checks to see if it is None.
  2. dense may not be a great name, but considering Map(k -> 0) to be equivalent to Map() is definitely considering the Map to be sparse, in my mind, which is to say, we assume an infinite vector of zeros.

This comes down to this: is this an infinite sparse vector of (K -> Option[V]) where None values are not stored. Or is it an infinite sparse vector of K -> V where there is a zero value in V that is not stored. I think it is totally convention because each can simulate the other.

Monotonic is good, I might go futher: monotonicSize to be explicit what is monotonic here.

@Gabriella439
Copy link
Contributor

Another approach is to wrap the Map in an opaque class that quotients the difference between a key with a zero value an empty key. The idea is that this wrapper would provide a new lookup method that would return Monoid.zero if the key is absent, so that the user cannot detect the difference between the two representations.

@ianoc
Copy link
Collaborator

ianoc commented Jun 15, 2015

deleting is more memory efficient, so i think we'll want to be additive
with anything that keeps more state around

On Mon, Jun 15, 2015 at 10:33 AM, Gabriel Gonzalez <notifications@github.com

wrote:

Another approach is to wrap the Map in an opaque class that quotients the
difference between a key with a zero value an empty key. The idea is that
this wrapper would provide a new lookup method that would return
Monoid.zero if the key is absent, so that the user cannot detect the
difference between the two representations.


Reply to this email directly or view it on GitHub
#450 (comment).

@jnievelt
Copy link
Contributor

The point of this issue is not just that the MapMonoid is confusing, but also that the functionality it's missing (preserving keys that map to monoid.zero) is desired. So presenting a different interface that makes this impossible doesn't really help.

But yes, it should be additive. It could easily be that applications depend on assumptions like map.contains(k) <=> map(k) != monoid.zero, map.size == map.count { _._2 != monoid.zero }, etc.

@Gabriella439
Copy link
Contributor

Whether or not you use Option you still break the identity laws. One of the consequences of the Monoid laws is that the identity element must be unique. If you have two proposed non-equal identity elements (let's call them i1 and i2) then one of them must necessarily not be the identity.

To see why, you just add i1 and i2 together and whichever one you get back is not the identity. For example, if:

i1 + i2 = i1

... then i1 is not the identity element because i1 + i2 != i2

The only way to truly fix this issue is to have exactly one identity element or quotient the representation so that the user can only distinguish one identity element. The Option solution has the exact same problem because you can't have both Map(k -> None) and Map() both be identity elements.

@jnievelt
Copy link
Contributor

Maybe I'm missing something, but isn't Map(k -> None) not an identity candidate because Map(k -> None) + Map() != Map()?

@Gabriella439
Copy link
Contributor

@jnievelt That means that Map() is not a valid identity, because the identity laws say that:

x + zero = x

... but the law does not hold when zero = Map() and x = Map(k -> None)

@jnievelt
Copy link
Contributor

Ah, of course you're right. In fact the result is that there are no identity elements, except for degenerate cases like Map[Nothing, Long].

@johnynek
Copy link
Collaborator Author

@Gabriel439 the laws require a definition of =. If your definition of = includes only looking at non-zero keys, you have no problem, because really you had zero + zero = zero for these cases.

@Gabriella439
Copy link
Contributor

@johnynek I agree. That's why I believe we should wrap the Map type so that we can quotient the API and then the user wouldn't be able to distinguish multiple identity elements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants