Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection Equality (Sets, Seqs and Maps should be equal by content - modulo type) #1555

Closed
nrktkt opened this issue Sep 3, 2016 · 15 comments

Comments

@nrktkt
Copy link

nrktkt commented Sep 3, 2016

One would typically expect implementations of a set interface to consider two sets equal regardless of subtype or ordering as long as they contained exactly the same elements.
But in javaslang this:

Set<String> tree = TreeSet.of("a", "b", "c", "D");
Set<String> hash = HashSet.of("a", "b", "c", "D");
assertEquals(tree, hash);

Results in this:

java.lang.AssertionError: 
Expected :TreeSet(D, a, b, c)
Actual   :HashSet(a, b, c, D)

Is this the intended behavior or could it be improved?

@danieldietrich
Copy link
Contributor

danieldietrich commented Sep 3, 2016

Scala does behave as follows:

scala> val set1 = scala.collection.immutable.SortedSet.empty[String] + ("a", "c", "b")
set1: scala.collection.immutable.SortedSet[String] = TreeSet(a, b, c)

scala> val set2 = scala.collection.immutable.Set.empty + ("c", "b", "a")
set2: scala.collection.immutable.Set[String] = Set(c, b, a)

scala> set1.equals(set2)
res0: Boolean = true

scala> set2 == set1
res1: Boolean = true

scala> set1 == set2
res2: Boolean = true

scala> set2.equals(set1)
res3: Boolean = true

Sequences seems to be equal too if type differs but Seq/Set does not work that way:

scala> List(1, 2, 3) == Stream(1 ,2 ,3)
res4: Boolean = true

scala> List(1, 2, 3) equals Stream(1 ,2 ,3)
res5: Boolean = true

scala> List(1, 2, 3) equals Set(1 ,2 ,3)
res6: Boolean = false

In Javaslang equals does take the type into account. We have Value.eq() for the case that just the contained values should be compared (deep-equals). But for Sets it currently does not seem to work (in all cases).

We should definitely rethink the current behavior, maybe for equals, at least of Value.eq.

Changing equals and eq are a change of behavior that is backward-incompatible. Therefore I will target it for 3.0.0.

Update: For now I will consider it as bug and target 2.0.4. After investigation we might change the target to 3.0.0.

@danieldietrich danieldietrich modified the milestones: 2.0.4, 3.0.0 Sep 3, 2016
@danieldietrich danieldietrich modified the milestones: 2.0.5, 2.0.4 Sep 4, 2016
pfijalki pushed a commit to pfijalki/javaslang that referenced this issue Sep 14, 2016
@danieldietrich
Copy link
Contributor

This changes the behavior of collection's equals method. Traversable should override equals and add a javadoc that contains a description for all existing collection types (Seq, Set, Map, Multimap, Tree, ...).

Collection Equality

The collection libraries have a uniform approach to equality and hashing. The idea is, first, to divide collections into sets, maps, and sequences. Collections in different categories are always unequal. For instance, Set(1, 2, 3) is unequal to List(1, 2, 3) even though they contain the same elements. On the other hand, within the same category, collections are equal if and only if they have the same elements (for sequences: the same elements in the same order). For example, List(1, 2, 3) == Vector(1, 2, 3), and HashSet(1, 2) == TreeSet(2, 1).

Source: Scala Documentation


A test for sequences on the Scala REPL:

scala> val stream = Stream(1, 2, 3)
stream: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> val list = List(1, 2, 3)
list: List[Int] = List(1, 2, 3)

scala> stream equals list
res0: Boolean = true

scala> list equals stream
res1: Boolean = true

scala> stream == list
res2: Boolean = true

scala> list == stream
res3: Boolean = true

@danieldietrich danieldietrich modified the milestones: 3.0.0, 2.0.5 Sep 18, 2016
@danieldietrich
Copy link
Contributor

It is not a bug - it is a change of behavior. The structural equals method Value.eq does not change because the order of elements returned by iterator() matters (see contract).

pfijalki pushed a commit to pfijalki/javaslang that referenced this issue Oct 11, 2016
@zsolt-donca
Copy link

zsolt-donca commented Oct 21, 2016

Is Value.eq reliable? It seems to me that it has undefined behavior for collection types that don't have a well-defined ordering, such as hash-based collections.

For HashSet, the actual depends on the hash codes of the elements. Consider the examples below.

Value<String> s1 = HashSet.of("f", "o");
Value<String> s2 = TreeSet.of("f", "o");

System.out.println(s1.eq(s2)); // prints `true`

The same code with different values:

Value<String> s1 = HashSet.of("f", "uu");
Value<String> s2 = TreeSet.of("f", "uu");

System.out.println(s1.eq(s2)); // prints `false`

So, if all you got is two Value instances, you can't reliably compare them while looking for congruence of structures (quoting from the javadoc) because certain values don't have their structure in correlation with their ordering, but eq only checks ordering.

Because of the above, you can only use eq if you possess some additional information about your Value instances (that is, they have a well-defined order), but this seems like a leaking abstraction to me.

Is there something that we could do about this?

@danieldietrich
Copy link
Contributor

danieldietrich commented Oct 21, 2016

@zsolt-donca thank you! I think on the level of Value the only thing we can do is updating the Javadoc. A Traversable, e.g. a HashSet, may contain elements that have no natural order. So there will be no chance for eq to order elements.

We could also add more introspection methods like

  • hasOrder() (= has underlying Comparator)
  • isSequential() (like Seq, LinkedHash* and Iterator)
  • isDistinct() (like Set)

But Value seems to be the wrong place for these methods, they would fit better to Traversable.

Note: If we add more of these methods, we can build the Splitterator characteristics automatically. I think this is a good idea! Value could have a default implementation for single-valued types. Traversable should override it and use the introspection-methods to build a Splitterator.

(see #1635)

@danieldietrich danieldietrich modified the milestones: 2.1.0, 3.0.0 Oct 23, 2016
@danieldietrich danieldietrich changed the title Set equality Collection Equality (Sets, Seqs and Maps should be equal by content - modulo type) Oct 23, 2016
@v1ctor
Copy link
Member

v1ctor commented Mar 13, 2017

Hi! After some investigation I have found these problems:

  • First of all, it's impossible to create default implementation of equals in Traversable, because it's interface. So I've created a static method which is using isSequential() and isDistinct() introspection methods.
  • We would like to optimize equals and hashCode for Iterator.range(), we could only check borders instead of every element, but then Iterator.of(1, 2, 3) and Iterator.range(1, 4) won't be equal.

@danieldietrich What do you think? Especially about the second problem.

@danieldietrich
Copy link
Contributor

@v1ctor thank you for your investigation!

So I've created a static method which is using isSequential() and isDistinct() introspection methods.

We should create static methods located in the package-private class Collections in order to hide these methods from the public API.

It is sufficient to do these for Seq, Set, Map and Multimap:

class Collections {

    static boolean equals(Set<T> self, Object that) {
        ...
    }

    static boolean equals(Seq<T> self, Object that) {
        ...
    }

    static boolean equals(Map<K, V> self, Object that) {
        ...
    }

    static boolean equals(Multimap<K, V> self, Object that) {
        ...
    }

}

We need also to ensure that the hashCodes are equal, if the collections are equal, especially for Sets. Because Sets may have different element order we can't just do hashCode = hashCode * 31 + x. Maybe we already have a solution in the code... (@ruslansennov that might be also a problem of HAMT, @me and also a problem of RedBlackTree).


What do you think? Especially about the second problem.

We focus only on Set, Seq, Map and Multimap. Iterator, Tree and PriorityQueue are out of scope for this issue.

@ruslansennov
Copy link
Member

@ruslansennov that might be also a problem of HAMT

No, it was solved #1818

@v1ctor
Copy link
Member

v1ctor commented Mar 14, 2017

@danieldietrich
Yes, I've done it exactly like you've told 👍
Thank you for the refinements. I will carry on.

@danieldietrich
Copy link
Contributor

Great! Feel free to create an unfinished PR at any time if you want early feedback.

@danieldietrich
Copy link
Contributor

danieldietrich commented Mar 19, 2017

OUTDATED The actual version can be found below.


Scala will introduce 'Multiversal Equality', currently explored in Dotty (using type classes / a trait Eq). That means the types are partitioned into multiple disjoint universes (by type) and equality is defined there. Note: we do this already for our types (like Option) by checking instanceof.

In order to keep symmetry, i.e. c1.equals(c2) <=> c2.equals(c1), for all collections c1, c2 we need to go a step further. We will partition collections into Seq, Set, Map, ... (as already described above). But additionally we will define sub-partitions, e.g. the partition Set has the sub-partition SortedSet.

When computing equals() we determine the actual partition by calculating the upper type-bound of the given collections. Example:

  • equals(Set, SortedSet) -> upper bound = Set
  • equals(SortedSet, Set) -> upper bound = Set
  • equals(SortedSet, SortedSet) -> upper bound = SortedSet

When having equals(SortedSet, SortedSet) we need to respect element order. We do this by using iteration order during comparison.


Idea

We might expect our equals() relation to form an equivalence class, i.e. the collection types are partitioned:

  • For every element a in X, a ~ a (reflexivity),
  • For every two elements a and b in X, if a ~ b, then b ~ a (symmetry),
  • For every three elements a, b, and c in X, if a ~ b and b ~ c, then a ~ c (transitivity).

(Here X is a collection type and ~ ⊆ X x X, a ~ b := equals(a, b) == true)

I suggest to relax this equivalence property a bit by allowing sub-partitions , i.e. the classes of elements that relate to each other aren't disjoint any more:. Update: We still have disjoint partitions. I think sub-partition is the wrong word here. It is just part of the definition of the equals relation.

equivalence-classes

In particular we need that for our sorted collections.

Proposed solution

final class Collections {

    static <T> boolean equals(Seq<T> seq1, Object o) {
        if (o == seq1) {
            return true;
        } else if (seq1 != null && o instanceof Seq) {
            @SuppressWarnings("unchecked")
            final Seq<T> seq2 = (Seq<T>) o;
            return seq1.size() == seq2.size() && areEqual(seq1, seq2);
        } else {
            return false;
        }
    }

    static <T> boolean equals(Set<T> set1, Object o) {
        if (o == set1) {
            return true;
        } else if (set1 != null && o instanceof Set) {
            @SuppressWarnings("unchecked")
            final Set<T> set2 = (Set<T>) o;
            if (set1.size() != set2.size()) {
                return false;
            }
            return !set1.isOrdered() ? set2.forAll(set1::contains) :
                   !set2.isOrdered() ? set1.forAll(set2::contains) : areEqual(set1, set2);
        } else {
            return false;
        }
    }

    ...

    // already existis!
    static boolean areEqual(Iterable<?> iterable1, Iterable<?> iterable2) { ... }
}

Tests:

{
    Set<Integer> set1 = Set(1, 2, 3);
    Set<String> set2 = Set("a", "b", "c");
    println(equals(set1, set2)); // = false (especially no ClassCastException)
    println(equals(set2, set1)); // = false
}

{
    Set<Integer> set1 = SortedSet(1, 2, 3);
    Set<Integer> set2 = API.<Integer> SortedSet((i, j) -> i - j, 1, 2, 3);
    println(equals(set1, set2)); // = true (because of same order)
    println(equals(set2, set1)); // = true
}

{
    Set<Integer> set1 = SortedSet(1, 2, 3);
    Set<Integer> set2 = API.<Integer> SortedSet((i, j) -> j - i, 1, 2, 3);
    println(equals(set1, set2)); // = false (because of different order)
    println(equals(set2, set1)); // = false
}

{
    Set<Integer> set1 = Set(1, 2, 3);
    Set<Integer> set2 = API.<Integer> SortedSet((i, j) -> j - i, 1, 2, 3);
    println(equals(set1, set2)); // = true (ignoring different order)
    println(equals(set2, set1)); // = true
}

References

@danieldietrich
Copy link
Contributor

@v1ctor There is a counter-example for transitivity of the proposed Set equality:

For every three elements a, b, and c in X, if a ~ b and b ~ c, then a ~ c (transitivity).

Set<Integer> a = API.<Integer> SortedSet((i, j) -> i - j, 1, 2, 3);
Set<Integer> b = Set(1, 2, 3);
Set<Integer> c = API.<Integer> SortedSet((i, j) -> j - i, 1, 2, 3);

Then equals(a, b) == true and equals(b, c) == true but equals(a, c) == false.

We have to think about another implementation...


In Scala:

scala> val a: Set[Int] = SortedSet(1, 2, 3)((i, j) => i - j)
a: scala.collection.immutable.Set[Int] = TreeSet(1, 2, 3)

scala> val b: Set[Int] = Set(1, 2, 3)
b: scala.collection.immutable.Set[Int] = Set(1, 2, 3)

scala> val c: Set[Int] = SortedSet(1, 2, 3)((i, j) => j - i)
c: scala.collection.immutable.Set[Int] = TreeSet(3, 2, 1)

scala> a == b
res0: Boolean = true

scala> b == c
res1: Boolean = true

scala> a == c
res2: Boolean = true

@danieldietrich
Copy link
Contributor

danieldietrich commented Mar 19, 2017

We can't just use the contains method for SortedSets - it will throw at runtime:

Set<Integer> set1 = SortedSet(1, 2, 3);
Set<String> set2 = SortedSet("a", "b", "c");
println(equals(set1, set2)); // throws ClassCastException
println(equals(set2, set1)); // throws ClassCastException

@danieldietrich
Copy link
Contributor

danieldietrich commented Mar 19, 2017

@v1ctor You were right! - using set1.forAll(set2::contains) works fine for all types of sets and is also used in Scala:

// scala
trait GenSetLike[A, +Repr] with (A => Boolean) {
  override def equals(that: Any): Boolean = that match {
    case that: GenSet[_] =>
      (this eq that) ||
      (that canEqual this) &&
      (this.size == that.size) &&
      (try this subsetOf that.asInstanceOf[GenSet[A]]
       catch { case ex: ClassCastException => false })
    case _ =>
      false
  }
}

(Source: scala/scala: GenSetLike.scala)

We only need to catch ClassCastException that may occur when calling SortedSet.contains():

Please take the following equals impl for Sets, it obeys all laws of an equivalence relation:

// javaslang
@SuppressWarnings("unchecked")
static <T> boolean equals(Set<T> set1, Object o) {
    if (o == set1) {
        return true;
    } else if (set1 != null && o instanceof Set) {
        final Set<T> set2 = (Set<T>) o;
        try {
            return set1.size() == set2.size() && set1.forAll(set2::contains);
        } catch(ClassCastException x) {
            return false;
        }
    } else {
        return false;
    }
}

Because we can't rely on the order of elements for Sets, we need another hashing strategy. Scala uses the following:

// scala
trait GenSetLike[A, +Repr] with (A => Boolean) {
  override def hashCode()= scala.util.hashing.MurmurHash3.setHash(seq)
}

Proposed hashing strategy see here.

danieldietrich pushed a commit that referenced this issue Apr 20, 2017
* issue #1555: add equals and hashCode, to Seq, Set, Map, Multimap

* issue #1555: rewrite equals methods and fixed all tests

* issue #1555: supress rawtypes warning

* issue #1555: supress unchecked warning

* issue #1555: add tests

* issue #1555: rewrite equals methods

* issue #1555: add some tests

* Hardening collection equality constraints and adding javadoc
@danieldietrich
Copy link
Contributor

Fixed with #1948

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants