Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing Chain#distinctBy method #4156

Merged
merged 2 commits into from
Apr 13, 2022
Merged

Conversation

satorg
Copy link
Contributor

@satorg satorg commented Apr 2, 2022

Core:

  • Adds the missing Chain#distinctBy method.
  • Slightly optimizes its implementation (replaces two accesses to TreeSet via contains followed by += with just one access via add).

Tests:

  • Introduces trait cats.tests.compat.ScalaVersionSpecificSyntax that implements an extension method distinctBy for any Seq-derived collection in Scala 2.12 (and does nothing for other Scala versions).
  • Adds a test for Chain#distinctBy to ChainSuite.

@satorg
Copy link
Contributor Author

satorg commented Apr 2, 2022

Note: this distinctBy can be further optimized by returning this in cases when Chain is empty or contains a single element only. Apparently such a collection cannot contain duplicates. Something like this:

  def distinctBy[B](f: A => B)(implicit O: Order[B]): Chain[A] = {
    // Cannot just use `size` (or `length`) method here since currently they enumerate the entire collection.
    if (isEmpty || this.isInstanceOf[Chain.Singleton[_]]) this else {
      implicit val ord: Ordering[B] = O.toOrdering

      val alreadyIn = mutable.TreeSet.empty[B]
      // All the other code is the same
    }
  }

Not sure if it's really worth it though. Wdyt?

core/src/main/scala/cats/data/Chain.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/data/Chain.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/data/Chain.scala Outdated Show resolved Hide resolved
core/src/main/scala/cats/data/Chain.scala Outdated Show resolved Hide resolved
johnynek
johnynek previously approved these changes Apr 4, 2022
Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

var result = Chain.empty[A]
val seen = mutable.TreeSet.empty[B]
val it = iterator
while (it.hasNext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one issue: this is going to create a Chain the costs O(N) to uncons.

I wonder if the better approach would build a Vector and Wrap that. Vector seems to be the default collection we use in other places

e.g.

val bldr = Vector.newBuilder[A]
...

Wrap(bldr.result())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wonder too. I didn't really think about it, perhaps you're right. Let's try this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

bldr += next
}
// Result can contain a single element only.
Chain.fromSeq(bldr.result())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially this line (and the similar one below) could be expanded into something more efficient like

bldr.result() match {
  case Vector(a) => Singleton(a)
  case seq       => Wrap(seq)
}

but... I'm not really sure it is worth it – I mean, it is unlikely that such a single call can lead to substantial (or even noticeable) performance decrease, IMO.

Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

@satorg satorg merged commit 68314b4 into typelevel:main Apr 13, 2022
@satorg satorg deleted the chain-distinct-by branch April 13, 2022 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants