Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High Overhead - MessageDigest #3157

Closed
etspaceman opened this issue Mar 2, 2023 · 8 comments
Closed

High Overhead - MessageDigest #3157

etspaceman opened this issue Mar 2, 2023 · 8 comments

Comments

@etspaceman
Copy link

etspaceman commented Mar 2, 2023

Bringing this up per a discord conversation with @armanbilge , @djspiewak and @mpilquist .

I was looking to use hashing functionality in FS2 as I believe it is one of the (if not the) only Scala Native / Scala.js compliant abstraction for hashing.

However, it seems that the definition will call MessageDigest.getInstance on every invocation of the Pipe. This has a very high overhead. References:

https://cl4es.github.io/2021/01/04/Investigating-MD5-Overheads.html
svroonland/zio-kinesis#732
https://bugs.openjdk.org/browse/JDK-8259065 (slight improvement via caching in Java 17)

For me, this was problematic, as I was going to be calling that pipe several times (e.g. hash this record). It looks like Java 17 is going to start caching default constructors, but many users are likely still on older java versions, and using this would incur a pretty high cost.

Something we could do is leverage https://github.com/typelevel/keypool to create our own cache of MessageDigest instances. This could substantially lower the overhead of these offerings.

Important to note that MessageDigest must have its reset() function invoked before it is re-used.

@armanbilge
Copy link
Member

I wonder if we can just cheat by having an AtomicReference[List[MessageDigest]]. Every time the stream is invoked it can pop a digest off of that, and when it's done, it can pop it back on.

Cheating because:

  1. global state
  2. it would be leaning into Stream.suspend
    def suspend[F[_], O](s: => Stream[F, O]): Stream[F, O] =
    Pull.suspend(s.underlying).streamNoScope

@etspaceman
Copy link
Author

Isn't that effectively what keypool does? What's the advantage of rolling our own here?

@armanbilge
Copy link
Member

armanbilge commented Mar 2, 2023

Putting aside the issue of keypool specifically, doing anything more than this has two issues:

  1. it requires typeclass constraints. hash currently has no constraints.
  2. getting a hash function is now effectful, since it involves shared state. i.e. you'd have to get a F[md5], instantiate it at the beginning of your program, and pass it around.

@etspaceman
Copy link
Author

I would think 2 would be needed regardless, as an AtomicReference[List[MessageDigest]] would need to be constructed somewhere

@armanbilge
Copy link
Member

Hence, cheat (1) :) I say just put it in the hash object.

@armanbilge
Copy link
Member

My bigger take-away here is that we are stretching the limits of what fs2.hash can do without a more fundamental redesign. And I'm not sure such a thing belongs in FS2 anyway. We are still needing that Typelevel crypto library :)

@etspaceman
Copy link
Author

I would probably agree with this. Either a separate library or just having it in CE would make sense to me.

@mpilquist
Copy link
Member

I don't think fs2 should have a dependency on keypool. If the improvements in Java 17 address all or most of the issue, then I think we should keep things as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants