-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove internal lazy val from Later. #989
Conversation
This commit replaces the previous Later implementation with one that is not based on Scala's lazy val encoding. Concretely, this new implementation uses a private, internal lock to avoid potential deadlocks that could arise if other objects synchronized on a Later instance. The possibility of this happening is relatively remote but not impossible. Performance seems to be equivalent to lazy vals in the cases that were tested. An additional Object is allocated, so Later instance may be slightly larger.
ac22d9e
to
dd3e299
Compare
if (thunk == null) { | ||
result | ||
} else lock.synchronized { | ||
if (thunk == null) result else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that double-checked locking on the JVM requires a volatile
var because of...reasons. http://stackoverflow.com/questions/7855700/why-is-volatile-used-in-this-example-of-double-checked-locking hits on this. You are definitely more of a JVM expert than I am though, so I feel like you could probably convince me that this isn't a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the reason this isn't a problem is because we are only setting thunk = null
after result = thunk()
has fully initialized the result: A
value. This means that we don't have to worry about someone else returning a partially-evaluated result
value. I could be wrong though. I looked at this code with @johnynek -- maybe he can weigh in here too?
I imagine that this might have been inspired in part by scalaz/scalaz#1145 ? If so, I think that the In the case of It's quite possible that this is still a good idea, and I definitely think you would be a better judge of that than I. I just wanted to bring to attention the difference between |
Current coverage is
|
@ceedubs I would be comfortable moving the locking to |
@xuwei-k wow, super helpful. Thanks! |
Thanks! |
I think if you check result != null this works. On Wednesday, April 20, 2016, Erik Osheim notifications@github.com wrote:
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin |
Actually result needs to volatile. See this paper: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html The solution at the end hat works with Java 5 (I think we can skip Java 4 On Wednesday, April 20, 2016, P. Oscar Boykin oscar.boykin@gmail.com
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin |
Just to be clear, locking |
This commit also adds a synchronization test, based on Kenji's example. I confirmed that this test fails when the @volatile annotation is removed from thunk.
Thanks for the example @xuwei-k! I added this as a test case. |
For now, the easiest way to test synchronization is to just let some threads run for awhile. But that makes the current test difficult to write. Eventually we should think about a good way to test this on JS but for now I think a JVM-only test is OK.
@non I see that you have gone to some length to ensure that this is serializable. Is that being checked by a test, or should we be adding one? It's been mentioned by both @non and @johnynek that the chances of deadlock from locking on |
the serialization was exercised by the semigroup serialization check (indeed that is why we added the code). I hear you that it is pretty defensive to worry about locks on |
|
||
final def value: A = | ||
if (thunk == null) { | ||
result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be a nasty corner case here where both thunk
and result
evaluate to null
from a secondary threads perspective, but would definitely love to find out I'm wrong.
Consider:
- Thread A enters
value
method and inspectsthunk
, seeingthunk
as nonnull
- Thread A enters synchronized block
- Thread A inspects
thunk
again, and sees it as nonnull
- Thread A enters else branch and evaluates
thunk
, assigning the calculatedA
toresult
- Thread A assigns
null
tothunk
- Thread B now enters
value
method and inspectsthunk
- Thread B sees
thunk
asnull
due to volatility. The value has been written in Thread A asnull
. - Thread B returns the value of
result
I think that it is possible for the Thread B result
variable to still point to a null
value, briefly in this scenario because I don't think either the volatility of thunk
or the synchronized
block which this code block does not take place in, make any guarantees about which value Thread B might see as the value in result
. If I'm wrong here it would be great to find out why as it would improve my understanding of the JVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think that's a real issue. I think switching the volatile
to
result and checking that would be safe, but requires a cast since A >: Null
is not guaranteed (but should be safe on the jvm).
On Mon, May 2, 2016 at 8:50 PM, Mike Curry notifications@github.com wrote:
In core/src/main/scala/cats/Eval.scala
#989 (comment):
- lazy val value: A = {
- val result = thunk()
- thunk = null // scalastyle:off
- result
- }
+final class Later[A](f: %28%29 => A) extends Eval[A] with Externalizable {
- // we use lock to prevent possible problems with someone else
- // synchronizing on 'this'.
- private[this] val lock = new Object()
- @volatile private[this] var thunk: Function0[A] = f
- private[this] var result: A = _
- final def value: A =
- if (thunk == null) {
result
I think there might be a nasty corner case here where both thunk and
result evaluate to null from a secondary threads perspective, but would
definitely love to find out I'm wrong.
Consider:
- Thread A enters value method and inspects thunk, seeing thunk as non
null- Thread A enters synchronized block
- Thread A inspects thunk again, and sees it as non null
- Thread A enters else branch and evaluates thunk, assigning the
calculated A to result- Thread A assigns null to thunk
- Thread B now enters value method and inspects thunk
- Thread B sees thunk as null due to volatility. The value has been
written in Thread A as null.- Thread B returns the value of result
I think that it is possible for the Thread B result variable to still
point to a null value, briefly in this scenario because I don't think
either the volatility of thunk or the synchronized block which this code
block does not take place in, make any guarantees about which value Thread
B might see as the value in result. If I'm wrong here it would be great
to find out why as it would improve my understanding of the JVM.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/typelevel/cats/pull/989/files/39828830bb958ff9bc775ed34897e72d3f59a7c6#r61843443
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, agree - switching the volatile
to result
should close any loops. I don't think you need a cast though - at least not an explicit one - this should be fine.
final class Later[A](f: () => A) extends Eval[A] {
// we use lock to prevent possible problems with someone else
// synchronizing on 'this'.
private[this] val lock = new Object()
private[this] var thunk: Function0[A] = f
@volatile private[this] var result: A = _
final def value: A =
if (result != null) {
result
} else lock.synchronized {
if (result != null) result else {
result = thunk()
thunk = null
result
}
}
def memoize: Eval[A] = this
}
BTW - here is an alternate encoding. Not sure what would be better - but it is an interesting problem so I was thinking of a few different approaches and this one seemed interesting.
final class Later[A](f: () => A) extends Eval[A] {
private[this] var thunk: () => A = f
private[this] trait Result {
def value: A
}
private[this] val evaluated: Result = new Result {
lazy val value = {
val result = thunk()
thunk = null
result
}
}
final def value: A = evaluated.value
def memoize: Eval[A] = this
}
This encoding avoids explicit use of volatile
or synchronized
, and moves the lock to an inner type so should avoid the synchronization
problem mentioned. I'm not sure how the cost of the additional reference to the lazy val
in the inner type would affect performance, I guess it would need to be measured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in the case where we check result != null
we also need to throw an NPE or something if thunk() == null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand.thunk() == null
should never be true
, as it is the evaluation of a Function0[A]
- which evaluates to an A
. Well - at least no more than any other scala function with a return type of A
can evaluate to null
. So, I think thunk() == null
should not be considered here.
The reference to thunk
can equal null
, i.e. thunk == null
, but there is only one dereference of thunk
to and this is in the synchronized
block, which is only entered when result == null
.
So, I think that the volatile
nature of result
would make that safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree one should not return null, but if they do, this code will first
return null, but on a subsequent call would NPE (you would try to call
thunk again, but it would be null).
To me, this behavior sounds no good. I'd rather NPE the first time around
hen returning null once then failing if you happen to get the value again.
On Wednesday, May 4, 2016, Mike Curry notifications@github.com wrote:
In core/src/main/scala/cats/Eval.scala
#989 (comment):
- lazy val value: A = {
- val result = thunk()
- thunk = null // scalastyle:off
- result
- }
+final class Later[A](f: %28%29 => A) extends Eval[A] with Externalizable {
- // we use lock to prevent possible problems with someone else
- // synchronizing on 'this'.
- private[this] val lock = new Object()
- @volatile private[this] var thunk: Function0[A] = f
- private[this] var result: A = _
- final def value: A =
- if (thunk == null) {
result
I'm not sure I understand.thunk() == null should never be true, as it is
the evaluation of a Function0[A]- which evaluates to an A. Well - at
least no more than any other scala function with a return type of A can
evaluate to null. So, I think thunk() == null should not be considered
here.The reference to thunk can equal null, i.e. thunk == null, but there is
only one dereference of thunk to and this is in the synchronized block,
which is only entered when result == null.So, I think that the volatile nature of result would make that safe.
—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/typelevel/cats/pull/989/files/39828830bb958ff9bc775ed34897e72d3f59a7c6#r62152850
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. That would be nasty.
I'm going to rescind this for now, until @johnynek and I have a chance to think about this more and do more performance testing. I think the issue it's solving (what if someone locks a |
This commit replaces the previous Later implementation
with one that is not based on Scala's lazy val encoding.
Concretely, this new implementation uses a private,
internal lock to avoid potential deadlocks that could
arise if other objects synchronized on a Later instance.
The possibility of this happening is relatively remote
but not impossible.
Performance seems to be equivalent to lazy vals in the
cases that were tested. An additional Object is
allocated, so Later instance may be slightly larger.
Is this a good idea? Should we stick with
lazy val
?Should we synchronize on
this
to have smallerinstances? Should we be doing something else?