
Fix for Issue #8051: std.Thread.Condition: error: no member named 'unlock' in struct 'std.Thread.Mutex' #10105

Closed
wants to merge 3 commits

Conversation

@geo-ant commented Nov 5, 2021

Hi,
this is a potential fix for Issue 8051. It is a very low tech solution, but I did add a test case (which is also pretty low tech).

The test case will succeed if signalling the condition variable unblocks the waiting thread. However, it would also (often) succeed when waiting on a condition variable has no effect. I'd be interested to get a hint on how I can test that waiting on the condition variable does really block the waiting thread.
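To illustrate the weakness (a hypothetical sketch, identifiers are illustrative): if `wait` were a no-op, the waiting side degenerates into a busy loop on the flag, which still terminates once the signalling thread sets it, so the test still passes:

```zig
// Hypothetical: what the waiting side effectively becomes if
// condvar.wait() were a no-op -- a spin loop on the flag, which still
// terminates once the signalling thread sets `ready`, so the test
// cannot distinguish real blocking from no blocking at all.
while (!ready) {
    condvar.wait(&mutex); // imagine this returns immediately
}
```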

There is also PR 9437 which tackles the same issue but hasn't been active for a while.

Cheers
Geo

@matu3ba (Contributor) left a comment:

The change from mutex.unlock() -> mutex.releaseDirect() should be documented. Check whether the code was moved from Mutex.zig into RwLock.zig and whether it does the right thing for all edge cases.

```diff
@@ -152,9 +152,9 @@ pub const AtomicCondition = struct {
                 @atomicStore(bool, &cond.pending, true, .SeqCst);
             }

-            mutex.unlock();
+            mutex.releaseDirect();
```
@matu3ba (Contributor) commented:

lock and unlock are implemented in RwLock.zig. Did you check whether the code was moved from Mutex.zig into RwLock.zig and thus broke things? Otherwise the commit message doesn't explain why changing this is fine.

Also, the doc comment says `/// Release the mutex. Prefer Held.release() if available.`, so you might want to explain why releaseDirect is used.
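For context, the two unlock paths in the Mutex API at the time look roughly like this (a sketch based on the doc comment quoted in this review, not the exact source):

```zig
// Sketch of the two unlock paths in the then-current Mutex API.
const held = mutex.acquire(); // returns a Held token
held.release();               // preferred: release via the token

// Inside Condition.wait no Held token is in scope, so the PR
// falls back to releasing the mutex directly:
mutex.releaseDirect();        // release without a Held token
```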

@geo-ant (Author) replied:

Hi, thanks for taking the time to review the PR. My reasoning was simple (maybe simplistic): since no instance of Held is available to me, I just used the Mutex API. Would it make sense to extend the interface of wait to include the Held instance? That would feel repetitive and error-prone, but there's probably something I am not seeing.

@matu3ba (Contributor) replied Nov 7, 2021:

> Would it make sense to extend the interface of wait to include the Held instance?

Held is defined in Mutex.zig, but it is not needed here since we don't expose it to the user. I asked about unlock, but I checked the git log and nothing shows up, so it looks like it was broken from the commit in which it was introduced.

```diff
             waiter.data.wait();
-            mutex.lock();
+            _ = mutex.acquire();
```
@matu3ba (Contributor) commented:

Same argument as for line 155.

@geo-ant (Author) replied Nov 7, 2021:

My reasoning here was this: since we expect a locked mutex to be passed into the wait method of the condition variable, we already have an instance of Held for that mutex in the calling scope. I considered changing the interface of wait to return the Held instance, but that felt confusing to me, because we already have an instance in the calling scope associated with the mutex. Very likely we have code like this:

```zig
// the mutex exists somewhere, maybe outside this scope
const held = mutex.acquire();
defer held.release();
while (!some_condition) {
    condvar.wait(&mutex);
}
```

Since the mutex is guaranteed to be locked when wait returns, it is semantically correct to release the mutex using the held instance. At least I think so.

Now if I make wait return the Held instance, I would have two instances pointing to the same mutex, which is confusing at best. Let me know if that makes sense.
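The rejected alternative would look roughly like this (hypothetical sketch; `held2` is an illustrative name):

```zig
// Hypothetical: if wait() returned a new Held, the caller would hold
// two tokens for the same mutex, and it becomes unclear which one is
// responsible for releasing it.
const held = mutex.acquire();
defer held.release();
while (!some_condition) {
    const held2 = condvar.wait(&mutex); // second token, same mutex
    _ = held2; // which token should release?
}
```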

@matu3ba (Contributor) replied:

> Let me know if that makes sense.

Yes, looks good. The overall code in Thread looks good, but there is no nice overview of how things work, which makes code navigation hard. So it took me a while to figure things out.

@matu3ba (Contributor) commented Nov 7, 2021

> The test case will succeed if signalling the condition variable unblocks the waiting thread. However, it would also (often) succeed when waiting on a condition variable has no effect. I'd be interested to get a hint on how I can test that waiting on the condition variable does really block the waiting thread.

Can you clarify what you mean with

    while (!run_condition.load(.SeqCst)) {
        condvar.wait(&mutex);

has no effect?

> how I can test that waiting on the condition variable does really block the waiting thread.

It is not easy/portable.

One approach, if you can disable preemption or rely on the scheduler not preempting: measure the system time before and after the section and compare the elapsed time against a minimal architecture-specific constant.
(A debugger is essentially a scheduler that can stop execution and read state.)

Another approach would be to "invoke a debugger" (which doesn't work in CI, though), or the approach mentioned on Stack Overflow.
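The timing idea might be sketched like this (hypothetical; the threshold and the halving factor are arbitrary guesses, and the whole approach is inherently racy):

```zig
// Hypothetical sketch of the timing approach. The signalling thread
// sleeps before signalling; the waiter then checks that it was blocked
// for at least roughly that long. Thresholds are arbitrary and fragile.
const min_block_ns = 50 * std.time.ns_per_ms;

// in the signalling thread:
std.time.sleep(min_block_ns);
condvar.signal();

// in the waiting thread:
var timer = try std.time.Timer.start();
while (!ready) condvar.wait(&mutex);
// allow generous slack for scheduling jitter
if (timer.read() < min_block_ns / 2) return error.WaitDidNotBlock;
```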

@geo-ant (Author) commented Nov 7, 2021

> Can you clarify what you mean with
>
>     while (!run_condition.load(.SeqCst)) {
>         condvar.wait(&mutex);
>
> has no effect?

Yeah, my wording was not very precise. What I meant is that my test would succeed even if condvar.wait(...) were a no-op. This is independent of the surrounding while loop, which I just put there to guard against spurious wake-ups.

> how I can test that waiting on the condition variable does really block the waiting thread.
>
> It is not easy/portable.
>
> One approach, if you can disable preemption or rely on the scheduler not preempting: measure the system time before and after the section and compare the elapsed time against a minimal architecture-specific constant. (A debugger is essentially a scheduler that can stop execution and read state.)
>
> Another approach would be to "invoke a debugger" (which doesn't work in CI, though), or the approach mentioned on Stack Overflow.

Argh, I feared that would be trouble. Do I understand correctly that none of the approaches is portable or works in CI? Can you suggest a way forward?

@matu3ba (Contributor) commented Nov 8, 2021

@geo-ant From kprotty:

```
t1:
  lock()
  defer unlock()
  cond = false
  while !cond:
    wait(&lock)

t2:
  lock()
  defer unlock()
  cond = true
  notify()
```

and tests in his rework PR.

These can be taken from either #8084 or #7800 (a rewrite was planned eventually, but tests are always valuable).

You may also want to glance at #9136.
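In Zig, that pseudocode might translate roughly to the following (a sketch using the acquire/Held API this PR targets; Condition uses signal() rather than notify()):

```zig
// Rough Zig translation of kprotty's pseudocode (sketch; `mutex`,
// `condvar`, and `cond` are shared state as in the pseudocode).
fn t1() void {
    const held = mutex.acquire();
    defer held.release();
    cond = false;
    // the loop guards against spurious wake-ups
    while (!cond) condvar.wait(&mutex);
}

fn t2() void {
    const held = mutex.acquire();
    defer held.release();
    cond = true;
    condvar.signal();
}
```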

@geo-ant (Author) commented Nov 9, 2021

> @geo-ant From kprotty:
>
>     t1:
>       lock()
>       defer unlock()
>       cond = false
>       while !cond:
>         wait(&lock)
>
>     t2:
>       lock()
>       defer unlock()
>       cond = true
>       notify()
>
> and tests in his rework PR.
>
> These can be taken from either #8084 or #7800 (a rewrite was planned eventually, but tests are always valuable).
>
> You may also want to glance at #9136.

Thanks, I'll follow up on this. I like kprotty's tests more than mine and I'll try to get them working for me, without the implementation changes that kprotty made to the condition variable.

@andrewrk andrewrk closed this in 008b0ec Nov 10, 2021
nuald pushed a commit to nuald/zig that referenced this pull request Nov 13, 2021
This is a breaking change. Before, usage looked like this:

```zig
const held = mutex.acquire();
defer held.release();
```

Now it looks like this:

```zig
mutex.lock();
defer mutex.unlock();
```

The `Held` type was an idea to make mutexes slightly safer by making it
more difficult to forget to release an acquired lock. However, this
ultimately caused more problems than it solved when data structures
needed to store a held mutex. Simplify everything by reducing the API
down to the primitives: lock() and unlock().

Closes ziglang#8051
Closes ziglang#8246
Closes ziglang#10105
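The "data structures needed to store a held mutex" problem the commit message mentions can be illustrated with a sketch (hypothetical type; `Queue` and its methods are purely illustrative):

```zig
// Hypothetical: with the Held API, a type that keeps the lock across
// separate method calls has to store the token as state, which is
// awkward and defeats the safety the token was meant to provide.
const Queue = struct {
    mutex: std.Thread.Mutex = .{},
    held: ?std.Thread.Mutex.Held = null, // token stored as state

    fn beginBatch(self: *Queue) void {
        self.held = self.mutex.acquire();
    }
    fn endBatch(self: *Queue) void {
        self.held.?.release();
        self.held = null;
    }
};
// With plain lock()/unlock() there is no token to store.
```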