
Do not atomic decrement in drop when refcount == 1 #88

Closed
wants to merge 2 commits

Conversation

stepancheg
Contributor

A synthetic benchmark that becomes 5% faster (on a MacBook Pro, 2.2 GHz Intel Core i7) is included.
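For readers without the diff open, a rough sketch of the shape of the change; the names here (Shared, release_shared, refcount) are illustrative stand-ins, not the actual bytes internals:

use std::sync::atomic::{self, AtomicUsize, Ordering};

// Hypothetical stand-in for the refcounted shared state.
struct Shared {
    refcount: AtomicUsize,
    // ... payload ...
}

unsafe fn release_shared(ptr: *mut Shared) {
    // Fast path proposed here: if this handle holds the only reference, skip
    // the atomic read-modify-write (e.g. `lock xadd` on x86) and free
    // directly. The first commit used a Relaxed load; the ordering is
    // discussed further down the thread.
    if (*ptr).refcount.load(Ordering::Relaxed) == 1 {
        drop(Box::from_raw(ptr));
        return;
    }
    // Slow path: ordinary decrement; the thread that brings the count to
    // zero synchronizes with earlier Release decrements and then frees.
    if (*ptr).refcount.fetch_sub(1, Ordering::Release) == 1 {
        atomic::fence(Ordering::Acquire);
        drop(Box::from_raw(ptr));
    }
}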

@carllerche
Member

I initially had this code the same as std's Arc implementation (IIRC, a relaxed decrement, then a fence if the result was 1). I switched it because the thread lint complained. How does the thread lint handle this code?

@stepancheg
Contributor Author

By "thread lint", do you mean RUSTFLAGS="-Z sanitizer=thread" cargo test?

I need to set up a Linux machine to check it.

@alexcrichton
Contributor

This is not correct; it's not how memory orderings work.

The whole point of "release" is to make changes in memory visible to the final thread, otherwise it can cause use-after-free bugs.

@stepancheg
Contributor Author

The whole point of "release" is to make changes in memory visible to the final thread, otherwise it can cause use-after-free bugs.

@alexcrichton sorry, I don't understand. Release should make the changes visible to some other thread, but there are no other threads if refcount == 1; this thread is the only thread referencing the memory.

@alexcrichton
Contributor

Ah yes, I misinterpreted, but I believe this is unfortunately still not how the memory orderings work. There's some special clause in the C11 standard (IIRC) that makes this exact construction work, but no other. The fetch_sub should be AcqRel, but this clause allows it to be Release for all threads so long as the final thread does an Acquire.
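Concretely, the shape that clause permits looks roughly like this (a sketch under those assumptions, not code from this PR; refcount stands in for the shared strong count):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Every thread publishes its writes with a Release decrement; only the final
// thread pays for an Acquire, via a fence, before the memory is reclaimed.
fn decrement_and_maybe_free(refcount: &AtomicUsize, free: impl FnOnce()) {
    if refcount.fetch_sub(1, Ordering::Release) == 1 {
        // Synchronizes with all earlier Release decrements, so their writes
        // are visible before `free` runs.
        fence(Ordering::Acquire);
        free();
    }
}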

@stepancheg
Contributor Author

Anyway, I can use load(Acquire) instead of load(Relaxed) and it still gives the same 5% boost (it probably generates identical machine code on x86_64).

I believe the code is correct with Acquire. Shouldn't the patch be applied?
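In other words, the fast-path check becomes something like the following (a hypothetical rendering; on x86-64 both a Relaxed and an Acquire load compile to a plain MOV, which is why the speedup survives):

use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical helper mirroring the revised check.
fn is_sole_owner(refcount: &AtomicUsize) -> bool {
    // Was Ordering::Relaxed in the first commit; Acquire costs nothing extra
    // on x86-64.
    refcount.load(Ordering::Acquire) == 1
}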

@carllerche
Member

It's not obvious to me that even using an acquire load is correct. Could you explain the synchronization points and the happens-before dependencies that the change in memory order creates, in order to guarantee correctness?

@stepancheg
Contributor Author

stepancheg commented Mar 21, 2017

Well, I'm not 100% sure either, and I'm not an expert in the C++11 memory model.

Acquire means that all counter decrements in other threads (fetch_sub(Release)) happened before that load(Acquire). So this load in release_shared definitely "sees" the proper counter value from the last fetch_sub in another thread.

fetch_add does not do Release, so load(Acquire) won't easily "see" that change. There are two possible cases:

  • either the refcount before fetch_add is > 1, so load(Acquire) == 1 would be false; no problem here
  • the refcount before fetch_add is == 1, which means that either
    ◦ this thread called fetch_add, so load(Acquire) will see that change, or
    ◦ another thread called fetch_add via a &Bytes reference, and a Release happened at some point before drop or reserve (e.g. after a mutex is unlocked; because drop and reserve take mut pointers).

But you probably know all that yourself.
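A hypothetical illustration of the second sub-case, using std's Arc and an mpsc channel as stand-ins for Bytes and whatever transfers the handle: the Relaxed increment performed by the clone becomes visible to the dropping thread through the synchronization that hands the value over, not through the refcount itself:

use std::sync::{mpsc, Arc};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let original = Arc::new([1u8, 2, 3]);

    // The clone bumps the refcount with a Relaxed fetch_add.
    tx.send(Arc::clone(&original)).unwrap(); // sending is a release point

    let worker = thread::spawn(move || {
        let cloned = rx.recv().unwrap(); // receiving is an acquire point
        drop(cloned); // this thread's release path observes a count > 1
    });
    worker.join().unwrap();

    drop(original); // last handle: the sole-owner fast path would apply here
}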

@alexcrichton
Contributor

This has a very large doc block in boost (the original source here), and I have to imagine that such an optimization would have already been conceived by the original authors.

I would not personally know how to verify or refute claims that the construction proposed in this PR is safe.

@stepancheg
Contributor Author

I have to imagine that such an optimization would have already been conceived by the original authors

That's a good point.

@stepancheg
Contributor Author

stepancheg commented Mar 22, 2017

Found it!

Reading the libc++ source code: they cannot do this optimization for the strong counter, because a weak_ptr can be upgraded to strong, but they do it for ~weak_ptr:

void
__shared_weak_count::__release_weak() _NOEXCEPT
{
    // NOTE: The acquire load here is an optimization of the very
    // common case where a shared pointer is being destructed while
    // having no other contended references.
    //
    // BENEFIT: We avoid expensive atomic stores like XADD and STREX
    // in a common case.  Those instructions are slow and do nasty
    // things to caches.
    //
    // IS THIS SAFE?  Yes.  During weak destruction, if we see that we
    // are the last reference, we know that no-one else is accessing
    // us. If someone were accessing us, then they would be doing so
    // while the last shared / weak_ptr was being destructed, and
    // that's undefined anyway.
    //
    // If we see anything other than a 0, then we have possible
    // contention, and need to use an atomicrmw primitive.
    // The same arguments don't apply for increment, where it is legal
    // (though inadvisable) to share shared_ptr references between
    // threads, and have them all get copied at once.  The argument
    // also doesn't apply for __release_shared, because an outstanding
    // weak_ptr::lock() could read / modify the shared count.
    if (__libcpp_atomic_load(&__shared_weak_owners_, _AO_Acquire) == 0)
    {
        // no need to do this store, because we are about
        // to destroy everything.
        //__libcpp_atomic_store(&__shared_weak_owners_, -1, _AO_Release);
        __on_zero_shared_weak();
    }
    else if (__libcpp_atomic_refcount_decrement(__shared_weak_owners_) == -1)
        __on_zero_shared_weak();
}

@stepancheg
Contributor Author

Could we reopen the issue please?

@carllerche
Member

Hmm, I can't actually re-open it for some reason. Would you mind creating an issue that references this PR to continue the discussion?
