
Do not atomic decrement in drop when refcount == 1 #88

Closed
wants to merge 2 commits

Conversation

stepancheg
Contributor

A synthetic benchmark that becomes 5% faster (on a MacBook Pro, 2.2 GHz Intel Core i7) is included.
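For readers without the diff open, a rough sketch of the shape of the change; the names here (Shared, release_shared, refcount) are illustrative stand-ins, not the actual bytes internals:

use std::sync::atomic::{self, AtomicUsize, Ordering};

// Hypothetical stand-in for the refcounted shared state.
struct Shared {
    refcount: AtomicUsize,
    // ... payload ...
}

unsafe fn release_shared(ptr: *mut Shared) {
    // Fast path proposed here: if this handle holds the only reference, skip
    // the atomic read-modify-write (e.g. `lock xadd` on x86) and free
    // directly. The first commit used a Relaxed load; the ordering is
    // discussed further down the thread.
    if (*ptr).refcount.load(Ordering::Relaxed) == 1 {
        drop(Box::from_raw(ptr));
        return;
    }
    // Slow path: ordinary decrement; the thread that brings the count to
    // zero synchronizes with earlier Release decrements and then frees.
    if (*ptr).refcount.fetch_sub(1, Ordering::Release) == 1 {
        atomic::fence(Ordering::Acquire);
        drop(Box::from_raw(ptr));
    }
}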

@carllerche
Member

I initially had this code the same as std's Arc implementation (IIRC, a relaxed decrement, then a fence if the result was 1). I switched it because the thread lint complained. How does the thread lint handle this code?

@stepancheg
Contributor Author

By "thread lint", do you mean RUSTFLAGS="-Z sanitizer=thread" cargo test?

I need to set up a Linux machine to check it.

@alexcrichton
Contributor

This is not correct; it's not how memory orderings work.

The whole point of "release" is to make changes in memory visible to the final thread, otherwise it can cause use-after-free bugs.

@stepancheg
Contributor Author

The whole point of "release" is to make changes in memory visible to the final thread, otherwise it can cause use-after-free bugs.

@alexcrichton sorry, I don't understand. Release should make the changes visible to some other thread, but there are no other threads if refcount == 1; this thread is the only thread referencing the memory.

@alexcrichton
Contributor

Ah yes, I misinterpreted, but I believe this is unfortunately still not how the memory orderings work. There's some special clause in the C11 standard (IIRC) that makes this exact construction work, but no other. The fetch_sub should be AcqRel, but this clause allows it to be Release for all threads so long as the final thread does an Acquire.
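Concretely, the shape that clause permits looks roughly like this (a sketch under those assumptions, not code from this PR; refcount stands in for the shared strong count):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Every thread publishes its writes with a Release decrement; only the final
// thread pays for an Acquire, via a fence, before the memory is reclaimed.
fn decrement_and_maybe_free(refcount: &AtomicUsize, free: impl FnOnce()) {
    if refcount.fetch_sub(1, Ordering::Release) == 1 {
        // Synchronizes with all earlier Release decrements, so their writes
        // are visible before `free` runs.
        fence(Ordering::Acquire);
        free();
    }
}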

@stepancheg
Contributor Author

Anyway, I can use load(Acquire) instead of load(Relaxed) and it still gives the same 5% boost (it probably generates identical machine code on x86_64).

I believe the code is correct with Acquire. Shouldn't the patch be applied?
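In other words, the fast-path check becomes something like the following (a hypothetical rendering; on x86-64 both a Relaxed and an Acquire load compile to a plain MOV, which is why the speedup survives):

use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical helper mirroring the revised check.
fn is_sole_owner(refcount: &AtomicUsize) -> bool {
    // Was Ordering::Relaxed in the first commit; Acquire costs nothing extra
    // on x86-64.
    refcount.load(Ordering::Acquire) == 1
}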

@carllerche
Member

It's not obvious to me that even using an acquire load is correct. Could you explain the synchronization points and the happens-before dependencies that the change in memory order creates, in order to guarantee correctness?

@stepancheg
Contributor Author

stepancheg commented Mar 21, 2017

Well, I'm not 100% sure either, and I'm not an expert in the C++11 memory model.

Acquire means that all counter decrements in other threads (fetch_sub(Release)) happened before that load(Acquire). So this load in release_shared definitely "sees" the proper counter value from the last fetch_sub in another thread.

fetch_add does not do Release, so load(Acquire) won't easily "see" that change. There are two possible cases:

  • either the refcount before fetch_add is > 1, so load(Acquire) == 1 would be false; no problem here
  • the refcount before fetch_add is == 1, which means that either
    ◦ this thread called fetch_add, so load(Acquire) will see that change, or
    ◦ another thread called fetch_add via a &Bytes reference, and a Release happened at some point before drop or reserve (e.g. after a mutex is unlocked; because drop and reserve take mut pointers).

But you probably know all that yourself.
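A hypothetical illustration of the second sub-case, using std's Arc and an mpsc channel as stand-ins for Bytes and whatever transfers the handle: the Relaxed increment performed by the clone becomes visible to the dropping thread through the synchronization that hands the value over, not through the refcount itself:

use std::sync::{mpsc, Arc};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let original = Arc::new([1u8, 2, 3]);

    // The clone bumps the refcount with a Relaxed fetch_add.
    tx.send(Arc::clone(&original)).unwrap(); // sending is a release point

    let worker = thread::spawn(move || {
        let cloned = rx.recv().unwrap(); // receiving is an acquire point
        drop(cloned); // this thread's release path observes a count > 1
    });
    worker.join().unwrap();

    drop(original); // last handle: the sole-owner fast path would apply here
}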

@alexcrichton
Contributor

This has a very large doc block in boost (the original source here), and I have to imagine that such an optimization would have already been conceived by the original authors.

I would not personally know how to verify or refute claims that the construction proposed in this PR is safe.

@stepancheg
Contributor Author

I have to imagine that such an optimization would have already been conceived by the original authors

That's a good point.

@stepancheg
Contributor Author

stepancheg commented Mar 22, 2017

Found it!

Reading the libc++ source code: they cannot do this optimization for the strong counter, because a weak_ptr can be upgraded to strong, but they do it for ~weak_ptr:

void
__shared_weak_count::__release_weak() _NOEXCEPT
{
    // NOTE: The acquire load here is an optimization of the very
    // common case where a shared pointer is being destructed while
    // having no other contended references.
    //
    // BENEFIT: We avoid expensive atomic stores like XADD and STREX
    // in a common case.  Those instructions are slow and do nasty
    // things to caches.
    //
    // IS THIS SAFE?  Yes.  During weak destruction, if we see that we
    // are the last reference, we know that no-one else is accessing
    // us. If someone were accessing us, then they would be doing so
    // while the last shared / weak_ptr was being destructed, and
    // that's undefined anyway.
    //
    // If we see anything other than a 0, then we have possible
    // contention, and need to use an atomicrmw primitive.
    // The same arguments don't apply for increment, where it is legal
    // (though inadvisable) to share shared_ptr references between
    // threads, and have them all get copied at once.  The argument
    // also doesn't apply for __release_shared, because an outstanding
    // weak_ptr::lock() could read / modify the shared count.
    if (__libcpp_atomic_load(&__shared_weak_owners_, _AO_Acquire) == 0)
    {
        // no need to do this store, because we are about
        // to destroy everything.
        //__libcpp_atomic_store(&__shared_weak_owners_, -1, _AO_Release);
        __on_zero_shared_weak();
    }
    else if (__libcpp_atomic_refcount_decrement(__shared_weak_owners_) == -1)
        __on_zero_shared_weak();
}

@stepancheg
Contributor Author

Could we reopen the issue please?

@carllerche
Member

Hmm, I can't actually re-open it for some reason. Would you mind creating an issue that references this PR to continue the discussion?
