Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XrdCl: Provide atomicity for PostMaster value using built-in functions #266

Merged
merged 1 commit into from
Jul 20, 2015

Conversation

esindril
Copy link
Contributor

This uses the built-in atomic functions. The test program is:

#include "XrdSys/XrdSysAtomics.hh"

int a;

void val_set(int v) {  AtomicCAS(a, a, v); }
int val_get() { return AtomicGet(a); }

The assembler code for x86_64 is as expected:

_Z7val_seti:
        .cfi_startproc
        movl    %edi, a(%rip)
        ret
        .cfi_endproc

_Z7val_getv:
        .cfi_startproc
        movl    a(%rip), %eax
        ret
        .cfi_endproc

While for ARM it looks like this:

_Z7val_seti:
        .fnstart
.LFB0:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r3, .L2
        str     r0, [r3, #0]
        bx      lr
.L3:
        .align  2
.L2:
        .word   .LANCHOR0
        .cantunwind
        .fnend
_Z7val_getv:
        .fnstart
.LFB1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r3, .L5
        ldr     r0, [r3, #0]
        bx      lr
.L6:
        .align  2
.L5:
        .word   .LANCHOR0
        .cantunwind
        .fnend

So, the conclusion is that for x86_64 there is no difference whether we use or not __sync_fetch*. Obviously this also works without c++11. Let me know which one you prefer between this one and #265.

@ljanyst
Copy link
Contributor

ljanyst commented Jul 16, 2015

You don't see the difference because you have used a 32-bit integer in your test case...

@ljanyst
Copy link
Contributor

ljanyst commented Jul 16, 2015

I also think that you will see a difference between __sync_bool_compare_and_swap that you used and __sync_lock_test_and_set, because the former also needs to do the comparison.

@esindril
Copy link
Contributor Author

For x86_64 the assembler code is the same for both int and long long/int64_t.

The details on __sync_lock_test_and_set don't seem too reassuring when it comes to support for this.

type __sync_lock_test_and_set (type *ptr, type value, ...)
  This builtin, as described by Intel, is not a traditional test-and-set operation, 
  but rather an atomic exchange operation. It writes value into *ptr, and returns the 
  previous contents of *ptr.

  Many targets have only minimal support for such locks, and do not support a full 
  exchange operation. In this case, a target may support reduced functionality here
  by which the only valid value to store is the immediate constant 1. The exact value
  actually stored in *ptr is implementation defined.

  This builtin is not a full barrier, but rather an acquire barrier. This means that 
  references after the builtin cannot move to (or be speculated to) before the builtin, 
  but previous memory stores may not be globally visible yet, and previous memory 
  loads may not yet be satisfied. 

And it's not in the XrdSysAtomics.hh header file. I tried to stick to what was available there.

@ljanyst
Copy link
Contributor

ljanyst commented Jul 16, 2015

  void *a = malloc(10);

Results with:

        movl    $10, %edi
        call    malloc
        movq    %rax, -8(%rbp)
  void *a = 0;
  __sync_bool_compare_and_swap(&a, 0, malloc(10));

Results with:

        movl    $10, %edi
        call    malloc
        movq    %rax, %rdx
        movl    $0, %eax
        lock cmpxchgq   %rdx, -8(%rbp)
  void *a = 0;
  __sync_lock_test_and_set(&a, malloc(10));

Results with:

        movq    $0, -8(%rbp)
        movl    $10, %edi
        call    malloc
        xchgq   -8(%rbp), %rax

The LOCK prefix essentially means that the current CPU has ownership of the cache line for the duration of the operation, so you get the atomicity for the memory access. All the internet says that xchgq has implicit LOCK if one of it's arguments is a memory address, which is the case here, so indeed __sync_bool_compare_and_swap and __sync_lock_test_and_set should roughly have the same penalty.

@ljanyst
Copy link
Contributor

ljanyst commented Jul 16, 2015

For ARM both __sync_bool_compare_and_swap and __sync_lock_test_and_set result with function calls that are probably provided by libgcc.

@xrootd-dev
Copy link

Yes, I know it’s not there. I have a new XrdSysAtomics.hh that adds. It also transparently supports C11, gcc __atomic_xxx(), and __sync_xxx() operations, depending on what is available. I am waiting for 4.2 to come out before pushing it in.

Andy

From: Elvin Sindrilaru
Sent: Thursday, July 16, 2015 7:03 AM
To: xrootd/xrootd
Subject: Re: [xrootd] XrdCl: Provide atomicity for PostMaster value using built-in functions (#266)

For x86_64 the assembler code is the same for both int and long long/int64_t.

The details on __sync_lock_test_and_set don't seem too reassuring when it comes to support for this.

type __sync_lock_test_and_set (type *ptr, type value, ...)
This builtin, as described by Intel, is not a traditional test-and-set operation,
but rather an atomic exchange operation. It writes value into *ptr, and returns the
previous contents of *ptr.

Many targets have only minimal support for such locks, and do not support a full
exchange operation. In this case, a target may support reduced functionality here
by which the only valid value to store is the immediate constant 1. The exact value
actually stored in *ptr is implementation defined.

This builtin is not a full barrier, but rather an acquire barrier. This means that
references after the builtin cannot move to (or be speculated to) before the builtin,
but previous memory stores may not be globally visible yet, and previous memory
loads may not yet be satisfied.
And it's not in the XrdSysAtomics.hh header file. I tried to stick to what was available there.


Reply to this email directly or view it on GitHub.


Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1

@esindril esindril force-pushed the fix_postmaster_builtin_atomic branch 2 times, most recently from c510451 to 106165d Compare July 20, 2015 09:09
@esindril esindril force-pushed the fix_postmaster_builtin_atomic branch from 106165d to b38e60c Compare July 20, 2015 09:10
@esindril
Copy link
Contributor Author

Updated pull request to follow the naming conventions.

esindril added a commit that referenced this pull request Jul 20, 2015
XrdCl: Provide atomicity for PostMaster value using built-in functions
@esindril esindril merged commit 498483b into xrootd:master Jul 20, 2015
@esindril esindril deleted the fix_postmaster_builtin_atomic branch February 7, 2023 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants