futex2: Implement wait and wake functions
Create a new set of futex syscalls known as futex2. The new interface
aims to provide more maintainable code, while removing obsolete features
and adding new functionality.

Implement wait and wake semantics for futexes, along with the base
infrastructure for future operations. The whole wait path is designed to
handle N waiters, which makes a vectorized wait easier to implement.

* Syscalls implemented by this patch:

- futex_wait(void *uaddr, unsigned int val, unsigned int flags,
	     struct timespec *timo)

   The user thread is put to sleep, waiting for a futex_wake() at uaddr,
   if the value at *uaddr is the same as val (otherwise, the syscall
   returns immediately with -EAGAIN). timo is an optional timeout value
   for the operation.

   Return 0 on success, error code otherwise.

 - futex_wake(void *uaddr, unsigned int nr_wake, unsigned int flags)

   Wake `nr_wake` threads waiting at uaddr.

   Return the number of woken threads on success, error code otherwise.
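
The compare-then-sleep behavior described above can be tried out today with
the legacy futex(2) syscall, which has the same "sleep only if *uaddr == val"
rule. The sketch below is a stand-in under that assumption: it does NOT call
the futex2 entry points added by this patch, and the helper names
`legacy_futex_wait` and `legacy_futex_wake` are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Illustrative wrappers around the legacy futex(2) syscall. They only
 * demonstrate the wait/wake semantics; they are not the futex2 API.
 * Both return a non-negative result on success or a negative errno value.
 */
static long legacy_futex_wait(uint32_t *uaddr, uint32_t val)
{
	assert(uaddr != NULL);
	/* NULL timeout: sleep until woken, but only if *uaddr == val. */
	long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, val,
			   NULL, NULL, 0);
	return ret == -1 ? -errno : ret;
}

static long legacy_futex_wake(uint32_t *uaddr, int nr_wake)
{
	assert(uaddr != NULL);
	/* Returns how many waiters were actually woken. */
	long ret = syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, nr_wake,
			   NULL, NULL, 0);
	return ret == -1 ? -errno : ret;
}
```

Calling `legacy_futex_wait()` with a `val` that differs from the current
futex word returns -EAGAIN right away instead of sleeping, and
`legacy_futex_wake()` on a word with no waiters returns 0.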

** The `flags` argument

 The `flags` argument specifies the size of the futex word
 (FUTEX_[8, 16, 32]). A size must always be given, since there is no
 default size.
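
 As a rough illustration of the mandatory size field, the sketch below checks
 and builds a flags word using only the two constants this patch adds to
 include/uapi/linux/futex.h (FUTEX_32 and FUTEX_SIZE_MASK). The helper names
 are hypothetical, and the real validation in kernel/futex2.c may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the include/uapi/linux/futex.h hunk of this patch. */
#define FUTEX_32	2
#define FUTEX_SIZE_MASK	0x3

/* Hypothetical helper: true if flags selects a size this patch defines. */
static bool futex2_flags_have_valid_size(unsigned int flags)
{
	return (flags & FUTEX_SIZE_MASK) == FUTEX_32;
}

/*
 * Hypothetical helper: combine non-size flag bits with the 32-bit size.
 * The caller must not pass anything in the size field itself.
 */
static unsigned int futex2_make_flags(unsigned int other_flags)
{
	assert((other_flags & FUTEX_SIZE_MASK) == 0);
	return other_flags | FUTEX_32;
}
```

 A flags value of 0 fails the check because no size is selected, which
 matches the "no default size" rule above.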

 By default, the timeout is measured against a monotonic clock; setting the
 FUTEX_REALTIME_CLOCK flag makes it use the realtime clock instead.

 By default, futexes are private, meaning that the user address is accessed
 only by threads that share the same memory region. This allows for some
 internal optimizations, so private futexes are faster. If the address needs
 to be shared between different processes (for example through `mmap()` or a
 shared memory segment), the futex must be declared shared by setting
 FUTEX_SHARED_FLAG.

 By default, the operation has no NUMA-awareness, meaning that the user can't
 choose the memory node where the kernel side futex data will be stored. The
 user can choose the node where it wants to operate by setting the
 FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or
 32):

  struct futexX_numa {
          __uX value;
          __sX hint;
  };

 A pointer to this structure should be passed as the `void *uaddr` argument of
 the futex functions. The address of the structure is the address that is
 waited on and woken, and `value` is compared to `val` as usual. The `hint`
 member defines which node the futex will use. When waiting, the futex is
 registered in a kernel-side table stored on that node; when waking, the futex
 is searched for in that same table. There is no redundancy between tables, so
 a wrong `hint` value will lead to undesired behavior. Userspace is responsible
 for dealing with any node migration issues that may occur. `hint` can be a
 value in [0, MAX_NUMA_NODES] to specify a node, or -1 to use the node the
 current process is running on.

 When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a
 global table on some node, defined at compilation time.
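
 For the 32-bit case, the layout above could be spelled out as in the sketch
 below. It uses <stdint.h> types in place of __u32/__s32, and the initializer
 helper is a hypothetical convenience, not part of the patch. The static
 assertions capture the property the text relies on: `value` sits at offset 0,
 so the address of the structure is also the address of the futex word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit instance of struct futexX_numa as described in the text. */
struct futex32_numa {
	uint32_t value;	/* futex word, compared against val */
	int32_t hint;	/* NUMA node, or -1 for the current node */
};

/* The futex word must be the first member of the structure. */
_Static_assert(offsetof(struct futex32_numa, value) == 0,
	       "value must be at offset 0");
_Static_assert(sizeof(struct futex32_numa) == 8,
	       "unexpected padding in futex32_numa");

/* Hypothetical helper: build a NUMA futex with a given node hint. */
static struct futex32_numa futex32_numa_init(uint32_t value, int32_t hint)
{
	assert(hint >= -1);	/* -1 selects the current node */
	return (struct futex32_numa){ .value = value, .hint = hint };
}
```

 Waiters and wakers must agree on `hint`, since the futex is only looked up
 in the table of the node it names.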

** The `timo` argument

As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout
options known to be buggy. Given that, `timo` should be a 64-bit timeout on
all platforms, expressed as an absolute value.
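
Because the timeout is absolute, a caller that wants to wait "for N
nanoseconds" first has to add N to the current clock reading. The sketch below
shows one way to do that for the default monotonic clock. `struct timeout64`
is a hypothetical stand-in with two 64-bit fields, used here only to keep the
example self-contained; it is not the kernel's `struct __kernel_timespec`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical 64-bit timeout value: seconds plus nanoseconds. */
struct timeout64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Add a relative delay in nanoseconds to a normalized base time. */
static struct timeout64 timeout64_add_ns(struct timeout64 base, int64_t rel_ns)
{
	struct timeout64 out;

	assert(rel_ns >= 0);
	assert(base.tv_nsec >= 0 && base.tv_nsec < NSEC_PER_SEC);

	out.tv_sec = base.tv_sec + rel_ns / NSEC_PER_SEC;
	out.tv_nsec = base.tv_nsec + rel_ns % NSEC_PER_SEC;
	if (out.tv_nsec >= NSEC_PER_SEC) {	/* carry into seconds */
		out.tv_sec++;
		out.tv_nsec -= NSEC_PER_SEC;
	}
	return out;
}

/* Absolute deadline rel_ns from now, measured on CLOCK_MONOTONIC. */
static struct timeout64 abs_timeout_monotonic(int64_t rel_ns)
{
	struct timespec now;
	int rc = clock_gettime(CLOCK_MONOTONIC, &now);

	assert(rc == 0);
	(void)rc;

	struct timeout64 base = { now.tv_sec, now.tv_nsec };
	return timeout64_add_ns(base, rel_ns);
}
```

A deadline for the realtime clock would be computed the same way from
CLOCK_REALTIME.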

Signed-off-by: André Almeida <andrealmeid@collabora.com>

Rebased-by: Joshua Ashton <joshua@froggi.es>
andrealmeid authored and xanmod committed Jun 29, 2021
1 parent 74c1627 commit 4f95b45
Showing 15 changed files with 671 additions and 4 deletions.
2 changes: 1 addition & 1 deletion MAINTAINERS
@@ -7529,7 +7529,7 @@ F: Documentation/locking/*futex*
 F: include/asm-generic/futex.h
 F: include/linux/futex.h
 F: include/uapi/linux/futex.h
-F: kernel/futex.c
+F: kernel/futex*
 F: tools/perf/bench/futex*
 F: tools/testing/selftests/futex/

2 changes: 2 additions & 0 deletions arch/arm/tools/syscall.tbl
@@ -460,3 +460,5 @@
 444 common landlock_create_ruleset sys_landlock_create_ruleset
 445 common landlock_add_rule sys_landlock_add_rule
 446 common landlock_restrict_self sys_landlock_restrict_self
+447 common futex_wait sys_futex_wait
+448 common futex_wake sys_futex_wake
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 447
+#define __NR_compat_syscalls 449
 #endif

 #define __ARCH_WANT_SYS_CLONE
4 changes: 4 additions & 0 deletions arch/arm64/include/asm/unistd32.h
@@ -900,6 +900,10 @@ __SYSCALL(__NR_landlock_create_ruleset, sys_landlock_create_ruleset)
 __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
 #define __NR_landlock_restrict_self 446
 __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
+#define __NR_futex_wait 447
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
+#define __NR_futex_wake 448
+__SYSCALL(__NR_futex_wake, sys_futex_wake)

 /*
  * Please add new compat syscalls above this comment and update
2 changes: 2 additions & 0 deletions arch/x86/entry/syscalls/syscall_32.tbl
@@ -451,3 +451,5 @@
 444 i386 landlock_create_ruleset sys_landlock_create_ruleset
 445 i386 landlock_add_rule sys_landlock_add_rule
 446 i386 landlock_restrict_self sys_landlock_restrict_self
+447 i386 futex_wait sys_futex_wait
+448 i386 futex_wake sys_futex_wake
2 changes: 2 additions & 0 deletions arch/x86/entry/syscalls/syscall_64.tbl
@@ -368,6 +368,8 @@
 444 common landlock_create_ruleset sys_landlock_create_ruleset
 445 common landlock_add_rule sys_landlock_add_rule
 446 common landlock_restrict_self sys_landlock_restrict_self
+447 common futex_wait sys_futex_wait
+448 common futex_wake sys_futex_wake

 #
 # Due to a historical design error, certain syscalls are numbered differently
7 changes: 7 additions & 0 deletions include/linux/syscalls.h
@@ -623,6 +623,13 @@ asmlinkage long sys_get_robust_list(int pid,
 asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
 				    size_t len);

+/* kernel/futex2.c */
+asmlinkage long sys_futex_wait(void __user *uaddr, unsigned int val,
+			       unsigned int flags,
+			       struct __kernel_timespec __user *timo);
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned int nr_wake,
+			       unsigned int flags);
+
 /* kernel/hrtimer.c */
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
8 changes: 7 additions & 1 deletion include/uapi/asm-generic/unistd.h
@@ -872,8 +872,14 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
 #define __NR_landlock_restrict_self 446
 __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)

+#define __NR_futex_wait 447
+__SYSCALL(__NR_futex_wait, sys_futex_wait)
+
+#define __NR_futex_wake 448
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
+
 #undef __NR_syscalls
-#define __NR_syscalls 447
+#define __NR_syscalls 449

 /*
  * 32 bit systems traditionally used different
5 changes: 5 additions & 0 deletions include/uapi/linux/futex.h
@@ -41,6 +41,11 @@
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
 				      FUTEX_PRIVATE_FLAG)

+/* Size argument to futex2 syscall */
+#define FUTEX_32 2
+
+#define FUTEX_SIZE_MASK 0x3
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
7 changes: 7 additions & 0 deletions init/Kconfig
@@ -1566,6 +1566,13 @@ config FUTEX
 	  support for "fast userspace mutexes". The resulting kernel may not
 	  run glibc-based applications correctly.

+config FUTEX2
+	bool "Enable futex2 support" if EXPERT
+	depends on FUTEX
+	default y
+	help
+	  Support for futex2 interface.
+
 config FUTEX_PI
 	bool
 	depends on FUTEX && RT_MUTEXES
1 change: 1 addition & 0 deletions kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_FUTEX) += futex.o
+obj-$(CONFIG_FUTEX2) += futex2.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
 obj-$(CONFIG_SMP) += smp.o
 ifneq ($(CONFIG_SMP),y)
