Skip to content

Commit

Permalink
arch: POSIX: Fix race with unused threads
Browse files Browse the repository at this point in the history
Fix a race which seems to have been presenting itself
very sporadically on loaded systems.
The race seems to have caused tests/kernel/sched/schedule_api
to fail at random on native_posix.

The case is a bit convoluted:
When the kernel calls z_new_thread(), the POSIX arch saves
the new thread entry call in that new Zephyr thread stack
together with a bit of extra info for the POSIX arch.
And spawns a new pthread (posix_thread_starter()) which
will eventually (after the Zephyr kernel swapped to it),
call that entry function.
(Note that in principle a thread spawned by pthreads may
be arbitrarily delayed)
The POSIX arch does not try to synchronize to that new
pthread (because why should it) until the first time the
Zephyr kernel tries to swap to that thread.
But, the kernel may never try to swap to it.
And therefore that thread's posix_thread_starter() may never
have got to run before the thread was aborted, and its
Zephyr stack reused for something else by the Zephyr app.

As posix_thread_starter() was relaying on looking into that
thread stack, it may now be looking into another thread stack
or anything else.

So, this commit fixes it by having posix_thread_starter()
get the input it always needs not from the Zephyr stack,
but from its own pthread_create() parameter pointing to a
structure kept by the POSIX arch.

Note that if the thread was aborted before reaching that point
posix_thread_starter() will NOT call the Zephyr thread entry
function, but just cleanup.

With this change all "asynchronous" parts of the POSIX arch
should relay only on the POSIX arch own structures.

Signed-off-by: Alberto Escolar Piedras <alpi@oticon.com>
  • Loading branch information
aescolar committed Jul 19, 2019
1 parent 1074efd commit fece690
Showing 1 changed file with 19 additions and 13 deletions.
32 changes: 19 additions & 13 deletions arch/posix/core/posix_core.c
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,8 @@ struct threads_table_el {
bool running; /* Is this the currently running thread */
pthread_t thread; /* Actual pthread_t as returned by native kernel */
int thead_cnt; /* For debugging: Unique, consecutive, thread number */
/* Pointer to the status kept in the Zephyr thread stack */
posix_thread_status_t *t_status;
};

static struct threads_table_el *threads_table;
Expand Down Expand Up @@ -260,11 +262,11 @@ static void posix_cleanup_handler(void *arg)
*/
static void *posix_thread_starter(void *arg)
{
posix_thread_status_t *ptr = (posix_thread_status_t *) arg;
int thread_idx = (intptr_t)arg;

PC_DEBUG("Thread [%i] %i: %s: Starting\n",
threads_table[ptr->thread_idx].thead_cnt,
ptr->thread_idx,
threads_table[thread_idx].thead_cnt,
thread_idx,
__func__);

/*
Expand All @@ -286,18 +288,20 @@ static void *posix_thread_starter(void *arg)
pthread_cleanup_push(posix_cleanup_handler, arg);

PC_DEBUG("Thread [%i] %i: %s: After start mutex (hav mut)\n",
threads_table[ptr->thread_idx].thead_cnt,
ptr->thread_idx,
threads_table[thread_idx].thead_cnt,
thread_idx,
__func__);

/*
* The thread would try to execute immediately, so we block it
* until allowed
*/
posix_wait_until_allowed(ptr->thread_idx);
posix_wait_until_allowed(thread_idx);

posix_new_thread_pre_start();

posix_thread_status_t *ptr = threads_table[thread_idx].t_status;

z_thread_entry(ptr->entry_point, ptr->arg1, ptr->arg2, ptr->arg3);

/*
Expand All @@ -306,13 +310,13 @@ static void *posix_thread_starter(void *arg)
*/
/* LCOV_EXCL_START */
posix_print_trace(PREFIX"Thread [%i] %i [%lu] ended!?!\n",
threads_table[ptr->thread_idx].thead_cnt,
ptr->thread_idx,
threads_table[thread_idx].thead_cnt,
thread_idx,
pthread_self());


threads_table[ptr->thread_idx].running = false;
threads_table[ptr->thread_idx].state = FAILED;
threads_table[thread_idx].running = false;
threads_table[thread_idx].state = FAILED;

pthread_cleanup_pop(1);

Expand Down Expand Up @@ -370,16 +374,18 @@ void posix_new_thread(posix_thread_status_t *ptr)
threads_table[t_slot].state = USED;
threads_table[t_slot].running = false;
threads_table[t_slot].thead_cnt = thread_create_count++;
threads_table[t_slot].t_status = ptr;
ptr->thread_idx = t_slot;

PC_SAFE_CALL(pthread_create(&threads_table[t_slot].thread,
NULL,
posix_thread_starter,
(void *)ptr));
(void *)(intptr_t)t_slot));

PC_DEBUG("created thread [%i] %i [%lu]\n",
PC_DEBUG("%s created thread [%i] %i [%lu]\n",
__func__,
threads_table[t_slot].thead_cnt,
ptr->thread_idx,
t_slot,
threads_table[t_slot].thread);

}
Expand Down

0 comments on commit fece690

Please sign in to comment.