Skip to content

Commit

Permalink
ring-buffer: Fix waking up ring buffer readers
Browse files Browse the repository at this point in the history
[ Upstream commit b359457 ]

A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.

The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.

If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.

Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.

This had a couple of bugs. One trivial one and one broken by design.

The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.

The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.

The loop itself is not needed, because all the callers to the
ring_buffer_wait() also has their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.

Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.

Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linke li <lilinke99@qq.com>
Cc: Rabin Vincent <rabin@rab.in>
Fixes: e30f53a ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 761d947 ("ring-buffer: Do not set shortest_full when full target is hit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
  • Loading branch information
rostedt authored and gregkh committed Apr 3, 2024
1 parent f81532b commit a8ffb67
Showing 1 changed file with 68 additions and 71 deletions.
139 changes: 68 additions & 71 deletions kernel/trace/ring_buffer.c
Expand Up @@ -412,7 +412,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
Expand Down Expand Up @@ -945,14 +944,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}

rbwork->wait_index++;
/* make sure the waiters see the new index */
smp_wmb();

/* This can be called in any context */
irq_work_queue(&rbwork->work);
}

static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
bool ret = false;

/* Reads of all CPUs always waits for any data */
if (cpu == RING_BUFFER_ALL_CPUS)
return !ring_buffer_empty(buffer);

cpu_buffer = buffer->buffers[cpu];

if (!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;

if (!full)
return true;

raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
ret = !pagebusy && full_hit(buffer, cpu, full);

if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
return ret;
}

/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
Expand All @@ -968,7 +993,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
long wait_index;
int ret = 0;

/*
Expand All @@ -987,81 +1011,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}

wait_index = READ_ONCE(work->wait_index);

while (true) {
if (full)
prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);

/*
* The events can happen in critical sections where
* checking a work queue can cause deadlocks.
* After adding a task to the queue, this flag is set
* only to notify events to try to wake up the queue
* using irq_work.
*
* We don't clear it even if the buffer is no longer
* empty. The flag only causes the next event to run
* irq_work to do the work queue wake up. The worse
* that can happen if we race with !trace_empty() is that
* an event will cause an irq_work to try to wake up
* an empty queue.
*
* There's no reason to protect this flag either, as
* the work queue and irq_work logic will do the necessary
* synchronization for the wake ups. The only thing
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
work->full_waiters_pending = true;
else
work->waiters_pending = true;

if (signal_pending(current)) {
ret = -EINTR;
break;
}

if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
break;

if (cpu != RING_BUFFER_ALL_CPUS &&
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
bool done;

if (!full)
break;

raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
done = !pagebusy && full_hit(buffer, cpu, full);
if (full)
prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);

if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
if (done)
break;
}
/*
* The events can happen in critical sections where
* checking a work queue can cause deadlocks.
* After adding a task to the queue, this flag is set
* only to notify events to try to wake up the queue
* using irq_work.
*
* We don't clear it even if the buffer is no longer
* empty. The flag only causes the next event to run
* irq_work to do the work queue wake up. The worse
* that can happen if we race with !trace_empty() is that
* an event will cause an irq_work to try to wake up
* an empty queue.
*
* There's no reason to protect this flag either, as
* the work queue and irq_work logic will do the necessary
* synchronization for the wake ups. The only thing
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
work->full_waiters_pending = true;
else
work->waiters_pending = true;

schedule();
if (rb_watermark_hit(buffer, cpu, full))
goto out;

/* Make sure to see the new wait index */
smp_rmb();
if (wait_index != work->wait_index)
break;
if (signal_pending(current)) {
ret = -EINTR;
goto out;
}

schedule();
out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);

if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
ret = -EINTR;

return ret;
}

Expand Down

0 comments on commit a8ffb67

Please sign in to comment.