Skip to content

Commit

Permalink
ath9k: Fix potential interrupt storm on queue reset
Browse files Browse the repository at this point in the history
In tests with two Lima boards from 8devices (QCA4531 based) on OpenWrt
19.07 we could force a silent restart of a device with no serial
output when we were sending a high amount of UDP traffic (iperf3 at 80
MBit/s in both directions from external hosts, saturating the wifi and
causing a load of about 4.5 to 6) and were then triggering an
ath9k_queue_reset().

Further debugging showed that the restart was caused by the ath79
watchdog. With disabled watchdog we could observe that the device was
constantly going into ath_isr() interrupt handler and was returning
early after the ATH_OP_HW_RESET flag test, without clearing any
interrupts. Even though ath9k_queue_reset() calls
ath9k_hw_kill_interrupts().

With JTAG we could observe the following race condition:

1) ath9k_queue_reset()
   ...
   -> ath9k_hw_kill_interrupts()
   -> set_bit(ATH_OP_HW_RESET, &common->op_flags);
   ...
   <- returns

      2) ath9k_tasklet()
         ...
         -> ath9k_hw_resume_interrupts()
         ...
         <- returns

                 3) loops around:
                    ...
                    handle_int()
                    -> ath_isr()
                       ...
                       -> if (test_bit(ATH_OP_HW_RESET,
                                       &common->op_flags))
                            return IRQ_HANDLED;

                    x) ath_reset_internal():
                       => never reached <=

And in ath_isr() we would typically see the following interrupts /
interrupt causes:

* status: 0x00111030 or 0x00110030
* async_cause: 2 (AR_INTR_MAC_IPQ)
* sync_cause: 0

So the ath9k_tasklet() reenables the ath9k interrupts
through ath9k_hw_resume_interrupts() which ath9k_queue_reset() had just
disabled. And ath_isr() then keeps firing because it returns IRQ_HANDLED
without actually clearing the interrupt.

To fix this IRQ storm also clear/disable the interrupts again when we
are in reset state.

Cc: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Linus Lüssing <linus.luessing@c0d3.blue>
Fixes: 872b5d8 ("ath9k: do not access hardware on IRQs during reset")
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210914192515.9273-3-linus.luessing@c0d3.blue
  • Loading branch information
T-X authored and Kalle Valo committed Oct 5, 2021
1 parent 053f985 commit 4925642
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion drivers/net/wireless/ath/ath9k/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -533,8 +533,10 @@ irqreturn_t ath_isr(int irq, void *dev)
ath9k_debug_sync_cause(sc, sync_cause);
status &= ah->imask; /* discard unasked-for bits */

if (test_bit(ATH_OP_HW_RESET, &common->op_flags))
if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) {
ath9k_hw_kill_interrupts(sc->sc_ah);
return IRQ_HANDLED;
}

/*
* If there are no status bits set, then this interrupt was not
Expand Down

0 comments on commit 4925642

Please sign in to comment.