Skip to content

Commit

Permalink
iommu/amd: Recover from event log overflow
Browse files Browse the repository at this point in the history
commit 5ce97f4 upstream.

The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

	EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
	event log overflow has occurred. This bit is set when a new
	event is to be written to the event log and there is no usable
	entry in the event log, causing the new event information to
	be discarded. An interrupt is generated when EventOverflow = 1b
	and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
	entries are written while this bit is set. Software Note: To
	resume logging, clear EventOverflow (W1C), and write a 1 to
	MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging
of EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing an 1 into this
field, or the IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/YVrSXEdW2rzEfOvk@wantstofly.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  • Loading branch information
buytenh authored and gregkh committed Mar 8, 2022
1 parent 2aaa085 commit a8a8663
Show file tree
Hide file tree
Showing 4 changed files with 20 additions and 2 deletions.
1 change: 1 addition & 0 deletions drivers/iommu/amd/amd_iommu.h
Expand Up @@ -14,6 +14,7 @@
extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
extern void amd_iommu_apply_erratum_63(u16 devid);
extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
extern int amd_iommu_init_devices(void);
extern void amd_iommu_uninit_devices(void);
Expand Down
1 change: 1 addition & 0 deletions drivers/iommu/amd/amd_iommu_types.h
Expand Up @@ -110,6 +110,7 @@
#define PASID_MASK 0x0000ffff

/* MMIO status bits */
#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK (1 << 0)
#define MMIO_STATUS_EVT_INT_MASK (1 << 1)
#define MMIO_STATUS_COM_WAIT_INT_MASK (1 << 2)
#define MMIO_STATUS_PPR_INT_MASK (1 << 6)
Expand Down
10 changes: 10 additions & 0 deletions drivers/iommu/amd/init.c
Expand Up @@ -655,6 +655,16 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
return iommu->cmd_buf ? 0 : -ENOMEM;
}

/*
* This function restarts event logging in case the IOMMU experienced
* an event log buffer overflow.
*/
void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
{
iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
}

/*
* This function resets the command buffer if the IOMMU stopped fetching
* commands from it.
Expand Down
10 changes: 8 additions & 2 deletions drivers/iommu/amd/iommu.c
Expand Up @@ -742,7 +742,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
#endif /* !CONFIG_IRQ_REMAP */

#define AMD_IOMMU_INT_MASK \
(MMIO_STATUS_EVT_INT_MASK | \
(MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
MMIO_STATUS_EVT_INT_MASK | \
MMIO_STATUS_PPR_INT_MASK | \
MMIO_STATUS_GALOG_INT_MASK)

Expand All @@ -752,7 +753,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

while (status & AMD_IOMMU_INT_MASK) {
/* Enable EVT and PPR and GA interrupts again */
/* Enable interrupt sources again */
writel(AMD_IOMMU_INT_MASK,
iommu->mmio_base + MMIO_STATUS_OFFSET);

Expand All @@ -773,6 +774,11 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
}
#endif

if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
pr_info_ratelimited("IOMMU event log overflow\n");
amd_iommu_restart_event_logging(iommu);
}

/*
* Hardware bug: ERBT1312
* When re-enabling interrupt (by writing 1
Expand Down

0 comments on commit a8a8663

Please sign in to comment.