vpci: use per-domain PCI lock to protect vpci structure
Use the per-domain PCI read/write lock to protect the presence of the
PCI device's vpci field. This lock can be used (and in a few cases is
used right away) so that vpci removal can be performed while holding
the lock in write mode. Previously such removal could race with, for
example, vpci_read.
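
For illustration only (a sketch, not code from this patch), the removal
side then takes the shape:

    /* Sketch: the write lock excludes concurrent readers of pdev->vpci. */
    write_lock(&d->pci_lock);
    vpci_remove_device(pdev);   /* frees pdev->vpci */
    write_unlock(&d->pci_lock);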

When taking both d->pci_lock and pdev->vpci->lock, they must be taken
in this exact order: d->pci_lock first, then pdev->vpci->lock, to avoid
possible deadlocks.
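
A reader of pdev->vpci is then expected to follow this pattern (a
minimal sketch, assuming the device is looked up by SBDF):

    read_lock(&d->pci_lock);
    pdev = pci_get_pdev(d, sbdf);
    if ( pdev && pdev->vpci )
    {
        spin_lock(&pdev->vpci->lock);
        /* ... access pdev->vpci state ... */
        spin_unlock(&pdev->vpci->lock);
    }
    read_unlock(&d->pci_lock);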

1. The per-domain pci_lock is used to protect the pdev->vpci structure
from being removed.

2. Writing the command register and the ROM BAR register may trigger
modify_bars to run, which in turn may access multiple pdevs while
checking for overlaps with existing BARs. The overlap check, if done
under the read lock, would require vpci->lock to be acquired on both
devices being compared, which may produce a deadlock, and it is not
possible to upgrade a read lock to a write lock in such a case. So, in
order to prevent the deadlock, use d->pci_lock in write mode instead.

All other code, which does not lead to pdev->vpci destruction and does
not access multiple pdevs at the same time, can still use a
combination of the read lock and pdev->vpci->lock.
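
In sketch form (check_overlap is a hypothetical helper name, not code
from this patch), the write-mode path looks like:

    /*
     * With d->pci_lock held in write mode no other CPU can be inside
     * modify_bars() for this domain, so no per-device vpci->lock
     * nesting is needed while walking the device list.
     */
    write_lock(&pdev->domain->pci_lock);
    for_each_pdev ( pdev->domain, tmp )
        check_overlap(tmp, pdev);
    write_unlock(&pdev->domain->pci_lock);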

3. Drop the const qualifier where the new rwlock is used and doing so
is appropriate.

4. Do not call process_pending_softirqs with any locks held. To that
end, unlock prior to the call and re-acquire the locks afterwards.
After re-acquiring the lock there is no need to check if pdev->vpci
exists:
 - in apply_map, because of the context it is called in (no race
   condition is possible)
 - for the MSI/MSI-X debug code, because it is called at the end of the
   pdev->vpci access and no further access to pdev->vpci is made
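
Condensed from the vmsi.c hunk below, the debug-key path now follows
this pattern:

    spin_unlock(&pdev->vpci->lock);
    read_unlock(&pdev->domain->pci_lock);
    process_pending_softirqs();
    if ( !read_trylock(&pdev->domain->pci_lock) )
        return -EBUSY;
    if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
    {
        read_unlock(&pdev->domain->pci_lock);
        return -EBUSY;
    }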

5. Use d->pci_lock around for_each_pdev and pci_get_pdev() while
accessing pdevs in vPCI code.
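
E.g. (sketch only):

    read_lock(&d->pci_lock);
    for_each_pdev ( d, pdev )
    {
        /* pdev cannot be removed from d->pdev_list while the lock is held */
    }
    read_unlock(&d->pci_lock);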

6. Switch vPCI functions to use the per-domain pci_lock for ensuring
pdevs do not go away. The vPCI functions call several MSI-related
functions which already have existing non-vPCI callers. Change those
MSI-related functions to allow using either pcidevs_lock() or
d->pci_lock for ensuring pdevs do not go away. Holding d->pci_lock in
read mode is sufficient. Note that this pdev protection mechanism does
not protect other state or critical sections. These MSI-related
functions already have other race condition and state protection
mechanisms (e.g. d->event_lock and msixtbl RCU), so we deduce that the
use of the global pcidevs_lock() is to ensure that pdevs do not go
away.

7. Introduce a wrapper construct, pdev_list_is_read_locked(), for
checking that pdevs do not go away. The purpose of this wrapper is to
aid readability and to document the intent of the pdev protection
mechanism.
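
Its assertion form, ASSERT_PDEV_LIST_IS_READ_LOCKED(), used throughout
the hunks below, is assumed to be roughly the following (the exact
definition is outside the quoted hunks):

    #define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
        ASSERT(pcidevs_locked() || rw_is_locked(&(d)->pci_lock))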

8. Where possible, the existing non-vPCI callers of these MSI-related
functions have not been switched to use the newly introduced per-domain
pci_lock, and will continue to use the global pcidevs_lock(). This is
done to reduce the risk of the new locking scheme introducing
regressions. Those users will be adjusted in due time. One exception
is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
the caller, physdev_map_pirq(): this instance is switched to
read_lock(&d->pci_lock) right away.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Oleksandr Andrushchenko authored and jbeulich committed Feb 27, 2024
1 parent cfe3ad6 commit 4f78438
Showing 12 changed files with 174 additions and 41 deletions.
37 changes: 26 additions & 11 deletions xen/arch/x86/hvm/vmsi.c
@@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
     struct msixtbl_entry *entry, *new_entry;
     int r = -EINVAL;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(d);
     ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !msixtbl_initialised(d) )
@@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     struct pci_dev *pdev;
     struct msixtbl_entry *entry;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(d);
     ASSERT(rw_is_write_locked(&d->event_lock));
 
     if ( !msixtbl_initialised(d) )
@@ -684,7 +684,7 @@ static int vpci_msi_update(const struct pci_dev *pdev, uint32_t data,
 {
     unsigned int i;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
 
     if ( (address & MSI_ADDR_BASE_MASK) != MSI_ADDR_HEADER )
     {
@@ -725,8 +725,8 @@ void vpci_msi_arch_update(struct vpci_msi *msi, const struct pci_dev *pdev)
     int rc;
 
     ASSERT(msi->arch.pirq != INVALID_PIRQ);
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
 
-    pcidevs_lock();
     for ( i = 0; i < msi->vectors && msi->arch.bound; i++ )
     {
         struct xen_domctl_bind_pt_irq unbind = {
@@ -745,7 +745,6 @@ void vpci_msi_arch_update(struct vpci_msi *msi, const struct pci_dev *pdev)
 
     msi->arch.bound = !vpci_msi_update(pdev, msi->data, msi->address,
                                        msi->vectors, msi->arch.pirq, msi->mask);
-    pcidevs_unlock();
 }
 
 static int vpci_msi_enable(const struct pci_dev *pdev, unsigned int nr,
@@ -778,15 +777,14 @@ int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
     int rc;
 
     ASSERT(msi->arch.pirq == INVALID_PIRQ);
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
     rc = vpci_msi_enable(pdev, vectors, 0);
     if ( rc < 0 )
         return rc;
     msi->arch.pirq = rc;
 
-    pcidevs_lock();
     msi->arch.bound = !vpci_msi_update(pdev, msi->data, msi->address, vectors,
                                        msi->arch.pirq, msi->mask);
-    pcidevs_unlock();
 
     return 0;
 }
@@ -797,8 +795,8 @@ static void vpci_msi_disable(const struct pci_dev *pdev, int pirq,
     unsigned int i;
 
     ASSERT(pirq != INVALID_PIRQ);
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
 
-    pcidevs_lock();
     for ( i = 0; i < nr && bound; i++ )
     {
         struct xen_domctl_bind_pt_irq bind = {
@@ -814,7 +812,6 @@ static void vpci_msi_disable(const struct pci_dev *pdev, int pirq,
     write_lock(&pdev->domain->event_lock);
     unmap_domain_pirq(pdev->domain, pirq);
     write_unlock(&pdev->domain->event_lock);
-    pcidevs_unlock();
 }
 
 void vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev)
@@ -854,22 +851,21 @@ int vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
     int rc;
 
     ASSERT(entry->arch.pirq == INVALID_PIRQ);
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
     rc = vpci_msi_enable(pdev, vmsix_entry_nr(pdev->vpci->msix, entry),
                          table_base);
     if ( rc < 0 )
         return rc;
 
     entry->arch.pirq = rc;
 
-    pcidevs_lock();
     rc = vpci_msi_update(pdev, entry->data, entry->addr, 1, entry->arch.pirq,
                          entry->masked);
     if ( rc )
     {
         vpci_msi_disable(pdev, entry->arch.pirq, 1, false);
         entry->arch.pirq = INVALID_PIRQ;
     }
-    pcidevs_unlock();
 
     return rc;
 }
@@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
 {
     unsigned int i;
 
+    /*
+     * Assert that pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
+     * is not suitable here because it allows either pcidevs_lock() or
+     * pci_lock to be held, but here we rely on pci_lock being held, not
+     * pcidevs_lock() (see the transient lock dropping further down).
+     */
+    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
+    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
+
     for ( i = 0; i < msix->max_entries; i++ )
     {
         const struct vpci_msix_entry *entry = &msix->entries[i];
@@ -913,13 +918,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
             struct pci_dev *pdev = msix->pdev;
 
             spin_unlock(&msix->pdev->vpci->lock);
+            read_unlock(&pdev->domain->pci_lock);
             process_pending_softirqs();
+
+            if ( !read_trylock(&pdev->domain->pci_lock) )
+                return -EBUSY;
+
             /* NB: we assume that pdev cannot go away for an alive domain. */
             if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            {
+                read_unlock(&pdev->domain->pci_lock);
                 return -EBUSY;
+            }
+
             if ( pdev->vpci->msix != msix )
             {
                 spin_unlock(&pdev->vpci->lock);
+                read_unlock(&pdev->domain->pci_lock);
                 return -EAGAIN;
             }
         }
2 changes: 1 addition & 1 deletion xen/arch/x86/hvm/vmx/vmx.c
@@ -413,7 +413,7 @@ static int cf_check vmx_pi_update_irte(const struct vcpu *v,
 
     spin_unlock_irq(&desc->lock);
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
 
     return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
 
8 changes: 4 additions & 4 deletions xen/arch/x86/irq.c
@@ -2162,7 +2162,7 @@ int map_domain_pirq(
     struct pci_dev *pdev;
     unsigned int nr = 0;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(d);
 
     ret = -ENODEV;
     if ( !cpu_has_apic )
@@ -2319,7 +2319,7 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     if ( (pirq < 0) || (pirq >= d->nr_pirqs) )
         return -EINVAL;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(d);
     ASSERT(rw_is_write_locked(&d->event_lock));
 
     info = pirq_info(d, pirq);
@@ -2884,6 +2884,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
 {
     int irq, pirq, ret;
 
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(d);
+
     switch ( type )
     {
     case MAP_PIRQ_TYPE_MSI:
@@ -2913,7 +2915,6 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
 
     msi->irq = irq;
 
-    pcidevs_lock();
     /* Verify or get pirq. */
     write_lock(&d->event_lock);
     pirq = allocate_pirq(d, index, *pirq_p, irq, type, &msi->entry_nr);
@@ -2929,7 +2930,6 @@ int allocate_and_map_msi_pirq(struct domain *d, int index, int *pirq_p,
 
  done:
     write_unlock(&d->event_lock);
-    pcidevs_unlock();
     if ( ret )
     {
         switch ( type )
20 changes: 13 additions & 7 deletions xen/arch/x86/msi.c
@@ -602,7 +602,7 @@ static int msi_capability_init(struct pci_dev *dev,
     unsigned int i, mpos;
     uint16_t control;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(dev->domain);
     pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
     if ( !pos )
         return -ENODEV;
@@ -771,7 +771,7 @@ static int msix_capability_init(struct pci_dev *dev,
     if ( !pos )
         return -ENODEV;
 
-    ASSERT(pcidevs_locked());
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(dev->domain);
 
     control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
     /*
@@ -988,11 +988,11 @@ static int __pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
 {
     struct msi_desc *old_desc;
 
-    ASSERT(pcidevs_locked());
-
     if ( !pdev )
         return -ENODEV;
 
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
+
     old_desc = find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSI);
     if ( old_desc )
     {
@@ -1043,9 +1043,12 @@ static int __pci_enable_msix(struct pci_dev *pdev, struct msi_info *msi,
 {
     struct msi_desc *old_desc;
 
-    ASSERT(pcidevs_locked());
+    if ( !pdev )
+        return -ENODEV;
 
-    if ( !pdev || !pdev->msix )
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
+
+    if ( !pdev->msix )
         return -ENODEV;
 
     if ( msi->entry_nr >= pdev->msix->nr_entries )
@@ -1154,7 +1157,10 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off)
 int pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
                    struct msi_desc **desc)
 {
-    ASSERT(pcidevs_locked());
+    if ( !pdev )
+        return -ENODEV;
+
+    ASSERT_PDEV_LIST_IS_READ_LOCKED(pdev->domain);
 
     if ( !use_msi )
         return -EPERM;
2 changes: 2 additions & 0 deletions xen/arch/x86/physdev.c
@@ -123,7 +123,9 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 
     case MAP_PIRQ_TYPE_MSI:
     case MAP_PIRQ_TYPE_MULTI_MSI:
+        read_lock(&d->pci_lock);
         ret = allocate_and_map_msi_pirq(d, *index, pirq_p, type, msi);
+        read_unlock(&d->pci_lock);
         break;
 
     default:
9 changes: 5 additions & 4 deletions xen/drivers/passthrough/pci.c
@@ -750,7 +750,6 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         pdev->domain = hardware_domain;
         write_lock(&hardware_domain->pci_lock);
         list_add(&pdev->domain_list, &hardware_domain->pdev_list);
-        write_unlock(&hardware_domain->pci_lock);
 
         /*
          * For devices not discovered by Xen during boot, add vPCI handlers
@@ -759,18 +758,18 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         ret = vpci_add_handlers(pdev);
         if ( ret )
         {
-            printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
-            write_lock(&hardware_domain->pci_lock);
             list_del(&pdev->domain_list);
             write_unlock(&hardware_domain->pci_lock);
             pdev->domain = NULL;
+            printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
             goto out;
         }
+        write_unlock(&hardware_domain->pci_lock);
         ret = iommu_add_device(pdev);
         if ( ret )
         {
-            vpci_remove_device(pdev);
             write_lock(&hardware_domain->pci_lock);
+            vpci_remove_device(pdev);
             list_del(&pdev->domain_list);
             write_unlock(&hardware_domain->pci_lock);
             pdev->domain = NULL;
@@ -1146,7 +1145,9 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
     } while ( devfn != pdev->devfn &&
               PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
 
+    write_lock(&ctxt->d->pci_lock);
     err = vpci_add_handlers(pdev);
+    write_unlock(&ctxt->d->pci_lock);
     if ( err )
         printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
                ctxt->d->domain_id, err);
18 changes: 18 additions & 0 deletions xen/drivers/vpci/header.c
@@ -173,6 +173,7 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;
 
+        write_lock(&v->domain->pci_lock);
         spin_lock(&v->vpci.pdev->vpci->lock);
         /* Disable memory decoding unconditionally on failure. */
         modify_decoding(v->vpci.pdev,
@@ -191,6 +192,7 @@ bool vpci_process_pending(struct vcpu *v)
          * failure.
          */
         vpci_remove_device(v->vpci.pdev);
+        write_unlock(&v->domain->pci_lock);
     }
 
     return false;
@@ -202,8 +204,20 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
     struct map_data data = { .d = d, .map = true };
     int rc;
 
+    ASSERT(rw_is_write_locked(&d->pci_lock));
+
     while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
+    {
+        /*
+         * It's safe to drop and reacquire the lock in this context
+         * without risking pdev disappearing because devices cannot be
+         * removed until the initial domain has been started.
+         */
+        write_unlock(&d->pci_lock);
         process_pending_softirqs();
+        write_lock(&d->pci_lock);
+    }
+
     rangeset_destroy(mem);
     if ( !rc )
         modify_decoding(pdev, cmd, false);
@@ -244,6 +258,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
     unsigned int i;
     int rc;
 
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
     if ( !mem )
         return -ENOMEM;
 
@@ -524,6 +540,8 @@ static int cf_check init_header(struct pci_dev *pdev)
     int rc;
     bool mask_cap_list = false;
 
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_NORMAL:
