47 changes: 47 additions & 0 deletions xapi/futures/xenopsd_events.md
@@ -0,0 +1,47 @@
---
layout: default
title: Process events from xenopsd in a timely manner
design_doc: true
status: proposed
revision: 1
---

# Background

During a bootstorm there is a significant delay between a VM being unpaused and
XAPI reporting it as started.
The VM may already be able to send UDP packets while XAPI still reports it as not started for minutes.

XAPI currently processes all events from xenopsd in a single thread, so the unpause
events get queued up behind a lot of other events generated by the already
running VMs.

Review comment (Contributor): I assume the relevant code that processes events from xenopsd in xapi is somewhere here: https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/xapi_xenops.ml#L1204

Reply (Contributor Author): The event loop for xenopsd->xapi is here in `events_watch`: https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/xapi_xenops.ml#L1919, and it calls the module you mentioned.

We need to ensure that unpause events from xenopsd get processed in a timely
manner, even if XAPI is busy processing other events.

# Timely processing of events

If we process the events in a round-robin fashion, `unpause` events are reported promptly.
At the same time, we need to ensure that events operating on the same VM are not processed in parallel.
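
To illustrate why the processing order matters, here is a minimal Python sketch that compares a single FIFO queue with per-VM round-robin processing. It is only a model, not XAPI or xenopsd code: the event tuples, the VM names, and the workload size are hypothetical, and it assumes all events are already queued when processing starts.

```python
from collections import OrderedDict, deque


def fifo_order(events):
    """Process events strictly in arrival order (one shared queue)."""
    return list(events)


def round_robin_order(events):
    """Process one event per VM in turn, preserving each VM's own order."""
    queues = OrderedDict()  # VM name -> pending events for that VM
    for vm, name in events:
        queues.setdefault(vm, deque()).append((vm, name))
    order = []
    while queues:
        for vm in list(queues):
            order.append(queues[vm].popleft())
            if not queues[vm]:
                del queues[vm]
    return order


# Hypothetical workload: 3 running VMs with 100 events each,
# followed by a single unpause event for a newly started VM.
events = [(f"vm{i}", f"update{n}") for n in range(100) for i in range(3)]
events.append(("new-vm", "unpause"))

unpause = ("new-vm", "unpause")
fifo_pos = fifo_order(events).index(unpause)        # 300 events are handled first
rr_pos = round_robin_order(events).index(unpause)   # only 3 events are handled first
```

In this model the unpause event waits behind every earlier event under FIFO, but behind at most one event per other VM under round-robin.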

Xenopsd already has code that does exactly this. The purpose of the [xapi-work-queues refactoring PR](https://github.com/xapi-project/xenopsd/pull/337) is to
reuse this code in XAPI by creating a package shared between xenopsd and xapi: `xapi-work-queues`.

# xapi-work-queues

From the documentation of the new [Worker Pool interface](https://edwintorok.github.io/xapi-work-queues/Xapi_work_queues.html):

A worker pool has a limited number of worker threads.
Each worker pops one tagged item from the queue in a round-robin fashion.
While the item is executed the tag temporarily doesn't participate in round-robin scheduling.
If during execution more items get queued with the same tag they get redirected to a private queue.
Once the item finishes execution the tag will participate in RR scheduling again.

This ensures that items with the same tag do not get executed in parallel,
and that a tag with a lot of items does not starve the execution of other tags.
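
The scheduling rules quoted above can be modelled with a small, single-threaded Python sketch. This is not the `xapi-work-queues` API: the class name `TaggedRoundRobinQueue` and its `push` / `pop` / `finish` methods are hypothetical, worker threads are replaced by explicit calls, and it assumes that items already queued for a tag wait in that tag's private queue while one of its items is executing.

```python
from collections import OrderedDict, deque


class TaggedRoundRobinQueue:
    """Toy model of tagged round-robin scheduling (hypothetical, not the real API)."""

    def __init__(self):
        self.ready = OrderedDict()  # tag -> deque of items, in rotation order
        self.private = {}           # tag -> deque of items, for tags being executed

    def push(self, tag, item):
        if tag in self.private:
            # The tag is executing: redirect the item to its private queue.
            self.private[tag].append(item)
        else:
            self.ready.setdefault(tag, deque()).append(item)

    def pop(self):
        """Return the next (tag, item), or None if no tag is schedulable."""
        if not self.ready:
            return None
        tag, items = self.ready.popitem(last=False)  # tag at the front of the rotation
        item = items.popleft()
        self.private[tag] = items  # the tag leaves the rotation until finish()
        return tag, item

    def finish(self, tag):
        """Mark the tag's item as done; the tag rejoins the back of the rotation."""
        items = self.private.pop(tag)
        if items:
            self.ready[tag] = items


q = TaggedRoundRobinQueue()
for n in range(3):
    q.push("vm-a", f"a{n}")
q.push("vm-b", "unpause")

first = q.pop()        # ("vm-a", "a0"); vm-a is now excluded from scheduling
second = q.pop()       # ("vm-b", "unpause") is served before a1
q.push("vm-a", "a3")   # redirected to vm-a's private queue
third = q.pop()        # None: both tags are currently executing
q.finish("vm-a")
fourth = q.pop()       # ("vm-a", "a1")
```

Because a tag is removed from the rotation while one of its items runs, two workers can never hold items with the same tag, and a tag with a long backlog yields to other tags after each item.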

The XAPI side of the changes will [look like this](https://github.com/edwintorok/xen-api/commit/b367bf86d3af4f773db9bf5d1500a4dec0f99bfa?diff=unified#diff-344dc1d17c4663add7fe5500813feef2).

Known limitations: the number of active per-VM events should be small. This is already ensured by the `push_with_coalesce` / `should_keep` code on the [xenopsd side](https://github.com/xapi-project/xenopsd/blob/master/lib/xenops_server.ml#L441). Events to XAPI from xenopsd should already arrive coalesced.