specs: add VMClock specification by brauner · Pull Request #199 · uapi-group/specifications

brauner · 2025-12-15T15:14:08Z

This was written by David Woodhouse <dwmw@amazon.co.uk>. As discussed, this will become part of the uapi specs.

VMClock: Efficient time synchronisation for virtual machines

The requirements for accurate synchronisation of application clocks against
real wallclock time are becoming ever more demanding. Increasingly cloud
providers are exposing precision clock devices to virtual machines to allow the
guest operating systems to synchronise their clocks.

Time on modern systems is typically derived from a CPU-internal counter (TSC,
timebase, arch counter) which runs at a nominally constant frequency, usually
between 1 GHz and 4 GHz. In practice, the frequency of the underlying
hardware counter will vary with environmental conditions, with a tolerance on
the order of ±50 PPM. It is this variance which must constantly be corrected by
synchronising against an external clock.
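To put the ±50 PPM figure in perspective, the following is a minimal arithmetic sketch of how much error an uncorrected counter could accumulate. The function name `worst_case_drift_seconds` is hypothetical and is not part of the specification.

```python
def worst_case_drift_seconds(tolerance_ppm: float, elapsed_seconds: float) -> float:
    """Upper bound on accumulated clock error for a counter whose frequency
    is off by at most `tolerance_ppm` parts per million (illustrative only)."""
    return tolerance_ppm * 1e-6 * elapsed_seconds


# 50 PPM of error amounts to 50 microseconds per second,
# or roughly 4.32 seconds over one day without correction.
per_second = worst_case_drift_seconds(50, 1)
per_day = worst_case_drift_seconds(50, 24 * 60 * 60)
```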

Synchronisation against an external clock typically works by reading the CPU
counter, then reading the external clock, and finally reading the CPU counter
again — then assuming that the external clock reading was concurrent with a
point in time between the two CPU counter readings to give a pair of { CPU
counter, real time } values. Successive such readings are used to calibrate the
precise rate at which the CPU counter is running, in order to use it for
precision timekeeping.
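The read sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used by any particular implementation: `Sample`, `take_sample` and `estimate_frequency` are hypothetical names, the two reader callables stand in for the CPU counter and the external clock, and the midpoint is only one possible choice for the "point in time between the two CPU counter readings".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    counter: int      # midpoint of the two counter reads (assumed pairing point)
    real_time: float  # external clock reading, in seconds
    latency: int      # counter ticks between the first and last read


def take_sample(read_counter: Callable[[], int],
                read_external_clock: Callable[[], float]) -> Sample:
    """Read the counter, then the external clock, then the counter again."""
    before = read_counter()
    real_time = read_external_clock()
    after = read_counter()
    return Sample(counter=(before + after) // 2,
                  real_time=real_time,
                  latency=after - before)


def estimate_frequency(first: Sample, second: Sample) -> float:
    """Counter ticks per second of real time between two successive samples."""
    return (second.counter - first.counter) / (second.real_time - first.real_time)
```

As a rough stand-in on an ordinary system, `time.perf_counter_ns` could play the role of the counter and `time.time` the role of the external clock. The `latency` field bounds how far the pairing can be off, which is why a short read window matters.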

When applied at scale to virtual machines, there are a number of problems with
this approach. Firstly, where virtual CPUs are overcommitted across a smaller
number of physical CPUs in a host, guests experience "steal time" — time when
their vCPU is not actually running. That steal time is unpredictable and can
occur in the critical period between one read of the CPU counter and the next,
affecting the precision of the estimated reading.

A remedy for this issue is to repeat the reading a number of times, and to use
the result where the latency between first and last CPU counter reading is the
lowest. This exacerbates the second problem: a large number of separate
guest operating systems on the same host are now repeating the same work of
calibrating the same underlying hardware oscillator.
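A minimal sketch of that remedy, assuming the same read-counter, read-clock, read-counter sequence: repeat it several times and keep the attempt with the smallest window. The name `best_sample` and the default of eight attempts are illustrative choices, not values taken from the specification.

```python
from typing import Callable, Tuple


def best_sample(read_counter: Callable[[], int],
                read_external_clock: Callable[[], float],
                attempts: int = 8) -> Tuple[int, float, int]:
    """Return (counter_midpoint, real_time, latency) for the attempt whose
    two counter reads were closest together."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    best = None
    for _ in range(attempts):
        before = read_counter()
        real_time = read_external_clock()
        after = read_counter()
        latency = after - before
        # An attempt interrupted by steal time shows up as a large latency
        # and loses out to an undisturbed attempt.
        if best is None or latency < best[2]:
            best = ((before + after) // 2, real_time, latency)
    return best
```

Every additional attempt is another access to the external clock, so the cost grows with both the number of attempts and the number of guests running the same loop.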

The third major problem of guest-calibrated time is Live Migration, in which a
guest is transparently moved from one host to another for maintenance reasons.
When this happens, the guest can experience a step change in both the frequency
and the value of the CPU counter. The frequency changes because the migrated
guest is now using a different underlying counter, and the value changes because
correctly setting the counter value seen by the guest depends on the time
synchronisation of each hypervisor host. After a Live Migration, a guest's
clock should be considered inaccurate until it has been resynchronised from
scratch. Failure to do so can lead to data corruption, in cases where database
coherency depends on accurately timestamped transactions.

bluca · 2025-12-15T15:18:31Z

- version needs to be X.Y, i.e. 1.0, as per the README
- needs a UAPI.XX prefix like the other specs
- header needs aliases lines like the other specs (e.g. https://raw.githubusercontent.com/uapi-group/specifications/refs/heads/main/specs/boot_loader_specification.md )
- needs a changelog table in the first paragraph, as the other specs have:

| Version | Changes |
|---------|---------|
| 1.0     | Initial Release |

dwmw2 · 2025-12-15T15:23:18Z

Is the update to the top-level README.md automatic?

bluca · 2025-12-15T15:31:57Z

No, that needs to be updated too. Good catch.

dwmw2 · 2025-12-15T16:09:35Z

Also happy to have review on the actual content too. This is an almost-final draft of the doc we'd been preparing internally, to go along with the existing implementations in QEMU and the Linux kernel, with the recent updates to add the generation support:

QEMU: [RFC PATCH 0/4] vmclock: add support for VM generation counter and notifications
Linux: [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification

brauner · 2025-12-15T16:15:31Z

> Also happy to have review on the actual content too. This is an almost-final draft of the doc we'd been preparing internally, to go along with the existing implementations in QEMU and the Linux kernel, with the recent updates to add the generation support:
>
> QEMU: [RFC PATCH 0/4] vmclock: add support for VM generation counter and notifications
>
> Linux: [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification

Very nice. We should add these in a follow-up PR. I don't mind merging that before it's in. But let me know what you think.

dwmw2 · 2025-12-15T16:16:26Z

Those changes to the spec are already included.

poettering · 2025-12-15T17:04:51Z

specs/vmclock.md

@@ -0,0 +1,316 @@
+---
+title: VMClock


please assign UAPI.13 here.

see how this is done elsewhere

Sorry, I had a very outdated version of the repo and that's why all the new bits had been missing because I didn't see them in the existing versions of the specs.

poettering · 2025-12-16T08:54:25Z

LGTM. Just the link to the virtio rtc spec should be fixed.

specs/vmclock.md

behrmann · 2025-12-16T08:59:27Z

specs/vmclock.md

+| 0x50 | `uint64_t time_frac_sec` | Fractional part of reference time, in units of second / 2⁶⁴. |
+| 0x58 | `uint64_t time_esterror_nanosec` | Estimated ± error of the time given in `time_sec` + `time_frac_sec`, in nanoseconds |
+| 0x60 | `uint64_t time_maxerror_nanosec` | Maximum ± error of the time given in `time_sec` + `time_frac_sec`, in nanoseconds |
+| 0x64 | `uint64_t vm_generation_count` | A change in this field indicates that the guest has been loaded from a snapshot. In addition to handling a disruption in time (which will also be signalled through the `disruption_marker` field), a guest may wish to discard UUIDs, reset network connections or reseed entropy, etc. |


> which will also be signalled through the `disruption_marker` field

Is this a must, or how is this ensured?

specs/vmclock.md

This was written by David Woodhouse <dwmw@amazon.co.uk>. As discussed this will become part of the uapi specs. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>. Signed-off-by: Christian Brauner <brauner@kernel.org>

poettering reviewed Dec 15, 2025

brauner force-pushed the main branch from a871cb8 to 59ce984 on December 15, 2025 21:40

bluca approved these changes Dec 15, 2025

brauner requested a review from poettering December 15, 2025 21:44

behrmann reviewed Dec 16, 2025

specs: add VMClock specification

ca417e4

brauner force-pushed the main branch from e2c16c7 to ca417e4 on December 16, 2025 10:17

brauner merged commit 4ecabc5 into uapi-group:main Dec 16, 2025
1 check passed

cgwalters mentioned this pull request Feb 3, 2026

slightly more formality for merges? #205

Open
