Commit 461d2a5

Merge remote-tracking branch 'tomoyo/master'
2 parents 02aa95a + 122fa8c commit 461d2a5

180 files changed

+8437
-2750
lines changed

Documentation/virt/kvm/api.rst

Lines changed: 335 additions & 3 deletions
@@ -688,9 +688,14 @@ MSRs that have been set successfully.
688688
Defines the vcpu responses to the cpuid instruction. Applications
689689
should use the KVM_SET_CPUID2 ioctl if available.
690690

691-
Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID
692-
configuration (if there is) is not corrupted. Userspace can get a copy of the
693-
resulting CPUID configuration through KVM_GET_CPUID2 in case.
691+
Caveat emptor:
692+
- If this IOCTL fails, KVM gives no guarantees that previous valid CPUID
693+
configuration (if there is one) is not corrupted. Userspace can get a copy
694+
of the resulting CPUID configuration through KVM_GET_CPUID2 if needed.
695+
- Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
696+
after running the guest, may cause guest instability.
697+
- Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc.,
698+
may cause guest instability.
694699

695700
::
696701

@@ -5034,6 +5039,260 @@ see KVM_XEN_VCPU_SET_ATTR above.
50345039
The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
50355040
with the KVM_XEN_VCPU_GET_ATTR ioctl.
50365041

5042+
4.130 KVM_ARM_MTE_COPY_TAGS
5043+
---------------------------
5044+
5045+
:Capability: KVM_CAP_ARM_MTE
5046+
:Architectures: arm64
5047+
:Type: vm ioctl
5048+
:Parameters: struct kvm_arm_copy_mte_tags
5049+
:Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect
5050+
arguments, -EFAULT if memory cannot be accessed).
5051+
5052+
::
5053+
5054+
struct kvm_arm_copy_mte_tags {
5055+
__u64 guest_ipa;
5056+
__u64 length;
5057+
void __user *addr;
5058+
__u64 flags;
5059+
__u64 reserved[2];
5060+
};
5061+
5062+
Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
5063+
``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
5064+
field must point to a buffer which the tags will be copied to or from.
5065+
5066+
``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
5067+
``KVM_ARM_TAGS_FROM_GUEST``.
5068+
5069+
The size of the buffer to store the tags is ``(length / 16)`` bytes
5070+
(granules in MTE are 16 bytes long). Each byte contains a single tag
5071+
value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
5072+
``PTRACE_POKEMTETAGS``.
5073+
5074+
If an error occurs before any data is copied then a negative error code is
5075+
returned. If some tags have been copied before an error occurs then the number
5076+
of bytes successfully copied is returned. If the call completes successfully
5077+
then ``length`` is returned.
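
The sizing and alignment rules above can be sketched as a small helper. This is only an illustration under stated assumptions: `mte_tag_buffer_size` and `MTE_GRANULE_SIZE` are hypothetical names that are not part of the KVM API, and the page size is passed in by the caller rather than queried from the system.

```c
#include <stdint.h>

/* Bytes of guest memory covered by one tag (granules in MTE are 16 bytes). */
#define MTE_GRANULE_SIZE 16u

/*
 * Hypothetical helper: returns the number of tag bytes a VMM would need to
 * allocate for `length` bytes of guest memory, one byte per 16-byte granule.
 * Returns 0 when guest_ipa or length is not page aligned, since the ioctl
 * is documented to reject such arguments with -EINVAL.
 */
static uint64_t mte_tag_buffer_size(uint64_t guest_ipa, uint64_t length,
                                    uint64_t page_size)
{
    if (page_size == 0)
        return 0;
    if ((guest_ipa % page_size) != 0 || (length % page_size) != 0)
        return 0;
    return length / MTE_GRANULE_SIZE;
}
```

With 4 KiB pages, a single page of guest memory needs a 256-byte tag buffer.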
5078+
5079+
4.131 KVM_GET_SREGS2
5080+
--------------------
5081+
5082+
:Capability: KVM_CAP_SREGS2
5083+
:Architectures: x86
5084+
:Type: vcpu ioctl
5085+
:Parameters: struct kvm_sregs2 (out)
5086+
:Returns: 0 on success, -1 on error
5087+
5088+
Reads special registers from the vcpu.
5089+
This ioctl (when supported) replaces KVM_GET_SREGS.
5090+
5091+
::
5092+
5093+
struct kvm_sregs2 {
5094+
/* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */
5095+
struct kvm_segment cs, ds, es, fs, gs, ss;
5096+
struct kvm_segment tr, ldt;
5097+
struct kvm_dtable gdt, idt;
5098+
__u64 cr0, cr2, cr3, cr4, cr8;
5099+
__u64 efer;
5100+
__u64 apic_base;
5101+
__u64 flags;
5102+
__u64 pdptrs[4];
5103+
};
5104+
5105+
flags values for ``kvm_sregs2``:
5106+
5107+
``KVM_SREGS2_FLAGS_PDPTRS_VALID``
5108+
5109+
Indicates that the struct contains valid PDPTR values.
5110+
5111+
5112+
4.132 KVM_SET_SREGS2
5113+
--------------------
5114+
5115+
:Capability: KVM_CAP_SREGS2
5116+
:Architectures: x86
5117+
:Type: vcpu ioctl
5118+
:Parameters: struct kvm_sregs2 (in)
5119+
:Returns: 0 on success, -1 on error
5120+
5121+
Writes special registers into the vcpu.
5122+
See KVM_GET_SREGS2 for the data structures.
5123+
This ioctl (when supported) replaces KVM_SET_SREGS.
5124+
5125+
4.133 KVM_GET_STATS_FD
5126+
----------------------
5127+
5128+
:Capability: KVM_CAP_STATS_BINARY_FD
5129+
:Architectures: all
5130+
:Type: vm ioctl, vcpu ioctl
5131+
:Parameters: none
5132+
:Returns: statistics file descriptor on success, < 0 on error
5133+
5134+
Errors:
5135+
5136+
====== ======================================================
5137+
ENOMEM if the fd could not be created due to lack of memory
5138+
EMFILE if the number of opened files exceeds the limit
5139+
====== ======================================================
5140+
5141+
The returned file descriptor can be used to read VM/vCPU statistics data in
5142+
binary format. The data in the file descriptor consists of four blocks
5143+
organized as follows:
5144+
5145+
+-------------+
5146+
| Header |
5147+
+-------------+
5148+
| id string |
5149+
+-------------+
5150+
| Descriptors |
5151+
+-------------+
5152+
| Stats Data |
5153+
+-------------+
5154+
5155+
Apart from the header starting at offset 0, please be aware that it is
5156+
not guaranteed that the four blocks are adjacent or in the above order;
5157+
the offsets of the id, descriptors and data blocks are found in the
5158+
header. However, all four blocks are aligned to 64 bit offsets in the
5159+
file and they do not overlap.
5160+
5161+
All blocks except the data block are immutable. Userspace can read them
5162+
only one time after retrieving the file descriptor, and then use ``pread`` or
5163+
``lseek`` to read the statistics repeatedly.
5164+
5165+
All data is in system endianness.
5166+
5167+
The format of the header is as follows::
5168+
5169+
struct kvm_stats_header {
5170+
__u32 flags;
5171+
__u32 name_size;
5172+
__u32 num_desc;
5173+
__u32 id_offset;
5174+
__u32 desc_offset;
5175+
__u32 data_offset;
5176+
};
5177+
5178+
The ``flags`` field is not used at the moment. It is always read as 0.
5179+
5180+
The ``name_size`` field is the size (in bytes) of the statistics name string
5181+
(including trailing '\0') which is contained in the "id string" block and
5182+
appended at the end of every descriptor.
5183+
5184+
The ``num_desc`` field is the number of descriptors that are included in the
5185+
descriptor block. (The actual number of values in the data block may be
5186+
larger, since each descriptor may comprise more than one value).
5187+
5188+
The ``id_offset`` field is the offset of the id string from the start of the
5189+
file indicated by the file descriptor. It is a multiple of 8.
5190+
5191+
The ``desc_offset`` field is the offset of the Descriptors block from the start
5192+
of the file indicated by the file descriptor. It is a multiple of 8.
5193+
5194+
The ``data_offset`` field is the offset of the Stats Data block from the start
5195+
of the file indicated by the file descriptor. It is a multiple of 8.
5196+
5197+
The id string block contains a string which identifies the file descriptor on
5198+
which KVM_GET_STATS_FD was invoked. The size of the block, including the
5199+
trailing ``'\0'``, is indicated by the ``name_size`` field in the header.
5200+
5201+
The descriptors block only needs to be read once for the lifetime of the
5202+
file descriptor. It contains a sequence of ``struct kvm_stats_desc``, each followed
5203+
by a string of size ``name_size``.
5204+
5205+
#define KVM_STATS_TYPE_SHIFT 0
5206+
#define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT)
5207+
#define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT)
5208+
#define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT)
5209+
#define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT)
5210+
5211+
#define KVM_STATS_UNIT_SHIFT 4
5212+
#define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT)
5213+
#define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT)
5214+
#define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT)
5215+
#define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT)
5216+
#define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT)
5217+
5218+
#define KVM_STATS_BASE_SHIFT 8
5219+
#define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT)
5220+
#define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT)
5221+
#define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT)
5222+
5223+
struct kvm_stats_desc {
5224+
__u32 flags;
5225+
__s16 exponent;
5226+
__u16 size;
5227+
__u32 offset;
5228+
__u32 unused;
5229+
char name[];
5230+
};
5231+
5232+
The ``flags`` field contains the type and unit of the statistics data described
5233+
by this descriptor. Its endianness is CPU native.
5234+
The following flags are supported:
5235+
5236+
Bits 0-3 of ``flags`` encode the type:
5237+
* ``KVM_STATS_TYPE_CUMULATIVE``
5238+
The statistics data is cumulative. The value of data can only be increased.
5239+
Most of the counters used in KVM are of this type.
5240+
The corresponding ``size`` field for this type is always 1.
5241+
All cumulative statistics data are read/write.
5242+
* ``KVM_STATS_TYPE_INSTANT``
5243+
The statistics data is instantaneous. Its value can be increased or
5244+
decreased. This type is usually used as a measurement of some resources,
5245+
like the number of dirty pages, the number of large pages, etc.
5246+
All instant statistics are read only.
5247+
The corresponding ``size`` field for this type is always 1.
5248+
* ``KVM_STATS_TYPE_PEAK``
5249+
The statistics data is peak. The value of data can only be increased, and
5250+
represents a peak value for a measurement, for example the maximum number
5251+
of items in a hash table bucket, the longest time waited and so on.
5252+
The corresponding ``size`` field for this type is always 1.
5253+
5254+
Bits 4-7 of ``flags`` encode the unit:
5255+
* ``KVM_STATS_UNIT_NONE``
5256+
There is no unit for the value of statistics data. This usually means that
5257+
the value is a simple counter of an event.
5258+
* ``KVM_STATS_UNIT_BYTES``
5259+
It indicates that the statistics data is used to measure memory size, in the
5260+
unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
5261+
determined by the ``exponent`` field in the descriptor.
5262+
* ``KVM_STATS_UNIT_SECONDS``
5263+
It indicates that the statistics data is used to measure time or latency.
5264+
* ``KVM_STATS_UNIT_CYCLES``
5265+
It indicates that the statistics data is used to measure CPU clock cycles.
5266+
5267+
Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
5268+
unit:
5269+
* ``KVM_STATS_BASE_POW10``
5270+
The scale is based on power of 10. It is used for measurement of time and
5271+
CPU clock cycles. For example, an exponent of -9 can be used with
5272+
``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds.
5273+
* ``KVM_STATS_BASE_POW2``
5274+
The scale is based on power of 2. It is used for measurement of memory size.
5275+
For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to
5276+
express that the unit is MiB.
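
As a minimal sketch of how the three bit fields of ``flags`` and the ``exponent`` field fit together, the following decodes a descriptor's flags and computes the scale factor. The `#define` values are copied from the text above; `struct stats_kind`, `decode_stats_flags` and `stats_scale` are hypothetical helper names, not part of the KVM API.

```c
#include <stdint.h>

/* Constants as listed in the documentation text. */
#define KVM_STATS_TYPE_SHIFT      0
#define KVM_STATS_TYPE_MASK       (0xF << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_INSTANT    (0x1 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_PEAK       (0x2 << KVM_STATS_TYPE_SHIFT)

#define KVM_STATS_UNIT_SHIFT      4
#define KVM_STATS_UNIT_MASK       (0xF << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_NONE       (0x0 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_BYTES      (0x1 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_SECONDS    (0x2 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_CYCLES     (0x3 << KVM_STATS_UNIT_SHIFT)

#define KVM_STATS_BASE_SHIFT      8
#define KVM_STATS_BASE_MASK       (0xF << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW10      (0x0 << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW2       (0x1 << KVM_STATS_BASE_SHIFT)

/* Hypothetical container for the three fields packed into flags. */
struct stats_kind {
    uint32_t type; /* bits 0-3, still shifted in place  */
    uint32_t unit; /* bits 4-7, still shifted in place  */
    uint32_t base; /* bits 8-11, still shifted in place */
};

/* Split flags into its fields so they compare directly with the macros. */
static struct stats_kind decode_stats_flags(uint32_t flags)
{
    struct stats_kind k;
    k.type = flags & KVM_STATS_TYPE_MASK;
    k.unit = flags & KVM_STATS_UNIT_MASK;
    k.base = flags & KVM_STATS_BASE_MASK;
    return k;
}

/* Multiplier that converts a raw value into the base unit: base^exponent. */
static double stats_scale(uint32_t base, int16_t exponent)
{
    double factor = (base == KVM_STATS_BASE_POW2) ? 2.0 : 10.0;
    double scale = 1.0;
    int n = exponent < 0 ? -exponent : exponent;

    for (int i = 0; i < n; i++)
        scale = (exponent < 0) ? scale / factor : scale * factor;
    return scale;
}
```

The fields are kept in their shifted form on purpose, so a caller can write `k.unit == KVM_STATS_UNIT_BYTES` without shifting again.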
5277+
5278+
The ``size`` field is the number of values of this statistics data. Its
5279+
value is 1 for most simple statistics, which means the data is a single
5280+
unsigned 64-bit value.
5281+
5282+
The ``offset`` field is the offset from the start of Data Block to the start of
5283+
the corresponding statistics data.
5284+
5285+
The ``unused`` field is reserved for future support for other types of
5286+
statistics data, like log/linear histogram. Its value is always 0 for the types
5287+
defined above.
5288+
5289+
The ``name`` field is the name string of the statistics data. The name string
5290+
starts at the end of ``struct kvm_stats_desc``. The maximum length, including
5291+
the trailing ``'\0'``, is indicated by ``name_size`` in the header.
5292+
5293+
The Stats Data block contains an array of 64-bit values in the same order
5294+
as the descriptors in the Descriptors block.
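
The layout described in this section can be walked as follows. This is a sketch under stated assumptions, not the kernel's or any VMM's implementation: it operates on a buffer that already holds the whole file contents (for example, filled with ``pread``), the structs are local mirrors of the definitions above, and `stats_lookup` is a hypothetical helper. It returns only the first value of a statistic, even when ``size`` is larger than 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the structures shown in the documentation. */
struct stats_header {
    uint32_t flags;
    uint32_t name_size;
    uint32_t num_desc;
    uint32_t id_offset;
    uint32_t desc_offset;
    uint32_t data_offset;
};

struct stats_desc {
    uint32_t flags;
    int16_t  exponent;
    uint16_t size;
    uint32_t offset;
    uint32_t unused;
    char     name[];
};

/*
 * Find the statistic called `name` and store its first 64-bit value in
 * *value. Returns 0 on success, -1 if the name is not found or the buffer
 * is too short for the offsets it claims.
 */
static int stats_lookup(const uint8_t *file, size_t file_len,
                        const char *name, uint64_t *value)
{
    struct stats_header hdr;

    if (file_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, file, sizeof(hdr));
    if (hdr.name_size == 0)
        return -1;

    /* Each descriptor is followed by a name of exactly name_size bytes. */
    size_t stride = sizeof(struct stats_desc) + hdr.name_size;

    for (uint32_t i = 0; i < hdr.num_desc; i++) {
        size_t pos = hdr.desc_offset + (size_t)i * stride;
        struct stats_desc d;

        if (pos + stride > file_len)
            return -1;
        memcpy(&d, file + pos, sizeof(d));

        const char *desc_name = (const char *)file + pos + sizeof(d);
        if (strncmp(desc_name, name, hdr.name_size) != 0)
            continue;

        /* desc.offset is relative to the start of the data block. */
        size_t data_pos = (size_t)hdr.data_offset + d.offset;
        if (data_pos + sizeof(*value) > file_len)
            return -1;
        memcpy(value, file + data_pos, sizeof(*value));
        return 0;
    }
    return -1;
}
```

Using `memcpy` rather than pointer casts avoids alignment assumptions about the caller's buffer. Because only the data block is mutable, a real reader could cache the header and descriptors and re-read just the data block on each sample.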
5295+
50375296
5. The kvm_run structure
50385297
========================
50395298

@@ -6323,6 +6582,7 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
63236582
This capability can be used to check / enable 2nd DAWR feature provided
63246583
by POWER10 processor.
63256584

6585+
63266586
7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
63276587
-------------------------------------
63286588

@@ -6380,6 +6640,48 @@ present in the "ibm,hypertas-functions" device-tree property.
63806640
This capability is enabled for hypervisors on platforms like POWER9
63816641
that support radix MMU.
63826642

6643+
7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE
6644+
--------------------------------------
6645+
6646+
:Architectures: x86
6647+
:Parameters: args[0] whether the feature should be enabled or not
6648+
6649+
When this capability is enabled, an emulation failure will result in an exit
6650+
to userspace with KVM_INTERNAL_ERROR (except when the emulator was invoked
6651+
to handle a VMware backdoor instruction). Furthermore, KVM will now provide up
6652+
to 15 instruction bytes for any exit to userspace resulting from an emulation
6653+
failure. When these exits to userspace occur, use the emulation_failure struct
6654+
instead of the internal struct. They both have the same layout, but the
6655+
emulation_failure struct matches the content better. It also explicitly
6656+
defines the 'flags' field which is used to describe the fields in the struct
6657+
that are valid (i.e. if KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is
6658+
set in the 'flags' field then both 'insn_size' and 'insn_bytes' have valid data
6659+
in them.)
6660+
6661+
7.28 KVM_CAP_ARM_MTE
6662+
--------------------
6663+
6664+
:Architectures: arm64
6665+
:Parameters: none
6666+
6667+
This capability indicates that KVM (and the hardware) supports exposing the
6668+
Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
6669+
VMM before creating any VCPUs to allow the guest access. Note that MTE is only
6670+
available to a guest running in AArch64 mode and enabling this capability will
6671+
cause attempts to create AArch32 VCPUs to fail.
6672+
6673+
When enabled the guest is able to access tags associated with any memory given
6674+
to the guest. KVM will ensure that the tags are maintained during swap or
6675+
hibernation of the host; however the VMM needs to manually save/restore the
6676+
tags as appropriate if the VM is migrated.
6677+
6678+
When this capability is enabled all memory in memslots must be mapped as
6679+
not-shareable (no MAP_SHARED); attempts to create a memslot with a
6680+
MAP_SHARED mmap will result in an -EINVAL return.
6681+
6682+
When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
6683+
perform a bulk copy of tags to/from the guest.
6684+
63836685
8. Other capabilities.
63846686
======================
63856687

@@ -6909,3 +7211,33 @@ This capability is always enabled.
69097211
This capability indicates that the KVM virtual PTP service is
69107212
supported in the host. A VMM can check whether the service is
69117213
available to the guest on migration.
7214+
7215+
8.33 KVM_CAP_HYPERV_ENFORCE_CPUID
7216+
---------------------------------
7217+
7218+
:Architectures: x86
7219+
7220+
When enabled, KVM will disable emulated Hyper-V features provided to the
7221+
guest according to the bits in the Hyper-V CPUID feature leaves. Otherwise, all
7222+
currently implemented Hyper-V features are provided unconditionally when
7223+
Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
7224+
leaf.
7225+
7226+
8.34 KVM_CAP_EXIT_HYPERCALL
7227+
---------------------------
7228+
7229+
:Capability: KVM_CAP_EXIT_HYPERCALL
7230+
:Architectures: x86
7231+
:Type: vm
7232+
7233+
This capability, if enabled, will cause KVM to exit to userspace
7234+
with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
7235+
7236+
Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
7237+
of hypercalls that can be configured to exit to userspace.
7238+
Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
7239+
7240+
The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
7241+
of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
7242+
the hypercalls whose corresponding bit is in the argument, and return
7243+
ENOSYS for the others.
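
The subset rule for the KVM_ENABLE_CAP argument can be expressed as a plain bitmask check. The sketch below is illustrative only: both function names are hypothetical, and it assumes that the bit corresponding to a hypercall is the bit at position equal to the hypercall number, which the text above does not state explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check a VMM could run before calling KVM_ENABLE_CAP:
 * `supported` is the bitmask returned by KVM_CHECK_EXTENSION and
 * `requested` is the bitmask the VMM intends to pass as the argument.
 * The request is valid only if it sets no bit outside `supported`.
 */
static bool exit_hypercall_mask_valid(uint64_t supported, uint64_t requested)
{
    return (requested & ~supported) == 0;
}

/*
 * Assumption: hypercall number `nr` maps to bit `nr` of the mask.
 * Returns true if that hypercall would be forwarded to userspace; the
 * text says hypercalls whose bit is not set get ENOSYS instead.
 */
static bool hypercall_exits_to_userspace(uint64_t enabled, unsigned int nr)
{
    if (nr >= 64)
        return false;
    return (enabled >> nr) & 1u;
}
```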

Documentation/virt/kvm/cpuid.rst

Lines changed: 7 additions & 0 deletions
@@ -96,6 +96,13 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit
9696
before using extended destination
9797
ID bits in MSI address bits 11-5.
9898

99+
KVM_FEATURE_HC_MAP_GPA_RANGE 16 guest checks this feature bit before
100+
using the map gpa range hypercall
101+
to notify the page state change
102+
103+
KVM_FEATURE_MIGRATION_CONTROL 17 guest checks this feature bit before
104+
using MSR_KVM_MIGRATION_CONTROL
105+
99106
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
100107
per-cpu warps are expected in
101108
kvmclock
