@@ -688,9 +688,14 @@ MSRs that have been set successfully.
688688Defines the vcpu responses to the cpuid instruction. Applications
689689should use the KVM_SET_CPUID2 ioctl if available.
690690
691- Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID
692- configuration (if there is) is not corrupted. Userspace can get a copy of the
693- resulting CPUID configuration through KVM_GET_CPUID2 in case.
691+ Caveat emptor:
692+ - If this IOCTL fails, KVM gives no guarantees that the previous valid CPUID
693+ configuration (if any) is not corrupted. Userspace can get a copy
694+ of the resulting CPUID configuration through KVM_GET_CPUID2 if needed.
695+ - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
696+ after running the guest, may cause guest instability.
697+ - Using heterogeneous CPUID configurations across vCPUs (apart from APIC IDs,
698+ topology, etc.) may cause guest instability.
694699
695700::
696701
@@ -5034,6 +5039,260 @@ see KVM_XEN_VCPU_SET_ATTR above.
50345039The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
50355040with the KVM_XEN_VCPU_GET_ATTR ioctl.
50365041
5042+ 4.130 KVM_ARM_MTE_COPY_TAGS
5043+ ---------------------------
5044+
5045+ :Capability: KVM_CAP_ARM_MTE
5046+ :Architectures: arm64
5047+ :Type: vm ioctl
5048+ :Parameters: struct kvm_arm_copy_mte_tags
5049+ :Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect
5050+ arguments, -EFAULT if memory cannot be accessed).
5051+
5052+ ::
5053+
5054+ struct kvm_arm_copy_mte_tags {
5055+ __u64 guest_ipa;
5056+ __u64 length;
5057+ void __user *addr;
5058+ __u64 flags;
5059+ __u64 reserved[2];
5060+ };
5061+
5062+ Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
5063+ ``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
5064+ field must point to a buffer which the tags will be copied to or from.
5065+
5066+ ``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
5067+ ``KVM_ARM_TAGS_FROM_GUEST``.
5068+
5069+ The size of the buffer to store the tags is ``(length / 16)`` bytes
5070+ (granules in MTE are 16 bytes long). Each byte contains a single tag
5071+ value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
5072+ ``PTRACE_POKEMTETAGS``.
5073+
5074+ If an error occurs before any data is copied then a negative error code is
5075+ returned. If some tags have been copied before an error occurs then the number
5076+ of bytes successfully copied is returned. If the call completes successfully
5077+ then ``length`` is returned.
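As a rough illustration of the sizing rule and the return-value convention described above, the following sketch computes the tag buffer size for a guest range and classifies the value returned by the ioctl. The helper names, the enum and the ``page_size`` parameter are hypothetical and not part of the KVM API; the ioctl call itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16u /* one tag byte per 16 bytes of guest memory */

/* Hypothetical helper: tag buffer size in bytes for a guest range, or 0 if
 * guest_ipa or length is not aligned to page_size (such a request would be
 * rejected as an incorrect argument). */
size_t mte_tag_buffer_size(uint64_t guest_ipa, uint64_t length,
                           uint64_t page_size)
{
    assert(page_size != 0);
    if (guest_ipa % page_size != 0 || length % page_size != 0)
        return 0;
    return (size_t)(length / MTE_GRANULE_SIZE);
}

enum mte_copy_result { MTE_COPY_ERROR, MTE_COPY_PARTIAL, MTE_COPY_DONE };

/* Hypothetical helper: interpret the ioctl return value for a request
 * covering `length` bytes of guest memory. */
enum mte_copy_result mte_copy_classify(int64_t ret, uint64_t length)
{
    if (ret < 0)
        return MTE_COPY_ERROR;   /* error before any data was copied */
    if ((uint64_t)ret < length)
        return MTE_COPY_PARTIAL; /* error after some tags were copied */
    return MTE_COPY_DONE;        /* ret == length: everything copied */
}
```

For a single 4 KiB page the buffer is 256 bytes. A caller that sees a partial result could, for example, retry from ``guest_ipa + ret``, although that policy is a design choice and not something the text above prescribes.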
5078+
5079+ 4.131 KVM_GET_SREGS2
5080+ --------------------
5081+
5082+ :Capability: KVM_CAP_SREGS2
5083+ :Architectures: x86
5084+ :Type: vcpu ioctl
5085+ :Parameters: struct kvm_sregs2 (out)
5086+ :Returns: 0 on success, -1 on error
5087+
5088+ Reads special registers from the vcpu.
5089+ This ioctl (when supported) replaces KVM_GET_SREGS.
5090+
5091+ ::
5092+
5093+ struct kvm_sregs2 {
5094+ /* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */
5095+ struct kvm_segment cs, ds, es, fs, gs, ss;
5096+ struct kvm_segment tr, ldt;
5097+ struct kvm_dtable gdt, idt;
5098+ __u64 cr0, cr2, cr3, cr4, cr8;
5099+ __u64 efer;
5100+ __u64 apic_base;
5101+ __u64 flags;
5102+ __u64 pdptrs[4];
5103+ };
5104+
5105+ flags values for ``kvm_sregs2``:
5106+
5107+ ``KVM_SREGS2_FLAGS_PDPTRS_VALID``
5108+
5109+ Indicates that the struct contains valid PDPTR values.
5110+
5111+
5112+ 4.132 KVM_SET_SREGS2
5113+ --------------------
5114+
5115+ :Capability: KVM_CAP_SREGS2
5116+ :Architectures: x86
5117+ :Type: vcpu ioctl
5118+ :Parameters: struct kvm_sregs2 (in)
5119+ :Returns: 0 on success, -1 on error
5120+
5121+ Writes special registers into the vcpu.
5122+ See KVM_GET_SREGS2 for the data structures.
5123+ This ioctl (when supported) replaces KVM_SET_SREGS.
5124+
5125+ 4.133 KVM_GET_STATS_FD
5126+ ----------------------
5127+
5128+ :Capability: KVM_CAP_STATS_BINARY_FD
5129+ :Architectures: all
5130+ :Type: vm ioctl, vcpu ioctl
5131+ :Parameters: none
5132+ :Returns: statistics file descriptor on success, < 0 on error
5133+
5134+ Errors:
5135+
5136+ ====== ======================================================
5137+ ENOMEM if the fd could not be created due to lack of memory
5138+ EMFILE if the number of opened files exceeds the limit
5139+ ====== ======================================================
5140+
5141+ The returned file descriptor can be used to read VM/vCPU statistics data in
5142+ binary format. The data in the file descriptor consists of four blocks
5143+ organized as follows:
5144+
5145+ +-------------+
5146+ | Header |
5147+ +-------------+
5148+ | id string |
5149+ +-------------+
5150+ | Descriptors |
5151+ +-------------+
5152+ | Stats Data |
5153+ +-------------+
5154+
5155+ Apart from the header starting at offset 0, please be aware that it is
5156+ not guaranteed that the four blocks are adjacent or in the above order;
5157+ the offsets of the id, descriptors and data blocks are found in the
5158+ header. However, all four blocks are aligned to 64 bit offsets in the
5159+ file and they do not overlap.
5160+
5161+ All blocks except the data block are immutable. Userspace needs to read them
5162+ only once after retrieving the file descriptor, and can then use ``pread`` or
5163+ ``lseek`` to read the statistics repeatedly.
5164+
5165+ All data is in system endianness.
5166+
5167+ The format of the header is as follows::
5168+
5169+ struct kvm_stats_header {
5170+ __u32 flags;
5171+ __u32 name_size;
5172+ __u32 num_desc;
5173+ __u32 id_offset;
5174+ __u32 desc_offset;
5175+ __u32 data_offset;
5176+ };
5177+
5178+ The ``flags`` field is not used at the moment. It is always read as 0.
5179+
5180+ The ``name_size`` field is the size (in bytes) of the statistics name string
5181+ (including trailing ``'\0'``) which is contained in the "id string" block and
5182+ appended at the end of every descriptor.
5183+
5184+ The ``num_desc`` field is the number of descriptors that are included in the
5185+ descriptor block. (The actual number of values in the data block may be
5186+ larger, since each descriptor may comprise more than one value).
5187+
5188+ The ``id_offset`` field is the offset of the id string from the start of the
5189+ file indicated by the file descriptor. It is a multiple of 8.
5190+
5191+ The ``desc_offset`` field is the offset of the Descriptors block from the start
5192+ of the file indicated by the file descriptor. It is a multiple of 8.
5193+
5194+ The ``data_offset`` field is the offset of the Stats Data block from the start
5195+ of the file indicated by the file descriptor. It is a multiple of 8.
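The header constraints described above (offsets are multiples of 8, blocks do not overlap the header at offset 0) can be checked before trusting the offsets. The sketch below is only an illustration under stated assumptions: the struct is a local mirror of the layout shown above (real code would take it from ``<linux/kvm.h>``), the function names are hypothetical, and the descriptor stride of 16 bytes plus ``name_size`` follows from the field sizes of ``struct kvm_stats_desc``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_stats_header (assumption: same layout). */
struct demo_stats_header {
    uint32_t flags;
    uint32_t name_size;
    uint32_t num_desc;
    uint32_t id_offset;
    uint32_t desc_offset;
    uint32_t data_offset;
};

/* Fixed part of one descriptor: 4 + 2 + 2 + 4 + 4 bytes, name follows. */
#define DEMO_STATS_DESC_FIXED_SIZE 16u

/* Hypothetical check: all offsets are 64-bit aligned and lie past the
 * header, which always starts at offset 0. */
bool demo_stats_header_plausible(const struct demo_stats_header *h)
{
    assert(h != NULL);
    return h->id_offset % 8 == 0 && h->id_offset >= sizeof(*h) &&
           h->desc_offset % 8 == 0 && h->desc_offset >= sizeof(*h) &&
           h->data_offset % 8 == 0 && h->data_offset >= sizeof(*h);
}

/* Total size in bytes of the descriptors block: each descriptor is followed
 * by a name of name_size bytes. */
uint64_t demo_stats_desc_block_size(const struct demo_stats_header *h)
{
    assert(h != NULL);
    return (uint64_t)h->num_desc *
           (DEMO_STATS_DESC_FIXED_SIZE + (uint64_t)h->name_size);
}
```

With ``name_size`` of 48 and three descriptors, the descriptors block occupies 3 * (16 + 48) = 192 bytes.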
5196+
5197+ The id string block contains a string which identifies the file descriptor on
5198+ which KVM_GET_STATS_FD was invoked. The size of the block, including the
5199+ trailing ``'\0'``, is indicated by the ``name_size`` field in the header.
5200+
5201+ The descriptors block only needs to be read once for the lifetime of the
5202+ file descriptor. It contains a sequence of ``struct kvm_stats_desc``, each
5203+ followed by a string of size ``name_size``.
5204+
5205+ #define KVM_STATS_TYPE_SHIFT 0
5206+ #define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT)
5207+ #define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT)
5208+ #define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT)
5209+ #define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT)
5210+
5211+ #define KVM_STATS_UNIT_SHIFT 4
5212+ #define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT)
5213+ #define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT)
5214+ #define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT)
5215+ #define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT)
5216+ #define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT)
5217+
5218+ #define KVM_STATS_BASE_SHIFT 8
5219+ #define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT)
5220+ #define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT)
5221+ #define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT)
5222+
5223+ struct kvm_stats_desc {
5224+ __u32 flags;
5225+ __s16 exponent;
5226+ __u16 size;
5227+ __u32 offset;
5228+ __u32 unused;
5229+ char name[];
5230+ };
5231+
5232+ The ``flags`` field contains the type and unit of the statistics data described
5233+ by this descriptor. Its endianness is CPU native.
5234+ The following flags are supported:
5235+
5236+ Bits 0-3 of ``flags`` encode the type:
5237+ * ``KVM_STATS_TYPE_CUMULATIVE``
5238+ The statistics data is cumulative. The value of data can only be increased.
5239+ Most of the counters used in KVM are of this type.
5240+ The corresponding ``size`` field for this type is always 1.
5241+ All cumulative statistics data are read/write.
5242+ * ``KVM_STATS_TYPE_INSTANT``
5243+ The statistics data is instantaneous. Its value can be increased or
5244+ decreased. This type is usually used as a measurement of some resources,
5245+ like the number of dirty pages, the number of large pages, etc.
5246+ All instant statistics are read only.
5247+ The corresponding ``size`` field for this type is always 1.
5248+ * ``KVM_STATS_TYPE_PEAK``
5249+ The statistics data is peak. The value of data can only be increased, and
5250+ represents a peak value for a measurement, for example the maximum number
5251+ of items in a hash table bucket, the longest time waited and so on.
5252+ The corresponding ``size`` field for this type is always 1.
5253+
5254+ Bits 4-7 of ``flags`` encode the unit:
5255+ * ``KVM_STATS_UNIT_NONE``
5256+ There is no unit for the value of statistics data. This usually means that
5257+ the value is a simple counter of an event.
5258+ * ``KVM_STATS_UNIT_BYTES``
5259+ It indicates that the statistics data is used to measure memory size, in the
5260+ unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
5261+ determined by the ``exponent`` field in the descriptor.
5262+ * ``KVM_STATS_UNIT_SECONDS``
5263+ It indicates that the statistics data is used to measure time or latency.
5264+ * ``KVM_STATS_UNIT_CYCLES``
5265+ It indicates that the statistics data is used to measure CPU clock cycles.
5266+
5267+ Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
5268+ unit:
5269+ * ``KVM_STATS_BASE_POW10``
5270+ The scale is based on power of 10. It is used for measurement of time and
5271+ CPU clock cycles. For example, an exponent of -9 can be used with
5272+ ``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds.
5273+ * ``KVM_STATS_BASE_POW2``
5274+ The scale is based on power of 2. It is used for measurement of memory size.
5275+ For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to
5276+ express that the unit is MiB.
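To make the bit layout of ``flags`` concrete, here is a minimal sketch that extracts the type and unit fields and turns the base and ``exponent`` into a multiplier. The ``DEMO_`` macros repeat the values of the definitions above under local names, and the function names are hypothetical. Treating any base other than ``KVM_STATS_BASE_POW2`` as a power of 10 is a simplification made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the masks and values defined above. */
#define DEMO_STATS_TYPE_MASK  (0xFu << 0)
#define DEMO_STATS_UNIT_MASK  (0xFu << 4)
#define DEMO_STATS_UNIT_BYTES (0x1u << 4)
#define DEMO_STATS_BASE_MASK  (0xFu << 8)
#define DEMO_STATS_BASE_POW2  (0x1u << 8)

/* The three fields occupy disjoint bit ranges of flags. */
static_assert((DEMO_STATS_TYPE_MASK & DEMO_STATS_UNIT_MASK) == 0 &&
              (DEMO_STATS_UNIT_MASK & DEMO_STATS_BASE_MASK) == 0,
              "flag fields must not overlap");

uint32_t demo_stats_type(uint32_t flags)
{
    return flags & DEMO_STATS_TYPE_MASK; /* bits 0-3 */
}

uint32_t demo_stats_unit(uint32_t flags)
{
    return flags & DEMO_STATS_UNIT_MASK; /* bits 4-7 */
}

/* Multiplier that converts a raw value to the base unit: base^exponent. */
double demo_stats_scale(uint32_t flags, int16_t exponent)
{
    double base =
        (flags & DEMO_STATS_BASE_MASK) == DEMO_STATS_BASE_POW2 ? 2.0 : 10.0;
    int n = exponent < 0 ? -(int)exponent : (int)exponent;
    double scale = 1.0;

    for (int i = 0; i < n; i++)
        scale *= base;
    return exponent < 0 ? 1.0 / scale : scale;
}
```

A descriptor with ``KVM_STATS_UNIT_BYTES``, ``KVM_STATS_BASE_POW2`` and an exponent of 20 yields a multiplier of 1048576, so a raw value of 3 stands for 3 MiB.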
5277+
5278+ The ``size`` field is the number of values of this statistics data. Its
5279+ value is 1 for most simple statistics, meaning that the data consists of a
5280+ single unsigned 64-bit value.
5281+
5282+ The ``offset`` field is the offset from the start of the Stats Data block to
5283+ the start of the corresponding statistics data.
5284+
5285+ The ``unused`` field is reserved for future support for other types of
5286+ statistics data, like log/linear histogram. Its value is always 0 for the types
5287+ defined above.
5288+
5289+ The ``name`` field is the name string of the statistics data. The name string
5290+ starts at the end of ``struct kvm_stats_desc``. The maximum length, including
5291+ the trailing ``'\0'``, is indicated by ``name_size`` in the header.
5292+
5293+ The Stats Data block contains an array of 64-bit values in the same order
5294+ as the descriptors in the Descriptors block.
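Putting the pieces together, the sketch below looks up one statistic by name. It is an illustration under several assumptions: the whole file content has already been read into a memory buffer (a real consumer would more likely ``pread`` the header and descriptors once and then only the data block), the structs are local mirrors of the layouts above, and all names prefixed with ``walk_`` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the header and of the fixed part of a descriptor. */
struct walk_header {
    uint32_t flags, name_size, num_desc;
    uint32_t id_offset, desc_offset, data_offset;
};

struct walk_desc {
    uint32_t flags;
    int16_t exponent;
    uint16_t size;
    uint32_t offset;
    uint32_t unused;
    /* followed by name_size bytes of name */
};

/* Finds the descriptor called `name` and stores its first 64-bit value in
 * *out. Returns 0 on success, -1 if not found or the image is malformed. */
int walk_stats_lookup(const uint8_t *img, size_t img_len, const char *name,
                      uint64_t *out)
{
    struct walk_header h;

    assert(img != NULL && name != NULL && out != NULL);
    if (img_len < sizeof(h))
        return -1;
    memcpy(&h, img, sizeof(h));
    if (strlen(name) >= h.name_size)
        return -1; /* cannot match: names include the trailing '\0' */

    size_t stride = sizeof(struct walk_desc) + h.name_size;
    for (uint32_t i = 0; i < h.num_desc; i++) {
        size_t pos = h.desc_offset + (size_t)i * stride;
        struct walk_desc d;

        if (pos > img_len || stride > img_len - pos)
            return -1;
        memcpy(&d, img + pos, sizeof(d));
        if (strncmp((const char *)(img + pos + sizeof(d)), name,
                    h.name_size) != 0)
            continue;

        /* offset is relative to the start of the data block */
        size_t vpos = (size_t)h.data_offset + d.offset;
        if (d.size == 0 || vpos > img_len || sizeof(*out) > img_len - vpos)
            return -1;
        memcpy(out, img + vpos, sizeof(*out));
        return 0;
    }
    return -1;
}
```

``memcpy`` is used instead of pointer casts so that the sketch does not depend on the alignment of the buffer. Statistics with ``size`` greater than 1 would need a loop over consecutive 64-bit values starting at the same position.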
5295+
503752965. The kvm_run structure
50385297========================
50395298
@@ -6323,6 +6582,7 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
63236582This capability can be used to check / enable 2nd DAWR feature provided
63246583by POWER10 processor.
63256584
6585+
632665867.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
63276587-------------------------------------
63286588
@@ -6380,6 +6640,48 @@ present in the "ibm,hypertas-functions" device-tree property.
63806640This capability is enabled for hypervisors on platforms like POWER9
63816641that support radix MMU.
63826642
6643+ 7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE
6644+ --------------------------------------
6645+
6646+ :Architectures: x86
6647+ :Parameters: args[0] whether the feature should be enabled or not
6648+
6649+ When this capability is enabled, an emulation failure will result in an exit
6650+ to userspace with KVM_INTERNAL_ERROR (except when the emulator was invoked
6651+ to handle a VMware backdoor instruction). Furthermore, KVM will now provide up
6652+ to 15 instruction bytes for any exit to userspace resulting from an emulation
6653+ failure. When these exits to userspace occur, use the emulation_failure struct
6654+ instead of the internal struct. They both have the same layout, but the
6655+ emulation_failure struct matches the content better. It also explicitly
6656+ defines the 'flags' field, which describes which fields in the struct
6657+ are valid (e.g. if KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is
6658+ set in the 'flags' field, then both 'insn_size' and 'insn_bytes' have valid data
6659+ in them).
6660+
6661+ 7.28 KVM_CAP_ARM_MTE
6662+ --------------------
6663+
6664+ :Architectures: arm64
6665+ :Parameters: none
6666+
6667+ This capability indicates that KVM (and the hardware) supports exposing the
6668+ Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
6669+ VMM before creating any VCPUs to allow the guest access. Note that MTE is only
6670+ available to a guest running in AArch64 mode and enabling this capability will
6671+ cause attempts to create AArch32 VCPUs to fail.
6672+
6673+ When enabled the guest is able to access tags associated with any memory given
6674+ to the guest. KVM will ensure that the tags are maintained during swap or
6675+ hibernation of the host; however the VMM needs to manually save/restore the
6676+ tags as appropriate if the VM is migrated.
6677+
6678+ When this capability is enabled, all memory in memslots must be mapped as
6679+ not-shareable (no MAP_SHARED); attempts to create a memslot with a
6680+ MAP_SHARED mmap will result in an -EINVAL return.
6681+
6682+ When enabled, the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
6683+ perform a bulk copy of tags to/from the guest.
6684+
638366858. Other capabilities.
63846686======================
63856687
@@ -6909,3 +7211,33 @@ This capability is always enabled.
69097211This capability indicates that the KVM virtual PTP service is
69107212supported in the host. A VMM can check whether the service is
69117213available to the guest on migration.
7214+
7215+ 8.33 KVM_CAP_HYPERV_ENFORCE_CPUID
7216+ ---------------------------------
7217+
7218+ :Architectures: x86
7219+
7220+ When enabled, KVM will disable emulated Hyper-V features provided to the
7221+ guest according to the bits in the Hyper-V CPUID feature leaves. Otherwise, all
7222+ currently implemented Hyper-V features are provided unconditionally when
7223+ Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
7224+ leaf.
7225+
7226+ 8.34 KVM_CAP_EXIT_HYPERCALL
7227+ ---------------------------
7228+
7229+ :Capability: KVM_CAP_EXIT_HYPERCALL
7230+ :Architectures: x86
7231+ :Type: vm
7232+
7233+ This capability, if enabled, will cause KVM to exit to userspace
7234+ with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
7235+
7236+ Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
7237+ of hypercalls that can be configured to exit to userspace.
7238+ Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
7239+
7240+ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
7241+ of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
7242+ the hypercalls whose corresponding bit is in the argument, and return
7243+ ENOSYS for the others.
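The subset rule for the KVM_ENABLE_CAP argument can be sketched as plain bit arithmetic. The function names are hypothetical, and the mapping of hypercall number N to bit N of the mask is an assumption made for this illustration, not something stated in the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every bit in `requested` is also set in `supported`, where
 * `supported` is the mask reported by KVM_CHECK_EXTENSION. */
bool exit_hypercall_mask_ok(uint64_t supported, uint64_t requested)
{
    return (requested & ~supported) == 0;
}

/* True if hypercall number `nr` would be forwarded to userspace for the
 * given enabled mask (assumption: hypercall N corresponds to bit N). */
bool hypercall_exits_to_userspace(uint64_t enabled, unsigned int nr)
{
    assert(nr < 64);
    return (enabled >> nr) & 1u;
}
```

A VMM would typically run the subset check on its desired mask before calling KVM_ENABLE_CAP, so that a kernel that reports fewer hypercalls is detected early.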