Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

+85

Documentation/arm64/perf.txt

···
1 + Perf Event Attributes
2 + =====================
3 +
4 + Author: Andrew Murray <andrew.murray@arm.com>
5 + Date: 2019-03-06
6 +
7 + exclude_user
8 + ------------
9 +
10 + This attribute excludes userspace.
11 +
12 + Userspace always runs at EL0 and thus this attribute will exclude EL0.
13 +
14 +
15 + exclude_kernel
16 + --------------
17 +
18 + This attribute excludes the kernel.
19 +
20 + The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
21 + at EL1.
22 +
23 + For the host this attribute will exclude EL1 and additionally EL2 on a VHE
24 + system.
25 +
26 + For the guest this attribute will exclude EL1. Please note that EL2 is
27 + never counted within a guest.
28 +
29 +
30 + exclude_hv
31 + ----------
32 +
33 + This attribute excludes the hypervisor.
34 +
35 + For a VHE host this attribute is ignored as we consider the host kernel to
36 + be the hypervisor.
37 +
38 + For a non-VHE host this attribute will exclude EL2 as we consider the
39 + hypervisor to be any code that runs at EL2 which is predominantly used for
40 + guest/host transitions.
41 +
42 + For the guest this attribute has no effect. Please note that EL2 is
43 + never counted within a guest.
44 +
45 +
46 + exclude_host / exclude_guest
47 + ----------------------------
48 +
49 + These attributes exclude the KVM host and guest, respectively.
50 +
51 + The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
52 + kernel or non-VHE hypervisor).
53 +
54 + The KVM guest may run at EL0 (userspace) and EL1 (kernel).
55 +
56 + Due to the overlapping exception levels between host and guests we cannot
57 + exclusively rely on the PMU's hardware exception filtering - therefore we
58 + must enable/disable counting on the entry and exit to the guest. This is
59 + performed differently on VHE and non-VHE systems.
60 +
61 + For non-VHE systems we exclude EL2 for exclude_host - upon entering and
62 + exiting the guest we disable/enable the event as appropriate based on the
63 + exclude_host and exclude_guest attributes.
64 +
65 + For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2
66 + for exclude_host. Upon entering and exiting the guest we modify the event
67 + to include/exclude EL0 as appropriate based on the exclude_host and
68 + exclude_guest attributes.
69 +
70 + The statements above also apply when these attributes are used within a
71 + non-VHE guest however please note that EL2 is never counted within a guest.
72 +
73 +
74 + Accuracy
75 + --------
76 +
77 + On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
78 + transition at EL2 - however there is a period of time between
79 + enabling/disabling the counters and entering/exiting the guest. We are
80 + able to eliminate counters counting host events on the boundaries of guest
81 + entry/exit when counting guest events by filtering out EL2 for
82 + exclude_host. However when using !exclude_hv there is a small blackout
83 + window at the guest entry/exit where host events are not captured.
84 +
85 + On VHE systems there are no blackout windows.
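The host-side rules that perf.txt gives for exclude_user, exclude_kernel and exclude_hv can be modelled as a small pure function. This is an illustrative sketch only, not the arm64 PMU driver code: the names `EL0_BIT`, `EL1_BIT`, `EL2_BIT`, `struct host_excl` and `host_counted_els()` are invented for the example, and neither guest-side filtering nor exclude_host/exclude_guest is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, one per exception level (not kernel definitions). */
#define EL0_BIT (1u << 0)
#define EL1_BIT (1u << 1)
#define EL2_BIT (1u << 2)

/* Hypothetical subset of the perf_event_attr exclusion flags. */
struct host_excl {
	bool exclude_user;
	bool exclude_kernel;
	bool exclude_hv;
};

/*
 * Return the exception levels that are still counted for a host event,
 * following the host rules stated in perf.txt.
 */
static unsigned int host_counted_els(struct host_excl a, bool vhe)
{
	unsigned int counted = EL0_BIT | EL1_BIT | EL2_BIT;

	if (a.exclude_user)
		counted &= ~EL0_BIT;		/* userspace always runs at EL0 */
	if (a.exclude_kernel) {
		counted &= ~EL1_BIT;		/* EL1 is excluded on every host */
		if (vhe)
			counted &= ~EL2_BIT;	/* a VHE host kernel runs at EL2 */
	}
	if (a.exclude_hv && !vhe)
		counted &= ~EL2_BIT;		/* ignored on VHE hosts */

	/* Sanity check: only the three modelled levels can remain set. */
	assert((counted & ~(EL0_BIT | EL1_BIT | EL2_BIT)) == 0);
	return counted;
}
```

In this model, exclude_kernel on a VHE host leaves only EL0 counted, while exclude_hv changes nothing on VHE and removes only EL2 on non-VHE.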

+18 -4

Documentation/arm64/pointer-authentication.txt

···
87 87 Virtualization
88 88 --------------
89 89
90 - Pointer authentication is not currently supported in KVM guests. KVM
91 - will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
92 - the feature will result in an UNDEFINED exception being injected into
93 - the guest.
90 + Pointer authentication is enabled in KVM guest when each virtual cpu is
91 + initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
92 + requesting these two separate cpu features to be enabled. The current KVM
93 + guest implementation works by enabling both features together, so both
94 + these userspace flags are checked before enabling pointer authentication.
95 + The separate userspace flag will allow to have no userspace ABI changes
96 + if support is added in the future to allow these two features to be
97 + enabled independently of one another.
98 +
99 + As Arm Architecture specifies that Pointer Authentication feature is
100 + implemented along with the VHE feature so KVM arm64 ptrauth code relies
101 + on VHE mode to be present.
102 +
103 + Additionally, when these vcpu feature flags are not set then KVM will
104 + filter out the Pointer Authentication system key registers from
105 + KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID
106 + register. Any attempt to use the Pointer Authentication instructions will
107 + result in an UNDEFINED exception being injected into the guest.
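The "both flags together" requirement can be illustrated from the userspace side. The sketch below is a minimal model rather than KVM code: `struct vcpu_features` stands in for the features[] member of struct kvm_vcpu_init, the bit numbers 5 and 6 are assumptions that should be checked against the uapi header of the target kernel, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed feature bit numbers; verify against <asm/kvm.h>. */
#define VCPU_PTRAUTH_ADDRESS 5
#define VCPU_PTRAUTH_GENERIC 6

/* Local stand-in for the features[] member of struct kvm_vcpu_init. */
struct vcpu_features {
	uint32_t features[7];
};

static void feature_set(struct vcpu_features *f, unsigned int bit)
{
	f->features[bit / 32] |= 1u << (bit % 32);
}

static bool feature_test(const struct vcpu_features *f, unsigned int bit)
{
	return (f->features[bit / 32] >> (bit % 32)) & 1;
}

/* Request both pointer authentication features, as the text requires. */
static void request_ptrauth(struct vcpu_features *f)
{
	feature_set(f, VCPU_PTRAUTH_ADDRESS);
	feature_set(f, VCPU_PTRAUTH_GENERIC);
}

/* A request is consistent only if both flags are set or both are clear. */
static bool ptrauth_request_consistent(const struct vcpu_features *f)
{
	return feature_test(f, VCPU_PTRAUTH_ADDRESS) ==
	       feature_test(f, VCPU_PTRAUTH_GENERIC);
}
```

A VMM could run a check like `ptrauth_request_consistent()` before issuing KVM_ARM_VCPU_INIT, so that a half-configured request is caught in userspace rather than rejected by the kernel.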

+200 -25

Documentation/virtual/kvm/api.txt

··· 69 69 the VM is shut down. 70 70 71 71 72 - It is important to note that althought VM ioctls may only be issued from 73 - the process that created the VM, a VM's lifecycle is associated with its 74 - file descriptor, not its creator (process). In other words, the VM and 75 - its resources, *including the associated address space*, are not freed 76 - until the last reference to the VM's file descriptor has been released. 77 - For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will 78 - not be freed until both the parent (original) process and its child have 79 - put their references to the VM's file descriptor. 80 - 81 - Because a VM's resources are not freed until the last reference to its 82 - file descriptor is released, creating additional references to a VM via 83 - via fork(), dup(), etc... without careful consideration is strongly 84 - discouraged and may have unwanted side effects, e.g. memory allocated 85 - by and on behalf of the VM's process may not be freed/unaccounted when 86 - the VM is shut down. 87 - 88 - 89 72 3. Extensions 90 73 ------------- 91 74 ··· 330 347 the KVM_CAP_MULTI_ADDRESS_SPACE capability. 331 348 332 349 The bits in the dirty bitmap are cleared before the ioctl returns, unless 333 - KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information, 350 + KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information, 334 351 see the description of the capability. 335 352 336 353 4.9 KVM_SET_MEMORY_ALIAS ··· 1100 1117 This ioctl allows the user to create, modify or delete a guest physical 1101 1118 memory slot. Bits 0-15 of "slot" specify the slot id and this value 1102 1119 should be less than the maximum number of user memory slots supported per 1103 - VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, 1104 - if this capability is supported by the architecture. Slots may not 1105 - overlap in guest physical address space. 1120 + VM. 
The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS. 1121 + Slots may not overlap in guest physical address space. 1106 1122 1107 1123 If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" 1108 1124 specifies the address space which is being modified. They must be ··· 1883 1901 Type: vcpu ioctl 1884 1902 Parameters: struct kvm_one_reg (in) 1885 1903 Returns: 0 on success, negative value on failure 1904 + Errors: 1905 + ENOENT: no such register 1906 + EINVAL: invalid register ID, or no such register 1907 + EPERM: (arm64) register access not allowed before vcpu finalization 1908 + (These error codes are indicative only: do not rely on a specific error 1909 + code being returned in a specific situation.) 1886 1910 1887 1911 struct kvm_one_reg { 1888 1912 __u64 id; ··· 1973 1985 PPC | KVM_REG_PPC_TLB3PS | 32 1974 1986 PPC | KVM_REG_PPC_EPTCFG | 32 1975 1987 PPC | KVM_REG_PPC_ICP_STATE | 64 1988 + PPC | KVM_REG_PPC_VP_STATE | 128 1976 1989 PPC | KVM_REG_PPC_TB_OFFSET | 64 1977 1990 PPC | KVM_REG_PPC_SPMC1 | 32 1978 1991 PPC | KVM_REG_PPC_SPMC2 | 32 ··· 2126 2137 value in the kvm_regs structure seen as a 32bit array. 2127 2138 0x60x0 0000 0010 <index into the kvm_regs struct:16> 2128 2139 2140 + Specifically: 2141 + Encoding Register Bits kvm_regs member 2142 + ---------------------------------------------------------------- 2143 + 0x6030 0000 0010 0000 X0 64 regs.regs[0] 2144 + 0x6030 0000 0010 0002 X1 64 regs.regs[1] 2145 + ... 
2146 + 0x6030 0000 0010 003c X30 64 regs.regs[30] 2147 + 0x6030 0000 0010 003e SP 64 regs.sp 2148 + 0x6030 0000 0010 0040 PC 64 regs.pc 2149 + 0x6030 0000 0010 0042 PSTATE 64 regs.pstate 2150 + 0x6030 0000 0010 0044 SP_EL1 64 sp_el1 2151 + 0x6030 0000 0010 0046 ELR_EL1 64 elr_el1 2152 + 0x6030 0000 0010 0048 SPSR_EL1 64 spsr[KVM_SPSR_EL1] (alias SPSR_SVC) 2153 + 0x6030 0000 0010 004a SPSR_ABT 64 spsr[KVM_SPSR_ABT] 2154 + 0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND] 2155 + 0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ] 2156 + 0x6060 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ] 2157 + 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] (*) 2158 + 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] (*) 2159 + ... 2160 + 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] (*) 2161 + 0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr 2162 + 0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr 2163 + 2164 + (*) These encodings are not accepted for SVE-enabled vcpus. See 2165 + KVM_ARM_VCPU_INIT. 2166 + 2167 + The equivalent register content can be accessed via bits [127:0] of 2168 + the corresponding SVE Zn registers instead for vcpus that have SVE 2169 + enabled (see below). 2170 + 2129 2171 arm64 CCSIDR registers are demultiplexed by CSSELR value: 2130 2172 0x6020 0000 0011 00 <csselr:8> 2131 2173 ··· 2165 2145 2166 2146 arm64 firmware pseudo-registers have the following bit pattern: 2167 2147 0x6030 0000 0014 <regno:16> 2148 + 2149 + arm64 SVE registers have the following bit patterns: 2150 + 0x6080 0000 0015 00 <n:5> <slice:5> Zn bits[2048*slice + 2047 : 2048*slice] 2151 + 0x6050 0000 0015 04 <n:4> <slice:5> Pn bits[256*slice + 255 : 256*slice] 2152 + 0x6050 0000 0015 060 <slice:5> FFR bits[256*slice + 255 : 256*slice] 2153 + 0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register 2154 + 2155 + Access to register IDs where 2048 * slice >= 128 * max_vq will fail with 2156 + ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit 2157 + quadwords: see (**) below. 
2158 + 2159 + These registers are only accessible on vcpus for which SVE is enabled. 2160 + See KVM_ARM_VCPU_INIT for details. 2161 + 2162 + In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not 2163 + accessible until the vcpu's SVE configuration has been finalized 2164 + using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT 2165 + and KVM_ARM_VCPU_FINALIZE for more information about this procedure. 2166 + 2167 + KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector 2168 + lengths supported by the vcpu to be discovered and configured by 2169 + userspace. When transferred to or from user memory via KVM_GET_ONE_REG 2170 + or KVM_SET_ONE_REG, the value of this register is of type 2171 + __u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as 2172 + follows: 2173 + 2174 + __u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS]; 2175 + 2176 + if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && 2177 + ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >> 2178 + ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1)) 2179 + /* Vector length vq * 16 bytes supported */ 2180 + else 2181 + /* Vector length vq * 16 bytes not supported */ 2182 + 2183 + (**) The maximum value vq for which the above condition is true is 2184 + max_vq. This is the maximum vector length available to the guest on 2185 + this vcpu, and determines which register slices are visible through 2186 + this ioctl interface. 2187 + 2188 + (See Documentation/arm64/sve.txt for an explanation of the "vq" 2189 + nomenclature.) 2190 + 2191 + KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT. 2192 + KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that 2193 + the host supports. 2194 + 2195 + Userspace may subsequently modify it if desired until the vcpu's SVE 2196 + configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). 
2197 + 2198 + Apart from simply removing all vector lengths from the host set that 2199 + exceed some value, support for arbitrarily chosen sets of vector lengths 2200 + is hardware-dependent and may not be available. Attempting to configure 2201 + an invalid set of vector lengths via KVM_SET_ONE_REG will fail with 2202 + EINVAL. 2203 + 2204 + After the vcpu's SVE configuration is finalized, further attempts to 2205 + write this register will fail with EPERM. 2168 2206 2169 2207 2170 2208 MIPS registers are mapped using the lower 32 bits. The upper 16 of that is ··· 2276 2198 Type: vcpu ioctl 2277 2199 Parameters: struct kvm_one_reg (in and out) 2278 2200 Returns: 0 on success, negative value on failure 2201 + Errors include: 2202 + ENOENT: no such register 2203 + EINVAL: invalid register ID, or no such register 2204 + EPERM: (arm64) register access not allowed before vcpu finalization 2205 + (These error codes are indicative only: do not rely on a specific error 2206 + code being returned in a specific situation.) 2279 2207 2280 2208 This ioctl allows to receive the value of a single register implemented 2281 2209 in a vcpu. The register to read is indicated by the "id" field of the ··· 2774 2690 - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. 2775 2691 Depends on KVM_CAP_ARM_PMU_V3. 2776 2692 2693 + - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication 2694 + for arm64 only. 2695 + Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS. 2696 + If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are 2697 + both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and 2698 + KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be 2699 + requested. 2700 + 2701 + - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication 2702 + for arm64 only. 2703 + Depends on KVM_CAP_ARM_PTRAUTH_GENERIC. 
2704 + If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are 2705 + both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and 2706 + KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be 2707 + requested. 2708 + 2709 + - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only). 2710 + Depends on KVM_CAP_ARM_SVE. 2711 + Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): 2712 + 2713 + * After KVM_ARM_VCPU_INIT: 2714 + 2715 + - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the 2716 + initial value of this pseudo-register indicates the best set of 2717 + vector lengths possible for a vcpu on this host. 2718 + 2719 + * Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): 2720 + 2721 + - KVM_RUN and KVM_GET_REG_LIST are not available; 2722 + 2723 + - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access 2724 + the scalable archietctural SVE registers 2725 + KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or 2726 + KVM_REG_ARM64_SVE_FFR; 2727 + 2728 + - KVM_REG_ARM64_SVE_VLS may optionally be written using 2729 + KVM_SET_ONE_REG, to modify the set of vector lengths available 2730 + for the vcpu. 2731 + 2732 + * After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): 2733 + 2734 + - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can 2735 + no longer be written using KVM_SET_ONE_REG. 2777 2736 2778 2737 4.83 KVM_ARM_PREFERRED_TARGET 2779 2738 ··· 3936 3809 3937 3810 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl) 3938 3811 3939 - Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 3812 + Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 3940 3813 Architectures: x86, arm, arm64, mips 3941 3814 Type: vm ioctl 3942 3815 Parameters: struct kvm_dirty_log (in) ··· 3969 3842 They must be less than the value that KVM_CHECK_EXTENSION returns for 3970 3843 the KVM_CAP_MULTI_ADDRESS_SPACE capability. 
3971 3844 3972 - This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 3845 + This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 3973 3846 is enabled; for more information, see the description of the capability. 3974 3847 However, it can always be used as long as KVM_CHECK_EXTENSION confirms 3975 - that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present. 3848 + that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present. 3976 3849 3977 3850 4.118 KVM_GET_SUPPORTED_HV_CPUID 3978 3851 ··· 4030 3903 4031 3904 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved, 4032 3905 userspace should not expect to get any particular value there. 3906 + 3907 + 4.119 KVM_ARM_VCPU_FINALIZE 3908 + 3909 + Architectures: arm, arm64 3910 + Type: vcpu ioctl 3911 + Parameters: int feature (in) 3912 + Returns: 0 on success, -1 on error 3913 + Errors: 3914 + EPERM: feature not enabled, needs configuration, or already finalized 3915 + EINVAL: feature unknown or not present 3916 + 3917 + Recognised values for feature: 3918 + arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE) 3919 + 3920 + Finalizes the configuration of the specified vcpu feature. 3921 + 3922 + The vcpu must already have been initialised, enabling the affected feature, by 3923 + means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in 3924 + features[]. 3925 + 3926 + For affected vcpu features, this is a mandatory step that must be performed 3927 + before the vcpu is fully usable. 3928 + 3929 + Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be 3930 + configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration 3931 + that should be performaned and how to do it are feature-dependent. 
3932 + 3933 + Other calls that depend on a particular feature being finalized, such as 3934 + KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with 3935 + -EPERM unless the feature has already been finalized by means of a 3936 + KVM_ARM_VCPU_FINALIZE call. 3937 + 3938 + See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization 3939 + using this ioctl. 4033 3940 4034 3941 5. The kvm_run structure 4035 3942 ------------------------ ··· 4666 4505 struct kvm_vcpu_events events; 4667 4506 }; 4668 4507 4508 + 6.75 KVM_CAP_PPC_IRQ_XIVE 4509 + 4510 + Architectures: ppc 4511 + Target: vcpu 4512 + Parameters: args[0] is the XIVE device fd 4513 + args[1] is the XIVE CPU number (server ID) for this vcpu 4514 + 4515 + This capability connects the vcpu to an in-kernel XIVE device. 4516 + 4669 4517 7. Capabilities that can be enabled on VMs 4670 4518 ------------------------------------------ 4671 4519 ··· 4968 4798 * For the new DR6 bits, note that bit 16 is set iff the #DB exception 4969 4799 will clear DR6.RTM. 4970 4800 4971 - 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 4801 + 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 4972 4802 4973 4803 Architectures: x86, arm, arm64, mips 4974 4804 Parameters: args[0] whether feature should be enabled or not ··· 4991 4821 helps reducing this time, improving guest performance and reducing the 4992 4822 number of dirty log false positives. 4993 4823 4824 + KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name 4825 + KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make 4826 + it hard or impossible to use it correctly. The availability of 4827 + KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed. 4828 + Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT. 4994 4829 4995 4830 8. Other capabilities. 4996 4831 ----------------------
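The KVM_REG_ARM64_SVE_VLS encoding described in the api.txt hunks above can be exercised with a small standalone helper. This is a sketch under stated assumptions: `VQ_MIN`, `VQ_MAX` and `VLS_WORDS` are local stand-ins for the uapi constants (the values 1, 512 and 8 are assumed here), and `vq_supported()`, `max_vq()` and `slice_visible()` are hypothetical names, not kernel or libc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the uapi constants; the values are assumptions. */
#define VQ_MIN 1
#define VQ_MAX 512
#define VLS_WORDS ((VQ_MAX - VQ_MIN) / 64 + 1)

/* Test whether vector length vq * 16 bytes is in the set. */
static bool vq_supported(const uint64_t vls[VLS_WORDS], unsigned int vq)
{
	if (vq < VQ_MIN || vq > VQ_MAX)
		return false;
	return (vls[(vq - VQ_MIN) / 64] >> ((vq - VQ_MIN) % 64)) & 1;
}

/* Largest vq present in the set, or 0 if the set is empty. */
static unsigned int max_vq(const uint64_t vls[VLS_WORDS])
{
	for (unsigned int vq = VQ_MAX; vq >= VQ_MIN; vq--)
		if (vq_supported(vls, vq))
			return vq;
	return 0;
}

/*
 * A Zn slice is visible only while 2048 * slice < 128 * max_vq;
 * the documentation states that other slices fail with ENOENT.
 */
static bool slice_visible(unsigned int slice, unsigned int mvq)
{
	assert(mvq <= VQ_MAX);
	return 2048u * slice < 128u * mvq;
}
```

With only vq 1, 2 and 4 present, max_vq is 4 and only slice 0 is visible, since 2048 * 1 already exceeds 128 * 4.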

+2 -1

Documentation/virtual/kvm/devices/vm.txt

···
141 141     u8 pcc[16]; # valid with Message-Security-Assist-Extension 4
142 142     u8 ppno[16]; # valid with Message-Security-Assist-Extension 5
143 143     u8 kma[16]; # valid with Message-Security-Assist-Extension 8
144 -     u8 reserved[1808]; # reserved for future instructions
144 +     u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9
145 +     u8 reserved[1792]; # reserved for future instructions
145 146 };
146 147
147 148 Parameters: address of a buffer to load the subfunction blocks from.

+197

Documentation/virtual/kvm/devices/xive.txt

··· 1 + POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1) 2 + ========================================================== 3 + 4 + Device types supported: 5 + KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 6 + 7 + This device acts as a VM interrupt controller. It provides the KVM 8 + interface to configure the interrupt sources of a VM in the underlying 9 + POWER9 XIVE interrupt controller. 10 + 11 + Only one XIVE instance may be instantiated. A guest XIVE device 12 + requires a POWER9 host and the guest OS should have support for the 13 + XIVE native exploitation interrupt mode. If not, it should run using 14 + the legacy interrupt mode, referred as XICS (POWER7/8). 15 + 16 + * Device Mappings 17 + 18 + The KVM device exposes different MMIO ranges of the XIVE HW which 19 + are required for interrupt management. These are exposed to the 20 + guest in VMAs populated with a custom VM fault handler. 21 + 22 + 1. Thread Interrupt Management Area (TIMA) 23 + 24 + Each thread has an associated Thread Interrupt Management context 25 + composed of a set of registers. These registers let the thread 26 + handle priority management and interrupt acknowledgment. The most 27 + important are : 28 + 29 + - Interrupt Pending Buffer (IPB) 30 + - Current Processor Priority (CPPR) 31 + - Notification Source Register (NSR) 32 + 33 + They are exposed to software in four different pages each proposing 34 + a view with a different privilege. The first page is for the 35 + physical thread context and the second for the hypervisor. Only the 36 + third (operating system) and the fourth (user level) are exposed the 37 + guest. 38 + 39 + 2. Event State Buffer (ESB) 40 + 41 + Each source is associated with an Event State Buffer (ESB) with 42 + either a pair of even/odd pair of pages which provides commands to 43 + manage the source: to trigger, to EOI, to turn off the source for 44 + instance. 45 + 46 + 3. 
Device pass-through 47 + 48 + When a device is passed-through into the guest, the source 49 + interrupts are from a different HW controller (PHB4) and the ESB 50 + pages exposed to the guest should accommadate this change. 51 + 52 + The passthru_irq helpers, kvmppc_xive_set_mapped() and 53 + kvmppc_xive_clr_mapped() are called when the device HW irqs are 54 + mapped into or unmapped from the guest IRQ number space. The KVM 55 + device extends these helpers to clear the ESB pages of the guest IRQ 56 + number being mapped and then lets the VM fault handler repopulate. 57 + The handler will insert the ESB page corresponding to the HW 58 + interrupt of the device being passed-through or the initial IPI ESB 59 + page if the device has being removed. 60 + 61 + The ESB remapping is fully transparent to the guest and the OS 62 + device driver. All handling is done within VFIO and the above 63 + helpers in KVM-PPC. 64 + 65 + * Groups: 66 + 67 + 1. KVM_DEV_XIVE_GRP_CTRL 68 + Provides global controls on the device 69 + Attributes: 70 + 1.1 KVM_DEV_XIVE_RESET (write only) 71 + Resets the interrupt controller configuration for sources and event 72 + queues. To be used by kexec and kdump. 73 + Errors: none 74 + 75 + 1.2 KVM_DEV_XIVE_EQ_SYNC (write only) 76 + Sync all the sources and queues and mark the EQ pages dirty. This 77 + to make sure that a consistent memory state is captured when 78 + migrating the VM. 79 + Errors: none 80 + 81 + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) 82 + Initializes a new source in the XIVE device and mask it. 83 + Attributes: 84 + Interrupt source number (64-bit) 85 + The kvm_device_attr.addr points to a __u64 value: 86 + bits: | 63 .... 2 | 1 | 0 87 + values: | unused | level | type 88 + - type: 0:MSI 1:LSI 89 + - level: assertion level in case of an LSI. 90 + Errors: 91 + -E2BIG: Interrupt source number is out of range 92 + -ENOMEM: Could not create a new source block 93 + -EFAULT: Invalid user pointer for attr->addr. 
94 + -ENXIO: Could not allocate underlying HW interrupt 95 + 96 + 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only) 97 + Configures source targeting 98 + Attributes: 99 + Interrupt source number (64-bit) 100 + The kvm_device_attr.addr points to a __u64 value: 101 + bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0 102 + values: | eisn | mask | server | priority 103 + - priority: 0-7 interrupt priority level 104 + - server: CPU number chosen to handle the interrupt 105 + - mask: mask flag (unused) 106 + - eisn: Effective Interrupt Source Number 107 + Errors: 108 + -ENOENT: Unknown source number 109 + -EINVAL: Not initialized source number 110 + -EINVAL: Invalid priority 111 + -EINVAL: Invalid CPU number. 112 + -EFAULT: Invalid user pointer for attr->addr. 113 + -ENXIO: CPU event queues not configured or configuration of the 114 + underlying HW interrupt failed 115 + -EBUSY: No CPU available to serve interrupt 116 + 117 + 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write) 118 + Configures an event queue of a CPU 119 + Attributes: 120 + EQ descriptor identifier (64-bit) 121 + The EQ descriptor identifier is a tuple (server, priority) : 122 + bits: | 63 .... 32 | 31 .. 3 | 2 .. 0 123 + values: | unused | server | priority 124 + The kvm_device_attr.addr points to : 125 + struct kvm_ppc_xive_eq { 126 + __u32 flags; 127 + __u32 qshift; 128 + __u64 qaddr; 129 + __u32 qtoggle; 130 + __u32 qindex; 131 + __u8 pad[40]; 132 + }; 133 + - flags: queue flags 134 + KVM_XIVE_EQ_ALWAYS_NOTIFY (required) 135 + forces notification without using the coalescing mechanism 136 + provided by the XIVE END ESBs. 
137 + - qshift: queue size (power of 2) 138 + - qaddr: real address of queue 139 + - qtoggle: current queue toggle bit 140 + - qindex: current queue index 141 + - pad: reserved for future use 142 + Errors: 143 + -ENOENT: Invalid CPU number 144 + -EINVAL: Invalid priority 145 + -EINVAL: Invalid flags 146 + -EINVAL: Invalid queue size 147 + -EINVAL: Invalid queue address 148 + -EFAULT: Invalid user pointer for attr->addr. 149 + -EIO: Configuration of the underlying HW failed 150 + 151 + 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only) 152 + Synchronize the source to flush event notifications 153 + Attributes: 154 + Interrupt source number (64-bit) 155 + Errors: 156 + -ENOENT: Unknown source number 157 + -EINVAL: Not initialized source number 158 + 159 + * VCPU state 160 + 161 + The XIVE IC maintains VP interrupt state in an internal structure 162 + called the NVT. When a VP is not dispatched on a HW processor 163 + thread, this structure can be updated by HW if the VP is the target 164 + of an event notification. 165 + 166 + It is important for migration to capture the cached IPB from the NVT 167 + as it synthesizes the priorities of the pending interrupts. We 168 + capture a bit more to report debug information. 169 + 170 + KVM_REG_PPC_VP_STATE (2 * 64bits) 171 + bits: | 63 .... 32 | 31 .... 0 | 172 + values: | TIMA word0 | TIMA word1 | 173 + bits: | 127 .......... 64 | 174 + values: | unused | 175 + 176 + * Migration: 177 + 178 + Saving the state of a VM using the XIVE native exploitation mode 179 + should follow a specific sequence. When the VM is stopped : 180 + 181 + 1. Mask all sources (PQ=01) to stop the flow of events. 182 + 183 + 2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to 184 + flush any in-flight event notification and to stabilize the EQs. At 185 + this stage, the EQ pages are marked dirty to make sure they are 186 + transferred in the migration sequence. 187 + 188 + 3. 
Capture the state of the source targeting, the EQs configuration 189 + and the state of thread interrupt context registers. 190 + 191 + Restore is similar : 192 + 193 + 1. Restore the EQ configuration. As targeting depends on it. 194 + 2. Restore targeting 195 + 3. Restore the thread interrupt contexts 196 + 4. Restore the source states 197 + 5. Let the vCPU run
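The 64-bit value passed for KVM_DEV_XIVE_GRP_SOURCE_CONFIG can be assembled from its documented fields as sketched below. The shift macro names and `xive_source_config()` are hypothetical and chosen only to mirror the bit layout in xive.txt; they are not taken from a kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the documented layout:
 * bits 2..0 priority, 31..3 server, 32 mask, 63..33 eisn. */
#define SRC_PRIORITY_SHIFT 0
#define SRC_SERVER_SHIFT   3
#define SRC_MASK_SHIFT     32
#define SRC_EISN_SHIFT     33

/* Pack the source targeting fields into the __u64 that attr->addr points to. */
static uint64_t xive_source_config(uint32_t priority, uint32_t server,
				   bool masked, uint32_t eisn)
{
	assert(priority <= 7);		/* 3-bit field */
	assert(server < (1u << 29));	/* 29-bit field */
	assert(eisn < (1u << 31));	/* 31-bit field */

	return ((uint64_t)priority << SRC_PRIORITY_SHIFT) |
	       ((uint64_t)server << SRC_SERVER_SHIFT) |
	       ((uint64_t)masked << SRC_MASK_SHIFT) |
	       ((uint64_t)eisn << SRC_EISN_SHIFT);
}
```

For example, priority 5 on server 2 with EISN 0x10 packs to 0x2000000015. The range checks are a design choice of this sketch; the documentation itself only lists the errors the kernel returns for invalid values.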

+2

arch/arm/include/asm/kvm_emulate.h

···
343 343     }
344 344 }
345 345
346 + static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
347 +
346 348 #endif /* __ARM_KVM_EMULATE_H__ */

+23 -3

arch/arm/include/asm/kvm_host.h

···
19 19 #ifndef __ARM_KVM_HOST_H__
20 20 #define __ARM_KVM_HOST_H__
21 21
22 + #include <linux/errno.h>
22 23 #include <linux/types.h>
23 24 #include <linux/kvm_types.h>
24 25 #include <asm/cputype.h>
···
53 52 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2)
54 53
55 54 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
55 +
56 + static inline int kvm_arm_init_sve(void) { return 0; }
56 57
57 58 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
58 59 int __attribute_const__ kvm_target_cpu(void);
···
153 150     u32 cp15[NR_CP15_REGS];
154 151 };
155 152
156 - typedef struct kvm_cpu_context kvm_cpu_context_t;
153 + struct kvm_host_data {
154 +     struct kvm_cpu_context host_ctxt;
155 + };
157 156
158 - static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
157 + typedef struct kvm_host_data kvm_host_data_t;
158 +
159 + static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt,
159 160                                              int cpu)
160 161 {
161 162     /* The host's MPIDR is immutable, so let's set it up at boot time */
···
189 182     struct kvm_vcpu_fault_info fault;
190 183
191 184     /* Host FP context */
192 -     kvm_cpu_context_t *host_cpu_context;
185 +     struct kvm_cpu_context *host_cpu_context;
193 186
194 187     /* VGIC state */
195 188     struct vgic_cpu vgic_cpu;
···
368 361 static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
369 362 static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
370 363
364 + static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
365 + static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
366 +
371 367 static inline void kvm_arm_vhe_guest_enter(void) {}
372 368 static inline void kvm_arm_vhe_guest_exit(void) {}
373 369
···
417 407     if (type)
418 408         return -EINVAL;
419 409     return 0;
410 + }
411 +
412 + static inline int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
413 + {
414 +     return -EINVAL;
415 + }
416 +
417 + static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
418 + {
419 +     return true;
420 420 }
421 421
422 422 #endif /* __ARM_KVM_HOST_H__ */

+4 -2

arch/arm64/Kconfig

···
1341 1341 config ARM64_PTR_AUTH
1342 1342     bool "Enable support for pointer authentication"
1343 1343     default y
1344 +     depends on !KVM || ARM64_VHE
1344 1345     help
1345 1346       Pointer authentication (part of the ARMv8.3 Extensions) provides
1346 1347       instructions for signing and authenticating pointers against secret
···
1355 1354       context-switched along with the process.
1356 1355
1357 1356       The feature is detected at runtime. If the feature is not present in
1358 -       hardware it will not be advertised to userspace nor will it be
1359 -       enabled.
1357 +       hardware it will not be advertised to userspace/KVM guest nor will it
1358 +       be enabled. However, KVM guest also require VHE mode and hence
1359 +       CONFIG_ARM64_VHE=y option to use this feature.
1360 1360
1361 1361 endmenu
1362 1362

+28 -1

arch/arm64/include/asm/fpsimd.h

···
24 24
25 25 #ifndef __ASSEMBLY__
26 26
27 + #include <linux/bitmap.h>
27 28 #include <linux/build_bug.h>
29 + #include <linux/bug.h>
28 30 #include <linux/cache.h>
29 31 #include <linux/init.h>
30 32 #include <linux/stddef.h>
33 + #include <linux/types.h>
31 34
32 35 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
33 36 /* Masks for extracting the FPSR and FPCR from the FPSCR */
···
59 56 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
60 57
61 58 extern void fpsimd_bind_task_to_cpu(void);
62 - extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
59 + extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
60 +                                      void *sve_state, unsigned int sve_vl);
63 61
64 62 extern void fpsimd_flush_task_state(struct task_struct *target);
65 63 extern void fpsimd_flush_cpu_state(void);
···
91 87 extern u64 read_zcr_features(void);
92 88
93 89 extern int __ro_after_init sve_max_vl;
90 + extern int __ro_after_init sve_max_virtualisable_vl;
91 + extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
92 +
93 + /*
94 +  * Helpers to translate bit indices in sve_vq_map to VQ values (and
95 +  * vice versa). This allows find_next_bit() to be used to find the
96 +  * _maximum_ VQ not exceeding a certain value.
97 +  */
98 + static inline unsigned int __vq_to_bit(unsigned int vq)
99 + {
100 +     return SVE_VQ_MAX - vq;
101 + }
102 +
103 + static inline unsigned int __bit_to_vq(unsigned int bit)
104 + {
105 +     return SVE_VQ_MAX - bit;
106 + }
107 +
108 + /* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
109 + static inline bool sve_vq_available(unsigned int vq)
110 + {
111 +     return test_bit(__vq_to_bit(vq), sve_vq_map);
112 + }
94 113
95 114 #ifdef CONFIG_ARM64_SVE
96 115
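The comment on `__vq_to_bit()` and `__bit_to_vq()` says the reversed index lets a forward bit search find the maximum VQ not exceeding a limit. The userspace sketch below illustrates that idea with a plain array and a linear scan standing in for the kernel's find_next_bit(). `VQ_MIN`, `VQ_MAX` (assumed to be 1 and 512) and all helper names are local to the example, not kernel definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; the values are assumptions for this sketch. */
#define VQ_MIN 1
#define VQ_MAX 512
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((VQ_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Reversed mapping: the largest VQ gets the lowest bit index. */
static unsigned int vq_to_bit(unsigned int vq) { return VQ_MAX - vq; }
static unsigned int bit_to_vq(unsigned int bit) { return VQ_MAX - bit; }

static void map_set_vq(unsigned long *map, unsigned int vq)
{
	unsigned int bit = vq_to_bit(vq);

	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static bool map_test_bit(const unsigned long *map, unsigned int bit)
{
	return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1;
}

/*
 * Largest available VQ that does not exceed limit, or 0 if none.
 * Scanning upwards from vq_to_bit(limit) visits VQs in decreasing
 * order, so the first set bit found is the answer.
 */
static unsigned int max_vq_not_exceeding(const unsigned long *map,
					 unsigned int limit)
{
	assert(limit >= VQ_MIN && limit <= VQ_MAX);

	for (unsigned int bit = vq_to_bit(limit); bit < VQ_MAX; bit++)
		if (map_test_bit(map, bit))
			return bit_to_vq(bit);
	return 0;
}
```

With VQs 1, 2 and 4 marked available, a limit of 3 yields 2 and any limit of 4 or more yields 4.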

+2 -1

arch/arm64/include/asm/kvm_asm.h

···
 .endm

 .macro get_host_ctxt reg, tmp
-	hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp
+	hyp_adr_this_cpu \reg, kvm_host_data, \tmp
+	add	\reg, \reg, #HOST_DATA_CONTEXT
 .endm

 .macro get_vcpu_ptr vcpu, ctxt

+16

arch/arm64/include/asm/kvm_emulate.h

···
 	vcpu->arch.hcr_el2 |= HCR_TWE;
 }

+static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
+}
+
+static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
+}
+
+static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu)
+{
+	if (vcpu_has_ptrauth(vcpu))
+		vcpu_ptrauth_disable(vcpu);
+}
+
 static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.vsesr_el2;
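
Note on the lazy ptrauth helpers in the hunk above: `vcpu_ptrauth_setup_lazy()` clears `HCR_API | HCR_APK`, and the key-switch macros added in kvm_ptrauth.h skip the save/restore while those bits are clear. The sketch below models that state machine in plain user-space C. It is an illustration under assumptions, not the kernel implementation: the bit positions of the two flags, the `demo_vcpu` structure and the behaviour of `demo_ptrauth_trap()` (enable on first guest use, otherwise report that the access is not allowed) are all hypothetical stand-ins, since the real trap handler is not part of this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for HCR_EL2.APK and HCR_EL2.API (illustrative). */
#define DEMO_HCR_APK (UINT64_C(1) << 40)
#define DEMO_HCR_API (UINT64_C(1) << 41)

/* Hypothetical, heavily reduced stand-in for struct kvm_vcpu. */
struct demo_vcpu {
	uint64_t hcr_el2;
	bool has_ptrauth;	/* models vcpu_has_ptrauth() */
};

static void demo_ptrauth_enable(struct demo_vcpu *vcpu)
{
	vcpu->hcr_el2 |= (DEMO_HCR_API | DEMO_HCR_APK);
}

static void demo_ptrauth_disable(struct demo_vcpu *vcpu)
{
	vcpu->hcr_el2 &= ~(DEMO_HCR_API | DEMO_HCR_APK);
}

/* On vcpu load: start with trapping on, so keys are not switched yet. */
static void demo_ptrauth_setup_lazy(struct demo_vcpu *vcpu)
{
	if (vcpu->has_ptrauth)
		demo_ptrauth_disable(vcpu);
}

/*
 * Assumed trap path: the first guest ptrauth use traps to the host, which
 * stops trapping if the feature was exposed. Returns false when the guest
 * has no ptrauth and the access would have to be rejected instead.
 */
static bool demo_ptrauth_trap(struct demo_vcpu *vcpu)
{
	if (!vcpu->has_ptrauth)
		return false;

	demo_ptrauth_enable(vcpu);
	return true;
}

/* Mirrors the "and ... #(HCR_API | HCR_APK); cbz" test in the asm macros. */
static bool demo_keys_are_switched(const struct demo_vcpu *vcpu)
{
	return (vcpu->hcr_el2 & (DEMO_HCR_API | DEMO_HCR_APK)) != 0;
}
```

The point of the lazy scheme is that a guest which never executes a ptrauth instruction never pays for the key save/restore on world switch.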

+93 -8

arch/arm64/include/asm/kvm_host.h

···
 #ifndef __ARM64_KVM_HOST_H__
 #define __ARM64_KVM_HOST_H__

+#include <linux/bitmap.h>
 #include <linux/types.h>
+#include <linux/jump_label.h>
 #include <linux/kvm_types.h>
+#include <linux/percpu.h>
 #include <asm/arch_gicv3.h>
+#include <asm/barrier.h>
 #include <asm/cpufeature.h>
 #include <asm/daifflags.h>
 #include <asm/fpsimd.h>
···

 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS

-#define KVM_VCPU_MAX_FEATURES 4
+#define KVM_VCPU_MAX_FEATURES 7

 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
···

 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);

+extern unsigned int kvm_sve_max_vl;
+int kvm_arm_init_sve(void);
+
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);

···
 	SCTLR_EL1,	/* System Control Register */
 	ACTLR_EL1,	/* Auxiliary Control Register */
 	CPACR_EL1,	/* Coprocessor Access Control */
+	ZCR_EL1,	/* SVE Control */
 	TTBR0_EL1,	/* Translation Table Base Register 0 */
 	TTBR1_EL1,	/* Translation Table Base Register 1 */
 	TCR_EL1,	/* Translation Control Register */
···
 	PMOVSSET_EL0,	/* Overflow Flag Status Set Register */
 	PMSWINC_EL0,	/* Software Increment Register */
 	PMUSERENR_EL0,	/* User Enable Register */
+
+	/* Pointer Authentication Registers in a strict increasing order. */
+	APIAKEYLO_EL1,
+	APIAKEYHI_EL1,
+	APIBKEYLO_EL1,
+	APIBKEYHI_EL1,
+	APDAKEYLO_EL1,
+	APDAKEYHI_EL1,
+	APDBKEYLO_EL1,
+	APDBKEYHI_EL1,
+	APGAKEYLO_EL1,
+	APGAKEYHI_EL1,

 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
···
 	struct kvm_vcpu *__hyp_running_vcpu;
 };

-typedef struct kvm_cpu_context kvm_cpu_context_t;
+struct kvm_pmu_events {
+	u32 events_host;
+	u32 events_guest;
+};
+
+struct kvm_host_data {
+	struct kvm_cpu_context host_ctxt;
+	struct kvm_pmu_events pmu_events;
+};
+
+typedef struct kvm_host_data kvm_host_data_t;

 struct vcpu_reset_state {
 	unsigned long pc;
···

 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
+	void *sve_state;
+	unsigned int sve_max_vl;

 	/* HYP configuration */
 	u64 hcr_el2;
···
 	struct kvm_guest_debug_arch external_debug_state;

 	/* Pointer to host CPU context */
-	kvm_cpu_context_t *host_cpu_context;
+	struct kvm_cpu_context *host_cpu_context;

 	struct thread_info *host_thread_info;	/* hyp VA */
 	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
···
 	bool sysregs_loaded_on_cpu;
 };

+/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
+#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
+				      sve_ffr_offset((vcpu)->arch.sve_max_vl)))
+
+#define vcpu_sve_state_size(vcpu) ({ \
+	size_t __size_ret; \
+	unsigned int __vcpu_vq; \
+	\
+	if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \
+		__size_ret = 0; \
+	} else { \
+		__vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl); \
+		__size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \
+	} \
+	\
+	__size_ret; \
+})
+
 /* vcpu_arch flags field values: */
 #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
 #define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
 #define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
 #define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
 #define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
+#define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
+#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
+#define KVM_ARM64_GUEST_HAS_PTRAUTH	(1 << 7) /* PTRAUTH exposed to guest */
+
+#define vcpu_has_sve(vcpu) (system_supports_sve() && \
+			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
+
+#define vcpu_has_ptrauth(vcpu)	((system_supports_address_auth() || \
+				  system_supports_generic_auth()) && \
+				 ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_PTRAUTH))

 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)

···

 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);

-DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
+DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);

-static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
+static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt,
 					     int cpu)
 {
 	/* The host's MPIDR is immutable, so let's set it up at boot time */
···
 	 * kernel's mapping to the linear mapping, and store it in tpidr_el2
 	 * so that we can use adr_l to access per-cpu variables in EL2.
 	 */
-	u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_cpu_state) -
-			 (u64)kvm_ksym_ref(kvm_host_cpu_state));
+	u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_data) -
+			 (u64)kvm_ksym_ref(kvm_host_data));

 	/*
 	 * Call initialization code, and switch to the full blown HYP code.
···
 	return false;
 }

+void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
+
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
-static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

···
 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);

+static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
+{
+	return (!has_vhe() && attr->exclude_host);
+}
+
 #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
 static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_vcpu_run_map_fp(vcpu);
 }
+
+void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
+void kvm_clr_pmu_events(u32 clr);
+
+void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt);
+bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt);
+
+void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
+void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
+#else
+static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
+static inline void kvm_clr_pmu_events(u32 clr) {}
 #endif

 static inline void kvm_arm_vhe_guest_enter(void)
···
 void kvm_arch_free_vm(struct kvm *kvm);

 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
+
+int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
+bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
+
+#define kvm_arm_vcpu_sve_finalized(vcpu) \
+	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)

 #endif /* __ARM64_KVM_HOST_H__ */

-1

arch/arm64/include/asm/kvm_hyp.h

···

 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
-bool __fpsimd_enabled(void);

 void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
 void deactivate_traps_vhe_put(void);

+111

arch/arm64/include/asm/kvm_ptrauth.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+/* arch/arm64/include/asm/kvm_ptrauth.h: Guest/host ptrauth save/restore
+ * Copyright 2019 Arm Limited
+ * Authors: Mark Rutland <mark.rutland@arm.com>
+ *         Amit Daniel Kachhap <amit.kachhap@arm.com>
+ */
+
+#ifndef __ASM_KVM_PTRAUTH_H
+#define __ASM_KVM_PTRAUTH_H
+
+#ifdef __ASSEMBLY__
+
+#include <asm/sysreg.h>
+
+#ifdef CONFIG_ARM64_PTR_AUTH
+
+#define PTRAUTH_REG_OFFSET(x)	(x - CPU_APIAKEYLO_EL1)
+
+/*
+ * CPU_AP*_EL1 values exceed immediate offset range (512) for stp
+ * instruction so below macros takes CPU_APIAKEYLO_EL1 as base and
+ * calculates the offset of the keys from this base to avoid an extra add
+ * instruction. These macros assumes the keys offsets follow the order of
+ * the sysreg enum in kvm_host.h.
+ */
+.macro	ptrauth_save_state base, reg1, reg2
+	mrs_s	\reg1, SYS_APIAKEYLO_EL1
+	mrs_s	\reg2, SYS_APIAKEYHI_EL1
+	stp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+	mrs_s	\reg1, SYS_APIBKEYLO_EL1
+	mrs_s	\reg2, SYS_APIBKEYHI_EL1
+	stp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+	mrs_s	\reg1, SYS_APDAKEYLO_EL1
+	mrs_s	\reg2, SYS_APDAKEYHI_EL1
+	stp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
+	mrs_s	\reg1, SYS_APDBKEYLO_EL1
+	mrs_s	\reg2, SYS_APDBKEYHI_EL1
+	stp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
+	mrs_s	\reg1, SYS_APGAKEYLO_EL1
+	mrs_s	\reg2, SYS_APGAKEYHI_EL1
+	stp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
+.endm
+
+.macro	ptrauth_restore_state base, reg1, reg2
+	ldp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+	msr_s	SYS_APIAKEYLO_EL1, \reg1
+	msr_s	SYS_APIAKEYHI_EL1, \reg2
+	ldp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+	msr_s	SYS_APIBKEYLO_EL1, \reg1
+	msr_s	SYS_APIBKEYHI_EL1, \reg2
+	ldp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
+	msr_s	SYS_APDAKEYLO_EL1, \reg1
+	msr_s	SYS_APDAKEYHI_EL1, \reg2
+	ldp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
+	msr_s	SYS_APDBKEYLO_EL1, \reg1
+	msr_s	SYS_APDBKEYHI_EL1, \reg2
+	ldp	\reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
+	msr_s	SYS_APGAKEYLO_EL1, \reg1
+	msr_s	SYS_APGAKEYHI_EL1, \reg2
+.endm
+
+/*
+ * Both ptrauth_switch_to_guest and ptrauth_switch_to_host macros will
+ * check for the presence of one of the cpufeature flag
+ * ARM64_HAS_ADDRESS_AUTH_ARCH or ARM64_HAS_ADDRESS_AUTH_IMP_DEF and
+ * then proceed ahead with the save/restore of Pointer Authentication
+ * key registers.
+ */
+.macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
+alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
+	b	1000f
+alternative_else_nop_endif
+alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
+	b	1001f
+alternative_else_nop_endif
+1000:
+	ldr	\reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
+	and	\reg1, \reg1, #(HCR_API | HCR_APK)
+	cbz	\reg1, 1001f
+	add	\reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
+	ptrauth_restore_state	\reg1, \reg2, \reg3
+1001:
+.endm
+
+.macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
+alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
+	b	2000f
+alternative_else_nop_endif
+alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
+	b	2001f
+alternative_else_nop_endif
+2000:
+	ldr	\reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
+	and	\reg1, \reg1, #(HCR_API | HCR_APK)
+	cbz	\reg1, 2001f
+	add	\reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
+	ptrauth_save_state	\reg1, \reg2, \reg3
+	add	\reg1, \h_ctxt, #CPU_APIAKEYLO_EL1
+	ptrauth_restore_state	\reg1, \reg2, \reg3
+	isb
+2001:
+.endm
+
+#else /* !CONFIG_ARM64_PTR_AUTH */
+.macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
+.endm
+.macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
+.endm
+#endif /* CONFIG_ARM64_PTR_AUTH */
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_KVM_PTRAUTH_H */

+3

arch/arm64/include/asm/sysreg.h

···
 #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)

+/* VHE encodings for architectural EL0/1 system registers */
+#define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(_BITUL(44))
 #define SCTLR_ELx_ENIA	(_BITUL(31))

+43

arch/arm64/include/uapi/asm/kvm.h

···
 #include <linux/psci.h>
 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/sve_context.h>

 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
···
 #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
 #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
+#define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
+#define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */

 struct kvm_vcpu_init {
 	__u32 target;
···
 #define KVM_REG_ARM_FW_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
 					 KVM_REG_ARM_FW | ((r) & 0xffff))
 #define KVM_REG_ARM_PSCI_VERSION	KVM_REG_ARM_FW_REG(0)
+
+/* SVE registers */
+#define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
+
+/* Z- and P-regs occupy blocks at the following offsets within this range: */
+#define KVM_REG_ARM64_SVE_ZREG_BASE	0
+#define KVM_REG_ARM64_SVE_PREG_BASE	0x400
+#define KVM_REG_ARM64_SVE_FFR_BASE	0x600
+
+#define KVM_ARM64_SVE_NUM_ZREGS		__SVE_NUM_ZREGS
+#define KVM_ARM64_SVE_NUM_PREGS		__SVE_NUM_PREGS
+
+#define KVM_ARM64_SVE_MAX_SLICES	32
+
+#define KVM_REG_ARM64_SVE_ZREG(n, i) \
+	(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_ZREG_BASE | \
+	 KVM_REG_SIZE_U2048 | \
+	 (((n) & (KVM_ARM64_SVE_NUM_ZREGS - 1)) << 5) | \
+	 ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
+
+#define KVM_REG_ARM64_SVE_PREG(n, i) \
+	(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_PREG_BASE | \
+	 KVM_REG_SIZE_U256 | \
+	 (((n) & (KVM_ARM64_SVE_NUM_PREGS - 1)) << 5) | \
+	 ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
+
+#define KVM_REG_ARM64_SVE_FFR(i) \
+	(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_FFR_BASE | \
+	 KVM_REG_SIZE_U256 | \
+	 ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
+
+#define KVM_ARM64_SVE_VQ_MIN __SVE_VQ_MIN
+#define KVM_ARM64_SVE_VQ_MAX __SVE_VQ_MAX
+
+/* Vector lengths pseudo-register: */
+#define KVM_REG_ARM64_SVE_VLS		(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
+					 KVM_REG_SIZE_U512 | 0xffff)
+#define KVM_ARM64_SVE_VLS_WORDS \
+	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)

 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
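
Note on the vector lengths pseudo-register in the hunk above: `KVM_ARM64_SVE_VLS_WORDS` sizes an array of 64-bit words with one bit per VQ, which is why the register is declared as `KVM_REG_SIZE_U512`. The sketch below works through that arithmetic in user-space C. It rests on assumptions that the diff itself does not state: that `__SVE_VQ_MIN` is 1 and `__SVE_VQ_MAX` is 512, and that VQ `vq` is represented by bit `vq - VQ_MIN` of the array. All `DEMO_*` and `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SVE_VQ_MIN 1u	/* assumed value of __SVE_VQ_MIN */
#define DEMO_SVE_VQ_MAX 512u	/* assumed value of __SVE_VQ_MAX */

/* Same formula as KVM_ARM64_SVE_VLS_WORDS: (512 - 1) / 64 + 1 == 8 words. */
#define DEMO_SVE_VLS_WORDS \
	((DEMO_SVE_VQ_MAX - DEMO_SVE_VQ_MIN) / 64 + 1)

/* Mark vq as enabled, assuming bit (vq - VQ_MIN) encodes it. */
static void demo_vls_set(uint64_t vls[DEMO_SVE_VLS_WORDS], unsigned int vq)
{
	unsigned int bit;

	assert(vq >= DEMO_SVE_VQ_MIN && vq <= DEMO_SVE_VQ_MAX);

	bit = vq - DEMO_SVE_VQ_MIN;
	vls[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Query whether vq is enabled under the same assumed layout. */
static bool demo_vls_test(const uint64_t vls[DEMO_SVE_VLS_WORDS],
			  unsigned int vq)
{
	unsigned int bit;

	assert(vq >= DEMO_SVE_VQ_MIN && vq <= DEMO_SVE_VQ_MAX);

	bit = vq - DEMO_SVE_VQ_MIN;
	return (vls[bit / 64] >> (bit % 64)) & 1;
}
```

Under these assumptions the array holds 8 words of 64 bits, that is 512 bits, so every VQ from 1 to 512 has its own bit and the total matches the 512-bit register size.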

+7

arch/arm64/kernel/asm-offsets.c

···
   DEFINE(VCPU_CONTEXT,		offsetof(struct kvm_vcpu, arch.ctxt));
   DEFINE(VCPU_FAULT_DISR,	offsetof(struct kvm_vcpu, arch.fault.disr_el1));
   DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
+  DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
+  DEFINE(CPU_APIAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
+  DEFINE(CPU_APIBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
+  DEFINE(CPU_APDAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
+  DEFINE(CPU_APDBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDBKEYLO_EL1]));
+  DEFINE(CPU_APGAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
   DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
+  DEFINE(HOST_DATA_CONTEXT,	offsetof(struct kvm_host_data, host_ctxt));
 #endif
 #ifdef CONFIG_CPU_PM
   DEFINE(CPU_CTX_SP,		offsetof(struct cpu_suspend_ctx, sp));

+1 -1

arch/arm64/kernel/cpufeature.c

···
 	unsigned int len = zcr & ZCR_ELx_LEN_MASK;

 	if (len < safe_len || sve_verify_vq_map()) {
-		pr_crit("CPU%d: SVE: required vector length(s) missing\n",
+		pr_crit("CPU%d: SVE: vector length support mismatch\n",
 			smp_processor_id());
 		cpu_die_early();
 	}

+126 -53

arch/arm64/kernel/fpsimd.c

···
  */

 #include <linux/bitmap.h>
+#include <linux/bitops.h>
 #include <linux/bottom_half.h>
 #include <linux/bug.h>
 #include <linux/cache.h>
···
 #include <asm/sigcontext.h>
 #include <asm/sysreg.h>
 #include <asm/traps.h>
+#include <asm/virt.h>

 #define FPEXC_IOF	(1 << 0)
 #define FPEXC_DZF	(1 << 1)
···
  */
 struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
+	void *sve_state;
+	unsigned int sve_vl;
 };

 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
···

 /* Maximum supported vector length across all CPUs (initially poisoned) */
 int __ro_after_init sve_max_vl = SVE_VL_MIN;
-/* Set of available vector lengths, as vq_to_bit(vq): */
-static __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
+
+/*
+ * Set of available vector lengths,
+ * where length vq encoded as bit __vq_to_bit(vq):
+ */
+__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+/* Set of vector lengths present on at least one cpu: */
+static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
+
 static void __percpu *efi_sve_state;

 #else /* ! CONFIG_ARM64_SVE */

 /* Dummy declaration for code that will be optimised out: */
 extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
 extern void __percpu *efi_sve_state;

 #endif /* ! CONFIG_ARM64_SVE */
···
  */
 void fpsimd_save(void)
 {
-	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
+	struct fpsimd_last_state_struct const *last =
+		this_cpu_ptr(&fpsimd_last_state);
 	/* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */

 	WARN_ON(!in_softirq() && !irqs_disabled());

 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
-			if (WARN_ON(sve_get_vl() != current->thread.sve_vl)) {
+			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
 				/*
 				 * Can't save the user regs, so current would
 				 * re-enter user with corrupt state.
···
 				return;
 			}

-			sve_save_state(sve_pffr(&current->thread), &st->fpsr);
+			sve_save_state((char *)last->sve_state +
+						sve_ffr_offset(last->sve_vl),
+				       &last->st->fpsr);
 		} else
-			fpsimd_save_state(st);
+			fpsimd_save_state(last->st);
 	}
-}
-
-/*
- * Helpers to translate bit indices in sve_vq_map to VQ values (and
- * vice versa). This allows find_next_bit() to be used to find the
- * _maximum_ VQ not exceeding a certain value.
- */
-
-static unsigned int vq_to_bit(unsigned int vq)
-{
-	return SVE_VQ_MAX - vq;
-}
-
-static unsigned int bit_to_vq(unsigned int bit)
-{
-	if (WARN_ON(bit >= SVE_VQ_MAX))
-		bit = SVE_VQ_MAX - 1;
-
-	return SVE_VQ_MAX - bit;
 }

 /*
···
 		vl = max_vl;

 	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
-			    vq_to_bit(sve_vq_from_vl(vl)));
-	return sve_vl_from_vq(bit_to_vq(bit));
+			    __vq_to_bit(sve_vq_from_vl(vl)));
+	return sve_vl_from_vq(__bit_to_vq(bit));
 }

 #ifdef CONFIG_SYSCTL
···
 		local_bh_disable();

 		fpsimd_save();
-		set_thread_flag(TIF_FOREIGN_FPSTATE);
 	}

 	fpsimd_flush_task_state(task);
···
 	return sve_prctl_status(0);
 }

-/*
- * Bitmap for temporary storage of the per-CPU set of supported vector lengths
- * during secondary boot.
- */
-static DECLARE_BITMAP(sve_secondary_vq_map, SVE_VQ_MAX);
-
 static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
 {
 	unsigned int vq, vl;
···
 		write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */
 		vl = sve_get_vl();
 		vq = sve_vq_from_vl(vl); /* skip intervening lengths */
-		set_bit(vq_to_bit(vq), map);
+		set_bit(__vq_to_bit(vq), map);
 	}
 }

+/*
+ * Initialise the set of known supported VQs for the boot CPU.
+ * This is called during kernel boot, before secondary CPUs are brought up.
+ */
 void __init sve_init_vq_map(void)
 {
 	sve_probe_vqs(sve_vq_map);
+	bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX);
 }

 /*
  * If we haven't committed to the set of supported VQs yet, filter out
  * those not supported by the current CPU.
+ * This function is called during the bring-up of early secondary CPUs only.
  */
 void sve_update_vq_map(void)
 {
-	sve_probe_vqs(sve_secondary_vq_map);
-	bitmap_and(sve_vq_map, sve_vq_map, sve_secondary_vq_map, SVE_VQ_MAX);
+	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
+
+	sve_probe_vqs(tmp_map);
+	bitmap_and(sve_vq_map, sve_vq_map, tmp_map, SVE_VQ_MAX);
+	bitmap_or(sve_vq_partial_map, sve_vq_partial_map, tmp_map, SVE_VQ_MAX);
 }

-/* Check whether the current CPU supports all VQs in the committed set */
+/*
+ * Check whether the current CPU supports all VQs in the committed set.
+ * This function is called during the bring-up of late secondary CPUs only.
+ */
 int sve_verify_vq_map(void)
 {
-	int ret = 0;
+	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
+	unsigned long b;

-	sve_probe_vqs(sve_secondary_vq_map);
-	bitmap_andnot(sve_secondary_vq_map, sve_vq_map, sve_secondary_vq_map,
-		      SVE_VQ_MAX);
-	if (!bitmap_empty(sve_secondary_vq_map, SVE_VQ_MAX)) {
+	sve_probe_vqs(tmp_map);
+
+	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
+	if (bitmap_intersects(tmp_map, sve_vq_map, SVE_VQ_MAX)) {
 		pr_warn("SVE: cpu%d: Required vector length(s) missing\n",
 			smp_processor_id());
-		ret = -EINVAL;
+		return -EINVAL;
 	}

-	return ret;
+	if (!IS_ENABLED(CONFIG_KVM) || !is_hyp_mode_available())
+		return 0;
+
+	/*
+	 * For KVM, it is necessary to ensure that this CPU doesn't
+	 * support any vector length that guests may have probed as
+	 * unsupported.
+	 */
+
+	/* Recover the set of supported VQs: */
+	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
+	/* Find VQs supported that are not globally supported: */
+	bitmap_andnot(tmp_map, tmp_map, sve_vq_map, SVE_VQ_MAX);
+
+	/* Find the lowest such VQ, if any: */
+	b = find_last_bit(tmp_map, SVE_VQ_MAX);
+	if (b >= SVE_VQ_MAX)
+		return 0; /* no mismatches */
+
+	/*
+	 * Mismatches above sve_max_virtualisable_vl are fine, since
+	 * no guest is allowed to configure ZCR_EL2.LEN to exceed this:
+	 */
+	if (sve_vl_from_vq(__bit_to_vq(b)) <= sve_max_virtualisable_vl) {
+		pr_warn("SVE: cpu%d: Unsupported vector length(s) present\n",
+			smp_processor_id());
+		return -EINVAL;
+	}
+
+	return 0;
 }

 static void __init sve_efi_setup(void)
···
 void __init sve_setup(void)
 {
 	u64 zcr;
+	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
+	unsigned long b;

 	if (!system_supports_sve())
 		return;
···
 	 * so sve_vq_map must have at least SVE_VQ_MIN set.
 	 * If something went wrong, at least try to patch it up:
 	 */
-	if (WARN_ON(!test_bit(vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
-		set_bit(vq_to_bit(SVE_VQ_MIN), sve_vq_map);
+	if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
+		set_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map);

 	zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1);
 	sve_max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
···
 	 */
 	sve_default_vl = find_supported_vector_length(64);

+	bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
+		      SVE_VQ_MAX);
+
+	b = find_last_bit(tmp_map, SVE_VQ_MAX);
+	if (b >= SVE_VQ_MAX)
+		/* No non-virtualisable VLs found */
+		sve_max_virtualisable_vl = SVE_VQ_MAX;
+	else if (WARN_ON(b == SVE_VQ_MAX - 1))
+		/* No virtualisable VLs? This is architecturally forbidden. */
+		sve_max_virtualisable_vl = SVE_VQ_MIN;
+	else /* b + 1 < SVE_VQ_MAX */
+		sve_max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
+
+	if (sve_max_virtualisable_vl > sve_max_vl)
+		sve_max_virtualisable_vl = sve_max_vl;
+
 	pr_info("SVE: maximum available vector length %u bytes per vector\n",
 		sve_max_vl);
 	pr_info("SVE: default vector length %u bytes per vector\n",
 		sve_default_vl);
+
+	/* KVM decides whether to support mismatched systems. Just warn here: */
+	if (sve_max_virtualisable_vl < sve_max_vl)
+		pr_warn("SVE: unvirtualisable vector lengths present\n");

 	sve_efi_setup();
 }
···
 	local_bh_disable();

 	fpsimd_save();
-	fpsimd_to_sve(current);

 	/* Force ret_to_user to reload the registers: */
 	fpsimd_flush_task_state(current);
-	set_thread_flag(TIF_FOREIGN_FPSTATE);

+	fpsimd_to_sve(current);
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */

···

 	local_bh_disable();

+	fpsimd_flush_task_state(current);
 	memset(&current->thread.uw.fpsimd_state, 0,
 	       sizeof(current->thread.uw.fpsimd_state));
-	fpsimd_flush_task_state(current);

 	if (system_supports_sve()) {
 		clear_thread_flag(TIF_SVE);
···
 		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
 			current->thread.sve_vl_onexec = 0;
 	}
-
-	set_thread_flag(TIF_FOREIGN_FPSTATE);

 	local_bh_enable();
 }
···
 		this_cpu_ptr(&fpsimd_last_state);

 	last->st = &current->thread.uw.fpsimd_state;
+	last->sve_state = current->thread.sve_state;
+	last->sve_vl = current->thread.sve_vl;
 	current->thread.fpsimd_cpu = smp_processor_id();

 	if (system_supports_sve()) {
···
 	}
 }

-void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
+void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
+			      unsigned int sve_vl)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
···
 	WARN_ON(!in_softirq() && !irqs_disabled());

 	last->st = st;
+	last->sve_state = sve_state;
+	last->sve_vl = sve_vl;
 }

 /*
···

 /*
  * Invalidate live CPU copies of task t's FPSIMD state
+ *
+ * This function may be called with preemption enabled. The barrier()
+ * ensures that the assignment to fpsimd_cpu is visible to any
+ * preemption/softirq that could race with set_tsk_thread_flag(), so
+ * that TIF_FOREIGN_FPSTATE cannot be spuriously re-cleared.
+ *
+ * The final barrier ensures that TIF_FOREIGN_FPSTATE is seen set by any
+ * subsequent code.
  */
 void fpsimd_flush_task_state(struct task_struct *t)
 {
 	t->thread.fpsimd_cpu = NR_CPUS;
+
+	barrier();
+	set_tsk_thread_flag(t, TIF_FOREIGN_FPSTATE);
+
+	barrier();
 }

+/*
+ * Invalidate any task's FPSIMD state that is present on this cpu.
+ * This function must be called with softirqs disabled.
+ */
 void fpsimd_flush_cpu_state(void)
 {
 	__this_cpu_write(fpsimd_last_state.st, NULL);

+41 -9

arch/arm64/kernel/perf_event.c

···

 #include <linux/acpi.h>
 #include <linux/clocksource.h>
+#include <linux/kvm_host.h>
 #include <linux/of.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/platform_device.h>
···

 static inline void armv8pmu_enable_event_counter(struct perf_event *event)
 {
+	struct perf_event_attr *attr = &event->attr;
 	int idx = event->hw.idx;
+	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));

-	armv8pmu_enable_counter(idx);
 	if (armv8pmu_event_is_chained(event))
-		armv8pmu_enable_counter(idx - 1);
-	isb();
+		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+	kvm_set_pmu_events(counter_bits, attr);
+
+	/* We rely on the hypervisor switch code to enable guest counters */
+	if (!kvm_pmu_counter_deferred(attr)) {
+		armv8pmu_enable_counter(idx);
+		if (armv8pmu_event_is_chained(event))
+			armv8pmu_enable_counter(idx - 1);
+	}
 }

 static inline int armv8pmu_disable_counter(int idx)
···
 static inline void armv8pmu_disable_event_counter(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
+	struct perf_event_attr *attr = &event->attr;
 	int idx = hwc->idx;
+	u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));

 	if (armv8pmu_event_is_chained(event))
-		armv8pmu_disable_counter(idx - 1);
-	armv8pmu_disable_counter(idx);
+		counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+	kvm_clr_pmu_events(counter_bits);
+
+	/* We rely on the hypervisor switch code to disable guest counters */
+	if (!kvm_pmu_counter_deferred(attr)) {
+		if (armv8pmu_event_is_chained(event))
+			armv8pmu_disable_counter(idx - 1);
+		armv8pmu_disable_counter(idx);
+	}
 }

 static inline int armv8pmu_enable_intens(int idx)
···
 	 * with other architectures (x86 and Power).
 	 */
 	if (is_kernel_in_hyp_mode()) {
-		if (!attr->exclude_kernel)
+		if (!attr->exclude_kernel && !attr->exclude_host)
 			config_base |= ARMV8_PMU_INCLUDE_EL2;
-	} else {
-		if (attr->exclude_kernel)
+		if (attr->exclude_guest)
 			config_base |= ARMV8_PMU_EXCLUDE_EL1;
-		if (!attr->exclude_hv)
+		if (attr->exclude_host)
+			config_base |= ARMV8_PMU_EXCLUDE_EL0;
+	} else {
+		if (!attr->exclude_hv && !attr->exclude_host)
 			config_base |= ARMV8_PMU_INCLUDE_EL2;
 	}
+
+	/*
+	 * Filter out !VHE kernels and guest kernels
+	 */
+	if (attr->exclude_kernel)
+		config_base |= ARMV8_PMU_EXCLUDE_EL1;
+
 	if (attr->exclude_user)
 		config_base |= ARMV8_PMU_EXCLUDE_EL0;

···
 		armv8pmu_disable_counter(idx);
 		armv8pmu_disable_intens(idx);
 	}
+
+	/* Clear the counters we flip at guest entry/exit */
+	kvm_clr_pmu_events(U32_MAX);

 	/*
 	 * Initialize & Reset PMNC. Request overflow interrupt for
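
Note on the event filter hunk above: the new branches can be read as a pure function from the perf attributes and the VHE mode to a set of filter bits. The sketch below restates that logic in user-space C so individual attribute combinations can be checked by hand. It is an illustration only: `struct demo_attr`, the `demo_*` and `DEMO_*` names, and the bit positions chosen for the three filter flags are assumptions, and the `vhe` parameter stands in for `is_kernel_in_hyp_mode()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings of the PMU filter bits (illustrative only). */
#define DEMO_EXCLUDE_EL1 (UINT32_C(1) << 31)
#define DEMO_EXCLUDE_EL0 (UINT32_C(1) << 30)
#define DEMO_INCLUDE_EL2 (UINT32_C(1) << 27)

/* Hypothetical subset of struct perf_event_attr. */
struct demo_attr {
	bool exclude_user;
	bool exclude_kernel;
	bool exclude_hv;
	bool exclude_host;
	bool exclude_guest;
};

/* Mirrors the post-patch branch structure of the filter setup. */
static uint32_t demo_event_filter(const struct demo_attr *attr, bool vhe)
{
	uint32_t config_base = 0;

	if (vhe) {
		/* Host kernel runs at EL2; guest kernel at EL1. */
		if (!attr->exclude_kernel && !attr->exclude_host)
			config_base |= DEMO_INCLUDE_EL2;
		if (attr->exclude_guest)
			config_base |= DEMO_EXCLUDE_EL1;
		if (attr->exclude_host)
			config_base |= DEMO_EXCLUDE_EL0;
	} else {
		/* EL2 is the hypervisor stub, counted unless excluded. */
		if (!attr->exclude_hv && !attr->exclude_host)
			config_base |= DEMO_INCLUDE_EL2;
	}

	/* Filter out !VHE kernels and guest kernels. */
	if (attr->exclude_kernel)
		config_base |= DEMO_EXCLUDE_EL1;

	if (attr->exclude_user)
		config_base |= DEMO_EXCLUDE_EL0;

	return config_base;
}
```

For example, an event with only `exclude_host` set yields `DEMO_EXCLUDE_EL0` on a VHE system and no filter bits on a non-VHE system, which is consistent with the non-VHE case relying on the guest entry/exit code to enable and disable the counter instead.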

-5

arch/arm64/kernel/signal.c

···
 	 */
 
 	fpsimd_flush_task_state(current);
-	barrier();
-	/* From now, fpsimd_thread_switch() won't clear TIF_FOREIGN_FPSTATE */
-
-	set_thread_flag(TIF_FOREIGN_FPSTATE);
-	barrier();
 	/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
 
 	sve_alloc(current);

+1 -1

arch/arm64/kvm/Makefile

···
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
-kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
+kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o

+14 -3

arch/arm64/kvm/fpsimd.c

···
 #include <linux/sched.h>
 #include <linux/thread_info.h>
 #include <linux/kvm_host.h>
+#include <asm/fpsimd.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_mmu.h>
···
 	WARN_ON_ONCE(!irqs_disabled());
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
-		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
+		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs,
+					 vcpu->arch.sve_state,
+					 vcpu->arch.sve_max_vl);
+
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
-		clear_thread_flag(TIF_SVE);
+		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
 	}
 }
 
···
 void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 {
 	unsigned long flags;
+	bool host_has_sve = system_supports_sve();
+	bool guest_has_sve = vcpu_has_sve(vcpu);
 
 	local_irq_save(flags);
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
+
 		/* Clean guest FP state to memory and invalidate cpu view */
 		fpsimd_save();
 		fpsimd_flush_cpu_state();
-	} else if (system_supports_sve()) {
+
+		if (guest_has_sve)
+			*guest_zcr = read_sysreg_s(SYS_ZCR_EL12);
+	} else if (host_has_sve) {
 		/*
 		 * The FPSIMD/SVE state in the CPU has not been touched, and we
 		 * have SVE (and VHE): CPACR_EL1 (alias CPTR_EL2) has been

+383 -32

arch/arm64/kvm/guest.c

···
  * along with this program. If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bits.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/nospec.h>
 #include <linux/kvm_host.h>
 #include <linux/module.h>
+#include <linux/stddef.h>
+#include <linux/string.h>
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <kvm/arm_psci.h>
 #include <asm/cputype.h>
 #include <linux/uaccess.h>
+#include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
+#include <asm/kvm_host.h>
+#include <asm/sigcontext.h>
 
 #include "trace.h"
 
···
 	return 0;
 }
 
+static bool core_reg_offset_is_vreg(u64 off)
+{
+	return off >= KVM_REG_ARM_CORE_REG(fp_regs.vregs) &&
+		off < KVM_REG_ARM_CORE_REG(fp_regs.fpsr);
+}
+
 static u64 core_reg_offset_from_id(u64 id)
 {
 	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
 }
 
-static int validate_core_offset(const struct kvm_one_reg *reg)
+static int validate_core_offset(const struct kvm_vcpu *vcpu,
+				const struct kvm_one_reg *reg)
 {
 	u64 off = core_reg_offset_from_id(reg->id);
 	int size;
···
 		return -EINVAL;
 	}
 
-	if (KVM_REG_SIZE(reg->id) == size &&
-	    IS_ALIGNED(off, size / sizeof(__u32)))
-		return 0;
+	if (KVM_REG_SIZE(reg->id) != size ||
+	    !IS_ALIGNED(off, size / sizeof(__u32)))
+		return -EINVAL;
 
-	return -EINVAL;
+	/*
+	 * The KVM_REG_ARM64_SVE regs must be used instead of
+	 * KVM_REG_ARM_CORE for accessing the FPSIMD V-registers on
+	 * SVE-enabled vcpus:
+	 */
+	if (vcpu_has_sve(vcpu) && core_reg_offset_is_vreg(off))
+		return -EINVAL;
+
+	return 0;
 }
 
 static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
···
 	    (off + (KVM_REG_SIZE(reg->id) / sizeof(__u32))) >= nr_regs)
 		return -ENOENT;
 
-	if (validate_core_offset(reg))
+	if (validate_core_offset(vcpu, reg))
 		return -EINVAL;
 
 	if (copy_to_user(uaddr, ((u32 *)regs) + off, KVM_REG_SIZE(reg->id)))
···
 	    (off + (KVM_REG_SIZE(reg->id) / sizeof(__u32))) >= nr_regs)
 		return -ENOENT;
 
-	if (validate_core_offset(reg))
+	if (validate_core_offset(vcpu, reg))
 		return -EINVAL;
 
 	if (KVM_REG_SIZE(reg->id) > sizeof(tmp))
···
 	return err;
 }
 
+#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
+#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
+
+static bool vq_present(
+	const u64 (*const vqs)[KVM_ARM64_SVE_VLS_WORDS],
+	unsigned int vq)
+{
+	return (*vqs)[vq_word(vq)] & vq_mask(vq);
+}
+
+static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	unsigned int max_vq, vq;
+	u64 vqs[KVM_ARM64_SVE_VLS_WORDS];
+
+	if (!vcpu_has_sve(vcpu))
+		return -ENOENT;
+
+	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
+		return -EINVAL;
+
+	memset(vqs, 0, sizeof(vqs));
+
+	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
+	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
+		if (sve_vq_available(vq))
+			vqs[vq_word(vq)] |= vq_mask(vq);
+
+	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	unsigned int max_vq, vq;
+	u64 vqs[KVM_ARM64_SVE_VLS_WORDS];
+
+	if (!vcpu_has_sve(vcpu))
+		return -ENOENT;
+
+	if (kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM; /* too late! */
+
+	if (WARN_ON(vcpu->arch.sve_state))
+		return -EINVAL;
+
+	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
+		return -EFAULT;
+
+	max_vq = 0;
+	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
+		if (vq_present(&vqs, vq))
+			max_vq = vq;
+
+	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
+		return -EINVAL;
+
+	/*
+	 * Vector lengths supported by the host can't currently be
+	 * hidden from the guest individually: instead we can only set a
+	 * maxmium via ZCR_EL2.LEN. So, make sure the available vector
+	 * lengths match the set requested exactly up to the requested
+	 * maximum:
+	 */
+	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
+		if (vq_present(&vqs, vq) != sve_vq_available(vq))
+			return -EINVAL;
+
+	/* Can't run with no vector lengths at all: */
+	if (max_vq < SVE_VQ_MIN)
+		return -EINVAL;
+
+	/* vcpu->arch.sve_state will be alloc'd by kvm_vcpu_finalize_sve() */
+	vcpu->arch.sve_max_vl = sve_vl_from_vq(max_vq);
+
+	return 0;
+}
+
+#define SVE_REG_SLICE_SHIFT 0
+#define SVE_REG_SLICE_BITS 5
+#define SVE_REG_ID_SHIFT (SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS)
+#define SVE_REG_ID_BITS 5
+
+#define SVE_REG_SLICE_MASK \
+	GENMASK(SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS - 1, \
+		SVE_REG_SLICE_SHIFT)
+#define SVE_REG_ID_MASK \
+	GENMASK(SVE_REG_ID_SHIFT + SVE_REG_ID_BITS - 1, SVE_REG_ID_SHIFT)
+
+#define SVE_NUM_SLICES (1 << SVE_REG_SLICE_BITS)
+
+#define KVM_SVE_ZREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_ZREG(0, 0))
+#define KVM_SVE_PREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_PREG(0, 0))
+
+/*
+ * Number of register slices required to cover each whole SVE register.
+ * NOTE: Only the first slice every exists, for now.
+ * If you are tempted to modify this, you must also rework sve_reg_to_region()
+ * to match:
+ */
+#define vcpu_sve_slices(vcpu) 1
+
+/* Bounds of a single SVE register slice within vcpu->arch.sve_state */
+struct sve_state_reg_region {
+	unsigned int koffset;	/* offset into sve_state in kernel memory */
+	unsigned int klen;	/* length in kernel memory */
+	unsigned int upad;	/* extra trailing padding in user memory */
+};
+
+/*
+ * Validate SVE register ID and get sanitised bounds for user/kernel SVE
+ * register copy
+ */
+static int sve_reg_to_region(struct sve_state_reg_region *region,
+			     struct kvm_vcpu *vcpu,
+			     const struct kvm_one_reg *reg)
+{
+	/* reg ID ranges for Z- registers */
+	const u64 zreg_id_min = KVM_REG_ARM64_SVE_ZREG(0, 0);
+	const u64 zreg_id_max = KVM_REG_ARM64_SVE_ZREG(SVE_NUM_ZREGS - 1,
+						       SVE_NUM_SLICES - 1);
+
+	/* reg ID ranges for P- registers and FFR (which are contiguous) */
+	const u64 preg_id_min = KVM_REG_ARM64_SVE_PREG(0, 0);
+	const u64 preg_id_max = KVM_REG_ARM64_SVE_FFR(SVE_NUM_SLICES - 1);
+
+	unsigned int vq;
+	unsigned int reg_num;
+
+	unsigned int reqoffset, reqlen; /* User-requested offset and length */
+	unsigned int maxlen; /* Maxmimum permitted length */
+
+	size_t sve_state_size;
+
+	const u64 last_preg_id = KVM_REG_ARM64_SVE_PREG(SVE_NUM_PREGS - 1,
+							SVE_NUM_SLICES - 1);
+
+	/* Verify that the P-regs and FFR really do have contiguous IDs: */
+	BUILD_BUG_ON(KVM_REG_ARM64_SVE_FFR(0) != last_preg_id + 1);
+
+	/* Verify that we match the UAPI header: */
+	BUILD_BUG_ON(SVE_NUM_SLICES != KVM_ARM64_SVE_MAX_SLICES);
+
+	reg_num = (reg->id & SVE_REG_ID_MASK) >> SVE_REG_ID_SHIFT;
+
+	if (reg->id >= zreg_id_min && reg->id <= zreg_id_max) {
+		if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
+			return -ENOENT;
+
+		vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
+
+		reqoffset = SVE_SIG_ZREG_OFFSET(vq, reg_num) -
+				SVE_SIG_REGS_OFFSET;
+		reqlen = KVM_SVE_ZREG_SIZE;
+		maxlen = SVE_SIG_ZREG_SIZE(vq);
+	} else if (reg->id >= preg_id_min && reg->id <= preg_id_max) {
+		if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
+			return -ENOENT;
+
+		vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
+
+		reqoffset = SVE_SIG_PREG_OFFSET(vq, reg_num) -
+				SVE_SIG_REGS_OFFSET;
+		reqlen = KVM_SVE_PREG_SIZE;
+		maxlen = SVE_SIG_PREG_SIZE(vq);
+	} else {
+		return -EINVAL;
+	}
+
+	sve_state_size = vcpu_sve_state_size(vcpu);
+	if (WARN_ON(!sve_state_size))
+		return -EINVAL;
+
+	region->koffset = array_index_nospec(reqoffset, sve_state_size);
+	region->klen = min(maxlen, reqlen);
+	region->upad = reqlen - region->klen;
+
+	return 0;
+}
+
+static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	int ret;
+	struct sve_state_reg_region region;
+	char __user *uptr = (char __user *)reg->addr;
+
+	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
+	if (reg->id == KVM_REG_ARM64_SVE_VLS)
+		return get_sve_vls(vcpu, reg);
+
+	/* Try to interpret reg ID as an architectural SVE register... */
+	ret = sve_reg_to_region(&region, vcpu, reg);
+	if (ret)
+		return ret;
+
+	if (!kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM;
+
+	if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
+			 region.klen) ||
+	    clear_user(uptr + region.klen, region.upad))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	int ret;
+	struct sve_state_reg_region region;
+	const char __user *uptr = (const char __user *)reg->addr;
+
+	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
+	if (reg->id == KVM_REG_ARM64_SVE_VLS)
+		return set_sve_vls(vcpu, reg);
+
+	/* Try to interpret reg ID as an architectural SVE register... */
+	ret = sve_reg_to_region(&region, vcpu, reg);
+	if (ret)
+		return ret;
+
+	if (!kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM;
+
+	if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
+			   region.klen))
+		return -EFAULT;
+
+	return 0;
+}
+
 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	return -EINVAL;
···
 	return -EINVAL;
 }
 
-static unsigned long num_core_regs(void)
+static int copy_core_reg_indices(const struct kvm_vcpu *vcpu,
+				 u64 __user *uindices)
 {
-	return sizeof(struct kvm_regs) / sizeof(__u32);
+	unsigned int i;
+	int n = 0;
+	const u64 core_reg = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE;
+
+	for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
+		/*
+		 * The KVM_REG_ARM64_SVE regs must be used instead of
+		 * KVM_REG_ARM_CORE for accessing the FPSIMD V-registers on
+		 * SVE-enabled vcpus:
+		 */
+		if (vcpu_has_sve(vcpu) && core_reg_offset_is_vreg(i))
+			continue;
+
+		if (uindices) {
+			if (put_user(core_reg | i, uindices))
+				return -EFAULT;
+			uindices++;
+		}
+
+		n++;
+	}
+
+	return n;
+}
+
+static unsigned long num_core_regs(const struct kvm_vcpu *vcpu)
+{
+	return copy_core_reg_indices(vcpu, NULL);
 }
 
 /**
···
 	return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)) ? -EFAULT : 0;
 }
 
+static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
+{
+	const unsigned int slices = vcpu_sve_slices(vcpu);
+
+	if (!vcpu_has_sve(vcpu))
+		return 0;
+
+	/* Policed by KVM_GET_REG_LIST: */
+	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
+
+	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
+		+ 1; /* KVM_REG_ARM64_SVE_VLS */
+}
+
+static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
+				u64 __user *uindices)
+{
+	const unsigned int slices = vcpu_sve_slices(vcpu);
+	u64 reg;
+	unsigned int i, n;
+	int num_regs = 0;
+
+	if (!vcpu_has_sve(vcpu))
+		return 0;
+
+	/* Policed by KVM_GET_REG_LIST: */
+	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
+
+	/*
+	 * Enumerate this first, so that userspace can save/restore in
+	 * the order reported by KVM_GET_REG_LIST:
+	 */
+	reg = KVM_REG_ARM64_SVE_VLS;
+	if (put_user(reg, uindices++))
+		return -EFAULT;
+	++num_regs;
+
+	for (i = 0; i < slices; i++) {
+		for (n = 0; n < SVE_NUM_ZREGS; n++) {
+			reg = KVM_REG_ARM64_SVE_ZREG(n, i);
+			if (put_user(reg, uindices++))
+				return -EFAULT;
+			num_regs++;
+		}
+
+		for (n = 0; n < SVE_NUM_PREGS; n++) {
+			reg = KVM_REG_ARM64_SVE_PREG(n, i);
+			if (put_user(reg, uindices++))
+				return -EFAULT;
+			num_regs++;
+		}
+
+		reg = KVM_REG_ARM64_SVE_FFR(i);
+		if (put_user(reg, uindices++))
+			return -EFAULT;
+		num_regs++;
+	}
+
+	return num_regs;
+}
+
 /**
  * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
  *
···
  */
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
 {
-	return num_core_regs() + kvm_arm_num_sys_reg_descs(vcpu)
-		+ kvm_arm_get_fw_num_regs(vcpu) + NUM_TIMER_REGS;
+	unsigned long res = 0;
+
+	res += num_core_regs(vcpu);
+	res += num_sve_regs(vcpu);
+	res += kvm_arm_num_sys_reg_descs(vcpu);
+	res += kvm_arm_get_fw_num_regs(vcpu);
+	res += NUM_TIMER_REGS;
+
+	return res;
 }
 
 /**
···
  */
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	unsigned int i;
-	const u64 core_reg = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE;
 	int ret;
 
-	for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
-		if (put_user(core_reg | i, uindices))
-			return -EFAULT;
-		uindices++;
-	}
+	ret = copy_core_reg_indices(vcpu, uindices);
+	if (ret < 0)
+		return ret;
+	uindices += ret;
+
+	ret = copy_sve_reg_indices(vcpu, uindices);
+	if (ret < 0)
+		return ret;
+	uindices += ret;
 
 	ret = kvm_arm_copy_fw_reg_indices(vcpu, uindices);
-	if (ret)
+	if (ret < 0)
 		return ret;
 	uindices += kvm_arm_get_fw_num_regs(vcpu);
 
 	ret = copy_timer_indices(vcpu, uindices);
-	if (ret)
+	if (ret < 0)
 		return ret;
 	uindices += NUM_TIMER_REGS;
 
···
 	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
 		return -EINVAL;
 
-	/* Register group 16 means we want a core register. */
-	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
-		return get_core_reg(vcpu, reg);
-
-	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_FW)
-		return kvm_arm_get_fw_reg(vcpu, reg);
+	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
+	case KVM_REG_ARM_CORE:	return get_core_reg(vcpu, reg);
+	case KVM_REG_ARM_FW:	return kvm_arm_get_fw_reg(vcpu, reg);
+	case KVM_REG_ARM64_SVE:	return get_sve_reg(vcpu, reg);
+	}
 
 	if (is_timer_reg(reg->id))
 		return get_timer_reg(vcpu, reg);
···
 	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
 		return -EINVAL;
 
-	/* Register group 16 means we set a core register. */
-	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
-		return set_core_reg(vcpu, reg);
-
-	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_FW)
-		return kvm_arm_set_fw_reg(vcpu, reg);
+	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
+	case KVM_REG_ARM_CORE:	return set_core_reg(vcpu, reg);
+	case KVM_REG_ARM_FW:	return kvm_arm_set_fw_reg(vcpu, reg);
+	case KVM_REG_ARM64_SVE:	return set_sve_reg(vcpu, reg);
+	}
 
 	if (is_timer_reg(reg->id))
 		return set_timer_reg(vcpu, reg);
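
Note (illustrative sketch, not part of the patch): the vector-length bitmap handled by `get_sve_vls()` and `set_sve_vls()` in the guest.c diff above can be tried out in userspace. The sketch below assumes placeholder limits `VQ_MIN`, `VQ_MAX` and `VLS_WORDS`, passes the host's supported set in as a second bitmap instead of calling `sve_vq_available()`, and folds the three rejection checks of `set_sve_vls()` into a hypothetical `check_vls()` that returns 0 on rejection. None of these names are kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder limits for illustration; the kernel uses SVE_VQ_MIN/SVE_VQ_MAX. */
#define VQ_MIN 1u
#define VQ_MAX 512u
#define VLS_WORDS ((VQ_MAX - VQ_MIN + 1) / 64)

/* Same word/bit split as the patch: one bit per vq, 64 vqs per u64 word. */
#define vq_word(vq) (((vq) - VQ_MIN) / 64)
#define vq_mask(vq) ((uint64_t)1 << (((vq) - VQ_MIN) % 64))

static bool vq_present(const uint64_t vqs[VLS_WORDS], unsigned int vq)
{
	assert(vq >= VQ_MIN && vq <= VQ_MAX);
	return (vqs[vq_word(vq)] & vq_mask(vq)) != 0;
}

static void vq_set(uint64_t vqs[VLS_WORDS], unsigned int vq)
{
	assert(vq >= VQ_MIN && vq <= VQ_MAX);
	vqs[vq_word(vq)] |= vq_mask(vq);
}

/*
 * Returns the largest requested vq, or 0 if the request is rejected:
 * empty set, maximum above host_max_vq, or a set that does not match
 * the host's set exactly up to the requested maximum.
 */
static unsigned int check_vls(const uint64_t req[VLS_WORDS],
			      const uint64_t host[VLS_WORDS],
			      unsigned int host_max_vq)
{
	unsigned int vq, max_vq = 0;

	for (vq = VQ_MIN; vq <= VQ_MAX; ++vq)
		if (vq_present(req, vq))
			max_vq = vq;

	if (max_vq < VQ_MIN || max_vq > host_max_vq)
		return 0;

	for (vq = VQ_MIN; vq <= max_vq; ++vq)
		if (vq_present(req, vq) != vq_present(host, vq))
			return 0;

	return max_vq;
}
```

The exact-match loop reflects the comment in `set_sve_vls()`: only a maximum can be imposed on the guest, so a request that skips a length the host supports below that maximum, or asks for one the host lacks, cannot be honoured.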

+28 -8

arch/arm64/kvm/handle_exit.c

···
 	return 1;
 }
 
+#define __ptrauth_save_key(regs, key) \
+({ \
+	regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
+	regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
+})
+
+/*
+ * Handle the guest trying to use a ptrauth instruction, or trying to access a
+ * ptrauth register.
+ */
+void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *ctxt;
+
+	if (vcpu_has_ptrauth(vcpu)) {
+		vcpu_ptrauth_enable(vcpu);
+		ctxt = vcpu->arch.host_cpu_context;
+		__ptrauth_save_key(ctxt->sys_regs, APIA);
+		__ptrauth_save_key(ctxt->sys_regs, APIB);
+		__ptrauth_save_key(ctxt->sys_regs, APDA);
+		__ptrauth_save_key(ctxt->sys_regs, APDB);
+		__ptrauth_save_key(ctxt->sys_regs, APGA);
+	} else {
+		kvm_inject_undefined(vcpu);
+	}
+}
+
 /*
  * Guest usage of a ptrauth instruction (which the guest EL1 did not turn into
  * a NOP).
  */
 static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	/*
-	 * We don't currently support ptrauth in a guest, and we mask the ID
-	 * registers to prevent well-behaved guests from trying to make use of
-	 * it.
-	 *
-	 * Inject an UNDEF, as if the feature really isn't present.
-	 */
-	kvm_inject_undefined(vcpu);
+	kvm_arm_vcpu_ptrauth_trap(vcpu);
 	return 1;
 }
 
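
Note (illustrative sketch, not part of the patch): `__ptrauth_save_key()` above relies on preprocessor token pasting to turn one key name such as `APIA` into the pair of register names `APIAKEYLO_EL1` and `APIAKEYHI_EL1`. The userspace sketch below shows the same pasting trick under stated assumptions: the enum of register indices, the `fake_hw[]` array and `fake_read()` are hypothetical stand-ins for the kernel's `sys_regs` indices and `read_sysreg_s()`, and a portable `do { } while (0)` wrapper is used in place of the GNU statement expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices; the kernel indexes its own sys_regs array. */
enum {
	APIAKEYLO_EL1, APIAKEYHI_EL1,
	APIBKEYLO_EL1, APIBKEYHI_EL1,
	NR_KEY_REGS
};

/* Fake hardware key registers, indexed like the enum above. */
static uint64_t fake_hw[NR_KEY_REGS];

/* Stand-in for read_sysreg_s(): returns the fake register at idx. */
static uint64_t fake_read(int idx)
{
	assert(idx >= 0 && idx < NR_KEY_REGS);
	return fake_hw[idx];
}

/* key ## KEYLO_EL1 pastes, e.g., APIA and KEYLO_EL1 into APIAKEYLO_EL1. */
#define save_key(regs, key) \
	do { \
		(regs)[key ## KEYLO_EL1] = fake_read(key ## KEYLO_EL1); \
		(regs)[key ## KEYHI_EL1] = fake_read(key ## KEYHI_EL1); \
	} while (0)
```

A call such as `save_key(saved, APIA)` copies only the two APIA halves, which is why the trap handler in the patch invokes the macro once per key (APIA, APIB, APDA, APDB, APGA).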

+15

arch/arm64/kvm/hyp/entry.S

···
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_ptrauth.h>
 
 #define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
 #define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
···
 	save_callee_saved_regs x1
 
 	add x18, x0, #VCPU_CONTEXT
+
+	// Macro ptrauth_switch_to_guest format:
+	// ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
+	// The below macro to restore guest keys is not implemented in C code
+	// as it may cause Pointer Authentication key signing mismatch errors
+	// when this feature is enabled for kernel code.
+	ptrauth_switch_to_guest x18, x0, x1, x2
 
 	// Restore guest regs x0-x17
 	ldp x0, x1, [x18, #CPU_XREG_OFFSET(0)]
···
 	save_callee_saved_regs x1
 
 	get_host_ctxt x2, x3
+
+	// Macro ptrauth_switch_to_guest format:
+	// ptrauth_switch_to_host(guest cxt, host cxt, tmp1, tmp2, tmp3)
+	// The below macro to save/restore keys is not implemented in C code
+	// as it may cause Pointer Authentication key signing mismatch errors
+	// when this feature is enabled for kernel code.
+	ptrauth_switch_to_host x1, x2, x3, x4, x5
 
 	// Now restore the host regs
 	restore_callee_saved_regs x2

+64 -16

arch/arm64/kvm/hyp/switch.c

···
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
-	if (!update_fp_enabled(vcpu)) {
+	if (update_fp_enabled(vcpu)) {
+		if (vcpu_has_sve(vcpu))
+			val |= CPACR_EL1_ZEN;
+	} else {
 		val &= ~CPACR_EL1_FPEN;
 		__activate_traps_fpsimd32(vcpu);
 	}
···
 	return true;
 }
 
-static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu *vcpu)
+/* Check for an FPSIMD/SVE trap and handle as appropriate */
+static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 {
-	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
+	bool vhe, sve_guest, sve_host;
+	u8 hsr_ec;
 
-	if (has_vhe())
-		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
-			     cpacr_el1);
-	else
+	if (!system_supports_fpsimd())
+		return false;
+
+	if (system_supports_sve()) {
+		sve_guest = vcpu_has_sve(vcpu);
+		sve_host = vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE;
+		vhe = true;
+	} else {
+		sve_guest = false;
+		sve_host = false;
+		vhe = has_vhe();
+	}
+
+	hsr_ec = kvm_vcpu_trap_get_class(vcpu);
+	if (hsr_ec != ESR_ELx_EC_FP_ASIMD &&
+	    hsr_ec != ESR_ELx_EC_SVE)
+		return false;
+
+	/* Don't handle SVE traps for non-SVE vcpus here: */
+	if (!sve_guest)
+		if (hsr_ec != ESR_ELx_EC_FP_ASIMD)
+			return false;
+
+	/* Valid trap. Switch the context: */
+
+	if (vhe) {
+		u64 reg = read_sysreg(cpacr_el1) | CPACR_EL1_FPEN;
+
+		if (sve_guest)
+			reg |= CPACR_EL1_ZEN;
+
+		write_sysreg(reg, cpacr_el1);
+	} else {
 		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
 			     cptr_el2);
+	}
 
 	isb();
 
···
 		 * In the SVE case, VHE is assumed: it is enforced by
 		 * Kconfig and kvm_arch_init().
 		 */
-		if (system_supports_sve() &&
-		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
+		if (sve_host) {
 			struct thread_struct *thread = container_of(
-				host_fpsimd,
+				vcpu->arch.host_fpsimd_state,
 				struct thread_struct, uw.fpsimd_state);
 
-			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
+			sve_save_state(sve_pffr(thread),
+				       &vcpu->arch.host_fpsimd_state->fpsr);
 		} else {
-			__fpsimd_save_state(host_fpsimd);
+			__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
 		}
 
 		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
 	}
 
-	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
+	if (sve_guest) {
+		sve_load_state(vcpu_sve_pffr(vcpu),
+			       &vcpu->arch.ctxt.gp_regs.fp_regs.fpsr,
+			       sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
+		write_sysreg_s(vcpu->arch.ctxt.sys_regs[ZCR_EL1], SYS_ZCR_EL12);
+	} else {
+		__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
+	}
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
···
 	 * and restore the guest context lazily.
 	 * If FP/SIMD is not implemented, handle the trap and inject an
 	 * undefined instruction exception to the guest.
+	 * Similarly for trapped SVE accesses.
 	 */
-	if (system_supports_fpsimd() &&
-	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
-		return __hyp_switch_fpsimd(vcpu);
+	if (__hyp_handle_fpsimd(vcpu))
+		return true;
 
 	if (!__populate_fault_info(vcpu))
 		return true;
···
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
+	bool pmu_switch_needed;
 	u64 exit_code;
 
 	/*
···
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
+
+	pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
 
 	__sysreg_save_state_nvhe(host_ctxt);
 
···
 	 * system may enable SPE here and make use of the TTBRs.
 	 */
 	__debug_switch_to_host(vcpu);
+
+	if (pmu_switch_needed)
+		__pmu_switch_to_host(host_ctxt);
 
 	/* Returning to host will clear PSR.I, remask PMR if needed */
 	if (system_uses_irq_prio_masking())
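
Note (illustrative sketch, not part of the patch): the early-return checks at the top of `__hyp_handle_fpsimd()` in the switch.c diff above decide whether a trap is claimed by the lazy FPSIMD/SVE switch at all. The sketch below restates just that decision as a pure function. `fpsimd_trap_is_handled()` and its boolean parameters are hypothetical, and the `EC_*` values are illustrative stand-ins for `ESR_ELx_EC_FP_ASIMD` and `ESR_ELx_EC_SVE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative exception-class codes standing in for the ESR_ELx_EC_* values. */
enum { EC_FP_ASIMD = 0x07, EC_SVE = 0x19 };

/* True if the trap would be claimed by the lazy FPSIMD/SVE switch. */
static bool fpsimd_trap_is_handled(uint8_t ec, bool fpsimd_supported,
				   bool sve_guest)
{
	if (!fpsimd_supported)
		return false;

	if (ec != EC_FP_ASIMD && ec != EC_SVE)
		return false;

	/* Don't handle SVE traps for non-SVE vcpus */
	if (!sve_guest && ec != EC_FP_ASIMD)
		return false;

	return true;
}

/* Usage sketch: which traps the helper claims. */
void fpsimd_trap_examples(void)
{
	assert(fpsimd_trap_is_handled(EC_FP_ASIMD, true, false));
	assert(!fpsimd_trap_is_handled(EC_SVE, true, false));
	assert(fpsimd_trap_is_handled(EC_SVE, true, true));
}
```

An SVE trap from a vcpu without SVE falls through to the remaining exit handling instead of being treated as a request to load guest FP state.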

+239

arch/arm64/kvm/pmu.c

···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Arm Limited
+ * Author: Andrew Murray <Andrew.Murray@arm.com>
+ */
+#include <linux/kvm_host.h>
+#include <linux/perf_event.h>
+#include <asm/kvm_hyp.h>
+
+/*
+ * Given the perf event attributes and system type, determine
+ * if we are going to need to switch counters at guest entry/exit.
+ */
+static bool kvm_pmu_switch_needed(struct perf_event_attr *attr)
+{
+	/**
+	 * With VHE the guest kernel runs at EL1 and the host at EL2,
+	 * where user (EL0) is excluded then we have no reason to switch
+	 * counters.
+	 */
+	if (has_vhe() && attr->exclude_user)
+		return false;
+
+	/* Only switch if attributes are different */
+	return (attr->exclude_host != attr->exclude_guest);
+}
+
+/*
+ * Add events to track that we may want to switch at guest entry/exit
+ * time.
+ */
+void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
+{
+	struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data);
+
+	if (!kvm_pmu_switch_needed(attr))
+		return;
+
+	if (!attr->exclude_host)
+		ctx->pmu_events.events_host |= set;
+	if (!attr->exclude_guest)
+		ctx->pmu_events.events_guest |= set;
+}
+
+/*
+ * Stop tracking events
+ */
+void kvm_clr_pmu_events(u32 clr)
+{
+	struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data);
+
+	ctx->pmu_events.events_host &= ~clr;
+	ctx->pmu_events.events_guest &= ~clr;
+}
+
+/**
+ * Disable host events, enable guest events
+ */
+bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_host_data *host;
+	struct kvm_pmu_events *pmu;
+
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	pmu = &host->pmu_events;
+
+	if (pmu->events_host)
+		write_sysreg(pmu->events_host, pmcntenclr_el0);
+
+	if (pmu->events_guest)
+		write_sysreg(pmu->events_guest, pmcntenset_el0);
+
+	return (pmu->events_host || pmu->events_guest);
+}
+
+/**
+ * Disable guest events, enable host events
+ */
+void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_host_data *host;
+	struct kvm_pmu_events *pmu;
+
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	pmu = &host->pmu_events;
+
+	if (pmu->events_guest)
+		write_sysreg(pmu->events_guest, pmcntenclr_el0);
+
+	if (pmu->events_host)
+		write_sysreg(pmu->events_host, pmcntenset_el0);
+}
+
+#define PMEVTYPER_READ_CASE(idx) \
+	case idx: \
+		return read_sysreg(pmevtyper##idx##_el0)
+
+#define PMEVTYPER_WRITE_CASE(idx) \
+	case idx: \
+		write_sysreg(val, pmevtyper##idx##_el0); \
+		break
+
+#define PMEVTYPER_CASES(readwrite) \
+	PMEVTYPER_##readwrite##_CASE(0); \
+	PMEVTYPER_##readwrite##_CASE(1); \
+	PMEVTYPER_##readwrite##_CASE(2); \
+	PMEVTYPER_##readwrite##_CASE(3); \
+	PMEVTYPER_##readwrite##_CASE(4); \
+	PMEVTYPER_##readwrite##_CASE(5); \
+	PMEVTYPER_##readwrite##_CASE(6); \
+	PMEVTYPER_##readwrite##_CASE(7); \
+	PMEVTYPER_##readwrite##_CASE(8); \
+	PMEVTYPER_##readwrite##_CASE(9); \
+	PMEVTYPER_##readwrite##_CASE(10); \
+	PMEVTYPER_##readwrite##_CASE(11); \
+	PMEVTYPER_##readwrite##_CASE(12); \
+	PMEVTYPER_##readwrite##_CASE(13); \
+	PMEVTYPER_##readwrite##_CASE(14); \
+	PMEVTYPER_##readwrite##_CASE(15); \
+	PMEVTYPER_##readwrite##_CASE(16); \
+	PMEVTYPER_##readwrite##_CASE(17); \
+	PMEVTYPER_##readwrite##_CASE(18); \
+	PMEVTYPER_##readwrite##_CASE(19); \
+	PMEVTYPER_##readwrite##_CASE(20); \
+	PMEVTYPER_##readwrite##_CASE(21); \
+	PMEVTYPER_##readwrite##_CASE(22); \
+	PMEVTYPER_##readwrite##_CASE(23); \
+	PMEVTYPER_##readwrite##_CASE(24); \
+	PMEVTYPER_##readwrite##_CASE(25); \
+	PMEVTYPER_##readwrite##_CASE(26); \
+	PMEVTYPER_##readwrite##_CASE(27); \
+	PMEVTYPER_##readwrite##_CASE(28); \
+	PMEVTYPER_##readwrite##_CASE(29); \
+	PMEVTYPER_##readwrite##_CASE(30)
+
+/*
+ * Read a value direct from PMEVTYPER<idx> where idx is 0-30
+ * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ */
+static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
+{
+	switch (idx) {
+	PMEVTYPER_CASES(READ);
+	case ARMV8_PMU_CYCLE_IDX:
+		return read_sysreg(pmccfiltr_el0);
+	default:
+		WARN_ON(1);
+	}
+
+	return 0;
+}
+
+/*
+ * Write a value direct to PMEVTYPER<idx> where idx is 0-30
+ * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
+ */
+static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
+{
+	switch (idx) {
+	PMEVTYPER_CASES(WRITE);
+	case ARMV8_PMU_CYCLE_IDX:
+		write_sysreg(val, pmccfiltr_el0);
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+/*
+ * Modify ARMv8 PMU events to include EL0 counting
+ */
+static void kvm_vcpu_pmu_enable_el0(unsigned long events)
+{
+	u64 typer;
+	u32 counter;
+
+	for_each_set_bit(counter, &events, 32) {
+		typer = kvm_vcpu_pmu_read_evtype_direct(counter);
+		typer &= ~ARMV8_PMU_EXCLUDE_EL0;
+		kvm_vcpu_pmu_write_evtype_direct(counter, typer);
+	}
+}
+
+/*
+ * Modify ARMv8 PMU events to exclude EL0 counting
+ */
+static void kvm_vcpu_pmu_disable_el0(unsigned long events)
+{
+	u64 typer;
+	u32 counter;
+
+	for_each_set_bit(counter, &events, 32) {
+		typer = kvm_vcpu_pmu_read_evtype_direct(counter);
+		typer |= ARMV8_PMU_EXCLUDE_EL0;
+		kvm_vcpu_pmu_write_evtype_direct(counter, typer);
+	}
+}
+
+/*
+ * On VHE ensure that only guest events have EL0 counting enabled
+ */
+void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_host_data *host;
+	u32 events_guest, events_host;
+
+	if (!has_vhe())
+		return;
+
+	host_ctxt = vcpu->arch.host_cpu_context;
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	events_guest = host->pmu_events.events_guest;
+	events_host = host->pmu_events.events_host;
+
+	kvm_vcpu_pmu_enable_el0(events_guest);
+	kvm_vcpu_pmu_disable_el0(events_host);
+}
+
+/*
+ * On VHE ensure that only host events have EL0 counting enabled
+ */
+void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_host_data *host;
+	u32 events_guest, events_host;
+
+	if (!has_vhe())
+		return;
+
+	host_ctxt = vcpu->arch.host_cpu_context;
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	events_guest = host->pmu_events.events_guest;
+	events_host = host->pmu_events.events_host;
+
+	kvm_vcpu_pmu_enable_el0(events_host);
+	kvm_vcpu_pmu_disable_el0(events_guest);
+}
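
Note (illustrative sketch, not part of the patch): the bookkeeping in pmu.c above keeps two bitmasks per CPU, one for counters that should run only in the host and one for counters that should run only in the guest, and flips them at guest entry and exit. The sketch below models that in userspace under stated assumptions: `struct pmu_events`, `track_event()`, `switch_to_guest()` and `switch_to_host()` are hypothetical simplifications, the single `cnten` variable models the enable state behind the PMCNTENSET_EL0/PMCNTENCLR_EL0 pair, and the VHE `exclude_user` shortcut in `kvm_pmu_switch_needed()` is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU bookkeeping kept in struct kvm_pmu_events. */
struct pmu_events {
	uint32_t events_host;
	uint32_t events_guest;
};

/* Models the counter-enable state behind PMCNTENSET_EL0 / PMCNTENCLR_EL0. */
static uint32_t cnten;

static void cnten_set(uint32_t mask) { cnten |= mask; }
static void cnten_clr(uint32_t mask) { cnten &= ~mask; }

static void track_event(struct pmu_events *pmu, uint32_t bits,
			bool exclude_host, bool exclude_guest)
{
	/* Only events whose host/guest exclusion differs need switching */
	if (exclude_host == exclude_guest)
		return;

	if (!exclude_host)
		pmu->events_host |= bits;
	if (!exclude_guest)
		pmu->events_guest |= bits;
}

/* Disable host-only counters, enable guest-only counters. */
static bool switch_to_guest(const struct pmu_events *pmu)
{
	cnten_clr(pmu->events_host);
	cnten_set(pmu->events_guest);
	return pmu->events_host || pmu->events_guest;
}

/* Disable guest-only counters, enable host-only counters. */
static void switch_to_host(const struct pmu_events *pmu)
{
	cnten_clr(pmu->events_guest);
	cnten_set(pmu->events_host);
}

/* Usage sketch: one host-only counter (bit 0) and one guest-only counter (bit 1). */
void pmu_switch_example(void)
{
	struct pmu_events pmu = { 0, 0 };

	track_event(&pmu, 1u << 0, false, true);  /* exclude_guest: host-only */
	track_event(&pmu, 1u << 1, true, false);  /* exclude_host: guest-only */
	track_event(&pmu, 1u << 2, false, false); /* counts everywhere: untracked */

	cnten = (1u << 0) | (1u << 2);            /* what the host has running */
	assert(switch_to_guest(&pmu));
	assert(cnten == ((1u << 1) | (1u << 2))); /* host-only off, guest-only on */
	switch_to_host(&pmu);
	assert(cnten == ((1u << 0) | (1u << 2))); /* back to the host view */
}
```

Counters that count in both host and guest (bit 2 in the example) are never tracked, so they stay enabled across the transition, which is the "only switch if attributes are different" rule in the patch.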

+166 -1

arch/arm64/kvm/reset.c

··· 20 20 */ 21 21 22 22 #include <linux/errno.h> 23 + #include <linux/kernel.h> 23 24 #include <linux/kvm_host.h> 24 25 #include <linux/kvm.h> 25 26 #include <linux/hw_breakpoint.h> 27 + #include <linux/slab.h> 28 + #include <linux/string.h> 29 + #include <linux/types.h> 26 30 27 31 #include <kvm/arm_arch_timer.h> 28 32 29 33 #include <asm/cpufeature.h> 30 34 #include <asm/cputype.h> 35 + #include <asm/fpsimd.h> 31 36 #include <asm/ptrace.h> 32 37 #include <asm/kvm_arm.h> 33 38 #include <asm/kvm_asm.h> 34 39 #include <asm/kvm_coproc.h> 35 40 #include <asm/kvm_emulate.h> 36 41 #include <asm/kvm_mmu.h> 42 + #include <asm/virt.h> 37 43 38 44 /* Maximum phys_shift supported for any VM on this host */ 39 45 static u32 kvm_ipa_limit; ··· 98 92 case KVM_CAP_ARM_VM_IPA_SIZE: 99 93 r = kvm_ipa_limit; 100 94 break; 95 + case KVM_CAP_ARM_SVE: 96 + r = system_supports_sve(); 97 + break; 98 + case KVM_CAP_ARM_PTRAUTH_ADDRESS: 99 + case KVM_CAP_ARM_PTRAUTH_GENERIC: 100 + r = has_vhe() && system_supports_address_auth() && 101 + system_supports_generic_auth(); 102 + break; 101 103 default: 102 104 r = 0; 103 105 } 104 106 105 107 return r; 108 + } 109 + 110 + unsigned int kvm_sve_max_vl; 111 + 112 + int kvm_arm_init_sve(void) 113 + { 114 + if (system_supports_sve()) { 115 + kvm_sve_max_vl = sve_max_virtualisable_vl; 116 + 117 + /* 118 + * The get_sve_reg()/set_sve_reg() ioctl interface will need 119 + * to be extended with multiple register slice support in 120 + * order to support vector lengths greater than 121 + * SVE_VL_ARCH_MAX: 122 + */ 123 + if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX)) 124 + kvm_sve_max_vl = SVE_VL_ARCH_MAX; 125 + 126 + /* 127 + * Don't even try to make use of vector lengths that 128 + * aren't available on all CPUs, for now: 129 + */ 130 + if (kvm_sve_max_vl < sve_max_vl) 131 + pr_warn("KVM: SVE vector length for guests limited to %u bytes\n", 132 + kvm_sve_max_vl); 133 + } 134 + 135 + return 0; 136 + } 137 + 138 + static int kvm_vcpu_enable_sve(struct 
kvm_vcpu *vcpu) 139 + { 140 + if (!system_supports_sve()) 141 + return -EINVAL; 142 + 143 + /* Verify that KVM startup enforced this when SVE was detected: */ 144 + if (WARN_ON(!has_vhe())) 145 + return -EINVAL; 146 + 147 + vcpu->arch.sve_max_vl = kvm_sve_max_vl; 148 + 149 + /* 150 + * Userspace can still customize the vector lengths by writing 151 + * KVM_REG_ARM64_SVE_VLS. Allocation is deferred until 152 + * kvm_arm_vcpu_finalize(), which freezes the configuration. 153 + */ 154 + vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_SVE; 155 + 156 + return 0; 157 + } 158 + 159 + /* 160 + * Finalize vcpu's maximum SVE vector length, allocating 161 + * vcpu->arch.sve_state as necessary. 162 + */ 163 + static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu) 164 + { 165 + void *buf; 166 + unsigned int vl; 167 + 168 + vl = vcpu->arch.sve_max_vl; 169 + 170 + /* 171 + * Responsibility for these properties is shared between 172 + * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and 173 + * set_sve_vls(). 
Double-check here just to be sure: 174 + */ 175 + if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl || 176 + vl > SVE_VL_ARCH_MAX)) 177 + return -EIO; 178 + 179 + buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL); 180 + if (!buf) 181 + return -ENOMEM; 182 + 183 + vcpu->arch.sve_state = buf; 184 + vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED; 185 + return 0; 186 + } 187 + 188 + int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature) 189 + { 190 + switch (feature) { 191 + case KVM_ARM_VCPU_SVE: 192 + if (!vcpu_has_sve(vcpu)) 193 + return -EINVAL; 194 + 195 + if (kvm_arm_vcpu_sve_finalized(vcpu)) 196 + return -EPERM; 197 + 198 + return kvm_vcpu_finalize_sve(vcpu); 199 + } 200 + 201 + return -EINVAL; 202 + } 203 + 204 + bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu) 205 + { 206 + if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu)) 207 + return false; 208 + 209 + return true; 210 + } 211 + 212 + void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) 213 + { 214 + kfree(vcpu->arch.sve_state); 215 + } 216 + 217 + static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu) 218 + { 219 + if (vcpu_has_sve(vcpu)) 220 + memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu)); 221 + } 222 + 223 + static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu) 224 + { 225 + /* Support ptrauth only if the system supports these capabilities. */ 226 + if (!has_vhe()) 227 + return -EINVAL; 228 + 229 + if (!system_supports_address_auth() || 230 + !system_supports_generic_auth()) 231 + return -EINVAL; 232 + /* 233 + * For now make sure that both address/generic pointer authentication 234 + * features are requested by the userspace together. 
235 + */ 236 + if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) || 237 + !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features)) 238 + return -EINVAL; 239 + 240 + vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_PTRAUTH; 241 + return 0; 106 242 } 107 243 108 244 /** ··· 253 105 * 254 106 * This function finds the right table above and sets the registers on 255 107 * the virtual CPU struct to their architecturally defined reset 256 - * values. 108 + * values, except for registers whose reset is deferred until 109 + * kvm_arm_vcpu_finalize(). 257 110 * 258 111 * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT 259 112 * ioctl or as part of handling a request issued by another VCPU in the PSCI ··· 279 130 loaded = (vcpu->cpu != -1); 280 131 if (loaded) 281 132 kvm_arch_vcpu_put(vcpu); 133 + 134 + if (!kvm_arm_vcpu_sve_finalized(vcpu)) { 135 + if (test_bit(KVM_ARM_VCPU_SVE, vcpu->arch.features)) { 136 + ret = kvm_vcpu_enable_sve(vcpu); 137 + if (ret) 138 + goto out; 139 + } 140 + } else { 141 + kvm_vcpu_reset_sve(vcpu); 142 + } 143 + 144 + if (test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) || 145 + test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features)) { 146 + if (kvm_vcpu_enable_ptrauth(vcpu)) 147 + goto out; 148 + } 282 149 283 150 switch (vcpu->arch.target) { 284 151 default:
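
The enable/finalize lifecycle added to reset.c amounts to a small state machine: SVE must be enabled before it can be finalized, finalizing allocates the register state, and a second finalize is rejected. A minimal userspace sketch under those assumptions follows; `struct sketch_vcpu` and the `sketch_*` functions are hypothetical, and `calloc()` stands in for `kzalloc()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the vcpu SVE state. */
struct sketch_vcpu {
	bool has_sve;		/* mirrors KVM_ARM64_GUEST_HAS_SVE */
	bool sve_finalized;	/* mirrors KVM_ARM64_VCPU_SVE_FINALIZED */
	void *sve_state;
	size_t sve_state_size;
};

/* Same error ordering as kvm_arm_vcpu_finalize() for KVM_ARM_VCPU_SVE. */
static int sketch_finalize_sve(struct sketch_vcpu *v, size_t state_size)
{
	if (!v->has_sve)
		return -EINVAL;	/* feature was never enabled */

	if (v->sve_finalized)
		return -EPERM;	/* configuration is already frozen */

	v->sve_state = calloc(1, state_size);
	if (!v->sve_state)
		return -ENOMEM;

	v->sve_state_size = state_size;
	v->sve_finalized = true;
	return 0;
}

/* A vcpu with SVE enabled is unusable until it has been finalized. */
static bool sketch_vcpu_is_finalized(const struct sketch_vcpu *v)
{
	return !(v->has_sve && !v->sve_finalized);
}
```

Deferring the allocation to the finalize step is what lets userspace still adjust the vector lengths between enabling the feature and freezing it.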

+157 -26

arch/arm64/kvm/sys_regs.c

··· 695 695 val |= p->regval & ARMV8_PMU_PMCR_MASK; 696 696 __vcpu_sys_reg(vcpu, PMCR_EL0) = val; 697 697 kvm_pmu_handle_pmcr(vcpu, val); 698 + kvm_vcpu_pmu_restore_guest(vcpu); 698 699 } else { 699 700 /* PMCR.P & PMCR.C are RAZ */ 700 701 val = __vcpu_sys_reg(vcpu, PMCR_EL0) ··· 851 850 if (p->is_write) { 852 851 kvm_pmu_set_counter_event_type(vcpu, p->regval, idx); 853 852 __vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK; 853 + kvm_vcpu_pmu_restore_guest(vcpu); 854 854 } else { 855 855 p->regval = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK; 856 856 } ··· 877 875 /* accessing PMCNTENSET_EL0 */ 878 876 __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val; 879 877 kvm_pmu_enable_counter(vcpu, val); 878 + kvm_vcpu_pmu_restore_guest(vcpu); 880 879 } else { 881 880 /* accessing PMCNTENCLR_EL0 */ 882 881 __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val; ··· 1010 1007 { SYS_DESC(SYS_PMEVTYPERn_EL0(n)), \ 1011 1008 access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), } 1012 1009 1010 + static bool trap_ptrauth(struct kvm_vcpu *vcpu, 1011 + struct sys_reg_params *p, 1012 + const struct sys_reg_desc *rd) 1013 + { 1014 + kvm_arm_vcpu_ptrauth_trap(vcpu); 1015 + 1016 + /* 1017 + * Return false for both cases as we never skip the trapped 1018 + * instruction: 1019 + * 1020 + * - Either we re-execute the same key register access instruction 1021 + * after enabling ptrauth. 1022 + * - Or an UNDEF is injected as ptrauth is not supported/enabled. 1023 + */ 1024 + return false; 1025 + } 1026 + 1027 + static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu, 1028 + const struct sys_reg_desc *rd) 1029 + { 1030 + return vcpu_has_ptrauth(vcpu) ? 
0 : REG_HIDDEN_USER | REG_HIDDEN_GUEST; 1031 + } 1032 + 1033 + #define __PTRAUTH_KEY(k) \ 1034 + { SYS_DESC(SYS_## k), trap_ptrauth, reset_unknown, k, \ 1035 + .visibility = ptrauth_visibility} 1036 + 1037 + #define PTRAUTH_KEY(k) \ 1038 + __PTRAUTH_KEY(k ## KEYLO_EL1), \ 1039 + __PTRAUTH_KEY(k ## KEYHI_EL1) 1040 + 1013 1041 static bool access_arch_timer(struct kvm_vcpu *vcpu, 1014 1042 struct sys_reg_params *p, 1015 1043 const struct sys_reg_desc *r) ··· 1078 1044 } 1079 1045 1080 1046 /* Read a sanitised cpufeature ID register by sys_reg_desc */ 1081 - static u64 read_id_reg(struct sys_reg_desc const *r, bool raz) 1047 + static u64 read_id_reg(const struct kvm_vcpu *vcpu, 1048 + struct sys_reg_desc const *r, bool raz) 1082 1049 { 1083 1050 u32 id = sys_reg((u32)r->Op0, (u32)r->Op1, 1084 1051 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2); 1085 1052 u64 val = raz ? 0 : read_sanitised_ftr_reg(id); 1086 1053 1087 - if (id == SYS_ID_AA64PFR0_EL1) { 1088 - if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT)) 1089 - kvm_debug("SVE unsupported for guests, suppressing\n"); 1090 - 1054 + if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) { 1091 1055 val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT); 1092 - } else if (id == SYS_ID_AA64ISAR1_EL1) { 1093 - const u64 ptrauth_mask = (0xfUL << ID_AA64ISAR1_APA_SHIFT) | 1094 - (0xfUL << ID_AA64ISAR1_API_SHIFT) | 1095 - (0xfUL << ID_AA64ISAR1_GPA_SHIFT) | 1096 - (0xfUL << ID_AA64ISAR1_GPI_SHIFT); 1097 - if (val & ptrauth_mask) 1098 - kvm_debug("ptrauth unsupported for guests, suppressing\n"); 1099 - val &= ~ptrauth_mask; 1056 + } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) { 1057 + val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) | 1058 + (0xfUL << ID_AA64ISAR1_API_SHIFT) | 1059 + (0xfUL << ID_AA64ISAR1_GPA_SHIFT) | 1060 + (0xfUL << ID_AA64ISAR1_GPI_SHIFT)); 1100 1061 } 1101 1062 1102 1063 return val; ··· 1107 1078 if (p->is_write) 1108 1079 return write_to_read_only(vcpu, p, r); 1109 1080 1110 - p->regval = read_id_reg(r, raz); 1081 + 
p->regval = read_id_reg(vcpu, r, raz); 1111 1082 return true; 1112 1083 } 1113 1084 ··· 1129 1100 static int reg_to_user(void __user *uaddr, const u64 *val, u64 id); 1130 1101 static u64 sys_reg_to_index(const struct sys_reg_desc *reg); 1131 1102 1103 + /* Visibility overrides for SVE-specific control registers */ 1104 + static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, 1105 + const struct sys_reg_desc *rd) 1106 + { 1107 + if (vcpu_has_sve(vcpu)) 1108 + return 0; 1109 + 1110 + return REG_HIDDEN_USER | REG_HIDDEN_GUEST; 1111 + } 1112 + 1113 + /* Visibility overrides for SVE-specific ID registers */ 1114 + static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu, 1115 + const struct sys_reg_desc *rd) 1116 + { 1117 + if (vcpu_has_sve(vcpu)) 1118 + return 0; 1119 + 1120 + return REG_HIDDEN_USER; 1121 + } 1122 + 1123 + /* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */ 1124 + static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu) 1125 + { 1126 + if (!vcpu_has_sve(vcpu)) 1127 + return 0; 1128 + 1129 + return read_sanitised_ftr_reg(SYS_ID_AA64ZFR0_EL1); 1130 + } 1131 + 1132 + static bool access_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, 1133 + struct sys_reg_params *p, 1134 + const struct sys_reg_desc *rd) 1135 + { 1136 + if (p->is_write) 1137 + return write_to_read_only(vcpu, p, rd); 1138 + 1139 + p->regval = guest_id_aa64zfr0_el1(vcpu); 1140 + return true; 1141 + } 1142 + 1143 + static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, 1144 + const struct sys_reg_desc *rd, 1145 + const struct kvm_one_reg *reg, void __user *uaddr) 1146 + { 1147 + u64 val; 1148 + 1149 + if (WARN_ON(!vcpu_has_sve(vcpu))) 1150 + return -ENOENT; 1151 + 1152 + val = guest_id_aa64zfr0_el1(vcpu); 1153 + return reg_to_user(uaddr, &val, reg->id); 1154 + } 1155 + 1156 + static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu, 1157 + const struct sys_reg_desc *rd, 1158 + const struct kvm_one_reg *reg, void __user *uaddr) 1159 + { 1160 + const u64 id = 
sys_reg_to_index(rd); 1161 + int err; 1162 + u64 val; 1163 + 1164 + if (WARN_ON(!vcpu_has_sve(vcpu))) 1165 + return -ENOENT; 1166 + 1167 + err = reg_from_user(&val, uaddr, id); 1168 + if (err) 1169 + return err; 1170 + 1171 + /* This is what we mean by invariant: you can't change it. */ 1172 + if (val != guest_id_aa64zfr0_el1(vcpu)) 1173 + return -EINVAL; 1174 + 1175 + return 0; 1176 + } 1177 + 1132 1178 /* 1133 1179 * cpufeature ID register user accessors 1134 1180 * ··· 1211 1107 * are stored, and for set_id_reg() we don't allow the effective value 1212 1108 * to be changed. 1213 1109 */ 1214 - static int __get_id_reg(const struct sys_reg_desc *rd, void __user *uaddr, 1110 + static int __get_id_reg(const struct kvm_vcpu *vcpu, 1111 + const struct sys_reg_desc *rd, void __user *uaddr, 1215 1112 bool raz) 1216 1113 { 1217 1114 const u64 id = sys_reg_to_index(rd); 1218 - const u64 val = read_id_reg(rd, raz); 1115 + const u64 val = read_id_reg(vcpu, rd, raz); 1219 1116 1220 1117 return reg_to_user(uaddr, &val, id); 1221 1118 } 1222 1119 1223 - static int __set_id_reg(const struct sys_reg_desc *rd, void __user *uaddr, 1120 + static int __set_id_reg(const struct kvm_vcpu *vcpu, 1121 + const struct sys_reg_desc *rd, void __user *uaddr, 1224 1122 bool raz) 1225 1123 { 1226 1124 const u64 id = sys_reg_to_index(rd); ··· 1234 1128 return err; 1235 1129 1236 1130 /* This is what we mean by invariant: you can't change it. 
*/ 1237 - if (val != read_id_reg(rd, raz)) 1131 + if (val != read_id_reg(vcpu, rd, raz)) 1238 1132 return -EINVAL; 1239 1133 1240 1134 return 0; ··· 1243 1137 static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1244 1138 const struct kvm_one_reg *reg, void __user *uaddr) 1245 1139 { 1246 - return __get_id_reg(rd, uaddr, false); 1140 + return __get_id_reg(vcpu, rd, uaddr, false); 1247 1141 } 1248 1142 1249 1143 static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1250 1144 const struct kvm_one_reg *reg, void __user *uaddr) 1251 1145 { 1252 - return __set_id_reg(rd, uaddr, false); 1146 + return __set_id_reg(vcpu, rd, uaddr, false); 1253 1147 } 1254 1148 1255 1149 static int get_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1256 1150 const struct kvm_one_reg *reg, void __user *uaddr) 1257 1151 { 1258 - return __get_id_reg(rd, uaddr, true); 1152 + return __get_id_reg(vcpu, rd, uaddr, true); 1259 1153 } 1260 1154 1261 1155 static int set_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1262 1156 const struct kvm_one_reg *reg, void __user *uaddr) 1263 1157 { 1264 - return __set_id_reg(rd, uaddr, true); 1158 + return __set_id_reg(vcpu, rd, uaddr, true); 1265 1159 } 1266 1160 1267 1161 static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, ··· 1449 1343 ID_SANITISED(ID_AA64PFR1_EL1), 1450 1344 ID_UNALLOCATED(4,2), 1451 1345 ID_UNALLOCATED(4,3), 1452 - ID_UNALLOCATED(4,4), 1346 + { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility }, 1453 1347 ID_UNALLOCATED(4,5), 1454 1348 ID_UNALLOCATED(4,6), 1455 1349 ID_UNALLOCATED(4,7), ··· 1486 1380 1487 1381 { SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 }, 1488 1382 { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 }, 1383 + { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = 
sve_visibility }, 1489 1384 { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 }, 1490 1385 { SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 }, 1491 1386 { SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 }, 1387 + 1388 + PTRAUTH_KEY(APIA), 1389 + PTRAUTH_KEY(APIB), 1390 + PTRAUTH_KEY(APDA), 1391 + PTRAUTH_KEY(APDB), 1392 + PTRAUTH_KEY(APGA), 1492 1393 1493 1394 { SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 }, 1494 1395 { SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 }, ··· 2037 1924 { 2038 1925 trace_kvm_sys_access(*vcpu_pc(vcpu), params, r); 2039 1926 1927 + /* Check for regs disabled by runtime config */ 1928 + if (sysreg_hidden_from_guest(vcpu, r)) { 1929 + kvm_inject_undefined(vcpu); 1930 + return; 1931 + } 1932 + 2040 1933 /* 2041 1934 * Not having an accessor means that we have configured a trap 2042 1935 * that we don't know how to handle. This certainly qualifies ··· 2554 2435 if (!r) 2555 2436 return get_invariant_sys_reg(reg->id, uaddr); 2556 2437 2438 + /* Check for regs disabled by runtime config */ 2439 + if (sysreg_hidden_from_user(vcpu, r)) 2440 + return -ENOENT; 2441 + 2557 2442 if (r->get_user) 2558 2443 return (r->get_user)(vcpu, r, reg, uaddr); 2559 2444 ··· 2578 2455 r = index_to_sys_reg_desc(vcpu, reg->id); 2579 2456 if (!r) 2580 2457 return set_invariant_sys_reg(reg->id, uaddr); 2458 + 2459 + /* Check for regs disabled by runtime config */ 2460 + if (sysreg_hidden_from_user(vcpu, r)) 2461 + return -ENOENT; 2581 2462 2582 2463 if (r->set_user) 2583 2464 return (r->set_user)(vcpu, r, reg, uaddr); ··· 2639 2512 return true; 2640 2513 } 2641 2514 2642 - static int walk_one_sys_reg(const struct sys_reg_desc *rd, 2515 + static int walk_one_sys_reg(const struct kvm_vcpu *vcpu, 2516 + const struct sys_reg_desc *rd, 2643 2517 u64 __user **uind, 2644 2518 unsigned int *total) 2645 2519 { ··· 2649 2521 * and for which no custom user accessor is provided. 
2650 2522 */ 2651 2523 if (!(rd->reg || rd->get_user)) 2524 + return 0; 2525 + 2526 + if (sysreg_hidden_from_user(vcpu, rd)) 2652 2527 return 0; 2653 2528 2654 2529 if (!copy_reg_to_user(rd, uind)) ··· 2682 2551 int cmp = cmp_sys_reg(i1, i2); 2683 2552 /* target-specific overrides generic entry. */ 2684 2553 if (cmp <= 0) 2685 - err = walk_one_sys_reg(i1, &uind, &total); 2554 + err = walk_one_sys_reg(vcpu, i1, &uind, &total); 2686 2555 else 2687 - err = walk_one_sys_reg(i2, &uind, &total); 2556 + err = walk_one_sys_reg(vcpu, i2, &uind, &total); 2688 2557 2689 2558 if (err) 2690 2559 return err;
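
The reworked read_id_reg() hides a feature from the guest by clearing the matching 4-bit ID register fields only when the vcpu lacks that feature. The sketch below shows the masking on plain integers. The field positions (SVE at bit 32 of ID_AA64PFR0_EL1; APA, API, GPA and GPI at bits 4, 8, 24 and 28 of ID_AA64ISAR1_EL1) are stated here as assumptions, and the `sketch_*` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions, standing in for the ID_AA64*_SHIFT macros. */
#define SKETCH_PFR0_SVE_SHIFT	32
#define SKETCH_ISAR1_APA_SHIFT	4
#define SKETCH_ISAR1_API_SHIFT	8
#define SKETCH_ISAR1_GPA_SHIFT	24
#define SKETCH_ISAR1_GPI_SHIFT	28

/* Clear the SVE field unless the vcpu has SVE enabled. */
static uint64_t sketch_hide_sve(uint64_t pfr0, bool has_sve)
{
	if (!has_sve)
		pfr0 &= ~(0xfULL << SKETCH_PFR0_SVE_SHIFT);
	return pfr0;
}

/* Clear all four pointer authentication fields unless ptrauth is enabled. */
static uint64_t sketch_hide_ptrauth(uint64_t isar1, bool has_ptrauth)
{
	if (!has_ptrauth)
		isar1 &= ~((0xfULL << SKETCH_ISAR1_APA_SHIFT) |
			   (0xfULL << SKETCH_ISAR1_API_SHIFT) |
			   (0xfULL << SKETCH_ISAR1_GPA_SHIFT) |
			   (0xfULL << SKETCH_ISAR1_GPI_SHIFT));
	return isar1;
}
```

Because the mask now depends on per-vcpu state, the same host value can be reported differently to two vcpus, which is why the diff threads `vcpu` through every caller.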

+25

arch/arm64/kvm/sys_regs.h

··· 64 64 const struct kvm_one_reg *reg, void __user *uaddr); 65 65 int (*set_user)(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 66 66 const struct kvm_one_reg *reg, void __user *uaddr); 67 + 68 + /* Return mask of REG_* runtime visibility overrides */ 69 + unsigned int (*visibility)(const struct kvm_vcpu *vcpu, 70 + const struct sys_reg_desc *rd); 67 71 }; 72 + 73 + #define REG_HIDDEN_USER (1 << 0) /* hidden from userspace ioctls */ 74 + #define REG_HIDDEN_GUEST (1 << 1) /* hidden from guest */ 68 75 69 76 static inline void print_sys_reg_instr(const struct sys_reg_params *p) 70 77 { ··· 107 100 BUG_ON(!r->reg); 108 101 BUG_ON(r->reg >= NR_SYS_REGS); 109 102 __vcpu_sys_reg(vcpu, r->reg) = r->val; 103 + } 104 + 105 + static inline bool sysreg_hidden_from_guest(const struct kvm_vcpu *vcpu, 106 + const struct sys_reg_desc *r) 107 + { 108 + if (likely(!r->visibility)) 109 + return false; 110 + 111 + return r->visibility(vcpu, r) & REG_HIDDEN_GUEST; 112 + } 113 + 114 + static inline bool sysreg_hidden_from_user(const struct kvm_vcpu *vcpu, 115 + const struct sys_reg_desc *r) 116 + { 117 + if (likely(!r->visibility)) 118 + return false; 119 + 120 + return r->visibility(vcpu, r) & REG_HIDDEN_USER; 110 121 } 111 122 112 123 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
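
The optional visibility() hook returns a mask of REG_HIDDEN_* bits, and the two inline helpers test one bit each. The following standalone sketch mirrors that shape. The two REG_HIDDEN_* values are copied from the hunk above; the `sketch_*` types and functions are hypothetical reductions of the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define REG_HIDDEN_USER		(1 << 0)	/* hidden from userspace ioctls */
#define REG_HIDDEN_GUEST	(1 << 1)	/* hidden from guest */

/* Hypothetical minimal vcpu: only the feature bit the callbacks consult. */
struct sketch_vis_vcpu {
	bool has_sve;
};

struct sketch_reg_desc {
	const char *name;
	/* Optional; NULL means the register is always visible. */
	unsigned int (*visibility)(const struct sketch_vis_vcpu *vcpu,
				   const struct sketch_reg_desc *rd);
};

/* Control registers: hidden from both user and guest without SVE. */
static unsigned int sketch_sve_visibility(const struct sketch_vis_vcpu *vcpu,
					  const struct sketch_reg_desc *rd)
{
	(void)rd;
	return vcpu->has_sve ? 0 : REG_HIDDEN_USER | REG_HIDDEN_GUEST;
}

/* ID registers: hidden from user only, the guest still reads zero. */
static unsigned int sketch_sve_id_visibility(const struct sketch_vis_vcpu *vcpu,
					     const struct sketch_reg_desc *rd)
{
	(void)rd;
	return vcpu->has_sve ? 0 : REG_HIDDEN_USER;
}

static bool sketch_hidden_from_guest(const struct sketch_vis_vcpu *vcpu,
				     const struct sketch_reg_desc *r)
{
	if (!r->visibility)
		return false;
	return r->visibility(vcpu, r) & REG_HIDDEN_GUEST;
}

static bool sketch_hidden_from_user(const struct sketch_vis_vcpu *vcpu,
				    const struct sketch_reg_desc *r)
{
	if (!r->visibility)
		return false;
	return r->visibility(vcpu, r) & REG_HIDDEN_USER;
}
```

Keeping the callback optional means existing descriptors need no change, and the common case costs only a NULL check.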

+10 -1

arch/powerpc/include/asm/kvm_host.h

··· 201 201 struct kref kref; 202 202 }; 203 203 204 + #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) 205 + 204 206 struct kvmppc_spapr_tce_table { 205 207 struct list_head list; 206 208 struct kvm *kvm; ··· 212 210 u64 offset; /* in pages */ 213 211 u64 size; /* window size in pages */ 214 212 struct list_head iommu_tables; 213 + struct mutex alloc_lock; 215 214 struct page *pages[0]; 216 215 }; 217 216 ··· 225 222 struct kvmppc_xive; 226 223 struct kvmppc_xive_vcpu; 227 224 extern struct kvm_device_ops kvm_xive_ops; 225 + extern struct kvm_device_ops kvm_xive_native_ops; 228 226 229 227 struct kvmppc_passthru_irqmap; 230 228 ··· 316 312 #endif 317 313 #ifdef CONFIG_KVM_XICS 318 314 struct kvmppc_xics *xics; 319 - struct kvmppc_xive *xive; 315 + struct kvmppc_xive *xive; /* Current XIVE device in use */ 316 + struct { 317 + struct kvmppc_xive *native; 318 + struct kvmppc_xive *xics_on_xive; 319 + } xive_devices; 320 320 struct kvmppc_passthru_irqmap *pimap; 321 321 #endif 322 322 struct kvmppc_ops *kvm_ops; ··· 457 449 #define KVMPPC_IRQ_DEFAULT 0 458 450 #define KVMPPC_IRQ_MPIC 1 459 451 #define KVMPPC_IRQ_XICS 2 /* Includes a XIVE option */ 452 + #define KVMPPC_IRQ_XIVE 3 /* XIVE native exploitation mode */ 460 453 461 454 #define MMIO_HPTE_CACHE_SIZE 4 462 455

+37 -4

arch/powerpc/include/asm/kvm_ppc.h

··· 197 197 (iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \ 198 198 (stt)->size, (ioba), (npages)) ? \ 199 199 H_PARAMETER : H_SUCCESS) 200 - extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce, 201 - unsigned long *ua, unsigned long **prmap); 202 - extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt, 203 - unsigned long idx, unsigned long tce); 204 200 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, 205 201 unsigned long ioba, unsigned long tce); 206 202 extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, ··· 269 273 u64 addr; 270 274 u64 length; 271 275 } vpaval; 276 + u64 xive_timaval[2]; 272 277 }; 273 278 274 279 struct kvmppc_ops { ··· 477 480 extern void kvm_hv_vm_deactivated(void); 478 481 extern bool kvm_hv_mode_active(void); 479 482 483 + extern void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu, 484 + struct kvm_nested_guest *nested); 485 + 480 486 #else 481 487 static inline void __init kvm_cma_reserve(void) 482 488 {} ··· 594 594 extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, 595 595 int level, bool line_status); 596 596 extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu); 597 + 598 + static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu) 599 + { 600 + return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE; 601 + } 602 + 603 + extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev, 604 + struct kvm_vcpu *vcpu, u32 cpu); 605 + extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu); 606 + extern void kvmppc_xive_native_init_module(void); 607 + extern void kvmppc_xive_native_exit_module(void); 608 + extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, 609 + union kvmppc_one_reg *val); 610 + extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, 611 + union kvmppc_one_reg *val); 612 + 597 613 #else 598 614 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server, 599 615 u32 priority) { return -1; } 
··· 633 617 static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, 634 618 int level, bool line_status) { return -ENODEV; } 635 619 static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { } 620 + 621 + static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu) 622 + { return 0; } 623 + static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev, 624 + struct kvm_vcpu *vcpu, u32 cpu) { return -EBUSY; } 625 + static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { } 626 + static inline void kvmppc_xive_native_init_module(void) { } 627 + static inline void kvmppc_xive_native_exit_module(void) { } 628 + static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, 629 + union kvmppc_one_reg *val) 630 + { return 0; } 631 + static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, 632 + union kvmppc_one_reg *val) 633 + { return -ENOENT; } 634 + 636 635 #endif /* CONFIG_KVM_XIVE */ 637 636 638 637 #if defined(CONFIG_PPC_POWERNV) && defined(CONFIG_KVM_BOOK3S_64_HANDLER) ··· 696 665 unsigned long pte_index); 697 666 long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags, 698 667 unsigned long pte_index); 668 + long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags, 669 + unsigned long dest, unsigned long src); 699 670 long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr, 700 671 unsigned long slb_v, unsigned int status, bool data); 701 672 unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);

+3

arch/powerpc/include/asm/xive.h

··· 23 23 * same offset regardless of where the code is executing 24 24 */ 25 25 extern void __iomem *xive_tima; 26 + extern unsigned long xive_tima_os; 26 27 27 28 /* 28 29 * Offset in the TM area of our current execution level (provided by ··· 74 73 u32 esc_irq; 75 74 atomic_t count; 76 75 atomic_t pending_count; 76 + u64 guest_qaddr; 77 + u32 guest_qshift; 77 78 }; 78 79 79 80 /* Global enable flags for the XIVE support */

+46

arch/powerpc/include/uapi/asm/kvm.h

··· 482 482 #define KVM_REG_PPC_ICP_PPRI_SHIFT 16 /* pending irq priority */ 483 483 #define KVM_REG_PPC_ICP_PPRI_MASK 0xff 484 484 485 + #define KVM_REG_PPC_VP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x8d) 486 + 485 487 /* Device control API: PPC-specific devices */ 486 488 #define KVM_DEV_MPIC_GRP_MISC 1 487 489 #define KVM_DEV_MPIC_BASE_ADDR 0 /* 64-bit */ ··· 678 676 #define KVM_XICS_PENDING (1ULL << 42) 679 677 #define KVM_XICS_PRESENTED (1ULL << 43) 680 678 #define KVM_XICS_QUEUED (1ULL << 44) 679 + 680 + /* POWER9 XIVE Native Interrupt Controller */ 681 + #define KVM_DEV_XIVE_GRP_CTRL 1 682 + #define KVM_DEV_XIVE_RESET 1 683 + #define KVM_DEV_XIVE_EQ_SYNC 2 684 + #define KVM_DEV_XIVE_GRP_SOURCE 2 /* 64-bit source identifier */ 685 + #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG 3 /* 64-bit source identifier */ 686 + #define KVM_DEV_XIVE_GRP_EQ_CONFIG 4 /* 64-bit EQ identifier */ 687 + #define KVM_DEV_XIVE_GRP_SOURCE_SYNC 5 /* 64-bit source identifier */ 688 + 689 + /* Layout of 64-bit XIVE source attribute values */ 690 + #define KVM_XIVE_LEVEL_SENSITIVE (1ULL << 0) 691 + #define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1) 692 + 693 + /* Layout of 64-bit XIVE source configuration attribute values */ 694 + #define KVM_XIVE_SOURCE_PRIORITY_SHIFT 0 695 + #define KVM_XIVE_SOURCE_PRIORITY_MASK 0x7 696 + #define KVM_XIVE_SOURCE_SERVER_SHIFT 3 697 + #define KVM_XIVE_SOURCE_SERVER_MASK 0xfffffff8ULL 698 + #define KVM_XIVE_SOURCE_MASKED_SHIFT 32 699 + #define KVM_XIVE_SOURCE_MASKED_MASK 0x100000000ULL 700 + #define KVM_XIVE_SOURCE_EISN_SHIFT 33 701 + #define KVM_XIVE_SOURCE_EISN_MASK 0xfffffffe00000000ULL 702 + 703 + /* Layout of 64-bit EQ identifier */ 704 + #define KVM_XIVE_EQ_PRIORITY_SHIFT 0 705 + #define KVM_XIVE_EQ_PRIORITY_MASK 0x7 706 + #define KVM_XIVE_EQ_SERVER_SHIFT 3 707 + #define KVM_XIVE_EQ_SERVER_MASK 0xfffffff8ULL 708 + 709 + /* Layout of EQ configuration values (64 bytes) */ 710 + struct kvm_ppc_xive_eq { 711 + __u32 flags; 712 + __u32 qshift; 713 + __u64 qaddr; 
714 + __u32 qtoggle; 715 + __u32 qindex; 716 + __u8 pad[40]; 717 + }; 718 + 719 + #define KVM_XIVE_EQ_ALWAYS_NOTIFY 0x00000001 720 + 721 + #define KVM_XIVE_TIMA_PAGE_OFFSET 0 722 + #define KVM_XIVE_ESB_PAGE_OFFSET 4 681 723 682 724 #endif /* __LINUX_KVM_POWERPC_H */
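
The KVM_XIVE_SOURCE_* shift and mask pairs describe how a 64-bit source configuration value is laid out. A minimal sketch of packing and unpacking such a value is shown below. The macro values are copied from the hunk above; the `sketch_*` helpers are hypothetical and are not part of the uapi header.

```c
#include <stdint.h>

/* Values copied from the uapi hunk above. */
#define KVM_XIVE_SOURCE_PRIORITY_SHIFT	0
#define KVM_XIVE_SOURCE_PRIORITY_MASK	0x7
#define KVM_XIVE_SOURCE_SERVER_SHIFT	3
#define KVM_XIVE_SOURCE_SERVER_MASK	0xfffffff8ULL
#define KVM_XIVE_SOURCE_MASKED_SHIFT	32
#define KVM_XIVE_SOURCE_MASKED_MASK	0x100000000ULL
#define KVM_XIVE_SOURCE_EISN_SHIFT	33
#define KVM_XIVE_SOURCE_EISN_MASK	0xfffffffe00000000ULL

/* Hypothetical helper: build a source configuration attribute value. */
static uint64_t sketch_xive_pack(uint64_t priority, uint64_t server,
				 uint64_t masked, uint64_t eisn)
{
	return ((priority << KVM_XIVE_SOURCE_PRIORITY_SHIFT) &
		KVM_XIVE_SOURCE_PRIORITY_MASK) |
	       ((server << KVM_XIVE_SOURCE_SERVER_SHIFT) &
		KVM_XIVE_SOURCE_SERVER_MASK) |
	       ((masked << KVM_XIVE_SOURCE_MASKED_SHIFT) &
		KVM_XIVE_SOURCE_MASKED_MASK) |
	       ((eisn << KVM_XIVE_SOURCE_EISN_SHIFT) &
		KVM_XIVE_SOURCE_EISN_MASK);
}

/* Hypothetical helpers: extract one field each by mask, then shift. */
static uint64_t sketch_xive_priority(uint64_t val)
{
	return (val & KVM_XIVE_SOURCE_PRIORITY_MASK) >>
	       KVM_XIVE_SOURCE_PRIORITY_SHIFT;
}

static uint64_t sketch_xive_server(uint64_t val)
{
	return (val & KVM_XIVE_SOURCE_SERVER_MASK) >>
	       KVM_XIVE_SOURCE_SERVER_SHIFT;
}

static uint64_t sketch_xive_masked(uint64_t val)
{
	return (val & KVM_XIVE_SOURCE_MASKED_MASK) >>
	       KVM_XIVE_SOURCE_MASKED_SHIFT;
}

static uint64_t sketch_xive_eisn(uint64_t val)
{
	return (val & KVM_XIVE_SOURCE_EISN_MASK) >>
	       KVM_XIVE_SOURCE_EISN_SHIFT;
}
```

The masks are applied after shifting, so an out-of-range field is truncated rather than spilling into its neighbour.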

+1 -1

arch/powerpc/kvm/Makefile

··· 94 94 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \ 95 95 book3s_xics.o 96 96 97 - kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o 97 + kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o book3s_xive_native.o 98 98 kvm-book3s_64-objs-$(CONFIG_SPAPR_TCE_IOMMU) += book3s_64_vio.o 99 99 100 100 kvm-book3s_64-module-objs := \

+41 -1

arch/powerpc/kvm/book3s.c

··· 651 651 *val = get_reg_val(id, kvmppc_xics_get_icp(vcpu)); 652 652 break; 653 653 #endif /* CONFIG_KVM_XICS */ 654 + #ifdef CONFIG_KVM_XIVE 655 + case KVM_REG_PPC_VP_STATE: 656 + if (!vcpu->arch.xive_vcpu) { 657 + r = -ENXIO; 658 + break; 659 + } 660 + if (xive_enabled()) 661 + r = kvmppc_xive_native_get_vp(vcpu, val); 662 + else 663 + r = -ENXIO; 664 + break; 665 + #endif /* CONFIG_KVM_XIVE */ 654 666 case KVM_REG_PPC_FSCR: 655 667 *val = get_reg_val(id, vcpu->arch.fscr); 656 668 break; ··· 736 724 r = kvmppc_xics_set_icp(vcpu, set_reg_val(id, *val)); 737 725 break; 738 726 #endif /* CONFIG_KVM_XICS */ 727 + #ifdef CONFIG_KVM_XIVE 728 + case KVM_REG_PPC_VP_STATE: 729 + if (!vcpu->arch.xive_vcpu) { 730 + r = -ENXIO; 731 + break; 732 + } 733 + if (xive_enabled()) 734 + r = kvmppc_xive_native_set_vp(vcpu, val); 735 + else 736 + r = -ENXIO; 737 + break; 738 + #endif /* CONFIG_KVM_XIVE */ 739 739 case KVM_REG_PPC_FSCR: 740 740 vcpu->arch.fscr = set_reg_val(id, *val); 741 741 break; ··· 915 891 kvmppc_rtas_tokens_free(kvm); 916 892 WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables)); 917 893 #endif 894 + 895 + #ifdef CONFIG_KVM_XICS 896 + /* 897 + * Free the XIVE devices which are not directly freed by the 898 + * device 'release' method 899 + */ 900 + kfree(kvm->arch.xive_devices.native); 901 + kvm->arch.xive_devices.native = NULL; 902 + kfree(kvm->arch.xive_devices.xics_on_xive); 903 + kvm->arch.xive_devices.xics_on_xive = NULL; 904 + #endif /* CONFIG_KVM_XICS */ 918 905 } 919 906 920 907 int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu) ··· 1085 1050 if (xics_on_xive()) { 1086 1051 kvmppc_xive_init_module(); 1087 1052 kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS); 1053 + kvmppc_xive_native_init_module(); 1054 + kvm_register_device_ops(&kvm_xive_native_ops, 1055 + KVM_DEV_TYPE_XIVE); 1088 1056 } else 1089 1057 #endif 1090 1058 kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS); ··· 1098 1060 static void kvmppc_book3s_exit(void) 1099 1061 { 1100 
1062 #ifdef CONFIG_KVM_XICS 1101 - if (xics_on_xive()) 1063 + if (xics_on_xive()) { 1102 1064 kvmppc_xive_exit_module(); 1065 + kvmppc_xive_native_exit_module(); 1066 + } 1103 1067 #endif 1104 1068 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER 1105 1069 kvmppc_book3s_exit_pr();

+78 -18

arch/powerpc/kvm/book3s_64_vio.c

··· 228 228 unsigned long i, npages = kvmppc_tce_pages(stt->size); 229 229 230 230 for (i = 0; i < npages; i++) 231 - __free_page(stt->pages[i]); 231 + if (stt->pages[i]) 232 + __free_page(stt->pages[i]); 232 233 233 234 kfree(stt); 235 + } 236 + 237 + static struct page *kvm_spapr_get_tce_page(struct kvmppc_spapr_tce_table *stt, 238 + unsigned long sttpage) 239 + { 240 + struct page *page = stt->pages[sttpage]; 241 + 242 + if (page) 243 + return page; 244 + 245 + mutex_lock(&stt->alloc_lock); 246 + page = stt->pages[sttpage]; 247 + if (!page) { 248 + page = alloc_page(GFP_KERNEL | __GFP_ZERO); 249 + WARN_ON_ONCE(!page); 250 + if (page) 251 + stt->pages[sttpage] = page; 252 + } 253 + mutex_unlock(&stt->alloc_lock); 254 + 255 + return page; 234 256 } 235 257 236 258 static vm_fault_t kvm_spapr_tce_fault(struct vm_fault *vmf) ··· 263 241 if (vmf->pgoff >= kvmppc_tce_pages(stt->size)) 264 242 return VM_FAULT_SIGBUS; 265 243 266 - page = stt->pages[vmf->pgoff]; 244 + page = kvm_spapr_get_tce_page(stt, vmf->pgoff); 245 + if (!page) 246 + return VM_FAULT_OOM; 247 + 267 248 get_page(page); 268 249 vmf->page = page; 269 250 return 0; ··· 321 296 struct kvmppc_spapr_tce_table *siter; 322 297 unsigned long npages, size = args->size; 323 298 int ret = -ENOMEM; 324 - int i; 325 299 326 300 if (!args->size || args->page_shift < 12 || args->page_shift > 34 || 327 301 (args->offset + args->size > (ULLONG_MAX >> args->page_shift))) ··· 342 318 stt->offset = args->offset; 343 319 stt->size = size; 344 320 stt->kvm = kvm; 321 + mutex_init(&stt->alloc_lock); 345 322 INIT_LIST_HEAD_RCU(&stt->iommu_tables); 346 - 347 - for (i = 0; i < npages; i++) { 348 - stt->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); 349 - if (!stt->pages[i]) 350 - goto fail; 351 - } 352 323 353 324 mutex_lock(&kvm->lock); 354 325 ··· 371 352 if (ret >= 0) 372 353 return ret; 373 354 374 - fail: 375 - for (i = 0; i < npages; i++) 376 - if (stt->pages[i]) 377 - __free_page(stt->pages[i]); 378 - 379 355 
kfree(stt); 380 356 fail_acct: 381 357 kvmppc_account_memlimit(kvmppc_stt_pages(npages), false); 382 358 return ret; 359 + } 360 + 361 + static long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce, 362 + unsigned long *ua) 363 + { 364 + unsigned long gfn = tce >> PAGE_SHIFT; 365 + struct kvm_memory_slot *memslot; 366 + 367 + memslot = search_memslots(kvm_memslots(kvm), gfn); 368 + if (!memslot) 369 + return -EINVAL; 370 + 371 + *ua = __gfn_to_hva_memslot(memslot, gfn) | 372 + (tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)); 373 + 374 + return 0; 383 375 } 384 376 385 377 static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, ··· 408 378 if (iommu_tce_check_gpa(stt->page_shift, gpa)) 409 379 return H_TOO_HARD; 410 380 411 - if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL)) 381 + if (kvmppc_tce_to_ua(stt->kvm, tce, &ua)) 412 382 return H_TOO_HARD; 413 383 414 384 list_for_each_entry_rcu(stit, &stt->iommu_tables, next) { ··· 425 395 } 426 396 427 397 return H_SUCCESS; 398 + } 399 + 400 + /* 401 + * Handles TCE requests for emulated devices. 402 + * Puts guest TCE values to the table and expects user space to convert them. 403 + * Cannot fail so kvmppc_tce_validate must be called before it. 
404 + */ 405 + static void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt, 406 + unsigned long idx, unsigned long tce) 407 + { 408 + struct page *page; 409 + u64 *tbl; 410 + unsigned long sttpage; 411 + 412 + idx -= stt->offset; 413 + sttpage = idx / TCES_PER_PAGE; 414 + page = stt->pages[sttpage]; 415 + 416 + if (!page) { 417 + /* We allow any TCE, not just with read|write permissions */ 418 + if (!tce) 419 + return; 420 + 421 + page = kvm_spapr_get_tce_page(stt, sttpage); 422 + if (!page) 423 + return; 424 + } 425 + tbl = page_to_virt(page); 426 + 427 + tbl[idx % TCES_PER_PAGE] = tce; 428 428 } 429 429 430 430 static void kvmppc_clear_tce(struct mm_struct *mm, struct iommu_table *tbl, ··· 611 551 612 552 dir = iommu_tce_direction(tce); 613 553 614 - if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) { 554 + if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) { 615 555 ret = H_PARAMETER; 616 556 goto unlock_exit; 617 557 } ··· 672 612 return ret; 673 613 674 614 idx = srcu_read_lock(&vcpu->kvm->srcu); 675 - if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) { 615 + if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua)) { 676 616 ret = H_TOO_HARD; 677 617 goto unlock_exit; 678 618 } ··· 707 647 } 708 648 tce = be64_to_cpu(tce); 709 649 710 - if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) 650 + if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) 711 651 return H_PARAMETER; 712 652 713 653 list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
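
Note on the on-demand allocation above: kvm_spapr_get_tce_page() checks the slot without the lock, takes alloc_lock, re-checks, and only then allocates. kvmppc_tce_put() skips allocation entirely when a zero TCE is written to a page that does not exist yet. The following is a minimal user-space sketch of that pattern, not the kernel code. It assumes a pthread mutex in place of the kernel mutex and calloc() in place of alloc_page(GFP_KERNEL | __GFP_ZERO). The names tce_table, get_tce_page, put_tce and NPAGES are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define TCES_PER_PAGE 512   /* assumes 4096-byte pages and 8-byte entries */
#define NPAGES 4            /* hypothetical table size, in backing pages */

struct tce_table {
	pthread_mutex_t alloc_lock;
	_Atomic(uint64_t *) pages[NPAGES];   /* NULL until first used */
};

static void tce_table_init(struct tce_table *t)
{
	pthread_mutex_init(&t->alloc_lock, NULL);
	for (int i = 0; i < NPAGES; i++)
		atomic_init(&t->pages[i], NULL);
}

/* Return the backing page, allocating it zero-filled on first use. */
static uint64_t *get_tce_page(struct tce_table *t, unsigned long pg)
{
	uint64_t *page;

	assert(pg < NPAGES);
	page = atomic_load(&t->pages[pg]);
	if (page)
		return page;                     /* fast path, no lock taken */

	pthread_mutex_lock(&t->alloc_lock);
	page = atomic_load(&t->pages[pg]);   /* re-check under the lock */
	if (!page) {
		page = calloc(TCES_PER_PAGE, sizeof(uint64_t));
		if (page)
			atomic_store(&t->pages[pg], page);
	}
	pthread_mutex_unlock(&t->alloc_lock);

	return page;                         /* NULL only if allocation failed */
}

/* Store one entry; a zero entry never triggers an allocation. */
static void put_tce(struct tce_table *t, unsigned long idx, uint64_t tce)
{
	unsigned long pg = idx / TCES_PER_PAGE;
	uint64_t *page = atomic_load(&t->pages[pg]);

	if (!page) {
		if (!tce)
			return;
		page = get_tce_page(t, pg);
		if (!page)
			return;
	}
	page[idx % TCES_PER_PAGE] = tce;
}
```

In this sketch a table whose entries are never written costs no backing pages, which appears to be the motivation for removing the up-front allocation loop.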

+67 -38

arch/powerpc/kvm/book3s_64_vio_hv.c

··· 66 66 67 67 #endif 68 68 69 - #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) 70 - 71 69 /* 72 70 * Finds a TCE table descriptor by LIOBN. 73 71 * ··· 86 88 EXPORT_SYMBOL_GPL(kvmppc_find_table); 87 89 88 90 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 91 + static long kvmppc_rm_tce_to_ua(struct kvm *kvm, unsigned long tce, 92 + unsigned long *ua, unsigned long **prmap) 93 + { 94 + unsigned long gfn = tce >> PAGE_SHIFT; 95 + struct kvm_memory_slot *memslot; 96 + 97 + memslot = search_memslots(kvm_memslots_raw(kvm), gfn); 98 + if (!memslot) 99 + return -EINVAL; 100 + 101 + *ua = __gfn_to_hva_memslot(memslot, gfn) | 102 + (tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)); 103 + 104 + if (prmap) 105 + *prmap = &memslot->arch.rmap[gfn - memslot->base_gfn]; 106 + 107 + return 0; 108 + } 109 + 89 110 /* 90 111 * Validates TCE address. 91 112 * At the moment flags and page mask are validated. ··· 128 111 if (iommu_tce_check_gpa(stt->page_shift, gpa)) 129 112 return H_PARAMETER; 130 113 131 - if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL)) 114 + if (kvmppc_rm_tce_to_ua(stt->kvm, tce, &ua, NULL)) 132 115 return H_TOO_HARD; 133 116 134 117 list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { ··· 146 129 147 130 return H_SUCCESS; 148 131 } 149 - #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ 150 132 151 133 /* Note on the use of page_address() in real mode, 152 134 * ··· 177 161 /* 178 162 * Handles TCE requests for emulated devices. 179 163 * Puts guest TCE values to the table and expects user space to convert them. 180 - * Called in both real and virtual modes. 181 - * Cannot fail so kvmppc_tce_validate must be called before it. 182 - * 183 - * WARNING: This will be called in real-mode on HV KVM and virtual 184 - * mode on PR KVM 164 + * Cannot fail so kvmppc_rm_tce_validate must be called before it. 
185 165 */ 186 - void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt, 166 + static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt, 187 167 unsigned long idx, unsigned long tce) 188 168 { 189 169 struct page *page; ··· 187 175 188 176 idx -= stt->offset; 189 177 page = stt->pages[idx / TCES_PER_PAGE]; 178 + /* 179 + * page must not be NULL in real mode, 180 + * kvmppc_rm_ioba_validate() must have taken care of this. 181 + */ 182 + WARN_ON_ONCE_RM(!page); 190 183 tbl = kvmppc_page_address(page); 191 184 192 185 tbl[idx % TCES_PER_PAGE] = tce; 193 186 } 194 - EXPORT_SYMBOL_GPL(kvmppc_tce_put); 195 187 196 - long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce, 197 - unsigned long *ua, unsigned long **prmap) 188 + /* 189 + * TCEs pages are allocated in kvmppc_rm_tce_put() which won't be able to do so 190 + * in real mode. 191 + * Check if kvmppc_rm_tce_put() can succeed in real mode, i.e. a TCEs page is 192 + * allocated or not required (when clearing a tce entry). 193 + */ 194 + static long kvmppc_rm_ioba_validate(struct kvmppc_spapr_tce_table *stt, 195 + unsigned long ioba, unsigned long npages, bool clearing) 198 196 { 199 - unsigned long gfn = tce >> PAGE_SHIFT; 200 - struct kvm_memory_slot *memslot; 197 + unsigned long i, idx, sttpage, sttpages; 198 + unsigned long ret = kvmppc_ioba_validate(stt, ioba, npages); 201 199 202 - memslot = search_memslots(kvm_memslots(kvm), gfn); 203 - if (!memslot) 204 - return -EINVAL; 200 + if (ret) 201 + return ret; 202 + /* 203 + * clearing==true says kvmppc_rm_tce_put won't be allocating pages 204 + * for empty tces. 
205 + */ 206 + if (clearing) 207 + return H_SUCCESS; 205 208 206 - *ua = __gfn_to_hva_memslot(memslot, gfn) | 207 - (tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)); 209 + idx = (ioba >> stt->page_shift) - stt->offset; 210 + sttpage = idx / TCES_PER_PAGE; 211 + sttpages = _ALIGN_UP(idx % TCES_PER_PAGE + npages, TCES_PER_PAGE) / 212 + TCES_PER_PAGE; 213 + for (i = sttpage; i < sttpage + sttpages; ++i) 214 + if (!stt->pages[i]) 215 + return H_TOO_HARD; 208 216 209 - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 210 - if (prmap) 211 - *prmap = &memslot->arch.rmap[gfn - memslot->base_gfn]; 212 - #endif 213 - 214 - return 0; 217 + return H_SUCCESS; 215 218 } 216 - EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua); 217 219 218 - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 219 220 static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl, 220 221 unsigned long entry, unsigned long *hpa, 221 222 enum dma_data_direction *direction) ··· 406 381 if (!stt) 407 382 return H_TOO_HARD; 408 383 409 - ret = kvmppc_ioba_validate(stt, ioba, 1); 384 + ret = kvmppc_rm_ioba_validate(stt, ioba, 1, tce == 0); 410 385 if (ret != H_SUCCESS) 411 386 return ret; 412 387 ··· 415 390 return ret; 416 391 417 392 dir = iommu_tce_direction(tce); 418 - if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) 393 + if ((dir != DMA_NONE) && kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) 419 394 return H_PARAMETER; 420 395 421 396 entry = ioba >> stt->page_shift; ··· 434 409 } 435 410 } 436 411 437 - kvmppc_tce_put(stt, entry, tce); 412 + kvmppc_rm_tce_put(stt, entry, tce); 438 413 439 414 return H_SUCCESS; 440 415 } ··· 505 480 if (tce_list & (SZ_4K - 1)) 506 481 return H_PARAMETER; 507 482 508 - ret = kvmppc_ioba_validate(stt, ioba, npages); 483 + ret = kvmppc_rm_ioba_validate(stt, ioba, npages, false); 509 484 if (ret != H_SUCCESS) 510 485 return ret; 511 486 ··· 517 492 */ 518 493 struct mm_iommu_table_group_mem_t *mem; 519 494 520 - if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) 495 + 
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) 521 496 return H_TOO_HARD; 522 497 523 498 mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K); ··· 533 508 * We do not require memory to be preregistered in this case 534 509 * so lock rmap and do __find_linux_pte_or_hugepte(). 535 510 */ 536 - if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) 511 + if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) 537 512 return H_TOO_HARD; 538 513 539 514 rmap = (void *) vmalloc_to_phys(rmap); ··· 567 542 unsigned long tce = be64_to_cpu(((u64 *)tces)[i]); 568 543 569 544 ua = 0; 570 - if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) 545 + if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) 571 546 return H_PARAMETER; 572 547 573 548 list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { ··· 582 557 } 583 558 } 584 559 585 - kvmppc_tce_put(stt, entry + i, tce); 560 + kvmppc_rm_tce_put(stt, entry + i, tce); 586 561 } 587 562 588 563 unlock_exit: ··· 608 583 if (!stt) 609 584 return H_TOO_HARD; 610 585 611 - ret = kvmppc_ioba_validate(stt, ioba, npages); 586 + ret = kvmppc_rm_ioba_validate(stt, ioba, npages, tce_value == 0); 612 587 if (ret != H_SUCCESS) 613 588 return ret; 614 589 ··· 635 610 } 636 611 637 612 for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift)) 638 - kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value); 613 + kvmppc_rm_tce_put(stt, ioba >> stt->page_shift, tce_value); 639 614 640 615 return H_SUCCESS; 641 616 } ··· 660 635 661 636 idx = (ioba >> stt->page_shift) - stt->offset; 662 637 page = stt->pages[idx / TCES_PER_PAGE]; 638 + if (!page) { 639 + vcpu->arch.regs.gpr[4] = 0; 640 + return H_SUCCESS; 641 + } 663 642 tbl = (u64 *)page_address(page); 664 643 665 644 vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE];
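
Note on kvmppc_rm_ioba_validate() above: because real mode cannot allocate, the function works out which backing pages a request touches and returns H_TOO_HARD if any of them is missing. The arithmetic can be sketched as below. This is an illustration only, assuming 512 entries per backing page (4 KiB pages, 8-byte entries). The helper names tce_span_pages and tce_range_backed are hypothetical, and ALIGN_UP stands in for the kernel's _ALIGN_UP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TCES_PER_PAGE 512UL   /* assumes 4 KiB backing pages, 8-byte entries */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Number of backing pages touched by entries [idx, idx + npages). */
static unsigned long tce_span_pages(unsigned long idx, unsigned long npages)
{
	return ALIGN_UP(idx % TCES_PER_PAGE + npages, TCES_PER_PAGE) /
		TCES_PER_PAGE;
}

/* True if every backing page the request touches is already allocated. */
static bool tce_range_backed(void *const *pages, unsigned long idx,
			     unsigned long npages)
{
	unsigned long first = idx / TCES_PER_PAGE;
	unsigned long count = tce_span_pages(idx, npages);

	for (unsigned long i = first; i < first + count; i++)
		if (!pages[i])
			return false;   /* the diff returns H_TOO_HARD here */
	return true;
}
```

For example, a request for 4 entries starting at index 510 covers entries 510 to 513, so it touches backing pages 0 and 1 and needs both to exist.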

+97 -55

arch/powerpc/kvm/book3s_hv.c

··· 750 750 /* 751 751 * Ensure that the read of vcore->dpdes comes after the read 752 752 * of vcpu->doorbell_request. This barrier matches the 753 - * smb_wmb() in kvmppc_guest_entry_inject(). 753 + * smp_wmb() in kvmppc_guest_entry_inject(). 754 754 */ 755 755 smp_rmb(); 756 756 vc = vcpu->arch.vcore; ··· 800 800 default: 801 801 return H_TOO_HARD; 802 802 } 803 + } 804 + 805 + /* Copy guest memory in place - must reside within a single memslot */ 806 + static int kvmppc_copy_guest(struct kvm *kvm, gpa_t to, gpa_t from, 807 + unsigned long len) 808 + { 809 + struct kvm_memory_slot *to_memslot = NULL; 810 + struct kvm_memory_slot *from_memslot = NULL; 811 + unsigned long to_addr, from_addr; 812 + int r; 813 + 814 + /* Get HPA for from address */ 815 + from_memslot = gfn_to_memslot(kvm, from >> PAGE_SHIFT); 816 + if (!from_memslot) 817 + return -EFAULT; 818 + if ((from + len) >= ((from_memslot->base_gfn + from_memslot->npages) 819 + << PAGE_SHIFT)) 820 + return -EINVAL; 821 + from_addr = gfn_to_hva_memslot(from_memslot, from >> PAGE_SHIFT); 822 + if (kvm_is_error_hva(from_addr)) 823 + return -EFAULT; 824 + from_addr |= (from & (PAGE_SIZE - 1)); 825 + 826 + /* Get HPA for to address */ 827 + to_memslot = gfn_to_memslot(kvm, to >> PAGE_SHIFT); 828 + if (!to_memslot) 829 + return -EFAULT; 830 + if ((to + len) >= ((to_memslot->base_gfn + to_memslot->npages) 831 + << PAGE_SHIFT)) 832 + return -EINVAL; 833 + to_addr = gfn_to_hva_memslot(to_memslot, to >> PAGE_SHIFT); 834 + if (kvm_is_error_hva(to_addr)) 835 + return -EFAULT; 836 + to_addr |= (to & (PAGE_SIZE - 1)); 837 + 838 + /* Perform copy */ 839 + r = raw_copy_in_user((void __user *)to_addr, (void __user *)from_addr, 840 + len); 841 + if (r) 842 + return -EFAULT; 843 + mark_page_dirty(kvm, to >> PAGE_SHIFT); 844 + return 0; 845 + } 846 + 847 + static long kvmppc_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags, 848 + unsigned long dest, unsigned long src) 849 + { 850 + u64 pg_sz = SZ_4K; /* 4K page size */ 851 
+ u64 pg_mask = SZ_4K - 1; 852 + int ret; 853 + 854 + /* Check for invalid flags (H_PAGE_SET_LOANED covers all CMO flags) */ 855 + if (flags & ~(H_ICACHE_INVALIDATE | H_ICACHE_SYNCHRONIZE | 856 + H_ZERO_PAGE | H_COPY_PAGE | H_PAGE_SET_LOANED)) 857 + return H_PARAMETER; 858 + 859 + /* dest (and src if copy_page flag set) must be page aligned */ 860 + if ((dest & pg_mask) || ((flags & H_COPY_PAGE) && (src & pg_mask))) 861 + return H_PARAMETER; 862 + 863 + /* zero and/or copy the page as determined by the flags */ 864 + if (flags & H_COPY_PAGE) { 865 + ret = kvmppc_copy_guest(vcpu->kvm, dest, src, pg_sz); 866 + if (ret < 0) 867 + return H_PARAMETER; 868 + } else if (flags & H_ZERO_PAGE) { 869 + ret = kvm_clear_guest(vcpu->kvm, dest, pg_sz); 870 + if (ret < 0) 871 + return H_PARAMETER; 872 + } 873 + 874 + /* We can ignore the remaining flags */ 875 + 876 + return H_SUCCESS; 803 877 } 804 878 805 879 static int kvm_arch_vcpu_yield_to(struct kvm_vcpu *target) ··· 1078 1004 if (nesting_enabled(vcpu->kvm)) 1079 1005 ret = kvmhv_copy_tofrom_guest_nested(vcpu); 1080 1006 break; 1007 + case H_PAGE_INIT: 1008 + ret = kvmppc_h_page_init(vcpu, kvmppc_get_gpr(vcpu, 4), 1009 + kvmppc_get_gpr(vcpu, 5), 1010 + kvmppc_get_gpr(vcpu, 6)); 1011 + break; 1081 1012 default: 1082 1013 return RESUME_HOST; 1083 1014 } ··· 1127 1048 case H_IPOLL: 1128 1049 case H_XIRR_X: 1129 1050 #endif 1051 + case H_PAGE_INIT: 1130 1052 return 1; 1131 1053 } 1132 1054 ··· 2585 2505 } 2586 2506 } 2587 2507 2588 - static void kvmppc_radix_check_need_tlb_flush(struct kvm *kvm, int pcpu, 2589 - struct kvm_nested_guest *nested) 2590 - { 2591 - cpumask_t *need_tlb_flush; 2592 - int lpid; 2593 - 2594 - if (!cpu_has_feature(CPU_FTR_HVMODE)) 2595 - return; 2596 - 2597 - if (cpu_has_feature(CPU_FTR_ARCH_300)) 2598 - pcpu &= ~0x3UL; 2599 - 2600 - if (nested) { 2601 - lpid = nested->shadow_lpid; 2602 - need_tlb_flush = &nested->need_tlb_flush; 2603 - } else { 2604 - lpid = kvm->arch.lpid; 2605 - need_tlb_flush = 
&kvm->arch.need_tlb_flush; 2606 - } 2607 - 2608 - mtspr(SPRN_LPID, lpid); 2609 - isync(); 2610 - smp_mb(); 2611 - 2612 - if (cpumask_test_cpu(pcpu, need_tlb_flush)) { 2613 - radix__local_flush_tlb_lpid_guest(lpid); 2614 - /* Clear the bit after the TLB flush */ 2615 - cpumask_clear_cpu(pcpu, need_tlb_flush); 2616 - } 2617 - } 2618 - 2619 2508 static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc) 2620 2509 { 2621 2510 int cpu; ··· 3278 3229 for (sub = 0; sub < core_info.n_subcores; ++sub) 3279 3230 spin_unlock(&core_info.vc[sub]->lock); 3280 3231 3281 - if (kvm_is_radix(vc->kvm)) { 3282 - /* 3283 - * Do we need to flush the process scoped TLB for the LPAR? 3284 - * 3285 - * On POWER9, individual threads can come in here, but the 3286 - * TLB is shared between the 4 threads in a core, hence 3287 - * invalidating on one thread invalidates for all. 3288 - * Thus we make all 4 threads use the same bit here. 3289 - * 3290 - * Hash must be flushed in realmode in order to use tlbiel. 
3291 - */ 3292 - kvmppc_radix_check_need_tlb_flush(vc->kvm, pcpu, NULL); 3293 - } 3232 + guest_enter_irqoff(); 3233 + 3234 + srcu_idx = srcu_read_lock(&vc->kvm->srcu); 3235 + 3236 + this_cpu_disable_ftrace(); 3294 3237 3295 3238 /* 3296 3239 * Interrupts will be enabled once we get into the guest, ··· 3290 3249 */ 3291 3250 trace_hardirqs_on(); 3292 3251 3293 - guest_enter_irqoff(); 3294 - 3295 - srcu_idx = srcu_read_lock(&vc->kvm->srcu); 3296 - 3297 - this_cpu_disable_ftrace(); 3298 - 3299 3252 trap = __kvmppc_vcore_entry(); 3253 + 3254 + trace_hardirqs_off(); 3300 3255 3301 3256 this_cpu_enable_ftrace(); 3302 3257 3303 3258 srcu_read_unlock(&vc->kvm->srcu, srcu_idx); 3304 3259 3305 - trace_hardirqs_off(); 3306 3260 set_irq_happened(trap); 3307 3261 3308 3262 spin_lock(&vc->lock); ··· 3550 3514 #ifdef CONFIG_ALTIVEC 3551 3515 load_vr_state(&vcpu->arch.vr); 3552 3516 #endif 3517 + mtspr(SPRN_VRSAVE, vcpu->arch.vrsave); 3553 3518 3554 3519 mtspr(SPRN_DSCR, vcpu->arch.dscr); 3555 3520 mtspr(SPRN_IAMR, vcpu->arch.iamr); ··· 3642 3605 #ifdef CONFIG_ALTIVEC 3643 3606 store_vr_state(&vcpu->arch.vr); 3644 3607 #endif 3608 + vcpu->arch.vrsave = mfspr(SPRN_VRSAVE); 3645 3609 3646 3610 if (cpu_has_feature(CPU_FTR_TM) || 3647 3611 cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) ··· 4008 3970 unsigned long lpcr) 4009 3971 { 4010 3972 int trap, r, pcpu; 4011 - int srcu_idx; 3973 + int srcu_idx, lpid; 4012 3974 struct kvmppc_vcore *vc; 4013 3975 struct kvm *kvm = vcpu->kvm; 4014 3976 struct kvm_nested_guest *nested = vcpu->arch.nested; ··· 4084 4046 vc->vcore_state = VCORE_RUNNING; 4085 4047 trace_kvmppc_run_core(vc, 0); 4086 4048 4087 - if (cpu_has_feature(CPU_FTR_HVMODE)) 4088 - kvmppc_radix_check_need_tlb_flush(kvm, pcpu, nested); 4049 + if (cpu_has_feature(CPU_FTR_HVMODE)) { 4050 + lpid = nested ? 
nested->shadow_lpid : kvm->arch.lpid; 4051 + mtspr(SPRN_LPID, lpid); 4052 + isync(); 4053 + kvmppc_check_need_tlb_flush(kvm, pcpu, nested); 4054 + } 4089 4055 4090 4056 trace_hardirqs_on(); 4091 4057 guest_enter_irqoff();
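
Note on kvmppc_copy_guest() above: each guest address is resolved to a memslot, the range is checked against the end of that slot, and the host virtual address is formed from the slot's base plus the page offset. A rough user-space sketch of those two steps follows. It assumes 4 KiB pages and a simplified, hypothetical struct slot. The comparison is copied as written in the diff, which also rejects a range that ends exactly at the slot boundary.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assumed; configuration dependent */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical, cut-down stand-in for struct kvm_memory_slot. */
struct slot {
	uint64_t base_gfn;        /* first guest frame number in the slot */
	uint64_t npages;          /* slot length in pages */
	uint64_t userspace_addr;  /* host virtual address of the first page */
};

/* Range check with the same comparison the diff uses. */
static int check_range(const struct slot *s, uint64_t gpa, uint64_t len)
{
	uint64_t end = (s->base_gfn + s->npages) << PAGE_SHIFT;

	if ((gpa + len) >= end)
		return -EINVAL;
	return 0;
}

/* Guest physical address to host virtual address within one slot. */
static uint64_t gpa_to_hva(const struct slot *s, uint64_t gpa)
{
	uint64_t gfn = gpa >> PAGE_SHIFT;

	assert(gfn >= s->base_gfn && gfn < s->base_gfn + s->npages);
	return (s->userspace_addr + (gfn - s->base_gfn) * PAGE_SIZE) |
		(gpa & (PAGE_SIZE - 1));
}
```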

+57

arch/powerpc/kvm/book3s_hv_builtin.c

··· 805 805 vcpu->arch.doorbell_request = 0; 806 806 } 807 807 } 808 + 809 + static void flush_guest_tlb(struct kvm *kvm) 810 + { 811 + unsigned long rb, set; 812 + 813 + rb = PPC_BIT(52); /* IS = 2 */ 814 + if (kvm_is_radix(kvm)) { 815 + /* R=1 PRS=1 RIC=2 */ 816 + asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) 817 + : : "r" (rb), "i" (1), "i" (1), "i" (2), 818 + "r" (0) : "memory"); 819 + for (set = 1; set < kvm->arch.tlb_sets; ++set) { 820 + rb += PPC_BIT(51); /* increment set number */ 821 + /* R=1 PRS=1 RIC=0 */ 822 + asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) 823 + : : "r" (rb), "i" (1), "i" (1), "i" (0), 824 + "r" (0) : "memory"); 825 + } 826 + } else { 827 + for (set = 0; set < kvm->arch.tlb_sets; ++set) { 828 + /* R=0 PRS=0 RIC=0 */ 829 + asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) 830 + : : "r" (rb), "i" (0), "i" (0), "i" (0), 831 + "r" (0) : "memory"); 832 + rb += PPC_BIT(51); /* increment set number */ 833 + } 834 + } 835 + asm volatile("ptesync": : :"memory"); 836 + } 837 + 838 + void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu, 839 + struct kvm_nested_guest *nested) 840 + { 841 + cpumask_t *need_tlb_flush; 842 + 843 + /* 844 + * On POWER9, individual threads can come in here, but the 845 + * TLB is shared between the 4 threads in a core, hence 846 + * invalidating on one thread invalidates for all. 847 + * Thus we make all 4 threads use the same bit. 848 + */ 849 + if (cpu_has_feature(CPU_FTR_ARCH_300)) 850 + pcpu = cpu_first_thread_sibling(pcpu); 851 + 852 + if (nested) 853 + need_tlb_flush = &nested->need_tlb_flush; 854 + else 855 + need_tlb_flush = &kvm->arch.need_tlb_flush; 856 + 857 + if (cpumask_test_cpu(pcpu, need_tlb_flush)) { 858 + flush_guest_tlb(kvm); 859 + 860 + /* Clear the bit after the TLB flush */ 861 + cpumask_clear_cpu(pcpu, need_tlb_flush); 862 + } 863 + } 864 + EXPORT_SYMBOL_GPL(kvmppc_check_need_tlb_flush);
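
Note on kvmppc_check_need_tlb_flush() above: all threads of a core share one bit in need_tlb_flush, the flush happens only if that bit is set, and the bit is cleared after the flush. A simplified sketch of that bookkeeping is below. It assumes four threads per core, as the comment in the diff states for POWER9, and uses a single 64-bit word in place of a cpumask. The counter flush_count is a hypothetical stand-in for the real tlbiel sequence.

```c
#include <assert.h>
#include <stdint.h>

#define THREADS_PER_CORE 4   /* assumption taken from the POWER9 comment */

static int flush_count;      /* counts simulated TLB flushes */

/* All threads of a core map to the number of the core's first thread. */
static int first_thread_sibling(int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1);
}

static void check_need_flush(uint64_t *need_flush, int cpu)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < 64);
	bit = UINT64_C(1) << first_thread_sibling(cpu);

	if (*need_flush & bit) {
		flush_count++;           /* flush first ... */
		*need_flush &= ~bit;     /* ... then clear the bit */
	}
}
```

Clearing the bit only after the flush means a thread that observes the bit clear can rely on the flush having been issued, which matches the ordering comment in the diff.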

+144

arch/powerpc/kvm/book3s_hv_rm_mmu.c

··· 13 13 #include <linux/hugetlb.h> 14 14 #include <linux/module.h> 15 15 #include <linux/log2.h> 16 + #include <linux/sizes.h> 16 17 17 18 #include <asm/trace.h> 18 19 #include <asm/kvm_ppc.h> ··· 865 864 ret = H_SUCCESS; 866 865 out: 867 866 unlock_hpte(hpte, v & ~HPTE_V_HVLOCK); 867 + return ret; 868 + } 869 + 870 + static int kvmppc_get_hpa(struct kvm_vcpu *vcpu, unsigned long gpa, 871 + int writing, unsigned long *hpa, 872 + struct kvm_memory_slot **memslot_p) 873 + { 874 + struct kvm *kvm = vcpu->kvm; 875 + struct kvm_memory_slot *memslot; 876 + unsigned long gfn, hva, pa, psize = PAGE_SHIFT; 877 + unsigned int shift; 878 + pte_t *ptep, pte; 879 + 880 + /* Find the memslot for this address */ 881 + gfn = gpa >> PAGE_SHIFT; 882 + memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn); 883 + if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) 884 + return H_PARAMETER; 885 + 886 + /* Translate to host virtual address */ 887 + hva = __gfn_to_hva_memslot(memslot, gfn); 888 + 889 + /* Try to find the host pte for that virtual address */ 890 + ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift); 891 + if (!ptep) 892 + return H_TOO_HARD; 893 + pte = kvmppc_read_update_linux_pte(ptep, writing); 894 + if (!pte_present(pte)) 895 + return H_TOO_HARD; 896 + 897 + /* Convert to a physical address */ 898 + if (shift) 899 + psize = 1UL << shift; 900 + pa = pte_pfn(pte) << PAGE_SHIFT; 901 + pa |= hva & (psize - 1); 902 + pa |= gpa & ~PAGE_MASK; 903 + 904 + if (hpa) 905 + *hpa = pa; 906 + if (memslot_p) 907 + *memslot_p = memslot; 908 + 909 + return H_SUCCESS; 910 + } 911 + 912 + static long kvmppc_do_h_page_init_zero(struct kvm_vcpu *vcpu, 913 + unsigned long dest) 914 + { 915 + struct kvm_memory_slot *memslot; 916 + struct kvm *kvm = vcpu->kvm; 917 + unsigned long pa, mmu_seq; 918 + long ret = H_SUCCESS; 919 + int i; 920 + 921 + /* Used later to detect if we might have been invalidated */ 922 + mmu_seq = kvm->mmu_notifier_seq; 923 + smp_rmb(); 924 + 925 + ret = 
kvmppc_get_hpa(vcpu, dest, 1, &pa, &memslot); 926 + if (ret != H_SUCCESS) 927 + return ret; 928 + 929 + /* Check if we've been invalidated */ 930 + raw_spin_lock(&kvm->mmu_lock.rlock); 931 + if (mmu_notifier_retry(kvm, mmu_seq)) { 932 + ret = H_TOO_HARD; 933 + goto out_unlock; 934 + } 935 + 936 + /* Zero the page */ 937 + for (i = 0; i < SZ_4K; i += L1_CACHE_BYTES, pa += L1_CACHE_BYTES) 938 + dcbz((void *)pa); 939 + kvmppc_update_dirty_map(memslot, dest >> PAGE_SHIFT, PAGE_SIZE); 940 + 941 + out_unlock: 942 + raw_spin_unlock(&kvm->mmu_lock.rlock); 943 + return ret; 944 + } 945 + 946 + static long kvmppc_do_h_page_init_copy(struct kvm_vcpu *vcpu, 947 + unsigned long dest, unsigned long src) 948 + { 949 + unsigned long dest_pa, src_pa, mmu_seq; 950 + struct kvm_memory_slot *dest_memslot; 951 + struct kvm *kvm = vcpu->kvm; 952 + long ret = H_SUCCESS; 953 + 954 + /* Used later to detect if we might have been invalidated */ 955 + mmu_seq = kvm->mmu_notifier_seq; 956 + smp_rmb(); 957 + 958 + ret = kvmppc_get_hpa(vcpu, dest, 1, &dest_pa, &dest_memslot); 959 + if (ret != H_SUCCESS) 960 + return ret; 961 + ret = kvmppc_get_hpa(vcpu, src, 0, &src_pa, NULL); 962 + if (ret != H_SUCCESS) 963 + return ret; 964 + 965 + /* Check if we've been invalidated */ 966 + raw_spin_lock(&kvm->mmu_lock.rlock); 967 + if (mmu_notifier_retry(kvm, mmu_seq)) { 968 + ret = H_TOO_HARD; 969 + goto out_unlock; 970 + } 971 + 972 + /* Copy the page */ 973 + memcpy((void *)dest_pa, (void *)src_pa, SZ_4K); 974 + 975 + kvmppc_update_dirty_map(dest_memslot, dest >> PAGE_SHIFT, PAGE_SIZE); 976 + 977 + out_unlock: 978 + raw_spin_unlock(&kvm->mmu_lock.rlock); 979 + return ret; 980 + } 981 + 982 + long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags, 983 + unsigned long dest, unsigned long src) 984 + { 985 + struct kvm *kvm = vcpu->kvm; 986 + u64 pg_mask = SZ_4K - 1; /* 4K page size */ 987 + long ret = H_SUCCESS; 988 + 989 + /* Don't handle radix mode here, go up to the virtual mode handler */ 
990 + if (kvm_is_radix(kvm)) 991 + return H_TOO_HARD; 992 + 993 + /* Check for invalid flags (H_PAGE_SET_LOANED covers all CMO flags) */ 994 + if (flags & ~(H_ICACHE_INVALIDATE | H_ICACHE_SYNCHRONIZE | 995 + H_ZERO_PAGE | H_COPY_PAGE | H_PAGE_SET_LOANED)) 996 + return H_PARAMETER; 997 + 998 + /* dest (and src if copy_page flag set) must be page aligned */ 999 + if ((dest & pg_mask) || ((flags & H_COPY_PAGE) && (src & pg_mask))) 1000 + return H_PARAMETER; 1001 + 1002 + /* zero and/or copy the page as determined by the flags */ 1003 + if (flags & H_COPY_PAGE) 1004 + ret = kvmppc_do_h_page_init_copy(vcpu, dest, src); 1005 + else if (flags & H_ZERO_PAGE) 1006 + ret = kvmppc_do_h_page_init_zero(vcpu, dest); 1007 + 1008 + /* We can ignore the other flags */ 1009 + 868 1010 return ret; 869 1011 } 870 1012
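
Note on the H_PAGE_INIT handlers above: both the real-mode and virtual-mode versions reject unknown flags, require dest to be 4K aligned, require src to be 4K aligned only when a copy is requested, and prefer copy over zero when both flags are set. That decision logic can be sketched as follows. The flag values here are placeholders chosen for the example, not the real H_* constants, and page_init_action is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder flag bits for illustration only; not the real hcall values. */
enum {
	F_ICACHE_INVALIDATE  = 1 << 0,
	F_ICACHE_SYNCHRONIZE = 1 << 1,
	F_ZERO_PAGE          = 1 << 2,
	F_COPY_PAGE          = 1 << 3,
	F_PAGE_SET_LOANED    = 1 << 4,
};

enum action { ACT_REJECT, ACT_NONE, ACT_ZERO, ACT_COPY };

#define PG_MASK 0xfffUL   /* 4K page size, as in the diff */

static enum action page_init_action(unsigned long flags, unsigned long dest,
				    unsigned long src)
{
	const unsigned long known = F_ICACHE_INVALIDATE | F_ICACHE_SYNCHRONIZE |
				    F_ZERO_PAGE | F_COPY_PAGE | F_PAGE_SET_LOANED;

	/* Any unknown flag is a parameter error. */
	if (flags & ~known)
		return ACT_REJECT;

	/* dest always, src only for a copy, must be page aligned. */
	if ((dest & PG_MASK) || ((flags & F_COPY_PAGE) && (src & PG_MASK)))
		return ACT_REJECT;

	/* Copy wins over zero; the remaining flags are ignored. */
	if (flags & F_COPY_PAGE)
		return ACT_COPY;
	if (flags & F_ZERO_PAGE)
		return ACT_ZERO;
	return ACT_NONE;
}
```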

+33 -53

arch/powerpc/kvm/book3s_hv_rmhandlers.S

··· 589 589 1: 590 590 #endif 591 591 592 - /* Use cr7 as an indication of radix mode */ 593 592 ld r5, HSTATE_KVM_VCORE(r13) 594 593 ld r9, VCORE_KVM(r5) /* pointer to struct kvm */ 595 - lbz r0, KVM_RADIX(r9) 596 - cmpwi cr7, r0, 0 597 594 598 595 /* 599 596 * POWER7/POWER8 host -> guest partition switch code. ··· 613 616 cmpwi r6,0 614 617 bne 10f 615 618 616 - /* Radix has already switched LPID and flushed core TLB */ 617 - bne cr7, 22f 618 - 619 619 lwz r7,KVM_LPID(r9) 620 620 BEGIN_FTR_SECTION 621 621 ld r6,KVM_SDR1(r9) ··· 624 630 mtspr SPRN_LPID,r7 625 631 isync 626 632 627 - /* See if we need to flush the TLB. Hash has to be done in RM */ 628 - lhz r6,PACAPACAINDEX(r13) /* test_bit(cpu, need_tlb_flush) */ 629 - BEGIN_FTR_SECTION 630 - /* 631 - * On POWER9, individual threads can come in here, but the 632 - * TLB is shared between the 4 threads in a core, hence 633 - * invalidating on one thread invalidates for all. 634 - * Thus we make all 4 threads use the same bit here. 635 - */ 636 - clrrdi r6,r6,2 637 - END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) 638 - clrldi r7,r6,64-6 /* extract bit number (6 bits) */ 639 - srdi r6,r6,6 /* doubleword number */ 640 - sldi r6,r6,3 /* address offset */ 641 - add r6,r6,r9 642 - addi r6,r6,KVM_NEED_FLUSH /* dword in kvm->arch.need_tlb_flush */ 643 - li r8,1 644 - sld r8,r8,r7 645 - ld r7,0(r6) 646 - and. r7,r7,r8 647 - beq 22f 648 - /* Flush the TLB of any entries for this LPID */ 649 - lwz r0,KVM_TLB_SETS(r9) 650 - mtctr r0 651 - li r7,0x800 /* IS field = 0b10 */ 652 - ptesync 653 - li r0,0 /* RS for P9 version of tlbiel */ 654 - 28: tlbiel r7 /* On P9, rs=0, RIC=0, PRS=0, R=0 */ 655 - addi r7,r7,0x1000 656 - bdnz 28b 657 - ptesync 658 - 23: ldarx r7,0,r6 /* clear the bit after TLB flushed */ 659 - andc r7,r7,r8 660 - stdcx. r7,0,r6 661 - bne 23b 633 + /* See if we need to flush the TLB. 
*/ 634 + mr r3, r9 /* kvm pointer */ 635 + lhz r4, PACAPACAINDEX(r13) /* physical cpu number */ 636 + li r5, 0 /* nested vcpu pointer */ 637 + bl kvmppc_check_need_tlb_flush 638 + nop 639 + ld r5, HSTATE_KVM_VCORE(r13) 662 640 663 641 /* Add timebase offset onto timebase */ 664 642 22: ld r8,VCORE_TB_OFFSET(r5) ··· 946 980 947 981 #ifdef CONFIG_KVM_XICS 948 982 /* We are entering the guest on that thread, push VCPU to XIVE */ 949 - ld r10, HSTATE_XIVE_TIMA_PHYS(r13) 950 - cmpldi cr0, r10, 0 951 - beq no_xive 952 983 ld r11, VCPU_XIVE_SAVED_STATE(r4) 953 984 li r9, TM_QW1_OS 985 + lwz r8, VCPU_XIVE_CAM_WORD(r4) 986 + li r7, TM_QW1_OS + TM_WORD2 987 + mfmsr r0 988 + andi. r0, r0, MSR_DR /* in real mode? */ 989 + beq 2f 990 + ld r10, HSTATE_XIVE_TIMA_VIRT(r13) 991 + cmpldi cr1, r10, 0 992 + beq cr1, no_xive 993 + eieio 994 + stdx r11,r9,r10 995 + stwx r8,r7,r10 996 + b 3f 997 + 2: ld r10, HSTATE_XIVE_TIMA_PHYS(r13) 998 + cmpldi cr1, r10, 0 999 + beq cr1, no_xive 954 1000 eieio 955 1001 stdcix r11,r9,r10 956 - lwz r11, VCPU_XIVE_CAM_WORD(r4) 957 - li r9, TM_QW1_OS + TM_WORD2 958 - stwcix r11,r9,r10 959 - li r9, 1 1002 + stwcix r8,r7,r10 1003 + 3: li r9, 1 960 1004 stb r9, VCPU_XIVE_PUSHED(r4) 961 1005 eieio 962 1006 ··· 985 1009 * on, we mask it. 986 1010 */ 987 1011 lbz r0, VCPU_XIVE_ESC_ON(r4) 988 - cmpwi r0,0 989 - beq 1f 990 - ld r10, VCPU_XIVE_ESC_RADDR(r4) 1012 + cmpwi cr1, r0,0 1013 + beq cr1, 1f 991 1014 li r9, XIVE_ESB_SET_PQ_01 1015 + beq 4f /* in real mode? 
*/ 1016 + ld r10, VCPU_XIVE_ESC_VADDR(r4) 1017 + ldx r0, r10, r9 1018 + b 5f 1019 + 4: ld r10, VCPU_XIVE_ESC_RADDR(r4) 992 1020 ldcix r0, r10, r9 993 - sync 1021 + 5: sync 994 1022 995 1023 /* We have a possible subtle race here: The escalation interrupt might 996 1024 * have fired and be on its way to the host queue while we mask it, ··· 2272 2292 #endif 2273 2293 .long 0 /* 0x24 - H_SET_SPRG0 */ 2274 2294 .long DOTSYM(kvmppc_h_set_dabr) - hcall_real_table 2275 - .long 0 /* 0x2c */ 2295 + .long DOTSYM(kvmppc_rm_h_page_init) - hcall_real_table 2276 2296 .long 0 /* 0x30 */ 2277 2297 .long 0 /* 0x34 */ 2278 2298 .long 0 /* 0x38 */

+188 -62

arch/powerpc/kvm/book3s_xive.c

··· 166 166 return IRQ_HANDLED; 167 167 } 168 168 169 - static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio) 169 + int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio, 170 + bool single_escalation) 170 171 { 171 172 struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 172 173 struct xive_q *q = &xc->queues[prio]; ··· 186 185 return -EIO; 187 186 } 188 187 189 - if (xc->xive->single_escalation) 188 + if (single_escalation) 190 189 name = kasprintf(GFP_KERNEL, "kvm-%d-%d", 191 190 vcpu->kvm->arch.lpid, xc->server_num); 192 191 else ··· 218 217 * interrupt, thus leaving it effectively masked after 219 218 * it fires once. 220 219 */ 221 - if (xc->xive->single_escalation) { 220 + if (single_escalation) { 222 221 struct irq_data *d = irq_get_irq_data(xc->esc_virq[prio]); 223 222 struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); 224 223 ··· 292 291 continue; 293 292 rc = xive_provision_queue(vcpu, prio); 294 293 if (rc == 0 && !xive->single_escalation) 295 - xive_attach_escalation(vcpu, prio); 294 + kvmppc_xive_attach_escalation(vcpu, prio, 295 + xive->single_escalation); 296 296 if (rc) 297 297 return rc; 298 298 } ··· 344 342 return atomic_add_unless(&q->count, 1, max) ? 0 : -EBUSY; 345 343 } 346 344 347 - static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio) 345 + int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio) 348 346 { 349 347 struct kvm_vcpu *vcpu; 350 348 int i, rc; ··· 380 378 381 379 /* No available target ! 
*/ 382 380 return -EBUSY; 383 - } 384 - 385 - static u32 xive_vp(struct kvmppc_xive *xive, u32 server) 386 - { 387 - return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server); 388 381 } 389 382 390 383 static u8 xive_lock_and_mask(struct kvmppc_xive *xive, ··· 427 430 */ 428 431 if (xd->flags & OPAL_XIVE_IRQ_MASK_VIA_FW) { 429 432 xive_native_configure_irq(hw_num, 430 - xive_vp(xive, state->act_server), 431 - MASKED, state->number); 433 + kvmppc_xive_vp(xive, state->act_server), 434 + MASKED, state->number); 432 435 /* set old_p so we can track if an H_EOI was done */ 433 436 state->old_p = true; 434 437 state->old_q = false; ··· 483 486 */ 484 487 if (xd->flags & OPAL_XIVE_IRQ_MASK_VIA_FW) { 485 488 xive_native_configure_irq(hw_num, 486 - xive_vp(xive, state->act_server), 487 - state->act_priority, state->number); 489 + kvmppc_xive_vp(xive, state->act_server), 490 + state->act_priority, state->number); 488 491 /* If an EOI is needed, do it here */ 489 492 if (!state->old_p) 490 493 xive_vm_source_eoi(hw_num, xd); ··· 532 535 * priority. The count for that new target will have 533 536 * already been incremented. 534 537 */ 535 - rc = xive_select_target(kvm, &server, prio); 538 + rc = kvmppc_xive_select_target(kvm, &server, prio); 536 539 537 540 /* 538 541 * We failed to find a target ? Not much we can do ··· 560 563 kvmppc_xive_select_irq(state, &hw_num, NULL); 561 564 562 565 return xive_native_configure_irq(hw_num, 563 - xive_vp(xive, server), 566 + kvmppc_xive_vp(xive, server), 564 567 prio, state->number); 565 568 } 566 569 ··· 846 849 847 850 /* 848 851 * We can't update the state of a "pushed" VCPU, but that 849 - * shouldn't happen. 852 + * shouldn't happen because the vcpu->mutex makes running a 853 + * vcpu mutually exclusive with doing one_reg get/set on it. 
850 854 */ 851 855 if (WARN_ON(vcpu->arch.xive_pushed)) 852 856 return -EIO; ··· 938 940 /* Turn the IPI hard off */ 939 941 xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01); 940 942 943 + /* 944 + * Reset ESB guest mapping. Needed when ESB pages are exposed 945 + * to the guest in XIVE native mode 946 + */ 947 + if (xive->ops && xive->ops->reset_mapped) 948 + xive->ops->reset_mapped(kvm, guest_irq); 949 + 941 950 /* Grab info about irq */ 942 951 state->pt_number = hw_irq; 943 952 state->pt_data = irq_data_get_irq_handler_data(host_data); ··· 956 951 * which is fine for a never started interrupt. 957 952 */ 958 953 xive_native_configure_irq(hw_irq, 959 - xive_vp(xive, state->act_server), 954 + kvmppc_xive_vp(xive, state->act_server), 960 955 state->act_priority, state->number); 961 956 962 957 /* ··· 1030 1025 state->pt_number = 0; 1031 1026 state->pt_data = NULL; 1032 1027 1028 + /* 1029 + * Reset ESB guest mapping. Needed when ESB pages are exposed 1030 + * to the guest in XIVE native mode 1031 + */ 1032 + if (xive->ops && xive->ops->reset_mapped) { 1033 + xive->ops->reset_mapped(kvm, guest_irq); 1034 + } 1035 + 1033 1036 /* Reconfigure the IPI */ 1034 1037 xive_native_configure_irq(state->ipi_number, 1035 - xive_vp(xive, state->act_server), 1038 + kvmppc_xive_vp(xive, state->act_server), 1036 1039 state->act_priority, state->number); 1037 1040 1038 1041 /* ··· 1062 1049 } 1063 1050 EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped); 1064 1051 1065 - static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu) 1052 + void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu) 1066 1053 { 1067 1054 struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1068 1055 struct kvm *kvm = vcpu->kvm; ··· 1096 1083 arch_spin_unlock(&sb->lock); 1097 1084 } 1098 1085 } 1086 + 1087 + /* Disable vcpu's escalation interrupt */ 1088 + if (vcpu->arch.xive_esc_on) { 1089 + __raw_readq((void __iomem *)(vcpu->arch.xive_esc_vaddr + 1090 + XIVE_ESB_SET_PQ_01)); 1091 + 
vcpu->arch.xive_esc_on = false; 1092 + } 1093 + 1094 + /* 1095 + * Clear pointers to escalation interrupt ESB. 1096 + * This is safe because the vcpu->mutex is held, preventing 1097 + * any other CPU from concurrently executing a KVM_RUN ioctl. 1098 + */ 1099 + vcpu->arch.xive_esc_vaddr = 0; 1100 + vcpu->arch.xive_esc_raddr = 0; 1099 1101 } 1100 1102 1101 1103 void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) 1102 1104 { 1103 1105 struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1104 - struct kvmppc_xive *xive = xc->xive; 1106 + struct kvmppc_xive *xive = vcpu->kvm->arch.xive; 1105 1107 int i; 1108 + 1109 + if (!kvmppc_xics_enabled(vcpu)) 1110 + return; 1111 + 1112 + if (!xc) 1113 + return; 1106 1114 1107 1115 pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num); 1108 1116 ··· 1163 1129 } 1164 1130 /* Free the VP */ 1165 1131 kfree(xc); 1132 + 1133 + /* Cleanup the vcpu */ 1134 + vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT; 1135 + vcpu->arch.xive_vcpu = NULL; 1166 1136 } 1167 1137 1168 1138 int kvmppc_xive_connect_vcpu(struct kvm_device *dev, ··· 1184 1146 } 1185 1147 if (xive->kvm != vcpu->kvm) 1186 1148 return -EPERM; 1187 - if (vcpu->arch.irq_type) 1149 + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT) 1188 1150 return -EBUSY; 1189 1151 if (kvmppc_xive_find_server(vcpu->kvm, cpu)) { 1190 1152 pr_devel("Duplicate !\n"); ··· 1204 1166 xc->xive = xive; 1205 1167 xc->vcpu = vcpu; 1206 1168 xc->server_num = cpu; 1207 - xc->vp_id = xive_vp(xive, cpu); 1169 + xc->vp_id = kvmppc_xive_vp(xive, cpu); 1208 1170 xc->mfrr = 0xff; 1209 1171 xc->valid = true; 1210 1172 ··· 1257 1219 if (xive->qmap & (1 << i)) { 1258 1220 r = xive_provision_queue(vcpu, i); 1259 1221 if (r == 0 && !xive->single_escalation) 1260 - xive_attach_escalation(vcpu, i); 1222 + kvmppc_xive_attach_escalation( 1223 + vcpu, i, xive->single_escalation); 1261 1224 if (r) 1262 1225 goto bail; 1263 1226 } else { ··· 1273 1234 } 1274 1235 1275 1236 /* If not done above, attach priority 0 escalation */ 1276 - r = 
xive_attach_escalation(vcpu, 0); 1237 + r = kvmppc_xive_attach_escalation(vcpu, 0, xive->single_escalation); 1277 1238 if (r) 1278 1239 goto bail; 1279 1240 ··· 1524 1485 return 0; 1525 1486 } 1526 1487 1527 - static struct kvmppc_xive_src_block *xive_create_src_block(struct kvmppc_xive *xive, 1528 - int irq) 1488 + struct kvmppc_xive_src_block *kvmppc_xive_create_src_block( 1489 + struct kvmppc_xive *xive, int irq) 1529 1490 { 1530 1491 struct kvm *kvm = xive->kvm; 1531 1492 struct kvmppc_xive_src_block *sb; ··· 1548 1509 1549 1510 for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { 1550 1511 sb->irq_state[i].number = (bid << KVMPPC_XICS_ICS_SHIFT) | i; 1512 + sb->irq_state[i].eisn = 0; 1551 1513 sb->irq_state[i].guest_priority = MASKED; 1552 1514 sb->irq_state[i].saved_priority = MASKED; 1553 1515 sb->irq_state[i].act_priority = MASKED; ··· 1605 1565 sb = kvmppc_xive_find_source(xive, irq, &idx); 1606 1566 if (!sb) { 1607 1567 pr_devel("No source, creating source block...\n"); 1608 - sb = xive_create_src_block(xive, irq); 1568 + sb = kvmppc_xive_create_src_block(xive, irq); 1609 1569 if (!sb) { 1610 1570 pr_devel("Failed to create block...\n"); 1611 1571 return -ENOMEM; ··· 1829 1789 xive_cleanup_irq_data(xd); 1830 1790 } 1831 1791 1832 - static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb) 1792 + void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb) 1833 1793 { 1834 1794 int i; 1835 1795 ··· 1850 1810 } 1851 1811 } 1852 1812 1853 - static void kvmppc_xive_free(struct kvm_device *dev) 1813 + /* 1814 + * Called when device fd is closed. kvm->lock is held. 
1815 + */ 1816 + static void kvmppc_xive_release(struct kvm_device *dev) 1854 1817 { 1855 1818 struct kvmppc_xive *xive = dev->private; 1856 1819 struct kvm *kvm = xive->kvm; 1820 + struct kvm_vcpu *vcpu; 1857 1821 int i; 1822 + int was_ready; 1823 + 1824 + pr_devel("Releasing xive device\n"); 1858 1825 1859 1826 debugfs_remove(xive->dentry); 1860 1827 1861 - if (kvm) 1862 - kvm->arch.xive = NULL; 1828 + /* 1829 + * Clearing mmu_ready temporarily while holding kvm->lock 1830 + * is a way of ensuring that no vcpus can enter the guest 1831 + * until we drop kvm->lock. Doing kick_all_cpus_sync() 1832 + * ensures that any vcpu executing inside the guest has 1833 + * exited the guest. Once kick_all_cpus_sync() has finished, 1834 + * we know that no vcpu can be executing the XIVE push or 1835 + * pull code, or executing a XICS hcall. 1836 + * 1837 + * Since this is the device release function, we know that 1838 + * userspace does not have any open fd referring to the 1839 + * device. Therefore there can not be any of the device 1840 + * attribute set/get functions being executed concurrently, 1841 + * and similarly, the connect_vcpu and set/clr_mapped 1842 + * functions also cannot be being executed. 1843 + */ 1844 + was_ready = kvm->arch.mmu_ready; 1845 + kvm->arch.mmu_ready = 0; 1846 + kick_all_cpus_sync(); 1847 + 1848 + /* 1849 + * We should clean up the vCPU interrupt presenters first. 1850 + */ 1851 + kvm_for_each_vcpu(i, vcpu, kvm) { 1852 + /* 1853 + * Take vcpu->mutex to ensure that no one_reg get/set ioctl 1854 + * (i.e. kvmppc_xive_[gs]et_icp) can be done concurrently. 
1855 + */ 1856 + mutex_lock(&vcpu->mutex); 1857 + kvmppc_xive_cleanup_vcpu(vcpu); 1858 + mutex_unlock(&vcpu->mutex); 1859 + } 1860 + 1861 + kvm->arch.xive = NULL; 1863 1862 1864 1863 /* Mask and free interrupts */ 1865 1864 for (i = 0; i <= xive->max_sbid; i++) { ··· 1911 1832 if (xive->vp_base != XIVE_INVALID_VP) 1912 1833 xive_native_free_vp_block(xive->vp_base); 1913 1834 1835 + kvm->arch.mmu_ready = was_ready; 1914 1836 1915 - kfree(xive); 1837 + /* 1838 + * A reference of the kvmppc_xive pointer is now kept under 1839 + * the xive_devices struct of the machine for reuse. It is 1840 + * freed when the VM is destroyed for now until we fix all the 1841 + * execution paths. 1842 + */ 1843 + 1916 1844 kfree(dev); 1917 1845 } 1918 1846 1847 + /* 1848 + * When the guest chooses the interrupt mode (XICS legacy or XIVE 1849 + * native), the VM will switch of KVM device. The previous device will 1850 + * be "released" before the new one is created. 1851 + * 1852 + * Until we are sure all execution paths are well protected, provide a 1853 + * fail safe (transitional) method for device destruction, in which 1854 + * the XIVE device pointer is recycled and not directly freed. 1855 + */ 1856 + struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type) 1857 + { 1858 + struct kvmppc_xive **kvm_xive_device = type == KVM_DEV_TYPE_XIVE ? 1859 + &kvm->arch.xive_devices.native : 1860 + &kvm->arch.xive_devices.xics_on_xive; 1861 + struct kvmppc_xive *xive = *kvm_xive_device; 1862 + 1863 + if (!xive) { 1864 + xive = kzalloc(sizeof(*xive), GFP_KERNEL); 1865 + *kvm_xive_device = xive; 1866 + } else { 1867 + memset(xive, 0, sizeof(*xive)); 1868 + } 1869 + 1870 + return xive; 1871 + } 1872 + 1873 + /* 1874 + * Create a XICS device with XIVE backend. kvm->lock is held. 
1875 + */ 1919 1876 static int kvmppc_xive_create(struct kvm_device *dev, u32 type) 1920 1877 { 1921 1878 struct kvmppc_xive *xive; ··· 1960 1845 1961 1846 pr_devel("Creating xive for partition\n"); 1962 1847 1963 - xive = kzalloc(sizeof(*xive), GFP_KERNEL); 1848 + xive = kvmppc_xive_get_device(kvm, type); 1964 1849 if (!xive) 1965 1850 return -ENOMEM; 1966 1851 ··· 1998 1883 return 0; 1999 1884 } 2000 1885 1886 + int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu) 1887 + { 1888 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1889 + unsigned int i; 1890 + 1891 + for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { 1892 + struct xive_q *q = &xc->queues[i]; 1893 + u32 i0, i1, idx; 1894 + 1895 + if (!q->qpage && !xc->esc_virq[i]) 1896 + continue; 1897 + 1898 + seq_printf(m, " [q%d]: ", i); 1899 + 1900 + if (q->qpage) { 1901 + idx = q->idx; 1902 + i0 = be32_to_cpup(q->qpage + idx); 1903 + idx = (idx + 1) & q->msk; 1904 + i1 = be32_to_cpup(q->qpage + idx); 1905 + seq_printf(m, "T=%d %08x %08x...\n", q->toggle, 1906 + i0, i1); 1907 + } 1908 + if (xc->esc_virq[i]) { 1909 + struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]); 1910 + struct xive_irq_data *xd = 1911 + irq_data_get_irq_handler_data(d); 1912 + u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET); 1913 + 1914 + seq_printf(m, "E:%c%c I(%d:%llx:%llx)", 1915 + (pq & XIVE_ESB_VAL_P) ? 'P' : 'p', 1916 + (pq & XIVE_ESB_VAL_Q) ? 
'Q' : 'q', 1917 + xc->esc_virq[i], pq, xd->eoi_page); 1918 + seq_puts(m, "\n"); 1919 + } 1920 + } 1921 + return 0; 1922 + } 2001 1923 2002 1924 static int xive_debug_show(struct seq_file *m, void *private) 2003 1925 { ··· 2060 1908 2061 1909 kvm_for_each_vcpu(i, vcpu, kvm) { 2062 1910 struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 2063 - unsigned int i; 2064 1911 2065 1912 if (!xc) 2066 1913 continue; ··· 2069 1918 xc->server_num, xc->cppr, xc->hw_cppr, 2070 1919 xc->mfrr, xc->pending, 2071 1920 xc->stat_rm_h_xirr, xc->stat_vm_h_xirr); 2072 - for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { 2073 - struct xive_q *q = &xc->queues[i]; 2074 - u32 i0, i1, idx; 2075 1921 2076 - if (!q->qpage && !xc->esc_virq[i]) 2077 - continue; 2078 - 2079 - seq_printf(m, " [q%d]: ", i); 2080 - 2081 - if (q->qpage) { 2082 - idx = q->idx; 2083 - i0 = be32_to_cpup(q->qpage + idx); 2084 - idx = (idx + 1) & q->msk; 2085 - i1 = be32_to_cpup(q->qpage + idx); 2086 - seq_printf(m, "T=%d %08x %08x... \n", q->toggle, i0, i1); 2087 - } 2088 - if (xc->esc_virq[i]) { 2089 - struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]); 2090 - struct xive_irq_data *xd = irq_data_get_irq_handler_data(d); 2091 - u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET); 2092 - seq_printf(m, "E:%c%c I(%d:%llx:%llx)", 2093 - (pq & XIVE_ESB_VAL_P) ? 'P' : 'p', 2094 - (pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q', 2095 - xc->esc_virq[i], pq, xd->eoi_page); 2096 - seq_printf(m, "\n"); 2097 - } 2098 - } 1922 + kvmppc_xive_debug_show_queues(m, vcpu); 2099 1923 2100 1924 t_rm_h_xirr += xc->stat_rm_h_xirr; 2101 1925 t_rm_h_ipoll += xc->stat_rm_h_ipoll; ··· 2125 1999 .name = "kvm-xive", 2126 2000 .create = kvmppc_xive_create, 2127 2001 .init = kvmppc_xive_init, 2128 - .destroy = kvmppc_xive_free, 2002 + .release = kvmppc_xive_release, 2129 2003 .set_attr = xive_set_attr, 2130 2004 .get_attr = xive_get_attr, 2131 2005 .has_attr = xive_has_attr,
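The kvmppc_xive_get_device() helper in the diff above recycles the per-VM device structure instead of freeing it on release. What follows is a minimal user-space sketch of that allocate-once, zero-on-reuse pattern, not the kernel code itself. The names struct fake_dev, struct fake_vm, enum fake_dev_type and get_device() are hypothetical stand-ins, and calloc() takes the place of kzalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct kvmppc_xive. */
struct fake_dev {
	int src_count;
	int single_escalation;
};

/* Hypothetical stand-in for the two KVM device types. */
enum fake_dev_type { DEV_XICS_ON_XIVE, DEV_NATIVE };

/* Hypothetical stand-in for kvm->arch.xive_devices: one slot per type. */
struct fake_vm {
	struct fake_dev *native;
	struct fake_dev *xics_on_xive;
};

/*
 * Return a zeroed device for the requested type. The first call
 * allocates it and caches the pointer in the VM; later calls reuse the
 * same memory after clearing it. Returns NULL if the first allocation
 * fails.
 */
struct fake_dev *get_device(struct fake_vm *vm, enum fake_dev_type type)
{
	struct fake_dev **slot = (type == DEV_NATIVE) ?
		&vm->native : &vm->xics_on_xive;
	struct fake_dev *dev = *slot;

	if (!dev) {
		dev = calloc(1, sizeof(*dev));
		*slot = dev;
	} else {
		memset(dev, 0, sizeof(*dev));
	}
	return dev;
}
```

In this sketch, a release path that keeps the pointer in its slot rather than freeing it means a later create for the same type gets the same address back with cleared contents. The memory would only be freed when the owning VM is torn down, which matches the intent stated in the comment above kvmppc_xive_get_device().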

+37

arch/powerpc/kvm/book3s_xive.h

···
 #include "book3s_xics.h"

 /*
+ * The XIVE Interrupt source numbers are within the range 0 to
+ * KVMPPC_XICS_NR_IRQS.
+ */
+#define KVMPPC_XIVE_FIRST_IRQ	0
+#define KVMPPC_XIVE_NR_IRQS	KVMPPC_XICS_NR_IRQS
+
+/*
  * State for one guest irq source.
  *
  * For each guest source we allocate a HW interrupt in the XIVE
···
 	bool saved_p;
 	bool saved_q;
 	u8 saved_scan_prio;
+
+	/* Xive native */
+	u32 eisn;		/* Guest Effective IRQ number */
 };

 /* Select the "right" interrupt (IPI vs. passthrough) */
···
 	struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
 };

+struct kvmppc_xive;
+
+struct kvmppc_xive_ops {
+	int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
+};

 struct kvmppc_xive {
 	struct kvm *kvm;
···

 	/* Flags */
 	u8 single_escalation;
+
+	struct kvmppc_xive_ops *ops;
+	struct address_space *mapping;
+	struct mutex mapping_lock;
 };

 #define KVMPPC_XIVE_Q_COUNT 8
···
 	return xive->src_blocks[bid];
 }

+static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server)
+{
+	return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
+}
+
 /*
  * Mapping between guest priorities and host priorities
  * is as follow.
···
 				unsigned long mfrr);
 extern int (*__xive_vm_h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr);
 extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr);
+
+/*
+ * Common Xive routines for XICS-over-XIVE and XIVE native
+ */
+void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
+int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu);
+struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
+	struct kvmppc_xive *xive, int irq);
+void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
+int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
+int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
+				  bool single_escalation);
+struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);

 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
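The header adds struct kvmppc_xive_ops with a single reset_mapped hook, and the callers in book3s_xive.c guard every call with a check on both the ops pointer and the hook itself. The sketch below illustrates that optional-callback pattern in plain user-space C. The names struct fake_ops, struct fake_xive, call_reset_mapped() and record_irq() are hypothetical, and the void *ctx argument stands in for struct kvm *. It also assumes that a device without hooks leaves ops NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvmppc_xive_ops. */
struct fake_ops {
	int (*reset_mapped)(void *ctx, unsigned long guest_irq);
};

/* Hypothetical stand-in for struct kvmppc_xive. */
struct fake_xive {
	const struct fake_ops *ops;	/* assumed NULL when no hooks exist */
	void *ctx;
};

/*
 * Invoke the optional hook if one is installed. Returns the hook's
 * result, or 0 when either the ops table or the hook is missing.
 */
int call_reset_mapped(const struct fake_xive *xive, unsigned long guest_irq)
{
	if (xive->ops && xive->ops->reset_mapped)
		return xive->ops->reset_mapped(xive->ctx, guest_irq);
	return 0;
}

/* Example hook: store the irq number in the caller-provided variable. */
int record_irq(void *ctx, unsigned long guest_irq)
{
	*(unsigned long *)ctx = guest_irq;
	return 0;
}
```

Checking both pointers keeps the shared code path usable by a device that provides no ops table at all, as well as by one whose table omits this particular hook. Note that the kernel callers shown in the diff discard the hook's return value; the sketch returns it only to make the behaviour observable.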

+1249

arch/powerpc/kvm/book3s_xive_native.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright (c) 2017-2019, IBM Corporation. 4 + */ 5 + 6 + #define pr_fmt(fmt) "xive-kvm: " fmt 7 + 8 + #include <linux/kernel.h> 9 + #include <linux/kvm_host.h> 10 + #include <linux/err.h> 11 + #include <linux/gfp.h> 12 + #include <linux/spinlock.h> 13 + #include <linux/delay.h> 14 + #include <linux/file.h> 15 + #include <asm/uaccess.h> 16 + #include <asm/kvm_book3s.h> 17 + #include <asm/kvm_ppc.h> 18 + #include <asm/hvcall.h> 19 + #include <asm/xive.h> 20 + #include <asm/xive-regs.h> 21 + #include <asm/debug.h> 22 + #include <asm/debugfs.h> 23 + #include <asm/opal.h> 24 + 25 + #include <linux/debugfs.h> 26 + #include <linux/seq_file.h> 27 + 28 + #include "book3s_xive.h" 29 + 30 + static u8 xive_vm_esb_load(struct xive_irq_data *xd, u32 offset) 31 + { 32 + u64 val; 33 + 34 + if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG) 35 + offset |= offset << 4; 36 + 37 + val = in_be64(xd->eoi_mmio + offset); 38 + return (u8)val; 39 + } 40 + 41 + static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio) 42 + { 43 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 44 + struct xive_q *q = &xc->queues[prio]; 45 + 46 + xive_native_disable_queue(xc->vp_id, q, prio); 47 + if (q->qpage) { 48 + put_page(virt_to_page(q->qpage)); 49 + q->qpage = NULL; 50 + } 51 + } 52 + 53 + void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) 54 + { 55 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 56 + int i; 57 + 58 + if (!kvmppc_xive_enabled(vcpu)) 59 + return; 60 + 61 + if (!xc) 62 + return; 63 + 64 + pr_devel("native_cleanup_vcpu(cpu=%d)\n", xc->server_num); 65 + 66 + /* Ensure no interrupt is still routed to that VP */ 67 + xc->valid = false; 68 + kvmppc_xive_disable_vcpu_interrupts(vcpu); 69 + 70 + /* Disable the VP */ 71 + xive_native_disable_vp(xc->vp_id); 72 + 73 + /* Free the queues & associated interrupts */ 74 + for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { 75 + /* Free the escalation irq */ 76 + if 
(xc->esc_virq[i]) { 77 + free_irq(xc->esc_virq[i], vcpu); 78 + irq_dispose_mapping(xc->esc_virq[i]); 79 + kfree(xc->esc_virq_names[i]); 80 + xc->esc_virq[i] = 0; 81 + } 82 + 83 + /* Free the queue */ 84 + kvmppc_xive_native_cleanup_queue(vcpu, i); 85 + } 86 + 87 + /* Free the VP */ 88 + kfree(xc); 89 + 90 + /* Cleanup the vcpu */ 91 + vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT; 92 + vcpu->arch.xive_vcpu = NULL; 93 + } 94 + 95 + int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev, 96 + struct kvm_vcpu *vcpu, u32 server_num) 97 + { 98 + struct kvmppc_xive *xive = dev->private; 99 + struct kvmppc_xive_vcpu *xc = NULL; 100 + int rc; 101 + 102 + pr_devel("native_connect_vcpu(server=%d)\n", server_num); 103 + 104 + if (dev->ops != &kvm_xive_native_ops) { 105 + pr_devel("Wrong ops !\n"); 106 + return -EPERM; 107 + } 108 + if (xive->kvm != vcpu->kvm) 109 + return -EPERM; 110 + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT) 111 + return -EBUSY; 112 + if (server_num >= KVM_MAX_VCPUS) { 113 + pr_devel("Out of bounds !\n"); 114 + return -EINVAL; 115 + } 116 + 117 + mutex_lock(&vcpu->kvm->lock); 118 + 119 + if (kvmppc_xive_find_server(vcpu->kvm, server_num)) { 120 + pr_devel("Duplicate !\n"); 121 + rc = -EEXIST; 122 + goto bail; 123 + } 124 + 125 + xc = kzalloc(sizeof(*xc), GFP_KERNEL); 126 + if (!xc) { 127 + rc = -ENOMEM; 128 + goto bail; 129 + } 130 + 131 + vcpu->arch.xive_vcpu = xc; 132 + xc->xive = xive; 133 + xc->vcpu = vcpu; 134 + xc->server_num = server_num; 135 + 136 + xc->vp_id = kvmppc_xive_vp(xive, server_num); 137 + xc->valid = true; 138 + vcpu->arch.irq_type = KVMPPC_IRQ_XIVE; 139 + 140 + rc = xive_native_get_vp_info(xc->vp_id, &xc->vp_cam, &xc->vp_chip_id); 141 + if (rc) { 142 + pr_err("Failed to get VP info from OPAL: %d\n", rc); 143 + goto bail; 144 + } 145 + 146 + /* 147 + * Enable the VP first as the single escalation mode will 148 + * affect escalation interrupts numbering 149 + */ 150 + rc = xive_native_enable_vp(xc->vp_id, xive->single_escalation); 151 
+ if (rc) { 152 + pr_err("Failed to enable VP in OPAL: %d\n", rc); 153 + goto bail; 154 + } 155 + 156 + /* Configure VCPU fields for use by assembly push/pull */ 157 + vcpu->arch.xive_saved_state.w01 = cpu_to_be64(0xff000000); 158 + vcpu->arch.xive_cam_word = cpu_to_be32(xc->vp_cam | TM_QW1W2_VO); 159 + 160 + /* TODO: reset all queues to a clean state ? */ 161 + bail: 162 + mutex_unlock(&vcpu->kvm->lock); 163 + if (rc) 164 + kvmppc_xive_native_cleanup_vcpu(vcpu); 165 + 166 + return rc; 167 + } 168 + 169 + /* 170 + * Device passthrough support 171 + */ 172 + static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq) 173 + { 174 + struct kvmppc_xive *xive = kvm->arch.xive; 175 + 176 + if (irq >= KVMPPC_XIVE_NR_IRQS) 177 + return -EINVAL; 178 + 179 + /* 180 + * Clear the ESB pages of the IRQ number being mapped (or 181 + * unmapped) into the guest and let the the VM fault handler 182 + * repopulate with the appropriate ESB pages (device or IC) 183 + */ 184 + pr_debug("clearing esb pages for girq 0x%lx\n", irq); 185 + mutex_lock(&xive->mapping_lock); 186 + if (xive->mapping) 187 + unmap_mapping_range(xive->mapping, 188 + irq * (2ull << PAGE_SHIFT), 189 + 2ull << PAGE_SHIFT, 1); 190 + mutex_unlock(&xive->mapping_lock); 191 + return 0; 192 + } 193 + 194 + static struct kvmppc_xive_ops kvmppc_xive_native_ops = { 195 + .reset_mapped = kvmppc_xive_native_reset_mapped, 196 + }; 197 + 198 + static vm_fault_t xive_native_esb_fault(struct vm_fault *vmf) 199 + { 200 + struct vm_area_struct *vma = vmf->vma; 201 + struct kvm_device *dev = vma->vm_file->private_data; 202 + struct kvmppc_xive *xive = dev->private; 203 + struct kvmppc_xive_src_block *sb; 204 + struct kvmppc_xive_irq_state *state; 205 + struct xive_irq_data *xd; 206 + u32 hw_num; 207 + u16 src; 208 + u64 page; 209 + unsigned long irq; 210 + u64 page_offset; 211 + 212 + /* 213 + * Linux/KVM uses a two pages ESB setting, one for trigger and 214 + * one for EOI 215 + */ 216 + page_offset = vmf->pgoff 
- vma->vm_pgoff; 217 + irq = page_offset / 2; 218 + 219 + sb = kvmppc_xive_find_source(xive, irq, &src); 220 + if (!sb) { 221 + pr_devel("%s: source %lx not found !\n", __func__, irq); 222 + return VM_FAULT_SIGBUS; 223 + } 224 + 225 + state = &sb->irq_state[src]; 226 + kvmppc_xive_select_irq(state, &hw_num, &xd); 227 + 228 + arch_spin_lock(&sb->lock); 229 + 230 + /* 231 + * first/even page is for trigger 232 + * second/odd page is for EOI and management. 233 + */ 234 + page = page_offset % 2 ? xd->eoi_page : xd->trig_page; 235 + arch_spin_unlock(&sb->lock); 236 + 237 + if (WARN_ON(!page)) { 238 + pr_err("%s: accessing invalid ESB page for source %lx !\n", 239 + __func__, irq); 240 + return VM_FAULT_SIGBUS; 241 + } 242 + 243 + vmf_insert_pfn(vma, vmf->address, page >> PAGE_SHIFT); 244 + return VM_FAULT_NOPAGE; 245 + } 246 + 247 + static const struct vm_operations_struct xive_native_esb_vmops = { 248 + .fault = xive_native_esb_fault, 249 + }; 250 + 251 + static vm_fault_t xive_native_tima_fault(struct vm_fault *vmf) 252 + { 253 + struct vm_area_struct *vma = vmf->vma; 254 + 255 + switch (vmf->pgoff - vma->vm_pgoff) { 256 + case 0: /* HW - forbid access */ 257 + case 1: /* HV - forbid access */ 258 + return VM_FAULT_SIGBUS; 259 + case 2: /* OS */ 260 + vmf_insert_pfn(vma, vmf->address, xive_tima_os >> PAGE_SHIFT); 261 + return VM_FAULT_NOPAGE; 262 + case 3: /* USER - TODO */ 263 + default: 264 + return VM_FAULT_SIGBUS; 265 + } 266 + } 267 + 268 + static const struct vm_operations_struct xive_native_tima_vmops = { 269 + .fault = xive_native_tima_fault, 270 + }; 271 + 272 + static int kvmppc_xive_native_mmap(struct kvm_device *dev, 273 + struct vm_area_struct *vma) 274 + { 275 + struct kvmppc_xive *xive = dev->private; 276 + 277 + /* We only allow mappings at fixed offset for now */ 278 + if (vma->vm_pgoff == KVM_XIVE_TIMA_PAGE_OFFSET) { 279 + if (vma_pages(vma) > 4) 280 + return -EINVAL; 281 + vma->vm_ops = &xive_native_tima_vmops; 282 + } else if (vma->vm_pgoff == 
KVM_XIVE_ESB_PAGE_OFFSET) { 283 + if (vma_pages(vma) > KVMPPC_XIVE_NR_IRQS * 2) 284 + return -EINVAL; 285 + vma->vm_ops = &xive_native_esb_vmops; 286 + } else { 287 + return -EINVAL; 288 + } 289 + 290 + vma->vm_flags |= VM_IO | VM_PFNMAP; 291 + vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot); 292 + 293 + /* 294 + * Grab the KVM device file address_space to be able to clear 295 + * the ESB pages mapping when a device is passed-through into 296 + * the guest. 297 + */ 298 + xive->mapping = vma->vm_file->f_mapping; 299 + return 0; 300 + } 301 + 302 + static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq, 303 + u64 addr) 304 + { 305 + struct kvmppc_xive_src_block *sb; 306 + struct kvmppc_xive_irq_state *state; 307 + u64 __user *ubufp = (u64 __user *) addr; 308 + u64 val; 309 + u16 idx; 310 + int rc; 311 + 312 + pr_devel("%s irq=0x%lx\n", __func__, irq); 313 + 314 + if (irq < KVMPPC_XIVE_FIRST_IRQ || irq >= KVMPPC_XIVE_NR_IRQS) 315 + return -E2BIG; 316 + 317 + sb = kvmppc_xive_find_source(xive, irq, &idx); 318 + if (!sb) { 319 + pr_debug("No source, creating source block...\n"); 320 + sb = kvmppc_xive_create_src_block(xive, irq); 321 + if (!sb) { 322 + pr_err("Failed to create block...\n"); 323 + return -ENOMEM; 324 + } 325 + } 326 + state = &sb->irq_state[idx]; 327 + 328 + if (get_user(val, ubufp)) { 329 + pr_err("fault getting user info !\n"); 330 + return -EFAULT; 331 + } 332 + 333 + arch_spin_lock(&sb->lock); 334 + 335 + /* 336 + * If the source doesn't already have an IPI, allocate 337 + * one and get the corresponding data 338 + */ 339 + if (!state->ipi_number) { 340 + state->ipi_number = xive_native_alloc_irq(); 341 + if (state->ipi_number == 0) { 342 + pr_err("Failed to allocate IRQ !\n"); 343 + rc = -ENXIO; 344 + goto unlock; 345 + } 346 + xive_native_populate_irq_data(state->ipi_number, 347 + &state->ipi_data); 348 + pr_debug("%s allocated hw_irq=0x%x for irq=0x%lx\n", __func__, 349 + state->ipi_number, irq); 350 + } 351 + 352 
+ /* Restore LSI state */ 353 + if (val & KVM_XIVE_LEVEL_SENSITIVE) { 354 + state->lsi = true; 355 + if (val & KVM_XIVE_LEVEL_ASSERTED) 356 + state->asserted = true; 357 + pr_devel(" LSI ! Asserted=%d\n", state->asserted); 358 + } 359 + 360 + /* Mask IRQ to start with */ 361 + state->act_server = 0; 362 + state->act_priority = MASKED; 363 + xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01); 364 + xive_native_configure_irq(state->ipi_number, 0, MASKED, 0); 365 + 366 + /* Increment the number of valid sources and mark this one valid */ 367 + if (!state->valid) 368 + xive->src_count++; 369 + state->valid = true; 370 + 371 + rc = 0; 372 + 373 + unlock: 374 + arch_spin_unlock(&sb->lock); 375 + 376 + return rc; 377 + } 378 + 379 + static int kvmppc_xive_native_update_source_config(struct kvmppc_xive *xive, 380 + struct kvmppc_xive_src_block *sb, 381 + struct kvmppc_xive_irq_state *state, 382 + u32 server, u8 priority, bool masked, 383 + u32 eisn) 384 + { 385 + struct kvm *kvm = xive->kvm; 386 + u32 hw_num; 387 + int rc = 0; 388 + 389 + arch_spin_lock(&sb->lock); 390 + 391 + if (state->act_server == server && state->act_priority == priority && 392 + state->eisn == eisn) 393 + goto unlock; 394 + 395 + pr_devel("new_act_prio=%d new_act_server=%d mask=%d act_server=%d act_prio=%d\n", 396 + priority, server, masked, state->act_server, 397 + state->act_priority); 398 + 399 + kvmppc_xive_select_irq(state, &hw_num, NULL); 400 + 401 + if (priority != MASKED && !masked) { 402 + rc = kvmppc_xive_select_target(kvm, &server, priority); 403 + if (rc) 404 + goto unlock; 405 + 406 + state->act_priority = priority; 407 + state->act_server = server; 408 + state->eisn = eisn; 409 + 410 + rc = xive_native_configure_irq(hw_num, 411 + kvmppc_xive_vp(xive, server), 412 + priority, eisn); 413 + } else { 414 + state->act_priority = MASKED; 415 + state->act_server = 0; 416 + state->eisn = 0; 417 + 418 + rc = xive_native_configure_irq(hw_num, 0, MASKED, 0); 419 + } 420 + 421 + unlock: 422 + 
arch_spin_unlock(&sb->lock); 423 + return rc; 424 + } 425 + 426 + static int kvmppc_xive_native_set_source_config(struct kvmppc_xive *xive, 427 + long irq, u64 addr) 428 + { 429 + struct kvmppc_xive_src_block *sb; 430 + struct kvmppc_xive_irq_state *state; 431 + u64 __user *ubufp = (u64 __user *) addr; 432 + u16 src; 433 + u64 kvm_cfg; 434 + u32 server; 435 + u8 priority; 436 + bool masked; 437 + u32 eisn; 438 + 439 + sb = kvmppc_xive_find_source(xive, irq, &src); 440 + if (!sb) 441 + return -ENOENT; 442 + 443 + state = &sb->irq_state[src]; 444 + 445 + if (!state->valid) 446 + return -EINVAL; 447 + 448 + if (get_user(kvm_cfg, ubufp)) 449 + return -EFAULT; 450 + 451 + pr_devel("%s irq=0x%lx cfg=%016llx\n", __func__, irq, kvm_cfg); 452 + 453 + priority = (kvm_cfg & KVM_XIVE_SOURCE_PRIORITY_MASK) >> 454 + KVM_XIVE_SOURCE_PRIORITY_SHIFT; 455 + server = (kvm_cfg & KVM_XIVE_SOURCE_SERVER_MASK) >> 456 + KVM_XIVE_SOURCE_SERVER_SHIFT; 457 + masked = (kvm_cfg & KVM_XIVE_SOURCE_MASKED_MASK) >> 458 + KVM_XIVE_SOURCE_MASKED_SHIFT; 459 + eisn = (kvm_cfg & KVM_XIVE_SOURCE_EISN_MASK) >> 460 + KVM_XIVE_SOURCE_EISN_SHIFT; 461 + 462 + if (priority != xive_prio_from_guest(priority)) { 463 + pr_err("invalid priority for queue %d for VCPU %d\n", 464 + priority, server); 465 + return -EINVAL; 466 + } 467 + 468 + return kvmppc_xive_native_update_source_config(xive, sb, state, server, 469 + priority, masked, eisn); 470 + } 471 + 472 + static int kvmppc_xive_native_sync_source(struct kvmppc_xive *xive, 473 + long irq, u64 addr) 474 + { 475 + struct kvmppc_xive_src_block *sb; 476 + struct kvmppc_xive_irq_state *state; 477 + struct xive_irq_data *xd; 478 + u32 hw_num; 479 + u16 src; 480 + int rc = 0; 481 + 482 + pr_devel("%s irq=0x%lx", __func__, irq); 483 + 484 + sb = kvmppc_xive_find_source(xive, irq, &src); 485 + if (!sb) 486 + return -ENOENT; 487 + 488 + state = &sb->irq_state[src]; 489 + 490 + rc = -EINVAL; 491 + 492 + arch_spin_lock(&sb->lock); 493 + 494 + if (state->valid) { 495 + 
kvmppc_xive_select_irq(state, &hw_num, &xd); 496 + xive_native_sync_source(hw_num); 497 + rc = 0; 498 + } 499 + 500 + arch_spin_unlock(&sb->lock); 501 + return rc; 502 + } 503 + 504 + static int xive_native_validate_queue_size(u32 qshift) 505 + { 506 + /* 507 + * We only support 64K pages for the moment. This is also 508 + * advertised in the DT property "ibm,xive-eq-sizes" 509 + */ 510 + switch (qshift) { 511 + case 0: /* EQ reset */ 512 + case 16: 513 + return 0; 514 + case 12: 515 + case 21: 516 + case 24: 517 + default: 518 + return -EINVAL; 519 + } 520 + } 521 + 522 + static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive, 523 + long eq_idx, u64 addr) 524 + { 525 + struct kvm *kvm = xive->kvm; 526 + struct kvm_vcpu *vcpu; 527 + struct kvmppc_xive_vcpu *xc; 528 + void __user *ubufp = (void __user *) addr; 529 + u32 server; 530 + u8 priority; 531 + struct kvm_ppc_xive_eq kvm_eq; 532 + int rc; 533 + __be32 *qaddr = 0; 534 + struct page *page; 535 + struct xive_q *q; 536 + gfn_t gfn; 537 + unsigned long page_size; 538 + 539 + /* 540 + * Demangle priority/server tuple from the EQ identifier 541 + */ 542 + priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >> 543 + KVM_XIVE_EQ_PRIORITY_SHIFT; 544 + server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >> 545 + KVM_XIVE_EQ_SERVER_SHIFT; 546 + 547 + if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq))) 548 + return -EFAULT; 549 + 550 + vcpu = kvmppc_xive_find_server(kvm, server); 551 + if (!vcpu) { 552 + pr_err("Can't find server %d\n", server); 553 + return -ENOENT; 554 + } 555 + xc = vcpu->arch.xive_vcpu; 556 + 557 + if (priority != xive_prio_from_guest(priority)) { 558 + pr_err("Trying to restore invalid queue %d for VCPU %d\n", 559 + priority, server); 560 + return -EINVAL; 561 + } 562 + q = &xc->queues[priority]; 563 + 564 + pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n", 565 + __func__, server, priority, kvm_eq.flags, 566 + kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex); 
567 + 568 + /* 569 + * sPAPR specifies a "Unconditional Notify (n) flag" for the 570 + * H_INT_SET_QUEUE_CONFIG hcall which forces notification 571 + * without using the coalescing mechanisms provided by the 572 + * XIVE END ESBs. This is required on KVM as notification 573 + * using the END ESBs is not supported. 574 + */ 575 + if (kvm_eq.flags != KVM_XIVE_EQ_ALWAYS_NOTIFY) { 576 + pr_err("invalid flags %d\n", kvm_eq.flags); 577 + return -EINVAL; 578 + } 579 + 580 + rc = xive_native_validate_queue_size(kvm_eq.qshift); 581 + if (rc) { 582 + pr_err("invalid queue size %d\n", kvm_eq.qshift); 583 + return rc; 584 + } 585 + 586 + /* reset queue and disable queueing */ 587 + if (!kvm_eq.qshift) { 588 + q->guest_qaddr = 0; 589 + q->guest_qshift = 0; 590 + 591 + rc = xive_native_configure_queue(xc->vp_id, q, priority, 592 + NULL, 0, true); 593 + if (rc) { 594 + pr_err("Failed to reset queue %d for VCPU %d: %d\n", 595 + priority, xc->server_num, rc); 596 + return rc; 597 + } 598 + 599 + if (q->qpage) { 600 + put_page(virt_to_page(q->qpage)); 601 + q->qpage = NULL; 602 + } 603 + 604 + return 0; 605 + } 606 + 607 + if (kvm_eq.qaddr & ((1ull << kvm_eq.qshift) - 1)) { 608 + pr_err("queue page is not aligned %llx/%llx\n", kvm_eq.qaddr, 609 + 1ull << kvm_eq.qshift); 610 + return -EINVAL; 611 + } 612 + 613 + gfn = gpa_to_gfn(kvm_eq.qaddr); 614 + page = gfn_to_page(kvm, gfn); 615 + if (is_error_page(page)) { 616 + pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr); 617 + return -EINVAL; 618 + } 619 + 620 + page_size = kvm_host_page_size(kvm, gfn); 621 + if (1ull << kvm_eq.qshift > page_size) { 622 + pr_warn("Incompatible host page size %lx!\n", page_size); 623 + return -EINVAL; 624 + } 625 + 626 + qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK); 627 + 628 + /* 629 + * Backup the queue page guest address to the mark EQ page 630 + * dirty for migration. 
631 + */ 632 + q->guest_qaddr = kvm_eq.qaddr; 633 + q->guest_qshift = kvm_eq.qshift; 634 + 635 + /* 636 + * Unconditional Notification is forced by default at the 637 + * OPAL level because the use of END ESBs is not supported by 638 + * Linux. 639 + */ 640 + rc = xive_native_configure_queue(xc->vp_id, q, priority, 641 + (__be32 *) qaddr, kvm_eq.qshift, true); 642 + if (rc) { 643 + pr_err("Failed to configure queue %d for VCPU %d: %d\n", 644 + priority, xc->server_num, rc); 645 + put_page(page); 646 + return rc; 647 + } 648 + 649 + /* 650 + * Only restore the queue state when needed. When doing the 651 + * H_INT_SET_SOURCE_CONFIG hcall, it should not. 652 + */ 653 + if (kvm_eq.qtoggle != 1 || kvm_eq.qindex != 0) { 654 + rc = xive_native_set_queue_state(xc->vp_id, priority, 655 + kvm_eq.qtoggle, 656 + kvm_eq.qindex); 657 + if (rc) 658 + goto error; 659 + } 660 + 661 + rc = kvmppc_xive_attach_escalation(vcpu, priority, 662 + xive->single_escalation); 663 + error: 664 + if (rc) 665 + kvmppc_xive_native_cleanup_queue(vcpu, priority); 666 + return rc; 667 + } 668 + 669 + static int kvmppc_xive_native_get_queue_config(struct kvmppc_xive *xive, 670 + long eq_idx, u64 addr) 671 + { 672 + struct kvm *kvm = xive->kvm; 673 + struct kvm_vcpu *vcpu; 674 + struct kvmppc_xive_vcpu *xc; 675 + struct xive_q *q; 676 + void __user *ubufp = (u64 __user *) addr; 677 + u32 server; 678 + u8 priority; 679 + struct kvm_ppc_xive_eq kvm_eq; 680 + u64 qaddr; 681 + u64 qshift; 682 + u64 qeoi_page; 683 + u32 escalate_irq; 684 + u64 qflags; 685 + int rc; 686 + 687 + /* 688 + * Demangle priority/server tuple from the EQ identifier 689 + */ 690 + priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >> 691 + KVM_XIVE_EQ_PRIORITY_SHIFT; 692 + server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >> 693 + KVM_XIVE_EQ_SERVER_SHIFT; 694 + 695 + vcpu = kvmppc_xive_find_server(kvm, server); 696 + if (!vcpu) { 697 + pr_err("Can't find server %d\n", server); 698 + return -ENOENT; 699 + } 700 + xc = vcpu->arch.xive_vcpu; 
701 + 702 + if (priority != xive_prio_from_guest(priority)) { 703 + pr_err("invalid priority for queue %d for VCPU %d\n", 704 + priority, server); 705 + return -EINVAL; 706 + } 707 + q = &xc->queues[priority]; 708 + 709 + memset(&kvm_eq, 0, sizeof(kvm_eq)); 710 + 711 + if (!q->qpage) 712 + return 0; 713 + 714 + rc = xive_native_get_queue_info(xc->vp_id, priority, &qaddr, &qshift, 715 + &qeoi_page, &escalate_irq, &qflags); 716 + if (rc) 717 + return rc; 718 + 719 + kvm_eq.flags = 0; 720 + if (qflags & OPAL_XIVE_EQ_ALWAYS_NOTIFY) 721 + kvm_eq.flags |= KVM_XIVE_EQ_ALWAYS_NOTIFY; 722 + 723 + kvm_eq.qshift = q->guest_qshift; 724 + kvm_eq.qaddr = q->guest_qaddr; 725 + 726 + rc = xive_native_get_queue_state(xc->vp_id, priority, &kvm_eq.qtoggle, 727 + &kvm_eq.qindex); 728 + if (rc) 729 + return rc; 730 + 731 + pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n", 732 + __func__, server, priority, kvm_eq.flags, 733 + kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex); 734 + 735 + if (copy_to_user(ubufp, &kvm_eq, sizeof(kvm_eq))) 736 + return -EFAULT; 737 + 738 + return 0; 739 + } 740 + 741 + static void kvmppc_xive_reset_sources(struct kvmppc_xive_src_block *sb) 742 + { 743 + int i; 744 + 745 + for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) { 746 + struct kvmppc_xive_irq_state *state = &sb->irq_state[i]; 747 + 748 + if (!state->valid) 749 + continue; 750 + 751 + if (state->act_priority == MASKED) 752 + continue; 753 + 754 + state->eisn = 0; 755 + state->act_server = 0; 756 + state->act_priority = MASKED; 757 + xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01); 758 + xive_native_configure_irq(state->ipi_number, 0, MASKED, 0); 759 + if (state->pt_number) { 760 + xive_vm_esb_load(state->pt_data, XIVE_ESB_SET_PQ_01); 761 + xive_native_configure_irq(state->pt_number, 762 + 0, MASKED, 0); 763 + } 764 + } 765 + } 766 + 767 + static int kvmppc_xive_reset(struct kvmppc_xive *xive) 768 + { 769 + struct kvm *kvm = xive->kvm; 770 + struct kvm_vcpu 
*vcpu; 771 + unsigned int i; 772 + 773 + pr_devel("%s\n", __func__); 774 + 775 + mutex_lock(&kvm->lock); 776 + 777 + kvm_for_each_vcpu(i, vcpu, kvm) { 778 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 779 + unsigned int prio; 780 + 781 + if (!xc) 782 + continue; 783 + 784 + kvmppc_xive_disable_vcpu_interrupts(vcpu); 785 + 786 + for (prio = 0; prio < KVMPPC_XIVE_Q_COUNT; prio++) { 787 + 788 + /* Single escalation, no queue 7 */ 789 + if (prio == 7 && xive->single_escalation) 790 + break; 791 + 792 + if (xc->esc_virq[prio]) { 793 + free_irq(xc->esc_virq[prio], vcpu); 794 + irq_dispose_mapping(xc->esc_virq[prio]); 795 + kfree(xc->esc_virq_names[prio]); 796 + xc->esc_virq[prio] = 0; 797 + } 798 + 799 + kvmppc_xive_native_cleanup_queue(vcpu, prio); 800 + } 801 + } 802 + 803 + for (i = 0; i <= xive->max_sbid; i++) { 804 + struct kvmppc_xive_src_block *sb = xive->src_blocks[i]; 805 + 806 + if (sb) { 807 + arch_spin_lock(&sb->lock); 808 + kvmppc_xive_reset_sources(sb); 809 + arch_spin_unlock(&sb->lock); 810 + } 811 + } 812 + 813 + mutex_unlock(&kvm->lock); 814 + 815 + return 0; 816 + } 817 + 818 + static void kvmppc_xive_native_sync_sources(struct kvmppc_xive_src_block *sb) 819 + { 820 + int j; 821 + 822 + for (j = 0; j < KVMPPC_XICS_IRQ_PER_ICS; j++) { 823 + struct kvmppc_xive_irq_state *state = &sb->irq_state[j]; 824 + struct xive_irq_data *xd; 825 + u32 hw_num; 826 + 827 + if (!state->valid) 828 + continue; 829 + 830 + /* 831 + * The struct kvmppc_xive_irq_state reflects the state 832 + * of the EAS configuration and not the state of the 833 + * source. The source is masked setting the PQ bits to 834 + * '-Q', which is what is being done before calling 835 + * the KVM_DEV_XIVE_EQ_SYNC control. 836 + * 837 + * If a source EAS is configured, OPAL syncs the XIVE 838 + * IC of the source and the XIVE IC of the previous 839 + * target if any. 840 + * 841 + * So it should be fine ignoring MASKED sources as 842 + * they have been synced already. 
843 + */ 844 + if (state->act_priority == MASKED) 845 + continue; 846 + 847 + kvmppc_xive_select_irq(state, &hw_num, &xd); 848 + xive_native_sync_source(hw_num); 849 + xive_native_sync_queue(hw_num); 850 + } 851 + } 852 + 853 + static int kvmppc_xive_native_vcpu_eq_sync(struct kvm_vcpu *vcpu) 854 + { 855 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 856 + unsigned int prio; 857 + 858 + if (!xc) 859 + return -ENOENT; 860 + 861 + for (prio = 0; prio < KVMPPC_XIVE_Q_COUNT; prio++) { 862 + struct xive_q *q = &xc->queues[prio]; 863 + 864 + if (!q->qpage) 865 + continue; 866 + 867 + /* Mark EQ page dirty for migration */ 868 + mark_page_dirty(vcpu->kvm, gpa_to_gfn(q->guest_qaddr)); 869 + } 870 + return 0; 871 + } 872 + 873 + static int kvmppc_xive_native_eq_sync(struct kvmppc_xive *xive) 874 + { 875 + struct kvm *kvm = xive->kvm; 876 + struct kvm_vcpu *vcpu; 877 + unsigned int i; 878 + 879 + pr_devel("%s\n", __func__); 880 + 881 + mutex_lock(&kvm->lock); 882 + for (i = 0; i <= xive->max_sbid; i++) { 883 + struct kvmppc_xive_src_block *sb = xive->src_blocks[i]; 884 + 885 + if (sb) { 886 + arch_spin_lock(&sb->lock); 887 + kvmppc_xive_native_sync_sources(sb); 888 + arch_spin_unlock(&sb->lock); 889 + } 890 + } 891 + 892 + kvm_for_each_vcpu(i, vcpu, kvm) { 893 + kvmppc_xive_native_vcpu_eq_sync(vcpu); 894 + } 895 + mutex_unlock(&kvm->lock); 896 + 897 + return 0; 898 + } 899 + 900 + static int kvmppc_xive_native_set_attr(struct kvm_device *dev, 901 + struct kvm_device_attr *attr) 902 + { 903 + struct kvmppc_xive *xive = dev->private; 904 + 905 + switch (attr->group) { 906 + case KVM_DEV_XIVE_GRP_CTRL: 907 + switch (attr->attr) { 908 + case KVM_DEV_XIVE_RESET: 909 + return kvmppc_xive_reset(xive); 910 + case KVM_DEV_XIVE_EQ_SYNC: 911 + return kvmppc_xive_native_eq_sync(xive); 912 + } 913 + break; 914 + case KVM_DEV_XIVE_GRP_SOURCE: 915 + return kvmppc_xive_native_set_source(xive, attr->attr, 916 + attr->addr); 917 + case KVM_DEV_XIVE_GRP_SOURCE_CONFIG: 918 + return 
kvmppc_xive_native_set_source_config(xive, attr->attr, 919 + attr->addr); 920 + case KVM_DEV_XIVE_GRP_EQ_CONFIG: 921 + return kvmppc_xive_native_set_queue_config(xive, attr->attr, 922 + attr->addr); 923 + case KVM_DEV_XIVE_GRP_SOURCE_SYNC: 924 + return kvmppc_xive_native_sync_source(xive, attr->attr, 925 + attr->addr); 926 + } 927 + return -ENXIO; 928 + } 929 + 930 + static int kvmppc_xive_native_get_attr(struct kvm_device *dev, 931 + struct kvm_device_attr *attr) 932 + { 933 + struct kvmppc_xive *xive = dev->private; 934 + 935 + switch (attr->group) { 936 + case KVM_DEV_XIVE_GRP_EQ_CONFIG: 937 + return kvmppc_xive_native_get_queue_config(xive, attr->attr, 938 + attr->addr); 939 + } 940 + return -ENXIO; 941 + } 942 + 943 + static int kvmppc_xive_native_has_attr(struct kvm_device *dev, 944 + struct kvm_device_attr *attr) 945 + { 946 + switch (attr->group) { 947 + case KVM_DEV_XIVE_GRP_CTRL: 948 + switch (attr->attr) { 949 + case KVM_DEV_XIVE_RESET: 950 + case KVM_DEV_XIVE_EQ_SYNC: 951 + return 0; 952 + } 953 + break; 954 + case KVM_DEV_XIVE_GRP_SOURCE: 955 + case KVM_DEV_XIVE_GRP_SOURCE_CONFIG: 956 + case KVM_DEV_XIVE_GRP_SOURCE_SYNC: 957 + if (attr->attr >= KVMPPC_XIVE_FIRST_IRQ && 958 + attr->attr < KVMPPC_XIVE_NR_IRQS) 959 + return 0; 960 + break; 961 + case KVM_DEV_XIVE_GRP_EQ_CONFIG: 962 + return 0; 963 + } 964 + return -ENXIO; 965 + } 966 + 967 + /* 968 + * Called when device fd is closed 969 + */ 970 + static void kvmppc_xive_native_release(struct kvm_device *dev) 971 + { 972 + struct kvmppc_xive *xive = dev->private; 973 + struct kvm *kvm = xive->kvm; 974 + struct kvm_vcpu *vcpu; 975 + int i; 976 + int was_ready; 977 + 978 + debugfs_remove(xive->dentry); 979 + 980 + pr_devel("Releasing xive native device\n"); 981 + 982 + /* 983 + * Clearing mmu_ready temporarily while holding kvm->lock 984 + * is a way of ensuring that no vcpus can enter the guest 985 + * until we drop kvm->lock. 
Doing kick_all_cpus_sync() 986 + * ensures that any vcpu executing inside the guest has 987 + * exited the guest. Once kick_all_cpus_sync() has finished, 988 + * we know that no vcpu can be executing the XIVE push or 989 + * pull code or accessing the XIVE MMIO regions. 990 + * 991 + * Since this is the device release function, we know that 992 + * userspace does not have any open fd or mmap referring to 993 + * the device. Therefore there can not be any of the 994 + * device attribute set/get, mmap, or page fault functions 995 + * being executed concurrently, and similarly, the 996 + * connect_vcpu and set/clr_mapped functions also cannot 997 + * be being executed. 998 + */ 999 + was_ready = kvm->arch.mmu_ready; 1000 + kvm->arch.mmu_ready = 0; 1001 + kick_all_cpus_sync(); 1002 + 1003 + /* 1004 + * We should clean up the vCPU interrupt presenters first. 1005 + */ 1006 + kvm_for_each_vcpu(i, vcpu, kvm) { 1007 + /* 1008 + * Take vcpu->mutex to ensure that no one_reg get/set ioctl 1009 + * (i.e. kvmppc_xive_native_[gs]et_vp) can be being done. 1010 + */ 1011 + mutex_lock(&vcpu->mutex); 1012 + kvmppc_xive_native_cleanup_vcpu(vcpu); 1013 + mutex_unlock(&vcpu->mutex); 1014 + } 1015 + 1016 + kvm->arch.xive = NULL; 1017 + 1018 + for (i = 0; i <= xive->max_sbid; i++) { 1019 + if (xive->src_blocks[i]) 1020 + kvmppc_xive_free_sources(xive->src_blocks[i]); 1021 + kfree(xive->src_blocks[i]); 1022 + xive->src_blocks[i] = NULL; 1023 + } 1024 + 1025 + if (xive->vp_base != XIVE_INVALID_VP) 1026 + xive_native_free_vp_block(xive->vp_base); 1027 + 1028 + kvm->arch.mmu_ready = was_ready; 1029 + 1030 + /* 1031 + * A reference of the kvmppc_xive pointer is now kept under 1032 + * the xive_devices struct of the machine for reuse. It is 1033 + * freed when the VM is destroyed for now until we fix all the 1034 + * execution paths. 1035 + */ 1036 + 1037 + kfree(dev); 1038 + } 1039 + 1040 + /* 1041 + * Create a XIVE device. kvm->lock is held. 
1042 + */ 1043 + static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type) 1044 + { 1045 + struct kvmppc_xive *xive; 1046 + struct kvm *kvm = dev->kvm; 1047 + int ret = 0; 1048 + 1049 + pr_devel("Creating xive native device\n"); 1050 + 1051 + if (kvm->arch.xive) 1052 + return -EEXIST; 1053 + 1054 + xive = kvmppc_xive_get_device(kvm, type); 1055 + if (!xive) 1056 + return -ENOMEM; 1057 + 1058 + dev->private = xive; 1059 + xive->dev = dev; 1060 + xive->kvm = kvm; 1061 + kvm->arch.xive = xive; 1062 + mutex_init(&xive->mapping_lock); 1063 + 1064 + /* 1065 + * Allocate a bunch of VPs. KVM_MAX_VCPUS is a large value for 1066 + * a default. Getting the max number of CPUs the VM was 1067 + * configured with would improve our usage of the XIVE VP space. 1068 + */ 1069 + xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS); 1070 + pr_devel("VP_Base=%x\n", xive->vp_base); 1071 + 1072 + if (xive->vp_base == XIVE_INVALID_VP) 1073 + ret = -ENXIO; 1074 + 1075 + xive->single_escalation = xive_native_has_single_escalation(); 1076 + xive->ops = &kvmppc_xive_native_ops; 1077 + 1078 + if (ret) 1079 + kfree(xive); 1080 + 1081 + return ret; 1082 + } 1083 + 1084 + /* 1085 + * Interrupt Pending Buffer (IPB) offset 1086 + */ 1087 + #define TM_IPB_SHIFT 40 1088 + #define TM_IPB_MASK (((u64) 0xFF) << TM_IPB_SHIFT) 1089 + 1090 + int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) 1091 + { 1092 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1093 + u64 opal_state; 1094 + int rc; 1095 + 1096 + if (!kvmppc_xive_enabled(vcpu)) 1097 + return -EPERM; 1098 + 1099 + if (!xc) 1100 + return -ENOENT; 1101 + 1102 + /* Thread context registers. 
We only care about IPB and CPPR */ 1103 + val->xive_timaval[0] = vcpu->arch.xive_saved_state.w01; 1104 + 1105 + /* Get the VP state from OPAL */ 1106 + rc = xive_native_get_vp_state(xc->vp_id, &opal_state); 1107 + if (rc) 1108 + return rc; 1109 + 1110 + /* 1111 + * Capture the backup of IPB register in the NVT structure and 1112 + * merge it in our KVM VP state. 1113 + */ 1114 + val->xive_timaval[0] |= cpu_to_be64(opal_state & TM_IPB_MASK); 1115 + 1116 + pr_devel("%s NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x opal=%016llx\n", 1117 + __func__, 1118 + vcpu->arch.xive_saved_state.nsr, 1119 + vcpu->arch.xive_saved_state.cppr, 1120 + vcpu->arch.xive_saved_state.ipb, 1121 + vcpu->arch.xive_saved_state.pipr, 1122 + vcpu->arch.xive_saved_state.w01, 1123 + (u32) vcpu->arch.xive_cam_word, opal_state); 1124 + 1125 + return 0; 1126 + } 1127 + 1128 + int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) 1129 + { 1130 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1131 + struct kvmppc_xive *xive = vcpu->kvm->arch.xive; 1132 + 1133 + pr_devel("%s w01=%016llx vp=%016llx\n", __func__, 1134 + val->xive_timaval[0], val->xive_timaval[1]); 1135 + 1136 + if (!kvmppc_xive_enabled(vcpu)) 1137 + return -EPERM; 1138 + 1139 + if (!xc || !xive) 1140 + return -ENOENT; 1141 + 1142 + /* We can't update the state of a "pushed" VCPU */ 1143 + if (WARN_ON(vcpu->arch.xive_pushed)) 1144 + return -EBUSY; 1145 + 1146 + /* 1147 + * Restore the thread context registers. IPB and CPPR should 1148 + * be the only ones that matter. 1149 + */ 1150 + vcpu->arch.xive_saved_state.w01 = val->xive_timaval[0]; 1151 + 1152 + /* 1153 + * There is no need to restore the XIVE internal state (IPB 1154 + * stored in the NVT) as the IPB register was merged in KVM VP 1155 + * state when captured. 
1156 + */ 1157 + return 0; 1158 + } 1159 + 1160 + static int xive_native_debug_show(struct seq_file *m, void *private) 1161 + { 1162 + struct kvmppc_xive *xive = m->private; 1163 + struct kvm *kvm = xive->kvm; 1164 + struct kvm_vcpu *vcpu; 1165 + unsigned int i; 1166 + 1167 + if (!kvm) 1168 + return 0; 1169 + 1170 + seq_puts(m, "=========\nVCPU state\n=========\n"); 1171 + 1172 + kvm_for_each_vcpu(i, vcpu, kvm) { 1173 + struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; 1174 + 1175 + if (!xc) 1176 + continue; 1177 + 1178 + seq_printf(m, "cpu server %#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n", 1179 + xc->server_num, 1180 + vcpu->arch.xive_saved_state.nsr, 1181 + vcpu->arch.xive_saved_state.cppr, 1182 + vcpu->arch.xive_saved_state.ipb, 1183 + vcpu->arch.xive_saved_state.pipr, 1184 + vcpu->arch.xive_saved_state.w01, 1185 + (u32) vcpu->arch.xive_cam_word); 1186 + 1187 + kvmppc_xive_debug_show_queues(m, vcpu); 1188 + } 1189 + 1190 + return 0; 1191 + } 1192 + 1193 + static int xive_native_debug_open(struct inode *inode, struct file *file) 1194 + { 1195 + return single_open(file, xive_native_debug_show, inode->i_private); 1196 + } 1197 + 1198 + static const struct file_operations xive_native_debug_fops = { 1199 + .open = xive_native_debug_open, 1200 + .read = seq_read, 1201 + .llseek = seq_lseek, 1202 + .release = single_release, 1203 + }; 1204 + 1205 + static void xive_native_debugfs_init(struct kvmppc_xive *xive) 1206 + { 1207 + char *name; 1208 + 1209 + name = kasprintf(GFP_KERNEL, "kvm-xive-%p", xive); 1210 + if (!name) { 1211 + pr_err("%s: no memory for name\n", __func__); 1212 + return; 1213 + } 1214 + 1215 + xive->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root, 1216 + xive, &xive_native_debug_fops); 1217 + 1218 + pr_debug("%s: created %s\n", __func__, name); 1219 + kfree(name); 1220 + } 1221 + 1222 + static void kvmppc_xive_native_init(struct kvm_device *dev) 1223 + { 1224 + struct kvmppc_xive *xive = (struct kvmppc_xive 
*)dev->private; 1225 + 1226 + /* Register some debug interfaces */ 1227 + xive_native_debugfs_init(xive); 1228 + } 1229 + 1230 + struct kvm_device_ops kvm_xive_native_ops = { 1231 + .name = "kvm-xive-native", 1232 + .create = kvmppc_xive_native_create, 1233 + .init = kvmppc_xive_native_init, 1234 + .release = kvmppc_xive_native_release, 1235 + .set_attr = kvmppc_xive_native_set_attr, 1236 + .get_attr = kvmppc_xive_native_get_attr, 1237 + .has_attr = kvmppc_xive_native_has_attr, 1238 + .mmap = kvmppc_xive_native_mmap, 1239 + }; 1240 + 1241 + void kvmppc_xive_native_init_module(void) 1242 + { 1243 + ; 1244 + } 1245 + 1246 + void kvmppc_xive_native_exit_module(void) 1247 + { 1248 + ; 1249 + }
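The queue-config handlers above recover a priority/server tuple from the EQ identifier with a mask followed by a shift. The following is a minimal userspace sketch of that packing scheme. The constant values are assumptions (the diff does not show the definitions of the `KVM_XIVE_EQ_*` macros), and `eq_id_pack`/`eq_id_unpack` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/*
 * Assumed layout: the low 3 bits carry the priority and the bits above
 * them carry the server number. The real KVM_XIVE_EQ_* definitions are
 * not part of this diff.
 */
#define EQ_PRIORITY_SHIFT 0
#define EQ_PRIORITY_MASK  0x7ULL
#define EQ_SERVER_SHIFT   3
#define EQ_SERVER_MASK    0xfffffff8ULL

struct eq_id {
	uint32_t server;
	uint8_t priority;
};

/* Pack a (server, priority) tuple into a single EQ identifier. */
static uint64_t eq_id_pack(uint32_t server, uint8_t priority)
{
	return (((uint64_t)server << EQ_SERVER_SHIFT) & EQ_SERVER_MASK) |
	       (((uint64_t)priority << EQ_PRIORITY_SHIFT) & EQ_PRIORITY_MASK);
}

/* Demangle the tuple again: mask first, then shift, as in the handler. */
static struct eq_id eq_id_unpack(uint64_t eq_idx)
{
	struct eq_id id;

	id.priority = (eq_idx & EQ_PRIORITY_MASK) >> EQ_PRIORITY_SHIFT;
	id.server = (eq_idx & EQ_SERVER_MASK) >> EQ_SERVER_SHIFT;
	return id;
}
```

Under these assumed constants a priority above 7 cannot survive the round trip, which is one reason the handler still validates the decoded priority before indexing `xc->queues[]`.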

+40 -38

arch/powerpc/kvm/book3s_xive_template.c

··· 130 130 */ 131 131 prio = ffs(pending) - 1; 132 132 133 - /* 134 - * If the most favoured prio we found pending is less 135 - * favored (or equal) than a pending IPI, we return 136 - * the IPI instead. 137 - * 138 - * Note: If pending was 0 and mfrr is 0xff, we will 139 - * not spurriously take an IPI because mfrr cannot 140 - * then be smaller than cppr. 141 - */ 142 - if (prio >= xc->mfrr && xc->mfrr < xc->cppr) { 143 - prio = xc->mfrr; 144 - hirq = XICS_IPI; 133 + /* Don't scan past the guest cppr */ 134 + if (prio >= xc->cppr || prio > 7) { 135 + if (xc->mfrr < xc->cppr) { 136 + prio = xc->mfrr; 137 + hirq = XICS_IPI; 138 + } 145 139 break; 146 140 } 147 - 148 - /* Don't scan past the guest cppr */ 149 - if (prio >= xc->cppr || prio > 7) 150 - break; 151 141 152 142 /* Grab queue and pointers */ 153 143 q = &xc->queues[prio]; ··· 174 184 * been set and another occurrence of the IPI will trigger. 175 185 */ 176 186 if (hirq == XICS_IPI || (prio == 0 && !qpage)) { 177 - if (scan_type == scan_fetch) 187 + if (scan_type == scan_fetch) { 178 188 GLUE(X_PFX,source_eoi)(xc->vp_ipi, 179 189 &xc->vp_ipi_data); 190 + q->idx = idx; 191 + q->toggle = toggle; 192 + } 180 193 /* Loop back on same queue with updated idx/toggle */ 181 194 #ifdef XIVE_RUNTIME_CHECKS 182 195 WARN_ON(hirq && hirq != XICS_IPI); ··· 192 199 if (hirq == XICS_DUMMY) 193 200 goto skip_ipi; 194 201 202 + /* Clear the pending bit if the queue is now empty */ 203 + if (!hirq) { 204 + pending &= ~(1 << prio); 205 + 206 + /* 207 + * Check if the queue count needs adjusting due to 208 + * interrupts being moved away. 
209 + */ 210 + if (atomic_read(&q->pending_count)) { 211 + int p = atomic_xchg(&q->pending_count, 0); 212 + if (p) { 213 + #ifdef XIVE_RUNTIME_CHECKS 214 + WARN_ON(p > atomic_read(&q->count)); 215 + #endif 216 + atomic_sub(p, &q->count); 217 + } 218 + } 219 + } 220 + 221 + /* 222 + * If the most favoured prio we found pending is less 223 + * favored (or equal) than a pending IPI, we return 224 + * the IPI instead. 225 + */ 226 + if (prio >= xc->mfrr && xc->mfrr < xc->cppr) { 227 + prio = xc->mfrr; 228 + hirq = XICS_IPI; 229 + break; 230 + } 231 + 195 232 /* If fetching, update queue pointers */ 196 233 if (scan_type == scan_fetch) { 197 234 q->idx = idx; 198 235 q->toggle = toggle; 199 - } 200 - 201 - /* Something found, stop searching */ 202 - if (hirq) 203 - break; 204 - 205 - /* Clear the pending bit on the now empty queue */ 206 - pending &= ~(1 << prio); 207 - 208 - /* 209 - * Check if the queue count needs adjusting due to 210 - * interrupts being moved away. 211 - */ 212 - if (atomic_read(&q->pending_count)) { 213 - int p = atomic_xchg(&q->pending_count, 0); 214 - if (p) { 215 - #ifdef XIVE_RUNTIME_CHECKS 216 - WARN_ON(p > atomic_read(&q->count)); 217 - #endif 218 - atomic_sub(p, &q->count); 219 - } 220 236 } 221 237 } 222 238

+37 -3

arch/powerpc/kvm/powerpc.c

···
 	case KVM_CAP_PPC_GET_CPU_CHAR:
 		r = 1;
 		break;
+#ifdef CONFIG_KVM_XIVE
+	case KVM_CAP_PPC_IRQ_XIVE:
+		/*
+		 * We need XIVE to be enabled on the platform (implies
+		 * a POWER9 processor) and the PowerNV platform, as
+		 * nested is not yet supported.
+		 */
+		r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE);
+		break;
+#endif

 	case KVM_CAP_PPC_ALLOC_HTAB:
 		r = hv_enabled;
···
 			r = num_present_cpus();
 		else
 			r = num_online_cpus();
-		break;
-	case KVM_CAP_NR_MEMSLOTS:
-		r = KVM_USER_MEM_SLOTS;
 		break;
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
···
 			kvmppc_xive_cleanup_vcpu(vcpu);
 		else
 			kvmppc_xics_free_icp(vcpu);
+		break;
+	case KVMPPC_IRQ_XIVE:
+		kvmppc_xive_native_cleanup_vcpu(vcpu);
 		break;
 	}

···
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
+#ifdef CONFIG_KVM_XIVE
+	case KVM_CAP_PPC_IRQ_XIVE: {
+		struct fd f;
+		struct kvm_device *dev;
+
+		r = -EBADF;
+		f = fdget(cap->args[0]);
+		if (!f.file)
+			break;
+
+		r = -ENXIO;
+		if (!xive_enabled())
+			break;
+
+		r = -EPERM;
+		dev = kvm_device_from_filp(f.file);
+		if (dev)
+			r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
+							    cap->args[1]);
+
+		fdput(f);
+		break;
+	}
+#endif /* CONFIG_KVM_XIVE */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_FWNMI:
 		r = -EINVAL;

+11

arch/powerpc/sysdev/xive/native.c

···
 }
 EXPORT_SYMBOL_GPL(xive_native_default_eq_shift);

+unsigned long xive_tima_os;
+EXPORT_SYMBOL_GPL(xive_tima_os);
+
 bool __init xive_native_init(void)
 {
 	struct device_node *np;
···
 	/* Configure Thread Management areas for KVM */
 	for_each_possible_cpu(cpu)
 		kvmppc_set_xive_tima(cpu, r.start, tima);
+
+	/* Resource 2 is OS window */
+	if (of_address_to_resource(np, 2, &r)) {
+		pr_err("Failed to get thread mgmnt area resource\n");
+		return false;
+	}
+
+	xive_tima_os = r.start;

 	/* Grab size of provisionning pages */
 	xive_parse_provisioning(np);

+1

arch/s390/include/asm/cpacf.h

···
 #define CPACF_KMCTR 0xb92d /* MSA4 */
 #define CPACF_PRNO 0xb93c /* MSA5 */
 #define CPACF_KMA 0xb929 /* MSA8 */
+#define CPACF_KDSA 0xb93a /* MSA9 */

 /*
  * En/decryption modifier bits

+2

arch/s390/include/asm/kvm_host.h

···
 #define ECD_HOSTREGMGMT 0x20000000
 #define ECD_MEF 0x08000000
 #define ECD_ETOKENF 0x02000000
+#define ECD_ECC 0x00200000
 	__u32 ecd; /* 0x01c8 */
 	__u8 reserved1cc[18]; /* 0x01cc */
 	__u64 pp; /* 0x01de */
···
 	u64 halt_successful_poll;
 	u64 halt_attempted_poll;
 	u64 halt_poll_invalid;
+	u64 halt_no_poll_steal;
 	u64 halt_wakeup;
 	u64 instruction_lctl;
 	u64 instruction_lctlg;

+4 -1

arch/s390/include/uapi/asm/kvm.h

···
 	__u8 pcc[16]; /* with MSA4 */
 	__u8 ppno[16]; /* with MSA5 */
 	__u8 kma[16]; /* with MSA8 */
-	__u8 reserved[1808];
+	__u8 kdsa[16]; /* with MSA9 */
+	__u8 sortl[32]; /* with STFLE.150 */
+	__u8 dfltcc[32]; /* with STFLE.151 */
+	__u8 reserved[1728];
 };

 /* kvm attributes for crypto */
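The hunk above pays for the three new query masks out of the reserved padding, so the overall size of the uapi structure should stay the same. A small sketch of that arithmetic, assuming only the fields visible in the hunk (the struct names `subfunc_tail_old` and `subfunc_tail_new` are hypothetical and model just the tail of the real structure):

```c
#include <stdint.h>

/* Tail of the structure before the change (only the part shown in the hunk). */
struct subfunc_tail_old {
	uint8_t reserved[1808];
};

/* Tail after the change: three new masks carved out of the padding. */
struct subfunc_tail_new {
	uint8_t kdsa[16];	/* with MSA9 */
	uint8_t sortl[32];	/* with STFLE.150 */
	uint8_t dfltcc[32];	/* with STFLE.151 */
	uint8_t reserved[1728];
};

/* Both tails must occupy the same number of bytes. */
_Static_assert(sizeof(struct subfunc_tail_old) ==
	       sizeof(struct subfunc_tail_new),
	       "new fields must be paid for out of the reserved area");

/* Padding that remains once the new masks are subtracted. */
static unsigned int reserved_after_split(void)
{
	return 1808 - (16 + 32 + 32);
}
```

Keeping the total constant is what lets userspace built against the older layout keep exchanging the structure with the kernel; the new bytes simply stop being padding.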

+1

arch/s390/kvm/Kconfig

···
 	select HAVE_KVM_IRQFD
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_INVALID_WAKEUPS
+	select HAVE_KVM_NO_POLL
 	select SRCU
 	select KVM_VFIO
 	---help---

+9 -2

arch/s390/kvm/interrupt.c

···
 #include <linux/kvm_host.h>
 #include <linux/hrtimer.h>
 #include <linux/mmu_context.h>
+#include <linux/nospec.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
 #include <linux/bitmap.h>
···
 {
 	if (id >= MAX_S390_IO_ADAPTERS)
 		return NULL;
+	id = array_index_nospec(id, MAX_S390_IO_ADAPTERS);
 	return kvm->arch.adapters[id];
 }

···
 			   (void __user *)attr->addr, sizeof(adapter_info)))
 		return -EFAULT;

-	if ((adapter_info.id >= MAX_S390_IO_ADAPTERS) ||
-	    (dev->kvm->arch.adapters[adapter_info.id] != NULL))
+	if (adapter_info.id >= MAX_S390_IO_ADAPTERS)
+		return -EINVAL;
+
+	adapter_info.id = array_index_nospec(adapter_info.id,
+					     MAX_S390_IO_ADAPTERS);
+
+	if (dev->kvm->arch.adapters[adapter_info.id] != NULL)
 		return -EINVAL;

 	adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);
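The hunks above follow the usual pattern for hardening a user-controlled array index: bounds-check first, then clamp the index with `array_index_nospec()` before the dereference, so that a mispredicted bounds check cannot be used to read out of bounds speculatively. The sketch below shows a branch-free masking scheme of the kind such a helper can be built on. It is an illustration only: the function names are hypothetical, the real helper lives in `<linux/nospec.h>` and may use an architecture-specific mask, and a plain userspace build gives no guarantee about what the compiler or CPU actually does.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Return ~0UL when index < size and 0 otherwise, without a conditional
 * branch. Assumes size <= LONG_MAX and that right-shifting a negative
 * long is an arithmetic shift (true for GCC and Clang).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp index into [0, size): out-of-range values collapse to 0. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

The clamp does not replace the explicit `>= MAX_S390_IO_ADAPTERS` check: the check still decides the architectural result (`NULL` or `-EINVAL`), while the mask only constrains what a speculative path can index.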

+116 -4

arch/s390/kvm/kvm-s390.c

··· 75 75 { "halt_successful_poll", VCPU_STAT(halt_successful_poll) }, 76 76 { "halt_attempted_poll", VCPU_STAT(halt_attempted_poll) }, 77 77 { "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) }, 78 + { "halt_no_poll_steal", VCPU_STAT(halt_no_poll_steal) }, 78 79 { "halt_wakeup", VCPU_STAT(halt_wakeup) }, 79 80 { "instruction_lctlg", VCPU_STAT(instruction_lctlg) }, 80 81 { "instruction_lctl", VCPU_STAT(instruction_lctl) }, ··· 177 176 static int hpage; 178 177 module_param(hpage, int, 0444); 179 178 MODULE_PARM_DESC(hpage, "1m huge page backing support"); 179 + 180 + /* maximum percentage of steal time for polling. >100 is treated like 100 */ 181 + static u8 halt_poll_max_steal = 10; 182 + module_param(halt_poll_max_steal, byte, 0644); 183 + MODULE_PARM_DESC(hpage, "Maximum percentage of steal time to allow polling"); 180 184 181 185 /* 182 186 * For now we handle at most 16 double words as this is what the s390 base ··· 327 321 return cc == 0; 328 322 } 329 323 324 + static inline void __insn32_query(unsigned int opcode, u8 query[32]) 325 + { 326 + register unsigned long r0 asm("0") = 0; /* query function */ 327 + register unsigned long r1 asm("1") = (unsigned long) query; 328 + 329 + asm volatile( 330 + /* Parameter regs are ignored */ 331 + " .insn rrf,%[opc] << 16,2,4,6,0\n" 332 + : "=m" (*query) 333 + : "d" (r0), "a" (r1), [opc] "i" (opcode) 334 + : "cc"); 335 + } 336 + 337 + #define INSN_SORTL 0xb938 338 + #define INSN_DFLTCC 0xb939 339 + 330 340 static void kvm_s390_cpu_feat_init(void) 331 341 { 332 342 int i; ··· 389 367 if (test_facility(146)) /* MSA8 */ 390 368 __cpacf_query(CPACF_KMA, (cpacf_mask_t *) 391 369 kvm_s390_available_subfunc.kma); 370 + 371 + if (test_facility(155)) /* MSA9 */ 372 + __cpacf_query(CPACF_KDSA, (cpacf_mask_t *) 373 + kvm_s390_available_subfunc.kdsa); 374 + 375 + if (test_facility(150)) /* SORTL */ 376 + __insn32_query(INSN_SORTL, kvm_s390_available_subfunc.sortl); 377 + 378 + if (test_facility(151)) /* DFLTCC */ 379 + 
__insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc); 392 380 393 381 if (MACHINE_HAS_ESOP) 394 382 allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP); ··· 545 513 else if (sclp.has_esca && sclp.has_64bscao) 546 514 r = KVM_S390_ESCA_CPU_SLOTS; 547 515 break; 548 - case KVM_CAP_NR_MEMSLOTS: 549 - r = KVM_USER_MEM_SLOTS; 550 - break; 551 516 case KVM_CAP_S390_COW: 552 517 r = MACHINE_HAS_ESOP; 553 518 break; ··· 685 656 if (test_facility(135)) { 686 657 set_kvm_facility(kvm->arch.model.fac_mask, 135); 687 658 set_kvm_facility(kvm->arch.model.fac_list, 135); 659 + } 660 + if (test_facility(148)) { 661 + set_kvm_facility(kvm->arch.model.fac_mask, 148); 662 + set_kvm_facility(kvm->arch.model.fac_list, 148); 663 + } 664 + if (test_facility(152)) { 665 + set_kvm_facility(kvm->arch.model.fac_mask, 152); 666 + set_kvm_facility(kvm->arch.model.fac_list, 152); 688 667 } 689 668 r = 0; 690 669 } else ··· 1360 1323 VM_EVENT(kvm, 3, "SET: guest KMA subfunc 0x%16.16lx.%16.16lx", 1361 1324 ((unsigned long *) &kvm->arch.model.subfuncs.kma)[0], 1362 1325 ((unsigned long *) &kvm->arch.model.subfuncs.kma)[1]); 1326 + VM_EVENT(kvm, 3, "SET: guest KDSA subfunc 0x%16.16lx.%16.16lx", 1327 + ((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[0], 1328 + ((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[1]); 1329 + VM_EVENT(kvm, 3, "SET: guest SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1330 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[0], 1331 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[1], 1332 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[2], 1333 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[3]); 1334 + VM_EVENT(kvm, 3, "SET: guest DFLTCC subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1335 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[0], 1336 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1], 1337 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2], 1338 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]); 
1363 1339 1364 1340 return 0; 1365 1341 } ··· 1541 1491 VM_EVENT(kvm, 3, "GET: guest KMA subfunc 0x%16.16lx.%16.16lx", 1542 1492 ((unsigned long *) &kvm->arch.model.subfuncs.kma)[0], 1543 1493 ((unsigned long *) &kvm->arch.model.subfuncs.kma)[1]); 1494 + VM_EVENT(kvm, 3, "GET: guest KDSA subfunc 0x%16.16lx.%16.16lx", 1495 + ((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[0], 1496 + ((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[1]); 1497 + VM_EVENT(kvm, 3, "GET: guest SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1498 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[0], 1499 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[1], 1500 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[2], 1501 + ((unsigned long *) &kvm->arch.model.subfuncs.sortl)[3]); 1502 + VM_EVENT(kvm, 3, "GET: guest DFLTCC subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1503 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[0], 1504 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1], 1505 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2], 1506 + ((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]); 1544 1507 1545 1508 return 0; 1546 1509 } ··· 1609 1546 VM_EVENT(kvm, 3, "GET: host KMA subfunc 0x%16.16lx.%16.16lx", 1610 1547 ((unsigned long *) &kvm_s390_available_subfunc.kma)[0], 1611 1548 ((unsigned long *) &kvm_s390_available_subfunc.kma)[1]); 1549 + VM_EVENT(kvm, 3, "GET: host KDSA subfunc 0x%16.16lx.%16.16lx", 1550 + ((unsigned long *) &kvm_s390_available_subfunc.kdsa)[0], 1551 + ((unsigned long *) &kvm_s390_available_subfunc.kdsa)[1]); 1552 + VM_EVENT(kvm, 3, "GET: host SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1553 + ((unsigned long *) &kvm_s390_available_subfunc.sortl)[0], 1554 + ((unsigned long *) &kvm_s390_available_subfunc.sortl)[1], 1555 + ((unsigned long *) &kvm_s390_available_subfunc.sortl)[2], 1556 + ((unsigned long *) &kvm_s390_available_subfunc.sortl)[3]); 1557 + VM_EVENT(kvm, 3, "GET: host DFLTCC subfunc 
0x%16.16lx.%16.16lx.%16.16lx.%16.16lx", 1558 + ((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[0], 1559 + ((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[1], 1560 + ((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[2], 1561 + ((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[3]); 1612 1562 1613 1563 return 0; 1614 1564 } ··· 2893 2817 vcpu->arch.enabled_gmap = vcpu->arch.gmap; 2894 2818 } 2895 2819 2820 + static bool kvm_has_pckmo_subfunc(struct kvm *kvm, unsigned long nr) 2821 + { 2822 + if (test_bit_inv(nr, (unsigned long *)&kvm->arch.model.subfuncs.pckmo) && 2823 + test_bit_inv(nr, (unsigned long *)&kvm_s390_available_subfunc.pckmo)) 2824 + return true; 2825 + return false; 2826 + } 2827 + 2828 + static bool kvm_has_pckmo_ecc(struct kvm *kvm) 2829 + { 2830 + /* At least one ECC subfunction must be present */ 2831 + return kvm_has_pckmo_subfunc(kvm, 32) || 2832 + kvm_has_pckmo_subfunc(kvm, 33) || 2833 + kvm_has_pckmo_subfunc(kvm, 34) || 2834 + kvm_has_pckmo_subfunc(kvm, 40) || 2835 + kvm_has_pckmo_subfunc(kvm, 41); 2836 + 2837 + } 2838 + 2896 2839 static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu) 2897 2840 { 2898 2841 /* ··· 2924 2829 vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd; 2925 2830 vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA); 2926 2831 vcpu->arch.sie_block->eca &= ~ECA_APIE; 2832 + vcpu->arch.sie_block->ecd &= ~ECD_ECC; 2927 2833 2928 2834 if (vcpu->kvm->arch.crypto.apie) 2929 2835 vcpu->arch.sie_block->eca |= ECA_APIE; 2930 2836 2931 2837 /* Set up protected key support */ 2932 - if (vcpu->kvm->arch.crypto.aes_kw) 2838 + if (vcpu->kvm->arch.crypto.aes_kw) { 2933 2839 vcpu->arch.sie_block->ecb3 |= ECB3_AES; 2840 + /* ecc is also wrapped with AES key */ 2841 + if (kvm_has_pckmo_ecc(vcpu->kvm)) 2842 + vcpu->arch.sie_block->ecd |= ECD_ECC; 2843 + } 2844 + 2934 2845 if (vcpu->kvm->arch.crypto.dea_kw) 2935 2846 vcpu->arch.sie_block->ecb3 |= ECB3_DEA; 2936 2847 } ··· 3167 3066 
kvm_s390_sync_request(KVM_REQ_MMU_RELOAD, vcpu); 3168 3067 } 3169 3068 } 3069 + } 3070 + 3071 + bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) 3072 + { 3073 + /* do not poll with more than halt_poll_max_steal percent of steal time */ 3074 + if (S390_lowcore.avg_steal_timer * 100 / (TICK_USEC << 12) >= 3075 + halt_poll_max_steal) { 3076 + vcpu->stat.halt_no_poll_steal++; 3077 + return true; 3078 + } 3079 + return false; 3170 3080 } 3171 3081 3172 3082 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
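`kvm_arch_no_poll()` above turns an average steal time into a percentage of one tick and compares it with `halt_poll_max_steal`. A minimal standalone sketch of that comparison follows. It assumes, as the `TICK_USEC << 12` in the hunk suggests, that the steal value is expressed in units of 1/4096 microsecond; the function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * avg_steal:     average steal time per tick, in assumed 1/4096 us units
 * tick_usec:     length of one tick in microseconds
 * max_steal_pct: threshold in percent (halt_poll_max_steal in the hunk)
 *
 * Returns true when polling should be skipped.
 */
static bool steal_too_high(uint64_t avg_steal, uint64_t tick_usec,
			   uint8_t max_steal_pct)
{
	uint64_t tick_units = tick_usec << 12;	/* us -> 1/4096 us */

	/* Integer percentage, rounded down, as in the kernel expression. */
	return avg_steal * 100 / tick_units >= max_steal_pct;
}
```

With a 10000 us tick and the default threshold of 10, a steal value of 4096000 units (exactly 10% of the tick) already disables polling, because the comparison is `>=`.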

+9 -4

arch/s390/kvm/vsie.c

··· 288 288 const u32 crycb_addr = crycbd_o & 0x7ffffff8U; 289 289 unsigned long *b1, *b2; 290 290 u8 ecb3_flags; 291 + u32 ecd_flags; 291 292 int apie_h; 293 + int apie_s; 292 294 int key_msk = test_kvm_facility(vcpu->kvm, 76); 293 295 int fmt_o = crycbd_o & CRYCB_FORMAT_MASK; 294 296 int fmt_h = vcpu->arch.sie_block->crycbd & CRYCB_FORMAT_MASK; ··· 299 297 scb_s->crycbd = 0; 300 298 301 299 apie_h = vcpu->arch.sie_block->eca & ECA_APIE; 302 - if (!apie_h && (!key_msk || fmt_o == CRYCB_FORMAT0)) 300 + apie_s = apie_h & scb_o->eca; 301 + if (!apie_s && (!key_msk || (fmt_o == CRYCB_FORMAT0))) 303 302 return 0; 304 303 305 304 if (!crycb_addr) ··· 311 308 ((crycb_addr + 128) & PAGE_MASK)) 312 309 return set_validity_icpt(scb_s, 0x003CU); 313 310 314 - if (apie_h && (scb_o->eca & ECA_APIE)) { 311 + if (apie_s) { 315 312 ret = setup_apcb(vcpu, &vsie_page->crycb, crycb_addr, 316 313 vcpu->kvm->arch.crypto.crycb, 317 314 fmt_o, fmt_h); ··· 323 320 /* we may only allow it if enabled for guest 2 */ 324 321 ecb3_flags = scb_o->ecb3 & vcpu->arch.sie_block->ecb3 & 325 322 (ECB3_AES | ECB3_DEA); 326 - if (!ecb3_flags) 323 + ecd_flags = scb_o->ecd & vcpu->arch.sie_block->ecd & ECD_ECC; 324 + if (!ecb3_flags && !ecd_flags) 327 325 goto end; 328 326 329 327 /* copy only the wrapping keys */ ··· 333 329 return set_validity_icpt(scb_s, 0x0035U); 334 330 335 331 scb_s->ecb3 |= ecb3_flags; 332 + scb_s->ecd |= ecd_flags; 336 333 337 334 /* xor both blocks in one run */ 338 335 b1 = (unsigned long *) vsie_page->crycb.dea_wrapping_key_mask; ··· 344 339 end: 345 340 switch (ret) { 346 341 case -EINVAL: 347 - return set_validity_icpt(scb_s, 0x0020U); 342 + return set_validity_icpt(scb_s, 0x0022U); 348 343 case -EFAULT: 349 344 return set_validity_icpt(scb_s, 0x0035U); 350 345 case -EACCES:

+3

arch/s390/tools/gen_facilities.c

···
 		131, /* enhanced-SOP 2 and side-effect */
 		139, /* multiple epoch facility */
 		146, /* msa extension 8 */
+		150, /* enhanced sort */
+		151, /* deflate conversion */
+		155, /* msa extension 9 */
 		-1  /* END */
 	}
 },

+5 -1

arch/x86/events/intel/core.c

···
 	 */
 	if (__test_and_clear_bit(55, (unsigned long *)&status)) {
 		handled++;
-		intel_pt_interrupt();
+		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
+			perf_guest_cbs->handle_intel_pt_intr))
+			perf_guest_cbs->handle_intel_pt_intr();
+		else
+			intel_pt_interrupt();
 	}

 	/*

+1

arch/x86/include/asm/e820/api.h

···

 extern unsigned long pci_mem_start;

+extern bool e820__mapped_raw_any(u64 start, u64 end, enum e820_type type);
 extern bool e820__mapped_any(u64 start, u64 end, enum e820_type type);
 extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type);

+6 -1

arch/x86/include/asm/kvm_host.h

··· 470 470 u64 global_ovf_ctrl; 471 471 u64 counter_bitmask[2]; 472 472 u64 global_ctrl_mask; 473 + u64 global_ovf_ctrl_mask; 473 474 u64 reserved_bits; 474 475 u8 version; 475 476 struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; ··· 782 781 783 782 /* Flush the L1 Data cache for L1TF mitigation on VMENTER */ 784 783 bool l1tf_flush_l1d; 784 + 785 + /* AMD MSRC001_0015 Hardware Configuration */ 786 + u64 msr_hwcr; 785 787 }; 786 788 787 789 struct kvm_lpage_info { ··· 1172 1168 uint32_t guest_irq, bool set); 1173 1169 void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); 1174 1170 1175 - int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc); 1171 + int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, 1172 + bool *expired); 1176 1173 void (*cancel_hv_timer)(struct kvm_vcpu *vcpu); 1177 1174 1178 1175 void (*setup_mce)(struct kvm_vcpu *vcpu);

+8

arch/x86/include/asm/msr-index.h

···
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x00000390

+/* PERF_GLOBAL_OVF_CTL bits */
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT	55
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI		(1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT		62
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF			(1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT)
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT		63
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD			(1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT)
+
 /* Geode defined MSRs */
 #define MSR_GEODE_BUSCONT_CONF0		0x00001900

+15 -3

arch/x86/kernel/e820.c

··· 73 73 * This function checks if any part of the range <start,end> is mapped 74 74 * with type. 75 75 */ 76 - bool e820__mapped_any(u64 start, u64 end, enum e820_type type) 76 + static bool _e820__mapped_any(struct e820_table *table, 77 + u64 start, u64 end, enum e820_type type) 77 78 { 78 79 int i; 79 80 80 - for (i = 0; i < e820_table->nr_entries; i++) { 81 - struct e820_entry *entry = &e820_table->entries[i]; 81 + for (i = 0; i < table->nr_entries; i++) { 82 + struct e820_entry *entry = &table->entries[i]; 82 83 83 84 if (type && entry->type != type) 84 85 continue; ··· 88 87 return 1; 89 88 } 90 89 return 0; 90 + } 91 + 92 + bool e820__mapped_raw_any(u64 start, u64 end, enum e820_type type) 93 + { 94 + return _e820__mapped_any(e820_table_firmware, start, end, type); 95 + } 96 + EXPORT_SYMBOL_GPL(e820__mapped_raw_any); 97 + 98 + bool e820__mapped_any(u64 start, u64 end, enum e820_type type) 99 + { 100 + return _e820__mapped_any(e820_table, start, end, type); 91 101 } 92 102 EXPORT_SYMBOL_GPL(e820__mapped_any); 93 103
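The e820.c change above turns one lookup into a table-parameterised helper with two thin wrappers, one per table. A rough standalone sketch of that shape follows. All sketch_* names and the sample tables are hypothetical, and the overlap test is an assumption, since that part of the loop body is elided in the hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's e820 entry and table types. */
struct sketch_entry { uint64_t addr; uint64_t size; int type; };
struct sketch_table { size_t nr_entries; const struct sketch_entry *entries; };

/* Shared helper: does any entry of the given type overlap [start, end)? */
static bool sketch_mapped_any(const struct sketch_table *table,
			      uint64_t start, uint64_t end, int type)
{
	for (size_t i = 0; i < table->nr_entries; i++) {
		const struct sketch_entry *e = &table->entries[i];

		/* type == 0 matches every entry type. */
		if (type && e->type != type)
			continue;
		/* Assumed overlap test: skip entries entirely outside the range. */
		if (e->addr >= end || e->addr + e->size <= start)
			continue;
		return true;
	}
	return false;
}

/* Hypothetical sample data: the firmware view has one more region. */
static const struct sketch_entry sketch_fw_entries[] = {
	{ 0x0000, 0x1000, 1 },
	{ 0x1000, 0x1000, 2 },
};
static const struct sketch_entry sketch_cur_entries[] = {
	{ 0x0000, 0x1000, 1 },
};
static const struct sketch_table sketch_firmware = { 2, sketch_fw_entries };
static const struct sketch_table sketch_current = { 1, sketch_cur_entries };

/* Thin wrappers: same query, different table. */
static bool sketch_mapped_raw_any(uint64_t start, uint64_t end, int type)
{
	return sketch_mapped_any(&sketch_firmware, start, end, type);
}

static bool sketch_mapped_cur_any(uint64_t start, uint64_t end, int type)
{
	return sketch_mapped_any(&sketch_current, start, end, type);
}
```

Keeping the loop in one helper means the two exported entry points cannot drift apart; only the table pointer differs.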

+6 -6

arch/x86/kvm/cpuid.c

···
 	if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
 		return 1;

-	eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
-	ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
+	eax = kvm_rax_read(vcpu);
+	ecx = kvm_rcx_read(vcpu);
 	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
-	kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
-	kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
-	kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
-	kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
+	kvm_rax_write(vcpu, eax);
+	kvm_rbx_write(vcpu, ebx);
+	kvm_rcx_write(vcpu, ecx);
+	kvm_rdx_write(vcpu, edx);
 	return kvm_skip_emulated_instruction(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);

+12 -12

arch/x86/kvm/hyperv.c

··· 1535 1535 1536 1536 longmode = is_64_bit_mode(vcpu); 1537 1537 if (longmode) 1538 - kvm_register_write(vcpu, VCPU_REGS_RAX, result); 1538 + kvm_rax_write(vcpu, result); 1539 1539 else { 1540 - kvm_register_write(vcpu, VCPU_REGS_RDX, result >> 32); 1541 - kvm_register_write(vcpu, VCPU_REGS_RAX, result & 0xffffffff); 1540 + kvm_rdx_write(vcpu, result >> 32); 1541 + kvm_rax_write(vcpu, result & 0xffffffff); 1542 1542 } 1543 1543 } 1544 1544 ··· 1611 1611 longmode = is_64_bit_mode(vcpu); 1612 1612 1613 1613 if (!longmode) { 1614 - param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | 1615 - (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff); 1616 - ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) | 1617 - (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff); 1618 - outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) | 1619 - (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff); 1614 + param = ((u64)kvm_rdx_read(vcpu) << 32) | 1615 + (kvm_rax_read(vcpu) & 0xffffffff); 1616 + ingpa = ((u64)kvm_rbx_read(vcpu) << 32) | 1617 + (kvm_rcx_read(vcpu) & 0xffffffff); 1618 + outgpa = ((u64)kvm_rdi_read(vcpu) << 32) | 1619 + (kvm_rsi_read(vcpu) & 0xffffffff); 1620 1620 } 1621 1621 #ifdef CONFIG_X86_64 1622 1622 else { 1623 - param = kvm_register_read(vcpu, VCPU_REGS_RCX); 1624 - ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX); 1625 - outgpa = kvm_register_read(vcpu, VCPU_REGS_R8); 1623 + param = kvm_rcx_read(vcpu); 1624 + ingpa = kvm_rdx_read(vcpu); 1625 + outgpa = kvm_r8_read(vcpu); 1626 1626 } 1627 1627 #endif 1628 1628

+40 -2

arch/x86/kvm/kvm_cache_regs.h

··· 9 9 (X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \ 10 10 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE) 11 11 12 + #define BUILD_KVM_GPR_ACCESSORS(lname, uname) \ 13 + static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\ 14 + { \ 15 + return vcpu->arch.regs[VCPU_REGS_##uname]; \ 16 + } \ 17 + static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \ 18 + unsigned long val) \ 19 + { \ 20 + vcpu->arch.regs[VCPU_REGS_##uname] = val; \ 21 + } 22 + BUILD_KVM_GPR_ACCESSORS(rax, RAX) 23 + BUILD_KVM_GPR_ACCESSORS(rbx, RBX) 24 + BUILD_KVM_GPR_ACCESSORS(rcx, RCX) 25 + BUILD_KVM_GPR_ACCESSORS(rdx, RDX) 26 + BUILD_KVM_GPR_ACCESSORS(rbp, RBP) 27 + BUILD_KVM_GPR_ACCESSORS(rsi, RSI) 28 + BUILD_KVM_GPR_ACCESSORS(rdi, RDI) 29 + #ifdef CONFIG_X86_64 30 + BUILD_KVM_GPR_ACCESSORS(r8, R8) 31 + BUILD_KVM_GPR_ACCESSORS(r9, R9) 32 + BUILD_KVM_GPR_ACCESSORS(r10, R10) 33 + BUILD_KVM_GPR_ACCESSORS(r11, R11) 34 + BUILD_KVM_GPR_ACCESSORS(r12, R12) 35 + BUILD_KVM_GPR_ACCESSORS(r13, R13) 36 + BUILD_KVM_GPR_ACCESSORS(r14, R14) 37 + BUILD_KVM_GPR_ACCESSORS(r15, R15) 38 + #endif 39 + 12 40 static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, 13 41 enum kvm_reg reg) 14 42 { ··· 63 35 static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val) 64 36 { 65 37 kvm_register_write(vcpu, VCPU_REGS_RIP, val); 38 + } 39 + 40 + static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu) 41 + { 42 + return kvm_register_read(vcpu, VCPU_REGS_RSP); 43 + } 44 + 45 + static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val) 46 + { 47 + kvm_register_write(vcpu, VCPU_REGS_RSP, val); 66 48 } 67 49 68 50 static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index) ··· 121 83 122 84 static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu) 123 85 { 124 - return (kvm_register_read(vcpu, VCPU_REGS_RAX) & -1u) 125 - | ((u64)(kvm_register_read(vcpu, VCPU_REGS_RDX) & -1u) << 32); 86 + return 
(kvm_rax_read(vcpu) & -1u) 87 + | ((u64)(kvm_rdx_read(vcpu) & -1u) << 32); 126 88 } 127 89 128 90 static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
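The BUILD_KVM_GPR_ACCESSORS macro above relies on preprocessor token pasting to stamp out one read/write pair per register. A minimal standalone sketch of the same technique is given below. The sketch_* types and names are hypothetical stand-ins, not the kernel's definitions, and only two registers are generated.

```c
#include <stdint.h>

/* Hypothetical register indices and vCPU state for illustration. */
enum sketch_reg { SKETCH_REGS_RAX, SKETCH_REGS_RDX, SKETCH_NR_REGS };

struct sketch_vcpu { unsigned long regs[SKETCH_NR_REGS]; };

/* lname is pasted into the function name, uname into the enum constant. */
#define SKETCH_BUILD_GPR_ACCESSORS(lname, uname)				\
static inline unsigned long sketch_##lname##_read(struct sketch_vcpu *vcpu)	\
{										\
	return vcpu->regs[SKETCH_REGS_##uname];					\
}										\
static inline void sketch_##lname##_write(struct sketch_vcpu *vcpu,		\
					  unsigned long val)			\
{										\
	vcpu->regs[SKETCH_REGS_##uname] = val;					\
}
SKETCH_BUILD_GPR_ACCESSORS(rax, RAX)
SKETCH_BUILD_GPR_ACCESSORS(rdx, RDX)

/* Combine the low 32 bits of RDX and RAX into one 64-bit value (EDX:EAX). */
static inline uint64_t sketch_read_edx_eax(struct sketch_vcpu *vcpu)
{
	return (sketch_rax_read(vcpu) & 0xffffffffUL) |
	       ((uint64_t)(sketch_rdx_read(vcpu) & 0xffffffffUL) << 32);
}
```

Call sites then read as sketch_rax_write(&v, x) rather than passing a register index, which is the readability gain the conversions in cpuid.c, hyperv.c and svm.c are after.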

+24 -14

arch/x86/kvm/lapic.c

··· 1454 1454 if (swait_active(q)) 1455 1455 swake_up_one(q); 1456 1456 1457 - if (apic_lvtt_tscdeadline(apic)) 1457 + if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use) 1458 1458 ktimer->expired_tscdeadline = ktimer->tscdeadline; 1459 1459 } 1460 1460 ··· 1696 1696 static bool start_hv_timer(struct kvm_lapic *apic) 1697 1697 { 1698 1698 struct kvm_timer *ktimer = &apic->lapic_timer; 1699 - int r; 1699 + struct kvm_vcpu *vcpu = apic->vcpu; 1700 + bool expired; 1700 1701 1701 1702 WARN_ON(preemptible()); 1702 1703 if (!kvm_x86_ops->set_hv_timer) 1703 1704 return false; 1704 1705 1705 - if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending)) 1706 - return false; 1707 - 1708 1706 if (!ktimer->tscdeadline) 1709 1707 return false; 1710 1708 1711 - r = kvm_x86_ops->set_hv_timer(apic->vcpu, ktimer->tscdeadline); 1712 - if (r < 0) 1709 + if (kvm_x86_ops->set_hv_timer(vcpu, ktimer->tscdeadline, &expired)) 1713 1710 return false; 1714 1711 1715 1712 ktimer->hv_timer_in_use = true; 1716 1713 hrtimer_cancel(&ktimer->timer); 1717 1714 1718 1715 /* 1719 - * Also recheck ktimer->pending, in case the sw timer triggered in 1720 - * the window. For periodic timer, leave the hv timer running for 1721 - * simplicity, and the deadline will be recomputed on the next vmexit. 1716 + * To simplify handling the periodic timer, leave the hv timer running 1717 + * even if the deadline timer has expired, i.e. rely on the resulting 1718 + * VM-Exit to recompute the periodic timer's target expiration. 1722 1719 */ 1723 - if (!apic_lvtt_period(apic) && (r || atomic_read(&ktimer->pending))) { 1724 - if (r) 1720 + if (!apic_lvtt_period(apic)) { 1721 + /* 1722 + * Cancel the hv timer if the sw timer fired while the hv timer 1723 + * was being programmed, or if the hv timer itself expired. 
1724 + */ 1725 + if (atomic_read(&ktimer->pending)) { 1726 + cancel_hv_timer(apic); 1727 + } else if (expired) { 1725 1728 apic_timer_expired(apic); 1726 - return false; 1729 + cancel_hv_timer(apic); 1730 + } 1727 1731 } 1728 1732 1729 - trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, true); 1733 + trace_kvm_hv_timer_state(vcpu->vcpu_id, ktimer->hv_timer_in_use); 1734 + 1730 1735 return true; 1731 1736 } 1732 1737 ··· 1755 1750 static void restart_apic_timer(struct kvm_lapic *apic) 1756 1751 { 1757 1752 preempt_disable(); 1753 + 1754 + if (!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending)) 1755 + goto out; 1756 + 1758 1757 if (!start_hv_timer(apic)) 1759 1758 start_sw_timer(apic); 1759 + out: 1760 1760 preempt_enable(); 1761 1761 } 1762 1762

+17 -6

arch/x86/kvm/mmu.c

··· 44 44 #include <asm/page.h> 45 45 #include <asm/pat.h> 46 46 #include <asm/cmpxchg.h> 47 + #include <asm/e820/api.h> 47 48 #include <asm/io.h> 48 49 #include <asm/vmx.h> 49 50 #include <asm/kvm_page_track.h> ··· 488 487 * If the CPU has 46 or less physical address bits, then set an 489 488 * appropriate mask to guard against L1TF attacks. Otherwise, it is 490 489 * assumed that the CPU is not vulnerable to L1TF. 490 + * 491 + * Some Intel CPUs address the L1 cache using more PA bits than are 492 + * reported by CPUID. Use the PA width of the L1 cache when possible 493 + * to achieve more effective mitigation, e.g. if system RAM overlaps 494 + * the most significant bits of legal physical address space. 491 495 */ 492 - low_phys_bits = boot_cpu_data.x86_phys_bits; 493 - if (boot_cpu_data.x86_phys_bits < 496 + shadow_nonpresent_or_rsvd_mask = 0; 497 + low_phys_bits = boot_cpu_data.x86_cache_bits; 498 + if (boot_cpu_data.x86_cache_bits < 494 499 52 - shadow_nonpresent_or_rsvd_mask_len) { 495 500 shadow_nonpresent_or_rsvd_mask = 496 - rsvd_bits(boot_cpu_data.x86_phys_bits - 501 + rsvd_bits(boot_cpu_data.x86_cache_bits - 497 502 shadow_nonpresent_or_rsvd_mask_len, 498 - boot_cpu_data.x86_phys_bits - 1); 503 + boot_cpu_data.x86_cache_bits - 1); 499 504 low_phys_bits -= shadow_nonpresent_or_rsvd_mask_len; 500 - } 505 + } else 506 + WARN_ON_ONCE(boot_cpu_has_bug(X86_BUG_L1TF)); 507 + 501 508 shadow_nonpresent_or_rsvd_lower_gfn_mask = 502 509 GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT); 503 510 } ··· 2901 2892 */ 2902 2893 (!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn)); 2903 2894 2904 - return true; 2895 + return !e820__mapped_raw_any(pfn_to_hpa(pfn), 2896 + pfn_to_hpa(pfn + 1) - 1, 2897 + E820_TYPE_RAM); 2905 2898 } 2906 2899 2907 2900 /* Bits which may be returned by set_spte() */
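The mask computation in the first mmu.c hunk above can be checked with a small standalone sketch. The helper sketch_rsvd_bits is an assumption about what rsvd_bits() does (a mask with bits s through e set), and the widths used in the note below (46 address bits, a 5-bit mask length) are illustrative values, not values taken from this diff.

```c
#include <stdint.h>

/*
 * Assumed semantics of rsvd_bits(): a mask with bits s..e (inclusive)
 * set, empty when e < s. Only valid for ranges narrower than 64 bits.
 */
static uint64_t sketch_rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Mirrors the hunk's condition: if the address width leaves room below
 * bit 52, reserve the top mask_len bits of that width; otherwise the
 * mask stays zero.
 */
static uint64_t sketch_nonpresent_mask(int cache_bits, int mask_len)
{
	if (cache_bits < 52 - mask_len)
		return sketch_rsvd_bits(cache_bits - mask_len, cache_bits - 1);
	return 0;
}
```

For a 46-bit width and a 5-bit mask length this reserves bits 41 through 45; at 47 bits or more the condition fails and no mask is set, which is the case the added WARN_ON_ONCE() covers.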

+1 -9

arch/x86/kvm/mtrr.c

··· 48 48 return false; 49 49 } 50 50 51 - static bool valid_pat_type(unsigned t) 52 - { 53 - return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */ 54 - } 55 - 56 51 static bool valid_mtrr_type(unsigned t) 57 52 { 58 53 return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */ ··· 62 67 return false; 63 68 64 69 if (msr == MSR_IA32_CR_PAT) { 65 - for (i = 0; i < 8; i++) 66 - if (!valid_pat_type((data >> (i * 8)) & 0xff)) 67 - return false; 68 - return true; 70 + return kvm_pat_valid(data); 69 71 } else if (msr == MSR_MTRRdefType) { 70 72 if (data & ~0xcff) 71 73 return false;
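The mtrr.c hunk above replaces an open-coded loop with a call to kvm_pat_valid(), whose body is not part of this hunk. The sketch below simply restates the removed per-field check as a standalone function. It is an illustration of the rule being enforced (each of the eight PAT bytes must be 0, 1, 4, 5, 6 or 7), not the actual kvm_pat_valid() implementation, and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-field check taken from the removed valid_pat_type(): 0, 1, 4, 5, 6, 7. */
static bool sketch_valid_pat_type(unsigned int t)
{
	return t < 8 && ((1u << t) & 0xf3);
}

/* A PAT value is valid when each of its eight byte-wide fields is valid. */
static bool sketch_pat_valid(uint64_t data)
{
	for (int i = 0; i < 8; i++)
		if (!sketch_valid_pat_type((data >> (i * 8)) & 0xff))
			return false;
	return true;
}
```

Types 2 and 3 are rejected, as is any field with a bit set above bit 2.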

+27 -7

arch/x86/kvm/paging_tmpl.h

··· 141 141 struct page *page; 142 142 143 143 npages = get_user_pages_fast((unsigned long)ptep_user, 1, FOLL_WRITE, &page); 144 - /* Check if the user is doing something meaningless. */ 145 - if (unlikely(npages != 1)) 146 - return -EFAULT; 144 + if (likely(npages == 1)) { 145 + table = kmap_atomic(page); 146 + ret = CMPXCHG(&table[index], orig_pte, new_pte); 147 + kunmap_atomic(table); 147 148 148 - table = kmap_atomic(page); 149 - ret = CMPXCHG(&table[index], orig_pte, new_pte); 150 - kunmap_atomic(table); 149 + kvm_release_page_dirty(page); 150 + } else { 151 + struct vm_area_struct *vma; 152 + unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK; 153 + unsigned long pfn; 154 + unsigned long paddr; 151 155 152 - kvm_release_page_dirty(page); 156 + down_read(&current->mm->mmap_sem); 157 + vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE); 158 + if (!vma || !(vma->vm_flags & VM_PFNMAP)) { 159 + up_read(&current->mm->mmap_sem); 160 + return -EFAULT; 161 + } 162 + pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; 163 + paddr = pfn << PAGE_SHIFT; 164 + table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB); 165 + if (!table) { 166 + up_read(&current->mm->mmap_sem); 167 + return -EFAULT; 168 + } 169 + ret = CMPXCHG(&table[index], orig_pte, new_pte); 170 + memunmap(table); 171 + up_read(&current->mm->mmap_sem); 172 + } 153 173 154 174 return (ret != orig_pte); 155 175 }

+61 -67

arch/x86/kvm/svm.c

··· 2091 2091 init_vmcb(svm); 2092 2092 2093 2093 kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy, true); 2094 - kvm_register_write(vcpu, VCPU_REGS_RDX, eax); 2094 + kvm_rdx_write(vcpu, eax); 2095 2095 2096 2096 if (kvm_vcpu_apicv_active(vcpu) && !init_event) 2097 2097 avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE); ··· 3071 3071 return false; 3072 3072 } 3073 3073 3074 - static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, struct page **_page) 3075 - { 3076 - struct page *page; 3077 - 3078 - might_sleep(); 3079 - 3080 - page = kvm_vcpu_gfn_to_page(&svm->vcpu, gpa >> PAGE_SHIFT); 3081 - if (is_error_page(page)) 3082 - goto error; 3083 - 3084 - *_page = page; 3085 - 3086 - return kmap(page); 3087 - 3088 - error: 3089 - kvm_inject_gp(&svm->vcpu, 0); 3090 - 3091 - return NULL; 3092 - } 3093 - 3094 - static void nested_svm_unmap(struct page *page) 3095 - { 3096 - kunmap(page); 3097 - kvm_release_page_dirty(page); 3098 - } 3099 - 3100 3074 static int nested_svm_intercept_ioio(struct vcpu_svm *svm) 3101 3075 { 3102 3076 unsigned port, size, iopm_len; ··· 3273 3299 3274 3300 static int nested_svm_vmexit(struct vcpu_svm *svm) 3275 3301 { 3302 + int rc; 3276 3303 struct vmcb *nested_vmcb; 3277 3304 struct vmcb *hsave = svm->nested.hsave; 3278 3305 struct vmcb *vmcb = svm->vmcb; 3279 - struct page *page; 3306 + struct kvm_host_map map; 3280 3307 3281 3308 trace_kvm_nested_vmexit_inject(vmcb->control.exit_code, 3282 3309 vmcb->control.exit_info_1, ··· 3286 3311 vmcb->control.exit_int_info_err, 3287 3312 KVM_ISA_SVM); 3288 3313 3289 - nested_vmcb = nested_svm_map(svm, svm->nested.vmcb, &page); 3290 - if (!nested_vmcb) 3314 + rc = kvm_vcpu_map(&svm->vcpu, gfn_to_gpa(svm->nested.vmcb), &map); 3315 + if (rc) { 3316 + if (rc == -EINVAL) 3317 + kvm_inject_gp(&svm->vcpu, 0); 3291 3318 return 1; 3319 + } 3320 + 3321 + nested_vmcb = map.hva; 3292 3322 3293 3323 /* Exit Guest-Mode */ 3294 3324 leave_guest_mode(&svm->vcpu); ··· 3388 3408 } else { 3389 3409 
(void)kvm_set_cr3(&svm->vcpu, hsave->save.cr3); 3390 3410 } 3391 - kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax); 3392 - kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp); 3393 - kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip); 3411 + kvm_rax_write(&svm->vcpu, hsave->save.rax); 3412 + kvm_rsp_write(&svm->vcpu, hsave->save.rsp); 3413 + kvm_rip_write(&svm->vcpu, hsave->save.rip); 3394 3414 svm->vmcb->save.dr7 = 0; 3395 3415 svm->vmcb->save.cpl = 0; 3396 3416 svm->vmcb->control.exit_int_info = 0; 3397 3417 3398 3418 mark_all_dirty(svm->vmcb); 3399 3419 3400 - nested_svm_unmap(page); 3420 + kvm_vcpu_unmap(&svm->vcpu, &map, true); 3401 3421 3402 3422 nested_svm_uninit_mmu_context(&svm->vcpu); 3403 3423 kvm_mmu_reset_context(&svm->vcpu); ··· 3463 3483 } 3464 3484 3465 3485 static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa, 3466 - struct vmcb *nested_vmcb, struct page *page) 3486 + struct vmcb *nested_vmcb, struct kvm_host_map *map) 3467 3487 { 3468 3488 if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF) 3469 3489 svm->vcpu.arch.hflags |= HF_HIF_MASK; ··· 3496 3516 kvm_mmu_reset_context(&svm->vcpu); 3497 3517 3498 3518 svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = nested_vmcb->save.cr2; 3499 - kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax); 3500 - kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp); 3501 - kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip); 3519 + kvm_rax_write(&svm->vcpu, nested_vmcb->save.rax); 3520 + kvm_rsp_write(&svm->vcpu, nested_vmcb->save.rsp); 3521 + kvm_rip_write(&svm->vcpu, nested_vmcb->save.rip); 3502 3522 3503 3523 /* In case we don't even reach vcpu_run, the fields are not updated */ 3504 3524 svm->vmcb->save.rax = nested_vmcb->save.rax; ··· 3547 3567 svm->vmcb->control.pause_filter_thresh = 3548 3568 nested_vmcb->control.pause_filter_thresh; 3549 3569 3550 - nested_svm_unmap(page); 3570 + kvm_vcpu_unmap(&svm->vcpu, map, true); 
3551 3571 3552 3572 /* Enter Guest-Mode */ 3553 3573 enter_guest_mode(&svm->vcpu); ··· 3567 3587 3568 3588 static bool nested_svm_vmrun(struct vcpu_svm *svm) 3569 3589 { 3590 + int rc; 3570 3591 struct vmcb *nested_vmcb; 3571 3592 struct vmcb *hsave = svm->nested.hsave; 3572 3593 struct vmcb *vmcb = svm->vmcb; 3573 - struct page *page; 3594 + struct kvm_host_map map; 3574 3595 u64 vmcb_gpa; 3575 3596 3576 3597 vmcb_gpa = svm->vmcb->save.rax; 3577 3598 3578 - nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page); 3579 - if (!nested_vmcb) 3599 + rc = kvm_vcpu_map(&svm->vcpu, gfn_to_gpa(vmcb_gpa), &map); 3600 + if (rc) { 3601 + if (rc == -EINVAL) 3602 + kvm_inject_gp(&svm->vcpu, 0); 3580 3603 return false; 3604 + } 3605 + 3606 + nested_vmcb = map.hva; 3581 3607 3582 3608 if (!nested_vmcb_checks(nested_vmcb)) { 3583 3609 nested_vmcb->control.exit_code = SVM_EXIT_ERR; ··· 3591 3605 nested_vmcb->control.exit_info_1 = 0; 3592 3606 nested_vmcb->control.exit_info_2 = 0; 3593 3607 3594 - nested_svm_unmap(page); 3608 + kvm_vcpu_unmap(&svm->vcpu, &map, true); 3595 3609 3596 3610 return false; 3597 3611 } ··· 3635 3649 3636 3650 copy_vmcb_control_area(hsave, vmcb); 3637 3651 3638 - enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, page); 3652 + enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, &map); 3639 3653 3640 3654 return true; 3641 3655 } ··· 3659 3673 static int vmload_interception(struct vcpu_svm *svm) 3660 3674 { 3661 3675 struct vmcb *nested_vmcb; 3662 - struct page *page; 3676 + struct kvm_host_map map; 3663 3677 int ret; 3664 3678 3665 3679 if (nested_svm_check_permissions(svm)) 3666 3680 return 1; 3667 3681 3668 - nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page); 3669 - if (!nested_vmcb) 3682 + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map); 3683 + if (ret) { 3684 + if (ret == -EINVAL) 3685 + kvm_inject_gp(&svm->vcpu, 0); 3670 3686 return 1; 3687 + } 3688 + 3689 + nested_vmcb = map.hva; 3671 3690 3672 3691 svm->next_rip = 
kvm_rip_read(&svm->vcpu) + 3; 3673 3692 ret = kvm_skip_emulated_instruction(&svm->vcpu); 3674 3693 3675 3694 nested_svm_vmloadsave(nested_vmcb, svm->vmcb); 3676 - nested_svm_unmap(page); 3695 + kvm_vcpu_unmap(&svm->vcpu, &map, true); 3677 3696 3678 3697 return ret; 3679 3698 } ··· 3686 3695 static int vmsave_interception(struct vcpu_svm *svm) 3687 3696 { 3688 3697 struct vmcb *nested_vmcb; 3689 - struct page *page; 3698 + struct kvm_host_map map; 3690 3699 int ret; 3691 3700 3692 3701 if (nested_svm_check_permissions(svm)) 3693 3702 return 1; 3694 3703 3695 - nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page); 3696 - if (!nested_vmcb) 3704 + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map); 3705 + if (ret) { 3706 + if (ret == -EINVAL) 3707 + kvm_inject_gp(&svm->vcpu, 0); 3697 3708 return 1; 3709 + } 3710 + 3711 + nested_vmcb = map.hva; 3698 3712 3699 3713 svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; 3700 3714 ret = kvm_skip_emulated_instruction(&svm->vcpu); 3701 3715 3702 3716 nested_svm_vmloadsave(svm->vmcb, nested_vmcb); 3703 - nested_svm_unmap(page); 3717 + kvm_vcpu_unmap(&svm->vcpu, &map, true); 3704 3718 3705 3719 return ret; 3706 3720 } ··· 3787 3791 { 3788 3792 struct kvm_vcpu *vcpu = &svm->vcpu; 3789 3793 3790 - trace_kvm_invlpga(svm->vmcb->save.rip, kvm_register_read(&svm->vcpu, VCPU_REGS_RCX), 3791 - kvm_register_read(&svm->vcpu, VCPU_REGS_RAX)); 3794 + trace_kvm_invlpga(svm->vmcb->save.rip, kvm_rcx_read(&svm->vcpu), 3795 + kvm_rax_read(&svm->vcpu)); 3792 3796 3793 3797 /* Let's treat INVLPGA the same as INVLPG (can be optimized!) 
*/ 3794 - kvm_mmu_invlpg(vcpu, kvm_register_read(&svm->vcpu, VCPU_REGS_RAX)); 3798 + kvm_mmu_invlpg(vcpu, kvm_rax_read(&svm->vcpu)); 3795 3799 3796 3800 svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; 3797 3801 return kvm_skip_emulated_instruction(&svm->vcpu); ··· 3799 3803 3800 3804 static int skinit_interception(struct vcpu_svm *svm) 3801 3805 { 3802 - trace_kvm_skinit(svm->vmcb->save.rip, kvm_register_read(&svm->vcpu, VCPU_REGS_RAX)); 3806 + trace_kvm_skinit(svm->vmcb->save.rip, kvm_rax_read(&svm->vcpu)); 3803 3807 3804 3808 kvm_queue_exception(&svm->vcpu, UD_VECTOR); 3805 3809 return 1; ··· 3813 3817 static int xsetbv_interception(struct vcpu_svm *svm) 3814 3818 { 3815 3819 u64 new_bv = kvm_read_edx_eax(&svm->vcpu); 3816 - u32 index = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX); 3820 + u32 index = kvm_rcx_read(&svm->vcpu); 3817 3821 3818 3822 if (kvm_set_xcr(&svm->vcpu, index, new_bv) == 0) { 3819 3823 svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; ··· 4209 4213 4210 4214 static int rdmsr_interception(struct vcpu_svm *svm) 4211 4215 { 4212 - u32 ecx = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX); 4216 + u32 ecx = kvm_rcx_read(&svm->vcpu); 4213 4217 struct msr_data msr_info; 4214 4218 4215 4219 msr_info.index = ecx; ··· 4221 4225 } else { 4222 4226 trace_kvm_msr_read(ecx, msr_info.data); 4223 4227 4224 - kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, 4225 - msr_info.data & 0xffffffff); 4226 - kvm_register_write(&svm->vcpu, VCPU_REGS_RDX, 4227 - msr_info.data >> 32); 4228 + kvm_rax_write(&svm->vcpu, msr_info.data & 0xffffffff); 4229 + kvm_rdx_write(&svm->vcpu, msr_info.data >> 32); 4228 4230 svm->next_rip = kvm_rip_read(&svm->vcpu) + 2; 4229 4231 return kvm_skip_emulated_instruction(&svm->vcpu); 4230 4232 } ··· 4416 4422 static int wrmsr_interception(struct vcpu_svm *svm) 4417 4423 { 4418 4424 struct msr_data msr; 4419 - u32 ecx = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX); 4425 + u32 ecx = kvm_rcx_read(&svm->vcpu); 4420 4426 u64 data = 
kvm_read_edx_eax(&svm->vcpu); 4421 4427 4422 4428 msr.data = data; ··· 6230 6236 { 6231 6237 struct vcpu_svm *svm = to_svm(vcpu); 6232 6238 struct vmcb *nested_vmcb; 6233 - struct page *page; 6239 + struct kvm_host_map map; 6234 6240 u64 guest; 6235 6241 u64 vmcb; 6236 6242 ··· 6238 6244 vmcb = GET_SMSTATE(u64, smstate, 0x7ee0); 6239 6245 6240 6246 if (guest) { 6241 - nested_vmcb = nested_svm_map(svm, vmcb, &page); 6242 - if (!nested_vmcb) 6247 + if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb), &map) == -EINVAL) 6243 6248 return 1; 6244 - enter_svm_guest_mode(svm, vmcb, nested_vmcb, page); 6249 + nested_vmcb = map.hva; 6250 + enter_svm_guest_mode(svm, vmcb, nested_vmcb, &map); 6245 6251 } 6246 6252 return 0; 6247 6253 }

+2

arch/x86/kvm/vmx/capabilities.h

···
 #ifndef __KVM_X86_VMX_CAPS_H
 #define __KVM_X86_VMX_CAPS_H

+#include <asm/vmx.h>
+
 #include "lapic.h"

 extern bool __read_mostly enable_vpid;

+164 -198

arch/x86/kvm/vmx/nested.c

··· 193 193 if (!vmx->nested.hv_evmcs) 194 194 return; 195 195 196 - kunmap(vmx->nested.hv_evmcs_page); 197 - kvm_release_page_dirty(vmx->nested.hv_evmcs_page); 196 + kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true); 198 197 vmx->nested.hv_evmcs_vmptr = -1ull; 199 - vmx->nested.hv_evmcs_page = NULL; 200 198 vmx->nested.hv_evmcs = NULL; 201 199 } 202 200 ··· 227 229 kvm_release_page_dirty(vmx->nested.apic_access_page); 228 230 vmx->nested.apic_access_page = NULL; 229 231 } 230 - if (vmx->nested.virtual_apic_page) { 231 - kvm_release_page_dirty(vmx->nested.virtual_apic_page); 232 - vmx->nested.virtual_apic_page = NULL; 233 - } 234 - if (vmx->nested.pi_desc_page) { 235 - kunmap(vmx->nested.pi_desc_page); 236 - kvm_release_page_dirty(vmx->nested.pi_desc_page); 237 - vmx->nested.pi_desc_page = NULL; 238 - vmx->nested.pi_desc = NULL; 239 - } 232 + kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true); 233 + kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); 234 + vmx->nested.pi_desc = NULL; 240 235 241 236 kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); 242 237 ··· 510 519 struct vmcs12 *vmcs12) 511 520 { 512 521 int msr; 513 - struct page *page; 514 522 unsigned long *msr_bitmap_l1; 515 523 unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; 516 - /* 517 - * pred_cmd & spec_ctrl are trying to verify two things: 518 - * 519 - * 1. L0 gave a permission to L1 to actually passthrough the MSR. This 520 - * ensures that we do not accidentally generate an L02 MSR bitmap 521 - * from the L12 MSR bitmap that is too permissive. 522 - * 2. That L1 or L2s have actually used the MSR. This avoids 523 - * unnecessarily merging of the bitmap if the MSR is unused. This 524 - * works properly because we only update the L01 MSR bitmap lazily. 525 - * So even if L0 should pass L1 these MSRs, the L01 bitmap is only 526 - * updated to reflect this when L1 (or its L2s) actually write to 527 - * the MSR. 
528 - */ 529 - bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); 530 - bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL); 524 + struct kvm_host_map *map = &to_vmx(vcpu)->nested.msr_bitmap_map; 531 525 532 526 /* Nothing to do if the MSR bitmap is not in use. */ 533 527 if (!cpu_has_vmx_msr_bitmap() || 534 528 !nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) 535 529 return false; 536 530 537 - if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && 538 - !pred_cmd && !spec_ctrl) 531 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), map)) 539 532 return false; 540 533 541 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap); 542 - if (is_error_page(page)) 543 - return false; 544 - 545 - msr_bitmap_l1 = (unsigned long *)kmap(page); 534 + msr_bitmap_l1 = (unsigned long *)map->hva; 546 535 547 536 /* 548 537 * To keep the control flow simple, pay eight 8-byte writes (sixteen ··· 563 592 } 564 593 } 565 594 566 - if (spec_ctrl) 595 + /* KVM unconditionally exposes the FS/GS base MSRs to L1. */ 596 + nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0, 597 + MSR_FS_BASE, MSR_TYPE_RW); 598 + 599 + nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0, 600 + MSR_GS_BASE, MSR_TYPE_RW); 601 + 602 + nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0, 603 + MSR_KERNEL_GS_BASE, MSR_TYPE_RW); 604 + 605 + /* 606 + * Checking the L0->L1 bitmap is trying to verify two things: 607 + * 608 + * 1. L0 gave a permission to L1 to actually passthrough the MSR. This 609 + * ensures that we do not accidentally generate an L02 MSR bitmap 610 + * from the L12 MSR bitmap that is too permissive. 611 + * 2. That L1 or L2s have actually used the MSR. This avoids 612 + * unnecessarily merging of the bitmap if the MSR is unused. This 613 + * works properly because we only update the L01 MSR bitmap lazily. 
614 + * So even if L0 should pass L1 these MSRs, the L01 bitmap is only 615 + * updated to reflect this when L1 (or its L2s) actually write to 616 + * the MSR. 617 + */ 618 + if (!msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL)) 567 619 nested_vmx_disable_intercept_for_msr( 568 620 msr_bitmap_l1, msr_bitmap_l0, 569 621 MSR_IA32_SPEC_CTRL, 570 622 MSR_TYPE_R | MSR_TYPE_W); 571 623 572 - if (pred_cmd) 624 + if (!msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD)) 573 625 nested_vmx_disable_intercept_for_msr( 574 626 msr_bitmap_l1, msr_bitmap_l0, 575 627 MSR_IA32_PRED_CMD, 576 628 MSR_TYPE_W); 577 629 578 - kunmap(page); 579 - kvm_release_page_clean(page); 630 + kvm_vcpu_unmap(vcpu, &to_vmx(vcpu)->nested.msr_bitmap_map, false); 580 631 581 632 return true; 582 633 } ··· 606 613 static void nested_cache_shadow_vmcs12(struct kvm_vcpu *vcpu, 607 614 struct vmcs12 *vmcs12) 608 615 { 616 + struct kvm_host_map map; 609 617 struct vmcs12 *shadow; 610 - struct page *page; 611 618 612 619 if (!nested_cpu_has_shadow_vmcs(vmcs12) || 613 620 vmcs12->vmcs_link_pointer == -1ull) 614 621 return; 615 622 616 623 shadow = get_shadow_vmcs12(vcpu); 617 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->vmcs_link_pointer); 618 624 619 - memcpy(shadow, kmap(page), VMCS12_SIZE); 625 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->vmcs_link_pointer), &map)) 626 + return; 620 627 621 - kunmap(page); 622 - kvm_release_page_clean(page); 628 + memcpy(shadow, map.hva, VMCS12_SIZE); 629 + kvm_vcpu_unmap(vcpu, &map, false); 623 630 } 624 631 625 632 static void nested_flush_cached_shadow_vmcs12(struct kvm_vcpu *vcpu, ··· 923 930 if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) { 924 931 if (!nested_cr3_valid(vcpu, cr3)) { 925 932 *entry_failure_code = ENTRY_FAIL_DEFAULT; 926 - return 1; 933 + return -EINVAL; 927 934 } 928 935 929 936 /* ··· 934 941 !nested_ept) { 935 942 if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) { 936 943 *entry_failure_code = ENTRY_FAIL_PDPTE; 937 - return 
1; 944 + return -EINVAL; 938 945 } 939 946 } 940 947 } ··· 1787 1794 1788 1795 nested_release_evmcs(vcpu); 1789 1796 1790 - vmx->nested.hv_evmcs_page = kvm_vcpu_gpa_to_page( 1791 - vcpu, assist_page.current_nested_vmcs); 1792 - 1793 - if (unlikely(is_error_page(vmx->nested.hv_evmcs_page))) 1797 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs), 1798 + &vmx->nested.hv_evmcs_map)) 1794 1799 return 0; 1795 1800 1796 - vmx->nested.hv_evmcs = kmap(vmx->nested.hv_evmcs_page); 1801 + vmx->nested.hv_evmcs = vmx->nested.hv_evmcs_map.hva; 1797 1802 1798 1803 /* 1799 1804 * Currently, KVM only supports eVMCS version 1 ··· 2364 2373 */ 2365 2374 if (vmx->emulation_required) { 2366 2375 *entry_failure_code = ENTRY_FAIL_DEFAULT; 2367 - return 1; 2376 + return -EINVAL; 2368 2377 } 2369 2378 2370 2379 /* Shadow page tables on either EPT or shadow page tables. */ 2371 2380 if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12), 2372 2381 entry_failure_code)) 2373 - return 1; 2382 + return -EINVAL; 2374 2383 2375 2384 if (!enable_ept) 2376 2385 vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested; 2377 2386 2378 - kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp); 2379 - kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip); 2387 + kvm_rsp_write(vcpu, vmcs12->guest_rsp); 2388 + kvm_rip_write(vcpu, vmcs12->guest_rip); 2380 2389 return 0; 2381 2390 } 2382 2391 ··· 2580 2589 return 0; 2581 2590 } 2582 2591 2583 - /* 2584 - * Checks related to Host Control Registers and MSRs 2585 - */ 2586 - static int nested_check_host_control_regs(struct kvm_vcpu *vcpu, 2587 - struct vmcs12 *vmcs12) 2592 + static int nested_vmx_check_controls(struct kvm_vcpu *vcpu, 2593 + struct vmcs12 *vmcs12) 2594 + { 2595 + if (nested_check_vm_execution_controls(vcpu, vmcs12) || 2596 + nested_check_vm_exit_controls(vcpu, vmcs12) || 2597 + nested_check_vm_entry_controls(vcpu, vmcs12)) 2598 + return -EINVAL; 2599 + 2600 + return 0; 2601 + } 2602 + 
2603 + static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu, 2604 + struct vmcs12 *vmcs12) 2588 2605 { 2589 2606 bool ia32e; 2590 2607 ··· 2603 2604 2604 2605 if (is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu) || 2605 2606 is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)) 2607 + return -EINVAL; 2608 + 2609 + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) && 2610 + !kvm_pat_valid(vmcs12->host_ia32_pat)) 2606 2611 return -EINVAL; 2607 2612 2608 2613 /* ··· 2627 2624 return 0; 2628 2625 } 2629 2626 2627 + static int nested_vmx_check_vmcs_link_ptr(struct kvm_vcpu *vcpu, 2628 + struct vmcs12 *vmcs12) 2629 + { 2630 + int r = 0; 2631 + struct vmcs12 *shadow; 2632 + struct kvm_host_map map; 2633 + 2634 + if (vmcs12->vmcs_link_pointer == -1ull) 2635 + return 0; 2636 + 2637 + if (!page_address_valid(vcpu, vmcs12->vmcs_link_pointer)) 2638 + return -EINVAL; 2639 + 2640 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->vmcs_link_pointer), &map)) 2641 + return -EINVAL; 2642 + 2643 + shadow = map.hva; 2644 + 2645 + if (shadow->hdr.revision_id != VMCS12_REVISION || 2646 + shadow->hdr.shadow_vmcs != nested_cpu_has_shadow_vmcs(vmcs12)) 2647 + r = -EINVAL; 2648 + 2649 + kvm_vcpu_unmap(vcpu, &map, false); 2650 + return r; 2651 + } 2652 + 2630 2653 /* 2631 2654 * Checks related to Guest Non-register State 2632 2655 */ ··· 2665 2636 return 0; 2666 2637 } 2667 2638 2668 - static int nested_vmx_check_vmentry_prereqs(struct kvm_vcpu *vcpu, 2669 - struct vmcs12 *vmcs12) 2670 - { 2671 - if (nested_check_vm_execution_controls(vcpu, vmcs12) || 2672 - nested_check_vm_exit_controls(vcpu, vmcs12) || 2673 - nested_check_vm_entry_controls(vcpu, vmcs12)) 2674 - return VMXERR_ENTRY_INVALID_CONTROL_FIELD; 2675 - 2676 - if (nested_check_host_control_regs(vcpu, vmcs12)) 2677 - return VMXERR_ENTRY_INVALID_HOST_STATE_FIELD; 2678 - 2679 - if (nested_check_guest_non_reg_state(vmcs12)) 2680 - return VMXERR_ENTRY_INVALID_CONTROL_FIELD; 2681 - 2682 - return 0; 2683 - } 
2684 - 2685 - static int nested_vmx_check_vmcs_link_ptr(struct kvm_vcpu *vcpu, 2686 - struct vmcs12 *vmcs12) 2687 - { 2688 - int r; 2689 - struct page *page; 2690 - struct vmcs12 *shadow; 2691 - 2692 - if (vmcs12->vmcs_link_pointer == -1ull) 2693 - return 0; 2694 - 2695 - if (!page_address_valid(vcpu, vmcs12->vmcs_link_pointer)) 2696 - return -EINVAL; 2697 - 2698 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->vmcs_link_pointer); 2699 - if (is_error_page(page)) 2700 - return -EINVAL; 2701 - 2702 - r = 0; 2703 - shadow = kmap(page); 2704 - if (shadow->hdr.revision_id != VMCS12_REVISION || 2705 - shadow->hdr.shadow_vmcs != nested_cpu_has_shadow_vmcs(vmcs12)) 2706 - r = -EINVAL; 2707 - kunmap(page); 2708 - kvm_release_page_clean(page); 2709 - return r; 2710 - } 2711 - 2712 - static int nested_vmx_check_vmentry_postreqs(struct kvm_vcpu *vcpu, 2713 - struct vmcs12 *vmcs12, 2714 - u32 *exit_qual) 2639 + static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, 2640 + struct vmcs12 *vmcs12, 2641 + u32 *exit_qual) 2715 2642 { 2716 2643 bool ia32e; 2717 2644 ··· 2675 2690 2676 2691 if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) || 2677 2692 !nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)) 2678 - return 1; 2693 + return -EINVAL; 2694 + 2695 + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) && 2696 + !kvm_pat_valid(vmcs12->guest_ia32_pat)) 2697 + return -EINVAL; 2679 2698 2680 2699 if (nested_vmx_check_vmcs_link_ptr(vcpu, vmcs12)) { 2681 2700 *exit_qual = ENTRY_FAIL_VMCS_LINK_PTR; 2682 - return 1; 2701 + return -EINVAL; 2683 2702 } 2684 2703 2685 2704 /* ··· 2702 2713 ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) || 2703 2714 ((vmcs12->guest_cr0 & X86_CR0_PG) && 2704 2715 ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME))) 2705 - return 1; 2716 + return -EINVAL; 2706 2717 } 2707 2718 2708 2719 if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS) && 2709 - (is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) || 2710 - (vmcs12->guest_bndcfgs & 
MSR_IA32_BNDCFGS_RSVD))) 2711 - return 1; 2720 + (is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) || 2721 + (vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))) 2722 + return -EINVAL; 2723 + 2724 + if (nested_check_guest_non_reg_state(vmcs12)) 2725 + return -EINVAL; 2712 2726 2713 2727 return 0; 2714 2728 } ··· 2824 2832 { 2825 2833 struct vmcs12 *vmcs12 = get_vmcs12(vcpu); 2826 2834 struct vcpu_vmx *vmx = to_vmx(vcpu); 2835 + struct kvm_host_map *map; 2827 2836 struct page *page; 2828 2837 u64 hpa; 2829 2838 ··· 2857 2864 } 2858 2865 2859 2866 if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) { 2860 - if (vmx->nested.virtual_apic_page) { /* shouldn't happen */ 2861 - kvm_release_page_dirty(vmx->nested.virtual_apic_page); 2862 - vmx->nested.virtual_apic_page = NULL; 2863 - } 2864 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->virtual_apic_page_addr); 2867 + map = &vmx->nested.virtual_apic_map; 2865 2868 2866 2869 /* 2867 2870 * If translation failed, VM entry will fail because 2868 2871 * prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull. 
2869 2872 */ 2870 - if (!is_error_page(page)) { 2871 - vmx->nested.virtual_apic_page = page; 2872 - hpa = page_to_phys(vmx->nested.virtual_apic_page); 2873 - vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa); 2873 + if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->virtual_apic_page_addr), map)) { 2874 + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, pfn_to_hpa(map->pfn)); 2874 2875 } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) && 2875 2876 nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) && 2876 2877 !nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { ··· 2885 2898 } 2886 2899 2887 2900 if (nested_cpu_has_posted_intr(vmcs12)) { 2888 - if (vmx->nested.pi_desc_page) { /* shouldn't happen */ 2889 - kunmap(vmx->nested.pi_desc_page); 2890 - kvm_release_page_dirty(vmx->nested.pi_desc_page); 2891 - vmx->nested.pi_desc_page = NULL; 2892 - vmx->nested.pi_desc = NULL; 2893 - vmcs_write64(POSTED_INTR_DESC_ADDR, -1ull); 2901 + map = &vmx->nested.pi_desc_map; 2902 + 2903 + if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->posted_intr_desc_addr), map)) { 2904 + vmx->nested.pi_desc = 2905 + (struct pi_desc *)(((void *)map->hva) + 2906 + offset_in_page(vmcs12->posted_intr_desc_addr)); 2907 + vmcs_write64(POSTED_INTR_DESC_ADDR, 2908 + pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr)); 2894 2909 } 2895 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->posted_intr_desc_addr); 2896 - if (is_error_page(page)) 2897 - return; 2898 - vmx->nested.pi_desc_page = page; 2899 - vmx->nested.pi_desc = kmap(vmx->nested.pi_desc_page); 2900 - vmx->nested.pi_desc = 2901 - (struct pi_desc *)((void *)vmx->nested.pi_desc + 2902 - (unsigned long)(vmcs12->posted_intr_desc_addr & 2903 - (PAGE_SIZE - 1))); 2904 - vmcs_write64(POSTED_INTR_DESC_ADDR, 2905 - page_to_phys(vmx->nested.pi_desc_page) + 2906 - (unsigned long)(vmcs12->posted_intr_desc_addr & 2907 - (PAGE_SIZE - 1))); 2908 2910 } 2909 2911 if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12)) 2910 2912 
vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL, ··· 2976 3000 return -1; 2977 3001 } 2978 3002 2979 - if (nested_vmx_check_vmentry_postreqs(vcpu, vmcs12, &exit_qual)) 3003 + if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual)) 2980 3004 goto vmentry_fail_vmexit; 2981 3005 } 2982 3006 ··· 3121 3145 launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS 3122 3146 : VMXERR_VMRESUME_NONLAUNCHED_VMCS); 3123 3147 3124 - ret = nested_vmx_check_vmentry_prereqs(vcpu, vmcs12); 3125 - if (ret) 3126 - return nested_vmx_failValid(vcpu, ret); 3148 + if (nested_vmx_check_controls(vcpu, vmcs12)) 3149 + return nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); 3150 + 3151 + if (nested_vmx_check_host_state(vcpu, vmcs12)) 3152 + return nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_HOST_STATE_FIELD); 3127 3153 3128 3154 /* 3129 3155 * We're finally done with prerequisite checking, and can start with ··· 3288 3310 3289 3311 max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256); 3290 3312 if (max_irr != 256) { 3291 - vapic_page = kmap(vmx->nested.virtual_apic_page); 3313 + vapic_page = vmx->nested.virtual_apic_map.hva; 3314 + if (!vapic_page) 3315 + return; 3316 + 3292 3317 __kvm_apic_update_irr(vmx->nested.pi_desc->pir, 3293 3318 vapic_page, &max_irr); 3294 - kunmap(vmx->nested.virtual_apic_page); 3295 - 3296 3319 status = vmcs_read16(GUEST_INTR_STATUS); 3297 3320 if ((u8)max_irr > ((u8)status & 0xff)) { 3298 3321 status &= ~0xff; ··· 3404 3425 vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12); 3405 3426 vmcs12->guest_cr4 = vmcs12_guest_cr4(vcpu, vmcs12); 3406 3427 3407 - vmcs12->guest_rsp = kvm_register_read(vcpu, VCPU_REGS_RSP); 3408 - vmcs12->guest_rip = kvm_register_read(vcpu, VCPU_REGS_RIP); 3428 + vmcs12->guest_rsp = kvm_rsp_read(vcpu); 3429 + vmcs12->guest_rip = kvm_rip_read(vcpu); 3409 3430 vmcs12->guest_rflags = vmcs_readl(GUEST_RFLAGS); 3410 3431 3411 3432 vmcs12->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR); ··· 3588 3609 vcpu->arch.efer &= ~(EFER_LMA 
| EFER_LME); 3589 3610 vmx_set_efer(vcpu, vcpu->arch.efer); 3590 3611 3591 - kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp); 3592 - kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip); 3612 + kvm_rsp_write(vcpu, vmcs12->host_rsp); 3613 + kvm_rip_write(vcpu, vmcs12->host_rip); 3593 3614 vmx_set_rflags(vcpu, X86_EFLAGS_FIXED); 3594 3615 vmx_set_interrupt_shadow(vcpu, 0); 3595 3616 ··· 3934 3955 kvm_release_page_dirty(vmx->nested.apic_access_page); 3935 3956 vmx->nested.apic_access_page = NULL; 3936 3957 } 3937 - if (vmx->nested.virtual_apic_page) { 3938 - kvm_release_page_dirty(vmx->nested.virtual_apic_page); 3939 - vmx->nested.virtual_apic_page = NULL; 3940 - } 3941 - if (vmx->nested.pi_desc_page) { 3942 - kunmap(vmx->nested.pi_desc_page); 3943 - kvm_release_page_dirty(vmx->nested.pi_desc_page); 3944 - vmx->nested.pi_desc_page = NULL; 3945 - vmx->nested.pi_desc = NULL; 3946 - } 3958 + kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true); 3959 + kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); 3960 + vmx->nested.pi_desc = NULL; 3947 3961 3948 3962 /* 3949 3963 * We are now running in L2, mmu_notifier will force to reload the ··· 4232 4260 { 4233 4261 int ret; 4234 4262 gpa_t vmptr; 4235 - struct page *page; 4263 + uint32_t revision; 4236 4264 struct vcpu_vmx *vmx = to_vmx(vcpu); 4237 4265 const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED 4238 4266 | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX; ··· 4278 4306 * Note - IA32_VMX_BASIC[48] will never be 1 for the nested case; 4279 4307 * which replaces physical address width with 32 4280 4308 */ 4281 - if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu))) 4309 + if (!page_address_valid(vcpu, vmptr)) 4282 4310 return nested_vmx_failInvalid(vcpu); 4283 4311 4284 - page = kvm_vcpu_gpa_to_page(vcpu, vmptr); 4285 - if (is_error_page(page)) 4312 + if (kvm_read_guest(vcpu->kvm, vmptr, &revision, sizeof(revision)) || 4313 + revision != VMCS12_REVISION) 4286 4314 return 
nested_vmx_failInvalid(vcpu); 4287 - 4288 - if (*(u32 *)kmap(page) != VMCS12_REVISION) { 4289 - kunmap(page); 4290 - kvm_release_page_clean(page); 4291 - return nested_vmx_failInvalid(vcpu); 4292 - } 4293 - kunmap(page); 4294 - kvm_release_page_clean(page); 4295 4315 4296 4316 vmx->nested.vmxon_ptr = vmptr; 4297 4317 ret = enter_vmx_operation(vcpu); ··· 4341 4377 if (nested_vmx_get_vmptr(vcpu, &vmptr)) 4342 4378 return 1; 4343 4379 4344 - if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu))) 4380 + if (!page_address_valid(vcpu, vmptr)) 4345 4381 return nested_vmx_failValid(vcpu, 4346 4382 VMXERR_VMCLEAR_INVALID_ADDRESS); 4347 4383 ··· 4349 4385 return nested_vmx_failValid(vcpu, 4350 4386 VMXERR_VMCLEAR_VMXON_POINTER); 4351 4387 4352 - if (vmx->nested.hv_evmcs_page) { 4388 + if (vmx->nested.hv_evmcs_map.hva) { 4353 4389 if (vmptr == vmx->nested.hv_evmcs_vmptr) 4354 4390 nested_release_evmcs(vcpu); 4355 4391 } else { ··· 4548 4584 if (nested_vmx_get_vmptr(vcpu, &vmptr)) 4549 4585 return 1; 4550 4586 4551 - if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu))) 4587 + if (!page_address_valid(vcpu, vmptr)) 4552 4588 return nested_vmx_failValid(vcpu, 4553 4589 VMXERR_VMPTRLD_INVALID_ADDRESS); 4554 4590 ··· 4561 4597 return 1; 4562 4598 4563 4599 if (vmx->nested.current_vmptr != vmptr) { 4600 + struct kvm_host_map map; 4564 4601 struct vmcs12 *new_vmcs12; 4565 - struct page *page; 4566 4602 4567 - page = kvm_vcpu_gpa_to_page(vcpu, vmptr); 4568 - if (is_error_page(page)) { 4603 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmptr), &map)) { 4569 4604 /* 4570 4605 * Reads from an unbacked page return all 1s, 4571 4606 * which means that the 32 bits located at the ··· 4574 4611 return nested_vmx_failValid(vcpu, 4575 4612 VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID); 4576 4613 } 4577 - new_vmcs12 = kmap(page); 4614 + 4615 + new_vmcs12 = map.hva; 4616 + 4578 4617 if (new_vmcs12->hdr.revision_id != VMCS12_REVISION || 4579 4618 (new_vmcs12->hdr.shadow_vmcs && 4580 4619 
!nested_cpu_has_vmx_shadow_vmcs(vcpu))) { 4581 - kunmap(page); 4582 - kvm_release_page_clean(page); 4620 + kvm_vcpu_unmap(vcpu, &map, false); 4583 4621 return nested_vmx_failValid(vcpu, 4584 4622 VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID); 4585 4623 } ··· 4592 4628 * cached. 4593 4629 */ 4594 4630 memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE); 4595 - kunmap(page); 4596 - kvm_release_page_clean(page); 4631 + kvm_vcpu_unmap(vcpu, &map, false); 4597 4632 4598 4633 set_current_vmptr(vmx, vmptr); 4599 4634 } ··· 4767 4804 static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu, 4768 4805 struct vmcs12 *vmcs12) 4769 4806 { 4770 - u32 index = vcpu->arch.regs[VCPU_REGS_RCX]; 4807 + u32 index = kvm_rcx_read(vcpu); 4771 4808 u64 address; 4772 4809 bool accessed_dirty; 4773 4810 struct kvm_mmu *mmu = vcpu->arch.walk_mmu; ··· 4813 4850 { 4814 4851 struct vcpu_vmx *vmx = to_vmx(vcpu); 4815 4852 struct vmcs12 *vmcs12; 4816 - u32 function = vcpu->arch.regs[VCPU_REGS_RAX]; 4853 + u32 function = kvm_rax_read(vcpu); 4817 4854 4818 4855 /* 4819 4856 * VMFUNC is only supported for nested guests, but we always enable the ··· 4899 4936 static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu, 4900 4937 struct vmcs12 *vmcs12, u32 exit_reason) 4901 4938 { 4902 - u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX]; 4939 + u32 msr_index = kvm_rcx_read(vcpu); 4903 4940 gpa_t bitmap; 4904 4941 4905 4942 if (!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) ··· 5336 5373 if (kvm_state->format != 0) 5337 5374 return -EINVAL; 5338 5375 5339 - if (kvm_state->flags & KVM_STATE_NESTED_EVMCS) 5340 - nested_enable_evmcs(vcpu, NULL); 5341 - 5342 5376 if (!nested_vmx_allowed(vcpu)) 5343 5377 return kvm_state->vmx.vmxon_pa == -1ull ? 
0 : -EINVAL; 5344 5378 ··· 5376 5416 vmx_leave_nested(vcpu); 5377 5417 if (kvm_state->vmx.vmxon_pa == -1ull) 5378 5418 return 0; 5419 + 5420 + if (kvm_state->flags & KVM_STATE_NESTED_EVMCS) 5421 + nested_enable_evmcs(vcpu, NULL); 5379 5422 5380 5423 vmx->nested.vmxon_ptr = kvm_state->vmx.vmxon_pa; 5381 5424 ret = enter_vmx_operation(vcpu); ··· 5423 5460 if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) 5424 5461 return 0; 5425 5462 5426 - vmx->nested.nested_run_pending = 5427 - !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING); 5428 - 5429 5463 if (nested_cpu_has_shadow_vmcs(vmcs12) && 5430 5464 vmcs12->vmcs_link_pointer != -1ull) { 5431 5465 struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu); ··· 5440 5480 return -EINVAL; 5441 5481 } 5442 5482 5443 - if (nested_vmx_check_vmentry_prereqs(vcpu, vmcs12) || 5444 - nested_vmx_check_vmentry_postreqs(vcpu, vmcs12, &exit_qual)) 5483 + if (nested_vmx_check_controls(vcpu, vmcs12) || 5484 + nested_vmx_check_host_state(vcpu, vmcs12) || 5485 + nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual)) 5445 5486 return -EINVAL; 5446 5487 5447 5488 vmx->nested.dirty_vmcs12 = true; 5489 + vmx->nested.nested_run_pending = 5490 + !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING); 5491 + 5448 5492 ret = nested_vmx_enter_non_root_mode(vcpu, false); 5449 - if (ret) 5493 + if (ret) { 5494 + vmx->nested.nested_run_pending = 0; 5450 5495 return -EINVAL; 5496 + } 5451 5497 5452 5498 return 0; 5453 5499 }
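
The nested.c hunks above split the old prereqs/postreqs pair into three checks (controls, host state, guest state) that each return 0 or -EINVAL, leaving the caller to choose the architectural error. A minimal standalone sketch of that ordering follows. It assumes boolean inputs in place of the real vmcs12 inspection; every `toy_*` name is hypothetical and not a kernel symbol.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes; the kernel reports these via nested_vmx_failValid()
 * or a failed-VM-entry exit rather than an enum. */
enum toy_outcome {
	TOY_ENTER_L2,
	TOY_VMFAIL_CONTROL_FIELD,	/* VMXERR_ENTRY_INVALID_CONTROL_FIELD */
	TOY_VMFAIL_HOST_STATE_FIELD,	/* VMXERR_ENTRY_INVALID_HOST_STATE_FIELD */
	TOY_VMEXIT_GUEST_STATE,		/* vmentry_fail_vmexit path */
};

/* Stand-in for vmcs12: each flag says whether that group of fields is valid. */
struct toy_vmcs12 {
	bool controls_ok;
	bool host_state_ok;
	bool guest_state_ok;
};

/* Each check follows the diff's convention: 0 on success, -EINVAL on failure. */
static int toy_check(bool ok)
{
	return ok ? 0 : -EINVAL;
}

/* VMLAUNCH/VMRESUME path: the caller maps each failing check to its error. */
static enum toy_outcome toy_nested_run(const struct toy_vmcs12 *v)
{
	if (toy_check(v->controls_ok))
		return TOY_VMFAIL_CONTROL_FIELD;
	if (toy_check(v->host_state_ok))
		return TOY_VMFAIL_HOST_STATE_FIELD;
	if (toy_check(v->guest_state_ok))
		return TOY_VMEXIT_GUEST_STATE;
	return TOY_ENTER_L2;
}

/* State-restore path: any failing check collapses to -EINVAL. */
static int toy_set_nested_state(const struct toy_vmcs12 *v)
{
	if (toy_check(v->controls_ok) ||
	    toy_check(v->host_state_ok) ||
	    toy_check(v->guest_state_ok))
		return -EINVAL;
	return 0;
}
```

Returning a plain errno from the checks keeps them reusable: the run path and the state-restore path share the same three functions but report failure differently.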

+7 -1

arch/x86/kvm/vmx/pmu_intel.c

···
227 227 }
228 228 break;
229 229 case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
230 - if (!(data & (pmu->global_ctrl_mask & ~(3ull<<62)))) {
230 + if (!(data & pmu->global_ovf_ctrl_mask)) {
231 231 if (!msr_info->host_initiated)
232 232 pmu->global_status &= ~data;
233 233 pmu->global_ovf_ctrl = data;
···
297 297 pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) |
298 298 (((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
299 299 pmu->global_ctrl_mask = ~pmu->global_ctrl;
300 + pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask
301 + & ~(MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF |
302 + MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD);
303 + if (kvm_x86_ops->pt_supported())
304 + pmu->global_ovf_ctrl_mask &=
305 + ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
300 306
301 307 entry = kvm_find_cpuid_entry(vcpu, 7, 0);
302 308 if (entry &&
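
The pmu_intel.c change above precomputes a dedicated reserved-bit mask for MSR_CORE_PERF_GLOBAL_OVF_CTRL instead of deriving it inline from global_ctrl_mask. A minimal userspace sketch of that computation follows. The bit positions are assumptions: 62 and 63 follow from the removed `3ull<<62`, 55 for the ToPA PMI bit and 32 for the fixed-counter base are taken to match the kernel headers. All `toy_`/`TOY_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FIXED_IDX		32		/* assumed INTEL_PMC_IDX_FIXED */
#define TOY_TRACE_TOPA_PMI	(1ull << 55)	/* assumed bit position */
#define TOY_OVF_BUF		(1ull << 62)
#define TOY_COND_CHGD		(1ull << 63)

/* Bits that name an existing counter; assumes both counts are below 32. */
static uint64_t toy_global_ctrl(unsigned int nr_gp, unsigned int nr_fixed)
{
	return ((1ull << nr_gp) - 1) |
	       (((1ull << nr_fixed) - 1) << TOY_FIXED_IDX);
}

/* Reserved bits for OVF_CTRL writes: everything that is not a counter,
 * minus the status bits a guest may legitimately clear. */
static uint64_t toy_global_ovf_ctrl_mask(unsigned int nr_gp,
					 unsigned int nr_fixed,
					 bool pt_supported)
{
	uint64_t mask = ~toy_global_ctrl(nr_gp, nr_fixed);

	mask &= ~(TOY_OVF_BUF | TOY_COND_CHGD);
	if (pt_supported)
		mask &= ~TOY_TRACE_TOPA_PMI;
	return mask;
}

/* Mirrors the write check in the diff: reject data touching reserved bits. */
static bool toy_ovf_ctrl_write_ok(uint64_t data, uint64_t mask)
{
	return !(data & mask);
}
```

With 4 general-purpose and 3 fixed counters, a write of bit 55 is rejected unless the PT flag is set, while bits 62 and 63 are accepted either way.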

+36 -54

arch/x86/kvm/vmx/vmx.c

··· 1692 1692 case MSR_IA32_SYSENTER_ESP: 1693 1693 msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP); 1694 1694 break; 1695 + case MSR_IA32_POWER_CTL: 1696 + msr_info->data = vmx->msr_ia32_power_ctl; 1697 + break; 1695 1698 case MSR_IA32_BNDCFGS: 1696 1699 if (!kvm_mpx_supported() || 1697 1700 (!msr_info->host_initiated && ··· 1825 1822 case MSR_IA32_SYSENTER_ESP: 1826 1823 vmcs_writel(GUEST_SYSENTER_ESP, data); 1827 1824 break; 1825 + case MSR_IA32_POWER_CTL: 1826 + vmx->msr_ia32_power_ctl = data; 1827 + break; 1828 1828 case MSR_IA32_BNDCFGS: 1829 1829 if (!kvm_mpx_supported() || 1830 1830 (!msr_info->host_initiated && ··· 1897 1891 break; 1898 1892 case MSR_IA32_CR_PAT: 1899 1893 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { 1900 - if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) 1894 + if (!kvm_pat_valid(data)) 1901 1895 return 1; 1902 1896 vmcs_write64(GUEST_IA32_PAT, data); 1903 1897 vcpu->arch.pat = data; ··· 2294 2288 min |= VM_EXIT_HOST_ADDR_SPACE_SIZE; 2295 2289 #endif 2296 2290 opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | 2297 - VM_EXIT_SAVE_IA32_PAT | 2298 2291 VM_EXIT_LOAD_IA32_PAT | 2299 2292 VM_EXIT_LOAD_IA32_EFER | 2300 2293 VM_EXIT_CLEAR_BNDCFGS | ··· 3624 3619 3625 3620 if (WARN_ON_ONCE(!is_guest_mode(vcpu)) || 3626 3621 !nested_cpu_has_vid(get_vmcs12(vcpu)) || 3627 - WARN_ON_ONCE(!vmx->nested.virtual_apic_page)) 3622 + WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) 3628 3623 return false; 3629 3624 3630 3625 rvi = vmx_get_rvi(); 3631 3626 3632 - vapic_page = kmap(vmx->nested.virtual_apic_page); 3627 + vapic_page = vmx->nested.virtual_apic_map.hva; 3633 3628 vppr = *((u32 *)(vapic_page + APIC_PROCPRI)); 3634 - kunmap(vmx->nested.virtual_apic_page); 3635 3629 3636 3630 return ((rvi & 0xf0) > (vppr & 0xf0)); 3637 3631 } ··· 4831 4827 4832 4828 static int handle_rdmsr(struct kvm_vcpu *vcpu) 4833 4829 { 4834 - u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX]; 4830 + u32 ecx = kvm_rcx_read(vcpu); 4835 4831 struct msr_data msr_info; 4836 4832 4837 
4833 msr_info.index = ecx; ··· 4844 4840 4845 4841 trace_kvm_msr_read(ecx, msr_info.data); 4846 4842 4847 - /* FIXME: handling of bits 32:63 of rax, rdx */ 4848 - vcpu->arch.regs[VCPU_REGS_RAX] = msr_info.data & -1u; 4849 - vcpu->arch.regs[VCPU_REGS_RDX] = (msr_info.data >> 32) & -1u; 4843 + kvm_rax_write(vcpu, msr_info.data & -1u); 4844 + kvm_rdx_write(vcpu, (msr_info.data >> 32) & -1u); 4850 4845 return kvm_skip_emulated_instruction(vcpu); 4851 4846 } 4852 4847 4853 4848 static int handle_wrmsr(struct kvm_vcpu *vcpu) 4854 4849 { 4855 4850 struct msr_data msr; 4856 - u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX]; 4857 - u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u) 4858 - | ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32); 4851 + u32 ecx = kvm_rcx_read(vcpu); 4852 + u64 data = kvm_read_edx_eax(vcpu); 4859 4853 4860 4854 msr.data = data; 4861 4855 msr.index = ecx; ··· 4924 4922 static int handle_xsetbv(struct kvm_vcpu *vcpu) 4925 4923 { 4926 4924 u64 new_bv = kvm_read_edx_eax(vcpu); 4927 - u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX); 4925 + u32 index = kvm_rcx_read(vcpu); 4928 4926 4929 4927 if (kvm_set_xcr(vcpu, index, new_bv) == 0) 4930 4928 return kvm_skip_emulated_instruction(vcpu); ··· 5725 5723 if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING) 5726 5724 pr_err("TSC Multiplier = 0x%016llx\n", 5727 5725 vmcs_read64(TSC_MULTIPLIER)); 5728 - if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) 5729 - pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD)); 5726 + if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) { 5727 + if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) { 5728 + u16 status = vmcs_read16(GUEST_INTR_STATUS); 5729 + pr_err("SVI|RVI = %02x|%02x ", status >> 8, status & 0xff); 5730 + } 5731 + pr_cont("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD)); 5732 + if (secondary_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) 5733 + pr_err("APIC-access addr = 0x%016llx ", vmcs_read64(APIC_ACCESS_ADDR)); 5734 + 
pr_cont("virt-APIC addr = 0x%016llx\n", vmcs_read64(VIRTUAL_APIC_PAGE_ADDR)); 5735 + } 5730 5736 if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR) 5731 5737 pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV)); 5732 5738 if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT)) ··· 6866 6856 } 6867 6857 } 6868 6858 6869 - static bool guest_cpuid_has_pmu(struct kvm_vcpu *vcpu) 6870 - { 6871 - struct kvm_cpuid_entry2 *entry; 6872 - union cpuid10_eax eax; 6873 - 6874 - entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); 6875 - if (!entry) 6876 - return false; 6877 - 6878 - eax.full = entry->eax; 6879 - return (eax.split.version_id > 0); 6880 - } 6881 - 6882 - static void nested_vmx_procbased_ctls_update(struct kvm_vcpu *vcpu) 6883 - { 6884 - struct vcpu_vmx *vmx = to_vmx(vcpu); 6885 - bool pmu_enabled = guest_cpuid_has_pmu(vcpu); 6886 - 6887 - if (pmu_enabled) 6888 - vmx->nested.msrs.procbased_ctls_high |= CPU_BASED_RDPMC_EXITING; 6889 - else 6890 - vmx->nested.msrs.procbased_ctls_high &= ~CPU_BASED_RDPMC_EXITING; 6891 - } 6892 - 6893 6859 static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) 6894 6860 { 6895 6861 struct vcpu_vmx *vmx = to_vmx(vcpu); ··· 6954 6968 if (nested_vmx_allowed(vcpu)) { 6955 6969 nested_vmx_cr_fixed1_bits_update(vcpu); 6956 6970 nested_vmx_entry_exit_ctls_update(vcpu); 6957 - nested_vmx_procbased_ctls_update(vcpu); 6958 6971 } 6959 6972 6960 6973 if (boot_cpu_has(X86_FEATURE_INTEL_PT) && ··· 7013 7028 return 0; 7014 7029 } 7015 7030 7016 - static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc) 7031 + static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, 7032 + bool *expired) 7017 7033 { 7018 7034 struct vcpu_vmx *vmx; 7019 7035 u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles; ··· 7037 7051 7038 7052 /* Convert to host delta tsc if tsc scaling is enabled */ 7039 7053 if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio && 7040 - u64_shl_div_u64(delta_tsc, 7054 + delta_tsc && 
u64_shl_div_u64(delta_tsc, 7041 7055 kvm_tsc_scaling_ratio_frac_bits, 7042 - vcpu->arch.tsc_scaling_ratio, 7043 - &delta_tsc)) 7056 + vcpu->arch.tsc_scaling_ratio, &delta_tsc)) 7044 7057 return -ERANGE; 7045 7058 7046 7059 /* ··· 7052 7067 return -ERANGE; 7053 7068 7054 7069 vmx->hv_deadline_tsc = tscl + delta_tsc; 7055 - return delta_tsc == 0; 7070 + *expired = !delta_tsc; 7071 + return 0; 7056 7072 } 7057 7073 7058 7074 static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu) ··· 7090 7104 { 7091 7105 struct vmcs12 *vmcs12; 7092 7106 struct vcpu_vmx *vmx = to_vmx(vcpu); 7093 - gpa_t gpa; 7094 - struct page *page = NULL; 7095 - u64 *pml_address; 7107 + gpa_t gpa, dst; 7096 7108 7097 7109 if (is_guest_mode(vcpu)) { 7098 7110 WARN_ON_ONCE(vmx->nested.pml_full); ··· 7110 7126 } 7111 7127 7112 7128 gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull; 7129 + dst = vmcs12->pml_address + sizeof(u64) * vmcs12->guest_pml_index; 7113 7130 7114 - page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->pml_address); 7115 - if (is_error_page(page)) 7131 + if (kvm_write_guest_page(vcpu->kvm, gpa_to_gfn(dst), &gpa, 7132 + offset_in_page(dst), sizeof(gpa))) 7116 7133 return 0; 7117 7134 7118 - pml_address = kmap(page); 7119 - pml_address[vmcs12->guest_pml_index--] = gpa; 7120 - kunmap(page); 7121 - kvm_release_page_clean(page); 7135 + vmcs12->guest_pml_index--; 7122 7136 } 7123 7137 7124 7138 return 0;
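
The last vmx.c hunk above stops mapping the PML page and instead writes one 8-byte entry through kvm_write_guest_page(), addressing it by guest frame number and in-page offset. A minimal sketch of that address arithmetic follows, assuming 4 KiB pages; `toy_pml_slot` and the `TOY_` macros are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12			/* assumes 4 KiB pages */
#define TOY_PAGE_SIZE	(1ull << TOY_PAGE_SHIFT)

/* Where one PML entry lands: frame number plus byte offset in that frame. */
struct toy_pml_slot {
	uint64_t gfn;
	uint64_t offset;
};

/* Mirrors dst = pml_address + sizeof(u64) * guest_pml_index, then splits
 * dst the way gpa_to_gfn() and offset_in_page() would. */
static struct toy_pml_slot toy_pml_slot(uint64_t pml_address, uint16_t index)
{
	uint64_t dst = pml_address + sizeof(uint64_t) * index;
	struct toy_pml_slot slot = {
		.gfn = dst >> TOY_PAGE_SHIFT,
		.offset = dst & (TOY_PAGE_SIZE - 1),
	};

	return slot;
}
```

Because the index counts down from 511, the first entry logged sits at offset 0xff8 of the page and later entries move toward offset 0.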

+8 -3

arch/x86/kvm/vmx/vmx.h

···
142 142 * pointers, so we must keep them pinned while L2 runs.
143 143 */
144 144 struct page *apic_access_page;
145 - struct page *virtual_apic_page;
146 - struct page *pi_desc_page;
145 + struct kvm_host_map virtual_apic_map;
146 + struct kvm_host_map pi_desc_map;
147 +
148 + struct kvm_host_map msr_bitmap_map;
149 +
147 150 struct pi_desc *pi_desc;
148 151 bool pi_pending;
149 152 u16 posted_intr_nv;
···
172 169 } smm;
173 170
174 171 gpa_t hv_evmcs_vmptr;
175 - struct page *hv_evmcs_page;
172 + struct kvm_host_map hv_evmcs_map;
176 173 struct hv_enlightened_vmcs *hv_evmcs;
177 174 };
178 175
···
259 256 u32 host_pkru;
260 257
261 258 unsigned long host_debugctlmsr;
259 +
260 + u64 msr_ia32_power_ctl;
262 261
263 262 /*
264 263 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in

+124 -77

arch/x86/kvm/x86.c

··· 1100 1100 1101 1101 bool kvm_rdpmc(struct kvm_vcpu *vcpu) 1102 1102 { 1103 - u32 ecx = kvm_register_read(vcpu, VCPU_REGS_RCX); 1103 + u32 ecx = kvm_rcx_read(vcpu); 1104 1104 u64 data; 1105 1105 int err; 1106 1106 1107 1107 err = kvm_pmu_rdpmc(vcpu, ecx, &data); 1108 1108 if (err) 1109 1109 return err; 1110 - kvm_register_write(vcpu, VCPU_REGS_RAX, (u32)data); 1111 - kvm_register_write(vcpu, VCPU_REGS_RDX, data >> 32); 1110 + kvm_rax_write(vcpu, (u32)data); 1111 + kvm_rdx_write(vcpu, data >> 32); 1112 1112 return err; 1113 1113 } 1114 1114 EXPORT_SYMBOL_GPL(kvm_rdpmc); ··· 1174 1174 MSR_PLATFORM_INFO, 1175 1175 MSR_MISC_FEATURES_ENABLES, 1176 1176 MSR_AMD64_VIRT_SPEC_CTRL, 1177 + MSR_IA32_POWER_CTL, 1178 + 1179 + MSR_K7_HWCR, 1177 1180 }; 1178 1181 1179 1182 static unsigned num_emulated_msrs; ··· 1265 1262 return 0; 1266 1263 } 1267 1264 1265 + static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) 1266 + { 1267 + if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT)) 1268 + return false; 1269 + 1270 + if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM)) 1271 + return false; 1272 + 1273 + if (efer & (EFER_LME | EFER_LMA) && 1274 + !guest_cpuid_has(vcpu, X86_FEATURE_LM)) 1275 + return false; 1276 + 1277 + if (efer & EFER_NX && !guest_cpuid_has(vcpu, X86_FEATURE_NX)) 1278 + return false; 1279 + 1280 + return true; 1281 + 1282 + } 1268 1283 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) 1269 1284 { 1270 1285 if (efer & efer_reserved_bits) 1271 1286 return false; 1272 1287 1273 - if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT)) 1274 - return false; 1275 - 1276 - if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM)) 1277 - return false; 1278 - 1279 - return true; 1288 + return __kvm_valid_efer(vcpu, efer); 1280 1289 } 1281 1290 EXPORT_SYMBOL_GPL(kvm_valid_efer); 1282 1291 1283 - static int set_efer(struct kvm_vcpu *vcpu, u64 efer) 1292 + static int set_efer(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info) 1284 1293 { 1285 1294 u64 old_efer = vcpu->arch.efer; 1295 + u64 efer = msr_info->data; 1286 1296 1287 - if (!kvm_valid_efer(vcpu, efer)) 1288 - return 1; 1297 + if (efer & efer_reserved_bits) 1298 + return false; 1289 1299 1290 - if (is_paging(vcpu) 1291 - && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) 1292 - return 1; 1300 + if (!msr_info->host_initiated) { 1301 + if (!__kvm_valid_efer(vcpu, efer)) 1302 + return 1; 1303 + 1304 + if (is_paging(vcpu) && 1305 + (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) 1306 + return 1; 1307 + } 1293 1308 1294 1309 efer &= ~EFER_LMA; 1295 1310 efer |= vcpu->arch.efer & EFER_LMA; ··· 2300 2279 KVMCLOCK_SYNC_PERIOD); 2301 2280 } 2302 2281 2282 + /* 2283 + * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP. 2284 + */ 2285 + static bool can_set_mci_status(struct kvm_vcpu *vcpu) 2286 + { 2287 + /* McStatusWrEn enabled? */ 2288 + if (guest_cpuid_is_amd(vcpu)) 2289 + return !!(vcpu->arch.msr_hwcr & BIT_ULL(18)); 2290 + 2291 + return false; 2292 + } 2293 + 2303 2294 static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 2304 2295 { 2305 2296 u64 mcg_cap = vcpu->arch.mcg_cap; ··· 2343 2310 if ((offset & 0x3) == 0 && 2344 2311 data != 0 && (data | (1 << 10)) != ~(u64)0) 2345 2312 return -1; 2313 + 2314 + /* MCi_STATUS */ 2346 2315 if (!msr_info->host_initiated && 2347 - (offset & 0x3) == 1 && data != 0) 2348 - return -1; 2316 + (offset & 0x3) == 1 && data != 0) { 2317 + if (!can_set_mci_status(vcpu)) 2318 + return -1; 2319 + } 2320 + 2349 2321 vcpu->arch.mce_banks[offset] = data; 2350 2322 break; 2351 2323 } ··· 2494 2456 vcpu->arch.arch_capabilities = data; 2495 2457 break; 2496 2458 case MSR_EFER: 2497 - return set_efer(vcpu, data); 2459 + return set_efer(vcpu, msr_info); 2498 2460 case MSR_K7_HWCR: 2499 2461 data &= ~(u64)0x40; /* ignore flush filter disable */ 2500 2462 data &= ~(u64)0x100; /* ignore ignne emulation enable */ 2501 2463 data &= ~(u64)0x8; /* 
ignore TLB cache disable */ 2502 - data &= ~(u64)0x40000; /* ignore Mc status write enable */ 2503 - if (data != 0) { 2464 + 2465 + /* Handle McStatusWrEn */ 2466 + if (data == BIT_ULL(18)) { 2467 + vcpu->arch.msr_hwcr = data; 2468 + } else if (data != 0) { 2504 2469 vcpu_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", 2505 2470 data); 2506 2471 return 1; ··· 2777 2736 case MSR_K8_SYSCFG: 2778 2737 case MSR_K8_TSEG_ADDR: 2779 2738 case MSR_K8_TSEG_MASK: 2780 - case MSR_K7_HWCR: 2781 2739 case MSR_VM_HSAVE_PA: 2782 2740 case MSR_K8_INT_PENDING_MSG: 2783 2741 case MSR_AMD64_NB_CFG: ··· 2939 2899 break; 2940 2900 case MSR_MISC_FEATURES_ENABLES: 2941 2901 msr_info->data = vcpu->arch.msr_misc_features_enables; 2902 + break; 2903 + case MSR_K7_HWCR: 2904 + msr_info->data = vcpu->arch.msr_hwcr; 2942 2905 break; 2943 2906 default: 2944 2907 if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) ··· 3121 3078 break; 3122 3079 case KVM_CAP_MAX_VCPUS: 3123 3080 r = KVM_MAX_VCPUS; 3124 - break; 3125 - case KVM_CAP_NR_MEMSLOTS: 3126 - r = KVM_USER_MEM_SLOTS; 3127 3081 break; 3128 3082 case KVM_CAP_PV_MMU: /* obsolete */ 3129 3083 r = 0; ··· 5561 5521 unsigned int bytes, 5562 5522 struct x86_exception *exception) 5563 5523 { 5524 + struct kvm_host_map map; 5564 5525 struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); 5565 5526 gpa_t gpa; 5566 - struct page *page; 5567 5527 char *kaddr; 5568 5528 bool exchanged; 5569 5529 ··· 5580 5540 if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK)) 5581 5541 goto emul_write; 5582 5542 5583 - page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT); 5584 - if (is_error_page(page)) 5543 + if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map)) 5585 5544 goto emul_write; 5586 5545 5587 - kaddr = kmap_atomic(page); 5588 - kaddr += offset_in_page(gpa); 5546 + kaddr = map.hva + offset_in_page(gpa); 5547 + 5589 5548 switch (bytes) { 5590 5549 case 1: 5591 5550 exchanged = CMPXCHG_TYPE(u8, kaddr, old, new); ··· 5601 5562 default: 5602 5563 BUG(); 5603 5564 } 5604 - 
kunmap_atomic(kaddr); 5605 - kvm_release_page_dirty(page); 5565 + 5566 + kvm_vcpu_unmap(vcpu, &map, true); 5606 5567 5607 5568 if (!exchanged) 5608 5569 return X86EMUL_CMPXCHG_FAILED; 5609 5570 5610 - kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); 5611 5571 kvm_page_track_write(vcpu, gpa, new, bytes); 5612 5572 5613 5573 return X86EMUL_CONTINUE; ··· 6596 6558 static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, 6597 6559 unsigned short port) 6598 6560 { 6599 - unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); 6561 + unsigned long val = kvm_rax_read(vcpu); 6600 6562 int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt, 6601 6563 size, port, &val, 1); 6602 6564 if (ret) ··· 6631 6593 } 6632 6594 6633 6595 /* For size less than 4 we merge, else we zero extend */ 6634 - val = (vcpu->arch.pio.size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) 6635 - : 0; 6596 + val = (vcpu->arch.pio.size < 4) ? kvm_rax_read(vcpu) : 0; 6636 6597 6637 6598 /* 6638 6599 * Since vcpu->arch.pio.count == 1 let emulator_pio_in_emulated perform ··· 6639 6602 */ 6640 6603 emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size, 6641 6604 vcpu->arch.pio.port, &val, 1); 6642 - kvm_register_write(vcpu, VCPU_REGS_RAX, val); 6605 + kvm_rax_write(vcpu, val); 6643 6606 6644 6607 return kvm_skip_emulated_instruction(vcpu); 6645 6608 } ··· 6651 6614 int ret; 6652 6615 6653 6616 /* For size less than 4 we merge, else we zero extend */ 6654 - val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0; 6617 + val = (size < 4) ? 
kvm_rax_read(vcpu) : 0; 6655 6618 6656 6619 ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port, 6657 6620 &val, 1); 6658 6621 if (ret) { 6659 - kvm_register_write(vcpu, VCPU_REGS_RAX, val); 6622 + kvm_rax_write(vcpu, val); 6660 6623 return ret; 6661 6624 } 6662 6625 ··· 6891 6854 return ip; 6892 6855 } 6893 6856 6857 + static void kvm_handle_intel_pt_intr(void) 6858 + { 6859 + struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu); 6860 + 6861 + kvm_make_request(KVM_REQ_PMI, vcpu); 6862 + __set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT, 6863 + (unsigned long *)&vcpu->arch.pmu.global_status); 6864 + } 6865 + 6894 6866 static struct perf_guest_info_callbacks kvm_guest_cbs = { 6895 6867 .is_in_guest = kvm_is_in_guest, 6896 6868 .is_user_mode = kvm_is_user_mode, 6897 6869 .get_guest_ip = kvm_get_guest_ip, 6870 + .handle_intel_pt_intr = kvm_handle_intel_pt_intr, 6898 6871 }; 6899 6872 6900 6873 static void kvm_set_mmio_spte_mask(void) ··· 7180 7133 if (kvm_hv_hypercall_enabled(vcpu->kvm)) 7181 7134 return kvm_hv_hypercall(vcpu); 7182 7135 7183 - nr = kvm_register_read(vcpu, VCPU_REGS_RAX); 7184 - a0 = kvm_register_read(vcpu, VCPU_REGS_RBX); 7185 - a1 = kvm_register_read(vcpu, VCPU_REGS_RCX); 7186 - a2 = kvm_register_read(vcpu, VCPU_REGS_RDX); 7187 - a3 = kvm_register_read(vcpu, VCPU_REGS_RSI); 7136 + nr = kvm_rax_read(vcpu); 7137 + a0 = kvm_rbx_read(vcpu); 7138 + a1 = kvm_rcx_read(vcpu); 7139 + a2 = kvm_rdx_read(vcpu); 7140 + a3 = kvm_rsi_read(vcpu); 7188 7141 7189 7142 trace_kvm_hypercall(nr, a0, a1, a2, a3); 7190 7143 ··· 7225 7178 out: 7226 7179 if (!op_64_bit) 7227 7180 ret = (u32)ret; 7228 - kvm_register_write(vcpu, VCPU_REGS_RAX, ret); 7181 + kvm_rax_write(vcpu, ret); 7229 7182 7230 7183 ++vcpu->stat.hypercalls; 7231 7184 return kvm_skip_emulated_instruction(vcpu); ··· 8327 8280 emulator_writeback_register_cache(&vcpu->arch.emulate_ctxt); 8328 8281 vcpu->arch.emulate_regs_need_sync_to_vcpu = false; 8329 8282 } 8330 - regs->rax = 
kvm_register_read(vcpu, VCPU_REGS_RAX); 8331 - regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX); 8332 - regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX); 8333 - regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX); 8334 - regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI); 8335 - regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI); 8336 - regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP); 8337 - regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP); 8283 + regs->rax = kvm_rax_read(vcpu); 8284 + regs->rbx = kvm_rbx_read(vcpu); 8285 + regs->rcx = kvm_rcx_read(vcpu); 8286 + regs->rdx = kvm_rdx_read(vcpu); 8287 + regs->rsi = kvm_rsi_read(vcpu); 8288 + regs->rdi = kvm_rdi_read(vcpu); 8289 + regs->rsp = kvm_rsp_read(vcpu); 8290 + regs->rbp = kvm_rbp_read(vcpu); 8338 8291 #ifdef CONFIG_X86_64 8339 - regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8); 8340 - regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9); 8341 - regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10); 8342 - regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11); 8343 - regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12); 8344 - regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13); 8345 - regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14); 8346 - regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15); 8292 + regs->r8 = kvm_r8_read(vcpu); 8293 + regs->r9 = kvm_r9_read(vcpu); 8294 + regs->r10 = kvm_r10_read(vcpu); 8295 + regs->r11 = kvm_r11_read(vcpu); 8296 + regs->r12 = kvm_r12_read(vcpu); 8297 + regs->r13 = kvm_r13_read(vcpu); 8298 + regs->r14 = kvm_r14_read(vcpu); 8299 + regs->r15 = kvm_r15_read(vcpu); 8347 8300 #endif 8348 8301 8349 8302 regs->rip = kvm_rip_read(vcpu); ··· 8363 8316 vcpu->arch.emulate_regs_need_sync_from_vcpu = true; 8364 8317 vcpu->arch.emulate_regs_need_sync_to_vcpu = false; 8365 8318 8366 - kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax); 8367 - kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx); 8368 - kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx); 8369 - kvm_register_write(vcpu, 
VCPU_REGS_RDX, regs->rdx); 8370 - kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi); 8371 - kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi); 8372 - kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp); 8373 - kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp); 8319 + kvm_rax_write(vcpu, regs->rax); 8320 + kvm_rbx_write(vcpu, regs->rbx); 8321 + kvm_rcx_write(vcpu, regs->rcx); 8322 + kvm_rdx_write(vcpu, regs->rdx); 8323 + kvm_rsi_write(vcpu, regs->rsi); 8324 + kvm_rdi_write(vcpu, regs->rdi); 8325 + kvm_rsp_write(vcpu, regs->rsp); 8326 + kvm_rbp_write(vcpu, regs->rbp); 8374 8327 #ifdef CONFIG_X86_64 8375 - kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8); 8376 - kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9); 8377 - kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10); 8378 - kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11); 8379 - kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12); 8380 - kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13); 8381 - kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14); 8382 - kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15); 8328 + kvm_r8_write(vcpu, regs->r8); 8329 + kvm_r9_write(vcpu, regs->r9); 8330 + kvm_r10_write(vcpu, regs->r10); 8331 + kvm_r11_write(vcpu, regs->r11); 8332 + kvm_r12_write(vcpu, regs->r12); 8333 + kvm_r13_write(vcpu, regs->r13); 8334 + kvm_r14_write(vcpu, regs->r14); 8335 + kvm_r15_write(vcpu, regs->r15); 8383 8336 #endif 8384 8337 8385 8338 kvm_rip_write(vcpu, regs->rip);
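The HWCR and MCi_STATUS hunks in this file work together: a guest write of a non-zero value to an MCi_STATUS register is accepted only when the guest CPUID reports AMD and HWCR bit 18 (McStatusWrEn) has been set beforehand. The following is a minimal userspace sketch of that gating, not kernel code. The `vcpu_model` struct and the helpers `hwcr_write` and `mci_status_write_allowed` are hypothetical stand-ins introduced for illustration, and the bank-offset check `(offset & 0x3) == 1` from the diff is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define HWCR_MC_STATUS_WR_EN (1ULL << 18) /* McStatusWrEn, BIT_ULL(18) in the diff */

/* Hypothetical stand-in for the vCPU state the diff touches. */
struct vcpu_model {
	bool is_amd;       /* models guest_cpuid_is_amd(vcpu) */
	uint64_t msr_hwcr; /* models vcpu->arch.msr_hwcr */
};

/* Models the MSR_K7_HWCR write path: 0 on success, 1 when the write faults. */
static int hwcr_write(struct vcpu_model *v, uint64_t data)
{
	data &= ~0x40ULL;  /* ignore flush filter disable */
	data &= ~0x100ULL; /* ignore ignne emulation enable */
	data &= ~0x8ULL;   /* ignore TLB cache disable */

	if (data == HWCR_MC_STATUS_WR_EN)
		v->msr_hwcr = data;
	else if (data != 0)
		return 1;  /* any other bit is unimplemented */

	/* As in the diff, a write of 0 leaves msr_hwcr unchanged. */
	return 0;
}

/* Mirrors can_set_mci_status() from the diff. */
static bool can_set_mci_status(const struct vcpu_model *v)
{
	if (v->is_amd)
		return (v->msr_hwcr & HWCR_MC_STATUS_WR_EN) != 0;

	return false;
}

/* Models the MCi_STATUS branch of set_msr_mce() for a status register. */
static bool mci_status_write_allowed(const struct vcpu_model *v,
				     bool host_initiated, uint64_t data)
{
	if (!host_initiated && data != 0)
		return can_set_mci_status(v);

	return true;
}
```

In this model, host-initiated writes and writes of zero always pass, which matches the condition in the hunk; only non-zero guest writes depend on the stored HWCR value.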

+10

arch/x86/kvm/x86.h

··· 345 345 __this_cpu_write(current_vcpu, NULL); 346 346 } 347 347 348 + 349 + static inline bool kvm_pat_valid(u64 data) 350 + { 351 + if (data & 0xF8F8F8F8F8F8F8F8ull) 352 + return false; 353 + /* 0, 1, 4, 5, 6, 7 are valid values. */ 354 + return (data | ((data & 0x0202020202020202ull) << 1)) == data; 355 + } 356 + 348 357 void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu); 349 358 void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu); 359 + 350 360 #endif
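The bit trick in `kvm_pat_valid()` is compact enough to deserve a cross-check. The sketch below reproduces the same two-step test in userspace and compares it with a plain per-byte loop. The names `pat_valid` and `pat_valid_bytewise` are illustrative, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same logic as kvm_pat_valid() in the hunk above. */
static bool pat_valid(uint64_t data)
{
	/* Each of the eight PAT entries must fit in 3 bits. */
	if (data & 0xF8F8F8F8F8F8F8F8ull)
		return false;

	/*
	 * 0, 1, 4, 5, 6, 7 are valid values: whenever bit 1 of an entry is
	 * set, bit 2 must be set too, which rules out 2 and 3.
	 */
	return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}

/* Reference implementation: check every byte against the allowed set. */
static bool pat_valid_bytewise(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		uint8_t entry = (uint8_t)(data >> (8 * i));

		if (!(entry == 0 || entry == 1 || (entry >= 4 && entry <= 7)))
			return false;
	}
	return true;
}
```

Both functions should agree on every input; the mask version simply avoids the loop.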

+48

include/linux/kvm_host.h

··· 227 227 READING_SHADOW_PAGE_TABLES, 228 228 }; 229 229 230 + #define KVM_UNMAPPED_PAGE ((void *) 0x500 + POISON_POINTER_DELTA) 231 + 232 + struct kvm_host_map { 233 + /* 234 + * Only valid if the 'pfn' is managed by the host kernel (i.e. there is 235 + * a 'struct page' for it. When using mem= kernel parameter some memory 236 + * can be used as guest memory but they are not managed by host 237 + * kernel). 238 + * If 'pfn' is not managed by the host kernel, this field is 239 + * initialized to KVM_UNMAPPED_PAGE. 240 + */ 241 + struct page *page; 242 + void *hva; 243 + kvm_pfn_t pfn; 244 + gfn_t gfn; 245 + }; 246 + 247 + /* 248 + * Used to check if the mapping is valid or not. Never use 'kvm_host_map' 249 + * directly to check for that. 250 + */ 251 + static inline bool kvm_vcpu_mapped(struct kvm_host_map *map) 252 + { 253 + return !!map->hva; 254 + } 255 + 230 256 /* 231 257 * Sometimes a large or cross-page mmio needs to be broken up into separate 232 258 * exits for userspace servicing. ··· 759 733 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn); 760 734 kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn); 761 735 kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); 736 + int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map); 762 737 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn); 738 + void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty); 763 739 unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn); 764 740 unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable); 765 741 int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset, ··· 1270 1242 */ 1271 1243 void (*destroy)(struct kvm_device *dev); 1272 1244 1245 + /* 1246 + * Release is an alternative method to free the device. It is 1247 + * called when the device file descriptor is closed. 
Once 1248 + * release is called, the destroy method will not be called 1249 + * anymore as the device is removed from the device list of 1250 + * the VM. kvm->lock is held. 1251 + */ 1252 + void (*release)(struct kvm_device *dev); 1253 + 1273 1254 int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); 1274 1255 int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); 1275 1256 int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); 1276 1257 long (*ioctl)(struct kvm_device *dev, unsigned int ioctl, 1277 1258 unsigned long arg); 1259 + int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma); 1278 1260 }; 1279 1261 1280 1262 void kvm_device_get(struct kvm_device *dev); ··· 1344 1306 return true; 1345 1307 } 1346 1308 #endif /* CONFIG_HAVE_KVM_INVALID_WAKEUPS */ 1309 + 1310 + #ifdef CONFIG_HAVE_KVM_NO_POLL 1311 + /* Callback that tells if we must not poll */ 1312 + bool kvm_arch_no_poll(struct kvm_vcpu *vcpu); 1313 + #else 1314 + static inline bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) 1315 + { 1316 + return false; 1317 + } 1318 + #endif /* CONFIG_HAVE_KVM_NO_POLL */ 1347 1319 1348 1320 #ifdef CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL 1349 1321 long kvm_arch_vcpu_async_ioctl(struct file *filp,

+1

include/linux/perf_event.h

··· 30 30 int (*is_in_guest)(void); 31 31 int (*is_user_mode)(void); 32 32 unsigned long (*get_guest_ip)(void); 33 + void (*handle_intel_pt_intr)(void); 33 34 }; 34 35 35 36 #ifdef CONFIG_HAVE_HW_BREAKPOINT

+13 -2

include/uapi/linux/kvm.h

··· 986 986 #define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 163 987 987 #define KVM_CAP_EXCEPTION_PAYLOAD 164 988 988 #define KVM_CAP_ARM_VM_IPA_SIZE 165 989 - #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166 989 + #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166 /* Obsolete */ 990 990 #define KVM_CAP_HYPERV_CPUID 167 991 + #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 168 992 + #define KVM_CAP_PPC_IRQ_XIVE 169 993 + #define KVM_CAP_ARM_SVE 170 994 + #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 995 + #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 991 996 992 997 #ifdef KVM_CAP_IRQ_ROUTING 993 998 ··· 1150 1145 #define KVM_REG_SIZE_U256 0x0050000000000000ULL 1151 1146 #define KVM_REG_SIZE_U512 0x0060000000000000ULL 1152 1147 #define KVM_REG_SIZE_U1024 0x0070000000000000ULL 1148 + #define KVM_REG_SIZE_U2048 0x0080000000000000ULL 1153 1149 1154 1150 struct kvm_reg_list { 1155 1151 __u64 n; /* number of regs */ ··· 1217 1211 #define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3 1218 1212 KVM_DEV_TYPE_ARM_VGIC_ITS, 1219 1213 #define KVM_DEV_TYPE_ARM_VGIC_ITS KVM_DEV_TYPE_ARM_VGIC_ITS 1214 + KVM_DEV_TYPE_XIVE, 1215 + #define KVM_DEV_TYPE_XIVE KVM_DEV_TYPE_XIVE 1220 1216 KVM_DEV_TYPE_MAX, 1221 1217 }; 1222 1218 ··· 1442 1434 #define KVM_GET_NESTED_STATE _IOWR(KVMIO, 0xbe, struct kvm_nested_state) 1443 1435 #define KVM_SET_NESTED_STATE _IOW(KVMIO, 0xbf, struct kvm_nested_state) 1444 1436 1445 - /* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT */ 1437 + /* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 */ 1446 1438 #define KVM_CLEAR_DIRTY_LOG _IOWR(KVMIO, 0xc0, struct kvm_clear_dirty_log) 1447 1439 1448 1440 /* Available with KVM_CAP_HYPERV_CPUID */ 1449 1441 #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2) 1442 + 1443 + /* Available with KVM_CAP_ARM_SVE */ 1444 + #define KVM_ARM_VCPU_FINALIZE _IOW(KVMIO, 0xc2, int) 1450 1445 1451 1446 /* Secure Encrypted Virtualization command */ 1452 1447 enum sev_cmd_id {
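The new `KVM_REG_SIZE_U2048` value follows the existing pattern: the size field of a register ID holds log2 of the register size in bytes. A small sketch of decoding it follows. The two size constants are copied from the hunk above, while `REG_SIZE_SHIFT` and `REG_SIZE_MASK` are written from memory of the uapi header's `KVM_REG_SIZE_SHIFT` and `KVM_REG_SIZE_MASK` and should be treated as assumptions. `reg_size_bytes` is an illustrative name.

```c
#include <stdint.h>

/* Assumed to match KVM_REG_SIZE_SHIFT / KVM_REG_SIZE_MASK in the uapi header. */
#define REG_SIZE_SHIFT 52
#define REG_SIZE_MASK  0x00f0000000000000ULL

/* Values taken from the hunk above. */
#define REG_SIZE_U1024 0x0070000000000000ULL
#define REG_SIZE_U2048 0x0080000000000000ULL

/* Decode the size field of a register ID into a byte count. */
static unsigned int reg_size_bytes(uint64_t id)
{
	return 1U << ((id & REG_SIZE_MASK) >> REG_SIZE_SHIFT);
}
```

Under these assumptions a U2048 register decodes to 256 bytes (2048 bits), twice the U1024 size, regardless of what the other ID bits contain.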

+2 -1

tools/arch/s390/include/uapi/asm/kvm.h

··· 152 152 __u8 pcc[16]; /* with MSA4 */ 153 153 __u8 ppno[16]; /* with MSA5 */ 154 154 __u8 kma[16]; /* with MSA8 */ 155 - __u8 reserved[1808]; 155 + __u8 kdsa[16]; /* with MSA9 */ 156 + __u8 reserved[1792]; 156 157 }; 157 158 158 159 /* kvm attributes for crypto */

+6 -1

tools/testing/selftests/kvm/.gitignore

··· 1 1 /x86_64/cr4_cpuid_sync_test 2 2 /x86_64/evmcs_test 3 + /x86_64/hyperv_cpuid 4 + /x86_64/kvm_create_max_vcpus 3 5 /x86_64/platform_info_test 4 6 /x86_64/set_sregs_test 7 + /x86_64/smm_test 8 + /x86_64/state_test 5 9 /x86_64/sync_regs_test 6 10 /x86_64/vmx_close_while_nested_test 11 + /x86_64/vmx_set_nested_state_test 7 12 /x86_64/vmx_tsc_adjust_test 8 - /x86_64/state_test 13 + /clear_dirty_log_test 9 14 /dirty_log_test

+2

tools/testing/selftests/kvm/Makefile

··· 20 20 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid 21 21 TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test 22 22 TEST_GEN_PROGS_x86_64 += x86_64/smm_test 23 + TEST_GEN_PROGS_x86_64 += x86_64/kvm_create_max_vcpus 24 + TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test 23 25 TEST_GEN_PROGS_x86_64 += dirty_log_test 24 26 TEST_GEN_PROGS_x86_64 += clear_dirty_log_test 25 27

+2 -2

tools/testing/selftests/kvm/dirty_log_test.c

··· 314 314 #ifdef USE_CLEAR_DIRTY_LOG 315 315 struct kvm_enable_cap cap = {}; 316 316 317 - cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT; 317 + cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; 318 318 cap.args[0] = 1; 319 319 vm_enable_cap(vm, &cap); 320 320 #endif ··· 430 430 int opt, i; 431 431 432 432 #ifdef USE_CLEAR_DIRTY_LOG 433 - if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT)) { 433 + if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) { 434 434 fprintf(stderr, "KVM_CLEAR_DIRTY_LOG not available, skipping tests\n"); 435 435 exit(KSFT_SKIP); 436 436 }

+4

tools/testing/selftests/kvm/include/kvm_util.h

··· 118 118 struct kvm_vcpu_events *events); 119 119 void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid, 120 120 struct kvm_vcpu_events *events); 121 + void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid, 122 + struct kvm_nested_state *state); 123 + int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid, 124 + struct kvm_nested_state *state, bool ignore_error); 121 125 122 126 const char *exit_reason_str(unsigned int exit_reason); 123 127

+32

tools/testing/selftests/kvm/lib/kvm_util.c

··· 1250 1250 ret, errno); 1251 1251 } 1252 1252 1253 + void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid, 1254 + struct kvm_nested_state *state) 1255 + { 1256 + struct vcpu *vcpu = vcpu_find(vm, vcpuid); 1257 + int ret; 1258 + 1259 + TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); 1260 + 1261 + ret = ioctl(vcpu->fd, KVM_GET_NESTED_STATE, state); 1262 + TEST_ASSERT(ret == 0, 1263 + "KVM_GET_NESTED_STATE failed, ret: %i errno: %i", 1264 + ret, errno); 1265 + } 1266 + 1267 + int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid, 1268 + struct kvm_nested_state *state, bool ignore_error) 1269 + { 1270 + struct vcpu *vcpu = vcpu_find(vm, vcpuid); 1271 + int ret; 1272 + 1273 + TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); 1274 + 1275 + ret = ioctl(vcpu->fd, KVM_SET_NESTED_STATE, state); 1276 + if (!ignore_error) { 1277 + TEST_ASSERT(ret == 0, 1278 + "KVM_SET_NESTED_STATE failed, ret: %i errno: %i", 1279 + ret, errno); 1280 + } 1281 + 1282 + return ret; 1283 + } 1284 + 1253 1285 /* 1254 1286 * VM VCPU System Regs Get 1255 1287 *

+70

tools/testing/selftests/kvm/x86_64/kvm_create_max_vcpus.c

··· 1 + /* 2 + * kvm_create_max_vcpus 3 + * 4 + * Copyright (C) 2019, Google LLC. 5 + * 6 + * This work is licensed under the terms of the GNU GPL, version 2. 7 + * 8 + * Test for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_VCPU_ID. 9 + */ 10 + 11 + #define _GNU_SOURCE /* for program_invocation_short_name */ 12 + #include <fcntl.h> 13 + #include <stdio.h> 14 + #include <stdlib.h> 15 + #include <string.h> 16 + 17 + #include "test_util.h" 18 + 19 + #include "kvm_util.h" 20 + #include "asm/kvm.h" 21 + #include "linux/kvm.h" 22 + 23 + void test_vcpu_creation(int first_vcpu_id, int num_vcpus) 24 + { 25 + struct kvm_vm *vm; 26 + int i; 27 + 28 + printf("Testing creating %d vCPUs, with IDs %d...%d.\n", 29 + num_vcpus, first_vcpu_id, first_vcpu_id + num_vcpus - 1); 30 + 31 + vm = vm_create(VM_MODE_P52V48_4K, DEFAULT_GUEST_PHY_PAGES, O_RDWR); 32 + 33 + for (i = 0; i < num_vcpus; i++) { 34 + int vcpu_id = first_vcpu_id + i; 35 + 36 + /* This asserts that the vCPU was created. */ 37 + vm_vcpu_add(vm, vcpu_id, 0, 0); 38 + } 39 + 40 + kvm_vm_free(vm); 41 + } 42 + 43 + int main(int argc, char *argv[]) 44 + { 45 + int kvm_max_vcpu_id = kvm_check_cap(KVM_CAP_MAX_VCPU_ID); 46 + int kvm_max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS); 47 + 48 + printf("KVM_CAP_MAX_VCPU_ID: %d\n", kvm_max_vcpu_id); 49 + printf("KVM_CAP_MAX_VCPUS: %d\n", kvm_max_vcpus); 50 + 51 + /* 52 + * Upstream KVM prior to 4.8 does not support KVM_CAP_MAX_VCPU_ID. 53 + * Userspace is supposed to use KVM_CAP_MAX_VCPUS as the maximum ID 54 + * in this case. 55 + */ 56 + if (!kvm_max_vcpu_id) 57 + kvm_max_vcpu_id = kvm_max_vcpus; 58 + 59 + TEST_ASSERT(kvm_max_vcpu_id >= kvm_max_vcpus, 60 + "KVM_MAX_VCPU_ID (%d) must be at least as large as KVM_MAX_VCPUS (%d).", 61 + kvm_max_vcpu_id, kvm_max_vcpus); 62 + 63 + test_vcpu_creation(0, kvm_max_vcpus); 64 + 65 + if (kvm_max_vcpu_id > kvm_max_vcpus) 66 + test_vcpu_creation( 67 + kvm_max_vcpu_id - kvm_max_vcpus, kvm_max_vcpus); 68 + 69 + return 0; 70 + }
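The test above creates vCPUs in up to two ID ranges: the lowest `KVM_CAP_MAX_VCPUS` IDs, and, when the ID space is larger than the vCPU count, the highest IDs below `KVM_CAP_MAX_VCPU_ID`. The sketch below isolates just that range arithmetic so it can be checked without a KVM host. `struct vcpu_id_range` and `plan_vcpu_ranges` are hypothetical names, not part of the selftest library.

```c
/* Inclusive range of vCPU IDs to create in one VM. */
struct vcpu_id_range {
	int first;
	int last;
};

/*
 * Fill 'out' with the ranges main() would pass to test_vcpu_creation()
 * and return how many ranges there are (1 or 2).
 */
static int plan_vcpu_ranges(int max_vcpu_id, int max_vcpus,
			    struct vcpu_id_range out[2])
{
	/* Same fallback as the test: no KVM_CAP_MAX_VCPU_ID means use MAX_VCPUS. */
	if (!max_vcpu_id)
		max_vcpu_id = max_vcpus;

	/* First pass: IDs 0 .. max_vcpus - 1. */
	out[0].first = 0;
	out[0].last = max_vcpus - 1;

	if (max_vcpu_id <= max_vcpus)
		return 1;

	/* Second pass: the top max_vcpus IDs of the ID space. */
	out[1].first = max_vcpu_id - max_vcpus;
	out[1].last = max_vcpu_id - 1;
	return 2;
}
```

With example capability values of 1023 for the ID limit and 288 for the vCPU count (numbers chosen only for illustration), the second pass covers IDs 735 through 1022.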

+280

tools/testing/selftests/kvm/x86_64/vmx_set_nested_state_test.c

··· 1 + /* 2 + * vmx_set_nested_state_test 3 + * 4 + * Copyright (C) 2019, Google LLC. 5 + * 6 + * This work is licensed under the terms of the GNU GPL, version 2. 7 + * 8 + * This test verifies the integrity of calling the ioctl KVM_SET_NESTED_STATE. 9 + */ 10 + 11 + #include "test_util.h" 12 + #include "kvm_util.h" 13 + #include "processor.h" 14 + #include "vmx.h" 15 + 16 + #include <errno.h> 17 + #include <linux/kvm.h> 18 + #include <string.h> 19 + #include <sys/ioctl.h> 20 + #include <unistd.h> 21 + 22 + /* 23 + * Mirror of VMCS12_REVISION in arch/x86/kvm/vmx/vmcs12.h. If that value 24 + * changes this should be updated. 25 + */ 26 + #define VMCS12_REVISION 0x11e57ed0 27 + #define VCPU_ID 5 28 + 29 + void test_nested_state(struct kvm_vm *vm, struct kvm_nested_state *state) 30 + { 31 + volatile struct kvm_run *run; 32 + 33 + vcpu_nested_state_set(vm, VCPU_ID, state, false); 34 + run = vcpu_state(vm, VCPU_ID); 35 + vcpu_run(vm, VCPU_ID); 36 + TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN, 37 + "Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n", 38 + run->exit_reason, 39 + exit_reason_str(run->exit_reason)); 40 + } 41 + 42 + void test_nested_state_expect_errno(struct kvm_vm *vm, 43 + struct kvm_nested_state *state, 44 + int expected_errno) 45 + { 46 + volatile struct kvm_run *run; 47 + int rv; 48 + 49 + rv = vcpu_nested_state_set(vm, VCPU_ID, state, true); 50 + TEST_ASSERT(rv == -1 && errno == expected_errno, 51 + "Expected %s (%d) from vcpu_nested_state_set but got rv: %i errno: %s (%d)", 52 + strerror(expected_errno), expected_errno, rv, strerror(errno), 53 + errno); 54 + run = vcpu_state(vm, VCPU_ID); 55 + vcpu_run(vm, VCPU_ID); 56 + TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN, 57 + "Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n", 58 + run->exit_reason, 59 + exit_reason_str(run->exit_reason)); 60 + } 61 + 62 + void test_nested_state_expect_einval(struct kvm_vm *vm, 63 + struct kvm_nested_state *state) 64 + { 65 + 
test_nested_state_expect_errno(vm, state, EINVAL); 66 + } 67 + 68 + void test_nested_state_expect_efault(struct kvm_vm *vm, 69 + struct kvm_nested_state *state) 70 + { 71 + test_nested_state_expect_errno(vm, state, EFAULT); 72 + } 73 + 74 + void set_revision_id_for_vmcs12(struct kvm_nested_state *state, 75 + u32 vmcs12_revision) 76 + { 77 + /* Set revision_id in vmcs12 to vmcs12_revision. */ 78 + *(u32 *)(state->data) = vmcs12_revision; 79 + } 80 + 81 + void set_default_state(struct kvm_nested_state *state) 82 + { 83 + memset(state, 0, sizeof(*state)); 84 + state->flags = KVM_STATE_NESTED_RUN_PENDING | 85 + KVM_STATE_NESTED_GUEST_MODE; 86 + state->format = 0; 87 + state->size = sizeof(*state); 88 + } 89 + 90 + void set_default_vmx_state(struct kvm_nested_state *state, int size) 91 + { 92 + memset(state, 0, size); 93 + state->flags = KVM_STATE_NESTED_GUEST_MODE | 94 + KVM_STATE_NESTED_RUN_PENDING | 95 + KVM_STATE_NESTED_EVMCS; 96 + state->format = 0; 97 + state->size = size; 98 + state->vmx.vmxon_pa = 0x1000; 99 + state->vmx.vmcs_pa = 0x2000; 100 + state->vmx.smm.flags = 0; 101 + set_revision_id_for_vmcs12(state, VMCS12_REVISION); 102 + } 103 + 104 + void test_vmx_nested_state(struct kvm_vm *vm) 105 + { 106 + /* Add a page for VMCS12. */ 107 + const int state_sz = sizeof(struct kvm_nested_state) + getpagesize(); 108 + struct kvm_nested_state *state = 109 + (struct kvm_nested_state *)malloc(state_sz); 110 + 111 + /* The format must be set to 0. 0 for VMX, 1 for SVM. */ 112 + set_default_vmx_state(state, state_sz); 113 + state->format = 1; 114 + test_nested_state_expect_einval(vm, state); 115 + 116 + /* 117 + * We cannot virtualize anything if the guest does not have VMX 118 + * enabled. 119 + */ 120 + set_default_vmx_state(state, state_sz); 121 + test_nested_state_expect_einval(vm, state); 122 + 123 + /* 124 + * We cannot virtualize anything if the guest does not have VMX 125 + * enabled. We expect KVM_SET_NESTED_STATE to return 0 if vmxon_pa 126 + * is set to -1ull. 
127 + */ 128 + set_default_vmx_state(state, state_sz); 129 + state->vmx.vmxon_pa = -1ull; 130 + test_nested_state(vm, state); 131 + 132 + /* Enable VMX in the guest CPUID. */ 133 + vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); 134 + 135 + /* It is invalid to have vmxon_pa == -1ull and SMM flags non-zero. */ 136 + set_default_vmx_state(state, state_sz); 137 + state->vmx.vmxon_pa = -1ull; 138 + state->vmx.smm.flags = 1; 139 + test_nested_state_expect_einval(vm, state); 140 + 141 + /* It is invalid to have vmxon_pa == -1ull and vmcs_pa != -1ull. */ 142 + set_default_vmx_state(state, state_sz); 143 + state->vmx.vmxon_pa = -1ull; 144 + state->vmx.vmcs_pa = 0; 145 + test_nested_state_expect_einval(vm, state); 146 + 147 + /* 148 + * Setting vmxon_pa == -1ull and vmcs_pa == -1ull exits early without 149 + * setting the nested state. 150 + */ 151 + set_default_vmx_state(state, state_sz); 152 + state->vmx.vmxon_pa = -1ull; 153 + state->vmx.vmcs_pa = -1ull; 154 + test_nested_state(vm, state); 155 + 156 + /* It is invalid to have vmxon_pa set to a non-page aligned address. */ 157 + set_default_vmx_state(state, state_sz); 158 + state->vmx.vmxon_pa = 1; 159 + test_nested_state_expect_einval(vm, state); 160 + 161 + /* 162 + * It is invalid to have KVM_STATE_NESTED_SMM_GUEST_MODE and 163 + * KVM_STATE_NESTED_GUEST_MODE set together. 
164 + */ 165 + set_default_vmx_state(state, state_sz); 166 + state->flags = KVM_STATE_NESTED_GUEST_MODE | 167 + KVM_STATE_NESTED_RUN_PENDING; 168 + state->vmx.smm.flags = KVM_STATE_NESTED_SMM_GUEST_MODE; 169 + test_nested_state_expect_einval(vm, state); 170 + 171 + /* 172 + * It is invalid to have any of the SMM flags set besides: 173 + * KVM_STATE_NESTED_SMM_GUEST_MODE 174 + * KVM_STATE_NESTED_SMM_VMXON 175 + */ 176 + set_default_vmx_state(state, state_sz); 177 + state->vmx.smm.flags = ~(KVM_STATE_NESTED_SMM_GUEST_MODE | 178 + KVM_STATE_NESTED_SMM_VMXON); 179 + test_nested_state_expect_einval(vm, state); 180 + 181 + /* Outside SMM, SMM flags must be zero. */ 182 + set_default_vmx_state(state, state_sz); 183 + state->flags = 0; 184 + state->vmx.smm.flags = KVM_STATE_NESTED_SMM_GUEST_MODE; 185 + test_nested_state_expect_einval(vm, state); 186 + 187 + /* Size must be large enough to fit kvm_nested_state and vmcs12. */ 188 + set_default_vmx_state(state, state_sz); 189 + state->size = sizeof(*state); 190 + test_nested_state(vm, state); 191 + 192 + /* vmxon_pa cannot be the same address as vmcs_pa. */ 193 + set_default_vmx_state(state, state_sz); 194 + state->vmx.vmxon_pa = 0; 195 + state->vmx.vmcs_pa = 0; 196 + test_nested_state_expect_einval(vm, state); 197 + 198 + /* The revision id for vmcs12 must be VMCS12_REVISION. */ 199 + set_default_vmx_state(state, state_sz); 200 + set_revision_id_for_vmcs12(state, 0); 201 + test_nested_state_expect_einval(vm, state); 202 + 203 + /* 204 + * Test that if we leave nesting the state reflects that when we get 205 + * it again. 206 + */ 207 + set_default_vmx_state(state, state_sz); 208 + state->vmx.vmxon_pa = -1ull; 209 + state->vmx.vmcs_pa = -1ull; 210 + state->flags = 0; 211 + test_nested_state(vm, state); 212 + vcpu_nested_state_get(vm, VCPU_ID, state); 213 + TEST_ASSERT(state->size >= sizeof(*state) && state->size <= state_sz, 214 + "Size must be between %d and %d. 
The size returned was %d.", 215 + sizeof(*state), state_sz, state->size); 216 + TEST_ASSERT(state->vmx.vmxon_pa == -1ull, "vmxon_pa must be -1ull."); 217 + TEST_ASSERT(state->vmx.vmcs_pa == -1ull, "vmcs_pa must be -1ull."); 218 + 219 + free(state); 220 + } 221 + 222 + int main(int argc, char *argv[]) 223 + { 224 + struct kvm_vm *vm; 225 + struct kvm_nested_state state; 226 + struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); 227 + 228 + if (!kvm_check_cap(KVM_CAP_NESTED_STATE)) { 229 + printf("KVM_CAP_NESTED_STATE not available, skipping test\n"); 230 + exit(KSFT_SKIP); 231 + } 232 + 233 + /* 234 + * AMD currently does not implement set_nested_state, so for now we 235 + * just early out. 236 + */ 237 + if (!(entry->ecx & CPUID_VMX)) { 238 + fprintf(stderr, "nested VMX not enabled, skipping test\n"); 239 + exit(KSFT_SKIP); 240 + } 241 + 242 + vm = vm_create_default(VCPU_ID, 0, 0); 243 + 244 + /* Passing a NULL kvm_nested_state causes a EFAULT. */ 245 + test_nested_state_expect_efault(vm, NULL); 246 + 247 + /* 'size' cannot be smaller than sizeof(kvm_nested_state). */ 248 + set_default_state(&state); 249 + state.size = 0; 250 + test_nested_state_expect_einval(vm, &state); 251 + 252 + /* 253 + * Setting the flags 0xf fails the flags check. The only flags that 254 + * can be used are: 255 + * KVM_STATE_NESTED_GUEST_MODE 256 + * KVM_STATE_NESTED_RUN_PENDING 257 + * KVM_STATE_NESTED_EVMCS 258 + */ 259 + set_default_state(&state); 260 + state.flags = 0xf; 261 + test_nested_state_expect_einval(vm, &state); 262 + 263 + /* 264 + * If KVM_STATE_NESTED_RUN_PENDING is set then 265 + * KVM_STATE_NESTED_GUEST_MODE has to be set as well. 266 + */ 267 + set_default_state(&state); 268 + state.flags = KVM_STATE_NESTED_RUN_PENDING; 269 + test_nested_state_expect_einval(vm, &state); 270 + 271 + /* 272 + * TODO: When SVM support is added for KVM_SET_NESTED_STATE 273 + * add tests here to support it like VMX. 
274 + */ 275 + if (entry->ecx & CPUID_VMX) 276 + test_vmx_nested_state(vm); 277 + 278 + kvm_vm_free(vm); 279 + return 0; 280 + }

+3

virt/kvm/Kconfig

··· 57 57 58 58 config HAVE_KVM_VCPU_RUN_PID_CHANGE 59 59 bool 60 + 61 + config HAVE_KVM_NO_POLL 62 + bool

+34 -9

virt/kvm/arm/arm.c

··· 56 56 __asm__(".arch_extension virt"); 57 57 #endif 58 58 59 - DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); 59 + DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data); 60 60 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page); 61 61 62 62 /* Per-CPU variable containing the currently running vcpu. */ ··· 224 224 case KVM_CAP_MAX_VCPUS: 225 225 r = KVM_MAX_VCPUS; 226 226 break; 227 - case KVM_CAP_NR_MEMSLOTS: 228 - r = KVM_USER_MEM_SLOTS; 229 - break; 230 227 case KVM_CAP_MSI_DEVID: 231 228 if (!kvm) 232 229 r = -EINVAL; ··· 357 360 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 358 361 { 359 362 int *last_ran; 363 + kvm_host_data_t *cpu_data; 360 364 361 365 last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran); 366 + cpu_data = this_cpu_ptr(&kvm_host_data); 362 367 363 368 /* 364 369 * We might get preempted before the vCPU actually runs, but ··· 372 373 } 373 374 374 375 vcpu->cpu = cpu; 375 - vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state); 376 + vcpu->arch.host_cpu_context = &cpu_data->host_ctxt; 376 377 377 378 kvm_arm_set_running_vcpu(vcpu); 378 379 kvm_vgic_load(vcpu); 379 380 kvm_timer_vcpu_load(vcpu); 380 381 kvm_vcpu_load_sysregs(vcpu); 381 382 kvm_arch_vcpu_load_fp(vcpu); 383 + kvm_vcpu_pmu_restore_guest(vcpu); 382 384 383 385 if (single_task_running()) 384 386 vcpu_clear_wfe_traps(vcpu); 385 387 else 386 388 vcpu_set_wfe_traps(vcpu); 389 + 390 + vcpu_ptrauth_setup_lazy(vcpu); 387 391 } 388 392 389 393 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) ··· 395 393 kvm_vcpu_put_sysregs(vcpu); 396 394 kvm_timer_vcpu_put(vcpu); 397 395 kvm_vgic_put(vcpu); 396 + kvm_vcpu_pmu_restore_host(vcpu); 398 397 399 398 vcpu->cpu = -1; 400 399 ··· 547 544 548 545 if (likely(vcpu->arch.has_run_once)) 549 546 return 0; 547 + 548 + if (!kvm_arm_vcpu_is_finalized(vcpu)) 549 + return -EPERM; 550 550 551 551 vcpu->arch.has_run_once = true; 552 552 ··· 1127 1121 if (unlikely(!kvm_vcpu_initialized(vcpu))) 1128 1122 break; 1129 1123 1124 + r 
= -EPERM; 1125 + if (!kvm_arm_vcpu_is_finalized(vcpu)) 1126 + break; 1127 + 1130 1128 r = -EFAULT; 1131 1129 if (copy_from_user(&reg_list, user_list, sizeof(reg_list))) 1132 1130 break; ··· 1183 1173 return -EFAULT; 1184 1174 1185 1175 return kvm_arm_vcpu_set_events(vcpu, &events); 1176 + } 1177 + case KVM_ARM_VCPU_FINALIZE: { 1178 + int what; 1179 + 1180 + if (!kvm_vcpu_initialized(vcpu)) 1181 + return -ENOEXEC; 1182 + 1183 + if (get_user(what, (const int __user *)argp)) 1184 + return -EFAULT; 1185 + 1186 + return kvm_arm_vcpu_finalize(vcpu, what); 1186 1187 } 1187 1188 default: 1188 1189 r = -EINVAL; ··· 1575 1554 } 1576 1555 1577 1556 for_each_possible_cpu(cpu) { 1578 - kvm_cpu_context_t *cpu_ctxt; 1557 + kvm_host_data_t *cpu_data; 1579 1558 1580 - cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); 1581 - kvm_init_host_cpu_context(cpu_ctxt, cpu); 1582 - err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP); 1559 + cpu_data = per_cpu_ptr(&kvm_host_data, cpu); 1560 + kvm_init_host_cpu_context(&cpu_data->host_ctxt, cpu); 1561 + err = create_hyp_mappings(cpu_data, cpu_data + 1, PAGE_HYP); 1583 1562 1584 1563 if (err) { 1585 1564 kvm_err("Cannot map host CPU state: %d\n", err); ··· 1687 1666 } 1688 1667 1689 1668 err = init_common_resources(); 1669 + if (err) 1670 + return err; 1671 + 1672 + err = kvm_arm_init_sve(); 1690 1673 if (err) 1691 1674 return err; 1692 1675

+94 -9

virt/kvm/kvm_main.c

···
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/bsearch.h>
+#include <linux/io.h>
 
 #include <asm/processor.h>
-#include <asm/io.h>
 #include <asm/ioctl.h>
 #include <linux/uaccess.h>
 #include <asm/pgtable.h>
···
 
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 /**
- * kvm_get_dirty_log_protect - get a snapshot of dirty pages, and if any pages
+ * kvm_get_dirty_log_protect - get a snapshot of dirty pages
  *	and reenable dirty page tracking for the corresponding pages.
  * @kvm:	pointer to kvm instance
  * @log:	slot id and address to which we copy the log
- * @is_dirty:	flag set if any page is dirty
+ * @flush:	true if TLB flush is needed by caller
  *
  * We need to keep it in mind that VCPU threads can write to the bitmap
  * concurrently. So, to avoid losing track of dirty pages we keep the
···
  *	and reenable dirty page tracking for the corresponding pages.
  * @kvm:	pointer to kvm instance
  * @log:	slot id and address from which to fetch the bitmap of dirty pages
+ * @flush:	true if TLB flush is needed by caller
  */
 int kvm_clear_dirty_log_protect(struct kvm *kvm,
 				struct kvm_clear_dirty_log *log, bool *flush)
···
 	if (!dirty_bitmap)
 		return -ENOENT;
 
-	n = kvm_dirty_bitmap_bytes(memslot);
+	n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
 
 	if (log->first_page > memslot->npages ||
 	    log->num_pages > memslot->npages - log->first_page ||
···
 		return -EFAULT;
 
 	spin_lock(&kvm->mmu_lock);
-	for (offset = log->first_page,
-	     i = offset / BITS_PER_LONG, n = log->num_pages / BITS_PER_LONG; n--;
+	for (offset = log->first_page, i = offset / BITS_PER_LONG,
+	     n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
 	     i++, offset += BITS_PER_LONG) {
 		unsigned long mask = *dirty_bitmap_buffer++;
 		atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];
···
 	return kvm_pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
+
+static int __kvm_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
+			 struct kvm_host_map *map)
+{
+	kvm_pfn_t pfn;
+	void *hva = NULL;
+	struct page *page = KVM_UNMAPPED_PAGE;
+
+	if (!map)
+		return -EINVAL;
+
+	pfn = gfn_to_pfn_memslot(slot, gfn);
+	if (is_error_noslot_pfn(pfn))
+		return -EINVAL;
+
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+		hva = kmap(page);
+	} else {
+		hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
+	}
+
+	if (!hva)
+		return -EFAULT;
+
+	map->page = page;
+	map->hva = hva;
+	map->pfn = pfn;
+	map->gfn = gfn;
+
+	return 0;
+}
+
+int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
+{
+	return __kvm_map_gfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, map);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_map);
+
+void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
+		    bool dirty)
+{
+	if (!map)
+		return;
+
+	if (!map->hva)
+		return;
+
+	if (map->page)
+		kunmap(map->page);
+	else
+		memunmap(map->hva);
+
+	if (dirty) {
+		kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
+		kvm_release_pfn_dirty(map->pfn);
+	} else {
+		kvm_release_pfn_clean(map->pfn);
+	}
+
+	map->hva = NULL;
+	map->page = NULL;
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
···
 	u64 block_ns;
 
 	start = cur = ktime_get();
-	if (vcpu->halt_poll_ns) {
+	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
 		++vcpu->stat.halt_attempted_poll;
···
 }
 #endif
 
+static int kvm_device_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct kvm_device *dev = filp->private_data;
+
+	if (dev->ops->mmap)
+		return dev->ops->mmap(dev, vma);
+
+	return -ENODEV;
+}
+
 static int kvm_device_ioctl_attr(struct kvm_device *dev,
 				 int (*accessor)(struct kvm_device *dev,
 						 struct kvm_device_attr *attr),
···
 	struct kvm_device *dev = filp->private_data;
 	struct kvm *kvm = dev->kvm;
 
+	if (dev->ops->release) {
+		mutex_lock(&kvm->lock);
+		list_del(&dev->vm_node);
+		dev->ops->release(dev);
+		mutex_unlock(&kvm->lock);
+	}
+
 	kvm_put_kvm(kvm);
 	return 0;
 }
···
 	.unlocked_ioctl = kvm_device_ioctl,
 	.release = kvm_device_release,
 	KVM_COMPAT(kvm_device_ioctl),
+	.mmap = kvm_device_mmap,
 };
 
 struct kvm_device *kvm_device_from_filp(struct file *filp)
···
 	case KVM_CAP_CHECK_EXTENSION_VM:
 	case KVM_CAP_ENABLE_CAP_VM:
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
+	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
 #endif
 		return 1;
 #ifdef CONFIG_KVM_MMIO
···
 #endif
 	case KVM_CAP_MAX_VCPU_ID:
 		return KVM_MAX_VCPU_ID;
+	case KVM_CAP_NR_MEMSLOTS:
+		return KVM_USER_MEM_SLOTS;
 	default:
 		break;
 	}
···
 {
 	switch (cap->cap) {
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
+	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
 		if (cap->flags || (cap->args[0] & ~1))
 			return -EINVAL;
 		kvm->manual_dirty_log_protect = cap->args[0];
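
Note on the kvm_clear_dirty_log_protect() hunks in this file: both the byte count and the loop bound are derived from log->num_pages, and after the change both round up to a whole long. The sketch below is a userspace illustration of that arithmetic only, not kernel code. It assumes 64-bit longs; BITS_PER_LONG, ALIGN_UP and DIV_ROUND_UP are local stand-ins for the kernel macros (ALIGN_UP stands in for the kernel's ALIGN), and the three helper function names are hypothetical.

```c
#include <stddef.h>

/* Local stand-ins for kernel macros; 64-bit longs are an assumption. */
#define BITS_PER_LONG 64UL
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Loop bound as in the removed lines: truncating division, so a
 * partial trailing word of the bitmap is never visited. */
unsigned long words_truncating(unsigned long num_pages)
{
	return num_pages / BITS_PER_LONG;
}

/* Loop bound as in the added lines: a partial trailing word counts. */
unsigned long words_rounded_up(unsigned long num_pages)
{
	return DIV_ROUND_UP(num_pages, BITS_PER_LONG);
}

/* Byte count as in the added line: num_pages bits, padded to a whole
 * number of longs, expressed in bytes. */
unsigned long bitmap_bytes(unsigned long num_pages)
{
	return ALIGN_UP(num_pages, BITS_PER_LONG) / 8;
}
```

For num_pages = 100, the truncating bound covers one word (64 pages) while the rounded-up bound covers two, and the byte count is 16, so the buffer size and the loop agree on two longs.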
