Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

+2

Documentation/arch/arm64/silicon-errata.rst

+61 -5

Documentation/virt/kvm/api.rst

···
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).

+For TDX guest, deleting/moving memory region loses guest memory contents.
+Read only region isn't supported. Only as-id 0 is supported.
+
 Note: On arm64, a write generated by the page-table walker (to update
 the Access and Dirty flags, for example) never results in a
 KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
···
   - FPSIMD/NEON registers: set to 0
   - SVE registers: set to 0
   - System registers: Reset to their architecturally defined
-    values as for a warm reset to EL1 (resp. SVC)
+    values as for a warm reset to EL1 (resp. SVC) or EL2 (in the
+    case of EL2 being enabled).

 Note that because some registers reflect machine topology, all vcpus
 should be created before this ioctl is invoked.
···

       - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
         no longer be written using KVM_SET_ONE_REG.
+
+      - KVM_ARM_VCPU_HAS_EL2: Enable Nested Virtualisation support,
+        booting the guest from EL2 instead of EL1.
+        Depends on KVM_CAP_ARM_EL2.
+        The VM is running with HCR_EL2.E2H being RES1 (VHE) unless
+        KVM_ARM_VCPU_HAS_EL2_E2H0 is also set.
+
+      - KVM_ARM_VCPU_HAS_EL2_E2H0: Restrict Nested Virtualisation
+        support to HCR_EL2.E2H being RES0 (non-VHE).
+        Depends on KVM_CAP_ARM_EL2_E2H0.
+        KVM_ARM_VCPU_HAS_EL2 must also be set.

 4.83 KVM_ARM_PREFERRED_TARGET
 -----------------------------
···

 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error

···
 for issuing platform-specific memory encryption commands to manage those
 encrypted VMs.

-Currently, this ioctl is used for issuing Secure Encrypted Virtualization
-(SEV) commands on AMD Processors. The SEV commands are defined in
-Documentation/virt/kvm/x86/amd-memory-encryption.rst.
+Currently, this ioctl is used for issuing both Secure Encrypted Virtualization
+(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
+on Intel Processors. The detailed commands are defined in
+Documentation/virt/kvm/x86/amd-memory-encryption.rst and
+Documentation/virt/kvm/x86/intel-tdx.rst.

 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
···
   #define KVM_SYSTEM_EVENT_WAKEUP         4
   #define KVM_SYSTEM_EVENT_SUSPEND        5
   #define KVM_SYSTEM_EVENT_SEV_TERM       6
+  #define KVM_SYSTEM_EVENT_TDX_FATAL      7
                         __u32 type;
                         __u32 ndata;
                         __u64 data[16];
···
    reset/shutdown of the VM.
  - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
    The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+   KVM doesn't do any parsing or conversion, it just dumps 16 general-purpose
+   registers to userspace, in ascending order of the 4-bit indices for x86-64
+   general-purpose registers in instruction encoding, as defined in the Intel
+   SDM.
  - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
    KVM has recognized a wakeup event. Userspace may honor this event by
    marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
···
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_IGNORE_GUEST_PAT      By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT. The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB). It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================

 7.32 KVM_CAP_MAX_VCPU_ID
···
 aforementioned registers before the first KVM_RUN. These registers are VM
 scoped, meaning that the same set of values are presented on all vCPUs in a
 given VM.
+
+7.43 KVM_CAP_RISCV_MP_STATE_RESET
+---------------------------------
+
+:Architectures: riscv
+:Type: VM
+:Parameters: None
+:Returns: 0 on success, -EINVAL if arg[0] is not zero
+
+When this capability is enabled, KVM resets the VCPU when setting
+MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved.

 8. Other capabilities.
 ======================
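The KVM_SYSTEM_EVENT_TDX_FATAL text added in this file says the 16 general-purpose registers are dumped in ascending order of their 4-bit instruction-encoding indices. As a rough illustration of what that ordering means for a VMM, the sketch below labels each slot with its register name. It assumes the registers land in `data[0]` through `data[15]`; the table and helper names are hypothetical and not part of the KVM uapi.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * x86-64 general-purpose registers in ascending order of their 4-bit
 * encoding index (0 = RAX, 1 = RCX, 2 = RDX, 3 = RBX, ...).
 */
static const char *const tdx_fatal_gpr_names[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};

/* Hypothetical helper: name of the register assumed to sit in data[idx]. */
static const char *tdx_fatal_gpr_name(unsigned int idx)
{
    return idx < 16 ? tdx_fatal_gpr_names[idx] : "?";
}

/* Hypothetical helper: print the dump a VMM received in data[0..15]. */
static void tdx_fatal_dump(const uint64_t data[16])
{
    for (unsigned int i = 0; i < 16; i++)
        printf("%-3s = 0x%016llx\n", tdx_fatal_gpr_name(i),
               (unsigned long long)data[i]);
}
```

Note that the order is not alphabetical: RCX and RDX come before RBX, which is easy to get wrong when decoding the dump by hand.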

+24

Documentation/virt/kvm/devices/vcpu.rst

···
 hardare_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and
 the cpu field to the processor id.

+1.5 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
+--------------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an unsigned int
+             representing the maximum value taken by PMCR_EL0.N
+
+:Returns:
+
+         =======  ====================================================
+         -EBUSY   PMUv3 already initialized, a VCPU has already run or
+                  an event filter has already been set
+         -EFAULT  Error accessing the value pointed to by addr
+         -ENODEV  PMUv3 not supported or GIC not initialized
+         -EINVAL  No PMUv3 explicitly selected, or value of N out of
+                  range
+         =======  ====================================================
+
+Set the number of implemented event counters in the virtual PMU. This
+mandates that a PMU has explicitly been selected via
+KVM_ARM_VCPU_PMU_V3_SET_PMU, and will fail when no PMU has been
+explicitly selected, or the number of counters is out of range for the
+selected PMU. Selecting a new PMU cancels the effect of setting this
+attribute.
+
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 =================================

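The return-code table for KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS can be turned into a small diagnostic helper on the userspace side. The sketch below is only an illustration: the function name is hypothetical, and it assumes the caller passes the negated `errno` left behind by a failed attribute-setting ioctl.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a negative errno value to the reason
 * documented for KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS.
 */
static const char *pmu_set_nr_counters_reason(int err)
{
    switch (err) {
    case 0:
        return "success";
    case -EBUSY:
        return "PMUv3 already initialized, a VCPU has already run, "
               "or an event filter has already been set";
    case -EFAULT:
        return "error accessing the value pointed to by addr";
    case -ENODEV:
        return "PMUv3 not supported or GIC not initialized";
    case -EINVAL:
        return "no PMUv3 explicitly selected, or value of N out of range";
    default:
        return "unexpected error";
    }
}
```

Since -EINVAL covers the case where no PMU was selected, a VMM would typically set KVM_ARM_VCPU_PMU_V3_SET_PMU first and only then set the counter count, as the documentation text requires.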

+1

Documentation/virt/kvm/x86/index.rst

···
    cpuid
    errata
    hypercalls
+   intel-tdx
    mmu
    msr
    nested-vmx

+255

Documentation/virt/kvm/x86/intel-tdx.rst

···
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the
+host and physical attacks. A CPU-attested software module called 'the TDX
+module' runs inside a new CPU isolated range to provide the functionalities to
+manage and run protected VMs, a.k.a, TDX guests or TDs.
+
+Please refer to [1] for the whitepaper, specifications and other resources.
+
+This documentation describes TDX-specific KVM ABIs. The TDX module needs to be
+initialized before it can be used by KVM to run any TDX guests. The host
+core-kernel provides the support of initializing the TDX module, which is
+described in the Documentation/arch/x86/tdx.rst.
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
+ioctl with TDX specific sub-ioctl() commands.
+
+::
+
+  /* Trust Domain Extensions sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES = 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+          KVM_TDX_GET_CPUID,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+        /* enum kvm_tdx_cmd_id */
+        __u32 id;
+        /* flags for sub-command. If sub-command doesn't use this, set zero. */
+        __u32 flags;
+        /*
+         * data for each sub-command. An immediate or a pointer to the actual
+         * data in process virtual address. If sub-command doesn't use it,
+         * set zero.
+         */
+        __u64 data;
+        /*
+         * Auxiliary error code. The sub-command may return TDX SEAMCALL
+         * status code in addition to -Exxx.
+         */
+        __u64 hw_error;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Return the TDX capabilities that current KVM supports with the specific TDX
+module loaded in the system. It reports what features/capabilities are allowed
+to be configured to the TDX guest.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_capabilities {
+        __u64 supported_attrs;
+        __u64 supported_xfam;
+        __u64 reserved[254];
+
+        /* Configurable CPUID bits for userspace */
+        struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VM initialization. This needs to be called after
+KVM_CREATE_VM and before creating any VCPUs.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- hw_error: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u64 attributes;
+          __u64 xfam;
+          __u64 mrconfigid[6];          /* sha384 digest */
+          __u64 mrowner[6];             /* sha384 digest */
+          __u64 mrownerconfig[6];       /* sha384 digest */
+
+          /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+          __u64 reserved[12];
+
+        /*
+         * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+         * KVM_SET_CPUID2.
+         * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+         * TDX module directly virtualizes those CPUIDs without VMM. The user
+         * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+         * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of
+         * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+         * module doesn't virtualize.
+         */
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Perform TDX specific VCPU initialization.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- hw_error: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
+provided data from @source_addr.
+
+Note, before calling this sub command, memory attribute of the range
+[gpa, gpa + nr_pages] needs to be private. Userspace can use
+KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
+
+If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- hw_error: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION   (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Complete measurement of the initial TD contents and mark it ready to run.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
+
+KVM_TDX_GET_CPUID
+-----------------
+:Type: vcpu ioctl
+:Returns: 0 on success, <0 on error
+
+Get the CPUID values that the TDX module virtualizes for the TD guest.
+When it returns -E2BIG, the user space should allocate a larger buffer and
+retry. The minimum buffer size is updated in the nent field of the
+struct kvm_cpuid2.
+
+- id: KVM_TDX_GET_CPUID
+- flags: must be 0
+- data: pointer to struct kvm_cpuid2 (in/out)
+- hw_error: must be 0 (out)
+
+::
+
+  struct kvm_cpuid2 {
+          __u32 nent;
+          __u32 padding;
+          struct kvm_cpuid_entry2 entries[0];
+  };
+
+  struct kvm_cpuid_entry2 {
+          __u32 function;
+          __u32 index;
+          __u32 flags;
+          __u32 eax;
+          __u32 ebx;
+          __u32 ecx;
+          __u32 edx;
+          __u32 padding[3];
+  };
+
+KVM TDX creation flow
+=====================
+In addition to the standard KVM flow, new TDX ioctls need to be called. The
+control flow is as follows:
+
+#. Check system wide capability
+
+   * KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM
+     is supported.
+
+#. Create VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests.
+   * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can
+     support at VM level (TDX has its own limitation on this).
+   * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency
+     than host is desired. This is Optional.
+   * KVM_TDX_INIT_VM: Pass TDX specific VM parameters.
+
+#. Create VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters.
+   * KVM_SET_CPUID2: Configure TD's CPUIDs.
+   * KVM_SET_MSRS: Configure TD's MSRs.
+
+#. Initialize initial guest memory
+
+   * Prepare content of initial guest memory.
+   * KVM_TDX_INIT_MEM_REGION: Add initial guest memory.
+   * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
+
+#. Run VCPU
+
+References
+==========
+
+https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html
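To make the sub-command convention in intel-tdx.rst more concrete, here is a minimal sketch of how userspace might fill a `struct kvm_tdx_cmd` for KVM_TDX_INIT_MEM_REGION. The struct layouts are local copies of what the document quotes (real code would take them from the kernel's uapi headers), and the helper name `tdx_make_init_mem_region_cmd` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the layouts quoted in intel-tdx.rst. */
enum kvm_tdx_cmd_id {
    KVM_TDX_CAPABILITIES = 0,
    KVM_TDX_INIT_VM,
    KVM_TDX_INIT_VCPU,
    KVM_TDX_INIT_MEM_REGION,
    KVM_TDX_FINALIZE_VM,
    KVM_TDX_GET_CPUID,
    KVM_TDX_CMD_NR_MAX,
};

struct kvm_tdx_cmd {
    uint32_t id;        /* enum kvm_tdx_cmd_id */
    uint32_t flags;     /* zero unless the sub-command defines flags */
    uint64_t data;      /* immediate or userspace pointer */
    uint64_t hw_error;  /* must be 0 on input */
};

#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)

struct kvm_tdx_init_mem_region {
    uint64_t source_addr;
    uint64_t gpa;
    uint64_t nr_pages;
};

/* Hypothetical helper: build the command block for one memory region. */
static struct kvm_tdx_cmd
tdx_make_init_mem_region_cmd(const struct kvm_tdx_init_mem_region *region,
                             int measure)
{
    struct kvm_tdx_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));   /* hw_error and unused fields stay 0 */
    cmd.id = KVM_TDX_INIT_MEM_REGION;
    cmd.flags = measure ? KVM_TDX_MEASURE_MEMORY_REGION : 0;
    cmd.data = (uint64_t)(uintptr_t)region;
    return cmd;
}
```

Per the document, KVM_TDX_INIT_MEM_REGION is a vcpu ioctl, so the resulting block would be passed to KVM_MEMORY_ENCRYPT_OP on a vCPU file descriptor, after the target range has been marked private with KVM_SET_MEMORY_ATTRIBUTES.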

+2

MAINTAINERS

···
 F: arch/loongarch/include/asm/kvm*
 F: arch/loongarch/include/uapi/asm/kvm*
 F: arch/loongarch/kvm/
+F: tools/testing/selftests/kvm/*/loongarch/
+F: tools/testing/selftests/kvm/lib/loongarch/

 KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)
 M: Huacai Chen <chenhuacai@kernel.org>

+17

arch/arm64/Kconfig

···

           If unsure, say Y.

+config AMPERE_ERRATUM_AC04_CPU_23
+        bool "AmpereOne: AC04_CPU_23: Failure to synchronize writes to HCR_EL2 may corrupt address translations."
+        default y
+        help
+          This option adds an alternative code sequence to work around Ampere
+          errata AC04_CPU_23 on AmpereOne.
+
+          Updates to HCR_EL2 can rarely corrupt simultaneous translations for
+          data addresses initiated by load/store instructions. Only
+          instruction initiated translations are vulnerable, not translations
+          from prefetches for example. A DSB before the store to HCR_EL2 is
+          sufficient to prevent older instructions from hitting the window
+          for corruption, and an ISB after is sufficient to prevent younger
+          instructions from hitting the window for corruption.
+
+          If unsure, say Y.
+
 config ARM64_WORKAROUND_CLEAN_CACHE
         bool


+8 -8

arch/arm64/include/asm/el2_setup.h

···

         orr     x0, x0, #HCR_E2H
 .LnVHE_\@:
-        msr     hcr_el2, x0
+        msr_hcr_el2 x0
         isb
 .endm

···
         cbz     x1, .Lskip_sme_fgt_\@

         /* Disable nVHE traps of TPIDR2 and SMPRI */
-        orr     x0, x0, #HFGxTR_EL2_nSMPRI_EL1_MASK
-        orr     x0, x0, #HFGxTR_EL2_nTPIDR2_EL0_MASK
+        orr     x0, x0, #HFGRTR_EL2_nSMPRI_EL1_MASK
+        orr     x0, x0, #HFGRTR_EL2_nTPIDR2_EL0_MASK

 .Lskip_sme_fgt_\@:
         mrs_s   x1, SYS_ID_AA64MMFR3_EL1
···
         cbz     x1, .Lskip_pie_fgt_\@

         /* Disable trapping of PIR_EL1 / PIRE0_EL1 */
-        orr     x0, x0, #HFGxTR_EL2_nPIR_EL1
-        orr     x0, x0, #HFGxTR_EL2_nPIRE0_EL1
+        orr     x0, x0, #HFGRTR_EL2_nPIR_EL1
+        orr     x0, x0, #HFGRTR_EL2_nPIRE0_EL1

 .Lskip_pie_fgt_\@:
         mrs_s   x1, SYS_ID_AA64MMFR3_EL1
···
         cbz     x1, .Lskip_poe_fgt_\@

         /* Disable trapping of POR_EL0 */
-        orr     x0, x0, #HFGxTR_EL2_nPOR_EL0
+        orr     x0, x0, #HFGRTR_EL2_nPOR_EL0

 .Lskip_poe_fgt_\@:
         /* GCS depends on PIE so we don't check it if PIE is absent */
···
         cbz     x1, .Lskip_gce_fgt_\@

         /* Disable traps of access to GCS registers at EL0 and EL1 */
-        orr     x0, x0, #HFGxTR_EL2_nGCS_EL1_MASK
-        orr     x0, x0, #HFGxTR_EL2_nGCS_EL0_MASK
+        orr     x0, x0, #HFGRTR_EL2_nGCS_EL1_MASK
+        orr     x0, x0, #HFGRTR_EL2_nGCS_EL0_MASK

 .Lskip_gce_fgt_\@:


+16 -1

arch/arm64/include/asm/esr.h

···
 #define ESR_ELx_EC_FP_ASIMD     UL(0x07)
 #define ESR_ELx_EC_CP10_ID      UL(0x08)        /* EL2 only */
 #define ESR_ELx_EC_PAC          UL(0x09)        /* EL2 and above */
-/* Unallocated EC: 0x0A - 0x0B */
+#define ESR_ELx_EC_OTHER        UL(0x0A)
+/* Unallocated EC: 0x0B */
 #define ESR_ELx_EC_CP14_64      UL(0x0C)
 #define ESR_ELx_EC_BTI          UL(0x0D)
 #define ESR_ELx_EC_ILL          UL(0x0E)
···
 #define ESR_ELx_AET_CE          (UL(6) << ESR_ELx_AET_SHIFT)

 /* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_VNCR_SHIFT      (13)
+#define ESR_ELx_VNCR            (UL(1) << ESR_ELx_VNCR_SHIFT)
 #define ESR_ELx_SET_SHIFT       (11)
 #define ESR_ELx_SET_MASK        (UL(3) << ESR_ELx_SET_SHIFT)
 #define ESR_ELx_FnV_SHIFT       (10)
···
 #define ESR_ELx_WFx_ISS_WFI     (UL(0) << 0)
 #define ESR_ELx_WFx_ISS_WFE     (UL(1) << 0)
 #define ESR_ELx_xVC_IMM_MASK    ((UL(1) << 16) - 1)
+
+/* ISS definitions for LD64B/ST64B/{T,P}SBCSYNC instructions */
+#define ESR_ELx_ISS_OTHER_ST64BV        (0)
+#define ESR_ELx_ISS_OTHER_ST64BV0       (1)
+#define ESR_ELx_ISS_OTHER_LDST64B       (2)
+#define ESR_ELx_ISS_OTHER_TSBCSYNC      (3)
+#define ESR_ELx_ISS_OTHER_PSBCSYNC      (4)

 #define DISR_EL1_IDS            (UL(1) << 24)
 /*
···
 {
         return ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 &&
                (esr_brk_comment(esr) & ~CFI_BRK_IMM_MASK) == CFI_BRK_IMM_BASE;
+}
+
+static inline bool esr_is_ubsan_brk(unsigned long esr)
+{
+        return (esr_brk_comment(esr) & ~UBSAN_BRK_MASK) == UBSAN_BRK_IMM;
 }

 static inline bool esr_fsc_is_translation_fault(unsigned long esr)
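The new esr.h definitions can be exercised outside the kernel with a small standalone decoder. The sketch below copies the macros from the hunks above and adds the EC and ISS field extraction, which is not shown in this diff; the EC field at bits [31:26] and the ISS field at bits [24:0] are stated here from the architecture. The helper names `esr_has_vncr` and `esr_other_iss_name` are hypothetical, and comparing the whole ISS field against the new constants is an assumption about how they are meant to be used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UL(x)                       x##UL

/* Field extraction, restated for a self-contained example. */
#define ESR_ELx_EC_SHIFT            26
#define ESR_ELx_EC_MASK             (UL(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC(esr)             (((esr) & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT)
#define ESR_ELx_ISS_MASK            UL(0x01FFFFFF)

/* Definitions added by this diff. */
#define ESR_ELx_EC_OTHER            UL(0x0A)
#define ESR_ELx_VNCR_SHIFT          (13)
#define ESR_ELx_VNCR                (UL(1) << ESR_ELx_VNCR_SHIFT)
#define ESR_ELx_ISS_OTHER_ST64BV    (0)
#define ESR_ELx_ISS_OTHER_ST64BV0   (1)
#define ESR_ELx_ISS_OTHER_LDST64B   (2)
#define ESR_ELx_ISS_OTHER_TSBCSYNC  (3)
#define ESR_ELx_ISS_OTHER_PSBCSYNC  (4)

/* Hypothetical helper: is the VNCR bit set in an abort syndrome? */
static bool esr_has_vncr(uint64_t esr)
{
    return (esr & ESR_ELx_VNCR) != 0;
}

/* Hypothetical helper: name the trapped instruction, or NULL if EC != 0x0A. */
static const char *esr_other_iss_name(uint64_t esr)
{
    if (ESR_ELx_EC(esr) != ESR_ELx_EC_OTHER)
        return NULL;

    switch (esr & ESR_ELx_ISS_MASK) {
    case ESR_ELx_ISS_OTHER_ST64BV:   return "ST64BV";
    case ESR_ELx_ISS_OTHER_ST64BV0:  return "ST64BV0";
    case ESR_ELx_ISS_OTHER_LDST64B:  return "LD64B/ST64B";
    case ESR_ELx_ISS_OTHER_TSBCSYNC: return "TSB CSYNC";
    case ESR_ELx_ISS_OTHER_PSBCSYNC: return "PSB CSYNC";
    default:                         return "unknown";
    }
}
```

The VNCR bit is only meaningful for data and instruction aborts, so a real handler would check the exception class before testing it.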

+6

arch/arm64/include/asm/fixmap.h

···
         FIX_EARLYCON_MEM_BASE,
         FIX_TEXT_POKE0,

+#ifdef CONFIG_KVM
+        /* One slot per CPU, mapping the guest's VNCR page at EL2. */
+        FIX_VNCR_END,
+        FIX_VNCR = FIX_VNCR_END + NR_CPUS,
+#endif
+
 #ifdef CONFIG_ACPI_APEI_GHES
         /* Used for GHES mapping from assorted contexts */
         FIX_APEI_GHES_IRQ,

+2 -2

arch/arm64/include/asm/hardirq.h

···
                                                                         \
         ___hcr = read_sysreg(hcr_el2);                                  \
         if (!(___hcr & HCR_TGE)) {                                      \
-                write_sysreg(___hcr | HCR_TGE, hcr_el2);                \
+                write_sysreg_hcr(___hcr | HCR_TGE);                     \
                 isb();                                                  \
         }                                                               \
         /*                                                              \
···
          */                                                             \
         barrier();                                                      \
         if (!___ctx->cnt && !(___hcr & HCR_TGE))                        \
-                write_sysreg(___hcr, hcr_el2);                          \
+                write_sysreg_hcr(___hcr);                               \
 } while (0)

 static inline void ack_bad_irq(unsigned int irq)

+75 -109

arch/arm64/include/asm/kvm_arm.h

···
 #include <asm/sysreg.h>
 #include <asm/types.h>

-/* Hyp Configuration Register (HCR) bits */
+/*
+ * Because I'm terribly lazy and that repainting the whole of the KVM
+ * code with the proper names is a pain, use a helper to map the names
+ * inherited from AArch32 with the new fancy nomenclature. One day...
+ */
+#define __HCR(x)        HCR_EL2_##x

-#define HCR_TID5        (UL(1) << 58)
-#define HCR_DCT         (UL(1) << 57)
-#define HCR_ATA_SHIFT   56
-#define HCR_ATA         (UL(1) << HCR_ATA_SHIFT)
-#define HCR_TTLBOS      (UL(1) << 55)
-#define HCR_TTLBIS      (UL(1) << 54)
-#define HCR_ENSCXT      (UL(1) << 53)
-#define HCR_TOCU        (UL(1) << 52)
-#define HCR_AMVOFFEN    (UL(1) << 51)
-#define HCR_TICAB       (UL(1) << 50)
-#define HCR_TID4        (UL(1) << 49)
-#define HCR_FIEN        (UL(1) << 47)
-#define HCR_FWB         (UL(1) << 46)
-#define HCR_NV2         (UL(1) << 45)
-#define HCR_AT          (UL(1) << 44)
-#define HCR_NV1         (UL(1) << 43)
-#define HCR_NV          (UL(1) << 42)
-#define HCR_API         (UL(1) << 41)
-#define HCR_APK         (UL(1) << 40)
-#define HCR_TEA         (UL(1) << 37)
-#define HCR_TERR        (UL(1) << 36)
-#define HCR_TLOR        (UL(1) << 35)
-#define HCR_E2H         (UL(1) << 34)
-#define HCR_ID          (UL(1) << 33)
-#define HCR_CD          (UL(1) << 32)
-#define HCR_RW_SHIFT    31
-#define HCR_RW          (UL(1) << HCR_RW_SHIFT)
-#define HCR_TRVM        (UL(1) << 30)
-#define HCR_HCD         (UL(1) << 29)
-#define HCR_TDZ         (UL(1) << 28)
-#define HCR_TGE         (UL(1) << 27)
-#define HCR_TVM         (UL(1) << 26)
-#define HCR_TTLB        (UL(1) << 25)
-#define HCR_TPU         (UL(1) << 24)
-#define HCR_TPC         (UL(1) << 23) /* HCR_TPCP if FEAT_DPB */
-#define HCR_TSW         (UL(1) << 22)
-#define HCR_TACR        (UL(1) << 21)
-#define HCR_TIDCP       (UL(1) << 20)
-#define HCR_TSC         (UL(1) << 19)
-#define HCR_TID3        (UL(1) << 18)
-#define HCR_TID2        (UL(1) << 17)
-#define HCR_TID1        (UL(1) << 16)
-#define HCR_TID0        (UL(1) << 15)
-#define HCR_TWE         (UL(1) << 14)
-#define HCR_TWI         (UL(1) << 13)
-#define HCR_DC          (UL(1) << 12)
-#define HCR_BSU         (3 << 10)
-#define HCR_BSU_IS      (UL(1) << 10)
-#define HCR_FB          (UL(1) << 9)
-#define HCR_VSE         (UL(1) << 8)
-#define HCR_VI          (UL(1) << 7)
-#define HCR_VF          (UL(1) << 6)
-#define HCR_AMO         (UL(1) << 5)
-#define HCR_IMO         (UL(1) << 4)
-#define HCR_FMO         (UL(1) << 3)
-#define HCR_PTW         (UL(1) << 2)
-#define HCR_SWIO        (UL(1) << 1)
-#define HCR_VM          (UL(1) << 0)
-#define HCR_RES0        ((UL(1) << 48) | (UL(1) << 39))
+#define HCR_TID5        __HCR(TID5)
+#define HCR_DCT         __HCR(DCT)
+#define HCR_ATA_SHIFT   __HCR(ATA_SHIFT)
+#define HCR_ATA         __HCR(ATA)
+#define HCR_TTLBOS      __HCR(TTLBOS)
+#define HCR_TTLBIS      __HCR(TTLBIS)
+#define HCR_ENSCXT      __HCR(EnSCXT)
+#define HCR_TOCU        __HCR(TOCU)
+#define HCR_AMVOFFEN    __HCR(AMVOFFEN)
+#define HCR_TICAB       __HCR(TICAB)
+#define HCR_TID4        __HCR(TID4)
+#define HCR_FIEN        __HCR(FIEN)
+#define HCR_FWB         __HCR(FWB)
+#define HCR_NV2         __HCR(NV2)
+#define HCR_AT          __HCR(AT)
+#define HCR_NV1         __HCR(NV1)
+#define HCR_NV          __HCR(NV)
+#define HCR_API         __HCR(API)
+#define HCR_APK         __HCR(APK)
+#define HCR_TEA         __HCR(TEA)
+#define HCR_TERR        __HCR(TERR)
+#define HCR_TLOR        __HCR(TLOR)
+#define HCR_E2H         __HCR(E2H)
+#define HCR_ID          __HCR(ID)
+#define HCR_CD          __HCR(CD)
+#define HCR_RW          __HCR(RW)
+#define HCR_TRVM        __HCR(TRVM)
+#define HCR_HCD         __HCR(HCD)
+#define HCR_TDZ         __HCR(TDZ)
+#define HCR_TGE         __HCR(TGE)
+#define HCR_TVM         __HCR(TVM)
+#define HCR_TTLB        __HCR(TTLB)
+#define HCR_TPU         __HCR(TPU)
+#define HCR_TPC         __HCR(TPCP)
+#define HCR_TSW         __HCR(TSW)
+#define HCR_TACR        __HCR(TACR)
+#define HCR_TIDCP       __HCR(TIDCP)
+#define HCR_TSC         __HCR(TSC)
+#define HCR_TID3        __HCR(TID3)
+#define HCR_TID2        __HCR(TID2)
+#define HCR_TID1        __HCR(TID1)
+#define HCR_TID0        __HCR(TID0)
+#define HCR_TWE         __HCR(TWE)
+#define HCR_TWI         __HCR(TWI)
+#define HCR_DC          __HCR(DC)
+#define HCR_BSU         __HCR(BSU)
+#define HCR_BSU_IS      __HCR(BSU_IS)
+#define HCR_FB          __HCR(FB)
+#define HCR_VSE         __HCR(VSE)
+#define HCR_VI          __HCR(VI)
+#define HCR_VF          __HCR(VF)
+#define HCR_AMO         __HCR(AMO)
+#define HCR_IMO         __HCR(IMO)
+#define HCR_FMO         __HCR(FMO)
+#define HCR_PTW         __HCR(PTW)
+#define HCR_SWIO        __HCR(SWIO)
+#define HCR_VM          __HCR(VM)

 /*
  * The bits we set in HCR:
···
                                  GENMASK(15, 0))

 /*
- * FGT register definitions
- *
- * RES0 and polarity masks as of DDI0487J.a, to be updated as needed.
- * We're not using the generated masks as they are usually ahead of
- * the published ARM ARM, which we use as a reference.
- *
- * Once we get to a point where the two describe the same thing, we'll
- * merge the definitions. One day.
+ * Polarity masks for HCRX_EL2, limited to the bits that we know about
+ * at this point in time. It doesn't mean that we actually *handle*
+ * them, but that at least those that are not advertised to a guest
+ * will be RES0 for that guest.
  */
-#define __HFGRTR_EL2_RES0       HFGxTR_EL2_RES0
-#define __HFGRTR_EL2_MASK       GENMASK(49, 0)
-#define __HFGRTR_EL2_nMASK      ~(__HFGRTR_EL2_RES0 | __HFGRTR_EL2_MASK)
-
-/*
- * The HFGWTR bits are a subset of HFGRTR bits. To ensure we don't miss any
- * future additions, define __HFGWTR* macros relative to __HFGRTR* ones.
- */
-#define __HFGRTR_ONLY_MASK      (BIT(46) | BIT(42) | BIT(40) | BIT(28) | \
-                                 GENMASK(26, 25) | BIT(21) | BIT(18) | \
-                                 GENMASK(15, 14) | GENMASK(10, 9) | BIT(2))
-#define __HFGWTR_EL2_RES0       (__HFGRTR_EL2_RES0 | __HFGRTR_ONLY_MASK)
-#define __HFGWTR_EL2_MASK       (__HFGRTR_EL2_MASK & ~__HFGRTR_ONLY_MASK)
-#define __HFGWTR_EL2_nMASK      ~(__HFGWTR_EL2_RES0 | __HFGWTR_EL2_MASK)
-
-#define __HFGITR_EL2_RES0       HFGITR_EL2_RES0
-#define __HFGITR_EL2_MASK       (BIT(62) | BIT(60) | GENMASK(54, 0))
-#define __HFGITR_EL2_nMASK      ~(__HFGITR_EL2_RES0 | __HFGITR_EL2_MASK)
-
-#define __HDFGRTR_EL2_RES0      HDFGRTR_EL2_RES0
-#define __HDFGRTR_EL2_MASK      (BIT(63) | GENMASK(58, 50) | GENMASK(48, 43) | \
-                                 GENMASK(41, 40) | GENMASK(37, 22) | \
-                                 GENMASK(19, 9) | GENMASK(7, 0))
-#define __HDFGRTR_EL2_nMASK     ~(__HDFGRTR_EL2_RES0 | __HDFGRTR_EL2_MASK)
-
-#define __HDFGWTR_EL2_RES0      HDFGWTR_EL2_RES0
-#define __HDFGWTR_EL2_MASK      (GENMASK(57, 52) | GENMASK(50, 48) | \
-                                 GENMASK(46, 44) | GENMASK(42, 41) | \
-                                 GENMASK(37, 35) | GENMASK(33, 31) | \
-                                 GENMASK(29, 23) | GENMASK(21, 10) | \
-                                 GENMASK(8, 7) | GENMASK(5, 0))
-#define __HDFGWTR_EL2_nMASK     ~(__HDFGWTR_EL2_RES0 | __HDFGWTR_EL2_MASK)
-
-#define __HAFGRTR_EL2_RES0      HAFGRTR_EL2_RES0
-#define __HAFGRTR_EL2_MASK      (GENMASK(49, 17) | GENMASK(4, 0))
-#define __HAFGRTR_EL2_nMASK     ~(__HAFGRTR_EL2_RES0 | __HAFGRTR_EL2_MASK)
-
-/* Similar definitions for HCRX_EL2 */
-#define __HCRX_EL2_RES0         HCRX_EL2_RES0
-#define __HCRX_EL2_MASK         (BIT(6))
-#define __HCRX_EL2_nMASK        ~(__HCRX_EL2_RES0 | __HCRX_EL2_MASK)
+#define __HCRX_EL2_MASK         (BIT_ULL(6))
+#define __HCRX_EL2_nMASK        (GENMASK_ULL(24, 14) | \
+                                 GENMASK_ULL(11, 7) | \
+                                 GENMASK_ULL(5, 0))
+#define __HCRX_EL2_RES0         ~(__HCRX_EL2_nMASK | __HCRX_EL2_MASK)
+#define __HCRX_EL2_RES1         ~(__HCRX_EL2_nMASK | \
+                                  __HCRX_EL2_MASK | \
+                                  __HCRX_EL2_RES0)

 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK      (~UL(0xf))
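The rewritten HCRX_EL2 masks derive RES0 and RES1 from the two polarity masks instead of from a generated constant. Evaluating them outside the kernel shows what that arithmetic yields. The sketch below copies the four `__HCRX_EL2_*` macros from the hunk above and supplies minimal stand-ins for `BIT_ULL` and `GENMASK_ULL`; `hcrx_bit_class` is a hypothetical helper, and reading MASK as positive-polarity bits and nMASK as negative-polarity bits is an interpretation of the "polarity masks" comment.

```c
#include <stdint.h>

/* Minimal stand-ins for the kernel's bit helpers. */
#define BIT_ULL(n)          (1ULL << (n))
#define GENMASK_ULL(h, l)   (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Copied from the new kvm_arm.h definitions. */
#define __HCRX_EL2_MASK     (BIT_ULL(6))
#define __HCRX_EL2_nMASK    (GENMASK_ULL(24, 14) | \
                             GENMASK_ULL(11, 7)  | \
                             GENMASK_ULL(5, 0))
#define __HCRX_EL2_RES0     ~(__HCRX_EL2_nMASK | __HCRX_EL2_MASK)
#define __HCRX_EL2_RES1     ~(__HCRX_EL2_nMASK | \
                              __HCRX_EL2_MASK  | \
                              __HCRX_EL2_RES0)

/*
 * Hypothetical helper: classify one bit (0..63) of HCRX_EL2 as
 * 'p' (positive polarity), 'n' (negative polarity), '0' (RES0)
 * or '1' (RES1).
 */
static char hcrx_bit_class(unsigned int bit)
{
    uint64_t b = BIT_ULL(bit);

    if (b & __HCRX_EL2_MASK)
        return 'p';
    if (b & __HCRX_EL2_nMASK)
        return 'n';
    if (b & __HCRX_EL2_RES0)
        return '0';
    return '1';
}
```

Because RES0 is the complement of the two polarity masks, the three sets together cover all 64 bits, so `__HCRX_EL2_RES1` evaluates to zero with these definitions. Bits 12, 13 and 25 to 63 fall into RES0.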

+77 -11

arch/arm64/include/asm/kvm_host.h

··· 39 39 40 40 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS 41 41 42 - #define KVM_VCPU_MAX_FEATURES 7 42 + #define KVM_VCPU_MAX_FEATURES 9 43 43 #define KVM_VCPU_VALID_FEATURES (BIT(KVM_VCPU_MAX_FEATURES) - 1) 44 44 45 45 #define KVM_REQ_SLEEP \ ··· 53 53 #define KVM_REQ_RESYNC_PMU_EL0 KVM_ARCH_REQ(7) 54 54 #define KVM_REQ_NESTED_S2_UNMAP KVM_ARCH_REQ(8) 55 55 #define KVM_REQ_GUEST_HYP_IRQ_PENDING KVM_ARCH_REQ(9) 56 + #define KVM_REQ_MAP_L1_VNCR_EL2 KVM_ARCH_REQ(10) 56 57 57 58 #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ 58 59 KVM_DIRTY_LOG_INITIALLY_SET) ··· 274 273 275 274 enum fgt_group_id { 276 275 __NO_FGT_GROUP__, 277 - HFGxTR_GROUP, 276 + HFGRTR_GROUP, 277 + HFGWTR_GROUP = HFGRTR_GROUP, 278 278 HDFGRTR_GROUP, 279 279 HDFGWTR_GROUP = HDFGRTR_GROUP, 280 280 HFGITR_GROUP, 281 281 HAFGRTR_GROUP, 282 + HFGRTR2_GROUP, 283 + HFGWTR2_GROUP = HFGRTR2_GROUP, 284 + HDFGRTR2_GROUP, 285 + HDFGWTR2_GROUP = HDFGRTR2_GROUP, 286 + HFGITR2_GROUP, 282 287 283 288 /* Must be last */ 284 289 __NR_FGT_GROUP_IDS__ ··· 366 359 367 360 cpumask_var_t supported_cpus; 368 361 369 - /* PMCR_EL0.N value for the guest */ 370 - u8 pmcr_n; 362 + /* Maximum number of counters for the guest */ 363 + u8 nr_pmu_counters; 371 364 372 365 /* Iterator for idreg debugfs */ 373 366 u8 idreg_debugfs_iter; ··· 395 388 396 389 /* Masks for VNCR-backed and general EL2 sysregs */ 397 390 struct kvm_sysreg_masks *sysreg_masks; 391 + 392 + /* Count the number of VNCR_EL2 currently mapped */ 393 + atomic_t vncr_map_count; 398 394 399 395 /* 400 396 * For an untrusted host VM, 'pkvm.handle' is used to lookup ··· 571 561 VNCR(HDFGRTR_EL2), 572 562 VNCR(HDFGWTR_EL2), 573 563 VNCR(HAFGRTR_EL2), 564 + VNCR(HFGRTR2_EL2), 565 + VNCR(HFGWTR2_EL2), 566 + VNCR(HFGITR2_EL2), 567 + VNCR(HDFGRTR2_EL2), 568 + VNCR(HDFGWTR2_EL2), 569 + 570 + VNCR(VNCR_EL2), 574 571 575 572 VNCR(CNTVOFF_EL2), 576 573 VNCR(CNTV_CVAL_EL0), ··· 622 605 u64 res1; 623 606 } mask[NR_SYS_REGS - __SANITISED_REG_START__]; 
624 607 }; 608 + 609 + struct fgt_masks { 610 + const char *str; 611 + u64 mask; 612 + u64 nmask; 613 + u64 res0; 614 + }; 615 + 616 + extern struct fgt_masks hfgrtr_masks; 617 + extern struct fgt_masks hfgwtr_masks; 618 + extern struct fgt_masks hfgitr_masks; 619 + extern struct fgt_masks hdfgrtr_masks; 620 + extern struct fgt_masks hdfgwtr_masks; 621 + extern struct fgt_masks hafgrtr_masks; 622 + extern struct fgt_masks hfgrtr2_masks; 623 + extern struct fgt_masks hfgwtr2_masks; 624 + extern struct fgt_masks hfgitr2_masks; 625 + extern struct fgt_masks hdfgrtr2_masks; 626 + extern struct fgt_masks hdfgwtr2_masks; 627 + 628 + extern struct fgt_masks kvm_nvhe_sym(hfgrtr_masks); 629 + extern struct fgt_masks kvm_nvhe_sym(hfgwtr_masks); 630 + extern struct fgt_masks kvm_nvhe_sym(hfgitr_masks); 631 + extern struct fgt_masks kvm_nvhe_sym(hdfgrtr_masks); 632 + extern struct fgt_masks kvm_nvhe_sym(hdfgwtr_masks); 633 + extern struct fgt_masks kvm_nvhe_sym(hafgrtr_masks); 634 + extern struct fgt_masks kvm_nvhe_sym(hfgrtr2_masks); 635 + extern struct fgt_masks kvm_nvhe_sym(hfgwtr2_masks); 636 + extern struct fgt_masks kvm_nvhe_sym(hfgitr2_masks); 637 + extern struct fgt_masks kvm_nvhe_sym(hdfgrtr2_masks); 638 + extern struct fgt_masks kvm_nvhe_sym(hdfgwtr2_masks); 625 639 626 640 struct kvm_cpu_context { 627 641 struct user_pt_regs regs; /* sp = sp_el0 */ ··· 702 654 #define KVM_HOST_DATA_FLAG_HAS_TRBE 1 703 655 #define KVM_HOST_DATA_FLAG_TRBE_ENABLED 4 704 656 #define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED 5 657 + #define KVM_HOST_DATA_FLAG_VCPU_IN_HYP_CONTEXT 6 658 + #define KVM_HOST_DATA_FLAG_L1_VNCR_MAPPED 7 705 659 unsigned long flags; 706 660 707 661 struct kvm_cpu_context host_ctxt; ··· 779 729 bool be; 780 730 bool reset; 781 731 }; 732 + 733 + struct vncr_tlb; 782 734 783 735 struct kvm_vcpu_arch { 784 736 struct kvm_cpu_context ctxt; ··· 876 824 877 825 /* Per-vcpu CCSIDR override or NULL */ 878 826 u32 *ccsidr; 827 + 828 + /* Per-vcpu TLB for VNCR_EL2 -- NULL 
when !NV */ 829 + struct vncr_tlb *vncr_tlb; 879 830 }; 880 831 881 832 /* ··· 1026 971 #define vcpu_sve_zcr_elx(vcpu) \ 1027 972 (unlikely(is_hyp_ctxt(vcpu)) ? ZCR_EL2 : ZCR_EL1) 1028 973 1029 - #define vcpu_sve_state_size(vcpu) ({ \ 974 + #define sve_state_size_from_vl(sve_max_vl) ({ \ 1030 975 size_t __size_ret; \ 1031 - unsigned int __vcpu_vq; \ 976 + unsigned int __vq; \ 1032 977 \ 1033 - if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \ 978 + if (WARN_ON(!sve_vl_valid(sve_max_vl))) { \ 1034 979 __size_ret = 0; \ 1035 980 } else { \ 1036 - __vcpu_vq = vcpu_sve_max_vq(vcpu); \ 1037 - __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \ 981 + __vq = sve_vq_from_vl(sve_max_vl); \ 982 + __size_ret = SVE_SIG_REGS_SIZE(__vq); \ 1038 983 } \ 1039 984 \ 1040 985 __size_ret; \ 1041 986 }) 987 + 988 + #define vcpu_sve_state_size(vcpu) sve_state_size_from_vl((vcpu)->arch.sve_max_vl) 1042 989 1043 990 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \ 1044 991 KVM_GUESTDBG_USE_SW_BP | \ ··· 1607 1550 kvm_cmp_feat_signed(kvm, id, fld, op, limit) : \ 1608 1551 kvm_cmp_feat_unsigned(kvm, id, fld, op, limit)) 1609 1552 1610 - #define kvm_has_feat(kvm, id, fld, limit) \ 1553 + #define __kvm_has_feat(kvm, id, fld, limit) \ 1611 1554 kvm_cmp_feat(kvm, id, fld, >=, limit) 1612 1555 1613 - #define kvm_has_feat_enum(kvm, id, fld, val) \ 1556 + #define kvm_has_feat(kvm, ...) __kvm_has_feat(kvm, __VA_ARGS__) 1557 + 1558 + #define __kvm_has_feat_enum(kvm, id, fld, val) \ 1614 1559 kvm_cmp_feat_unsigned(kvm, id, fld, ==, val) 1560 + 1561 + #define kvm_has_feat_enum(kvm, ...) 
__kvm_has_feat_enum(kvm, __VA_ARGS__) 1615 1562 1616 1563 #define kvm_has_feat_range(kvm, id, fld, min, max) \ 1617 1564 (kvm_cmp_feat(kvm, id, fld, >=, min) && \ ··· 1653 1592 { 1654 1593 return true; 1655 1594 } 1595 + 1596 + void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt); 1597 + void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1); 1598 + void check_feature_map(void); 1599 + 1656 1600 1657 1601 #endif /* __ARM64_KVM_HOST_H__ */
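A note on the `kvm_has_feat()` / `kvm_has_feat_enum()` change above: both macros now forward through a variadic wrapper. A plausible reason, given the `FEAT_*` shorthands that arch/arm64/kvm/config.c defines later in this diff, is that a single macro argument expanding to `id, fld, limit` only splits into three parameters after one more expansion step. The sketch below is standalone userspace C, not kernel code; `HAS_FEAT`, `HAS_FEAT_3`, `FEAT_DEMO` and `cmp3` are made-up names for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the comparison done by kvm_cmp_feat(). */
static int cmp3(int id, int fld, int limit)
{
	return id * 100 + fld * 10 + limit;
}

/* Fixed-arity worker: needs three separate arguments after ctx. */
#define HAS_FEAT_3(ctx, id, fld, limit)	((ctx) + cmp3(id, fld, limit))

/*
 * Variadic front end: FEAT_DEMO arrives as ONE argument, is fully
 * expanded while being substituted for __VA_ARGS__, and the result
 * is then rescanned as three arguments to HAS_FEAT_3().
 */
#define HAS_FEAT(ctx, ...)	HAS_FEAT_3(ctx, __VA_ARGS__)

/* Hypothetical shorthand in the style of the FEAT_* defines. */
#define FEAT_DEMO	1, 2, 3

static int demo_shorthand(void)
{
	return HAS_FEAT(1000, FEAT_DEMO);	/* becomes HAS_FEAT_3(1000, 1, 2, 3) */
}

static int demo_explicit(void)
{
	return HAS_FEAT(1000, 1, 2, 3);
}
```

Calling `HAS_FEAT_3(1000, FEAT_DEMO)` directly would fail to compile, because the preprocessor counts arguments before expanding them and would see only two.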

+100

arch/arm64/include/asm/kvm_nested.h

··· 231 231 shift; \ 232 232 }) 233 233 234 + static inline u64 decode_range_tlbi(u64 val, u64 *range, u16 *asid) 235 + { 236 + u64 base, tg, num, scale; 237 + int shift; 238 + 239 + tg = FIELD_GET(GENMASK(47, 46), val); 240 + 241 + switch(tg) { 242 + case 1: 243 + shift = 12; 244 + break; 245 + case 2: 246 + shift = 14; 247 + break; 248 + case 3: 249 + default: /* IMPDEF: handle tg==0 as 64k */ 250 + shift = 16; 251 + break; 252 + } 253 + 254 + base = (val & GENMASK(36, 0)) << shift; 255 + 256 + if (asid) 257 + *asid = FIELD_GET(TLBIR_ASID_MASK, val); 258 + 259 + scale = FIELD_GET(GENMASK(45, 44), val); 260 + num = FIELD_GET(GENMASK(43, 39), val); 261 + *range = __TLBI_RANGE_PAGES(num, scale) << shift; 262 + 263 + return base; 264 + } 265 + 234 266 static inline unsigned int ps_to_output_size(unsigned int ps) 235 267 { 236 268 switch (ps) { ··· 276 244 return 48; 277 245 } 278 246 } 247 + 248 + enum trans_regime { 249 + TR_EL10, 250 + TR_EL20, 251 + TR_EL2, 252 + }; 253 + 254 + struct s1_walk_info { 255 + u64 baddr; 256 + enum trans_regime regime; 257 + unsigned int max_oa_bits; 258 + unsigned int pgshift; 259 + unsigned int txsz; 260 + int sl; 261 + bool as_el0; 262 + bool hpd; 263 + bool e0poe; 264 + bool poe; 265 + bool pan; 266 + bool be; 267 + bool s2; 268 + }; 269 + 270 + struct s1_walk_result { 271 + union { 272 + struct { 273 + u64 desc; 274 + u64 pa; 275 + s8 level; 276 + u8 APTable; 277 + bool nG; 278 + u16 asid; 279 + bool UXNTable; 280 + bool PXNTable; 281 + bool uwxn; 282 + bool uov; 283 + bool ur; 284 + bool uw; 285 + bool ux; 286 + bool pwxn; 287 + bool pov; 288 + bool pr; 289 + bool pw; 290 + bool px; 291 + }; 292 + struct { 293 + u8 fst; 294 + bool ptw; 295 + bool s2; 296 + }; 297 + }; 298 + bool failed; 299 + }; 300 + 301 + int __kvm_translate_va(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, 302 + struct s1_walk_result *wr, u64 va); 303 + 304 + /* VNCR management */ 305 + int kvm_vcpu_allocate_vncr_tlb(struct kvm_vcpu *vcpu); 306 + int 
kvm_handle_vncr_abort(struct kvm_vcpu *vcpu); 307 + void kvm_handle_s1e2_tlbi(struct kvm_vcpu *vcpu, u32 inst, u64 val); 308 + 309 + #define vncr_fixmap(c) \ 310 + ({ \ 311 + u32 __c = (c); \ 312 + BUG_ON(__c >= NR_CPUS); \ 313 + (FIX_VNCR - __c); \ 314 + }) 279 315 280 316 #endif /* __ARM64_KVM_NESTED_H */
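To make the bit layout used by `decode_range_tlbi()` above easier to follow, here is a userspace re-statement of the same field extraction. It is a sketch under assumptions: `MASK`/`FIELD` stand in for the kernel's `GENMASK()`/`FIELD_GET()`, `RANGE_PAGES` is assumed to match `__TLBI_RANGE_PAGES()` from asm/tlbflush.h, and the ASID is assumed to sit in bits [63:48] (what `TLBIR_ASID_MASK` is taken to cover). The function name `decode_range` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for GENMASK_ULL() and FIELD_GET(). */
#define MASK(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define FIELD(h, l, v)	(((v) & MASK(h, l)) >> (l))

/* Assumed equivalent of __TLBI_RANGE_PAGES(num, scale). */
#define RANGE_PAGES(num, scale)	((uint64_t)((num) + 1) << (5 * (scale) + 1))

/* Returns the base address; stores the range size in bytes in *range. */
static uint64_t decode_range(uint64_t val, uint64_t *range, uint16_t *asid)
{
	unsigned int shift;

	assert(range != NULL);

	switch (FIELD(47, 46, val)) {	/* TG: translation granule */
	case 1:
		shift = 12;		/* 4k */
		break;
	case 2:
		shift = 14;		/* 16k */
		break;
	default:
		shift = 16;		/* 64k, also used for TG == 0 as in the diff */
		break;
	}

	if (asid)
		*asid = (uint16_t)FIELD(63, 48, val);	/* assumed ASID field */

	/* SCALE is bits [45:44], NUM is bits [43:39]. */
	*range = RANGE_PAGES(FIELD(43, 39, val), FIELD(45, 44, val)) << shift;

	return (val & MASK(36, 0)) << shift;	/* BaseADDR, in granule units */
}
```

For an operand with ASID 0x2a, TG = 1, SCALE = 0, NUM = 3 and a base page number of 0x1234, this yields a base address of 0x1234000 and a range of (3 + 1) << 1 = 8 pages of 4k, i.e. 0x8000 bytes.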

+6 -1

arch/arm64/include/asm/kvm_pgtable.h

··· 59 59 60 60 #define KVM_PHYS_INVALID (-1ULL) 61 61 62 + #define KVM_PTE_TYPE BIT(1) 63 + #define KVM_PTE_TYPE_BLOCK 0 64 + #define KVM_PTE_TYPE_PAGE 1 65 + #define KVM_PTE_TYPE_TABLE 1 66 + 62 67 #define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2) 63 68 64 69 #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2) ··· 418 413 */ 419 414 struct kvm_pgtable { 420 415 union { 421 - struct rb_root pkvm_mappings; 416 + struct rb_root_cached pkvm_mappings; 422 417 struct { 423 418 u32 ia_bits; 424 419 s8 start_level;

+8

arch/arm64/include/asm/kvm_pkvm.h

··· 135 135 return res; 136 136 } 137 137 138 + #ifdef CONFIG_NVHE_EL2_DEBUG 139 + static inline unsigned long pkvm_selftest_pages(void) { return 32; } 140 + #else 141 + static inline unsigned long pkvm_selftest_pages(void) { return 0; } 142 + #endif 143 + 138 144 #define KVM_FFA_MBOX_NR_PAGES 1 139 145 140 146 static inline unsigned long hyp_ffa_proxy_pages(void) ··· 173 167 struct rb_node node; 174 168 u64 gfn; 175 169 u64 pfn; 170 + u64 nr_pages; 171 + u64 __subtree_last; /* Internal member for interval tree */ 176 172 }; 177 173 178 174 int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
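The new `nr_pages` and `__subtree_last` members, together with the switch to `struct rb_root_cached` in kvm_pgtable.h, suggest that pKVM mappings are now tracked as ranges in an interval tree rather than as single pages. The sketch below shows only the range arithmetic such a tree relies on. It assumes a mapping covers the inclusive range [gfn, gfn + nr_pages - 1]; the struct and helper names are hypothetical, and the kernel's real helpers may well work in bytes rather than frame numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a range mapping. */
struct mapping {
	uint64_t gfn;		/* first guest frame number */
	uint64_t nr_pages;	/* number of contiguous pages */
};

/* Last frame covered by the mapping (inclusive). */
static uint64_t mapping_last(const struct mapping *m)
{
	assert(m->nr_pages > 0);
	return m->gfn + m->nr_pages - 1;
}

/* True if the mapping intersects the inclusive range [start, last]. */
static bool mapping_overlaps(const struct mapping *m,
			     uint64_t start, uint64_t last)
{
	return m->gfn <= last && start <= mapping_last(m);
}
```

In an interval tree, a per-node cached maximum of the "last" value over the node's subtree (the role `__subtree_last` appears to play) lets a lookup skip whole subtrees that end before the queried range starts.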

+1

arch/arm64/include/asm/sections.h

··· 11 11 extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[]; 12 12 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[]; 13 13 extern char __hyp_text_start[], __hyp_text_end[]; 14 + extern char __hyp_data_start[], __hyp_data_end[]; 14 15 extern char __hyp_rodata_start[], __hyp_rodata_end[]; 15 16 extern char __hyp_reloc_begin[], __hyp_reloc_end[]; 16 17 extern char __hyp_bss_start[], __hyp_bss_end[];

+41 -12

arch/arm64/include/asm/sysreg.h

··· 117 117 118 118 #define SB_BARRIER_INSN __SYS_BARRIER_INSN(0, 7, 31) 119 119 120 + /* Data cache zero operations */ 120 121 #define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) 121 122 #define SYS_DC_IGSW sys_insn(1, 0, 7, 6, 4) 122 123 #define SYS_DC_IGDSW sys_insn(1, 0, 7, 6, 6) ··· 154 153 #define SYS_DC_CIGVAC sys_insn(1, 3, 7, 14, 3) 155 154 #define SYS_DC_CIGDVAC sys_insn(1, 3, 7, 14, 5) 156 155 157 - /* Data cache zero operations */ 158 156 #define SYS_DC_ZVA sys_insn(1, 3, 7, 4, 1) 159 157 #define SYS_DC_GVA sys_insn(1, 3, 7, 4, 3) 160 158 #define SYS_DC_GZVA sys_insn(1, 3, 7, 4, 4) 159 + 160 + #define SYS_DC_CIVAPS sys_insn(1, 0, 7, 15, 1) 161 + #define SYS_DC_CIGDVAPS sys_insn(1, 0, 7, 15, 5) 161 162 162 163 /* 163 164 * Automatically generated definitions for system registers, the ··· 500 497 501 498 #define __PMEV_op2(n) ((n) & 0x7) 502 499 #define __CNTR_CRm(n) (0x8 | (((n) >> 3) & 0x3)) 500 + #define SYS_PMEVCNTSVRn_EL1(n) sys_reg(2, 0, 14, __CNTR_CRm(n), __PMEV_op2(n)) 503 501 #define SYS_PMEVCNTRn_EL0(n) sys_reg(3, 3, 14, __CNTR_CRm(n), __PMEV_op2(n)) 504 502 #define __TYPER_CRm(n) (0xc | (((n) >> 3) & 0x3)) 505 503 #define SYS_PMEVTYPERn_EL0(n) sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n)) 506 504 507 505 #define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7) 506 + 507 + #define SYS_SPMCGCRn_EL1(n) sys_reg(2, 0, 9, 13, ((n) & 1)) 508 + 509 + #define __SPMEV_op2(n) ((n) & 0x7) 510 + #define __SPMEV_crm(p, n) ((((p) & 7) << 1) | (((n) >> 3) & 1)) 511 + #define SYS_SPMEVCNTRn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b000, n), __SPMEV_op2(n)) 512 + #define SYS_SPMEVFILT2Rn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b011, n), __SPMEV_op2(n)) 513 + #define SYS_SPMEVFILTRn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b010, n), __SPMEV_op2(n)) 514 + #define SYS_SPMEVTYPERn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b001, n), __SPMEV_op2(n)) 508 515 509 516 #define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0) 510 517 #define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5) ··· 534 521 #define 
SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0) 535 522 #define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2) 536 523 537 - #define SYS_VNCR_EL2 sys_reg(3, 4, 2, 2, 0) 538 524 #define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6) 539 525 #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) 540 526 #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) ··· 620 608 621 609 /* VHE encodings for architectural EL0/1 system registers */ 622 610 #define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0) 623 - #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0) 624 - #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2) 625 - #define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3) 626 - #define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0) 627 - #define SYS_TRFCR_EL12 sys_reg(3, 5, 1, 2, 1) 628 - #define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6) 629 611 #define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0) 630 612 #define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1) 631 - #define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2) 632 - #define SYS_TCR2_EL12 sys_reg(3, 5, 2, 0, 3) 633 613 #define SYS_SPSR_EL12 sys_reg(3, 5, 4, 0, 0) 634 614 #define SYS_ELR_EL12 sys_reg(3, 5, 4, 0, 1) 635 615 #define SYS_AFSR0_EL12 sys_reg(3, 5, 5, 1, 0) 636 616 #define SYS_AFSR1_EL12 sys_reg(3, 5, 5, 1, 1) 637 617 #define SYS_ESR_EL12 sys_reg(3, 5, 5, 2, 0) 638 618 #define SYS_TFSR_EL12 sys_reg(3, 5, 5, 6, 0) 639 - #define SYS_FAR_EL12 sys_reg(3, 5, 6, 0, 0) 640 619 #define SYS_PMSCR_EL12 sys_reg(3, 5, 9, 9, 0) 641 620 #define SYS_MAIR_EL12 sys_reg(3, 5, 10, 2, 0) 642 621 #define SYS_AMAIR_EL12 sys_reg(3, 5, 10, 3, 0) 643 622 #define SYS_VBAR_EL12 sys_reg(3, 5, 12, 0, 0) 644 - #define SYS_CONTEXTIDR_EL12 sys_reg(3, 5, 13, 0, 1) 645 623 #define SYS_SCXTNUM_EL12 sys_reg(3, 5, 13, 0, 7) 646 624 #define SYS_CNTKCTL_EL12 sys_reg(3, 5, 14, 1, 0) 647 625 #define SYS_CNTP_TVAL_EL02 sys_reg(3, 5, 14, 2, 0) ··· 1093 1091 __emit_inst(0xd5000000|(\sreg)|(.L__gpr_num_\rt)) 1094 1092 .endm 1095 1093 1094 + .macro msr_hcr_el2, reg 1095 + #if IS_ENABLED(CONFIG_AMPERE_ERRATUM_AC04_CPU_23) 1096 + dsb nsh 1097 + msr hcr_el2, \reg 1098 + 
isb 1099 + #else 1100 + msr hcr_el2, \reg 1101 + #endif 1102 + .endm 1096 1103 #else 1097 1104 1098 1105 #include <linux/bitfield.h> ··· 1189 1178 write_sysreg(__scs_new, sysreg); \ 1190 1179 } while (0) 1191 1180 1181 + #define sysreg_clear_set_hcr(clear, set) do { \ 1182 + u64 __scs_val = read_sysreg(hcr_el2); \ 1183 + u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set); \ 1184 + if (__scs_new != __scs_val) \ 1185 + write_sysreg_hcr(__scs_new); \ 1186 + } while (0) 1187 + 1192 1188 #define sysreg_clear_set_s(sysreg, clear, set) do { \ 1193 1189 u64 __scs_val = read_sysreg_s(sysreg); \ 1194 1190 u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set); \ 1195 1191 if (__scs_new != __scs_val) \ 1196 1192 write_sysreg_s(__scs_new, sysreg); \ 1193 + } while (0) 1194 + 1195 + #define write_sysreg_hcr(__val) do { \ 1196 + if (IS_ENABLED(CONFIG_AMPERE_ERRATUM_AC04_CPU_23) && \ 1197 + (!system_capabilities_finalized() || \ 1198 + alternative_has_cap_unlikely(ARM64_WORKAROUND_AMPERE_AC04_CPU_23))) \ 1199 + asm volatile("dsb nsh; msr hcr_el2, %x0; isb" \ 1200 + : : "rZ" (__val)); \ 1201 + else \ 1202 + asm volatile("msr hcr_el2, %x0" \ 1203 + : : "rZ" (__val)); \ 1197 1204 } while (0) 1198 1205 1199 1206 #define read_sysreg_par() ({ \
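The `sysreg_clear_set_hcr()` helper above follows the usual read-modify-write pattern and only issues the write when the value actually changes. With the erratum workaround enabled, each write to HCR_EL2 is bracketed by `dsb nsh` and `isb`, so skipping redundant writes presumably matters more there. The following userspace mock illustrates the pattern only; the register is a plain variable and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mock register plus a counter for how often it is written. */
static uint64_t fake_hcr;
static unsigned int fake_hcr_writes;

static uint64_t read_hcr(void)
{
	return fake_hcr;
}

/* Stand-in for write_sysreg_hcr(); the real one may add barriers. */
static void write_hcr(uint64_t val)
{
	fake_hcr = val;
	fake_hcr_writes++;
}

/* Clear then set bits, writing back only if something changed. */
static void clear_set_hcr(uint64_t clear, uint64_t set)
{
	uint64_t old = read_hcr();
	uint64_t updated = (old & ~clear) | set;

	if (updated != old)
		write_hcr(updated);
}
```

Starting from 0x5, `clear_set_hcr(0x4, 0x2)` writes 0x3 once; repeating the same call leaves the write counter unchanged.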

+5

arch/arm64/include/asm/vncr_mapping.h

··· 35 35 #define VNCR_CNTP_CTL_EL0 0x180 36 36 #define VNCR_SCXTNUM_EL1 0x188 37 37 #define VNCR_TFSR_EL1 0x190 38 + #define VNCR_HDFGRTR2_EL2 0x1A0 39 + #define VNCR_HDFGWTR2_EL2 0x1B0 38 40 #define VNCR_HFGRTR_EL2 0x1B8 39 41 #define VNCR_HFGWTR_EL2 0x1C0 40 42 #define VNCR_HFGITR_EL2 0x1C8 ··· 54 52 #define VNCR_PIRE0_EL1 0x290 55 53 #define VNCR_PIR_EL1 0x2A0 56 54 #define VNCR_POR_EL1 0x2A8 55 + #define VNCR_HFGRTR2_EL2 0x2C0 56 + #define VNCR_HFGWTR2_EL2 0x2C8 57 + #define VNCR_HFGITR2_EL2 0x310 57 58 #define VNCR_ICH_LR0_EL2 0x400 58 59 #define VNCR_ICH_LR1_EL2 0x408 59 60 #define VNCR_ICH_LR2_EL2 0x410

+5 -4

arch/arm64/include/uapi/asm/kvm.h

··· 431 431 432 432 /* Device Control API on vcpu fd */ 433 433 #define KVM_ARM_VCPU_PMU_V3_CTRL 0 434 - #define KVM_ARM_VCPU_PMU_V3_IRQ 0 435 - #define KVM_ARM_VCPU_PMU_V3_INIT 1 436 - #define KVM_ARM_VCPU_PMU_V3_FILTER 2 437 - #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3 434 + #define KVM_ARM_VCPU_PMU_V3_IRQ 0 435 + #define KVM_ARM_VCPU_PMU_V3_INIT 1 436 + #define KVM_ARM_VCPU_PMU_V3_FILTER 2 437 + #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3 438 + #define KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 4 438 439 #define KVM_ARM_VCPU_TIMER_CTRL 1 439 440 #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0 440 441 #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1

+14

arch/arm64/kernel/cpu_errata.c

··· 557 557 }; 558 558 #endif 559 559 560 + #ifdef CONFIG_AMPERE_ERRATUM_AC04_CPU_23 561 + static const struct midr_range erratum_ac04_cpu_23_list[] = { 562 + MIDR_ALL_VERSIONS(MIDR_AMPERE1A), 563 + {}, 564 + }; 565 + #endif 566 + 560 567 const struct arm64_cpu_capabilities arm64_errata[] = { 561 568 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE 562 569 { ··· 881 874 .desc = "AmpereOne erratum AC03_CPU_38", 882 875 .capability = ARM64_WORKAROUND_AMPERE_AC03_CPU_38, 883 876 ERRATA_MIDR_RANGE_LIST(erratum_ac03_cpu_38_list), 877 + }, 878 + #endif 879 + #ifdef CONFIG_AMPERE_ERRATUM_AC04_CPU_23 880 + { 881 + .desc = "AmpereOne erratum AC04_CPU_23", 882 + .capability = ARM64_WORKAROUND_AMPERE_AC04_CPU_23, 883 + ERRATA_MIDR_RANGE_LIST(erratum_ac04_cpu_23_list), 884 884 }, 885 885 #endif 886 886 {

+8

arch/arm64/kernel/cpufeature.c

··· 305 305 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = { 306 306 ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_GCS), 307 307 FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_GCS_SHIFT, 4, 0), 308 + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MTE_frac_SHIFT, 4, 0), 308 309 ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME), 309 310 FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_SME_SHIFT, 4, 0), 310 311 ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MPAM_frac_SHIFT, 4, 0), ··· 2885 2884 .capability = ARM64_HAS_FGT, 2886 2885 .matches = has_cpuid_feature, 2887 2886 ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, FGT, IMP) 2887 + }, 2888 + { 2889 + .desc = "Fine Grained Traps 2", 2890 + .type = ARM64_CPUCAP_SYSTEM_FEATURE, 2891 + .capability = ARM64_HAS_FGT2, 2892 + .matches = has_cpuid_feature, 2893 + ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, FGT, FGT2) 2888 2894 }, 2889 2895 #ifdef CONFIG_ARM64_SME 2890 2896 {

+1 -1

arch/arm64/kernel/hyp-stub.S

··· 97 97 2: 98 98 // Engage the VHE magic! 99 99 mov_q x0, HCR_HOST_VHE_FLAGS 100 - msr hcr_el2, x0 100 + msr_hcr_el2 x0 101 101 isb 102 102 103 103 // Use the EL1 allocated stack, per-cpu offset

+2

arch/arm64/kernel/image-vars.h

··· 126 126 KVM_NVHE_ALIAS(__hyp_text_end); 127 127 KVM_NVHE_ALIAS(__hyp_bss_start); 128 128 KVM_NVHE_ALIAS(__hyp_bss_end); 129 + KVM_NVHE_ALIAS(__hyp_data_start); 130 + KVM_NVHE_ALIAS(__hyp_data_end); 129 131 KVM_NVHE_ALIAS(__hyp_rodata_start); 130 132 KVM_NVHE_ALIAS(__hyp_rodata_end); 131 133

+2 -2

arch/arm64/kernel/traps.c

··· 1118 1118 #ifdef CONFIG_UBSAN_TRAP 1119 1119 static int ubsan_handler(struct pt_regs *regs, unsigned long esr) 1120 1120 { 1121 - die(report_ubsan_failure(regs, esr & UBSAN_BRK_MASK), regs, esr); 1121 + die(report_ubsan_failure(esr & UBSAN_BRK_MASK), regs, esr); 1122 1122 return DBG_HOOK_HANDLED; 1123 1123 } 1124 1124 ··· 1145 1145 return kasan_handler(regs, esr) != DBG_HOOK_HANDLED; 1146 1146 #endif 1147 1147 #ifdef CONFIG_UBSAN_TRAP 1148 - if ((esr_brk_comment(esr) & ~UBSAN_BRK_MASK) == UBSAN_BRK_IMM) 1148 + if (esr_is_ubsan_brk(esr)) 1149 1149 return ubsan_handler(regs, esr) != DBG_HOOK_HANDLED; 1150 1150 #endif 1151 1151 return bug_handler(regs, esr) != DBG_HOOK_HANDLED;

+15 -3

arch/arm64/kernel/vmlinux.lds.S

··· 13 13 *(__kvm_ex_table) \ 14 14 __stop___kvm_ex_table = .; 15 15 16 - #define HYPERVISOR_DATA_SECTIONS \ 16 + #define HYPERVISOR_RODATA_SECTIONS \ 17 17 HYP_SECTION_NAME(.rodata) : { \ 18 18 . = ALIGN(PAGE_SIZE); \ 19 19 __hyp_rodata_start = .; \ ··· 21 21 *(HYP_SECTION_NAME(.rodata)) \ 22 22 . = ALIGN(PAGE_SIZE); \ 23 23 __hyp_rodata_end = .; \ 24 + } 25 + 26 + #define HYPERVISOR_DATA_SECTION \ 27 + HYP_SECTION_NAME(.data) : { \ 28 + . = ALIGN(PAGE_SIZE); \ 29 + __hyp_data_start = .; \ 30 + *(HYP_SECTION_NAME(.data)) \ 31 + . = ALIGN(PAGE_SIZE); \ 32 + __hyp_data_end = .; \ 24 33 } 25 34 26 35 #define HYPERVISOR_PERCPU_SECTION \ ··· 60 51 #define SBSS_ALIGN PAGE_SIZE 61 52 #else /* CONFIG_KVM */ 62 53 #define HYPERVISOR_EXTABLE 63 - #define HYPERVISOR_DATA_SECTIONS 54 + #define HYPERVISOR_RODATA_SECTIONS 55 + #define HYPERVISOR_DATA_SECTION 64 56 #define HYPERVISOR_PERCPU_SECTION 65 57 #define HYPERVISOR_RELOC_SECTION 66 58 #define SBSS_ALIGN 0 ··· 200 190 /* everything from this point to __init_begin will be marked RO NX */ 201 191 RO_DATA(PAGE_SIZE) 202 192 203 - HYPERVISOR_DATA_SECTIONS 193 + HYPERVISOR_RODATA_SECTIONS 204 194 205 195 .got : { *(.got) } 206 196 /* ··· 304 294 _data = .; 305 295 _sdata = .; 306 296 RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN) 297 + 298 + HYPERVISOR_DATA_SECTION 307 299 308 300 /* 309 301 * Data written with the MMU off but read with the MMU on requires

+1 -1

arch/arm64/kvm/Makefile

··· 14 14 CFLAGS_handle_exit.o += -Wno-override-init 15 15 16 16 kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \ 17 - inject_fault.o va_layout.o handle_exit.o \ 17 + inject_fault.o va_layout.o handle_exit.o config.o \ 18 18 guest.o debug.o reset.o sys_regs.o stacktrace.o \ 19 19 vgic-sys-reg-v3.o fpsimd.o pkvm.o \ 20 20 arch_timer.o trng.o vmid.o emulate-nested.o nested.o at.o \

+30

arch/arm64/kvm/arm.c

··· 368 368 case KVM_CAP_ARM_EL1_32BIT: 369 369 r = cpus_have_final_cap(ARM64_HAS_32BIT_EL1); 370 370 break; 371 + case KVM_CAP_ARM_EL2: 372 + r = cpus_have_final_cap(ARM64_HAS_NESTED_VIRT); 373 + break; 374 + case KVM_CAP_ARM_EL2_E2H0: 375 + r = cpus_have_final_cap(ARM64_HAS_HCR_NV1); 376 + break; 371 377 case KVM_CAP_GUEST_DEBUG_HW_BPS: 372 378 r = get_num_brps(); 373 379 break; ··· 849 843 return ret; 850 844 851 845 if (vcpu_has_nv(vcpu)) { 846 + ret = kvm_vcpu_allocate_vncr_tlb(vcpu); 847 + if (ret) 848 + return ret; 849 + 852 850 ret = kvm_vgic_vcpu_nv_init(vcpu); 853 851 if (ret) 854 852 return ret; ··· 2460 2450 kvm_nvhe_sym(__icache_flags) = __icache_flags; 2461 2451 kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits; 2462 2452 2453 + /* Propagate the FGT state to the nVHE side */ 2454 + kvm_nvhe_sym(hfgrtr_masks) = hfgrtr_masks; 2455 + kvm_nvhe_sym(hfgwtr_masks) = hfgwtr_masks; 2456 + kvm_nvhe_sym(hfgitr_masks) = hfgitr_masks; 2457 + kvm_nvhe_sym(hdfgrtr_masks) = hdfgrtr_masks; 2458 + kvm_nvhe_sym(hdfgwtr_masks) = hdfgwtr_masks; 2459 + kvm_nvhe_sym(hafgrtr_masks) = hafgrtr_masks; 2460 + kvm_nvhe_sym(hfgrtr2_masks) = hfgrtr2_masks; 2461 + kvm_nvhe_sym(hfgwtr2_masks) = hfgwtr2_masks; 2462 + kvm_nvhe_sym(hfgitr2_masks) = hfgitr2_masks; 2463 + kvm_nvhe_sym(hdfgrtr2_masks)= hdfgrtr2_masks; 2464 + kvm_nvhe_sym(hdfgwtr2_masks)= hdfgwtr2_masks; 2465 + 2463 2466 /* 2464 2467 * Flush entire BSS since part of its data containing init symbols is read 2465 2468 * while the MMU is off. ··· 2624 2601 kvm_ksym_ref(__hyp_text_end), PAGE_HYP_EXEC); 2625 2602 if (err) { 2626 2603 kvm_err("Cannot map world-switch code\n"); 2604 + goto out_err; 2605 + } 2606 + 2607 + err = create_hyp_mappings(kvm_ksym_ref(__hyp_data_start), 2608 + kvm_ksym_ref(__hyp_data_end), PAGE_HYP); 2609 + if (err) { 2610 + kvm_err("Cannot map .hyp.data section\n"); 2627 2611 goto out_err; 2628 2612 } 2629 2613

+102 -84

arch/arm64/kvm/at.c

··· 10 10 #include <asm/kvm_hyp.h> 11 11 #include <asm/kvm_mmu.h> 12 12 13 - enum trans_regime { 14 - TR_EL10, 15 - TR_EL20, 16 - TR_EL2, 17 - }; 18 - 19 - struct s1_walk_info { 20 - u64 baddr; 21 - enum trans_regime regime; 22 - unsigned int max_oa_bits; 23 - unsigned int pgshift; 24 - unsigned int txsz; 25 - int sl; 26 - bool hpd; 27 - bool e0poe; 28 - bool poe; 29 - bool pan; 30 - bool be; 31 - bool s2; 32 - }; 33 - 34 - struct s1_walk_result { 35 - union { 36 - struct { 37 - u64 desc; 38 - u64 pa; 39 - s8 level; 40 - u8 APTable; 41 - bool UXNTable; 42 - bool PXNTable; 43 - bool uwxn; 44 - bool uov; 45 - bool ur; 46 - bool uw; 47 - bool ux; 48 - bool pwxn; 49 - bool pov; 50 - bool pr; 51 - bool pw; 52 - bool px; 53 - }; 54 - struct { 55 - u8 fst; 56 - bool ptw; 57 - bool s2; 58 - }; 59 - }; 60 - bool failed; 61 - }; 62 - 63 - static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool ptw, bool s2) 13 + static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool s1ptw) 64 14 { 65 15 wr->fst = fst; 66 - wr->ptw = ptw; 67 - wr->s2 = s2; 16 + wr->ptw = s1ptw; 17 + wr->s2 = s1ptw; 68 18 wr->failed = true; 69 19 } 70 20 ··· 95 145 } 96 146 } 97 147 98 - static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi, 148 + static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, 99 149 struct s1_walk_result *wr, u64 va) 100 150 { 101 151 u64 hcr, sctlr, tcr, tg, ps, ia_bits, ttbr; 102 152 unsigned int stride, x; 103 - bool va55, tbi, lva, as_el0; 153 + bool va55, tbi, lva; 104 154 105 155 hcr = __vcpu_sys_reg(vcpu, HCR_EL2); 106 - 107 - wi->regime = compute_translation_regime(vcpu, op); 108 - as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W); 109 - wi->pan = (op == OP_AT_S1E1RP || op == OP_AT_S1E1WP) && 110 - (*vcpu_cpsr(vcpu) & PSR_PAN_BIT); 111 156 112 157 va55 = va & BIT(55); 113 158 ··· 264 319 265 320 /* R_BNDVG and following statements */ 266 321 if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, E0PD, IMP) && 267 - as_el0 
&& (tcr & (va55 ? TCR_E0PD1 : TCR_E0PD0))) 322 + wi->as_el0 && (tcr & (va55 ? TCR_E0PD1 : TCR_E0PD0))) 268 323 goto transfault_l0; 269 324 270 325 /* AArch64.S1StartLevel() */ ··· 290 345 return 0; 291 346 292 347 addrsz: /* Address Size Fault level 0 */ 293 - fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false, false); 348 + fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false); 294 349 return -EFAULT; 295 350 296 351 transfault_l0: /* Translation Fault level 0 */ 297 - fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(0), false, false); 352 + fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(0), false); 298 353 return -EFAULT; 299 354 } 300 355 ··· 325 380 if (ret) { 326 381 fail_s1_walk(wr, 327 382 (s2_trans.esr & ~ESR_ELx_FSC_LEVEL) | level, 328 - true, true); 383 + true); 329 384 return ret; 330 385 } 331 386 332 387 if (!kvm_s2_trans_readable(&s2_trans)) { 333 388 fail_s1_walk(wr, ESR_ELx_FSC_PERM_L(level), 334 - true, true); 389 + true); 335 390 336 391 return -EPERM; 337 392 } ··· 341 396 342 397 ret = kvm_read_guest(vcpu->kvm, ipa, &desc, sizeof(desc)); 343 398 if (ret) { 344 - fail_s1_walk(wr, ESR_ELx_FSC_SEA_TTW(level), 345 - true, false); 399 + fail_s1_walk(wr, ESR_ELx_FSC_SEA_TTW(level), false); 346 400 return ret; 347 401 } 348 402 ··· 401 457 if (check_output_size(desc & GENMASK(47, va_bottom), wi)) 402 458 goto addrsz; 403 459 460 + if (!(desc & PTE_AF)) { 461 + fail_s1_walk(wr, ESR_ELx_FSC_ACCESS_L(level), false); 462 + return -EACCES; 463 + } 464 + 404 465 va_bottom += contiguous_bit_shift(desc, wi, level); 405 466 406 467 wr->failed = false; ··· 414 465 wr->pa = desc & GENMASK(47, va_bottom); 415 466 wr->pa |= va & GENMASK_ULL(va_bottom - 1, 0); 416 467 468 + wr->nG = (wi->regime != TR_EL2) && (desc & PTE_NG); 469 + if (wr->nG) { 470 + u64 asid_ttbr, tcr; 471 + 472 + switch (wi->regime) { 473 + case TR_EL10: 474 + tcr = vcpu_read_sys_reg(vcpu, TCR_EL1); 475 + asid_ttbr = ((tcr & TCR_A1) ? 
476 + vcpu_read_sys_reg(vcpu, TTBR1_EL1) : 477 + vcpu_read_sys_reg(vcpu, TTBR0_EL1)); 478 + break; 479 + case TR_EL20: 480 + tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); 481 + asid_ttbr = ((tcr & TCR_A1) ? 482 + vcpu_read_sys_reg(vcpu, TTBR1_EL2) : 483 + vcpu_read_sys_reg(vcpu, TTBR0_EL2)); 484 + break; 485 + default: 486 + BUG(); 487 + } 488 + 489 + wr->asid = FIELD_GET(TTBR_ASID_MASK, asid_ttbr); 490 + if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || 491 + !(tcr & TCR_ASID16)) 492 + wr->asid &= GENMASK(7, 0); 493 + } 494 + 417 495 return 0; 418 496 419 497 addrsz: 420 - fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(level), true, false); 498 + fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(level), false); 421 499 return -EINVAL; 422 500 transfault: 423 - fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(level), true, false); 501 + fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(level), false); 424 502 return -ENOENT; 425 503 } 426 504 ··· 464 488 u64 sctlr; 465 489 u64 vttbr; 466 490 u64 vtcr; 467 - u64 hcr; 468 491 }; 469 492 470 493 static void __mmu_config_save(struct mmu_config *config) ··· 486 511 config->sctlr = read_sysreg_el1(SYS_SCTLR); 487 512 config->vttbr = read_sysreg(vttbr_el2); 488 513 config->vtcr = read_sysreg(vtcr_el2); 489 - config->hcr = read_sysreg(hcr_el2); 490 514 } 491 515 492 516 static void __mmu_config_restore(struct mmu_config *config) 493 517 { 494 - write_sysreg(config->hcr, hcr_el2); 495 - 496 518 /* 497 519 * ARM errata 1165522 and 1530923 require TGE to be 1 before 498 520 * we update the guest state. 
··· 1127 1155 bool perm_fail = false; 1128 1156 int ret, idx; 1129 1157 1130 - ret = setup_s1_walk(vcpu, op, &wi, &wr, vaddr); 1158 + wi.regime = compute_translation_regime(vcpu, op); 1159 + wi.as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W); 1160 + wi.pan = (op == OP_AT_S1E1RP || op == OP_AT_S1E1WP) && 1161 + (*vcpu_cpsr(vcpu) & PSR_PAN_BIT); 1162 + 1163 + ret = setup_s1_walk(vcpu, &wi, &wr, vaddr); 1131 1164 if (ret) 1132 1165 goto compute_par; 1133 1166 ··· 1175 1198 } 1176 1199 1177 1200 if (perm_fail) 1178 - fail_s1_walk(&wr, ESR_ELx_FSC_PERM_L(wr.level), false, false); 1201 + fail_s1_walk(&wr, ESR_ELx_FSC_PERM_L(wr.level), false); 1179 1202 1180 1203 compute_par: 1181 1204 return compute_par_s1(vcpu, &wr, wi.regime); ··· 1187 1210 * If the translation is unsuccessful, the value may only contain 1188 1211 * PAR_EL1.F, and cannot be taken at face value. It isn't an 1189 1212 * indication of the translation having failed, only that the fast 1190 - * path did not succeed, *unless* it indicates a S1 permission fault. 1213 + * path did not succeed, *unless* it indicates a S1 permission or 1214 + * access fault. 
1191 1215 */ 1192 1216 static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) 1193 1217 { ··· 1244 1266 __load_stage2(mmu, mmu->arch); 1245 1267 1246 1268 skip_mmu_switch: 1247 - /* Clear TGE, enable S2 translation, we're rolling */ 1248 - write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM, hcr_el2); 1269 + /* Temporarily switch back to guest context */ 1270 + write_sysreg_hcr(vcpu->arch.hcr_el2); 1249 1271 isb(); 1250 1272 1251 1273 switch (op) { ··· 1277 1299 if (!fail) 1278 1300 par = read_sysreg_par(); 1279 1301 1302 + write_sysreg_hcr(HCR_HOST_VHE_FLAGS); 1303 + 1280 1304 if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))) 1281 1305 __mmu_config_restore(&config); 1282 1306 ··· 1293 1313 !(par & SYS_PAR_EL1_S)); 1294 1314 } 1295 1315 1316 + static bool par_check_s1_access_fault(u64 par) 1317 + { 1318 + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); 1319 + 1320 + return ((fst & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_ACCESS && 1321 + !(par & SYS_PAR_EL1_S)); 1322 + } 1323 + 1296 1324 void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) 1297 1325 { 1298 1326 u64 par = __kvm_at_s1e01_fast(vcpu, op, vaddr); 1299 1327 1300 1328 /* 1301 - * If PAR_EL1 reports that AT failed on a S1 permission fault, we 1302 - * know for sure that the PTW was able to walk the S1 tables and 1303 - * there's nothing else to do. 1329 + * If PAR_EL1 reports that AT failed on a S1 permission or access 1330 + * fault, we know for sure that the PTW was able to walk the S1 1331 + * tables and there's nothing else to do. 1304 1332 * 1305 1333 * If AT failed for any other reason, then we must walk the guest S1 1306 1334 * to emulate the instruction. 
1307 1335 */ 1308 - if ((par & SYS_PAR_EL1_F) && !par_check_s1_perm_fault(par)) 1336 + if ((par & SYS_PAR_EL1_F) && 1337 + !par_check_s1_perm_fault(par) && 1338 + !par_check_s1_access_fault(par)) 1309 1339 par = handle_at_slow(vcpu, op, vaddr); 1310 1340 1311 1341 vcpu_write_sys_reg(vcpu, par, PAR_EL1); ··· 1340 1350 if (!vcpu_el2_e2h_is_set(vcpu)) 1341 1351 val |= HCR_NV | HCR_NV1; 1342 1352 1343 - write_sysreg(val, hcr_el2); 1353 + write_sysreg_hcr(val); 1344 1354 isb(); 1345 1355 1346 1356 par = SYS_PAR_EL1_F; ··· 1365 1375 if (!fail) 1366 1376 par = read_sysreg_par(); 1367 1377 1368 - write_sysreg(hcr, hcr_el2); 1378 + write_sysreg_hcr(hcr); 1369 1379 isb(); 1370 1380 } 1371 1381 ··· 1433 1443 1434 1444 par = compute_par_s12(vcpu, par, &out); 1435 1445 vcpu_write_sys_reg(vcpu, par, PAR_EL1); 1446 + } 1447 + 1448 + /* 1449 + * Translate a VA for a given EL in a given translation regime, with 1450 + * or without PAN. This requires wi->{regime, as_el0, pan} to be 1451 + * set. The rest of the wi and wr should be 0-initialised. 1452 + */ 1453 + int __kvm_translate_va(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, 1454 + struct s1_walk_result *wr, u64 va) 1455 + { 1456 + int ret; 1457 + 1458 + ret = setup_s1_walk(vcpu, wi, wr, va); 1459 + if (ret) 1460 + return ret; 1461 + 1462 + if (wr->level == S1_MMU_DISABLED) { 1463 + wr->ur = wr->uw = wr->ux = true; 1464 + wr->pr = wr->pw = wr->px = true; 1465 + } else { 1466 + ret = walk_s1(vcpu, wi, wr, va); 1467 + if (ret) 1468 + return ret; 1469 + 1470 + compute_s1_permissions(vcpu, wi, wr); 1471 + } 1472 + 1473 + return 0; 1436 1474 }
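The ASID bookkeeping added to `walk_s1()` above picks the TTBR selected by TCR.A1, extracts the ASID field, and truncates it to 8 bits unless 16-bit ASIDs are both implemented and enabled. The standalone sketch below restates that selection logic. The bit positions are assumptions based on the architectural TCR_EL1/TTBRn_EL1 layout (A1 at bit 22, AS at bit 36, ASID in bits [63:48]); `walk_asid` and the `DEMO_*` constants are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TCR_A1		(1ULL << 22)	/* assumed TCR_A1 */
#define DEMO_TCR_ASID16		(1ULL << 36)	/* assumed TCR_ASID16 (TCR.AS) */
#define DEMO_TTBR_ASID_SHIFT	48		/* assumed ASID field position */

/* Pick the ASID that tags a non-global (nG) translation. */
static uint16_t walk_asid(uint64_t tcr, uint64_t ttbr0, uint64_t ttbr1,
			  bool asid16_implemented)
{
	/* TCR.A1 selects which TTBR supplies the ASID. */
	uint64_t ttbr = (tcr & DEMO_TCR_A1) ? ttbr1 : ttbr0;
	uint16_t asid = (uint16_t)(ttbr >> DEMO_TTBR_ASID_SHIFT);

	/* Without 16-bit ASIDs in effect, only the low 8 bits count. */
	if (!asid16_implemented || !(tcr & DEMO_TCR_ASID16))
		asid &= 0xff;

	return asid;
}
```

With TTBR0 carrying ASID 0x1234 and TTBR1 carrying 0xabcd, a TCR of 0 gives 0x34, while setting both A1 and AS on a system with 16-bit ASIDs gives 0xabcd.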

+1085

arch/arm64/kvm/config.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2025 Google LLC 4 + * Author: Marc Zyngier <maz@kernel.org> 5 + */ 6 + 7 + #include <linux/kvm_host.h> 8 + #include <asm/sysreg.h> 9 + 10 + struct reg_bits_to_feat_map { 11 + u64 bits; 12 + 13 + #define NEVER_FGU BIT(0) /* Can trap, but never UNDEF */ 14 + #define CALL_FUNC BIT(1) /* Needs to evaluate tons of crap */ 15 + #define FIXED_VALUE BIT(2) /* RAZ/WI or RAO/WI in KVM */ 16 + unsigned long flags; 17 + 18 + union { 19 + struct { 20 + u8 regidx; 21 + u8 shift; 22 + u8 width; 23 + bool sign; 24 + s8 lo_lim; 25 + }; 26 + bool (*match)(struct kvm *); 27 + bool (*fval)(struct kvm *, u64 *); 28 + }; 29 + }; 30 + 31 + #define __NEEDS_FEAT_3(m, f, id, fld, lim) \ 32 + { \ 33 + .bits = (m), \ 34 + .flags = (f), \ 35 + .regidx = IDREG_IDX(SYS_ ## id), \ 36 + .shift = id ##_## fld ## _SHIFT, \ 37 + .width = id ##_## fld ## _WIDTH, \ 38 + .sign = id ##_## fld ## _SIGNED, \ 39 + .lo_lim = id ##_## fld ##_## lim \ 40 + } 41 + 42 + #define __NEEDS_FEAT_2(m, f, fun, dummy) \ 43 + { \ 44 + .bits = (m), \ 45 + .flags = (f) | CALL_FUNC, \ 46 + .fval = (fun), \ 47 + } 48 + 49 + #define __NEEDS_FEAT_1(m, f, fun) \ 50 + { \ 51 + .bits = (m), \ 52 + .flags = (f) | CALL_FUNC, \ 53 + .match = (fun), \ 54 + } 55 + 56 + #define NEEDS_FEAT_FLAG(m, f, ...) \ 57 + CONCATENATE(__NEEDS_FEAT_, COUNT_ARGS(__VA_ARGS__))(m, f, __VA_ARGS__) 58 + 59 + #define NEEDS_FEAT_FIXED(m, ...) \ 60 + NEEDS_FEAT_FLAG(m, FIXED_VALUE, __VA_ARGS__, 0) 61 + 62 + #define NEEDS_FEAT(m, ...) 
NEEDS_FEAT_FLAG(m, 0, __VA_ARGS__)
+
+#define FEAT_SPE ID_AA64DFR0_EL1, PMSVer, IMP
+#define FEAT_SPE_FnE ID_AA64DFR0_EL1, PMSVer, V1P2
+#define FEAT_BRBE ID_AA64DFR0_EL1, BRBE, IMP
+#define FEAT_TRC_SR ID_AA64DFR0_EL1, TraceVer, IMP
+#define FEAT_PMUv3 ID_AA64DFR0_EL1, PMUVer, IMP
+#define FEAT_PMUv3p9 ID_AA64DFR0_EL1, PMUVer, V3P9
+#define FEAT_TRBE ID_AA64DFR0_EL1, TraceBuffer, IMP
+#define FEAT_TRBEv1p1 ID_AA64DFR0_EL1, TraceBuffer, TRBE_V1P1
+#define FEAT_DoubleLock ID_AA64DFR0_EL1, DoubleLock, IMP
+#define FEAT_TRF ID_AA64DFR0_EL1, TraceFilt, IMP
+#define FEAT_AA32EL0 ID_AA64PFR0_EL1, EL0, AARCH32
+#define FEAT_AA32EL1 ID_AA64PFR0_EL1, EL1, AARCH32
+#define FEAT_AA64EL1 ID_AA64PFR0_EL1, EL1, IMP
+#define FEAT_AA64EL3 ID_AA64PFR0_EL1, EL3, IMP
+#define FEAT_AIE ID_AA64MMFR3_EL1, AIE, IMP
+#define FEAT_S2POE ID_AA64MMFR3_EL1, S2POE, IMP
+#define FEAT_S1POE ID_AA64MMFR3_EL1, S1POE, IMP
+#define FEAT_S1PIE ID_AA64MMFR3_EL1, S1PIE, IMP
+#define FEAT_THE ID_AA64PFR1_EL1, THE, IMP
+#define FEAT_SME ID_AA64PFR1_EL1, SME, IMP
+#define FEAT_GCS ID_AA64PFR1_EL1, GCS, IMP
+#define FEAT_LS64 ID_AA64ISAR1_EL1, LS64, LS64
+#define FEAT_LS64_V ID_AA64ISAR1_EL1, LS64, LS64_V
+#define FEAT_LS64_ACCDATA ID_AA64ISAR1_EL1, LS64, LS64_ACCDATA
+#define FEAT_RAS ID_AA64PFR0_EL1, RAS, IMP
+#define FEAT_RASv2 ID_AA64PFR0_EL1, RAS, V2
+#define FEAT_GICv3 ID_AA64PFR0_EL1, GIC, IMP
+#define FEAT_LOR ID_AA64MMFR1_EL1, LO, IMP
+#define FEAT_SPEv1p4 ID_AA64DFR0_EL1, PMSVer, V1P4
+#define FEAT_SPEv1p5 ID_AA64DFR0_EL1, PMSVer, V1P5
+#define FEAT_ATS1A ID_AA64ISAR2_EL1, ATS1A, IMP
+#define FEAT_SPECRES2 ID_AA64ISAR1_EL1, SPECRES, COSP_RCTX
+#define FEAT_SPECRES ID_AA64ISAR1_EL1, SPECRES, IMP
+#define FEAT_TLBIRANGE ID_AA64ISAR0_EL1, TLB, RANGE
+#define FEAT_TLBIOS ID_AA64ISAR0_EL1, TLB, OS
+#define FEAT_PAN2 ID_AA64MMFR1_EL1, PAN, PAN2
+#define FEAT_DPB2 ID_AA64ISAR1_EL1, DPB, DPB2
+#define FEAT_AMUv1 ID_AA64PFR0_EL1, AMU, IMP
+#define FEAT_AMUv1p1 ID_AA64PFR0_EL1, AMU, V1P1
+#define FEAT_CMOW ID_AA64MMFR1_EL1, CMOW, IMP
+#define FEAT_D128 ID_AA64MMFR3_EL1, D128, IMP
+#define FEAT_DoubleFault2 ID_AA64PFR1_EL1, DF2, IMP
+#define FEAT_FPMR ID_AA64PFR2_EL1, FPMR, IMP
+#define FEAT_MOPS ID_AA64ISAR2_EL1, MOPS, IMP
+#define FEAT_NMI ID_AA64PFR1_EL1, NMI, IMP
+#define FEAT_SCTLR2 ID_AA64MMFR3_EL1, SCTLRX, IMP
+#define FEAT_SYSREG128 ID_AA64ISAR2_EL1, SYSREG_128, IMP
+#define FEAT_TCR2 ID_AA64MMFR3_EL1, TCRX, IMP
+#define FEAT_XS ID_AA64ISAR1_EL1, XS, IMP
+#define FEAT_EVT ID_AA64MMFR2_EL1, EVT, IMP
+#define FEAT_EVT_TTLBxS ID_AA64MMFR2_EL1, EVT, TTLBxS
+#define FEAT_MTE2 ID_AA64PFR1_EL1, MTE, MTE2
+#define FEAT_RME ID_AA64PFR0_EL1, RME, IMP
+#define FEAT_MPAM ID_AA64PFR0_EL1, MPAM, 1
+#define FEAT_S2FWB ID_AA64MMFR2_EL1, FWB, IMP
+#define FEAT_TME ID_AA64ISAR0_EL1, TME, IMP
+#define FEAT_TWED ID_AA64MMFR1_EL1, TWED, IMP
+#define FEAT_E2H0 ID_AA64MMFR4_EL1, E2H0, IMP
+#define FEAT_SRMASK ID_AA64MMFR4_EL1, SRMASK, IMP
+#define FEAT_PoPS ID_AA64MMFR4_EL1, PoPS, IMP
+#define FEAT_PFAR ID_AA64PFR1_EL1, PFAR, IMP
+#define FEAT_Debugv8p9 ID_AA64DFR0_EL1, PMUVer, V3P9
+#define FEAT_PMUv3_SS ID_AA64DFR0_EL1, PMSS, IMP
+#define FEAT_SEBEP ID_AA64DFR0_EL1, SEBEP, IMP
+#define FEAT_EBEP ID_AA64DFR1_EL1, EBEP, IMP
+#define FEAT_ITE ID_AA64DFR1_EL1, ITE, IMP
+#define FEAT_PMUv3_ICNTR ID_AA64DFR1_EL1, PMICNTR, IMP
+#define FEAT_SPMU ID_AA64DFR1_EL1, SPMU, IMP
+#define FEAT_SPE_nVM ID_AA64DFR2_EL1, SPE_nVM, IMP
+#define FEAT_STEP2 ID_AA64DFR2_EL1, STEP, IMP
+
+static bool not_feat_aa64el3(struct kvm *kvm)
+{
+	return !kvm_has_feat(kvm, FEAT_AA64EL3);
+}
+
+static bool feat_nv2(struct kvm *kvm)
+{
+	return ((kvm_has_feat(kvm, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY) &&
+		 kvm_has_feat_enum(kvm, ID_AA64MMFR2_EL1, NV, NI)) ||
+		kvm_has_feat(kvm, ID_AA64MMFR2_EL1, NV, NV2));
+}
+
+static bool feat_nv2_e2h0_ni(struct kvm *kvm)
+{
+	return feat_nv2(kvm) && !kvm_has_feat(kvm, FEAT_E2H0);
+}
+
+static bool feat_rasv1p1(struct kvm *kvm)
+{
+	return (kvm_has_feat(kvm, ID_AA64PFR0_EL1, RAS, V1P1) ||
+		(kvm_has_feat_enum(kvm, ID_AA64PFR0_EL1, RAS, IMP) &&
+		 kvm_has_feat(kvm, ID_AA64PFR1_EL1, RAS_frac, RASv1p1)));
+}
+
+static bool feat_csv2_2_csv2_1p2(struct kvm *kvm)
+{
+	return (kvm_has_feat(kvm, ID_AA64PFR0_EL1, CSV2, CSV2_2) ||
+		(kvm_has_feat(kvm, ID_AA64PFR1_EL1, CSV2_frac, CSV2_1p2) &&
+		 kvm_has_feat_enum(kvm, ID_AA64PFR0_EL1, CSV2, IMP)));
+}
+
+static bool feat_pauth(struct kvm *kvm)
+{
+	return kvm_has_pauth(kvm, PAuth);
+}
+
+static bool feat_pauth_lr(struct kvm *kvm)
+{
+	return kvm_has_pauth(kvm, PAuth_LR);
+}
+
+static bool feat_aderr(struct kvm *kvm)
+{
+	return (kvm_has_feat(kvm, ID_AA64MMFR3_EL1, ADERR, FEAT_ADERR) &&
+		kvm_has_feat(kvm, ID_AA64MMFR3_EL1, SDERR, FEAT_ADERR));
+}
+
+static bool feat_anerr(struct kvm *kvm)
+{
+	return (kvm_has_feat(kvm, ID_AA64MMFR3_EL1, ANERR, FEAT_ANERR) &&
+		kvm_has_feat(kvm, ID_AA64MMFR3_EL1, SNERR, FEAT_ANERR));
+}
+
+static bool feat_sme_smps(struct kvm *kvm)
+{
+	/*
+	 * Revisit this if KVM ever supports SME -- this really should
+	 * look at the guest's view of SMIDR_EL1. Funnily enough, this
+	 * is not captured in the JSON file, but only as a note in the
+	 * ARM ARM.
+	 */
+	return (kvm_has_feat(kvm, FEAT_SME) &&
+		(read_sysreg_s(SYS_SMIDR_EL1) & SMIDR_EL1_SMPS));
+}
+
+static bool feat_spe_fds(struct kvm *kvm)
+{
+	/*
+	 * Revisit this if KVM ever supports SPE -- this really should
+	 * look at the guest's view of PMSIDR_EL1.
+	 */
+	return (kvm_has_feat(kvm, FEAT_SPEv1p4) &&
+		(read_sysreg_s(SYS_PMSIDR_EL1) & PMSIDR_EL1_FDS));
+}
+
+static bool feat_trbe_mpam(struct kvm *kvm)
+{
+	/*
+	 * Revisit this if KVM ever supports both MPAM and TRBE --
+	 * this really should look at the guest's view of TRBIDR_EL1.
+	 */
+	return (kvm_has_feat(kvm, FEAT_TRBE) &&
+		kvm_has_feat(kvm, FEAT_MPAM) &&
+		(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_MPAM));
+}
+
+static bool feat_ebep_pmuv3_ss(struct kvm *kvm)
+{
+	return kvm_has_feat(kvm, FEAT_EBEP) || kvm_has_feat(kvm, FEAT_PMUv3_SS);
+}
+
+static bool compute_hcr_rw(struct kvm *kvm, u64 *bits)
+{
+	/* This is purely academic: AArch32 and NV are mutually exclusive */
+	if (bits) {
+		if (kvm_has_feat(kvm, FEAT_AA32EL1))
+			*bits &= ~HCR_EL2_RW;
+		else
+			*bits |= HCR_EL2_RW;
+	}
+
+	return true;
+}
+
+static bool compute_hcr_e2h(struct kvm *kvm, u64 *bits)
+{
+	if (bits) {
+		if (kvm_has_feat(kvm, FEAT_E2H0))
+			*bits &= ~HCR_EL2_E2H;
+		else
+			*bits |= HCR_EL2_E2H;
+	}
+
+	return true;
+}
+
+static const struct reg_bits_to_feat_map hfgrtr_feat_map[] = {
+	NEEDS_FEAT(HFGRTR_EL2_nAMAIR2_EL1 |
+		   HFGRTR_EL2_nMAIR2_EL1,
+		   FEAT_AIE),
+	NEEDS_FEAT(HFGRTR_EL2_nS2POR_EL1, FEAT_S2POE),
+	NEEDS_FEAT(HFGRTR_EL2_nPOR_EL1 |
+		   HFGRTR_EL2_nPOR_EL0,
+		   FEAT_S1POE),
+	NEEDS_FEAT(HFGRTR_EL2_nPIR_EL1 |
+		   HFGRTR_EL2_nPIRE0_EL1,
+		   FEAT_S1PIE),
+	NEEDS_FEAT(HFGRTR_EL2_nRCWMASK_EL1, FEAT_THE),
+	NEEDS_FEAT(HFGRTR_EL2_nTPIDR2_EL0 |
+		   HFGRTR_EL2_nSMPRI_EL1,
+		   FEAT_SME),
+	NEEDS_FEAT(HFGRTR_EL2_nGCS_EL1 |
+		   HFGRTR_EL2_nGCS_EL0,
+		   FEAT_GCS),
+	NEEDS_FEAT(HFGRTR_EL2_nACCDATA_EL1, FEAT_LS64_ACCDATA),
+	NEEDS_FEAT(HFGRTR_EL2_ERXADDR_EL1 |
+		   HFGRTR_EL2_ERXMISCn_EL1 |
+		   HFGRTR_EL2_ERXSTATUS_EL1 |
+		   HFGRTR_EL2_ERXCTLR_EL1 |
+		   HFGRTR_EL2_ERXFR_EL1 |
+		   HFGRTR_EL2_ERRSELR_EL1 |
+		   HFGRTR_EL2_ERRIDR_EL1,
+		   FEAT_RAS),
+	NEEDS_FEAT(HFGRTR_EL2_ERXPFGCDN_EL1 |
+		   HFGRTR_EL2_ERXPFGCTL_EL1 |
+		   HFGRTR_EL2_ERXPFGF_EL1,
+		   feat_rasv1p1),
+	NEEDS_FEAT(HFGRTR_EL2_ICC_IGRPENn_EL1, FEAT_GICv3),
+	NEEDS_FEAT(HFGRTR_EL2_SCXTNUM_EL0 |
+		   HFGRTR_EL2_SCXTNUM_EL1,
+		   feat_csv2_2_csv2_1p2),
+	NEEDS_FEAT(HFGRTR_EL2_LORSA_EL1 |
+		   HFGRTR_EL2_LORN_EL1 |
+		   HFGRTR_EL2_LORID_EL1 |
+		   HFGRTR_EL2_LOREA_EL1 |
+		   HFGRTR_EL2_LORC_EL1,
+		   FEAT_LOR),
+	NEEDS_FEAT(HFGRTR_EL2_APIBKey |
+		   HFGRTR_EL2_APIAKey |
+		   HFGRTR_EL2_APGAKey |
+		   HFGRTR_EL2_APDBKey |
+		   HFGRTR_EL2_APDAKey,
+		   feat_pauth),
+	NEEDS_FEAT_FLAG(HFGRTR_EL2_VBAR_EL1 |
+			HFGRTR_EL2_TTBR1_EL1 |
+			HFGRTR_EL2_TTBR0_EL1 |
+			HFGRTR_EL2_TPIDR_EL0 |
+			HFGRTR_EL2_TPIDRRO_EL0 |
+			HFGRTR_EL2_TPIDR_EL1 |
+			HFGRTR_EL2_TCR_EL1 |
+			HFGRTR_EL2_SCTLR_EL1 |
+			HFGRTR_EL2_REVIDR_EL1 |
+			HFGRTR_EL2_PAR_EL1 |
+			HFGRTR_EL2_MPIDR_EL1 |
+			HFGRTR_EL2_MIDR_EL1 |
+			HFGRTR_EL2_MAIR_EL1 |
+			HFGRTR_EL2_ISR_EL1 |
+			HFGRTR_EL2_FAR_EL1 |
+			HFGRTR_EL2_ESR_EL1 |
+			HFGRTR_EL2_DCZID_EL0 |
+			HFGRTR_EL2_CTR_EL0 |
+			HFGRTR_EL2_CSSELR_EL1 |
+			HFGRTR_EL2_CPACR_EL1 |
+			HFGRTR_EL2_CONTEXTIDR_EL1|
+			HFGRTR_EL2_CLIDR_EL1 |
+			HFGRTR_EL2_CCSIDR_EL1 |
+			HFGRTR_EL2_AMAIR_EL1 |
+			HFGRTR_EL2_AIDR_EL1 |
+			HFGRTR_EL2_AFSR1_EL1 |
+			HFGRTR_EL2_AFSR0_EL1,
+			NEVER_FGU, FEAT_AA64EL1),
+};
+
+static const struct reg_bits_to_feat_map hfgwtr_feat_map[] = {
+	NEEDS_FEAT(HFGWTR_EL2_nAMAIR2_EL1 |
+		   HFGWTR_EL2_nMAIR2_EL1,
+		   FEAT_AIE),
+	NEEDS_FEAT(HFGWTR_EL2_nS2POR_EL1, FEAT_S2POE),
+	NEEDS_FEAT(HFGWTR_EL2_nPOR_EL1 |
+		   HFGWTR_EL2_nPOR_EL0,
+		   FEAT_S1POE),
+	NEEDS_FEAT(HFGWTR_EL2_nPIR_EL1 |
+		   HFGWTR_EL2_nPIRE0_EL1,
+		   FEAT_S1PIE),
+	NEEDS_FEAT(HFGWTR_EL2_nRCWMASK_EL1, FEAT_THE),
+	NEEDS_FEAT(HFGWTR_EL2_nTPIDR2_EL0 |
+		   HFGWTR_EL2_nSMPRI_EL1,
+		   FEAT_SME),
+	NEEDS_FEAT(HFGWTR_EL2_nGCS_EL1 |
+		   HFGWTR_EL2_nGCS_EL0,
+		   FEAT_GCS),
+	NEEDS_FEAT(HFGWTR_EL2_nACCDATA_EL1, FEAT_LS64_ACCDATA),
+	NEEDS_FEAT(HFGWTR_EL2_ERXADDR_EL1 |
+		   HFGWTR_EL2_ERXMISCn_EL1 |
+		   HFGWTR_EL2_ERXSTATUS_EL1 |
+		   HFGWTR_EL2_ERXCTLR_EL1 |
+		   HFGWTR_EL2_ERRSELR_EL1,
+		   FEAT_RAS),
+	NEEDS_FEAT(HFGWTR_EL2_ERXPFGCDN_EL1 |
+		   HFGWTR_EL2_ERXPFGCTL_EL1,
+		   feat_rasv1p1),
+	NEEDS_FEAT(HFGWTR_EL2_ICC_IGRPENn_EL1, FEAT_GICv3),
+	NEEDS_FEAT(HFGWTR_EL2_SCXTNUM_EL0 |
+		   HFGWTR_EL2_SCXTNUM_EL1,
+		   feat_csv2_2_csv2_1p2),
+	NEEDS_FEAT(HFGWTR_EL2_LORSA_EL1 |
+		   HFGWTR_EL2_LORN_EL1 |
+		   HFGWTR_EL2_LOREA_EL1 |
+		   HFGWTR_EL2_LORC_EL1,
+		   FEAT_LOR),
+	NEEDS_FEAT(HFGWTR_EL2_APIBKey |
+		   HFGWTR_EL2_APIAKey |
+		   HFGWTR_EL2_APGAKey |
+		   HFGWTR_EL2_APDBKey |
+		   HFGWTR_EL2_APDAKey,
+		   feat_pauth),
+	NEEDS_FEAT_FLAG(HFGWTR_EL2_VBAR_EL1 |
+			HFGWTR_EL2_TTBR1_EL1 |
+			HFGWTR_EL2_TTBR0_EL1 |
+			HFGWTR_EL2_TPIDR_EL0 |
+			HFGWTR_EL2_TPIDRRO_EL0 |
+			HFGWTR_EL2_TPIDR_EL1 |
+			HFGWTR_EL2_TCR_EL1 |
+			HFGWTR_EL2_SCTLR_EL1 |
+			HFGWTR_EL2_PAR_EL1 |
+			HFGWTR_EL2_MAIR_EL1 |
+			HFGWTR_EL2_FAR_EL1 |
+			HFGWTR_EL2_ESR_EL1 |
+			HFGWTR_EL2_CSSELR_EL1 |
+			HFGWTR_EL2_CPACR_EL1 |
+			HFGWTR_EL2_CONTEXTIDR_EL1|
+			HFGWTR_EL2_AMAIR_EL1 |
+			HFGWTR_EL2_AFSR1_EL1 |
+			HFGWTR_EL2_AFSR0_EL1,
+			NEVER_FGU, FEAT_AA64EL1),
+};
+
+static const struct reg_bits_to_feat_map hdfgrtr_feat_map[] = {
+	NEEDS_FEAT(HDFGRTR_EL2_PMBIDR_EL1 |
+		   HDFGRTR_EL2_PMSLATFR_EL1 |
+		   HDFGRTR_EL2_PMSIRR_EL1 |
+		   HDFGRTR_EL2_PMSIDR_EL1 |
+		   HDFGRTR_EL2_PMSICR_EL1 |
+		   HDFGRTR_EL2_PMSFCR_EL1 |
+		   HDFGRTR_EL2_PMSEVFR_EL1 |
+		   HDFGRTR_EL2_PMSCR_EL1 |
+		   HDFGRTR_EL2_PMBSR_EL1 |
+		   HDFGRTR_EL2_PMBPTR_EL1 |
+		   HDFGRTR_EL2_PMBLIMITR_EL1,
+		   FEAT_SPE),
+	NEEDS_FEAT(HDFGRTR_EL2_nPMSNEVFR_EL1, FEAT_SPE_FnE),
+	NEEDS_FEAT(HDFGRTR_EL2_nBRBDATA |
+		   HDFGRTR_EL2_nBRBCTL |
+		   HDFGRTR_EL2_nBRBIDR,
+		   FEAT_BRBE),
+	NEEDS_FEAT(HDFGRTR_EL2_TRCVICTLR |
+		   HDFGRTR_EL2_TRCSTATR |
+		   HDFGRTR_EL2_TRCSSCSRn |
+		   HDFGRTR_EL2_TRCSEQSTR |
+		   HDFGRTR_EL2_TRCPRGCTLR |
+		   HDFGRTR_EL2_TRCOSLSR |
+		   HDFGRTR_EL2_TRCIMSPECn |
+		   HDFGRTR_EL2_TRCID |
+		   HDFGRTR_EL2_TRCCNTVRn |
+		   HDFGRTR_EL2_TRCCLAIM |
+		   HDFGRTR_EL2_TRCAUXCTLR |
+		   HDFGRTR_EL2_TRCAUTHSTATUS |
+		   HDFGRTR_EL2_TRC,
+		   FEAT_TRC_SR),
+	NEEDS_FEAT(HDFGRTR_EL2_PMCEIDn_EL0 |
+		   HDFGRTR_EL2_PMUSERENR_EL0 |
+		   HDFGRTR_EL2_PMMIR_EL1 |
+		   HDFGRTR_EL2_PMSELR_EL0 |
+		   HDFGRTR_EL2_PMOVS |
+		   HDFGRTR_EL2_PMINTEN |
+		   HDFGRTR_EL2_PMCNTEN |
+		   HDFGRTR_EL2_PMCCNTR_EL0 |
+		   HDFGRTR_EL2_PMCCFILTR_EL0 |
+		   HDFGRTR_EL2_PMEVTYPERn_EL0 |
+		   HDFGRTR_EL2_PMEVCNTRn_EL0,
+		   FEAT_PMUv3),
+	NEEDS_FEAT(HDFGRTR_EL2_TRBTRG_EL1 |
+		   HDFGRTR_EL2_TRBSR_EL1 |
+		   HDFGRTR_EL2_TRBPTR_EL1 |
+		   HDFGRTR_EL2_TRBMAR_EL1 |
+		   HDFGRTR_EL2_TRBLIMITR_EL1 |
+		   HDFGRTR_EL2_TRBIDR_EL1 |
+		   HDFGRTR_EL2_TRBBASER_EL1,
+		   FEAT_TRBE),
+	NEEDS_FEAT_FLAG(HDFGRTR_EL2_OSDLR_EL1, NEVER_FGU,
+			FEAT_DoubleLock),
+	NEEDS_FEAT_FLAG(HDFGRTR_EL2_OSECCR_EL1 |
+			HDFGRTR_EL2_OSLSR_EL1 |
+			HDFGRTR_EL2_DBGPRCR_EL1 |
+			HDFGRTR_EL2_DBGAUTHSTATUS_EL1|
+			HDFGRTR_EL2_DBGCLAIM |
+			HDFGRTR_EL2_MDSCR_EL1 |
+			HDFGRTR_EL2_DBGWVRn_EL1 |
+			HDFGRTR_EL2_DBGWCRn_EL1 |
+			HDFGRTR_EL2_DBGBVRn_EL1 |
+			HDFGRTR_EL2_DBGBCRn_EL1,
+			NEVER_FGU, FEAT_AA64EL1)
+};
+
+static const struct reg_bits_to_feat_map hdfgwtr_feat_map[] = {
+	NEEDS_FEAT(HDFGWTR_EL2_PMSLATFR_EL1 |
+		   HDFGWTR_EL2_PMSIRR_EL1 |
+		   HDFGWTR_EL2_PMSICR_EL1 |
+		   HDFGWTR_EL2_PMSFCR_EL1 |
+		   HDFGWTR_EL2_PMSEVFR_EL1 |
+		   HDFGWTR_EL2_PMSCR_EL1 |
+		   HDFGWTR_EL2_PMBSR_EL1 |
+		   HDFGWTR_EL2_PMBPTR_EL1 |
+		   HDFGWTR_EL2_PMBLIMITR_EL1,
+		   FEAT_SPE),
+	NEEDS_FEAT(HDFGWTR_EL2_nPMSNEVFR_EL1, FEAT_SPE_FnE),
+	NEEDS_FEAT(HDFGWTR_EL2_nBRBDATA |
+		   HDFGWTR_EL2_nBRBCTL,
+		   FEAT_BRBE),
+	NEEDS_FEAT(HDFGWTR_EL2_TRCVICTLR |
+		   HDFGWTR_EL2_TRCSSCSRn |
+		   HDFGWTR_EL2_TRCSEQSTR |
+		   HDFGWTR_EL2_TRCPRGCTLR |
+		   HDFGWTR_EL2_TRCOSLAR |
+		   HDFGWTR_EL2_TRCIMSPECn |
+		   HDFGWTR_EL2_TRCCNTVRn |
+		   HDFGWTR_EL2_TRCCLAIM |
+		   HDFGWTR_EL2_TRCAUXCTLR |
+		   HDFGWTR_EL2_TRC,
+		   FEAT_TRC_SR),
+	NEEDS_FEAT(HDFGWTR_EL2_PMUSERENR_EL0 |
+		   HDFGWTR_EL2_PMCR_EL0 |
+		   HDFGWTR_EL2_PMSWINC_EL0 |
+		   HDFGWTR_EL2_PMSELR_EL0 |
+		   HDFGWTR_EL2_PMOVS |
+		   HDFGWTR_EL2_PMINTEN |
+		   HDFGWTR_EL2_PMCNTEN |
+		   HDFGWTR_EL2_PMCCNTR_EL0 |
+		   HDFGWTR_EL2_PMCCFILTR_EL0 |
+		   HDFGWTR_EL2_PMEVTYPERn_EL0 |
+		   HDFGWTR_EL2_PMEVCNTRn_EL0,
+		   FEAT_PMUv3),
+	NEEDS_FEAT(HDFGWTR_EL2_TRBTRG_EL1 |
+		   HDFGWTR_EL2_TRBSR_EL1 |
+		   HDFGWTR_EL2_TRBPTR_EL1 |
+		   HDFGWTR_EL2_TRBMAR_EL1 |
+		   HDFGWTR_EL2_TRBLIMITR_EL1 |
+		   HDFGWTR_EL2_TRBBASER_EL1,
+		   FEAT_TRBE),
+	NEEDS_FEAT_FLAG(HDFGWTR_EL2_OSDLR_EL1,
+			NEVER_FGU, FEAT_DoubleLock),
+	NEEDS_FEAT_FLAG(HDFGWTR_EL2_OSECCR_EL1 |
+			HDFGWTR_EL2_OSLAR_EL1 |
+			HDFGWTR_EL2_DBGPRCR_EL1 |
+			HDFGWTR_EL2_DBGCLAIM |
+			HDFGWTR_EL2_MDSCR_EL1 |
+			HDFGWTR_EL2_DBGWVRn_EL1 |
+			HDFGWTR_EL2_DBGWCRn_EL1 |
+			HDFGWTR_EL2_DBGBVRn_EL1 |
+			HDFGWTR_EL2_DBGBCRn_EL1,
+			NEVER_FGU, FEAT_AA64EL1),
+	NEEDS_FEAT(HDFGWTR_EL2_TRFCR_EL1, FEAT_TRF),
+};
+
+
+static const struct reg_bits_to_feat_map hfgitr_feat_map[] = {
+	NEEDS_FEAT(HFGITR_EL2_PSBCSYNC, FEAT_SPEv1p5),
+	NEEDS_FEAT(HFGITR_EL2_ATS1E1A, FEAT_ATS1A),
+	NEEDS_FEAT(HFGITR_EL2_COSPRCTX, FEAT_SPECRES2),
+	NEEDS_FEAT(HFGITR_EL2_nGCSEPP |
+		   HFGITR_EL2_nGCSSTR_EL1 |
+		   HFGITR_EL2_nGCSPUSHM_EL1,
+		   FEAT_GCS),
+	NEEDS_FEAT(HFGITR_EL2_nBRBIALL |
+		   HFGITR_EL2_nBRBINJ,
+		   FEAT_BRBE),
+	NEEDS_FEAT(HFGITR_EL2_CPPRCTX |
+		   HFGITR_EL2_DVPRCTX |
+		   HFGITR_EL2_CFPRCTX,
+		   FEAT_SPECRES),
+	NEEDS_FEAT(HFGITR_EL2_TLBIRVAALE1 |
+		   HFGITR_EL2_TLBIRVALE1 |
+		   HFGITR_EL2_TLBIRVAAE1 |
+		   HFGITR_EL2_TLBIRVAE1 |
+		   HFGITR_EL2_TLBIRVAALE1IS |
+		   HFGITR_EL2_TLBIRVALE1IS |
+		   HFGITR_EL2_TLBIRVAAE1IS |
+		   HFGITR_EL2_TLBIRVAE1IS |
+		   HFGITR_EL2_TLBIRVAALE1OS |
+		   HFGITR_EL2_TLBIRVALE1OS |
+		   HFGITR_EL2_TLBIRVAAE1OS |
+		   HFGITR_EL2_TLBIRVAE1OS,
+		   FEAT_TLBIRANGE),
+	NEEDS_FEAT(HFGITR_EL2_TLBIVAALE1OS |
+		   HFGITR_EL2_TLBIVALE1OS |
+		   HFGITR_EL2_TLBIVAAE1OS |
+		   HFGITR_EL2_TLBIASIDE1OS |
+		   HFGITR_EL2_TLBIVAE1OS |
+		   HFGITR_EL2_TLBIVMALLE1OS,
+		   FEAT_TLBIOS),
+	NEEDS_FEAT(HFGITR_EL2_ATS1E1WP |
+		   HFGITR_EL2_ATS1E1RP,
+		   FEAT_PAN2),
+	NEEDS_FEAT(HFGITR_EL2_DCCVADP, FEAT_DPB2),
+	NEEDS_FEAT_FLAG(HFGITR_EL2_DCCVAC |
+			HFGITR_EL2_SVC_EL1 |
+			HFGITR_EL2_SVC_EL0 |
+			HFGITR_EL2_ERET |
+			HFGITR_EL2_TLBIVAALE1 |
+			HFGITR_EL2_TLBIVALE1 |
+			HFGITR_EL2_TLBIVAAE1 |
+			HFGITR_EL2_TLBIASIDE1 |
+			HFGITR_EL2_TLBIVAE1 |
+			HFGITR_EL2_TLBIVMALLE1 |
+			HFGITR_EL2_TLBIVAALE1IS |
+			HFGITR_EL2_TLBIVALE1IS |
+			HFGITR_EL2_TLBIVAAE1IS |
+			HFGITR_EL2_TLBIASIDE1IS |
+			HFGITR_EL2_TLBIVAE1IS |
+			HFGITR_EL2_TLBIVMALLE1IS|
+			HFGITR_EL2_ATS1E0W |
+			HFGITR_EL2_ATS1E0R |
+			HFGITR_EL2_ATS1E1W |
+			HFGITR_EL2_ATS1E1R |
+			HFGITR_EL2_DCZVA |
+			HFGITR_EL2_DCCIVAC |
+			HFGITR_EL2_DCCVAP |
+			HFGITR_EL2_DCCVAU |
+			HFGITR_EL2_DCCISW |
+			HFGITR_EL2_DCCSW |
+			HFGITR_EL2_DCISW |
+			HFGITR_EL2_DCIVAC |
+			HFGITR_EL2_ICIVAU |
+			HFGITR_EL2_ICIALLU |
+			HFGITR_EL2_ICIALLUIS,
+			NEVER_FGU, FEAT_AA64EL1),
+};
+
+static const struct reg_bits_to_feat_map hafgrtr_feat_map[] = {
+	NEEDS_FEAT(HAFGRTR_EL2_AMEVTYPER115_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER114_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER113_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER112_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER111_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER110_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER19_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER18_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER17_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER16_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER15_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER14_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER13_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER12_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER11_EL0 |
+		   HAFGRTR_EL2_AMEVTYPER10_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR115_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR114_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR113_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR112_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR111_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR110_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR19_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR18_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR17_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR16_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR15_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR14_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR13_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR12_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR11_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR10_EL0 |
+		   HAFGRTR_EL2_AMCNTEN1 |
+		   HAFGRTR_EL2_AMCNTEN0 |
+		   HAFGRTR_EL2_AMEVCNTR03_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR02_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR01_EL0 |
+		   HAFGRTR_EL2_AMEVCNTR00_EL0,
+		   FEAT_AMUv1),
+};
+
+static const struct reg_bits_to_feat_map hfgitr2_feat_map[] = {
+	NEEDS_FEAT(HFGITR2_EL2_nDCCIVAPS, FEAT_PoPS),
+	NEEDS_FEAT(HFGITR2_EL2_TSBCSYNC, FEAT_TRBEv1p1)
+};
+
+static const struct reg_bits_to_feat_map hfgrtr2_feat_map[] = {
+	NEEDS_FEAT(HFGRTR2_EL2_nPFAR_EL1, FEAT_PFAR),
+	NEEDS_FEAT(HFGRTR2_EL2_nERXGSR_EL1, FEAT_RASv2),
+	NEEDS_FEAT(HFGRTR2_EL2_nACTLRALIAS_EL1 |
+		   HFGRTR2_EL2_nACTLRMASK_EL1 |
+		   HFGRTR2_EL2_nCPACRALIAS_EL1 |
+		   HFGRTR2_EL2_nCPACRMASK_EL1 |
+		   HFGRTR2_EL2_nSCTLR2MASK_EL1 |
+		   HFGRTR2_EL2_nSCTLRALIAS2_EL1 |
+		   HFGRTR2_EL2_nSCTLRALIAS_EL1 |
+		   HFGRTR2_EL2_nSCTLRMASK_EL1 |
+		   HFGRTR2_EL2_nTCR2ALIAS_EL1 |
+		   HFGRTR2_EL2_nTCR2MASK_EL1 |
+		   HFGRTR2_EL2_nTCRALIAS_EL1 |
+		   HFGRTR2_EL2_nTCRMASK_EL1,
+		   FEAT_SRMASK),
+	NEEDS_FEAT(HFGRTR2_EL2_nRCWSMASK_EL1, FEAT_THE),
+};
+
+static const struct reg_bits_to_feat_map hfgwtr2_feat_map[] = {
+	NEEDS_FEAT(HFGWTR2_EL2_nPFAR_EL1, FEAT_PFAR),
+	NEEDS_FEAT(HFGWTR2_EL2_nACTLRALIAS_EL1 |
+		   HFGWTR2_EL2_nACTLRMASK_EL1 |
+		   HFGWTR2_EL2_nCPACRALIAS_EL1 |
+		   HFGWTR2_EL2_nCPACRMASK_EL1 |
+		   HFGWTR2_EL2_nSCTLR2MASK_EL1 |
+		   HFGWTR2_EL2_nSCTLRALIAS2_EL1 |
+		   HFGWTR2_EL2_nSCTLRALIAS_EL1 |
+		   HFGWTR2_EL2_nSCTLRMASK_EL1 |
+		   HFGWTR2_EL2_nTCR2ALIAS_EL1 |
+		   HFGWTR2_EL2_nTCR2MASK_EL1 |
+		   HFGWTR2_EL2_nTCRALIAS_EL1 |
+		   HFGWTR2_EL2_nTCRMASK_EL1,
+		   FEAT_SRMASK),
+	NEEDS_FEAT(HFGWTR2_EL2_nRCWSMASK_EL1, FEAT_THE),
+};
+
+static const struct reg_bits_to_feat_map hdfgrtr2_feat_map[] = {
+	NEEDS_FEAT(HDFGRTR2_EL2_nMDSELR_EL1, FEAT_Debugv8p9),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMECR_EL1, feat_ebep_pmuv3_ss),
+	NEEDS_FEAT(HDFGRTR2_EL2_nTRCITECR_EL1, FEAT_ITE),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMICFILTR_EL0 |
+		   HDFGRTR2_EL2_nPMICNTR_EL0,
+		   FEAT_PMUv3_ICNTR),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMUACR_EL1, FEAT_PMUv3p9),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMSSCR_EL1 |
+		   HDFGRTR2_EL2_nPMSSDATA,
+		   FEAT_PMUv3_SS),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMIAR_EL1, FEAT_SEBEP),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMSDSFR_EL1, feat_spe_fds),
+	NEEDS_FEAT(HDFGRTR2_EL2_nPMBMAR_EL1, FEAT_SPE_nVM),
+	NEEDS_FEAT(HDFGRTR2_EL2_nSPMACCESSR_EL1 |
+		   HDFGRTR2_EL2_nSPMCNTEN |
+		   HDFGRTR2_EL2_nSPMCR_EL0 |
+		   HDFGRTR2_EL2_nSPMDEVAFF_EL1 |
+		   HDFGRTR2_EL2_nSPMEVCNTRn_EL0 |
+		   HDFGRTR2_EL2_nSPMEVTYPERn_EL0|
+		   HDFGRTR2_EL2_nSPMID |
+		   HDFGRTR2_EL2_nSPMINTEN |
+		   HDFGRTR2_EL2_nSPMOVS |
+		   HDFGRTR2_EL2_nSPMSCR_EL1 |
+		   HDFGRTR2_EL2_nSPMSELR_EL0,
+		   FEAT_SPMU),
+	NEEDS_FEAT(HDFGRTR2_EL2_nMDSTEPOP_EL1, FEAT_STEP2),
+	NEEDS_FEAT(HDFGRTR2_EL2_nTRBMPAM_EL1, feat_trbe_mpam),
+};
+
+static const struct reg_bits_to_feat_map hdfgwtr2_feat_map[] = {
+	NEEDS_FEAT(HDFGWTR2_EL2_nMDSELR_EL1, FEAT_Debugv8p9),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMECR_EL1, feat_ebep_pmuv3_ss),
+	NEEDS_FEAT(HDFGWTR2_EL2_nTRCITECR_EL1, FEAT_ITE),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMICFILTR_EL0 |
+		   HDFGWTR2_EL2_nPMICNTR_EL0,
+		   FEAT_PMUv3_ICNTR),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMUACR_EL1 |
+		   HDFGWTR2_EL2_nPMZR_EL0,
+		   FEAT_PMUv3p9),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMSSCR_EL1, FEAT_PMUv3_SS),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMIAR_EL1, FEAT_SEBEP),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMSDSFR_EL1, feat_spe_fds),
+	NEEDS_FEAT(HDFGWTR2_EL2_nPMBMAR_EL1, FEAT_SPE_nVM),
+	NEEDS_FEAT(HDFGWTR2_EL2_nSPMACCESSR_EL1 |
+		   HDFGWTR2_EL2_nSPMCNTEN |
+		   HDFGWTR2_EL2_nSPMCR_EL0 |
+		   HDFGWTR2_EL2_nSPMEVCNTRn_EL0 |
+		   HDFGWTR2_EL2_nSPMEVTYPERn_EL0|
+		   HDFGWTR2_EL2_nSPMINTEN |
+		   HDFGWTR2_EL2_nSPMOVS |
+		   HDFGWTR2_EL2_nSPMSCR_EL1 |
+		   HDFGWTR2_EL2_nSPMSELR_EL0,
+		   FEAT_SPMU),
+	NEEDS_FEAT(HDFGWTR2_EL2_nMDSTEPOP_EL1, FEAT_STEP2),
+	NEEDS_FEAT(HDFGWTR2_EL2_nTRBMPAM_EL1, feat_trbe_mpam),
+};
+
+static const struct reg_bits_to_feat_map hcrx_feat_map[] = {
+	NEEDS_FEAT(HCRX_EL2_PACMEn, feat_pauth_lr),
+	NEEDS_FEAT(HCRX_EL2_EnFPM, FEAT_FPMR),
+	NEEDS_FEAT(HCRX_EL2_GCSEn, FEAT_GCS),
+	NEEDS_FEAT(HCRX_EL2_EnIDCP128, FEAT_SYSREG128),
+	NEEDS_FEAT(HCRX_EL2_EnSDERR, feat_aderr),
+	NEEDS_FEAT(HCRX_EL2_TMEA, FEAT_DoubleFault2),
+	NEEDS_FEAT(HCRX_EL2_EnSNERR, feat_anerr),
+	NEEDS_FEAT(HCRX_EL2_D128En, FEAT_D128),
+	NEEDS_FEAT(HCRX_EL2_PTTWI, FEAT_THE),
+	NEEDS_FEAT(HCRX_EL2_SCTLR2En, FEAT_SCTLR2),
+	NEEDS_FEAT(HCRX_EL2_TCR2En, FEAT_TCR2),
+	NEEDS_FEAT(HCRX_EL2_MSCEn |
+		   HCRX_EL2_MCE2,
+		   FEAT_MOPS),
+	NEEDS_FEAT(HCRX_EL2_CMOW, FEAT_CMOW),
+	NEEDS_FEAT(HCRX_EL2_VFNMI |
+		   HCRX_EL2_VINMI |
+		   HCRX_EL2_TALLINT,
+		   FEAT_NMI),
+	NEEDS_FEAT(HCRX_EL2_SMPME, feat_sme_smps),
+	NEEDS_FEAT(HCRX_EL2_FGTnXS |
+		   HCRX_EL2_FnXS,
+		   FEAT_XS),
+	NEEDS_FEAT(HCRX_EL2_EnASR, FEAT_LS64_V),
+	NEEDS_FEAT(HCRX_EL2_EnALS, FEAT_LS64),
+	NEEDS_FEAT(HCRX_EL2_EnAS0, FEAT_LS64_ACCDATA),
+};
+
+static const struct reg_bits_to_feat_map hcr_feat_map[] = {
+	NEEDS_FEAT(HCR_EL2_TID0, FEAT_AA32EL0),
+	NEEDS_FEAT_FIXED(HCR_EL2_RW, compute_hcr_rw),
+	NEEDS_FEAT(HCR_EL2_HCD, not_feat_aa64el3),
+	NEEDS_FEAT(HCR_EL2_AMO |
+		   HCR_EL2_BSU |
+		   HCR_EL2_CD |
+		   HCR_EL2_DC |
+		   HCR_EL2_FB |
+		   HCR_EL2_FMO |
+		   HCR_EL2_ID |
+		   HCR_EL2_IMO |
+		   HCR_EL2_MIOCNCE |
+		   HCR_EL2_PTW |
+		   HCR_EL2_SWIO |
+		   HCR_EL2_TACR |
+		   HCR_EL2_TDZ |
+		   HCR_EL2_TGE |
+		   HCR_EL2_TID1 |
+		   HCR_EL2_TID2 |
+		   HCR_EL2_TID3 |
+		   HCR_EL2_TIDCP |
+		   HCR_EL2_TPCP |
+		   HCR_EL2_TPU |
+		   HCR_EL2_TRVM |
+		   HCR_EL2_TSC |
+		   HCR_EL2_TSW |
+		   HCR_EL2_TTLB |
+		   HCR_EL2_TVM |
+		   HCR_EL2_TWE |
+		   HCR_EL2_TWI |
+		   HCR_EL2_VF |
+		   HCR_EL2_VI |
+		   HCR_EL2_VM |
+		   HCR_EL2_VSE,
+		   FEAT_AA64EL1),
+	NEEDS_FEAT(HCR_EL2_AMVOFFEN, FEAT_AMUv1p1),
+	NEEDS_FEAT(HCR_EL2_EnSCXT, feat_csv2_2_csv2_1p2),
+	NEEDS_FEAT(HCR_EL2_TICAB |
+		   HCR_EL2_TID4 |
+		   HCR_EL2_TOCU,
+		   FEAT_EVT),
+	NEEDS_FEAT(HCR_EL2_TTLBIS |
+		   HCR_EL2_TTLBOS,
+		   FEAT_EVT_TTLBxS),
+	NEEDS_FEAT(HCR_EL2_TLOR, FEAT_LOR),
+	NEEDS_FEAT(HCR_EL2_ATA |
+		   HCR_EL2_DCT |
+		   HCR_EL2_TID5,
+		   FEAT_MTE2),
+	NEEDS_FEAT(HCR_EL2_AT | /* Ignore the original FEAT_NV */
+		   HCR_EL2_NV2 |
+		   HCR_EL2_NV,
+		   feat_nv2),
+	NEEDS_FEAT(HCR_EL2_NV1, feat_nv2_e2h0_ni), /* Missing from JSON */
+	NEEDS_FEAT(HCR_EL2_API |
+		   HCR_EL2_APK,
+		   feat_pauth),
+	NEEDS_FEAT(HCR_EL2_TEA |
+		   HCR_EL2_TERR,
+		   FEAT_RAS),
+	NEEDS_FEAT(HCR_EL2_FIEN, feat_rasv1p1),
+	NEEDS_FEAT(HCR_EL2_GPF, FEAT_RME),
+	NEEDS_FEAT(HCR_EL2_FWB, FEAT_S2FWB),
+	NEEDS_FEAT(HCR_EL2_TME, FEAT_TME),
+	NEEDS_FEAT(HCR_EL2_TWEDEL |
+		   HCR_EL2_TWEDEn,
+		   FEAT_TWED),
+	NEEDS_FEAT_FIXED(HCR_EL2_E2H, compute_hcr_e2h),
+};
+
+static void __init check_feat_map(const struct reg_bits_to_feat_map *map,
+				  int map_size, u64 res0, const char *str)
+{
+	u64 mask = 0;
+
+	for (int i = 0; i < map_size; i++)
+		mask |= map[i].bits;
+
+	if (mask != ~res0)
+		kvm_err("Undefined %s behaviour, bits %016llx\n",
+			str, mask ^ ~res0);
+}
+
+void __init check_feature_map(void)
+{
+	check_feat_map(hfgrtr_feat_map, ARRAY_SIZE(hfgrtr_feat_map),
+		       hfgrtr_masks.res0, hfgrtr_masks.str);
+	check_feat_map(hfgwtr_feat_map, ARRAY_SIZE(hfgwtr_feat_map),
+		       hfgwtr_masks.res0, hfgwtr_masks.str);
+	check_feat_map(hfgitr_feat_map, ARRAY_SIZE(hfgitr_feat_map),
+		       hfgitr_masks.res0, hfgitr_masks.str);
+	check_feat_map(hdfgrtr_feat_map, ARRAY_SIZE(hdfgrtr_feat_map),
+		       hdfgrtr_masks.res0, hdfgrtr_masks.str);
+	check_feat_map(hdfgwtr_feat_map, ARRAY_SIZE(hdfgwtr_feat_map),
+		       hdfgwtr_masks.res0, hdfgwtr_masks.str);
+	check_feat_map(hafgrtr_feat_map, ARRAY_SIZE(hafgrtr_feat_map),
+		       hafgrtr_masks.res0, hafgrtr_masks.str);
+	check_feat_map(hcrx_feat_map, ARRAY_SIZE(hcrx_feat_map),
+		       __HCRX_EL2_RES0, "HCRX_EL2");
+	check_feat_map(hcr_feat_map, ARRAY_SIZE(hcr_feat_map),
+		       HCR_EL2_RES0, "HCR_EL2");
+}
+
+static bool idreg_feat_match(struct kvm *kvm, const struct reg_bits_to_feat_map *map)
+{
+	u64 regval = kvm->arch.id_regs[map->regidx];
+	u64 regfld = (regval >> map->shift) & GENMASK(map->width - 1, 0);
+
+	if (map->sign) {
+		s64 sfld = sign_extend64(regfld, map->width - 1);
+		s64 slim = sign_extend64(map->lo_lim, map->width - 1);
+		return sfld >= slim;
+	} else {
+		return regfld >= map->lo_lim;
+	}
+}
+
+static u64 __compute_fixed_bits(struct kvm *kvm,
+				const struct reg_bits_to_feat_map *map,
+				int map_size,
+				u64 *fixed_bits,
+				unsigned long require,
+				unsigned long exclude)
+{
+	u64 val = 0;
+
+	for (int i = 0; i < map_size; i++) {
+		bool match;
+
+		if ((map[i].flags & require) != require)
+			continue;
+
+		if (map[i].flags & exclude)
+			continue;
+
+		if (map[i].flags & CALL_FUNC)
+			match = (map[i].flags & FIXED_VALUE) ?
+				map[i].fval(kvm, fixed_bits) :
+				map[i].match(kvm);
+		else
+			match = idreg_feat_match(kvm, &map[i]);
+
+		if (!match || (map[i].flags & FIXED_VALUE))
+			val |= map[i].bits;
+	}
+
+	return val;
+}
+
+static u64 compute_res0_bits(struct kvm *kvm,
+			     const struct reg_bits_to_feat_map *map,
+			     int map_size,
+			     unsigned long require,
+			     unsigned long exclude)
+{
+	return __compute_fixed_bits(kvm, map, map_size, NULL,
+				    require, exclude | FIXED_VALUE);
+}
+
+static u64 compute_fixed_bits(struct kvm *kvm,
+			      const struct reg_bits_to_feat_map *map,
+			      int map_size,
+			      u64 *fixed_bits,
+			      unsigned long require,
+			      unsigned long exclude)
+{
+	return __compute_fixed_bits(kvm, map, map_size, fixed_bits,
+				    require | FIXED_VALUE, exclude);
+}
+
+void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
+{
+	u64 val = 0;
+
+	switch (fgt) {
+	case HFGRTR_GROUP:
+		val |= compute_res0_bits(kvm, hfgrtr_feat_map,
+					 ARRAY_SIZE(hfgrtr_feat_map),
+					 0, NEVER_FGU);
+		val |= compute_res0_bits(kvm, hfgwtr_feat_map,
+					 ARRAY_SIZE(hfgwtr_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HFGITR_GROUP:
+		val |= compute_res0_bits(kvm, hfgitr_feat_map,
+					 ARRAY_SIZE(hfgitr_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HDFGRTR_GROUP:
+		val |= compute_res0_bits(kvm, hdfgrtr_feat_map,
+					 ARRAY_SIZE(hdfgrtr_feat_map),
+					 0, NEVER_FGU);
+		val |= compute_res0_bits(kvm, hdfgwtr_feat_map,
+					 ARRAY_SIZE(hdfgwtr_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HAFGRTR_GROUP:
+		val |= compute_res0_bits(kvm, hafgrtr_feat_map,
+					 ARRAY_SIZE(hafgrtr_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HFGRTR2_GROUP:
+		val |= compute_res0_bits(kvm, hfgrtr2_feat_map,
+					 ARRAY_SIZE(hfgrtr2_feat_map),
+					 0, NEVER_FGU);
+		val |= compute_res0_bits(kvm, hfgwtr2_feat_map,
+					 ARRAY_SIZE(hfgwtr2_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HFGITR2_GROUP:
+		val |= compute_res0_bits(kvm, hfgitr2_feat_map,
+					 ARRAY_SIZE(hfgitr2_feat_map),
+					 0, NEVER_FGU);
+		break;
+	case HDFGRTR2_GROUP:
+		val |= compute_res0_bits(kvm, hdfgrtr2_feat_map,
+					 ARRAY_SIZE(hdfgrtr2_feat_map),
+					 0, NEVER_FGU);
+		val |= compute_res0_bits(kvm, hdfgwtr2_feat_map,
+					 ARRAY_SIZE(hdfgwtr2_feat_map),
+					 0, NEVER_FGU);
+		break;
+	default:
+		BUG();
+	}
+
+	kvm->arch.fgu[fgt] = val;
+}
+
+void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1)
+{
+	u64 fixed = 0, mask;
+
+	switch (reg) {
+	case HFGRTR_EL2:
+		*res0 = compute_res0_bits(kvm, hfgrtr_feat_map,
+					  ARRAY_SIZE(hfgrtr_feat_map), 0, 0);
+		*res0 |= hfgrtr_masks.res0;
+		*res1 = HFGRTR_EL2_RES1;
+		break;
+	case HFGWTR_EL2:
+		*res0 = compute_res0_bits(kvm, hfgwtr_feat_map,
+					  ARRAY_SIZE(hfgwtr_feat_map), 0, 0);
+		*res0 |= hfgwtr_masks.res0;
+		*res1 = HFGWTR_EL2_RES1;
+		break;
+	case HFGITR_EL2:
+		*res0 = compute_res0_bits(kvm, hfgitr_feat_map,
+					  ARRAY_SIZE(hfgitr_feat_map), 0, 0);
+		*res0 |= hfgitr_masks.res0;
+		*res1 = HFGITR_EL2_RES1;
+		break;
+	case HDFGRTR_EL2:
+		*res0 = compute_res0_bits(kvm, hdfgrtr_feat_map,
+					  ARRAY_SIZE(hdfgrtr_feat_map), 0, 0);
+		*res0 |= hdfgrtr_masks.res0;
+		*res1 = HDFGRTR_EL2_RES1;
+		break;
+	case HDFGWTR_EL2:
+		*res0 = compute_res0_bits(kvm, hdfgwtr_feat_map,
+					  ARRAY_SIZE(hdfgwtr_feat_map), 0, 0);
+		*res0 |= hdfgwtr_masks.res0;
+		*res1 = HDFGWTR_EL2_RES1;
+		break;
+	case HAFGRTR_EL2:
+		*res0 = compute_res0_bits(kvm, hafgrtr_feat_map,
+					  ARRAY_SIZE(hafgrtr_feat_map), 0, 0);
+		*res0 |= hafgrtr_masks.res0;
+		*res1 = HAFGRTR_EL2_RES1;
+		break;
+	case HFGRTR2_EL2:
+		*res0 = compute_res0_bits(kvm, hfgrtr2_feat_map,
+					  ARRAY_SIZE(hfgrtr2_feat_map), 0, 0);
+		*res0 |= hfgrtr2_masks.res0;
+		*res1 = HFGRTR2_EL2_RES1;
+		break;
+	case HFGWTR2_EL2:
+		*res0 = compute_res0_bits(kvm, hfgwtr2_feat_map,
+					  ARRAY_SIZE(hfgwtr2_feat_map), 0, 0);
+		*res0 |= hfgwtr2_masks.res0;
+		*res1 = HFGWTR2_EL2_RES1;
+		break;
+	case HFGITR2_EL2:
+		*res0 = compute_res0_bits(kvm, hfgitr2_feat_map,
+					  ARRAY_SIZE(hfgitr2_feat_map), 0, 0);
+		*res0 |= hfgitr2_masks.res0;
+		*res1 = HFGITR2_EL2_RES1;
+		break;
+	case HDFGRTR2_EL2:
+		*res0 = compute_res0_bits(kvm, hdfgrtr2_feat_map,
+					  ARRAY_SIZE(hdfgrtr2_feat_map), 0, 0);
+		*res0 |= hdfgrtr2_masks.res0;
+		*res1 = HDFGRTR2_EL2_RES1;
+		break;
+	case HDFGWTR2_EL2:
+		*res0 = compute_res0_bits(kvm, hdfgwtr2_feat_map,
+					  ARRAY_SIZE(hdfgwtr2_feat_map), 0, 0);
+		*res0 |= hdfgwtr2_masks.res0;
+		*res1 = HDFGWTR2_EL2_RES1;
+		break;
+	case HCRX_EL2:
+		*res0 = compute_res0_bits(kvm, hcrx_feat_map,
+					  ARRAY_SIZE(hcrx_feat_map), 0, 0);
+		*res0 |= __HCRX_EL2_RES0;
+		*res1 = __HCRX_EL2_RES1;
+		break;
+	case HCR_EL2:
+		mask = compute_fixed_bits(kvm, hcr_feat_map,
+					  ARRAY_SIZE(hcr_feat_map), &fixed,
+					  0, 0);
+		*res0 = compute_res0_bits(kvm, hcr_feat_map,
+					  ARRAY_SIZE(hcr_feat_map), 0, 0);
+		*res0 |= HCR_EL2_RES0 | (mask & ~fixed);
+		*res1 = HCR_EL2_RES1 | (mask & fixed);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		*res0 = *res1 = 0;
+		break;
+	}
+}
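
Reviewer note: the matching logic in idreg_feat_match() and the way compute_res0_bits() accumulates bits for absent features can be tried out in plain userspace C. What follows is only a sketch under simplifying assumptions, not the kernel code: struct feat_map_entry, sext(), idreg_field_match() and res0_from_map() are hypothetical names, a single ID register value stands in for kvm->arch.id_regs[], and the CALL_FUNC, FIXED_VALUE and NEVER_FGU flags are left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct reg_bits_to_feat_map. */
struct feat_map_entry {
	uint64_t bits;		/* register bits that depend on the feature */
	unsigned int shift;	/* LSB of the ID register field */
	unsigned int width;	/* field width in bits (1..64) */
	bool sign;		/* true if the field is a signed quantity */
	uint64_t lo_lim;	/* lowest field value that implements the feature */
};

/* Sign-extend the low 'width' bits of 'v' without relying on signed shifts. */
static int64_t sext(uint64_t v, unsigned int width)
{
	uint64_t m = 1ULL << (width - 1);

	v &= (m << 1) - 1;	/* keep only the field bits */
	return (int64_t)((v ^ m) - m);
}

/* True if the ID register field is at least lo_lim, honouring signedness. */
static bool idreg_field_match(uint64_t regval, const struct feat_map_entry *e)
{
	uint64_t mask = e->width >= 64 ? ~0ULL : (1ULL << e->width) - 1;
	uint64_t fld = (regval >> e->shift) & mask;

	if (e->sign)
		return sext(fld, e->width) >= sext(e->lo_lim, e->width);

	return fld >= e->lo_lim;
}

/* OR together the bits of every entry whose feature is absent. */
static uint64_t res0_from_map(uint64_t regval,
			      const struct feat_map_entry *map, int n)
{
	uint64_t val = 0;

	for (int i = 0; i < n; i++)
		if (!idreg_field_match(regval, &map[i]))
			val |= map[i].bits;

	return val;
}
```

The signed branch matters for ID fields that encode "not implemented" as an all-ones value: a raw 4-bit field of 0xF reads as -1 and so fails a limit of 0, whereas an unsigned comparison would wrongly treat it as the highest possible feature level.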

+387 -203

arch/arm64/kvm/emulate-nested.c

··· 622 622 const unsigned int line; 623 623 }; 624 624 625 + /* 626 + * WARNING: using ranges is a treacherous endeavour, as sysregs that 627 + * are part of an architectural range are not necessarily contiguous 628 + * in the [Op0,Op1,CRn,CRm,Ops] space. Tread carefully. 629 + */ 625 630 #define SR_RANGE_TRAP(sr_start, sr_end, trap_id) \ 626 631 { \ 627 632 .encoding = sr_start, \ ··· 1284 1279 __NR_FG_FILTER_IDS__ 1285 1280 }; 1286 1281 1287 - #define SR_FGF(sr, g, b, p, f) \ 1288 - { \ 1289 - .encoding = sr, \ 1290 - .end = sr, \ 1291 - .tc = { \ 1282 + #define __FGT(g, b, p, f) \ 1283 + { \ 1292 1284 .fgt = g ## _GROUP, \ 1293 1285 .bit = g ## _EL2_ ## b ## _SHIFT, \ 1294 1286 .pol = p, \ 1295 1287 .fgf = f, \ 1296 - }, \ 1288 + } 1289 + 1290 + #define FGT(g, b, p) __FGT(g, b, p, __NO_FGF__) 1291 + 1292 + /* 1293 + * See the warning next to SR_RANGE_TRAP(), and apply the same 1294 + * level of caution. 1295 + */ 1296 + #define SR_FGF_RANGE(sr, e, g, b, p, f) \ 1297 + { \ 1298 + .encoding = sr, \ 1299 + .end = e, \ 1300 + .tc = __FGT(g, b, p, f), \ 1297 1301 .line = __LINE__, \ 1298 1302 } 1299 1303 1300 - #define SR_FGT(sr, g, b, p) SR_FGF(sr, g, b, p, __NO_FGF__) 1304 + #define SR_FGF(sr, g, b, p, f) SR_FGF_RANGE(sr, sr, g, b, p, f) 1305 + #define SR_FGT(sr, g, b, p) SR_FGF_RANGE(sr, sr, g, b, p, __NO_FGF__) 1306 + #define SR_FGT_RANGE(sr, end, g, b, p) \ 1307 + SR_FGF_RANGE(sr, end, g, b, p, __NO_FGF__) 1301 1308 1302 1309 static const struct encoding_to_trap_config encoding_to_fgt[] __initconst = { 1303 1310 /* HFGRTR_EL2, HFGWTR_EL2 */ 1304 - SR_FGT(SYS_AMAIR2_EL1, HFGxTR, nAMAIR2_EL1, 0), 1305 - SR_FGT(SYS_MAIR2_EL1, HFGxTR, nMAIR2_EL1, 0), 1306 - SR_FGT(SYS_S2POR_EL1, HFGxTR, nS2POR_EL1, 0), 1307 - SR_FGT(SYS_POR_EL1, HFGxTR, nPOR_EL1, 0), 1308 - SR_FGT(SYS_POR_EL0, HFGxTR, nPOR_EL0, 0), 1309 - SR_FGT(SYS_PIR_EL1, HFGxTR, nPIR_EL1, 0), 1310 - SR_FGT(SYS_PIRE0_EL1, HFGxTR, nPIRE0_EL1, 0), 1311 - SR_FGT(SYS_RCWMASK_EL1, HFGxTR, nRCWMASK_EL1, 0), 1312 - 
SR_FGT(SYS_TPIDR2_EL0, HFGxTR, nTPIDR2_EL0, 0), 1313 - SR_FGT(SYS_SMPRI_EL1, HFGxTR, nSMPRI_EL1, 0), 1314 - SR_FGT(SYS_GCSCR_EL1, HFGxTR, nGCS_EL1, 0), 1315 - SR_FGT(SYS_GCSPR_EL1, HFGxTR, nGCS_EL1, 0), 1316 - SR_FGT(SYS_GCSCRE0_EL1, HFGxTR, nGCS_EL0, 0), 1317 - SR_FGT(SYS_GCSPR_EL0, HFGxTR, nGCS_EL0, 0), 1318 - SR_FGT(SYS_ACCDATA_EL1, HFGxTR, nACCDATA_EL1, 0), 1319 - SR_FGT(SYS_ERXADDR_EL1, HFGxTR, ERXADDR_EL1, 1), 1320 - SR_FGT(SYS_ERXPFGCDN_EL1, HFGxTR, ERXPFGCDN_EL1, 1), 1321 - SR_FGT(SYS_ERXPFGCTL_EL1, HFGxTR, ERXPFGCTL_EL1, 1), 1322 - SR_FGT(SYS_ERXPFGF_EL1, HFGxTR, ERXPFGF_EL1, 1), 1323 - SR_FGT(SYS_ERXMISC0_EL1, HFGxTR, ERXMISCn_EL1, 1), 1324 - SR_FGT(SYS_ERXMISC1_EL1, HFGxTR, ERXMISCn_EL1, 1), 1325 - SR_FGT(SYS_ERXMISC2_EL1, HFGxTR, ERXMISCn_EL1, 1), 1326 - SR_FGT(SYS_ERXMISC3_EL1, HFGxTR, ERXMISCn_EL1, 1), 1327 - SR_FGT(SYS_ERXSTATUS_EL1, HFGxTR, ERXSTATUS_EL1, 1), 1328 - SR_FGT(SYS_ERXCTLR_EL1, HFGxTR, ERXCTLR_EL1, 1), 1329 - SR_FGT(SYS_ERXFR_EL1, HFGxTR, ERXFR_EL1, 1), 1330 - SR_FGT(SYS_ERRSELR_EL1, HFGxTR, ERRSELR_EL1, 1), 1331 - SR_FGT(SYS_ERRIDR_EL1, HFGxTR, ERRIDR_EL1, 1), 1332 - SR_FGT(SYS_ICC_IGRPEN0_EL1, HFGxTR, ICC_IGRPENn_EL1, 1), 1333 - SR_FGT(SYS_ICC_IGRPEN1_EL1, HFGxTR, ICC_IGRPENn_EL1, 1), 1334 - SR_FGT(SYS_VBAR_EL1, HFGxTR, VBAR_EL1, 1), 1335 - SR_FGT(SYS_TTBR1_EL1, HFGxTR, TTBR1_EL1, 1), 1336 - SR_FGT(SYS_TTBR0_EL1, HFGxTR, TTBR0_EL1, 1), 1337 - SR_FGT(SYS_TPIDR_EL0, HFGxTR, TPIDR_EL0, 1), 1338 - SR_FGT(SYS_TPIDRRO_EL0, HFGxTR, TPIDRRO_EL0, 1), 1339 - SR_FGT(SYS_TPIDR_EL1, HFGxTR, TPIDR_EL1, 1), 1340 - SR_FGT(SYS_TCR_EL1, HFGxTR, TCR_EL1, 1), 1341 - SR_FGT(SYS_TCR2_EL1, HFGxTR, TCR_EL1, 1), 1342 - SR_FGT(SYS_SCXTNUM_EL0, HFGxTR, SCXTNUM_EL0, 1), 1343 - SR_FGT(SYS_SCXTNUM_EL1, HFGxTR, SCXTNUM_EL1, 1), 1344 - SR_FGT(SYS_SCTLR_EL1, HFGxTR, SCTLR_EL1, 1), 1345 - SR_FGT(SYS_REVIDR_EL1, HFGxTR, REVIDR_EL1, 1), 1346 - SR_FGT(SYS_PAR_EL1, HFGxTR, PAR_EL1, 1), 1347 - SR_FGT(SYS_MPIDR_EL1, HFGxTR, MPIDR_EL1, 1), 1348 - SR_FGT(SYS_MIDR_EL1, HFGxTR, 
MIDR_EL1, 1), 1349 - SR_FGT(SYS_MAIR_EL1, HFGxTR, MAIR_EL1, 1), 1350 - SR_FGT(SYS_LORSA_EL1, HFGxTR, LORSA_EL1, 1), 1351 - SR_FGT(SYS_LORN_EL1, HFGxTR, LORN_EL1, 1), 1352 - SR_FGT(SYS_LORID_EL1, HFGxTR, LORID_EL1, 1), 1353 - SR_FGT(SYS_LOREA_EL1, HFGxTR, LOREA_EL1, 1), 1354 - SR_FGT(SYS_LORC_EL1, HFGxTR, LORC_EL1, 1), 1355 - SR_FGT(SYS_ISR_EL1, HFGxTR, ISR_EL1, 1), 1356 - SR_FGT(SYS_FAR_EL1, HFGxTR, FAR_EL1, 1), 1357 - SR_FGT(SYS_ESR_EL1, HFGxTR, ESR_EL1, 1), 1358 - SR_FGT(SYS_DCZID_EL0, HFGxTR, DCZID_EL0, 1), 1359 - SR_FGT(SYS_CTR_EL0, HFGxTR, CTR_EL0, 1), 1360 - SR_FGT(SYS_CSSELR_EL1, HFGxTR, CSSELR_EL1, 1), 1361 - SR_FGT(SYS_CPACR_EL1, HFGxTR, CPACR_EL1, 1), 1362 - SR_FGT(SYS_CONTEXTIDR_EL1, HFGxTR, CONTEXTIDR_EL1, 1), 1363 - SR_FGT(SYS_CLIDR_EL1, HFGxTR, CLIDR_EL1, 1), 1364 - SR_FGT(SYS_CCSIDR_EL1, HFGxTR, CCSIDR_EL1, 1), 1365 - SR_FGT(SYS_APIBKEYLO_EL1, HFGxTR, APIBKey, 1), 1366 - SR_FGT(SYS_APIBKEYHI_EL1, HFGxTR, APIBKey, 1), 1367 - SR_FGT(SYS_APIAKEYLO_EL1, HFGxTR, APIAKey, 1), 1368 - SR_FGT(SYS_APIAKEYHI_EL1, HFGxTR, APIAKey, 1), 1369 - SR_FGT(SYS_APGAKEYLO_EL1, HFGxTR, APGAKey, 1), 1370 - SR_FGT(SYS_APGAKEYHI_EL1, HFGxTR, APGAKey, 1), 1371 - SR_FGT(SYS_APDBKEYLO_EL1, HFGxTR, APDBKey, 1), 1372 - SR_FGT(SYS_APDBKEYHI_EL1, HFGxTR, APDBKey, 1), 1373 - SR_FGT(SYS_APDAKEYLO_EL1, HFGxTR, APDAKey, 1), 1374 - SR_FGT(SYS_APDAKEYHI_EL1, HFGxTR, APDAKey, 1), 1375 - SR_FGT(SYS_AMAIR_EL1, HFGxTR, AMAIR_EL1, 1), 1376 - SR_FGT(SYS_AIDR_EL1, HFGxTR, AIDR_EL1, 1), 1377 - SR_FGT(SYS_AFSR1_EL1, HFGxTR, AFSR1_EL1, 1), 1378 - SR_FGT(SYS_AFSR0_EL1, HFGxTR, AFSR0_EL1, 1), 1311 + SR_FGT(SYS_AMAIR2_EL1, HFGRTR, nAMAIR2_EL1, 0), 1312 + SR_FGT(SYS_MAIR2_EL1, HFGRTR, nMAIR2_EL1, 0), 1313 + SR_FGT(SYS_S2POR_EL1, HFGRTR, nS2POR_EL1, 0), 1314 + SR_FGT(SYS_POR_EL1, HFGRTR, nPOR_EL1, 0), 1315 + SR_FGT(SYS_POR_EL0, HFGRTR, nPOR_EL0, 0), 1316 + SR_FGT(SYS_PIR_EL1, HFGRTR, nPIR_EL1, 0), 1317 + SR_FGT(SYS_PIRE0_EL1, HFGRTR, nPIRE0_EL1, 0), 1318 + SR_FGT(SYS_RCWMASK_EL1, HFGRTR, nRCWMASK_EL1, 
0), 1319 + SR_FGT(SYS_TPIDR2_EL0, HFGRTR, nTPIDR2_EL0, 0), 1320 + SR_FGT(SYS_SMPRI_EL1, HFGRTR, nSMPRI_EL1, 0), 1321 + SR_FGT(SYS_GCSCR_EL1, HFGRTR, nGCS_EL1, 0), 1322 + SR_FGT(SYS_GCSPR_EL1, HFGRTR, nGCS_EL1, 0), 1323 + SR_FGT(SYS_GCSCRE0_EL1, HFGRTR, nGCS_EL0, 0), 1324 + SR_FGT(SYS_GCSPR_EL0, HFGRTR, nGCS_EL0, 0), 1325 + SR_FGT(SYS_ACCDATA_EL1, HFGRTR, nACCDATA_EL1, 0), 1326 + SR_FGT(SYS_ERXADDR_EL1, HFGRTR, ERXADDR_EL1, 1), 1327 + SR_FGT(SYS_ERXPFGCDN_EL1, HFGRTR, ERXPFGCDN_EL1, 1), 1328 + SR_FGT(SYS_ERXPFGCTL_EL1, HFGRTR, ERXPFGCTL_EL1, 1), 1329 + SR_FGT(SYS_ERXPFGF_EL1, HFGRTR, ERXPFGF_EL1, 1), 1330 + SR_FGT(SYS_ERXMISC0_EL1, HFGRTR, ERXMISCn_EL1, 1), 1331 + SR_FGT(SYS_ERXMISC1_EL1, HFGRTR, ERXMISCn_EL1, 1), 1332 + SR_FGT(SYS_ERXMISC2_EL1, HFGRTR, ERXMISCn_EL1, 1), 1333 + SR_FGT(SYS_ERXMISC3_EL1, HFGRTR, ERXMISCn_EL1, 1), 1334 + SR_FGT(SYS_ERXSTATUS_EL1, HFGRTR, ERXSTATUS_EL1, 1), 1335 + SR_FGT(SYS_ERXCTLR_EL1, HFGRTR, ERXCTLR_EL1, 1), 1336 + SR_FGT(SYS_ERXFR_EL1, HFGRTR, ERXFR_EL1, 1), 1337 + SR_FGT(SYS_ERRSELR_EL1, HFGRTR, ERRSELR_EL1, 1), 1338 + SR_FGT(SYS_ERRIDR_EL1, HFGRTR, ERRIDR_EL1, 1), 1339 + SR_FGT(SYS_ICC_IGRPEN0_EL1, HFGRTR, ICC_IGRPENn_EL1, 1), 1340 + SR_FGT(SYS_ICC_IGRPEN1_EL1, HFGRTR, ICC_IGRPENn_EL1, 1), 1341 + SR_FGT(SYS_VBAR_EL1, HFGRTR, VBAR_EL1, 1), 1342 + SR_FGT(SYS_TTBR1_EL1, HFGRTR, TTBR1_EL1, 1), 1343 + SR_FGT(SYS_TTBR0_EL1, HFGRTR, TTBR0_EL1, 1), 1344 + SR_FGT(SYS_TPIDR_EL0, HFGRTR, TPIDR_EL0, 1), 1345 + SR_FGT(SYS_TPIDRRO_EL0, HFGRTR, TPIDRRO_EL0, 1), 1346 + SR_FGT(SYS_TPIDR_EL1, HFGRTR, TPIDR_EL1, 1), 1347 + SR_FGT(SYS_TCR_EL1, HFGRTR, TCR_EL1, 1), 1348 + SR_FGT(SYS_TCR2_EL1, HFGRTR, TCR_EL1, 1), 1349 + SR_FGT(SYS_SCXTNUM_EL0, HFGRTR, SCXTNUM_EL0, 1), 1350 + SR_FGT(SYS_SCXTNUM_EL1, HFGRTR, SCXTNUM_EL1, 1), 1351 + SR_FGT(SYS_SCTLR_EL1, HFGRTR, SCTLR_EL1, 1), 1352 + SR_FGT(SYS_REVIDR_EL1, HFGRTR, REVIDR_EL1, 1), 1353 + SR_FGT(SYS_PAR_EL1, HFGRTR, PAR_EL1, 1), 1354 + SR_FGT(SYS_MPIDR_EL1, HFGRTR, MPIDR_EL1, 1), 1355 + 
SR_FGT(SYS_MIDR_EL1, HFGRTR, MIDR_EL1, 1), 1356 + SR_FGT(SYS_MAIR_EL1, HFGRTR, MAIR_EL1, 1), 1357 + SR_FGT(SYS_LORSA_EL1, HFGRTR, LORSA_EL1, 1), 1358 + SR_FGT(SYS_LORN_EL1, HFGRTR, LORN_EL1, 1), 1359 + SR_FGT(SYS_LORID_EL1, HFGRTR, LORID_EL1, 1), 1360 + SR_FGT(SYS_LOREA_EL1, HFGRTR, LOREA_EL1, 1), 1361 + SR_FGT(SYS_LORC_EL1, HFGRTR, LORC_EL1, 1), 1362 + SR_FGT(SYS_ISR_EL1, HFGRTR, ISR_EL1, 1), 1363 + SR_FGT(SYS_FAR_EL1, HFGRTR, FAR_EL1, 1), 1364 + SR_FGT(SYS_ESR_EL1, HFGRTR, ESR_EL1, 1), 1365 + SR_FGT(SYS_DCZID_EL0, HFGRTR, DCZID_EL0, 1), 1366 + SR_FGT(SYS_CTR_EL0, HFGRTR, CTR_EL0, 1), 1367 + SR_FGT(SYS_CSSELR_EL1, HFGRTR, CSSELR_EL1, 1), 1368 + SR_FGT(SYS_CPACR_EL1, HFGRTR, CPACR_EL1, 1), 1369 + SR_FGT(SYS_CONTEXTIDR_EL1, HFGRTR, CONTEXTIDR_EL1, 1), 1370 + SR_FGT(SYS_CLIDR_EL1, HFGRTR, CLIDR_EL1, 1), 1371 + SR_FGT(SYS_CCSIDR_EL1, HFGRTR, CCSIDR_EL1, 1), 1372 + SR_FGT(SYS_APIBKEYLO_EL1, HFGRTR, APIBKey, 1), 1373 + SR_FGT(SYS_APIBKEYHI_EL1, HFGRTR, APIBKey, 1), 1374 + SR_FGT(SYS_APIAKEYLO_EL1, HFGRTR, APIAKey, 1), 1375 + SR_FGT(SYS_APIAKEYHI_EL1, HFGRTR, APIAKey, 1), 1376 + SR_FGT(SYS_APGAKEYLO_EL1, HFGRTR, APGAKey, 1), 1377 + SR_FGT(SYS_APGAKEYHI_EL1, HFGRTR, APGAKey, 1), 1378 + SR_FGT(SYS_APDBKEYLO_EL1, HFGRTR, APDBKey, 1), 1379 + SR_FGT(SYS_APDBKEYHI_EL1, HFGRTR, APDBKey, 1), 1380 + SR_FGT(SYS_APDAKEYLO_EL1, HFGRTR, APDAKey, 1), 1381 + SR_FGT(SYS_APDAKEYHI_EL1, HFGRTR, APDAKey, 1), 1382 + SR_FGT(SYS_AMAIR_EL1, HFGRTR, AMAIR_EL1, 1), 1383 + SR_FGT(SYS_AIDR_EL1, HFGRTR, AIDR_EL1, 1), 1384 + SR_FGT(SYS_AFSR1_EL1, HFGRTR, AFSR1_EL1, 1), 1385 + SR_FGT(SYS_AFSR0_EL1, HFGRTR, AFSR0_EL1, 1), 1386 + 1387 + /* HFGRTR2_EL2, HFGWTR2_EL2 */ 1388 + SR_FGT(SYS_ACTLRALIAS_EL1, HFGRTR2, nACTLRALIAS_EL1, 0), 1389 + SR_FGT(SYS_ACTLRMASK_EL1, HFGRTR2, nACTLRMASK_EL1, 0), 1390 + SR_FGT(SYS_CPACRALIAS_EL1, HFGRTR2, nCPACRALIAS_EL1, 0), 1391 + SR_FGT(SYS_CPACRMASK_EL1, HFGRTR2, nCPACRMASK_EL1, 0), 1392 + SR_FGT(SYS_PFAR_EL1, HFGRTR2, nPFAR_EL1, 0), 1393 + SR_FGT(SYS_RCWSMASK_EL1, 
HFGRTR2, nRCWSMASK_EL1, 0), 1394 + SR_FGT(SYS_SCTLR2ALIAS_EL1, HFGRTR2, nSCTLRALIAS2_EL1, 0), 1395 + SR_FGT(SYS_SCTLR2MASK_EL1, HFGRTR2, nSCTLR2MASK_EL1, 0), 1396 + SR_FGT(SYS_SCTLRALIAS_EL1, HFGRTR2, nSCTLRALIAS_EL1, 0), 1397 + SR_FGT(SYS_SCTLRMASK_EL1, HFGRTR2, nSCTLRMASK_EL1, 0), 1398 + SR_FGT(SYS_TCR2ALIAS_EL1, HFGRTR2, nTCR2ALIAS_EL1, 0), 1399 + SR_FGT(SYS_TCR2MASK_EL1, HFGRTR2, nTCR2MASK_EL1, 0), 1400 + SR_FGT(SYS_TCRALIAS_EL1, HFGRTR2, nTCRALIAS_EL1, 0), 1401 + SR_FGT(SYS_TCRMASK_EL1, HFGRTR2, nTCRMASK_EL1, 0), 1402 + SR_FGT(SYS_ERXGSR_EL1, HFGRTR2, nERXGSR_EL1, 0), 1403 + 1379 1404 /* HFGITR_EL2 */ 1380 1405 SR_FGT(OP_AT_S1E1A, HFGITR, ATS1E1A, 1), 1381 1406 SR_FGT(OP_COSP_RCTX, HFGITR, COSPRCTX, 1), ··· 1515 1480 SR_FGT(SYS_IC_IVAU, HFGITR, ICIVAU, 1), 1516 1481 SR_FGT(SYS_IC_IALLU, HFGITR, ICIALLU, 1), 1517 1482 SR_FGT(SYS_IC_IALLUIS, HFGITR, ICIALLUIS, 1), 1483 + 1484 + /* HFGITR2_EL2 */ 1485 + SR_FGT(SYS_DC_CIGDVAPS, HFGITR2, nDCCIVAPS, 0), 1486 + SR_FGT(SYS_DC_CIVAPS, HFGITR2, nDCCIVAPS, 0), 1487 + 1518 1488 /* HDFGRTR_EL2 */ 1519 1489 SR_FGT(SYS_PMBIDR_EL1, HDFGRTR, PMBIDR_EL1, 1), 1520 1490 SR_FGT(SYS_PMSNEVFR_EL1, HDFGRTR, nPMSNEVFR_EL1, 0), ··· 1829 1789 SR_FGT(SYS_PMCNTENSET_EL0, HDFGRTR, PMCNTEN, 1), 1830 1790 SR_FGT(SYS_PMCCNTR_EL0, HDFGRTR, PMCCNTR_EL0, 1), 1831 1791 SR_FGT(SYS_PMCCFILTR_EL0, HDFGRTR, PMCCFILTR_EL0, 1), 1832 - SR_FGT(SYS_PMEVTYPERn_EL0(0), HDFGRTR, PMEVTYPERn_EL0, 1), 1833 - SR_FGT(SYS_PMEVTYPERn_EL0(1), HDFGRTR, PMEVTYPERn_EL0, 1), 1834 - SR_FGT(SYS_PMEVTYPERn_EL0(2), HDFGRTR, PMEVTYPERn_EL0, 1), 1835 - SR_FGT(SYS_PMEVTYPERn_EL0(3), HDFGRTR, PMEVTYPERn_EL0, 1), 1836 - SR_FGT(SYS_PMEVTYPERn_EL0(4), HDFGRTR, PMEVTYPERn_EL0, 1), 1837 - SR_FGT(SYS_PMEVTYPERn_EL0(5), HDFGRTR, PMEVTYPERn_EL0, 1), 1838 - SR_FGT(SYS_PMEVTYPERn_EL0(6), HDFGRTR, PMEVTYPERn_EL0, 1), 1839 - SR_FGT(SYS_PMEVTYPERn_EL0(7), HDFGRTR, PMEVTYPERn_EL0, 1), 1840 - SR_FGT(SYS_PMEVTYPERn_EL0(8), HDFGRTR, PMEVTYPERn_EL0, 1), 1841 - SR_FGT(SYS_PMEVTYPERn_EL0(9), 
HDFGRTR, PMEVTYPERn_EL0, 1), 1842 - SR_FGT(SYS_PMEVTYPERn_EL0(10), HDFGRTR, PMEVTYPERn_EL0, 1), 1843 - SR_FGT(SYS_PMEVTYPERn_EL0(11), HDFGRTR, PMEVTYPERn_EL0, 1), 1844 - SR_FGT(SYS_PMEVTYPERn_EL0(12), HDFGRTR, PMEVTYPERn_EL0, 1), 1845 - SR_FGT(SYS_PMEVTYPERn_EL0(13), HDFGRTR, PMEVTYPERn_EL0, 1), 1846 - SR_FGT(SYS_PMEVTYPERn_EL0(14), HDFGRTR, PMEVTYPERn_EL0, 1), 1847 - SR_FGT(SYS_PMEVTYPERn_EL0(15), HDFGRTR, PMEVTYPERn_EL0, 1), 1848 - SR_FGT(SYS_PMEVTYPERn_EL0(16), HDFGRTR, PMEVTYPERn_EL0, 1), 1849 - SR_FGT(SYS_PMEVTYPERn_EL0(17), HDFGRTR, PMEVTYPERn_EL0, 1), 1850 - SR_FGT(SYS_PMEVTYPERn_EL0(18), HDFGRTR, PMEVTYPERn_EL0, 1), 1851 - SR_FGT(SYS_PMEVTYPERn_EL0(19), HDFGRTR, PMEVTYPERn_EL0, 1), 1852 - SR_FGT(SYS_PMEVTYPERn_EL0(20), HDFGRTR, PMEVTYPERn_EL0, 1), 1853 - SR_FGT(SYS_PMEVTYPERn_EL0(21), HDFGRTR, PMEVTYPERn_EL0, 1), 1854 - SR_FGT(SYS_PMEVTYPERn_EL0(22), HDFGRTR, PMEVTYPERn_EL0, 1), 1855 - SR_FGT(SYS_PMEVTYPERn_EL0(23), HDFGRTR, PMEVTYPERn_EL0, 1), 1856 - SR_FGT(SYS_PMEVTYPERn_EL0(24), HDFGRTR, PMEVTYPERn_EL0, 1), 1857 - SR_FGT(SYS_PMEVTYPERn_EL0(25), HDFGRTR, PMEVTYPERn_EL0, 1), 1858 - SR_FGT(SYS_PMEVTYPERn_EL0(26), HDFGRTR, PMEVTYPERn_EL0, 1), 1859 - SR_FGT(SYS_PMEVTYPERn_EL0(27), HDFGRTR, PMEVTYPERn_EL0, 1), 1860 - SR_FGT(SYS_PMEVTYPERn_EL0(28), HDFGRTR, PMEVTYPERn_EL0, 1), 1861 - SR_FGT(SYS_PMEVTYPERn_EL0(29), HDFGRTR, PMEVTYPERn_EL0, 1), 1862 - SR_FGT(SYS_PMEVTYPERn_EL0(30), HDFGRTR, PMEVTYPERn_EL0, 1), 1863 - SR_FGT(SYS_PMEVCNTRn_EL0(0), HDFGRTR, PMEVCNTRn_EL0, 1), 1864 - SR_FGT(SYS_PMEVCNTRn_EL0(1), HDFGRTR, PMEVCNTRn_EL0, 1), 1865 - SR_FGT(SYS_PMEVCNTRn_EL0(2), HDFGRTR, PMEVCNTRn_EL0, 1), 1866 - SR_FGT(SYS_PMEVCNTRn_EL0(3), HDFGRTR, PMEVCNTRn_EL0, 1), 1867 - SR_FGT(SYS_PMEVCNTRn_EL0(4), HDFGRTR, PMEVCNTRn_EL0, 1), 1868 - SR_FGT(SYS_PMEVCNTRn_EL0(5), HDFGRTR, PMEVCNTRn_EL0, 1), 1869 - SR_FGT(SYS_PMEVCNTRn_EL0(6), HDFGRTR, PMEVCNTRn_EL0, 1), 1870 - SR_FGT(SYS_PMEVCNTRn_EL0(7), HDFGRTR, PMEVCNTRn_EL0, 1), 1871 - SR_FGT(SYS_PMEVCNTRn_EL0(8), HDFGRTR, 
PMEVCNTRn_EL0, 1), 1872 - SR_FGT(SYS_PMEVCNTRn_EL0(9), HDFGRTR, PMEVCNTRn_EL0, 1), 1873 - SR_FGT(SYS_PMEVCNTRn_EL0(10), HDFGRTR, PMEVCNTRn_EL0, 1), 1874 - SR_FGT(SYS_PMEVCNTRn_EL0(11), HDFGRTR, PMEVCNTRn_EL0, 1), 1875 - SR_FGT(SYS_PMEVCNTRn_EL0(12), HDFGRTR, PMEVCNTRn_EL0, 1), 1876 - SR_FGT(SYS_PMEVCNTRn_EL0(13), HDFGRTR, PMEVCNTRn_EL0, 1), 1877 - SR_FGT(SYS_PMEVCNTRn_EL0(14), HDFGRTR, PMEVCNTRn_EL0, 1), 1878 - SR_FGT(SYS_PMEVCNTRn_EL0(15), HDFGRTR, PMEVCNTRn_EL0, 1), 1879 - SR_FGT(SYS_PMEVCNTRn_EL0(16), HDFGRTR, PMEVCNTRn_EL0, 1), 1880 - SR_FGT(SYS_PMEVCNTRn_EL0(17), HDFGRTR, PMEVCNTRn_EL0, 1), 1881 - SR_FGT(SYS_PMEVCNTRn_EL0(18), HDFGRTR, PMEVCNTRn_EL0, 1), 1882 - SR_FGT(SYS_PMEVCNTRn_EL0(19), HDFGRTR, PMEVCNTRn_EL0, 1), 1883 - SR_FGT(SYS_PMEVCNTRn_EL0(20), HDFGRTR, PMEVCNTRn_EL0, 1), 1884 - SR_FGT(SYS_PMEVCNTRn_EL0(21), HDFGRTR, PMEVCNTRn_EL0, 1), 1885 - SR_FGT(SYS_PMEVCNTRn_EL0(22), HDFGRTR, PMEVCNTRn_EL0, 1), 1886 - SR_FGT(SYS_PMEVCNTRn_EL0(23), HDFGRTR, PMEVCNTRn_EL0, 1), 1887 - SR_FGT(SYS_PMEVCNTRn_EL0(24), HDFGRTR, PMEVCNTRn_EL0, 1), 1888 - SR_FGT(SYS_PMEVCNTRn_EL0(25), HDFGRTR, PMEVCNTRn_EL0, 1), 1889 - SR_FGT(SYS_PMEVCNTRn_EL0(26), HDFGRTR, PMEVCNTRn_EL0, 1), 1890 - SR_FGT(SYS_PMEVCNTRn_EL0(27), HDFGRTR, PMEVCNTRn_EL0, 1), 1891 - SR_FGT(SYS_PMEVCNTRn_EL0(28), HDFGRTR, PMEVCNTRn_EL0, 1), 1892 - SR_FGT(SYS_PMEVCNTRn_EL0(29), HDFGRTR, PMEVCNTRn_EL0, 1), 1893 - SR_FGT(SYS_PMEVCNTRn_EL0(30), HDFGRTR, PMEVCNTRn_EL0, 1), 1792 + SR_FGT_RANGE(SYS_PMEVTYPERn_EL0(0), 1793 + SYS_PMEVTYPERn_EL0(30), 1794 + HDFGRTR, PMEVTYPERn_EL0, 1), 1795 + SR_FGT_RANGE(SYS_PMEVCNTRn_EL0(0), 1796 + SYS_PMEVCNTRn_EL0(30), 1797 + HDFGRTR, PMEVCNTRn_EL0, 1), 1894 1798 SR_FGT(SYS_OSDLR_EL1, HDFGRTR, OSDLR_EL1, 1), 1895 1799 SR_FGT(SYS_OSECCR_EL1, HDFGRTR, OSECCR_EL1, 1), 1896 1800 SR_FGT(SYS_OSLSR_EL1, HDFGRTR, OSLSR_EL1, 1), ··· 1912 1928 SR_FGT(SYS_DBGBCRn_EL1(13), HDFGRTR, DBGBCRn_EL1, 1), 1913 1929 SR_FGT(SYS_DBGBCRn_EL1(14), HDFGRTR, DBGBCRn_EL1, 1), 1914 1930 
SR_FGT(SYS_DBGBCRn_EL1(15), HDFGRTR, DBGBCRn_EL1, 1), 1931 + 1932 + /* HDFGRTR2_EL2 */ 1933 + SR_FGT(SYS_MDSELR_EL1, HDFGRTR2, nMDSELR_EL1, 0), 1934 + SR_FGT(SYS_MDSTEPOP_EL1, HDFGRTR2, nMDSTEPOP_EL1, 0), 1935 + SR_FGT(SYS_PMCCNTSVR_EL1, HDFGRTR2, nPMSSDATA, 0), 1936 + SR_FGT_RANGE(SYS_PMEVCNTSVRn_EL1(0), 1937 + SYS_PMEVCNTSVRn_EL1(30), 1938 + HDFGRTR2, nPMSSDATA, 0), 1939 + SR_FGT(SYS_PMICNTSVR_EL1, HDFGRTR2, nPMSSDATA, 0), 1940 + SR_FGT(SYS_PMECR_EL1, HDFGRTR2, nPMECR_EL1, 0), 1941 + SR_FGT(SYS_PMIAR_EL1, HDFGRTR2, nPMIAR_EL1, 0), 1942 + SR_FGT(SYS_PMICFILTR_EL0, HDFGRTR2, nPMICFILTR_EL0, 0), 1943 + SR_FGT(SYS_PMICNTR_EL0, HDFGRTR2, nPMICNTR_EL0, 0), 1944 + SR_FGT(SYS_PMSSCR_EL1, HDFGRTR2, nPMSSCR_EL1, 0), 1945 + SR_FGT(SYS_PMUACR_EL1, HDFGRTR2, nPMUACR_EL1, 0), 1946 + SR_FGT(SYS_SPMACCESSR_EL1, HDFGRTR2, nSPMACCESSR_EL1, 0), 1947 + SR_FGT(SYS_SPMCFGR_EL1, HDFGRTR2, nSPMID, 0), 1948 + SR_FGT(SYS_SPMDEVARCH_EL1, HDFGRTR2, nSPMID, 0), 1949 + SR_FGT(SYS_SPMCGCRn_EL1(0), HDFGRTR2, nSPMID, 0), 1950 + SR_FGT(SYS_SPMCGCRn_EL1(1), HDFGRTR2, nSPMID, 0), 1951 + SR_FGT(SYS_SPMIIDR_EL1, HDFGRTR2, nSPMID, 0), 1952 + SR_FGT(SYS_SPMCNTENCLR_EL0, HDFGRTR2, nSPMCNTEN, 0), 1953 + SR_FGT(SYS_SPMCNTENSET_EL0, HDFGRTR2, nSPMCNTEN, 0), 1954 + SR_FGT(SYS_SPMCR_EL0, HDFGRTR2, nSPMCR_EL0, 0), 1955 + SR_FGT(SYS_SPMDEVAFF_EL1, HDFGRTR2, nSPMDEVAFF_EL1, 0), 1956 + /* 1957 + * We have up to 64 of these registers in ranges of 16, banked via 1958 + * SPMSELR_EL0.BANK. We're only concerned with the accessors here, 1959 + * not the architectural registers. 
1960 + */ 1961 + SR_FGT_RANGE(SYS_SPMEVCNTRn_EL0(0), 1962 + SYS_SPMEVCNTRn_EL0(15), 1963 + HDFGRTR2, nSPMEVCNTRn_EL0, 0), 1964 + SR_FGT_RANGE(SYS_SPMEVFILT2Rn_EL0(0), 1965 + SYS_SPMEVFILT2Rn_EL0(15), 1966 + HDFGRTR2, nSPMEVTYPERn_EL0, 0), 1967 + SR_FGT_RANGE(SYS_SPMEVFILTRn_EL0(0), 1968 + SYS_SPMEVFILTRn_EL0(15), 1969 + HDFGRTR2, nSPMEVTYPERn_EL0, 0), 1970 + SR_FGT_RANGE(SYS_SPMEVTYPERn_EL0(0), 1971 + SYS_SPMEVTYPERn_EL0(15), 1972 + HDFGRTR2, nSPMEVTYPERn_EL0, 0), 1973 + SR_FGT(SYS_SPMINTENCLR_EL1, HDFGRTR2, nSPMINTEN, 0), 1974 + SR_FGT(SYS_SPMINTENSET_EL1, HDFGRTR2, nSPMINTEN, 0), 1975 + SR_FGT(SYS_SPMOVSCLR_EL0, HDFGRTR2, nSPMOVS, 0), 1976 + SR_FGT(SYS_SPMOVSSET_EL0, HDFGRTR2, nSPMOVS, 0), 1977 + SR_FGT(SYS_SPMSCR_EL1, HDFGRTR2, nSPMSCR_EL1, 0), 1978 + SR_FGT(SYS_SPMSELR_EL0, HDFGRTR2, nSPMSELR_EL0, 0), 1979 + SR_FGT(SYS_TRCITECR_EL1, HDFGRTR2, nTRCITECR_EL1, 0), 1980 + SR_FGT(SYS_PMBMAR_EL1, HDFGRTR2, nPMBMAR_EL1, 0), 1981 + SR_FGT(SYS_PMSDSFR_EL1, HDFGRTR2, nPMSDSFR_EL1, 0), 1982 + SR_FGT(SYS_TRBMPAM_EL1, HDFGRTR2, nTRBMPAM_EL1, 0), 1983 + 1915 1984 /* 1916 1985 * HDFGWTR_EL2 1917 1986 * ··· 1975 1938 * read-side mappings, and only the write-side mappings that 1976 1939 * differ from the read side, and the trap handler will pick 1977 1940 * the correct shadow register based on the access type. 1941 + * 1942 + * Same model applies to the FEAT_FGT2 registers. 
1978 1943 */ 1979 1944 SR_FGT(SYS_TRFCR_EL1, HDFGWTR, TRFCR_EL1, 1), 1980 1945 SR_FGT(SYS_TRCOSLAR, HDFGWTR, TRCOSLAR, 1), 1981 1946 SR_FGT(SYS_PMCR_EL0, HDFGWTR, PMCR_EL0, 1), 1982 1947 SR_FGT(SYS_PMSWINC_EL0, HDFGWTR, PMSWINC_EL0, 1), 1983 1948 SR_FGT(SYS_OSLAR_EL1, HDFGWTR, OSLAR_EL1, 1), 1949 + 1950 + /* HDFGWTR2_EL2 */ 1951 + SR_FGT(SYS_PMZR_EL0, HDFGWTR2, nPMZR_EL0, 0), 1952 + SR_FGT(SYS_SPMZR_EL0, HDFGWTR2, nSPMEVCNTRn_EL0, 0), 1953 + 1984 1954 /* 1985 1955 * HAFGRTR_EL2 1986 1956 */ ··· 2031 1987 SR_FGT(SYS_AMEVCNTR0_EL0(2), HAFGRTR, AMEVCNTR02_EL0, 1), 2032 1988 SR_FGT(SYS_AMEVCNTR0_EL0(1), HAFGRTR, AMEVCNTR01_EL0, 1), 2033 1989 SR_FGT(SYS_AMEVCNTR0_EL0(0), HAFGRTR, AMEVCNTR00_EL0, 1), 1990 + }; 1991 + 1992 + /* 1993 + * Additional FGTs that do not fire with ESR_EL2.EC==0x18. This table 1994 + * isn't used for exception routing, but only as a promise that the 1995 + * trap is handled somewhere else. 1996 + */ 1997 + static const union trap_config non_0x18_fgt[] __initconst = { 1998 + FGT(HFGITR, PSBCSYNC, 1), 1999 + FGT(HFGITR, nGCSSTR_EL1, 0), 2000 + FGT(HFGITR, SVC_EL1, 1), 2001 + FGT(HFGITR, SVC_EL0, 1), 2002 + FGT(HFGITR, ERET, 1), 2003 + FGT(HFGITR2, TSBCSYNC, 1), 2034 2004 }; 2035 2005 2036 2006 static union trap_config get_trap_config(u32 sysreg) ··· 2091 2033 return sys_reg(op0 + 1, 0, 0, 0, 0); 2092 2034 } 2093 2035 2036 + #define FGT_MASKS(__n, __m) \ 2037 + struct fgt_masks __n = { .str = #__m, .res0 = __m, } 2038 + 2039 + FGT_MASKS(hfgrtr_masks, HFGRTR_EL2_RES0); 2040 + FGT_MASKS(hfgwtr_masks, HFGWTR_EL2_RES0); 2041 + FGT_MASKS(hfgitr_masks, HFGITR_EL2_RES0); 2042 + FGT_MASKS(hdfgrtr_masks, HDFGRTR_EL2_RES0); 2043 + FGT_MASKS(hdfgwtr_masks, HDFGWTR_EL2_RES0); 2044 + FGT_MASKS(hafgrtr_masks, HAFGRTR_EL2_RES0); 2045 + FGT_MASKS(hfgrtr2_masks, HFGRTR2_EL2_RES0); 2046 + FGT_MASKS(hfgwtr2_masks, HFGWTR2_EL2_RES0); 2047 + FGT_MASKS(hfgitr2_masks, HFGITR2_EL2_RES0); 2048 + FGT_MASKS(hdfgrtr2_masks, HDFGRTR2_EL2_RES0); 2049 + FGT_MASKS(hdfgwtr2_masks, 
HDFGWTR2_EL2_RES0); 2050 + 2051 + static __init bool aggregate_fgt(union trap_config tc) 2052 + { 2053 + struct fgt_masks *rmasks, *wmasks; 2054 + 2055 + switch (tc.fgt) { 2056 + case HFGRTR_GROUP: 2057 + rmasks = &hfgrtr_masks; 2058 + wmasks = &hfgwtr_masks; 2059 + break; 2060 + case HDFGRTR_GROUP: 2061 + rmasks = &hdfgrtr_masks; 2062 + wmasks = &hdfgwtr_masks; 2063 + break; 2064 + case HAFGRTR_GROUP: 2065 + rmasks = &hafgrtr_masks; 2066 + wmasks = NULL; 2067 + break; 2068 + case HFGITR_GROUP: 2069 + rmasks = &hfgitr_masks; 2070 + wmasks = NULL; 2071 + break; 2072 + case HFGRTR2_GROUP: 2073 + rmasks = &hfgrtr2_masks; 2074 + wmasks = &hfgwtr2_masks; 2075 + break; 2076 + case HDFGRTR2_GROUP: 2077 + rmasks = &hdfgrtr2_masks; 2078 + wmasks = &hdfgwtr2_masks; 2079 + break; 2080 + case HFGITR2_GROUP: 2081 + rmasks = &hfgitr2_masks; 2082 + wmasks = NULL; 2083 + break; 2084 + } 2085 + 2086 + /* 2087 + * A bit can be reserved in either the R or W register, but 2088 + * not both. 2089 + */ 2090 + if ((BIT(tc.bit) & rmasks->res0) && 2091 + (!wmasks || (BIT(tc.bit) & wmasks->res0))) 2092 + return false; 2093 + 2094 + if (tc.pol) 2095 + rmasks->mask |= BIT(tc.bit) & ~rmasks->res0; 2096 + else 2097 + rmasks->nmask |= BIT(tc.bit) & ~rmasks->res0; 2098 + 2099 + if (wmasks) { 2100 + if (tc.pol) 2101 + wmasks->mask |= BIT(tc.bit) & ~wmasks->res0; 2102 + else 2103 + wmasks->nmask |= BIT(tc.bit) & ~wmasks->res0; 2104 + } 2105 + 2106 + return true; 2107 + } 2108 + 2109 + static __init int check_fgt_masks(struct fgt_masks *masks) 2110 + { 2111 + unsigned long duplicate = masks->mask & masks->nmask; 2112 + u64 res0 = masks->res0; 2113 + int ret = 0; 2114 + 2115 + if (duplicate) { 2116 + int i; 2117 + 2118 + for_each_set_bit(i, &duplicate, 64) { 2119 + kvm_err("%s[%d] bit has both polarities\n", 2120 + masks->str, i); 2121 + } 2122 + 2123 + ret = -EINVAL; 2124 + } 2125 + 2126 + masks->res0 = ~(masks->mask | masks->nmask); 2127 + if (masks->res0 != res0) 2128 + kvm_info("Implicit %s = 
%016llx, expecting %016llx\n", 2129 + masks->str, masks->res0, res0); 2130 + 2131 + return ret; 2132 + } 2133 + 2134 + static __init int check_all_fgt_masks(int ret) 2135 + { 2136 + static struct fgt_masks * const masks[] __initconst = { 2137 + &hfgrtr_masks, 2138 + &hfgwtr_masks, 2139 + &hfgitr_masks, 2140 + &hdfgrtr_masks, 2141 + &hdfgwtr_masks, 2142 + &hafgrtr_masks, 2143 + &hfgrtr2_masks, 2144 + &hfgwtr2_masks, 2145 + &hfgitr2_masks, 2146 + &hdfgrtr2_masks, 2147 + &hdfgwtr2_masks, 2148 + }; 2149 + int err = 0; 2150 + 2151 + for (int i = 0; i < ARRAY_SIZE(masks); i++) 2152 + err |= check_fgt_masks(masks[i]); 2153 + 2154 + return ret ?: err; 2155 + } 2156 + 2157 + #define for_each_encoding_in(__x, __s, __e) \ 2158 + for (u32 __x = __s; __x <= __e; __x = encoding_next(__x)) 2159 + 2094 2160 int __init populate_nv_trap_config(void) 2095 2161 { 2096 2162 int ret = 0; ··· 2223 2041 BUILD_BUG_ON(__NR_CGT_GROUP_IDS__ > BIT(TC_CGT_BITS)); 2224 2042 BUILD_BUG_ON(__NR_FGT_GROUP_IDS__ > BIT(TC_FGT_BITS)); 2225 2043 BUILD_BUG_ON(__NR_FG_FILTER_IDS__ > BIT(TC_FGF_BITS)); 2044 + BUILD_BUG_ON(__HCRX_EL2_MASK & __HCRX_EL2_nMASK); 2226 2045 2227 2046 for (int i = 0; i < ARRAY_SIZE(encoding_to_cgt); i++) { 2228 2047 const struct encoding_to_trap_config *cgt = &encoding_to_cgt[i]; ··· 2234 2051 ret = -EINVAL; 2235 2052 } 2236 2053 2237 - for (u32 enc = cgt->encoding; enc <= cgt->end; enc = encoding_next(enc)) { 2054 + for_each_encoding_in(enc, cgt->encoding, cgt->end) { 2238 2055 prev = xa_store(&sr_forward_xa, enc, 2239 2056 xa_mk_value(cgt->tc.val), GFP_KERNEL); 2240 2057 if (prev && !xa_is_err(prev)) { ··· 2248 2065 } 2249 2066 } 2250 2067 } 2068 + 2069 + if (__HCRX_EL2_RES0 != HCRX_EL2_RES0) 2070 + kvm_info("Sanitised HCR_EL2_RES0 = %016llx, expecting %016llx\n", 2071 + __HCRX_EL2_RES0, HCRX_EL2_RES0); 2251 2072 2252 2073 kvm_info("nv: %ld coarse grained trap handlers\n", 2253 2074 ARRAY_SIZE(encoding_to_cgt)); ··· 2269 2082 print_nv_trap_error(fgt, "Invalid FGT", ret); 2270 
2083 } 2271 2084 2272 - tc = get_trap_config(fgt->encoding); 2085 + for_each_encoding_in(enc, fgt->encoding, fgt->end) { 2086 + tc = get_trap_config(enc); 2273 2087 2274 - if (tc.fgt) { 2275 - ret = -EINVAL; 2276 - print_nv_trap_error(fgt, "Duplicate FGT", ret); 2277 - } 2088 + if (tc.fgt) { 2089 + ret = -EINVAL; 2090 + print_nv_trap_error(fgt, "Duplicate FGT", ret); 2091 + } 2278 2092 2279 - tc.val |= fgt->tc.val; 2280 - prev = xa_store(&sr_forward_xa, fgt->encoding, 2281 - xa_mk_value(tc.val), GFP_KERNEL); 2093 + tc.val |= fgt->tc.val; 2094 + prev = xa_store(&sr_forward_xa, enc, 2095 + xa_mk_value(tc.val), GFP_KERNEL); 2282 2096 2283 - if (xa_is_err(prev)) { 2284 - ret = xa_err(prev); 2285 - print_nv_trap_error(fgt, "Failed FGT insertion", ret); 2097 + if (xa_is_err(prev)) { 2098 + ret = xa_err(prev); 2099 + print_nv_trap_error(fgt, "Failed FGT insertion", ret); 2100 + } 2101 + 2102 + if (!aggregate_fgt(tc)) { 2103 + ret = -EINVAL; 2104 + print_nv_trap_error(fgt, "FGT bit is reserved", ret); 2105 + } 2286 2106 } 2287 2107 } 2108 + 2109 + for (int i = 0; i < ARRAY_SIZE(non_0x18_fgt); i++) { 2110 + if (!aggregate_fgt(non_0x18_fgt[i])) { 2111 + ret = -EINVAL; 2112 + kvm_err("non_0x18_fgt[%d] is reserved\n", i); 2113 + } 2114 + } 2115 + 2116 + ret = check_all_fgt_masks(ret); 2288 2117 2289 2118 kvm_info("nv: %ld fine grained trap handlers\n", 2290 2119 ARRAY_SIZE(encoding_to_fgt)); ··· 2418 2215 return masks->mask[sr - __VNCR_START__].res0; 2419 2216 } 2420 2217 2421 - static bool check_fgt_bit(struct kvm_vcpu *vcpu, bool is_read, 2422 - u64 val, const union trap_config tc) 2218 + static bool check_fgt_bit(struct kvm_vcpu *vcpu, enum vcpu_sysreg sr, 2219 + const union trap_config tc) 2423 2220 { 2424 2221 struct kvm *kvm = vcpu->kvm; 2425 - enum vcpu_sysreg sr; 2222 + u64 val; 2426 2223 2427 2224 /* 2428 2225 * KVM doesn't know about any FGTs that apply to the host, and hopefully ··· 2430 2227 */ 2431 2228 if (is_hyp_ctxt(vcpu)) 2432 2229 return false; 2230 + 2231 + 
val = __vcpu_sys_reg(vcpu, sr); 2433 2232 2434 2233 if (tc.pol) 2435 2234 return (val & BIT(tc.bit)); ··· 2447 2242 if (val & BIT(tc.bit)) 2448 2243 return false; 2449 2244 2450 - switch ((enum fgt_group_id)tc.fgt) { 2451 - case HFGxTR_GROUP: 2452 - sr = is_read ? HFGRTR_EL2 : HFGWTR_EL2; 2453 - break; 2454 - 2455 - case HDFGRTR_GROUP: 2456 - sr = is_read ? HDFGRTR_EL2 : HDFGWTR_EL2; 2457 - break; 2458 - 2459 - case HAFGRTR_GROUP: 2460 - sr = HAFGRTR_EL2; 2461 - break; 2462 - 2463 - case HFGITR_GROUP: 2464 - sr = HFGITR_EL2; 2465 - break; 2466 - 2467 - default: 2468 - WARN_ONCE(1, "Unhandled FGT group"); 2469 - return false; 2470 - } 2471 - 2472 2245 return !(kvm_get_sysreg_res0(kvm, sr) & BIT(tc.bit)); 2473 2246 } 2474 2247 2475 2248 bool triage_sysreg_trap(struct kvm_vcpu *vcpu, int *sr_index) 2476 2249 { 2250 + enum vcpu_sysreg fgtreg; 2477 2251 union trap_config tc; 2478 2252 enum trap_behaviour b; 2479 2253 bool is_read; 2480 2254 u32 sysreg; 2481 - u64 esr, val; 2255 + u64 esr; 2482 2256 2483 2257 esr = kvm_vcpu_get_esr(vcpu); 2484 2258 sysreg = esr_sys64_to_sysreg(esr); ··· 2503 2319 case __NO_FGT_GROUP__: 2504 2320 break; 2505 2321 2506 - case HFGxTR_GROUP: 2507 - if (is_read) 2508 - val = __vcpu_sys_reg(vcpu, HFGRTR_EL2); 2509 - else 2510 - val = __vcpu_sys_reg(vcpu, HFGWTR_EL2); 2322 + case HFGRTR_GROUP: 2323 + fgtreg = is_read ? HFGRTR_EL2 : HFGWTR_EL2; 2511 2324 break; 2512 2325 2513 2326 case HDFGRTR_GROUP: 2514 - if (is_read) 2515 - val = __vcpu_sys_reg(vcpu, HDFGRTR_EL2); 2516 - else 2517 - val = __vcpu_sys_reg(vcpu, HDFGWTR_EL2); 2327 + fgtreg = is_read ? 
HDFGRTR_EL2 : HDFGWTR_EL2; 2518 2328 break; 2519 2329 2520 2330 case HAFGRTR_GROUP: 2521 - val = __vcpu_sys_reg(vcpu, HAFGRTR_EL2); 2331 + fgtreg = HAFGRTR_EL2; 2522 2332 break; 2523 2333 2524 2334 case HFGITR_GROUP: 2525 - val = __vcpu_sys_reg(vcpu, HFGITR_EL2); 2335 + fgtreg = HFGITR_EL2; 2526 2336 switch (tc.fgf) { 2527 2337 u64 tmp; 2528 2338 ··· 2530 2352 } 2531 2353 break; 2532 2354 2533 - case __NR_FGT_GROUP_IDS__: 2355 + case HFGRTR2_GROUP: 2356 + fgtreg = is_read ? HFGRTR2_EL2 : HFGWTR2_EL2; 2357 + break; 2358 + 2359 + case HDFGRTR2_GROUP: 2360 + fgtreg = is_read ? HDFGRTR2_EL2 : HDFGWTR2_EL2; 2361 + break; 2362 + 2363 + case HFGITR2_GROUP: 2364 + fgtreg = HFGITR2_EL2; 2365 + break; 2366 + 2367 + default: 2534 2368 /* Something is really wrong, bail out */ 2535 - WARN_ONCE(1, "__NR_FGT_GROUP_IDS__"); 2369 + WARN_ONCE(1, "Bad FGT group (encoding %08x, config %016llx)\n", 2370 + sysreg, tc.val); 2536 2371 goto local; 2537 2372 } 2538 2373 2539 - if (tc.fgt != __NO_FGT_GROUP__ && check_fgt_bit(vcpu, is_read, val, tc)) 2374 + if (tc.fgt != __NO_FGT_GROUP__ && check_fgt_bit(vcpu, fgtreg, tc)) 2540 2375 goto inject; 2541 2376 2542 2377 b = compute_trap_behaviour(vcpu, tc); ··· 2661 2470 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu) 2662 2471 { 2663 2472 u64 spsr, elr, esr; 2664 - 2665 - /* 2666 - * Forward this trap to the virtual EL2 if the virtual 2667 - * HCR_EL2.NV bit is set and this is coming from !EL2. 2668 - */ 2669 - if (forward_hcr_traps(vcpu, HCR_NV)) 2670 - return; 2671 2473 2672 2474 spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2); 2673 2475 spsr = kvm_check_illegal_exception_return(vcpu, spsr);
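Reviewer note: the mask bookkeeping done by `aggregate_fgt()` and `check_fgt_masks()` can be illustrated outside the kernel. The sketch below is a simplification under stated assumptions: it tracks a single register instead of a read/write pair, and `fgt_masks_sketch`, `aggregate_bit`, `conflicting_bits` and `implicit_res0` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, single-register stand-in for the patch's struct fgt_masks. */
struct fgt_masks_sketch {
	uint64_t mask;	/* bits that trap when set (positive polarity) */
	uint64_t nmask;	/* bits that trap when clear (negative polarity) */
	uint64_t res0;	/* bits the architecture declares RES0 */
};

/* Record one trap bit; refuse it if the bit is architecturally reserved. */
static bool aggregate_bit(struct fgt_masks_sketch *m, unsigned int bit,
			  bool positive)
{
	uint64_t b;

	assert(bit < 64);
	b = UINT64_C(1) << bit;

	if (b & m->res0)
		return false;

	if (positive)
		m->mask |= b;
	else
		m->nmask |= b;

	return true;
}

/* Bits described with both polarities, which the patch reports as errors. */
static uint64_t conflicting_bits(const struct fgt_masks_sketch *m)
{
	return m->mask & m->nmask;
}

/* RES0 as implied by the table: every bit no entry has claimed. */
static uint64_t implicit_res0(const struct fgt_masks_sketch *m)
{
	return ~(m->mask | m->nmask);
}
```

Comparing `implicit_res0()` with the architectural RES0 value is what lets the real code flag table entries that are missing or spurious.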

+84

arch/arm64/kvm/handle_exit.c

··· 10 10 11 11 #include <linux/kvm.h> 12 12 #include <linux/kvm_host.h> 13 + #include <linux/ubsan.h> 13 14 14 15 #include <asm/esr.h> 15 16 #include <asm/exception.h> ··· 299 298 return 1; 300 299 } 301 300 301 + static int kvm_handle_gcs(struct kvm_vcpu *vcpu) 302 + { 303 + /* We don't expect GCS, so treat it with contempt */ 304 + if (kvm_has_feat(vcpu->kvm, ID_AA64PFR1_EL1, GCS, IMP)) 305 + WARN_ON_ONCE(1); 306 + 307 + kvm_inject_undefined(vcpu); 308 + return 1; 309 + } 310 + 311 + static int handle_other(struct kvm_vcpu *vcpu) 312 + { 313 + bool is_l2 = vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu); 314 + u64 hcrx = __vcpu_sys_reg(vcpu, HCRX_EL2); 315 + u64 esr = kvm_vcpu_get_esr(vcpu); 316 + u64 iss = ESR_ELx_ISS(esr); 317 + struct kvm *kvm = vcpu->kvm; 318 + bool allowed, fwd = false; 319 + 320 + /* 321 + * We only trap for two reasons: 322 + * 323 + * - the feature is disabled, and the only outcome is to 324 + * generate an UNDEF. 325 + * 326 + * - the feature is enabled, but a NV guest wants to trap the 327 + * feature used by its L2 guest. We forward the exception in 328 + * this case. 329 + * 330 + * What we don't expect is to end-up here if the guest is 331 + * expected be be able to directly use the feature, hence the 332 + * WARN_ON below. 
333 + */ 334 + switch (iss) { 335 + case ESR_ELx_ISS_OTHER_ST64BV: 336 + allowed = kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64_V); 337 + if (is_l2) 338 + fwd = !(hcrx & HCRX_EL2_EnASR); 339 + break; 340 + case ESR_ELx_ISS_OTHER_ST64BV0: 341 + allowed = kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64_ACCDATA); 342 + if (is_l2) 343 + fwd = !(hcrx & HCRX_EL2_EnAS0); 344 + break; 345 + case ESR_ELx_ISS_OTHER_LDST64B: 346 + allowed = kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64); 347 + if (is_l2) 348 + fwd = !(hcrx & HCRX_EL2_EnALS); 349 + break; 350 + case ESR_ELx_ISS_OTHER_TSBCSYNC: 351 + allowed = kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceBuffer, TRBE_V1P1); 352 + if (is_l2) 353 + fwd = (__vcpu_sys_reg(vcpu, HFGITR2_EL2) & HFGITR2_EL2_TSBCSYNC); 354 + break; 355 + case ESR_ELx_ISS_OTHER_PSBCSYNC: 356 + allowed = kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSVer, V1P5); 357 + if (is_l2) 358 + fwd = (__vcpu_sys_reg(vcpu, HFGITR_EL2) & HFGITR_EL2_PSBCSYNC); 359 + break; 360 + default: 361 + /* Clearly, we're missing something. */ 362 + WARN_ON_ONCE(1); 363 + allowed = false; 364 + } 365 + 366 + WARN_ON_ONCE(allowed && !fwd); 367 + 368 + if (allowed && fwd) 369 + kvm_inject_nested_sync(vcpu, esr); 370 + else 371 + kvm_inject_undefined(vcpu); 372 + 373 + return 1; 374 + } 375 + 302 376 static exit_handle_fn arm_exit_handlers[] = { 303 377 [0 ... 
ESR_ELx_EC_MAX] = kvm_handle_unknown_ec, 304 378 [ESR_ELx_EC_WFx] = kvm_handle_wfx, ··· 383 307 [ESR_ELx_EC_CP14_LS] = kvm_handle_cp14_load_store, 384 308 [ESR_ELx_EC_CP10_ID] = kvm_handle_cp10_id, 385 309 [ESR_ELx_EC_CP14_64] = kvm_handle_cp14_64, 310 + [ESR_ELx_EC_OTHER] = handle_other, 386 311 [ESR_ELx_EC_HVC32] = handle_hvc, 387 312 [ESR_ELx_EC_SMC32] = handle_smc, 388 313 [ESR_ELx_EC_HVC64] = handle_hvc, ··· 394 317 [ESR_ELx_EC_ERET] = kvm_handle_eret, 395 318 [ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort, 396 319 [ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort, 320 + [ESR_ELx_EC_DABT_CUR] = kvm_handle_vncr_abort, 397 321 [ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug, 398 322 [ESR_ELx_EC_WATCHPT_LOW]= kvm_handle_guest_debug, 399 323 [ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug, ··· 402 324 [ESR_ELx_EC_BRK64] = kvm_handle_guest_debug, 403 325 [ESR_ELx_EC_FP_ASIMD] = kvm_handle_fpasimd, 404 326 [ESR_ELx_EC_PAC] = kvm_handle_ptrauth, 327 + [ESR_ELx_EC_GCS] = kvm_handle_gcs, 405 328 }; 406 329 407 330 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu) ··· 553 474 print_nvhe_hyp_panic("BUG", panic_addr); 554 475 } else if (IS_ENABLED(CONFIG_CFI_CLANG) && esr_is_cfi_brk(esr)) { 555 476 kvm_nvhe_report_cfi_failure(panic_addr); 477 + } else if (IS_ENABLED(CONFIG_UBSAN_KVM_EL2) && 478 + ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 && 479 + esr_is_ubsan_brk(esr)) { 480 + print_nvhe_hyp_panic(report_ubsan_failure(esr & UBSAN_BRK_MASK), 481 + panic_addr); 556 482 } else { 557 483 print_nvhe_hyp_panic("panic", panic_addr); 558 484 }
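
The final decision in `handle_other()` above reduces to two booleans. A minimal user-space sketch of that decision follows, assuming hypothetical names (`demo_outcome`, `demo_wants_forward`, `demo_other_outcome`) and a placeholder enable bit; none of these are kernel identifiers. It models only the HCRX_EL2-style cases, where the L1 hypervisor requests the trap by leaving the enable bit clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum demo_outcome { DEMO_INJECT_UNDEF, DEMO_FORWARD_TO_L1 };

/* Placeholder bit; the real HCRX_EL2 bit positions are not shown in the diff. */
#define DEMO_ENABLE_BIT (UINT64_C(1) << 2)

/*
 * Mirrors the ST64BV-style cases: only an L2 guest can have its trap
 * forwarded, and L1 asks for the trap by leaving the enable bit clear.
 */
static bool demo_wants_forward(bool is_l2, uint64_t hcrx, uint64_t enable_bit)
{
	return is_l2 && !(hcrx & enable_bit);
}

/*
 * Forward only when the feature is exposed to the guest and L1 wants the
 * trap; every other combination ends in an UNDEF injection.
 */
static enum demo_outcome demo_other_outcome(bool allowed, bool fwd)
{
	return (allowed && fwd) ? DEMO_FORWARD_TO_L1 : DEMO_INJECT_UNDEF;
}
```

The `allowed && !fwd` combination still yields an UNDEF in this sketch; in the diff it is also the case flagged by `WARN_ON_ONCE()`, since a guest that may use the feature directly is not expected to trap at all.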

+102 -58

arch/arm64/kvm/hyp/include/hyp/switch.h

··· 65 65 } 66 66 } 67 67 68 + #define reg_to_fgt_masks(reg) \ 69 + ({ \ 70 + struct fgt_masks *m; \ 71 + switch(reg) { \ 72 + case HFGRTR_EL2: \ 73 + m = &hfgrtr_masks; \ 74 + break; \ 75 + case HFGWTR_EL2: \ 76 + m = &hfgwtr_masks; \ 77 + break; \ 78 + case HFGITR_EL2: \ 79 + m = &hfgitr_masks; \ 80 + break; \ 81 + case HDFGRTR_EL2: \ 82 + m = &hdfgrtr_masks; \ 83 + break; \ 84 + case HDFGWTR_EL2: \ 85 + m = &hdfgwtr_masks; \ 86 + break; \ 87 + case HAFGRTR_EL2: \ 88 + m = &hafgrtr_masks; \ 89 + break; \ 90 + case HFGRTR2_EL2: \ 91 + m = &hfgrtr2_masks; \ 92 + break; \ 93 + case HFGWTR2_EL2: \ 94 + m = &hfgwtr2_masks; \ 95 + break; \ 96 + case HFGITR2_EL2: \ 97 + m = &hfgitr2_masks; \ 98 + break; \ 99 + case HDFGRTR2_EL2: \ 100 + m = &hdfgrtr2_masks; \ 101 + break; \ 102 + case HDFGWTR2_EL2: \ 103 + m = &hdfgwtr2_masks; \ 104 + break; \ 105 + default: \ 106 + BUILD_BUG_ON(1); \ 107 + } \ 108 + \ 109 + m; \ 110 + }) 111 + 68 112 #define compute_clr_set(vcpu, reg, clr, set) \ 69 113 do { \ 70 - u64 hfg; \ 71 - hfg = __vcpu_sys_reg(vcpu, reg) & ~__ ## reg ## _RES0; \ 72 - set |= hfg & __ ## reg ## _MASK; \ 73 - clr |= ~hfg & __ ## reg ## _nMASK; \ 114 + u64 hfg = __vcpu_sys_reg(vcpu, reg); \ 115 + struct fgt_masks *m = reg_to_fgt_masks(reg); \ 116 + set |= hfg & m->mask; \ 117 + clr |= ~hfg & m->nmask; \ 74 118 } while(0) 75 119 76 120 #define reg_to_fgt_group_id(reg) \ ··· 123 79 switch(reg) { \ 124 80 case HFGRTR_EL2: \ 125 81 case HFGWTR_EL2: \ 126 - id = HFGxTR_GROUP; \ 82 + id = HFGRTR_GROUP; \ 127 83 break; \ 128 84 case HFGITR_EL2: \ 129 85 id = HFGITR_GROUP; \ ··· 135 91 case HAFGRTR_EL2: \ 136 92 id = HAFGRTR_GROUP; \ 137 93 break; \ 94 + case HFGRTR2_EL2: \ 95 + case HFGWTR2_EL2: \ 96 + id = HFGRTR2_GROUP; \ 97 + break; \ 98 + case HFGITR2_EL2: \ 99 + id = HFGITR2_GROUP; \ 100 + break; \ 101 + case HDFGRTR2_EL2: \ 102 + case HDFGWTR2_EL2: \ 103 + id = HDFGRTR2_GROUP; \ 104 + break; \ 138 105 default: \ 139 106 BUILD_BUG_ON(1); \ 140 107 } \ ··· 156 101 
#define compute_undef_clr_set(vcpu, kvm, reg, clr, set) \ 157 102 do { \ 158 103 u64 hfg = kvm->arch.fgu[reg_to_fgt_group_id(reg)]; \ 159 - set |= hfg & __ ## reg ## _MASK; \ 160 - clr |= hfg & __ ## reg ## _nMASK; \ 104 + struct fgt_masks *m = reg_to_fgt_masks(reg); \ 105 + set |= hfg & m->mask; \ 106 + clr |= hfg & m->nmask; \ 161 107 } while(0) 162 108 163 109 #define update_fgt_traps_cs(hctxt, vcpu, kvm, reg, clr, set) \ 164 110 do { \ 165 - u64 c = 0, s = 0; \ 111 + struct fgt_masks *m = reg_to_fgt_masks(reg); \ 112 + u64 c = clr, s = set; \ 113 + u64 val; \ 166 114 \ 167 115 ctxt_sys_reg(hctxt, reg) = read_sysreg_s(SYS_ ## reg); \ 168 116 if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) \ ··· 173 115 \ 174 116 compute_undef_clr_set(vcpu, kvm, reg, c, s); \ 175 117 \ 176 - s |= set; \ 177 - c |= clr; \ 178 - if (c || s) { \ 179 - u64 val = __ ## reg ## _nMASK; \ 180 - val |= s; \ 181 - val &= ~c; \ 182 - write_sysreg_s(val, SYS_ ## reg); \ 183 - } \ 118 + val = m->nmask; \ 119 + val |= s; \ 120 + val &= ~c; \ 121 + write_sysreg_s(val, SYS_ ## reg); \ 184 122 } while(0) 185 123 186 124 #define update_fgt_traps(hctxt, vcpu, kvm, reg) \ 187 125 update_fgt_traps_cs(hctxt, vcpu, kvm, reg, 0, 0) 188 - 189 - /* 190 - * Validate the fine grain trap masks. 191 - * Check that the masks do not overlap and that all bits are accounted for. 
192 - */ 193 - #define CHECK_FGT_MASKS(reg) \ 194 - do { \ 195 - BUILD_BUG_ON((__ ## reg ## _MASK) & (__ ## reg ## _nMASK)); \ 196 - BUILD_BUG_ON(~((__ ## reg ## _RES0) ^ (__ ## reg ## _MASK) ^ \ 197 - (__ ## reg ## _nMASK))); \ 198 - } while(0) 199 126 200 127 static inline bool cpu_has_amu(void) 201 128 { ··· 195 152 struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt); 196 153 struct kvm *kvm = kern_hyp_va(vcpu->kvm); 197 154 198 - CHECK_FGT_MASKS(HFGRTR_EL2); 199 - CHECK_FGT_MASKS(HFGWTR_EL2); 200 - CHECK_FGT_MASKS(HFGITR_EL2); 201 - CHECK_FGT_MASKS(HDFGRTR_EL2); 202 - CHECK_FGT_MASKS(HDFGWTR_EL2); 203 - CHECK_FGT_MASKS(HAFGRTR_EL2); 204 - CHECK_FGT_MASKS(HCRX_EL2); 205 - 206 155 if (!cpus_have_final_cap(ARM64_HAS_FGT)) 207 156 return; 208 157 209 158 update_fgt_traps(hctxt, vcpu, kvm, HFGRTR_EL2); 210 159 update_fgt_traps_cs(hctxt, vcpu, kvm, HFGWTR_EL2, 0, 211 160 cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38) ? 212 - HFGxTR_EL2_TCR_EL1_MASK : 0); 161 + HFGWTR_EL2_TCR_EL1_MASK : 0); 213 162 update_fgt_traps(hctxt, vcpu, kvm, HFGITR_EL2); 214 163 update_fgt_traps(hctxt, vcpu, kvm, HDFGRTR_EL2); 215 164 update_fgt_traps(hctxt, vcpu, kvm, HDFGWTR_EL2); 216 165 217 166 if (cpu_has_amu()) 218 167 update_fgt_traps(hctxt, vcpu, kvm, HAFGRTR_EL2); 168 + 169 + if (!cpus_have_final_cap(ARM64_HAS_FGT2)) 170 + return; 171 + 172 + update_fgt_traps(hctxt, vcpu, kvm, HFGRTR2_EL2); 173 + update_fgt_traps(hctxt, vcpu, kvm, HFGWTR2_EL2); 174 + update_fgt_traps(hctxt, vcpu, kvm, HFGITR2_EL2); 175 + update_fgt_traps(hctxt, vcpu, kvm, HDFGRTR2_EL2); 176 + update_fgt_traps(hctxt, vcpu, kvm, HDFGWTR2_EL2); 219 177 } 220 178 221 - #define __deactivate_fgt(htcxt, vcpu, kvm, reg) \ 179 + #define __deactivate_fgt(htcxt, vcpu, reg) \ 222 180 do { \ 223 - if ((vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) || \ 224 - kvm->arch.fgu[reg_to_fgt_group_id(reg)]) \ 225 - write_sysreg_s(ctxt_sys_reg(hctxt, reg), \ 226 - SYS_ ## reg); \ 181 + write_sysreg_s(ctxt_sys_reg(hctxt, reg), \ 
182 + SYS_ ## reg); \ 227 183 } while(0) 228 184 229 185 static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu) 230 186 { 231 187 struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt); 232 - struct kvm *kvm = kern_hyp_va(vcpu->kvm); 233 188 234 189 if (!cpus_have_final_cap(ARM64_HAS_FGT)) 235 190 return; 236 191 237 - __deactivate_fgt(hctxt, vcpu, kvm, HFGRTR_EL2); 238 - if (cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38)) 239 - write_sysreg_s(ctxt_sys_reg(hctxt, HFGWTR_EL2), SYS_HFGWTR_EL2); 240 - else 241 - __deactivate_fgt(hctxt, vcpu, kvm, HFGWTR_EL2); 242 - __deactivate_fgt(hctxt, vcpu, kvm, HFGITR_EL2); 243 - __deactivate_fgt(hctxt, vcpu, kvm, HDFGRTR_EL2); 244 - __deactivate_fgt(hctxt, vcpu, kvm, HDFGWTR_EL2); 192 + __deactivate_fgt(hctxt, vcpu, HFGRTR_EL2); 193 + __deactivate_fgt(hctxt, vcpu, HFGWTR_EL2); 194 + __deactivate_fgt(hctxt, vcpu, HFGITR_EL2); 195 + __deactivate_fgt(hctxt, vcpu, HDFGRTR_EL2); 196 + __deactivate_fgt(hctxt, vcpu, HDFGWTR_EL2); 245 197 246 198 if (cpu_has_amu()) 247 - __deactivate_fgt(hctxt, vcpu, kvm, HAFGRTR_EL2); 199 + __deactivate_fgt(hctxt, vcpu, HAFGRTR_EL2); 200 + 201 + if (!cpus_have_final_cap(ARM64_HAS_FGT2)) 202 + return; 203 + 204 + __deactivate_fgt(hctxt, vcpu, HFGRTR2_EL2); 205 + __deactivate_fgt(hctxt, vcpu, HFGWTR2_EL2); 206 + __deactivate_fgt(hctxt, vcpu, HFGITR2_EL2); 207 + __deactivate_fgt(hctxt, vcpu, HDFGRTR2_EL2); 208 + __deactivate_fgt(hctxt, vcpu, HDFGWTR2_EL2); 248 209 } 249 210 250 211 static inline void __activate_traps_mpam(struct kvm_vcpu *vcpu) ··· 307 260 if (cpus_have_final_cap(ARM64_HAS_HCX)) { 308 261 u64 hcrx = vcpu->arch.hcrx_el2; 309 262 if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) { 310 - u64 clr = 0, set = 0; 311 - 312 - compute_clr_set(vcpu, HCRX_EL2, clr, set); 313 - 314 - hcrx |= set; 315 - hcrx &= ~clr; 263 + u64 val = __vcpu_sys_reg(vcpu, HCRX_EL2); 264 + hcrx |= val & __HCRX_EL2_MASK; 265 + hcrx &= ~(~val & __HCRX_EL2_nMASK); 316 266 } 317 267 318 268 
ctxt_sys_reg(hctxt, HCRX_EL2) = read_sysreg_s(SYS_HCRX_EL2); ··· 344 300 if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM)) 345 301 hcr |= HCR_TVM; 346 302 347 - write_sysreg(hcr, hcr_el2); 303 + write_sysreg_hcr(hcr); 348 304 349 305 if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE)) 350 306 write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
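
The reworked `update_fgt_traps_cs()` above always writes the register, starting from `m->nmask` and then applying the accumulated set and clear bits. The arithmetic can be sketched in user space as below. The struct and function names (`demo_fgt_masks`, `demo_fgt_value`) and the 8-bit mask values used in the usage note are made up for illustration, and the sketch assumes that the line elided by the hunk separator is the `compute_clr_set()` call for the nested, non-hyp-context case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fgt_masks: only the two fields used here. */
struct demo_fgt_masks {
	uint64_t mask;   /* positive-polarity bits: 1 means trap */
	uint64_t nmask;  /* negative-polarity bits: 0 means trap */
};

/*
 * Compute the value written to a fine-grained trap register.
 *  - guest_val: the L1 guest's view of the register (used only when nested)
 *  - fgu: bits that must trap because the feature is hidden from the guest
 *  - set/clr: caller-provided extra bits, as in update_fgt_traps_cs()
 */
static uint64_t demo_fgt_value(const struct demo_fgt_masks *m, bool nested,
			       uint64_t guest_val, uint64_t fgu,
			       uint64_t set, uint64_t clr)
{
	uint64_t s = set, c = clr;
	uint64_t val;

	if (nested) {
		/* compute_clr_set(): honour the traps L1 configured */
		s |= guest_val & m->mask;
		c |= ~guest_val & m->nmask;
	}

	/* compute_undef_clr_set(): force traps for undefined features */
	s |= fgu & m->mask;
	c |= fgu & m->nmask;

	/* Start from "no traps" (all negative-polarity bits set), then apply */
	val = m->nmask;
	val |= s;
	val &= ~c;
	return val;
}
```

With `mask = 0x0F` and `nmask = 0xF0`, a non-nested vCPU with no undefined features yields `0xF0` (nothing trapped), while `nested = true`, `guest_val = 0x21` and `fgu = 0x82` yields `0x23`.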

+10 -4

arch/arm64/kvm/hyp/include/nvhe/mem_protect.h

··· 39 39 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages); 40 40 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages); 41 41 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages); 42 - int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, 42 + int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu, 43 43 enum kvm_pgtable_prot prot); 44 - int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm); 44 + int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm); 45 45 int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot); 46 - int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm); 47 - int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm); 46 + int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm); 47 + int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm); 48 48 int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu); 49 49 50 50 bool addr_is_memory(phys_addr_t phys); ··· 67 67 else 68 68 write_sysreg(0, vttbr_el2); 69 69 } 70 + 71 + #ifdef CONFIG_NVHE_EL2_DEBUG 72 + void pkvm_ownership_selftest(void *base); 73 + #else 74 + static inline void pkvm_ownership_selftest(void *base) { } 75 + #endif 70 76 #endif /* __KVM_NVHE_MEM_PROTECT__ */

+46 -12

arch/arm64/kvm/hyp/include/nvhe/memory.h

··· 8 8 #include <linux/types.h> 9 9 10 10 /* 11 - * Bits 0-1 are reserved to track the memory ownership state of each page: 12 - * 00: The page is owned exclusively by the page-table owner. 13 - * 01: The page is owned by the page-table owner, but is shared 14 - * with another entity. 15 - * 10: The page is shared with, but not owned by the page-table owner. 16 - * 11: Reserved for future use (lending). 11 + * Bits 0-1 are used to encode the memory ownership state of each page from the 12 + * point of view of a pKVM "component" (host, hyp, guest, ... see enum 13 + * pkvm_component_id): 14 + * 00: The page is owned and exclusively accessible by the component; 15 + * 01: The page is owned and accessible by the component, but is also 16 + * accessible by another component; 17 + * 10: The page is accessible but not owned by the component; 18 + * The storage of this state depends on the component: either in the 19 + * hyp_vmemmap for the host and hyp states or in PTE software bits for guests. 17 20 */ 18 21 enum pkvm_page_state { 19 22 PKVM_PAGE_OWNED = 0ULL, 20 23 PKVM_PAGE_SHARED_OWNED = BIT(0), 21 24 PKVM_PAGE_SHARED_BORROWED = BIT(1), 22 - __PKVM_PAGE_RESERVED = BIT(0) | BIT(1), 23 25 24 - /* Meta-states which aren't encoded directly in the PTE's SW bits */ 25 - PKVM_NOPAGE = BIT(2), 26 + /* 27 + * 'Meta-states' are not stored directly in PTE SW bits for guest 28 + * states, but inferred from the context (e.g. invalid PTE entries). 29 + * For the host and hyp, meta-states are stored directly in the 30 + * struct hyp_page. 31 + */ 32 + PKVM_NOPAGE = BIT(0) | BIT(1), 26 33 }; 27 - #define PKVM_PAGE_META_STATES_MASK (~__PKVM_PAGE_RESERVED) 34 + #define PKVM_PAGE_STATE_MASK (BIT(0) | BIT(1)) 28 35 29 36 #define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1) 30 37 static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot, ··· 51 44 u16 refcount; 52 45 u8 order; 53 46 54 - /* Host (non-meta) state. 
Guarded by the host stage-2 lock. */ 55 - enum pkvm_page_state host_state : 8; 47 + /* Host state. Guarded by the host stage-2 lock. */ 48 + unsigned __host_state : 4; 49 + 50 + /* 51 + * Complement of the hyp state. Guarded by the hyp stage-1 lock. We use 52 + * the complement so that the initial 0 in __hyp_state_comp (due to the 53 + * entire vmemmap starting off zeroed) encodes PKVM_NOPAGE. 54 + */ 55 + unsigned __hyp_state_comp : 4; 56 56 57 57 u32 host_share_guest_count; 58 58 }; ··· 95 81 #define hyp_page_to_phys(page) hyp_pfn_to_phys((hyp_page_to_pfn(page))) 96 82 #define hyp_page_to_virt(page) __hyp_va(hyp_page_to_phys(page)) 97 83 #define hyp_page_to_pool(page) (((struct hyp_page *)page)->pool) 84 + 85 + static inline enum pkvm_page_state get_host_state(struct hyp_page *p) 86 + { 87 + return p->__host_state; 88 + } 89 + 90 + static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state) 91 + { 92 + p->__host_state = state; 93 + } 94 + 95 + static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p) 96 + { 97 + return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK; 98 + } 99 + 100 + static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state) 101 + { 102 + p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK; 103 + } 98 104 99 105 /* 100 106 * Refcounting for 'struct hyp_page'.
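
The complement trick used for `__hyp_state_comp` above can be checked in isolation. The following user-space sketch uses renamed, hypothetical identifiers (`demo_page`, `demo_get_hyp_state`, and so on) and keeps only the two bitfields. It shows that a zero-initialised entry decodes to the "no page" state for the hyp view while the host view of the same zeroed entry decodes to "owned".

```c
/* Same numeric encoding as enum pkvm_page_state in the diff above. */
enum demo_page_state {
	DEMO_PAGE_OWNED           = 0,
	DEMO_PAGE_SHARED_OWNED    = 1,  /* BIT(0) */
	DEMO_PAGE_SHARED_BORROWED = 2,  /* BIT(1) */
	DEMO_NOPAGE               = 3,  /* BIT(0) | BIT(1) */
};

#define DEMO_PAGE_STATE_MASK 3u

/* Hypothetical reduction of struct hyp_page to the two state fields. */
struct demo_page {
	unsigned host_state     : 4;
	unsigned hyp_state_comp : 4;  /* stored XORed with the state mask */
};

static enum demo_page_state demo_get_host_state(const struct demo_page *p)
{
	return (enum demo_page_state)p->host_state;
}

static enum demo_page_state demo_get_hyp_state(const struct demo_page *p)
{
	return (enum demo_page_state)(p->hyp_state_comp ^ DEMO_PAGE_STATE_MASK);
}

static void demo_set_hyp_state(struct demo_page *p, enum demo_page_state state)
{
	p->hyp_state_comp = state ^ DEMO_PAGE_STATE_MASK;
}
```

Because the stored value is XORed with the two-bit mask rather than bitwise-negated across all four bits, the upper two bits of the field stay zero for every valid state.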

+3 -1

arch/arm64/kvm/hyp/include/nvhe/mm.h

··· 13 13 extern struct kvm_pgtable pkvm_pgtable; 14 14 extern hyp_spinlock_t pkvm_pgd_lock; 15 15 16 - int hyp_create_pcpu_fixmap(void); 16 + int hyp_create_fixmap(void); 17 17 void *hyp_fixmap_map(phys_addr_t phys); 18 18 void hyp_fixmap_unmap(void); 19 + void *hyp_fixblock_map(phys_addr_t phys, size_t *size); 20 + void hyp_fixblock_unmap(void); 19 21 20 22 int hyp_create_idmap(u32 hyp_va_bits); 21 23 int hyp_map_vectors(void);

+6

arch/arm64/kvm/hyp/nvhe/Makefile

··· 99 99 # causes a build failure. Remove profile optimization flags. 100 100 KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%, $(KBUILD_CFLAGS)) 101 101 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables 102 + 103 + ifeq ($(CONFIG_UBSAN_KVM_EL2),y) 104 + UBSAN_SANITIZE := y 105 + # Always use brk and not hooks 106 + ccflags-y += $(CFLAGS_UBSAN_TRAP) 107 + endif

+1 -1

arch/arm64/kvm/hyp/nvhe/host.S

··· 124 124 /* Ensure host stage-2 is disabled */ 125 125 mrs x0, hcr_el2 126 126 bic x0, x0, #HCR_VM 127 - msr hcr_el2, x0 127 + msr_hcr_el2 x0 128 128 isb 129 129 tlbi vmalls12e1 130 130 dsb nsh

+2 -2

arch/arm64/kvm/hyp/nvhe/hyp-init.S

··· 100 100 msr mair_el2, x1 101 101 102 102 ldr x1, [x0, #NVHE_INIT_HCR_EL2] 103 - msr hcr_el2, x1 103 + msr_hcr_el2 x1 104 104 105 105 mov x2, #HCR_E2H 106 106 and x2, x1, x2 ··· 262 262 263 263 alternative_if ARM64_KVM_PROTECTED_MODE 264 264 mov_q x5, HCR_HOST_NVHE_FLAGS 265 - msr hcr_el2, x5 265 + msr_hcr_el2 x5 266 266 alternative_else_nop_endif 267 267 268 268 /* Install stub vectors */

+10 -10

arch/arm64/kvm/hyp/nvhe/hyp-main.c

··· 123 123 124 124 hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt; 125 125 126 - hyp_vcpu->vcpu.arch.sve_state = kern_hyp_va(host_vcpu->arch.sve_state); 127 - /* Limit guest vector length to the maximum supported by the host. */ 128 - hyp_vcpu->vcpu.arch.sve_max_vl = min(host_vcpu->arch.sve_max_vl, kvm_host_sve_max_vl); 129 - 130 126 hyp_vcpu->vcpu.arch.mdcr_el2 = host_vcpu->arch.mdcr_el2; 131 127 hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE); 132 128 hyp_vcpu->vcpu.arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2) & ··· 245 249 { 246 250 DECLARE_REG(u64, pfn, host_ctxt, 1); 247 251 DECLARE_REG(u64, gfn, host_ctxt, 2); 248 - DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3); 252 + DECLARE_REG(u64, nr_pages, host_ctxt, 3); 253 + DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4); 249 254 struct pkvm_hyp_vcpu *hyp_vcpu; 250 255 int ret = -EINVAL; 251 256 ··· 261 264 if (ret) 262 265 goto out; 263 266 264 - ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot); 267 + ret = __pkvm_host_share_guest(pfn, gfn, nr_pages, hyp_vcpu, prot); 265 268 out: 266 269 cpu_reg(host_ctxt, 1) = ret; 267 270 } ··· 270 273 { 271 274 DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1); 272 275 DECLARE_REG(u64, gfn, host_ctxt, 2); 276 + DECLARE_REG(u64, nr_pages, host_ctxt, 3); 273 277 struct pkvm_hyp_vm *hyp_vm; 274 278 int ret = -EINVAL; 275 279 ··· 281 283 if (!hyp_vm) 282 284 goto out; 283 285 284 - ret = __pkvm_host_unshare_guest(gfn, hyp_vm); 286 + ret = __pkvm_host_unshare_guest(gfn, nr_pages, hyp_vm); 285 287 put_pkvm_hyp_vm(hyp_vm); 286 288 out: 287 289 cpu_reg(host_ctxt, 1) = ret; ··· 310 312 { 311 313 DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1); 312 314 DECLARE_REG(u64, gfn, host_ctxt, 2); 315 + DECLARE_REG(u64, nr_pages, host_ctxt, 3); 313 316 struct pkvm_hyp_vm *hyp_vm; 314 317 int ret = -EINVAL; 315 318 ··· 321 322 if (!hyp_vm) 322 323 goto out; 323 324 324 - ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm); 325 + ret = __pkvm_host_wrprotect_guest(gfn, 
nr_pages, hyp_vm); 325 326 put_pkvm_hyp_vm(hyp_vm); 326 327 out: 327 328 cpu_reg(host_ctxt, 1) = ret; ··· 331 332 { 332 333 DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1); 333 334 DECLARE_REG(u64, gfn, host_ctxt, 2); 334 - DECLARE_REG(bool, mkold, host_ctxt, 3); 335 + DECLARE_REG(u64, nr_pages, host_ctxt, 3); 336 + DECLARE_REG(bool, mkold, host_ctxt, 4); 335 337 struct pkvm_hyp_vm *hyp_vm; 336 338 int ret = -EINVAL; 337 339 ··· 343 343 if (!hyp_vm) 344 344 goto out; 345 345 346 - ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm); 346 + ret = __pkvm_host_test_clear_young_guest(gfn, nr_pages, mkold, hyp_vm); 347 347 put_pkvm_hyp_vm(hyp_vm); 348 348 out: 349 349 cpu_reg(host_ctxt, 1) = ret;

+2

arch/arm64/kvm/hyp/nvhe/hyp.lds.S

··· 25 25 BEGIN_HYP_SECTION(.data..percpu) 26 26 PERCPU_INPUT(L1_CACHE_BYTES) 27 27 END_HYP_SECTION 28 + 28 29 HYP_SECTION(.bss) 30 + HYP_SECTION(.data) 29 31 }

+389 -121

arch/arm64/kvm/hyp/nvhe/mem_protect.c

··· 60 60 hyp_spin_unlock(&pkvm_pgd_lock); 61 61 } 62 62 63 + #define for_each_hyp_page(__p, __st, __sz) \ 64 + for (struct hyp_page *__p = hyp_phys_to_page(__st), \ 65 + *__e = __p + ((__sz) >> PAGE_SHIFT); \ 66 + __p < __e; __p++) 67 + 63 68 static void *host_s2_zalloc_pages_exact(size_t size) 64 69 { 65 70 void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size)); ··· 166 161 return 0; 167 162 } 168 163 169 - static bool guest_stage2_force_pte_cb(u64 addr, u64 end, 170 - enum kvm_pgtable_prot prot) 171 - { 172 - return true; 173 - } 174 - 175 164 static void *guest_s2_zalloc_pages_exact(size_t size) 176 165 { 177 166 void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size)); ··· 216 217 hyp_put_page(&current_vm->pool, addr); 217 218 } 218 219 220 + static void __apply_guest_page(void *va, size_t size, 221 + void (*func)(void *addr, size_t size)) 222 + { 223 + size += va - PTR_ALIGN_DOWN(va, PAGE_SIZE); 224 + va = PTR_ALIGN_DOWN(va, PAGE_SIZE); 225 + size = PAGE_ALIGN(size); 226 + 227 + while (size) { 228 + size_t map_size = PAGE_SIZE; 229 + void *map; 230 + 231 + if (IS_ALIGNED((unsigned long)va, PMD_SIZE) && size >= PMD_SIZE) 232 + map = hyp_fixblock_map(__hyp_pa(va), &map_size); 233 + else 234 + map = hyp_fixmap_map(__hyp_pa(va)); 235 + 236 + func(map, map_size); 237 + 238 + if (map_size == PMD_SIZE) 239 + hyp_fixblock_unmap(); 240 + else 241 + hyp_fixmap_unmap(); 242 + 243 + size -= map_size; 244 + va += map_size; 245 + } 246 + } 247 + 219 248 static void clean_dcache_guest_page(void *va, size_t size) 220 249 { 221 - __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size); 222 - hyp_fixmap_unmap(); 250 + __apply_guest_page(va, size, __clean_dcache_guest_page); 223 251 } 224 252 225 253 static void invalidate_icache_guest_page(void *va, size_t size) 226 254 { 227 - __invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size); 228 - hyp_fixmap_unmap(); 255 + __apply_guest_page(va, size, __invalidate_icache_guest_page); 229 256 } 230 257 
231 258 int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd) ··· 280 255 }; 281 256 282 257 guest_lock_component(vm); 283 - ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0, 284 - guest_stage2_force_pte_cb); 258 + ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0, NULL); 285 259 guest_unlock_component(vm); 286 260 if (ret) 287 261 return ret; ··· 333 309 */ 334 310 kvm_flush_dcache_to_poc(params, sizeof(*params)); 335 311 336 - write_sysreg(params->hcr_el2, hcr_el2); 312 + write_sysreg_hcr(params->hcr_el2); 337 313 __load_stage2(&host_mmu.arch.mmu, &host_mmu.arch); 338 314 339 315 /* ··· 491 467 return -EAGAIN; 492 468 493 469 if (pte) { 494 - WARN_ON(addr_is_memory(addr) && hyp_phys_to_page(addr)->host_state != PKVM_NOPAGE); 470 + WARN_ON(addr_is_memory(addr) && 471 + get_host_state(hyp_phys_to_page(addr)) != PKVM_NOPAGE); 495 472 return -EPERM; 496 473 } 497 474 ··· 518 493 519 494 static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state) 520 495 { 521 - phys_addr_t end = addr + size; 522 - 523 - for (; addr < end; addr += PAGE_SIZE) 524 - hyp_phys_to_page(addr)->host_state = state; 496 + for_each_hyp_page(page, addr, size) 497 + set_host_state(page, state); 525 498 } 526 499 527 500 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id) ··· 641 618 static int __host_check_page_state_range(u64 addr, u64 size, 642 619 enum pkvm_page_state state) 643 620 { 644 - u64 end = addr + size; 645 621 int ret; 646 622 647 - ret = check_range_allowed_memory(addr, end); 623 + ret = check_range_allowed_memory(addr, addr + size); 648 624 if (ret) 649 625 return ret; 650 626 651 627 hyp_assert_lock_held(&host_mmu.lock); 652 - for (; addr < end; addr += PAGE_SIZE) { 653 - if (hyp_phys_to_page(addr)->host_state != state) 628 + 629 + for_each_hyp_page(page, addr, size) { 630 + if (get_host_state(page) != state) 654 631 return -EPERM; 655 632 } 656 633 ··· 660 637 static int 
__host_set_page_state_range(u64 addr, u64 size, 661 638 enum pkvm_page_state state) 662 639 { 663 - if (hyp_phys_to_page(addr)->host_state == PKVM_NOPAGE) { 640 + if (get_host_state(hyp_phys_to_page(addr)) == PKVM_NOPAGE) { 664 641 int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT); 665 642 666 643 if (ret) ··· 672 649 return 0; 673 650 } 674 651 675 - static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte, u64 addr) 652 + static void __hyp_set_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state) 676 653 { 677 - if (!kvm_pte_valid(pte)) 678 - return PKVM_NOPAGE; 679 - 680 - return pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte)); 654 + for_each_hyp_page(page, phys, size) 655 + set_hyp_state(page, state); 681 656 } 682 657 683 - static int __hyp_check_page_state_range(u64 addr, u64 size, 684 - enum pkvm_page_state state) 658 + static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state) 685 659 { 686 - struct check_walk_data d = { 687 - .desired = state, 688 - .get_page_state = hyp_get_page_state, 689 - }; 660 + for_each_hyp_page(page, phys, size) { 661 + if (get_hyp_state(page) != state) 662 + return -EPERM; 663 + } 690 664 691 - hyp_assert_lock_held(&pkvm_pgd_lock); 692 - return check_page_state_range(&pkvm_pgtable, addr, size, &d); 665 + return 0; 693 666 } 694 667 695 668 static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr) ··· 696 677 return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)); 697 678 } 698 679 699 - static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr, 680 + static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr, 700 681 u64 size, enum pkvm_page_state state) 701 682 { 702 - struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu); 703 683 struct check_walk_data d = { 704 684 .desired = state, 705 685 .get_page_state = guest_get_page_state, ··· 711 693 int __pkvm_host_share_hyp(u64 pfn) 712 694 { 713 695 u64 phys 
= hyp_pfn_to_phys(pfn); 714 - void *virt = __hyp_va(phys); 715 - enum kvm_pgtable_prot prot; 716 696 u64 size = PAGE_SIZE; 717 697 int ret; 718 698 ··· 720 704 ret = __host_check_page_state_range(phys, size, PKVM_PAGE_OWNED); 721 705 if (ret) 722 706 goto unlock; 723 - if (IS_ENABLED(CONFIG_NVHE_EL2_DEBUG)) { 724 - ret = __hyp_check_page_state_range((u64)virt, size, PKVM_NOPAGE); 725 - if (ret) 726 - goto unlock; 727 - } 707 + ret = __hyp_check_page_state_range(phys, size, PKVM_NOPAGE); 708 + if (ret) 709 + goto unlock; 728 710 729 - prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED); 730 - WARN_ON(pkvm_create_mappings_locked(virt, virt + size, prot)); 711 + __hyp_set_page_state_range(phys, size, PKVM_PAGE_SHARED_BORROWED); 731 712 WARN_ON(__host_set_page_state_range(phys, size, PKVM_PAGE_SHARED_OWNED)); 732 713 733 714 unlock: ··· 747 734 ret = __host_check_page_state_range(phys, size, PKVM_PAGE_SHARED_OWNED); 748 735 if (ret) 749 736 goto unlock; 750 - ret = __hyp_check_page_state_range(virt, size, PKVM_PAGE_SHARED_BORROWED); 737 + ret = __hyp_check_page_state_range(phys, size, PKVM_PAGE_SHARED_BORROWED); 751 738 if (ret) 752 739 goto unlock; 753 740 if (hyp_page_count((void *)virt)) { ··· 755 742 goto unlock; 756 743 } 757 744 758 - WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size); 745 + __hyp_set_page_state_range(phys, size, PKVM_NOPAGE); 759 746 WARN_ON(__host_set_page_state_range(phys, size, PKVM_PAGE_OWNED)); 760 747 761 748 unlock: ··· 770 757 u64 phys = hyp_pfn_to_phys(pfn); 771 758 u64 size = PAGE_SIZE * nr_pages; 772 759 void *virt = __hyp_va(phys); 773 - enum kvm_pgtable_prot prot; 774 760 int ret; 775 761 776 762 host_lock_component(); ··· 778 766 ret = __host_check_page_state_range(phys, size, PKVM_PAGE_OWNED); 779 767 if (ret) 780 768 goto unlock; 781 - if (IS_ENABLED(CONFIG_NVHE_EL2_DEBUG)) { 782 - ret = __hyp_check_page_state_range((u64)virt, size, PKVM_NOPAGE); 783 - if (ret) 784 - goto unlock; 785 - } 769 + ret = 
__hyp_check_page_state_range(phys, size, PKVM_NOPAGE); 770 + if (ret) 771 + goto unlock; 786 772 787 - prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_OWNED); 788 - WARN_ON(pkvm_create_mappings_locked(virt, virt + size, prot)); 773 + __hyp_set_page_state_range(phys, size, PKVM_PAGE_OWNED); 774 + WARN_ON(pkvm_create_mappings_locked(virt, virt + size, PAGE_HYP)); 789 775 WARN_ON(host_stage2_set_owner_locked(phys, size, PKVM_ID_HYP)); 790 776 791 777 unlock: ··· 803 793 host_lock_component(); 804 794 hyp_lock_component(); 805 795 806 - ret = __hyp_check_page_state_range(virt, size, PKVM_PAGE_OWNED); 796 + ret = __hyp_check_page_state_range(phys, size, PKVM_PAGE_OWNED); 807 797 if (ret) 808 798 goto unlock; 809 - if (IS_ENABLED(CONFIG_NVHE_EL2_DEBUG)) { 810 - ret = __host_check_page_state_range(phys, size, PKVM_NOPAGE); 811 - if (ret) 812 - goto unlock; 813 - } 799 + ret = __host_check_page_state_range(phys, size, PKVM_NOPAGE); 800 + if (ret) 801 + goto unlock; 814 802 803 + __hyp_set_page_state_range(phys, size, PKVM_NOPAGE); 815 804 WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size); 816 805 WARN_ON(host_stage2_set_owner_locked(phys, size, PKVM_ID_HOST)); 817 806 ··· 825 816 { 826 817 u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE); 827 818 u64 end = PAGE_ALIGN((u64)to); 819 + u64 phys = __hyp_pa(start); 828 820 u64 size = end - start; 821 + struct hyp_page *p; 829 822 int ret; 830 823 831 824 host_lock_component(); 832 825 hyp_lock_component(); 833 826 834 - ret = __host_check_page_state_range(__hyp_pa(start), size, 835 - PKVM_PAGE_SHARED_OWNED); 827 + ret = __host_check_page_state_range(phys, size, PKVM_PAGE_SHARED_OWNED); 836 828 if (ret) 837 829 goto unlock; 838 830 839 - ret = __hyp_check_page_state_range(start, size, 840 - PKVM_PAGE_SHARED_BORROWED); 831 + ret = __hyp_check_page_state_range(phys, size, PKVM_PAGE_SHARED_BORROWED); 841 832 if (ret) 842 833 goto unlock; 843 834 844 - for (cur = start; cur < end; cur += PAGE_SIZE) 845 - 
hyp_page_ref_inc(hyp_virt_to_page(cur)); 835 + for (cur = start; cur < end; cur += PAGE_SIZE) { 836 + p = hyp_virt_to_page(cur); 837 + hyp_page_ref_inc(p); 838 + if (p->refcount == 1) 839 + WARN_ON(pkvm_create_mappings_locked((void *)cur, 840 + (void *)cur + PAGE_SIZE, 841 + PAGE_HYP)); 842 + } 846 843 847 844 unlock: 848 845 hyp_unlock_component(); ··· 861 846 { 862 847 u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE); 863 848 u64 end = PAGE_ALIGN((u64)to); 849 + struct hyp_page *p; 864 850 865 851 host_lock_component(); 866 852 hyp_lock_component(); 867 853 868 - for (cur = start; cur < end; cur += PAGE_SIZE) 869 - hyp_page_ref_dec(hyp_virt_to_page(cur)); 854 + for (cur = start; cur < end; cur += PAGE_SIZE) { 855 + p = hyp_virt_to_page(cur); 856 + if (p->refcount == 1) 857 + WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, cur, PAGE_SIZE) != PAGE_SIZE); 858 + hyp_page_ref_dec(p); 859 + } 870 860 871 861 hyp_unlock_component(); 872 862 host_unlock_component(); ··· 907 887 return ret; 908 888 } 909 889 910 - int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu, 890 + static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *size) 891 + { 892 + size_t block_size; 893 + 894 + if (nr_pages == 1) { 895 + *size = PAGE_SIZE; 896 + return 0; 897 + } 898 + 899 + /* We solely support second to last level huge mapping */ 900 + block_size = kvm_granule_size(KVM_PGTABLE_LAST_LEVEL - 1); 901 + 902 + if (nr_pages != block_size >> PAGE_SHIFT) 903 + return -EINVAL; 904 + 905 + if (!IS_ALIGNED(phys | ipa, block_size)) 906 + return -EINVAL; 907 + 908 + *size = block_size; 909 + return 0; 910 + } 911 + 912 + int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu, 911 913 enum kvm_pgtable_prot prot) 912 914 { 913 915 struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu); 914 916 u64 phys = hyp_pfn_to_phys(pfn); 915 917 u64 ipa = hyp_pfn_to_phys(gfn); 916 - struct hyp_page *page; 918 + u64 size; 917 919 int ret; 918 
920 919 921 if (prot & ~KVM_PGTABLE_PROT_RWX) 920 922 return -EINVAL; 921 923 922 - ret = check_range_allowed_memory(phys, phys + PAGE_SIZE); 924 + ret = __guest_check_transition_size(phys, ipa, nr_pages, &size); 925 + if (ret) 926 + return ret; 927 + 928 + ret = check_range_allowed_memory(phys, phys + size); 923 929 if (ret) 924 930 return ret; 925 931 926 932 host_lock_component(); 927 933 guest_lock_component(vm); 928 934 929 - ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE); 935 + ret = __guest_check_page_state_range(vm, ipa, size, PKVM_NOPAGE); 930 936 if (ret) 931 937 goto unlock; 932 938 933 - page = hyp_phys_to_page(phys); 934 - switch (page->host_state) { 935 - case PKVM_PAGE_OWNED: 936 - WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED)); 937 - break; 938 - case PKVM_PAGE_SHARED_OWNED: 939 - if (page->host_share_guest_count) 940 - break; 941 - /* Only host to np-guest multi-sharing is tolerated */ 942 - WARN_ON(1); 943 - fallthrough; 944 - default: 945 - ret = -EPERM; 946 - goto unlock; 939 + for_each_hyp_page(page, phys, size) { 940 + switch (get_host_state(page)) { 941 + case PKVM_PAGE_OWNED: 942 + continue; 943 + case PKVM_PAGE_SHARED_OWNED: 944 + if (page->host_share_guest_count == U32_MAX) { 945 + ret = -EBUSY; 946 + goto unlock; 947 + } 948 + 949 + /* Only host to np-guest multi-sharing is tolerated */ 950 + if (page->host_share_guest_count) 951 + continue; 952 + 953 + fallthrough; 954 + default: 955 + ret = -EPERM; 956 + goto unlock; 957 + } 947 958 } 948 959 949 - WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys, 960 + for_each_hyp_page(page, phys, size) { 961 + set_host_state(page, PKVM_PAGE_SHARED_OWNED); 962 + page->host_share_guest_count++; 963 + } 964 + 965 + WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, size, phys, 950 966 pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED), 951 967 &vcpu->vcpu.arch.pkvm_memcache, 0)); 952 - page->host_share_guest_count++; 953 968 954 969 unlock: 955 
970 guest_unlock_component(vm); ··· 993 938 return ret; 994 939 } 995 940 996 - static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa) 941 + static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa, u64 size) 997 942 { 998 943 enum pkvm_page_state state; 999 - struct hyp_page *page; 1000 944 kvm_pte_t pte; 1001 945 u64 phys; 1002 946 s8 level; ··· 1006 952 return ret; 1007 953 if (!kvm_pte_valid(pte)) 1008 954 return -ENOENT; 1009 - if (level != KVM_PGTABLE_LAST_LEVEL) 955 + if (kvm_granule_size(level) != size) 1010 956 return -E2BIG; 1011 957 1012 958 state = guest_get_page_state(pte, ipa); ··· 1014 960 return -EPERM; 1015 961 1016 962 phys = kvm_pte_to_phys(pte); 1017 - ret = check_range_allowed_memory(phys, phys + PAGE_SIZE); 963 + ret = check_range_allowed_memory(phys, phys + size); 1018 964 if (WARN_ON(ret)) 1019 965 return ret; 1020 966 1021 - page = hyp_phys_to_page(phys); 1022 - if (page->host_state != PKVM_PAGE_SHARED_OWNED) 1023 - return -EPERM; 1024 - if (WARN_ON(!page->host_share_guest_count)) 1025 - return -EINVAL; 967 + for_each_hyp_page(page, phys, size) { 968 + if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED) 969 + return -EPERM; 970 + if (WARN_ON(!page->host_share_guest_count)) 971 + return -EINVAL; 972 + } 1026 973 1027 974 *__phys = phys; 1028 975 1029 976 return 0; 1030 977 } 1031 978 1032 - int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *vm) 979 + int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm) 1033 980 { 1034 981 u64 ipa = hyp_pfn_to_phys(gfn); 1035 - struct hyp_page *page; 1036 - u64 phys; 982 + u64 size, phys; 1037 983 int ret; 984 + 985 + ret = __guest_check_transition_size(0, ipa, nr_pages, &size); 986 + if (ret) 987 + return ret; 1038 988 1039 989 host_lock_component(); 1040 990 guest_lock_component(vm); 1041 991 1042 - ret = __check_host_shared_guest(vm, &phys, ipa); 992 + ret = __check_host_shared_guest(vm, &phys, ipa, size); 1043 993 if 
(ret) 1044 994 goto unlock; 1045 995 1046 - ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE); 996 + ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, size); 1047 997 if (ret) 1048 998 goto unlock; 1049 999 1050 - page = hyp_phys_to_page(phys); 1051 - page->host_share_guest_count--; 1052 - if (!page->host_share_guest_count) 1053 - WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED)); 1000 + for_each_hyp_page(page, phys, size) { 1001 + /* __check_host_shared_guest() protects against underflow */ 1002 + page->host_share_guest_count--; 1003 + if (!page->host_share_guest_count) 1004 + set_host_state(page, PKVM_PAGE_OWNED); 1005 + } 1054 1006 1055 1007 unlock: 1056 1008 guest_unlock_component(vm); ··· 1065 1005 return ret; 1066 1006 } 1067 1007 1068 - static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa) 1008 + static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa, u64 size) 1069 1009 { 1070 1010 u64 phys; 1071 1011 int ret; ··· 1076 1016 host_lock_component(); 1077 1017 guest_lock_component(vm); 1078 1018 1079 - ret = __check_host_shared_guest(vm, &phys, ipa); 1019 + ret = __check_host_shared_guest(vm, &phys, ipa, size); 1080 1020 1081 1021 guest_unlock_component(vm); 1082 1022 host_unlock_component(); ··· 1096 1036 if (prot & ~KVM_PGTABLE_PROT_RWX) 1097 1037 return -EINVAL; 1098 1038 1099 - assert_host_shared_guest(vm, ipa); 1039 + assert_host_shared_guest(vm, ipa, PAGE_SIZE); 1100 1040 guest_lock_component(vm); 1101 1041 ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0); 1102 1042 guest_unlock_component(vm); ··· 1104 1044 return ret; 1105 1045 } 1106 1046 1107 - int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm) 1047 + int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm) 1108 1048 { 1109 - u64 ipa = hyp_pfn_to_phys(gfn); 1049 + u64 size, ipa = hyp_pfn_to_phys(gfn); 1110 1050 int ret; 1111 1051 1112 1052 if (pkvm_hyp_vm_is_protected(vm)) 1113 1053 return -EPERM; 1114 
1054 1115 - assert_host_shared_guest(vm, ipa); 1055 + ret = __guest_check_transition_size(0, ipa, nr_pages, &size); 1056 + if (ret) 1057 + return ret; 1058 + 1059 + assert_host_shared_guest(vm, ipa, size); 1116 1060 guest_lock_component(vm); 1117 - ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE); 1061 + ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, size); 1118 1062 guest_unlock_component(vm); 1119 1063 1120 1064 return ret; 1121 1065 } 1122 1066 1123 - int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm) 1067 + int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm) 1124 1068 { 1125 - u64 ipa = hyp_pfn_to_phys(gfn); 1069 + u64 size, ipa = hyp_pfn_to_phys(gfn); 1126 1070 int ret; 1127 1071 1128 1072 if (pkvm_hyp_vm_is_protected(vm)) 1129 1073 return -EPERM; 1130 1074 1131 - assert_host_shared_guest(vm, ipa); 1075 + ret = __guest_check_transition_size(0, ipa, nr_pages, &size); 1076 + if (ret) 1077 + return ret; 1078 + 1079 + assert_host_shared_guest(vm, ipa, size); 1132 1080 guest_lock_component(vm); 1133 - ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold); 1081 + ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, size, mkold); 1134 1082 guest_unlock_component(vm); 1135 1083 1136 1084 return ret; ··· 1152 1084 if (pkvm_hyp_vm_is_protected(vm)) 1153 1085 return -EPERM; 1154 1086 1155 - assert_host_shared_guest(vm, ipa); 1087 + assert_host_shared_guest(vm, ipa, PAGE_SIZE); 1156 1088 guest_lock_component(vm); 1157 1089 kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0); 1158 1090 guest_unlock_component(vm); 1159 1091 1160 1092 return 0; 1161 1093 } 1094 + 1095 + #ifdef CONFIG_NVHE_EL2_DEBUG 1096 + struct pkvm_expected_state { 1097 + enum pkvm_page_state host; 1098 + enum pkvm_page_state hyp; 1099 + enum pkvm_page_state guest[2]; /* [ gfn, gfn + 1 ] */ 1100 + }; 1101 + 1102 + static struct pkvm_expected_state selftest_state; 1103 + static struct hyp_page 
*selftest_page; 1104 + 1105 + static struct pkvm_hyp_vm selftest_vm = { 1106 + .kvm = { 1107 + .arch = { 1108 + .mmu = { 1109 + .arch = &selftest_vm.kvm.arch, 1110 + .pgt = &selftest_vm.pgt, 1111 + }, 1112 + }, 1113 + }, 1114 + }; 1115 + 1116 + static struct pkvm_hyp_vcpu selftest_vcpu = { 1117 + .vcpu = { 1118 + .arch = { 1119 + .hw_mmu = &selftest_vm.kvm.arch.mmu, 1120 + }, 1121 + .kvm = &selftest_vm.kvm, 1122 + }, 1123 + }; 1124 + 1125 + static void init_selftest_vm(void *virt) 1126 + { 1127 + struct hyp_page *p = hyp_virt_to_page(virt); 1128 + int i; 1129 + 1130 + selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr; 1131 + WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt)); 1132 + 1133 + for (i = 0; i < pkvm_selftest_pages(); i++) { 1134 + if (p[i].refcount) 1135 + continue; 1136 + p[i].refcount = 1; 1137 + hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i])); 1138 + } 1139 + } 1140 + 1141 + static u64 selftest_ipa(void) 1142 + { 1143 + return BIT(selftest_vm.pgt.ia_bits - 1); 1144 + } 1145 + 1146 + static void assert_page_state(void) 1147 + { 1148 + void *virt = hyp_page_to_virt(selftest_page); 1149 + u64 size = PAGE_SIZE << selftest_page->order; 1150 + struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu; 1151 + u64 phys = hyp_virt_to_phys(virt); 1152 + u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE }; 1153 + struct pkvm_hyp_vm *vm; 1154 + 1155 + vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu); 1156 + 1157 + host_lock_component(); 1158 + WARN_ON(__host_check_page_state_range(phys, size, selftest_state.host)); 1159 + host_unlock_component(); 1160 + 1161 + hyp_lock_component(); 1162 + WARN_ON(__hyp_check_page_state_range(phys, size, selftest_state.hyp)); 1163 + hyp_unlock_component(); 1164 + 1165 + guest_lock_component(&selftest_vm); 1166 + WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0])); 1167 + WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1])); 1168 + guest_unlock_component(&selftest_vm); 1169 
+ } 1170 + 1171 + #define assert_transition_res(res, fn, ...) \ 1172 + do { \ 1173 + WARN_ON(fn(__VA_ARGS__) != res); \ 1174 + assert_page_state(); \ 1175 + } while (0) 1176 + 1177 + void pkvm_ownership_selftest(void *base) 1178 + { 1179 + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX; 1180 + void *virt = hyp_alloc_pages(&host_s2_pool, 0); 1181 + struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu; 1182 + struct pkvm_hyp_vm *vm = &selftest_vm; 1183 + u64 phys, size, pfn, gfn; 1184 + 1185 + WARN_ON(!virt); 1186 + selftest_page = hyp_virt_to_page(virt); 1187 + selftest_page->refcount = 0; 1188 + init_selftest_vm(base); 1189 + 1190 + size = PAGE_SIZE << selftest_page->order; 1191 + phys = hyp_virt_to_phys(virt); 1192 + pfn = hyp_phys_to_pfn(phys); 1193 + gfn = hyp_phys_to_pfn(selftest_ipa()); 1194 + 1195 + selftest_state.host = PKVM_NOPAGE; 1196 + selftest_state.hyp = PKVM_PAGE_OWNED; 1197 + selftest_state.guest[0] = selftest_state.guest[1] = PKVM_NOPAGE; 1198 + assert_page_state(); 1199 + assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1); 1200 + assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn); 1201 + assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn); 1202 + assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1); 1203 + assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1); 1204 + assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size); 1205 + assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1206 + assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm); 1207 + 1208 + selftest_state.host = PKVM_PAGE_OWNED; 1209 + selftest_state.hyp = PKVM_NOPAGE; 1210 + assert_transition_res(0, __pkvm_hyp_donate_host, pfn, 1); 1211 + assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1); 1212 + assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn); 1213 + assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1); 1214 + assert_transition_res(-ENOENT, 
__pkvm_host_unshare_guest, gfn, 1, vm); 1215 + assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size); 1216 + 1217 + selftest_state.host = PKVM_PAGE_SHARED_OWNED; 1218 + selftest_state.hyp = PKVM_PAGE_SHARED_BORROWED; 1219 + assert_transition_res(0, __pkvm_host_share_hyp, pfn); 1220 + assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn); 1221 + assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1); 1222 + assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1); 1223 + assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1); 1224 + assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1225 + assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm); 1226 + 1227 + assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size); 1228 + assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size); 1229 + hyp_unpin_shared_mem(virt, virt + size); 1230 + WARN_ON(hyp_page_count(virt) != 1); 1231 + assert_transition_res(-EBUSY, __pkvm_host_unshare_hyp, pfn); 1232 + assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn); 1233 + assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1); 1234 + assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1); 1235 + assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1); 1236 + assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1237 + assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm); 1238 + 1239 + hyp_unpin_shared_mem(virt, virt + size); 1240 + assert_page_state(); 1241 + WARN_ON(hyp_page_count(virt)); 1242 + 1243 + selftest_state.host = PKVM_PAGE_OWNED; 1244 + selftest_state.hyp = PKVM_NOPAGE; 1245 + assert_transition_res(0, __pkvm_host_unshare_hyp, pfn); 1246 + 1247 + selftest_state.host = PKVM_PAGE_SHARED_OWNED; 1248 + selftest_state.hyp = PKVM_NOPAGE; 1249 + assert_transition_res(0, __pkvm_host_share_ffa, pfn, 1); 1250 + assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1); 
1251 + assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1); 1252 + assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn); 1253 + assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn); 1254 + assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1); 1255 + assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1256 + assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm); 1257 + assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size); 1258 + 1259 + selftest_state.host = PKVM_PAGE_OWNED; 1260 + selftest_state.hyp = PKVM_NOPAGE; 1261 + assert_transition_res(0, __pkvm_host_unshare_ffa, pfn, 1); 1262 + assert_transition_res(-EPERM, __pkvm_host_unshare_ffa, pfn, 1); 1263 + 1264 + selftest_state.host = PKVM_PAGE_SHARED_OWNED; 1265 + selftest_state.guest[0] = PKVM_PAGE_SHARED_BORROWED; 1266 + assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1267 + assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot); 1268 + assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1); 1269 + assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1); 1270 + assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn); 1271 + assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn); 1272 + assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1); 1273 + assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size); 1274 + 1275 + selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED; 1276 + assert_transition_res(0, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot); 1277 + WARN_ON(hyp_virt_to_page(virt)->host_share_guest_count != 2); 1278 + 1279 + selftest_state.guest[0] = PKVM_NOPAGE; 1280 + assert_transition_res(0, __pkvm_host_unshare_guest, gfn, 1, vm); 1281 + 1282 + selftest_state.guest[1] = PKVM_NOPAGE; 1283 + selftest_state.host = PKVM_PAGE_OWNED; 1284 + assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, 1, vm); 1285 + 1286 + 
selftest_state.host = PKVM_NOPAGE; 1287 + selftest_state.hyp = PKVM_PAGE_OWNED; 1288 + assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1); 1289 + 1290 + selftest_page->refcount = 1; 1291 + hyp_put_page(&host_s2_pool, virt); 1292 + } 1293 + #endif
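The reworked __pkvm_host_share_guest() above first walks the whole range to validate every page, and only then updates page state and share counts, so a failure part-way through leaves nothing half-shared. A minimal userspace sketch of that check-then-commit pattern follows. It assumes simplified stand-ins: struct page_meta, the enum values and share_range() are hypothetical names, not the hypervisor's types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor's per-page metadata. */
enum page_state { PAGE_OWNED, PAGE_SHARED_OWNED, PAGE_SHARED_BORROWED, PAGE_NONE };

struct page_meta {
	enum page_state host_state;
	uint32_t share_count;	/* number of guest mappings of this page */
};

/*
 * Share @nr pages with a guest. Pass 1 only validates, pass 2 only mutates,
 * so an error on any page leaves every page untouched.
 */
static int share_range(struct page_meta *pages, size_t nr)
{
	size_t i;

	assert(pages != NULL);

	for (i = 0; i < nr; i++) {
		switch (pages[i].host_state) {
		case PAGE_OWNED:
			continue;
		case PAGE_SHARED_OWNED:
			if (pages[i].share_count == UINT32_MAX)
				return -EBUSY;	/* the counter would overflow */
			if (pages[i].share_count)
				continue;	/* already shared with a guest */
			/* shared with something other than a guest */
			/* fall through */
		default:
			return -EPERM;
		}
	}

	for (i = 0; i < nr; i++) {
		pages[i].host_state = PAGE_SHARED_OWNED;
		pages[i].share_count++;
	}

	return 0;
}
```

Splitting validation from mutation is what lets the real function drop the per-page WARN_ON() on the state update: by the time the second loop runs, every page is known to be in an acceptable state.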

+89 -8

arch/arm64/kvm/hyp/nvhe/mm.c

···
 	return 0;
 }

-void *hyp_fixmap_map(phys_addr_t phys)
+static void *fixmap_map_slot(struct hyp_fixmap_slot *slot, phys_addr_t phys)
 {
-	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
 	kvm_pte_t pte, *ptep = slot->ptep;

 	pte = *ptep;
···
 	return (void *)slot->addr;
 }

+void *hyp_fixmap_map(phys_addr_t phys)
+{
+	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
+}
+
 static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
 {
 	kvm_pte_t *ptep = slot->ptep;
 	u64 addr = slot->addr;
+	u32 level;
+
+	if (FIELD_GET(KVM_PTE_TYPE, *ptep) == KVM_PTE_TYPE_PAGE)
+		level = KVM_PGTABLE_LAST_LEVEL;
+	else
+		level = KVM_PGTABLE_LAST_LEVEL - 1; /* create_fixblock() guarantees PMD level */

 	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);

···
 	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
 	 */
 	dsb(ishst);
-	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
+	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
 	dsb(ish);
 	isb();
 }
···
 static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
 				   enum kvm_pgtable_walk_flags visit)
 {
-	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
+	struct hyp_fixmap_slot *slot = (struct hyp_fixmap_slot *)ctx->arg;

-	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
+	if (!kvm_pte_valid(ctx->old) || (ctx->end - ctx->start) != kvm_granule_size(ctx->level))
 		return -EINVAL;

 	slot->addr = ctx->addr;
···
 	struct kvm_pgtable_walker walker = {
 		.cb = __create_fixmap_slot_cb,
 		.flags = KVM_PGTABLE_WALK_LEAF,
-		.arg = (void *)cpu,
+		.arg = per_cpu_ptr(&fixmap_slots, cpu),
 	};

 	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
 }

-int hyp_create_pcpu_fixmap(void)
+#if PAGE_SHIFT < 16
+#define HAS_FIXBLOCK
+static struct hyp_fixmap_slot hyp_fixblock_slot;
+static DEFINE_HYP_SPINLOCK(hyp_fixblock_lock);
+#endif
+
+static int create_fixblock(void)
+{
+#ifdef HAS_FIXBLOCK
+	struct kvm_pgtable_walker walker = {
+		.cb = __create_fixmap_slot_cb,
+		.flags = KVM_PGTABLE_WALK_LEAF,
+		.arg = &hyp_fixblock_slot,
+	};
+	unsigned long addr;
+	phys_addr_t phys;
+	int ret, i;
+
+	/* Find a RAM phys address, PMD aligned */
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		phys = ALIGN(hyp_memory[i].base, PMD_SIZE);
+		if (phys + PMD_SIZE < (hyp_memory[i].base + hyp_memory[i].size))
+			break;
+	}
+
+	if (i >= hyp_memblock_nr)
+		return -EINVAL;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+	addr = ALIGN(__io_map_base, PMD_SIZE);
+	ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
+	if (ret)
+		goto unlock;
+
+	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
+	if (ret)
+		goto unlock;
+
+	ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
+
+unlock:
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return ret;
+#else
+	return 0;
+#endif
+}
+
+void *hyp_fixblock_map(phys_addr_t phys, size_t *size)
+{
+#ifdef HAS_FIXBLOCK
+	*size = PMD_SIZE;
+	hyp_spin_lock(&hyp_fixblock_lock);
+	return fixmap_map_slot(&hyp_fixblock_slot, phys);
+#else
+	*size = PAGE_SIZE;
+	return hyp_fixmap_map(phys);
+#endif
+}
+
+void hyp_fixblock_unmap(void)
+{
+#ifdef HAS_FIXBLOCK
+	fixmap_clear_slot(&hyp_fixblock_slot);
+	hyp_spin_unlock(&hyp_fixblock_lock);
+#else
+	hyp_fixmap_unmap();
+#endif
+}
+
+int hyp_create_fixmap(void)
 {
 	unsigned long addr, i;
 	int ret;
···
 			return ret;
 	}

-	return 0;
+	return create_fixblock();
 }

 int hyp_create_idmap(u32 hyp_va_bits)
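create_fixblock() above starts by scanning the hypervisor's memory regions for a physical address that is PMD-aligned and has a full PMD-sized block behind it. The search can be sketched in plain C as below. This is only an illustration under stated assumptions: struct mem_region, BLOCK_SIZE (fixed at 2 MiB, the block size with 4 KiB pages) and find_aligned_block() are hypothetical names standing in for hyp_memory[], PMD_SIZE and the open-coded loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE	(UINT64_C(2) << 20)	/* assumed 2 MiB block */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

struct mem_region {	/* hypothetical stand-in for a hyp_memory[] entry */
	uint64_t base;
	uint64_t size;
};

/*
 * Return 0 and store a block-aligned address in *phys if some region holds a
 * whole aligned block, or -1 otherwise. The strict '<' mirrors the diff, so a
 * block that ends exactly at the end of a region is not accepted.
 */
static int find_aligned_block(const struct mem_region *regs, size_t nr,
			      uint64_t *phys)
{
	assert(regs != NULL && phys != NULL);

	for (size_t i = 0; i < nr; i++) {
		uint64_t p = ALIGN_UP(regs[i].base, BLOCK_SIZE);

		if (p + BLOCK_SIZE < regs[i].base + regs[i].size) {
			*phys = p;
			return 0;
		}
	}

	return -1;
}
```

Any RAM address will do here because the mapping is only used to instantiate the page-table entry for the slot; the slot is remapped to the real target each time hyp_fixblock_map() is called.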

+44 -3

arch/arm64/kvm/hyp/nvhe/pkvm.c

···
 	hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
 }

+static void unpin_host_sve_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	void *sve_state;
+
+	if (!vcpu_has_feature(&hyp_vcpu->vcpu, KVM_ARM_VCPU_SVE))
+		return;
+
+	sve_state = kern_hyp_va(hyp_vcpu->vcpu.arch.sve_state);
+	hyp_unpin_shared_mem(sve_state,
+			     sve_state + vcpu_sve_state_size(&hyp_vcpu->vcpu));
+}
+
 static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
 			     unsigned int nr_vcpus)
 {
···
 			continue;

 		unpin_host_vcpu(hyp_vcpu->host_vcpu);
+		unpin_host_sve_state(hyp_vcpu);
 	}
 }

···
 	pkvm_init_features_from_host(hyp_vm, host_kvm);
 }

-static void pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu)
+static int pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu)
 {
 	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	unsigned int sve_max_vl;
+	size_t sve_state_size;
+	void *sve_state;
+	int ret = 0;

-	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE))
+	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE)) {
 		vcpu_clear_flag(vcpu, VCPU_SVE_FINALIZED);
+		return 0;
+	}
+
+	/* Limit guest vector length to the maximum supported by the host. */
+	sve_max_vl = min(READ_ONCE(host_vcpu->arch.sve_max_vl), kvm_host_sve_max_vl);
+	sve_state_size = sve_state_size_from_vl(sve_max_vl);
+	sve_state = kern_hyp_va(READ_ONCE(host_vcpu->arch.sve_state));
+
+	if (!sve_state || !sve_state_size) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = hyp_pin_shared_mem(sve_state, sve_state + sve_state_size);
+	if (ret)
+		goto err;
+
+	vcpu->arch.sve_state = sve_state;
+	vcpu->arch.sve_max_vl = sve_max_vl;
+
+	return 0;
+err:
+	clear_bit(KVM_ARM_VCPU_SVE, vcpu->kvm->arch.vcpu_features);
+	return ret;
 }

 static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
···
 	if (ret)
 		goto done;

-	pkvm_vcpu_init_sve(hyp_vcpu, host_vcpu);
+	ret = pkvm_vcpu_init_sve(hyp_vcpu, host_vcpu);
 done:
 	if (ret)
 		unpin_host_vcpu(host_vcpu);

+23 -4

arch/arm64/kvm/hyp/nvhe/setup.c

···
 static void *vm_table_base;
 static void *hyp_pgt_base;
 static void *host_s2_pgt_base;
+static void *selftest_base;
 static void *ffa_proxy_pages;
 static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
 static struct hyp_pool hpool;
···
 	unsigned long nr_pages;

 	hyp_early_alloc_init(virt, size);
+
+	nr_pages = pkvm_selftest_pages();
+	selftest_base = hyp_early_alloc_contig(nr_pages);
+	if (nr_pages && !selftest_base)
+		return -ENOMEM;

 	nr_pages = hyp_vmemmap_pages(sizeof(struct hyp_page));
 	vmemmap_base = hyp_early_alloc_contig(nr_pages);
···
 	if (ret)
 		return ret;

+	ret = pkvm_create_mappings(__hyp_data_start, __hyp_data_end, PAGE_HYP);
+	if (ret)
+		return ret;
+
 	ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
 	if (ret)
 		return ret;
···
 				     enum kvm_pgtable_walk_flags visit)
 {
 	enum pkvm_page_state state;
+	struct hyp_page *page;
 	phys_addr_t phys;

 	if (!kvm_pte_valid(ctx->old))
···
 	if (!addr_is_memory(phys))
 		return -EINVAL;

+	page = hyp_phys_to_page(phys);
+
 	/*
 	 * Adjust the host stage-2 mappings to match the ownership attributes
-	 * configured in the hypervisor stage-1.
+	 * configured in the hypervisor stage-1, and make sure to propagate them
+	 * to the hyp_vmemmap state.
 	 */
 	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
+		set_hyp_state(page, PKVM_PAGE_OWNED);
 		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
-		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
+		set_hyp_state(page, PKVM_PAGE_SHARED_OWNED);
+		set_host_state(page, PKVM_PAGE_SHARED_BORROWED);
 		break;
 	case PKVM_PAGE_SHARED_BORROWED:
-		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
+		set_hyp_state(page, PKVM_PAGE_SHARED_BORROWED);
+		set_host_state(page, PKVM_PAGE_SHARED_OWNED);
 		break;
 	default:
 		return -EINVAL;
···
 	if (ret)
 		goto out;

-	ret = hyp_create_pcpu_fixmap();
+	ret = hyp_create_fixmap();
 	if (ret)
 		goto out;

···
 		goto out;

 	pkvm_hyp_vm_table_init(vm_table_base);
+
+	pkvm_ownership_selftest(selftest_base);
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
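The ownership fix-up walker above records the state seen in the hypervisor stage-1 as the hyp state and the mirror-image state for the host. A small sketch of that mapping follows. The names (enum state, struct page_meta, propagate_state()) are hypothetical, and the host side of the owned case is modelled as "no page", which is an assumption based on the selftest's initial expectations rather than on the walker itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum pkvm_page_state and struct hyp_page. */
enum state { ST_OWNED, ST_SHARED_OWNED, ST_SHARED_BORROWED, ST_NOPAGE };

struct page_meta {
	enum state host;
	enum state hyp;
};

/*
 * Given the state found in the hypervisor's own mapping, record it as the
 * hyp state and give the host the complementary view.
 */
static int propagate_state(enum state hyp_s1, struct page_meta *page)
{
	assert(page != NULL);

	switch (hyp_s1) {
	case ST_OWNED:
		page->hyp = ST_OWNED;
		page->host = ST_NOPAGE;		/* assumed: host loses the page */
		return 0;
	case ST_SHARED_OWNED:
		page->hyp = ST_SHARED_OWNED;
		page->host = ST_SHARED_BORROWED;
		return 0;
	case ST_SHARED_BORROWED:
		page->hyp = ST_SHARED_BORROWED;
		page->host = ST_SHARED_OWNED;
		return 0;
	default:
		return -EINVAL;			/* unexpected stage-1 state */
	}
}
```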

+13 -1

arch/arm64/kvm/hyp/nvhe/switch.c

···
 DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);

+struct fgt_masks hfgrtr_masks;
+struct fgt_masks hfgwtr_masks;
+struct fgt_masks hfgitr_masks;
+struct fgt_masks hdfgrtr_masks;
+struct fgt_masks hdfgwtr_masks;
+struct fgt_masks hafgrtr_masks;
+struct fgt_masks hfgrtr2_masks;
+struct fgt_masks hfgwtr2_masks;
+struct fgt_masks hfgitr2_masks;
+struct fgt_masks hdfgrtr2_masks;
+struct fgt_masks hdfgwtr2_masks;
+
 extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);

 static void __activate_cptr_traps(struct kvm_vcpu *vcpu)
···

 	__deactivate_traps_common(vcpu);

-	write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
+	write_sysreg_hcr(this_cpu_ptr(&kvm_init_params)->hcr_el2);

 	__deactivate_cptr_traps(vcpu);
 	write_sysreg(__kvm_hyp_host_vector, vbar_el2);

-6

arch/arm64/kvm/hyp/pgtable.c

···
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>

-
-#define KVM_PTE_TYPE			BIT(1)
-#define KVM_PTE_TYPE_BLOCK		0
-#define KVM_PTE_TYPE_PAGE		1
-#define KVM_PTE_TYPE_TABLE		1
-
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable_walker	*walker;


+6 -6

arch/arm64/kvm/hyp/vgic-v3-sr.c

···
 	if (has_vhe()) {
 		flags = local_daif_save();
 	} else {
-		sysreg_clear_set(hcr_el2, 0, HCR_AMO | HCR_FMO | HCR_IMO);
+		sysreg_clear_set_hcr(0, HCR_AMO | HCR_FMO | HCR_IMO);
 		isb();
 	}

···
 	if (has_vhe()) {
 		local_daif_restore(flags);
 	} else {
-		sysreg_clear_set(hcr_el2, HCR_AMO | HCR_FMO | HCR_IMO, 0);
+		sysreg_clear_set_hcr(HCR_AMO | HCR_FMO | HCR_IMO, 0);
 		isb();
 	}

···
 	switch (sysreg) {
 	case SYS_ICC_IGRPEN0_EL1:
 		if (is_read &&
-		    (__vcpu_sys_reg(vcpu, HFGRTR_EL2) & HFGxTR_EL2_ICC_IGRPENn_EL1))
+		    (__vcpu_sys_reg(vcpu, HFGRTR_EL2) & HFGRTR_EL2_ICC_IGRPENn_EL1))
 			return true;

 		if (!is_read &&
-		    (__vcpu_sys_reg(vcpu, HFGWTR_EL2) & HFGxTR_EL2_ICC_IGRPENn_EL1))
+		    (__vcpu_sys_reg(vcpu, HFGWTR_EL2) & HFGWTR_EL2_ICC_IGRPENn_EL1))
 			return true;

 		fallthrough;
···

 	case SYS_ICC_IGRPEN1_EL1:
 		if (is_read &&
-		    (__vcpu_sys_reg(vcpu, HFGRTR_EL2) & HFGxTR_EL2_ICC_IGRPENn_EL1))
+		    (__vcpu_sys_reg(vcpu, HFGRTR_EL2) & HFGRTR_EL2_ICC_IGRPENn_EL1))
 			return true;

 		if (!is_read &&
-		    (__vcpu_sys_reg(vcpu, HFGWTR_EL2) & HFGxTR_EL2_ICC_IGRPENn_EL1))
+		    (__vcpu_sys_reg(vcpu, HFGWTR_EL2) & HFGWTR_EL2_ICC_IGRPENn_EL1))
 			return true;

 		fallthrough;

+44 -4

arch/arm64/kvm/hyp/vhe/switch.c

···

 static u64 __compute_hcr(struct kvm_vcpu *vcpu)
 {
+	u64 guest_hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
 	u64 hcr = vcpu->arch.hcr_el2;

 	if (!vcpu_has_nv(vcpu))
 		return hcr;

+	/*
+	 * We rely on the invariant that a vcpu entered from HYP
+	 * context must also exit in the same context, as only an ERET
+	 * instruction can kick us out of it, and we obviously trap
+	 * that sucker. PSTATE.M will get fixed-up on exit.
+	 */
 	if (is_hyp_ctxt(vcpu)) {
+		host_data_set_flag(VCPU_IN_HYP_CONTEXT);
+
 		hcr |= HCR_NV | HCR_NV2 | HCR_AT | HCR_TTLB;

 		if (!vcpu_el2_e2h_is_set(vcpu))
 			hcr |= HCR_NV1;

 		write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
+	} else {
+		host_data_clear_flag(VCPU_IN_HYP_CONTEXT);
+
+		if (guest_hcr & HCR_NV) {
+			u64 va = __fix_to_virt(vncr_fixmap(smp_processor_id()));
+
+			/* Inherit the low bits from the actual register */
+			va |= __vcpu_sys_reg(vcpu, VNCR_EL2) & GENMASK(PAGE_SHIFT - 1, 0);
+			write_sysreg_s(va, SYS_VNCR_EL2);
+
+			/* Force NV2 in case the guest is forgetful... */
+			guest_hcr |= HCR_NV2;
+		}
 	}

-	return hcr | (__vcpu_sys_reg(vcpu, HCR_EL2) & ~NV_HCR_GUEST_EXCLUDE);
+	BUG_ON(host_data_test_flag(VCPU_IN_HYP_CONTEXT) &&
+	       host_data_test_flag(L1_VNCR_MAPPED));
+
+	return hcr | (guest_hcr & ~NV_HCR_GUEST_EXCLUDE);
 }

 static void __activate_cptr_traps(struct kvm_vcpu *vcpu)
···

 	___deactivate_traps(vcpu);

-	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+	write_sysreg_hcr(HCR_HOST_VHE_FLAGS);

 	if (has_cntpoff()) {
 		struct timer_map map;
···
 	if (ret)
 		return false;

+	/*
+	 * If we have to check for any VNCR mapping being invalidated,
+	 * go back to the slow path for further processing.
+	 */
+	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu) &&
+	    atomic_read(&vcpu->kvm->arch.vncr_map_count))
+		return false;
+
 	__kvm_skip_instr(vcpu);

 	return true;
···

 	/*
 	 * If we were in HYP context on entry, adjust the PSTATE view
-	 * so that the usual helpers work correctly.
+	 * so that the usual helpers work correctly. This enforces our
+	 * invariant that the guest's HYP context status is preserved
+	 * across a run.
 	 */
-	if (vcpu_has_nv(vcpu) && (read_sysreg(hcr_el2) & HCR_NV)) {
+	if (vcpu_has_nv(vcpu) &&
+	    unlikely(host_data_test_flag(VCPU_IN_HYP_CONTEXT))) {
 		u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);

 		switch (mode) {
···
 		*vcpu_cpsr(vcpu) &= ~(PSR_MODE_MASK | PSR_MODE32_BIT);
 		*vcpu_cpsr(vcpu) |= mode;
 	}
+
+	/* Apply extreme paranoia! */
+	BUG_ON(vcpu_has_nv(vcpu) &&
+	       !!host_data_test_flag(VCPU_IN_HYP_CONTEXT) != is_hyp_ctxt(vcpu));

 	return __fixup_guest_exit(vcpu, exit_code, hyp_exit_handlers);
 }
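In the new else branch of __compute_hcr() above, the value written to VNCR_EL2 is the per-CPU fixmap address with the in-page offset bits taken from the guest's own VNCR_EL2. The bit manipulation can be sketched as below, assuming 4 KiB pages and a page-aligned fixmap address. vncr_hw_value() and the SKETCH_* macros are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define SKETCH_OFFSET_MASK	((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)

/*
 * Combine a page-aligned host mapping address with the low (in-page) bits of
 * the guest-programmed register value, as GENMASK(PAGE_SHIFT - 1, 0) does in
 * the diff.
 */
static uint64_t vncr_hw_value(uint64_t fixmap_va, uint64_t guest_vncr)
{
	/* The OR below only works if the fixmap address has no offset bits. */
	assert((fixmap_va & SKETCH_OFFSET_MASK) == 0);

	return fixmap_va | (guest_vncr & SKETCH_OFFSET_MASK);
}
```

Only the page frame is substituted; whatever the guest placed in the low bits of its register is carried over unchanged.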

+2 -2

arch/arm64/kvm/hyp/vhe/tlb.c

···
 	__load_stage2(mmu, mmu->arch);
 	val = read_sysreg(hcr_el2);
 	val &= ~HCR_TGE;
-	write_sysreg(val, hcr_el2);
+	write_sysreg_hcr(val);
 	isb();
 }

···
 	 * We're done with the TLB operation, let's restore the host's
 	 * view of HCR_EL2.
 	 */
-	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+	write_sysreg_hcr(HCR_HOST_VHE_FLAGS);
 	isb();

 	/* ... and the stage-2 MMU context that we switched away from */

+5 -1

arch/arm64/kvm/mmu.c

···
 	if (map_size == PAGE_SIZE)
 		return true;

+	/* pKVM only supports PMD_SIZE huge-mappings */
+	if (is_protected_kvm_enabled() && map_size != PMD_SIZE)
+		return false;
+
 	size = memslot->npages * PAGE_SIZE;

 	gpa_start = memslot->base_gfn << PAGE_SHIFT;
···
 	 * logging_active is guaranteed to never be true for VM_PFNMAP
 	 * memslots.
 	 */
-	if (logging_active || is_protected_kvm_enabled()) {
+	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
 	} else {

+642 -204

arch/arm64/kvm/nested.c

··· 8 8 #include <linux/kvm.h> 9 9 #include <linux/kvm_host.h> 10 10 11 + #include <asm/fixmap.h> 11 12 #include <asm/kvm_arm.h> 12 13 #include <asm/kvm_emulate.h> 13 14 #include <asm/kvm_mmu.h> ··· 16 15 #include <asm/sysreg.h> 17 16 18 17 #include "sys_regs.h" 18 + 19 + struct vncr_tlb { 20 + /* The guest's VNCR_EL2 */ 21 + u64 gva; 22 + struct s1_walk_info wi; 23 + struct s1_walk_result wr; 24 + 25 + u64 hpa; 26 + 27 + /* -1 when not mapped on a CPU */ 28 + int cpu; 29 + 30 + /* 31 + * true if the TLB is valid. Can only be changed with the 32 + * mmu_lock held. 33 + */ 34 + bool valid; 35 + }; 19 36 20 37 /* 21 38 * Ratio of live shadow S2 MMU per vcpu. This is a trade-off between ··· 47 28 { 48 29 kvm->arch.nested_mmus = NULL; 49 30 kvm->arch.nested_mmus_size = 0; 31 + atomic_set(&kvm->arch.vncr_map_count, 0); 50 32 } 51 33 52 34 static int init_nested_s2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu) ··· 74 54 if (test_bit(KVM_ARM_VCPU_HAS_EL2_E2H0, kvm->arch.vcpu_features) && 75 55 !cpus_have_final_cap(ARM64_HAS_HCR_NV1)) 76 56 return -EINVAL; 57 + 58 + if (!vcpu->arch.ctxt.vncr_array) 59 + vcpu->arch.ctxt.vncr_array = (u64 *)__get_free_page(GFP_KERNEL_ACCOUNT | 60 + __GFP_ZERO); 61 + 62 + if (!vcpu->arch.ctxt.vncr_array) 63 + return -ENOMEM; 77 64 78 65 /* 79 66 * Let's treat memory allocation failures as benign: If we fail to ··· 111 84 if (ret) { 112 85 for (int i = kvm->arch.nested_mmus_size; i < num_mmus; i++) 113 86 kvm_free_stage2_pgd(&kvm->arch.nested_mmus[i]); 87 + 88 + free_page((unsigned long)vcpu->arch.ctxt.vncr_array); 89 + vcpu->arch.ctxt.vncr_array = NULL; 114 90 115 91 return ret; 116 92 } ··· 435 405 return max_size; 436 406 } 437 407 408 + static u8 pgshift_level_to_ttl(u16 shift, u8 level) 409 + { 410 + u8 ttl; 411 + 412 + switch(shift) { 413 + case 12: 414 + ttl = TLBI_TTL_TG_4K; 415 + break; 416 + case 14: 417 + ttl = TLBI_TTL_TG_16K; 418 + break; 419 + case 16: 420 + ttl = TLBI_TTL_TG_64K; 421 + break; 422 + default: 423 + BUG(); 424 + } 
425 + 426 + ttl <<= 2; 427 + ttl |= level & 3; 428 + 429 + return ttl; 430 + } 431 + 438 432 /* 439 433 * Compute the equivalent of the TTL field by parsing the shadow PT. The 440 434 * granule size is extracted from the cached VTCR_EL2.TG0 while the level is ··· 730 676 void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) 731 677 { 732 678 /* 733 - * The vCPU kept its reference on the MMU after the last put, keep 734 - * rolling with it. 679 + * If the vCPU kept its reference on the MMU after the last put, 680 + * keep rolling with it. 735 681 */ 736 - if (vcpu->arch.hw_mmu) 737 - return; 738 - 739 682 if (is_hyp_ctxt(vcpu)) { 740 - vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; 683 + if (!vcpu->arch.hw_mmu) 684 + vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; 741 685 } else { 742 - write_lock(&vcpu->kvm->mmu_lock); 743 - vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu); 744 - write_unlock(&vcpu->kvm->mmu_lock); 686 + if (!vcpu->arch.hw_mmu) { 687 + scoped_guard(write_lock, &vcpu->kvm->mmu_lock) 688 + vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu); 689 + } 690 + 691 + if (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV) 692 + kvm_make_request(KVM_REQ_MAP_L1_VNCR_EL2, vcpu); 745 693 } 746 694 } 747 695 748 696 void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) 749 697 { 698 + /* Unconditionally drop the VNCR mapping if we have one */ 699 + if (host_data_test_flag(L1_VNCR_MAPPED)) { 700 + BUG_ON(vcpu->arch.vncr_tlb->cpu != smp_processor_id()); 701 + BUG_ON(is_hyp_ctxt(vcpu)); 702 + 703 + clear_fixmap(vncr_fixmap(vcpu->arch.vncr_tlb->cpu)); 704 + vcpu->arch.vncr_tlb->cpu = -1; 705 + host_data_clear_flag(L1_VNCR_MAPPED); 706 + atomic_dec(&vcpu->kvm->arch.vncr_map_count); 707 + } 708 + 750 709 /* 751 710 * Keep a reference on the associated stage-2 MMU if the vCPU is 752 711 * scheduling out and not in WFI emulation, suggesting it is likely to ··· 810 743 return kvm_inject_nested_sync(vcpu, esr_el2); 811 744 } 812 745 746 + static void invalidate_vncr(struct vncr_tlb *vt) 747 + { 748 + vt->valid = 
false; 749 + if (vt->cpu != -1) 750 + clear_fixmap(vncr_fixmap(vt->cpu)); 751 + } 752 + 753 + static void kvm_invalidate_vncr_ipa(struct kvm *kvm, u64 start, u64 end) 754 + { 755 + struct kvm_vcpu *vcpu; 756 + unsigned long i; 757 + 758 + lockdep_assert_held_write(&kvm->mmu_lock); 759 + 760 + if (!kvm_has_feat(kvm, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY)) 761 + return; 762 + 763 + kvm_for_each_vcpu(i, vcpu, kvm) { 764 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 765 + u64 ipa_start, ipa_end, ipa_size; 766 + 767 + /* 768 + * Careful here: We end up here from an MMU notifier, 769 + * and this can race against a vcpu not being onlined 770 + * yet, without the pseudo-TLB being allocated. 771 + * 772 + * Skip those, as they obviously don't participate in 773 + * the invalidation at this stage. 774 + */ 775 + if (!vt) 776 + continue; 777 + 778 + if (!vt->valid) 779 + continue; 780 + 781 + ipa_size = ttl_to_size(pgshift_level_to_ttl(vt->wi.pgshift, 782 + vt->wr.level)); 783 + ipa_start = vt->wr.pa & ~(ipa_size - 1); 784 + ipa_end = ipa_start + ipa_size; 785 + 786 + if (ipa_end <= start || ipa_start >= end) 787 + continue; 788 + 789 + invalidate_vncr(vt); 790 + } 791 + } 792 + 793 + struct s1e2_tlbi_scope { 794 + enum { 795 + TLBI_ALL, 796 + TLBI_VA, 797 + TLBI_VAA, 798 + TLBI_ASID, 799 + } type; 800 + 801 + u16 asid; 802 + u64 va; 803 + u64 size; 804 + }; 805 + 806 + static void invalidate_vncr_va(struct kvm *kvm, 807 + struct s1e2_tlbi_scope *scope) 808 + { 809 + struct kvm_vcpu *vcpu; 810 + unsigned long i; 811 + 812 + lockdep_assert_held_write(&kvm->mmu_lock); 813 + 814 + kvm_for_each_vcpu(i, vcpu, kvm) { 815 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 816 + u64 va_start, va_end, va_size; 817 + 818 + if (!vt->valid) 819 + continue; 820 + 821 + va_size = ttl_to_size(pgshift_level_to_ttl(vt->wi.pgshift, 822 + vt->wr.level)); 823 + va_start = vt->gva & ~(va_size - 1); 824 + va_end = va_start + va_size; 825 + 826 + switch (scope->type) { 827 + case TLBI_ALL: 828 + break; 829 +
830 + case TLBI_VA: 831 + if (va_end <= scope->va || 832 + va_start >= (scope->va + scope->size)) 833 + continue; 834 + if (vt->wr.nG && vt->wr.asid != scope->asid) 835 + continue; 836 + break; 837 + 838 + case TLBI_VAA: 839 + if (va_end <= scope->va || 840 + va_start >= (scope->va + scope->size)) 841 + continue; 842 + break; 843 + 844 + case TLBI_ASID: 845 + if (!vt->wr.nG || vt->wr.asid != scope->asid) 846 + continue; 847 + break; 848 + } 849 + 850 + invalidate_vncr(vt); 851 + } 852 + } 853 + 854 + static void compute_s1_tlbi_range(struct kvm_vcpu *vcpu, u32 inst, u64 val, 855 + struct s1e2_tlbi_scope *scope) 856 + { 857 + switch (inst) { 858 + case OP_TLBI_ALLE2: 859 + case OP_TLBI_ALLE2IS: 860 + case OP_TLBI_ALLE2OS: 861 + case OP_TLBI_VMALLE1: 862 + case OP_TLBI_VMALLE1IS: 863 + case OP_TLBI_VMALLE1OS: 864 + case OP_TLBI_ALLE2NXS: 865 + case OP_TLBI_ALLE2ISNXS: 866 + case OP_TLBI_ALLE2OSNXS: 867 + case OP_TLBI_VMALLE1NXS: 868 + case OP_TLBI_VMALLE1ISNXS: 869 + case OP_TLBI_VMALLE1OSNXS: 870 + scope->type = TLBI_ALL; 871 + break; 872 + case OP_TLBI_VAE2: 873 + case OP_TLBI_VAE2IS: 874 + case OP_TLBI_VAE2OS: 875 + case OP_TLBI_VAE1: 876 + case OP_TLBI_VAE1IS: 877 + case OP_TLBI_VAE1OS: 878 + case OP_TLBI_VAE2NXS: 879 + case OP_TLBI_VAE2ISNXS: 880 + case OP_TLBI_VAE2OSNXS: 881 + case OP_TLBI_VAE1NXS: 882 + case OP_TLBI_VAE1ISNXS: 883 + case OP_TLBI_VAE1OSNXS: 884 + case OP_TLBI_VALE2: 885 + case OP_TLBI_VALE2IS: 886 + case OP_TLBI_VALE2OS: 887 + case OP_TLBI_VALE1: 888 + case OP_TLBI_VALE1IS: 889 + case OP_TLBI_VALE1OS: 890 + case OP_TLBI_VALE2NXS: 891 + case OP_TLBI_VALE2ISNXS: 892 + case OP_TLBI_VALE2OSNXS: 893 + case OP_TLBI_VALE1NXS: 894 + case OP_TLBI_VALE1ISNXS: 895 + case OP_TLBI_VALE1OSNXS: 896 + scope->type = TLBI_VA; 897 + scope->size = ttl_to_size(FIELD_GET(TLBI_TTL_MASK, val)); 898 + if (!scope->size) 899 + scope->size = SZ_1G; 900 + scope->va = (val << 12) & ~(scope->size - 1); 901 + scope->asid = FIELD_GET(TLBIR_ASID_MASK, val); 902 + break; 903 + 
case OP_TLBI_ASIDE1: 904 + case OP_TLBI_ASIDE1IS: 905 + case OP_TLBI_ASIDE1OS: 906 + case OP_TLBI_ASIDE1NXS: 907 + case OP_TLBI_ASIDE1ISNXS: 908 + case OP_TLBI_ASIDE1OSNXS: 909 + scope->type = TLBI_ASID; 910 + scope->asid = FIELD_GET(TLBIR_ASID_MASK, val); 911 + break; 912 + case OP_TLBI_VAAE1: 913 + case OP_TLBI_VAAE1IS: 914 + case OP_TLBI_VAAE1OS: 915 + case OP_TLBI_VAAE1NXS: 916 + case OP_TLBI_VAAE1ISNXS: 917 + case OP_TLBI_VAAE1OSNXS: 918 + case OP_TLBI_VAALE1: 919 + case OP_TLBI_VAALE1IS: 920 + case OP_TLBI_VAALE1OS: 921 + case OP_TLBI_VAALE1NXS: 922 + case OP_TLBI_VAALE1ISNXS: 923 + case OP_TLBI_VAALE1OSNXS: 924 + scope->type = TLBI_VAA; 925 + scope->size = ttl_to_size(FIELD_GET(TLBI_TTL_MASK, val)); 926 + if (!scope->size) 927 + scope->size = SZ_1G; 928 + scope->va = (val << 12) & ~(scope->size - 1); 929 + break; 930 + case OP_TLBI_RVAE2: 931 + case OP_TLBI_RVAE2IS: 932 + case OP_TLBI_RVAE2OS: 933 + case OP_TLBI_RVAE1: 934 + case OP_TLBI_RVAE1IS: 935 + case OP_TLBI_RVAE1OS: 936 + case OP_TLBI_RVAE2NXS: 937 + case OP_TLBI_RVAE2ISNXS: 938 + case OP_TLBI_RVAE2OSNXS: 939 + case OP_TLBI_RVAE1NXS: 940 + case OP_TLBI_RVAE1ISNXS: 941 + case OP_TLBI_RVAE1OSNXS: 942 + case OP_TLBI_RVALE2: 943 + case OP_TLBI_RVALE2IS: 944 + case OP_TLBI_RVALE2OS: 945 + case OP_TLBI_RVALE1: 946 + case OP_TLBI_RVALE1IS: 947 + case OP_TLBI_RVALE1OS: 948 + case OP_TLBI_RVALE2NXS: 949 + case OP_TLBI_RVALE2ISNXS: 950 + case OP_TLBI_RVALE2OSNXS: 951 + case OP_TLBI_RVALE1NXS: 952 + case OP_TLBI_RVALE1ISNXS: 953 + case OP_TLBI_RVALE1OSNXS: 954 + scope->type = TLBI_VA; 955 + scope->va = decode_range_tlbi(val, &scope->size, &scope->asid); 956 + break; 957 + case OP_TLBI_RVAAE1: 958 + case OP_TLBI_RVAAE1IS: 959 + case OP_TLBI_RVAAE1OS: 960 + case OP_TLBI_RVAAE1NXS: 961 + case OP_TLBI_RVAAE1ISNXS: 962 + case OP_TLBI_RVAAE1OSNXS: 963 + case OP_TLBI_RVAALE1: 964 + case OP_TLBI_RVAALE1IS: 965 + case OP_TLBI_RVAALE1OS: 966 + case OP_TLBI_RVAALE1NXS: 967 + case OP_TLBI_RVAALE1ISNXS: 968 + case 
OP_TLBI_RVAALE1OSNXS: 969 + scope->type = TLBI_VAA; 970 + scope->va = decode_range_tlbi(val, &scope->size, NULL); 971 + break; 972 + } 973 + } 974 + 975 + void kvm_handle_s1e2_tlbi(struct kvm_vcpu *vcpu, u32 inst, u64 val) 976 + { 977 + struct s1e2_tlbi_scope scope = {}; 978 + 979 + compute_s1_tlbi_range(vcpu, inst, val, &scope); 980 + 981 + guard(write_lock)(&vcpu->kvm->mmu_lock); 982 + invalidate_vncr_va(vcpu->kvm, &scope); 983 + } 984 + 813 985 void kvm_nested_s2_wp(struct kvm *kvm) 814 986 { 815 987 int i; ··· 1061 755 if (kvm_s2_mmu_valid(mmu)) 1062 756 kvm_stage2_wp_range(mmu, 0, kvm_phys_size(mmu)); 1063 757 } 758 + 759 + kvm_invalidate_vncr_ipa(kvm, 0, BIT(kvm->arch.mmu.pgt->ia_bits)); 1064 760 } 1065 761 1066 762 void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block) ··· 1077 769 if (kvm_s2_mmu_valid(mmu)) 1078 770 kvm_stage2_unmap_range(mmu, 0, kvm_phys_size(mmu), may_block); 1079 771 } 772 + 773 + kvm_invalidate_vncr_ipa(kvm, 0, BIT(kvm->arch.mmu.pgt->ia_bits)); 1080 774 } 1081 775 1082 776 void kvm_nested_s2_flush(struct kvm *kvm) ··· 1109 799 kvm->arch.nested_mmus = NULL; 1110 800 kvm->arch.nested_mmus_size = 0; 1111 801 kvm_uninit_stage2_mmu(kvm); 802 + } 803 + 804 + /* 805 + * Dealing with VNCR_EL2 exposed by the *guest* is a complicated matter: 806 + * 807 + * - We introduce an internal representation of a vcpu-private TLB, 808 + * representing the mapping between the guest VA contained in VNCR_EL2, 809 + * the IPA the guest's EL2 PTs point to, and the actual PA this lives at. 810 + * 811 + * - On translation fault from a nested VNCR access, we create such a TLB. 812 + * If there is no mapping to describe, the guest inherits the fault. 813 + * Crucially, no actual mapping is done at this stage. 814 + * 815 + * - On vcpu_load() in a non-HYP context with HCR_EL2.NV==1, if the above 816 + * TLB exists, we map it in the fixmap for this CPU, and run with it. 
We 817 + * have to respect the permissions dictated by the guest, but not the 818 + * memory type (FWB is a must). 819 + * 820 + * - Note that we usually don't do a vcpu_load() on the back of a fault 821 + * (unless we are preempted), so the resolution of a translation fault 822 + * must go via a request that will map the VNCR page in the fixmap. 823 + * vcpu_load() might as well use the same mechanism. 824 + * 825 + * - On vcpu_put() in a non-HYP context with HCR_EL2.NV==1, if the TLB was 826 + * mapped, we unmap it. Yes it is that simple. The TLB still exists 827 + * though, and may be reused at a later load. 828 + * 829 + * - On permission fault, we simply forward the fault to the guest's EL2. 830 + * Get out of my way. 831 + * 832 + * - On any TLBI for the EL2&0 translation regime, we must find any TLB that 833 + * intersects with the TLBI request, invalidate it, and unmap the page 834 + * from the fixmap. Because we need to look at all the vcpu-private TLBs, 835 + * this requires some wide-ranging locking to ensure that nothing races 836 + * against it. This may require some refcounting to avoid the search when 837 + * no such TLB is present. 838 + * 839 + * - On MMU notifiers, we must invalidate our TLB in a similar way, but 840 + * looking at the IPA instead. The funny part is that there may not be a 841 + * stage-2 mapping for this page if L1 hasn't accessed it using LD/ST 842 + * instructions. 
843 + */ 844 + 845 + int kvm_vcpu_allocate_vncr_tlb(struct kvm_vcpu *vcpu) 846 + { 847 + if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY)) 848 + return 0; 849 + 850 + vcpu->arch.vncr_tlb = kzalloc(sizeof(*vcpu->arch.vncr_tlb), 851 + GFP_KERNEL_ACCOUNT); 852 + if (!vcpu->arch.vncr_tlb) 853 + return -ENOMEM; 854 + 855 + return 0; 856 + } 857 + 858 + static u64 read_vncr_el2(struct kvm_vcpu *vcpu) 859 + { 860 + return (u64)sign_extend64(__vcpu_sys_reg(vcpu, VNCR_EL2), 48); 861 + } 862 + 863 + static int kvm_translate_vncr(struct kvm_vcpu *vcpu) 864 + { 865 + bool write_fault, writable; 866 + unsigned long mmu_seq; 867 + struct vncr_tlb *vt; 868 + struct page *page; 869 + u64 va, pfn, gfn; 870 + int ret; 871 + 872 + vt = vcpu->arch.vncr_tlb; 873 + 874 + /* 875 + * If we're about to walk the EL2 S1 PTs, we must invalidate the 876 + * current TLB, as it could be sampled from another vcpu doing a 877 + * TLBI *IS. A real CPU wouldn't do that, but we only keep a single 878 + * translation, so not much of a choice. 879 + * 880 + * We also prepare the next walk whilst we're at it.
881 + */ 882 + scoped_guard(write_lock, &vcpu->kvm->mmu_lock) { 883 + invalidate_vncr(vt); 884 + 885 + vt->wi = (struct s1_walk_info) { 886 + .regime = TR_EL20, 887 + .as_el0 = false, 888 + .pan = false, 889 + }; 890 + vt->wr = (struct s1_walk_result){}; 891 + } 892 + 893 + guard(srcu)(&vcpu->kvm->srcu); 894 + 895 + va = read_vncr_el2(vcpu); 896 + 897 + ret = __kvm_translate_va(vcpu, &vt->wi, &vt->wr, va); 898 + if (ret) 899 + return ret; 900 + 901 + write_fault = kvm_is_write_fault(vcpu); 902 + 903 + mmu_seq = vcpu->kvm->mmu_invalidate_seq; 904 + smp_rmb(); 905 + 906 + gfn = vt->wr.pa >> PAGE_SHIFT; 907 + pfn = kvm_faultin_pfn(vcpu, gfn, write_fault, &writable, &page); 908 + if (is_error_noslot_pfn(pfn) || (write_fault && !writable)) 909 + return -EFAULT; 910 + 911 + scoped_guard(write_lock, &vcpu->kvm->mmu_lock) { 912 + if (mmu_invalidate_retry(vcpu->kvm, mmu_seq)) 913 + return -EAGAIN; 914 + 915 + vt->gva = va; 916 + vt->hpa = pfn << PAGE_SHIFT; 917 + vt->valid = true; 918 + vt->cpu = -1; 919 + 920 + kvm_make_request(KVM_REQ_MAP_L1_VNCR_EL2, vcpu); 921 + kvm_release_faultin_page(vcpu->kvm, page, false, vt->wr.pw); 922 + } 923 + 924 + if (vt->wr.pw) 925 + mark_page_dirty(vcpu->kvm, gfn); 926 + 927 + return 0; 928 + } 929 + 930 + static void inject_vncr_perm(struct kvm_vcpu *vcpu) 931 + { 932 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 933 + u64 esr = kvm_vcpu_get_esr(vcpu); 934 + 935 + /* Adjust the fault level to reflect that of the guest's */ 936 + esr &= ~ESR_ELx_FSC; 937 + esr |= FIELD_PREP(ESR_ELx_FSC, 938 + ESR_ELx_FSC_PERM_L(vt->wr.level)); 939 + 940 + kvm_inject_nested_sync(vcpu, esr); 941 + } 942 + 943 + static bool kvm_vncr_tlb_lookup(struct kvm_vcpu *vcpu) 944 + { 945 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 946 + 947 + lockdep_assert_held_read(&vcpu->kvm->mmu_lock); 948 + 949 + if (!vt->valid) 950 + return false; 951 + 952 + if (read_vncr_el2(vcpu) != vt->gva) 953 + return false; 954 + 955 + if (vt->wr.nG) { 956 + u64 tcr = 
vcpu_read_sys_reg(vcpu, TCR_EL2); 957 + u64 ttbr = ((tcr & TCR_A1) ? 958 + vcpu_read_sys_reg(vcpu, TTBR1_EL2) : 959 + vcpu_read_sys_reg(vcpu, TTBR0_EL2)); 960 + u16 asid; 961 + 962 + asid = FIELD_GET(TTBR_ASID_MASK, ttbr); 963 + if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || 964 + !(tcr & TCR_ASID16)) 965 + asid &= GENMASK(7, 0); 966 + 967 + return asid == vt->wr.asid; 968 + } 969 + 970 + return true; 971 + } 972 + 973 + int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu) 974 + { 975 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 976 + u64 esr = kvm_vcpu_get_esr(vcpu); 977 + 978 + BUG_ON(!(esr & ESR_ELx_VNCR)); 979 + 980 + if (esr_fsc_is_permission_fault(esr)) { 981 + inject_vncr_perm(vcpu); 982 + } else if (esr_fsc_is_translation_fault(esr)) { 983 + bool valid; 984 + int ret; 985 + 986 + scoped_guard(read_lock, &vcpu->kvm->mmu_lock) 987 + valid = kvm_vncr_tlb_lookup(vcpu); 988 + 989 + if (!valid) 990 + ret = kvm_translate_vncr(vcpu); 991 + else 992 + ret = -EPERM; 993 + 994 + switch (ret) { 995 + case -EAGAIN: 996 + case -ENOMEM: 997 + /* Let's try again... */ 998 + break; 999 + case -EFAULT: 1000 + case -EINVAL: 1001 + case -ENOENT: 1002 + case -EACCES: 1003 + /* 1004 + * Translation failed, inject the corresponding 1005 + * exception back to EL2.
1006 + */ 1007 + BUG_ON(!vt->wr.failed); 1008 + 1009 + esr &= ~ESR_ELx_FSC; 1010 + esr |= FIELD_PREP(ESR_ELx_FSC, vt->wr.fst); 1011 + 1012 + kvm_inject_nested_sync(vcpu, esr); 1013 + break; 1014 + case -EPERM: 1015 + /* Hack to deal with POE until we get kernel support */ 1016 + inject_vncr_perm(vcpu); 1017 + break; 1018 + case 0: 1019 + break; 1020 + } 1021 + } else { 1022 + WARN_ONCE(1, "Unhandled VNCR abort, ESR=%llx\n", esr); 1023 + } 1024 + 1025 + return 1; 1026 + } 1027 + 1028 + static void kvm_map_l1_vncr(struct kvm_vcpu *vcpu) 1029 + { 1030 + struct vncr_tlb *vt = vcpu->arch.vncr_tlb; 1031 + pgprot_t prot; 1032 + 1033 + guard(preempt)(); 1034 + guard(read_lock)(&vcpu->kvm->mmu_lock); 1035 + 1036 + /* 1037 + * The request to map VNCR may have raced against some other 1038 + * event, such as an interrupt, and may not be valid anymore. 1039 + */ 1040 + if (is_hyp_ctxt(vcpu)) 1041 + return; 1042 + 1043 + /* 1044 + * Check that the pseudo-TLB is valid and that VNCR_EL2 still 1045 + * contains the expected value. If it doesn't, we simply bail out 1046 + * without a mapping -- a transformed MSR/MRS will generate the 1047 + * fault and allows us to populate the pseudo-TLB. 1048 + */ 1049 + if (!vt->valid) 1050 + return; 1051 + 1052 + if (read_vncr_el2(vcpu) != vt->gva) 1053 + return; 1054 + 1055 + if (vt->wr.nG) { 1056 + u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); 1057 + u64 ttbr = ((tcr & TCR_A1) ? 
1058 + vcpu_read_sys_reg(vcpu, TTBR1_EL2) : 1059 + vcpu_read_sys_reg(vcpu, TTBR0_EL2)); 1060 + u16 asid; 1061 + 1062 + asid = FIELD_GET(TTBR_ASID_MASK, ttbr); 1063 + if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || 1064 + !(tcr & TCR_ASID16)) 1065 + asid &= GENMASK(7, 0); 1066 + 1067 + if (asid != vt->wr.asid) 1068 + return; 1069 + } 1070 + 1071 + vt->cpu = smp_processor_id(); 1072 + 1073 + if (vt->wr.pw && vt->wr.pr) 1074 + prot = PAGE_KERNEL; 1075 + else if (vt->wr.pr) 1076 + prot = PAGE_KERNEL_RO; 1077 + else 1078 + prot = PAGE_NONE; 1079 + 1080 + /* 1081 + * We can't map write-only (or no permission at all) in the kernel, 1082 + * but the guest can do it if using POE, so we'll have to turn a 1083 + * translation fault into a permission fault at runtime. 1084 + * FIXME: WO doesn't work at all, need POE support in the kernel. 1085 + */ 1086 + if (pgprot_val(prot) != pgprot_val(PAGE_NONE)) { 1087 + __set_fixmap(vncr_fixmap(vt->cpu), vt->hpa, prot); 1088 + host_data_set_flag(L1_VNCR_MAPPED); 1089 + atomic_inc(&vcpu->kvm->arch.vncr_map_count); 1090 + } 1112 1091 } 1113 1092 1114 1093 /* ··· 1617 1018 set_sysreg_masks(kvm, VMPIDR_EL2, res0, res1); 1618 1019 1619 1020 /* HCR_EL2 */ 1620 - res0 = BIT(48); 1621 - res1 = HCR_RW; 1622 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, TWED, IMP)) 1623 - res0 |= GENMASK(63, 59); 1624 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, MTE, MTE2)) 1625 - res0 |= (HCR_TID5 | HCR_DCT | HCR_ATA); 1626 - if (!kvm_has_feat(kvm, ID_AA64MMFR2_EL1, EVT, TTLBxS)) 1627 - res0 |= (HCR_TTLBIS | HCR_TTLBOS); 1628 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, CSV2, CSV2_2) && 1629 - !kvm_has_feat(kvm, ID_AA64PFR1_EL1, CSV2_frac, CSV2_1p2)) 1630 - res0 |= HCR_ENSCXT; 1631 - if (!kvm_has_feat(kvm, ID_AA64MMFR2_EL1, EVT, IMP)) 1632 - res0 |= (HCR_TOCU | HCR_TICAB | HCR_TID4); 1633 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, AMU, V1P1)) 1634 - res0 |= HCR_AMVOFFEN; 1635 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, RAS, V1P1)) 1636 - res0 |= 
HCR_FIEN; 1637 - if (!kvm_has_feat(kvm, ID_AA64MMFR2_EL1, FWB, IMP)) 1638 - res0 |= HCR_FWB; 1639 - /* Implementation choice: NV2 is the only supported config */ 1640 - if (!kvm_has_feat(kvm, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY)) 1641 - res0 |= (HCR_NV2 | HCR_NV | HCR_AT); 1642 - if (!kvm_has_feat(kvm, ID_AA64MMFR4_EL1, E2H0, NI)) 1643 - res0 |= HCR_NV1; 1644 - if (!(kvm_vcpu_has_feature(kvm, KVM_ARM_VCPU_PTRAUTH_ADDRESS) && 1645 - kvm_vcpu_has_feature(kvm, KVM_ARM_VCPU_PTRAUTH_GENERIC))) 1646 - res0 |= (HCR_API | HCR_APK); 1647 - if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TME, IMP)) 1648 - res0 |= BIT(39); 1649 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, RAS, IMP)) 1650 - res0 |= (HCR_TEA | HCR_TERR); 1651 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, LO, IMP)) 1652 - res0 |= HCR_TLOR; 1653 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, VH, IMP)) 1654 - res0 |= HCR_E2H; 1655 - if (!kvm_has_feat(kvm, ID_AA64MMFR4_EL1, E2H0, IMP)) 1656 - res1 |= HCR_E2H; 1021 + get_reg_fixed_bits(kvm, HCR_EL2, &res0, &res1); 1657 1022 set_sysreg_masks(kvm, HCR_EL2, res0, res1); 1658 1023 1659 1024 /* HCRX_EL2 */ 1660 - res0 = HCRX_EL2_RES0; 1661 - res1 = HCRX_EL2_RES1; 1662 - if (!kvm_has_feat(kvm, ID_AA64ISAR3_EL1, PACM, TRIVIAL_IMP)) 1663 - res0 |= HCRX_EL2_PACMEn; 1664 - if (!kvm_has_feat(kvm, ID_AA64PFR2_EL1, FPMR, IMP)) 1665 - res0 |= HCRX_EL2_EnFPM; 1666 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, GCS, IMP)) 1667 - res0 |= HCRX_EL2_GCSEn; 1668 - if (!kvm_has_feat(kvm, ID_AA64ISAR2_EL1, SYSREG_128, IMP)) 1669 - res0 |= HCRX_EL2_EnIDCP128; 1670 - if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, ADERR, DEV_ASYNC)) 1671 - res0 |= (HCRX_EL2_EnSDERR | HCRX_EL2_EnSNERR); 1672 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, DF2, IMP)) 1673 - res0 |= HCRX_EL2_TMEA; 1674 - if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, D128, IMP)) 1675 - res0 |= HCRX_EL2_D128En; 1676 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, THE, IMP)) 1677 - res0 |= HCRX_EL2_PTTWI; 1678 - if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, SCTLRX, IMP)) 1679 - 
res0 |= HCRX_EL2_SCTLR2En; 1680 - if (!kvm_has_tcr2(kvm)) 1681 - res0 |= HCRX_EL2_TCR2En; 1682 - if (!kvm_has_feat(kvm, ID_AA64ISAR2_EL1, MOPS, IMP)) 1683 - res0 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2); 1684 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, CMOW, IMP)) 1685 - res0 |= HCRX_EL2_CMOW; 1686 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, NMI, IMP)) 1687 - res0 |= (HCRX_EL2_VFNMI | HCRX_EL2_VINMI | HCRX_EL2_TALLINT); 1688 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, SME, IMP) || 1689 - !(read_sysreg_s(SYS_SMIDR_EL1) & SMIDR_EL1_SMPS)) 1690 - res0 |= HCRX_EL2_SMPME; 1691 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, XS, IMP)) 1692 - res0 |= (HCRX_EL2_FGTnXS | HCRX_EL2_FnXS); 1693 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64_V)) 1694 - res0 |= HCRX_EL2_EnASR; 1695 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64)) 1696 - res0 |= HCRX_EL2_EnALS; 1697 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64_ACCDATA)) 1698 - res0 |= HCRX_EL2_EnAS0; 1025 + get_reg_fixed_bits(kvm, HCRX_EL2, &res0, &res1); 1699 1026 set_sysreg_masks(kvm, HCRX_EL2, res0, res1); 1700 1027 1701 1028 /* HFG[RW]TR_EL2 */ 1702 - res0 = res1 = 0; 1703 - if (!(kvm_vcpu_has_feature(kvm, KVM_ARM_VCPU_PTRAUTH_ADDRESS) && 1704 - kvm_vcpu_has_feature(kvm, KVM_ARM_VCPU_PTRAUTH_GENERIC))) 1705 - res0 |= (HFGxTR_EL2_APDAKey | HFGxTR_EL2_APDBKey | 1706 - HFGxTR_EL2_APGAKey | HFGxTR_EL2_APIAKey | 1707 - HFGxTR_EL2_APIBKey); 1708 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, LO, IMP)) 1709 - res0 |= (HFGxTR_EL2_LORC_EL1 | HFGxTR_EL2_LOREA_EL1 | 1710 - HFGxTR_EL2_LORID_EL1 | HFGxTR_EL2_LORN_EL1 | 1711 - HFGxTR_EL2_LORSA_EL1); 1712 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, CSV2, CSV2_2) && 1713 - !kvm_has_feat(kvm, ID_AA64PFR1_EL1, CSV2_frac, CSV2_1p2)) 1714 - res0 |= (HFGxTR_EL2_SCXTNUM_EL1 | HFGxTR_EL2_SCXTNUM_EL0); 1715 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, GIC, IMP)) 1716 - res0 |= HFGxTR_EL2_ICC_IGRPENn_EL1; 1717 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, RAS, IMP)) 1718 - res0 |= (HFGxTR_EL2_ERRIDR_EL1 | 
HFGxTR_EL2_ERRSELR_EL1 | 1719 - HFGxTR_EL2_ERXFR_EL1 | HFGxTR_EL2_ERXCTLR_EL1 | 1720 - HFGxTR_EL2_ERXSTATUS_EL1 | HFGxTR_EL2_ERXMISCn_EL1 | 1721 - HFGxTR_EL2_ERXPFGF_EL1 | HFGxTR_EL2_ERXPFGCTL_EL1 | 1722 - HFGxTR_EL2_ERXPFGCDN_EL1 | HFGxTR_EL2_ERXADDR_EL1); 1723 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, LS64, LS64_ACCDATA)) 1724 - res0 |= HFGxTR_EL2_nACCDATA_EL1; 1725 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, GCS, IMP)) 1726 - res0 |= (HFGxTR_EL2_nGCS_EL0 | HFGxTR_EL2_nGCS_EL1); 1727 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, SME, IMP)) 1728 - res0 |= (HFGxTR_EL2_nSMPRI_EL1 | HFGxTR_EL2_nTPIDR2_EL0); 1729 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, THE, IMP)) 1730 - res0 |= HFGxTR_EL2_nRCWMASK_EL1; 1731 - if (!kvm_has_s1pie(kvm)) 1732 - res0 |= (HFGxTR_EL2_nPIRE0_EL1 | HFGxTR_EL2_nPIR_EL1); 1733 - if (!kvm_has_s1poe(kvm)) 1734 - res0 |= (HFGxTR_EL2_nPOR_EL0 | HFGxTR_EL2_nPOR_EL1); 1735 - if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, S2POE, IMP)) 1736 - res0 |= HFGxTR_EL2_nS2POR_EL1; 1737 - if (!kvm_has_feat(kvm, ID_AA64MMFR3_EL1, AIE, IMP)) 1738 - res0 |= (HFGxTR_EL2_nMAIR2_EL1 | HFGxTR_EL2_nAMAIR2_EL1); 1739 - set_sysreg_masks(kvm, HFGRTR_EL2, res0 | __HFGRTR_EL2_RES0, res1); 1740 - set_sysreg_masks(kvm, HFGWTR_EL2, res0 | __HFGWTR_EL2_RES0, res1); 1029 + get_reg_fixed_bits(kvm, HFGRTR_EL2, &res0, &res1); 1030 + set_sysreg_masks(kvm, HFGRTR_EL2, res0, res1); 1031 + get_reg_fixed_bits(kvm, HFGWTR_EL2, &res0, &res1); 1032 + set_sysreg_masks(kvm, HFGWTR_EL2, res0, res1); 1741 1033 1742 1034 /* HDFG[RW]TR_EL2 */ 1743 - res0 = res1 = 0; 1744 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, DoubleLock, IMP)) 1745 - res0 |= HDFGRTR_EL2_OSDLR_EL1; 1746 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, IMP)) 1747 - res0 |= (HDFGRTR_EL2_PMEVCNTRn_EL0 | HDFGRTR_EL2_PMEVTYPERn_EL0 | 1748 - HDFGRTR_EL2_PMCCFILTR_EL0 | HDFGRTR_EL2_PMCCNTR_EL0 | 1749 - HDFGRTR_EL2_PMCNTEN | HDFGRTR_EL2_PMINTEN | 1750 - HDFGRTR_EL2_PMOVS | HDFGRTR_EL2_PMSELR_EL0 | 1751 - HDFGRTR_EL2_PMMIR_EL1 | 
HDFGRTR_EL2_PMUSERENR_EL0 | 1752 - HDFGRTR_EL2_PMCEIDn_EL0); 1753 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSVer, IMP)) 1754 - res0 |= (HDFGRTR_EL2_PMBLIMITR_EL1 | HDFGRTR_EL2_PMBPTR_EL1 | 1755 - HDFGRTR_EL2_PMBSR_EL1 | HDFGRTR_EL2_PMSCR_EL1 | 1756 - HDFGRTR_EL2_PMSEVFR_EL1 | HDFGRTR_EL2_PMSFCR_EL1 | 1757 - HDFGRTR_EL2_PMSICR_EL1 | HDFGRTR_EL2_PMSIDR_EL1 | 1758 - HDFGRTR_EL2_PMSIRR_EL1 | HDFGRTR_EL2_PMSLATFR_EL1 | 1759 - HDFGRTR_EL2_PMBIDR_EL1); 1760 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceVer, IMP)) 1761 - res0 |= (HDFGRTR_EL2_TRC | HDFGRTR_EL2_TRCAUTHSTATUS | 1762 - HDFGRTR_EL2_TRCAUXCTLR | HDFGRTR_EL2_TRCCLAIM | 1763 - HDFGRTR_EL2_TRCCNTVRn | HDFGRTR_EL2_TRCID | 1764 - HDFGRTR_EL2_TRCIMSPECn | HDFGRTR_EL2_TRCOSLSR | 1765 - HDFGRTR_EL2_TRCPRGCTLR | HDFGRTR_EL2_TRCSEQSTR | 1766 - HDFGRTR_EL2_TRCSSCSRn | HDFGRTR_EL2_TRCSTATR | 1767 - HDFGRTR_EL2_TRCVICTLR); 1768 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceBuffer, IMP)) 1769 - res0 |= (HDFGRTR_EL2_TRBBASER_EL1 | HDFGRTR_EL2_TRBIDR_EL1 | 1770 - HDFGRTR_EL2_TRBLIMITR_EL1 | HDFGRTR_EL2_TRBMAR_EL1 | 1771 - HDFGRTR_EL2_TRBPTR_EL1 | HDFGRTR_EL2_TRBSR_EL1 | 1772 - HDFGRTR_EL2_TRBTRG_EL1); 1773 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, BRBE, IMP)) 1774 - res0 |= (HDFGRTR_EL2_nBRBIDR | HDFGRTR_EL2_nBRBCTL | 1775 - HDFGRTR_EL2_nBRBDATA); 1776 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMSVer, V1P2)) 1777 - res0 |= HDFGRTR_EL2_nPMSNEVFR_EL1; 1778 - set_sysreg_masks(kvm, HDFGRTR_EL2, res0 | HDFGRTR_EL2_RES0, res1); 1779 - 1780 - /* Reuse the bits from the read-side and add the write-specific stuff */ 1781 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, PMUVer, IMP)) 1782 - res0 |= (HDFGWTR_EL2_PMCR_EL0 | HDFGWTR_EL2_PMSWINC_EL0); 1783 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceVer, IMP)) 1784 - res0 |= HDFGWTR_EL2_TRCOSLAR; 1785 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceFilt, IMP)) 1786 - res0 |= HDFGWTR_EL2_TRFCR_EL1; 1787 - set_sysreg_masks(kvm, HFGWTR_EL2, res0 | HDFGWTR_EL2_RES0, res1); 1035 + 
get_reg_fixed_bits(kvm, HDFGRTR_EL2, &res0, &res1); 1036 + set_sysreg_masks(kvm, HDFGRTR_EL2, res0, res1); 1037 + get_reg_fixed_bits(kvm, HDFGWTR_EL2, &res0, &res1); 1038 + set_sysreg_masks(kvm, HDFGWTR_EL2, res0, res1); 1788 1039 1789 1040 /* HFGITR_EL2 */ 1790 - res0 = HFGITR_EL2_RES0; 1791 - res1 = HFGITR_EL2_RES1; 1792 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, DPB, DPB2)) 1793 - res0 |= HFGITR_EL2_DCCVADP; 1794 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, PAN, PAN2)) 1795 - res0 |= (HFGITR_EL2_ATS1E1RP | HFGITR_EL2_ATS1E1WP); 1796 - if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS)) 1797 - res0 |= (HFGITR_EL2_TLBIRVAALE1OS | HFGITR_EL2_TLBIRVALE1OS | 1798 - HFGITR_EL2_TLBIRVAAE1OS | HFGITR_EL2_TLBIRVAE1OS | 1799 - HFGITR_EL2_TLBIVAALE1OS | HFGITR_EL2_TLBIVALE1OS | 1800 - HFGITR_EL2_TLBIVAAE1OS | HFGITR_EL2_TLBIASIDE1OS | 1801 - HFGITR_EL2_TLBIVAE1OS | HFGITR_EL2_TLBIVMALLE1OS); 1802 - if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE)) 1803 - res0 |= (HFGITR_EL2_TLBIRVAALE1 | HFGITR_EL2_TLBIRVALE1 | 1804 - HFGITR_EL2_TLBIRVAAE1 | HFGITR_EL2_TLBIRVAE1 | 1805 - HFGITR_EL2_TLBIRVAALE1IS | HFGITR_EL2_TLBIRVALE1IS | 1806 - HFGITR_EL2_TLBIRVAAE1IS | HFGITR_EL2_TLBIRVAE1IS | 1807 - HFGITR_EL2_TLBIRVAALE1OS | HFGITR_EL2_TLBIRVALE1OS | 1808 - HFGITR_EL2_TLBIRVAAE1OS | HFGITR_EL2_TLBIRVAE1OS); 1809 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, SPECRES, IMP)) 1810 - res0 |= (HFGITR_EL2_CFPRCTX | HFGITR_EL2_DVPRCTX | 1811 - HFGITR_EL2_CPPRCTX); 1812 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, BRBE, IMP)) 1813 - res0 |= (HFGITR_EL2_nBRBINJ | HFGITR_EL2_nBRBIALL); 1814 - if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, GCS, IMP)) 1815 - res0 |= (HFGITR_EL2_nGCSPUSHM_EL1 | HFGITR_EL2_nGCSSTR_EL1 | 1816 - HFGITR_EL2_nGCSEPP); 1817 - if (!kvm_has_feat(kvm, ID_AA64ISAR1_EL1, SPECRES, COSP_RCTX)) 1818 - res0 |= HFGITR_EL2_COSPRCTX; 1819 - if (!kvm_has_feat(kvm, ID_AA64ISAR2_EL1, ATS1A, IMP)) 1820 - res0 |= HFGITR_EL2_ATS1E1A; 1041 + get_reg_fixed_bits(kvm, HFGITR_EL2, &res0, &res1); 1821 1042 
set_sysreg_masks(kvm, HFGITR_EL2, res0, res1); 1822 1043 1823 1044 /* HAFGRTR_EL2 - not a lot to see here */ 1824 - res0 = HAFGRTR_EL2_RES0; 1825 - res1 = HAFGRTR_EL2_RES1; 1826 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, AMU, V1P1)) 1827 - res0 |= ~(res0 | res1); 1045 + get_reg_fixed_bits(kvm, HAFGRTR_EL2, &res0, &res1); 1828 1046 set_sysreg_masks(kvm, HAFGRTR_EL2, res0, res1); 1047 + 1048 + /* HFG[RW]TR2_EL2 */ 1049 + get_reg_fixed_bits(kvm, HFGRTR2_EL2, &res0, &res1); 1050 + set_sysreg_masks(kvm, HFGRTR2_EL2, res0, res1); 1051 + get_reg_fixed_bits(kvm, HFGWTR2_EL2, &res0, &res1); 1052 + set_sysreg_masks(kvm, HFGWTR2_EL2, res0, res1); 1053 + 1054 + /* HDFG[RW]TR2_EL2 */ 1055 + get_reg_fixed_bits(kvm, HDFGRTR2_EL2, &res0, &res1); 1056 + set_sysreg_masks(kvm, HDFGRTR2_EL2, res0, res1); 1057 + get_reg_fixed_bits(kvm, HDFGWTR2_EL2, &res0, &res1); 1058 + set_sysreg_masks(kvm, HDFGWTR2_EL2, res0, res1); 1059 + 1060 + /* HFGITR2_EL2 */ 1061 + get_reg_fixed_bits(kvm, HFGITR2_EL2, &res0, &res1); 1062 + set_sysreg_masks(kvm, HFGITR2_EL2, res0, res1); 1829 1063 1830 1064 /* TCR2_EL2 */ 1831 1065 res0 = TCR2_EL2_RES0; ··· 1750 1318 res0 |= ICH_HCR_EL2_DVIM | ICH_HCR_EL2_vSGIEOICount; 1751 1319 set_sysreg_masks(kvm, ICH_HCR_EL2, res0, res1); 1752 1320 1321 + /* VNCR_EL2 */ 1322 + set_sysreg_masks(kvm, VNCR_EL2, VNCR_EL2_RES0, VNCR_EL2_RES1); 1323 + 1753 1324 out: 1754 1325 for (enum vcpu_sysreg sr = __SANITISED_REG_START__; sr < NR_SYS_REGS; sr++) 1755 1326 (void)__vcpu_sys_reg(vcpu, sr); ··· 1772 1337 } 1773 1338 write_unlock(&vcpu->kvm->mmu_lock); 1774 1339 } 1340 + 1341 + if (kvm_check_request(KVM_REQ_MAP_L1_VNCR_EL2, vcpu)) 1342 + kvm_map_l1_vncr(vcpu); 1775 1343 1776 1344 /* Must be last, as may switch context! */ 1777 1345 if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))

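Note on the nested.c hunks above: both the MMU-notifier path and the TLBI path drop a vCPU's VNCR pseudo-TLB when the block it caches intersects the range being invalidated. The following is a minimal standalone sketch of that overlap test, assuming a power-of-two block size and a half-open [start, end) range. The struct and function names (cached_xlate, block_start, xlate_overlaps) are hypothetical stand-ins for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a single cached translation (not struct vncr_tlb). */
struct cached_xlate {
	uint64_t addr;	/* any address inside the mapped block */
	uint64_t size;	/* block size in bytes, assumed to be a power of two */
	bool valid;
};

/* Round addr down to the start of its naturally aligned block. */
static uint64_t block_start(uint64_t addr, uint64_t size)
{
	/* The mask trick below only works for power-of-two sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* True if the cached block intersects the half-open range [start, end). */
static bool xlate_overlaps(const struct cached_xlate *x,
			   uint64_t start, uint64_t end)
{
	uint64_t blk_start, blk_end;

	if (!x->valid)
		return false;

	blk_start = block_start(x->addr, x->size);
	blk_end = blk_start + x->size;

	/* Disjoint when the block ends before the range or starts after it. */
	return !(blk_end <= start || blk_start >= end);
}
```

An entry for which xlate_overlaps() returns true would be the one to invalidate. Masking with ~(size - 1) yields the block base; masking with (size - 1) would instead yield the offset within the block.
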
+71 -93

arch/arm64/kvm/pkvm.c

··· 5 5 */ 6 6 7 7 #include <linux/init.h> 8 + #include <linux/interval_tree_generic.h> 8 9 #include <linux/kmemleak.h> 9 10 #include <linux/kvm_host.h> 10 11 #include <asm/kvm_mmu.h> 11 12 #include <linux/memblock.h> 12 13 #include <linux/mutex.h> 13 - #include <linux/sort.h> 14 14 15 15 #include <asm/kvm_pkvm.h> 16 16 ··· 24 24 phys_addr_t hyp_mem_base; 25 25 phys_addr_t hyp_mem_size; 26 26 27 - static int cmp_hyp_memblock(const void *p1, const void *p2) 28 - { 29 - const struct memblock_region *r1 = p1; 30 - const struct memblock_region *r2 = p2; 31 - 32 - return r1->base < r2->base ? -1 : (r1->base > r2->base); 33 - } 34 - 35 - static void __init sort_memblock_regions(void) 36 - { 37 - sort(hyp_memory, 38 - *hyp_memblock_nr_ptr, 39 - sizeof(struct memblock_region), 40 - cmp_hyp_memblock, 41 - NULL); 42 - } 43 - 44 27 static int __init register_memblock_regions(void) 45 28 { 46 29 struct memblock_region *reg; ··· 35 52 hyp_memory[*hyp_memblock_nr_ptr] = *reg; 36 53 (*hyp_memblock_nr_ptr)++; 37 54 } 38 - sort_memblock_regions(); 39 55 40 56 return 0; 41 57 } ··· 61 79 hyp_mem_pages += host_s2_pgtable_pages(); 62 80 hyp_mem_pages += hyp_vm_table_pages(); 63 81 hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE); 82 + hyp_mem_pages += pkvm_selftest_pages(); 64 83 hyp_mem_pages += hyp_ffa_proxy_pages(); 65 84 66 85 /* ··· 245 262 * at, which would end badly once inaccessible. 
246 263 */ 247 264 kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start); 265 + kmemleak_free_part(__hyp_data_start, __hyp_data_end - __hyp_data_start); 248 266 kmemleak_free_part(__hyp_rodata_start, __hyp_rodata_end - __hyp_rodata_start); 249 267 kmemleak_free_part_phys(hyp_mem_base, hyp_mem_size); 250 268 ··· 257 273 } 258 274 device_initcall_sync(finalize_pkvm); 259 275 260 - static int cmp_mappings(struct rb_node *node, const struct rb_node *parent) 276 + static u64 __pkvm_mapping_start(struct pkvm_mapping *m) 261 277 { 262 - struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node); 263 - struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node); 264 - 265 - if (a->gfn < b->gfn) 266 - return -1; 267 - if (a->gfn > b->gfn) 268 - return 1; 269 - return 0; 278 + return m->gfn * PAGE_SIZE; 270 279 } 271 280 272 - static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn) 281 + static u64 __pkvm_mapping_end(struct pkvm_mapping *m) 273 282 { 274 - struct rb_node *node = root->rb_node, *prev = NULL; 275 - struct pkvm_mapping *mapping; 276 - 277 - while (node) { 278 - mapping = rb_entry(node, struct pkvm_mapping, node); 279 - if (mapping->gfn == gfn) 280 - return node; 281 - prev = node; 282 - node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right; 283 - } 284 - 285 - return prev; 283 + return (m->gfn + m->nr_pages) * PAGE_SIZE - 1; 286 284 } 285 + 286 + INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last, 287 + __pkvm_mapping_start, __pkvm_mapping_end, static, 288 + pkvm_mapping); 287 289 288 290 /* 289 - * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing 290 - * of __map inline. 291 + * __tmp is updated to iter_first(pkvm_mappings) *before* entering the body of the loop to allow 292 + * freeing of __map inline. 
291 293 */ 292 294 #define for_each_mapping_in_range_safe(__pgt, __start, __end, __map) \ 293 - for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings, \ 294 - ((__start) >> PAGE_SHIFT)); \ 295 + for (struct pkvm_mapping *__tmp = pkvm_mapping_iter_first(&(__pgt)->pkvm_mappings, \ 296 + __start, __end - 1); \ 295 297 __tmp && ({ \ 296 - __map = rb_entry(__tmp, struct pkvm_mapping, node); \ 297 - __tmp = rb_next(__tmp); \ 298 + __map = __tmp; \ 299 + __tmp = pkvm_mapping_iter_next(__map, __start, __end - 1); \ 298 300 true; \ 299 301 }); \ 300 - ) \ 301 - if (__map->gfn < ((__start) >> PAGE_SHIFT)) \ 302 - continue; \ 303 - else if (__map->gfn >= ((__end) >> PAGE_SHIFT)) \ 304 - break; \ 305 - else 302 + ) 306 303 307 304 int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu, 308 305 struct kvm_pgtable_mm_ops *mm_ops) 309 306 { 310 - pgt->pkvm_mappings = RB_ROOT; 307 + pgt->pkvm_mappings = RB_ROOT_CACHED; 311 308 pgt->mmu = mmu; 309 + 310 + return 0; 311 + } 312 + 313 + static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end) 314 + { 315 + struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu); 316 + pkvm_handle_t handle = kvm->arch.pkvm.handle; 317 + struct pkvm_mapping *mapping; 318 + int ret; 319 + 320 + if (!handle) 321 + return 0; 322 + 323 + for_each_mapping_in_range_safe(pgt, start, end, mapping) { 324 + ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 325 + mapping->nr_pages); 326 + if (WARN_ON(ret)) 327 + return ret; 328 + pkvm_mapping_remove(mapping, &pgt->pkvm_mappings); 329 + kfree(mapping); 330 + } 312 331 313 332 return 0; 314 333 } 315 334 316 335 void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt) 317 336 { 318 - struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu); 319 - pkvm_handle_t handle = kvm->arch.pkvm.handle; 320 - struct pkvm_mapping *mapping; 321 - struct rb_node *node; 322 - 323 - if (!handle) 324 - return; 325 - 326 - node = rb_first(&pgt->pkvm_mappings); 
327 - while (node) { 328 - mapping = rb_entry(node, struct pkvm_mapping, node); 329 - kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn); 330 - node = rb_next(node); 331 - rb_erase(&mapping->node, &pgt->pkvm_mappings); 332 - kfree(mapping); 333 - } 337 + __pkvm_pgtable_stage2_unmap(pgt, 0, ~(0ULL)); 334 338 } 335 339 336 340 int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, ··· 332 360 u64 pfn = phys >> PAGE_SHIFT; 333 361 int ret; 334 362 335 - if (size != PAGE_SIZE) 363 + if (size != PAGE_SIZE && size != PMD_SIZE) 336 364 return -EINVAL; 337 365 338 366 lockdep_assert_held_write(&kvm->mmu_lock); 339 - ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot); 340 - if (ret) { 341 - /* Is the gfn already mapped due to a racing vCPU? */ 342 - if (ret == -EPERM) 367 + 368 + /* 369 + * Calling stage2_map() on top of existing mappings is either happening because of a race 370 + * with another vCPU, or because we're changing between page and block mappings. As per 371 + * user_mem_abort(), same-size permission faults are handled in the relax_perms() path. 372 + */ 373 + mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1); 374 + if (mapping) { 375 + if (size == (mapping->nr_pages * PAGE_SIZE)) 343 376 return -EAGAIN; 377 + 378 + /* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. 
*/ 379 + ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size); 380 + if (ret) 381 + return ret; 382 + mapping = NULL; 344 383 } 384 + 385 + ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot); 386 + if (WARN_ON(ret)) 387 + return ret; 345 388 346 389 swap(mapping, cache->mapping); 347 390 mapping->gfn = gfn; 348 391 mapping->pfn = pfn; 349 - WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm_mappings, cmp_mappings)); 392 + mapping->nr_pages = size / PAGE_SIZE; 393 + pkvm_mapping_insert(mapping, &pgt->pkvm_mappings); 350 394 351 395 return ret; 352 396 } 353 397 354 398 int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size) 355 399 { 356 - struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu); 357 - pkvm_handle_t handle = kvm->arch.pkvm.handle; 358 - struct pkvm_mapping *mapping; 359 - int ret = 0; 400 + lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock); 360 401 361 - lockdep_assert_held_write(&kvm->mmu_lock); 362 - for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) { 363 - ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn); 364 - if (WARN_ON(ret)) 365 - break; 366 - rb_erase(&mapping->node, &pgt->pkvm_mappings); 367 - kfree(mapping); 368 - } 369 - 370 - return ret; 402 + return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size); 371 403 } 372 404 373 405 int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size) ··· 383 407 384 408 lockdep_assert_held(&kvm->mmu_lock); 385 409 for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) { 386 - ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn); 410 + ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, 411 + mapping->nr_pages); 387 412 if (WARN_ON(ret)) 388 413 break; 389 414 } ··· 399 422 400 423 lockdep_assert_held(&kvm->mmu_lock); 401 424 for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) 402 - 
__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE); 425 + __clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), 426 + PAGE_SIZE * mapping->nr_pages); 403 427 404 428 return 0; 405 429 } ··· 415 437 lockdep_assert_held(&kvm->mmu_lock); 416 438 for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) 417 439 young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn, 418 - mkold); 440 + mapping->nr_pages, mkold); 419 441 420 442 return young; 421 443 }
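
The pkvm.c change above keys each pkvm_mapping by an inclusive byte range derived from gfn and nr_pages, and queries the tree with [start, end - 1]. A minimal user-space sketch of that overlap arithmetic follows. It assumes 4 KiB pages and uses a linear scan in place of the kernel's interval tree; `struct mapping`, `mapping_start`, `mapping_last` and `first_overlap` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct pkvm_mapping (tree node omitted). */
struct mapping {
	uint64_t gfn;
	uint64_t nr_pages;
};

/* First byte covered by the mapping, as in __pkvm_mapping_start(). */
static uint64_t mapping_start(const struct mapping *m)
{
	return m->gfn * SKETCH_PAGE_SIZE;
}

/* Last byte covered by the mapping (inclusive), as in __pkvm_mapping_end(). */
static uint64_t mapping_last(const struct mapping *m)
{
	return (m->gfn + m->nr_pages) * SKETCH_PAGE_SIZE - 1;
}

/*
 * Return the index of the first mapping that overlaps the half-open byte
 * range [start, end), or -1 if there is none. The query is converted to the
 * inclusive range [start, end - 1], mirroring the `__end - 1` in the macro.
 */
static int first_overlap(const struct mapping *maps, size_t n,
			 uint64_t start, uint64_t end)
{
	uint64_t last = end - 1;

	for (size_t i = 0; i < n; i++) {
		if (mapping_start(&maps[i]) <= last &&
		    start <= mapping_last(&maps[i]))
			return (int)i;
	}
	return -1;
}
```

With inclusive ends, a 512-page block at gfn 512 covers bytes 0x200000 to 0x3fffff, so a page-sized query anywhere inside it finds the block, while a query that ends exactly at 0x200000 does not.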

+52 -8

arch/arm64/kvm/pmu-emul.c

··· 280 280 return 0; 281 281 282 282 hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); 283 - n = vcpu->kvm->arch.pmcr_n; 283 + n = vcpu->kvm->arch.nr_pmu_counters; 284 284 285 285 /* 286 286 * Programming HPMN to a value greater than PMCR_EL0.N is ··· 608 608 kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0); 609 609 610 610 if (val & ARMV8_PMU_PMCR_P) { 611 - /* 612 - * Unlike other PMU sysregs, the controls in PMCR_EL0 always apply 613 - * to the 'guest' range of counters and never the 'hyp' range. 614 - */ 615 611 unsigned long mask = kvm_pmu_implemented_counter_mask(vcpu) & 616 - ~kvm_pmu_hyp_counter_mask(vcpu) & 617 612 ~BIT(ARMV8_PMU_CYCLE_IDX); 613 + 614 + if (!vcpu_is_el2(vcpu)) 615 + mask &= ~kvm_pmu_hyp_counter_mask(vcpu); 618 616 619 617 for_each_set_bit(i, &mask, 32) 620 618 kvm_pmu_set_pmc_value(kvm_vcpu_idx_to_pmc(vcpu, i), 0, true); ··· 1025 1027 return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); 1026 1028 } 1027 1029 1030 + static void kvm_arm_set_nr_counters(struct kvm *kvm, unsigned int nr) 1031 + { 1032 + kvm->arch.nr_pmu_counters = nr; 1033 + 1034 + /* Reset MDCR_EL2.HPMN behind the vcpus' back... 
*/ 1035 + if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) { 1036 + struct kvm_vcpu *vcpu; 1037 + unsigned long i; 1038 + 1039 + kvm_for_each_vcpu(i, vcpu, kvm) { 1040 + u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2); 1041 + val &= ~MDCR_EL2_HPMN; 1042 + val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.nr_pmu_counters); 1043 + __vcpu_sys_reg(vcpu, MDCR_EL2) = val; 1044 + } 1045 + } 1046 + } 1047 + 1028 1048 static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) 1029 1049 { 1030 1050 lockdep_assert_held(&kvm->arch.config_lock); 1031 1051 1032 1052 kvm->arch.arm_pmu = arm_pmu; 1033 - kvm->arch.pmcr_n = kvm_arm_pmu_get_max_counters(kvm); 1053 + kvm_arm_set_nr_counters(kvm, kvm_arm_pmu_get_max_counters(kvm)); 1034 1054 } 1035 1055 1036 1056 /** ··· 1102 1086 1103 1087 mutex_unlock(&arm_pmus_lock); 1104 1088 return ret; 1089 + } 1090 + 1091 + static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n) 1092 + { 1093 + struct kvm *kvm = vcpu->kvm; 1094 + 1095 + if (!kvm->arch.arm_pmu) 1096 + return -EINVAL; 1097 + 1098 + if (n > kvm_arm_pmu_get_max_counters(kvm)) 1099 + return -EINVAL; 1100 + 1101 + kvm_arm_set_nr_counters(kvm, n); 1102 + return 0; 1105 1103 } 1106 1104 1107 1105 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) ··· 1214 1184 1215 1185 return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); 1216 1186 } 1187 + case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: { 1188 + unsigned int __user *uaddr = (unsigned int __user *)(long)attr->addr; 1189 + unsigned int n; 1190 + 1191 + if (get_user(n, uaddr)) 1192 + return -EFAULT; 1193 + 1194 + return kvm_arm_pmu_v3_set_nr_counters(vcpu, n); 1195 + } 1217 1196 case KVM_ARM_VCPU_PMU_V3_INIT: 1218 1197 return kvm_arm_pmu_v3_init(vcpu); 1219 1198 } ··· 1261 1222 case KVM_ARM_VCPU_PMU_V3_INIT: 1262 1223 case KVM_ARM_VCPU_PMU_V3_FILTER: 1263 1224 case KVM_ARM_VCPU_PMU_V3_SET_PMU: 1225 + case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS: 1264 1226 if (kvm_vcpu_has_pmu(vcpu)) 1265 1227 
return 0; 1266 1228 } ··· 1300 1260 u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) 1301 1261 { 1302 1262 u64 pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0); 1263 + u64 n = vcpu->kvm->arch.nr_pmu_counters; 1303 1264 1304 - return u64_replace_bits(pmcr, vcpu->kvm->arch.pmcr_n, ARMV8_PMU_PMCR_N); 1265 + if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) 1266 + n = FIELD_GET(MDCR_EL2_HPMN, __vcpu_sys_reg(vcpu, MDCR_EL2)); 1267 + 1268 + return u64_replace_bits(pmcr, n, ARMV8_PMU_PMCR_N); 1305 1269 } 1306 1270 1307 1271 void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu)
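
The kvm_vcpu_read_pmcr() hunk above selects which counter count is reported in PMCR_EL0.N: the VM-wide nr_pmu_counters by default, or MDCR_EL2.HPMN when a nested-virt vCPU is not running at EL2. A minimal user-space sketch of that selection follows. It assumes the architectural field positions (PMCR_EL0.N in bits [15:11], MDCR_EL2.HPMN in bits [4:0]); `read_pmcr` and its boolean parameters are hypothetical stand-ins for the kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMCR_N_SHIFT	11
#define PMCR_N_MASK	(UINT64_C(0x1f) << PMCR_N_SHIFT)	/* PMCR_EL0.N, bits [15:11] */
#define HPMN_MASK	UINT64_C(0x1f)				/* MDCR_EL2.HPMN, bits [4:0] */

/*
 * Hypothetical model of kvm_vcpu_read_pmcr(): has_nv and is_el2 stand in for
 * vcpu_has_nv() and vcpu_is_el2().
 */
static uint64_t read_pmcr(uint64_t pmcr, uint64_t mdcr_el2,
			  unsigned int nr_pmu_counters,
			  bool has_nv, bool is_el2)
{
	uint64_t n = nr_pmu_counters;

	/* Outside EL2, a nested guest only sees the counters below HPMN. */
	if (has_nv && !is_el2)
		n = mdcr_el2 & HPMN_MASK;

	/* Replace the N field and leave every other PMCR bit untouched. */
	return (pmcr & ~PMCR_N_MASK) | ((n << PMCR_N_SHIFT) & PMCR_N_MASK);
}
```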

+2

arch/arm64/kvm/reset.c

@@ -158,5 +158,7 @@
 	if (sve_state)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
+	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
+	kfree(vcpu->arch.vncr_tlb);
 	kfree(vcpu->arch.ccsidr);
 }

+146 -127

arch/arm64/kvm/sys_regs.c

··· 785 785 static u64 reset_pmu_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) 786 786 { 787 787 u64 mask = BIT(ARMV8_PMU_CYCLE_IDX); 788 - u8 n = vcpu->kvm->arch.pmcr_n; 788 + u8 n = vcpu->kvm->arch.nr_pmu_counters; 789 789 790 790 if (n) 791 791 mask |= GENMASK(n - 1, 0); ··· 1216 1216 * with the existing KVM behavior. 1217 1217 */ 1218 1218 if (!kvm_vm_has_ran_once(kvm) && 1219 + !vcpu_has_nv(vcpu) && 1219 1220 new_n <= kvm_arm_pmu_get_max_counters(kvm)) 1220 - kvm->arch.pmcr_n = new_n; 1221 + kvm->arch.nr_pmu_counters = new_n; 1221 1222 1222 1223 mutex_unlock(&kvm->arch.config_lock); 1223 1224 ··· 1601 1600 val = sanitise_id_aa64pfr0_el1(vcpu, val); 1602 1601 break; 1603 1602 case SYS_ID_AA64PFR1_EL1: 1604 - if (!kvm_has_mte(vcpu->kvm)) 1603 + if (!kvm_has_mte(vcpu->kvm)) { 1605 1604 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); 1605 + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac); 1606 + } 1606 1607 1607 1608 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); 1608 1609 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_RNDR_trap); 1609 1610 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_NMI); 1610 - val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac); 1611 1611 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_GCS); 1612 1612 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_THE); 1613 1613 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX); ··· 1961 1959 { 1962 1960 u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1); 1963 1961 u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK; 1962 + u8 mte = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE, hw_val); 1963 + u8 user_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, user_val); 1964 + u8 hw_mte_frac = SYS_FIELD_GET(ID_AA64PFR1_EL1, MTE_frac, hw_val); 1964 1965 1965 1966 /* See set_id_aa64pfr0_el1 for comment about MPAM */ 1966 1967 if ((hw_val & mpam_mask) == (user_val & mpam_mask)) 1967 1968 user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK; 1969 + 1970 + /* 1971 + * Previously MTE_frac was hidden from guest. 
However, if the 1972 + * hardware supports MTE2 but not MTE_ASYM_FAULT then a value 1973 + * of 0 for this field indicates that the hardware supports 1974 + * MTE_ASYNC. Whereas, 0xf indicates MTE_ASYNC is not supported. 1975 + * 1976 + * As KVM must accept values from KVM provided by user-space, 1977 + * when ID_AA64PFR1_EL1.MTE is 2 allow user-space to set 1978 + * ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to avoid 1979 + * incorrectly claiming hardware support for MTE_ASYNC in the 1980 + * guest. 1981 + */ 1982 + 1983 + if (mte == ID_AA64PFR1_EL1_MTE_MTE2 && 1984 + hw_mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI && 1985 + user_mte_frac == ID_AA64PFR1_EL1_MTE_frac_ASYNC) { 1986 + user_val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK; 1987 + user_val |= hw_val & ID_AA64PFR1_EL1_MTE_frac_MASK; 1988 + } 1968 1989 1969 1990 return set_id_reg(vcpu, rd, user_val); 1970 1991 } ··· 2312 2287 "trap of EL2 register redirected to EL1"); 2313 2288 } 2314 2289 2315 - #define EL2_REG(name, acc, rst, v) { \ 2316 - SYS_DESC(SYS_##name), \ 2317 - .access = acc, \ 2318 - .reset = rst, \ 2319 - .reg = name, \ 2320 - .visibility = el2_visibility, \ 2321 - .val = v, \ 2322 - } 2323 - 2324 2290 #define EL2_REG_FILTERED(name, acc, rst, v, filter) { \ 2325 2291 SYS_DESC(SYS_##name), \ 2326 2292 .access = acc, \ ··· 2320 2304 .visibility = filter, \ 2321 2305 .val = v, \ 2322 2306 } 2307 + 2308 + #define EL2_REG(name, acc, rst, v) \ 2309 + EL2_REG_FILTERED(name, acc, rst, v, el2_visibility) 2323 2310 2324 2311 #define EL2_REG_VNCR(name, rst, v) EL2_REG(name, bad_vncr_trap, rst, v) 2325 2312 #define EL2_REG_REDIR(name, rst, v) EL2_REG(name, bad_redir_trap, rst, v) ··· 2471 2452 return __el2_visibility(vcpu, rd, sve_visibility); 2472 2453 } 2473 2454 2455 + static unsigned int vncr_el2_visibility(const struct kvm_vcpu *vcpu, 2456 + const struct sys_reg_desc *rd) 2457 + { 2458 + if (el2_visibility(vcpu, rd) == 0 && 2459 + kvm_has_feat(vcpu->kvm, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY)) 2460 + return 
0; 2461 + 2462 + return REG_HIDDEN; 2463 + } 2464 + 2474 2465 static bool access_zcr_el2(struct kvm_vcpu *vcpu, 2475 2466 struct sys_reg_params *p, 2476 2467 const struct sys_reg_desc *r) ··· 2605 2576 struct sys_reg_params *p, 2606 2577 const struct sys_reg_desc *r) 2607 2578 { 2608 - u64 old = __vcpu_sys_reg(vcpu, MDCR_EL2); 2579 + u64 hpmn, val, old = __vcpu_sys_reg(vcpu, MDCR_EL2); 2609 2580 2610 - if (!access_rw(vcpu, p, r)) 2611 - return false; 2581 + if (!p->is_write) { 2582 + p->regval = old; 2583 + return true; 2584 + } 2585 + 2586 + val = p->regval; 2587 + hpmn = FIELD_GET(MDCR_EL2_HPMN, val); 2612 2588 2613 2589 /* 2614 - * Request a reload of the PMU to enable/disable the counters affected 2615 - * by HPME. 2590 + * If HPMN is out of bounds, limit it to what we actually 2591 + * support. This matches the UNKNOWN definition of the field 2592 + * in that case, and keeps the emulation simple. Sort of. 2616 2593 */ 2617 - if ((old ^ __vcpu_sys_reg(vcpu, MDCR_EL2)) & MDCR_EL2_HPME) 2594 + if (hpmn > vcpu->kvm->arch.nr_pmu_counters) { 2595 + hpmn = vcpu->kvm->arch.nr_pmu_counters; 2596 + u64_replace_bits(val, hpmn, MDCR_EL2_HPMN); 2597 + } 2598 + 2599 + __vcpu_sys_reg(vcpu, MDCR_EL2) = val; 2600 + 2601 + /* 2602 + * Request a reload of the PMU to enable/disable the counters 2603 + * affected by HPME. 
2604 + */ 2605 + if ((old ^ val) & MDCR_EL2_HPME) 2618 2606 kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu); 2619 2607 2620 2608 return true; ··· 2750 2704 .set_user = set_imp_id_reg, \ 2751 2705 .reset = reset_imp_id_reg, \ 2752 2706 .val = mask, \ 2707 + } 2708 + 2709 + static u64 reset_mdcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) 2710 + { 2711 + __vcpu_sys_reg(vcpu, r->reg) = vcpu->kvm->arch.nr_pmu_counters; 2712 + return vcpu->kvm->arch.nr_pmu_counters; 2753 2713 } 2754 2714 2755 2715 /* ··· 3301 3249 EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1), 3302 3250 EL2_REG(ACTLR_EL2, access_rw, reset_val, 0), 3303 3251 EL2_REG_VNCR(HCR_EL2, reset_hcr, 0), 3304 - EL2_REG(MDCR_EL2, access_mdcr, reset_val, 0), 3252 + EL2_REG(MDCR_EL2, access_mdcr, reset_mdcr, 0), 3305 3253 EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_NVHE_EL2_RES1), 3306 3254 EL2_REG_VNCR(HSTR_EL2, reset_val, 0), 3307 3255 EL2_REG_VNCR(HFGRTR_EL2, reset_val, 0), ··· 3321 3269 tcr2_el2_visibility), 3322 3270 EL2_REG_VNCR(VTTBR_EL2, reset_val, 0), 3323 3271 EL2_REG_VNCR(VTCR_EL2, reset_val, 0), 3272 + EL2_REG_FILTERED(VNCR_EL2, bad_vncr_trap, reset_val, 0, 3273 + vncr_el2_visibility), 3324 3274 3325 3275 { SYS_DESC(SYS_DACR32_EL2), undef_access, reset_unknown, DACR32_EL2 }, 3326 3276 EL2_REG_VNCR(HDFGRTR_EL2, reset_val, 0), ··· 3606 3552 { 3607 3553 u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2); 3608 3554 u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2); 3609 - u64 base, range, tg, num, scale; 3610 - int shift; 3555 + u64 base, range; 3611 3556 3612 3557 if (!kvm_supported_tlbi_ipas2_op(vcpu, sys_encoding)) 3613 3558 return undef_access(vcpu, p, r); ··· 3616 3563 * of the guest's S2 (different base granule size, for example), we 3617 3564 * decide to ignore TTL and only use the described range. 
3618 3565 */ 3619 - tg = FIELD_GET(GENMASK(47, 46), p->regval); 3620 - scale = FIELD_GET(GENMASK(45, 44), p->regval); 3621 - num = FIELD_GET(GENMASK(43, 39), p->regval); 3622 - base = p->regval & GENMASK(36, 0); 3623 - 3624 - switch(tg) { 3625 - case 1: 3626 - shift = 12; 3627 - break; 3628 - case 2: 3629 - shift = 14; 3630 - break; 3631 - case 3: 3632 - default: /* IMPDEF: handle tg==0 as 64k */ 3633 - shift = 16; 3634 - break; 3635 - } 3636 - 3637 - base <<= shift; 3638 - range = __TLBI_RANGE_PAGES(num, scale) << shift; 3566 + base = decode_range_tlbi(p->regval, &range, NULL); 3639 3567 3640 3568 kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr), 3641 3569 &(union tlbi_info) { ··· 3682 3648 WARN_ON(__kvm_tlbi_s1e2(mmu, info->va.addr, info->va.encoding)); 3683 3649 } 3684 3650 3651 + static bool handle_tlbi_el2(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 3652 + const struct sys_reg_desc *r) 3653 + { 3654 + u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2); 3655 + 3656 + if (!kvm_supported_tlbi_s1e2_op(vcpu, sys_encoding)) 3657 + return undef_access(vcpu, p, r); 3658 + 3659 + kvm_handle_s1e2_tlbi(vcpu, sys_encoding, p->regval); 3660 + return true; 3661 + } 3662 + 3685 3663 static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 3686 3664 const struct sys_reg_desc *r) 3687 3665 { 3688 3666 u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2); 3689 - u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2); 3690 3667 3691 3668 /* 3692 3669 * If we're here, this is because we've trapped on a EL1 TLBI ··· 3708 3663 * - HCR_EL2.E2H == 0 : a non-VHE guest 3709 3664 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode 3710 3665 * 3666 + * Another possibility is that we are invalidating the EL2 context 3667 + * using EL1 instructions, but that we landed here because we need 3668 + * additional invalidation for structures that are not held in the 3669 + * CPU TLBs (such as the VNCR pseudo-TLB and its EL2 
mapping). In 3670 + * that case, we are guaranteed that HCR_EL2.{E2H,TGE} == { 1, 1 } 3671 + * as we don't allow an NV-capable L1 in a nVHE configuration. 3672 + * 3711 3673 * We don't expect these helpers to ever be called when running 3712 3674 * in a vEL1 context. 3713 3675 */ ··· 3724 3672 if (!kvm_supported_tlbi_s1e1_op(vcpu, sys_encoding)) 3725 3673 return undef_access(vcpu, p, r); 3726 3674 3727 - kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr), 3675 + if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) { 3676 + kvm_handle_s1e2_tlbi(vcpu, sys_encoding, p->regval); 3677 + return true; 3678 + } 3679 + 3680 + kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, 3681 + get_vmid(__vcpu_sys_reg(vcpu, VTTBR_EL2)), 3728 3682 &(union tlbi_info) { 3729 3683 .va = { 3730 3684 .addr = p->regval, ··· 3852 3794 SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is), 3853 3795 SYS_INSN(TLBI_RIPAS2LE1IS, handle_ripas2e1is), 3854 3796 3855 - SYS_INSN(TLBI_ALLE2OS, undef_access), 3856 - SYS_INSN(TLBI_VAE2OS, undef_access), 3797 + SYS_INSN(TLBI_ALLE2OS, handle_tlbi_el2), 3798 + SYS_INSN(TLBI_VAE2OS, handle_tlbi_el2), 3857 3799 SYS_INSN(TLBI_ALLE1OS, handle_alle1is), 3858 - SYS_INSN(TLBI_VALE2OS, undef_access), 3800 + SYS_INSN(TLBI_VALE2OS, handle_tlbi_el2), 3859 3801 SYS_INSN(TLBI_VMALLS12E1OS, handle_vmalls12e1is), 3860 3802 3861 - SYS_INSN(TLBI_RVAE2IS, undef_access), 3862 - SYS_INSN(TLBI_RVALE2IS, undef_access), 3803 + SYS_INSN(TLBI_RVAE2IS, handle_tlbi_el2), 3804 + SYS_INSN(TLBI_RVALE2IS, handle_tlbi_el2), 3805 + SYS_INSN(TLBI_ALLE2IS, handle_tlbi_el2), 3806 + SYS_INSN(TLBI_VAE2IS, handle_tlbi_el2), 3863 3807 3864 3808 SYS_INSN(TLBI_ALLE1IS, handle_alle1is), 3809 + 3810 + SYS_INSN(TLBI_VALE2IS, handle_tlbi_el2), 3811 + 3865 3812 SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is), 3866 3813 SYS_INSN(TLBI_IPAS2E1OS, handle_ipas2e1is), 3867 3814 SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is), ··· 3876 3813 SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is), 3877 3814 SYS_INSN(TLBI_RIPAS2LE1, 
handle_ripas2e1is), 3878 3815 SYS_INSN(TLBI_RIPAS2LE1OS, handle_ripas2e1is), 3879 - SYS_INSN(TLBI_RVAE2OS, undef_access), 3880 - SYS_INSN(TLBI_RVALE2OS, undef_access), 3881 - SYS_INSN(TLBI_RVAE2, undef_access), 3882 - SYS_INSN(TLBI_RVALE2, undef_access), 3816 + SYS_INSN(TLBI_RVAE2OS, handle_tlbi_el2), 3817 + SYS_INSN(TLBI_RVALE2OS, handle_tlbi_el2), 3818 + SYS_INSN(TLBI_RVAE2, handle_tlbi_el2), 3819 + SYS_INSN(TLBI_RVALE2, handle_tlbi_el2), 3820 + SYS_INSN(TLBI_ALLE2, handle_tlbi_el2), 3821 + SYS_INSN(TLBI_VAE2, handle_tlbi_el2), 3822 + 3883 3823 SYS_INSN(TLBI_ALLE1, handle_alle1is), 3824 + 3825 + SYS_INSN(TLBI_VALE2, handle_tlbi_el2), 3826 + 3884 3827 SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is), 3885 3828 3886 3829 SYS_INSN(TLBI_IPAS2E1ISNXS, handle_ipas2e1is), ··· 3894 3825 SYS_INSN(TLBI_IPAS2LE1ISNXS, handle_ipas2e1is), 3895 3826 SYS_INSN(TLBI_RIPAS2LE1ISNXS, handle_ripas2e1is), 3896 3827 3897 - SYS_INSN(TLBI_ALLE2OSNXS, undef_access), 3898 - SYS_INSN(TLBI_VAE2OSNXS, undef_access), 3828 + SYS_INSN(TLBI_ALLE2OSNXS, handle_tlbi_el2), 3829 + SYS_INSN(TLBI_VAE2OSNXS, handle_tlbi_el2), 3899 3830 SYS_INSN(TLBI_ALLE1OSNXS, handle_alle1is), 3900 - SYS_INSN(TLBI_VALE2OSNXS, undef_access), 3831 + SYS_INSN(TLBI_VALE2OSNXS, handle_tlbi_el2), 3901 3832 SYS_INSN(TLBI_VMALLS12E1OSNXS, handle_vmalls12e1is), 3902 3833 3903 - SYS_INSN(TLBI_RVAE2ISNXS, undef_access), 3904 - SYS_INSN(TLBI_RVALE2ISNXS, undef_access), 3905 - SYS_INSN(TLBI_ALLE2ISNXS, undef_access), 3906 - SYS_INSN(TLBI_VAE2ISNXS, undef_access), 3834 + SYS_INSN(TLBI_RVAE2ISNXS, handle_tlbi_el2), 3835 + SYS_INSN(TLBI_RVALE2ISNXS, handle_tlbi_el2), 3836 + SYS_INSN(TLBI_ALLE2ISNXS, handle_tlbi_el2), 3837 + SYS_INSN(TLBI_VAE2ISNXS, handle_tlbi_el2), 3907 3838 3908 3839 SYS_INSN(TLBI_ALLE1ISNXS, handle_alle1is), 3909 - SYS_INSN(TLBI_VALE2ISNXS, undef_access), 3840 + SYS_INSN(TLBI_VALE2ISNXS, handle_tlbi_el2), 3910 3841 SYS_INSN(TLBI_VMALLS12E1ISNXS, handle_vmalls12e1is), 3911 3842 SYS_INSN(TLBI_IPAS2E1OSNXS, 
handle_ipas2e1is), 3912 3843 SYS_INSN(TLBI_IPAS2E1NXS, handle_ipas2e1is), ··· 3916 3847 SYS_INSN(TLBI_IPAS2LE1NXS, handle_ipas2e1is), 3917 3848 SYS_INSN(TLBI_RIPAS2LE1NXS, handle_ripas2e1is), 3918 3849 SYS_INSN(TLBI_RIPAS2LE1OSNXS, handle_ripas2e1is), 3919 - SYS_INSN(TLBI_RVAE2OSNXS, undef_access), 3920 - SYS_INSN(TLBI_RVALE2OSNXS, undef_access), 3921 - SYS_INSN(TLBI_RVAE2NXS, undef_access), 3922 - SYS_INSN(TLBI_RVALE2NXS, undef_access), 3923 - SYS_INSN(TLBI_ALLE2NXS, undef_access), 3924 - SYS_INSN(TLBI_VAE2NXS, undef_access), 3850 + SYS_INSN(TLBI_RVAE2OSNXS, handle_tlbi_el2), 3851 + SYS_INSN(TLBI_RVALE2OSNXS, handle_tlbi_el2), 3852 + SYS_INSN(TLBI_RVAE2NXS, handle_tlbi_el2), 3853 + SYS_INSN(TLBI_RVALE2NXS, handle_tlbi_el2), 3854 + SYS_INSN(TLBI_ALLE2NXS, handle_tlbi_el2), 3855 + SYS_INSN(TLBI_VAE2NXS, handle_tlbi_el2), 3925 3856 SYS_INSN(TLBI_ALLE1NXS, handle_alle1is), 3926 - SYS_INSN(TLBI_VALE2NXS, undef_access), 3857 + SYS_INSN(TLBI_VALE2NXS, handle_tlbi_el2), 3927 3858 SYS_INSN(TLBI_VMALLS12E1NXS, handle_vmalls12e1is), 3928 3859 }; 3929 3860 ··· 5222 5153 if (test_bit(KVM_ARCH_FLAG_FGU_INITIALIZED, &kvm->arch.flags)) 5223 5154 goto out; 5224 5155 5225 - kvm->arch.fgu[HFGxTR_GROUP] = (HFGxTR_EL2_nAMAIR2_EL1 | 5226 - HFGxTR_EL2_nMAIR2_EL1 | 5227 - HFGxTR_EL2_nS2POR_EL1 | 5228 - HFGxTR_EL2_nACCDATA_EL1 | 5229 - HFGxTR_EL2_nSMPRI_EL1_MASK | 5230 - HFGxTR_EL2_nTPIDR2_EL0_MASK); 5231 - 5232 - if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS)) 5233 - kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_TLBIRVAALE1OS| 5234 - HFGITR_EL2_TLBIRVALE1OS | 5235 - HFGITR_EL2_TLBIRVAAE1OS | 5236 - HFGITR_EL2_TLBIRVAE1OS | 5237 - HFGITR_EL2_TLBIVAALE1OS | 5238 - HFGITR_EL2_TLBIVALE1OS | 5239 - HFGITR_EL2_TLBIVAAE1OS | 5240 - HFGITR_EL2_TLBIASIDE1OS | 5241 - HFGITR_EL2_TLBIVAE1OS | 5242 - HFGITR_EL2_TLBIVMALLE1OS); 5243 - 5244 - if (!kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE)) 5245 - kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_TLBIRVAALE1 | 5246 - HFGITR_EL2_TLBIRVALE1 | 5247 - 
HFGITR_EL2_TLBIRVAAE1 | 5248 - HFGITR_EL2_TLBIRVAE1 | 5249 - HFGITR_EL2_TLBIRVAALE1IS| 5250 - HFGITR_EL2_TLBIRVALE1IS | 5251 - HFGITR_EL2_TLBIRVAAE1IS | 5252 - HFGITR_EL2_TLBIRVAE1IS | 5253 - HFGITR_EL2_TLBIRVAALE1OS| 5254 - HFGITR_EL2_TLBIRVALE1OS | 5255 - HFGITR_EL2_TLBIRVAAE1OS | 5256 - HFGITR_EL2_TLBIRVAE1OS); 5257 - 5258 - if (!kvm_has_feat(kvm, ID_AA64ISAR2_EL1, ATS1A, IMP)) 5259 - kvm->arch.fgu[HFGITR_GROUP] |= HFGITR_EL2_ATS1E1A; 5260 - 5261 - if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, PAN, PAN2)) 5262 - kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_ATS1E1RP | 5263 - HFGITR_EL2_ATS1E1WP); 5264 - 5265 - if (!kvm_has_s1pie(kvm)) 5266 - kvm->arch.fgu[HFGxTR_GROUP] |= (HFGxTR_EL2_nPIRE0_EL1 | 5267 - HFGxTR_EL2_nPIR_EL1); 5268 - 5269 - if (!kvm_has_s1poe(kvm)) 5270 - kvm->arch.fgu[HFGxTR_GROUP] |= (HFGxTR_EL2_nPOR_EL1 | 5271 - HFGxTR_EL2_nPOR_EL0); 5272 - 5273 - if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, AMU, IMP)) 5274 - kvm->arch.fgu[HAFGRTR_GROUP] |= ~(HAFGRTR_EL2_RES0 | 5275 - HAFGRTR_EL2_RES1); 5276 - 5277 - if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, BRBE, IMP)) { 5278 - kvm->arch.fgu[HDFGRTR_GROUP] |= (HDFGRTR_EL2_nBRBDATA | 5279 - HDFGRTR_EL2_nBRBCTL | 5280 - HDFGRTR_EL2_nBRBIDR); 5281 - kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_nBRBINJ | 5282 - HFGITR_EL2_nBRBIALL); 5283 - } 5156 + compute_fgu(kvm, HFGRTR_GROUP); 5157 + compute_fgu(kvm, HFGITR_GROUP); 5158 + compute_fgu(kvm, HDFGRTR_GROUP); 5159 + compute_fgu(kvm, HAFGRTR_GROUP); 5160 + compute_fgu(kvm, HFGRTR2_GROUP); 5161 + compute_fgu(kvm, HFGITR2_GROUP); 5162 + compute_fgu(kvm, HDFGRTR2_GROUP); 5284 5163 5285 5164 set_bit(KVM_ARCH_FLAG_FGU_INITIALIZED, &kvm->arch.flags); 5286 5165 out: ··· 5285 5268 init_imp_id_regs(); 5286 5269 5287 5270 ret = populate_nv_trap_config(); 5271 + 5272 + check_feature_map(); 5288 5273 5289 5274 for (i = 0; !ret && i < ARRAY_SIZE(sys_reg_descs); i++) 5290 5275 ret = populate_sysreg_config(sys_reg_descs + i, i);
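
The access_mdcr() hunk above limits an out-of-range HPMN to nr_pmu_counters. To my understanding, the kernel's u64_replace_bits() returns the updated value rather than modifying its argument, so a clamp of this kind only takes effect when the result is stored back; the call as rendered in the hunk shows no assignment. A minimal user-space sketch of the intended clamp follows, assuming HPMN occupies bits [4:0]; `replace_bits` and `clamp_hpmn` are hypothetical names, not kernel functions.

```c
#include <stdint.h>

#define HPMN_MASK UINT64_C(0x1f)	/* MDCR_EL2.HPMN, bits [4:0] */

/*
 * Hypothetical stand-in for u64_replace_bits(): returns the new value and
 * leaves `old` untouched. The field starts at bit 0 here, so no shift is
 * needed in this sketch.
 */
static uint64_t replace_bits(uint64_t old, uint64_t val, uint64_t field_mask)
{
	return (old & ~field_mask) | (val & field_mask);
}

/* Limit HPMN to the number of counters the VM actually exposes. */
static uint64_t clamp_hpmn(uint64_t mdcr, unsigned int nr_pmu_counters)
{
	uint64_t hpmn = mdcr & HPMN_MASK;

	if (hpmn > nr_pmu_counters)
		mdcr = replace_bits(mdcr, nr_pmu_counters, HPMN_MASK);

	return mdcr;
}
```

Only the HPMN field changes; the remaining bits of the written value, such as HPME, are preserved, so the subsequent `(old ^ val) & MDCR_EL2_HPME` check still sees the guest's intent.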

+3 -3

arch/arm64/kvm/trace_arm.h

@@ -176,7 +176,7 @@
 	),
 
 	TP_printk("S/W flush at 0x%016lx (cache %s)",
-		  __entry->vcpu_pc, __entry->cache ? "on" : "off")
+		  __entry->vcpu_pc, str_on_off(__entry->cache))
 );
 
 TRACE_EVENT(kvm_toggle_cache,
@@ -196,8 +196,8 @@
 	),
 
 	TP_printk("VM op at 0x%016lx (cache was %s, now %s)",
-		  __entry->vcpu_pc, __entry->was ? "on" : "off",
-		  __entry->now ? "on" : "off")
+		  __entry->vcpu_pc, str_on_off(__entry->was),
+		  str_on_off(__entry->now))
 );
 
 /*

+224

arch/arm64/kvm/vgic/vgic-debug.c

···
 void vgic_debug_destroy(struct kvm *kvm)
 {
 }
+
+/**
+ * struct vgic_its_iter - Iterator for traversing VGIC ITS device tables.
+ * @dev: Pointer to the current its_device being processed.
+ * @ite: Pointer to the current its_ite within the device being processed.
+ *
+ * This structure is used to maintain the current position during iteration
+ * over the ITS device tables. It holds pointers to both the current device
+ * and the current ITE within that device.
+ */
+struct vgic_its_iter {
+	struct its_device *dev;
+	struct its_ite *ite;
+};
+
+/**
+ * end_of_iter - Checks if the iterator has reached the end.
+ * @iter: The iterator to check.
+ *
+ * When the iterator completed processing the final ITE in the last device
+ * table, it was marked to indicate the end of iteration by setting its
+ * device and ITE pointers to NULL.
+ * This function checks whether the iterator was marked as end.
+ *
+ * Return: True if the iterator is marked as end, false otherwise.
+ */
+static inline bool end_of_iter(struct vgic_its_iter *iter)
+{
+	return !iter->dev && !iter->ite;
+}
+
+/**
+ * vgic_its_iter_next - Advances the iterator to the next entry in the ITS tables.
+ * @its: The VGIC ITS structure.
+ * @iter: The iterator to advance.
+ *
+ * This function moves the iterator to the next ITE within the current device,
+ * or to the first ITE of the next device if the current ITE is the last in
+ * the device. If the current device is the last device, the iterator is set
+ * to indicate the end of iteration.
+ */
+static void vgic_its_iter_next(struct vgic_its *its, struct vgic_its_iter *iter)
+{
+	struct its_device *dev = iter->dev;
+	struct its_ite *ite = iter->ite;
+
+	if (!ite || list_is_last(&ite->ite_list, &dev->itt_head)) {
+		if (list_is_last(&dev->dev_list, &its->device_list)) {
+			dev = NULL;
+			ite = NULL;
+		} else {
+			dev = list_next_entry(dev, dev_list);
+			ite = list_first_entry_or_null(&dev->itt_head,
+						       struct its_ite,
+						       ite_list);
+		}
+	} else {
+		ite = list_next_entry(ite, ite_list);
+	}
+
+	iter->dev = dev;
+	iter->ite = ite;
+}
+
+/**
+ * vgic_its_debug_start - Start function for the seq_file interface.
+ * @s: The seq_file structure.
+ * @pos: The starting position (offset).
+ *
+ * This function initializes the iterator to the beginning of the ITS tables
+ * and advances it to the specified position. It acquires the its_lock mutex
+ * to protect shared data.
+ *
+ * Return: An iterator pointer on success, NULL if no devices are found or
+ * the end of the list is reached, or ERR_PTR(-ENOMEM) on memory
+ * allocation failure.
+ */
+static void *vgic_its_debug_start(struct seq_file *s, loff_t *pos)
+{
+	struct vgic_its *its = s->private;
+	struct vgic_its_iter *iter;
+	struct its_device *dev;
+	loff_t offset = *pos;
+
+	mutex_lock(&its->its_lock);
+
+	dev = list_first_entry_or_null(&its->device_list,
+				       struct its_device, dev_list);
+	if (!dev)
+		return NULL;
+
+	iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		return ERR_PTR(-ENOMEM);
+
+	iter->dev = dev;
+	iter->ite = list_first_entry_or_null(&dev->itt_head,
+					     struct its_ite, ite_list);
+
+	while (!end_of_iter(iter) && offset--)
+		vgic_its_iter_next(its, iter);
+
+	if (end_of_iter(iter)) {
+		kfree(iter);
+		return NULL;
+	}
+
+	return iter;
+}
+
+/**
+ * vgic_its_debug_next - Next function for the seq_file interface.
+ * @s: The seq_file structure.
+ * @v: The current iterator.
+ * @pos: The current position (offset).
+ *
+ * This function advances the iterator to the next entry and increments the
+ * position.
+ *
+ * Return: An iterator pointer on success, or NULL if the end of the list is
+ * reached.
+ */
+static void *vgic_its_debug_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct vgic_its *its = s->private;
+	struct vgic_its_iter *iter = v;
+
+	++*pos;
+	vgic_its_iter_next(its, iter);
+
+	if (end_of_iter(iter)) {
+		kfree(iter);
+		return NULL;
+	}
+	return iter;
+}
+
+/**
+ * vgic_its_debug_stop - Stop function for the seq_file interface.
+ * @s: The seq_file structure.
+ * @v: The current iterator.
+ *
+ * This function frees the iterator and releases the its_lock mutex.
+ */
+static void vgic_its_debug_stop(struct seq_file *s, void *v)
+{
+	struct vgic_its *its = s->private;
+	struct vgic_its_iter *iter = v;
+
+	if (!IS_ERR_OR_NULL(iter))
+		kfree(iter);
+	mutex_unlock(&its->its_lock);
+}
+
+/**
+ * vgic_its_debug_show - Show function for the seq_file interface.
+ * @s: The seq_file structure.
+ * @v: The current iterator.
+ *
+ * This function formats and prints the ITS table entry information to the
+ * seq_file output.
+ *
+ * Return: 0 on success.
+ */
+static int vgic_its_debug_show(struct seq_file *s, void *v)
+{
+	struct vgic_its_iter *iter = v;
+	struct its_device *dev = iter->dev;
+	struct its_ite *ite = iter->ite;
+
+	if (list_is_first(&ite->ite_list, &dev->itt_head)) {
+		seq_printf(s, "\n");
+		seq_printf(s, "Device ID: 0x%x, Event ID Range: [0 - %llu]\n",
+			   dev->device_id, BIT_ULL(dev->num_eventid_bits) - 1);
+		seq_printf(s, "EVENT_ID INTID HWINTID TARGET COL_ID HW\n");
+		seq_printf(s, "-----------------------------------------------\n");
+	}
+
+	if (ite && ite->irq && ite->collection) {
+		seq_printf(s, "%8u %8u %8u %8u %8u %2d\n",
+			   ite->event_id, ite->irq->intid, ite->irq->hwintid,
+			   ite->collection->target_addr,
+			   ite->collection->collection_id, ite->irq->hw);
+	}
+
+	return 0;
+}
+
+static const struct seq_operations vgic_its_debug_sops = {
+	.start = vgic_its_debug_start,
+	.next = vgic_its_debug_next,
+	.stop = vgic_its_debug_stop,
+	.show = vgic_its_debug_show
+};
+
+DEFINE_SEQ_ATTRIBUTE(vgic_its_debug);
+
+/**
+ * vgic_its_debug_init - Initializes the debugfs interface for VGIC ITS.
+ * @dev: The KVM device structure.
+ *
+ * This function creates a debugfs file named "vgic-its-state@%its_base"
+ * to expose the ITS table information.
+ *
+ * Return: 0 on success.
+ */
+int vgic_its_debug_init(struct kvm_device *dev)
+{
+	struct vgic_its *its = dev->private;
+	char *name;
+
+	name = kasprintf(GFP_KERNEL, "vgic-its-state@%llx", (u64)its->vgic_its_base);
+	if (!name)
+		return -ENOMEM;
+
+	debugfs_create_file(name, 0444, dev->kvm->debugfs_dentry, its, &vgic_its_debug_fops);
+
+	kfree(name);
+	return 0;
+}
+
+void vgic_its_debug_destroy(struct kvm_device *dev)
+{
+}

+8 -31

arch/arm64/kvm/vgic/vgic-its.c

···
 	return irq;
 }

-struct its_device {
-	struct list_head dev_list;
-
-	/* the head for the list of ITTEs */
-	struct list_head itt_head;
-	u32 num_eventid_bits;
-	gpa_t itt_addr;
-	u32 device_id;
-};
-
-#define COLLECTION_NOT_MAPPED ((u32)~0)
-
-struct its_collection {
-	struct list_head coll_list;
-
-	u32 collection_id;
-	u32 target_addr;
-};
-
-#define its_is_collection_mapped(coll) ((coll) && \
-				((coll)->target_addr != COLLECTION_NOT_MAPPED))
-
-struct its_ite {
-	struct list_head ite_list;
-
-	struct vgic_irq *irq;
-	struct its_collection *collection;
-	u32 event_id;
-};
-
 /**
  * struct vgic_its_abi - ITS abi ops and settings
  * @cte_esz: collection table entry size
···

 	mutex_lock(&its->its_lock);

+	vgic_its_debug_destroy(kvm_dev);
+
 	vgic_its_free_device_list(kvm, its);
 	vgic_its_free_collection_list(kvm, its);
 	vgic_its_invalidate_cache(its);
···
 		if (ret)
 			return ret;

-		return vgic_register_its_iodev(dev->kvm, its, addr);
+		ret = vgic_register_its_iodev(dev->kvm, its, addr);
+		if (ret)
+			return ret;
+
+		return vgic_its_debug_init(dev);
+
 	}
 	case KVM_DEV_ARM_VGIC_GRP_CTRL:
 		return vgic_its_ctrl(dev->kvm, its, attr->attr);

-3

arch/arm64/kvm/vgic/vgic-v3-nested.c

···
 			goto next;
 		}

-		/* It is illegal to have the EOI bit set with HW */
-		lr &= ~ICH_LR_EOI;
-
 		/* Translate the virtual mapping to the real one */
 		lr &= ~ICH_LR_PHYS_ID_MASK;
 		lr |= FIELD_PREP(ICH_LR_PHYS_ID_MASK, (u64)irq->hwintid);

+33

arch/arm64/kvm/vgic/vgic.h

···
 	gpa_t addr;
 };

+struct its_device {
+	struct list_head dev_list;
+
+	/* the head for the list of ITTEs */
+	struct list_head itt_head;
+	u32 num_eventid_bits;
+	gpa_t itt_addr;
+	u32 device_id;
+};
+
+#define COLLECTION_NOT_MAPPED ((u32)~0)
+
+struct its_collection {
+	struct list_head coll_list;
+
+	u32 collection_id;
+	u32 target_addr;
+};
+
+#define its_is_collection_mapped(coll) ((coll) && \
+				((coll)->target_addr != COLLECTION_NOT_MAPPED))
+
+struct its_ite {
+	struct list_head ite_list;
+
+	struct vgic_irq *irq;
+	struct its_collection *collection;
+	u32 event_id;
+};
+
 int vgic_v3_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 		       struct vgic_reg_attr *reg_attr);
 int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
···
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
 void vgic_v3_nested_update_mi(struct kvm_vcpu *vcpu);
+
+int vgic_its_debug_init(struct kvm_device *dev);
+void vgic_its_debug_destroy(struct kvm_device *dev);

 #endif

+2

arch/arm64/tools/cpucaps

···
 HAS_EVT
 HAS_FPMR
 HAS_FGT
+HAS_FGT2
 HAS_FPSIMD
 HAS_GCS
 HAS_GENERIC_AUTH
···
 WORKAROUND_2645198
 WORKAROUND_2658417
 WORKAROUND_AMPERE_AC03_CPU_38
+WORKAROUND_AMPERE_AC04_CPU_23
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_TSB_FLUSH_FAILURE
 WORKAROUND_TRBE_WRITE_OUT_OF_RANGE

+963 -49

arch/arm64/tools/sysreg

··· 101 101 Field 31:0 DTRTX 102 102 EndSysreg 103 103 104 + Sysreg MDSELR_EL1 2 0 0 4 2 105 + Res0 63:6 106 + Field 5:4 BANK 107 + Res0 3:0 108 + EndSysreg 109 + 110 + Sysreg MDSTEPOP_EL1 2 0 0 5 2 111 + Res0 63:32 112 + Field 31:0 OPCODE 113 + EndSysreg 114 + 104 115 Sysreg OSECCR_EL1 2 0 0 6 2 105 116 Res0 63:32 106 117 Field 31:0 EDECCR ··· 120 109 Sysreg OSLAR_EL1 2 0 1 0 4 121 110 Res0 63:1 122 111 Field 0 OSLK 112 + EndSysreg 113 + 114 + Sysreg SPMACCESSR_EL1 2 0 9 13 3 115 + UnsignedEnum 63:62 P31 116 + 0b00 TRAP_RW 117 + 0b01 TRAP_W 118 + 0b11 NOTRAP 119 + EndEnum 120 + UnsignedEnum 61:60 P30 121 + 0b00 TRAP_RW 122 + 0b01 TRAP_W 123 + 0b11 NOTRAP 124 + EndEnum 125 + UnsignedEnum 59:58 P29 126 + 0b00 TRAP_RW 127 + 0b01 TRAP_W 128 + 0b11 NOTRAP 129 + EndEnum 130 + UnsignedEnum 57:56 P28 131 + 0b00 TRAP_RW 132 + 0b01 TRAP_W 133 + 0b11 NOTRAP 134 + EndEnum 135 + UnsignedEnum 55:54 P27 136 + 0b00 TRAP_RW 137 + 0b01 TRAP_W 138 + 0b11 NOTRAP 139 + EndEnum 140 + UnsignedEnum 53:52 P26 141 + 0b00 TRAP_RW 142 + 0b01 TRAP_W 143 + 0b11 NOTRAP 144 + EndEnum 145 + UnsignedEnum 51:50 P25 146 + 0b00 TRAP_RW 147 + 0b01 TRAP_W 148 + 0b11 NOTRAP 149 + EndEnum 150 + UnsignedEnum 49:48 P24 151 + 0b00 TRAP_RW 152 + 0b01 TRAP_W 153 + 0b11 NOTRAP 154 + EndEnum 155 + UnsignedEnum 47:46 P23 156 + 0b00 TRAP_RW 157 + 0b01 TRAP_W 158 + 0b11 NOTRAP 159 + EndEnum 160 + UnsignedEnum 45:44 P22 161 + 0b00 TRAP_RW 162 + 0b01 TRAP_W 163 + 0b11 NOTRAP 164 + EndEnum 165 + UnsignedEnum 43:42 P21 166 + 0b00 TRAP_RW 167 + 0b01 TRAP_W 168 + 0b11 NOTRAP 169 + EndEnum 170 + UnsignedEnum 41:40 P20 171 + 0b00 TRAP_RW 172 + 0b01 TRAP_W 173 + 0b11 NOTRAP 174 + EndEnum 175 + UnsignedEnum 39:38 P19 176 + 0b00 TRAP_RW 177 + 0b01 TRAP_W 178 + 0b11 NOTRAP 179 + EndEnum 180 + UnsignedEnum 37:36 P18 181 + 0b00 TRAP_RW 182 + 0b01 TRAP_W 183 + 0b11 NOTRAP 184 + EndEnum 185 + UnsignedEnum 35:34 P17 186 + 0b00 TRAP_RW 187 + 0b01 TRAP_W 188 + 0b11 NOTRAP 189 + EndEnum 190 + UnsignedEnum 33:32 P16 191 + 0b00 TRAP_RW 
192 + 0b01 TRAP_W 193 + 0b11 NOTRAP 194 + EndEnum 195 + UnsignedEnum 31:30 P15 196 + 0b00 TRAP_RW 197 + 0b01 TRAP_W 198 + 0b11 NOTRAP 199 + EndEnum 200 + UnsignedEnum 29:28 P14 201 + 0b00 TRAP_RW 202 + 0b01 TRAP_W 203 + 0b11 NOTRAP 204 + EndEnum 205 + UnsignedEnum 27:26 P13 206 + 0b00 TRAP_RW 207 + 0b01 TRAP_W 208 + 0b11 NOTRAP 209 + EndEnum 210 + UnsignedEnum 25:24 P12 211 + 0b00 TRAP_RW 212 + 0b01 TRAP_W 213 + 0b11 NOTRAP 214 + EndEnum 215 + UnsignedEnum 23:22 P11 216 + 0b00 TRAP_RW 217 + 0b01 TRAP_W 218 + 0b11 NOTRAP 219 + EndEnum 220 + UnsignedEnum 21:20 P10 221 + 0b00 TRAP_RW 222 + 0b01 TRAP_W 223 + 0b11 NOTRAP 224 + EndEnum 225 + UnsignedEnum 19:18 P9 226 + 0b00 TRAP_RW 227 + 0b01 TRAP_W 228 + 0b11 NOTRAP 229 + EndEnum 230 + UnsignedEnum 17:16 P8 231 + 0b00 TRAP_RW 232 + 0b01 TRAP_W 233 + 0b11 NOTRAP 234 + EndEnum 235 + UnsignedEnum 15:14 P7 236 + 0b00 TRAP_RW 237 + 0b01 TRAP_W 238 + 0b11 NOTRAP 239 + EndEnum 240 + UnsignedEnum 13:12 P6 241 + 0b00 TRAP_RW 242 + 0b01 TRAP_W 243 + 0b11 NOTRAP 244 + EndEnum 245 + UnsignedEnum 11:10 P5 246 + 0b00 TRAP_RW 247 + 0b01 TRAP_W 248 + 0b11 NOTRAP 249 + EndEnum 250 + UnsignedEnum 9:8 P4 251 + 0b00 TRAP_RW 252 + 0b01 TRAP_W 253 + 0b11 NOTRAP 254 + EndEnum 255 + UnsignedEnum 7:6 P3 256 + 0b00 TRAP_RW 257 + 0b01 TRAP_W 258 + 0b11 NOTRAP 259 + EndEnum 260 + UnsignedEnum 5:4 P2 261 + 0b00 TRAP_RW 262 + 0b01 TRAP_W 263 + 0b11 NOTRAP 264 + EndEnum 265 + UnsignedEnum 3:2 P1 266 + 0b00 TRAP_RW 267 + 0b01 TRAP_W 268 + 0b11 NOTRAP 269 + EndEnum 270 + UnsignedEnum 1:0 P0 271 + 0b00 TRAP_RW 272 + 0b01 TRAP_W 273 + 0b11 NOTRAP 274 + EndEnum 275 + EndSysreg 276 + 277 + Sysreg SPMACCESSR_EL12 2 5 9 13 3 278 + Mapping SPMACCESSR_EL1 279 + EndSysreg 280 + 281 + Sysreg SPMIIDR_EL1 2 0 9 13 4 282 + Res0 63:32 283 + Field 31:20 ProductID 284 + Field 19:16 Variant 285 + Field 15:12 Revision 286 + Field 11:0 Implementer 287 + EndSysreg 288 + 289 + Sysreg SPMDEVARCH_EL1 2 0 9 13 5 290 + Res0 63:32 291 + Field 31:21 ARCHITECT 292 + Field 20 
PRESENT 293 + Field 19:16 REVISION 294 + Field 15:12 ARCHVER 295 + Field 11:0 ARCHPART 296 + EndSysreg 297 + 298 + Sysreg SPMDEVAFF_EL1 2 0 9 13 6 299 + Res0 63:40 300 + Field 39:32 Aff3 301 + Field 31 F0V 302 + Field 30 U 303 + Res0 29:25 304 + Field 24 MT 305 + Field 23:16 Aff2 306 + Field 15:8 Aff1 307 + Field 7:0 Aff0 308 + EndSysreg 309 + 310 + Sysreg SPMCFGR_EL1 2 0 9 13 7 311 + Res0 63:32 312 + Field 31:28 NCG 313 + Res0 27:25 314 + Field 24 HDBG 315 + Field 23 TRO 316 + Field 22 SS 317 + Field 21 FZO 318 + Field 20 MSI 319 + Field 19 RAO 320 + Res0 18 321 + Field 17 NA 322 + Field 16 EX 323 + Field 15:14 RAZ 324 + Field 13:8 SIZE 325 + Field 7:0 N 326 + EndSysreg 327 + 328 + Sysreg SPMINTENSET_EL1 2 0 9 14 1 329 + Field 63:0 P 330 + EndSysreg 331 + 332 + Sysreg SPMINTENCLR_EL1 2 0 9 14 2 333 + Field 63:0 P 334 + EndSysreg 335 + 336 + Sysreg PMCCNTSVR_EL1 2 0 14 11 7 337 + Field 63:0 CCNT 338 + EndSysreg 339 + 340 + Sysreg PMICNTSVR_EL1 2 0 14 12 0 341 + Field 63:0 ICNT 342 + EndSysreg 343 + 344 + Sysreg SPMCR_EL0 2 3 9 12 0 345 + Res0 63:12 346 + Field 11 TRO 347 + Field 10 HDBG 348 + Field 9 FZO 349 + Field 8 NA 350 + Res0 7:5 351 + Field 4 EX 352 + Res0 3:2 353 + Field 1 P 354 + Field 0 E 355 + EndSysreg 356 + 357 + Sysreg SPMCNTENSET_EL0 2 3 9 12 1 358 + Field 63:0 P 359 + EndSysreg 360 + 361 + Sysreg SPMCNTENCLR_EL0 2 3 9 12 2 362 + Field 63:0 P 363 + EndSysreg 364 + 365 + Sysreg SPMOVSCLR_EL0 2 3 9 12 3 366 + Field 63:0 P 367 + EndSysreg 368 + 369 + Sysreg SPMZR_EL0 2 3 9 12 4 370 + Field 63:0 P 371 + EndSysreg 372 + 373 + Sysreg SPMSELR_EL0 2 3 9 12 5 374 + Res0 63:10 375 + Field 9:4 SYSPMUSEL 376 + Res0 3:2 377 + Field 1:0 BANK 378 + EndSysreg 379 + 380 + Sysreg SPMOVSSET_EL0 2 3 9 14 3 381 + Field 63:0 P 382 + EndSysreg 383 + 384 + Sysreg SPMSCR_EL1 2 7 9 14 7 385 + Field 63:32 IMPDEF 386 + Field 31 RAO 387 + Res0 30:5 388 + Field 4 NAO 389 + Res0 3:1 390 + Field 0 SO 123 391 EndSysreg 124 392 125 393 Sysreg ID_PFR0_EL1 3 0 0 1 0 ··· 1197 907 0b0000 
NI 1198 908 0b0001 IMP 1199 909 0b0010 V1P1 910 + 0b0011 V2 1200 911 EndEnum 1201 912 UnsignedEnum 27:24 GIC 1202 913 0b0000 NI ··· 1757 1466 0b0001 LS64 1758 1467 0b0010 LS64_V 1759 1468 0b0011 LS64_ACCDATA 1469 + 0b0100 LS64WB 1760 1470 EndEnum 1761 1471 UnsignedEnum 59:56 XS 1762 1472 0b0000 NI ··· 2237 1945 EndSysreg 2238 1946 2239 1947 Sysreg ID_AA64MMFR4_EL1 3 0 0 7 4 2240 - Res0 63:40 1948 + Res0 63:48 1949 + UnsignedEnum 47:44 SRMASK 1950 + 0b0000 NI 1951 + 0b0001 IMP 1952 + EndEnum 1953 + Res0 43:40 2241 1954 UnsignedEnum 39:36 E3DSE 2242 1955 0b0000 NI 2243 1956 0b0001 IMP 2244 1957 EndEnum 2245 - Res0 35:28 1958 + Res0 35:32 1959 + UnsignedEnum 31:28 RMEGDI 1960 + 0b0000 NI 1961 + 0b0001 IMP 1962 + EndEnum 2246 1963 SignedEnum 27:24 E2H0 2247 1964 0b0000 IMP 2248 1965 0b1110 NI_NV1 ··· 2260 1959 UnsignedEnum 23:20 NV_frac 2261 1960 0b0000 NV_NV2 2262 1961 0b0001 NV2_ONLY 1962 + 0b0010 NV2P1 2263 1963 EndEnum 2264 1964 UnsignedEnum 19:16 FGWTE3 2265 1965 0b0000 NI ··· 2280 1978 0b0010 ToELx 2281 1979 0b1111 ANY 2282 1980 EndEnum 2283 - Res0 3:0 1981 + UnsignedEnum 3:0 PoPS 1982 + 0b0000 NI 1983 + 0b0001 IMP 1984 + EndEnum 2284 1985 EndSysreg 2285 1986 2286 1987 Sysreg SCTLR_EL1 3 0 1 0 0 ··· 2358 2053 Field 0 M 2359 2054 EndSysreg 2360 2055 2056 + Sysreg SCTLR_EL12 3 5 1 0 0 2057 + Mapping SCTLR_EL1 2058 + EndSysreg 2059 + 2060 + Sysreg SCTLRALIAS_EL1 3 0 1 4 6 2061 + Mapping SCTLR_EL1 2062 + EndSysreg 2063 + 2064 + Sysreg ACTLR_EL1 3 0 1 0 1 2065 + Field 63:0 IMPDEF 2066 + EndSysreg 2067 + 2068 + Sysreg ACTLR_EL12 3 5 1 0 1 2069 + Mapping ACTLR_EL1 2070 + EndSysreg 2071 + 2072 + Sysreg ACTLRALIAS_EL1 3 0 1 4 5 2073 + Mapping ACTLR_EL1 2074 + EndSysreg 2075 + 2361 2076 Sysreg CPACR_EL1 3 0 1 0 2 2362 - Res0 63:30 2077 + Res0 63:32 2078 + Field 31 TCPAC 2079 + Field 30 TAM 2363 2080 Field 29 E0POE 2364 2081 Field 28 TTA 2365 2082 Res0 27:26 ··· 2391 2064 Res0 19:18 2392 2065 Field 17:16 ZEN 2393 2066 Res0 15:0 2067 + EndSysreg 2068 + 2069 + Sysreg 
CPACR_EL12 3 5 1 0 2 2070 + Mapping CPACR_EL1 2071 + EndSysreg 2072 + 2073 + Sysreg CPACRALIAS_EL1 3 0 1 4 4 2074 + Mapping CPACR_EL1 2075 + EndSysreg 2076 + 2077 + Sysreg ACTLRMASK_EL1 3 0 1 4 1 2078 + Field 63:0 IMPDEF 2079 + EndSysreg 2080 + 2081 + Sysreg ACTLRMASK_EL12 3 5 1 4 1 2082 + Mapping ACTLRMASK_EL1 2083 + EndSysreg 2084 + 2085 + Sysreg CPACRMASK_EL1 3 0 1 4 2 2086 + Res0 63:32 2087 + Field 31 TCPAC 2088 + Field 30 TAM 2089 + Field 29 E0POE 2090 + Field 28 TTA 2091 + Res0 27:25 2092 + Field 24 SMEN 2093 + Res0 23:21 2094 + Field 20 FPEN 2095 + Res0 19:17 2096 + Field 16 ZEN 2097 + Res0 15:0 2098 + EndSysreg 2099 + 2100 + Sysreg CPACRMASK_EL12 3 5 1 4 2 2101 + Mapping CPACRMASK_EL1 2102 + EndSysreg 2103 + 2104 + Sysreg PFAR_EL1 3 0 6 0 5 2105 + Field 63 NS 2106 + Field 62 NSE 2107 + Res0 61:56 2108 + Field 55:52 PA_55_52 2109 + Field 51:48 PA_51_48 2110 + Field 47:0 PA 2111 + EndSysreg 2112 + 2113 + Sysreg PFAR_EL12 3 5 6 0 5 2114 + Mapping PFAR_EL1 2115 + EndSysreg 2116 + 2117 + Sysreg RCWSMASK_EL1 3 0 13 0 3 2118 + Field 63:0 RCWSMASK 2119 + EndSysreg 2120 + 2121 + Sysreg SCTLR2_EL1 3 0 1 0 3 2122 + Res0 63:13 2123 + Field 12 CPTM0 2124 + Field 11 CPTM 2125 + Field 10 CPTA0 2126 + Field 9 CPTA 2127 + Field 8 EnPACM0 2128 + Field 7 EnPACM 2129 + Field 6 EnIDCP128 2130 + Field 5 EASE 2131 + Field 4 EnANERR 2132 + Field 3 EnADERR 2133 + Field 2 NMEA 2134 + Res0 1:0 2135 + EndSysreg 2136 + 2137 + Sysreg SCTLR2_EL12 3 5 1 0 3 2138 + Mapping SCTLR2_EL1 2139 + EndSysreg 2140 + 2141 + Sysreg SCTLR2ALIAS_EL1 3 0 1 4 7 2142 + Mapping SCTLR2_EL1 2143 + EndSysreg 2144 + 2145 + Sysreg SCTLR2MASK_EL1 3 0 1 4 3 2146 + Res0 63:13 2147 + Field 12 CPTM0 2148 + Field 11 CPTM 2149 + Field 10 CPTA0 2150 + Field 9 CPTA 2151 + Field 8 EnPACM0 2152 + Field 7 EnPACM 2153 + Field 6 EnIDCP128 2154 + Field 5 EASE 2155 + Field 4 EnANERR 2156 + Field 3 EnADERR 2157 + Field 2 NMEA 2158 + Res0 1:0 2159 + EndSysreg 2160 + 2161 + Sysreg SCTLR2MASK_EL12 3 5 1 4 3 2162 + Mapping 
SCTLR2MASK_EL1 2163 + EndSysreg 2164 + 2165 + Sysreg SCTLRMASK_EL1 3 0 1 4 0 2166 + Field 63 TIDCP 2167 + Field 62 SPINTMASK 2168 + Field 61 NMI 2169 + Field 60 EnTP2 2170 + Field 59 TCSO 2171 + Field 58 TCSO0 2172 + Field 57 EPAN 2173 + Field 56 EnALS 2174 + Field 55 EnAS0 2175 + Field 54 EnASR 2176 + Field 53 TME 2177 + Field 52 TME0 2178 + Field 51 TMT 2179 + Field 50 TMT0 2180 + Res0 49:47 2181 + Field 46 TWEDEL 2182 + Field 45 TWEDEn 2183 + Field 44 DSSBS 2184 + Field 43 ATA 2185 + Field 42 ATA0 2186 + Res0 41 2187 + Field 40 TCF 2188 + Res0 39 2189 + Field 38 TCF0 2190 + Field 37 ITFSB 2191 + Field 36 BT1 2192 + Field 35 BT0 2193 + Field 34 EnFPM 2194 + Field 33 MSCEn 2195 + Field 32 CMOW 2196 + Field 31 EnIA 2197 + Field 30 EnIB 2198 + Field 29 LSMAOE 2199 + Field 28 nTLSMD 2200 + Field 27 EnDA 2201 + Field 26 UCI 2202 + Field 25 EE 2203 + Field 24 E0E 2204 + Field 23 SPAN 2205 + Field 22 EIS 2206 + Field 21 IESB 2207 + Field 20 TSCXT 2208 + Field 19 WXN 2209 + Field 18 nTWE 2210 + Res0 17 2211 + Field 16 nTWI 2212 + Field 15 UCT 2213 + Field 14 DZE 2214 + Field 13 EnDB 2215 + Field 12 I 2216 + Field 11 EOS 2217 + Field 10 EnRCTX 2218 + Field 9 UMA 2219 + Field 8 SED 2220 + Field 7 ITD 2221 + Field 6 nAA 2222 + Field 5 CP15BEN 2223 + Field 4 SA0 2224 + Field 3 SA 2225 + Field 2 C 2226 + Field 1 A 2227 + Field 0 M 2228 + EndSysreg 2229 + 2230 + Sysreg SCTLRMASK_EL12 3 5 1 4 0 2231 + Mapping SCTLRMASK_EL1 2232 + EndSysreg 2233 + 2234 + Sysreg TCR2MASK_EL1 3 0 2 7 3 2235 + Res0 63:22 2236 + Field 21 FNGNA1 2237 + Field 20 FNGNA0 2238 + Res0 19 2239 + Field 18 FNG1 2240 + Field 17 FNG0 2241 + Field 16 A2 2242 + Field 15 DisCH1 2243 + Field 14 DisCH0 2244 + Res0 13:12 2245 + Field 11 HAFT 2246 + Field 10 PTTWI 2247 + Res0 9:6 2248 + Field 5 D128 2249 + Field 4 AIE 2250 + Field 3 POE 2251 + Field 2 E0POE 2252 + Field 1 PIE 2253 + Field 0 PnCH 2254 + EndSysreg 2255 + 2256 + Sysreg TCR2MASK_EL12 3 5 2 7 3 2257 + Mapping TCR2MASK_EL1 2258 + EndSysreg 2259 + 2260 + 
Sysreg TCRMASK_EL1 3 0 2 7 2 2261 + Res0 63:62 2262 + Field 61 MTX1 2263 + Field 60 MTX0 2264 + Field 59 DS 2265 + Field 58 TCMA1 2266 + Field 57 TCMA0 2267 + Field 56 E0PD1 2268 + Field 55 E0PD0 2269 + Field 54 NFD1 2270 + Field 53 NFD0 2271 + Field 52 TBID1 2272 + Field 51 TBID0 2273 + Field 50 HWU162 2274 + Field 49 HWU161 2275 + Field 48 HWU160 2276 + Field 47 HWU159 2277 + Field 46 HWU062 2278 + Field 45 HWU061 2279 + Field 44 HWU060 2280 + Field 43 HWU059 2281 + Field 42 HPD1 2282 + Field 41 HPD0 2283 + Field 40 HD 2284 + Field 39 HA 2285 + Field 38 TBI1 2286 + Field 37 TBI0 2287 + Field 36 AS 2288 + Res0 35:33 2289 + Field 32 IPS 2290 + Res0 31 2291 + Field 30 TG1 2292 + Res0 29 2293 + Field 28 SH1 2294 + Res0 27 2295 + Field 26 ORGN1 2296 + Res0 25 2297 + Field 24 IRGN1 2298 + Field 23 EPD1 2299 + Field 22 A1 2300 + Res0 21:17 2301 + Field 16 T1SZ 2302 + Res0 15 2303 + Field 14 TG0 2304 + Res0 13 2305 + Field 12 SH0 2306 + Res0 11 2307 + Field 10 ORGN0 2308 + Res0 9 2309 + Field 8 IRGN0 2310 + Field 7 EPD0 2311 + Res0 6:1 2312 + Field 0 T0SZ 2313 + EndSysreg 2314 + 2315 + Sysreg TCRMASK_EL12 3 5 2 7 2 2316 + Mapping TCRMASK_EL1 2317 + EndSysreg 2318 + 2319 + Sysreg ERXGSR_EL1 3 0 5 3 2 2320 + Field 63 S63 2321 + Field 62 S62 2322 + Field 61 S61 2323 + Field 60 S60 2324 + Field 59 S59 2325 + Field 58 S58 2326 + Field 57 S57 2327 + Field 56 S56 2328 + Field 55 S55 2329 + Field 54 S54 2330 + Field 53 S53 2331 + Field 52 S52 2332 + Field 51 S51 2333 + Field 50 S50 2334 + Field 49 S49 2335 + Field 48 S48 2336 + Field 47 S47 2337 + Field 46 S46 2338 + Field 45 S45 2339 + Field 44 S44 2340 + Field 43 S43 2341 + Field 42 S42 2342 + Field 41 S41 2343 + Field 40 S40 2344 + Field 39 S39 2345 + Field 38 S38 2346 + Field 37 S37 2347 + Field 36 S36 2348 + Field 35 S35 2349 + Field 34 S34 2350 + Field 33 S33 2351 + Field 32 S32 2352 + Field 31 S31 2353 + Field 30 S30 2354 + Field 29 S29 2355 + Field 28 S28 2356 + Field 27 S27 2357 + Field 26 S26 2358 + Field 25 S25 2359 + 
Field 24 S24 2360 + Field 23 S23 2361 + Field 22 S22 2362 + Field 21 S21 2363 + Field 20 S20 2364 + Field 19 S19 2365 + Field 18 S18 2366 + Field 17 S17 2367 + Field 16 S16 2368 + Field 15 S15 2369 + Field 14 S14 2370 + Field 13 S13 2371 + Field 12 S12 2372 + Field 11 S11 2373 + Field 10 S10 2374 + Field 9 S9 2375 + Field 8 S8 2376 + Field 7 S7 2377 + Field 6 S6 2378 + Field 5 S5 2379 + Field 4 S4 2380 + Field 3 S3 2381 + Field 2 S2 2382 + Field 1 S1 2383 + Field 0 S0 2394 2384 EndSysreg 2395 2385 2396 2386 Sysreg TRFCR_EL1 3 0 1 2 1 ··· 2720 2076 Res0 4:2 2721 2077 Field 1 ExTRE 2722 2078 Field 0 E0TRE 2079 + EndSysreg 2080 + 2081 + Sysreg TRCITECR_EL1 3 0 1 2 3 2082 + Res0 63:2 2083 + Field 1 E1E 2084 + Field 0 E0E 2085 + EndSysreg 2086 + 2087 + Sysreg TRCITECR_EL12 3 5 1 2 3 2088 + Mapping TRCITECR_EL1 2723 2089 EndSysreg 2724 2090 2725 2091 Sysreg SMPRI_EL1 3 0 1 2 4 ··· 2880 2226 EndSysreg 2881 2227 2882 2228 Sysreg PMSIDR_EL1 3 0 9 9 7 2883 - Res0 63:25 2229 + Res0 63:33 2230 + UnsignedEnum 32 SME 2231 + 0b0 NI 2232 + 0b1 IMP 2233 + EndEnum 2234 + UnsignedEnum 31:28 ALTCLK 2235 + 0b0000 NI 2236 + 0b0001 IMP 2237 + 0b1111 IMPDEF 2238 + EndEnum 2239 + UnsignedEnum 27 FPF 2240 + 0b0 NI 2241 + 0b1 IMP 2242 + EndEnum 2243 + UnsignedEnum 26 EFT 2244 + 0b0 NI 2245 + 0b1 IMP 2246 + EndEnum 2247 + UnsignedEnum 25 CRR 2248 + 0b0 NI 2249 + 0b1 IMP 2250 + EndEnum 2884 2251 Field 24 PBT 2885 2252 Field 23:20 FORMAT 2886 2253 Enum 19:16 COUNTSIZE ··· 2919 2244 0b0111 3072 2920 2245 0b1000 4096 2921 2246 EndEnum 2922 - Res0 7 2247 + UnsignedEnum 7 FDS 2248 + 0b0 NI 2249 + 0b1 IMP 2250 + EndEnum 2923 2251 Field 6 FnE 2924 2252 Field 5 ERND 2925 2253 Field 4 LDS ··· 2965 2287 Field 15:0 MSS 2966 2288 EndSysreg 2967 2289 2290 + Sysreg PMSDSFR_EL1 3 0 9 10 4 2291 + Field 63:0 S 2292 + EndSysreg 2293 + 2294 + Sysreg PMBMAR_EL1 3 0 9 10 5 2295 + Res0 63:10 2296 + Field 9:8 SH 2297 + Field 7:0 Attr 2298 + EndSysreg 2299 + 2968 2300 Sysreg PMBIDR_EL1 3 0 9 10 7 2969 2301 Res0 63:12 
2970 2302 Enum 11:8 EA ··· 2988 2300 Field 3:0 ALIGN 2989 2301 EndSysreg 2990 2302 2303 + Sysreg TRBMPAM_EL1 3 0 9 11 5 2304 + Res0 63:27 2305 + Field 26 EN 2306 + Field 25:24 MPAM_SP 2307 + Field 23:16 PMG 2308 + Field 15:0 PARTID 2309 + EndSysreg 2310 + 2311 + Sysreg PMSSCR_EL1 3 0 9 13 3 2312 + Res0 63:33 2313 + Field 32 NC 2314 + Res0 31:1 2315 + Field 0 SS 2316 + EndSysreg 2317 + 2991 2318 Sysreg PMUACR_EL1 3 0 9 14 4 2992 2319 Res0 63:33 2993 2320 Field 32 F0 ··· 3010 2307 Field 30:0 P 3011 2308 EndSysreg 3012 2309 2310 + Sysreg PMECR_EL1 3 0 9 14 5 2311 + Res0 63:5 2312 + Field 4:3 SSE 2313 + Field 2 KPME 2314 + Field 1:0 PMEE 2315 + EndSysreg 2316 + 2317 + Sysreg PMIAR_EL1 3 0 9 14 7 2318 + Field 63:0 ADDRESS 2319 + EndSysreg 2320 + 3013 2321 Sysreg PMSELR_EL0 3 3 9 12 5 3014 2322 Res0 63:5 3015 2323 Field 4:0 SEL 2324 + EndSysreg 2325 + 2326 + Sysreg PMZR_EL0 3 3 9 13 4 2327 + Res0 63:33 2328 + Field 32 F0 2329 + Field 31 C 2330 + Field 30:0 P 3016 2331 EndSysreg 3017 2332 3018 2333 SysregFields CONTEXTIDR_ELx ··· 3171 2450 EndEnum 3172 2451 EndSysreg 3173 2452 3174 - SysregFields HFGxTR_EL2 2453 + Sysreg HCR_EL2 3 4 1 1 0 2454 + Field 63:60 TWEDEL 2455 + Field 59 TWEDEn 2456 + Field 58 TID5 2457 + Field 57 DCT 2458 + Field 56 ATA 2459 + Field 55 TTLBOS 2460 + Field 54 TTLBIS 2461 + Field 53 EnSCXT 2462 + Field 52 TOCU 2463 + Field 51 AMVOFFEN 2464 + Field 50 TICAB 2465 + Field 49 TID4 2466 + Field 48 GPF 2467 + Field 47 FIEN 2468 + Field 46 FWB 2469 + Field 45 NV2 2470 + Field 44 AT 2471 + Field 43 NV1 2472 + Field 42 NV 2473 + Field 41 API 2474 + Field 40 APK 2475 + Field 39 TME 2476 + Field 38 MIOCNCE 2477 + Field 37 TEA 2478 + Field 36 TERR 2479 + Field 35 TLOR 2480 + Field 34 E2H 2481 + Field 33 ID 2482 + Field 32 CD 2483 + Field 31 RW 2484 + Field 30 TRVM 2485 + Field 29 HCD 2486 + Field 28 TDZ 2487 + Field 27 TGE 2488 + Field 26 TVM 2489 + Field 25 TTLB 2490 + Field 24 TPU 2491 + Field 23 TPCP 2492 + Field 22 TSW 2493 + Field 21 TACR 2494 + Field 20 
TIDCP 2495 + Field 19 TSC 2496 + Field 18 TID3 2497 + Field 17 TID2 2498 + Field 16 TID1 2499 + Field 15 TID0 2500 + Field 14 TWE 2501 + Field 13 TWI 2502 + Field 12 DC 2503 + UnsignedEnum 11:10 BSU 2504 + 0b00 NONE 2505 + 0b01 IS 2506 + 0b10 OS 2507 + 0b11 FS 2508 + EndEnum 2509 + Field 9 FB 2510 + Field 8 VSE 2511 + Field 7 VI 2512 + Field 6 VF 2513 + Field 5 AMO 2514 + Field 4 IMO 2515 + Field 3 FMO 2516 + Field 2 PTW 2517 + Field 1 SWIO 2518 + Field 0 VM 2519 + EndSysreg 2520 + 2521 + Sysreg MDCR_EL2 3 4 1 1 1 2522 + Res0 63:51 2523 + Field 50 EnSTEPOP 2524 + Res0 49:44 2525 + Field 43 EBWE 2526 + Res0 42 2527 + Field 41:40 PMEE 2528 + Res0 39:37 2529 + Field 36 HPMFZS 2530 + Res0 35:32 2531 + Field 31:30 PMSSE 2532 + Field 29 HPMFZO 2533 + Field 28 MTPME 2534 + Field 27 TDCC 2535 + Field 26 HLP 2536 + Field 25:24 E2TB 2537 + Field 23 HCCD 2538 + Res0 22:20 2539 + Field 19 TTRF 2540 + Res0 18 2541 + Field 17 HPMD 2542 + Res0 16 2543 + Field 15 EnSPM 2544 + Field 14 TPMS 2545 + Field 13:12 E2PB 2546 + Field 11 TDRA 2547 + Field 10 TDOSA 2548 + Field 9 TDA 2549 + Field 8 TDE 2550 + Field 7 HPME 2551 + Field 6 TPM 2552 + Field 5 TPMCR 2553 + Field 4:0 HPMN 2554 + EndSysreg 2555 + 2556 + Sysreg HFGRTR_EL2 3 4 1 1 4 3175 2557 Field 63 nAMAIR2_EL1 3176 2558 Field 62 nMAIR2_EL1 3177 2559 Field 61 nS2POR_EL1 ··· 3339 2515 Field 2 AIDR_EL1 3340 2516 Field 1 AFSR1_EL1 3341 2517 Field 0 AFSR0_EL1 3342 - EndSysregFields 3343 - 3344 - Sysreg MDCR_EL2 3 4 1 1 1 3345 - Res0 63:51 3346 - Field 50 EnSTEPOP 3347 - Res0 49:44 3348 - Field 43 EBWE 3349 - Res0 42 3350 - Field 41:40 PMEE 3351 - Res0 39:37 3352 - Field 36 HPMFZS 3353 - Res0 35:32 3354 - Field 31:30 PMSSE 3355 - Field 29 HPMFZO 3356 - Field 28 MTPME 3357 - Field 27 TDCC 3358 - Field 26 HLP 3359 - Field 25:24 E2TB 3360 - Field 23 HCCD 3361 - Res0 22:20 3362 - Field 19 TTRF 3363 - Res0 18 3364 - Field 17 HPMD 3365 - Res0 16 3366 - Field 15 EnSPM 3367 - Field 14 TPMS 3368 - Field 13:12 E2PB 3369 - Field 11 TDRA 3370 - 
Field 10 TDOSA 3371 - Field 9 TDA 3372 - Field 8 TDE 3373 - Field 7 HPME 3374 - Field 6 TPM 3375 - Field 5 TPMCR 3376 - Field 4:0 HPMN 3377 - EndSysreg 3378 - 3379 - Sysreg HFGRTR_EL2 3 4 1 1 4 3380 - Fields HFGxTR_EL2 3381 2518 EndSysreg 3382 2519 3383 2520 Sysreg HFGWTR_EL2 3 4 1 1 5 3384 - Fields HFGxTR_EL2 2521 + Field 63 nAMAIR2_EL1 2522 + Field 62 nMAIR2_EL1 2523 + Field 61 nS2POR_EL1 2524 + Field 60 nPOR_EL1 2525 + Field 59 nPOR_EL0 2526 + Field 58 nPIR_EL1 2527 + Field 57 nPIRE0_EL1 2528 + Field 56 nRCWMASK_EL1 2529 + Field 55 nTPIDR2_EL0 2530 + Field 54 nSMPRI_EL1 2531 + Field 53 nGCS_EL1 2532 + Field 52 nGCS_EL0 2533 + Res0 51 2534 + Field 50 nACCDATA_EL1 2535 + Field 49 ERXADDR_EL1 2536 + Field 48 ERXPFGCDN_EL1 2537 + Field 47 ERXPFGCTL_EL1 2538 + Res0 46 2539 + Field 45 ERXMISCn_EL1 2540 + Field 44 ERXSTATUS_EL1 2541 + Field 43 ERXCTLR_EL1 2542 + Res0 42 2543 + Field 41 ERRSELR_EL1 2544 + Res0 40 2545 + Field 39 ICC_IGRPENn_EL1 2546 + Field 38 VBAR_EL1 2547 + Field 37 TTBR1_EL1 2548 + Field 36 TTBR0_EL1 2549 + Field 35 TPIDR_EL0 2550 + Field 34 TPIDRRO_EL0 2551 + Field 33 TPIDR_EL1 2552 + Field 32 TCR_EL1 2553 + Field 31 SCXTNUM_EL0 2554 + Field 30 SCXTNUM_EL1 2555 + Field 29 SCTLR_EL1 2556 + Res0 28 2557 + Field 27 PAR_EL1 2558 + Res0 26:25 2559 + Field 24 MAIR_EL1 2560 + Field 23 LORSA_EL1 2561 + Field 22 LORN_EL1 2562 + Res0 21 2563 + Field 20 LOREA_EL1 2564 + Field 19 LORC_EL1 2565 + Res0 18 2566 + Field 17 FAR_EL1 2567 + Field 16 ESR_EL1 2568 + Res0 15:14 2569 + Field 13 CSSELR_EL1 2570 + Field 12 CPACR_EL1 2571 + Field 11 CONTEXTIDR_EL1 2572 + Res0 10:9 2573 + Field 8 APIBKey 2574 + Field 7 APIAKey 2575 + Field 6 APGAKey 2576 + Field 5 APDBKey 2577 + Field 4 APDAKey 2578 + Field 3 AMAIR_EL1 2579 + Res0 2 2580 + Field 1 AFSR1_EL1 2581 + Field 0 AFSR0_EL1 3385 2582 EndSysreg 3386 2583 3387 2584 Sysreg HFGITR_EL2 3 4 1 1 6 3388 - Res0 63 2585 + Field 63 PSBCSYNC 3389 2586 Field 62 ATS1E1A 3390 2587 Res0 61 3391 2588 Field 60 COSPRCTX ··· 3816 2971 
Fields SMCR_ELx 3817 2972 EndSysreg 3818 2973 2974 + Sysreg VNCR_EL2 3 4 2 2 0 2975 + Field 63:57 RESS 2976 + Field 56:12 BADDR 2977 + Res0 11:0 2978 + EndSysreg 2979 + 3819 2980 Sysreg GCSCR_EL2 3 4 2 5 0 3820 2981 Fields GCSCR_ELx 3821 2982 EndSysreg ··· 4095 3244 Fields TTBRx_EL1 4096 3245 EndSysreg 4097 3246 3247 + Sysreg TCR_EL1 3 0 2 0 2 3248 + Res0 63:62 3249 + Field 61 MTX1 3250 + Field 60 MTX0 3251 + Field 59 DS 3252 + Field 58 TCMA1 3253 + Field 57 TCMA0 3254 + Field 56 E0PD1 3255 + Field 55 E0PD0 3256 + Field 54 NFD1 3257 + Field 53 NFD0 3258 + Field 52 TBID1 3259 + Field 51 TBID0 3260 + Field 50 HWU162 3261 + Field 49 HWU161 3262 + Field 48 HWU160 3263 + Field 47 HWU159 3264 + Field 46 HWU062 3265 + Field 45 HWU061 3266 + Field 44 HWU060 3267 + Field 43 HWU059 3268 + Field 42 HPD1 3269 + Field 41 HPD0 3270 + Field 40 HD 3271 + Field 39 HA 3272 + Field 38 TBI1 3273 + Field 37 TBI0 3274 + Field 36 AS 3275 + Res0 35 3276 + Field 34:32 IPS 3277 + Field 31:30 TG1 3278 + Field 29:28 SH1 3279 + Field 27:26 ORGN1 3280 + Field 25:24 IRGN1 3281 + Field 23 EPD1 3282 + Field 22 A1 3283 + Field 21:16 T1SZ 3284 + Field 15:14 TG0 3285 + Field 13:12 SH0 3286 + Field 11:10 ORGN0 3287 + Field 9:8 IRGN0 3288 + Field 7 EPD0 3289 + Res0 6 3290 + Field 5:0 T0SZ 3291 + EndSysreg 3292 + 3293 + Sysreg TCR_EL12 3 5 2 0 2 3294 + Mapping TCR_EL1 3295 + EndSysreg 3296 + 3297 + Sysreg TCRALIAS_EL1 3 0 2 7 6 3298 + Mapping TCR_EL1 3299 + EndSysreg 3300 + 4098 3301 Sysreg TCR2_EL1 3 0 2 0 3 4099 3302 Res0 63:16 4100 3303 Field 15 DisCH1 ··· 4166 3261 EndSysreg 4167 3262 4168 3263 Sysreg TCR2_EL12 3 5 2 0 3 3264 + Mapping TCR2_EL1 3265 + EndSysreg 3266 + 3267 + Sysreg TCR2ALIAS_EL1 3 0 2 7 7 4169 3268 Mapping TCR2_EL1 4170 3269 EndSysreg 4171 3270 ··· 4434 3525 EndSysreg 4435 3526 4436 3527 Sysreg TRBIDR_EL1 3 0 9 11 7 4437 - Res0 63:12 3528 + Res0 63:16 3529 + UnsignedEnum 15:12 MPAM 3530 + 0b0000 NI 3531 + 0b0001 DEFAULT 3532 + 0b0010 IMP 3533 + EndEnum 4438 3534 Enum 11:8 EA 4439 
3535 0b0000 NON_DESC 4440 3536 0b0001 IGNORE

+1 -1

arch/loongarch/include/asm/kvm_host.h

···
 /* MMU handling */
 void kvm_flush_tlb_all(void);
 void kvm_flush_tlb_gpa(struct kvm_vcpu *vcpu, unsigned long gpa);
-int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long badv, bool write);
+int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long badv, bool write, int ecode);

 int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end, bool blockable);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);

+1 -1

arch/loongarch/include/asm/kvm_vcpu.h

···
 #define KVM_LOONGSON_IRQ_NUM_MASK 0xffff

 typedef union loongarch_instruction larch_inst;
-typedef int (*exit_handle_fn)(struct kvm_vcpu *);
+typedef int (*exit_handle_fn)(struct kvm_vcpu *, int);

 int kvm_emu_mmio_read(struct kvm_vcpu *vcpu, larch_inst inst);
 int kvm_emu_mmio_write(struct kvm_vcpu *vcpu, larch_inst inst);

+19 -18

arch/loongarch/kvm/exit.c

··· 341 341 * 2) Execute CACOP/IDLE instructions; 342 342 * 3) Access to unimplemented CSRs/IOCSRs. 343 343 */ 344 - static int kvm_handle_gspr(struct kvm_vcpu *vcpu) 344 + static int kvm_handle_gspr(struct kvm_vcpu *vcpu, int ecode) 345 345 { 346 346 int ret = RESUME_GUEST; 347 347 enum emulation_result er = EMULATE_DONE; ··· 661 661 return ret; 662 662 } 663 663 664 - static int kvm_handle_rdwr_fault(struct kvm_vcpu *vcpu, bool write) 664 + static int kvm_handle_rdwr_fault(struct kvm_vcpu *vcpu, bool write, int ecode) 665 665 { 666 666 int ret; 667 667 larch_inst inst; ··· 675 675 return RESUME_GUEST; 676 676 } 677 677 678 - ret = kvm_handle_mm_fault(vcpu, badv, write); 678 + ret = kvm_handle_mm_fault(vcpu, badv, write, ecode); 679 679 if (ret) { 680 680 /* Treat as MMIO */ 681 681 inst.word = vcpu->arch.badi; ··· 705 705 return ret; 706 706 } 707 707 708 - static int kvm_handle_read_fault(struct kvm_vcpu *vcpu) 708 + static int kvm_handle_read_fault(struct kvm_vcpu *vcpu, int ecode) 709 709 { 710 - return kvm_handle_rdwr_fault(vcpu, false); 710 + return kvm_handle_rdwr_fault(vcpu, false, ecode); 711 711 } 712 712 713 - static int kvm_handle_write_fault(struct kvm_vcpu *vcpu) 713 + static int kvm_handle_write_fault(struct kvm_vcpu *vcpu, int ecode) 714 714 { 715 - return kvm_handle_rdwr_fault(vcpu, true); 715 + return kvm_handle_rdwr_fault(vcpu, true, ecode); 716 716 } 717 717 718 718 int kvm_complete_user_service(struct kvm_vcpu *vcpu, struct kvm_run *run) ··· 726 726 /** 727 727 * kvm_handle_fpu_disabled() - Guest used fpu however it is disabled at host 728 728 * @vcpu: Virtual CPU context. 729 + * @ecode: Exception code. 729 730 * 730 731 * Handle when the guest attempts to use fpu which hasn't been allowed 731 732 * by the root context. 
732 733 */ 733 - static int kvm_handle_fpu_disabled(struct kvm_vcpu *vcpu) 734 + static int kvm_handle_fpu_disabled(struct kvm_vcpu *vcpu, int ecode) 734 735 { 735 736 struct kvm_run *run = vcpu->run; 736 737 ··· 784 783 /* 785 784 * kvm_handle_lsx_disabled() - Guest used LSX while disabled in root. 786 785 * @vcpu: Virtual CPU context. 786 + * @ecode: Exception code. 787 787 * 788 788 * Handle when the guest attempts to use LSX when it is disabled in the root 789 789 * context. 790 790 */ 791 - static int kvm_handle_lsx_disabled(struct kvm_vcpu *vcpu) 791 + static int kvm_handle_lsx_disabled(struct kvm_vcpu *vcpu, int ecode) 792 792 { 793 793 if (kvm_own_lsx(vcpu)) 794 794 kvm_queue_exception(vcpu, EXCCODE_INE, 0); ··· 800 798 /* 801 799 * kvm_handle_lasx_disabled() - Guest used LASX while disabled in root. 802 800 * @vcpu: Virtual CPU context. 801 + * @ecode: Exception code. 803 802 * 804 803 * Handle when the guest attempts to use LASX when it is disabled in the root 805 804 * context. 
806 805 */ 807 - static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu) 806 + static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu, int ecode) 808 807 { 809 808 if (kvm_own_lasx(vcpu)) 810 809 kvm_queue_exception(vcpu, EXCCODE_INE, 0); ··· 813 810 return RESUME_GUEST; 814 811 } 815 812 816 - static int kvm_handle_lbt_disabled(struct kvm_vcpu *vcpu) 813 + static int kvm_handle_lbt_disabled(struct kvm_vcpu *vcpu, int ecode) 817 814 { 818 815 if (kvm_own_lbt(vcpu)) 819 816 kvm_queue_exception(vcpu, EXCCODE_INE, 0); ··· 875 872 kvm_write_reg(vcpu, LOONGARCH_GPR_A0, ret); 876 873 } 877 874 878 - static int kvm_handle_hypercall(struct kvm_vcpu *vcpu) 875 + static int kvm_handle_hypercall(struct kvm_vcpu *vcpu, int ecode) 879 876 { 880 877 int ret; 881 878 larch_inst inst; ··· 935 932 /* 936 933 * LoongArch KVM callback handling for unimplemented guest exiting 937 934 */ 938 - static int kvm_fault_ni(struct kvm_vcpu *vcpu) 935 + static int kvm_fault_ni(struct kvm_vcpu *vcpu, int ecode) 939 936 { 940 - unsigned int ecode, inst; 941 - unsigned long estat, badv; 937 + unsigned int inst; 938 + unsigned long badv; 942 939 943 940 /* Fetch the instruction */ 944 941 inst = vcpu->arch.badi; 945 942 badv = vcpu->arch.badv; 946 - estat = vcpu->arch.host_estat; 947 - ecode = (estat & CSR_ESTAT_EXC) >> CSR_ESTAT_EXC_SHIFT; 948 943 kvm_err("ECode: %d PC=%#lx Inst=0x%08x BadVaddr=%#lx ESTAT=%#lx\n", 949 944 ecode, vcpu->arch.pc, inst, badv, read_gcsr_estat()); 950 945 kvm_arch_vcpu_dump_regs(vcpu); ··· 967 966 968 967 int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault) 969 968 { 970 - return kvm_fault_tables[fault](vcpu); 969 + return kvm_fault_tables[fault](vcpu, fault); 971 970 }
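The exit.c change threads the exception code through the handler table, so `kvm_fault_ni()` no longer has to decode it from the saved ESTAT value. The pattern can be sketched in plain userspace C as follows. Every `demo_` name, the table size, and the return values are hypothetical stand-ins chosen for illustration; none of this is the kernel code itself.

```c
#include <assert.h>

/* Hypothetical stand-in for struct kvm_vcpu; not the kernel type. */
struct demo_vcpu {
	int last_ecode;	/* exception code seen by the most recent handler */
	int handled;	/* number of exits handled by a real handler */
};

/* Same shape as the new exit_handle_fn: the code is passed explicitly. */
typedef int (*demo_exit_handle_fn)(struct demo_vcpu *, int);

#define DEMO_NR_FAULTS 4	/* illustrative table size only */

static int demo_handle_known(struct demo_vcpu *vcpu, int ecode)
{
	vcpu->last_ecode = ecode;
	vcpu->handled++;
	return 1;	/* stand-in for "resume guest" */
}

static int demo_fault_ni(struct demo_vcpu *vcpu, int ecode)
{
	/* No status register decoding needed: the caller supplied ecode. */
	vcpu->last_ecode = ecode;
	return -1;	/* stand-in for "not implemented" */
}

static const demo_exit_handle_fn demo_fault_tables[DEMO_NR_FAULTS] = {
	demo_fault_ni,
	demo_handle_known,
	demo_handle_known,
	demo_fault_ni,
};

/* Mirrors kvm_handle_fault(): the table index doubles as the argument. */
static int demo_handle_fault(struct demo_vcpu *vcpu, int fault)
{
	assert(fault >= 0 && fault < DEMO_NR_FAULTS);
	return demo_fault_tables[fault](vcpu, fault);
}
```

Passing the index along keeps every handler signature uniform, which is why handlers that ignore `ecode` still gain the extra parameter in the diff.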

+12 -3

arch/loongarch/kvm/mmu.c

···
 	return err;
 }

-int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long gpa, bool write)
+int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long gpa, bool write, int ecode)
 {
 	int ret;

···
 		return ret;

 	/* Invalidate this entry in the TLB */
-	vcpu->arch.flush_gpa = gpa;
-	kvm_make_request(KVM_REQ_TLB_FLUSH_GPA, vcpu);
+	if (!cpu_has_ptw || (ecode == EXCCODE_TLBM)) {
+		/*
+		 * With HW PTW, invalid TLB is not added when page fault. But
+		 * for EXCCODE_TLBM exception, stale TLB may exist because of
+		 * the last read access.
+		 *
+		 * With SW PTW, invalid TLB is added in TLB refill exception.
+		 */
+		vcpu->arch.flush_gpa = gpa;
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GPA, vcpu);
+	}

 	return 0;
 }
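The new condition in `kvm_handle_mm_fault()` reduces to a two-input predicate: flush when the page-table walker is in software, or when the exit was a TLB-modify exception. A minimal sketch of that decision, under the assumption that it can be isolated as a pure function; `demo_needs_gpa_flush` and `DEMO_EXCCODE_TLBM` are hypothetical names, and the constant's value is a placeholder for the one defined in the LoongArch headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value; the real EXCCODE_TLBM comes from the LoongArch headers. */
#define DEMO_EXCCODE_TLBM 4

/*
 * Hypothetical helper: decide whether the per-GPA TLB flush request is needed
 * after a guest memory fault has been resolved.
 */
static bool demo_needs_gpa_flush(bool has_hw_ptw, int ecode)
{
	assert(ecode >= 0);

	/* Software walker: the refill path may have inserted an invalid entry. */
	if (!has_hw_ptw)
		return true;

	/* Hardware walker: only a modify fault can leave a stale entry behind. */
	return ecode == DEMO_EXCCODE_TLBM;
}
```

With a hardware walker, read and write faults other than the modify case skip the flush request entirely, which is the saving this hunk is after.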

-3

arch/riscv/include/asm/kvm_aia.h

···
 	/* CPU AIA CSR context of Guest VCPU */
 	struct kvm_vcpu_aia_csr guest_csr;

-	/* CPU AIA CSR context upon Guest VCPU reset */
-	struct kvm_vcpu_aia_csr guest_reset_csr;
-
 	/* Guest physical address of IMSIC for this VCPU */
 	gpa_t imsic_addr;

+11 -6

arch/riscv/include/asm/kvm_host.h

··· 119 119 120 120 /* AIA Guest/VM context */ 121 121 struct kvm_aia aia; 122 + 123 + /* KVM_CAP_RISCV_MP_STATE_RESET */ 124 + bool mp_state_reset; 122 125 }; 123 126 124 127 struct kvm_cpu_trap { ··· 196 193 unsigned long sstateen0; 197 194 }; 198 195 196 + struct kvm_vcpu_reset_state { 197 + spinlock_t lock; 198 + unsigned long pc; 199 + unsigned long a1; 200 + }; 201 + 199 202 struct kvm_vcpu_arch { 200 203 /* VCPU ran at least once */ 201 204 bool ran_atleast_once; ··· 236 227 /* CPU Smstateen CSR context of Guest VCPU */ 237 228 struct kvm_vcpu_smstateen_csr smstateen_csr; 238 229 239 - /* CPU context upon Guest VCPU reset */ 240 - struct kvm_cpu_context guest_reset_context; 241 - spinlock_t reset_cntx_lock; 242 - 243 - /* CPU CSR context upon Guest VCPU reset */ 244 - struct kvm_vcpu_csr guest_reset_csr; 230 + /* CPU reset state of Guest VCPU */ 231 + struct kvm_vcpu_reset_state reset_state; 245 232 246 233 /* 247 234 * VCPU interrupts

+3

arch/riscv/include/asm/kvm_vcpu_sbi.h

··· 55 55 void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu, 56 56 struct kvm_run *run, 57 57 u32 type, u64 flags); 58 + void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu, 59 + unsigned long pc, unsigned long a1); 60 + void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu); 58 61 int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run); 59 62 int kvm_riscv_vcpu_set_reg_sbi_ext(struct kvm_vcpu *vcpu, 60 63 const struct kvm_one_reg *reg);

+2 -4

arch/riscv/include/asm/kvm_vcpu_vector.h

··· 33 33 unsigned long *isa); 34 34 void kvm_riscv_vcpu_host_vector_save(struct kvm_cpu_context *cntx); 35 35 void kvm_riscv_vcpu_host_vector_restore(struct kvm_cpu_context *cntx); 36 - int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu, 37 - struct kvm_cpu_context *cntx); 36 + int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu); 38 37 void kvm_riscv_vcpu_free_vector_context(struct kvm_vcpu *vcpu); 39 38 #else 40 39 ··· 61 62 { 62 63 } 63 64 64 - static inline int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu, 65 - struct kvm_cpu_context *cntx) 65 + static inline int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu) 66 66 { 67 67 return 0; 68 68 }

+10

arch/riscv/kernel/head.S

··· 131 131 csrw CSR_IE, zero 132 132 csrw CSR_IP, zero 133 133 134 + #ifndef CONFIG_RISCV_M_MODE 135 + /* Enable time CSR */ 136 + li t0, 0x2 137 + csrw CSR_SCOUNTEREN, t0 138 + #endif 139 + 134 140 /* Load the global pointer */ 135 141 load_global_pointer 136 142 ··· 232 226 * to hand it to us. 233 227 */ 234 228 csrr a0, CSR_MHARTID 229 + #else 230 + /* Enable time CSR */ 231 + li t0, 0x2 232 + csrw CSR_SCOUNTEREN, t0 235 233 #endif /* CONFIG_RISCV_M_MODE */ 236 234 237 235 /* Load the global pointer */

+1 -1

arch/riscv/kvm/Kconfig

···
 if VIRTUALIZATION

 config KVM
-	tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
+	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on RISCV_SBI && MMU
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING

+1 -3

arch/riscv/kvm/aia_device.c

···
 void kvm_riscv_vcpu_aia_reset(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
-	struct kvm_vcpu_aia_csr *reset_csr =
-				&vcpu->arch.aia_context.guest_reset_csr;

 	if (!kvm_riscv_aia_available())
 		return;
-	memcpy(csr, reset_csr, sizeof(*csr));
+	memset(csr, 0, sizeof(*csr));

 	/* Proceed only if AIA was initialized successfully */
 	if (!kvm_riscv_aia_initialized(vcpu->kvm))

+35 -29

arch/riscv/kvm/vcpu.c

··· 51 51 sizeof(kvm_vcpu_stats_desc), 52 52 }; 53 53 54 - static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu) 54 + static void kvm_riscv_vcpu_context_reset(struct kvm_vcpu *vcpu, 55 + bool kvm_sbi_reset) 55 56 { 56 57 struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; 57 - struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr; 58 58 struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; 59 - struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context; 59 + void *vector_datap = cntx->vector.datap; 60 + 61 + memset(cntx, 0, sizeof(*cntx)); 62 + memset(csr, 0, sizeof(*csr)); 63 + memset(&vcpu->arch.smstateen_csr, 0, sizeof(vcpu->arch.smstateen_csr)); 64 + 65 + /* Restore datap as it's not a part of the guest context. */ 66 + cntx->vector.datap = vector_datap; 67 + 68 + if (kvm_sbi_reset) 69 + kvm_riscv_vcpu_sbi_load_reset_state(vcpu); 70 + 71 + /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */ 72 + cntx->sstatus = SR_SPP | SR_SPIE; 73 + 74 + cntx->hstatus |= HSTATUS_VTW; 75 + cntx->hstatus |= HSTATUS_SPVP; 76 + cntx->hstatus |= HSTATUS_SPV; 77 + } 78 + 79 + static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu, bool kvm_sbi_reset) 80 + { 60 81 bool loaded; 61 82 62 83 /** ··· 92 71 93 72 vcpu->arch.last_exit_cpu = -1; 94 73 95 - memcpy(csr, reset_csr, sizeof(*csr)); 96 - 97 - spin_lock(&vcpu->arch.reset_cntx_lock); 98 - memcpy(cntx, reset_cntx, sizeof(*cntx)); 99 - spin_unlock(&vcpu->arch.reset_cntx_lock); 100 - 101 - memset(&vcpu->arch.smstateen_csr, 0, sizeof(vcpu->arch.smstateen_csr)); 74 + kvm_riscv_vcpu_context_reset(vcpu, kvm_sbi_reset); 102 75 103 76 kvm_riscv_vcpu_fp_reset(vcpu); 104 77 ··· 127 112 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) 128 113 { 129 114 int rc; 130 - struct kvm_cpu_context *cntx; 131 - struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr; 132 115 133 116 spin_lock_init(&vcpu->arch.mp_state_lock); 134 117 ··· 146 133 /* Setup VCPU hfence queue */ 147 134 spin_lock_init(&vcpu->arch.hfence_lock); 
148 135 149 - /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */ 150 - spin_lock_init(&vcpu->arch.reset_cntx_lock); 136 + spin_lock_init(&vcpu->arch.reset_state.lock); 151 137 152 - spin_lock(&vcpu->arch.reset_cntx_lock); 153 - cntx = &vcpu->arch.guest_reset_context; 154 - cntx->sstatus = SR_SPP | SR_SPIE; 155 - cntx->hstatus = 0; 156 - cntx->hstatus |= HSTATUS_VTW; 157 - cntx->hstatus |= HSTATUS_SPVP; 158 - cntx->hstatus |= HSTATUS_SPV; 159 - spin_unlock(&vcpu->arch.reset_cntx_lock); 160 - 161 - if (kvm_riscv_vcpu_alloc_vector_context(vcpu, cntx)) 138 + if (kvm_riscv_vcpu_alloc_vector_context(vcpu)) 162 139 return -ENOMEM; 163 - 164 - /* By default, make CY, TM, and IR counters accessible in VU mode */ 165 - reset_csr->scounteren = 0x7; 166 140 167 141 /* Setup VCPU timer */ 168 142 kvm_riscv_vcpu_timer_init(vcpu); ··· 169 169 kvm_riscv_vcpu_sbi_init(vcpu); 170 170 171 171 /* Reset VCPU */ 172 - kvm_riscv_reset_vcpu(vcpu); 172 + kvm_riscv_reset_vcpu(vcpu, false); 173 173 174 174 return 0; 175 175 } ··· 518 518 case KVM_MP_STATE_STOPPED: 519 519 __kvm_riscv_vcpu_power_off(vcpu); 520 520 break; 521 + case KVM_MP_STATE_INIT_RECEIVED: 522 + if (vcpu->kvm->arch.mp_state_reset) 523 + kvm_riscv_reset_vcpu(vcpu, false); 524 + else 525 + ret = -EINVAL; 526 + break; 521 527 default: 522 528 ret = -EINVAL; 523 529 } ··· 712 706 } 713 707 714 708 if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu)) 715 - kvm_riscv_reset_vcpu(vcpu); 709 + kvm_riscv_reset_vcpu(vcpu, true); 716 710 717 711 if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu)) 718 712 kvm_riscv_gstage_update_hgatp(vcpu);

+30 -2

arch/riscv/kvm/vcpu_sbi.c

··· 143 143 struct kvm_vcpu *tmp; 144 144 145 145 kvm_for_each_vcpu(i, tmp, vcpu->kvm) { 146 - spin_lock(&vcpu->arch.mp_state_lock); 146 + spin_lock(&tmp->arch.mp_state_lock); 147 147 WRITE_ONCE(tmp->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED); 148 - spin_unlock(&vcpu->arch.mp_state_lock); 148 + spin_unlock(&tmp->arch.mp_state_lock); 149 149 } 150 150 kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP); 151 151 ··· 154 154 run->system_event.ndata = 1; 155 155 run->system_event.data[0] = reason; 156 156 run->exit_reason = KVM_EXIT_SYSTEM_EVENT; 157 + } 158 + 159 + void kvm_riscv_vcpu_sbi_request_reset(struct kvm_vcpu *vcpu, 160 + unsigned long pc, unsigned long a1) 161 + { 162 + spin_lock(&vcpu->arch.reset_state.lock); 163 + vcpu->arch.reset_state.pc = pc; 164 + vcpu->arch.reset_state.a1 = a1; 165 + spin_unlock(&vcpu->arch.reset_state.lock); 166 + 167 + kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); 168 + } 169 + 170 + void kvm_riscv_vcpu_sbi_load_reset_state(struct kvm_vcpu *vcpu) 171 + { 172 + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; 173 + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; 174 + struct kvm_vcpu_reset_state *reset_state = &vcpu->arch.reset_state; 175 + 176 + cntx->a0 = vcpu->vcpu_id; 177 + 178 + spin_lock(&vcpu->arch.reset_state.lock); 179 + cntx->sepc = reset_state->pc; 180 + cntx->a1 = reset_state->a1; 181 + spin_unlock(&vcpu->arch.reset_state.lock); 182 + 183 + cntx->sstatus &= ~SR_SIE; 184 + csr->vsatp = 0; 157 185 } 158 186 159 187 int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
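Taken together, the vcpu.c and vcpu_sbi.c hunks replace the full shadow "reset context" with a two-word reset state (`pc`, `a1`) that is applied on top of a zeroed guest context. The ordering can be sketched as below. This is a simplified single-threaded model: every `demo_` name is hypothetical, the spinlock that guards `reset_state` in the kernel is omitted, and the `DEMO_SR_*` bit positions are assumptions of this sketch rather than values taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sstatus bit positions, for illustration only. */
#define DEMO_SR_SIE	(1UL << 1)
#define DEMO_SR_SPIE	(1UL << 5)
#define DEMO_SR_SPP	(1UL << 8)

struct demo_reset_state {
	unsigned long pc;
	unsigned long a1;
};

/* Hypothetical, heavily reduced stand-in for the vCPU state. */
struct demo_vcpu {
	unsigned long vcpu_id;
	unsigned long sepc, a0, a1, sstatus, vsatp;
	struct demo_reset_state reset_state;
	int reset_requested;
};

/* Models kvm_riscv_vcpu_sbi_request_reset(): record the state, raise a request. */
static void demo_request_reset(struct demo_vcpu *v, unsigned long pc,
			       unsigned long a1)
{
	assert(v != NULL);
	v->reset_state.pc = pc;
	v->reset_state.a1 = a1;
	v->reset_requested = 1;
}

/* Models kvm_riscv_vcpu_context_reset(): zero, optionally load, then fix up. */
static void demo_context_reset(struct demo_vcpu *v, int sbi_reset)
{
	assert(v != NULL);

	/* Step 1: wipe the guest-visible context. */
	v->sepc = v->a0 = v->a1 = v->sstatus = v->vsatp = 0;

	/* Step 2: only an SBI-initiated reset applies the recorded state. */
	if (sbi_reset) {
		v->a0 = v->vcpu_id;
		v->sepc = v->reset_state.pc;
		v->a1 = v->reset_state.a1;
		v->sstatus &= ~DEMO_SR_SIE;
		v->vsatp = 0;
	}

	/* Step 3: the shadow sstatus reset value is set unconditionally. */
	v->sstatus = DEMO_SR_SPP | DEMO_SR_SPIE;
	v->reset_requested = 0;
}
```

A reset performed with `sbi_reset == 0` (vCPU creation, or the userspace-initiated path) leaves `sepc`, `a0`, and `a1` at zero, so userspace is expected to provide the entry state itself.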

+1 -12

arch/riscv/kvm/vcpu_sbi_hsm.c

··· 15 15 16 16 static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu) 17 17 { 18 - struct kvm_cpu_context *reset_cntx; 19 18 struct kvm_cpu_context *cp = &vcpu->arch.guest_context; 20 19 struct kvm_vcpu *target_vcpu; 21 20 unsigned long target_vcpuid = cp->a0; ··· 31 32 goto out; 32 33 } 33 34 34 - spin_lock(&target_vcpu->arch.reset_cntx_lock); 35 - reset_cntx = &target_vcpu->arch.guest_reset_context; 36 - /* start address */ 37 - reset_cntx->sepc = cp->a1; 38 - /* target vcpu id to start */ 39 - reset_cntx->a0 = target_vcpuid; 40 - /* private data passed from kernel */ 41 - reset_cntx->a1 = cp->a2; 42 - spin_unlock(&target_vcpu->arch.reset_cntx_lock); 43 - 44 - kvm_make_request(KVM_REQ_VCPU_RESET, target_vcpu); 35 + kvm_riscv_vcpu_sbi_request_reset(target_vcpu, cp->a1, cp->a2); 45 36 46 37 __kvm_riscv_vcpu_power_on(target_vcpu); 47 38

+1 -9

arch/riscv/kvm/vcpu_sbi_system.c

··· 13 13 struct kvm_vcpu_sbi_return *retdata) 14 14 { 15 15 struct kvm_cpu_context *cp = &vcpu->arch.guest_context; 16 - struct kvm_cpu_context *reset_cntx; 17 16 unsigned long funcid = cp->a6; 18 17 unsigned long hva, i; 19 18 struct kvm_vcpu *tmp; ··· 44 45 } 45 46 } 46 47 47 - spin_lock(&vcpu->arch.reset_cntx_lock); 48 - reset_cntx = &vcpu->arch.guest_reset_context; 49 - reset_cntx->sepc = cp->a1; 50 - reset_cntx->a0 = vcpu->vcpu_id; 51 - reset_cntx->a1 = cp->a2; 52 - spin_unlock(&vcpu->arch.reset_cntx_lock); 53 - 54 - kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); 48 + kvm_riscv_vcpu_sbi_request_reset(vcpu, cp->a1, cp->a2); 55 49 56 50 /* userspace provides the suspend implementation */ 57 51 kvm_riscv_vcpu_sbi_forward(vcpu, run);

+7 -6

arch/riscv/kvm/vcpu_vector.c

··· 22 22 struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; 23 23 24 24 cntx->sstatus &= ~SR_VS; 25 + 26 + cntx->vector.vlenb = riscv_v_vsize / 32; 27 + 25 28 if (riscv_isa_extension_available(isa, v)) { 26 29 cntx->sstatus |= SR_VS_INITIAL; 27 30 WARN_ON(!cntx->vector.datap); ··· 73 70 __kvm_riscv_vector_restore(cntx); 74 71 } 75 72 76 - int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu, 77 - struct kvm_cpu_context *cntx) 73 + int kvm_riscv_vcpu_alloc_vector_context(struct kvm_vcpu *vcpu) 78 74 { 79 - cntx->vector.datap = kmalloc(riscv_v_vsize, GFP_KERNEL); 80 - if (!cntx->vector.datap) 75 + vcpu->arch.guest_context.vector.datap = kzalloc(riscv_v_vsize, GFP_KERNEL); 76 + if (!vcpu->arch.guest_context.vector.datap) 81 77 return -ENOMEM; 82 - cntx->vector.vlenb = riscv_v_vsize / 32; 83 78 84 79 vcpu->arch.host_context.vector.datap = kzalloc(riscv_v_vsize, GFP_KERNEL); 85 80 if (!vcpu->arch.host_context.vector.datap) ··· 88 87 89 88 void kvm_riscv_vcpu_free_vector_context(struct kvm_vcpu *vcpu) 90 89 { 91 - kfree(vcpu->arch.guest_reset_context.vector.datap); 90 + kfree(vcpu->arch.guest_context.vector.datap); 92 91 kfree(vcpu->arch.host_context.vector.datap); 93 92 } 94 93 #endif

+13

arch/riscv/kvm/vm.c

···
 	return r;
 }

+int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	switch (cap->cap) {
+	case KVM_CAP_RISCV_MP_STATE_RESET:
+		if (cap->flags)
+			return -EINVAL;
+		kvm->arch.mp_state_reset = true;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	return -EINVAL;
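This capability gates the new `KVM_MP_STATE_INIT_RECEIVED` case in the vcpu.c hunk: setting that MP state resets the vCPU only after the VM has opted in, and fails with `-EINVAL` otherwise. A minimal model of that opt-in flow, where `demo_vm`, `DEMO_CAP_MP_STATE_RESET`, and both functions are hypothetical stand-ins and no real ioctl is issued:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CAP_MP_STATE_RESET 1	/* hypothetical capability number */

/* Hypothetical stand-in for the per-VM arch state. */
struct demo_vm {
	bool mp_state_reset;
};

/* Models kvm_vm_ioctl_enable_cap(): non-zero flags are rejected. */
static int demo_enable_cap(struct demo_vm *vm, unsigned int cap,
			   unsigned int flags)
{
	assert(vm != NULL);

	switch (cap) {
	case DEMO_CAP_MP_STATE_RESET:
		if (flags)
			return -EINVAL;
		vm->mp_state_reset = true;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Models the INIT_RECEIVED case: reset only if the VM opted in. */
static int demo_set_init_received(const struct demo_vm *vm, int *reset_count)
{
	assert(vm != NULL && reset_count != NULL);

	if (!vm->mp_state_reset)
		return -EINVAL;

	(*reset_count)++;	/* stand-in for kvm_riscv_reset_vcpu(vcpu, false) */
	return 0;
}
```

Keeping the behaviour behind a capability means existing userspace that never sets `INIT_RECEIVED` sees no change.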

+4 -1

arch/x86/include/asm/kvm-x86-ops.h

··· 21 21 KVM_X86_OP(vcpu_after_set_cpuid) 22 22 KVM_X86_OP(vm_init) 23 23 KVM_X86_OP_OPTIONAL(vm_destroy) 24 + KVM_X86_OP_OPTIONAL(vm_pre_destroy) 24 25 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) 25 26 KVM_X86_OP(vcpu_create) 26 27 KVM_X86_OP(vcpu_free) ··· 116 115 KVM_X86_OP_OPTIONAL(apicv_pre_state_restore) 117 116 KVM_X86_OP_OPTIONAL(apicv_post_state_restore) 118 117 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt) 118 + KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt) 119 119 KVM_X86_OP_OPTIONAL(set_hv_timer) 120 120 KVM_X86_OP_OPTIONAL(cancel_hv_timer) 121 121 KVM_X86_OP(setup_mce) ··· 127 125 KVM_X86_OP(enable_smi_window) 128 126 #endif 129 127 KVM_X86_OP_OPTIONAL(dev_get_attr) 130 - KVM_X86_OP_OPTIONAL(mem_enc_ioctl) 128 + KVM_X86_OP(mem_enc_ioctl) 129 + KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) 131 130 KVM_X86_OP_OPTIONAL(mem_enc_register_region) 132 131 KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) 133 132 KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)

+26 -8

arch/x86/include/asm/kvm_host.h

··· 609 609 struct kvm_pmu_ops; 610 610 611 611 enum { 612 - KVM_DEBUGREG_BP_ENABLED = 1, 613 - KVM_DEBUGREG_WONT_EXIT = 2, 612 + KVM_DEBUGREG_BP_ENABLED = BIT(0), 613 + KVM_DEBUGREG_WONT_EXIT = BIT(1), 614 + /* 615 + * Guest debug registers (DR0-3, DR6 and DR7) are saved/restored by 616 + * hardware on exit from or enter to guest. KVM needn't switch them. 617 + * DR0-3, DR6 and DR7 are set to their architectural INIT value on VM 618 + * exit, host values need to be restored. 619 + */ 620 + KVM_DEBUGREG_AUTO_SWITCH = BIT(2), 614 621 }; 615 622 616 623 struct kvm_mtrr { ··· 1578 1571 struct kvm_mmu_memory_cache split_desc_cache; 1579 1572 1580 1573 gfn_t gfn_direct_bits; 1574 + 1575 + /* 1576 + * Size of the CPU's dirty log buffer, i.e. VMX's PML buffer. A Zero 1577 + * value indicates CPU dirty logging is unsupported or disabled in 1578 + * current VM. 1579 + */ 1580 + int cpu_dirty_log_size; 1581 1581 }; 1582 1582 1583 1583 struct kvm_vm_stat { ··· 1688 1674 unsigned int vm_size; 1689 1675 int (*vm_init)(struct kvm *kvm); 1690 1676 void (*vm_destroy)(struct kvm *kvm); 1677 + void (*vm_pre_destroy)(struct kvm *kvm); 1691 1678 1692 1679 /* Create, but do not attach this VCPU */ 1693 1680 int (*vcpu_precreate)(struct kvm *kvm); ··· 1838 1823 struct x86_exception *exception); 1839 1824 void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu); 1840 1825 1841 - /* 1842 - * Size of the CPU's dirty log buffer, i.e. VMX's PML buffer. A zero 1843 - * value indicates CPU dirty logging is unsupported or disabled. 
1844 - */ 1845 - int cpu_dirty_log_size; 1846 1826 void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu); 1847 1827 1848 1828 const struct kvm_x86_nested_ops *nested_ops; ··· 1851 1841 void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu); 1852 1842 void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); 1853 1843 bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu); 1844 + bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu); 1854 1845 1855 1846 int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, 1856 1847 bool *expired); ··· 1868 1857 1869 1858 int (*dev_get_attr)(u32 group, u64 attr, u64 *val); 1870 1859 int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); 1860 + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); 1871 1861 int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp); 1872 1862 int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp); 1873 1863 int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); ··· 2345 2333 int kvm_add_user_return_msr(u32 msr); 2346 2334 int kvm_find_user_return_msr(u32 msr); 2347 2335 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask); 2336 + void kvm_user_return_msr_update_cache(unsigned int index, u64 val); 2348 2337 2349 2338 static inline bool kvm_is_supported_user_return_msr(u32 msr) 2350 2339 { ··· 2429 2416 KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \ 2430 2417 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \ 2431 2418 KVM_X86_QUIRK_SLOT_ZAP_ALL | \ 2432 - KVM_X86_QUIRK_STUFF_FEATURE_MSRS) 2419 + KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \ 2420 + KVM_X86_QUIRK_IGNORE_GUEST_PAT) 2421 + 2422 + #define KVM_X86_CONDITIONAL_QUIRKS \ 2423 + (KVM_X86_QUIRK_CD_NW_CLEARED | \ 2424 + KVM_X86_QUIRK_IGNORE_GUEST_PAT) 2433 2425 2434 2426 /* 2435 2427 * KVM previously used a u32 field in kvm_run to indicate the hypercall was

+5

arch/x86/include/asm/posted_intr.h

···
 	return test_bit(POSTED_INTR_SN, (unsigned long *)&pi_desc->control);
 }

+static inline bool pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 /* Non-atomic helpers */
 static inline void __pi_set_sn(struct pi_desc *pi_desc)
 {
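`pi_test_pir()` is a plain bitmap lookup: the vector number selects a word and a bit within the posted-interrupt request bitmap. A userspace sketch of the same indexing, assuming a 256-bit bitmap with one bit per vector; `demo_pi_desc` and the helpers are hypothetical and, unlike the kernel's `test_bit()`, make no atomicity guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_PIR_BITS		256	/* assumption: one bit per vector */

/* Hypothetical stand-in holding only the request bitmap. */
struct demo_pi_desc {
	unsigned long pir[DEMO_PIR_BITS / (sizeof(unsigned long) * CHAR_BIT)];
};

/* Non-atomic helper used only to populate the sketch. */
static void demo_pi_set_pir(int vector, struct demo_pi_desc *desc)
{
	assert(vector >= 0 && vector < DEMO_PIR_BITS);
	desc->pir[vector / DEMO_BITS_PER_LONG] |=
		1UL << (vector % DEMO_BITS_PER_LONG);
}

/* Same question pi_test_pir() answers: is this vector pending? */
static bool demo_pi_test_pir(int vector, const struct demo_pi_desc *desc)
{
	assert(vector >= 0 && vector < DEMO_PIR_BITS);
	return (desc->pir[vector / DEMO_BITS_PER_LONG] >>
		(vector % DEMO_BITS_PER_LONG)) & 1UL;
}
```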

+8 -1

arch/x86/include/asm/shared/tdx.h

··· 67 67 #define TD_CTLS_LOCK BIT_ULL(TD_CTLS_LOCK_BIT) 68 68 69 69 /* TDX hypercall Leaf IDs */ 70 + #define TDVMCALL_GET_TD_VM_CALL_INFO 0x10000 70 71 #define TDVMCALL_MAP_GPA 0x10001 71 72 #define TDVMCALL_GET_QUOTE 0x10002 72 73 #define TDVMCALL_REPORT_FATAL_ERROR 0x10003 73 74 74 - #define TDVMCALL_STATUS_RETRY 1 75 + /* 76 + * TDG.VP.VMCALL Status Codes (returned in R10) 77 + */ 78 + #define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL 79 + #define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL 80 + #define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL 81 + #define TDVMCALL_STATUS_ALIGN_ERROR 0x8000000000000002ULL 75 82 76 83 /* 77 84 * Bitmasks of exposed registers (with VMM).

+75

arch/x86/include/asm/tdx.h

··· 5 5 6 6 #include <linux/init.h> 7 7 #include <linux/bits.h> 8 + #include <linux/mmzone.h> 8 9 9 10 #include <asm/errno.h> 10 11 #include <asm/ptrace.h> ··· 19 18 * TDX module. 20 19 */ 21 20 #define TDX_ERROR _BITUL(63) 21 + #define TDX_NON_RECOVERABLE _BITUL(62) 22 22 #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) 23 23 #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) 24 24 ··· 35 33 #ifndef __ASSEMBLER__ 36 34 37 35 #include <uapi/asm/mce.h> 36 + #include <asm/tdx_global_metadata.h> 37 + #include <linux/pgtable.h> 38 38 39 39 /* 40 40 * Used by the #VE exception handler to gather the #VE exception ··· 123 119 int tdx_cpu_enable(void); 124 120 int tdx_enable(void); 125 121 const char *tdx_dump_mce_info(struct mce *m); 122 + const struct tdx_sys_info *tdx_get_sysinfo(void); 123 + 124 + int tdx_guest_keyid_alloc(void); 125 + u32 tdx_get_nr_guest_keyids(void); 126 + void tdx_guest_keyid_free(unsigned int keyid); 127 + 128 + struct tdx_td { 129 + /* TD root structure: */ 130 + struct page *tdr_page; 131 + 132 + int tdcs_nr_pages; 133 + /* TD control structure: */ 134 + struct page **tdcs_pages; 135 + 136 + /* Size of `tdcx_pages` in struct tdx_vp */ 137 + int tdcx_nr_pages; 138 + }; 139 + 140 + struct tdx_vp { 141 + /* TDVP root page */ 142 + struct page *tdvpr_page; 143 + 144 + /* TD vCPU control structure: */ 145 + struct page **tdcx_pages; 146 + }; 147 + 148 + static inline u64 mk_keyed_paddr(u16 hkid, struct page *page) 149 + { 150 + u64 ret; 151 + 152 + ret = page_to_phys(page); 153 + /* KeyID bits are just above the physical address bits: */ 154 + ret |= (u64)hkid << boot_cpu_data.x86_phys_bits; 155 + 156 + return ret; 157 + } 158 + 159 + static inline int pg_level_to_tdx_sept_level(enum pg_level level) 160 + { 161 + WARN_ON_ONCE(level == PG_LEVEL_NONE); 162 + return level - 1; 163 + } 164 + 165 + u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args); 166 + u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page); 
167 + u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2); 168 + u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2); 169 + u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page); 170 + u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2); 171 + u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2); 172 + u64 tdh_mng_key_config(struct tdx_td *td); 173 + u64 tdh_mng_create(struct tdx_td *td, u16 hkid); 174 + u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp); 175 + u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data); 176 + u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2); 177 + u64 tdh_mr_finalize(struct tdx_td *td); 178 + u64 tdh_vp_flush(struct tdx_vp *vp); 179 + u64 tdh_mng_vpflushdone(struct tdx_td *td); 180 + u64 tdh_mng_key_freeid(struct tdx_td *td); 181 + u64 tdh_mng_init(struct tdx_td *td, u64 td_params, u64 *extended_err); 182 + u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid); 183 + u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data); 184 + u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask); 185 + u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size); 186 + u64 tdh_mem_track(struct tdx_td *tdr); 187 + u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2); 188 + u64 tdh_phymem_cache_wb(bool resume); 189 + u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); 190 + u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); 126 191 #else 127 192 static inline void tdx_init(void) { } 128 193 static inline int tdx_cpu_enable(void) { return -ENODEV; } 129 194 static inline int tdx_enable(void) { return -ENODEV; } 195 + static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } 130 196 static 
inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } 197 + static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } 131 198 #endif /* CONFIG_INTEL_TDX_HOST */ 132 199 133 200 #endif /* !__ASSEMBLER__ */
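Two of the inline helpers added to tdx.h are simple arithmetic. `mk_keyed_paddr()` places the host KeyID in the bits directly above the physical address bits, and `pg_level_to_tdx_sept_level()` shifts the page-level enum down by one. The sketch below reproduces both with the platform inputs made explicit; the `demo_` names and the `phys_bits` parameter are hypothetical (the kernel reads `boot_cpu_data.x86_phys_bits` and uses `WARN_ON_ONCE()` rather than `assert()`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of mk_keyed_paddr() with phys_bits passed in. */
static uint64_t demo_mk_keyed_paddr(uint16_t hkid, uint64_t paddr,
				    unsigned int phys_bits)
{
	assert(phys_bits < 64);
	/* The address must not already occupy the KeyID bit positions. */
	assert((paddr >> phys_bits) == 0);

	/* KeyID bits sit just above the physical address bits. */
	return paddr | ((uint64_t)hkid << phys_bits);
}

/* Mirrors the start of the kernel's page-level enum. */
enum demo_pg_level {
	DEMO_PG_LEVEL_NONE = 0,
	DEMO_PG_LEVEL_4K,
	DEMO_PG_LEVEL_2M,
	DEMO_PG_LEVEL_1G,
};

/* Hypothetical analogue of pg_level_to_tdx_sept_level(). */
static int demo_pg_level_to_sept_level(enum demo_pg_level level)
{
	assert(level != DEMO_PG_LEVEL_NONE);
	return (int)level - 1;
}
```

For instance, with 46 physical address bits, KeyID 5 and address `0x1000` combine to `(5ULL << 46) | 0x1000`, and a 4K mapping corresponds to level 0.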

+2

arch/x86/include/asm/vmx.h

··· 256 256 TSC_MULTIPLIER_HIGH = 0x00002033, 257 257 TERTIARY_VM_EXEC_CONTROL = 0x00002034, 258 258 TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035, 259 + SHARED_EPT_POINTER = 0x0000203C, 259 260 PID_POINTER_TABLE = 0x00002042, 260 261 PID_POINTER_TABLE_HIGH = 0x00002043, 261 262 GUEST_PHYSICAL_ADDRESS = 0x00002400, ··· 587 586 #define EPT_VIOLATION_PROT_READ BIT(3) 588 587 #define EPT_VIOLATION_PROT_WRITE BIT(4) 589 588 #define EPT_VIOLATION_PROT_EXEC BIT(5) 589 + #define EPT_VIOLATION_EXEC_FOR_RING3_LIN BIT(6) 590 590 #define EPT_VIOLATION_PROT_MASK (EPT_VIOLATION_PROT_READ | \ 591 591 EPT_VIOLATION_PROT_WRITE | \ 592 592 EPT_VIOLATION_PROT_EXEC)

+71

arch/x86/include/uapi/asm/kvm.h

··· 441 441 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6) 442 442 #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7) 443 443 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8) 444 + #define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9) 444 445 445 446 #define KVM_STATE_NESTED_FORMAT_VMX 0 446 447 #define KVM_STATE_NESTED_FORMAT_SVM 1 ··· 930 929 #define KVM_X86_SEV_ES_VM 3 931 930 #define KVM_X86_SNP_VM 4 932 931 #define KVM_X86_TDX_VM 5 932 + 933 + /* Trust Domain eXtension sub-ioctl() commands. */ 934 + enum kvm_tdx_cmd_id { 935 + KVM_TDX_CAPABILITIES = 0, 936 + KVM_TDX_INIT_VM, 937 + KVM_TDX_INIT_VCPU, 938 + KVM_TDX_INIT_MEM_REGION, 939 + KVM_TDX_FINALIZE_VM, 940 + KVM_TDX_GET_CPUID, 941 + 942 + KVM_TDX_CMD_NR_MAX, 943 + }; 944 + 945 + struct kvm_tdx_cmd { 946 + /* enum kvm_tdx_cmd_id */ 947 + __u32 id; 948 + /* flags for sub-commend. If sub-command doesn't use this, set zero. */ 949 + __u32 flags; 950 + /* 951 + * data for each sub-command. An immediate or a pointer to the actual 952 + * data in process virtual address. If sub-command doesn't use it, 953 + * set zero. 954 + */ 955 + __u64 data; 956 + /* 957 + * Auxiliary error code. The sub-command may return TDX SEAMCALL 958 + * status code in addition to -Exxx. 959 + */ 960 + __u64 hw_error; 961 + }; 962 + 963 + struct kvm_tdx_capabilities { 964 + __u64 supported_attrs; 965 + __u64 supported_xfam; 966 + __u64 reserved[254]; 967 + 968 + /* Configurable CPUID bits for userspace */ 969 + struct kvm_cpuid2 cpuid; 970 + }; 971 + 972 + struct kvm_tdx_init_vm { 973 + __u64 attributes; 974 + __u64 xfam; 975 + __u64 mrconfigid[6]; /* sha384 digest */ 976 + __u64 mrowner[6]; /* sha384 digest */ 977 + __u64 mrownerconfig[6]; /* sha384 digest */ 978 + 979 + /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */ 980 + __u64 reserved[12]; 981 + 982 + /* 983 + * Call KVM_TDX_INIT_VM before vcpu creation, thus before 984 + * KVM_SET_CPUID2. 
985 + * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the 986 + * TDX module directly virtualizes those CPUIDs without VMM. The user 987 + * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with 988 + * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of 989 + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX 990 + * module doesn't virtualize. 991 + */ 992 + struct kvm_cpuid2 cpuid; 993 + }; 994 + 995 + #define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0) 996 + 997 + struct kvm_tdx_init_mem_region { 998 + __u64 source_addr; 999 + __u64 gpa; 1000 + __u64 nr_pages; 1001 + }; 933 1002 934 1003 #endif /* _ASM_X86_KVM_H */
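The TDX sub-commands are all carried in `struct kvm_tdx_cmd`, with `data` holding either an immediate or a userspace pointer to a per-command structure. As an illustration of how a VMM might populate one for `KVM_TDX_INIT_MEM_REGION`, here is a sketch that uses local mirrors of the uAPI definitions so it builds without TDX-aware headers. The `demo_` names are hypothetical, the sketch assumes `KVM_TDX_MEASURE_MEMORY_REGION` is passed through `flags`, and it stops short of issuing the ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uAPI definitions in the hunk above. */
enum demo_kvm_tdx_cmd_id {
	DEMO_KVM_TDX_CAPABILITIES = 0,
	DEMO_KVM_TDX_INIT_VM,
	DEMO_KVM_TDX_INIT_VCPU,
	DEMO_KVM_TDX_INIT_MEM_REGION,
	DEMO_KVM_TDX_FINALIZE_VM,
	DEMO_KVM_TDX_GET_CPUID,
};

#define DEMO_KVM_TDX_MEASURE_MEMORY_REGION (1ULL << 0)

struct demo_kvm_tdx_cmd {
	uint32_t id;		/* enum demo_kvm_tdx_cmd_id */
	uint32_t flags;		/* zero if the sub-command has no flags */
	uint64_t data;		/* immediate or userspace pointer */
	uint64_t hw_error;	/* auxiliary status written back on failure */
};

struct demo_kvm_tdx_init_mem_region {
	uint64_t source_addr;
	uint64_t gpa;
	uint64_t nr_pages;
};

/* Hypothetical helper: build the command for one region. */
static struct demo_kvm_tdx_cmd
demo_make_init_mem_region_cmd(const struct demo_kvm_tdx_init_mem_region *region,
			      int measure)
{
	struct demo_kvm_tdx_cmd cmd;

	assert(region != NULL);

	/* Unused fields must be zero, so start from a cleared struct. */
	memset(&cmd, 0, sizeof(cmd));
	cmd.id = DEMO_KVM_TDX_INIT_MEM_REGION;
	cmd.flags = measure ? (uint32_t)DEMO_KVM_TDX_MEASURE_MEMORY_REGION : 0;
	cmd.data = (uint64_t)(uintptr_t)region;

	return cmd;
}
```

In a real VMM the resulting structure would be handed to the memory-encryption ioctl on the appropriate file descriptor, and `hw_error` inspected if the call fails.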

+4 -1

arch/x86/include/uapi/asm/vmx.h

··· 34 34 #define EXIT_REASON_TRIPLE_FAULT 2 35 35 #define EXIT_REASON_INIT_SIGNAL 3 36 36 #define EXIT_REASON_SIPI_SIGNAL 4 37 + #define EXIT_REASON_OTHER_SMI 6 37 38 38 39 #define EXIT_REASON_INTERRUPT_WINDOW 7 39 40 #define EXIT_REASON_NMI_WINDOW 8 ··· 93 92 #define EXIT_REASON_TPAUSE 68 94 93 #define EXIT_REASON_BUS_LOCK 74 95 94 #define EXIT_REASON_NOTIFY 75 95 + #define EXIT_REASON_TDCALL 77 96 96 97 97 #define VMX_EXIT_REASONS \ 98 98 { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ ··· 157 155 { EXIT_REASON_UMWAIT, "UMWAIT" }, \ 158 156 { EXIT_REASON_TPAUSE, "TPAUSE" }, \ 159 157 { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \ 160 - { EXIT_REASON_NOTIFY, "NOTIFY" } 158 + { EXIT_REASON_NOTIFY, "NOTIFY" }, \ 159 + { EXIT_REASON_TDCALL, "TDCALL" } 161 160 162 161 #define VMX_EXIT_REASON_FLAGS \ 163 162 { VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" }

+1 -1

arch/x86/kernel/traps.c

··· 352 352 case BUG_UD1_UBSAN: 353 353 if (IS_ENABLED(CONFIG_UBSAN_TRAP)) { 354 354 pr_crit("%s at %pS\n", 355 - report_ubsan_failure(regs, ud_imm), 355 + report_ubsan_failure(ud_imm), 356 356 (void *)regs->ip); 357 357 } 358 358 break;

+12

arch/x86/kvm/Kconfig

··· 95 95 config KVM_INTEL 96 96 tristate "KVM for Intel (and compatible) processors support" 97 97 depends on KVM && IA32_FEAT_CTL 98 + select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST 99 + select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST 98 100 help 99 101 Provides support for KVM on processors equipped with Intel's VT 100 102 extensions, a.k.a. Virtual Machine Extensions (VMX). ··· 128 126 129 127 This includes support to expose "raw" unreclaimable enclave memory to 130 128 guests via a device node, e.g. /dev/sgx_vepc. 129 + 130 + If unsure, say N. 131 + 132 + config KVM_INTEL_TDX 133 + bool "Intel Trust Domain Extensions (TDX) support" 134 + default y 135 + depends on INTEL_TDX_HOST 136 + help 137 + Provides support for launching Intel Trust Domain Extensions (TDX) 138 + confidential VMs on Intel processors. 131 139 132 140 If unsure, say N. 133 141

+1

arch/x86/kvm/Makefile

··· 20 20 21 21 kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o 22 22 kvm-intel-$(CONFIG_KVM_HYPERV) += vmx/hyperv.o vmx/hyperv_evmcs.o 23 + kvm-intel-$(CONFIG_KVM_INTEL_TDX) += vmx/tdx.o 23 24 24 25 kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o 25 26

+19 -33

arch/x86/kvm/cpuid.c

··· 81 81 return ret; 82 82 } 83 83 84 - /* 85 - * Magic value used by KVM when querying userspace-provided CPUID entries and 86 - * doesn't care about the CPIUD index because the index of the function in 87 - * question is not significant. Note, this magic value must have at least one 88 - * bit set in bits[63:32] and must be consumed as a u64 by cpuid_entry2_find() 89 - * to avoid false positives when processing guest CPUID input. 90 - */ 91 - #define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull 92 - 93 - static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu, 94 - u32 function, u64 index) 84 + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( 85 + struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index) 95 86 { 96 87 struct kvm_cpuid_entry2 *e; 97 88 int i; ··· 99 108 */ 100 109 lockdep_assert_irqs_enabled(); 101 110 102 - for (i = 0; i < vcpu->arch.cpuid_nent; i++) { 103 - e = &vcpu->arch.cpuid_entries[i]; 111 + for (i = 0; i < nent; i++) { 112 + e = &entries[i]; 104 113 105 114 if (e->function != function) 106 115 continue; ··· 131 140 132 141 return NULL; 133 142 } 134 - 135 - struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, 136 - u32 function, u32 index) 137 - { 138 - return cpuid_entry2_find(vcpu, function, index); 139 - } 140 - EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index); 141 - 142 - struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, 143 - u32 function) 144 - { 145 - return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT); 146 - } 147 - EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry); 148 - 149 - /* 150 - * cpuid_entry2_find() and KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used 151 - * directly outside of kvm_find_cpuid_entry() and kvm_find_cpuid_entry_index(). 
152 - */ 153 - #undef KVM_CPUID_INDEX_NOT_SIGNIFICANT 143 + EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2); 154 144 155 145 static int kvm_check_cpuid(struct kvm_vcpu *vcpu) 156 146 { ··· 462 490 return best->eax & 0xff; 463 491 not_found: 464 492 return 36; 493 + } 494 + 495 + int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu) 496 + { 497 + struct kvm_cpuid_entry2 *best; 498 + 499 + best = kvm_find_cpuid_entry(vcpu, 0x80000000); 500 + if (!best || best->eax < 0x80000008) 501 + goto not_found; 502 + best = kvm_find_cpuid_entry(vcpu, 0x80000008); 503 + if (best) 504 + return (best->eax >> 16) & 0xff; 505 + not_found: 506 + return 0; 465 507 } 466 508 467 509 /*
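
Note: the new `cpuid_query_maxguestphyaddr()` reads bits 23:16 of CPUID.0x80000008.EAX, while the existing `cpuid_query_maxphyaddr()` reads bits 7:0, and the two fall back to 0 and 36 respectively. A minimal standalone sketch of that arithmetic follows. The `sketch_*` helpers are hypothetical and take the raw register values directly instead of looking up CPUID entries; the "entry not found" case is folded into the maximum-leaf check.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address width: EAX[7:0] of leaf 0x80000008, default 36. */
static int sketch_maxphyaddr(uint32_t max_ext_leaf, uint32_t eax_80000008)
{
	if (max_ext_leaf < 0x80000008u)
		return 36;
	return eax_80000008 & 0xff;
}

/* Guest physical address width: EAX[23:16] of leaf 0x80000008, default 0. */
static int sketch_maxguestphyaddr(uint32_t max_ext_leaf, uint32_t eax_80000008)
{
	if (max_ext_leaf < 0x80000008u)
		return 0;
	return (eax_80000008 >> 16) & 0xff;
}
```

For example, with EAX = 0x00300034 the first helper yields 52 and the second yields 48.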

+29 -4

arch/x86/kvm/cpuid.h

··· 11 11 void kvm_set_cpu_caps(void); 12 12 13 13 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu); 14 - struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, 15 - u32 function, u32 index); 16 - struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, 17 - u32 function); 14 + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries, 15 + int nent, u32 function, u64 index); 16 + /* 17 + * Magic value used by KVM when querying userspace-provided CPUID entries and 18 + * doesn't care about the CPIUD index because the index of the function in 19 + * question is not significant. Note, this magic value must have at least one 20 + * bit set in bits[63:32] and must be consumed as a u64 by kvm_find_cpuid_entry2() 21 + * to avoid false positives when processing guest CPUID input. 22 + * 23 + * KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used directly outside of 24 + * kvm_find_cpuid_entry2() and kvm_find_cpuid_entry(). 25 + */ 26 + #define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull 27 + 28 + static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, 29 + u32 function, u32 index) 30 + { 31 + return kvm_find_cpuid_entry2(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent, 32 + function, index); 33 + } 34 + 35 + static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, 36 + u32 function) 37 + { 38 + return kvm_find_cpuid_entry2(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent, 39 + function, KVM_CPUID_INDEX_NOT_SIGNIFICANT); 40 + } 41 + 18 42 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, 19 43 struct kvm_cpuid_entry2 __user *entries, 20 44 unsigned int type); ··· 58 34 u32 xstate_required_size(u64 xstate_bv, bool compacted); 59 35 60 36 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu); 37 + int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu); 61 38 u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu); 62 39 63 40 static inline int 
cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
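
Note: the comment on `KVM_CPUID_INDEX_NOT_SIGNIFICANT` explains why the lookup takes a `u64` index: a guest-supplied 32-bit index can never equal a magic value that has bits set in [63:32]. The sketch below illustrates that property on a plain array. The matching rule inside the loop is an assumption, since that part of `kvm_find_cpuid_entry2()` is elided from the hunk, and every `sketch_*` / `SKETCH_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INDEX_NOT_SIGNIFICANT  (~0ull)   /* has bits set in [63:32] */
#define SKETCH_FLAG_SIGNIFICANT_INDEX (1u << 0)

struct sketch_cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax;
};

/* Find an entry by function and, where the entry says it matters, by index. */
static const struct sketch_cpuid_entry *
sketch_find(const struct sketch_cpuid_entry *entries, int nent,
	    uint32_t function, uint64_t index)
{
	for (int i = 0; i < nent; i++) {
		const struct sketch_cpuid_entry *e = &entries[i];

		if (e->function != function)
			continue;

		/* Index ignored by this entry, or an exact 32-bit match. */
		if (!(e->flags & SKETCH_FLAG_SIGNIFICANT_INDEX) ||
		    e->index == index)
			return e;

		/* Caller does not care about the index: first entry wins. */
		if (index == SKETCH_INDEX_NOT_SIGNIFICANT)
			return e;
	}
	return NULL;
}
```

Because the magic is compared as a 64-bit value, a guest passing an index of 0xffffffff is treated as an ordinary index and does not alias the "don't care" case.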

+3

arch/x86/kvm/irq.c

··· 100 100 if (kvm_cpu_has_extint(v)) 101 101 return 1; 102 102 103 + if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected) 104 + return kvm_x86_call(protected_apic_has_interrupt)(v); 105 + 103 106 return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ 104 107 } 105 108 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);

+14 -1

arch/x86/kvm/lapic.c

··· 1790 1790 static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) 1791 1791 { 1792 1792 struct kvm_lapic *apic = vcpu->arch.apic; 1793 - u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT); 1793 + u32 reg; 1794 1794 1795 + /* 1796 + * Assume a timer IRQ was "injected" if the APIC is protected. KVM's 1797 + * copy of the vIRR is bogus, it's the responsibility of the caller to 1798 + * precisely check whether or not a timer IRQ is pending. 1799 + */ 1800 + if (apic->guest_apic_protected) 1801 + return true; 1802 + 1803 + reg = kvm_lapic_get_reg(apic, APIC_LVTT); 1795 1804 if (kvm_apic_hw_enabled(apic)) { 1796 1805 int vec = reg & APIC_VECTOR_MASK; 1797 1806 void *bitmap = apic->regs + APIC_ISR; ··· 2659 2650 kvm_recalculate_apic_map(vcpu->kvm); 2660 2651 return 0; 2661 2652 } 2653 + EXPORT_SYMBOL_GPL(kvm_apic_set_base); 2662 2654 2663 2655 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) 2664 2656 { ··· 2966 2956 u32 ppr; 2967 2957 2968 2958 if (!kvm_apic_present(vcpu)) 2959 + return -1; 2960 + 2961 + if (apic->guest_apic_protected) 2969 2962 return -1; 2970 2963 2971 2964 __apic_update_ppr(apic, &ppr);

+2

arch/x86/kvm/lapic.h

··· 65 65 bool sw_enabled; 66 66 bool irr_pending; 67 67 bool lvt0_in_nmi_mode; 68 + /* Select registers in the vAPIC cannot be read/written. */ 69 + bool guest_apic_protected; 68 70 /* Number of bits set in ISR. */ 69 71 s16 isr_count; 70 72 /* The highest vector set in ISR; if -1 - invalid, must scan ISR. */

+5 -1

arch/x86/kvm/mmu.h

··· 79 79 u8 kvm_mmu_get_max_tdp_level(void); 80 80 81 81 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask); 82 + void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value); 82 83 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); 83 84 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); 84 85 ··· 235 234 return -(u32)fault & errcode; 236 235 } 237 236 238 - bool kvm_mmu_may_ignore_guest_pat(void); 237 + bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm); 239 238 240 239 int kvm_mmu_post_init_vm(struct kvm *kvm); 241 240 void kvm_mmu_pre_destroy_vm(struct kvm *kvm); ··· 256 255 #else 257 256 #define tdp_mmu_enabled false 258 257 #endif 258 + 259 + bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa); 260 + int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level); 259 261 260 262 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm) 261 263 {

+20 -19

arch/x86/kvm/mmu/mmu.c

··· 110 110 #ifdef CONFIG_X86_64 111 111 bool __read_mostly tdp_mmu_enabled = true; 112 112 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444); 113 + EXPORT_SYMBOL_GPL(tdp_mmu_enabled); 113 114 #endif 114 115 115 116 static int max_huge_page_level __read_mostly; ··· 1457 1456 * enabled but it chooses between clearing the Dirty bit and Writeable 1458 1457 * bit based on the context. 1459 1458 */ 1460 - if (kvm_x86_ops.cpu_dirty_log_size) 1459 + if (kvm->arch.cpu_dirty_log_size) 1461 1460 kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask); 1462 1461 else 1463 1462 kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); 1464 1463 } 1465 1464 1466 - int kvm_cpu_dirty_log_size(void) 1465 + int kvm_cpu_dirty_log_size(struct kvm *kvm) 1467 1466 { 1468 - return kvm_x86_ops.cpu_dirty_log_size; 1467 + return kvm->arch.cpu_dirty_log_size; 1469 1468 } 1470 1469 1471 1470 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, ··· 4836 4835 } 4837 4836 #endif 4838 4837 4839 - bool kvm_mmu_may_ignore_guest_pat(void) 4840 - { 4841 - /* 4842 - * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM 4843 - * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to 4844 - * honor the memtype from the guest's PAT so that guest accesses to 4845 - * memory that is DMA'd aren't cached against the guest's wishes. As a 4846 - * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA, 4847 - * KVM _always_ ignores guest PAT (when EPT is enabled). 
4848 - */ 4849 - return shadow_memtype_mask; 4850 - } 4851 - 4852 4838 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) 4853 4839 { 4854 4840 #ifdef CONFIG_X86_64 ··· 4846 4858 return direct_page_fault(vcpu, fault); 4847 4859 } 4848 4860 4849 - static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, 4850 - u8 *level) 4861 + int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level) 4851 4862 { 4852 4863 int r; 4853 4864 ··· 4860 4873 do { 4861 4874 if (signal_pending(current)) 4862 4875 return -EINTR; 4876 + 4877 + if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) 4878 + return -EIO; 4879 + 4863 4880 cond_resched(); 4864 4881 r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level); 4865 4882 } while (r == RET_PF_RETRY); ··· 4888 4897 return -EIO; 4889 4898 } 4890 4899 } 4900 + EXPORT_SYMBOL_GPL(kvm_tdp_map_page); 4891 4901 4892 4902 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, 4893 4903 struct kvm_pre_fault_memory *range) ··· 5581 5589 5582 5590 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu) 5583 5591 { 5592 + int maxpa; 5593 + 5594 + if (vcpu->kvm->arch.vm_type == KVM_X86_TDX_VM) 5595 + maxpa = cpuid_query_maxguestphyaddr(vcpu); 5596 + else 5597 + maxpa = cpuid_maxphyaddr(vcpu); 5598 + 5584 5599 /* tdp_root_level is architecture forced level, use it if nonzero */ 5585 5600 if (tdp_root_level) 5586 5601 return tdp_root_level; 5587 5602 5588 5603 /* Use 5-level TDP if and only if it's useful/necessary. 
*/ 5589 - if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48) 5604 + if (max_tdp_level == 5 && maxpa <= 48) 5590 5605 return 4; 5591 5606 5592 5607 return max_tdp_level; ··· 5912 5913 out: 5913 5914 return r; 5914 5915 } 5916 + EXPORT_SYMBOL_GPL(kvm_mmu_load); 5915 5917 5916 5918 void kvm_mmu_unload(struct kvm_vcpu *vcpu) 5917 5919 { ··· 7239 7239 .start = slot->base_gfn, 7240 7240 .end = slot->base_gfn + slot->npages, 7241 7241 .may_block = true, 7242 + .attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED, 7242 7243 }; 7243 7244 bool flush; 7244 7245
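
Note: after this change `kvm_mmu_get_tdp_level()` picks the address width from `cpuid_query_maxguestphyaddr()` for TDX VMs and from `cpuid_maxphyaddr()` otherwise, then applies the existing level rules. A minimal sketch of that decision follows, with the kernel globals and CPUID lookups replaced by plain parameters. `sketch_tdp_level` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * forced_root_level: non-zero when the architecture forces a level.
 * max_tdp_level:     highest level the host supports (4 or 5).
 */
static int sketch_tdp_level(int forced_root_level, int max_tdp_level,
			    bool is_tdx, int maxphyaddr, int maxguestphyaddr)
{
	int maxpa = is_tdx ? maxguestphyaddr : maxphyaddr;

	/* An architecturally forced level always wins. */
	if (forced_root_level)
		return forced_root_level;

	/* 5-level paging is only useful above 48 address bits. */
	if (max_tdp_level == 5 && maxpa <= 48)
		return 4;

	return max_tdp_level;
}
```

One consequence visible in the sketch: if the guest width for a TDX VM is reported as 0 (the "not found" default), a 5-level-capable host still selects 4 levels.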

+3 -2

arch/x86/kvm/mmu/mmu_internal.h

··· 187 187 return kvm_gfn_direct_bits(kvm); 188 188 } 189 189 190 - static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) 190 + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm *kvm, 191 + struct kvm_mmu_page *sp) 191 192 { 192 193 /* 193 194 * When using the EPT page-modification log, the GPAs in the CPU dirty ··· 198 197 * being enabled is mandatory as the bits used to denote WP-only SPTEs 199 198 * are reserved for PAE paging (32-bit KVM). 200 199 */ 201 - return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode; 200 + return kvm->arch.cpu_dirty_log_size && sp->role.guest_mode; 202 201 } 203 202 204 203 static inline gfn_t gfn_round_for_level(gfn_t gfn, int level)

+3

arch/x86/kvm/mmu/page_track.c

··· 172 172 struct kvm_memory_slot *slot; 173 173 int r = 0, i, bkt; 174 174 175 + if (kvm->arch.vm_type == KVM_X86_TDX_VM) 176 + return -EOPNOTSUPP; 177 + 175 178 mutex_lock(&kvm->slots_arch_lock); 176 179 177 180 /*

+9 -20

arch/x86/kvm/mmu/spte.c

··· 37 37 u64 __read_mostly shadow_mmio_mask; 38 38 u64 __read_mostly shadow_mmio_access_mask; 39 39 u64 __read_mostly shadow_present_mask; 40 - u64 __read_mostly shadow_memtype_mask; 41 40 u64 __read_mostly shadow_me_value; 42 41 u64 __read_mostly shadow_me_mask; 43 42 u64 __read_mostly shadow_acc_track_mask; ··· 94 95 u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK; 95 96 u64 spte = generation_mmio_spte_mask(gen); 96 97 u64 gpa = gfn << PAGE_SHIFT; 97 - 98 - WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); 99 98 100 99 access &= shadow_mmio_access_mask; 101 100 spte |= vcpu->kvm->arch.shadow_mmio_value | access; ··· 174 177 175 178 if (sp->role.ad_disabled) 176 179 spte |= SPTE_TDP_AD_DISABLED; 177 - else if (kvm_mmu_page_ad_need_write_protect(sp)) 180 + else if (kvm_mmu_page_ad_need_write_protect(vcpu->kvm, sp)) 178 181 spte |= SPTE_TDP_AD_WRPROT_ONLY; 179 182 180 183 spte |= shadow_present_mask; ··· 209 212 if (level > PG_LEVEL_4K) 210 213 spte |= PT_PAGE_SIZE_MASK; 211 214 212 - if (shadow_memtype_mask) 213 - spte |= kvm_x86_call(get_mt_mask)(vcpu, gfn, 214 - kvm_is_mmio_pfn(pfn)); 215 + spte |= kvm_x86_call(get_mt_mask)(vcpu, gfn, kvm_is_mmio_pfn(pfn)); 215 216 if (host_writable) 216 217 spte |= shadow_host_writable_mask; 217 218 else ··· 435 440 } 436 441 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); 437 442 443 + void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value) 444 + { 445 + kvm->arch.shadow_mmio_value = mmio_value; 446 + } 447 + EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value); 448 + 438 449 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) 439 450 { 440 451 /* shadow_me_value must be a subset of shadow_me_mask */ ··· 464 463 /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */ 465 464 shadow_present_mask = 466 465 (has_exec_only ? 
0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT; 467 - /* 468 - * EPT overrides the host MTRRs, and so KVM must program the desired 469 - * memtype directly into the SPTEs. Note, this mask is just the mask 470 - * of all bits that factor into the memtype, the actual memtype must be 471 - * dynamically calculated, e.g. to ensure host MMIO is mapped UC. 472 - */ 473 - shadow_memtype_mask = VMX_EPT_MT_MASK | VMX_EPT_IPAT_BIT; 466 + 474 467 shadow_acc_track_mask = VMX_EPT_RWX_MASK; 475 468 shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE; 476 469 shadow_mmu_writable_mask = EPT_SPTE_MMU_WRITABLE; ··· 516 521 shadow_x_mask = 0; 517 522 shadow_present_mask = PT_PRESENT_MASK; 518 523 519 - /* 520 - * For shadow paging and NPT, KVM uses PAT entry '0' to encode WB 521 - * memtype in the SPTEs, i.e. relies on host MTRRs to provide the 522 - * correct memtype (WB is the "weakest" memtype). 523 - */ 524 - shadow_memtype_mask = 0; 525 524 shadow_acc_track_mask = 0; 526 525 shadow_me_mask = 0; 527 526 shadow_me_value = 0;

-1

arch/x86/kvm/mmu/spte.h

··· 187 187 extern u64 __read_mostly shadow_mmio_mask; 188 188 extern u64 __read_mostly shadow_mmio_access_mask; 189 189 extern u64 __read_mostly shadow_present_mask; 190 - extern u64 __read_mostly shadow_memtype_mask; 191 190 extern u64 __read_mostly shadow_me_value; 192 191 extern u64 __read_mostly shadow_me_mask; 193 192

+38 -11

arch/x86/kvm/mmu/tdp_mmu.c

··· 1630 1630 } 1631 1631 } 1632 1632 1633 - static bool tdp_mmu_need_write_protect(struct kvm_mmu_page *sp) 1633 + static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp) 1634 1634 { 1635 1635 /* 1636 1636 * All TDP MMU shadow pages share the same role as their root, aside 1637 1637 * from level, so it is valid to key off any shadow page to determine if 1638 1638 * write protection is needed for an entire tree. 1639 1639 */ 1640 - return kvm_mmu_page_ad_need_write_protect(sp) || !kvm_ad_enabled; 1640 + return kvm_mmu_page_ad_need_write_protect(kvm, sp) || !kvm_ad_enabled; 1641 1641 } 1642 1642 1643 1643 static void clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, 1644 1644 gfn_t start, gfn_t end) 1645 1645 { 1646 - const u64 dbit = tdp_mmu_need_write_protect(root) ? PT_WRITABLE_MASK : 1647 - shadow_dirty_mask; 1646 + const u64 dbit = tdp_mmu_need_write_protect(kvm, root) ? 1647 + PT_WRITABLE_MASK : shadow_dirty_mask; 1648 1648 struct tdp_iter iter; 1649 1649 1650 1650 rcu_read_lock(); ··· 1689 1689 static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root, 1690 1690 gfn_t gfn, unsigned long mask, bool wrprot) 1691 1691 { 1692 - const u64 dbit = (wrprot || tdp_mmu_need_write_protect(root)) ? PT_WRITABLE_MASK : 1693 - shadow_dirty_mask; 1692 + const u64 dbit = (wrprot || tdp_mmu_need_write_protect(kvm, root)) ? 1693 + PT_WRITABLE_MASK : shadow_dirty_mask; 1694 1694 struct tdp_iter iter; 1695 1695 1696 1696 lockdep_assert_held_write(&kvm->mmu_lock); ··· 1911 1911 * 1912 1912 * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}. 
1913 1913 */ 1914 - int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, 1915 - int *root_level) 1914 + static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, 1915 + struct kvm_mmu_page *root) 1916 1916 { 1917 - struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa); 1918 1917 struct tdp_iter iter; 1919 1918 gfn_t gfn = addr >> PAGE_SHIFT; 1920 1919 int leaf = -1; 1921 - 1922 - *root_level = vcpu->arch.mmu->root_role.level; 1923 1920 1924 1921 for_each_tdp_pte(iter, vcpu->kvm, root, gfn, gfn + 1) { 1925 1922 leaf = iter.level; ··· 1925 1928 1926 1929 return leaf; 1927 1930 } 1931 + 1932 + int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, 1933 + int *root_level) 1934 + { 1935 + struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa); 1936 + *root_level = vcpu->arch.mmu->root_role.level; 1937 + 1938 + return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root); 1939 + } 1940 + 1941 + bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa) 1942 + { 1943 + struct kvm *kvm = vcpu->kvm; 1944 + bool is_direct = kvm_is_addr_direct(kvm, gpa); 1945 + hpa_t root = is_direct ? vcpu->arch.mmu->root.hpa : 1946 + vcpu->arch.mmu->mirror_root_hpa; 1947 + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte; 1948 + int leaf; 1949 + 1950 + lockdep_assert_held(&kvm->mmu_lock); 1951 + rcu_read_lock(); 1952 + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root)); 1953 + rcu_read_unlock(); 1954 + if (leaf < 0) 1955 + return false; 1956 + 1957 + spte = sptes[leaf]; 1958 + return is_shadow_present_pte(spte) && is_last_spte(spte, leaf); 1959 + } 1960 + EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped); 1928 1961 1929 1962 /* 1930 1963 * Returns the last level spte pointer of the shadow page walk for the given

+3

arch/x86/kvm/smm.h

··· 142 142 143 143 static inline int kvm_inject_smi(struct kvm_vcpu *vcpu) 144 144 { 145 + if (!kvm_x86_call(has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE)) 146 + return -ENOTTY; 147 + 145 148 kvm_make_request(KVM_REQ_SMI, vcpu); 146 149 return 0; 147 150 }

+1

arch/x86/kvm/svm/svm.c

··· 5551 5551 */ 5552 5552 allow_smaller_maxphyaddr = !npt_enabled; 5553 5553 5554 + kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED; 5554 5555 return 0; 5555 5556 5556 5557 err:

+182

arch/x86/kvm/vmx/common.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + #ifndef __KVM_X86_VMX_COMMON_H 3 + #define __KVM_X86_VMX_COMMON_H 4 + 5 + #include <linux/kvm_host.h> 6 + #include <asm/posted_intr.h> 7 + 8 + #include "mmu.h" 9 + 10 + union vmx_exit_reason { 11 + struct { 12 + u32 basic : 16; 13 + u32 reserved16 : 1; 14 + u32 reserved17 : 1; 15 + u32 reserved18 : 1; 16 + u32 reserved19 : 1; 17 + u32 reserved20 : 1; 18 + u32 reserved21 : 1; 19 + u32 reserved22 : 1; 20 + u32 reserved23 : 1; 21 + u32 reserved24 : 1; 22 + u32 reserved25 : 1; 23 + u32 bus_lock_detected : 1; 24 + u32 enclave_mode : 1; 25 + u32 smi_pending_mtf : 1; 26 + u32 smi_from_vmx_root : 1; 27 + u32 reserved30 : 1; 28 + u32 failed_vmentry : 1; 29 + }; 30 + u32 full; 31 + }; 32 + 33 + struct vcpu_vt { 34 + /* Posted interrupt descriptor */ 35 + struct pi_desc pi_desc; 36 + 37 + /* Used if this vCPU is waiting for PI notification wakeup. */ 38 + struct list_head pi_wakeup_list; 39 + 40 + union vmx_exit_reason exit_reason; 41 + 42 + unsigned long exit_qualification; 43 + u32 exit_intr_info; 44 + 45 + /* 46 + * If true, guest state has been loaded into hardware, and host state 47 + * saved into vcpu_{vt,vmx,tdx}. If false, host state is loaded into 48 + * hardware. 
49 + */ 50 + bool guest_state_loaded; 51 + bool emulation_required; 52 + 53 + #ifdef CONFIG_X86_64 54 + u64 msr_host_kernel_gs_base; 55 + #endif 56 + 57 + unsigned long host_debugctlmsr; 58 + }; 59 + 60 + #ifdef CONFIG_KVM_INTEL_TDX 61 + 62 + static __always_inline bool is_td(struct kvm *kvm) 63 + { 64 + return kvm->arch.vm_type == KVM_X86_TDX_VM; 65 + } 66 + 67 + static __always_inline bool is_td_vcpu(struct kvm_vcpu *vcpu) 68 + { 69 + return is_td(vcpu->kvm); 70 + } 71 + 72 + #else 73 + 74 + static inline bool is_td(struct kvm *kvm) { return false; } 75 + static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } 76 + 77 + #endif 78 + 79 + static inline bool vt_is_tdx_private_gpa(struct kvm *kvm, gpa_t gpa) 80 + { 81 + /* For TDX the direct mask is the shared mask. */ 82 + return !kvm_is_addr_direct(kvm, gpa); 83 + } 84 + 85 + static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, 86 + unsigned long exit_qualification) 87 + { 88 + u64 error_code; 89 + 90 + /* Is it a read fault? */ 91 + error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) 92 + ? PFERR_USER_MASK : 0; 93 + /* Is it a write fault? */ 94 + error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) 95 + ? PFERR_WRITE_MASK : 0; 96 + /* Is it a fetch fault? */ 97 + error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) 98 + ? PFERR_FETCH_MASK : 0; 99 + /* ept page table entry is present? */ 100 + error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK) 101 + ? PFERR_PRESENT_MASK : 0; 102 + 103 + if (error_code & EPT_VIOLATION_GVA_IS_VALID) 104 + error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? 
105 + PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; 106 + 107 + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) 108 + error_code |= PFERR_PRIVATE_ACCESS; 109 + 110 + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); 111 + } 112 + 113 + static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, 114 + int pi_vec) 115 + { 116 + #ifdef CONFIG_SMP 117 + if (vcpu->mode == IN_GUEST_MODE) { 118 + /* 119 + * The vector of the virtual has already been set in the PIR. 120 + * Send a notification event to deliver the virtual interrupt 121 + * unless the vCPU is the currently running vCPU, i.e. the 122 + * event is being sent from a fastpath VM-Exit handler, in 123 + * which case the PIR will be synced to the vIRR before 124 + * re-entering the guest. 125 + * 126 + * When the target is not the running vCPU, the following 127 + * possibilities emerge: 128 + * 129 + * Case 1: vCPU stays in non-root mode. Sending a notification 130 + * event posts the interrupt to the vCPU. 131 + * 132 + * Case 2: vCPU exits to root mode and is still runnable. The 133 + * PIR will be synced to the vIRR before re-entering the guest. 134 + * Sending a notification event is ok as the host IRQ handler 135 + * will ignore the spurious event. 136 + * 137 + * Case 3: vCPU exits to root mode and is blocked. vcpu_block() 138 + * has already synced PIR to vIRR and never blocks the vCPU if 139 + * the vIRR is not empty. Therefore, a blocked vCPU here does 140 + * not wait for any requested interrupts in PIR, and sending a 141 + * notification event also results in a benign, spurious event. 142 + */ 143 + 144 + if (vcpu != kvm_get_running_vcpu()) 145 + __apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); 146 + return; 147 + } 148 + #endif 149 + /* 150 + * The vCPU isn't in the guest; wake the vCPU in case it is blocking, 151 + * otherwise do nothing as KVM will grab the highest priority pending 152 + * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). 
153 + */ 154 + kvm_vcpu_wake_up(vcpu); 155 + } 156 + 157 + /* 158 + * Post an interrupt to a vCPU's PIR and trigger the vCPU to process the 159 + * interrupt if necessary. 160 + */ 161 + static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, 162 + struct pi_desc *pi_desc, int vector) 163 + { 164 + if (pi_test_and_set_pir(vector, pi_desc)) 165 + return; 166 + 167 + /* If a previous notification has sent the IPI, nothing to do. */ 168 + if (pi_test_and_set_on(pi_desc)) 169 + return; 170 + 171 + /* 172 + * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() 173 + * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is 174 + * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a 175 + * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE. 176 + */ 177 + kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); 178 + } 179 + 180 + noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu); 181 + 182 + #endif /* __KVM_X86_VMX_COMMON_H */
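
Note: `__vmx_handle_ept_violation()` above translates the EPT-violation exit qualification into a page-fault error code before calling `kvm_mmu_page_fault()`. The sketch below shows that translation in isolation. The bit positions are assumptions chosen to mirror the kernel headers and are not taken from this diff, and all `SK_*` / `sketch_*` names are hypothetical. The sketch tests the GVA-valid bit on the exit qualification, which is an assumption about the intent; the hunk as shown tests it on `error_code`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_BIT(n) (1ULL << (n))

/* Assumed exit-qualification bits for an EPT violation. */
#define SK_EPT_ACC_READ        SK_BIT(0)
#define SK_EPT_ACC_WRITE       SK_BIT(1)
#define SK_EPT_ACC_INSTR       SK_BIT(2)
#define SK_EPT_PROT_MASK       (SK_BIT(3) | SK_BIT(4) | SK_BIT(5))
#define SK_EPT_GVA_IS_VALID    SK_BIT(7)
#define SK_EPT_GVA_TRANSLATED  SK_BIT(8)

/* Assumed page-fault error-code bits. */
#define SK_PFERR_PRESENT       SK_BIT(0)
#define SK_PFERR_WRITE         SK_BIT(1)
#define SK_PFERR_USER          SK_BIT(2)
#define SK_PFERR_FETCH         SK_BIT(4)
#define SK_PFERR_GUEST_FINAL   SK_BIT(32)
#define SK_PFERR_GUEST_PAGE    SK_BIT(33)
#define SK_PFERR_PRIVATE       SK_BIT(49)

static uint64_t sketch_ept_error_code(uint64_t qual, bool gpa_is_private)
{
	uint64_t error_code = 0;

	/* Access type: read, write, instruction fetch. */
	if (qual & SK_EPT_ACC_READ)
		error_code |= SK_PFERR_USER;
	if (qual & SK_EPT_ACC_WRITE)
		error_code |= SK_PFERR_WRITE;
	if (qual & SK_EPT_ACC_INSTR)
		error_code |= SK_PFERR_FETCH;

	/* Any permission bit set means the EPT entry was present. */
	if (qual & SK_EPT_PROT_MASK)
		error_code |= SK_PFERR_PRESENT;

	/* Final translation vs. guest page-table walk, when a GVA is valid. */
	if (qual & SK_EPT_GVA_IS_VALID)
		error_code |= (qual & SK_EPT_GVA_TRANSLATED) ?
			      SK_PFERR_GUEST_FINAL : SK_PFERR_GUEST_PAGE;

	if (gpa_is_private)
		error_code |= SK_PFERR_PRIVATE;

	return error_code;
}
```

Keeping the translation as a pure function of the qualification and the private/shared classification is what lets the same helper serve both the VMX and TDX exit paths.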

+1032 -89

arch/x86/kvm/vmx/main.c

··· 3 3 4 4 #include "x86_ops.h" 5 5 #include "vmx.h" 6 + #include "mmu.h" 6 7 #include "nested.h" 7 8 #include "pmu.h" 8 9 #include "posted_intr.h" 10 + #include "tdx.h" 11 + #include "tdx_arch.h" 12 + 13 + #ifdef CONFIG_KVM_INTEL_TDX 14 + static_assert(offsetof(struct vcpu_vmx, vt) == offsetof(struct vcpu_tdx, vt)); 15 + #endif 16 + 17 + static void vt_disable_virtualization_cpu(void) 18 + { 19 + /* Note, TDX *and* VMX need to be disabled if TDX is enabled. */ 20 + if (enable_tdx) 21 + tdx_disable_virtualization_cpu(); 22 + vmx_disable_virtualization_cpu(); 23 + } 24 + 25 + static __init int vt_hardware_setup(void) 26 + { 27 + int ret; 28 + 29 + ret = vmx_hardware_setup(); 30 + if (ret) 31 + return ret; 32 + 33 + /* 34 + * Update vt_x86_ops::vm_size here so it is ready before 35 + * kvm_ops_update() is called in kvm_x86_vendor_init(). 36 + * 37 + * Note, the actual bringing up of TDX must be done after 38 + * kvm_ops_update() because enabling TDX requires enabling 39 + * hardware virtualization first, i.e., all online CPUs must 40 + * be in post-VMXON state. This means the @vm_size here 41 + * may be updated to TDX's size but TDX may fail to enable 42 + * at later time. 43 + * 44 + * The VMX/VT code could update kvm_x86_ops::vm_size again 45 + * after bringing up TDX, but this would require exporting 46 + * either kvm_x86_ops or kvm_ops_update() from the base KVM 47 + * module, which looks overkill. Anyway, the worst case here 48 + * is KVM may allocate couple of more bytes than needed for 49 + * each VM. 50 + */ 51 + if (enable_tdx) { 52 + vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, 53 + sizeof(struct kvm_tdx)); 54 + /* 55 + * Note, TDX may fail to initialize in a later time in 56 + * vt_init(), in which case it is not necessary to setup 57 + * those callbacks. But making them valid here even 58 + * when TDX fails to init later is fine because those 59 + * callbacks won't be called if the VM isn't TDX guest. 
60 + */ 61 + vt_x86_ops.link_external_spt = tdx_sept_link_private_spt; 62 + vt_x86_ops.set_external_spte = tdx_sept_set_private_spte; 63 + vt_x86_ops.free_external_spt = tdx_sept_free_private_spt; 64 + vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte; 65 + vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt; 66 + } 67 + 68 + return 0; 69 + } 70 + 71 + static int vt_vm_init(struct kvm *kvm) 72 + { 73 + if (is_td(kvm)) 74 + return tdx_vm_init(kvm); 75 + 76 + return vmx_vm_init(kvm); 77 + } 78 + 79 + static void vt_vm_pre_destroy(struct kvm *kvm) 80 + { 81 + if (is_td(kvm)) 82 + return tdx_mmu_release_hkid(kvm); 83 + } 84 + 85 + static void vt_vm_destroy(struct kvm *kvm) 86 + { 87 + if (is_td(kvm)) 88 + return tdx_vm_destroy(kvm); 89 + 90 + vmx_vm_destroy(kvm); 91 + } 92 + 93 + static int vt_vcpu_precreate(struct kvm *kvm) 94 + { 95 + if (is_td(kvm)) 96 + return 0; 97 + 98 + return vmx_vcpu_precreate(kvm); 99 + } 100 + 101 + static int vt_vcpu_create(struct kvm_vcpu *vcpu) 102 + { 103 + if (is_td_vcpu(vcpu)) 104 + return tdx_vcpu_create(vcpu); 105 + 106 + return vmx_vcpu_create(vcpu); 107 + } 108 + 109 + static void vt_vcpu_free(struct kvm_vcpu *vcpu) 110 + { 111 + if (is_td_vcpu(vcpu)) { 112 + tdx_vcpu_free(vcpu); 113 + return; 114 + } 115 + 116 + vmx_vcpu_free(vcpu); 117 + } 118 + 119 + static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) 120 + { 121 + if (is_td_vcpu(vcpu)) { 122 + tdx_vcpu_reset(vcpu, init_event); 123 + return; 124 + } 125 + 126 + vmx_vcpu_reset(vcpu, init_event); 127 + } 128 + 129 + static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 130 + { 131 + if (is_td_vcpu(vcpu)) { 132 + tdx_vcpu_load(vcpu, cpu); 133 + return; 134 + } 135 + 136 + vmx_vcpu_load(vcpu, cpu); 137 + } 138 + 139 + static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) 140 + { 141 + /* 142 + * Basic TDX does not support feature PML. 
KVM does not enable PML in 143 + * TD's VMCS, nor does it allocate or flush PML buffer for TDX. 144 + */ 145 + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) 146 + return; 147 + 148 + vmx_update_cpu_dirty_logging(vcpu); 149 + } 150 + 151 + static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu) 152 + { 153 + if (is_td_vcpu(vcpu)) { 154 + tdx_prepare_switch_to_guest(vcpu); 155 + return; 156 + } 157 + 158 + vmx_prepare_switch_to_guest(vcpu); 159 + } 160 + 161 + static void vt_vcpu_put(struct kvm_vcpu *vcpu) 162 + { 163 + if (is_td_vcpu(vcpu)) { 164 + tdx_vcpu_put(vcpu); 165 + return; 166 + } 167 + 168 + vmx_vcpu_put(vcpu); 169 + } 170 + 171 + static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu) 172 + { 173 + if (is_td_vcpu(vcpu)) 174 + return tdx_vcpu_pre_run(vcpu); 175 + 176 + return vmx_vcpu_pre_run(vcpu); 177 + } 178 + 179 + static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) 180 + { 181 + if (is_td_vcpu(vcpu)) 182 + return tdx_vcpu_run(vcpu, force_immediate_exit); 183 + 184 + return vmx_vcpu_run(vcpu, force_immediate_exit); 185 + } 186 + 187 + static int vt_handle_exit(struct kvm_vcpu *vcpu, 188 + enum exit_fastpath_completion fastpath) 189 + { 190 + if (is_td_vcpu(vcpu)) 191 + return tdx_handle_exit(vcpu, fastpath); 192 + 193 + return vmx_handle_exit(vcpu, fastpath); 194 + } 195 + 196 + static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 197 + { 198 + if (unlikely(is_td_vcpu(vcpu))) 199 + return tdx_set_msr(vcpu, msr_info); 200 + 201 + return vmx_set_msr(vcpu, msr_info); 202 + } 203 + 204 + /* 205 + * The kvm parameter can be NULL (module initialization, or invocation before 206 + * VM creation). Be sure to check the kvm parameter before using it. 
207 + */ 208 + static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) 209 + { 210 + if (kvm && is_td(kvm)) 211 + return tdx_has_emulated_msr(index); 212 + 213 + return vmx_has_emulated_msr(kvm, index); 214 + } 215 + 216 + static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 217 + { 218 + if (unlikely(is_td_vcpu(vcpu))) 219 + return tdx_get_msr(vcpu, msr_info); 220 + 221 + return vmx_get_msr(vcpu, msr_info); 222 + } 223 + 224 + static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) 225 + { 226 + /* 227 + * TDX doesn't allow VMM to configure interception of MSR accesses. 228 + * TDX guest requests MSR accesses by calling TDVMCALL. The MSR 229 + * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR 230 + * if the userspace has set any. 231 + */ 232 + if (is_td_vcpu(vcpu)) 233 + return; 234 + 235 + vmx_msr_filter_changed(vcpu); 236 + } 237 + 238 + static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) 239 + { 240 + if (is_td_vcpu(vcpu)) 241 + return tdx_complete_emulated_msr(vcpu, err); 242 + 243 + return kvm_complete_insn_gp(vcpu, err); 244 + } 245 + 246 + #ifdef CONFIG_KVM_SMM 247 + static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) 248 + { 249 + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) 250 + return 0; 251 + 252 + return vmx_smi_allowed(vcpu, for_injection); 253 + } 254 + 255 + static int vt_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram) 256 + { 257 + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) 258 + return 0; 259 + 260 + return vmx_enter_smm(vcpu, smram); 261 + } 262 + 263 + static int vt_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram) 264 + { 265 + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) 266 + return 0; 267 + 268 + return vmx_leave_smm(vcpu, smram); 269 + } 270 + 271 + static void vt_enable_smi_window(struct kvm_vcpu *vcpu) 272 + { 273 + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) 274 + return; 275 + 276 + /* RSM will cause a vmexit anyway. 
*/ 277 + vmx_enable_smi_window(vcpu); 278 + } 279 + #endif 280 + 281 + static int vt_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, 282 + void *insn, int insn_len) 283 + { 284 + /* 285 + * For TDX, this can only be triggered for MMIO emulation. Let the 286 + * guest retry after installing the SPTE with suppress #VE bit cleared, 287 + * so that the guest will receive #VE when retry. The guest is expected 288 + * to call TDG.VP.VMCALL<MMIO> to request VMM to do MMIO emulation on 289 + * #VE. 290 + */ 291 + if (is_td_vcpu(vcpu)) 292 + return X86EMUL_RETRY_INSTR; 293 + 294 + return vmx_check_emulate_instruction(vcpu, emul_type, insn, insn_len); 295 + } 296 + 297 + static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) 298 + { 299 + /* 300 + * INIT and SIPI are always blocked for TDX, i.e., INIT handling and 301 + * the OP vcpu_deliver_sipi_vector() won't be called. 302 + */ 303 + if (is_td_vcpu(vcpu)) 304 + return true; 305 + 306 + return vmx_apic_init_signal_blocked(vcpu); 307 + } 308 + 309 + static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu) 310 + { 311 + /* Only x2APIC mode is supported for TD. 
*/ 312 + if (is_td_vcpu(vcpu)) 313 + return; 314 + 315 + return vmx_set_virtual_apic_mode(vcpu); 316 + } 317 + 318 + static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu) 319 + { 320 + struct pi_desc *pi = vcpu_to_pi_desc(vcpu); 321 + 322 + pi_clear_on(pi); 323 + memset(pi->pir, 0, sizeof(pi->pir)); 324 + } 325 + 326 + static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) 327 + { 328 + if (is_td_vcpu(vcpu)) 329 + return; 330 + 331 + return vmx_hwapic_isr_update(vcpu, max_isr); 332 + } 333 + 334 + static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) 335 + { 336 + if (is_td_vcpu(vcpu)) 337 + return -1; 338 + 339 + return vmx_sync_pir_to_irr(vcpu); 340 + } 341 + 342 + static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, 343 + int trig_mode, int vector) 344 + { 345 + if (is_td_vcpu(apic->vcpu)) { 346 + tdx_deliver_interrupt(apic, delivery_mode, trig_mode, 347 + vector); 348 + return; 349 + } 350 + 351 + vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); 352 + } 353 + 354 + static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) 355 + { 356 + if (is_td_vcpu(vcpu)) 357 + return; 358 + 359 + vmx_vcpu_after_set_cpuid(vcpu); 360 + } 361 + 362 + static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) 363 + { 364 + if (is_td_vcpu(vcpu)) 365 + return; 366 + 367 + vmx_update_exception_bitmap(vcpu); 368 + } 369 + 370 + static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) 371 + { 372 + if (is_td_vcpu(vcpu)) 373 + return 0; 374 + 375 + return vmx_get_segment_base(vcpu, seg); 376 + } 377 + 378 + static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, 379 + int seg) 380 + { 381 + if (is_td_vcpu(vcpu)) { 382 + memset(var, 0, sizeof(*var)); 383 + return; 384 + } 385 + 386 + vmx_get_segment(vcpu, var, seg); 387 + } 388 + 389 + static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, 390 + int seg) 391 + { 392 + if (is_td_vcpu(vcpu)) 393 + return; 394 + 395 + 
vmx_set_segment(vcpu, var, seg); 396 + } 397 + 398 + static int vt_get_cpl(struct kvm_vcpu *vcpu) 399 + { 400 + if (is_td_vcpu(vcpu)) 401 + return 0; 402 + 403 + return vmx_get_cpl(vcpu); 404 + } 405 + 406 + static int vt_get_cpl_no_cache(struct kvm_vcpu *vcpu) 407 + { 408 + if (is_td_vcpu(vcpu)) 409 + return 0; 410 + 411 + return vmx_get_cpl_no_cache(vcpu); 412 + } 413 + 414 + static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) 415 + { 416 + if (is_td_vcpu(vcpu)) { 417 + *db = 0; 418 + *l = 0; 419 + return; 420 + } 421 + 422 + vmx_get_cs_db_l_bits(vcpu, db, l); 423 + } 424 + 425 + static bool vt_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) 426 + { 427 + if (is_td_vcpu(vcpu)) 428 + return true; 429 + 430 + return vmx_is_valid_cr0(vcpu, cr0); 431 + } 432 + 433 + static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) 434 + { 435 + if (is_td_vcpu(vcpu)) 436 + return; 437 + 438 + vmx_set_cr0(vcpu, cr0); 439 + } 440 + 441 + static bool vt_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 442 + { 443 + if (is_td_vcpu(vcpu)) 444 + return true; 445 + 446 + return vmx_is_valid_cr4(vcpu, cr4); 447 + } 448 + 449 + static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 450 + { 451 + if (is_td_vcpu(vcpu)) 452 + return; 453 + 454 + vmx_set_cr4(vcpu, cr4); 455 + } 456 + 457 + static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) 458 + { 459 + if (is_td_vcpu(vcpu)) 460 + return 0; 461 + 462 + return vmx_set_efer(vcpu, efer); 463 + } 464 + 465 + static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) 466 + { 467 + if (is_td_vcpu(vcpu)) { 468 + memset(dt, 0, sizeof(*dt)); 469 + return; 470 + } 471 + 472 + vmx_get_idt(vcpu, dt); 473 + } 474 + 475 + static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) 476 + { 477 + if (is_td_vcpu(vcpu)) 478 + return; 479 + 480 + vmx_set_idt(vcpu, dt); 481 + } 482 + 483 + static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) 484 + { 485 + if 
(is_td_vcpu(vcpu)) { 486 + memset(dt, 0, sizeof(*dt)); 487 + return; 488 + } 489 + 490 + vmx_get_gdt(vcpu, dt); 491 + } 492 + 493 + static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) 494 + { 495 + if (is_td_vcpu(vcpu)) 496 + return; 497 + 498 + vmx_set_gdt(vcpu, dt); 499 + } 500 + 501 + static void vt_set_dr6(struct kvm_vcpu *vcpu, unsigned long val) 502 + { 503 + if (is_td_vcpu(vcpu)) 504 + return; 505 + 506 + vmx_set_dr6(vcpu, val); 507 + } 508 + 509 + static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) 510 + { 511 + if (is_td_vcpu(vcpu)) 512 + return; 513 + 514 + vmx_set_dr7(vcpu, val); 515 + } 516 + 517 + static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) 518 + { 519 + /* 520 + * MOV-DR exiting is always cleared for TD guest, even in debug mode. 521 + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never 522 + * reach here for TD vcpu. 523 + */ 524 + if (is_td_vcpu(vcpu)) 525 + return; 526 + 527 + vmx_sync_dirty_debug_regs(vcpu); 528 + } 529 + 530 + static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) 531 + { 532 + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) 533 + return; 534 + 535 + vmx_cache_reg(vcpu, reg); 536 + } 537 + 538 + static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) 539 + { 540 + if (is_td_vcpu(vcpu)) 541 + return 0; 542 + 543 + return vmx_get_rflags(vcpu); 544 + } 545 + 546 + static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) 547 + { 548 + if (is_td_vcpu(vcpu)) 549 + return; 550 + 551 + vmx_set_rflags(vcpu, rflags); 552 + } 553 + 554 + static bool vt_get_if_flag(struct kvm_vcpu *vcpu) 555 + { 556 + if (is_td_vcpu(vcpu)) 557 + return false; 558 + 559 + return vmx_get_if_flag(vcpu); 560 + } 561 + 562 + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) 563 + { 564 + if (is_td_vcpu(vcpu)) { 565 + tdx_flush_tlb_all(vcpu); 566 + return; 567 + } 568 + 569 + vmx_flush_tlb_all(vcpu); 570 + } 571 + 572 + static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) 573 + { 574 + 
if (is_td_vcpu(vcpu)) { 575 + tdx_flush_tlb_current(vcpu); 576 + return; 577 + } 578 + 579 + vmx_flush_tlb_current(vcpu); 580 + } 581 + 582 + static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) 583 + { 584 + if (is_td_vcpu(vcpu)) 585 + return; 586 + 587 + vmx_flush_tlb_gva(vcpu, addr); 588 + } 589 + 590 + static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) 591 + { 592 + if (is_td_vcpu(vcpu)) 593 + return; 594 + 595 + vmx_flush_tlb_guest(vcpu); 596 + } 597 + 598 + static void vt_inject_nmi(struct kvm_vcpu *vcpu) 599 + { 600 + if (is_td_vcpu(vcpu)) { 601 + tdx_inject_nmi(vcpu); 602 + return; 603 + } 604 + 605 + vmx_inject_nmi(vcpu); 606 + } 607 + 608 + static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) 609 + { 610 + /* 611 + * The TDX module manages NMI windows and NMI reinjection, and hides NMI 612 + * blocking, all KVM can do is throw an NMI over the wall. 613 + */ 614 + if (is_td_vcpu(vcpu)) 615 + return true; 616 + 617 + return vmx_nmi_allowed(vcpu, for_injection); 618 + } 619 + 620 + static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu) 621 + { 622 + /* 623 + * KVM can't get NMI blocking status for TDX guest, assume NMIs are 624 + * always unmasked. 625 + */ 626 + if (is_td_vcpu(vcpu)) 627 + return false; 628 + 629 + return vmx_get_nmi_mask(vcpu); 630 + } 631 + 632 + static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) 633 + { 634 + if (is_td_vcpu(vcpu)) 635 + return; 636 + 637 + vmx_set_nmi_mask(vcpu, masked); 638 + } 639 + 640 + static void vt_enable_nmi_window(struct kvm_vcpu *vcpu) 641 + { 642 + /* Refer to the comments in tdx_inject_nmi(). 
*/ 643 + if (is_td_vcpu(vcpu)) 644 + return; 645 + 646 + vmx_enable_nmi_window(vcpu); 647 + } 648 + 649 + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, 650 + int pgd_level) 651 + { 652 + if (is_td_vcpu(vcpu)) { 653 + tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level); 654 + return; 655 + } 656 + 657 + vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); 658 + } 659 + 660 + static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) 661 + { 662 + if (is_td_vcpu(vcpu)) 663 + return; 664 + 665 + vmx_set_interrupt_shadow(vcpu, mask); 666 + } 667 + 668 + static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) 669 + { 670 + if (is_td_vcpu(vcpu)) 671 + return 0; 672 + 673 + return vmx_get_interrupt_shadow(vcpu); 674 + } 675 + 676 + static void vt_patch_hypercall(struct kvm_vcpu *vcpu, 677 + unsigned char *hypercall) 678 + { 679 + /* 680 + * Because guest memory is protected, guest can't be patched. TD kernel 681 + * is modified to use TDG.VP.VMCALL for hypercall. 682 + */ 683 + if (is_td_vcpu(vcpu)) 684 + return; 685 + 686 + vmx_patch_hypercall(vcpu, hypercall); 687 + } 688 + 689 + static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) 690 + { 691 + if (is_td_vcpu(vcpu)) 692 + return; 693 + 694 + vmx_inject_irq(vcpu, reinjected); 695 + } 696 + 697 + static void vt_inject_exception(struct kvm_vcpu *vcpu) 698 + { 699 + if (is_td_vcpu(vcpu)) 700 + return; 701 + 702 + vmx_inject_exception(vcpu); 703 + } 704 + 705 + static void vt_cancel_injection(struct kvm_vcpu *vcpu) 706 + { 707 + if (is_td_vcpu(vcpu)) 708 + return; 709 + 710 + vmx_cancel_injection(vcpu); 711 + } 712 + 713 + static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) 714 + { 715 + if (is_td_vcpu(vcpu)) 716 + return tdx_interrupt_allowed(vcpu); 717 + 718 + return vmx_interrupt_allowed(vcpu, for_injection); 719 + } 720 + 721 + static void vt_enable_irq_window(struct kvm_vcpu *vcpu) 722 + { 723 + if (is_td_vcpu(vcpu)) 724 + return; 725 + 726 + 
vmx_enable_irq_window(vcpu); 727 + } 728 + 729 + static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code) 730 + { 731 + *intr_info = 0; 732 + *error_code = 0; 733 + 734 + if (is_td_vcpu(vcpu)) 735 + return; 736 + 737 + vmx_get_entry_info(vcpu, intr_info, error_code); 738 + } 739 + 740 + static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, 741 + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) 742 + { 743 + if (is_td_vcpu(vcpu)) { 744 + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, 745 + error_code); 746 + return; 747 + } 748 + 749 + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); 750 + } 751 + 752 + static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) 753 + { 754 + if (is_td_vcpu(vcpu)) 755 + return; 756 + 757 + vmx_update_cr8_intercept(vcpu, tpr, irr); 758 + } 759 + 760 + static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) 761 + { 762 + if (is_td_vcpu(vcpu)) 763 + return; 764 + 765 + vmx_set_apic_access_page_addr(vcpu); 766 + } 767 + 768 + static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) 769 + { 770 + if (is_td_vcpu(vcpu)) { 771 + KVM_BUG_ON(!kvm_vcpu_apicv_active(vcpu), vcpu->kvm); 772 + return; 773 + } 774 + 775 + vmx_refresh_apicv_exec_ctrl(vcpu); 776 + } 777 + 778 + static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) 779 + { 780 + if (is_td_vcpu(vcpu)) 781 + return; 782 + 783 + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); 784 + } 785 + 786 + static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) 787 + { 788 + if (is_td(kvm)) 789 + return 0; 790 + 791 + return vmx_set_tss_addr(kvm, addr); 792 + } 793 + 794 + static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) 795 + { 796 + if (is_td(kvm)) 797 + return 0; 798 + 799 + return vmx_set_identity_map_addr(kvm, ident_addr); 800 + } 801 + 802 + static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu) 803 + { 804 + /* TDX doesn't support L2 
guest at the moment. */ 805 + if (is_td_vcpu(vcpu)) 806 + return 0; 807 + 808 + return vmx_get_l2_tsc_offset(vcpu); 809 + } 810 + 811 + static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) 812 + { 813 + /* TDX doesn't support L2 guest at the moment. */ 814 + if (is_td_vcpu(vcpu)) 815 + return 0; 816 + 817 + return vmx_get_l2_tsc_multiplier(vcpu); 818 + } 819 + 820 + static void vt_write_tsc_offset(struct kvm_vcpu *vcpu) 821 + { 822 + /* In TDX, tsc offset can't be changed. */ 823 + if (is_td_vcpu(vcpu)) 824 + return; 825 + 826 + vmx_write_tsc_offset(vcpu); 827 + } 828 + 829 + static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu) 830 + { 831 + /* In TDX, tsc multiplier can't be changed. */ 832 + if (is_td_vcpu(vcpu)) 833 + return; 834 + 835 + vmx_write_tsc_multiplier(vcpu); 836 + } 837 + 838 + #ifdef CONFIG_X86_64 839 + static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, 840 + bool *expired) 841 + { 842 + /* VMX-preemption timer isn't available for TDX. */ 843 + if (is_td_vcpu(vcpu)) 844 + return -EINVAL; 845 + 846 + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); 847 + } 848 + 849 + static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) 850 + { 851 + /* VMX-preemption timer can't be set. See vt_set_hv_timer(). 
*/ 852 + if (is_td_vcpu(vcpu)) 853 + return; 854 + 855 + vmx_cancel_hv_timer(vcpu); 856 + } 857 + #endif 858 + 859 + static void vt_setup_mce(struct kvm_vcpu *vcpu) 860 + { 861 + if (is_td_vcpu(vcpu)) 862 + return; 863 + 864 + vmx_setup_mce(vcpu); 865 + } 866 + 867 + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) 868 + { 869 + if (!is_td(kvm)) 870 + return -ENOTTY; 871 + 872 + return tdx_vm_ioctl(kvm, argp); 873 + } 874 + 875 + static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) 876 + { 877 + if (!is_td_vcpu(vcpu)) 878 + return -EINVAL; 879 + 880 + return tdx_vcpu_ioctl(vcpu, argp); 881 + } 882 + 883 + static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) 884 + { 885 + if (is_td(kvm)) 886 + return tdx_gmem_private_max_mapping_level(kvm, pfn); 887 + 888 + return 0; 889 + } 9 890 10 891 #define VMX_REQUIRED_APICV_INHIBITS \ 11 892 (BIT(APICV_INHIBIT_REASON_DISABLED) | \ ··· 905 24 .hardware_unsetup = vmx_hardware_unsetup, 906 25 907 26 .enable_virtualization_cpu = vmx_enable_virtualization_cpu, 908 - .disable_virtualization_cpu = vmx_disable_virtualization_cpu, 27 + .disable_virtualization_cpu = vt_disable_virtualization_cpu, 909 28 .emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu, 910 29 911 - .has_emulated_msr = vmx_has_emulated_msr, 30 + .has_emulated_msr = vt_has_emulated_msr, 912 31 913 32 .vm_size = sizeof(struct kvm_vmx), 914 - .vm_init = vmx_vm_init, 915 - .vm_destroy = vmx_vm_destroy, 916 33 917 - .vcpu_precreate = vmx_vcpu_precreate, 918 - .vcpu_create = vmx_vcpu_create, 919 - .vcpu_free = vmx_vcpu_free, 920 - .vcpu_reset = vmx_vcpu_reset, 34 + .vm_init = vt_vm_init, 35 + .vm_pre_destroy = vt_vm_pre_destroy, 36 + .vm_destroy = vt_vm_destroy, 921 37 922 - .prepare_switch_to_guest = vmx_prepare_switch_to_guest, 923 - .vcpu_load = vmx_vcpu_load, 924 - .vcpu_put = vmx_vcpu_put, 38 + .vcpu_precreate = vt_vcpu_precreate, 39 + .vcpu_create = vt_vcpu_create, 40 + 
.vcpu_free = vt_vcpu_free, 41 + .vcpu_reset = vt_vcpu_reset, 925 42 926 - .update_exception_bitmap = vmx_update_exception_bitmap, 43 + .prepare_switch_to_guest = vt_prepare_switch_to_guest, 44 + .vcpu_load = vt_vcpu_load, 45 + .vcpu_put = vt_vcpu_put, 46 + 47 + .update_exception_bitmap = vt_update_exception_bitmap, 927 48 .get_feature_msr = vmx_get_feature_msr, 928 - .get_msr = vmx_get_msr, 929 - .set_msr = vmx_set_msr, 930 - .get_segment_base = vmx_get_segment_base, 931 - .get_segment = vmx_get_segment, 932 - .set_segment = vmx_set_segment, 933 - .get_cpl = vmx_get_cpl, 934 - .get_cpl_no_cache = vmx_get_cpl_no_cache, 935 - .get_cs_db_l_bits = vmx_get_cs_db_l_bits, 936 - .is_valid_cr0 = vmx_is_valid_cr0, 937 - .set_cr0 = vmx_set_cr0, 938 - .is_valid_cr4 = vmx_is_valid_cr4, 939 - .set_cr4 = vmx_set_cr4, 940 - .set_efer = vmx_set_efer, 941 - .get_idt = vmx_get_idt, 942 - .set_idt = vmx_set_idt, 943 - .get_gdt = vmx_get_gdt, 944 - .set_gdt = vmx_set_gdt, 945 - .set_dr6 = vmx_set_dr6, 946 - .set_dr7 = vmx_set_dr7, 947 - .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs, 948 - .cache_reg = vmx_cache_reg, 949 - .get_rflags = vmx_get_rflags, 950 - .set_rflags = vmx_set_rflags, 951 - .get_if_flag = vmx_get_if_flag, 49 + .get_msr = vt_get_msr, 50 + .set_msr = vt_set_msr, 952 51 953 - .flush_tlb_all = vmx_flush_tlb_all, 954 - .flush_tlb_current = vmx_flush_tlb_current, 955 - .flush_tlb_gva = vmx_flush_tlb_gva, 956 - .flush_tlb_guest = vmx_flush_tlb_guest, 52 + .get_segment_base = vt_get_segment_base, 53 + .get_segment = vt_get_segment, 54 + .set_segment = vt_set_segment, 55 + .get_cpl = vt_get_cpl, 56 + .get_cpl_no_cache = vt_get_cpl_no_cache, 57 + .get_cs_db_l_bits = vt_get_cs_db_l_bits, 58 + .is_valid_cr0 = vt_is_valid_cr0, 59 + .set_cr0 = vt_set_cr0, 60 + .is_valid_cr4 = vt_is_valid_cr4, 61 + .set_cr4 = vt_set_cr4, 62 + .set_efer = vt_set_efer, 63 + .get_idt = vt_get_idt, 64 + .set_idt = vt_set_idt, 65 + .get_gdt = vt_get_gdt, 66 + .set_gdt = vt_set_gdt, 67 + .set_dr6 = 
vt_set_dr6, 68 + .set_dr7 = vt_set_dr7, 69 + .sync_dirty_debug_regs = vt_sync_dirty_debug_regs, 70 + .cache_reg = vt_cache_reg, 71 + .get_rflags = vt_get_rflags, 72 + .set_rflags = vt_set_rflags, 73 + .get_if_flag = vt_get_if_flag, 957 74 958 - .vcpu_pre_run = vmx_vcpu_pre_run, 959 - .vcpu_run = vmx_vcpu_run, 960 - .handle_exit = vmx_handle_exit, 75 + .flush_tlb_all = vt_flush_tlb_all, 76 + .flush_tlb_current = vt_flush_tlb_current, 77 + .flush_tlb_gva = vt_flush_tlb_gva, 78 + .flush_tlb_guest = vt_flush_tlb_guest, 79 + 80 + .vcpu_pre_run = vt_vcpu_pre_run, 81 + .vcpu_run = vt_vcpu_run, 82 + .handle_exit = vt_handle_exit, 961 83 .skip_emulated_instruction = vmx_skip_emulated_instruction, 962 84 .update_emulated_instruction = vmx_update_emulated_instruction, 963 - .set_interrupt_shadow = vmx_set_interrupt_shadow, 964 - .get_interrupt_shadow = vmx_get_interrupt_shadow, 965 - .patch_hypercall = vmx_patch_hypercall, 966 - .inject_irq = vmx_inject_irq, 967 - .inject_nmi = vmx_inject_nmi, 968 - .inject_exception = vmx_inject_exception, 969 - .cancel_injection = vmx_cancel_injection, 970 - .interrupt_allowed = vmx_interrupt_allowed, 971 - .nmi_allowed = vmx_nmi_allowed, 972 - .get_nmi_mask = vmx_get_nmi_mask, 973 - .set_nmi_mask = vmx_set_nmi_mask, 974 - .enable_nmi_window = vmx_enable_nmi_window, 975 - .enable_irq_window = vmx_enable_irq_window, 976 - .update_cr8_intercept = vmx_update_cr8_intercept, 85 + .set_interrupt_shadow = vt_set_interrupt_shadow, 86 + .get_interrupt_shadow = vt_get_interrupt_shadow, 87 + .patch_hypercall = vt_patch_hypercall, 88 + .inject_irq = vt_inject_irq, 89 + .inject_nmi = vt_inject_nmi, 90 + .inject_exception = vt_inject_exception, 91 + .cancel_injection = vt_cancel_injection, 92 + .interrupt_allowed = vt_interrupt_allowed, 93 + .nmi_allowed = vt_nmi_allowed, 94 + .get_nmi_mask = vt_get_nmi_mask, 95 + .set_nmi_mask = vt_set_nmi_mask, 96 + .enable_nmi_window = vt_enable_nmi_window, 97 + .enable_irq_window = vt_enable_irq_window, 98 + 
.update_cr8_intercept = vt_update_cr8_intercept, 977 99 978 100 .x2apic_icr_is_split = false, 979 - .set_virtual_apic_mode = vmx_set_virtual_apic_mode, 980 - .set_apic_access_page_addr = vmx_set_apic_access_page_addr, 981 - .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl, 982 - .load_eoi_exitmap = vmx_load_eoi_exitmap, 983 - .apicv_pre_state_restore = vmx_apicv_pre_state_restore, 101 + .set_virtual_apic_mode = vt_set_virtual_apic_mode, 102 + .set_apic_access_page_addr = vt_set_apic_access_page_addr, 103 + .refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl, 104 + .load_eoi_exitmap = vt_load_eoi_exitmap, 105 + .apicv_pre_state_restore = vt_apicv_pre_state_restore, 984 106 .required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS, 985 - .hwapic_isr_update = vmx_hwapic_isr_update, 986 - .sync_pir_to_irr = vmx_sync_pir_to_irr, 987 - .deliver_interrupt = vmx_deliver_interrupt, 107 + .hwapic_isr_update = vt_hwapic_isr_update, 108 + .sync_pir_to_irr = vt_sync_pir_to_irr, 109 + .deliver_interrupt = vt_deliver_interrupt, 988 110 .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt, 989 111 990 - .set_tss_addr = vmx_set_tss_addr, 991 - .set_identity_map_addr = vmx_set_identity_map_addr, 112 + .set_tss_addr = vt_set_tss_addr, 113 + .set_identity_map_addr = vt_set_identity_map_addr, 992 114 .get_mt_mask = vmx_get_mt_mask, 993 115 994 - .get_exit_info = vmx_get_exit_info, 995 - .get_entry_info = vmx_get_entry_info, 116 + .get_exit_info = vt_get_exit_info, 117 + .get_entry_info = vt_get_entry_info, 996 118 997 - .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, 119 + .vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid, 998 120 999 121 .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, 1000 122 1001 - .get_l2_tsc_offset = vmx_get_l2_tsc_offset, 1002 - .get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier, 1003 - .write_tsc_offset = vmx_write_tsc_offset, 1004 - .write_tsc_multiplier = vmx_write_tsc_multiplier, 123 + .get_l2_tsc_offset = vt_get_l2_tsc_offset, 124 + 
.get_l2_tsc_multiplier = vt_get_l2_tsc_multiplier, 125 + .write_tsc_offset = vt_write_tsc_offset, 126 + .write_tsc_multiplier = vt_write_tsc_multiplier, 1005 127 1006 - .load_mmu_pgd = vmx_load_mmu_pgd, 128 + .load_mmu_pgd = vt_load_mmu_pgd, 1007 129 1008 130 .check_intercept = vmx_check_intercept, 1009 131 .handle_exit_irqoff = vmx_handle_exit_irqoff, 1010 132 1011 - .cpu_dirty_log_size = PML_LOG_NR_ENTRIES, 1012 - .update_cpu_dirty_logging = vmx_update_cpu_dirty_logging, 133 + .update_cpu_dirty_logging = vt_update_cpu_dirty_logging, 1013 134 1014 135 .nested_ops = &vmx_nested_ops, 1015 136 ··· 1019 136 .pi_start_assignment = vmx_pi_start_assignment, 1020 137 1021 138 #ifdef CONFIG_X86_64 1022 - .set_hv_timer = vmx_set_hv_timer, 1023 - .cancel_hv_timer = vmx_cancel_hv_timer, 139 + .set_hv_timer = vt_set_hv_timer, 140 + .cancel_hv_timer = vt_cancel_hv_timer, 1024 141 #endif 1025 142 1026 - .setup_mce = vmx_setup_mce, 143 + .setup_mce = vt_setup_mce, 1027 144 1028 145 #ifdef CONFIG_KVM_SMM 1029 - .smi_allowed = vmx_smi_allowed, 1030 - .enter_smm = vmx_enter_smm, 1031 - .leave_smm = vmx_leave_smm, 1032 - .enable_smi_window = vmx_enable_smi_window, 146 + .smi_allowed = vt_smi_allowed, 147 + .enter_smm = vt_enter_smm, 148 + .leave_smm = vt_leave_smm, 149 + .enable_smi_window = vt_enable_smi_window, 1033 150 #endif 1034 151 1035 - .check_emulate_instruction = vmx_check_emulate_instruction, 1036 - .apic_init_signal_blocked = vmx_apic_init_signal_blocked, 152 + .check_emulate_instruction = vt_check_emulate_instruction, 153 + .apic_init_signal_blocked = vt_apic_init_signal_blocked, 1037 154 .migrate_timers = vmx_migrate_timers, 1038 155 1039 - .msr_filter_changed = vmx_msr_filter_changed, 1040 - .complete_emulated_msr = kvm_complete_insn_gp, 156 + .msr_filter_changed = vt_msr_filter_changed, 157 + .complete_emulated_msr = vt_complete_emulated_msr, 1041 158 1042 159 .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, 1043 160 1044 161 .get_untagged_addr = 
vmx_get_untagged_addr, 162 + 163 + .mem_enc_ioctl = vt_mem_enc_ioctl, 164 + .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl, 165 + 166 + .private_max_mapping_level = vt_gmem_private_max_mapping_level 1045 167 }; 1046 168 1047 169 struct kvm_x86_init_ops vt_init_ops __initdata = { 1048 - .hardware_setup = vmx_hardware_setup, 170 + .hardware_setup = vt_hardware_setup, 1049 171 .handle_intel_pt_intr = NULL, 1050 172 1051 173 .runtime_ops = &vt_x86_ops, 1052 174 .pmu_ops = &intel_pmu_ops, 1053 175 }; 176 + 177 + static void __exit vt_exit(void) 178 + { 179 + kvm_exit(); 180 + tdx_cleanup(); 181 + vmx_exit(); 182 + } 183 + module_exit(vt_exit); 184 + 185 + static int __init vt_init(void) 186 + { 187 + unsigned vcpu_size, vcpu_align; 188 + int r; 189 + 190 + r = vmx_init(); 191 + if (r) 192 + return r; 193 + 194 + /* tdx_init() has been taken */ 195 + r = tdx_bringup(); 196 + if (r) 197 + goto err_tdx_bringup; 198 + 199 + /* 200 + * TDX and VMX have different vCPU structures. Calculate the 201 + * maximum size/align so that kvm_init() can use the larger 202 + * values to create the kmem_vcpu_cache. 203 + */ 204 + vcpu_size = sizeof(struct vcpu_vmx); 205 + vcpu_align = __alignof__(struct vcpu_vmx); 206 + if (enable_tdx) { 207 + vcpu_size = max_t(unsigned, vcpu_size, 208 + sizeof(struct vcpu_tdx)); 209 + vcpu_align = max_t(unsigned, vcpu_align, 210 + __alignof__(struct vcpu_tdx)); 211 + kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM); 212 + } 213 + 214 + /* 215 + * Common KVM initialization _must_ come last, after this, /dev/kvm is 216 + * exposed to userspace! 217 + */ 218 + r = kvm_init(vcpu_size, vcpu_align, THIS_MODULE); 219 + if (r) 220 + goto err_kvm_init; 221 + 222 + return 0; 223 + 224 + err_kvm_init: 225 + tdx_cleanup(); 226 + err_tdx_bringup: 227 + vmx_exit(); 228 + return r; 229 + } 230 + module_init(vt_init);
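
Note on the main.c change above: nearly every vt_* wrapper takes one of two shapes. It either forwards to the tdx_* handler for a TD and falls through to the vmx_* handler otherwise, or it returns a fixed value for a TD because the host cannot observe or change that guest state. The following is a minimal userspace sketch of that dispatch pattern, not kernel code. Every demo_* name, including struct demo_vcpu and its is_td flag, is a hypothetical stand-in invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu; not a kernel type. */
struct demo_vcpu {
	bool is_td;		/* plays the role of is_td_vcpu() */
	unsigned long rflags;
};

enum demo_backend { DEMO_BACKEND_VMX, DEMO_BACKEND_TDX };

/* Backend implementations (illustrative bodies only). */
static unsigned long demo_vmx_get_rflags(struct demo_vcpu *vcpu)
{
	return vcpu->rflags;
}

static enum demo_backend demo_vmx_vcpu_run(struct demo_vcpu *vcpu)
{
	(void)vcpu;
	return DEMO_BACKEND_VMX;
}

static enum demo_backend demo_tdx_vcpu_run(struct demo_vcpu *vcpu)
{
	(void)vcpu;
	return DEMO_BACKEND_TDX;
}

/* Shape 1: forward to the TDX handler, else fall through to VMX. */
static enum demo_backend demo_vt_vcpu_run(struct demo_vcpu *vcpu)
{
	if (vcpu->is_td)
		return demo_tdx_vcpu_run(vcpu);

	return demo_vmx_vcpu_run(vcpu);
}

/* Shape 2: TD state is hidden from the host, so report a fixed value. */
static unsigned long demo_vt_get_rflags(struct demo_vcpu *vcpu)
{
	if (vcpu->is_td)
		return 0;

	return demo_vmx_get_rflags(vcpu);
}

/* One ops table serves both guest types; callers never test is_td. */
struct demo_ops {
	enum demo_backend (*vcpu_run)(struct demo_vcpu *vcpu);
	unsigned long (*get_rflags)(struct demo_vcpu *vcpu);
};

static const struct demo_ops demo_vt_ops = {
	.vcpu_run = demo_vt_vcpu_run,
	.get_rflags = demo_vt_get_rflags,
};

/* Generic caller, analogous to common code going through the ops table. */
static unsigned long demo_read_rflags(struct demo_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return demo_vt_ops.get_rflags(vcpu);
}
```

In this sketch the guest-type check lives only in the wrappers, so the ops table can be shared and the per-backend handlers stay free of cross-checks. That appears to be the reason the diff swaps vmx_* entries for vt_* entries in the table rather than touching the callers.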

+6 -6

arch/x86/kvm/vmx/nested.c

···
 {
 	struct vmcs_host_state *dest, *src;
 
-	if (unlikely(!vmx->guest_state_loaded))
+	if (unlikely(!vmx->vt.guest_state_loaded))
 		return;
 
 	src = &prev->host_state;
···
 		 * tables also changed, but KVM should not treat EPT Misconfig
 		 * VM-Exits as writes.
 		 */
-		WARN_ON_ONCE(vmx->exit_reason.basic != EXIT_REASON_EPT_VIOLATION);
+		WARN_ON_ONCE(vmx->vt.exit_reason.basic != EXIT_REASON_EPT_VIOLATION);
 
 		/*
 		 * PML Full and EPT Violation VM-Exits both use bit 12 to report
···
 {
 	/* update exit information fields: */
 	vmcs12->vm_exit_reason = vm_exit_reason;
-	if (to_vmx(vcpu)->exit_reason.enclave_mode)
+	if (vmx_get_exit_reason(vcpu).enclave_mode)
 		vmcs12->vm_exit_reason |= VMX_EXIT_REASONS_SGX_ENCLAVE_MODE;
 	vmcs12->exit_qualification = exit_qualification;
 
···
 				vmcs12->vm_exit_msr_load_count))
 		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_MSR_FAIL);
 
-	to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu);
+	to_vt(vcpu)->emulation_required = vmx_emulation_required(vcpu);
 }
 
 static inline u64 nested_vmx_get_vmcs01_guest_efer(struct vcpu_vmx *vmx)
···
 	 * nested VM-Exit. Pass the original exit reason, i.e. don't hardcode
 	 * EXIT_REASON_VMFUNC as the exit reason.
 	 */
-	nested_vmx_vmexit(vcpu, vmx->exit_reason.full,
+	nested_vmx_vmexit(vcpu, vmx->vt.exit_reason.full,
 			  vmx_get_intr_info(vcpu),
 			  vmx_get_exit_qual(vcpu));
 	return 1;
···
 bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	union vmx_exit_reason exit_reason = vmx->exit_reason;
+	union vmx_exit_reason exit_reason = vmx->vt.exit_reason;
 	unsigned long exit_qual;
 	u32 exit_intr_info;
 
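
Note on the nested.c change above: fields such as exit_reason, guest_state_loaded and emulation_required are now reached through an embedded vt member (vmx->vt.…) or through to_vt(vcpu), so that code shared between VMX and TDX can use them without knowing the backend. The definition of to_vt() is not part of this excerpt. The sketch below shows one way such an accessor could work, assuming the shared struct sits at the same offset in both backend structures. All demo_* names and layouts are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these are kernel definitions. */
struct demo_vcpu {
	int id;
};

/* State shared by both backends (exit reason, emulation flag, ...). */
struct demo_vcpu_vt {
	unsigned int exit_reason;
	int emulation_required;
};

struct demo_vcpu_vmx {
	struct demo_vcpu vcpu;		/* generic part comes first */
	struct demo_vcpu_vt vt;		/* shared part, same offset as below */
	unsigned long vmx_only_state;
};

struct demo_vcpu_tdx {
	struct demo_vcpu vcpu;
	struct demo_vcpu_vt vt;
	unsigned long tdx_only_state[4];
};

/* The accessor below is only valid if both layouts agree. */
static_assert(offsetof(struct demo_vcpu_vmx, vcpu) == 0,
	      "generic vcpu must be the first member");
static_assert(offsetof(struct demo_vcpu_tdx, vcpu) == 0,
	      "generic vcpu must be the first member");
static_assert(offsetof(struct demo_vcpu_vmx, vt) ==
	      offsetof(struct demo_vcpu_tdx, vt),
	      "shared state must sit at the same offset in both structs");

/* Backend-agnostic accessor: no is_td check needed on this path. */
static struct demo_vcpu_vt *demo_to_vt(struct demo_vcpu *vcpu)
{
	return (struct demo_vcpu_vt *)((char *)vcpu +
				       offsetof(struct demo_vcpu_vmx, vt));
}
```

The compile-time checks are the important part of the sketch: without them, a layout change in either backend structure would silently break the shared accessor.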

+51 -1

arch/x86/kvm/vmx/pmu_intel.c

···
 #include "lapic.h"
 #include "nested.h"
 #include "pmu.h"
+#include "tdx.h"
 
 /*
  * Perf's "BASE" is wildly misleading, architectural PMUs use bits 31:16 of ECX
···
 #define INTEL_RDPMC_INDEX_MASK GENMASK(15, 0)
 
 #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
+
+static struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return NULL;
+
+	return &to_vmx(vcpu)->lbr_desc;
+}
+
+static struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return NULL;
+
+	return &to_vmx(vcpu)->lbr_desc.records;
+}
+
+#pragma GCC poison to_vmx
 
 static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 {
···
 	return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
 }
 
+static bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return cpuid_model_is_consistent(vcpu);
+}
+
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return !!vcpu_to_lbr_records(vcpu)->nr;
+}
+
 static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
 {
 	struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
···
 {
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
 
+	if (!lbr_desc)
+		return;
+
 	if (lbr_desc->event) {
 		perf_event_release_kernel(lbr_desc->event);
 		lbr_desc->event = NULL;
···
 		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
 					PERF_SAMPLE_BRANCH_USER,
 	};
+
+	if (WARN_ON_ONCE(!lbr_desc))
+		return 0;
 
 	if (unlikely(lbr_desc->event)) {
 		__set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use);
···
 	u64 perf_capabilities;
 	u64 counter_rsvd;
 
+	if (!lbr_desc)
+		return;
+
 	memset(&lbr_desc->records, 0, sizeof(lbr_desc->records));
 
 	/*
···
 		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
 
 	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
-	if (cpuid_model_is_consistent(vcpu) &&
+	if (intel_pmu_lbr_is_compatible(vcpu) &&
 	    (perf_capabilities & PMU_CAP_LBR_FMT))
 		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
 	else
···
 	int i;
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+	if (!lbr_desc)
+		return;
 
 	for (i = 0; i < KVM_MAX_NR_INTEL_GP_COUNTERS; i++) {
 		pmu->gp_counters[i].type = KVM_PMC_GP;
···
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+	if (WARN_ON_ONCE(!lbr_desc))
+		return;
 
 	if (!lbr_desc->event) {
 		vmx_disable_lbr_msrs_passthrough(vcpu);
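
Note on the pmu_intel.c change above: the two accessors return NULL for a TD vCPU, and "#pragma GCC poison to_vmx" then makes any later use of to_vmx in the file a compile error, so every LBR access has to go through the NULL-returning accessors. The following is a minimal userspace sketch of that guard, assuming a compiler that supports the pragma (GCC and Clang do). All demo_* names and structures are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not kernel types. */
struct demo_lbr_desc {
	int nr_records;
};

struct demo_vcpu_vmx {
	struct demo_lbr_desc lbr_desc;
};

struct demo_vcpu {
	bool is_td;			/* plays the role of is_td_vcpu() */
	struct demo_vcpu_vmx vmx;	/* only meaningful when !is_td */
};

/* Raw accessor: must never be used on a TD vCPU. */
static struct demo_vcpu_vmx *demo_to_vmx(struct demo_vcpu *vcpu)
{
	assert(!vcpu->is_td);
	return &vcpu->vmx;
}

/* Checked accessor: the only sanctioned path to the LBR descriptor. */
static struct demo_lbr_desc *demo_vcpu_to_lbr_desc(struct demo_vcpu *vcpu)
{
	if (vcpu->is_td)
		return NULL;

	return &demo_to_vmx(vcpu)->lbr_desc;
}

/* From here on, naming the raw accessor is a compile-time error. */
#pragma GCC poison demo_to_vmx

static bool demo_lbr_is_enabled(struct demo_vcpu *vcpu)
{
	struct demo_lbr_desc *desc = demo_vcpu_to_lbr_desc(vcpu);

	return desc && desc->nr_records != 0;
}

static void demo_lbr_reset(struct demo_vcpu *vcpu)
{
	struct demo_lbr_desc *desc = demo_vcpu_to_lbr_desc(vcpu);

	/* TD vCPUs have no LBR state to reset. */
	if (!desc)
		return;

	desc->nr_records = 0;
}
```

Poisoning the identifier turns "remember to check for a TD" from a review convention into something the compiler enforces for the rest of the file. The cost is that every caller of the checked accessor must handle NULL, which is what the added "if (!lbr_desc)" checks in the diff do.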

+28

arch/x86/kvm/vmx/pmu_intel.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef __KVM_X86_VMX_PMU_INTEL_H 3 + #define __KVM_X86_VMX_PMU_INTEL_H 4 + 5 + #include <linux/kvm_host.h> 6 + 7 + bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); 8 + int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); 9 + 10 + struct lbr_desc { 11 + /* Basic info about guest LBR records. */ 12 + struct x86_pmu_lbr records; 13 + 14 + /* 15 + * Emulate LBR feature via passthrough LBR registers when the 16 + * per-vcpu guest LBR event is scheduled on the current pcpu. 17 + * 18 + * The records may be inaccurate if the host reclaims the LBR. 19 + */ 20 + struct perf_event *event; 21 + 22 + /* True if LBRs are marked as not intercepted in the MSR bitmap */ 23 + bool msr_passthrough; 24 + }; 25 + 26 + extern struct x86_pmu_lbr vmx_lbr_caps; 27 + 28 + #endif /* __KVM_X86_VMX_PMU_INTEL_H */

+16 -12

arch/x86/kvm/vmx/posted_intr.c

··· 11 11 #include "posted_intr.h" 12 12 #include "trace.h" 13 13 #include "vmx.h" 14 + #include "tdx.h" 14 15 15 16 /* 16 17 * Maintain a per-CPU list of vCPUs that need to be awakened by wakeup_handler() ··· 34 33 35 34 #define PI_LOCK_SCHED_OUT SINGLE_DEPTH_NESTING 36 35 37 - static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) 36 + struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) 38 37 { 39 - return &(to_vmx(vcpu)->pi_desc); 38 + return &(to_vt(vcpu)->pi_desc); 40 39 } 41 40 42 41 static int pi_try_set_control(struct pi_desc *pi_desc, u64 *pold, u64 new) ··· 56 55 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) 57 56 { 58 57 struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu); 59 - struct vcpu_vmx *vmx = to_vmx(vcpu); 58 + struct vcpu_vt *vt = to_vt(vcpu); 60 59 struct pi_desc old, new; 61 60 unsigned long flags; 62 61 unsigned int dest; ··· 103 102 */ 104 103 raw_spin_lock(spinlock); 105 104 spin_acquire(&spinlock->dep_map, PI_LOCK_SCHED_OUT, 0, _RET_IP_); 106 - list_del(&vmx->pi_wakeup_list); 105 + list_del(&vt->pi_wakeup_list); 107 106 spin_release(&spinlock->dep_map, _RET_IP_); 108 107 raw_spin_unlock(spinlock); 109 108 } ··· 160 159 static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu) 161 160 { 162 161 struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu); 163 - struct vcpu_vmx *vmx = to_vmx(vcpu); 162 + struct vcpu_vt *vt = to_vt(vcpu); 164 163 struct pi_desc old, new; 165 164 166 165 lockdep_assert_irqs_disabled(); ··· 179 178 */ 180 179 raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 181 180 PI_LOCK_SCHED_OUT); 182 - list_add_tail(&vmx->pi_wakeup_list, 181 + list_add_tail(&vt->pi_wakeup_list, 183 182 &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu)); 184 183 raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); 185 184 ··· 214 213 * notification vector is switched to the one that calls 215 214 * back to the pi_wakeup_handler() function. 
216 215 */ 217 - return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm); 216 + return (vmx_can_use_ipiv(vcpu) && !is_td_vcpu(vcpu)) || 217 + vmx_can_use_vtd_pi(vcpu->kvm); 218 218 } 219 219 220 220 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) ··· 225 223 if (!vmx_needs_pi_wakeup(vcpu)) 226 224 return; 227 225 228 - if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu)) 226 + if (kvm_vcpu_is_blocking(vcpu) && 227 + ((is_td_vcpu(vcpu) && tdx_interrupt_allowed(vcpu)) || 228 + (!is_td_vcpu(vcpu) && !vmx_interrupt_blocked(vcpu)))) 229 229 pi_enable_wakeup_handler(vcpu); 230 230 231 231 /* ··· 247 243 int cpu = smp_processor_id(); 248 244 struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu); 249 245 raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu); 250 - struct vcpu_vmx *vmx; 246 + struct vcpu_vt *vt; 251 247 252 248 raw_spin_lock(spinlock); 253 - list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) { 249 + list_for_each_entry(vt, wakeup_list, pi_wakeup_list) { 254 250 255 - if (pi_test_on(&vmx->pi_desc)) 256 - kvm_vcpu_wake_up(&vmx->vcpu); 251 + if (pi_test_on(&vt->pi_desc)) 252 + kvm_vcpu_wake_up(vt_to_vcpu(vt)); 257 253 } 258 254 raw_spin_unlock(spinlock); 259 255 }

+2

arch/x86/kvm/vmx/posted_intr.h

··· 5 5 #include <linux/bitmap.h> 6 6 #include <asm/posted_intr.h> 7 7 8 + struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu); 9 + 8 10 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); 9 11 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); 10 12 void pi_wakeup_handler(void);

+3526

arch/x86/kvm/vmx/tdx.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <linux/cleanup.h> 3 + #include <linux/cpu.h> 4 + #include <asm/cpufeature.h> 5 + #include <asm/fpu/xcr.h> 6 + #include <linux/misc_cgroup.h> 7 + #include <linux/mmu_context.h> 8 + #include <asm/tdx.h> 9 + #include "capabilities.h" 10 + #include "mmu.h" 11 + #include "x86_ops.h" 12 + #include "lapic.h" 13 + #include "tdx.h" 14 + #include "vmx.h" 15 + #include "mmu/spte.h" 16 + #include "common.h" 17 + #include "posted_intr.h" 18 + #include "irq.h" 19 + #include <trace/events/kvm.h> 20 + #include "trace.h" 21 + 22 + #pragma GCC poison to_vmx 23 + 24 + #undef pr_fmt 25 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 26 + 27 + #define pr_tdx_error(__fn, __err) \ 28 + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err) 29 + 30 + #define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) \ 31 + pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err, __VA_ARGS__) 32 + 33 + #define pr_tdx_error_1(__fn, __err, __rcx) \ 34 + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx) 35 + 36 + #define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \ 37 + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx) 38 + 39 + #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \ 40 + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8) 41 + 42 + bool enable_tdx __ro_after_init; 43 + module_param_named(tdx, enable_tdx, bool, 0444); 44 + 45 + #define TDX_SHARED_BIT_PWL_5 gpa_to_gfn(BIT_ULL(51)) 46 + #define TDX_SHARED_BIT_PWL_4 gpa_to_gfn(BIT_ULL(47)) 47 + 48 + static enum cpuhp_state tdx_cpuhp_state; 49 + 50 + static const struct tdx_sys_info *tdx_sysinfo; 51 + 52 + void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err) 53 + { 54 + KVM_BUG_ON(1, tdx->vcpu.kvm); 55 + pr_err("TDH_VP_RD[%s.0x%x] failed 0x%llx\n", uclass, field, err); 56 + } 57 + 58 + void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field, 59 + u64 
val, u64 err) 60 + { 61 + KVM_BUG_ON(1, tdx->vcpu.kvm); 62 + pr_err("TDH_VP_WR[%s.0x%x]%s0x%llx failed: 0x%llx\n", uclass, field, op, val, err); 63 + } 64 + 65 + #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE) 66 + 67 + static __always_inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) 68 + { 69 + return container_of(kvm, struct kvm_tdx, kvm); 70 + } 71 + 72 + static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) 73 + { 74 + return container_of(vcpu, struct vcpu_tdx, vcpu); 75 + } 76 + 77 + static u64 tdx_get_supported_attrs(const struct tdx_sys_info_td_conf *td_conf) 78 + { 79 + u64 val = KVM_SUPPORTED_TD_ATTRS; 80 + 81 + if ((val & td_conf->attributes_fixed1) != td_conf->attributes_fixed1) 82 + return 0; 83 + 84 + val &= td_conf->attributes_fixed0; 85 + 86 + return val; 87 + } 88 + 89 + static u64 tdx_get_supported_xfam(const struct tdx_sys_info_td_conf *td_conf) 90 + { 91 + u64 val = kvm_caps.supported_xcr0 | kvm_caps.supported_xss; 92 + 93 + if ((val & td_conf->xfam_fixed1) != td_conf->xfam_fixed1) 94 + return 0; 95 + 96 + val &= td_conf->xfam_fixed0; 97 + 98 + return val; 99 + } 100 + 101 + static int tdx_get_guest_phys_addr_bits(const u32 eax) 102 + { 103 + return (eax & GENMASK(23, 16)) >> 16; 104 + } 105 + 106 + static u32 tdx_set_guest_phys_addr_bits(const u32 eax, int addr_bits) 107 + { 108 + return (eax & ~GENMASK(23, 16)) | (addr_bits & 0xff) << 16; 109 + } 110 + 111 + #define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM)) 112 + 113 + static bool has_tsx(const struct kvm_cpuid_entry2 *entry) 114 + { 115 + return entry->function == 7 && entry->index == 0 && 116 + (entry->ebx & TDX_FEATURE_TSX); 117 + } 118 + 119 + static void clear_tsx(struct kvm_cpuid_entry2 *entry) 120 + { 121 + entry->ebx &= ~TDX_FEATURE_TSX; 122 + } 123 + 124 + static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry) 125 + { 126 + return entry->function == 7 && entry->index == 0 && 127 + (entry->ecx & 
__feature_bit(X86_FEATURE_WAITPKG)); 128 + } 129 + 130 + static void clear_waitpkg(struct kvm_cpuid_entry2 *entry) 131 + { 132 + entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG); 133 + } 134 + 135 + static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry) 136 + { 137 + if (has_tsx(entry)) 138 + clear_tsx(entry); 139 + 140 + if (has_waitpkg(entry)) 141 + clear_waitpkg(entry); 142 + } 143 + 144 + static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry) 145 + { 146 + return has_tsx(entry) || has_waitpkg(entry); 147 + } 148 + 149 + #define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1) 150 + 151 + static void td_init_cpuid_entry2(struct kvm_cpuid_entry2 *entry, unsigned char idx) 152 + { 153 + const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; 154 + 155 + entry->function = (u32)td_conf->cpuid_config_leaves[idx]; 156 + entry->index = td_conf->cpuid_config_leaves[idx] >> 32; 157 + entry->eax = (u32)td_conf->cpuid_config_values[idx][0]; 158 + entry->ebx = td_conf->cpuid_config_values[idx][0] >> 32; 159 + entry->ecx = (u32)td_conf->cpuid_config_values[idx][1]; 160 + entry->edx = td_conf->cpuid_config_values[idx][1] >> 32; 161 + 162 + if (entry->index == KVM_TDX_CPUID_NO_SUBLEAF) 163 + entry->index = 0; 164 + 165 + /* 166 + * The TDX module doesn't allow configuring the guest phys addr bits 167 + * (EAX[23:16]). However, KVM uses it as an interface to the userspace 168 + * to configure the GPAW. Report these bits as configurable. 
169 + */ 170 + if (entry->function == 0x80000008) 171 + entry->eax = tdx_set_guest_phys_addr_bits(entry->eax, 0xff); 172 + 173 + tdx_clear_unsupported_cpuid(entry); 174 + } 175 + 176 + static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, 177 + struct kvm_tdx_capabilities *caps) 178 + { 179 + int i; 180 + 181 + caps->supported_attrs = tdx_get_supported_attrs(td_conf); 182 + if (!caps->supported_attrs) 183 + return -EIO; 184 + 185 + caps->supported_xfam = tdx_get_supported_xfam(td_conf); 186 + if (!caps->supported_xfam) 187 + return -EIO; 188 + 189 + caps->cpuid.nent = td_conf->num_cpuid_config; 190 + 191 + for (i = 0; i < td_conf->num_cpuid_config; i++) 192 + td_init_cpuid_entry2(&caps->cpuid.entries[i], i); 193 + 194 + return 0; 195 + } 196 + 197 + /* 198 + * Some SEAMCALLs acquire the TDX module globally, and can fail with 199 + * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs. 200 + */ 201 + static DEFINE_MUTEX(tdx_lock); 202 + 203 + static atomic_t nr_configured_hkid; 204 + 205 + static bool tdx_operand_busy(u64 err) 206 + { 207 + return (err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY; 208 + } 209 + 210 + 211 + /* 212 + * A per-CPU list of TD vCPUs associated with a given CPU. 213 + * Protected by interrupt mask. Only manipulated by the CPU owning this per-CPU 214 + * list. 215 + * - When a vCPU is loaded onto a CPU, it is removed from the per-CPU list of 216 + * the old CPU during the IPI callback running on the old CPU, and then added 217 + * to the per-CPU list of the new CPU. 218 + * - When a TD is tearing down, all vCPUs are disassociated from their current 219 + * running CPUs and removed from the per-CPU list during the IPI callback 220 + * running on those CPUs. 221 + * - When a CPU is brought down, traverse the per-CPU list to disassociate all 222 + * associated TD vCPUs and remove them from the per-CPU list. 
223 + */ 224 + static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); 225 + 226 + static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu) 227 + { 228 + return to_tdx(vcpu)->vp_enter_args.r10; 229 + } 230 + 231 + static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu) 232 + { 233 + return to_tdx(vcpu)->vp_enter_args.r11; 234 + } 235 + 236 + static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu, 237 + long val) 238 + { 239 + to_tdx(vcpu)->vp_enter_args.r10 = val; 240 + } 241 + 242 + static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu, 243 + unsigned long val) 244 + { 245 + to_tdx(vcpu)->vp_enter_args.r11 = val; 246 + } 247 + 248 + static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) 249 + { 250 + tdx_guest_keyid_free(kvm_tdx->hkid); 251 + kvm_tdx->hkid = -1; 252 + atomic_dec(&nr_configured_hkid); 253 + misc_cg_uncharge(MISC_CG_RES_TDX, kvm_tdx->misc_cg, 1); 254 + put_misc_cg(kvm_tdx->misc_cg); 255 + kvm_tdx->misc_cg = NULL; 256 + } 257 + 258 + static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) 259 + { 260 + return kvm_tdx->hkid > 0; 261 + } 262 + 263 + static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) 264 + { 265 + lockdep_assert_irqs_disabled(); 266 + 267 + list_del(&to_tdx(vcpu)->cpu_list); 268 + 269 + /* 270 + * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1, 271 + * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU 272 + * to its list before it's deleted from this CPU's list. 273 + */ 274 + smp_wmb(); 275 + 276 + vcpu->cpu = -1; 277 + } 278 + 279 + static void tdx_clear_page(struct page *page) 280 + { 281 + const void *zero_page = (const void *) page_to_virt(ZERO_PAGE(0)); 282 + void *dest = page_to_virt(page); 283 + unsigned long i; 284 + 285 + /* 286 + * The page could have been poisoned. MOVDIR64B also clears 287 + * the poison bit so the kernel can safely use the page again. 
288 + */ 289 + for (i = 0; i < PAGE_SIZE; i += 64) 290 + movdir64b(dest + i, zero_page); 291 + /* 292 + * MOVDIR64B store uses WC buffer. Prevent following memory reads 293 + * from seeing potentially poisoned cache. 294 + */ 295 + __mb(); 296 + } 297 + 298 + static void tdx_no_vcpus_enter_start(struct kvm *kvm) 299 + { 300 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 301 + 302 + lockdep_assert_held_write(&kvm->mmu_lock); 303 + 304 + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, true); 305 + 306 + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); 307 + } 308 + 309 + static void tdx_no_vcpus_enter_stop(struct kvm *kvm) 310 + { 311 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 312 + 313 + lockdep_assert_held_write(&kvm->mmu_lock); 314 + 315 + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, false); 316 + } 317 + 318 + /* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */ 319 + static int __tdx_reclaim_page(struct page *page) 320 + { 321 + u64 err, rcx, rdx, r8; 322 + 323 + err = tdh_phymem_page_reclaim(page, &rcx, &rdx, &r8); 324 + 325 + /* 326 + * No need to check for TDX_OPERAND_BUSY; all TD pages are freed 327 + * before the HKID is released and control pages have also been 328 + * released at this point, so there is no possibility of contention. 329 + */ 330 + if (WARN_ON_ONCE(err)) { 331 + pr_tdx_error_3(TDH_PHYMEM_PAGE_RECLAIM, err, rcx, rdx, r8); 332 + return -EIO; 333 + } 334 + return 0; 335 + } 336 + 337 + static int tdx_reclaim_page(struct page *page) 338 + { 339 + int r; 340 + 341 + r = __tdx_reclaim_page(page); 342 + if (!r) 343 + tdx_clear_page(page); 344 + return r; 345 + } 346 + 347 + 348 + /* 349 + * Reclaim the TD control page(s) which are crypto-protected by TDX guest's 350 + * private KeyID. Assume the cache associated with the TDX private KeyID has 351 + * been flushed. 352 + */ 353 + static void tdx_reclaim_control_page(struct page *ctrl_page) 354 + { 355 + /* 356 + * Leak the page if the kernel failed to reclaim the page. 
357 + * The kernel cannot use it safely anymore. 358 + */ 359 + if (tdx_reclaim_page(ctrl_page)) 360 + return; 361 + 362 + __free_page(ctrl_page); 363 + } 364 + 365 + struct tdx_flush_vp_arg { 366 + struct kvm_vcpu *vcpu; 367 + u64 err; 368 + }; 369 + 370 + static void tdx_flush_vp(void *_arg) 371 + { 372 + struct tdx_flush_vp_arg *arg = _arg; 373 + struct kvm_vcpu *vcpu = arg->vcpu; 374 + u64 err; 375 + 376 + arg->err = 0; 377 + lockdep_assert_irqs_disabled(); 378 + 379 + /* Task migration can race with CPU offlining. */ 380 + if (unlikely(vcpu->cpu != raw_smp_processor_id())) 381 + return; 382 + 383 + /* 384 + * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The 385 + * list tracking still needs to be updated so that it's correct if/when 386 + * the vCPU does get initialized. 387 + */ 388 + if (to_tdx(vcpu)->state != VCPU_TD_STATE_UNINITIALIZED) { 389 + /* 390 + * No need to retry. TDX Resources needed for TDH.VP.FLUSH are: 391 + * TDVPR as exclusive, TDR as shared, and TDCS as shared. This 392 + * vp flush function is called when destructing vCPU/TD or vCPU 393 + * migration. No other thread uses TDVPR in those cases. 394 + */ 395 + err = tdh_vp_flush(&to_tdx(vcpu)->vp); 396 + if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) { 397 + /* 398 + * This function is called in IPI context. Do not use 399 + * printk to avoid console semaphore. 400 + * The caller prints out the error message, instead. 
401 + */ 402 + if (err) 403 + arg->err = err; 404 + } 405 + } 406 + 407 + tdx_disassociate_vp(vcpu); 408 + } 409 + 410 + static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) 411 + { 412 + struct tdx_flush_vp_arg arg = { 413 + .vcpu = vcpu, 414 + }; 415 + int cpu = vcpu->cpu; 416 + 417 + if (unlikely(cpu == -1)) 418 + return; 419 + 420 + smp_call_function_single(cpu, tdx_flush_vp, &arg, 1); 421 + if (KVM_BUG_ON(arg.err, vcpu->kvm)) 422 + pr_tdx_error(TDH_VP_FLUSH, arg.err); 423 + } 424 + 425 + void tdx_disable_virtualization_cpu(void) 426 + { 427 + int cpu = raw_smp_processor_id(); 428 + struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu); 429 + struct tdx_flush_vp_arg arg; 430 + struct vcpu_tdx *tdx, *tmp; 431 + unsigned long flags; 432 + 433 + local_irq_save(flags); 434 + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ 435 + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) { 436 + arg.vcpu = &tdx->vcpu; 437 + tdx_flush_vp(&arg); 438 + } 439 + local_irq_restore(flags); 440 + } 441 + 442 + #define TDX_SEAMCALL_RETRIES 10000 443 + 444 + static void smp_func_do_phymem_cache_wb(void *unused) 445 + { 446 + u64 err = 0; 447 + bool resume; 448 + int i; 449 + 450 + /* 451 + * TDH.PHYMEM.CACHE.WB flushes caches associated with any TDX private 452 + * KeyID on the package or core. The TDX module may not finish the 453 + * cache flush but return TDX_INTERRUPTED_RESUMEABLE instead. The 454 + * kernel should retry it until it returns success w/o rescheduling. 
455 + */ 456 + for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) { 457 + resume = !!err; 458 + err = tdh_phymem_cache_wb(resume); 459 + switch (err) { 460 + case TDX_INTERRUPTED_RESUMABLE: 461 + continue; 462 + case TDX_NO_HKID_READY_TO_WBCACHE: 463 + err = TDX_SUCCESS; /* Already done by other thread */ 464 + fallthrough; 465 + default: 466 + goto out; 467 + } 468 + } 469 + 470 + out: 471 + if (WARN_ON_ONCE(err)) 472 + pr_tdx_error(TDH_PHYMEM_CACHE_WB, err); 473 + } 474 + 475 + void tdx_mmu_release_hkid(struct kvm *kvm) 476 + { 477 + bool packages_allocated, targets_allocated; 478 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 479 + cpumask_var_t packages, targets; 480 + struct kvm_vcpu *vcpu; 481 + unsigned long j; 482 + int i; 483 + u64 err; 484 + 485 + if (!is_hkid_assigned(kvm_tdx)) 486 + return; 487 + 488 + packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL); 489 + targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL); 490 + cpus_read_lock(); 491 + 492 + kvm_for_each_vcpu(j, vcpu, kvm) 493 + tdx_flush_vp_on_cpu(vcpu); 494 + 495 + /* 496 + * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock 497 + * and can fail with TDX_OPERAND_BUSY when it fails to get the lock. 498 + * Multiple TDX guests can be destroyed simultaneously. Take the 499 + * mutex to prevent it from getting error. 500 + */ 501 + mutex_lock(&tdx_lock); 502 + 503 + /* 504 + * Releasing HKID is in vm_destroy(). 505 + * After the above flushing vps, there should be no more vCPU 506 + * associations, as all vCPU fds have been released at this stage. 507 + */ 508 + err = tdh_mng_vpflushdone(&kvm_tdx->td); 509 + if (err == TDX_FLUSHVP_NOT_DONE) 510 + goto out; 511 + if (KVM_BUG_ON(err, kvm)) { 512 + pr_tdx_error(TDH_MNG_VPFLUSHDONE, err); 513 + pr_err("tdh_mng_vpflushdone() failed. 
HKID %d is leaked.\n", 514 + kvm_tdx->hkid); 515 + goto out; 516 + } 517 + 518 + for_each_online_cpu(i) { 519 + if (packages_allocated && 520 + cpumask_test_and_set_cpu(topology_physical_package_id(i), 521 + packages)) 522 + continue; 523 + if (targets_allocated) 524 + cpumask_set_cpu(i, targets); 525 + } 526 + if (targets_allocated) 527 + on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true); 528 + else 529 + on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true); 530 + /* 531 + * In the case of error in smp_func_do_phymem_cache_wb(), the following 532 + * tdh_mng_key_freeid() will fail. 533 + */ 534 + err = tdh_mng_key_freeid(&kvm_tdx->td); 535 + if (KVM_BUG_ON(err, kvm)) { 536 + pr_tdx_error(TDH_MNG_KEY_FREEID, err); 537 + pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n", 538 + kvm_tdx->hkid); 539 + } else { 540 + tdx_hkid_free(kvm_tdx); 541 + } 542 + 543 + out: 544 + mutex_unlock(&tdx_lock); 545 + cpus_read_unlock(); 546 + free_cpumask_var(targets); 547 + free_cpumask_var(packages); 548 + } 549 + 550 + static void tdx_reclaim_td_control_pages(struct kvm *kvm) 551 + { 552 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 553 + u64 err; 554 + int i; 555 + 556 + /* 557 + * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong 558 + * heavily with TDX module. Give up freeing TD pages. As the function 559 + * already warned, don't warn it again. 560 + */ 561 + if (is_hkid_assigned(kvm_tdx)) 562 + return; 563 + 564 + if (kvm_tdx->td.tdcs_pages) { 565 + for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { 566 + if (!kvm_tdx->td.tdcs_pages[i]) 567 + continue; 568 + 569 + tdx_reclaim_control_page(kvm_tdx->td.tdcs_pages[i]); 570 + } 571 + kfree(kvm_tdx->td.tdcs_pages); 572 + kvm_tdx->td.tdcs_pages = NULL; 573 + } 574 + 575 + if (!kvm_tdx->td.tdr_page) 576 + return; 577 + 578 + if (__tdx_reclaim_page(kvm_tdx->td.tdr_page)) 579 + return; 580 + 581 + /* 582 + * Use a SEAMCALL to ask the TDX module to flush the cache based on the 583 + * KeyID. 
TDX module may access TDR while operating on TD (Especially 584 + * when it is reclaiming TDCS). 585 + */ 586 + err = tdh_phymem_page_wbinvd_tdr(&kvm_tdx->td); 587 + if (KVM_BUG_ON(err, kvm)) { 588 + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); 589 + return; 590 + } 591 + tdx_clear_page(kvm_tdx->td.tdr_page); 592 + 593 + __free_page(kvm_tdx->td.tdr_page); 594 + kvm_tdx->td.tdr_page = NULL; 595 + } 596 + 597 + void tdx_vm_destroy(struct kvm *kvm) 598 + { 599 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 600 + 601 + tdx_reclaim_td_control_pages(kvm); 602 + 603 + kvm_tdx->state = TD_STATE_UNINITIALIZED; 604 + } 605 + 606 + static int tdx_do_tdh_mng_key_config(void *param) 607 + { 608 + struct kvm_tdx *kvm_tdx = param; 609 + u64 err; 610 + 611 + /* TDX_RND_NO_ENTROPY related retries are handled by sc_retry() */ 612 + err = tdh_mng_key_config(&kvm_tdx->td); 613 + 614 + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { 615 + pr_tdx_error(TDH_MNG_KEY_CONFIG, err); 616 + return -EIO; 617 + } 618 + 619 + return 0; 620 + } 621 + 622 + int tdx_vm_init(struct kvm *kvm) 623 + { 624 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 625 + 626 + kvm->arch.has_protected_state = true; 627 + kvm->arch.has_private_mem = true; 628 + kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT; 629 + 630 + /* 631 + * Because guest TD is protected, VMM can't parse the instruction in TD. 632 + * Instead, guest uses MMIO hypercall. For unmodified device driver, 633 + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO 634 + * instruction into MMIO hypercall. 635 + * 636 + * SPTE value for MMIO needs to be setup so that #VE is injected into 637 + * TD instead of triggering EPT MISCONFIG. 638 + * - RWX=0 so that EPT violation is triggered. 639 + * - suppress #VE bit is cleared to inject #VE. 640 + */ 641 + kvm_mmu_set_mmio_spte_value(kvm, 0); 642 + 643 + /* 644 + * TDX has its own limit of maximum vCPUs it can support for all 645 + * TDX guests in addition to KVM_MAX_VCPUS. 
TDX module reports 646 + * such limit via the MAX_VCPU_PER_TD global metadata. In 647 + * practice, it reflects the number of logical CPUs that ALL 648 + * platforms that the TDX module supports can possibly have. 649 + * 650 + * Limit TDX guest's maximum vCPUs to the number of logical CPUs 651 + * the platform has. Simply forwarding the MAX_VCPU_PER_TD to 652 + * userspace would result in an unpredictable ABI. 653 + */ 654 + kvm->max_vcpus = min_t(int, kvm->max_vcpus, num_present_cpus()); 655 + 656 + kvm_tdx->state = TD_STATE_UNINITIALIZED; 657 + 658 + return 0; 659 + } 660 + 661 + int tdx_vcpu_create(struct kvm_vcpu *vcpu) 662 + { 663 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 664 + struct vcpu_tdx *tdx = to_tdx(vcpu); 665 + 666 + if (kvm_tdx->state != TD_STATE_INITIALIZED) 667 + return -EIO; 668 + 669 + /* 670 + * TDX module mandates APICv, which requires an in-kernel local APIC. 671 + * Disallow an in-kernel I/O APIC, because level-triggered interrupts 672 + * and thus the I/O APIC as a whole can't be faithfully emulated in KVM. 673 + */ 674 + if (!irqchip_split(vcpu->kvm)) 675 + return -EINVAL; 676 + 677 + fpstate_set_confidential(&vcpu->arch.guest_fpu); 678 + vcpu->arch.apic->guest_apic_protected = true; 679 + INIT_LIST_HEAD(&tdx->vt.pi_wakeup_list); 680 + 681 + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; 682 + 683 + vcpu->arch.switch_db_regs = KVM_DEBUGREG_AUTO_SWITCH; 684 + vcpu->arch.cr0_guest_owned_bits = -1ul; 685 + vcpu->arch.cr4_guest_owned_bits = -1ul; 686 + 687 + /* KVM can't change TSC offset/multiplier as TDX module manages them. 
*/ 688 + vcpu->arch.guest_tsc_protected = true; 689 + vcpu->arch.tsc_offset = kvm_tdx->tsc_offset; 690 + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset; 691 + vcpu->arch.tsc_scaling_ratio = kvm_tdx->tsc_multiplier; 692 + vcpu->arch.l1_tsc_scaling_ratio = kvm_tdx->tsc_multiplier; 693 + 694 + vcpu->arch.guest_state_protected = 695 + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG); 696 + 697 + if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE) 698 + vcpu->arch.xfd_no_write_intercept = true; 699 + 700 + tdx->vt.pi_desc.nv = POSTED_INTR_VECTOR; 701 + __pi_set_sn(&tdx->vt.pi_desc); 702 + 703 + tdx->state = VCPU_TD_STATE_UNINITIALIZED; 704 + 705 + return 0; 706 + } 707 + 708 + void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) 709 + { 710 + struct vcpu_tdx *tdx = to_tdx(vcpu); 711 + 712 + vmx_vcpu_pi_load(vcpu, cpu); 713 + if (vcpu->cpu == cpu || !is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) 714 + return; 715 + 716 + tdx_flush_vp_on_cpu(vcpu); 717 + 718 + KVM_BUG_ON(cpu != raw_smp_processor_id(), vcpu->kvm); 719 + local_irq_disable(); 720 + /* 721 + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure 722 + * vcpu->cpu is read before tdx->cpu_list. 723 + */ 724 + smp_rmb(); 725 + 726 + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); 727 + local_irq_enable(); 728 + } 729 + 730 + bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) 731 + { 732 + /* 733 + * KVM can't get the interrupt status of TDX guest and it assumes 734 + * interrupt is always allowed unless TDX guest calls TDVMCALL with HLT, 735 + * which passes the interrupt blocked flag. 736 + */ 737 + return vmx_get_exit_reason(vcpu).basic != EXIT_REASON_HLT || 738 + !to_tdx(vcpu)->vp_enter_args.r12; 739 + } 740 + 741 + bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) 742 + { 743 + u64 vcpu_state_details; 744 + 745 + if (pi_has_pending_interrupt(vcpu)) 746 + return true; 747 + 748 + /* 749 + * Only check RVI pending for HALTED case with IRQ enabled. 
750 + * For non-HLT cases, KVM doesn't care about STI/SS shadows. And if the 751 + * interrupt was pending before TD exit, then it _must_ be blocked, 752 + * otherwise the interrupt would have been serviced at the instruction 753 + * boundary. 754 + */ 755 + if (vmx_get_exit_reason(vcpu).basic != EXIT_REASON_HLT || 756 + to_tdx(vcpu)->vp_enter_args.r12) 757 + return false; 758 + 759 + vcpu_state_details = 760 + td_state_non_arch_read64(to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH); 761 + 762 + return tdx_vcpu_state_details_intr_pending(vcpu_state_details); 763 + } 764 + 765 + /* 766 + * Compared to vmx_prepare_switch_to_guest(), there is not much to do 767 + * as SEAMCALL/SEAMRET calls take care of most of save and restore. 768 + */ 769 + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) 770 + { 771 + struct vcpu_vt *vt = to_vt(vcpu); 772 + 773 + if (vt->guest_state_loaded) 774 + return; 775 + 776 + if (likely(is_64bit_mm(current->mm))) 777 + vt->msr_host_kernel_gs_base = current->thread.gsbase; 778 + else 779 + vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE); 780 + 781 + vt->host_debugctlmsr = get_debugctlmsr(); 782 + 783 + vt->guest_state_loaded = true; 784 + } 785 + 786 + struct tdx_uret_msr { 787 + u32 msr; 788 + unsigned int slot; 789 + u64 defval; 790 + }; 791 + 792 + static struct tdx_uret_msr tdx_uret_msrs[] = { 793 + {.msr = MSR_SYSCALL_MASK, .defval = 0x20200 }, 794 + {.msr = MSR_STAR,}, 795 + {.msr = MSR_LSTAR,}, 796 + {.msr = MSR_TSC_AUX,}, 797 + }; 798 + 799 + static void tdx_user_return_msr_update_cache(void) 800 + { 801 + int i; 802 + 803 + for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) 804 + kvm_user_return_msr_update_cache(tdx_uret_msrs[i].slot, 805 + tdx_uret_msrs[i].defval); 806 + } 807 + 808 + static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu) 809 + { 810 + struct vcpu_vt *vt = to_vt(vcpu); 811 + struct vcpu_tdx *tdx = to_tdx(vcpu); 812 + 813 + if (!vt->guest_state_loaded) 814 + return; 815 + 816 + 
++vcpu->stat.host_state_reload; 817 + wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base); 818 + 819 + if (tdx->guest_entered) { 820 + tdx_user_return_msr_update_cache(); 821 + tdx->guest_entered = false; 822 + } 823 + 824 + vt->guest_state_loaded = false; 825 + } 826 + 827 + void tdx_vcpu_put(struct kvm_vcpu *vcpu) 828 + { 829 + vmx_vcpu_pi_put(vcpu); 830 + tdx_prepare_switch_to_host(vcpu); 831 + } 832 + 833 + void tdx_vcpu_free(struct kvm_vcpu *vcpu) 834 + { 835 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 836 + struct vcpu_tdx *tdx = to_tdx(vcpu); 837 + int i; 838 + 839 + /* 840 + * It is not possible to reclaim pages while hkid is assigned. It might 841 + * be assigned if: 842 + * 1. the TD VM is being destroyed but freeing hkid failed, in which 843 + * case the pages are leaked 844 + * 2. TD VCPU creation failed and this on the error path, in which case 845 + * there is nothing to do anyway 846 + */ 847 + if (is_hkid_assigned(kvm_tdx)) 848 + return; 849 + 850 + if (tdx->vp.tdcx_pages) { 851 + for (i = 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { 852 + if (tdx->vp.tdcx_pages[i]) 853 + tdx_reclaim_control_page(tdx->vp.tdcx_pages[i]); 854 + } 855 + kfree(tdx->vp.tdcx_pages); 856 + tdx->vp.tdcx_pages = NULL; 857 + } 858 + if (tdx->vp.tdvpr_page) { 859 + tdx_reclaim_control_page(tdx->vp.tdvpr_page); 860 + tdx->vp.tdvpr_page = 0; 861 + } 862 + 863 + tdx->state = VCPU_TD_STATE_UNINITIALIZED; 864 + } 865 + 866 + int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu) 867 + { 868 + if (unlikely(to_tdx(vcpu)->state != VCPU_TD_STATE_INITIALIZED || 869 + to_kvm_tdx(vcpu->kvm)->state != TD_STATE_RUNNABLE)) 870 + return -EINVAL; 871 + 872 + return 1; 873 + } 874 + 875 + static __always_inline u32 tdcall_to_vmx_exit_reason(struct kvm_vcpu *vcpu) 876 + { 877 + switch (tdvmcall_leaf(vcpu)) { 878 + case EXIT_REASON_CPUID: 879 + case EXIT_REASON_HLT: 880 + case EXIT_REASON_IO_INSTRUCTION: 881 + case EXIT_REASON_MSR_READ: 882 + case EXIT_REASON_MSR_WRITE: 883 + return 
tdvmcall_leaf(vcpu); 884 + case EXIT_REASON_EPT_VIOLATION: 885 + return EXIT_REASON_EPT_MISCONFIG; 886 + default: 887 + break; 888 + } 889 + 890 + return EXIT_REASON_TDCALL; 891 + } 892 + 893 + static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu) 894 + { 895 + struct vcpu_tdx *tdx = to_tdx(vcpu); 896 + u32 exit_reason; 897 + 898 + switch (tdx->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) { 899 + case TDX_SUCCESS: 900 + case TDX_NON_RECOVERABLE_VCPU: 901 + case TDX_NON_RECOVERABLE_TD: 902 + case TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE: 903 + case TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE: 904 + break; 905 + default: 906 + return -1u; 907 + } 908 + 909 + exit_reason = tdx->vp_enter_ret; 910 + 911 + switch (exit_reason) { 912 + case EXIT_REASON_TDCALL: 913 + if (tdvmcall_exit_type(vcpu)) 914 + return EXIT_REASON_VMCALL; 915 + 916 + return tdcall_to_vmx_exit_reason(vcpu); 917 + case EXIT_REASON_EPT_MISCONFIG: 918 + /* 919 + * Defer KVM_BUG_ON() until tdx_handle_exit() because this is in 920 + * non-instrumentable code with interrupts disabled. 
921 + */ 922 + return -1u; 923 + default: 924 + break; 925 + } 926 + 927 + return exit_reason; 928 + } 929 + 930 + static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu) 931 + { 932 + struct vcpu_tdx *tdx = to_tdx(vcpu); 933 + struct vcpu_vt *vt = to_vt(vcpu); 934 + 935 + guest_state_enter_irqoff(); 936 + 937 + tdx->vp_enter_ret = tdh_vp_enter(&tdx->vp, &tdx->vp_enter_args); 938 + 939 + vt->exit_reason.full = tdx_to_vmx_exit_reason(vcpu); 940 + 941 + vt->exit_qualification = tdx->vp_enter_args.rcx; 942 + tdx->ext_exit_qualification = tdx->vp_enter_args.rdx; 943 + tdx->exit_gpa = tdx->vp_enter_args.r8; 944 + vt->exit_intr_info = tdx->vp_enter_args.r9; 945 + 946 + vmx_handle_nmi(vcpu); 947 + 948 + guest_state_exit_irqoff(); 949 + } 950 + 951 + static bool tdx_failed_vmentry(struct kvm_vcpu *vcpu) 952 + { 953 + return vmx_get_exit_reason(vcpu).failed_vmentry && 954 + vmx_get_exit_reason(vcpu).full != -1u; 955 + } 956 + 957 + static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu) 958 + { 959 + u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret; 960 + 961 + /* 962 + * TDX_OPERAND_BUSY could be returned for SEPT due to 0-step mitigation 963 + * or for TD EPOCH due to contention with TDH.MEM.TRACK on TDH.VP.ENTER. 964 + * 965 + * When KVM requests KVM_REQ_OUTSIDE_GUEST_MODE, which has both 966 + * KVM_REQUEST_WAIT and KVM_REQUEST_NO_ACTION set, it requires target 967 + * vCPUs leaving fastpath so that interrupt can be enabled to ensure the 968 + * IPIs can be delivered. Return EXIT_FASTPATH_EXIT_HANDLED instead of 969 + * EXIT_FASTPATH_REENTER_GUEST to exit fastpath, otherwise, the 970 + * requester may be blocked endlessly. 
971 + */ 972 + if (unlikely(tdx_operand_busy(vp_enter_ret))) 973 + return EXIT_FASTPATH_EXIT_HANDLED; 974 + 975 + return EXIT_FASTPATH_NONE; 976 + } 977 + 978 + #define TDX_REGS_AVAIL_SET (BIT_ULL(VCPU_EXREG_EXIT_INFO_1) | \ 979 + BIT_ULL(VCPU_EXREG_EXIT_INFO_2) | \ 980 + BIT_ULL(VCPU_REGS_RAX) | \ 981 + BIT_ULL(VCPU_REGS_RBX) | \ 982 + BIT_ULL(VCPU_REGS_RCX) | \ 983 + BIT_ULL(VCPU_REGS_RDX) | \ 984 + BIT_ULL(VCPU_REGS_RBP) | \ 985 + BIT_ULL(VCPU_REGS_RSI) | \ 986 + BIT_ULL(VCPU_REGS_RDI) | \ 987 + BIT_ULL(VCPU_REGS_R8) | \ 988 + BIT_ULL(VCPU_REGS_R9) | \ 989 + BIT_ULL(VCPU_REGS_R10) | \ 990 + BIT_ULL(VCPU_REGS_R11) | \ 991 + BIT_ULL(VCPU_REGS_R12) | \ 992 + BIT_ULL(VCPU_REGS_R13) | \ 993 + BIT_ULL(VCPU_REGS_R14) | \ 994 + BIT_ULL(VCPU_REGS_R15)) 995 + 996 + static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu) 997 + { 998 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 999 + 1000 + /* 1001 + * All TDX hosts support PKRU; but even if they didn't, 1002 + * vcpu->arch.host_pkru would be 0 and the wrpkru would be 1003 + * skipped. 1004 + */ 1005 + if (vcpu->arch.host_pkru != 0) 1006 + wrpkru(vcpu->arch.host_pkru); 1007 + 1008 + if (kvm_host.xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0)) 1009 + xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0); 1010 + 1011 + /* 1012 + * Likewise, even if a TDX host didn't support XSS, both arms of 1013 + * the comparison would be 0 and the wrmsrl would be skipped.
1014 + */ 1015 + if (kvm_host.xss != (kvm_tdx->xfam & kvm_caps.supported_xss)) 1016 + wrmsrl(MSR_IA32_XSS, kvm_host.xss); 1017 + } 1018 + 1019 + #define TDX_DEBUGCTL_PRESERVED (DEBUGCTLMSR_BTF | \ 1020 + DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI | \ 1021 + DEBUGCTLMSR_FREEZE_IN_SMM) 1022 + 1023 + fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) 1024 + { 1025 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1026 + struct vcpu_vt *vt = to_vt(vcpu); 1027 + 1028 + /* 1029 + * force_immediate_exit requires the vCPU to enter the guest for event 1030 + * injection and then exit immediately. But the TDX module doesn't 1031 + * guarantee entry, so it's already possible for KVM to _think_ it 1032 + * completed entry to the guest without actually having done so. 1033 + * Since KVM never needs to force an immediate exit for TDX, and can't 1034 + * do direct injection, just warn on force_immediate_exit. 1035 + */ 1036 + WARN_ON_ONCE(force_immediate_exit); 1037 + 1038 + /* 1039 + * Wait until retry of SEPT-zap-related SEAMCALL completes before 1040 + * allowing vCPU entry to avoid contention with tdh_vp_enter() and 1041 + * TDCALLs.
1042 + */ 1043 + if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap))) 1044 + return EXIT_FASTPATH_EXIT_HANDLED; 1045 + 1046 + trace_kvm_entry(vcpu, force_immediate_exit); 1047 + 1048 + if (pi_test_on(&vt->pi_desc)) { 1049 + apic->send_IPI_self(POSTED_INTR_VECTOR); 1050 + 1051 + if (pi_test_pir(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTT) & 1052 + APIC_VECTOR_MASK, &vt->pi_desc)) 1053 + kvm_wait_lapic_expire(vcpu); 1054 + } 1055 + 1056 + tdx_vcpu_enter_exit(vcpu); 1057 + 1058 + if (vt->host_debugctlmsr & ~TDX_DEBUGCTL_PRESERVED) 1059 + update_debugctlmsr(vt->host_debugctlmsr); 1060 + 1061 + tdx_load_host_xsave_state(vcpu); 1062 + tdx->guest_entered = true; 1063 + 1064 + vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET; 1065 + 1066 + if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG)) 1067 + return EXIT_FASTPATH_NONE; 1068 + 1069 + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) 1070 + return EXIT_FASTPATH_NONE; 1071 + 1072 + if (unlikely(vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MCE_DURING_VMENTRY)) 1073 + kvm_machine_check(); 1074 + 1075 + trace_kvm_exit(vcpu, KVM_ISA_VMX); 1076 + 1077 + if (unlikely(tdx_failed_vmentry(vcpu))) 1078 + return EXIT_FASTPATH_NONE; 1079 + 1080 + return tdx_exit_handlers_fastpath(vcpu); 1081 + } 1082 + 1083 + void tdx_inject_nmi(struct kvm_vcpu *vcpu) 1084 + { 1085 + ++vcpu->stat.nmi_injections; 1086 + td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1); 1087 + /* 1088 + * From KVM's perspective, NMI injection is completed right after 1089 + * writing to PEND_NMI. KVM doesn't care whether an NMI is injected by 1090 + * the TDX module or not. 1091 + */ 1092 + vcpu->arch.nmi_injected = false; 1093 + /* 1094 + * TDX doesn't support KVM to request NMI window exit. If there is 1095 + * still a pending vNMI, KVM is not able to inject it along with the 1096 + * one pending in TDX module in a back-to-back way. Since the previous 1097 + * vNMI is still pending in TDX module, i.e. 
it has not been delivered 1098 + * to TDX guest yet, it's OK to collapse the pending vNMI into the 1099 + * previous one. The guest is expected to handle all the NMI sources 1100 + * when handling the first vNMI. 1101 + */ 1102 + vcpu->arch.nmi_pending = 0; 1103 + } 1104 + 1105 + static int tdx_handle_exception_nmi(struct kvm_vcpu *vcpu) 1106 + { 1107 + u32 intr_info = vmx_get_intr_info(vcpu); 1108 + 1109 + /* 1110 + * Machine checks are handled by handle_exception_irqoff(), or by 1111 + * tdx_handle_exit() with TDX_NON_RECOVERABLE set if a #MC occurs on 1112 + * VM-Entry. NMIs are handled by tdx_vcpu_enter_exit(). 1113 + */ 1114 + if (is_nmi(intr_info) || is_machine_check(intr_info)) 1115 + return 1; 1116 + 1117 + vcpu->run->exit_reason = KVM_EXIT_EXCEPTION; 1118 + vcpu->run->ex.exception = intr_info & INTR_INFO_VECTOR_MASK; 1119 + vcpu->run->ex.error_code = 0; 1120 + 1121 + return 0; 1122 + } 1123 + 1124 + static int complete_hypercall_exit(struct kvm_vcpu *vcpu) 1125 + { 1126 + tdvmcall_set_return_code(vcpu, vcpu->run->hypercall.ret); 1127 + return 1; 1128 + } 1129 + 1130 + static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu) 1131 + { 1132 + kvm_rax_write(vcpu, to_tdx(vcpu)->vp_enter_args.r10); 1133 + kvm_rbx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r11); 1134 + kvm_rcx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r12); 1135 + kvm_rdx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r13); 1136 + kvm_rsi_write(vcpu, to_tdx(vcpu)->vp_enter_args.r14); 1137 + 1138 + return __kvm_emulate_hypercall(vcpu, 0, complete_hypercall_exit); 1139 + } 1140 + 1141 + /* 1142 + * Split into chunks and check interrupt pending between chunks. This allows 1143 + * for timely injection of interrupts to prevent issues with guest lockup 1144 + * detection. 
1145 + */ 1146 + #define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024) 1147 + static void __tdx_map_gpa(struct vcpu_tdx *tdx); 1148 + 1149 + static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu) 1150 + { 1151 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1152 + 1153 + if (vcpu->run->hypercall.ret) { 1154 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1155 + tdx->vp_enter_args.r11 = tdx->map_gpa_next; 1156 + return 1; 1157 + } 1158 + 1159 + tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN; 1160 + if (tdx->map_gpa_next >= tdx->map_gpa_end) 1161 + return 1; 1162 + 1163 + /* 1164 + * Stop processing the remaining part if there is a pending interrupt, 1165 + * which could be qualified to deliver. Skip checking pending RVI for 1166 + * TDVMCALL_MAP_GPA, see comments in tdx_protected_apic_has_interrupt(). 1167 + */ 1168 + if (kvm_vcpu_has_events(vcpu)) { 1169 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY); 1170 + tdx->vp_enter_args.r11 = tdx->map_gpa_next; 1171 + return 1; 1172 + } 1173 + 1174 + __tdx_map_gpa(tdx); 1175 + return 0; 1176 + } 1177 + 1178 + static void __tdx_map_gpa(struct vcpu_tdx *tdx) 1179 + { 1180 + u64 gpa = tdx->map_gpa_next; 1181 + u64 size = tdx->map_gpa_end - tdx->map_gpa_next; 1182 + 1183 + if (size > TDX_MAP_GPA_MAX_LEN) 1184 + size = TDX_MAP_GPA_MAX_LEN; 1185 + 1186 + tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL; 1187 + tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; 1188 + /* 1189 + * In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2) 1190 + * assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that 1191 + * it was always zero on KVM_EXIT_HYPERCALL. Since KVM is now overwriting 1192 + * vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU. 
1193 + */ 1194 + tdx->vcpu.run->hypercall.ret = 0; 1195 + tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); 1196 + tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; 1197 + tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ? 1198 + KVM_MAP_GPA_RANGE_ENCRYPTED : 1199 + KVM_MAP_GPA_RANGE_DECRYPTED; 1200 + tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE; 1201 + 1202 + tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa; 1203 + } 1204 + 1205 + static int tdx_map_gpa(struct kvm_vcpu *vcpu) 1206 + { 1207 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1208 + u64 gpa = tdx->vp_enter_args.r12; 1209 + u64 size = tdx->vp_enter_args.r13; 1210 + u64 ret; 1211 + 1212 + /* 1213 + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires 1214 + * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE 1215 + * bit set. If not, the error code is not defined in GHCI for TDX, use 1216 + * TDVMCALL_STATUS_INVALID_OPERAND for this case. 
1217 + */ 1218 + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) { 1219 + ret = TDVMCALL_STATUS_INVALID_OPERAND; 1220 + goto error; 1221 + } 1222 + 1223 + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) || 1224 + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) || 1225 + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) != 1226 + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) { 1227 + ret = TDVMCALL_STATUS_INVALID_OPERAND; 1228 + goto error; 1229 + } 1230 + 1231 + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) { 1232 + ret = TDVMCALL_STATUS_ALIGN_ERROR; 1233 + goto error; 1234 + } 1235 + 1236 + tdx->map_gpa_end = gpa + size; 1237 + tdx->map_gpa_next = gpa; 1238 + 1239 + __tdx_map_gpa(tdx); 1240 + return 0; 1241 + 1242 + error: 1243 + tdvmcall_set_return_code(vcpu, ret); 1244 + tdx->vp_enter_args.r11 = gpa; 1245 + return 1; 1246 + } 1247 + 1248 + static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) 1249 + { 1250 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1251 + u64 *regs = vcpu->run->system_event.data; 1252 + u64 *module_regs = &tdx->vp_enter_args.r8; 1253 + int index = VCPU_REGS_RAX; 1254 + 1255 + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT; 1256 + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL; 1257 + vcpu->run->system_event.ndata = 16; 1258 + 1259 + /* Dump 16 general-purpose registers to userspace in ascending order. 
*/ 1260 + regs[index++] = tdx->vp_enter_ret; 1261 + regs[index++] = tdx->vp_enter_args.rcx; 1262 + regs[index++] = tdx->vp_enter_args.rdx; 1263 + regs[index++] = tdx->vp_enter_args.rbx; 1264 + regs[index++] = 0; 1265 + regs[index++] = 0; 1266 + regs[index++] = tdx->vp_enter_args.rsi; 1267 + regs[index] = tdx->vp_enter_args.rdi; 1268 + for (index = 0; index < 8; index++) 1269 + regs[VCPU_REGS_R8 + index] = module_regs[index]; 1270 + 1271 + return 0; 1272 + } 1273 + 1274 + static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) 1275 + { 1276 + u32 eax, ebx, ecx, edx; 1277 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1278 + 1279 + /* EAX and ECX for cpuid is stored in R12 and R13. */ 1280 + eax = tdx->vp_enter_args.r12; 1281 + ecx = tdx->vp_enter_args.r13; 1282 + 1283 + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); 1284 + 1285 + tdx->vp_enter_args.r12 = eax; 1286 + tdx->vp_enter_args.r13 = ebx; 1287 + tdx->vp_enter_args.r14 = ecx; 1288 + tdx->vp_enter_args.r15 = edx; 1289 + 1290 + return 1; 1291 + } 1292 + 1293 + static int tdx_complete_pio_out(struct kvm_vcpu *vcpu) 1294 + { 1295 + vcpu->arch.pio.count = 0; 1296 + return 1; 1297 + } 1298 + 1299 + static int tdx_complete_pio_in(struct kvm_vcpu *vcpu) 1300 + { 1301 + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; 1302 + unsigned long val = 0; 1303 + int ret; 1304 + 1305 + ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size, 1306 + vcpu->arch.pio.port, &val, 1); 1307 + 1308 + WARN_ON_ONCE(!ret); 1309 + 1310 + tdvmcall_set_return_val(vcpu, val); 1311 + 1312 + return 1; 1313 + } 1314 + 1315 + static int tdx_emulate_io(struct kvm_vcpu *vcpu) 1316 + { 1317 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1318 + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; 1319 + unsigned long val = 0; 1320 + unsigned int port; 1321 + u64 size, write; 1322 + int ret; 1323 + 1324 + ++vcpu->stat.io_exits; 1325 + 1326 + size = tdx->vp_enter_args.r12; 1327 + write = tdx->vp_enter_args.r13; 1328 + port = tdx->vp_enter_args.r14; 1329 + 
1330 + if ((write != 0 && write != 1) || (size != 1 && size != 2 && size != 4)) { 1331 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1332 + return 1; 1333 + } 1334 + 1335 + if (write) { 1336 + val = tdx->vp_enter_args.r15; 1337 + ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1); 1338 + } else { 1339 + ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1); 1340 + } 1341 + 1342 + if (!ret) 1343 + vcpu->arch.complete_userspace_io = write ? tdx_complete_pio_out : 1344 + tdx_complete_pio_in; 1345 + else if (!write) 1346 + tdvmcall_set_return_val(vcpu, val); 1347 + 1348 + return ret; 1349 + } 1350 + 1351 + static int tdx_complete_mmio_read(struct kvm_vcpu *vcpu) 1352 + { 1353 + unsigned long val = 0; 1354 + gpa_t gpa; 1355 + int size; 1356 + 1357 + gpa = vcpu->mmio_fragments[0].gpa; 1358 + size = vcpu->mmio_fragments[0].len; 1359 + 1360 + memcpy(&val, vcpu->run->mmio.data, size); 1361 + tdvmcall_set_return_val(vcpu, val); 1362 + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); 1363 + return 1; 1364 + } 1365 + 1366 + static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size, 1367 + unsigned long val) 1368 + { 1369 + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { 1370 + trace_kvm_fast_mmio(gpa); 1371 + return 0; 1372 + } 1373 + 1374 + trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); 1375 + if (kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val)) 1376 + return -EOPNOTSUPP; 1377 + 1378 + return 0; 1379 + } 1380 + 1381 + static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size) 1382 + { 1383 + unsigned long val; 1384 + 1385 + if (kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val)) 1386 + return -EOPNOTSUPP; 1387 + 1388 + tdvmcall_set_return_val(vcpu, val); 1389 + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); 1390 + return 0; 1391 + } 1392 + 1393 + static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) 1394 + { 1395 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1396 + int 
size, write, r; 1397 + unsigned long val; 1398 + gpa_t gpa; 1399 + 1400 + size = tdx->vp_enter_args.r12; 1401 + write = tdx->vp_enter_args.r13; 1402 + gpa = tdx->vp_enter_args.r14; 1403 + val = write ? tdx->vp_enter_args.r15 : 0; 1404 + 1405 + if (size != 1 && size != 2 && size != 4 && size != 8) 1406 + goto error; 1407 + if (write != 0 && write != 1) 1408 + goto error; 1409 + 1410 + /* 1411 + * TDG.VP.VMCALL<MMIO> allows only shared GPA, it makes no sense to 1412 + * do MMIO emulation for private GPA. 1413 + */ 1414 + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa) || 1415 + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1)) 1416 + goto error; 1417 + 1418 + gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm)); 1419 + 1420 + if (write) 1421 + r = tdx_mmio_write(vcpu, gpa, size, val); 1422 + else 1423 + r = tdx_mmio_read(vcpu, gpa, size); 1424 + if (!r) 1425 + /* Kernel completed device emulation. */ 1426 + return 1; 1427 + 1428 + /* Request the device emulation to userspace device model. */ 1429 + vcpu->mmio_is_write = write; 1430 + if (!write) 1431 + vcpu->arch.complete_userspace_io = tdx_complete_mmio_read; 1432 + 1433 + vcpu->run->mmio.phys_addr = gpa; 1434 + vcpu->run->mmio.len = size; 1435 + vcpu->run->mmio.is_write = write; 1436 + vcpu->run->exit_reason = KVM_EXIT_MMIO; 1437 + 1438 + if (write) { 1439 + memcpy(vcpu->run->mmio.data, &val, size); 1440 + } else { 1441 + vcpu->mmio_fragments[0].gpa = gpa; 1442 + vcpu->mmio_fragments[0].len = size; 1443 + trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL); 1444 + } 1445 + return 0; 1446 + 1447 + error: 1448 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1449 + return 1; 1450 + } 1451 + 1452 + static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) 1453 + { 1454 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1455 + 1456 + if (tdx->vp_enter_args.r12) 1457 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1458 + else { 1459 + tdx->vp_enter_args.r11 = 0; 1460 + 
tdx->vp_enter_args.r13 = 0; 1461 + tdx->vp_enter_args.r14 = 0; 1462 + } 1463 + return 1; 1464 + } 1465 + 1466 + static int handle_tdvmcall(struct kvm_vcpu *vcpu) 1467 + { 1468 + switch (tdvmcall_leaf(vcpu)) { 1469 + case TDVMCALL_MAP_GPA: 1470 + return tdx_map_gpa(vcpu); 1471 + case TDVMCALL_REPORT_FATAL_ERROR: 1472 + return tdx_report_fatal_error(vcpu); 1473 + case TDVMCALL_GET_TD_VM_CALL_INFO: 1474 + return tdx_get_td_vm_call_info(vcpu); 1475 + default: 1476 + break; 1477 + } 1478 + 1479 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1480 + return 1; 1481 + } 1482 + 1483 + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) 1484 + { 1485 + u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 : 1486 + TDX_SHARED_BIT_PWL_4; 1487 + 1488 + if (KVM_BUG_ON(shared_bit != kvm_gfn_direct_bits(vcpu->kvm), vcpu->kvm)) 1489 + return; 1490 + 1491 + td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); 1492 + } 1493 + 1494 + static void tdx_unpin(struct kvm *kvm, struct page *page) 1495 + { 1496 + put_page(page); 1497 + } 1498 + 1499 + static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, 1500 + enum pg_level level, struct page *page) 1501 + { 1502 + int tdx_level = pg_level_to_tdx_sept_level(level); 1503 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1504 + gpa_t gpa = gfn_to_gpa(gfn); 1505 + u64 entry, level_state; 1506 + u64 err; 1507 + 1508 + err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state); 1509 + if (unlikely(tdx_operand_busy(err))) { 1510 + tdx_unpin(kvm, page); 1511 + return -EBUSY; 1512 + } 1513 + 1514 + if (KVM_BUG_ON(err, kvm)) { 1515 + pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state); 1516 + tdx_unpin(kvm, page); 1517 + return -EIO; 1518 + } 1519 + 1520 + return 0; 1521 + } 1522 + 1523 + /* 1524 + * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to map guest pages; the 1525 + * callback tdx_gmem_post_populate() then maps pages into private memory
1526 + * through the SEAMCALL TDH.MEM.PAGE.ADD(). The SEAMCALL also requires the 1527 + * private EPT structures for the page to have been built before, which is 1528 + * done via kvm_tdp_map_page(). nr_premapped counts the number of pages that 1529 + * were added to the EPT structures but not added with TDH.MEM.PAGE.ADD(). 1530 + * The counter has to be zero on KVM_TDX_FINALIZE_VM, to ensure that there 1531 + * are no half-initialized shared EPT pages. 1532 + */ 1533 + static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn, 1534 + enum pg_level level, kvm_pfn_t pfn) 1535 + { 1536 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1537 + 1538 + if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm)) 1539 + return -EINVAL; 1540 + 1541 + /* nr_premapped will be decreased when tdh_mem_page_add() is called. */ 1542 + atomic64_inc(&kvm_tdx->nr_premapped); 1543 + return 0; 1544 + } 1545 + 1546 + int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, 1547 + enum pg_level level, kvm_pfn_t pfn) 1548 + { 1549 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1550 + struct page *page = pfn_to_page(pfn); 1551 + 1552 + /* TODO: handle large pages. */ 1553 + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) 1554 + return -EINVAL; 1555 + 1556 + /* 1557 + * Because guest_memfd doesn't support page migration with 1558 + * a_ops->migrate_folio (yet), no callback is triggered for KVM on page 1559 + * migration. Until guest_memfd supports page migration, prevent page 1560 + * migration. 1561 + * TODO: Once guest_memfd introduces callback on page migration, 1562 + * implement it and remove get_page/put_page(). 1563 + */ 1564 + get_page(page); 1565 + 1566 + /* 1567 + * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching 1568 + * barrier in tdx_td_finalize().
1569 + */ 1570 + smp_rmb(); 1571 + if (likely(kvm_tdx->state == TD_STATE_RUNNABLE)) 1572 + return tdx_mem_page_aug(kvm, gfn, level, page); 1573 + 1574 + return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn); 1575 + } 1576 + 1577 + static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, 1578 + enum pg_level level, struct page *page) 1579 + { 1580 + int tdx_level = pg_level_to_tdx_sept_level(level); 1581 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1582 + gpa_t gpa = gfn_to_gpa(gfn); 1583 + u64 err, entry, level_state; 1584 + 1585 + /* TODO: handle large pages. */ 1586 + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) 1587 + return -EINVAL; 1588 + 1589 + if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) 1590 + return -EINVAL; 1591 + 1592 + /* 1593 + * The write lock is held when zapping a private page, so there is no 1594 + * race with SEPT operations on other vCPUs. Races are still possible 1595 + * with TDH.VP.ENTER (due to 0-step mitigation) and guest TDCALLs. 1596 + */ 1597 + err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry, 1598 + &level_state); 1599 + 1600 + if (unlikely(tdx_operand_busy(err))) { 1601 + /* 1602 + * The second retry is expected to succeed after kicking off all 1603 + * other vCPUs and preventing them from invoking TDH.VP.ENTER.
1604 + */ 1605 + tdx_no_vcpus_enter_start(kvm); 1606 + err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry, 1607 + &level_state); 1608 + tdx_no_vcpus_enter_stop(kvm); 1609 + } 1610 + 1611 + if (KVM_BUG_ON(err, kvm)) { 1612 + pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state); 1613 + return -EIO; 1614 + } 1615 + 1616 + err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page); 1617 + 1618 + if (KVM_BUG_ON(err, kvm)) { 1619 + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); 1620 + return -EIO; 1621 + } 1622 + tdx_clear_page(page); 1623 + tdx_unpin(kvm, page); 1624 + return 0; 1625 + } 1626 + 1627 + int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, 1628 + enum pg_level level, void *private_spt) 1629 + { 1630 + int tdx_level = pg_level_to_tdx_sept_level(level); 1631 + gpa_t gpa = gfn_to_gpa(gfn); 1632 + struct page *page = virt_to_page(private_spt); 1633 + u64 err, entry, level_state; 1634 + 1635 + err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &entry, 1636 + &level_state); 1637 + if (unlikely(tdx_operand_busy(err))) 1638 + return -EBUSY; 1639 + 1640 + if (KVM_BUG_ON(err, kvm)) { 1641 + pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state); 1642 + return -EIO; 1643 + } 1644 + 1645 + return 0; 1646 + } 1647 + 1648 + /* 1649 + * Check if the error returned from a SEPT zap SEAMCALL is due to that a page is 1650 + * mapped by KVM_TDX_INIT_MEM_REGION without tdh_mem_page_add() being called 1651 + * successfully. 1652 + * 1653 + * Since tdh_mem_sept_add() must have been invoked successfully before a 1654 + * non-leaf entry present in the mirrored page table, the SEPT ZAP related 1655 + * SEAMCALLs should not encounter err TDX_EPT_WALK_FAILED. They should instead 1656 + * find TDX_EPT_ENTRY_STATE_INCORRECT due to an empty leaf entry found in the 1657 + * SEPT. 1658 + * 1659 + * Further check if the returned entry from SEPT walking is with RWX permissions 1660 + * to filter out anything unexpected. 
1661 + * 1662 + * Note: @level is pg_level, not the tdx_level. The tdx_level extracted from 1663 + * level_state returned from a SEAMCALL error is the same as that passed into 1664 + * the SEAMCALL. 1665 + */ 1666 + static int tdx_is_sept_zap_err_due_to_premap(struct kvm_tdx *kvm_tdx, u64 err, 1667 + u64 entry, int level) 1668 + { 1669 + if (!err || kvm_tdx->state == TD_STATE_RUNNABLE) 1670 + return false; 1671 + 1672 + if (err != (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX)) 1673 + return false; 1674 + 1675 + if ((is_last_spte(entry, level) && (entry & VMX_EPT_RWX_MASK))) 1676 + return false; 1677 + 1678 + return true; 1679 + } 1680 + 1681 + static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, 1682 + enum pg_level level, struct page *page) 1683 + { 1684 + int tdx_level = pg_level_to_tdx_sept_level(level); 1685 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1686 + gpa_t gpa = gfn_to_gpa(gfn) & KVM_HPAGE_MASK(level); 1687 + u64 err, entry, level_state; 1688 + 1689 + /* For now large page isn't supported yet. */ 1690 + WARN_ON_ONCE(level != PG_LEVEL_4K); 1691 + 1692 + err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); 1693 + 1694 + if (unlikely(tdx_operand_busy(err))) { 1695 + /* After no vCPUs enter, the second retry is expected to succeed */ 1696 + tdx_no_vcpus_enter_start(kvm); 1697 + err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); 1698 + tdx_no_vcpus_enter_stop(kvm); 1699 + } 1700 + if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) && 1701 + !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) { 1702 + atomic64_dec(&kvm_tdx->nr_premapped); 1703 + tdx_unpin(kvm, page); 1704 + return 0; 1705 + } 1706 + 1707 + if (KVM_BUG_ON(err, kvm)) { 1708 + pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state); 1709 + return -EIO; 1710 + } 1711 + return 1; 1712 + } 1713 + 1714 + /* 1715 + * Ensure shared and private EPTs to be flushed on all vCPUs. 
1716 + * tdh_mem_track() is the only caller that increases TD epoch. An increase in 1717 + * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are 1718 + * running in guest mode with the value "N - 1". 1719 + * 1720 + * A successful execution of tdh_mem_track() ensures that vCPUs can only run in 1721 + * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch 1722 + * being increased to "N + 1". 1723 + * 1724 + * Kicking off all vCPUs after that further results in no vCPUs can run in guest 1725 + * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g. 1726 + * to increase TD epoch to "N + 2"). 1727 + * 1728 + * TDX module will flush EPT on the next TD enter and make vCPUs to run in 1729 + * guest mode with TD epoch value "N + 1". 1730 + * 1731 + * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by 1732 + * waiting empty IPI handler ack_kick(). 1733 + * 1734 + * No action is required to the vCPUs being kicked off since the kicking off 1735 + * occurs certainly after TD epoch increment and before the next 1736 + * tdh_mem_track(). 1737 + */ 1738 + static void tdx_track(struct kvm *kvm) 1739 + { 1740 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1741 + u64 err; 1742 + 1743 + /* If TD isn't finalized, it's before any vcpu running. 
*/ 1744 + if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) 1745 + return; 1746 + 1747 + lockdep_assert_held_write(&kvm->mmu_lock); 1748 + 1749 + err = tdh_mem_track(&kvm_tdx->td); 1750 + if (unlikely(tdx_operand_busy(err))) { 1751 + /* After no vCPUs enter, the second retry is expected to succeed */ 1752 + tdx_no_vcpus_enter_start(kvm); 1753 + err = tdh_mem_track(&kvm_tdx->td); 1754 + tdx_no_vcpus_enter_stop(kvm); 1755 + } 1756 + 1757 + if (KVM_BUG_ON(err, kvm)) 1758 + pr_tdx_error(TDH_MEM_TRACK, err); 1759 + 1760 + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); 1761 + } 1762 + 1763 + int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, 1764 + enum pg_level level, void *private_spt) 1765 + { 1766 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 1767 + 1768 + /* 1769 + * free_external_spt() is only called after hkid is freed when TD is 1770 + * tearing down. 1771 + * KVM doesn't (yet) zap page table pages in mirror page table while 1772 + * TD is active, though guest pages mapped in mirror page table could be 1773 + * zapped during TD is active, e.g. for shared <-> private conversion 1774 + * and slot move/deletion. 1775 + */ 1776 + if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) 1777 + return -EINVAL; 1778 + 1779 + /* 1780 + * The HKID assigned to this TD was already freed and cache was 1781 + * already flushed. We don't have to flush again. 1782 + */ 1783 + return tdx_reclaim_page(virt_to_page(private_spt)); 1784 + } 1785 + 1786 + int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, 1787 + enum pg_level level, kvm_pfn_t pfn) 1788 + { 1789 + struct page *page = pfn_to_page(pfn); 1790 + int ret; 1791 + 1792 + /* 1793 + * HKID is released after all private pages have been removed, and set 1794 + * before any might be populated. Warn if zapping is attempted when 1795 + * there can't be anything populated in the private EPT. 
1796 + */ 1797 + if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) 1798 + return -EINVAL; 1799 + 1800 + ret = tdx_sept_zap_private_spte(kvm, gfn, level, page); 1801 + if (ret <= 0) 1802 + return ret; 1803 + 1804 + /* 1805 + * TDX requires TLB tracking before dropping private page. Do 1806 + * it here, although it is also done later. 1807 + */ 1808 + tdx_track(kvm); 1809 + 1810 + return tdx_sept_drop_private_spte(kvm, gfn, level, page); 1811 + } 1812 + 1813 + void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, 1814 + int trig_mode, int vector) 1815 + { 1816 + struct kvm_vcpu *vcpu = apic->vcpu; 1817 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1818 + 1819 + /* TDX supports only posted interrupt. No lapic emulation. */ 1820 + __vmx_deliver_posted_interrupt(vcpu, &tdx->vt.pi_desc, vector); 1821 + 1822 + trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector); 1823 + } 1824 + 1825 + static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcpu) 1826 + { 1827 + u64 eeq_type = to_tdx(vcpu)->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK; 1828 + u64 eq = vmx_get_exit_qual(vcpu); 1829 + 1830 + if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION) 1831 + return false; 1832 + 1833 + return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN); 1834 + } 1835 + 1836 + static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) 1837 + { 1838 + unsigned long exit_qual; 1839 + gpa_t gpa = to_tdx(vcpu)->exit_gpa; 1840 + bool local_retry = false; 1841 + int ret; 1842 + 1843 + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { 1844 + if (tdx_is_sept_violation_unexpected_pending(vcpu)) { 1845 + pr_warn("Guest access before accepting 0x%llx on vCPU %d\n", 1846 + gpa, vcpu->vcpu_id); 1847 + kvm_vm_dead(vcpu->kvm); 1848 + return -EIO; 1849 + } 1850 + /* 1851 + * Always treat SEPT violations as write faults. Ignore the 1852 + * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. 
1853 + * TD private pages are always RWX in the SEPT tables, 1854 + * i.e. they're always mapped writable. Just as importantly, 1855 + * treating SEPT violations as write faults is necessary to 1856 + * avoid COW allocations, which will cause TDAUGPAGE failures 1857 + * due to aliasing a single HPA to multiple GPAs. 1858 + */ 1859 + exit_qual = EPT_VIOLATION_ACC_WRITE; 1860 + 1861 + /* Only private GPA triggers zero-step mitigation */ 1862 + local_retry = true; 1863 + } else { 1864 + exit_qual = vmx_get_exit_qual(vcpu); 1865 + /* 1866 + * EPT violation due to instruction fetch should never be 1867 + * triggered from shared memory in TDX guest. If such EPT 1868 + * violation occurs, treat it as broken hardware. 1869 + */ 1870 + if (KVM_BUG_ON(exit_qual & EPT_VIOLATION_ACC_INSTR, vcpu->kvm)) 1871 + return -EIO; 1872 + } 1873 + 1874 + trace_kvm_page_fault(vcpu, gpa, exit_qual); 1875 + 1876 + /* 1877 + * To minimize TDH.VP.ENTER invocations, retry locally for private GPA 1878 + * mapping in TDX. 1879 + * 1880 + * KVM may return RET_PF_RETRY for private GPA due to 1881 + * - contentions when atomically updating SPTEs of the mirror page table 1882 + * - in-progress GFN invalidation or memslot removal. 1883 + * - TDX_OPERAND_BUSY error from TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD, 1884 + * caused by contentions with TDH.VP.ENTER (with zero-step mitigation) 1885 + * or certain TDCALLs. 1886 + * 1887 + * If TDH.VP.ENTER is invoked more times than the threshold set by the 1888 + * TDX module before KVM resolves the private GPA mapping, the TDX 1889 + * module will activate zero-step mitigation during TDH.VP.ENTER. This 1890 + * process acquires an SEPT tree lock in the TDX module, leading to 1891 + * further contentions with TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD 1892 + * operations on other vCPUs. 1893 + * 1894 + * Breaking out of local retries for kvm_vcpu_has_events() is for 1895 + * interrupt injection. kvm_vcpu_has_events() should not see pending 1896 + * events for TDX. 
Since KVM can't determine if IRQs (or NMIs) are 1897 + * blocked by TDs, false positives are inevitable i.e., KVM may re-enter 1898 + * the guest even if the IRQ/NMI can't be delivered. 1899 + * 1900 + * Note: even without breaking out of local retries, zero-step 1901 + * mitigation may still occur due to 1902 + * - invoking of TDH.VP.ENTER after KVM_EXIT_MEMORY_FAULT, 1903 + * - a single RIP causing EPT violations for more GFNs than the 1904 + * threshold count. 1905 + * This is safe, as triggering zero-step mitigation only introduces 1906 + * contentions to page installation SEAMCALLs on other vCPUs, which will 1907 + * handle retries locally in their EPT violation handlers. 1908 + */ 1909 + while (1) { 1910 + ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual); 1911 + 1912 + if (ret != RET_PF_RETRY || !local_retry) 1913 + break; 1914 + 1915 + if (kvm_vcpu_has_events(vcpu) || signal_pending(current)) 1916 + break; 1917 + 1918 + if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) { 1919 + ret = -EIO; 1920 + break; 1921 + } 1922 + 1923 + cond_resched(); 1924 + } 1925 + return ret; 1926 + } 1927 + 1928 + int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) 1929 + { 1930 + if (err) { 1931 + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); 1932 + return 1; 1933 + } 1934 + 1935 + if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MSR_READ) 1936 + tdvmcall_set_return_val(vcpu, kvm_read_edx_eax(vcpu)); 1937 + 1938 + return 1; 1939 + } 1940 + 1941 + 1942 + int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) 1943 + { 1944 + struct vcpu_tdx *tdx = to_tdx(vcpu); 1945 + u64 vp_enter_ret = tdx->vp_enter_ret; 1946 + union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu); 1947 + 1948 + if (fastpath != EXIT_FASTPATH_NONE) 1949 + return 1; 1950 + 1951 + if (unlikely(vp_enter_ret == EXIT_REASON_EPT_MISCONFIG)) { 1952 + KVM_BUG_ON(1, vcpu->kvm); 1953 + return -EIO; 1954 + } 1955 + 1956 + /* 1957 + * Handle TDX SW errors, including 
TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and 1958 + * TDX_SEAMCALL_VMFAILINVALID. 1959 + */ 1960 + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { 1961 + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); 1962 + goto unhandled_exit; 1963 + } 1964 + 1965 + if (unlikely(tdx_failed_vmentry(vcpu))) { 1966 + /* 1967 + * If the guest state is protected, that means off-TD debug is 1968 + * not enabled, TDX_NON_RECOVERABLE must be set. 1969 + */ 1970 + WARN_ON_ONCE(vcpu->arch.guest_state_protected && 1971 + !(vp_enter_ret & TDX_NON_RECOVERABLE)); 1972 + vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY; 1973 + vcpu->run->fail_entry.hardware_entry_failure_reason = exit_reason.full; 1974 + vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu; 1975 + return 0; 1976 + } 1977 + 1978 + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE)) && 1979 + exit_reason.basic != EXIT_REASON_TRIPLE_FAULT) { 1980 + kvm_pr_unimpl("TD vp_enter_ret 0x%llx\n", vp_enter_ret); 1981 + goto unhandled_exit; 1982 + } 1983 + 1984 + WARN_ON_ONCE(exit_reason.basic != EXIT_REASON_TRIPLE_FAULT && 1985 + (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); 1986 + 1987 + switch (exit_reason.basic) { 1988 + case EXIT_REASON_TRIPLE_FAULT: 1989 + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; 1990 + vcpu->mmio_needed = 0; 1991 + return 0; 1992 + case EXIT_REASON_EXCEPTION_NMI: 1993 + return tdx_handle_exception_nmi(vcpu); 1994 + case EXIT_REASON_EXTERNAL_INTERRUPT: 1995 + ++vcpu->stat.irq_exits; 1996 + return 1; 1997 + case EXIT_REASON_CPUID: 1998 + return tdx_emulate_cpuid(vcpu); 1999 + case EXIT_REASON_HLT: 2000 + return kvm_emulate_halt_noskip(vcpu); 2001 + case EXIT_REASON_TDCALL: 2002 + return handle_tdvmcall(vcpu); 2003 + case EXIT_REASON_VMCALL: 2004 + return tdx_emulate_vmcall(vcpu); 2005 + case EXIT_REASON_IO_INSTRUCTION: 2006 + return tdx_emulate_io(vcpu); 2007 + case EXIT_REASON_MSR_READ: 2008 + kvm_rcx_write(vcpu, tdx->vp_enter_args.r12); 2009 + return kvm_emulate_rdmsr(vcpu); 2010 + case 
EXIT_REASON_MSR_WRITE: 2011 + kvm_rcx_write(vcpu, tdx->vp_enter_args.r12); 2012 + kvm_rax_write(vcpu, tdx->vp_enter_args.r13 & -1u); 2013 + kvm_rdx_write(vcpu, tdx->vp_enter_args.r13 >> 32); 2014 + return kvm_emulate_wrmsr(vcpu); 2015 + case EXIT_REASON_EPT_MISCONFIG: 2016 + return tdx_emulate_mmio(vcpu); 2017 + case EXIT_REASON_EPT_VIOLATION: 2018 + return tdx_handle_ept_violation(vcpu); 2019 + case EXIT_REASON_OTHER_SMI: 2020 + /* 2021 + * Unlike VMX, SMI in SEAM non-root mode (i.e. when 2022 + * TD guest vCPU is running) will cause VM exit to TDX module, 2023 + * then SEAMRET to KVM. Once it exits to KVM, SMI is delivered 2024 + * and handled by kernel handler right away. 2025 + * 2026 + * The Other SMI exit can also be caused by the SEAM non-root 2027 + * machine check delivered via Machine Check System Management 2028 + * Interrupt (MSMI), but it has already been handled by the 2029 + * kernel machine check handler, i.e., the memory page has been 2030 + * marked as poisoned and it won't be freed to the free list 2031 + * when the TDX guest is terminated (the TDX module marks the 2032 + * guest as dead and prevent it from further running when 2033 + * machine check happens in SEAM non-root). 2034 + * 2035 + * - A MSMI will not reach here, it's handled as non_recoverable 2036 + * case above. 2037 + * - If it's not an MSMI, no need to do anything here. 
2038 + */ 2039 + return 1; 2040 + default: 2041 + break; 2042 + } 2043 + 2044 + unhandled_exit: 2045 + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 2046 + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; 2047 + vcpu->run->internal.ndata = 2; 2048 + vcpu->run->internal.data[0] = vp_enter_ret; 2049 + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; 2050 + return 0; 2051 + } 2052 + 2053 + void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, 2054 + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) 2055 + { 2056 + struct vcpu_tdx *tdx = to_tdx(vcpu); 2057 + 2058 + *reason = tdx->vt.exit_reason.full; 2059 + if (*reason != -1u) { 2060 + *info1 = vmx_get_exit_qual(vcpu); 2061 + *info2 = tdx->ext_exit_qualification; 2062 + *intr_info = vmx_get_intr_info(vcpu); 2063 + } else { 2064 + *info1 = 0; 2065 + *info2 = 0; 2066 + *intr_info = 0; 2067 + } 2068 + 2069 + *error_code = 0; 2070 + } 2071 + 2072 + bool tdx_has_emulated_msr(u32 index) 2073 + { 2074 + switch (index) { 2075 + case MSR_IA32_UCODE_REV: 2076 + case MSR_IA32_ARCH_CAPABILITIES: 2077 + case MSR_IA32_POWER_CTL: 2078 + case MSR_IA32_CR_PAT: 2079 + case MSR_MTRRcap: 2080 + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: 2081 + case MSR_MTRRdefType: 2082 + case MSR_IA32_TSC_DEADLINE: 2083 + case MSR_IA32_MISC_ENABLE: 2084 + case MSR_PLATFORM_INFO: 2085 + case MSR_MISC_FEATURES_ENABLES: 2086 + case MSR_IA32_APICBASE: 2087 + case MSR_EFER: 2088 + case MSR_IA32_FEAT_CTL: 2089 + case MSR_IA32_MCG_CAP: 2090 + case MSR_IA32_MCG_STATUS: 2091 + case MSR_IA32_MCG_CTL: 2092 + case MSR_IA32_MCG_EXT_CTL: 2093 + case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: 2094 + case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1: 2095 + /* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC, CTL2} */ 2096 + case MSR_KVM_POLL_CONTROL: 2097 + return true; 2098 + case APIC_BASE_MSR ... 
APIC_BASE_MSR + 0xff: 2099 + /* 2100 + * x2APIC registers that are virtualized by the CPU can't be 2101 + * emulated, KVM doesn't have access to the virtual APIC page. 2102 + */ 2103 + switch (index) { 2104 + case X2APIC_MSR(APIC_TASKPRI): 2105 + case X2APIC_MSR(APIC_PROCPRI): 2106 + case X2APIC_MSR(APIC_EOI): 2107 + case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR): 2108 + case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR): 2109 + case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR): 2110 + return false; 2111 + default: 2112 + return true; 2113 + } 2114 + default: 2115 + return false; 2116 + } 2117 + } 2118 + 2119 + static bool tdx_is_read_only_msr(u32 index) 2120 + { 2121 + return index == MSR_IA32_APICBASE || index == MSR_EFER || 2122 + index == MSR_IA32_FEAT_CTL; 2123 + } 2124 + 2125 + int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) 2126 + { 2127 + switch (msr->index) { 2128 + case MSR_IA32_FEAT_CTL: 2129 + /* 2130 + * MCE and MCA are advertised via cpuid. Guest kernel could 2131 + * check if LMCE is enabled or not. 
2132 + */ 2133 + msr->data = FEAT_CTL_LOCKED; 2134 + if (vcpu->arch.mcg_cap & MCG_LMCE_P) 2135 + msr->data |= FEAT_CTL_LMCE_ENABLED; 2136 + return 0; 2137 + case MSR_IA32_MCG_EXT_CTL: 2138 + if (!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) 2139 + return 1; 2140 + msr->data = vcpu->arch.mcg_ext_ctl; 2141 + return 0; 2142 + default: 2143 + if (!tdx_has_emulated_msr(msr->index)) 2144 + return 1; 2145 + 2146 + return kvm_get_msr_common(vcpu, msr); 2147 + } 2148 + } 2149 + 2150 + int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) 2151 + { 2152 + switch (msr->index) { 2153 + case MSR_IA32_MCG_EXT_CTL: 2154 + if ((!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) || 2155 + (msr->data & ~MCG_EXT_CTL_LMCE_EN)) 2156 + return 1; 2157 + vcpu->arch.mcg_ext_ctl = msr->data; 2158 + return 0; 2159 + default: 2160 + if (tdx_is_read_only_msr(msr->index)) 2161 + return 1; 2162 + 2163 + if (!tdx_has_emulated_msr(msr->index)) 2164 + return 1; 2165 + 2166 + return kvm_set_msr_common(vcpu, msr); 2167 + } 2168 + } 2169 + 2170 + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) 2171 + { 2172 + const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; 2173 + struct kvm_tdx_capabilities __user *user_caps; 2174 + struct kvm_tdx_capabilities *caps = NULL; 2175 + int ret = 0; 2176 + 2177 + /* flags is reserved for future use */ 2178 + if (cmd->flags) 2179 + return -EINVAL; 2180 + 2181 + caps = kmalloc(sizeof(*caps) + 2182 + sizeof(struct kvm_cpuid_entry2) * td_conf->num_cpuid_config, 2183 + GFP_KERNEL); 2184 + if (!caps) 2185 + return -ENOMEM; 2186 + 2187 + user_caps = u64_to_user_ptr(cmd->data); 2188 + if (copy_from_user(caps, user_caps, sizeof(*caps))) { 2189 + ret = -EFAULT; 2190 + goto out; 2191 + } 2192 + 2193 + if (caps->cpuid.nent < td_conf->num_cpuid_config) { 2194 + ret = -E2BIG; 2195 + goto out; 2196 + } 2197 + 2198 + ret = init_kvm_tdx_caps(td_conf, caps); 2199 + if (ret) 2200 + goto out; 2201 + 2202 + if (copy_to_user(user_caps, 
caps, sizeof(*caps))) { 2203 + ret = -EFAULT; 2204 + goto out; 2205 + } 2206 + 2207 + if (copy_to_user(user_caps->cpuid.entries, caps->cpuid.entries, 2208 + caps->cpuid.nent * 2209 + sizeof(caps->cpuid.entries[0]))) 2210 + ret = -EFAULT; 2211 + 2212 + out: 2213 + /* kfree() accepts NULL. */ 2214 + kfree(caps); 2215 + return ret; 2216 + } 2217 + 2218 + /* 2219 + * KVM reports the guest physical address width in CPUID.0x80000008.EAX[23:16], which is 2220 + * similar to TDX's GPAW. Use this field as the interface for userspace to 2221 + * configure the GPAW and EPT level for TDs. 2222 + * 2223 + * Only values 48 and 52 are supported. Value 52 means GPAW-52 and EPT level 2224 + * 5; value 48 means GPAW-48 and EPT level 4. For value 48, GPAW-48 is always 2225 + * supported. Value 52 is only supported when the platform supports 5-level 2226 + * EPT. 2227 + */ 2228 + static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid, 2229 + struct td_params *td_params) 2230 + { 2231 + const struct kvm_cpuid_entry2 *entry; 2232 + int guest_pa; 2233 + 2234 + entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0); 2235 + if (!entry) 2236 + return -EINVAL; 2237 + 2238 + guest_pa = tdx_get_guest_phys_addr_bits(entry->eax); 2239 + 2240 + if (guest_pa != 48 && guest_pa != 52) 2241 + return -EINVAL; 2242 + 2243 + if (guest_pa == 52 && !cpu_has_vmx_ept_5levels()) 2244 + return -EINVAL; 2245 + 2246 + td_params->eptp_controls = VMX_EPTP_MT_WB; 2247 + if (guest_pa == 52) { 2248 + td_params->eptp_controls |= VMX_EPTP_PWL_5; 2249 + td_params->config_flags |= TDX_CONFIG_FLAGS_MAX_GPAW; 2250 + } else { 2251 + td_params->eptp_controls |= VMX_EPTP_PWL_4; 2252 + } 2253 + 2254 + return 0; 2255 + } 2256 + 2257 + static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid, 2258 + struct td_params *td_params) 2259 + { 2260 + const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; 2261 + const struct kvm_cpuid_entry2 *entry; 2262 + struct tdx_cpuid_value *value; 2263 + int i,
copy_cnt = 0; 2264 + 2265 + /* 2266 + * td_params.cpuid_values: The number and the order of cpuid_value must 2267 + * be the same as those of struct tdsysinfo.{num_cpuid_config, cpuid_configs}. 2268 + * It's assumed that td_params was zeroed. 2269 + */ 2270 + for (i = 0; i < td_conf->num_cpuid_config; i++) { 2271 + struct kvm_cpuid_entry2 tmp; 2272 + 2273 + td_init_cpuid_entry2(&tmp, i); 2274 + 2275 + entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 2276 + tmp.function, tmp.index); 2277 + if (!entry) 2278 + continue; 2279 + 2280 + if (tdx_unsupported_cpuid(entry)) 2281 + return -EINVAL; 2282 + 2283 + copy_cnt++; 2284 + 2285 + value = &td_params->cpuid_values[i]; 2286 + value->eax = entry->eax; 2287 + value->ebx = entry->ebx; 2288 + value->ecx = entry->ecx; 2289 + value->edx = entry->edx; 2290 + 2291 + /* 2292 + * TDX module does not accept nonzero bits 16..23 for the 2293 + * CPUID[0x80000008].EAX, see setup_tdparams_eptp_controls(). 2294 + */ 2295 + if (tmp.function == 0x80000008) 2296 + value->eax = tdx_set_guest_phys_addr_bits(value->eax, 0); 2297 + } 2298 + 2299 + /* 2300 + * Rely on the TDX module to reject invalid configuration, but it can't 2301 + * check leaves that don't have a proper slot in td_params->cpuid_values 2302 + * to stick them in. So fail if there were entries that didn't get copied to 2303 + * td_params.
2304 + */ 2305 + if (copy_cnt != cpuid->nent) 2306 + return -EINVAL; 2307 + 2308 + return 0; 2309 + } 2310 + 2311 + static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, 2312 + struct kvm_tdx_init_vm *init_vm) 2313 + { 2314 + const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; 2315 + struct kvm_cpuid2 *cpuid = &init_vm->cpuid; 2316 + int ret; 2317 + 2318 + if (kvm->created_vcpus) 2319 + return -EBUSY; 2320 + 2321 + if (init_vm->attributes & ~tdx_get_supported_attrs(td_conf)) 2322 + return -EINVAL; 2323 + 2324 + if (init_vm->xfam & ~tdx_get_supported_xfam(td_conf)) 2325 + return -EINVAL; 2326 + 2327 + td_params->max_vcpus = kvm->max_vcpus; 2328 + td_params->attributes = init_vm->attributes | td_conf->attributes_fixed1; 2329 + td_params->xfam = init_vm->xfam | td_conf->xfam_fixed1; 2330 + 2331 + td_params->config_flags = TDX_CONFIG_FLAGS_NO_RBP_MOD; 2332 + td_params->tsc_frequency = TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_khz); 2333 + 2334 + ret = setup_tdparams_eptp_controls(cpuid, td_params); 2335 + if (ret) 2336 + return ret; 2337 + 2338 + ret = setup_tdparams_cpuids(cpuid, td_params); 2339 + if (ret) 2340 + return ret; 2341 + 2342 + #define MEMCPY_SAME_SIZE(dst, src) \ 2343 + do { \ 2344 + BUILD_BUG_ON(sizeof(dst) != sizeof(src)); \ 2345 + memcpy((dst), (src), sizeof(dst)); \ 2346 + } while (0) 2347 + 2348 + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); 2349 + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); 2350 + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); 2351 + 2352 + return 0; 2353 + } 2354 + 2355 + static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params, 2356 + u64 *seamcall_err) 2357 + { 2358 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 2359 + cpumask_var_t packages; 2360 + struct page **tdcs_pages = NULL; 2361 + struct page *tdr_page; 2362 + int ret, i; 2363 + u64 err, rcx; 2364 + 2365 + *seamcall_err = 0; 2366 + ret = tdx_guest_keyid_alloc(); 2367 + if (ret 
< 0) 2368 + return ret; 2369 + kvm_tdx->hkid = ret; 2370 + kvm_tdx->misc_cg = get_current_misc_cg(); 2371 + ret = misc_cg_try_charge(MISC_CG_RES_TDX, kvm_tdx->misc_cg, 1); 2372 + if (ret) 2373 + goto free_hkid; 2374 + 2375 + ret = -ENOMEM; 2376 + 2377 + atomic_inc(&nr_configured_hkid); 2378 + 2379 + tdr_page = alloc_page(GFP_KERNEL); 2380 + if (!tdr_page) 2381 + goto free_hkid; 2382 + 2383 + kvm_tdx->td.tdcs_nr_pages = tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE; 2384 + /* TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. */ 2385 + kvm_tdx->td.tdcx_nr_pages = tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1; 2386 + tdcs_pages = kcalloc(kvm_tdx->td.tdcs_nr_pages, sizeof(*kvm_tdx->td.tdcs_pages), 2387 + GFP_KERNEL | __GFP_ZERO); 2388 + if (!tdcs_pages) 2389 + goto free_tdr; 2390 + 2391 + for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { 2392 + tdcs_pages[i] = alloc_page(GFP_KERNEL); 2393 + if (!tdcs_pages[i]) 2394 + goto free_tdcs; 2395 + } 2396 + 2397 + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) 2398 + goto free_tdcs; 2399 + 2400 + cpus_read_lock(); 2401 + 2402 + /* 2403 + * Need at least one CPU of the package to be online in order to 2404 + * program all packages for host key id. Check it. 2405 + */ 2406 + for_each_present_cpu(i) 2407 + cpumask_set_cpu(topology_physical_package_id(i), packages); 2408 + for_each_online_cpu(i) 2409 + cpumask_clear_cpu(topology_physical_package_id(i), packages); 2410 + if (!cpumask_empty(packages)) { 2411 + ret = -EIO; 2412 + /* 2413 + * Because it's hard for human operator to figure out the 2414 + * reason, warn it. 2415 + */ 2416 + #define MSG_ALLPKG "All packages need to have online CPU to create TD. Online CPU and retry.\n" 2417 + pr_warn_ratelimited(MSG_ALLPKG); 2418 + goto free_packages; 2419 + } 2420 + 2421 + /* 2422 + * TDH.MNG.CREATE tries to grab the global TDX module and fails 2423 + * with TDX_OPERAND_BUSY when it fails to grab. Take the global 2424 + * lock to prevent it from failure. 
2425 + */ 2426 + mutex_lock(&tdx_lock); 2427 + kvm_tdx->td.tdr_page = tdr_page; 2428 + err = tdh_mng_create(&kvm_tdx->td, kvm_tdx->hkid); 2429 + mutex_unlock(&tdx_lock); 2430 + 2431 + if (err == TDX_RND_NO_ENTROPY) { 2432 + ret = -EAGAIN; 2433 + goto free_packages; 2434 + } 2435 + 2436 + if (WARN_ON_ONCE(err)) { 2437 + pr_tdx_error(TDH_MNG_CREATE, err); 2438 + ret = -EIO; 2439 + goto free_packages; 2440 + } 2441 + 2442 + for_each_online_cpu(i) { 2443 + int pkg = topology_physical_package_id(i); 2444 + 2445 + if (cpumask_test_and_set_cpu(pkg, packages)) 2446 + continue; 2447 + 2448 + /* 2449 + * Program the memory controller in the package with an 2450 + * encryption key associated to a TDX private host key id 2451 + * assigned to this TDR. Concurrent operations on same memory 2452 + * controller results in TDX_OPERAND_BUSY. No locking needed 2453 + * beyond the cpus_read_lock() above as it serializes against 2454 + * hotplug and the first online CPU of the package is always 2455 + * used. We never have two CPUs in the same socket trying to 2456 + * program the key. 2457 + */ 2458 + ret = smp_call_on_cpu(i, tdx_do_tdh_mng_key_config, 2459 + kvm_tdx, true); 2460 + if (ret) 2461 + break; 2462 + } 2463 + cpus_read_unlock(); 2464 + free_cpumask_var(packages); 2465 + if (ret) { 2466 + i = 0; 2467 + goto teardown; 2468 + } 2469 + 2470 + kvm_tdx->td.tdcs_pages = tdcs_pages; 2471 + for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { 2472 + err = tdh_mng_addcx(&kvm_tdx->td, tdcs_pages[i]); 2473 + if (err == TDX_RND_NO_ENTROPY) { 2474 + /* Here it's hard to allow userspace to retry. */ 2475 + ret = -EAGAIN; 2476 + goto teardown; 2477 + } 2478 + if (WARN_ON_ONCE(err)) { 2479 + pr_tdx_error(TDH_MNG_ADDCX, err); 2480 + ret = -EIO; 2481 + goto teardown; 2482 + } 2483 + } 2484 + 2485 + err = tdh_mng_init(&kvm_tdx->td, __pa(td_params), &rcx); 2486 + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) { 2487 + /* 2488 + * Because a user gives operands, don't warn. 
2489 + * Return a hint to the user because it's sometimes hard for the 2490 + * user to figure out which operand is invalid. SEAMCALL status 2491 + * code includes which operand caused invalid operand error. 2492 + */ 2493 + *seamcall_err = err; 2494 + ret = -EINVAL; 2495 + goto teardown; 2496 + } else if (WARN_ON_ONCE(err)) { 2497 + pr_tdx_error_1(TDH_MNG_INIT, err, rcx); 2498 + ret = -EIO; 2499 + goto teardown; 2500 + } 2501 + 2502 + return 0; 2503 + 2504 + /* 2505 + * The sequence for freeing resources from a partially initialized TD 2506 + * varies based on where in the initialization flow failure occurred. 2507 + * Simply use the full teardown and destroy, which naturally play nice 2508 + * with partial initialization. 2509 + */ 2510 + teardown: 2511 + /* Only free pages not yet added, so start at 'i' */ 2512 + for (; i < kvm_tdx->td.tdcs_nr_pages; i++) { 2513 + if (tdcs_pages[i]) { 2514 + __free_page(tdcs_pages[i]); 2515 + tdcs_pages[i] = NULL; 2516 + } 2517 + } 2518 + if (!kvm_tdx->td.tdcs_pages) 2519 + kfree(tdcs_pages); 2520 + 2521 + tdx_mmu_release_hkid(kvm); 2522 + tdx_reclaim_td_control_pages(kvm); 2523 + 2524 + return ret; 2525 + 2526 + free_packages: 2527 + cpus_read_unlock(); 2528 + free_cpumask_var(packages); 2529 + 2530 + free_tdcs: 2531 + for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { 2532 + if (tdcs_pages[i]) 2533 + __free_page(tdcs_pages[i]); 2534 + } 2535 + kfree(tdcs_pages); 2536 + kvm_tdx->td.tdcs_pages = NULL; 2537 + 2538 + free_tdr: 2539 + if (tdr_page) 2540 + __free_page(tdr_page); 2541 + kvm_tdx->td.tdr_page = 0; 2542 + 2543 + free_hkid: 2544 + tdx_hkid_free(kvm_tdx); 2545 + 2546 + return ret; 2547 + } 2548 + 2549 + static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id, 2550 + u64 *data) 2551 + { 2552 + u64 err; 2553 + 2554 + err = tdh_mng_rd(&tdx->td, field_id, data); 2555 + 2556 + return err; 2557 + } 2558 + 2559 + #define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7) 2560 + #define TDX_MD_UNREADABLE_SUBLEAF_MASK 
GENMASK(31, 7) 2561 + 2562 + static int tdx_read_cpuid(struct kvm_vcpu *vcpu, u32 leaf, u32 sub_leaf, 2563 + bool sub_leaf_set, int *entry_index, 2564 + struct kvm_cpuid_entry2 *out) 2565 + { 2566 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 2567 + u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES; 2568 + u64 ebx_eax, edx_ecx; 2569 + u64 err = 0; 2570 + 2571 + if (sub_leaf > 0b1111111) 2572 + return -EINVAL; 2573 + 2574 + if (*entry_index >= KVM_MAX_CPUID_ENTRIES) 2575 + return -EINVAL; 2576 + 2577 + if (leaf & TDX_MD_UNREADABLE_LEAF_MASK || 2578 + sub_leaf & TDX_MD_UNREADABLE_SUBLEAF_MASK) 2579 + return -EINVAL; 2580 + 2581 + /* 2582 + * bit 23:17, RESERVED: reserved, must be 0; 2583 + * bit 16, LEAF_31: leaf number bit 31; 2584 + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are 2585 + * implicitly 0; 2586 + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag; 2587 + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1, 2588 + * the SUBLEAF_6_0 is all-1. 2589 + * sub-leaf bits 31:7 are implicitly 0; 2590 + * bit 0, ELEMENT_I: Element index within field; 2591 + */ 2592 + field_id |= ((leaf & 0x80000000) ? 1 : 0) << 16; 2593 + field_id |= (leaf & 0x7f) << 9; 2594 + if (sub_leaf_set) 2595 + field_id |= (sub_leaf & 0x7f) << 1; 2596 + else 2597 + field_id |= 0x1fe; 2598 + 2599 + err = tdx_td_metadata_field_read(kvm_tdx, field_id, &ebx_eax); 2600 + if (err) //TODO check for specific errors 2601 + goto err_out; 2602 + 2603 + out->eax = (u32) ebx_eax; 2604 + out->ebx = (u32) (ebx_eax >> 32); 2605 + 2606 + field_id++; 2607 + err = tdx_td_metadata_field_read(kvm_tdx, field_id, &edx_ecx); 2608 + /* 2609 + * It's weird that reading edx_ecx fails while reading ebx_eax 2610 + * succeeded. 2611 + */ 2612 + if (WARN_ON_ONCE(err)) 2613 + goto err_out; 2614 + 2615 + out->ecx = (u32) edx_ecx; 2616 + out->edx = (u32) (edx_ecx >> 32); 2617 + 2618 + out->function = leaf; 2619 + out->index = sub_leaf; 2620 + out->flags |= sub_leaf_set ?
KVM_CPUID_FLAG_SIGNIFCANT_INDEX : 0; 2621 + 2622 + /* 2623 + * Work around missing support on old TDX modules, fetch 2624 + * guest maxpa from gfn_direct_bits. 2625 + */ 2626 + if (leaf == 0x80000008) { 2627 + gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm)); 2628 + unsigned int g_maxpa = __ffs(gpa_bits) + 1; 2629 + 2630 + out->eax = tdx_set_guest_phys_addr_bits(out->eax, g_maxpa); 2631 + } 2632 + 2633 + (*entry_index)++; 2634 + 2635 + return 0; 2636 + 2637 + err_out: 2638 + out->eax = 0; 2639 + out->ebx = 0; 2640 + out->ecx = 0; 2641 + out->edx = 0; 2642 + 2643 + return -EIO; 2644 + } 2645 + 2646 + static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) 2647 + { 2648 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 2649 + struct kvm_tdx_init_vm *init_vm; 2650 + struct td_params *td_params = NULL; 2651 + int ret; 2652 + 2653 + BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid)); 2654 + BUILD_BUG_ON(sizeof(struct td_params) != 1024); 2655 + 2656 + if (kvm_tdx->state != TD_STATE_UNINITIALIZED) 2657 + return -EINVAL; 2658 + 2659 + if (cmd->flags) 2660 + return -EINVAL; 2661 + 2662 + init_vm = kmalloc(sizeof(*init_vm) + 2663 + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES, 2664 + GFP_KERNEL); 2665 + if (!init_vm) 2666 + return -ENOMEM; 2667 + 2668 + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) { 2669 + ret = -EFAULT; 2670 + goto out; 2671 + } 2672 + 2673 + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) { 2674 + ret = -E2BIG; 2675 + goto out; 2676 + } 2677 + 2678 + if (copy_from_user(init_vm->cpuid.entries, 2679 + u64_to_user_ptr(cmd->data) + sizeof(*init_vm), 2680 + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) { 2681 + ret = -EFAULT; 2682 + goto out; 2683 + } 2684 + 2685 + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) { 2686 + ret = -EINVAL; 2687 + goto out; 2688 + } 2689 + 2690 + if (init_vm->cpuid.padding) { 2691 + ret = -EINVAL; 2692 + 
goto out; 2693 + } 2694 + 2695 + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL); 2696 + if (!td_params) { 2697 + ret = -ENOMEM; 2698 + goto out; 2699 + } 2700 + 2701 + ret = setup_tdparams(kvm, td_params, init_vm); 2702 + if (ret) 2703 + goto out; 2704 + 2705 + ret = __tdx_td_init(kvm, td_params, &cmd->hw_error); 2706 + if (ret) 2707 + goto out; 2708 + 2709 + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET); 2710 + kvm_tdx->tsc_multiplier = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_MULTIPLIER); 2711 + kvm_tdx->attributes = td_params->attributes; 2712 + kvm_tdx->xfam = td_params->xfam; 2713 + 2714 + if (td_params->config_flags & TDX_CONFIG_FLAGS_MAX_GPAW) 2715 + kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_5; 2716 + else 2717 + kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_4; 2718 + 2719 + kvm_tdx->state = TD_STATE_INITIALIZED; 2720 + out: 2721 + /* kfree() accepts NULL. */ 2722 + kfree(init_vm); 2723 + kfree(td_params); 2724 + 2725 + return ret; 2726 + } 2727 + 2728 + void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) 2729 + { 2730 + /* 2731 + * flush_tlb_current() is invoked when the first time for the vcpu to 2732 + * run or when root of shared EPT is invalidated. 2733 + * KVM only needs to flush shared EPT because the TDX module handles TLB 2734 + * invalidation for private EPT in tdh_vp_enter(); 2735 + * 2736 + * A single context invalidation for shared EPT can be performed here. 2737 + * However, this single context invalidation requires the private EPTP 2738 + * rather than the shared EPTP to flush shared EPT, as shared EPT uses 2739 + * private EPTP as its ASID for TLB invalidation. 2740 + * 2741 + * To avoid reading back private EPTP, perform a global invalidation for 2742 + * shared EPT instead to keep this function simple. 
2743 + */ 2744 + ept_sync_global(); 2745 + } 2746 + 2747 + void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) 2748 + { 2749 + /* 2750 + * TDX has called tdx_track() in tdx_sept_remove_private_spte() to 2751 + * ensure that private EPT will be flushed on the next TD enter. No need 2752 + * to call tdx_track() here again even when this callback is a result of 2753 + * zapping private EPT. 2754 + * 2755 + * Due to the lack of the context to determine which EPT has been 2756 + * affected by zapping, invoke invept() directly here for both shared 2757 + * EPT and private EPT for simplicity, though it's not necessary for 2758 + * private EPT. 2759 + */ 2760 + ept_sync_global(); 2761 + } 2762 + 2763 + static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd) 2764 + { 2765 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 2766 + 2767 + guard(mutex)(&kvm->slots_lock); 2768 + 2769 + if (!is_hkid_assigned(kvm_tdx) || kvm_tdx->state == TD_STATE_RUNNABLE) 2770 + return -EINVAL; 2771 + /* 2772 + * Pages are pending for KVM_TDX_INIT_MEM_REGION to issue 2773 + * TDH.MEM.PAGE.ADD(). 2774 + */ 2775 + if (atomic64_read(&kvm_tdx->nr_premapped)) 2776 + return -EINVAL; 2777 + 2778 + cmd->hw_error = tdh_mr_finalize(&kvm_tdx->td); 2779 + if (tdx_operand_busy(cmd->hw_error)) 2780 + return -EBUSY; 2781 + if (KVM_BUG_ON(cmd->hw_error, kvm)) { 2782 + pr_tdx_error(TDH_MR_FINALIZE, cmd->hw_error); 2783 + return -EIO; 2784 + } 2785 + 2786 + kvm_tdx->state = TD_STATE_RUNNABLE; 2787 + /* TD_STATE_RUNNABLE must be set before 'pre_fault_allowed' */ 2788 + smp_wmb(); 2789 + kvm->arch.pre_fault_allowed = true; 2790 + return 0; 2791 + } 2792 + 2793 + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) 2794 + { 2795 + struct kvm_tdx_cmd tdx_cmd; 2796 + int r; 2797 + 2798 + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) 2799 + return -EFAULT; 2800 + 2801 + /* 2802 + * Userspace should never set hw_error. It is used to fill 2803 + * hardware-defined error by the kernel. 
2804 + */ 2805 + if (tdx_cmd.hw_error) 2806 + return -EINVAL; 2807 + 2808 + mutex_lock(&kvm->lock); 2809 + 2810 + switch (tdx_cmd.id) { 2811 + case KVM_TDX_CAPABILITIES: 2812 + r = tdx_get_capabilities(&tdx_cmd); 2813 + break; 2814 + case KVM_TDX_INIT_VM: 2815 + r = tdx_td_init(kvm, &tdx_cmd); 2816 + break; 2817 + case KVM_TDX_FINALIZE_VM: 2818 + r = tdx_td_finalize(kvm, &tdx_cmd); 2819 + break; 2820 + default: 2821 + r = -EINVAL; 2822 + goto out; 2823 + } 2824 + 2825 + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) 2826 + r = -EFAULT; 2827 + 2828 + out: 2829 + mutex_unlock(&kvm->lock); 2830 + return r; 2831 + } 2832 + 2833 + /* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */ 2834 + static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) 2835 + { 2836 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 2837 + struct vcpu_tdx *tdx = to_tdx(vcpu); 2838 + struct page *page; 2839 + int ret, i; 2840 + u64 err; 2841 + 2842 + page = alloc_page(GFP_KERNEL); 2843 + if (!page) 2844 + return -ENOMEM; 2845 + tdx->vp.tdvpr_page = page; 2846 + 2847 + tdx->vp.tdcx_pages = kcalloc(kvm_tdx->td.tdcx_nr_pages, sizeof(*tdx->vp.tdcx_pages), 2848 + GFP_KERNEL); 2849 + if (!tdx->vp.tdcx_pages) { 2850 + ret = -ENOMEM; 2851 + goto free_tdvpr; 2852 + } 2853 + 2854 + for (i = 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { 2855 + page = alloc_page(GFP_KERNEL); 2856 + if (!page) { 2857 + ret = -ENOMEM; 2858 + goto free_tdcx; 2859 + } 2860 + tdx->vp.tdcx_pages[i] = page; 2861 + } 2862 + 2863 + err = tdh_vp_create(&kvm_tdx->td, &tdx->vp); 2864 + if (KVM_BUG_ON(err, vcpu->kvm)) { 2865 + ret = -EIO; 2866 + pr_tdx_error(TDH_VP_CREATE, err); 2867 + goto free_tdcx; 2868 + } 2869 + 2870 + for (i = 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { 2871 + err = tdh_vp_addcx(&tdx->vp, tdx->vp.tdcx_pages[i]); 2872 + if (KVM_BUG_ON(err, vcpu->kvm)) { 2873 + pr_tdx_error(TDH_VP_ADDCX, err); 2874 + /* 2875 + * Pages already added are reclaimed by the vcpu_free 2876 + * 
method, but the rest are freed here. 2877 + */ 2878 + for (; i < kvm_tdx->td.tdcx_nr_pages; i++) { 2879 + __free_page(tdx->vp.tdcx_pages[i]); 2880 + tdx->vp.tdcx_pages[i] = NULL; 2881 + } 2882 + return -EIO; 2883 + } 2884 + } 2885 + 2886 + err = tdh_vp_init(&tdx->vp, vcpu_rcx, vcpu->vcpu_id); 2887 + if (KVM_BUG_ON(err, vcpu->kvm)) { 2888 + pr_tdx_error(TDH_VP_INIT, err); 2889 + return -EIO; 2890 + } 2891 + 2892 + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; 2893 + 2894 + return 0; 2895 + 2896 + free_tdcx: 2897 + for (i = 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { 2898 + if (tdx->vp.tdcx_pages[i]) 2899 + __free_page(tdx->vp.tdcx_pages[i]); 2900 + tdx->vp.tdcx_pages[i] = NULL; 2901 + } 2902 + kfree(tdx->vp.tdcx_pages); 2903 + tdx->vp.tdcx_pages = NULL; 2904 + 2905 + free_tdvpr: 2906 + if (tdx->vp.tdvpr_page) 2907 + __free_page(tdx->vp.tdvpr_page); 2908 + tdx->vp.tdvpr_page = 0; 2909 + 2910 + return ret; 2911 + } 2912 + 2913 + /* Sometimes reads multiple subleafs. Return how many entries were written. */ 2914 + static int tdx_vcpu_get_cpuid_leaf(struct kvm_vcpu *vcpu, u32 leaf, int *entry_index, 2915 + struct kvm_cpuid_entry2 *output_e) 2916 + { 2917 + int sub_leaf = 0; 2918 + int ret; 2919 + 2920 + /* First try without a subleaf */ 2921 + ret = tdx_read_cpuid(vcpu, leaf, 0, false, entry_index, output_e); 2922 + 2923 + /* If success, or invalid leaf, just give up */ 2924 + if (ret != -EIO) 2925 + return ret; 2926 + 2927 + /* 2928 + * If the try without a subleaf failed, try reading subleafs until 2929 + * failure. The TDX module only supports 6 bits of subleaf index. 2930 + */ 2931 + while (1) { 2932 + /* Keep reading subleafs until there is a failure. 
*/ 2933 + if (tdx_read_cpuid(vcpu, leaf, sub_leaf, true, entry_index, output_e)) 2934 + return !sub_leaf; 2935 + 2936 + sub_leaf++; 2937 + output_e++; 2938 + } 2939 + 2940 + return 0; 2941 + } 2942 + 2943 + static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) 2944 + { 2945 + struct kvm_cpuid2 __user *output, *td_cpuid; 2946 + int r = 0, i = 0, leaf; 2947 + u32 level; 2948 + 2949 + output = u64_to_user_ptr(cmd->data); 2950 + td_cpuid = kzalloc(sizeof(*td_cpuid) + 2951 + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES, 2952 + GFP_KERNEL); 2953 + if (!td_cpuid) 2954 + return -ENOMEM; 2955 + 2956 + if (copy_from_user(td_cpuid, output, sizeof(*output))) { 2957 + r = -EFAULT; 2958 + goto out; 2959 + } 2960 + 2961 + /* Read max CPUID for normal range */ 2962 + if (tdx_vcpu_get_cpuid_leaf(vcpu, 0, &i, &td_cpuid->entries[i])) { 2963 + r = -EIO; 2964 + goto out; 2965 + } 2966 + level = td_cpuid->entries[0].eax; 2967 + 2968 + for (leaf = 1; leaf <= level; leaf++) 2969 + tdx_vcpu_get_cpuid_leaf(vcpu, leaf, &i, &td_cpuid->entries[i]); 2970 + 2971 + /* Read max CPUID for extended range */ 2972 + if (tdx_vcpu_get_cpuid_leaf(vcpu, 0x80000000, &i, &td_cpuid->entries[i])) { 2973 + r = -EIO; 2974 + goto out; 2975 + } 2976 + level = td_cpuid->entries[i - 1].eax; 2977 + 2978 + for (leaf = 0x80000001; leaf <= level; leaf++) 2979 + tdx_vcpu_get_cpuid_leaf(vcpu, leaf, &i, &td_cpuid->entries[i]); 2980 + 2981 + if (td_cpuid->nent < i) 2982 + r = -E2BIG; 2983 + td_cpuid->nent = i; 2984 + 2985 + if (copy_to_user(output, td_cpuid, sizeof(*output))) { 2986 + r = -EFAULT; 2987 + goto out; 2988 + } 2989 + 2990 + if (r == -E2BIG) 2991 + goto out; 2992 + 2993 + if (copy_to_user(output->entries, td_cpuid->entries, 2994 + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2))) 2995 + r = -EFAULT; 2996 + 2997 + out: 2998 + kfree(td_cpuid); 2999 + 3000 + return r; 3001 + } 3002 + 3003 + static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) 3004 + { 3005 + u64 
apic_base; 3006 + struct vcpu_tdx *tdx = to_tdx(vcpu); 3007 + int ret; 3008 + 3009 + if (cmd->flags) 3010 + return -EINVAL; 3011 + 3012 + if (tdx->state != VCPU_TD_STATE_UNINITIALIZED) 3013 + return -EINVAL; 3014 + 3015 + /* 3016 + * TDX requires X2APIC, userspace is responsible for configuring guest 3017 + * CPUID accordingly. 3018 + */ 3019 + apic_base = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC | 3020 + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0); 3021 + if (kvm_apic_set_base(vcpu, apic_base, true)) 3022 + return -EINVAL; 3023 + 3024 + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data); 3025 + if (ret) 3026 + return ret; 3027 + 3028 + td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR); 3029 + td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->vt.pi_desc)); 3030 + td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR); 3031 + 3032 + tdx->state = VCPU_TD_STATE_INITIALIZED; 3033 + 3034 + return 0; 3035 + } 3036 + 3037 + void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) 3038 + { 3039 + /* 3040 + * Yell on INIT, as TDX doesn't support INIT, i.e. KVM should drop all 3041 + * INIT events. 3042 + * 3043 + * Defer initializing vCPU for RESET state until KVM_TDX_INIT_VCPU, as 3044 + * userspace needs to define the vCPU model before KVM can initialize 3045 + * vCPU state, e.g. to enable x2APIC. 
3046 + */ 3047 + WARN_ON_ONCE(init_event); 3048 + } 3049 + 3050 + struct tdx_gmem_post_populate_arg { 3051 + struct kvm_vcpu *vcpu; 3052 + __u32 flags; 3053 + }; 3054 + 3055 + static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, 3056 + void __user *src, int order, void *_arg) 3057 + { 3058 + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; 3059 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 3060 + struct tdx_gmem_post_populate_arg *arg = _arg; 3061 + struct kvm_vcpu *vcpu = arg->vcpu; 3062 + gpa_t gpa = gfn_to_gpa(gfn); 3063 + u8 level = PG_LEVEL_4K; 3064 + struct page *src_page; 3065 + int ret, i; 3066 + u64 err, entry, level_state; 3067 + 3068 + /* 3069 + * Get the source page if it has been faulted in. Return failure if the 3070 + * source page has been swapped out or unmapped in primary memory. 3071 + */ 3072 + ret = get_user_pages_fast((unsigned long)src, 1, 0, &src_page); 3073 + if (ret < 0) 3074 + return ret; 3075 + if (ret != 1) 3076 + return -ENOMEM; 3077 + 3078 + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); 3079 + if (ret < 0) 3080 + goto out; 3081 + 3082 + /* 3083 + * The private mem cannot be zapped after kvm_tdp_map_page() 3084 + * because all paths are covered by slots_lock and the 3085 + * filemap invalidate lock. Check that they are indeed enough. 3086 + */ 3087 + if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) { 3088 + scoped_guard(read_lock, &kvm->mmu_lock) { 3089 + if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) { 3090 + ret = -EIO; 3091 + goto out; 3092 + } 3093 + } 3094 + } 3095 + 3096 + ret = 0; 3097 + err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn), 3098 + src_page, &entry, &level_state); 3099 + if (err) { 3100 + ret = unlikely(tdx_operand_busy(err)) ? 
-EBUSY : -EIO; 3101 + goto out; 3102 + } 3103 + 3104 + if (!KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) 3105 + atomic64_dec(&kvm_tdx->nr_premapped); 3106 + 3107 + if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { 3108 + for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { 3109 + err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry, 3110 + &level_state); 3111 + if (err) { 3112 + ret = -EIO; 3113 + break; 3114 + } 3115 + } 3116 + } 3117 + 3118 + out: 3119 + put_page(src_page); 3120 + return ret; 3121 + } 3122 + 3123 + static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) 3124 + { 3125 + struct vcpu_tdx *tdx = to_tdx(vcpu); 3126 + struct kvm *kvm = vcpu->kvm; 3127 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 3128 + struct kvm_tdx_init_mem_region region; 3129 + struct tdx_gmem_post_populate_arg arg; 3130 + long gmem_ret; 3131 + int ret; 3132 + 3133 + if (tdx->state != VCPU_TD_STATE_INITIALIZED) 3134 + return -EINVAL; 3135 + 3136 + guard(mutex)(&kvm->slots_lock); 3137 + 3138 + /* Once TD is finalized, the initial guest memory is fixed. 
*/ 3139 + if (kvm_tdx->state == TD_STATE_RUNNABLE) 3140 + return -EINVAL; 3141 + 3142 + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) 3143 + return -EINVAL; 3144 + 3145 + if (copy_from_user(&region, u64_to_user_ptr(cmd->data), sizeof(region))) 3146 + return -EFAULT; 3147 + 3148 + if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) || 3149 + !region.nr_pages || 3150 + region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa || 3151 + !vt_is_tdx_private_gpa(kvm, region.gpa) || 3152 + !vt_is_tdx_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1)) 3153 + return -EINVAL; 3154 + 3155 + kvm_mmu_reload(vcpu); 3156 + ret = 0; 3157 + while (region.nr_pages) { 3158 + if (signal_pending(current)) { 3159 + ret = -EINTR; 3160 + break; 3161 + } 3162 + 3163 + arg = (struct tdx_gmem_post_populate_arg) { 3164 + .vcpu = vcpu, 3165 + .flags = cmd->flags, 3166 + }; 3167 + gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa), 3168 + u64_to_user_ptr(region.source_addr), 3169 + 1, tdx_gmem_post_populate, &arg); 3170 + if (gmem_ret < 0) { 3171 + ret = gmem_ret; 3172 + break; 3173 + } 3174 + 3175 + if (gmem_ret != 1) { 3176 + ret = -EIO; 3177 + break; 3178 + } 3179 + 3180 + region.source_addr += PAGE_SIZE; 3181 + region.gpa += PAGE_SIZE; 3182 + region.nr_pages--; 3183 + 3184 + cond_resched(); 3185 + } 3186 + 3187 + if (copy_to_user(u64_to_user_ptr(cmd->data), &region, sizeof(region))) 3188 + ret = -EFAULT; 3189 + return ret; 3190 + } 3191 + 3192 + int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) 3193 + { 3194 + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); 3195 + struct kvm_tdx_cmd cmd; 3196 + int ret; 3197 + 3198 + if (!is_hkid_assigned(kvm_tdx) || kvm_tdx->state == TD_STATE_RUNNABLE) 3199 + return -EINVAL; 3200 + 3201 + if (copy_from_user(&cmd, argp, sizeof(cmd))) 3202 + return -EFAULT; 3203 + 3204 + if (cmd.hw_error) 3205 + return -EINVAL; 3206 + 3207 + switch (cmd.id) { 3208 + case KVM_TDX_INIT_VCPU: 3209 + ret = tdx_vcpu_init(vcpu, 
&cmd); 3210 + break; 3211 + case KVM_TDX_INIT_MEM_REGION: 3212 + ret = tdx_vcpu_init_mem_region(vcpu, &cmd); 3213 + break; 3214 + case KVM_TDX_GET_CPUID: 3215 + ret = tdx_vcpu_get_cpuid(vcpu, &cmd); 3216 + break; 3217 + default: 3218 + ret = -EINVAL; 3219 + break; 3220 + } 3221 + 3222 + return ret; 3223 + } 3224 + 3225 + int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) 3226 + { 3227 + return PG_LEVEL_4K; 3228 + } 3229 + 3230 + static int tdx_online_cpu(unsigned int cpu) 3231 + { 3232 + unsigned long flags; 3233 + int r; 3234 + 3235 + /* Sanity check CPU is already in post-VMXON */ 3236 + WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE)); 3237 + 3238 + local_irq_save(flags); 3239 + r = tdx_cpu_enable(); 3240 + local_irq_restore(flags); 3241 + 3242 + return r; 3243 + } 3244 + 3245 + static int tdx_offline_cpu(unsigned int cpu) 3246 + { 3247 + int i; 3248 + 3249 + /* No TD is running. Allow any cpu to be offline. */ 3250 + if (!atomic_read(&nr_configured_hkid)) 3251 + return 0; 3252 + 3253 + /* 3254 + * In order to reclaim TDX HKID, (i.e. when deleting guest TD), need to 3255 + * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory 3256 + * controller with pconfig. If we have active TDX HKID, refuse to 3257 + * offline the last online cpu. 3258 + */ 3259 + for_each_online_cpu(i) { 3260 + /* 3261 + * Found another online cpu on the same package. 3262 + * Allow to offline. 3263 + */ 3264 + if (i != cpu && topology_physical_package_id(i) == 3265 + topology_physical_package_id(cpu)) 3266 + return 0; 3267 + } 3268 + 3269 + /* 3270 + * This is the last cpu of this package. Don't offline it. 3271 + * 3272 + * Because it's hard for human operator to understand the 3273 + * reason, warn it. 3274 + */ 3275 + #define MSG_ALLPKG_ONLINE \ 3276 + "TDX requires all packages to have an online CPU. 
Delete all TDs in order to offline all CPUs of a package.\n" 3277 + pr_warn_ratelimited(MSG_ALLPKG_ONLINE); 3278 + return -EBUSY; 3279 + } 3280 + 3281 + static void __do_tdx_cleanup(void) 3282 + { 3283 + /* 3284 + * Once TDX module is initialized, it cannot be disabled and 3285 + * re-initialized again w/o runtime update (which isn't 3286 + * supported by kernel). Only need to remove the cpuhp here. 3287 + * The TDX host core code tracks TDX status and can handle 3288 + * 'multiple enabling' scenario. 3289 + */ 3290 + WARN_ON_ONCE(!tdx_cpuhp_state); 3291 + cpuhp_remove_state_nocalls_cpuslocked(tdx_cpuhp_state); 3292 + tdx_cpuhp_state = 0; 3293 + } 3294 + 3295 + static void __tdx_cleanup(void) 3296 + { 3297 + cpus_read_lock(); 3298 + __do_tdx_cleanup(); 3299 + cpus_read_unlock(); 3300 + } 3301 + 3302 + static int __init __do_tdx_bringup(void) 3303 + { 3304 + int r; 3305 + 3306 + /* 3307 + * TDX-specific cpuhp callback to call tdx_cpu_enable() on all 3308 + * online CPUs before calling tdx_enable(), and on any new 3309 + * going-online CPU to make sure it is ready for TDX guest. 3310 + */ 3311 + r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, 3312 + "kvm/cpu/tdx:online", 3313 + tdx_online_cpu, tdx_offline_cpu); 3314 + if (r < 0) 3315 + return r; 3316 + 3317 + tdx_cpuhp_state = r; 3318 + 3319 + r = tdx_enable(); 3320 + if (r) 3321 + __do_tdx_cleanup(); 3322 + 3323 + return r; 3324 + } 3325 + 3326 + static int __init __tdx_bringup(void) 3327 + { 3328 + const struct tdx_sys_info_td_conf *td_conf; 3329 + int r, i; 3330 + 3331 + for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) { 3332 + /* 3333 + * Check if MSRs (tdx_uret_msrs) can be saved/restored 3334 + * before returning to user space. 3335 + * 3336 + * this_cpu_ptr(user_return_msrs)->registered isn't checked 3337 + * because the registration is done at vcpu runtime by 3338 + * tdx_user_return_msr_update_cache(). 
3339 + */ 3340 + tdx_uret_msrs[i].slot = kvm_find_user_return_msr(tdx_uret_msrs[i].msr); 3341 + if (tdx_uret_msrs[i].slot == -1) { 3342 + /* If any MSR isn't supported, it is a KVM bug */ 3343 + pr_err("MSR %x isn't included by kvm_find_user_return_msr\n", 3344 + tdx_uret_msrs[i].msr); 3345 + return -EIO; 3346 + } 3347 + } 3348 + 3349 + /* 3350 + * Enabling TDX requires enabling hardware virtualization first, 3351 + * as making SEAMCALLs requires CPU being in post-VMXON state. 3352 + */ 3353 + r = kvm_enable_virtualization(); 3354 + if (r) 3355 + return r; 3356 + 3357 + cpus_read_lock(); 3358 + r = __do_tdx_bringup(); 3359 + cpus_read_unlock(); 3360 + 3361 + if (r) 3362 + goto tdx_bringup_err; 3363 + 3364 + /* Get TDX global information for later use */ 3365 + tdx_sysinfo = tdx_get_sysinfo(); 3366 + if (WARN_ON_ONCE(!tdx_sysinfo)) { 3367 + r = -EINVAL; 3368 + goto get_sysinfo_err; 3369 + } 3370 + 3371 + /* Check TDX module and KVM capabilities */ 3372 + if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) || 3373 + !tdx_get_supported_xfam(&tdx_sysinfo->td_conf)) 3374 + goto get_sysinfo_err; 3375 + 3376 + if (!(tdx_sysinfo->features.tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM)) 3377 + goto get_sysinfo_err; 3378 + 3379 + /* 3380 + * TDX has its own limit of maximum vCPUs it can support for all 3381 + * TDX guests in addition to KVM_MAX_VCPUS. Userspace needs to 3382 + * query TDX guest's maximum vCPUs by checking KVM_CAP_MAX_VCPU 3383 + * extension on per-VM basis. 3384 + * 3385 + * TDX module reports such limit via the MAX_VCPU_PER_TD global 3386 + * metadata. Different modules may report different values. 3387 + * Some old module may also not support this metadata (in which 3388 + * case this limit is U16_MAX). 3389 + * 3390 + * In practice, the reported value reflects the maximum logical 3391 + * CPUs that ALL the platforms that the module supports can 3392 + * possibly have. 
3393 + * 3394 + * Simply forwarding the MAX_VCPU_PER_TD to userspace could 3395 + * result in an unpredictable ABI. KVM instead always advertise 3396 + * the number of logical CPUs the platform has as the maximum 3397 + * vCPUs for TDX guests. 3398 + * 3399 + * Make sure MAX_VCPU_PER_TD reported by TDX module is not 3400 + * smaller than the number of logical CPUs, otherwise KVM will 3401 + * report an unsupported value to userspace. 3402 + * 3403 + * Note, a platform with TDX enabled in the BIOS cannot support 3404 + * physical CPU hotplug, and TDX requires the BIOS has marked 3405 + * all logical CPUs in MADT table as enabled. Just use 3406 + * num_present_cpus() for the number of logical CPUs. 3407 + */ 3408 + td_conf = &tdx_sysinfo->td_conf; 3409 + if (td_conf->max_vcpus_per_td < num_present_cpus()) { 3410 + pr_err("Disable TDX: MAX_VCPU_PER_TD (%u) smaller than number of logical CPUs (%u).\n", 3411 + td_conf->max_vcpus_per_td, num_present_cpus()); 3412 + r = -EINVAL; 3413 + goto get_sysinfo_err; 3414 + } 3415 + 3416 + if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids())) { 3417 + r = -EINVAL; 3418 + goto get_sysinfo_err; 3419 + } 3420 + 3421 + /* 3422 + * Leave hardware virtualization enabled after TDX is enabled 3423 + * successfully. TDX CPU hotplug depends on this. 3424 + */ 3425 + return 0; 3426 + 3427 + get_sysinfo_err: 3428 + __tdx_cleanup(); 3429 + tdx_bringup_err: 3430 + kvm_disable_virtualization(); 3431 + return r; 3432 + } 3433 + 3434 + void tdx_cleanup(void) 3435 + { 3436 + if (enable_tdx) { 3437 + misc_cg_set_capacity(MISC_CG_RES_TDX, 0); 3438 + __tdx_cleanup(); 3439 + kvm_disable_virtualization(); 3440 + } 3441 + } 3442 + 3443 + int __init tdx_bringup(void) 3444 + { 3445 + int r, i; 3446 + 3447 + /* tdx_disable_virtualization_cpu() uses associated_tdvcpus. 
*/ 3448 + for_each_possible_cpu(i) 3449 + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, i)); 3450 + 3451 + if (!enable_tdx) 3452 + return 0; 3453 + 3454 + if (!enable_ept) { 3455 + pr_err("EPT is required for TDX\n"); 3456 + goto success_disable_tdx; 3457 + } 3458 + 3459 + if (!tdp_mmu_enabled || !enable_mmio_caching || !enable_ept_ad_bits) { 3460 + pr_err("TDP MMU and MMIO caching and EPT A/D bit is required for TDX\n"); 3461 + goto success_disable_tdx; 3462 + } 3463 + 3464 + if (!enable_apicv) { 3465 + pr_err("APICv is required for TDX\n"); 3466 + goto success_disable_tdx; 3467 + } 3468 + 3469 + if (!cpu_feature_enabled(X86_FEATURE_OSXSAVE)) { 3470 + pr_err("tdx: OSXSAVE is required for TDX\n"); 3471 + goto success_disable_tdx; 3472 + } 3473 + 3474 + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { 3475 + pr_err("tdx: MOVDIR64B is required for TDX\n"); 3476 + goto success_disable_tdx; 3477 + } 3478 + 3479 + if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) { 3480 + pr_err("Self-snoop is required for TDX\n"); 3481 + goto success_disable_tdx; 3482 + } 3483 + 3484 + if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) { 3485 + pr_err("tdx: no TDX private KeyIDs available\n"); 3486 + goto success_disable_tdx; 3487 + } 3488 + 3489 + if (!enable_virt_at_load) { 3490 + pr_err("tdx: tdx requires kvm.enable_virt_at_load=1\n"); 3491 + goto success_disable_tdx; 3492 + } 3493 + 3494 + /* 3495 + * Ideally KVM should probe whether TDX module has been loaded 3496 + * first and then try to bring it up. But TDX needs to use SEAMCALL 3497 + * to probe whether the module is loaded (there is no CPUID or MSR 3498 + * for that), and making SEAMCALL requires enabling virtualization 3499 + * first, just like the rest steps of bringing up TDX module. 3500 + * 3501 + * So, for simplicity do everything in __tdx_bringup(); the first 3502 + * SEAMCALL will return -ENODEV when the module is not loaded. 
The 3503 + * only complication is having to make sure that initialization 3504 + * SEAMCALLs don't return TDX_SEAMCALL_VMFAILINVALID in other 3505 + * cases. 3506 + */ 3507 + r = __tdx_bringup(); 3508 + if (r) { 3509 + /* 3510 + * Disable TDX only but don't fail to load module if 3511 + * the TDX module could not be loaded. No need to print 3512 + * message saying "module is not loaded" because it was 3513 + * printed when the first SEAMCALL failed. 3514 + */ 3515 + if (r == -ENODEV) 3516 + goto success_disable_tdx; 3517 + 3518 + enable_tdx = 0; 3519 + } 3520 + 3521 + return r; 3522 + 3523 + success_disable_tdx: 3524 + enable_tdx = 0; 3525 + return 0; 3526 + }
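
Note on the range check in tdx_vcpu_init_mem_region() above: the test `gpa + (nr_pages << PAGE_SHIFT) <= gpa` rejects a region whose end wraps around the 64-bit address space. The following is a minimal userspace sketch of that idea only, assuming 4 KiB pages; `struct mem_region`, `page_aligned()` and `region_is_sane()` are hypothetical stand-ins, not kernel symbols, and the private-GPA checks from the patch are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical stand-in for struct kvm_tdx_init_mem_region. */
struct mem_region {
	uint64_t source_addr;
	uint64_t gpa;
	uint64_t nr_pages;
};

static bool page_aligned(uint64_t addr)
{
	return (addr & (PAGE_SIZE - 1)) == 0;
}

/*
 * Mirrors the alignment, non-empty and wraparound tests from the patch.
 * Unsigned arithmetic wraps modulo 2^64, so a region that runs past the
 * top of the address space yields end <= gpa and is rejected.
 */
static bool region_is_sane(const struct mem_region *r)
{
	uint64_t end = r->gpa + (r->nr_pages << PAGE_SHIFT);

	if (!page_aligned(r->source_addr) || !page_aligned(r->gpa))
		return false;
	if (!r->nr_pages)
		return false;
	return end > r->gpa;
}
```

A region of one page starting at 0xFFFFFFFFFFFFF000 ends at 0 after wraparound, so the sketch rejects it, while four pages at GPA 0x2000 pass.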

+204

arch/x86/kvm/vmx/tdx.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_TDX_H
+#define __KVM_X86_VMX_TDX_H
+
+#include "tdx_arch.h"
+#include "tdx_errno.h"
+
+#ifdef CONFIG_KVM_INTEL_TDX
+#include "common.h"
+
+int tdx_bringup(void);
+void tdx_cleanup(void);
+
+extern bool enable_tdx;
+
+/* TDX module hardware states. These follow the TDX module OP_STATEs. */
+enum kvm_tdx_state {
+	TD_STATE_UNINITIALIZED = 0,
+	TD_STATE_INITIALIZED,
+	TD_STATE_RUNNABLE,
+};
+
+struct kvm_tdx {
+	struct kvm kvm;
+
+	struct misc_cg *misc_cg;
+	int hkid;
+	enum kvm_tdx_state state;
+
+	u64 attributes;
+	u64 xfam;
+
+	u64 tsc_offset;
+	u64 tsc_multiplier;
+
+	struct tdx_td td;
+
+	/* For KVM_TDX_INIT_MEM_REGION. */
+	atomic64_t nr_premapped;
+
+	/*
+	 * Prevent vCPUs from TD entry to ensure SEPT zap related SEAMCALLs do
+	 * not contend with tdh_vp_enter() and TDCALLs.
+	 * Set/unset is protected with kvm->mmu_lock.
+	 */
+	bool wait_for_sept_zap;
+};
+
+/* TDX module vCPU states */
+enum vcpu_tdx_state {
+	VCPU_TD_STATE_UNINITIALIZED = 0,
+	VCPU_TD_STATE_INITIALIZED,
+};
+
+struct vcpu_tdx {
+	struct kvm_vcpu vcpu;
+	struct vcpu_vt vt;
+	u64 ext_exit_qualification;
+	gpa_t exit_gpa;
+	struct tdx_module_args vp_enter_args;
+
+	struct tdx_vp vp;
+
+	struct list_head cpu_list;
+
+	u64 vp_enter_ret;
+
+	enum vcpu_tdx_state state;
+	bool guest_entered;
+
+	u64 map_gpa_next;
+	u64 map_gpa_end;
+};
+
+void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
+void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field,
+		      u64 val, u64 err);
+
+static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
+{
+	u64 err, data;
+
+	err = tdh_mng_rd(&kvm_tdx->td, TDCS_EXEC(field), &data);
+	if (unlikely(err)) {
+		pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err);
+		return 0;
+	}
+	return data;
+}
+
+static __always_inline void tdvps_vmcs_check(u32 field, u8 bits)
+{
+#define VMCS_ENC_ACCESS_TYPE_MASK	0x1UL
+#define VMCS_ENC_ACCESS_TYPE_FULL	0x0UL
+#define VMCS_ENC_ACCESS_TYPE_HIGH	0x1UL
+#define VMCS_ENC_ACCESS_TYPE(field)	((field) & VMCS_ENC_ACCESS_TYPE_MASK)
+
+	/* TDX is 64bit only. HIGH field isn't supported. */
+	BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
+			 VMCS_ENC_ACCESS_TYPE(field) == VMCS_ENC_ACCESS_TYPE_HIGH,
+			 "Read/Write to TD VMCS *_HIGH fields not supported");
+
+	BUILD_BUG_ON(bits != 16 && bits != 32 && bits != 64);
+
+#define VMCS_ENC_WIDTH_MASK	GENMASK(14, 13)
+#define VMCS_ENC_WIDTH_16BIT	(0UL << 13)
+#define VMCS_ENC_WIDTH_64BIT	(1UL << 13)
+#define VMCS_ENC_WIDTH_32BIT	(2UL << 13)
+#define VMCS_ENC_WIDTH_NATURAL	(3UL << 13)
+#define VMCS_ENC_WIDTH(field)	((field) & VMCS_ENC_WIDTH_MASK)
+
+	/* TDX is 64bit only. i.e. natural width = 64bit. */
+	BUILD_BUG_ON_MSG(bits != 64 && __builtin_constant_p(field) &&
+			 (VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_64BIT ||
+			  VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_NATURAL),
+			 "Invalid TD VMCS access for 64-bit field");
+	BUILD_BUG_ON_MSG(bits != 32 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_32BIT,
+			 "Invalid TD VMCS access for 32-bit field");
+	BUILD_BUG_ON_MSG(bits != 16 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_16BIT,
+			 "Invalid TD VMCS access for 16-bit field");
+}
+
+static __always_inline void tdvps_management_check(u64 field, u8 bits) {}
+static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits) {}
+
+#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \
+static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \
+							 u32 field) \
+{ \
+	u64 err, data; \
+	\
+	tdvps_##lclass##_check(field, bits); \
+	err = tdh_vp_rd(&tdx->vp, TDVPS_##uclass(field), &data); \
+	if (unlikely(err)) { \
+		tdh_vp_rd_failed(tdx, #uclass, field, err); \
+		return 0; \
+	} \
+	return (u##bits)data; \
+} \
+static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx, \
+						      u32 field, u##bits val) \
+{ \
+	u64 err; \
+	\
+	tdvps_##lclass##_check(field, bits); \
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), val, \
+			GENMASK_ULL(bits - 1, 0)); \
+	if (unlikely(err)) \
+		tdh_vp_wr_failed(tdx, #uclass, " = ", field, (u64)val, err); \
+} \
+static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx, \
+						       u32 field, u64 bit) \
+{ \
+	u64 err; \
+	\
+	tdvps_##lclass##_check(field, bits); \
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), bit, bit); \
+	if (unlikely(err)) \
+		tdh_vp_wr_failed(tdx, #uclass, " |= ", field, bit, err); \
+} \
+static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \
+							 u32 field, u64 bit) \
+{ \
+	u64 err; \
+	\
+	tdvps_##lclass##_check(field, bits); \
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), 0, bit); \
+	if (unlikely(err)) \
+		tdh_vp_wr_failed(tdx, #uclass, " &= ~", field, bit, err); \
+}
+
+
+bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu);
+int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err);
+
+TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
+
+TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
+TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch);
+
+#else
+static inline int tdx_bringup(void) { return 0; }
+static inline void tdx_cleanup(void) {}
+
+#define enable_tdx	0
+
+struct kvm_tdx {
+	struct kvm kvm;
+};
+
+struct vcpu_tdx {
+	struct kvm_vcpu vcpu;
+};
+
+static inline bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) { return false; }
+static inline int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) { return 0; }
+
+#endif
+
+#endif
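
Note on tdvps_vmcs_check() above: the compile-time checks key off two parts of the VMCS field encoding, bit 0 (full vs. high access) and bits 14:13 (field width). The sketch below restates those rules as a runtime predicate, purely for illustration. `vmcs_field_width()` and `td_vmcs_access_ok()` are hypothetical names, and the example encodings in the usage note (0x0002 for POSTED_INTR_NV, 0x2016 for POSTED_INTR_DESC_ADDR, 0x4000 for PIN_BASED_VM_EXEC_CONTROL) are quoted from memory of the Intel SDM and should be treated as an assumption.

```c
#include <stdint.h>

/* Width codes held in bits 14:13 of a VMCS field encoding. */
enum vmcs_width {
	VMCS_W16 = 0,
	VMCS_W64 = 1,
	VMCS_W32 = 2,
	VMCS_WNATURAL = 3,
};

static enum vmcs_width vmcs_field_width(uint32_t field)
{
	return (enum vmcs_width)((field >> 13) & 0x3);
}

/*
 * Returns 1 when a `bits`-wide accessor may touch `field` under the rules
 * enforced by the BUILD_BUG_ON_MSG() checks: no *_HIGH encodings, and the
 * accessor width must match the field width, with natural-width fields
 * treated as 64-bit.
 */
static int td_vmcs_access_ok(uint32_t field, unsigned int bits)
{
	if (field & 0x1)	/* HIGH access type is not supported */
		return 0;

	switch (vmcs_field_width(field)) {
	case VMCS_W16:
		return bits == 16;
	case VMCS_W32:
		return bits == 32;
	case VMCS_W64:
	case VMCS_WNATURAL:
		return bits == 64;
	}
	return 0;
}
```

Under these assumptions the three accessors used by tdx_vcpu_init() line up with their fields: a 16-bit write to 0x0002, a 64-bit write to 0x2016 and a 32-bit setbit on 0x4000 all pass, while a 64-bit access to 0x4000 would be rejected.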

+167

arch/x86/kvm/vmx/tdx_arch.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural constants/data definitions for TDX SEAMCALLs */
+
+#ifndef __KVM_X86_TDX_ARCH_H
+#define __KVM_X86_TDX_ARCH_H
+
+#include <linux/types.h>
+
+/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
+#define TDX_NON_ARCH			BIT_ULL(63)
+#define TDX_CLASS_SHIFT			56
+#define TDX_FIELD_MASK			GENMASK_ULL(31, 0)
+
+#define __BUILD_TDX_FIELD(non_arch, class, field) \
+	(((non_arch) ? TDX_NON_ARCH : 0) | \
+	 ((u64)(class) << TDX_CLASS_SHIFT) | \
+	 ((u64)(field) & TDX_FIELD_MASK))
+
+#define BUILD_TDX_FIELD(class, field) \
+	__BUILD_TDX_FIELD(false, (class), (field))
+
+#define BUILD_TDX_FIELD_NON_ARCH(class, field) \
+	__BUILD_TDX_FIELD(true, (class), (field))
+
+
+/* Class code for TD */
+#define TD_CLASS_EXECUTION_CONTROLS	17ULL
+
+/* Class code for TDVPS */
+#define TDVPS_CLASS_VMCS		0ULL
+#define TDVPS_CLASS_GUEST_GPR		16ULL
+#define TDVPS_CLASS_OTHER_GUEST		17ULL
+#define TDVPS_CLASS_MANAGEMENT		32ULL
+
+enum tdx_tdcs_execution_control {
+	TD_TDCS_EXEC_TSC_OFFSET = 10,
+	TD_TDCS_EXEC_TSC_MULTIPLIER = 11,
+};
+
+enum tdx_vcpu_guest_other_state {
+	TD_VCPU_STATE_DETAILS_NON_ARCH = 0x100,
+};
+
+#define TDX_VCPU_STATE_DETAILS_INTR_PENDING	BIT_ULL(0)
+
+static inline bool tdx_vcpu_state_details_intr_pending(u64 vcpu_state_details)
+{
+	return !!(vcpu_state_details & TDX_VCPU_STATE_DETAILS_INTR_PENDING);
+}
+
+/* @field is any of enum tdx_tdcs_execution_control */
+#define TDCS_EXEC(field)		BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field))
+
+/* @field is the VMCS field encoding */
+#define TDVPS_VMCS(field)		BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field))
+
+/* @field is any of enum tdx_guest_other_state */
+#define TDVPS_STATE(field)		BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (field))
+#define TDVPS_STATE_NON_ARCH(field)	BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_OTHER_GUEST, (field))
+
+/* Management class fields */
+enum tdx_vcpu_guest_management {
+	TD_VCPU_PEND_NMI = 11,
+};
+
+/* @field is any of enum tdx_vcpu_guest_management */
+#define TDVPS_MANAGEMENT(field)		BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (field))
+
+#define TDX_EXTENDMR_CHUNKSIZE		256
+
+struct tdx_cpuid_value {
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+} __packed;
+
+#define TDX_TD_ATTR_DEBUG		BIT_ULL(0)
+#define TDX_TD_ATTR_SEPT_VE_DISABLE	BIT_ULL(28)
+#define TDX_TD_ATTR_PKS			BIT_ULL(30)
+#define TDX_TD_ATTR_KL			BIT_ULL(31)
+#define TDX_TD_ATTR_PERFMON		BIT_ULL(63)
+
+#define TDX_EXT_EXIT_QUAL_TYPE_MASK	GENMASK(3, 0)
+#define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION  6
+/*
+ * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
+ */
+struct td_params {
+	u64 attributes;
+	u64 xfam;
+	u16 max_vcpus;
+	u8 reserved0[6];
+
+	u64 eptp_controls;
+	u64 config_flags;
+	u16 tsc_frequency;
+	u8 reserved1[38];
+
+	u64 mrconfigid[6];
+	u64 mrowner[6];
+	u64 mrownerconfig[6];
+	u64 reserved2[4];
+
+	union {
+		DECLARE_FLEX_ARRAY(struct tdx_cpuid_value, cpuid_values);
+		u8 reserved3[768];
+	};
+} __packed __aligned(1024);
+
+/*
+ * Guest uses MAX_PA for GPAW when set.
+ * 0: GPA.SHARED bit is GPA[47]
+ * 1: GPA.SHARED bit is GPA[51]
+ */
+#define TDX_CONFIG_FLAGS_MAX_GPAW	BIT_ULL(0)
+
+/*
+ * TDH.VP.ENTER, TDG.VP.VMCALL preserves RBP
+ * 0: RBP can be used for TDG.VP.VMCALL input. RBP is clobbered.
+ * 1: RBP can't be used for TDG.VP.VMCALL input. RBP is preserved.
+ */
+#define TDX_CONFIG_FLAGS_NO_RBP_MOD	BIT_ULL(2)
+
+
+/*
+ * TDX requires the frequency to be defined in units of 25MHz, which is the
+ * frequency of the core crystal clock on TDX-capable platforms, i.e. the TDX
+ * module can only program frequencies that are multiples of 25MHz. The
+ * frequency must be between 100mhz and 10ghz (inclusive).
+ */
+#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz)	((tsc_in_khz) / (25 * 1000))
+#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz)	((tsc_in_25mhz) * (25 * 1000))
+#define TDX_MIN_TSC_FREQUENCY_KHZ		(100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ		(10 * 1000 * 1000)
+
+/* Additional Secure EPT entry information */
+#define TDX_SEPT_LEVEL_MASK		GENMASK_ULL(2, 0)
+#define TDX_SEPT_STATE_MASK		GENMASK_ULL(15, 8)
+#define TDX_SEPT_STATE_SHIFT		8
+
+enum tdx_sept_entry_state {
+	TDX_SEPT_FREE = 0,
+	TDX_SEPT_BLOCKED = 1,
+	TDX_SEPT_PENDING = 2,
+	TDX_SEPT_PENDING_BLOCKED = 3,
+	TDX_SEPT_PRESENT = 4,
+};
+
+static inline u8 tdx_get_sept_level(u64 sept_entry_info)
+{
+	return sept_entry_info & TDX_SEPT_LEVEL_MASK;
+}
+
+static inline u8 tdx_get_sept_state(u64 sept_entry_info)
+{
+	return (sept_entry_info & TDX_SEPT_STATE_MASK) >> TDX_SEPT_STATE_SHIFT;
+}
+
+#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
+
+/*
+ * TD scope metadata field ID.
+ */
+#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
+
+#endif /* __KVM_X86_TDX_ARCH_H */
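
Note on the field access codes in tdx_arch.h: __BUILD_TDX_FIELD() packs a non-architectural flag into bit 63, the class code into bits 63:56 (shifted by TDX_CLASS_SHIFT) and the field number into bits 31:0. The sketch below reproduces that packing as a plain function so the resulting 64-bit values can be inspected in userspace. `build_tdx_field()` is a hypothetical helper name, and the constants are copied from the header.

```c
#include <stdint.h>

#define TDX_NON_ARCH     (1ULL << 63)
#define TDX_CLASS_SHIFT  56
#define TDX_FIELD_MASK   0xFFFFFFFFULL	/* GENMASK_ULL(31, 0) */

/* Function form of __BUILD_TDX_FIELD(), for illustration only. */
static uint64_t build_tdx_field(int non_arch, uint64_t class_code,
				uint64_t field)
{
	return (non_arch ? TDX_NON_ARCH : 0) |
	       (class_code << TDX_CLASS_SHIFT) |
	       (field & TDX_FIELD_MASK);
}
```

With the class codes from the header, TDCS_EXEC(TD_TDCS_EXEC_TSC_OFFSET) corresponds to build_tdx_field(0, 17, 10) = 0x110000000000000A, TDVPS_MANAGEMENT(TD_VCPU_PEND_NMI) to build_tdx_field(0, 32, 11) = 0x200000000000000B, and TDVPS_STATE_NON_ARCH(TD_VCPU_STATE_DETAILS_NON_ARCH) to build_tdx_field(1, 17, 0x100) = 0x9100000000000100.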

+40

arch/x86/kvm/vmx/tdx_errno.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural status code for SEAMCALL */
+
+#ifndef __KVM_X86_TDX_ERRNO_H
+#define __KVM_X86_TDX_ERRNO_H
+
+#define TDX_SEAMCALL_STATUS_MASK		0xFFFFFFFF00000000ULL
+
+/*
+ * TDX SEAMCALL Status Codes (returned in RAX)
+ */
+#define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
+#define TDX_NON_RECOVERABLE_TD			0x4000000200000000ULL
+#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE	0x6000000500000000ULL
+#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE	0x6000000700000000ULL
+#define TDX_INTERRUPTED_RESUMABLE		0x8000000300000000ULL
+#define TDX_OPERAND_INVALID			0xC000010000000000ULL
+#define TDX_OPERAND_BUSY			0x8000020000000000ULL
+#define TDX_PREVIOUS_TLB_EPOCH_BUSY		0x8000020100000000ULL
+#define TDX_PAGE_METADATA_INCORRECT		0xC000030000000000ULL
+#define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
+#define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
+#define TDX_KEY_STATE_INCORRECT			0xC000081100000000ULL
+#define TDX_KEY_CONFIGURED			0x0000081500000000ULL
+#define TDX_NO_HKID_READY_TO_WBCACHE		0x0000082100000000ULL
+#define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
+#define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
+#define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
+#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
+
+/*
+ * TDX module operand ID, appears in 31:0 part of error code as
+ * detail information
+ */
+#define TDX_OPERAND_ID_RCX			0x01
+#define TDX_OPERAND_ID_TDR			0x80
+#define TDX_OPERAND_ID_SEPT			0x92
+#define TDX_OPERAND_ID_TD_EPOCH			0xa9
+
+#endif /* __KVM_X86_TDX_ERRNO_H */
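
Note on the status layout in tdx_errno.h: the status class lives in the upper 32 bits of the SEAMCALL return value (selected by TDX_SEAMCALL_STATUS_MASK), and the lower 32 bits carry detail such as the operand ID. The sketch below shows one way to split a raw return value along those lines. `tdx_status_is()` and `tdx_status_detail()` are hypothetical helper names, not functions from this patch, and only a subset of the constants is repeated.

```c
#include <stdbool.h>
#include <stdint.h>

#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL
#define TDX_OPERAND_INVALID      0xC000010000000000ULL
#define TDX_OPERAND_BUSY         0x8000020000000000ULL
#define TDX_OPERAND_ID_SEPT      0x92

/* True when the upper 32 bits of err match the given status code. */
static bool tdx_status_is(uint64_t err, uint64_t status)
{
	return (err & TDX_SEAMCALL_STATUS_MASK) == status;
}

/* Lower 32 bits of err: detail information such as the operand ID. */
static uint32_t tdx_status_detail(uint64_t err)
{
	return (uint32_t)(err & ~TDX_SEAMCALL_STATUS_MASK);
}
```

For example, a return value of TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT is classified as "operand busy" with detail 0x92, which is the kind of value callers in tdx.c map to -EBUSY rather than -EIO.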

+111 -180

arch/x86/kvm/vmx/vmx.c

··· 54 54 #include <trace/events/ipi.h> 55 55 56 56 #include "capabilities.h" 57 + #include "common.h" 57 58 #include "cpuid.h" 58 59 #include "hyperv.h" 59 60 #include "kvm_onhyperv.h" ··· 1284 1283 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) 1285 1284 { 1286 1285 struct vcpu_vmx *vmx = to_vmx(vcpu); 1286 + struct vcpu_vt *vt = to_vt(vcpu); 1287 1287 struct vmcs_host_state *host_state; 1288 1288 #ifdef CONFIG_X86_64 1289 1289 int cpu = raw_smp_processor_id(); ··· 1313 1311 if (vmx->nested.need_vmcs12_to_shadow_sync) 1314 1312 nested_sync_vmcs12_to_shadow(vcpu); 1315 1313 1316 - if (vmx->guest_state_loaded) 1314 + if (vt->guest_state_loaded) 1317 1315 return; 1318 1316 1319 1317 host_state = &vmx->loaded_vmcs->host_state; ··· 1334 1332 fs_sel = current->thread.fsindex; 1335 1333 gs_sel = current->thread.gsindex; 1336 1334 fs_base = current->thread.fsbase; 1337 - vmx->msr_host_kernel_gs_base = current->thread.gsbase; 1335 + vt->msr_host_kernel_gs_base = current->thread.gsbase; 1338 1336 } else { 1339 1337 savesegment(fs, fs_sel); 1340 1338 savesegment(gs, gs_sel); 1341 1339 fs_base = read_msr(MSR_FS_BASE); 1342 - vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE); 1340 + vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE); 1343 1341 } 1344 1342 1345 1343 wrmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); ··· 1351 1349 #endif 1352 1350 1353 1351 vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base); 1354 - vmx->guest_state_loaded = true; 1352 + vt->guest_state_loaded = true; 1355 1353 } 1356 1354 1357 1355 static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx) 1358 1356 { 1359 1357 struct vmcs_host_state *host_state; 1360 1358 1361 - if (!vmx->guest_state_loaded) 1359 + if (!vmx->vt.guest_state_loaded) 1362 1360 return; 1363 1361 1364 1362 host_state = &vmx->loaded_vmcs->host_state; ··· 1386 1384 #endif 1387 1385 invalidate_tss_limit(); 1388 1386 #ifdef CONFIG_X86_64 1389 - wrmsrq(MSR_KERNEL_GS_BASE, 
vmx->msr_host_kernel_gs_base); 1387 + wrmsrq(MSR_KERNEL_GS_BASE, vmx->vt.msr_host_kernel_gs_base); 1390 1388 #endif 1391 1389 load_fixmap_gdt(raw_smp_processor_id()); 1392 - vmx->guest_state_loaded = false; 1390 + vmx->vt.guest_state_loaded = false; 1393 1391 vmx->guest_uret_msrs_loaded = false; 1394 1392 } 1395 1393 ··· 1397 1395 static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) 1398 1396 { 1399 1397 preempt_disable(); 1400 - if (vmx->guest_state_loaded) 1398 + if (vmx->vt.guest_state_loaded) 1401 1399 rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); 1402 1400 preempt_enable(); 1403 1401 return vmx->msr_guest_kernel_gs_base; ··· 1406 1404 static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data) 1407 1405 { 1408 1406 preempt_disable(); 1409 - if (vmx->guest_state_loaded) 1407 + if (vmx->vt.guest_state_loaded) 1410 1408 wrmsrq(MSR_KERNEL_GS_BASE, data); 1411 1409 preempt_enable(); 1412 1410 vmx->msr_guest_kernel_gs_base = data; ··· 1583 1581 vmcs_writel(GUEST_RFLAGS, rflags); 1584 1582 1585 1583 if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM) 1586 - vmx->emulation_required = vmx_emulation_required(vcpu); 1584 + vmx->vt.emulation_required = vmx_emulation_required(vcpu); 1587 1585 } 1588 1586 1589 1587 bool vmx_get_if_flag(struct kvm_vcpu *vcpu) ··· 1703 1701 * so that guest userspace can't DoS the guest simply by triggering 1704 1702 * emulation (enclaves are CPL3 only). 
1705 1703 */ 1706 - if (to_vmx(vcpu)->exit_reason.enclave_mode) { 1704 + if (vmx_get_exit_reason(vcpu).enclave_mode) { 1707 1705 kvm_queue_exception(vcpu, UD_VECTOR); 1708 1706 return X86EMUL_PROPAGATE_FAULT; 1709 1707 } ··· 1718 1716 1719 1717 static int skip_emulated_instruction(struct kvm_vcpu *vcpu) 1720 1718 { 1721 - union vmx_exit_reason exit_reason = to_vmx(vcpu)->exit_reason; 1719 + union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu); 1722 1720 unsigned long rip, orig_rip; 1723 1721 u32 instr_len; 1724 1722 ··· 1865 1863 return; 1866 1864 } 1867 1865 1868 - WARN_ON_ONCE(vmx->emulation_required); 1866 + WARN_ON_ONCE(vmx->vt.emulation_required); 1869 1867 1870 1868 if (kvm_exception_is_soft(ex->vector)) { 1871 1869 vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, ··· 3408 3406 } 3409 3407 3410 3408 /* depends on vcpu->arch.cr0 to be set to a new value */ 3411 - vmx->emulation_required = vmx_emulation_required(vcpu); 3409 + vmx->vt.emulation_required = vmx_emulation_required(vcpu); 3412 3410 } 3413 3411 3414 3412 static int vmx_get_max_ept_level(void) ··· 3671 3669 { 3672 3670 __vmx_set_segment(vcpu, var, seg); 3673 3671 3674 - to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu); 3672 + to_vmx(vcpu)->vt.emulation_required = vmx_emulation_required(vcpu); 3675 3673 } 3676 3674 3677 3675 void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) ··· 4199 4197 pt_update_intercept_for_msr(vcpu); 4200 4198 } 4201 4199 4202 - static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, 4203 - int pi_vec) 4204 - { 4205 - #ifdef CONFIG_SMP 4206 - if (vcpu->mode == IN_GUEST_MODE) { 4207 - /* 4208 - * The vector of the virtual has already been set in the PIR. 4209 - * Send a notification event to deliver the virtual interrupt 4210 - * unless the vCPU is the currently running vCPU, i.e. 
the 4211 - * event is being sent from a fastpath VM-Exit handler, in 4212 - * which case the PIR will be synced to the vIRR before 4213 - * re-entering the guest. 4214 - * 4215 - * When the target is not the running vCPU, the following 4216 - * possibilities emerge: 4217 - * 4218 - * Case 1: vCPU stays in non-root mode. Sending a notification 4219 - * event posts the interrupt to the vCPU. 4220 - * 4221 - * Case 2: vCPU exits to root mode and is still runnable. The 4222 - * PIR will be synced to the vIRR before re-entering the guest. 4223 - * Sending a notification event is ok as the host IRQ handler 4224 - * will ignore the spurious event. 4225 - * 4226 - * Case 3: vCPU exits to root mode and is blocked. vcpu_block() 4227 - * has already synced PIR to vIRR and never blocks the vCPU if 4228 - * the vIRR is not empty. Therefore, a blocked vCPU here does 4229 - * not wait for any requested interrupts in PIR, and sending a 4230 - * notification event also results in a benign, spurious event. 4231 - */ 4232 - 4233 - if (vcpu != kvm_get_running_vcpu()) 4234 - __apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); 4235 - return; 4236 - } 4237 - #endif 4238 - /* 4239 - * The vCPU isn't in the guest; wake the vCPU in case it is blocking, 4240 - * otherwise do nothing as KVM will grab the highest priority pending 4241 - * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). 
4242 - */ 4243 - kvm_vcpu_wake_up(vcpu); 4244 - } 4245 - 4246 4200 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, 4247 4201 int vector) 4248 4202 { ··· 4247 4289 */ 4248 4290 static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) 4249 4291 { 4250 - struct vcpu_vmx *vmx = to_vmx(vcpu); 4292 + struct vcpu_vt *vt = to_vt(vcpu); 4251 4293 int r; 4252 4294 4253 4295 r = vmx_deliver_nested_posted_interrupt(vcpu, vector); ··· 4258 4300 if (!vcpu->arch.apic->apicv_active) 4259 4301 return -1; 4260 4302 4261 - if (pi_test_and_set_pir(vector, &vmx->pi_desc)) 4262 - return 0; 4263 - 4264 - /* If a previous notification has sent the IPI, nothing to do. */ 4265 - if (pi_test_and_set_on(&vmx->pi_desc)) 4266 - return 0; 4267 - 4268 - /* 4269 - * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() 4270 - * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is 4271 - * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a 4272 - * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE. 4273 - */ 4274 - kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); 4303 + __vmx_deliver_posted_interrupt(vcpu, &vt->pi_desc, vector); 4275 4304 return 0; 4276 4305 } 4277 4306 ··· 4725 4780 vmcs_write16(GUEST_INTR_STATUS, 0); 4726 4781 4727 4782 vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); 4728 - vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); 4783 + vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->vt.pi_desc))); 4729 4784 } 4730 4785 4731 4786 if (vmx_can_use_ipiv(&vmx->vcpu)) { ··· 4838 4893 * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR 4839 4894 * or POSTED_INTR_WAKEUP_VECTOR. 
4840 4895 */ 4841 - vmx->pi_desc.nv = POSTED_INTR_VECTOR; 4842 - __pi_set_sn(&vmx->pi_desc); 4896 + vmx->vt.pi_desc.nv = POSTED_INTR_VECTOR; 4897 + __pi_set_sn(&vmx->vt.pi_desc); 4843 4898 } 4844 4899 4845 4900 void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) ··· 5756 5811 5757 5812 static int handle_ept_violation(struct kvm_vcpu *vcpu) 5758 5813 { 5759 - unsigned long exit_qualification; 5814 + unsigned long exit_qualification = vmx_get_exit_qual(vcpu); 5760 5815 gpa_t gpa; 5761 - u64 error_code; 5762 - 5763 - exit_qualification = vmx_get_exit_qual(vcpu); 5764 5816 5765 5817 /* 5766 5818 * EPT violation happened while executing iret from NMI, ··· 5773 5831 gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); 5774 5832 trace_kvm_page_fault(vcpu, gpa, exit_qualification); 5775 5833 5776 - /* Is it a read fault? */ 5777 - error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) 5778 - ? PFERR_USER_MASK : 0; 5779 - /* Is it a write fault? */ 5780 - error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) 5781 - ? PFERR_WRITE_MASK : 0; 5782 - /* Is it a fetch fault? */ 5783 - error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) 5784 - ? PFERR_FETCH_MASK : 0; 5785 - /* ept page table entry is present? */ 5786 - error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK) 5787 - ? PFERR_PRESENT_MASK : 0; 5788 - 5789 - if (error_code & EPT_VIOLATION_GVA_IS_VALID) 5790 - error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? 5791 - PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; 5792 - 5793 5834 /* 5794 5835 * Check that the GPA doesn't exceed physical memory limits, as that is 5795 5836 * a guest page fault. 
We have to emulate the instruction here, because ··· 5784 5859 if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa))) 5785 5860 return kvm_emulate_instruction(vcpu, 0); 5786 5861 5787 - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); 5862 + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); 5788 5863 } 5789 5864 5790 5865 static int handle_ept_misconfig(struct kvm_vcpu *vcpu) ··· 5829 5904 { 5830 5905 struct vcpu_vmx *vmx = to_vmx(vcpu); 5831 5906 5832 - if (!vmx->emulation_required) 5907 + if (!vmx->vt.emulation_required) 5833 5908 return false; 5834 5909 5835 5910 /* ··· 5861 5936 intr_window_requested = exec_controls_get(vmx) & 5862 5937 CPU_BASED_INTR_WINDOW_EXITING; 5863 5938 5864 - while (vmx->emulation_required && count-- != 0) { 5939 + while (vmx->vt.emulation_required && count-- != 0) { 5865 5940 if (intr_window_requested && !vmx_interrupt_blocked(vcpu)) 5866 5941 return handle_interrupt_window(&vmx->vcpu); 5867 5942 ··· 6056 6131 * VM-Exits. Unconditionally set the flag here and leave the handling to 6057 6132 * vmx_handle_exit(). 
6058 6133 */ 6059 - to_vmx(vcpu)->exit_reason.bus_lock_detected = true; 6134 + to_vt(vcpu)->exit_reason.bus_lock_detected = true; 6060 6135 return 1; 6061 6136 } 6062 6137 ··· 6154 6229 { 6155 6230 struct vcpu_vmx *vmx = to_vmx(vcpu); 6156 6231 6157 - *reason = vmx->exit_reason.full; 6232 + *reason = vmx->vt.exit_reason.full; 6158 6233 *info1 = vmx_get_exit_qual(vcpu); 6159 - if (!(vmx->exit_reason.failed_vmentry)) { 6234 + if (!(vmx->vt.exit_reason.failed_vmentry)) { 6160 6235 *info2 = vmx->idt_vectoring_info; 6161 6236 *intr_info = vmx_get_intr_info(vcpu); 6162 6237 if (is_exception_with_error_code(*intr_info)) ··· 6452 6527 static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) 6453 6528 { 6454 6529 struct vcpu_vmx *vmx = to_vmx(vcpu); 6455 - union vmx_exit_reason exit_reason = vmx->exit_reason; 6530 + union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu); 6456 6531 u32 vectoring_info = vmx->idt_vectoring_info; 6457 6532 u16 exit_handler_index; 6458 6533 ··· 6508 6583 * the least awful solution for the userspace case without 6509 6584 * risking false positives. 6510 6585 */ 6511 - if (vmx->emulation_required) { 6586 + if (vmx->vt.emulation_required) { 6512 6587 nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0); 6513 6588 return 1; 6514 6589 } ··· 6518 6593 } 6519 6594 6520 6595 /* If guest state is invalid, start emulating. L2 is handled above. */ 6521 - if (vmx->emulation_required) 6596 + if (vmx->vt.emulation_required) 6522 6597 return handle_invalid_guest_state(vcpu); 6523 6598 6524 6599 if (exit_reason.failed_vmentry) { ··· 6618 6693 * Exit to user space when bus lock detected to inform that there is 6619 6694 * a bus lock in guest. 
6620 6695 */ 6621 - if (to_vmx(vcpu)->exit_reason.bus_lock_detected) { 6696 + if (vmx_get_exit_reason(vcpu).bus_lock_detected) { 6622 6697 if (ret > 0) 6623 6698 vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK; 6624 6699 ··· 6897 6972 6898 6973 int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) 6899 6974 { 6900 - struct vcpu_vmx *vmx = to_vmx(vcpu); 6975 + struct vcpu_vt *vt = to_vt(vcpu); 6901 6976 int max_irr; 6902 6977 bool got_posted_interrupt; 6903 6978 6904 6979 if (KVM_BUG_ON(!enable_apicv, vcpu->kvm)) 6905 6980 return -EIO; 6906 6981 6907 - if (pi_test_on(&vmx->pi_desc)) { 6908 - pi_clear_on(&vmx->pi_desc); 6982 + if (pi_test_on(&vt->pi_desc)) { 6983 + pi_clear_on(&vt->pi_desc); 6909 6984 /* 6910 6985 * IOMMU can write to PID.ON, so the barrier matters even on UP. 6911 6986 * But on x86 this is just a compiler barrier anyway. 6912 6987 */ 6913 6988 smp_mb__after_atomic(); 6914 6989 got_posted_interrupt = 6915 - kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr); 6990 + kvm_apic_update_irr(vcpu, vt->pi_desc.pir, &max_irr); 6916 6991 } else { 6917 6992 max_irr = kvm_lapic_find_highest_irr(vcpu); 6918 6993 got_posted_interrupt = false; ··· 6950 7025 vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]); 6951 7026 vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]); 6952 7027 vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); 6953 - } 6954 - 6955 - void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu) 6956 - { 6957 - struct vcpu_vmx *vmx = to_vmx(vcpu); 6958 - 6959 - pi_clear_on(&vmx->pi_desc); 6960 - memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir)); 6961 7028 } 6962 7029 6963 7030 void vmx_do_interrupt_irqoff(unsigned long entry); ··· 7008 7091 7009 7092 void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) 7010 7093 { 7011 - struct vcpu_vmx *vmx = to_vmx(vcpu); 7012 - 7013 - if (vmx->emulation_required) 7094 + if (to_vt(vcpu)->emulation_required) 7014 7095 return; 7015 7096 7016 - if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT) 7097 + if 
(vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXTERNAL_INTERRUPT) 7017 7098 handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); 7018 - else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) 7099 + else if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXCEPTION_NMI) 7019 7100 handle_exception_irqoff(vcpu, vmx_get_intr_info(vcpu)); 7020 7101 } 7021 7102 ··· 7248 7333 * the fastpath even, all other exits must use the slow path. 7249 7334 */ 7250 7335 if (is_guest_mode(vcpu) && 7251 - to_vmx(vcpu)->exit_reason.basic != EXIT_REASON_PREEMPTION_TIMER) 7336 + vmx_get_exit_reason(vcpu).basic != EXIT_REASON_PREEMPTION_TIMER) 7252 7337 return EXIT_FASTPATH_NONE; 7253 7338 7254 - switch (to_vmx(vcpu)->exit_reason.basic) { 7339 + switch (vmx_get_exit_reason(vcpu).basic) { 7255 7340 case EXIT_REASON_MSR_WRITE: 7256 7341 return handle_fastpath_set_msr_irqoff(vcpu); 7257 7342 case EXIT_REASON_PREEMPTION_TIMER: ··· 7261 7346 default: 7262 7347 return EXIT_FASTPATH_NONE; 7263 7348 } 7349 + } 7350 + 7351 + noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu) 7352 + { 7353 + if ((u16)vmx_get_exit_reason(vcpu).basic != EXIT_REASON_EXCEPTION_NMI || 7354 + !is_nmi(vmx_get_intr_info(vcpu))) 7355 + return; 7356 + 7357 + kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); 7358 + if (cpu_feature_enabled(X86_FEATURE_FRED)) 7359 + fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR); 7360 + else 7361 + vmx_do_nmi_irqoff(); 7362 + kvm_after_interrupt(vcpu); 7264 7363 } 7265 7364 7266 7365 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, ··· 7316 7387 vmx_enable_fb_clear(vmx); 7317 7388 7318 7389 if (unlikely(vmx->fail)) { 7319 - vmx->exit_reason.full = 0xdead; 7390 + vmx->vt.exit_reason.full = 0xdead; 7320 7391 goto out; 7321 7392 } 7322 7393 7323 - vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON); 7324 - if (likely(!vmx->exit_reason.failed_vmentry)) 7394 + vmx->vt.exit_reason.full = vmcs_read32(VM_EXIT_REASON); 7395 + if 
(likely(!vmx_get_exit_reason(vcpu).failed_vmentry)) 7325 7396 vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); 7326 7397 7327 - if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI && 7328 - is_nmi(vmx_get_intr_info(vcpu))) { 7329 - kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); 7330 - if (cpu_feature_enabled(X86_FEATURE_FRED)) 7331 - fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR); 7332 - else 7333 - vmx_do_nmi_irqoff(); 7334 - kvm_after_interrupt(vcpu); 7335 - } 7398 + vmx_handle_nmi(vcpu); 7336 7399 7337 7400 out: 7338 7401 guest_state_exit_irqoff(); ··· 7345 7424 * start emulation until we arrive back to a valid state. Synthesize a 7346 7425 * consistency check VM-Exit due to invalid guest state and bail. 7347 7426 */ 7348 - if (unlikely(vmx->emulation_required)) { 7427 + if (unlikely(vmx->vt.emulation_required)) { 7349 7428 vmx->fail = 0; 7350 7429 7351 - vmx->exit_reason.full = EXIT_REASON_INVALID_STATE; 7352 - vmx->exit_reason.failed_vmentry = 1; 7430 + vmx->vt.exit_reason.full = EXIT_REASON_INVALID_STATE; 7431 + vmx->vt.exit_reason.failed_vmentry = 1; 7353 7432 kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1); 7354 - vmx->exit_qualification = ENTRY_FAIL_DEFAULT; 7433 + vmx->vt.exit_qualification = ENTRY_FAIL_DEFAULT; 7355 7434 kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2); 7356 - vmx->exit_intr_info = 0; 7435 + vmx->vt.exit_intr_info = 0; 7357 7436 return EXIT_FASTPATH_NONE; 7358 7437 } 7359 7438 ··· 7456 7535 * checking. 
7457 7536 */ 7458 7537 if (vmx->nested.nested_run_pending && 7459 - !vmx->exit_reason.failed_vmentry) 7538 + !vmx_get_exit_reason(vcpu).failed_vmentry) 7460 7539 ++vcpu->stat.nested_run; 7461 7540 7462 7541 vmx->nested.nested_run_pending = 0; ··· 7465 7544 if (unlikely(vmx->fail)) 7466 7545 return EXIT_FASTPATH_NONE; 7467 7546 7468 - if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY)) 7547 + if (unlikely((u16)vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MCE_DURING_VMENTRY)) 7469 7548 kvm_machine_check(); 7470 7549 7471 7550 trace_kvm_exit(vcpu, KVM_ISA_VMX); 7472 7551 7473 - if (unlikely(vmx->exit_reason.failed_vmentry)) 7552 + if (unlikely(vmx_get_exit_reason(vcpu).failed_vmentry)) 7474 7553 return EXIT_FASTPATH_NONE; 7475 7554 7476 7555 vmx->loaded_vmcs->launched = 1; ··· 7502 7581 BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0); 7503 7582 vmx = to_vmx(vcpu); 7504 7583 7505 - INIT_LIST_HEAD(&vmx->pi_wakeup_list); 7584 + INIT_LIST_HEAD(&vmx->vt.pi_wakeup_list); 7506 7585 7507 7586 err = -ENOMEM; 7508 7587 ··· 7600 7679 7601 7680 if (vmx_can_use_ipiv(vcpu)) 7602 7681 WRITE_ONCE(to_kvm_vmx(vcpu->kvm)->pid_table[vcpu->vcpu_id], 7603 - __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID); 7682 + __pa(&vmx->vt.pi_desc) | PID_TABLE_ENTRY_VALID); 7604 7683 7605 7684 return 0; 7606 7685 ··· 7645 7724 break; 7646 7725 } 7647 7726 } 7727 + 7728 + if (enable_pml) 7729 + kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES; 7648 7730 return 0; 7731 + } 7732 + 7733 + static inline bool vmx_ignore_guest_pat(struct kvm *kvm) 7734 + { 7735 + /* 7736 + * Non-coherent DMA devices need the guest to flush CPU properly. 7737 + * In that case it is not possible to map all guest RAM as WB, so 7738 + * always trust guest PAT. 
7739 + */ 7740 + return !kvm_arch_has_noncoherent_dma(kvm) && 7741 + kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT); 7649 7742 } 7650 7743 7651 7744 u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) ··· 7671 7736 if (is_mmio) 7672 7737 return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; 7673 7738 7674 - /* 7675 - * Force WB and ignore guest PAT if the VM does NOT have a non-coherent 7676 - * device attached. Letting the guest control memory types on Intel 7677 - * CPUs may result in unexpected behavior, and so KVM's ABI is to trust 7678 - * the guest to behave only as a last resort. 7679 - */ 7680 - if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) 7739 + /* Force WB if ignoring guest PAT */ 7740 + if (vmx_ignore_guest_pat(vcpu->kvm)) 7681 7741 return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; 7682 7742 7683 7743 return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); ··· 8534 8604 if (enable_ept) 8535 8605 kvm_mmu_set_ept_masks(enable_ept_ad_bits, 8536 8606 cpu_has_vmx_ept_execute_only()); 8607 + else 8608 + vt_x86_ops.get_mt_mask = NULL; 8537 8609 8538 8610 /* 8539 8611 * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID ··· 8552 8620 */ 8553 8621 if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml()) 8554 8622 enable_pml = 0; 8555 - 8556 - if (!enable_pml) 8557 - vt_x86_ops.cpu_dirty_log_size = 0; 8558 8623 8559 8624 if (!cpu_has_vmx_preemption_timer()) 8560 8625 enable_preemption_timer = false; ··· 8610 8681 8611 8682 kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); 8612 8683 8684 + /* 8685 + * On Intel CPUs that lack self-snoop feature, letting the guest control 8686 + * memory types may result in unexpected behavior. So always ignore guest 8687 + * PAT on those CPUs and map VM as writeback, not allowing userspace to 8688 + * disable the quirk. 8689 + * 8690 + * On certain Intel CPUs (e.g. 
SPR, ICX), though self-snoop feature is 8691 + * supported, UC is slow enough to cause issues with some older guests (e.g. 8692 + * an old version of bochs driver uses ioremap() instead of ioremap_wc() to 8693 + * map the video RAM, causing wayland desktop to fail to get started 8694 + * correctly). To avoid breaking those older guests that rely on KVM to force 8695 + * memory type to WB, provide KVM_X86_QUIRK_IGNORE_GUEST_PAT to preserve the 8696 + * safer (for performance) default behavior. 8697 + * 8698 + * On top of this, non-coherent DMA devices need the guest to flush CPU 8699 + * caches properly. This also requires honoring guest PAT, and is forced 8700 + * independent of the quirk in vmx_ignore_guest_pat(). 8701 + */ 8702 + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) 8703 + kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT; 8704 + kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT; 8613 8705 return r; 8614 8706 } 8615 8707 ··· 8644 8694 l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; 8645 8695 } 8646 8696 8647 - static void __vmx_exit(void) 8697 + void vmx_exit(void) 8648 8698 { 8649 8699 allow_smaller_maxphyaddr = false; 8650 8700 8651 8701 vmx_cleanup_l1d_flush(); 8652 - } 8653 8702 8654 - static void __exit vmx_exit(void) 8655 - { 8656 - kvm_exit(); 8657 - __vmx_exit(); 8658 8703 kvm_x86_vendor_exit(); 8659 - 8660 8704 } 8661 - module_exit(vmx_exit); 8662 8705 8663 - static int __init vmx_init(void) 8706 + int __init vmx_init(void) 8664 8707 { 8665 8708 int r, cpu; 8666 8709 ··· 8697 8754 if (!enable_ept) 8698 8755 allow_smaller_maxphyaddr = true; 8699 8756 8700 - /* 8701 - * Common KVM initialization _must_ come last, after this, /dev/kvm is 8702 - * exposed to userspace! 
8703 - */ 8704 - r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), 8705 - THIS_MODULE); 8706 - if (r) 8707 - goto err_kvm_init; 8708 - 8709 8757 return 0; 8710 8758 8711 - err_kvm_init: 8712 - __vmx_exit(); 8713 8759 err_l1d_flush: 8714 8760 kvm_x86_vendor_exit(); 8715 8761 return r; 8716 8762 } 8717 - module_init(vmx_init);
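The new vmx_ignore_guest_pat() in this file combines two inputs: whether the VM has a non-coherent DMA device, and whether the KVM_X86_QUIRK_IGNORE_GUEST_PAT quirk is still enabled. The decision can be modelled in isolation as below. This is a sketch under assumptions: `struct demo_vm` and `demo_ignore_guest_pat()` are hypothetical stand-ins for `struct kvm` and the real accessors.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of VM state the check reads. */
struct demo_vm {
	bool has_noncoherent_dma;	/* models kvm_arch_has_noncoherent_dma() */
	bool quirk_ignore_guest_pat;	/* models kvm_check_has_quirk(...IGNORE_GUEST_PAT) */
};

/*
 * Guest PAT is ignored (memory forced to WB) only when the quirk is enabled
 * and no non-coherent DMA device is attached. Non-coherent DMA overrides the
 * quirk, so guest PAT is honored in that case.
 */
static inline bool demo_ignore_guest_pat(const struct demo_vm *vm)
{
	return !vm->has_noncoherent_dma && vm->quirk_ignore_guest_pat;
}
```

Of the four input combinations, only "quirk enabled, no non-coherent DMA" yields true, which matches the comment in the diff that non-coherent DMA forces honoring guest PAT independent of the quirk.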

+43 -97

arch/x86/kvm/vmx/vmx.h

··· 11 11 12 12 #include "capabilities.h" 13 13 #include "../kvm_cache_regs.h" 14 + #include "pmu_intel.h" 14 15 #include "vmcs.h" 15 16 #include "vmx_ops.h" 16 17 #include "../cpuid.h" 17 18 #include "run_flags.h" 18 19 #include "../mmu.h" 20 + #include "common.h" 19 21 20 22 #define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) 21 23 ··· 68 66 struct pt_ctx host; 69 67 struct pt_ctx guest; 70 68 }; 71 - 72 - union vmx_exit_reason { 73 - struct { 74 - u32 basic : 16; 75 - u32 reserved16 : 1; 76 - u32 reserved17 : 1; 77 - u32 reserved18 : 1; 78 - u32 reserved19 : 1; 79 - u32 reserved20 : 1; 80 - u32 reserved21 : 1; 81 - u32 reserved22 : 1; 82 - u32 reserved23 : 1; 83 - u32 reserved24 : 1; 84 - u32 reserved25 : 1; 85 - u32 bus_lock_detected : 1; 86 - u32 enclave_mode : 1; 87 - u32 smi_pending_mtf : 1; 88 - u32 smi_from_vmx_root : 1; 89 - u32 reserved30 : 1; 90 - u32 failed_vmentry : 1; 91 - }; 92 - u32 full; 93 - }; 94 - 95 - struct lbr_desc { 96 - /* Basic info about guest LBR records. */ 97 - struct x86_pmu_lbr records; 98 - 99 - /* 100 - * Emulate LBR feature via passthrough LBR registers when the 101 - * per-vcpu guest LBR event is scheduled on the current pcpu. 102 - * 103 - * The records may be inaccurate if the host reclaims the LBR. 104 - */ 105 - struct perf_event *event; 106 - 107 - /* True if LBRs are marked as not intercepted in the MSR bitmap */ 108 - bool msr_passthrough; 109 - }; 110 - 111 - extern struct x86_pmu_lbr vmx_lbr_caps; 112 69 113 70 /* 114 71 * The nested_vmx structure is part of vcpu_vmx, and holds information we need ··· 209 248 210 249 struct vcpu_vmx { 211 250 struct kvm_vcpu vcpu; 251 + struct vcpu_vt vt; 212 252 u8 fail; 213 253 u8 x2apic_msr_bitmap_mode; 214 254 215 - /* 216 - * If true, host state has been stored in vmx->loaded_vmcs for 217 - * the CPU registers that only need to be switched when transitioning 218 - * to/from the kernel, and the registers have been loaded with guest 219 - * values. 
If false, host state is loaded in the CPU registers 220 - * and vmx->loaded_vmcs->host_state is invalid. 221 - */ 222 - bool guest_state_loaded; 223 - 224 - unsigned long exit_qualification; 225 - u32 exit_intr_info; 226 255 u32 idt_vectoring_info; 227 256 ulong rflags; 228 257 ··· 225 274 struct vmx_uret_msr guest_uret_msrs[MAX_NR_USER_RETURN_MSRS]; 226 275 bool guest_uret_msrs_loaded; 227 276 #ifdef CONFIG_X86_64 228 - u64 msr_host_kernel_gs_base; 229 277 u64 msr_guest_kernel_gs_base; 230 278 #endif 231 279 ··· 263 313 } seg[8]; 264 314 } segment_cache; 265 315 int vpid; 266 - bool emulation_required; 267 - 268 - union vmx_exit_reason exit_reason; 269 - 270 - /* Posted interrupt descriptor */ 271 - struct pi_desc pi_desc; 272 - 273 - /* Used if this vCPU is waiting for PI notification wakeup. */ 274 - struct list_head pi_wakeup_list; 275 316 276 317 /* Support for a guest hypervisor (nested VMX) */ 277 318 struct nested_vmx nested; ··· 316 375 /* Posted Interrupt Descriptor (PID) table for IPI virtualization */ 317 376 u64 *pid_table; 318 377 }; 378 + 379 + static __always_inline struct vcpu_vt *to_vt(struct kvm_vcpu *vcpu) 380 + { 381 + return &(container_of(vcpu, struct vcpu_vmx, vcpu)->vt); 382 + } 383 + 384 + static __always_inline struct kvm_vcpu *vt_to_vcpu(struct vcpu_vt *vt) 385 + { 386 + return &(container_of(vt, struct vcpu_vmx, vt)->vcpu); 387 + } 388 + 389 + static __always_inline union vmx_exit_reason vmx_get_exit_reason(struct kvm_vcpu *vcpu) 390 + { 391 + return to_vt(vcpu)->exit_reason; 392 + } 393 + 394 + static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu) 395 + { 396 + struct vcpu_vt *vt = to_vt(vcpu); 397 + 398 + if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1) && 399 + !WARN_ON_ONCE(is_td_vcpu(vcpu))) 400 + vt->exit_qualification = vmcs_readl(EXIT_QUALIFICATION); 401 + 402 + return vt->exit_qualification; 403 + } 404 + 405 + static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu) 406 
+ { 407 + struct vcpu_vt *vt = to_vt(vcpu); 408 + 409 + if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2) && 410 + !WARN_ON_ONCE(is_td_vcpu(vcpu))) 411 + vt->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); 412 + 413 + return vt->exit_intr_info; 414 + } 319 415 320 416 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu, 321 417 struct loaded_vmcs *buddy); ··· 640 662 return container_of(vcpu, struct vcpu_vmx, vcpu); 641 663 } 642 664 643 - static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) 644 - { 645 - return &to_vmx(vcpu)->lbr_desc; 646 - } 647 - 648 - static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) 649 - { 650 - return &vcpu_to_lbr_desc(vcpu)->records; 651 - } 652 - 653 - static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) 654 - { 655 - return !!vcpu_to_lbr_records(vcpu)->nr; 656 - } 657 - 658 665 void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu); 659 666 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); 660 667 void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); 661 - 662 - static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu) 663 - { 664 - struct vcpu_vmx *vmx = to_vmx(vcpu); 665 - 666 - if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1)) 667 - vmx->exit_qualification = vmcs_readl(EXIT_QUALIFICATION); 668 - 669 - return vmx->exit_qualification; 670 - } 671 - 672 - static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu) 673 - { 674 - struct vcpu_vmx *vmx = to_vmx(vcpu); 675 - 676 - if (!kvm_register_test_and_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2)) 677 - vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); 678 - 679 - return vmx->exit_intr_info; 680 - } 681 668 682 669 struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags); 683 670 void free_vmcs(struct vmcs *vmcs); ··· 700 757 { 701 758 vmx->segment_cache.bitmask = 0; 702 759 } 760 + 761 + int vmx_init(void); 762 + void 
vmx_exit(void); 703 763 704 764 #endif /* __KVM_X86_VMX_H */

+110 -1

arch/x86/kvm/vmx/x86_ops.h

··· 46 46 bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu); 47 47 void vmx_migrate_timers(struct kvm_vcpu *vcpu); 48 48 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); 49 - void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu); 50 49 void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr); 51 50 int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu); 52 51 void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, ··· 119 120 void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu); 120 121 #endif 121 122 void vmx_setup_mce(struct kvm_vcpu *vcpu); 123 + 124 + #ifdef CONFIG_KVM_INTEL_TDX 125 + void tdx_disable_virtualization_cpu(void); 126 + int tdx_vm_init(struct kvm *kvm); 127 + void tdx_mmu_release_hkid(struct kvm *kvm); 128 + void tdx_vm_destroy(struct kvm *kvm); 129 + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); 130 + 131 + int tdx_vcpu_create(struct kvm_vcpu *vcpu); 132 + void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); 133 + void tdx_vcpu_free(struct kvm_vcpu *vcpu); 134 + void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); 135 + int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu); 136 + fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit); 137 + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); 138 + void tdx_vcpu_put(struct kvm_vcpu *vcpu); 139 + bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); 140 + int tdx_handle_exit(struct kvm_vcpu *vcpu, 141 + enum exit_fastpath_completion fastpath); 142 + 143 + void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, 144 + int trig_mode, int vector); 145 + void tdx_inject_nmi(struct kvm_vcpu *vcpu); 146 + void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, 147 + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); 148 + bool tdx_has_emulated_msr(u32 index); 149 + int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); 150 + int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); 151 + 152 + int 
tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); 153 + 154 + int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, 155 + enum pg_level level, void *private_spt); 156 + int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, 157 + enum pg_level level, void *private_spt); 158 + int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, 159 + enum pg_level level, kvm_pfn_t pfn); 160 + int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, 161 + enum pg_level level, kvm_pfn_t pfn); 162 + 163 + void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); 164 + void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); 165 + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); 166 + int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn); 167 + #else 168 + static inline void tdx_disable_virtualization_cpu(void) {} 169 + static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } 170 + static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} 171 + static inline void tdx_vm_destroy(struct kvm *kvm) {} 172 + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; } 173 + 174 + static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; } 175 + static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} 176 + static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} 177 + static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} 178 + static inline int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; } 179 + static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) 180 + { 181 + return EXIT_FASTPATH_NONE; 182 + } 183 + static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} 184 + static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} 185 + static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; } 186 + static inline int tdx_handle_exit(struct 
kvm_vcpu *vcpu, 187 + enum exit_fastpath_completion fastpath) { return 0; } 188 + 189 + static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, 190 + int trig_mode, int vector) {} 191 + static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} 192 + static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, 193 + u64 *info2, u32 *intr_info, u32 *error_code) {} 194 + static inline bool tdx_has_emulated_msr(u32 index) { return false; } 195 + static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } 196 + static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } 197 + 198 + static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } 199 + 200 + static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, 201 + enum pg_level level, 202 + void *private_spt) 203 + { 204 + return -EOPNOTSUPP; 205 + } 206 + 207 + static inline int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, 208 + enum pg_level level, 209 + void *private_spt) 210 + { 211 + return -EOPNOTSUPP; 212 + } 213 + 214 + static inline int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, 215 + enum pg_level level, 216 + kvm_pfn_t pfn) 217 + { 218 + return -EOPNOTSUPP; 219 + } 220 + 221 + static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, 222 + enum pg_level level, 223 + kvm_pfn_t pfn) 224 + { 225 + return -EOPNOTSUPP; 226 + } 227 + 228 + static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} 229 + static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} 230 + static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} 231 + static inline int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) { return 0; } 232 + #endif 122 233 123 234 #endif /* __KVM_X86_VMX_X86_OPS_H */
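The `#else` branch above uses the common pattern of replacing every entry point with a `static inline` stub when the feature is compiled out, so callers need no `#ifdef` of their own. A minimal userspace sketch of that pattern follows; `CONFIG_DEMO_FEATURE`, `struct demo_vm` and the `demo_*` functions are hypothetical names for illustration, not part of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical config switch; left undefined so the stub branch is used. */
/* #define CONFIG_DEMO_FEATURE */

struct demo_vm {
	int id;
};

#ifdef CONFIG_DEMO_FEATURE
/* Real implementations would live in a separate translation unit. */
int demo_vm_init(struct demo_vm *vm);
bool demo_has_emulated_msr(unsigned int index);
void demo_vm_destroy(struct demo_vm *vm);
#else
/* Stubs: init paths report "not supported", queries say "no", teardown is a no-op. */
static inline int demo_vm_init(struct demo_vm *vm)
{
	(void)vm;
	return -EOPNOTSUPP;
}

static inline bool demo_has_emulated_msr(unsigned int index)
{
	(void)index;
	return false;
}

static inline void demo_vm_destroy(struct demo_vm *vm)
{
	(void)vm;
}
#endif
```

With the switch undefined, `demo_vm_init()` fails cleanly with `-EOPNOTSUPP`, mirroring how the stubbed `tdx_vm_init()` and `tdx_vcpu_create()` behave in the header.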

+68 -31

arch/x86/kvm/x86.c

··· 90 90 #include "trace.h" 91 91 92 92 #define MAX_IO_MSRS 256 93 - #define KVM_MAX_MCE_BANKS 32 94 93 95 94 /* 96 95 * Note, kvm_caps fields should *never* have default values, all fields must be ··· 635 636 } 636 637 } 637 638 639 + static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs) 640 + { 641 + if (!msrs->registered) { 642 + msrs->urn.on_user_return = kvm_on_user_return; 643 + user_return_notifier_register(&msrs->urn); 644 + msrs->registered = true; 645 + } 646 + } 647 + 638 648 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask) 639 649 { 640 650 struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs); ··· 657 649 return 1; 658 650 659 651 msrs->values[slot].curr = value; 660 - if (!msrs->registered) { 661 - msrs->urn.on_user_return = kvm_on_user_return; 662 - user_return_notifier_register(&msrs->urn); 663 - msrs->registered = true; 664 - } 652 + kvm_user_return_register_notifier(msrs); 665 653 return 0; 666 654 } 667 655 EXPORT_SYMBOL_GPL(kvm_set_user_return_msr); 656 + 657 + void kvm_user_return_msr_update_cache(unsigned int slot, u64 value) 658 + { 659 + struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs); 660 + 661 + msrs->values[slot].curr = value; 662 + kvm_user_return_register_notifier(msrs); 663 + } 664 + EXPORT_SYMBOL_GPL(kvm_user_return_msr_update_cache); 668 665 669 666 static void drop_user_return_notifiers(void) 670 667 { ··· 4752 4739 break; 4753 4740 case KVM_CAP_MAX_VCPUS: 4754 4741 r = KVM_MAX_VCPUS; 4742 + if (kvm) 4743 + r = kvm->max_vcpus; 4755 4744 break; 4756 4745 case KVM_CAP_MAX_VCPU_ID: 4757 4746 r = KVM_MAX_VCPU_IDS; ··· 4809 4794 r = enable_pmu ? 
KVM_CAP_PMU_VALID_MASK : 0; 4810 4795 break; 4811 4796 case KVM_CAP_DISABLE_QUIRKS2: 4812 - r = KVM_X86_VALID_QUIRKS; 4797 + r = kvm_caps.supported_quirks; 4813 4798 break; 4814 4799 case KVM_CAP_X86_NOTIFY_VMEXIT: 4815 4800 r = kvm_caps.has_notify_vmexit; ··· 5132 5117 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, 5133 5118 struct kvm_lapic_state *s) 5134 5119 { 5120 + if (vcpu->arch.apic->guest_apic_protected) 5121 + return -EINVAL; 5122 + 5135 5123 kvm_x86_call(sync_pir_to_irr)(vcpu); 5136 5124 5137 5125 return kvm_apic_get_state(vcpu, s); ··· 5144 5126 struct kvm_lapic_state *s) 5145 5127 { 5146 5128 int r; 5129 + 5130 + if (vcpu->arch.apic->guest_apic_protected) 5131 + return -EINVAL; 5147 5132 5148 5133 r = kvm_apic_set_state(vcpu, s); 5149 5134 if (r) ··· 6325 6304 case KVM_SET_DEVICE_ATTR: 6326 6305 r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); 6327 6306 break; 6307 + case KVM_MEMORY_ENCRYPT_OP: 6308 + r = -ENOTTY; 6309 + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) 6310 + goto out; 6311 + r = kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); 6312 + break; 6328 6313 default: 6329 6314 r = -EINVAL; 6330 6315 } ··· 6518 6491 struct kvm_vcpu *vcpu; 6519 6492 unsigned long i; 6520 6493 6521 - if (!kvm_x86_ops.cpu_dirty_log_size) 6494 + if (!kvm->arch.cpu_dirty_log_size) 6522 6495 return; 6523 6496 6524 6497 kvm_for_each_vcpu(i, vcpu, kvm) ··· 6548 6521 switch (cap->cap) { 6549 6522 case KVM_CAP_DISABLE_QUIRKS2: 6550 6523 r = -EINVAL; 6551 - if (cap->args[0] & ~KVM_X86_VALID_QUIRKS) 6524 + if (cap->args[0] & ~kvm_caps.supported_quirks) 6552 6525 break; 6553 6526 fallthrough; 6554 6527 case KVM_CAP_DISABLE_QUIRKS: 6555 - kvm->arch.disabled_quirks = cap->args[0]; 6528 + kvm->arch.disabled_quirks |= cap->args[0] & kvm_caps.supported_quirks; 6556 6529 r = 0; 6557 6530 break; 6558 6531 case KVM_CAP_SPLIT_IRQCHIP: { ··· 7327 7300 goto out; 7328 7301 } 7329 7302 case KVM_MEMORY_ENCRYPT_OP: { 7330 - r = -ENOTTY; 7331 - if (!kvm_x86_ops.mem_enc_ioctl) 7332 - goto 
out; 7333 - 7334 7303 r = kvm_x86_call(mem_enc_ioctl)(kvm, argp); 7335 7304 break; 7336 7305 } ··· 9794 9771 kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); 9795 9772 kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0; 9796 9773 } 9774 + kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS; 9775 + kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS; 9797 9776 9798 9777 rdmsrq_safe(MSR_EFER, &kvm_host.efer); 9799 9778 ··· 9839 9814 9840 9815 if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled) 9841 9816 kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM); 9817 + 9818 + /* KVM always ignores guest PAT for shadow paging. */ 9819 + if (!tdp_enabled) 9820 + kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT; 9842 9821 9843 9822 if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES)) 9844 9823 kvm_caps.supported_xss = 0; ··· 10052 10023 return kvm_skip_emulated_instruction(vcpu); 10053 10024 } 10054 10025 10055 - int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, 10056 - unsigned long a0, unsigned long a1, 10057 - unsigned long a2, unsigned long a3, 10058 - int op_64_bit, int cpl, 10026 + int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl, 10059 10027 int (*complete_hypercall)(struct kvm_vcpu *)) 10060 10028 { 10061 10029 unsigned long ret; 10030 + unsigned long nr = kvm_rax_read(vcpu); 10031 + unsigned long a0 = kvm_rbx_read(vcpu); 10032 + unsigned long a1 = kvm_rcx_read(vcpu); 10033 + unsigned long a2 = kvm_rdx_read(vcpu); 10034 + unsigned long a3 = kvm_rsi_read(vcpu); 10035 + int op_64_bit = is_64_bit_hypercall(vcpu); 10062 10036 10063 10037 ++vcpu->stat.hypercalls; 10064 10038 ··· 10164 10132 if (kvm_hv_hypercall_enabled(vcpu)) 10165 10133 return kvm_hv_hypercall(vcpu); 10166 10134 10167 - return __kvm_emulate_hypercall(vcpu, rax, rbx, rcx, rdx, rsi, 10168 - is_64_bit_hypercall(vcpu), 10169 - kvm_x86_call(get_cpl)(vcpu), 10135 + return __kvm_emulate_hypercall(vcpu, kvm_x86_call(get_cpl)(vcpu), 10170 10136 
complete_hypercall_exit); 10171 10137 } 10172 10138 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); ··· 11008 10978 if (vcpu->arch.guest_fpu.xfd_err) 11009 10979 wrmsrq(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); 11010 10980 11011 - if (unlikely(vcpu->arch.switch_db_regs)) { 10981 + if (unlikely(vcpu->arch.switch_db_regs && 10982 + !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))) { 11012 10983 set_debugreg(0, 7); 11013 10984 set_debugreg(vcpu->arch.eff_db[0], 0); 11014 10985 set_debugreg(vcpu->arch.eff_db[1], 1); ··· 11061 11030 */ 11062 11031 if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) { 11063 11032 WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP); 11033 + WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH); 11064 11034 kvm_x86_call(sync_dirty_debug_regs)(vcpu); 11065 11035 kvm_update_dr0123(vcpu); 11066 11036 kvm_update_dr7(vcpu); ··· 11166 11134 !vcpu->arch.apf.halted); 11167 11135 } 11168 11136 11169 - static bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) 11137 + bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) 11170 11138 { 11171 11139 if (!list_empty_careful(&vcpu->async_pf.done)) 11172 11140 return true; 11173 11141 11174 11142 if (kvm_apic_has_pending_init_or_sipi(vcpu) && 11175 11143 kvm_apic_init_sipi_allowed(vcpu)) 11176 - return true; 11177 - 11178 - if (vcpu->arch.pv.pv_unhalted) 11179 11144 return true; 11180 11145 11181 11146 if (kvm_is_exception_pending(vcpu)) ··· 11212 11183 11213 11184 return false; 11214 11185 } 11186 + EXPORT_SYMBOL_GPL(kvm_vcpu_has_events); 11215 11187 11216 11188 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) 11217 11189 { 11218 - return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu); 11190 + return kvm_vcpu_running(vcpu) || vcpu->arch.pv.pv_unhalted || 11191 + kvm_vcpu_has_events(vcpu); 11219 11192 } 11220 11193 11221 11194 /* Called within kvm->srcu read side. 
*/ ··· 11351 11320 */ 11352 11321 ++vcpu->stat.halt_exits; 11353 11322 if (lapic_in_kernel(vcpu)) { 11354 - if (kvm_vcpu_has_events(vcpu)) 11323 + if (kvm_vcpu_has_events(vcpu) || vcpu->arch.pv.pv_unhalted) 11355 11324 state = KVM_MP_STATE_RUNNABLE; 11356 11325 kvm_set_mp_state(vcpu, state); 11357 11326 return 1; ··· 12725 12694 { 12726 12695 return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id; 12727 12696 } 12697 + EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp); 12728 12698 12729 12699 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) 12730 12700 { ··· 12755 12723 /* Decided by the vendor code for other VM types. */ 12756 12724 kvm->arch.pre_fault_allowed = 12757 12725 type == KVM_X86_DEFAULT_VM || type == KVM_X86_SW_PROTECTED_VM; 12726 + kvm->arch.disabled_quirks = kvm_caps.inapplicable_quirks & kvm_caps.supported_quirks; 12758 12727 12759 12728 ret = kvm_page_track_init(kvm); 12760 12729 if (ret) ··· 12909 12876 kvm_free_pit(kvm); 12910 12877 12911 12878 kvm_mmu_pre_destroy_vm(kvm); 12879 + static_call_cond(kvm_x86_vm_pre_destroy)(kvm); 12912 12880 } 12913 12881 12914 12882 void kvm_arch_destroy_vm(struct kvm *kvm) ··· 13107 13073 { 13108 13074 int nr_slots; 13109 13075 13110 - if (!kvm_x86_ops.cpu_dirty_log_size) 13076 + if (!kvm->arch.cpu_dirty_log_size) 13111 13077 return; 13112 13078 13113 13079 nr_slots = atomic_read(&kvm->nr_memslots_dirty_logging); ··· 13179 13145 if (READ_ONCE(eager_page_split)) 13180 13146 kvm_mmu_slot_try_split_huge_pages(kvm, new, PG_LEVEL_4K); 13181 13147 13182 - if (kvm_x86_ops.cpu_dirty_log_size) { 13148 + if (kvm->arch.cpu_dirty_log_size) { 13183 13149 kvm_mmu_slot_leaf_clear_dirty(kvm, new); 13184 13150 kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_2M); 13185 13151 } else { ··· 13568 13534 * due to toggling the "ignore PAT" bit. Zap all SPTEs when the first 13569 13535 * (or last) non-coherent device is (un)registered to so that new SPTEs 13570 13536 * with the correct "ignore guest PAT" setting are created. 
13537 + * 13538 + * If KVM always honors guest PAT, however, there is nothing to do. 13571 13539 */ 13572 - if (kvm_mmu_may_ignore_guest_pat()) 13540 + if (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT)) 13573 13541 kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); 13574 13542 } 13575 13543 ··· 14048 14012 14049 14013 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); 14050 14014 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); 14015 + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio); 14051 14016 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); 14052 14017 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); 14053 14018 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
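The quirk handling in the x86.c hunks above can be read as three mask operations: a VM starts with the inapplicable quirks already disabled, a `KVM_CAP_DISABLE_QUIRKS2` request containing unsupported bits is rejected, and accepted bits are OR-ed into the existing set. A minimal sketch under those assumptions; the `DEMO_QUIRK_*` bits and `demo_*` types are hypothetical stand-ins for the kernel's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical quirk bits, for illustration only. */
#define DEMO_QUIRK_A (1ULL << 0)
#define DEMO_QUIRK_B (1ULL << 1)
#define DEMO_QUIRK_C (1ULL << 2)

struct demo_caps {
	uint64_t supported_quirks;
	uint64_t inapplicable_quirks;
};

struct demo_vm {
	uint64_t disabled_quirks;
};

/* At VM creation: quirks that cannot apply are disabled from the start. */
static void demo_vm_init(struct demo_vm *vm, const struct demo_caps *caps)
{
	vm->disabled_quirks = caps->inapplicable_quirks & caps->supported_quirks;
}

/* Userspace request: reject unknown bits, then accumulate rather than overwrite. */
static int demo_disable_quirks2(struct demo_vm *vm, const struct demo_caps *caps,
				uint64_t requested)
{
	if (requested & ~caps->supported_quirks)
		return -EINVAL;

	vm->disabled_quirks |= requested & caps->supported_quirks;
	return 0;
}
```

Using `|=` instead of plain assignment means a later request cannot silently re-enable a quirk that was disabled at VM creation.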

+14 -17

arch/x86/kvm/x86.h

··· 10 10 #include "kvm_emulate.h" 11 11 #include "cpuid.h" 12 12 13 + #define KVM_MAX_MCE_BANKS 32 14 + 13 15 struct kvm_caps { 14 16 /* control of guest tsc rate supported? */ 15 17 bool has_tsc_control; ··· 34 32 u64 supported_xcr0; 35 33 u64 supported_xss; 36 34 u64 supported_perf_cap; 35 + 36 + u64 supported_quirks; 37 + u64 inapplicable_quirks; 37 38 }; 38 39 39 40 struct kvm_host_values { ··· 634 629 return kvm->arch.hypercall_exit_enabled & BIT(hc_nr); 635 630 } 636 631 637 - int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, 638 - unsigned long a0, unsigned long a1, 639 - unsigned long a2, unsigned long a3, 640 - int op_64_bit, int cpl, 632 + int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl, 641 633 int (*complete_hypercall)(struct kvm_vcpu *)); 642 634 643 - #define __kvm_emulate_hypercall(_vcpu, nr, a0, a1, a2, a3, op_64_bit, cpl, complete_hypercall) \ 644 - ({ \ 645 - int __ret; \ 646 - \ 647 - __ret = ____kvm_emulate_hypercall(_vcpu, \ 648 - kvm_##nr##_read(_vcpu), kvm_##a0##_read(_vcpu), \ 649 - kvm_##a1##_read(_vcpu), kvm_##a2##_read(_vcpu), \ 650 - kvm_##a3##_read(_vcpu), op_64_bit, cpl, \ 651 - complete_hypercall); \ 652 - \ 653 - if (__ret > 0) \ 654 - __ret = complete_hypercall(_vcpu); \ 655 - __ret; \ 635 + #define __kvm_emulate_hypercall(_vcpu, cpl, complete_hypercall) \ 636 + ({ \ 637 + int __ret; \ 638 + __ret = ____kvm_emulate_hypercall(_vcpu, cpl, complete_hypercall); \ 639 + \ 640 + if (__ret > 0) \ 641 + __ret = complete_hypercall(_vcpu); \ 642 + __ret; \ 656 643 }) 657 644 658 645 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

+3

arch/x86/virt/vmx/tdx/seamcall.S

··· 41 41 TDX_MODULE_CALL host=1 ret=1 42 42 SYM_FUNC_END(__seamcall_ret) 43 43 44 + /* KVM requires non-instrumentable __seamcall_saved_ret() for TDH.VP.ENTER */ 45 + .section .noinstr.text, "ax" 46 + 44 47 /* 45 48 * __seamcall_saved_ret() - Host-side interface functions to SEAM software 46 49 * (the P-SEAMLDR or the TDX module), with saving output registers to the

+418 -5

arch/x86/virt/vmx/tdx/tdx.c

··· 5 5 * Intel Trusted Domain Extensions (TDX) support 6 6 */ 7 7 8 + #include "asm/page_types.h" 8 9 #define pr_fmt(fmt) "virt/tdx: " fmt 9 10 10 11 #include <linux/types.h> ··· 28 27 #include <linux/log2.h> 29 28 #include <linux/acpi.h> 30 29 #include <linux/suspend.h> 30 + #include <linux/idr.h> 31 31 #include <asm/page.h> 32 32 #include <asm/special_insns.h> 33 33 #include <asm/msr-index.h> ··· 44 42 static u32 tdx_guest_keyid_start __ro_after_init; 45 43 static u32 tdx_nr_guest_keyids __ro_after_init; 46 44 45 + static DEFINE_IDA(tdx_guest_keyid_pool); 46 + 47 47 static DEFINE_PER_CPU(bool, tdx_lp_initialized); 48 48 49 49 static struct tdmr_info_list tdx_tdmr_list; ··· 55 51 56 52 /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */ 57 53 static LIST_HEAD(tdx_memlist); 54 + 55 + static struct tdx_sys_info tdx_sysinfo; 58 56 59 57 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args); 60 58 ··· 1066 1060 1067 1061 static int init_tdx_module(void) 1068 1062 { 1069 - struct tdx_sys_info sysinfo; 1070 1063 int ret; 1071 1064 1072 - ret = get_tdx_sys_info(&sysinfo); 1065 + ret = get_tdx_sys_info(&tdx_sysinfo); 1073 1066 if (ret) 1074 1067 return ret; 1075 1068 1076 1069 /* Check whether the kernel can support this module */ 1077 - ret = check_features(&sysinfo); 1070 + ret = check_features(&tdx_sysinfo); 1078 1071 if (ret) 1079 1072 return ret; 1080 1073 ··· 1094 1089 goto out_put_tdxmem; 1095 1090 1096 1091 /* Allocate enough space for constructing TDMRs */ 1097 - ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr); 1092 + ret = alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr); 1098 1093 if (ret) 1099 1094 goto err_free_tdxmem; 1100 1095 1101 1096 /* Cover all TDX-usable memory regions in TDMRs */ 1102 - ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr); 1097 + ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdx_sysinfo.tdmr); 1103 1098 if (ret) 1104 1099 goto err_free_tdmrs; 1105 1100 ··· 1461 1456 
1462 1457 check_tdx_erratum(); 1463 1458 } 1459 + 1460 + const struct tdx_sys_info *tdx_get_sysinfo(void) 1461 + { 1462 + const struct tdx_sys_info *p = NULL; 1463 + 1464 + /* Make sure all fields in @tdx_sysinfo have been populated */ 1465 + mutex_lock(&tdx_module_lock); 1466 + if (tdx_module_status == TDX_MODULE_INITIALIZED) 1467 + p = (const struct tdx_sys_info *)&tdx_sysinfo; 1468 + mutex_unlock(&tdx_module_lock); 1469 + 1470 + return p; 1471 + } 1472 + EXPORT_SYMBOL_GPL(tdx_get_sysinfo); 1473 + 1474 + u32 tdx_get_nr_guest_keyids(void) 1475 + { 1476 + return tdx_nr_guest_keyids; 1477 + } 1478 + EXPORT_SYMBOL_GPL(tdx_get_nr_guest_keyids); 1479 + 1480 + int tdx_guest_keyid_alloc(void) 1481 + { 1482 + return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start, 1483 + tdx_guest_keyid_start + tdx_nr_guest_keyids - 1, 1484 + GFP_KERNEL); 1485 + } 1486 + EXPORT_SYMBOL_GPL(tdx_guest_keyid_alloc); 1487 + 1488 + void tdx_guest_keyid_free(unsigned int keyid) 1489 + { 1490 + ida_free(&tdx_guest_keyid_pool, keyid); 1491 + } 1492 + EXPORT_SYMBOL_GPL(tdx_guest_keyid_free); 1493 + 1494 + static inline u64 tdx_tdr_pa(struct tdx_td *td) 1495 + { 1496 + return page_to_phys(td->tdr_page); 1497 + } 1498 + 1499 + static inline u64 tdx_tdvpr_pa(struct tdx_vp *td) 1500 + { 1501 + return page_to_phys(td->tdvpr_page); 1502 + } 1503 + 1504 + /* 1505 + * The TDX module exposes a CLFLUSH_BEFORE_ALLOC bit to specify whether 1506 + * a CLFLUSH of pages is required before handing them to the TDX module. 1507 + * Be conservative and make the code simpler by doing the CLFLUSH 1508 + * unconditionally. 
1509 + */ 1510 + static void tdx_clflush_page(struct page *page) 1511 + { 1512 + clflush_cache_range(page_to_virt(page), PAGE_SIZE); 1513 + } 1514 + 1515 + noinstr __flatten u64 tdh_vp_enter(struct tdx_vp *td, struct tdx_module_args *args) 1516 + { 1517 + args->rcx = tdx_tdvpr_pa(td); 1518 + 1519 + return __seamcall_saved_ret(TDH_VP_ENTER, args); 1520 + } 1521 + EXPORT_SYMBOL_GPL(tdh_vp_enter); 1522 + 1523 + u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page) 1524 + { 1525 + struct tdx_module_args args = { 1526 + .rcx = page_to_phys(tdcs_page), 1527 + .rdx = tdx_tdr_pa(td), 1528 + }; 1529 + 1530 + tdx_clflush_page(tdcs_page); 1531 + return seamcall(TDH_MNG_ADDCX, &args); 1532 + } 1533 + EXPORT_SYMBOL_GPL(tdh_mng_addcx); 1534 + 1535 + u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2) 1536 + { 1537 + struct tdx_module_args args = { 1538 + .rcx = gpa, 1539 + .rdx = tdx_tdr_pa(td), 1540 + .r8 = page_to_phys(page), 1541 + .r9 = page_to_phys(source), 1542 + }; 1543 + u64 ret; 1544 + 1545 + tdx_clflush_page(page); 1546 + ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args); 1547 + 1548 + *ext_err1 = args.rcx; 1549 + *ext_err2 = args.rdx; 1550 + 1551 + return ret; 1552 + } 1553 + EXPORT_SYMBOL_GPL(tdh_mem_page_add); 1554 + 1555 + u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2) 1556 + { 1557 + struct tdx_module_args args = { 1558 + .rcx = gpa | level, 1559 + .rdx = tdx_tdr_pa(td), 1560 + .r8 = page_to_phys(page), 1561 + }; 1562 + u64 ret; 1563 + 1564 + tdx_clflush_page(page); 1565 + ret = seamcall_ret(TDH_MEM_SEPT_ADD, &args); 1566 + 1567 + *ext_err1 = args.rcx; 1568 + *ext_err2 = args.rdx; 1569 + 1570 + return ret; 1571 + } 1572 + EXPORT_SYMBOL_GPL(tdh_mem_sept_add); 1573 + 1574 + u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page) 1575 + { 1576 + struct tdx_module_args args = { 1577 + .rcx = page_to_phys(tdcx_page), 1578 + .rdx = 
tdx_tdvpr_pa(vp), 1579 + }; 1580 + 1581 + tdx_clflush_page(tdcx_page); 1582 + return seamcall(TDH_VP_ADDCX, &args); 1583 + } 1584 + EXPORT_SYMBOL_GPL(tdh_vp_addcx); 1585 + 1586 + u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2) 1587 + { 1588 + struct tdx_module_args args = { 1589 + .rcx = gpa | level, 1590 + .rdx = tdx_tdr_pa(td), 1591 + .r8 = page_to_phys(page), 1592 + }; 1593 + u64 ret; 1594 + 1595 + tdx_clflush_page(page); 1596 + ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args); 1597 + 1598 + *ext_err1 = args.rcx; 1599 + *ext_err2 = args.rdx; 1600 + 1601 + return ret; 1602 + } 1603 + EXPORT_SYMBOL_GPL(tdh_mem_page_aug); 1604 + 1605 + u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2) 1606 + { 1607 + struct tdx_module_args args = { 1608 + .rcx = gpa | level, 1609 + .rdx = tdx_tdr_pa(td), 1610 + }; 1611 + u64 ret; 1612 + 1613 + ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &args); 1614 + 1615 + *ext_err1 = args.rcx; 1616 + *ext_err2 = args.rdx; 1617 + 1618 + return ret; 1619 + } 1620 + EXPORT_SYMBOL_GPL(tdh_mem_range_block); 1621 + 1622 + u64 tdh_mng_key_config(struct tdx_td *td) 1623 + { 1624 + struct tdx_module_args args = { 1625 + .rcx = tdx_tdr_pa(td), 1626 + }; 1627 + 1628 + return seamcall(TDH_MNG_KEY_CONFIG, &args); 1629 + } 1630 + EXPORT_SYMBOL_GPL(tdh_mng_key_config); 1631 + 1632 + u64 tdh_mng_create(struct tdx_td *td, u16 hkid) 1633 + { 1634 + struct tdx_module_args args = { 1635 + .rcx = tdx_tdr_pa(td), 1636 + .rdx = hkid, 1637 + }; 1638 + 1639 + tdx_clflush_page(td->tdr_page); 1640 + return seamcall(TDH_MNG_CREATE, &args); 1641 + } 1642 + EXPORT_SYMBOL_GPL(tdh_mng_create); 1643 + 1644 + u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp) 1645 + { 1646 + struct tdx_module_args args = { 1647 + .rcx = tdx_tdvpr_pa(vp), 1648 + .rdx = tdx_tdr_pa(td), 1649 + }; 1650 + 1651 + tdx_clflush_page(vp->tdvpr_page); 1652 + return seamcall(TDH_VP_CREATE, &args); 1653 + } 
1654 + EXPORT_SYMBOL_GPL(tdh_vp_create); 1655 + 1656 + u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data) 1657 + { 1658 + struct tdx_module_args args = { 1659 + .rcx = tdx_tdr_pa(td), 1660 + .rdx = field, 1661 + }; 1662 + u64 ret; 1663 + 1664 + ret = seamcall_ret(TDH_MNG_RD, &args); 1665 + 1666 + /* R8: Content of the field, or 0 in case of error. */ 1667 + *data = args.r8; 1668 + 1669 + return ret; 1670 + } 1671 + EXPORT_SYMBOL_GPL(tdh_mng_rd); 1672 + 1673 + u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2) 1674 + { 1675 + struct tdx_module_args args = { 1676 + .rcx = gpa, 1677 + .rdx = tdx_tdr_pa(td), 1678 + }; 1679 + u64 ret; 1680 + 1681 + ret = seamcall_ret(TDH_MR_EXTEND, &args); 1682 + 1683 + *ext_err1 = args.rcx; 1684 + *ext_err2 = args.rdx; 1685 + 1686 + return ret; 1687 + } 1688 + EXPORT_SYMBOL_GPL(tdh_mr_extend); 1689 + 1690 + u64 tdh_mr_finalize(struct tdx_td *td) 1691 + { 1692 + struct tdx_module_args args = { 1693 + .rcx = tdx_tdr_pa(td), 1694 + }; 1695 + 1696 + return seamcall(TDH_MR_FINALIZE, &args); 1697 + } 1698 + EXPORT_SYMBOL_GPL(tdh_mr_finalize); 1699 + 1700 + u64 tdh_vp_flush(struct tdx_vp *vp) 1701 + { 1702 + struct tdx_module_args args = { 1703 + .rcx = tdx_tdvpr_pa(vp), 1704 + }; 1705 + 1706 + return seamcall(TDH_VP_FLUSH, &args); 1707 + } 1708 + EXPORT_SYMBOL_GPL(tdh_vp_flush); 1709 + 1710 + u64 tdh_mng_vpflushdone(struct tdx_td *td) 1711 + { 1712 + struct tdx_module_args args = { 1713 + .rcx = tdx_tdr_pa(td), 1714 + }; 1715 + 1716 + return seamcall(TDH_MNG_VPFLUSHDONE, &args); 1717 + } 1718 + EXPORT_SYMBOL_GPL(tdh_mng_vpflushdone); 1719 + 1720 + u64 tdh_mng_key_freeid(struct tdx_td *td) 1721 + { 1722 + struct tdx_module_args args = { 1723 + .rcx = tdx_tdr_pa(td), 1724 + }; 1725 + 1726 + return seamcall(TDH_MNG_KEY_FREEID, &args); 1727 + } 1728 + EXPORT_SYMBOL_GPL(tdh_mng_key_freeid); 1729 + 1730 + u64 tdh_mng_init(struct tdx_td *td, u64 td_params, u64 *extended_err) 1731 + { 1732 + struct tdx_module_args args 
= { 1733 + .rcx = tdx_tdr_pa(td), 1734 + .rdx = td_params, 1735 + }; 1736 + u64 ret; 1737 + 1738 + ret = seamcall_ret(TDH_MNG_INIT, &args); 1739 + 1740 + *extended_err = args.rcx; 1741 + 1742 + return ret; 1743 + } 1744 + EXPORT_SYMBOL_GPL(tdh_mng_init); 1745 + 1746 + u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data) 1747 + { 1748 + struct tdx_module_args args = { 1749 + .rcx = tdx_tdvpr_pa(vp), 1750 + .rdx = field, 1751 + }; 1752 + u64 ret; 1753 + 1754 + ret = seamcall_ret(TDH_VP_RD, &args); 1755 + 1756 + /* R8: Content of the field, or 0 in case of error. */ 1757 + *data = args.r8; 1758 + 1759 + return ret; 1760 + } 1761 + EXPORT_SYMBOL_GPL(tdh_vp_rd); 1762 + 1763 + u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask) 1764 + { 1765 + struct tdx_module_args args = { 1766 + .rcx = tdx_tdvpr_pa(vp), 1767 + .rdx = field, 1768 + .r8 = data, 1769 + .r9 = mask, 1770 + }; 1771 + 1772 + return seamcall(TDH_VP_WR, &args); 1773 + } 1774 + EXPORT_SYMBOL_GPL(tdh_vp_wr); 1775 + 1776 + u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid) 1777 + { 1778 + struct tdx_module_args args = { 1779 + .rcx = tdx_tdvpr_pa(vp), 1780 + .rdx = initial_rcx, 1781 + .r8 = x2apicid, 1782 + }; 1783 + 1784 + /* apicid requires version == 1. */ 1785 + return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &args); 1786 + } 1787 + EXPORT_SYMBOL_GPL(tdh_vp_init); 1788 + 1789 + /* 1790 + * TDX ABI defines output operands as PT, OWNER and SIZE. These are TDX defined formats. 1791 + * So despite the names, they must be interpreted specially as described by the spec. Return 1792 + * them only for error reporting purposes. 
1793 + */ 1794 + u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size) 1795 + { 1796 + struct tdx_module_args args = { 1797 + .rcx = page_to_phys(page), 1798 + }; 1799 + u64 ret; 1800 + 1801 + ret = seamcall_ret(TDH_PHYMEM_PAGE_RECLAIM, &args); 1802 + 1803 + *tdx_pt = args.rcx; 1804 + *tdx_owner = args.rdx; 1805 + *tdx_size = args.r8; 1806 + 1807 + return ret; 1808 + } 1809 + EXPORT_SYMBOL_GPL(tdh_phymem_page_reclaim); 1810 + 1811 + u64 tdh_mem_track(struct tdx_td *td) 1812 + { 1813 + struct tdx_module_args args = { 1814 + .rcx = tdx_tdr_pa(td), 1815 + }; 1816 + 1817 + return seamcall(TDH_MEM_TRACK, &args); 1818 + } 1819 + EXPORT_SYMBOL_GPL(tdh_mem_track); 1820 + 1821 + u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2) 1822 + { 1823 + struct tdx_module_args args = { 1824 + .rcx = gpa | level, 1825 + .rdx = tdx_tdr_pa(td), 1826 + }; 1827 + u64 ret; 1828 + 1829 + ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &args); 1830 + 1831 + *ext_err1 = args.rcx; 1832 + *ext_err2 = args.rdx; 1833 + 1834 + return ret; 1835 + } 1836 + EXPORT_SYMBOL_GPL(tdh_mem_page_remove); 1837 + 1838 + u64 tdh_phymem_cache_wb(bool resume) 1839 + { 1840 + struct tdx_module_args args = { 1841 + .rcx = resume ? 
1 : 0, 1842 + }; 1843 + 1844 + return seamcall(TDH_PHYMEM_CACHE_WB, &args); 1845 + } 1846 + EXPORT_SYMBOL_GPL(tdh_phymem_cache_wb); 1847 + 1848 + u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td) 1849 + { 1850 + struct tdx_module_args args = {}; 1851 + 1852 + args.rcx = mk_keyed_paddr(tdx_global_keyid, td->tdr_page); 1853 + 1854 + return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); 1855 + } 1856 + EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_tdr); 1857 + 1858 + u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page) 1859 + { 1860 + struct tdx_module_args args = {}; 1861 + 1862 + args.rcx = mk_keyed_paddr(hkid, page); 1863 + 1864 + return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); 1865 + } 1866 + EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
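`tdx_guest_keyid_alloc()` above hands `ida_alloc_range()` an inclusive range, which is why the upper bound is `tdx_guest_keyid_start + tdx_nr_guest_keyids - 1`. The sketch below illustrates that bound arithmetic in userspace. It swaps the kernel IDA for a plain boolean array, so it is not how the kernel allocator works internally, and every `demo_*` name is hypothetical. Returning `-ENOSPC` on exhaustion is meant to mirror the IDA's behaviour.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_IDS 64

/* Stand-in for the kernel IDA: one "in use" flag per id. */
struct demo_pool {
	bool used[DEMO_MAX_IDS];
};

/* Return the lowest free id in the inclusive range [min, max], or -ENOSPC. */
static int demo_alloc_range(struct demo_pool *pool, unsigned int min,
			    unsigned int max)
{
	for (unsigned int id = min; id <= max && id < DEMO_MAX_IDS; id++) {
		if (!pool->used[id]) {
			pool->used[id] = true;
			return (int)id;
		}
	}
	return -ENOSPC;
}

static void demo_free(struct demo_pool *pool, unsigned int id)
{
	if (id < DEMO_MAX_IDS)
		pool->used[id] = false;
}

/* nr ids starting at start occupy [start, start + nr - 1]; assumes nr > 0. */
static int demo_guest_keyid_alloc(struct demo_pool *pool, unsigned int start,
				  unsigned int nr)
{
	return demo_alloc_range(pool, start, start + nr - 1);
}
```

With `start = 4` and `nr = 2`, only ids 4 and 5 can ever be returned; a third allocation fails until one of them is freed.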

+40 -8

arch/x86/virt/vmx/tdx/tdx.h

··· 3 3 #define _X86_VIRT_TDX_H 4 4 5 5 #include <linux/bits.h> 6 - #include "tdx_global_metadata.h" 7 6 8 7 /* 9 8 * This file contains both macros and data structures defined by the TDX ··· 14 15 /* 15 16 * TDX module SEAMCALL leaf functions 16 17 */ 17 - #define TDH_PHYMEM_PAGE_RDMD 24 18 - #define TDH_SYS_KEY_CONFIG 31 19 - #define TDH_SYS_INIT 33 20 - #define TDH_SYS_RD 34 21 - #define TDH_SYS_LP_INIT 35 22 - #define TDH_SYS_TDMR_INIT 36 23 - #define TDH_SYS_CONFIG 45 18 + #define TDH_VP_ENTER 0 19 + #define TDH_MNG_ADDCX 1 20 + #define TDH_MEM_PAGE_ADD 2 21 + #define TDH_MEM_SEPT_ADD 3 22 + #define TDH_VP_ADDCX 4 23 + #define TDH_MEM_PAGE_AUG 6 24 + #define TDH_MEM_RANGE_BLOCK 7 25 + #define TDH_MNG_KEY_CONFIG 8 26 + #define TDH_MNG_CREATE 9 27 + #define TDH_MNG_RD 11 28 + #define TDH_MR_EXTEND 16 29 + #define TDH_MR_FINALIZE 17 30 + #define TDH_VP_FLUSH 18 31 + #define TDH_MNG_VPFLUSHDONE 19 32 + #define TDH_VP_CREATE 10 33 + #define TDH_MNG_KEY_FREEID 20 34 + #define TDH_MNG_INIT 21 35 + #define TDH_VP_INIT 22 36 + #define TDH_PHYMEM_PAGE_RDMD 24 37 + #define TDH_VP_RD 26 38 + #define TDH_PHYMEM_PAGE_RECLAIM 28 39 + #define TDH_MEM_PAGE_REMOVE 29 40 + #define TDH_SYS_KEY_CONFIG 31 41 + #define TDH_SYS_INIT 33 42 + #define TDH_SYS_RD 34 43 + #define TDH_SYS_LP_INIT 35 44 + #define TDH_SYS_TDMR_INIT 36 45 + #define TDH_MEM_TRACK 38 46 + #define TDH_PHYMEM_CACHE_WB 40 47 + #define TDH_PHYMEM_PAGE_WBINVD 41 48 + #define TDH_VP_WR 43 49 + #define TDH_SYS_CONFIG 45 50 + 51 + /* 52 + * SEAMCALL leaf: 53 + * 54 + * Bit 15:0 Leaf number 55 + * Bit 23:16 Version number 56 + */ 57 + #define TDX_VERSION_SHIFT 16 24 58 25 59 /* TDX page types */ 26 60 #define PT_NDA 0x0
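The comment above describes the SEAMCALL function number as a leaf number in bits 15:0 and a version in bits 23:16, which is what `tdh_vp_init()` relies on when it ORs in `1ULL << TDX_VERSION_SHIFT`. A small sketch of that packing; the two macros are copied from the hunk, while the `demo_*` helpers are hypothetical and only illustrate the bit layout.

```c
#include <stdint.h>

#define TDH_VP_INIT        22
#define TDX_VERSION_SHIFT  16

/* Pack a leaf number (bits 15:0) and a version (bits 23:16) into one value. */
static inline uint64_t demo_seamcall_leaf(uint16_t leaf, uint8_t version)
{
	return (uint64_t)leaf | ((uint64_t)version << TDX_VERSION_SHIFT);
}

/* Extract the leaf number from a packed function value. */
static inline uint16_t demo_leaf_number(uint64_t fn)
{
	return (uint16_t)(fn & 0xffff);
}

/* Extract the version from a packed function value. */
static inline uint8_t demo_leaf_version(uint64_t fn)
{
	return (uint8_t)((fn >> TDX_VERSION_SHIFT) & 0xff);
}
```

For example, `demo_seamcall_leaf(TDH_VP_INIT, 1)` yields `0x10016`, the same value as `TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT)`.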

+50

arch/x86/virt/vmx/tdx/tdx_global_metadata.c

@@ -37,12 +37,62 @@
 	return ret;
 }

+static int get_tdx_sys_info_td_ctrl(struct tdx_sys_info_td_ctrl *sysinfo_td_ctrl)
+{
+	int ret = 0;
+	u64 val;
+
+	if (!ret && !(ret = read_sys_metadata_field(0x9800000100000000, &val)))
+		sysinfo_td_ctrl->tdr_base_size = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x9800000100000100, &val)))
+		sysinfo_td_ctrl->tdcs_base_size = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x9800000100000200, &val)))
+		sysinfo_td_ctrl->tdvps_base_size = val;
+
+	return ret;
+}
+
+static int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_td_conf)
+{
+	int ret = 0;
+	u64 val;
+	int i, j;
+
+	if (!ret && !(ret = read_sys_metadata_field(0x1900000300000000, &val)))
+		sysinfo_td_conf->attributes_fixed0 = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x1900000300000001, &val)))
+		sysinfo_td_conf->attributes_fixed1 = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x1900000300000002, &val)))
+		sysinfo_td_conf->xfam_fixed0 = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x1900000300000003, &val)))
+		sysinfo_td_conf->xfam_fixed1 = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x9900000100000004, &val)))
+		sysinfo_td_conf->num_cpuid_config = val;
+	if (!ret && !(ret = read_sys_metadata_field(0x9900000100000008, &val)))
+		sysinfo_td_conf->max_vcpus_per_td = val;
+	if (sysinfo_td_conf->num_cpuid_config > ARRAY_SIZE(sysinfo_td_conf->cpuid_config_leaves))
+		return -EINVAL;
+	for (i = 0; i < sysinfo_td_conf->num_cpuid_config; i++)
+		if (!ret && !(ret = read_sys_metadata_field(0x9900000300000400 + i, &val)))
+			sysinfo_td_conf->cpuid_config_leaves[i] = val;
+	if (sysinfo_td_conf->num_cpuid_config > ARRAY_SIZE(sysinfo_td_conf->cpuid_config_values))
+		return -EINVAL;
+	for (i = 0; i < sysinfo_td_conf->num_cpuid_config; i++)
+		for (j = 0; j < 2; j++)
+			if (!ret && !(ret = read_sys_metadata_field(0x9900000300000500 + i * 2 + j, &val)))
+				sysinfo_td_conf->cpuid_config_values[i][j] = val;
+
+	return ret;
+}
+
 static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
 {
 	int ret = 0;

 	ret = ret ?: get_tdx_sys_info_features(&sysinfo->features);
 	ret = ret ?: get_tdx_sys_info_tdmr(&sysinfo->tdmr);
+	ret = ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl);
+	ret = ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf);

 	return ret;
 }
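
Note on the pattern used by the new getters: each read is guarded by `if (!ret && !(ret = ...))`, so the first failing read stops all later reads and its error code becomes the return value. A minimal sketch of that chaining in plain C is below. `fake_read_field()`, `struct fake_info`, `reads_attempted` and the field IDs 1..3 are made up for illustration; they are not the kernel's `read_sys_metadata_field()` or real TDX metadata field IDs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Counts how many reads were actually issued (illustration only). */
static int reads_attempted;

/* Hypothetical stand-in for a metadata read; field 2 is made to fail. */
static int fake_read_field(uint64_t field_id, uint64_t *val)
{
	reads_attempted++;
	if (field_id == 2)
		return -EIO;
	*val = field_id * 10;
	return 0;
}

struct fake_info {
	uint16_t a, b, c;
};

/* Same shape as the getters in the hunk: stop at the first error. */
static int get_fake_info(struct fake_info *info)
{
	int ret = 0;
	uint64_t val;

	if (!ret && !(ret = fake_read_field(1, &val)))
		info->a = val;
	if (!ret && !(ret = fake_read_field(2, &val)))
		info->b = val;
	if (!ret && !(ret = fake_read_field(3, &val)))
		info->c = val;

	return ret;
}
```

With the failing second field, the third read is never issued and the fields after the failure keep their previous contents, which is why callers should treat the whole structure as invalid on a non-zero return.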

+19

arch/x86/virt/vmx/tdx/tdx_global_metadata.h arch/x86/include/asm/tdx_global_metadata.h

@@ -17,9 +17,28 @@
 	u16 pamt_1g_entry_size;
 };

+struct tdx_sys_info_td_ctrl {
+	u16 tdr_base_size;
+	u16 tdcs_base_size;
+	u16 tdvps_base_size;
+};
+
+struct tdx_sys_info_td_conf {
+	u64 attributes_fixed0;
+	u64 attributes_fixed1;
+	u64 xfam_fixed0;
+	u64 xfam_fixed1;
+	u16 num_cpuid_config;
+	u16 max_vcpus_per_td;
+	u64 cpuid_config_leaves[128];
+	u64 cpuid_config_values[128][2];
+};
+
 struct tdx_sys_info {
 	struct tdx_sys_info_features features;
 	struct tdx_sys_info_tdmr tdmr;
+	struct tdx_sys_info_td_ctrl td_ctrl;
+	struct tdx_sys_info_td_conf td_conf;
 };

 #endif

+6 -5

include/linux/kvm_dirty_ring.h

@@ -32,7 +32,7 @@
  * If CONFIG_HAVE_HVM_DIRTY_RING not defined, kvm_dirty_ring.o should
  * not be included as well, so define these nop functions for the arch.
  */
-static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
+static inline u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm)
 {
 	return 0;
 }
@@ -42,7 +42,7 @@
 	return true;
 }

-static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
+static inline int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
 	return 0;
@@ -71,11 +71,12 @@

 #else /* CONFIG_HAVE_KVM_DIRTY_RING */

-int kvm_cpu_dirty_log_size(void);
+int kvm_cpu_dirty_log_size(struct kvm *kvm);
 bool kvm_use_dirty_bitmap(struct kvm *kvm);
 bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
-u32 kvm_dirty_ring_get_rsvd_entries(void);
-int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
+u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm);
+int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
+			 int index, u32 size);

 /*
  * called with kvm->slots_lock held, returns the number of

+10

include/linux/kvm_host.h

@@ -1610,6 +1610,7 @@
 int kvm_arch_enable_virtualization_cpu(void);
 void kvm_arch_disable_virtualization_cpu(void);
 #endif
+bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
@@ -2284,6 +2285,7 @@
 }

 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+extern bool enable_virt_at_load;
 extern bool kvm_rebooting;
 #endif

@@ -2569,6 +2571,14 @@
 #ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range);
+#endif
+
+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+int kvm_enable_virtualization(void);
+void kvm_disable_virtualization(void);
+#else
+static inline int kvm_enable_virtualization(void) { return 0; }
+static inline void kvm_disable_virtualization(void) { }
 #endif

 #endif

+4

include/linux/misc_cgroup.h

@@ -18,6 +18,10 @@
 	/** @MISC_CG_RES_SEV_ES: AMD SEV-ES ASIDs resource */
 	MISC_CG_RES_SEV_ES,
 #endif
+#ifdef CONFIG_INTEL_TDX_HOST
+	/* Intel TDX HKIDs resource */
+	MISC_CG_RES_TDX,
+#endif
 	/** @MISC_CG_RES_TYPES: count of enum misc_res_type constants */
 	MISC_CG_RES_TYPES
 };

+3 -3

include/linux/ubsan.h

@@ -2,10 +2,10 @@
 #ifndef _LINUX_UBSAN_H
 #define _LINUX_UBSAN_H

-#ifdef CONFIG_UBSAN_TRAP
-const char *report_ubsan_failure(struct pt_regs *regs, u32 check_type);
+#if defined(CONFIG_UBSAN_TRAP) || defined(CONFIG_UBSAN_KVM_EL2)
+const char *report_ubsan_failure(u32 check_type);
 #else
-static inline const char *report_ubsan_failure(struct pt_regs *regs, u32 check_type)
+static inline const char *report_ubsan_failure(u32 check_type)
 {
 	return NULL;
 }

+4

include/uapi/linux/kvm.h

@@ -375,6 +375,7 @@
 #define KVM_SYSTEM_EVENT_WAKEUP 4
 #define KVM_SYSTEM_EVENT_SUSPEND 5
 #define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
 			__u32 type;
 			__u32 ndata;
 			union {
@@ -930,6 +931,9 @@
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_ARM_EL2 240
+#define KVM_CAP_ARM_EL2_E2H0 241
+#define KVM_CAP_RISCV_MP_STATE_RESET 242

 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;

+4

kernel/cgroup/misc.c

@@ -24,6 +24,10 @@
 	/* AMD SEV-ES ASIDs resource */
 	"sev_es",
 #endif
+#ifdef CONFIG_INTEL_TDX_HOST
+	/* Intel TDX HKIDs resource */
+	"tdx",
+#endif
 };

 /* Root misc cgroup */

+9

lib/Kconfig.ubsan

@@ -165,4 +165,13 @@
 	  This is a test module for UBSAN.
 	  It triggers various undefined behavior, and detect it.

+config UBSAN_KVM_EL2
+	bool "UBSAN for KVM code at EL2"
+	depends on ARM64
+	help
+	  Enable UBSAN when running on ARM64 with KVM in a split mode
+	  (nvhe/hvhe/protected) for the hypervisor code running in EL2.
+	  In this mode, any UBSAN violation in EL2 would panic the kernel
+	  and information similar to UBSAN_TRAP would be printed.
+
 endif # if UBSAN

+5 -3

lib/ubsan.c

@@ -19,13 +19,13 @@

 #include "ubsan.h"

-#ifdef CONFIG_UBSAN_TRAP
+#if defined(CONFIG_UBSAN_TRAP) || defined(CONFIG_UBSAN_KVM_EL2)
 /*
  * Only include matches for UBSAN checks that are actually compiled in.
  * The mappings of struct SanitizerKind (the -fsanitize=xxx args) to
  * enum SanitizerHandler (the traps) in Clang is in clang/lib/CodeGen/.
  */
-const char *report_ubsan_failure(struct pt_regs *regs, u32 check_type)
+const char *report_ubsan_failure(u32 check_type)
 {
 	switch (check_type) {
 #ifdef CONFIG_UBSAN_BOUNDS
@@ -97,7 +97,9 @@
 	}
 }

-#else
+#endif
+
+#ifndef CONFIG_UBSAN_TRAP
 static const char * const type_check_kinds[] = {
 	"load of",
 	"store to",

+4 -1

scripts/Makefile.ubsan

@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0

+# Shared with KVM/arm64.
+export CFLAGS_UBSAN_TRAP := $(call cc-option,-fsanitize-trap=undefined,-fsanitize-undefined-trap-on-error)
+
 # Enable available and selected UBSAN features.
 ubsan-cflags-$(CONFIG_UBSAN_ALIGNMENT) += -fsanitize=alignment
 ubsan-cflags-$(CONFIG_UBSAN_BOUNDS_STRICT) += -fsanitize=bounds-strict
@@ -10,7 +13,7 @@
 ubsan-cflags-$(CONFIG_UBSAN_UNREACHABLE) += -fsanitize=unreachable
 ubsan-cflags-$(CONFIG_UBSAN_BOOL) += -fsanitize=bool
 ubsan-cflags-$(CONFIG_UBSAN_ENUM) += -fsanitize=enum
-ubsan-cflags-$(CONFIG_UBSAN_TRAP) += $(call cc-option,-fsanitize-trap=undefined,-fsanitize-undefined-trap-on-error)
+ubsan-cflags-$(CONFIG_UBSAN_TRAP) += $(CFLAGS_UBSAN_TRAP)

 export CFLAGS_UBSAN := $(ubsan-cflags-y)

+48 -17

tools/arch/arm64/include/asm/sysreg.h

··· 117 117 118 118 #define SB_BARRIER_INSN __SYS_BARRIER_INSN(0, 7, 31) 119 119 120 + /* Data cache zero operations */ 120 121 #define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) 121 122 #define SYS_DC_IGSW sys_insn(1, 0, 7, 6, 4) 122 123 #define SYS_DC_IGDSW sys_insn(1, 0, 7, 6, 6) ··· 154 153 #define SYS_DC_CIGVAC sys_insn(1, 3, 7, 14, 3) 155 154 #define SYS_DC_CIGDVAC sys_insn(1, 3, 7, 14, 5) 156 155 157 - /* Data cache zero operations */ 158 156 #define SYS_DC_ZVA sys_insn(1, 3, 7, 4, 1) 159 157 #define SYS_DC_GVA sys_insn(1, 3, 7, 4, 3) 160 158 #define SYS_DC_GZVA sys_insn(1, 3, 7, 4, 4) 159 + 160 + #define SYS_DC_CIVAPS sys_insn(1, 0, 7, 15, 1) 161 + #define SYS_DC_CIGDVAPS sys_insn(1, 0, 7, 15, 5) 161 162 162 163 /* 163 164 * Automatically generated definitions for system registers, the ··· 478 475 #define SYS_CNTFRQ_EL0 sys_reg(3, 3, 14, 0, 0) 479 476 480 477 #define SYS_CNTPCT_EL0 sys_reg(3, 3, 14, 0, 1) 478 + #define SYS_CNTVCT_EL0 sys_reg(3, 3, 14, 0, 2) 481 479 #define SYS_CNTPCTSS_EL0 sys_reg(3, 3, 14, 0, 5) 482 480 #define SYS_CNTVCTSS_EL0 sys_reg(3, 3, 14, 0, 6) 483 481 ··· 486 482 #define SYS_CNTP_CTL_EL0 sys_reg(3, 3, 14, 2, 1) 487 483 #define SYS_CNTP_CVAL_EL0 sys_reg(3, 3, 14, 2, 2) 488 484 485 + #define SYS_CNTV_TVAL_EL0 sys_reg(3, 3, 14, 3, 0) 489 486 #define SYS_CNTV_CTL_EL0 sys_reg(3, 3, 14, 3, 1) 490 487 #define SYS_CNTV_CVAL_EL0 sys_reg(3, 3, 14, 3, 2) 491 488 492 489 #define SYS_AARCH32_CNTP_TVAL sys_reg(0, 0, 14, 2, 0) 493 490 #define SYS_AARCH32_CNTP_CTL sys_reg(0, 0, 14, 2, 1) 494 491 #define SYS_AARCH32_CNTPCT sys_reg(0, 0, 0, 14, 0) 492 + #define SYS_AARCH32_CNTVCT sys_reg(0, 1, 0, 14, 0) 495 493 #define SYS_AARCH32_CNTP_CVAL sys_reg(0, 2, 0, 14, 0) 496 494 #define SYS_AARCH32_CNTPCTSS sys_reg(0, 8, 0, 14, 0) 495 + #define SYS_AARCH32_CNTVCTSS sys_reg(0, 9, 0, 14, 0) 497 496 498 497 #define __PMEV_op2(n) ((n) & 0x7) 499 498 #define __CNTR_CRm(n) (0x8 | (((n) >> 3) & 0x3)) 499 + #define SYS_PMEVCNTSVRn_EL1(n) sys_reg(2, 0, 14, __CNTR_CRm(n), 
__PMEV_op2(n)) 500 500 #define SYS_PMEVCNTRn_EL0(n) sys_reg(3, 3, 14, __CNTR_CRm(n), __PMEV_op2(n)) 501 501 #define __TYPER_CRm(n) (0xc | (((n) >> 3) & 0x3)) 502 502 #define SYS_PMEVTYPERn_EL0(n) sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n)) 503 503 504 504 #define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7) 505 + 506 + #define SYS_SPMCGCRn_EL1(n) sys_reg(2, 0, 9, 13, ((n) & 1)) 507 + 508 + #define __SPMEV_op2(n) ((n) & 0x7) 509 + #define __SPMEV_crm(p, n) ((((p) & 7) << 1) | (((n) >> 3) & 1)) 510 + #define SYS_SPMEVCNTRn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b000, n), __SPMEV_op2(n)) 511 + #define SYS_SPMEVFILT2Rn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b011, n), __SPMEV_op2(n)) 512 + #define SYS_SPMEVFILTRn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b010, n), __SPMEV_op2(n)) 513 + #define SYS_SPMEVTYPERn_EL0(n) sys_reg(2, 3, 14, __SPMEV_crm(0b001, n), __SPMEV_op2(n)) 505 514 506 515 #define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0) 507 516 #define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5) ··· 535 518 #define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2) 536 519 537 520 #define SYS_VNCR_EL2 sys_reg(3, 4, 2, 2, 0) 538 - #define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6) 539 521 #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) 540 522 #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) 541 523 #define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0) ··· 620 604 621 605 /* VHE encodings for architectural EL0/1 system registers */ 622 606 #define SYS_BRBCR_EL12 sys_reg(2, 5, 9, 0, 0) 623 - #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0) 624 - #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2) 625 - #define SYS_SCTLR2_EL12 sys_reg(3, 5, 1, 0, 3) 626 - #define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0) 627 - #define SYS_TRFCR_EL12 sys_reg(3, 5, 1, 2, 1) 628 - #define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6) 629 607 #define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0) 630 608 #define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1) 631 - #define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2) 632 - #define SYS_TCR2_EL12 sys_reg(3, 5, 2, 0, 3) 633 609 #define SYS_SPSR_EL12 
sys_reg(3, 5, 4, 0, 0) 634 610 #define SYS_ELR_EL12 sys_reg(3, 5, 4, 0, 1) 635 611 #define SYS_AFSR0_EL12 sys_reg(3, 5, 5, 1, 0) 636 612 #define SYS_AFSR1_EL12 sys_reg(3, 5, 5, 1, 1) 637 613 #define SYS_ESR_EL12 sys_reg(3, 5, 5, 2, 0) 638 614 #define SYS_TFSR_EL12 sys_reg(3, 5, 5, 6, 0) 639 - #define SYS_FAR_EL12 sys_reg(3, 5, 6, 0, 0) 640 615 #define SYS_PMSCR_EL12 sys_reg(3, 5, 9, 9, 0) 641 616 #define SYS_MAIR_EL12 sys_reg(3, 5, 10, 2, 0) 642 617 #define SYS_AMAIR_EL12 sys_reg(3, 5, 10, 3, 0) 643 618 #define SYS_VBAR_EL12 sys_reg(3, 5, 12, 0, 0) 644 - #define SYS_CONTEXTIDR_EL12 sys_reg(3, 5, 13, 0, 1) 645 619 #define SYS_SCXTNUM_EL12 sys_reg(3, 5, 13, 0, 7) 646 620 #define SYS_CNTKCTL_EL12 sys_reg(3, 5, 14, 1, 0) 647 621 #define SYS_CNTP_TVAL_EL02 sys_reg(3, 5, 14, 2, 0) ··· 1034 1028 #define PIE_RX UL(0xa) 1035 1029 #define PIE_RW UL(0xc) 1036 1030 #define PIE_RWX UL(0xe) 1031 + #define PIE_MASK UL(0xf) 1037 1032 1038 - #define PIRx_ELx_PERM(idx, perm) ((perm) << ((idx) * 4)) 1033 + #define PIRx_ELx_BITS_PER_IDX 4 1034 + #define PIRx_ELx_PERM_SHIFT(idx) ((idx) * PIRx_ELx_BITS_PER_IDX) 1035 + #define PIRx_ELx_PERM_PREP(idx, perm) (((perm) & PIE_MASK) << PIRx_ELx_PERM_SHIFT(idx)) 1039 1036 1040 1037 /* 1041 1038 * Permission Overlay Extension (POE) permission encodings. 
··· 1049 1040 #define POE_RX UL(0x3) 1050 1041 #define POE_W UL(0x4) 1051 1042 #define POE_RW UL(0x5) 1052 - #define POE_XW UL(0x6) 1053 - #define POE_RXW UL(0x7) 1043 + #define POE_WX UL(0x6) 1044 + #define POE_RWX UL(0x7) 1054 1045 #define POE_MASK UL(0xf) 1055 1046 1056 - /* Initial value for Permission Overlay Extension for EL0 */ 1057 - #define POR_EL0_INIT POE_RXW 1047 + #define POR_ELx_BITS_PER_IDX 4 1048 + #define POR_ELx_PERM_SHIFT(idx) ((idx) * POR_ELx_BITS_PER_IDX) 1049 + #define POR_ELx_PERM_GET(idx, reg) (((reg) >> POR_ELx_PERM_SHIFT(idx)) & POE_MASK) 1050 + #define POR_ELx_PERM_PREP(idx, perm) (((perm) & POE_MASK) << POR_ELx_PERM_SHIFT(idx)) 1051 + 1052 + /* 1053 + * Definitions for Guarded Control Stack 1054 + */ 1055 + 1056 + #define GCS_CAP_ADDR_MASK GENMASK(63, 12) 1057 + #define GCS_CAP_ADDR_SHIFT 12 1058 + #define GCS_CAP_ADDR_WIDTH 52 1059 + #define GCS_CAP_ADDR(x) FIELD_GET(GCS_CAP_ADDR_MASK, x) 1060 + 1061 + #define GCS_CAP_TOKEN_MASK GENMASK(11, 0) 1062 + #define GCS_CAP_TOKEN_SHIFT 0 1063 + #define GCS_CAP_TOKEN_WIDTH 12 1064 + #define GCS_CAP_TOKEN(x) FIELD_GET(GCS_CAP_TOKEN_MASK, x) 1065 + 1066 + #define GCS_CAP_VALID_TOKEN 0x1 1067 + #define GCS_CAP_IN_PROGRESS_TOKEN 0x5 1068 + 1069 + #define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \ 1070 + GCS_CAP_VALID_TOKEN) 1058 1071 1059 1072 #define ARM64_FEATURE_FIELD_BITS 4 1060 1073
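
Note on the new `POR_ELx_PERM_*` helpers: they treat the register as an array of 4-bit permission fields, one per overlay index. A small sketch of how the `PREP`/`GET` pair could be used to update one field is below. The macro bodies are copied from the hunk, with `UL()` replaced by `UINT64_C()` so the sketch builds outside the kernel; `por_set_perm()` is a hypothetical helper for illustration, not something this diff adds.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the hunk above; UL() swapped for UINT64_C(). */
#define POE_RW			UINT64_C(0x5)
#define POE_RWX			UINT64_C(0x7)
#define POE_MASK		UINT64_C(0xf)

#define POR_ELx_BITS_PER_IDX		4
#define POR_ELx_PERM_SHIFT(idx)		((idx) * POR_ELx_BITS_PER_IDX)
#define POR_ELx_PERM_GET(idx, reg)	(((reg) >> POR_ELx_PERM_SHIFT(idx)) & POE_MASK)
#define POR_ELx_PERM_PREP(idx, perm)	(((perm) & POE_MASK) << POR_ELx_PERM_SHIFT(idx))

/* Hypothetical helper: replace the permission field for one overlay index. */
static uint64_t por_set_perm(uint64_t reg, unsigned int idx, uint64_t perm)
{
	reg &= ~POR_ELx_PERM_PREP(idx, POE_MASK);	/* clear the old 4-bit field */
	reg |= POR_ELx_PERM_PREP(idx, perm);		/* insert the new value */
	return reg;
}
```

Because `PREP` masks its argument with `POE_MASK`, an out-of-range permission value cannot spill into the neighbouring index, which the old `PIRx_ELx_PERM(idx, perm)` form did not guarantee.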

+1 -1

tools/testing/selftests/kvm/Makefile

··· 3 3 include $(top_srcdir)/scripts/subarch.include 4 4 ARCH ?= $(SUBARCH) 5 5 6 - ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64)) 6 + ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64 loongarch)) 7 7 # Top-level selftests allows ARCH=x86_64 :-( 8 8 ifeq ($(ARCH),x86_64) 9 9 ARCH := x86

+18

tools/testing/selftests/kvm/Makefile.kvm

··· 47 47 LIBKVM_riscv += lib/riscv/processor.c 48 48 LIBKVM_riscv += lib/riscv/ucall.c 49 49 50 + LIBKVM_loongarch += lib/loongarch/processor.c 51 + LIBKVM_loongarch += lib/loongarch/ucall.c 52 + LIBKVM_loongarch += lib/loongarch/exception.S 53 + 50 54 # Non-compiled test targets 51 55 TEST_PROGS_x86 += x86/nx_huge_pages_test.sh 52 56 ··· 151 147 TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs 152 148 TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases 153 149 TEST_GEN_PROGS_arm64 += arm64/debug-exceptions 150 + TEST_GEN_PROGS_arm64 += arm64/host_sve 154 151 TEST_GEN_PROGS_arm64 += arm64/hypercalls 155 152 TEST_GEN_PROGS_arm64 += arm64/mmio_abort 156 153 TEST_GEN_PROGS_arm64 += arm64/page_fault_test ··· 194 189 TEST_GEN_PROGS_riscv += coalesced_io_test 195 190 TEST_GEN_PROGS_riscv += get-reg-list 196 191 TEST_GEN_PROGS_riscv += steal_time 192 + 193 + TEST_GEN_PROGS_loongarch += coalesced_io_test 194 + TEST_GEN_PROGS_loongarch += demand_paging_test 195 + TEST_GEN_PROGS_loongarch += dirty_log_perf_test 196 + TEST_GEN_PROGS_loongarch += dirty_log_test 197 + TEST_GEN_PROGS_loongarch += guest_print_test 198 + TEST_GEN_PROGS_loongarch += hardware_disable_test 199 + TEST_GEN_PROGS_loongarch += kvm_binary_stats_test 200 + TEST_GEN_PROGS_loongarch += kvm_create_max_vcpus 201 + TEST_GEN_PROGS_loongarch += kvm_page_table_test 202 + TEST_GEN_PROGS_loongarch += memslot_modification_stress_test 203 + TEST_GEN_PROGS_loongarch += memslot_perf_test 204 + TEST_GEN_PROGS_loongarch += set_memory_region_test 197 205 198 206 SPLIT_TESTS += arch_timer 199 207 SPLIT_TESTS += get-reg-list

+127

tools/testing/selftests/kvm/arm64/host_sve.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + 3 + /* 4 + * Host SVE: Check FPSIMD/SVE/SME save/restore over KVM_RUN ioctls. 5 + * 6 + * Copyright 2025 Arm, Ltd 7 + */ 8 + 9 + #include <errno.h> 10 + #include <signal.h> 11 + #include <sys/auxv.h> 12 + #include <asm/kvm.h> 13 + #include <kvm_util.h> 14 + 15 + #include "ucall_common.h" 16 + 17 + static void guest_code(void) 18 + { 19 + for (int i = 0; i < 10; i++) { 20 + GUEST_UCALL_NONE(); 21 + } 22 + 23 + GUEST_DONE(); 24 + } 25 + 26 + void handle_sigill(int sig, siginfo_t *info, void *ctx) 27 + { 28 + ucontext_t *uctx = ctx; 29 + 30 + printf(" < host signal %d >\n", sig); 31 + 32 + /* 33 + * Skip the UDF 34 + */ 35 + uctx->uc_mcontext.pc += 4; 36 + } 37 + 38 + void register_sigill_handler(void) 39 + { 40 + struct sigaction sa = { 41 + .sa_sigaction = handle_sigill, 42 + .sa_flags = SA_SIGINFO, 43 + }; 44 + sigaction(SIGILL, &sa, NULL); 45 + } 46 + 47 + static void do_sve_roundtrip(void) 48 + { 49 + unsigned long before, after; 50 + 51 + /* 52 + * Set all bits in a predicate register, force a save/restore via a 53 + * SIGILL (which handle_sigill() will recover from), then report 54 + * whether the value has changed. 
55 + */ 56 + asm volatile( 57 + " .arch_extension sve\n" 58 + " ptrue p0.B\n" 59 + " cntp %[before], p0, p0.B\n" 60 + " udf #0\n" 61 + " cntp %[after], p0, p0.B\n" 62 + : [before] "=r" (before), 63 + [after] "=r" (after) 64 + : 65 + : "p0" 66 + ); 67 + 68 + if (before != after) { 69 + TEST_FAIL("Signal roundtrip discarded predicate bits (%ld => %ld)\n", 70 + before, after); 71 + } else { 72 + printf("Signal roundtrip preserved predicate bits (%ld => %ld)\n", 73 + before, after); 74 + } 75 + } 76 + 77 + static void test_run(void) 78 + { 79 + struct kvm_vcpu *vcpu; 80 + struct kvm_vm *vm; 81 + struct ucall uc; 82 + bool guest_done = false; 83 + 84 + register_sigill_handler(); 85 + 86 + vm = vm_create_with_one_vcpu(&vcpu, guest_code); 87 + 88 + do_sve_roundtrip(); 89 + 90 + while (!guest_done) { 91 + 92 + printf("Running VCPU...\n"); 93 + vcpu_run(vcpu); 94 + 95 + switch (get_ucall(vcpu, &uc)) { 96 + case UCALL_NONE: 97 + do_sve_roundtrip(); 98 + do_sve_roundtrip(); 99 + break; 100 + case UCALL_DONE: 101 + guest_done = true; 102 + break; 103 + case UCALL_ABORT: 104 + REPORT_GUEST_ASSERT(uc); 105 + break; 106 + default: 107 + TEST_FAIL("Unexpected guest exit"); 108 + } 109 + } 110 + 111 + kvm_vm_free(vm); 112 + } 113 + 114 + int main(void) 115 + { 116 + /* 117 + * This is testing the host environment, we don't care about 118 + * guest SVE support. 119 + */ 120 + if (!(getauxval(AT_HWCAP) & HWCAP_SVE)) { 121 + printf("SVE not supported\n"); 122 + return KSFT_SKIP; 123 + } 124 + 125 + test_run(); 126 + return 0; 127 + }

+76 -1

tools/testing/selftests/kvm/arm64/set_id_regs.c

··· 15 15 #include "test_util.h" 16 16 #include <linux/bitfield.h> 17 17 18 + bool have_cap_arm_mte; 19 + 18 20 enum ftr_type { 19 21 FTR_EXACT, /* Use a predefined safe value */ 20 22 FTR_LOWER_SAFE, /* Smaller value is safe */ ··· 545 543 ksft_test_result_fail("ID_AA64PFR1_EL1.MPAM_frac value should not be ignored\n"); 546 544 } 547 545 546 + #define MTE_IDREG_TEST 1 547 + static void test_user_set_mte_reg(struct kvm_vcpu *vcpu) 548 + { 549 + uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE]; 550 + struct reg_mask_range range = { 551 + .addr = (__u64)masks, 552 + }; 553 + uint64_t val; 554 + uint64_t mte; 555 + uint64_t mte_frac; 556 + int idx, err; 557 + 558 + if (!have_cap_arm_mte) { 559 + ksft_test_result_skip("MTE capability not supported, nothing to test\n"); 560 + return; 561 + } 562 + 563 + /* Get writable masks for feature ID registers */ 564 + memset(range.reserved, 0, sizeof(range.reserved)); 565 + vm_ioctl(vcpu->vm, KVM_ARM_GET_REG_WRITABLE_MASKS, &range); 566 + 567 + idx = encoding_to_range_idx(SYS_ID_AA64PFR1_EL1); 568 + if ((masks[idx] & ID_AA64PFR1_EL1_MTE_frac_MASK) == ID_AA64PFR1_EL1_MTE_frac_MASK) { 569 + ksft_test_result_skip("ID_AA64PFR1_EL1.MTE_frac is officially writable, nothing to test\n"); 570 + return; 571 + } 572 + 573 + /* 574 + * When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2) 575 + * ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported 576 + * and MTE_frac == 0 indicates it is supported. 577 + * 578 + * As MTE_frac was previously unconditionally read as 0, check 579 + * that the set to 0 succeeds but does not change MTE_frac 580 + * from unsupported (0xF) to supported (0). 
581 + * 582 + */ 583 + val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1)); 584 + 585 + mte = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE), val); 586 + mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val); 587 + if (mte != ID_AA64PFR1_EL1_MTE_MTE2 || 588 + mte_frac != ID_AA64PFR1_EL1_MTE_frac_NI) { 589 + ksft_test_result_skip("MTE_ASYNC or MTE_ASYMM are supported, nothing to test\n"); 590 + return; 591 + } 592 + 593 + /* Try to set MTE_frac=0. */ 594 + val &= ~ID_AA64PFR1_EL1_MTE_frac_MASK; 595 + val |= FIELD_PREP(ID_AA64PFR1_EL1_MTE_frac_MASK, 0); 596 + err = __vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1), val); 597 + if (err) { 598 + ksft_test_result_fail("ID_AA64PFR1_EL1.MTE_frac=0 was not accepted\n"); 599 + return; 600 + } 601 + 602 + val = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_ID_AA64PFR1_EL1)); 603 + mte_frac = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE_frac), val); 604 + if (mte_frac == ID_AA64PFR1_EL1_MTE_frac_NI) 605 + ksft_test_result_pass("ID_AA64PFR1_EL1.MTE_frac=0 accepted and still 0xF\n"); 606 + else 607 + ksft_test_result_pass("ID_AA64PFR1_EL1.MTE_frac no longer 0xF\n"); 608 + } 609 + 548 610 static void test_guest_reg_read(struct kvm_vcpu *vcpu) 549 611 { 550 612 bool done = false; ··· 739 673 ksft_test_result_pass("%s\n", __func__); 740 674 } 741 675 676 + void kvm_arch_vm_post_create(struct kvm_vm *vm) 677 + { 678 + if (vm_check_cap(vm, KVM_CAP_ARM_MTE)) { 679 + vm_enable_cap(vm, KVM_CAP_ARM_MTE, 0); 680 + have_cap_arm_mte = true; 681 + } 682 + } 683 + 742 684 int main(void) 743 685 { 744 686 struct kvm_vcpu *vcpu; ··· 775 701 ARRAY_SIZE(ftr_id_aa64pfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + 776 702 ARRAY_SIZE(ftr_id_aa64mmfr1_el1) + ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + 777 703 ARRAY_SIZE(ftr_id_aa64zfr0_el1) - ARRAY_SIZE(test_regs) + 3 + 778 - MPAM_IDREG_TEST; 704 + MPAM_IDREG_TEST + MTE_IDREG_TEST; 779 705 780 706 ksft_set_plan(test_cnt); 781 707 ··· 783 709 
test_vcpu_ftr_id_regs(vcpu); 784 710 test_vcpu_non_ftr_id_regs(vcpu); 785 711 test_user_set_mpam_reg(vcpu); 712 + test_user_set_mte_reg(vcpu); 786 713 787 714 test_guest_reg_read(vcpu); 788 715

+6

tools/testing/selftests/kvm/include/kvm_util.h

@@ -177,6 +177,7 @@
 	VM_MODE_P36V48_4K,
 	VM_MODE_P36V48_16K,
 	VM_MODE_P36V48_64K,
+	VM_MODE_P47V47_16K,
 	VM_MODE_P36V47_16K,
 	NUM_VM_MODES,
 };
@@ -229,6 +230,11 @@
 #endif

 #define VM_MODE_DEFAULT VM_MODE_P40V48_4K
+#define MIN_PAGE_SHIFT 12U
+#define ptes_per_page(page_size) ((page_size) / 8)
+
+#elif defined(__loongarch__)
+#define VM_MODE_DEFAULT VM_MODE_P47V47_16K
 #define MIN_PAGE_SHIFT 12U
 #define ptes_per_page(page_size) ((page_size) / 8)

+7

tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h

@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UTIL_ARCH_H
+#define SELFTEST_KVM_UTIL_ARCH_H
+
+struct kvm_vm_arch {};
+
+#endif // SELFTEST_KVM_UTIL_ARCH_H

+141

tools/testing/selftests/kvm/include/loongarch/processor.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + 3 + #ifndef SELFTEST_KVM_PROCESSOR_H 4 + #define SELFTEST_KVM_PROCESSOR_H 5 + 6 + #ifndef __ASSEMBLER__ 7 + #include "ucall_common.h" 8 + 9 + #else 10 + /* general registers */ 11 + #define zero $r0 12 + #define ra $r1 13 + #define tp $r2 14 + #define sp $r3 15 + #define a0 $r4 16 + #define a1 $r5 17 + #define a2 $r6 18 + #define a3 $r7 19 + #define a4 $r8 20 + #define a5 $r9 21 + #define a6 $r10 22 + #define a7 $r11 23 + #define t0 $r12 24 + #define t1 $r13 25 + #define t2 $r14 26 + #define t3 $r15 27 + #define t4 $r16 28 + #define t5 $r17 29 + #define t6 $r18 30 + #define t7 $r19 31 + #define t8 $r20 32 + #define u0 $r21 33 + #define fp $r22 34 + #define s0 $r23 35 + #define s1 $r24 36 + #define s2 $r25 37 + #define s3 $r26 38 + #define s4 $r27 39 + #define s5 $r28 40 + #define s6 $r29 41 + #define s7 $r30 42 + #define s8 $r31 43 + #endif 44 + 45 + /* 46 + * LoongArch page table entry definition 47 + * Original header file arch/loongarch/include/asm/loongarch.h 48 + */ 49 + #define _PAGE_VALID_SHIFT 0 50 + #define _PAGE_DIRTY_SHIFT 1 51 + #define _PAGE_PLV_SHIFT 2 /* 2~3, two bits */ 52 + #define PLV_KERN 0 53 + #define PLV_USER 3 54 + #define PLV_MASK 0x3 55 + #define _CACHE_SHIFT 4 /* 4~5, two bits */ 56 + #define _PAGE_PRESENT_SHIFT 7 57 + #define _PAGE_WRITE_SHIFT 8 58 + 59 + #define _PAGE_VALID BIT_ULL(_PAGE_VALID_SHIFT) 60 + #define _PAGE_PRESENT BIT_ULL(_PAGE_PRESENT_SHIFT) 61 + #define _PAGE_WRITE BIT_ULL(_PAGE_WRITE_SHIFT) 62 + #define _PAGE_DIRTY BIT_ULL(_PAGE_DIRTY_SHIFT) 63 + #define _PAGE_USER (PLV_USER << _PAGE_PLV_SHIFT) 64 + #define __READABLE (_PAGE_VALID) 65 + #define __WRITEABLE (_PAGE_DIRTY | _PAGE_WRITE) 66 + /* Coherent Cached */ 67 + #define _CACHE_CC BIT_ULL(_CACHE_SHIFT) 68 + #define PS_4K 0x0000000c 69 + #define PS_16K 0x0000000e 70 + #define PS_64K 0x00000010 71 + #define PS_DEFAULT_SIZE PS_16K 72 + 73 + /* LoongArch Basic CSR registers */ 74 + #define LOONGARCH_CSR_CRMD 0x0 /* 
Current mode info */ 75 + #define CSR_CRMD_PG_SHIFT 4 76 + #define CSR_CRMD_PG BIT_ULL(CSR_CRMD_PG_SHIFT) 77 + #define CSR_CRMD_IE_SHIFT 2 78 + #define CSR_CRMD_IE BIT_ULL(CSR_CRMD_IE_SHIFT) 79 + #define CSR_CRMD_PLV_SHIFT 0 80 + #define CSR_CRMD_PLV_WIDTH 2 81 + #define CSR_CRMD_PLV (0x3UL << CSR_CRMD_PLV_SHIFT) 82 + #define PLV_MASK 0x3 83 + #define LOONGARCH_CSR_PRMD 0x1 84 + #define LOONGARCH_CSR_EUEN 0x2 85 + #define LOONGARCH_CSR_ECFG 0x4 86 + #define LOONGARCH_CSR_ESTAT 0x5 /* Exception status */ 87 + #define LOONGARCH_CSR_ERA 0x6 /* ERA */ 88 + #define LOONGARCH_CSR_BADV 0x7 /* Bad virtual address */ 89 + #define LOONGARCH_CSR_EENTRY 0xc 90 + #define LOONGARCH_CSR_TLBIDX 0x10 /* TLB Index, EHINV, PageSize */ 91 + #define CSR_TLBIDX_PS_SHIFT 24 92 + #define CSR_TLBIDX_PS_WIDTH 6 93 + #define CSR_TLBIDX_PS (0x3fUL << CSR_TLBIDX_PS_SHIFT) 94 + #define CSR_TLBIDX_SIZEM 0x3f000000 95 + #define CSR_TLBIDX_SIZE CSR_TLBIDX_PS_SHIFT 96 + #define LOONGARCH_CSR_ASID 0x18 /* ASID */ 97 + #define LOONGARCH_CSR_PGDL 0x19 98 + #define LOONGARCH_CSR_PGDH 0x1a 99 + /* Page table base */ 100 + #define LOONGARCH_CSR_PGD 0x1b 101 + #define LOONGARCH_CSR_PWCTL0 0x1c 102 + #define LOONGARCH_CSR_PWCTL1 0x1d 103 + #define LOONGARCH_CSR_STLBPGSIZE 0x1e 104 + #define LOONGARCH_CSR_CPUID 0x20 105 + #define LOONGARCH_CSR_KS0 0x30 106 + #define LOONGARCH_CSR_KS1 0x31 107 + #define LOONGARCH_CSR_TMID 0x40 108 + #define LOONGARCH_CSR_TCFG 0x41 109 + /* TLB refill exception entry */ 110 + #define LOONGARCH_CSR_TLBRENTRY 0x88 111 + #define LOONGARCH_CSR_TLBRSAVE 0x8b 112 + #define LOONGARCH_CSR_TLBREHI 0x8e 113 + #define CSR_TLBREHI_PS_SHIFT 0 114 + #define CSR_TLBREHI_PS (0x3fUL << CSR_TLBREHI_PS_SHIFT) 115 + 116 + #define EXREGS_GPRS (32) 117 + 118 + #ifndef __ASSEMBLER__ 119 + void handle_tlb_refill(void); 120 + void handle_exception(void); 121 + 122 + struct ex_regs { 123 + unsigned long regs[EXREGS_GPRS]; 124 + unsigned long pc; 125 + unsigned long estat; 126 + unsigned long badv; 127 
+ }; 128 + 129 + #define PC_OFFSET_EXREGS offsetof(struct ex_regs, pc) 130 + #define ESTAT_OFFSET_EXREGS offsetof(struct ex_regs, estat) 131 + #define BADV_OFFSET_EXREGS offsetof(struct ex_regs, badv) 132 + #define EXREGS_SIZE sizeof(struct ex_regs) 133 + 134 + #else 135 + #define PC_OFFSET_EXREGS ((EXREGS_GPRS + 0) * 8) 136 + #define ESTAT_OFFSET_EXREGS ((EXREGS_GPRS + 1) * 8) 137 + #define BADV_OFFSET_EXREGS ((EXREGS_GPRS + 2) * 8) 138 + #define EXREGS_SIZE ((EXREGS_GPRS + 3) * 8) 139 + #endif 140 + 141 + #endif /* SELFTEST_KVM_PROCESSOR_H */

+20

tools/testing/selftests/kvm/include/loongarch/ucall.h

@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UCALL_H
+#define SELFTEST_KVM_UCALL_H
+
+#include "kvm_util.h"
+
+#define UCALL_EXIT_REASON KVM_EXIT_MMIO
+
+/*
+ * ucall_exit_mmio_addr holds per-VM values (global data is duplicated by each
+ * VM), it must not be accessed from host code.
+ */
+extern vm_vaddr_t *ucall_exit_mmio_addr;
+
+static inline void ucall_arch_do_ucall(vm_vaddr_t uc)
+{
+	WRITE_ONCE(*ucall_exit_mmio_addr, uc);
+}
+
+#endif

+20 -3

tools/testing/selftests/kvm/include/riscv/processor.h

@@ -11,6 +11,19 @@
 #include <asm/csr.h>
 #include "kvm_util.h"

+#define INSN_OPCODE_MASK 0x007c
+#define INSN_OPCODE_SHIFT 2
+#define INSN_OPCODE_SYSTEM 28
+
+#define INSN_MASK_FUNCT3 0x7000
+#define INSN_SHIFT_FUNCT3 12
+
+#define INSN_CSR_MASK 0xfff00000
+#define INSN_CSR_SHIFT 20
+
+#define GET_RM(insn) (((insn) & INSN_MASK_FUNCT3) >> INSN_SHIFT_FUNCT3)
+#define GET_CSR_NUM(insn) (((insn) & INSN_CSR_MASK) >> INSN_CSR_SHIFT)
+
 static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t subtype,
 				    uint64_t idx, uint64_t size)
 {
@@ -60,7 +73,8 @@
 	return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
 }

-struct ex_regs {
+struct pt_regs {
+	unsigned long epc;
 	unsigned long ra;
 	unsigned long sp;
 	unsigned long gp;
@@ -92,16 +106,19 @@
 	unsigned long t4;
 	unsigned long t5;
 	unsigned long t6;
-	unsigned long epc;
+	/* Supervisor/Machine CSRs */
 	unsigned long status;
+	unsigned long badaddr;
 	unsigned long cause;
+	/* a0 value before the syscall */
+	unsigned long orig_a0;
 };

 #define NR_VECTORS 2
 #define NR_EXCEPTIONS 32
 #define EC_MASK (NR_EXCEPTIONS - 1)

-typedef void(*exception_handler_fn)(struct ex_regs *);
+typedef void(*exception_handler_fn)(struct pt_regs *);

 void vm_init_vector_tables(struct kvm_vm *vm);
 void vcpu_init_vector_tables(struct kvm_vcpu *vcpu);
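
Note on the new `INSN_*` macros: they pick apart a 32-bit RISC-V instruction word, so a handler can tell whether a trapping instruction is a SYSTEM-opcode CSR access and which CSR it names. A minimal sketch of that decoding is below, assuming the standard base encoding (opcode in bits 6:2 after dropping the two low bits, funct3 in bits 14:12, CSR number in bits 31:20). The macros are copied from the hunk; `insn_is_system()` is a hypothetical helper, and the sample word `0xc0102573` is my own hand encoding of `csrrs a0, time, zero`, not something taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hunk above. */
#define INSN_OPCODE_MASK	0x007c
#define INSN_OPCODE_SHIFT	2
#define INSN_OPCODE_SYSTEM	28

#define INSN_MASK_FUNCT3	0x7000
#define INSN_SHIFT_FUNCT3	12

#define INSN_CSR_MASK		0xfff00000
#define INSN_CSR_SHIFT		20

#define GET_RM(insn)		(((insn) & INSN_MASK_FUNCT3) >> INSN_SHIFT_FUNCT3)
#define GET_CSR_NUM(insn)	(((insn) & INSN_CSR_MASK) >> INSN_CSR_SHIFT)

/* Hypothetical helper: is this a SYSTEM-opcode instruction (CSR ops, ecall, ...)? */
static int insn_is_system(uint32_t insn)
{
	return ((insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT) == INSN_OPCODE_SYSTEM;
}
```

For the sample word, the opcode field decodes to 28 (SYSTEM), `GET_RM()` yields funct3 = 2 (CSRRS) and `GET_CSR_NUM()` yields 0xc01.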

+3

tools/testing/selftests/kvm/lib/kvm_util.c

@@ -222,6 +222,7 @@
 		[VM_MODE_P36V48_4K] = "PA-bits:36, VA-bits:48, 4K pages",
 		[VM_MODE_P36V48_16K] = "PA-bits:36, VA-bits:48, 16K pages",
 		[VM_MODE_P36V48_64K] = "PA-bits:36, VA-bits:48, 64K pages",
+		[VM_MODE_P47V47_16K] = "PA-bits:47, VA-bits:47, 16K pages",
 		[VM_MODE_P36V47_16K] = "PA-bits:36, VA-bits:47, 16K pages",
 	};
 	_Static_assert(sizeof(strings)/sizeof(char *) == NUM_VM_MODES,
@@ -248,6 +249,7 @@
 	[VM_MODE_P36V48_4K] = { 36, 48, 0x1000, 12 },
 	[VM_MODE_P36V48_16K] = { 36, 48, 0x4000, 14 },
 	[VM_MODE_P36V48_64K] = { 36, 48, 0x10000, 16 },
+	[VM_MODE_P47V47_16K] = { 47, 47, 0x4000, 14 },
 	[VM_MODE_P36V47_16K] = { 36, 47, 0x4000, 14 },
 };
 _Static_assert(sizeof(vm_guest_mode_params)/sizeof(struct vm_guest_mode_params) == NUM_VM_MODES,
@@ -319,6 +321,7 @@
 	case VM_MODE_P36V48_16K:
 		vm->pgtable_levels = 4;
 		break;
+	case VM_MODE_P47V47_16K:
 	case VM_MODE_P36V47_16K:
 		vm->pgtable_levels = 3;
 		break;
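
Note on why `VM_MODE_P47V47_16K` joins the 3-level case: with 8-byte page table entries (as `ptes_per_page()` in kvm_util.h implies), a 16K page holds 2^11 entries, so each level resolves 11 bits and 14 + 3 * 11 = 47. The sketch below cross-checks that arithmetic. `levels_needed()` is a hypothetical helper written for this note; the selftest library itself uses the explicit switch in the hunk, not this formula.

```c
#include <assert.h>

/*
 * Hypothetical cross-check: number of page table levels needed to cover
 * va_bits of virtual address space, assuming 8-byte entries so that each
 * level resolves (page_shift - 3) bits.
 */
static unsigned int levels_needed(unsigned int va_bits, unsigned int page_shift)
{
	unsigned int bits_per_level = page_shift - 3;
	unsigned int index_bits = va_bits - page_shift;

	/* Round up: a partially used top level still counts as a level. */
	return (index_bits + bits_per_level - 1) / bits_per_level;
}
```

The same arithmetic gives 4 levels for 48-bit VAs with 16K pages, which agrees with the neighbouring `VM_MODE_P36V48_16K` case.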

+59

tools/testing/selftests/kvm/lib/loongarch/exception.S

@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include "processor.h"
+
+/* address of refill exception should be 4K aligned */
+.balign	4096
+.global handle_tlb_refill
+handle_tlb_refill:
+	csrwr	t0, LOONGARCH_CSR_TLBRSAVE
+	csrrd	t0, LOONGARCH_CSR_PGD
+	lddir	t0, t0, 3
+	lddir	t0, t0, 1
+	ldpte	t0, 0
+	ldpte	t0, 1
+	tlbfill
+	csrrd	t0, LOONGARCH_CSR_TLBRSAVE
+	ertn
+
+	/*
+	 * save and restore all gprs except base register,
+	 * and default value of base register is sp ($r3).
+	 */
+.macro save_gprs base
+	.irp n,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+	st.d	$r\n, \base, 8 * \n
+	.endr
+.endm
+
+.macro restore_gprs base
+	.irp n,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+	ld.d	$r\n, \base, 8 * \n
+	.endr
+.endm
+
+/* address of general exception should be 4K aligned */
+.balign	4096
+.global handle_exception
+handle_exception:
+	csrwr	sp, LOONGARCH_CSR_KS0
+	csrrd	sp, LOONGARCH_CSR_KS1
+	addi.d	sp, sp, -EXREGS_SIZE
+
+	save_gprs sp
+	/* save sp register to stack */
+	csrrd	t0, LOONGARCH_CSR_KS0
+	st.d	t0, sp, 3 * 8
+
+	csrrd	t0, LOONGARCH_CSR_ERA
+	st.d	t0, sp, PC_OFFSET_EXREGS
+	csrrd	t0, LOONGARCH_CSR_ESTAT
+	st.d	t0, sp, ESTAT_OFFSET_EXREGS
+	csrrd	t0, LOONGARCH_CSR_BADV
+	st.d	t0, sp, BADV_OFFSET_EXREGS
+
+	or	a0, sp, zero
+	bl	route_exception
+	restore_gprs sp
+	csrrd	sp, LOONGARCH_CSR_KS0
+	ertn

+346

tools/testing/selftests/kvm/lib/loongarch/processor.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <assert.h> 4 + #include <linux/compiler.h> 5 + 6 + #include "kvm_util.h" 7 + #include "processor.h" 8 + #include "ucall_common.h" 9 + 10 + #define LOONGARCH_PAGE_TABLE_PHYS_MIN 0x200000 11 + #define LOONGARCH_GUEST_STACK_VADDR_MIN 0x200000 12 + 13 + static vm_paddr_t invalid_pgtable[4]; 14 + 15 + static uint64_t virt_pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level) 16 + { 17 + unsigned int shift; 18 + uint64_t mask; 19 + 20 + shift = level * (vm->page_shift - 3) + vm->page_shift; 21 + mask = (1UL << (vm->page_shift - 3)) - 1; 22 + return (gva >> shift) & mask; 23 + } 24 + 25 + static uint64_t pte_addr(struct kvm_vm *vm, uint64_t entry) 26 + { 27 + return entry & ~((0x1UL << vm->page_shift) - 1); 28 + } 29 + 30 + static uint64_t ptrs_per_pte(struct kvm_vm *vm) 31 + { 32 + return 1 << (vm->page_shift - 3); 33 + } 34 + 35 + static void virt_set_pgtable(struct kvm_vm *vm, vm_paddr_t table, vm_paddr_t child) 36 + { 37 + uint64_t *ptep; 38 + int i, ptrs_per_pte; 39 + 40 + ptep = addr_gpa2hva(vm, table); 41 + ptrs_per_pte = 1 << (vm->page_shift - 3); 42 + for (i = 0; i < ptrs_per_pte; i++) 43 + WRITE_ONCE(*(ptep + i), child); 44 + } 45 + 46 + void virt_arch_pgd_alloc(struct kvm_vm *vm) 47 + { 48 + int i; 49 + vm_paddr_t child, table; 50 + 51 + if (vm->pgd_created) 52 + return; 53 + 54 + child = table = 0; 55 + for (i = 0; i < vm->pgtable_levels; i++) { 56 + invalid_pgtable[i] = child; 57 + table = vm_phy_page_alloc(vm, LOONGARCH_PAGE_TABLE_PHYS_MIN, 58 + vm->memslots[MEM_REGION_PT]); 59 + TEST_ASSERT(table, "Fail to allocate page tale at level %d\n", i); 60 + virt_set_pgtable(vm, table, child); 61 + child = table; 62 + } 63 + vm->pgd = table; 64 + vm->pgd_created = true; 65 + } 66 + 67 + static int virt_pte_none(uint64_t *ptep, int level) 68 + { 69 + return *ptep == invalid_pgtable[level]; 70 + } 71 + 72 + static uint64_t *virt_populate_pte(struct kvm_vm *vm, vm_vaddr_t gva, int alloc) 73 + { 74 + int 
level; 75 + uint64_t *ptep; 76 + vm_paddr_t child; 77 + 78 + if (!vm->pgd_created) 79 + goto unmapped_gva; 80 + 81 + child = vm->pgd; 82 + level = vm->pgtable_levels - 1; 83 + while (level > 0) { 84 + ptep = addr_gpa2hva(vm, child) + virt_pte_index(vm, gva, level) * 8; 85 + if (virt_pte_none(ptep, level)) { 86 + if (alloc) { 87 + child = vm_alloc_page_table(vm); 88 + virt_set_pgtable(vm, child, invalid_pgtable[level - 1]); 89 + WRITE_ONCE(*ptep, child); 90 + } else 91 + goto unmapped_gva; 92 + 93 + } else 94 + child = pte_addr(vm, *ptep); 95 + level--; 96 + } 97 + 98 + ptep = addr_gpa2hva(vm, child) + virt_pte_index(vm, gva, level) * 8; 99 + return ptep; 100 + 101 + unmapped_gva: 102 + TEST_FAIL("No mapping for vm virtual address, gva: 0x%lx", gva); 103 + exit(EXIT_FAILURE); 104 + } 105 + 106 + vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) 107 + { 108 + uint64_t *ptep; 109 + 110 + ptep = virt_populate_pte(vm, gva, 0); 111 + TEST_ASSERT(*ptep != 0, "Virtual address vaddr: 0x%lx not mapped\n", gva); 112 + 113 + return pte_addr(vm, *ptep) + (gva & (vm->page_size - 1)); 114 + } 115 + 116 + void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr) 117 + { 118 + uint32_t prot_bits; 119 + uint64_t *ptep; 120 + 121 + TEST_ASSERT((vaddr % vm->page_size) == 0, 122 + "Virtual address not on page boundary,\n" 123 + "vaddr: 0x%lx vm->page_size: 0x%x", vaddr, vm->page_size); 124 + TEST_ASSERT(sparsebit_is_set(vm->vpages_valid, 125 + (vaddr >> vm->page_shift)), 126 + "Invalid virtual address, vaddr: 0x%lx", vaddr); 127 + TEST_ASSERT((paddr % vm->page_size) == 0, 128 + "Physical address not on page boundary,\n" 129 + "paddr: 0x%lx vm->page_size: 0x%x", paddr, vm->page_size); 130 + TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn, 131 + "Physical address beyond maximum supported,\n" 132 + "paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x", 133 + paddr, vm->max_gfn, vm->page_size); 134 + 135 + ptep = virt_populate_pte(vm, vaddr, 1); 136 + 
prot_bits = _PAGE_PRESENT | __READABLE | __WRITEABLE | _CACHE_CC | _PAGE_USER; 137 + WRITE_ONCE(*ptep, paddr | prot_bits); 138 + } 139 + 140 + static void pte_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent, uint64_t page, int level) 141 + { 142 + uint64_t pte, *ptep; 143 + static const char * const type[] = { "pte", "pmd", "pud", "pgd"}; 144 + 145 + if (level < 0) 146 + return; 147 + 148 + for (pte = page; pte < page + ptrs_per_pte(vm) * 8; pte += 8) { 149 + ptep = addr_gpa2hva(vm, pte); 150 + if (virt_pte_none(ptep, level)) 151 + continue; 152 + fprintf(stream, "%*s%s: %lx: %lx at %p\n", 153 + indent, "", type[level], pte, *ptep, ptep); 154 + pte_dump(stream, vm, indent + 1, pte_addr(vm, *ptep), level--); 155 + } 156 + } 157 + 158 + void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent) 159 + { 160 + int level; 161 + 162 + if (!vm->pgd_created) 163 + return; 164 + 165 + level = vm->pgtable_levels - 1; 166 + pte_dump(stream, vm, indent, vm->pgd, level); 167 + } 168 + 169 + void vcpu_arch_dump(FILE *stream, struct kvm_vcpu *vcpu, uint8_t indent) 170 + { 171 + } 172 + 173 + void assert_on_unhandled_exception(struct kvm_vcpu *vcpu) 174 + { 175 + struct ucall uc; 176 + 177 + if (get_ucall(vcpu, &uc) != UCALL_UNHANDLED) 178 + return; 179 + 180 + TEST_FAIL("Unexpected exception (pc:0x%lx, estat:0x%lx, badv:0x%lx)", 181 + uc.args[0], uc.args[1], uc.args[2]); 182 + } 183 + 184 + void route_exception(struct ex_regs *regs) 185 + { 186 + unsigned long pc, estat, badv; 187 + 188 + pc = regs->pc; 189 + badv = regs->badv; 190 + estat = regs->estat; 191 + ucall(UCALL_UNHANDLED, 3, pc, estat, badv); 192 + while (1) ; 193 + } 194 + 195 + void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...) 
196 + { 197 + int i; 198 + va_list ap; 199 + struct kvm_regs regs; 200 + 201 + TEST_ASSERT(num >= 1 && num <= 8, "Unsupported number of args,\n" 202 + "num: %u\n", num); 203 + 204 + vcpu_regs_get(vcpu, &regs); 205 + 206 + va_start(ap, num); 207 + for (i = 0; i < num; i++) 208 + regs.gpr[i + 4] = va_arg(ap, uint64_t); 209 + va_end(ap); 210 + 211 + vcpu_regs_set(vcpu, &regs); 212 + } 213 + 214 + static void loongarch_get_csr(struct kvm_vcpu *vcpu, uint64_t id, void *addr) 215 + { 216 + uint64_t csrid; 217 + 218 + csrid = KVM_REG_LOONGARCH_CSR | KVM_REG_SIZE_U64 | 8 * id; 219 + __vcpu_get_reg(vcpu, csrid, addr); 220 + } 221 + 222 + static void loongarch_set_csr(struct kvm_vcpu *vcpu, uint64_t id, uint64_t val) 223 + { 224 + uint64_t csrid; 225 + 226 + csrid = KVM_REG_LOONGARCH_CSR | KVM_REG_SIZE_U64 | 8 * id; 227 + __vcpu_set_reg(vcpu, csrid, val); 228 + } 229 + 230 + static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu) 231 + { 232 + int width; 233 + unsigned long val; 234 + struct kvm_vm *vm = vcpu->vm; 235 + 236 + switch (vm->mode) { 237 + case VM_MODE_P36V47_16K: 238 + case VM_MODE_P47V47_16K: 239 + break; 240 + 241 + default: 242 + TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode); 243 + } 244 + 245 + /* user mode and page enable mode */ 246 + val = PLV_USER | CSR_CRMD_PG; 247 + loongarch_set_csr(vcpu, LOONGARCH_CSR_CRMD, val); 248 + loongarch_set_csr(vcpu, LOONGARCH_CSR_PRMD, val); 249 + loongarch_set_csr(vcpu, LOONGARCH_CSR_EUEN, 1); 250 + loongarch_set_csr(vcpu, LOONGARCH_CSR_ECFG, 0); 251 + loongarch_set_csr(vcpu, LOONGARCH_CSR_TCFG, 0); 252 + loongarch_set_csr(vcpu, LOONGARCH_CSR_ASID, 1); 253 + 254 + val = 0; 255 + width = vm->page_shift - 3; 256 + 257 + switch (vm->pgtable_levels) { 258 + case 4: 259 + /* pud page shift and width */ 260 + val = (vm->page_shift + width * 2) << 20 | (width << 25); 261 + /* fall throuth */ 262 + case 3: 263 + /* pmd page shift and width */ 264 + val |= (vm->page_shift + width) << 10 | (width << 15); 265 + /* pte page 
shift and width */ 266 + val |= vm->page_shift | width << 5; 267 + break; 268 + default: 269 + TEST_FAIL("Got %u page table levels, expected 3 or 4", vm->pgtable_levels); 270 + } 271 + 272 + loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL0, val); 273 + 274 + /* PGD page shift and width */ 275 + val = (vm->page_shift + width * (vm->pgtable_levels - 1)) | width << 6; 276 + loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL1, val); 277 + loongarch_set_csr(vcpu, LOONGARCH_CSR_PGDL, vm->pgd); 278 + 279 + /* 280 + * Refill exception runs on real mode 281 + * Entry address should be physical address 282 + */ 283 + val = addr_gva2gpa(vm, (unsigned long)handle_tlb_refill); 284 + loongarch_set_csr(vcpu, LOONGARCH_CSR_TLBRENTRY, val); 285 + 286 + /* 287 + * General exception runs on page-enabled mode 288 + * Entry address should be virtual address 289 + */ 290 + val = (unsigned long)handle_exception; 291 + loongarch_set_csr(vcpu, LOONGARCH_CSR_EENTRY, val); 292 + 293 + loongarch_get_csr(vcpu, LOONGARCH_CSR_TLBIDX, &val); 294 + val &= ~CSR_TLBIDX_SIZEM; 295 + val |= PS_DEFAULT_SIZE << CSR_TLBIDX_SIZE; 296 + loongarch_set_csr(vcpu, LOONGARCH_CSR_TLBIDX, val); 297 + 298 + loongarch_set_csr(vcpu, LOONGARCH_CSR_STLBPGSIZE, PS_DEFAULT_SIZE); 299 + 300 + /* LOONGARCH_CSR_KS1 is used for exception stack */ 301 + val = __vm_vaddr_alloc(vm, vm->page_size, 302 + LOONGARCH_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA); 303 + TEST_ASSERT(val != 0, "No memory for exception stack"); 304 + val = val + vm->page_size; 305 + loongarch_set_csr(vcpu, LOONGARCH_CSR_KS1, val); 306 + 307 + loongarch_get_csr(vcpu, LOONGARCH_CSR_TLBREHI, &val); 308 + val &= ~CSR_TLBREHI_PS; 309 + val |= PS_DEFAULT_SIZE << CSR_TLBREHI_PS_SHIFT; 310 + loongarch_set_csr(vcpu, LOONGARCH_CSR_TLBREHI, val); 311 + 312 + loongarch_set_csr(vcpu, LOONGARCH_CSR_CPUID, vcpu->id); 313 + loongarch_set_csr(vcpu, LOONGARCH_CSR_TMID, vcpu->id); 314 + } 315 + 316 + struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id) 317 + { 318 + 
size_t stack_size; 319 + uint64_t stack_vaddr; 320 + struct kvm_regs regs; 321 + struct kvm_vcpu *vcpu; 322 + 323 + vcpu = __vm_vcpu_add(vm, vcpu_id); 324 + stack_size = vm->page_size; 325 + stack_vaddr = __vm_vaddr_alloc(vm, stack_size, 326 + LOONGARCH_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA); 327 + TEST_ASSERT(stack_vaddr != 0, "No memory for vm stack"); 328 + 329 + loongarch_vcpu_setup(vcpu); 330 + /* Setup guest general purpose registers */ 331 + vcpu_regs_get(vcpu, &regs); 332 + regs.gpr[3] = stack_vaddr + stack_size; 333 + vcpu_regs_set(vcpu, &regs); 334 + 335 + return vcpu; 336 + } 337 + 338 + void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code) 339 + { 340 + struct kvm_regs regs; 341 + 342 + /* Setup guest PC register */ 343 + vcpu_regs_get(vcpu, &regs); 344 + regs.pc = (uint64_t)guest_code; 345 + vcpu_regs_set(vcpu, &regs); 346 + }
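Note on the index arithmetic in virt_pte_index(): it can be checked in isolation. The following is a minimal standalone sketch, not part of the patch. `pte_index` and `va_bits` are hypothetical helpers that take page_shift directly instead of a struct kvm_vm, and the values used assume the 16K-page modes from the table above (page_shift 14, three levels).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of virt_pte_index(): each table holds
 * 2^(page_shift - 3) eight-byte entries, so every level consumes
 * (page_shift - 3) bits of the virtual address above the page offset.
 */
static uint64_t pte_index(unsigned int page_shift, uint64_t gva, int level)
{
	unsigned int bits = page_shift - 3;
	unsigned int shift;

	assert(level >= 0);
	shift = level * bits + page_shift;
	return (gva >> shift) & ((1ULL << bits) - 1);
}

/* Number of virtual address bits covered by `levels` levels of tables. */
static unsigned int va_bits(unsigned int page_shift, int levels)
{
	return page_shift + levels * (page_shift - 3);
}
```

With 16K pages each level resolves 11 bits, so three levels cover 14 + 3 * 11 = 47 bits, which is consistent with pgtable_levels = 3 for both 47-bit VA modes.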

+38

tools/testing/selftests/kvm/lib/loongarch/ucall.c

···
1 + // SPDX-License-Identifier: GPL-2.0
2 + /*
3 +  * ucall support. A ucall is a "hypercall to userspace".
4 +  *
5 +  */
6 + #include "kvm_util.h"
7 +
8 + /*
9 +  * ucall_exit_mmio_addr holds per-VM values (global data is duplicated by each
10 +  * VM), it must not be accessed from host code.
11 +  */
12 + vm_vaddr_t *ucall_exit_mmio_addr;
13 +
14 + void ucall_arch_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa)
15 + {
16 +     vm_vaddr_t mmio_gva = vm_vaddr_unused_gap(vm, vm->page_size, KVM_UTIL_MIN_VADDR);
17 +
18 +     virt_map(vm, mmio_gva, mmio_gpa, 1);
19 +
20 +     vm->ucall_mmio_addr = mmio_gpa;
21 +
22 +     write_guest_global(vm, ucall_exit_mmio_addr, (vm_vaddr_t *)mmio_gva);
23 + }
24 +
25 + void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
26 + {
27 +     struct kvm_run *run = vcpu->run;
28 +
29 +     if (run->exit_reason == KVM_EXIT_MMIO &&
30 +         run->mmio.phys_addr == vcpu->vm->ucall_mmio_addr) {
31 +         TEST_ASSERT(run->mmio.is_write && run->mmio.len == sizeof(uint64_t),
32 +                     "Unexpected ucall exit mmio address access");
33 +
34 +         return (void *)(*((uint64_t *)run->mmio.data));
35 +     }
36 +
37 +     return NULL;
38 + }

+71 -68

tools/testing/selftests/kvm/lib/riscv/handlers.S

··· 10 10 #include <asm/csr.h> 11 11 12 12 .macro save_context 13 - addi sp, sp, (-8*34) 14 - sd x1, 0(sp) 15 - sd x2, 8(sp) 16 - sd x3, 16(sp) 17 - sd x4, 24(sp) 18 - sd x5, 32(sp) 19 - sd x6, 40(sp) 20 - sd x7, 48(sp) 21 - sd x8, 56(sp) 22 - sd x9, 64(sp) 23 - sd x10, 72(sp) 24 - sd x11, 80(sp) 25 - sd x12, 88(sp) 26 - sd x13, 96(sp) 27 - sd x14, 104(sp) 28 - sd x15, 112(sp) 29 - sd x16, 120(sp) 30 - sd x17, 128(sp) 31 - sd x18, 136(sp) 32 - sd x19, 144(sp) 33 - sd x20, 152(sp) 34 - sd x21, 160(sp) 35 - sd x22, 168(sp) 36 - sd x23, 176(sp) 37 - sd x24, 184(sp) 38 - sd x25, 192(sp) 39 - sd x26, 200(sp) 40 - sd x27, 208(sp) 41 - sd x28, 216(sp) 42 - sd x29, 224(sp) 43 - sd x30, 232(sp) 44 - sd x31, 240(sp) 13 + addi sp, sp, (-8*36) 14 + sd x1, 8(sp) 15 + sd x2, 16(sp) 16 + sd x3, 24(sp) 17 + sd x4, 32(sp) 18 + sd x5, 40(sp) 19 + sd x6, 48(sp) 20 + sd x7, 56(sp) 21 + sd x8, 64(sp) 22 + sd x9, 72(sp) 23 + sd x10, 80(sp) 24 + sd x11, 88(sp) 25 + sd x12, 96(sp) 26 + sd x13, 104(sp) 27 + sd x14, 112(sp) 28 + sd x15, 120(sp) 29 + sd x16, 128(sp) 30 + sd x17, 136(sp) 31 + sd x18, 144(sp) 32 + sd x19, 152(sp) 33 + sd x20, 160(sp) 34 + sd x21, 168(sp) 35 + sd x22, 176(sp) 36 + sd x23, 184(sp) 37 + sd x24, 192(sp) 38 + sd x25, 200(sp) 39 + sd x26, 208(sp) 40 + sd x27, 216(sp) 41 + sd x28, 224(sp) 42 + sd x29, 232(sp) 43 + sd x30, 240(sp) 44 + sd x31, 248(sp) 45 45 csrr s0, CSR_SEPC 46 46 csrr s1, CSR_SSTATUS 47 - csrr s2, CSR_SCAUSE 48 - sd s0, 248(sp) 47 + csrr s2, CSR_STVAL 48 + csrr s3, CSR_SCAUSE 49 + sd s0, 0(sp) 49 50 sd s1, 256(sp) 50 51 sd s2, 264(sp) 52 + sd s3, 272(sp) 51 53 .endm 52 54 53 55 .macro restore_context 56 + ld s3, 272(sp) 54 57 ld s2, 264(sp) 55 58 ld s1, 256(sp) 56 - ld s0, 248(sp) 57 - csrw CSR_SCAUSE, s2 59 + ld s0, 0(sp) 60 + csrw CSR_SCAUSE, s3 58 61 csrw CSR_SSTATUS, s1 59 62 csrw CSR_SEPC, s0 60 - ld x31, 240(sp) 61 - ld x30, 232(sp) 62 - ld x29, 224(sp) 63 - ld x28, 216(sp) 64 - ld x27, 208(sp) 65 - ld x26, 200(sp) 66 - ld x25, 192(sp) 67 - ld 
x24, 184(sp) 68 - ld x23, 176(sp) 69 - ld x22, 168(sp) 70 - ld x21, 160(sp) 71 - ld x20, 152(sp) 72 - ld x19, 144(sp) 73 - ld x18, 136(sp) 74 - ld x17, 128(sp) 75 - ld x16, 120(sp) 76 - ld x15, 112(sp) 77 - ld x14, 104(sp) 78 - ld x13, 96(sp) 79 - ld x12, 88(sp) 80 - ld x11, 80(sp) 81 - ld x10, 72(sp) 82 - ld x9, 64(sp) 83 - ld x8, 56(sp) 84 - ld x7, 48(sp) 85 - ld x6, 40(sp) 86 - ld x5, 32(sp) 87 - ld x4, 24(sp) 88 - ld x3, 16(sp) 89 - ld x2, 8(sp) 90 - ld x1, 0(sp) 91 - addi sp, sp, (8*34) 63 + ld x31, 248(sp) 64 + ld x30, 240(sp) 65 + ld x29, 232(sp) 66 + ld x28, 224(sp) 67 + ld x27, 216(sp) 68 + ld x26, 208(sp) 69 + ld x25, 200(sp) 70 + ld x24, 192(sp) 71 + ld x23, 184(sp) 72 + ld x22, 176(sp) 73 + ld x21, 168(sp) 74 + ld x20, 160(sp) 75 + ld x19, 152(sp) 76 + ld x18, 144(sp) 77 + ld x17, 136(sp) 78 + ld x16, 128(sp) 79 + ld x15, 120(sp) 80 + ld x14, 112(sp) 81 + ld x13, 104(sp) 82 + ld x12, 96(sp) 83 + ld x11, 88(sp) 84 + ld x10, 80(sp) 85 + ld x9, 72(sp) 86 + ld x8, 64(sp) 87 + ld x7, 56(sp) 88 + ld x6, 48(sp) 89 + ld x5, 40(sp) 90 + ld x4, 32(sp) 91 + ld x3, 24(sp) 92 + ld x2, 16(sp) 93 + ld x1, 8(sp) 94 + addi sp, sp, (8*36) 92 95 .endm 93 96 94 97 .balign 4
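Note on the new save_context layout: the offsets used by the macros can be written down as a C structure. This is a sketch for illustration only. `struct trap_frame` and `gpr_offset` are hypothetical names, not the structure the selftests actually use, and the last slot is shown as padding only because the macros above never touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the stack frame built by save_context. */
struct trap_frame {
	uint64_t epc;		/* 0(sp), from CSR_SEPC */
	uint64_t x[31];		/* x1..x31 at 8(sp)..248(sp) */
	uint64_t status;	/* 256(sp), from CSR_SSTATUS */
	uint64_t badaddr;	/* 264(sp), from CSR_STVAL */
	uint64_t cause;		/* 272(sp), from CSR_SCAUSE */
	uint64_t unused;	/* 280(sp), not written by the macros */
};

/* The frame must match "addi sp, sp, (-8*36)". */
static_assert(sizeof(struct trap_frame) == 8 * 36, "frame size mismatch");

/* Byte offset of general-purpose register xN (1 <= n <= 31). */
static size_t gpr_offset(int n)
{
	return offsetof(struct trap_frame, x) + (size_t)(n - 1) * 8;
}
```

Compared with the old layout, every xN slot moves up by 8 bytes because epc now sits at offset 0, and the frame grows from 34 to 36 slots to hold the saved STVAL value.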

+1 -1

tools/testing/selftests/kvm/lib/riscv/processor.c

···
402 402     exception_handler_fn exception_handlers[NR_VECTORS][NR_EXCEPTIONS];
403 403 };
404 404
405 - void route_exception(struct ex_regs *regs)
405 + void route_exception(struct pt_regs *regs)
406 406 {
407 407     struct handlers *handlers = (struct handlers *)exception_handlers;
408 408     int vector = 0, ec;

+1 -1

tools/testing/selftests/kvm/riscv/arch_timer.c

···
15 15
16 16 static int timer_irq = IRQ_S_TIMER;
17 17
18 - static void guest_irq_handler(struct ex_regs *regs)
18 + static void guest_irq_handler(struct pt_regs *regs)
19 19 {
20 20     uint64_t xcnt, xcnt_diff_us, cmp;
21 21     unsigned int intid = regs->cause & ~CAUSE_IRQ_FLAG;

+1 -1

tools/testing/selftests/kvm/riscv/ebreak_test.c

···
27 27     GUEST_DONE();
28 28 }
29 29
30 - static void guest_breakpoint_handler(struct ex_regs *regs)
30 + static void guest_breakpoint_handler(struct pt_regs *regs)
31 31 {
32 32     WRITE_ONCE(sw_bp_addr, regs->epc);
33 33     regs->epc += 4;

+132

tools/testing/selftests/kvm/riscv/get-reg-list.c

··· 17 17 VCPU_FEATURE_SBI_EXT, 18 18 }; 19 19 20 + enum { 21 + KVM_RISC_V_REG_OFFSET_VSTART = 0, 22 + KVM_RISC_V_REG_OFFSET_VL, 23 + KVM_RISC_V_REG_OFFSET_VTYPE, 24 + KVM_RISC_V_REG_OFFSET_VCSR, 25 + KVM_RISC_V_REG_OFFSET_VLENB, 26 + KVM_RISC_V_REG_OFFSET_MAX, 27 + }; 28 + 20 29 static bool isa_ext_cant_disable[KVM_RISCV_ISA_EXT_MAX]; 21 30 22 31 bool filter_reg(__u64 reg) ··· 152 143 return err == EINVAL; 153 144 } 154 145 146 + static int override_vector_reg_size(struct kvm_vcpu *vcpu, struct vcpu_reg_sublist *s, 147 + uint64_t feature) 148 + { 149 + unsigned long vlenb_reg = 0; 150 + int rc; 151 + u64 reg, size; 152 + 153 + /* Enable V extension so that we can get the vlenb register */ 154 + rc = __vcpu_set_reg(vcpu, feature, 1); 155 + if (rc) 156 + return rc; 157 + 158 + vlenb_reg = vcpu_get_reg(vcpu, s->regs[KVM_RISC_V_REG_OFFSET_VLENB]); 159 + if (!vlenb_reg) { 160 + TEST_FAIL("Can't compute vector register size from zero vlenb\n"); 161 + return -EPERM; 162 + } 163 + 164 + size = __builtin_ctzl(vlenb_reg); 165 + size <<= KVM_REG_SIZE_SHIFT; 166 + 167 + for (int i = 0; i < 32; i++) { 168 + reg = KVM_REG_RISCV | KVM_REG_RISCV_VECTOR | size | KVM_REG_RISCV_VECTOR_REG(i); 169 + s->regs[KVM_RISC_V_REG_OFFSET_MAX + i] = reg; 170 + } 171 + 172 + /* We should assert if disabling failed here while enabling succeeded before */ 173 + vcpu_set_reg(vcpu, feature, 0); 174 + 175 + return 0; 176 + } 177 + 155 178 void finalize_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reg_list *c) 156 179 { 157 180 unsigned long isa_ext_state[KVM_RISCV_ISA_EXT_MAX] = { 0 }; ··· 213 172 if (!s->feature) 214 173 continue; 215 174 175 + if (s->feature == KVM_RISCV_ISA_EXT_V) { 176 + feature = RISCV_ISA_EXT_REG(s->feature); 177 + rc = override_vector_reg_size(vcpu, s, feature); 178 + if (rc) 179 + goto skip; 180 + } 181 + 216 182 switch (s->feature_type) { 217 183 case VCPU_FEATURE_ISA_EXT: 218 184 feature = RISCV_ISA_EXT_REG(s->feature); ··· 234 186 /* Try to enable the desired extension */ 235 
187 __vcpu_set_reg(vcpu, feature, 1); 236 188 189 + skip: 237 190 /* Double check whether the desired extension was enabled */ 238 191 __TEST_REQUIRE(__vcpu_has_ext(vcpu, feature), 239 192 "%s not available, skipping tests", s->name); ··· 454 405 return strdup_printf("KVM_REG_RISCV_FP_D_REG(f[%lld])", reg_off); 455 406 case KVM_REG_RISCV_FP_D_REG(fcsr): 456 407 return "KVM_REG_RISCV_FP_D_REG(fcsr)"; 408 + } 409 + 410 + return strdup_printf("%lld /* UNKNOWN */", reg_off); 411 + } 412 + 413 + static const char *vector_id_to_str(const char *prefix, __u64 id) 414 + { 415 + /* reg_off is the offset into struct __riscv_v_ext_state */ 416 + __u64 reg_off = id & ~(REG_MASK | KVM_REG_RISCV_VECTOR); 417 + int reg_index = 0; 418 + 419 + assert((id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_VECTOR); 420 + 421 + if (reg_off >= KVM_REG_RISCV_VECTOR_REG(0)) 422 + reg_index = reg_off - KVM_REG_RISCV_VECTOR_REG(0); 423 + switch (reg_off) { 424 + case KVM_REG_RISCV_VECTOR_REG(0) ... 425 + KVM_REG_RISCV_VECTOR_REG(31): 426 + return strdup_printf("KVM_REG_RISCV_VECTOR_REG(%d)", reg_index); 427 + case KVM_REG_RISCV_VECTOR_CSR_REG(vstart): 428 + return "KVM_REG_RISCV_VECTOR_CSR_REG(vstart)"; 429 + case KVM_REG_RISCV_VECTOR_CSR_REG(vl): 430 + return "KVM_REG_RISCV_VECTOR_CSR_REG(vl)"; 431 + case KVM_REG_RISCV_VECTOR_CSR_REG(vtype): 432 + return "KVM_REG_RISCV_VECTOR_CSR_REG(vtype)"; 433 + case KVM_REG_RISCV_VECTOR_CSR_REG(vcsr): 434 + return "KVM_REG_RISCV_VECTOR_CSR_REG(vcsr)"; 435 + case KVM_REG_RISCV_VECTOR_CSR_REG(vlenb): 436 + return "KVM_REG_RISCV_VECTOR_CSR_REG(vlenb)"; 457 437 } 458 438 459 439 return strdup_printf("%lld /* UNKNOWN */", reg_off); ··· 717 639 case KVM_REG_SIZE_U128: 718 640 reg_size = "KVM_REG_SIZE_U128"; 719 641 break; 642 + case KVM_REG_SIZE_U256: 643 + reg_size = "KVM_REG_SIZE_U256"; 644 + break; 720 645 default: 721 646 printf("\tKVM_REG_RISCV | (%lld << KVM_REG_SIZE_SHIFT) | 0x%llx /* UNKNOWN */,\n", 722 647 (id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT, id 
& ~REG_MASK); ··· 750 669 case KVM_REG_RISCV_FP_D: 751 670 printf("\tKVM_REG_RISCV | %s | KVM_REG_RISCV_FP_D | %s,\n", 752 671 reg_size, fp_d_id_to_str(prefix, id)); 672 + break; 673 + case KVM_REG_RISCV_VECTOR: 674 + printf("\tKVM_REG_RISCV | %s | KVM_REG_RISCV_VECTOR | %s,\n", 675 + reg_size, vector_id_to_str(prefix, id)); 753 676 break; 754 677 case KVM_REG_RISCV_ISA_EXT: 755 678 printf("\tKVM_REG_RISCV | %s | KVM_REG_RISCV_ISA_EXT | %s,\n", ··· 959 874 KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_D, 960 875 }; 961 876 877 + /* Define a default vector registers with length. This will be overwritten at runtime */ 878 + static __u64 vector_regs[] = { 879 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_CSR_REG(vstart), 880 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_CSR_REG(vl), 881 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_CSR_REG(vtype), 882 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_CSR_REG(vcsr), 883 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_CSR_REG(vlenb), 884 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(0), 885 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(1), 886 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(2), 887 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(3), 888 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(4), 889 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(5), 890 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(6), 891 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(7), 892 + 
KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(8), 893 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(9), 894 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(10), 895 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(11), 896 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(12), 897 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(13), 898 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(14), 899 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(15), 900 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(16), 901 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(17), 902 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(18), 903 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(19), 904 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(20), 905 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(21), 906 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(22), 907 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(23), 908 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(24), 909 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(25), 910 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(26), 911 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(27), 912 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(28), 913 + 
KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(29), 914 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(30), 915 + KVM_REG_RISCV | KVM_REG_SIZE_U128 | KVM_REG_RISCV_VECTOR | KVM_REG_RISCV_VECTOR_REG(31), 916 + KVM_REG_RISCV | KVM_REG_SIZE_ULONG | KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V, 917 + }; 918 + 962 919 #define SUBLIST_BASE \ 963 920 {"base", .regs = base_regs, .regs_n = ARRAY_SIZE(base_regs), \ 964 921 .skips_set = base_skips_set, .skips_set_n = ARRAY_SIZE(base_skips_set),} ··· 1024 897 #define SUBLIST_FP_D \ 1025 898 {"fp_d", .feature = KVM_RISCV_ISA_EXT_D, .regs = fp_d_regs, \ 1026 899 .regs_n = ARRAY_SIZE(fp_d_regs),} 900 + 901 + #define SUBLIST_V \ 902 + {"v", .feature = KVM_RISCV_ISA_EXT_V, .regs = vector_regs, .regs_n = ARRAY_SIZE(vector_regs),} 1027 903 1028 904 #define KVM_ISA_EXT_SIMPLE_CONFIG(ext, extu) \ 1029 905 static __u64 regs_##ext[] = { \ ··· 1096 966 KVM_ISA_EXT_SUBLIST_CONFIG(aia, AIA); 1097 967 KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F); 1098 968 KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D); 969 + KVM_ISA_EXT_SUBLIST_CONFIG(v, V); 1099 970 KVM_ISA_EXT_SIMPLE_CONFIG(h, H); 1100 971 KVM_ISA_EXT_SIMPLE_CONFIG(smnpm, SMNPM); 1101 972 KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN); ··· 1171 1040 &config_fp_f, 1172 1041 &config_fp_d, 1173 1042 &config_h, 1043 + &config_v, 1174 1044 &config_smnpm, 1175 1045 &config_smstateen, 1176 1046 &config_sscofpmf,
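Note on override_vector_reg_size(): the size field of a KVM register ID stores log2 of the register width in bytes, which is why the patch derives it from vlenb with a count-trailing-zeros operation. A minimal sketch of that step follows. `vector_reg_size_field` is a hypothetical helper, the constants are local copies of the KVM UAPI values rather than the real header, and `__builtin_ctzl` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of KVM_REG_SIZE_SHIFT / KVM_REG_SIZE_U128 / KVM_REG_SIZE_U256. */
#define REG_SIZE_SHIFT	52
#define REG_SIZE_U128	0x0040000000000000ULL
#define REG_SIZE_U256	0x0050000000000000ULL

/*
 * Hypothetical helper mirroring the size computation in the patch:
 * vlenb is the vector register length in bytes and is a power of two,
 * so its trailing-zero count equals log2(vlenb).
 */
static uint64_t vector_reg_size_field(unsigned long vlenb)
{
	assert(vlenb != 0);	/* __builtin_ctzl(0) is undefined */
	return (uint64_t)__builtin_ctzl(vlenb) << REG_SIZE_SHIFT;
}
```

A vlenb of 16 therefore reproduces the KVM_REG_SIZE_U128 default in vector_regs[], while a vlenb of 32 yields KVM_REG_SIZE_U256, the case newly handled in the size printing code.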

+21 -3

tools/testing/selftests/kvm/riscv/sbi_pmu_test.c

···
73 73
74 74     switch (csr_num) {
75 75     switchcase_csr_read_32(CSR_CYCLE, ret)
76 -     switchcase_csr_read_32(CSR_CYCLEH, ret)
77 76     default :
78 77         break;
79 78     }
···
127 128             "Unable to stop counter %ld error %ld\n", counter, ret.error);
128 129 }
129 130
130 - static void guest_illegal_exception_handler(struct ex_regs *regs)
131 + static void guest_illegal_exception_handler(struct pt_regs *regs)
131 132 {
133 +     unsigned long insn;
134 +     int opcode, csr_num, funct3;
135 +
132 136     __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
133 137                    "Unexpected exception handler %lx\n", regs->cause);
138 +
139 +     insn = regs->badaddr;
140 +     opcode = (insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT;
141 +     __GUEST_ASSERT(opcode == INSN_OPCODE_SYSTEM,
142 +                    "Unexpected instruction with opcode 0x%x insn 0x%lx\n", opcode, insn);
143 +
144 +     csr_num = GET_CSR_NUM(insn);
145 +     funct3 = GET_RM(insn);
146 +     /* Validate if it is a CSR read/write operation */
147 +     __GUEST_ASSERT(funct3 <= 7 && (funct3 != 0 && funct3 != 4),
148 +                    "Unexpected system opcode with funct3 0x%x csr_num 0x%x\n",
149 +                    funct3, csr_num);
150 +
151 +     /* Validate if it is a HPMCOUNTER CSR operation */
152 +     __GUEST_ASSERT((csr_num >= CSR_CYCLE && csr_num <= CSR_HPMCOUNTER31),
153 +                    "Unexpected csr_num 0x%x\n", csr_num);
134 154
135 155     illegal_handler_invoked = true;
136 156     /* skip the trapping instruction */
137 157     regs->epc += 4;
138 158 }
139 159
140 - static void guest_irq_handler(struct ex_regs *regs)
160 + static void guest_irq_handler(struct pt_regs *regs)
141 161 {
142 162     unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
143 163     struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
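Note on the new checks in guest_illegal_exception_handler(): they amount to decoding the trapping instruction as a Zicsr access to one of the counter CSRs. The sketch below shows the same decoding using the field positions from the RISC-V base encoding. All names here are hypothetical. In particular, the patch's INSN_OPCODE_MASK, GET_CSR_NUM and GET_RM macros are not reproduced and may slice the opcode field differently (this sketch compares the full 7-bit opcode).

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_SYSTEM		0x73	/* 7-bit major opcode for SYSTEM */
#define CSR_NUM_CYCLE		0xc00
#define CSR_NUM_HPMCOUNTER31	0xc1f

static unsigned int insn_opcode(uint32_t insn) { return insn & 0x7f; }
static unsigned int insn_funct3(uint32_t insn) { return (insn >> 12) & 0x7; }
static unsigned int insn_csr(uint32_t insn)    { return insn >> 20; }

/* True if insn is a CSR instruction targeting cycle..hpmcounter31. */
static bool is_hpm_counter_csr_access(uint32_t insn)
{
	unsigned int funct3 = insn_funct3(insn);
	unsigned int csr = insn_csr(insn);

	if (insn_opcode(insn) != OPCODE_SYSTEM)
		return false;
	/* funct3 values 0 and 4 do not encode CSR instructions */
	if (funct3 == 0 || funct3 == 4)
		return false;
	return csr >= CSR_NUM_CYCLE && csr <= CSR_NUM_HPMCOUNTER31;
}
```

For example, `csrr a0, cycle` assembles to 0xc0002573 and passes all three checks, whereas `ecall` (0x00000073) is rejected by the funct3 test and a read of sstatus (0x10002573) by the CSR range test.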

+1 -1

tools/testing/selftests/kvm/set_memory_region_test.c

···
350 350     struct kvm_vm *vm;
351 351     int r, i;
352 352
353 - #if defined __aarch64__ || defined __riscv || defined __x86_64__
353 + #if defined __aarch64__ || defined __riscv || defined __x86_64__ || defined __loongarch__
354 354     supported_flags |= KVM_MEM_READONLY;
355 355 #endif
356 356

+6 -5

virt/kvm/dirty_ring.c

···
11 11 #include <trace/events/kvm.h>
12 12 #include "kvm_mm.h"
13 13
14 - int __weak kvm_cpu_dirty_log_size(void)
14 + int __weak kvm_cpu_dirty_log_size(struct kvm *kvm)
15 15 {
16 16     return 0;
17 17 }
18 18
19 - u32 kvm_dirty_ring_get_rsvd_entries(void)
19 + u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm)
20 20 {
21 -     return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
21 +     return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size(kvm);
22 22 }
23 23
24 24 bool kvm_use_dirty_bitmap(struct kvm *kvm)
···
74 74     KVM_MMU_UNLOCK(kvm);
75 75 }
76 76
77 - int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size)
77 + int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
78 +                          int index, u32 size)
78 79 {
79 80     ring->dirty_gfns = vzalloc(size);
80 81     if (!ring->dirty_gfns)
81 82         return -ENOMEM;
82 83
83 84     ring->size = size / sizeof(struct kvm_dirty_gfn);
84 -     ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries();
85 +     ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries(kvm);
85 86     ring->dirty_index = 0;
86 87     ring->reset_index = 0;
87 88     ring->index = index;
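Note on the soft limit: with the per-VM argument, the reserved entry count (and hence the soft limit) can now differ between VMs on the same host. The arithmetic in kvm_dirty_ring_alloc() can be sketched in userspace as follows. `struct dirty_gfn`, `RSVD_ENTRIES` and `ring_soft_limit` are stand-ins invented for this sketch. The value 64 and the 16-byte entry layout are assumptions about KVM_DIRTY_RING_RSVD_ENTRIES and struct kvm_dirty_gfn, and the second parameter stands in for whatever kvm_cpu_dirty_log_size(kvm) returns.

```c
#include <stdint.h>

/* Stand-in for struct kvm_dirty_gfn (assumed 16 bytes per entry). */
struct dirty_gfn {
	uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

#define RSVD_ENTRIES 64	/* assumed value of KVM_DIRTY_RING_RSVD_ENTRIES */

/*
 * Soft limit as computed in kvm_dirty_ring_alloc(): total entries in the
 * ring minus the reserved entries, where the reserved count includes a
 * per-VM, architecture-specific dirty log size.
 */
static uint32_t ring_soft_limit(uint32_t size_bytes, uint32_t cpu_dirty_log_size)
{
	uint32_t entries = size_bytes / sizeof(struct dirty_gfn);

	return entries - (RSVD_ENTRIES + cpu_dirty_log_size);
}
```

Under these assumptions a 64 KiB ring holds 4096 entries, giving a soft limit of 4032 when the architecture hook returns 0 (the __weak default) and 3520 if it returned 512 for a particular VM.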

+9 -17

virt/kvm/kvm_main.c

··· 143 143 #define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \ 144 144 .open = kvm_no_compat_open 145 145 #endif 146 - static int kvm_enable_virtualization(void); 147 - static void kvm_disable_virtualization(void); 148 146 149 147 static void kvm_io_bus_destroy(struct kvm_io_bus *bus); 150 148 ··· 4124 4126 goto vcpu_free_run_page; 4125 4127 4126 4128 if (kvm->dirty_ring_size) { 4127 - r = kvm_dirty_ring_alloc(&vcpu->dirty_ring, 4129 + r = kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring, 4128 4130 id, kvm->dirty_ring_size); 4129 4131 if (r) 4130 4132 goto arch_vcpu_destroy; ··· 4862 4864 return -EINVAL; 4863 4865 4864 4866 /* Should be bigger to keep the reserved entries, or a page */ 4865 - if (size < kvm_dirty_ring_get_rsvd_entries() * 4867 + if (size < kvm_dirty_ring_get_rsvd_entries(kvm) * 4866 4868 sizeof(struct kvm_dirty_gfn) || size < PAGE_SIZE) 4867 4869 return -EINVAL; 4868 4870 ··· 5477 5479 }; 5478 5480 5479 5481 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING 5480 - static bool enable_virt_at_load = true; 5482 + bool enable_virt_at_load = true; 5481 5483 module_param(enable_virt_at_load, bool, 0444); 5484 + EXPORT_SYMBOL_GPL(enable_virt_at_load); 5482 5485 5483 5486 __visible bool kvm_rebooting; 5484 5487 EXPORT_SYMBOL_GPL(kvm_rebooting); ··· 5588 5589 .shutdown = kvm_shutdown, 5589 5590 }; 5590 5591 5591 - static int kvm_enable_virtualization(void) 5592 + int kvm_enable_virtualization(void) 5592 5593 { 5593 5594 int r; 5594 5595 ··· 5633 5634 --kvm_usage_count; 5634 5635 return r; 5635 5636 } 5637 + EXPORT_SYMBOL_GPL(kvm_enable_virtualization); 5636 5638 5637 - static void kvm_disable_virtualization(void) 5639 + void kvm_disable_virtualization(void) 5638 5640 { 5639 5641 guard(mutex)(&kvm_usage_lock); 5640 5642 ··· 5646 5646 cpuhp_remove_state(CPUHP_AP_KVM_ONLINE); 5647 5647 kvm_arch_disable_virtualization(); 5648 5648 } 5649 + EXPORT_SYMBOL_GPL(kvm_disable_virtualization); 5649 5650 5650 5651 static int kvm_init_virtualization(void) 5651 5652 { ··· 
5662 5661 kvm_disable_virtualization(); 5663 5662 } 5664 5663 #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */ 5665 - static int kvm_enable_virtualization(void) 5666 - { 5667 - return 0; 5668 - } 5669 - 5670 5664 static int kvm_init_virtualization(void) 5671 5665 { 5672 5666 return 0; 5673 - } 5674 - 5675 - static void kvm_disable_virtualization(void) 5676 - { 5677 - 5678 5667 } 5679 5668 5680 5669 static void kvm_uninit_virtualization(void) ··· 5855 5864 r = __kvm_io_bus_read(vcpu, bus, &range, val); 5856 5865 return r < 0 ? r : 0; 5857 5866 } 5867 + EXPORT_SYMBOL_GPL(kvm_io_bus_read); 5858 5868 5859 5869 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, 5860 5870 int len, struct kvm_io_device *dev)

Configure Feed

Configure Feed