Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
"Generic:

- Rework almost all of KVM's exports to expose symbols only to KVM's
sub-modules, i.e. the x86 vendor modules (kvm-{amd,intel}.ko) and PPC's
kvm-{pr,hv}.ko

x86:

- Rework almost all of KVM x86's exports to expose symbols only to
KVM's vendor modules, i.e. to kvm-{amd,intel}.ko
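
The per-architecture kvm_types.h headers in this merge choose which modules KVM-internal exports are restricted to, based on which vendor modules are built as loadable modules. The sketch below mirrors the x86 #if/#elif chain at runtime so it can be exercised. The two booleans are hypothetical stand-ins for IS_MODULE(CONFIG_KVM_AMD) and IS_MODULE(CONFIG_KVM_INTEL). kvm_list_contains() is a hypothetical exact-match membership test, not the module loader's real matching code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical runtime mirror of the x86 KVM_SUB_MODULES #if/#elif chain. */
static const char *kvm_sub_modules(bool amd_is_module, bool intel_is_module)
{
	if (amd_is_module && intel_is_module)
		return "kvm-amd,kvm-intel";
	if (amd_is_module)
		return "kvm-amd";
	if (intel_is_module)
		return "kvm-intel";
	return NULL;	/* "#undef KVM_SUB_MODULES": no sub-module list */
}

/*
 * Hypothetical, simplified check: is @module one of the comma-separated
 * names in @list? Exact string match only; the real loader's rules are
 * not modeled here.
 */
static bool kvm_list_contains(const char *list, const char *module)
{
	size_t len;

	assert(module != NULL);
	if (!list)
		return false;

	len = strlen(module);
	while (*list) {
		const char *comma = strchr(list, ',');
		size_t n = comma ? (size_t)(comma - list) : strlen(list);

		if (n == len && strncmp(list, module, n) == 0)
			return true;
		if (!comma)
			break;
		list = comma + 1;
	}
	return false;
}
```

In the kernel, the choice is made entirely by the preprocessor at build time. Expressing it as a function only makes each configuration easy to check.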

- Add support for virtualizing Control-flow Enforcement Technology
(CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
(Shadow Stacks).

It is worth noting that while SHSTK and IBT can be enabled
separately in CPUID, it is not really possible to virtualize them
separately. Therefore, Intel processors will really allow both
SHSTK and IBT under the hood if either is made visible in the
guest's CPUID. The alternative would be to intercept
XSAVES/XRSTORS, which is not feasible for performance reasons

- Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
when completing userspace I/O. KVM has already committed to
allowing L2 to perform I/O at that point

- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
MSR is supposed to exist for v2 PMUs
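
The STATUS_SET MSR (0xc0000303, added to msr-index.h in this merge) complements the existing STATUS_CLR MSR. The following minimal sketch assumes write-1-to-set and write-1-to-clear semantics and a hypothetical struct vpmu with a reserved-bit mask. It is not KVM's pmu.c code.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302u
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET 0xc0000303u

/* Hypothetical, stripped-down vPMU state. */
struct vpmu {
	uint64_t global_status;
	uint64_t global_status_rsvd;	/* bits the guest may not set */
};

/* Returns 0 on success, -1 for an MSR this sketch doesn't handle. */
static int vpmu_write_status_msr(struct vpmu *pmu, uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
		/* Write-1-to-set: each 1 bit sets the matching status bit. */
		pmu->global_status |= data & ~pmu->global_status_rsvd;
		return 0;
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
		/* Write-1-to-clear: each 1 bit clears the matching status bit. */
		pmu->global_status &= ~data;
		return 0;
	default:
		return -1;
	}
}
```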

- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs
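
The cpuid.c change behind this item is a single extra vendor comparison. Here is a standalone sketch of the check. It uses a hypothetical vendor enum rather than the kernel's X86_VENDOR_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CENTAUR_CPUID_SIGNATURE 0xC0000000u

/* Hypothetical vendor enum; the kernel's X86_VENDOR_* values differ. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_CENTAUR, VENDOR_ZHAOXIN };

/* Should this CPUID range be enumerated to userspace/guests? */
static bool enumerate_leaf_range(uint32_t func, enum vendor host)
{
	/* The Centaur range is only meaningful on Centaur and Zhaoxin hosts. */
	if (func == CENTAUR_CPUID_SIGNATURE &&
	    host != VENDOR_CENTAUR && host != VENDOR_ZHAOXIN)
		return false;
	return true;
}
```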

- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside
of forced emulation and other contrived testing scenarios)

- Clean up the MSR APIs in preparation for CET and FRED
virtualization, as well as mediated vPMU support

- Clean up a pile of PMU code in anticipation of adding support for
mediated vPMUs

- Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
vmexits needed to faithfully emulate an I/O APIC for such guests

- Many cleanups and minor fixes

- Recover possible NX huge pages within the TDP MMU under read lock
to reduce guest jitter when restoring NX huge pages
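
The kvm_host.h hunks split the single possible_nx_huge_pages list into one struct kvm_possible_nx_huge_pages per MMU type, each with its own nr_pages count. This lets the TDP MMU's list be handled separately from the shadow MMU's. The following minimal accounting sketch uses hypothetical names and omits the list heads. It defines KVM_TDP_MMU unconditionally, whereas the real enum has it only on CONFIG_X86_64. The round-up 1/ratio recovery policy is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the new kvm_host.h layout. */
enum kvm_mmu_type { KVM_SHADOW_MMU, KVM_TDP_MMU, KVM_NR_MMU_TYPES };

struct possible_nx_huge_pages {
	uint64_t nr_pages;	/* the real struct also holds a list_head */
};

struct vm {
	struct possible_nx_huge_pages possible_nx_huge_pages[KVM_NR_MMU_TYPES];
};

static void track_nx_page(struct vm *vm, enum kvm_mmu_type type)
{
	vm->possible_nx_huge_pages[type].nr_pages++;
}

static void untrack_nx_page(struct vm *vm, enum kvm_mmu_type type)
{
	assert(vm->possible_nx_huge_pages[type].nr_pages > 0);
	vm->possible_nx_huge_pages[type].nr_pages--;
}

/*
 * Assumed recovery policy: recover ~1/ratio of the tracked pages of one
 * MMU type per pass, rounding up; ratio == 0 disables recovery.
 */
static uint64_t nr_to_recover(const struct vm *vm, enum kvm_mmu_type type,
			      unsigned int ratio)
{
	uint64_t nr = vm->possible_nx_huge_pages[type].nr_pages;

	return ratio ? (nr + ratio - 1) / ratio : 0;
}
```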

- Return -EAGAIN during prefault if userspace concurrently
deletes/moves the relevant memslot, to fix an issue where
prefaulting could deadlock with the memslot update

x86 (AMD):

- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
supported

- Require a minimum GHCB version of 2 when starting SEV-SNP guests
via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
errors instead of latent guest failures
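
The following minimal sketch shows the KVM_SEV_INIT2 check, assuming the requested version has already been resolved to a concrete number. The function and macro names are hypothetical. The upper bound of 2 is an assumption added only to make the range check complete.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names; the SNP minimum of 2 comes from the text above. */
#define SNP_GHCB_VERSION_MIN 2
#define GHCB_VERSION_MAX_ASSUMED 2	/* assumption for this sketch */

static int validate_ghcb_version(bool is_snp, unsigned int version)
{
	if (version > GHCB_VERSION_MAX_ASSUMED)
		return -EINVAL;
	/* Fail at init time instead of letting the SNP guest break later. */
	if (is_snp && version < SNP_GHCB_VERSION_MIN)
		return -EINVAL;
	return 0;
}
```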

- Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
prevents unauthorized CPU accesses from reading the ciphertext of
SNP guest private memory, e.g. to attempt an offline attack. This
feature splits the shared SEV-ES/SEV-SNP ASID space into separate
ranges for SEV-ES and SEV-SNP guests; therefore, a new module
parameter (kvm-amd.ciphertext_hiding_asids) is needed to control the
number of ASIDs that can be used
for VMs with CipherText Hiding vs. how many can be used to run
SEV-ES guests
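
The kvm-amd.ciphertext_hiding_asids entry in kernel-parameters.txt spells out the ASID split: SEV-SNP gets [1..max_snp_asid] and SEV-ES gets (max_snp_asid..min_sev_asid). The following minimal sketch shows that arithmetic with hypothetical names and inclusive ranges, where an empty range has first > last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; inclusive ranges, empty when first > last. */
struct asid_ranges {
	unsigned int snp_first, snp_last;
	unsigned int es_first, es_last;
};

/*
 * min_sev_asid is enumerated by CPUID 0x8000001F[EDX]; the joint
 * SEV-ES/SEV-SNP ASIDs are [1, min_sev_asid - 1]. A non-zero nr_snp
 * enables ciphertext hiding and is capped at the number of joint ASIDs.
 */
static bool partition_asids(unsigned int min_sev_asid, unsigned int nr_snp,
			    struct asid_ranges *r)
{
	unsigned int joint;

	assert(min_sev_asid >= 1);
	joint = min_sev_asid - 1;
	if (!nr_snp || !joint)
		return false;		/* ciphertext hiding stays disabled */
	if (nr_snp > joint)
		nr_snp = joint;		/* e.g. -1u means "all joint ASIDs" */

	r->snp_first = 1;
	r->snp_last = nr_snp;		/* max_snp_asid */
	r->es_first = nr_snp + 1;
	r->es_last = min_sev_asid - 1;	/* empty if all went to SEV-SNP */
	return true;
}
```

Passing -1u reproduces the documented corner case: every joint ASID goes to SEV-SNP and the SEV-ES range comes out empty.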

- Add support for Secure TSC for SEV-SNP guests, which prevents the
untrusted host from tampering with the guest's TSC frequency, while
still allowing the VMM to configure the guest's TSC frequency
prior to launch
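
The following hypothetical sketch illustrates the policy described above. Secure TSC is requested through the SVM_SEV_FEAT_SECURE_TSC feature bit (BIT(9), added to svm.h in this merge), and the VMM may set the guest TSC frequency only before launch. The struct, function names, and error codes are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SVM_SEV_FEAT_SECURE_TSC (1ull << 9)	/* BIT(9), per svm.h */

/* Hypothetical per-VM state for this sketch. */
struct snp_vm {
	bool is_snp;
	bool launched;
	uint64_t sev_features;
	uint32_t tsc_khz;
};

static int snp_enable_secure_tsc(struct snp_vm *vm, bool host_has_secure_tsc)
{
	if (!vm->is_snp || !host_has_secure_tsc)
		return -EINVAL;
	vm->sev_features |= SVM_SEV_FEAT_SECURE_TSC;
	return 0;
}

/* With Secure TSC, the VMM may pick the guest TSC frequency only pre-launch. */
static int snp_set_tsc_khz(struct snp_vm *vm, uint32_t khz)
{
	if ((vm->sev_features & SVM_SEV_FEAT_SECURE_TSC) && vm->launched)
		return -EPERM;
	vm->tsc_khz = khz;
	return 0;
}
```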

- Validate the XCR0 provided by the guest (via the GHCB) to avoid
bugs resulting from bogus XCR0 values
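
To make "bogus XCR0 values" concrete, here is a simplified subset of the architectural XCR0 consistency rules, as documented for XSETBV in the Intel SDM. It assumes a caller-provided supported mask, covers only a few feature bits, and is not the code KVM uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_FP      (1ull << 0)
#define XFEATURE_MASK_SSE     (1ull << 1)
#define XFEATURE_MASK_YMM     (1ull << 2)
#define XFEATURE_MASK_BNDREGS (1ull << 3)
#define XFEATURE_MASK_BNDCSR  (1ull << 4)
#define XFEATURE_MASK_AVX512  (7ull << 5)	/* opmask, ZMM_Hi256, Hi16_ZMM */

/* Simplified subset of the XSETBV consistency rules; not KVM's code. */
static bool xcr0_is_valid(uint64_t xcr0, uint64_t supported)
{
	if (xcr0 & ~supported)
		return false;		/* feature not supported/allowed */
	if (!(xcr0 & XFEATURE_MASK_FP))
		return false;		/* x87 state must always be enabled */
	if ((xcr0 & XFEATURE_MASK_YMM) && !(xcr0 & XFEATURE_MASK_SSE))
		return false;		/* AVX requires SSE */
	if (!(xcr0 & XFEATURE_MASK_BNDREGS) != !(xcr0 & XFEATURE_MASK_BNDCSR))
		return false;		/* MPX components go together */
	if (xcr0 & XFEATURE_MASK_AVX512) {
		if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
			return false;	/* all three AVX-512 bits or none */
		if (!(xcr0 & XFEATURE_MASK_YMM))
			return false;	/* AVX-512 requires AVX (and SSE) */
	}
	return true;
}
```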

- Save an SEV guest's policy if and only if LAUNCH_START fully
succeeds to avoid leaving behind stale state (thankfully not
consumed in KVM)

- Explicitly reject non-positive effective lengths during SNP's
LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
them

- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
host's desired TSC_AUX, to fix a bug where KVM was keeping a
different vCPU's TSC_AUX in the host MSR until return to userspace

x86 (Intel):

- Preparation for FRED support

- Don't retry in TDX's anti-zero-step mitigation if the target
memslot is invalid, i.e. is being deleted or moved, to fix a
deadlock scenario similar to the aforementioned prefaulting case

- Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: x86: Export KVM-internal symbols for sub-modules only
KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
KVM: Export KVM-internal symbols for sub-modules only
KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
KVM: VMX: Make CR4.CET a guest owned bit
KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
KVM: selftests: Add coverage for KVM-defined registers in MSRs test
KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
KVM: selftests: Extend MSRs test to validate vCPUs without supported features
KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
KVM: selftests: Add an MSR test to exercise guest/host and read/write
KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
KVM: x86: Define Control Protection Exception (#CP) vector
KVM: x86: Add human friendly formatting for #XM, and #VE
KVM: SVM: Enable shadow stack virtualization for SVM
KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
KVM: SVM: Pass through shadow stack MSRs as appropriate
KVM: SVM: Update dump_vmcb with shadow stack save area additions
KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
...

+3205 -1236
+21
Documentation/admin-guide/kernel-parameters.txt
···
 		(enabled). Disable by KVM if hardware lacks support
 		for NPT.
 
+	kvm-amd.ciphertext_hiding_asids=
+		[KVM,AMD] Ciphertext hiding prevents disallowed accesses
+		to SNP private memory from reading ciphertext. Instead,
+		reads will see constant default values (0xff).
+
+		If ciphertext hiding is enabled, the joint SEV-ES and
+		SEV-SNP ASID space is partitioned into separate SEV-ES
+		and SEV-SNP ASID ranges, with the SEV-SNP range being
+		[1..max_snp_asid] and the SEV-ES range being
+		(max_snp_asid..min_sev_asid), where min_sev_asid is
+		enumerated by CPUID.0x.8000_001F[EDX].
+
+		A non-zero value enables SEV-SNP ciphertext hiding and
+		adjusts the ASID ranges for SEV-ES and SEV-SNP guests.
+		KVM caps the number of SEV-SNP ASIDs at the maximum
+		possible value, e.g. specifying -1u will assign all
+		joint SEV-ES and SEV-SNP ASIDs to SEV-SNP. Note,
+		assigning all joint ASIDs to SEV-SNP, i.e. configuring
+		max_snp_asid == min_sev_asid-1, will effectively make
+		SEV-ES unusable.
+
 	kvm-arm.mode=
 		[KVM,ARM,EARLY] Select one of KVM/arm64's modes of
 		operation.
+19 -1
Documentation/virt/kvm/api.rst
···
 
   0x9030 0000 0002 <reg:16>
 
+x86 MSR registers have the following id bit patterns::
+  0x2030 0002 <msr number:32>
+
+Following are the KVM-defined registers for x86:
+
+======================= ========= =============================================
+Encoding                Register  Description
+======================= ========= =============================================
+0x2030 0003 0000 0000   SSP       Shadow Stack Pointer
+======================= ========= =============================================
 
 4.69 KVM_GET_ONE_REG
 --------------------
···
 
 Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
 See KVM_GET_PIT2 for details on struct kvm_pit_state2.
+
+.. Tip::
+  ``KVM_SET_PIT2`` strictly adheres to the spec of Intel 8254 PIT. For example,
+  a ``count`` value of 0 in ``struct kvm_pit_channel_state`` is interpreted as
+  65536, which is the maximum count value. Refer to `Intel 8254 programmable
+  interval timer <https://www.scs.stanford.edu/10wi-cs140/pintos/specs/8254.pdf>`_.
 
 This IOCTL replaces the obsolete KVM_SET_PIT.
 
···
 ---------------------
 
 :Capability: basic
-:Architectures: arm64, mips, riscv
+:Architectures: arm64, mips, riscv, x86 (if KVM_CAP_ONE_REG)
 :Type: vcpu ioctl
 :Parameters: struct kvm_reg_list (in/out)
 :Returns: 0 on success; -1 on error
···
 
   - KVM_REG_S390_GBEA
 
+Note, for x86, all MSRs enumerated by KVM_GET_MSR_INDEX_LIST are supported as
+type KVM_X86_REG_TYPE_MSR, but are NOT enumerated via KVM_GET_REG_LIST.
 
 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
 -----------------------------------------
+3 -3
Documentation/virt/kvm/x86/hypercalls.rst
···
 Returns KVM_EOPNOTSUPP if the host does not use TSC clocksource,
 or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK.
 
-6. KVM_HC_SEND_IPI
+7. KVM_HC_SEND_IPI
 ------------------
 
 :Architecture: x86
···
 
 Returns the number of CPUs to which the IPIs were delivered successfully.
 
-7. KVM_HC_SCHED_YIELD
+8. KVM_HC_SCHED_YIELD
 ---------------------
 
 :Architecture: x86
···
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
                 any of the IPI target vCPUs was preempted.
 
-8. KVM_HC_MAP_GPA_RANGE
+9. KVM_HC_MAP_GPA_RANGE
 -------------------------
 :Architecture: x86
 :Status: active
-1
arch/powerpc/include/asm/Kbuild
···
 generated-y += syscall_table_64.h
 generated-y += syscall_table_spu.h
 generic-y += agp.h
-generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += early_ioremap.h
+15
arch/powerpc/include/asm/kvm_types.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PPC_KVM_TYPES_H
+#define _ASM_PPC_KVM_TYPES_H
+
+#if IS_MODULE(CONFIG_KVM_BOOK3S_64_PR) && IS_MODULE(CONFIG_KVM_BOOK3S_64_HV)
+#define KVM_SUB_MODULES kvm-pr,kvm-hv
+#elif IS_MODULE(CONFIG_KVM_BOOK3S_64_PR)
+#define KVM_SUB_MODULES kvm-pr
+#elif IS_MODULE(CONFIG_KVM_BOOK3S_64_HV)
+#define KVM_SUB_MODULES kvm-hv
+#else
+#undef KVM_SUB_MODULES
+#endif
+
+#endif
+2
arch/s390/include/asm/kvm_host.h
···
 extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
 extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
 
+bool kvm_s390_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa);
+
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
 					 struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
+8
arch/s390/kvm/priv.c
···
 	}
 }
 
+#if IS_ENABLED(CONFIG_VFIO_AP)
+bool kvm_s390_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa)
+{
+	return kvm_is_gpa_in_memslot(kvm, gpa);
+}
+EXPORT_SYMBOL_FOR_MODULES(kvm_s390_is_gpa_in_memslot, "vfio_ap");
+#endif
+
 /*
  * handle_pqap: Handling pqap interception
  * @vcpu: the vcpu having issue the pqap instruction
+2
arch/x86/include/asm/cpufeatures.h
···
 #define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES (19*32+ 3) /* "sev_es" Secure Encrypted Virtualization - Encrypted State */
 #define X86_FEATURE_SEV_SNP (19*32+ 4) /* "sev_snp" Secure Encrypted Virtualization - Secure Nested Paging */
+#define X86_FEATURE_SNP_SECURE_TSC (19*32+ 8) /* SEV-SNP Secure TSC */
 #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT (19*32+10) /* hardware-enforced cache coherency */
 #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* "debug_swap" SEV-ES full debug state swap support */
···
 #define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */
 #define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
 #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
+#define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
 
 /*
  * BUG word(s)
+1 -1
arch/x86/include/asm/kvm-x86-ops.h
···
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
-KVM_X86_OP(recalc_msr_intercepts)
+KVM_X86_OP(recalc_intercepts)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+53 -30
arch/x86/include/asm/kvm_host.h
···
 #define KVM_REQ_TLB_FLUSH_GUEST \
 	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY KVM_ARCH_REQ(28)
-#define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29)
+#define KVM_REQ_RECALC_INTERCEPTS KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
···
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
 			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
-			  | X86_CR4_LAM_SUP))
+			  | X86_CR4_LAM_SUP | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
···
 #define PFERR_RSVD_MASK BIT(3)
 #define PFERR_FETCH_MASK BIT(4)
 #define PFERR_PK_MASK BIT(5)
+#define PFERR_SS_MASK BIT(6)
 #define PFERR_SGX_MASK BIT(15)
 #define PFERR_GUEST_RMP_MASK BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
···
 #define KVM_MAX_NR_GP_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \
 					KVM_MAX_NR_AMD_GP_COUNTERS)
 
-#define KVM_MAX_NR_INTEL_FIXED_COUTNERS 3
-#define KVM_MAX_NR_AMD_FIXED_COUTNERS 0
-#define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUTNERS, \
-					   KVM_MAX_NR_AMD_FIXED_COUTNERS)
+#define KVM_MAX_NR_INTEL_FIXED_COUNTERS 3
+#define KVM_MAX_NR_AMD_FIXED_COUNTERS 0
+#define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \
+					   KVM_MAX_NR_AMD_FIXED_COUNTERS)
 
 struct kvm_pmu {
 	u8 version;
···
 	};
 	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
+
+	DECLARE_BITMAP(pmc_counting_instructions, X86_PMC_IDX_MAX);
+	DECLARE_BITMAP(pmc_counting_branches, X86_PMC_IDX_MAX);
 
 	u64 ds_area;
 	u64 pebs_enable;
···
 	CPUID_7_2_EDX,
 	CPUID_24_0_EBX,
 	CPUID_8000_0021_ECX,
+	CPUID_7_1_ECX,
 	NR_KVM_CPU_CAPS,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
···
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
-	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
 	u64 perf_capabilities;
···
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 ia32_xss;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
···
 	bool emulate_regs_need_sync_from_vcpu;
 	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 	unsigned long cui_linear_rip;
+	int cui_rdmsr_imm_reg;
 
 	gpa_t time;
 	s8 pvclock_tsc_shift;
···
 	__APICV_INHIBIT_REASON(LOGICAL_ID_ALIASED), \
 	__APICV_INHIBIT_REASON(PHYSICAL_ID_TOO_BIG)
 
-struct kvm_arch {
-	unsigned long n_used_mmu_pages;
-	unsigned long n_requested_mmu_pages;
-	unsigned long n_max_mmu_pages;
-	unsigned int indirect_shadow_pages;
-	u8 mmu_valid_gen;
-	u8 vm_type;
-	bool has_private_mem;
-	bool has_protected_state;
-	bool pre_fault_allowed;
-	struct hlist_head *mmu_page_hash;
-	struct list_head active_mmu_pages;
+struct kvm_possible_nx_huge_pages {
 	/*
 	 * A list of kvm_mmu_page structs that, if zapped, could possibly be
 	 * replaced by an NX huge page. A shadow page is on this list if its
···
 	 * guest attempts to execute from the region then KVM obviously can't
 	 * create an NX huge page (without hanging the guest).
 	 */
-	struct list_head possible_nx_huge_pages;
+	struct list_head pages;
+	u64 nr_pages;
+};
+
+enum kvm_mmu_type {
+	KVM_SHADOW_MMU,
+#ifdef CONFIG_X86_64
+	KVM_TDP_MMU,
+#endif
+	KVM_NR_MMU_TYPES,
+};
+
+struct kvm_arch {
+	unsigned long n_used_mmu_pages;
+	unsigned long n_requested_mmu_pages;
+	unsigned long n_max_mmu_pages;
+	unsigned int indirect_shadow_pages;
+	u8 mmu_valid_gen;
+	u8 vm_type;
+	bool has_private_mem;
+	bool has_protected_state;
+	bool has_protected_eoi;
+	bool pre_fault_allowed;
+	struct hlist_head *mmu_page_hash;
+	struct list_head active_mmu_pages;
+	struct kvm_possible_nx_huge_pages possible_nx_huge_pages[KVM_NR_MMU_TYPES];
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 	struct kvm_page_track_notifier_head track_notifier_head;
 #endif
···
 	 * is held in read mode:
 	 *  - tdp_mmu_roots (above)
 	 *  - the link field of kvm_mmu_page structs used by the TDP MMU
-	 *  - possible_nx_huge_pages;
+	 *  - possible_nx_huge_pages[KVM_TDP_MMU];
 	 *  - the possible_nx_huge_page_link field of kvm_mmu_page structs used
 	 *    by the TDP MMU
 	 * Because the lock is only taken within the MMU lock, strictly
···
 	int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
-	void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu);
+	void (*recalc_intercepts)(struct kvm_vcpu *vcpu);
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
···
 
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
 int kvm_emulate_invd(struct kvm_vcpu *vcpu);
 int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
···
 unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
···
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
 void kvm_user_return_msr_update_cache(unsigned int index, u64 val);
+u64 kvm_get_user_return_msr(unsigned int slot);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
 {
···
 			       u32 size);
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
-
-bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
-			     struct kvm_vcpu **dest_vcpu);
 
 static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
 {
+10
arch/x86/include/asm/kvm_types.h
···
 #ifndef _ASM_X86_KVM_TYPES_H
 #define _ASM_X86_KVM_TYPES_H
 
+#if IS_MODULE(CONFIG_KVM_AMD) && IS_MODULE(CONFIG_KVM_INTEL)
+#define KVM_SUB_MODULES kvm-amd,kvm-intel
+#elif IS_MODULE(CONFIG_KVM_AMD)
+#define KVM_SUB_MODULES kvm-amd
+#elif IS_MODULE(CONFIG_KVM_INTEL)
+#define KVM_SUB_MODULES kvm-intel
+#else
+#undef KVM_SUB_MODULES
+#endif
+
 #define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
 
 #endif /* _ASM_X86_KVM_TYPES_H */
+4
arch/x86/include/asm/msr-index.h
···
 #define PERF_CAP_PT_IDX 16
 
 #define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
+
+#define PERF_CAP_LBR_FMT 0x3f
 #define PERF_CAP_PEBS_TRAP BIT_ULL(6)
 #define PERF_CAP_ARCH_REG BIT_ULL(7)
 #define PERF_CAP_PEBS_FORMAT 0xf00
+#define PERF_CAP_FW_WRITES BIT_ULL(13)
 #define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
 #define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)
 #define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
···
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET 0xc0000303
 
 /* AMD Hardware Feedback Support MSRs */
 #define MSR_AMD_WORKLOAD_CLASS_CONFIG 0xc0000500
+1
arch/x86/include/asm/svm.h
···
 #define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
 #define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
 #define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SECURE_TSC BIT(9)
 
 #define VMCB_ALLOWED_SEV_FEATURES_VALID BIT_ULL(63)
 
+9
arch/x86/include/asm/vmx.h
···
 #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP 0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_LOAD_CET_STATE 0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
 
···
 #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_CET_STATE 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
 
···
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
 #define VMX_BASIC_INOUT BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
···
 	GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
 	GUEST_SYSENTER_ESP = 0x00006824,
 	GUEST_SYSENTER_EIP = 0x00006826,
+	GUEST_S_CET = 0x00006828,
+	GUEST_SSP = 0x0000682a,
+	GUEST_INTR_SSP_TABLE = 0x0000682c,
 	HOST_CR0 = 0x00006c00,
 	HOST_CR3 = 0x00006c02,
 	HOST_CR4 = 0x00006c04,
···
 	HOST_IA32_SYSENTER_EIP = 0x00006c12,
 	HOST_RSP = 0x00006c14,
 	HOST_RIP = 0x00006c16,
+	HOST_S_CET = 0x00006c18,
+	HOST_SSP = 0x00006c1a,
+	HOST_INTR_SSP_TABLE = 0x00006c1c
 };
 
 /*
+34
arch/x86/include/uapi/asm/kvm.h
···
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
+
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
···
 	struct kvm_xcr xcrs[KVM_MAX_XCRS];
 	__u64 padding[16];
 };
+
+#define KVM_X86_REG_TYPE_MSR 2
+#define KVM_X86_REG_TYPE_KVM 3
+
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+	reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
+({ \
+	__u64 type_size = (__u64)type << 32; \
+ \
+	type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
+		     type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
+		     0; \
+	type_size; \
+})
+
+#define KVM_X86_REG_ID(type, index) \
+	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index) \
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
 
 #define KVM_SYNC_X86_REGS (1UL << 0)
 #define KVM_SYNC_X86_SREGS (1UL << 1)
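
The new uapi macros build a 64-bit register ID by combining KVM_REG_X86, the register type shifted left by 32, a size field, and the index. This matches the `0x2030 0002 <msr number:32>` pattern documented in api.rst. The sketch below expresses the same arithmetic as plain C functions. It assumes the KVM_REG_X86 and KVM_REG_SIZE_U64 values from include/uapi/linux/kvm.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values from include/uapi/linux/kvm.h (not part of the diff above). */
#define KVM_REG_X86      0x2000000000000000ull
#define KVM_REG_SIZE_U64 0x0030000000000000ull

/* Same values as the new uapi definitions. */
#define KVM_X86_REG_TYPE_MSR 2
#define KVM_X86_REG_TYPE_KVM 3
#define KVM_REG_GUEST_SSP    0

/* Size field: MSRs and the guest SSP are 64-bit; anything else is 0. */
static uint64_t x86_reg_size(uint32_t type, uint32_t index)
{
	if (type == KVM_X86_REG_TYPE_MSR)
		return KVM_REG_SIZE_U64;
	if (type == KVM_X86_REG_TYPE_KVM && index == KVM_REG_GUEST_SSP)
		return KVM_REG_SIZE_U64;
	return 0;
}

/* Function equivalent of KVM_X86_REG_ID(type, index). */
static uint64_t x86_reg_id(uint32_t type, uint32_t index)
{
	return KVM_REG_X86 | ((uint64_t)type << 32) |
	       x86_reg_size(type, index) | index;
}
```

Userspace would place such an ID in the `id` field of struct kvm_one_reg when calling KVM_GET_ONE_REG or KVM_SET_ONE_REG.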
+5 -1
arch/x86/include/uapi/asm/vmx.h
···
 #define EXIT_REASON_BUS_LOCK 74
 #define EXIT_REASON_NOTIFY 75
 #define EXIT_REASON_TDCALL 77
+#define EXIT_REASON_MSR_READ_IMM 84
+#define EXIT_REASON_MSR_WRITE_IMM 85
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
···
 	{ EXIT_REASON_TPAUSE, "TPAUSE" }, \
 	{ EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \
 	{ EXIT_REASON_NOTIFY, "NOTIFY" }, \
-	{ EXIT_REASON_TDCALL, "TDCALL" }
+	{ EXIT_REASON_TDCALL, "TDCALL" }, \
+	{ EXIT_REASON_MSR_READ_IMM, "MSR_READ_IMM" }, \
+	{ EXIT_REASON_MSR_WRITE_IMM, "MSR_WRITE_IMM" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" }
+1
arch/x86/kernel/cpu/scattered.c
···
 	{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
 	{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+	{ X86_FEATURE_MSR_IMM, CPUID_ECX, 5, 0x00000007, 1 },
 	{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
 	{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
 	{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
+49 -9
arch/x86/kvm/cpuid.c
···
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
 u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
-EXPORT_SYMBOL_GPL(kvm_cpu_caps);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_caps);
 
 struct cpuid_xstate_sizes {
 	u32 eax;
···
 
 	return NULL;
 }
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_cpuid_entry2);
 
 static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 {
···
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
 							struct kvm_cpuid_entry2 *entry,
 							unsigned int x86_feature,
···
 	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+						 vcpu->arch.ia32_xss, true);
 }
 
 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
···
 	}
 
 	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+	vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
 
 	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
 
···
 	 * adjustments to the reserved GPA bits.
 	 */
 	kvm_mmu_after_set_cpuid(vcpu);
+
+	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
 }
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
···
 		VENDOR_F(WAITPKG),
 		F(SGX_LC),
 		F(BUS_LOCK_DETECT),
+		X86_64_F(SHSTK),
 	);
 
 	/*
···
 	 */
 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
+
+	/*
+	 * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack
+	 * accesses require "magic" Writable=0,Dirty=1 protection, which KVM
+	 * doesn't know how to emulate or map.
+	 */
+	if (!tdp_enabled)
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		F(AVX512_4VNNIW),
···
 		F(AMX_INT8),
 		F(AMX_BF16),
 		F(FLUSH_L1D),
+		F(IBT),
 	);
+
+	/*
+	 * Disable support for IBT and SHSTK if KVM is configured to emulate
+	 * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
+	 * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
+	 */
+	if (allow_smaller_maxphyaddr) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
 
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBPB) &&
···
 		F(AMX_FP16),
 		F(AVX_IFMA),
 		F(LAM),
+	);
+
+	kvm_cpu_cap_init(CPUID_7_1_ECX,
+		SCATTERED_F(MSR_IMM),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_EDX,
···
 			kvm_cpu_cap_clear(X86_FEATURE_RDPID);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cpu_caps);
 
 #undef F
 #undef SCATTERED_F
···
 				goto out;
 
 			cpuid_entry_override(entry, CPUID_7_1_EAX);
+			cpuid_entry_override(entry, CPUID_7_1_ECX);
 			cpuid_entry_override(entry, CPUID_7_1_EDX);
 			entry->ebx = 0;
-			entry->ecx = 0;
 		}
 		if (max_idx >= 2) {
 			entry = do_host_cpuid(array, function, 2);
···
 	int r;
 
 	if (func == CENTAUR_CPUID_SIGNATURE &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
+	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 0;
 
 	r = do_cpuid_func(array, func, type);
···
 		if (function == 7 && index == 0) {
 			u64 data;
 			if ((*ebx & (feature_bit(RTM) | feature_bit(HLE))) &&
-			    !__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+			    !kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) &&
 			    (data & TSX_CTRL_CPUID_CLEAR))
 				*ebx &= ~(feature_bit(RTM) | feature_bit(HLE));
 		} else if (function == 0x80000007) {
···
 					     used_max_basic);
 	return exact;
 }
-EXPORT_SYMBOL_GPL(kvm_cpuid);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpuid);
 
 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
···
 	kvm_rdx_write(vcpu, edx);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_cpuid);
+143 -20
arch/x86/kvm/emulate.c
···
 #define IncSP       ((u64)1 << 54)  /* SP is incremented before ModRM calc */
 #define TwoMemOp    ((u64)1 << 55)  /* Instruction has two memory operand */
 #define IsBranch    ((u64)1 << 56)  /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57)  /* Instruction affects Shadow Stacks. */

 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)

···
 	return linear_write_system(ctxt, addr, desc, sizeof(*desc));
 }

+static bool emulator_is_ssp_invalid(struct x86_emulate_ctxt *ctxt, u8 cpl)
+{
+	const u32 MSR_IA32_X_CET = cpl == 3 ? MSR_IA32_U_CET : MSR_IA32_S_CET;
+	u64 efer = 0, cet = 0, ssp = 0;
+
+	if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET))
+		return false;
+
+	if (ctxt->ops->get_msr(ctxt, MSR_EFER, &efer))
+		return true;
+
+	/* SSP is guaranteed to be valid if the vCPU was already in 32-bit mode. */
+	if (!(efer & EFER_LMA))
+		return false;
+
+	if (ctxt->ops->get_msr(ctxt, MSR_IA32_X_CET, &cet))
+		return true;
+
+	if (!(cet & CET_SHSTK_EN))
+		return false;
+
+	if (ctxt->ops->get_msr(ctxt, MSR_KVM_INTERNAL_GUEST_SSP, &ssp))
+		return true;
+
+	/*
+	 * On transfer from 64-bit mode to compatibility mode, SSP[63:32] must
+	 * be 0, i.e. SSP must be a 32-bit value outside of 64-bit mode.
+	 */
+	return ssp >> 32;
+}
+
 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				     u16 selector, int seg, u8 cpl,
 				     enum x86_transfer_type transfer,
···
 			ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
 			if (efer & EFER_LMA)
 				goto exception;
+		}
+		if (!seg_desc.l && emulator_is_ssp_invalid(ctxt, cpl)) {
+			err_code = 0;
+			goto exception;
 		}

 		/* CS(RPL) <- CPL */
···
 static const struct opcode group5[] = {
 	F(DstMem | SrcNone | Lock, em_inc),
 	F(DstMem | SrcNone | Lock, em_dec),
-	I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
-	I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
+	I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs),
+	I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far),
 	I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
 	I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
 	I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
···
 	DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)),
 	/* 0x98 - 0x9F */
 	D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
-	I(SrcImmFAddr | No64 | IsBranch, em_call_far), N,
+	I(SrcImmFAddr | No64 | IsBranch | ShadowStack, em_call_far), N,
 	II(ImplicitOps | Stack, em_pushf, pushf),
 	II(ImplicitOps | Stack, em_popf, popf),
 	I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf),
···
 	X8(I(DstReg | SrcImm64 | Mov, em_mov)),
 	/* 0xC0 - 0xC7 */
 	G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
-	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
-	I(ImplicitOps | NearBranch | IsBranch, em_ret),
+	I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
+	I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
 	I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
 	G(ByteOp, group11), G(0, group11),
 	/* 0xC8 - 0xCF */
-	I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter),
-	I(Stack | IsBranch, em_leave),
-	I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
-	I(ImplicitOps | IsBranch, em_ret_far),
-	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+	I(Stack | SrcImmU16 | Src2ImmByte, em_enter),
+	I(Stack, em_leave),
+	I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+	I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+	D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
 	D(ImplicitOps | No64 | IsBranch),
-	II(ImplicitOps | IsBranch, em_iret, iret),
+	II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
 	/* 0xD0 - 0xD7 */
 	G(Src2One | ByteOp, group2), G(Src2One, group2),
 	G(Src2CL | ByteOp, group2), G(Src2CL, group2),
···
 	I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
 	I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
 	/* 0xE8 - 0xEF */
-	I(SrcImm | NearBranch | IsBranch, em_call),
+	I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
 	D(SrcImm | ImplicitOps | NearBranch | IsBranch),
 	I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
 	D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
···
 static const struct opcode twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	G(0, group6), GD(0, &group7), N, N,
-	N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+	N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_syscall),
 	II(ImplicitOps | Priv, em_clts, clts), N,
 	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
 	N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
···
 	IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
 	II(ImplicitOps | Priv, em_rdmsr, rdmsr),
 	IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
-	I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
-	I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+	I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_sysenter),
+	I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
 	N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x40 - 0x4F */
···
 #undef I2bv
 #undef I2bvIP
 #undef I6ALU
+
+static bool is_shstk_instruction(struct x86_emulate_ctxt *ctxt)
+{
+	return ctxt->d & ShadowStack;
+}
+
+static bool is_ibt_instruction(struct x86_emulate_ctxt *ctxt)
+{
+	u64 flags = ctxt->d;
+
+	if (!(flags & IsBranch))
+		return false;
+
+	/*
+	 * All far JMPs and CALLs (including SYSCALL, SYSENTER, and INTn) are
+	 * indirect and thus affect IBT state. All far RETs (including SYSEXIT
+	 * and IRET) are protected via Shadow Stacks and thus don't affect IBT
+	 * state. IRET #GPs when returning to virtual-8086 and IBT or SHSTK is
+	 * enabled, but that should be handled by IRET emulation (in the very
+	 * unlikely scenario that KVM adds support for fully emulating IRET).
+	 */
+	if (!(flags & NearBranch))
+		return ctxt->execute != em_iret &&
+		       ctxt->execute != em_ret_far &&
+		       ctxt->execute != em_ret_far_imm &&
+		       ctxt->execute != em_sysexit;
+
+	switch (flags & SrcMask) {
+	case SrcReg:
+	case SrcMem:
+	case SrcMem16:
+	case SrcMem32:
+		return true;
+	case SrcMemFAddr:
+	case SrcImmFAddr:
+		/* Far branches should be handled above. */
+		WARN_ON_ONCE(1);
+		return true;
+	case SrcNone:
+	case SrcImm:
+	case SrcImmByte:
+	/*
+	 * Note, ImmU16 is used only for the stack adjustment operand on ENTER
+	 * and RET instructions. ENTER isn't a branch and RET FAR is handled
+	 * by the NearBranch check above. RET itself isn't an indirect branch.
+	 */
+	case SrcImmU16:
+		return false;
+	default:
+		WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
+			  flags & SrcMask);
+		return false;
+	}
+}

 static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
 {
···

 	ctxt->execute = opcode.u.execute;

+	/*
+	 * Reject emulation if KVM might need to emulate shadow stack updates
+	 * and/or indirect branch tracking enforcement, which the emulator
+	 * doesn't support.
+	 */
+	if ((is_ibt_instruction(ctxt) || is_shstk_instruction(ctxt)) &&
+	    ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+		u64 u_cet = 0, s_cet = 0;
+
+		/*
+		 * Check both User and Supervisor on far transfers as inter-
+		 * privilege level transfers are impacted by CET at the target
+		 * privilege level, and that is not known at this time. The
+		 * expectation is that the guest will not require emulation of
+		 * any CET-affected instructions at any privilege level.
+		 */
+		if (!(ctxt->d & NearBranch))
+			u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+		else if (ctxt->ops->cpl(ctxt) == 3)
+			u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+		else
+			s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+
+		if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
+		    (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
+			return EMULATION_FAILED;
+
+		if ((u_cet | s_cet) & CET_SHSTK_EN && is_shstk_instruction(ctxt))
+			return EMULATION_FAILED;
+
+		if ((u_cet | s_cet) & CET_ENDBR_EN && is_ibt_instruction(ctxt))
+			return EMULATION_FAILED;
+	}
+
 	if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
 	    likely(!(ctxt->d & EmulateOnUD)))
 		return EMULATION_FAILED;
···
 	ctxt->mem_read.end = 0;
 }

-int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
+int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, bool check_intercepts)
 {
 	const struct x86_emulate_ops *ops = ctxt->ops;
 	int rc = X86EMUL_CONTINUE;
 	int saved_dst_type = ctxt->dst.type;
-	bool is_guest_mode = ctxt->ops->is_guest_mode(ctxt);

 	ctxt->mem_read.pos = 0;

···
 				fetch_possible_mmx_operand(&ctxt->dst);
 		}

-		if (unlikely(is_guest_mode) && ctxt->intercept) {
+		if (unlikely(check_intercepts) && ctxt->intercept) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_PRE_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
···
 				goto done;
 		}

-		if (unlikely(is_guest_mode) && (ctxt->d & Intercept)) {
+		if (unlikely(check_intercepts) && (ctxt->d & Intercept)) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_POST_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
···

 special_insn:

-	if (unlikely(is_guest_mode) && (ctxt->d & Intercept)) {
+	if (unlikely(check_intercepts) && (ctxt->d & Intercept)) {
 		rc = emulator_check_intercept(ctxt, ctxt->intercept,
 					      X86_ICPT_POST_MEMACCESS);
 		if (rc != X86EMUL_CONTINUE)
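Both CET checks added to the emulator reduce to bit tests on a handful of registers. The following is a hedged userspace sketch that uses a hypothetical `struct toy_cet_state` snapshot in place of the `get_cr()`/`get_msr()` callbacks (MSR read failures, which the kernel treats as "invalid"/"refuse", are not modelled). `toy_ssp_invalid()` mirrors `emulator_is_ssp_invalid()`, which the diff consults only when loading a non-64-bit code segment (`!seg_desc.l`). `toy_must_refuse()` mirrors the decode-time decision to return `EMULATION_FAILED`. The bit positions (SH_STK_EN is bit 0 and ENDBR_EN is bit 2 of IA32_U_CET/IA32_S_CET) follow the architectural layout. All names are local to the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bits in IA32_U_CET / IA32_S_CET (names local to this sketch). */
#define TOY_CET_SHSTK_EN (1ull << 0)
#define TOY_CET_ENDBR_EN (1ull << 2)

/* Hypothetical snapshot of the guest state the emulator would query. */
struct toy_cet_state {
	bool cr4_cet;     /* CR4.CET */
	bool efer_lma;    /* EFER.LMA, i.e. currently in long mode */
	uint64_t u_cet;   /* IA32_U_CET */
	uint64_t s_cet;   /* IA32_S_CET */
	uint64_t ssp;     /* current shadow stack pointer */
};

/*
 * Mirrors emulator_is_ssp_invalid(): when moving from 64-bit mode to a
 * 32-bit code segment with shadow stacks enabled at @cpl, SSP[63:32]
 * must be zero.
 */
static bool toy_ssp_invalid(const struct toy_cet_state *s, unsigned int cpl)
{
	uint64_t cet = (cpl == 3) ? s->u_cet : s->s_cet;

	if (!s->cr4_cet || !s->efer_lma)
		return false;
	if (!(cet & TOY_CET_SHSTK_EN))
		return false;
	return (s->ssp >> 32) != 0;
}

/*
 * Mirrors the decode-time check: refuse to emulate a SHSTK- or IBT-affected
 * instruction if the matching feature is enabled. Far transfers consult
 * both MSRs because the target privilege level is not known yet.
 */
static bool toy_must_refuse(const struct toy_cet_state *s, unsigned int cpl,
			    bool near_branch, bool is_shstk, bool is_ibt)
{
	uint64_t cet;

	if (!(is_shstk || is_ibt) || !s->cr4_cet)
		return false;

	if (!near_branch)
		cet = s->u_cet | s->s_cet;
	else if (cpl == 3)
		cet = s->u_cet;
	else
		cet = s->s_cet;

	return ((cet & TOY_CET_SHSTK_EN) && is_shstk) ||
	       ((cet & TOY_CET_ENDBR_EN) && is_ibt);
}
```

For example, with shadow stacks enabled only in IA32_U_CET, a near CALL at CPL 0 may still be emulated, while a far CALL from the same CPL is refused because it could land at CPL 3.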
+7 -9
arch/x86/kvm/hyperv.c
···
 		return false;
 	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
 }
-EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_hv_assist_page_enabled);

 int kvm_hv_get_assist_page(struct kvm_vcpu *vcpu)
 {
···
 	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data,
 		&hv_vcpu->vp_assist_page, sizeof(struct hv_vp_assist_page));
 }
-EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_hv_get_assist_page);

 static void stimer_prepare_msg(struct kvm_vcpu_hv_stimer *stimer)
 {
···
 	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
 	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);

-	mutex_lock(&hv->hv_lock);
+	guard(mutex)(&hv->hv_lock);

 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_SET ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
-		goto out_unlock;
+		return;

 	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
-		goto out_unlock;
+		return;

 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 	/*
···
 			goto out_err;

 		hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
-		goto out_unlock;
+		return;
 	}

 	/*
···
 		goto out_err;

 	hv->hv_tsc_page_status = HV_TSC_PAGE_SET;
-	goto out_unlock;
+	return;

 out_err:
 	hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
-out_unlock:
-	mutex_unlock(&hv->hv_lock);
 }

 void kvm_hv_request_tsc_page_update(struct kvm *kvm)
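The `guard(mutex)` conversion above relies on scope-based cleanup. The lock is released automatically when the guard variable goes out of scope, so every early `return` drops it and the `out_unlock:` label is no longer needed. The kernel builds its guards on the GCC/Clang `cleanup` variable attribute. The userspace sketch below uses that attribute directly. `struct toy_lock`, `TOY_GUARD()` and `toy_setup_page()` are hypothetical names, not the kernel's `<linux/cleanup.h>` API, and the toy lock only tracks ownership so that a missed release would trip an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records ownership so a missing release is detectable. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(l->held == 0); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held == 1); l->held = 0; }

/* Cleanup callback: the compiler passes a pointer to the annotated variable. */
static void toy_guard_exit(struct toy_lock **lp) { toy_lock_release(*lp); }

/* Hypothetical macro in the spirit of guard(mutex)(&lock): lock now, unlock at scope exit. */
#define TOY_GUARD(l) \
	struct toy_lock *toy_guard_ __attribute__((cleanup(toy_guard_exit))) = \
		(toy_lock_acquire(l), (l))

enum { TOY_PAGE_PENDING, TOY_PAGE_SET, TOY_PAGE_BROKEN };

static struct toy_lock hv_lock;
static int page_status = TOY_PAGE_PENDING;

/* Same shape as the converted function: early returns, one error label. */
static void toy_setup_page(bool enabled, bool write_fails)
{
	TOY_GUARD(&hv_lock);

	if (page_status != TOY_PAGE_PENDING)
		return;                 /* lock released by toy_guard_exit() */
	if (!enabled)
		return;
	if (write_fails)
		goto out_err;

	page_status = TOY_PAGE_SET;
	return;

out_err:
	page_status = TOY_PAGE_BROKEN;
}
```

The `goto out_err` stays legal because both the jump and the label come after the guard's declaration, so control never enters the guarded variable's scope from outside it.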
+1 -14
arch/x86/kvm/ioapic.c
···
+// SPDX-License-Identifier: LGPL-2.1-or-later
 /*
  * Copyright (C) 2001 MandrakeSoft S.A.
  * Copyright 2010 Red Hat, Inc. and/or its affiliates.
···
  * 75002 Paris - France
  * http://www.linux-mandrake.com/
  * http://www.mandrakesoft.com/
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  *
  * Yunhong Jiang <yunhong.jiang@intel.com>
  * Yaozu (Eddie) Dong <eddie.dong@intel.com>
+3 -88
arch/x86/kvm/irq.c
···

 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_has_injectable_intr);

 /*
  * check if there is pending interrupt without
···

 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_has_interrupt);

 /*
  * Read pending interrupt(from non-APIC source)
···
 	WARN_ON_ONCE(!irqchip_split(v->kvm));
 	return get_userspace_extint(v);
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_extint);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_get_extint);

 /*
  * Read pending interrupt vector and intack.
···
 bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
 {
 	return irqchip_in_kernel(kvm);
-}
-
-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
-			     struct kvm_lapic_irq *irq, struct dest_map *dest_map)
-{
-	int r = -1;
-	struct kvm_vcpu *vcpu, *lowest = NULL;
-	unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
-	unsigned int dest_vcpus = 0;
-
-	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
-		return r;
-
-	if (irq->dest_mode == APIC_DEST_PHYSICAL &&
-	    irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
-		pr_info("apic: phys broadcast and lowest prio\n");
-		irq->delivery_mode = APIC_DM_FIXED;
-	}
-
-	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!kvm_apic_present(vcpu))
-			continue;
-
-		if (!kvm_apic_match_dest(vcpu, src, irq->shorthand,
-					irq->dest_id, irq->dest_mode))
-			continue;
-
-		if (!kvm_lowest_prio_delivery(irq)) {
-			if (r < 0)
-				r = 0;
-			r += kvm_apic_set_irq(vcpu, irq, dest_map);
-		} else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
-			if (!kvm_vector_hashing_enabled()) {
-				if (!lowest)
-					lowest = vcpu;
-				else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
-					lowest = vcpu;
-			} else {
-				__set_bit(i, dest_vcpu_bitmap);
-				dest_vcpus++;
-			}
-		}
-	}
-
-	if (dest_vcpus != 0) {
-		int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
-					dest_vcpu_bitmap, KVM_MAX_VCPUS);
-
-		lowest = kvm_get_vcpu(kvm, idx);
-	}
-
-	if (lowest)
-		r = kvm_apic_set_irq(lowest, irq, dest_map);
-
-	return r;
 }

 static void kvm_msi_to_lapic_irq(struct kvm *kvm,
···

 	return 0;
 }
-
-bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
-			     struct kvm_vcpu **dest_vcpu)
-{
-	int r = 0;
-	unsigned long i;
-	struct kvm_vcpu *vcpu;
-
-	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
-		return true;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!kvm_apic_present(vcpu))
-			continue;
-
-		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
-					irq->dest_id, irq->dest_mode))
-			continue;
-
-		if (++r == 2)
-			return false;
-
-		*dest_vcpu = vcpu;
-	}
-
-	return r == 1;
-}
-EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);

 void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
 			 u8 vector, unsigned long *ioapic_handled_vectors)
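This diff moves `kvm_irq_delivery_to_apic()` from irq.c into lapic.c, next to its now-static helpers. For lowest-priority interrupts, the function arbitrates in one of two ways. With vector hashing disabled, it keeps the matching, software-enabled vCPU with the smallest `apic_arb_prio`; the first such vCPU wins ties because the comparison is strict. With hashing enabled, it collects all such vCPUs and picks entry `vector % count`. The following is a minimal userspace sketch of that selection only (fixed and other non-lowest-priority modes, which deliver to every match, are not modelled). It assumes a hypothetical `struct toy_vcpu` that precomputes the `kvm_apic_match_dest()` and `kvm_apic_sw_enabled()` results.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_VCPUS 64

/* Hypothetical per-vCPU summary used only by this sketch. */
struct toy_vcpu {
	bool matches_dest;     /* stand-in for kvm_apic_match_dest() */
	bool apic_sw_enabled;  /* stand-in for kvm_apic_sw_enabled() */
	int arb_prio;          /* stand-in for vcpu->arch.apic_arb_prio */
};

/* Returns the index of the vCPU picked for a lowest-priority IRQ, or -1. */
static int toy_pick_lowest_prio(const struct toy_vcpu *v, int n,
				unsigned int vector, bool vector_hashing)
{
	int cand[TOY_MAX_VCPUS];
	int lowest = -1, count = 0;

	assert(n <= TOY_MAX_VCPUS);

	for (int i = 0; i < n; i++) {
		if (!v[i].matches_dest || !v[i].apic_sw_enabled)
			continue;

		if (!vector_hashing) {
			/* Strict '<' keeps the earliest vCPU on ties. */
			if (lowest < 0 || v[i].arb_prio < v[lowest].arb_prio)
				lowest = i;
		} else {
			cand[count++] = i;   /* like __set_bit(i, dest_vcpu_bitmap) */
		}
	}

	/* Vector hashing: the (vector % count)-th candidate, counting from 0. */
	if (count)
		lowest = cand[vector % count];

	return lowest;
}
```

Hashing spreads different vectors across the candidate set deterministically, whereas priority arbitration always favours the same vCPU until its `apic_arb_prio` changes.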
-4
arch/x86/kvm/irq.h
···

 int apic_has_pending_timer(struct kvm_vcpu *vcpu);

-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
-			     struct kvm_lapic_irq *irq,
-			     struct dest_map *dest_map);
-
 #endif
+2 -1
arch/x86/kvm/kvm_cache_regs.h
···
 #define KVM_POSSIBLE_CR0_GUEST_BITS	(X86_CR0_TS | X86_CR0_WP)
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+	 | X86_CR4_CET)

 #define X86_CR0_PDPTR_BITS    (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
 #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
+1 -2
arch/x86/kvm/kvm_emulate.h
···
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);

 	bool (*is_smm)(struct x86_emulate_ctxt *ctxt);
-	bool (*is_guest_mode)(struct x86_emulate_ctxt *ctxt);
 	int (*leave_smm)(struct x86_emulate_ctxt *ctxt);
 	void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
 	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
···
 #define EMULATION_RESTART 1
 #define EMULATION_INTERCEPTED 2
 void init_decode_cache(struct x86_emulate_ctxt *ctxt);
-int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
+int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, bool check_intercepts);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 			 u16 tss_selector, int idt_index, int reason,
 			 bool has_error_code, u32 error_code);
+3 -3
arch/x86/kvm/kvm_onhyperv.c
···

 	return __hv_flush_remote_tlbs_range(kvm, &range);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs_range);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_flush_remote_tlbs_range);

 int hv_flush_remote_tlbs(struct kvm *kvm)
 {
 	return __hv_flush_remote_tlbs_range(kvm, NULL);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_flush_remote_tlbs);

 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
···
 		spin_unlock(&kvm_arch->hv_root_tdp_lock);
 	}
 }
-EXPORT_SYMBOL_GPL(hv_track_root_tdp);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_track_root_tdp);
+179 -65
arch/x86/kvm/lapic.c
···
 #define LAPIC_TIMER_ADVANCE_NS_MAX     5000
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
+
+static bool __read_mostly vector_hashing_enabled = true;
+module_param_named(vector_hashing, vector_hashing_enabled, bool, 0444);
+
 static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data);
 static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data);

···
 }

 __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
-EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);

 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ);
···
 		(kvm_mwait_in_guest(vcpu->kvm) || kvm_hlt_in_guest(vcpu->kvm));
 }

-bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
+static bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu)
 {
 	return kvm_x86_ops.set_hv_timer
 	       && !(kvm_mwait_in_guest(vcpu->kvm) ||
···
 	return ((max_updated_irr != -1) &&
 		(max_updated_irr == *max_irr));
 }
-EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_apic_update_irr);

 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, unsigned long *pir, int *max_irr)
 {
···
 		apic->irr_pending = true;
 	return irr_updated;
 }
-EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_irr);

 static inline int apic_search_irr(struct kvm_lapic *apic)
 {
···
 {
 	apic_clear_irr(vec, vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_clear_irr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_clear_irr);

 static void *apic_vector_to_isr(int vec, struct kvm_lapic *apic)
 {
···

 	kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
 }
-EXPORT_SYMBOL_GPL(kvm_apic_update_hwapic_isr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_hwapic_isr);

 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
···
 	 */
 	return apic_find_highest_irr(vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_irr);

 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			     int vector, int level, int trig_mode,
···
 {
 	apic_update_ppr(vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_update_ppr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_ppr);

 static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 {
···
 		return false;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_apic_match_dest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_match_dest);

-int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
-		       const unsigned long *bitmap, u32 bitmap_size)
+static int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
+			       const unsigned long *bitmap, u32 bitmap_size)
 {
-	u32 mod;
-	int i, idx = -1;
+	int idx = find_nth_bit(bitmap, bitmap_size, vector % dest_vcpus);

-	mod = vector % dest_vcpus;
-
-	for (i = 0; i <= mod; i++) {
-		idx = find_next_bit(bitmap, bitmap_size, idx + 1);
-		BUG_ON(idx == bitmap_size);
-	}
-
+	BUG_ON(idx >= bitmap_size);
 	return idx;
 }

···
 	}

 	return false;
+}
+
+static bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
+{
+	return (irq->delivery_mode == APIC_DM_LOWEST || irq->msi_redir_hint);
+}
+
+static int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
+{
+	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
 }

 /* Return true if the interrupt can be handled by using *bitmap as index mask
···
 	if (!kvm_lowest_prio_delivery(irq))
 		return true;

-	if (!kvm_vector_hashing_enabled()) {
+	if (!vector_hashing_enabled) {
 		lowest = -1;
 		for_each_set_bit(i, bitmap, 16) {
 			if (!(*dst)[i])
···
  *	   interrupt.
  * - Otherwise, use remapped mode to inject the interrupt.
  */
-bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
-			struct kvm_vcpu **dest_vcpu)
+static bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
+					 struct kvm_lapic_irq *irq,
+					 struct kvm_vcpu **dest_vcpu)
 {
 	struct kvm_apic_map *map;
 	unsigned long bitmap;
···

 	rcu_read_unlock();
 	return ret;
+}
+
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu)
+{
+	int r = 0;
+	unsigned long i;
+	struct kvm_vcpu *vcpu;
+
+	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
+		return true;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		if (++r == 2)
+			return false;
+
+		*dest_vcpu = vcpu;
+	}
+
+	return r == 1;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_intr_is_single_vcpu);
+
+int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
+			     struct kvm_lapic_irq *irq, struct dest_map *dest_map)
+{
+	int r = -1;
+	struct kvm_vcpu *vcpu, *lowest = NULL;
+	unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+	unsigned int dest_vcpus = 0;
+
+	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
+		return r;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL &&
+	    irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
+		pr_info("apic: phys broadcast and lowest prio\n");
+		irq->delivery_mode = APIC_DM_FIXED;
+	}
+
+	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, src, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		if (!kvm_lowest_prio_delivery(irq)) {
+			if (r < 0)
+				r = 0;
+			r += kvm_apic_set_irq(vcpu, irq, dest_map);
+		} else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
+			if (!vector_hashing_enabled) {
+				if (!lowest)
+					lowest = vcpu;
+				else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
+					lowest = vcpu;
+			} else {
+				__set_bit(i, dest_vcpu_bitmap);
+				dest_vcpus++;
+			}
+		}
+	}
+
+	if (dest_vcpus != 0) {
+		int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
+					dest_vcpu_bitmap, KVM_MAX_VCPUS);
+
+		lowest = kvm_get_vcpu(kvm, idx);
+	}
+
+	if (lowest)
+		r = kvm_apic_set_irq(lowest, irq, dest_map);
+
+	return r;
 }

 /*
···
 	rcu_read_unlock();
 }

-int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
-{
-	return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
-}
-
 static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
 {
 	return test_bit(vector, apic->vcpu->arch.ioapic_handled_vectors);
···
 	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_eoi_accelerated);
+
+static void kvm_icr_to_lapic_irq(struct kvm_lapic *apic, u32 icr_low,
+				 u32 icr_high, struct kvm_lapic_irq *irq)
+{
+	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
+	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
+
+	irq->vector = icr_low & APIC_VECTOR_MASK;
+	irq->delivery_mode = icr_low & APIC_MODE_MASK;
+	irq->dest_mode = icr_low & APIC_DEST_MASK;
+	irq->level = (icr_low & APIC_INT_ASSERT) != 0;
+	irq->trig_mode = icr_low & APIC_INT_LEVELTRIG;
+	irq->shorthand = icr_low & APIC_SHORT_MASK;
+	irq->msi_redir_hint = false;
+	if (apic_x2apic_mode(apic))
+		irq->dest_id = icr_high;
+	else
+		irq->dest_id = GET_XAPIC_DEST_FIELD(icr_high);
+}

 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
 	struct kvm_lapic_irq irq;

-	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
-	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
-
-	irq.vector = icr_low & APIC_VECTOR_MASK;
-	irq.delivery_mode = icr_low & APIC_MODE_MASK;
-	irq.dest_mode = icr_low & APIC_DEST_MASK;
-	irq.level = (icr_low & APIC_INT_ASSERT) != 0;
-	irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
-	irq.shorthand = icr_low & APIC_SHORT_MASK;
-	irq.msi_redir_hint = false;
-	if (apic_x2apic_mode(apic))
-		irq.dest_id = icr_high;
-	else
-		irq.dest_id = GET_XAPIC_DEST_FIELD(icr_high);
+	kvm_icr_to_lapic_irq(apic, icr_low, icr_high, &irq);

 	trace_kvm_apic_ipi(icr_low, irq.dest_id);

 	kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_send_ipi);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_send_ipi);

 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
···

 	return valid_reg_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_readable_reg_mask);

 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 			      void *data)
···
 	    lapic_timer_int_injected(vcpu))
 		__kvm_wait_lapic_expire(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_wait_lapic_expire);

 static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
 {
···
 out:
 	preempt_enable();
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_expired_hv_timer);

 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
···
 {
 	kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_set_eoi);

 #define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13))

-int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
+static int __kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data, bool fast)
 {
 	if (data & X2APIC_ICR_RESERVED_BITS)
 		return 1;
···
 	 */
 	data &= ~APIC_ICR_BUSY;

-	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
+	if (fast) {
+		struct kvm_lapic_irq irq;
+		int ignored;
+
+		kvm_icr_to_lapic_irq(apic, (u32)data, (u32)(data >> 32), &irq);
+
+		if (!kvm_irq_delivery_to_apic_fast(apic->vcpu->kvm, apic, &irq,
+						   &ignored, NULL))
+			return -EWOULDBLOCK;
+
+		trace_kvm_apic_ipi((u32)data, irq.dest_id);
+	} else {
+		kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
+	}
 	if (kvm_x86_ops.x2apic_icr_is_split) {
 		kvm_lapic_set_reg(apic, APIC_ICR, data);
 		kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
···
 	}
 	trace_kvm_apic_write(APIC_ICR, data);
 	return 0;
+}
+
+static int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
+{
+	return __kvm_x2apic_icr_write(apic, data, false);
+}
+
+int kvm_x2apic_icr_write_fast(struct kvm_lapic *apic, u64 data)
+{
+	return __kvm_x2apic_icr_write(apic, data, true);
 }

 static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
···
 	else
 		kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
 }
-EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_write_nodecode);

 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
···
 	kvm_recalculate_apic_map(vcpu->kvm);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_apic_set_base);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_base);

 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 {
···
 int kvm_alloc_apic_access_page(struct kvm *kvm)
 {
 	void __user *hva;
-	int ret = 0;

-	mutex_lock(&kvm->slots_lock);
+	guard(mutex)(&kvm->slots_lock);
+
 	if (kvm->arch.apic_access_memslot_enabled ||
 	    kvm->arch.apic_access_memslot_inhibited)
-		goto out;
+		return 0;

 	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
 				      APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
-	if (IS_ERR(hva)) {
-		ret = PTR_ERR(hva);
-		goto out;
-	}
+	if (IS_ERR(hva))
+		return PTR_ERR(hva);

 	kvm->arch.apic_access_memslot_enabled = true;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return ret;
+
+	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_alloc_apic_access_page);

 void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
 {
···
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_has_interrupt);

 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
···
 	}

 }
-EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_ack_interrupt);

 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s, bool set)
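The rewrite of `kvm_vector_to_index()` replaces `mod + 1` calls to `find_next_bit()` with a single `find_nth_bit()` lookup. Both return the position of the `(vector % dest_vcpus)`-th set bit, counting from zero. The userspace sketch below checks that equivalence. The bit-scanning helpers (`toy_find_next_bit()`, `toy_find_nth_bit()`) are hypothetical, linear-scan versions written to the same return contracts as the kernel's bitmap functions ("first set bit at or after start, else size" and "n-th set bit, else at least size"). They are not the kernel implementations.

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int toy_test_bit(const unsigned long *map, unsigned int bit)
{
	return (map[bit / TOY_BITS_PER_LONG] >> (bit % TOY_BITS_PER_LONG)) & 1;
}

/* Same contract as find_next_bit(): first set bit >= start, or size if none. */
static unsigned int toy_find_next_bit(const unsigned long *map, unsigned int size,
				      unsigned int start)
{
	for (unsigned int i = start; i < size; i++)
		if (toy_test_bit(map, i))
			return i;
	return size;
}

/* Same contract as find_nth_bit(): index of the n-th (0-based) set bit, or size. */
static unsigned int toy_find_nth_bit(const unsigned long *map, unsigned int size,
				     unsigned int n)
{
	for (unsigned int i = 0; i < size; i++)
		if (toy_test_bit(map, i) && n-- == 0)
			return i;
	return size;
}

/* Old strategy: step through the bitmap mod + 1 times. */
static int toy_old_vector_to_index(unsigned int vector, unsigned int dest_vcpus,
				   const unsigned long *map, unsigned int size)
{
	unsigned int mod = vector % dest_vcpus;
	int idx = -1;

	for (unsigned int i = 0; i <= mod; i++)
		idx = toy_find_next_bit(map, size, idx + 1);
	return idx;
}

/* New strategy: one n-th-set-bit lookup. */
static int toy_new_vector_to_index(unsigned int vector, unsigned int dest_vcpus,
				   const unsigned long *map, unsigned int size)
{
	return toy_find_nth_bit(map, size, vector % dest_vcpus);
}
```

With a bitmap of candidate vCPUs {0, 2, 3, 5} (`0x2D`) and four candidates, vector 6 selects entry 6 % 4 = 2, which is vCPU 3, under both strategies.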
+6 -13
arch/x86/kvm/lapic.h
···
 void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu);
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			   int shorthand, unsigned int dest, int dest_mode);
-int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec);
 bool __kvm_apic_update_irr(unsigned long *pir, void *regs, int *max_irr);
 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, unsigned long *pir, int *max_irr);
···
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
+int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
+			     struct kvm_lapic_irq *irq,
+			     struct dest_map *dest_map);
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
 
 int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
···
 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu);
 
-int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data);
+int kvm_x2apic_icr_write_fast(struct kvm_lapic *apic, u64 data);
 int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 
···
 	       !kvm_x86_call(apic_init_signal_blocked)(vcpu);
 }
 
-static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
-{
-	return (irq->delivery_mode == APIC_DM_LOWEST ||
-			irq->msi_redir_hint);
-}
-
 static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 {
 	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
···
 void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			      unsigned long *vcpu_bitmap);
 
-bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
-				  struct kvm_vcpu **dest_vcpu);
-int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
-			const unsigned long *bitmap, u32 bitmap_size);
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu);
 void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
 void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
-bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu);
 
 static inline enum lapic_mode kvm_apic_mode(u64 apic_base)
 {
+1 -1
arch/x86/kvm/mmu.h
···
 
 	fault = (mmu->permissions[index] >> pte_access) & 1;
 
-	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
+	WARN_ON_ONCE(pfec & (PFERR_PK_MASK | PFERR_SS_MASK | PFERR_RSVD_MASK));
 	if (unlikely(mmu->pkru_mask)) {
 		u32 pkru_bits, offset;
 
+127 -74
arch/x86/kvm/mmu/mmu.c
···
 #ifdef CONFIG_X86_64
 bool __read_mostly tdp_mmu_enabled = true;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
-EXPORT_SYMBOL_GPL(tdp_mmu_enabled);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(tdp_mmu_enabled);
 #endif
 
 static int max_huge_page_level __read_mostly;
···
 	kvm_flush_remote_tlbs_gfn(kvm, gfn, PG_LEVEL_4K);
 }
 
-void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 enum kvm_mmu_type mmu_type)
 {
 	/*
 	 * If it's possible to replace the shadow page with an NX huge page,
···
 		return;
 
 	++kvm->stat.nx_lpage_splits;
+	++kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages;
 	list_add_tail(&sp->possible_nx_huge_page_link,
-		      &kvm->arch.possible_nx_huge_pages);
+		      &kvm->arch.possible_nx_huge_pages[mmu_type].pages);
 }
 
 static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
···
 	sp->nx_huge_page_disallowed = true;
 
 	if (nx_huge_page_possible)
-		track_possible_nx_huge_page(kvm, sp);
+		track_possible_nx_huge_page(kvm, sp, KVM_SHADOW_MMU);
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
···
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
-void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				   enum kvm_mmu_type mmu_type)
 {
 	if (list_empty(&sp->possible_nx_huge_page_link))
 		return;
 
 	--kvm->stat.nx_lpage_splits;
+	--kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages;
 	list_del_init(&sp->possible_nx_huge_page_link);
 }
 
···
 {
 	sp->nx_huge_page_disallowed = false;
 
-	untrack_possible_nx_huge_page(kvm, sp);
+	untrack_possible_nx_huge_page(kvm, sp, KVM_SHADOW_MMU);
 }
 
 static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
···
 		write_unlock(&kvm->mmu_lock);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_roots);
 
 void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 {
···
 
 	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_guest_mode_roots);
 
 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 			    u8 level)
···
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
 	 * or moved. This ensures any existing SPTEs for the old memslot will
-	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
+	 * be zapped before KVM inserts a new MMIO SPTE for the gfn. Punt the
+	 * error to userspace if this is a prefault, as KVM's prefaulting ABI
+	 * doesn't provide the same forward progress guarantees as KVM_RUN.
 	 */
-	if (slot->flags & KVM_MEMSLOT_INVALID)
+	if (slot->flags & KVM_MEMSLOT_INVALID) {
+		if (fault->prefetch)
+			return -EAGAIN;
+
 		return RET_PF_RETRY;
+	}
 
 	if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
 		/*
···
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_page_fault);
 
 #ifdef CONFIG_X86_64
 static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
···
 		return -EIO;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_map_page);
 
 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range)
···
 			__clear_sp_write_flooding_count(sp);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_new_pgd);
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access)
···
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 	kvm_mmu_new_pgd(vcpu, nested_cr3);
 }
-EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_shadow_npt_mmu);
 
 static union kvm_cpu_role
 kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
···
 
 	kvm_mmu_new_pgd(vcpu, new_eptp);
 }
-EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_shadow_ept_mmu);
 
 static void init_kvm_softmmu(struct kvm_vcpu *vcpu,
 			     union kvm_cpu_role cpu_role)
···
 	else
 		init_kvm_softmmu(vcpu, cpu_role);
 }
-EXPORT_SYMBOL_GPL(kvm_init_mmu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_mmu);
 
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
···
 	kvm_mmu_unload(vcpu);
 	kvm_init_mmu(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_reset_context);
 
 int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
···
 out:
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_load);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
···
 	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.root_mmu);
 	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.guest_mmu);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_free_obsolete_roots);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_obsolete_roots);
 
 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
 				    int *bytes)
···
 	return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
 				       insn_len);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_page_fault);
 
 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg)
 {
···
 		pr_cont(", spte[%d] = 0x%llx", level, sptes[level]);
 	pr_cont("\n");
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_print_sptes);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_print_sptes);
 
 static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				      u64 addr, hpa_t root_hpa)
···
 			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 		}
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invalidate_addr);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
···
 	kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
 	++vcpu->stat.invlpg;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invlpg);
 
 
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
···
 	else
 		max_huge_page_level = PG_LEVEL_2M;
 }
-EXPORT_SYMBOL_GPL(kvm_configure_mmu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_configure_mmu);
 
 static void free_mmu_pages(struct kvm_mmu *mmu)
 {
···
 
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	int r;
+	int r, i;
 
 	kvm->arch.shadow_mmio_value = shadow_mmio_value;
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
-	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
+	for (i = 0; i < KVM_NR_MMU_TYPES; ++i)
+		INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages[i].pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
 	if (tdp_mmu_enabled) {
···
 
 	return need_tlb_flush;
 }
-EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_zap_gfn_range);
 
 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 					   const struct kvm_memory_slot *slot)
···
 	return err;
 }
 
-static void kvm_recover_nx_huge_pages(struct kvm *kvm)
+static unsigned long nx_huge_pages_to_zap(struct kvm *kvm,
+					  enum kvm_mmu_type mmu_type)
 {
-	unsigned long nx_lpage_splits = kvm->stat.nx_lpage_splits;
+	unsigned long pages = READ_ONCE(kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages);
+	unsigned int ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
+
+	return ratio ? DIV_ROUND_UP(pages, ratio) : 0;
+}
+
+static bool kvm_mmu_sp_dirty_logging_enabled(struct kvm *kvm,
+					     struct kvm_mmu_page *sp)
+{
 	struct kvm_memory_slot *slot;
-	int rcu_idx;
+
+	/*
+	 * Skip the memslot lookup if dirty tracking can't possibly be enabled,
+	 * as memslot lookups are relatively expensive.
+	 *
+	 * If a memslot update is in progress, reading an incorrect value of
+	 * kvm->nr_memslots_dirty_logging is not a problem: if it is becoming
+	 * zero, KVM will do an unnecessary memslot lookup; if it is becoming
+	 * nonzero, the page will be zapped unnecessarily. Either way, this
+	 * only affects efficiency in racy situations, and not correctness.
+	 */
+	if (!atomic_read(&kvm->nr_memslots_dirty_logging))
+		return false;
+
+	slot = __gfn_to_memslot(kvm_memslots_for_spte_role(kvm, sp->role), sp->gfn);
+	if (WARN_ON_ONCE(!slot))
+		return false;
+
+	return kvm_slot_dirty_track_enabled(slot);
+}
+
+static void kvm_recover_nx_huge_pages(struct kvm *kvm,
+				      const enum kvm_mmu_type mmu_type)
+{
+#ifdef CONFIG_X86_64
+	const bool is_tdp_mmu = mmu_type == KVM_TDP_MMU;
+	spinlock_t *tdp_mmu_pages_lock = &kvm->arch.tdp_mmu_pages_lock;
+#else
+	const bool is_tdp_mmu = false;
+	spinlock_t *tdp_mmu_pages_lock = NULL;
+#endif
+	unsigned long to_zap = nx_huge_pages_to_zap(kvm, mmu_type);
+	struct list_head *nx_huge_pages;
 	struct kvm_mmu_page *sp;
-	unsigned int ratio;
 	LIST_HEAD(invalid_list);
 	bool flush = false;
-	ulong to_zap;
+	int rcu_idx;
+
+	nx_huge_pages = &kvm->arch.possible_nx_huge_pages[mmu_type].pages;
 
 	rcu_idx = srcu_read_lock(&kvm->srcu);
-	write_lock(&kvm->mmu_lock);
+	if (is_tdp_mmu)
+		read_lock(&kvm->mmu_lock);
+	else
+		write_lock(&kvm->mmu_lock);
 
 	/*
 	 * Zapping TDP MMU shadow pages, including the remote TLB flush, must
···
 	 */
 	rcu_read_lock();
 
-	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
-	to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0;
 	for ( ; to_zap; --to_zap) {
-		if (list_empty(&kvm->arch.possible_nx_huge_pages))
+		if (is_tdp_mmu)
+			spin_lock(tdp_mmu_pages_lock);
+
+		if (list_empty(nx_huge_pages)) {
+			if (is_tdp_mmu)
+				spin_unlock(tdp_mmu_pages_lock);
 			break;
+		}
 
 		/*
 		 * We use a separate list instead of just using active_mmu_pages
···
 		 * the total number of shadow pages. And because the TDP MMU
 		 * doesn't use active_mmu_pages.
 		 */
-		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
+		sp = list_first_entry(nx_huge_pages,
 				      struct kvm_mmu_page,
 				      possible_nx_huge_page_link);
 		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
 		WARN_ON_ONCE(!sp->role.direct);
 
-		/*
-		 * Unaccount and do not attempt to recover any NX Huge Pages
-		 * that are being dirty tracked, as they would just be faulted
-		 * back in as 4KiB pages. The NX Huge Pages in this slot will be
-		 * recovered, along with all the other huge pages in the slot,
-		 * when dirty logging is disabled.
-		 *
-		 * Since gfn_to_memslot() is relatively expensive, it helps to
-		 * skip it if it the test cannot possibly return true. On the
-		 * other hand, if any memslot has logging enabled, chances are
-		 * good that all of them do, in which case unaccount_nx_huge_page()
-		 * is much cheaper than zapping the page.
-		 *
-		 * If a memslot update is in progress, reading an incorrect value
-		 * of kvm->nr_memslots_dirty_logging is not a problem: if it is
-		 * becoming zero, gfn_to_memslot() will be done unnecessarily; if
-		 * it is becoming nonzero, the page will be zapped unnecessarily.
-		 * Either way, this only affects efficiency in racy situations,
-		 * and not correctness.
-		 */
-		slot = NULL;
-		if (atomic_read(&kvm->nr_memslots_dirty_logging)) {
-			struct kvm_memslots *slots;
+		unaccount_nx_huge_page(kvm, sp);
 
-			slots = kvm_memslots_for_spte_role(kvm, sp->role);
-			slot = __gfn_to_memslot(slots, sp->gfn);
-			WARN_ON_ONCE(!slot);
+		if (is_tdp_mmu)
+			spin_unlock(tdp_mmu_pages_lock);
+
+		/*
+		 * Do not attempt to recover any NX Huge Pages that are being
+		 * dirty tracked, as they would just be faulted back in as 4KiB
+		 * pages. The NX Huge Pages in this slot will be recovered,
+		 * along with all the other huge pages in the slot, when dirty
+		 * logging is disabled.
+		 */
+		if (!kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) {
+			if (is_tdp_mmu)
+				flush |= kvm_tdp_mmu_zap_possible_nx_huge_page(kvm, sp);
+			else
+				kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+
 		}
 
-		if (slot && kvm_slot_dirty_track_enabled(slot))
-			unaccount_nx_huge_page(kvm, sp);
-		else if (is_tdp_mmu_page(sp))
-			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
-		else
-			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
 			rcu_read_unlock();
 
-			cond_resched_rwlock_write(&kvm->mmu_lock);
-			flush = false;
+			if (is_tdp_mmu)
+				cond_resched_rwlock_read(&kvm->mmu_lock);
+			else
+				cond_resched_rwlock_write(&kvm->mmu_lock);
 
+			flush = false;
 			rcu_read_lock();
 		}
 	}
···
 
 	rcu_read_unlock();
 
-	write_unlock(&kvm->mmu_lock);
+	if (is_tdp_mmu)
+		read_unlock(&kvm->mmu_lock);
+	else
+		write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, rcu_idx);
 }
 
···
 static bool kvm_nx_huge_page_recovery_worker(void *data)
 {
 	struct kvm *kvm = data;
+	long remaining_time;
 	bool enabled;
 	uint period;
-	long remaining_time;
+	int i;
 
 	enabled = calc_nx_huge_pages_recovery_period(&period);
 	if (!enabled)
···
 	}
 
 	__set_current_state(TASK_RUNNING);
-	kvm_recover_nx_huge_pages(kvm);
+	for (i = 0; i < KVM_NR_MMU_TYPES; ++i)
+		kvm_recover_nx_huge_pages(kvm, i);
 	kvm->arch.nx_huge_page_last = get_jiffies_64();
 	return true;
 }
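nx_huge_pages_to_zap() above budgets recovery per MMU type: each pass zaps ceil(nr_pages / ratio) pages from one type's list, and a ratio of 0 disables recovery. A minimal sketch of that arithmetic, assuming a hypothetical two-entry enum in place of enum kvm_mmu_type and a local DIV_ROUND_UP(); pages_to_zap() and total_to_zap() are illustrative names only.

```c
#include <assert.h>

/* Local stand-in for the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for enum kvm_mmu_type. */
enum mmu_type { SHADOW_MMU, TDP_MMU, NR_MMU_TYPES };

/*
 * Same arithmetic as nx_huge_pages_to_zap(): recover 1/ratio of the
 * tracked pages per pass, rounded up so a non-empty list always makes
 * progress; a ratio of 0 disables recovery.
 */
static unsigned long pages_to_zap(unsigned long nr_pages, unsigned int ratio)
{
	return ratio ? DIV_ROUND_UP(nr_pages, ratio) : 0;
}

/* Sum of the per-type budgets, as the worker loops over every MMU type. */
static unsigned long total_to_zap(const unsigned long nr_pages[NR_MMU_TYPES],
				  unsigned int ratio)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < NR_MMU_TYPES; ++i)
		total += pages_to_zap(nr_pages[i], ratio);
	return total;
}
```

Because the rounding is applied per type in this sketch, the combined budget can exceed a single DIV_ROUND_UP() over the sum: 1 and 121 pages at ratio 60 give 1 + 3 = 4, versus 3 for 122 pages counted together.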
+4 -2
arch/x86/kvm/mmu/mmu_internal.h
···
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
 
-void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
-void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 enum kvm_mmu_type mmu_type);
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				   enum kvm_mmu_type mmu_type);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
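The new prototypes thread an MMU type through track/untrack so that each type keeps its own list and page count. A simplified, self-contained sketch of that bookkeeping follows; it assumes a hand-rolled circular list in place of struct list_head, and struct nx_list, track() and untrack() are hypothetical names. As in the diff, the caller must untrack with the same type it tracked with, otherwise the per-type counts drift.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum kvm_mmu_type. */
enum mmu_type { SHADOW_MMU, TDP_MMU, NR_MMU_TYPES };

/* Minimal circular doubly linked list node, standing in for struct list_head. */
struct node {
	struct node *prev, *next;
};

/* One list plus page count per MMU type, cf. possible_nx_huge_pages[type]. */
struct nx_list {
	struct node head;
	unsigned long nr_pages;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
}

static void nx_lists_init(struct nx_list lists[NR_MMU_TYPES])
{
	int i;

	for (i = 0; i < NR_MMU_TYPES; ++i) {
		node_init(&lists[i].head);
		lists[i].nr_pages = 0;
	}
}

/* A self-linked node is on no list, cf. list_empty() on the link. */
static bool node_unlinked(const struct node *n)
{
	return n->next == n;
}

/* Append @n to the list for @type and bump that type's count. */
static void track(struct nx_list lists[NR_MMU_TYPES], struct node *n,
		  enum mmu_type type)
{
	struct nx_list *l = &lists[type];

	if (!node_unlinked(n))
		return;

	++l->nr_pages;
	n->prev = l->head.prev;		/* tail insert, cf. list_add_tail() */
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
}

/* Unlink @n and drop the count; @type must match the one used to track. */
static void untrack(struct nx_list lists[NR_MMU_TYPES], struct node *n,
		    enum mmu_type type)
{
	if (node_unlinked(n))
		return;

	--lists[type].nr_pages;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);			/* cf. list_del_init() */
}
```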
+3
arch/x86/kvm/mmu/mmutrace.h
···
 	{ PFERR_PRESENT_MASK, "P" },	\
 	{ PFERR_WRITE_MASK, "W" },	\
 	{ PFERR_USER_MASK, "U" },	\
+	{ PFERR_PK_MASK, "PK" },	\
+	{ PFERR_SS_MASK, "SS" },	\
+	{ PFERR_SGX_MASK, "SGX" },	\
 	{ PFERR_RSVD_MASK, "RSVD" },	\
 	{ PFERR_FETCH_MASK, "F" }
 
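The tracepoint's flag table now also decodes the PK, SS and SGX bits of the page-fault error code. As an illustration only, the sketch below renders an error code from a flag table in the same order; the bit positions are the architectural #PF error-code bits and are assumed to match the kernel's PFERR_*_MASK values, the '|' separator is an assumption, and unknown bits are simply dropped here. pferr_to_str() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural #PF error-code bits; assumed to match the PFERR_*_MASK values. */
#define PF_PRESENT	(1u << 0)
#define PF_WRITE	(1u << 1)
#define PF_USER		(1u << 2)
#define PF_RSVD		(1u << 3)
#define PF_FETCH	(1u << 4)
#define PF_PK		(1u << 5)
#define PF_SS		(1u << 6)
#define PF_SGX		(1u << 15)

/* Same order and names as the tracepoint's flag table in the diff. */
static const struct {
	uint32_t mask;
	const char *name;
} pf_flags[] = {
	{ PF_PRESENT, "P" }, { PF_WRITE, "W" }, { PF_USER, "U" },
	{ PF_PK, "PK" }, { PF_SS, "SS" }, { PF_SGX, "SGX" },
	{ PF_RSVD, "RSVD" }, { PF_FETCH, "F" },
};

/*
 * Write the names of all set flags, '|'-separated, into @buf (@len >= 1).
 * Unknown bits are ignored and output is truncated if @buf is too small.
 */
static void pferr_to_str(uint32_t pfec, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(pf_flags) / sizeof(pf_flags[0]); i++) {
		if (!(pfec & pf_flags[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, pf_flags[i].name, len - strlen(buf) - 1);
	}
}
```

With this table, a shadow-stack write fault on a present page (bits 0, 1 and 6) renders as "P|W|SS".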
+5 -5
arch/x86/kvm/mmu/spte.c
···
 bool __read_mostly enable_mmio_caching = true;
 static bool __ro_after_init allow_mmio_caching;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
-EXPORT_SYMBOL_GPL(enable_mmio_caching);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mmio_caching);
 
 bool __read_mostly kvm_ad_enabled;
 
···
 	shadow_mmio_mask = mmio_mask;
 	shadow_mmio_access_mask = access_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_mmio_spte_mask);
 
 void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value)
 {
 	kvm->arch.shadow_mmio_value = mmio_value;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_mmio_spte_value);
 
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 {
···
 	shadow_me_value = me_value;
 	shadow_me_mask = me_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_me_spte_mask);
 
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 {
···
 	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
 				   VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_ept_masks);
 
 void kvm_mmu_reset_all_pte_masks(void)
 {
+40 -11
arch/x86/kvm/mmu/tdp_mmu.c
···
 
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	sp->nx_huge_page_disallowed = false;
-	untrack_possible_nx_huge_page(kvm, sp);
+	untrack_possible_nx_huge_page(kvm, sp, KVM_TDP_MMU);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }
 
···
 	rcu_read_unlock();
 }
 
-bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+bool kvm_tdp_mmu_zap_possible_nx_huge_page(struct kvm *kvm,
+					   struct kvm_mmu_page *sp)
 {
-	u64 old_spte;
+	struct tdp_iter iter = {
+		.old_spte = sp->ptep ? kvm_tdp_mmu_read_spte(sp->ptep) : 0,
+		.sptep = sp->ptep,
+		.level = sp->role.level + 1,
+		.gfn = sp->gfn,
+		.as_id = kvm_mmu_page_as_id(sp),
+	};
+
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	if (WARN_ON_ONCE(!is_tdp_mmu_page(sp)))
+		return false;
 
 	/*
-	 * This helper intentionally doesn't allow zapping a root shadow page,
-	 * which doesn't have a parent page table and thus no associated entry.
+	 * Root shadow pages don't have a parent page table and thus no
+	 * associated entry, but they can never be possible NX huge pages.
 	 */
 	if (WARN_ON_ONCE(!sp->ptep))
 		return false;
 
-	old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
-	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
+	/*
+	 * Since mmu_lock is held in read mode, it's possible another task has
+	 * already modified the SPTE. Zap the SPTE if and only if the SPTE
+	 * points at the SP's page table, as checking shadow-present isn't
+	 * sufficient, e.g. the SPTE could be replaced by a leaf SPTE, or even
+	 * another SP. Note, spte_to_child_pt() also checks that the SPTE is
+	 * shadow-present, i.e. guards against zapping a frozen SPTE.
+	 */
+	if ((tdp_ptep_t)sp->spt != spte_to_child_pt(iter.old_spte, iter.level))
 		return false;
 
-	tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte,
-			 SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1);
+	/*
+	 * If a different task modified the SPTE, then it should be impossible
+	 * for the SPTE to still be used for the to-be-zapped SP. Non-leaf
+	 * SPTEs don't have Dirty bits, KVM always sets the Accessed bit when
+	 * creating non-leaf SPTEs, and all other bits are immutable for non-
+	 * leaf SPTEs, i.e. the only legal operations for non-leaf SPTEs are
+	 * zapping and replacement.
+	 */
+	if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) {
+		WARN_ON_ONCE((tdp_ptep_t)sp->spt == spte_to_child_pt(iter.old_spte, iter.level));
+		return false;
+	}
 
 	return true;
 }
···
 		    fault->req_level >= iter.level) {
 			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 			if (sp->nx_huge_page_disallowed)
-				track_possible_nx_huge_page(kvm, sp);
+				track_possible_nx_huge_page(kvm, sp, KVM_TDP_MMU);
 			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 		}
 	}
···
 	spte = sptes[leaf];
 	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
 }
-EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_mmu_gpa_is_mapped);
 
 /*
  * Returns the last level spte pointer of the shadow page walk for the given
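Because kvm_tdp_mmu_zap_possible_nx_huge_page() now runs with mmu_lock held for read, it zaps the parent SPTE only if that SPTE still points at the shadow page's table, and it does so with an atomic update that fails if another task changed the entry first. A minimal C11 sketch of the same check-then-compare-and-exchange pattern follows, using a hypothetical entry encoding (child address with a low present bit) rather than real SPTEs; zap_if_points_at() and the ENTRY_* constants are illustrative names. Note that tdp_mmu_set_spte_atomic() returns nonzero on failure, whereas this sketch returns true on success.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical entry encoding: a non-leaf entry holds a child table
 * address (nonzero, at least 2-byte aligned) with bit 0 as "present".
 */
#define ENTRY_PRESENT		0x1ULL
#define ENTRY_NONPRESENT	0x0ULL

static uint64_t make_nonleaf(uint64_t child)
{
	return child | ENTRY_PRESENT;
}

/* Child address for a present entry, 0 otherwise (cf. spte_to_child_pt()). */
static uint64_t entry_to_child(uint64_t entry)
{
	return (entry & ENTRY_PRESENT) ? (entry & ~ENTRY_PRESENT) : 0;
}

/*
 * Zap *entry only if it still points at @child. The check is done on a
 * snapshot, and the compare-and-exchange fails (leaving the entry alone)
 * if another thread changed it after the snapshot was taken.
 */
static bool zap_if_points_at(_Atomic uint64_t *entry, uint64_t child)
{
	uint64_t old = atomic_load(entry);

	if (entry_to_child(old) != child)
		return false;

	return atomic_compare_exchange_strong(entry, &old, ENTRY_NONPRESENT);
}
```

A second zap attempt on the same entry returns false, since the now non-present entry no longer points at any child.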
+2 -1
arch/x86/kvm/mmu/tdp_mmu.h
···
 }
 
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush);
-bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
+bool kvm_tdp_mmu_zap_possible_nx_huge_page(struct kvm *kvm,
+					   struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm,
 				  enum kvm_tdp_mmu_root_types root_types);
+140 -35
arch/x86/kvm/pmu.c
···
 /* This is enough to filter the vast majority of currently defined events. */
 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
 
-struct x86_pmu_capability __read_mostly kvm_pmu_cap;
-EXPORT_SYMBOL_GPL(kvm_pmu_cap);
+/* Unadultered PMU capabilities of the host, i.e. of hardware. */
+static struct x86_pmu_capability __read_mostly kvm_host_pmu;
 
-struct kvm_pmu_emulated_event_selectors __read_mostly kvm_pmu_eventsel;
-EXPORT_SYMBOL_GPL(kvm_pmu_eventsel);
+/* KVM's PMU capabilities, i.e. the intersection of KVM and hardware support. */
+struct x86_pmu_capability __read_mostly kvm_pmu_cap;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_cap);
+
+struct kvm_pmu_emulated_event_selectors {
+	u64 INSTRUCTIONS_RETIRED;
+	u64 BRANCH_INSTRUCTIONS_RETIRED;
+};
+static struct kvm_pmu_emulated_event_selectors __read_mostly kvm_pmu_eventsel;
 
 /* Precise Distribution of Instructions Retired (PDIR) */
 static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = {
···
 #define KVM_X86_PMU_OP_OPTIONAL __KVM_X86_PMU_OP
 #include <asm/kvm-x86-pmu-ops.h>
 #undef __KVM_X86_PMU_OP
+}
+
+void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
+{
+	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
+	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
+
+	perf_get_x86_pmu_capability(&kvm_host_pmu);
+
+	/*
+	 * Hybrid PMUs don't play nice with virtualization without careful
+	 * configuration by userspace, and KVM's APIs for reporting supported
+	 * vPMU features do not account for hybrid PMUs. Disable vPMU support
+	 * for hybrid PMUs until KVM gains a way to let userspace opt-in.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
+		enable_pmu = false;
+
+	if (enable_pmu) {
+		/*
+		 * WARN if perf did NOT disable hardware PMU if the number of
+		 * architecturally required GP counters aren't present, i.e. if
+		 * there are a non-zero number of counters, but fewer than what
+		 * is architecturally required.
+		 */
+		if (!kvm_host_pmu.num_counters_gp ||
+		    WARN_ON_ONCE(kvm_host_pmu.num_counters_gp < min_nr_gp_ctrs))
+			enable_pmu = false;
+		else if (is_intel && !kvm_host_pmu.version)
+			enable_pmu = false;
+	}
+
+	if (!enable_pmu) {
+		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
+		return;
+	}
+
+	memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu));
+	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
+	kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
+					  pmu_ops->MAX_NR_GP_COUNTERS);
+	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
+					     KVM_MAX_NR_FIXED_COUNTERS);
+
+	kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
+		perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
+	kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
+		perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 }
 
 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
···
 	pmc->counter &= pmc_bitmask(pmc);
 	pmc_update_sample_period(pmc);
 }
-EXPORT_SYMBOL_GPL(pmc_write_counter);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(pmc_write_counter);
 
 static int filter_cmp(const void *pa, const void *pb, u64 mask)
 {
···
 	return true;
 }
 
-static bool check_pmu_event_filter(struct kvm_pmc *pmc)
+static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
 {
 	struct kvm_x86_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
···
 	return is_fixed_event_allowed(filter, pmc->idx);
 }
 
-static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
-{
-	return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) &&
-	       check_pmu_event_filter(pmc);
-}
-
 static int reprogram_counter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
···
 
 	emulate_overflow = pmc_pause_counter(pmc);
 
-	if (!pmc_event_is_allowed(pmc))
+	if (!pmc_is_globally_enabled(pmc) || !pmc_is_locally_enabled(pmc) ||
+	    !pmc_is_event_allowed(pmc))
 		return 0;
 
 	if (emulate_overflow)
···
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
 			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
 }
+
+static bool pmc_is_event_match(struct kvm_pmc *pmc, u64 eventsel)
+{
+	/*
+	 * Ignore checks for edge detect (all events currently emulated by KVM
+	 * are always rising edges), pin control (unsupported by modern CPUs),
+	 * and counter mask and its invert flag (KVM doesn't emulate multiple
+	 * events in a single clock cycle).
+	 *
+	 * Note, the uppermost nibble of AMD's mask overlaps Intel's IN_TX (bit
+	 * 32) and IN_TXCP (bit 33), as well as two reserved bits (bits 35:34).
+	 * Checking the "in HLE/RTM transaction" flags is correct as the vCPU
+	 * can't be in a transaction if KVM is emulating an instruction.
+	 *
+	 * Checking the reserved bits might be wrong if they are defined in the
+	 * future, but so could ignoring them, so do the simple thing for now.
510 + */ 511 + return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); 512 + } 513 + 514 + void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc) 515 + { 516 + bitmap_clear(pmu->pmc_counting_instructions, pmc->idx, 1); 517 + bitmap_clear(pmu->pmc_counting_branches, pmc->idx, 1); 518 + 519 + /* 520 + * Do NOT consult the PMU event filters, as the filters must be checked 521 + * at the time of emulation to ensure KVM uses fresh information, e.g. 522 + * omitting a PMC from a bitmap could result in a missed event if the 523 + * filter is changed to allow counting the event. 524 + */ 525 + if (!pmc_is_locally_enabled(pmc)) 526 + return; 527 + 528 + if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED)) 529 + bitmap_set(pmu->pmc_counting_instructions, pmc->idx, 1); 530 + 531 + if (pmc_is_event_match(pmc, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED)) 532 + bitmap_set(pmu->pmc_counting_branches, pmc->idx, 1); 533 + } 534 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_recalc_pmc_emulation); 544 535 545 536 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu) 546 537 { ··· 618 527 */ 619 528 if (unlikely(pmu->need_cleanup)) 620 529 kvm_pmu_cleanup(vcpu); 530 + 531 + kvm_for_each_pmc(pmu, pmc, bit, bitmap) 532 + kvm_pmu_recalc_pmc_emulation(pmu, pmc); 621 533 } 622 534 623 535 int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx) ··· 744 650 msr_info->data = pmu->global_ctrl; 745 651 break; 746 652 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR: 653 + case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET: 747 654 case MSR_CORE_PERF_GLOBAL_OVF_CTRL: 748 655 msr_info->data = 0; 749 656 break; ··· 805 710 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR: 806 711 if (!msr_info->host_initiated) 807 712 pmu->global_status &= ~data; 713 + break; 714 + case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET: 715 + if (!msr_info->host_initiated) 716 + pmu->global_status |= data & ~pmu->global_status_rsvd; 808 717 break; 809 718 default: 810 719 
kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index); ··· 888 789 */ 889 790 if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters) 890 791 pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0); 792 + 793 + bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters); 794 + bitmap_set(pmu->all_valid_pmc_idx, KVM_FIXED_PMC_BASE_IDX, 795 + pmu->nr_arch_fixed_counters); 891 796 } 892 797 893 798 void kvm_pmu_init(struct kvm_vcpu *vcpu) ··· 916 813 pmu->pmc_in_use, X86_PMC_IDX_MAX); 917 814 918 815 kvm_for_each_pmc(pmu, pmc, i, bitmask) { 919 - if (pmc->perf_event && !pmc_speculative_in_use(pmc)) 816 + if (pmc->perf_event && !pmc_is_locally_enabled(pmc)) 920 817 pmc_stop_counter(pmc); 921 818 } 922 819 ··· 963 860 select_user; 964 861 } 965 862 966 - void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel) 863 + static void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, 864 + const unsigned long *event_pmcs) 967 865 { 968 866 DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX); 969 867 struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); 970 868 struct kvm_pmc *pmc; 971 - int i; 869 + int i, idx; 972 870 973 871 BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX); 974 872 873 + if (bitmap_empty(event_pmcs, X86_PMC_IDX_MAX)) 874 + return; 875 + 975 876 if (!kvm_pmu_has_perf_global_ctrl(pmu)) 976 - bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX); 977 - else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx, 877 + bitmap_copy(bitmap, event_pmcs, X86_PMC_IDX_MAX); 878 + else if (!bitmap_and(bitmap, event_pmcs, 978 879 (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX)) 979 880 return; 980 881 882 + idx = srcu_read_lock(&vcpu->kvm->srcu); 981 883 kvm_for_each_pmc(pmu, pmc, i, bitmap) { 982 - /* 983 - * Ignore checks for edge detect (all events currently emulated 984 - * but KVM are always rising edges), pin control (unsupported 985 - * by modern CPUs), and counter mask and its invert flag (KVM 986 - * doesn't emulate multiple events in a 
single clock cycle). 987 - * 988 - * Note, the uppermost nibble of AMD's mask overlaps Intel's 989 - * IN_TX (bit 32) and IN_TXCP (bit 33), as well as two reserved 990 - * bits (bits 35:34). Checking the "in HLE/RTM transaction" 991 - * flags is correct as the vCPU can't be in a transaction if 992 - * KVM is emulating an instruction. Checking the reserved bits 993 - * might be wrong if they are defined in the future, but so 994 - * could ignoring them, so do the simple thing for now. 995 - */ 996 - if (((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB) || 997 - !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc)) 884 + if (!pmc_is_event_allowed(pmc) || !cpl_is_matched(pmc)) 998 885 continue; 999 886 1000 887 kvm_pmu_incr_counter(pmc); 1001 888 } 889 + srcu_read_unlock(&vcpu->kvm->srcu, idx); 1002 890 } 1003 - EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event); 891 + 892 + void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu) 893 + { 894 + kvm_pmu_trigger_event(vcpu, vcpu_to_pmu(vcpu)->pmc_counting_instructions); 895 + } 896 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_instruction_retired); 897 + 898 + void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu) 899 + { 900 + kvm_pmu_trigger_event(vcpu, vcpu_to_pmu(vcpu)->pmc_counting_branches); 901 + } 902 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_branch_retired); 1004 903 1005 904 static bool is_masked_filter_valid(const struct kvm_x86_pmu_event_filter *filter) 1006 905 {
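In the reworked pmu.c, the event selector is no longer compared on every emulated instruction. kvm_pmu_recalc_pmc_emulation() caches which PMCs count "instructions retired" or "branches retired", and kvm_pmu_trigger_event() only intersects that cached bitmap with PERF_GLOBAL_CTRL. Below is a minimal userspace sketch of that flow, together with the new PERF_CNTR_GLOBAL_STATUS_SET write semantics. It makes several assumptions: each bitmap fits in a single 64-bit word, the PMU has 8 counters, and RAW_EVENT_MASK_NB equals the kernel's AMD64_RAW_EVENT_MASK_NB (event select bits 7:0 and 35:32, unit mask bits 15:8). struct sketch_pmu and all function names are hypothetical. The PMU event filter, the CPL check and the SRCU locking are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to equal AMD64_RAW_EVENT_MASK_NB: event select 7:0 + 35:32, unit mask 15:8. */
#define RAW_EVENT_MASK_NB 0x0F0000FFFFULL
/* Bit 22 of an event selector, ARCH_PERFMON_EVENTSEL_ENABLE. */
#define EVENTSEL_ENABLE (1ULL << 22)
#define NR_PMCS 8

struct sketch_pmu {
	uint64_t eventsel[NR_PMCS];
	uint64_t counter[NR_PMCS];
	uint64_t counting_instructions; /* bit i set: PMC i counts the event */
	uint64_t global_ctrl;
	uint64_t global_status;
	uint64_t global_status_rsvd;
	bool has_global_ctrl;
};

/* Only event select and unit mask must match; USR/OS/INT/EN etc. are ignored. */
static bool event_match(uint64_t pmc_eventsel, uint64_t eventsel)
{
	return !((pmc_eventsel ^ eventsel) & RAW_EVENT_MASK_NB);
}

/* Recompute one PMC's membership, as kvm_pmu_recalc_pmc_emulation() does. */
static void recalc_pmc(struct sketch_pmu *pmu, int idx, uint64_t instr_eventsel)
{
	assert(idx >= 0 && idx < NR_PMCS);
	pmu->counting_instructions &= ~(1ULL << idx);

	/* Simplification: "locally enabled" is just the EN bit here. */
	if (!(pmu->eventsel[idx] & EVENTSEL_ENABLE))
		return;

	if (event_match(pmu->eventsel[idx], instr_eventsel))
		pmu->counting_instructions |= 1ULL << idx;
}

/* Guest write to PERF_CNTR_GLOBAL_STATUS_SET: set only non-reserved bits. */
static void write_status_set(struct sketch_pmu *pmu, uint64_t data)
{
	pmu->global_status |= data & ~pmu->global_status_rsvd;
}

/* Emulated "instruction retired": bump each cached PMC that is globally enabled. */
static uint64_t instruction_retired(struct sketch_pmu *pmu)
{
	uint64_t todo = pmu->counting_instructions;

	if (pmu->has_global_ctrl)
		todo &= pmu->global_ctrl;

	for (int i = 0; i < NR_PMCS; i++)
		if (todo & (1ULL << i))
			pmu->counter[i]++;

	return todo;
}
```

Leaving the event filter out of the cached bitmap follows the comment in kvm_pmu_recalc_pmc_emulation(). The filter is consulted at emulation time, so a later filter change that allows the event cannot be missed.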
+7 -53
arch/x86/kvm/pmu.h
··· 23 23 24 24 #define KVM_FIXED_PMC_BASE_IDX INTEL_PMC_IDX_FIXED 25 25 26 - struct kvm_pmu_emulated_event_selectors { 27 - u64 INSTRUCTIONS_RETIRED; 28 - u64 BRANCH_INSTRUCTIONS_RETIRED; 29 - }; 30 - 31 26 struct kvm_pmu_ops { 32 27 struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu, 33 28 unsigned int idx, u64 *mask); ··· 160 165 return NULL; 161 166 } 162 167 163 - static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc) 168 + static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc) 164 169 { 165 170 struct kvm_pmu *pmu = pmc_to_pmu(pmc); 166 171 ··· 173 178 } 174 179 175 180 extern struct x86_pmu_capability kvm_pmu_cap; 176 - extern struct kvm_pmu_emulated_event_selectors kvm_pmu_eventsel; 177 181 178 - static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) 179 - { 180 - bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; 181 - int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS; 182 + void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops); 182 183 183 - /* 184 - * Hybrid PMUs don't play nice with virtualization without careful 185 - * configuration by userspace, and KVM's APIs for reporting supported 186 - * vPMU features do not account for hybrid PMUs. Disable vPMU support 187 - * for hybrid PMUs until KVM gains a way to let userspace opt-in. 188 - */ 189 - if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) 190 - enable_pmu = false; 191 - 192 - if (enable_pmu) { 193 - perf_get_x86_pmu_capability(&kvm_pmu_cap); 194 - 195 - /* 196 - * WARN if perf did NOT disable hardware PMU if the number of 197 - * architecturally required GP counters aren't present, i.e. if 198 - * there are a non-zero number of counters, but fewer than what 199 - * is architecturally required. 
200 - */ 201 - if (!kvm_pmu_cap.num_counters_gp || 202 - WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs)) 203 - enable_pmu = false; 204 - else if (is_intel && !kvm_pmu_cap.version) 205 - enable_pmu = false; 206 - } 207 - 208 - if (!enable_pmu) { 209 - memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); 210 - return; 211 - } 212 - 213 - kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2); 214 - kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp, 215 - pmu_ops->MAX_NR_GP_COUNTERS); 216 - kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed, 217 - KVM_MAX_NR_FIXED_COUNTERS); 218 - 219 - kvm_pmu_eventsel.INSTRUCTIONS_RETIRED = 220 - perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS); 221 - kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED = 222 - perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS); 223 - } 184 + void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc); 224 185 225 186 static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc) 226 187 { 188 + kvm_pmu_recalc_pmc_emulation(pmc_to_pmu(pmc), pmc); 189 + 227 190 set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi); 228 191 kvm_make_request(KVM_REQ_PMU, pmc->vcpu); 229 192 } ··· 225 272 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu); 226 273 void kvm_pmu_destroy(struct kvm_vcpu *vcpu); 227 274 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp); 228 - void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel); 275 + void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu); 276 + void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu); 229 277 230 278 bool is_vmware_backdoor_pmc(u32 pmc_idx); 231 279
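kvm_init_pmu_capability() moves out of line, but the removed inline body still documents the policy. The vPMU is disabled on hybrid CPUs, when perf reports no GP counters or too few of them, and on Intel when there is no architectural PMU version. Otherwise the version is clamped to 2 and the counter counts are clamped to KVM's maxima. The sketch below restates that policy as a pure function. It relies on a few assumptions: struct pmu_cap is a hypothetical subset of struct x86_pmu_capability, and MAX_FIXED_COUNTERS is an assumed stand-in for KVM_MAX_NR_FIXED_COUNTERS. Where the kernel fires WARN_ON_ONCE() for a too-small counter count, the sketch simply disables the PMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct x86_pmu_capability. */
struct pmu_cap {
	int version;
	int num_counters_gp;
	int num_counters_fixed;
};

/* Assumed stand-in for KVM_MAX_NR_FIXED_COUNTERS. */
#define MAX_FIXED_COUNTERS 3

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Returns whether the vPMU stays enabled. On success *cap is clamped to what
 * KVM would advertise; on failure it is zeroed, like the memset() in the diff.
 */
static bool init_pmu_capability(struct pmu_cap *cap, bool enable_pmu,
				bool hybrid_cpu, bool is_intel,
				int min_nr_gp, int max_nr_gp)
{
	assert(min_nr_gp <= max_nr_gp);

	if (hybrid_cpu)
		enable_pmu = false;

	if (enable_pmu) {
		/* The kernel WARNs on 0 < gp < min; the sketch just disables. */
		if (!cap->num_counters_gp || cap->num_counters_gp < min_nr_gp)
			enable_pmu = false;
		else if (is_intel && !cap->version)
			enable_pmu = false;
	}

	if (!enable_pmu) {
		*cap = (struct pmu_cap){ 0 };
		return false;
	}

	cap->version = min_int(cap->version, 2);
	cap->num_counters_gp = min_int(cap->num_counters_gp, max_nr_gp);
	cap->num_counters_fixed = min_int(cap->num_counters_fixed, MAX_FIXED_COUNTERS);
	return true;
}
```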
+5
arch/x86/kvm/reverse_cpuid.h
··· 25 25
 #define KVM_X86_FEATURE_SGX2 KVM_X86_FEATURE(CPUID_12_EAX, 1)
 #define KVM_X86_FEATURE_SGX_EDECCSSA KVM_X86_FEATURE(CPUID_12_EAX, 11)
 
+/* Intel-defined sub-features, CPUID level 0x00000007:1 (ECX) */
+#define KVM_X86_FEATURE_MSR_IMM KVM_X86_FEATURE(CPUID_7_1_ECX, 5)
+
 /* Intel-defined sub-features, CPUID level 0x00000007:1 (EDX) */
 #define X86_FEATURE_AVX_VNNI_INT8 KVM_X86_FEATURE(CPUID_7_1_EDX, 4)
 #define X86_FEATURE_AVX_NE_CONVERT KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
··· 90 87
 	[CPUID_7_2_EDX] = { 7, 2, CPUID_EDX},
 	[CPUID_24_0_EBX] = { 0x24, 0, CPUID_EBX},
 	[CPUID_8000_0021_ECX] = {0x80000021, 0, CPUID_ECX},
+	[CPUID_7_1_ECX] = { 7, 1, CPUID_ECX},
 };
 
 /*
··· 132 128
 	KVM_X86_TRANSLATE_FEATURE(BHI_CTRL);
 	KVM_X86_TRANSLATE_FEATURE(TSA_SQ_NO);
 	KVM_X86_TRANSLATE_FEATURE(TSA_L1_NO);
+	KVM_X86_TRANSLATE_FEATURE(MSR_IMM);
 	default:
 		return x86_feature;
 	}
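The new KVM_X86_FEATURE_MSR_IMM is encoded as a (word, bit) pair. The reverse_cpuid[] table then maps each word back to a CPUID function, index and output register. Below is a small sketch of that lookup. It assumes the kernel's usual word * 32 + bit encoding. The word enum and table are hypothetical two-entry stand-ins for CPUID_7_1_EDX and CPUID_7_1_ECX.

```c
#include <assert.h>
#include <stdint.h>

enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

struct cpuid_leaf_reg {
	uint32_t function;
	uint32_t index;
	enum cpuid_reg reg;
};

/* Hypothetical word numbers; the kernel's enum has many more entries. */
enum { WORD_7_1_EDX, WORD_7_1_ECX, NR_WORDS };

/* Assumed encoding, as for X86_FEATURE_*: word * 32 + bit. */
#define FEATURE(w, f) ((w) * 32 + (f))
#define FEATURE_AVX_VNNI_INT8 FEATURE(WORD_7_1_EDX, 4)
#define FEATURE_MSR_IMM FEATURE(WORD_7_1_ECX, 5)

static const struct cpuid_leaf_reg leaf_of_word[NR_WORDS] = {
	[WORD_7_1_EDX] = { 7, 1, REG_EDX },
	[WORD_7_1_ECX] = { 7, 1, REG_ECX },
};

/* Which CPUID function/index/register holds this feature. */
static struct cpuid_leaf_reg cpuid_leaf_for(unsigned int feature)
{
	assert(feature / 32 < (unsigned int)NR_WORDS);
	return leaf_of_word[feature / 32];
}

static uint32_t feature_bit(unsigned int feature)
{
	return 1u << (feature % 32);
}

/* regs[] holds EAX..EDX as returned for the feature's leaf and subleaf. */
static int feature_present(unsigned int feature, const uint32_t regs[4])
{
	return (regs[cpuid_leaf_for(feature).reg] & feature_bit(feature)) != 0;
}
```

On an x86 host, the four register values for leaf 7, subleaf 1 could be obtained with GCC's or Clang's __get_cpuid_count() from <cpuid.h> before calling feature_present().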
+11 -3
arch/x86/kvm/smm.c
··· 131 131
 
 	kvm_mmu_reset_context(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_smm_changed);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_smm_changed);
 
 void process_smi(struct kvm_vcpu *vcpu)
 {
··· 269 269
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
 
 	smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 }
 #endif
 
··· 533 529
 
 	vcpu->arch.smbase = smstate->smbase;
 
-	if (kvm_set_msr(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
+	if (__kvm_emulate_msr_write(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
 		return X86EMUL_UNHANDLEABLE;
 
 	rsm_load_seg_64(vcpu, &smstate->tr, VCPU_SREG_TR);
··· 561 557
 
 	kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
 	ctxt->interruptibility = (u8)smstate->int_shadow;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+		return X86EMUL_UNHANDLEABLE;
 
 	return X86EMUL_CONTINUE;
 }
··· 628 620
 
 		/* And finally go back to 32-bit mode. */
 		efer = 0;
-		kvm_set_msr(vcpu, MSR_EFER, efer);
+		__kvm_emulate_msr_write(vcpu, MSR_EFER, efer);
 	}
 #endif
 
+1 -1
arch/x86/kvm/smm.h
··· 116 116
 	u32 smbase;
 	u32 reserved4[5];
 
-	/* ssp and svm_* fields below are not implemented by KVM */
 	u64 ssp;
+	/* svm_* fields below are not implemented by KVM */
 	u64 svm_guest_pat;
 	u64 svm_host_efer;
 	u64 svm_host_cr4;
+125 -26
arch/x86/kvm/svm/avic.c
··· 64 64 65 65 static_assert(__AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_IDX_MASK) == -1u); 66 66 67 + #define AVIC_AUTO_MODE -1 68 + 69 + static int avic_param_set(const char *val, const struct kernel_param *kp) 70 + { 71 + if (val && sysfs_streq(val, "auto")) { 72 + *(int *)kp->arg = AVIC_AUTO_MODE; 73 + return 0; 74 + } 75 + 76 + return param_set_bint(val, kp); 77 + } 78 + 79 + static const struct kernel_param_ops avic_ops = { 80 + .flags = KERNEL_PARAM_OPS_FL_NOARG, 81 + .set = avic_param_set, 82 + .get = param_get_bool, 83 + }; 84 + 85 + /* 86 + * Enable / disable AVIC. In "auto" mode (default behavior), AVIC is enabled 87 + * for Zen4+ CPUs with x2AVIC (and all other criteria for enablement are met). 88 + */ 89 + static int avic = AVIC_AUTO_MODE; 90 + module_param_cb(avic, &avic_ops, &avic, 0444); 91 + __MODULE_PARM_TYPE(avic, "bool"); 92 + 93 + module_param(enable_ipiv, bool, 0444); 94 + 67 95 static bool force_avic; 68 96 module_param_unsafe(force_avic, bool, 0444); 69 97 ··· 105 77 static u32 next_vm_id = 0; 106 78 static bool next_vm_id_wrapped = 0; 107 79 static DEFINE_SPINLOCK(svm_vm_data_hash_lock); 108 - bool x2avic_enabled; 80 + static bool x2avic_enabled; 81 + 82 + 83 + static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm, 84 + bool intercept) 85 + { 86 + static const u32 x2avic_passthrough_msrs[] = { 87 + X2APIC_MSR(APIC_ID), 88 + X2APIC_MSR(APIC_LVR), 89 + X2APIC_MSR(APIC_TASKPRI), 90 + X2APIC_MSR(APIC_ARBPRI), 91 + X2APIC_MSR(APIC_PROCPRI), 92 + X2APIC_MSR(APIC_EOI), 93 + X2APIC_MSR(APIC_RRR), 94 + X2APIC_MSR(APIC_LDR), 95 + X2APIC_MSR(APIC_DFR), 96 + X2APIC_MSR(APIC_SPIV), 97 + X2APIC_MSR(APIC_ISR), 98 + X2APIC_MSR(APIC_TMR), 99 + X2APIC_MSR(APIC_IRR), 100 + X2APIC_MSR(APIC_ESR), 101 + X2APIC_MSR(APIC_ICR), 102 + X2APIC_MSR(APIC_ICR2), 103 + 104 + /* 105 + * Note! Always intercept LVTT, as TSC-deadline timer mode 106 + * isn't virtualized by hardware, and the CPU will generate a 107 + * #GP instead of a #VMEXIT. 
108 + */ 109 + X2APIC_MSR(APIC_LVTTHMR), 110 + X2APIC_MSR(APIC_LVTPC), 111 + X2APIC_MSR(APIC_LVT0), 112 + X2APIC_MSR(APIC_LVT1), 113 + X2APIC_MSR(APIC_LVTERR), 114 + X2APIC_MSR(APIC_TMICT), 115 + X2APIC_MSR(APIC_TMCCT), 116 + X2APIC_MSR(APIC_TDCR), 117 + }; 118 + int i; 119 + 120 + if (intercept == svm->x2avic_msrs_intercepted) 121 + return; 122 + 123 + if (!x2avic_enabled) 124 + return; 125 + 126 + for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++) 127 + svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i], 128 + MSR_TYPE_RW, intercept); 129 + 130 + svm->x2avic_msrs_intercepted = intercept; 131 + } 109 132 110 133 static void avic_activate_vmcb(struct vcpu_svm *svm) 111 134 { ··· 178 99 vmcb->control.int_ctl |= X2APIC_MODE_MASK; 179 100 vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID; 180 101 /* Disabling MSR intercept for x2APIC registers */ 181 - svm_set_x2apic_msr_interception(svm, false); 102 + avic_set_x2apic_msr_interception(svm, false); 182 103 } else { 183 104 /* 184 105 * Flush the TLB, the guest may have inserted a non-APIC ··· 189 110 /* For xAVIC and hybrid-xAVIC modes */ 190 111 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID; 191 112 /* Enabling MSR intercept for x2APIC registers */ 192 - svm_set_x2apic_msr_interception(svm, true); 113 + avic_set_x2apic_msr_interception(svm, true); 193 114 } 194 115 } 195 116 ··· 209 130 return; 210 131 211 132 /* Enabling MSR intercept for x2APIC registers */ 212 - svm_set_x2apic_msr_interception(svm, true); 133 + avic_set_x2apic_msr_interception(svm, true); 213 134 } 214 135 215 136 /* Note: ··· 1169 1090 avic_vcpu_load(vcpu, vcpu->cpu); 1170 1091 } 1171 1092 1172 - /* 1173 - * Note: 1174 - * - The module param avic enable both xAPIC and x2APIC mode. 1175 - * - Hypervisor can support both xAVIC and x2AVIC in the same guest. 1176 - * - The mode can be switched at run-time. 
1177 - */ 1178 - bool avic_hardware_setup(void) 1093 + static bool __init avic_want_avic_enabled(void) 1179 1094 { 1180 - if (!npt_enabled) 1095 + /* 1096 + * In "auto" mode, enable AVIC by default for Zen4+ if x2AVIC is 1097 + * supported (to avoid enabling partial support by default, and because 1098 + * x2AVIC should be supported by all Zen4+ CPUs). Explicitly check for 1099 + * family 0x19 and later (Zen5+), as the kernel's synthetic ZenX flags 1100 + * aren't inclusive of previous generations, i.e. the kernel will set 1101 + * at most one ZenX feature flag. 1102 + */ 1103 + if (avic == AVIC_AUTO_MODE) 1104 + avic = boot_cpu_has(X86_FEATURE_X2AVIC) && 1105 + (boot_cpu_data.x86 > 0x19 || cpu_feature_enabled(X86_FEATURE_ZEN4)); 1106 + 1107 + if (!avic || !npt_enabled) 1181 1108 return false; 1182 1109 1183 1110 /* AVIC is a prerequisite for x2AVIC. */ 1184 1111 if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) { 1185 - if (boot_cpu_has(X86_FEATURE_X2AVIC)) { 1186 - pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled"); 1187 - pr_warn(FW_BUG "Try enable AVIC using force_avic option"); 1188 - } 1112 + if (boot_cpu_has(X86_FEATURE_X2AVIC)) 1113 + pr_warn(FW_BUG "Cannot enable x2AVIC, AVIC is unsupported\n"); 1189 1114 return false; 1190 1115 } 1191 1116 ··· 1199 1116 return false; 1200 1117 } 1201 1118 1202 - if (boot_cpu_has(X86_FEATURE_AVIC)) { 1203 - pr_info("AVIC enabled\n"); 1204 - } else if (force_avic) { 1205 - /* 1206 - * Some older systems does not advertise AVIC support. 1207 - * See Revision Guide for specific AMD processor for more detail. 1208 - */ 1209 - pr_warn("AVIC is not supported in CPUID but force enabled"); 1210 - pr_warn("Your system might crash and burn"); 1211 - } 1119 + /* 1120 + * Print a scary message if AVIC is force enabled to make it abundantly 1121 + * clear that ignoring CPUID could have repercussions. See Revision 1122 + * Guide for specific AMD processor for more details. 
1123 + */ 1124 + if (!boot_cpu_has(X86_FEATURE_AVIC)) 1125 + pr_warn("AVIC unsupported in CPUID but force enabled, your system might crash and burn\n"); 1126 + 1127 + return true; 1128 + } 1129 + 1130 + /* 1131 + * Note: 1132 + * - The module param avic enable both xAPIC and x2APIC mode. 1133 + * - Hypervisor can support both xAVIC and x2AVIC in the same guest. 1134 + * - The mode can be switched at run-time. 1135 + */ 1136 + bool __init avic_hardware_setup(void) 1137 + { 1138 + avic = avic_want_avic_enabled(); 1139 + if (!avic) 1140 + return false; 1141 + 1142 + pr_info("AVIC enabled\n"); 1212 1143 1213 1144 /* AVIC is a prerequisite for x2AVIC. */ 1214 1145 x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC); 1215 1146 if (x2avic_enabled) 1216 1147 pr_info("x2AVIC enabled\n"); 1148 + else 1149 + svm_x86_ops.allow_apicv_in_x2apic_without_x2apic_virtualization = true; 1217 1150 1218 1151 /* 1219 1152 * Disable IPI virtualization for AMD Family 17h CPUs (Zen1 and Zen2)
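With this change, the avic module parameter becomes an int that also accepts "auto", which is the new default. avic_want_avic_enabled() resolves "auto" to true when x2AVIC is present and the CPU is either family > 0x19 or Zen4. Below is a minimal sketch of the parse-then-decide flow, with these limitations:

- parse_avic() and want_avic() are hypothetical.
- parse_avic() accepts only a subset of the spellings that kstrtobool() handles.
- It compares "auto" exactly, whereas the kernel's sysfs_streq() also tolerates a trailing newline.
- The checks in avic_want_avic_enabled() that the diff view elides are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AVIC_AUTO_MODE -1

/* Hypothetical parser: NULL (bare "avic"), "auto", or a few boolean spellings. */
static int parse_avic(const char *val, int *out)
{
	static const char *const yes[] = { "1", "y", "Y", "on" };
	static const char *const no[] = { "0", "n", "N", "off" };
	size_t i;

	if (!val) {
		*out = 1;
		return 0;
	}
	if (!strcmp(val, "auto")) {
		*out = AVIC_AUTO_MODE;
		return 0;
	}
	for (i = 0; i < sizeof(yes) / sizeof(yes[0]); i++) {
		if (!strcmp(val, yes[i])) {
			*out = 1;
			return 0;
		}
		if (!strcmp(val, no[i])) {
			*out = 0;
			return 0;
		}
	}
	return -EINVAL;
}

struct cpu_info {
	bool has_avic;
	bool has_x2avic;
	bool is_zen4;
	unsigned int family;
};

/* Resolve "auto", then apply the NPT and AVIC-CPUID prerequisites. */
static bool want_avic(int avic, const struct cpu_info *c, bool npt_enabled,
		      bool force_avic)
{
	if (avic == AVIC_AUTO_MODE)
		avic = c->has_x2avic && (c->family > 0x19 || c->is_zen4);

	if (!avic || !npt_enabled)
		return false;

	/* AVIC is a prerequisite for x2AVIC unless force_avic overrides CPUID. */
	if (!c->has_avic && !force_avic)
		return false;

	return true;
}
```

Zen3 and Zen4 are both family 0x19, which is why the "auto" rule needs the separate Zen4 flag in addition to the family comparison.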
+28 -10
arch/x86/kvm/svm/nested.c
··· 636 636
 		vmcb_mark_dirty(vmcb02, VMCB_DT);
 	}
 
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
+		vmcb02->save.s_cet = vmcb12->save.s_cet;
+		vmcb02->save.isst_addr = vmcb12->save.isst_addr;
+		vmcb02->save.ssp = vmcb12->save.ssp;
+		vmcb_mark_dirty(vmcb02, VMCB_CET);
+	}
+
 	kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
 
 	svm_set_efer(vcpu, svm->nested.save.efer);
··· 1052 1044
 	to_save->rsp = from_save->rsp;
 	to_save->rip = from_save->rip;
 	to_save->cpl = 0;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		to_save->s_cet = from_save->s_cet;
+		to_save->isst_addr = from_save->isst_addr;
+		to_save->ssp = from_save->ssp;
+	}
 }
 
 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
··· 1124 1110
 	vmcb12->save.dr7 = vmcb02->save.dr7;
 	vmcb12->save.dr6 = svm->vcpu.arch.dr6;
 	vmcb12->save.cpl = vmcb02->save.cpl;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcb12->save.s_cet = vmcb02->save.s_cet;
+		vmcb12->save.isst_addr = vmcb02->save.isst_addr;
+		vmcb12->save.ssp = vmcb02->save.ssp;
+	}
 
 	vmcb12->control.int_state = vmcb02->control.int_state;
 	vmcb12->control.exit_code = vmcb02->control.exit_code;
··· 1818 1798
 	if (kvm_state->size < sizeof(*kvm_state) + KVM_STATE_NESTED_SVM_VMCB_SIZE)
 		return -EINVAL;
 
-	ret = -ENOMEM;
-	ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
-	save = kzalloc(sizeof(*save), GFP_KERNEL);
-	if (!ctl || !save)
-		goto out_free;
+	ctl = memdup_user(&user_vmcb->control, sizeof(*ctl));
+	if (IS_ERR(ctl))
+		return PTR_ERR(ctl);
 
-	ret = -EFAULT;
-	if (copy_from_user(ctl, &user_vmcb->control, sizeof(*ctl)))
-		goto out_free;
-	if (copy_from_user(save, &user_vmcb->save, sizeof(*save)))
-		goto out_free;
+	save = memdup_user(&user_vmcb->save, sizeof(*save));
+	if (IS_ERR(save)) {
+		kfree(ctl);
+		return PTR_ERR(save);
+	}
 
 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
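svm_set_nested_state() now copies the user VMCB halves with memdup_user(). That helper returns either a kmalloc'ed copy or an errno encoded with ERR_PTR(). As a result, the error paths become early returns, and only the already-duplicated ctl needs freeing if the second copy fails. The userspace sketch below mimics that pattern. ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented after the kernel's convention of reserving the top MAX_ERRNO (4095) addresses for error codes. memdup_sketch() is a hypothetical stand-in that uses memcpy() in place of copy_from_user() and treats a NULL source as a faulting user address.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace re-implementations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical memdup_user() stand-in: memcpy() replaces copy_from_user(). */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p;

	if (!src)
		return ERR_PTR(-EFAULT); /* pretend NULL is an unmapped user address */
	p = malloc(len);
	if (!p)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	return p;
}

struct ctl_area { int a; };
struct save_area { long b; };

/* Mirrors the new error handling: if the second copy fails, free the first. */
static long load_both(const struct ctl_area *uctl, const struct save_area *usave,
		      struct ctl_area *ctl_out, struct save_area *save_out)
{
	struct ctl_area *ctl;
	struct save_area *save;

	ctl = memdup_sketch(uctl, sizeof(*ctl));
	if (IS_ERR(ctl))
		return PTR_ERR(ctl);

	save = memdup_sketch(usave, sizeof(*save));
	if (IS_ERR(save)) {
		free(ctl);
		return PTR_ERR(save);
	}

	*ctl_out = *ctl;
	*save_out = *save;
	free(save);
	free(ctl);
	return 0;
}
```

The removed sequence needed a shared out_free label and separate ret assignments for -ENOMEM and -EFAULT. The memdup_user() form needs neither, because each failure returns its own errno directly.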
+4 -4
arch/x86/kvm/svm/pmu.c
··· 41 41
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
 	unsigned int idx;
 
-	if (!vcpu->kvm->arch.enable_pmu)
+	if (!pmu->version)
 		return NULL;
 
 	switch (msr) {
··· 113 113
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
 		return pmu->version > 1;
 	default:
 		if (msr > MSR_F15H_PERF_CTR5 &&
··· 200 199
 		kvm_pmu_cap.num_counters_gp);
 
 	if (pmu->version > 1) {
-		pmu->global_ctrl_rsvd = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ctrl_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
 		pmu->global_status_rsvd = pmu->global_ctrl_rsvd;
 	}
 
-	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
+	pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
 	/* not applicable to AMD; but clean them to prevent any fall out */
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
-	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
 static void amd_pmu_init(struct kvm_vcpu *vcpu)
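The amd_pmu_refresh() cleanup replaces open-coded shifts with BIT_ULL() without changing any values. As a worked example, assume a PerfMonV2 guest with six GP counters. The reserved mask for PERF_CNTR_GLOBAL_CTL/STATUS is then ~(BIT_ULL(6) - 1) = 0xFFFFFFFFFFFFFFC0, so only bits 5:0 are writable. Because global_status_rsvd is set to the same value, that mask also limits what a GLOBAL_STATUS_SET write can set. The 48-bit counter mask is BIT_ULL(48) - 1 = 0x0000FFFFFFFFFFFF. The small C check below verifies that arithmetic. The helper names are hypothetical, and BIT_ULL is redefined locally to match the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to the kernel's BIT_ULL(): a 64-bit one shifted left by nr. */
#define BIT_ULL(nr) (1ULL << (nr))

/* Reserved GLOBAL_CTL/GLOBAL_STATUS bits: everything above the GP counters. */
static uint64_t global_ctrl_rsvd(unsigned int nr_gp_counters)
{
	assert(nr_gp_counters < 64); /* BIT_ULL(64) would be undefined */
	return ~(BIT_ULL(nr_gp_counters) - 1);
}

/* Value mask for 48-bit general-purpose counters. */
static uint64_t gp_counter_bitmask(void)
{
	return BIT_ULL(48) - 1;
}
```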
+168 -63
arch/x86/kvm/svm/sev.c
··· 37 37 #include "trace.h" 38 38 39 39 #define GHCB_VERSION_MAX 2ULL 40 - #define GHCB_VERSION_DEFAULT 2ULL 41 40 #define GHCB_VERSION_MIN 1ULL 42 41 43 42 #define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION) ··· 57 58 static bool sev_es_debug_swap_enabled = true; 58 59 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444); 59 60 static u64 sev_supported_vmsa_features; 61 + 62 + static unsigned int nr_ciphertext_hiding_asids; 63 + module_param_named(ciphertext_hiding_asids, nr_ciphertext_hiding_asids, uint, 0444); 60 64 61 65 #define AP_RESET_HOLD_NONE 0 62 66 #define AP_RESET_HOLD_NAE_EVENT 1 ··· 87 85 static DEFINE_MUTEX(sev_bitmap_lock); 88 86 unsigned int max_sev_asid; 89 87 static unsigned int min_sev_asid; 88 + static unsigned int max_sev_es_asid; 89 + static unsigned int min_sev_es_asid; 90 + static unsigned int max_snp_asid; 91 + static unsigned int min_snp_asid; 90 92 static unsigned long sev_me_mask; 91 93 static unsigned int nr_asids; 92 94 static unsigned long *sev_asid_bitmap; ··· 153 147 return sev->vmsa_features & SVM_SEV_FEAT_DEBUG_SWAP; 154 148 } 155 149 150 + static bool snp_is_secure_tsc_enabled(struct kvm *kvm) 151 + { 152 + struct kvm_sev_info *sev = to_kvm_sev_info(kvm); 153 + 154 + return (sev->vmsa_features & SVM_SEV_FEAT_SECURE_TSC) && 155 + !WARN_ON_ONCE(!sev_snp_guest(kvm)); 156 + } 157 + 156 158 /* Must be called with the sev_bitmap_lock held */ 157 159 static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid) 158 160 { ··· 187 173 misc_cg_uncharge(type, sev->misc_cg, 1); 188 174 } 189 175 190 - static int sev_asid_new(struct kvm_sev_info *sev) 176 + static int sev_asid_new(struct kvm_sev_info *sev, unsigned long vm_type) 191 177 { 192 178 /* 193 179 * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. 194 180 * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. 
195 - * Note: min ASID can end up larger than the max if basic SEV support is 196 - * effectively disabled by disallowing use of ASIDs for SEV guests. 197 181 */ 198 - unsigned int min_asid = sev->es_active ? 1 : min_sev_asid; 199 - unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid; 200 - unsigned int asid; 182 + unsigned int min_asid, max_asid, asid; 201 183 bool retry = true; 202 184 int ret; 203 185 186 + if (vm_type == KVM_X86_SNP_VM) { 187 + min_asid = min_snp_asid; 188 + max_asid = max_snp_asid; 189 + } else if (sev->es_active) { 190 + min_asid = min_sev_es_asid; 191 + max_asid = max_sev_es_asid; 192 + } else { 193 + min_asid = min_sev_asid; 194 + max_asid = max_sev_asid; 195 + } 196 + 197 + /* 198 + * The min ASID can end up larger than the max if basic SEV support is 199 + * effectively disabled by disallowing use of ASIDs for SEV guests. 200 + * Similarly for SEV-ES guests the min ASID can end up larger than the 201 + * max when ciphertext hiding is enabled, effectively disabling SEV-ES 202 + * support. 203 + */ 204 204 if (min_asid > max_asid) 205 205 return -ENOTTY; 206 206 ··· 434 406 struct kvm_sev_info *sev = to_kvm_sev_info(kvm); 435 407 struct sev_platform_init_args init_args = {0}; 436 408 bool es_active = vm_type != KVM_X86_SEV_VM; 409 + bool snp_active = vm_type == KVM_X86_SNP_VM; 437 410 u64 valid_vmsa_features = es_active ? 
sev_supported_vmsa_features : 0; 438 411 int ret; 439 412 ··· 444 415 if (data->flags) 445 416 return -EINVAL; 446 417 418 + if (!snp_active) 419 + valid_vmsa_features &= ~SVM_SEV_FEAT_SECURE_TSC; 420 + 447 421 if (data->vmsa_features & ~valid_vmsa_features) 448 422 return -EINVAL; 449 423 450 424 if (data->ghcb_version > GHCB_VERSION_MAX || (!es_active && data->ghcb_version)) 425 + return -EINVAL; 426 + 427 + /* 428 + * KVM supports the full range of mandatory features defined by version 429 + * 2 of the GHCB protocol, so default to that for SEV-ES guests created 430 + * via KVM_SEV_INIT2 (KVM_SEV_INIT forces version 1). 431 + */ 432 + if (es_active && !data->ghcb_version) 433 + data->ghcb_version = 2; 434 + 435 + if (snp_active && data->ghcb_version < 2) 451 436 return -EINVAL; 452 437 453 438 if (unlikely(sev->active)) ··· 472 429 sev->vmsa_features = data->vmsa_features; 473 430 sev->ghcb_version = data->ghcb_version; 474 431 475 - /* 476 - * Currently KVM supports the full range of mandatory features defined 477 - * by version 2 of the GHCB protocol, so default to that for SEV-ES 478 - * guests created via KVM_SEV_INIT2. 479 - */ 480 - if (sev->es_active && !sev->ghcb_version) 481 - sev->ghcb_version = GHCB_VERSION_DEFAULT; 482 - 483 - if (vm_type == KVM_X86_SNP_VM) 432 + if (snp_active) 484 433 sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE; 485 434 486 - ret = sev_asid_new(sev); 435 + ret = sev_asid_new(sev, vm_type); 487 436 if (ret) 488 437 goto e_no_asid; 489 438 ··· 490 455 } 491 456 492 457 /* This needs to happen after SEV/SNP firmware initialization. 
*/ 493 - if (vm_type == KVM_X86_SNP_VM) { 458 + if (snp_active) { 494 459 ret = snp_guest_req_init(kvm); 495 460 if (ret) 496 461 goto e_free; ··· 604 569 if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params))) 605 570 return -EFAULT; 606 571 607 - sev->policy = params.policy; 608 - 609 572 memset(&start, 0, sizeof(start)); 610 573 611 574 dh_blob = NULL; ··· 651 618 goto e_free_session; 652 619 } 653 620 621 + sev->policy = params.policy; 654 622 sev->handle = start.handle; 655 623 sev->fd = argp->sev_fd; 656 624 ··· 2002 1968 kvm_for_each_vcpu(i, dst_vcpu, dst_kvm) { 2003 1969 dst_svm = to_svm(dst_vcpu); 2004 1970 2005 - sev_init_vmcb(dst_svm); 1971 + sev_init_vmcb(dst_svm, false); 2006 1972 2007 1973 if (!dst->es_active) 2008 1974 continue; ··· 2214 2180 if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO)) 2215 2181 return -EINVAL; 2216 2182 2217 - sev->policy = params.policy; 2183 + if (snp_is_secure_tsc_enabled(kvm)) { 2184 + if (WARN_ON_ONCE(!kvm->arch.default_tsc_khz)) 2185 + return -EINVAL; 2186 + 2187 + start.desired_tsc_khz = kvm->arch.default_tsc_khz; 2188 + } 2218 2189 2219 2190 sev->snp_context = snp_context_create(kvm, argp); 2220 2191 if (!sev->snp_context) ··· 2227 2188 2228 2189 start.gctx_paddr = __psp_pa(sev->snp_context); 2229 2190 start.policy = params.policy; 2191 + 2230 2192 memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw)); 2231 2193 rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error); 2232 2194 if (rc) { ··· 2236 2196 goto e_free_context; 2237 2197 } 2238 2198 2199 + sev->policy = params.policy; 2239 2200 sev->fd = argp->sev_fd; 2240 2201 rc = snp_bind_asid(kvm, &argp->error); 2241 2202 if (rc) { ··· 2370 2329 pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__, 2371 2330 params.gfn_start, params.len, params.type, params.flags); 2372 2331 2373 - if (!PAGE_ALIGNED(params.len) || params.flags || 2332 + if (!params.len || !PAGE_ALIGNED(params.len) || params.flags || 2374 
2333 (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL && 2375 2334 params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO && 2376 2335 params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED && ··· 3079 3038 if (min_sev_asid == 1) 3080 3039 goto out; 3081 3040 3041 + min_sev_es_asid = min_snp_asid = 1; 3042 + max_sev_es_asid = max_snp_asid = min_sev_asid - 1; 3043 + 3082 3044 sev_es_asid_count = min_sev_asid - 1; 3083 3045 WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count)); 3084 3046 sev_es_supported = true; ··· 3090 3046 out: 3091 3047 if (sev_enabled) { 3092 3048 init_args.probe = true; 3049 + 3050 + if (sev_is_snp_ciphertext_hiding_supported()) 3051 + init_args.max_snp_asid = min(nr_ciphertext_hiding_asids, 3052 + min_sev_asid - 1); 3053 + 3093 3054 if (sev_platform_init(&init_args)) 3094 3055 sev_supported = sev_es_supported = sev_snp_supported = false; 3095 3056 else if (sev_snp_supported) 3096 3057 sev_snp_supported = is_sev_snp_initialized(); 3058 + 3059 + if (sev_snp_supported) 3060 + nr_ciphertext_hiding_asids = init_args.max_snp_asid; 3061 + 3062 + /* 3063 + * If ciphertext hiding is enabled, the joint SEV-ES/SEV-SNP 3064 + * ASID range is partitioned into separate SEV-ES and SEV-SNP 3065 + * ASID ranges, with the SEV-SNP range being [1..max_snp_asid] 3066 + * and the SEV-ES range being (max_snp_asid..max_sev_es_asid]. 3067 + * Note, SEV-ES may effectively be disabled if all ASIDs from 3068 + * the joint range are assigned to SEV-SNP. 3069 + */ 3070 + if (nr_ciphertext_hiding_asids) { 3071 + max_snp_asid = nr_ciphertext_hiding_asids; 3072 + min_sev_es_asid = max_snp_asid + 1; 3073 + pr_info("SEV-SNP ciphertext hiding enabled\n"); 3074 + } 3097 3075 } 3098 3076 3099 3077 if (boot_cpu_has(X86_FEATURE_SEV)) ··· 3126 3060 min_sev_asid, max_sev_asid); 3127 3061 if (boot_cpu_has(X86_FEATURE_SEV_ES)) 3128 3062 pr_info("SEV-ES %s (ASIDs %u - %u)\n", 3129 - str_enabled_disabled(sev_es_supported), 3130 - min_sev_asid > 1 ? 
1 : 0, min_sev_asid - 1);
+			sev_es_supported ? min_sev_es_asid <= max_sev_es_asid ? "enabled" :
+									       "unusable" :
+									       "disabled",
+			min_sev_es_asid, max_sev_es_asid);
 	if (boot_cpu_has(X86_FEATURE_SEV_SNP))
 		pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
 			str_enabled_disabled(sev_snp_supported),
-			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+			min_snp_asid, max_snp_asid);
 
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
···
 	sev_supported_vmsa_features = 0;
 	if (sev_es_debug_swap_enabled)
 		sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
+	if (sev_snp_enabled && tsc_khz && cpu_feature_enabled(X86_FEATURE_SNP_SECURE_TSC))
+		sev_supported_vmsa_features |= SVM_SEV_FEAT_SECURE_TSC;
 }
 
 void sev_hardware_unsetup(void)
···
 		kvfree(svm->sev_es.ghcb_sa);
 }
 
-static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
+static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control)
 {
 	return (((u64)control->exit_code_hi) << 32) | control->exit_code;
 }
···
 	 */
 	pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
-	       kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
+	       kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
 	       control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
···
 	BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap));
 	memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap));
 
-	vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm, ghcb);
-	vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb);
-	vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb);
-	vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb);
-	vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb);
+	vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm);
+	vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm);
+	vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm);
+	vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm);
+	vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm);
 
-	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
+	svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm);
 
-	if (kvm_ghcb_xcr0_is_valid(svm)) {
-		vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
-		vcpu->arch.cpuid_dynamic_bits_dirty = true;
-	}
+	if (kvm_ghcb_xcr0_is_valid(svm))
+		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
+
+	if (kvm_ghcb_xss_is_valid(svm))
+		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
 
 	/* Copy the GHCB exit information into the VMCB fields */
-	exit_code = ghcb_get_sw_exit_code(ghcb);
+	exit_code = kvm_ghcb_get_sw_exit_code(svm);
 	control->exit_code = lower_32_bits(exit_code);
 	control->exit_code_hi = upper_32_bits(exit_code);
-	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
-	control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
-	svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
+	control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(svm);
+	control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(svm);
+	svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm);
 
 	/* Clear the valid entries fields */
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
···
 	 * Retrieve the exit code now even though it may not be marked valid
 	 * as it could help with debugging.
 	 */
-	exit_code = kvm_ghcb_get_sw_exit_code(control);
+	exit_code = kvm_get_cached_sw_exit_code(control);
 
 	/* Only GHCB Usage code 0 is supported */
 	if (svm->sev_es.ghcb->ghcb_usage) {
···
 /*
  * Invoked as part of svm_vcpu_reset() processing of an init event.
  */
-void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_memory_slot *slot;
 	struct page *page;
 	kvm_pfn_t pfn;
 	gfn_t gfn;
-
-	if (!sev_snp_guest(vcpu->kvm))
-		return;
 
 	guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
 
···
 
 	svm_vmgexit_success(svm, 0);
 
-	exit_code = kvm_ghcb_get_sw_exit_code(control);
+	exit_code = kvm_get_cached_sw_exit_code(control);
 	switch (exit_code) {
 	case SVM_VMGEXIT_MMIO_READ:
 		ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
···
 				  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
 				  !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID));
 
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_GUEST_TSC_FREQ, MSR_TYPE_R,
+				  !snp_is_secure_tsc_enabled(vcpu->kvm));
+
 	/*
 	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
 	 * the host/guest supports its use.
···
 		vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
 }
 
-static void sev_es_init_vmcb(struct vcpu_svm *svm)
+static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
···
 
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
+
+	/*
+	 * Set the GHCB MSR value as per the GHCB specification when emulating
+	 * vCPU RESET for an SEV-ES guest.
+	 */
+	if (!init_event)
+		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
+						    GHCB_VERSION_MIN,
+						    sev_enc_bit));
 }
 
-void sev_init_vmcb(struct vcpu_svm *svm)
+void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
 {
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
 	clr_exception_intercept(svm, UD_VECTOR);
 
···
 	 */
 	clr_exception_intercept(svm, GP_VECTOR);
 
-	if (sev_es_guest(svm->vcpu.kvm))
-		sev_es_init_vmcb(svm);
+	if (init_event && sev_snp_guest(vcpu->kvm))
+		sev_snp_init_protected_guest_state(vcpu);
+
+	if (sev_es_guest(vcpu->kvm))
+		sev_es_init_vmcb(svm, init_event);
 }
 
-void sev_es_vcpu_reset(struct vcpu_svm *svm)
+int sev_vcpu_create(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
-
-	/*
-	 * Set the GHCB MSR value as per the GHCB specification when emulating
-	 * vCPU RESET for an SEV-ES guest.
-	 */
-	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
-					    GHCB_VERSION_MIN,
-					    sev_enc_bit));
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct page *vmsa_page;
 
 	mutex_init(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!sev_es_guest(vcpu->kvm))
+		return 0;
+
+	/*
+	 * SEV-ES guests require a separate (from the VMCB) VMSA page used to
+	 * contain the encrypted register state of the guest.
+	 */
+	vmsa_page = snp_safe_alloc_page();
+	if (!vmsa_page)
+		return -ENOMEM;
+
+	svm->sev_es.vmsa = page_address(vmsa_page);
+
+	vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm);
+
+	return 0;
 }
 
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
···
 		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
 		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
 	}
+
+	/*
+	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
+	 * available, i.e. TSC_AUX is loaded on #VMEXIT from the host save area.
+	 * Set the save area to the current hardware value, i.e. the current
+	 * user return value, so that the correct value is restored on #VMEXIT.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_V_TSC_AUX) &&
+	    !WARN_ON_ONCE(tsc_aux_uret_slot < 0))
+		hostsa->tsc_aux = kvm_get_user_return_msr(tsc_aux_uret_slot);
 }
 
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
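The exit-code handling above splits the guest's 64-bit software exit code across the VMCB's `exit_code`/`exit_code_hi` fields and later reassembles it in `kvm_get_cached_sw_exit_code()`. A minimal user-space sketch of that round trip follows; it is not kernel code, and `struct demo_control_area` plus the `demo_*` helpers are hypothetical stand-ins for `struct vmcb_control_area` and the kernel's `lower_32_bits()`/`upper_32_bits()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two exit-code fields of the VMCB control area. */
struct demo_control_area {
	uint32_t exit_code;    /* low 32 bits of the software exit code */
	uint32_t exit_code_hi; /* high 32 bits of the software exit code */
};

/* Local equivalents of the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t demo_lower_32_bits(uint64_t v) { return (uint32_t)v; }
static inline uint32_t demo_upper_32_bits(uint64_t v) { return (uint32_t)(v >> 32); }

/* Split a 64-bit exit code across the two fields, as the GHCB sync path does. */
static void demo_store_exit_code(struct demo_control_area *c, uint64_t exit_code)
{
	c->exit_code = demo_lower_32_bits(exit_code);
	c->exit_code_hi = demo_upper_32_bits(exit_code);
}

/* Reassemble the cached value; the widening cast must happen before the shift. */
static uint64_t demo_get_cached_exit_code(const struct demo_control_area *c)
{
	return ((uint64_t)c->exit_code_hi << 32) | c->exit_code;
}
```

Storing `0x123456789abcdef0` leaves `0x9abcdef0` in `exit_code` and `0x12345678` in `exit_code_hi`, and reading the pair back yields the original value.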
+101 -135
arch/x86/kvm/svm/svm.c
···
 static int tsc_scaling = true;
 module_param(tsc_scaling, int, 0444);
 
-/*
- * enable / disable AVIC. Because the defaults differ for APICv
- * support between VMX and SVM we cannot use module_param_named.
- */
-static bool avic;
-module_param(avic, bool, 0444);
-module_param(enable_ipiv, bool, 0444);
-
 module_param(enable_device_posted_irqs, bool, 0444);
 
 bool __read_mostly dump_invalid_vmcb;
···
  * RDTSCP and RDPID are not used in the kernel, specifically to allow KVM to
  * defer the restoration of TSC_AUX until the CPU returns to userspace.
  */
-static int tsc_aux_uret_slot __read_mostly = -1;
+int tsc_aux_uret_slot __ro_after_init = -1;
 
 static int get_npt_level(void)
 {
···
 
 	amd_pmu_enable_virt();
 
-	/*
-	 * If TSC_AUX virtualization is supported, TSC_AUX becomes a swap type
-	 * "B" field (see sev_es_prepare_switch_to_guest()) for SEV-ES guests.
-	 * Since Linux does not change the value of TSC_AUX once set, prime the
-	 * TSC_AUX field now to avoid a RDMSR on every vCPU run.
-	 */
-	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		u32 __maybe_unused msr_hi;
-
-		rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
-	}
-
 	return 0;
 }
 
···
 	svm_set_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW, intercept);
 }
 
-void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
-{
-	static const u32 x2avic_passthrough_msrs[] = {
-		X2APIC_MSR(APIC_ID),
-		X2APIC_MSR(APIC_LVR),
-		X2APIC_MSR(APIC_TASKPRI),
-		X2APIC_MSR(APIC_ARBPRI),
-		X2APIC_MSR(APIC_PROCPRI),
-		X2APIC_MSR(APIC_EOI),
-		X2APIC_MSR(APIC_RRR),
-		X2APIC_MSR(APIC_LDR),
-		X2APIC_MSR(APIC_DFR),
-		X2APIC_MSR(APIC_SPIV),
-		X2APIC_MSR(APIC_ISR),
-		X2APIC_MSR(APIC_TMR),
-		X2APIC_MSR(APIC_IRR),
-		X2APIC_MSR(APIC_ESR),
-		X2APIC_MSR(APIC_ICR),
-		X2APIC_MSR(APIC_ICR2),
-
-		/*
-		 * Note! Always intercept LVTT, as TSC-deadline timer mode
-		 * isn't virtualized by hardware, and the CPU will generate a
-		 * #GP instead of a #VMEXIT.
-		 */
-		X2APIC_MSR(APIC_LVTTHMR),
-		X2APIC_MSR(APIC_LVTPC),
-		X2APIC_MSR(APIC_LVT0),
-		X2APIC_MSR(APIC_LVT1),
-		X2APIC_MSR(APIC_LVTERR),
-		X2APIC_MSR(APIC_TMICT),
-		X2APIC_MSR(APIC_TMCCT),
-		X2APIC_MSR(APIC_TDCR),
-	};
-	int i;
-
-	if (intercept == svm->x2avic_msrs_intercepted)
-		return;
-
-	if (!x2avic_enabled)
-		return;
-
-	for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
-		svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
-					  MSR_TYPE_RW, intercept);
-
-	svm->x2avic_msrs_intercepted = intercept;
-}
-
 void svm_vcpu_free_msrpm(void *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
···
 	if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		bool shstk_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, !shstk_enabled);
 	}
 
 	if (sev_es_guest(vcpu->kvm))
···
 	}
 }
 
-static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
+static void svm_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	svm_recalc_instruction_intercepts(vcpu);
 	svm_recalc_msr_intercepts(vcpu);
 }
 
-static void init_vmcb(struct kvm_vcpu *vcpu)
+static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
···
 		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
 
 	if (sev_guest(vcpu->kvm))
-		sev_init_vmcb(svm);
+		sev_init_vmcb(svm, init_event);
 
 	svm_hv_init_vmcb(vmcb);
 
-	svm_recalc_intercepts_after_set_cpuid(vcpu);
+	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
 
 	vmcb_mark_all_dirty(vmcb);
 
···
 
 	svm->nmi_masked = false;
 	svm->awaiting_iret_completion = false;
-
-	if (sev_es_guest(vcpu->kvm))
-		sev_es_vcpu_reset(svm);
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
···
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
-	if (init_event)
-		sev_snp_init_protected_guest_state(vcpu);
-
-	init_vmcb(vcpu);
+	init_vmcb(vcpu, init_event);
 
 	if (!init_event)
 		__svm_vcpu_reset(vcpu);
···
 {
 	struct vcpu_svm *svm;
 	struct page *vmcb01_page;
-	struct page *vmsa_page = NULL;
 	int err;
 
 	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
···
 	if (!vmcb01_page)
 		goto out;
 
-	if (sev_es_guest(vcpu->kvm)) {
-		/*
-		 * SEV-ES guests require a separate VMSA page used to contain
-		 * the encrypted register state of the guest.
-		 */
-		vmsa_page = snp_safe_alloc_page();
-		if (!vmsa_page)
-			goto error_free_vmcb_page;
-	}
+	err = sev_vcpu_create(vcpu);
+	if (err)
+		goto error_free_vmcb_page;
 
 	err = avic_init_vcpu(svm);
 	if (err)
-		goto error_free_vmsa_page;
+		goto error_free_sev;
 
 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm) {
 		err = -ENOMEM;
-		goto error_free_vmsa_page;
+		goto error_free_sev;
 	}
 
 	svm->x2avic_msrs_intercepted = true;
···
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (vmsa_page)
-		svm->sev_es.vmsa = page_address(vmsa_page);
-
 	svm->guest_state_loaded = false;
 
 	return 0;
 
-error_free_vmsa_page:
-	if (vmsa_page)
-		__free_page(vmsa_page);
+error_free_sev:
+	sev_free_vcpu(vcpu);
 error_free_vmcb_page:
 	__free_page(vmcb01_page);
 out:
···
 		__svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio);
 
 	/*
-	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
-	 * available. The user return MSR support is not required in this case
-	 * because TSC_AUX is restored on #VMEXIT from the host save area
-	 * (which has been initialized in svm_enable_virtualization_cpu()).
+	 * TSC_AUX is always virtualized (context switched by hardware) for
+	 * SEV-ES guests when the feature is available. For non-SEV-ES guests,
+	 * context switch TSC_AUX via the user_return MSR infrastructure (not
+	 * all CPUs support TSC_AUX virtualization).
 	 */
 	if (likely(tsc_aux_uret_slot >= 0) &&
 	    (!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
···
 static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
 				      struct msr_data *msr_info)
 {
-	return sev_es_guest(vcpu->kvm) &&
-	       vcpu->arch.guest_state_protected &&
+	return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
+	       msr_info->index != MSR_IA32_XSS &&
 	       !msr_write_intercepted(vcpu, msr_info->index);
 }
 
···
 		msr_info->data = svm->vmcb01.ptr->save.sysenter_esp;
 		if (guest_cpuid_is_intel_compatible(vcpu))
 			msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
+		break;
+	case MSR_IA32_S_CET:
+		msr_info->data = svm->vmcb->save.s_cet;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		msr_info->data = svm->vmcb->save.isst_addr;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		msr_info->data = svm->vmcb->save.ssp;
 		break;
 	case MSR_TSC_AUX:
 		msr_info->data = svm->tsc_aux;
···
 		svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
 		svm->sysenter_esp_hi = guest_cpuid_is_intel_compatible(vcpu) ? (data >> 32) : 0;
 		break;
+	case MSR_IA32_S_CET:
+		svm->vmcb->save.s_cet = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		svm->vmcb->save.isst_addr = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		svm->vmcb->save.ssp = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
 	case MSR_TSC_AUX:
 		/*
 		 * TSC_AUX is always virtualized for SEV-ES guests when the
 		 * feature is available. The user return MSR support is not
 		 * required in this case because TSC_AUX is restored on #VMEXIT
-		 * from the host save area (which has been initialized in
-		 * svm_enable_virtualization_cpu()).
+		 * from the host save area.
 		 */
 		if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
 			break;
···
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "rsp:", save->rsp, "rax:", save->rax);
 	pr_err("%-15s %016llx %-13s %016llx\n",
+	       "s_cet:", save->s_cet, "ssp:", save->ssp);
+	pr_err("%-15s %016llx\n",
+	       "isst_addr:", save->isst_addr);
+	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "star:", save01->star, "lstar:", save01->lstar);
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "cstar:", save01->cstar, "sfmask:", save01->sfmask);
···
 
 		pr_err("%-15s %016llx\n",
 		       "sev_features", vmsa->sev_features);
+
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl0_ssp:", vmsa->pl0_ssp, "pl1_ssp:", vmsa->pl1_ssp);
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl2_ssp:", vmsa->pl2_ssp, "pl3_ssp:", vmsa->pl3_ssp);
+		pr_err("%-15s %016llx\n",
+		       "u_cet:", vmsa->u_cet);
 
 		pr_err("%-15s %016llx %-13s %016llx\n",
 		       "rax:", vmsa->rax, "rbx:", vmsa->rbx);
···
 static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+
+	/*
+	 * Next RIP must be provided as IRQs are disabled, and accessing guest
+	 * memory to decode the instruction might fault, i.e. might sleep.
+	 */
+	if (!nrips || !control->next_rip)
+		return EXIT_FASTPATH_NONE;
 
 	if (is_guest_mode(vcpu))
 		return EXIT_FASTPATH_NONE;
 
-	switch (svm->vmcb->control.exit_code) {
+	switch (control->exit_code) {
 	case SVM_EXIT_MSR:
-		if (!svm->vmcb->control.exit_info_1)
+		if (!control->exit_info_1)
 			break;
-		return handle_fastpath_set_msr_irqoff(vcpu);
+		return handle_fastpath_wrmsr(vcpu);
 	case SVM_EXIT_HLT:
 		return handle_fastpath_hlt(vcpu);
+	case SVM_EXIT_INVD:
+		return handle_fastpath_invd(vcpu);
 	default:
 		break;
 	}
···
 
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
-
-	svm_recalc_intercepts_after_set_cpuid(vcpu);
 }
 
 static bool svm_has_wbinvd_exit(void)
···
 	return page_address(page);
 }
 
-static struct kvm_x86_ops svm_x86_ops __initdata = {
+struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
 	.check_processor_compatibility = svm_check_processor_compat,
···
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
 
-	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
+	.recalc_intercepts = svm_recalc_intercepts,
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
···
 	kvm_set_cpu_caps();
 
 	kvm_caps.supported_perf_cap = 0;
-	kvm_caps.supported_xss = 0;
+
+	kvm_cpu_cap_clear(X86_FEATURE_IBT);
 
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
···
 	/* CPUID 0x8000001F (SME/SEV features) */
 	sev_set_cpu_caps();
 
-	/* Don't advertise Bus Lock Detect to guest if SVM support is absent */
+	/*
+	 * Clear capabilities that are automatically configured by common code,
+	 * but that require explicit SVM support (that isn't yet implemented).
+	 */
 	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);
+	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM);
 }
 
 static __init int svm_hardware_setup(void)
···
 			  get_npt_level(), PG_LEVEL_1G);
 	pr_info("Nested Paging %s\n", str_enabled_disabled(npt_enabled));
 
+	/*
+	 * It seems that on AMD processors PTE's accessed bit is
+	 * being set by the CPU hardware before the NPF vmexit.
+	 * This is not expected behaviour and our tests fail because
+	 * of it.
+	 * A workaround here is to disable support for
+	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
+	 * In this case userspace can know if there is support using
+	 * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
+	 * it
+	 * If future AMD CPU models change the behaviour described above,
+	 * this variable can be changed accordingly
+	 */
+	allow_smaller_maxphyaddr = !npt_enabled;
+
 	/* Setup shadow_me_value and shadow_me_mask */
 	kvm_mmu_set_me_spte_mask(sme_me_mask, sme_me_mask);
 
···
 			goto err;
 	}
 
-	enable_apicv = avic = avic && avic_hardware_setup();
-
+	enable_apicv = avic_hardware_setup();
 	if (!enable_apicv) {
 		enable_ipiv = false;
 		svm_x86_ops.vcpu_blocking = NULL;
 		svm_x86_ops.vcpu_unblocking = NULL;
 		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
-	} else if (!x2avic_enabled) {
-		svm_x86_ops.allow_apicv_in_x2apic_without_x2apic_virtualization = true;
 	}
 
 	if (vls) {
···
 		pr_info("PMU virtualization is disabled\n");
 
 	svm_set_cpu_caps();
-
-	/*
-	 * It seems that on AMD processors PTE's accessed bit is
-	 * being set by the CPU hardware before the NPF vmexit.
-	 * This is not expected behaviour and our tests fail because
-	 * of it.
-	 * A workaround here is to disable support for
-	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
-	 * In this case userspace can know if there is support using
-	 * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
-	 * it
-	 * If future AMD CPU models change the behaviour described above,
-	 * this variable can be changed accordingly
-	 */
-	allow_smaller_maxphyaddr = !npt_enabled;
 
 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED;
 	return 0;
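With the priming code removed from the per-CPU enable path, who restores the host's TSC_AUX value depends on three inputs: whether a user-return slot was registered (`tsc_aux_uret_slot >= 0`), whether the CPU supports TSC_AUX virtualization (`X86_FEATURE_V_TSC_AUX`), and whether the guest is SEV-ES. The sketch below condenses the TSC_AUX checks in svm.c's switch-to-guest path and in `sev_es_prepare_switch_to_guest()` into one pure function. It is a simplified model that ignores everything else those functions do, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for which mechanism restores the host TSC_AUX value. */
enum demo_tsc_aux_path {
	DEMO_TSC_AUX_UNHANDLED,   /* no user-return slot was registered */
	DEMO_TSC_AUX_USER_RETURN, /* restored lazily via the user-return MSR list */
	DEMO_TSC_AUX_HW_SWAP,     /* reloaded by hardware from the host save area on #VMEXIT */
};

/*
 * Model of the decision: SEV-ES guests on V_TSC_AUX hardware get the
 * hardware swap (seeded from the current user-return value); all other
 * guests fall back to the user-return MSR infrastructure.
 */
static enum demo_tsc_aux_path demo_tsc_aux_path(int uret_slot, bool has_v_tsc_aux,
						bool sev_es_guest)
{
	if (uret_slot < 0)
		return DEMO_TSC_AUX_UNHANDLED;
	if (has_v_tsc_aux && sev_es_guest)
		return DEMO_TSC_AUX_HW_SWAP;
	return DEMO_TSC_AUX_USER_RETURN;
}
```

For example, an SEV-ES guest on a CPU with V_TSC_AUX maps to the hardware-swap path, while a non-SEV-ES guest on the same CPU uses the user-return path.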
+26 -18
arch/x86/kvm/svm/svm.h
···
 extern int nrips;
 extern int vgif;
 extern bool intercept_smi;
-extern bool x2avic_enabled;
 extern bool vnmi;
 extern int lbrv;
+
+extern int tsc_aux_uret_slot __ro_after_init;
+
+extern struct kvm_x86_ops svm_x86_ops __initdata;
 
 /*
  * Clean bits in VMCB.
···
 				 * AVIC PHYSICAL_TABLE pointer,
 				 * AVIC LOGICAL_TABLE pointer
 				 */
+	VMCB_CET,	 /* S_CET, SSP, ISST_ADDR */
 	VMCB_SW = 31,    /* Reserved for hypervisor/software use */
 };
 
···
 	(1U << VMCB_ASID) | (1U << VMCB_INTR) | \
 	(1U << VMCB_NPT) | (1U << VMCB_CR) | (1U << VMCB_DR) | \
 	(1U << VMCB_DT) | (1U << VMCB_SEG) | (1U << VMCB_CR2) | \
-	(1U << VMCB_LBR) | (1U << VMCB_AVIC) | \
+	(1U << VMCB_LBR) | (1U << VMCB_AVIC) | (1U << VMCB_CET) | \
 	(1U << VMCB_SW))
 
 /* TPR and CR2 are always written before VMRUN */
···
 int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code);
 void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 			  int read, int write);
-void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
 
···
 	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_TOO_BIG)	\
 )
 
-bool avic_hardware_setup(void);
+bool __init avic_hardware_setup(void);
 int avic_ga_log_notifier(u32 ga_tag);
 void avic_vm_destroy(struct kvm *kvm);
 int avic_vm_init(struct kvm *kvm);
···
 /* sev.c */
 
 int pre_sev_run(struct vcpu_svm *svm, int cpu);
-void sev_init_vmcb(struct vcpu_svm *svm);
+void sev_init_vmcb(struct vcpu_svm *svm, bool init_event);
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
-void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
···
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }
 
+int sev_vcpu_create(struct kvm_vcpu *vcpu);
 void sev_free_vcpu(struct kvm_vcpu *vcpu);
 void sev_vm_destroy(struct kvm *kvm);
 void __init sev_set_cpu_caps(void);
···
 int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
-void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
···
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }
 
+static inline int sev_vcpu_create(struct kvm_vcpu *vcpu) { return 0; }
 static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
 static inline void sev_vm_destroy(struct kvm *kvm) {}
 static inline void __init sev_set_cpu_caps(void) {}
···
 static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
 static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
-static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
 static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 {
 	return 0;
···
 void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
 
 #define DEFINE_KVM_GHCB_ACCESSORS(field)						\
-	static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
-	{									\
-		return test_bit(GHCB_BITMAP_IDX(field),				\
-				(unsigned long *)&svm->sev_es.valid_bitmap);	\
-	}									\
-										\
-	static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
-	{									\
-		return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0;	\
-	}									\
+static __always_inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm)		\
+{										\
+	return READ_ONCE(svm->sev_es.ghcb->save.field);				\
+}										\
+										\
+static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
+{										\
+	return test_bit(GHCB_BITMAP_IDX(field),					\
+			(unsigned long *)&svm->sev_es.valid_bitmap);		\
+}										\
+										\
+static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm) \
+{										\
+	return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(svm) : 0; \
+}
 
 DEFINE_KVM_GHCB_ACCESSORS(cpl)
 DEFINE_KVM_GHCB_ACCESSORS(rax)
···
 DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
 DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)
 
 #endif
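The reworked `DEFINE_KVM_GHCB_ACCESSORS()` now generates three helpers per field: a raw `READ_ONCE()` read of the live GHCB, a validity test against KVM's snapshot of the valid bitmap, and a getter that combines the two. The following user-space sketch reproduces that token-pasting pattern with simplified, hypothetical types (`struct demo_vcpu`, `DEMO_IDX_*`); a `volatile` read stands in for `READ_ONCE()`, which only approximates the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest-shared save area with two fields. */
struct demo_save {
	uint64_t rax;
	uint64_t xss;
};

/* Hypothetical bit index of each field in the valid bitmap. */
enum { DEMO_IDX_rax = 0, DEMO_IDX_xss = 1 };

struct demo_vcpu {
	struct demo_save *shared; /* page the guest may rewrite at any time */
	uint64_t valid_bitmap;    /* host-side snapshot of the guest's valid bits */
};

/*
 * For one field, generate:
 *   demo_get_<field>()          - raw read (volatile stands in for READ_ONCE())
 *   demo_<field>_is_valid()     - test the snapshotted valid bit
 *   demo_get_<field>_if_valid() - raw read if valid, else 0
 */
#define DEFINE_DEMO_ACCESSORS(field)						\
static inline uint64_t demo_get_##field(const struct demo_vcpu *v)		\
{										\
	return *(const volatile uint64_t *)&v->shared->field;			\
}										\
										\
static inline bool demo_##field##_is_valid(const struct demo_vcpu *v)		\
{										\
	return (v->valid_bitmap >> DEMO_IDX_##field) & 1;			\
}										\
										\
static inline uint64_t demo_get_##field##_if_valid(const struct demo_vcpu *v)	\
{										\
	return demo_##field##_is_valid(v) ? demo_get_##field(v) : 0;		\
}

DEFINE_DEMO_ACCESSORS(rax)
DEFINE_DEMO_ACCESSORS(xss)
```

Splitting out the raw read lets callers that deliberately ignore the valid bit (sev.c uses `kvm_ghcb_get_sw_exit_code()` and `kvm_ghcb_get_sw_exit_info_1/2()` this way) still perform exactly one read of guest-writable memory per access.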
+27 -1
arch/x86/kvm/svm/svm_onhyperv.c
···
 #include "kvm_onhyperv.h"
 #include "svm_onhyperv.h"
 
-int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
+static int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct hv_vmcb_enlightenments *hve;
 	hpa_t partition_assist_page = hv_get_partition_assist_page(vcpu);
···
 	return 0;
 }
+
+__init void svm_hv_hardware_setup(void)
+{
+	if (npt_enabled &&
+	    ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
+		pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
+		svm_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
+		svm_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
+	}
+
+	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
+		int cpu;
+
+		pr_info(KBUILD_MODNAME ": Hyper-V Direct TLB Flush enabled\n");
+		for_each_online_cpu(cpu) {
+			struct hv_vp_assist_page *vp_ap =
+				hv_get_vp_assist_page(cpu);
+
+			if (!vp_ap)
+				continue;
+
+			vp_ap->nested_control.features.directhypercall = 1;
+		}
+		svm_x86_ops.enable_l2_tlb_flush =
+				svm_hv_enable_l2_tlb_flush;
+	}
+}
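`svm_hv_hardware_setup()` moves out of the header because it patches `svm_x86_ops`, which this diff turns from a file-local `static` object into a global declared `extern` in svm.h (the old header declared its own `static struct kvm_x86_ops svm_x86_ops;` instead). The underlying pattern, filling optional hooks in an ops table only when the host advertises a feature, can be sketched in user space as below. All names are hypothetical, the feature bits stand in for `ms_hyperv.nested_features`, and the hook bodies are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for ms_hyperv.nested_features. */
#define DEMO_NESTED_ENLIGHTENED_TLB (1u << 0)
#define DEMO_NESTED_DIRECT_FLUSH    (1u << 1)

/* Hypothetical ops table: a NULL hook means "not provided". */
struct demo_ops {
	int (*flush_remote_tlbs)(void);
	int (*enable_l2_tlb_flush)(void);
};

/* Placeholder hooks; the return values only identify which hook ran. */
static int demo_hv_flush_remote_tlbs(void) { return 1; }
static int demo_hv_enable_l2_tlb_flush(void) { return 2; }

/* Same shape as svm_hv_hardware_setup(): install optional hooks only when supported. */
static void demo_hardware_setup(struct demo_ops *ops, int npt_enabled, uint32_t features)
{
	if (npt_enabled && (features & DEMO_NESTED_ENLIGHTENED_TLB))
		ops->flush_remote_tlbs = demo_hv_flush_remote_tlbs;

	if (features & DEMO_NESTED_DIRECT_FLUSH)
		ops->enable_l2_tlb_flush = demo_hv_enable_l2_tlb_flush;
}
```

As in the original, the enlightened TLB flush hook depends on NPT being enabled, while the direct-flush hook depends only on the feature bit.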
+1 -30
arch/x86/kvm/svm/svm_onhyperv.h
···
 #include "kvm_onhyperv.h"
 #include "svm/hyperv.h"
 
-static struct kvm_x86_ops svm_x86_ops;
-
-int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
+__init void svm_hv_hardware_setup(void);
 
 static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
 {
···
 
 	if (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)
 		hve->hv_enlightenments_control.msr_bitmap = 1;
-}
-
-static inline __init void svm_hv_hardware_setup(void)
-{
-	if (npt_enabled &&
-	    ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
-		pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
-		svm_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
-		svm_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
-	}
-
-	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
-		int cpu;
-
-		pr_info(KBUILD_MODNAME ": Hyper-V Direct TLB Flush enabled\n");
-		for_each_online_cpu(cpu) {
-			struct hv_vp_assist_page *vp_ap =
-				hv_get_vp_assist_page(cpu);
-
-			if (!vp_ap)
-				continue;
-
-			vp_ap->nested_control.features.directhypercall = 1;
-		}
-		svm_x86_ops.enable_l2_tlb_flush =
-				svm_hv_enable_l2_tlb_flush;
-	}
 }
 
 static inline void svm_hv_vmcb_dirty_nested_enlightenments(
+3 -2
arch/x86/kvm/trace.h
···
 
 #define kvm_trace_sym_exc						\
 	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),	\
-	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),		\
-	EXS(MF), EXS(AC), EXS(MC)
+	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF),	\
+	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP),			\
+	EXS(HV), EXS(VC), EXS(SX)
 
 /*
  * Tracepoint for kvm interrupt injection:
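`kvm_trace_sym_exc` is a list of `EXS()` entries that pair an exception vector with a printable name, and this change adds `#XM`, `#VE`, `#CP`, `#HV`, `#VC` and `#SX` so that traced injections of those vectors decode symbolically instead of as raw numbers. The lookup can be sketched as follows. The `EXS()` definition, the `*_VECTOR` enum and `demo_exc_name()` are written for this example and are assumptions modeled on, not copied from, the kernel headers; the vector numbers are the architectural x86 values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Architectural x86 exception vector numbers (defined locally for this sketch). */
enum {
	DE_VECTOR = 0,  DB_VECTOR = 1,  BP_VECTOR = 3,  OF_VECTOR = 4,
	BR_VECTOR = 5,  UD_VECTOR = 6,  NM_VECTOR = 7,  DF_VECTOR = 8,
	TS_VECTOR = 10, NP_VECTOR = 11, SS_VECTOR = 12, GP_VECTOR = 13,
	PF_VECTOR = 14, MF_VECTOR = 16, AC_VECTOR = 17, MC_VECTOR = 18,
	XM_VECTOR = 19, VE_VECTOR = 20, CP_VECTOR = 21, HV_VECTOR = 28,
	VC_VECTOR = 29, SX_VECTOR = 30,
};

struct demo_sym {
	unsigned long val;
	const char *name;
};

/* Assumed EXS(): token-paste the vector constant and stringify "#XX". */
#define EXS(x) { x##_VECTOR, "#" #x }

/* Same entries, in the same order, as the updated kvm_trace_sym_exc list. */
static const struct demo_sym demo_exc_syms[] = {
	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),
	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF),
	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP),
	EXS(HV), EXS(VC), EXS(SX),
};

/* Return the symbolic name for a vector, or NULL if it is not in the table. */
static const char *demo_exc_name(unsigned long vector)
{
	for (size_t i = 0; i < sizeof(demo_exc_syms) / sizeof(demo_exc_syms[0]); i++)
		if (demo_exc_syms[i].val == vector)
			return demo_exc_syms[i].name;
	return NULL;
}
```

A vector missing from the table (NMI, vector 2, for instance) yields `NULL` here, which corresponds to a trace printing the raw number.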
+9 -3
arch/x86/kvm/vmx/capabilities.h
···
 #define PT_MODE_SYSTEM		0
 #define PT_MODE_HOST_GUEST	1
 
-#define PMU_CAP_FW_WRITES	(1ULL << 13)
-#define PMU_CAP_LBR_FMT		0x3f
-
 struct nested_vmx_msrs {
 	/*
 	 * We only store the "true" versions of the VMX capability MSRs. We
···
 	return vmcs_config.basic & VMX_BASIC_INOUT;
 }
 
+static inline bool cpu_has_vmx_basic_no_hw_errcode_cc(void)
+{
+	return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 static inline bool cpu_has_virtual_nmis(void)
 {
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
···
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
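Helpers such as `cpu_has_vmx_basic_no_hw_errcode_cc()` are plain mask tests on a cached capability word. The user-space sketch below shows that pattern together with the reserved-bit change in nested.c, which shrinks `GENMASK_ULL(63, 56)` to `GENMASK_ULL(63, 57)` while adding `VMX_BASIC_NO_HW_ERROR_CODE_CC` to the supported feature bits. Placing that flag at bit 56 is an inference from that hunk, so treat it as an assumption. The `demo_`/`DEMO_` names, including the local `GENMASK_ULL()`/`BIT_ULL()` equivalents, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local equivalents of the kernel's BIT_ULL()/GENMASK_ULL() for 64-bit masks. */
#define DEMO_BIT_ULL(n)        (1ULL << (n))
#define DEMO_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Assumption: VMX_BASIC_NO_HW_ERROR_CODE_CC is bit 56 of IA32_VMX_BASIC. */
#define DEMO_VMX_BASIC_NO_HW_ERROR_CODE_CC DEMO_BIT_ULL(56)

/* Hypothetical stand-in for vmcs_config.basic. */
struct demo_vmcs_config {
	uint64_t basic;
};

/* Same shape as cpu_has_vmx_basic_no_hw_errcode_cc(): a single mask test. */
static bool demo_cpu_has_no_hw_errcode_cc(const struct demo_vmcs_config *cfg)
{
	return cfg->basic & DEMO_VMX_BASIC_NO_HW_ERROR_CODE_CC;
}

/* Reserved-bit masks before and after the nested.c change. */
static const uint64_t demo_reserved_old =
	DEMO_GENMASK_ULL(63, 56) | DEMO_GENMASK_ULL(47, 45) | DEMO_BIT_ULL(31);
static const uint64_t demo_reserved_new =
	DEMO_GENMASK_ULL(63, 57) | DEMO_GENMASK_ULL(47, 45) | DEMO_BIT_ULL(31);
```

The two reserved masks differ only in bit 56, which is why a userspace-provided IA32_VMX_BASIC value with that bit set is no longer rejected as reserved.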
+7 -7
arch/x86/kvm/vmx/main.c
···
 	return vmx_get_msr(vcpu, msr_info);
 }
 
-static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+static void vt_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * TDX doesn't allow VMM to configure interception of MSR accesses.
-	 * TDX guest requests MSR accesses by calling TDVMCALL. The MSR
-	 * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR
-	 * if the userspace has set any.
+	 * TDX doesn't allow VMM to configure interception of instructions or
+	 * MSR accesses. TDX guest requests MSR accesses by calling TDVMCALL.
+	 * The MSR filters will be applied when handling the TDVMCALL for
+	 * RDMSR/WRMSR if the userspace has set any.
 	 */
 	if (is_td_vcpu(vcpu))
 		return;
 
-	vmx_recalc_msr_intercepts(vcpu);
+	vmx_recalc_intercepts(vcpu);
 }
 
 static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
···
 	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,
 
-	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
+	.recalc_intercepts = vt_op(recalc_intercepts),
 	.complete_emulated_msr = vt_op(complete_emulated_msr),
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+188 -27
arch/x86/kvm/vmx/nested.c
··· 721 721
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_MPERF, MSR_TYPE_R);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &map);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
··· 1015 997
 				__func__, i, e.index, e.reserved);
 			goto fail;
 		}
-		if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
+		if (kvm_emulate_msr_write(vcpu, e.index, e.value)) {
 			pr_debug_ratelimited(
 				"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
 				__func__, i, e.index, e.value);
··· 1051 1033
 		}
 	}
 
-	if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
+	if (kvm_emulate_msr_read(vcpu, msr_index, data)) {
 		pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
 				     msr_index);
 		return false;
··· 1290 1272
 {
 	const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
 				 VMX_BASIC_INOUT |
-				 VMX_BASIC_TRUE_CTLS;
+				 VMX_BASIC_TRUE_CTLS |
+				 VMX_BASIC_NO_HW_ERROR_CODE_CC;
 
-	const u64 reserved_bits = GENMASK_ULL(63, 56) |
+	const u64 reserved_bits = GENMASK_ULL(63, 57) |
 				  GENMASK_ULL(47, 45) |
 				  BIT_ULL(31);
 
··· 2539 2520
 	}
 }
 
+static void vmcs_read_cet_state(struct kvm_vcpu *vcpu, u64 *s_cet,
+				u64 *ssp, u64 *ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		*s_cet = vmcs_readl(GUEST_S_CET);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		*ssp = vmcs_readl(GUEST_SSP);
+		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+	}
+}
+
+static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
+				 u64 ssp, u64 ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, s_cet);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, ssp);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+	}
+}
+
 static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
 	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
··· 2681 2636
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+		vmcs_write_cet_state(&vmx->vcpu, vmcs12->guest_s_cet,
+				     vmcs12->guest_ssp, vmcs12->guest_ssp_tbl);
+
 	set_cr4_guest_host_mask(vmx);
 }
 
··· 2724 2675
 		kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
 		vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
 	}
+
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		vmcs_write_cet_state(vcpu, vmx->nested.pre_vmenter_s_cet,
+				     vmx->nested.pre_vmenter_ssp,
+				     vmx->nested.pre_vmenter_ssp_tbl);
+
 	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
··· 2826 2770
 
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
-	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
-				     vmcs12->guest_ia32_perf_global_ctrl))) {
+	    WARN_ON_ONCE(__kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+						 vmcs12->guest_ia32_perf_global_ctrl))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
··· 3005 2949
 		u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
 		u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
 		bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
-		bool should_have_error_code;
 		bool urg = nested_cpu_has2(vmcs12,
 					   SECONDARY_EXEC_UNRESTRICTED_GUEST);
 		bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
··· 3021 2966
 		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 			return -EINVAL;
 
-		/* VM-entry interruption-info field: deliver error code */
-		should_have_error_code =
-			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
-		if (CC(has_error_code != should_have_error_code))
-			return -EINVAL;
+		/*
+		 * Cannot deliver error code in real mode or if the interrupt
+		 * type is not hardware exception. For other cases, do the
+		 * consistency check only if the vCPU doesn't enumerate
+		 * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+		 */
+		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+			if (CC(has_error_code))
+				return -EINVAL;
+		} else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+			if (CC(has_error_code != x86_exception_has_error_code(vector)))
+				return -EINVAL;
+		}
 
 		/* VM-entry exception error code */
 		if (CC(has_error_code &&
··· 3100 3038
 	return !__is_canonical_address(la, l1_address_bits_on_exit);
 }
 
+static int nested_vmx_check_cet_state_common(struct kvm_vcpu *vcpu, u64 s_cet,
+					     u64 ssp, u64 ssp_tbl)
+{
+	if (CC(!kvm_is_valid_u_s_cet(vcpu, s_cet)) || CC(!IS_ALIGNED(ssp, 4)) ||
+	    CC(is_noncanonical_msr_address(ssp_tbl, vcpu)))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 				       struct vmcs12 *vmcs12)
 {
··· 3118 3046
 	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
 	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
 	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
+		return -EINVAL;
+
+	if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
 		return -EINVAL;
 
 	if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
··· 3177 3102
 		    CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA)) ||
 		    CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)))
 			return -EINVAL;
+	}
+
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+		if (nested_vmx_check_cet_state_common(vcpu, vmcs12->host_s_cet,
+						      vmcs12->host_ssp,
+						      vmcs12->host_ssp_tbl))
+			return -EINVAL;
+
+		/*
+		 * IA32_S_CET and SSP must be canonical if the host will
+		 * enter 64-bit mode after VM-exit; otherwise, higher
+		 * 32-bits must be all 0s.
+		 */
+		if (ia32e) {
+			if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+			    CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+				return -EINVAL;
+		} else {
+			if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+				return -EINVAL;
+		}
 	}
 
 	return 0;
··· 3258 3162
 	    CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
 		return -EINVAL;
 
+	if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
 	    (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
 	     CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
··· 3309 3210
 	    (CC(is_noncanonical_msr_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu)) ||
 	     CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
 		return -EINVAL;
+
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+		if (nested_vmx_check_cet_state_common(vcpu, vmcs12->guest_s_cet,
+						      vmcs12->guest_ssp,
+						      vmcs12->guest_ssp_tbl))
+			return -EINVAL;
+
+		/*
+		 * Guest SSP must have 63:N bits identical, rather than
+		 * be canonical (i.e., 63:N-1 bits identical), where N is
+		 * the CPU's maximum linear-address width. Similar to
+		 * is_noncanonical_msr_address(), use the host's
+		 * linear-address width.
+		 */
+		if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1)))
+			return -EINVAL;
+	}
 
 	if (nested_check_guest_non_reg_state(vmcs12))
 		return -EINVAL;
··· 3660 3544
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
 
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		vmcs_read_cet_state(vcpu, &vmx->nested.pre_vmenter_s_cet,
+				    &vmx->nested.pre_vmenter_ssp,
+				    &vmx->nested.pre_vmenter_ssp_tbl);
+
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 	 * nested early checks are disabled. In the event of a "late" VM-Fail,
··· 3812 3690
 		return 1;
 	}
 
-	kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED);
+	kvm_pmu_branch_retired(vcpu);
 
 	if (CC(evmptrld_status == EVMPTRLD_VMFAIL))
 		return nested_vmx_failInvalid(vcpu);
··· 4749 4627
 
 	if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
 		vmcs12->guest_ia32_efer = vcpu->arch.efer;
+
+	vmcs_read_cet_state(&vmx->vcpu, &vmcs12->guest_s_cet,
+			    &vmcs12->guest_ssp,
+			    &vmcs12->guest_ssp_tbl);
 }
 
 /*
··· 4878 4752
 	if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
 		vmcs_write64(GUEST_BNDCFGS, 0);
 
+	/*
+	 * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set.
+	 * otherwise CET state should be retained across VM-exit, i.e.,
+	 * guest values should be propagated from vmcs12 to vmcs01.
+	 */
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
+		vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+				     vmcs12->host_ssp_tbl);
+	else
+		vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+				     vmcs12->guest_ssp_tbl);
+
 	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
 		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
 		vcpu->arch.pat = vmcs12->host_ia32_pat;
 	}
 	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)))
-		WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
-					 vmcs12->host_ia32_perf_global_ctrl));
+		WARN_ON_ONCE(__kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+						     vmcs12->host_ia32_perf_global_ctrl));
 
 	/* Set L1 segment info according to Intel SDM
 	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
··· 5075 4937
 				goto vmabort;
 			}
 
-			if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) {
+			if (kvm_emulate_msr_write(vcpu, h.index, h.value)) {
 				pr_debug_ratelimited(
 					"%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
 					__func__, j, h.index, h.value);
··· 6354 6216
 					struct vmcs12 *vmcs12,
 					union vmx_exit_reason exit_reason)
 {
-	u32 msr_index = kvm_rcx_read(vcpu);
+	u32 msr_index;
 	gpa_t bitmap;
 
 	if (!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS))
 		return true;
+
+	if (exit_reason.basic == EXIT_REASON_MSR_READ_IMM ||
+	    exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
+		msr_index = vmx_get_exit_qual(vcpu);
+	else
+		msr_index = kvm_rcx_read(vcpu);
 
 	/*
 	 * The MSR_BITMAP page is divided into four 1024-byte bitmaps,
··· 6372 6228
 	 * First we need to figure out which of the four to use:
 	 */
 	bitmap = vmcs12->msr_bitmap;
-	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE ||
+	    exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
 		bitmap += 2048;
 	if (msr_index >= 0xc0000000) {
 		msr_index -= 0xc0000000;
··· 6672 6527
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC);
 	case EXIT_REASON_MSR_READ:
 	case EXIT_REASON_MSR_WRITE:
+	case EXIT_REASON_MSR_READ_IMM:
+	case EXIT_REASON_MSR_WRITE_IMM:
 		return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason);
 	case EXIT_REASON_INVALID_STATE:
 		return true;
··· 6708 6561
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
 	case EXIT_REASON_XSETBV:
 		return true;
-	case EXIT_REASON_XSAVES: case EXIT_REASON_XRSTORS:
+	case EXIT_REASON_XSAVES:
+	case EXIT_REASON_XRSTORS:
 		/*
-		 * This should never happen, since it is not possible to
-		 * set XSS to a non-zero value---neither in L1 nor in L2.
-		 * If if it were, XSS would have to be checked against
-		 * the XSS exit bitmap in vmcs12.
+		 * Always forward XSAVES/XRSTORS to L1 as KVM doesn't utilize
+		 * XSS-bitmap, and always loads vmcs02 with vmcs12's XSS-bitmap
+		 * verbatim, i.e. any exit is due to L1's bitmap. WARN if
+		 * XSAVES isn't enabled, as the CPU is supposed to inject #UD
+		 * in that case, before consulting the XSS-bitmap.
 		 */
-		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES);
+		WARN_ON_ONCE(!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES));
+		return true;
 	case EXIT_REASON_UMWAIT:
 	case EXIT_REASON_TPAUSE:
 		return nested_cpu_has2(vmcs12,
··· 7179 7029
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
 		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
 		VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		msrs->exit_ctls_high &= ~VM_EXIT_LOAD_CET_STATE;
 
 	/* We support free control of debug control saving. */
 	msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
··· 7205 7051
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		msrs->entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
 
 	/* We support free control of debug control loading. */
 	msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
··· 7364 7205
 		msrs->basic |= VMX_BASIC_TRUE_CTLS;
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+	if (cpu_has_vmx_basic_no_hw_errcode_cc())
+		msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
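The comment in nested_vmx_exit_handled_msr() describes how L1's MSR bitmap page is split into four 1024-byte bitmaps, and the change routes the new immediate-operand RDMSR/WRMSRNS exits through the same lookup, taking the MSR index from the exit qualification instead of RCX. The sketch below models that lookup in user space, assuming the standard VMX layout (read-low at offset 0, read-high at 1024, write-low at 2048, write-high at 3072, each covering 8192 MSRs). The function name `msr_bitmap_intercepts()` and the flat in-memory array are illustrative only; the kernel reads the relevant byte from L1's bitmap in guest memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BITMAP_SIZE 4096

/*
 * Hypothetical helper: returns true if an RDMSR/WRMSR of msr_index should be
 * reflected to L1 according to L1's MSR bitmap.
 *   [0, 1024)    reads of MSRs 0x00000000-0x00001fff
 *   [1024, 2048) reads of MSRs 0xc0000000-0xc0001fff
 *   [2048, 3072) writes of the low range
 *   [3072, 4096) writes of the high range
 */
static bool msr_bitmap_intercepts(const uint8_t bitmap[MSR_BITMAP_SIZE],
				  uint32_t msr_index, bool is_write)
{
	size_t base = is_write ? 2048 : 0;	/* write bitmaps are in the upper half */

	if (msr_index >= 0xc0000000u) {		/* high range */
		msr_index -= 0xc0000000u;
		base += 1024;
	}

	/* Each 1024-byte bitmap covers 8192 MSRs; anything else is reflected. */
	if (msr_index >= 1024 * 8)
		return true;

	return (bitmap[base + msr_index / 8] >> (msr_index & 7)) & 1;
}
```

For example, setting bit 2 of byte 2048 + 0x6a2 / 8 makes a write to MSR 0x6a2 (IA32_S_CET) exit to L1 while a read of the same MSR does not, and an MSR outside both 8192-entry ranges is always reflected.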
+5
arch/x86/kvm/vmx/nested.h
··· 309 309
 	       __kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
+{
+	return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
 #define nested_guest_cr4_valid nested_cr4_valid
 #define nested_host_cr4_valid nested_cr4_valid
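nested_cpu_has_no_hw_errcode_cc() tests VMX_BASIC_NO_HW_ERROR_CODE_CC in the vCPU's IA32_VMX_BASIC value. Because nested.c shrinks the reserved mask from GENMASK_ULL(63, 56) to GENMASK_ULL(63, 57) when adding this feature bit, the bit is bit 56. When it is set, the VM-entry consistency check on the "deliver error code" flag is skipped for hardware exceptions in protected mode. A minimal sketch of the resulting decision as a pure function; the function name and its boolean parameters are hypothetical stand-ins for the vmcs12 fields and for x86_exception_has_error_code(vector).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 56: the only bit freed from the old GENMASK_ULL(63, 56) reserved mask. */
#define VMX_BASIC_NO_HW_ERROR_CODE_CC (1ULL << 56)

/*
 * Hypothetical model of the reworked event-injection check: returns true if
 * the "deliver error code" setting passes the VM-entry consistency check.
 */
static bool entry_error_code_consistent(uint64_t vmx_basic, bool prot_mode,
					bool is_hw_exception,
					bool has_error_code,
					bool vector_has_error_code)
{
	/* Real mode, or not a hardware exception: an error code is never allowed. */
	if (!prot_mode || !is_hw_exception)
		return !has_error_code;

	/* Feature enumerated: either setting is accepted for hardware exceptions. */
	if (vmx_basic & VMX_BASIC_NO_HW_ERROR_CODE_CC)
		return true;

	/* Legacy rule: must match whether the vector architecturally pushes one. */
	return has_error_code == vector_has_error_code;
}
```

With the bit clear, injecting #GP (which pushes an error code) without one fails the check; with the bit set, the same injection is accepted, but an error code is still rejected in real mode.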
+35 -44
arch/x86/kvm/vmx/pmu_intel.c
··· 138 138
 
 static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_FW_WRITES) != 0;
+	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
 }
 
 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
··· 478 478
 	};
 	u64 eventsel;
 
-	BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUTNERS);
-	BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUTNERS);
+	BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS);
+	BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS);
 
 	/*
 	 * Yell if perf reports support for a fixed counter but perf doesn't
··· 536 536
 					   kvm_pmu_cap.num_counters_gp);
 	eax.split.bit_width = min_t(int, eax.split.bit_width,
 				    kvm_pmu_cap.bit_width_gp);
-	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
+	pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(eax.split.bit_width) - 1;
 	eax.split.mask_length = min_t(int, eax.split.mask_length,
 				      kvm_pmu_cap.events_mask_len);
-	pmu->available_event_types = ~entry->ebx &
-					((1ull << eax.split.mask_length) - 1);
+	pmu->available_event_types = ~entry->ebx & (BIT_ULL(eax.split.mask_length) - 1);
 
-	if (pmu->version == 1) {
-		pmu->nr_arch_fixed_counters = 0;
-	} else {
-		pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
-						    kvm_pmu_cap.num_counters_fixed);
-		edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
-						  kvm_pmu_cap.bit_width_fixed);
-		pmu->counter_bitmask[KVM_PMC_FIXED] =
-			((u64)1 << edx.split.bit_width_fixed) - 1;
+	entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
+	if (entry &&
+	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
+	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) {
+		pmu->reserved_bits ^= HSW_IN_TX;
+		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
 	}
+
+	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
+	if (intel_pmu_lbr_is_compatible(vcpu) &&
+	    (perf_capabilities & PERF_CAP_LBR_FMT))
+		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
+	else
+		lbr_desc->records.nr = 0;
+
+	if (lbr_desc->records.nr)
+		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
+
+	if (pmu->version == 1)
+		return;
+
+	pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
+					    kvm_pmu_cap.num_counters_fixed);
+	edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
+					  kvm_pmu_cap.bit_width_fixed);
+	pmu->counter_bitmask[KVM_PMC_FIXED] = BIT_ULL(edx.split.bit_width_fixed) - 1;
 
 	intel_pmu_enable_fixed_counter_bits(pmu, INTEL_FIXED_0_KERNEL |
 						 INTEL_FIXED_0_USER |
 						 INTEL_FIXED_0_ENABLE_PMI);
 
-	counter_rsvd = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
-		(((1ull << pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX));
+	counter_rsvd = ~((BIT_ULL(pmu->nr_arch_gp_counters) - 1) |
+			 ((BIT_ULL(pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX));
 	pmu->global_ctrl_rsvd = counter_rsvd;
 
 	/*
··· 588 573
 		pmu->global_status_rsvd &=
 				~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
 
-	entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
-	if (entry &&
-	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
-	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) {
-		pmu->reserved_bits ^= HSW_IN_TX;
-		pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
-	}
-
-	bitmap_set(pmu->all_valid_pmc_idx,
-		0, pmu->nr_arch_gp_counters);
-	bitmap_set(pmu->all_valid_pmc_idx,
-		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
-
-	perf_capabilities = vcpu_get_perf_capabilities(vcpu);
-	if (intel_pmu_lbr_is_compatible(vcpu) &&
-	    (perf_capabilities & PMU_CAP_LBR_FMT))
-		memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps));
-	else
-		lbr_desc->records.nr = 0;
-
-	if (lbr_desc->records.nr)
-		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
-
 	if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
 		if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
 			pmu->pebs_enable_rsvd = counter_rsvd;
··· 595 603
 			pmu->pebs_data_cfg_rsvd = ~0xff00000full;
 			intel_pmu_enable_fixed_counter_bits(pmu, ICL_FIXED_0_ADAPTIVE);
 		} else {
-			pmu->pebs_enable_rsvd =
-				~((1ull << pmu->nr_arch_gp_counters) - 1);
+			pmu->pebs_enable_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
 		}
 	}
 }
··· 616 625
 		pmu->gp_counters[i].current_config = 0;
 	}
 
-	for (i = 0; i < KVM_MAX_NR_INTEL_FIXED_COUTNERS; i++) {
+	for (i = 0; i < KVM_MAX_NR_INTEL_FIXED_COUNTERS; i++) {
 		pmu->fixed_counters[i].type = KVM_PMC_FIXED;
 		pmu->fixed_counters[i].vcpu = vcpu;
 		pmu->fixed_counters[i].idx = i + KVM_FIXED_PMC_BASE_IDX;
··· 753 762
 	int bit, hw_idx;
 
 	kvm_for_each_pmc(pmu, pmc, bit, (unsigned long *)&pmu->global_ctrl) {
-		if (!pmc_speculative_in_use(pmc) ||
+		if (!pmc_is_locally_enabled(pmc) ||
 		    !pmc_is_globally_enabled(pmc) || !pmc->perf_event)
 			continue;
 
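The pmu_intel.c cleanup replaces open-coded `(u64)1 << n` and `1ull << n` with BIT_ULL(n); both forms were already 64-bit shifts, so the masks are unchanged. A small sketch of the two masks computed in intel_pmu_refresh(), assuming KVM_FIXED_PMC_BASE_IDX is 32 (the index of the first fixed counter in IA32_PERF_GLOBAL_CTRL); the macro names below are local stand-ins, not the kernel headers, and bit widths are assumed to be below 64.

```c
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))	/* same meaning as the kernel macro */
#define FIXED_PMC_BASE_IDX 32		/* assumption: value of KVM_FIXED_PMC_BASE_IDX */

/* Mask of a counter's writable bits, e.g. 48 -> 0x0000ffffffffffff. */
static uint64_t counter_bitmask(unsigned int bit_width)
{
	return BIT_ULL(bit_width) - 1;
}

/*
 * Reserved bits of IA32_PERF_GLOBAL_CTRL: everything except one enable bit
 * per general-purpose counter (from bit 0) and per fixed counter (from bit 32).
 */
static uint64_t global_ctrl_reserved(unsigned int nr_gp, unsigned int nr_fixed)
{
	return ~((BIT_ULL(nr_gp) - 1) |
		 ((BIT_ULL(nr_fixed) - 1) << FIXED_PMC_BASE_IDX));
}
```

With 8 general-purpose and 3 fixed counters, the valid bits are 0-7 and 32-34, so the reserved mask is 0xfffffff8ffffff00.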
+20 -8
arch/x86/kvm/vmx/tdx.c
··· 620 620
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 
 	kvm->arch.has_protected_state = true;
+	/*
+	 * TDX Module doesn't allow the hypervisor to modify the EOI-bitmap,
+	 * i.e. all EOIs are accelerated and never trigger exits.
+	 */
+	kvm->arch.has_protected_eoi = true;
 	kvm->arch.has_private_mem = true;
 	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 
··· 1999 1994
 	 * handle retries locally in their EPT violation handlers.
 	 */
 	while (1) {
+		struct kvm_memory_slot *slot;
+
 		ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
 
 		if (ret != RET_PF_RETRY || !local_retry)
··· 2013 2006
 			ret = -EIO;
 			break;
 		}
+
+		/*
+		 * Bail if the memslot is invalid, i.e. is being deleted, as
+		 * faulting in will never succeed and this task needs to drop
+		 * SRCU in order to let memslot deletion complete.
+		 */
+		slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
+		if (slot && slot->flags & KVM_MEMSLOT_INVALID)
+			break;
 
 		cond_resched();
 	}
··· 2488 2472
 	/* TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. */
 	kvm_tdx->td.tdcx_nr_pages = tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1;
 	tdcs_pages = kcalloc(kvm_tdx->td.tdcs_nr_pages, sizeof(*kvm_tdx->td.tdcs_pages),
-			     GFP_KERNEL | __GFP_ZERO);
+			     GFP_KERNEL);
 	if (!tdcs_pages)
 		goto free_tdr;
 
··· 3476 3460
 	if (r)
 		goto tdx_bringup_err;
 
+	r = -EINVAL;
 	/* Get TDX global information for later use */
 	tdx_sysinfo = tdx_get_sysinfo();
-	if (WARN_ON_ONCE(!tdx_sysinfo)) {
-		r = -EINVAL;
+	if (WARN_ON_ONCE(!tdx_sysinfo))
 		goto get_sysinfo_err;
-	}
 
 	/* Check TDX module and KVM capabilities */
 	if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) ||
··· 3523 3508
 	if (td_conf->max_vcpus_per_td < num_present_cpus()) {
 		pr_err("Disable TDX: MAX_VCPU_PER_TD (%u) smaller than number of logical CPUs (%u).\n",
 			td_conf->max_vcpus_per_td, num_present_cpus());
-		r = -EINVAL;
 		goto get_sysinfo_err;
 	}
 
-	if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids())) {
-		r = -EINVAL;
+	if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids()))
 		goto get_sysinfo_err;
-	}
 
 	/*
 	 * Leave hardware virtualization enabled after TDX is enabled
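The new check in the TDX EPT-violation loop stops retrying locally once the target memslot is flagged invalid (being deleted): the fault can never succeed, and the vCPU must return and drop SRCU so that deletion can finish. The user-space sketch below models only the shape of that loop. Every type and function in it is hypothetical, SLOT_INVALID stands in for KVM_MEMSLOT_INVALID, and the demo fault handler exists only to exercise the loop.

```c
#include <stdbool.h>

enum fault_result { FAULT_DONE, FAULT_RETRY };

#define SLOT_INVALID (1u << 0)	/* stand-in for KVM_MEMSLOT_INVALID */

struct slot {
	unsigned int flags;
};

typedef enum fault_result (*fault_fn)(void *ctx);

/*
 * Retry locally while the handler asks for it, but bail out (still
 * returning FAULT_RETRY) once the slot is being deleted, so the caller can
 * drop its locks and let the deletion complete.
 */
static enum fault_result fault_with_local_retry(fault_fn fn, void *ctx,
						const struct slot *slot,
						bool local_retry)
{
	enum fault_result ret;

	for (;;) {
		ret = fn(ctx);
		if (ret != FAULT_RETRY || !local_retry)
			break;

		if (slot && (slot->flags & SLOT_INVALID))
			break;
		/* The kernel loop also calls cond_resched() here. */
	}
	return ret;
}

/* Demo handler: succeeds on call `succeed_after` or invalidates the slot on
 * call `invalidate_after` (0 disables either behaviour). */
struct demo_ctx {
	int calls;
	int succeed_after;
	int invalidate_after;
	struct slot *slot;
};

static enum fault_result demo_fault(void *opaque)
{
	struct demo_ctx *c = opaque;

	c->calls++;
	if (c->invalidate_after && c->calls == c->invalidate_after)
		c->slot->flags |= SLOT_INVALID;
	if (c->succeed_after && c->calls >= c->succeed_after)
		return FAULT_DONE;
	return FAULT_RETRY;
}
```

Without the slot check, the second scenario (a slot invalidated on the third attempt while the handler keeps asking for a retry) would loop forever, which mirrors the livelock that the kernel change avoids.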
+6
arch/x86/kvm/vmx/vmcs12.c
··· 139 139
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
··· 154 151
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
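Each FIELD() entry maps a VMCS field to the byte offset of the matching struct vmcs12 member, which is how emulated VMREAD/VMWRITE locate the new CET fields. The sketch below shows that designated-initializer pattern on a toy struct. The struct, the small enum IDs, and `toy_vmread()` are hypothetical; the real table is keyed by VMCS field encodings. Offset 0 is occupied by a header word, so a zero entry can mean "no such field".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t natural_width;

/* Toy stand-in for struct vmcs12 containing only the new CET fields. */
struct toy_vmcs12 {
	uint32_t revision_id;	/* offset 0 is never a field, so 0 means "absent" */
	uint32_t abort;
	natural_width host_s_cet;
	natural_width host_ssp;
	natural_width host_ssp_tbl;
	natural_width guest_s_cet;
	natural_width guest_ssp;
	natural_width guest_ssp_tbl;
};

/* Hypothetical field IDs standing in for VMCS field encodings. */
enum toy_field {
	TOY_GUEST_S_CET = 1,
	TOY_GUEST_SSP,
	TOY_GUEST_INTR_SSP_TABLE,
	TOY_HOST_S_CET,
	TOY_HOST_SSP,
	TOY_HOST_INTR_SSP_TABLE,
	TOY_NR_FIELDS
};

#define FIELD(id, name) [id] = offsetof(struct toy_vmcs12, name)

static const unsigned short toy_field_offsets[TOY_NR_FIELDS] = {
	FIELD(TOY_GUEST_S_CET, guest_s_cet),
	FIELD(TOY_GUEST_SSP, guest_ssp),
	FIELD(TOY_GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
	FIELD(TOY_HOST_S_CET, host_s_cet),
	FIELD(TOY_HOST_SSP, host_ssp),
	FIELD(TOY_HOST_INTR_SSP_TABLE, host_ssp_tbl),
};

/* Emulated VMREAD of a natural-width field; returns -1 for unknown fields. */
static int toy_vmread(const struct toy_vmcs12 *v, unsigned int id, uint64_t *val)
{
	if (id >= TOY_NR_FIELDS || toy_field_offsets[id] == 0)
		return -1;
	memcpy(val, (const char *)v + toy_field_offsets[id], sizeof(*val));
	return 0;
}
```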
+13 -1
arch/x86/kvm/vmx/vmcs12.h
··· 117 117
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
··· 300 294
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
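The vmcs12 layout change is offset-neutral: six new 8-byte natural-width fields (48 bytes) plus paddingl[2] (16 bytes) occupy exactly the 64 bytes of the old paddingl[8], so pin_based_vm_exec_control stays at offset 744 and the CHECK_OFFSET() values for later fields are unaffected. The compile-time check below demonstrates the arithmetic on hypothetical "tail" structs that stand in for everything before host_rip with a 672-byte array; it assumes a 64-bit build where natural_width is 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t natural_width;	/* assumption: 64-bit build */

/* Old tail of the layout, from host_rip onward. */
struct vmcs12_tail_old {
	uint8_t head[672];		/* stands in for all fields before host_rip */
	natural_width host_rip;
	natural_width paddingl[8];
	uint32_t pin_based_vm_exec_control;
};

/* New tail: six CET fields carved out of the padding. */
struct vmcs12_tail_new {
	uint8_t head[672];
	natural_width host_rip;
	natural_width host_s_cet;
	natural_width host_ssp;
	natural_width host_ssp_tbl;
	natural_width guest_s_cet;
	natural_width guest_ssp;
	natural_width guest_ssp_tbl;
	natural_width paddingl[2];
	uint32_t pin_based_vm_exec_control;
};

_Static_assert(offsetof(struct vmcs12_tail_new, host_s_cet) == 680, "host_s_cet");
_Static_assert(offsetof(struct vmcs12_tail_new, guest_ssp_tbl) == 720, "guest_ssp_tbl");
_Static_assert(offsetof(struct vmcs12_tail_old, pin_based_vm_exec_control) ==
	       offsetof(struct vmcs12_tail_new, pin_based_vm_exec_control),
	       "pin_based_vm_exec_control must not move");
```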
+181 -52
arch/x86/kvm/vmx/vmx.c
··· 1344 1344 } 1345 1345 1346 1346 #ifdef CONFIG_X86_64 1347 - static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) 1347 + static u64 vmx_read_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 *cache) 1348 1348 { 1349 1349 preempt_disable(); 1350 1350 if (vmx->vt.guest_state_loaded) 1351 - rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); 1351 + *cache = read_msr(msr); 1352 1352 preempt_enable(); 1353 - return vmx->msr_guest_kernel_gs_base; 1353 + return *cache; 1354 + } 1355 + 1356 + static void vmx_write_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 data, 1357 + u64 *cache) 1358 + { 1359 + preempt_disable(); 1360 + if (vmx->vt.guest_state_loaded) 1361 + wrmsrns(msr, data); 1362 + preempt_enable(); 1363 + *cache = data; 1364 + } 1365 + 1366 + static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) 1367 + { 1368 + return vmx_read_guest_host_msr(vmx, MSR_KERNEL_GS_BASE, 1369 + &vmx->msr_guest_kernel_gs_base); 1354 1370 } 1355 1371 1356 1372 static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data) 1357 1373 { 1358 - preempt_disable(); 1359 - if (vmx->vt.guest_state_loaded) 1360 - wrmsrq(MSR_KERNEL_GS_BASE, data); 1361 - preempt_enable(); 1362 - vmx->msr_guest_kernel_gs_base = data; 1374 + vmx_write_guest_host_msr(vmx, MSR_KERNEL_GS_BASE, data, 1375 + &vmx->msr_guest_kernel_gs_base); 1363 1376 } 1364 1377 #endif 1365 1378 ··· 2106 2093 else 2107 2094 msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; 2108 2095 break; 2096 + case MSR_IA32_S_CET: 2097 + msr_info->data = vmcs_readl(GUEST_S_CET); 2098 + break; 2099 + case MSR_KVM_INTERNAL_GUEST_SSP: 2100 + msr_info->data = vmcs_readl(GUEST_SSP); 2101 + break; 2102 + case MSR_IA32_INT_SSP_TAB: 2103 + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE); 2104 + break; 2109 2105 case MSR_IA32_DEBUGCTLMSR: 2110 2106 msr_info->data = vmx_guest_debugctl_read(); 2111 2107 break; ··· 2149 2127 (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))) 2150 2128 
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT; 2151 2129 2152 - if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) && 2130 + if ((kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT) && 2153 2131 (host_initiated || intel_pmu_lbr_is_enabled(vcpu))) 2154 2132 debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; 2155 2133 ··· 2433 2411 else 2434 2412 vmx->pt_desc.guest.addr_a[index / 2] = data; 2435 2413 break; 2414 + case MSR_IA32_S_CET: 2415 + vmcs_writel(GUEST_S_CET, data); 2416 + break; 2417 + case MSR_KVM_INTERNAL_GUEST_SSP: 2418 + vmcs_writel(GUEST_SSP, data); 2419 + break; 2420 + case MSR_IA32_INT_SSP_TAB: 2421 + vmcs_writel(GUEST_INTR_SSP_TABLE, data); 2422 + break; 2436 2423 case MSR_IA32_PERF_CAPABILITIES: 2437 - if (data & PMU_CAP_LBR_FMT) { 2438 - if ((data & PMU_CAP_LBR_FMT) != 2439 - (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT)) 2424 + if (data & PERF_CAP_LBR_FMT) { 2425 + if ((data & PERF_CAP_LBR_FMT) != 2426 + (kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT)) 2440 2427 return 1; 2441 2428 if (!cpuid_model_is_consistent(vcpu)) 2442 2429 return 1; ··· 2615 2584 { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER }, 2616 2585 { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS }, 2617 2586 { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL }, 2587 + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE }, 2618 2588 }; 2619 2589 2620 2590 memset(vmcs_conf, 0, sizeof(*vmcs_conf)); ··· 4100 4068 } 4101 4069 } 4102 4070 4103 - void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu) 4071 + static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu) 4104 4072 { 4073 + bool intercept; 4074 + 4105 4075 if (!cpu_has_vmx_msr_bitmap()) 4106 4076 return; 4107 4077 ··· 4149 4115 vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W, 4150 4116 !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D)); 4151 4117 4118 + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { 4119 + intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK); 4120 + 4121 + vmx_set_intercept_for_msr(vcpu, 
MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept); 4122 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept); 4123 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept); 4124 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept); 4125 + } 4126 + 4127 + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) { 4128 + intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) && 4129 + !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK); 4130 + 4131 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept); 4132 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept); 4133 + } 4134 + 4152 4135 /* 4153 4136 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be 4154 4137 * filtered by userspace. 4155 4138 */ 4139 + } 4140 + 4141 + void vmx_recalc_intercepts(struct kvm_vcpu *vcpu) 4142 + { 4143 + vmx_recalc_msr_intercepts(vcpu); 4156 4144 } 4157 4145 4158 4146 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, ··· 4326 4270 4327 4271 if (cpu_has_load_ia32_efer()) 4328 4272 vmcs_write64(HOST_IA32_EFER, kvm_host.efer); 4273 + 4274 + /* 4275 + * Supervisor shadow stack is not enabled on host side, i.e., 4276 + * host IA32_S_CET.SHSTK_EN bit is guaranteed to 0 now, per SDM 4277 + * description(RDSSP instruction), SSP is not readable in CPL0, 4278 + * so resetting the two registers to 0s at VM-Exit does no harm 4279 + * to kernel execution. When execution flow exits to userspace, 4280 + * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter 4281 + * 3 and 4 for details. 
4282 + */ 4283 + if (cpu_has_load_cet_ctrl()) { 4284 + vmcs_writel(HOST_S_CET, kvm_host.s_cet); 4285 + vmcs_writel(HOST_SSP, 0); 4286 + vmcs_writel(HOST_INTR_SSP_TABLE, 0); 4287 + } 4329 4288 } 4330 4289 4331 4290 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) ··· 4375 4304 return pin_based_exec_ctrl; 4376 4305 } 4377 4306 4378 - static u32 vmx_vmentry_ctrl(void) 4307 + static u32 vmx_get_initial_vmentry_ctrl(void) 4379 4308 { 4380 4309 u32 vmentry_ctrl = vmcs_config.vmentry_ctrl; 4381 4310 ··· 4392 4321 return vmentry_ctrl; 4393 4322 } 4394 4323 4395 - static u32 vmx_vmexit_ctrl(void) 4324 + static u32 vmx_get_initial_vmexit_ctrl(void) 4396 4325 { 4397 4326 u32 vmexit_ctrl = vmcs_config.vmexit_ctrl; 4398 4327 ··· 4422 4351 4423 4352 pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); 4424 4353 4425 - if (kvm_vcpu_apicv_active(vcpu)) { 4426 - secondary_exec_controls_setbit(vmx, 4427 - SECONDARY_EXEC_APIC_REGISTER_VIRT | 4428 - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); 4429 - if (enable_ipiv) 4430 - tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT); 4431 - } else { 4432 - secondary_exec_controls_clearbit(vmx, 4433 - SECONDARY_EXEC_APIC_REGISTER_VIRT | 4434 - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); 4435 - if (enable_ipiv) 4436 - tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT); 4437 - } 4354 + secondary_exec_controls_changebit(vmx, 4355 + SECONDARY_EXEC_APIC_REGISTER_VIRT | 4356 + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY, 4357 + kvm_vcpu_apicv_active(vcpu)); 4358 + if (enable_ipiv) 4359 + tertiary_exec_controls_changebit(vmx, TERTIARY_EXEC_IPI_VIRT, 4360 + kvm_vcpu_apicv_active(vcpu)); 4438 4361 4439 4362 vmx_update_msr_bitmap_x2apic(vcpu); 4440 4363 } ··· 4751 4686 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) 4752 4687 vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat); 4753 4688 4754 - vm_exit_controls_set(vmx, vmx_vmexit_ctrl()); 4689 + vm_exit_controls_set(vmx, vmx_get_initial_vmexit_ctrl()); 4755 4690 4756 4691 /* 22.2.1, 20.8.1 */ 4757 
-	vm_entry_controls_set(vmx, vmx_vmentry_ctrl());
+	vm_entry_controls_set(vmx, vmx_get_initial_vmentry_ctrl());

 	vmx->vcpu.arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits();
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits);
···
 	vmcs_write64(GUEST_BNDCFGS, 0);

 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, 0);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
+	}
+	if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+	    kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, 0);

 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);

···
 	return 1;
 }

+static int vmx_get_msr_imm_reg(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_instr_info_reg(vmcs_read32(VMX_INSTRUCTION_INFO));
+}
+
+static int handle_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+	return kvm_emulate_rdmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+				     vmx_get_msr_imm_reg(vcpu));
+}
+
+static int handle_wrmsr_imm(struct kvm_vcpu *vcpu)
+{
+	return kvm_emulate_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+				     vmx_get_msr_imm_reg(vcpu));
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
···
 	[EXIT_REASON_ENCLS] = handle_encls,
 	[EXIT_REASON_BUS_LOCK] = handle_bus_lock_vmexit,
 	[EXIT_REASON_NOTIFY] = handle_notify,
+	[EXIT_REASON_MSR_READ_IMM] = handle_rdmsr_imm,
+	[EXIT_REASON_MSR_WRITE_IMM] = handle_wrmsr_imm,
 };

 static const int kvm_vmx_max_exit_handlers =
···
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);

+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
···
 		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 	if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+		       vmcs_readl(HOST_INTR_SSP_TABLE));

 	pr_err("*** Control State ***\n");
 	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
···
 #ifdef CONFIG_MITIGATION_RETPOLINE
 	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		return kvm_emulate_wrmsr(vcpu);
+	else if (exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
+		return handle_wrmsr_imm(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER)
 		return handle_preemption_timer(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW)
···

 	switch (vmx_get_exit_reason(vcpu).basic) {
 	case EXIT_REASON_MSR_WRITE:
-		return handle_fastpath_set_msr_irqoff(vcpu);
+		return handle_fastpath_wrmsr(vcpu);
+	case EXIT_REASON_MSR_WRITE_IMM:
+		return handle_fastpath_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+						 vmx_get_msr_imm_reg(vcpu));
 	case EXIT_REASON_PREEMPTION_TIMER:
 		return handle_fastpath_preemption_timer(vcpu, force_immediate_exit);
 	case EXIT_REASON_HLT:
 		return handle_fastpath_hlt(vcpu);
+	case EXIT_REASON_INVD:
+		return handle_fastpath_invd(vcpu);
 	default:
 		return EXIT_FASTPATH_NONE;
 	}
···
 	cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));

 	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
 	cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM));
···
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;

-	/* Recalc MSR interception to account for feature changes. */
-	vmx_recalc_msr_intercepts(vcpu);
-
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }

 static __init u64 vmx_get_perf_capabilities(void)
 {
-	u64 perf_cap = PMU_CAP_FW_WRITES;
+	u64 perf_cap = PERF_CAP_FW_WRITES;
 	u64 host_perf_cap = 0;

 	if (!enable_pmu)
···
 		if (!vmx_lbr_caps.has_callstack)
 			memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps));
 		else if (vmx_lbr_caps.nr)
-			perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+			perf_cap |= host_perf_cap & PERF_CAP_LBR_FMT;
 	}

 	if (vmx_pebs_supported()) {
···
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);

 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

···

 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	/*
+	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
+	 * enforce CET HW behaviors in emulator. On platforms with
+	 * VMX_BASIC[bit56] == 0, inject #CP at VMX entry with error code
+	 * fails, so disable CET in this case too.
+	 */
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode_cc()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
 }

 static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
···

 	vmx_setup_user_return_msrs();

-	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
-		return -EIO;

 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
···
 		pr_err_ratelimited("NX (Execute Disable) not supported\n");
 		return -EOPNOTSUPP;
 	}
+
+	/*
+	 * Shadow paging doesn't have a (further) performance penalty
+	 * from GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable it
+	 * by default
+	 */
+	if (!enable_ept)
+		allow_smaller_maxphyaddr = true;

 	if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
 		enable_ept_ad_bits = 0;
···

 	setup_default_sgx_lepubkeyhash();

+	vmx_set_cpu_caps();
+
+	/*
+	 * Configure nested capabilities after core CPU capabilities so that
+	 * nested support can be conditional on base support, e.g. so that KVM
+	 * can hide/show features based on kvm_cpu_cap_has().
+	 */
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);

···
 		if (r)
 			return r;
 	}
-
-	vmx_set_cpu_caps();

 	r = alloc_kvm_area();
 	if (r && nested)
···
 	 */
 	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
 		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
-	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
+	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
 	return r;
 }

···
 		return -EOPNOTSUPP;

 	/*
-	 * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing
-	 * to unwind if a later step fails.
+	 * Note, VMCS and eVMCS configuration only touch VMX knobs/variables,
+	 * i.e. there's nothing to unwind if a later step fails.
 	 */
 	hv_init_evmcs();
+
+	/*
+	 * Parse the VMCS config and VMX capabilities before anything else, so
+	 * that the information is available to all setup flows.
+	 */
+	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
+		return -EIO;

 	r = kvm_x86_vendor_init(&vt_init_ops);
 	if (r)
···
 	}

 	vmx_check_vmcs12_offsets();
-
-	/*
-	 * Shadow paging doesn't have a (further) performance penalty
-	 * from GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable it
-	 * by default
-	 */
-	if (!enable_ept)
-		allow_smaller_maxphyaddr = true;

 	return 0;
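The `vmx_set_cpu_caps()` hunk and the vCPU reset hunk (the one that also clears GUEST_BNDCFGS) encode two small rules: KVM advertises SHSTK and IBT only when the host has the CET load controls, unrestricted guest is enabled, and VM entry can inject #CP with an error code (VMX_BASIC[56]); and a reset zeroes GUEST_SSP/GUEST_INTR_SSP_TABLE when SHSTK is supported and GUEST_S_CET when either feature is. The following is a minimal user-space model of those rules, not KVM code; `struct host_caps`, `struct cet_caps`, `gate_cet()` and `reset_fields()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the host checks consulted by vmx_set_cpu_caps(). */
struct host_caps {
	bool load_cet_ctrl;      /* models cpu_has_load_cet_ctrl() */
	bool unrestricted_guest; /* models enable_unrestricted_guest */
	bool no_hw_errcode_cc;   /* models cpu_has_vmx_basic_no_hw_errcode_cc() */
};

/* Hypothetical pair of flags standing in for X86_FEATURE_SHSTK and X86_FEATURE_IBT. */
struct cet_caps {
	bool shstk;
	bool ibt;
};

/* Any missing prerequisite clears both CET features, as in the diff. */
static struct cet_caps gate_cet(struct host_caps h, struct cet_caps wanted)
{
	struct cet_caps out = wanted;

	if (!h.load_cet_ctrl || !h.unrestricted_guest || !h.no_hw_errcode_cc) {
		out.shstk = false;
		out.ibt = false;
	}
	/* Gating can only remove features, never add them. */
	assert(!out.shstk || wanted.shstk);
	assert(!out.ibt || wanted.ibt);
	return out;
}

/*
 * Which guest CET fields a vCPU reset zeroes: SSP and the interrupt SSP
 * table exist only with SHSTK; S_CET exists if either SHSTK or IBT does.
 */
static void reset_fields(struct cet_caps c, bool *zero_ssp, bool *zero_s_cet)
{
	*zero_ssp = c.shstk;
	*zero_s_cet = c.shstk || c.ibt;
}
```

For instance, a host that has the CET load controls and unrestricted guest but lacks VMX_BASIC[56] gets both features cleared, and an IBT-only configuration zeroes S_CET on reset while leaving the SSP fields alone.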
+20 -2
arch/x86/kvm/vmx/vmx.h
···
 	 */
 	u64 pre_vmenter_debugctl;
 	u64 pre_vmenter_bndcfgs;
+	u64 pre_vmenter_s_cet;
+	u64 pre_vmenter_ssp;
+	u64 pre_vmenter_ssp_tbl;

 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
···
 	 VM_ENTRY_LOAD_IA32_EFER | \
 	 VM_ENTRY_LOAD_BNDCFGS | \
 	 VM_ENTRY_PT_CONCEAL_PIP | \
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL | \
+	 VM_ENTRY_LOAD_CET_STATE)

 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
 	(VM_EXIT_SAVE_DEBUG_CONTROLS | \
···
 	 VM_EXIT_LOAD_IA32_EFER | \
 	 VM_EXIT_CLEAR_BNDCFGS | \
 	 VM_EXIT_PT_CONCEAL_PIP | \
-	 VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	 VM_EXIT_CLEAR_IA32_RTIT_CTL | \
+	 VM_EXIT_LOAD_CET_STATE)

 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
 	(PIN_BASED_EXT_INTR_MASK | \
···
 { \
 	BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname))); \
 	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \
+} \
+static __always_inline void lname##_controls_changebit(struct vcpu_vmx *vmx, u##bits val, \
+						       bool set) \
+{ \
+	if (set) \
+		lname##_controls_setbit(vmx, val); \
+	else \
+		lname##_controls_clearbit(vmx, val); \
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
···
 }

 void dump_vmcs(struct kvm_vcpu *vcpu);
+
+static inline int vmx_get_instr_info_reg(u32 vmx_instr_info)
+{
+	return (vmx_instr_info >> 3) & 0xf;
+}

 static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
 {
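The vmx.h hunk adds `vmx_get_instr_info_reg()`, which extracts the register operand of the immediate RDMSR/WRMSRNS forms from bits 6:3 of VMX_INSTRUCTION_INFO, and a `changebit` helper that folds the old set/clear branches into one call. The sketch below models both, together with how `__kvm_emulate_rdmsr()` in x86.c routes a result: the legacy form (`reg < 0`) splits the value across EAX and EDX, while the immediate form writes one full 64-bit GPR. `struct toy_vcpu`, the `TOY_*` constants and the helper names are hypothetical; only the bit layout and the routing rule come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same extraction as vmx_get_instr_info_reg(): the register lives in bits 6:3. */
static int instr_info_reg(uint32_t vmx_instr_info)
{
	return (vmx_instr_info >> 3) & 0xf;
}

/* Hypothetical register numbering, chosen so that RAX = 0 and RDX = 2. */
enum { TOY_RAX = 0, TOY_RDX = 2, TOY_NR_GPRS = 16 };

/* Hypothetical vCPU: a GPR file plus one shadowed VMCS control word. */
struct toy_vcpu {
	uint64_t regs[TOY_NR_GPRS];
	uint32_t exec_controls;
};

/* Model of lname##_controls_changebit(): one call replaces an if/else pair. */
static void controls_changebit(struct toy_vcpu *v, uint32_t val, bool set)
{
	if (set)
		v->exec_controls |= val;
	else
		v->exec_controls &= ~val;
}

/*
 * Model of the result routing in __kvm_emulate_rdmsr(): reg < 0 selects the
 * legacy RDMSR form (value split across EDX:EAX); otherwise the immediate
 * form writes the whole 64-bit value to the decoded GPR.
 */
static void route_rdmsr_result(struct toy_vcpu *v, int reg, uint64_t data)
{
	if (reg < 0) {
		v->regs[TOY_RAX] = data & 0xffffffffu;
		v->regs[TOY_RDX] = (data >> 32) & 0xffffffffu;
	} else {
		assert(reg < TOY_NR_GPRS);
		v->regs[reg] = data;
	}
}
```

For example, an instruction-information value of 0x18 decodes to register 3, and routing 0x1122334455667788 with `reg = -1` leaves 0x55667788 in the RAX slot and 0x11223344 in the RDX slot.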
+1 -1
arch/x86/kvm/vmx/x86_ops.h
···
 			     int trig_mode, int vector);
 void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
-void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
+void vmx_recalc_intercepts(struct kvm_vcpu *vcpu);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);
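In the x86.c diff that follows, `__kvm_set_msr()` gains validation for the CET shadow-stack MSRs, and `kvm_set_cr0()`/`kvm_set_cr4()` refuse any state with CR4.CET set while CR0.WP is clear. The sketch below models those checks: SHSTK is required, INT_SSP_TAB additionally requires long mode, values must be canonical, and every SSP MSR except INT_SSP_TAB must be 4-byte aligned. It assumes 48-bit canonical addresses (no LA57), ignores the host-initiated-only MSR_KVM_INTERNAL_GUEST_SSP case, and uses hypothetical MSR identifiers and return codes rather than KVM's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions: CR0.WP is bit 16, CR4.CET is bit 23. */
#define TOY_CR0_WP  (1ull << 16)
#define TOY_CR4_CET (1ull << 23)

/* Hypothetical identifiers; the real MSR indices live in msr-index.h. */
enum toy_ssp_msr { TOY_PL0_SSP, TOY_PL1_SSP, TOY_PL2_SSP, TOY_PL3_SSP, TOY_INT_SSP_TAB };

/* Hypothetical result codes: 0 = accepted, 1 = #GP, -1 = MSR unsupported. */
enum { WR_OK = 0, WR_GP = 1, WR_UNSUPPORTED = -1 };

/* Assumes 4-level paging: canonical means bits 63:47 are all 0 or all 1. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Guest-initiated write checks, applied in the same order as the diff. */
static int check_ssp_write(enum toy_ssp_msr msr, uint64_t data,
			   bool guest_shstk, bool guest_lm)
{
	assert(msr <= TOY_INT_SSP_TAB);

	if (!guest_shstk)
		return WR_UNSUPPORTED;
	/* INT_SSP_TAB only exists on CPUs that support Intel 64. */
	if (msr == TOY_INT_SSP_TAB && !guest_lm)
		return WR_UNSUPPORTED;
	if (!is_canonical_48(data))
		return WR_GP;
	/* Every SSP MSR except INT_SSP_TAB must be 4-byte aligned. */
	if (msr != TOY_INT_SSP_TAB && (data & 3))
		return WR_GP;
	return WR_OK;
}

/* CR4.CET may be set only while CR0.WP is set, and WP can't be cleared under CET. */
static bool cr0_cr4_pair_valid(uint64_t cr0, uint64_t cr4)
{
	return !(cr4 & TOY_CR4_CET) || (cr0 & TOY_CR0_WP);
}
```

Under these rules, writing 0x7ffffffff002 to a PLn_SSP MSR is rejected for misalignment while the same value is accepted for INT_SSP_TAB, and 0x800000000000 is rejected for every SSP MSR as non-canonical.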
+681 -271
arch/x86/kvm/x86.c
···
  * vendor module being reloaded with different module parameters.
  */
 struct kvm_caps kvm_caps __read_mostly;
-EXPORT_SYMBOL_GPL(kvm_caps);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_caps);

 struct kvm_host_values kvm_host __read_mostly;
-EXPORT_SYMBOL_GPL(kvm_host);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);

 #define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e))

···
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);

 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;

 #define KVM_X86_OP(func) \
···

 bool __read_mostly report_ignored_msrs = true;
 module_param(report_ignored_msrs, bool, 0644);
-EXPORT_SYMBOL_GPL(report_ignored_msrs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs);

 unsigned int min_timer_period_us = 200;
 module_param(min_timer_period_us, uint, 0644);
···
 static u32 __read_mostly tsc_tolerance_ppm = 250;
 module_param(tsc_tolerance_ppm, uint, 0644);

-static bool __read_mostly vector_hashing = true;
-module_param(vector_hashing, bool, 0444);
-
 bool __read_mostly enable_vmware_backdoor = false;
 module_param(enable_vmware_backdoor, bool, 0444);
-EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_vmware_backdoor);

 /*
  * Flags to manipulate forced emulation behavior (any non-zero value will
···

 /* Enable/disable PMU virtualization */
 bool __read_mostly enable_pmu = true;
-EXPORT_SYMBOL_GPL(enable_pmu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu);
 module_param(enable_pmu, bool, 0444);

 bool __read_mostly eager_page_split = true;
···
 };

 u32 __read_mostly kvm_nr_uret_msrs;
-EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs);
 static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
 static struct kvm_user_return_msrs __percpu *user_return_msrs;

···
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

+#define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
+/*
+ * Note, KVM supports exposing PT to the guest, but does not support context
+ * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
+ * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
+ * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
+ */
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_ALL)
+
 bool __read_mostly allow_smaller_maxphyaddr = 0;
-EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(allow_smaller_maxphyaddr);

 bool __read_mostly enable_apicv = true;
-EXPORT_SYMBOL_GPL(enable_apicv);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_apicv);

 bool __read_mostly enable_ipiv = true;
-EXPORT_SYMBOL_GPL(enable_ipiv);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_ipiv);

 bool __read_mostly enable_device_posted_irqs = true;
-EXPORT_SYMBOL_GPL(enable_device_posted_irqs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_device_posted_irqs);

 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
···
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
 	MSR_IA32_UMWAIT_CONTROL,

-	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
 };

 static const u32 msrs_to_save_pmu[] = {
···
 	MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
 	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
 	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
 };

 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
···
 	kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
 	return kvm_nr_uret_msrs++;
 }
-EXPORT_SYMBOL_GPL(kvm_add_user_return_msr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr);

 int kvm_find_user_return_msr(u32 msr)
 {
···
 	}
 	return -1;
 }
-EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr);

 static void kvm_user_return_msr_cpu_online(void)
 {
···
 	kvm_user_return_register_notifier(msrs);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);

 void kvm_user_return_msr_update_cache(unsigned int slot, u64 value)
 {
···
 	msrs->values[slot].curr = value;
 	kvm_user_return_register_notifier(msrs);
 }
-EXPORT_SYMBOL_GPL(kvm_user_return_msr_update_cache);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_user_return_msr_update_cache);
+
+u64 kvm_get_user_return_msr(unsigned int slot)
+{
+	return this_cpu_ptr(user_return_msrs)->values[slot].curr;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr);

 static void drop_user_return_notifiers(void)
 {
···
 	/* Fault while not rebooting. We want the trace. */
 	BUG_ON(!kvm_rebooting);
 }
-EXPORT_SYMBOL_GPL(kvm_spurious_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spurious_fault);

 #define EXCPT_BENIGN 0
 #define EXCPT_CONTRIBUTORY 1
···
 	ex->has_payload = false;
 	ex->payload = 0;
 }
-EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_deliver_exception_payload);

 static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector,
 				       bool has_error_code, u32 error_code,
···
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, false, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception);


 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
···
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, true, payload);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception_p);

 static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
 				    u32 error_code, unsigned long payload)
···
 	vcpu->arch.exception.has_payload = false;
 	vcpu->arch.exception.payload = 0;
 }
-EXPORT_SYMBOL_GPL(kvm_requeue_exception);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_requeue_exception);

 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
···

 	return 1;
 }
-EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_complete_insn_gp);

 static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
···

 	fault_mmu->inject_page_fault(vcpu, fault);
 }
-EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_emulated_page_fault);

 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
···
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception_e);

 /*
  * Checks if cpl <= required_cpl; if true, return true. Otherwise queue
···
 	kvm_queue_exception(vcpu, UD_VECTOR);
 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_require_dr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_require_dr);

 static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
 {
···

 	return 1;
 }
-EXPORT_SYMBOL_GPL(load_pdptrs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs);

 static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
···
 	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0);

 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
···
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;

+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	kvm_x86_call(set_cr0)(vcpu, cr0);

 	kvm_post_set_cr0(vcpu, old_cr0, cr0);

 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr0);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0);

 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
 	(void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
 }
-EXPORT_SYMBOL_GPL(kvm_lmsw);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);

 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 {
···
 	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE)))
 		wrpkru(vcpu->arch.pkru);
 }
-EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_load_guest_xsave_state);

 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 {
···
 	}

 }
-EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_load_host_xsave_state);

 #ifdef CONFIG_X86_64
 static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
···
 }
 #endif

-static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 {
 	u64 xcr0 = xcr;
 	u64 old_xcr0 = vcpu->arch.xcr0;
···
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 	return 0;
 }
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_set_xcr);

 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 {
···

 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);

 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
···
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

 }
-EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4);

 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
···
 			return 1;
 	}

+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	kvm_x86_call(set_cr4)(vcpu, cr4);

 	kvm_post_set_cr4(vcpu, old_cr4, cr4);

 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr4);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4);

 static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 {
···

 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr3);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3);

 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
 {
···
 		vcpu->arch.cr8 = cr8;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr8);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8);

 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 {
···
 	else
 		return vcpu->arch.cr8;
 }
-EXPORT_SYMBOL_GPL(kvm_get_cr8);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8);

 static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
 {
···
 	if (dr7 & DR7_BP_EN_MASK)
 		vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }
-EXPORT_SYMBOL_GPL(kvm_update_dr7);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7);

 static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
 {
···

 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_dr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr);

 unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr)
 {
···
 		return vcpu->arch.dr7;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_get_dr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr);

 int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
 {
-	u32 ecx = kvm_rcx_read(vcpu);
+	u32 pmc = kvm_rcx_read(vcpu);
 	u64 data;

-	if (kvm_pmu_rdpmc(vcpu, ecx, &data)) {
+	if (kvm_pmu_rdpmc(vcpu, pmc, &data)) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
 	}
···
 	kvm_rdx_write(vcpu, data >> 32);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdpmc);

 /*
  * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
···

 	return __kvm_valid_efer(vcpu, efer);
 }
-EXPORT_SYMBOL_GPL(kvm_valid_efer);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer);

 static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
···
 {
 	efer_reserved_bits &= ~mask;
 }
-EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits);

 bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 {
···

 	return allowed;
 }
-EXPORT_SYMBOL_GPL(kvm_msr_allowed);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed);

 /*
  * Write @data into the MSR specified by @index. Select MSR specific fault
···

 		data = (u32)data;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (!kvm_is_valid_u_s_cet(vcpu, data))
+			return 1;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+		/*
+		 * Note that the MSR emulation here is flawed when a vCPU
+		 * doesn't support the Intel 64 architecture. The expected
+		 * architectural behavior in this case is that the upper 32
+		 * bits do not exist and should always read '0'. However,
+		 * because the actual hardware on which the virtual CPU is
+		 * running does support Intel 64, XRSTORS/XSAVES in the
+		 * guest could observe behavior that violates the
+		 * architecture. Intercepting XRSTORS/XSAVES for this
+		 * special case isn't deemed worthwhile.
+		 */
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		/*
+		 * MSR_IA32_INT_SSP_TAB is not present on processors that do
+		 * not support Intel 64 architecture.
+		 */
+		if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+			return KVM_MSR_RET_UNSUPPORTED;
+		if (is_noncanonical_msr_address(data, vcpu))
+			return 1;
+		/* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+		if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+			return 1;
+		break;
 	}

 	msr.data = data;
···
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
  */
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
-		  bool host_initiated)
+static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+			 bool host_initiated)
 {
 	struct msr_data msr;
 	int ret;
···
 		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		if (!host_initiated)
+			return 1;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return KVM_MSR_RET_UNSUPPORTED;
+		break;
 	}

 	msr.index = index;
···
 	return ret;
 }

+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+	return __kvm_set_msr(vcpu, index, data, true);
+}
+
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+	return __kvm_get_msr(vcpu, index, data, true);
+}
+
 static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 				     u32 index, u64 *data, bool host_initiated)
 {
···
 				 __kvm_get_msr);
 }

-int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+	return kvm_get_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read);
+
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+	return kvm_set_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write);
+
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
 		return KVM_MSR_RET_FILTERED;
-	return kvm_get_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);

-int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
+	return __kvm_emulate_msr_read(vcpu, index, data);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read);
+
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
 		return KVM_MSR_RET_FILTERED;
-	return kvm_set_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter);

-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
-	return kvm_get_msr_ignored_check(vcpu, index, data, false);
+	return __kvm_emulate_msr_write(vcpu, index, data);
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);

-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
-	return kvm_set_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_GPL(kvm_set_msr);

 static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
 {
···
 static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
 {
 	complete_userspace_rdmsr(vcpu);
+	return complete_fast_msr_access(vcpu);
+}
+
+static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->run->msr.error)
+		kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
+				   vcpu->run->msr.data);
+
 	return complete_fast_msr_access(vcpu);
 }

···
 	return 1;
 }

-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
+static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
+			       int (*complete_rdmsr)(struct kvm_vcpu *))
 {
-	u32 ecx = kvm_rcx_read(vcpu);
 	u64 data;
 	int r;

-	r = kvm_get_msr_with_filter(vcpu, ecx, &data);
+	r = kvm_emulate_msr_read(vcpu, msr, &data);

 	if (!r) {
-		trace_kvm_msr_read(ecx, data);
+		trace_kvm_msr_read(msr, data);

-		kvm_rax_write(vcpu, data & -1u);
-		kvm_rdx_write(vcpu, (data >> 32) & -1u);
+		if (reg < 0) {
+			kvm_rax_write(vcpu, data & -1u);
+			kvm_rdx_write(vcpu, (data >> 32) & -1u);
+		} else {
+			kvm_register_write(vcpu, reg, data);
+		}
 	} else {
 		/* MSR read failed? See if we should ask user space */
-		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_RDMSR, 0,
-				       complete_fast_rdmsr, r))
+		if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
+				       complete_rdmsr, r))
 			return 0;
-		trace_kvm_msr_read_ex(ecx);
+		trace_kvm_msr_read_ex(msr);
 	}

 	return kvm_x86_call(complete_emulated_msr)(vcpu, r);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr);

-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
-	u32 ecx = kvm_rcx_read(vcpu);
-	u64 data = kvm_read_edx_eax(vcpu);
+	return __kvm_emulate_rdmsr(vcpu, kvm_rcx_read(vcpu), -1,
+				   complete_fast_rdmsr);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
+
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+	vcpu->arch.cui_rdmsr_imm_reg = reg;
+
+	return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm);
+
+static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
 	int r;

-	r = kvm_set_msr_with_filter(vcpu, ecx, data);
-
+	r = kvm_emulate_msr_write(vcpu, msr, data);
 	if (!r) {
-		trace_kvm_msr_write(ecx, data);
+		trace_kvm_msr_write(msr, data);
 	} else {
 		/* MSR write failed? See if we should ask user space */
-		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_WRMSR, data,
+		if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data,
 				       complete_fast_msr_access, r))
 			return 0;
 		/* Signal all other negative errors to userspace */
 		if (r < 0)
 			return r;
-		trace_kvm_msr_write_ex(ecx, data);
+		trace_kvm_msr_write_ex(msr, data);
 	}

 	return kvm_x86_call(complete_emulated_msr)(vcpu, r);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
+
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return __kvm_emulate_wrmsr(vcpu, kvm_rcx_read(vcpu),
+				   kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
+
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+	return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);

 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu)
 {
···
 	/* Treat an INVD instruction as a NOP and just skip it.
*/ 2213 2085 return kvm_emulate_as_nop(vcpu); 2214 2086 } 2215 - EXPORT_SYMBOL_GPL(kvm_emulate_invd); 2087 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_invd); 2088 + 2089 + fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu) 2090 + { 2091 + if (!kvm_emulate_invd(vcpu)) 2092 + return EXIT_FASTPATH_EXIT_USERSPACE; 2093 + 2094 + return EXIT_FASTPATH_REENTER_GUEST; 2095 + } 2096 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_invd); 2216 2097 2217 2098 int kvm_handle_invalid_op(struct kvm_vcpu *vcpu) 2218 2099 { 2219 2100 kvm_queue_exception(vcpu, UD_VECTOR); 2220 2101 return 1; 2221 2102 } 2222 - EXPORT_SYMBOL_GPL(kvm_handle_invalid_op); 2103 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_invalid_op); 2223 2104 2224 2105 2225 2106 static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn) ··· 2254 2117 { 2255 2118 return kvm_emulate_monitor_mwait(vcpu, "MWAIT"); 2256 2119 } 2257 - EXPORT_SYMBOL_GPL(kvm_emulate_mwait); 2120 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_mwait); 2258 2121 2259 2122 int kvm_emulate_monitor(struct kvm_vcpu *vcpu) 2260 2123 { 2261 2124 return kvm_emulate_monitor_mwait(vcpu, "MONITOR"); 2262 2125 } 2263 - EXPORT_SYMBOL_GPL(kvm_emulate_monitor); 2126 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_monitor); 2264 2127 2265 2128 static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu) 2266 2129 { ··· 2270 2133 kvm_request_pending(vcpu) || xfer_to_guest_mode_work_pending(); 2271 2134 } 2272 2135 2273 - /* 2274 - * The fast path for frequent and performance sensitive wrmsr emulation, 2275 - * i.e. the sending of IPI, sending IPI early in the VM-Exit flow reduces 2276 - * the latency of virtual IPI by avoiding the expensive bits of transitioning 2277 - * from guest to host, e.g. reacquiring KVM's SRCU lock. In contrast to the 2278 - * other cases which must be called after interrupts are enabled on the host. 
- */
-static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data)
+static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
-	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic))
-		return 1;
-
-	if (((data & APIC_SHORT_MASK) == APIC_DEST_NOSHORT) &&
-	    ((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
-	    ((data & APIC_MODE_MASK) == APIC_DM_FIXED) &&
-	    ((u32)(data >> 32) != X2APIC_BROADCAST))
-		return kvm_x2apic_icr_write(vcpu->arch.apic, data);
-
-	return 1;
-}
-
-static int handle_fastpath_set_tscdeadline(struct kvm_vcpu *vcpu, u64 data)
-{
-	if (!kvm_can_use_hv_timer(vcpu))
-		return 1;
-
-	kvm_set_lapic_tscdeadline_msr(vcpu, data);
-	return 0;
-}
-
-fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
-{
-	u32 msr = kvm_rcx_read(vcpu);
-	u64 data;
-	fastpath_t ret;
-	bool handled;
-
-	kvm_vcpu_srcu_read_lock(vcpu);
-
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
-		data = kvm_read_edx_eax(vcpu);
-		handled = !handle_fastpath_set_x2apic_icr_irqoff(vcpu, data);
+		if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
+		    kvm_x2apic_icr_write_fast(vcpu->arch.apic, data))
+			return EXIT_FASTPATH_NONE;
 		break;
 	case MSR_IA32_TSC_DEADLINE:
-		data = kvm_read_edx_eax(vcpu);
-		handled = !handle_fastpath_set_tscdeadline(vcpu, data);
+		kvm_set_lapic_tscdeadline_msr(vcpu, data);
 		break;
 	default:
-		handled = false;
-		break;
+		return EXIT_FASTPATH_NONE;
 	}
 
-	if (handled) {
-		if (!kvm_skip_emulated_instruction(vcpu))
-			ret = EXIT_FASTPATH_EXIT_USERSPACE;
-		else
-			ret = EXIT_FASTPATH_REENTER_GUEST;
-		trace_kvm_msr_write(msr, data);
-	} else {
-		ret = EXIT_FASTPATH_NONE;
-	}
+	trace_kvm_msr_write(msr, data);
 
-	kvm_vcpu_srcu_read_unlock(vcpu);
+	if (!kvm_skip_emulated_instruction(vcpu))
+		return EXIT_FASTPATH_EXIT_USERSPACE;
 
-	return ret;
+	return EXIT_FASTPATH_REENTER_GUEST;
 }
-EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return __handle_fastpath_wrmsr(vcpu, kvm_rcx_read(vcpu),
+				       kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
+
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+	return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
 
 /*
  * Adapt set_msr() to msr_io()'s calling convention
···
 	return vcpu->arch.l1_tsc_offset +
 		kvm_scale_tsc(host_tsc, vcpu->arch.l1_tsc_scaling_ratio);
 }
-EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_l1_tsc);
 
 u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
 {
···
 	nested_offset += l2_offset;
 	return nested_offset;
 }
-EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_offset);
 
 u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
 {
···
 
 	return l1_multiplier;
 }
-EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_multiplier);
 
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 {
···
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
 		kvm_vcpu_flush_tlb_guest(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_service_local_tlb_flush_requests);
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
···
 	user_access_end();
 dirty:
 	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+}
+
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state. Note! S_CET is _not_ context
+ * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
+ * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
+ * the value saved/restored via XSTATE is always the host's value. That detail
+ * is _extremely_ important, as the guest's S_CET must _never_ be resident in
+ * hardware while executing in the host. Loading guest values for U_CET and
+ * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
+ * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
+ * privilege levels, i.e. are effectively only consumed by userspace as well.
+ */
+static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	if (!vcpu)
+		return false;
+
+	switch (msr) {
+	case MSR_IA32_U_CET:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
+		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+	default:
+		return false;
+	}
+}
+
+/*
+ * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
+ * MSR that is managed via XSTATE. Note, the caller is responsible for doing
+ * the initial FPU load, this helper only ensures that guest state is resident
+ * in hardware (the kernel can load its FPU state in IRQ context).
+ */
+static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
+						 struct msr_data *msr_info,
+						 int access)
+{
+	BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
+
+	KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+
+	kvm_fpu_get();
+	if (access == MSR_TYPE_R)
+		rdmsrq(msr_info->index, msr_info->data);
+	else
+		wrmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
+}
+
+static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
···
 		}
 		break;
 	case MSR_IA32_XSS:
-		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+			return KVM_MSR_RET_UNSUPPORTED;
+
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
-		/*
-		 * KVM supports exposing PT to the guest, but does not support
-		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
-		 * XSAVES/XRSTORS to save/restore PT MSRs.
-		 */
-		if (data & ~kvm_caps.supported_xss)
-			return 1;
+		if (vcpu->arch.ia32_xss == data)
+			break;
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;
···
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_set_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
···
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_msr_common);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common);
 
 static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 {
···
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_get_xstate_msr(vcpu, msr_info);
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info);
···
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr_common);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common);
 
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
···
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		/*
+		 * If userspace is accessing one or more XSTATE-managed MSRs,
+		 * temporarily load the guest's FPU state so that the guest's
+		 * MSR value(s) is resident in hardware and thus can be accessed
+		 * via RDMSR/WRMSR.
+		 */
+		if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
···
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_MEMORY_FAULT_INFO:
 	case KVM_CAP_X86_GUEST_MODE:
+	case KVM_CAP_ONE_REG:
 		r = 1;
 		break;
 	case KVM_CAP_PRE_FAULT_MEMORY:
···
 	}
 }
 
+struct kvm_x86_reg_id {
+	__u32 index;
+	__u8  type;
+	__u8  rsvd1;
+	__u8  rsvd2:4;
+	__u8  size:4;
+	__u8  x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
+				 struct kvm_x86_reg_id *reg)
+{
+	switch (reg->index) {
+	case KVM_REG_GUEST_SSP:
+		/*
+		 * FIXME: If host-initiated accesses are ever exempted from
+		 * ignore_msrs (in kvm_do_msr_access()), drop this manual check
+		 * and rely on KVM's standard checks to reject accesses to regs
+		 * that don't exist.
+		 */
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+			return -EINVAL;
+
+		reg->type = KVM_X86_REG_TYPE_MSR;
+		reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+	u64 val;
+
+	if (do_get_msr(vcpu, msr, &val))
+		return -EINVAL;
+
+	if (put_user(val, user_val))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+	u64 val;
+
+	if (get_user(val, user_val))
+		return -EFAULT;
+
+	if (do_set_msr(vcpu, msr, &val))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+			       void __user *argp)
+{
+	struct kvm_one_reg one_reg;
+	struct kvm_x86_reg_id *reg;
+	u64 __user *user_val;
+	bool load_fpu;
+	int r;
+
+	if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
+		return -EFAULT;
+
+	if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+		return -EINVAL;
+
+	reg = (struct kvm_x86_reg_id *)&one_reg.id;
+	if (reg->rsvd1 || reg->rsvd2)
+		return -EINVAL;
+
+	if (reg->type == KVM_X86_REG_TYPE_KVM) {
+		r = kvm_translate_kvm_reg(vcpu, reg);
+		if (r)
+			return r;
+	}
+
+	if (reg->type != KVM_X86_REG_TYPE_MSR)
+		return -EINVAL;
+
+	if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+		return -EINVAL;
+
+	guard(srcu)(&vcpu->kvm->srcu);
+
+	load_fpu = is_xstate_managed_msr(vcpu, reg->index);
+	if (load_fpu)
+		kvm_load_guest_fpu(vcpu);
+
+	user_val = u64_to_user_ptr(one_reg.addr);
+	if (ioctl == KVM_GET_ONE_REG)
+		r = kvm_get_one_msr(vcpu, reg->index, user_val);
+	else
+		r = kvm_set_one_msr(vcpu, reg->index, user_val);
+
+	if (load_fpu)
+		kvm_put_guest_fpu(vcpu);
+	return r;
+}
+
+static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+			    struct kvm_reg_list __user *user_list)
+{
+	u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
+	u64 user_nr_regs;
+
+	if (get_user(user_nr_regs, &user_list->n))
+		return -EFAULT;
+
+	if (put_user(nr_regs, &user_list->n))
+		return -EFAULT;
+
+	if (user_nr_regs < nr_regs)
+		return -E2BIG;
+
+	if (nr_regs &&
+	    put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
+		return -EFAULT;
+
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
···
 		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}
+	case KVM_GET_ONE_REG:
+	case KVM_SET_ONE_REG:
+		r = kvm_get_set_one_reg(vcpu, ioctl, argp);
+		break;
+	case KVM_GET_REG_LIST:
+		r = kvm_get_reg_list(vcpu, argp);
+		break;
 	case KVM_TPR_ACCESS_REPORTING: {
 		struct kvm_tpr_access_ctl tac;
 
···
 
 	kvm_free_msr_filter(old_filter);
 
-	kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
+	/*
+	 * Recalc MSR intercepts as userspace may want to intercept accesses to
+	 * MSRs that KVM would otherwise pass through to the guest.
+	 */
+	kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
 
 	return 0;
 }
···
 
 		r = -EEXIST;
 		if (irqchip_in_kernel(kvm))
+			goto create_irqchip_unlock;
+
+		/*
+		 * Disallow an in-kernel I/O APIC if the VM has protected EOIs,
+		 * i.e. if KVM can't intercept EOIs and thus can't properly
+		 * emulate level-triggered interrupts.
+		 */
+		r = -ENOTTY;
+		if (kvm->arch.has_protected_eoi)
 			goto create_irqchip_unlock;
 
 		r = -EINVAL;
···
 	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
 		if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
 			return;
 		break;
···
 		break;
 	case MSR_IA32_TSX_CTRL:
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+			return;
+		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+			return;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+			return;
+		fallthrough;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
 			return;
 		break;
 	default:
···
 	u64 access = (kvm_x86_call(get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_gva_to_gpa_read);
 
 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 				struct x86_exception *exception)
···
 	access |= PFERR_WRITE_MASK;
 	return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_gva_to_gpa_write);
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
···
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
 					  exception);
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_virt);
 
 static int emulator_read_std(struct x86_emulate_ctxt *ctxt,
 			     gva_t addr, void *val, unsigned int bytes,
···
 	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
 					   PFERR_WRITE_MASK, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_virt_system);
 
 static int kvm_check_emulate_insn(struct kvm_vcpu *vcpu, int emul_type,
 				  void *insn, int insn_len)
···
 
 	return kvm_emulate_instruction(vcpu, emul_type);
 }
-EXPORT_SYMBOL_GPL(handle_ud);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_ud);
 
 static int vcpu_is_mmio_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 			    gpa_t gpa, bool write)
···
 	kvm_emulate_wbinvd_noskip(vcpu);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wbinvd);
 
 
 
···
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r;
 
-	r = kvm_get_msr_with_filter(vcpu, msr_index, pdata);
+	r = kvm_emulate_msr_read(vcpu, msr_index, pdata);
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
 
···
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r;
 
-	r = kvm_set_msr_with_filter(vcpu, msr_index, data);
+	r = kvm_emulate_msr_write(vcpu, msr_index, data);
 	if (r < 0)
 		return X86EMUL_UNHANDLEABLE;
 
···
 static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
 			    u32 msr_index, u64 *pdata)
 {
-	return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
+	/*
+	 * Treat emulator accesses to the current shadow stack pointer as host-
+	 * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
+	 * and this API is used only for implicit accesses, i.e. not RDMSR, and
+	 * so the index is fully KVM-controlled.
+	 */
+	if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
+		return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
+
+	return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
 }
 
 static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc)
···
 static bool emulator_is_smm(struct x86_emulate_ctxt *ctxt)
 {
 	return is_smm(emul_to_vcpu(ctxt));
-}
-
-static bool emulator_is_guest_mode(struct x86_emulate_ctxt *ctxt)
-{
-	return is_guest_mode(emul_to_vcpu(ctxt));
 }
 
 #ifndef CONFIG_KVM_SMM
···
 	.guest_cpuid_is_intel_compatible = emulator_guest_cpuid_is_intel_compatible,
 	.set_nmi_mask = emulator_set_nmi_mask,
 	.is_smm = emulator_is_smm,
-	.is_guest_mode = emulator_is_guest_mode,
 	.leave_smm = emulator_leave_smm,
 	.triple_fault = emulator_triple_fault,
 	.set_xcr = emulator_set_xcr,
···
 		kvm_set_rflags(vcpu, ctxt->eflags);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_realmode_interrupt);
 
 static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
 					   u8 ndata, u8 *insn_bytes, u8 insn_size)
···
 {
 	prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_prepare_emulation_failure_exit);
 
 void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
 {
 	__kvm_prepare_emulation_failure_exit(vcpu, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prepare_emulation_failure_exit);
 
 void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
···
 	run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
 	run->internal.ndata = ndata;
 }
-EXPORT_SYMBOL_GPL(kvm_prepare_event_vectoring_exit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prepare_event_vectoring_exit);
 
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
···
 	if (unlikely(!r))
 		return 0;
 
-	kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED);
+	kvm_pmu_instruction_retired(vcpu);
 
 	/*
 	 * rflags is the old, "raw" value of the flags. The new value has
···
 		r = kvm_vcpu_do_singlestep(vcpu);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_skip_emulated_instruction);
 
 static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu)
 {
···
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(x86_decode_emulated_instruction);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(x86_decode_emulated_instruction);
 
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    int emulation_type, void *insn, int insn_len)
···
 		ctxt->exception.address = 0;
 	}
 
-	r = x86_emulate_insn(ctxt);
+	/*
+	 * Check L1's instruction intercepts when emulating instructions for
+	 * L2, unless KVM is re-emulating a previously decoded instruction,
+	 * e.g. to complete userspace I/O, in which case KVM has already
+	 * checked the intercepts.
+	 */
+	r = x86_emulate_insn(ctxt, is_guest_mode(vcpu) &&
+				   !(emulation_type & EMULTYPE_NO_DECODE));
 
 	if (r == EMULATION_INTERCEPTED)
 		return 1;
···
 		 */
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
-			kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED);
+			kvm_pmu_instruction_retired(vcpu);
 			if (ctxt->is_branch)
-				kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED);
+				kvm_pmu_branch_retired(vcpu);
 			kvm_rip_write(vcpu, ctxt->eip);
 			if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
 				r = kvm_vcpu_do_singlestep(vcpu);
···
 {
 	return x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_instruction);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_instruction);
 
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
 					void *insn, int insn_len)
 {
 	return x86_emulate_instruction(vcpu, 0, 0, insn, insn_len);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_instruction_from_buffer);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_instruction_from_buffer);
 
 static int complete_fast_pio_out_port_0x7e(struct kvm_vcpu *vcpu)
 {
···
 	ret = kvm_fast_pio_out(vcpu, size, port);
 	return ret && kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_fast_pio);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fast_pio);
 
 static int kvmclock_cpu_down_prep(unsigned int cpu)
 {
···
 		return -EIO;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT)) {
+		rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
+		/*
+		 * Linux doesn't yet support supervisor shadow stacks (SSS), so
+		 * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+		 * clobber the host values. Yell and refuse to load if SSS is
+		 * unexpectedly enabled, e.g. to avoid crashing the host.
+		 */
+		if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
+			return -EIO;
+	}
+
 	memset(&kvm_caps, 0, sizeof(kvm_caps));
 
 	x86_emulator_cache = kvm_alloc_emulator_cache();
···
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
+		kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
+	}
+
 	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
 	kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
 
 	rdmsrq_safe(MSR_EFER, &kvm_host.efer);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		rdmsrq(MSR_IA32_XSS, kvm_host.xss);
 
 	kvm_init_pmu_capability(ops->pmu_ops);
 
···
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
+
+	if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
+	}
+
 	if (kvm_caps.has_tsc_control) {
 		/*
 		 * Make sure the user can only configure tsc_khz values that
···
 	kmem_cache_destroy(x86_emulator_cache);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_x86_vendor_init);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x86_vendor_init);
 
 void kvm_x86_vendor_exit(void)
 {
···
 	kvm_x86_ops.enable_virtualization_cpu = NULL;
 	mutex_unlock(&vendor_module_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x86_vendor_exit);
 
 #ifdef CONFIG_X86_64
 static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
···
 {
 	return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
 }
-EXPORT_SYMBOL_GPL(kvm_apicv_activated);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apicv_activated);
 
 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
 {
···
 
 	return (vm_reasons | vcpu_reasons) == 0;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_apicv_activated);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_apicv_activated);
 
 static void set_or_clear_apicv_inhibit(unsigned long *inhibits,
 				       enum kvm_apicv_inhibit reason, bool set)
···
 	vcpu->run->hypercall.ret = ret;
 	return 1;
 }
-EXPORT_SYMBOL_GPL(____kvm_emulate_hypercall);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(____kvm_emulate_hypercall);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
···
 	return __kvm_emulate_hypercall(vcpu, kvm_x86_call(get_cpl)(vcpu),
 				       complete_hypercall_exit);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_hypercall);
 
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 {
···
 	preempt_enable();
 	up_read(&vcpu->kvm->arch.apicv_update_lock);
 }
-EXPORT_SYMBOL_GPL(__kvm_vcpu_update_apicv);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_update_apicv);
 
 static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
···
 	__kvm_set_or_clear_apicv_inhibit(kvm, reason, set);
 	up_write(&kvm->arch.apicv_update_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_or_clear_apicv_inhibit);
 
 static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 {
···
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
 
-		/*
-		 * Recalc MSR intercepts as userspace may want to intercept
-		 * accesses to MSRs that KVM would otherwise pass through to
-		 * the guest.
-		 */
-		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
-			kvm_x86_call(recalc_msr_intercepts)(vcpu);
+		if (kvm_check_request(KVM_REQ_RECALC_INTERCEPTS, vcpu))
+			kvm_x86_call(recalc_intercepts)(vcpu);
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			kvm_x86_call(update_cpu_dirty_logging)(vcpu);
···
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_has_events);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_has_events);
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
···
 {
 	return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_halt_noskip);
 
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
···
 	 */
 	return kvm_emulate_halt_noskip(vcpu) && ret;
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_halt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_halt);
 
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu)
 {
-	int ret;
-
-	kvm_vcpu_srcu_read_lock(vcpu);
-	ret = kvm_emulate_halt(vcpu);
-	kvm_vcpu_srcu_read_unlock(vcpu);
-
-	if (!ret)
+	if (!kvm_emulate_halt(vcpu))
 		return EXIT_FASTPATH_EXIT_USERSPACE;
 
 	if (kvm_vcpu_running(vcpu))
···
 
 	return EXIT_FASTPATH_EXIT_HANDLED;
 }
-EXPORT_SYMBOL_GPL(handle_fastpath_hlt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_hlt);
 
 int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
 {
···
 	return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD,
 				  KVM_EXIT_AP_RESET_HOLD) && ret;
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_ap_reset_hold);
 
 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
···
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	int ret;
 
+	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) {
+		u64 u_cet, s_cet;
+
+		/*
+		 * Check both User and Supervisor on task switches as inter-
+		 * privilege level task switches are impacted by CET at both
+		 * the current privilege level and the new privilege level, and
+		 * that information is not known at this time. The expectation
+		 * is that the guest won't require emulation of task switches
+		 * while using IBT or Shadow Stacks.
+		 */
+		if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
+		    __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
+			goto unhandled_task_switch;
+
+		if ((u_cet | s_cet) & (CET_ENDBR_EN | CET_SHSTK_EN))
+			goto unhandled_task_switch;
+	}
+
 	init_emulate_ctxt(vcpu);
 
 	ret = emulator_task_switch(ctxt, tss_selector, idt_index, reason,
···
 	 * Report an error userspace if MMIO is needed, as KVM doesn't support
 	 * MMIO during a task switch (or any other complex operation).
 	 */
-	if (ret || vcpu->mmio_needed) {
-		vcpu->mmio_needed = false;
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-		vcpu->run->internal.ndata = 0;
-		return 0;
-	}
+	if (ret || vcpu->mmio_needed)
+		goto unhandled_task_switch;
 
 	kvm_rip_write(vcpu, ctxt->eip);
 	kvm_set_rflags(vcpu, ctxt->eflags);
 	return 1;
+
+unhandled_task_switch:
+	vcpu->mmio_needed = false;
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+	vcpu->run->internal.ndata = 0;
+	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_task_switch);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_task_switch);
 
 static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
···
 	kvfree(vcpu->arch.cpuid_entries);
 }
 
+static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+	struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
+	u64 xfeatures_mask;
+	int i;
+
+	/*
+	 * Guest FPU state is zero allocated and so doesn't need to be manually
+	 * cleared on RESET, i.e. during vCPU creation.
+	 */
+	if (!init_event || !fpstate)
+		return;
+
+	/*
+	 * On INIT, only select XSTATE components are zeroed, most components
+	 * are unchanged. Currently, the only components that are zeroed and
+	 * supported by KVM are MPX and CET related.
+	 */
+	xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
+			 (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
+			  XFEATURE_MASK_CET_ALL);
+	if (!xfeatures_mask)
+		return;
+
+	BUILD_BUG_ON(sizeof(xfeatures_mask) * BITS_PER_BYTE <= XFEATURE_MAX);
+
+	/*
+	 * All paths that lead to INIT are required to load the guest's FPU
+	 * state (because most paths are buried in KVM_RUN).
+	 */
+	kvm_put_guest_fpu(vcpu);
+	for_each_set_bit(i, (unsigned long *)&xfeatures_mask, XFEATURE_MAX)
+		fpstate_clear_xstate_component(fpstate, i);
+	kvm_load_guest_fpu(vcpu);
+}
+
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_cpuid_entry2 *cpuid_0x1;
···
 	kvm_async_pf_hash_reset(vcpu);
 	vcpu->arch.apf.halted = false;
 
-	if (vcpu->arch.guest_fpu.fpstate && kvm_mpx_supported()) {
-		struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
-
-		/*
-		 * All paths that lead to INIT are required to load the guest's
-		 * FPU state (because most paths are buried in KVM_RUN).
-		 */
-		if (init_event)
-			kvm_put_guest_fpu(vcpu);
-
-		fpstate_clear_xstate_component(fpstate, XFEATURE_BNDREGS);
-		fpstate_clear_xstate_component(fpstate, XFEATURE_BNDCSR);
-
-		if (init_event)
-			kvm_load_guest_fpu(vcpu);
-	}
+	kvm_xstate_reset(vcpu, init_event);
 
 	if (!init_event) {
 		vcpu->arch.smbase = 0x30000;
···
 						  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
 
 		__kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
-		__kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
+		kvm_msr_write(vcpu, MSR_IA32_XSS, 0);
 	}
 
 	/* All GPRs except RDX (handled below) are zeroed on RESET/INIT. */
···
 	if (init_event)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_reset);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_reset);
 
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 {
···
 	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
 	kvm_rip_write(vcpu, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_deliver_sipi_vector);
 
 void kvm_arch_enable_virtualization(void)
 {
···
 {
 	return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_is_reset_bsp);
 
 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 {
···
 
 	return (void __user *)hva;
 }
-EXPORT_SYMBOL_GPL(__x86_set_memory_region);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__x86_set_memory_region);
 
 void kvm_arch_pre_destroy_vm(struct kvm *kvm)
 {
···
 	return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
 		     kvm_rip_read(vcpu));
 }
-EXPORT_SYMBOL_GPL(kvm_get_linear_rip);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
 {
 	return kvm_get_linear_rip(vcpu) == linear_rip;
 }
-EXPORT_SYMBOL_GPL(kvm_is_linear_rip);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip);
 
 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 {
···
 		rflags &= ~X86_EFLAGS_TF;
 	return rflags;
 }
-EXPORT_SYMBOL_GPL(kvm_get_rflags);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags);
 
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
···
 	__kvm_set_rflags(vcpu, rflags);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_set_rflags);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags);
 
 static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
 {
···
 	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
 		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
 
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
 	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
 		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
 
 bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 {
 	return atomic_read(&kvm->arch.noncoherent_dma_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
-
-bool kvm_vector_hashing_enabled(void)
-{
-	return vector_hashing;
-}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_arch_has_noncoherent_dma);
 
 bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 {
 	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
 }
-EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
 #ifdef CONFIG_KVM_GUEST_MEMFD
 /*
···
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvm_spec_ctrl_test_value);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value);
 
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code)
 {
···
 	}
 	vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault);
 }
-EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fixup_and_inject_pf_error);
 
 /*
  * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
···
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_memory_failure);
 
 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 {
···
 		return 1;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_invpcid);
 
 static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 {
···
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_write);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_mmio_write);
 
 int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 			 void *data)
···
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_mmio_read);
 
 static void advance_sev_es_emulated_pio(struct kvm_vcpu *vcpu, unsigned count, int size)
 {
···
 	return in ? kvm_sev_es_ins(vcpu, size, port)
 		  : kvm_sev_es_outs(vcpu, size, port);
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_string_io);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
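The two CET checks added to kvm_x86_vendor_init() make CET support all-or-nothing with respect to IA32_XSS. If neither SHSTK nor IBT is available, the CET XSS bits are dropped. If the CET XSS bits are not both present, both features are cleared. The user-space sketch below models only that masking. It assumes the Intel SDM's IA32_XSS layout (CET_U at bit 11, CET_S at bit 12). Every sketch_/SKETCH_ name is a hypothetical stand-in, not a kernel definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed IA32_XSS bits for CET state (Intel SDM: CET_U = bit 11,
 * CET_S = bit 12). Local stand-ins, not the kernel's XFEATURE_MASK_*.
 */
#define SKETCH_XSS_CET_U   (1ULL << 11)
#define SKETCH_XSS_CET_S   (1ULL << 12)
#define SKETCH_XSS_CET_ALL (SKETCH_XSS_CET_U | SKETCH_XSS_CET_S)

/* Hypothetical container for the three values the real code touches. */
struct sketch_cet_caps {
	uint64_t supported_xss;
	bool shstk;
	bool ibt;
};

/* Models the two CET checks added to kvm_x86_vendor_init(). */
static void sketch_finalize_cet_caps(struct sketch_cet_caps *caps)
{
	/* Neither feature is available: drop the CET XSS bits. */
	if (!caps->shstk && !caps->ibt)
		caps->supported_xss &= ~SKETCH_XSS_CET_ALL;

	/*
	 * Unless both CET XSS bits survive, clear both features and both
	 * bits, i.e. CET is never exposed with only part of its state.
	 */
	if ((caps->supported_xss & SKETCH_XSS_CET_ALL) != SKETCH_XSS_CET_ALL) {
		caps->shstk = false;
		caps->ibt = false;
		caps->supported_xss &= ~SKETCH_XSS_CET_ALL;
	}
}
```

With only CET_U present in supported_xss, both SHSTK and IBT end up cleared, even though the CPU advertises them. Unrelated XSS bits pass through untouched.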
+40 -2
arch/x86/kvm/x86.h
···
 	u64 efer;
 	u64 xcr0;
 	u64 xss;
+	u64 s_cet;
 	u64 arch_capabilities;
 };
 
···
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
+
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP	0x4b564dff
 
 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)
···
 
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
 				    void *insn, int insn_len);
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    int emulation_type, void *insn, int insn_len);
-fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
 
 extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
···
 		__reserved_bits |= X86_CR4_PCIDE;	\
 	if (!__cpu_has(__c, X86_FEATURE_LAM))		\
 		__reserved_bits |= X86_CR4_LAM_SUP;	\
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&	\
+	    !__cpu_has(__c, X86_FEATURE_IBT))		\
+		__reserved_bits |= X86_CR4_CET;		\
 	__reserved_bits;				\
 })
 
···
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+#define CET_US_RESERVED_BITS		GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+	if (data & CET_US_RESERVED_BITS)
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (data & CET_US_SHSTK_MASK_BITS))
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+	    (data & CET_US_IBT_MASK_BITS))
+		return false;
+	if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+		return false;
+	/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+		return false;
+
+	return true;
+}
 #endif
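kvm_is_valid_u_s_cet() rejects a U_CET/S_CET value in any of these cases:

- it sets reserved bits 9:6;
- it sets SHSTK bits (1:0) or IBT bits (5:2, 63:10) for a feature the guest lacks;
- it fails the IS_ALIGNED(data >> 12, 4) check, i.e. bits 13:12 are nonzero;
- it sets SUPPRESS while the tracker is in WAIT_ENDBR.

Below is a standalone C model of the same checks. It assumes the SDM bit positions for SUPPRESS (bit 10) and WAIT_ENDBR (bit 11), and it replaces guest_cpu_cap_has() with two booleans. The SK_ names are local stand-ins, not kernel macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed U_CET/S_CET bit layout (Intel SDM); local names, not kernel macros. */
#define SK_CET_SHSTK_EN   (1ULL << 0)
#define SK_CET_ENDBR_EN   (1ULL << 2)
#define SK_CET_SUPPRESS   (1ULL << 10)
#define SK_CET_WAIT_ENDBR (1ULL << 11)

/* Plain-C equivalents of the GENMASK()-based masks in the x86.h hunk. */
#define SK_RESERVED_BITS  (0xFULL << 6)                   /* bits 9:6        */
#define SK_SHSTK_BITS     (0x3ULL)                        /* bits 1:0        */
#define SK_IBT_BITS       ((0xFULL << 2) | (~0ULL << 10)) /* bits 5:2, 63:10 */

static bool sketch_is_valid_u_s_cet(uint64_t data, bool has_shstk, bool has_ibt)
{
	if (data & SK_RESERVED_BITS)
		return false;
	if (!has_shstk && (data & SK_SHSTK_BITS))
		return false;
	if (!has_ibt && (data & SK_IBT_BITS))
		return false;
	/* Same as !IS_ALIGNED(data >> 12, 4): bits 13:12 must be clear. */
	if ((data >> 12) & 3)
		return false;
	/* SUPPRESS may not be combined with the WAIT_ENDBR tracker state. */
	if ((data & SK_CET_SUPPRESS) && (data & SK_CET_WAIT_ENDBR))
		return false;
	return true;
}
```

Because the IBT mask covers bits 63:10, a guest with only SHSTK cannot set SUPPRESS, WAIT_ENDBR, or any legacy-bitmap base bits.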
+1 -1
drivers/s390/crypto/vfio_ap_ops.c
···
 
 	if (!*nib)
 		return -EINVAL;
-	if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
+	if (!kvm_s390_is_gpa_in_memslot(vcpu->kvm, *nib))
 		return -EINVAL;
 
 	return 0;
+18 -7
include/linux/kvm_types.h
···
 #ifndef __KVM_TYPES_H__
 #define __KVM_TYPES_H__
 
+#include <linux/bits.h>
+#include <linux/export.h>
+#include <linux/types.h>
+#include <asm/kvm_types.h>
+
+#ifdef KVM_SUB_MODULES
+#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol) \
+	EXPORT_SYMBOL_FOR_MODULES(symbol, __stringify(KVM_SUB_MODULES))
+#else
+#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol)
+#endif
+
+#ifndef __ASSEMBLER__
+
+#include <linux/mutex.h>
+#include <linux/spinlock_types.h>
+
 struct kvm;
 struct kvm_async_pf;
 struct kvm_device_ops;
···
 struct kvm_memslots;
 
 enum kvm_mr_change;
-
-#include <linux/bits.h>
-#include <linux/mutex.h>
-#include <linux/types.h>
-#include <linux/spinlock_types.h>
-
-#include <asm/kvm_types.h>
 
 /*
  * Address types:
···
 };
 
 #define KVM_STATS_NAME_SIZE	48
+#endif /* !__ASSEMBLER__ */
 
 #endif /* __KVM_TYPES_H__ */
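EXPORT_SYMBOL_FOR_KVM_INTERNAL() does one of two things. If KVM_SUB_MODULES is defined, it restricts an export to the modules that macro names. If KVM_SUB_MODULES is undefined, it expands to nothing. The sketch below reproduces the preprocessor side of that expansion. It uses a stand-in export macro that only records the symbol/module pair. The module list kvm-amd,kvm-intel and all sk_/SK_ names are assumptions. The real EXPORT_SYMBOL_FOR_MODULES() is not modeled.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringify with the same shape as the kernel's __stringify(). */
#define sk_stringify_1(...) #__VA_ARGS__
#define sk_stringify(...)   sk_stringify_1(__VA_ARGS__)

/* Hypothetical stand-in for EXPORT_SYMBOL_FOR_MODULES(): records the pair. */
struct sk_export {
	const char *sym;
	const char *mods;
};
#define SK_EXPORT_FOR_MODULES(sym, mods) \
	static const struct sk_export sk_export_##sym = { #sym, mods }

/* Assumed module list; the real one is defined by <asm/kvm_types.h>. */
#define KVM_SUB_MODULES kvm-amd,kvm-intel

/* Same conditional shape as the kvm_types.h hunk. */
#ifdef KVM_SUB_MODULES
#define SK_EXPORT_FOR_KVM_INTERNAL(symbol) \
	SK_EXPORT_FOR_MODULES(symbol, sk_stringify(KVM_SUB_MODULES))
#else
#define SK_EXPORT_FOR_KVM_INTERNAL(symbol)
#endif

SK_EXPORT_FOR_KVM_INTERNAL(kvm_x86_vendor_init);
```

The two-level stringify matters. Expanding KVM_SUB_MODULES first, and accepting its commas via __VA_ARGS__, yields the single string "kvm-amd,kvm-intel" rather than the literal text "KVM_SUB_MODULES".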
+1
tools/testing/selftests/kvm/Makefile.kvm
···
 TEST_GEN_PROGS_x86 += x86/kvm_pv_test
 TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
 TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
+TEST_GEN_PROGS_x86 += x86/msrs_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/platform_info_test
+5
tools/testing/selftests/kvm/include/x86/processor.h
···
 	return get_kvm_intel_param_bool("unrestricted_guest");
 }
 
+static inline bool kvm_is_ignore_msrs(void)
+{
+	return get_kvm_param_bool("ignore_msrs");
+}
+
 uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
 				    int *level);
 uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
+489
tools/testing/selftests/kvm/x86/msrs_test.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/msr-index.h>
+
+#include <stdint.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+
+/* Use HYPERVISOR for MSRs that are emulated unconditionally (as is HYPERVISOR). */
+#define X86_FEATURE_NONE X86_FEATURE_HYPERVISOR
+
+struct kvm_msr {
+	const struct kvm_x86_cpu_feature feature;
+	const struct kvm_x86_cpu_feature feature2;
+	const char *name;
+	const u64 reset_val;
+	const u64 write_val;
+	const u64 rsvd_val;
+	const u32 index;
+	const bool is_kvm_defined;
+};
+
+#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2, is_kvm)	\
+{									\
+	.index = msr,							\
+	.name = str,							\
+	.write_val = val,						\
+	.rsvd_val = rsvd,						\
+	.reset_val = reset,						\
+	.feature = X86_FEATURE_ ##feat,					\
+	.feature2 = X86_FEATURE_ ##f2,					\
+	.is_kvm_defined = is_kvm,					\
+}
+
+#define __MSR_TEST(msr, str, val, rsvd, reset, feat)			\
+	____MSR_TEST(msr, str, val, rsvd, reset, feat, feat, false)
+
+#define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat)			\
+	__MSR_TEST(msr, #msr, val, rsvd, reset, feat)
+
+#define MSR_TEST(msr, val, rsvd, feat)					\
+	__MSR_TEST(msr, #msr, val, rsvd, 0, feat)
+
+#define MSR_TEST2(msr, val, rsvd, feat, f2)				\
+	____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2, false)
+
+/*
+ * Note, use a page aligned value for the canonical value so that the value
+ * is compatible with MSRs that use bits 11:0 for things other than addresses.
+ */
+static const u64 canonical_val = 0x123456789000ull;
+
+/*
+ * Arbitrary value with bits set in every byte, but not all bits set. This is
+ * also a non-canonical value, but that's coincidental (any 64-bit value with
+ * an alternating 0s/1s pattern will be non-canonical).
+ */
+static const u64 u64_val = 0xaaaa5555aaaa5555ull;
+
+#define MSR_TEST_CANONICAL(msr, feat)					\
+	__MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
+
+#define MSR_TEST_KVM(msr, val, rsvd, feat)				\
+	____MSR_TEST(KVM_REG_ ##msr, #msr, val, rsvd, 0, feat, feat, true)
+
+/*
+ * The main struct must be scoped to a function due to the use of structures to
+ * define features. For the global structure, allocate enough space for the
+ * foreseeable future without getting too ridiculous, to minimize maintenance
+ * costs (bumping the array size every time an MSR is added is really annoying).
+ */
+static struct kvm_msr msrs[128];
+static int idx;
+
+static bool ignore_unsupported_msrs;
+
+static u64 fixup_rdmsr_val(u32 msr, u64 want)
+{
+	/*
+	 * AMD CPUs drop bits 63:32 on some MSRs that Intel CPUs support. KVM
+	 * is supposed to emulate that behavior based on guest vendor model
+	 * (which is the same as the host vendor model for this test).
+	 */
+	if (!host_cpu_is_amd)
+		return want;
+
+	switch (msr) {
+	case MSR_IA32_SYSENTER_ESP:
+	case MSR_IA32_SYSENTER_EIP:
+	case MSR_TSC_AUX:
+		return want & GENMASK_ULL(31, 0);
+	default:
+		return want;
+	}
+}
+
+static void __rdmsr(u32 msr, u64 want)
+{
+	u64 val;
+	u8 vec;
+
+	vec = rdmsr_safe(msr, &val);
+	__GUEST_ASSERT(!vec, "Unexpected %s on RDMSR(0x%x)", ex_str(vec), msr);
+
+	__GUEST_ASSERT(val == want, "Wanted 0x%lx from RDMSR(0x%x), got 0x%lx",
+		       want, msr, val);
+}
+
+static void __wrmsr(u32 msr, u64 val)
+{
+	u8 vec;
+
+	vec = wrmsr_safe(msr, val);
+	__GUEST_ASSERT(!vec, "Unexpected %s on WRMSR(0x%x, 0x%lx)",
+		       ex_str(vec), msr, val);
+	__rdmsr(msr, fixup_rdmsr_val(msr, val));
+}
+
+static void guest_test_supported_msr(const struct kvm_msr *msr)
+{
+	__rdmsr(msr->index, msr->reset_val);
+	__wrmsr(msr->index, msr->write_val);
+	GUEST_SYNC(fixup_rdmsr_val(msr->index, msr->write_val));
+
+	__rdmsr(msr->index, msr->reset_val);
+}
+
+static void guest_test_unsupported_msr(const struct kvm_msr *msr)
+{
+	u64 val;
+	u8 vec;
+
+	/*
+	 * KVM's ABI with respect to ignore_msrs is a mess and largely beyond
+	 * repair, just skip the unsupported MSR tests.
+	 */
+	if (ignore_unsupported_msrs)
+		goto skip_wrmsr_gp;
+
+	/*
+	 * {S,U}_CET exist if IBT or SHSTK is supported, but with bits that are
+	 * writable only if their associated feature is supported. Skip the
+	 * RDMSR #GP test if the secondary feature is supported, but perform
+	 * the WRMSR #GP test as the to-be-written value is tied to the primary
+	 * feature. For all other MSRs, simply do nothing.
+	 */
+	if (this_cpu_has(msr->feature2)) {
+		if (msr->index != MSR_IA32_U_CET &&
+		    msr->index != MSR_IA32_S_CET)
+			goto skip_wrmsr_gp;
+
+		goto skip_rdmsr_gp;
+	}
+
+	vec = rdmsr_safe(msr->index, &val);
+	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s",
+		       msr->index, ex_str(vec));
+
+skip_rdmsr_gp:
+	vec = wrmsr_safe(msr->index, msr->write_val);
+	__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+		       msr->index, msr->write_val, ex_str(vec));
+
+skip_wrmsr_gp:
+	GUEST_SYNC(0);
+}
+
+void guest_test_reserved_val(const struct kvm_msr *msr)
+{
+	/* Skip reserved value checks as well, ignore_msrs is trully a mess. */
+	if (ignore_unsupported_msrs)
+		return;
+
+	/*
+	 * If the CPU will truncate the written value (e.g. SYSENTER on AMD),
+	 * expect success and a truncated value, not #GP.
+	 */
+	if (!this_cpu_has(msr->feature) ||
+	    msr->rsvd_val == fixup_rdmsr_val(msr->index, msr->rsvd_val)) {
+		u8 vec = wrmsr_safe(msr->index, msr->rsvd_val);
+
+		__GUEST_ASSERT(vec == GP_VECTOR,
+			       "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+			       msr->index, msr->rsvd_val, ex_str(vec));
+	} else {
+		__wrmsr(msr->index, msr->rsvd_val);
+		__wrmsr(msr->index, msr->reset_val);
+	}
+}
+
+static void guest_main(void)
+{
+	for (;;) {
+		const struct kvm_msr *msr = &msrs[READ_ONCE(idx)];
+
+		if (this_cpu_has(msr->feature))
+			guest_test_supported_msr(msr);
+		else
+			guest_test_unsupported_msr(msr);
+
+		if (msr->rsvd_val)
+			guest_test_reserved_val(msr);
+
+		GUEST_SYNC(msr->reset_val);
+	}
+}
+
+static bool has_one_reg;
+static bool use_one_reg;
+
+#define KVM_X86_MAX_NR_REGS	1
+
+static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg)
+{
+	struct {
+		struct kvm_reg_list list;
+		u64 regs[KVM_X86_MAX_NR_REGS];
+	} regs = {};
+	int r, i;
+
+	/*
+	 * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported
+	 * regs, then the vCPU obviously doesn't support the reg.
+	 */
+	r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list);
+	if (!r)
+		return false;
+
+	TEST_ASSERT_EQ(errno, E2BIG);
+
+	/*
+	 * KVM x86 is expected to support enumerating a relative small number
+	 * of regs. The majority of registers supported by KVM_{G,S}ET_ONE_REG
+	 * are enumerated via other ioctls, e.g. KVM_GET_MSR_INDEX_LIST. For
+	 * simplicity, hardcode the maximum number of regs and manually update
+	 * the test as necessary.
+	 */
+	TEST_ASSERT(regs.list.n <= KVM_X86_MAX_NR_REGS,
+		    "KVM reports %llu regs, test expects at most %u regs, stale test?",
+		    regs.list.n, KVM_X86_MAX_NR_REGS);
+
+	vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list);
+	for (i = 0; i < regs.list.n; i++) {
+		if (regs.regs[i] == reg)
+			return true;
+	}
+
+	return false;
+}
+
+static void host_test_kvm_reg(struct kvm_vcpu *vcpu)
+{
+	bool has_reg = vcpu_cpuid_has(vcpu, msrs[idx].feature);
+	u64 reset_val = msrs[idx].reset_val;
+	u64 write_val = msrs[idx].write_val;
+	u64 rsvd_val = msrs[idx].rsvd_val;
+	u32 reg = msrs[idx].index;
+	u64 val;
+	int r;
+
+	if (!use_one_reg)
+		return;
+
+	TEST_ASSERT_EQ(vcpu_has_reg(vcpu, KVM_X86_REG_KVM(reg)), has_reg);
+
+	if (!has_reg) {
+		r = __vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg), &val);
+		TEST_ASSERT(r && errno == EINVAL,
+			    "Expected failure on get_reg(0x%x)", reg);
+		rsvd_val = 0;
+		goto out;
+	}
+
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    reset_val, reg, val);
+
+	vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), write_val);
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+	TEST_ASSERT(val == write_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    write_val, reg, val);
+
+out:
+	r = __vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), rsvd_val);
+	TEST_ASSERT(r, "Expected failure on set_reg(0x%x, 0x%lx)", reg, rsvd_val);
+}
+
+static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
+{
+	u64 reset_val = msrs[idx].reset_val;
+	u32 msr = msrs[idx].index;
+	u64 val;
+
+	if (!kvm_cpu_has(msrs[idx].feature))
+		return;
+
+	val = vcpu_get_msr(vcpu, msr);
+	TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+		    guest_val, msr, val);
+
+	if (use_one_reg)
+		vcpu_set_reg(vcpu, KVM_X86_REG_MSR(msr), reset_val);
+	else
+		vcpu_set_msr(vcpu, msr, reset_val);
+
+	val = vcpu_get_msr(vcpu, msr);
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+		    reset_val, msr, val);
+
+	if (!has_one_reg)
+		return;
+
+	val = vcpu_get_reg(vcpu, KVM_X86_REG_MSR(msr));
+	TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+		    reset_val, msr, val);
+}
+
+static void do_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	for (;;) {
+		vcpu_run(vcpu);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_SYNC:
+			host_test_msr(vcpu, uc.args[1]);
+			return;
+		case UCALL_PRINTF:
+			pr_info("%s", uc.buffer);
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+		case UCALL_DONE:
+			TEST_FAIL("Unexpected UCALL_DONE");
+		default:
+			TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+		}
+	}
+}
+
+static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
+{
+	int i;
+
+	for (i = 0; i < NR_VCPUS; i++)
+		do_vcpu_run(vcpus[i]);
+}
+
+#define MISC_ENABLES_RESET_VAL (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
+
+static void test_msrs(void)
+{
+	const struct kvm_msr __msrs[] = {
+		MSR_TEST_NON_ZERO(MSR_IA32_MISC_ENABLE,
+				  MISC_ENABLES_RESET_VAL | MSR_IA32_MISC_ENABLE_FAST_STRING,
+				  MSR_IA32_MISC_ENABLE_FAST_STRING, MISC_ENABLES_RESET_VAL, NONE),
+		MSR_TEST_NON_ZERO(MSR_IA32_CR_PAT, 0x07070707, 0, 0x7040600070406, NONE),
+
+		/*
+		 * TSC_AUX is supported if RDTSCP *or* RDPID is supported. Add
+		 * entries for each features so that TSC_AUX doesn't exists for
+		 * the "unsupported" vCPU, and obviously to test both cases.
+		 */
+		MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDTSCP, RDPID),
+		MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDPID, RDTSCP),
+
+		MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE),
+		/*
+		 * SYSENTER_{ESP,EIP} are technically non-canonical on Intel,
+		 * but KVM doesn't emulate that behavior on emulated writes,
+		 * i.e. this test will observe different behavior if the MSR
+		 * writes are handed by hardware vs. KVM. KVM's behavior is
+		 * intended (though far from ideal), so don't bother testing
+		 * non-canonical values.
+		 */
+		MSR_TEST(MSR_IA32_SYSENTER_ESP, canonical_val, 0, NONE),
+		MSR_TEST(MSR_IA32_SYSENTER_EIP, canonical_val, 0, NONE),
+
+		MSR_TEST_CANONICAL(MSR_FS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_GS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_KERNEL_GS_BASE, LM),
+		MSR_TEST_CANONICAL(MSR_LSTAR, LM),
+		MSR_TEST_CANONICAL(MSR_CSTAR, LM),
+		MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
+
+		MSR_TEST2(MSR_IA32_S_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+		MSR_TEST2(MSR_IA32_S_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
+		MSR_TEST2(MSR_IA32_U_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+		MSR_TEST2(MSR_IA32_U_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL1_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL2_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
+		MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
+		MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
+
+		MSR_TEST_KVM(GUEST_SSP, canonical_val, NONCANONICAL, SHSTK),
+	};
+
+	const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE;
+	const struct kvm_x86_cpu_feature feat_lm = X86_FEATURE_LM;
+
+	/*
+	 * Create three vCPUs, but run them on the same task, to validate KVM's
+	 * context switching of MSR state. Don't pin the task to a pCPU to
+	 * also validate KVM's handling of cross-pCPU migration. Use the full
+	 * set of features for the first two vCPUs, but clear all features in
+	 * third vCPU in order to test both positive and negative paths.
+	 */
+	const int NR_VCPUS = 3;
+	struct kvm_vcpu *vcpus[NR_VCPUS];
+	struct kvm_vm *vm;
+	int i;
+
+	kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
+	kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
+	memcpy(msrs, __msrs, sizeof(__msrs));
+
+	ignore_unsupported_msrs = kvm_is_ignore_msrs();
+
+	vm = vm_create_with_vcpus(NR_VCPUS, guest_main, vcpus);
+
+	sync_global_to_guest(vm, msrs);
+	sync_global_to_guest(vm, ignore_unsupported_msrs);
+
+	/*
+	 * Clear features in the "unsupported features" vCPU. This needs to be
+	 * done before the first vCPU run as KVM's ABI is that guest CPUID is
+	 * immutable once the vCPU has been run.
+	 */
+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+		/*
+		 * Don't clear LM; selftests are 64-bit only, and KVM doesn't
+		 * honor LM=0 for MSRs that are supposed to exist if and only
+		 * if the vCPU is a 64-bit model. Ditto for NONE; clearing a
+		 * fake feature flag will result in false failures.
+		 */
+		if (memcmp(&msrs[idx].feature, &feat_lm, sizeof(feat_lm)) &&
+		    memcmp(&msrs[idx].feature, &feat_none, sizeof(feat_none)))
+			vcpu_clear_cpuid_feature(vcpus[2], msrs[idx].feature);
+	}
+
+	for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+		struct kvm_msr *msr = &msrs[idx];
+
+		if (msr->is_kvm_defined) {
+			for (i = 0; i < NR_VCPUS; i++)
+				host_test_kvm_reg(vcpus[i]);
+			continue;
+		}
+
+		/*
+		 * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
+		 * are consistent with respect to MSRs whose existence is
+		 * enumerated via CPUID. Skip the check for FS/GS.base MSRs,
+		 * as they aren't reported in the save/restore list since their
+		 * state is managed via SREGS.
+		 */
+		TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
+			    kvm_msr_is_in_save_restore_list(msr->index) ==
+			    (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
+			    "%s %s in save/restore list, but %s according to CPUID", msr->name,
+			    kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't",
+			    (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ?
+			    "supported" : "unsupported");
+
+		sync_global_to_guest(vm, idx);
+
+		vcpus_run(vcpus, NR_VCPUS);
+		vcpus_run(vcpus, NR_VCPUS);
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	has_one_reg = kvm_has_cap(KVM_CAP_ONE_REG);
+
+	test_msrs();
+
+	if (has_one_reg) {
+		use_one_reg = true;
+		test_msrs();
+	}
+}
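The selftest above leans on two properties that are easy to state in isolation. First, an MSR whose existence CPUID enumerates through either of two features (TSC_AUX via RDTSCP or RDPID) should appear in KVM's save/restore list exactly when at least one of those features is supported, with FS_BASE/GS_BASE exempt because their state travels through SREGS. Second, the MSR_TEST_CANONICAL entries presumably pair an accepted canonical address with a rejected non-canonical one. The sketch below models both checks in plain userspace C; it is not code from the selftest, `save_restore_list_consistent()` and `is_canonical()` are hypothetical helpers, and the canonical rule assumes a 48-bit (4-level paging) or 57-bit (5-level paging) virtual-address width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical restatement of the TEST_ASSERT above: an MSR must be in the
 * save/restore list iff either enumerating feature is supported. FS/GS base
 * are exempt because their state is managed via KVM_{GET,SET}_SREGS.
 */
static bool save_restore_list_consistent(bool is_fs_or_gs_base, bool in_list,
					 bool has_feature, bool has_feature2)
{
	if (is_fs_or_gs_base)
		return true;
	return in_list == (has_feature || has_feature2);
}

/*
 * Hypothetical canonicality check: bits [63:va_bits-1] must all equal bit
 * (va_bits - 1), i.e. be all zeros or all ones. Assumes 1 <= va_bits <= 64
 * (48 for 4-level paging, 57 for 5-level paging).
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t upper = addr >> (va_bits - 1);		/* bits [63:va_bits-1] */
	uint64_t all_ones = UINT64_MAX >> (va_bits - 1);

	return upper == 0 || upper == all_ones;
}
```

For example, `is_canonical(0xffff800000000000, 48)` holds while `is_canonical(0x0000800000000000, 48)` does not; the latter address becomes canonical once `va_bits` is 57.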
+5 -3
tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -14,10 +14,10 @@
 #define NUM_BRANCH_INSNS_RETIRED (NUM_LOOPS)
 
 /*
- * Number of instructions in each loop. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE,
- * 1 LOOP.
+ * Number of instructions in each loop. 1 ENTER, 1 CLFLUSH/CLFLUSHOPT/NOP,
+ * 1 MFENCE, 1 MOV, 1 LEAVE, 1 LOOP.
  */
-#define NUM_INSNS_PER_LOOP 4
+#define NUM_INSNS_PER_LOOP 6
 
 /*
  * Number of "extra" instructions that will be counted, i.e. the number of
@@ -226,9 +226,11 @@
 	__asm__ __volatile__("wrmsr\n\t" \
 			     " mov $" __stringify(NUM_LOOPS) ", %%ecx\n\t" \
 			     "1:\n\t" \
+			     FEP "enter $0, $0\n\t" \
 			     clflush "\n\t" \
 			     "mfence\n\t" \
 			     "mov %[m], %%eax\n\t" \
+			     FEP "leave\n\t" \
 			     FEP "loop 1b\n\t" \
 			     FEP "mov %%edi, %%ecx\n\t" \
 			     FEP "xor %%eax, %%eax\n\t" \
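The updated comment lists six instructions per iteration (ENTER, CLFLUSH/CLFLUSHOPT/NOP, MFENCE, MOV, LEAVE, LOOP), which is why NUM_INSNS_PER_LOOP moves from 4 to 6; the old comment named only three instructions and left out the MOV that was already in the loop. The sketch below keeps a mnemonic list and the constant in lockstep with a compile-time check and derives loop-body totals for a caller-chosen iteration count. It is illustrative only: `loop_body`, `loop_body_insns()` and `loop_body_branches()` are hypothetical, and the selftest's real NUM_LOOPS value and "extra" instruction accounting are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mnemonics of one iteration of the measured loop, per the updated comment. */
static const char *const loop_body[] = {
	"enter", "clflush/clflushopt/nop", "mfence", "mov", "leave", "loop",
};

#define LOOP_BODY_LEN (sizeof(loop_body) / sizeof(loop_body[0]))

/* Mirrors the new value from the diff; fails to compile if the list drifts. */
#define NUM_INSNS_PER_LOOP 6
static_assert(LOOP_BODY_LEN == NUM_INSNS_PER_LOOP,
	      "loop comment and NUM_INSNS_PER_LOOP disagree");

/* Instructions retired by the loop body alone for @nr_loops iterations. */
static unsigned long loop_body_insns(unsigned long nr_loops)
{
	return nr_loops * NUM_INSNS_PER_LOOP;
}

/* Branches retired by the loop body: only LOOP is a branch instruction. */
static unsigned long loop_body_branches(unsigned long nr_loops)
{
	unsigned long branches_per_iter = 0;
	size_t i;

	for (i = 0; i < LOOP_BODY_LEN; i++)
		if (strcmp(loop_body[i], "loop") == 0)
			branches_per_iter++;

	return nr_loops * branches_per_iter;
}
```

One branch per iteration is consistent with `NUM_BRANCH_INSNS_RETIRED` being defined as `NUM_LOOPS`, since adding ENTER and LEAVE does not add branches.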
+1 -1
virt/kvm/eventfd.c
@@ -525,7 +525,7 @@
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_irq_has_notifier);
 
 void kvm_notify_acked_gsi(struct kvm *kvm, int gsi)
 {
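EXPORT_SYMBOL_FOR_KVM_INTERNAL replaces EXPORT_SYMBOL_GPL so that, per the pull request, these symbols are exposed only to KVM's vendor modules instead of to any GPL module. The kernel's actual mechanism lives in its export macros and module loader and is not reproduced here. What follows is only a userspace model of the allow-list idea: `struct export_entry`, `module_may_import()` and the module names `kvm_amd`/`kvm_intel` are assumptions for illustration. A NULL allow-list stands in for a plain GPL export, and the GPL licence check itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of an export with an optional importer allow-list.
 * This is NOT the kernel's implementation of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
 */
struct export_entry {
	const char *symbol;
	/* NULL-terminated list of allowed importers, or NULL for "any module". */
	const char *const *allowed;
};

/* Returns true if @module may bind to the symbol described by @e. */
static bool module_may_import(const struct export_entry *e, const char *module)
{
	const char *const *m;

	if (!e->allowed)
		return true;	/* models a plain EXPORT_SYMBOL_GPL (licence check omitted) */

	for (m = e->allowed; *m; m++)
		if (strcmp(*m, module) == 0)
			return true;
	return false;
}

/* Hypothetical allow-list: the x86 vendor modules named in the pull request. */
static const char *const kvm_x86_vendor_modules[] = { "kvm_amd", "kvm_intel", NULL };

/* kvm_irq_has_notifier before and after the change, in this model. */
static const struct export_entry gpl_export = { "kvm_irq_has_notifier", NULL };
static const struct export_entry kvm_internal_export = {
	"kvm_irq_has_notifier", kvm_x86_vendor_modules
};
```

In this model, an unrelated module that could previously link against `kvm_irq_has_notifier` is refused after the change, while the vendor modules are still accepted.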
+4 -3
virt/kvm/guest_memfd.c
@@ -702,7 +702,7 @@
 	fput(file);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
@@ -716,7 +716,8 @@
 	long i;
 
 	lockdep_assert_held(&kvm->slots_lock);
-	if (npages < 0)
+
+	if (WARN_ON_ONCE(npages <= 0))
 		return -EINVAL;
 
 	slot = gfn_to_memslot(kvm, start_gfn);
@@ -785,5 +784,5 @@
 	fput(file);
 	return ret && !i ? ret : i;
 }
-EXPORT_SYMBOL_GPL(kvm_gmem_populate);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_populate);
 #endif
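Two behaviors are visible in the kvm_gmem_populate() hunks. A zero page count is now rejected with -EINVAL (and a one-time WARN) where previously only negative counts were. The function also returns the number of pages processed and reports the error code only when no page was processed; `ret && !i ? ret : i` parses as `(ret && !i) ? ret : i`. The following is a minimal model of that convention, assuming the loop stops at the first failing page. `populate_model()` and the callbacks are hypothetical, and the real function's per-page work, locking and WARN_ON_ONCE() are omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-page callback: returns 0 on success or a negative errno. */
typedef int (*populate_one_fn)(long idx);

/*
 * Model of kvm_gmem_populate()'s argument check and return convention only.
 * Assumes the loop stops at the first failing page.
 */
static long populate_model(long npages, populate_one_fn populate_one)
{
	long i;
	int ret = 0;

	/* Mirrors the new check: zero or negative counts are invalid. */
	if (npages <= 0)
		return -EINVAL;

	for (i = 0; i < npages; i++) {
		ret = populate_one(i);
		if (ret)
			break;
	}

	/* Partial progress wins; the error is reported only if nothing was done. */
	return ret && !i ? ret : i;
}

/* Example callbacks for exercising the model. */
static int always_ok(long idx) { (void)idx; return 0; }
static int fail_first(long idx) { (void)idx; return -EFAULT; }
static int fail_third(long idx) { return idx == 2 ? -EFAULT : 0; }
```

With these callbacks, a failure on the third page yields 2 (a short count), while a failure on the very first page surfaces as -EFAULT.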
+64 -63
virt/kvm/kvm_main.c
@@ -77,22 +77,22 @@
 /* Architectures should define their poll value according to the halt latency */
 unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
 module_param(halt_poll_ns, uint, 0644);
-EXPORT_SYMBOL_GPL(halt_poll_ns);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns);
 
 /* Default doubles per-vcpu halt_poll_ns. */
 unsigned int halt_poll_ns_grow = 2;
 module_param(halt_poll_ns_grow, uint, 0644);
-EXPORT_SYMBOL_GPL(halt_poll_ns_grow);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_grow);
 
 /* The start value to grow halt_poll_ns from */
 unsigned int halt_poll_ns_grow_start = 10000; /* 10us */
 module_param(halt_poll_ns_grow_start, uint, 0644);
-EXPORT_SYMBOL_GPL(halt_poll_ns_grow_start);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_grow_start);
 
 /* Default halves per-vcpu halt_poll_ns. */
 unsigned int halt_poll_ns_shrink = 2;
 module_param(halt_poll_ns_shrink, uint, 0644);
-EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
 
 /*
  * Allow direct access (from KVM or the CPU) without MMU notifier protection
@@ -170,7 +170,7 @@
 	kvm_arch_vcpu_load(vcpu, cpu);
 	put_cpu();
 }
-EXPORT_SYMBOL_GPL(vcpu_load);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(vcpu_load);
 
 void vcpu_put(struct kvm_vcpu *vcpu)
 {
@@ -180,7 +180,7 @@
 	__this_cpu_write(kvm_running_vcpu, NULL);
 	preempt_enable();
 }
-EXPORT_SYMBOL_GPL(vcpu_put);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(vcpu_put);
 
 /* TODO: merge with kvm_arch_vcpu_should_kick */
 static bool kvm_request_needs_ipi(struct kvm_vcpu *vcpu, unsigned req)
@@ -288,7 +288,7 @@
 
 	return called;
 }
-EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_make_all_cpus_request);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
@@ -309,7 +309,7 @@
 	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.generic.remote_tlb_flush;
 }
-EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_flush_remote_tlbs);
 
 void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages)
 {
@@ -499,7 +499,7 @@
 
 	atomic_set(&kvm->online_vcpus, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_destroy_vcpus);
 
 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
@@ -1365,7 +1365,7 @@
 {
 	WARN_ON(refcount_dec_and_test(&kvm->users_count));
 }
-EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_put_kvm_no_destroy);
 
 static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
@@ -1397,7 +1397,7 @@
 	}
 	return -EINTR;
 }
-EXPORT_SYMBOL_GPL(kvm_trylock_all_vcpus);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_trylock_all_vcpus);
 
 int kvm_lock_all_vcpus(struct kvm *kvm)
 {
@@ -1422,7 +1422,7 @@
 	}
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_lock_all_vcpus);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lock_all_vcpus);
 
 void kvm_unlock_all_vcpus(struct kvm *kvm)
 {
@@ -1434,7 +1434,7 @@
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		mutex_unlock(&vcpu->mutex);
 }
-EXPORT_SYMBOL_GPL(kvm_unlock_all_vcpus);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_unlock_all_vcpus);
 
 /*
  * Allocation size is twice as large as the actual dirty bitmap size.
@@ -2142,7 +2142,7 @@
 
 	return kvm_set_memory_region(kvm, mem);
 }
-EXPORT_SYMBOL_GPL(kvm_set_internal_memslot);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_internal_memslot);
 
 static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 					  struct kvm_userspace_memory_region2 *mem)
@@ -2201,7 +2201,7 @@
 		*is_dirty = 1;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dirty_log);
 
 #else /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
 /**
@@ -2636,7 +2636,7 @@
 {
 	return __gfn_to_memslot(kvm_memslots(kvm), gfn);
 }
-EXPORT_SYMBOL_GPL(gfn_to_memslot);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_memslot);
 
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
@@ -2670,6 +2670,7 @@
 
 	return NULL;
 }
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_gfn_to_memslot);
 
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
@@ -2678,7 +2677,7 @@
 
 	return kvm_is_visible_memslot(memslot);
 }
-EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_visible_gfn);
 
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
@@ -2686,7 +2685,7 @@
 
 	return kvm_is_visible_memslot(memslot);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_is_visible_gfn);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_is_visible_gfn);
 
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
@@ -2743,19 +2742,19 @@
 {
 	return gfn_to_hva_many(slot, gfn, NULL);
 }
-EXPORT_SYMBOL_GPL(gfn_to_hva_memslot);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_hva_memslot);
 
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
 }
-EXPORT_SYMBOL_GPL(gfn_to_hva);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_hva);
 
 unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_hva_many(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, NULL);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_hva);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_gfn_to_hva);
 
 /*
  * Return the hva of a @gfn and the R/W attribute if possible.
@@ -2819,7 +2818,7 @@
 	kvm_set_page_accessed(page);
 	put_page(page);
 }
-EXPORT_SYMBOL_GPL(kvm_release_page_clean);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_release_page_clean);
 
 void kvm_release_page_dirty(struct page *page)
 {
@@ -2829,7 +2828,7 @@
 	kvm_set_page_dirty(page);
 	kvm_release_page_clean(page);
 }
-EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_release_page_dirty);
 
 static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page,
 				 struct follow_pfnmap_args *map, bool writable)
@@ -3073,7 +3072,7 @@
 
 	return kvm_follow_pfn(&kfp);
 }
-EXPORT_SYMBOL_GPL(__kvm_faultin_pfn);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_faultin_pfn);
 
 int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn,
 		       struct page **pages, int nr_pages)
@@ -3090,7 +3089,7 @@
 
 	return get_user_pages_fast_only(addr, nr_pages, FOLL_WRITE, pages);
 }
-EXPORT_SYMBOL_GPL(kvm_prefetch_pages);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prefetch_pages);
 
 /*
  * Don't use this API unless you are absolutely, positively certain that KVM
@@ -3112,7 +3111,7 @@
 	(void)kvm_follow_pfn(&kfp);
 	return refcounted_page;
 }
-EXPORT_SYMBOL_GPL(__gfn_to_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__gfn_to_page);
 
 int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
 		   bool writable)
@@ -3146,7 +3145,7 @@
 
 	return map->hva ? 0 : -EFAULT;
 }
-EXPORT_SYMBOL_GPL(__kvm_vcpu_map);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_map);
 
 void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
 {
@@ -3174,7 +3173,7 @@
 	map->page = NULL;
 	map->pinned_page = NULL;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_unmap);
 
 static int next_segment(unsigned long len, int offset)
 {
@@ -3210,7 +3209,7 @@
 
 	return __kvm_read_guest_page(slot, gfn, data, offset, len);
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_page);
 
 int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 			     int offset, int len)
@@ -3219,7 +3218,7 @@
 
 	return __kvm_read_guest_page(slot, gfn, data, offset, len);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest_page);
 
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len)
 {
@@ -3239,7 +3238,7 @@
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest);
 
 int kvm_vcpu_read_guest(struct kvm_vcpu *vcpu, gpa_t gpa, void *data, unsigned long len)
 {
@@ -3259,7 +3258,7 @@
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest);
 
 static int __kvm_read_guest_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 				   void *data, int offset, unsigned long len)
@@ -3290,7 +3289,7 @@
 
 	return __kvm_read_guest_atomic(slot, gfn, data, offset, len);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest_atomic);
 
 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
@@ -3320,7 +3319,7 @@
 
 	return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_page);
 
 int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 			      const void *data, int offset, int len)
@@ -3329,7 +3328,7 @@
 
 	return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_write_guest_page);
 
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
 		    unsigned long len)
@@ -3350,7 +3349,7 @@
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest);
 
 int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
 			 unsigned long len)
@@ -3371,7 +3370,7 @@
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_write_guest);
 
 static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots,
 				       struct gfn_to_hva_cache *ghc,
@@ -3420,7 +3419,7 @@
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
 }
-EXPORT_SYMBOL_GPL(kvm_gfn_to_hva_cache_init);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gfn_to_hva_cache_init);
 
 int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 				  void *data, unsigned int offset,
@@ -3451,14 +3450,14 @@
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest_offset_cached);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_offset_cached);
 
 int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			   void *data, unsigned long len)
 {
 	return kvm_write_guest_offset_cached(kvm, ghc, data, 0, len);
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_cached);
 
 int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 				 void *data, unsigned int offset,
@@ -3488,14 +3487,14 @@
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest_offset_cached);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_offset_cached);
 
 int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			  void *data, unsigned long len)
 {
 	return kvm_read_guest_offset_cached(kvm, ghc, data, 0, len);
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_cached);
 
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 {
@@ -3515,7 +3514,7 @@
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_clear_guest);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_clear_guest);
 
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
@@ -3540,7 +3539,7 @@
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
 }
-EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(mark_page_dirty_in_slot);
 
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 {
@@ -3549,7 +3548,7 @@
 	memslot = gfn_to_memslot(kvm, gfn);
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 }
-EXPORT_SYMBOL_GPL(mark_page_dirty);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(mark_page_dirty);
 
 void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
@@ -3558,7 +3557,7 @@
 	memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_mark_page_dirty);
 
 void kvm_sigset_activate(struct kvm_vcpu *vcpu)
 {
@@ -3795,7 +3794,7 @@
 
 	trace_kvm_vcpu_wakeup(halt_ns, waited, vcpu_valid_wakeup(vcpu));
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_halt);
 
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
@@ -3807,7 +3806,7 @@
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_wake_up);
 
 #ifndef CONFIG_S390
 /*
@@ -3859,7 +3858,7 @@
 out:
 	put_cpu();
 }
-EXPORT_SYMBOL_GPL(__kvm_vcpu_kick);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */
 
 int kvm_vcpu_yield_to(struct kvm_vcpu *target)
@@ -3882,7 +3881,7 @@
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_yield_to);
 
 /*
  * Helper that checks whether a VCPU is eligible for directed yield.
@@ -4037,7 +4036,7 @@
 	/* Ensure vcpu is not eligible during next spinloop */
 	kvm_vcpu_set_dy_eligible(me, false);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_on_spin);
 
 static bool kvm_page_in_dirty_ring(struct kvm *kvm, unsigned long pgoff)
 {
@@ -5019,7 +5018,7 @@
 
 	return true;
 }
-EXPORT_SYMBOL_GPL(kvm_are_all_memslots_empty);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_are_all_memslots_empty);
 
 static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 					   struct kvm_enable_cap *cap)
@@ -5474,7 +5473,7 @@
 {
 	return file && file->f_op == &kvm_vm_fops;
 }
-EXPORT_SYMBOL_GPL(file_is_kvm);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
 
 static int kvm_dev_ioctl_create_vm(unsigned long type)
 {
@@ -5569,10 +5568,10 @@
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 bool enable_virt_at_load = true;
 module_param(enable_virt_at_load, bool, 0444);
-EXPORT_SYMBOL_GPL(enable_virt_at_load);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_virt_at_load);
 
 __visible bool kvm_rebooting;
-EXPORT_SYMBOL_GPL(kvm_rebooting);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting);
 
 static DEFINE_PER_CPU(bool, virtualization_enabled);
 static DEFINE_MUTEX(kvm_usage_lock);
@@ -5723,7 +5722,7 @@
 	--kvm_usage_count;
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_enable_virtualization);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_virtualization);
 
 void kvm_disable_virtualization(void)
 {
@@ -5736,7 +5735,7 @@
 	cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
 	kvm_arch_disable_virtualization();
 }
-EXPORT_SYMBOL_GPL(kvm_disable_virtualization);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_disable_virtualization);
 
 static int kvm_init_virtualization(void)
 {
@@ -5885,7 +5884,7 @@
 	r = __kvm_io_bus_write(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
-EXPORT_SYMBOL_GPL(kvm_io_bus_write);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_write);
 
 int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
 			    gpa_t addr, int len, const void *val, long cookie)
@@ -5954,7 +5953,7 @@
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
-EXPORT_SYMBOL_GPL(kvm_io_bus_read);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_read);
 
 static void __free_bus(struct rcu_head *rcu)
 {
@@ -6078,7 +6077,7 @@
 
 	return iodev;
 }
-EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_get_dev);
 
 static int kvm_debugfs_open(struct inode *inode, struct file *file,
 			    int (*get)(void *, u64 *), int (*set)(void *, u64),
@@ -6415,7 +6414,7 @@
 
 	return vcpu;
 }
-EXPORT_SYMBOL_GPL(kvm_get_running_vcpu);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_running_vcpu);
 
 /**
  * kvm_get_running_vcpus - get the per-CPU array of currently running vcpus.
@@ -6550,7 +6549,7 @@
 	kmem_cache_destroy(kvm_vcpu_cache);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_init);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init);
 
 void kvm_exit(void)
 {
@@ -6573,4 +6572,4 @@
 	kvm_async_pf_deinit();
 	kvm_irqfd_exit();
 }
-EXPORT_SYMBOL_GPL(kvm_exit);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_exit);
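Several of the re-exported guest-memory helpers share one shape. kvm_read_guest() and kvm_write_guest() appear to walk a guest-physical range page by page, using next_segment() to cap each chunk at the end of the current page. Each gfn is then translated to a host virtual address through its memslot, which gfn_to_hva() and friends reach via gfn_to_hva_many(). The following is a simplified userspace sketch of that flow, assuming 4 KiB pages and a single memslot. `struct model_slot`, `model_next_segment()`, `model_gfn_to_hva()` and `model_read_guest()` are hypothetical. The hva computation follows the usual memslot arithmetic, base + (gfn - base_gfn) * PAGE_SIZE, and errors are reduced to returning -1 where KVM returns an errno.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Largest chunk that stays within the current page (modeled on next_segment()). */
static unsigned long model_next_segment(unsigned long len, unsigned long offset)
{
	return len > PAGE_SIZE - offset ? PAGE_SIZE - offset : len;
}

/* Hypothetical memslot: guest frames [base_gfn, base_gfn + npages) backed by host. */
struct model_slot {
	uint64_t base_gfn;
	uint64_t npages;
	unsigned char *host;	/* stands in for the slot's userspace address */
};

/* gfn -> host address, or NULL if the gfn is outside the slot. */
static unsigned char *model_gfn_to_hva(const struct model_slot *s, uint64_t gfn)
{
	if (gfn < s->base_gfn || gfn - s->base_gfn >= s->npages)
		return NULL;
	return s->host + (gfn - s->base_gfn) * PAGE_SIZE;
}

/* Page-by-page copy in the shape of kvm_read_guest(); returns 0 or -1. */
static int model_read_guest(const struct model_slot *s, uint64_t gpa,
			    void *data, unsigned long len)
{
	uint64_t gfn = gpa >> PAGE_SHIFT;
	unsigned long offset = gpa & (PAGE_SIZE - 1);
	unsigned char *dst = data;

	while (len) {
		unsigned long seg = model_next_segment(len, offset);
		unsigned char *hva = model_gfn_to_hva(s, gfn);

		if (!hva)
			return -1;
		memcpy(dst, hva + offset, seg);
		dst += seg;
		len -= seg;
		offset = 0;	/* every page after the first starts at offset 0 */
		gfn++;
	}
	return 0;
}
```

Adjacent guest frames can belong to different memslots and therefore different host mappings, which is why the translation is repeated for every page rather than once per range. In this single-slot model the distinction is invisible, but a read that straddles a page boundary still exercises the two-segment path.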