Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

+21

Documentation/admin-guide/kernel-parameters.txt

···
 			(enabled). Disable by KVM if hardware lacks support
 			for NPT.
 
+	kvm-amd.ciphertext_hiding_asids=
+			[KVM,AMD] Ciphertext hiding prevents disallowed accesses
+			to SNP private memory from reading ciphertext. Instead,
+			reads will see constant default values (0xff).
+
+			If ciphertext hiding is enabled, the joint SEV-ES and
+			SEV-SNP ASID space is partitioned into separate SEV-ES
+			and SEV-SNP ASID ranges, with the SEV-SNP range being
+			[1..max_snp_asid] and the SEV-ES range being
+			(max_snp_asid..min_sev_asid), where min_sev_asid is
+			enumerated by CPUID.0x.8000_001F[EDX].
+
+			A non-zero value enables SEV-SNP ciphertext hiding and
+			adjusts the ASID ranges for SEV-ES and SEV-SNP guests.
+			KVM caps the number of SEV-SNP ASIDs at the maximum
+			possible value, e.g. specifying -1u will assign all
+			joint SEV-ES and SEV-SNP ASIDs to SEV-SNP. Note,
+			assigning all joint ASIDs to SEV-SNP, i.e. configuring
+			max_snp_asid == min_sev_asid-1, will effectively make
+			SEV-ES unusable.
+
 	kvm-arm.mode=
 			[KVM,ARM,EARLY] Select one of KVM/arm64's modes of
 			operation.
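The ASID partitioning described in the new kvm-amd.ciphertext_hiding_asids text can be modeled roughly as below. This is only a sketch of the documented semantics, not KVM's implementation: `struct asid_ranges` and `split_joint_asids()` are hypothetical names, and any additional limits that firmware or hardware may impose are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a KVM structure. */
struct asid_ranges {
	uint32_t snp_first, snp_last;	/* SEV-SNP range: [1..max_snp_asid] */
	uint32_t es_first, es_last;	/* SEV-ES range; empty if es_first > es_last */
};

/*
 * Model of the documented behavior: a non-zero request is capped at the
 * largest joint ASID (min_sev_asid - 1), and whatever remains above
 * max_snp_asid is left for SEV-ES.
 */
static struct asid_ranges split_joint_asids(uint32_t requested,
					    uint32_t min_sev_asid)
{
	assert(requested != 0);		/* zero leaves ciphertext hiding disabled */
	assert(min_sev_asid > 1);	/* otherwise there is no joint ASID space */

	uint32_t joint_last = min_sev_asid - 1;
	uint32_t max_snp_asid = requested < joint_last ? requested : joint_last;
	struct asid_ranges r = { 1, max_snp_asid, max_snp_asid + 1, joint_last };

	return r;
}
```

Under this model, requesting -1u (UINT32_MAX) with min_sev_asid == 100 yields an SEV-SNP range of [1..99] and an empty SEV-ES range, which matches the documentation's note that SEV-ES becomes unusable in that configuration.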

+19 -1

Documentation/virt/kvm/api.rst

···
 
   0x9030 0000 0002 <reg:16>
 
+x86 MSR registers have the following id bit patterns::
+  0x2030 0002 <msr number:32>
+
+Following are the KVM-defined registers for x86:
+
+======================= ========= =============================================
+    Encoding            Register  Description
+======================= ========= =============================================
+  0x2030 0003 0000 0000 SSP       Shadow Stack Pointer
+======================= ========= =============================================
 
 4.69 KVM_GET_ONE_REG
 --------------------
···
 
 Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
 See KVM_GET_PIT2 for details on struct kvm_pit_state2.
+
+.. Tip::
+   ``KVM_SET_PIT2`` strictly adheres to the spec of Intel 8254 PIT. For example,
+   a ``count`` value of 0 in ``struct kvm_pit_channel_state`` is interpreted as
+   65536, which is the maximum count value. Refer to `Intel 8254 programmable
+   interval timer <https://www.scs.stanford.edu/10wi-cs140/pintos/specs/8254.pdf>`_.
 
 This IOCTL replaces the obsolete KVM_SET_PIT.
 
···
 ---------------------
 
 :Capability: basic
-:Architectures: arm64, mips, riscv
+:Architectures: arm64, mips, riscv, x86 (if KVM_CAP_ONE_REG)
 :Type: vcpu ioctl
 :Parameters: struct kvm_reg_list (in/out)
 :Returns: 0 on success; -1 on error
···
 
 - KVM_REG_S390_GBEA
 
+Note, for x86, all MSRs enumerated by KVM_GET_MSR_INDEX_LIST are supported as
+type KVM_X86_REG_TYPE_MSR, but are NOT enumerated via KVM_GET_REG_LIST.
 
 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
 -----------------------------------------
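The KVM_SET_PIT2 tip above says a `count` of 0 is treated as 65536. A minimal sketch of that rule follows, assuming the 8254's 16-bit binary counting mode; `pit_effective_count()` is a hypothetical helper written for illustration and is not a KVM function.

```c
#include <stdint.h>

/*
 * Sketch of the 8254 rule quoted in the tip: in 16-bit binary counting
 * mode a programmed count of 0 stands for 65536, the largest possible
 * count. Every other value is used as written.
 */
static uint32_t pit_effective_count(uint16_t programmed)
{
	return programmed ? programmed : 65536u;
}
```

A caller restoring PIT state would therefore see `pit_effective_count(0)` as the longest period rather than a stopped counter.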

+3 -3

Documentation/virt/kvm/x86/hypercalls.rst

···
 Returns KVM_EOPNOTSUPP if the host does not use TSC clocksource,
 or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK.
 
-6. KVM_HC_SEND_IPI
+7. KVM_HC_SEND_IPI
 ------------------
 
 :Architecture: x86
···
 
 Returns the number of CPUs to which the IPIs were delivered successfully.
 
-7. KVM_HC_SCHED_YIELD
+8. KVM_HC_SCHED_YIELD
 ---------------------
 
 :Architecture: x86
···
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
 	        any of the IPI target vCPUs was preempted.
 
-8. KVM_HC_MAP_GPA_RANGE
+9. KVM_HC_MAP_GPA_RANGE
 -------------------------
 :Architecture: x86
 :Status: active

-1

arch/powerpc/include/asm/Kbuild

···
 generated-y += syscall_table_64.h
 generated-y += syscall_table_spu.h
 generic-y += agp.h
-generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += early_ioremap.h

+15

arch/powerpc/include/asm/kvm_types.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_PPC_KVM_TYPES_H
+#define _ASM_PPC_KVM_TYPES_H
+
+#if IS_MODULE(CONFIG_KVM_BOOK3S_64_PR) && IS_MODULE(CONFIG_KVM_BOOK3S_64_HV)
+#define KVM_SUB_MODULES kvm-pr,kvm-hv
+#elif IS_MODULE(CONFIG_KVM_BOOK3S_64_PR)
+#define KVM_SUB_MODULES kvm-pr
+#elif IS_MODULE(CONFIG_KVM_BOOK3S_64_HV)
+#define KVM_SUB_MODULES kvm-hv
+#else
+#undef KVM_SUB_MODULES
+#endif
+
+#endif

+2

arch/s390/include/asm/kvm_host.h

···
 extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
 extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
 
+bool kvm_s390_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa);
+
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
 					 struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}

+8

arch/s390/kvm/priv.c

···
 	}
 }
 
+#if IS_ENABLED(CONFIG_VFIO_AP)
+bool kvm_s390_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa)
+{
+	return kvm_is_gpa_in_memslot(kvm, gpa);
+}
+EXPORT_SYMBOL_FOR_MODULES(kvm_s390_is_gpa_in_memslot, "vfio_ap");
+#endif
+
 /*
  * handle_pqap: Handling pqap interception
  * @vcpu: the vcpu having issue the pqap instruction

+2

arch/x86/include/asm/cpufeatures.h

···
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* "sev_es" Secure Encrypted Virtualization - Encrypted State */
 #define X86_FEATURE_SEV_SNP		(19*32+ 4) /* "sev_snp" Secure Encrypted Virtualization - Secure Nested Paging */
+#define X86_FEATURE_SNP_SECURE_TSC	(19*32+ 8) /* SEV-SNP Secure TSC */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* hardware-enforced cache coherency */
 #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* "debug_swap" SEV-ES full debug state swap support */
···
 #define X86_FEATURE_CLEAR_CPU_BUF_VM	(21*32+13) /* Clear CPU buffers using VERW before VMRUN */
 #define X86_FEATURE_IBPB_EXIT_TO_USER	(21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
 #define X86_FEATURE_ABMC		(21*32+15) /* Assignable Bandwidth Monitoring Counters */
+#define X86_FEATURE_MSR_IMM		(21*32+16) /* MSR immediate form instructions */
 
 /*
  * BUG word(s)

+1 -1

arch/x86/include/asm/kvm-x86-ops.h

···
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
 KVM_X86_OP_OPTIONAL(migrate_timers)
-KVM_X86_OP(recalc_msr_intercepts)
+KVM_X86_OP(recalc_intercepts)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);

+53 -30

arch/x86/include/asm/kvm_host.h

··· 120 120 #define KVM_REQ_TLB_FLUSH_GUEST \ 121 121 KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) 122 122 #define KVM_REQ_APF_READY KVM_ARCH_REQ(28) 123 - #define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29) 123 + #define KVM_REQ_RECALC_INTERCEPTS KVM_ARCH_REQ(29) 124 124 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \ 125 125 KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) 126 126 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \ ··· 142 142 | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ 143 143 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ 144 144 | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ 145 - | X86_CR4_LAM_SUP)) 145 + | X86_CR4_LAM_SUP | X86_CR4_CET)) 146 146 147 147 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) 148 148 ··· 267 267 #define PFERR_RSVD_MASK BIT(3) 268 268 #define PFERR_FETCH_MASK BIT(4) 269 269 #define PFERR_PK_MASK BIT(5) 270 + #define PFERR_SS_MASK BIT(6) 270 271 #define PFERR_SGX_MASK BIT(15) 271 272 #define PFERR_GUEST_RMP_MASK BIT_ULL(31) 272 273 #define PFERR_GUEST_FINAL_MASK BIT_ULL(32) ··· 546 545 #define KVM_MAX_NR_GP_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_GP_COUNTERS, \ 547 546 KVM_MAX_NR_AMD_GP_COUNTERS) 548 547 549 - #define KVM_MAX_NR_INTEL_FIXED_COUTNERS 3 550 - #define KVM_MAX_NR_AMD_FIXED_COUTNERS 0 551 - #define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUTNERS, \ 552 - KVM_MAX_NR_AMD_FIXED_COUTNERS) 548 + #define KVM_MAX_NR_INTEL_FIXED_COUNTERS 3 549 + #define KVM_MAX_NR_AMD_FIXED_COUNTERS 0 550 + #define KVM_MAX_NR_FIXED_COUNTERS KVM_MAX(KVM_MAX_NR_INTEL_FIXED_COUNTERS, \ 551 + KVM_MAX_NR_AMD_FIXED_COUNTERS) 553 552 554 553 struct kvm_pmu { 555 554 u8 version; ··· 579 578 }; 580 579 DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX); 581 580 DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX); 581 + 582 + DECLARE_BITMAP(pmc_counting_instructions, X86_PMC_IDX_MAX); 583 + DECLARE_BITMAP(pmc_counting_branches, X86_PMC_IDX_MAX); 582 584 583 585 u64 ds_area; 584 586 u64 
pebs_enable; ··· 775 771 CPUID_7_2_EDX, 776 772 CPUID_24_0_EBX, 777 773 CPUID_8000_0021_ECX, 774 + CPUID_7_1_ECX, 778 775 NR_KVM_CPU_CAPS, 779 776 780 777 NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, ··· 816 811 bool at_instruction_boundary; 817 812 bool tpr_access_reporting; 818 813 bool xfd_no_write_intercept; 819 - u64 ia32_xss; 820 814 u64 microcode_version; 821 815 u64 arch_capabilities; 822 816 u64 perf_capabilities; ··· 876 872 877 873 u64 xcr0; 878 874 u64 guest_supported_xcr0; 875 + u64 ia32_xss; 876 + u64 guest_supported_xss; 879 877 880 878 struct kvm_pio_request pio; 881 879 void *pio_data; ··· 932 926 bool emulate_regs_need_sync_from_vcpu; 933 927 int (*complete_userspace_io)(struct kvm_vcpu *vcpu); 934 928 unsigned long cui_linear_rip; 929 + int cui_rdmsr_imm_reg; 935 930 936 931 gpa_t time; 937 932 s8 pvclock_tsc_shift; ··· 1355 1348 __APICV_INHIBIT_REASON(LOGICAL_ID_ALIASED), \ 1356 1349 __APICV_INHIBIT_REASON(PHYSICAL_ID_TOO_BIG) 1357 1350 1358 - struct kvm_arch { 1359 - unsigned long n_used_mmu_pages; 1360 - unsigned long n_requested_mmu_pages; 1361 - unsigned long n_max_mmu_pages; 1362 - unsigned int indirect_shadow_pages; 1363 - u8 mmu_valid_gen; 1364 - u8 vm_type; 1365 - bool has_private_mem; 1366 - bool has_protected_state; 1367 - bool pre_fault_allowed; 1368 - struct hlist_head *mmu_page_hash; 1369 - struct list_head active_mmu_pages; 1351 + struct kvm_possible_nx_huge_pages { 1370 1352 /* 1371 1353 * A list of kvm_mmu_page structs that, if zapped, could possibly be 1372 1354 * replaced by an NX huge page. A shadow page is on this list if its ··· 1367 1371 * guest attempts to execute from the region then KVM obviously can't 1368 1372 * create an NX huge page (without hanging the guest). 
1369 1373 */ 1370 - struct list_head possible_nx_huge_pages; 1374 + struct list_head pages; 1375 + u64 nr_pages; 1376 + }; 1377 + 1378 + enum kvm_mmu_type { 1379 + KVM_SHADOW_MMU, 1380 + #ifdef CONFIG_X86_64 1381 + KVM_TDP_MMU, 1382 + #endif 1383 + KVM_NR_MMU_TYPES, 1384 + }; 1385 + 1386 + struct kvm_arch { 1387 + unsigned long n_used_mmu_pages; 1388 + unsigned long n_requested_mmu_pages; 1389 + unsigned long n_max_mmu_pages; 1390 + unsigned int indirect_shadow_pages; 1391 + u8 mmu_valid_gen; 1392 + u8 vm_type; 1393 + bool has_private_mem; 1394 + bool has_protected_state; 1395 + bool has_protected_eoi; 1396 + bool pre_fault_allowed; 1397 + struct hlist_head *mmu_page_hash; 1398 + struct list_head active_mmu_pages; 1399 + struct kvm_possible_nx_huge_pages possible_nx_huge_pages[KVM_NR_MMU_TYPES]; 1371 1400 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING 1372 1401 struct kvm_page_track_notifier_head track_notifier_head; 1373 1402 #endif ··· 1547 1526 * is held in read mode: 1548 1527 * - tdp_mmu_roots (above) 1549 1528 * - the link field of kvm_mmu_page structs used by the TDP MMU 1550 - * - possible_nx_huge_pages; 1529 + * - possible_nx_huge_pages[KVM_TDP_MMU]; 1551 1530 * - the possible_nx_huge_page_link field of kvm_mmu_page structs used 1552 1531 * by the TDP MMU 1553 1532 * Because the lock is only taken within the MMU lock, strictly ··· 1929 1908 int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu); 1930 1909 1931 1910 void (*migrate_timers)(struct kvm_vcpu *vcpu); 1932 - void (*recalc_msr_intercepts)(struct kvm_vcpu *vcpu); 1911 + void (*recalc_intercepts)(struct kvm_vcpu *vcpu); 1933 1912 int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); 1934 1913 1935 1914 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); ··· 2170 2149 2171 2150 void kvm_enable_efer_bits(u64); 2172 2151 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer); 2173 - int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data); 2174 - int 
kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data); 2175 - int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated); 2176 - int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data); 2177 - int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data); 2152 + int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data); 2153 + int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data); 2154 + int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data); 2155 + int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data); 2156 + int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data); 2157 + int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data); 2178 2158 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu); 2159 + int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg); 2179 2160 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu); 2161 + int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg); 2180 2162 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu); 2181 2163 int kvm_emulate_invd(struct kvm_vcpu *vcpu); 2182 2164 int kvm_emulate_mwait(struct kvm_vcpu *vcpu); ··· 2211 2187 unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr); 2212 2188 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu); 2213 2189 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw); 2190 + int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr); 2214 2191 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu); 2215 2192 2216 2193 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr); ··· 2379 2354 int kvm_find_user_return_msr(u32 msr); 2380 2355 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask); 2381 2356 void kvm_user_return_msr_update_cache(unsigned int index, u64 val); 2357 + u64 kvm_get_user_return_msr(unsigned int slot); 2382 2358 2383 2359 static inline bool kvm_is_supported_user_return_msr(u32 msr) 2384 2360 { ··· 2415 2389 u32 size); 2416 2390 bool 
kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu); 2417 2391 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu); 2418 - 2419 - bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq, 2420 - struct kvm_vcpu **dest_vcpu); 2421 2392 2422 2393 static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq) 2423 2394 {

+10

arch/x86/include/asm/kvm_types.h

···
 #ifndef _ASM_X86_KVM_TYPES_H
 #define _ASM_X86_KVM_TYPES_H
 
+#if IS_MODULE(CONFIG_KVM_AMD) && IS_MODULE(CONFIG_KVM_INTEL)
+#define KVM_SUB_MODULES kvm-amd,kvm-intel
+#elif IS_MODULE(CONFIG_KVM_AMD)
+#define KVM_SUB_MODULES kvm-amd
+#elif IS_MODULE(CONFIG_KVM_INTEL)
+#define KVM_SUB_MODULES kvm-intel
+#else
+#undef KVM_SUB_MODULES
+#endif
+
 #define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
 
 #endif /* _ASM_X86_KVM_TYPES_H */

+4

arch/x86/include/asm/msr-index.h

···
 #define PERF_CAP_PT_IDX			16
 
 #define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
+
+#define PERF_CAP_LBR_FMT		0x3f
 #define PERF_CAP_PEBS_TRAP		BIT_ULL(6)
 #define PERF_CAP_ARCH_REG		BIT_ULL(7)
 #define PERF_CAP_PEBS_FORMAT		0xf00
+#define PERF_CAP_FW_WRITES		BIT_ULL(13)
 #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
 #define PERF_CAP_PEBS_TIMING_INFO	BIT_ULL(17)
 #define PERF_CAP_PEBS_MASK		(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
···
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET	0xc0000303
 
 /* AMD Hardware Feedback Support MSRs */
 #define MSR_AMD_WORKLOAD_CLASS_CONFIG	0xc0000500

+1

arch/x86/include/asm/svm.h

···
 #define SVM_SEV_FEAT_RESTRICTED_INJECTION	BIT(3)
 #define SVM_SEV_FEAT_ALTERNATE_INJECTION	BIT(4)
 #define SVM_SEV_FEAT_DEBUG_SWAP			BIT(5)
+#define SVM_SEV_FEAT_SECURE_TSC			BIT(9)
 
 #define VMCB_ALLOWED_SEV_FEATURES_VALID		BIT_ULL(63)
 

+9

arch/x86/include/asm/vmx.h

···
 #define VM_EXIT_CLEAR_BNDCFGS			0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE			0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
···
 #define VM_ENTRY_LOAD_BNDCFGS			0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE			0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
···
 #define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
 #define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC		BIT_ULL(56)
 
 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
 {
···
 	GUEST_PENDING_DBG_EXCEPTIONS	= 0x00006822,
 	GUEST_SYSENTER_ESP		= 0x00006824,
 	GUEST_SYSENTER_EIP		= 0x00006826,
+	GUEST_S_CET			= 0x00006828,
+	GUEST_SSP			= 0x0000682a,
+	GUEST_INTR_SSP_TABLE		= 0x0000682c,
 	HOST_CR0			= 0x00006c00,
 	HOST_CR3			= 0x00006c02,
 	HOST_CR4			= 0x00006c04,
···
 	HOST_IA32_SYSENTER_EIP		= 0x00006c12,
 	HOST_RSP			= 0x00006c14,
 	HOST_RIP			= 0x00006c16,
+	HOST_S_CET			= 0x00006c18,
+	HOST_SSP			= 0x00006c1a,
+	HOST_INTR_SSP_TABLE		= 0x00006c1c
 };
 
 /*

+34

arch/x86/include/uapi/asm/kvm.h

···
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
+
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
···
 	struct kvm_xcr xcrs[KVM_MAX_XCRS];
 	__u64 padding[16];
 };
+
+#define KVM_X86_REG_TYPE_MSR		2
+#define KVM_X86_REG_TYPE_KVM		3
+
+#define KVM_X86_KVM_REG_SIZE(reg)						\
+({										\
+	reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0;			\
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg)					\
+({										\
+	__u64 type_size = (__u64)type << 32;					\
+										\
+	type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 :		\
+		     type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) :	\
+		     0;								\
+	type_size;								\
+})
+
+#define KVM_X86_REG_ID(type, index)				\
+	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index)					\
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index)					\
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP	0
 
 #define KVM_SYNC_X86_REGS (1UL << 0)
 #define KVM_SYNC_X86_SREGS (1UL << 1)
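The register ID macros in this hunk pack an architecture tag, a size field, a type, and a 32-bit index into one 64-bit value. The sketch below re-derives that layout as a plain function so the resulting bit pattern is easy to check against the `0x2030 0002 <msr number:32>` and `0x2030 0003 0000 0000` encodings documented in api.rst. The `SKETCH_`-prefixed constants and `sketch_x86_reg_id()` are illustrative stand-ins, with values copied from the uapi headers so the block compiles without them. Only the two 64-bit cases (MSRs and the KVM-defined SSP register) are handled; the real macro yields a size of 0 for unknown KVM-defined registers.

```c
#include <stdint.h>

/* Stand-ins for KVM_REG_X86 and KVM_REG_SIZE_U64 from <linux/kvm.h>. */
#define SKETCH_KVM_REG_X86	0x2000000000000000ULL
#define SKETCH_KVM_REG_SIZE_U64	0x0030000000000000ULL

/* Stand-ins for KVM_X86_REG_TYPE_MSR and KVM_X86_REG_TYPE_KVM. */
#define SKETCH_REG_TYPE_MSR	2ULL
#define SKETCH_REG_TYPE_KVM	3ULL

/*
 * Layout: bits 63:56 arch, bits 55:52 size, bits 47:32 type,
 * bits 31:0 index (an MSR number or a KVM-defined register number).
 */
static uint64_t sketch_x86_reg_id(uint64_t type, uint32_t index)
{
	return SKETCH_KVM_REG_X86 | SKETCH_KVM_REG_SIZE_U64 |
	       (type << 32) | index;
}
```

For example, `sketch_x86_reg_id(SKETCH_REG_TYPE_KVM, 0)` reproduces the documented SSP encoding, and passing `SKETCH_REG_TYPE_MSR` with an MSR number places that number in the low 32 bits.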

+5 -1

arch/x86/include/uapi/asm/vmx.h

···
 #define EXIT_REASON_BUS_LOCK            74
 #define EXIT_REASON_NOTIFY              75
 #define EXIT_REASON_TDCALL              77
+#define EXIT_REASON_MSR_READ_IMM        84
+#define EXIT_REASON_MSR_WRITE_IMM       85
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
···
 	{ EXIT_REASON_TPAUSE,                "TPAUSE" }, \
 	{ EXIT_REASON_BUS_LOCK,              "BUS_LOCK" }, \
 	{ EXIT_REASON_NOTIFY,                "NOTIFY" }, \
-	{ EXIT_REASON_TDCALL,                "TDCALL" }
+	{ EXIT_REASON_TDCALL,                "TDCALL" }, \
+	{ EXIT_REASON_MSR_READ_IMM,          "MSR_READ_IMM" }, \
+	{ EXIT_REASON_MSR_WRITE_IMM,         "MSR_WRITE_IMM" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }

+1

arch/x86/kernel/cpu/scattered.c

···
 	{ X86_FEATURE_APERFMPERF,	CPUID_ECX,  0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
 	{ X86_FEATURE_INTEL_PPIN,	CPUID_EBX,  0, 0x00000007, 1 },
+	{ X86_FEATURE_MSR_IMM,		CPUID_ECX,  5, 0x00000007, 1 },
 	{ X86_FEATURE_APX,		CPUID_EDX, 21, 0x00000007, 1 },
 	{ X86_FEATURE_RRSBA_CTRL,	CPUID_EDX,  2, 0x00000007, 2 },
 	{ X86_FEATURE_BHI_CTRL,		CPUID_EDX,  4, 0x00000007, 2 },

+49 -9

arch/x86/kvm/cpuid.c

··· 34 34 * aligned to sizeof(unsigned long) because it's not accessed via bitops. 35 35 */ 36 36 u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; 37 - EXPORT_SYMBOL_GPL(kvm_cpu_caps); 37 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_caps); 38 38 39 39 struct cpuid_xstate_sizes { 40 40 u32 eax; ··· 131 131 132 132 return NULL; 133 133 } 134 - EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2); 134 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_cpuid_entry2); 135 135 136 136 static int kvm_check_cpuid(struct kvm_vcpu *vcpu) 137 137 { ··· 263 263 return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0; 264 264 } 265 265 266 + static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu) 267 + { 268 + struct kvm_cpuid_entry2 *best; 269 + 270 + best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1); 271 + if (!best) 272 + return 0; 273 + 274 + return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss; 275 + } 276 + 266 277 static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu, 267 278 struct kvm_cpuid_entry2 *entry, 268 279 unsigned int x86_feature, ··· 316 305 best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1); 317 306 if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || 318 307 cpuid_entry_has(best, X86_FEATURE_XSAVEC))) 319 - best->ebx = xstate_required_size(vcpu->arch.xcr0, true); 308 + best->ebx = xstate_required_size(vcpu->arch.xcr0 | 309 + vcpu->arch.ia32_xss, true); 320 310 } 321 311 322 312 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu) ··· 436 424 } 437 425 438 426 vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu); 427 + vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu); 439 428 440 429 vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu); 441 430 ··· 461 448 * adjustments to the reserved GPA bits. 
462 449 */ 463 450 kvm_mmu_after_set_cpuid(vcpu); 451 + 452 + kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu); 464 453 } 465 454 466 455 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu) ··· 946 931 VENDOR_F(WAITPKG), 947 932 F(SGX_LC), 948 933 F(BUS_LOCK_DETECT), 934 + X86_64_F(SHSTK), 949 935 ); 950 936 951 937 /* ··· 955 939 */ 956 940 if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) 957 941 kvm_cpu_cap_clear(X86_FEATURE_PKU); 942 + 943 + /* 944 + * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack 945 + * accesses require "magic" Writable=0,Dirty=1 protection, which KVM 946 + * doesn't know how to emulate or map. 947 + */ 948 + if (!tdp_enabled) 949 + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); 958 950 959 951 kvm_cpu_cap_init(CPUID_7_EDX, 960 952 F(AVX512_4VNNIW), ··· 981 957 F(AMX_INT8), 982 958 F(AMX_BF16), 983 959 F(FLUSH_L1D), 960 + F(IBT), 984 961 ); 962 + 963 + /* 964 + * Disable support for IBT and SHSTK if KVM is configured to emulate 965 + * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or 966 + * SHSTK, nor does KVM handle Shadow Stack #PFs (see above). 
967 + */ 968 + if (allow_smaller_maxphyaddr) { 969 + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); 970 + kvm_cpu_cap_clear(X86_FEATURE_IBT); 971 + } 985 972 986 973 if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) && 987 974 boot_cpu_has(X86_FEATURE_AMD_IBPB) && ··· 1018 983 F(AMX_FP16), 1019 984 F(AVX_IFMA), 1020 985 F(LAM), 986 + ); 987 + 988 + kvm_cpu_cap_init(CPUID_7_1_ECX, 989 + SCATTERED_F(MSR_IMM), 1021 990 ); 1022 991 1023 992 kvm_cpu_cap_init(CPUID_7_1_EDX, ··· 1261 1222 kvm_cpu_cap_clear(X86_FEATURE_RDPID); 1262 1223 } 1263 1224 } 1264 - EXPORT_SYMBOL_GPL(kvm_set_cpu_caps); 1225 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cpu_caps); 1265 1226 1266 1227 #undef F 1267 1228 #undef SCATTERED_F ··· 1450 1411 goto out; 1451 1412 1452 1413 cpuid_entry_override(entry, CPUID_7_1_EAX); 1414 + cpuid_entry_override(entry, CPUID_7_1_ECX); 1453 1415 cpuid_entry_override(entry, CPUID_7_1_EDX); 1454 1416 entry->ebx = 0; 1455 - entry->ecx = 0; 1456 1417 } 1457 1418 if (max_idx >= 2) { 1458 1419 entry = do_host_cpuid(array, function, 2); ··· 1859 1820 int r; 1860 1821 1861 1822 if (func == CENTAUR_CPUID_SIGNATURE && 1862 - boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR) 1823 + boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR && 1824 + boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN) 1863 1825 return 0; 1864 1826 1865 1827 r = do_cpuid_func(array, func, type); ··· 2041 2001 if (function == 7 && index == 0) { 2042 2002 u64 data; 2043 2003 if ((*ebx & (feature_bit(RTM) | feature_bit(HLE))) && 2044 - !__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) && 2004 + !kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) && 2045 2005 (data & TSX_CTRL_CPUID_CLEAR)) 2046 2006 *ebx &= ~(feature_bit(RTM) | feature_bit(HLE)); 2047 2007 } else if (function == 0x80000007) { ··· 2085 2045 used_max_basic); 2086 2046 return exact; 2087 2047 } 2088 - EXPORT_SYMBOL_GPL(kvm_cpuid); 2048 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpuid); 2089 2049 2090 2050 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu) 2091 2051 { ··· 2103 2063 
kvm_rdx_write(vcpu, edx); 2104 2064 return kvm_skip_emulated_instruction(vcpu); 2105 2065 } 2106 - EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); 2066 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_cpuid);

+143 -20

arch/x86/kvm/emulate.c

··· 178 178 #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */ 179 179 #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */ 180 180 #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */ 181 + #define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */ 181 182 182 183 #define DstXacc (DstAccLo | SrcAccHi | SrcWrite) 183 184 ··· 1554 1553 return linear_write_system(ctxt, addr, desc, sizeof(*desc)); 1555 1554 } 1556 1555 1556 + static bool emulator_is_ssp_invalid(struct x86_emulate_ctxt *ctxt, u8 cpl) 1557 + { 1558 + const u32 MSR_IA32_X_CET = cpl == 3 ? MSR_IA32_U_CET : MSR_IA32_S_CET; 1559 + u64 efer = 0, cet = 0, ssp = 0; 1560 + 1561 + if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET)) 1562 + return false; 1563 + 1564 + if (ctxt->ops->get_msr(ctxt, MSR_EFER, &efer)) 1565 + return true; 1566 + 1567 + /* SSP is guaranteed to be valid if the vCPU was already in 32-bit mode. */ 1568 + if (!(efer & EFER_LMA)) 1569 + return false; 1570 + 1571 + if (ctxt->ops->get_msr(ctxt, MSR_IA32_X_CET, &cet)) 1572 + return true; 1573 + 1574 + if (!(cet & CET_SHSTK_EN)) 1575 + return false; 1576 + 1577 + if (ctxt->ops->get_msr(ctxt, MSR_KVM_INTERNAL_GUEST_SSP, &ssp)) 1578 + return true; 1579 + 1580 + /* 1581 + * On transfer from 64-bit mode to compatibility mode, SSP[63:32] must 1582 + * be 0, i.e. SSP must be a 32-bit value outside of 64-bit mode. 
1583 + */ 1584 + return ssp >> 32; 1585 + } 1586 + 1557 1587 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt, 1558 1588 u16 selector, int seg, u8 cpl, 1559 1589 enum x86_transfer_type transfer, ··· 1724 1692 ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); 1725 1693 if (efer & EFER_LMA) 1726 1694 goto exception; 1695 + } 1696 + if (!seg_desc.l && emulator_is_ssp_invalid(ctxt, cpl)) { 1697 + err_code = 0; 1698 + goto exception; 1727 1699 } 1728 1700 1729 1701 /* CS(RPL) <- CPL */ ··· 4104 4068 static const struct opcode group5[] = { 4105 4069 F(DstMem | SrcNone | Lock, em_inc), 4106 4070 F(DstMem | SrcNone | Lock, em_dec), 4107 - I(SrcMem | NearBranch | IsBranch, em_call_near_abs), 4108 - I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far), 4071 + I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs), 4072 + I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far), 4109 4073 I(SrcMem | NearBranch | IsBranch, em_jmp_abs), 4110 4074 I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far), 4111 4075 I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined), ··· 4340 4304 DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)), 4341 4305 /* 0x98 - 0x9F */ 4342 4306 D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd), 4343 - I(SrcImmFAddr | No64 | IsBranch, em_call_far), N, 4307 + I(SrcImmFAddr | No64 | IsBranch | ShadowStack, em_call_far), N, 4344 4308 II(ImplicitOps | Stack, em_pushf, pushf), 4345 4309 II(ImplicitOps | Stack, em_popf, popf), 4346 4310 I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf), ··· 4360 4324 X8(I(DstReg | SrcImm64 | Mov, em_mov)), 4361 4325 /* 0xC0 - 0xC7 */ 4362 4326 G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2), 4363 - I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm), 4364 - I(ImplicitOps | NearBranch | IsBranch, em_ret), 4327 + I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm), 4328 + I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret), 4365 4329 
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg), 4366 4330 I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg), 4367 4331 G(ByteOp, group11), G(0, group11), 4368 4332 /* 0xC8 - 0xCF */ 4369 - I(Stack | SrcImmU16 | Src2ImmByte | IsBranch, em_enter), 4370 - I(Stack | IsBranch, em_leave), 4371 - I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm), 4372 - I(ImplicitOps | IsBranch, em_ret_far), 4373 - D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn), 4333 + I(Stack | SrcImmU16 | Src2ImmByte, em_enter), 4334 + I(Stack, em_leave), 4335 + I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm), 4336 + I(ImplicitOps | IsBranch | ShadowStack, em_ret_far), 4337 + D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn), 4374 4338 D(ImplicitOps | No64 | IsBranch), 4375 - II(ImplicitOps | IsBranch, em_iret, iret), 4339 + II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret), 4376 4340 /* 0xD0 - 0xD7 */ 4377 4341 G(Src2One | ByteOp, group2), G(Src2One, group2), 4378 4342 G(Src2CL | ByteOp, group2), G(Src2CL, group2), ··· 4388 4352 I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in), 4389 4353 I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out), 4390 4354 /* 0xE8 - 0xEF */ 4391 - I(SrcImm | NearBranch | IsBranch, em_call), 4355 + I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call), 4392 4356 D(SrcImm | ImplicitOps | NearBranch | IsBranch), 4393 4357 I(SrcImmFAddr | No64 | IsBranch, em_jmp_far), 4394 4358 D(SrcImmByte | ImplicitOps | NearBranch | IsBranch), ··· 4407 4371 static const struct opcode twobyte_table[256] = { 4408 4372 /* 0x00 - 0x0F */ 4409 4373 G(0, group6), GD(0, &group7), N, N, 4410 - N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall), 4374 + N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_syscall), 4411 4375 II(ImplicitOps | Priv, em_clts, clts), N, 4412 4376 DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N, 4413 4377 N, D(ImplicitOps | ModRM | SrcMem | 
NoAccess), N, N, ··· 4438 4402 IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc), 4439 4403 II(ImplicitOps | Priv, em_rdmsr, rdmsr), 4440 4404 IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc), 4441 - I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter), 4442 - I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit), 4405 + I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_sysenter), 4406 + I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit), 4443 4407 N, N, 4444 4408 N, N, N, N, N, N, N, N, 4445 4409 /* 0x40 - 0x4F */ ··· 4549 4513 #undef I2bv 4550 4514 #undef I2bvIP 4551 4515 #undef I6ALU 4516 + 4517 + static bool is_shstk_instruction(struct x86_emulate_ctxt *ctxt) 4518 + { 4519 + return ctxt->d & ShadowStack; 4520 + } 4521 + 4522 + static bool is_ibt_instruction(struct x86_emulate_ctxt *ctxt) 4523 + { 4524 + u64 flags = ctxt->d; 4525 + 4526 + if (!(flags & IsBranch)) 4527 + return false; 4528 + 4529 + /* 4530 + * All far JMPs and CALLs (including SYSCALL, SYSENTER, and INTn) are 4531 + * indirect and thus affect IBT state. All far RETs (including SYSEXIT 4532 + * and IRET) are protected via Shadow Stacks and thus don't affect IBT 4533 + * state. IRET #GPs when returning to virtual-8086 and IBT or SHSTK is 4534 + * enabled, but that should be handled by IRET emulation (in the very 4535 + * unlikely scenario that KVM adds support for fully emulating IRET). 4536 + */ 4537 + if (!(flags & NearBranch)) 4538 + return ctxt->execute != em_iret && 4539 + ctxt->execute != em_ret_far && 4540 + ctxt->execute != em_ret_far_imm && 4541 + ctxt->execute != em_sysexit; 4542 + 4543 + switch (flags & SrcMask) { 4544 + case SrcReg: 4545 + case SrcMem: 4546 + case SrcMem16: 4547 + case SrcMem32: 4548 + return true; 4549 + case SrcMemFAddr: 4550 + case SrcImmFAddr: 4551 + /* Far branches should be handled above. 
*/ 4552 + WARN_ON_ONCE(1); 4553 + return true; 4554 + case SrcNone: 4555 + case SrcImm: 4556 + case SrcImmByte: 4557 + /* 4558 + * Note, ImmU16 is used only for the stack adjustment operand on ENTER 4559 + * and RET instructions. ENTER isn't a branch and RET FAR is handled 4560 + * by the NearBranch check above. RET itself isn't an indirect branch. 4561 + */ 4562 + case SrcImmU16: 4563 + return false; 4564 + default: 4565 + WARN_ONCE(1, "Unexpected Src operand '%llx' on branch", 4566 + flags & SrcMask); 4567 + return false; 4568 + } 4569 + } 4552 4570 4553 4571 static unsigned imm_size(struct x86_emulate_ctxt *ctxt) 4554 4572 { ··· 5033 4943 5034 4944 ctxt->execute = opcode.u.execute; 5035 4945 4946 + /* 4947 + * Reject emulation if KVM might need to emulate shadow stack updates 4948 + * and/or indirect branch tracking enforcement, which the emulator 4949 + * doesn't support. 4950 + */ 4951 + if ((is_ibt_instruction(ctxt) || is_shstk_instruction(ctxt)) && 4952 + ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) { 4953 + u64 u_cet = 0, s_cet = 0; 4954 + 4955 + /* 4956 + * Check both User and Supervisor on far transfers as inter- 4957 + * privilege level transfers are impacted by CET at the target 4958 + * privilege level, and that is not known at this time. The 4959 + * expectation is that the guest will not require emulation of 4960 + * any CET-affected instructions at any privilege level. 
4961 + */ 4962 + if (!(ctxt->d & NearBranch)) 4963 + u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN; 4964 + else if (ctxt->ops->cpl(ctxt) == 3) 4965 + u_cet = CET_SHSTK_EN | CET_ENDBR_EN; 4966 + else 4967 + s_cet = CET_SHSTK_EN | CET_ENDBR_EN; 4968 + 4969 + if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) || 4970 + (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet))) 4971 + return EMULATION_FAILED; 4972 + 4973 + if ((u_cet | s_cet) & CET_SHSTK_EN && is_shstk_instruction(ctxt)) 4974 + return EMULATION_FAILED; 4975 + 4976 + if ((u_cet | s_cet) & CET_ENDBR_EN && is_ibt_instruction(ctxt)) 4977 + return EMULATION_FAILED; 4978 + } 4979 + 5036 4980 if (unlikely(emulation_type & EMULTYPE_TRAP_UD) && 5037 4981 likely(!(ctxt->d & EmulateOnUD))) 5038 4982 return EMULATION_FAILED; ··· 5231 5107 ctxt->mem_read.end = 0; 5232 5108 } 5233 5109 5234 - int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) 5110 + int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, bool check_intercepts) 5235 5111 { 5236 5112 const struct x86_emulate_ops *ops = ctxt->ops; 5237 5113 int rc = X86EMUL_CONTINUE; 5238 5114 int saved_dst_type = ctxt->dst.type; 5239 - bool is_guest_mode = ctxt->ops->is_guest_mode(ctxt); 5240 5115 5241 5116 ctxt->mem_read.pos = 0; 5242 5117 ··· 5283 5160 fetch_possible_mmx_operand(&ctxt->dst); 5284 5161 } 5285 5162 5286 - if (unlikely(is_guest_mode) && ctxt->intercept) { 5163 + if (unlikely(check_intercepts) && ctxt->intercept) { 5287 5164 rc = emulator_check_intercept(ctxt, ctxt->intercept, 5288 5165 X86_ICPT_PRE_EXCEPT); 5289 5166 if (rc != X86EMUL_CONTINUE) ··· 5312 5189 goto done; 5313 5190 } 5314 5191 5315 - if (unlikely(is_guest_mode) && (ctxt->d & Intercept)) { 5192 + if (unlikely(check_intercepts) && (ctxt->d & Intercept)) { 5316 5193 rc = emulator_check_intercept(ctxt, ctxt->intercept, 5317 5194 X86_ICPT_POST_EXCEPT); 5318 5195 if (rc != X86EMUL_CONTINUE) ··· 5366 5243 5367 5244 special_insn: 5368 5245 5369 - if (unlikely(is_guest_mode) && (ctxt->d 
& Intercept)) { 5246 + if (unlikely(check_intercepts) && (ctxt->d & Intercept)) { 5370 5247 rc = emulator_check_intercept(ctxt, ctxt->intercept, 5371 5248 X86_ICPT_POST_MEMACCESS); 5372 5249 if (rc != X86EMUL_CONTINUE)
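The CET check added to the decode path above can be modeled in user space. The following is only a sketch under stated assumptions: the `SK_*` flag bits, `struct cet_state`, and `must_reject_emulation()` are hypothetical names invented for illustration, `SK_INDIRECT` stands in for the result of `is_ibt_instruction()`, and the MSR-read failure path is omitted. Only the `CET_SHSTK_EN` (bit 0) and `CET_ENDBR_EN` (bit 2) positions follow the architectural IA32_U_CET/IA32_S_CET layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the emulator's real encodings differ. */
#define SK_NEAR_BRANCH  (1u << 0)
#define SK_SHADOW_STACK (1u << 1)  /* instruction would touch the shadow stack */
#define SK_INDIRECT     (1u << 2)  /* stand-in for is_ibt_instruction() == true */

/* Architectural enable bits in IA32_U_CET / IA32_S_CET. */
#define CET_SHSTK_EN (1ull << 0)
#define CET_ENDBR_EN (1ull << 2)

/* Hypothetical snapshot of the guest state the check consults. */
struct cet_state {
	bool cr4_cet;    /* CR4.CET */
	int cpl;         /* current privilege level */
	uint64_t u_cet;  /* IA32_U_CET */
	uint64_t s_cet;  /* IA32_S_CET */
};

/* Returns true if emulation should be refused for this instruction. */
static bool must_reject_emulation(unsigned int flags, const struct cet_state *st)
{
	bool shstk = flags & SK_SHADOW_STACK;
	bool ibt = flags & SK_INDIRECT;
	uint64_t cet;

	if (!(shstk || ibt) || !st->cr4_cet)
		return false;

	/*
	 * Far transfers may change privilege level, so consult both MSRs;
	 * near transfers only care about the current privilege level.
	 */
	if (!(flags & SK_NEAR_BRANCH))
		cet = st->u_cet | st->s_cet;
	else if (st->cpl == 3)
		cet = st->u_cet;
	else
		cet = st->s_cet;

	if ((cet & CET_SHSTK_EN) && shstk)
		return true;

	return (cet & CET_ENDBR_EN) && ibt;
}
```

In this model a near CALL at CPL 3 is rejected only when user-mode shadow stacks are enabled, while a far transfer is rejected if either the user or the supervisor MSR enables the relevant feature.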

+7 -9

arch/x86/kvm/hyperv.c

··· 923 923 return false; 924 924 return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED; 925 925 } 926 - EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled); 926 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_hv_assist_page_enabled); 927 927 928 928 int kvm_hv_get_assist_page(struct kvm_vcpu *vcpu) 929 929 { ··· 935 935 return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, 936 936 &hv_vcpu->vp_assist_page, sizeof(struct hv_vp_assist_page)); 937 937 } 938 - EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page); 938 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_hv_get_assist_page); 939 939 940 940 static void stimer_prepare_msg(struct kvm_vcpu_hv_stimer *stimer) 941 941 { ··· 1168 1168 BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence)); 1169 1169 BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0); 1170 1170 1171 - mutex_lock(&hv->hv_lock); 1171 + guard(mutex)(&hv->hv_lock); 1172 1172 1173 1173 if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN || 1174 1174 hv->hv_tsc_page_status == HV_TSC_PAGE_SET || 1175 1175 hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET) 1176 - goto out_unlock; 1176 + return; 1177 1177 1178 1178 if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)) 1179 - goto out_unlock; 1179 + return; 1180 1180 1181 1181 gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT; 1182 1182 /* ··· 1192 1192 goto out_err; 1193 1193 1194 1194 hv->hv_tsc_page_status = HV_TSC_PAGE_SET; 1195 - goto out_unlock; 1195 + return; 1196 1196 } 1197 1197 1198 1198 /* ··· 1228 1228 goto out_err; 1229 1229 1230 1230 hv->hv_tsc_page_status = HV_TSC_PAGE_SET; 1231 - goto out_unlock; 1231 + return; 1232 1232 1233 1233 out_err: 1234 1234 hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN; 1235 - out_unlock: 1236 - mutex_unlock(&hv->hv_lock); 1237 1235 } 1238 1236 1239 1237 void kvm_hv_request_tsc_page_update(struct kvm *kvm)
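The `guard(mutex)` conversion above replaces the `goto out_unlock` pattern with scope-based unlocking. A rough user-space analogue, assuming a compiler that supports the GNU `cleanup` attribute (GCC or Clang), is sketched below. `struct toy_lock`, `TOY_GUARD()`, and `toy_setup_page()` are hypothetical stand-ins and not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical toy lock; the kernel code uses struct mutex and guard(mutex)(). */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/* Cleanup handler: called with a pointer to the guard variable at scope exit. */
static void toy_guard_exit(struct toy_lock **g)
{
	if (*g)
		toy_lock_release(*g);
}

/* Take the lock now, release it when the enclosing scope is left. */
#define TOY_GUARD(lockp)						\
	struct toy_lock *toy_guard_var					\
		__attribute__((cleanup(toy_guard_exit), unused)) =	\
		(toy_lock_acquire(lockp), (lockp))

enum page_status { PAGE_UNSET, PAGE_SET, PAGE_BROKEN };

struct toy_hv {
	struct toy_lock lock;
	enum page_status status;
	int enabled;
};

static void toy_setup_page(struct toy_hv *hv)
{
	TOY_GUARD(&hv->lock);

	/* Every return below drops the lock without an explicit unlock label. */
	if (hv->status == PAGE_SET || hv->status == PAGE_BROKEN)
		return;

	if (!hv->enabled)
		return;

	hv->status = PAGE_SET;
}
```

The early returns mirror the shape of the converted function: error and success paths share one implicit unlock, so the `out_unlock` label becomes unnecessary.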

+1 -14

arch/x86/kvm/ioapic.c

··· 1 + // SPDX-License-Identifier: LGPL-2.1-or-later 1 2 /* 2 3 * Copyright (C) 2001 MandrakeSoft S.A. 3 4 * Copyright 2010 Red Hat, Inc. and/or its affiliates. ··· 8 7 * 75002 Paris - France 9 8 * http://www.linux-mandrake.com/ 10 9 * http://www.mandrakesoft.com/ 11 - * 12 - * This library is free software; you can redistribute it and/or 13 - * modify it under the terms of the GNU Lesser General Public 14 - * License as published by the Free Software Foundation; either 15 - * version 2 of the License, or (at your option) any later version. 16 - * 17 - * This library is distributed in the hope that it will be useful, 18 - * but WITHOUT ANY WARRANTY; without even the implied warranty of 19 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 20 - * Lesser General Public License for more details. 21 - * 22 - * You should have received a copy of the GNU Lesser General Public 23 - * License along with this library; if not, write to the Free Software 24 - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 25 10 * 26 11 * Yunhong Jiang <yunhong.jiang@intel.com> 27 12 * Yaozu (Eddie) Dong <eddie.dong@intel.com>

+3 -88

arch/x86/kvm/irq.c

··· 103 103 104 104 return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ 105 105 } 106 - EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr); 106 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_has_injectable_intr); 107 107 108 108 /* 109 109 * check if there is pending interrupt without ··· 119 119 120 120 return kvm_apic_has_interrupt(v) != -1; /* LAPIC */ 121 121 } 122 - EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); 122 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_has_interrupt); 123 123 124 124 /* 125 125 * Read pending interrupt(from non-APIC source) ··· 148 148 WARN_ON_ONCE(!irqchip_split(v->kvm)); 149 149 return get_userspace_extint(v); 150 150 } 151 - EXPORT_SYMBOL_GPL(kvm_cpu_get_extint); 151 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_get_extint); 152 152 153 153 /* 154 154 * Read pending interrupt vector and intack. ··· 193 193 bool kvm_arch_irqchip_in_kernel(struct kvm *kvm) 194 194 { 195 195 return irqchip_in_kernel(kvm); 196 - } 197 - 198 - int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, 199 - struct kvm_lapic_irq *irq, struct dest_map *dest_map) 200 - { 201 - int r = -1; 202 - struct kvm_vcpu *vcpu, *lowest = NULL; 203 - unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)]; 204 - unsigned int dest_vcpus = 0; 205 - 206 - if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map)) 207 - return r; 208 - 209 - if (irq->dest_mode == APIC_DEST_PHYSICAL && 210 - irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) { 211 - pr_info("apic: phys broadcast and lowest prio\n"); 212 - irq->delivery_mode = APIC_DM_FIXED; 213 - } 214 - 215 - memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap)); 216 - 217 - kvm_for_each_vcpu(i, vcpu, kvm) { 218 - if (!kvm_apic_present(vcpu)) 219 - continue; 220 - 221 - if (!kvm_apic_match_dest(vcpu, src, irq->shorthand, 222 - irq->dest_id, irq->dest_mode)) 223 - continue; 224 - 225 - if (!kvm_lowest_prio_delivery(irq)) { 226 - if (r < 0) 227 - r = 0; 228 - r += kvm_apic_set_irq(vcpu, irq, dest_map); 229 - } else if 
(kvm_apic_sw_enabled(vcpu->arch.apic)) { 230 - if (!kvm_vector_hashing_enabled()) { 231 - if (!lowest) 232 - lowest = vcpu; 233 - else if (kvm_apic_compare_prio(vcpu, lowest) < 0) 234 - lowest = vcpu; 235 - } else { 236 - __set_bit(i, dest_vcpu_bitmap); 237 - dest_vcpus++; 238 - } 239 - } 240 - } 241 - 242 - if (dest_vcpus != 0) { 243 - int idx = kvm_vector_to_index(irq->vector, dest_vcpus, 244 - dest_vcpu_bitmap, KVM_MAX_VCPUS); 245 - 246 - lowest = kvm_get_vcpu(kvm, idx); 247 - } 248 - 249 - if (lowest) 250 - r = kvm_apic_set_irq(lowest, irq, dest_map); 251 - 252 - return r; 253 196 } 254 197 255 198 static void kvm_msi_to_lapic_irq(struct kvm *kvm, ··· 353 410 354 411 return 0; 355 412 } 356 - 357 - bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq, 358 - struct kvm_vcpu **dest_vcpu) 359 - { 360 - int r = 0; 361 - unsigned long i; 362 - struct kvm_vcpu *vcpu; 363 - 364 - if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu)) 365 - return true; 366 - 367 - kvm_for_each_vcpu(i, vcpu, kvm) { 368 - if (!kvm_apic_present(vcpu)) 369 - continue; 370 - 371 - if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand, 372 - irq->dest_id, irq->dest_mode)) 373 - continue; 374 - 375 - if (++r == 2) 376 - return false; 377 - 378 - *dest_vcpu = vcpu; 379 - } 380 - 381 - return r == 1; 382 - } 383 - EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu); 384 413 385 414 void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode, 386 415 u8 vector, unsigned long *ioapic_handled_vectors)

-4

arch/x86/kvm/irq.h

··· 121 121

 int apic_has_pending_timer(struct kvm_vcpu *vcpu);

-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
-			     struct kvm_lapic_irq *irq,
-			     struct dest_map *dest_map);
-
 #endif

+2 -1

arch/x86/kvm/kvm_cache_regs.h

+1 -2

arch/x86/kvm/kvm_emulate.h

··· 235 235
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);

 	bool (*is_smm)(struct x86_emulate_ctxt *ctxt);
-	bool (*is_guest_mode)(struct x86_emulate_ctxt *ctxt);
 	int (*leave_smm)(struct x86_emulate_ctxt *ctxt);
 	void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
 	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
··· 520 521
 #define EMULATION_RESTART 1
 #define EMULATION_INTERCEPTED 2
 void init_decode_cache(struct x86_emulate_ctxt *ctxt);
-int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
+int x86_emulate_insn(struct x86_emulate_ctxt *ctxt, bool check_intercepts);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 			 u16 tss_selector, int idt_index, int reason,
 			 bool has_error_code, u32 error_code);

+3 -3

arch/x86/kvm/kvm_onhyperv.c

··· 101 101

 	return __hv_flush_remote_tlbs_range(kvm, &range);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs_range);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_flush_remote_tlbs_range);

 int hv_flush_remote_tlbs(struct kvm *kvm)
 {
 	return __hv_flush_remote_tlbs_range(kvm, NULL);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_flush_remote_tlbs);

 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
··· 121 121
 		spin_unlock(&kvm_arch->hv_root_tdp_lock);
 	}
 }
-EXPORT_SYMBOL_GPL(hv_track_root_tdp);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hv_track_root_tdp);

+179 -65

arch/x86/kvm/lapic.c

··· 74 74 #define LAPIC_TIMER_ADVANCE_NS_MAX 5000 75 75 /* step-by-step approximation to mitigate fluctuation */ 76 76 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 77 + 78 + static bool __read_mostly vector_hashing_enabled = true; 79 + module_param_named(vector_hashing, vector_hashing_enabled, bool, 0444); 80 + 77 81 static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data); 78 82 static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data); 79 83 ··· 106 102 } 107 103 108 104 __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); 109 - EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); 105 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu); 110 106 111 107 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); 112 108 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ); ··· 134 130 (kvm_mwait_in_guest(vcpu->kvm) || kvm_hlt_in_guest(vcpu->kvm)); 135 131 } 136 132 137 - bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu) 133 + static bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu) 138 134 { 139 135 return kvm_x86_ops.set_hv_timer 140 136 && !(kvm_mwait_in_guest(vcpu->kvm) || ··· 646 642 return ((max_updated_irr != -1) && 647 643 (max_updated_irr == *max_irr)); 648 644 } 649 - EXPORT_SYMBOL_GPL(__kvm_apic_update_irr); 645 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_apic_update_irr); 650 646 651 647 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, unsigned long *pir, int *max_irr) 652 648 { ··· 657 653 apic->irr_pending = true; 658 654 return irr_updated; 659 655 } 660 - EXPORT_SYMBOL_GPL(kvm_apic_update_irr); 656 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_irr); 661 657 662 658 static inline int apic_search_irr(struct kvm_lapic *apic) 663 659 { ··· 697 693 { 698 694 apic_clear_irr(vec, vcpu->arch.apic); 699 695 } 700 - EXPORT_SYMBOL_GPL(kvm_apic_clear_irr); 696 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_clear_irr); 701 697 702 698 static void *apic_vector_to_isr(int vec, struct kvm_lapic *apic) 703 699 { ··· 779 775 
780 776 kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic)); 781 777 } 782 - EXPORT_SYMBOL_GPL(kvm_apic_update_hwapic_isr); 778 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_hwapic_isr); 783 779 784 780 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) 785 781 { ··· 790 786 */ 791 787 return apic_find_highest_irr(vcpu->arch.apic); 792 788 } 793 - EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr); 789 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_irr); 794 790 795 791 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, 796 792 int vector, int level, int trig_mode, ··· 954 950 { 955 951 apic_update_ppr(vcpu->arch.apic); 956 952 } 957 - EXPORT_SYMBOL_GPL(kvm_apic_update_ppr); 953 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_update_ppr); 958 954 959 955 static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr) 960 956 { ··· 1065 1061 return false; 1066 1062 } 1067 1063 } 1068 - EXPORT_SYMBOL_GPL(kvm_apic_match_dest); 1064 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_match_dest); 1069 1065 1070 - int kvm_vector_to_index(u32 vector, u32 dest_vcpus, 1071 - const unsigned long *bitmap, u32 bitmap_size) 1066 + static int kvm_vector_to_index(u32 vector, u32 dest_vcpus, 1067 + const unsigned long *bitmap, u32 bitmap_size) 1072 1068 { 1073 - u32 mod; 1074 - int i, idx = -1; 1069 + int idx = find_nth_bit(bitmap, bitmap_size, vector % dest_vcpus); 1075 1070 1076 - mod = vector % dest_vcpus; 1077 - 1078 - for (i = 0; i <= mod; i++) { 1079 - idx = find_next_bit(bitmap, bitmap_size, idx + 1); 1080 - BUG_ON(idx == bitmap_size); 1081 - } 1082 - 1071 + BUG_ON(idx >= bitmap_size); 1083 1072 return idx; 1084 1073 } 1085 1074 ··· 1101 1104 } 1102 1105 1103 1106 return false; 1107 + } 1108 + 1109 + static bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq) 1110 + { 1111 + return (irq->delivery_mode == APIC_DM_LOWEST || irq->msi_redir_hint); 1112 + } 1113 + 1114 + static int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) 
1115 + { 1116 + return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; 1104 1117 } 1105 1118 1106 1119 /* Return true if the interrupt can be handled by using *bitmap as index mask ··· 1156 1149 if (!kvm_lowest_prio_delivery(irq)) 1157 1150 return true; 1158 1151 1159 - if (!kvm_vector_hashing_enabled()) { 1152 + if (!vector_hashing_enabled) { 1160 1153 lowest = -1; 1161 1154 for_each_set_bit(i, bitmap, 16) { 1162 1155 if (!(*dst)[i]) ··· 1237 1230 * interrupt. 1238 1231 * - Otherwise, use remapped mode to inject the interrupt. 1239 1232 */ 1240 - bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq, 1241 - struct kvm_vcpu **dest_vcpu) 1233 + static bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, 1234 + struct kvm_lapic_irq *irq, 1235 + struct kvm_vcpu **dest_vcpu) 1242 1236 { 1243 1237 struct kvm_apic_map *map; 1244 1238 unsigned long bitmap; ··· 1264 1256 1265 1257 rcu_read_unlock(); 1266 1258 return ret; 1259 + } 1260 + 1261 + bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq, 1262 + struct kvm_vcpu **dest_vcpu) 1263 + { 1264 + int r = 0; 1265 + unsigned long i; 1266 + struct kvm_vcpu *vcpu; 1267 + 1268 + if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu)) 1269 + return true; 1270 + 1271 + kvm_for_each_vcpu(i, vcpu, kvm) { 1272 + if (!kvm_apic_present(vcpu)) 1273 + continue; 1274 + 1275 + if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand, 1276 + irq->dest_id, irq->dest_mode)) 1277 + continue; 1278 + 1279 + if (++r == 2) 1280 + return false; 1281 + 1282 + *dest_vcpu = vcpu; 1283 + } 1284 + 1285 + return r == 1; 1286 + } 1287 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_intr_is_single_vcpu); 1288 + 1289 + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, 1290 + struct kvm_lapic_irq *irq, struct dest_map *dest_map) 1291 + { 1292 + int r = -1; 1293 + struct kvm_vcpu *vcpu, *lowest = NULL; 1294 + unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)]; 1295 + unsigned int dest_vcpus = 0; 
1296 + 1297 + if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map)) 1298 + return r; 1299 + 1300 + if (irq->dest_mode == APIC_DEST_PHYSICAL && 1301 + irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) { 1302 + pr_info("apic: phys broadcast and lowest prio\n"); 1303 + irq->delivery_mode = APIC_DM_FIXED; 1304 + } 1305 + 1306 + memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap)); 1307 + 1308 + kvm_for_each_vcpu(i, vcpu, kvm) { 1309 + if (!kvm_apic_present(vcpu)) 1310 + continue; 1311 + 1312 + if (!kvm_apic_match_dest(vcpu, src, irq->shorthand, 1313 + irq->dest_id, irq->dest_mode)) 1314 + continue; 1315 + 1316 + if (!kvm_lowest_prio_delivery(irq)) { 1317 + if (r < 0) 1318 + r = 0; 1319 + r += kvm_apic_set_irq(vcpu, irq, dest_map); 1320 + } else if (kvm_apic_sw_enabled(vcpu->arch.apic)) { 1321 + if (!vector_hashing_enabled) { 1322 + if (!lowest) 1323 + lowest = vcpu; 1324 + else if (kvm_apic_compare_prio(vcpu, lowest) < 0) 1325 + lowest = vcpu; 1326 + } else { 1327 + __set_bit(i, dest_vcpu_bitmap); 1328 + dest_vcpus++; 1329 + } 1330 + } 1331 + } 1332 + 1333 + if (dest_vcpus != 0) { 1334 + int idx = kvm_vector_to_index(irq->vector, dest_vcpus, 1335 + dest_vcpu_bitmap, KVM_MAX_VCPUS); 1336 + 1337 + lowest = kvm_get_vcpu(kvm, idx); 1338 + } 1339 + 1340 + if (lowest) 1341 + r = kvm_apic_set_irq(lowest, irq, dest_map); 1342 + 1343 + return r; 1267 1344 } 1268 1345 1269 1346 /* ··· 1494 1401 rcu_read_unlock(); 1495 1402 } 1496 1403 1497 - int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2) 1498 - { 1499 - return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio; 1500 - } 1501 - 1502 1404 static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector) 1503 1405 { 1504 1406 return test_bit(vector, apic->vcpu->arch.ioapic_handled_vectors); ··· 1569 1481 kvm_ioapic_send_eoi(apic, vector); 1570 1482 kvm_make_request(KVM_REQ_EVENT, apic->vcpu); 1571 1483 } 1572 - EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated); 1484 + 
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_eoi_accelerated); 1485 + 1486 + static void kvm_icr_to_lapic_irq(struct kvm_lapic *apic, u32 icr_low, 1487 + u32 icr_high, struct kvm_lapic_irq *irq) 1488 + { 1489 + /* KVM has no delay and should always clear the BUSY/PENDING flag. */ 1490 + WARN_ON_ONCE(icr_low & APIC_ICR_BUSY); 1491 + 1492 + irq->vector = icr_low & APIC_VECTOR_MASK; 1493 + irq->delivery_mode = icr_low & APIC_MODE_MASK; 1494 + irq->dest_mode = icr_low & APIC_DEST_MASK; 1495 + irq->level = (icr_low & APIC_INT_ASSERT) != 0; 1496 + irq->trig_mode = icr_low & APIC_INT_LEVELTRIG; 1497 + irq->shorthand = icr_low & APIC_SHORT_MASK; 1498 + irq->msi_redir_hint = false; 1499 + if (apic_x2apic_mode(apic)) 1500 + irq->dest_id = icr_high; 1501 + else 1502 + irq->dest_id = GET_XAPIC_DEST_FIELD(icr_high); 1503 + } 1573 1504 1574 1505 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high) 1575 1506 { 1576 1507 struct kvm_lapic_irq irq; 1577 1508 1578 - /* KVM has no delay and should always clear the BUSY/PENDING flag. 
*/ 1579 - WARN_ON_ONCE(icr_low & APIC_ICR_BUSY); 1580 - 1581 - irq.vector = icr_low & APIC_VECTOR_MASK; 1582 - irq.delivery_mode = icr_low & APIC_MODE_MASK; 1583 - irq.dest_mode = icr_low & APIC_DEST_MASK; 1584 - irq.level = (icr_low & APIC_INT_ASSERT) != 0; 1585 - irq.trig_mode = icr_low & APIC_INT_LEVELTRIG; 1586 - irq.shorthand = icr_low & APIC_SHORT_MASK; 1587 - irq.msi_redir_hint = false; 1588 - if (apic_x2apic_mode(apic)) 1589 - irq.dest_id = icr_high; 1590 - else 1591 - irq.dest_id = GET_XAPIC_DEST_FIELD(icr_high); 1509 + kvm_icr_to_lapic_irq(apic, icr_low, icr_high, &irq); 1592 1510 1593 1511 trace_kvm_apic_ipi(icr_low, irq.dest_id); 1594 1512 1595 1513 kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL); 1596 1514 } 1597 - EXPORT_SYMBOL_GPL(kvm_apic_send_ipi); 1515 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_send_ipi); 1598 1516 1599 1517 static u32 apic_get_tmcct(struct kvm_lapic *apic) 1600 1518 { ··· 1717 1623 1718 1624 return valid_reg_mask; 1719 1625 } 1720 - EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask); 1626 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_readable_reg_mask); 1721 1627 1722 1628 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len, 1723 1629 void *data) ··· 1958 1864 lapic_timer_int_injected(vcpu)) 1959 1865 __kvm_wait_lapic_expire(vcpu); 1960 1866 } 1961 - EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire); 1867 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_wait_lapic_expire); 1962 1868 1963 1869 static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic) 1964 1870 { ··· 2272 2178 out: 2273 2179 preempt_enable(); 2274 2180 } 2275 - EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer); 2181 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_expired_hv_timer); 2276 2182 2277 2183 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu) 2278 2184 { ··· 2525 2431 { 2526 2432 kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0); 2527 2433 } 2528 - EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi); 2434 + 
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_set_eoi); 2529 2435 2530 2436 #define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13)) 2531 2437 2532 - int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data) 2438 + static int __kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data, bool fast) 2533 2439 { 2534 2440 if (data & X2APIC_ICR_RESERVED_BITS) 2535 2441 return 1; ··· 2544 2450 */ 2545 2451 data &= ~APIC_ICR_BUSY; 2546 2452 2547 - kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32)); 2453 + if (fast) { 2454 + struct kvm_lapic_irq irq; 2455 + int ignored; 2456 + 2457 + kvm_icr_to_lapic_irq(apic, (u32)data, (u32)(data >> 32), &irq); 2458 + 2459 + if (!kvm_irq_delivery_to_apic_fast(apic->vcpu->kvm, apic, &irq, 2460 + &ignored, NULL)) 2461 + return -EWOULDBLOCK; 2462 + 2463 + trace_kvm_apic_ipi((u32)data, irq.dest_id); 2464 + } else { 2465 + kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32)); 2466 + } 2548 2467 if (kvm_x86_ops.x2apic_icr_is_split) { 2549 2468 kvm_lapic_set_reg(apic, APIC_ICR, data); 2550 2469 kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32); ··· 2566 2459 } 2567 2460 trace_kvm_apic_write(APIC_ICR, data); 2568 2461 return 0; 2462 + } 2463 + 2464 + static int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data) 2465 + { 2466 + return __kvm_x2apic_icr_write(apic, data, false); 2467 + } 2468 + 2469 + int kvm_x2apic_icr_write_fast(struct kvm_lapic *apic, u64 data) 2470 + { 2471 + return __kvm_x2apic_icr_write(apic, data, true); 2569 2472 } 2570 2473 2571 2474 static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic) ··· 2608 2491 else 2609 2492 kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset)); 2610 2493 } 2611 - EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode); 2494 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_write_nodecode); 2612 2495 2613 2496 void kvm_free_lapic(struct kvm_vcpu *vcpu) 2614 2497 { ··· 2746 2629 kvm_recalculate_apic_map(vcpu->kvm); 2747 2630 return 0; 2748 2631 } 2749 - 
EXPORT_SYMBOL_GPL(kvm_apic_set_base); 2632 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_set_base); 2750 2633 2751 2634 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) 2752 2635 { ··· 2778 2661 int kvm_alloc_apic_access_page(struct kvm *kvm) 2779 2662 { 2780 2663 void __user *hva; 2781 - int ret = 0; 2782 2664 2783 - mutex_lock(&kvm->slots_lock); 2665 + guard(mutex)(&kvm->slots_lock); 2666 + 2784 2667 if (kvm->arch.apic_access_memslot_enabled || 2785 2668 kvm->arch.apic_access_memslot_inhibited) 2786 - goto out; 2669 + return 0; 2787 2670 2788 2671 hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 2789 2672 APIC_DEFAULT_PHYS_BASE, PAGE_SIZE); 2790 - if (IS_ERR(hva)) { 2791 - ret = PTR_ERR(hva); 2792 - goto out; 2793 - } 2673 + if (IS_ERR(hva)) 2674 + return PTR_ERR(hva); 2794 2675 2795 2676 kvm->arch.apic_access_memslot_enabled = true; 2796 - out: 2797 - mutex_unlock(&kvm->slots_lock); 2798 - return ret; 2677 + 2678 + return 0; 2799 2679 } 2800 - EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page); 2680 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_alloc_apic_access_page); 2801 2681 2802 2682 void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu) 2803 2683 { ··· 3058 2944 __apic_update_ppr(apic, &ppr); 3059 2945 return apic_has_interrupt_for_ppr(apic, ppr); 3060 2946 } 3061 - EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt); 2947 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_has_interrupt); 3062 2948 3063 2949 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) 3064 2950 { ··· 3117 3003 } 3118 3004 3119 3005 } 3120 - EXPORT_SYMBOL_GPL(kvm_apic_ack_interrupt); 3006 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apic_ack_interrupt); 3121 3007 3122 3008 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu, 3123 3009 struct kvm_lapic_state *s, bool set)
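The rewritten `kvm_vector_to_index()` above picks the destination by taking `vector % dest_vcpus` and selecting that many-th set bit of the candidate bitmap. The sketch below models the same selection on a single 64-bit word. `toy_find_nth_bit()`, `toy_popcount()`, and `toy_vector_to_index()` are hypothetical helpers written for this illustration; the kernel operates on multi-word bitmaps via `find_nth_bit()` and BUGs instead of returning an error.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the n-th (0-based) set bit in map, or 64 if map has too few bits set. */
static unsigned int toy_find_nth_bit(uint64_t map, unsigned int n)
{
	for (unsigned int i = 0; i < 64; i++) {
		if (!(map & (1ull << i)))
			continue;
		if (n-- == 0)
			return i;
	}
	return 64;
}

/* Number of set bits in map. */
static unsigned int toy_popcount(uint64_t map)
{
	unsigned int count = 0;

	for (; map; map &= map - 1)
		count++;
	return count;
}

/*
 * Hash a vector onto one of the candidate vCPUs in dest_map.
 * Returns the chosen bit index, or -1 if there are no candidates.
 */
static int toy_vector_to_index(uint32_t vector, uint64_t dest_map)
{
	unsigned int dest_vcpus = toy_popcount(dest_map);

	if (!dest_vcpus)
		return -1;

	return (int)toy_find_nth_bit(dest_map, vector % dest_vcpus);
}
```

With three candidates at bit positions 2, 3, and 5, consecutive vectors rotate through the candidates in order, which is the spreading behavior the vector-hashing path relies on.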

+6 -13

arch/x86/kvm/lapic.h

··· 105 105 void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu); 106 106 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source, 107 107 int shorthand, unsigned int dest, int dest_mode); 108 - int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2); 109 108 void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec); 110 109 bool __kvm_apic_update_irr(unsigned long *pir, void *regs, int *max_irr); 111 110 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, unsigned long *pir, int *max_irr); ··· 118 119 119 120 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, 120 121 struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map); 122 + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, 123 + struct kvm_lapic_irq *irq, 124 + struct dest_map *dest_map); 121 125 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high); 122 126 123 127 int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated); ··· 139 137 void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); 140 138 void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu); 141 139 142 - int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data); 140 + int kvm_x2apic_icr_write_fast(struct kvm_lapic *apic, u64 data); 143 141 int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data); 144 142 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data); 145 143 ··· 224 222 !kvm_x86_call(apic_init_signal_blocked)(vcpu); 225 223 } 226 224 227 - static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq) 228 - { 229 - return (irq->delivery_mode == APIC_DM_LOWEST || 230 - irq->msi_redir_hint); 231 - } 232 - 233 225 static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu) 234 226 { 235 227 return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); ··· 236 240 void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq, 237 241 unsigned long 
*vcpu_bitmap); 238 242 239 - bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq, 240 - struct kvm_vcpu **dest_vcpu); 241 - int kvm_vector_to_index(u32 vector, u32 dest_vcpus, 242 - const unsigned long *bitmap, u32 bitmap_size); 243 + bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq, 244 + struct kvm_vcpu **dest_vcpu); 243 245 void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu); 244 246 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu); 245 247 void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu); 246 248 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu); 247 249 void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu); 248 - bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu); 249 250 250 251 static inline enum lapic_mode kvm_apic_mode(u64 apic_base) 251 252 {
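`kvm_x2apic_icr_write_fast()` declared above takes the 64-bit x2APIC ICR value, which `kvm_icr_to_lapic_irq()` in lapic.c splits into interrupt fields. A stand-alone sketch of that decoding follows. The `TOY_*` masks and `struct toy_irq` are hypothetical names local to this example; the bit positions follow the ICR layout documented in the Intel SDM and are not copied from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICR low-dword fields (bit positions per the Intel SDM; names are local). */
#define TOY_VECTOR_MASK  0x000000ffu  /* bits 7:0   vector */
#define TOY_MODE_MASK    0x00000700u  /* bits 10:8  delivery mode */
#define TOY_DEST_LOGICAL 0x00000800u  /* bit 11     destination mode */
#define TOY_INT_ASSERT   0x00004000u  /* bit 14     level */
#define TOY_LEVELTRIG    0x00008000u  /* bit 15     trigger mode */
#define TOY_SHORT_MASK   0x000c0000u  /* bits 19:18 destination shorthand */

/* Hypothetical, simplified counterpart of struct kvm_lapic_irq. */
struct toy_irq {
	uint32_t vector;
	uint32_t delivery_mode;
	uint32_t shorthand;
	uint32_t dest_id;
	bool logical_dest;
	bool level;
	bool level_trig;
};

static void toy_icr_to_irq(uint64_t icr, bool x2apic, struct toy_irq *irq)
{
	uint32_t lo = (uint32_t)icr;
	uint32_t hi = (uint32_t)(icr >> 32);

	irq->vector = lo & TOY_VECTOR_MASK;
	irq->delivery_mode = lo & TOY_MODE_MASK;
	irq->logical_dest = (lo & TOY_DEST_LOGICAL) != 0;
	irq->level = (lo & TOY_INT_ASSERT) != 0;
	irq->level_trig = (lo & TOY_LEVELTRIG) != 0;
	irq->shorthand = lo & TOY_SHORT_MASK;

	/* x2APIC uses the full high dword; xAPIC keeps an 8-bit ID in bits 31:24. */
	irq->dest_id = x2apic ? hi : (hi >> 24) & 0xff;
}
```

Keeping the decode in one helper lets both a slow path and a fast path share it, which appears to be the motivation for factoring it out of `kvm_apic_send_ipi()`.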

+1 -1

arch/x86/kvm/mmu.h

··· 212 212

 	fault = (mmu->permissions[index] >> pte_access) & 1;

-	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
+	WARN_ON_ONCE(pfec & (PFERR_PK_MASK | PFERR_SS_MASK | PFERR_RSVD_MASK));
 	if (unlikely(mmu->pkru_mask)) {
 		u32 pkru_bits, offset;


+127 -74

arch/x86/kvm/mmu/mmu.c

··· 110 110 #ifdef CONFIG_X86_64 111 111 bool __read_mostly tdp_mmu_enabled = true; 112 112 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444); 113 - EXPORT_SYMBOL_GPL(tdp_mmu_enabled); 113 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(tdp_mmu_enabled); 114 114 #endif 115 115 116 116 static int max_huge_page_level __read_mostly; ··· 776 776 kvm_flush_remote_tlbs_gfn(kvm, gfn, PG_LEVEL_4K); 777 777 } 778 778 779 - void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp) 779 + void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, 780 + enum kvm_mmu_type mmu_type) 780 781 { 781 782 /* 782 783 * If it's possible to replace the shadow page with an NX huge page, ··· 791 790 return; 792 791 793 792 ++kvm->stat.nx_lpage_splits; 793 + ++kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages; 794 794 list_add_tail(&sp->possible_nx_huge_page_link, 795 - &kvm->arch.possible_nx_huge_pages); 795 + &kvm->arch.possible_nx_huge_pages[mmu_type].pages); 796 796 } 797 797 798 798 static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, ··· 802 800 sp->nx_huge_page_disallowed = true; 803 801 804 802 if (nx_huge_page_possible) 805 - track_possible_nx_huge_page(kvm, sp); 803 + track_possible_nx_huge_page(kvm, sp, KVM_SHADOW_MMU); 806 804 } 807 805 808 806 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) ··· 821 819 kvm_mmu_gfn_allow_lpage(slot, gfn); 822 820 } 823 821 824 - void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp) 822 + void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, 823 + enum kvm_mmu_type mmu_type) 825 824 { 826 825 if (list_empty(&sp->possible_nx_huge_page_link)) 827 826 return; 828 827 829 828 --kvm->stat.nx_lpage_splits; 829 + --kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages; 830 830 list_del_init(&sp->possible_nx_huge_page_link); 831 831 } 832 832 ··· 836 832 { 837 833 sp->nx_huge_page_disallowed = false; 838 834 839 - 
untrack_possible_nx_huge_page(kvm, sp); 835 + untrack_possible_nx_huge_page(kvm, sp, KVM_SHADOW_MMU); 840 836 } 841 837 842 838 static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, ··· 3865 3861 write_unlock(&kvm->mmu_lock); 3866 3862 } 3867 3863 } 3868 - EXPORT_SYMBOL_GPL(kvm_mmu_free_roots); 3864 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_roots); 3869 3865 3870 3866 void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu) 3871 3867 { ··· 3892 3888 3893 3889 kvm_mmu_free_roots(kvm, mmu, roots_to_free); 3894 3890 } 3895 - EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots); 3891 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_guest_mode_roots); 3896 3892 3897 3893 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, 3898 3894 u8 level) ··· 4667 4663 /* 4668 4664 * Retry the page fault if the gfn hit a memslot that is being deleted 4669 4665 * or moved. This ensures any existing SPTEs for the old memslot will 4670 - * be zapped before KVM inserts a new MMIO SPTE for the gfn. 4666 + * be zapped before KVM inserts a new MMIO SPTE for the gfn. Punt the 4667 + * error to userspace if this is a prefault, as KVM's prefaulting ABI 4668 + * doesn't provide the same forward progress guarantees as KVM_RUN. 
4671 4669 */ 4672 - if (slot->flags & KVM_MEMSLOT_INVALID) 4670 + if (slot->flags & KVM_MEMSLOT_INVALID) { 4671 + if (fault->prefetch) 4672 + return -EAGAIN; 4673 + 4673 4674 return RET_PF_RETRY; 4675 + } 4674 4676 4675 4677 if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) { 4676 4678 /* ··· 4876 4866 4877 4867 return r; 4878 4868 } 4879 - EXPORT_SYMBOL_GPL(kvm_handle_page_fault); 4869 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_page_fault); 4880 4870 4881 4871 #ifdef CONFIG_X86_64 4882 4872 static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, ··· 4966 4956 return -EIO; 4967 4957 } 4968 4958 } 4969 - EXPORT_SYMBOL_GPL(kvm_tdp_map_page); 4959 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_map_page); 4970 4960 4971 4961 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, 4972 4962 struct kvm_pre_fault_memory *range) ··· 5162 5152 __clear_sp_write_flooding_count(sp); 5163 5153 } 5164 5154 } 5165 - EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd); 5155 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_new_pgd); 5166 5156 5167 5157 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, 5168 5158 unsigned int access) ··· 5808 5798 shadow_mmu_init_context(vcpu, context, cpu_role, root_role); 5809 5799 kvm_mmu_new_pgd(vcpu, nested_cr3); 5810 5800 } 5811 - EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu); 5801 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_shadow_npt_mmu); 5812 5802 5813 5803 static union kvm_cpu_role 5814 5804 kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, ··· 5862 5852 5863 5853 kvm_mmu_new_pgd(vcpu, new_eptp); 5864 5854 } 5865 - EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu); 5855 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_shadow_ept_mmu); 5866 5856 5867 5857 static void init_kvm_softmmu(struct kvm_vcpu *vcpu, 5868 5858 union kvm_cpu_role cpu_role) ··· 5927 5917 else 5928 5918 init_kvm_softmmu(vcpu, cpu_role); 5929 5919 } 5930 - EXPORT_SYMBOL_GPL(kvm_init_mmu); 5920 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_mmu); 5931 5921 5932 5922 void 
kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) 5933 5923 { ··· 5963 5953 kvm_mmu_unload(vcpu); 5964 5954 kvm_init_mmu(vcpu); 5965 5955 } 5966 - EXPORT_SYMBOL_GPL(kvm_mmu_reset_context); 5956 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_reset_context); 5967 5957 5968 5958 int kvm_mmu_load(struct kvm_vcpu *vcpu) 5969 5959 { ··· 5997 5987 out: 5998 5988 return r; 5999 5989 } 6000 - EXPORT_SYMBOL_GPL(kvm_mmu_load); 5990 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_load); 6001 5991 6002 5992 void kvm_mmu_unload(struct kvm_vcpu *vcpu) 6003 5993 { ··· 6059 6049 __kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.root_mmu); 6060 6050 __kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.guest_mmu); 6061 6051 } 6062 - EXPORT_SYMBOL_GPL(kvm_mmu_free_obsolete_roots); 6052 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_free_obsolete_roots); 6063 6053 6064 6054 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa, 6065 6055 int *bytes) ··· 6385 6375 return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn, 6386 6376 insn_len); 6387 6377 } 6388 - EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); 6378 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_page_fault); 6389 6379 6390 6380 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg) 6391 6381 { ··· 6401 6391 pr_cont(", spte[%d] = 0x%llx", level, sptes[level]); 6402 6392 pr_cont("\n"); 6403 6393 } 6404 - EXPORT_SYMBOL_GPL(kvm_mmu_print_sptes); 6394 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_print_sptes); 6405 6395 6406 6396 static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, 6407 6397 u64 addr, hpa_t root_hpa) ··· 6467 6457 __kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa); 6468 6458 } 6469 6459 } 6470 - EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr); 6460 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invalidate_addr); 6471 6461 6472 6462 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) 6473 6463 { ··· 6484 6474 kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, 
KVM_MMU_ROOTS_ALL); 6485 6475 ++vcpu->stat.invlpg; 6486 6476 } 6487 - EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); 6477 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invlpg); 6488 6478 6489 6479 6490 6480 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid) ··· 6537 6527 else 6538 6528 max_huge_page_level = PG_LEVEL_2M; 6539 6529 } 6540 - EXPORT_SYMBOL_GPL(kvm_configure_mmu); 6530 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_configure_mmu); 6541 6531 6542 6532 static void free_mmu_pages(struct kvm_mmu *mmu) 6543 6533 { ··· 6761 6751 6762 6752 int kvm_mmu_init_vm(struct kvm *kvm) 6763 6753 { 6764 - int r; 6754 + int r, i; 6765 6755 6766 6756 kvm->arch.shadow_mmio_value = shadow_mmio_value; 6767 6757 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); 6768 - INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); 6758 + for (i = 0; i < KVM_NR_MMU_TYPES; ++i) 6759 + INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages[i].pages); 6769 6760 spin_lock_init(&kvm->arch.mmu_unsync_pages_lock); 6770 6761 6771 6762 if (tdp_mmu_enabled) { ··· 7204 7193 7205 7194 return need_tlb_flush; 7206 7195 } 7207 - EXPORT_SYMBOL_GPL(kvm_zap_gfn_range); 7196 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_zap_gfn_range); 7208 7197 7209 7198 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm, 7210 7199 const struct kvm_memory_slot *slot) ··· 7607 7596 return err; 7608 7597 } 7609 7598 7610 - static void kvm_recover_nx_huge_pages(struct kvm *kvm) 7599 + static unsigned long nx_huge_pages_to_zap(struct kvm *kvm, 7600 + enum kvm_mmu_type mmu_type) 7611 7601 { 7612 - unsigned long nx_lpage_splits = kvm->stat.nx_lpage_splits; 7602 + unsigned long pages = READ_ONCE(kvm->arch.possible_nx_huge_pages[mmu_type].nr_pages); 7603 + unsigned int ratio = READ_ONCE(nx_huge_pages_recovery_ratio); 7604 + 7605 + return ratio ? 
DIV_ROUND_UP(pages, ratio) : 0; 7606 + } 7607 + 7608 + static bool kvm_mmu_sp_dirty_logging_enabled(struct kvm *kvm, 7609 + struct kvm_mmu_page *sp) 7610 + { 7613 7611 struct kvm_memory_slot *slot; 7614 - int rcu_idx; 7612 + 7613 + /* 7614 + * Skip the memslot lookup if dirty tracking can't possibly be enabled, 7615 + * as memslot lookups are relatively expensive. 7616 + * 7617 + * If a memslot update is in progress, reading an incorrect value of 7618 + * kvm->nr_memslots_dirty_logging is not a problem: if it is becoming 7619 + * zero, KVM will do an unnecessary memslot lookup; if it is becoming 7620 + * nonzero, the page will be zapped unnecessarily. Either way, this 7621 + * only affects efficiency in racy situations, and not correctness. 7622 + */ 7623 + if (!atomic_read(&kvm->nr_memslots_dirty_logging)) 7624 + return false; 7625 + 7626 + slot = __gfn_to_memslot(kvm_memslots_for_spte_role(kvm, sp->role), sp->gfn); 7627 + if (WARN_ON_ONCE(!slot)) 7628 + return false; 7629 + 7630 + return kvm_slot_dirty_track_enabled(slot); 7631 + } 7632 + 7633 + static void kvm_recover_nx_huge_pages(struct kvm *kvm, 7634 + const enum kvm_mmu_type mmu_type) 7635 + { 7636 + #ifdef CONFIG_X86_64 7637 + const bool is_tdp_mmu = mmu_type == KVM_TDP_MMU; 7638 + spinlock_t *tdp_mmu_pages_lock = &kvm->arch.tdp_mmu_pages_lock; 7639 + #else 7640 + const bool is_tdp_mmu = false; 7641 + spinlock_t *tdp_mmu_pages_lock = NULL; 7642 + #endif 7643 + unsigned long to_zap = nx_huge_pages_to_zap(kvm, mmu_type); 7644 + struct list_head *nx_huge_pages; 7615 7645 struct kvm_mmu_page *sp; 7616 - unsigned int ratio; 7617 7646 LIST_HEAD(invalid_list); 7618 7647 bool flush = false; 7619 - ulong to_zap; 7648 + int rcu_idx; 7649 + 7650 + nx_huge_pages = &kvm->arch.possible_nx_huge_pages[mmu_type].pages; 7620 7651 7621 7652 rcu_idx = srcu_read_lock(&kvm->srcu); 7622 - write_lock(&kvm->mmu_lock); 7653 + if (is_tdp_mmu) 7654 + read_lock(&kvm->mmu_lock); 7655 + else 7656 + write_lock(&kvm->mmu_lock); 7623 7657 
7624 7658 /* 7625 7659 * Zapping TDP MMU shadow pages, including the remote TLB flush, must ··· 7673 7617 */ 7674 7618 rcu_read_lock(); 7675 7619 7676 - ratio = READ_ONCE(nx_huge_pages_recovery_ratio); 7677 - to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0; 7678 7620 for ( ; to_zap; --to_zap) { 7679 - if (list_empty(&kvm->arch.possible_nx_huge_pages)) 7621 + if (is_tdp_mmu) 7622 + spin_lock(tdp_mmu_pages_lock); 7623 + 7624 + if (list_empty(nx_huge_pages)) { 7625 + if (is_tdp_mmu) 7626 + spin_unlock(tdp_mmu_pages_lock); 7680 7627 break; 7628 + } 7681 7629 7682 7630 /* 7683 7631 * We use a separate list instead of just using active_mmu_pages ··· 7690 7630 * the total number of shadow pages. And because the TDP MMU 7691 7631 * doesn't use active_mmu_pages. 7692 7632 */ 7693 - sp = list_first_entry(&kvm->arch.possible_nx_huge_pages, 7633 + sp = list_first_entry(nx_huge_pages, 7694 7634 struct kvm_mmu_page, 7695 7635 possible_nx_huge_page_link); 7696 7636 WARN_ON_ONCE(!sp->nx_huge_page_disallowed); 7697 7637 WARN_ON_ONCE(!sp->role.direct); 7698 7638 7699 - /* 7700 - * Unaccount and do not attempt to recover any NX Huge Pages 7701 - * that are being dirty tracked, as they would just be faulted 7702 - * back in as 4KiB pages. The NX Huge Pages in this slot will be 7703 - * recovered, along with all the other huge pages in the slot, 7704 - * when dirty logging is disabled. 7705 - * 7706 - * Since gfn_to_memslot() is relatively expensive, it helps to 7707 - * skip it if it the test cannot possibly return true. On the 7708 - * other hand, if any memslot has logging enabled, chances are 7709 - * good that all of them do, in which case unaccount_nx_huge_page() 7710 - * is much cheaper than zapping the page. 
7711 - * 7712 - * If a memslot update is in progress, reading an incorrect value 7713 - * of kvm->nr_memslots_dirty_logging is not a problem: if it is 7714 - * becoming zero, gfn_to_memslot() will be done unnecessarily; if 7715 - * it is becoming nonzero, the page will be zapped unnecessarily. 7716 - * Either way, this only affects efficiency in racy situations, 7717 - * and not correctness. 7718 - */ 7719 - slot = NULL; 7720 - if (atomic_read(&kvm->nr_memslots_dirty_logging)) { 7721 - struct kvm_memslots *slots; 7639 + unaccount_nx_huge_page(kvm, sp); 7722 7640 7723 - slots = kvm_memslots_for_spte_role(kvm, sp->role); 7724 - slot = __gfn_to_memslot(slots, sp->gfn); 7725 - WARN_ON_ONCE(!slot); 7641 + if (is_tdp_mmu) 7642 + spin_unlock(tdp_mmu_pages_lock); 7643 + 7644 + /* 7645 + * Do not attempt to recover any NX Huge Pages that are being 7646 + * dirty tracked, as they would just be faulted back in as 4KiB 7647 + * pages. The NX Huge Pages in this slot will be recovered, 7648 + * along with all the other huge pages in the slot, when dirty 7649 + * logging is disabled. 
7650 + */ 7651 + if (!kvm_mmu_sp_dirty_logging_enabled(kvm, sp)) { 7652 + if (is_tdp_mmu) 7653 + flush |= kvm_tdp_mmu_zap_possible_nx_huge_page(kvm, sp); 7654 + else 7655 + kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); 7656 + 7726 7657 } 7727 7658 7728 - if (slot && kvm_slot_dirty_track_enabled(slot)) 7729 - unaccount_nx_huge_page(kvm, sp); 7730 - else if (is_tdp_mmu_page(sp)) 7731 - flush |= kvm_tdp_mmu_zap_sp(kvm, sp); 7732 - else 7733 - kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); 7734 7659 WARN_ON_ONCE(sp->nx_huge_page_disallowed); 7735 7660 7736 7661 if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) { 7737 7662 kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush); 7738 7663 rcu_read_unlock(); 7739 7664 7740 - cond_resched_rwlock_write(&kvm->mmu_lock); 7741 - flush = false; 7665 + if (is_tdp_mmu) 7666 + cond_resched_rwlock_read(&kvm->mmu_lock); 7667 + else 7668 + cond_resched_rwlock_write(&kvm->mmu_lock); 7742 7669 7670 + flush = false; 7743 7671 rcu_read_lock(); 7744 7672 } 7745 7673 } ··· 7735 7687 7736 7688 rcu_read_unlock(); 7737 7689 7738 - write_unlock(&kvm->mmu_lock); 7690 + if (is_tdp_mmu) 7691 + read_unlock(&kvm->mmu_lock); 7692 + else 7693 + write_unlock(&kvm->mmu_lock); 7739 7694 srcu_read_unlock(&kvm->srcu, rcu_idx); 7740 7695 } 7741 7696 ··· 7749 7698 static bool kvm_nx_huge_page_recovery_worker(void *data) 7750 7699 { 7751 7700 struct kvm *kvm = data; 7701 + long remaining_time; 7752 7702 bool enabled; 7753 7703 uint period; 7754 - long remaining_time; 7704 + int i; 7755 7705 7756 7706 enabled = calc_nx_huge_pages_recovery_period(&period); 7757 7707 if (!enabled) ··· 7767 7715 } 7768 7716 7769 7717 __set_current_state(TASK_RUNNING); 7770 - kvm_recover_nx_huge_pages(kvm); 7718 + for (i = 0; i < KVM_NR_MMU_TYPES; ++i) 7719 + kvm_recover_nx_huge_pages(kvm, i); 7771 7720 kvm->arch.nx_huge_page_last = get_jiffies_64(); 7772 7721 return true; 7773 7722 }
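
The per-MMU-type recovery quota computed by nx_huge_pages_to_zap() in the hunk above can be sketched in userspace C. This is only an illustration under stated assumptions: the enum ordering (KVM_SHADOW_MMU first), the struct name demo_nx_huge_pages and the explicit ratio parameter are stand-ins chosen for the example, not the kernel's definitions, and the READ_ONCE() accesses are omitted.

```c
#include <assert.h>

/* Assumed stand-in for the kernel's enum; the ordering is an assumption. */
enum kvm_mmu_type { KVM_SHADOW_MMU, KVM_TDP_MMU, KVM_NR_MMU_TYPES };

/* Same rounding helper the kernel uses: ceil(n / d) for positive d. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical per-type bookkeeping; the diff keeps a list plus nr_pages. */
struct demo_nx_huge_pages {
	unsigned long nr_pages;
};

/*
 * Number of pages to zap for one MMU type in a single recovery pass.
 * A ratio of 0 disables recovery, mirroring the "ratio ? ... : 0" in the diff.
 */
static unsigned long nx_huge_pages_to_zap(const struct demo_nx_huge_pages *lists,
					  enum kvm_mmu_type mmu_type,
					  unsigned int ratio)
{
	assert(mmu_type < KVM_NR_MMU_TYPES);

	return ratio ? DIV_ROUND_UP(lists[mmu_type].nr_pages, ratio) : 0;
}
```

Because each MMU type now carries its own nr_pages, the quota for TDP MMU pages no longer depends on how many shadow MMU pages are tracked. For example, with 120 shadow pages, 7 TDP pages and a ratio of 60, the shadow pass would zap 2 pages and the TDP pass 1.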

+4 -2

arch/x86/kvm/mmu/mmu_internal.h

@@ -416,7 +416,9 @@
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);

-void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
-void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 enum kvm_mmu_type mmu_type);
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				   enum kvm_mmu_type mmu_type);

 #endif /* __KVM_X86_MMU_INTERNAL_H */

+3

arch/x86/kvm/mmu/mmutrace.h

@@ -51,6 +51,9 @@
 	{ PFERR_PRESENT_MASK, "P" },	\
 	{ PFERR_WRITE_MASK, "W" },	\
 	{ PFERR_USER_MASK, "U" },	\
+	{ PFERR_PK_MASK, "PK" },	\
+	{ PFERR_SS_MASK, "SS" },	\
+	{ PFERR_SGX_MASK, "SGX" },	\
 	{ PFERR_RSVD_MASK, "RSVD" },	\
 	{ PFERR_FETCH_MASK, "F" }


+5 -5

arch/x86/kvm/mmu/spte.c

@@ -22,7 +22,7 @@
 bool __read_mostly enable_mmio_caching = true;
 static bool __ro_after_init allow_mmio_caching;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
-EXPORT_SYMBOL_GPL(enable_mmio_caching);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mmio_caching);

 bool __read_mostly kvm_ad_enabled;

@@ -470,13 +470,13 @@
 	shadow_mmio_mask = mmio_mask;
 	shadow_mmio_access_mask = access_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_mmio_spte_mask);

 void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value)
 {
 	kvm->arch.shadow_mmio_value = mmio_value;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_mmio_spte_value);

 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 {
@@ -487,7 +487,7 @@
 	shadow_me_value = me_value;
 	shadow_me_mask = me_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_me_spte_mask);

 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 {
@@ -513,7 +513,7 @@
 	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
 				   VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_ept_masks);

 void kvm_mmu_reset_all_pte_masks(void)
 {

+40 -11

arch/x86/kvm/mmu/tdp_mmu.c

@@ -355,7 +355,7 @@

 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	sp->nx_huge_page_disallowed = false;
-	untrack_possible_nx_huge_page(kvm, sp);
+	untrack_possible_nx_huge_page(kvm, sp, KVM_TDP_MMU);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }

@@ -925,23 +925,52 @@
 	rcu_read_unlock();
 }

-bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+bool kvm_tdp_mmu_zap_possible_nx_huge_page(struct kvm *kvm,
+					   struct kvm_mmu_page *sp)
 {
-	u64 old_spte;
+	struct tdp_iter iter = {
+		.old_spte = sp->ptep ? kvm_tdp_mmu_read_spte(sp->ptep) : 0,
+		.sptep = sp->ptep,
+		.level = sp->role.level + 1,
+		.gfn = sp->gfn,
+		.as_id = kvm_mmu_page_as_id(sp),
+	};
+
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	if (WARN_ON_ONCE(!is_tdp_mmu_page(sp)))
+		return false;

 	/*
-	 * This helper intentionally doesn't allow zapping a root shadow page,
-	 * which doesn't have a parent page table and thus no associated entry.
+	 * Root shadow pages don't have a parent page table and thus no
+	 * associated entry, but they can never be possible NX huge pages.
 	 */
 	if (WARN_ON_ONCE(!sp->ptep))
 		return false;

-	old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
-	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
+	/*
+	 * Since mmu_lock is held in read mode, it's possible another task has
+	 * already modified the SPTE. Zap the SPTE if and only if the SPTE
+	 * points at the SP's page table, as checking shadow-present isn't
+	 * sufficient, e.g. the SPTE could be replaced by a leaf SPTE, or even
+	 * another SP. Note, spte_to_child_pt() also checks that the SPTE is
+	 * shadow-present, i.e. guards against zapping a frozen SPTE.
+	 */
+	if ((tdp_ptep_t)sp->spt != spte_to_child_pt(iter.old_spte, iter.level))
 		return false;

-	tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte,
-			 SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1);
+	/*
+	 * If a different task modified the SPTE, then it should be impossible
+	 * for the SPTE to still be used for the to-be-zapped SP. Non-leaf
+	 * SPTEs don't have Dirty bits, KVM always sets the Accessed bit when
+	 * creating non-leaf SPTEs, and all other bits are immutable for non-
+	 * leaf SPTEs, i.e. the only legal operations for non-leaf SPTEs are
+	 * zapping and replacement.
+	 */
+	if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) {
+		WARN_ON_ONCE((tdp_ptep_t)sp->spt == spte_to_child_pt(iter.old_spte, iter.level));
+		return false;
+	}

 	return true;
 }
@@ -1303,7 +1332,7 @@
 		    fault->req_level >= iter.level) {
 			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 			if (sp->nx_huge_page_disallowed)
-				track_possible_nx_huge_page(kvm, sp);
+				track_possible_nx_huge_page(kvm, sp, KVM_TDP_MMU);
 			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 		}
 	}
@@ -1953,7 +1982,7 @@
 	spte = sptes[leaf];
 	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
 }
-EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_mmu_gpa_is_mapped);

 /*
  * Returns the last level spte pointer of the shadow page walk for the given
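
The "zap only if the entry still points at this page table" pattern in kvm_tdp_mmu_zap_possible_nx_huge_page() can be sketched with C11 atomics. This is a simplified userspace illustration, not the kernel implementation: a plain 64-bit value stands in for the SPTE, DEMO_NONPRESENT is a hypothetical stand-in for SHADOW_NONPRESENT_VALUE, and a direct equality check replaces the spte_to_child_pt() comparison.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SHADOW_NONPRESENT_VALUE. */
#define DEMO_NONPRESENT UINT64_C(0)

/*
 * Clear *sptep only if it still holds expected_child. Writers that hold the
 * lock in read mode may race, so the update is a compare-and-exchange rather
 * than a plain store; a lost race is reported as "nothing zapped".
 */
static bool demo_zap_if_unchanged(_Atomic uint64_t *sptep, uint64_t expected_child)
{
	uint64_t old = atomic_load(sptep);

	/* Another task already replaced or zapped the entry. */
	if (old != expected_child)
		return false;

	/* Fails, leaving *sptep untouched, if the entry changed after the load. */
	return atomic_compare_exchange_strong(sptep, &old, DEMO_NONPRESENT);
}
```

The early return corresponds to the pre-check in the diff, and the compare-and-exchange corresponds to tdp_mmu_set_spte_atomic(). Note that the sketch returns true on success, whereas tdp_mmu_set_spte_atomic() in the diff returns non-zero on failure.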

+2 -1

arch/x86/kvm/mmu/tdp_mmu.h

@@ -64,7 +64,8 @@
 }

 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush);
-bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
+bool kvm_tdp_mmu_zap_possible_nx_huge_page(struct kvm *kvm,
+					   struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm,
 				  enum kvm_tdp_mmu_root_types root_types);

+140 -35

arch/x86/kvm/pmu.c

··· 26 26 /* This is enough to filter the vast majority of currently defined events. */ 27 27 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300 28 28 29 - struct x86_pmu_capability __read_mostly kvm_pmu_cap; 30 - EXPORT_SYMBOL_GPL(kvm_pmu_cap); 29 + /* Unadultered PMU capabilities of the host, i.e. of hardware. */ 30 + static struct x86_pmu_capability __read_mostly kvm_host_pmu; 31 31 32 - struct kvm_pmu_emulated_event_selectors __read_mostly kvm_pmu_eventsel; 33 - EXPORT_SYMBOL_GPL(kvm_pmu_eventsel); 32 + /* KVM's PMU capabilities, i.e. the intersection of KVM and hardware support. */ 33 + struct x86_pmu_capability __read_mostly kvm_pmu_cap; 34 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_cap); 35 + 36 + struct kvm_pmu_emulated_event_selectors { 37 + u64 INSTRUCTIONS_RETIRED; 38 + u64 BRANCH_INSTRUCTIONS_RETIRED; 39 + }; 40 + static struct kvm_pmu_emulated_event_selectors __read_mostly kvm_pmu_eventsel; 34 41 35 42 /* Precise Distribution of Instructions Retired (PDIR) */ 36 43 static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = { ··· 101 94 #define KVM_X86_PMU_OP_OPTIONAL __KVM_X86_PMU_OP 102 95 #include <asm/kvm-x86-pmu-ops.h> 103 96 #undef __KVM_X86_PMU_OP 97 + } 98 + 99 + void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) 100 + { 101 + bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; 102 + int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS; 103 + 104 + perf_get_x86_pmu_capability(&kvm_host_pmu); 105 + 106 + /* 107 + * Hybrid PMUs don't play nice with virtualization without careful 108 + * configuration by userspace, and KVM's APIs for reporting supported 109 + * vPMU features do not account for hybrid PMUs. Disable vPMU support 110 + * for hybrid PMUs until KVM gains a way to let userspace opt-in. 
111 + */ 112 + if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) 113 + enable_pmu = false; 114 + 115 + if (enable_pmu) { 116 + /* 117 + * WARN if perf did NOT disable hardware PMU if the number of 118 + * architecturally required GP counters aren't present, i.e. if 119 + * there are a non-zero number of counters, but fewer than what 120 + * is architecturally required. 121 + */ 122 + if (!kvm_host_pmu.num_counters_gp || 123 + WARN_ON_ONCE(kvm_host_pmu.num_counters_gp < min_nr_gp_ctrs)) 124 + enable_pmu = false; 125 + else if (is_intel && !kvm_host_pmu.version) 126 + enable_pmu = false; 127 + } 128 + 129 + if (!enable_pmu) { 130 + memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); 131 + return; 132 + } 133 + 134 + memcpy(&kvm_pmu_cap, &kvm_host_pmu, sizeof(kvm_host_pmu)); 135 + kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2); 136 + kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp, 137 + pmu_ops->MAX_NR_GP_COUNTERS); 138 + kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed, 139 + KVM_MAX_NR_FIXED_COUNTERS); 140 + 141 + kvm_pmu_eventsel.INSTRUCTIONS_RETIRED = 142 + perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS); 143 + kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED = 144 + perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS); 104 145 } 105 146 106 147 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi) ··· 373 318 pmc->counter &= pmc_bitmask(pmc); 374 319 pmc_update_sample_period(pmc); 375 320 } 376 - EXPORT_SYMBOL_GPL(pmc_write_counter); 321 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(pmc_write_counter); 377 322 378 323 static int filter_cmp(const void *pa, const void *pb, u64 mask) 379 324 { ··· 481 426 return true; 482 427 } 483 428 484 - static bool check_pmu_event_filter(struct kvm_pmc *pmc) 429 + static bool pmc_is_event_allowed(struct kvm_pmc *pmc) 485 430 { 486 431 struct kvm_x86_pmu_event_filter *filter; 487 432 struct kvm *kvm = pmc->vcpu->kvm; ··· 496 441 return is_fixed_event_allowed(filter, pmc->idx); 497 442 
} 498 443 499 - static bool pmc_event_is_allowed(struct kvm_pmc *pmc) 500 - { 501 - return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) && 502 - check_pmu_event_filter(pmc); 503 - } 504 - 505 444 static int reprogram_counter(struct kvm_pmc *pmc) 506 445 { 507 446 struct kvm_pmu *pmu = pmc_to_pmu(pmc); ··· 506 457 507 458 emulate_overflow = pmc_pause_counter(pmc); 508 459 509 - if (!pmc_event_is_allowed(pmc)) 460 + if (!pmc_is_globally_enabled(pmc) || !pmc_is_locally_enabled(pmc) || 461 + !pmc_is_event_allowed(pmc)) 510 462 return 0; 511 463 512 464 if (emulate_overflow) ··· 541 491 !(eventsel & ARCH_PERFMON_EVENTSEL_OS), 542 492 eventsel & ARCH_PERFMON_EVENTSEL_INT); 543 493 } 494 + 495 + static bool pmc_is_event_match(struct kvm_pmc *pmc, u64 eventsel) 496 + { 497 + /* 498 + * Ignore checks for edge detect (all events currently emulated by KVM 499 + * are always rising edges), pin control (unsupported by modern CPUs), 500 + * and counter mask and its invert flag (KVM doesn't emulate multiple 501 + * events in a single clock cycle). 502 + * 503 + * Note, the uppermost nibble of AMD's mask overlaps Intel's IN_TX (bit 504 + * 32) and IN_TXCP (bit 33), as well as two reserved bits (bits 35:34). 505 + * Checking the "in HLE/RTM transaction" flags is correct as the vCPU 506 + * can't be in a transaction if KVM is emulating an instruction. 507 + * 508 + * Checking the reserved bits might be wrong if they are defined in the 509 + * future, but so could ignoring them, so do the simple thing for now. 
510 + */ 511 + return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); 512 + } 513 + 514 + void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc) 515 + { 516 + bitmap_clear(pmu->pmc_counting_instructions, pmc->idx, 1); 517 + bitmap_clear(pmu->pmc_counting_branches, pmc->idx, 1); 518 + 519 + /* 520 + * Do NOT consult the PMU event filters, as the filters must be checked 521 + * at the time of emulation to ensure KVM uses fresh information, e.g. 522 + * omitting a PMC from a bitmap could result in a missed event if the 523 + * filter is changed to allow counting the event. 524 + */ 525 + if (!pmc_is_locally_enabled(pmc)) 526 + return; 527 + 528 + if (pmc_is_event_match(pmc, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED)) 529 + bitmap_set(pmu->pmc_counting_instructions, pmc->idx, 1); 530 + 531 + if (pmc_is_event_match(pmc, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED)) 532 + bitmap_set(pmu->pmc_counting_branches, pmc->idx, 1); 533 + } 534 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_recalc_pmc_emulation); 544 535 545 536 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu) 546 537 { ··· 618 527 */ 619 528 if (unlikely(pmu->need_cleanup)) 620 529 kvm_pmu_cleanup(vcpu); 530 + 531 + kvm_for_each_pmc(pmu, pmc, bit, bitmap) 532 + kvm_pmu_recalc_pmc_emulation(pmu, pmc); 621 533 } 622 534 623 535 int kvm_pmu_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx) ··· 744 650 msr_info->data = pmu->global_ctrl; 745 651 break; 746 652 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR: 653 + case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET: 747 654 case MSR_CORE_PERF_GLOBAL_OVF_CTRL: 748 655 msr_info->data = 0; 749 656 break; ··· 805 710 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR: 806 711 if (!msr_info->host_initiated) 807 712 pmu->global_status &= ~data; 713 + break; 714 + case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET: 715 + if (!msr_info->host_initiated) 716 + pmu->global_status |= data & ~pmu->global_status_rsvd; 808 717 break; 809 718 default: 810 719 
kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index); ··· 888 789 */ 889 790 if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters) 890 791 pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0); 792 + 793 + bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters); 794 + bitmap_set(pmu->all_valid_pmc_idx, KVM_FIXED_PMC_BASE_IDX, 795 + pmu->nr_arch_fixed_counters); 891 796 } 892 797 893 798 void kvm_pmu_init(struct kvm_vcpu *vcpu) ··· 916 813 pmu->pmc_in_use, X86_PMC_IDX_MAX); 917 814 918 815 kvm_for_each_pmc(pmu, pmc, i, bitmask) { 919 - if (pmc->perf_event && !pmc_speculative_in_use(pmc)) 816 + if (pmc->perf_event && !pmc_is_locally_enabled(pmc)) 920 817 pmc_stop_counter(pmc); 921 818 } 922 819 ··· 963 860 select_user; 964 861 } 965 862 966 - void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel) 863 + static void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, 864 + const unsigned long *event_pmcs) 967 865 { 968 866 DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX); 969 867 struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); 970 868 struct kvm_pmc *pmc; 971 - int i; 869 + int i, idx; 972 870 973 871 BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX); 974 872 873 + if (bitmap_empty(event_pmcs, X86_PMC_IDX_MAX)) 874 + return; 875 + 975 876 if (!kvm_pmu_has_perf_global_ctrl(pmu)) 976 - bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX); 977 - else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx, 877 + bitmap_copy(bitmap, event_pmcs, X86_PMC_IDX_MAX); 878 + else if (!bitmap_and(bitmap, event_pmcs, 978 879 (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX)) 979 880 return; 980 881 882 + idx = srcu_read_lock(&vcpu->kvm->srcu); 981 883 kvm_for_each_pmc(pmu, pmc, i, bitmap) { 982 - /* 983 - * Ignore checks for edge detect (all events currently emulated 984 - * but KVM are always rising edges), pin control (unsupported 985 - * by modern CPUs), and counter mask and its invert flag (KVM 986 - * doesn't emulate multiple events in a 
single clock cycle). 987 - * 988 - * Note, the uppermost nibble of AMD's mask overlaps Intel's 989 - * IN_TX (bit 32) and IN_TXCP (bit 33), as well as two reserved 990 - * bits (bits 35:34). Checking the "in HLE/RTM transaction" 991 - * flags is correct as the vCPU can't be in a transaction if 992 - * KVM is emulating an instruction. Checking the reserved bits 993 - * might be wrong if they are defined in the future, but so 994 - * could ignoring them, so do the simple thing for now. 995 - */ 996 - if (((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB) || 997 - !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc)) 884 + if (!pmc_is_event_allowed(pmc) || !cpl_is_matched(pmc)) 998 885 continue; 999 886 1000 887 kvm_pmu_incr_counter(pmc); 1001 888 } 889 + srcu_read_unlock(&vcpu->kvm->srcu, idx); 1002 890 } 1003 - EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event); 891 + 892 + void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu) 893 + { 894 + kvm_pmu_trigger_event(vcpu, vcpu_to_pmu(vcpu)->pmc_counting_instructions); 895 + } 896 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_instruction_retired); 897 + 898 + void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu) 899 + { 900 + kvm_pmu_trigger_event(vcpu, vcpu_to_pmu(vcpu)->pmc_counting_branches); 901 + } 902 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_pmu_branch_retired); 1004 903 1005 904 static bool is_masked_filter_valid(const struct kvm_x86_pmu_event_filter *filter) 1006 905 {
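
The event-matching and bitmap-precomputation scheme added to pmu.c can be sketched as follows. This is a minimal userspace illustration under stated assumptions: DEMO_RAW_EVENT_MASK is meant to mirror AMD64_RAW_EVENT_MASK_NB (event select plus unit mask) but its value here is an assumption, struct demo_pmu and its single 64-bit bitmap are hypothetical, and the "locally enabled" state is passed in as a flag instead of being derived from the PMU registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: event select bits 7:0 and 35:32, unit mask bits 15:8. */
#define DEMO_RAW_EVENT_MASK ((UINT64_C(0xF) << 32) | UINT64_C(0xFFFF))

/* Hypothetical PMU state: one bit per counter that counts instructions. */
struct demo_pmu {
	uint64_t pmc_counting_instructions;
};

/* Two selectors match if they agree on every bit covered by the mask. */
static bool demo_event_match(uint64_t programmed, uint64_t wanted)
{
	return !((programmed ^ wanted) & DEMO_RAW_EVENT_MASK);
}

/*
 * Recompute the cached bit for counter idx whenever its selector changes,
 * so the hot emulation path only walks counters that can actually count.
 */
static void demo_recalc_pmc(struct demo_pmu *pmu, unsigned int idx,
			    uint64_t eventsel, bool locally_enabled,
			    uint64_t insn_retired_sel)
{
	pmu->pmc_counting_instructions &= ~(UINT64_C(1) << idx);

	if (!locally_enabled)
		return;

	if (demo_event_match(eventsel, insn_retired_sel))
		pmu->pmc_counting_instructions |= UINT64_C(1) << idx;
}
```

Bits outside the mask, such as the USR, OS, edge-detect and enable bits, do not affect the match, which is why a selector like 0x4300c0 would still match an "instructions retired" encoding of 0xc0. As in the diff, the event filter is deliberately not consulted here and would be checked at emulation time.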

+7 -53

arch/x86/kvm/pmu.h

@@ -23,11 +23,6 @@

 #define KVM_FIXED_PMC_BASE_IDX INTEL_PMC_IDX_FIXED

-struct kvm_pmu_emulated_event_selectors {
-	u64 INSTRUCTIONS_RETIRED;
-	u64 BRANCH_INSTRUCTIONS_RETIRED;
-};
-
 struct kvm_pmu_ops {
 	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
 		unsigned int idx, u64 *mask);
@@ -165,7 +160,7 @@
 	return NULL;
 }

-static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+static inline bool pmc_is_locally_enabled(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);

@@ -178,57 +173,15 @@
 }

 extern struct x86_pmu_capability kvm_pmu_cap;
-extern struct kvm_pmu_emulated_event_selectors kvm_pmu_eventsel;

-static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
-{
-	bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
-	int min_nr_gp_ctrs = pmu_ops->MIN_NR_GP_COUNTERS;
+void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops);

-	/*
-	 * Hybrid PMUs don't play nice with virtualization without careful
-	 * configuration by userspace, and KVM's APIs for reporting supported
-	 * vPMU features do not account for hybrid PMUs. Disable vPMU support
-	 * for hybrid PMUs until KVM gains a way to let userspace opt-in.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
-		enable_pmu = false;
-
-	if (enable_pmu) {
-		perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-		/*
-		 * WARN if perf did NOT disable hardware PMU if the number of
-		 * architecturally required GP counters aren't present, i.e. if
-		 * there are a non-zero number of counters, but fewer than what
-		 * is architecturally required.
-		 */
-		if (!kvm_pmu_cap.num_counters_gp ||
-		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
-			enable_pmu = false;
-		else if (is_intel && !kvm_pmu_cap.version)
-			enable_pmu = false;
-	}
-
-	if (!enable_pmu) {
-		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
-		return;
-	}
-
-	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
-	kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
-					  pmu_ops->MAX_NR_GP_COUNTERS);
-	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
-					     KVM_MAX_NR_FIXED_COUNTERS);
-
-	kvm_pmu_eventsel.INSTRUCTIONS_RETIRED =
-		perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
-	kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED =
-		perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
-}
+void kvm_pmu_recalc_pmc_emulation(struct kvm_pmu *pmu, struct kvm_pmc *pmc);

 static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
 {
+	kvm_pmu_recalc_pmc_emulation(pmc_to_pmu(pmc), pmc);
+
 	set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
 	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
@@ -272,7 +225,8 @@
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
-void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel);
+void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu);
+void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu);

 bool is_vmware_backdoor_pmc(u32 pmc_idx);


+5

arch/x86/kvm/reverse_cpuid.h

···
 #define KVM_X86_FEATURE_SGX2 KVM_X86_FEATURE(CPUID_12_EAX, 1)
 #define KVM_X86_FEATURE_SGX_EDECCSSA KVM_X86_FEATURE(CPUID_12_EAX, 11)

+/* Intel-defined sub-features, CPUID level 0x00000007:1 (ECX) */
+#define KVM_X86_FEATURE_MSR_IMM KVM_X86_FEATURE(CPUID_7_1_ECX, 5)
+
 /* Intel-defined sub-features, CPUID level 0x00000007:1 (EDX) */
 #define X86_FEATURE_AVX_VNNI_INT8 KVM_X86_FEATURE(CPUID_7_1_EDX, 4)
 #define X86_FEATURE_AVX_NE_CONVERT KVM_X86_FEATURE(CPUID_7_1_EDX, 5)
···
 	[CPUID_7_2_EDX] = { 7, 2, CPUID_EDX},
 	[CPUID_24_0_EBX] = { 0x24, 0, CPUID_EBX},
 	[CPUID_8000_0021_ECX] = {0x80000021, 0, CPUID_ECX},
+	[CPUID_7_1_ECX] = { 7, 1, CPUID_ECX},
 };

 /*
···
 	KVM_X86_TRANSLATE_FEATURE(BHI_CTRL);
 	KVM_X86_TRANSLATE_FEATURE(TSA_SQ_NO);
 	KVM_X86_TRANSLATE_FEATURE(TSA_L1_NO);
+	KVM_X86_TRANSLATE_FEATURE(MSR_IMM);
 	default:
 		return x86_feature;
 	}

+11 -3

arch/x86/kvm/smm.c

···

 	kvm_mmu_reset_context(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_smm_changed);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_smm_changed);

 void process_smi(struct kvm_vcpu *vcpu)
 {
···
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);

 	smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 }
 #endif

···

 	vcpu->arch.smbase = smstate->smbase;

-	if (kvm_set_msr(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
+	if (__kvm_emulate_msr_write(vcpu, MSR_EFER, smstate->efer & ~EFER_LMA))
 		return X86EMUL_UNHANDLEABLE;

 	rsm_load_seg_64(vcpu, &smstate->tr, VCPU_SREG_TR);
···

 	kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
 	ctxt->interruptibility = (u8)smstate->int_shadow;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+		return X86EMUL_UNHANDLEABLE;

 	return X86EMUL_CONTINUE;
 }
···

 	/* And finally go back to 32-bit mode. */
 	efer = 0;
-	kvm_set_msr(vcpu, MSR_EFER, efer);
+	__kvm_emulate_msr_write(vcpu, MSR_EFER, efer);
 }
 #endif


+1 -1

arch/x86/kvm/smm.h

···
 	u32 smbase;
 	u32 reserved4[5];

-	/* ssp and svm_* fields below are not implemented by KVM */
 	u64 ssp;
+	/* svm_* fields below are not implemented by KVM */
 	u64 svm_guest_pat;
 	u64 svm_host_efer;
 	u64 svm_host_cr4;

+125 -26

arch/x86/kvm/svm/avic.c

··· 64 64 65 65 static_assert(__AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_IDX_MASK) == -1u); 66 66 67 + #define AVIC_AUTO_MODE -1 68 + 69 + static int avic_param_set(const char *val, const struct kernel_param *kp) 70 + { 71 + if (val && sysfs_streq(val, "auto")) { 72 + *(int *)kp->arg = AVIC_AUTO_MODE; 73 + return 0; 74 + } 75 + 76 + return param_set_bint(val, kp); 77 + } 78 + 79 + static const struct kernel_param_ops avic_ops = { 80 + .flags = KERNEL_PARAM_OPS_FL_NOARG, 81 + .set = avic_param_set, 82 + .get = param_get_bool, 83 + }; 84 + 85 + /* 86 + * Enable / disable AVIC. In "auto" mode (default behavior), AVIC is enabled 87 + * for Zen4+ CPUs with x2AVIC (and all other criteria for enablement are met). 88 + */ 89 + static int avic = AVIC_AUTO_MODE; 90 + module_param_cb(avic, &avic_ops, &avic, 0444); 91 + __MODULE_PARM_TYPE(avic, "bool"); 92 + 93 + module_param(enable_ipiv, bool, 0444); 94 + 67 95 static bool force_avic; 68 96 module_param_unsafe(force_avic, bool, 0444); 69 97 ··· 105 77 static u32 next_vm_id = 0; 106 78 static bool next_vm_id_wrapped = 0; 107 79 static DEFINE_SPINLOCK(svm_vm_data_hash_lock); 108 - bool x2avic_enabled; 80 + static bool x2avic_enabled; 81 + 82 + 83 + static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm, 84 + bool intercept) 85 + { 86 + static const u32 x2avic_passthrough_msrs[] = { 87 + X2APIC_MSR(APIC_ID), 88 + X2APIC_MSR(APIC_LVR), 89 + X2APIC_MSR(APIC_TASKPRI), 90 + X2APIC_MSR(APIC_ARBPRI), 91 + X2APIC_MSR(APIC_PROCPRI), 92 + X2APIC_MSR(APIC_EOI), 93 + X2APIC_MSR(APIC_RRR), 94 + X2APIC_MSR(APIC_LDR), 95 + X2APIC_MSR(APIC_DFR), 96 + X2APIC_MSR(APIC_SPIV), 97 + X2APIC_MSR(APIC_ISR), 98 + X2APIC_MSR(APIC_TMR), 99 + X2APIC_MSR(APIC_IRR), 100 + X2APIC_MSR(APIC_ESR), 101 + X2APIC_MSR(APIC_ICR), 102 + X2APIC_MSR(APIC_ICR2), 103 + 104 + /* 105 + * Note! Always intercept LVTT, as TSC-deadline timer mode 106 + * isn't virtualized by hardware, and the CPU will generate a 107 + * #GP instead of a #VMEXIT. 
108 + */ 109 + X2APIC_MSR(APIC_LVTTHMR), 110 + X2APIC_MSR(APIC_LVTPC), 111 + X2APIC_MSR(APIC_LVT0), 112 + X2APIC_MSR(APIC_LVT1), 113 + X2APIC_MSR(APIC_LVTERR), 114 + X2APIC_MSR(APIC_TMICT), 115 + X2APIC_MSR(APIC_TMCCT), 116 + X2APIC_MSR(APIC_TDCR), 117 + }; 118 + int i; 119 + 120 + if (intercept == svm->x2avic_msrs_intercepted) 121 + return; 122 + 123 + if (!x2avic_enabled) 124 + return; 125 + 126 + for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++) 127 + svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i], 128 + MSR_TYPE_RW, intercept); 129 + 130 + svm->x2avic_msrs_intercepted = intercept; 131 + } 109 132 110 133 static void avic_activate_vmcb(struct vcpu_svm *svm) 111 134 { ··· 178 99 vmcb->control.int_ctl |= X2APIC_MODE_MASK; 179 100 vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID; 180 101 /* Disabling MSR intercept for x2APIC registers */ 181 - svm_set_x2apic_msr_interception(svm, false); 102 + avic_set_x2apic_msr_interception(svm, false); 182 103 } else { 183 104 /* 184 105 * Flush the TLB, the guest may have inserted a non-APIC ··· 189 110 /* For xAVIC and hybrid-xAVIC modes */ 190 111 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID; 191 112 /* Enabling MSR intercept for x2APIC registers */ 192 - svm_set_x2apic_msr_interception(svm, true); 113 + avic_set_x2apic_msr_interception(svm, true); 193 114 } 194 115 } 195 116 ··· 209 130 return; 210 131 211 132 /* Enabling MSR intercept for x2APIC registers */ 212 - svm_set_x2apic_msr_interception(svm, true); 133 + avic_set_x2apic_msr_interception(svm, true); 213 134 } 214 135 215 136 /* Note: ··· 1169 1090 avic_vcpu_load(vcpu, vcpu->cpu); 1170 1091 } 1171 1092 1172 - /* 1173 - * Note: 1174 - * - The module param avic enable both xAPIC and x2APIC mode. 1175 - * - Hypervisor can support both xAVIC and x2AVIC in the same guest. 1176 - * - The mode can be switched at run-time. 
1177 - */ 1178 - bool avic_hardware_setup(void) 1093 + static bool __init avic_want_avic_enabled(void) 1179 1094 { 1180 - if (!npt_enabled) 1095 + /* 1096 + * In "auto" mode, enable AVIC by default for Zen4+ if x2AVIC is 1097 + * supported (to avoid enabling partial support by default, and because 1098 + * x2AVIC should be supported by all Zen4+ CPUs). Explicitly check for 1099 + * family 0x19 and later (Zen5+), as the kernel's synthetic ZenX flags 1100 + * aren't inclusive of previous generations, i.e. the kernel will set 1101 + * at most one ZenX feature flag. 1102 + */ 1103 + if (avic == AVIC_AUTO_MODE) 1104 + avic = boot_cpu_has(X86_FEATURE_X2AVIC) && 1105 + (boot_cpu_data.x86 > 0x19 || cpu_feature_enabled(X86_FEATURE_ZEN4)); 1106 + 1107 + if (!avic || !npt_enabled) 1181 1108 return false; 1182 1109 1183 1110 /* AVIC is a prerequisite for x2AVIC. */ 1184 1111 if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) { 1185 - if (boot_cpu_has(X86_FEATURE_X2AVIC)) { 1186 - pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled"); 1187 - pr_warn(FW_BUG "Try enable AVIC using force_avic option"); 1188 - } 1112 + if (boot_cpu_has(X86_FEATURE_X2AVIC)) 1113 + pr_warn(FW_BUG "Cannot enable x2AVIC, AVIC is unsupported\n"); 1189 1114 return false; 1190 1115 } 1191 1116 ··· 1199 1116 return false; 1200 1117 } 1201 1118 1202 - if (boot_cpu_has(X86_FEATURE_AVIC)) { 1203 - pr_info("AVIC enabled\n"); 1204 - } else if (force_avic) { 1205 - /* 1206 - * Some older systems does not advertise AVIC support. 1207 - * See Revision Guide for specific AMD processor for more detail. 1208 - */ 1209 - pr_warn("AVIC is not supported in CPUID but force enabled"); 1210 - pr_warn("Your system might crash and burn"); 1211 - } 1119 + /* 1120 + * Print a scary message if AVIC is force enabled to make it abundantly 1121 + * clear that ignoring CPUID could have repercussions. See Revision 1122 + * Guide for specific AMD processor for more details. 
1123 + */ 1124 + if (!boot_cpu_has(X86_FEATURE_AVIC)) 1125 + pr_warn("AVIC unsupported in CPUID but force enabled, your system might crash and burn\n"); 1126 + 1127 + return true; 1128 + } 1129 + 1130 + /* 1131 + * Note: 1132 + * - The module param avic enable both xAPIC and x2APIC mode. 1133 + * - Hypervisor can support both xAVIC and x2AVIC in the same guest. 1134 + * - The mode can be switched at run-time. 1135 + */ 1136 + bool __init avic_hardware_setup(void) 1137 + { 1138 + avic = avic_want_avic_enabled(); 1139 + if (!avic) 1140 + return false; 1141 + 1142 + pr_info("AVIC enabled\n"); 1212 1143 1213 1144 /* AVIC is a prerequisite for x2AVIC. */ 1214 1145 x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC); 1215 1146 if (x2avic_enabled) 1216 1147 pr_info("x2AVIC enabled\n"); 1148 + else 1149 + svm_x86_ops.allow_apicv_in_x2apic_without_x2apic_virtualization = true; 1217 1150 1218 1151 /* 1219 1152 * Disable IPI virtualization for AMD Family 17h CPUs (Zen1 and Zen2)
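
The tri-state handling of the `avic` parameter in the hunks above can be sketched in plain user-space C. This is only an illustration under stated assumptions: `struct cpu_info`, `parse_avic_param()` and `resolve_avic()` are hypothetical names, not kernel symbols; the boolean parsing accepts a reduced set of spellings compared with the kernel's helpers; and `strcmp()` stands in for `sysfs_streq()`, so a trailing newline is not tolerated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AVIC_AUTO_MODE (-1)

/* Hypothetical stand-in for the CPU properties KVM reads at module load. */
struct cpu_info {
	unsigned int family;	/* CPU family, e.g. 0x19 or 0x1a */
	bool is_zen4;		/* models X86_FEATURE_ZEN4 */
	bool has_x2avic;	/* models X86_FEATURE_X2AVIC */
};

/*
 * Parse "auto" or a boolean. A missing value counts as "1", mirroring the
 * no-argument behaviour of kernel boolean parameters.
 * Returns 0 on success, -1 on unrecognised input.
 */
static int parse_avic_param(const char *val, int *out)
{
	if (val == NULL || !strcmp(val, "1") || !strcmp(val, "y") || !strcmp(val, "Y")) {
		*out = 1;
		return 0;
	}
	if (!strcmp(val, "auto")) {
		*out = AVIC_AUTO_MODE;
		return 0;
	}
	if (!strcmp(val, "0") || !strcmp(val, "n") || !strcmp(val, "N")) {
		*out = 0;
		return 0;
	}
	return -1;
}

/*
 * Resolve "auto" to 0 or 1 using the same expression as the hunk above:
 * x2AVIC must be present, and the CPU is either Zen4 or a later family.
 * An explicit 0 or 1 is passed through unchanged.
 */
static int resolve_avic(int avic, const struct cpu_info *cpu)
{
	if (avic == AVIC_AUTO_MODE)
		return cpu->has_x2avic && (cpu->family > 0x19 || cpu->is_zen4);
	return avic;
}
```

Keeping the parsed value as an `int` with a sentinel, rather than a `bool`, is what lets the "auto" request survive until the CPU features are known; only the resolution step collapses it to on or off. The remaining checks in the diff (NPT, `force_avic`, CPUID AVIC support) are deliberately left out of this sketch.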

+28 -10

arch/x86/kvm/svm/nested.c

···
 		vmcb_mark_dirty(vmcb02, VMCB_DT);
 	}

+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
+		vmcb02->save.s_cet = vmcb12->save.s_cet;
+		vmcb02->save.isst_addr = vmcb12->save.isst_addr;
+		vmcb02->save.ssp = vmcb12->save.ssp;
+		vmcb_mark_dirty(vmcb02, VMCB_CET);
+	}
+
 	kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);

 	svm_set_efer(vcpu, svm->nested.save.efer);
···
 	to_save->rsp = from_save->rsp;
 	to_save->rip = from_save->rip;
 	to_save->cpl = 0;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		to_save->s_cet = from_save->s_cet;
+		to_save->isst_addr = from_save->isst_addr;
+		to_save->ssp = from_save->ssp;
+	}
 }

 void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
···
 	vmcb12->save.dr7 = vmcb02->save.dr7;
 	vmcb12->save.dr6 = svm->vcpu.arch.dr6;
 	vmcb12->save.cpl = vmcb02->save.cpl;
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcb12->save.s_cet = vmcb02->save.s_cet;
+		vmcb12->save.isst_addr = vmcb02->save.isst_addr;
+		vmcb12->save.ssp = vmcb02->save.ssp;
+	}

 	vmcb12->control.int_state = vmcb02->control.int_state;
 	vmcb12->control.exit_code = vmcb02->control.exit_code;
···
 	if (kvm_state->size < sizeof(*kvm_state) + KVM_STATE_NESTED_SVM_VMCB_SIZE)
 		return -EINVAL;

-	ret = -ENOMEM;
-	ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
-	save = kzalloc(sizeof(*save), GFP_KERNEL);
-	if (!ctl || !save)
-		goto out_free;
+	ctl = memdup_user(&user_vmcb->control, sizeof(*ctl));
+	if (IS_ERR(ctl))
+		return PTR_ERR(ctl);

-	ret = -EFAULT;
-	if (copy_from_user(ctl, &user_vmcb->control, sizeof(*ctl)))
-		goto out_free;
-	if (copy_from_user(save, &user_vmcb->save, sizeof(*save)))
-		goto out_free;
+	save = memdup_user(&user_vmcb->save, sizeof(*save));
+	if (IS_ERR(save)) {
+		kfree(ctl);
+		return PTR_ERR(save);
+	}

 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);

+4 -4

arch/x86/kvm/svm/pmu.c

···
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
 	unsigned int idx;

-	if (!vcpu->kvm->arch.enable_pmu)
+	if (!pmu->version)
 		return NULL;

 	switch (msr) {
···
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
 		return pmu->version > 1;
 	default:
 		if (msr > MSR_F15H_PERF_CTR5 &&
···
 					 kvm_pmu_cap.num_counters_gp);

 	if (pmu->version > 1) {
-		pmu->global_ctrl_rsvd = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+		pmu->global_ctrl_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
 		pmu->global_status_rsvd = pmu->global_ctrl_rsvd;
 	}

-	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
+	pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(48) - 1;
 	pmu->reserved_bits = 0xfffffff000280000ull;
 	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
 	/* not applicable to AMD; but clean them to prevent any fall out */
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
-	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }

 static void amd_pmu_init(struct kvm_vcpu *vcpu)
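
The two mask expressions touched in the hunk above are pure arithmetic, so they can be checked outside the kernel. The sketch below redefines `BIT_ULL()` locally so it compiles in user space; `global_ctrl_rsvd()` and `gp_counter_bitmask()` are hypothetical helper names introduced for illustration, not functions from the diff.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL() macro. */
#define BIT_ULL(n) (1ULL << (n))

/*
 * Reserved-bit mask for the global control MSR: every bit at or above
 * nr_gp_counters is reserved. Only valid for nr_gp_counters < 64, since
 * shifting a 64-bit value by 64 is undefined in C.
 */
static uint64_t global_ctrl_rsvd(unsigned int nr_gp_counters)
{
	return ~(BIT_ULL(nr_gp_counters) - 1);
}

/* Value mask for a 48-bit wide general-purpose counter. */
static uint64_t gp_counter_bitmask(void)
{
	return BIT_ULL(48) - 1;
}
```

With six counters, for instance, the low six bits are writable and the rest are reserved, which is the same result the older `1ull <<` spelling produced; the change in the diff is a readability cleanup rather than a behavioural one.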

+168 -63

arch/x86/kvm/svm/sev.c

··· 37 37 #include "trace.h" 38 38 39 39 #define GHCB_VERSION_MAX 2ULL 40 - #define GHCB_VERSION_DEFAULT 2ULL 41 40 #define GHCB_VERSION_MIN 1ULL 42 41 43 42 #define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION) ··· 57 58 static bool sev_es_debug_swap_enabled = true; 58 59 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444); 59 60 static u64 sev_supported_vmsa_features; 61 + 62 + static unsigned int nr_ciphertext_hiding_asids; 63 + module_param_named(ciphertext_hiding_asids, nr_ciphertext_hiding_asids, uint, 0444); 60 64 61 65 #define AP_RESET_HOLD_NONE 0 62 66 #define AP_RESET_HOLD_NAE_EVENT 1 ··· 87 85 static DEFINE_MUTEX(sev_bitmap_lock); 88 86 unsigned int max_sev_asid; 89 87 static unsigned int min_sev_asid; 88 + static unsigned int max_sev_es_asid; 89 + static unsigned int min_sev_es_asid; 90 + static unsigned int max_snp_asid; 91 + static unsigned int min_snp_asid; 90 92 static unsigned long sev_me_mask; 91 93 static unsigned int nr_asids; 92 94 static unsigned long *sev_asid_bitmap; ··· 153 147 return sev->vmsa_features & SVM_SEV_FEAT_DEBUG_SWAP; 154 148 } 155 149 150 + static bool snp_is_secure_tsc_enabled(struct kvm *kvm) 151 + { 152 + struct kvm_sev_info *sev = to_kvm_sev_info(kvm); 153 + 154 + return (sev->vmsa_features & SVM_SEV_FEAT_SECURE_TSC) && 155 + !WARN_ON_ONCE(!sev_snp_guest(kvm)); 156 + } 157 + 156 158 /* Must be called with the sev_bitmap_lock held */ 157 159 static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid) 158 160 { ··· 187 173 misc_cg_uncharge(type, sev->misc_cg, 1); 188 174 } 189 175 190 - static int sev_asid_new(struct kvm_sev_info *sev) 176 + static int sev_asid_new(struct kvm_sev_info *sev, unsigned long vm_type) 191 177 { 192 178 /* 193 179 * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. 194 180 * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. 
195 - * Note: min ASID can end up larger than the max if basic SEV support is 196 - * effectively disabled by disallowing use of ASIDs for SEV guests. 197 181 */ 198 - unsigned int min_asid = sev->es_active ? 1 : min_sev_asid; 199 - unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid; 200 - unsigned int asid; 182 + unsigned int min_asid, max_asid, asid; 201 183 bool retry = true; 202 184 int ret; 203 185 186 + if (vm_type == KVM_X86_SNP_VM) { 187 + min_asid = min_snp_asid; 188 + max_asid = max_snp_asid; 189 + } else if (sev->es_active) { 190 + min_asid = min_sev_es_asid; 191 + max_asid = max_sev_es_asid; 192 + } else { 193 + min_asid = min_sev_asid; 194 + max_asid = max_sev_asid; 195 + } 196 + 197 + /* 198 + * The min ASID can end up larger than the max if basic SEV support is 199 + * effectively disabled by disallowing use of ASIDs for SEV guests. 200 + * Similarly for SEV-ES guests the min ASID can end up larger than the 201 + * max when ciphertext hiding is enabled, effectively disabling SEV-ES 202 + * support. 203 + */ 204 204 if (min_asid > max_asid) 205 205 return -ENOTTY; 206 206 ··· 434 406 struct kvm_sev_info *sev = to_kvm_sev_info(kvm); 435 407 struct sev_platform_init_args init_args = {0}; 436 408 bool es_active = vm_type != KVM_X86_SEV_VM; 409 + bool snp_active = vm_type == KVM_X86_SNP_VM; 437 410 u64 valid_vmsa_features = es_active ? 
sev_supported_vmsa_features : 0; 438 411 int ret; 439 412 ··· 444 415 if (data->flags) 445 416 return -EINVAL; 446 417 418 + if (!snp_active) 419 + valid_vmsa_features &= ~SVM_SEV_FEAT_SECURE_TSC; 420 + 447 421 if (data->vmsa_features & ~valid_vmsa_features) 448 422 return -EINVAL; 449 423 450 424 if (data->ghcb_version > GHCB_VERSION_MAX || (!es_active && data->ghcb_version)) 425 + return -EINVAL; 426 + 427 + /* 428 + * KVM supports the full range of mandatory features defined by version 429 + * 2 of the GHCB protocol, so default to that for SEV-ES guests created 430 + * via KVM_SEV_INIT2 (KVM_SEV_INIT forces version 1). 431 + */ 432 + if (es_active && !data->ghcb_version) 433 + data->ghcb_version = 2; 434 + 435 + if (snp_active && data->ghcb_version < 2) 451 436 return -EINVAL; 452 437 453 438 if (unlikely(sev->active)) ··· 472 429 sev->vmsa_features = data->vmsa_features; 473 430 sev->ghcb_version = data->ghcb_version; 474 431 475 - /* 476 - * Currently KVM supports the full range of mandatory features defined 477 - * by version 2 of the GHCB protocol, so default to that for SEV-ES 478 - * guests created via KVM_SEV_INIT2. 479 - */ 480 - if (sev->es_active && !sev->ghcb_version) 481 - sev->ghcb_version = GHCB_VERSION_DEFAULT; 482 - 483 - if (vm_type == KVM_X86_SNP_VM) 432 + if (snp_active) 484 433 sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE; 485 434 486 - ret = sev_asid_new(sev); 435 + ret = sev_asid_new(sev, vm_type); 487 436 if (ret) 488 437 goto e_no_asid; 489 438 ··· 490 455 } 491 456 492 457 /* This needs to happen after SEV/SNP firmware initialization. 
*/ 493 - if (vm_type == KVM_X86_SNP_VM) { 458 + if (snp_active) { 494 459 ret = snp_guest_req_init(kvm); 495 460 if (ret) 496 461 goto e_free; ··· 604 569 if (copy_from_user(&params, u64_to_user_ptr(argp->data), sizeof(params))) 605 570 return -EFAULT; 606 571 607 - sev->policy = params.policy; 608 - 609 572 memset(&start, 0, sizeof(start)); 610 573 611 574 dh_blob = NULL; ··· 651 618 goto e_free_session; 652 619 } 653 620 621 + sev->policy = params.policy; 654 622 sev->handle = start.handle; 655 623 sev->fd = argp->sev_fd; 656 624 ··· 2002 1968 kvm_for_each_vcpu(i, dst_vcpu, dst_kvm) { 2003 1969 dst_svm = to_svm(dst_vcpu); 2004 1970 2005 - sev_init_vmcb(dst_svm); 1971 + sev_init_vmcb(dst_svm, false); 2006 1972 2007 1973 if (!dst->es_active) 2008 1974 continue; ··· 2214 2180 if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO)) 2215 2181 return -EINVAL; 2216 2182 2217 - sev->policy = params.policy; 2183 + if (snp_is_secure_tsc_enabled(kvm)) { 2184 + if (WARN_ON_ONCE(!kvm->arch.default_tsc_khz)) 2185 + return -EINVAL; 2186 + 2187 + start.desired_tsc_khz = kvm->arch.default_tsc_khz; 2188 + } 2218 2189 2219 2190 sev->snp_context = snp_context_create(kvm, argp); 2220 2191 if (!sev->snp_context) ··· 2227 2188 2228 2189 start.gctx_paddr = __psp_pa(sev->snp_context); 2229 2190 start.policy = params.policy; 2191 + 2230 2192 memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw)); 2231 2193 rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error); 2232 2194 if (rc) { ··· 2236 2196 goto e_free_context; 2237 2197 } 2238 2198 2199 + sev->policy = params.policy; 2239 2200 sev->fd = argp->sev_fd; 2240 2201 rc = snp_bind_asid(kvm, &argp->error); 2241 2202 if (rc) { ··· 2370 2329 pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__, 2371 2330 params.gfn_start, params.len, params.type, params.flags); 2372 2331 2373 - if (!PAGE_ALIGNED(params.len) || params.flags || 2332 + if (!params.len || !PAGE_ALIGNED(params.len) || params.flags || 2374 
2333 (params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL && 2375 2334 params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO && 2376 2335 params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED && ··· 3079 3038 if (min_sev_asid == 1) 3080 3039 goto out; 3081 3040 3041 + min_sev_es_asid = min_snp_asid = 1; 3042 + max_sev_es_asid = max_snp_asid = min_sev_asid - 1; 3043 + 3082 3044 sev_es_asid_count = min_sev_asid - 1; 3083 3045 WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count)); 3084 3046 sev_es_supported = true; ··· 3090 3046 out: 3091 3047 if (sev_enabled) { 3092 3048 init_args.probe = true; 3049 + 3050 + if (sev_is_snp_ciphertext_hiding_supported()) 3051 + init_args.max_snp_asid = min(nr_ciphertext_hiding_asids, 3052 + min_sev_asid - 1); 3053 + 3093 3054 if (sev_platform_init(&init_args)) 3094 3055 sev_supported = sev_es_supported = sev_snp_supported = false; 3095 3056 else if (sev_snp_supported) 3096 3057 sev_snp_supported = is_sev_snp_initialized(); 3058 + 3059 + if (sev_snp_supported) 3060 + nr_ciphertext_hiding_asids = init_args.max_snp_asid; 3061 + 3062 + /* 3063 + * If ciphertext hiding is enabled, the joint SEV-ES/SEV-SNP 3064 + * ASID range is partitioned into separate SEV-ES and SEV-SNP 3065 + * ASID ranges, with the SEV-SNP range being [1..max_snp_asid] 3066 + * and the SEV-ES range being (max_snp_asid..max_sev_es_asid]. 3067 + * Note, SEV-ES may effectively be disabled if all ASIDs from 3068 + * the joint range are assigned to SEV-SNP. 3069 + */ 3070 + if (nr_ciphertext_hiding_asids) { 3071 + max_snp_asid = nr_ciphertext_hiding_asids; 3072 + min_sev_es_asid = max_snp_asid + 1; 3073 + pr_info("SEV-SNP ciphertext hiding enabled\n"); 3074 + } 3097 3075 } 3098 3076 3099 3077 if (boot_cpu_has(X86_FEATURE_SEV)) ··· 3126 3060 min_sev_asid, max_sev_asid); 3127 3061 if (boot_cpu_has(X86_FEATURE_SEV_ES)) 3128 3062 pr_info("SEV-ES %s (ASIDs %u - %u)\n", 3129 - str_enabled_disabled(sev_es_supported), 3130 - min_sev_asid > 1 ? 
1 : 0, min_sev_asid - 1); 3063 + sev_es_supported ? min_sev_es_asid <= max_sev_es_asid ? "enabled" : 3064 + "unusable" : 3065 + "disabled", 3066 + min_sev_es_asid, max_sev_es_asid); 3131 3067 if (boot_cpu_has(X86_FEATURE_SEV_SNP)) 3132 3068 pr_info("SEV-SNP %s (ASIDs %u - %u)\n", 3133 3069 str_enabled_disabled(sev_snp_supported), 3134 - min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1); 3070 + min_snp_asid, max_snp_asid); 3135 3071 3136 3072 sev_enabled = sev_supported; 3137 3073 sev_es_enabled = sev_es_supported; ··· 3146 3078 sev_supported_vmsa_features = 0; 3147 3079 if (sev_es_debug_swap_enabled) 3148 3080 sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP; 3081 + 3082 + if (sev_snp_enabled && tsc_khz && cpu_feature_enabled(X86_FEATURE_SNP_SECURE_TSC)) 3083 + sev_supported_vmsa_features |= SVM_SEV_FEAT_SECURE_TSC; 3149 3084 } 3150 3085 3151 3086 void sev_hardware_unsetup(void) ··· 3264 3193 kvfree(svm->sev_es.ghcb_sa); 3265 3194 } 3266 3195 3267 - static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control) 3196 + static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control) 3268 3197 { 3269 3198 return (((u64)control->exit_code_hi) << 32) | control->exit_code; 3270 3199 } ··· 3290 3219 */ 3291 3220 pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa); 3292 3221 pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code", 3293 - kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm)); 3222 + kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm)); 3294 3223 pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1", 3295 3224 control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm)); 3296 3225 pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2", ··· 3343 3272 BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap)); 3344 3273 memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap)); 3345 3274 3346 - vcpu->arch.regs[VCPU_REGS_RAX] = 
kvm_ghcb_get_rax_if_valid(svm, ghcb); 3347 - vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb); 3348 - vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb); 3349 - vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb); 3350 - vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb); 3275 + vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm); 3276 + vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm); 3277 + vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm); 3278 + vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm); 3279 + vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm); 3351 3280 3352 - svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb); 3281 + svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm); 3353 3282 3354 - if (kvm_ghcb_xcr0_is_valid(svm)) { 3355 - vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb); 3356 - vcpu->arch.cpuid_dynamic_bits_dirty = true; 3357 - } 3283 + if (kvm_ghcb_xcr0_is_valid(svm)) 3284 + __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm)); 3285 + 3286 + if (kvm_ghcb_xss_is_valid(svm)) 3287 + __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm)); 3358 3288 3359 3289 /* Copy the GHCB exit information into the VMCB fields */ 3360 - exit_code = ghcb_get_sw_exit_code(ghcb); 3290 + exit_code = kvm_ghcb_get_sw_exit_code(svm); 3361 3291 control->exit_code = lower_32_bits(exit_code); 3362 3292 control->exit_code_hi = upper_32_bits(exit_code); 3363 - control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb); 3364 - control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb); 3365 - svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb); 3293 + control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(svm); 3294 + control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(svm); 3295 + svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm); 3366 3296 3367 3297 /* Clear the valid entries fields */ 3368 3298 
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); ··· 3380 3308 * Retrieve the exit code now even though it may not be marked valid 3381 3309 * as it could help with debugging. 3382 3310 */ 3383 - exit_code = kvm_ghcb_get_sw_exit_code(control); 3311 + exit_code = kvm_get_cached_sw_exit_code(control); 3384 3312 3385 3313 /* Only GHCB Usage code 0 is supported */ 3386 3314 if (svm->sev_es.ghcb->ghcb_usage) { ··· 3952 3880 /* 3953 3881 * Invoked as part of svm_vcpu_reset() processing of an init event. 3954 3882 */ 3955 - void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) 3883 + static void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) 3956 3884 { 3957 3885 struct vcpu_svm *svm = to_svm(vcpu); 3958 3886 struct kvm_memory_slot *slot; 3959 3887 struct page *page; 3960 3888 kvm_pfn_t pfn; 3961 3889 gfn_t gfn; 3962 - 3963 - if (!sev_snp_guest(vcpu->kvm)) 3964 - return; 3965 3890 3966 3891 guard(mutex)(&svm->sev_es.snp_vmsa_mutex); 3967 3892 ··· 4385 4316 4386 4317 svm_vmgexit_success(svm, 0); 4387 4318 4388 - exit_code = kvm_ghcb_get_sw_exit_code(control); 4319 + exit_code = kvm_get_cached_sw_exit_code(control); 4389 4320 switch (exit_code) { 4390 4321 case SVM_VMGEXIT_MMIO_READ: 4391 4322 ret = setup_vmgexit_scratch(svm, true, control->exit_info_2); ··· 4517 4448 !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) && 4518 4449 !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID)); 4519 4450 4451 + svm_set_intercept_for_msr(vcpu, MSR_AMD64_GUEST_TSC_FREQ, MSR_TYPE_R, 4452 + !snp_is_secure_tsc_enabled(vcpu->kvm)); 4453 + 4520 4454 /* 4521 4455 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if 4522 4456 * the host/guest supports its use. 
··· 4548 4476 vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f)); 4549 4477 } 4550 4478 4551 - static void sev_es_init_vmcb(struct vcpu_svm *svm) 4479 + static void sev_es_init_vmcb(struct vcpu_svm *svm, bool init_event) 4552 4480 { 4553 4481 struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm); 4554 4482 struct vmcb *vmcb = svm->vmcb01.ptr; ··· 4609 4537 4610 4538 /* Can't intercept XSETBV, HV can't modify XCR0 directly */ 4611 4539 svm_clr_intercept(svm, INTERCEPT_XSETBV); 4540 + 4541 + /* 4542 + * Set the GHCB MSR value as per the GHCB specification when emulating 4543 + * vCPU RESET for an SEV-ES guest. 4544 + */ 4545 + if (!init_event) 4546 + set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version, 4547 + GHCB_VERSION_MIN, 4548 + sev_enc_bit)); 4612 4549 } 4613 4550 4614 - void sev_init_vmcb(struct vcpu_svm *svm) 4551 + void sev_init_vmcb(struct vcpu_svm *svm, bool init_event) 4615 4552 { 4553 + struct kvm_vcpu *vcpu = &svm->vcpu; 4554 + 4616 4555 svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE; 4617 4556 clr_exception_intercept(svm, UD_VECTOR); 4618 4557 ··· 4633 4550 */ 4634 4551 clr_exception_intercept(svm, GP_VECTOR); 4635 4552 4636 - if (sev_es_guest(svm->vcpu.kvm)) 4637 - sev_es_init_vmcb(svm); 4553 + if (init_event && sev_snp_guest(vcpu->kvm)) 4554 + sev_snp_init_protected_guest_state(vcpu); 4555 + 4556 + if (sev_es_guest(vcpu->kvm)) 4557 + sev_es_init_vmcb(svm, init_event); 4638 4558 } 4639 4559 4640 - void sev_es_vcpu_reset(struct vcpu_svm *svm) 4560 + int sev_vcpu_create(struct kvm_vcpu *vcpu) 4641 4561 { 4642 - struct kvm_vcpu *vcpu = &svm->vcpu; 4643 - struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm); 4644 - 4645 - /* 4646 - * Set the GHCB MSR value as per the GHCB specification when emulating 4647 - * vCPU RESET for an SEV-ES guest. 
4648 - */ 4649 - set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version, 4650 - GHCB_VERSION_MIN, 4651 - sev_enc_bit)); 4562 + struct vcpu_svm *svm = to_svm(vcpu); 4563 + struct page *vmsa_page; 4652 4564 4653 4565 mutex_init(&svm->sev_es.snp_vmsa_mutex); 4566 + 4567 + if (!sev_es_guest(vcpu->kvm)) 4568 + return 0; 4569 + 4570 + /* 4571 + * SEV-ES guests require a separate (from the VMCB) VMSA page used to 4572 + * contain the encrypted register state of the guest. 4573 + */ 4574 + vmsa_page = snp_safe_alloc_page(); 4575 + if (!vmsa_page) 4576 + return -ENOMEM; 4577 + 4578 + svm->sev_es.vmsa = page_address(vmsa_page); 4579 + 4580 + vcpu->arch.guest_tsc_protected = snp_is_secure_tsc_enabled(vcpu->kvm); 4581 + 4582 + return 0; 4654 4583 } 4655 4584 4656 4585 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa) ··· 4713 4618 hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2); 4714 4619 hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3); 4715 4620 } 4621 + 4622 + /* 4623 + * TSC_AUX is always virtualized for SEV-ES guests when the feature is 4624 + * available, i.e. TSC_AUX is loaded on #VMEXIT from the host save area. 4625 + * Set the save area to the current hardware value, i.e. the current 4626 + * user return value, so that the correct value is restored on #VMEXIT. 4627 + */ 4628 + if (cpu_feature_enabled(X86_FEATURE_V_TSC_AUX) && 4629 + !WARN_ON_ONCE(tsc_aux_uret_slot < 0)) 4630 + hostsa->tsc_aux = kvm_get_user_return_msr(tsc_aux_uret_slot); 4716 4631 } 4717 4632 4718 4633 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
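
The ASID partitioning that `ciphertext_hiding_asids` triggers in the setup hunk above can be worked through with a small user-space model. This is a sketch under assumptions, not the kernel code: `struct asid_ranges`, `partition_asids()` and `sev_es_usable()` are hypothetical names; the model assumes ciphertext hiding is supported, that firmware initialisation succeeds with the clamped value, and that `min_sev_asid > 1` (the diff bails out earlier when it equals 1).

```c
#include <stdbool.h>

/* Inclusive ASID ranges for SEV-SNP and SEV-ES guests. */
struct asid_ranges {
	unsigned int min_snp, max_snp;
	unsigned int min_es, max_es;
};

/*
 * Model of the partitioning: by default SEV-ES and SEV-SNP share the joint
 * range [1, min_sev_asid - 1]. A non-zero request, clamped to the size of
 * the joint range, gives [1, n] to SEV-SNP and (n, min_sev_asid - 1] to
 * SEV-ES.
 */
static struct asid_ranges partition_asids(unsigned int min_sev_asid,
					  unsigned int requested)
{
	unsigned int joint_max = min_sev_asid - 1;
	unsigned int n = requested < joint_max ? requested : joint_max;
	struct asid_ranges r = { 1, joint_max, 1, joint_max };

	if (n) {
		r.max_snp = n;
		r.min_es = n + 1;
	}
	return r;
}

/* SEV-ES is unusable once its minimum ASID exceeds its maximum. */
static bool sev_es_usable(const struct asid_ranges *r)
{
	return r->min_es <= r->max_es;
}
```

For example, with `min_sev_asid` of 100 and a request of 40, SEV-SNP would get ASIDs 1 to 40 and SEV-ES 41 to 99. Requesting `~0u` clamps to 99, leaving SEV-ES with the empty range 100 to 99, which corresponds to the "unusable" state the new `pr_info()` reports and to the `min_asid > max_asid` check in `sev_asid_new()`.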

+101 -135

arch/x86/kvm/svm/svm.c

···
 static int tsc_scaling = true;
 module_param(tsc_scaling, int, 0444);

-/*
- * enable / disable AVIC. Because the defaults differ for APICv
- * support between VMX and SVM we cannot use module_param_named.
- */
-static bool avic;
-module_param(avic, bool, 0444);
-module_param(enable_ipiv, bool, 0444);
-
 module_param(enable_device_posted_irqs, bool, 0444);

 bool __read_mostly dump_invalid_vmcb;
···
  * RDTSCP and RDPID are not used in the kernel, specifically to allow KVM to
  * defer the restoration of TSC_AUX until the CPU returns to userspace.
  */
-static int tsc_aux_uret_slot __read_mostly = -1;
+int tsc_aux_uret_slot __ro_after_init = -1;

 static int get_npt_level(void)
 {
···

 	amd_pmu_enable_virt();

-	/*
-	 * If TSC_AUX virtualization is supported, TSC_AUX becomes a swap type
-	 * "B" field (see sev_es_prepare_switch_to_guest()) for SEV-ES guests.
-	 * Since Linux does not change the value of TSC_AUX once set, prime the
-	 * TSC_AUX field now to avoid a RDMSR on every vCPU run.
-	 */
-	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		u32 __maybe_unused msr_hi;
-
-		rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
-	}
-
 	return 0;
 }

···
 	svm_set_intercept_for_msr(vcpu, MSR_IA32_DEBUGCTLMSR, MSR_TYPE_RW, intercept);
 }

-void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
-{
-	static const u32 x2avic_passthrough_msrs[] = {
-		X2APIC_MSR(APIC_ID),
-		X2APIC_MSR(APIC_LVR),
-		X2APIC_MSR(APIC_TASKPRI),
-		X2APIC_MSR(APIC_ARBPRI),
-		X2APIC_MSR(APIC_PROCPRI),
-		X2APIC_MSR(APIC_EOI),
-		X2APIC_MSR(APIC_RRR),
-		X2APIC_MSR(APIC_LDR),
-		X2APIC_MSR(APIC_DFR),
-		X2APIC_MSR(APIC_SPIV),
-		X2APIC_MSR(APIC_ISR),
-		X2APIC_MSR(APIC_TMR),
-		X2APIC_MSR(APIC_IRR),
-		X2APIC_MSR(APIC_ESR),
-		X2APIC_MSR(APIC_ICR),
-		X2APIC_MSR(APIC_ICR2),
-
-		/*
-		 * Note! Always intercept LVTT, as TSC-deadline timer mode
-		 * isn't virtualized by hardware, and the CPU will generate a
-		 * #GP instead of a #VMEXIT.
-		 */
-		X2APIC_MSR(APIC_LVTTHMR),
-		X2APIC_MSR(APIC_LVTPC),
-		X2APIC_MSR(APIC_LVT0),
-		X2APIC_MSR(APIC_LVT1),
-		X2APIC_MSR(APIC_LVTERR),
-		X2APIC_MSR(APIC_TMICT),
-		X2APIC_MSR(APIC_TMCCT),
-		X2APIC_MSR(APIC_TDCR),
-	};
-	int i;
-
-	if (intercept == svm->x2avic_msrs_intercepted)
-		return;
-
-	if (!x2avic_enabled)
-		return;
-
-	for (i = 0; i < ARRAY_SIZE(x2avic_passthrough_msrs); i++)
-		svm_set_intercept_for_msr(&svm->vcpu, x2avic_passthrough_msrs[i],
-					  MSR_TYPE_RW, intercept);
-
-	svm->x2avic_msrs_intercepted = intercept;
-}
-
 void svm_vcpu_free_msrpm(void *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
···
 	if (kvm_aperfmperf_in_guest(vcpu->kvm)) {
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_APERF, MSR_TYPE_R);
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		bool shstk_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, !shstk_enabled);
+		svm_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, !shstk_enabled);
 	}

 	if (sev_es_guest(vcpu->kvm))
···
 	}
 }

-static void svm_recalc_intercepts_after_set_cpuid(struct kvm_vcpu *vcpu)
+static void svm_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	svm_recalc_instruction_intercepts(vcpu);
 	svm_recalc_msr_intercepts(vcpu);
 }

-static void init_vmcb(struct kvm_vcpu *vcpu)
+static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb01.ptr;
···
 		svm_set_intercept(svm, INTERCEPT_BUSLOCK);

 	if (sev_guest(vcpu->kvm))
-		sev_init_vmcb(svm);
+		sev_init_vmcb(svm, init_event);

 	svm_hv_init_vmcb(vmcb);

-	svm_recalc_intercepts_after_set_cpuid(vcpu);
+	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);

 	vmcb_mark_all_dirty(vmcb);

···

 	svm->nmi_masked = false;
 	svm->awaiting_iret_completion = false;
-
-	if (sev_es_guest(vcpu->kvm))
-		sev_es_vcpu_reset(svm);
 }

 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
···
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;

-	if (init_event)
-		sev_snp_init_protected_guest_state(vcpu);
-
-	init_vmcb(vcpu);
+	init_vmcb(vcpu, init_event);

 	if (!init_event)
 		__svm_vcpu_reset(vcpu);
···
 {
 	struct vcpu_svm *svm;
 	struct page *vmcb01_page;
-	struct page *vmsa_page = NULL;
 	int err;

 	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
···
 	if (!vmcb01_page)
 		goto out;

-	if (sev_es_guest(vcpu->kvm)) {
-		/*
-		 * SEV-ES guests require a separate VMSA page used to contain
-		 * the encrypted register state of the guest.
-		 */
-		vmsa_page = snp_safe_alloc_page();
-		if (!vmsa_page)
-			goto error_free_vmcb_page;
-	}
+	err = sev_vcpu_create(vcpu);
+	if (err)
+		goto error_free_vmcb_page;

 	err = avic_init_vcpu(svm);
 	if (err)
-		goto error_free_vmsa_page;
+		goto error_free_sev;

 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm) {
 		err = -ENOMEM;
-		goto error_free_vmsa_page;
+		goto error_free_sev;
 	}

 	svm->x2avic_msrs_intercepted = true;
···
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 	svm_switch_vmcb(svm, &svm->vmcb01);

-	if (vmsa_page)
-		svm->sev_es.vmsa = page_address(vmsa_page);
-
 	svm->guest_state_loaded = false;

 	return 0;

-error_free_vmsa_page:
-	if (vmsa_page)
-		__free_page(vmsa_page);
+error_free_sev:
+	sev_free_vcpu(vcpu);
 error_free_vmcb_page:
 	__free_page(vmcb01_page);
 out:
···
 		__svm_write_tsc_multiplier(vcpu->arch.tsc_scaling_ratio);

 	/*
-	 * TSC_AUX is always virtualized for SEV-ES guests when the feature is
-	 * available. The user return MSR support is not required in this case
-	 * because TSC_AUX is restored on #VMEXIT from the host save area
-	 * (which has been initialized in svm_enable_virtualization_cpu()).
+	 * TSC_AUX is always virtualized (context switched by hardware) for
+	 * SEV-ES guests when the feature is available. For non-SEV-ES guests,
+	 * context switch TSC_AUX via the user_return MSR infrastructure (not
+	 * all CPUs support TSC_AUX virtualization).
 	 */
 	if (likely(tsc_aux_uret_slot >= 0) &&
 	    (!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
···
 static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
 				      struct msr_data *msr_info)
 {
-	return sev_es_guest(vcpu->kvm) &&
-	       vcpu->arch.guest_state_protected &&
+	return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
+	       msr_info->index != MSR_IA32_XSS &&
 	       !msr_write_intercepted(vcpu, msr_info->index);
 }

···
 		msr_info->data = svm->vmcb01.ptr->save.sysenter_esp;
 		if (guest_cpuid_is_intel_compatible(vcpu))
 			msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
+		break;
+	case MSR_IA32_S_CET:
+		msr_info->data = svm->vmcb->save.s_cet;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		msr_info->data = svm->vmcb->save.isst_addr;
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		msr_info->data = svm->vmcb->save.ssp;
 		break;
 	case MSR_TSC_AUX:
 		msr_info->data = svm->tsc_aux;
···
 		svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
 		svm->sysenter_esp_hi = guest_cpuid_is_intel_compatible(vcpu) ? (data >> 32) : 0;
 		break;
+	case MSR_IA32_S_CET:
+		svm->vmcb->save.s_cet = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		svm->vmcb->save.isst_addr = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
+	case MSR_KVM_INTERNAL_GUEST_SSP:
+		svm->vmcb->save.ssp = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+		break;
 	case MSR_TSC_AUX:
 		/*
 		 * TSC_AUX is always virtualized for SEV-ES guests when the
 		 * feature is available. The user return MSR support is not
 		 * required in this case because TSC_AUX is restored on #VMEXIT
-		 * from the host save area (which has been initialized in
-		 * svm_enable_virtualization_cpu()).
+		 * from the host save area.
 		 */
 		if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) && sev_es_guest(vcpu->kvm))
 			break;
···
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "rsp:", save->rsp, "rax:", save->rax);
 	pr_err("%-15s %016llx %-13s %016llx\n",
+	       "s_cet:", save->s_cet, "ssp:", save->ssp);
+	pr_err("%-15s %016llx\n",
+	       "isst_addr:", save->isst_addr);
+	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "star:", save01->star, "lstar:", save01->lstar);
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "cstar:", save01->cstar, "sfmask:", save01->sfmask);
···

 		pr_err("%-15s %016llx\n",
 		       "sev_features", vmsa->sev_features);
+
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl0_ssp:", vmsa->pl0_ssp, "pl1_ssp:", vmsa->pl1_ssp);
+		pr_err("%-15s %016llx %-13s %016llx\n",
+		       "pl2_ssp:", vmsa->pl2_ssp, "pl3_ssp:", vmsa->pl3_ssp);
+		pr_err("%-15s %016llx\n",
+		       "u_cet:", vmsa->u_cet);

 		pr_err("%-15s %016llx %-13s %016llx\n",
 		       "rax:", vmsa->rax, "rbx:", vmsa->rbx);
···
 static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+
+	/*
+	 * Next RIP must be provided as IRQs are disabled, and accessing guest
+	 * memory to decode the instruction might fault, i.e. might sleep.
+	 */
+	if (!nrips || !control->next_rip)
+		return EXIT_FASTPATH_NONE;

 	if (is_guest_mode(vcpu))
 		return EXIT_FASTPATH_NONE;

-	switch (svm->vmcb->control.exit_code) {
+	switch (control->exit_code) {
 	case SVM_EXIT_MSR:
-		if (!svm->vmcb->control.exit_info_1)
+		if (!control->exit_info_1)
 			break;
-		return handle_fastpath_set_msr_irqoff(vcpu);
+		return handle_fastpath_wrmsr(vcpu);
 	case SVM_EXIT_HLT:
 		return handle_fastpath_hlt(vcpu);
+	case SVM_EXIT_INVD:
+		return handle_fastpath_invd(vcpu);
 	default:
 		break;
 	}
···

 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
-
-	svm_recalc_intercepts_after_set_cpuid(vcpu);
 }

 static bool svm_has_wbinvd_exit(void)
···
 	return page_address(page);
 }

-static struct kvm_x86_ops svm_x86_ops __initdata = {
+struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,

 	.check_processor_compatibility = svm_check_processor_compat,
···

 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,

-	.recalc_msr_intercepts = svm_recalc_msr_intercepts,
+	.recalc_intercepts = svm_recalc_intercepts,
 	.complete_emulated_msr = svm_complete_emulated_msr,

 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
···
 	kvm_set_cpu_caps();

 	kvm_caps.supported_perf_cap = 0;
-	kvm_caps.supported_xss = 0;
+
+	kvm_cpu_cap_clear(X86_FEATURE_IBT);

 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
···
 	/* CPUID 0x8000001F (SME/SEV features) */
 	sev_set_cpu_caps();

-	/* Don't advertise Bus Lock Detect to guest if SVM support is absent */
+	/*
+	 * Clear capabilities that are automatically configured by common code,
+	 * but that require explicit SVM support (that isn't yet implemented).
+	 */
 	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);
+	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM);
 }

 static __init int svm_hardware_setup(void)
···
 			  get_npt_level(), PG_LEVEL_1G);
 	pr_info("Nested Paging %s\n", str_enabled_disabled(npt_enabled));

+	/*
+	 * It seems that on AMD processors PTE's accessed bit is
+	 * being set by the CPU hardware before the NPF vmexit.
+	 * This is not expected behaviour and our tests fail because
+	 * of it.
+	 * A workaround here is to disable support for
+	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
+	 * In this case userspace can know if there is support using
+	 * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
+	 * it
+	 * If future AMD CPU models change the behaviour described above,
+	 * this variable can be changed accordingly
+	 */
+	allow_smaller_maxphyaddr = !npt_enabled;
+
 	/* Setup shadow_me_value and shadow_me_mask */
 	kvm_mmu_set_me_spte_mask(sme_me_mask, sme_me_mask);

···
 		goto err;
 	}

-	enable_apicv = avic = avic && avic_hardware_setup();
-
+	enable_apicv = avic_hardware_setup();
 	if (!enable_apicv) {
 		enable_ipiv = false;
 		svm_x86_ops.vcpu_blocking = NULL;
 		svm_x86_ops.vcpu_unblocking = NULL;
 		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
-	} else if (!x2avic_enabled) {
-		svm_x86_ops.allow_apicv_in_x2apic_without_x2apic_virtualization = true;
 	}

 	if (vls) {
···
 		pr_info("PMU virtualization is disabled\n");

 	svm_set_cpu_caps();
-
-	/*
-	 * It seems that on AMD processors PTE's accessed bit is
-	 * being set by the CPU hardware before the NPF vmexit.
-	 * This is not expected behaviour and our tests fail because
-	 * of it.
-	 * A workaround here is to disable support for
-	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
-	 * In this case userspace can know if there is support using
-	 * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
-	 * it
-	 * If future AMD CPU models change the behaviour described above,
-	 * this variable can be changed accordingly
-	 */
-	allow_smaller_maxphyaddr = !npt_enabled;

 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED;
 	return 0;

+26 -18

arch/x86/kvm/svm/svm.h

···
 extern int nrips;
 extern int vgif;
 extern bool intercept_smi;
-extern bool x2avic_enabled;
 extern bool vnmi;
 extern int lbrv;
+
+extern int tsc_aux_uret_slot __ro_after_init;
+
+extern struct kvm_x86_ops svm_x86_ops __initdata;

 /*
  * Clean bits in VMCB.
···
 			  * AVIC PHYSICAL_TABLE pointer,
 			  * AVIC LOGICAL_TABLE pointer
 			  */
+	VMCB_CET, /* S_CET, SSP, ISST_ADDR */
 	VMCB_SW = 31, /* Reserved for hypervisor/software use */
 };

···
 	(1U << VMCB_ASID) | (1U << VMCB_INTR) | \
 	(1U << VMCB_NPT) | (1U << VMCB_CR) | (1U << VMCB_DR) | \
 	(1U << VMCB_DT) | (1U << VMCB_SEG) | (1U << VMCB_CR2) | \
-	(1U << VMCB_LBR) | (1U << VMCB_AVIC) | \
+	(1U << VMCB_LBR) | (1U << VMCB_AVIC) | (1U << VMCB_CET) | \
 	(1U << VMCB_SW))

 /* TPR and CR2 are always written before VMRUN */
···
 int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code);
 void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 			  int read, int write);
-void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);

···
 	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_TOO_BIG) \
 )

-bool avic_hardware_setup(void);
+bool __init avic_hardware_setup(void);
 int avic_ga_log_notifier(u32 ga_tag);
 void avic_vm_destroy(struct kvm *kvm);
 int avic_vm_init(struct kvm *kvm);
···
 /* sev.c */

 int pre_sev_run(struct vcpu_svm *svm, int cpu);
-void sev_init_vmcb(struct vcpu_svm *svm);
+void sev_init_vmcb(struct vcpu_svm *svm, bool init_event);
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
-void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
···
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }

+int sev_vcpu_create(struct kvm_vcpu *vcpu);
 void sev_free_vcpu(struct kvm_vcpu *vcpu);
 void sev_vm_destroy(struct kvm *kvm);
 void __init sev_set_cpu_caps(void);
···
 int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
-void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 int sev_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private);
···
 	return snp_safe_alloc_page_node(numa_node_id(), GFP_KERNEL_ACCOUNT);
 }

+static inline int sev_vcpu_create(struct kvm_vcpu *vcpu) { return 0; }
 static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
 static inline void sev_vm_destroy(struct kvm *kvm) {}
 static inline void __init sev_set_cpu_caps(void) {}
···
 static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
 static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
-static inline void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu) {}
 static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 {
 	return 0;
···
 void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);

 #define DEFINE_KVM_GHCB_ACCESSORS(field) \
-static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
-{ \
-	return test_bit(GHCB_BITMAP_IDX(field), \
-			(unsigned long *)&svm->sev_es.valid_bitmap); \
-} \
-\
-static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
-{ \
-	return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0; \
-} \
+static __always_inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm) \
+{ \
+	return READ_ONCE(svm->sev_es.ghcb->save.field); \
+} \
+\
+static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
+{ \
+	return test_bit(GHCB_BITMAP_IDX(field), \
+			(unsigned long *)&svm->sev_es.valid_bitmap); \
+} \
+\
+static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm) \
+{ \
+	return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(svm) : 0; \
+}

 DEFINE_KVM_GHCB_ACCESSORS(cpl)
 DEFINE_KVM_GHCB_ACCESSORS(rax)
···
 DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
 DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)

 #endif

+27 -1

arch/x86/kvm/svm/svm_onhyperv.c

···
 #include "kvm_onhyperv.h"
 #include "svm_onhyperv.h"

-int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
+static int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct hv_vmcb_enlightenments *hve;
 	hpa_t partition_assist_page = hv_get_partition_assist_page(vcpu);
···
 	return 0;
 }

+__init void svm_hv_hardware_setup(void)
+{
+	if (npt_enabled &&
+	    ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
+		pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
+		svm_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
+		svm_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
+	}
+
+	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
+		int cpu;
+
+		pr_info(KBUILD_MODNAME ": Hyper-V Direct TLB Flush enabled\n");
+		for_each_online_cpu(cpu) {
+			struct hv_vp_assist_page *vp_ap =
+				hv_get_vp_assist_page(cpu);
+
+			if (!vp_ap)
+				continue;
+
+			vp_ap->nested_control.features.directhypercall = 1;
+		}
+		svm_x86_ops.enable_l2_tlb_flush =
+				svm_hv_enable_l2_tlb_flush;
+	}
+}

+1 -30

arch/x86/kvm/svm/svm_onhyperv.h

···
 #include "kvm_onhyperv.h"
 #include "svm/hyperv.h"

-static struct kvm_x86_ops svm_x86_ops;
-
-int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
+__init void svm_hv_hardware_setup(void);

 static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
 {
···

 	if (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)
 		hve->hv_enlightenments_control.msr_bitmap = 1;
-}
-
-static inline __init void svm_hv_hardware_setup(void)
-{
-	if (npt_enabled &&
-	    ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
-		pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
-		svm_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
-		svm_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
-	}
-
-	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
-		int cpu;
-
-		pr_info(KBUILD_MODNAME ": Hyper-V Direct TLB Flush enabled\n");
-		for_each_online_cpu(cpu) {
-			struct hv_vp_assist_page *vp_ap =
-				hv_get_vp_assist_page(cpu);
-
-			if (!vp_ap)
-				continue;
-
-			vp_ap->nested_control.features.directhypercall = 1;
-		}
-		svm_x86_ops.enable_l2_tlb_flush =
-				svm_hv_enable_l2_tlb_flush;
-	}
 }

 static inline void svm_hv_vmcb_dirty_nested_enlightenments(

+3 -2

arch/x86/kvm/trace.h

···

 #define kvm_trace_sym_exc \
 	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
-	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), \
-	EXS(MF), EXS(AC), EXS(MC)
+	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
+	EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP), \
+	EXS(HV), EXS(VC), EXS(SX)

 /*
  * Tracepoint for kvm interrupt injection:

+9 -3

arch/x86/kvm/vmx/capabilities.h

···
 #define PT_MODE_SYSTEM 0
 #define PT_MODE_HOST_GUEST 1

-#define PMU_CAP_FW_WRITES (1ULL << 13)
-#define PMU_CAP_LBR_FMT 0x3f
-
 struct nested_vmx_msrs {
 	/*
 	 * We only store the "true" versions of the VMX capability MSRs. We
···
 	return vmcs_config.basic & VMX_BASIC_INOUT;
 }

+static inline bool cpu_has_vmx_basic_no_hw_errcode_cc(void)
+{
+	return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
 static inline bool cpu_has_virtual_nmis(void)
 {
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
···
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }

+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;

+7 -7

arch/x86/kvm/vmx/main.c

···
 	return vmx_get_msr(vcpu, msr_info);
 }

-static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
+static void vt_recalc_intercepts(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * TDX doesn't allow VMM to configure interception of MSR accesses.
-	 * TDX guest requests MSR accesses by calling TDVMCALL. The MSR
-	 * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR
-	 * if the userspace has set any.
+	 * TDX doesn't allow VMM to configure interception of instructions or
+	 * MSR accesses. TDX guest requests MSR accesses by calling TDVMCALL.
+	 * The MSR filters will be applied when handling the TDVMCALL for
+	 * RDMSR/WRMSR if the userspace has set any.
 	 */
 	if (is_td_vcpu(vcpu))
 		return;

-	vmx_recalc_msr_intercepts(vcpu);
+	vmx_recalc_intercepts(vcpu);
 }

 static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
···
 	.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
 	.migrate_timers = vmx_migrate_timers,

-	.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
+	.recalc_intercepts = vt_op(recalc_intercepts),
 	.complete_emulated_msr = vt_op(complete_emulated_msr),

 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,

+188 -27

arch/x86/kvm/vmx/nested.c

···
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_MPERF, MSR_TYPE_R);

+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &map);

 	vmx->nested.force_msr_bitmap_recalc = false;
···
 				__func__, i, e.index, e.reserved);
 			goto fail;
 		}
-		if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
+		if (kvm_emulate_msr_write(vcpu, e.index, e.value)) {
 			pr_debug_ratelimited(
 				"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
 				__func__, i, e.index, e.value);
···
 		}
 	}

-	if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
+	if (kvm_emulate_msr_read(vcpu, msr_index, data)) {
 		pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
 			msr_index);
 		return false;
···
 {
 	const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
 				 VMX_BASIC_INOUT |
-				 VMX_BASIC_TRUE_CTLS;
+				 VMX_BASIC_TRUE_CTLS |
+				 VMX_BASIC_NO_HW_ERROR_CODE_CC;

-	const u64 reserved_bits = GENMASK_ULL(63, 56) |
+	const u64 reserved_bits = GENMASK_ULL(63, 57) |
 				  GENMASK_ULL(47, 45) |
 				  BIT_ULL(31);

···
 	}
 }

+static void vmcs_read_cet_state(struct kvm_vcpu *vcpu, u64 *s_cet,
+				u64 *ssp, u64 *ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		*s_cet = vmcs_readl(GUEST_S_CET);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		*ssp = vmcs_readl(GUEST_SSP);
+		*ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+	}
+}
+
+static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
+				 u64 ssp, u64 ssp_tbl)
+{
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+		vmcs_writel(GUEST_S_CET, s_cet);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+		vmcs_writel(GUEST_SSP, ssp);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+	}
+}
+
 static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
 	struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
···
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);

+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+		vmcs_write_cet_state(&vmx->vcpu, vmcs12->guest_s_cet,
+				     vmcs12->guest_ssp, vmcs12->guest_ssp_tbl);
+
 	set_cr4_guest_host_mask(vmx);
 }

···
 		kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
 		vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
 	}
+
+	if (!vmx->nested.nested_run_pending ||
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+		vmcs_write_cet_state(vcpu, vmx->nested.pre_vmenter_s_cet,
+				     vmx->nested.pre_vmenter_ssp,
+				     vmx->nested.pre_vmenter_ssp_tbl);
+
 	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
···

 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
-	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
-				     vmcs12->guest_ia32_perf_global_ctrl))) {
+	    WARN_ON_ONCE(__kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+						 vmcs12->guest_ia32_perf_global_ctrl))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
···
 		u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
 		u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
 		bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
-		bool should_have_error_code;
 		bool urg = nested_cpu_has2(vmcs12,
 					   SECONDARY_EXEC_UNRESTRICTED_GUEST);
 		bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
···
 		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 			return -EINVAL;

-		/* VM-entry interruption-info field: deliver error code */
-		should_have_error_code =
-			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
-		if (CC(has_error_code != should_have_error_code))
-			return -EINVAL;
+		/*
+		 * Cannot deliver error code in real mode or if the interrupt
+		 * type is not hardware exception. For other cases, do the
+		 * consistency check only if the vCPU doesn't enumerate
+		 * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+		 */
+		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+			if (CC(has_error_code))
+				return -EINVAL;
+		} else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+			if (CC(has_error_code != x86_exception_has_error_code(vector)))
+				return -EINVAL;
+		}

 		/* VM-entry exception error code */
 		if (CC(has_error_code &&
···
 	return !__is_canonical_address(la, l1_address_bits_on_exit);
 }

+static int nested_vmx_check_cet_state_common(struct kvm_vcpu *vcpu, u64 s_cet,
+					     u64 ssp, u64 ssp_tbl)
+{
+	if (CC(!kvm_is_valid_u_s_cet(vcpu, s_cet)) || CC(!IS_ALIGNED(ssp, 4)) ||
+	    CC(is_noncanonical_msr_address(ssp_tbl, vcpu)))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 				       struct vmcs12 *vmcs12)
 {
···
 	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
 	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
 	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
+		return -EINVAL;
+
+	if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
 		return -EINVAL;

 	if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
···
 		    CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LMA)) ||
 		    CC(ia32e != !!(vmcs12->host_ia32_efer & EFER_LME)))
 			return -EINVAL;
+	}
+
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+		if (nested_vmx_check_cet_state_common(vcpu, vmcs12->host_s_cet,
+						      vmcs12->host_ssp,
+						      vmcs12->host_ssp_tbl))
+			return -EINVAL;
+
+		/*
+		 * IA32_S_CET and SSP must be canonical if the host will
+		 * enter 64-bit mode after VM-exit; otherwise, higher
+		 * 32-bits must be all 0s.
+		 */
+		if (ia32e) {
+			if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+			    CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+				return -EINVAL;
+		} else {
+			if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+				return -EINVAL;
+		}
 	}

 	return 0;
···
 	    CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
 		return -EINVAL;

+	if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+		return -EINVAL;
+
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
 	    (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
 	     CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
···
 	    (CC(is_noncanonical_msr_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu)) ||
 	     CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
 		return -EINVAL;
+
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+		if (nested_vmx_check_cet_state_common(vcpu, vmcs12->guest_s_cet,
+						      vmcs12->guest_ssp,
+						      vmcs12->guest_ssp_tbl))
+			return -EINVAL;
+
+		/*
+		 * Guest SSP must have 63:N bits identical, rather than
+		 * be canonical (i.e., 63:N-1 bits identical), where N is
+		 * the CPU's maximum linear-address width. Similar to
+		 * is_noncanonical_msr_address(), use the host's
+		 * linear-address width.
3226 + */ 3227 + if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1))) 3228 + return -EINVAL; 3229 + } 3312 3230 3313 3231 if (nested_check_guest_non_reg_state(vmcs12)) 3314 3232 return -EINVAL; ··· 3660 3544 !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) 3661 3545 vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS); 3662 3546 3547 + if (!vmx->nested.nested_run_pending || 3548 + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) 3549 + vmcs_read_cet_state(vcpu, &vmx->nested.pre_vmenter_s_cet, 3550 + &vmx->nested.pre_vmenter_ssp, 3551 + &vmx->nested.pre_vmenter_ssp_tbl); 3552 + 3663 3553 /* 3664 3554 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and* 3665 3555 * nested early checks are disabled. In the event of a "late" VM-Fail, ··· 3812 3690 return 1; 3813 3691 } 3814 3692 3815 - kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED); 3693 + kvm_pmu_branch_retired(vcpu); 3816 3694 3817 3695 if (CC(evmptrld_status == EVMPTRLD_VMFAIL)) 3818 3696 return nested_vmx_failInvalid(vcpu); ··· 4749 4627 4750 4628 if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER) 4751 4629 vmcs12->guest_ia32_efer = vcpu->arch.efer; 4630 + 4631 + vmcs_read_cet_state(&vmx->vcpu, &vmcs12->guest_s_cet, 4632 + &vmcs12->guest_ssp, 4633 + &vmcs12->guest_ssp_tbl); 4752 4634 } 4753 4635 4754 4636 /* ··· 4878 4752 if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS) 4879 4753 vmcs_write64(GUEST_BNDCFGS, 0); 4880 4754 4755 + /* 4756 + * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set. 4757 + * otherwise CET state should be retained across VM-exit, i.e., 4758 + * guest values should be propagated from vmcs12 to vmcs01. 
4759 + */ 4760 + if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) 4761 + vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp, 4762 + vmcs12->host_ssp_tbl); 4763 + else 4764 + vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp, 4765 + vmcs12->guest_ssp_tbl); 4766 + 4881 4767 if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) { 4882 4768 vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat); 4883 4769 vcpu->arch.pat = vmcs12->host_ia32_pat; 4884 4770 } 4885 4771 if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) && 4886 4772 kvm_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu))) 4887 - WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, 4888 - vmcs12->host_ia32_perf_global_ctrl)); 4773 + WARN_ON_ONCE(__kvm_emulate_msr_write(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, 4774 + vmcs12->host_ia32_perf_global_ctrl)); 4889 4775 4890 4776 /* Set L1 segment info according to Intel SDM 4891 4777 27.5.2 Loading Host Segment and Descriptor-Table Registers */ ··· 5075 4937 goto vmabort; 5076 4938 } 5077 4939 5078 - if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) { 4940 + if (kvm_emulate_msr_write(vcpu, h.index, h.value)) { 5079 4941 pr_debug_ratelimited( 5080 4942 "%s WRMSR failed (%u, 0x%x, 0x%llx)\n", 5081 4943 __func__, j, h.index, h.value); ··· 6354 6216 struct vmcs12 *vmcs12, 6355 6217 union vmx_exit_reason exit_reason) 6356 6218 { 6357 - u32 msr_index = kvm_rcx_read(vcpu); 6219 + u32 msr_index; 6358 6220 gpa_t bitmap; 6359 6221 6360 6222 if (!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) 6361 6223 return true; 6224 + 6225 + if (exit_reason.basic == EXIT_REASON_MSR_READ_IMM || 6226 + exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM) 6227 + msr_index = vmx_get_exit_qual(vcpu); 6228 + else 6229 + msr_index = kvm_rcx_read(vcpu); 6362 6230 6363 6231 /* 6364 6232 * The MSR_BITMAP page is divided into four 1024-byte bitmaps, ··· 6372 6228 * First we need to figure out which of the four to use: 6373 6229 */ 6374 6230 bitmap = 
vmcs12->msr_bitmap; 6375 - if (exit_reason.basic == EXIT_REASON_MSR_WRITE) 6231 + if (exit_reason.basic == EXIT_REASON_MSR_WRITE || 6232 + exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM) 6376 6233 bitmap += 2048; 6377 6234 if (msr_index >= 0xc0000000) { 6378 6235 msr_index -= 0xc0000000; ··· 6672 6527 return nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC); 6673 6528 case EXIT_REASON_MSR_READ: 6674 6529 case EXIT_REASON_MSR_WRITE: 6530 + case EXIT_REASON_MSR_READ_IMM: 6531 + case EXIT_REASON_MSR_WRITE_IMM: 6675 6532 return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason); 6676 6533 case EXIT_REASON_INVALID_STATE: 6677 6534 return true; ··· 6708 6561 return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING); 6709 6562 case EXIT_REASON_XSETBV: 6710 6563 return true; 6711 - case EXIT_REASON_XSAVES: case EXIT_REASON_XRSTORS: 6564 + case EXIT_REASON_XSAVES: 6565 + case EXIT_REASON_XRSTORS: 6712 6566 /* 6713 - * This should never happen, since it is not possible to 6714 - * set XSS to a non-zero value---neither in L1 nor in L2. 6715 - * If if it were, XSS would have to be checked against 6716 - * the XSS exit bitmap in vmcs12. 6567 + * Always forward XSAVES/XRSTORS to L1 as KVM doesn't utilize 6568 + * XSS-bitmap, and always loads vmcs02 with vmcs12's XSS-bitmap 6569 + * verbatim, i.e. any exit is due to L1's bitmap. WARN if 6570 + * XSAVES isn't enabled, as the CPU is supposed to inject #UD 6571 + * in that case, before consulting the XSS-bitmap. 
6717 6572 */ 6718 - return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES); 6573 + WARN_ON_ONCE(!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES)); 6574 + return true; 6719 6575 case EXIT_REASON_UMWAIT: 6720 6576 case EXIT_REASON_TPAUSE: 6721 6577 return nested_cpu_has2(vmcs12, ··· 7179 7029 VM_EXIT_HOST_ADDR_SPACE_SIZE | 7180 7030 #endif 7181 7031 VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | 7182 - VM_EXIT_CLEAR_BNDCFGS; 7032 + VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE; 7183 7033 msrs->exit_ctls_high |= 7184 7034 VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | 7185 7035 VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER | 7186 7036 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT | 7187 7037 VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; 7038 + 7039 + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && 7040 + !kvm_cpu_cap_has(X86_FEATURE_IBT)) 7041 + msrs->exit_ctls_high &= ~VM_EXIT_LOAD_CET_STATE; 7188 7042 7189 7043 /* We support free control of debug control saving. */ 7190 7044 msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS; ··· 7205 7051 #ifdef CONFIG_X86_64 7206 7052 VM_ENTRY_IA32E_MODE | 7207 7053 #endif 7208 - VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS; 7054 + VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS | 7055 + VM_ENTRY_LOAD_CET_STATE; 7209 7056 msrs->entry_ctls_high |= 7210 7057 (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER | 7211 7058 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL); 7059 + 7060 + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && 7061 + !kvm_cpu_cap_has(X86_FEATURE_IBT)) 7062 + msrs->entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE; 7212 7063 7213 7064 /* We support free control of debug control loading. 
*/ 7214 7065 msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS; ··· 7364 7205 msrs->basic |= VMX_BASIC_TRUE_CTLS; 7365 7206 if (cpu_has_vmx_basic_inout()) 7366 7207 msrs->basic |= VMX_BASIC_INOUT; 7208 + if (cpu_has_vmx_basic_no_hw_errcode_cc()) 7209 + msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC; 7367 7210 } 7368 7211 7369 7212 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
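
The MSR-bitmap lookup that nested_vmx_exit_handled_msr() performs in the hunks above can be modeled in userspace. The sketch below is an illustration only, not the kernel code: `msr_exit_wanted()` and its parameters are hypothetical names, and the bitmap page is a plain byte array rather than guest memory read through a GPA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the MSR-bitmap lookup.  The 4096-byte
 * page holds four 1024-byte bitmaps in this order: read-low, read-high,
 * write-low, write-high.  A set bit means the access exits to L1.
 */
static bool msr_exit_wanted(const uint8_t page[4096], uint32_t msr, bool is_write)
{
	/* Write bitmaps live in the second half of the page. */
	uint32_t offset = is_write ? 2048 : 0;

	/* MSRs at 0xc0000000 and above use the "high" bitmap of each half. */
	if (msr >= 0xc0000000u) {
		msr -= 0xc0000000u;
		offset += 1024;
	}

	/* Each bitmap covers 8192 MSRs; anything outside always exits. */
	if (msr >= 1024 * 8)
		return true;

	assert(offset + msr / 8 < 4096);
	return (page[offset + msr / 8] >> (msr & 7)) & 1;
}
```

In this model only the source of `msr` would differ between the two exit flavors: RCX for the classic RDMSR/WRMSR exits, and the exit qualification for the `_IMM` exits, as the diff shows. The `_IMM` write exit selects the write half of the page just like EXIT_REASON_MSR_WRITE.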

+5

arch/x86/kvm/vmx/nested.h

··· 309 309 __kvm_is_valid_cr4(vcpu, val); 310 310 } 311 311 312 + static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu) 313 + { 314 + return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC; 315 + } 316 + 312 317 /* No difference in the restrictions on guest and host CR4 in VMX operation. */ 313 318 #define nested_guest_cr4_valid nested_cr4_valid 314 319 #define nested_host_cr4_valid nested_cr4_valid
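
The helper above feeds the reworked error-code consistency check in nested.c. A minimal sketch of that decision, assuming the inputs have already been reduced to booleans, is shown below. `errcode_field_ok()` and its parameter names are hypothetical and do not appear in the patch; the function returns true when the check passes, whereas the kernel code returns -EINVAL on failure.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the VM-entry "deliver error code" check.
 *
 * prot_mode:             guest will be in protected mode after VM-entry
 * hard_exception:        interruption type is "hardware exception"
 * has_error_code:        the deliver-error-code bit is set in the field
 * vector_has_error_code: the architecture defines an error code for
 *                        this vector
 * no_hw_errcode_cc:      the vCPU enumerates VMX_BASIC_NO_HW_ERROR_CODE_CC
 */
static bool errcode_field_ok(bool prot_mode, bool hard_exception,
			     bool has_error_code, bool vector_has_error_code,
			     bool no_hw_errcode_cc)
{
	/* Real mode or a non-exception event can never carry an error code. */
	if (!prot_mode || !hard_exception)
		return !has_error_code;

	/* Without the relaxation, the bit must match the vector's definition. */
	if (!no_hw_errcode_cc)
		return has_error_code == vector_has_error_code;

	/* With the relaxation, either setting is accepted. */
	return true;
}
```

The ordering matters: the unconditional rejection comes first, so enumerating VMX_BASIC_NO_HW_ERROR_CODE_CC relaxes the check only for hardware exceptions delivered in protected mode.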

+35 -44

arch/x86/kvm/vmx/pmu_intel.c

··· 138 138 139 139 static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu) 140 140 { 141 - return (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_FW_WRITES) != 0; 141 + return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0; 142 142 } 143 143 144 144 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr) ··· 478 478 }; 479 479 u64 eventsel; 480 480 481 - BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUTNERS); 482 - BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUTNERS); 481 + BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_perf_ids) != KVM_MAX_NR_INTEL_FIXED_COUNTERS); 482 + BUILD_BUG_ON(index >= KVM_MAX_NR_INTEL_FIXED_COUNTERS); 483 483 484 484 /* 485 485 * Yell if perf reports support for a fixed counter but perf doesn't ··· 536 536 kvm_pmu_cap.num_counters_gp); 537 537 eax.split.bit_width = min_t(int, eax.split.bit_width, 538 538 kvm_pmu_cap.bit_width_gp); 539 - pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1; 539 + pmu->counter_bitmask[KVM_PMC_GP] = BIT_ULL(eax.split.bit_width) - 1; 540 540 eax.split.mask_length = min_t(int, eax.split.mask_length, 541 541 kvm_pmu_cap.events_mask_len); 542 - pmu->available_event_types = ~entry->ebx & 543 - ((1ull << eax.split.mask_length) - 1); 542 + pmu->available_event_types = ~entry->ebx & (BIT_ULL(eax.split.mask_length) - 1); 544 543 545 - if (pmu->version == 1) { 546 - pmu->nr_arch_fixed_counters = 0; 547 - } else { 548 - pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed, 549 - kvm_pmu_cap.num_counters_fixed); 550 - edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed, 551 - kvm_pmu_cap.bit_width_fixed); 552 - pmu->counter_bitmask[KVM_PMC_FIXED] = 553 - ((u64)1 << edx.split.bit_width_fixed) - 1; 544 + entry = kvm_find_cpuid_entry_index(vcpu, 7, 0); 545 + if (entry && 546 + (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) && 547 + (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) { 548 + pmu->reserved_bits ^= 
HSW_IN_TX; 549 + pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED); 554 550 } 551 + 552 + perf_capabilities = vcpu_get_perf_capabilities(vcpu); 553 + if (intel_pmu_lbr_is_compatible(vcpu) && 554 + (perf_capabilities & PERF_CAP_LBR_FMT)) 555 + memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps)); 556 + else 557 + lbr_desc->records.nr = 0; 558 + 559 + if (lbr_desc->records.nr) 560 + bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1); 561 + 562 + if (pmu->version == 1) 563 + return; 564 + 565 + pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed, 566 + kvm_pmu_cap.num_counters_fixed); 567 + edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed, 568 + kvm_pmu_cap.bit_width_fixed); 569 + pmu->counter_bitmask[KVM_PMC_FIXED] = BIT_ULL(edx.split.bit_width_fixed) - 1; 555 570 556 571 intel_pmu_enable_fixed_counter_bits(pmu, INTEL_FIXED_0_KERNEL | 557 572 INTEL_FIXED_0_USER | 558 573 INTEL_FIXED_0_ENABLE_PMI); 559 574 560 - counter_rsvd = ~(((1ull << pmu->nr_arch_gp_counters) - 1) | 561 - (((1ull << pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX)); 575 + counter_rsvd = ~((BIT_ULL(pmu->nr_arch_gp_counters) - 1) | 576 + ((BIT_ULL(pmu->nr_arch_fixed_counters) - 1) << KVM_FIXED_PMC_BASE_IDX)); 562 577 pmu->global_ctrl_rsvd = counter_rsvd; 563 578 564 579 /* ··· 588 573 pmu->global_status_rsvd &= 589 574 ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI; 590 575 591 - entry = kvm_find_cpuid_entry_index(vcpu, 7, 0); 592 - if (entry && 593 - (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) && 594 - (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) { 595 - pmu->reserved_bits ^= HSW_IN_TX; 596 - pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED); 597 - } 598 - 599 - bitmap_set(pmu->all_valid_pmc_idx, 600 - 0, pmu->nr_arch_gp_counters); 601 - bitmap_set(pmu->all_valid_pmc_idx, 602 - INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters); 603 - 604 - perf_capabilities = 
vcpu_get_perf_capabilities(vcpu); 605 - if (intel_pmu_lbr_is_compatible(vcpu) && 606 - (perf_capabilities & PMU_CAP_LBR_FMT)) 607 - memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps)); 608 - else 609 - lbr_desc->records.nr = 0; 610 - 611 - if (lbr_desc->records.nr) 612 - bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1); 613 - 614 576 if (perf_capabilities & PERF_CAP_PEBS_FORMAT) { 615 577 if (perf_capabilities & PERF_CAP_PEBS_BASELINE) { 616 578 pmu->pebs_enable_rsvd = counter_rsvd; ··· 595 603 pmu->pebs_data_cfg_rsvd = ~0xff00000full; 596 604 intel_pmu_enable_fixed_counter_bits(pmu, ICL_FIXED_0_ADAPTIVE); 597 605 } else { 598 - pmu->pebs_enable_rsvd = 599 - ~((1ull << pmu->nr_arch_gp_counters) - 1); 606 + pmu->pebs_enable_rsvd = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1); 600 607 } 601 608 } 602 609 } ··· 616 625 pmu->gp_counters[i].current_config = 0; 617 626 } 618 627 619 - for (i = 0; i < KVM_MAX_NR_INTEL_FIXED_COUTNERS; i++) { 628 + for (i = 0; i < KVM_MAX_NR_INTEL_FIXED_COUNTERS; i++) { 620 629 pmu->fixed_counters[i].type = KVM_PMC_FIXED; 621 630 pmu->fixed_counters[i].vcpu = vcpu; 622 631 pmu->fixed_counters[i].idx = i + KVM_FIXED_PMC_BASE_IDX; ··· 753 762 int bit, hw_idx; 754 763 755 764 kvm_for_each_pmc(pmu, pmc, bit, (unsigned long *)&pmu->global_ctrl) { 756 - if (!pmc_speculative_in_use(pmc) || 765 + if (!pmc_is_locally_enabled(pmc) || 757 766 !pmc_is_globally_enabled(pmc) || !pmc->perf_event) 758 767 continue; 759 768
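
Several hunks in this file replace open-coded `1ull << n` shifts with `BIT_ULL(n) - 1`. The standalone sketch below shows the same mask arithmetic outside the kernel. It redefines `BIT_ULL` locally, the helper names are hypothetical, and `FIXED_PMC_BASE_IDX` is assumed to be 32 here rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel macro of the same name. */
#define BIT_ULL(n)		(1ULL << (n))
/* Assumed bit position of the first fixed counter; not taken from the diff. */
#define FIXED_PMC_BASE_IDX	32

/* Mask with the low n bits set, e.g. a counter that is n bits wide. */
static uint64_t low_bits_mask(unsigned int n)
{
	assert(n < 64);		/* shifting by 64 or more is undefined */
	return BIT_ULL(n) - 1;
}

/* Reserved-bit mask: every bit that does not map to a GP or fixed counter. */
static uint64_t counter_rsvd_mask(unsigned int nr_gp, unsigned int nr_fixed)
{
	return ~(low_bits_mask(nr_gp) |
		 (low_bits_mask(nr_fixed) << FIXED_PMC_BASE_IDX));
}
```

With 8 general-purpose and 3 fixed counters, bits 0-7 and 32-34 are valid and every other bit is reserved. The diff assigns the result of this computation to `global_ctrl_rsvd`.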

+20 -8

arch/x86/kvm/vmx/tdx.c

··· 620 620 struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); 621 621 622 622 kvm->arch.has_protected_state = true; 623 + /* 624 + * TDX Module doesn't allow the hypervisor to modify the EOI-bitmap, 625 + * i.e. all EOIs are accelerated and never trigger exits. 626 + */ 627 + kvm->arch.has_protected_eoi = true; 623 628 kvm->arch.has_private_mem = true; 624 629 kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT; 625 630 ··· 1999 1994 * handle retries locally in their EPT violation handlers. 2000 1995 */ 2001 1996 while (1) { 1997 + struct kvm_memory_slot *slot; 1998 + 2002 1999 ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual); 2003 2000 2004 2001 if (ret != RET_PF_RETRY || !local_retry) ··· 2013 2006 ret = -EIO; 2014 2007 break; 2015 2008 } 2009 + 2010 + /* 2011 + * Bail if the memslot is invalid, i.e. is being deleted, as 2012 + * faulting in will never succeed and this task needs to drop 2013 + * SRCU in order to let memslot deletion complete. 2014 + */ 2015 + slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa)); 2016 + if (slot && slot->flags & KVM_MEMSLOT_INVALID) 2017 + break; 2016 2018 2017 2019 cond_resched(); 2018 2020 } ··· 2488 2472 /* TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. 
*/ 2489 2473 kvm_tdx->td.tdcx_nr_pages = tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1; 2490 2474 tdcs_pages = kcalloc(kvm_tdx->td.tdcs_nr_pages, sizeof(*kvm_tdx->td.tdcs_pages), 2491 - GFP_KERNEL | __GFP_ZERO); 2475 + GFP_KERNEL); 2492 2476 if (!tdcs_pages) 2493 2477 goto free_tdr; 2494 2478 ··· 3476 3460 if (r) 3477 3461 goto tdx_bringup_err; 3478 3462 3463 + r = -EINVAL; 3479 3464 /* Get TDX global information for later use */ 3480 3465 tdx_sysinfo = tdx_get_sysinfo(); 3481 - if (WARN_ON_ONCE(!tdx_sysinfo)) { 3482 - r = -EINVAL; 3466 + if (WARN_ON_ONCE(!tdx_sysinfo)) 3483 3467 goto get_sysinfo_err; 3484 - } 3485 3468 3486 3469 /* Check TDX module and KVM capabilities */ 3487 3470 if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) || ··· 3523 3508 if (td_conf->max_vcpus_per_td < num_present_cpus()) { 3524 3509 pr_err("Disable TDX: MAX_VCPU_PER_TD (%u) smaller than number of logical CPUs (%u).\n", 3525 3510 td_conf->max_vcpus_per_td, num_present_cpus()); 3526 - r = -EINVAL; 3527 3511 goto get_sysinfo_err; 3528 3512 } 3529 3513 3530 - if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids())) { 3531 - r = -EINVAL; 3514 + if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids())) 3532 3515 goto get_sysinfo_err; 3533 - } 3534 3516 3535 3517 /* 3536 3518 * Leave hardware virtualization enabled after TDX is enabled

+6

arch/x86/kvm/vmx/vmcs12.c

··· 139 139 FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions), 140 140 FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp), 141 141 FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip), 142 + FIELD(GUEST_S_CET, guest_s_cet), 143 + FIELD(GUEST_SSP, guest_ssp), 144 + FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl), 142 145 FIELD(HOST_CR0, host_cr0), 143 146 FIELD(HOST_CR3, host_cr3), 144 147 FIELD(HOST_CR4, host_cr4), ··· 154 151 FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip), 155 152 FIELD(HOST_RSP, host_rsp), 156 153 FIELD(HOST_RIP, host_rip), 154 + FIELD(HOST_S_CET, host_s_cet), 155 + FIELD(HOST_SSP, host_ssp), 156 + FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl), 157 157 }; 158 158 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);

+13 -1

arch/x86/kvm/vmx/vmcs12.h

··· 117 117 natural_width host_ia32_sysenter_eip; 118 118 natural_width host_rsp; 119 119 natural_width host_rip; 120 - natural_width paddingl[8]; /* room for future expansion */ 120 + natural_width host_s_cet; 121 + natural_width host_ssp; 122 + natural_width host_ssp_tbl; 123 + natural_width guest_s_cet; 124 + natural_width guest_ssp; 125 + natural_width guest_ssp_tbl; 126 + natural_width paddingl[2]; /* room for future expansion */ 121 127 u32 pin_based_vm_exec_control; 122 128 u32 cpu_based_vm_exec_control; 123 129 u32 exception_bitmap; ··· 300 294 CHECK_OFFSET(host_ia32_sysenter_eip, 656); 301 295 CHECK_OFFSET(host_rsp, 664); 302 296 CHECK_OFFSET(host_rip, 672); 297 + CHECK_OFFSET(host_s_cet, 680); 298 + CHECK_OFFSET(host_ssp, 688); 299 + CHECK_OFFSET(host_ssp_tbl, 696); 300 + CHECK_OFFSET(guest_s_cet, 704); 301 + CHECK_OFFSET(guest_ssp, 712); 302 + CHECK_OFFSET(guest_ssp_tbl, 720); 303 303 CHECK_OFFSET(pin_based_vm_exec_control, 744); 304 304 CHECK_OFFSET(cpu_based_vm_exec_control, 748); 305 305 CHECK_OFFSET(exception_bitmap, 752);
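
The offsets in the CHECK_OFFSET() lines follow from the layout: six new 8-byte fields take six of the eight former padding slots, so `pin_based_vm_exec_control` stays at 744. The sketch below re-derives that with C11 `static_assert`. `struct vmcs12_tail` is a hypothetical excerpt, with a byte array standing in for everything ahead of `host_rip`, and `natural_width` is assumed to be a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t natural_width;	/* assumption: 8 bytes, as the offsets imply */

/* Hypothetical excerpt modeling only the end of the natural-width area. */
struct vmcs12_tail {
	uint8_t before[672];		/* stand-in for all earlier fields */
	natural_width host_rip;
	natural_width host_s_cet;
	natural_width host_ssp;
	natural_width host_ssp_tbl;
	natural_width guest_s_cet;
	natural_width guest_ssp;
	natural_width guest_ssp_tbl;
	natural_width paddingl[2];	/* was paddingl[8] before the six fields */
	uint32_t pin_based_vm_exec_control;
};

static_assert(offsetof(struct vmcs12_tail, host_rip) == 672, "host_rip moved");
static_assert(offsetof(struct vmcs12_tail, host_s_cet) == 680, "host_s_cet moved");
static_assert(offsetof(struct vmcs12_tail, guest_s_cet) == 704, "guest_s_cet moved");
static_assert(offsetof(struct vmcs12_tail, guest_ssp_tbl) == 720, "guest_ssp_tbl moved");
static_assert(offsetof(struct vmcs12_tail, pin_based_vm_exec_control) == 744,
	      "fields after the padding must not move");
```

Because the new fields come out of the padding, every field after it keeps its offset, which is what the unchanged CHECK_OFFSET() values for the 32-bit controls confirm.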

+181 -52

arch/x86/kvm/vmx/vmx.c

··· 1344 1344 } 1345 1345 1346 1346 #ifdef CONFIG_X86_64 1347 - static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) 1347 + static u64 vmx_read_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 *cache) 1348 1348 { 1349 1349 preempt_disable(); 1350 1350 if (vmx->vt.guest_state_loaded) 1351 - rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); 1351 + *cache = read_msr(msr); 1352 1352 preempt_enable(); 1353 - return vmx->msr_guest_kernel_gs_base; 1353 + return *cache; 1354 + } 1355 + 1356 + static void vmx_write_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 data, 1357 + u64 *cache) 1358 + { 1359 + preempt_disable(); 1360 + if (vmx->vt.guest_state_loaded) 1361 + wrmsrns(msr, data); 1362 + preempt_enable(); 1363 + *cache = data; 1364 + } 1365 + 1366 + static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) 1367 + { 1368 + return vmx_read_guest_host_msr(vmx, MSR_KERNEL_GS_BASE, 1369 + &vmx->msr_guest_kernel_gs_base); 1354 1370 } 1355 1371 1356 1372 static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data) 1357 1373 { 1358 - preempt_disable(); 1359 - if (vmx->vt.guest_state_loaded) 1360 - wrmsrq(MSR_KERNEL_GS_BASE, data); 1361 - preempt_enable(); 1362 - vmx->msr_guest_kernel_gs_base = data; 1374 + vmx_write_guest_host_msr(vmx, MSR_KERNEL_GS_BASE, data, 1375 + &vmx->msr_guest_kernel_gs_base); 1363 1376 } 1364 1377 #endif 1365 1378 ··· 2106 2093 else 2107 2094 msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; 2108 2095 break; 2096 + case MSR_IA32_S_CET: 2097 + msr_info->data = vmcs_readl(GUEST_S_CET); 2098 + break; 2099 + case MSR_KVM_INTERNAL_GUEST_SSP: 2100 + msr_info->data = vmcs_readl(GUEST_SSP); 2101 + break; 2102 + case MSR_IA32_INT_SSP_TAB: 2103 + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE); 2104 + break; 2109 2105 case MSR_IA32_DEBUGCTLMSR: 2110 2106 msr_info->data = vmx_guest_debugctl_read(); 2111 2107 break; ··· 2149 2127 (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))) 2150 2128 
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT; 2151 2129 2152 - if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) && 2130 + if ((kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT) && 2153 2131 (host_initiated || intel_pmu_lbr_is_enabled(vcpu))) 2154 2132 debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; 2155 2133 ··· 2433 2411 else 2434 2412 vmx->pt_desc.guest.addr_a[index / 2] = data; 2435 2413 break; 2414 + case MSR_IA32_S_CET: 2415 + vmcs_writel(GUEST_S_CET, data); 2416 + break; 2417 + case MSR_KVM_INTERNAL_GUEST_SSP: 2418 + vmcs_writel(GUEST_SSP, data); 2419 + break; 2420 + case MSR_IA32_INT_SSP_TAB: 2421 + vmcs_writel(GUEST_INTR_SSP_TABLE, data); 2422 + break; 2436 2423 case MSR_IA32_PERF_CAPABILITIES: 2437 - if (data & PMU_CAP_LBR_FMT) { 2438 - if ((data & PMU_CAP_LBR_FMT) != 2439 - (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT)) 2424 + if (data & PERF_CAP_LBR_FMT) { 2425 + if ((data & PERF_CAP_LBR_FMT) != 2426 + (kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT)) 2440 2427 return 1; 2441 2428 if (!cpuid_model_is_consistent(vcpu)) 2442 2429 return 1; ··· 2615 2584 { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER }, 2616 2585 { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS }, 2617 2586 { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL }, 2587 + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE }, 2618 2588 }; 2619 2589 2620 2590 memset(vmcs_conf, 0, sizeof(*vmcs_conf)); ··· 4100 4068 } 4101 4069 } 4102 4070 4103 - void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu) 4071 + static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu) 4104 4072 { 4073 + bool intercept; 4074 + 4105 4075 if (!cpu_has_vmx_msr_bitmap()) 4106 4076 return; 4107 4077 ··· 4149 4115 vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W, 4150 4116 !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D)); 4151 4117 4118 + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { 4119 + intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK); 4120 + 4121 + vmx_set_intercept_for_msr(vcpu, 
MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept); 4122 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept); 4123 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept); 4124 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept); 4125 + } 4126 + 4127 + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) { 4128 + intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) && 4129 + !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK); 4130 + 4131 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept); 4132 + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept); 4133 + } 4134 + 4152 4135 /* 4153 4136 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be 4154 4137 * filtered by userspace. 4155 4138 */ 4139 + } 4140 + 4141 + void vmx_recalc_intercepts(struct kvm_vcpu *vcpu) 4142 + { 4143 + vmx_recalc_msr_intercepts(vcpu); 4156 4144 } 4157 4145 4158 4146 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, ··· 4326 4270 4327 4271 if (cpu_has_load_ia32_efer()) 4328 4272 vmcs_write64(HOST_IA32_EFER, kvm_host.efer); 4273 + 4274 + /* 4275 + * Supervisor shadow stack is not enabled on host side, i.e., 4276 + * host IA32_S_CET.SHSTK_EN bit is guaranteed to 0 now, per SDM 4277 + * description(RDSSP instruction), SSP is not readable in CPL0, 4278 + * so resetting the two registers to 0s at VM-Exit does no harm 4279 + * to kernel execution. When execution flow exits to userspace, 4280 + * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter 4281 + * 3 and 4 for details. 
4282 + */ 4283 + if (cpu_has_load_cet_ctrl()) { 4284 + vmcs_writel(HOST_S_CET, kvm_host.s_cet); 4285 + vmcs_writel(HOST_SSP, 0); 4286 + vmcs_writel(HOST_INTR_SSP_TABLE, 0); 4287 + } 4329 4288 } 4330 4289 4331 4290 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) ··· 4375 4304 return pin_based_exec_ctrl; 4376 4305 } 4377 4306 4378 - static u32 vmx_vmentry_ctrl(void) 4307 + static u32 vmx_get_initial_vmentry_ctrl(void) 4379 4308 { 4380 4309 u32 vmentry_ctrl = vmcs_config.vmentry_ctrl; 4381 4310 ··· 4392 4321 return vmentry_ctrl; 4393 4322 } 4394 4323 4395 - static u32 vmx_vmexit_ctrl(void) 4324 + static u32 vmx_get_initial_vmexit_ctrl(void) 4396 4325 { 4397 4326 u32 vmexit_ctrl = vmcs_config.vmexit_ctrl; 4398 4327 ··· 4422 4351 4423 4352 pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); 4424 4353 4425 - if (kvm_vcpu_apicv_active(vcpu)) { 4426 - secondary_exec_controls_setbit(vmx, 4427 - SECONDARY_EXEC_APIC_REGISTER_VIRT | 4428 - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); 4429 - if (enable_ipiv) 4430 - tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT); 4431 - } else { 4432 - secondary_exec_controls_clearbit(vmx, 4433 - SECONDARY_EXEC_APIC_REGISTER_VIRT | 4434 - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); 4435 - if (enable_ipiv) 4436 - tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT); 4437 - } 4354 + secondary_exec_controls_changebit(vmx, 4355 + SECONDARY_EXEC_APIC_REGISTER_VIRT | 4356 + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY, 4357 + kvm_vcpu_apicv_active(vcpu)); 4358 + if (enable_ipiv) 4359 + tertiary_exec_controls_changebit(vmx, TERTIARY_EXEC_IPI_VIRT, 4360 + kvm_vcpu_apicv_active(vcpu)); 4438 4361 4439 4362 vmx_update_msr_bitmap_x2apic(vcpu); 4440 4363 } ··· 4751 4686 if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) 4752 4687 vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat); 4753 4688 4754 - vm_exit_controls_set(vmx, vmx_vmexit_ctrl()); 4689 + vm_exit_controls_set(vmx, vmx_get_initial_vmexit_ctrl()); 4755 4690 4756 4691 /* 22.2.1, 20.8.1 */ 4757 
- vm_entry_controls_set(vmx, vmx_vmentry_ctrl()); 4692 + vm_entry_controls_set(vmx, vmx_get_initial_vmentry_ctrl()); 4758 4693 4759 4694 vmx->vcpu.arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits(); 4760 4695 vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits); ··· 4881 4816 vmcs_write64(GUEST_BNDCFGS, 0); 4882 4817 4883 4818 vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */ 4819 + 4820 + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { 4821 + vmcs_writel(GUEST_SSP, 0); 4822 + vmcs_writel(GUEST_INTR_SSP_TABLE, 0); 4823 + } 4824 + if (kvm_cpu_cap_has(X86_FEATURE_IBT) || 4825 + kvm_cpu_cap_has(X86_FEATURE_SHSTK)) 4826 + vmcs_writel(GUEST_S_CET, 0); 4884 4827 4885 4828 kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); 4886 4829 ··· 6083 6010 return 1; 6084 6011 } 6085 6012 6013 + static int vmx_get_msr_imm_reg(struct kvm_vcpu *vcpu) 6014 + { 6015 + return vmx_get_instr_info_reg(vmcs_read32(VMX_INSTRUCTION_INFO)); 6016 + } 6017 + 6018 + static int handle_rdmsr_imm(struct kvm_vcpu *vcpu) 6019 + { 6020 + return kvm_emulate_rdmsr_imm(vcpu, vmx_get_exit_qual(vcpu), 6021 + vmx_get_msr_imm_reg(vcpu)); 6022 + } 6023 + 6024 + static int handle_wrmsr_imm(struct kvm_vcpu *vcpu) 6025 + { 6026 + return kvm_emulate_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu), 6027 + vmx_get_msr_imm_reg(vcpu)); 6028 + } 6029 + 6086 6030 /* 6087 6031 * The exit handlers return 1 if the exit was handled fully and guest execution 6088 6032 * may resume. 
Otherwise they set the kvm_run parameter to indicate what needs
···
 	[EXIT_REASON_ENCLS]		      = handle_encls,
 	[EXIT_REASON_BUS_LOCK]                = handle_bus_lock_vmexit,
 	[EXIT_REASON_NOTIFY]		      = handle_notify,
+	[EXIT_REASON_MSR_READ_IMM]	      = handle_rdmsr_imm,
+	[EXIT_REASON_MSR_WRITE_IMM]	      = handle_wrmsr_imm,
 };

 static const int kvm_vmx_max_exit_handlers =
···
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);

+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
···
 		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 	if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+		pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+		       vmcs_readl(HOST_INTR_SSP_TABLE));

 	pr_err("*** Control State ***\n");
 	pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
···
 #ifdef CONFIG_MITIGATION_RETPOLINE
 	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		return kvm_emulate_wrmsr(vcpu);
+	else if (exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
+		return handle_wrmsr_imm(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER)
 		return handle_preemption_timer(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW)
···

 	switch (vmx_get_exit_reason(vcpu).basic) {
 	case EXIT_REASON_MSR_WRITE:
-		return handle_fastpath_set_msr_irqoff(vcpu);
+		return handle_fastpath_wrmsr(vcpu);
+	case EXIT_REASON_MSR_WRITE_IMM:
+		return handle_fastpath_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+						 vmx_get_msr_imm_reg(vcpu));
 	case EXIT_REASON_PREEMPTION_TIMER:
 		return handle_fastpath_preemption_timer(vcpu, force_immediate_exit);
 	case EXIT_REASON_HLT:
 		return handle_fastpath_hlt(vcpu);
+	case EXIT_REASON_INVD:
+		return handle_fastpath_invd(vcpu);
 	default:
 		return EXIT_FASTPATH_NONE;
 	}
···
 	cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));

 	entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
 	cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM));
···
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;

-	/* Recalc MSR interception to account for feature changes. */
-	vmx_recalc_msr_intercepts(vcpu);
-
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }

 static __init u64 vmx_get_perf_capabilities(void)
 {
-	u64 perf_cap = PMU_CAP_FW_WRITES;
+	u64 perf_cap = PERF_CAP_FW_WRITES;
 	u64 host_perf_cap = 0;

 	if (!enable_pmu)
···
 		if (!vmx_lbr_caps.has_callstack)
 			memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps));
 		else if (vmx_lbr_caps.nr)
-			perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+			perf_cap |= host_perf_cap & PERF_CAP_LBR_FMT;
 	}

 	if (vmx_pebs_supported()) {
···
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);

 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

···

 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	/*
+	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
+	 * enforce CET HW behaviors in emulator. On platforms with
+	 * VMX_BASIC[bit56] == 0, inject #CP at VMX entry with error code
+	 * fails, so disable CET in this case too.
+	 */
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode_cc()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
 }

 static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
···

 	vmx_setup_user_return_msrs();

-	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
-		return -EIO;

 	if (boot_cpu_has(X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
···
 		pr_err_ratelimited("NX (Execute Disable) not supported\n");
 		return -EOPNOTSUPP;
 	}
+
+	/*
+	 * Shadow paging doesn't have a (further) performance penalty
+	 * from GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable it
+	 * by default
+	 */
+	if (!enable_ept)
+		allow_smaller_maxphyaddr = true;

 	if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
 		enable_ept_ad_bits = 0;
···

 	setup_default_sgx_lepubkeyhash();

+	vmx_set_cpu_caps();
+
+	/*
+	 * Configure nested capabilities after core CPU capabilities so that
+	 * nested support can be conditional on base support, e.g. so that KVM
+	 * can hide/show features based on kvm_cpu_cap_has().
+	 */
 	if (nested) {
 		nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);

···
 		if (r)
 			return r;
 	}
-
-	vmx_set_cpu_caps();

 	r = alloc_kvm_area();
 	if (r && nested)
···
 	 */
 	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
 		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
-	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
+	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+
 	return r;
 }

···
 		return -EOPNOTSUPP;

 	/*
-	 * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing
-	 * to unwind if a later step fails.
+	 * Note, VMCS and eVMCS configuration only touch VMX knobs/variables,
+	 * i.e. there's nothing to unwind if a later step fails.
 	 */
 	hv_init_evmcs();
+
+	/*
+	 * Parse the VMCS config and VMX capabilities before anything else, so
+	 * that the information is available to all setup flows.
+	 */
+	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
+		return -EIO;

 	r = kvm_x86_vendor_init(&vt_init_ops);
 	if (r)
···
 	}

 	vmx_check_vmcs12_offsets();
-
-	/*
-	 * Shadow paging doesn't have a (further) performance penalty
-	 * from GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable it
-	 * by default
-	 */
-	if (!enable_ept)
-		allow_smaller_maxphyaddr = true;

 	return 0;

+20 -2

arch/x86/kvm/vmx/vmx.h

···
 	 */
 	u64 pre_vmenter_debugctl;
 	u64 pre_vmenter_bndcfgs;
+	u64 pre_vmenter_s_cet;
+	u64 pre_vmenter_ssp;
+	u64 pre_vmenter_ssp_tbl;

 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
···
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_CET_STATE)

 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
···
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_LOAD_CET_STATE)

 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
···
 {										\
 	BUILD_BUG_ON(!(val & (KVM_REQUIRED_VMX_##uname | KVM_OPTIONAL_VMX_##uname)));	\
 	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);		\
+}										\
+static __always_inline void lname##_controls_changebit(struct vcpu_vmx *vmx, u##bits val,	\
+						       bool set)		\
+{										\
+	if (set)								\
+		lname##_controls_setbit(vmx, val);				\
+	else									\
+		lname##_controls_clearbit(vmx, val);				\
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
···
 }

 void dump_vmcs(struct kvm_vcpu *vcpu);
+
+static inline int vmx_get_instr_info_reg(u32 vmx_instr_info)
+{
+	return (vmx_instr_info >> 3) & 0xf;
+}

 static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
 {
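
Reviewer sketch (not part of the patch): the two helpers added to vmx.h above are small enough to model in userspace. The first function copies the shift-and-mask from vmx_get_instr_info_reg(), which extracts a 4-bit register index from bits 6:3 of the instruction-information value. The second is a simplified, hypothetical stand-in for the generated `*_controls_changebit()` helpers: it works on a plain 32-bit word and returns the result instead of updating a VMCS shadow. The names `instr_info_reg` and `controls_changebit` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the patch: the register index lives in bits 6:3. */
static inline int instr_info_reg(uint32_t vmx_instr_info)
{
	return (vmx_instr_info >> 3) & 0xf;
}

/*
 * Hypothetical model of a "changebit" helper: set or clear @bit in a
 * plain control word depending on @set. The kernel helper dispatches to
 * the existing setbit/clearbit helpers instead of returning a value.
 */
static inline uint32_t controls_changebit(uint32_t controls, uint32_t bit,
					  bool set)
{
	return set ? (controls | bit) : (controls & ~bit);
}
```

A changebit-style helper lets a caller write one line driven by a boolean instead of an if/else around separate set and clear calls, which appears to be the motivation for adding it to the macro.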

+1 -1

arch/x86/kvm/vmx/x86_ops.h

···
 			   int trig_mode, int vector);
 void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
-void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu);
+void vmx_recalc_intercepts(struct kvm_vcpu *vcpu);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
 int vmx_get_feature_msr(u32 msr, u64 *data);

+681 -271

arch/x86/kvm/x86.c

··· 97 97 * vendor module being reloaded with different module parameters. 98 98 */ 99 99 struct kvm_caps kvm_caps __read_mostly; 100 - EXPORT_SYMBOL_GPL(kvm_caps); 100 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_caps); 101 101 102 102 struct kvm_host_values kvm_host __read_mostly; 103 - EXPORT_SYMBOL_GPL(kvm_host); 103 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host); 104 104 105 105 #define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e)) 106 106 ··· 136 136 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2); 137 137 138 138 static DEFINE_MUTEX(vendor_module_lock); 139 + static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); 140 + static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); 141 + 139 142 struct kvm_x86_ops kvm_x86_ops __read_mostly; 140 143 141 144 #define KVM_X86_OP(func) \ ··· 155 152 156 153 bool __read_mostly report_ignored_msrs = true; 157 154 module_param(report_ignored_msrs, bool, 0644); 158 - EXPORT_SYMBOL_GPL(report_ignored_msrs); 155 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs); 159 156 160 157 unsigned int min_timer_period_us = 200; 161 158 module_param(min_timer_period_us, uint, 0644); ··· 167 164 static u32 __read_mostly tsc_tolerance_ppm = 250; 168 165 module_param(tsc_tolerance_ppm, uint, 0644); 169 166 170 - static bool __read_mostly vector_hashing = true; 171 - module_param(vector_hashing, bool, 0444); 172 - 173 167 bool __read_mostly enable_vmware_backdoor = false; 174 168 module_param(enable_vmware_backdoor, bool, 0444); 175 - EXPORT_SYMBOL_GPL(enable_vmware_backdoor); 169 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_vmware_backdoor); 176 170 177 171 /* 178 172 * Flags to manipulate forced emulation behavior (any non-zero value will ··· 184 184 185 185 /* Enable/disable PMU virtualization */ 186 186 bool __read_mostly enable_pmu = true; 187 - EXPORT_SYMBOL_GPL(enable_pmu); 187 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_pmu); 188 188 module_param(enable_pmu, bool, 0444); 189 189 190 190 bool __read_mostly eager_page_split = true; 
··· 211 211 }; 212 212 213 213 u32 __read_mostly kvm_nr_uret_msrs; 214 - EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs); 214 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs); 215 215 static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS]; 216 216 static struct kvm_user_return_msrs __percpu *user_return_msrs; 217 217 ··· 220 220 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ 221 221 | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE) 222 222 223 + #define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL) 224 + /* 225 + * Note, KVM supports exposing PT to the guest, but does not support context 226 + * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping 227 + * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support 228 + * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs). 229 + */ 230 + #define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_ALL) 231 + 223 232 bool __read_mostly allow_smaller_maxphyaddr = 0; 224 - EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr); 233 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(allow_smaller_maxphyaddr); 225 234 226 235 bool __read_mostly enable_apicv = true; 227 - EXPORT_SYMBOL_GPL(enable_apicv); 236 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_apicv); 228 237 229 238 bool __read_mostly enable_ipiv = true; 230 - EXPORT_SYMBOL_GPL(enable_ipiv); 239 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_ipiv); 231 240 232 241 bool __read_mostly enable_device_posted_irqs = true; 233 - EXPORT_SYMBOL_GPL(enable_device_posted_irqs); 242 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_device_posted_irqs); 234 243 235 244 const struct _kvm_stats_desc kvm_vm_stats_desc[] = { 236 245 KVM_GENERIC_VM_STATS(), ··· 344 335 MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B, 345 336 MSR_IA32_UMWAIT_CONTROL, 346 337 347 - MSR_IA32_XFD, MSR_IA32_XFD_ERR, 338 + MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS, 339 + 340 + MSR_IA32_U_CET, MSR_IA32_S_CET, 341 + MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP, 342 + 
MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, 348 343 }; 349 344 350 345 static const u32 msrs_to_save_pmu[] = { ··· 380 367 MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 381 368 MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 382 369 MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 370 + MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET, 383 371 }; 384 372 385 373 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) + ··· 628 614 kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr; 629 615 return kvm_nr_uret_msrs++; 630 616 } 631 - EXPORT_SYMBOL_GPL(kvm_add_user_return_msr); 617 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr); 632 618 633 619 int kvm_find_user_return_msr(u32 msr) 634 620 { ··· 640 626 } 641 627 return -1; 642 628 } 643 - EXPORT_SYMBOL_GPL(kvm_find_user_return_msr); 629 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr); 644 630 645 631 static void kvm_user_return_msr_cpu_online(void) 646 632 { ··· 680 666 kvm_user_return_register_notifier(msrs); 681 667 return 0; 682 668 } 683 - EXPORT_SYMBOL_GPL(kvm_set_user_return_msr); 669 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr); 684 670 685 671 void kvm_user_return_msr_update_cache(unsigned int slot, u64 value) 686 672 { ··· 689 675 msrs->values[slot].curr = value; 690 676 kvm_user_return_register_notifier(msrs); 691 677 } 692 - EXPORT_SYMBOL_GPL(kvm_user_return_msr_update_cache); 678 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_user_return_msr_update_cache); 679 + 680 + u64 kvm_get_user_return_msr(unsigned int slot) 681 + { 682 + return this_cpu_ptr(user_return_msrs)->values[slot].curr; 683 + } 684 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr); 693 685 694 686 static void drop_user_return_notifiers(void) 695 687 { ··· 717 697 /* Fault while not rebooting. We want the trace. 
*/ 718 698 BUG_ON(!kvm_rebooting); 719 699 } 720 - EXPORT_SYMBOL_GPL(kvm_spurious_fault); 700 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spurious_fault); 721 701 722 702 #define EXCPT_BENIGN 0 723 703 #define EXCPT_CONTRIBUTORY 1 ··· 822 802 ex->has_payload = false; 823 803 ex->payload = 0; 824 804 } 825 - EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload); 805 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_deliver_exception_payload); 826 806 827 807 static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector, 828 808 bool has_error_code, u32 error_code, ··· 906 886 { 907 887 kvm_multiple_exception(vcpu, nr, false, 0, false, 0); 908 888 } 909 - EXPORT_SYMBOL_GPL(kvm_queue_exception); 889 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception); 910 890 911 891 912 892 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, ··· 914 894 { 915 895 kvm_multiple_exception(vcpu, nr, false, 0, true, payload); 916 896 } 917 - EXPORT_SYMBOL_GPL(kvm_queue_exception_p); 897 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception_p); 918 898 919 899 static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr, 920 900 u32 error_code, unsigned long payload) ··· 949 929 vcpu->arch.exception.has_payload = false; 950 930 vcpu->arch.exception.payload = 0; 951 931 } 952 - EXPORT_SYMBOL_GPL(kvm_requeue_exception); 932 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_requeue_exception); 953 933 954 934 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err) 955 935 { ··· 960 940 961 941 return 1; 962 942 } 963 - EXPORT_SYMBOL_GPL(kvm_complete_insn_gp); 943 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_complete_insn_gp); 964 944 965 945 static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err) 966 946 { ··· 1010 990 1011 991 fault_mmu->inject_page_fault(vcpu, fault); 1012 992 } 1013 - EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault); 993 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_emulated_page_fault); 1014 994 1015 995 void kvm_inject_nmi(struct kvm_vcpu *vcpu) 1016 996 { ··· 
1022 1002 { 1023 1003 kvm_multiple_exception(vcpu, nr, true, error_code, false, 0); 1024 1004 } 1025 - EXPORT_SYMBOL_GPL(kvm_queue_exception_e); 1005 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_queue_exception_e); 1026 1006 1027 1007 /* 1028 1008 * Checks if cpl <= required_cpl; if true, return true. Otherwise queue ··· 1044 1024 kvm_queue_exception(vcpu, UD_VECTOR); 1045 1025 return false; 1046 1026 } 1047 - EXPORT_SYMBOL_GPL(kvm_require_dr); 1027 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_require_dr); 1048 1028 1049 1029 static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu) 1050 1030 { ··· 1099 1079 1100 1080 return 1; 1101 1081 } 1102 - EXPORT_SYMBOL_GPL(load_pdptrs); 1082 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs); 1103 1083 1104 1084 static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) 1105 1085 { ··· 1152 1132 if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS) 1153 1133 kvm_mmu_reset_context(vcpu); 1154 1134 } 1155 - EXPORT_SYMBOL_GPL(kvm_post_set_cr0); 1135 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0); 1156 1136 1157 1137 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) 1158 1138 { ··· 1187 1167 (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE))) 1188 1168 return 1; 1189 1169 1170 + if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) 1171 + return 1; 1172 + 1190 1173 kvm_x86_call(set_cr0)(vcpu, cr0); 1191 1174 1192 1175 kvm_post_set_cr0(vcpu, old_cr0, cr0); 1193 1176 1194 1177 return 0; 1195 1178 } 1196 - EXPORT_SYMBOL_GPL(kvm_set_cr0); 1179 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0); 1197 1180 1198 1181 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) 1199 1182 { 1200 1183 (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f)); 1201 1184 } 1202 - EXPORT_SYMBOL_GPL(kvm_lmsw); 1185 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw); 1203 1186 1204 1187 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) 1205 1188 { ··· 1225 1202 kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) 1226 1203 
wrpkru(vcpu->arch.pkru); 1227 1204 } 1228 - EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); 1205 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_load_guest_xsave_state); 1229 1206 1230 1207 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) 1231 1208 { ··· 1251 1228 } 1252 1229 1253 1230 } 1254 - EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state); 1231 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_load_host_xsave_state); 1255 1232 1256 1233 #ifdef CONFIG_X86_64 1257 1234 static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu) ··· 1260 1237 } 1261 1238 #endif 1262 1239 1263 - static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) 1240 + int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) 1264 1241 { 1265 1242 u64 xcr0 = xcr; 1266 1243 u64 old_xcr0 = vcpu->arch.xcr0; ··· 1304 1281 vcpu->arch.cpuid_dynamic_bits_dirty = true; 1305 1282 return 0; 1306 1283 } 1284 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_set_xcr); 1307 1285 1308 1286 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu) 1309 1287 { ··· 1317 1293 1318 1294 return kvm_skip_emulated_instruction(vcpu); 1319 1295 } 1320 - EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv); 1296 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv); 1321 1297 1322 1298 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 1323 1299 { ··· 1365 1341 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu); 1366 1342 1367 1343 } 1368 - EXPORT_SYMBOL_GPL(kvm_post_set_cr4); 1344 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4); 1369 1345 1370 1346 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 1371 1347 { ··· 1390 1366 return 1; 1391 1367 } 1392 1368 1369 + if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP)) 1370 + return 1; 1371 + 1393 1372 kvm_x86_call(set_cr4)(vcpu, cr4); 1394 1373 1395 1374 kvm_post_set_cr4(vcpu, old_cr4, cr4); 1396 1375 1397 1376 return 0; 1398 1377 } 1399 - EXPORT_SYMBOL_GPL(kvm_set_cr4); 1378 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4); 1400 1379 1401 1380 static void 
kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid) 1402 1381 { ··· 1491 1464 1492 1465 return 0; 1493 1466 } 1494 - EXPORT_SYMBOL_GPL(kvm_set_cr3); 1467 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3); 1495 1468 1496 1469 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) 1497 1470 { ··· 1503 1476 vcpu->arch.cr8 = cr8; 1504 1477 return 0; 1505 1478 } 1506 - EXPORT_SYMBOL_GPL(kvm_set_cr8); 1479 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8); 1507 1480 1508 1481 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) 1509 1482 { ··· 1512 1485 else 1513 1486 return vcpu->arch.cr8; 1514 1487 } 1515 - EXPORT_SYMBOL_GPL(kvm_get_cr8); 1488 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8); 1516 1489 1517 1490 static void kvm_update_dr0123(struct kvm_vcpu *vcpu) 1518 1491 { ··· 1537 1510 if (dr7 & DR7_BP_EN_MASK) 1538 1511 vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED; 1539 1512 } 1540 - EXPORT_SYMBOL_GPL(kvm_update_dr7); 1513 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7); 1541 1514 1542 1515 static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu) 1543 1516 { ··· 1578 1551 1579 1552 return 0; 1580 1553 } 1581 - EXPORT_SYMBOL_GPL(kvm_set_dr); 1554 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr); 1582 1555 1583 1556 unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr) 1584 1557 { ··· 1595 1568 return vcpu->arch.dr7; 1596 1569 } 1597 1570 } 1598 - EXPORT_SYMBOL_GPL(kvm_get_dr); 1571 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr); 1599 1572 1600 1573 int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu) 1601 1574 { 1602 - u32 ecx = kvm_rcx_read(vcpu); 1575 + u32 pmc = kvm_rcx_read(vcpu); 1603 1576 u64 data; 1604 1577 1605 - if (kvm_pmu_rdpmc(vcpu, ecx, &data)) { 1578 + if (kvm_pmu_rdpmc(vcpu, pmc, &data)) { 1606 1579 kvm_inject_gp(vcpu, 0); 1607 1580 return 1; 1608 1581 } ··· 1611 1584 kvm_rdx_write(vcpu, data >> 32); 1612 1585 return kvm_skip_emulated_instruction(vcpu); 1613 1586 } 1614 - EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc); 1587 + 
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdpmc); 1615 1588 1616 1589 /* 1617 1590 * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM ··· 1750 1723 1751 1724 return __kvm_valid_efer(vcpu, efer); 1752 1725 } 1753 - EXPORT_SYMBOL_GPL(kvm_valid_efer); 1726 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer); 1754 1727 1755 1728 static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 1756 1729 { ··· 1793 1766 { 1794 1767 efer_reserved_bits &= ~mask; 1795 1768 } 1796 - EXPORT_SYMBOL_GPL(kvm_enable_efer_bits); 1769 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits); 1797 1770 1798 1771 bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type) 1799 1772 { ··· 1836 1809 1837 1810 return allowed; 1838 1811 } 1839 - EXPORT_SYMBOL_GPL(kvm_msr_allowed); 1812 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed); 1840 1813 1841 1814 /* 1842 1815 * Write @data into the MSR specified by @index. Select MSR specific fault ··· 1897 1870 1898 1871 data = (u32)data; 1899 1872 break; 1873 + case MSR_IA32_U_CET: 1874 + case MSR_IA32_S_CET: 1875 + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) && 1876 + !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT)) 1877 + return KVM_MSR_RET_UNSUPPORTED; 1878 + if (!kvm_is_valid_u_s_cet(vcpu, data)) 1879 + return 1; 1880 + break; 1881 + case MSR_KVM_INTERNAL_GUEST_SSP: 1882 + if (!host_initiated) 1883 + return 1; 1884 + fallthrough; 1885 + /* 1886 + * Note that the MSR emulation here is flawed when a vCPU 1887 + * doesn't support the Intel 64 architecture. The expected 1888 + * architectural behavior in this case is that the upper 32 1889 + * bits do not exist and should always read '0'. However, 1890 + * because the actual hardware on which the virtual CPU is 1891 + * running does support Intel 64, XRSTORS/XSAVES in the 1892 + * guest could observe behavior that violates the 1893 + * architecture. Intercepting XRSTORS/XSAVES for this 1894 + * special case isn't deemed worthwhile. 
1895 + */ 1896 + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB: 1897 + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) 1898 + return KVM_MSR_RET_UNSUPPORTED; 1899 + /* 1900 + * MSR_IA32_INT_SSP_TAB is not present on processors that do 1901 + * not support Intel 64 architecture. 1902 + */ 1903 + if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) 1904 + return KVM_MSR_RET_UNSUPPORTED; 1905 + if (is_noncanonical_msr_address(data, vcpu)) 1906 + return 1; 1907 + /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */ 1908 + if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4)) 1909 + return 1; 1910 + break; 1900 1911 } 1901 1912 1902 1913 msr.data = data; ··· 1963 1898 * Returns 0 on success, non-0 otherwise. 1964 1899 * Assumes vcpu_load() was already called. 1965 1900 */ 1966 - int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, 1967 - bool host_initiated) 1901 + static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, 1902 + bool host_initiated) 1968 1903 { 1969 1904 struct msr_data msr; 1970 1905 int ret; ··· 1979 1914 !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID)) 1980 1915 return 1; 1981 1916 break; 1917 + case MSR_IA32_U_CET: 1918 + case MSR_IA32_S_CET: 1919 + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) && 1920 + !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT)) 1921 + return KVM_MSR_RET_UNSUPPORTED; 1922 + break; 1923 + case MSR_KVM_INTERNAL_GUEST_SSP: 1924 + if (!host_initiated) 1925 + return 1; 1926 + fallthrough; 1927 + case MSR_IA32_PL0_SSP ... 
MSR_IA32_INT_SSP_TAB: 1928 + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) 1929 + return KVM_MSR_RET_UNSUPPORTED; 1930 + break; 1982 1931 } 1983 1932 1984 1933 msr.index = index; ··· 2004 1925 return ret; 2005 1926 } 2006 1927 1928 + int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data) 1929 + { 1930 + return __kvm_set_msr(vcpu, index, data, true); 1931 + } 1932 + 1933 + int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data) 1934 + { 1935 + return __kvm_get_msr(vcpu, index, data, true); 1936 + } 1937 + 2007 1938 static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu, 2008 1939 u32 index, u64 *data, bool host_initiated) 2009 1940 { ··· 2021 1932 __kvm_get_msr); 2022 1933 } 2023 1934 2024 - int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data) 1935 + int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data) 1936 + { 1937 + return kvm_get_msr_ignored_check(vcpu, index, data, false); 1938 + } 1939 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read); 1940 + 1941 + int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data) 1942 + { 1943 + return kvm_set_msr_ignored_check(vcpu, index, data, false); 1944 + } 1945 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write); 1946 + 1947 + int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data) 2025 1948 { 2026 1949 if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ)) 2027 1950 return KVM_MSR_RET_FILTERED; 2028 - return kvm_get_msr_ignored_check(vcpu, index, data, false); 2029 - } 2030 - EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter); 2031 1951 2032 - int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data) 1952 + return __kvm_emulate_msr_read(vcpu, index, data); 1953 + } 1954 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read); 1955 + 1956 + int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data) 2033 1957 { 2034 1958 if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE)) 2035 1959 return 
KVM_MSR_RET_FILTERED; 2036 - return kvm_set_msr_ignored_check(vcpu, index, data, false); 2037 - } 2038 - EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter); 2039 1960 2040 - int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data) 2041 - { 2042 - return kvm_get_msr_ignored_check(vcpu, index, data, false); 1961 + return __kvm_emulate_msr_write(vcpu, index, data); 2043 1962 } 2044 - EXPORT_SYMBOL_GPL(kvm_get_msr); 1963 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write); 2045 1964 2046 - int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data) 2047 - { 2048 - return kvm_set_msr_ignored_check(vcpu, index, data, false); 2049 - } 2050 - EXPORT_SYMBOL_GPL(kvm_set_msr); 2051 1965 2052 1966 static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu) 2053 1967 { ··· 2079 1987 static int complete_fast_rdmsr(struct kvm_vcpu *vcpu) 2080 1988 { 2081 1989 complete_userspace_rdmsr(vcpu); 1990 + return complete_fast_msr_access(vcpu); 1991 + } 1992 + 1993 + static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu) 1994 + { 1995 + if (!vcpu->run->msr.error) 1996 + kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg, 1997 + vcpu->run->msr.data); 1998 + 2082 1999 return complete_fast_msr_access(vcpu); 2083 2000 } 2084 2001 ··· 2125 2024 return 1; 2126 2025 } 2127 2026 2128 - int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu) 2027 + static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg, 2028 + int (*complete_rdmsr)(struct kvm_vcpu *)) 2129 2029 { 2130 - u32 ecx = kvm_rcx_read(vcpu); 2131 2030 u64 data; 2132 2031 int r; 2133 2032 2134 - r = kvm_get_msr_with_filter(vcpu, ecx, &data); 2033 + r = kvm_emulate_msr_read(vcpu, msr, &data); 2135 2034 2136 2035 if (!r) { 2137 - trace_kvm_msr_read(ecx, data); 2036 + trace_kvm_msr_read(msr, data); 2138 2037 2139 - kvm_rax_write(vcpu, data & -1u); 2140 - kvm_rdx_write(vcpu, (data >> 32) & -1u); 2038 + if (reg < 0) { 2039 + kvm_rax_write(vcpu, data & -1u); 2040 + kvm_rdx_write(vcpu, (data >> 32) & -1u); 2041 + } else { 2042 + 
kvm_register_write(vcpu, reg, data); 2043 + } 2141 2044 } else { 2142 2045 /* MSR read failed? See if we should ask user space */ 2143 - if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_RDMSR, 0, 2144 - complete_fast_rdmsr, r)) 2046 + if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0, 2047 + complete_rdmsr, r)) 2145 2048 return 0; 2146 - trace_kvm_msr_read_ex(ecx); 2049 + trace_kvm_msr_read_ex(msr); 2147 2050 } 2148 2051 2149 2052 return kvm_x86_call(complete_emulated_msr)(vcpu, r); 2150 2053 } 2151 - EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr); 2152 2054 2153 - int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu) 2055 + int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu) 2154 2056 { 2155 - u32 ecx = kvm_rcx_read(vcpu); 2156 - u64 data = kvm_read_edx_eax(vcpu); 2057 + return __kvm_emulate_rdmsr(vcpu, kvm_rcx_read(vcpu), -1, 2058 + complete_fast_rdmsr); 2059 + } 2060 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr); 2061 + 2062 + int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg) 2063 + { 2064 + vcpu->arch.cui_rdmsr_imm_reg = reg; 2065 + 2066 + return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm); 2067 + } 2068 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm); 2069 + 2070 + static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data) 2071 + { 2157 2072 int r; 2158 2073 2159 - r = kvm_set_msr_with_filter(vcpu, ecx, data); 2160 - 2074 + r = kvm_emulate_msr_write(vcpu, msr, data); 2161 2075 if (!r) { 2162 - trace_kvm_msr_write(ecx, data); 2076 + trace_kvm_msr_write(msr, data); 2163 2077 } else { 2164 2078 /* MSR write failed? 
See if we should ask user space */ 2165 - if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_WRMSR, data, 2079 + if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data, 2166 2080 complete_fast_msr_access, r)) 2167 2081 return 0; 2168 2082 /* Signal all other negative errors to userspace */ 2169 2083 if (r < 0) 2170 2084 return r; 2171 - trace_kvm_msr_write_ex(ecx, data); 2085 + trace_kvm_msr_write_ex(msr, data); 2172 2086 } 2173 2087 2174 2088 return kvm_x86_call(complete_emulated_msr)(vcpu, r); 2175 2089 } 2176 - EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr); 2090 + 2091 + int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu) 2092 + { 2093 + return __kvm_emulate_wrmsr(vcpu, kvm_rcx_read(vcpu), 2094 + kvm_read_edx_eax(vcpu)); 2095 + } 2096 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr); 2097 + 2098 + int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg) 2099 + { 2100 + return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg)); 2101 + } 2102 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm); 2177 2103 2178 2104 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu) 2179 2105 { ··· 2212 2084 /* Treat an INVD instruction as a NOP and just skip it. 
*/ 2213 2085 return kvm_emulate_as_nop(vcpu); 2214 2086 } 2215 - EXPORT_SYMBOL_GPL(kvm_emulate_invd); 2087 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_invd); 2088 + 2089 + fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu) 2090 + { 2091 + if (!kvm_emulate_invd(vcpu)) 2092 + return EXIT_FASTPATH_EXIT_USERSPACE; 2093 + 2094 + return EXIT_FASTPATH_REENTER_GUEST; 2095 + } 2096 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_invd); 2216 2097 2217 2098 int kvm_handle_invalid_op(struct kvm_vcpu *vcpu) 2218 2099 { 2219 2100 kvm_queue_exception(vcpu, UD_VECTOR); 2220 2101 return 1; 2221 2102 } 2222 - EXPORT_SYMBOL_GPL(kvm_handle_invalid_op); 2103 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_invalid_op); 2223 2104 2224 2105 2225 2106 static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn) ··· 2254 2117 { 2255 2118 return kvm_emulate_monitor_mwait(vcpu, "MWAIT"); 2256 2119 } 2257 - EXPORT_SYMBOL_GPL(kvm_emulate_mwait); 2120 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_mwait); 2258 2121 2259 2122 int kvm_emulate_monitor(struct kvm_vcpu *vcpu) 2260 2123 { 2261 2124 return kvm_emulate_monitor_mwait(vcpu, "MONITOR"); 2262 2125 } 2263 - EXPORT_SYMBOL_GPL(kvm_emulate_monitor); 2126 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_monitor); 2264 2127 2265 2128 static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu) 2266 2129 { ··· 2270 2133 kvm_request_pending(vcpu) || xfer_to_guest_mode_work_pending(); 2271 2134 } 2272 2135 2273 - /* 2274 - * The fast path for frequent and performance sensitive wrmsr emulation, 2275 - * i.e. the sending of IPI, sending IPI early in the VM-Exit flow reduces 2276 - * the latency of virtual IPI by avoiding the expensive bits of transitioning 2277 - * from guest to host, e.g. reacquiring KVM's SRCU lock. In contrast to the 2278 - * other cases which must be called after interrupts are enabled on the host. 
2279 - */ 2280 - static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data) 2136 + static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data) 2281 2137 { 2282 - if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic)) 2283 - return 1; 2284 - 2285 - if (((data & APIC_SHORT_MASK) == APIC_DEST_NOSHORT) && 2286 - ((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) && 2287 - ((data & APIC_MODE_MASK) == APIC_DM_FIXED) && 2288 - ((u32)(data >> 32) != X2APIC_BROADCAST)) 2289 - return kvm_x2apic_icr_write(vcpu->arch.apic, data); 2290 - 2291 - return 1; 2292 - } 2293 - 2294 - static int handle_fastpath_set_tscdeadline(struct kvm_vcpu *vcpu, u64 data) 2295 - { 2296 - if (!kvm_can_use_hv_timer(vcpu)) 2297 - return 1; 2298 - 2299 - kvm_set_lapic_tscdeadline_msr(vcpu, data); 2300 - return 0; 2301 - } 2302 - 2303 - fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu) 2304 - { 2305 - u32 msr = kvm_rcx_read(vcpu); 2306 - u64 data; 2307 - fastpath_t ret; 2308 - bool handled; 2309 - 2310 - kvm_vcpu_srcu_read_lock(vcpu); 2311 - 2312 2138 switch (msr) { 2313 2139 case APIC_BASE_MSR + (APIC_ICR >> 4): 2314 - data = kvm_read_edx_eax(vcpu); 2315 - handled = !handle_fastpath_set_x2apic_icr_irqoff(vcpu, data); 2140 + if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) || 2141 + kvm_x2apic_icr_write_fast(vcpu->arch.apic, data)) 2142 + return EXIT_FASTPATH_NONE; 2316 2143 break; 2317 2144 case MSR_IA32_TSC_DEADLINE: 2318 - data = kvm_read_edx_eax(vcpu); 2319 - handled = !handle_fastpath_set_tscdeadline(vcpu, data); 2145 + kvm_set_lapic_tscdeadline_msr(vcpu, data); 2320 2146 break; 2321 2147 default: 2322 - handled = false; 2323 - break; 2148 + return EXIT_FASTPATH_NONE; 2324 2149 } 2325 2150 2326 - if (handled) { 2327 - if (!kvm_skip_emulated_instruction(vcpu)) 2328 - ret = EXIT_FASTPATH_EXIT_USERSPACE; 2329 - else 2330 - ret = EXIT_FASTPATH_REENTER_GUEST; 2331 - trace_kvm_msr_write(msr, data); 2332 - } else { 2333 
- ret = EXIT_FASTPATH_NONE; 2334 - } 2151 + trace_kvm_msr_write(msr, data); 2335 2152 2336 - kvm_vcpu_srcu_read_unlock(vcpu); 2153 + if (!kvm_skip_emulated_instruction(vcpu)) 2154 + return EXIT_FASTPATH_EXIT_USERSPACE; 2337 2155 2338 - return ret; 2156 + return EXIT_FASTPATH_REENTER_GUEST; 2339 2157 } 2340 - EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff); 2158 + 2159 + fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu) 2160 + { 2161 + return __handle_fastpath_wrmsr(vcpu, kvm_rcx_read(vcpu), 2162 + kvm_read_edx_eax(vcpu)); 2163 + } 2164 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr); 2165 + 2166 + fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg) 2167 + { 2168 + return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg)); 2169 + } 2170 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm); 2341 2171 2342 2172 /* 2343 2173 * Adapt set_msr() to msr_io()'s calling convention ··· 2670 2566 return vcpu->arch.l1_tsc_offset + 2671 2567 kvm_scale_tsc(host_tsc, vcpu->arch.l1_tsc_scaling_ratio); 2672 2568 } 2673 - EXPORT_SYMBOL_GPL(kvm_read_l1_tsc); 2569 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_l1_tsc); 2674 2570 2675 2571 u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier) 2676 2572 { ··· 2685 2581 nested_offset += l2_offset; 2686 2582 return nested_offset; 2687 2583 } 2688 - EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset); 2584 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_offset); 2689 2585 2690 2586 u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier) 2691 2587 { ··· 2695 2591 2696 2592 return l1_multiplier; 2697 2593 } 2698 - EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier); 2594 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_multiplier); 2699 2595 2700 2596 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset) 2701 2597 { ··· 3773 3669 if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) 3774 3670 
kvm_vcpu_flush_tlb_guest(vcpu); 3775 3671 } 3776 - EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests); 3672 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_service_local_tlb_flush_requests); 3777 3673 3778 3674 static void record_steal_time(struct kvm_vcpu *vcpu) 3779 3675 { ··· 3871 3767 user_access_end(); 3872 3768 dirty: 3873 3769 mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa)); 3770 + } 3771 + 3772 + /* 3773 + * Returns true if the MSR in question is managed via XSTATE, i.e. is context 3774 + * switched with the rest of guest FPU state. Note! S_CET is _not_ context 3775 + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS. 3776 + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields, 3777 + * the value saved/restored via XSTATE is always the host's value. That detail 3778 + * is _extremely_ important, as the guest's S_CET must _never_ be resident in 3779 + * hardware while executing in the host. Loading guest values for U_CET and 3780 + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to 3781 + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower 3782 + * privilege levels, i.e. are effectively only consumed by userspace as well. 3783 + */ 3784 + static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr) 3785 + { 3786 + if (!vcpu) 3787 + return false; 3788 + 3789 + switch (msr) { 3790 + case MSR_IA32_U_CET: 3791 + return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) || 3792 + guest_cpu_cap_has(vcpu, X86_FEATURE_IBT); 3793 + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: 3794 + return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK); 3795 + default: 3796 + return false; 3797 + } 3798 + } 3799 + 3800 + /* 3801 + * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an 3802 + * MSR that is managed via XSTATE. 
Note, the caller is responsible for doing 3803 + * the initial FPU load, this helper only ensures that guest state is resident 3804 + * in hardware (the kernel can load its FPU state in IRQ context). 3805 + */ 3806 + static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu, 3807 + struct msr_data *msr_info, 3808 + int access) 3809 + { 3810 + BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W); 3811 + 3812 + KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm); 3813 + KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm); 3814 + 3815 + kvm_fpu_get(); 3816 + if (access == MSR_TYPE_R) 3817 + rdmsrq(msr_info->index, msr_info->data); 3818 + else 3819 + wrmsrq(msr_info->index, msr_info->data); 3820 + kvm_fpu_put(); 3821 + } 3822 + 3823 + static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 3824 + { 3825 + kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W); 3826 + } 3827 + 3828 + static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) 3829 + { 3830 + kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R); 3874 3831 } 3875 3832 3876 3833 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) ··· 4125 3960 } 4126 3961 break; 4127 3962 case MSR_IA32_XSS: 4128 - if (!msr_info->host_initiated && 4129 - !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES)) 3963 + if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES)) 3964 + return KVM_MSR_RET_UNSUPPORTED; 3965 + 3966 + if (data & ~vcpu->arch.guest_supported_xss) 4130 3967 return 1; 4131 - /* 4132 - * KVM supports exposing PT to the guest, but does not support 4133 - * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than 4134 - * XSAVES/XRSTORS to save/restore PT MSRs. 
4135 - */ 4136 - if (data & ~kvm_caps.supported_xss) 4137 - return 1; 3968 + if (vcpu->arch.ia32_xss == data) 3969 + break; 4138 3970 vcpu->arch.ia32_xss = data; 4139 3971 vcpu->arch.cpuid_dynamic_bits_dirty = true; 4140 3972 break; ··· 4315 4153 vcpu->arch.guest_fpu.xfd_err = data; 4316 4154 break; 4317 4155 #endif 4156 + case MSR_IA32_U_CET: 4157 + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: 4158 + kvm_set_xstate_msr(vcpu, msr_info); 4159 + break; 4318 4160 default: 4319 4161 if (kvm_pmu_is_valid_msr(vcpu, msr)) 4320 4162 return kvm_pmu_set_msr(vcpu, msr_info); ··· 4327 4161 } 4328 4162 return 0; 4329 4163 } 4330 - EXPORT_SYMBOL_GPL(kvm_set_msr_common); 4164 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common); 4331 4165 4332 4166 static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host) 4333 4167 { ··· 4668 4502 msr_info->data = vcpu->arch.guest_fpu.xfd_err; 4669 4503 break; 4670 4504 #endif 4505 + case MSR_IA32_U_CET: 4506 + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: 4507 + kvm_get_xstate_msr(vcpu, msr_info); 4508 + break; 4671 4509 default: 4672 4510 if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) 4673 4511 return kvm_pmu_get_msr(vcpu, msr_info); ··· 4680 4510 } 4681 4511 return 0; 4682 4512 } 4683 - EXPORT_SYMBOL_GPL(kvm_get_msr_common); 4513 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common); 4684 4514 4685 4515 /* 4686 4516 * Read or write a bunch of msrs. All parameters are kernel addresses. ··· 4692 4522 int (*do_msr)(struct kvm_vcpu *vcpu, 4693 4523 unsigned index, u64 *data)) 4694 4524 { 4525 + bool fpu_loaded = false; 4695 4526 int i; 4696 4527 4697 - for (i = 0; i < msrs->nmsrs; ++i) 4528 + for (i = 0; i < msrs->nmsrs; ++i) { 4529 + /* 4530 + * If userspace is accessing one or more XSTATE-managed MSRs, 4531 + * temporarily load the guest's FPU state so that the guest's 4532 + * MSR value(s) is resident in hardware and thus can be accessed 4533 + * via RDMSR/WRMSR. 
4534 + */ 4535 + if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) { 4536 + kvm_load_guest_fpu(vcpu); 4537 + fpu_loaded = true; 4538 + } 4698 4539 if (do_msr(vcpu, entries[i].index, &entries[i].data)) 4699 4540 break; 4541 + } 4542 + if (fpu_loaded) 4543 + kvm_put_guest_fpu(vcpu); 4700 4544 4701 4545 return i; 4702 4546 } ··· 4895 4711 case KVM_CAP_IRQFD_RESAMPLE: 4896 4712 case KVM_CAP_MEMORY_FAULT_INFO: 4897 4713 case KVM_CAP_X86_GUEST_MODE: 4714 + case KVM_CAP_ONE_REG: 4898 4715 r = 1; 4899 4716 break; 4900 4717 case KVM_CAP_PRE_FAULT_MEMORY: ··· 6074 5889 } 6075 5890 } 6076 5891 5892 + struct kvm_x86_reg_id { 5893 + __u32 index; 5894 + __u8 type; 5895 + __u8 rsvd1; 5896 + __u8 rsvd2:4; 5897 + __u8 size:4; 5898 + __u8 x86; 5899 + }; 5900 + 5901 + static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu, 5902 + struct kvm_x86_reg_id *reg) 5903 + { 5904 + switch (reg->index) { 5905 + case KVM_REG_GUEST_SSP: 5906 + /* 5907 + * FIXME: If host-initiated accesses are ever exempted from 5908 + * ignore_msrs (in kvm_do_msr_access()), drop this manual check 5909 + * and rely on KVM's standard checks to reject accesses to regs 5910 + * that don't exist. 
5911 + */ 5912 + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) 5913 + return -EINVAL; 5914 + 5915 + reg->type = KVM_X86_REG_TYPE_MSR; 5916 + reg->index = MSR_KVM_INTERNAL_GUEST_SSP; 5917 + break; 5918 + default: 5919 + return -EINVAL; 5920 + } 5921 + return 0; 5922 + } 5923 + 5924 + static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val) 5925 + { 5926 + u64 val; 5927 + 5928 + if (do_get_msr(vcpu, msr, &val)) 5929 + return -EINVAL; 5930 + 5931 + if (put_user(val, user_val)) 5932 + return -EFAULT; 5933 + 5934 + return 0; 5935 + } 5936 + 5937 + static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val) 5938 + { 5939 + u64 val; 5940 + 5941 + if (get_user(val, user_val)) 5942 + return -EFAULT; 5943 + 5944 + if (do_set_msr(vcpu, msr, &val)) 5945 + return -EINVAL; 5946 + 5947 + return 0; 5948 + } 5949 + 5950 + static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl, 5951 + void __user *argp) 5952 + { 5953 + struct kvm_one_reg one_reg; 5954 + struct kvm_x86_reg_id *reg; 5955 + u64 __user *user_val; 5956 + bool load_fpu; 5957 + int r; 5958 + 5959 + if (copy_from_user(&one_reg, argp, sizeof(one_reg))) 5960 + return -EFAULT; 5961 + 5962 + if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86) 5963 + return -EINVAL; 5964 + 5965 + reg = (struct kvm_x86_reg_id *)&one_reg.id; 5966 + if (reg->rsvd1 || reg->rsvd2) 5967 + return -EINVAL; 5968 + 5969 + if (reg->type == KVM_X86_REG_TYPE_KVM) { 5970 + r = kvm_translate_kvm_reg(vcpu, reg); 5971 + if (r) 5972 + return r; 5973 + } 5974 + 5975 + if (reg->type != KVM_X86_REG_TYPE_MSR) 5976 + return -EINVAL; 5977 + 5978 + if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64) 5979 + return -EINVAL; 5980 + 5981 + guard(srcu)(&vcpu->kvm->srcu); 5982 + 5983 + load_fpu = is_xstate_managed_msr(vcpu, reg->index); 5984 + if (load_fpu) 5985 + kvm_load_guest_fpu(vcpu); 5986 + 5987 + user_val = u64_to_user_ptr(one_reg.addr); 5988 + if (ioctl == KVM_GET_ONE_REG) 5989 + r = 
kvm_get_one_msr(vcpu, reg->index, user_val); 5990 + else 5991 + r = kvm_set_one_msr(vcpu, reg->index, user_val); 5992 + 5993 + if (load_fpu) 5994 + kvm_put_guest_fpu(vcpu); 5995 + return r; 5996 + } 5997 + 5998 + static int kvm_get_reg_list(struct kvm_vcpu *vcpu, 5999 + struct kvm_reg_list __user *user_list) 6000 + { 6001 + u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0; 6002 + u64 user_nr_regs; 6003 + 6004 + if (get_user(user_nr_regs, &user_list->n)) 6005 + return -EFAULT; 6006 + 6007 + if (put_user(nr_regs, &user_list->n)) 6008 + return -EFAULT; 6009 + 6010 + if (user_nr_regs < nr_regs) 6011 + return -E2BIG; 6012 + 6013 + if (nr_regs && 6014 + put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0])) 6015 + return -EFAULT; 6016 + 6017 + return 0; 6018 + } 6019 + 6077 6020 long kvm_arch_vcpu_ioctl(struct file *filp, 6078 6021 unsigned int ioctl, unsigned long arg) 6079 6022 { ··· 6318 6005 srcu_read_unlock(&vcpu->kvm->srcu, idx); 6319 6006 break; 6320 6007 } 6008 + case KVM_GET_ONE_REG: 6009 + case KVM_SET_ONE_REG: 6010 + r = kvm_get_set_one_reg(vcpu, ioctl, argp); 6011 + break; 6012 + case KVM_GET_REG_LIST: 6013 + r = kvm_get_reg_list(vcpu, argp); 6014 + break; 6321 6015 case KVM_TPR_ACCESS_REPORTING: { 6322 6016 struct kvm_tpr_access_ctl tac; 6323 6017 ··· 7091 6771 7092 6772 kvm_free_msr_filter(old_filter); 7093 6773 7094 - kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED); 6774 + /* 6775 + * Recalc MSR intercepts as userspace may want to intercept accesses to 6776 + * MSRs that KVM would otherwise pass through to the guest. 6777 + */ 6778 + kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS); 7095 6779 7096 6780 return 0; 7097 6781 } ··· 7288 6964 7289 6965 r = -EEXIST; 7290 6966 if (irqchip_in_kernel(kvm)) 6967 + goto create_irqchip_unlock; 6968 + 6969 + /* 6970 + * Disallow an in-kernel I/O APIC if the VM has protected EOIs, 6971 + * i.e. 
if KVM can't intercept EOIs and thus can't properly 6972 + * emulate level-triggered interrupts. 6973 + */ 6974 + r = -ENOTTY; 6975 + if (kvm->arch.has_protected_eoi) 7291 6976 goto create_irqchip_unlock; 7292 6977 7293 6978 r = -EINVAL; ··· 7686 7353 case MSR_AMD64_PERF_CNTR_GLOBAL_CTL: 7687 7354 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS: 7688 7355 case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR: 7356 + case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET: 7689 7357 if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) 7690 7358 return; 7691 7359 break; ··· 7697 7363 break; 7698 7364 case MSR_IA32_TSX_CTRL: 7699 7365 if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR)) 7366 + return; 7367 + break; 7368 + case MSR_IA32_XSS: 7369 + if (!kvm_caps.supported_xss) 7370 + return; 7371 + break; 7372 + case MSR_IA32_U_CET: 7373 + case MSR_IA32_S_CET: 7374 + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && 7375 + !kvm_cpu_cap_has(X86_FEATURE_IBT)) 7376 + return; 7377 + break; 7378 + case MSR_IA32_INT_SSP_TAB: 7379 + if (!kvm_cpu_cap_has(X86_FEATURE_LM)) 7380 + return; 7381 + fallthrough; 7382 + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: 7383 + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK)) 7700 7384 return; 7701 7385 break; 7702 7386 default: ··· 7836 7484 u64 access = (kvm_x86_call(get_cpl)(vcpu) == 3) ? 
PFERR_USER_MASK : 0; 7837 7485 return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception); 7838 7486 } 7839 - EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read); 7487 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_gva_to_gpa_read); 7840 7488 7841 7489 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, 7842 7490 struct x86_exception *exception) ··· 7847 7495 access |= PFERR_WRITE_MASK; 7848 7496 return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception); 7849 7497 } 7850 - EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write); 7498 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_gva_to_gpa_write); 7851 7499 7852 7500 /* uses this to access any guest's mapped memory without checking CPL */ 7853 7501 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, ··· 7933 7581 return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, 7934 7582 exception); 7935 7583 } 7936 - EXPORT_SYMBOL_GPL(kvm_read_guest_virt); 7584 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_virt); 7937 7585 7938 7586 static int emulator_read_std(struct x86_emulate_ctxt *ctxt, 7939 7587 gva_t addr, void *val, unsigned int bytes, ··· 8005 7653 return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 8006 7654 PFERR_WRITE_MASK, exception); 8007 7655 } 8008 - EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system); 7656 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_virt_system); 8009 7657 8010 7658 static int kvm_check_emulate_insn(struct kvm_vcpu *vcpu, int emul_type, 8011 7659 void *insn, int insn_len) ··· 8039 7687 8040 7688 return kvm_emulate_instruction(vcpu, emul_type); 8041 7689 } 8042 - EXPORT_SYMBOL_GPL(handle_ud); 7690 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_ud); 8043 7691 8044 7692 static int vcpu_is_mmio_gpa(struct kvm_vcpu *vcpu, unsigned long gva, 8045 7693 gpa_t gpa, bool write) ··· 8518 8166 kvm_emulate_wbinvd_noskip(vcpu); 8519 8167 return kvm_skip_emulated_instruction(vcpu); 8520 8168 } 8521 - EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd); 8169 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wbinvd); 
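The `struct kvm_x86_reg_id` layout and the validation order in `kvm_get_set_one_reg()`, both added earlier in this file's diff, can be illustrated with a small userspace sketch. This is not code from the patch: every `SKETCH_`/`sketch_` name is hypothetical, and the numeric values are inferred from the id patterns documented in api.rst in this merge (`0x2030 0002 <msr number:32>` for MSRs, `0x2030 0003 0000 0000` for SSP). The guest-CPUID check for SHSTK that `kvm_translate_kvm_reg()` performs is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values inferred from the documented id patterns. */
#define SKETCH_REG_ARCH_X86   0x20u /* bits 63:56, the "x86" byte */
#define SKETCH_REG_SIZE_U64   0x3u  /* bits 55:52, the "size" nibble */
#define SKETCH_REG_TYPE_MSR   0x2u  /* bits 39:32, the "type" byte */
#define SKETCH_REG_TYPE_KVM   0x3u
#define SKETCH_REG_GUEST_SSP  0x0u  /* only KVM-defined index in the doc table */

/* Build a 64-bit ONE_REG id from a register type and a 32-bit index. */
static uint64_t sketch_x86_reg_id(uint8_t type, uint32_t index)
{
	assert(type == SKETCH_REG_TYPE_MSR || type == SKETCH_REG_TYPE_KVM);

	return ((uint64_t)SKETCH_REG_ARCH_X86 << 56) |
	       ((uint64_t)SKETCH_REG_SIZE_U64 << 52) |
	       ((uint64_t)type << 32) |
	       index;
}

/*
 * Mirror the order of checks in kvm_get_set_one_reg(): architecture,
 * reserved bits, KVM-defined translation, register type, then size.
 */
static bool sketch_x86_reg_id_is_valid(uint64_t id)
{
	uint8_t arch   = (uint8_t)(id >> 56);
	uint8_t size   = (id >> 52) & 0xf;
	uint8_t rsvd2  = (id >> 48) & 0xf;
	uint8_t rsvd1  = (id >> 40) & 0xff;
	uint8_t type   = (id >> 32) & 0xff;
	uint32_t index = (uint32_t)id;

	if (arch != SKETCH_REG_ARCH_X86)
		return false;
	if (rsvd1 || rsvd2)
		return false;
	if (type == SKETCH_REG_TYPE_KVM) {
		/* KVM-defined registers are translated to an internal MSR. */
		if (index != SKETCH_REG_GUEST_SSP)
			return false;
		type = SKETCH_REG_TYPE_MSR;
	}
	if (type != SKETCH_REG_TYPE_MSR)
		return false;
	return size == SKETCH_REG_SIZE_U64;
}
```

Under these assumptions, `sketch_x86_reg_id(SKETCH_REG_TYPE_MSR, 0x6a0)` would produce the id a VMM passes in `struct kvm_one_reg` to access MSR index 0x6a0, while the SSP id carries type 3 and index 0.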
8522 8170 8523 8171 8524 8172 ··· 8705 8353 struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); 8706 8354 int r; 8707 8355 8708 - r = kvm_get_msr_with_filter(vcpu, msr_index, pdata); 8356 + r = kvm_emulate_msr_read(vcpu, msr_index, pdata); 8709 8357 if (r < 0) 8710 8358 return X86EMUL_UNHANDLEABLE; 8711 8359 ··· 8728 8376 struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); 8729 8377 int r; 8730 8378 8731 - r = kvm_set_msr_with_filter(vcpu, msr_index, data); 8379 + r = kvm_emulate_msr_write(vcpu, msr_index, data); 8732 8380 if (r < 0) 8733 8381 return X86EMUL_UNHANDLEABLE; 8734 8382 ··· 8748 8396 static int emulator_get_msr(struct x86_emulate_ctxt *ctxt, 8749 8397 u32 msr_index, u64 *pdata) 8750 8398 { 8751 - return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata); 8399 + /* 8400 + * Treat emulator accesses to the current shadow stack pointer as host- 8401 + * initiated, as they aren't true MSR accesses (SSP is a "just a reg"), 8402 + * and this API is used only for implicit accesses, i.e. not RDMSR, and 8403 + * so the index is fully KVM-controlled. 
8404 + */ 8405 + if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP)) 8406 + return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata); 8407 + 8408 + return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata); 8752 8409 } 8753 8410 8754 8411 static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc) ··· 8829 8468 static bool emulator_is_smm(struct x86_emulate_ctxt *ctxt) 8830 8469 { 8831 8470 return is_smm(emul_to_vcpu(ctxt)); 8832 - } 8833 - 8834 - static bool emulator_is_guest_mode(struct x86_emulate_ctxt *ctxt) 8835 - { 8836 - return is_guest_mode(emul_to_vcpu(ctxt)); 8837 8471 } 8838 8472 8839 8473 #ifndef CONFIG_KVM_SMM ··· 8914 8558 .guest_cpuid_is_intel_compatible = emulator_guest_cpuid_is_intel_compatible, 8915 8559 .set_nmi_mask = emulator_set_nmi_mask, 8916 8560 .is_smm = emulator_is_smm, 8917 - .is_guest_mode = emulator_is_guest_mode, 8918 8561 .leave_smm = emulator_leave_smm, 8919 8562 .triple_fault = emulator_triple_fault, 8920 8563 .set_xcr = emulator_set_xcr, ··· 9016 8661 kvm_set_rflags(vcpu, ctxt->eflags); 9017 8662 } 9018 8663 } 9019 - EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt); 8664 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inject_realmode_interrupt); 9020 8665 9021 8666 static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data, 9022 8667 u8 ndata, u8 *insn_bytes, u8 insn_size) ··· 9081 8726 { 9082 8727 prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0); 9083 8728 } 9084 - EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit); 8729 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_prepare_emulation_failure_exit); 9085 8730 9086 8731 void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu) 9087 8732 { 9088 8733 __kvm_prepare_emulation_failure_exit(vcpu, NULL, 0); 9089 8734 } 9090 - EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit); 8735 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prepare_emulation_failure_exit); 9091 8736 9092 8737 void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t 
gpa) 9093 8738 { ··· 9109 8754 run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV; 9110 8755 run->internal.ndata = ndata; 9111 8756 } 9112 - EXPORT_SYMBOL_GPL(kvm_prepare_event_vectoring_exit); 8757 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prepare_event_vectoring_exit); 9113 8758 9114 8759 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type) 9115 8760 { ··· 9219 8864 if (unlikely(!r)) 9220 8865 return 0; 9221 8866 9222 - kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED); 8867 + kvm_pmu_instruction_retired(vcpu); 9223 8868 9224 8869 /* 9225 8870 * rflags is the old, "raw" value of the flags. The new value has ··· 9233 8878 r = kvm_vcpu_do_singlestep(vcpu); 9234 8879 return r; 9235 8880 } 9236 - EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction); 8881 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_skip_emulated_instruction); 9237 8882 9238 8883 static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu) 9239 8884 { ··· 9364 9009 9365 9010 return r; 9366 9011 } 9367 - EXPORT_SYMBOL_GPL(x86_decode_emulated_instruction); 9012 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(x86_decode_emulated_instruction); 9368 9013 9369 9014 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, 9370 9015 int emulation_type, void *insn, int insn_len) ··· 9498 9143 ctxt->exception.address = 0; 9499 9144 } 9500 9145 9501 - r = x86_emulate_insn(ctxt); 9146 + /* 9147 + * Check L1's instruction intercepts when emulating instructions for 9148 + * L2, unless KVM is re-emulating a previously decoded instruction, 9149 + * e.g. to complete userspace I/O, in which case KVM has already 9150 + * checked the intercepts. 
9151 + */ 9152 + r = x86_emulate_insn(ctxt, is_guest_mode(vcpu) && 9153 + !(emulation_type & EMULTYPE_NO_DECODE)); 9502 9154 9503 9155 if (r == EMULATION_INTERCEPTED) 9504 9156 return 1; ··· 9560 9198 */ 9561 9199 if (!ctxt->have_exception || 9562 9200 exception_type(ctxt->exception.vector) == EXCPT_TRAP) { 9563 - kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.INSTRUCTIONS_RETIRED); 9201 + kvm_pmu_instruction_retired(vcpu); 9564 9202 if (ctxt->is_branch) 9565 - kvm_pmu_trigger_event(vcpu, kvm_pmu_eventsel.BRANCH_INSTRUCTIONS_RETIRED); 9203 + kvm_pmu_branch_retired(vcpu); 9566 9204 kvm_rip_write(vcpu, ctxt->eip); 9567 9205 if (r && (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP))) 9568 9206 r = kvm_vcpu_do_singlestep(vcpu); ··· 9588 9226 { 9589 9227 return x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0); 9590 9228 } 9591 - EXPORT_SYMBOL_GPL(kvm_emulate_instruction); 9229 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_instruction); 9592 9230 9593 9231 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu, 9594 9232 void *insn, int insn_len) 9595 9233 { 9596 9234 return x86_emulate_instruction(vcpu, 0, 0, insn, insn_len); 9597 9235 } 9598 - EXPORT_SYMBOL_GPL(kvm_emulate_instruction_from_buffer); 9236 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_instruction_from_buffer); 9599 9237 9600 9238 static int complete_fast_pio_out_port_0x7e(struct kvm_vcpu *vcpu) 9601 9239 { ··· 9690 9328 ret = kvm_fast_pio_out(vcpu, size, port); 9691 9329 return ret && kvm_skip_emulated_instruction(vcpu); 9692 9330 } 9693 - EXPORT_SYMBOL_GPL(kvm_fast_pio); 9331 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fast_pio); 9694 9332 9695 9333 static int kvmclock_cpu_down_prep(unsigned int cpu) 9696 9334 { ··· 10013 9651 return -EIO; 10014 9652 } 10015 9653 9654 + if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT)) { 9655 + rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet); 9656 + /* 9657 + * Linux doesn't yet support supervisor shadow stacks (SSS), so 9658 + * KVM doesn't 
save/restore the associated MSRs, i.e. KVM may 9659 + * clobber the host values. Yell and refuse to load if SSS is 9660 + * unexpectedly enabled, e.g. to avoid crashing the host. 9661 + */ 9662 + if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN)) 9663 + return -EIO; 9664 + } 9665 + 10016 9666 memset(&kvm_caps, 0, sizeof(kvm_caps)); 10017 9667 10018 9668 x86_emulator_cache = kvm_alloc_emulator_cache(); ··· 10052 9678 kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); 10053 9679 kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0; 10054 9680 } 9681 + 9682 + if (boot_cpu_has(X86_FEATURE_XSAVES)) { 9683 + rdmsrq(MSR_IA32_XSS, kvm_host.xss); 9684 + kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS; 9685 + } 9686 + 10055 9687 kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS; 10056 9688 kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS; 10057 9689 10058 9690 rdmsrq_safe(MSR_EFER, &kvm_host.efer); 10059 - 10060 - if (boot_cpu_has(X86_FEATURE_XSAVES)) 10061 - rdmsrq(MSR_IA32_XSS, kvm_host.xss); 10062 9691 10063 9692 kvm_init_pmu_capability(ops->pmu_ops); 10064 9693 ··· 10111 9734 if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES)) 10112 9735 kvm_caps.supported_xss = 0; 10113 9736 9737 + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && 9738 + !kvm_cpu_cap_has(X86_FEATURE_IBT)) 9739 + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL; 9740 + 9741 + if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) { 9742 + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); 9743 + kvm_cpu_cap_clear(X86_FEATURE_IBT); 9744 + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL; 9745 + } 9746 + 10114 9747 if (kvm_caps.has_tsc_control) { 10115 9748 /* 10116 9749 * Make sure the user can only configure tsc_khz values that ··· 10147 9760 kmem_cache_destroy(x86_emulator_cache); 10148 9761 return r; 10149 9762 } 10150 - EXPORT_SYMBOL_GPL(kvm_x86_vendor_init); 9763 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x86_vendor_init); 10151 9764 10152 9765 void kvm_x86_vendor_exit(void) 10153 9766 { 
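The CET handling added to `kvm_x86_vendor_init()` above is all-or-nothing: the CET bits in `supported_xss` and the SHSTK/IBT capabilities either all survive or are all cleared. A minimal standalone sketch of that reconciliation follows. The struct and function names are hypothetical, and the sketch assumes `XFEATURE_MASK_CET_ALL` covers the user (bit 11) and supervisor (bit 12) CET state components, which this diff does not itself show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed XSS bit positions for the CET state components. */
#define SKETCH_XFEATURE_MASK_CET_USER   (1ULL << 11)
#define SKETCH_XFEATURE_MASK_CET_KERNEL (1ULL << 12)
#define SKETCH_XFEATURE_MASK_CET_ALL \
	(SKETCH_XFEATURE_MASK_CET_USER | SKETCH_XFEATURE_MASK_CET_KERNEL)

/* Hypothetical stand-in for kvm_caps plus the two KVM CPU capabilities. */
struct sketch_kvm_caps {
	uint64_t supported_xss;
	bool has_shstk;
	bool has_ibt;
};

/* Same two steps, in the same order, as the hunk in kvm_x86_vendor_init(). */
static void sketch_reconcile_cet_caps(struct sketch_kvm_caps *caps)
{
	assert(caps != NULL);

	/* No CET feature at all: the CET XSS bits are pointless. */
	if (!caps->has_shstk && !caps->has_ibt)
		caps->supported_xss &= ~SKETCH_XFEATURE_MASK_CET_ALL;

	/* Partial XSS support: drop both features and any leftover CET bit. */
	if ((caps->supported_xss & SKETCH_XFEATURE_MASK_CET_ALL) !=
	    SKETCH_XFEATURE_MASK_CET_ALL) {
		caps->has_shstk = false;
		caps->has_ibt = false;
		caps->supported_xss &= ~SKETCH_XFEATURE_MASK_CET_ALL;
	}
}
```

In this sketch, a host exposing only the user CET component ends up with neither SHSTK nor IBT advertised, which matches the intent of the second `if` block in the hunk.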
··· 10181 9794 kvm_x86_ops.enable_virtualization_cpu = NULL; 10182 9795 mutex_unlock(&vendor_module_lock); 10183 9796 } 10184 - EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit); 9797 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_x86_vendor_exit); 10185 9798 10186 9799 #ifdef CONFIG_X86_64 10187 9800 static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr, ··· 10245 9858 { 10246 9859 return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0); 10247 9860 } 10248 - EXPORT_SYMBOL_GPL(kvm_apicv_activated); 9861 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_apicv_activated); 10249 9862 10250 9863 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu) 10251 9864 { ··· 10255 9868 10256 9869 return (vm_reasons | vcpu_reasons) == 0; 10257 9870 } 10258 - EXPORT_SYMBOL_GPL(kvm_vcpu_apicv_activated); 9871 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_apicv_activated); 10259 9872 10260 9873 static void set_or_clear_apicv_inhibit(unsigned long *inhibits, 10261 9874 enum kvm_apicv_inhibit reason, bool set) ··· 10431 10044 vcpu->run->hypercall.ret = ret; 10432 10045 return 1; 10433 10046 } 10434 - EXPORT_SYMBOL_GPL(____kvm_emulate_hypercall); 10047 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(____kvm_emulate_hypercall); 10435 10048 10436 10049 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) 10437 10050 { ··· 10444 10057 return __kvm_emulate_hypercall(vcpu, kvm_x86_call(get_cpl)(vcpu), 10445 10058 complete_hypercall_exit); 10446 10059 } 10447 - EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); 10060 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_hypercall); 10448 10061 10449 10062 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) 10450 10063 { ··· 10887 10500 preempt_enable(); 10888 10501 up_read(&vcpu->kvm->arch.apicv_update_lock); 10889 10502 } 10890 - EXPORT_SYMBOL_GPL(__kvm_vcpu_update_apicv); 10503 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_update_apicv); 10891 10504 10892 10505 static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu) 10893 10506 { ··· 10963 10576 __kvm_set_or_clear_apicv_inhibit(kvm, 
reason, set); 10964 10577 up_write(&kvm->arch.apicv_update_lock); 10965 10578 } 10966 - EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit); 10579 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_or_clear_apicv_inhibit); 10967 10580 10968 10581 static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) 10969 10582 { ··· 11183 10796 if (kvm_check_request(KVM_REQ_APF_READY, vcpu)) 11184 10797 kvm_check_async_pf_completion(vcpu); 11185 10798 11186 - /* 11187 - * Recalc MSR intercepts as userspace may want to intercept 11188 - * accesses to MSRs that KVM would otherwise pass through to 11189 - * the guest. 11190 - */ 11191 - if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu)) 11192 - kvm_x86_call(recalc_msr_intercepts)(vcpu); 10799 + if (kvm_check_request(KVM_REQ_RECALC_INTERCEPTS, vcpu)) 10800 + kvm_x86_call(recalc_intercepts)(vcpu); 11193 10801 11194 10802 if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) 11195 10803 kvm_x86_call(update_cpu_dirty_logging)(vcpu); ··· 11517 11135 11518 11136 return false; 11519 11137 } 11520 - EXPORT_SYMBOL_GPL(kvm_vcpu_has_events); 11138 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_has_events); 11521 11139 11522 11140 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) 11523 11141 { ··· 11670 11288 { 11671 11289 return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT); 11672 11290 } 11673 - EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip); 11291 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_halt_noskip); 11674 11292 11675 11293 int kvm_emulate_halt(struct kvm_vcpu *vcpu) 11676 11294 { ··· 11681 11299 */ 11682 11300 return kvm_emulate_halt_noskip(vcpu) && ret; 11683 11301 } 11684 - EXPORT_SYMBOL_GPL(kvm_emulate_halt); 11302 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_halt); 11685 11303 11686 11304 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu) 11687 11305 { 11688 - int ret; 11689 - 11690 - kvm_vcpu_srcu_read_lock(vcpu); 11691 - ret = kvm_emulate_halt(vcpu); 11692 - kvm_vcpu_srcu_read_unlock(vcpu); 11693 - 11694 - if (!ret) 11306 
+ if (!kvm_emulate_halt(vcpu)) 11695 11307 return EXIT_FASTPATH_EXIT_USERSPACE; 11696 11308 11697 11309 if (kvm_vcpu_running(vcpu)) ··· 11693 11317 11694 11318 return EXIT_FASTPATH_EXIT_HANDLED; 11695 11319 } 11696 - EXPORT_SYMBOL_GPL(handle_fastpath_hlt); 11320 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_hlt); 11697 11321 11698 11322 int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu) 11699 11323 { ··· 11702 11326 return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, 11703 11327 KVM_EXIT_AP_RESET_HOLD) && ret; 11704 11328 } 11705 - EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold); 11329 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_ap_reset_hold); 11706 11330 11707 11331 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu) 11708 11332 { ··· 12213 11837 struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; 12214 11838 int ret; 12215 11839 11840 + if (kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) { 11841 + u64 u_cet, s_cet; 11842 + 11843 + /* 11844 + * Check both User and Supervisor on task switches as inter- 11845 + * privilege level task switches are impacted by CET at both 11846 + * the current privilege level and the new privilege level, and 11847 + * that information is not known at this time. The expectation 11848 + * is that the guest won't require emulation of task switches 11849 + * while using IBT or Shadow Stacks. 11850 + */ 11851 + if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) || 11852 + __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet)) 11853 + goto unhandled_task_switch; 11854 + 11855 + if ((u_cet | s_cet) & (CET_ENDBR_EN | CET_SHSTK_EN)) 11856 + goto unhandled_task_switch; 11857 + } 11858 + 12216 11859 init_emulate_ctxt(vcpu); 12217 11860 12218 11861 ret = emulator_task_switch(ctxt, tss_selector, idt_index, reason, ··· 12241 11846 * Report an error userspace if MMIO is needed, as KVM doesn't support 12242 11847 * MMIO during a task switch (or any other complex operation). 
12243 11848 */ 12244 - if (ret || vcpu->mmio_needed) { 12245 - vcpu->mmio_needed = false; 12246 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 12247 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 12248 - vcpu->run->internal.ndata = 0; 12249 - return 0; 12250 - } 11849 + if (ret || vcpu->mmio_needed) 11850 + goto unhandled_task_switch; 12251 11851 12252 11852 kvm_rip_write(vcpu, ctxt->eip); 12253 11853 kvm_set_rflags(vcpu, ctxt->eflags); 12254 11854 return 1; 11855 + 11856 + unhandled_task_switch: 11857 + vcpu->mmio_needed = false; 11858 + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 11859 + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 11860 + vcpu->run->internal.ndata = 0; 11861 + return 0; 12255 11862 } 12256 - EXPORT_SYMBOL_GPL(kvm_task_switch); 11863 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_task_switch); 12257 11864 12258 11865 static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) 12259 11866 { ··· 12785 12388 kvfree(vcpu->arch.cpuid_entries); 12786 12389 } 12787 12390 12391 + static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event) 12392 + { 12393 + struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate; 12394 + u64 xfeatures_mask; 12395 + int i; 12396 + 12397 + /* 12398 + * Guest FPU state is zero allocated and so doesn't need to be manually 12399 + * cleared on RESET, i.e. during vCPU creation. 12400 + */ 12401 + if (!init_event || !fpstate) 12402 + return; 12403 + 12404 + /* 12405 + * On INIT, only select XSTATE components are zeroed, most components 12406 + * are unchanged. Currently, the only components that are zeroed and 12407 + * supported by KVM are MPX and CET related. 
12408 + */ 12409 + xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) & 12410 + (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR | 12411 + XFEATURE_MASK_CET_ALL); 12412 + if (!xfeatures_mask) 12413 + return; 12414 + 12415 + BUILD_BUG_ON(sizeof(xfeatures_mask) * BITS_PER_BYTE <= XFEATURE_MAX); 12416 + 12417 + /* 12418 + * All paths that lead to INIT are required to load the guest's FPU 12419 + * state (because most paths are buried in KVM_RUN). 12420 + */ 12421 + kvm_put_guest_fpu(vcpu); 12422 + for_each_set_bit(i, (unsigned long *)&xfeatures_mask, XFEATURE_MAX) 12423 + fpstate_clear_xstate_component(fpstate, i); 12424 + kvm_load_guest_fpu(vcpu); 12425 + } 12426 + 12788 12427 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) 12789 12428 { 12790 12429 struct kvm_cpuid_entry2 *cpuid_0x1; ··· 12878 12445 kvm_async_pf_hash_reset(vcpu); 12879 12446 vcpu->arch.apf.halted = false; 12880 12447 12881 - if (vcpu->arch.guest_fpu.fpstate && kvm_mpx_supported()) { 12882 - struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate; 12883 - 12884 - /* 12885 - * All paths that lead to INIT are required to load the guest's 12886 - * FPU state (because most paths are buried in KVM_RUN). 12887 - */ 12888 - if (init_event) 12889 - kvm_put_guest_fpu(vcpu); 12890 - 12891 - fpstate_clear_xstate_component(fpstate, XFEATURE_BNDREGS); 12892 - fpstate_clear_xstate_component(fpstate, XFEATURE_BNDCSR); 12893 - 12894 - if (init_event) 12895 - kvm_load_guest_fpu(vcpu); 12896 - } 12448 + kvm_xstate_reset(vcpu, init_event); 12897 12449 12898 12450 if (!init_event) { 12899 12451 vcpu->arch.smbase = 0x30000; ··· 12890 12472 MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; 12891 12473 12892 12474 __kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP); 12893 - __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true); 12475 + kvm_msr_write(vcpu, MSR_IA32_XSS, 0); 12894 12476 } 12895 12477 12896 12478 /* All GPRs except RDX (handled below) are zeroed on RESET/INIT. 
*/ ··· 12956 12538 if (init_event) 12957 12539 kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); 12958 12540 } 12959 - EXPORT_SYMBOL_GPL(kvm_vcpu_reset); 12541 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_reset); 12960 12542 12961 12543 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector) 12962 12544 { ··· 12968 12550 kvm_set_segment(vcpu, &cs, VCPU_SREG_CS); 12969 12551 kvm_rip_write(vcpu, 0); 12970 12552 } 12971 - EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector); 12553 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_deliver_sipi_vector); 12972 12554 12973 12555 void kvm_arch_enable_virtualization(void) 12974 12556 { ··· 13086 12668 { 13087 12669 return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id; 13088 12670 } 13089 - EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp); 12671 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_is_reset_bsp); 13090 12672 13091 12673 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) 13092 12674 { ··· 13250 12832 13251 12833 return (void __user *)hva; 13252 12834 } 13253 - EXPORT_SYMBOL_GPL(__x86_set_memory_region); 12835 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__x86_set_memory_region); 13254 12836 13255 12837 void kvm_arch_pre_destroy_vm(struct kvm *kvm) 13256 12838 { ··· 13658 13240 return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) + 13659 13241 kvm_rip_read(vcpu)); 13660 13242 } 13661 - EXPORT_SYMBOL_GPL(kvm_get_linear_rip); 13243 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip); 13662 13244 13663 13245 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip) 13664 13246 { 13665 13247 return kvm_get_linear_rip(vcpu) == linear_rip; 13666 13248 } 13667 - EXPORT_SYMBOL_GPL(kvm_is_linear_rip); 13249 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip); 13668 13250 13669 13251 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu) 13670 13252 { ··· 13675 13257 rflags &= ~X86_EFLAGS_TF; 13676 13258 return rflags; 13677 13259 } 13678 - EXPORT_SYMBOL_GPL(kvm_get_rflags); 13260 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags); 13679 13261 13680 13262 
static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) 13681 13263 { ··· 13690 13272 __kvm_set_rflags(vcpu, rflags); 13691 13273 kvm_make_request(KVM_REQ_EVENT, vcpu); 13692 13274 } 13693 - EXPORT_SYMBOL_GPL(kvm_set_rflags); 13275 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags); 13694 13276 13695 13277 static inline u32 kvm_async_pf_hash_fn(gfn_t gfn) 13696 13278 { ··· 13922 13504 if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) 13923 13505 kvm_noncoherent_dma_assignment_start_or_stop(kvm); 13924 13506 } 13925 - EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma); 13926 13507 13927 13508 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm) 13928 13509 { 13929 13510 if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count)) 13930 13511 kvm_noncoherent_dma_assignment_start_or_stop(kvm); 13931 13512 } 13932 - EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma); 13933 13513 13934 13514 bool kvm_arch_has_noncoherent_dma(struct kvm *kvm) 13935 13515 { 13936 13516 return atomic_read(&kvm->arch.noncoherent_dma_count); 13937 13517 } 13938 - EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma); 13939 - 13940 - bool kvm_vector_hashing_enabled(void) 13941 - { 13942 - return vector_hashing; 13943 - } 13518 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_arch_has_noncoherent_dma); 13944 13519 13945 13520 bool kvm_arch_no_poll(struct kvm_vcpu *vcpu) 13946 13521 { 13947 13522 return (vcpu->arch.msr_kvm_poll_control & 1) == 0; 13948 13523 } 13949 - EXPORT_SYMBOL_GPL(kvm_arch_no_poll); 13950 13524 13951 13525 #ifdef CONFIG_KVM_GUEST_MEMFD 13952 13526 /* ··· 13989 13579 13990 13580 return ret; 13991 13581 } 13992 - EXPORT_SYMBOL_GPL(kvm_spec_ctrl_test_value); 13582 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value); 13993 13583 13994 13584 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code) 13995 13585 { ··· 14014 13604 } 14015 13605 vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault); 14016 13606 } 14017 - 
EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error); 13607 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_fixup_and_inject_pf_error); 14018 13608 14019 13609 /* 14020 13610 * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns ··· 14043 13633 14044 13634 return 0; 14045 13635 } 14046 - EXPORT_SYMBOL_GPL(kvm_handle_memory_failure); 13636 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_memory_failure); 14047 13637 14048 13638 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva) 14049 13639 { ··· 14107 13697 return 1; 14108 13698 } 14109 13699 } 14110 - EXPORT_SYMBOL_GPL(kvm_handle_invpcid); 13700 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_handle_invpcid); 14111 13701 14112 13702 static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu) 14113 13703 { ··· 14192 13782 14193 13783 return 0; 14194 13784 } 14195 - EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_write); 13785 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_mmio_write); 14196 13786 14197 13787 int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes, 14198 13788 void *data) ··· 14230 13820 14231 13821 return 0; 14232 13822 } 14233 - EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read); 13823 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_mmio_read); 14234 13824 14235 13825 static void advance_sev_es_emulated_pio(struct kvm_vcpu *vcpu, unsigned count, int size) 14236 13826 { ··· 14318 13908 return in ? kvm_sev_es_ins(vcpu, size, port) 14319 13909 : kvm_sev_es_outs(vcpu, size, port); 14320 13910 } 14321 - EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); 13911 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_sev_es_string_io); 14322 13912 14323 13913 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); 14324 13914 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
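The kvm_xstate_reset() hunk in the x86.c diff above builds a mask of the XSTATE components that are zeroed on INIT (MPX and CET), limits it to what KVM supports, and clears each set bit. A minimal userspace sketch of that mask computation follows. The component numbers (3, 4, 11, 12) are assumed to match the kernel's enum xfeature, and init_zeroed_components() is a hypothetical helper for illustration, not a KVM function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XSAVE component numbers; assumed to match the kernel's enum xfeature. */
#define XFEATURE_BNDREGS	3
#define XFEATURE_BNDCSR		4
#define XFEATURE_CET_USER	11
#define XFEATURE_CET_KERNEL	12

#define XFEATURE_MASK(nr)	(UINT64_C(1) << (nr))

/* Components that INIT zeroes, per the comment in the diff (MPX and CET). */
#define INIT_ZEROED_MASK	(XFEATURE_MASK(XFEATURE_BNDREGS) |	\
				 XFEATURE_MASK(XFEATURE_BNDCSR) |	\
				 XFEATURE_MASK(XFEATURE_CET_USER) |	\
				 XFEATURE_MASK(XFEATURE_CET_KERNEL))

/*
 * Hypothetical helper: store the numbers of the components that would be
 * cleared on INIT into 'out' (at most 'cap' entries) and return the count.
 */
size_t init_zeroed_components(uint64_t supported_xcr0, uint64_t supported_xss,
			      unsigned int *out, size_t cap)
{
	uint64_t mask = (supported_xcr0 | supported_xss) & INIT_ZEROED_MASK;
	size_t n = 0;
	unsigned int i;

	assert(out || !cap);

	/* Open-coded equivalent of for_each_set_bit() over a 64-bit mask. */
	for (i = 0; i < 64 && n < cap; i++) {
		if (mask & XFEATURE_MASK(i))
			out[n++] = i;
	}
	return n;
}
```

With XCR0 support for bits 0..4 and XSS support for bits 11 and 12, the helper reports components 3, 4, 11 and 12. With only x87/SSE/AVX supported the mask is empty, which corresponds to the early return in the diff.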

+40 -2

arch/x86/kvm/x86.h

···
 	u64 efer;
 	u64 xcr0;
 	u64 xss;
+	u64 s_cet;
 	u64 arch_capabilities;
 };
 
···
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX UINT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW 3000
+
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
 
 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)
···
 
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
 				    void *insn, int insn_len);
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    int emulation_type, void *insn, int insn_len);
-fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
 
 extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
···
 		__reserved_bits |= X86_CR4_PCIDE; \
 	if (!__cpu_has(__c, X86_FEATURE_LAM)) \
 		__reserved_bits |= X86_CR4_LAM_SUP; \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
+	    !__cpu_has(__c, X86_FEATURE_IBT)) \
+		__reserved_bits |= X86_CR4_CET; \
 	__reserved_bits; \
 })
 
···
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+#define CET_US_RESERVED_BITS		GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+	if (data & CET_US_RESERVED_BITS)
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+	    (data & CET_US_SHSTK_MASK_BITS))
+		return false;
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+	    (data & CET_US_IBT_MASK_BITS))
+		return false;
+	if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+		return false;
+	/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+		return false;
+
+	return true;
+}
 #endif
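The kvm_is_valid_u_s_cet() helper added at the end of this x86.h diff rejects a U_CET/S_CET value in five steps: reserved bits, shadow-stack bits without SHSTK, IBT bits without IBT, a misaligned legacy bitmap base, and SUPPRESS combined with WAIT_ENDBR. The following userspace sketch mirrors those checks so they can be exercised in isolation. The guest feature lookups are replaced by two booleans, and the CET_SUPPRESS / CET_WAIT_ENDBR bit positions (10 and 11) are assumptions taken from the architectural MSR layout rather than from this diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)		(UINT64_C(1) << (n))
#define GENMASK_ULL(h, l)	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Assumed bit positions; not shown in the diff itself. */
#define CET_SUPPRESS		BIT_ULL(10)
#define CET_WAIT_ENDBR		BIT_ULL(11)

/* Same masks as the diff, expressed with the local GENMASK_ULL(). */
#define CET_US_RESERVED_BITS		GENMASK_ULL(9, 6)
#define CET_US_SHSTK_MASK_BITS		GENMASK_ULL(1, 0)
#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)

/* Sketch only: has_shstk/has_ibt stand in for guest_cpu_cap_has(). */
bool is_valid_u_s_cet(bool has_shstk, bool has_ibt, uint64_t data)
{
	if (data & CET_US_RESERVED_BITS)
		return false;
	if (!has_shstk && (data & CET_US_SHSTK_MASK_BITS))
		return false;
	if (!has_ibt && (data & CET_US_IBT_MASK_BITS))
		return false;
	/* Equivalent of !IS_ALIGNED(base, 4) for a power-of-two alignment. */
	if (CET_US_LEGACY_BITMAP_BASE(data) & 3)
		return false;
	/* SUPPRESS and WAIT_ENDBR are mutually exclusive. */
	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
		return false;

	return true;
}
```

For example, is_valid_u_s_cet(true, true, BIT_ULL(0)) accepts a value that only enables shadow stacks, while the same value is rejected once has_shstk is false.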

+1 -1

drivers/s390/crypto/vfio_ap_ops.c

···
 
 	if (!*nib)
 		return -EINVAL;
-	if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
+	if (!kvm_s390_is_gpa_in_memslot(vcpu->kvm, *nib))
 		return -EINVAL;
 
 	return 0;

+18 -7

include/linux/kvm_types.h

···
 #ifndef __KVM_TYPES_H__
 #define __KVM_TYPES_H__
 
+#include <linux/bits.h>
+#include <linux/export.h>
+#include <linux/types.h>
+#include <asm/kvm_types.h>
+
+#ifdef KVM_SUB_MODULES
+#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol) \
+	EXPORT_SYMBOL_FOR_MODULES(symbol, __stringify(KVM_SUB_MODULES))
+#else
+#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol)
+#endif
+
+#ifndef __ASSEMBLER__
+
+#include <linux/mutex.h>
+#include <linux/spinlock_types.h>
+
 struct kvm;
 struct kvm_async_pf;
 struct kvm_device_ops;
···
 struct kvm_memslots;
 
 enum kvm_mr_change;
-
-#include <linux/bits.h>
-#include <linux/mutex.h>
-#include <linux/types.h>
-#include <linux/spinlock_types.h>
-
-#include <asm/kvm_types.h>
 
 /*
  * Address types:
···
 };
 
 #define KVM_STATS_NAME_SIZE 48
+#endif /* !__ASSEMBLER__ */
 
 #endif /* __KVM_TYPES_H__ */
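EXPORT_SYMBOL_FOR_KVM_INTERNAL() in this kvm_types.h diff turns the KVM_SUB_MODULES token list into a string and passes it to EXPORT_SYMBOL_FOR_MODULES(). When KVM_SUB_MODULES is not defined, the macro expands to nothing, so the symbol is not exported at all. The sketch below shows the two-level stringification in plain C. The list value kvm-amd,kvm-intel is an assumed example (the real list is chosen per architecture and config), and module_in_list() is a hypothetical exact-match lookup, not the kernel's matching logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two-level stringification, the same shape as the kernel's __stringify(). */
#define STRINGIFY_1(...)	#__VA_ARGS__
#define STRINGIFY(...)		STRINGIFY_1(__VA_ARGS__)

/* Assumed example value; the real list depends on architecture and config. */
#define KVM_SUB_MODULES kvm-amd,kvm-intel

/* The outer macro expands KVM_SUB_MODULES before the inner one quotes it. */
const char *kvm_sub_modules_string(void)
{
	return STRINGIFY(KVM_SUB_MODULES);
}

/* Hypothetical helper: exact match of 'name' in a comma-separated list. */
bool module_in_list(const char *list, const char *name)
{
	size_t len;

	assert(list && name);
	len = strlen(name);

	while (*list) {
		const char *end = strchr(list, ',');
		size_t n = end ? (size_t)(end - list) : strlen(list);

		if (n == len && !strncmp(list, name, len))
			return true;
		if (!end)
			break;
		list = end + 1;
	}
	return false;
}
```

The indirection through STRINGIFY_1() matters: quoting KVM_SUB_MODULES directly would produce the literal text "KVM_SUB_MODULES" instead of the expanded module list.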

+1

tools/testing/selftests/kvm/Makefile.kvm

···
 TEST_GEN_PROGS_x86 += x86/kvm_pv_test
 TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
 TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
+TEST_GEN_PROGS_x86 += x86/msrs_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/platform_info_test

+5

tools/testing/selftests/kvm/include/x86/processor.h

···
 	return get_kvm_intel_param_bool("unrestricted_guest");
 }
 
+static inline bool kvm_is_ignore_msrs(void)
+{
+	return get_kvm_param_bool("ignore_msrs");
+}
+
 uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
 				    int *level);
 uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
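kvm_is_ignore_msrs() in this processor.h diff reports whether the kvm module was loaded with ignore_msrs set. As a rough sketch of what such a query amounts to, the helper below reads a boolean module parameter from a file. Kernel bool parameters read back as "Y" or "N", and the assumed location for this one is /sys/module/kvm/parameters/ignore_msrs. read_module_param_bool() is a hypothetical stand-in, not the selftest library's get_kvm_param_bool().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the file starts with 'Y', 0 if it starts
 * with 'N', and -1 if it cannot be opened or holds anything else.
 */
int read_module_param_bool(const char *path)
{
	FILE *f;
	int c;

	assert(path);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Only the first character matters; a newline follows it in sysfs. */
	c = fgetc(f);
	fclose(f);

	if (c == 'Y')
		return 1;
	if (c == 'N')
		return 0;
	return -1;
}
```

A caller would pass the sysfs path of the parameter and treat -1 as "module not loaded or parameter not present".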

+489

tools/testing/selftests/kvm/x86/msrs_test.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + #include <asm/msr-index.h> 3 + 4 + #include <stdint.h> 5 + 6 + #include "kvm_util.h" 7 + #include "processor.h" 8 + 9 + /* Use HYPERVISOR for MSRs that are emulated unconditionally (as is HYPERVISOR). */ 10 + #define X86_FEATURE_NONE X86_FEATURE_HYPERVISOR 11 + 12 + struct kvm_msr { 13 + const struct kvm_x86_cpu_feature feature; 14 + const struct kvm_x86_cpu_feature feature2; 15 + const char *name; 16 + const u64 reset_val; 17 + const u64 write_val; 18 + const u64 rsvd_val; 19 + const u32 index; 20 + const bool is_kvm_defined; 21 + }; 22 + 23 + #define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2, is_kvm) \ 24 + { \ 25 + .index = msr, \ 26 + .name = str, \ 27 + .write_val = val, \ 28 + .rsvd_val = rsvd, \ 29 + .reset_val = reset, \ 30 + .feature = X86_FEATURE_ ##feat, \ 31 + .feature2 = X86_FEATURE_ ##f2, \ 32 + .is_kvm_defined = is_kvm, \ 33 + } 34 + 35 + #define __MSR_TEST(msr, str, val, rsvd, reset, feat) \ 36 + ____MSR_TEST(msr, str, val, rsvd, reset, feat, feat, false) 37 + 38 + #define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat) \ 39 + __MSR_TEST(msr, #msr, val, rsvd, reset, feat) 40 + 41 + #define MSR_TEST(msr, val, rsvd, feat) \ 42 + __MSR_TEST(msr, #msr, val, rsvd, 0, feat) 43 + 44 + #define MSR_TEST2(msr, val, rsvd, feat, f2) \ 45 + ____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2, false) 46 + 47 + /* 48 + * Note, use a page aligned value for the canonical value so that the value 49 + * is compatible with MSRs that use bits 11:0 for things other than addresses. 50 + */ 51 + static const u64 canonical_val = 0x123456789000ull; 52 + 53 + /* 54 + * Arbitrary value with bits set in every byte, but not all bits set. This is 55 + * also a non-canonical value, but that's coincidental (any 64-bit value with 56 + * an alternating 0s/1s pattern will be non-canonical). 
57 + */ 58 + static const u64 u64_val = 0xaaaa5555aaaa5555ull; 59 + 60 + #define MSR_TEST_CANONICAL(msr, feat) \ 61 + __MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat) 62 + 63 + #define MSR_TEST_KVM(msr, val, rsvd, feat) \ 64 + ____MSR_TEST(KVM_REG_ ##msr, #msr, val, rsvd, 0, feat, feat, true) 65 + 66 + /* 67 + * The main struct must be scoped to a function due to the use of structures to 68 + * define features. For the global structure, allocate enough space for the 69 + * foreseeable future without getting too ridiculous, to minimize maintenance 70 + * costs (bumping the array size every time an MSR is added is really annoying). 71 + */ 72 + static struct kvm_msr msrs[128]; 73 + static int idx; 74 + 75 + static bool ignore_unsupported_msrs; 76 + 77 + static u64 fixup_rdmsr_val(u32 msr, u64 want) 78 + { 79 + /* 80 + * AMD CPUs drop bits 63:32 on some MSRs that Intel CPUs support. KVM 81 + * is supposed to emulate that behavior based on guest vendor model 82 + * (which is the same as the host vendor model for this test). 
83 + */ 84 + if (!host_cpu_is_amd) 85 + return want; 86 + 87 + switch (msr) { 88 + case MSR_IA32_SYSENTER_ESP: 89 + case MSR_IA32_SYSENTER_EIP: 90 + case MSR_TSC_AUX: 91 + return want & GENMASK_ULL(31, 0); 92 + default: 93 + return want; 94 + } 95 + } 96 + 97 + static void __rdmsr(u32 msr, u64 want) 98 + { 99 + u64 val; 100 + u8 vec; 101 + 102 + vec = rdmsr_safe(msr, &val); 103 + __GUEST_ASSERT(!vec, "Unexpected %s on RDMSR(0x%x)", ex_str(vec), msr); 104 + 105 + __GUEST_ASSERT(val == want, "Wanted 0x%lx from RDMSR(0x%x), got 0x%lx", 106 + want, msr, val); 107 + } 108 + 109 + static void __wrmsr(u32 msr, u64 val) 110 + { 111 + u8 vec; 112 + 113 + vec = wrmsr_safe(msr, val); 114 + __GUEST_ASSERT(!vec, "Unexpected %s on WRMSR(0x%x, 0x%lx)", 115 + ex_str(vec), msr, val); 116 + __rdmsr(msr, fixup_rdmsr_val(msr, val)); 117 + } 118 + 119 + static void guest_test_supported_msr(const struct kvm_msr *msr) 120 + { 121 + __rdmsr(msr->index, msr->reset_val); 122 + __wrmsr(msr->index, msr->write_val); 123 + GUEST_SYNC(fixup_rdmsr_val(msr->index, msr->write_val)); 124 + 125 + __rdmsr(msr->index, msr->reset_val); 126 + } 127 + 128 + static void guest_test_unsupported_msr(const struct kvm_msr *msr) 129 + { 130 + u64 val; 131 + u8 vec; 132 + 133 + /* 134 + * KVM's ABI with respect to ignore_msrs is a mess and largely beyond 135 + * repair, just skip the unsupported MSR tests. 136 + */ 137 + if (ignore_unsupported_msrs) 138 + goto skip_wrmsr_gp; 139 + 140 + /* 141 + * {S,U}_CET exist if IBT or SHSTK is supported, but with bits that are 142 + * writable only if their associated feature is supported. Skip the 143 + * RDMSR #GP test if the secondary feature is supported, but perform 144 + * the WRMSR #GP test as the to-be-written value is tied to the primary 145 + * feature. For all other MSRs, simply do nothing. 
146 + */ 147 + if (this_cpu_has(msr->feature2)) { 148 + if (msr->index != MSR_IA32_U_CET && 149 + msr->index != MSR_IA32_S_CET) 150 + goto skip_wrmsr_gp; 151 + 152 + goto skip_rdmsr_gp; 153 + } 154 + 155 + vec = rdmsr_safe(msr->index, &val); 156 + __GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s", 157 + msr->index, ex_str(vec)); 158 + 159 + skip_rdmsr_gp: 160 + vec = wrmsr_safe(msr->index, msr->write_val); 161 + __GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s", 162 + msr->index, msr->write_val, ex_str(vec)); 163 + 164 + skip_wrmsr_gp: 165 + GUEST_SYNC(0); 166 + } 167 + 168 + void guest_test_reserved_val(const struct kvm_msr *msr) 169 + { 170 + /* Skip reserved value checks as well, ignore_msrs is trully a mess. */ 171 + if (ignore_unsupported_msrs) 172 + return; 173 + 174 + /* 175 + * If the CPU will truncate the written value (e.g. SYSENTER on AMD), 176 + * expect success and a truncated value, not #GP. 177 + */ 178 + if (!this_cpu_has(msr->feature) || 179 + msr->rsvd_val == fixup_rdmsr_val(msr->index, msr->rsvd_val)) { 180 + u8 vec = wrmsr_safe(msr->index, msr->rsvd_val); 181 + 182 + __GUEST_ASSERT(vec == GP_VECTOR, 183 + "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s", 184 + msr->index, msr->rsvd_val, ex_str(vec)); 185 + } else { 186 + __wrmsr(msr->index, msr->rsvd_val); 187 + __wrmsr(msr->index, msr->reset_val); 188 + } 189 + } 190 + 191 + static void guest_main(void) 192 + { 193 + for (;;) { 194 + const struct kvm_msr *msr = &msrs[READ_ONCE(idx)]; 195 + 196 + if (this_cpu_has(msr->feature)) 197 + guest_test_supported_msr(msr); 198 + else 199 + guest_test_unsupported_msr(msr); 200 + 201 + if (msr->rsvd_val) 202 + guest_test_reserved_val(msr); 203 + 204 + GUEST_SYNC(msr->reset_val); 205 + } 206 + } 207 + 208 + static bool has_one_reg; 209 + static bool use_one_reg; 210 + 211 + #define KVM_X86_MAX_NR_REGS 1 212 + 213 + static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg) 214 + { 215 + struct { 216 + struct kvm_reg_list 
list; 217 + u64 regs[KVM_X86_MAX_NR_REGS]; 218 + } regs = {}; 219 + int r, i; 220 + 221 + /* 222 + * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported 223 + * regs, then the vCPU obviously doesn't support the reg. 224 + */ 225 + r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list); 226 + if (!r) 227 + return false; 228 + 229 + TEST_ASSERT_EQ(errno, E2BIG); 230 + 231 + /* 232 + * KVM x86 is expected to support enumerating a relative small number 233 + * of regs. The majority of registers supported by KVM_{G,S}ET_ONE_REG 234 + * are enumerated via other ioctls, e.g. KVM_GET_MSR_INDEX_LIST. For 235 + * simplicity, hardcode the maximum number of regs and manually update 236 + * the test as necessary. 237 + */ 238 + TEST_ASSERT(regs.list.n <= KVM_X86_MAX_NR_REGS, 239 + "KVM reports %llu regs, test expects at most %u regs, stale test?", 240 + regs.list.n, KVM_X86_MAX_NR_REGS); 241 + 242 + vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list); 243 + for (i = 0; i < regs.list.n; i++) { 244 + if (regs.regs[i] == reg) 245 + return true; 246 + } 247 + 248 + return false; 249 + } 250 + 251 + static void host_test_kvm_reg(struct kvm_vcpu *vcpu) 252 + { 253 + bool has_reg = vcpu_cpuid_has(vcpu, msrs[idx].feature); 254 + u64 reset_val = msrs[idx].reset_val; 255 + u64 write_val = msrs[idx].write_val; 256 + u64 rsvd_val = msrs[idx].rsvd_val; 257 + u32 reg = msrs[idx].index; 258 + u64 val; 259 + int r; 260 + 261 + if (!use_one_reg) 262 + return; 263 + 264 + TEST_ASSERT_EQ(vcpu_has_reg(vcpu, KVM_X86_REG_KVM(reg)), has_reg); 265 + 266 + if (!has_reg) { 267 + r = __vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg), &val); 268 + TEST_ASSERT(r && errno == EINVAL, 269 + "Expected failure on get_reg(0x%x)", reg); 270 + rsvd_val = 0; 271 + goto out; 272 + } 273 + 274 + val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg)); 275 + TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx", 276 + reset_val, reg, val); 277 + 278 + vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), write_val); 
279 + val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg)); 280 + TEST_ASSERT(val == write_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx", 281 + write_val, reg, val); 282 + 283 + out: 284 + r = __vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), rsvd_val); 285 + TEST_ASSERT(r, "Expected failure on set_reg(0x%x, 0x%lx)", reg, rsvd_val); 286 + } 287 + 288 + static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val) 289 + { 290 + u64 reset_val = msrs[idx].reset_val; 291 + u32 msr = msrs[idx].index; 292 + u64 val; 293 + 294 + if (!kvm_cpu_has(msrs[idx].feature)) 295 + return; 296 + 297 + val = vcpu_get_msr(vcpu, msr); 298 + TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx", 299 + guest_val, msr, val); 300 + 301 + if (use_one_reg) 302 + vcpu_set_reg(vcpu, KVM_X86_REG_MSR(msr), reset_val); 303 + else 304 + vcpu_set_msr(vcpu, msr, reset_val); 305 + 306 + val = vcpu_get_msr(vcpu, msr); 307 + TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx", 308 + reset_val, msr, val); 309 + 310 + if (!has_one_reg) 311 + return; 312 + 313 + val = vcpu_get_reg(vcpu, KVM_X86_REG_MSR(msr)); 314 + TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx", 315 + reset_val, msr, val); 316 + } 317 + 318 + static void do_vcpu_run(struct kvm_vcpu *vcpu) 319 + { 320 + struct ucall uc; 321 + 322 + for (;;) { 323 + vcpu_run(vcpu); 324 + 325 + switch (get_ucall(vcpu, &uc)) { 326 + case UCALL_SYNC: 327 + host_test_msr(vcpu, uc.args[1]); 328 + return; 329 + case UCALL_PRINTF: 330 + pr_info("%s", uc.buffer); 331 + break; 332 + case UCALL_ABORT: 333 + REPORT_GUEST_ASSERT(uc); 334 + case UCALL_DONE: 335 + TEST_FAIL("Unexpected UCALL_DONE"); 336 + default: 337 + TEST_FAIL("Unexpected ucall: %lu", uc.cmd); 338 + } 339 + } 340 + } 341 + 342 + static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS) 343 + { 344 + int i; 345 + 346 + for (i = 0; i < NR_VCPUS; i++) 347 + do_vcpu_run(vcpus[i]); 348 + } 349 + 350 + #define MISC_ENABLES_RESET_VAL 
(MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | MSR_IA32_MISC_ENABLE_BTS_UNAVAIL) 351 + 352 + static void test_msrs(void) 353 + { 354 + const struct kvm_msr __msrs[] = { 355 + MSR_TEST_NON_ZERO(MSR_IA32_MISC_ENABLE, 356 + MISC_ENABLES_RESET_VAL | MSR_IA32_MISC_ENABLE_FAST_STRING, 357 + MSR_IA32_MISC_ENABLE_FAST_STRING, MISC_ENABLES_RESET_VAL, NONE), 358 + MSR_TEST_NON_ZERO(MSR_IA32_CR_PAT, 0x07070707, 0, 0x7040600070406, NONE), 359 + 360 + /* 361 + * TSC_AUX is supported if RDTSCP *or* RDPID is supported. Add 362 + * entries for each features so that TSC_AUX doesn't exists for 363 + * the "unsupported" vCPU, and obviously to test both cases. 364 + */ 365 + MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDTSCP, RDPID), 366 + MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDPID, RDTSCP), 367 + 368 + MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE), 369 + /* 370 + * SYSENTER_{ESP,EIP} are technically non-canonical on Intel, 371 + * but KVM doesn't emulate that behavior on emulated writes, 372 + * i.e. this test will observe different behavior if the MSR 373 + * writes are handed by hardware vs. KVM. KVM's behavior is 374 + * intended (though far from ideal), so don't bother testing 375 + * non-canonical values. 
376 + */ 377 + MSR_TEST(MSR_IA32_SYSENTER_ESP, canonical_val, 0, NONE), 378 + MSR_TEST(MSR_IA32_SYSENTER_EIP, canonical_val, 0, NONE), 379 + 380 + MSR_TEST_CANONICAL(MSR_FS_BASE, LM), 381 + MSR_TEST_CANONICAL(MSR_GS_BASE, LM), 382 + MSR_TEST_CANONICAL(MSR_KERNEL_GS_BASE, LM), 383 + MSR_TEST_CANONICAL(MSR_LSTAR, LM), 384 + MSR_TEST_CANONICAL(MSR_CSTAR, LM), 385 + MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM), 386 + 387 + MSR_TEST2(MSR_IA32_S_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT), 388 + MSR_TEST2(MSR_IA32_S_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK), 389 + MSR_TEST2(MSR_IA32_U_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT), 390 + MSR_TEST2(MSR_IA32_U_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK), 391 + MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK), 392 + MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK), 393 + MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK), 394 + MSR_TEST(MSR_IA32_PL1_SSP, canonical_val, canonical_val | 1, SHSTK), 395 + MSR_TEST_CANONICAL(MSR_IA32_PL2_SSP, SHSTK), 396 + MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK), 397 + MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK), 398 + MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK), 399 + 400 + MSR_TEST_KVM(GUEST_SSP, canonical_val, NONCANONICAL, SHSTK), 401 + }; 402 + 403 + const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE; 404 + const struct kvm_x86_cpu_feature feat_lm = X86_FEATURE_LM; 405 + 406 + /* 407 + * Create three vCPUs, but run them on the same task, to validate KVM's 408 + * context switching of MSR state. Don't pin the task to a pCPU to 409 + * also validate KVM's handling of cross-pCPU migration. Use the full 410 + * set of features for the first two vCPUs, but clear all features in 411 + * third vCPU in order to test both positive and negative paths. 
412 + */ 413 + const int NR_VCPUS = 3; 414 + struct kvm_vcpu *vcpus[NR_VCPUS]; 415 + struct kvm_vm *vm; 416 + int i; 417 + 418 + kvm_static_assert(sizeof(__msrs) <= sizeof(msrs)); 419 + kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs)); 420 + memcpy(msrs, __msrs, sizeof(__msrs)); 421 + 422 + ignore_unsupported_msrs = kvm_is_ignore_msrs(); 423 + 424 + vm = vm_create_with_vcpus(NR_VCPUS, guest_main, vcpus); 425 + 426 + sync_global_to_guest(vm, msrs); 427 + sync_global_to_guest(vm, ignore_unsupported_msrs); 428 + 429 + /* 430 + * Clear features in the "unsupported features" vCPU. This needs to be 431 + * done before the first vCPU run as KVM's ABI is that guest CPUID is 432 + * immutable once the vCPU has been run. 433 + */ 434 + for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) { 435 + /* 436 + * Don't clear LM; selftests are 64-bit only, and KVM doesn't 437 + * honor LM=0 for MSRs that are supposed to exist if and only 438 + * if the vCPU is a 64-bit model. Ditto for NONE; clearing a 439 + * fake feature flag will result in false failures. 440 + */ 441 + if (memcmp(&msrs[idx].feature, &feat_lm, sizeof(feat_lm)) && 442 + memcmp(&msrs[idx].feature, &feat_none, sizeof(feat_none))) 443 + vcpu_clear_cpuid_feature(vcpus[2], msrs[idx].feature); 444 + } 445 + 446 + for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) { 447 + struct kvm_msr *msr = &msrs[idx]; 448 + 449 + if (msr->is_kvm_defined) { 450 + for (i = 0; i < NR_VCPUS; i++) 451 + host_test_kvm_reg(vcpus[i]); 452 + continue; 453 + } 454 + 455 + /* 456 + * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST 457 + * are consistent with respect to MSRs whose existence is 458 + * enumerated via CPUID. Skip the check for FS/GS.base MSRs, 459 + * as they aren't reported in the save/restore list since their 460 + * state is managed via SREGS. 
461 + */ 462 + TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE || 463 + kvm_msr_is_in_save_restore_list(msr->index) == 464 + (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)), 465 + "%s %s in save/restore list, but %s according to CPUID", msr->name, 466 + kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't", 467 + (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ? 468 + "supported" : "unsupported"); 469 + 470 + sync_global_to_guest(vm, idx); 471 + 472 + vcpus_run(vcpus, NR_VCPUS); 473 + vcpus_run(vcpus, NR_VCPUS); 474 + } 475 + 476 + kvm_vm_free(vm); 477 + } 478 + 479 + int main(void) 480 + { 481 + has_one_reg = kvm_has_cap(KVM_CAP_ONE_REG); 482 + 483 + test_msrs(); 484 + 485 + if (has_one_reg) { 486 + use_one_reg = true; 487 + test_msrs(); 488 + } 489 + }
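msrs_test.c writes canonical_val (0x123456789000) as the accepted value for address-like MSRs and uses u64_val (0xaaaa5555aaaa5555) where a non-canonical pattern is useful. An address is canonical when every bit above the implemented virtual-address width is a copy of the top implemented bit. The check below is a minimal sketch of that rule. The widths 48 and 57 are assumptions about common x86-64 configurations, and is_canonical() is an illustrative helper, not part of the selftest library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if 'addr' is canonical for a 'va_bits'-wide virtual address,
 * i.e. bits 63:(va_bits - 1) are all zeros or all ones.
 */
bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t upper;

	assert(va_bits >= 1 && va_bits <= 64);

	/* Keep the sign bit of the implemented range and everything above. */
	upper = addr >> (va_bits - 1);
	return upper == 0 || upper == (UINT64_MAX >> (va_bits - 1));
}
```

Under these assumptions 0x123456789000 is canonical at 48 bits, while 0xaaaa5555aaaa5555 is non-canonical at both 48 and 57 bits, which matches the comment in the test about alternating bit patterns.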

+5 -3

tools/testing/selftests/kvm/x86/pmu_counters_test.c

···
 #define NUM_BRANCH_INSNS_RETIRED (NUM_LOOPS)
 
 /*
- * Number of instructions in each loop. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE,
- * 1 LOOP.
+ * Number of instructions in each loop. 1 ENTER, 1 CLFLUSH/CLFLUSHOPT/NOP,
+ * 1 MFENCE, 1 MOV, 1 LEAVE, 1 LOOP.
  */
-#define NUM_INSNS_PER_LOOP 4
+#define NUM_INSNS_PER_LOOP 6
 
 /*
  * Number of "extra" instructions that will be counted, i.e. the number of
···
 	__asm__ __volatile__("wrmsr\n\t" \
 			     " mov $" __stringify(NUM_LOOPS) ", %%ecx\n\t" \
 			     "1:\n\t" \
+			     FEP "enter $0, $0\n\t" \
 			     clflush "\n\t" \
 			     "mfence\n\t" \
 			     "mov %[m], %%eax\n\t" \
+			     FEP "leave\n\t" \
 			     FEP "loop 1b\n\t" \
 			     FEP "mov %%edi, %%ecx\n\t" \
 			     FEP "xor %%eax, %%eax\n\t" \

+1 -1

virt/kvm/eventfd.c

···

 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_irq_has_notifier);

 void kvm_notify_acked_gsi(struct kvm *kvm, int gsi)
 {

+4 -3

virt/kvm/guest_memfd.c

···
 	fput(file);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);

 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
···
 	long i;

 	lockdep_assert_held(&kvm->slots_lock);
-	if (npages < 0)
+
+	if (WARN_ON_ONCE(npages <= 0))
 		return -EINVAL;

 	slot = gfn_to_memslot(kvm, start_gfn);
···
 	fput(file);
 	return ret && !i ? ret : i;
 }
-EXPORT_SYMBOL_GPL(kvm_gmem_populate);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_populate);
 #endif
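The `kvm_gmem_populate()` hunk changes the argument check in two ways: a zero `npages` is now rejected along with negative values, and a bad value triggers a one-time kernel warning instead of failing silently. The following userspace model illustrates the difference in behavior. It is a sketch only: `sketch_warn_on_once()` is a hypothetical stand-in for the kernel's `WARN_ON_ONCE()` (which warns once per call site and evaluates to the condition), and the two check functions are invented names that model only the guard, not the rest of the function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for WARN_ON_ONCE(): report the first failure only,
 * and always return the condition so it can be used inside an 'if'.
 */
static bool sketch_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Guard before the change: only negative page counts are rejected. */
static long sketch_check_npages_old(long npages)
{
	return npages < 0 ? -EINVAL : 0;
}

/* Guard after the change: zero is rejected too, and the first hit warns. */
static long sketch_check_npages_new(long npages)
{
	return sketch_warn_on_once(npages <= 0, "npages <= 0") ? -EINVAL : 0;
}
```

In this model, `sketch_check_npages_old(0)` returns 0 while `sketch_check_npages_new(0)` returns `-EINVAL`; positive counts pass both checks.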

+64 -63

virt/kvm/kvm_main.c

··· 77 77 /* Architectures should define their poll value according to the halt latency */ 78 78 unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT; 79 79 module_param(halt_poll_ns, uint, 0644); 80 - EXPORT_SYMBOL_GPL(halt_poll_ns); 80 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns); 81 81 82 82 /* Default doubles per-vcpu halt_poll_ns. */ 83 83 unsigned int halt_poll_ns_grow = 2; 84 84 module_param(halt_poll_ns_grow, uint, 0644); 85 - EXPORT_SYMBOL_GPL(halt_poll_ns_grow); 85 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_grow); 86 86 87 87 /* The start value to grow halt_poll_ns from */ 88 88 unsigned int halt_poll_ns_grow_start = 10000; /* 10us */ 89 89 module_param(halt_poll_ns_grow_start, uint, 0644); 90 - EXPORT_SYMBOL_GPL(halt_poll_ns_grow_start); 90 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_grow_start); 91 91 92 92 /* Default halves per-vcpu halt_poll_ns. */ 93 93 unsigned int halt_poll_ns_shrink = 2; 94 94 module_param(halt_poll_ns_shrink, uint, 0644); 95 - EXPORT_SYMBOL_GPL(halt_poll_ns_shrink); 95 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink); 96 96 97 97 /* 98 98 * Allow direct access (from KVM or the CPU) without MMU notifier protection ··· 170 170 kvm_arch_vcpu_load(vcpu, cpu); 171 171 put_cpu(); 172 172 } 173 - EXPORT_SYMBOL_GPL(vcpu_load); 173 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(vcpu_load); 174 174 175 175 void vcpu_put(struct kvm_vcpu *vcpu) 176 176 { ··· 180 180 __this_cpu_write(kvm_running_vcpu, NULL); 181 181 preempt_enable(); 182 182 } 183 - EXPORT_SYMBOL_GPL(vcpu_put); 183 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(vcpu_put); 184 184 185 185 /* TODO: merge with kvm_arch_vcpu_should_kick */ 186 186 static bool kvm_request_needs_ipi(struct kvm_vcpu *vcpu, unsigned req) ··· 288 288 289 289 return called; 290 290 } 291 - EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request); 291 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_make_all_cpus_request); 292 292 293 293 void kvm_flush_remote_tlbs(struct kvm *kvm) 294 294 { ··· 309 309 || kvm_make_all_cpus_request(kvm, 
KVM_REQ_TLB_FLUSH)) 310 310 ++kvm->stat.generic.remote_tlb_flush; 311 311 } 312 - EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs); 312 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_flush_remote_tlbs); 313 313 314 314 void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages) 315 315 { ··· 499 499 500 500 atomic_set(&kvm->online_vcpus, 0); 501 501 } 502 - EXPORT_SYMBOL_GPL(kvm_destroy_vcpus); 502 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_destroy_vcpus); 503 503 504 504 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER 505 505 static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) ··· 1365 1365 { 1366 1366 WARN_ON(refcount_dec_and_test(&kvm->users_count)); 1367 1367 } 1368 - EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy); 1368 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_put_kvm_no_destroy); 1369 1369 1370 1370 static int kvm_vm_release(struct inode *inode, struct file *filp) 1371 1371 { ··· 1397 1397 } 1398 1398 return -EINTR; 1399 1399 } 1400 - EXPORT_SYMBOL_GPL(kvm_trylock_all_vcpus); 1400 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_trylock_all_vcpus); 1401 1401 1402 1402 int kvm_lock_all_vcpus(struct kvm *kvm) 1403 1403 { ··· 1422 1422 } 1423 1423 return r; 1424 1424 } 1425 - EXPORT_SYMBOL_GPL(kvm_lock_all_vcpus); 1425 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lock_all_vcpus); 1426 1426 1427 1427 void kvm_unlock_all_vcpus(struct kvm *kvm) 1428 1428 { ··· 1434 1434 kvm_for_each_vcpu(i, vcpu, kvm) 1435 1435 mutex_unlock(&vcpu->mutex); 1436 1436 } 1437 - EXPORT_SYMBOL_GPL(kvm_unlock_all_vcpus); 1437 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_unlock_all_vcpus); 1438 1438 1439 1439 /* 1440 1440 * Allocation size is twice as large as the actual dirty bitmap size. 
··· 2142 2142 2143 2143 return kvm_set_memory_region(kvm, mem); 2144 2144 } 2145 - EXPORT_SYMBOL_GPL(kvm_set_internal_memslot); 2145 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_internal_memslot); 2146 2146 2147 2147 static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, 2148 2148 struct kvm_userspace_memory_region2 *mem) ··· 2201 2201 *is_dirty = 1; 2202 2202 return 0; 2203 2203 } 2204 - EXPORT_SYMBOL_GPL(kvm_get_dirty_log); 2204 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dirty_log); 2205 2205 2206 2206 #else /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */ 2207 2207 /** ··· 2636 2636 { 2637 2637 return __gfn_to_memslot(kvm_memslots(kvm), gfn); 2638 2638 } 2639 - EXPORT_SYMBOL_GPL(gfn_to_memslot); 2639 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_memslot); 2640 2640 2641 2641 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn) 2642 2642 { ··· 2670 2670 2671 2671 return NULL; 2672 2672 } 2673 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_gfn_to_memslot); 2673 2674 2674 2675 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) 2675 2676 { ··· 2678 2677 2679 2678 return kvm_is_visible_memslot(memslot); 2680 2679 } 2681 - EXPORT_SYMBOL_GPL(kvm_is_visible_gfn); 2680 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_visible_gfn); 2682 2681 2683 2682 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) 2684 2683 { ··· 2686 2685 2687 2686 return kvm_is_visible_memslot(memslot); 2688 2687 } 2689 - EXPORT_SYMBOL_GPL(kvm_vcpu_is_visible_gfn); 2688 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_is_visible_gfn); 2690 2689 2691 2690 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn) 2692 2691 { ··· 2743 2742 { 2744 2743 return gfn_to_hva_many(slot, gfn, NULL); 2745 2744 } 2746 - EXPORT_SYMBOL_GPL(gfn_to_hva_memslot); 2745 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_hva_memslot); 2747 2746 2748 2747 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) 2749 2748 { 2750 2749 return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL); 2751 2750 } 2752 - 
EXPORT_SYMBOL_GPL(gfn_to_hva); 2751 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(gfn_to_hva); 2753 2752 2754 2753 unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn) 2755 2754 { 2756 2755 return gfn_to_hva_many(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, NULL); 2757 2756 } 2758 - EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_hva); 2757 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_gfn_to_hva); 2759 2758 2760 2759 /* 2761 2760 * Return the hva of a @gfn and the R/W attribute if possible. ··· 2819 2818 kvm_set_page_accessed(page); 2820 2819 put_page(page); 2821 2820 } 2822 - EXPORT_SYMBOL_GPL(kvm_release_page_clean); 2821 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_release_page_clean); 2823 2822 2824 2823 void kvm_release_page_dirty(struct page *page) 2825 2824 { ··· 2829 2828 kvm_set_page_dirty(page); 2830 2829 kvm_release_page_clean(page); 2831 2830 } 2832 - EXPORT_SYMBOL_GPL(kvm_release_page_dirty); 2831 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_release_page_dirty); 2833 2832 2834 2833 static kvm_pfn_t kvm_resolve_pfn(struct kvm_follow_pfn *kfp, struct page *page, 2835 2834 struct follow_pfnmap_args *map, bool writable) ··· 3073 3072 3074 3073 return kvm_follow_pfn(&kfp); 3075 3074 } 3076 - EXPORT_SYMBOL_GPL(__kvm_faultin_pfn); 3075 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_faultin_pfn); 3077 3076 3078 3077 int kvm_prefetch_pages(struct kvm_memory_slot *slot, gfn_t gfn, 3079 3078 struct page **pages, int nr_pages) ··· 3090 3089 3091 3090 return get_user_pages_fast_only(addr, nr_pages, FOLL_WRITE, pages); 3092 3091 } 3093 - EXPORT_SYMBOL_GPL(kvm_prefetch_pages); 3092 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_prefetch_pages); 3094 3093 3095 3094 /* 3096 3095 * Don't use this API unless you are absolutely, positively certain that KVM ··· 3112 3111 (void)kvm_follow_pfn(&kfp); 3113 3112 return refcounted_page; 3114 3113 } 3115 - EXPORT_SYMBOL_GPL(__gfn_to_page); 3114 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__gfn_to_page); 3116 3115 3117 3116 int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct 
kvm_host_map *map, 3118 3117 bool writable) ··· 3146 3145 3147 3146 return map->hva ? 0 : -EFAULT; 3148 3147 } 3149 - EXPORT_SYMBOL_GPL(__kvm_vcpu_map); 3148 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_map); 3150 3149 3151 3150 void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map) 3152 3151 { ··· 3174 3173 map->page = NULL; 3175 3174 map->pinned_page = NULL; 3176 3175 } 3177 - EXPORT_SYMBOL_GPL(kvm_vcpu_unmap); 3176 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_unmap); 3178 3177 3179 3178 static int next_segment(unsigned long len, int offset) 3180 3179 { ··· 3210 3209 3211 3210 return __kvm_read_guest_page(slot, gfn, data, offset, len); 3212 3211 } 3213 - EXPORT_SYMBOL_GPL(kvm_read_guest_page); 3212 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_page); 3214 3213 3215 3214 int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, 3216 3215 int offset, int len) ··· 3219 3218 3220 3219 return __kvm_read_guest_page(slot, gfn, data, offset, len); 3221 3220 } 3222 - EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page); 3221 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest_page); 3223 3222 3224 3223 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len) 3225 3224 { ··· 3239 3238 } 3240 3239 return 0; 3241 3240 } 3242 - EXPORT_SYMBOL_GPL(kvm_read_guest); 3241 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest); 3243 3242 3244 3243 int kvm_vcpu_read_guest(struct kvm_vcpu *vcpu, gpa_t gpa, void *data, unsigned long len) 3245 3244 { ··· 3259 3258 } 3260 3259 return 0; 3261 3260 } 3262 - EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest); 3261 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest); 3263 3262 3264 3263 static int __kvm_read_guest_atomic(struct kvm_memory_slot *slot, gfn_t gfn, 3265 3264 void *data, int offset, unsigned long len) ··· 3290 3289 3291 3290 return __kvm_read_guest_atomic(slot, gfn, data, offset, len); 3292 3291 } 3293 - EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic); 3292 + 
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_read_guest_atomic); 3294 3293 3295 3294 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */ 3296 3295 static int __kvm_write_guest_page(struct kvm *kvm, ··· 3320 3319 3321 3320 return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len); 3322 3321 } 3323 - EXPORT_SYMBOL_GPL(kvm_write_guest_page); 3322 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_page); 3324 3323 3325 3324 int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, 3326 3325 const void *data, int offset, int len) ··· 3329 3328 3330 3329 return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); 3331 3330 } 3332 - EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page); 3331 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_write_guest_page); 3333 3332 3334 3333 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, 3335 3334 unsigned long len) ··· 3350 3349 } 3351 3350 return 0; 3352 3351 } 3353 - EXPORT_SYMBOL_GPL(kvm_write_guest); 3352 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest); 3354 3353 3355 3354 int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data, 3356 3355 unsigned long len) ··· 3371 3370 } 3372 3371 return 0; 3373 3372 } 3374 - EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest); 3373 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_write_guest); 3375 3374 3376 3375 static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots, 3377 3376 struct gfn_to_hva_cache *ghc, ··· 3420 3419 struct kvm_memslots *slots = kvm_memslots(kvm); 3421 3420 return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len); 3422 3421 } 3423 - EXPORT_SYMBOL_GPL(kvm_gfn_to_hva_cache_init); 3422 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gfn_to_hva_cache_init); 3424 3423 3425 3424 int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, 3426 3425 void *data, unsigned int offset, ··· 3451 3450 3452 3451 return 0; 3453 3452 } 3454 - EXPORT_SYMBOL_GPL(kvm_write_guest_offset_cached); 3453 + 
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_offset_cached); 3455 3454 3456 3455 int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, 3457 3456 void *data, unsigned long len) 3458 3457 { 3459 3458 return kvm_write_guest_offset_cached(kvm, ghc, data, 0, len); 3460 3459 } 3461 - EXPORT_SYMBOL_GPL(kvm_write_guest_cached); 3460 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_write_guest_cached); 3462 3461 3463 3462 int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, 3464 3463 void *data, unsigned int offset, ··· 3488 3487 3489 3488 return 0; 3490 3489 } 3491 - EXPORT_SYMBOL_GPL(kvm_read_guest_offset_cached); 3490 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_offset_cached); 3492 3491 3493 3492 int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, 3494 3493 void *data, unsigned long len) 3495 3494 { 3496 3495 return kvm_read_guest_offset_cached(kvm, ghc, data, 0, len); 3497 3496 } 3498 - EXPORT_SYMBOL_GPL(kvm_read_guest_cached); 3497 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_read_guest_cached); 3499 3498 3500 3499 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) 3501 3500 { ··· 3515 3514 } 3516 3515 return 0; 3517 3516 } 3518 - EXPORT_SYMBOL_GPL(kvm_clear_guest); 3517 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_clear_guest); 3519 3518 3520 3519 void mark_page_dirty_in_slot(struct kvm *kvm, 3521 3520 const struct kvm_memory_slot *memslot, ··· 3540 3539 set_bit_le(rel_gfn, memslot->dirty_bitmap); 3541 3540 } 3542 3541 } 3543 - EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot); 3542 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(mark_page_dirty_in_slot); 3544 3543 3545 3544 void mark_page_dirty(struct kvm *kvm, gfn_t gfn) 3546 3545 { ··· 3549 3548 memslot = gfn_to_memslot(kvm, gfn); 3550 3549 mark_page_dirty_in_slot(kvm, memslot, gfn); 3551 3550 } 3552 - EXPORT_SYMBOL_GPL(mark_page_dirty); 3551 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(mark_page_dirty); 3553 3552 3554 3553 void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) 
3555 3554 { ··· 3558 3557 memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); 3559 3558 mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn); 3560 3559 } 3561 - EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); 3560 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_mark_page_dirty); 3562 3561 3563 3562 void kvm_sigset_activate(struct kvm_vcpu *vcpu) 3564 3563 { ··· 3795 3794 3796 3795 trace_kvm_vcpu_wakeup(halt_ns, waited, vcpu_valid_wakeup(vcpu)); 3797 3796 } 3798 - EXPORT_SYMBOL_GPL(kvm_vcpu_halt); 3797 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_halt); 3799 3798 3800 3799 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu) 3801 3800 { ··· 3807 3806 3808 3807 return false; 3809 3808 } 3810 - EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up); 3809 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_wake_up); 3811 3810 3812 3811 #ifndef CONFIG_S390 3813 3812 /* ··· 3859 3858 out: 3860 3859 put_cpu(); 3861 3860 } 3862 - EXPORT_SYMBOL_GPL(__kvm_vcpu_kick); 3861 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_kick); 3863 3862 #endif /* !CONFIG_S390 */ 3864 3863 3865 3864 int kvm_vcpu_yield_to(struct kvm_vcpu *target) ··· 3882 3881 3883 3882 return ret; 3884 3883 } 3885 - EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to); 3884 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_yield_to); 3886 3885 3887 3886 /* 3888 3887 * Helper that checks whether a VCPU is eligible for directed yield. 
··· 4037 4036 /* Ensure vcpu is not eligible during next spinloop */ 4038 4037 kvm_vcpu_set_dy_eligible(me, false); 4039 4038 } 4040 - EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin); 4039 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_on_spin); 4041 4040 4042 4041 static bool kvm_page_in_dirty_ring(struct kvm *kvm, unsigned long pgoff) 4043 4042 { ··· 5019 5018 5020 5019 return true; 5021 5020 } 5022 - EXPORT_SYMBOL_GPL(kvm_are_all_memslots_empty); 5021 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_are_all_memslots_empty); 5023 5022 5024 5023 static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, 5025 5024 struct kvm_enable_cap *cap) ··· 5474 5473 { 5475 5474 return file && file->f_op == &kvm_vm_fops; 5476 5475 } 5477 - EXPORT_SYMBOL_GPL(file_is_kvm); 5476 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm); 5478 5477 5479 5478 static int kvm_dev_ioctl_create_vm(unsigned long type) 5480 5479 { ··· 5569 5568 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING 5570 5569 bool enable_virt_at_load = true; 5571 5570 module_param(enable_virt_at_load, bool, 0444); 5572 - EXPORT_SYMBOL_GPL(enable_virt_at_load); 5571 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_virt_at_load); 5573 5572 5574 5573 __visible bool kvm_rebooting; 5575 - EXPORT_SYMBOL_GPL(kvm_rebooting); 5574 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting); 5576 5575 5577 5576 static DEFINE_PER_CPU(bool, virtualization_enabled); 5578 5577 static DEFINE_MUTEX(kvm_usage_lock); ··· 5723 5722 --kvm_usage_count; 5724 5723 return r; 5725 5724 } 5726 - EXPORT_SYMBOL_GPL(kvm_enable_virtualization); 5725 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_virtualization); 5727 5726 5728 5727 void kvm_disable_virtualization(void) 5729 5728 { ··· 5736 5735 cpuhp_remove_state(CPUHP_AP_KVM_ONLINE); 5737 5736 kvm_arch_disable_virtualization(); 5738 5737 } 5739 - EXPORT_SYMBOL_GPL(kvm_disable_virtualization); 5738 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_disable_virtualization); 5740 5739 5741 5740 static int kvm_init_virtualization(void) 5742 5741 { ··· 5885 5884 r = 
__kvm_io_bus_write(vcpu, bus, &range, val); 5886 5885 return r < 0 ? r : 0; 5887 5886 } 5888 - EXPORT_SYMBOL_GPL(kvm_io_bus_write); 5887 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_write); 5889 5888 5890 5889 int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, 5891 5890 gpa_t addr, int len, const void *val, long cookie) ··· 5954 5953 r = __kvm_io_bus_read(vcpu, bus, &range, val); 5955 5954 return r < 0 ? r : 0; 5956 5955 } 5957 - EXPORT_SYMBOL_GPL(kvm_io_bus_read); 5956 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_read); 5958 5957 5959 5958 static void __free_bus(struct rcu_head *rcu) 5960 5959 { ··· 6078 6077 6079 6078 return iodev; 6080 6079 } 6081 - EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev); 6080 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_io_bus_get_dev); 6082 6081 6083 6082 static int kvm_debugfs_open(struct inode *inode, struct file *file, 6084 6083 int (*get)(void *, u64 *), int (*set)(void *, u64), ··· 6415 6414 6416 6415 return vcpu; 6417 6416 } 6418 - EXPORT_SYMBOL_GPL(kvm_get_running_vcpu); 6417 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_running_vcpu); 6419 6418 6420 6419 /** 6421 6420 * kvm_get_running_vcpus - get the per-CPU array of currently running vcpus. ··· 6550 6549 kmem_cache_destroy(kvm_vcpu_cache); 6551 6550 return r; 6552 6551 } 6553 - EXPORT_SYMBOL_GPL(kvm_init); 6552 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init); 6554 6553 6555 6554 void kvm_exit(void) 6556 6555 { ··· 6573 6572 kvm_async_pf_deinit(); 6574 6573 kvm_irqfd_exit(); 6575 6574 } 6576 - EXPORT_SYMBOL_GPL(kvm_exit); 6575 + EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_exit);
