Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.20

- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.

- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.

- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.

- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.

- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.

- Treat exit_code as an unsigned 64-bit value through all of KVM.

- Add support for fetching SNP certificates from userspace.

- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.

- Misc fixes and cleanups.

+559 -156
+44
Documentation/virt/kvm/api.rst
···
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct kvm_exit_snp_req_certs {
+			__u64 gpa;
+			__u64 npages;
+			__u64 ret;
+		};
+
+KVM_EXIT_SNP_REQ_CERTS indicates an SEV-SNP guest with certificate-fetching
+enabled (see KVM_SEV_SNP_ENABLE_REQ_CERTS) has generated an Extended Guest
+Request NAE #VMGEXIT (SNP_GUEST_REQUEST) with message type MSG_REPORT_REQ,
+i.e. has requested an attestation report from firmware, and would like the
+certificate data corresponding to the attestation report signature to be
+provided by the hypervisor as part of the request.
+
+To allow for userspace to provide the certificate, the 'gpa' and 'npages'
+are forwarded verbatim from the guest request (the RAX and RBX GHCB fields
+respectively). 'ret' is not an "output" from KVM, and is always '0' on
+exit. KVM verifies the 'gpa' is 4KiB aligned prior to exiting to userspace,
+but otherwise the information from the guest isn't validated.
+
+Upon the next KVM_RUN, e.g. after userspace has serviced the request (or not),
+KVM will complete the #VMGEXIT, using the 'ret' field to determine whether to
+signal success or failure to the guest, and on failure, what reason code will
+be communicated via SW_EXITINFO2. If 'ret' is set to an unsupported value (see
+the table below), KVM_RUN will fail with -EINVAL. For a 'ret' of 'ENOSPC', KVM
+also consumes the 'npages' field, i.e. userspace can use the field to inform
+the guest of the number of pages needed to hold all the certificate data.
+
+The supported 'ret' values and their respective SW_EXITINFO2 encodings:
+
+  ======  =============================================================
+  0       0x0, i.e. success. KVM will emit an SNP_GUEST_REQUEST command
+          to SNP firmware.
+  ENOSPC  0x0000000100000000, i.e. not enough guest pages to hold the
+          certificate table and certificate data. KVM will also set the
+          RBX field in the GHCB to 'npages'.
+  EAGAIN  0x0000000200000000, i.e. the host is busy and the guest should
+          retry the request.
+  EIO     0xffffffff00000000, for all other errors (this return code is
+          a KVM-defined hypervisor value, as allowed by the GHCB)
+  ======  =============================================================
+
 
 .. _cap_enable:
 
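As a rough illustration of the userspace side described in the documentation above, a VMM might fill in 'ret' and 'npages' along the following lines. This is a sketch under stated assumptions, not the interface of any particular VMM: struct snp_req_certs is a local stand-in that mirrors the three documented fields (a real VMM would read them from struct kvm_run), and service_req_certs(), blob_len and busy are hypothetical names. Copying the certificate blob into guest memory at 'gpa' is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SNP_PAGE_SIZE 4096ULL	/* the documentation states 'gpa' is 4KiB aligned */

/* Local stand-in mirroring struct kvm_exit_snp_req_certs (assumption). */
struct snp_req_certs {
	uint64_t gpa;
	uint64_t npages;
	uint64_t ret;
};

/*
 * Hypothetical helper: choose the 'ret' value to hand back to KVM.
 * blob_len is the size in bytes of the certificate blob the VMM holds,
 * busy is nonzero if the VMM cannot service the request right now.
 */
static void service_req_certs(struct snp_req_certs *req, size_t blob_len, int busy)
{
	uint64_t needed = (blob_len + SNP_PAGE_SIZE - 1) / SNP_PAGE_SIZE;

	if (busy) {
		/* Ask the guest to retry later. */
		req->ret = EAGAIN;
	} else if (needed > req->npages) {
		/* Tell the guest how many pages it must provide. */
		req->npages = needed;
		req->ret = ENOSPC;
	} else {
		/* A real VMM would copy the blob to guest memory at req->gpa here. */
		req->ret = 0;
	}
}
```

On the next KVM_RUN, KVM consumes 'ret' (and 'npages' for ENOSPC) as described in the table above.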
+51 -1
Documentation/virt/kvm/x86/amd-memory-encryption.rst
···
 See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
 details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
 
+21. KVM_SEV_SNP_ENABLE_REQ_CERTS
+--------------------------------
+
+The KVM_SEV_SNP_ENABLE_REQ_CERTS command will configure KVM to exit to
+userspace with a ``KVM_EXIT_SNP_REQ_CERTS`` exit type as part of handling
+a guest attestation report, which will allow userspace to provide a
+certificate corresponding to the endorsement key used by firmware to sign
+that attestation report.
+
+Returns: 0 on success, -negative on error
+
+NOTE: The endorsement key used by firmware may change as a result of
+management activities like updating SEV-SNP firmware or loading new
+endorsement keys, so some care should be taken to keep the returned
+certificate data in sync with the actual endorsement key in use by
+firmware at the time the attestation request is sent to SNP firmware. The
+recommended scheme to do this is to use file locking (e.g. via fcntl()'s
+F_OFD_SETLK) in the following manner:
+
+  - Prior to obtaining/providing certificate data as part of servicing an
+    exit type of ``KVM_EXIT_SNP_REQ_CERTS``, the VMM should obtain a
+    shared/read or exclusive/write lock on the certificate blob file before
+    reading it and returning it to KVM, and continue to hold the lock until
+    the attestation request is actually sent to firmware. To facilitate
+    this, the VMM can set the ``immediate_exit`` flag of kvm_run just after
+    supplying the certificate data, and just before resuming the vCPU.
+    This will ensure the vCPU will exit again to userspace with ``-EINTR``
+    after it finishes fetching the attestation request from firmware, at
+    which point the VMM can safely drop the file lock.
+
+  - Tools/libraries that perform updates to SNP firmware TCB values or
+    endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
+    ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
+    Documentation/virt/coco/sev-guest.rst for more details) in such a way
+    that the certificate blob needs to be updated, should similarly take an
+    exclusive lock on the certificate blob for the duration of any updates
+    to endorsement keys or the certificate blob contents to ensure that
+    VMMs using the above scheme will not return certificate blob data that
+    is out of sync with the endorsement key used by firmware at the time
+    the attestation request is actually issued.
+
+This scheme is recommended so that tools can use a fairly generic/natural
+approach to synchronizing firmware/certificate updates via file-locking,
+which should make it easier to maintain interoperability across
+tools/VMMs/vendors.
+
 Device attribute API
 ====================
 
···
 ``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
 device node, using group ``KVM_X86_GRP_SEV``.
 
-Currently only one attribute is implemented:
+The following attributes are currently implemented:
 
 * ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
   are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
+
+* ``KVM_X86_SEV_SNP_REQ_CERTS``: return a value of 1 if the kernel supports the
+  ``KVM_EXIT_SNP_REQ_CERTS`` exit, which allows for fetching endorsement key
+  certificates from userspace for each SNP attestation request the guest issues.
 
 Firmware Management
 ===================
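The file-locking scheme recommended in the documentation above could be sketched in C roughly as follows. This is only an illustration under assumptions: cert_blob_lock() is a hypothetical helper name, the whole file is locked rather than a byte range, and the fallback definition of F_OFD_SETLK uses the value from the Linux UAPI headers for builds where _GNU_SOURCE takes effect too late.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes F_OFD_SETLK in glibc's <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Fallback if the feature macro was set too late; Linux UAPI value. */
#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37
#endif

/*
 * Hypothetical helper: apply a non-blocking, whole-file open file
 * description (OFD) lock of the given type to the certificate blob.
 * Returns 0 on success, -1 with errno set on failure or conflict.
 */
static int cert_blob_lock(int fd, short type)
{
	struct flock fl = {
		.l_type = type,		/* F_RDLCK, F_WRLCK or F_UNLCK */
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,		/* 0 means "through end of file" */
		.l_pid = 0,		/* must be 0 for OFD locks */
	};

	return fcntl(fd, F_OFD_SETLK, &fl);
}
```

A VMM following the scheme would take F_RDLCK before reading the blob and release it with F_UNLCK once KVM_RUN returns -EINTR, while an update tool would hold F_WRLCK for the duration of the key or blob update.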
+1
arch/x86/include/asm/cpufeatures.h
···
 #define X86_FEATURE_GP_ON_USER_CPUID (20*32+17) /* User CPUID faulting */
 
 #define X86_FEATURE_PREFETCHI (20*32+20) /* Prefetch Data/Instruction to Cache Level */
+#define X86_FEATURE_ERAPS (20*32+24) /* Enhanced Return Address Predictor Security */
 #define X86_FEATURE_SBPB (20*32+27) /* Selective Branch Prediction Barrier */
 #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
 #define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
+8
arch/x86/include/asm/kvm_host.h
···
 
 	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
 	VCPU_EXREG_CR0,
+	/*
+	 * Alias AMD's ERAPS (not a real register) to CR3 so that common code
+	 * can trigger emulation of the RAP (Return Address Predictor) with
+	 * minimal support required in common code. Piggyback CR3 as the RAP
+	 * is cleared on writes to CR3, i.e. marking CR3 dirty will naturally
+	 * mark ERAPS dirty as well.
+	 */
 	VCPU_EXREG_CR3,
+	VCPU_EXREG_ERAPS = VCPU_EXREG_CR3,
 	VCPU_EXREG_CR4,
 	VCPU_EXREG_RFLAGS,
 	VCPU_EXREG_SEGMENTS,
+6 -3
arch/x86/include/asm/svm.h
···
 	u64 tsc_offset;
 	u32 asid;
 	u8 tlb_ctl;
-	u8 reserved_2[3];
+	u8 erap_ctl;
+	u8 reserved_2[2];
 	u32 int_ctl;
 	u32 int_vector;
 	u32 int_state;
 	u8 reserved_3[4];
-	u32 exit_code;
-	u32 exit_code_hi;
+	u64 exit_code;
 	u64 exit_info_1;
 	u64 exit_info_2;
 	u32 exit_int_info;
···
 #define TLB_CONTROL_FLUSH_ALL_ASID 1
 #define TLB_CONTROL_FLUSH_ASID 3
 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7
+
+#define ERAP_CONTROL_ALLOW_LARGER_RAP BIT(0)
+#define ERAP_CONTROL_CLEAR_RAP BIT(1)
 
 #define V_TPR_MASK 0x0f
 
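The two erap_ctl bits defined in this header are combined on nested VMRUN by the nested.c change in this merge: ALLOW_LARGER_RAP is taken from L1's vmcb12, and CLEAR_RAP is always set. A minimal sketch of that bit arithmetic, using plain integers instead of VMCB structures (nested_erap_ctl() is a hypothetical name, and the guest_cpu_cap_has() gate is omitted):

```c
#include <assert.h>
#include <stdint.h>

#define ERAP_CONTROL_ALLOW_LARGER_RAP (1u << 0)	/* BIT(0) in the kernel header */
#define ERAP_CONTROL_CLEAR_RAP        (1u << 1)	/* BIT(1) in the kernel header */

/*
 * Hypothetical helper mirroring the vmcb02 erap_ctl computation:
 * honor L1's choice for the larger RAP, always request a RAP clear,
 * and drop every other bit L1 may have set.
 */
static uint8_t nested_erap_ctl(uint8_t vmcb12_erap_ctl)
{
	return (uint8_t)((vmcb12_erap_ctl & ERAP_CONTROL_ALLOW_LARGER_RAP) |
			 ERAP_CONTROL_CLEAR_RAP);
}
```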
+2
arch/x86/include/uapi/asm/kvm.h
···
 #define KVM_X86_GRP_SEV 1
 # define KVM_X86_SEV_VMSA_FEATURES 0
 # define KVM_X86_SNP_POLICY_BITS 1
+# define KVM_X86_SEV_SNP_REQ_CERTS 2
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
···
 	KVM_SEV_SNP_LAUNCH_START = 100,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
 	KVM_SEV_SNP_LAUNCH_FINISH,
+	KVM_SEV_SNP_ENABLE_REQ_CERTS,
 
 	KVM_SEV_NR_MAX,
 };
+16 -16
arch/x86/include/uapi/asm/svm.h
···
 #define SVM_EXIT_VMGEXIT 0x403
 
 /* SEV-ES software-defined VMGEXIT events */
-#define SVM_VMGEXIT_MMIO_READ 0x80000001
-#define SVM_VMGEXIT_MMIO_WRITE 0x80000002
-#define SVM_VMGEXIT_NMI_COMPLETE 0x80000003
-#define SVM_VMGEXIT_AP_HLT_LOOP 0x80000004
-#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
+#define SVM_VMGEXIT_MMIO_READ 0x80000001ull
+#define SVM_VMGEXIT_MMIO_WRITE 0x80000002ull
+#define SVM_VMGEXIT_NMI_COMPLETE 0x80000003ull
+#define SVM_VMGEXIT_AP_HLT_LOOP 0x80000004ull
+#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005ull
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
-#define SVM_VMGEXIT_PSC 0x80000010
-#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011
-#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012
-#define SVM_VMGEXIT_AP_CREATION 0x80000013
+#define SVM_VMGEXIT_PSC 0x80000010ull
+#define SVM_VMGEXIT_GUEST_REQUEST 0x80000011ull
+#define SVM_VMGEXIT_EXT_GUEST_REQUEST 0x80000012ull
+#define SVM_VMGEXIT_AP_CREATION 0x80000013ull
 #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
 #define SVM_VMGEXIT_AP_CREATE 1
 #define SVM_VMGEXIT_AP_DESTROY 2
-#define SVM_VMGEXIT_SNP_RUN_VMPL 0x80000018
-#define SVM_VMGEXIT_SAVIC 0x8000001a
+#define SVM_VMGEXIT_SNP_RUN_VMPL 0x80000018ull
+#define SVM_VMGEXIT_SAVIC 0x8000001aull
 #define SVM_VMGEXIT_SAVIC_REGISTER_GPA 0
 #define SVM_VMGEXIT_SAVIC_UNREGISTER_GPA 1
 #define SVM_VMGEXIT_SAVIC_SELF_GPA ~0ULL
-#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
-#define SVM_VMGEXIT_TERM_REQUEST 0x8000fffe
+#define SVM_VMGEXIT_HV_FEATURES 0x8000fffdull
+#define SVM_VMGEXIT_TERM_REQUEST 0x8000fffeull
 #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code) \
 	/* SW_EXITINFO1[3:0] */ \
 	(((((u64)reason_set) & 0xf)) | \
 	/* SW_EXITINFO1[11:4] */ \
 	((((u64)reason_code) & 0xff) << 4))
-#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
+#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffffull
 
 /* Exit code reserved for hypervisor/software use */
-#define SVM_EXIT_SW 0xf0000000
+#define SVM_EXIT_SW 0xf0000000ull
 
-#define SVM_EXIT_ERR -1
+#define SVM_EXIT_ERR -1ull
 
 #define SVM_EXIT_REASONS \
 	{ SVM_EXIT_READ_CR0, "read_cr0" }, \
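To illustrate why treating the exit code as a single unsigned 64-bit value matters, the sketch below shows how a 32-bit view of the code can misclassify a VMRUN failure. It is a standalone illustration, not kernel code: combine_exit_code() and is_vmrun_failure() are hypothetical stand-ins, and only the SVM_EXIT_ERR definition is taken from the diff above.

```c
#include <assert.h>
#include <stdint.h>

#define SVM_EXIT_ERR -1ull	/* as defined after this change */

/* Old layout: two 32-bit halves that had to be recombined by hand. */
static uint64_t combine_exit_code(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical stand-in for a failure check on the full 64-bit value. */
static int is_vmrun_failure(uint64_t exit_code)
{
	return exit_code == SVM_EXIT_ERR;
}
```

With the full value, only all-ones in both halves is a failure; a value whose low half alone is 0xffffffff is not, which a comparison on a u32 could not distinguish.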
+8 -1
arch/x86/kvm/cpuid.c
···
 		/* PrefetchCtlMsr */
 		/* GpOnUserCpuid */
 		/* EPSF */
+		F(ERAPS),
 		SYNTHESIZED_F(SBPB),
 		SYNTHESIZED_F(IBPB_BRTYPE),
 		SYNTHESIZED_F(SRSO_NO),
···
 		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x80000021:
-		entry->ebx = entry->edx = 0;
+		entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+
+		if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+			entry->ebx &= GENMASK(23, 16);
+		else
+			entry->ebx = 0;
+
 		cpuid_entry_override(entry, CPUID_8000_0021_ECX);
 		break;
 	/* AMD Extended Performance Monitoring and Debug */
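The EBX handling for leaf 0x80000021 above keeps only bits 23:16 when ERAPS is supported and zeroes the register otherwise. A small userspace sketch of the same masking, assuming a local GENMASK32() macro in place of the kernel's GENMASK() and a hypothetical filter_8000_0021_ebx() helper:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's GENMASK(h, l) for 32-bit values. */
#define GENMASK32(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

/*
 * Hypothetical helper: pass through bits 23:16 of the host's EBX when
 * ERAPS is supported, otherwise report EBX as zero.
 */
static uint32_t filter_8000_0021_ebx(uint32_t host_ebx, int eraps_supported)
{
	return eraps_supported ? (host_ebx & GENMASK32(23, 16)) : 0;
}
```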
+2 -2
arch/x86/kvm/svm/avic.c
···
 	 * In "auto" mode, enable AVIC by default for Zen4+ if x2AVIC is
 	 * supported (to avoid enabling partial support by default, and because
 	 * x2AVIC should be supported by all Zen4+ CPUs). Explicitly check for
-	 * family 0x19 and later (Zen5+), as the kernel's synthetic ZenX flags
+	 * family 0x1A and later (Zen5+), as the kernel's synthetic ZenX flags
 	 * aren't inclusive of previous generations, i.e. the kernel will set
 	 * at most one ZenX feature flag.
 	 */
 	if (avic == AVIC_AUTO_MODE)
 		avic = boot_cpu_has(X86_FEATURE_X2AVIC) &&
-		       (boot_cpu_data.x86 > 0x19 || cpu_feature_enabled(X86_FEATURE_ZEN4));
+		       (cpu_feature_enabled(X86_FEATURE_ZEN4) || boot_cpu_data.x86 >= 0x1A);
 
 	if (!avic || !npt_enabled)
 		return false;
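The "auto" mode decision above reduces to a small predicate. The sketch below restates it with plain parameters in place of the kernel's CPU feature queries (avic_auto_default() and its arguments are hypothetical names). For integer family values, the old test "> 0x19" and the new test ">= 0x1A" select the same CPUs, so the change reads as a clarification of the comment and condition rather than a change in behavior.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the auto-mode default: x2AVIC must be
 * present, and the CPU must be Zen4 or family 0x1A and later.
 */
static bool avic_auto_default(bool has_x2avic, bool is_zen4, unsigned int family)
{
	return has_x2avic && (is_zen4 || family >= 0x1A);
}
```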
+6 -1
arch/x86/kvm/svm/hyperv.c
···
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/*
+	 * The exit code used by Hyper-V for software-defined exits is reserved
+	 * by AMD specifically for such use cases.
+	 */
+	BUILD_BUG_ON(HV_SVM_EXITCODE_ENL != SVM_EXIT_SW);
+
 	svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
-	svm->vmcb->control.exit_code_hi = 0;
 	svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
 	svm->vmcb->control.exit_info_2 = 0;
 	nested_svm_vmexit(svm);
+55 -27
arch/x86/kvm/svm/nested.c
···
 		 * correctly fill in the high bits of exit_info_1.
 		 */
 		vmcb->control.exit_code = SVM_EXIT_NPF;
-		vmcb->control.exit_code_hi = 0;
 		vmcb->control.exit_info_1 = (1ULL << 32);
 		vmcb->control.exit_info_2 = fault->address;
 	}
···
 	return __nested_vmcb_check_controls(vcpu, ctl);
 }
 
+/*
+ * If a feature is not advertised to L1, clear the corresponding vmcb12
+ * intercept.
+ */
+#define __nested_svm_sanitize_intercept(__vcpu, __control, fname, iname) \
+do { \
+	if (!guest_cpu_cap_has(__vcpu, X86_FEATURE_##fname)) \
+		vmcb12_clr_intercept(__control, INTERCEPT_##iname); \
+} while (0)
+
+#define nested_svm_sanitize_intercept(__vcpu, __control, name) \
+	__nested_svm_sanitize_intercept(__vcpu, __control, name, name)
+
 static
 void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 					 struct vmcb_ctrl_area_cached *to,
···
 	for (i = 0; i < MAX_INTERCEPT; i++)
 		to->intercepts[i] = from->intercepts[i];
 
+	__nested_svm_sanitize_intercept(vcpu, to, XSAVE, XSETBV);
+	nested_svm_sanitize_intercept(vcpu, to, INVPCID);
+	nested_svm_sanitize_intercept(vcpu, to, RDTSCP);
+	nested_svm_sanitize_intercept(vcpu, to, SKINIT);
+	nested_svm_sanitize_intercept(vcpu, to, RDPRU);
+
 	to->iopm_base_pa = from->iopm_base_pa;
 	to->msrpm_base_pa = from->msrpm_base_pa;
 	to->tsc_offset = from->tsc_offset;
 	to->tlb_ctl = from->tlb_ctl;
+	to->erap_ctl = from->erap_ctl;
 	to->int_ctl = from->int_ctl;
 	to->int_vector = from->int_vector;
 	to->int_state = from->int_state;
 	to->exit_code = from->exit_code;
-	to->exit_code_hi = from->exit_code_hi;
 	to->exit_info_1 = from->exit_info_1;
 	to->exit_info_2 = from->exit_info_2;
 	to->exit_int_info = from->exit_int_info;
···
 	vmcb02->save.rsp = vmcb12->save.rsp;
 	vmcb02->save.rip = vmcb12->save.rip;
 
-	/* These bits will be set properly on the first execution when new_vmc12 is true */
 	if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) {
 		vmcb02->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
 		svm->vcpu.arch.dr6 = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
···
 	enter_guest_mode(vcpu);
 
 	/*
-	 * Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
-	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
+	 * Filled at exit: exit_code, exit_info_1, exit_info_2, exit_int_info,
+	 * exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VGIF) &&
···
 	}
 
 	/*
+	 * Take ALLOW_LARGER_RAP from vmcb12 even though it should be safe to
+	 * let L2 use a larger RAP since KVM will emulate the necessary clears,
+	 * as it's possible L1 deliberately wants to restrict L2 to the legacy
+	 * RAP size. Unconditionally clear the RAP on nested VMRUN, as KVM is
+	 * responsible for emulating the host vs. guest tags (L1 is the "host",
+	 * L2 is the "guest").
+	 */
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
+		vmcb02->control.erap_ctl = (svm->nested.ctl.erap_ctl &
+					    ERAP_CONTROL_ALLOW_LARGER_RAP) |
+					   ERAP_CONTROL_CLEAR_RAP;
+
+	/*
 	 * Merge guest and host intercepts - must be called with vcpu in
 	 * guest-mode to take effect.
 	 */
···
 	if (!nested_vmcb_check_save(vcpu) ||
 	    !nested_vmcb_check_controls(vcpu)) {
 		vmcb12->control.exit_code = SVM_EXIT_ERR;
-		vmcb12->control.exit_code_hi = -1u;
 		vmcb12->control.exit_info_1 = 0;
 		vmcb12->control.exit_info_2 = 0;
 		goto out;
···
 	svm->soft_int_injected = false;
 
 	svm->vmcb->control.exit_code = SVM_EXIT_ERR;
-	svm->vmcb->control.exit_code_hi = -1u;
 	svm->vmcb->control.exit_info_1 = 0;
 	svm->vmcb->control.exit_info_2 = 0;
 
···
 
 	vmcb12->control.int_state = vmcb02->control.int_state;
 	vmcb12->control.exit_code = vmcb02->control.exit_code;
-	vmcb12->control.exit_code_hi = vmcb02->control.exit_code_hi;
 	vmcb12->control.exit_info_1 = vmcb02->control.exit_info_1;
 	vmcb12->control.exit_info_2 = vmcb02->control.exit_info_2;
 
-	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
+	if (!svm_is_vmrun_failure(vmcb12->control.exit_code))
 		nested_save_pending_event_to_vmcb12(svm, vmcb12);
 
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
···
 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
 
 	kvm_nested_vmexit_handle_ibrs(vcpu);
+
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
+		vmcb01->control.erap_ctl |= ERAP_CONTROL_CLEAR_RAP;
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
···
 		nested_svm_uninit_mmu_context(vcpu);
 		vmcb_mark_all_dirty(svm->vmcb);
 
+		svm_set_gif(svm, true);
+
 		if (kvm_apicv_activated(vcpu->kvm))
 			kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
 	}
···
 
 static int nested_svm_intercept(struct vcpu_svm *svm)
 {
-	u32 exit_code = svm->vmcb->control.exit_code;
+	u64 exit_code = svm->vmcb->control.exit_code;
 	int vmexit = NESTED_EXIT_HOST;
+
+	if (svm_is_vmrun_failure(exit_code))
+		return NESTED_EXIT_DONE;
 
 	switch (exit_code) {
 	case SVM_EXIT_MSR:
···
 	case SVM_EXIT_IOIO:
 		vmexit = nested_svm_intercept_ioio(svm);
 		break;
-	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
+	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f:
 		/*
 		 * Host-intercepted exceptions have been checked already in
 		 * nested_svm_exit_special. There is nothing to do here,
···
 		 */
 		vmexit = NESTED_EXIT_DONE;
 		break;
-	}
-	case SVM_EXIT_ERR: {
-		vmexit = NESTED_EXIT_DONE;
-		break;
-	}
-	default: {
+	default:
 		if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
-	}
+		break;
 	}
 
 	return vmexit;
···
 	struct vmcb *vmcb = svm->vmcb;
 
 	vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
-	vmcb->control.exit_code_hi = 0;
 
 	if (ex->has_error_code)
 		vmcb->control.exit_info_1 = ex->error_code;
···
 	dst->tsc_offset = from->tsc_offset;
 	dst->asid = from->asid;
 	dst->tlb_ctl = from->tlb_ctl;
+	dst->erap_ctl = from->erap_ctl;
 	dst->int_ctl = from->int_ctl;
 	dst->int_vector = from->int_vector;
 	dst->int_state = from->int_state;
 	dst->exit_code = from->exit_code;
-	dst->exit_code_hi = from->exit_code_hi;
 	dst->exit_info_1 = from->exit_info_1;
 	dst->exit_info_2 = from->exit_info_2;
 	dst->exit_int_info = from->exit_int_info;
···
 	/*
 	 * If in guest mode, vcpu->arch.efer actually refers to the L2 guest's
 	 * EFER.SVME, but EFER.SVME still has to be 1 for VMRUN to succeed.
+	 * If SVME is disabled, the only valid states are "none" and GIF=1
+	 * (clearing SVME does NOT set GIF, i.e. GIF=0 is allowed).
 	 */
-	if (!(vcpu->arch.efer & EFER_SVME)) {
-		/* GIF=1 and no guest mode are required if SVME=0. */
-		if (kvm_state->flags != KVM_STATE_NESTED_GIF_SET)
-			return -EINVAL;
-	}
+	if (!(vcpu->arch.efer & EFER_SVME) && kvm_state->flags &&
+	    kvm_state->flags != KVM_STATE_NESTED_GIF_SET)
+		return -EINVAL;
 
 	/* SMM temporarily disables SVM, so we cannot be in guest mode. */
 	if (is_smm(vcpu) && (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
···
 	 * thus MMU might not be initialized correctly.
 	 * Set it again to fix this.
 	 */
-
 	ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3,
 				  nested_npt_enabled(svm), false);
-	if (WARN_ON_ONCE(ret))
+	if (ret)
 		goto out_free;
 
 	svm->nested.force_msr_bitmap_recalc = true;
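The relaxed nested-state check in the diff above accepts two flag combinations when EFER.SVME is clear: no flags at all (which now covers GIF=0) and GIF set on its own. A standalone sketch of that predicate follows. check_nested_flags() is a hypothetical name, and the two flag values are mirrored from the KVM UAPI header as an assumption of this sketch rather than quoted from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the KVM UAPI header (assumption in this sketch). */
#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_GIF_SET    0x00000100

/*
 * Hypothetical restatement of the check: with SVME clear, only "no
 * flags" and "GIF set, nothing else" are valid. With SVME set, this
 * particular check imposes no restriction.
 */
static int check_nested_flags(bool svme, uint32_t flags)
{
	if (!svme && flags && flags != KVM_STATE_NESTED_GIF_SET)
		return -EINVAL;

	return 0;
}
```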
+89 -40
arch/x86/kvm/svm/sev.c
···
 
 #define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
+/*
+ * The GHCB spec essentially states that all non-zero error codes other than
+ * those explicitly defined above should be treated as an error by the guest.
+ * Define a generic error to cover that case, and choose a value that is not
+ * likely to overlap with new explicit error codes should more be added to
+ * the GHCB spec later. KVM will use this to report generic errors when
+ * handling SNP guest requests.
+ */
+#define SNP_GUEST_VMM_ERR_GENERIC (~0U)
+
 /* enable/disable SEV support */
 static bool sev_enabled = true;
 module_param_named(sev, sev_enabled, bool, 0444);
···
 /* enable/disable SEV-SNP support */
 static bool sev_snp_enabled = true;
 module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
-
-/* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = true;
-module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
-static u64 sev_supported_vmsa_features;
 
 static unsigned int nr_ciphertext_hiding_asids;
 module_param_named(ciphertext_hiding_asids, nr_ciphertext_hiding_asids, uint, 0444);
···
 					 SNP_POLICY_MASK_PAGE_SWAP_DISABLE)
 
 static u64 snp_supported_policy_bits __ro_after_init;
+
+static u64 sev_supported_vmsa_features __ro_after_init;
 
 #define INITIAL_VMSA_GPA 0xFFFFFFFFF000
 
···
 		*val = snp_supported_policy_bits;
 		return 0;
 
+	case KVM_X86_SEV_SNP_REQ_CERTS:
+		*val = sev_snp_enabled ? 1 : 0;
+		return 0;
 	default:
 		return -ENXIO;
 	}
···
 	return ret;
 }
 
+static int snp_enable_certs(struct kvm *kvm)
+{
+	if (kvm->created_vcpus || !sev_snp_guest(kvm))
+		return -EINVAL;
+
+	to_kvm_sev_info(kvm)->snp_certs_enabled = true;
+
+	return 0;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
···
 		break;
 	case KVM_SEV_SNP_LAUNCH_FINISH:
 		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
+	case KVM_SEV_SNP_ENABLE_REQ_CERTS:
+		r = snp_enable_certs(kvm);
 		break;
 	default:
 		r = -EINVAL;
···
 	sev_es_enabled = sev_es_supported;
 	sev_snp_enabled = sev_snp_supported;
 
-	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
-	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
-		sev_es_debug_swap_enabled = false;
-
 	sev_supported_vmsa_features = 0;
-	if (sev_es_debug_swap_enabled)
+
+	if (sev_es_enabled && cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) &&
+	    cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
 
 	if (sev_snp_enabled && tsc_khz && cpu_feature_enabled(X86_FEATURE_SNP_SECURE_TSC))
···
 	kvfree(svm->sev_es.ghcb_sa);
 }
 
-static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control)
-{
-	return (((u64)control->exit_code_hi) << 32) | control->exit_code;
-}
-
 static void dump_ghcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
···
 	 */
 	pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
-	       kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
+	       control->exit_code, kvm_ghcb_sw_exit_code_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
 	       control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
···
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct ghcb *ghcb = svm->sev_es.ghcb;
-	u64 exit_code;
 
 	/*
 	 * The GHCB protocol so far allows for the following data
···
 		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
 
 	/* Copy the GHCB exit information into the VMCB fields */
-	exit_code = kvm_ghcb_get_sw_exit_code(svm);
-	control->exit_code = lower_32_bits(exit_code);
-	control->exit_code_hi = upper_32_bits(exit_code);
+	control->exit_code = kvm_ghcb_get_sw_exit_code(svm);
 	control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(svm);
 	control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(svm);
 	svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm);
···
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	u64 exit_code;
 	u64 reason;
-
-	/*
-	 * Retrieve the exit code now even though it may not be marked valid
-	 * as it could help with debugging.
-	 */
-	exit_code = kvm_get_cached_sw_exit_code(control);
 
 	/* Only GHCB Usage code 0 is supported */
 	if (svm->sev_es.ghcb->ghcb_usage) {
···
 	    !kvm_ghcb_sw_exit_info_2_is_valid(svm))
 		goto vmgexit_err;
 
-	switch (exit_code) {
+	switch (control->exit_code) {
 	case SVM_EXIT_READ_DR7:
 		break;
 	case SVM_EXIT_WRITE_DR7:
···
 	return 0;
 
 vmgexit_err:
+	/*
+	 * Print the exit code even though it may not be marked valid as it
+	 * could help with debugging.
+	 */
 	if (reason == GHCB_ERR_INVALID_USAGE) {
 		vcpu_unimpl(vcpu, "vmgexit: ghcb usage %#x is not valid\n",
 			    svm->sev_es.ghcb->ghcb_usage);
 	} else if (reason == GHCB_ERR_INVALID_EVENT) {
 		vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
-			    exit_code);
+			    control->exit_code);
 	} else {
 		vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
-			    exit_code);
+			    control->exit_code);
 		dump_ghcb(svm);
 	}
 
···
 	return ret;
 }
 
+static int snp_req_certs_err(struct vcpu_svm *svm, u32 vmm_error)
+{
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_error, 0));
+
+	return 1; /* resume guest */
+}
+
+static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+
+	switch (READ_ONCE(vcpu->run->snp_req_certs.ret)) {
+	case 0:
+		return snp_handle_guest_req(svm, control->exit_info_1,
+					    control->exit_info_2);
+	case ENOSPC:
+		vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->snp_req_certs.npages;
+		return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_INVALID_LEN);
+	case EAGAIN:
+		return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_BUSY);
+	case EIO:
+		return snp_req_certs_err(svm, SNP_GUEST_VMM_ERR_GENERIC);
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+
 static int snp_handle_ext_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
 {
 	struct kvm *kvm = svm->vcpu.kvm;
···
 	/*
 	 * As per GHCB spec, requests of type MSG_REPORT_REQ also allow for
 	 * additional certificate data to be provided alongside the attestation
-	 * report via the guest-provided data pages indicated by RAX/RBX. The
-	 * certificate data is optional and requires additional KVM enablement
-	 * to provide an interface for userspace to provide it, but KVM still
-	 * needs to be able to handle extended guest requests either way. So
-	 * provide a stub implementation that will always return an empty
-	 * certificate table in the guest-provided data pages.
+	 * report via the guest-provided data pages indicated by RAX/RBX. If
+	 * userspace enables KVM_EXIT_SNP_REQ_CERTS, then exit to userspace
+	 * to give userspace an opportunity to provide the certificate data
+	 * before issuing/completing the attestation request. Otherwise, return
+	 * an empty certificate table in the guest-provided data pages and
+	 * handle the attestation request immediately.
 	 */
 	if (msg_type == SNP_MSG_REPORT_REQ) {
+		struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 		struct kvm_vcpu *vcpu = &svm->vcpu;
 		u64 data_npages;
 		gpa_t data_gpa;
···
 
 		if (!PAGE_ALIGNED(data_gpa))
 			goto request_invalid;
+
+		if (sev->snp_certs_enabled) {
+			vcpu->run->exit_reason = KVM_EXIT_SNP_REQ_CERTS;
+			vcpu->run->snp_req_certs.gpa = data_gpa;
+			vcpu->run->snp_req_certs.npages = data_npages;
+			vcpu->run->snp_req_certs.ret = 0;
+			vcpu->arch.complete_userspace_io = snp_complete_req_certs;
+			return 0;
+		}
 
 		/*
 		 * As per GHCB spec (see "SNP Extended Guest Request"), the
···
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_control_area *control = &svm->vmcb->control;
-	u64 ghcb_gpa, exit_code;
+	u64 ghcb_gpa;
 	int ret;
 
 	/* Validate the GHCB */
···
 
 	svm_vmgexit_success(svm, 0);
 
-	exit_code = kvm_get_cached_sw_exit_code(control);
-	switch (exit_code) {
+	switch (control->exit_code) {
 	case SVM_VMGEXIT_MMIO_READ:
 		ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
 		if (ret)
···
 		ret = -EINVAL;
 		break;
 	default:
-		ret = svm_invoke_exit_handler(vcpu, control->exit_code);
+		ret = svm_invoke_exit_handler(vcpu, control->exit_code);
 	}
 
 	return ret;
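The mapping that snp_complete_req_certs() applies to the userspace-provided 'ret' value can be summarized with a small standalone sketch. It is not the kernel code: ret_to_exitinfo2() and snp_guest_err() are hypothetical names, and the numeric values 1 and 2 for the VMM error codes are inferred from the SW_EXITINFO2 encodings listed in the api.rst table earlier in this commit (VMM error in the upper 32 bits, firmware error in the lower 32 bits).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* VMM error codes; 1 and 2 follow from the documented SW_EXITINFO2 encodings. */
#define VMM_ERR_INVALID_LEN 1u
#define VMM_ERR_BUSY        2u
#define VMM_ERR_GENERIC     (~0u)	/* KVM-defined generic error */

/* VMM error in bits 63:32, firmware error in bits 31:0. */
static uint64_t snp_guest_err(uint32_t vmm_err, uint32_t fw_err)
{
	return ((uint64_t)vmm_err << 32) | fw_err;
}

/*
 * Hypothetical helper: translate a userspace 'ret' into the documented
 * SW_EXITINFO2 encoding. Returns 0 on success or -EINVAL if 'ret' is
 * not one of the supported values.
 */
static int ret_to_exitinfo2(uint64_t ret, uint64_t *exitinfo2)
{
	switch (ret) {
	case 0:
		/* KVM forwards the request to firmware; 0x0 means success. */
		*exitinfo2 = 0;
		return 0;
	case ENOSPC:
		*exitinfo2 = snp_guest_err(VMM_ERR_INVALID_LEN, 0);
		return 0;
	case EAGAIN:
		*exitinfo2 = snp_guest_err(VMM_ERR_BUSY, 0);
		return 0;
	case EIO:
		*exitinfo2 = snp_guest_err(VMM_ERR_GENERIC, 0);
		return 0;
	default:
		return -EINVAL;
	}
}
```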
+89 -32
arch/x86/kvm/svm/svm.c
···
 	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
 		if (!(efer & EFER_SVME)) {
 			svm_leave_nested(vcpu);
-			svm_set_gif(svm, true);
 			/* #GP intercept is still needed for vmware backdoor */
 			if (!enable_vmware_backdoor)
 				clr_exception_intercept(svm, GP_VECTOR);
···
 		svm_set_intercept(svm, INTERCEPT_RDTSCP);
 	}

+	/*
+	 * No need to toggle VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK here, it is
+	 * always set if vls is enabled. If the intercepts are set, the bit is
+	 * meaningless anyway.
+	 */
 	if (guest_cpuid_is_intel_compatible(vcpu)) {
 		svm_set_intercept(svm, INTERCEPT_VMLOAD);
 		svm_set_intercept(svm, INTERCEPT_VMSAVE);
-		svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
···
 		if (vls) {
 			svm_clr_intercept(svm, INTERCEPT_VMLOAD);
 			svm_clr_intercept(svm, INTERCEPT_VMSAVE);
-			svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 		}
 	}
 }
···
 		svm_clr_intercept(svm, INTERCEPT_PAUSE);
 	}

+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
+		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
+
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);

···
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
 		svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
 	}
+
+	if (vls)
+		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;

 	if (vcpu->kvm->arch.bus_lock_detection_enabled)
 		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
···
 				svm->vmcb->control.insn_len);
 }

+static int svm_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+					 void *insn, int insn_len);
+
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	int rc;

-	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
+	gpa_t gpa = svm->vmcb->control.exit_info_2;

 	/*
 	 * WARN if hardware generates a fault with an error code that collides
···
 	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
 		error_code &= ~PFERR_SYNTHETIC_MASK;

+	/*
+	 * Expedite fast MMIO kicks if the next RIP is known and KVM is allowed
+	 * emulate a page fault, e.g. skipping the current instruction is wrong
+	 * if the #NPF occurred while vectoring an event.
+	 */
+	if ((error_code & PFERR_RSVD_MASK) && !is_guest_mode(vcpu)) {
+		const int emul_type = EMULTYPE_PF | EMULTYPE_NO_DECODE;
+
+		if (svm_check_emulate_instruction(vcpu, emul_type, NULL, 0))
+			return 1;
+
+		if (nrips && svm->vmcb->control.next_rip &&
+		    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+			trace_kvm_fast_mmio(gpa);
+			return kvm_skip_emulated_instruction(vcpu);
+		}
+	}
+
 	if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
 		error_code |= PFERR_PRIVATE_ACCESS;

-	trace_kvm_page_fault(vcpu, fault_address, error_code);
-	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+	trace_kvm_page_fault(vcpu, gpa, error_code);
+	rc = kvm_mmu_page_fault(vcpu, gpa, error_code,
 				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
 				svm->vmcb->control.insn_bytes : NULL,
 				svm->vmcb->control.insn_len);

 	if (rc > 0 && error_code & PFERR_GUEST_RMP_MASK)
-		sev_handle_rmp_fault(vcpu, fault_address, error_code);
+		sev_handle_rmp_fault(vcpu, gpa, error_code);

 	return rc;
 }
···

 	ret = kvm_skip_emulated_instruction(vcpu);

+	/* KVM always performs VMLOAD/VMSAVE on VMCB01 (see __svm_vcpu_run()) */
 	if (vmload) {
-		svm_copy_vmloadsave_state(svm->vmcb, vmcb12);
+		svm_copy_vmloadsave_state(svm->vmcb01.ptr, vmcb12);
 		svm->sysenter_eip_hi = 0;
 		svm->sysenter_esp_hi = 0;
 	} else {
-		svm_copy_vmloadsave_state(vmcb12, svm->vmcb);
+		svm_copy_vmloadsave_state(vmcb12, svm->vmcb01.ptr);
 	}

 	kvm_vcpu_unmap(vcpu, &map);
···

 	if (cr0 ^ val) {
 		svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE;
-		svm->vmcb->control.exit_code_hi = 0;
 		ret = (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE);
 	}

···
 	pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset);
 	pr_err("%-20s%d\n", "asid:", control->asid);
 	pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl);
+	pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl);
 	pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl);
 	pr_err("%-20s%08x\n", "int_vector:", control->int_vector);
 	pr_err("%-20s%08x\n", "int_state:", control->int_state);
-	pr_err("%-20s%08x\n", "exit_code:", control->exit_code);
+	pr_err("%-20s%016llx\n", "exit_code:", control->exit_code);
 	pr_err("%-20s%016llx\n", "exit_info1:", control->exit_info_1);
 	pr_err("%-20s%016llx\n", "exit_info2:", control->exit_info_2);
 	pr_err("%-20s%08x\n", "exit_int_info:", control->exit_int_info);
···
 	sev_free_decrypted_vmsa(vcpu, save);
 }

-static bool svm_check_exit_valid(u64 exit_code)
+int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 __exit_code)
 {
-	return (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
-		svm_exit_handlers[exit_code]);
-}
+	u32 exit_code = __exit_code;

-static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
-{
-	dump_vmcb(vcpu);
-	kvm_prepare_unexpected_reason_exit(vcpu, exit_code);
-	return 0;
-}
-
-int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
-{
-	if (!svm_check_exit_valid(exit_code))
-		return svm_handle_invalid_exit(vcpu, exit_code);
+	/*
+	 * SVM uses negative values, i.e. 64-bit values, to indicate that VMRUN
+	 * failed. Report all such errors to userspace (note, VMEXIT_INVALID,
+	 * a.k.a. SVM_EXIT_ERR, is special cased by svm_handle_exit()). Skip
+	 * the check when running as a VM, as KVM has historically left garbage
+	 * in bits 63:32, i.e. running KVM-on-KVM would hit false positives if
+	 * the underlying kernel is buggy.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR) &&
+	    (u64)exit_code != __exit_code)
+		goto unexpected_vmexit;

 #ifdef CONFIG_MITIGATION_RETPOLINE
 	if (exit_code == SVM_EXIT_MSR)
···
 		return sev_handle_vmgexit(vcpu);
 #endif
 #endif
+	if (exit_code >= ARRAY_SIZE(svm_exit_handlers))
+		goto unexpected_vmexit;
+
+	exit_code = array_index_nospec(exit_code, ARRAY_SIZE(svm_exit_handlers));
+	if (!svm_exit_handlers[exit_code])
+		goto unexpected_vmexit;
+
 	return svm_exit_handlers[exit_code](vcpu);
+
+unexpected_vmexit:
+	dump_vmcb(vcpu);
+	kvm_prepare_unexpected_reason_exit(vcpu, __exit_code);
+	return 0;
 }

 static void svm_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
···
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_run *kvm_run = vcpu->run;
-	u32 exit_code = svm->vmcb->control.exit_code;

 	/* SEV-ES guests must use the CR write traps to track CR registers. */
 	if (!sev_es_guest(vcpu->kvm)) {
···
 		return 1;
 	}

-	if (svm->vmcb->control.exit_code == SVM_EXIT_ERR) {
+	if (svm_is_vmrun_failure(svm->vmcb->control.exit_code)) {
 		kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		kvm_run->fail_entry.hardware_entry_failure_reason
 			= svm->vmcb->control.exit_code;
···
 	if (exit_fastpath != EXIT_FASTPATH_NONE)
 		return 1;

-	return svm_invoke_exit_handler(vcpu, exit_code);
+	return svm_invoke_exit_handler(vcpu, svm->vmcb->control.exit_code);
 }

 static int pre_svm_run(struct kvm_vcpu *vcpu)
···
 	invlpga(gva, svm->vmcb->control.asid);
 }

+static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+	kvm_register_mark_dirty(vcpu, VCPU_EXREG_ERAPS);
+
+	svm_flush_tlb_asid(vcpu);
+}
+
 static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
···
 	}
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;

+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS) &&
+	    kvm_register_is_dirty(vcpu, VCPU_EXREG_ERAPS))
+		svm->vmcb->control.erap_ctl |= ERAP_CONTROL_CLEAR_RAP;
+
 	svm_hv_update_vp_id(svm->vmcb, vcpu);

 	/*
···

 		/* Track VMRUNs that have made past consistency checking */
 		if (svm->nested.nested_run_pending &&
-		    svm->vmcb->control.exit_code != SVM_EXIT_ERR)
+		    !svm_is_vmrun_failure(svm->vmcb->control.exit_code))
 			++vcpu->stat.nested_run;

 		svm->nested.nested_run_pending = 0;
 	}

 	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+
+	/*
+	 * Unconditionally mask off the CLEAR_RAP bit, the AND is just as cheap
+	 * as the TEST+Jcc to avoid it.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_ERAPS))
+		svm->vmcb->control.erap_ctl &= ~ERAP_CONTROL_CLEAR_RAP;
+
 	vmcb_mark_all_clean(svm->vmcb);

 	/* if exit due to PF check for async PF */
···
 	if (static_cpu_has(X86_FEATURE_NRIPS))
 		vmcb->control.next_rip = info->next_rip;
 	vmcb->control.exit_code = icpt_info.exit_code;
-	vmcb->control.exit_code_hi = 0;
 	vmexit = nested_svm_exit_handled(svm);

 	ret = (vmexit == NESTED_EXIT_DONE) ? X86EMUL_INTERCEPTED
···
 	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
-	.flush_tlb_guest = svm_flush_tlb_asid,
+	.flush_tlb_guest = svm_flush_tlb_guest,

 	.vcpu_pre_run = svm_vcpu_pre_run,
 	.vcpu_run = svm_vcpu_run,
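The reworked svm_invoke_exit_handler() applies three checks in order before dispatching: the 64-bit exit code must survive truncation to 32 bits (unless running as a guest), it must index inside the handler table, and the table slot must be populated. A standalone sketch of that ordering follows. The handler table, `invoke_exit_handler()` and the `running_as_guest` flag are illustrative stand-ins, and the speculation barrier (array_index_nospec) is deliberately omitted because it has no userspace equivalent here.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_HANDLERS 4

typedef int (*exit_handler_t)(void);

static int handle_a(void) { return 1; }
static int handle_c(void) { return 3; }

/* Sparse table: slots 1 and 3 deliberately have no handler. */
static const exit_handler_t handlers[NR_HANDLERS] = {
	[0] = handle_a,
	[2] = handle_c,
};

/* Returns the handler's result, or 0 to model the "unexpected exit" path. */
static int invoke_exit_handler(uint64_t raw_exit_code, int running_as_guest)
{
	uint32_t exit_code = (uint32_t)raw_exit_code;

	/* Bits 63:32 set means the code is not a normal exit reason. */
	if (!running_as_guest && (uint64_t)exit_code != raw_exit_code)
		return 0;

	/* Bounds check first, then the NULL-slot check. */
	if (exit_code >= NR_HANDLERS)
		return 0;

	if (!handlers[exit_code])
		return 0;

	return handlers[exit_code]();
}
```

Passing the original 64-bit value to the error path, as the diff does with `__exit_code`, keeps the upper bits visible to userspace even though dispatch uses the truncated value.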
+38 -11
arch/x86/kvm/svm/svm.h
···
 	void *guest_resp_buf; /* Bounce buffer for SNP Guest Request output */
 	struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
 	cpumask_var_t have_run_cpus; /* CPUs that have done VMRUN for this VM. */
+	bool snp_certs_enabled; /* SNP certificate-fetching support. */
 };

 struct kvm_svm {
···
 	u64 tsc_offset;
 	u32 asid;
 	u8 tlb_ctl;
+	u8 erap_ctl;
 	u32 int_ctl;
 	u32 int_vector;
 	u32 int_state;
-	u32 exit_code;
-	u32 exit_code_hi;
+	u64 exit_code;
 	u64 exit_info_1;
 	u64 exit_info_2;
 	u32 exit_int_info;
···
 	return container_of(vcpu, struct vcpu_svm, vcpu);
 }

+static inline bool svm_is_vmrun_failure(u64 exit_code)
+{
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return (u32)exit_code == (u32)SVM_EXIT_ERR;
+
+	return exit_code == SVM_EXIT_ERR;
+}
+
 /*
  * Only the PDPTRs are loaded on demand into the shadow MMU. All other
  * fields are synchronized on VM-Exit, because accessing the VMCB is cheap.
···
  */
 #define SVM_REGS_LAZY_LOAD_SET (1 << VCPU_EXREG_PDPTR)

-static inline void vmcb_set_intercept(struct vmcb_control_area *control, u32 bit)
+static inline void __vmcb_set_intercept(unsigned long *intercepts, u32 bit)
 {
 	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
-	__set_bit(bit, (unsigned long *)&control->intercepts);
+	__set_bit(bit, intercepts);
+}
+
+static inline void __vmcb_clr_intercept(unsigned long *intercepts, u32 bit)
+{
+	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
+	__clear_bit(bit, intercepts);
+}
+
+static inline bool __vmcb_is_intercept(unsigned long *intercepts, u32 bit)
+{
+	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
+	return test_bit(bit, intercepts);
+}
+
+static inline void vmcb_set_intercept(struct vmcb_control_area *control, u32 bit)
+{
+	__vmcb_set_intercept((unsigned long *)&control->intercepts, bit);
 }

 static inline void vmcb_clr_intercept(struct vmcb_control_area *control, u32 bit)
 {
-	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
-	__clear_bit(bit, (unsigned long *)&control->intercepts);
+	__vmcb_clr_intercept((unsigned long *)&control->intercepts, bit);
 }

 static inline bool vmcb_is_intercept(struct vmcb_control_area *control, u32 bit)
 {
-	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
-	return test_bit(bit, (unsigned long *)&control->intercepts);
+	return __vmcb_is_intercept((unsigned long *)&control->intercepts, bit);
+}
+
+static inline void vmcb12_clr_intercept(struct vmcb_ctrl_area_cached *control, u32 bit)
+{
+	__vmcb_clr_intercept((unsigned long *)&control->intercepts, bit);
 }

 static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u32 bit)
 {
-	WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
-	return test_bit(bit, (unsigned long *)&control->intercepts);
+	return __vmcb_is_intercept((unsigned long *)&control->intercepts, bit);
 }

 static inline void set_exception_intercept(struct vcpu_svm *svm, u32 bit)
···
 static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
 {
 	svm->vmcb->control.exit_code = exit_code;
-	svm->vmcb->control.exit_code_hi = 0;
 	svm->vmcb->control.exit_info_1 = 0;
 	svm->vmcb->control.exit_info_2 = 0;
 	return nested_svm_vmexit(svm);
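svm_is_vmrun_failure() compares the full 64-bit exit code on bare metal but only the low 32 bits when running as a guest, where the upper half may hold garbage left by an older host. The small model below shows the difference on concrete values. `EXIT_ERR` stands in for SVM_EXIT_ERR (assumed here to be -1), and `running_as_guest` replaces the X86_FEATURE_HYPERVISOR check; both names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for SVM_EXIT_ERR, assumed to be -1 (all ones as a 64-bit value). */
#define EXIT_ERR (-1)

static bool is_vmrun_failure(uint64_t exit_code, bool running_as_guest)
{
	/* Under a hypervisor, trust only the low 32 bits. */
	if (running_as_guest)
		return (uint32_t)exit_code == (uint32_t)EXIT_ERR;

	/* On bare metal, require the full sign-extended value. */
	return exit_code == (uint64_t)EXIT_ERR;
}
```

For example, 0x00000000ffffffff counts as a VMRUN failure only in the guest case, while 0xffffffffffffffff counts in both.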
+3 -3
arch/x86/kvm/trace.h
···
 #define kvm_print_exit_reason(exit_reason, isa) \
 	(isa == KVM_ISA_VMX) ? \
 	__print_symbolic(exit_reason & 0xffff, VMX_EXIT_REASONS) : \
-	__print_symbolic(exit_reason, SVM_EXIT_REASONS), \
+	__print_symbolic_u64(exit_reason, SVM_EXIT_REASONS), \
 	(isa == KVM_ISA_VMX && exit_reason & ~0xffff) ? " " : "", \
 	(isa == KVM_ISA_VMX) ? \
-	__print_flags(exit_reason & ~0xffff, " ", VMX_EXIT_REASON_FLAGS) : ""
+	__print_flags_u64(exit_reason & ~0xffff, " ", VMX_EXIT_REASON_FLAGS) : ""

 #define TRACE_EVENT_KVM_EXIT(name) \
 TRACE_EVENT(name, \
···
  * Tracepoint for #VMEXIT reinjected to the guest
  */
 TRACE_EVENT(kvm_nested_vmexit_inject,
-	    TP_PROTO(__u32 exit_code,
+	    TP_PROTO(__u64 exit_code,
 		     __u64 exit_info1, __u64 exit_info2,
 		     __u32 exit_int_info, __u32 exit_int_info_err, __u32 isa),
 	    TP_ARGS(exit_code, exit_info1, exit_info2,
+12
arch/x86/kvm/x86.c
···
 			return 1;
 		}

+		/*
+		 * When ERAPS is supported, invalidating a specific PCID clears
+		 * the RAP (Return Address Predictor).
+		 */
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
+			kvm_register_mark_dirty(vcpu, VCPU_EXREG_ERAPS);
+
 		kvm_invalidate_pcid(vcpu, operand.pcid);
 		return kvm_skip_emulated_instruction(vcpu);

···

 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
+		/*
+		 * Don't bother marking VCPU_EXREG_ERAPS dirty, SVM will take
+		 * care of doing so when emulating the full guest TLB flush
+		 * (the RAP is cleared on all implicit TLB flushes).
+		 */
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 		return kvm_skip_emulated_instruction(vcpu);

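Across svm.c and x86.c, the ERAPS changes form a small request/consume cycle: an emulated flush marks VCPU_EXREG_ERAPS dirty, the next VMRUN turns that into ERAP_CONTROL_CLEAR_RAP, and the bit is masked off again after the run. The sketch below models that cycle in isolation. The struct, the function names and the bit positions are assumptions made for illustration, and resetting the dirty flag in `post_run()` is also an assumption, since this excerpt does not show where KVM clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed for this model, not taken from the diff. */
#define CTL_ALLOW_LARGER_RAP (1u << 0)
#define CTL_CLEAR_RAP        (1u << 1)

struct model_vcpu {
	bool guest_has_eraps;	/* models guest_cpu_cap_has(ERAPS) */
	bool eraps_dirty;	/* models the VCPU_EXREG_ERAPS dirty bit */
	uint8_t erap_ctl;	/* models vmcb->control.erap_ctl */
};

/* Called when an emulated flush requires the RAP to be cleared. */
static void mark_rap_dirty(struct model_vcpu *v)
{
	v->eraps_dirty = true;
}

/* Before VMRUN: convert a pending request into the one-shot control bit. */
static void pre_run(struct model_vcpu *v)
{
	if (v->guest_has_eraps && v->eraps_dirty)
		v->erap_ctl |= CTL_CLEAR_RAP;
}

/* After VMRUN: the request has been consumed by hardware. */
static void post_run(struct model_vcpu *v)
{
	v->erap_ctl &= (uint8_t)~CTL_CLEAR_RAP;
	v->eraps_dirty = false;	/* assumption, see the lead-in */
}
```

The point of the dirty flag is that the request is recorded at emulation time but only applied to the VMCB on the next entry, so multiple flushes between runs collapse into a single RAP clear.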
+1 -1
include/hyperv/hvgdk.h
···
 #define HV_VMCB_NESTED_ENLIGHTENMENTS 31

 /* Synthetic VM-Exit */
-#define HV_SVM_EXITCODE_ENL 0xf0000000
+#define HV_SVM_EXITCODE_ENL 0xf0000000ull
 #define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)

 /* VM_PARTITION_ASSIST_PAGE */
+9
include/uapi/linux/kvm.h
···
 	} u;
 };

+struct kvm_exit_snp_req_certs {
+	__u64 gpa;
+	__u64 npages;
+	__u64 ret;
+};
+
 #define KVM_S390_GET_SKEYS_NONE 1
 #define KVM_S390_SKEYS_MAX 1048576

···
 #define KVM_EXIT_TDX 40
 #define KVM_EXIT_ARM_SEA 41
 #define KVM_EXIT_ARM_LDST64B 42
+#define KVM_EXIT_SNP_REQ_CERTS 43

 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
···
 			__u64 gva;
 			__u64 gpa;
 		} arm_sea;
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct kvm_exit_snp_req_certs snp_req_certs;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
+1 -1
tools/testing/selftests/kvm/Makefile.kvm
···
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
+TEST_GEN_PROGS_x86 += x86/nested_set_state_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_adjust_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_scaling_test
 TEST_GEN_PROGS_x86 += x86/nested_vmsave_vmload_test
···
 TEST_GEN_PROGS_x86 += x86/vmx_msrs_test
 TEST_GEN_PROGS_x86 += x86/vmx_invalid_nested_guest_state
 TEST_GEN_PROGS_x86 += x86/vmx_nested_la57_state_test
-TEST_GEN_PROGS_x86 += x86/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86 += x86/apic_bus_clock_test
 TEST_GEN_PROGS_x86 += x86/xapic_ipi_test
 TEST_GEN_PROGS_x86 += x86/xapic_state_test
+1 -2
tools/testing/selftests/kvm/include/x86/svm.h
···
 	u32 int_vector;
 	u32 int_state;
 	u8 reserved_3[4];
-	u32 exit_code;
-	u32 exit_code_hi;
+	u64 exit_code;
 	u64 exit_info_1;
 	u64 exit_info_2;
 	u32 exit_int_info;
+2 -2
tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
···

 	run_guest(vmcb, svm->vmcb_gpa);
 	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL,
-		       "Expected VMMCAL #VMEXIT, got '0x%x', info1 = '0x%lx, info2 = '0x%lx'",
+		       "Expected VMMCAL #VMEXIT, got '0x%lx', info1 = '0x%lx, info2 = '0x%lx'",
 		       vmcb->control.exit_code,
 		       vmcb->control.exit_info_1, vmcb->control.exit_info_2);

···

 	run_guest(vmcb, svm->vmcb_gpa);
 	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_HLT,
-		       "Expected HLT #VMEXIT, got '0x%x', info1 = '0x%lx, info2 = '0x%lx'",
+		       "Expected HLT #VMEXIT, got '0x%lx', info1 = '0x%lx, info2 = '0x%lx'",
 		       vmcb->control.exit_code,
 		       vmcb->control.exit_info_1, vmcb->control.exit_info_2);

+115 -13
tools/testing/selftests/kvm/x86/vmx_set_nested_state_test.c → tools/testing/selftests/kvm/x86/nested_set_state_test.c
···
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * vmx_set_nested_state_test
- *
  * Copyright (C) 2019, Google LLC.
  *
  * This test verifies the integrity of calling the ioctl KVM_SET_NESTED_STATE.
···
 #include "kvm_util.h"
 #include "processor.h"
 #include "vmx.h"
+#include "svm_util.h"

 #include <errno.h>
 #include <linux/kvm.h>
···
 	TEST_ASSERT(state->size >= sizeof(*state) && state->size <= state_sz,
 		    "Size must be between %ld and %d. The size returned was %d.",
 		    sizeof(*state), state_sz, state->size);
-	TEST_ASSERT(state->hdr.vmx.vmxon_pa == -1ull, "vmxon_pa must be -1ull.");
-	TEST_ASSERT(state->hdr.vmx.vmcs12_pa == -1ull, "vmcs_pa must be -1ull.");
+
+	TEST_ASSERT_EQ(state->hdr.vmx.vmxon_pa, -1ull);
+	TEST_ASSERT_EQ(state->hdr.vmx.vmcs12_pa, -1ull);
+	TEST_ASSERT_EQ(state->flags, 0);
+
+	free(state);
+}
+
+static void vcpu_efer_enable_svm(struct kvm_vcpu *vcpu)
+{
+	uint64_t old_efer = vcpu_get_msr(vcpu, MSR_EFER);
+
+	vcpu_set_msr(vcpu, MSR_EFER, old_efer | EFER_SVME);
+}
+
+static void vcpu_efer_disable_svm(struct kvm_vcpu *vcpu)
+{
+	uint64_t old_efer = vcpu_get_msr(vcpu, MSR_EFER);
+
+	vcpu_set_msr(vcpu, MSR_EFER, old_efer & ~EFER_SVME);
+}
+
+void set_default_svm_state(struct kvm_nested_state *state, int size)
+{
+	memset(state, 0, size);
+	state->format = 1;
+	state->size = size;
+	state->hdr.svm.vmcb_pa = 0x3000;
+}
+
+void test_svm_nested_state(struct kvm_vcpu *vcpu)
+{
+	/* Add a page for VMCB. */
+	const int state_sz = sizeof(struct kvm_nested_state) + getpagesize();
+	struct kvm_nested_state *state =
+		(struct kvm_nested_state *)malloc(state_sz);
+
+	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_SVM);
+
+	/* The format must be set to 1. 0 for VMX, 1 for SVM. */
+	set_default_svm_state(state, state_sz);
+	state->format = 0;
+	test_nested_state_expect_einval(vcpu, state);
+
+	/* Invalid flags are rejected, KVM_STATE_NESTED_EVMCS is VMX-only */
+	set_default_svm_state(state, state_sz);
+	state->flags = KVM_STATE_NESTED_EVMCS;
+	test_nested_state_expect_einval(vcpu, state);
+
+	/*
+	 * If EFER.SVME is clear, guest mode is disallowed and GIF can be set or
+	 * cleared.
+	 */
+	vcpu_efer_disable_svm(vcpu);
+
+	set_default_svm_state(state, state_sz);
+	state->flags = KVM_STATE_NESTED_GUEST_MODE;
+	test_nested_state_expect_einval(vcpu, state);
+
+	state->flags = 0;
+	test_nested_state(vcpu, state);
+
+	state->flags = KVM_STATE_NESTED_GIF_SET;
+	test_nested_state(vcpu, state);
+
+	/* Enable SVM in the guest EFER. */
+	vcpu_efer_enable_svm(vcpu);
+
+	/* Setting vmcb_pa to a non-aligned address is only fine when not entering guest mode */
+	set_default_svm_state(state, state_sz);
+	state->hdr.svm.vmcb_pa = -1ull;
+	state->flags = 0;
+	test_nested_state(vcpu, state);
+	state->flags = KVM_STATE_NESTED_GUEST_MODE;
+	test_nested_state_expect_einval(vcpu, state);
+
+	/*
+	 * Size must be large enough to fit kvm_nested_state and VMCB
+	 * only when entering guest mode.
+	 */
+	set_default_svm_state(state, state_sz/2);
+	state->flags = 0;
+	test_nested_state(vcpu, state);
+	state->flags = KVM_STATE_NESTED_GUEST_MODE;
+	test_nested_state_expect_einval(vcpu, state);
+
+	/*
+	 * Test that if we leave nesting the state reflects that when we get it
+	 * again, except for vmcb_pa, which is always returned as 0 when not in
+	 * guest mode.
+	 */
+	set_default_svm_state(state, state_sz);
+	state->hdr.svm.vmcb_pa = -1ull;
+	state->flags = KVM_STATE_NESTED_GIF_SET;
+	test_nested_state(vcpu, state);
+	vcpu_nested_state_get(vcpu, state);
+	TEST_ASSERT(state->size >= sizeof(*state) && state->size <= state_sz,
+		    "Size must be between %ld and %d. The size returned was %d.",
+		    sizeof(*state), state_sz, state->size);
+
+	TEST_ASSERT_EQ(state->hdr.svm.vmcb_pa, 0);
+	TEST_ASSERT_EQ(state->flags, KVM_STATE_NESTED_GIF_SET);

 	free(state);
 }
···

 	have_evmcs = kvm_check_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS);

+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) ||
+		     kvm_cpu_has(X86_FEATURE_SVM));
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
-
-	/*
-	 * AMD currently does not implement set_nested_state, so for now we
-	 * just early out.
-	 */
-	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));

 	vm = vm_create_with_one_vcpu(&vcpu, NULL);

 	/*
-	 * First run tests with VMX disabled to check error handling.
+	 * First run tests with VMX/SVM disabled to check error handling.
+	 * test_{vmx/svm}_nested_state() will re-enable as needed.
 	 */
-	vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_VMX);
+	if (kvm_cpu_has(X86_FEATURE_VMX))
+		vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_VMX);
+	else
+		vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_SVM);

 	/* Passing a NULL kvm_nested_state causes a EFAULT. */
 	test_nested_state_expect_efault(vcpu, NULL);
···
 	state.flags = KVM_STATE_NESTED_RUN_PENDING;
 	test_nested_state_expect_einval(vcpu, &state);

-	test_vmx_nested_state(vcpu);
+	if (kvm_cpu_has(X86_FEATURE_VMX))
+		test_vmx_nested_state(vcpu);
+	else
+		test_svm_nested_state(vcpu);

 	kvm_vm_free(vm);
 	return 0;
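The expectations encoded by test_svm_nested_state() can be summarised as a small validation function. The sketch below is a model of what the selftest expects, not KVM's actual KVM_SET_NESTED_STATE implementation. `struct model_state`, `check_svm_nested_state()` and the `hdr_size` parameter are hypothetical, and the flag values are assumed to match the uapi definitions of KVM_STATE_NESTED_GUEST_MODE, KVM_STATE_NESTED_EVMCS and KVM_STATE_NESTED_GIF_SET.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the uapi flag values; treat them as illustrative. */
#define FLAG_GUEST_MODE 0x001u
#define FLAG_EVMCS      0x004u
#define FLAG_GIF_SET    0x100u

#define FORMAT_SVM      1u
#define MODEL_PAGE_SIZE 4096u

/* Reduced stand-in for struct kvm_nested_state (header fields only). */
struct model_state {
	uint16_t flags;
	uint16_t format;
	uint32_t size;
	uint64_t vmcb_pa;
};

/* 'hdr_size' stands in for sizeof(struct kvm_nested_state). */
static int check_svm_nested_state(const struct model_state *s, bool efer_svme,
				  uint32_t hdr_size)
{
	if (s->format != FORMAT_SVM)
		return -EINVAL;

	/* KVM_STATE_NESTED_EVMCS is VMX-only. */
	if (s->flags & FLAG_EVMCS)
		return -EINVAL;

	/* Outside guest mode, vmcb_pa and size are not examined; GIF is free. */
	if (!(s->flags & FLAG_GUEST_MODE))
		return 0;

	/* Entering guest mode requires EFER.SVME. */
	if (!efer_svme)
		return -EINVAL;

	/* ...a page-aligned VMCB address... */
	if (s->vmcb_pa & (MODEL_PAGE_SIZE - 1))
		return -EINVAL;

	/* ...and room for the header plus one page of VMCB data. */
	if (s->size < hdr_size + MODEL_PAGE_SIZE)
		return -EINVAL;

	return 0;
}
```

The GIF case is the one this series changes: with EFER.SVME clear, the selftest expects both flags == 0 and flags == KVM_STATE_NESTED_GIF_SET to be accepted.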