Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Arm:

- Fix trapping regression when no in-kernel irqchip is present

- Check host-provided, untrusted ranges and offsets in pKVM

- Fix regression restoring the ID_PFR1_EL1 register

- Fix vgic ITS locking issues when LPIs are not directly injected

Arm selftests:

- Correct target CPU programming in vgic_lpi_stress selftest

- Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest

RISC-V:

- Fix check for local interrupts on riscv32

- Read HGEIP CSR on the correct cpu when checking for IMSIC
interrupts

- Remove automatic I/O mapping from kvm_arch_prepare_memory_region()

x86:

- Inject #UD if the guest attempts to execute SEAMCALL or TDCALL as
KVM doesn't support virtualizing the instructions, but the
instructions are gated only by VMXON. That is, they will VM-Exit
instead of taking a #UD and until now this resulted in KVM exiting
to userspace with an emulation error.

- Unload the "FPU" when emulating INIT of XSTATE features if and only
if the FPU is actually loaded, instead of trying to predict when
KVM will emulate an INIT (CET support missed the MP_STATE path).
Add sanity checks to detect and harden against similar bugs in the
future.

- Unregister KVM's GALog notifier (for AVIC) when kvm-amd.ko is
unloaded.

- Use a raw spinlock for svm->ir_list_lock as the lock is taken
during schedule(), and "normal" spinlocks are sleepable locks when
PREEMPT_RT=y.

- Remove guest_memfd bindings on memslot deletion when a gmem file is
dying to fix a use-after-free race found by syzkaller.

- Fix a goof in the EPT Violation handler where KVM checks the wrong
variable when determining if the reported GVA is valid.

- Fix and simplify the handling of LBR virtualization on AMD, which
was made buggy and unnecessarily complicated by nested VM support

Misc:

- Update Oliver's email address"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: nSVM: Fix and simplify LBR virtualization handling with nested
KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
MAINTAINERS: Switch myself to using kernel.org address
KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip
KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
KVM: arm64: Make all 32bit ID registers fully writable
KVM: VMX: Fix check for valid GVA on an EPT violation
KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying
KVM: SVM: switch to raw spinlock for svm->ir_list_lock
KVM: SVM: Make avic_ga_log_notifier() local to avic.c
KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES
KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
KVM: arm64: Check the untrusted offset in FF-A memory share
KVM: arm64: Check range args for pKVM mem transitions
...

+297 -197
+2 -1
.mailmap
···
 Oleksij Rempel <o.rempel@pengutronix.de> <ore@pengutronix.de>
 Oliver Hartkopp <socketcan@hartkopp.net> <oliver.hartkopp@volkswagen.de>
 Oliver Hartkopp <socketcan@hartkopp.net> <oliver@hartkopp.net>
-Oliver Upton <oliver.upton@linux.dev> <oupton@google.com>
+Oliver Upton <oupton@kernel.org> <oupton@google.com>
+Oliver Upton <oupton@kernel.org> <oliver.upton@linux.dev>
 Ondřej Jirman <megi@xff.cz> <megous@megous.com>
 Oza Pawandeep <quic_poza@quicinc.com> <poza@codeaurora.org>
 Pali Rohár <pali@kernel.org> <pali.rohar@gmail.com>
+1 -1
MAINTAINERS
···

 KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
 M:	Marc Zyngier <maz@kernel.org>
-M:	Oliver Upton <oliver.upton@linux.dev>
+M:	Oliver Upton <oupton@kernel.org>
 R:	Joey Gouly <joey.gouly@arm.com>
 R:	Suzuki K Poulose <suzuki.poulose@arm.com>
 R:	Zenghui Yu <yuzenghui@huawei.com>
+7 -2
arch/arm64/kvm/hyp/nvhe/ffa.c
···
 	struct ffa_mem_region_attributes *ep_mem_access;
 	struct ffa_composite_mem_region *reg;
 	struct ffa_mem_region *buf;
-	u32 offset, nr_ranges;
+	u32 offset, nr_ranges, checked_offset;
 	int ret = 0;

 	if (addr_mbz || npages_mbz || fraglen > len ||
···
 		goto out_unlock;
 	}

-	if (fraglen < offset + sizeof(struct ffa_composite_mem_region)) {
+	if (check_add_overflow(offset, sizeof(struct ffa_composite_mem_region), &checked_offset)) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	if (fraglen < checked_offset) {
 		ret = FFA_RET_INVALID_PARAMETERS;
 		goto out_unlock;
 	}
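The ffa.c hunk above swaps an open-coded `offset + sizeof(...)` for an overflow-checked addition. A minimal user-space sketch of the same pattern follows, assuming a GCC or Clang compiler: the kernel's `check_add_overflow()` is modelled here with the `__builtin_add_overflow` builtin, and `fragment_covers_header` and `HDR_SIZE` are hypothetical names for illustration, not kernel identifiers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct ffa_composite_mem_region). */
#define HDR_SIZE ((size_t)16)

/*
 * Return true only if a header of HDR_SIZE bytes starting at 'offset'
 * fits inside a fragment of 'fraglen' bytes. 'offset' is untrusted.
 */
bool fragment_covers_header(uint32_t fraglen, uint32_t offset)
{
	uint32_t end;

	/* Reject offsets whose end does not fit in 32 bits. */
	if (__builtin_add_overflow(offset, HDR_SIZE, &end))
		return false;

	return fraglen >= end;
}
```

The point of the design is that the sum is validated before it is ever compared against `fraglen`, so a huge untrusted offset cannot produce a small, plausible-looking end position.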
+28
arch/arm64/kvm/hyp/nvhe/mem_protect.c
···
 	return kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
 }

+/*
+ * Ensure the PFN range is contained within PA-range.
+ *
+ * This check is also robust to overflows and is therefore a requirement before
+ * using a pfn/nr_pages pair from an untrusted source.
+ */
+static bool pfn_range_is_valid(u64 pfn, u64 nr_pages)
+{
+	u64 limit = BIT(kvm_phys_shift(&host_mmu.arch.mmu) - PAGE_SHIFT);
+
+	return pfn < limit && ((limit - pfn) >= nr_pages);
+}
+
 struct kvm_mem_range {
 	u64 start;
 	u64 end;
···
 	void *virt = __hyp_va(phys);
 	int ret;

+	if (!pfn_range_is_valid(pfn, nr_pages))
+		return -EINVAL;
+
 	host_lock_component();
 	hyp_lock_component();

···
 	u64 size = PAGE_SIZE * nr_pages;
 	u64 virt = (u64)__hyp_va(phys);
 	int ret;
+
+	if (!pfn_range_is_valid(pfn, nr_pages))
+		return -EINVAL;

 	host_lock_component();
 	hyp_lock_component();
···
 	u64 size = PAGE_SIZE * nr_pages;
 	int ret;

+	if (!pfn_range_is_valid(pfn, nr_pages))
+		return -EINVAL;
+
 	host_lock_component();
 	ret = __host_check_page_state_range(phys, size, PKVM_PAGE_OWNED);
 	if (!ret)
···
 	u64 phys = hyp_pfn_to_phys(pfn);
 	u64 size = PAGE_SIZE * nr_pages;
 	int ret;
+
+	if (!pfn_range_is_valid(pfn, nr_pages))
+		return -EINVAL;

 	host_lock_component();
 	ret = __host_check_page_state_range(phys, size, PKVM_PAGE_SHARED_OWNED);
···
 	int ret;

 	if (prot & ~KVM_PGTABLE_PROT_RWX)
+		return -EINVAL;
+
+	if (!pfn_range_is_valid(pfn, nr_pages))
 		return -EINVAL;

 	ret = __guest_check_transition_size(phys, ipa, nr_pages, &size);
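The comment on `pfn_range_is_valid()` says the check is robust to overflows. A small stand-alone sketch shows why the subtraction form is safe where the obvious addition form is not. The explicit `limit` parameter and both function names are assumptions made for illustration; the kernel derives the limit from the host's physical address size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Overflow-safe containment check: is [pfn, pfn + nr_pages) inside
 * [0, limit)? 'limit' is a hypothetical parameter for this sketch.
 */
bool pfn_range_fits(uint64_t pfn, uint64_t nr_pages, uint64_t limit)
{
	/* limit - pfn cannot underflow because pfn < limit is tested first. */
	return pfn < limit && (limit - pfn) >= nr_pages;
}

/* Naive variant, shown only for contrast: pfn + nr_pages can wrap. */
bool pfn_range_fits_naive(uint64_t pfn, uint64_t nr_pages, uint64_t limit)
{
	return pfn + nr_pages <= limit;
}
```

With `pfn = 16` and `nr_pages = UINT64_MAX - 8`, the naive sum wraps around to 7 and passes any reasonable limit, while the subtraction form rejects the pair.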
+38 -33
arch/arm64/kvm/sys_regs.c
···
 	.val = 0, \
 }

-/* sys_reg_desc initialiser for known cpufeature ID registers */
-#define AA32_ID_SANITISED(name) { \
-	ID_DESC(name), \
-	.visibility = aa32_id_visibility, \
-	.val = 0, \
-}
-
 /* sys_reg_desc initialiser for writable ID registers */
 #define ID_WRITABLE(name, mask) { \
 	ID_DESC(name), \
 	.val = mask, \
+}
+
+/*
+ * 32bit ID regs are fully writable when the guest is 32bit
+ * capable. Nothing in the KVM code should rely on 32bit features
+ * anyway, only 64bit, so let the VMM do its worse.
+ */
+#define AA32_ID_WRITABLE(name) { \
+	ID_DESC(name), \
+	.visibility = aa32_id_visibility, \
+	.val = GENMASK(31, 0), \
 }

 /* sys_reg_desc initialiser for cpufeature ID registers that need filtering */
···

 	/* AArch64 mappings of the AArch32 ID registers */
 	/* CRm=1 */
-	AA32_ID_SANITISED(ID_PFR0_EL1),
-	AA32_ID_SANITISED(ID_PFR1_EL1),
+	AA32_ID_WRITABLE(ID_PFR0_EL1),
+	AA32_ID_WRITABLE(ID_PFR1_EL1),
 	{ SYS_DESC(SYS_ID_DFR0_EL1),
 	  .access = access_id_reg,
 	  .get_user = get_id_reg,
 	  .set_user = set_id_dfr0_el1,
 	  .visibility = aa32_id_visibility,
 	  .reset = read_sanitised_id_dfr0_el1,
-	  .val = ID_DFR0_EL1_PerfMon_MASK |
-		 ID_DFR0_EL1_CopDbg_MASK, },
+	  .val = GENMASK(31, 0) },
 	ID_HIDDEN(ID_AFR0_EL1),
-	AA32_ID_SANITISED(ID_MMFR0_EL1),
-	AA32_ID_SANITISED(ID_MMFR1_EL1),
-	AA32_ID_SANITISED(ID_MMFR2_EL1),
-	AA32_ID_SANITISED(ID_MMFR3_EL1),
+	AA32_ID_WRITABLE(ID_MMFR0_EL1),
+	AA32_ID_WRITABLE(ID_MMFR1_EL1),
+	AA32_ID_WRITABLE(ID_MMFR2_EL1),
+	AA32_ID_WRITABLE(ID_MMFR3_EL1),

 	/* CRm=2 */
-	AA32_ID_SANITISED(ID_ISAR0_EL1),
-	AA32_ID_SANITISED(ID_ISAR1_EL1),
-	AA32_ID_SANITISED(ID_ISAR2_EL1),
-	AA32_ID_SANITISED(ID_ISAR3_EL1),
-	AA32_ID_SANITISED(ID_ISAR4_EL1),
-	AA32_ID_SANITISED(ID_ISAR5_EL1),
-	AA32_ID_SANITISED(ID_MMFR4_EL1),
-	AA32_ID_SANITISED(ID_ISAR6_EL1),
+	AA32_ID_WRITABLE(ID_ISAR0_EL1),
+	AA32_ID_WRITABLE(ID_ISAR1_EL1),
+	AA32_ID_WRITABLE(ID_ISAR2_EL1),
+	AA32_ID_WRITABLE(ID_ISAR3_EL1),
+	AA32_ID_WRITABLE(ID_ISAR4_EL1),
+	AA32_ID_WRITABLE(ID_ISAR5_EL1),
+	AA32_ID_WRITABLE(ID_MMFR4_EL1),
+	AA32_ID_WRITABLE(ID_ISAR6_EL1),

 	/* CRm=3 */
-	AA32_ID_SANITISED(MVFR0_EL1),
-	AA32_ID_SANITISED(MVFR1_EL1),
-	AA32_ID_SANITISED(MVFR2_EL1),
+	AA32_ID_WRITABLE(MVFR0_EL1),
+	AA32_ID_WRITABLE(MVFR1_EL1),
+	AA32_ID_WRITABLE(MVFR2_EL1),
 	ID_UNALLOCATED(3,3),
-	AA32_ID_SANITISED(ID_PFR2_EL1),
+	AA32_ID_WRITABLE(ID_PFR2_EL1),
 	ID_HIDDEN(ID_DFR1_EL1),
-	AA32_ID_SANITISED(ID_MMFR5_EL1),
+	AA32_ID_WRITABLE(ID_MMFR5_EL1),
 	ID_UNALLOCATED(3,7),

 	/* AArch64 ID registers */
···

 	guard(mutex)(&kvm->arch.config_lock);

-	if (!(static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif) &&
-	      irqchip_in_kernel(kvm) &&
-	      kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)) {
-		kvm->arch.id_regs[IDREG_IDX(SYS_ID_AA64PFR0_EL1)] &= ~ID_AA64PFR0_EL1_GIC_MASK;
-		kvm->arch.id_regs[IDREG_IDX(SYS_ID_PFR1_EL1)] &= ~ID_PFR1_EL1_GIC_MASK;
+	if (!irqchip_in_kernel(kvm)) {
+		u64 val;
+
+		val = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
+		kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, val);
+		val = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
+		kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, val);
 	}

 	if (vcpu_has_nv(vcpu)) {
+12 -4
arch/arm64/kvm/vgic/vgic-debug.c
···
 static int iter_mark_lpis(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
+	unsigned long intid, flags;
 	struct vgic_irq *irq;
-	unsigned long intid;
 	int nr_lpis = 0;
+
+	xa_lock_irqsave(&dist->lpi_xa, flags);

 	xa_for_each(&dist->lpi_xa, intid, irq) {
 		if (!vgic_try_get_irq_ref(irq))
 			continue;

-		xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
+		__xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
 		nr_lpis++;
 	}
+
+	xa_unlock_irqrestore(&dist->lpi_xa, flags);

 	return nr_lpis;
 }
···
 static void iter_unmark_lpis(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
+	unsigned long intid, flags;
 	struct vgic_irq *irq;
-	unsigned long intid;

 	xa_for_each_marked(&dist->lpi_xa, intid, irq, LPI_XA_MARK_DEBUG_ITER) {
-		xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
+		xa_lock_irqsave(&dist->lpi_xa, flags);
+		__xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
+		xa_unlock_irqrestore(&dist->lpi_xa, flags);
+
+		/* vgic_put_irq() expects to be called outside of the xa_lock */
 		vgic_put_irq(kvm, irq);
 	}
 }
+13 -3
arch/arm64/kvm/vgic/vgic-init.c
···
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;

-	xa_init(&dist->lpi_xa);
+	xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
 }

 /* CREATION */
···
 int kvm_vgic_create(struct kvm *kvm, u32 type)
 {
 	struct kvm_vcpu *vcpu;
+	u64 aa64pfr0, pfr1;
 	unsigned long i;
 	int ret;

···

 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;

-	if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
+	aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
+	pfr1 = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
+
+	if (type == KVM_DEV_TYPE_ARM_VGIC_V2) {
 		kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
-	else
+	} else {
 		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
+		aa64pfr0 |= SYS_FIELD_PREP_ENUM(ID_AA64PFR0_EL1, GIC, IMP);
+		pfr1 |= SYS_FIELD_PREP_ENUM(ID_PFR1_EL1, GIC, GICv3);
+	}
+
+	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
+	kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);

 	if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
 		kvm->arch.vgic.nassgicap = system_supports_direct_sgis();
+8 -10
arch/arm64/kvm/vgic/vgic-its.c
···
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct vgic_irq *irq = vgic_get_irq(kvm, intid), *oldirq;
+	unsigned long flags;
 	int ret;

 	/* In this case there is no put, since we keep the reference. */
···
 	if (!irq)
 		return ERR_PTR(-ENOMEM);

-	ret = xa_reserve(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
+	ret = xa_reserve_irq(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		kfree(irq);
 		return ERR_PTR(ret);
···
 	irq->target_vcpu = vcpu;
 	irq->group = 1;

-	xa_lock(&dist->lpi_xa);
+	xa_lock_irqsave(&dist->lpi_xa, flags);

 	/*
 	 * There could be a race with another vgic_add_lpi(), so we need to
···
 		/* Someone was faster with adding this LPI, lets use that. */
 		kfree(irq);
 		irq = oldirq;
-
-		goto out_unlock;
+	} else {
+		ret = xa_err(__xa_store(&dist->lpi_xa, intid, irq, 0));
 	}

-	ret = xa_err(__xa_store(&dist->lpi_xa, intid, irq, 0));
+	xa_unlock_irqrestore(&dist->lpi_xa, flags);
+
 	if (ret) {
 		xa_release(&dist->lpi_xa, intid);
 		kfree(irq);
-	}

-out_unlock:
-	xa_unlock(&dist->lpi_xa);
-
-	if (ret)
 		return ERR_PTR(ret);
+	}

 	/*
 	 * We "cache" the configuration table entries in our struct vgic_irq's.
+2 -1
arch/arm64/kvm/vgic/vgic-v3.c
···
 		return;

 	/* Hide GICv3 sysreg if necessary */
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2) {
+	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2 ||
+	    !irqchip_in_kernel(vcpu->kvm)) {
 		vgic_v3->vgic_hcr |= (ICH_HCR_EL2_TALL0 | ICH_HCR_EL2_TALL1 |
 				      ICH_HCR_EL2_TC);
 		return;
+15 -8
arch/arm64/kvm/vgic/vgic.c
···
  * kvm->arch.config_lock (mutex)
  * its->cmd_lock (mutex)
  * its->its_lock (mutex)
- * vgic_dist->lpi_xa.xa_lock
+ * vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
  * vgic_cpu->ap_list_lock must be taken with IRQs disabled
  * vgic_irq->irq_lock must be taken with IRQs disabled
  *
···
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
+	unsigned long flags;

-	if (irq->intid >= VGIC_MIN_LPI)
-		might_lock(&dist->lpi_xa.xa_lock);
+	/*
+	 * Normally the lock is only taken when the refcount drops to 0.
+	 * Acquire/release it early on lockdep kernels to make locking issues
+	 * in rare release paths a bit more obvious.
+	 */
+	if (IS_ENABLED(CONFIG_LOCKDEP) && irq->intid >= VGIC_MIN_LPI) {
+		guard(spinlock_irqsave)(&dist->lpi_xa.xa_lock);
+	}

 	if (!__vgic_put_irq(kvm, irq))
 		return;

-	xa_lock(&dist->lpi_xa);
+	xa_lock_irqsave(&dist->lpi_xa, flags);
 	vgic_release_lpi_locked(dist, irq);
-	xa_unlock(&dist->lpi_xa);
+	xa_unlock_irqrestore(&dist->lpi_xa, flags);
 }

 static void vgic_release_deleted_lpis(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
-	unsigned long intid;
+	unsigned long flags, intid;
 	struct vgic_irq *irq;

-	xa_lock(&dist->lpi_xa);
+	xa_lock_irqsave(&dist->lpi_xa, flags);

 	xa_for_each(&dist->lpi_xa, intid, irq) {
 		if (irq->pending_release)
 			vgic_release_lpi_locked(dist, irq);
 	}

-	xa_unlock(&dist->lpi_xa);
+	xa_unlock_irqrestore(&dist->lpi_xa, flags);
 }

 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
+14 -2
arch/riscv/kvm/aia_imsic.c
···
 	 */

 	read_lock_irqsave(&imsic->vsfile_lock, flags);
-	if (imsic->vsfile_cpu > -1)
-		ret = !!(csr_read(CSR_HGEIP) & BIT(imsic->vsfile_hgei));
+	if (imsic->vsfile_cpu > -1) {
+		/*
+		 * This function is typically called from kvm_vcpu_block() via
+		 * kvm_arch_vcpu_runnable() upon WFI trap. The kvm_vcpu_block()
+		 * can be preempted and the blocking VCPU might resume on a
+		 * different CPU. This means it is possible that current CPU
+		 * does not match the imsic->vsfile_cpu hence this function
+		 * must check imsic->vsfile_cpu before accessing HGEIP CSR.
+		 */
+		if (imsic->vsfile_cpu != vcpu->cpu)
+			ret = true;
+		else
+			ret = !!(csr_read(CSR_HGEIP) & BIT(imsic->vsfile_hgei));
+	}
 	read_unlock_irqrestore(&imsic->vsfile_lock, flags);

 	return ret;
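The decision made in the aia_imsic.c hunk above can be restated as a pure function. This is only a sketch: the function name and parameters are hypothetical, the CSR read is replaced by a plain `hgeip` argument, and the `false` default for the no-VS-file case is an assumption, since the initial value of `ret` is not visible in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * vsfile_cpu:  CPU that owns the guest interrupt file, or -1 if none
 * current_cpu: CPU this code is running on
 * hgeip:       value of the HGEIP CSR as read on current_cpu
 * hgei:        bit index of the guest external interrupt line (< 64)
 */
bool imsic_has_interrupt(int vsfile_cpu, int current_cpu,
			 uint64_t hgeip, unsigned int hgei)
{
	if (vsfile_cpu < 0)
		return false;	/* assumed default: nothing to report */

	/*
	 * HGEIP is per-CPU state. If we are not on the owning CPU the
	 * local value says nothing about this vCPU, so answer "maybe
	 * pending" rather than trusting the wrong CSR.
	 */
	if (vsfile_cpu != current_cpu)
		return true;

	return (hgeip >> hgei) & 1;
}
```

Reporting `true` on a CPU mismatch errs toward a spurious wakeup, which is harmless, instead of a missed interrupt.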
+2 -23
arch/riscv/kvm/mmu.c
···
 				   enum kvm_mr_change change)
 {
 	hva_t hva, reg_end, size;
-	gpa_t base_gpa;
 	bool writable;
 	int ret = 0;

···
 	hva = new->userspace_addr;
 	size = new->npages << PAGE_SHIFT;
 	reg_end = hva + size;
-	base_gpa = new->base_gfn << PAGE_SHIFT;
 	writable = !(new->flags & KVM_MEM_READONLY);

 	mmap_read_lock(current->mm);

 	/*
 	 * A memory region could potentially cover multiple VMAs, and
-	 * any holes between them, so iterate over all of them to find
-	 * out if we can map any of them right now.
+	 * any holes between them, so iterate over all of them.
 	 *
 	 * +--------------------------------------------+
 	 * +---------------+----------------+ +----------------+
···
 	 */
 	do {
 		struct vm_area_struct *vma;
-		hva_t vm_start, vm_end;
+		hva_t vm_end;

 		vma = find_vma_intersection(current->mm, hva, reg_end);
 		if (!vma)
···
 		}

 		/* Take the intersection of this VMA with the memory region */
-		vm_start = max(hva, vma->vm_start);
 		vm_end = min(reg_end, vma->vm_end);

 		if (vma->vm_flags & VM_PFNMAP) {
-			gpa_t gpa = base_gpa + (vm_start - hva);
-			phys_addr_t pa;
-
-			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
-			pa += vm_start - vma->vm_start;
-
 			/* IO region dirty page logging not allowed */
 			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
 				ret = -EINVAL;
 				goto out;
 			}
-
-			ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
-						    writable, false);
-			if (ret)
-				break;
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
-
-	if (change == KVM_MR_FLAGS_ONLY)
-		goto out;
-
-	if (ret)
-		kvm_riscv_mmu_iounmap(kvm, base_gpa, size);

 out:
 	mmap_read_unlock(current->mm);
+1 -1
arch/riscv/kvm/vcpu.c
···

 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-	return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) &&
+	return (kvm_riscv_vcpu_has_interrupts(vcpu, -1ULL) &&
 		!kvm_riscv_vcpu_stopped(vcpu) && !vcpu->arch.pause);
 }

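The one-character change from `-1UL` to `-1ULL` matters on riscv32, where `unsigned long` is 32 bits wide. The sketch below models that with `uint32_t` so it behaves the same on any host; it assumes the mask parameter of `kvm_riscv_vcpu_has_interrupts()` is 64 bits wide, which the hunk itself does not show. Both function names are hypothetical.

```c
#include <stdint.h>

/* Model a 32-bit 'unsigned long' (as on riscv32) with uint32_t. */
uint64_t mask_from_32bit_ulong(void)
{
	uint32_t all_ones = (uint32_t)-1;	/* what -1UL is on a 32-bit target */

	return all_ones;			/* zero-extended: upper 32 bits are clear */
}

uint64_t mask_from_ull(void)
{
	return -1ULL;				/* all 64 bits set on every target */
}
```

Under that assumption, a mask built from `-1UL` on a 32-bit target silently ignores any interrupt numbered 32 or above.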
+1
arch/x86/include/uapi/asm/vmx.h
···
 #define EXIT_REASON_TPAUSE              68
 #define EXIT_REASON_BUS_LOCK            74
 #define EXIT_REASON_NOTIFY              75
+#define EXIT_REASON_SEAMCALL            76
 #define EXIT_REASON_TDCALL              77
 #define EXIT_REASON_MSR_READ_IMM        84
 #define EXIT_REASON_MSR_WRITE_IMM       85
+15 -9
arch/x86/kvm/svm/avic.c
···
  * This function is called from IOMMU driver to notify
  * SVM to schedule in a particular vCPU of a particular VM.
  */
-int avic_ga_log_notifier(u32 ga_tag)
+static int avic_ga_log_notifier(u32 ga_tag)
 {
 	unsigned long flags;
 	struct kvm_svm *kvm_svm;
···
 	struct kvm_vcpu *vcpu = &svm->vcpu;

 	INIT_LIST_HEAD(&svm->ir_list);
-	spin_lock_init(&svm->ir_list_lock);
+	raw_spin_lock_init(&svm->ir_list_lock);

 	if (!enable_apicv || !irqchip_in_kernel(vcpu->kvm))
 		return 0;
···
 	if (!vcpu)
 		return;

-	spin_lock_irqsave(&to_svm(vcpu)->ir_list_lock, flags);
+	raw_spin_lock_irqsave(&to_svm(vcpu)->ir_list_lock, flags);
 	list_del(&irqfd->vcpu_list);
-	spin_unlock_irqrestore(&to_svm(vcpu)->ir_list_lock, flags);
+	raw_spin_unlock_irqrestore(&to_svm(vcpu)->ir_list_lock, flags);
 }

 int avic_pi_update_irte(struct kvm_kernel_irqfd *irqfd, struct kvm *kvm,
···
 		 * list of IRQs being posted to the vCPU, to ensure the IRTE
 		 * isn't programmed with stale pCPU/IsRunning information.
 		 */
-		guard(spinlock_irqsave)(&svm->ir_list_lock);
+		guard(raw_spinlock_irqsave)(&svm->ir_list_lock);

 		/*
 		 * Update the target pCPU for IOMMU doorbells if the vCPU is
···
 	 * up-to-date entry information, or that this task will wait until
 	 * svm_ir_list_add() completes to set the new target pCPU.
 	 */
-	spin_lock_irqsave(&svm->ir_list_lock, flags);
+	raw_spin_lock_irqsave(&svm->ir_list_lock, flags);

 	entry = svm->avic_physical_id_entry;
 	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
···

 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, action);

-	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+	raw_spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 }

 void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
···
 	 * or that this task will wait until svm_ir_list_add() completes to
 	 * mark the vCPU as not running.
 	 */
-	spin_lock_irqsave(&svm->ir_list_lock, flags);
+	raw_spin_lock_irqsave(&svm->ir_list_lock, flags);

 	avic_update_iommu_vcpu_affinity(vcpu, -1, action);

···

 	svm->avic_physical_id_entry = entry;

-	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+	raw_spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 }

 void avic_vcpu_put(struct kvm_vcpu *vcpu)
···
 	amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);

 	return true;
+}
+
+void avic_hardware_unsetup(void)
+{
+	if (avic)
+		amd_iommu_register_ga_log_notifier(NULL);
 }
+7 -13
arch/x86/kvm/svm/nested.c
···
 		 */
 		svm_copy_lbrs(vmcb02, vmcb12);
 		vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
-		svm_update_lbrv(&svm->vcpu);
-
-	} else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+	} else {
 		svm_copy_lbrs(vmcb02, vmcb01);
 	}
+	svm_update_lbrv(&svm->vcpu);
 }

 static inline bool is_evtinj_soft(u32 evtinj)
···
 		svm->soft_int_next_rip = vmcb12_rip;
 	}

-	vmcb02->control.virt_ext = vmcb01->control.virt_ext &
-				   LBR_CTL_ENABLE_MASK;
-	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
-		vmcb02->control.virt_ext |=
-			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
+	/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */

 	if (!nested_vmcb_needs_vls_intercept(svm))
 		vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
···
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);

 	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
-		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK)))
 		svm_copy_lbrs(vmcb12, vmcb02);
-		svm_update_lbrv(vcpu);
-	} else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+	else
 		svm_copy_lbrs(vmcb01, vmcb02);
-		svm_update_lbrv(vcpu);
-	}
+
+	svm_update_lbrv(vcpu);

 	if (vnmi) {
 		if (vmcb02->control.int_ctl & V_NMI_BLOCKING_MASK)
+39 -49
arch/x86/kvm/svm/svm.c
···
 	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
 }

+static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
+{
+	to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+}
+
 void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+	__svm_enable_lbrv(vcpu);
 	svm_recalc_lbr_msr_intercepts(vcpu);
-
-	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
-	if (is_guest_mode(vcpu))
-		svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
 }

-static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
+static void __svm_disable_lbrv(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-
 	KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
-	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	svm_recalc_lbr_msr_intercepts(vcpu);
-
-	/*
-	 * Move the LBR msrs back to the vmcb01 to avoid copying them
-	 * on nested guest entries.
-	 */
-	if (is_guest_mode(vcpu))
-		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
-}
-
-static struct vmcb *svm_get_lbr_vmcb(struct vcpu_svm *svm)
-{
-	/*
-	 * If LBR virtualization is disabled, the LBR MSRs are always kept in
-	 * vmcb01. If LBR virtualization is enabled and L1 is running VMs of
-	 * its own, the MSRs are moved between vmcb01 and vmcb02 as needed.
-	 */
-	return svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK ? svm->vmcb :
-								   svm->vmcb01.ptr;
+	to_svm(vcpu)->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
 }

 void svm_update_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
-	bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
+	bool enable_lbrv = (svm->vmcb->save.dbgctl & DEBUGCTLMSR_LBR) ||
 			    (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 			    (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));

-	if (enable_lbrv == current_enable_lbrv)
-		return;
+	if (enable_lbrv && !current_enable_lbrv)
+		__svm_enable_lbrv(vcpu);
+	else if (!enable_lbrv && current_enable_lbrv)
+		__svm_disable_lbrv(vcpu);

-	if (enable_lbrv)
-		svm_enable_lbrv(vcpu);
-	else
-		svm_disable_lbrv(vcpu);
+	/*
+	 * During nested transitions, it is possible that the current VMCB has
+	 * LBR_CTL set, but the previous LBR_CTL had it cleared (or vice versa).
+	 * In this case, even though LBR_CTL does not need an update, intercepts
+	 * do, so always recalculate the intercepts here.
+	 */
+	svm_recalc_lbr_msr_intercepts(vcpu);
 }

 void disable_nmi_singlestep(struct vcpu_svm *svm)
···
 static void svm_hardware_unsetup(void)
 {
 	int cpu;
+
+	avic_hardware_unsetup();

 	sev_hardware_unsetup();

···
 		msr_info->data = svm->tsc_aux;
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
-		msr_info->data = svm_get_lbr_vmcb(svm)->save.dbgctl;
+		msr_info->data = svm->vmcb->save.dbgctl;
 		break;
 	case MSR_IA32_LASTBRANCHFROMIP:
-		msr_info->data = svm_get_lbr_vmcb(svm)->save.br_from;
+		msr_info->data = svm->vmcb->save.br_from;
 		break;
 	case MSR_IA32_LASTBRANCHTOIP:
-		msr_info->data = svm_get_lbr_vmcb(svm)->save.br_to;
+		msr_info->data = svm->vmcb->save.br_to;
 		break;
 	case MSR_IA32_LASTINTFROMIP:
-		msr_info->data = svm_get_lbr_vmcb(svm)->save.last_excp_from;
+		msr_info->data = svm->vmcb->save.last_excp_from;
 		break;
 	case MSR_IA32_LASTINTTOIP:
-		msr_info->data = svm_get_lbr_vmcb(svm)->save.last_excp_to;
+		msr_info->data = svm->vmcb->save.last_excp_to;
 		break;
 	case MSR_VM_HSAVE_PA:
 		msr_info->data = svm->nested.hsave_msr;
···
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;

-		svm_get_lbr_vmcb(svm)->save.dbgctl = data;
+		if (svm->vmcb->save.dbgctl == data)
+			break;
+
+		svm->vmcb->save.dbgctl = data;
+		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
 		svm_update_lbrv(vcpu);
 		break;
 	case MSR_VM_HSAVE_PA:
···

 	svm_hv_hardware_setup();

-	for_each_possible_cpu(cpu) {
-		r = svm_cpu_init(cpu);
-		if (r)
-			goto err;
-	}
-
 	enable_apicv = avic_hardware_setup();
 	if (!enable_apicv) {
 		enable_ipiv = false;
···
 	svm_set_cpu_caps();

 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED;
+
+	for_each_possible_cpu(cpu) {
+		r = svm_cpu_init(cpu);
+		if (r)
+			goto err;
+	}
+
 	return 0;

 err:
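The rewritten `svm_update_lbrv()` reduces to one boolean decision plus an unconditional intercept recalculation. The decision can be sketched as a pure function; the struct and function names below are hypothetical stand-ins for the vCPU state read by the kernel code, and only the logic of the `enable_lbrv` expression is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBGCTL_LBR (1ULL << 0)	/* LBR enable bit in DEBUGCTL */

/* Hypothetical snapshot of the inputs svm_update_lbrv() looks at. */
struct lbr_inputs {
	uint64_t dbgctl;	/* DEBUGCTL value in the current VMCB */
	bool in_guest_mode;	/* vCPU is running a nested (L2) guest */
	bool guest_has_lbrv;	/* LBRV is exposed to the L1 guest */
	bool l1_enabled_lbrv;	/* L1 set the LBR enable bit for L2 */
};

/* Should LBR virtualization be on for the current VMCB? */
bool want_lbrv(const struct lbr_inputs *in)
{
	return (in->dbgctl & DBGCTL_LBR) ||
	       (in->in_guest_mode && in->guest_has_lbrv && in->l1_enabled_lbrv);
}
```

In the kernel change the enable bit is flipped only when this result differs from the current setting, while the MSR intercepts are recalculated on every call because a nested transition can change them even when the bit itself stays put.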
+2 -2
arch/x86/kvm/svm/svm.h
···
 	 * back into remapped mode).
 	 */
 	struct list_head ir_list;
-	spinlock_t ir_list_lock;
+	raw_spinlock_t ir_list_lock;

 	struct vcpu_sev_es_state sev_es;

···
 )

 bool __init avic_hardware_setup(void);
-int avic_ga_log_notifier(u32 ga_tag);
+void avic_hardware_unsetup(void);
 void avic_vm_destroy(struct kvm *kvm);
 int avic_vm_init(struct kvm *kvm);
 void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb);
+1 -1
arch/x86/kvm/vmx/common.h
···
 	error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK)
 		      ? PFERR_PRESENT_MASK : 0;

-	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
+	if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
 		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
 			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;

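The one-line fix above is about which variable is tested: the flag lives in the hardware-reported exit qualification, not in the error code being assembled. A stand-alone sketch of the corrected logic follows. The macro names are hypothetical and the bit positions are illustrative assumptions, so consult the kernel headers for the real definitions.

```c
#include <stdint.h>

/* Illustrative bit positions only, not copied from kernel headers. */
#define QUAL_GVA_IS_VALID	(1ULL << 7)
#define QUAL_GVA_TRANSLATED	(1ULL << 8)
#define ERR_GUEST_FINAL		(1ULL << 32)
#define ERR_GUEST_PAGE		(1ULL << 33)

/* Derive the guest-linear-address bits of the page-fault error code. */
uint64_t gva_error_bits(uint64_t exit_qualification)
{
	uint64_t error_code = 0;

	/* Test the exit qualification, not the error code being built. */
	if (exit_qualification & QUAL_GVA_IS_VALID)
		error_code |= (exit_qualification & QUAL_GVA_TRANSLATED) ?
			      ERR_GUEST_FINAL : ERR_GUEST_PAGE;

	return error_code;
}
```

Testing the wrong variable means the branch depends on an unrelated bit of the partially built error code, so a valid guest address could go unreported.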
+8
arch/x86/kvm/vmx/nested.c
···
 	case EXIT_REASON_NOTIFY:
 		/* Notify VM exit is not exposed to L1 */
 		return false;
+	case EXIT_REASON_SEAMCALL:
+	case EXIT_REASON_TDCALL:
+		/*
+		 * SEAMCALL and TDCALL unconditionally VM-Exit, but aren't
+		 * virtualized by KVM for L1 hypervisors, i.e. L1 should
+		 * never want or expect such an exit.
+		 */
+		return false;
 	default:
 		return true;
 	}
+8
arch/x86/kvm/vmx/vmx.c
···
 	return 1;
 }

+static int handle_tdx_instruction(struct kvm_vcpu *vcpu)
+{
+	kvm_queue_exception(vcpu, UD_VECTOR);
+	return 1;
+}
+
 #ifndef CONFIG_X86_SGX_KVM
 static int handle_encls(struct kvm_vcpu *vcpu)
 {
···
 	[EXIT_REASON_ENCLS]		= handle_encls,
 	[EXIT_REASON_BUS_LOCK]		= handle_bus_lock_vmexit,
 	[EXIT_REASON_NOTIFY]		= handle_notify,
+	[EXIT_REASON_SEAMCALL]		= handle_tdx_instruction,
+	[EXIT_REASON_TDCALL]		= handle_tdx_instruction,
 	[EXIT_REASON_MSR_READ_IMM]	= handle_rdmsr_imm,
 	[EXIT_REASON_MSR_WRITE_IMM]	= handle_wrmsr_imm,
 };
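The vmx.c hunk above plugs two missing entries into a table of exit handlers indexed by exit reason. A reduced user-space model of that dispatch pattern is sketched below. Everything except the exit-reason numbers (75 to 77, taken from the vmx.h hunk) and the #UD vector number is a hypothetical stand-in: `struct vcpu`, `dispatch_exit` and `vector_after_exit` are not KVM names.

```c
#include <stddef.h>

enum { REASON_NOTIFY = 75, REASON_SEAMCALL = 76, REASON_TDCALL = 77, NR_REASONS = 78 };
enum { VEC_NONE = -1, VEC_UD = 6 };	/* #UD is exception vector 6 on x86 */

struct vcpu { int pending_vector; };

typedef int (*exit_handler_t)(struct vcpu *);

static int handle_tdx_instruction(struct vcpu *v)
{
	v->pending_vector = VEC_UD;	/* queue #UD for the guest */
	return 1;			/* 1 = resume the guest */
}

/* Sparse table: reasons without a handler stay NULL. */
static const exit_handler_t handlers[NR_REASONS] = {
	[REASON_SEAMCALL] = handle_tdx_instruction,
	[REASON_TDCALL]   = handle_tdx_instruction,
};

/* Returns 1 to resume the guest, 0 to bail out with an error. */
int dispatch_exit(struct vcpu *v, unsigned int reason)
{
	if (reason < NR_REASONS && handlers[reason])
		return handlers[reason](v);
	return 0;
}

/* Convenience wrapper: vector queued after dispatching 'reason'. */
int vector_after_exit(unsigned int reason)
{
	struct vcpu v = { .pending_vector = VEC_NONE };

	dispatch_exit(&v, reason);
	return v.pending_vector;
}
```

In this model an exit reason with no table entry falls through to the error path, which mirrors the behaviour the commit message describes for SEAMCALL and TDCALL before the fix.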
+29 -19
arch/x86/kvm/x86.c
@@ -3874,15 +3874,9 @@

 /*
  * Returns true if the MSR in question is managed via XSTATE, i.e. is context
- * switched with the rest of guest FPU state. Note! S_CET is _not_ context
- * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
- * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
- * the value saved/restored via XSTATE is always the host's value. That detail
- * is _extremely_ important, as the guest's S_CET must _never_ be resident in
- * hardware while executing in the host. Loading guest values for U_CET and
- * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
- * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
- * privilege levels, i.e. are effectively only consumed by userspace as well.
+ * switched with the rest of guest FPU state.
+ *
+ * Note, S_CET is _not_ saved/restored via XSAVES/XRSTORS.
  */
 static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
 {
@@ -3905,6 +3899,11 @@
  * MSR that is managed via XSTATE. Note, the caller is responsible for doing
  * the initial FPU load, this helper only ensures that guest state is resident
  * in hardware (the kernel can load its FPU state in IRQ context).
+ *
+ * Note, loading guest values for U_CET and PL[0-3]_SSP while executing in the
+ * kernel is safe, as U_CET is specific to userspace, and PL[0-3]_SSP are only
+ * consumed when transitioning to lower privilege levels, i.e. are effectively
+ * only consumed by userspace as well.
  */
 static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
 						  struct msr_data *msr_info,
@@ -11807,6 +11806,9 @@
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
+	if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+		return;
+
 	/* Exclude PKRU, it's restored separately immediately after VM-Exit. */
 	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
 	trace_kvm_fpu(1);
@@ -11815,6 +11817,9 @@
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
+	if (KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+		return;
+
 	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
@@ -12137,9 +12142,6 @@
 	int r;

 	vcpu_load(vcpu);
-	if (kvm_mpx_supported())
-		kvm_load_guest_fpu(vcpu);
-
 	kvm_vcpu_srcu_read_lock(vcpu);

 	r = kvm_apic_accept_events(vcpu);
@@ -12156,9 +12158,6 @@

 out:
 	kvm_vcpu_srcu_read_unlock(vcpu);
-
-	if (kvm_mpx_supported())
-		kvm_put_guest_fpu(vcpu);
 	vcpu_put(vcpu);
 	return r;
 }
@@ -12788,6 +12787,7 @@
 {
 	struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;
 	u64 xfeatures_mask;
+	bool fpu_in_use;
 	int i;

 	/*
@@ -12811,13 +12811,23 @@
 	BUILD_BUG_ON(sizeof(xfeatures_mask) * BITS_PER_BYTE <= XFEATURE_MAX);

 	/*
-	 * All paths that lead to INIT are required to load the guest's FPU
-	 * state (because most paths are buried in KVM_RUN).
+	 * Unload guest FPU state (if necessary) before zeroing XSTATE fields
+	 * as the kernel can only modify the state when its resident in memory,
+	 * i.e. when it's not loaded into hardware.
+	 *
+	 * WARN if the vCPU's desire to run, i.e. whether or not its in KVM_RUN,
+	 * doesn't match the loaded/in-use state of the FPU, as KVM_RUN is the
+	 * only path that can trigger INIT emulation _and_ loads FPU state, and
+	 * KVM_RUN should _always_ load FPU state.
 	 */
-	kvm_put_guest_fpu(vcpu);
+	WARN_ON_ONCE(vcpu->wants_to_run != fpstate->in_use);
+	fpu_in_use = fpstate->in_use;
+	if (fpu_in_use)
+		kvm_put_guest_fpu(vcpu);
 	for_each_set_bit(i, (unsigned long *)&xfeatures_mask, XFEATURE_MAX)
 		fpstate_clear_xstate_component(fpstate, i);
-	kvm_load_guest_fpu(vcpu);
+	if (fpu_in_use)
+		kvm_load_guest_fpu(vcpu);
 }

 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
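The "unload only if loaded" logic in the last hunk can be modeled with a small state machine. This is a sketch under assumptions, not the kernel implementation: `struct fpu_model` and its helpers are hypothetical, the `bugs` counter stands in for `KVM_BUG_ON()`, and a single integer stands in for the XSTATE components being cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the guest FPU state tracked by KVM. */
struct fpu_model {
	bool in_use;	/* true while guest state is loaded in hardware */
	int xstate;	/* stand-in for the in-memory XSTATE image */
	int bugs;	/* counts checks that KVM_BUG_ON() would trip */
};

static void model_load_guest_fpu(struct fpu_model *fpu)
{
	if (fpu->in_use) {	/* double load: flagged, not performed */
		fpu->bugs++;
		return;
	}
	fpu->in_use = true;
}

static void model_put_guest_fpu(struct fpu_model *fpu)
{
	if (!fpu->in_use) {	/* put without load: flagged, not performed */
		fpu->bugs++;
		return;
	}
	fpu->in_use = false;
}

/* Clear state on INIT, unloading and reloading only when actually loaded. */
void model_emulate_init(struct fpu_model *fpu)
{
	bool fpu_in_use = fpu->in_use;

	if (fpu_in_use)
		model_put_guest_fpu(fpu);
	fpu->xstate = 0;	/* only touched while resident in memory */
	if (fpu_in_use)
		model_load_guest_fpu(fpu);
}

/* Usage: INIT from a path that never loaded the FPU trips no check. */
void fpu_init_demo(void)
{
	struct fpu_model fpu = { .in_use = false, .xstate = 5, .bugs = 0 };

	model_emulate_init(&fpu);
	assert(fpu.bugs == 0 && !fpu.in_use && fpu.xstate == 0);
}
```

Sampling `in_use` once and reusing it for both branches keeps the put and load paired, so the function leaves the loaded state exactly as it found it.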
+3
tools/testing/selftests/kvm/arm64/get-reg-list.c
@@ -63,11 +63,13 @@
 	REG_FEAT(HDFGWTR2_EL2, ID_AA64MMFR0_EL1, FGT, FGT2),
 	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 	REG_FEAT(SCTLR2_EL1, ID_AA64MMFR3_EL1, SCTLRX, IMP),
+	REG_FEAT(SCTLR2_EL2, ID_AA64MMFR3_EL1, SCTLRX, IMP),
 	REG_FEAT(VDISR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VSESR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VNCR_EL2, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY),
 	REG_FEAT(CNTHV_CTL_EL2, ID_AA64MMFR1_EL1, VH, IMP),
 	REG_FEAT(CNTHV_CVAL_EL2,ID_AA64MMFR1_EL1, VH, IMP),
+	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 };

 bool filter_reg(__u64 reg)
@@ -718,6 +720,7 @@
 	SYS_REG(VMPIDR_EL2),
 	SYS_REG(SCTLR_EL2),
 	SYS_REG(ACTLR_EL2),
+	SYS_REG(SCTLR2_EL2),
 	SYS_REG(HCR_EL2),
 	SYS_REG(MDCR_EL2),
 	SYS_REG(CPTR_EL2),
+8 -1
tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
@@ -15,6 +15,8 @@
 #include "gic_v3.h"
 #include "processor.h"

+#define GITS_COLLECTION_TARGET_SHIFT 16
+
 static u64 its_read_u64(unsigned long offset)
 {
 	return readq_relaxed(GITS_BASE_GVA + offset);
@@ -163,6 +165,11 @@
 	its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
 }

+static u64 procnum_to_rdbase(u32 vcpu_id)
+{
+	return vcpu_id << GITS_COLLECTION_TARGET_SHIFT;
+}
+
 #define GITS_CMDQ_POLL_ITERATIONS 0

 static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd)
@@ -217,7 +224,7 @@

 	its_encode_cmd(&cmd, GITS_CMD_MAPC);
 	its_encode_collection(&cmd, collection_id);
-	its_encode_target(&cmd, vcpu_id);
+	its_encode_target(&cmd, procnum_to_rdbase(vcpu_id));
 	its_encode_valid(&cmd, valid);

 	its_send_cmd(cmdq_base, &cmd);
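Why the raw vCPU id programmed the wrong target can be illustrated with plain integer arithmetic. The sketch assumes that the target encoder drops the low 16 bits of its argument before placing it in the command, as the in-kernel ITS driver's encoder does; that behavior is not visible in this diff, so `model_target_field()` is a hypothetical stand-in rather than the selftest's function.

```c
#include <assert.h>
#include <stdint.h>

#define GITS_COLLECTION_TARGET_SHIFT 16

/* Same conversion as the new helper, with an explicit 64-bit widening. */
static uint64_t procnum_to_rdbase(uint32_t vcpu_id)
{
	return (uint64_t)vcpu_id << GITS_COLLECTION_TARGET_SHIFT;
}

/*
 * Hypothetical stand-in for the encoder: assumed to shift the target right
 * by 16 before storing it in the command's target field.
 */
static uint64_t model_target_field(uint64_t target_addr)
{
	return target_addr >> 16;
}

/* Usage: only the shifted value survives the encoder's right shift. */
void rdbase_demo(void)
{
	assert(model_target_field(3) == 0);			/* pre-fix: always CPU 0 */
	assert(model_target_field(procnum_to_rdbase(3)) == 3);	/* post-fix */
}
```

Under that assumption, every small vCPU id passed directly collapses to target 0, which is consistent with the "correct target CPU programming" fix listed for vgic_lpi_stress.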
+33 -14
virt/kvm/guest_memfd.c
@@ -623,24 +623,11 @@
 	return r;
 }

-void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+static void __kvm_gmem_unbind(struct kvm_memory_slot *slot, struct kvm_gmem *gmem)
 {
 	unsigned long start = slot->gmem.pgoff;
 	unsigned long end = start + slot->npages;
-	struct kvm_gmem *gmem;
-	struct file *file;

-	/*
-	 * Nothing to do if the underlying file was already closed (or is being
-	 * closed right now), kvm_gmem_release() invalidates all bindings.
-	 */
-	file = kvm_gmem_get_file(slot);
-	if (!file)
-		return;
-
-	gmem = file->private_data;
-
-	filemap_invalidate_lock(file->f_mapping);
 	xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);

 	/*
@@ -648,6 +635,38 @@
 	 * cannot see this memslot.
 	 */
 	WRITE_ONCE(slot->gmem.file, NULL);
+}
+
+void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+{
+	struct file *file;
+
+	/*
+	 * Nothing to do if the underlying file was _already_ closed, as
+	 * kvm_gmem_release() invalidates and nullifies all bindings.
+	 */
+	if (!slot->gmem.file)
+		return;
+
+	file = kvm_gmem_get_file(slot);
+
+	/*
+	 * However, if the file is _being_ closed, then the bindings need to be
+	 * removed as kvm_gmem_release() might not run until after the memslot
+	 * is freed. Note, modifying the bindings is safe even though the file
+	 * is dying as kvm_gmem_release() nullifies slot->gmem.file under
+	 * slots_lock, and only puts its reference to KVM after destroying all
+	 * bindings. I.e. reaching this point means kvm_gmem_release() hasn't
+	 * yet destroyed the bindings or freed the gmem_file, and can't do so
+	 * until the caller drops slots_lock.
+	 */
+	if (!file) {
+		__kvm_gmem_unbind(slot, slot->gmem.file->private_data);
+		return;
+	}
+
+	filemap_invalidate_lock(file->f_mapping);
+	__kvm_gmem_unbind(slot, file->private_data);
 	filemap_invalidate_unlock(file->f_mapping);

 	fput(file);