Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Quite a large pull request, partly due to skipping last week and
therefore having material from ~all submaintainers in this one. About
a fourth of it is a new selftest, and a couple more changes are large
in number of files touched (fixing a -Wflex-array-member-not-at-end
compiler warning) or lines changed (reformatting of a table in the API
documentation, thanks rST).

But who am I kidding---it's a lot of commits and there are a lot of
bugs being fixed here, some of them on the nastier side like the
RISC-V ones.

ARM:

- Correctly handle deactivation of interrupts that were activated
from LRs. Since EOIcount only denotes deactivation of interrupts
that are not present in an LR, start EOIcount deactivation walk
*after* the last irq that made it into an LR

- Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
is already enabled -- not only is this not possible (pKVM will
reject the call), but it is also useless: this can only happen for
a CPU that has already booted once, and the capability will not
change

- Fix a couple of low-severity bugs in our S2 fault handling path,
affecting the recently introduced LS64 handling and the even more
esoteric handling of hwpoison in a nested context

- Address yet another syzkaller finding in the vgic initialisation,
where we would end up destroying an uninitialised vgic with nasty
consequences

- Address an annoying case of pKVM failing to boot when some of the
memblock regions that the host is faulting in are not page-aligned

- Inject some sanity in the NV stage-2 walker by checking the limits
against the advertised PA size, and correctly report the resulting
faults

PPC:

- Fix a PPC e500 build error due to a long-standing wart that was
exposed by the recent conversion to kmalloc_obj(); rip out all the
ugliness that led to the wart

RISC-V:

- Prevent speculative out-of-bounds access using array_index_nospec()
in APLIC interrupt handling, ONE_REG register access, AIA CSR
access, float register access, and PMU counter access (see the
nospec sketch after this list)

- Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

- Fix potential null pointer dereference in
kvm_riscv_vcpu_aia_rmw_topei()

- Fix off-by-one array access in SBI PMU

- Skip THP support check during dirty logging

- Fix error code returned for Smstateen and Ssaia ONE_REG interface

- Check host Ssaia extension when creating AIA irqchip
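
All of the array_index_nospec() fixes above follow the same idiom:
reject an out-of-range guest-supplied index, then clamp it after the
bounds check so a mispredicted branch cannot be used to read out of
bounds. A minimal sketch of the pattern, assuming a made-up get_csr()
helper (the real call sites are in the RISC-V diffs below):

#include <linux/nospec.h>

/*
 * Illustrative only: after the range check, array_index_nospec()
 * forces reg_num into 0..regs_max-1 even under speculative
 * execution, so the array load cannot act as a Spectre-v1 gadget.
 */
static int get_csr(unsigned long *csrs, unsigned long regs_max,
		   unsigned long reg_num, unsigned long *out_val)
{
	if (reg_num >= regs_max)
		return -ENOENT;

	reg_num = array_index_nospec(reg_num, regs_max);
	*out_val = csrs[reg_num];
	return 0;
}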

x86:

- Fix cases where CPUID mitigation features were incorrectly marked
as available whenever the kernel used scattered feature words for
them

- Validate _all_ GVAs, rather than just the first GVA, when
processing a range of GVAs for Hyper-V's TLB flush hypercalls

- Fix a brown paper bag bug in add_atomic_switch_msr()

- Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu (see
the sketch after this list)

- Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
local APIC (and AVIC is enabled at the module level)

- Update CR8 write interception when AVIC is (de)activated, to fix a
bug where the guest can run in perpetuity with the CR8 intercept
enabled

- Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
default) an unintentional tightening of userspace ABI in 6.17, and
provides some amount of backwards compatibility with hypervisors
who want to freeze PMCs on VM-Entry

- Validate the VMCS/VMCB on return to a nested guest from SMM,
because either userspace or the guest could stash invalid values in
memory and trigger the processor's consistency checks
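
The point of the mask-notifier fix is that hlist_for_each_entry_srcu()
takes a lockdep expression proving which read-side lock protects the
traversal, whereas hlist_for_each_entry_rcu() assumes rcu_read_lock().
A hedged sketch of the pattern, with struct kimn standing in for KVM's
real notifier type:

#include <linux/srcu.h>
#include <linux/rculist.h>

struct kimn {			/* stand-in for kvm_irq_mask_notifier */
	int irq;
	void (*func)(struct kimn *kimn, bool mask);
	struct hlist_node link;
};

static void fire_notifiers(struct srcu_struct *irq_srcu,
			   struct hlist_head *list, int gsi, bool mask)
{
	struct kimn *kimn;
	int idx = srcu_read_lock(irq_srcu);

	/* The fourth argument tells lockdep the list is protected by
	 * SRCU, not by rcu_read_lock(), which this path never takes. */
	hlist_for_each_entry_srcu(kimn, list, link,
				  srcu_read_lock_held(irq_srcu))
		if (kimn->irq == gsi)
			kimn->func(kimn, mask);

	srcu_read_unlock(irq_srcu, idx);
}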

Generic:

- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
being unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end

- Document that vcpu->mutex is taken outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite
being rather unintuitive
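
A short sketch of what that ordering means in practice; the function
below is hypothetical, only the two locks and their nesting come from
the documentation:

/* vcpu->mutex is the outer lock: code that needs both must acquire
 * it before kvm->slots_lock (likewise before kvm->slots_arch_lock). */
static void update_slots_for_vcpu(struct kvm_vcpu *vcpu)
{
	mutex_lock(&vcpu->mutex);		/* outer */
	mutex_lock(&vcpu->kvm->slots_lock);	/* inner */

	/* ... memslot work done on behalf of this vCPU ... */

	mutex_unlock(&vcpu->kvm->slots_lock);
	mutex_unlock(&vcpu->mutex);
}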

Selftests:

- Increase the maximum number of NUMA nodes in the guest_memfd
selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
Documentation: kvm: fix formatting of the quirks table
KVM: x86: clarify leave_smm() return value
selftests: kvm: add a test that VMX validates controls on RSM
selftests: kvm: extract common functionality out of smm_test.c
KVM: SVM: check validity of VMCB controls when returning from SMM
KVM: VMX: check validity of VMCS controls when returning from SMM
KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
KVM: x86: synthesize CPUID bits only if CPU capability is set
KVM: PPC: e500: Rip out "struct tlbe_ref"
KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
KVM: selftests: Increase 'maxnode' for guest_memfd tests
KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
...

+749 -379
+107 -99
Documentation/virt/kvm/api.rst
···
 The valid bits in cap.args[0] are:

 ======================================== ================================================
 KVM_X86_QUIRK_LINT0_REENABLED            By default, the reset value for the LVT
                                          LINT0 register is 0x700 (APIC_MODE_EXTINT).
                                          When this quirk is disabled, the reset value
                                          is 0x10000 (APIC_LVT_MASKED).

 KVM_X86_QUIRK_CD_NW_CLEARED              By default, KVM clears CR0.CD and CR0.NW on
                                          AMD CPUs to workaround buggy guest firmware
                                          that runs in perpetuity with CR0.CD, i.e.
                                          with caches in "no fill" mode.

                                          When this quirk is disabled, KVM does not
                                          change the value of CR0.CD and CR0.NW.

 KVM_X86_QUIRK_LAPIC_MMIO_HOLE            By default, the MMIO LAPIC interface is
                                          available even when configured for x2APIC
                                          mode. When this quirk is disabled, KVM
                                          disables the MMIO LAPIC interface if the
                                          LAPIC is in x2APIC mode.

 KVM_X86_QUIRK_OUT_7E_INC_RIP             By default, KVM pre-increments %rip before
                                          exiting to userspace for an OUT instruction
                                          to port 0x7e. When this quirk is disabled,
                                          KVM does not pre-increment %rip before
                                          exiting to userspace.

 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT       When this quirk is disabled, KVM sets
                                          CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if
                                          IA32_MISC_ENABLE[bit 18] (MWAIT) is set.
                                          Additionally, when this quirk is disabled,
                                          KVM clears CPUID.01H:ECX[bit 3] if
                                          IA32_MISC_ENABLE[bit 18] is cleared.

 KVM_X86_QUIRK_FIX_HYPERCALL_INSN         By default, KVM rewrites guest
                                          VMMCALL/VMCALL instructions to match the
                                          vendor's hypercall instruction for the
                                          system. When this quirk is disabled, KVM
                                          will no longer rewrite invalid guest
                                          hypercall instructions. Executing the
                                          incorrect hypercall instruction will
                                          generate a #UD within the guest.

 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS      By default, KVM emulates MONITOR/MWAIT (if
                                          they are intercepted) as NOPs regardless of
                                          whether or not MONITOR/MWAIT are supported
                                          according to guest CPUID. When this quirk
                                          is disabled and KVM_X86_DISABLE_EXITS_MWAIT
                                          is not set (MONITOR/MWAIT are intercepted),
                                          KVM will inject a #UD on MONITOR/MWAIT if
                                          they're unsupported per guest CPUID. Note,
                                          KVM will modify MONITOR/MWAIT support in
                                          guest CPUID on writes to MISC_ENABLE if
                                          KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
                                          disabled.

 KVM_X86_QUIRK_SLOT_ZAP_ALL               By default, for KVM_X86_DEFAULT_VM VMs, KVM
                                          invalidates all SPTEs in all memslots and
                                          address spaces when a memslot is deleted or
                                          moved. When this quirk is disabled (or the
                                          VM type isn't KVM_X86_DEFAULT_VM), KVM only
                                          ensures the backing memory of the deleted
                                          or moved memslot isn't reachable, i.e. KVM
                                          _may_ invalidate only SPTEs related to the
                                          memslot.

 KVM_X86_QUIRK_STUFF_FEATURE_MSRS         By default, at vCPU creation, KVM sets the
                                          vCPU's MSR_IA32_PERF_CAPABILITIES (0x345),
                                          MSR_IA32_ARCH_CAPABILITIES (0x10a),
                                          MSR_PLATFORM_INFO (0xce), and all VMX MSRs
                                          (0x480..0x492) to the maximal capabilities
                                          supported by KVM. KVM also sets
                                          MSR_IA32_UCODE_REV (0x8b) to an arbitrary
                                          value (which is different for Intel vs.
                                          AMD). Lastly, when guest CPUID is set (by
                                          userspace), KVM modifies select VMX MSR
                                          fields to force consistency between guest
                                          CPUID and L2's effective ISA. When this
                                          quirk is disabled, KVM zeroes the vCPU's MSR
                                          values (with two exceptions, see below),
                                          i.e. treats the feature MSRs like CPUID
                                          leaves and gives userspace full control of
                                          the vCPU model definition. This quirk does
                                          not affect VMX MSRs CR0/CR4_FIXED1 (0x487
                                          and 0x489), as KVM does not allow them to
                                          be set by userspace (KVM sets them based on
                                          guest CPUID, for safety purposes).

 KVM_X86_QUIRK_IGNORE_GUEST_PAT           By default, on Intel platforms, KVM ignores
                                          guest PAT and forces the effective memory
                                          type to WB in EPT. The quirk is not available
                                          on Intel platforms which are incapable of
                                          safely honoring guest PAT (i.e., without CPU
                                          self-snoop, KVM always ignores guest PAT and
                                          forces effective memory type to WB). It is
                                          also ignored on AMD platforms or, on Intel,
                                          when a VM has non-coherent DMA devices
                                          assigned; KVM always honors guest PAT in
                                          such case. The quirk is needed to avoid
                                          slowdowns on certain Intel Xeon platforms
                                          (e.g. ICX, SPR) where self-snoop feature is
                                          supported but UC is slow enough to cause
                                          issues with some older guests that use
                                          UC instead of WC to map the video RAM.
                                          Userspace can disable the quirk to honor
                                          guest PAT if it knows that there is no such
                                          guest software, for example if it does not
                                          expose a bochs graphics device (which is
                                          known to have had a buggy driver).

+KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM By default, KVM relaxes the consistency
+                                         check for GUEST_IA32_DEBUGCTL in vmcs12
+                                         to allow FREEZE_IN_SMM to be set. When
+                                         this quirk is disabled, KVM requires this
+                                         bit to be cleared. Note that the vmcs02
+                                         bit is still completely controlled by the
+                                         host, regardless of the quirk setting.
 ======================================== ================================================
···
 7.32 KVM_CAP_MAX_VCPU_ID
 ------------------------
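
For context, quirks in this table are disabled from userspace via
KVM_CAP_DISABLE_QUIRKS2 on the VM file descriptor. A hedged userspace
sketch for the new quirk (error handling elided; vm_fd is assumed to
be an open KVM VM fd):

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int disable_freeze_in_smm_quirk(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DISABLE_QUIRKS2,
		.args[0] = KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM,
	};

	/* With the quirk disabled, KVM again rejects vmcs12 values
	 * that set FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}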
+2
Documentation/virt/kvm/locking.rst
···
 
 - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
 
+- vcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock
+
 - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
   them together is quite rare.
 
+3
arch/arm64/include/asm/kvm_host.h
···
 	/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
 	unsigned int debug_brps;
 	unsigned int debug_wrps;
+
+	/* Last vgic_irq part of the AP list recorded in an LR */
+	struct vgic_irq *last_lr_irq;
 };
 
 struct kvm_host_psci_config {
+9
arch/arm64/kernel/cpufeature.c
···
 	    !is_midr_in_range_list(has_vgic_v3))
 		return false;
 
+	/*
+	 * pKVM prevents late onlining of CPUs. This means that whatever
+	 * state the capability is in after deprivilege cannot be affected
+	 * by a new CPU booting -- this is guaranteed to be a CPU we have
+	 * already seen, and the cap is therefore unchanged.
+	 */
+	if (system_capabilities_finalized() && is_protected_kvm_enabled())
+		return cpus_have_final_cap(ARM64_HAS_ICH_HCR_EL2_TDIR);
+
 	if (is_kernel_in_hyp_mode())
 		res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
 	else
-2
arch/arm64/kvm/at.c
···
 		fail = true;
 	}
 
-	isb();
-
 	if (!fail)
 		par = read_sysreg_par();
 
+2 -2
arch/arm64/kvm/guest.c
···
 
 #include "trace.h"
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS()
 };
···
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, hvc_exit_stat),
 	STATS_DESC_COUNTER(VCPU, wfe_exit_stat),
+1 -1
arch/arm64/kvm/hyp/nvhe/mem_protect.c
···
 		granule = kvm_granule_size(level);
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
-		if (!range_included(&cur, range))
+		if (!range_included(&cur, range) && level < KVM_PGTABLE_LAST_LEVEL)
 			continue;
 		*range = cur;
 		return 0;
+9 -5
arch/arm64/kvm/mmu.c
···
 
 		force_pte = (max_map_size == PAGE_SIZE);
 		vma_pagesize = min_t(long, vma_pagesize, max_map_size);
+		vma_shift = __ffs(vma_pagesize);
 	}
 
 	/*
···
 	if (exec_fault && s2_force_noncacheable)
 		ret = -ENOEXEC;
 
-	if (ret) {
-		kvm_release_page_unused(page);
-		return ret;
-	}
+	if (ret)
+		goto out_put_page;
 
 	/*
 	 * Guest performs atomic/exclusive operations on memory with unsupported
···
 	 */
 	if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(vcpu))) {
 		kvm_inject_dabt_excl_atomic(vcpu, kvm_vcpu_get_hfar(vcpu));
-		return 1;
+		ret = 1;
+		goto out_put_page;
 	}
 
 	if (nested)
···
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 
 	return ret != -EAGAIN ? ret : 0;
+
+out_put_page:
+	kvm_release_page_unused(page);
+	return ret;
 }
 
 /* Resolve the access fault by making the page young again. */
+16 -11
arch/arm64/kvm/nested.c
···
 	return 64 - wi->t0sz;
 }
 
-static int check_base_s2_limits(struct s2_walk_info *wi,
+static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
-	int start_size, ia_size;
+	int start_size, pa_max;
 
-	ia_size = get_ia_size(wi);
+	pa_max = kvm_get_pa_bits(vcpu->kvm);
 
 	/* Check translation limits */
 	switch (BIT(wi->pgshift)) {
 	case SZ_64K:
-		if (level == 0 || (level == 1 && ia_size <= 42))
+		if (level == 0 || (level == 1 && pa_max <= 42))
 			return -EFAULT;
 		break;
 	case SZ_16K:
-		if (level == 0 || (level == 1 && ia_size <= 40))
+		if (level == 0 || (level == 1 && pa_max <= 40))
 			return -EFAULT;
 		break;
 	case SZ_4K:
-		if (level < 0 || (level == 0 && ia_size <= 42))
+		if (level < 0 || (level == 0 && pa_max <= 42))
 			return -EFAULT;
 		break;
 	}
 
 	/* Check input size limits */
-	if (input_size > ia_size)
+	if (input_size > pa_max)
 		return -EFAULT;
 
 	/* Check number of entries in starting level table */
···
 	if (input_size > 48 || input_size < 25)
 		return -EFAULT;
 
-	ret = check_base_s2_limits(wi, level, input_size, stride);
-	if (WARN_ON(ret))
+	ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
+	if (WARN_ON(ret)) {
+		out->esr = compute_fsc(0, ESR_ELx_FSC_FAULT);
 		return ret;
+	}
 
 	base_lower_bound = 3 + input_size - ((3 - level) * stride +
 			   wi->pgshift);
 	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
 
 	if (check_output_size(wi, base_addr)) {
-		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		/* R_BFHQH */
+		out->esr = compute_fsc(0, ESR_ELx_FSC_ADDRSZ);
 		return 1;
 	}
 
···
 
 		paddr = base_addr | index;
 		ret = read_guest_s2_desc(vcpu, paddr, &desc, wi);
-		if (ret < 0)
+		if (ret < 0) {
+			out->esr = ESR_ELx_FSC_SEA_TTW(level);
 			return ret;
+		}
 
 		new_desc = desc;
 
+17 -17
arch/arm64/kvm/vgic/vgic-init.c
···
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;
 	kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		ret = vgic_allocate_private_irqs_locked(vcpu, type);
-		if (ret)
-			break;
-	}
-
-	if (ret) {
-		kvm_for_each_vcpu(i, vcpu, kvm) {
-			struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
-			kfree(vgic_cpu->private_irqs);
-			vgic_cpu->private_irqs = NULL;
-		}
-
-		goto out_unlock;
-	}
-
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
 
 	aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
···
 
 	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
 	kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = vgic_allocate_private_irqs_locked(vcpu, type);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+			kfree(vgic_cpu->private_irqs);
+			vgic_cpu->private_irqs = NULL;
+		}
+
+		kvm->arch.vgic.vgic_model = 0;
+		goto out_unlock;
+	}
 
 	if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
 		kvm->arch.vgic.nassgicap = system_supports_direct_sgis();
+2 -2
arch/arm64/kvm/vgic/vgic-v2.c
···
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
 	u32 eoicount = FIELD_GET(GICH_HCR_EOICOUNT, cpuif->vgic_hcr);
-	struct vgic_irq *irq;
+	struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
···
 		vgic_v2_fold_lr(vcpu, cpuif->vgic_lr[lr]);
 
 	/* See the GICv3 equivalent for the EOIcount handling rationale */
-	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+	list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
 		u32 lr;
 
 		if (!eoicount) {
+6 -6
arch/arm64/kvm/vgic/vgic-v3.c
···
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
 	u32 eoicount = FIELD_GET(ICH_HCR_EL2_EOIcount, cpuif->vgic_hcr);
-	struct vgic_irq *irq;
+	struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
···
 	/*
 	 * EOIMode=0: use EOIcount to emulate deactivation. We are
 	 * guaranteed to deactivate in reverse order of the activation, so
-	 * just pick one active interrupt after the other in the ap_list,
-	 * and replay the deactivation as if the CPU was doing it. We also
-	 * rely on priority drop to have taken place, and the list to be
-	 * sorted by priority.
+	 * just pick one active interrupt after the other in the tail part
+	 * of the ap_list, past the LRs, and replay the deactivation as if
+	 * the CPU was doing it. We also rely on priority drop to have taken
+	 * place, and the list to be sorted by priority.
 	 */
-	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+	list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
 		u64 lr;
 
 		/*
+6
arch/arm64/kvm/vgic/vgic.c
···
 
 static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
 {
+	if (!*host_data_ptr(last_lr_irq))
+		return;
+
 	if (kvm_vgic_global_state.type == VGIC_V2)
 		vgic_v2_fold_lr_state(vcpu);
 	else
···
 	if (irqs_outside_lrs(&als))
 		vgic_sort_ap_list(vcpu);
 
+	*host_data_ptr(last_lr_irq) = NULL;
+
 	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
 		scoped_guard(raw_spinlock, &irq->irq_lock) {
 			if (likely(vgic_target_oracle(irq) == vcpu)) {
 				vgic_populate_lr(vcpu, irq, count++);
+				*host_data_ptr(last_lr_irq) = irq;
 			}
 		}
 
+1 -1
arch/loongarch/kvm/vcpu.c
···
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, int_exits),
 	STATS_DESC_COUNTER(VCPU, idle_exits),
+1 -1
arch/loongarch/kvm/vm.c
···
 #include <asm/kvm_eiointc.h>
 #include <asm/kvm_pch_pic.h>
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_ICOUNTER(VM, pages),
 	STATS_DESC_ICOUNTER(VM, hugepages),
+2 -2
arch/mips/kvm/mips.c
···
 #define VECTORSPACING 0x100	/* for EI/VI mode */
 #endif
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS()
 };
···
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, wait_exits),
 	STATS_DESC_COUNTER(VCPU, cache_exits),
+2 -2
arch/powerpc/kvm/book3s.c
···
 
 /* #define EXIT_DEBUG */
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_ICOUNTER(VM, num_2M_pages),
 	STATS_DESC_ICOUNTER(VM, num_1G_pages)
···
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, sum_exits),
 	STATS_DESC_COUNTER(VCPU, mmio_exits),
+2 -2
arch/powerpc/kvm/booke.c
···
 
 unsigned long kvmppc_booke_handlers;
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_ICOUNTER(VM, num_2M_pages),
 	STATS_DESC_ICOUNTER(VM, num_1G_pages)
···
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, sum_exits),
 	STATS_DESC_COUNTER(VCPU, mmio_exits),
+1 -5
arch/powerpc/kvm/e500.h
···
 /* bits [6-5] MAS2_X1 and MAS2_X0 and [4-0] bits for WIMGE */
 #define E500_TLB_MAS2_ATTR	(0x7f)
 
-struct tlbe_ref {
+struct tlbe_priv {
 	kvm_pfn_t pfn;		/* valid only for TLB0, except briefly */
 	unsigned int flags;	/* E500_TLB_* */
-};
-
-struct tlbe_priv {
-	struct tlbe_ref ref;
 };
 
 #ifdef CONFIG_KVM_E500V2
+2 -2
arch/powerpc/kvm/e500_mmu.c
···
 	vcpu_e500->gtlb_offset[0] = 0;
 	vcpu_e500->gtlb_offset[1] = KVM_E500_TLB0_SIZE;
 
-	vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_ref,
+	vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_priv,
 					       vcpu_e500->gtlb_params[0].entries);
 	if (!vcpu_e500->gtlb_priv[0])
 		goto free_vcpu;
 
-	vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_ref,
+	vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_priv,
 					       vcpu_e500->gtlb_params[1].entries);
 	if (!vcpu_e500->gtlb_priv[1])
 		goto free_vcpu;
+44 -47
arch/powerpc/kvm/e500_mmu_host.c
···
 {
 	struct kvm_book3e_206_tlb_entry *gtlbe =
 		get_entry(vcpu_e500, tlbsel, esel);
-	struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[tlbsel][esel].ref;
+	struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[tlbsel][esel];
 
 	/* Don't bother with unmapped entries */
-	if (!(ref->flags & E500_TLB_VALID)) {
-		WARN(ref->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
-		     "%s: flags %x\n", __func__, ref->flags);
+	if (!(tlbe->flags & E500_TLB_VALID)) {
+		WARN(tlbe->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
+		     "%s: flags %x\n", __func__, tlbe->flags);
 		WARN_ON(tlbsel == 1 && vcpu_e500->g2h_tlb1_map[esel]);
 	}
 
-	if (tlbsel == 1 && ref->flags & E500_TLB_BITMAP) {
+	if (tlbsel == 1 && tlbe->flags & E500_TLB_BITMAP) {
 		u64 tmp = vcpu_e500->g2h_tlb1_map[esel];
 		int hw_tlb_indx;
 		unsigned long flags;
···
 		}
 		mb();
 		vcpu_e500->g2h_tlb1_map[esel] = 0;
-		ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
+		tlbe->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
 		local_irq_restore(flags);
 	}
 
-	if (tlbsel == 1 && ref->flags & E500_TLB_TLB0) {
+	if (tlbsel == 1 && tlbe->flags & E500_TLB_TLB0) {
 		/*
 		 * TLB1 entry is backed by 4k pages. This should happen
 		 * rarely and is not worth optimizing. Invalidate everything.
 		 */
 		kvmppc_e500_tlbil_all(vcpu_e500);
-		ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
+		tlbe->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
 	}
 
 	/*
 	 * If TLB entry is still valid then it's a TLB0 entry, and thus
 	 * backed by at most one host tlbe per shadow pid
 	 */
-	if (ref->flags & E500_TLB_VALID)
+	if (tlbe->flags & E500_TLB_VALID)
 		kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
 
 	/* Mark the TLB as not backed by the host anymore */
-	ref->flags = 0;
+	tlbe->flags = 0;
 }
···
 	return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
 }
 
-static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
-					 struct kvm_book3e_206_tlb_entry *gtlbe,
-					 kvm_pfn_t pfn, unsigned int wimg,
-					 bool writable)
+static inline void kvmppc_e500_tlbe_setup(struct tlbe_priv *tlbe,
+					  struct kvm_book3e_206_tlb_entry *gtlbe,
+					  kvm_pfn_t pfn, unsigned int wimg,
+					  bool writable)
 {
-	ref->pfn = pfn;
-	ref->flags = E500_TLB_VALID;
+	tlbe->pfn = pfn;
+	tlbe->flags = E500_TLB_VALID;
 	if (writable)
-		ref->flags |= E500_TLB_WRITABLE;
+		tlbe->flags |= E500_TLB_WRITABLE;
 
 	/* Use guest supplied MAS2_G and MAS2_E */
-	ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
+	tlbe->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
 }
 
-static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
+static inline void kvmppc_e500_tlbe_release(struct tlbe_priv *tlbe)
 {
-	if (ref->flags & E500_TLB_VALID) {
+	if (tlbe->flags & E500_TLB_VALID) {
 		/* FIXME: don't log bogus pfn for TLB1 */
-		trace_kvm_booke206_ref_release(ref->pfn, ref->flags);
-		ref->flags = 0;
+		trace_kvm_booke206_ref_release(tlbe->pfn, tlbe->flags);
+		tlbe->flags = 0;
 	}
 }
···
 	int i;
 
 	for (tlbsel = 0; tlbsel <= 1; tlbsel++) {
-		for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++) {
-			struct tlbe_ref *ref =
-				&vcpu_e500->gtlb_priv[tlbsel][i].ref;
-			kvmppc_e500_ref_release(ref);
-		}
+		for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++)
+			kvmppc_e500_tlbe_release(&vcpu_e500->gtlb_priv[tlbsel][i]);
 	}
 }
···
 static void kvmppc_e500_setup_stlbe(
 	struct kvm_vcpu *vcpu,
 	struct kvm_book3e_206_tlb_entry *gtlbe,
-	int tsize, struct tlbe_ref *ref, u64 gvaddr,
+	int tsize, struct tlbe_priv *tlbe, u64 gvaddr,
 	struct kvm_book3e_206_tlb_entry *stlbe)
 {
-	kvm_pfn_t pfn = ref->pfn;
+	kvm_pfn_t pfn = tlbe->pfn;
 	u32 pr = vcpu->arch.shared->msr & MSR_PR;
-	bool writable = !!(ref->flags & E500_TLB_WRITABLE);
+	bool writable = !!(tlbe->flags & E500_TLB_WRITABLE);
 
-	BUG_ON(!(ref->flags & E500_TLB_VALID));
+	BUG_ON(!(tlbe->flags & E500_TLB_VALID));
 
 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
-	stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
+	stlbe->mas2 = (gvaddr & MAS2_EPN) | (tlbe->flags & E500_TLB_MAS2_ATTR);
 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
 			e500_shadow_mas3_attrib(gtlbe->mas7_3, writable, pr);
 }
 
 static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
 	int tlbsel, struct kvm_book3e_206_tlb_entry *stlbe,
-	struct tlbe_ref *ref)
+	struct tlbe_priv *tlbe)
 {
 	struct kvm_memory_slot *slot;
 	unsigned int psize;
···
 		}
 	}
 
-	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg, writable);
+	kvmppc_e500_tlbe_setup(tlbe, gtlbe, pfn, wimg, writable);
 	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
-				ref, gvaddr, stlbe);
+				tlbe, gvaddr, stlbe);
 	writable = tlbe_is_writable(stlbe);
 
 	/* Clear i-cache for new pages */
···
 	struct kvm_book3e_206_tlb_entry *stlbe)
 {
 	struct kvm_book3e_206_tlb_entry *gtlbe;
-	struct tlbe_ref *ref;
+	struct tlbe_priv *tlbe;
 	int stlbsel = 0;
 	int sesel = 0;
 	int r;
 
 	gtlbe = get_entry(vcpu_e500, 0, esel);
-	ref = &vcpu_e500->gtlb_priv[0][esel].ref;
+	tlbe = &vcpu_e500->gtlb_priv[0][esel];
 
 	r = kvmppc_e500_shadow_map(vcpu_e500, get_tlb_eaddr(gtlbe),
 				   get_tlb_raddr(gtlbe) >> PAGE_SHIFT,
-				   gtlbe, 0, stlbe, ref);
+				   gtlbe, 0, stlbe, tlbe);
 	if (r)
 		return r;
···
 }
 
 static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
-				     struct tlbe_ref *ref,
+				     struct tlbe_priv *tlbe,
 				     int esel)
 {
 	unsigned int sesel = vcpu_e500->host_tlb1_nv++;
···
 		vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
 	}
 
-	vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
+	vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_BITMAP;
 	vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
 	vcpu_e500->h2g_tlb1_rmap[sesel] = esel + 1;
-	WARN_ON(!(ref->flags & E500_TLB_VALID));
+	WARN_ON(!(tlbe->flags & E500_TLB_VALID));
 
 	return sesel;
 }
···
 	u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
 	struct kvm_book3e_206_tlb_entry *stlbe, int esel)
 {
-	struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[1][esel].ref;
+	struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[1][esel];
 	int sesel;
 	int r;
 
 	r = kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe,
-				   ref);
+				   tlbe);
 	if (r)
 		return r;
 
 	/* Use TLB0 when we can only map a page with 4k */
 	if (get_tlb_tsize(stlbe) == BOOK3E_PAGESZ_4K) {
-		vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_TLB0;
+		vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_TLB0;
 		write_stlbe(vcpu_e500, gtlbe, stlbe, 0, 0);
 		return 0;
 	}
 
 	/* Otherwise map into TLB1 */
-	sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, ref, esel);
+	sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, tlbe, esel);
 	write_stlbe(vcpu_e500, gtlbe, stlbe, 1, sesel);
 
 	return 0;
···
 		priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
 
 		/* Triggers after clear_tlb_privs or on initial mapping */
-		if (!(priv->ref.flags & E500_TLB_VALID)) {
+		if (!(priv->flags & E500_TLB_VALID)) {
 			kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
 		} else {
 			kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
-						&priv->ref, eaddr, &stlbe);
+						priv, eaddr, &stlbe);
 			write_stlbe(vcpu_e500, gtlbe, &stlbe, 0, 0);
 		}
 		break;
+13 -2
arch/riscv/kvm/aia.c
···
 #include <linux/irqchip/riscv-imsic.h>
 #include <linux/irqdomain.h>
 #include <linux/kvm_host.h>
+#include <linux/nospec.h>
 #include <linux/percpu.h>
 #include <linux/spinlock.h>
 #include <asm/cpufeature.h>
···
 				 unsigned long *out_val)
 {
 	struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+	if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
 		return -ENOENT;
+	if (reg_num >= regs_max)
+		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	*out_val = 0;
 	if (kvm_riscv_aia_available())
···
 				 unsigned long val)
 {
 	struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+	if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
 		return -ENOENT;
+	if (reg_num >= regs_max)
+		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	if (kvm_riscv_aia_available()) {
 		((unsigned long *)csr)[reg_num] = val;
+12 -11
arch/riscv/kvm/aia_aplic.c
···
 #include <linux/irqchip/riscv-aplic.h>
 #include <linux/kvm_host.h>
 #include <linux/math.h>
+#include <linux/nospec.h>
 #include <linux/spinlock.h>
 #include <linux/swab.h>
 #include <kvm/iodev.h>
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return 0;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 	ret = irqd->sourcecfg;
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	if (val & APLIC_SOURCECFG_D)
 		val = 0;
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return 0;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 	ret = irqd->target;
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	val &= APLIC_TARGET_EIID_MASK |
 	       (APLIC_TARGET_HART_IDX_MASK << APLIC_TARGET_HART_IDX_SHIFT) |
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return false;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 	ret = (irqd->state & APLIC_IRQ_STATE_PENDING) ? true : false;
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return false;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 	ret = (irqd->state & APLIC_IRQ_STATE_ENABLED) ? true : false;
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 	if (enabled)
···
 	if (!irq || aplic->nr_irqs <= irq)
 		return false;
-	irqd = &aplic->irqs[irq];
+	irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
 
···
 	for (irq = first; irq <= last; irq++) {
 		if (!irq || aplic->nr_irqs <= irq)
 			continue;
-		irqd = &aplic->irqs[irq];
+		irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
 
 		raw_spin_lock_irqsave(&irqd->lock, flags);
 
···
 	if (!aplic || !source || (aplic->nr_irqs <= source))
 		return -ENODEV;
-	irqd = &aplic->irqs[source];
+	irqd = &aplic->irqs[array_index_nospec(source, aplic->nr_irqs)];
 	ie = (aplic->domaincfg & APLIC_DOMAINCFG_IE) ? true : false;
 
 	raw_spin_lock_irqsave(&irqd->lock, flags);
+14 -4
arch/riscv/kvm/aia_device.c
···
 #include <linux/irqchip/riscv-imsic.h>
 #include <linux/kvm_host.h>
 #include <linux/uaccess.h>
+#include <linux/cpufeature.h>
 
 static int aia_create(struct kvm_device *dev, u32 type)
 {
···
 
 	if (irqchip_in_kernel(kvm))
 		return -EEXIST;
+
+	if (!riscv_isa_extension_available(NULL, SSAIA))
+		return -ENODEV;
 
 	ret = -EBUSY;
 	if (kvm_trylock_all_vcpus(kvm))
···
 
 static int aia_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
-	int nr_vcpus;
+	int nr_vcpus, r = -ENXIO;
 
 	switch (attr->group) {
 	case KVM_DEV_RISCV_AIA_GRP_CONFIG:
···
 		}
 		break;
 	case KVM_DEV_RISCV_AIA_GRP_APLIC:
-		return kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+		mutex_lock(&dev->kvm->lock);
+		r = kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+		mutex_unlock(&dev->kvm->lock);
+		break;
 	case KVM_DEV_RISCV_AIA_GRP_IMSIC:
-		return kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+		mutex_lock(&dev->kvm->lock);
+		r = kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+		mutex_unlock(&dev->kvm->lock);
+		break;
 	}
 
-	return -ENXIO;
+	return r;
 }
 
 struct kvm_device_ops kvm_riscv_aia_device_ops = {
+4
arch/riscv/kvm/aia_imsic.c
···
 	int r, rc = KVM_INSN_CONTINUE_NEXT_SEPC;
 	struct imsic *imsic = vcpu->arch.aia_context.imsic_state;
 
+	/* If IMSIC vCPU state not initialized then forward to user space */
+	if (!imsic)
+		return KVM_INSN_EXIT_TO_USER_SPACE;
+
 	if (isel == KVM_RISCV_AIA_IMSIC_TOPEI) {
 		/* Read pending and enabled interrupt with highest priority */
 		topei = imsic_mrif_topei(imsic->swfile, imsic->nr_eix,
+5 -1
arch/riscv/kvm/mmu.c
···
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	struct kvm_gstage gstage;
+	bool mmu_locked;
 
 	if (!kvm->arch.pgd)
 		return false;
···
 	gstage.flags = 0;
 	gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
 	gstage.pgd = kvm->arch.pgd;
+	mmu_locked = spin_trylock(&kvm->mmu_lock);
 	kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
 				     (range->end - range->start) << PAGE_SHIFT,
 				     range->may_block);
+	if (mmu_locked)
+		spin_unlock(&kvm->mmu_lock);
 	return false;
 }
···
 		goto out_unlock;
 
 	/* Check if we are backed by a THP and thus use block mapping if possible */
-	if (vma_pagesize == PAGE_SIZE)
+	if (!logging && (vma_pagesize == PAGE_SIZE))
 		vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &hfn, &gpa);
 
 	if (writable) {
+1 -1
arch/riscv/kvm/vcpu.c
···
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, ecall_exit_stat),
 	STATS_DESC_COUNTER(VCPU, wfi_exit_stat),
+13 -4
arch/riscv/kvm/vcpu_fp.c
···
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/kvm_host.h>
+#include <linux/nospec.h>
 #include <linux/uaccess.h>
 #include <asm/cpufeature.h>
···
 		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
 			reg_val = &cntx->fp.f.fcsr;
 		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
-			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+			reg_num = array_index_nospec(reg_num,
+						     ARRAY_SIZE(cntx->fp.f.f));
 			reg_val = &cntx->fp.f.f[reg_num];
-		else
+		} else
 			return -ENOENT;
 	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
 		   riscv_isa_extension_available(vcpu->arch.isa, d)) {
···
 			 reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
 			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
 				return -EINVAL;
+			reg_num = array_index_nospec(reg_num,
+						     ARRAY_SIZE(cntx->fp.d.f));
 			reg_val = &cntx->fp.d.f[reg_num];
 		} else
 			return -ENOENT;
···
 		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
 			reg_val = &cntx->fp.f.fcsr;
 		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
-			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+			  reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+			reg_num = array_index_nospec(reg_num,
+						     ARRAY_SIZE(cntx->fp.f.f));
 			reg_val = &cntx->fp.f.f[reg_num];
-		else
+		} else
 			return -ENOENT;
 	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
 		   riscv_isa_extension_available(vcpu->arch.isa, d)) {
···
 			 reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
 			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
 				return -EINVAL;
+			reg_num = array_index_nospec(reg_num,
+						     ARRAY_SIZE(cntx->fp.d.f));
 			reg_val = &cntx->fp.d.f[reg_num];
 		} else
 			return -ENOENT;
+36 -18
arch/riscv/kvm/vcpu_onereg.c
···
 #include <linux/bitops.h>
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <linux/nospec.h>
 #include <linux/uaccess.h>
 #include <linux/kvm_host.h>
 #include <asm/cacheflush.h>
···
 	    kvm_ext >= ARRAY_SIZE(kvm_isa_ext_arr))
 		return -ENOENT;
 
+	kvm_ext = array_index_nospec(kvm_ext, ARRAY_SIZE(kvm_isa_ext_arr));
 	*guest_ext = kvm_isa_ext_arr[kvm_ext];
 	switch (*guest_ext) {
 	case RISCV_ISA_EXT_SMNPM:
···
 	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
 					    KVM_REG_SIZE_MASK |
 					    KVM_REG_RISCV_CORE);
+	unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
 	unsigned long reg_val;
 
 	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
 		return -EINVAL;
-	if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+	if (reg_num >= regs_max)
 		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
 		reg_val = cntx->sepc;
···
 	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
 					    KVM_REG_SIZE_MASK |
 					    KVM_REG_RISCV_CORE);
+	unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
 	unsigned long reg_val;
 
 	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
 		return -EINVAL;
-	if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+	if (reg_num >= regs_max)
 		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
 		return -EFAULT;
···
 				      unsigned long *out_val)
 {
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+	if (reg_num >= regs_max)
 		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
 		kvm_riscv_vcpu_flush_interrupts(vcpu);
···
 				      unsigned long reg_val)
 {
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+	if (reg_num >= regs_max)
 		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
 		reg_val &= VSIP_VALID_MASK;
···
 					       unsigned long reg_val)
 {
 	struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+				 sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
-		sizeof(unsigned long))
-		return -EINVAL;
+	if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+		return -ENOENT;
+	if (reg_num >= regs_max)
+		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	((unsigned long *)csr)[reg_num] = reg_val;
 	return 0;
···
 					       unsigned long *out_val)
 {
 	struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+	unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+				 sizeof(unsigned long);
 
-	if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
-		sizeof(unsigned long))
-		return -EINVAL;
+	if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+		return -ENOENT;
+	if (reg_num >= regs_max)
+		return -ENOENT;
+
+	reg_num = array_index_nospec(reg_num, regs_max);
 
 	*out_val = ((unsigned long *)csr)[reg_num];
 	return 0;
···
 		rc = kvm_riscv_vcpu_aia_get_csr(vcpu, reg_num, &reg_val);
 		break;
 	case KVM_REG_RISCV_CSR_SMSTATEEN:
-		rc = -EINVAL;
-		if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
-			rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num,
-							      &reg_val);
+		rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num, &reg_val);
 		break;
 	default:
 		rc = -ENOENT;
···
 		rc = kvm_riscv_vcpu_aia_set_csr(vcpu, reg_num, reg_val);
 		break;
 	case KVM_REG_RISCV_CSR_SMSTATEEN:
-		rc = -EINVAL;
-		if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
-			rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num,
-							      reg_val);
+		rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num, reg_val);
 		break;
 	default:
 		rc = -ENOENT;
+12 -4
arch/riscv/kvm/vcpu_pmu.c
···
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/kvm_host.h>
+#include <linux/nospec.h>
 #include <linux/perf/riscv_pmu.h>
 #include <asm/csr.h>
 #include <asm/kvm_vcpu_sbi.h>
···
 
 static u64 kvm_pmu_get_perf_event_hw_config(u32 sbi_event_code)
 {
-	return hw_event_perf_map[sbi_event_code];
+	return hw_event_perf_map[array_index_nospec(sbi_event_code,
+						    SBI_PMU_HW_GENERAL_MAX)];
 }
 
 static u64 kvm_pmu_get_perf_event_cache_config(u32 sbi_event_code)
···
 		return -EINVAL;
 	}
 
+	cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
 	pmc = &kvpmu->pmc[cidx];
 
 	if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
···
 		return -EINVAL;
 	}
 
+	cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
 	pmc = &kvpmu->pmc[cidx];
 
 	if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
···
 {
 	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
 
-	if (cidx > RISCV_KVM_MAX_COUNTERS || cidx == 1) {
+	if (cidx >= RISCV_KVM_MAX_COUNTERS || cidx == 1) {
 		retdata->err_val = SBI_ERR_INVALID_PARAM;
 		return 0;
 	}
 
+	cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
 	retdata->out_val = kvpmu->pmc[cidx].cinfo.value;
 
 	return 0;
···
 	}
 	/* Start the counters that have been configured and requested by the guest */
 	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
-		pmc_index = i + ctr_base;
+		pmc_index = array_index_nospec(i + ctr_base,
+					       RISCV_KVM_MAX_COUNTERS);
 		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
 			continue;
 		/* The guest started the counter again. Reset the overflow status */
···
 
 	/* Stop the counters that have been configured and requested by the guest */
 	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
-		pmc_index = i + ctr_base;
+		pmc_index = array_index_nospec(i + ctr_base,
+					       RISCV_KVM_MAX_COUNTERS);
 		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
 			continue;
 		pmc = &kvpmu->pmc[pmc_index];
···
 		}
 	}
 
+	ctr_idx = array_index_nospec(ctr_idx, RISCV_KVM_MAX_COUNTERS);
 	pmc = &kvpmu->pmc[ctr_idx];
 	pmc->idx = ctr_idx;
 
+1 -1
arch/riscv/kvm/vm.c
···
 #include <linux/kvm_host.h>
 #include <asm/kvm_mmu.h>
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS()
 };
 static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
+2 -2
arch/s390/kvm/kvm-s390.c
···
 #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
 			   (KVM_MAX_VCPUS + LOCAL_IRQS))
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_COUNTER(VM, inject_io),
 	STATS_DESC_COUNTER(VM, inject_float_mchk),
···
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, exit_userspace),
 	STATS_DESC_COUNTER(VCPU, exit_null),
+2 -1
arch/x86/include/asm/kvm_host.h
···
 	 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS |	\
 	 KVM_X86_QUIRK_SLOT_ZAP_ALL |		\
 	 KVM_X86_QUIRK_STUFF_FEATURE_MSRS |	\
-	 KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+	 KVM_X86_QUIRK_IGNORE_GUEST_PAT |	\
+	 KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM)
 
 #define KVM_X86_CONDITIONAL_QUIRKS \
 	(KVM_X86_QUIRK_CD_NW_CLEARED |	\
+1
arch/x86/include/uapi/asm/kvm.h
···
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
 #define KVM_X86_QUIRK_IGNORE_GUEST_PAT		(1 << 9)
+#define KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM (1 << 10)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
+4 -1
arch/x86/kvm/cpuid.c
···
 #define SYNTHESIZED_F(name)					\
 ({								\
 	kvm_cpu_cap_synthesized |= feature_bit(name);		\
-	F(name);						\
+								\
+	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\
+	if (boot_cpu_has(X86_FEATURE_##name))			\
+		F(name);					\
 })
 
 /*
+5 -4
arch/x86/kvm/hyperv.c
···
         if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
             goto out_flush_all;

-        if (is_noncanonical_invlpg_address(entries[i], vcpu))
-            continue;
-
         /*
          * Lower 12 bits of 'address' encode the number of additional
          * pages to flush.
          */
         gva = entries[i] & PAGE_MASK;
-        for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
+        for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++) {
+            if (is_noncanonical_invlpg_address(gva + j * PAGE_SIZE, vcpu))
+                continue;
+
             kvm_x86_call(flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
+        }

         ++vcpu->stat.tlb_flush;
     }
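The reordering matters because one entry can name up to 4096 pages: checking only entries[i] let non-canonical addresses later in the expanded range reach flush_tlb_gva(). A standalone sketch of the entry encoding (the entry value is made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE (1ULL << PAGE_SHIFT)
    #define PAGE_MASK (~(PAGE_SIZE - 1))

    int main(void)
    {
        /*
         * Bits 63:12 carry the base GVA, bits 11:0 the number of
         * *additional* pages: 0x...042003 means GVA 0x...042000 plus
         * three more pages, four flushes in total.
         */
        uint64_t entry = 0xffff800000042003ULL;
        uint64_t gva = entry & PAGE_MASK;
        uint64_t extra = entry & ~PAGE_MASK;

        for (uint64_t j = 0; j <= extra; j++)
            printf("flush gva 0x%llx\n",
                   (unsigned long long)(gva + j * PAGE_SIZE));
        return 0;
    }

Each expanded address, not just the base, is what the fixed loop now validates before flushing.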
+2 -1
arch/x86/kvm/ioapic.c
···
     idx = srcu_read_lock(&kvm->irq_srcu);
     gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
     if (gsi != -1)
-        hlist_for_each_entry_rcu(kimn, &ioapic->mask_notifier_list, link)
+        hlist_for_each_entry_srcu(kimn, &ioapic->mask_notifier_list, link,
+                                  srcu_read_lock_held(&kvm->irq_srcu))
             if (kimn->irq == gsi)
                 kimn->func(kimn, mask);
     srcu_read_unlock(&kvm->irq_srcu, idx);
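lockdep flags the old iterator because the list is protected by kvm->irq_srcu, not classic RCU, and hlist_for_each_entry_rcu() asserts that rcu_read_lock() is held; the _srcu variant takes an explicit lockdep condition instead. A sketch of the general pattern with a hypothetical list (the iterator and SRCU helpers are the real kernel API):

    #include <linux/srcu.h>
    #include <linux/rculist.h>

    struct item {
        int id;
        struct hlist_node link;
    };

    static HLIST_HEAD(items);
    DEFINE_STATIC_SRCU(items_srcu);

    /*
     * Readers hold items_srcu, not rcu_read_lock(), so pass the matching
     * lockdep condition to the iterator for CONFIG_PROVE_RCU.
     */
    static void visit(int id)
    {
        struct item *it;
        int idx;

        idx = srcu_read_lock(&items_srcu);
        hlist_for_each_entry_srcu(it, &items, link,
                                  srcu_read_lock_held(&items_srcu))
            if (it->id == id)
                break;
        srcu_read_unlock(&items_srcu, idx);
    }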
+6 -3
arch/x86/kvm/svm/avic.c
···
     struct kvm_vcpu *vcpu = &svm->vcpu;

     vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
-
     vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
     vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
-
     vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+
+    svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);

     /*
      * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
···

     vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
     vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
+
+    if (!sev_es_guest(svm->vcpu.kvm))
+        svm_set_intercept(svm, INTERCEPT_CR8_WRITE);

     /*
      * If running nested and the guest uses its own MSR bitmap, there
···
     vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
     vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;

-    if (kvm_apicv_activated(svm->vcpu.kvm))
+    if (kvm_vcpu_apicv_active(&svm->vcpu))
         avic_activate_vmcb(svm);
     else
         avic_deactivate_vmcb(svm);
+10 -2
arch/x86/kvm/svm/nested.c
···
     return __nested_vmcb_check_controls(vcpu, ctl);
 }

+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu)
+{
+    if (!nested_vmcb_check_save(vcpu) ||
+        !nested_vmcb_check_controls(vcpu))
+        return -EINVAL;
+
+    return 0;
+}
+
 /*
  * If a feature is not advertised to L1, clear the corresponding vmcb12
  * intercept.
···
     nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
     nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);

-    if (!nested_vmcb_check_save(vcpu) ||
-        !nested_vmcb_check_controls(vcpu)) {
+    if (nested_svm_check_cached_vmcb12(vcpu) < 0) {
         vmcb12->control.exit_code = SVM_EXIT_ERR;
         vmcb12->control.exit_info_1 = 0;
         vmcb12->control.exit_info_2 = 0;
+11 -6
arch/x86/kvm/svm/svm.c
···
     svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
     svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
     svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
-    if (!kvm_vcpu_apicv_active(vcpu))
-        svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+    svm_set_intercept(svm, INTERCEPT_CR8_WRITE);

     set_dr_intercepts(svm);
···
     if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
         svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;

-    if (kvm_vcpu_apicv_active(vcpu))
+    if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
         avic_init_vmcb(svm, vmcb);

     if (vnmi)
···

 static int cr8_write_interception(struct kvm_vcpu *vcpu)
 {
+    u8 cr8_prev = kvm_get_cr8(vcpu);
     int r;

-    u8 cr8_prev = kvm_get_cr8(vcpu);
+    WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
+
     /* instruction emulation calls kvm_set_cr8() */
     r = cr_interception(vcpu);
     if (lapic_in_kernel(vcpu))
···
     vmcb12 = map.hva;
     nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
     nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-    ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, false);

-    if (ret)
+    if (nested_svm_check_cached_vmcb12(vcpu) < 0)
         goto unmap_save;

+    if (enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa,
+                             vmcb12, false) != 0)
+        goto unmap_save;
+
+    ret = 0;
     svm->nested.nested_run_pending = 1;

 unmap_save:
+1
arch/x86/kvm/svm/svm.h
···

 int nested_svm_exit_handled(struct vcpu_svm *svm);
 int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu);
 int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
                                bool has_error_code, u32 error_code);
 int nested_svm_exit_special(struct vcpu_svm *svm);
+45 -16
arch/x86/kvm/vmx/nested.c
···
     if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
         return -EINVAL;

-    if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
-        (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
-         CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
-        return -EINVAL;
+    if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) {
+        u64 debugctl = vmcs12->guest_ia32_debugctl;
+
+        /*
+         * FREEZE_IN_SMM is not virtualized, but allow L1 to set it in
+         * vmcs12's DEBUGCTL under a quirk for backwards compatibility.
+         * Note that the quirk only relaxes the consistency check. The
+         * vmcs02 bit is still under the control of the host. In
+         * particular, if a host administrator decides to clear the bit,
+         * then L1 has no say in the matter.
+         */
+        if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM))
+            debugctl &= ~DEBUGCTLMSR_FREEZE_IN_SMM;
+
+        if (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+            CC(!vmx_is_valid_debugctl(vcpu, debugctl, false)))
+            return -EINVAL;
+    }

     if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
         CC(!kvm_pat_valid(vmcs12->guest_ia32_pat)))
···
     free_nested(vcpu);
 }

+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu)
+{
+    enum vm_entry_failure_code ignored;
+    struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+    if (nested_cpu_has_shadow_vmcs(vmcs12) &&
+        vmcs12->vmcs_link_pointer != INVALID_GPA) {
+        struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
+
+        if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
+            !shadow_vmcs12->hdr.shadow_vmcs)
+            return -EINVAL;
+    }
+
+    if (nested_vmx_check_controls(vcpu, vmcs12) ||
+        nested_vmx_check_host_state(vcpu, vmcs12) ||
+        nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+        return -EINVAL;
+
+    return 0;
+}
+
 static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
                                 struct kvm_nested_state __user *user_kvm_nested_state,
                                 struct kvm_nested_state *kvm_state)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);
     struct vmcs12 *vmcs12;
-    enum vm_entry_failure_code ignored;
     struct kvm_vmx_nested_state_data __user *user_vmx_nested_state =
         &user_kvm_nested_state->data.vmx[0];
     int ret;
···
     vmx->nested.mtf_pending =
         !!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING);

-    ret = -EINVAL;
     if (nested_cpu_has_shadow_vmcs(vmcs12) &&
         vmcs12->vmcs_link_pointer != INVALID_GPA) {
         struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);

+        ret = -EINVAL;
         if (kvm_state->size <
             sizeof(*kvm_state) +
             sizeof(user_vmx_nested_state->vmcs12) + sizeof(*shadow_vmcs12))
             goto error_guest_mode;

+        ret = -EFAULT;
         if (copy_from_user(shadow_vmcs12,
                            user_vmx_nested_state->shadow_vmcs12,
-                           sizeof(*shadow_vmcs12))) {
-            ret = -EFAULT;
-            goto error_guest_mode;
-        }
-
-        if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
-            !shadow_vmcs12->hdr.shadow_vmcs)
+                           sizeof(*shadow_vmcs12)))
             goto error_guest_mode;
     }
···
             kvm_state->hdr.vmx.preemption_timer_deadline;
     }

-    if (nested_vmx_check_controls(vcpu, vmcs12) ||
-        nested_vmx_check_host_state(vcpu, vmcs12) ||
-        nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+    ret = nested_vmx_check_restored_vmcs12(vcpu);
+    if (ret < 0)
         goto error_guest_mode;

     vmx->nested.dirty_vmcs12 = true;
+1
arch/x86/kvm/vmx/nested.h
···
 void nested_vmx_hardware_unsetup(void);
 __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
 void nested_vmx_set_vmcs_shadowing_bitmap(void);
+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu);
 void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
 enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
                                                         bool from_vmentry);
+7 -3
arch/x86/kvm/vmx/vmx.c
···
     }

     vmx_add_auto_msr(&m->guest, msr, guest_val, VM_ENTRY_MSR_LOAD_COUNT, kvm);
-    vmx_add_auto_msr(&m->guest, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
+    vmx_add_auto_msr(&m->host, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
 }

 static bool update_transition_efer(struct vcpu_vmx *vmx)
···
     }

     if (vmx->nested.smm.guest_mode) {
+        /* Triple fault if the state is invalid. */
+        if (nested_vmx_check_restored_vmcs12(vcpu) < 0)
+            return 1;
+
         ret = nested_vmx_enter_non_root_mode(vcpu, false);
-        if (ret)
-            return ret;
+        if (ret != NVMX_VMENTRY_SUCCESS)
+            return 1;

         vmx->nested.nested_run_pending = 1;
         vmx->nested.smm.guest_mode = false;
+2 -2
arch/x86/kvm/x86.c
···
 bool __read_mostly enable_device_posted_irqs = true;
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_device_posted_irqs);

-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
     KVM_GENERIC_VM_STATS(),
     STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
     STATS_DESC_COUNTER(VM, mmu_pte_write),
···
     sizeof(kvm_vm_stats_desc),
 };

-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
     KVM_GENERIC_VCPU_STATS(),
     STATS_DESC_COUNTER(VCPU, pf_taken),
     STATS_DESC_COUNTER(VCPU, pf_fixed),
+35 -48
include/linux/kvm_host.h
···

 struct kvm_stat_data {
     struct kvm *kvm;
-    const struct _kvm_stats_desc *desc;
+    const struct kvm_stats_desc *desc;
     enum kvm_stat_kind kind;
 };

-struct _kvm_stats_desc {
-    struct kvm_stats_desc desc;
-    char name[KVM_STATS_NAME_SIZE];
-};
-
-#define STATS_DESC_COMMON(type, unit, base, exp, sz, bsz)              \
-    .flags = type | unit | base |                                      \
-             BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |          \
-             BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |          \
-             BUILD_BUG_ON_ZERO(base & ~KVM_STATS_BASE_MASK),           \
-    .exponent = exp,                                                   \
-    .size = sz,                                                        \
+#define STATS_DESC_COMMON(type, unit, base, exp, sz, bsz)       \
+    .flags = type | unit | base |                               \
+             BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |   \
+             BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |   \
+             BUILD_BUG_ON_ZERO(base & ~KVM_STATS_BASE_MASK),    \
+    .exponent = exp,                                            \
+    .size = sz,                                                 \
     .bucket_size = bsz

-#define VM_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)    \
-    {                                                                  \
-        {                                                              \
-            STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),         \
-            .offset = offsetof(struct kvm_vm_stat, generic.stat)       \
-        },                                                             \
-        .name = #stat,                                                 \
-    }
-#define VCPU_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)  \
-    {                                                                  \
-        {                                                              \
-            STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),         \
-            .offset = offsetof(struct kvm_vcpu_stat, generic.stat)     \
-        },                                                             \
-        .name = #stat,                                                 \
-    }
-#define VM_STATS_DESC(stat, type, unit, base, exp, sz, bsz)            \
-    {                                                                  \
-        {                                                              \
-            STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),         \
-            .offset = offsetof(struct kvm_vm_stat, stat)               \
-        },                                                             \
-        .name = #stat,                                                 \
-    }
-#define VCPU_STATS_DESC(stat, type, unit, base, exp, sz, bsz)          \
-    {                                                                  \
-        {                                                              \
-            STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),         \
-            .offset = offsetof(struct kvm_vcpu_stat, stat)             \
-        },                                                             \
-        .name = #stat,                                                 \
-    }
+#define VM_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)    \
+    {                                                                  \
+        STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),             \
+        .offset = offsetof(struct kvm_vm_stat, generic.stat),          \
+        .name = #stat,                                                 \
+    }
+#define VCPU_GENERIC_STATS_DESC(stat, type, unit, base, exp, sz, bsz)  \
+    {                                                                  \
+        STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),             \
+        .offset = offsetof(struct kvm_vcpu_stat, generic.stat),        \
+        .name = #stat,                                                 \
+    }
+#define VM_STATS_DESC(stat, type, unit, base, exp, sz, bsz)            \
+    {                                                                  \
+        STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),             \
+        .offset = offsetof(struct kvm_vm_stat, stat),                  \
+        .name = #stat,                                                 \
+    }
+#define VCPU_STATS_DESC(stat, type, unit, base, exp, sz, bsz)          \
+    {                                                                  \
+        STATS_DESC_COMMON(type, unit, base, exp, sz, bsz),             \
+        .offset = offsetof(struct kvm_vcpu_stat, stat),                \
+        .name = #stat,                                                 \
+    }
 /* SCOPE: VM, VM_GENERIC, VCPU, VCPU_GENERIC */
 #define STATS_DESC(SCOPE, stat, type, unit, base, exp, sz, bsz) \
     SCOPE##_STATS_DESC(stat, type, unit, base, exp, sz, bsz)
···
     STATS_DESC_IBOOLEAN(VCPU_GENERIC, blocking)

 ssize_t kvm_stats_read(char *id, const struct kvm_stats_header *header,
-                       const struct _kvm_stats_desc *desc,
+                       const struct kvm_stats_desc *desc,
                        void *stats, size_t size_stats,
                        char __user *user_buffer, size_t size, loff_t *offset);
···


 extern const struct kvm_stats_header kvm_vm_stats_header;
-extern const struct _kvm_stats_desc kvm_vm_stats_desc[];
+extern const struct kvm_stats_desc kvm_vm_stats_desc[];
 extern const struct kvm_stats_header kvm_vcpu_stats_header;
-extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[];
+extern const struct kvm_stats_desc kvm_vcpu_stats_desc[];

 static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
 {
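This rename is the -Wflex-array-member-not-at-end cleanup called out in the pull message: struct kvm_stats_desc ends in a flexible char name[], and the old _kvm_stats_desc wrapper placed that flexible member in the middle of an enclosing struct. A compilable sketch of what GCC (14+) warns about, with illustrative type names:

    /* gcc -Wall -Wflex-array-member-not-at-end -c sketch.c */
    struct desc {
        unsigned int flags;
        char name[];        /* flexible array member */
    };

    struct wrapper {
        struct desc d;      /* warning: structure containing a flexible
                             * array member is not at the end of another
                             * structure */
        char name[48];
    };

Giving the kernel-side struct a real, fixed-size name (the uapi kvm.h change just below) removes the need for the wrapper entirely.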
+8
include/uapi/linux/kvm.h
···
 #include <linux/ioctl.h>
 #include <asm/kvm.h>

+#ifdef __KERNEL__
+#include <linux/kvm_types.h>
+#endif
+
 #define KVM_API_VERSION 12

 /*
···
     __u16 size;
     __u32 offset;
     __u32 bucket_size;
+#ifdef __KERNEL__
+    char name[KVM_STATS_NAME_SIZE];
+#else
     char name[];
+#endif
 };

 #define KVM_GET_STATS_FD _IO(KVMIO, 0xce)
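The #ifdef preserves the userspace ABI: from outside the kernel, descriptor records remain variable-sized, with the actual name length published in the stats header rather than in the struct. A hypothetical reader over a stats fd from KVM_GET_STATS_FD, assuming only the documented header layout:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    static void dump_descs(int stats_fd)
    {
        struct kvm_stats_header hdr;
        size_t desc_sz;
        char *buf;
        __u32 i;

        if (pread(stats_fd, &hdr, sizeof(hdr), 0) != sizeof(hdr))
            return;

        /* Each record is the fixed struct plus name_size bytes of name,
         * which is why 'name' must stay a flexible tail for userspace. */
        desc_sz = sizeof(struct kvm_stats_desc) + hdr.name_size;
        buf = malloc(desc_sz * hdr.num_desc);
        if (!buf)
            return;

        if (pread(stats_fd, buf, desc_sz * hdr.num_desc, hdr.desc_offset) > 0)
            for (i = 0; i < hdr.num_desc; i++) {
                struct kvm_stats_desc *d = (void *)(buf + i * desc_sz);

                printf("%s (offset %u)\n", d->name, d->offset);
            }
        free(buf);
    }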
+1
tools/testing/selftests/kvm/Makefile.kvm
···
 TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
 TEST_GEN_PROGS_x86 += x86/feature_msrs_test
+TEST_GEN_PROGS_x86 += x86/evmcs_smm_controls_test
 TEST_GEN_PROGS_x86 += x86/exit_on_emulation_failure_test
 TEST_GEN_PROGS_x86 += x86/fastops_test
 TEST_GEN_PROGS_x86 += x86/fix_hypercall_test
+1 -1
tools/testing/selftests/kvm/guest_memfd_test.c
···
 {
     const unsigned long nodemask_0 = 1; /* nid: 0 */
     unsigned long nodemask = 0;
-    unsigned long maxnode = 8;
+    unsigned long maxnode = BITS_PER_TYPE(nodemask);
     int policy;
     char *mem;
     int ret;
+23
tools/testing/selftests/kvm/include/x86/processor.h
···
     return cr0;
 }

+static inline void set_cr0(uint64_t val)
+{
+    __asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
+}
+
 static inline uint64_t get_cr3(void)
 {
     uint64_t cr3;
···
     __asm__ __volatile__("mov %%cr3, %[cr3]"
                          : /* output */ [cr3]"=r"(cr3));
     return cr3;
+}
+
+static inline void set_cr3(uint64_t val)
+{
+    __asm__ __volatile__("mov %0, %%cr3" : : "r" (val) : "memory");
 }

 static inline uint64_t get_cr4(void)
···
 static inline void set_cr4(uint64_t val)
 {
     __asm__ __volatile__("mov %0, %%cr4" : : "r" (val) : "memory");
+}
+
+static inline uint64_t get_cr8(void)
+{
+    uint64_t cr8;
+
+    __asm__ __volatile__("mov %%cr8, %[cr8]" : [cr8]"=r"(cr8));
+    return cr8;
+}
+
+static inline void set_cr8(uint64_t val)
+{
+    __asm__ __volatile__("mov %0, %%cr8" : : "r" (val) : "memory");
 }

 static inline void set_idt(const struct desc_ptr *idt_desc)
+17
tools/testing/selftests/kvm/include/x86/smm.h
···
+// SPDX-License-Identifier: GPL-2.0-only
+#ifndef SELFTEST_KVM_SMM_H
+#define SELFTEST_KVM_SMM_H
+
+#include "kvm_util.h"
+
+#define SMRAM_SIZE 65536
+#define SMRAM_MEMSLOT ((1 << 16) | 1)
+#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
+
+void setup_smram(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+                 uint64_t smram_gpa,
+                 const void *smi_handler, size_t handler_size);
+
+void inject_smi(struct kvm_vcpu *vcpu);
+
+#endif /* SELFTEST_KVM_SMM_H */
+26
tools/testing/selftests/kvm/lib/x86/processor.c
···
 #include "kvm_util.h"
 #include "pmu.h"
 #include "processor.h"
+#include "smm.h"
 #include "svm_util.h"
 #include "sev.h"
 #include "vmx.h"
···
 bool kvm_arch_has_default_irqchip(void)
 {
     return true;
+}
+
+void setup_smram(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+                 uint64_t smram_gpa,
+                 const void *smi_handler, size_t handler_size)
+{
+    vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, smram_gpa,
+                                SMRAM_MEMSLOT, SMRAM_PAGES, 0);
+    TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, smram_gpa,
+                                   SMRAM_MEMSLOT) == smram_gpa,
+                "Could not allocate guest physical addresses for SMRAM");
+
+    memset(addr_gpa2hva(vm, smram_gpa), 0x0, SMRAM_SIZE);
+    memcpy(addr_gpa2hva(vm, smram_gpa) + 0x8000, smi_handler, handler_size);
+    vcpu_set_msr(vcpu, MSR_IA32_SMBASE, smram_gpa);
+}
+
+void inject_smi(struct kvm_vcpu *vcpu)
+{
+    struct kvm_vcpu_events events;
+
+    vcpu_events_get(vcpu, &events);
+    events.smi.pending = 1;
+    events.flags |= KVM_VCPUEVENT_VALID_SMM;
+    vcpu_events_set(vcpu, &events);
 }
+150
tools/testing/selftests/kvm/x86/evmcs_smm_controls_test.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2026, Red Hat, Inc.
+ *
+ * Test that vmx_leave_smm() validates vmcs12 controls before re-entering
+ * nested guest mode on RSM.
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "smm.h"
+#include "hyperv.h"
+#include "vmx.h"
+
+#define SMRAM_GPA 0x1000000
+#define SMRAM_STAGE 0xfe
+
+#define SYNC_PORT 0xe
+
+#define STR(x) #x
+#define XSTR(s) STR(s)
+
+/*
+ * SMI handler: runs in real-address mode.
+ * Reports SMRAM_STAGE via port IO, then does RSM.
+ */
+static uint8_t smi_handler[] = {
+    0xb0, SMRAM_STAGE,    /* mov $SMRAM_STAGE, %al */
+    0xe4, SYNC_PORT,      /* in $SYNC_PORT, %al */
+    0x0f, 0xaa,           /* rsm */
+};
+
+static inline void sync_with_host(uint64_t phase)
+{
+    asm volatile("in $" XSTR(SYNC_PORT) ", %%al \n"
+                 : "+a" (phase));
+}
+
+static void l2_guest_code(void)
+{
+    sync_with_host(1);
+
+    /* After SMI+RSM with invalid controls, we should not reach here. */
+    vmcall();
+}
+
+static void guest_code(struct vmx_pages *vmx_pages,
+                       struct hyperv_test_pages *hv_pages)
+{
+#define L2_GUEST_STACK_SIZE 64
+    unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+    /* Set up Hyper-V enlightenments and eVMCS */
+    wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+    enable_vp_assist(hv_pages->vp_assist_gpa, hv_pages->vp_assist);
+    evmcs_enable();
+
+    GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+    GUEST_ASSERT(load_evmcs(hv_pages));
+    prepare_vmcs(vmx_pages, l2_guest_code,
+                 &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+    GUEST_ASSERT(!vmlaunch());
+
+    /* L2 exits via vmcall if test fails */
+    sync_with_host(2);
+}
+
+int main(int argc, char *argv[])
+{
+    vm_vaddr_t vmx_pages_gva = 0, hv_pages_gva = 0;
+    struct hyperv_test_pages *hv;
+    struct hv_enlightened_vmcs *evmcs;
+    struct kvm_vcpu *vcpu;
+    struct kvm_vm *vm;
+    struct kvm_regs regs;
+    int stage_reported;
+
+    TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+    TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
+    TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS));
+    TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_SMM));
+
+    vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+    setup_smram(vm, vcpu, SMRAM_GPA, smi_handler, sizeof(smi_handler));
+
+    vcpu_set_hv_cpuid(vcpu);
+    vcpu_enable_evmcs(vcpu);
+    vcpu_alloc_vmx(vm, &vmx_pages_gva);
+    hv = vcpu_alloc_hyperv_test_pages(vm, &hv_pages_gva);
+    vcpu_args_set(vcpu, 2, vmx_pages_gva, hv_pages_gva);
+
+    vcpu_run(vcpu);
+
+    /* L2 is running and syncs with host. */
+    TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+    vcpu_regs_get(vcpu, &regs);
+    stage_reported = regs.rax & 0xff;
+    TEST_ASSERT(stage_reported == 1,
+                "Expected stage 1, got %d", stage_reported);
+
+    /* Inject SMI while L2 is running. */
+    inject_smi(vcpu);
+    vcpu_run(vcpu);
+    TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+    vcpu_regs_get(vcpu, &regs);
+    stage_reported = regs.rax & 0xff;
+    TEST_ASSERT(stage_reported == SMRAM_STAGE,
+                "Expected SMM handler stage %#x, got %#x",
+                SMRAM_STAGE, stage_reported);
+
+    /*
+     * Guest is now paused in the SMI handler, about to execute RSM.
+     * Hack the eVMCS page to set-up invalid pin-based execution
+     * control (PIN_BASED_VIRTUAL_NMIS without PIN_BASED_NMI_EXITING).
+     */
+    evmcs = hv->enlightened_vmcs_hva;
+    evmcs->pin_based_vm_exec_control |= PIN_BASED_VIRTUAL_NMIS;
+    evmcs->hv_clean_fields = 0;
+
+    /*
+     * Trigger copy_enlightened_to_vmcs12() via KVM_GET_NESTED_STATE,
+     * copying the invalid pin_based_vm_exec_control into cached_vmcs12.
+     */
+    union {
+        struct kvm_nested_state state;
+        char state_[16384];
+    } nested_state_buf;
+
+    memset(&nested_state_buf, 0, sizeof(nested_state_buf));
+    nested_state_buf.state.size = sizeof(nested_state_buf);
+    vcpu_nested_state_get(vcpu, &nested_state_buf.state);
+
+    /*
+     * Resume the guest. The SMI handler executes RSM, which calls
+     * vmx_leave_smm(). nested_vmx_check_controls() should detect
+     * VIRTUAL_NMIS without NMI_EXITING and cause a triple fault.
+     */
+    vcpu_run(vcpu);
+    TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SHUTDOWN);
+
+    kvm_vm_free(vm);
+    return 0;
+}
+30
tools/testing/selftests/kvm/x86/sev_smoke_test.c
···
 #include "linux/psp-sev.h"
 #include "sev.h"

+static void guest_sev_test_msr(uint32_t msr)
+{
+    uint64_t val = rdmsr(msr);
+
+    wrmsr(msr, val);
+    GUEST_ASSERT(val == rdmsr(msr));
+}
+
+#define guest_sev_test_reg(reg)             \
+do {                                        \
+    uint64_t val = get_##reg();             \
+                                            \
+    set_##reg(val);                         \
+    GUEST_ASSERT(val == get_##reg());       \
+} while (0)
+
+static void guest_sev_test_regs(void)
+{
+    guest_sev_test_msr(MSR_EFER);
+    guest_sev_test_reg(cr0);
+    guest_sev_test_reg(cr3);
+    guest_sev_test_reg(cr4);
+    guest_sev_test_reg(cr8);
+}

 #define XFEATURE_MASK_X87_AVX (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)
···
     GUEST_ASSERT(sev_msr & MSR_AMD64_SEV_ES_ENABLED);
     GUEST_ASSERT(sev_msr & MSR_AMD64_SEV_SNP_ENABLED);

+    guest_sev_test_regs();
+
     wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_TERM_REQ);
     vmgexit();
 }
···
     /* TODO: Check CPUID after GHCB-based hypercall support is added. */
     GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ENABLED);
     GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ES_ENABLED);

+    guest_sev_test_regs();

     /*
      * TODO: Add GHCB and ucall support for SEV-ES guests. For now, simply
···
 {
     GUEST_ASSERT(this_cpu_has(X86_FEATURE_SEV));
     GUEST_ASSERT(rdmsr(MSR_AMD64_SEV) & MSR_AMD64_SEV_ENABLED);

+    guest_sev_test_regs();

     GUEST_DONE();
 }
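The guest_sev_test_reg() macro relies on token pasting so one definition exercises every control register that has get_/set_ accessors (including the new cr8 helpers added to processor.h above). A standalone illustration of the same pattern, with made-up accessors:

    #include <stdio.h>

    static unsigned long reg_storage;
    static unsigned long get_cr0(void) { return reg_storage; }
    static void set_cr0(unsigned long val) { reg_storage = val; }

    /* ##reg pastes the argument into the accessor names, so test_reg(cr0)
     * expands to calls on get_cr0()/set_cr0(). */
    #define test_reg(reg)                       \
    do {                                        \
        unsigned long val = get_##reg();        \
                                                \
        set_##reg(val);                         \
        if (val != get_##reg())                 \
            printf(#reg " mismatch\n");         \
    } while (0)

    int main(void)
    {
        test_reg(cr0);
        return 0;
    }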
+2 -25
tools/testing/selftests/kvm/x86/smm_test.c
···
 #include "test_util.h"

 #include "kvm_util.h"
+#include "smm.h"

 #include "vmx.h"
 #include "svm_util.h"

-#define SMRAM_SIZE 65536
-#define SMRAM_MEMSLOT ((1 << 16) | 1)
-#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
 #define SMRAM_GPA 0x1000000
 #define SMRAM_STAGE 0xfe
···
     sync_with_host(DONE);
 }

-void inject_smi(struct kvm_vcpu *vcpu)
-{
-    struct kvm_vcpu_events events;
-
-    vcpu_events_get(vcpu, &events);
-
-    events.smi.pending = 1;
-    events.flags |= KVM_VCPUEVENT_VALID_SMM;
-
-    vcpu_events_set(vcpu, &events);
-}
-
 int main(int argc, char *argv[])
 {
     vm_vaddr_t nested_gva = 0;
···
     /* Create VM */
     vm = vm_create_with_one_vcpu(&vcpu, guest_code);

-    vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
-                                SMRAM_MEMSLOT, SMRAM_PAGES, 0);
-    TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)
-                == SMRAM_GPA, "could not allocate guest physical addresses?");
-
-    memset(addr_gpa2hva(vm, SMRAM_GPA), 0x0, SMRAM_SIZE);
-    memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler,
-           sizeof(smi_handler));
-
-    vcpu_set_msr(vcpu, MSR_IA32_SMBASE, SMRAM_GPA);
+    setup_smram(vm, vcpu, SMRAM_GPA, smi_handler, sizeof(smi_handler));

     if (kvm_has_cap(KVM_CAP_NESTED_STATE)) {
         if (kvm_cpu_has(X86_FEATURE_SVM))
+1 -1
virt/kvm/binary_stats.c
···
  * Return: the number of bytes that has been successfully read
  */
 ssize_t kvm_stats_read(char *id, const struct kvm_stats_header *header,
-                       const struct _kvm_stats_desc *desc,
+                       const struct kvm_stats_desc *desc,
                        void *stats, size_t size_stats,
                        char __user *user_buffer, size_t size, loff_t *offset)
 {
+10 -10
virt/kvm/kvm_main.c
···
         kvm_free_memslot(kvm, memslot);
 }

-static umode_t kvm_stats_debugfs_mode(const struct _kvm_stats_desc *pdesc)
+static umode_t kvm_stats_debugfs_mode(const struct kvm_stats_desc *desc)
 {
-    switch (pdesc->desc.flags & KVM_STATS_TYPE_MASK) {
+    switch (desc->flags & KVM_STATS_TYPE_MASK) {
     case KVM_STATS_TYPE_INSTANT:
         return 0444;
     case KVM_STATS_TYPE_CUMULATIVE:
···
     struct dentry *dent;
     char dir_name[ITOA_MAX_LEN * 2];
     struct kvm_stat_data *stat_data;
-    const struct _kvm_stats_desc *pdesc;
+    const struct kvm_stats_desc *pdesc;
     int i, ret = -ENOMEM;
     int kvm_debugfs_num_entries = kvm_vm_stats_header.num_desc +
                                   kvm_vcpu_stats_header.num_desc;
···
     switch (stat_data->kind) {
     case KVM_STAT_VM:
         r = kvm_get_stat_per_vm(stat_data->kvm,
-                                stat_data->desc->desc.offset, val);
+                                stat_data->desc->offset, val);
         break;
     case KVM_STAT_VCPU:
         r = kvm_get_stat_per_vcpu(stat_data->kvm,
-                                  stat_data->desc->desc.offset, val);
+                                  stat_data->desc->offset, val);
         break;
     }
···
     switch (stat_data->kind) {
     case KVM_STAT_VM:
         r = kvm_clear_stat_per_vm(stat_data->kvm,
-                                  stat_data->desc->desc.offset);
+                                  stat_data->desc->offset);
         break;
     case KVM_STAT_VCPU:
         r = kvm_clear_stat_per_vcpu(stat_data->kvm,
-                                    stat_data->desc->desc.offset);
+                                    stat_data->desc->offset);
         break;
     }
···
 static void kvm_init_debug(void)
 {
     const struct file_operations *fops;
-    const struct _kvm_stats_desc *pdesc;
+    const struct kvm_stats_desc *pdesc;
     int i;

     kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
···
             fops = &vm_stat_readonly_fops;
         debugfs_create_file(pdesc->name, kvm_stats_debugfs_mode(pdesc),
                             kvm_debugfs_dir,
-                            (void *)(long)pdesc->desc.offset, fops);
+                            (void *)(long)pdesc->offset, fops);
     }

     for (i = 0; i < kvm_vcpu_stats_header.num_desc; ++i) {
···
             fops = &vcpu_stat_readonly_fops;
         debugfs_create_file(pdesc->name, kvm_stats_debugfs_mode(pdesc),
                             kvm_debugfs_dir,
-                            (void *)(long)pdesc->desc.offset, fops);
+                            (void *)(long)pdesc->offset, fops);
     }
 }