Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Changes that were posted too late for 6.1, or after the release.

x86:

- several fixes to nested VMX execution controls

- fixes and clarification to the documentation for Xen emulation

- do not unnecessarily release a pmu event with zero period

- MMU fixes

- fix Coverity warning in kvm_hv_flush_tlb()

selftests:

- fixes for the ucall mechanism in selftests

- other fixes mostly related to compilation with clang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (41 commits)
KVM: selftests: restore special vmmcall code layout needed by the harness
Documentation: kvm: clarify SRCU locking order
KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET
KVM: x86/xen: Documentation updates and clarifications
KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi
KVM: x86/xen: Simplify eventfd IOCTLs
KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports
KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly
KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()
KVM: Delete extra block of "};" in the KVM API documentation
kvm: x86/mmu: Remove duplicated "be split" in spte.h
kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()
MAINTAINERS: adjust entry after renaming the vmx hyperv files
KVM: selftests: Mark correct page as mapped in virt_map()
KVM: arm64: selftests: Don't identity map the ucall MMIO hole
KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap
KVM: selftests: Use magic value to signal ucall_alloc() failure
KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning
KVM: selftests: Include lib.mk before consuming $(CC)
KVM: selftests: Explicitly disable builtins for mem*() overrides
...

+290 -287
+26 -20
Documentation/virt/kvm/api.rst
··· 5343 5343 32 vCPUs in the shared_info page, KVM does not automatically do so 5344 5344 and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used 5345 5345 explicitly even when the vcpu_info for a given vCPU resides at the 5346 - "default" location in the shared_info page. This is because KVM is 5347 - not aware of the Xen CPU id which is used as the index into the 5348 - vcpu_info[] array, so cannot know the correct default location. 5346 + "default" location in the shared_info page. This is because KVM may 5347 + not be aware of the Xen CPU id which is used as the index into the 5348 + vcpu_info[] array, so may know the correct default location. 5349 5349 5350 5350 Note that the shared info page may be constantly written to by KVM; 5351 5351 it contains the event channel bitmap used to deliver interrupts to ··· 5356 5356 any vCPU has been running or any event channel interrupts can be 5357 5357 routed to the guest. 5358 5358 5359 + Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info 5360 + page. 5361 + 5359 5362 KVM_XEN_ATTR_TYPE_UPCALL_VECTOR 5360 5363 Sets the exception vector used to deliver Xen event channel upcalls. 5361 5364 This is the HVM-wide vector injected directly by the hypervisor 5362 5365 (not through the local APIC), typically configured by a guest via 5363 - HVM_PARAM_CALLBACK_IRQ. 5366 + HVM_PARAM_CALLBACK_IRQ. This can be disabled again (e.g. for guest 5367 + SHUTDOWN_soft_reset) by setting it to zero. 5364 5368 5365 5369 KVM_XEN_ATTR_TYPE_EVTCHN 5366 5370 This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates 5367 5371 support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures 5368 5372 an outbound port number for interception of EVTCHNOP_send requests 5369 - from the guest. A given sending port number may be directed back 5370 - to a specified vCPU (by APIC ID) / port / priority on the guest, 5371 - or to trigger events on an eventfd. 
The vCPU and priority can be 5372 - changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, 5373 - but other fields cannot change for a given sending port. A port 5374 - mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags 5375 - field. 5373 + from the guest. A given sending port number may be directed back to 5374 + a specified vCPU (by APIC ID) / port / priority on the guest, or to 5375 + trigger events on an eventfd. The vCPU and priority can be changed 5376 + by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but but other 5377 + fields cannot change for a given sending port. A port mapping is 5378 + removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags field. Passing 5379 + KVM_XEN_EVTCHN_RESET in the flags field removes all interception of 5380 + outbound event channels. The values of the flags field are mutually 5381 + exclusive and cannot be combined as a bitmask. 5376 5382 5377 5383 KVM_XEN_ATTR_TYPE_XEN_VERSION 5378 5384 This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates ··· 5394 5388 support for KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG. It enables the 5395 5389 XEN_RUNSTATE_UPDATE flag which allows guest vCPUs to safely read 5396 5390 other vCPUs' vcpu_runstate_info. Xen guests enable this feature via 5397 - the VM_ASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist 5391 + the VMASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist 5398 5392 hypercall. 5399 5393 5400 5394 4.127 KVM_XEN_HVM_GET_ATTR ··· 5452 5446 As with the shared_info page for the VM, the corresponding page may be 5453 5447 dirtied at any time if event channel interrupt delivery is enabled, so 5454 5448 userspace should always assume that the page is dirty without relying 5455 - on dirty logging. 5449 + on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable 5450 + the vcpu_info. 
5456 5451 5457 5452 KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO 5458 5453 Sets the guest physical address of an additional pvclock structure 5459 5454 for a given vCPU. This is typically used for guest vsyscall support. 5455 + Setting the gpa to KVM_XEN_INVALID_GPA will disable the structure. 5460 5456 5461 5457 KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR 5462 5458 Sets the guest physical address of the vcpu_runstate_info for a given 5463 5459 vCPU. This is how a Xen guest tracks CPU state such as steal time. 5460 + Setting the gpa to KVM_XEN_INVALID_GPA will disable the runstate area. 5464 5461 5465 5462 KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT 5466 5463 Sets the runstate (RUNSTATE_running/_runnable/_blocked/_offline) of ··· 5496 5487 This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates 5497 5488 support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the 5498 5489 event channel port/priority for the VIRQ_TIMER of the vCPU, as well 5499 - as allowing a pending timer to be saved/restored. 5490 + as allowing a pending timer to be saved/restored. Setting the timer 5491 + port to zero disables kernel handling of the singleshot timer. 5500 5492 5501 5493 KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR 5502 5494 This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates ··· 5505 5495 per-vCPU local APIC upcall vector, configured by a Xen guest with 5506 5496 the HVMOP_set_evtchn_upcall_vector hypercall. This is typically 5507 5497 used by Windows guests, and is distinct from the HVM-wide upcall 5508 - vector configured with HVM_PARAM_CALLBACK_IRQ. 5498 + vector configured with HVM_PARAM_CALLBACK_IRQ. It is disabled by 5499 + setting the vector to zero. 5509 5500 5510 5501 5511 5502 4.129 KVM_XEN_VCPU_GET_ATTR ··· 6587 6576 Please note that the kernel is allowed to use the kvm_run structure as the 6588 6577 primary storage for certain register types. 
Therefore, the kernel may use the 6589 6578 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. 6590 - 6591 - :: 6592 - 6593 - }; 6594 - 6595 6579 6596 6580 6597 6581 6. Capabilities that can be enabled on vCPUs
+14 -5
Documentation/virt/kvm/locking.rst
@@ -16,16 +16,25 @@
 - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
   them together is quite rare.
 
-- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
-  synchronize_srcu(&kvm->srcu). Therefore kvm->slots_arch_lock
-  can be taken inside a kvm->srcu read-side critical section,
-  while kvm->slots_lock cannot.
-
 - kvm->mn_active_invalidate_count ensures that pairs of
   invalidate_range_start() and invalidate_range_end() callbacks
   use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock
   are taken on the waiting side in install_new_memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
+
+For SRCU:
+
+- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
+  the kvm->slots_lock critical section, therefore kvm->slots_lock
+  cannot be taken inside a kvm->srcu read-side critical section.
+  Instead, kvm->slots_arch_lock is released before the call
+  to ``synchronize_srcu()`` and _can_ be taken inside a
+  kvm->srcu read-side critical section.
+
+- kvm->lock is taken inside kvm->srcu, therefore
+  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
+  a kvm->lock critical section. If you cannot delay the
+  call until after kvm->lock is released, use ``call_srcu``.
 
 On x86:
 
+1 -1
MAINTAINERS
@@ -11468,7 +11468,7 @@
 F:	arch/x86/kvm/kvm_onhyperv.*
 F:	arch/x86/kvm/svm/hyperv.*
 F:	arch/x86/kvm/svm/svm_onhyperv.*
-F:	arch/x86/kvm/vmx/evmcs.*
+F:	arch/x86/kvm/vmx/hyperv.*
 
 KVM X86 Xen (KVM/Xen)
 M:	David Woodhouse <dwmw2@infradead.org>
+36 -27
arch/x86/kvm/hyperv.c
··· 1769 1769 } 1770 1770 1771 1771 struct kvm_hv_hcall { 1772 + /* Hypercall input data */ 1772 1773 u64 param; 1773 1774 u64 ingpa; 1774 1775 u64 outgpa; ··· 1780 1779 bool fast; 1781 1780 bool rep; 1782 1781 sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS]; 1782 + 1783 + /* 1784 + * Current read offset when KVM reads hypercall input data gradually, 1785 + * either offset in bytes from 'ingpa' for regular hypercalls or the 1786 + * number of already consumed 'XMM halves' for 'fast' hypercalls. 1787 + */ 1788 + union { 1789 + gpa_t data_offset; 1790 + int consumed_xmm_halves; 1791 + }; 1783 1792 }; 1784 1793 1785 1794 1786 1795 static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc, 1787 - u16 orig_cnt, u16 cnt_cap, u64 *data, 1788 - int consumed_xmm_halves, gpa_t offset) 1796 + u16 orig_cnt, u16 cnt_cap, u64 *data) 1789 1797 { 1790 1798 /* 1791 1799 * Preserve the original count when ignoring entries via a "cap", KVM ··· 1809 1799 * Each XMM holds two sparse banks, but do not count halves that 1810 1800 * have already been consumed for hypercall parameters. 
1811 1801 */ 1812 - if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves) 1802 + if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - hc->consumed_xmm_halves) 1813 1803 return HV_STATUS_INVALID_HYPERCALL_INPUT; 1814 1804 1815 1805 for (i = 0; i < cnt; i++) { 1816 - j = i + consumed_xmm_halves; 1806 + j = i + hc->consumed_xmm_halves; 1817 1807 if (j % 2) 1818 1808 data[i] = sse128_hi(hc->xmm[j / 2]); 1819 1809 else ··· 1822 1812 return 0; 1823 1813 } 1824 1814 1825 - return kvm_read_guest(kvm, hc->ingpa + offset, data, 1815 + return kvm_read_guest(kvm, hc->ingpa + hc->data_offset, data, 1826 1816 cnt * sizeof(*data)); 1827 1817 } 1828 1818 1829 1819 static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc, 1830 - u64 *sparse_banks, int consumed_xmm_halves, 1831 - gpa_t offset) 1820 + u64 *sparse_banks) 1832 1821 { 1833 1822 if (hc->var_cnt > HV_MAX_SPARSE_VCPU_BANKS) 1834 1823 return -EINVAL; 1835 1824 1836 1825 /* Cap var_cnt to ignore banks that cannot contain a legal VP index. 
*/ 1837 1826 return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS, 1838 - sparse_banks, consumed_xmm_halves, offset); 1827 + sparse_banks); 1839 1828 } 1840 1829 1841 - static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[], 1842 - int consumed_xmm_halves, gpa_t offset) 1830 + static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[]) 1843 1831 { 1844 - return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt, 1845 - entries, consumed_xmm_halves, offset); 1832 + return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt, entries); 1846 1833 } 1847 1834 1848 1835 static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu, ··· 1933 1926 struct kvm_vcpu *v; 1934 1927 unsigned long i; 1935 1928 bool all_cpus; 1936 - int consumed_xmm_halves = 0; 1937 - gpa_t data_offset; 1938 1929 1939 1930 /* 1940 1931 * The Hyper-V TLFS doesn't allow more than HV_MAX_SPARSE_VCPU_BANKS ··· 1960 1955 flush.address_space = hc->ingpa; 1961 1956 flush.flags = hc->outgpa; 1962 1957 flush.processor_mask = sse128_lo(hc->xmm[0]); 1963 - consumed_xmm_halves = 1; 1958 + hc->consumed_xmm_halves = 1; 1964 1959 } else { 1965 1960 if (unlikely(kvm_read_guest(kvm, hc->ingpa, 1966 1961 &flush, sizeof(flush)))) 1967 1962 return HV_STATUS_INVALID_HYPERCALL_INPUT; 1968 - data_offset = sizeof(flush); 1963 + hc->data_offset = sizeof(flush); 1969 1964 } 1970 1965 1971 1966 trace_kvm_hv_flush_tlb(flush.processor_mask, ··· 1990 1985 flush_ex.flags = hc->outgpa; 1991 1986 memcpy(&flush_ex.hv_vp_set, 1992 1987 &hc->xmm[0], sizeof(hc->xmm[0])); 1993 - consumed_xmm_halves = 2; 1988 + hc->consumed_xmm_halves = 2; 1994 1989 } else { 1995 1990 if (unlikely(kvm_read_guest(kvm, hc->ingpa, &flush_ex, 1996 1991 sizeof(flush_ex)))) 1997 1992 return HV_STATUS_INVALID_HYPERCALL_INPUT; 1998 - data_offset = sizeof(flush_ex); 1993 + hc->data_offset = sizeof(flush_ex); 1999 1994 } 2000 1995 2001 1996 
trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask, ··· 2014 2009 if (!hc->var_cnt) 2015 2010 goto ret_success; 2016 2011 2017 - if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2018 - consumed_xmm_halves, data_offset)) 2012 + if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks)) 2019 2013 return HV_STATUS_INVALID_HYPERCALL_INPUT; 2020 2014 } 2021 2015 ··· 2025 2021 * consumed_xmm_halves to make sure TLB flush entries are read 2026 2022 * from the correct offset. 2027 2023 */ 2028 - data_offset += hc->var_cnt * sizeof(sparse_banks[0]); 2029 - consumed_xmm_halves += hc->var_cnt; 2024 + if (hc->fast) 2025 + hc->consumed_xmm_halves += hc->var_cnt; 2026 + else 2027 + hc->data_offset += hc->var_cnt * sizeof(sparse_banks[0]); 2030 2028 } 2031 2029 2032 2030 if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE || ··· 2036 2030 hc->rep_cnt > ARRAY_SIZE(__tlb_flush_entries)) { 2037 2031 tlb_flush_entries = NULL; 2038 2032 } else { 2039 - if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries, 2040 - consumed_xmm_halves, data_offset)) 2033 + if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries)) 2041 2034 return HV_STATUS_INVALID_HYPERCALL_INPUT; 2042 2035 tlb_flush_entries = __tlb_flush_entries; 2043 2036 } ··· 2185 2180 if (!hc->var_cnt) 2186 2181 goto ret_success; 2187 2182 2188 - if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1, 2189 - offsetof(struct hv_send_ipi_ex, 2190 - vp_set.bank_contents))) 2183 + if (!hc->fast) 2184 + hc->data_offset = offsetof(struct hv_send_ipi_ex, 2185 + vp_set.bank_contents); 2186 + else 2187 + hc->consumed_xmm_halves = 1; 2188 + 2189 + if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks)) 2191 2190 return HV_STATUS_INVALID_HYPERCALL_INPUT; 2192 2191 } 2193 2192
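The hyperv.c change above moves the read cursor into the hypercall structure itself, as a union of a byte offset and a count of consumed XMM halves. The following is a minimal userspace sketch of the "fast" path only, under stated assumptions: `struct hcall`, `struct xmm128`, `get_fast_hc_data()` and `MAX_XMM` are hypothetical stand-ins for illustration, not kernel definitions, and the register count is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_XMM 6 /* assumed number of XMM registers carrying input */

/* Stand-in for a 128-bit register split into two 64-bit halves. */
struct xmm128 {
	uint64_t lo, hi;
};

/* Hypothetical, simplified analogue of the hypercall descriptor. */
struct hcall {
	bool fast;
	struct xmm128 xmm[MAX_XMM];
	/* Read cursor: only one member is meaningful, selected by 'fast'. */
	union {
		uint64_t data_offset;    /* bytes already read from guest memory */
		int consumed_xmm_halves; /* 64-bit XMM halves already consumed */
	};
};

/*
 * Copy 'cnt' 64-bit values for a fast hypercall, starting at the first
 * half that has not been consumed yet. Returns 0 on success, or -1 if
 * the request would run past the last register.
 */
static int get_fast_hc_data(const struct hcall *hc, uint64_t *data, int cnt)
{
	assert(hc->fast);

	if (cnt > 2 * MAX_XMM - hc->consumed_xmm_halves)
		return -1;

	for (int i = 0; i < cnt; i++) {
		int j = i + hc->consumed_xmm_halves;

		/* Odd index: high half of register j / 2; even index: low half. */
		data[i] = (j % 2) ? hc->xmm[j / 2].hi : hc->xmm[j / 2].lo;
	}
	return 0;
}
```

Keeping the cursor in the descriptor means a helper needs only the descriptor and an output buffer, which is the same simplification the diff applies to its helper signatures.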
+3 -2
arch/x86/kvm/irq_comm.c
@@ -426,8 +426,9 @@
 			kvm_set_msi_irq(vcpu->kvm, entry, &irq);
 
 			if (irq.trig_mode &&
-			    kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
-						irq.dest_id, irq.dest_mode))
+			    (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
+						 irq.dest_id, irq.dest_mode) ||
+			     kvm_apic_pending_eoi(vcpu, irq.vector)))
 				__set_bit(irq.vector, ioapic_handled_vectors);
 		}
 	}
+2 -2
arch/x86/kvm/lapic.h
@@ -188,11 +188,11 @@
 
 extern struct static_key_false_deferred apic_hw_disabled;
 
-static inline int kvm_apic_hw_enabled(struct kvm_lapic *apic)
+static inline bool kvm_apic_hw_enabled(struct kvm_lapic *apic)
 {
 	if (static_branch_unlikely(&apic_hw_disabled.key))
 		return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
-	return MSR_IA32_APICBASE_ENABLE;
+	return true;
 }
 
 extern struct static_key_false_deferred apic_sw_disabled;
+1 -1
arch/x86/kvm/mmu/spte.h
@@ -363,7 +363,7 @@
  * A shadow-present leaf SPTE may be non-writable for 4 possible reasons:
  *
  * 1. To intercept writes for dirty logging. KVM write-protects huge pages
- *    so that they can be split be split down into the dirty logging
+ *    so that they can be split down into the dirty logging
  *    granularity (4KiB) whenever the guest writes to them. KVM also
  *    write-protects 4KiB pages so that writes can be recorded in the dirty log
  *    (e.g. if not using PML). SPTEs are write-protected for dirty logging
+18 -7
arch/x86/kvm/mmu/tdp_mmu.c
··· 1074 1074 int ret = RET_PF_FIXED; 1075 1075 bool wrprot = false; 1076 1076 1077 - WARN_ON(sp->role.level != fault->goal_level); 1077 + if (WARN_ON_ONCE(sp->role.level != fault->goal_level)) 1078 + return RET_PF_RETRY; 1079 + 1078 1080 if (unlikely(!fault->slot)) 1079 1081 new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); 1080 1082 else ··· 1175 1173 if (fault->nx_huge_page_workaround_enabled) 1176 1174 disallowed_hugepage_adjust(fault, iter.old_spte, iter.level); 1177 1175 1178 - if (iter.level == fault->goal_level) 1179 - break; 1180 - 1181 1176 /* 1182 1177 * If SPTE has been frozen by another thread, just give up and 1183 1178 * retry, avoiding unnecessary page table allocation and free. 1184 1179 */ 1185 1180 if (is_removed_spte(iter.old_spte)) 1186 1181 goto retry; 1182 + 1183 + if (iter.level == fault->goal_level) 1184 + goto map_target_level; 1187 1185 1188 1186 /* Step down into the lower level page table if it exists. */ 1189 1187 if (is_shadow_present_pte(iter.old_spte) && ··· 1205 1203 r = tdp_mmu_link_sp(kvm, &iter, sp, true); 1206 1204 1207 1205 /* 1208 - * Also force the guest to retry the access if the upper level SPTEs 1209 - * aren't in place. 1206 + * Force the guest to retry if installing an upper level SPTE 1207 + * failed, e.g. because a different task modified the SPTE. 1210 1208 */ 1211 1209 if (r) { 1212 1210 tdp_mmu_free_sp(sp); ··· 1216 1214 if (fault->huge_page_disallowed && 1217 1215 fault->req_level >= iter.level) { 1218 1216 spin_lock(&kvm->arch.tdp_mmu_pages_lock); 1219 - track_possible_nx_huge_page(kvm, sp); 1217 + if (sp->nx_huge_page_disallowed) 1218 + track_possible_nx_huge_page(kvm, sp); 1220 1219 spin_unlock(&kvm->arch.tdp_mmu_pages_lock); 1221 1220 } 1222 1221 } 1223 1222 1223 + /* 1224 + * The walk aborted before reaching the target level, e.g. because the 1225 + * iterator detected an upper level SPTE was frozen during traversal. 
1226 + */ 1227 + WARN_ON_ONCE(iter.level == fault->goal_level); 1228 + goto retry; 1229 + 1230 + map_target_level: 1224 1231 ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter); 1225 1232 1226 1233 retry:
+2 -1
arch/x86/kvm/pmu.c
··· 238 238 return false; 239 239 240 240 /* recalibrate sample period and check if it's accepted by perf core */ 241 - if (perf_event_period(pmc->perf_event, 241 + if (is_sampling_event(pmc->perf_event) && 242 + perf_event_period(pmc->perf_event, 242 243 get_sample_period(pmc, pmc->counter))) 243 244 return false; 244 245
+2 -1
arch/x86/kvm/pmu.h
··· 140 140 141 141 static inline void pmc_update_sample_period(struct kvm_pmc *pmc) 142 142 { 143 - if (!pmc->perf_event || pmc->is_paused) 143 + if (!pmc->perf_event || pmc->is_paused || 144 + !is_sampling_event(pmc->perf_event)) 144 145 return; 145 146 146 147 perf_event_period(pmc->perf_event,
+15 -5
arch/x86/kvm/vmx/nested.c
··· 5296 5296 if (vmptr == vmx->nested.current_vmptr) 5297 5297 nested_release_vmcs12(vcpu); 5298 5298 5299 - kvm_vcpu_write_guest(vcpu, 5300 - vmptr + offsetof(struct vmcs12, 5301 - launch_state), 5302 - &zero, sizeof(zero)); 5299 + /* 5300 + * Silently ignore memory errors on VMCLEAR, Intel's pseudocode 5301 + * for VMCLEAR includes a "ensure that data for VMCS referenced 5302 + * by the operand is in memory" clause that guards writes to 5303 + * memory, i.e. doing nothing for I/O is architecturally valid. 5304 + * 5305 + * FIXME: Suppress failures if and only if no memslot is found, 5306 + * i.e. exit to userspace if __copy_to_user() fails. 5307 + */ 5308 + (void)kvm_vcpu_write_guest(vcpu, 5309 + vmptr + offsetof(struct vmcs12, 5310 + launch_state), 5311 + &zero, sizeof(zero)); 5303 5312 } else if (vmx->nested.hv_evmcs && vmptr == vmx->nested.hv_evmcs_vmptr) { 5304 5313 nested_release_evmcs(vcpu); 5305 5314 } ··· 6882 6873 SECONDARY_EXEC_ENABLE_INVPCID | 6883 6874 SECONDARY_EXEC_RDSEED_EXITING | 6884 6875 SECONDARY_EXEC_XSAVES | 6885 - SECONDARY_EXEC_TSC_SCALING; 6876 + SECONDARY_EXEC_TSC_SCALING | 6877 + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; 6886 6878 6887 6879 /* 6888 6880 * We can emulate "VMCS shadowing," even if the hardware
+7
arch/x86/kvm/vmx/vmx.c
@@ -4459,6 +4459,13 @@
 	 * controls for features that are/aren't exposed to the guest.
 	 */
 	if (nested) {
+		/*
+		 * All features that can be added or removed to VMX MSRs must
+		 * be supported in the first place for nested virtualization.
+		 */
+		if (WARN_ON_ONCE(!(vmcs_config.nested.secondary_ctls_high & control)))
+			enabled = false;
+
 		if (enabled)
 			vmx->nested.msrs.secondary_ctls_high |= control;
 		else
+3
arch/x86/kvm/x86.c
@@ -13132,6 +13132,9 @@
 			      struct x86_exception *e)
 {
 	if (r == X86EMUL_PROPAGATE_FAULT) {
+		if (KVM_BUG_ON(!e, vcpu->kvm))
+			return -EIO;
+
 		kvm_inject_emulated_page_fault(vcpu, e);
 		return 1;
 	}
+75 -69
arch/x86/kvm/xen.c
··· 41 41 int ret = 0; 42 42 int idx = srcu_read_lock(&kvm->srcu); 43 43 44 - if (gfn == GPA_INVALID) { 44 + if (gfn == KVM_XEN_INVALID_GFN) { 45 45 kvm_gpc_deactivate(gpc); 46 46 goto out; 47 47 } ··· 659 659 if (kvm->arch.xen.shinfo_cache.active) 660 660 data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa); 661 661 else 662 - data->u.shared_info.gfn = GPA_INVALID; 662 + data->u.shared_info.gfn = KVM_XEN_INVALID_GFN; 663 663 r = 0; 664 664 break; 665 665 ··· 705 705 BUILD_BUG_ON(offsetof(struct vcpu_info, time) != 706 706 offsetof(struct compat_vcpu_info, time)); 707 707 708 - if (data->u.gpa == GPA_INVALID) { 708 + if (data->u.gpa == KVM_XEN_INVALID_GPA) { 709 709 kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache); 710 710 r = 0; 711 711 break; ··· 719 719 break; 720 720 721 721 case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: 722 - if (data->u.gpa == GPA_INVALID) { 722 + if (data->u.gpa == KVM_XEN_INVALID_GPA) { 723 723 kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_time_info_cache); 724 724 r = 0; 725 725 break; ··· 739 739 r = -EOPNOTSUPP; 740 740 break; 741 741 } 742 - if (data->u.gpa == GPA_INVALID) { 742 + if (data->u.gpa == KVM_XEN_INVALID_GPA) { 743 743 r = 0; 744 744 deactivate_out: 745 745 kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache); ··· 937 937 if (vcpu->arch.xen.vcpu_info_cache.active) 938 938 data->u.gpa = vcpu->arch.xen.vcpu_info_cache.gpa; 939 939 else 940 - data->u.gpa = GPA_INVALID; 940 + data->u.gpa = KVM_XEN_INVALID_GPA; 941 941 r = 0; 942 942 break; 943 943 ··· 945 945 if (vcpu->arch.xen.vcpu_time_info_cache.active) 946 946 data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa; 947 947 else 948 - data->u.gpa = GPA_INVALID; 948 + data->u.gpa = KVM_XEN_INVALID_GPA; 949 949 r = 0; 950 950 break; 951 951 ··· 1069 1069 u8 blob_size = lm ? 
kvm->arch.xen_hvm_config.blob_size_64 1070 1070 : kvm->arch.xen_hvm_config.blob_size_32; 1071 1071 u8 *page; 1072 + int ret; 1072 1073 1073 1074 if (page_num >= blob_size) 1074 1075 return 1; ··· 1080 1079 if (IS_ERR(page)) 1081 1080 return PTR_ERR(page); 1082 1081 1083 - if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) { 1084 - kfree(page); 1082 + ret = kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE); 1083 + kfree(page); 1084 + if (ret) 1085 1085 return 1; 1086 - } 1087 1086 } 1088 1087 return 0; 1089 1088 } ··· 1184 1183 static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, bool longmode, 1185 1184 u64 param, u64 *r) 1186 1185 { 1187 - int idx, i; 1188 1186 struct sched_poll sched_poll; 1189 1187 evtchn_port_t port, *ports; 1190 - gpa_t gpa; 1188 + struct x86_exception e; 1189 + int i; 1191 1190 1192 1191 if (!lapic_in_kernel(vcpu) || 1193 1192 !(vcpu->kvm->arch.xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_EVTCHN_SEND)) 1194 1193 return false; 1195 - 1196 - idx = srcu_read_lock(&vcpu->kvm->srcu); 1197 - gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); 1198 - srcu_read_unlock(&vcpu->kvm->srcu, idx); 1199 - if (!gpa) { 1200 - *r = -EFAULT; 1201 - return true; 1202 - } 1203 1194 1204 1195 if (IS_ENABLED(CONFIG_64BIT) && !longmode) { 1205 1196 struct compat_sched_poll sp32; ··· 1199 1206 /* Sanity check that the compat struct definition is correct */ 1200 1207 BUILD_BUG_ON(sizeof(sp32) != 16); 1201 1208 1202 - if (kvm_vcpu_read_guest(vcpu, gpa, &sp32, sizeof(sp32))) { 1209 + if (kvm_read_guest_virt(vcpu, param, &sp32, sizeof(sp32), &e)) { 1203 1210 *r = -EFAULT; 1204 1211 return true; 1205 1212 } ··· 1213 1220 sched_poll.nr_ports = sp32.nr_ports; 1214 1221 sched_poll.timeout = sp32.timeout; 1215 1222 } else { 1216 - if (kvm_vcpu_read_guest(vcpu, gpa, &sched_poll, 1217 - sizeof(sched_poll))) { 1223 + if (kvm_read_guest_virt(vcpu, param, &sched_poll, 1224 + sizeof(sched_poll), &e)) { 1218 1225 *r = -EFAULT; 1219 1226 return true; 1220 1227 } ··· 
1236 1243 } else 1237 1244 ports = &port; 1238 1245 1239 - for (i = 0; i < sched_poll.nr_ports; i++) { 1240 - idx = srcu_read_lock(&vcpu->kvm->srcu); 1241 - gpa = kvm_mmu_gva_to_gpa_system(vcpu, 1242 - (gva_t)(sched_poll.ports + i), 1243 - NULL); 1244 - srcu_read_unlock(&vcpu->kvm->srcu, idx); 1246 + if (kvm_read_guest_virt(vcpu, (gva_t)sched_poll.ports, ports, 1247 + sched_poll.nr_ports * sizeof(*ports), &e)) { 1248 + *r = -EFAULT; 1249 + return true; 1250 + } 1245 1251 1246 - if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, 1247 - &ports[i], sizeof(port))) { 1248 - *r = -EFAULT; 1249 - goto out; 1250 - } 1252 + for (i = 0; i < sched_poll.nr_ports; i++) { 1251 1253 if (ports[i] >= max_evtchn_port(vcpu->kvm)) { 1252 1254 *r = -EINVAL; 1253 1255 goto out; ··· 1318 1330 int vcpu_id, u64 param, u64 *r) 1319 1331 { 1320 1332 struct vcpu_set_singleshot_timer oneshot; 1333 + struct x86_exception e; 1321 1334 s64 delta; 1322 - gpa_t gpa; 1323 - int idx; 1324 1335 1325 1336 if (!kvm_xen_timer_enabled(vcpu)) 1326 1337 return false; ··· 1330 1343 *r = -EINVAL; 1331 1344 return true; 1332 1345 } 1333 - idx = srcu_read_lock(&vcpu->kvm->srcu); 1334 - gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); 1335 - srcu_read_unlock(&vcpu->kvm->srcu, idx); 1336 1346 1337 1347 /* 1338 1348 * The only difference for 32-bit compat is the 4 bytes of ··· 1347 1363 BUILD_BUG_ON(sizeof_field(struct compat_vcpu_set_singleshot_timer, flags) != 1348 1364 sizeof_field(struct vcpu_set_singleshot_timer, flags)); 1349 1365 1350 - if (!gpa || 1351 - kvm_vcpu_read_guest(vcpu, gpa, &oneshot, longmode ? sizeof(oneshot) : 1352 - sizeof(struct compat_vcpu_set_singleshot_timer))) { 1366 + if (kvm_read_guest_virt(vcpu, param, &oneshot, longmode ? 
sizeof(oneshot) : 1367 + sizeof(struct compat_vcpu_set_singleshot_timer), &e)) { 1353 1368 *r = -EFAULT; 1354 1369 return true; 1355 1370 } ··· 1808 1825 { 1809 1826 u32 port = data->u.evtchn.send_port; 1810 1827 struct evtchnfd *evtchnfd; 1828 + int ret; 1811 1829 1812 - if (!port || port >= max_evtchn_port(kvm)) 1813 - return -EINVAL; 1814 - 1830 + /* Protect writes to evtchnfd as well as the idr lookup. */ 1815 1831 mutex_lock(&kvm->lock); 1816 1832 evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port); 1817 - mutex_unlock(&kvm->lock); 1818 1833 1834 + ret = -ENOENT; 1819 1835 if (!evtchnfd) 1820 - return -ENOENT; 1836 + goto out_unlock; 1821 1837 1822 1838 /* For an UPDATE, nothing may change except the priority/vcpu */ 1839 + ret = -EINVAL; 1823 1840 if (evtchnfd->type != data->u.evtchn.type) 1824 - return -EINVAL; 1841 + goto out_unlock; 1825 1842 1826 1843 /* 1827 1844 * Port cannot change, and if it's zero that was an eventfd ··· 1829 1846 */ 1830 1847 if (!evtchnfd->deliver.port.port || 1831 1848 evtchnfd->deliver.port.port != data->u.evtchn.deliver.port.port) 1832 - return -EINVAL; 1849 + goto out_unlock; 1833 1850 1834 1851 /* We only support 2 level event channels for now */ 1835 1852 if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) 1836 - return -EINVAL; 1853 + goto out_unlock; 1837 1854 1838 - mutex_lock(&kvm->lock); 1839 1855 evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority; 1840 1856 if (evtchnfd->deliver.port.vcpu_id != data->u.evtchn.deliver.port.vcpu) { 1841 1857 evtchnfd->deliver.port.vcpu_id = data->u.evtchn.deliver.port.vcpu; 1842 1858 evtchnfd->deliver.port.vcpu_idx = -1; 1843 1859 } 1860 + ret = 0; 1861 + out_unlock: 1844 1862 mutex_unlock(&kvm->lock); 1845 - return 0; 1863 + return ret; 1846 1864 } 1847 1865 1848 1866 /* ··· 1855 1871 { 1856 1872 u32 port = data->u.evtchn.send_port; 1857 1873 struct eventfd_ctx *eventfd = NULL; 1858 - struct evtchnfd *evtchnfd = NULL; 1874 + struct 
evtchnfd *evtchnfd; 1859 1875 int ret = -EINVAL; 1860 - 1861 - if (!port || port >= max_evtchn_port(kvm)) 1862 - return -EINVAL; 1863 1876 1864 1877 evtchnfd = kzalloc(sizeof(struct evtchnfd), GFP_KERNEL); 1865 1878 if (!evtchnfd) ··· 1933 1952 if (!evtchnfd) 1934 1953 return -ENOENT; 1935 1954 1936 - if (kvm) 1937 - synchronize_srcu(&kvm->srcu); 1955 + synchronize_srcu(&kvm->srcu); 1938 1956 if (!evtchnfd->deliver.port.port) 1939 1957 eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); 1940 1958 kfree(evtchnfd); ··· 1942 1962 1943 1963 static int kvm_xen_eventfd_reset(struct kvm *kvm) 1944 1964 { 1945 - struct evtchnfd *evtchnfd; 1965 + struct evtchnfd *evtchnfd, **all_evtchnfds; 1946 1966 int i; 1967 + int n = 0; 1947 1968 1948 1969 mutex_lock(&kvm->lock); 1970 + 1971 + /* 1972 + * Because synchronize_srcu() cannot be called inside the 1973 + * critical section, first collect all the evtchnfd objects 1974 + * in an array as they are removed from evtchn_ports. 1975 + */ 1976 + idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) 1977 + n++; 1978 + 1979 + all_evtchnfds = kmalloc_array(n, sizeof(struct evtchnfd *), GFP_KERNEL); 1980 + if (!all_evtchnfds) { 1981 + mutex_unlock(&kvm->lock); 1982 + return -ENOMEM; 1983 + } 1984 + 1985 + n = 0; 1949 1986 idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) { 1987 + all_evtchnfds[n++] = evtchnfd; 1950 1988 idr_remove(&kvm->arch.xen.evtchn_ports, evtchnfd->send_port); 1951 - synchronize_srcu(&kvm->srcu); 1989 + } 1990 + mutex_unlock(&kvm->lock); 1991 + 1992 + synchronize_srcu(&kvm->srcu); 1993 + 1994 + while (n--) { 1995 + evtchnfd = all_evtchnfds[n]; 1952 1996 if (!evtchnfd->deliver.port.port) 1953 1997 eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); 1954 1998 kfree(evtchnfd); 1955 1999 } 1956 - mutex_unlock(&kvm->lock); 2000 + kfree(all_evtchnfds); 1957 2001 1958 2002 return 0; 1959 2003 } ··· 2006 2002 { 2007 2003 struct evtchnfd *evtchnfd; 2008 2004 struct evtchn_send send; 2009 - gpa_t gpa; 2010 - int 
idx; 2005 + struct x86_exception e; 2011 2006 2012 - idx = srcu_read_lock(&vcpu->kvm->srcu); 2013 - gpa = kvm_mmu_gva_to_gpa_system(vcpu, param, NULL); 2014 - srcu_read_unlock(&vcpu->kvm->srcu, idx); 2015 - 2016 - if (!gpa || kvm_vcpu_read_guest(vcpu, gpa, &send, sizeof(send))) { 2007 + /* Sanity check: this structure is the same for 32-bit and 64-bit */ 2008 + BUILD_BUG_ON(sizeof(send) != 4); 2009 + if (kvm_read_guest_virt(vcpu, param, &send, sizeof(send), &e)) { 2017 2010 *r = -EFAULT; 2018 2011 return true; 2019 2012 } 2020 2013 2021 - /* The evtchn_ports idr is protected by vcpu->kvm->srcu */ 2014 + /* 2015 + * evtchnfd is protected by kvm->srcu; the idr lookup instead 2016 + * is protected by RCU. 2017 + */ 2018 + rcu_read_lock(); 2022 2019 evtchnfd = idr_find(&vcpu->kvm->arch.xen.evtchn_ports, send.port); 2020 + rcu_read_unlock(); 2023 2021 if (!evtchnfd) 2024 2022 return false; 2025 2023
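The reworked kvm_xen_eventfd_reset() above follows a "collect under the lock, wait outside it, then free" shape, because the grace-period wait must not run inside the kvm->lock critical section. Below is a minimal userspace sketch of that shape, assuming a pthread mutex in place of kvm->lock and a counting stub in place of synchronize_srcu(). All names here (`struct table`, `struct entry`, `table_reset()`, `wait_for_readers()`, `MAX_PORTS`) are hypothetical, and a fixed-size array replaces the kernel's dynamically sized one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_PORTS 8 /* assumed table size for this sketch */

struct entry {
	int port;
};

struct table {
	pthread_mutex_t lock;
	struct entry *slots[MAX_PORTS];
};

/* Stand-in for the grace-period wait; counts calls so ordering can be checked. */
static int grace_periods;

static void wait_for_readers(void)
{
	grace_periods++;
}

/* Remove and free every entry; returns the number of entries freed. */
static int table_reset(struct table *t)
{
	struct entry *all[MAX_PORTS];
	int n = 0;

	assert(t != NULL);

	/* Step 1: unpublish every entry while holding the lock. */
	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < MAX_PORTS; i++) {
		if (t->slots[i]) {
			all[n++] = t->slots[i];
			t->slots[i] = NULL;
		}
	}
	pthread_mutex_unlock(&t->lock);

	/* Step 2: wait for readers once, outside the lock. */
	wait_for_readers();

	/* Step 3: the entries are no longer reachable, so free them. */
	for (int i = 0; i < n; i++)
		free(all[i]);

	return n;
}
```

In this sketch a single wait covers all removed entries, whereas the removed code waited once per entry while still holding the lock.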
+3
include/uapi/linux/kvm.h
···
 		__u8 runstate_update_flag;
 		struct {
 			__u64 gfn;
+#define KVM_XEN_INVALID_GFN ((__u64)-1)
 		} shared_info;
 		struct {
 			__u32 send_port;
···
 	} u;
 };

+
 /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
 #define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
 #define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
···
 	__u16 pad[3];
 	union {
 		__u64 gpa;
+#define KVM_XEN_INVALID_GPA ((__u64)-1)
 		__u64 pad[8];
 		struct {
 			__u64 state;
+6 -85
tools/testing/selftests/kvm/.gitignore
···
 # SPDX-License-Identifier: GPL-2.0-only
-/aarch64/aarch32_id_regs
-/aarch64/arch_timer
-/aarch64/debug-exceptions
-/aarch64/get-reg-list
-/aarch64/hypercalls
-/aarch64/page_fault_test
-/aarch64/psci_test
-/aarch64/vcpu_width_config
-/aarch64/vgic_init
-/aarch64/vgic_irq
-/s390x/memop
-/s390x/resets
-/s390x/sync_regs_test
-/s390x/tprot
-/x86_64/amx_test
-/x86_64/cpuid_test
-/x86_64/cr4_cpuid_sync_test
-/x86_64/debug_regs
-/x86_64/exit_on_emulation_failure_test
-/x86_64/fix_hypercall_test
-/x86_64/get_msr_index_features
-/x86_64/kvm_clock_test
-/x86_64/kvm_pv_test
-/x86_64/hyperv_clock
-/x86_64/hyperv_cpuid
-/x86_64/hyperv_evmcs
-/x86_64/hyperv_features
-/x86_64/hyperv_ipi
-/x86_64/hyperv_svm_test
-/x86_64/hyperv_tlb_flush
-/x86_64/max_vcpuid_cap_test
-/x86_64/mmio_warning_test
-/x86_64/monitor_mwait_test
-/x86_64/nested_exceptions_test
-/x86_64/nx_huge_pages_test
-/x86_64/platform_info_test
-/x86_64/pmu_event_filter_test
-/x86_64/set_boot_cpu_id
-/x86_64/set_sregs_test
-/x86_64/sev_migrate_tests
-/x86_64/smaller_maxphyaddr_emulation_test
-/x86_64/smm_test
-/x86_64/state_test
-/x86_64/svm_vmcall_test
-/x86_64/svm_int_ctl_test
-/x86_64/svm_nested_soft_inject_test
-/x86_64/svm_nested_shutdown_test
-/x86_64/sync_regs_test
-/x86_64/tsc_msrs_test
-/x86_64/tsc_scaling_sync
-/x86_64/ucna_injection_test
-/x86_64/userspace_io_test
-/x86_64/userspace_msr_exit_test
-/x86_64/vmx_apic_access_test
-/x86_64/vmx_close_while_nested_test
-/x86_64/vmx_dirty_log_test
-/x86_64/vmx_exception_with_invalid_guest_state
-/x86_64/vmx_invalid_nested_guest_state
-/x86_64/vmx_msrs_test
-/x86_64/vmx_preemption_timer_test
-/x86_64/vmx_set_nested_state_test
-/x86_64/vmx_tsc_adjust_test
-/x86_64/vmx_nested_tsc_scaling_test
-/x86_64/xapic_ipi_test
-/x86_64/xapic_state_test
-/x86_64/xen_shinfo_test
-/x86_64/xen_vmcall_test
-/x86_64/xss_msr_test
-/x86_64/vmx_pmu_caps_test
-/x86_64/triple_fault_event_test
-/access_tracking_perf_test
-/demand_paging_test
-/dirty_log_test
-/dirty_log_perf_test
-/hardware_disable_test
-/kvm_create_max_vcpus
-/kvm_page_table_test
-/max_guest_memory_test
-/memslot_modification_stress_test
-/memslot_perf_test
-/rseq_test
-/set_memory_region_test
-/steal_time
-/kvm_binary_stats_test
-/system_counter_offset_test
+*
+!/**/
+!*.c
+!*.h
+!*.S
+!*.sh
+23 -41
tools/testing/selftests/kvm/Makefile
···
 include $(top_srcdir)/scripts/subarch.include
 ARCH            ?= $(SUBARCH)

-# For cross-builds to work, UNAME_M has to map to ARCH and arch specific
-# directories and targets in this Makefile. "uname -m" doesn't map to
-# arch specific sub-directory names.
-#
-# UNAME_M variable to used to run the compiles pointing to the right arch
-# directories and build the right targets for these supported architectures.
-#
-# TEST_GEN_PROGS and LIBKVM are set using UNAME_M variable.
-# LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.
-#
-# x86_64 targets are named to include x86_64 as a suffix and directories
-# for includes are in x86_64 sub-directory. s390x and aarch64 follow the
-# same convention. "uname -m" doesn't result in the correct mapping for
-# s390x and aarch64.
-#
-# No change necessary for x86_64
-UNAME_M := $(shell uname -m)
-
-# Set UNAME_M for arm64 compile/install to work
-ifeq ($(ARCH),arm64)
-	UNAME_M := aarch64
-endif
-# Set UNAME_M s390x compile/install to work
-ifeq ($(ARCH),s390)
-	UNAME_M := s390x
-endif
-# Set UNAME_M riscv compile/install to work
-ifeq ($(ARCH),riscv)
-	UNAME_M := riscv
+ifeq ($(ARCH),x86)
+	ARCH_DIR := x86_64
+else ifeq ($(ARCH),arm64)
+	ARCH_DIR := aarch64
+else ifeq ($(ARCH),s390)
+	ARCH_DIR := s390x
+else
+	ARCH_DIR := $(ARCH)
 endif

 LIBKVM += lib/assert.c
···
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test

-TEST_PROGS += $(TEST_PROGS_$(UNAME_M))
-TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
-TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(UNAME_M))
-LIBKVM += $(LIBKVM_$(UNAME_M))
+TEST_PROGS += $(TEST_PROGS_$(ARCH_DIR))
+TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(ARCH_DIR))
+TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(ARCH_DIR))
+LIBKVM += $(LIBKVM_$(ARCH_DIR))
+
+# lib.mak defines $(OUTPUT), prepends $(OUTPUT)/ to $(TEST_GEN_PROGS), and most
+# importantly defines, i.e. overwrites, $(CC) (unless `make -e` or `make CC=`,
+# which causes the environment variable to override the makefile).
+include ../lib.mk

 INSTALL_HDR_PATH = $(top_srcdir)/usr
 LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
···
 LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/$(ARCH)/include
 endif
 CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
+	-Wno-gnu-variable-sized-type-not-at-end \
+	-fno-builtin-memcmp -fno-builtin-memcpy -fno-builtin-memset \
 	-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
 	-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-	-I$(<D) -Iinclude/$(UNAME_M) -I ../rseq -I.. $(EXTRA_CFLAGS) \
+	-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
 	$(KHDR_INCLUDES)

-no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
-        $(CC) -Werror -no-pie -x c - -o "$$TMP", -no-pie)
+no-pie-option := $(call try-run, echo 'int main(void) { return 0; }' | \
+        $(CC) -Werror $(CFLAGS) -no-pie -x c - -o "$$TMP", -no-pie)

 # On s390, build the testcases KVM-enabled
-pgste-option = $(call try-run, echo 'int main() { return 0; }' | \
+pgste-option = $(call try-run, echo 'int main(void) { return 0; }' | \
 	$(CC) -Werror -Wl$(comma)--s390-pgste -x c - -o "$$TMP",-Wl$(comma)--s390-pgste)

 LDLIBS += -ldl
 LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
-
-# After inclusion, $(OUTPUT) is defined and
-# $(TEST_GEN_PROGS) starts with $(OUTPUT)/
-include ../lib.mk

 LIBKVM_C := $(filter %.c,$(LIBKVM))
 LIBKVM_S := $(filter %.S,$(LIBKVM))
+1 -1
tools/testing/selftests/kvm/aarch64/page_fault_test.c
···
 	GUEST_ASSERT(guest_check_lse());
 	asm volatile(".arch_extension lse\n"
 		     "casal %0, %1, [%2]\n"
-		     :: "r" (0), "r" (TEST_DATA), "r" (guest_test_memory));
+		     :: "r" (0ul), "r" (TEST_DATA), "r" (guest_test_memory));
 	val = READ_ONCE(*guest_test_memory);
 	GUEST_ASSERT_EQ(val, TEST_DATA);
 }
+4 -2
tools/testing/selftests/kvm/lib/aarch64/ucall.c
···

 void ucall_arch_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa)
 {
-	virt_pg_map(vm, mmio_gpa, mmio_gpa);
+	vm_vaddr_t mmio_gva = vm_vaddr_unused_gap(vm, vm->page_size, KVM_UTIL_MIN_VADDR);
+
+	virt_map(vm, mmio_gva, mmio_gpa, 1);

 	vm->ucall_mmio_addr = mmio_gpa;

-	write_guest_global(vm, ucall_exit_mmio_addr, (vm_vaddr_t *)mmio_gpa);
+	write_guest_global(vm, ucall_exit_mmio_addr, (vm_vaddr_t *)mmio_gva);
 }

 void ucall_arch_do_ucall(vm_vaddr_t uc)
+11 -2
tools/testing/selftests/kvm/lib/kvm_util.c
···
 _Static_assert(sizeof(vm_guest_mode_params)/sizeof(struct vm_guest_mode_params) == NUM_VM_MODES,
 	       "Missing new mode params?");

+/*
+ * Initializes vm->vpages_valid to match the canonical VA space of the
+ * architecture.
+ *
+ * The default implementation is valid for architectures which split the
+ * range addressed by a single page table into a low and high region
+ * based on the MSB of the VA. On architectures with this behavior
+ * the VA region spans [0, 2^(va_bits - 1)), [-(2^(va_bits - 1), -1].
+ */
 __weak void vm_vaddr_populate_bitmap(struct kvm_vm *vm)
 {
 	sparsebit_set_num(vm->vpages_valid,
···

 	while (npages--) {
 		virt_pg_map(vm, vaddr, paddr);
+		sparsebit_set(vm->vpages_mapped, vaddr >> vm->page_shift);
+
 		vaddr += page_size;
 		paddr += page_size;
-
-		sparsebit_set(vm->vpages_mapped, vaddr >> vm->page_shift);
 	}
 }
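The virt_map() hunk above moves the bookkeeping call ahead of the address increment, so the page that was just mapped is the one recorded. A small sketch of the corrected ordering, assuming a plain boolean array in place of the sparsebit and a fixed 4 KiB page; the toy_* names and constants are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  64

/* Hypothetical stand-in for vm->vpages_mapped: one flag per page. */
struct toy_vm {
	bool mapped[TOY_MAX_PAGES];
};

/* Record npages pages starting at vaddr, marking each page before advancing. */
void toy_virt_map(struct toy_vm *vm, uint64_t vaddr, unsigned int npages)
{
	while (npages--) {
		uint64_t page = vaddr >> TOY_PAGE_SHIFT;

		if (page >= TOY_MAX_PAGES)
			return;		/* out of range for this toy table */

		/* Mark first, then step; the old order marked page + 1. */
		vm->mapped[page] = true;
		vaddr += TOY_PAGE_SIZE;
	}
}
```

Mapping three pages from page index 2 marks indices 2, 3 and 4. With the increment placed first, as in the removed lines, indices 3, 4 and 5 would have been marked instead.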
+14 -2
tools/testing/selftests/kvm/lib/ucall_common.c
···
 #include "linux/bitmap.h"
 #include "linux/atomic.h"

+#define GUEST_UCALL_FAILED -1
+
 struct ucall_header {
 	DECLARE_BITMAP(in_use, KVM_MAX_VCPUS);
 	struct ucall ucalls[KVM_MAX_VCPUS];
···
 	struct ucall *uc;
 	int i;

-	GUEST_ASSERT(ucall_pool);
+	if (!ucall_pool)
+		goto ucall_failed;

 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (!test_and_set_bit(i, ucall_pool->in_use)) {
···
 		}
 	}

-	GUEST_ASSERT(0);
+ucall_failed:
+	/*
+	 * If the vCPU cannot grab a ucall structure, make a bare ucall with a
+	 * magic value to signal to get_ucall() that things went sideways.
+	 * GUEST_ASSERT() depends on ucall_alloc() and so cannot be used here.
+	 */
+	ucall_arch_do_ucall(GUEST_UCALL_FAILED);
 	return NULL;
 }

···

 	addr = ucall_arch_get_ucall(vcpu);
 	if (addr) {
+		TEST_ASSERT(addr != (void *)GUEST_UCALL_FAILED,
+			    "Guest failed to allocate ucall struct");
+
 		memcpy(uc, addr, sizeof(*uc));
 		vcpu_run_complete_io(vcpu);
 	} else {
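The ucall_common.c hunks above replace an assertion that itself needed a ucall with an in-band sentinel: the guest reports a magic address and the host checks for it before dereferencing. A minimal sketch of that handshake, assuming a two-entry pool and an all-ones address as the sentinel; the toy_* names are hypothetical and no guest/host boundary is modelled:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_UCALL_FAILED ((uintptr_t)-1)	/* hypothetical magic value */
#define TOY_POOL_SIZE 2

struct toy_ucall {
	int cmd;
};

static struct toy_ucall toy_pool[TOY_POOL_SIZE];
static int toy_in_use[TOY_POOL_SIZE];

/* "Guest" side: hand back a pool entry's address, or the sentinel if full. */
uintptr_t toy_ucall_alloc(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (!toy_in_use[i]) {
			toy_in_use[i] = 1;
			return (uintptr_t)&toy_pool[i];
		}
	}
	return TOY_UCALL_FAILED;
}

/* "Host" side: check for the sentinel before treating addr as a pointer. */
int toy_ucall_is_failure(uintptr_t addr)
{
	return addr == TOY_UCALL_FAILED;
}
```

The point of the design is that the failure path needs no allocation of its own, so it cannot recurse into the allocator that just failed.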
+1 -1
tools/testing/selftests/kvm/lib/x86_64/processor.c
···
 void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
 {
 	if (!kvm_cpu_has_p(X86_PROPERTY_MAX_PHY_ADDR)) {
-		*pa_bits == kvm_cpu_has(X86_FEATURE_PAE) ? 36 : 32;
+		*pa_bits = kvm_cpu_has(X86_FEATURE_PAE) ? 36 : 32;
 		*va_bits = 32;
 	} else {
 		*pa_bits = kvm_cpu_property(X86_PROPERTY_MAX_PHY_ADDR);
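The one-character fix above turns a comparison whose result was discarded into the intended assignment, so the fallback path now actually writes `*pa_bits`. A sketch of just that fallback branch, assuming the PAE check is passed in as a boolean; toy_fallback_address_width is a hypothetical name, not a selftest helper:

```c
#include <stdbool.h>

/* Fallback widths when the CPU does not report its address sizes. */
void toy_fallback_address_width(bool has_pae,
				unsigned int *pa_bits, unsigned int *va_bits)
{
	/* With "==" here, *pa_bits would be left untouched by this branch. */
	*pa_bits = has_pae ? 36 : 32;
	*va_bits = 32;
}
```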
+3
tools/testing/selftests/kvm/memslot_perf_test.c
···
 	slots = data->nslots;
 	while (--slots > 1) {
 		pages_per_slot = mempages / slots;
+		if (!pages_per_slot)
+			continue;
+
 		rempages = mempages % pages_per_slot;
 		if (check_slot_pages(host_page_size, guest_page_size,
 				     pages_per_slot, rempages))
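The guard added above matters because `mempages / slots` is zero whenever there are more slots than pages, and the following `%` would then divide by zero. A simplified sketch of the loop, assuming the first slot count with a non-zero quotient is accepted (the real code also calls check_slot_pages(), which is not modelled); toy_pick_slots is a hypothetical name:

```c
#include <stdint.h>

/*
 * Walk slot counts downward from nslots - 1 and report the first one that
 * leaves at least one page per slot. Returns 0 if no count above 1 does.
 */
uint32_t toy_pick_slots(uint64_t mempages, uint32_t nslots,
			uint64_t *pages_per_slot, uint64_t *rempages)
{
	uint32_t slots = nslots;

	if (slots < 2)
		return 0;	/* also avoids wrapping when nslots is 0 */

	while (--slots > 1) {
		*pages_per_slot = mempages / slots;
		if (!*pages_per_slot)
			continue;	/* more slots than pages: never compute % 0 */

		*rempages = mempages % *pages_per_slot;
		return slots;
	}
	return 0;
}
```

For example, 3 pages with a limit of 10 slots skips slot counts 9 down to 4 and settles on 3 slots of one page each.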
+2 -1
tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
···
 	GUEST_SYNC(stage++);
 	/*
 	 * 'XMM Fast' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL.
-	 * Nothing to write anything to XMM regs.
 	 */
+	ipi_ex->vp_set.valid_bank_mask = 0;
+	hyperv_write_xmm_input(&ipi_ex->vp_set.valid_bank_mask, 2);
 	hyperv_hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT,
 			 IPI_VECTOR, HV_GENERIC_SET_ALL);
 	nop_loop();
+11 -2
tools/testing/selftests/kvm/x86_64/svm_nested_soft_inject_test.c
···
 static void l2_guest_code_int(void)
 {
 	GUEST_ASSERT_1(int_fired == 1, int_fired);
-	vmmcall();
-	ud2();
+
+	/*
+	 * Same as the vmmcall() function, but with a ud2 sneaked after the
+	 * vmmcall.  The caller injects an exception with the return address
+	 * increased by 2, so the "pop rbp" must be after the ud2 and we cannot
+	 * use vmmcall() directly.
+	 */
+	__asm__ __volatile__("push %%rbp; vmmcall; ud2; pop %%rbp"
+			     : : "a"(0xdeadbeef), "c"(0xbeefdead)
+			     : "rbx", "rdx", "rsi", "rdi", "r8", "r9",
+			       "r10", "r11", "r12", "r13", "r14", "r15");

 	GUEST_ASSERT_1(bp_fired == 1, bp_fired);
 	hlt();
-5
tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
···
 	NUM_VMX_PAGES,
 };

-struct kvm_single_msr {
-	struct kvm_msrs header;
-	struct kvm_msr_entry entry;
-} __attribute__((packed));
-
 /* The virtual machine object. */
 static struct kvm_vm *vm;
+6
tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
···
 	}

 done:
+	struct kvm_xen_hvm_attr evt_reset = {
+		.type = KVM_XEN_ATTR_TYPE_EVTCHN,
+		.u.evtchn.flags = KVM_XEN_EVTCHN_RESET,
+	};
+	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
+
 	alarm(0);
 	clock_gettime(CLOCK_REALTIME, &max_ts);
-4
virt/kvm/kvm_mm.h
···
 #define KVM_MMU_LOCK_INIT(kvm)		rwlock_init(&(kvm)->mmu_lock)
 #define KVM_MMU_LOCK(kvm)		write_lock(&(kvm)->mmu_lock)
 #define KVM_MMU_UNLOCK(kvm)		write_unlock(&(kvm)->mmu_lock)
-#define KVM_MMU_READ_LOCK(kvm)		read_lock(&(kvm)->mmu_lock)
-#define KVM_MMU_READ_UNLOCK(kvm)	read_unlock(&(kvm)->mmu_lock)
 #else
 #define KVM_MMU_LOCK_INIT(kvm)		spin_lock_init(&(kvm)->mmu_lock)
 #define KVM_MMU_LOCK(kvm)		spin_lock(&(kvm)->mmu_lock)
 #define KVM_MMU_UNLOCK(kvm)		spin_unlock(&(kvm)->mmu_lock)
-#define KVM_MMU_READ_LOCK(kvm)		spin_lock(&(kvm)->mmu_lock)
-#define KVM_MMU_READ_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */

 kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,