commits
User-controlled indices are used to access AIA CSR registers.
Sanitize them with array_index_nospec() to prevent speculative
out-of-bounds access.
Similar to x86 commit 8c86405f606c ("KVM: x86: Protect
ioapic_read_indirect() from Spectre-v1/L1TF attacks") and arm64
commit 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1
in vgic_get_irq()").
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://lore.kernel.org/r/20260303-kvm-riscv-spectre-v1-v2-2-192caab8e0dc@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
User-controlled register indices from the ONE_REG ioctl are used to
index into arrays of register values. Sanitize them with
array_index_nospec() to prevent speculative out-of-bounds access.
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://lore.kernel.org/r/20260303-kvm-riscv-spectre-v1-v2-1-192caab8e0dc@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
When dirty logging is enabled, guest stage mappings are forced to
PAGE_SIZE granularity. Changing the mapping page size at this point
is incorrect.
Fixes: ed7ae7a34bea ("RISC-V: KVM: Transparent huge page support")
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260226191231140_X1Juus7s2kgVlc0ZyW_K@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM_DEV_RISCV_AIA_GRP_APLIC branch of aia_has_attr() was identified
to have a race condition with concurrent KVM_SET_DEVICE_ATTR ioctls,
leading to a use-after-free bug.
Upon analyzing the code, it was discovered that the
KVM_DEV_RISCV_AIA_GRP_IMSIC branch of aia_has_attr() suffers from the same
lack of synchronization. It invokes kvm_riscv_aia_imsic_has_attr() without
holding dev->kvm->lock.
While aia_has_attr() is running, a concurrent aia_set_attr() could call
aia_init() under the dev->kvm->lock. If aia_init() fails, it may trigger
kvm_riscv_vcpu_aia_imsic_cleanup(), which frees imsic_state. Without proper
locking, kvm_riscv_aia_imsic_has_attr() could attempt to access imsic_state
while it is being deallocated.
Although this specific path has not yet been reported by a fuzzer, it
is logically identical to the APLIC issue. Fix this by acquiring the
dev->kvm->lock before calling kvm_riscv_aia_imsic_has_attr(), ensuring
consistency with the locking pattern used for other AIA attribute groups.
Fixes: 5463091a51cf ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260304080804.2281721-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
Fuzzer reports a KASAN use-after-free bug triggered by a race
between KVM_HAS_DEVICE_ATTR and KVM_SET_DEVICE_ATTR ioctls on
the AIA device. The root cause is that aia_has_attr() invokes
kvm_riscv_aia_aplic_has_attr() without holding dev->kvm->lock, while
a concurrent aia_set_attr() may call aia_init() under that lock. When
aia_init() fails after kvm_riscv_aia_aplic_init() has succeeded, it
calls kvm_riscv_aia_aplic_cleanup() in its fail_cleanup_imsics path,
which frees both aplic_state and aplic_state->irqs. The concurrent
has_attr path can then dereference the freed aplic->irqs in
aplic_read_pending():
irqd = &aplic->irqs[irq]; /* UAF here */
KASAN report:
BUG: KASAN: slab-use-after-free in aplic_read_pending
arch/riscv/kvm/aia_aplic.c:119 [inline]
BUG: KASAN: slab-use-after-free in aplic_read_pending_word
arch/riscv/kvm/aia_aplic.c:351 [inline]
BUG: KASAN: slab-use-after-free in aplic_mmio_read_offset
arch/riscv/kvm/aia_aplic.c:406
Read of size 8 at addr ff600000ba965d58 by task 9498
Call Trace:
aplic_read_pending arch/riscv/kvm/aia_aplic.c:119 [inline]
aplic_read_pending_word arch/riscv/kvm/aia_aplic.c:351 [inline]
aplic_mmio_read_offset arch/riscv/kvm/aia_aplic.c:406
kvm_riscv_aia_aplic_has_attr arch/riscv/kvm/aia_aplic.c:566
aia_has_attr arch/riscv/kvm/aia_device.c:469
allocated by task 9473:
kvm_riscv_aia_aplic_init arch/riscv/kvm/aia_aplic.c:583
aia_init arch/riscv/kvm/aia_device.c:248 [inline]
aia_set_attr arch/riscv/kvm/aia_device.c:334
freed by task 9473:
kvm_riscv_aia_aplic_cleanup arch/riscv/kvm/aia_aplic.c:644
aia_init arch/riscv/kvm/aia_device.c:292 [inline]
aia_set_attr arch/riscv/kvm/aia_device.c:334
Fix this race by acquiring dev->kvm->lock in aia_has_attr() before
calling kvm_riscv_aia_aplic_has_attr(), consistent with the locking
pattern used in aia_get_attr() and aia_set_attr().
Fixes: 289a007b98b06d ("RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260302132703.1721415-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The indexed array only has RISCV_KVM_MAX_COUNTERS elements.
The out-of-bound access could have been performed by a guest,
but it could only access another guest accessible data.
Fixes: 8f0153ecd3bf ("RISC-V: KVM: Add skeleton support for perf")
Signed-off-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260227134617.23378-1-radim.krcmar@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
kvm_riscv_vcpu_aia_rmw_topei() assumes that the per-vCPU IMSIC state has
been initialized once AIA is reported as available and initialized at
the VM level. This assumption does not always hold.
Under fuzzed ioctl sequences, a guest may access the IMSIC TOPEI CSR
before the vCPU IMSIC state is set up. In this case,
vcpu->arch.aia_context.imsic_state is still NULL, and the TOPEI RMW path
dereferences it unconditionally, leading to a host kernel crash.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff0000000e
...
kvm_riscv_vcpu_aia_imsic_rmw arch/riscv/kvm/aia_imsic.c:909
kvm_riscv_vcpu_aia_rmw_topei arch/riscv/kvm/aia.c:231
csr_insn arch/riscv/kvm/vcpu_insn.c:208
system_opcode_insn arch/riscv/kvm/vcpu_insn.c:281
kvm_riscv_vcpu_virtual_insn arch/riscv/kvm/vcpu_insn.c:355
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:230
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:1008
...
Fix this by explicitly checking whether the vCPU IMSIC state has been
initialized before handling TOPEI CSR accesses. If not, forward the CSR
emulation to user space.
Fixes: db8b7e97d6137 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260226085119.643295-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
While fuzzing KVM on RISC-V, a use-after-free was observed in
kvm_riscv_gstage_get_leaf(), where ptep_get() dereferences a
freed gstage page table page during gfn unmap.
The crash manifests as:
use-after-free in ptep_get include/linux/pgtable.h:340 [inline]
use-after-free in kvm_riscv_gstage_get_leaf arch/riscv/kvm/gstage.c:89
Call Trace:
ptep_get include/linux/pgtable.h:340 [inline]
kvm_riscv_gstage_get_leaf+0x2ea/0x358 arch/riscv/kvm/gstage.c:89
kvm_riscv_gstage_unmap_range+0xf0/0x308 arch/riscv/kvm/gstage.c:265
kvm_unmap_gfn_range+0x168/0x1fc arch/riscv/kvm/mmu.c:256
kvm_mmu_unmap_gfn_range virt/kvm/kvm_main.c:724 [inline]
page last free pid 808 tgid 808 stack trace:
kvm_riscv_mmu_free_pgd+0x1b6/0x26a arch/riscv/kvm/mmu.c:457
kvm_arch_flush_shadow_all+0x1a/0x24 arch/riscv/kvm/mmu.c:134
kvm_flush_shadow_all virt/kvm/kvm_main.c:344 [inline]
The UAF is caused by gstage page table walks running concurrently with
gstage pgd teardown. In particular, kvm_unmap_gfn_range() can traverse
gstage page tables while kvm_arch_flush_shadow_all() frees the pgd,
leading to use-after-free of page table pages.
Fix the issue by serializing gstage unmap and pgd teardown with
kvm->mmu_lock. Holding mmu_lock ensures that gstage page tables
remain valid for the duration of unmap operations and prevents
concurrent frees.
This matches existing RISC-V KVM usage of mmu_lock to protect gstage
map/unmap operations, e.g. kvm_riscv_mmu_iounmap.
Fixes: dd82e35638d67f ("RISC-V: KVM: Factor-out g-stage page table management")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260202040059.1801167-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
Guests can control IRQ indices via MMIO. Sanitize them with
array_index_nospec() to prevent speculative out-of-bounds access
to the aplic->irqs[] array.
Similar to arm64 commit 41b87599c743 ("KVM: arm/arm64: vgic: fix possible
spectre-v1 in vgic_get_irq()") and x86 commit 8c86405f606c ("KVM: x86:
Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks").
Fixes: 74967aa208e2 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC")
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260116095731.24555-1-lukas.gerlach@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature
set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered... [his words, not mine]
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Generic:
- Remove internal Kconfigs that are now set on all architectures
- Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all
architectures finally enable it in Linux 7.0"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: always define KVM_CAP_SYNC_MMU
KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
KVM: arm64: Deduplicate ASID retrieval code
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
KVM: arm64: Fix protected mode handling of pages larger than 4kB
KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
KVM: arm64: Fix ID register initialization for non-protected pKVM guests
KVM: arm64: Optimise away S1POE handling when not supported by host
KVM: arm64: Hide S1POE from guests when not supported by the host
Pull debugobjects fix from Thomas Gleixner:
"A single fix for debugobjects.
The deferred page initialization prevents debug objects from
allocating slab pages until the initialization is complete. That
causes depletion of the pool and disabling of debugobjects.
The reason is that debugobjects uses __GFP_HIGH for allocations as it
might be invoked from arbitrary contexts. When PREEMPT_COUNT is
disabled there is no way to know whether the context is safe to set
__GFP_KSWAPD_RECLAIM.
This worked until v6.18. Since then allocations w/o a reclaim flag
cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(),
which returns early when deferred page initialization has not yet
completed.
Work around that when PREEMPT_COUNT is enabled as the preempt counter
allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when
the context is preemtible. When PREEMPT_COUNT is disabled the context
is unknown and the reclaim bit can't be set because the caller might
hold locks which might deadlock in the allocator.
That makes debugobjects depend on PREEMPT_COUNT ||
!DEFERRED_STRUCT_PAGE_INIT, which limits the coverage slightly, but
keeps it functional for most cases"
* tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Make it work with deferred page initialization - again
KVM/arm64 fixes for 7.0, take #1
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered...
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Pull x86 fixes from Ingo Molnar:
- Fix speculative safety in fred_extint()
- Fix __WARN_printf() trap in early_fixup_exception()
- Fix clang-build boot bug for unusual alignments, triggered by
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y
- Replace the final few __ASSEMBLY__ stragglers that snuck in lately
into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again)
* tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__
x86/cfi: Fix CFI rewrite for odd alignments
x86/bug: Handle __WARN_printf() trap in early_fixup_exception()
x86/fred: Correct speculative safety in fred_extint()
debugobjects uses __GFP_HIGH for allocations as it might be invoked
within locked regions. That worked perfectly fine until v6.18. It still
works correctly when deferred page initialization is disabled and works
by chance when no page allocation is required before deferred page
initialization has completed.
Since v6.18 allocations w/o a reclaim flag cause new_slab() to end up in
alloc_frozen_pages_nolock_noprof(), which returns early when deferred
page initialization has not yet completed. As the deferred page
initialization takes quite a while the debugobject pool is depleted and
debugobjects are disabled.
This can be worked around when PREEMPT_COUNT is enabled as that allows
debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context
is preemtible. When PREEMPT_COUNT is disabled the context is unknown and
the reclaim bit can't be set because the caller might hold locks which
might deadlock in the allocator.
In preemptible context the reclaim bit is harmless and not a performance
issue as that's usually invoked from slow path initialization context.
That makes debugobjects depend on PREEMPT_COUNT || !DEFERRED_STRUCT_PAGE_INIT.
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/87pl6gznti.ffs@tglx
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always
available. Move the definition from individual architectures to common
code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull timer fix from Ingo Molnar:
"Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for
the common HZ=100, 250 or 1000 cases. Only use a function call for odd
HZ values like HZ=300 that generate more code.
The function call overhead showed up in performance tests of the TCP
code"
* tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs()
After converting the __ASSEMBLY__ statements to __ASSEMBLER__ in
commit 24a295e4ef1ca ("x86/headers: Replace __ASSEMBLY__ with
__ASSEMBLER__ in non-UAPI headers"), some new code has been
added that uses __ASSEMBLY__ again. Convert these stragglers, too.
This is a mechanical patch, done with a simple "sed -i" command.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251218182029.166993-1-thuth@redhat.com
All architectures now use MMU notifier for KVM page table management.
Remove the Kconfig symbol and the code that is used when it is
disabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.
Re-add the missing !, fixing the behaviour.
Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull scheduler fixes from Ingo Molnar:
- Fix zero_vruntime tracking when there's a single task running
- Fix slice protection logic
- Fix the ->vprot logic for reniced tasks
- Fix lag clamping in mixed slice workloads
- Fix objtool uaccess warning (and bug) in the
!CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining,
which triggers with older compilers
- Fix a comment in the rseq registration rseq_size bound check code
- Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
differently, which special size we now reached naturally and want to
avoid. The visible ugliness of the new reserved field will be avoided
the next time the RSEQ area is extended.
* tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: slice ext: Ensure rseq feature size differs from original rseq size
rseq: Clarify rseq registration rseq_size bound check comment
sched/core: Fix wakeup_preempt's next_class tracking
rseq: Mark rseq_arm_slice_extension_timer() __always_inline
sched/fair: Fix lag clamp
sched/eevdf: Update se->vprot in reweight_entity()
sched/fair: Only set slice protection at pick time
sched/fair: Fix zero_vruntime tracking
For common cases (HZ=100, 250 or 1000), these helpers are at most one
multiply, so there is no point calling a tiny function.
Keep them out of line for HZ=300 and others.
This saves cycles in TCP fast path, among other things.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/8 grow/shrink: 25/89 up/down: 530/-3474 (-2944)
...
nla_put_msecs 193 - -193
message_stats_print 2131 920 -1211
Total: Before=25365208, After=25362264, chg -0.01%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260210170226.57209-1-edumazet@google.com
Rustam reported his clang builds did not boot properly; turns out his
.config has: CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y set.
Fix up the FineIBT code to deal with this unusual alignment.
Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT")
Reported-by: Rustam Kovhaev <rkovhaev@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Rustam Kovhaev <rkovhaev@gmail.com>
Pull i2c fix from Wolfram Sang:
- imx: preserve error state during SMBus block read length handling
* tag 'i2c-for-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: preserve error state in block data length handler
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/
Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported")
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull perf events fixes from Ingo Molnar:
- Fix lock ordering bug found by lockdep in perf_event_wakeup()
- Fix uncore counter enumeration on Granite Rapids and Sierra Forest
- Fix perf_mmap() refcount bug found by Syzkaller
- Fix __perf_event_overflow() vs perf_remove_from_context() race
* tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
perf/core: Fix refcount bug and potential UAF in perf_mmap
perf/x86/intel/uncore: Add per-scheduler IMC CAS count events
perf/core: Fix invalid wait context in ctx_sched_in()
Before rseq became extensible, its original size was 32 bytes even
though the active rseq area was only 20 bytes. This had the following
impact in terms of userspace ecosystem evolution:
* The GNU libc between 2.35 and 2.39 expose a __rseq_size symbol set
to 32, even though the size of the active rseq area is really 20.
* The GNU libc 2.40 changes this __rseq_size to 20, thus making it
express the active rseq area.
* Starting from glibc 2.41, __rseq_size corresponds to the
AT_RSEQ_FEATURE_SIZE from getauxval(3).
This means that users of __rseq_size can always expect it to
correspond to the active rseq area, except for the value 32, for
which the active rseq area is 20 bytes.
Exposing a 32 bytes feature size would make life needlessly painful
for userspace. Therefore, add a reserved field at the end of the
rseq area to bump the feature size to 33 bytes. This reserved field
is expected to be replaced with whatever field will come next,
expecting that this field will be larger than 1 byte.
The effect of this change is to increase the size from 32 to 64 bytes
before we actually have fields using that memory.
Clarify the allocation size and alignment requirements in the struct
rseq uapi comment.
Change the value returned by getauxval(AT_RSEQ_ALIGN) to return the
value of the active rseq area size rounded up to next power of 2, which
guarantees that the rseq structure will always be aligned on the nearest
power of two large enough to contain it, even as it grows. Change the
alignment check in the rseq registration accordingly.
This will minimize the amount of ABI corner-cases we need to document
and require userspace to play games with. The rule stays simple when
__rseq_size != 32:
#define rseq_field_available(field) (__rseq_size >= offsetofend(struct rseq_abi, field))
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com
Pull powerpc updates for 7.0
- Implement masked user access
- Add bpf support for internal only per-CPU instructions and inline the
bpf_get_smp_processor_id() and bpf_get_current_task() functions
- Fix pSeries MSI-X allocation failure when quota is exceeded
- Fix recursive pci_lock_rescan_remove locking in EEH event handling
- Support tailcalls with subprogs & BPF exceptions on 64bit
- Extend "trusted" keys to support the PowerVM Key Wrapping Module
(PKWM)
Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.
* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
powerpc/pseries: plpks: export plpks_wrapping_is_supported
docs: trusted-encryped: add PKWM as a new trust source
keys/trusted_keys: establish PKWM as a trusted source
pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
pseries/plpks: expose PowerVM wrapping features via the sysfs
powerpc/pseries: move the PLPKS config inside its own sysfs directory
pseries/plpks: fix kernel-doc comment inconsistencies
powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
powerpc: kgdb: Remove OUTBUFMAX constant
powerpc64/bpf: Additional NVR handling for bpf_throw
powerpc64/bpf: Support exceptions
powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
powerpc64/bpf: Avoid tailcall restore from trampoline
powerpc64/bpf: Support tailcalls with subprogs
powerpc64/bpf: Moving tail_call_cnt to bottom of frame
powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
...
The commit 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()")
implemented __WARN_printf(), which changed the mechanism to use UD1
instead of UD2. However, it only handles the trap in the runtime IDT
handler, while the early booting IDT handler lacks this handling. As a
result, the usage of WARN() before the runtime IDT setup can lead to
kernel crashes. Since KMSAN is enabled after the runtime IDT setup, it
is safe to use handle_bug() directly in early_fixup_exception() to
address this issue.
Fixes: 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/c4fb3645f60d3a78629d9870e8fcc8535281c24f.1768016713.git.houwenlong.hwl@antgroup.com
Pull spi fixes from Mark Brown:
"One final batch of fixes for the Tegra SPI drivers, the main one is a
batch of fixes for races with the interrupts in the Tegra210 QSPI
driver that Breno has been working on for a while"
* tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: tegra114: Preserve SPI mode bits in def_command1_reg
spi: tegra: Fix a memory leak in tegra_slink_probe()
spi: tegra210-quad: Protect curr_xfer check in IRQ handler
spi: tegra210-quad: Protect curr_xfer clearing in tegra_qspi_non_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer in tegra_qspi_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer assignment in tegra_qspi_setup_transfer_one
spi: tegra210-quad: Move curr_xfer read inside spinlock
spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed transfer
When a block read returns an invalid length, zero or >I2C_SMBUS_BLOCK_MAX,
the length handler sets the state to IMX_I2C_STATE_FAILED. However,
i2c_imx_master_isr() unconditionally overwrites this with
IMX_I2C_STATE_READ_CONTINUE, causing an endless read loop that overruns
buffers and crashes the system.
Guard the state transition to preserve error states set by the length
handler.
Fixes: 5f5c2d4579ca ("i2c: imx: prevent rescheduling in non dma mode")
Signed-off-by: LI Qingwu <Qing-wu.Li@leica-geosystems.com.cn>
Cc: <stable@vger.kernel.org> # v6.13+
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260116111906.3413346-2-Qing-wu.Li@leica-geosystems.com.cn
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull fsverity fixes from Eric Biggers:
- Fix a build error on parisc
- Remove the non-large-folio-aware function fsverity_verify_page()
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: fix build error by adding fsverity_readahead() stub
fsverity: remove fsverity_verify_page()
f2fs: make f2fs_verify_cluster() partially large-folio-aware
f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()
Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.
However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, and not making forward progress.
A quick investigation shows that this is because of the above
rejection code.
As it turns out, there are multiple issues at play:
- while the HPFAR_EL2 register gives you the faulting IPA minus
the bottom 12 bits, it will still give you the extra bits that
are part of the page offset for anything larger than 4kB,
even for a level-3 mapping
- pkvm_pgtable_stage2_map() assumes that the address passed as
a parameter is aligned to the size of the intended mapping
- the faulting address is only aligned for a non-page mapping
When the planets are suitably aligned (pun intended), the guest
faults on a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again...
This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.
Fix it by always forcing the alignment on vma_pagesize, irrespective
of the value of vma_pagesize.
Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org
Cc: stable@vger.kernel.org
Pull locking fix from Ingo Molnar:
"Now that LLVM 22 has been released officially, require a release
version to use the new CONFIG_WARN_CONTEXT_ANALYSIS feature.
In particular this avoids the widely used Android clang 22.0.1
pre-release build which is known to be broken for this usecase"
* tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis
Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.
This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.
Fixes: 592903cdcbf6 ("perf_counter: add an event_list")
Reported-by: Simond Hu <cmdhh1767@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Simond Hu <cmdhh1767@gmail.com>
Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net
The rseq registration validates that the rseq_size argument is greater
or equal to 32 (the original rseq size), but the comment associated with
this check does not clearly state this.
Clarify the comment to that effect.
Fixes: ee3e3ac05c26 ("rseq: Introduce extensible rseq ABI")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-2-mathieu.desnoyers@efficios.com
Pull parisc updates from Helge Deller:
- Fix device reference leak in error path
- Check if system provides a 64-bit free running platform counter
- Minor fixes in debug code
* tag 'parisc-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: lba_pci: Add debug code to show IO and PA ranges
parisc: Detect 64-bit free running platform counter
parisc: Fix minor printk issues in iosapic debug code
parisc: Enhance debug code for PAT firmware
parisc: Add PDC PAT call to get free running 64-bit counter
parisc: Fix module path output in qemu tables
parisc: Export model name for MPE/ix
parisc: Prevent interrupts during reboot
parisc: Print hardware IDs as 4 digit hex strings
parisc: kernel: replace kfree() with put_device() in create_tree_node()
Building trusted-keys as a module fails modpost with:
ERROR: modpost: "plpks_wrapping_is_supported" [security/keys/trusted-keys/
trusted.ko] undefined!
Export plpks_wrapping_is_supported() so trusted-keys links cleanly
This patch is intended to be applied on top of the earlier "Extend "trusted
" keys to support a new trust source named the PowerVM Key Wrapping Module
(PKWM)" series (v5).
Link: https://lore.kernel.org/all/20260127145228.48320-1-ssrish@linux.ibm.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602010724.1g9hbLKv-lkp@intel.com/
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260201165344.950870-1-ssrish@linux.ibm.com
array_index_nospec() is no use if the result gets spilled to the stack, as
it makes the believed safe-under-speculation value subject to memory
predictions.
For all practical purposes, this means array_index_nospec() must be used in
the expression that accesses the array.
As the code currently stands, it's the wrong side of irqentry_enter(), and
'index' is put into %ebp across the function call.
Remove the index variable and reposition array_index_nospec(), so it's
calculated immediately before the array access.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260106131504.679932-1-andrew.cooper3@citrix.com
Pull regulator fix from Mark Brown:
"One last fix for v6.19: the voltages for the SpaceMIT P1 were not
described correctly"
* tag 'regulator-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: spacemit-p1: Fix n_voltages for BUCK and LDO regulators
The COMMAND1 register bits [29:28] set the SPI mode, which controls
the clock idle level. When a transfer ends, tegra_spi_transfer_end()
writes def_command1_reg back to restore the default state, but this
register value currently lacks the mode bits. This results in the
clock always being configured as idle low, breaking devices that
need it high.
Fix this by storing the mode bits in def_command1_reg during setup,
to prevent this field from always being cleared.
Fixes: f333a331adfa ("spi/tegra114: add spi driver")
Signed-off-by: Vishwaroop A <va@nvidia.com>
Link: https://patch.msgid.link/20260204141212.1540382-1-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull crypto library fix from Eric Biggers:
"Fix a big endian specific issue in the PPC64-optimized AES code"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs
hppa-linux-gcc 9.5.0 generates a call to fsverity_readahead() in
f2fs_readahead() when CONFIG_FS_VERITY=n, because it fails to do the
expected dead code elimination based on vi always being NULL. Fix the
build error by adding an inline stub for fsverity_readahead(). Since
it's just for opportunistic readahead, just make it a no-op.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602180838.pwICdY2r-lkp@intel.com/
Fixes: 45dcb3ac9832 ("f2fs: consolidate fsverity_info lookup")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20260218012244.18536-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull irqchip driver fixes from Ingo Molnar:
- Fix frozen interrupt bug in the sifive-plic driver
- Limit per-device MSI interrupts on uncommon gic-v3-its hardware
variants
- Address Sparse warning by constifying a variable in the MMP driver
- Revert broken commit and also fix an error check in the ls-extirq
driver
* tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ls-extirq: Fix devm_of_iomap() error check
Revert "irqchip/ls-extirq: Use for_each_of_imap_item iterator"
irqchip/mmp: Make icu_irq_chip variable static const
irqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS supports
irqchip/sifive-plic: Fix frozen interrupt due to affinity setting
Using a prerelease version as a minimum supported version for
CONFIG_WARN_CONTEXT_ANALYSIS was reasonable to do while LLVM 22 was the
development version so that people could immediately build from main and
start testing and validating this in their own code. However, it can be
problematic when using prerelease versions of LLVM 22, such as Android
clang 22.0.1 (the current android mainline compiler) or when bisecting
LLVM between llvmorg-22-init and llvmorg-23-init, to build the kernel,
as all compiler fixes for the context analysis may not be present,
potentially resulting in warnings that can easily turn into errors.
Now that LLVM 22 is released as 22.1.0, upgrade the check to require at
least this version to ensure that a user's toolchain actually has all
the changes needed for a smooth experience with context analysis.
Fixes: 3269701cb256 ("compiler-context-analysis: Add infrastructure for Context Analysis with Clang")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://patch.msgid.link/20260224-bump-clang-ver-context-analysis-v1-1-16cc7a90a040@kernel.org
Syzkaller reported a refcount_t: addition on 0; use-after-free warning
in perf_mmap.
The issue is caused by a race condition between a failing mmap() setup
and a concurrent mmap() on a dependent event (e.g., using output
redirection).
In perf_mmap(), the ring_buffer (rb) is allocated and assigned to
event->rb with the mmap_mutex held. The mutex is then released to
perform map_range().
If map_range() fails, perf_mmap_close() is called to clean up.
However, since the mutex was dropped, another thread attaching to
this event (via inherited events or output redirection) can acquire
the mutex, observe the valid event->rb pointer, and attempt to
increment its reference count. If the cleanup path has already
dropped the reference count to zero, this results in a
use-after-free or refcount saturation warning.
Fix this by extending the scope of mmap_mutex to cover the
map_range() call. This ensures that the ring buffer initialization
and mapping (or cleanup on failure) happens atomically effectively,
preventing other threads from accessing a half-initialized or
dying ring buffer.
Closes: https://lore.kernel.org/oe-kbuild-all/202602020208.m7KIjdzW-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Haocheng Yu <yuhaocheng035@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260202162057.7237-1-yuhaocheng035@gmail.com
Kernel test robot reported that
tools/testing/selftests/kvm/hardware_disable_test was failing due to
commit 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt()
and rq_modified_*()")
It turns out there were two related problems that could lead to a
missed preemption:
- when hitting newidle balance from the idle thread, it would elevate
rb->next_class from &idle_sched_class to &fair_sched_class, causing
later wakeup_preempt() calls to not hit the sched_class_above()
case, and not issue resched_curr().
Notably, this modification pattern should only lower the
next_class, and never raise it. Create two new helper functions to
wrap this.
- when doing schedule_idle(), it was possible to miss (re)setting
rq->next_class to &idle_sched_class, leading to the very same
problem.
Cc: Sean Christopherson <seanjc@google.com>
Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602122157.4e861298-lkp@intel.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260218163329.GQ1395416@noisy.programming.kicks-ass.net
Pull SoC devicetree updates from Arnd Bergmann:
"There are a handful of new SoCs this time, all of these are more or
less related to chips in a wider family:
- SpacemiT Key Stone K3 is an 8-core risc-v chip, and the first
widely available RVA23 implementation. Note that this is entirely
unrelated with the similarly named Texas Instruments K3 chip family
that follwed the TI Keystone2 SoC.
- The Realtek Kent family of SoCs contains three chip models
rtd1501s, rtd1861b and rtd1920s, and is related to their earlier
Set-top-box and NAS products such as rtd1619, but is built on newer
Arm Cortex-A78 cores.
- The Qualcomm Milos family includes the Snapdragon 7s Gen 3 (SM7635)
mobile phone SoC built around Armv9 Kryo cores of the Arm
Cortex-A720 generation. This one is used in the Fairphone Gen 6
- Qualcomm Kaanapali is a new SoC based around eight high performance
Oryon CPU cores
- NXP i.MX8QP and i.MX952 are both feature reduced versions of chips
we already support, i.e. the i.MX8QM and i.MX952, with fewer CPU
cores and I/O interfaces.
As part of a cleanup, a number of SoC specific devicetree files got
removed because they did not have a single board using the .dtsi files
and they were never compile tested as a result: Samsung s3c6400, ST
spear320s, ST stm32mp21xc/stm32mp23xc/stm32mp25xc, Renesas
r8a779m0/r8a779m2/r8a779m4/r8a779m6/r8a779m7/r8a779m8/r8a779mb/
r9a07g044c1/r9a07g044l1/r9a07g054l1/r9a09g047e37, and TI
am3703/am3715. All of these could be restored easily if a new board
gets merged.
Broadcom/Cavium/Marvell ThunderX2 gets removed along with its only
machine, as all remaining users are assumed to be using ACPI based
firmware.
A relatively small number of 43 boards get added this time, and almost
all of them for arm64. Aside from the reference boards for the newly
added SoCs, this includes:
- Three server boards use 32-bit ASpeed BMCs
- One more reference board for 32-bit Microchip LAN9668
- 64-bit Arm single-board computers based on Amlogic s905y4, CIX
sky1, NXP ls1028a/imx8mn/imx8mp/imx91/imx93/imx95, Qualcomm
qcs6490/qrb2210 and Rockchip rk3568/rk3588s
- Carrier board for SOMs using Intel agilex5, Marvell Armada 7020,
NXP iMX8QP, Mediatek mt8370/mt8390 and rockchip rk3588
- Two mobile phones using Snapdragon 845
- A gaming device and a NAS box, both based on Rockchips rk356x
On top of the newly added boards and SoCs, there is a lot of
background activity going into cleanups, in particular towards getting
a warning-free dtc build, and the usual work on adding support for
more hardware on the previously added machines"
* tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (757 commits)
dt-bindings: intel: Add Agilex eMMC support
arm64: dts: socfpga: agilex: add emmc support
arm64: dts: intel: agilex5: Add simple-bus node on top of dma controller node
ARM: dts: socfpga: fix dtbs_check warning for fpga-region
ARM: dts: socfpga: add #address-cells and #size-cells for sram node
dt-bindings: altera: document syscon as fallback for sys-mgr
arm64: dts: altera: Use lowercase hex
dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml
arm64: dts: socfpga: agilex5: Add IOMMUS property for ethernet nodes
arm64: dts: socfpga: agilex5: add support for modular board
dt-bindings: intel: Add Agilex5 SoCFPGA modular board
arm64: dts: socfpga: agilex5: Add dma-coherent property
arm64: dts: realtek: Add Kent SoC and EVB device trees
dt-bindings: arm: realtek: Add Kent Soc family compatibles
ARM: dts: samsung: Drop s3c6400.dtsi
ARM: dts: nuvoton: Minor whitespace cleanup
MAINTAINERS: Add Falcon DB
arm64: dts: a7k: add COM Express boards
ARM: dts: microchip: Drop usb_a9g20-dab-mmx.dtsi
arm64: dts: rockchip: Fix rk3588 PCIe range mappings
...
Add more code to debug the PAT PDC firmware.
Signed-off-by: Helge Deller <deller@gmx.de>
Update Documentation/security/keys/trusted-encrypted.rst and Documentation/
admin-guide/kernel-parameters.txt with PowerVM Key Wrapping Module (PKWM)
as a new trust source
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-7-ssrish@linux.ibm.com
Pull binder fixes from Greg KH:
"Here are some small, last-minute binder C and Rust driver fixes for
reported issues. They include a number of fixes for reported crashes
and other problems.
All of these have been in linux-next this week, and longer"
* tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binderfs: fix ida_alloc_max() upper bound
rust_binderfs: fix ida_alloc_max() upper bound
binder: fix BR_FROZEN_REPLY error log
rust_binder: add additional alignment checks
binder: fix UAF in binder_netlink_report()
rust_binder: correctly handle FDA objects of length zero
Higher voltage settings were unusable due to incorrect n_voltages values
causing registration failures. For example, setting aldo4 to 3.3V failed
with -EINVAL because the required selector (123) exceeded the allowed
range (n_voltages=117).
Fix by aligning n_voltages with the hardware register widths per the P1
datasheet [1]:
- BUCK: 255 (was 254), allows selectors 0-254, selector 255 is reserved
- LDO: 128 (was 117), allows selectors 0-127, selectors 0-10 are for
suspend mode, valid operational range is 11-127
This enables the full voltage range supported by the hardware.
Fixes: 8b84d712ad84 ("regulator: spacemit: support SpacemiT P1 regulators")
Link: https://developer.spacemit.com/documentation [1]
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Link: https://patch.msgid.link/20260122-spacemit-p1-v1-1-309be27fbff9@riscstar.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In tegra_slink_probe(), when platform_get_irq() fails, it directly
returns from the function with an error code, which causes a memory leak.
Replace it with a goto label to ensure proper cleanup.
Fixes: eb9913b511f1 ("spi: tegra: Fix missing IRQ check in tegra_slink_probe()")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260202-slink-v1-1-eac50433a6f9@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull SCSI fixes from James Bottomley:
"Small changes in drivers only, no core changes.
The firewire one fixes a user controlled overflow (but I still can't
see how it could be exploited)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: amd-versal2: Fix PHY initialization in HCE enable notify
scsi: firewire: sbp-target: Fix overflow in sbp_make_tpg()
scsi: be2iscsi: Fix a memory leak in beiscsi_boot_get_sinfo()
scsi: qla2xxx: edif: Fix dma_free_coherent() size
Stephen retired and stepped back from -next maintainership, update his
entry in CREDITS to recognise his 18 years of hard work making it what
it is today and all the impact it's had on our development process.
Also update to his current GnuPG key while we're here.
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
User-controlled indices are used to access AIA CSR registers.
Sanitize them with array_index_nospec() to prevent speculative
out-of-bounds access.
Similar to x86 commit 8c86405f606c ("KVM: x86: Protect
ioapic_read_indirect() from Spectre-v1/L1TF attacks") and arm64
commit 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1
in vgic_get_irq()").
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://lore.kernel.org/r/20260303-kvm-riscv-spectre-v1-v2-2-192caab8e0dc@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
User-controlled register indices from the ONE_REG ioctl are used to
index into arrays of register values. Sanitize them with
array_index_nospec() to prevent speculative out-of-bounds access.
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://lore.kernel.org/r/20260303-kvm-riscv-spectre-v1-v2-1-192caab8e0dc@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
When dirty logging is enabled, guest stage mappings are forced to
PAGE_SIZE granularity. Changing the mapping page size at this point
is incorrect.
Fixes: ed7ae7a34bea ("RISC-V: KVM: Transparent huge page support")
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260226191231140_X1Juus7s2kgVlc0ZyW_K@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM_DEV_RISCV_AIA_GRP_APLIC branch of aia_has_attr() was identified
to have a race condition with concurrent KVM_SET_DEVICE_ATTR ioctls,
leading to a use-after-free bug.
Upon analyzing the code, it was discovered that the
KVM_DEV_RISCV_AIA_GRP_IMSIC branch of aia_has_attr() suffers from the same
lack of synchronization. It invokes kvm_riscv_aia_imsic_has_attr() without
holding dev->kvm->lock.
While aia_has_attr() is running, a concurrent aia_set_attr() could call
aia_init() under the dev->kvm->lock. If aia_init() fails, it may trigger
kvm_riscv_vcpu_aia_imsic_cleanup(), which frees imsic_state. Without proper
locking, kvm_riscv_aia_imsic_has_attr() could attempt to access imsic_state
while it is being deallocated.
Although this specific path has not yet been reported by a fuzzer, it
is logically identical to the APLIC issue. Fix this by acquiring the
dev->kvm->lock before calling kvm_riscv_aia_imsic_has_attr(), ensuring
consistency with the locking pattern used for other AIA attribute groups.
Fixes: 5463091a51cf ("RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260304080804.2281721-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
Fuzzer reports a KASAN use-after-free bug triggered by a race
between KVM_HAS_DEVICE_ATTR and KVM_SET_DEVICE_ATTR ioctls on
the AIA device. The root cause is that aia_has_attr() invokes
kvm_riscv_aia_aplic_has_attr() without holding dev->kvm->lock, while
a concurrent aia_set_attr() may call aia_init() under that lock. When
aia_init() fails after kvm_riscv_aia_aplic_init() has succeeded, it
calls kvm_riscv_aia_aplic_cleanup() in its fail_cleanup_imsics path,
which frees both aplic_state and aplic_state->irqs. The concurrent
has_attr path can then dereference the freed aplic->irqs in
aplic_read_pending():
irqd = &aplic->irqs[irq]; /* UAF here */
KASAN report:
BUG: KASAN: slab-use-after-free in aplic_read_pending
arch/riscv/kvm/aia_aplic.c:119 [inline]
BUG: KASAN: slab-use-after-free in aplic_read_pending_word
arch/riscv/kvm/aia_aplic.c:351 [inline]
BUG: KASAN: slab-use-after-free in aplic_mmio_read_offset
arch/riscv/kvm/aia_aplic.c:406
Read of size 8 at addr ff600000ba965d58 by task 9498
Call Trace:
aplic_read_pending arch/riscv/kvm/aia_aplic.c:119 [inline]
aplic_read_pending_word arch/riscv/kvm/aia_aplic.c:351 [inline]
aplic_mmio_read_offset arch/riscv/kvm/aia_aplic.c:406
kvm_riscv_aia_aplic_has_attr arch/riscv/kvm/aia_aplic.c:566
aia_has_attr arch/riscv/kvm/aia_device.c:469
allocated by task 9473:
kvm_riscv_aia_aplic_init arch/riscv/kvm/aia_aplic.c:583
aia_init arch/riscv/kvm/aia_device.c:248 [inline]
aia_set_attr arch/riscv/kvm/aia_device.c:334
freed by task 9473:
kvm_riscv_aia_aplic_cleanup arch/riscv/kvm/aia_aplic.c:644
aia_init arch/riscv/kvm/aia_device.c:292 [inline]
aia_set_attr arch/riscv/kvm/aia_device.c:334
Fix this race by acquiring dev->kvm->lock in aia_has_attr() before
calling kvm_riscv_aia_aplic_has_attr(), consistent with the locking
pattern used in aia_get_attr() and aia_set_attr().
Fixes: 289a007b98b06d ("RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260302132703.1721415-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
The indexed array only has RISCV_KVM_MAX_COUNTERS elements.
The out-of-bound access could have been performed by a guest,
but it could only access another guest accessible data.
Fixes: 8f0153ecd3bf ("RISC-V: KVM: Add skeleton support for perf")
Signed-off-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260227134617.23378-1-radim.krcmar@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
kvm_riscv_vcpu_aia_rmw_topei() assumes that the per-vCPU IMSIC state has
been initialized once AIA is reported as available and initialized at
the VM level. This assumption does not always hold.
Under fuzzed ioctl sequences, a guest may access the IMSIC TOPEI CSR
before the vCPU IMSIC state is set up. In this case,
vcpu->arch.aia_context.imsic_state is still NULL, and the TOPEI RMW path
dereferences it unconditionally, leading to a host kernel crash.
The crash manifests as:
Unable to handle kernel paging request at virtual address
dfffffff0000000e
...
kvm_riscv_vcpu_aia_imsic_rmw arch/riscv/kvm/aia_imsic.c:909
kvm_riscv_vcpu_aia_rmw_topei arch/riscv/kvm/aia.c:231
csr_insn arch/riscv/kvm/vcpu_insn.c:208
system_opcode_insn arch/riscv/kvm/vcpu_insn.c:281
kvm_riscv_vcpu_virtual_insn arch/riscv/kvm/vcpu_insn.c:355
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:230
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:1008
...
Fix this by explicitly checking whether the vCPU IMSIC state has been
initialized before handling TOPEI CSR accesses. If not, forward the CSR
emulation to user space.
Fixes: db8b7e97d6137 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260226085119.643295-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
While fuzzing KVM on RISC-V, a use-after-free was observed in
kvm_riscv_gstage_get_leaf(), where ptep_get() dereferences a
freed gstage page table page during gfn unmap.
The crash manifests as:
use-after-free in ptep_get include/linux/pgtable.h:340 [inline]
use-after-free in kvm_riscv_gstage_get_leaf arch/riscv/kvm/gstage.c:89
Call Trace:
ptep_get include/linux/pgtable.h:340 [inline]
kvm_riscv_gstage_get_leaf+0x2ea/0x358 arch/riscv/kvm/gstage.c:89
kvm_riscv_gstage_unmap_range+0xf0/0x308 arch/riscv/kvm/gstage.c:265
kvm_unmap_gfn_range+0x168/0x1fc arch/riscv/kvm/mmu.c:256
kvm_mmu_unmap_gfn_range virt/kvm/kvm_main.c:724 [inline]
page last free pid 808 tgid 808 stack trace:
kvm_riscv_mmu_free_pgd+0x1b6/0x26a arch/riscv/kvm/mmu.c:457
kvm_arch_flush_shadow_all+0x1a/0x24 arch/riscv/kvm/mmu.c:134
kvm_flush_shadow_all virt/kvm/kvm_main.c:344 [inline]
The UAF is caused by gstage page table walks running concurrently with
gstage pgd teardown. In particular, kvm_unmap_gfn_range() can traverse
gstage page tables while kvm_arch_flush_shadow_all() frees the pgd,
leading to use-after-free of page table pages.
Fix the issue by serializing gstage unmap and pgd teardown with
kvm->mmu_lock. Holding mmu_lock ensures that gstage page tables
remain valid for the duration of unmap operations and prevents
concurrent frees.
This matches existing RISC-V KVM usage of mmu_lock to protect gstage
map/unmap operations, e.g. kvm_riscv_mmu_iounmap.
Fixes: dd82e35638d67f ("RISC-V: KVM: Factor-out g-stage page table management")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260202040059.1801167-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
Guests can control IRQ indices via MMIO. Sanitize them with
array_index_nospec() to prevent speculative out-of-bounds access
to the aplic->irqs[] array.
Similar to arm64 commit 41b87599c743 ("KVM: arm/arm64: vgic: fix possible
spectre-v1 in vgic_get_irq()") and x86 commit 8c86405f606c ("KVM: x86:
Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks").
Fixes: 74967aa208e2 ("RISC-V: KVM: Add in-kernel emulation of AIA APLIC")
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260116095731.24555-1-lukas.gerlach@cispa.de
Signed-off-by: Anup Patel <anup@brainfault.org>
Pull kvm fixes from Paolo Bonzini:
"Arm:
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature
set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered... [his words, not mine]
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Generic:
- Remove internal Kconfigs that are now set on all architectures
- Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all
architectures finally enable it in Linux 7.0"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: always define KVM_CAP_SYNC_MMU
KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
KVM: arm64: Deduplicate ASID retrieval code
irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
KVM: arm64: Fix protected mode handling of pages larger than 4kB
KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
KVM: arm64: Fix ID register initialization for non-protected pKVM guests
KVM: arm64: Optimise away S1POE handling when not supported by host
KVM: arm64: Hide S1POE from guests when not supported by the host
Pull debugobjects fix from Thomas Gleixner:
"A single fix for debugobjects.
The deferred page initialization prevents debug objects from
allocating slab pages until the initialization is complete. That
causes depletion of the pool and disabling of debugobjects.
The reason is that debugobjects uses __GFP_HIGH for allocations as it
might be invoked from arbitrary contexts. When PREEMPT_COUNT is
disabled there is no way to know whether the context is safe to set
__GFP_KSWAPD_RECLAIM.
This worked until v6.18. Since then allocations w/o a reclaim flag
cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(),
which returns early when deferred page initialization has not yet
completed.
Work around that when PREEMPT_COUNT is enabled as the preempt counter
allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when
the context is preemtible. When PREEMPT_COUNT is disabled the context
is unknown and the reclaim bit can't be set because the caller might
hold locks which might deadlock in the allocator.
That makes debugobjects depend on PREEMPT_COUNT ||
!DEFERRED_STRUCT_PAGE_INIT, which limits the coverage slightly, but
keeps it functional for most cases"
* tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Make it work with deferred page initialization - again
KVM/arm64 fixes for 7.0, take #1
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered...
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
Pull x86 fixes from Ingo Molnar:
- Fix speculative safety in fred_extint()
- Fix __WARN_printf() trap in early_fixup_exception()
- Fix clang-build boot bug for unusual alignments, triggered by
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y
- Replace the final few __ASSEMBLY__ stragglers that snuck in lately
into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again)
* tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__
x86/cfi: Fix CFI rewrite for odd alignments
x86/bug: Handle __WARN_printf() trap in early_fixup_exception()
x86/fred: Correct speculative safety in fred_extint()
debugobjects uses __GFP_HIGH for allocations as it might be invoked
within locked regions. That worked perfectly fine until v6.18. It still
works correctly when deferred page initialization is disabled and works
by chance when no page allocation is required before deferred page
initialization has completed.
Since v6.18 allocations w/o a reclaim flag cause new_slab() to end up in
alloc_frozen_pages_nolock_noprof(), which returns early when deferred
page initialization has not yet completed. As the deferred page
initialization takes quite a while the debugobject pool is depleted and
debugobjects are disabled.
This can be worked around when PREEMPT_COUNT is enabled as that allows
debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context
is preemtible. When PREEMPT_COUNT is disabled the context is unknown and
the reclaim bit can't be set because the caller might hold locks which
might deadlock in the allocator.
In preemptible context the reclaim bit is harmless and not a performance
issue as that's usually invoked from slow path initialization context.
That makes debugobjects depend on PREEMPT_COUNT || !DEFERRED_STRUCT_PAGE_INIT.
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/87pl6gznti.ffs@tglx
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull timer fix from Ingo Molnar:
"Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for
the common HZ=100, 250 or 1000 cases. Only use a function call for odd
HZ values like HZ=300 that generate more code.
The function call overhead showed up in performance tests of the TCP
code"
* tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs()
After converting the __ASSEMBLY__ statements to __ASSEMBLER__ in
commit 24a295e4ef1ca ("x86/headers: Replace __ASSEMBLY__ with
__ASSEMBLER__ in non-UAPI headers"), some new code has been
added that uses __ASSEMBLY__ again. Convert these stragglers, too.
This is a mechanical patch, done with a simple "sed -i" command.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251218182029.166993-1-thuth@redhat.com
It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.
Re-add the missing !, fixing the behaviour.
Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull scheduler fixes from Ingo Molnar:
- Fix zero_vruntime tracking when there's a single task running
- Fix slice protection logic
- Fix the ->vprot logic for reniced tasks
- Fix lag clamping in mixed slice workloads
- Fix objtool uaccess warning (and bug) in the
!CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining,
which triggers with older compilers
- Fix a comment in the rseq registration rseq_size bound check code
- Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
differently, which special size we now reached naturally and want to
avoid. The visible ugliness of the new reserved field will be avoided
the next time the RSEQ area is extended.
* tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: slice ext: Ensure rseq feature size differs from original rseq size
rseq: Clarify rseq registration rseq_size bound check comment
sched/core: Fix wakeup_preempt's next_class tracking
rseq: Mark rseq_arm_slice_extension_timer() __always_inline
sched/fair: Fix lag clamp
sched/eevdf: Update se->vprot in reweight_entity()
sched/fair: Only set slice protection at pick time
sched/fair: Fix zero_vruntime tracking
For common cases (HZ=100, 250 or 1000), these helpers are at most one
multiply, so there is no point calling a tiny function.
Keep them out of line for HZ=300 and others.
This saves cycles in TCP fast path, among other things.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/8 grow/shrink: 25/89 up/down: 530/-3474 (-2944)
...
nla_put_msecs 193 - -193
message_stats_print 2131 920 -1211
Total: Before=25365208, After=25362264, chg -0.01%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260210170226.57209-1-edumazet@google.com
Rustam reported his clang builds did not boot properly; turns out his
.config has: CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y set.
Fix up the FineIBT code to deal with this unusual alignment.
Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT")
Reported-by: Rustam Kovhaev <rkovhaev@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Rustam Kovhaev <rkovhaev@gmail.com>
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/
Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported")
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull perf events fixes from Ingo Molnar:
- Fix lock ordering bug found by lockdep in perf_event_wakeup()
- Fix uncore counter enumeration on Granite Rapids and Sierra Forest
- Fix perf_mmap() refcount bug found by Syzkaller
- Fix __perf_event_overflow() vs perf_remove_from_context() race
* tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix __perf_event_overflow() vs perf_remove_from_context() race
perf/core: Fix refcount bug and potential UAF in perf_mmap
perf/x86/intel/uncore: Add per-scheduler IMC CAS count events
perf/core: Fix invalid wait context in ctx_sched_in()
Before rseq became extensible, its original size was 32 bytes even
though the active rseq area was only 20 bytes. This had the following
impact in terms of userspace ecosystem evolution:
* The GNU libc between 2.35 and 2.39 expose a __rseq_size symbol set
to 32, even though the size of the active rseq area is really 20.
* The GNU libc 2.40 changes this __rseq_size to 20, thus making it
express the active rseq area.
* Starting from glibc 2.41, __rseq_size corresponds to the
AT_RSEQ_FEATURE_SIZE from getauxval(3).
This means that users of __rseq_size can always expect it to
correspond to the active rseq area, except for the value 32, for
which the active rseq area is 20 bytes.
Exposing a 32 bytes feature size would make life needlessly painful
for userspace. Therefore, add a reserved field at the end of the
rseq area to bump the feature size to 33 bytes. This reserved field
is expected to be replaced with whatever field will come next,
expecting that this field will be larger than 1 byte.
The effect of this change is to increase the size from 32 to 64 bytes
before we actually have fields using that memory.
Clarify the allocation size and alignment requirements in the struct
rseq uapi comment.
Change the value returned by getauxval(AT_RSEQ_ALIGN) to return the
value of the active rseq area size rounded up to next power of 2, which
guarantees that the rseq structure will always be aligned on the nearest
power of two large enough to contain it, even as it grows. Change the
alignment check in the rseq registration accordingly.
This will minimize the amount of ABI corner-cases we need to document
and require userspace to play games with. The rule stays simple when
__rseq_size != 32:
#define rseq_field_available(field) (__rseq_size >= offsetofend(struct rseq_abi, field))
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com
Pull powerpc updates for 7.0
- Implement masked user access
- Add bpf support for internal only per-CPU instructions and inline the
bpf_get_smp_processor_id() and bpf_get_current_task() functions
- Fix pSeries MSI-X allocation failure when quota is exceeded
- Fix recursive pci_lock_rescan_remove locking in EEH event handling
- Support tailcalls with subprogs & BPF exceptions on 64bit
- Extend "trusted" keys to support the PowerVM Key Wrapping Module
(PKWM)
Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.
* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
powerpc/pseries: plpks: export plpks_wrapping_is_supported
docs: trusted-encryped: add PKWM as a new trust source
keys/trusted_keys: establish PKWM as a trusted source
pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
pseries/plpks: expose PowerVM wrapping features via the sysfs
powerpc/pseries: move the PLPKS config inside its own sysfs directory
pseries/plpks: fix kernel-doc comment inconsistencies
powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
powerpc: kgdb: Remove OUTBUFMAX constant
powerpc64/bpf: Additional NVR handling for bpf_throw
powerpc64/bpf: Support exceptions
powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
powerpc64/bpf: Avoid tailcall restore from trampoline
powerpc64/bpf: Support tailcalls with subprogs
powerpc64/bpf: Moving tail_call_cnt to bottom of frame
powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
...
The commit 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()")
implemented __WARN_printf(), which changed the mechanism to use UD1
instead of UD2. However, it only handles the trap in the runtime IDT
handler, while the early booting IDT handler lacks this handling. As a
result, the usage of WARN() before the runtime IDT setup can lead to
kernel crashes. Since KMSAN is enabled after the runtime IDT setup, it
is safe to use handle_bug() directly in early_fixup_exception() to
address this issue.
Fixes: 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/c4fb3645f60d3a78629d9870e8fcc8535281c24f.1768016713.git.houwenlong.hwl@antgroup.com
Pull spi fixes from Mark Brown:
"One final batch of fixes for the Tegra SPI drivers, the main one is a
batch of fixes for races with the interrupts in the Tegra210 QSPI
driver that Breno has been working on for a while"
* tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: tegra114: Preserve SPI mode bits in def_command1_reg
spi: tegra: Fix a memory leak in tegra_slink_probe()
spi: tegra210-quad: Protect curr_xfer check in IRQ handler
spi: tegra210-quad: Protect curr_xfer clearing in tegra_qspi_non_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer in tegra_qspi_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer assignment in tegra_qspi_setup_transfer_one
spi: tegra210-quad: Move curr_xfer read inside spinlock
spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed transfer
When a block read returns an invalid length, zero or >I2C_SMBUS_BLOCK_MAX,
the length handler sets the state to IMX_I2C_STATE_FAILED. However,
i2c_imx_master_isr() unconditionally overwrites this with
IMX_I2C_STATE_READ_CONTINUE, causing an endless read loop that overruns
buffers and crashes the system.
Guard the state transition to preserve error states set by the length
handler.
Fixes: 5f5c2d4579ca ("i2c: imx: prevent rescheduling in non dma mode")
Signed-off-by: LI Qingwu <Qing-wu.Li@leica-geosystems.com.cn>
Cc: <stable@vger.kernel.org> # v6.13+
Reviewed-by: Stefan Eichenberger <eichest@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260116111906.3413346-2-Qing-wu.Li@leica-geosystems.com.cn
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull fsverity fixes from Eric Biggers:
- Fix a build error on parisc
- Remove the non-large-folio-aware function fsverity_verify_page()
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: fix build error by adding fsverity_readahead() stub
fsverity: remove fsverity_verify_page()
f2fs: make f2fs_verify_cluster() partially large-folio-aware
f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()
Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.
However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, and not making forward progress.
A quick investigation shows that this is because of the above
rejection code.
As it turns out, there are multiple issues at play:
- while the HPFAR_EL2 register gives you the faulting IPA minus
the bottom 12 bits, it will still give you the extra bits that
are part of the page offset for anything larger than 4kB,
even for a level-3 mapping
- pkvm_pgtable_stage2_map() assumes that the address passed as
a parameter is aligned to the size of the intended mapping
- the faulting address is only aligned for a non-page mapping
When the planets are suitably aligned (pun intended), the guest
faults on a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again...
This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.
Fix it by always forcing the alignment on vma_pagesize, irrespective
of the value of vma_pagesize.
Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org
Cc: stable@vger.kernel.org
Pull locking fix from Ingo Molnar:
"Now that LLVM 22 has been released officially, require a release
version to use the new CONFIG_WARN_CONTEXT_ANALYSIS feature.
In particular this avoids the widely used Android clang 22.0.1
pre-release build which is known to be broken for this usecase"
* tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis
Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.
This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.
Fixes: 592903cdcbf6 ("perf_counter: add an event_list")
Reported-by: Simond Hu <cmdhh1767@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Simond Hu <cmdhh1767@gmail.com>
Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net
The rseq registration validates that the rseq_size argument is greater
or equal to 32 (the original rseq size), but the comment associated with
this check does not clearly state this.
Clarify the comment to that effect.
Fixes: ee3e3ac05c26 ("rseq: Introduce extensible rseq ABI")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-2-mathieu.desnoyers@efficios.com
Pull parisc updates from Helge Deller:
- Fix device reference leak in error path
- Check if system provides a 64-bit free running platform counter
- Minor fixes in debug code
* tag 'parisc-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: lba_pci: Add debug code to show IO and PA ranges
parisc: Detect 64-bit free running platform counter
parisc: Fix minor printk issues in iosapic debug code
parisc: Enhance debug code for PAT firmware
parisc: Add PDC PAT call to get free running 64-bit counter
parisc: Fix module path output in qemu tables
parisc: Export model name for MPE/ix
parisc: Prevent interrupts during reboot
parisc: Print hardware IDs as 4 digit hex strings
parisc: kernel: replace kfree() with put_device() in create_tree_node()
Building trusted-keys as a module fails modpost with:
ERROR: modpost: "plpks_wrapping_is_supported" [security/keys/trusted-keys/
trusted.ko] undefined!
Export plpks_wrapping_is_supported() so trusted-keys links cleanly
This patch is intended to be applied on top of the earlier "Extend "trusted
" keys to support a new trust source named the PowerVM Key Wrapping Module
(PKWM)" series (v5).
Link: https://lore.kernel.org/all/20260127145228.48320-1-ssrish@linux.ibm.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602010724.1g9hbLKv-lkp@intel.com/
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260201165344.950870-1-ssrish@linux.ibm.com
array_index_nospec() is no use if the result gets spilled to the stack, as
it makes the believed safe-under-speculation value subject to memory
predictions.
For all practical purposes, this means array_index_nospec() must be used in
the expression that accesses the array.
As the code currently stands, it's the wrong side of irqentry_enter(), and
'index' is put into %ebp across the function call.
Remove the index variable and reposition array_index_nospec(), so it's
calculated immediately before the array access.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260106131504.679932-1-andrew.cooper3@citrix.com
The COMMAND1 register bits [29:28] set the SPI mode, which controls
the clock idle level. When a transfer ends, tegra_spi_transfer_end()
writes def_command1_reg back to restore the default state, but this
register value currently lacks the mode bits. This results in the
clock always being configured as idle low, breaking devices that
need it high.
Fix this by storing the mode bits in def_command1_reg during setup,
to prevent this field from always being cleared.
Fixes: f333a331adfa ("spi/tegra114: add spi driver")
Signed-off-by: Vishwaroop A <va@nvidia.com>
Link: https://patch.msgid.link/20260204141212.1540382-1-va@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
hppa-linux-gcc 9.5.0 generates a call to fsverity_readahead() in
f2fs_readahead() when CONFIG_FS_VERITY=n, because it fails to do the
expected dead code elimination based on vi always being NULL. Fix the
build error by adding an inline stub for fsverity_readahead(). Since
it's just for opportunistic readahead, just make it a no-op.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602180838.pwICdY2r-lkp@intel.com/
Fixes: 45dcb3ac9832 ("f2fs: consolidate fsverity_info lookup")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20260218012244.18536-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull irqchip driver fixes from Ingo Molnar:
- Fix frozen interrupt bug in the sifive-plic driver
- Limit per-device MSI interrupts on uncommon gic-v3-its hardware
variants
- Address Sparse warning by constifying a variable in the MMP driver
- Revert broken commit and also fix an error check in the ls-extirq
driver
* tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ls-extirq: Fix devm_of_iomap() error check
Revert "irqchip/ls-extirq: Use for_each_of_imap_item iterator"
irqchip/mmp: Make icu_irq_chip variable static const
irqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS supports
irqchip/sifive-plic: Fix frozen interrupt due to affinity setting
Using a prerelease version as a minimum supported version for
CONFIG_WARN_CONTEXT_ANALYSIS was reasonable to do while LLVM 22 was the
development version so that people could immediately build from main and
start testing and validating this in their own code. However, it can be
problematic when using prerelease versions of LLVM 22, such as Android
clang 22.0.1 (the current android mainline compiler) or when bisecting
LLVM between llvmorg-22-init and llvmorg-23-init, to build the kernel,
as all compiler fixes for the context analysis may not be present,
potentially resulting in warnings that can easily turn into errors.
Now that LLVM 22 is released as 22.1.0, upgrade the check to require at
least this version to ensure that a user's toolchain actually has all
the changes needed for a smooth experience with context analysis.
Fixes: 3269701cb256 ("compiler-context-analysis: Add infrastructure for Context Analysis with Clang")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://patch.msgid.link/20260224-bump-clang-ver-context-analysis-v1-1-16cc7a90a040@kernel.org
Syzkaller reported a refcount_t: addition on 0; use-after-free warning
in perf_mmap.
The issue is caused by a race condition between a failing mmap() setup
and a concurrent mmap() on a dependent event (e.g., using output
redirection).
In perf_mmap(), the ring_buffer (rb) is allocated and assigned to
event->rb with the mmap_mutex held. The mutex is then released to
perform map_range().
If map_range() fails, perf_mmap_close() is called to clean up.
However, since the mutex was dropped, another thread attaching to
this event (via inherited events or output redirection) can acquire
the mutex, observe the valid event->rb pointer, and attempt to
increment its reference count. If the cleanup path has already
dropped the reference count to zero, this results in a
use-after-free or refcount saturation warning.
Fix this by extending the scope of mmap_mutex to cover the
map_range() call. This ensures that the ring buffer initialization
and mapping (or cleanup on failure) happens atomically effectively,
preventing other threads from accessing a half-initialized or
dying ring buffer.
Closes: https://lore.kernel.org/oe-kbuild-all/202602020208.m7KIjdzW-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Haocheng Yu <yuhaocheng035@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260202162057.7237-1-yuhaocheng035@gmail.com
Kernel test robot reported that
tools/testing/selftests/kvm/hardware_disable_test was failing due to
commit 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt()
and rq_modified_*()")
It turns out there were two related problems that could lead to a
missed preemption:
- when hitting newidle balance from the idle thread, it would elevate
rb->next_class from &idle_sched_class to &fair_sched_class, causing
later wakeup_preempt() calls to not hit the sched_class_above()
case, and not issue resched_curr().
Notably, this modification pattern should only lower the
next_class, and never raise it. Create two new helper functions to
wrap this.
- when doing schedule_idle(), it was possible to miss (re)setting
rq->next_class to &idle_sched_class, leading to the very same
problem.
Cc: Sean Christopherson <seanjc@google.com>
Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602122157.4e861298-lkp@intel.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260218163329.GQ1395416@noisy.programming.kicks-ass.net
Pull SoC devicetree updates from Arnd Bergmann:
"There are a handful of new SoCs this time, all of these are more or
less related to chips in a wider family:
- SpacemiT Key Stone K3 is an 8-core risc-v chip, and the first
widely available RVA23 implementation. Note that this is entirely
unrelated with the similarly named Texas Instruments K3 chip family
that follwed the TI Keystone2 SoC.
- The Realtek Kent family of SoCs contains three chip models
rtd1501s, rtd1861b and rtd1920s, and is related to their earlier
Set-top-box and NAS products such as rtd1619, but is built on newer
Arm Cortex-A78 cores.
- The Qualcomm Milos family includes the Snapdragon 7s Gen 3 (SM7635)
mobile phone SoC built around Armv9 Kryo cores of the Arm
Cortex-A720 generation. This one is used in the Fairphone Gen 6
- Qualcomm Kaanapali is a new SoC based around eight high performance
Oryon CPU cores
- NXP i.MX8QP and i.MX952 are both feature reduced versions of chips
we already support, i.e. the i.MX8QM and i.MX952, with fewer CPU
cores and I/O interfaces.
As part of a cleanup, a number of SoC specific devicetree files got
removed because they did not have a single board using the .dtsi files
and they were never compile tested as a result: Samsung s3c6400, ST
spear320s, ST stm32mp21xc/stm32mp23xc/stm32mp25xc, Renesas
r8a779m0/r8a779m2/r8a779m4/r8a779m6/r8a779m7/r8a779m8/r8a779mb/
r9a07g044c1/r9a07g044l1/r9a07g054l1/r9a09g047e37, and TI
am3703/am3715. All of these could be restored easily if a new board
gets merged.
Broadcom/Cavium/Marvell ThunderX2 gets removed along with its only
machine, as all remaining users are assumed to be using ACPI based
firmware.
A relatively small number of 43 boards get added this time, and almost
all of them for arm64. Aside from the reference boards for the newly
added SoCs, this includes:
- Three server boards use 32-bit ASpeed BMCs
- One more reference board for 32-bit Microchip LAN9668
- 64-bit Arm single-board computers based on Amlogic s905y4, CIX
sky1, NXP ls1028a/imx8mn/imx8mp/imx91/imx93/imx95, Qualcomm
qcs6490/qrb2210 and Rockchip rk3568/rk3588s
- Carrier board for SOMs using Intel agilex5, Marvell Armada 7020,
NXP iMX8QP, Mediatek mt8370/mt8390 and rockchip rk3588
- Two mobile phones using Snapdragon 845
- A gaming device and a NAS box, both based on Rockchips rk356x
On top of the newly added boards and SoCs, there is a lot of
background activity going into cleanups, in particular towards getting
a warning-free dtc build, and the usual work on adding support for
more hardware on the previously added machines"
* tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (757 commits)
dt-bindings: intel: Add Agilex eMMC support
arm64: dts: socfpga: agilex: add emmc support
arm64: dts: intel: agilex5: Add simple-bus node on top of dma controller node
ARM: dts: socfpga: fix dtbs_check warning for fpga-region
ARM: dts: socfpga: add #address-cells and #size-cells for sram node
dt-bindings: altera: document syscon as fallback for sys-mgr
arm64: dts: altera: Use lowercase hex
dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml
arm64: dts: socfpga: agilex5: Add IOMMUS property for ethernet nodes
arm64: dts: socfpga: agilex5: add support for modular board
dt-bindings: intel: Add Agilex5 SoCFPGA modular board
arm64: dts: socfpga: agilex5: Add dma-coherent property
arm64: dts: realtek: Add Kent SoC and EVB device trees
dt-bindings: arm: realtek: Add Kent Soc family compatibles
ARM: dts: samsung: Drop s3c6400.dtsi
ARM: dts: nuvoton: Minor whitespace cleanup
MAINTAINERS: Add Falcon DB
arm64: dts: a7k: add COM Express boards
ARM: dts: microchip: Drop usb_a9g20-dab-mmx.dtsi
arm64: dts: rockchip: Fix rk3588 PCIe range mappings
...
Update Documentation/security/keys/trusted-encrypted.rst and Documentation/
admin-guide/kernel-parameters.txt with PowerVM Key Wrapping Module (PKWM)
as a new trust source
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-7-ssrish@linux.ibm.com
Pull binder fixes from Greg KH:
"Here are some small, last-minute binder C and Rust driver fixes for
reported issues. They include a number of fixes for reported crashes
and other problems.
All of these have been in linux-next this week, and longer"
* tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binderfs: fix ida_alloc_max() upper bound
rust_binderfs: fix ida_alloc_max() upper bound
binder: fix BR_FROZEN_REPLY error log
rust_binder: add additional alignment checks
binder: fix UAF in binder_netlink_report()
rust_binder: correctly handle FDA objects of length zero
Higher voltage settings were unusable due to incorrect n_voltages values
causing registration failures. For example, setting aldo4 to 3.3V failed
with -EINVAL because the required selector (123) exceeded the allowed
range (n_voltages=117).
Fix by aligning n_voltages with the hardware register widths per the P1
datasheet [1]:
- BUCK: 255 (was 254), allows selectors 0-254, selector 255 is reserved
- LDO: 128 (was 117), allows selectors 0-127, selectors 0-10 are for
suspend mode, valid operational range is 11-127
This enables the full voltage range supported by the hardware.
Fixes: 8b84d712ad84 ("regulator: spacemit: support SpacemiT P1 regulators")
Link: https://developer.spacemit.com/documentation [1]
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Link: https://patch.msgid.link/20260122-spacemit-p1-v1-1-309be27fbff9@riscstar.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In tegra_slink_probe(), when platform_get_irq() fails, it directly
returns from the function with an error code, which causes a memory leak.
Replace it with a goto label to ensure proper cleanup.
Fixes: eb9913b511f1 ("spi: tegra: Fix missing IRQ check in tegra_slink_probe()")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260202-slink-v1-1-eac50433a6f9@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull SCSI fixes from James Bottomley:
"Small changes in drivers only, no core changes.
The firewire one fixes a user controlled overflow (but I still can't
see how it could be exploited)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: amd-versal2: Fix PHY initialization in HCE enable notify
scsi: firewire: sbp-target: Fix overflow in sbp_make_tpg()
scsi: be2iscsi: Fix a memory leak in beiscsi_boot_get_sinfo()
scsi: qla2xxx: edif: Fix dma_free_coherent() size
Stephen retired and stepped back from -next maintainership, update his
entry in CREDITS to recognise his 18 years of hard work making it what
it is today and all the impact it's had on our development process.
Also update to his current GnuPG key while we're here.
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>