commits
Pull more arm64 updates from Catalin Marinas:
"The main 'feature' is a workaround for C1-Pro erratum 4193714
requiring IPIs during TLB maintenance if a process is running in user
space with SME enabled.
The hardware acknowledges the DVMSync messages before completing
in-flight SME accesses, with security implications. The workaround
makes use of the mm_cpumask() to track the cores that need
interrupting (arm64 hasn't used this mask before).
The rest are fixes for MPAM, CCA and generated header that turned up
during the merging window or shortly before.
Summary:
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a
recent commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation,
uninitialised variable and checker warnings"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
Pull sh updates from John Paul Adrian Glaubitz:
"Two patches from Thomas Zimmermann, one by Tim Bird and one by Thomas
Weißschuh.
The first patch by Thomas Zimmermann adds a missing include in dac.h
for SH-3 which became necessary after 243ce64b2b37 ("backlight: Do not
include <linux/fb.h> in header file") which made __raw_readb() and
__raw_writeb() inaccessible in dac.h.
Thomas' second patch drops CONFIG_FIRMWARE_EDID for SH as it depends
on X86 or EFI_GENERIC_STUB which are not defined on SH for obvious
reasons.
The patch by Tim Bird fixes just a small typo in two SPDX ID lines
which he stumbled over by accident.
And, least but not last, the patch by Thomas Weißschuh removes the
CONFIG_VSYSCALL reference from UAPI. This was necessary as the
definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe3994 (arm64,ia64,ppc,s390,
sh,tile,um,x86,mm: remove default gate area)"
* tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
sh: Remove CONFIG_VSYSCALL reference from UAPI
sh: Fix typo in SPDX license ID lines
sh: Include <linux/io.h> in dac.h
* for-next/c1-pro-erratum-4193714:
: Work around C1-Pro erratum 4193714 (CVE-2026-0995)
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
Pull uml updates from Johannes Berg:
"Mostly cleanups and small things, notably:
- musl libc compatibility
- vDSO installation fix
- TLB sync race fix for recent SMP support
- build fix for 32-bit with Clang 20/21"
* tag 'uml-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Disable GCOV_PROFILE_ALL on 32-bit UML with Clang 20/21
um: drivers: call kernel_strrchr() explicitly in cow_user.c
um: Replace strncpy() with strnlen()+memcpy_and_pad() in strncpy_chunk_from_user()
x86/um: fix vDSO installation
um: Remove CONFIG_FRAME_WARN from x86_64_defconfig
um: Fix pte_read() and pte_exec() for kernel mappings
um: Fix potential race condition in TLB sync
um: time-travel: clean up kernel-doc warnings
um: avoid struct sigcontext redefinition with musl
um: fix address-of CMSG_DATA() rvalue in stub
CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither
is true here, so drop the lines from the defconfig files.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
* for-next/misc:
: Miscellaneous cleanups/fixes
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
* for-next/mpam:
: Fix an unmount->remount problem with the CDP emulation, uninitialised
: variable and checker warnings
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
C1-Pro acknowledges DVMSync messages before completing the SME/CME
memory accesses. Work around this by issuing an IPI to the affected CPUs
if they are running in EL0 with SME enabled.
Note that we avoid the local DSB in the IPI handler as the kernel runs
with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory
accesses at EL0 on taking an exception to EL1. On the return to user
path, no barrier is necessary either. See the comment in
sme_set_active() and the more detailed explanation in the link below.
To avoid a potential IPI flood from malicious applications (e.g.
madvise(MADV_PAGEOUT) in a tight loop), track where a process is active
via mm_cpumask() and only interrupt those CPUs.
Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull printk updates from Petr Mladek:
- Fix printk ring buffer initialization and sanity checks
- Workaround printf kunit test compilation with gcc < 12.1
- Add IPv6 address printf format tests
- Misc code and documentation cleanup
* tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING
lib/vsprintf: use bool for local decode variable
lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()
printk: ringbuffer: fix errors in comments
printk_ringbuffer: Add sanity check for 0-size data
printk_ringbuffer: Fix get_data() size sanity check
printf: add IPv6 address format tests
printk: Fix _DESCS_COUNT type for 64-bit systems
Clang 20 and 21 miscompute __builtin_object_size() when -fprofile-arcs
is active on 32-bit UML targets, which passes incorrect object size
calculations for local variables through always_inline copy_to_user()
and check_copy_size(), causing spurious compile-time errors:
include/linux/ucopysize.h:52:4: error: call to '__bad_copy_from' declared with 'error' attribute: copy source size is too small
The regression was introduced in LLVM commit 02b8ee281947 ("[llvm]
Improve llvm.objectsize computation by computing GEP, alloca and malloc
parameters bound"), which shipped in Clang 20. It was fixed in LLVM
by commit 45b697e610fd ("[MemoryBuiltins] Consider index type size
when aggregating gep offsets"), which was backported to the LLVM 22.x
release branch.
The bug requires 32-bit UML + GCOV_PROFILE_ALL (which uses -fprofile-arcs),
though the exact trigger depends on optimizer decisions influenced by other
enabled configs.
Prevent the bad combination by disabling UML's ARCH_HAS_GCOV_PROFILE_ALL
on 32-bit when using Clang 20.x or 21.x.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604030531.O6FveVgn-lkp@intel.com/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Assisted-by: Claude:claude-opus-4-6[1m]
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260409052038.make.995-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
AT_SYSINFO_EHDR defines the auxvector index representing the vDSO
entrypoint. Its value or presence does not depend on whether a vDSO
is actually provided by the kernel.
The definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe3994
("arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area").
Remove the now unnecessary conditional.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
* arm64/for-next/perf:
: Perf updates
perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc()
perf/arm-cmn: Fix incorrect error check for devm_ioremap()
perf: add NVIDIA Tegra410 C2C PMU
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
perf/arm_cspmu: nvidia: Rename doc to Tegra241
perf/arm-cmn: Stop claiming entire iomem region
arm64: cpufeature: Use pmuv3_implemented() function
arm64: cpufeature: Make PMUVer and PerfMon unsigned
KVM: arm64: Read PMUVer as unsigned
* arm64/for-next/read-once:
: Fixes for __READ_ONCE() with CONFIG_LTO=y
arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
* for-next/misc:
: Miscellaneous cleanups/fixes
arm64: rsi: use linear-map alias for realm config buffer
arm64: Kconfig: fix duplicate word in CMDLINE help text
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
arm64: kexec: Remove duplicate allocation for trans_pgd
arm64: mm: Use generic enum pgtable_level
arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0
arm64: remove ARCH_INLINE_*
* for-next/tlbflush:
: Refactor the arm64 TLB invalidation API and implementation
arm64: mm: __ptep_set_access_flags must hint correct TTL
arm64: mm: Provide level hint for flush_tlb_page()
arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range()
arm64: mm: More flags for __flush_tlb_range()
arm64: mm: Refactor __flush_tlb_range() to take flags
arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid()
arm64: mm: Simplify __flush_tlb_range_limit_excess()
arm64: mm: Simplify __TLBI_RANGE_NUM() macro
arm64: mm: Re-implement the __flush_tlb_range_op macro in C
arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range()
arm64: mm: Push __TLBI_VADDR() into __tlbi_level()
arm64: mm: Implicitly invalidate user ASID based on TLBI operation
arm64: mm: Introduce a C wrapper for by-range TLB invalidation
arm64: mm: Re-implement the __tlbi_level macro as a C function
* for-next/ttbr-macros-cleanup:
: Cleanups of the TTBR1_* macros
arm64/mm: Directly use TTBRx_EL1_CnP
arm64/mm: Directly use TTBRx_EL1_ASID_MASK
arm64/mm: Describe TTBR1_BADDR_4852_OFFSET
* for-next/kselftest:
: arm64 kselftest updates
selftests/arm64: Implement cmpbr_sigill() to hwcap test
* for-next/feat_lsui:
: Futex support using FEAT_LSUI instructions to avoid toggling PAN
arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
arm64: Kconfig: Add support for LSUI
KVM: arm64: Use CAST instruction for swapping guest descriptor
arm64: futex: Support futex with FEAT_LSUI
arm64: futex: Refactor futex atomic operation
KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI
KVM: arm64: Expose FEAT_LSUI to guests
arm64: cpufeature: Add FEAT_LSUI
* for-next/mpam: (40 commits)
: Expose MPAM to user-space via resctrl:
: - Add architecture context-switch and hiding of the feature from KVM.
: - Add interface to allow MPAM to be exposed to user-space using resctrl.
: - Add errata workaoround for some existing platforms.
: - Add documentation for using MPAM and what shape of platforms can use resctrl
arm64: mpam: Add initial MPAM documentation
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
arm_mpam: Add workaround for T241-MPAM-6
arm_mpam: Add workaround for T241-MPAM-4
arm_mpam: Add workaround for T241-MPAM-1
arm_mpam: Add quirk framework
arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
arm_mpam: resctrl: Update the rmid reallocation limit
arm_mpam: resctrl: Add resctrl_arch_rmid_read()
arm_mpam: resctrl: Allow resctrl to allocate monitors
arm_mpam: resctrl: Add support for csu counters
arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
arm_mpam: resctrl: Add kunit test for control format conversions
arm_mpam: resctrl: Add support for 'MB' resource
arm_mpam: resctrl: Wait for cacheinfo to be ready
arm_mpam: resctrl: Add rmid index helpers
arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
...
* for-next/hotplug-batched-tlbi:
: arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
arm64/mm: Reject memory removal that splits a kernel leaf mapping
arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
* for-next/bbml2-fixes:
: Fixes for realm guest and BBML2_NOABORT
arm64: mm: Remove pmd_sect() and pud_sect()
arm64: mm: Handle invalid large leaf mappings correctly
arm64: mm: Fix rodata=full block mapping support for realm guests
* for-next/sysreg:
: arm64 sysreg updates
arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
* for-next/generic-entry:
: More arm64 refactoring towards using the generic entry code
arm64: Check DAIF (and PMR) at task-switch time
arm64: entry: Use split preemption logic
arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
arm64: entry: Consistently prefix arm64-specific wrappers
arm64: entry: Don't preempt with SError or Debug masked
entry: Split preemption from irqentry_exit_to_kernel_mode()
entry: Split kernel mode logic from irqentry_{enter,exit}()
entry: Move irqentry_enter() prototype later
entry: Remove local_irq_{enable,disable}_exit_to_user()
entry: Fix stale comment for irqentry_enter()
* for-next/acpi:
: arm64 ACPI updates
ACPI: AGDI: fix missing newline in error message
The RSI interface can return RSI_INCOMPLETE when a report spans
multiple granules. This is an expected condition and should not be
treated as a fatal error.
Currently, arm_cca_report_new() checks for `info.result != RSI_SUCCESS`
and bails out, which incorrectly flags RSI_INCOMPLETE as a failure.
Fix the check to only break out on results other than RSI_SUCCESS or
RSI_INCOMPLETE.
This ensures partial reports are handled correctly and avoids spurious
-ENXIO errors when generating attestation reports.
Fixes: 7999edc484ca ("virt: arm-cca-guest: TSM_REPORT support for realms")
Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Reported-by: Jagdish Gediya <Jagdish.Gediya@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
resctrl_mon_ctx_waiters is not used outside of this file, so make it
static. This fixes the sparse warning:
drivers/resctrl/mpam_resctrl.c:25:1: warning: symbol 'resctrl_mon_ctx_waiters' was not declared. Should it be static?
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603281842.c2K96tJA-lkp@intel.com/
Fixes: 2a3c79c61539 ("arm_mpam: resctrl: Allow resctrl to allocate monitors")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Add cputype definitions for C1-Pro. These will be used for errata
detection in subsequent patches.
These values can be found in "Table A-303: MIDR_EL1 bit descriptions" in
issue 07 of the C1-Pro TRM:
https://documentation-service.arm.com/static/6930126730f8f55a656570af
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull timer fix from Ingo Molnar:
"Fix timer stalls caused by incorrect handling of the
dev->next_event_forced flag"
* tag 'timers-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Add missing resets of the next_event_forced flag
Building ARCH=um on glibc >= 2.43 fails:
arch/um/drivers/cow_user.c: error: implicit declaration of
function 'strrchr' [-Wimplicit-function-declaration]
glibc 2.43's C23 const-preserving strrchr() macro does not survive
UML's global -Dstrrchr=kernel_strrchr remap from arch/um/Makefile.
Call kernel_strrchr() directly in cow_user.c so the source no longer
depends on the -D rewrite.
Fixes: 2c51a4bc0233 ("um: fix strrchr() problems")
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260408070102.2325572-1-michael.bommarito@gmail.com
[remove unnecessary 'extern']
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both platform_early.c and platform_early.h have an extra dash in
their SPDX-License-Identifier lines. Use the correct (single-dash)
syntax for these lines.
Signed-off-by: Tim Bird <tim.bird@sony.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
When building for 32-bit ARM, there is a warning when using the %llx
specifier to print a resource_size_t variable:
drivers/perf/arm-cmn.c: In function 'arm_cmn_init_dtc':
drivers/perf/arm-cmn.c:2149:73: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'resource_size_t' {aka 'unsigned int'} [-Werror=format=]
2149 | "Failed to request DTC region 0x%llx\n", base);
| ~~~^ ~~~~
| | |
| | resource_size_t {aka unsigned int}
| long long unsigned int
| %x
Use the %pa specifier to handle the possible sizes of phys_addr_t
properly. This requires passing the variable by reference.
Fixes: 5394396ff548 ("perf/arm-cmn: Stop claiming entire iomem region")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Robin murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
could see:
| kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:976:2: note: spinlock acquired here
| 976 | spin_lock(lock_ptr);
| | ^
| kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:966:6: note: spinlock acquired here
| 966 | void futex_q_lockptr_lock(struct futex_q *q)
| | ^
| 2 warnings generated.
Where we have:
extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
..
void futex_q_lockptr_lock(struct futex_q *q)
{
spinlock_t *lock_ptr;
/*
* See futex_unqueue() why lock_ptr can change.
*/
guard(rcu)();
retry:
>> lock_ptr = READ_ONCE(q->lock_ptr);
spin_lock(lock_ptr);
...
}
At the time of the above report (prior to removal of the 'atomic' flag),
Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
'atomic ? __u.__val : q->lock_ptr' (now just '__u.__val'), and used
this as the identity of the context lock given it cannot "see through"
the inline assembly; however, we want 'q->lock_ptr' as the canonical
context lock.
While for code generation the compiler simplified to '__u.__val' for
pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
much earlier on the AST, and (b) would be the wrong deduction.
Now that we've gotten rid of the 'atomic' ternary comparison, we can
return '__u.__val' through a pointer that we initialize with '&x', but
then update via a pointer-to-pointer. When READ_ONCE()'ing a context
lock pointer, TSA's alias analysis does not invalidate the initial alias
when updated through the pointer-to-pointer, and we make it effectively
"see through" the __READ_ONCE().
Code generation is unchanged.
Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Boqun Feng <boqun@kernel.org>
Reviewed-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
rsi_get_realm_config() passes its argument to virt_to_phys(), but
&config is a kernel image address and not a linear-map alias.
On arm64 this triggers the below warning:
virt_to_phys used for non-linear address: (____ptrval____) (config+0x0/0x1000)
WARNING: arch/arm64/mm/physaddr.c:15 at __virt_to_phys+0x50/0x70, CPU#0: swapper/0
Modules linked in:
.....
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __virt_to_phys+0x50/0x70
lr : __virt_to_phys+0x4c/0x70
.....
......
Call trace:
__virt_to_phys+0x50/0x70 (P)
arm64_rsi_init+0xa0/0x1b8
setup_arch+0x13c/0x1a0
start_kernel+0x68/0x398
__primary_switched+0x88/0x90
Pass lm_alias(&config) instead so the RSI call uses the linear-map
alias of the same buffer and avoids the boot-time warning.
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It has been reported that since commit 752a0d1d483e9 ("arm64: mm:
Provide level hint for flush_tlb_page()"), the arm64
check_hugetlb_options selftest has been locking up while running "Check
child hugetlb memory with private mapping, sync error mode and mmap
memory".
This is due to hugetlb (and THP) helpers casting their PMD/PUD entries
to PTE and calling __ptep_set_access_flags(), which issues a
__flush_tlb_page(). Now that this is hinted for level 3, in this case,
the TLB entry does not get evicted and we end up in a spurious fault
loop.
Fix this by creating a __ptep_set_access_flags_anysz() function which
takes the pgsize of the entry. It can then add the appropriate hint. The
"_anysz" approach is the established pattern for problems of this class.
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Fixes: 752a0d1d483e ("arm64: mm: Provide level hint for flush_tlb_page()")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Replace all TTBR_CNP_BIT macro instances with TTBRx_EL1_CNP_BIT which
is a standard field from tools sysreg format. Drop the now redundant
custom macro TTBR_CNP_BIT. No functional change.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The function executes a CBEQ instruction which is valid if the CPU
supports the CMPBR extension. The CBEQ branches to skip the following
UDF instruction, and no SIGILL is generated. Otherwise, it will
generate a SIGILL.
Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that
support LSUI are unlikely to support a 32-bit runtime. Rather than
emulating the SWP instruction using LSUI instructions in order to remove
PAN toggling, simply disable SWP emulation.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: some tweaks to the in-code comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MPAM (Memory Partitioning and Monitoring) is now exposed to user-space via
resctrl. Add some documentation so the user knows what features to expect.
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
Linear and vmemmap mappings that get torn down during a memory hot remove
operation might contain leaf level entries on any page table level. If the
requested memory range's linear or vmemmap mappings falls within such leaf
entries, new mappings need to be created for the remaining memory mapped on
the leaf entry earlier, following standard break before make aka BBM rules.
But kernel cannot tolerate BBM and hence remapping to fine grained leaves
would not be possible on systems without BBML2_NOABORT.
Currently memory hot remove operation does not perform such restructuring,
and so removing memory ranges that could split a kernel leaf level mapping
need to be rejected.
While memory_hotplug.c does appear to permit hot removing arbitrary ranges
of memory, the higher layers that drive memory_hotplug (e.g. ACPI, virtio,
...) all appear to treat memory as fixed size devices. So it is impossible
to hot unplug a different amount than was previously hot plugged, and hence
we should never see a rejection in practice, but adding the check makes us
robust against a future change.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.
Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().
There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The 2025 extensions add FEAT_SME2P3, including LUT6.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When __switch_to() switches from a 'prev' task to a 'next' task, various
pieces of CPU state are expected to have specific values, such that
these do not need to be saved/restored. If any of these hold an
unexpected value when switching away from the prev task, they could lead
to surprising behaviour in the context of the next task, and it would be
difficult to determine where they were configured to their unexpected
value.
Add some checks for DAIF and PMR at task-switch time so that we can
detect such issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add the missing trailing newline to the dev_err() message
printed when SDEI event registration fails.
This keeps the error output as a properly terminated log line.
Fixes: a2a591fb76e6 ("ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device")
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Haoyu Lu <hechushiguitu666@gmail.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When adding generation for the kernel internal constants for hwcaps the
generated file was not explicitly flagged as such in the build system,
causing it to be regenerated on each build. This wasn't obvious when the
series the change was included in was developed since it was all about
changes that trigger rebuilds anyway.
Fixes: abed23c3c44f ("arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps")
Reported-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Dan Carpenter reports that, in mpam_resctrl_alloc_domain(), any_mon_comp is
used in an 'if' condition when it may be uninitialized. Initialize it to
NULL so that the check behaves correctly when no monitor components are
found.
Reported-by: Dan Carpenter <error27@gmail.com>
Fixes: 264c285999fc ("arm_mpam: resctrl: Add monitor initialisation and domain boilerplate")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
The mm structure will be used for workarounds that need limiting to
specific tasks.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull entry cleanup from Ingo Molnar:
"Remove the unused ARCH_SYSCALL_WORK_{ENTER,EXIT} flags"
* tag 'core-urgent-2026-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Kill ARCH_SYSCALL_WORK_{ENTER,EXIT}
The prevention mechanism against timer interrupt starvation missed to reset
the next_event_forced flag in a couple of places:
- When the clock event state changes. That can cause the flag to be
stale over a shutdown/startup sequence
- When a non-forced event is armed, which then prevents rearming before
that event. If that event is far out in the future this will cause
missed timer interrupts.
- In the suspend wakeup handler.
That led to stalls which have been reported by several people.
Add the missing resets, which fixes the problems for the reporters.
Fixes: d6e152d905bd ("clockevents: Prevent timer interrupt starvation")
Reported-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Reported-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com
Link: https://patch.msgid.link/87340xfeje.ffs@tglx
The printk ringbuffer implementation is described in the comment as
using three ringbuffers, but the current implementation uses two (desc
and data). Update the comment so it matches the code.
Fix few more known issues in the comments.
Signed-off-by: Loïc Grégoire <loicgre@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20260328021855.53956-1-loicgre@gmail.com
[pmladek@suse.com: Fixed few more issues in the comments by John Ogness.]
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Replace the deprecated[1] strncpy() with strnlen() on the source
followed by memcpy_and_pad().
This function is a chunk callback for UML's strncpy_from_user()
implementation, called by buffer_op() to process userspace memory one
page at a time. The source is a kernel-mapped userspace address that
is not guaranteed to be NUL-terminated; "len" bounds how many bytes
to read from it.
By measuring the source string length first with strnlen(), we avoid
reading past the NUL terminator in the source. memcpy_and_pad() then
copies the string content and zero-fills the remainder of the chunk,
preserving the original strncpy() behavior exactly: copy up to the
first NUL, then pad with zeros to the full length.
strtomem_pad() would be the idiomatic helper for this strnlen() +
memcpy_and_pad() pattern, but it requires a compile-time-determinable
destination size (via ARRAY_SIZE()). Here the destination is a char *
into a caller-provided buffer and the chunk length is a runtime value,
so the explicit two-step is necessary.
No behavioral change: the same bytes are written to the destination
(string content followed by zero padding), the pointer advances by
the same amount, and the NUL-found return condition is unchanged.
Link: https://github.com/KSPP/linux/issues/90 [1]
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260323171713.work.839-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Include <linux/io.h> to avoid depending on <linux/backlight.h>
for including it. Declares __raw_readb() and __raw_writeb().
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510282206.wI0HrqcK-lkp@intel.com/
Fixes: 243ce64b2b37 ("backlight: Do not include <linux/fb.h> in header file")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Thompson (RISCstar) <danielt@kernel.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Lee Jones <lee@kernel.org>
Cc: Daniel Thompson <danielt@kernel.org>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Check devm_ioremap() return value for NULL instead of ERR_PTR and return
-ENOMEM on failure. devm_ioremap() never returns ERR_PTR, using IS_ERR()
skips the error path and may cause a NULL pointer dereference.
Fixes: 5394396ff548 ("perf/arm-cmn: Stop claiming entire iomem region")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Will Deacon <will@kernel.org>
Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
1. Replace _Generic-based __unqual_scalar_typeof() with more complete
__rwonce_typeof_unqual(). This strips qualifiers from all types, not
just integer types, which is required to be able to assign (must be
non-const) to __u.__val in the non-atomic case (required for #2).
One subtle point here is that non-integer types of __val could be const
or volatile within the union with the old __unqual_scalar_typeof(), if
the passed variable is const or volatile. This would then result in a
forced load from the stack if __u.__val is volatile; in the case of
const, it does look odd if the underlying storage changes, but the
compiler is told said member is "const" -- it smells like UB.
2. Eliminate the atomic flag and ternary conditional expression. Move
the fallback volatile load into the default case of the switch,
ensuring __u is unconditionally initialized across all paths.
The statement expression now unconditionally returns __u.__val.
This refactoring appears to help the compiler improve (or fix) code
generation.
With a defconfig + LTO + debug options builds, we observe different
codegen for the following functions:
btrfs_reclaim_sweep (708 -> 1032 bytes)
btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
console_flush_all (1268 -> 1264 bytes)
console_lock_spinning_disable_and_check (180 -> 176 bytes)
igb_add_filter (640 -> 636 bytes)
igb_config_tx_modes (2404 -> 2400 bytes)
kvm_vcpu_on_spin (480 -> 476 bytes)
map_freeze (376 -> 380 bytes)
netlink_bind (1664 -> 1656 bytes)
nmi_cpu_backtrace (404 -> 400 bytes)
set_rps_cpu (516 -> 520 bytes)
swap_cluster_readahead (944 -> 932 bytes)
tcp_accecn_third_ack (328 -> 336 bytes)
tcp_create_openreq_child (1764 -> 1772 bytes)
tcp_data_queue (5784 -> 5892 bytes)
tcp_ecn_rcv_synack (620 -> 628 bytes)
xen_manage_runstate_time (944 -> 896 bytes)
xen_steal_clock (340 -> 296 bytes)
Increase of some functions are due to more aggressive inlining due to
better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
present due to being inlined completely).
NOTE: The return-value-of-function-drops-qualifiers hack was first
suggested by Al Viro in [1], which notes some of its limitations which
make it unsuitable for a general __unqual_scalar_typeof() replacement.
Notably, array types are not supported, and GCC 8.1-8.3 still fail. Why
should we use it here? READ_ONCE() does not support reading whole
arrays, and the GCC version problem only affects 3 minor releases of a
very ancient still-supported GCC version; not only that, this arm64
READ_ONCE() version is currently only activated by LTO builds, which
to-date are *only supported by Clang*!
Link: https://lore.kernel.org/all/20260111182010.GH3634291@ZenIV/ [1]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Remove duplicate 'the' in the CMDLINE config help text.
Signed-off-by: Michael Ugrin <mugrinphoto@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Previously tlb invalidations issued by __flush_tlb_page() did not
contain a level hint. From the core API documentation, this function is
clearly only ever intended to target level 3 (PTE) tlb entries:
| 4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``
|
| This time we need to remove the PAGE_SIZE sized translation
| from the TLB.
However, the arm64 documentation is more relaxed allowing any last level:
| this operation only invalidates a single, last-level page-table
| entry and therefore does not affect any walk-caches
It turns out that the function was actually being used to invalidate a
level 2 mapping via flush_tlb_fix_spurious_fault_pmd(). The bug was
benign because the level hint was not set so the HW would still
invalidate the PMD mapping, and also because the TLBF_NONOTIFY flag was
set, the bounds of the mapping were never used for anything else.
Now that we have the new and improved range-invalidation API, it is
trival to fix flush_tlb_fix_spurious_fault_pmd() to explicitly flush the
whole range (locally, without notification and last level only). So
let's do that, and then update __flush_tlb_page() to hint level 3.
Reviewed-by: Linu Cherian <linu.cherian@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
[catalin.marinas@arm.com: use "level 3" in the __flush_tlb_page() description]
[catalin.marinas@arm.com: tweak the commit message to include the core API text]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Replace all TTBR_ASID_MASK macro instances with TTBRx_EL1_ASID_MASK which
is a standard field mask from tools sysreg format. Drop the now redundant
custom macro TTBR_ASID_MASK. No functional change.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since Armv9.6, FEAT_LSUI supplies the load/store instructions for
previleged level to access to access user memory without clearing
PSTATE.PAN bit.
Add Kconfig option entry for FEAT_LSUI.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.
The counter value is not expected to be stable, it drifts up and down with
each allocation and eviction. The CSU register provides the value for a
point in time.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
During a memory hot remove operation, both linear and vmemmap mappings for
the memory range being removed, get unmapped via unmap_hotplug_range() but
mapped pages get freed only for vmemmap mapping. This is just a sequential
operation where each table entry gets cleared, followed by a leaf specific
TLB flush, and then followed by memory free operation when applicable.
This approach was simple and uniform both for vmemmap and linear mappings.
But linear mapping might contain CONT marked block memory where it becomes
necessary to first clear out all entire in the range before a TLB flush.
This is as per the architecture requirement. Hence batch all TLB flushes
during the table tear down walk and finally do it in unmap_hotplug_range().
Prior to this fix, it was hypothetically possible for a speculative access
to a higher address in the contiguous block to fill the TLB with shattered
entries for the entire contiguous range after a lower address had already
been cleared and invalidated. Due to the table entries being shattered, the
subsequent TLB invalidation for the higher address would not then clear the
TLB entries for the lower address, meaning stale TLB entries could persist.
Besides it also helps in improving the performance via TLBI range operation
along with reduced synchronization instructions. The time spent executing
unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal
in KVM guest.
This scheme is not applicable during vmemmap mapping tear down where memory
needs to be freed and hence a TLB flush is required after clearing out page
table entry.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
[ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[ 15.513762] Mem abort info:
[ 15.527245] ESR = 0x0000000096000046
[ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits
[ 15.572146] SET = 0, FnV = 0
[ 15.592141] EA = 0, S1PTW = 0
[ 15.612694] FSC = 0x06: level 2 translation fault
[ 15.640644] Data abort info:
[ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[ 15.886394] Modules linked in:
[ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[ 15.935258] Hardware name: linux,dummy-virt (DT)
[ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[ 16.006163] lr : swiotlb_bounce+0xf4/0x158
[ 16.024145] sp : ffff80008000b8f0
[ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[ 16.349071] Call trace:
[ 16.360143] __pi_memcpy_generic+0x128/0x22c (P)
[ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4
[ 16.400282] swiotlb_map+0x5c/0x228
[ 16.415984] dma_map_phys+0x244/0x2b8
[ 16.432199] dma_map_page_attrs+0x44/0x58
[ 16.449782] virtqueue_map_page_attrs+0x38/0x44
[ 16.469596] virtqueue_map_single_attrs+0xc0/0x130
[ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc
[ 16.510355] try_fill_recv+0x2a4/0x584
[ 16.526989] virtnet_open+0xd4/0x238
[ 16.542775] __dev_open+0x110/0x24c
[ 16.558280] __dev_change_flags+0x194/0x20c
[ 16.576879] netif_change_flags+0x24/0x6c
[ 16.594489] dev_change_flags+0x48/0x7c
[ 16.611462] ip_auto_config+0x258/0x1114
[ 16.628727] do_one_initcall+0x80/0x1c8
[ 16.645590] kernel_init_freeable+0x208/0x2f0
[ 16.664917] kernel_init+0x24/0x1e0
[ 16.680295] ret_from_fork+0x10/0x20
[ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[ 16.723106] ---[ end trace 0000000000000000 ]---
[ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[ 16.818966] PHYS_OFFSET: 0xfff1000080000000
[ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[ 16.862904] Memory Limit: none
[ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.
The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.
But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; it looked like they were valid
which made this super fun to debug. the ptdump code is core-mm and
pmd_table() is arm64-specific so we can't use the same trick to solve
that.
But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.
Additionally, from inspection kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().
The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.
Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The 2025 extensions add FEAT_SVE2P3 and FEAT_SVE_B16MM.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.
Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.
In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The code to set MBA's alloc_capable to true appears to be trying to
restore alloc_capable on unmount. This can never work because
resctrl_arch_set_cdp_enabled() is never invoked with RDT_RESOURCE_MBA
as the rid parameter. Consequently,
mpam_resctrl_controls[RDT_RESOURCE_MBA].cdp_enabled always remains false.
The alloc_capable setting in resctrl_arch_set_cdp_enabled() is to
re-enable MBA if the caller opts in to separate control values using
CDP for this resource. This doesn't happen today.
Add a comment to describe this.
However a bug remains where MBA allocation is permanently disabled after
the mount with CDP option. Remounting without CDP cannot restore the MBA
partition capability.
Add a check to re-enable MBA when CDP is disabled, which happens on
unmount.
Fixes: 6789fb99282c ("arm_mpam: resctrl: Add CDP emulation")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
[ morse: Added comment for existing code, added hunk to fix this bug from
Ben H ]
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Add __tlbi_sync_s1ish_kernel() similar to __tlbi_sync_s1ish() and use it
for kernel TLB maintenance. Also use this function in flush_tlb_all()
which is only used in relation to kernel mappings. Subsequent patches
can differentiate between workarounds that apply to user only or both
user and kernel.
A subsequent patch will add mm_struct to __tlbi_sync_s1ish(). Since
arch_tlbbatch_flush() is not specific to an mm, add a corresponding
__tlbi_sync_s1ish_batch() helper.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull x86 platform driver updates from Ilpo Järvinen:
"asus-wmi:
- Retain battery charge threshold during boot which avoids
unsolicited change to 100%. Return -ENODATA when the limit
is not yet known
- Improve screenpad power/brightness handling consistency
- Fix screenpad brightness range
barco-p50-gpio:
- Normalize gpio_get return values
bitland-mifs-wmi:
- Add driver for Bitland laptops (supports platform profile,
hwmon, kbd backlight, gpu mode, hotkeys, and fan boost)
dell_rbu:
- Fix using uninitialized value in sysfs write function
dell-wmi-sysman:
- Respect destination length when constructing enum strings
hp-wmi:
- Propagate fan setting apply failures and log an error
- Fix sysfs write vs work handler cancel_delayed_work_sync() deadlock
- Correct keepalive schedule_delayed_work() to mod_delayed_work()
- Fix u8 underflows in GPU delta calculation
- Use mutex to protect fan pwm/mode
- Ignore kbd backlight and FnLock key events that are handled by FW
- Fix fan table parsing (use correct field)
- Add support for Omen 14-fb0xxx, 16-n0xxx, 16-wf1xxx, and
Omen MAX 16-ak0xxxx
input: trackpoint & thinkpad_acpi:
- Enable doubletap by default and add sysfs enable/disable
int3472:
- Add support for GPIO type 0x02 (IR flood LED)
intel-speed-select: (updated to v1.26)
- Avoid using current base frequency as maximum
- Fix CPU extended family ID decoding
- Fix exit code
- Improve error reporting
intel/vsec:
- Refactor to support ACPI-enumerated PMT endpoints.
pcengines-apuv2:
- Attach software node to the gpiochip
uniwill:
- Refactor hwmon to smaller parts to accomodate HW diversity
- Support USB-C power/performance priority switch through sysfs
- Add another XMG Fusion 15 (L19) DMI vendor
- Enable fine-grained features to device lineup mapping
wmi:
- Perform output size check within WMI core to allow simpler WMI
drivers
misc:
- acpi_driver -> platform driver conversions (a large number of
changes from Rafael J. Wysocki)
- cleanups / refactoring / improvements"
* tag 'platform-drivers-x86-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (106 commits)
platform/x86: hp-wmi: Add support for Omen 16-wf1xxx (8C77)
platform/x86: hp-wmi: Add support for Omen 16-n0xxx (8A44)
platform/x86: hp-wmi: Add support for OMEN MAX 16-ak0xxx (8D87)
platform/x86: hp-wmi: fix fan table parsing
platform/x86: hp-wmi: add Omen 14-fb0xxx (board 8C58) support
platform/wmi: Replace .no_notify_data with .min_event_size
platform/wmi: Extend wmidev_query_block() to reject undersized data
platform/wmi: Extend wmidev_invoke_method() to reject undersized data
platform/wmi: Prepare to reject undersized unmarshalling results
platform/wmi: Convert drivers to use wmidev_invoke_procedure()
platform/wmi: Add wmidev_invoke_procedure()
platform/x86: int3472: Add support for GPIO type 0x02 (IR flood LED)
platform/x86: int3472: Parameterize LED con_id in registration
platform/x86: int3472: Rename pled to led in LED registration code
platform/x86: int3472: Use local variable for LED struct access
platform/x86: thinkpad_acpi: remove obsolete TODO comment
platform/x86: dell-wmi-sysman: bound enumeration string aggregation
platform/x86: hp-wmi: Ignore backlight and FnLock events
platform/x86: uniwill-laptop: Fix signedness bug
platform/x86: dell_rbu: avoid uninit value usage in packet_size_write()
...
Nowadays nothing redefines these flags.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/advfWWKgOQkFkwp9@redhat.com
Pull jfs updates from Dave Kleikamp:
"More robust data integrity checking and some fixes"
* tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy:
jfs: avoid -Wtautological-constant-out-of-range-compare warning again
JFS: always load filesystem UUID during mount
jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
jfs: Set the lbmDone flag at the end of lbmIODone
jfs: fix corrupted list in dbUpdatePMap
jfs: add dmapctl integrity check to prevent invalid operations
jfs: add dtpage integrity check to prevent index/pointer overflows
jfs: add dtroot integrity check to prevent index out-of-bounds
The local variable 'decode' is only used as a boolean value - change its
data type from int to bool accordingly.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260407181835.1053072-2-thorsten.blum@linux.dev
Signed-off-by: Petr Mladek <pmladek@suse.com>
GCC < 12.1 can miscompile printf_kunit's errptr() test when branch
profiling is enabled. BUILD_BUG_ON(IS_ERR(PTR)) is a constant false
expression, but CONFIG_TRACE_BRANCH_PROFILING and
CONFIG_PROFILE_ALL_BRANCHES make the IS_ERR() path side-effectful.
GCC's IPA splitter can then outline the cold assert arm into
errptr.part.* and leave that clone with an unconditional
__compiletime_assert_*() call, causing a false build failure.
This started showing up after test_hashed() became a macro and moved its
local buffer into errptr(), which changed GCC's inlining and splitting
decisions enough to expose the compiler bug.
Workaround the problem by disabling the branch profiling for
printf_kunit.o. It is a straightforward and acceptable solution.
The workaround can be removed once the minimum GCC includes commit
76fe49423047 ("Fix tree-optimization/101941: IPA splitting out
function with error attribute"), which first shipped in GCC 12.1.
Fixes: 9bfa52dac27a ("printf: convert test_hashed into macro")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604030636.NqjaJvYp-lkp@intel.com/
Cc: stable@vger.kernel.org
Acked-by: Tamir Duberstein <tamird@kernel.org>
Link: https://patch.msgid.link/ad5gJAX9f6dSQluz@pathway.suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.com>
get_data() has a sanity check for regular data blocks to ensure at
least space for the ID exists. But a regular block should also have
at least 1 byte of data (otherwise it would be data-less instead of
regular).
Expand the get_data() block size sanity check to additionally expect
at least 1 byte of data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260326133809.8045-2-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
Pull more arm64 updates from Catalin Marinas:
"The main 'feature' is a workaround for C1-Pro erratum 4193714
requiring IPIs during TLB maintenance if a process is running in user
space with SME enabled.
The hardware acknowledges the DVMSync messages before completing
in-flight SME accesses, with security implications. The workaround
makes use of the mm_cpumask() to track the cores that need
interrupting (arm64 hasn't used this mask before).
The rest are fixes for MPAM, CCA and generated header that turned up
during the merging window or shortly before.
Summary:
Core features:
- Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
DVMSync acknowledgement. The fix consists of sending IPIs on TLB
maintenance to those CPUs running in user space with SME enabled
- Include kernel-hwcap.h in list of generated files (missed in a
recent commit generating the KERNEL_HWCAP_* macros)
CCA:
- Fix RSI_INCOMPLETE error check in arm-cca-guest
MPAM:
- Fix an unmount->remount problem with the CDP emulation,
uninitialised variable and checker warnings"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
Pull sh updates from John Paul Adrian Glaubitz:
"Two patches from Thomas Zimmermann, one by Tim Bird and one by Thomas
Weißschuh.
The first patch by Thomas Zimmermann adds a missing include in dac.h
for SH-3 which became necessary after 243ce64b2b37 ("backlight: Do not
include <linux/fb.h> in header file") which made __raw_readb() and
__raw_writeb() inaccessible in dac.h.
Thomas' second patch drops CONFIG_FIRMWARE_EDID for SH as it depends
on X86 or EFI_GENERIC_STUB which are not defined on SH for obvious
reasons.
The patch by Tim Bird fixes just a small typo in two SPDX ID lines
which he stumbled over by accident.
And, least but not last, the patch by Thomas Weißschuh removes the
CONFIG_VSYSCALL reference from UAPI. This was necessary as the
definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe3994 (arm64,ia64,ppc,s390,
sh,tile,um,x86,mm: remove default gate area)"
* tag 'sh-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Drop CONFIG_FIRMWARE_EDID from defconfig files
sh: Remove CONFIG_VSYSCALL reference from UAPI
sh: Fix typo in SPDX license ID lines
sh: Include <linux/io.h> in dac.h
* for-next/c1-pro-erratum-4193714:
: Work around C1-Pro erratum 4193714 (CVE-2026-0995)
arm64: errata: Work around early CME DVMSync acknowledgement
arm64: cputype: Add C1-Pro definitions
arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
Pull uml updates from Johannes Berg:
"Mostly cleanups and small things, notably:
- musl libc compatibility
- vDSO installation fix
- TLB sync race fix for recent SMP support
- build fix for 32-bit with Clang 20/21"
* tag 'uml-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Disable GCOV_PROFILE_ALL on 32-bit UML with Clang 20/21
um: drivers: call kernel_strrchr() explicitly in cow_user.c
um: Replace strncpy() with strnlen()+memcpy_and_pad() in strncpy_chunk_from_user()
x86/um: fix vDSO installation
um: Remove CONFIG_FRAME_WARN from x86_64_defconfig
um: Fix pte_read() and pte_exec() for kernel mappings
um: Fix potential race condition in TLB sync
um: time-travel: clean up kernel-doc warnings
um: avoid struct sigcontext redefinition with musl
um: fix address-of CMSG_DATA() rvalue in stub
CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither
is true here, so drop the lines from the defconfig files.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
* for-next/misc:
: Miscellaneous cleanups/fixes
virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
arm64/hwcap: Include kernel-hwcap.h in list of generated files
* for-next/mpam:
: Fix an unmount->remount problem with the CDP emulation, uninitialised
: variable and checker warnings
arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
arm_mpam: resctrl: Fix the check for no monitor components found
arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
C1-Pro acknowledges DVMSync messages before completing the SME/CME
memory accesses. Work around this by issuing an IPI to the affected CPUs
if they are running in EL0 with SME enabled.
Note that we avoid the local DSB in the IPI handler as the kernel runs
with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory
accesses at EL0 on taking an exception to EL1. On the return to user
path, no barrier is necessary either. See the comment in
sme_set_active() and the more detailed explanation in the link below.
To avoid a potential IPI flood from malicious applications (e.g.
madvise(MADV_PAGEOUT) in a tight loop), track where a process is active
via mm_cpumask() and only interrupt those CPUs.
Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull printk updates from Petr Mladek:
- Fix printk ring buffer initialization and sanity checks
- Workaround printf kunit test compilation with gcc < 12.1
- Add IPv6 address printf format tests
- Misc code and documentation cleanup
* tag 'printk-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printf: Compile the kunit test with DISABLE_BRANCH_PROFILING DISABLE_BRANCH_PROFILING
lib/vsprintf: use bool for local decode variable
lib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()
printk: ringbuffer: fix errors in comments
printk_ringbuffer: Add sanity check for 0-size data
printk_ringbuffer: Fix get_data() size sanity check
printf: add IPv6 address format tests
printk: Fix _DESCS_COUNT type for 64-bit systems
Clang 20 and 21 miscompute __builtin_object_size() when -fprofile-arcs
is active on 32-bit UML targets, which passes incorrect object size
calculations for local variables through always_inline copy_to_user()
and check_copy_size(), causing spurious compile-time errors:
include/linux/ucopysize.h:52:4: error: call to '__bad_copy_from' declared with 'error' attribute: copy source size is too small
The regression was introduced in LLVM commit 02b8ee281947 ("[llvm]
Improve llvm.objectsize computation by computing GEP, alloca and malloc
parameters bound"), which shipped in Clang 20. It was fixed in LLVM
by commit 45b697e610fd ("[MemoryBuiltins] Consider index type size
when aggregating gep offsets"), which was backported to the LLVM 22.x
release branch.
The bug requires 32-bit UML + GCOV_PROFILE_ALL (which uses -fprofile-arcs),
though the exact trigger depends on optimizer decisions influenced by other
enabled configs.
Prevent the bad combination by disabling UML's ARCH_HAS_GCOV_PROFILE_ALL
on 32-bit when using Clang 20.x or 21.x.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604030531.O6FveVgn-lkp@intel.com/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Assisted-by: Claude:claude-opus-4-6[1m]
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260409052038.make.995-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
AT_SYSINFO_EHDR defines the auxvector index representing the vDSO
entrypoint. Its value or presence does not depend on whether a vDSO
is actually provided by the kernel.
The definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe3994
("arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area").
Remove the now unnecessary conditional.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
* arm64/for-next/perf:
: Perf updates
perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc()
perf/arm-cmn: Fix incorrect error check for devm_ioremap()
perf: add NVIDIA Tegra410 C2C PMU
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
perf/arm_cspmu: nvidia: Rename doc to Tegra241
perf/arm-cmn: Stop claiming entire iomem region
arm64: cpufeature: Use pmuv3_implemented() function
arm64: cpufeature: Make PMUVer and PerfMon unsigned
KVM: arm64: Read PMUVer as unsigned
* arm64/for-next/read-once:
: Fixes for __READ_ONCE() with CONFIG_LTO=y
arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
* for-next/misc:
: Miscellaneous cleanups/fixes
arm64: rsi: use linear-map alias for realm config buffer
arm64: Kconfig: fix duplicate word in CMDLINE help text
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
arm64: kexec: Remove duplicate allocation for trans_pgd
arm64: mm: Use generic enum pgtable_level
arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0
arm64: remove ARCH_INLINE_*
* for-next/tlbflush:
: Refactor the arm64 TLB invalidation API and implementation
arm64: mm: __ptep_set_access_flags must hint correct TTL
arm64: mm: Provide level hint for flush_tlb_page()
arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range()
arm64: mm: More flags for __flush_tlb_range()
arm64: mm: Refactor __flush_tlb_range() to take flags
arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid()
arm64: mm: Simplify __flush_tlb_range_limit_excess()
arm64: mm: Simplify __TLBI_RANGE_NUM() macro
arm64: mm: Re-implement the __flush_tlb_range_op macro in C
arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range()
arm64: mm: Push __TLBI_VADDR() into __tlbi_level()
arm64: mm: Implicitly invalidate user ASID based on TLBI operation
arm64: mm: Introduce a C wrapper for by-range TLB invalidation
arm64: mm: Re-implement the __tlbi_level macro as a C function
* for-next/ttbr-macros-cleanup:
: Cleanups of the TTBR1_* macros
arm64/mm: Directly use TTBRx_EL1_CnP
arm64/mm: Directly use TTBRx_EL1_ASID_MASK
arm64/mm: Describe TTBR1_BADDR_4852_OFFSET
* for-next/kselftest:
: arm64 kselftest updates
selftests/arm64: Implement cmpbr_sigill() to hwcap test
* for-next/feat_lsui:
: Futex support using FEAT_LSUI instructions to avoid toggling PAN
arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
arm64: Kconfig: Add support for LSUI
KVM: arm64: Use CAST instruction for swapping guest descriptor
arm64: futex: Support futex with FEAT_LSUI
arm64: futex: Refactor futex atomic operation
KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI
KVM: arm64: Expose FEAT_LSUI to guests
arm64: cpufeature: Add FEAT_LSUI
* for-next/mpam: (40 commits)
: Expose MPAM to user-space via resctrl:
: - Add architecture context-switch and hiding of the feature from KVM.
: - Add interface to allow MPAM to be exposed to user-space using resctrl.
: - Add errata workaoround for some existing platforms.
: - Add documentation for using MPAM and what shape of platforms can use resctrl
arm64: mpam: Add initial MPAM documentation
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
arm_mpam: Add workaround for T241-MPAM-6
arm_mpam: Add workaround for T241-MPAM-4
arm_mpam: Add workaround for T241-MPAM-1
arm_mpam: Add quirk framework
arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
arm_mpam: resctrl: Update the rmid reallocation limit
arm_mpam: resctrl: Add resctrl_arch_rmid_read()
arm_mpam: resctrl: Allow resctrl to allocate monitors
arm_mpam: resctrl: Add support for csu counters
arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
arm_mpam: resctrl: Add kunit test for control format conversions
arm_mpam: resctrl: Add support for 'MB' resource
arm_mpam: resctrl: Wait for cacheinfo to be ready
arm_mpam: resctrl: Add rmid index helpers
arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
...
* for-next/hotplug-batched-tlbi:
: arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
arm64/mm: Reject memory removal that splits a kernel leaf mapping
arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
* for-next/bbml2-fixes:
: Fixes for realm guest and BBML2_NOABORT
arm64: mm: Remove pmd_sect() and pud_sect()
arm64: mm: Handle invalid large leaf mappings correctly
arm64: mm: Fix rodata=full block mapping support for realm guests
* for-next/sysreg:
: arm64 sysreg updates
arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
* for-next/generic-entry:
: More arm64 refactoring towards using the generic entry code
arm64: Check DAIF (and PMR) at task-switch time
arm64: entry: Use split preemption logic
arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
arm64: entry: Consistently prefix arm64-specific wrappers
arm64: entry: Don't preempt with SError or Debug masked
entry: Split preemption from irqentry_exit_to_kernel_mode()
entry: Split kernel mode logic from irqentry_{enter,exit}()
entry: Move irqentry_enter() prototype later
entry: Remove local_irq_{enable,disable}_exit_to_user()
entry: Fix stale comment for irqentry_enter()
* for-next/acpi:
: arm64 ACPI updates
ACPI: AGDI: fix missing newline in error message
The RSI interface can return RSI_INCOMPLETE when a report spans
multiple granules. This is an expected condition and should not be
treated as a fatal error.
Currently, arm_cca_report_new() checks for `info.result != RSI_SUCCESS`
and bails out, which incorrectly flags RSI_INCOMPLETE as a failure.
Fix the check to only break out on results other than RSI_SUCCESS or
RSI_INCOMPLETE.
This ensures partial reports are handled correctly and avoids spurious
-ENXIO errors when generating attestation reports.
Fixes: 7999edc484ca ("virt: arm-cca-guest: TSM_REPORT support for realms")
Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Reported-by: Jagdish Gediya <Jagdish.Gediya@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
resctrl_mon_ctx_waiters is not used outside of this file, so make it
static. This fixes the sparse warning:
drivers/resctrl/mpam_resctrl.c:25:1: warning: symbol 'resctrl_mon_ctx_waiters' was not declared. Should it be static?
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603281842.c2K96tJA-lkp@intel.com/
Fixes: 2a3c79c61539 ("arm_mpam: resctrl: Allow resctrl to allocate monitors")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Add cputype definitions for C1-Pro. These will be used for errata
detection in subsequent patches.
These values can be found in "Table A-303: MIDR_EL1 bit descriptions" in
issue 07 of the C1-Pro TRM:
https://documentation-service.arm.com/static/6930126730f8f55a656570af
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Building ARCH=um on glibc >= 2.43 fails:
arch/um/drivers/cow_user.c: error: implicit declaration of
function 'strrchr' [-Wimplicit-function-declaration]
glibc 2.43's C23 const-preserving strrchr() macro does not survive
UML's global -Dstrrchr=kernel_strrchr remap from arch/um/Makefile.
Call kernel_strrchr() directly in cow_user.c so the source no longer
depends on the -D rewrite.
Fixes: 2c51a4bc0233 ("um: fix strrchr() problems")
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260408070102.2325572-1-michael.bommarito@gmail.com
[remove unnecessary 'extern']
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both platform_early.c and platform_early.h have an extra dash in
their SPDX-License-Identifier lines. Use the correct (single-dash)
syntax for these lines.
Signed-off-by: Tim Bird <tim.bird@sony.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
When building for 32-bit ARM, there is a warning when using the %llx
specifier to print a resource_size_t variable:
drivers/perf/arm-cmn.c: In function 'arm_cmn_init_dtc':
drivers/perf/arm-cmn.c:2149:73: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'resource_size_t' {aka 'unsigned int'} [-Werror=format=]
2149 | "Failed to request DTC region 0x%llx\n", base);
| ~~~^ ~~~~
| | |
| | resource_size_t {aka unsigned int}
| long long unsigned int
| %x
Use the %pa specifier to handle the possible sizes of phys_addr_t
properly. This requires passing the variable by reference.
Fixes: 5394396ff548 ("perf/arm-cmn: Stop claiming entire iomem region")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Robin murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When enabling Clang's Context Analysis (aka. Thread Safety Analysis) on
kernel/futex/core.o (see Peter's changes at [1]), in arm64 LTO builds we
could see:
| kernel/futex/core.c:982:1: warning: spinlock 'atomic ? __u.__val : q->lock_ptr' is still held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:976:2: note: spinlock acquired here
| 976 | spin_lock(lock_ptr);
| | ^
| kernel/futex/core.c:982:1: warning: expecting spinlock 'q->lock_ptr' to be held at the end of function [-Wthread-safety-analysis]
| 982 | }
| | ^
| kernel/futex/core.c:966:6: note: spinlock acquired here
| 966 | void futex_q_lockptr_lock(struct futex_q *q)
| | ^
| 2 warnings generated.
Where we have:
extern void futex_q_lockptr_lock(struct futex_q *q) __acquires(q->lock_ptr);
..
void futex_q_lockptr_lock(struct futex_q *q)
{
spinlock_t *lock_ptr;
/*
* See futex_unqueue() why lock_ptr can change.
*/
guard(rcu)();
retry:
>> lock_ptr = READ_ONCE(q->lock_ptr);
spin_lock(lock_ptr);
...
}
At the time of the above report (prior to removal of the 'atomic' flag),
Clang Thread Safety Analysis's alias analysis resolved 'lock_ptr' to
'atomic ? __u.__val : q->lock_ptr' (now just '__u.__val'), and used
this as the identity of the context lock given it cannot "see through"
the inline assembly; however, we want 'q->lock_ptr' as the canonical
context lock.
While for code generation the compiler simplified to '__u.__val' for
pointers (8 byte case -> 'atomic' was set), TSA's analysis (a) happens
much earlier on the AST, and (b) would be the wrong deduction.
Now that we've gotten rid of the 'atomic' ternary comparison, we can
return '__u.__val' through a pointer that we initialize with '&x', but
then update via a pointer-to-pointer. When READ_ONCE()'ing a context
lock pointer, TSA's alias analysis does not invalidate the initial alias
when updated through the pointer-to-pointer, and we make it effectively
"see through" the __READ_ONCE().
Code generation is unchanged.
Link: https://lkml.kernel.org/r/20260121110704.221498346@infradead.org [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221040.TeM0ihff-lkp@intel.com/
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Boqun Feng <boqun@kernel.org>
Reviewed-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
rsi_get_realm_config() passes its argument to virt_to_phys(), but
&config is a kernel image address and not a linear-map alias.
On arm64 this triggers the below warning:
virt_to_phys used for non-linear address: (____ptrval____) (config+0x0/0x1000)
WARNING: arch/arm64/mm/physaddr.c:15 at __virt_to_phys+0x50/0x70, CPU#0: swapper/0
Modules linked in:
.....
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __virt_to_phys+0x50/0x70
lr : __virt_to_phys+0x4c/0x70
.....
......
Call trace:
__virt_to_phys+0x50/0x70 (P)
arm64_rsi_init+0xa0/0x1b8
setup_arch+0x13c/0x1a0
start_kernel+0x68/0x398
__primary_switched+0x88/0x90
Pass lm_alias(&config) instead so the RSI call uses the linear-map
alias of the same buffer and avoids the boot-time warning.
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It has been reported that since commit 752a0d1d483e9 ("arm64: mm:
Provide level hint for flush_tlb_page()"), the arm64
check_hugetlb_options selftest has been locking up while running "Check
child hugetlb memory with private mapping, sync error mode and mmap
memory".
This is due to hugetlb (and THP) helpers casting their PMD/PUD entries
to PTE and calling __ptep_set_access_flags(), which issues a
__flush_tlb_page(). Now that this is hinted for level 3, in this case,
the TLB entry does not get evicted and we end up in a spurious fault
loop.
Fix this by creating a __ptep_set_access_flags_anysz() function which
takes the pgsize of the entry. It can then add the appropriate hint. The
"_anysz" approach is the established pattern for problems of this class.
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Fixes: 752a0d1d483e ("arm64: mm: Provide level hint for flush_tlb_page()")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Replace all TTBR_CNP_BIT macro instances with TTBRx_EL1_CNP_BIT which
is a standard field from tools sysreg format. Drop the now redundant
custom macro TTBR_CNP_BIT. No functional change.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The function executes a CBEQ instruction which is valid if the CPU
supports the CMPBR extension. The CBEQ branches to skip the following
UDF instruction, and no SIGILL is generated. Otherwise, it will
generate a SIGILL.
Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that
support LSUI are unlikely to support a 32-bit runtime. Rather than
emulating the SWP instruction using LSUI instructions in order to remove
PAN toggling, simply disable SWP emulation.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: some tweaks to the in-code comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MPAM (Memory Partitioning and Monitoring) is now exposed to user-space via
resctrl. Add some documentation so the user knows what features to expect.
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
Linear and vmemmap mappings that get torn down during a memory hot remove
operation might contain leaf level entries on any page table level. If the
requested memory range's linear or vmemmap mappings falls within such leaf
entries, new mappings need to be created for the remaining memory mapped on
the leaf entry earlier, following standard break before make aka BBM rules.
But kernel cannot tolerate BBM and hence remapping to fine grained leaves
would not be possible on systems without BBML2_NOABORT.
Currently memory hot remove operation does not perform such restructuring,
and so removing memory ranges that could split a kernel leaf level mapping
need to be rejected.
While memory_hotplug.c does appear to permit hot removing arbitrary ranges
of memory, the higher layers that drive memory_hotplug (e.g. ACPI, virtio,
...) all appear to treat memory as fixed size devices. So it is impossible
to hot unplug a different amount than was previously hot plugged, and hence
we should never see a rejection in practice, but adding the check makes us
robust against a future change.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.
Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().
There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When __switch_to() switches from a 'prev' task to a 'next' task, various
pieces of CPU state are expected to have specific values, such that
these do not need to be saved/restored. If any of these hold an
unexpected value when switching away from the prev task, they could lead
to surprising behaviour in the context of the next task, and it would be
difficult to determine where they were configured to their unexpected
value.
Add some checks for DAIF and PMR at task-switch time so that we can
detect such issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add the missing trailing newline to the dev_err() message
printed when SDEI event registration fails.
This keeps the error output as a properly terminated log line.
Fixes: a2a591fb76e6 ("ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device")
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Haoyu Lu <hechushiguitu666@gmail.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When adding generation for the kernel internal constants for hwcaps the
generated file was not explicitly flagged as such in the build system,
causing it to be regenerated on each build. This wasn't obvious when the
series the change was included in was developed since it was all about
changes that trigger rebuilds anyway.
Fixes: abed23c3c44f ("arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps")
Reported-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Dan Carpenter reports that, in mpam_resctrl_alloc_domain(), any_mon_comp is
used in an 'if' condition when it may be uninitialized. Initialize it to
NULL so that the check behaves correctly when no monitor components are
found.
Reported-by: Dan Carpenter <error27@gmail.com>
Fixes: 264c285999fc ("arm_mpam: resctrl: Add monitor initialisation and domain boilerplate")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
The prevention mechanism against timer interrupt starvation missed to reset
the next_event_forced flag in a couple of places:
- When the clock event state changes. That can cause the flag to be
stale over a shutdown/startup sequence
- When a non-forced event is armed, which then prevents rearming before
that event. If that event is far out in the future this will cause
missed timer interrupts.
- In the suspend wakeup handler.
That led to stalls which have been reported by several people.
Add the missing resets, which fixes the problems for the reporters.
Fixes: d6e152d905bd ("clockevents: Prevent timer interrupt starvation")
Reported-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Reported-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com
Link: https://patch.msgid.link/87340xfeje.ffs@tglx
The printk ringbuffer implementation is described in the comment as
using three ringbuffers, but the current implementation uses two (desc
and data). Update the comment so it matches the code.
Fix few more known issues in the comments.
Signed-off-by: Loïc Grégoire <loicgre@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20260328021855.53956-1-loicgre@gmail.com
[pmladek@suse.com: Fixed few more issues in the comments by John Ogness.]
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Replace the deprecated[1] strncpy() with strnlen() on the source
followed by memcpy_and_pad().
This function is a chunk callback for UML's strncpy_from_user()
implementation, called by buffer_op() to process userspace memory one
page at a time. The source is a kernel-mapped userspace address that
is not guaranteed to be NUL-terminated; "len" bounds how many bytes
to read from it.
By measuring the source string length first with strnlen(), we avoid
reading past the NUL terminator in the source. memcpy_and_pad() then
copies the string content and zero-fills the remainder of the chunk,
preserving the original strncpy() behavior exactly: copy up to the
first NUL, then pad with zeros to the full length.
strtomem_pad() would be the idiomatic helper for this strnlen() +
memcpy_and_pad() pattern, but it requires a compile-time-determinable
destination size (via ARRAY_SIZE()). Here the destination is a char *
into a caller-provided buffer and the chunk length is a runtime value,
so the explicit two-step is necessary.
No behavioral change: the same bytes are written to the destination
(string content followed by zero padding), the pointer advances by
the same amount, and the NUL-found return condition is unchanged.
Link: https://github.com/KSPP/linux/issues/90 [1]
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260323171713.work.839-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Include <linux/io.h> to avoid depending on <linux/backlight.h>
for including it. Declares __raw_readb() and __raw_writeb().
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510282206.wI0HrqcK-lkp@intel.com/
Fixes: 243ce64b2b37 ("backlight: Do not include <linux/fb.h> in header file")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Thompson (RISCstar) <danielt@kernel.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Lee Jones <lee@kernel.org>
Cc: Daniel Thompson <danielt@kernel.org>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Check devm_ioremap() return value for NULL instead of ERR_PTR and return
-ENOMEM on failure. devm_ioremap() never returns ERR_PTR, using IS_ERR()
skips the error path and may cause a NULL pointer dereference.
Fixes: 5394396ff548 ("perf/arm-cmn: Stop claiming entire iomem region")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Will Deacon <will@kernel.org>
Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
1. Replace _Generic-based __unqual_scalar_typeof() with more complete
__rwonce_typeof_unqual(). This strips qualifiers from all types, not
just integer types, which is required to be able to assign (must be
non-const) to __u.__val in the non-atomic case (required for #2).
One subtle point here is that non-integer types of __val could be const
or volatile within the union with the old __unqual_scalar_typeof(), if
the passed variable is const or volatile. This would then result in a
forced load from the stack if __u.__val is volatile; in the case of
const, it does look odd if the underlying storage changes, but the
compiler is told said member is "const" -- it smells like UB.
2. Eliminate the atomic flag and ternary conditional expression. Move
the fallback volatile load into the default case of the switch,
ensuring __u is unconditionally initialized across all paths.
The statement expression now unconditionally returns __u.__val.
This refactoring appears to help the compiler improve (or fix) code
generation.
With a defconfig + LTO + debug options builds, we observe different
codegen for the following functions:
btrfs_reclaim_sweep (708 -> 1032 bytes)
btrfs_sinfo_bg_reclaim_threshold_store (200 -> 204 bytes)
check_mem_access (3652 -> 3692 bytes) [inlined bpf_map_is_rdonly]
console_flush_all (1268 -> 1264 bytes)
console_lock_spinning_disable_and_check (180 -> 176 bytes)
igb_add_filter (640 -> 636 bytes)
igb_config_tx_modes (2404 -> 2400 bytes)
kvm_vcpu_on_spin (480 -> 476 bytes)
map_freeze (376 -> 380 bytes)
netlink_bind (1664 -> 1656 bytes)
nmi_cpu_backtrace (404 -> 400 bytes)
set_rps_cpu (516 -> 520 bytes)
swap_cluster_readahead (944 -> 932 bytes)
tcp_accecn_third_ack (328 -> 336 bytes)
tcp_create_openreq_child (1764 -> 1772 bytes)
tcp_data_queue (5784 -> 5892 bytes)
tcp_ecn_rcv_synack (620 -> 628 bytes)
xen_manage_runstate_time (944 -> 896 bytes)
xen_steal_clock (340 -> 296 bytes)
Increase of some functions are due to more aggressive inlining due to
better codegen (in this build, e.g. bpf_map_is_rdonly is no longer
present due to being inlined completely).
NOTE: The return-value-of-function-drops-qualifiers hack was first
suggested by Al Viro in [1], which notes some of its limitations which
make it unsuitable for a general __unqual_scalar_typeof() replacement.
Notably, array types are not supported, and GCC 8.1-8.3 still fail. Why
should we use it here? READ_ONCE() does not support reading whole
arrays, and the GCC version problem only affects 3 minor releases of a
very ancient still-supported GCC version; not only that, this arm64
READ_ONCE() version is currently only activated by LTO builds, which
to-date are *only supported by Clang*!
Link: https://lore.kernel.org/all/20260111182010.GH3634291@ZenIV/ [1]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Previously tlb invalidations issued by __flush_tlb_page() did not
contain a level hint. From the core API documentation, this function is
clearly only ever intended to target level 3 (PTE) tlb entries:
| 4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``
|
| This time we need to remove the PAGE_SIZE sized translation
| from the TLB.
However, the arm64 documentation is more relaxed allowing any last level:
| this operation only invalidates a single, last-level page-table
| entry and therefore does not affect any walk-caches
It turns out that the function was actually being used to invalidate a
level 2 mapping via flush_tlb_fix_spurious_fault_pmd(). The bug was
benign because the level hint was not set so the HW would still
invalidate the PMD mapping, and also because the TLBF_NONOTIFY flag was
set, the bounds of the mapping were never used for anything else.
Now that we have the new and improved range-invalidation API, it is
trival to fix flush_tlb_fix_spurious_fault_pmd() to explicitly flush the
whole range (locally, without notification and last level only). So
let's do that, and then update __flush_tlb_page() to hint level 3.
Reviewed-by: Linu Cherian <linu.cherian@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
[catalin.marinas@arm.com: use "level 3" in the __flush_tlb_page() description]
[catalin.marinas@arm.com: tweak the commit message to include the core API text]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Replace all TTBR_ASID_MASK macro instances with TTBRx_EL1_ASID_MASK which
is a standard field mask from tools sysreg format. Drop the now redundant
custom macro TTBR_ASID_MASK. No functional change.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since Armv9.6, FEAT_LSUI supplies the load/store instructions for
previleged level to access to access user memory without clearing
PSTATE.PAN bit.
Add Kconfig option entry for FEAT_LSUI.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.
The counter value is not expected to be stable, it drifts up and down with
each allocation and eviction. The CSU register provides the value for a
point in time.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
During a memory hot remove operation, both linear and vmemmap mappings for
the memory range being removed, get unmapped via unmap_hotplug_range() but
mapped pages get freed only for vmemmap mapping. This is just a sequential
operation where each table entry gets cleared, followed by a leaf specific
TLB flush, and then followed by memory free operation when applicable.
This approach was simple and uniform both for vmemmap and linear mappings.
But linear mapping might contain CONT marked block memory where it becomes
necessary to first clear out all entire in the range before a TLB flush.
This is as per the architecture requirement. Hence batch all TLB flushes
during the table tear down walk and finally do it in unmap_hotplug_range().
Prior to this fix, it was hypothetically possible for a speculative access
to a higher address in the contiguous block to fill the TLB with shattered
entries for the entire contiguous range after a lower address had already
been cleared and invalidated. Due to the table entries being shattered, the
subsequent TLB invalidation for the higher address would not then clear the
TLB entries for the lower address, meaning stale TLB entries could persist.
Besides it also helps in improving the performance via TLBI range operation
along with reduced synchronization instructions. The time spent executing
unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal
in KVM guest.
This scheme is not applicable during vmemmap mapping tear down where memory
needs to be freed and hence a TLB flush is required after clearing out page
table entry.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
[ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[ 15.513762] Mem abort info:
[ 15.527245] ESR = 0x0000000096000046
[ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits
[ 15.572146] SET = 0, FnV = 0
[ 15.592141] EA = 0, S1PTW = 0
[ 15.612694] FSC = 0x06: level 2 translation fault
[ 15.640644] Data abort info:
[ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[ 15.886394] Modules linked in:
[ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[ 15.935258] Hardware name: linux,dummy-virt (DT)
[ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[ 16.006163] lr : swiotlb_bounce+0xf4/0x158
[ 16.024145] sp : ffff80008000b8f0
[ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[ 16.349071] Call trace:
[ 16.360143] __pi_memcpy_generic+0x128/0x22c (P)
[ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4
[ 16.400282] swiotlb_map+0x5c/0x228
[ 16.415984] dma_map_phys+0x244/0x2b8
[ 16.432199] dma_map_page_attrs+0x44/0x58
[ 16.449782] virtqueue_map_page_attrs+0x38/0x44
[ 16.469596] virtqueue_map_single_attrs+0xc0/0x130
[ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc
[ 16.510355] try_fill_recv+0x2a4/0x584
[ 16.526989] virtnet_open+0xd4/0x238
[ 16.542775] __dev_open+0x110/0x24c
[ 16.558280] __dev_change_flags+0x194/0x20c
[ 16.576879] netif_change_flags+0x24/0x6c
[ 16.594489] dev_change_flags+0x48/0x7c
[ 16.611462] ip_auto_config+0x258/0x1114
[ 16.628727] do_one_initcall+0x80/0x1c8
[ 16.645590] kernel_init_freeable+0x208/0x2f0
[ 16.664917] kernel_init+0x24/0x1e0
[ 16.680295] ret_from_fork+0x10/0x20
[ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[ 16.723106] ---[ end trace 0000000000000000 ]---
[ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[ 16.818966] PHYS_OFFSET: 0xfff1000080000000
[ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[ 16.862904] Memory Limit: none
[ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.
The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.
But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; it looked like they were valid
which made this super fun to debug. the ptdump code is core-mm and
pmd_table() is arm64-specific so we can't use the same trick to solve
that.
But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.
Additionally, from inspection kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().
The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.
Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.
Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.
In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The code to set MBA's alloc_capable to true appears to be trying to
restore alloc_capable on unmount. This can never work because
resctrl_arch_set_cdp_enabled() is never invoked with RDT_RESOURCE_MBA
as the rid parameter. Consequently,
mpam_resctrl_controls[RDT_RESOURCE_MBA].cdp_enabled always remains false.
The alloc_capable setting in resctrl_arch_set_cdp_enabled() is to
re-enable MBA if the caller opts in to separate control values using
CDP for this resource. This doesn't happen today.
Add a comment to describe this.
However a bug remains where MBA allocation is permanently disabled after
the mount with CDP option. Remounting without CDP cannot restore the MBA
partition capability.
Add a check to re-enable MBA when CDP is disabled, which happens on
unmount.
Fixes: 6789fb99282c ("arm_mpam: resctrl: Add CDP emulation")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
[ morse: Added comment for existing code, added hunk to fix this bug from
Ben H ]
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Add __tlbi_sync_s1ish_kernel() similar to __tlbi_sync_s1ish() and use it
for kernel TLB maintenance. Also use this function in flush_tlb_all()
which is only used in relation to kernel mappings. Subsequent patches
can differentiate between workarounds that apply to user only or both
user and kernel.
A subsequent patch will add mm_struct to __tlbi_sync_s1ish(). Since
arch_tlbbatch_flush() is not specific to an mm, add a corresponding
__tlbi_sync_s1ish_batch() helper.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull x86 platform driver updates from Ilpo Järvinen:
"asus-wmi:
- Retain battery charge threshold during boot which avoids
unsolicited change to 100%. Return -ENODATA when the limit
is not yet known
- Improve screenpad power/brightness handling consistency
- Fix screenpad brightness range
barco-p50-gpio:
- Normalize gpio_get return values
bitland-mifs-wmi:
- Add driver for Bitland laptops (supports platform profile,
hwmon, kbd backlight, gpu mode, hotkeys, and fan boost)
dell_rbu:
- Fix using uninitialized value in sysfs write function
dell-wmi-sysman:
- Respect destination length when constructing enum strings
hp-wmi:
- Propagate fan setting apply failures and log an error
- Fix sysfs write vs work handler cancel_delayed_work_sync() deadlock
- Correct keepalive schedule_delayed_work() to mod_delayed_work()
- Fix u8 underflows in GPU delta calculation
- Use mutex to protect fan pwm/mode
- Ignore kbd backlight and FnLock key events that are handled by FW
- Fix fan table parsing (use correct field)
- Add support for Omen 14-fb0xxx, 16-n0xxx, 16-wf1xxx, and
Omen MAX 16-ak0xxxx
input: trackpoint & thinkpad_acpi:
- Enable doubletap by default and add sysfs enable/disable
int3472:
- Add support for GPIO type 0x02 (IR flood LED)
intel-speed-select: (updated to v1.26)
- Avoid using current base frequency as maximum
- Fix CPU extended family ID decoding
- Fix exit code
- Improve error reporting
intel/vsec:
- Refactor to support ACPI-enumerated PMT endpoints.
pcengines-apuv2:
- Attach software node to the gpiochip
uniwill:
- Refactor hwmon to smaller parts to accomodate HW diversity
- Support USB-C power/performance priority switch through sysfs
- Add another XMG Fusion 15 (L19) DMI vendor
- Enable fine-grained features to device lineup mapping
wmi:
- Perform output size check within WMI core to allow simpler WMI
drivers
misc:
- acpi_driver -> platform driver conversions (a large number of
changes from Rafael J. Wysocki)
- cleanups / refactoring / improvements"
* tag 'platform-drivers-x86-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (106 commits)
platform/x86: hp-wmi: Add support for Omen 16-wf1xxx (8C77)
platform/x86: hp-wmi: Add support for Omen 16-n0xxx (8A44)
platform/x86: hp-wmi: Add support for OMEN MAX 16-ak0xxx (8D87)
platform/x86: hp-wmi: fix fan table parsing
platform/x86: hp-wmi: add Omen 14-fb0xxx (board 8C58) support
platform/wmi: Replace .no_notify_data with .min_event_size
platform/wmi: Extend wmidev_query_block() to reject undersized data
platform/wmi: Extend wmidev_invoke_method() to reject undersized data
platform/wmi: Prepare to reject undersized unmarshalling results
platform/wmi: Convert drivers to use wmidev_invoke_procedure()
platform/wmi: Add wmidev_invoke_procedure()
platform/x86: int3472: Add support for GPIO type 0x02 (IR flood LED)
platform/x86: int3472: Parameterize LED con_id in registration
platform/x86: int3472: Rename pled to led in LED registration code
platform/x86: int3472: Use local variable for LED struct access
platform/x86: thinkpad_acpi: remove obsolete TODO comment
platform/x86: dell-wmi-sysman: bound enumeration string aggregation
platform/x86: hp-wmi: Ignore backlight and FnLock events
platform/x86: uniwill-laptop: Fix signedness bug
platform/x86: dell_rbu: avoid uninit value usage in packet_size_write()
...
Pull jfs updates from Dave Kleikamp:
"More robust data integrity checking and some fixes"
* tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy:
jfs: avoid -Wtautological-constant-out-of-range-compare warning again
JFS: always load filesystem UUID during mount
jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
jfs: Set the lbmDone flag at the end of lbmIODone
jfs: fix corrupted list in dbUpdatePMap
jfs: add dmapctl integrity check to prevent invalid operations
jfs: add dtpage integrity check to prevent index/pointer overflows
jfs: add dtroot integrity check to prevent index out-of-bounds
The local variable 'decode' is only used as a boolean value - change its
data type from int to bool accordingly.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260407181835.1053072-2-thorsten.blum@linux.dev
Signed-off-by: Petr Mladek <pmladek@suse.com>
GCC < 12.1 can miscompile printf_kunit's errptr() test when branch
profiling is enabled. BUILD_BUG_ON(IS_ERR(PTR)) is a constant false
expression, but CONFIG_TRACE_BRANCH_PROFILING and
CONFIG_PROFILE_ALL_BRANCHES make the IS_ERR() path side-effectful.
GCC's IPA splitter can then outline the cold assert arm into
errptr.part.* and leave that clone with an unconditional
__compiletime_assert_*() call, causing a false build failure.
This started showing up after test_hashed() became a macro and moved its
local buffer into errptr(), which changed GCC's inlining and splitting
decisions enough to expose the compiler bug.
Workaround the problem by disabling the branch profiling for
printf_kunit.o. It is a straightforward and acceptable solution.
The workaround can be removed once the minimum GCC includes commit
76fe49423047 ("Fix tree-optimization/101941: IPA splitting out
function with error attribute"), which first shipped in GCC 12.1.
Fixes: 9bfa52dac27a ("printf: convert test_hashed into macro")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604030636.NqjaJvYp-lkp@intel.com/
Cc: stable@vger.kernel.org
Acked-by: Tamir Duberstein <tamird@kernel.org>
Link: https://patch.msgid.link/ad5gJAX9f6dSQluz@pathway.suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.com>
get_data() has a sanity check for regular data blocks to ensure at
least space for the ID exists. But a regular block should also have
at least 1 byte of data (otherwise it would be data-less instead of
regular).
Expand the get_data() block size sanity check to additionally expect
at least 1 byte of data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260326133809.8045-2-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>