commits

Pull vfs fixes from Christian Brauner:

- Fix netfs_limit_iter() hitting BUG() when an ITER_KVEC iterator
reaches it via core dump writes to 9P filesystems. Add ITER_KVEC
handling following the same pattern as the existing ITER_BVEC code.

- Fix a NULL pointer dereference in the netfs unbuffered write retry
path when the filesystem (e.g., 9P) doesn't set the prepare_write
operation.

- Clear I_DIRTY_TIME in sync_lazytime for filesystems implementing
->sync_lazytime. Without this the flag stays set and may cause
additional unnecessary calls during inode deactivation.

- Increase tmpfs size in mount_setattr selftests. A recent commit
bumped the ext4 image size to 2 GB but didn't adjust the tmpfs
backing store, so mkfs.ext4 fails with ENOSPC writing metadata.

- Fix an invalid folio access in iomap when i_blkbits matches the folio
size but differs from the I/O granularity. The cur_folio pointer
would not get invalidated and iomap_read_end() would still be called
on it despite the IO helper owning it.

- Fix hash_name() docstring.

- Fix read abandonment during netfs retry where the subreq variable
used for abandonment could be uninitialized on the first pass or
point to a deleted subrequest on later passes.

- Don't block sync for filesystems with no data integrity guarantees.
Add a SB_I_NO_DATA_INTEGRITY superblock flag replacing the per-inode
AS_NO_DATA_INTEGRITY mapping flag so sync kicks off writeback but
doesn't wait for flusher threads. This fixes a suspend-to-RAM hang on
fuse-overlayfs where the flusher thread blocks when the fuse daemon
is frozen.

- Fix a lockdep splat in iomap when reads fail. iomap_read_end_io()
invokes fserror_report() which calls igrab() taking i_lock in hardirq
context while i_lock is normally held with interrupts enabled. Kick
failed read handling to a workqueue.

- Remove the redundant netfs_io_stream::front member and use
stream->subrequests.next instead, fixing a potential issue in the
direct write code path.

* tag 'vfs-7.0-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the handling of stream->front by removing it
iomap: fix lockdep complaint when reads fail
writeback: don't block sync for filesystems with no data integrity guarantees
netfs: Fix read abandonment during retry
vfs: fix docstring of hash_name()
iomap: fix invalid folio access when i_blkbits differs from I/O granularity
selftests/mount_setattr: increase tmpfs size for idmapped mount tests
fs: clear I_DIRTY_TIME in sync_lazytime
netfs: Fix NULL pointer dereference in netfs_unbuffered_write() on retry
netfs: Fix kernel BUG in netfs_limit_iter() for ITER_KVEC iterators

5w ago

Linus Torvalds

fc9eae25

Merge tag 'phy-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

5w ago

David Howells

0e764b9d

netfs: Fix the handling of stream->front by removing it

6w ago

Linus Torvalds

a516c618

Merge tag 'dmaengine-fix-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

5w ago

Abel Vesa

81af9e40

phy: qcom: qmp-ufs: Fix SM8650 PCS table for Gear 4

2mo ago

Darrick J. Wong

f621324d

iomap: fix lockdep complaint when reads fail

Zorro Lang reported the following lockdep splat:

"While running fstests xfs/556 on kernel 7.0.0-rc4+ (HEAD=04a9f1766954),
a lockdep warning was triggered indicating an inconsistent lock state
for sb->s_type->i_lock_key.

"The deadlock might occur because iomap_read_end_io (called from a
hardware interrupt completion path) invokes fserror_report, which then
calls igrab. igrab attempts to acquire the i_lock spinlock. However,
the i_lock is frequently acquired in process context with interrupts
enabled. If an interrupt occurs while a process holds the i_lock, and
that interrupt handler calls fserror_report, the system deadlocks.

"I hit this warning several times by running xfs/556 (mostly) or
generic/648 on xfs. More details refer to below console log."

along with this dmesg, for which I've cleaned up the stacktraces:

run fstests xfs/556 at 2026-03-18 20:05:30
XFS (sda3): Mounting V5 Filesystem 396e9164-c45a-4e05-be9d-b38c2c5c6477
XFS (sda3): Ending clean mount
XFS (sda3): Unmounting Filesystem 396e9164-c45a-4e05-be9d-b38c2c5c6477
XFS (sda3): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (sda3): Ending clean mount
XFS (sda3): Unmounting Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Ending clean mount
device-mapper: table: 253:0: adding target device (start sect 209 len 1) caused an alignment inconsistency
device-mapper: table: 253:0: adding target device (start sect 210 len 62914350) caused an alignment inconsistency
buffer_io_error: 6 callbacks suppressed
Buffer I/O error on dev dm-0, logical block 209, async page read
Buffer I/O error on dev dm-0, logical block 209, async page read
XFS (dm-0): Unmounting Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Ending clean mount

================================
WARNING: inconsistent lock state
7.0.0-rc4+ #1 Tainted: G S W
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
od/2368602 [HC1[1]:SC0[0]:HE0:SE1] takes:
ff1100069f2b4a98 (&sb->s_type->i_lock_key#31){?.+.}-{3:3}, at: igrab+0x28/0x1a0
{HARDIRQ-ON-W} state was registered at:
__lock_acquire+0x40d/0xbd0
lock_acquire.part.0+0xbd/0x260
_raw_spin_lock+0x37/0x80
unlock_new_inode+0x66/0x2a0
xfs_iget+0x67b/0x7b0 [xfs]
xfs_mountfs+0xde4/0x1c80 [xfs]
xfs_fs_fill_super+0xe86/0x17a0 [xfs]
get_tree_bdev_flags+0x312/0x590
vfs_get_tree+0x8d/0x2f0
vfs_cmd_create+0xb2/0x240
__do_sys_fsconfig+0x3d8/0x9a0
do_syscall_64+0x13a/0x1520
entry_SYSCALL_64_after_hwframe+0x76/0x7e
irq event stamp: 3118
hardirqs last enabled at (3117): [<ffffffffb54e4ad8>] _raw_spin_unlock_irq+0x28/0x50
hardirqs last disabled at (3118): [<ffffffffb54b84c9>] common_interrupt+0x19/0xe0
softirqs last enabled at (3040): [<ffffffffb290ca28>] handle_softirqs+0x6b8/0x950
softirqs last disabled at (3023): [<ffffffffb290ce4d>] __irq_exit_rcu+0xfd/0x250

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&sb->s_type->i_lock_key#31);
<Interrupt>
lock(&sb->s_type->i_lock_key#31);

*** DEADLOCK ***

1 lock held by od/2368602:
#0: ff1100069f2b4b58 (&sb->s_type->i_mutex_key#19){++++}-{4:4}, at: xfs_ilock+0x324/0x4b0 [xfs]

stack backtrace:
CPU: 15 UID: 0 PID: 2368602 Comm: od Kdump: loaded Tainted: G S W 7.0.0-rc4+ #1 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Hardware name: Dell Inc. PowerEdge R660/0R5JJC, BIOS 2.1.5 03/14/2024
Call Trace:
<IRQ>
dump_stack_lvl+0x6f/0xb0
print_usage_bug.part.0+0x230/0x2c0
mark_lock_irq+0x3ce/0x5b0
mark_lock+0x1cb/0x3d0
mark_usage+0x109/0x120
__lock_acquire+0x40d/0xbd0
lock_acquire.part.0+0xbd/0x260
_raw_spin_lock+0x37/0x80
igrab+0x28/0x1a0
fserror_report+0x127/0x2d0
iomap_finish_folio_read+0x13c/0x280
iomap_read_end_io+0x10e/0x2c0
clone_endio+0x37e/0x780 [dm_mod]
blk_update_request+0x448/0xf00
scsi_end_request+0x74/0x750
scsi_io_completion+0xe9/0x7c0
_scsih_io_done+0x6ba/0x1ca0 [mpt3sas]
_base_process_reply_queue+0x249/0x15b0 [mpt3sas]
_base_interrupt+0x95/0xe0 [mpt3sas]
__handle_irq_event_percpu+0x1f0/0x780
handle_irq_event+0xa9/0x1c0
handle_edge_irq+0x2ef/0x8a0
__common_interrupt+0xa0/0x170
common_interrupt+0xb7/0xe0
</IRQ>
<TASK>
asm_common_interrupt+0x26/0x40
RIP: 0010:_raw_spin_unlock_irq+0x2e/0x50
Code: 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 b5 73 5e fd 48 89 df e8 ed e2 5e fd e8 08 78 8f fd fb bf 01 00 00 00 <e8> 8d 56 4d fd 65 8b 05 46 d5 1d 03 85 c0 74 06 5b c3 cc cc cc cc
RSP: 0018:ffa0000027d07538 EFLAGS: 00000206
RAX: 0000000000000c2d RBX: ffffffffb6614bc8 RCX: 0000000000000080
RDX: 0000000000000000 RSI: ffffffffb6306a01 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffffb75efc67 R11: 0000000000000001 R12: ff1100015ada0000
R13: 0000000000000083 R14: 0000000000000002 R15: ffffffffb6614c10
folio_wait_bit_common+0x407/0x780
filemap_update_page+0x8e7/0xbd0
filemap_get_pages+0x904/0xc50
filemap_read+0x320/0xc20
xfs_file_buffered_read+0x2aa/0x380 [xfs]
xfs_file_read_iter+0x263/0x4a0 [xfs]
vfs_read+0x6cb/0xb70
ksys_read+0xf9/0x1d0
do_syscall_64+0x13a/0x1520

Zorro's diagnosis makes sense, so the solution is to kick the failed
read handling to a workqueue much like we added for writeback ioends in
commit 294f54f849d846 ("fserror: fix lockdep complaint when igrabbing
inode").

Cc: Zorro Lang <zlang@redhat.com>
Link: https://lore.kernel.org/linux-xfs/20260319194303.efw4wcu7c4idhthz@doltdoltdolt/
Fixes: a9d573ee88af98 ("iomap: report file I/O errors to the VFS")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://patch.msgid.link/20260323210017.GL6223@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>

6w ago

Linus Torvalds

32ee88da

Merge tag 'i2c-for-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

5w ago

Tomi Valkeinen

a17ce4bc

dmaengine: xilinx_dma: Fix reset related timeout with two-channel AXIDMA

A single AXIDMA controller can have one or two channels. When it has two
channels, the reset for both are tied together: resetting one channel
resets the other as well. This creates a problem where resetting one
channel will reset the registers for both channels, including clearing
interrupt enable bits for the other channel, which can then lead to
timeouts as the driver is waiting for an interrupt which never comes.

The driver currently has a probe-time work around for this: when a
channel is created, the driver also resets and enables the
interrupts. With two channels the reset for the second channel will
clear the interrupt enables for the first one. The work around in the
driver is just to manually enable the interrupts again in
xilinx_dma_alloc_chan_resources().

This workaround only addresses the probe-time issue. When channels are
reset at runtime (e.g., in xilinx_dma_terminate_all() or during error
recovery), there's no corresponding mechanism to restore the other
channel's interrupt enables. This leads to one channel having its
interrupts disabled while the driver expects them to work, causing
timeouts and DMA failures.

A proper fix is a complicated matter, as we should not reset the other
channel when it's operating normally. So, perhaps, there should be some
kind of synchronization for a common reset, which is not trivial to
implement. To add to the complexity, the driver also supports other DMA
types, like VDMA, CDMA and MCDMA, which don't have a shared reset.

However, when the two-channel AXIDMA is used in the (assumably) normal
use case, providing DMA for a single memory-to-memory device, the common
reset is a bit smaller issue: when something bad happens on one channel,
or when one channel is terminated, the assumption is that we also want
to terminate the other channel. And thus resetting both at the same time
is "ok".

With that line of thinking we can implement a bit better work around
than just the current probe time work around: let's enable the
AXIDMA interrupts at xilinx_dma_start_transfer() instead.
This ensures interrupts are enabled whenever a transfer starts,
regardless of any prior resets that may have cleared them.

This approach is also more logical: enable interrupts only when needed
for a transfer, rather than at resource allocation time, and, I think,
all the other DMA types should also use this model, but I'm reluctant to
do such changes as I cannot test them.

The reset function still enables interrupts even though it's not needed
for AXIDMA anymore, but it's common code for all DMA types (VDMA, CDMA,
MCDMA), so leave it unchanged to avoid affecting other variants.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine")
Link: https://patch.msgid.link/20260311-xilinx-dma-fix-v2-1-a725abb66e3c@ideasonboard.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

7w ago

Felix Gu

584b457f

phy: ti: j721e-wiz: Fix device node reference leak in wiz_get_lane_phy_types()

2mo ago

Joanne Koong

76f9377c

writeback: don't block sync for filesystems with no data integrity guarantees

Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot
guarantee data persistence on sync (eg fuse). For superblocks with this
flag set, sync kicks off writeback of dirty inodes but does not wait
for the flusher threads to complete the writeback.

This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in
commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings
in wait_sb_inodes()"). The flag belongs at the superblock level because
data integrity is a filesystem-wide property, not a per-inode one.
Having this flag at the superblock level also allows us to skip having
to iterate every dirty inode in wait_sb_inodes() only to skip each inode
individually.

Prior to this commit, mappings with no data integrity guarantees skipped
waiting on writeback completion but still waited on the flusher threads
to finish initiating the writeback. Waiting on the flusher threads is
unnecessary. This commit kicks off writeback but does not wait on the
flusher threads. This change properly addresses a recent report [1] for
a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting
on the flusher threads to finish:

Workqueue: pm_fs_sync pm_fs_sync_work_fn
Call Trace:
<TASK>
__schedule+0x457/0x1720
schedule+0x27/0xd0
wb_wait_for_completion+0x97/0xe0
sync_inodes_sb+0xf8/0x2e0
__iterate_supers+0xdc/0x160
ksys_sync+0x43/0xb0
pm_fs_sync_work_fn+0x17/0xa0
process_one_work+0x193/0x350
worker_thread+0x1a1/0x310
kthread+0xfc/0x240
ret_from_fork+0x243/0x280
ret_from_fork_asm+0x1a/0x30
</TASK>

On fuse this is problematic because there are paths that may cause the
flusher thread to block (eg if systemd freezes the user session cgroups
first, which freezes the fuse daemon, before invoking the kernel
suspend. The kernel suspend triggers ->write_node() which on fuse issues
a synchronous setattr request, which cannot be processed since the
daemon is frozen. Or if the daemon is buggy and cannot properly complete
writeback, initiating writeback on a dirty folio already under writeback
leads to writeback_get_folio() -> folio_prepare_writeback() ->
unconditional wait on writeback to finish, which will cause a hang).
This commit restores fuse to its prior behavior before tmp folios were
removed, where sync was essentially a no-op.

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: John <therealgraysky@proton.me>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

7w ago

Linus Torvalds

ac354b5c

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

5w ago

Wolfram Sang

b0faf733

MAINTAINERS: drop outdated I2C website

5w ago

Marek Vasut

c7d812e3

dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction

7w ago

Yixun Lan

f0cf0a88

phy: k1-usb: add disconnect function support

2mo ago

David Howells

7e575234

netfs: Fix read abandonment during retry

7w ago

Linus Torvalds

b8a3bc85

Merge tag 'for-linus-7.0a-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

5w ago

Sean Christopherson

df837460

KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE

6w ago

Wolfram Sang

4c10830f

Merge tag 'i2c-host-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

6w ago

Marek Vasut

f61d1459

dmaengine: xilinx: xilinx_dma: Fix residue calculation for cyclic DMA

7w ago

Vladimir Oltean

a258d843

phy: lynx-28g: skip CDR lock workaround for lanes disabled in the device tree

2mo ago

Jori Koolstra

8fb6857f

vfs: fix docstring of hash_name()

7w ago

Linus Torvalds

f242ac4a

Merge tag 'x86-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5w ago

GuoHan Zhao

cd7e1fef

xen/privcmd: unregister xenstore notifier on module exit

6w ago

Sean Christopherson

aad885e7

KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE

6w ago

Linus Torvalds

c3692998

Linux 7.0-rc5 v7.0-rc5

6w ago

Pratap Nirujogi

e2f1ada8

i2c: designware: amdisp: Fix resume-probe race condition issue

6w ago

Marek Vasut

e9cc9539

dmaengine: xilinx: xilinx_dma: Fix dma_device directions

7w ago

Vladimir Oltean

48fafffc

phy: make PHY_COMMON_PROPS Kconfig symbol conditionally user-selectable

2mo ago

Joanne Koong

bd71fb3f

iomap: fix invalid folio access when i_blkbits differs from I/O granularity

7w ago

Linus Torvalds

47e3f23f

Merge tag 'timers-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5w ago

Peter Zijlstra

a3e93cac

x86/cpu: Add comment clarifying CRn pinning

6w ago

Linus Torvalds

0138af24

Merge tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

6w ago

Paolo Bonzini

6c6ba548

Merge tag 'kvm-s390-master-7.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

6w ago

Mikko Perttunen

ec69c9e8

i2c: tegra: Don't mark devices with pins as IRQ safe

6w ago

Stefan Eichenberger

13101db7

i2c: imx: ensure no clock is generated after last read

6w ago

Claudiu Beznea

89a8567d

dmaengine: sh: rz-dmac: Move CHCTRL updates under spinlock

7w ago

Linus Torvalds

6de23f81

Linux 7.0-rc1 v7.0-rc1

2mo ago

Christian Brauner

c465f559

selftests/mount_setattr: increase tmpfs size for idmapped mount tests

7w ago

Linus Torvalds

f087b0ba

Merge tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5w ago

Zhan Xusheng

5d16467a

alarmtimer: Fix argument order in alarm_timer_forward()

6w ago

Nikunj A Dadhania

3645eb7e

x86/fred: Fix early boot failures on SEV-ES/SNP guests

6w ago

Linus Torvalds

aba9da09

Merge tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

6w ago

Gao Xiang

2f0407ed

erofs: fix .fadvise() for page cache sharing

6w ago

Claudio Imbrenda

0a28e065

KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl

6w ago

Linus Torvalds

d5273fd3

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

6w ago

Stefan Eichenberger

f88e2e74

i2c: imx: fix i2c issue when reading multiple messages

6w ago

Claudiu Beznea

abb863e6

dmaengine: sh: rz-dmac: Protect the driver specific lists

7w ago

Linus Torvalds

fbf33803

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

2mo ago

Christoph Hellwig

f8b8820a

fs: clear I_DIRTY_TIME in sync_lazytime

7w ago

Linus Torvalds

21047b17

Merge tag 'irq-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5w ago

Davidlohr Bueso

210d36d8

futex: Clear stale exiting pointer in futex_lock_pi() retry path

6w ago

Borislav Petkov (AMD)

411df123

x86/cpu: Remove X86_CR4_FRED from the CR4 pinned bits mask

6w ago

Linus Torvalds

d2a43e7f

Merge tag 'hardening-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

6w ago

Joel Fernandes

a6fc88b2

srcu: Use irq_work to start GP in tiny SRCU

6w ago

Gao Xiang

938c4184

erofs: update the Kconfig description

6w ago

Claudio Imbrenda

a12cc7e3

KVM: s390: vsie: Fix guest page tables protection

6w ago

Linus Torvalds

ac57fa9f

Merge tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Revert "tracing: Remove pid in task_rename tracing output"

A change was made to remove the pid field from the task_rename event
because it was thought that it was always done for the current task
and recording the pid would be redundant. This turned out to be
incorrect and there are a few corner case where this is not true and
caused some regressions in tooling.

- Fix the reading from user space for migration

The reading of user space uses a seq lock type of logic where it uses
a per-cpu temporary buffer and disables migration, then enables
preemption, does the copy from user space, disables preemption,
enables migration and checks if there was any schedule switches while
preemption was enabled. If there was a context switch, then it is
considered that the per-cpu buffer could be corrupted and it tries
again. There's a protection check that tests if it takes a hundred
tries, it issues a warning and exits out to prevent a live lock.

This was triggered because the task was selected by the load balancer
to be migrated to another CPU, every time preemption is enabled the
migration task would schedule in try to migrate the task but can't
because migration is disabled and let it run again. This caused the
scheduler to schedule out the task every time it enabled preemption
and made the loop never exit (until the 100 iteration test
triggered).

Fix this by enabling and disabling preemption and keeping migration
enabled if the reading from user space needs to be done again. This
will let the migration thread migrate the task and the copy from user
space will likely pass on the next iteration.

- Fix trace_marker copy option freeing

The "copy_trace_marker" option allows a tracing instance to get a
copy of a write to the trace_marker file of the top level instance.
This is managed by a link list protected by RCU. When an instance is
removed, a check is made if the option is set, and if so
synchronized_rcu() is called.

The problem is that an iteration is made to reset all the flags to
what they were when the instance was created (to perform clean ups)
was done before the check of the copy_trace_marker option and that
option was cleared, so the synchronize_rcu() was never called.

Move the clearing of all the flags after the check of
copy_trace_marker to do synchronize_rcu() so that the option is still
set if it was before and the synchronization is performed.

- Fix entries setting when validating the persistent ring buffer

When validating the persistent ring buffer on boot up, the number of
events per sub-buffer is added to the sub-buffer meta page. The
validator was updating cpu_buffer->head_page (the first sub-buffer of
the per-cpu buffer) and not the "head_page" variable that was
iterating the sub-buffers. This was causing the first sub-buffer to
be assigned the entries for each sub-buffer and not the sub-buffer
that was supposed to be updated.

- Use "hash" value to update the direct callers

When updating the ftrace direct callers, it assigned a temporary
callback to all the callback functions of the ftrace ops and not just
the functions represented by the passed in hash. This causes an
unnecessary slow down of the functions of the ftrace_ops that is not
being modified. Only update the functions that are going to be
modified to call the ftrace loop function so that the update can be
made on those functions.

* tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
tracing: Fix trace_marker copy link list updates
tracing: Fix failure to read user space from system call trace events
tracing: Revert "tracing: Remove pid in task_rename tracing output"

6w ago

Daniel Borkmann

4a04d135

selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation

Add multiple test cases for linked register tracking with alu32 ops:

- Add a test that checks sync_linked_regs() regarding reg->id (the linked
target register) for BPF_ADD_CONST32 rather than known_reg->id (the
branch register).

- Add a test case for linked register tracking that exposes the cross-type
sync_linked_regs() bug. One register uses alu32 (w7 += 1, BPF_ADD_CONST32)
and another uses alu64 (r8 += 2, BPF_ADD_CONST64), both linked to the
same base register.

- Add a test case that exercises regsafe() path pruning when two execution
paths reach the same program point with linked registers carrying
different ADD_CONST flags (BPF_ADD_CONST32 from alu32 vs BPF_ADD_CONST64
from alu64). This particular test passes with and without the fix since
the pruning will fail due to different ranges, but it would still be
useful to carry this one as a regression test for the unreachable div
by zero.

With the fix applied all the tests pass:

# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
[...]
./test_progs -t verifier_linked_scalars
#602/1 verifier_linked_scalars/scalars: find linked scalars:OK
#602/2 verifier_linked_scalars/sync_linked_regs_preserves_id:OK
#602/3 verifier_linked_scalars/scalars_neg:OK
#602/4 verifier_linked_scalars/scalars_neg_sub:OK
#602/5 verifier_linked_scalars/scalars_neg_alu32_add:OK
#602/6 verifier_linked_scalars/scalars_neg_alu32_sub:OK
#602/7 verifier_linked_scalars/scalars_pos:OK
#602/8 verifier_linked_scalars/scalars_sub_neg_imm:OK
#602/9 verifier_linked_scalars/scalars_double_add:OK
#602/10 verifier_linked_scalars/scalars_sync_delta_overflow:OK
#602/11 verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
#602/12 verifier_linked_scalars/scalars_alu32_big_offset:OK
#602/13 verifier_linked_scalars/scalars_alu32_basic:OK
#602/14 verifier_linked_scalars/scalars_alu32_wrap:OK
#602/15 verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
#602/16 verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
#602/17 verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
#602/18 verifier_linked_scalars/alu32_negative_offset:OK
#602/19 verifier_linked_scalars/spurious_precision_marks:OK
#602 verifier_linked_scalars:OK
Summary: 1/19 PASSED, 0 SKIPPED, 0 FAILED

Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>