scx_root_enable_workfn() takes cpus_read_lock() before
scx_link_sched(sch), but the `if (ret) goto err_disable` on failure
skips the matching cpus_read_unlock() - all other err_disable gotos
along this path drop the lock first.
scx_link_sched() only returns non-zero on the sub-sched path
(parent != NULL), so the leak path is unreachable via the root
caller today. Still, the unwind is out of line with the surrounding
paths.
Drop cpus_read_lock() before goto err_disable.
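A minimal sketch of the adjusted error path, using the names from the
description above (surrounding code omitted and assumed):

  cpus_read_lock();

  ret = scx_link_sched(sch);
  if (ret) {
          cpus_read_unlock();     /* match the other err_disable unwinds */
          goto err_disable;
  }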
v2: Correct Fixes: tag (Andrea Righi).
Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_prog_sched(aux) returns NULL for TRACING / SYSCALL BPF progs that
have no struct_ops association when the root scheduler has sub_attach
set. scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() pass
that NULL into scx_task_on_sched(sch, p), which under
CONFIG_EXT_SUB_SCHED is rcu_access_pointer(p->scx.sched) == sch. For
any non-scx task p->scx.sched is NULL, so NULL == NULL returns true
and the authority gate is bypassed - a privileged but
non-struct_ops-associated prog can poke p->scx.slice /
p->scx.dsq_vtime on arbitrary tasks.
Reject !sch up front so the gate only admits callers with a resolved
scheduler.
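A minimal sketch of the tightened gate, assuming the helper names from
the description (actual signatures may differ):

  struct scx_sched *sch = scx_prog_sched(aux);

  if (!sch)       /* TRACING/SYSCALL prog with no struct_ops association */
          return false;
  if (!scx_task_on_sched(sch, p))
          return false;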
Fixes: 245d09c594ea ("sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
select_cpu_from_kfunc() skipped pi_lock for @p when called from
ops.select_cpu() or another rq-locked SCX op, assuming the held lock
protects @p. scx_bpf_select_cpu_dfl() / __scx_bpf_select_cpu_and() accept an
arbitrary KF_RCU task_struct, so a caller in e.g. ops.select_cpu(p1) or
ops.enqueue(p1) can pass some other p2 - the held pi_lock / rq lock is p1's,
not p2's - and reading p2->cpus_ptr / nr_cpus_allowed races with
set_cpus_allowed_ptr() and migrate_disable_switch() on another CPU.
Abort the scheduler on cross-task calls in both branches: for
ops.select_cpu() use scx_kf_arg_task_ok() to verify @p is the wake-up
task recorded in current->scx.kf_tasks[] by SCX_CALL_OP_TASK_RET();
for other rq-locked SCX ops compare task_rq(p) against scx_locked_rq().
v2: Switch the in_select_cpu cross-task check from direct_dispatch_task
comparison to scx_kf_arg_task_ok(). The former spuriously rejects when
ops.select_cpu() calls scx_bpf_dsq_insert() first, then calls
scx_bpf_select_cpu_*() on the same task. (Andrea Righi)
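Roughly, the two branches described above look like the following
sketch; in_select_cpu and the abort label are placeholders, not the
actual code:

  if (in_select_cpu) {
          /* @p must be the wake-up task recorded in current->scx.kf_tasks[] */
          if (!scx_kf_arg_task_ok(p))
                  goto err_abort;
  } else {
          /* rq-locked SCX op: the held rq lock must be @p's */
          if (task_rq(p) != scx_locked_rq())
                  goto err_abort;
  }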
Fixes: 0022b328504d ("sched_ext: Decouple kfunc unlocked-context check from kf_mask")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrea Righi <arighi@nvidia.com>
Two EXT_GROUP_SCHED/SUB_SCHED guards are misclassified:
- scx_root_enable_workfn()'s cgroup_get(cgrp) and the err_put_cgrp unwind
in scx_alloc_and_add_sched() are under `#if GROUP || SUB`, but the
matching cgroup_put() in scx_sched_free_rcu_work() is inside `#ifdef SUB`
only (via sch->cgrp, stored only under SUB). GROUP-only would leak a
reference on every root-sched enable.
- sch_cgroup() / set_cgroup_sched() live under `#if GROUP || SUB` but touch
SUB-only fields (sch->cgrp, cgroup->scx_sched). GROUP-only wouldn't
compile.
GROUP needs CGROUP_SCHED; SUB needs only CGROUPS. CGROUPS=y/CGROUP_SCHED=n
gives the reachable GROUP=n, SUB=y combination; GROUP=y, SUB=n isn't
reachable today (SUB is def_bool y under CGROUPS). Neither miscategorization
triggers a real bug in any reachable config, but keep the guards honest:
- Narrow cgroup_get and err_put_cgrp to `#ifdef SUB` (matches the free-side
put).
- Move sch_cgroup() and set_cgroup_sched() to a separate `#ifdef SUB` block
with no-op stubs for the !SUB case; keep root_cgroup() and scx_cgroup_{
lock,unlock}() under `#if GROUP || SUB` since those only need cgroup core.
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_bypass_lb_{donee,resched}_cpumask were file-scope statics shared by all
scheduler instances. With CONFIG_EXT_SUB_SCHED, multiple sched instances
each arm their own bypass_lb_timer; concurrent bypass_lb_node() calls RMW
the global cpumasks with no lock, corrupting donee/resched decisions.
Move the cpumasks into struct scx_sched, allocate them alongside the timer
in scx_alloc_and_add_sched(), free them in scx_sched_free_rcu_work().
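Sketch of the per-instance members and their lifetime, assuming
cpumask_var_t fields named after the old globals (details are
illustrative):

  /* in struct scx_sched (other members omitted) */
  cpumask_var_t   bypass_lb_donee_cpumask;
  cpumask_var_t   bypass_lb_resched_cpumask;

  /* scx_alloc_and_add_sched(): allocate alongside bypass_lb_timer */
  if (!alloc_cpumask_var(&sch->bypass_lb_donee_cpumask, GFP_KERNEL) ||
      !alloc_cpumask_var(&sch->bypass_lb_resched_cpumask, GFP_KERNEL))
          goto err_free;

  /* scx_sched_free_rcu_work(): free with the rest of the instance */
  free_cpumask_var(sch->bypass_lb_donee_cpumask);
  free_cpumask_var(sch->bypass_lb_resched_cpumask);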
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_prio_less() runs from core-sched's pick_next_task() path with rq
locked but invokes ops.core_sched_before() with NULL locked_rq, leaving
scx_locked_rq_state NULL. If the BPF callback calls a kfunc that
re-acquires rq based on scx_locked_rq() - e.g. scx_bpf_cpuperf_set(cpu)
- it re-acquires the already-held rq.
Pass task_rq(a).
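In other words, something along these lines (the macro's argument order
is assumed from the SCX_CALL_OP changes later in this series):

  return SCX_CALL_OP_2TASKS_RET(sch, SCX_KF_REST, core_sched_before,
                                task_rq(a),   /* was NULL */
                                a, b);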
Fixes: 7b0888b7cc19 ("sched_ext: Implement core-sched support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_dump_state() walks CPUs with rq_lock_irqsave() held and invokes
ops.dump_cpu / ops.dump_task with NULL locked_rq, leaving
scx_locked_rq_state NULL. If the BPF callback calls a kfunc that
re-acquires rq based on scx_locked_rq() - e.g. scx_bpf_cpuperf_set(cpu)
- it re-acquires the already-held rq.
Pass the held rq to SCX_CALL_OP(). Thread it into scx_dump_task() too.
The pre-loop ops.dump call runs before rq_lock_irqsave() so keeps
rq=NULL.
Fixes: 07814a9439a3 ("sched_ext: Print debug dump after an error exit")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
SCX_CALL_OP{,_RET}() unconditionally clears scx_locked_rq_state to NULL on
exit. Correct at the top level, but ops can recurse via
scx_bpf_sub_dispatch(): a parent's ops.dispatch calls the helper, which
invokes the child's ops.dispatch under another SCX_CALL_OP. When the inner
call returns, the NULL clobbers the outer's state. The parent's BPF then
calls kfuncs like scx_bpf_cpuperf_set() which read scx_locked_rq()==NULL and
re-acquire the already-held rq.
Snapshot scx_locked_rq_state on entry and restore on exit. Rename the rq
parameter to locked_rq across all SCX_CALL_OP* macros so the snapshot local
can be typed as 'struct rq *' without colliding with the parameter token in
the expansion. SCX_CALL_OP_TASK{,_RET}() and SCX_CALL_OP_2TASKS_RET() funnel
through the two base macros and inherit the fix.
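Conceptually, the base macro changes from clear-on-exit to
save/restore, roughly as below (kf_mask handling omitted; helper names
are assumptions):

  #define SCX_CALL_OP(sch, mask, op, locked_rq, args...)            \
  do {                                                              \
          struct rq *__prev_rq = scx_locked_rq();                   \
                                                                    \
          update_locked_rq(locked_rq);                              \
          (sch)->ops.op(args);                                      \
          update_locked_rq(__prev_rq);  /* restore, don't clear */  \
  } while (0)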
Fixes: 4f8b122848db ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
dispatch_enqueue()'s FIFO-tail path used list_empty(&dsq->list) to decide
whether to set dsq->first_task on enqueue. dsq->list can contain parked BPF
iterator cursors (SCX_DSQ_LNODE_ITER_CURSOR), so list_empty() is not a
reliable "no real task" check. If the last real task is unlinked while a
cursor is parked, first_task becomes NULL; the next FIFO-tail enqueue then
sees list_empty() == false and skips the first_task update, leaving
scx_bpf_dsq_peek() returning NULL for a non-empty DSQ.
Test dsq->first_task directly, which already tracks only real tasks and is
maintained under dsq->lock.
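A sketch of the corrected FIFO-tail enqueue, with WRITE_ONCE() pairing
the lockless peek (names follow the description; exact code may
differ):

  lockdep_assert_held(&dsq->lock);

  if (!dsq->first_task)   /* list_empty() lies when iterator cursors are parked */
          WRITE_ONCE(dsq->first_task, p);
  list_add_tail(&p->scx.dsq_list.node, &dsq->list);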
Fixes: 44f5c8ec5b9a ("sched_ext: Add lockless peek operation for DSQs")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Ryan Newton <newton@meta.com>
scx_bpf_create_dsq() resolves the calling scheduler via scx_prog_sched(aux)
and inserts the new DSQ into that scheduler's dsq_hash. Its inverse
scx_bpf_destroy_dsq() and the query helper scx_bpf_dsq_nr_queued() were
hard-coded to rcu_dereference(scx_root), so a sub-scheduler could only
destroy or query DSQs in the root scheduler's hash - never its own. If the
root had a DSQ with the same id, the sub-sched silently destroyed it and the
root aborted on the next dispatch ("invalid DSQ ID 0x0..").
Take a const struct bpf_prog_aux *aux via KF_IMPLICIT_ARGS and resolve the
scheduler with scx_prog_sched(aux), matching scx_bpf_create_dsq().
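Sketch of the reworked destroy path (the nr_queued variant is
analogous); names are taken from the description and from
scx_bpf_create_dsq(), the body is assumed:

  __bpf_kfunc void scx_bpf_destroy_dsq(u64 dsq_id,
                                       const struct bpf_prog_aux *aux__prog)
  {
          struct scx_sched *sch = scx_prog_sched(aux__prog);

          if (sch)        /* resolve the calling scheduler, not scx_root */
                  destroy_dsq(sch, dsq_id);
  }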
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_group_set_{weight,idle,bandwidth}() cache scx_root before acquiring
scx_cgroup_ops_rwsem, so the pointer can be stale by the time the op runs.
If the loaded scheduler is disabled and freed (via RCU work) and another is
enabled between the naked load and the rwsem acquire, the reader sees
scx_cgroup_enabled=true (the new scheduler's) but dereferences the freed one
- UAF on SCX_HAS_OP(sch, ...) / SCX_CALL_OP(sch, ...).
scx_cgroup_enabled is toggled only under scx_cgroup_ops_rwsem write
(scx_cgroup_{init,exit}), so reading scx_root inside the rwsem read section
correlates @sch with the enabled snapshot.
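The resulting ordering, sketched with a plain rwsem read section for
illustration (the actual lock type and op invocation may differ):

  down_read(&scx_cgroup_ops_rwsem);
  /* read scx_root only after the rwsem so it matches scx_cgroup_enabled */
  sch = rcu_dereference_protected(scx_root, true);
  if (scx_cgroup_enabled && SCX_HAS_OP(sch, cgroup_set_weight))
          SCX_CALL_OP(sch, SCX_KF_UNLOCKED, cgroup_set_weight, NULL,
                      cgrp, weight);
  up_read(&scx_cgroup_ops_rwsem);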
Fixes: a5bd6ba30b33 ("sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations")
Cc: stable@vger.kernel.org # v6.18+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_sub_enable_workfn()'s prep loop calls __scx_init_task(sch, p, false)
without transitioning task state, then sets SCX_TASK_SUB_INIT. If prep fails
partway, the abort path runs __scx_disable_and_exit_task(sch, p) on the
marked tasks. Task state is still the parent's ENABLED, so that dispatches
to the SCX_TASK_ENABLED arm and calls scx_disable_task(sch, p) - i.e.
child->ops.disable() - for tasks on which child->ops.enable() never ran. A
BPF sub-scheduler allocating per-task state in enable/freeing in disable
would operate on uninitialized state.
The dying-task branch in scx_disable_and_exit_task() has the same problem,
and scx_enabling_sub_sched was cleared before the abort cleanup loop - a
task exiting during cleanup tripped the WARN and skipped both ops.exit_task
and the SCX_TASK_SUB_INIT clear, leaking per-task resources and leaving the
task stuck.
Introduce scx_sub_init_cancel_task() that calls ops.exit_task with
cancelled=true - matching what the top-level init path does when init_task
itself returns -errno. Use it in the abort loop and in the dying-task
branch. scx_enabling_sub_sched now stays set until the abort loop finishes
clearing SUB_INIT, so concurrent exits hitting the dying-task branch can
still find @sch. That branch also clears SCX_TASK_SUB_INIT unconditionally
when seen, leaving the task unmarked even if the WARN fires.
Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
Skip such tasks.
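The skip amounts to a check like this in bypass_lb_cpu()'s transfer
loop (donor_rq is this description's name for the donating CPU's rq):

  /* @p was donated earlier but never consumed; it still belongs to its
   * old CPU's rq and can't be moved without that rq's lock. */
  if (task_rq(p) != donor_rq)
          continue;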
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
bpf_iter_scx_dsq_new() clears kit->dsq on failure and
bpf_iter_scx_dsq_{next,destroy}() guard against that. scx_dsq_move() doesn't -
it dereferences kit->dsq immediately, so a BPF program that calls
scx_bpf_dsq_move[_vtime]() after a failed iter_new oopses the kernel.
Return false if kit->dsq is NULL.
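That is, an early-out at the top of scx_dsq_move(), mirroring the
guards in the iterator's next/destroy paths:

  /* bpf_iter_scx_dsq_new() clears kit->dsq when it fails */
  if (!kit->dsq)
          return false;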
Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
When ops.sub_attach is set, scx_alloc_and_add_sched() creates sub_kset as a
child of &sch->kobj, which pins the parent with its own reference. The
disable paths never call kset_unregister(), so the final kobject_put() in
bpf_scx_unreg() leaves a stale reference and scx_kobj_release() never runs,
leaking the whole struct scx_sched on every load/unload cycle.
Unregister sub_kset in scx_root_disable() and scx_sub_disable() before
kobject_del(&sch->kobj).
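Sketch of the teardown ordering in the disable paths (the sub_kset
member name is taken from the description):

  if (sch->sub_kset)
          kset_unregister(sch->sub_kset); /* drops its ref on &sch->kobj */
  kobject_del(&sch->kobj);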
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_hardlockup() runs from NMI and eventually calls scx_claim_exit(),
which takes scx_sched_lock. scx_sched_lock isn't NMI-safe and grabbing
it from NMI context can lead to deadlocks.
The hardlockup handler is best-effort recovery and the disable path it
triggers runs off of irq_work anyway. Move the handle_lockup() call into
an irq_work so it runs in IRQ context.
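A rough sketch of the deferral; handle_lockup()'s arguments and return
handling are omitted here:

  static void scx_lockup_irq_workfn(struct irq_work *work)
  {
          handle_lockup();        /* hard IRQ context, locks are fine here */
  }
  static DEFINE_IRQ_WORK(scx_lockup_irq_work, scx_lockup_irq_workfn);

  void scx_hardlockup(int cpu)
  {
          /* NMI context: don't touch scx_sched_lock here */
          irq_work_queue(&scx_lockup_irq_work);
  }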
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
When unregistering my self-written scx scheduler, the following panic
occurs.
[ 229.923133] Kernel text patching generated an invalid instruction at 0xffff80009bc2c1f8!
[ 229.923146] Internal error: Oops - BRK: 00000000f2000100 [#1] SMP
[ 230.077871] CPU: 48 UID: 0 PID: 1760 Comm: kworker/u583:7 Not tainted 7.0.0+ #3 PREEMPT(full)
[ 230.086677] Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.12 20251107
[ 230.093972] Workqueue: events_unbound bpf_map_free_deferred
[ 230.099675] Sched_ext: invariant_0.1.0_aarch64_unknown_linux_gnu_debug (disabling), task: runnable_at=-174ms
[ 230.116843] pc : 0xffff80009bc2c1f8
[ 230.120406] lr : dequeue_task_scx+0x270/0x2d0
[ 230.217749] Call trace:
[ 230.228515] 0xffff80009bc2c1f8 (P)
[ 230.232077] dequeue_task+0x84/0x188
[ 230.235728] sched_change_begin+0x1dc/0x250
[ 230.240000] __set_cpus_allowed_ptr_locked+0x17c/0x240
[ 230.245250] __set_cpus_allowed_ptr+0x74/0xf0
[ 230.249701] ___migrate_enable+0x4c/0xa0
[ 230.253707] bpf_map_free_deferred+0x1a4/0x1b0
[ 230.258246] process_one_work+0x184/0x540
[ 230.262342] worker_thread+0x19c/0x348
[ 230.266170] kthread+0x13c/0x150
[ 230.269465] ret_from_fork+0x10/0x20
[ 230.281393] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
[ 230.287621] ---[ end trace 0000000000000000 ]---
[ 231.160046] Kernel panic - not syncing: Oops - BRK: Fatal exception in interrupt
The root cause is that the JIT page backing ops->quiescent() is freed
before all callers of that function have stopped.
The expected ordering during teardown is:
bitmap_zero(sch->has_op) + synchronize_rcu()
-> guarantees no CPU will ever call sch->ops.* again
-> only THEN free the BPF struct_ops JIT page
bpf_scx_unreg() is supposed to enforce this order, but after
commit f4a6c506d118 ("sched_ext: Always bounce scx_disable() through
irq_work"), disable_work is no longer queued directly, so
kthread_flush_work() becomes a noop. Thus, the caller drops the
struct_ops map too early and it is poisoned with AARCH64_BREAK_FAULT
before disable_workfn ever executes.
The subsequent dequeue_task() then still sees SCX_HAS_OP(sch, quiescent)
as true and calls ops.quiescent(), which hits the poisoned page and
triggers the BRK panic.
Add a helper, scx_flush_disable_work(), so that future users that need
to flush disable_work can use it.
Also amend scx_root_enable_workfn() and scx_sub_enable_workfn(), which
have a similar pattern in their error paths.
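A sketch of what such a helper could look like, assuming the disable
path is bounced through an irq_work plus a kthread work as described
(member names are guesses):

  static void scx_flush_disable_work(struct scx_sched *sch)
  {
          /* make sure the irq_work bounce has queued disable_work ... */
          irq_work_sync(&sch->disable_irq_work);
          /* ... then actually wait for disable_workfn() to finish */
          kthread_flush_work(&sch->disable_work);
  }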
Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
local_dsq_post_enq() calls call_task_dequeue() with scx_root instead of
the scheduler instance actually managing the task. When
CONFIG_EXT_SUB_SCHED is enabled, tasks may be managed by a sub-scheduler
whose ops.dequeue() callback differs from root's. Using scx_root causes
the wrong scheduler's ops.dequeue() to be consulted: sub-sched tasks
dispatched to a local DSQ via scx_bpf_dsq_move_to_local() will have
SCX_TASK_IN_CUSTODY cleared but the sub-scheduler's ops.dequeue() is
never invoked, violating the custody exit semantics.
Fix by adding a 'struct scx_sched *sch' parameter to local_dsq_post_enq()
and move_local_task_to_local_dsq(), and propagating the correct scheduler
from their callers dispatch_enqueue(), move_task_between_dsqs(), and
consume_dispatch_q().
This is consistent with dispatch_enqueue()'s non-local path which already
passes 'sch' directly to call_task_dequeue() for global/bypass DSQs.
Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics")
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_fork() dispatches ops.init_task to exactly one scheduler - the one
owning the forking task's cgroup. A task forked inside a sub-scheduler's
cgroup is init'd into the sub only; the root scheduler has no task_ctx
entry for it. When that task later appears as @prev in the root's
qmap_dispatch() (or flows through core-sched comparison via task_qdist),
the bpf_task_storage_get() legitimately misses.
qmap treated those misses as fatal via scx_bpf_error("task_ctx lookup
failed") and aborted the scheduler as soon as the first cross-sched
task hit the root. Drop the error in the sites where the miss is
legitimate: lookup_task_ctx() (helper; callers already check for NULL),
qmap_dispatch()'s @prev branch (bookkeeping-only), task_qdist()
(returns 0 which makes the comparison a no-op), and qmap_select_cpu()
(returns prev_cpu as a no-op fallback instead of -ESRCH). The existing
scx_error was a paranoid guard from the pre-sub-sched world where every
task was owned by the one and only scheduler.
v2: qmap_select_cpu() returns prev_cpu on NULL instead of -ESRCH, so
the root scheduler doesn't error on cross-sched tasks that pass
through it (Andrea Righi).
Fixes: 4f8b122848db ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Inserts past 75% load call schedule_work(&ht->run_work) to kick an
async resize. If a caller holds a raw spinlock (e.g. an
insecure_elasticity user), schedule_work() under that lock records the
lock dependency chain caller_lock -> pool->lock -> pi_lock -> rq->__lock
A cycle forms if any of these locks is acquired in the reverse
direction elsewhere. sched_ext, the only current insecure_elasticity
user, hits this: it holds scx_sched_lock across rhashtable inserts of
sub-schedulers, while scx_bypass() takes rq->__lock -> scx_sched_lock.
Exercising the resize path produces:
Chain exists of:
&pool->lock --> &rq->__lock --> scx_sched_lock
Bounce the kick from the insert paths through irq_work so
schedule_work() runs from hard IRQ context with the caller's lock no
longer held. rht_deferred_worker()'s self-rearm on error stays on
schedule_work(&ht->run_work) - the worker runs in process context with
no caller lock held, and keeping the self-requeue on @run_work lets
cancel_work_sync() in rhashtable_free_and_destroy() drain it.
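Sketch of the bounce, with a hypothetical resize_irq_work member added
to struct rhashtable:

  static void rht_resize_irq_workfn(struct irq_work *work)
  {
          struct rhashtable *ht = container_of(work, struct rhashtable,
                                               resize_irq_work);

          /* hard IRQ context: the inserter's raw lock is no longer held */
          schedule_work(&ht->run_work);
  }

  /* insert paths: replace schedule_work(&ht->run_work) with */
  irq_work_queue(&ht->resize_irq_work);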
v3: Keep rht_deferred_worker()'s self-rearm on schedule_work(&run_work).
Routing it through irq_work in v2 broke cancel_work_sync()'s
self-requeue handling - an irq_work queued after irq_work_sync()
returned but while cancel_work_sync() was still waiting could fire
post-teardown.
v2: Bounce unconditionally instead of gating on insecure_elasticity,
as suggested by Herbert.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Verify that the BPF verifier rejects a non-SCX struct_ops program
(tcp_congestion_ops) that attempts to call an SCX kfunc (scx_bpf_kick_cpu).
The test expects the load to fail with -EACCES from scx_kfunc_context_filter.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_kfunc_context_filter() currently allows non-SCX struct_ops programs
(e.g. tcp_congestion_ops) to call SCX unlocked kfuncs. This is wrong
for two reasons:
- It is semantically incorrect: a TCP congestion control program has no
business calling SCX kfuncs such as scx_bpf_kick_cpu().
- With CONFIG_EXT_SUB_SCHED=y, kfuncs like scx_bpf_kick_cpu() call
scx_prog_sched(aux), which invokes bpf_prog_get_assoc_struct_ops(aux)
and casts the result to struct sched_ext_ops * before reading ops->priv.
For a non-SCX struct_ops program the returned pointer is the kdata of
that struct_ops type, which is far smaller than sched_ext_ops, making
the read an out-of-bounds access (confirmed with KASAN).
Extend the filter to cover scx_kfunc_set_any and scx_kfunc_set_idle as
well, and deny all SCX kfuncs for any struct_ops program that is not the
SCX struct_ops. This addresses both issues: the semantic contract is
enforced at the verifier level, and the runtime out-of-bounds access
becomes unreachable.
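The core of the extended check for struct_ops programs presumably
boils down to something like the following sketch (the st_ops
comparison is an assumption, not the exact patch):

  if (prog->type == BPF_PROG_TYPE_STRUCT_OPS &&
      prog->aux->st_ops != &bpf_sched_ext_ops)
          return -EACCES; /* non-SCX struct_ops: deny all SCX kfuncs */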
Fixes: d1d3c1c6ae36 ("sched_ext: Add verifier-time kfunc context filter")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Insertions into scx_sched_hash happen under scx_sched_lock (a
raw_spinlock_irq) in scx_link_sched(). rhashtable's sync grow path
calls get_random_u32() and does a GFP_ATOMIC allocation; both acquire
regular spinlocks, which is unsafe under a raw_spinlock_t. Set
insecure_elasticity to skip the sync grow.
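The flag lives in the table's rhashtable_params; a sketch with the
key/offset fields omitted:

  static const struct rhashtable_params scx_sched_hash_params = {
          /* key_len / key_offset / head_offset omitted for brevity */
          .insecure_elasticity    = true, /* no chain checks, no sync grow */
  };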
v2:
- Dropped dsq_hash changes. Insertion is not under raw_spin_lock.
- Switched from no_sync_grow flag to insecure_elasticity.
Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Signed-off-by: Tejun Heo <tj@kernel.org>
Some users of rhashtable cannot handle insertion failures, and are
happy to accept the consequences of a hash table having very long
chains.
Restore the insecure_elasticity toggle for these users. In
addition to disabling the chain length checks, this also removes
the emergency resize that would otherwise occur when the hash
table occupancy hits 100% (an async resize is still scheduled
at 75%).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull smbdirect updates from Steve French:
"Move smbdirect server and client code to common directory:
- temporary use of smbdirect_all_c_files.c to allow micro steps
- factor out common functions into a smbdirect.ko.
- convert cifs.ko to use smbdirect.ko
- convert ksmbd.ko to use smbdirect.ko
- let smbdirect.ko use global workqueues
- move ib_client logic from ksmbd.ko into smbdirect.ko
- remove smbdirect_all_c_files.c hack again
- some locking and teardown related fixes on top"
* tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd: (145 commits)
smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting
smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error
smb: smbdirect: fix copyright header of smbdirect.h
smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16
smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic
smb: server: no longer use smbdirect_socket_set_custom_workqueue()
smb: client: no longer use smbdirect_socket_set_custom_workqueue()
smb: smbdirect: introduce global workqueues
smb: smbdirect: prepare use of dedicated workqueues for different steps
smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work()
smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler()
smb: server: make use of smbdirect_netdev_rdma_capable_mode_type()
smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type()
smb: server: make use of smbdirect.ko
smb: server: remove unused ksmbd_transport_ops.prepare()
smb: server: make use of smbdirect_socket_{listen,accept}()
smb: server: only use public smbdirect functions
smb: server: make use of smbdirect_socket_create_accepting()/smbdirect_socket_release()
smb: server: make use of smbdirect_{socket_init_accepting,connection_wait_for_connected}()
smb: server: make use of smbdirect_connection_send_iter() and related functions
...
Pull livepatching updates from Petr Mladek:
- Add two new selftests
* tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: add test for module function patching
selftests: livepatch: test-ftrace: livepatch a traced function
We should not hold a mutex locked during wait_for_completion();
holding a reference is enough.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull m68k updates from Geert Uytterhoeven:
- Add support for QEMU virt-ctrl, and use it for system reset
and power off on the virt platform
- defconfig updates
- Miscellaneous fixes and improvements
* tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: virt: Switch to qemu-virt-ctrl driver
power: reset: Add QEMU virt-ctrl driver
m68k: defconfig: Update defconfigs for v7.0-rc1
m68k: emu: Replace unbounded sprintf() in nfhd_init_one()
m68k: uapi: Add ucontext.h
m68k: defconfig: hp300: Enable monochrome and 16-color linux logos
m68k: q40: Remove commented out code
If smbdirect_socket_destroy_sync() is called and sc->first_error was
not set, we should set -ESHUTDOWN; that is a better condition than
relying only implicitly on the
sc->status < SMBDIRECT_SOCKET_DISCONNECTING check.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull EFI updates from Ard Biesheuvel:
"Again not a busy cycle for EFI, just some minor tweaks and bug fixes:
- Enable boot graphics resource table (BGRT) on Xen/x86
- Correct a misguided assumption in the memory attributes table
sanity check
- Start tagging efi_mem_reserve()'d regions as MEMBLOCK_RSRV_KERN
- Some other minor fixes and cleanups"
* tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/capsule-loader: fix incorrect sizeof in phys array reallocation
efi: Tag memblock reservations of boot services regions as RSRV_KERN
memblock: Permit existing reserved regions to be marked RSRV_KERN
efi/memattr: Fix thinko in table size sanity check
efi: libstub: fix type of fdt 32 and 64bit variables
efi: Drop unused efi_range_is_wc() function
efi: Enable BGRT loading under Xen
efi: make efi_mem_type() and efi_mem_attributes() work on Xen PV
Register the "qemu-virt-ctrl" platform device during board
initialization to utilize the new generic power/reset driver.
Consequently, remove the legacy reset and power-off implementations
specific to the virt machine. The platform's mach_reset callback is
updated to call do_kernel_restart(), bridging the legacy m68k reboot
path to the generic kernel restart handler framework for this machine.
To prevent any regressions in reboot or power-off functionality when
the driver is not built-in, explicitly select POWER_RESET and
POWER_RESET_QEMU_VIRT_CTRL for the VIRT machine in Kconfig.machine.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/20260412211952.3564033-3-visitorckw@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
This is basically the inverse case of commit 474eecc882ae
("selftests: livepatch: test if ftrace can trace a livepatched function"),
ensuring instead that livepatching works on an already-traced function.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Link: https://patch.msgid.link/20260220-lp-test-trace-v1-1-4b6703cd01a6@suse.com
[pmladek: Fixed a typo found by Joe.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Add a target module and livepatch pair that verify module function
patching via a proc entry. Two test cases cover both the
klp_enable_patch path (target loaded before livepatch) and the
klp_module_coming path (livepatch loaded before target).
Signed-off-by: Pablo Alessandro Santos Hugen <phugen@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260320201135.1203992-1-phugen@redhat.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Everything in smbdirect.h was taken from my out-of-tree prototype.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull VFIO updates from Alex Williamson:
- Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay
Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu)
- Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro)
- Conversions to const struct class in support of class_create()
deprecation (Jori Koolstra)
- Improve selftest compiler compatibility by avoiding initializer on
variable-length array (Manish Honap)
- Define new uAPI for drivers supporting migration to advise user-
space of new initial data for reducing target startup latency.
Implemented for mlx5 vfio-pci variant driver (Yishai Hadas)
- Enable vfio selftests on aarch64, not just cross-compiles reporting
arm64 (Ted Logan)
- Update vfio selftest driver support to include additional DSA devices
(Yi Lai)
- Unconditionally include debugfs root pointer in vfio device struct,
avoiding a build failure seen in hisi_acc variant driver without
debugfs otherwise (Arnd Bergmann)
- Add support for the s390 ISM (Internal Shared Memory) device via a
new variant driver. The device is unique in the size of its BAR space
(256TiB) and lack of mmap support (Julian Ruess)
- Enforce that vfio-pci drivers implement a name in their ops structure
for use in sequestering SR-IOV VFs (Alex Williamson)
- Prune leftover group notifier code (Paolo Bonzini)
- Fix Xe vfio-pci variant driver to avoid migration support as a
dependency in the reset path and missing release call (Michał
Winiarski)
* tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits)
vfio/xe: Add a missing vfio_pci_core_release_dev()
vfio/xe: Reorganize the init to decouple migration from reset
vfio: remove dead notifier code
vfio/pci: Require vfio_device_ops.name
MAINTAINERS: add VFIO ISM PCI DRIVER section
vfio/ism: Implement vfio_pci driver for ISM devices
vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it
vfio: unhide vdev->debug_root
vfio/qat: add support for Intel QAT 420xx VFs
vfio: selftests: Support DMR and GNR-D DSA devices
vfio: selftests: Build tests on aarch64
vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
vfio/mlx5: consider inflight SAVE during PRE_COPY
net/mlx5: Add IFC bits for migration state
vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set()
vfio: uapi: fix comment typo
vfio: mdev: replace mtty_dev->vd_class with a const struct class
...
The krealloc() call for cap_info->phys in __efi_capsule_setup_info() uses
sizeof(phys_addr_t *) instead of sizeof(phys_addr_t), which might be
causing an undersized allocation.
The allocation is also inconsistent with the initial array allocation in
efi_capsule_open() that allocates one entry with sizeof(phys_addr_t),
and the efi_capsule_write() function that stores phys_addr_t values (not
pointers) via page_to_phys().
On 64-bit systems where sizeof(phys_addr_t) == sizeof(phys_addr_t *), this
goes unnoticed. On 32-bit systems with PAE where phys_addr_t is 64-bit but
pointers are 32-bit, this allocates half the required space, which might
lead to a heap buffer overflow when storing physical addresses.
This is similar to the bug fixed in commit fccfa646ef36 ("efi/capsule-loader:
fix incorrect allocation size") which fixed the same issue at the initial
allocation site.
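For reference, the corrected reallocation would use the element size,
along these lines (variable name illustrative):

  temp = krealloc(cap_info->phys,
                  pages_needed * sizeof(phys_addr_t), /* not sizeof(phys_addr_t *) */
                  GFP_KERNEL | __GFP_ZERO);
  if (!temp)
          return -ENOMEM;
  cap_info->phys = temp;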
Fixes: f24c4d478013 ("efi/capsule-loader: Reinstate virtual capsule mapping")
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Add a new driver for the 'virt-ctrl' device found on QEMU virt machines
(e.g. m68k). This device provides a simple interface for system reset
and power off [1].
This driver utilizes the modern system-off API to register callbacks
for both system restart and power off. It also registers a reboot
notifier to catch SYS_HALT events, ensuring that LINUX_REBOOT_CMD_HALT
is properly handled. It is designed to be generic and can be reused by
other architectures utilizing this QEMU device.
Link: https://gitlab.com/qemu-project/qemu/-/blob/v10.2.0/hw/misc/virt_ctrl.c [1]
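Registration via the system-off API looks roughly like this (callback
names are illustrative):

  ret = devm_register_sys_off_handler(dev, SYS_OFF_MODE_RESTART,
                                      SYS_OFF_PRIO_DEFAULT,
                                      qemu_virt_ctrl_restart, data);
  if (ret)
          return ret;

  return devm_register_sys_off_handler(dev, SYS_OFF_MODE_POWER_OFF,
                                       SYS_OFF_PRIO_DEFAULT,
                                       qemu_virt_ctrl_power_off, data);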
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20260412211952.3564033-2-visitorckw@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
We still limit this to U8_MAX as the RDMA API only uses __u8 and
that's also the limit for InfiniBand and RoCE*, while iWARP would be
able to support larger values at the protocol level.
As struct smbdirect_socket_parameters will be part
of the uapi for IPPROTO_SMBDIRECT in future, change it
now even if userspace sockets won't be supported yet.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull jfs updates from Dave Kleikamp:
"More robust data integrity checking and some fixes"
* tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy:
jfs: avoid -Wtautological-constant-out-of-range-compare warning again
JFS: always load filesystem UUID during mount
jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
jfs: Set the lbmDone flag at the end of lbmIODone
jfs: fix corrupted list in dbUpdatePMap
jfs: add dmapctl integrity check to prevent invalid operations
jfs: add dtpage integrity check to prevent index/pointer overflows
jfs: add dtroot integrity check to prevent index out-of-bounds
The driver is implementing its own .release(), which means that it needs
to call vfio_pci_core_release_dev().
Add the missing call.
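I.e. roughly (the function name is illustrative):

  static void xe_vfio_pci_release_dev(struct vfio_device *core_vdev)
  {
          /* driver-specific teardown, if any, goes first */
          vfio_pci_core_release_dev(core_vdev);
  }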
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Closes: https://lore.kernel.org/kvm/408e262c507e8fd628a71e39904fedd99fa0ee8e.camel@linux.ibm.com/
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-2-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
By definition, EFI memory regions of type boot services code or data
have no special significance to the firmware at runtime, only to the OS.
In some cases, the firmware will allocate tables and other assets that
are passed in memory in regions of this type, and leave it up to the OS
to decide whether or not to treat the allocation as special, or simply
consume the contents at boot and recycle the RAM for ordinary use. The
reason for this approach is that it avoids needless memory reservations
for assets that the OS knows nothing about, and therefore doesn't know
how to free either.
This means that any memblock reservations covering such regions can be
marked as MEMBLOCK_RSRV_KERN - this is a better match semantically, and
is useful on x86 to distinguish true reservations from temporary
reservations that are only needed to work around firmware bugs.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
- Restore the state of the standard black-and-white and 16-color Linux
logos (no longer auto-enabled since commit 994fcd4b107d747b
("video/logo: don't select LOGO_LINUX_MONO and LOGO_LINUX_VGA16 by
default")).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/67dd5553e9289757c274d79ce1b13fca33fde25d.1772446429.git.geert@linux-m68k.org
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
Since v4.1 kernel, a new interface for ftrace called "tracefs" was
introduced, which is usually mounted in /sys/kernel/tracing. Therefore,
tracing files can now be accessed via either the legacy path
/sys/kernel/debug/tracing or the newer path /sys/kernel/tracing.
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
We always build as a standalone module (or as part of the core kernel).
This also removes unused elements from struct smbdirect_socket
and unused exports.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ext2, udf, quota updates from Jan Kara:
- A fix for a race in quota code that can expose ocfs2 to
use-after-free issues
- UDF fix to avoid memory corruption in face of corrupted format
- Couple of ext2 fixes for better handling of fs corruption
- Some more various code cleanups in UDF & ext2
* tag 'fs_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
ext2: use get_random_u32() where appropriate
quota: Fix race of dquot_scan_active() with quota deactivation
udf: fix partition descriptor append bookkeeping
ext2: avoid drop_nlink() during unlink of zero-nlink inode in ext2_unlink()
ext2: guard reservation window dump with EXT2FS_DEBUG
ext2: replace BUG_ON with WARN_ON_ONCE in ext2_get_blocks
ext2: remove stale TODO about kmap
fs: udf: avoid assignment in condition when selecting allocation goal
The comparison of an __s8 value against DTPAGEMAXSLOT is still known
at compile time (always false), causing a harmless (default disabled)
warning with clang:
fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
4419 | p->header.freelist >= DTPAGEMAXSLOT)) {
| ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
I previously worked around two of these in commit 7833570dae83 ("jfs: avoid
-Wtautological-constant-out-of-range-compare warning"), but now a new one has
come up, so address the same way by dropping the redundant range check.
Fixes: 119e448bb50a ("jfs: add dtpage integrity check to prevent index/pointer overflows")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Attempting to issue reset on VF devices that don't support migration
leads to the following:
BUG: unable to handle page fault for address: 00000000000011f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 7443 Comm: xe_sriov_flr Tainted: G S U 7.0.0-rc1-lgci-xe-xe-4588-cec43d5c2696af219-nodebug+ #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
RIP: 0010:xe_sriov_vfio_wait_flr_done+0xc/0x80 [xe]
Code: ff c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 53 <83> bf f8 11 00 00 02 75 61 41 89 f4 85 f6 74 52 48 8b 47 08 48 89
RSP: 0018:ffffc9000f7c39b8 EFLAGS: 00010202
RAX: ffffffffa04d8660 RBX: ffff88813e3e4000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000f7c39c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888101a48800
R13: ffff88813e3e4150 R14: ffff888130d0d008 R15: ffff88813e3e40d0
FS: 00007877d3d0d940(0000) GS:ffff88890b6d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000011f8 CR3: 000000015a762000 CR4: 0000000000f52ef0
PKRU: 55555554
Call Trace:
<TASK>
xe_vfio_pci_reset_done+0x49/0x120 [xe_vfio_pci]
pci_dev_restore+0x3b/0x80
pci_reset_function+0x109/0x140
reset_store+0x5c/0xb0
dev_attr_store+0x17/0x40
sysfs_kf_write+0x72/0x90
kernfs_fop_write_iter+0x161/0x1f0
vfs_write+0x261/0x440
ksys_write+0x69/0xf0
__x64_sys_write+0x19/0x30
x64_sys_call+0x259/0x26e0
do_syscall_64+0xcb/0x1500
? __fput+0x1a2/0x2d0
? fput_close_sync+0x3d/0xa0
? __x64_sys_close+0x3e/0x90
? x64_sys_call+0x1b7c/0x26e0
? do_syscall_64+0x109/0x1500
? __task_pid_nr_ns+0x68/0x100
? __do_sys_getpid+0x1d/0x30
? x64_sys_call+0x10b5/0x26e0
? do_syscall_64+0x109/0x1500
? putname+0x41/0x90
? do_faccessat+0x1e8/0x300
? __x64_sys_access+0x1c/0x30
? x64_sys_call+0x1822/0x26e0
? do_syscall_64+0x109/0x1500
? tick_program_event+0x43/0xa0
? hrtimer_interrupt+0x126/0x260
? irqentry_exit+0xb2/0x710
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7877d5f1c5a4
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fff48e5f908 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007877d5f1c5a4
RDX: 0000000000000001 RSI: 00007877d621b0c9 RDI: 0000000000000009
RBP: 0000000000000001 R08: 00005fb49113b010 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000202 R12: 00007877d621b0c9
R13: 0000000000000009 R14: 00007fff48e5fac0 R15: 00007fff48e5fac0
</TASK>
This is caused by the fact that some of the xe_vfio_pci_core_device
members needed for handling reset are only initialized as part of
migration init.
Fix the problem by reorganizing the code to decouple VF init from
migration init.
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7352
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-1-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Permit existing memblock reservations to be marked as RSRV_KERN. This
will be used by the EFI code on x86 to distinguish between reservations
of boot services data regions that have actual significance to the
kernel and regions that are reserved temporarily to work around buggy
firmware.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Replace unbounded sprintf() with the safer snprintf().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/20260318001632.2974-3-thorsten.blum@linux.dev
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
When loading the eBPF scheduler, the tasks in the scx_tasks list are
traversed and __setscheduler_class() is invoked to get the new
sched_class. However, this also incorrectly sets the per-cpu migration
tasks' ->sched_class to rt_sched_class, and even after unload the
per-cpu migration tasks' ->sched_class remains rt_sched_class.
The log for this issue is as follows:
./scx_rustland --stats 1
[ 199.245639][ T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[ 199.269213][ T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
This commit therefore introduces a new scx_setscheduler_class() that
adds a check for stop_sched_class, replacing the direct
__setscheduler_class() calls.
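A sketch of such a wrapper, assuming __setscheduler_class()'s
(policy, prio) signature:

  static const struct sched_class *
  scx_setscheduler_class(struct task_struct *p, int prio)
  {
          /* per-cpu migration tasks must stay on stop_sched_class */
          if (p->sched_class == &stop_sched_class)
                  return &stop_sched_class;

          return __setscheduler_class(p->policy, prio);
  }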
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When there is only one function of the same name, old_sympos of 0 and 1
are logically identical. Match them in klp_find_func().
This is to avoid a corner case with different toolchain behavior.
In this specific issue, two versions of kpatch-build were used to
build livepatch for the same kernel. One assigns old_sympos == 0 for
unique local functions, the other assigns old_sympos == 1 for unique
local functions. Both versions work fine by themselves. (PS: This
behavior change was introduced in a downstream version of kpatch-build.
This change does not exist in upstream kpatch-build.)
However, during livepatch upgrade (with the replace flag set) from a
patch built with one version of kpatch-build to the same fix built with
the other version of kpatch-build, livepatching fails with errors like:
[ 14.218706] sysfs: cannot create duplicate filename 'xxx/somefunc,1'
...
[ 14.219466] Call Trace:
[ 14.219468] <TASK>
[ 14.219469] dump_stack_lvl+0x47/0x60
[ 14.219474] sysfs_warn_dup.cold+0x17/0x27
[ 14.219476] sysfs_create_dir_ns+0x95/0xb0
[ 14.219479] kobject_add_internal+0x9e/0x260
[ 14.219483] kobject_add+0x68/0x80
[ 14.219485] ? kstrdup+0x3c/0xa0
[ 14.219486] klp_enable_patch+0x320/0x830
[ 14.219488] patch_init+0x443/0x1000 [ccc_0_6]
[ 14.219491] ? 0xffffffffa05eb000
[ 14.219492] do_one_initcall+0x2e/0x190
[ 14.219494] do_init_module+0x67/0x270
[ 14.219496] init_module_from_file+0x75/0xa0
[ 14.219499] idempotent_init_module+0x15a/0x240
[ 14.219501] __x64_sys_finit_module+0x61/0xc0
[ 14.219503] do_syscall_64+0x5b/0x160
[ 14.219505] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 14.219507] RIP: 0033:0x7f545a4bd96d
...
[ 14.219516] kobject: kobject_add_internal failed for somefunc,1 with
-EEXIST, don't try to register things with the same name ...
This happens because klp_find_func() thinks somefunc with old_sympos==0
is not the same as somefunc with old_sympos==1, and klp_add_object_nops
adds another xxx/func,1 to the list of functions to patch.
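The match in klp_find_func() can treat the two values as equivalent
with a helper along these lines (a sketch, not the exact patch):

  /* old_sympos 0 and 1 both mean "the only symbol with this name" */
  static bool klp_sympos_match(unsigned long a, unsigned long b)
  {
          return a == b || (a <= 1 && b <= 1);
  }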
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
[pmladek@suse.com: Fixed some typos.]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
smbdirect.ko has global workqueues now, so we should use these
default ones.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull fsnotify updates from Jan Kara:
"A couple of small fsnotify fixes and cleanups"
* tag 'fsnotify_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: replace deprecated strcpy in fanotify_info_copy_{name,name2}
fsnotify: inotify: pass mark connector to fsnotify_recalc_mask()
fanotify: call fanotify_events_supported() before path_permission() and security_path_notify()
fanotify: avoid/silence premature LSM capability checks
inotify: fix watch count leak when fsnotify_add_inode_mark_locked() fails
ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
zero or i_dtime is set, treating them as deleted. However, the case of
i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
ext2 has no orphan list, such a combination can only result from
filesystem corruption - a legitimate inode deletion always sets either
i_dtime or clears i_mode before freeing the inode.
A crafted image can exploit this gap to present such an inode to the
VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
ext2_unlink(), ext2_rename() and ext2_rmdir():
WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
do_unlinkat+0x53e/0x730 fs/namei.c:4541
__x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
vfs_rename+0xf2f/0x2060 fs/namei.c:5021
do_renameat2+0xbe2/0xd50 fs/namei.c:5178
__x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
vfs_rmdir+0x204/0x690 fs/namei.c:4348
do_rmdir+0x372/0x3e0 fs/namei.c:4407
__x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Extend the existing i_nlink == 0 check to also catch this case,
reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
This rejects the inode at load time and prevents it from reaching any
of the namei.c paths.
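A minimal sketch of the extended check in ext2_iget(); the exact message,
variable names and placement are assumptions, not the applied diff:
  if (inode->i_nlink == 0) {
          if (inode->i_mode == 0 || ei->i_dtime) {
                  /* this inode is deleted */
                  ret = -ESTALE;
          } else {
                  /* live mode, zero dtime and zero nlink: corruption */
                  ext2_error(sb, "ext2_iget",
                             "inode #%lu has zero nlink but valid mode", ino);
                  ret = -EFSCORRUPTED;
          }
          goto bad_inode;
  }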
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://patch.msgid.link/20260404152011.2590197-1-kovalev@altlinux.org
Signed-off-by: Jan Kara <jack@suse.cz>
The filesystem UUID was only being loaded into the super_block when an
external journal device was in use. When mounting without an external
journal, the UUID remained unset, which prevented the computation of a
filesystem ID (fsid), as can be confirmed via `stat -f -c "%i"`, and thus
user space could not use fanotify correctly.
A missing filesystem ID causes fanotify to return ENODEV when marking
the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO,
and FAN_MOVED_FROM. As a result, applications relying on fanotify
could not monitor these events on JFS filesystems without an external
journal.
Moved the UUID initialization so it is always performed during mount,
ensuring the superblock UUID is consistently available.
Signed-off-by: João Paredes <joaommp@yahoo.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
group->notifier is dead code. VFIO initializes it and checks it for
emptiness on teardown, but nobody ever registers on it or triggers it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260407175934.1602711-1-pbonzini@redhat.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
scx_bypass_lb_{donee,resched}_cpumask were file-scope statics shared by all
scheduler instances. With CONFIG_EXT_SUB_SCHED, multiple sched instances
each arm their own bypass_lb_timer; concurrent bypass_lb_node() calls RMW
the global cpumasks with no lock, corrupting donee/resched decisions.
Move the cpumasks into struct scx_sched, allocate them alongside the timer
in scx_alloc_and_add_sched(), free them in scx_sched_free_rcu_work().
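Roughly, with the cpumask field names assumed rather than taken from the
patch:
  /* scx_alloc_and_add_sched(), next to the bypass_lb_timer setup */
  if (!zalloc_cpumask_var(&sch->bypass_lb_donee_cpumask, GFP_KERNEL) ||
      !zalloc_cpumask_var(&sch->bypass_lb_resched_cpumask, GFP_KERNEL))
          goto err_free;

  /* scx_sched_free_rcu_work() */
  free_cpumask_var(sch->bypass_lb_donee_cpumask);
  free_cpumask_var(sch->bypass_lb_resched_cpumask);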
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_prio_less() runs from core-sched's pick_next_task() path with rq
locked but invokes ops.core_sched_before() with NULL locked_rq, leaving
scx_locked_rq_state NULL. If the BPF callback calls a kfunc that
re-acquires rq based on scx_locked_rq() - e.g. scx_bpf_cpuperf_set(cpu)
- it re-acquires the already-held rq.
Pass task_rq(a).
Fixes: 7b0888b7cc19 ("sched_ext: Implement core-sched support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_dump_state() walks CPUs with rq_lock_irqsave() held and invokes
ops.dump_cpu / ops.dump_task with NULL locked_rq, leaving
scx_locked_rq_state NULL. If the BPF callback calls a kfunc that
re-acquires rq based on scx_locked_rq() - e.g. scx_bpf_cpuperf_set(cpu)
- it re-acquires the already-held rq.
Pass the held rq to SCX_CALL_OP(). Thread it into scx_dump_task() too.
The pre-loop ops.dump call runs before rq_lock_irqsave() so keeps
rq=NULL.
Fixes: 07814a9439a3 ("sched_ext: Print debug dump after an error exit")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
SCX_CALL_OP{,_RET}() unconditionally clears scx_locked_rq_state to NULL on
exit. Correct at the top level, but ops can recurse via
scx_bpf_sub_dispatch(): a parent's ops.dispatch calls the helper, which
invokes the child's ops.dispatch under another SCX_CALL_OP. When the inner
call returns, the NULL clobbers the outer's state. The parent's BPF then
calls kfuncs like scx_bpf_cpuperf_set() which read scx_locked_rq()==NULL and
re-acquire the already-held rq.
Snapshot scx_locked_rq_state on entry and restore on exit. Rename the rq
parameter to locked_rq across all SCX_CALL_OP* macros so the snapshot local
can be typed as 'struct rq *' without colliding with the parameter token in
the expansion. SCX_CALL_OP_TASK{,_RET}() and SCX_CALL_OP_2TASKS_RET() funnel
through the two base macros and inherit the fix.
Fixes: 4f8b122848db ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
dispatch_enqueue()'s FIFO-tail path used list_empty(&dsq->list) to decide
whether to set dsq->first_task on enqueue. dsq->list can contain parked BPF
iterator cursors (SCX_DSQ_LNODE_ITER_CURSOR), so list_empty() is not a
reliable "no real task" check. If the last real task is unlinked while a
cursor is parked, first_task becomes NULL; the next FIFO-tail enqueue then
sees list_empty() == false and skips the first_task update, leaving
scx_bpf_dsq_peek() returning NULL for a non-empty DSQ.
Test dsq->first_task directly, which already tracks only real tasks and is
maintained under dsq->lock.
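A sketch of the FIFO-tail path after the change; whether the store pairs
with READ_ONCE/WRITE_ONCE on the peek side is an assumption:
  list_add_tail(&p->scx.dsq_list.node, &dsq->list);
  if (!dsq->first_task)                   /* cursors in dsq->list don't count */
          WRITE_ONCE(dsq->first_task, p);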
Fixes: 44f5c8ec5b9a ("sched_ext: Add lockless peek operation for DSQs")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: Ryan Newton <newton@meta.com>
scx_bpf_create_dsq() resolves the calling scheduler via scx_prog_sched(aux)
and inserts the new DSQ into that scheduler's dsq_hash. Its inverse
scx_bpf_destroy_dsq() and the query helper scx_bpf_dsq_nr_queued() were
hard-coded to rcu_dereference(scx_root), so a sub-scheduler could only
destroy or query DSQs in the root scheduler's hash - never its own. If the
root had a DSQ with the same id, the sub-sched silently destroyed it and the
root aborted on the next dispatch ("invalid DSQ ID 0x0..").
Take a const struct bpf_prog_aux *aux via KF_IMPLICIT_ARGS and resolve the
scheduler with scx_prog_sched(aux), matching scx_bpf_create_dsq().
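A sketch of the destroy side; the implicit-arg parameter name mirrors the
create side and is an assumption:
  __bpf_kfunc void scx_bpf_destroy_dsq(u64 dsq_id, const struct bpf_prog_aux *aux__prog)
  {
          struct scx_sched *sch;

          guard(rcu)();
          sch = scx_prog_sched(aux__prog);        /* was rcu_dereference(scx_root) */
          if (sch)
                  destroy_dsq(sch, dsq_id);
  }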
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_group_set_{weight,idle,bandwidth}() cache scx_root before acquiring
scx_cgroup_ops_rwsem, so the pointer can be stale by the time the op runs.
If the loaded scheduler is disabled and freed (via RCU work) and another is
enabled between the naked load and the rwsem acquire, the reader sees
scx_cgroup_enabled=true (the new scheduler's) but dereferences the freed one
- UAF on SCX_HAS_OP(sch, ...) / SCX_CALL_OP(sch, ...).
scx_cgroup_enabled is toggled only under scx_cgroup_ops_rwsem write
(scx_cgroup_{init,exit}), so reading scx_root inside the rwsem read section
correlates @sch with the enabled snapshot.
Fixes: a5bd6ba30b33 ("sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations")
Cc: stable@vger.kernel.org # v6.18+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_sub_enable_workfn()'s prep loop calls __scx_init_task(sch, p, false)
without transitioning task state, then sets SCX_TASK_SUB_INIT. If prep fails
partway, the abort path runs __scx_disable_and_exit_task(sch, p) on the
marked tasks. Task state is still the parent's ENABLED, so that dispatches
to the SCX_TASK_ENABLED arm and calls scx_disable_task(sch, p) - i.e.
child->ops.disable() - for tasks on which child->ops.enable() never ran. A
BPF sub-scheduler allocating per-task state in enable/freeing in disable
would operate on uninitialized state.
The dying-task branch in scx_disable_and_exit_task() has the same problem,
and scx_enabling_sub_sched was cleared before the abort cleanup loop - a
task exiting during cleanup tripped the WARN and skipped both ops.exit_task
and the SCX_TASK_SUB_INIT clear, leaking per-task resources and leaving the
task stuck.
Introduce scx_sub_init_cancel_task() that calls ops.exit_task with
cancelled=true - matching what the top-level init path does when init_task
itself returns -errno. Use it in the abort loop and in the dying-task
branch. scx_enabling_sub_sched now stays set until the abort loop finishes
clearing SUB_INIT, so concurrent exits hitting the dying-task branch can
still find @sch. That branch also clears SCX_TASK_SUB_INIT unconditionally
when seen, leaving the task unmarked even if the WARN fires.
Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
Skip such tasks.
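Roughly, in the donor-side scan (variable names assumed):
  /* @p was donated by an earlier pass but hasn't been consumed yet; it is
   * still on another CPU's rq and can't be moved without that rq's lock. */
  if (task_rq(p) != donor_rq)
          continue;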
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
bpf_iter_scx_dsq_new() clears kit->dsq on failure and
bpf_iter_scx_dsq_{next,destroy}() guard against that. scx_dsq_move() doesn't -
it dereferences kit->dsq immediately, so a BPF program that calls
scx_bpf_dsq_move[_vtime]() after a failed iter_new oopses the kernel.
Return false if kit->dsq is NULL.
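The guard is a one-liner at the top of scx_dsq_move(), roughly:
  if (!kit->dsq)          /* bpf_iter_scx_dsq_new() failed and cleared the cursor */
          return false;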
Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
When ops.sub_attach is set, scx_alloc_and_add_sched() creates sub_kset as a
child of &sch->kobj, which pins the parent with its own reference. The
disable paths never call kset_unregister(), so the final kobject_put() in
bpf_scx_unreg() leaves a stale reference and scx_kobj_release() never runs,
leaking the whole struct scx_sched on every load/unload cycle.
Unregister sub_kset in scx_root_disable() and scx_sub_disable() before
kobject_del(&sch->kobj).
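A sketch of the teardown ordering, assuming the sub_kset member name used
on the enable side:
  if (sch->sub_kset) {
          kset_unregister(sch->sub_kset);         /* drops its ref on &sch->kobj */
          sch->sub_kset = NULL;
  }
  kobject_del(&sch->kobj);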
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
scx_hardlockup() runs from NMI and eventually calls scx_claim_exit(),
which takes scx_sched_lock. scx_sched_lock isn't NMI-safe and grabbing
it from NMI context can lead to deadlocks.
The hardlockup handler is best-effort recovery and the disable path it
triggers runs off of irq_work anyway. Move the handle_lockup() call into
an irq_work so it runs in IRQ context.
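The shape of the deferral, as a sketch; handle_lockup()'s arguments, the
irq_work naming and the return value semantics are assumptions:
  static void scx_lockup_irq_workfn(struct irq_work *work)
  {
          /* hard IRQ context: taking scx_sched_lock is safe here */
          handle_lockup("hard lockup");   /* argument list is an assumption */
  }

  static struct irq_work scx_lockup_irq_work =
          IRQ_WORK_INIT_HARD(scx_lockup_irq_workfn);

  bool scx_hardlockup(int cpu)
  {
          /* NMI context: only queue the irq_work, defer the real handling */
          irq_work_queue(&scx_lockup_irq_work);
          return true;
  }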
Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
When unregistering my self-written scx scheduler, the following panic
occurs.
[ 229.923133] Kernel text patching generated an invalid instruction at 0xffff80009bc2c1f8!
[ 229.923146] Internal error: Oops - BRK: 00000000f2000100 [#1] SMP
[ 230.077871] CPU: 48 UID: 0 PID: 1760 Comm: kworker/u583:7 Not tainted 7.0.0+ #3 PREEMPT(full)
[ 230.086677] Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.12 20251107
[ 230.093972] Workqueue: events_unbound bpf_map_free_deferred
[ 230.099675] Sched_ext: invariant_0.1.0_aarch64_unknown_linux_gnu_debug (disabling), task: runnable_at=-174ms
[ 230.116843] pc : 0xffff80009bc2c1f8
[ 230.120406] lr : dequeue_task_scx+0x270/0x2d0
[ 230.217749] Call trace:
[ 230.228515] 0xffff80009bc2c1f8 (P)
[ 230.232077] dequeue_task+0x84/0x188
[ 230.235728] sched_change_begin+0x1dc/0x250
[ 230.240000] __set_cpus_allowed_ptr_locked+0x17c/0x240
[ 230.245250] __set_cpus_allowed_ptr+0x74/0xf0
[ 230.249701] ___migrate_enable+0x4c/0xa0
[ 230.253707] bpf_map_free_deferred+0x1a4/0x1b0
[ 230.258246] process_one_work+0x184/0x540
[ 230.262342] worker_thread+0x19c/0x348
[ 230.266170] kthread+0x13c/0x150
[ 230.269465] ret_from_fork+0x10/0x20
[ 230.281393] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
[ 230.287621] ---[ end trace 0000000000000000 ]---
[ 231.160046] Kernel panic - not syncing: Oops - BRK: Fatal exception in interrupt
The root cause is that the JIT page backing ops->quiescent() is freed
before all callers of that function have stopped.
The expected ordering during teardown is:
bitmap_zero(sch->has_op) + synchronize_rcu()
-> guarantees no CPU will ever call sch->ops.* again
-> only THEN free the BPF struct_ops JIT page
bpf_scx_unreg() is supposed to enforce the order, but after
commit f4a6c506d118 ("sched_ext: Always bounce scx_disable() through
irq_work"), disable_work is no longer queued directly, causing
kthread_flush_work() to be a no-op. Thus the caller drops the struct_ops
map too early and the JIT page is poisoned with AARCH64_BREAK_FAULT before
disable_workfn() ever executes.
The subsequent dequeue_task() still sees SCX_HAS_OP(sch, quiescent) as
true and calls ops.quiescent(), which hits the poisoned page and triggers
the BRK panic.
Add a helper scx_flush_disable_work() so that future users that need to
flush disable_work can use it.
Also amend the calls in scx_root_enable_workfn() and
scx_sub_enable_workfn(), which have a similar pattern in their error paths.
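A sketch of the helper, assuming the irq_work member name; the point is
the two-stage flush:
  static void scx_flush_disable_work(struct scx_sched *sch)
  {
          /* ensure a pending bounce has actually queued disable_work ... */
          irq_work_sync(&sch->disable_irq_work);  /* member name is an assumption */
          /* ... then wait for scx_disable_workfn() itself to finish */
          kthread_flush_work(&sch->disable_work);
  }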
Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
local_dsq_post_enq() calls call_task_dequeue() with scx_root instead of
the scheduler instance actually managing the task. When
CONFIG_EXT_SUB_SCHED is enabled, tasks may be managed by a sub-scheduler
whose ops.dequeue() callback differs from root's. Using scx_root causes
the wrong scheduler's ops.dequeue() to be consulted: sub-sched tasks
dispatched to a local DSQ via scx_bpf_dsq_move_to_local() will have
SCX_TASK_IN_CUSTODY cleared but the sub-scheduler's ops.dequeue() is
never invoked, violating the custody exit semantics.
Fix by adding a 'struct scx_sched *sch' parameter to local_dsq_post_enq()
and move_local_task_to_local_dsq(), and propagating the correct scheduler
from their callers dispatch_enqueue(), move_task_between_dsqs(), and
consume_dispatch_q().
This is consistent with dispatch_enqueue()'s non-local path which already
passes 'sch' directly to call_task_dequeue() for global/bypass DSQs.
Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics")
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_fork() dispatches ops.init_task to exactly one scheduler - the one
owning the forking task's cgroup. A task forked inside a sub-scheduler's
cgroup is init'd into the sub only; the root scheduler has no task_ctx
entry for it. When that task later appears as @prev in the root's
qmap_dispatch() (or flows through core-sched comparison via task_qdist),
the bpf_task_storage_get() legitimately misses.
qmap treated those misses as fatal via scx_bpf_error("task_ctx lookup
failed") and aborted the scheduler as soon as the first cross-sched
task hit the root. Drop the error in the sites where the miss is
legitimate: lookup_task_ctx() (helper; callers already check for NULL),
qmap_dispatch()'s @prev branch (bookkeeping-only), task_qdist()
(returns 0 which makes the comparison a no-op), and qmap_select_cpu()
(returns prev_cpu as a no-op fallback instead of -ESRCH). The existing
scx_error was a paranoid guard from the pre-sub-sched world where every
task was owned by the one and only scheduler.
v2: qmap_select_cpu() returns prev_cpu on NULL instead of -ESRCH, so
the root scheduler doesn't error on cross-sched tasks that pass
through it (Andrea Righi).
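On the BPF side this mostly turns a hard error into a fallback; a sketch
of the qmap_select_cpu() case, assuming the task_ctx_stor storage map name:
  tctx = bpf_task_storage_get(&task_ctx_stor, p, 0, 0);
  if (!tctx)
          return prev_cpu;        /* cross-sched task: fall back, don't scx_bpf_error() */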
Fixes: 4f8b122848db ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Inserts past 75% load call schedule_work(&ht->run_work) to kick an
async resize. If a caller holds a raw spinlock (e.g. an
insecure_elasticity user), schedule_work() under that lock records
caller_lock -> pool->lock -> pi_lock -> rq->__lock
A cycle forms if any of these locks is acquired in the reverse
direction elsewhere. sched_ext, the only current insecure_elasticity
user, hits this: it holds scx_sched_lock across rhashtable inserts of
sub-schedulers, while scx_bypass() takes rq->__lock -> scx_sched_lock.
Exercising the resize path produces:
Chain exists of:
&pool->lock --> &rq->__lock --> scx_sched_lock
Bounce the kick from the insert paths through irq_work so
schedule_work() runs from hard IRQ context with the caller's lock no
longer held. rht_deferred_worker()'s self-rearm on error stays on
schedule_work(&ht->run_work) - the worker runs in process context with
no caller lock held, and keeping the self-requeue on @run_work lets
cancel_work_sync() in rhashtable_free_and_destroy() drain it.
v3: Keep rht_deferred_worker()'s self-rearm on schedule_work(&run_work).
Routing it through irq_work in v2 broke cancel_work_sync()'s
self-requeue handling - an irq_work queued after irq_work_sync()
returned but while cancel_work_sync() was still waiting could fire
post-teardown.
v2: Bounce unconditionally instead of gating on insecure_elasticity,
as suggested by Herbert.
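The bounce is roughly as follows; the irq_work member name is an
assumption:
  static void rht_resize_irq_workfn(struct irq_work *work)
  {
          struct rhashtable *ht = container_of(work, struct rhashtable,
                                               resize_irq_work);

          /* hard IRQ: the insert caller's lock is no longer held */
          schedule_work(&ht->run_work);
  }

  /* insert paths, replacing the direct schedule_work(&ht->run_work): */
  if (rht_grow_above_75(ht, tbl))
          irq_work_queue(&ht->resize_irq_work);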
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Verify that the BPF verifier rejects a non-SCX struct_ops program
(tcp_congestion_ops) that attempts to call an SCX kfunc (scx_bpf_kick_cpu).
The test expects the load to fail with -EACCES from scx_kfunc_context_filter.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_kfunc_context_filter() currently allows non-SCX struct_ops programs
(e.g. tcp_congestion_ops) to call SCX unlocked kfuncs. This is wrong
for two reasons:
- It is semantically incorrect: a TCP congestion control program has no
business calling SCX kfuncs such as scx_bpf_kick_cpu().
- With CONFIG_EXT_SUB_SCHED=y, kfuncs like scx_bpf_kick_cpu() call
scx_prog_sched(aux), which invokes bpf_prog_get_assoc_struct_ops(aux)
and casts the result to struct sched_ext_ops * before reading ops->priv.
For a non-SCX struct_ops program the returned pointer is the kdata of
that struct_ops type, which is far smaller than sched_ext_ops, making
the read an out-of-bounds access (confirmed with KASAN).
Extend the filter to cover scx_kfunc_set_any and scx_kfunc_set_idle as
well, and deny all SCX kfuncs for any struct_ops program that is not the
SCX struct_ops. This addresses both issues: the semantic contract is
enforced at the verifier level, and the runtime out-of-bounds access
becomes unreachable.
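The added check amounts to something like the following sketch; the
filter's surrounding structure and field access are assumptions:
  /* deny every SCX kfunc to struct_ops programs of any other type */
  if (prog->type == BPF_PROG_TYPE_STRUCT_OPS &&
      prog->aux->st_ops != &bpf_sched_ext_ops)
          return -EACCES;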
Fixes: d1d3c1c6ae36 ("sched_ext: Add verifier-time kfunc context filter")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Insertions into scx_sched_hash happen under scx_sched_lock (raw_spinlock_irq)
in scx_link_sched(). rhashtable's sync grow path calls get_random_u32()
and does a GFP_ATOMIC allocation; both acquire regular spinlocks, which
is unsafe under raw_spinlock_t. Set insecure_elasticity to skip the
sync grow.
v2:
- Dropped dsq_hash changes. Insertion is not under raw_spin_lock.
- Switched from no_sync_grow flag to insecure_elasticity.
Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers")
Signed-off-by: Tejun Heo <tj@kernel.org>
Some users of rhashtable cannot handle insertion failures, and
are happy to accept the consequences of a hash table having
very long chains.
Restore the insecure_elasticity toggle for these users. In
addition to disabling the chain length checks, this also removes
the emergency resize that would otherwise occur when the hash
table occupancy hits 100% (an async resize is still scheduled
at 75%).
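Usage is just a flag in the table parameters; a generic, hypothetical
example:
  struct my_obj {
          u64                     key;
          struct rhash_head       node;
  };

  static const struct rhashtable_params my_params = {
          .key_len                = sizeof(u64),
          .key_offset             = offsetof(struct my_obj, key),
          .head_offset            = offsetof(struct my_obj, node),
          .insecure_elasticity    = true, /* no chain limit, no forced sync rehash */
  };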
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull smbdirect updates from Steve French:
"Move smbdirect server and client code to common directory:
- temporary use of smbdirect_all_c_files.c to allow micro steps
- factor out common functions into a smbdirect.ko.
- convert cifs.ko to use smbdirect.ko
- convert ksmbd.ko to use smbdirect.ko
- let smbdirect.ko use global workqueues
- move ib_client logic from ksmbd.ko into smbdirect.ko
- remove smbdirect_all_c_files.c hack again
- some locking and teardown related fixes on top"
* tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd: (145 commits)
smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting
smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error
smb: smbdirect: fix copyright header of smbdirect.h
smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16
smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic
smb: server: no longer use smbdirect_socket_set_custom_workqueue()
smb: client: no longer use smbdirect_socket_set_custom_workqueue()
smb: smbdirect: introduce global workqueues
smb: smbdirect: prepare use of dedicated workqueues for different steps
smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work()
smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler()
smb: server: make use of smbdirect_netdev_rdma_capable_mode_type()
smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type()
smb: server: make use of smbdirect.ko
smb: server: remove unused ksmbd_transport_ops.prepare()
smb: server: make use of smbdirect_socket_{listen,accept}()
smb: server: only use public smbdirect functions
smb: server: make use of smbdirect_socket_create_accepting()/smbdirect_socket_release()
smb: server: make use of smbdirect_{socket_init_accepting,connection_wait_for_connected}()
smb: server: make use of smbdirect_connection_send_iter() and related functions
...
We should not hold a mutex locked during wait_for_completion();
holding a reference is enough.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull m68k updates from Geert Uytterhoeven:
- Add support for QEMU virt-ctrl, and use it for system reset
and power off on the virt platform
- defconfig updates
- Miscellaneous fixes and improvements
* tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: virt: Switch to qemu-virt-ctrl driver
power: reset: Add QEMU virt-ctrl driver
m68k: defconfig: Update defconfigs for v7.0-rc1
m68k: emu: Replace unbounded sprintf() in nfhd_init_one()
m68k: uapi: Add ucontext.h
m68k: defconfig: hp300: Enable monochrome and 16-color linux logos
m68k: q40: Remove commented out code
If smbdirect_socket_destroy_sync() is called and sc->first_error was not
set, we should set it to -ESHUTDOWN explicitly; that is a better condition
than relying on it only implicitly via the
sc->status < SMBDIRECT_SOCKET_DISCONNECTING check.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull EFI updates from Ard Biesheuvel:
"Again not a busy cycle for EFI, just some minor tweaks and bug fixes:
- Enable boot graphics resource table (BGRT) on Xen/x86
- Correct a misguided assumption in the memory attributes table
sanity check
- Start tagging efi_mem_reserve()'d regions as MEMBLOCK_RSRV_KERN
- Some other minor fixes and cleanups"
* tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/capsule-loader: fix incorrect sizeof in phys array reallocation
efi: Tag memblock reservations of boot services regions as RSRV_KERN
memblock: Permit existing reserved regions to be marked RSRV_KERN
efi/memattr: Fix thinko in table size sanity check
efi: libstub: fix type of fdt 32 and 64bit variables
efi: Drop unused efi_range_is_wc() function
efi: Enable BGRT loading under Xen
efi: make efi_mem_type() and efi_mem_attributes() work on Xen PV
Register the "qemu-virt-ctrl" platform device during board
initialization to utilize the new generic power/reset driver.
Consequently, remove the legacy reset and power-off implementations
specific to the virt machine. The platform's mach_reset callback is
updated to call do_kernel_restart(), bridging the legacy m68k reboot
path to the generic kernel restart handler framework for this machine.
To prevent any regressions in reboot or power-off functionality when
the driver is not built-in, explicitly select POWER_RESET and
POWER_RESET_QEMU_VIRT_CTRL for the VIRT machine in Kconfig.machine.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/20260412211952.3564033-3-visitorckw@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
This is basically the inverse case of commit 474eecc882ae
("selftests: livepatch: test if ftrace can trace a livepatched function"),
ensuring instead that a livepatch works on an already-traced function.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Link: https://patch.msgid.link/20260220-lp-test-trace-v1-1-4b6703cd01a6@suse.com
[pmladek: Fixed a typo found by Joe.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Add a target module and livepatch pair that verify module function
patching via a proc entry. Two test cases cover both the
klp_enable_patch path (target loaded before livepatch) and the
klp_module_coming path (livepatch loaded before target).
Signed-off-by: Pablo Alessandro Santos Hugen <phugen@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260320201135.1203992-1-phugen@redhat.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Everything in smbdirect.h was taken from my out of
tree prototype.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Henrique Carvalho <henrique.carvalho@suse.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull VFIO updates from Alex Williamson:
- Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay
Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu)
- Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro)
- Conversions to const struct class in support of class_create()
deprecation (Jori Koolstra)
- Improve selftest compiler compatibility by avoiding initializer on
variable-length array (Manish Honap)
- Define new uAPI for drivers supporting migration to advise user-
space of new initial data for reducing target startup latency.
Implemented for mlx5 vfio-pci variant driver (Yishai Hadas)
- Enable vfio selftests on aarch64, not just cross-compiles reporting
arm64 (Ted Logan)
- Update vfio selftest driver support to include additional DSA devices
(Yi Lai)
- Unconditionally include debugfs root pointer in vfio device struct,
avoiding a build failure seen in hisi_acc variant driver without
debugfs otherwise (Arnd Bergmann)
- Add support for the s390 ISM (Internal Shared Memory) device via a
new variant driver. The device is unique in the size of its BAR space
(256TiB) and lack of mmap support (Julian Ruess)
- Enforce that vfio-pci drivers implement a name in their ops structure
for use in sequestering SR-IOV VFs (Alex Williamson)
- Prune leftover group notifier code (Paolo Bonzini)
- Fix Xe vfio-pci variant driver to avoid migration support as a
dependency in the reset path and missing release call (Michał
Winiarski)
* tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits)
vfio/xe: Add a missing vfio_pci_core_release_dev()
vfio/xe: Reorganize the init to decouple migration from reset
vfio: remove dead notifier code
vfio/pci: Require vfio_device_ops.name
MAINTAINERS: add VFIO ISM PCI DRIVER section
vfio/ism: Implement vfio_pci driver for ISM devices
vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it
vfio: unhide vdev->debug_root
vfio/qat: add support for Intel QAT 420xx VFs
vfio: selftests: Support DMR and GNR-D DSA devices
vfio: selftests: Build tests on aarch64
vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
vfio/mlx5: consider inflight SAVE during PRE_COPY
net/mlx5: Add IFC bits for migration state
vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set()
vfio: uapi: fix comment typo
vfio: mdev: replace mtty_dev->vd_class with a const struct class
...
The krealloc() call for cap_info->phys in __efi_capsule_setup_info() uses
sizeof(phys_addr_t *) instead of sizeof(phys_addr_t), which might be
causing an undersized allocation.
The allocation is also inconsistent with the initial array allocation in
efi_capsule_open() that allocates one entry with sizeof(phys_addr_t),
and the efi_capsule_write() function that stores phys_addr_t values (not
pointers) via page_to_phys().
On 64-bit systems where sizeof(phys_addr_t) == sizeof(phys_addr_t *), this
goes unnoticed. On 32-bit systems with PAE where phys_addr_t is 64-bit but
pointers are 32-bit, this allocates half the required space, which might
lead to a heap buffer overflow when storing physical addresses.
This is similar to the bug fixed in commit fccfa646ef36 ("efi/capsule-loader:
fix incorrect allocation size") which fixed the same issue at the initial
allocation site.
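The fix is the element size passed to krealloc(); roughly, with local
variable names assumed:
  temp_page = krealloc(cap_info->phys,
                       pages_needed * sizeof(phys_addr_t),  /* was sizeof(phys_addr_t *) */
                       GFP_KERNEL | __GFP_ZERO);
  if (!temp_page)
          return -ENOMEM;
  cap_info->phys = temp_page;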
Fixes: f24c4d478013 ("efi/capsule-loader: Reinstate virtual capsule mapping")
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Add a new driver for the 'virt-ctrl' device found on QEMU virt machines
(e.g. m68k). This device provides a simple interface for system reset
and power off [1].
This driver utilizes the modern system-off API to register callbacks
for both system restart and power off. It also registers a reboot
notifier to catch SYS_HALT events, ensuring that LINUX_REBOOT_CMD_HALT
is properly handled. It is designed to be generic and can be reused by
other architectures utilizing this QEMU device.
Link: https://gitlab.com/qemu-project/qemu/-/blob/v10.2.0/hw/misc/virt_ctrl.c [1]
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20260412211952.3564033-2-visitorckw@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
We still limit this to U8_MAX as the RDMA API only uses __u8
and that's also the limit for InfiniBand and RoCE*,
while iWARP would be able to support larger values at
the protocol level.
As struct smbdirect_socket_parameters will be part
of the uapi for IPPROTO_SMBDIRECT in future, change it
now even if userspace sockets won't be supported yet.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull jfs updates from Dave Kleikamp:
"More robust data integrity checking and some fixes"
* tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy:
jfs: avoid -Wtautological-constant-out-of-range-compare warning again
JFS: always load filesystem UUID during mount
jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
jfs: Set the lbmDone flag at the end of lbmIODone
jfs: fix corrupted list in dbUpdatePMap
jfs: add dmapctl integrity check to prevent invalid operations
jfs: add dtpage integrity check to prevent index/pointer overflows
jfs: add dtroot integrity check to prevent index out-of-bounds
The driver is implementing its own .release(), which means that it needs
to call vfio_pci_core_release_dev().
Add the missing call.
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Closes: https://lore.kernel.org/kvm/408e262c507e8fd628a71e39904fedd99fa0ee8e.camel@linux.ibm.com/
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-2-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
By definition, EFI memory regions of type boot services code or data
have no special significance to the firmware at runtime, only to the OS.
In some cases, the firmware will allocate tables and other assets that
are passed in memory in regions of this type, and leave it up to the OS
to decide whether or not to treat the allocation as special, or simply
consume the contents at boot and recycle the RAM for ordinary use. The
reason for this approach is that it avoids needless memory reservations
for assets that the OS knows nothing about, and therefore doesn't know
how to free either.
This means that any memblock reservations covering such regions can be
marked as MEMBLOCK_RSRV_KERN - this is a better match semantically, and
is useful on x86 to distinguish true reservations from temporary
reservations that are only needed to work around firmware bugs.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
- Restore the state of the standard black-and-white and 16-color Linux
logos (no longer auto-enabled since commit 994fcd4b107d747b
("video/logo: don't select LOGO_LINUX_MONO and LOGO_LINUX_VGA16 by
default")).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/67dd5553e9289757c274d79ce1b13fca33fde25d.1772446429.git.geert@linux-m68k.org
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
Since kernel v4.1, ftrace has a new interface called "tracefs", which is
usually mounted at /sys/kernel/tracing. Therefore, tracing files can now
be accessed via either the legacy path /sys/kernel/debug/tracing or the
newer path /sys/kernel/tracing.
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
We always build as a standalone module (or as part of the core kernel).
This also removes unused elements from struct smbdirect_socket
and unused exports.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ext2, udf, quota updates from Jan Kara:
- A fix for a race in quota code that can expose ocfs2 to
use-after-free issues
- UDF fix to avoid memory corruption in face of corrupted format
- Couple of ext2 fixes for better handling of fs corruption
- Some more various code cleanups in UDF & ext2
* tag 'fs_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: reject inodes with zero i_nlink and valid mode in ext2_iget()
ext2: use get_random_u32() where appropriate
quota: Fix race of dquot_scan_active() with quota deactivation
udf: fix partition descriptor append bookkeeping
ext2: avoid drop_nlink() during unlink of zero-nlink inode in ext2_unlink()
ext2: guard reservation window dump with EXT2FS_DEBUG
ext2: replace BUG_ON with WARN_ON_ONCE in ext2_get_blocks
ext2: remove stale TODO about kmap
fs: udf: avoid assignment in condition when selecting allocation goal
The comparison of an __s8 value against DTPAGEMAXSLOT is still trivially
false, causing a harmless (default disabled) warning with clang:
fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
4419 | p->header.freelist >= DTPAGEMAXSLOT)) {
| ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
I previously worked around two of these in commit 7833570dae83 ("jfs: avoid
-Wtautological-constant-out-of-range-compare warning"), but now a new one has
come up, so address the same way by dropping the redundant range check.
Fixes: 119e448bb50a ("jfs: add dtpage integrity check to prevent index/pointer overflows")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Attempting to issue reset on VF devices that don't support migration
leads to the following:
BUG: unable to handle page fault for address: 00000000000011f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 7443 Comm: xe_sriov_flr Tainted: G S U 7.0.0-rc1-lgci-xe-xe-4588-cec43d5c2696af219-nodebug+ #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
RIP: 0010:xe_sriov_vfio_wait_flr_done+0xc/0x80 [xe]
Code: ff c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 53 <83> bf f8 11 00 00 02 75 61 41 89 f4 85 f6 74 52 48 8b 47 08 48 89
RSP: 0018:ffffc9000f7c39b8 EFLAGS: 00010202
RAX: ffffffffa04d8660 RBX: ffff88813e3e4000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000f7c39c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888101a48800
R13: ffff88813e3e4150 R14: ffff888130d0d008 R15: ffff88813e3e40d0
FS: 00007877d3d0d940(0000) GS:ffff88890b6d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000011f8 CR3: 000000015a762000 CR4: 0000000000f52ef0
PKRU: 55555554
Call Trace:
<TASK>
xe_vfio_pci_reset_done+0x49/0x120 [xe_vfio_pci]
pci_dev_restore+0x3b/0x80
pci_reset_function+0x109/0x140
reset_store+0x5c/0xb0
dev_attr_store+0x17/0x40
sysfs_kf_write+0x72/0x90
kernfs_fop_write_iter+0x161/0x1f0
vfs_write+0x261/0x440
ksys_write+0x69/0xf0
__x64_sys_write+0x19/0x30
x64_sys_call+0x259/0x26e0
do_syscall_64+0xcb/0x1500
? __fput+0x1a2/0x2d0
? fput_close_sync+0x3d/0xa0
? __x64_sys_close+0x3e/0x90
? x64_sys_call+0x1b7c/0x26e0
? do_syscall_64+0x109/0x1500
? __task_pid_nr_ns+0x68/0x100
? __do_sys_getpid+0x1d/0x30
? x64_sys_call+0x10b5/0x26e0
? do_syscall_64+0x109/0x1500
? putname+0x41/0x90
? do_faccessat+0x1e8/0x300
? __x64_sys_access+0x1c/0x30
? x64_sys_call+0x1822/0x26e0
? do_syscall_64+0x109/0x1500
? tick_program_event+0x43/0xa0
? hrtimer_interrupt+0x126/0x260
? irqentry_exit+0xb2/0x710
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7877d5f1c5a4
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fff48e5f908 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007877d5f1c5a4
RDX: 0000000000000001 RSI: 00007877d621b0c9 RDI: 0000000000000009
RBP: 0000000000000001 R08: 00005fb49113b010 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000202 R12: 00007877d621b0c9
R13: 0000000000000009 R14: 00007fff48e5fac0 R15: 00007fff48e5fac0
</TASK>
This is caused by the fact that some of the xe_vfio_pci_core_device
members needed for handling reset are only initialized as part of
migration init.
Fix the problem by reorganizing the code to decouple VF init from
migration init.
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7352
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-1-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Permit existing memblock reservations to be marked as RSRV_KERN. This
will be used by the EFI code on x86 to distinguish between reservations
of boot services data regions that have actual significance to the
kernel and regions that are reserved temporarily to work around buggy
firmware.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Replace unbounded sprintf() with the safer snprintf().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/20260318001632.2974-3-thorsten.blum@linux.dev
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
When loading the eBPF scheduler, the tasks in the scx_tasks list are
traversed and __setscheduler_class() is invoked to get the new
sched_class. However, this also incorrectly sets the per-CPU migration
tasks' sched_class to rt_sched_class, and even after unload the per-CPU
migration tasks' sched_class remains rt_sched_class.
The log for this issue is as follows:
./scx_rustland --stats 1
[ 199.245639][ T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[ 199.269213][ T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler
bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0
This commit therefore introduces a new scx_setscheduler_class() helper
with a check for stop_sched_class, replacing __setscheduler_class().
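A sketch of the wrapper; the __setscheduler_class() signature shown here
is an assumption:
  static const struct sched_class *
  scx_setscheduler_class(struct task_struct *p)
  {
          /* per-CPU migration/stop threads must never be demoted to RT */
          if (p->sched_class == &stop_sched_class)
                  return &stop_sched_class;

          return __setscheduler_class(p->policy, p->prio);
  }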
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When there is only one function of the same name, old_sympos of 0 and 1
are logically identical. Match them in klp_find_func().
This is to avoid a corner case with different toolchain behavior.
In this specific issue, two versions of kpatch-build were used to
build livepatch for the same kernel. One assigns old_sympos == 0 for
unique local functions, the other assigns old_sympos == 1 for unique
local functions. Both versions work fine by themselves. (PS: This
behavior change was introduced in a downstream version of kpatch-build.
This change does not exist in upstream kpatch-build.)
However, during livepatch upgrade (with the replace flag set) from a
patch built with one version of kpatch-build to the same fix built with
the other version of kpatch-build, livepatching fails with errors like:
[ 14.218706] sysfs: cannot create duplicate filename 'xxx/somefunc,1'
...
[ 14.219466] Call Trace:
[ 14.219468] <TASK>
[ 14.219469] dump_stack_lvl+0x47/0x60
[ 14.219474] sysfs_warn_dup.cold+0x17/0x27
[ 14.219476] sysfs_create_dir_ns+0x95/0xb0
[ 14.219479] kobject_add_internal+0x9e/0x260
[ 14.219483] kobject_add+0x68/0x80
[ 14.219485] ? kstrdup+0x3c/0xa0
[ 14.219486] klp_enable_patch+0x320/0x830
[ 14.219488] patch_init+0x443/0x1000 [ccc_0_6]
[ 14.219491] ? 0xffffffffa05eb000
[ 14.219492] do_one_initcall+0x2e/0x190
[ 14.219494] do_init_module+0x67/0x270
[ 14.219496] init_module_from_file+0x75/0xa0
[ 14.219499] idempotent_init_module+0x15a/0x240
[ 14.219501] __x64_sys_finit_module+0x61/0xc0
[ 14.219503] do_syscall_64+0x5b/0x160
[ 14.219505] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 14.219507] RIP: 0033:0x7f545a4bd96d
...
[ 14.219516] kobject: kobject_add_internal failed for somefunc,1 with
-EEXIST, don't try to register things with the same name ...
This happens because klp_find_func() thinks somefunc with old_sympos==0
is not the same as somefunc with old_sympos==1, and klp_add_object_nops
adds another xxx/func,1 to the list of functions to patch.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
[pmladek@suse.com: Fixed some typos.]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
smbdirect.ko has global workqueues now, so we should use these
default once.
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull fsnotify updates from Jan Kara:
"A couple of small fsnotify fixes and cleanups"
* tag 'fsnotify_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: replace deprecated strcpy in fanotify_info_copy_{name,name2}
fsnotify: inotify: pass mark connector to fsnotify_recalc_mask()
fanotify: call fanotify_events_supported() before path_permission() and security_path_notify()
fanotify: avoid/silence premature LSM capability checks
inotify: fix watch count leak when fsnotify_add_inode_mark_locked() fails
ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is
zero or i_dtime is set, treating them as deleted. However, the case of
i_nlink == 0 with a non-zero mode and zero dtime slips through. Since
ext2 has no orphan list, such a combination can only result from
filesystem corruption - a legitimate inode deletion always either sets
i_dtime or clears i_mode before freeing the inode.
A crafted image can exploit this gap to present such an inode to the
VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via
ext2_unlink(), ext2_rename() and ext2_rmdir():
WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295
vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477
do_unlinkat+0x53e/0x730 fs/namei.c:4541
__x64_sys_unlink+0xc6/0x110 fs/namei.c:4587
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rename+0x35e/0x850 fs/ext2/namei.c:374
vfs_rename+0xf2f/0x2060 fs/namei.c:5021
do_renameat2+0xbe2/0xd50 fs/namei.c:5178
__x64_sys_rename+0x7e/0xa0 fs/namei.c:5223
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336
CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1
Call Trace:
<TASK>
inode_dec_link_count include/linux/fs.h:2518 [inline]
ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311
vfs_rmdir+0x204/0x690 fs/namei.c:4348
do_rmdir+0x372/0x3e0 fs/namei.c:4407
__x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577
do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Extend the existing i_nlink == 0 check to also catch this case,
reporting the corruption via ext2_error() and returning -EFSCORRUPTED.
This rejects the inode at load time and prevents it from reaching any
of the namei.c paths.
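A minimal sketch of the extended check in ext2_iget() (message text and
exact placement in fs/ext2/inode.c are illustrative):

    if (inode->i_nlink == 0) {
            if (inode->i_mode == 0 || le32_to_cpu(raw_inode->i_dtime)) {
                    /* this inode is deleted, as before */
                    ret = -ESTALE;
                    goto bad_inode;
            }
            /*
             * nlink == 0 with a live mode and no dtime cannot result
             * from a normal unlink: report corruption instead of
             * letting drop_nlink() warn later.
             */
            ext2_error(inode->i_sb, "ext2_iget",
                       "zero-link inode %lu is not marked deleted",
                       (unsigned long) ino);
            ret = -EFSCORRUPTED;
            goto bad_inode;
    }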
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://patch.msgid.link/20260404152011.2590197-1-kovalev@altlinux.org
Signed-off-by: Jan Kara <jack@suse.cz>
The filesystem UUID was only being loaded into the super_block when an
external journal device was in use. When mounting without an external
journal, the UUID remained unset, which prevented the computation of a
filesystem ID (fsid), as can be confirmed via `stat -f -c "%i"`, and so
user space could not use fanotify correctly.
A missing filesystem ID causes fanotify to return ENODEV when marking
the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO,
and FAN_MOVED_FROM. As a result, applications relying on fanotify
could not monitor these events on JFS filesystems without an external
journal.
Move the UUID initialization so it is always performed during mount,
ensuring the superblock UUID is consistently available.
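A minimal sketch of the idea, assuming JFS keeps the on-disk UUID in
sbi->uuid and uses the generic super_set_uuid() helper (exact placement in
the mount path may differ):

    /*
     * Illustrative only: publish the UUID to the VFS on every mount,
     * not just when an external journal is attached, so fanotify can
     * derive an fsid from it.
     */
    super_set_uuid(sb, sbi->uuid.b, sizeof(sbi->uuid));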
Signed-off-by: João Paredes <joaommp@yahoo.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
group->notifier is dead code. VFIO initializes it and checks it for
emptiness on teardown, but nobody ever registers on it or triggers it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260407175934.1602711-1-pbonzini@redhat.com
Signed-off-by: Alex Williamson <alex@shazbot.org>