Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
The comparison of an __s8 value against DTPAGEMAXSLOT is still trivially
true, causing a harmless (default disabled) warning with clang:
fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
4419 | p->header.freelist >= DTPAGEMAXSLOT)) {
| ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
I previously worked around two of these in commit 7833570dae83 ("jfs: avoid
-Wtautological-constant-out-of-range-compare warning"), but now a new one has
come up, so address the same way by dropping the redundant range check.
Fixes: 119e448bb50a ("jfs: add dtpage integrity check to prevent index/pointer overflows")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
The filesystem UUID was only being loaded into super_block sb when an
external journal device was in use. When mounting without an external
journal, the UUID remained unset, which prevented the computation of
a filesystem ID (fsid), which could be confirmed via `stat -f -c "%i"`
and thus user space could not use fanotify correctly.
A missing filesystem ID causes fanotify to return ENODEV when marking
the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO,
and FAN_MOVED_FROM. As a result, applications relying on fanotify
could not monitor these events on JFS filesystems without an external
journal.
Moved the UUID initialization so it is always performed during mount,
ensuring the superblock UUID is consistently available.
Signed-off-by: João Paredes <joaommp@yahoo.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
write_special_inodes() function iterate through the log->sb_list and
access the sbi fields, which can be set to NULL concurrently by umount.
Fix concurrency issue by holding LOG_LOCK and checking for NULL.
Reported-by: syzbot+e14b1036481911ae4d77@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e14b1036481911ae4d77
Signed-off-by: Helen Koike <koike@igalia.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
In lbmRead(), the I/O event waited for by wait_event() finishes before
it goes to sleep, and the lbmIODone() prematurely sets the flag to
lbmDONE, thus ending the wait. This causes wait_event() to return before
lbmREAD is cleared (because lbmDONE was set first), the premature return
of wait_event() leads to the release of lbuf before lbmIODone() returns,
thus triggering the use-after-free vulnerability reported in [1].
Moving the operation of setting the lbmDONE flag to after clearing lbmREAD
in lbmIODone() avoids the use-after-free vulnerability reported in [1].
[1]
BUG: KASAN: slab-use-after-free in rt_spin_lock+0x88/0x3e0 kernel/locking/spinlock_rt.c:56
Call Trace:
blk_update_request+0x57e/0xe60 block/blk-mq.c:1007
blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1169
blk_complete_reqs block/blk-mq.c:1244 [inline]
blk_done_softirq+0x10a/0x160 block/blk-mq.c:1249
Allocated by task 6101:
lbmLogInit fs/jfs/jfs_logmgr.c:1821 [inline]
lmLogInit+0x3d0/0x19e0 fs/jfs/jfs_logmgr.c:1269
open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline]
lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069
jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257
jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532
Freed by task 6101:
kfree+0x1bd/0x900 mm/slub.c:6876
lbmLogShutdown fs/jfs/jfs_logmgr.c:1864 [inline]
lmLogInit+0x1137/0x19e0 fs/jfs/jfs_logmgr.c:1415
open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline]
lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069
jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257
jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532
Reported-by: syzbot+1d38eedcb25a3b5686a7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1d38eedcb25a3b5686a7
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
This patch resolves the "list_add corruption. next is NULL" Oops
reported by syzkaller in dbUpdatePMap(). The root cause is uninitialized
synclist nodes in struct metapage and struct TxBlock, plus improper list
node removal using list_del() (which leaves nodes in an invalid state).
This fixes the following Oops reported by syzkaller.
list_add corruption. next is NULL.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:28!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 122 Comm: jfsCommit Not tainted syzkaller #0
PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 10/02/2025
RIP: 0010:__list_add_valid_or_report+0xc3/0x130 lib/list_debug.c:27
Code: 4c 89 f2 48 89 d9 e8 0c 88 a4 fc 90 0f 0b 48 c7 c7 20 de 3d 8b e8
fd 87 a4 fc 90 0f 0b 48 c7 c7 c0 de 3d 8b e8 ee 87 a4 fc 90 <0f> 0b 48
89 df e8 13 c3 7d fd 42 80 7c 2d 00 00 74 08 4c 89 e7 e8
RSP: 0018:ffffc9000395fa20 EFLAGS: 00010246
RAX: 0000000000000022 RBX: 0000000000000000 RCX: 270c5dfadb559700
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000000f0000 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: fffff5200072bee9 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000004 R15: 1ffff92000632266
FS: 0000000000000000(0000) GS:ffff888126ef9000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056341fdb86c0 CR3: 0000000040a18000 CR4: 00000000003526f0
Call Trace:
<TASK>
__list_add_valid include/linux/list.h:96 [inline]
__list_add include/linux/list.h:158 [inline]
list_add include/linux/list.h:177 [inline]
dbUpdatePMap+0x7e4/0xeb0 fs/jfs/jfs_dmap.c:577
txAllocPMap+0x57d/0x6b0 fs/jfs/jfs_txnmgr.c:2426
txUpdateMap+0x81e/0x9c0 fs/jfs/jfs_txnmgr.c:2364
txLazyCommit fs/jfs/jfs_txnmgr.c:2665 [inline]
jfs_lazycommit+0x3f1/0xa10 fs/jfs/jfs_txnmgr.c:2734
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Reported-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4d0a0feb49c5138cac46
Tested-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Add check_dmapctl() to validate dmapctl structure integrity, focusing on
preventing invalid operations caused by on-disk corruption.
Key checks:
- nleafs bounded by [0, LPERCTL] (maximum leaf nodes per dmapctl).
- l2nleafs bounded by [0, L2LPERCTL] and consistent with nleafs
(nleafs must be 2^l2nleafs).
- leafidx must be exactly CTLLEAFIND (expected leaf index position).
- height bounded by [0, L2LPERCTL >> 1] (valid tree height range).
- budmin validity: NOFREE only if nleafs=0; otherwise >= BUDMIN.
- Leaf nodes fit within stree array (leafidx + nleafs <= CTLTREESIZE).
- Leaf node values are either non-negative or NOFREE.
Invoked in dbAllocAG(), dbFindCtl(), dbAdjCtl() and dbExtendFS() when
accessing dmapctl pages, catching corruption early before dmap operations
trigger invalid memory access or logic errors.
This fixes the following UBSAN warning.
[58245.668090][T14017] ------------[ cut here ]------------
[58245.668103][T14017] UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2641:11
[58245.668119][T14017] shift exponent 110 is too large for 32-bit type 'int'
[58245.668137][T14017] CPU: 0 UID: 0 PID: 14017 Comm: 4c1966e88c28fa9 Tainted: G E 6.18.0-rc4-00253-g21ce5d4ba045-dirty #124 PREEMPT_{RT,(full)}
[58245.668174][T14017] Tainted: [E]=UNSIGNED_MODULE
[58245.668176][T14017] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[58245.668184][T14017] Call Trace:
[58245.668200][T14017] <TASK>
[58245.668208][T14017] dump_stack_lvl+0x189/0x250
[58245.668288][T14017] ? __pfx_dump_stack_lvl+0x10/0x10
[58245.668301][T14017] ? __pfx__printk+0x10/0x10
[58245.668315][T14017] ? lock_metapage+0x303/0x400 [jfs]
[58245.668406][T14017] ubsan_epilogue+0xa/0x40
[58245.668422][T14017] __ubsan_handle_shift_out_of_bounds+0x386/0x410
[58245.668462][T14017] dbSplit+0x1f8/0x200 [jfs]
[58245.668543][T14017] dbAdjCtl+0x34c/0xa20 [jfs]
[58245.668628][T14017] dbAllocNear+0x2ee/0x3d0 [jfs]
[58245.668710][T14017] dbAlloc+0x933/0xba0 [jfs]
[58245.668797][T14017] ea_write+0x374/0xdd0 [jfs]
[58245.668888][T14017] ? __pfx_ea_write+0x10/0x10 [jfs]
[58245.668966][T14017] ? __jfs_setxattr+0x76e/0x1120 [jfs]
[58245.669046][T14017] __jfs_setxattr+0xa01/0x1120 [jfs]
[58245.669135][T14017] ? __pfx___jfs_setxattr+0x10/0x10 [jfs]
[58245.669216][T14017] ? mutex_lock_nested+0x154/0x1d0
[58245.669252][T14017] ? __jfs_xattr_set+0xb9/0x170 [jfs]
[58245.669333][T14017] __jfs_xattr_set+0xda/0x170 [jfs]
[58245.669430][T14017] ? __pfx___jfs_xattr_set+0x10/0x10 [jfs]
[58245.669509][T14017] ? xattr_full_name+0x6f/0x90
[58245.669546][T14017] ? jfs_xattr_set+0x33/0x60 [jfs]
[58245.669636][T14017] ? __pfx_jfs_xattr_set+0x10/0x10 [jfs]
[58245.669726][T14017] __vfs_setxattr+0x43c/0x480
[58245.669743][T14017] __vfs_setxattr_noperm+0x12d/0x660
[58245.669756][T14017] vfs_setxattr+0x16b/0x2f0
[58245.669768][T14017] ? __pfx_vfs_setxattr+0x10/0x10
[58245.669782][T14017] filename_setxattr+0x274/0x600
[58245.669795][T14017] ? __pfx_filename_setxattr+0x10/0x10
[58245.669806][T14017] ? getname_flags+0x1e5/0x540
[58245.669829][T14017] path_setxattrat+0x364/0x3a0
[58245.669840][T14017] ? __pfx_path_setxattrat+0x10/0x10
[58245.669859][T14017] ? __se_sys_chdir+0x1b9/0x280
[58245.669876][T14017] __x64_sys_lsetxattr+0xbf/0xe0
[58245.669888][T14017] do_syscall_64+0xfa/0xfa0
[58245.669901][T14017] ? lockdep_hardirqs_on+0x9c/0x150
[58245.669913][T14017] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[58245.669927][T14017] ? exc_page_fault+0xab/0x100
[58245.669937][T14017] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-by: syzbot+4c1966e88c28fa96e053@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4c1966e88c28fa96e053
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Add check_dtpage() to validate dtpage_t integrity, focusing on
preventing index/pointer overflows from on-disk corruption.
Key checks:
- maxslot must be exactly DTPAGEMAXSLOT (128) as defined for dtpage
slot array.
- freecnt bounded by [0, DTPAGEMAXSLOT-1] (slot[0] reserved for header).
- freelist validity: -1 when freecnt=0; 1~DTPAGEMAXSLOT-1 when non-zero,
with linked list checks (no duplicates, proper termination via next=-1).
- stblindex bounds: must be within range that avoids overlapping with
stbl itself (stblindex < DTPAGEMAXSLOT - stblsize).
- nextindex bounded by stbl size (stblsize << L2DTSLOTSIZE). stbl entries
validity: within 1~DTPAGEMAXSLOT-1, no duplicates(excluding invalid
entries marked as -1).
Invoked when loading dtpage (in BT_GETPAGE macro context) to catch
corruption early before directory operations trigger out-of-bounds access.
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Add check_dtroot() to validate dtroot_t integrity, focusing on preventing
index/pointer overflows from on-disk corruption.
Key checks:
- freecnt bounded by [0, DTROOTMAXSLOT-1] (slot[0] reserved for header).
- freelist validity: -1 when freecnt=0; 1~DTROOTMAXSLOT-1 when non-zero,
with linked list checks (no duplicates, proper termination via next=-1).
- stbl bounds: nextindex within stbl array size; entries within 0~8, no
duplicates (excluding idx=0).
Invoked in copy_from_dinode() when loading directory inodes, catching
corruption early before directory operations trigger out-of-bounds access.
This fixes the following UBSAN warning.
[ 101.832754][ T5960] ------------[ cut here ]------------
[ 101.832762][ T5960] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3713:8
[ 101.832792][ T5960] index -1 is out of range for type 'struct dtslot[128]'
[ 101.832807][ T5960] CPU: 2 UID: 0 PID: 5960 Comm: 5f7f0caf9979e9d Tainted: G E 6.18.0-rc4-00250-g2603eb907f03 #119 PREEMPT_{RT,(full
[ 101.832817][ T5960] Tainted: [E]=UNSIGNED_MODULE
[ 101.832819][ T5960] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 101.832823][ T5960] Call Trace:
[ 101.832833][ T5960] <TASK>
[ 101.832838][ T5960] dump_stack_lvl+0x189/0x250
[ 101.832909][ T5960] ? __pfx_dump_stack_lvl+0x10/0x10
[ 101.832925][ T5960] ? __pfx__printk+0x10/0x10
[ 101.832934][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0
[ 101.832959][ T5960] ubsan_epilogue+0xa/0x40
[ 101.832966][ T5960] __ubsan_handle_out_of_bounds+0xe9/0xf0
[ 101.833007][ T5960] dtInsertEntry+0x936/0x1430 [jfs]
[ 101.833094][ T5960] dtSplitPage+0x2c8b/0x3ed0 [jfs]
[ 101.833177][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10
[ 101.833193][ T5960] dtInsert+0x109b/0x6000 [jfs]
[ 101.833283][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0
[ 101.833296][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10
[ 101.833307][ T5960] ? rt_spin_unlock+0x161/0x200
[ 101.833315][ T5960] ? __pfx_dtInsert+0x10/0x10 [jfs]
[ 101.833391][ T5960] ? txLock+0xaf9/0x1cb0 [jfs]
[ 101.833477][ T5960] ? dtInitRoot+0x22a/0x670 [jfs]
[ 101.833556][ T5960] jfs_mkdir+0x6ec/0xa70 [jfs]
[ 101.833636][ T5960] ? __pfx_jfs_mkdir+0x10/0x10 [jfs]
[ 101.833721][ T5960] ? generic_permission+0x2e5/0x690
[ 101.833760][ T5960] ? bpf_lsm_inode_mkdir+0x9/0x20
[ 101.833776][ T5960] vfs_mkdir+0x306/0x510
[ 101.833786][ T5960] do_mkdirat+0x247/0x590
[ 101.833795][ T5960] ? __pfx_do_mkdirat+0x10/0x10
[ 101.833804][ T5960] ? getname_flags+0x1e5/0x540
[ 101.833815][ T5960] __x64_sys_mkdir+0x6c/0x80
[ 101.833823][ T5960] do_syscall_64+0xfa/0xfa0
[ 101.833832][ T5960] ? lockdep_hardirqs_on+0x9c/0x150
[ 101.833840][ T5960] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 101.833847][ T5960] ? exc_page_fault+0xab/0x100
[ 101.833856][ T5960] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Pull EFI fix from Ard Biesheuvel:
"Fix for the x86 EFI workaround keeping boot services code and data
regions reserved until after SetVirtualAddressMap() completes:
deferred struct page initialization may result in some of this memory
being lost permanently"
* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efi: defer freeing of boot services memory
Pull i2c fix from Wolfram Sang:
"A revert for the i801 driver restoring old locking behaviour"
* tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().
There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().
More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.
Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.
If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.
Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.
Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.
More robust approach is to only defer freeing of the EFI boot services
memory.
Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.
Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org
Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull x86 fixes from Ingo Molnar:
- Fix SEV guest boot failures in certain circumstances, due to
very early code relying on a BSS-zeroed variable that isn't
actually zeroed yet an may contain non-zero bootup values
Move the variable into the .data section go gain even earlier
zeroing
- Expose & allow the IBPB-on-Entry feature on SNP guests, which
was not properly exposed to guests due to initial implementational
caution
- Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
file paths
- Fix the various SNC (Sub-NUMA Clustering) topology enumeration
bugs/artifacts (sched-domain build errors mostly).
SNC enumeration data got more complicated with Granite Rapids X
(GNR) and Clearwater Forest X (CWF), which exposed these bugs
and made their effects more serious
- Also use the now sane(r) SNC code to fix resctrl SNC detection bugs
- Work around a historic libgcc unwinder bug in the vdso32 sigreturn
code (again), which regressed during an overly aggressive recent
cleanup of DWARF annotations
* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso32: Work around libgcc unwinder bug
x86/resctrl: Fix SNC detection
x86/topo: Fix SNC topology mess
x86/topo: Replace x86_has_numa_in_package
x86/topo: Add topology_num_nodes_per_package()
x86/numa: Store extra copy of numa_nodes_parsed
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
x86/sev: Allow IBPB-on-Entry feature for SNP guests
x86/boot/sev: Move SEV decompressor variables into the .data section
This reverts commit f707d6b9e7c18f669adfdb443906d46cfbaaa0c1.
Under rare circumstances, multiple udev threads can collect i801 device
info on boot and walk i801_acpi_io_handler somewhat concurrently. The
first will note the area is reserved by acpi to prevent further touches.
This ultimately causes the area to be deregistered. The second will
enter i801_acpi_io_handler after the area is unregistered but before a
check can be made that the area is unregistered. i2c_lock_bus relies on
the now unregistered area containing lock_ops to lock the bus. The end
result is a kernel panic on boot with the following backtrace;
[ 14.971872] ioatdma 0000:09:00.2: enabling device (0100 -> 0102)
[ 14.971873] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 14.971880] #PF: supervisor read access in kernel mode
[ 14.971884] #PF: error_code(0x0000) - not-present page
[ 14.971887] PGD 0 P4D 0
[ 14.971894] Oops: 0000 [#1] PREEMPT SMP PTI
[ 14.971900] CPU: 5 PID: 956 Comm: systemd-udevd Not tainted 5.14.0-611.5.1.el9_7.x86_64 #1
[ 14.971905] Hardware name: XXXXXXXXXXXXXXXXXXXXXXX BIOS 1.20.10.SV91 01/30/2023
[ 14.971908] RIP: 0010:i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[ 14.971929] Code: 00 00 49 8b 40 20 41 57 41 56 4d 8b b8 30 04 00 00 49 89 ce 41 55 41 89 d5 41 54 49 89 f4 be 02 00 00 00 55 4c 89 c5 53 89 fb <48> 8b 00 4c 89 c7 e8 18 61 54 e9 80 bd 80 04 00 00 00 75 09 4c 3b
[ 14.971933] RSP: 0018:ffffbaa841483838 EFLAGS: 00010282
[ 14.971938] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9685e01ba568
[ 14.971941] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 0000000000000000
[ 14.971944] RBP: ffff9685ca22f028 R08: ffff9685ca22f028 R09: ffff9685ca22f028
[ 14.971948] R10: 000000000000000b R11: 0000000000000580 R12: 0000000000000580
[ 14.971951] R13: 0000000000000008 R14: ffff9685e01ba568 R15: ffff9685c222f000
[ 14.971954] FS: 00007f8287c0ab40(0000) GS:ffff96a47f940000(0000) knlGS:0000000000000000
[ 14.971959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.971963] CR2: 0000000000000000 CR3: 0000000168090001 CR4: 00000000003706f0
[ 14.971966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.971968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 14.971972] Call Trace:
[ 14.971977] <TASK>
[ 14.971981] ? show_trace_log_lvl+0x1c4/0x2df
[ 14.971994] ? show_trace_log_lvl+0x1c4/0x2df
[ 14.972003] ? acpi_ev_address_space_dispatch+0x16e/0x3c0
[ 14.972014] ? __die_body.cold+0x8/0xd
[ 14.972021] ? page_fault_oops+0x132/0x170
[ 14.972028] ? exc_page_fault+0x61/0x150
[ 14.972036] ? asm_exc_page_fault+0x22/0x30
[ 14.972045] ? i801_acpi_io_handler+0x2d/0xb0 [i2c_i801]
[ 14.972061] acpi_ev_address_space_dispatch+0x16e/0x3c0
[ 14.972069] ? __pfx_i801_acpi_io_handler+0x10/0x10 [i2c_i801]
[ 14.972085] acpi_ex_access_region+0x5b/0xd0
[ 14.972093] acpi_ex_field_datum_io+0x73/0x2e0
[ 14.972100] acpi_ex_read_data_from_field+0x8e/0x230
[ 14.972106] acpi_ex_resolve_node_to_value+0x23d/0x310
[ 14.972114] acpi_ds_evaluate_name_path+0xad/0x110
[ 14.972121] acpi_ds_exec_end_op+0x321/0x510
[ 14.972127] acpi_ps_parse_loop+0xf7/0x680
[ 14.972136] acpi_ps_parse_aml+0x17a/0x3d0
[ 14.972143] acpi_ps_execute_method+0x137/0x270
[ 14.972150] acpi_ns_evaluate+0x1f4/0x2e0
[ 14.972158] acpi_evaluate_object+0x134/0x2f0
[ 14.972164] acpi_evaluate_integer+0x50/0xe0
[ 14.972173] ? vsnprintf+0x24b/0x570
[ 14.972181] acpi_ac_get_state.part.0+0x23/0x70
[ 14.972189] get_ac_property+0x4e/0x60
[ 14.972195] power_supply_show_property+0x90/0x1f0
[ 14.972205] add_prop_uevent+0x29/0x90
[ 14.972213] power_supply_uevent+0x109/0x1d0
[ 14.972222] dev_uevent+0x10e/0x2f0
[ 14.972228] uevent_show+0x8e/0x100
[ 14.972236] dev_attr_show+0x19/0x40
[ 14.972246] sysfs_kf_seq_show+0x9b/0x100
[ 14.972253] seq_read_iter+0x120/0x4b0
[ 14.972262] ? selinux_file_permission+0x106/0x150
[ 14.972273] vfs_read+0x24f/0x3a0
[ 14.972284] ksys_read+0x5f/0xe0
[ 14.972291] do_syscall_64+0x5f/0xe0
...
The kernel panic is mitigated by setting limiting the count of udev
children to 1. Revert to using the acpi_lock to continue protecting
marking the area as owned by firmware without relying on a lock in
a potentially unmapped region of memory.
Fixes: f707d6b9e7c1 ("i2c: i801: replace acpi_lock with I2C bus lock")
Signed-off-by: Charles Haithcock <chaithco@redhat.com>
[wsa: added Fixes-tag and updated comment stating the importance of the lock]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Pull timer fix from Ingo Molnar:
"Make clock_adjtime() syscall timex validation slightly more permissive
for auxiliary clocks, to not reject syscalls based on the status field
that do not try to modify the status field.
This makes the ABI behavior in clock_adjtime() consistent with
CLOCK_REALTIME"
* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix timex status validation for auxiliary clocks