Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Pull io_uring fixes from Jens Axboe:
- Fix for a NOMMU bug with io_uring, where NOMMU doesn't grab page refs
at mmap time. NOMMU also has entirely broken FOLL_PIN support, yet
here we are
- A few fixes covering minor issues introduced in this merge window
- data race annotation to shut up KCSAN for when io-wq limits are
applied
- A nospec addition for direct descriptor file updating. Rest of the
direct descriptor path already had this, but for some reason the
update did not. Now they are all the same
- Various minor defensive changes that claude identified and suggested
terrible fixes for, turned into actually useful cleanups:
- Use kvfree() for the imu cache. These can come from kmalloc or
vmalloc depending on size, but the in-cache ones are capped
where it's always kmalloc based. Change to kvfree() in the
cleanup path, making future changes unlikely to mess that up
- Negative kbuf consumption lengths. Can't happen right now, but
cqe->res is used directly, which if other codes changes could
then be an error value
- Fix for an issue with the futex code, where partial wakes on a
vectored fuxes would potentially wake the same futex twice, rather
than move on to the next one. This could confuse an application as it
would've expected the next futex to have been woken
- Fix for a bug with ring resizing, where SQEs or CQEs might not have
been copied correctly if large SQEs or CQEs are used in the ring.
Application side issue, where SQEs or CQEs might have been lost
during resize
- Fix for a bug where EPOLL_URING_WAKE might have been lost, causing a
multishot poll to not be terminated when it's nested, like it should
have been
- Fix for an issue with signed comparison of poll references for the
slow path
- Fix for a user struct UAF in the zcrx code
- Two minor zcrx cleanups
* tag 'io_uring-7.1-20260424' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: take page references for NOMMU pbuf_ring mmaps
io_uring/poll: ensure EPOLL_ONESHOT is propagated for EPOLL_URING_WAKE
io_uring/zcrx: warn on freelist violations
io_uring/zcrx: clear RQ headers on init
io_uring/zcrx: fix user_struct uaf
io_uring/register: fix ring resizing with mixed/large SQEs/CQEs
io_uring/futex: ensure partial wakes are appropriately dequeued
io_uring/rw: add defensive hardening for negative kbuf lengths
io_uring/rsrc: use kvfree() for the imu cache
io_uring/rsrc: unify nospec indexing for direct descriptors
io_uring: fix spurious fput in registered ring path
io_uring: fix iowq_limits data race in tctx node addition
io_uring/tctx: mark io_wq as exiting before error path teardown
io_uring/tctx: check for setup tctx->io_wq before teardown
io_uring/poll: fix signed comparison in io_poll_get_ownership()
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- Fix handling of ENOSPC so that if we have to resend writes, they
are written synchronously
- SUNRPC RDMA transport fixes from Chuck
- Several fixes for delegated timestamps in NFSv4.2
- Failure to obtain a directory delegation should not cause stat() to
fail with NFSv4
- Rename was failing to update timestamps when a directory delegation
is held on NFSv4
- Ensure we check rsize/wsize after crossing a NFSv4 filesystem
boundary
- NFSv4/pnfs:
- If the server is down, retry the layout returns on reboot
- Fallback to MDS could result in a short write being incorrectly
logged
Cleanups:
- Use memcpy_and_pad in decode_fh"
* tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (21 commits)
NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address
NFS: remove redundant __private attribute from nfs_page_class
NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes
NFS: fix writeback in presence of errors
nfs: use memcpy_and_pad in decode_fh
NFSv4.1: Apply session size limits on clone path
NFSv4: retry GETATTR if GET_DIR_DELEGATION failed
NFS: fix RENAME attr in presence of directory delegations
pnfs/flexfiles: validate ds_versions_cnt is non-zero
NFS/blocklayout: print each device used for SCSI layouts
xprtrdma: Post receive buffers after RPC completion
xprtrdma: Scale receive batch size with credit window
xprtrdma: Replace rpcrdma_mr_seg with xdr_buf cursor
xprtrdma: Decouple frwr_wp_create from frwr_map
xprtrdma: Close lost-wakeup race in xprt_rdma_alloc_slot
xprtrdma: Avoid 250 ms delay on backlog wakeup
xprtrdma: Close sendctx get/put race that can block a transport
nfs: update inode ctime after removexattr operation
nfs: fix utimensat() for atime with delegated timestamps
NFS: improve "Server wrote zero bytes" error
...
Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.
The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page. Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.
Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them. The user can then write into whatever
the allocator hands out next.
Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close. NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap.
This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed. The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.
Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
[axboe: get rid of region lookup, just iterate pages in vma]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ceph updates from Ilya Dryomov:
"We have a series from Alex which extends CephFS client metrics with
support for per-subvolume data I/O performance and latency tracking
(metadata operations aren't included) and a good variety of fixes and
cleanups across RBD and CephFS"
* tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-client:
ceph: add subvolume metrics collection and reporting
ceph: parse subvolume_id from InodeStat v9 and store in inode
ceph: handle InodeStat v8 versioned field in reply parsing
libceph: Fix slab-out-of-bounds access in auth message processing
rbd: fix null-ptr-deref when device_add_disk() fails
crush: cleanup in crush_do_rule() method
ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails
ceph: only d_add() negative dentries when they are unhashed
libceph: update outdated comment in ceph_sock_write_space()
libceph: Remove obsolete session key alignment logic
ceph: fix num_ops off-by-one when crypto allocation fails
libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
The cl_xprt pointer in struct rpc_clnt is marked as __rcu. Accessing
it directly in nfs_compare_super_address() is unsafe and triggers
Sparse warnings.
Fix this by using rcu_dereference() within an RCU read-side critical
section to retrieve the transport pointer. This addresses the sparse
warning and ensures atomic access to the pointer, as the transport
can be updated via transport switching even while the superblock
remains active under sb_lock.
Fixes: 7e3fcf61abde ("nfs: don't share mounts between network namespaces")
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Commit:
aacf2f9f382c ("io_uring: fix req->apoll_events")
fixed an issue where poll->events and req->apoll_events weren't
synchronized, but then when the commit referenced in Fixes got added,
it didn't ensure the same thing.
If we mask in EPOLLONESHOT in the regular EPOLL_URING_WAKE path, then
ensure it's done for both. Including a link to the original report
below, even though it's mostly nonsense. But it includes a reproducer
that does show that IORING_CQE_F_MORE is set in the previous CQE,
while no more CQEs will be generated for this request. Just ignore
anything that pretends this is security related in any way, it's just
the typical AI nonsense.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/CAM0zi7yQzF3eKncgHo4iVM5yFLAjsiob_ucqyWKs=hyd_GqiMg@mail.gmail.com/
Reported-by: Azizcan Daştan <azizcan.d@mileniumsec.com>
Fixes: 4464853277d0 ("io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ntfs updates from Namjae Jeon:
- Fix potential data leakage by zeroing the portion of the straddle
block beyond initialized_size when reading non-resident attributes
- Remove unnecessary zeroing in ntfs_punch_hole() for ranges beyond
initialized_size, as they are already returned as zeros on read
- Fix writable check in ntfs_file_mmap_prepare() to correctly handle
shared mappings using VMA_SHARED_BIT | VMA_MAYWRITE_BIT
- Use page allocation instead of kmemdup() for IOMAP_INLINE data to
ensure page-aligned address and avoid BUG trap in
iomap_inline_data_valid() caused by the page boundary check
- Add a size check before memory allocation in ntfs_attr_readall() and
reject overly large attributes
- Remove unneeded noop_direct_IO from ntfs_aops as it is no longer
required following the FMODE_CAN_ODIRECT flag
- Fix seven static analysis warnings reported by Smatch
* tag 'ntfs-for-7.1-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs:
ntfs: use page allocation for resident attribute inline data
ntfs: fix mmap_prepare writable check for shared mappings
ntfs: fix potential 32-bit truncation in ntfs_write_cb()
ntfs: fix uninitialized variable in ntfs_map_runlist_nolock
ntfs: delete dead code
ntfs: add missing error code in ntfs_mft_record_alloc()
ntfs: fix uninitialized variables in ntfs_ea_set_wsl_inode()
ntfs: fix uninitialized pointer in ntfs_write_mft_block
ntfs: fix uninitialized variable in ntfs_write_simple_iomap_begin_non_resident
ntfs: remove noop_direct_IO from address_space_operations
ntfs: limit memory allocation in ntfs_attr_readall
ntfs: not zero out range beyond init in punch_hole
ntfs: zero out stale data in straddle block beyond initialized_size
Add complete infrastructure for per-subvolume I/O metrics collection
and reporting to the MDS. This enables administrators to monitor I/O
patterns at the subvolume granularity, which is useful for multi-tenant
CephFS deployments.
This patch adds:
- CEPHFS_FEATURE_SUBVOLUME_METRICS feature flag for MDS negotiation
- CEPH_SUBVOLUME_ID_NONE constant (0) for unknown/unset state
- Red-black tree based metrics tracker for efficient per-subvolume
aggregation with kmem_cache for entry allocations
- Wire format encoding matching the MDS C++ AggregatedIOMetrics struct
- Integration with the existing CLIENT_METRICS message
- Recording of I/O operations from file read/write and writeback paths
- Debugfs interfaces for monitoring (metrics/subvolumes, metrics/metric_features)
Metrics tracked per subvolume include:
- Read/write operation counts
- Read/write byte counts
- Read/write latency sums (for average calculation)
The metrics are periodically sent to the MDS as part of the existing
metrics reporting infrastructure when the MDS advertises support for
the SUBVOLUME_METRICS feature.
CEPH_SUBVOLUME_ID_NONE enforces subvolume_id immutability. Following
the FUSE client convention, 0 means unknown/unset. Once an inode has
a valid (non-zero) subvolume_id, it should not change during the
inode's lifetime.
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The nfs_page_class tracepoint uses a pointer for the 'req' field marked
with the __private attribute. This causes Sparse to complain about
dereferencing a private pointer within the trace ring buffer context,
specifically during the TP_fast_assign() operation.
This fixes a Sparse warning introduced in commit b6ef079fd984 ("nfs:
more in-depth tracing of writepage events") by removing the redundant
__private attribute from the 'req' field.
Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The freelist is appropriately sized to always be able to take a free
niov, but let's be more defensive and check the invariant with a
warning. That should help to catch any double-free issues.
Suggested-by: Kai Aizen <kai@snailsploit.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull 9p updates from Dominique Martinet:
- 9p access flag fix (cannot change access flag since new mount API implem)
- some minor cleanup
* tag '9p-for-7.1-rc1' of https://github.com/martinetd/linux:
9p/trans_xen: replace simple_strto* with kstrtouint
9p/trans_xen: make cleanup idempotent after dataring alloc errors
9p: document missing enum values in kernel-doc comments
9p: fix access mode flags being ORed instead of replaced
9p: fix memory leak in v9fs_init_fs_context error path
The current kmemdup() based allocation for IOMAP_INLINE can result in
inline_data pointer having a non-zero page offset. This causes
iomap_inline_data_valid() to fail the check:
iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data)
and triggers the kernel BUG at fs/iomap/buffered-io.c:1061.
This particularly affects workloads with frequent small file access
(e.g. Firefox Nightly profile on NTFS with bind mount) when using the
new ntfs. This fix this by allocating a full page with alloc_page() so that
page_address() always returns a page-aligned address.
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Add support for parsing the subvolume_id field from InodeStat v9 and
storing it in the inode for later use by subvolume metrics tracking.
The subvolume_id identifies which CephFS subvolume an inode belongs to,
enabling per-subvolume I/O metrics collection and reporting.
This patch:
- Adds subvolume_id field to struct ceph_mds_reply_info_in
- Adds i_subvolume_id field to struct ceph_inode_info
- Parses subvolume_id from v9 InodeStat in parse_reply_info_in()
- Adds ceph_inode_set_subvolume() helper to propagate the ID to inodes
- Initializes i_subvolume_id in inode allocation and clears on destroy
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
xfstest generic/407 is failing in 2 ways. It detects that after
doing a clone the client does not update it's mtime and it's ctime.
CLONE always sends a GETATTR operation and then calls
nfs_post_op_update_inode() based on the returned attributes.
Because of the delegated attributes the client ignores updating
the mtime. Then also, when delegated attributes are present, for
the change_attr the server replies with the same values as what
the client cached before and thus the generic/407 would flag that.
Instead, make sure we invalidate the blocks attr.
By adding updating delegated attributes in nfs42_copy_dest_done()
both COPY and CLONE would update mtime appropriately.
Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
It might be unexpected to users if the RQ head/tail after a ring
creation are not zeroed, fix that.
Cc: stable@vger.kernel.org
Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/331f94663c3e8f021ffa3cb770ca2844a07d4855.1776760911.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull SPDX update from Greg KH:
"Here is a single SPDX-like change for 7.1-rc1. It explicitly allows
the use of SPDX-FileCopyrightText which has been used already in many
files.
At the same time, update checkpatch to catch any "non allowed" spdx
identifiers as we don't want to go overboard here.
This has been in linux-next for a long time with no reported problems"
* tag 'spdx-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
LICENSES: Explicitly allow SPDX-FileCopyrightText