Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Every doit handler followed the same pattern: stack-allocate an
adm_ctx, call drbd_adm_prepare() at the top, call drbd_adm_finish()
at the bottom. This duplicated boilerplate across 25 handlers and
made error paths inconsistent, since some handlers could miss sending
the reply skb on early-exit paths.
The generic netlink framework already provides pre_doit/post_doit
hooks for exactly this purpose. An old comment even noted "this
would be a good candidate for a pre_doit hook".
Use them:
- pre_doit heap-allocates adm_ctx, looks up per-command flags from a
new drbd_genl_cmd_flags[] table, runs drbd_adm_prepare(), and
stores the context in info->user_ptr[0].
- post_doit sends the reply, drops kref references for
device/connection/resource, and frees the adm_ctx.
- Handlers just receive adm_ctx from info->user_ptr[0], set
reply_dh->ret_code, and return. All teardown is in post_doit.
- drbd_adm_finish() is removed, superseded by post_doit.
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://patch.msgid.link/20260324152907.2840984-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new options that causes zloop to truncate the zone files to the
write pointer value recorded at the last cache flush to simulate
unclean shutdowns.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://patch.msgid.link/20260323071156.2940772-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Split out two helpers functions to make the function more readable and
to avoid conditional locking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://patch.msgid.link/20260323071156.2940772-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_alloc_bioset() first strips __GFP_DIRECT_RECLAIM from the optimistic
fast allocation attempt with try_alloc_gfp(). If that fast path fails,
the slowpath checks saved_gfp to decide whether blocking allocation is
allowed, but then still calls mempool_alloc() with the stripped gfp mask.
That can lead to a NULL bio pointer being passed into bio_init().
Fix the slowpath by using saved_gfp for the bio and bvec mempool
allocations.
Fixes: b520c4eef83d ("block: split bio_alloc_bioset more clearly into a fast and slowpath")
Reported-by: syzbot+09ddb593eea76a158f42@syzkaller.appspotmail.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/p01.gc6e9ad5845ad.ttca29g@ub.hpns
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mark ublk_filter_unused_tags() as noinline since it is only called from
the unlikely(needs_filter) branch. Extract the error-handling block from
__ublk_batch_dispatch() into a new noinline ublk_batch_dispatch_fail()
function to keep the hot path compact and icache-friendly. This also
makes __ublk_batch_dispatch() more readable by separating the error
recovery logic from the normal dispatch flow.
Before: __ublk_batch_dispatch is ~1419 bytes
After: __ublk_batch_dispatch is ~1090 bytes (-329 bytes, -23%)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260318014112.3125432-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull MD changes from Yu Kuia:
"Bug Fixes:
- md: suppress spurious superblock update error message for dm-raid
(Chen Cheng)
- md/raid1: fix the comparing region of interval tree (Xiao Ni)
- md/raid10: fix deadlock with check operation and nowait requests
(Josh Hunt)
- md/raid5: skip 2-failure compute when other disk is R5_LOCKED
(FengWei Shih)
- md/md-llbitmap: raise barrier before state machine transition
(Yu Kuai)
- md/md-llbitmap: skip reading rdevs that are not in_sync (Yu Kuai)
Improvements:
- md/raid5: set chunk_sectors to enable full stripe I/O splitting
(Yu Kuai)
Cleanups:
- md: remove unused mddev argument from export_rdev (Chen Cheng)
- md/raid5: remove stale md_raid5_kick_device() declaration
(Chen Cheng)
- md/raid5: move handle_stripe() comment to correct location
(Chen Cheng)"
* tag 'md-7.1-20260323' of git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md: remove unused mddev argument from export_rdev
md/raid5: move handle_stripe() comment to correct location
md/raid5: remove stale md_raid5_kick_device() declaration
md/raid1: fix the comparing region of interval tree
md/raid5: skip 2-failure compute when other disk is R5_LOCKED
md/md-llbitmap: raise barrier before state machine transition
md/md-llbitmap: skip reading rdevs that are not in_sync
md/raid5: set chunk_sectors to enable full stripe I/O splitting
md/raid10: fix deadlock with check operation and nowait requests
md: suppress spurious superblock update error message for dm-raid
In preparation for removing the strlcat API[1], replace the char *pp_buf
with a struct seq_buf, which tracks the current write position and
remaining space internally. This allows for:
- Direct use of seq_buf_printf() in place of snprintf()+strlcat()
pairs, eliminating local tmp buffers throughout.
- Adjacent strlcat() calls that build strings piece-by-piece
(e.g., strlcat("["); strlcat(name); strlcat("]")) to be collapsed
into single seq_buf_printf() calls.
- Simpler call sites: seq_buf_puts() takes only the buffer and string,
with no need to pass PAGE_SIZE at every call.
The backing buffer allocation is unchanged (__get_free_page), and the
output path uses seq_buf_str() to NUL-terminate before passing to
printk().
Link: https://github.com/KSPP/linux/issues/370 [1]
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Josh Law <objecting@objecting.org>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Josh Law <objecting@objecting.org>
Link: https://patch.msgid.link/20260321004840.work.670-kees@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The mddev argument in export_rdev() is never used. Remove it to
simplify callers.
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/linux-raid/20260304111417.20777-1-chencheng@fnnas.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Implement the SCSI-specific io_uring command handler for BSG using
struct bsg_uring_cmd.
The handler builds a SCSI request from the io_uring command, maps user
buffers (including fixed buffers), and completes asynchronously via a
request end_io callback and task_work. Completion returns a 32-bit
status and packed residual/sense information via CQE res and res2, and
supports IO_URING_F_NONBLOCK.
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260317072226.2598233-4-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the handle_stripe() documentation comment from above
analyse_stripe() to directly above handle_stripe() where it belongs.
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Link: https://lore.kernel.org/linux-raid/20260304111001.15767-1-chencheng@fnnas.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Add an io_uring command handler to the generic BSG layer. The new
.uring_cmd file operation validates io_uring features and delegates
handling to a per-queue bsg_uring_cmd_fn callback.
Extend bsg_register_queue() so transport drivers can register both
sg_io and io_uring command handlers.
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260317072226.2598233-3-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove the unused md_raid5_kick_device() declaration from raid5.h -
no definition exists for this function.
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Link: https://lore.kernel.org/linux-raid/20260304110919.15071-1-chencheng@fnnas.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Add the bsg_uring_cmd structure to the BSG UAPI header to support
io_uring-based SCSI passthrough operations via IORING_OP_URING_CMD.
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260317072226.2598233-2-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Interval tree uses [start, end] as a region which stores in the tree.
In raid1, it uses the wrong end value. For example:
bio(A,B) is too big and needs to be split to bio1(A,C-1), bio2(C,B).
The region of bio1 is [A,C] and the region of bio2 is [C,B]. So bio1 and
bio2 overlap which is not right.
Fix this problem by using right end value of the region.
Fixes: d0d2d8ba0494 ("md/raid1: introduce wait_for_serialization")
Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/linux-raid/20260305011839.5118-2-xni@redhat.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
The function bio_add_page() returns the number of bytes added to the
bio, and if that failed it should return 0.
However there is a special quirk, if a caller is passing a page with
length 0, that function will always return 0 but with different results:
- The page is added to the bio
If there is enough bvec slot or the folio can be merged with the last
bvec.
The return value 0 is just the length passed in, which is also 0.
- The page is not added to the bio
If the page is not mergeable with the last bvec, or there is no bvec
slot available.
The return value 0 means page is not added into the bio.
Unfortunately the caller is not able to distinguish the above two cases,
and will treat the 0 return value as page addition failure.
In that case, this can lead to the double releasing of the last page:
- By the bio cleanup
Which normally goes through every page of the bio, including the last
page which is added into the bio.
- By the caller
Which believes the page is not added into the bio, thus would manually
release the page.
I do not think anyone should call bio_add_folio()/bio_add_page() with zero
length, but idiots like me can still show up.
So add an extra WARN_ON_ONCE() check for zero length and rejects it
early to avoid double freeing.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/bc2223c080f38d0b63f968f605c918181c840f40.1773734749.git.wqu@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When skip_copy is enabled on a doubly-degraded RAID6, a device that is
being written to will be in R5_LOCKED state with R5_UPTODATE cleared.
If a new read triggers fetch_block() while the write is still in
flight, the 2-failure compute path may select this locked device as a
compute target because it is not R5_UPTODATE.
Because skip_copy makes the device page point directly to the bio page,
reconstructing data into it might be risky. Also, since the compute
marks the device R5_UPTODATE, it triggers WARN_ON in ops_run_io()
which checks that R5_SkipCopy and R5_UPTODATE are not both set.
This can be reproduced by running small-range concurrent read/write on
a doubly-degraded RAID6 with skip_copy enabled, for example:
mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing
echo 1 > /sys/block/md0/md/skip_copy
fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 \
--iodepth=32 --size=4M --runtime=30 --time_based --direct=1
Fix by checking R5_LOCKED before proceeding with the compute. The
compute will be retried once the lock is cleared on IO completion.
Signed-off-by: FengWei Shih <dannyshih@synology.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Link: https://lore.kernel.org/linux-raid/20260319053351.3676794-1-dannyshih@synology.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>