Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
raid5_make_request() will free the cloned bio. But raid5_make_request()
can call make_stripe_request() multiple times, writing to the various
stripes. If that bio got added to the toread or towrite lists of a
stripe disk in an earlier call to make_stripe_request(), then it's not
safe to just free the bio if a later part of it is found to cross the
reshape position. Doing so can lead to a UAF error, when bio_endio()
is called on the bio for the earlier stripes.
Instead, raid5_make_request() needs to wait until all parts of the bio
have called bio_endio(). To do this, bios that cross the reshape
position while the reshape can't make progress are flagged as needing to
wait for all parts to complete. When raid5_make_request() has a bio that
failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
bi->bi_private to a completion struct and waits for completion after
ending the bio. When the bio_endio() is called for the last time on a
clone bio with bi->bi_private set, it wakes up the waiter. This
guarantees that raid5_make_request() doesn't return until the cloned bio
needing a retry for io across the reshape boundary is safely cleaned up.
There is a simple reproducer available at [1]. Compile the kernel with
KASAN for more useful reporting when the error is triggered (this is not
necessary to see the bug).
[1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20260408043548.1695157-1-bmarzins@redhat.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
The cdrom core never calls set_disk_ro() for a registered device, so
BLKROGET on a CD-ROM device always returns 0 (writable), even when the
drive has no write capabilities and writes will inevitably fail. This
causes problems for userspace that relies on BLKROGET to determine
whether a block device is read-only. For example, systemd's loop device
setup uses BLKROGET to decide whether to create a loop device with
LO_FLAGS_READ_ONLY. Without the read-only flag, writes pass through the
loop device to the CD-ROM and fail with I/O errors. systemd-fsck
similarly checks BLKROGET to decide whether to run fsck in no-repair
mode (-n).
The write-capability bits in cdi->mask come from two different sources:
CDC_DVD_RAM and CDC_CD_RW are populated by the driver from the MODE
SENSE capabilities page (page 0x2A) before register_cdrom() is called,
while CDC_MRW_W and CDC_RAM require the MMC GET CONFIGURATION command
and were only probed by cdrom_open_write() at device open time. This
meant that any attempt to compute the writable state from the full
mask at probe time was incorrect, because the GET CONFIGURATION bits
were still unset (and cdi->mask is initialized such that capabilities
are assumed present).
Fix this by factoring the GET CONFIGURATION probing out of
cdrom_open_write() into a new exported helper,
cdrom_probe_write_features(), and having sr call it from sr_probe()
right after get_capabilities() has populated the MODE SENSE bits.
register_cdrom() then calls set_disk_ro() based on the full
write-capability mask (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW)
so the block layer reflects the drive's actual write support. The
feature queries used (CDF_MRW and CDF_RWRT via GET CONFIGURATION with
RT=00) report drive-level capabilities that are persistent across
media, so a single probe before register_cdrom() is sufficient and the
redundant probe at open time is dropped.
With set_disk_ro() now accurate, the long-vestigial cd->writeable flag
in sr can go: get_capabilities() used to set cd->writeable based on
the same four mask bits, but because CDC_MRW_W and CDC_RAM default to
"capability present" in cdi->mask and aren't touched by MODE SENSE,
the condition that gated cd->writeable was always true, making it
unconditionally 1. Replace the corresponding gate in sr_init_command()
with get_disk_ro(cd->disk), which turns a previously no-op check into
a real one and also catches kernel-internal bio writers that bypass
blkdev_write_iter()'s bdev_read_only() check.
The sd driver (SCSI disks) does not have this problem because it
checks the MODE SENSE Write Protect bit and calls set_disk_ro()
accordingly. The sr driver cannot use the same approach because the
MMC specification does not define the WP bit in the MODE SENSE
device-specific parameter byte for CD-ROM devices.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Daan De Meyer <daan@amutable.com>
Reviewed-by: Phillip Potter <phil@philpotter.co.uk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://patch.msgid.link/20260427210139.1400-2-phil@philpotter.co.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Keith:
"- Target data transfer size confiruation (Aurelien)
- Enable P2P for RDMA (Shivaji Kant)
- TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
- TCP host updates (Alistair, Chaitanya)
- Authentication updates (Alistair, Daniel, Chris Leech)
- Multipath fixes (John Garry)
- New quirks (Alan Cui, Tao Jiang)
- Apple driver fix (Fedor Pchelkin)
- PCI admin doorbell update fix (Keith)"
* tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme: (22 commits)
nvme-auth: Hash DH shared secret to create session key
nvme-pci: fix missed admin queue sq doorbell write
nvme-auth: Include SC_C in RVAL controller hash
nvme-tcp: teardown circular locking fixes
nvmet-tcp: Don't clear tls_key when freeing sq
Revert "nvmet-tcp: Don't free SQ on authentication success"
nvme: skip trace completion for host path errors
nvme-pci: add quirk for Memblaze Pblaze5 (0x1c5f:0x0555)
nvme-multipath: put module reference when delayed removal work is canceled
nvme: expose TLS mode
nvme-apple: drop invalid put of admin queue reference count
nvme-core: fix parameter name in comment
nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free
nvme-multipath: drop head pointer check in nvme_mpath_clear_current_path()
nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus )
nvmet-tcp: fix race between ICReq handling and queue teardown
nvmet-tcp: remove redundant calls to nvmet_tcp_fatal_error()
nvmet-tcp: propagate nvmet_tcp_build_pdu_iovec() errors to its callers
nvme: enable PCI P2PDMA support for RDMA transport
nvmet: introduce new mdts configuration entry
...
The NVMe Base Specification 8.3.5.5.9 states that the session key Ks
shall be computed from the ephemeral DH key by applying the hash
function selected by the HashID parameter.
The current implementation stores the raw DH shared secret as the
session key without hashing it. This causes redundant hash operations:
1. Augmented challenge computation (section 8.3.5.5.4) requires
Ca = HMAC(H(g^xy mod p), C). The code compensates by hashing the
unhashed session key in nvme_auth_augmented_challenge() to produce
the correct result.
2. PSK generation (section 8.3.5.5.9) requires PSK = HMAC(Ks, C1 || C2)
where Ks should already be H(g^xy mod p). As the DH shared secret
is always larger than the HMAC block size, HMAC internally hashes
it before use, accidentally producing the correct result.
When using secure channel concatenation with bidirectional
authentication, this results in hashing the DH value three times: twice
for augmented challenge calculations and once during PSK generation.
Fix this by:
- Modifying nvme_auth_gen_shared_secret() to hash the DH shared secret
once after computation: Ks = H(g^xy mod p)
- Removing the hash operation from nvme_auth_augmented_challenge()
as the session key is now already hashed
- Updating session key buffer size from DH key size to hash output size
- Adding specification references in comments
This avoid storing the raw DH shared secret and reduces the number of
hash operations from three to one when using secure channel
concatenation.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull clk fix from Stephen Boyd:
"One more fix for the merge window to avoid a boot hang on
Raspberry Pi 3B by marking the VEC clk critical so that it
doesn't get turned off and hang the bus"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: bcm: rpi: Mark VEC clock as CLK_IGNORE_UNUSED
We can batch admin commands submitted through io_uring_cmd passthrough,
which means bd->last may be false and skips the doorbell write to
aggregate multiple commands per write. If a subsequent command can't be
dispatched for whatever reason, we have to provide the blk-mq ops'
commit_rqs callback in order to ensure we properly update the doorbell.
Fixes: 58e5bdeb9c2b ("nvme: enable uring-passthrough for admin commands")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull PCIe TSP update from Dan Williams:
"A small update for the TSM core. It is arguably a fix and coming in
late as I have been offline the past few weeks:
- Drop class_create() for the 'tsm' class"
* tag 'tsm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
virt: coco: change tsm_class to a const struct
On Raspberry Pi 3B, the VEC clock is used by the VideoCore firmware
display driver, which remains active until the vc4 driver loads and
sends NOTIFY_DISPLAY_DONE. If this clock is disabled during boot, a bus
lockup happens and the firmware becomes unresponsive, causing a complete
system lockup.
Mark the VEC clock with CLK_IGNORE_UNUSED so it survives the unused
clock disablement and remains available until the vc4 driver takes over
display management.
Fixes: 672299736af6 ("clk: bcm: rpi: Manage clock rate in prepare/unprepare callbacks")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/r/5f0bec08-f458-4fba-8bf3-06817a100c4c@sirena.org.uk
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patch.msgid.link/20260401111416.562279-2-mcanal@igalia.com
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Brian Masney <bmasney@redhat.com> # Active contributor to clk
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Section 8.3.4.5.5 of the NVMe Base Specification 2.1 describes what is
included in the Response Value (RVAL) hash and SC_C should be included.
Currently we are hardcoding 0 instead of using the correct SC_C value.
Update the host and target code to use the SC_C when calculating the
RVAL instead of using 0.
Fixes: e88a7595b57f2 ("nvme-tcp: request secure channel concatenation")
Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull Kbuild fixes from Nicolas Schier:
- builddeb - avoid recompiles for non-cross-compiles
Avoid triggering complete rebuilds for non-cross-compile Debian
package builds by only triggering the rebuild of host tools for
actual cross-compile builds
- Never respect CONFIG_WERROR / W=e to fixdep
Avoid spurious rebuilds of fixdep w/ and w/o -Werror during a single
kbuild invocation by never respecting CONFIG_WERROR for fixdep
* tag 'kbuild-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Never respect CONFIG_WERROR / W=e to fixdep
kbuild: builddeb - avoid recompiles for non-cross-compiles
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change tsm_class to be a const struct class and drop the
class_create() call. Compile tested only.
Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Changes with v1:
- Removed redundant int err variable.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260306183325.245254-1-jkoolstra@xs4all.nl
Signed-off-by: Dan Williams <djbw@kernel.org>
* clk-samsung:
clk: samsung: exynos850: Add APM-to-AP mailbox clock
dt-bindings: clock: exynos850: Add APM_AP MAILBOX clock
clk: samsung: Use %pe format to simplify
clk: samsung: pll: Fix possible truncation in a9fraco recalc rate
clk: samsung: exynosautov920: add block G3D clock support
dt-bindings: clock: exynosautov920: add G3D clock definitions
clk: samsung: gs101: harmonise symbol names (clock arrays)
clk: samsung: artpec-9: Add initial clock support for ARTPEC-9 SoC
clk: samsung: Add clock PLL support for ARTPEC-9 SoC
dt-bindings: clock: Add ARTPEC-9 clock controller
* clk-qcom: (67 commits)
clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
clk: qcom: rpmh: Add support for Nord rpmh clocks
clk: qcom: Add TCSR clock driver for Nord SoC
dt-bindings: clock: qcom: Add Nord Global Clock Controller
dt-bindings: clock: qcom-rpmhcc: Add support for Nord SoCs
dt-bindings: clock: qcom: Document the Nord SoC TCSR Clock Controller
clk: qcom: gcc-x1e80100: Keep GCC USB QTB clock always ON
clk: qcom: Constify list of critical CBCR registers
clk: qcom: Constify qcom_cc_driver_data
clk: qcom: videocc-glymur: Constify qcom_cc_desc
clk: qcom: Add a driver for SM8750 GPU clocks
dt-bindings: clock: qcom: Add SM8750 GPU clocks
clk: qcom: ipq-cmn-pll: Add IPQ8074 SoC support
dt-bindings: clock: qcom: Add CMN PLL support for IPQ8074
clk: qcom: ipq-cmn-pll: Add IPQ6018 SoC support
dt-bindings: clock: qcom: Add CMN PLL support for IPQ6018
clk: qcom: gdsc: Fix error path on registration of multiple pm subdomains
dt-bindings: clock: qcom: Add missing power-domains property
clk: qcom: gcc-eliza: Enable FORCE_MEM_CORE_ON for UFS AXI PHY clock
clk: qcom: dispcc-sc7180: Add missing MDSS resets
...
* clk-round:
clk: divider: remove divider_round_rate() and divider_round_rate_parent()
clk: divider: remove divider_ro_round_rate_parent()
clk: remove round_rate() clk ops
clk: composite: convert from round_rate() to determine_rate()
clk: test: remove references to clk_ops.round_rate
* clk-sai:
clk: fsl-sai: Add MCLK generation support
clk: fsl-sai: Extract clock setup into fsl_sai_clk_register()
dt-bindings: clock: fsl-sai: Document clock-cells = <1> support
clk: fsl-sai: Add i.MX8M support with 8 byte register offset
clk: fsl-sai: Sort the headers
dt-bindings: clock: fsl-sai: Document i.MX8M support
* clk-cleanup:
clk: visconti: pll: initialize clk_init_data to zero
clk: xgene: Fix mapping leak in xgene_pllclk_init()
clk: Simplify clk_is_match()
clk: baikal-t1: Remove not-going-to-be-supported code for Baikal SoC
clk: mvebu: armada-37xx-periph: fix __iomem casts in structure init
clk: qoriq: avoid format string warning
When a controller reset is triggered via sysfs (by writing to
/sys/class/nvme/<nvmedev>/reset_controller), the reset work tears down
and re-establishes all queues. The socket release using fput() defers
the actual cleanup to task_work delayed_fput workqueue. This deferred
cleanup can race with the subsequent queue re-allocation during reset,
potentially leading to use-after-free or resource conflicts.
Replace fput() with __fput_sync() to ensure synchronous socket release,
guaranteeing that all socket resources are fully cleaned up before the
function returns. This prevents races during controller reset where
new queue setup may begin before the old socket is fully released.
* Call chain during reset:
nvme_reset_ctrl_work()
-> nvme_tcp_teardown_ctrl()
-> nvme_tcp_teardown_io_queues()
-> nvme_tcp_free_io_queues()
-> nvme_tcp_free_queue() <-- fput() -> __fput_sync()
-> nvme_tcp_teardown_admin_queue()
-> nvme_tcp_free_admin_queue()
-> nvme_tcp_free_queue() <-- fput() -> __fput_sync()
-> nvme_tcp_setup_ctrl() <-- race with deferred fput
memalloc_noreclaim_save() sets PF_MEMALLOC which is intended for tasks
performing memory reclaim work that need reserve access. While PF_MEMALLOC
prevents the task from entering direct reclaim (causing __need_reclaim() to
return false), it does not strip __GFP_IO from gfp flags. The allocator can
therefore still trigger writeback I/O when __GFP_IO remains set, which is
unsafe when the caller holds block layer locks.
Switch to memalloc_noio_save() which sets PF_MEMALLOC_NOIO. This causes
current_gfp_context() to strip __GFP_IO|__GFP_FS from every allocation in
the scope, making it safe to allocate memory while holding elevator_lock and
set->srcu.
* The issue can be reproduced using blktests:
nvme_trtype=tcp ./check nvme/005
blktests (master) # nvme_trtype=tcp ./check nvme/005
nvme/005 (tr=tcp) (reset local loopback target) [failed]
runtime 0.725s ... 0.798s
something found in dmesg:
[ 108.473940] run blktests nvme/005 at 2025-11-22 16:12:20
[...]
...
(See '/root/blktests/results/nodev_tr_tcp/nvme/005.dmesg' for the entire message)
blktests (master) # cat /root/blktests/results/nodev_tr_tcp/nvme/005.dmesg
[ 108.473940] run blktests nvme/005 at 2025-11-22 16:12:20
[ 108.526983] loop0: detected capacity change from 0 to 2097152
[ 108.555606] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 108.572531] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 108.613061] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 108.616832] nvme nvme0: creating 48 I/O queues.
[ 108.630791] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[ 108.661892] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 108.746639] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 108.748466] nvme nvme0: creating 48 I/O queues.
[ 108.802984] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[ 108.829983] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
[ 108.854288] block nvme0n1: no available path - failing I/O
[ 108.854344] block nvme0n1: no available path - failing I/O
[ 108.854373] Buffer I/O error on dev nvme0n1, logical block 1, async page read
[ 108.891693] ======================================================
[ 108.895912] WARNING: possible circular locking dependency detected
[ 108.900184] 6.17.0nvme+ #3 Tainted: G N
[ 108.903913] ------------------------------------------------------
[ 108.908171] nvme/2734 is trying to acquire lock:
[ 108.911957] ffff88810210e610 (set->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x17/0x170
[ 108.917587]
but task is already holding lock:
[ 108.921570] ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[ 108.927361]
which lock already depends on the new lock.
[ 108.933018]
the existing dependency chain (in reverse order) is:
[ 108.938223]
-> #4 (&q->elevator_lock){+.+.}-{4:4}:
[ 108.942988] __mutex_lock+0xa2/0x1150
[ 108.945873] elevator_change+0xa8/0x1c0
[ 108.948925] elv_iosched_store+0xdf/0x140
[ 108.952043] kernfs_fop_write_iter+0x16a/0x220
[ 108.955367] vfs_write+0x378/0x520
[ 108.957598] ksys_write+0x67/0xe0
[ 108.959721] do_syscall_64+0x76/0xbb0
[ 108.962052] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 108.965145]
-> #3 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 108.968923] blk_alloc_queue+0x30e/0x350
[ 108.972117] blk_mq_alloc_queue+0x61/0xd0
[ 108.974677] scsi_alloc_sdev+0x2a0/0x3e0
[ 108.977092] scsi_probe_and_add_lun+0x1bd/0x430
[ 108.979921] __scsi_add_device+0x109/0x120
[ 108.982504] ata_scsi_scan_host+0x97/0x1c0
[ 108.984365] async_run_entry_fn+0x2d/0x130
[ 108.986109] process_one_work+0x20e/0x630
[ 108.987830] worker_thread+0x184/0x330
[ 108.989473] kthread+0x10a/0x250
[ 108.990852] ret_from_fork+0x297/0x300
[ 108.992491] ret_from_fork_asm+0x1a/0x30
[ 108.994159]
-> #2 (fs_reclaim){+.+.}-{0:0}:
[ 108.996320] fs_reclaim_acquire+0x99/0xd0
[ 108.998058] kmem_cache_alloc_node_noprof+0x4e/0x3c0
[ 109.000123] __alloc_skb+0x15f/0x190
[ 109.002195] tcp_send_active_reset+0x3f/0x1e0
[ 109.004038] tcp_disconnect+0x50b/0x720
[ 109.005695] __tcp_close+0x2b8/0x4b0
[ 109.007227] tcp_close+0x20/0x80
[ 109.008663] inet_release+0x31/0x60
[ 109.010175] __sock_release+0x3a/0xc0
[ 109.011778] sock_close+0x14/0x20
[ 109.013263] __fput+0xee/0x2c0
[ 109.014673] delayed_fput+0x31/0x50
[ 109.016183] process_one_work+0x20e/0x630
[ 109.017897] worker_thread+0x184/0x330
[ 109.019543] kthread+0x10a/0x250
[ 109.020929] ret_from_fork+0x297/0x300
[ 109.022565] ret_from_fork_asm+0x1a/0x30
[ 109.024194]
-> #1 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[ 109.026634] lock_sock_nested+0x2e/0x70
[ 109.028251] tcp_sendmsg+0x1a/0x40
[ 109.029783] sock_sendmsg+0xed/0x110
[ 109.031321] nvme_tcp_try_send_cmd_pdu+0x13e/0x260 [nvme_tcp]
[ 109.034263] nvme_tcp_try_send+0xb3/0x330 [nvme_tcp]
[ 109.036375] nvme_tcp_queue_rq+0x342/0x3d0 [nvme_tcp]
[ 109.038528] blk_mq_dispatch_rq_list+0x297/0x800
[ 109.040448] __blk_mq_sched_dispatch_requests+0x3db/0x5f0
[ 109.042677] blk_mq_sched_dispatch_requests+0x29/0x70
[ 109.044787] blk_mq_run_work_fn+0x76/0x1b0
[ 109.046535] process_one_work+0x20e/0x630
[ 109.048245] worker_thread+0x184/0x330
[ 109.049890] kthread+0x10a/0x250
[ 109.051331] ret_from_fork+0x297/0x300
[ 109.053024] ret_from_fork_asm+0x1a/0x30
[ 109.054740]
-> #0 (set->srcu){.+.+}-{0:0}:
[ 109.056850] __lock_acquire+0x1468/0x2210
[ 109.058614] lock_sync+0xa5/0x110
[ 109.060048] __synchronize_srcu+0x49/0x170
[ 109.061802] elevator_switch+0xc9/0x330
[ 109.063950] elevator_change+0x128/0x1c0
[ 109.065675] elevator_set_none+0x4c/0x90
[ 109.067316] blk_unregister_queue+0xa8/0x110
[ 109.069165] __del_gendisk+0x14e/0x3c0
[ 109.070824] del_gendisk+0x75/0xa0
[ 109.072328] nvme_ns_remove+0xf2/0x230 [nvme_core]
[ 109.074365] nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[ 109.076652] nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[ 109.078775] nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[ 109.081009] nvme_sysfs_delete+0x34/0x40 [nvme_core]
[ 109.083082] kernfs_fop_write_iter+0x16a/0x220
[ 109.085009] vfs_write+0x378/0x520
[ 109.086539] ksys_write+0x67/0xe0
[ 109.087982] do_syscall_64+0x76/0xbb0
[ 109.089577] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 109.091665]
other info that might help us debug this:
[ 109.095478] Chain exists of:
set->srcu --> &q->q_usage_counter(io) --> &q->elevator_lock
[ 109.099544] Possible unsafe locking scenario:
[ 109.101708] CPU0 CPU1
[ 109.103402] ---- ----
[ 109.105103] lock(&q->elevator_lock);
[ 109.106530] lock(&q->q_usage_counter(io));
[ 109.109022] lock(&q->elevator_lock);
[ 109.111391] sync(set->srcu);
[ 109.112586]
*** DEADLOCK ***
[ 109.114772] 5 locks held by nvme/2734:
[ 109.116189] #0: ffff888101925410 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x67/0xe0
[ 109.119143] #1: ffff88817a914e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10f/0x220
[ 109.123141] #2: ffff8881046313f8 (kn->active#185){++++}-{0:0}, at: sysfs_remove_file_self+0x26/0x50
[ 109.126543] #3: ffff88810470e1d0 (&set->update_nr_hwq_lock){++++}-{4:4}, at: del_gendisk+0x6d/0xa0
[ 109.129891] #4: ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[ 109.133149]
stack backtrace:
[ 109.134817] CPU: 6 UID: 0 PID: 2734 Comm: nvme Tainted: G N 6.17.0nvme+ #3 PREEMPT(voluntary)
[ 109.134819] Tainted: [N]=TEST
[ 109.134820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 109.134821] Call Trace:
[ 109.134823] <TASK>
[ 109.134824] dump_stack_lvl+0x75/0xb0
[ 109.134828] print_circular_bug+0x26a/0x330
[ 109.134831] check_noncircular+0x12f/0x150
[ 109.134834] __lock_acquire+0x1468/0x2210
[ 109.134837] ? __synchronize_srcu+0x17/0x170
[ 109.134838] lock_sync+0xa5/0x110
[ 109.134840] ? __synchronize_srcu+0x17/0x170
[ 109.134842] __synchronize_srcu+0x49/0x170
[ 109.134843] ? mark_held_locks+0x49/0x80
[ 109.134845] ? _raw_spin_unlock_irqrestore+0x2d/0x60
[ 109.134847] ? kvm_clock_get_cycles+0x14/0x30
[ 109.134853] ? ktime_get_mono_fast_ns+0x36/0xb0
[ 109.134858] elevator_switch+0xc9/0x330
[ 109.134860] elevator_change+0x128/0x1c0
[ 109.134862] ? kernfs_put.part.0+0x86/0x290
[ 109.134864] elevator_set_none+0x4c/0x90
[ 109.134866] blk_unregister_queue+0xa8/0x110
[ 109.134868] __del_gendisk+0x14e/0x3c0
[ 109.134870] del_gendisk+0x75/0xa0
[ 109.134872] nvme_ns_remove+0xf2/0x230 [nvme_core]
[ 109.134879] nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[ 109.134887] nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[ 109.134893] nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[ 109.134899] nvme_sysfs_delete+0x34/0x40 [nvme_core]
[ 109.134905] kernfs_fop_write_iter+0x16a/0x220
[ 109.134908] vfs_write+0x378/0x520
[ 109.134911] ksys_write+0x67/0xe0
[ 109.134913] do_syscall_64+0x76/0xbb0
[ 109.134915] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 109.134916] RIP: 0033:0x7fd68a737317
[ 109.134917] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 109.134919] RSP: 002b:00007ffded1546d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 109.134920] RAX: ffffffffffffffda RBX: 000000000054f7e0 RCX: 00007fd68a737317
[ 109.134921] RDX: 0000000000000001 RSI: 00007fd68a855719 RDI: 0000000000000003
[ 109.134921] RBP: 0000000000000003 R08: 0000000030407850 R09: 00007fd68a7cd4e0
[ 109.134922] R10: 00007fd68a65b130 R11: 0000000000000246 R12: 00007fd68a855719
[ 109.134923] R13: 00000000304074c0 R14: 00000000304074c0 R15: 0000000030408660
[ 109.134926] </TASK>
[ 109.962756] Key type psk unregistered
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull power utility updates from Len Brown:
"x86_energy_perf_policy:
- Initial SoC Slider support
turbostat:
- Display HT siblings in cpu# order
- Add Module-ID column
- Print Core-ID and APIC-ID in hex
- Fix misc bugs"
* tag 'power-utilities-2026.04.25' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power x86_energy_perf_policy: Version 2026.04.25
tools/power x86_energy_perf_policy.8: Document SoC Slider Options
tools/power x86_energy_perf_policy: Enhances SoC Slider related checks
tools/power turbostat: v2026.04.21
tools/power turbostat: Process HT siblings in CPU order
tools/power turbostat: Show module_id column
tools/power turbostat: Print core_id and apic_id in hex
tools/power turbostat: Cleanup print helper functions
tools/power turbostat: Fix --cpu-set 1 regression on HT systems
tools/power turbostat: Fix --cpu-set 0 regression on HT systems
tools/power turbostat: Fix unrecognized option '-P'
tools/power turbostat: Fix AMD RAPL regression on big systems
tools/power/x86: Add SOC slider and platform profile support
The fixdep hostprog may be built multiple times during a single build.
Once during the configuration phase and later during the regular phase.
As only the regular build phase respects CONFIG_WERROR / W=e, the
compiler flags might change between the phases, leading to rebuilds.
Example, the rebuilds will happen twice on each invocation of the build:
$ make allyesconfig prepare
make[1]: Entering directory '/tmp/deleteme'
HOSTCC scripts/basic/fixdep
#
# No change to .config
#
HOSTCC scripts/basic/fixdep
DESCEND objtool
INSTALL libsubcmd_headers
make[1]: Leaving directory '/tmp/deleteme'
Fix the compilation flags used for scripts/basic/ before
scripts/Makefile.warn is evaluated to stop CONFIG_WERROR / W=e
influencing the fixdep build to avoid the spurious rebuilds.
Fixes: 7ded7d37e5f5 ("scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260422-kbuild-scripts-basic-werror-v1-1-8c6912ff22e0@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>