Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Pull iommu updates from Joerg Roedel:
"Core changes:
- Rust bindings for IO-pgtable code
- IOMMU page allocation debugging support
- Disable ATS during PCI resets
Intel VT-d changes:
- Skip dev-iotlb flush for inaccessible PCIe device
- Flush cache for PASID table before using it
- Use right invalidation method for SVA and NESTED domains
- Ensure atomicity in context and PASID entry updates
AMD-Vi changes:
- Support for nested translations
- Other minor improvements
ARM-SMMU-v2 changes:
- Configure SoC-specific prefetcher settings for Qualcomm's "MDSS"
ARM-SMMU-v3 changes:
- Improve CMDQ locking fairness for pathetically small queue sizes
- Remove tracking of the IAS as this is only relevant for AArch32 and
was causing C_BAD_STE errors
- Add device-tree support for NVIDIA's CMDQV extension
- Allow some hitless transitions for the 'MEV' and 'EATS' STE fields
- Don't disable ATS for nested S1-bypass nested domains
- Additions to the kunit selftests"
* tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
iommu/amd: serialize sequence allocation under concurrent TLB invalidations
iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
iommu/tegra241-cmdqv: Decouple driver from ACPI
iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
iommu/vt-d: Fix race condition during PASID entry replacement
iommu/vt-d: Clear Present bit before tearing down context entry
iommu/vt-d: Clear Present bit before tearing down PASID entry
iommu/vt-d: Flush piotlb for SVM and Nested domain
iommu/vt-d: Flush cache for PASID table before using it
iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
rust: iommu: fix `srctree` link warning
rust: iommu: fix Rust formatting
...
Pull landlock updates from Mickaël Salaün:
- extend Landlock to enforce restrictions on a whole process, similarly
to the seccomp's TSYNC flag
- refactor data structures to simplify code and improve performance
- add documentation to cover missing parts
* tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
mailmap: Add entry for Mickaël Salaün
landlock: Transpose the layer masks data structure
landlock: Add access_mask_subset() helper
selftests/landlock: Add filesystem access benchmark
landlock: Document audit blocker field format
landlock: Add errata documentation section
landlock: Add backwards compatibility for restrict flags
landlock: Refactor TCP socket type check
landlock: Minor reword of docs for TCP access rights
landlock: Document LANDLOCK_RESTRICT_SELF_TSYNC
selftests/landlock: Add LANDLOCK_RESTRICT_SELF_TSYNC tests
landlock: Multithreading support for landlock_restrict_self()
Pull integrity updates from Mimi Zohar:
"Just two bug fixes: IMA's detecting scripts (bprm_creds_for_exec), and
calculating the EVM HMAC"
* tag 'integrity-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
evm: Use ordered xattrs list to calculate HMAC in evm_init_hmac()
ima: Fix stack-out-of-bounds in is_bprm_creds_for_exec()
My Microsoft address is no longer used. Add a mailmap entry to reflect
that.
Cc: Günther Noack <gnoack@google.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20260207111136.577249-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The PASID_FLAG_PAGE_SNOOP and PASID_FLAG_PWSNP constants are identical.
This will cause the pasid code to always set both or neither of the
PGSNP and PWSNP bits in PASID table entries. However, PWSNP is a
reserved bit if SMPWC is not set in the IOMMU's extended capability
register, even if SC is supported.
This has resulted in DMAR errors when testing the iommufd code on an
Arrow Lake platform. With this patch, those errors disappear and the
PASID table entries look correct.
Fixes: 101a2854110fa ("iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry")
Cc: stable@vger.kernel.org
Signed-off-by: Viktor Kleen <viktor@kleen.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260202192109.1665799-1-viktor@kleen.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
A vSTE may have three configuration types: Abort, Bypass, and Translate.
An Abort vSTE wouldn't enable ATS, but the other two might.
It makes sense for a Transalte vSTE to rely on the guest vSTE.EATS field.
For a Bypass vSTE, it would end up with an S2-only physical STE, similar
to an attachment to a regular S2 domain. However, the nested case always
disables ATS following the Bypass vSTE, while the regular S2 case always
enables ATS so long as arm_smmu_ats_supported(master) == true.
Note that ATS is needed for certain VM centric workloads and historically
non-vSMMU cases have relied on this automatic enablement. So, having the
nested case behave differently causes problems.
To fix that, add a condition to disable_ats, so that it might enable ATS
for a Bypass vSTE, aligning with the regular S2 case.
Fixes: f27298a82ba0 ("iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
The Intel VT-d PASID table entry is 512 bits (64 bytes). When replacing
an active PASID entry (e.g., during domain replacement), the current
implementation calculates a new entry on the stack and copies it to the
table using a single structure assignment.
struct pasid_entry *pte, new_pte;
pte = intel_pasid_get_entry(dev, pasid);
pasid_pte_config_first_level(iommu, &new_pte, ...);
*pte = new_pte;
Because the hardware may fetch the 512-bit PASID entry in multiple
128-bit chunks, updating the entire entry while it is active (Present
bit set) risks a "torn" read. In this scenario, the IOMMU hardware
could observe an inconsistent state — partially new data and partially
old data — leading to unpredictable behavior or spurious faults.
Fix this by removing the unsafe "replace" helpers and following the
"clear-then-update" flow, which ensures the Present bit is cleared and
the required invalidation handshake is completed before the new
configuration is applied.
Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With concurrent TLB invalidations, completion wait randomly gets timed out
because cmd_sem_val was incremented outside the IOMMU spinlock, allowing
CMD_COMPL_WAIT commands to be queued out of sequence and breaking the
ordering assumption in wait_on_sem().
Move the cmd_sem_val increment under iommu->lock so completion sequence
allocation is serialized with command queuing.
And remove the unnecessary return.
Fixes: d2a0cac10597 ("iommu/amd: move wait_on_sem() out of spinlock")
Tested-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add current (iova, len) to the iotlb gather, regardless of the setting
of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.
In gather_range_pages(), the current IOVA range is only added to
iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with
NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather
will stay empty (start=ULONG_MAX, end=0) after initialization, and the
current (iova, len) will not be added to the iotlb_gather, causing
subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong
parameters (e.g., amd_iommu_iotlb_sync() computes size from
gather->end - gather->start + 1, leading to an invalid range).
The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain
unchanged: when the new range is disjoint from the existing gather,
we still sync first and then add the new range, so semantics for
NO_GAPS are preserved.
Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull smack updates from Casey Schaufler:
"Two improvements to the code for setting the CIPSO Domain Of
Interpretation (DOI), a seldom used feature, and a formatting change"
* tag 'Smack-for-7.0' of https://github.com/cschaufler/smack-next:
smack: /smack/doi: accept previously used values
smack: /smack/doi must be > 0
security: smack: fix indentation in smack_access.c
Commit 8e5d9f916a96 ("smack: deduplicate xattr setting in
smack_inode_init_security()") introduced xattr_dupval() to simplify setting
the xattrs to be provided by the SMACK LSM on inode creation, in the
smack_inode_init_security().
Unfortunately, moving lsm_get_xattr_slot() caused the SMACK64TRANSMUTE
xattr be added in the array of new xattrs before SMACK64. This causes the
HMAC of xattrs calculated by evm_init_hmac() for new files to diverge from
the one calculated by both evm_calc_hmac_or_hash() and evmctl.
evm_init_hmac() calculates the HMAC of the xattrs of new files based on the
order LSMs provide them, while evm_calc_hmac_or_hash() and evmctl calculate
the HMAC based on an ordered xattrs list.
Fix the issue by making evm_init_hmac() calculate the HMAC of new files
based on the ordered xattrs list too.
Fixes: 8e5d9f916a96 ("smack: deduplicate xattr setting in smack_inode_init_security()")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
The layer masks data structure tracks the requested but unfulfilled
access rights during an operation's security check. It stores one bit
for each combination of access right and layer index. If the bit is
set, that access right is not granted (yet) in the given layer and we
have to traverse the path further upwards to grant it.
Previously, the layer masks were stored as arrays mapping from access
right indices to layer_mask_t. The layer_mask_t value then indicates
all layers in which the given access right is still (tentatively)
denied.
This patch introduces struct layer_access_masks instead: This struct
contains an array with the access_mask_t of each (tentatively) denied
access right in that layer.
The hypothesis of this patch is that this simplifies the code enough
so that the resulting code will run faster:
* We can use bitwise operations in multiple places where we previously
looped over bits individually with macros. (Should require less
branch speculation and lends itself to better loop unrolling.)
* Code is ~75 lines smaller.
Other noteworthy changes:
* In no_more_access(), call a new helper function may_refer(), which
only solves the asymmetric case. Previously, the code interleaved
the checks for the two symmetric cases in RENAME_EXCHANGE. It feels
that the code is clearer when renames without RENAME_EXCHANGE are
more obviously the normal case.
Tradeoffs:
This change improves performance, at a slight size increase to the
layer masks data structure.
This fixes the size of the data structure at 32 bytes for all types of
access rights. (64, once we introduce a 17th filesystem access right).
For filesystem access rights, at the moment, the data structure has
the same size as before, but once we introduce the 17th filesystem
access right, it will double in size (from 32 to 64 bytes), as
access_mask_t grows from 16 to 32 bit [1].
Link: https://lore.kernel.org/all/20260120.haeCh4li9Vae@digikod.net/ [1]
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260206151154.97915-5-gnoack3000@gmail.com
[mic: Cosmetic fixes, moved struct layer_access_masks definition]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
STE in a nested case requires both S1 and S2 fields. And this makes the use
case different from the existing one.
Add coverage for previously failed cases shifting between S2-only and S1+S2
STEs.
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry — where some fields are
already zeroed while the 'Present' bit is still set — leading to
unpredictable behavior or spurious faults.
While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.
Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:
1. Clear only the 'Present' (P) bit of the context entry first to
signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.
Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.
Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>