Linux kernel ============ The Linux kernel is the core of any Linux operating system. It manages hardware, system resources, and provides the fundamental services for all other software. Quick Start ----------- * Report a bug: See Documentation/admin-guide/reporting-issues.rst * Get the latest kernel: https://kernel.org * Build the kernel: See Documentation/admin-guide/quickly-build-trimmed-linux.rst * Join the community: https://lore.kernel.org/ Essential Documentation ----------------------- All users should be familiar with: * Building requirements: Documentation/process/changes.rst * Code of Conduct: Documentation/process/code-of-conduct.rst * License: See COPYING Documentation can be built with make htmldocs or viewed online at: https://www.kernel.org/doc/html/latest/ Who Are You? ============ Find your role below: * New Kernel Developer - Getting started with kernel development * Academic Researcher - Studying kernel internals and architecture * Security Expert - Hardening and vulnerability analysis * Backport/Maintenance Engineer - Maintaining stable kernels * System Administrator - Configuring and troubleshooting * Maintainer - Leading subsystems and reviewing patches * Hardware Vendor - Writing drivers for new hardware * Distribution Maintainer - Packaging kernels for distros * AI Coding Assistant - LLMs and AI-powered development tools For Specific Users ================== New Kernel Developer -------------------- Welcome! Start your kernel development journey here: * Getting Started: Documentation/process/development-process.rst * Your First Patch: Documentation/process/submitting-patches.rst * Coding Style: Documentation/process/coding-style.rst * Build System: Documentation/kbuild/index.rst * Development Tools: Documentation/dev-tools/index.rst * Kernel Hacking Guide: Documentation/kernel-hacking/hacking.rst * Core APIs: Documentation/core-api/index.rst Academic Researcher ------------------- Explore the kernel's architecture and internals: * Researcher Guidelines: Documentation/process/researcher-guidelines.rst * Memory Management: Documentation/mm/index.rst * Scheduler: Documentation/scheduler/index.rst * Networking Stack: Documentation/networking/index.rst * Filesystems: Documentation/filesystems/index.rst * RCU (Read-Copy Update): Documentation/RCU/index.rst * Locking Primitives: Documentation/locking/index.rst * Power Management: Documentation/power/index.rst Security Expert --------------- Security documentation and hardening guides: * Security Documentation: Documentation/security/index.rst * LSM Development: Documentation/security/lsm-development.rst * Self Protection: Documentation/security/self-protection.rst * Reporting Vulnerabilities: Documentation/process/security-bugs.rst * CVE Procedures: Documentation/process/cve.rst * Embargoed Hardware Issues: Documentation/process/embargoed-hardware-issues.rst * Security Features: Documentation/userspace-api/seccomp_filter.rst Backport/Maintenance Engineer ----------------------------- Maintain and stabilize kernel versions: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * Backporting Guide: Documentation/process/backporting.rst * Applying Patches: Documentation/process/applying-patches.rst * Subsystem Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git for Maintainers: Documentation/maintainer/configure-git.rst System Administrator -------------------- Configure, tune, and troubleshoot Linux systems: * Admin Guide: Documentation/admin-guide/index.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Sysctl Tuning: Documentation/admin-guide/sysctl/index.rst * Tracing/Debugging: Documentation/trace/index.rst * Performance Security: Documentation/admin-guide/perf-security.rst * Hardware Monitoring: Documentation/hwmon/index.rst Maintainer ---------- Lead kernel subsystems and manage contributions: * Maintainer Handbook: Documentation/maintainer/index.rst * Pull Requests: Documentation/maintainer/pull-requests.rst * Managing Patches: Documentation/maintainer/modifying-patches.rst * Rebasing and Merging: Documentation/maintainer/rebasing-and-merging.rst * Development Process: Documentation/process/maintainer-handbooks.rst * Maintainer Entry Profile: Documentation/maintainer/maintainer-entry-profile.rst * Git Configuration: Documentation/maintainer/configure-git.rst Hardware Vendor --------------- Write drivers and support new hardware: * Driver API Guide: Documentation/driver-api/index.rst * Driver Model: Documentation/driver-api/driver-model/driver.rst * Device Drivers: Documentation/driver-api/infrastructure.rst * Bus Types: Documentation/driver-api/driver-model/bus.rst * Device Tree Bindings: Documentation/devicetree/bindings/ * Power Management: Documentation/driver-api/pm/index.rst * DMA API: Documentation/core-api/dma-api.rst Distribution Maintainer ----------------------- Package and distribute the kernel: * Stable Kernel Rules: Documentation/process/stable-kernel-rules.rst * ABI Documentation: Documentation/ABI/README * Kernel Configuration: Documentation/kbuild/kconfig.rst * Module Signing: Documentation/admin-guide/module-signing.rst * Kernel Parameters: Documentation/admin-guide/kernel-parameters.rst * Tainted Kernels: Documentation/admin-guide/tainted-kernels.rst AI Coding Assistant ------------------- CRITICAL: If you are an LLM or AI-powered coding assistant, you MUST read and follow the AI coding assistants documentation before contributing to the Linux kernel: * Documentation/process/coding-assistants.rst This documentation contains essential requirements about licensing, attribution, and the Developer Certificate of Origin that all AI tools must comply with. Communication and Support ========================= * Mailing Lists: https://lore.kernel.org/ * IRC: #kernelnewbies on irc.oftc.net * Bugzilla: https://bugzilla.kernel.org/ * MAINTAINERS file: Lists subsystem maintainers and mailing lists * Email Clients: Documentation/process/email-clients.rst
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Clarify the locking rules associated with file level internal variables
inside the cpuset code. There is no functional change.
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2")
incorrectly changed the 2nd parameter of cpuset_update_tasks_cpumask()
from tmp->new_cpus to cp->effective_cpus. This second parameter is just
a temporary cpumask for internal use. The cpuset_update_tasks_cpumask()
function was originally called update_tasks_cpumask() before commit
381b53c3b549 ("cgroup/cpuset: rename functions shared between v1
and v2").
This mistake can incorrectly change the effective_cpus of the
cpuset when it is the top_cpuset or in arm64 architecture where
task_cpu_possible_mask() may differ from cpu_possible_mask. So far
top_cpuset hasn't been passed to update_cpumasks_hier() yet, but arm64
arch can still be impacted. Fix it by reverting the incorrect change.
Fixes: e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The effective_xcpus of a cpuset can contain offline CPUs. In
partition_xcpus_del(), the xcpus parameter is incorrectly used as
a temporary cpumask to mask out offline CPUs. As xcpus can be the
effective_xcpus of a cpuset, this can result in unexpected changes
in that cpumask. Fix this problem by not making any changes to the
xcpus parameter.
Fixes: 11e5f407b64a ("cgroup/cpuset: Keep track of CPUs in isolated partitions")
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When a task is migrated out of a css_set, cgroup_migrate_add_task()
first moves it from cset->tasks to cset->mg_tasks via:
list_move_tail(&task->cg_list, &cset->mg_tasks);
If a css_task_iter currently has it->task_pos pointing to this task,
css_set_move_task() calls css_task_iter_skip() to keep the iterator
valid. However, since the task has already been moved to ->mg_tasks,
the iterator is advanced relative to the mg_tasks list instead of the
original tasks list. As a result, remaining tasks on cset->tasks, as
well as tasks queued on cset->mg_tasks, can be skipped by iteration.
Fix this by calling css_set_skip_task_iters() before unlinking
task->cg_list from cset->tasks. This advances all active iterators to
the next task on cset->tasks, so iteration continues correctly even
when a task is concurrently being migrated.
This race is hard to hit in practice without instrumentation, but it
can be reproduced by artificially slowing down cgroup_procs_show().
For example, on an Android device a temporary
/sys/kernel/cgroup/cgroup_test knob can be added to inject a delay
into cgroup_procs_show(), and then:
1) Spawn three long-running tasks (PIDs 101, 102, 103).
2) Create a test cgroup and move the tasks into it.
3) Enable a large delay via /sys/kernel/cgroup/cgroup_test.
4) In one shell, read cgroup.procs from the test cgroup.
5) Within the delay window, in another shell migrate PID 102 by
writing it to a different cgroup.procs file.
Under this setup, cgroup.procs can intermittently show only PID 101
while skipping PID 103. Once the migration completes, reading the
file again shows all tasks as expected.
Note that this change does not allow removing the existing
css_set_skip_task_iters() call in css_set_move_task(). The new call
in cgroup_migrate_add_task() only handles iterators that are racing
with migration while the task is still on cset->tasks. Iterators may
also start after the task has been moved to cset->mg_tasks. If we
dropped css_set_skip_task_iters() from css_set_move_task(), such
iterators could keep task_pos pointing to a migrating task, causing
css_task_iter_advance() to malfunction on the destination css_set,
up to and including crashes or infinite loops.
The race window between migration and iteration is very small, and
css_task_iter is not on a hot path. In the worst case, when an
iterator is positioned on the first thread of the migrating process,
cgroup_migrate_add_task() may have to skip multiple tasks via
css_set_skip_task_iters(). However, this only happens when migration
and iteration actually race, so the performance impact is negligible
compared to the correctness fix provided here.
Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Qingye Zhao <zhaoqingye@honor.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull networking updates from Paolo Abeni:
"Core & protocols:
- A significant effort all around the stack to guide the compiler to
make the right choice when inlining code, to avoid unneeded calls
for small helper and stack canary overhead in the fast-path.
This generates better and faster code with very small or no text
size increases, as in many cases the call generated more code than
the actual inlined helper.
- Extend AccECN implementation so that is now functionally complete,
also allow the user-space enabling it on a per network namespace
basis.
- Add support for memory providers with large (above 4K) rx buffer.
Paired with hw-gro, larger rx buffer sizes reduce the number of
buffers traversing the stack, dincreasing single stream CPU usage
by up to ~30%.
- Do not add HBH header to Big TCP GSO packets. This simplifies the
RX path, the TX path and the NIC drivers, and is possible because
user-space taps can now interpret correctly such packets without
the HBH hint.
- Allow IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified,
aligning IPv6 to IPv4 behavior.
- Multi-queue aware sch_cake. This makes it possible to scale the
rate shaper of sch_cake across multiple CPUs, while still enforcing
a single global rate on the interface.
- Add support for the nbcon (new buffer console) infrastructure to
netconsole, enabling lock-free, priority-based console operations
that are safer in crash scenarios.
- Improve the TCP ipv6 output path to cache the flow information,
saving cpu cycles, reducing cache line misses and stack use.
- Improve netfilter packet tracker to resolve clashes for most
protocols, avoiding unneeded drops on rare occasions.
- Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
- Reduce tcp socket size by one cache line.
- Notify neighbour changes atomically, avoiding inconsistencies
between the notification sequence and the actual states sequence.
- Add vsock namespace support, allowing complete isolation of vsocks
across different network namespaces.
- Improve xsk generic performances with cache-alignment-oriented
optimizations.
- Support netconsole automatic target recovery, allowing netconsole
to reestablish targets when underlying low-level interface comes
back online.
Driver API:
- Support for switching the working mode (automatic vs manual) of a
DPLL device via netlink.
- Introduce PHY ports representation to expose multiple front-facing
media ports over a single MAC.
- Introduce "rx-polarity" and "tx-polarity" device tree properties,
to generalize polarity inversion requirements for differential
signaling.
- Add helper to create, prepare and enable managed clocks.
Device drivers:
- Add Huawei hinic3 PF etherner driver.
- Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
controller.
- Add ethernet driver for MaxLinear MxL862xx switches
- Remove parallel-port Ethernet driver.
- Convert existing driver timestamp configuration reporting to
hwtstamp_get and remove legacy ioctl().
- Convert existing drivers to .get_rx_ring_count(), simplifing the RX
ring count retrieval. Also remove the legacy fallback path.
- Ethernet high-speed NICs:
- Broadcom (bnxt, bng):
- bnxt: add FW interface update to support FEC stats histogram
and NVRAM defragmentation
- bng: add TSO and H/W GRO support
- nVidia/Mellanox (mlx5):
- improve latency of channel restart operations, reducing the
used H/W resources
- add TSO support for UDP over GRE over VLAN
- add flow counters support for hardware steering (HWS) rules
- use a static memory area to store headers for H/W GRO,
leading to 12% RX tput improvement
- Intel (100G, ice, idpf):
- ice: reorganizes layout of Tx and Rx rings for cacheline
locality and utilizes __cacheline_group* macros on the new
layouts
- ice: introduces Synchronous Ethernet (SyncE) support
- Meta (fbnic):
- adds debugfs for firmware mailbox and tx/rx rings vectors
- Ethernet virtual:
- geneve: introduce GRO/GSO support for double UDP encapsulation
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- some code refactoring and cleanups
- RealTek (r8169):
- add support for RTL8127ATF (10G Fiber SFP)
- add dash and LTR support
- Airoha:
- AN8811HB 2.5 Gbps phy support
- Freescale (fec):
- add XDP zero-copy support
- Thunderbolt:
- add get link setting support to allow bonding
- Renesas:
- add support for RZ/G3L GBETH SoC
- Ethernet switches:
- Maxlinear:
- support R(G)MII slow rate configuration
- add support for Intel GSW150
- Motorcomm (yt921x):
- add DCB/QoS support
- TI:
- icssm-prueth: support bridging (STP/RSTP) via the switchdev
framework
- Ethernet PHYs:
- Realtek:
- enable SGMII and 2500Base-X in-band auto-negotiation
- simplify and reunify C22/C45 drivers
- Micrel: convert bindings to DT schema
- CAN:
- move skb headroom content into skb extensions, making CAN
metadata access more robust
- CAN drivers:
- rcar_canfd:
- add support for FD-only mode
- add support for the RZ/T2H SoC
- sja1000: cleanup the CAN state handling
- WiFi:
- implement EPPKE/802.1X over auth frames support
- split up drop reasons better, removing generic RX_DROP
- additional FTM capabilities: 6 GHz support, supported number of
spatial streams and supported number of LTF repetitions
- better mac80211 iterators to enumerate resources
- initial UHR (Wi-Fi 8) support for cfg80211/mac80211
- WiFi drivers:
- Qualcomm/Atheros:
- ath11k: support for Channel Frequency Response measurement
- ath12k: a significant driver refactor to support multi-wiphy
devices and and pave the way for future device support in the
same driver (rather than splitting to ath13k)
- ath12k: support for the QCC2072 chipset
- Intel:
- iwlwifi: partial Neighbor Awareness Networking (NAN) support
- iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
- RealTek (rtw89):
- preparations for RTL8922DE support
- Bluetooth:
- implement setsockopt(BT_PHY) to set the connection packet type/PHY
- set link_policy on incoming ACL connections
- Bluetooth drivers:
- btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
- btqca: add WCN6855 firmware priority selection feature"
* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
bnge/bng_re: Add a new HSI
net: macb: Fix tx/rx malfunction after phy link down and up
af_unix: Fix memleak of newsk in unix_stream_connect().
net: ti: icssg-prueth: Add optional dependency on HSR
net: dsa: add basic initial driver for MxL862xx switches
net: mdio: add unlocked mdiodev C45 bus accessors
net: dsa: add tag format for MxL862xx switches
dt-bindings: net: dsa: add MaxLinear MxL862xx
selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
octeontx2-pf: Unregister devlink on probe failure
net: renesas: rswitch: fix forwarding offload statemachine
ionic: Rate limit unknown xcvr type messages
tcp: inet6_csk_xmit() optimization
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
ipv6: use np->final in inet6_sk_rebuild_header()
ipv6: add daddr/final storage in struct ipv6_pinfo
net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
...
Pull devicetree updates from Rob Herring:
"DT core:
- Sync dtc/libfdt with upstream v1.7.2-62-ga26ef6400bd8
- Add a for_each_compatible_node_scoped() loop and convert users in
cpufreq, dmaengine, clk, cdx, powerpc and Arm
- Simplify of/platform.c with scoped loop helpers
- Add fw_devlink tracking for "mmc-pwrseq"
- Optimize fw_devlink callback code size for pinctrl-N properties
- Replace strcmp_suffix() with strends()
DT bindings:
- Support building single binding targets
- Convert google,goldfish-fb, cznic,turris-mox-rwtm, ti,prm-inst
- Add bindings for Freescale AVIC, Realtek RTD1xxx system
controllers, Microchip 25AA010A EEPROM, OnSemi FIN3385, IEI
WT61P803 PUZZLE, Delta Electronics DPS-800-AB power supply,
Infineon IR35221 Digital Multi-phase Controller, Infineon PXE1610
Digital Dual Output 6+1 VR12.5 & VR13 CPU Controller,
socionext,uniphier-smpctrl, and xlnx,zynqmp-firmware
- Lots of trivial binding fixes to address warnings in DTS files.
These are mostly for arm64 platforms which is getting closer to be
warning free. Some public shaming has helped.
- Fix I2C bus node names in examples
- Drop obsolete brcm,vulcan-soc binding
- Drop unreferenced binding headers"
* tag 'devicetree-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (60 commits)
dt-bindings: interrupt-controller: Add compatiblie string fsl,imx(1|25|27|31|35)-avic
dt-bindings: soc: imx: add fsl,aips and fsl,emi compatible strings
dt-bindings: display: bridge: lt8912b: Drop reset gpio requirement
dt-bindings: firmware: fsl,scu: Mark multi-channel MU layouts as deprecated
cpufreq: s5pv210: Simplify with scoped for each OF child loop
dmaengine: fsl_raid: Simplify with scoped for each OF child loop
clk: imx: imx31: Simplify with scoped for each OF child loop
clk: imx: imx27: Simplify with scoped for each OF child loop
cdx: Use mutex guard to simplify error handling
cdx: Simplify with scoped for each OF child loop
powerpc/wii: Simplify with scoped for each OF child loop
powerpc/fsp2: Simplify with scoped for each OF child loop
ARM: exynos: Simplify with scoped for each OF child loop
ARM: at91: Simplify with scoped for each OF child loop
of: Add for_each_compatible_node_scoped() helper
dt-bindings: Fix emails with spaces or missing brackets
scripts/dtc: Update to upstream version v1.7.2-62-ga26ef6400bd8
dt-bindings: crypto: inside-secure,safexcel: Mandate only ring IRQs
dt-bindings: crypto: inside-secure,safexcel: Add SoC compatibles
of: reserved_mem: Fix placement of __free() annotation
...
Merge in late fixes in preparation for the net-next PR.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull driver core updates from Danilo Krummrich:
"Bus:
- Ensure bus->match() is consistently called with the device lock
held
- Improve type safety of bus_find_device_by_acpi_dev()
Devtmpfs:
- Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
simple_strtoul()
- Avoid sparse warning by making devtmpfs_context_ops static
IOMMU:
- Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()
MAINTAINERS:
- Add the new driver-core mailing list (driver-core@lists.linux.dev)
to all relevant entries
- Add missing tree location for "FIRMWARE LOADER (request_firmware)"
- Add driver-model documentation to the "DRIVER CORE" entry
- Add missing driver-core maintainers to the "AUXILIARY BUS" entry
Misc:
- Change return type of attribute_container_register() to void; it
has always been infallible
- Do not export sysfs_change_owner(), sysfs_file_change_owner() and
device_change_owner()
- Move devres_for_each_res() from the public devres header to
drivers/base/base.h
- Do not use a static struct device for the faux bus; allocate it
dynamically
Revocable:
- Patches for the revocable synchronization primitive have been
scheduled for v7.0-rc1, but have been reverted as they need some
more refinement
Rust:
- Device:
- Support dev_printk on all device types, not just the core Device
struct; remove now-redundant .as_ref() calls in dev_* print
calls
- Devres:
- Introduce an internal reference count in Devres<T> to avoid a
deadlock condition in case of (indirect) nesting
- DMA:
- Allow drivers to tune the maximum DMA segment size via
dma_set_max_seg_size()
- I/O:
- Introduce the concept of generic I/O backends to handle
different kinds of device shared memory through a common
interface.
This enables higher-level concepts such as register
abstractions, I/O slices, and field projections to be built
generically on top.
In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
trait hierarchy for sharing a common interface supporting offset
validation and bound-checking logic between I/O backends.
- Refactor MMIO to use the common I/O backend infrastructure
- Misc:
- Add __rust_helper annotations to C helpers for inlining into
Rust code
- Use "kernel vertical" style for imports
- Replace kernel::c_str! with C string literals
- Update ARef imports to use sync::aref
- Use pin_init::zeroed() for struct auxiliary_device_id and
debugfs file_operations initialization
- Use LKMM atomic types in debugfs doc-tests
- Various minor comment and documentation fixes
- PCI:
- Implement PCI configuration space accessors using the common I/O
backend infrastructure
- Document pci::Bar device endianness assumptions
- SoC:
- Abstractions for struct soc_device and struct soc_device_attribute
- Sample driver for soc::Device"
* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
rust: devres: fix race condition due to nesting
rust: dma: add missing __rust_helper annotations
samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
Revert "revocable: Revocable resource management"
Revert "revocable: Add Kunit test cases"
Revert "selftests: revocable: Add kselftest cases"
driver core: remove device_change_owner() export
sysfs: remove exports of sysfs_*change_owner()
driver core: disable revocable code from build
revocable: Add KUnit test for concurrent access
revocable: fix SRCU index corruption by requiring caller-provided storage
revocable: Add KUnit test for provider lifetime races
revocable: Fix races in revocable_alloc() using RCU
driver core: fix inverted "locked" suffix of driver_match_device()
rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
rust: pci: re-export ConfigSpace
rust: dma: allow drivers to tune max segment size
gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
rust: auxiliary: use `pin_init::zeroed()` for device ID
rust: debugfs: use pin_init::zeroed() for file_operations
...
Add compatiblie string fsl,imx(1|25|27|31|35)-avic for i.MX3 SoCs (over 15
years old).
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260210221215.1575844-1-Frank.Li@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
The HSI is shared between the firmware and the driver and is
automatically generated.
Add a new HSI for the BNGE driver. The current HSI refers to BNXT,
which will become incompatible with ThorUltra devices as the
BNGE driver adds more features. The BNGE driver will not use the HSI
located in the bnxt folder.
Also, add an HSI for ThorUltra RoCE driver.
Changes in v3:
- Fix in bng_roce_hsi.h reported by Jakub (AI review)
https://lore.kernel.org/netdev/20260207051422.4181717-1-kuba@kernel.org/
- Add an entry in MAINTAINERS
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Link: https://patch.msgid.link/20260208172925.1861255-1-vikas.gupta@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 99537d5c476c ("net: macb: Relocate mog_init_rings() callback
from macb_mac_link_up() to macb_open()"), the mog_init_rings() callback
was moved from macb_mac_link_up() to macb_open() to resolve a deadlock
issue. However, this change introduced a tx/rx malfunction following
phy link down and up events. The issue arises from a mismatch between
the software queue->tx_head, queue->tx_tail, queue->rx_prepared_head,
and queue->rx_tail values and the hardware's internal tx/rx queue
pointers.
According to the Zynq UltraScale TRM [1], when tx/rx is disabled, the
internal tx queue pointer resets to the value in the tx queue base
address register, while the internal rx queue pointer remains unchanged.
The following is quoted from the Zynq UltraScale TRM:
When transmit is disabled, with bit [3] of the network control register
set low, the transmit-buffer queue pointer resets to point to the address
indicated by the transmit-buffer queue base address register. Disabling
receive does not have the same effect on the receive-buffer queue
pointer.
Additionally, there is no need to reset the RBQP and TBQP registers in a
phy event callback. Therefore, move macb_init_buffers() to macb_open().
In a phy link up event, the only required action is to reset the tx
software head and tail pointers to align with the hardware's behavior.
[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm
Fixes: 99537d5c476c ("net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260208-macb-init-ring-v1-1-939a32c14635@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Don't try to enable Extended Tags on VFs since that bit is Reserved
and causes misleading log messages (Håkon Bugge)
- Initialize Endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only
when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
Tag and Relaxed Ordering settings (Håkon Bugge)
Resource management:
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Fix bridge window alignment with optional resources, where
additional alignment requirement was previously lost (Ilpo
Järvinen)
- Stop over-estimating bridge window size since they are now assigned
without any gaps between them (Ilpo Järvinen)
- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
for nested bridges and endpoints (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim
failures (Ilpo Järvinen)
Driver binding:
- Fix device reference leak in pcie_port_remove_service() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Convert portdrv to use pcie_port_bus_type.probe() and .remove()
callbacks so .probe() and .remove() can eventually be removed from
struct device_driver (Uwe Kleine-König)
Error handling:
- Clear stale errors on reporting agents upon probe so they don't
look like recent errors (Lukas Wunner)
- Add generic RAS tracepoint for hotplug events (Shuai Xue)
- Add RAS tracepoint for link speed changes (Shuai Xue)
Power management:
- Avoid redundant delay on transition from D3hot to D3cold if the
device was already in D3hot (Brian Norris)
- Prevent runtime suspend until devices are fully initialized to
avoid saving incompletely configured device state (Brian Norris)
Power control:
- Add power_on/off callbacks with generic signature to pwrseq,
tc9563, and slot drivers so they can be used by pwrctrl core
(Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver
(Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the
PCI core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan
Sadhasivam)
Virtualization:
- Remove an incorrect unlock in pci_slot_trylock() error handling
(Jinhui Guo)
- Lock the bridge device for slot reset (Keith Busch)
- Enable ACS after IOMMU configuration on OF platforms so ACS is
enabled an all devices; previously the first device enumerated
(typically a Root Port) didn't have ACS enabled (Manivannan
Sadhasivam)
- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
work around hardware erratum; previously ACS SV was only
temporarily disabled, which worked for enumeration but not after
reset (Manivannan Sadhasivam)
Peer-to-peer DMA:
- Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
when removing the PCI device (Hou Tao)
- Remove incorrect p2pmem_alloc_mmap() warning about page refcount
(Hou Tao)
Endpoint framework:
- Add configfs sub-groups synchronously to avoid NULL pointer
dereference when racing with removal (Liu Song)
- Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
functions (Manikanta Maddireddy)
ASPEED PCIe controller driver:
- Add ASPEED Root Complex DT binding and driver (Jacky Chou)
Freescale i.MX6 PCIe controller driver:
- Add DT binding and driver support for an optional external refclock
in addition to the refclock from the internal PLL (Richard Zhu)
- Fix CLKREQ# control so host asserts it during enumeration and
Endpoints can use it afterwards to exit the L1.2 link state
(Richard Zhu)
NVIDIA Tegra PCIe controller driver:
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
down MSI domains to be built as modules (Aaron Kling)
- Allow pci-tegra to be built as a module (Aaron Kling)
NVIDIA Tegra194 PCIe controller driver:
- Relax Kconfig so tegra194 can be built for platforms beyond
Tegra194 (Vidya Sagar)
Qualcomm PCIe controller driver:
- Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)
- Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
schema (Krzysztof Kozlowski)
- Add DT binding and driver support for SA8255p Endpoint being
configured by firmware (Mrinmay Sarkar)
- Parse PERST# from all PCIe bridge nodes for future platforms that
will have PERST# in Switch Downstream Ports as well as in Root
Ports (Manivannan Sadhasivam)
Renesas RZ/G3S PCIe controller driver:
- Use pci_generic_config_write() since the writability provided by
the custom wrapper is unnecessary (Claudiu Beznea)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
Amaoto)
Synopsys DesignWare PCIe controller driver:
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability, to allow removal of
Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities in platforms that can't support
them, so the PCI core automatically falls back to INTx (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
for drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
link is not up to avoid an unnecessary timeout (Manivannan
Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up
IRQs to trigger enumeration instead of waiting for link to be up
because the PCI core doesn't allocate bus number space for
hierarchies that might be attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming
it dynamically, which is slow and racy with respect to other
concurrent traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Allow DWC host controller driver probe to continue if device is not
found or found but inactive; only fail when there's an error with
the link (Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
are not accessible after PME_Turn_Off, simply wait 10ms instead of
polling for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA
ranges when necessary instead of failing (Samuel Holland)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint
Controllers that can update BAR inbound address translation without
requiring EPF driver to clear/reset the BAR first, and advertise it
for DWC-based Endpoints (Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that
can map multiple independent inbound regions in a single BAR,
implement subrange mapping, advertise it for DWC-based Endpoints,
and add Endpoint selftests for it (Koichiro Den)
- Make resizable BARs work for Endpoint multi-PF configurations;
previously it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
and Address Match Mode (Aksh Garg)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound
windows weren't programmed, and ECAM-related iATU entries weren't
restored after suspend/resume, so config accesses failed (Krishna
Chaitanya Chundru)
Miscellaneous:
- Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"
* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
PCI: Disable ACS SV for IDT 0x8090 switch
PCI: Disable ACS SV for IDT 0x80b5 switch
PCI: Cache ACS Capabilities register
PCI: Enable ACS after configuring IOMMU for OF platforms
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
PCI: Use device_lock_assert() to verify device lock is held
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
PCI: Fix pci_slot_lock () device locking
PCI: Fix pci_slot_trylock() error handling
PCI: Mark Nvidia GB10 to avoid bus reset
PCI: Mark ASM1164 SATA controller to avoid bus reset
PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
PCI/PME: Replace RMW of Root Status register with direct write
PCI/AER: Clear stale errors on reporting agents upon probe
PCI: Don't claim disabled bridge windows
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
...
Commit f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc") did
attempt to optimize away the internal reference count of Devres.
However, without an internal reference count, we can't support cases
where Devres is indirectly nested, resulting into a deadlock.
Such indirect nesting easily happens in the following way:
A registration object (which is guarded by devres) hold a reference
count of an object that holds a device resource guarded by devres
itself.
For instance a drm::Registration holds a reference of a drm::Device. The
drm::Device itself holds a device resource in its private data.
When the drm::Registration is dropped by devres, and it happens that it
did hold the last reference count of the drm::Device, it also drops the
device resource, which is guarded by devres itself.
Thus, resulting into a deadlock in the Devres destructor of the device
resource, as in the following backtrace.
sysrq: Show Blocked State
task:rmmod state:D stack:0 pid:1331 tgid:1331 ppid:1330 task_flags:0x400100 flags:0x00000010
Call trace:
__switch_to+0x190/0x294 (T)
__schedule+0x878/0xf10
schedule+0x4c/0xcc
schedule_timeout+0x44/0x118
wait_for_common+0xc0/0x18c
wait_for_completion+0x18/0x24
_RINvNtCs4gKlGRWyJ5S_4core3ptr13drop_in_placeINtNtNtCsgzhNYVB7wSz_6kernel4sync3arc3ArcINtNtBN_6devres6DevresmEEECsRdyc7Hyps3_15rust_driver_pci+0x68/0xe8 [rust_driver_pci]
_RINvNvNtCsgzhNYVB7wSz_6kernel6devres16register_foreign8callbackINtNtCs4gKlGRWyJ5S_4core3pin3PinINtNtNtB6_5alloc4kbox3BoxINtNtNtB6_4sync3arc3ArcINtB4_6DevresmEENtNtB1A_9allocator7KmallocEEECsRdyc7Hyps3_15rust_driver_pci+0x34/0xc8 [rust_driver_pci]
devm_action_release+0x14/0x20
devres_release_all+0xb8/0x118
device_release_driver_internal+0x1c4/0x28c
driver_detach+0x94/0xd4
bus_remove_driver+0xdc/0x11c
driver_unregister+0x34/0x58
pci_unregister_driver+0x20/0x80
__arm64_sys_delete_module+0x1d8/0x254
invoke_syscall+0x40/0xcc
el0_svc_common+0x8c/0xd8
do_el0_svc+0x1c/0x28
el0_svc+0x54/0x1d4
el0t_64_sync_handler+0x84/0x12c
el0t_64_sync+0x198/0x19c
In order to fix this, re-introduce the internal reference count.
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Deadlock.20caused.20by.20nested.20Devres/with/571242651
Reported-by: Markus Probst <markus.probst@posteo.de>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Devres.20inside.20Devres.20stuck.20on.20cleanup/with/571239721
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/56#note_3282757
Fixes: f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260205222529.91465-1-dakr@kernel.org
[ Call clone() prior to devm_add_action(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add the fsl,aips and fsl,emi compatible strings for legacy i.MX3 SoCs
(over 15 years old).
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260201011913.2419626-1-Frank.Li@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Daniel Golle says:
====================
net: dsa: initial support for MaxLinear MxL862xx switches
This series adds very basic DSA support for the MaxLinear MxL86252
(5x 2500Base-T PHYs) and MxL86282 (8x 2500Base-T PHYs) switches.
In addition to the 2.5G TP ports both switches also come with two
SerDes interfaces which can be used either to connect external PHYs
or SFP cages, or as CPU port when using the switch with this DSA driver.
MxL862xx integrates a firmware running on an embedded processor (based on
Zephyr RTOS). Host interaction uses a simple netlink-like API transported
over MDIO/MMD.
This series includes only what's needed to pass traffic between user
ports and the CPU port: relayed MDIO to internal PHYs, basic port
enable/disable, and CPU-port special tagging.
The SerDes interface of the CPU port is automatically configured by the
switch after reset using a board-specific configuration stored together
with the firmware in the flash chip attached to the switch, so no action
is needed from the driver to setup the interface mode of the CPU port.
Also MAC settings of the PHY ports are automatically configured, which
means the driver works fine with phylink_mac_ops being all no-op stubs.
Multiple follow up series will bring support for setting up the other
SerDes PCS interface (ie. not used for the CPU port), bridge, VLAN, ...
offloading, and support for using an 802.1Q-based special tag instead of
the proprietary 8-byte tag.
====================
Link: https://patch.msgid.link/cover.1770433307.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When prepare_peercred() fails in unix_stream_connect(),
unix_release_sock() is not called for newsk, and the memory
is leaked.
Let's move prepare_peercred() before unix_create1().
Fixes: fd0a109a0f6b ("net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260207232236.2557549-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>