commits
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
Fix "declaration does not declare anything" warning by using
-fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs
that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct aes_key {
struct aes_enckey;
union aes_invkey_arch inv_k;
};
struct ns_common {
...
union {
struct ns_tree;
struct callback_head ns_rcu;
};
};
Which raise warning like below when building scx scheduler:
tools/sched_ext/build/include/vmlinux.h:50533:3: warning:
declaration does not declare anything [-Wmissing-declarations]
50533 | struct ns_tree;
| ^
Fix it by using -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_watchdog_timeout is written with WRITE_ONCE() in scx_enable():
WRITE_ONCE(scx_watchdog_timeout, timeout);
However, three read-side accesses use plain reads without the matching
READ_ONCE():
/* check_rq_for_timeouts() - L2824 */
last_runnable + scx_watchdog_timeout
/* scx_watchdog_workfn() - L2852 */
scx_watchdog_timeout / 2
/* scx_enable() - L5179 */
scx_watchdog_timeout / 2
The KCSAN documentation requires that if one accessor uses WRITE_ONCE()
to annotate lock-free access, all other accesses must also use the
appropriate accessor. Plain reads alongside WRITE_ONCE() leave the pair
incomplete and can trigger KCSAN warnings.
Note that scx_tick() already uses the correct READ_ONCE() annotation:
last_check + READ_ONCE(scx_watchdog_timeout)
Fix the three remaining plain reads to match, making all accesses to
scx_watchdog_timeout consistently annotated and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_attr_ops_show() and scx_uevent() access scx_root->ops.name directly.
This is problematic for two reasons:
1. The file-level comment explicitly identifies naked scx_root
dereferences as a temporary measure that needs to be replaced
with proper per-instance access.
2. scx_attr_events_show(), the neighboring sysfs show function in
the same group, already uses the correct pattern:
struct scx_sched *sch = container_of(kobj, struct scx_sched, kobj);
Having inconsistent access patterns in the same sysfs/uevent
group is error-prone.
The kobject embedded in struct scx_sched is initialized as:
kobject_init_and_add(&sch->kobj, &scx_ktype, NULL, "root");
so container_of(kobj, struct scx_sched, kobj) correctly retrieves
the owning scx_sched instance in both callbacks.
Replace the naked scx_root dereferences with container_of()-based
access, consistent with scx_attr_events_show() and in preparation
for proper multi-instance scx_sched support.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_bpf_dsq_nr_queued() reads dsq->nr via READ_ONCE() without holding
any lock, making dsq->nr a lock-free concurrently accessed variable.
However, dsq_mod_nr(), the sole writer of dsq->nr, only uses
WRITE_ONCE() on the write side without the matching READ_ONCE() on the
read side:
WRITE_ONCE(dsq->nr, dsq->nr + delta);
^^^^^^^
plain read -- KCSAN data race
The KCSAN documentation requires that if one accessor uses READ_ONCE()
or WRITE_ONCE() on a variable to annotate lock-free access, all other
accesses must also use the appropriate accessor. A plain read on the
right-hand side of WRITE_ONCE() leaves the pair incomplete and will
trigger KCSAN warnings.
Fix by using READ_ONCE() for the read side of the update:
WRITE_ONCE(dsq->nr, READ_ONCE(dsq->nr) + delta);
This is consistent with scx_bpf_dsq_nr_queued() and makes the
concurrent access annotation complete and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_hotplug_seq() uses strtoul() but validates the result with a
negative check (val < 0), which can never be true for an unsigned
return value.
Use the endptr mechanism to verify the entire string was consumed,
and check errno == ERANGE for overflow detection.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
SCX_EFLAG_INITIALIZED is the sole member of enum scx_exit_flags with no
explicit value, so the compiler assigns it 0. This makes the bitwise OR
in scx_ops_init() a no-op:
sch->exit_info->flags |= SCX_EFLAG_INITIALIZED; /* |= 0 */
As a result, BPF schedulers cannot distinguish whether ops.init()
completed successfully by inspecting exit_info->flags.
Assign the value 1LLU << 0 so the flag is actually set.
Fixes: f3aec2adce8d ("sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_idle_node_masks is allocated with num_possible_nodes() elements but
indexed by NUMA node IDs via for_each_node(). On systems with
non-contiguous NUMA node numbering (e.g. nodes 0 and 4), node IDs can
exceed the array size, causing out-of-bounds memory corruption.
Use nr_node_ids instead, which represents the maximum node ID range and
is the correct size for arrays indexed by node ID.
Fixes: 7c60329e3521 ("sched_ext: Add NUMA-awareness to the default idle selection policy")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_claim_exit() atomically sets exit_kind, which prevents scx_error() from
triggering further error handling. After claiming exit, the caller must kick
the helper kthread work which initiates bypass mode and teardown.
If the calling task gets preempted between claiming exit and kicking the
helper work, and the BPF scheduler fails to schedule it back (since error
handling is now disabled), the helper work is never queued, bypass mode
never activates, tasks stop being dispatched, and the system wedges.
Disable preemption across scx_claim_exit() and the subsequent work kicking
in all callers - scx_disable() and scx_vexit(). Add
lockdep_assert_preemption_disabled() to scx_claim_exit() to enforce the
requirement.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add the missing Kconfig file to tools/sched_ext/ as referenced in
the README.
Ref: https://github.com/sched-ext/scx/blob/main/kernel.config
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Sync the documentation with the upstream scx repository to
reflect the current recommended configuration.
Ref: https://github.com/sched-ext/scx/blob/main/README.md#build--install
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The header <unistd.h> is included twice in rt_stall.c. Remove the
redundant inclusion to clean up the code.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The '-f' option is defined in getopt() but not handled in the switch
statement or documented in the help text. Providing '-f' currently
triggers the default error path.
Remove it to sync the optstring with the actual implementation.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The '-p' option is defined in getopt() but not handled in the switch
statement or documented in the help text. Providing '-p' currently
triggers the default error path.
Remove it to sync the optstring with the actual implementation.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The read() call in run_test() triggers a warn_unused_result compiler
warning, which breaks the build under -Werror.
Check the return value of read() and exit the child process on failure to
satisfy the compiler and handle pipe read errors.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The runner sets exit_req on SIGINT/SIGTERM but ignores it during the
main loop. This prevents users from cleanly interrupting a test run.
Check exit_req each iteration to safely break out on exit signals.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.
Reset optind to 1 at the restart label so options are re-parsed.
Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The stats thread reads nr_vruntime_enqueues, nr_vruntime_dispatches,
nr_vruntime_failed, and nr_curr_enqueued concurrently with the main
thread writing them, with no synchronization.
Use __atomic builtins with relaxed ordering for all accesses to these
counters to eliminate the data races.
Only display accuracy is affected, not scheduling correctness.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
nr_cpu_ids / 2 produces stride 0 on a single-CPU system, which later
causes SCX_BUG_ON(i == j) to fire. Validate stride after option
parsing to also catch invalid user-supplied values via -S.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use CPU_SET_S() instead of CPU_SET() on the dynamically allocated
cpuset to avoid a potential out-of-bounds write when nr_cpu_ids
exceeds CPU_SETSIZE.
Also destroy the skeleton before returning on invalid central CPU ID
to prevent a resource leak.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reset all counters, tasks and vruntime_head list on restart.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
fcg_read_stats() had a VLA allocating 21 * nr_cpus * 8 bytes on the
stack, risking stack overflow on large CPU counts (nr_cpus can be up
to 512).
Fix by using a single heap allocation with the correct size, reusing
it across all stat indices, and freeing it at the end.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The rt_stall test measures the runtime ratio between an EXT and an RT
task pinned to the same CPU, verifying that the deadline server prevents
RT tasks from starving SCHED_EXT tasks. It expects the EXT task to get
at least 4% of CPU time.
The test is flaky because sched_stress_test() calls sleep(RUN_TIME)
immediately after fork(), without waiting for the RT child to complete
its setup (set_affinity + set_sched). If the RT child experiences
scheduling latency before completing setup, that delay eats into the
measurement window: the RT child runs for less than RUN_TIME seconds,
and the EXT task's measured ratio drops below the 4% threshold.
For example, in the failing CI run [1]:
EXT=0.140s RT=4.750s total=4.890s (expected ~5.0s)
ratio=2.86% < 4% → FAIL
The 110ms gap (5.0 - 4.89) corresponds to the RT child's setup time
being counted inside the measurement window, during which fewer
deadline server ticks fire for the EXT task.
Fix by using pipes to synchronize: each child signals the parent after
completing its setup, and the parent waits for both signals before
starting sleep(RUN_TIME). This ensures the measurement window only
counts time when both tasks are fully configured and competing.
[1] https://github.com/kernel-patches/bpf/actions/runs/21961895809/job/63442490449
Fixes: be621a76341c ("selftests/sched_ext: Add test for sched_ext dl_server")
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix three issues in scx_userland's restart path:
- exit_req is not reset on restart, causing sched_main_loop() to exit
immediately without doing any scheduling work.
- stats_printer thread handle is local to spawn_stats_thread(), making
it impossible to join from main(). Promote it to file scope.
- The stats thread continues reading skel->bss after the skeleton is
destroyed on restart, causing a use-after-free. Join the stats thread
before destroying the skeleton to ensure it has exited.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cpu set is dynamically allocated for nr_cpu_ids using CPU_ALLOC(),
so the size passed to sched_setaffinity() should be CPU_ALLOC_SIZE()
rather than sizeof(cpu_set_t). Valgrind flagged this as accessing
unaddressable bytes past the allocation.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The local cnts array in read_stats() is not initialized before being
accumulated into per-CPU stats, which may lead to reading garbage
values. Zero it out with memset alongside the existing stats array
initialization.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull networking updates from Paolo Abeni:
"Core & protocols:
- A significant effort all around the stack to guide the compiler to
make the right choice when inlining code, to avoid unneeded calls
for small helper and stack canary overhead in the fast-path.
This generates better and faster code with very small or no text
size increases, as in many cases the call generated more code than
the actual inlined helper.
- Extend AccECN implementation so that is now functionally complete,
also allow the user-space enabling it on a per network namespace
basis.
- Add support for memory providers with large (above 4K) rx buffer.
Paired with hw-gro, larger rx buffer sizes reduce the number of
buffers traversing the stack, dincreasing single stream CPU usage
by up to ~30%.
- Do not add HBH header to Big TCP GSO packets. This simplifies the
RX path, the TX path and the NIC drivers, and is possible because
user-space taps can now interpret correctly such packets without
the HBH hint.
- Allow IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified,
aligning IPv6 to IPv4 behavior.
- Multi-queue aware sch_cake. This makes it possible to scale the
rate shaper of sch_cake across multiple CPUs, while still enforcing
a single global rate on the interface.
- Add support for the nbcon (new buffer console) infrastructure to
netconsole, enabling lock-free, priority-based console operations
that are safer in crash scenarios.
- Improve the TCP ipv6 output path to cache the flow information,
saving cpu cycles, reducing cache line misses and stack use.
- Improve netfilter packet tracker to resolve clashes for most
protocols, avoiding unneeded drops on rare occasions.
- Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
- Reduce tcp socket size by one cache line.
- Notify neighbour changes atomically, avoiding inconsistencies
between the notification sequence and the actual states sequence.
- Add vsock namespace support, allowing complete isolation of vsocks
across different network namespaces.
- Improve xsk generic performances with cache-alignment-oriented
optimizations.
- Support netconsole automatic target recovery, allowing netconsole
to reestablish targets when underlying low-level interface comes
back online.
Driver API:
- Support for switching the working mode (automatic vs manual) of a
DPLL device via netlink.
- Introduce PHY ports representation to expose multiple front-facing
media ports over a single MAC.
- Introduce "rx-polarity" and "tx-polarity" device tree properties,
to generalize polarity inversion requirements for differential
signaling.
- Add helper to create, prepare and enable managed clocks.
Device drivers:
- Add Huawei hinic3 PF etherner driver.
- Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
controller.
- Add ethernet driver for MaxLinear MxL862xx switches
- Remove parallel-port Ethernet driver.
- Convert existing driver timestamp configuration reporting to
hwtstamp_get and remove legacy ioctl().
- Convert existing drivers to .get_rx_ring_count(), simplifing the RX
ring count retrieval. Also remove the legacy fallback path.
- Ethernet high-speed NICs:
- Broadcom (bnxt, bng):
- bnxt: add FW interface update to support FEC stats histogram
and NVRAM defragmentation
- bng: add TSO and H/W GRO support
- nVidia/Mellanox (mlx5):
- improve latency of channel restart operations, reducing the
used H/W resources
- add TSO support for UDP over GRE over VLAN
- add flow counters support for hardware steering (HWS) rules
- use a static memory area to store headers for H/W GRO,
leading to 12% RX tput improvement
- Intel (100G, ice, idpf):
- ice: reorganizes layout of Tx and Rx rings for cacheline
locality and utilizes __cacheline_group* macros on the new
layouts
- ice: introduces Synchronous Ethernet (SyncE) support
- Meta (fbnic):
- adds debugfs for firmware mailbox and tx/rx rings vectors
- Ethernet virtual:
- geneve: introduce GRO/GSO support for double UDP encapsulation
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- some code refactoring and cleanups
- RealTek (r8169):
- add support for RTL8127ATF (10G Fiber SFP)
- add dash and LTR support
- Airoha:
- AN8811HB 2.5 Gbps phy support
- Freescale (fec):
- add XDP zero-copy support
- Thunderbolt:
- add get link setting support to allow bonding
- Renesas:
- add support for RZ/G3L GBETH SoC
- Ethernet switches:
- Maxlinear:
- support R(G)MII slow rate configuration
- add support for Intel GSW150
- Motorcomm (yt921x):
- add DCB/QoS support
- TI:
- icssm-prueth: support bridging (STP/RSTP) via the switchdev
framework
- Ethernet PHYs:
- Realtek:
- enable SGMII and 2500Base-X in-band auto-negotiation
- simplify and reunify C22/C45 drivers
- Micrel: convert bindings to DT schema
- CAN:
- move skb headroom content into skb extensions, making CAN
metadata access more robust
- CAN drivers:
- rcar_canfd:
- add support for FD-only mode
- add support for the RZ/T2H SoC
- sja1000: cleanup the CAN state handling
- WiFi:
- implement EPPKE/802.1X over auth frames support
- split up drop reasons better, removing generic RX_DROP
- additional FTM capabilities: 6 GHz support, supported number of
spatial streams and supported number of LTF repetitions
- better mac80211 iterators to enumerate resources
- initial UHR (Wi-Fi 8) support for cfg80211/mac80211
- WiFi drivers:
- Qualcomm/Atheros:
- ath11k: support for Channel Frequency Response measurement
- ath12k: a significant driver refactor to support multi-wiphy
devices and and pave the way for future device support in the
same driver (rather than splitting to ath13k)
- ath12k: support for the QCC2072 chipset
- Intel:
- iwlwifi: partial Neighbor Awareness Networking (NAN) support
- iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
- RealTek (rtw89):
- preparations for RTL8922DE support
- Bluetooth:
- implement setsockopt(BT_PHY) to set the connection packet type/PHY
- set link_policy on incoming ACL connections
- Bluetooth drivers:
- btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
- btqca: add WCN6855 firmware priority selection feature"
* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
bnge/bng_re: Add a new HSI
net: macb: Fix tx/rx malfunction after phy link down and up
af_unix: Fix memleak of newsk in unix_stream_connect().
net: ti: icssg-prueth: Add optional dependency on HSR
net: dsa: add basic initial driver for MxL862xx switches
net: mdio: add unlocked mdiodev C45 bus accessors
net: dsa: add tag format for MxL862xx switches
dt-bindings: net: dsa: add MaxLinear MxL862xx
selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
octeontx2-pf: Unregister devlink on probe failure
net: renesas: rswitch: fix forwarding offload statemachine
ionic: Rate limit unknown xcvr type messages
tcp: inet6_csk_xmit() optimization
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
ipv6: use np->final in inet6_sk_rebuild_header()
ipv6: add daddr/final storage in struct ipv6_pinfo
net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
...
Pull devicetree updates from Rob Herring:
"DT core:
- Sync dtc/libfdt with upstream v1.7.2-62-ga26ef6400bd8
- Add a for_each_compatible_node_scoped() loop and convert users in
cpufreq, dmaengine, clk, cdx, powerpc and Arm
- Simplify of/platform.c with scoped loop helpers
- Add fw_devlink tracking for "mmc-pwrseq"
- Optimize fw_devlink callback code size for pinctrl-N properties
- Replace strcmp_suffix() with strends()
DT bindings:
- Support building single binding targets
- Convert google,goldfish-fb, cznic,turris-mox-rwtm, ti,prm-inst
- Add bindings for Freescale AVIC, Realtek RTD1xxx system
controllers, Microchip 25AA010A EEPROM, OnSemi FIN3385, IEI
WT61P803 PUZZLE, Delta Electronics DPS-800-AB power supply,
Infineon IR35221 Digital Multi-phase Controller, Infineon PXE1610
Digital Dual Output 6+1 VR12.5 & VR13 CPU Controller,
socionext,uniphier-smpctrl, and xlnx,zynqmp-firmware
- Lots of trivial binding fixes to address warnings in DTS files.
These are mostly for arm64 platforms which is getting closer to be
warning free. Some public shaming has helped.
- Fix I2C bus node names in examples
- Drop obsolete brcm,vulcan-soc binding
- Drop unreferenced binding headers"
* tag 'devicetree-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (60 commits)
dt-bindings: interrupt-controller: Add compatiblie string fsl,imx(1|25|27|31|35)-avic
dt-bindings: soc: imx: add fsl,aips and fsl,emi compatible strings
dt-bindings: display: bridge: lt8912b: Drop reset gpio requirement
dt-bindings: firmware: fsl,scu: Mark multi-channel MU layouts as deprecated
cpufreq: s5pv210: Simplify with scoped for each OF child loop
dmaengine: fsl_raid: Simplify with scoped for each OF child loop
clk: imx: imx31: Simplify with scoped for each OF child loop
clk: imx: imx27: Simplify with scoped for each OF child loop
cdx: Use mutex guard to simplify error handling
cdx: Simplify with scoped for each OF child loop
powerpc/wii: Simplify with scoped for each OF child loop
powerpc/fsp2: Simplify with scoped for each OF child loop
ARM: exynos: Simplify with scoped for each OF child loop
ARM: at91: Simplify with scoped for each OF child loop
of: Add for_each_compatible_node_scoped() helper
dt-bindings: Fix emails with spaces or missing brackets
scripts/dtc: Update to upstream version v1.7.2-62-ga26ef6400bd8
dt-bindings: crypto: inside-secure,safexcel: Mandate only ring IRQs
dt-bindings: crypto: inside-secure,safexcel: Add SoC compatibles
of: reserved_mem: Fix placement of __free() annotation
...
Merge in late fixes in preparation for the net-next PR.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull driver core updates from Danilo Krummrich:
"Bus:
- Ensure bus->match() is consistently called with the device lock
held
- Improve type safety of bus_find_device_by_acpi_dev()
Devtmpfs:
- Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
simple_strtoul()
- Avoid sparse warning by making devtmpfs_context_ops static
IOMMU:
- Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()
MAINTAINERS:
- Add the new driver-core mailing list (driver-core@lists.linux.dev)
to all relevant entries
- Add missing tree location for "FIRMWARE LOADER (request_firmware)"
- Add driver-model documentation to the "DRIVER CORE" entry
- Add missing driver-core maintainers to the "AUXILIARY BUS" entry
Misc:
- Change return type of attribute_container_register() to void; it
has always been infallible
- Do not export sysfs_change_owner(), sysfs_file_change_owner() and
device_change_owner()
- Move devres_for_each_res() from the public devres header to
drivers/base/base.h
- Do not use a static struct device for the faux bus; allocate it
dynamically
Revocable:
- Patches for the revocable synchronization primitive have been
scheduled for v7.0-rc1, but have been reverted as they need some
more refinement
Rust:
- Device:
- Support dev_printk on all device types, not just the core Device
struct; remove now-redundant .as_ref() calls in dev_* print
calls
- Devres:
- Introduce an internal reference count in Devres<T> to avoid a
deadlock condition in case of (indirect) nesting
- DMA:
- Allow drivers to tune the maximum DMA segment size via
dma_set_max_seg_size()
- I/O:
- Introduce the concept of generic I/O backends to handle
different kinds of device shared memory through a common
interface.
This enables higher-level concepts such as register
abstractions, I/O slices, and field projections to be built
generically on top.
In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
trait hierarchy for sharing a common interface supporting offset
validation and bound-checking logic between I/O backends.
- Refactor MMIO to use the common I/O backend infrastructure
- Misc:
- Add __rust_helper annotations to C helpers for inlining into
Rust code
- Use "kernel vertical" style for imports
- Replace kernel::c_str! with C string literals
- Update ARef imports to use sync::aref
- Use pin_init::zeroed() for struct auxiliary_device_id and
debugfs file_operations initialization
- Use LKMM atomic types in debugfs doc-tests
- Various minor comment and documentation fixes
- PCI:
- Implement PCI configuration space accessors using the common I/O
backend infrastructure
- Document pci::Bar device endianness assumptions
- SoC:
- Abstractions for struct soc_device and struct soc_device_attribute
- Sample driver for soc::Device"
* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
rust: devres: fix race condition due to nesting
rust: dma: add missing __rust_helper annotations
samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
Revert "revocable: Revocable resource management"
Revert "revocable: Add Kunit test cases"
Revert "selftests: revocable: Add kselftest cases"
driver core: remove device_change_owner() export
sysfs: remove exports of sysfs_*change_owner()
driver core: disable revocable code from build
revocable: Add KUnit test for concurrent access
revocable: fix SRCU index corruption by requiring caller-provided storage
revocable: Add KUnit test for provider lifetime races
revocable: Fix races in revocable_alloc() using RCU
driver core: fix inverted "locked" suffix of driver_match_device()
rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
rust: pci: re-export ConfigSpace
rust: dma: allow drivers to tune max segment size
gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
rust: auxiliary: use `pin_init::zeroed()` for device ID
rust: debugfs: use pin_init::zeroed() for file_operations
...
Add compatiblie string fsl,imx(1|25|27|31|35)-avic for i.MX3 SoCs (over 15
years old).
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260210221215.1575844-1-Frank.Li@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
The HSI is shared between the firmware and the driver and is
automatically generated.
Add a new HSI for the BNGE driver. The current HSI refers to BNXT,
which will become incompatible with ThorUltra devices as the
BNGE driver adds more features. The BNGE driver will not use the HSI
located in the bnxt folder.
Also, add an HSI for ThorUltra RoCE driver.
Changes in v3:
- Fix in bng_roce_hsi.h reported by Jakub (AI review)
https://lore.kernel.org/netdev/20260207051422.4181717-1-kuba@kernel.org/
- Add an entry in MAINTAINERS
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Link: https://patch.msgid.link/20260208172925.1861255-1-vikas.gupta@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 99537d5c476c ("net: macb: Relocate mog_init_rings() callback
from macb_mac_link_up() to macb_open()"), the mog_init_rings() callback
was moved from macb_mac_link_up() to macb_open() to resolve a deadlock
issue. However, this change introduced a tx/rx malfunction following
phy link down and up events. The issue arises from a mismatch between
the software queue->tx_head, queue->tx_tail, queue->rx_prepared_head,
and queue->rx_tail values and the hardware's internal tx/rx queue
pointers.
According to the Zynq UltraScale TRM [1], when tx/rx is disabled, the
internal tx queue pointer resets to the value in the tx queue base
address register, while the internal rx queue pointer remains unchanged.
The following is quoted from the Zynq UltraScale TRM:
When transmit is disabled, with bit [3] of the network control register
set low, the transmit-buffer queue pointer resets to point to the address
indicated by the transmit-buffer queue base address register. Disabling
receive does not have the same effect on the receive-buffer queue
pointer.
Additionally, there is no need to reset the RBQP and TBQP registers in a
phy event callback. Therefore, move macb_init_buffers() to macb_open().
In a phy link up event, the only required action is to reset the tx
software head and tail pointers to align with the hardware's behavior.
[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm
Fixes: 99537d5c476c ("net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260208-macb-init-ring-v1-1-939a32c14635@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Don't try to enable Extended Tags on VFs since that bit is Reserved
and causes misleading log messages (Håkon Bugge)
- Initialize Endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only
when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
Tag and Relaxed Ordering settings (Håkon Bugge)
Resource management:
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Fix bridge window alignment with optional resources, where
additional alignment requirement was previously lost (Ilpo
Järvinen)
- Stop over-estimating bridge window size since they are now assigned
without any gaps between them (Ilpo Järvinen)
- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
for nested bridges and endpoints (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim
failures (Ilpo Järvinen)
Driver binding:
- Fix device reference leak in pcie_port_remove_service() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Convert portdrv to use pcie_port_bus_type.probe() and .remove()
callbacks so .probe() and .remove() can eventually be removed from
struct device_driver (Uwe Kleine-König)
Error handling:
- Clear stale errors on reporting agents upon probe so they don't
look like recent errors (Lukas Wunner)
- Add generic RAS tracepoint for hotplug events (Shuai Xue)
- Add RAS tracepoint for link speed changes (Shuai Xue)
Power management:
- Avoid redundant delay on transition from D3hot to D3cold if the
device was already in D3hot (Brian Norris)
- Prevent runtime suspend until devices are fully initialized to
avoid saving incompletely configured device state (Brian Norris)
Power control:
- Add power_on/off callbacks with generic signature to pwrseq,
tc9563, and slot drivers so they can be used by pwrctrl core
(Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver
(Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the
PCI core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan
Sadhasivam)
Virtualization:
- Remove an incorrect unlock in pci_slot_trylock() error handling
(Jinhui Guo)
- Lock the bridge device for slot reset (Keith Busch)
- Enable ACS after IOMMU configuration on OF platforms so ACS is
enabled an all devices; previously the first device enumerated
(typically a Root Port) didn't have ACS enabled (Manivannan
Sadhasivam)
- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
work around hardware erratum; previously ACS SV was only
temporarily disabled, which worked for enumeration but not after
reset (Manivannan Sadhasivam)
Peer-to-peer DMA:
- Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
when removing the PCI device (Hou Tao)
- Remove incorrect p2pmem_alloc_mmap() warning about page refcount
(Hou Tao)
Endpoint framework:
- Add configfs sub-groups synchronously to avoid NULL pointer
dereference when racing with removal (Liu Song)
- Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
functions (Manikanta Maddireddy)
ASPEED PCIe controller driver:
- Add ASPEED Root Complex DT binding and driver (Jacky Chou)
Freescale i.MX6 PCIe controller driver:
- Add DT binding and driver support for an optional external refclock
in addition to the refclock from the internal PLL (Richard Zhu)
- Fix CLKREQ# control so host asserts it during enumeration and
Endpoints can use it afterwards to exit the L1.2 link state
(Richard Zhu)
NVIDIA Tegra PCIe controller driver:
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
down MSI domains to be built as modules (Aaron Kling)
- Allow pci-tegra to be built as a module (Aaron Kling)
NVIDIA Tegra194 PCIe controller driver:
- Relax Kconfig so tegra194 can be built for platforms beyond
Tegra194 (Vidya Sagar)
Qualcomm PCIe controller driver:
- Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)
- Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
schema (Krzysztof Kozlowski)
- Add DT binding and driver support for SA8255p Endpoint being
configured by firmware (Mrinmay Sarkar)
- Parse PERST# from all PCIe bridge nodes for future platforms that
will have PERST# in Switch Downstream Ports as well as in Root
Ports (Manivannan Sadhasivam)
Renesas RZ/G3S PCIe controller driver:
- Use pci_generic_config_write() since the writability provided by
the custom wrapper is unnecessary (Claudiu Beznea)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
Amaoto)
Synopsys DesignWare PCIe controller driver:
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability, to allow removal of
Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities in platforms that can't support
them, so the PCI core automatically falls back to INTx (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
for drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
link is not up to avoid an unnecessary timeout (Manivannan
Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up
IRQs to trigger enumeration instead of waiting for link to be up
because the PCI core doesn't allocate bus number space for
hierarchies that might be attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming
it dynamically, which is slow and racy with respect to other
concurrent traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Allow DWC host controller driver probe to continue if device is not
found or found but inactive; only fail when there's an error with
the link (Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
are not accessible after PME_Turn_Off, simply wait 10ms instead of
polling for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA
ranges when necessary instead of failing (Samuel Holland)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint
Controllers that can update BAR inbound address translation without
requiring EPF driver to clear/reset the BAR first, and advertise it
for DWC-based Endpoints (Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that
can map multiple independent inbound regions in a single BAR,
implement subrange mapping, advertise it for DWC-based Endpoints,
and add Endpoint selftests for it (Koichiro Den)
- Make resizable BARs work for Endpoint multi-PF configurations;
previously it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
and Address Match Mode (Aksh Garg)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound
windows weren't programmed, and ECAM-related iATU entries weren't
restored after suspend/resume, so config accesses failed (Krishna
Chaitanya Chundru)
Miscellaneous:
- Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"
* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
PCI: Disable ACS SV for IDT 0x8090 switch
PCI: Disable ACS SV for IDT 0x80b5 switch
PCI: Cache ACS Capabilities register
PCI: Enable ACS after configuring IOMMU for OF platforms
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
PCI: Use device_lock_assert() to verify device lock is held
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
PCI: Fix pci_slot_lock () device locking
PCI: Fix pci_slot_trylock() error handling
PCI: Mark Nvidia GB10 to avoid bus reset
PCI: Mark ASM1164 SATA controller to avoid bus reset
PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
PCI/PME: Replace RMW of Root Status register with direct write
PCI/AER: Clear stale errors on reporting agents upon probe
PCI: Don't claim disabled bridge windows
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
...
Commit f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc") did
attempt to optimize away the internal reference count of Devres.
However, without an internal reference count, we can't support cases
where Devres is indirectly nested, resulting into a deadlock.
Such indirect nesting easily happens in the following way:
A registration object (which is guarded by devres) hold a reference
count of an object that holds a device resource guarded by devres
itself.
For instance a drm::Registration holds a reference of a drm::Device. The
drm::Device itself holds a device resource in its private data.
When the drm::Registration is dropped by devres, and it happens that it
did hold the last reference count of the drm::Device, it also drops the
device resource, which is guarded by devres itself.
Thus, resulting into a deadlock in the Devres destructor of the device
resource, as in the following backtrace.
sysrq: Show Blocked State
task:rmmod state:D stack:0 pid:1331 tgid:1331 ppid:1330 task_flags:0x400100 flags:0x00000010
Call trace:
__switch_to+0x190/0x294 (T)
__schedule+0x878/0xf10
schedule+0x4c/0xcc
schedule_timeout+0x44/0x118
wait_for_common+0xc0/0x18c
wait_for_completion+0x18/0x24
_RINvNtCs4gKlGRWyJ5S_4core3ptr13drop_in_placeINtNtNtCsgzhNYVB7wSz_6kernel4sync3arc3ArcINtNtBN_6devres6DevresmEEECsRdyc7Hyps3_15rust_driver_pci+0x68/0xe8 [rust_driver_pci]
_RINvNvNtCsgzhNYVB7wSz_6kernel6devres16register_foreign8callbackINtNtCs4gKlGRWyJ5S_4core3pin3PinINtNtNtB6_5alloc4kbox3BoxINtNtNtB6_4sync3arc3ArcINtB4_6DevresmEENtNtB1A_9allocator7KmallocEEECsRdyc7Hyps3_15rust_driver_pci+0x34/0xc8 [rust_driver_pci]
devm_action_release+0x14/0x20
devres_release_all+0xb8/0x118
device_release_driver_internal+0x1c4/0x28c
driver_detach+0x94/0xd4
bus_remove_driver+0xdc/0x11c
driver_unregister+0x34/0x58
pci_unregister_driver+0x20/0x80
__arm64_sys_delete_module+0x1d8/0x254
invoke_syscall+0x40/0xcc
el0_svc_common+0x8c/0xd8
do_el0_svc+0x1c/0x28
el0_svc+0x54/0x1d4
el0t_64_sync_handler+0x84/0x12c
el0t_64_sync+0x198/0x19c
In order to fix this, re-introduce the internal reference count.
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Deadlock.20caused.20by.20nested.20Devres/with/571242651
Reported-by: Markus Probst <markus.probst@posteo.de>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Devres.20inside.20Devres.20stuck.20on.20cleanup/with/571239721
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/56#note_3282757
Fixes: f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260205222529.91465-1-dakr@kernel.org
[ Call clone() prior to devm_add_action(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add the fsl,aips and fsl,emi compatible strings for legacy i.MX3 SoCs
(over 15 years old).
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260201011913.2419626-1-Frank.Li@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Daniel Golle says:
====================
net: dsa: initial support for MaxLinear MxL862xx switches
This series adds very basic DSA support for the MaxLinear MxL86252
(5x 2500Base-T PHYs) and MxL86282 (8x 2500Base-T PHYs) switches.
In addition to the 2.5G TP ports both switches also come with two
SerDes interfaces which can be used either to connect external PHYs
or SFP cages, or as CPU port when using the switch with this DSA driver.
MxL862xx integrates a firmware running on an embedded processor (based on
Zephyr RTOS). Host interaction uses a simple netlink-like API transported
over MDIO/MMD.
This series includes only what's needed to pass traffic between user
ports and the CPU port: relayed MDIO to internal PHYs, basic port
enable/disable, and CPU-port special tagging.
The SerDes interface of the CPU port is automatically configured by the
switch after reset using a board-specific configuration stored together
with the firmware in the flash chip attached to the switch, so no action
is needed from the driver to setup the interface mode of the CPU port.
Also MAC settings of the PHY ports are automatically configured, which
means the driver works fine with phylink_mac_ops being all no-op stubs.
Multiple follow up series will bring support for setting up the other
SerDes PCS interface (ie. not used for the CPU port), bridge, VLAN, ...
offloading, and support for using an 802.1Q-based special tag instead of
the proprietary 8-byte tag.
====================
Link: https://patch.msgid.link/cover.1770433307.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When prepare_peercred() fails in unix_stream_connect(),
unix_release_sock() is not called for newsk, and the memory
is leaked.
Let's move prepare_peercred() before unix_create1().
Fixes: fd0a109a0f6b ("net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260207232236.2557549-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull device mapper updates from Mikulas Patocka:
- dm-verity:
- various optimizations and fixes related to forward error correction
- add a .dm-verity keyring
- dm-integrity: fix bugs with growing a device in bitmap mode
- dm-mpath:
- fix leaking fake timeout requests
- fix UAF bug caused by stale rq->bio
- fix minor bugs in device creation
- dm-core:
- fix a bug related to blkg association
- avoid unnecessary blk-crypto work on invalid keys
- dm-bufio:
- dm-bufio cleanup and optimization (reducing hash table lookups)
- various other minor fixes and cleanups
* tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
dm mpath: make pg_init_delay_msecs settable
Revert "dm: fix a race condition in retrieve_deps"
dm mpath: Add missing dm_put_device when failing to get scsi dh name
dm vdo encodings: clean up header and version functions
dm: use bio_clone_blkg_association
dm: fix excessive blk-crypto operations for invalid keys
dm-verity: fix section mismatch error
dm-unstripe: fix mapping bug when there are multiple targets in a table
dm-integrity: fix recalculation in bitmap mode
dm-bufio: avoid redundant buffer_tree lookups
dm-bufio: merge cache_put() into cache_put_and_wake()
selftests: add dm-verity keyring selftests
dm-verity: add dm-verity keyring
dm: clear cloned request bio pointer when last clone bio completes
dm-verity: fix up various workqueue-related comments
dm-verity: switch to bio_advance_iter_single()
dm-verity: consolidate the BH and normal work structs
dm: add WQ_PERCPU to alloc_workqueue users
dm-integrity: fix a typo in the code for write/discard race
dm: use READ_ONCE in dm_blk_report_zones
...
- Fix documentation typos (Shawn Lin)
- Add struct p2pdma_provider kernel doc (Leon Romanovsky)
- Remove useless devres WARN_ON() (Philipp Stanner)
* pci/misc:
PCI: Remove useless WARN_ON() from devres
PCI/P2PDMA: Add missing struct p2pdma_provider documentation
Documentation: PCI: Fix typos in msi-howto.rst
The commit d8932355f8c5 ("rust: dma: add helpers for architectures
without CONFIG_HAS_DMA") missed adding the __rust_helper annotations.
Add them.
Reported-by: Gary Guo <gary@garyguo.net>
Closes: https://lore.kernel.org/rust-for-linux/DFW4F5OSDO7A.TBUOX6RCN8G7@garyguo.net/
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260126071738.1670967-1-dirk.behme@de.bosch.com
[ Fix minor checkpatch.pl warning. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Drop the gpio reset from the list of required properties. The bridge works
fine with a reset not managed by Linux. In the driver itself the gpio is
already flagged as optional.
Signed-off-by: Dominik Haller <d.haller@phytec.de>
Link: https://patch.msgid.link/20260130205820.83189-6-d.haller@phytec.de
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Prior to this the receiver would sleep for the configured timeout,
then attempt to receive as many packets as possible. This would result
in a large burst of packets, and we don't necessarily need that many samples.
The tests now run faster.
Before
ok 12 toeplitz.test.rps_udp_ipv6
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
real 0m54.792s
user 0m12.486s
sys 0m10.887s
After
ok 12 toeplitz.test.rps_udp_ipv6
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
real 0m36.892s
user 0m4.203s
sys 0m8.314s
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20260207013018.551347-1-dimitri.daskalakis1@gmail.com
[pabeni@redhat.com: whitespaces fixes]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add very basic DSA driver for MaxLinear's MxL862xx switches.
In contrast to previous MaxLinear switches the MxL862xx has a built-in
processor that runs a sophisticated firmware based on Zephyr RTOS.
Interaction between the host and the switch hence is organized using a
software API of that firmware rather than accessing hardware registers
directly.
Add descriptions of the most basic firmware API calls to access the
built-in MDIO bus hosting the 2.5GE PHYs, basic port control as well as
setting up the CPU port.
Implement a very basic DSA driver using that API which is sufficient to
get packets flowing between the user ports and the CPU port.
The firmware offers all features one would expect from a modern switch
hardware, they are going to be added one by one in follow-up patch
series.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/ccde07e8cf33d8ae243000013b57cfaa2695e0a9.1770433307.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame
forward offload") introduced support for offloading HSR frame forwarding,
which relies on functions such as is_hsr_master() provided by the HSR
module. Although HSR provides stubs for configurations with HSR
disabled, this driver still requires an optional dependency on HSR.
Otherwise, build failures will occur when icssg-prueth is built-in
while HSR is configured as a module.
ld.lld: error: undefined symbol: is_hsr_master
>>> referenced by icssg_prueth.c:710 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:710)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:681 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:681)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:1812 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:1812)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(prueth_netdevice_event) in archive vmlinux.a
ld.lld: error: undefined symbol: hsr_get_port_ndev
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/etherneteth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:683 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:683)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced 1 more times
Fixes: 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame forward offload")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260207-icssg-dep-v3-1-8c47c1937f81@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull iommu updates from Joerg Roedel:
"Core changes:
- Rust bindings for IO-pgtable code
- IOMMU page allocation debugging support
- Disable ATS during PCI resets
Intel VT-d changes:
- Skip dev-iotlb flush for inaccessible PCIe device
- Flush cache for PASID table before using it
- Use right invalidation method for SVA and NESTED domains
- Ensure atomicity in context and PASID entry updates
AMD-Vi changes:
- Support for nested translations
- Other minor improvements
ARM-SMMU-v2 changes:
- Configure SoC-specific prefetcher settings for Qualcomm's "MDSS"
ARM-SMMU-v3 changes:
- Improve CMDQ locking fairness for pathetically small queue sizes
- Remove tracking of the IAS as this is only relevant for AArch32 and
was causing C_BAD_STE errors
- Add device-tree support for NVIDIA's CMDQV extension
- Allow some hitless transitions for the 'MEV' and 'EATS' STE fields
- Don't disable ATS for nested S1-bypass nested domains
- Additions to the kunit selftests"
* tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
iommu/amd: serialize sequence allocation under concurrent TLB invalidations
iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
iommu/tegra241-cmdqv: Decouple driver from ACPI
iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
iommu/vt-d: Fix race condition during PASID entry replacement
iommu/vt-d: Clear Present bit before tearing down context entry
iommu/vt-d: Clear Present bit before tearing down PASID entry
iommu/vt-d: Flush piotlb for SVM and Nested domain
iommu/vt-d: Flush cache for PASID table before using it
iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
rust: iommu: fix `srctree` link warning
rust: iommu: fix Rust formatting
...
"pg_init_delay_msecs X" can be passed as a feature in the multipath
table and is used to set m->pg_init_delay_msecs in parse_features().
However, alloc_multipath_stage2(), which is called after
parse_features(), resets m->pg_init_delay_msecs to its default value.
Instead, set m->pg_init_delay_msecs in alloc_multipath(), which is
called before parse_features(), to avoid overwriting a value passed in
by the table.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Cc: stable@vger.kernel.org
- Add rcar-host OF Kconfig dependency to avoid objtool no-cfi warning
(Nathan Chancellor)
* pci/controller/misc:
PCI: rcar-host: Add OF Kconfig dependency to avoid objtool no-cfi warning
PCI's devres implementation contains a WARN_ON() which served to inform
users relying on the legacy devres iomap table that this table does not
support multiple mappings per BAR.
The WARN_ON() can be regarded as useless by now, since mapping a BAR
multiple times is legal behavior and old users of pcim_iomap_table(), the
accessor function for that table, did not break in the past PCI devres
cleanup. New PCI users will hopefully notice that pcim_iomap_table() is
deprecated and are unlikely to use it for mapping the same BAR multiple
times.
Moreover, WARN_ON()s create noisy, difficult to read error messages which
can be more confusing than helpful, since they don't inform the user about
what precisely the problem is.
Remove the WARN_ON().
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251218092819.149665-2-phasta@kernel.org
The commit 600de1c008b2 ("rust: pci: remove redundant `.as_ref()` for
`dev_*` print") removed `.as_ref()` for `dev_*` prints. Nearly at the
same time the commit e62e48adf76c ("sample: rust: pci: add tests for
config space routines") was merged. Which missed this removal, then.
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260202064001.176787-1-dirk.behme@de.bosch.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The SCU MU driver has already supported the simple and efficient single-TX
and single-RX channel layout since 2021. The older multi-channel MU
configurations (tx0..tx3 and rx0..rx3) are less efficient in practice and
not needed.
Mark these legacy mbox-names and mboxes tuple layouts as deprecated in the
binding schema. The driver continues to support them for backward
compatibility in case firmware publishes the legacy properties.
The example section is updated accordingly to demonstrate the recommended
layout.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260127-scu-v2-1-03f3aaa56e1b@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Eric Dumazet says:
====================
ipv6: tcp: no longer rebuild fl6 at each transmit
TCP v6 spends a good amount of time rebuilding a fresh fl6 at each
transmit in inet6_csk_xmit()/inet6_csk_route_socket().
TCP v4 caches the information in inet->cork.fl.u.ip4 instead.
This series changes TCP v6 to behave the same, saving cpu cycles
and reducing cache line misses and stack use.
====================
Link: https://patch.msgid.link/20260206173426.1638518-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add helper inline functions __mdiodev_c45_read() and
__mdiodev_c45_write(), which are the C45 equivalents of the existing
__mdiodev_read() and __mdiodev_write() added by commit e6a45700e7e1
("net: mdio: add unlocked mdiobus and mdiodev bus accessors")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/8d1d55949a75a871d2a3b90e421de4bd58d77685.1770433307.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When probe fails after devlink registration, the missing devlink unregister
call causing a memory leak.
Fixes: 2da489432747 ("octeontx2-pf: devlink params support to set mcam entry count")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20260206182645.4032737-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull landlock updates from Mickaël Salaün:
- extend Landlock to enforce restrictions on a whole process, similarly
to the seccomp's TSYNC flag
- refactor data structures to simplify code and improve performance
- add documentation to cover missing parts
* tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
mailmap: Add entry for Mickaël Salaün
landlock: Transpose the layer masks data structure
landlock: Add access_mask_subset() helper
selftests/landlock: Add filesystem access benchmark
landlock: Document audit blocker field format
landlock: Add errata documentation section
landlock: Add backwards compatibility for restrict flags
landlock: Refactor TCP socket type check
landlock: Minor reword of docs for TCP access rights
landlock: Document LANDLOCK_RESTRICT_SELF_TSYNC
selftests/landlock: Add LANDLOCK_RESTRICT_SELF_TSYNC tests
landlock: Multithreading support for landlock_restrict_self()
This reverts commit f6007dce0cd35d634d9be91ef3515a6385dcee16.
Commit f6007dce0cd3 ("dm: fix a race condition in retrieve_deps") was
added to fix a race between retrieving the list of dm table devices and
multipath_message() modifying the list of table devices. But Commit
a48f6b82c5c4 ("dm mpath: don't call dm_get_device in multipath_message")
removed the call to dm_get_device() from multipath_message(). After that
commit, the only calls to dm_get_device() and dm_put_device() are in
target constructors and destructors, so the race with retrieve_deps() is
no longer possible.
Suggested-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
- Fix IRQ domain leak when MSI allocation fails (Haotian Zhang)
* pci/controller/xilinx:
PCI: xilinx: Fix INTx IRQ domain leak in error paths
After commit 894af4a1cde6 ("objtool: Validate kCFI calls"), compile
testing pcie-rcar-host.c with CONFIG_FINEIBT=y and CONFIG_OF=n results
in a no-cfi objtool warning in rcar_pcie_probe():
$ cat allno.config
CONFIG_CFI=y
CONFIG_COMPILE_TEST=y
CONFIG_CPU_MITIGATIONS=y
CONFIG_GENERIC_PHY=y
CONFIG_MITIGATION_RETPOLINE=y
CONFIG_MODULES=y
CONFIG_PCI=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_RCAR_HOST=y
CONFIG_X86_KERNEL_IBT=y
$ make -skj"$(nproc)" ARCH=x86_64 KCONFIG_ALLCONFIG=1 LLVM=1 clean allnoconfig vmlinux
vmlinux.o: warning: objtool: rcar_pcie_probe+0x191: no-cfi indirect call!
When CONFIG_OF is unset, of_device_get_match_data() returns NULL, so
LLVM knows this indirect call has no valid destination and drops the
kCFI setup before the call, triggering the objtool check that makes sure
all indirect calls have kCFI setup.
This driver depends on OF for probing with non-NULL data for every match
so this call will never be NULL in practice. Add a hard Kconfig
dependency on OF to avoid the warning.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510092124.O2IX0Jek-lkp@intel.com/
Closes: https://github.com/ClangBuiltLinux/linux/issues/2134
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251014-rcar_pcie_probe-avoid-nocfi-objtool-warning-v2-1-6e0204b002c6@kernel.org
Two fields in struct p2pdma_provider were not documented, which resulted
in the following kernel-doc warning:
Warning: include/linux/pci-p2pdma.h:26 struct member 'owner' not described in 'p2pdma_provider'
Warning: include/linux/pci-p2pdma.h:26 struct member 'bus_offset' not described in 'p2pdma_provider'
Repro:
$ scripts/kernel-doc -none include/linux/pci-p2pdma.h
Fixes: f58ef9d1d135 ("PCI/P2PDMA: Separate the mmap() support from the core logic")
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Closes: https://lore.kernel.org/all/20260102234033.GA246107@bhelgaas
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patch.msgid.link/20260104-fix-p2p-kdoc-v1-1-6d181233f8bc@nvidia.com
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
Fix "declaration does not declare anything" warning by using
-fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs
that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Similar to commit 835a50753579 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f480 ("bpftool: Fix build warnings
due to MS extensions")
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct aes_key {
struct aes_enckey;
union aes_invkey_arch inv_k;
};
struct ns_common {
...
union {
struct ns_tree;
struct callback_head ns_rcu;
};
};
Which raise warning like below when building scx scheduler:
tools/sched_ext/build/include/vmlinux.h:50533:3: warning:
declaration does not declare anything [-Wmissing-declarations]
50533 | struct ns_tree;
| ^
Fix it by using -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_watchdog_timeout is written with WRITE_ONCE() in scx_enable():
WRITE_ONCE(scx_watchdog_timeout, timeout);
However, three read-side accesses use plain reads without the matching
READ_ONCE():
/* check_rq_for_timeouts() - L2824 */
last_runnable + scx_watchdog_timeout
/* scx_watchdog_workfn() - L2852 */
scx_watchdog_timeout / 2
/* scx_enable() - L5179 */
scx_watchdog_timeout / 2
The KCSAN documentation requires that if one accessor uses WRITE_ONCE()
to annotate lock-free access, all other accesses must also use the
appropriate accessor. Plain reads alongside WRITE_ONCE() leave the pair
incomplete and can trigger KCSAN warnings.
Note that scx_tick() already uses the correct READ_ONCE() annotation:
last_check + READ_ONCE(scx_watchdog_timeout)
Fix the three remaining plain reads to match, making all accesses to
scx_watchdog_timeout consistently annotated and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_attr_ops_show() and scx_uevent() access scx_root->ops.name directly.
This is problematic for two reasons:
1. The file-level comment explicitly identifies naked scx_root
dereferences as a temporary measure that needs to be replaced
with proper per-instance access.
2. scx_attr_events_show(), the neighboring sysfs show function in
the same group, already uses the correct pattern:
struct scx_sched *sch = container_of(kobj, struct scx_sched, kobj);
Having inconsistent access patterns in the same sysfs/uevent
group is error-prone.
The kobject embedded in struct scx_sched is initialized as:
kobject_init_and_add(&sch->kobj, &scx_ktype, NULL, "root");
so container_of(kobj, struct scx_sched, kobj) correctly retrieves
the owning scx_sched instance in both callbacks.
Replace the naked scx_root dereferences with container_of()-based
access, consistent with scx_attr_events_show() and in preparation
for proper multi-instance scx_sched support.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_bpf_dsq_nr_queued() reads dsq->nr via READ_ONCE() without holding
any lock, making dsq->nr a lock-free concurrently accessed variable.
However, dsq_mod_nr(), the sole writer of dsq->nr, only uses
WRITE_ONCE() on the write side without the matching READ_ONCE() on the
read side:
WRITE_ONCE(dsq->nr, dsq->nr + delta);
^^^^^^^
plain read -- KCSAN data race
The KCSAN documentation requires that if one accessor uses READ_ONCE()
or WRITE_ONCE() on a variable to annotate lock-free access, all other
accesses must also use the appropriate accessor. A plain read on the
right-hand side of WRITE_ONCE() leaves the pair incomplete and will
trigger KCSAN warnings.
Fix by using READ_ONCE() for the read side of the update:
WRITE_ONCE(dsq->nr, READ_ONCE(dsq->nr) + delta);
This is consistent with scx_bpf_dsq_nr_queued() and makes the
concurrent access annotation complete and KCSAN-clean.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_hotplug_seq() uses strtoul() but validates the result with a
negative check (val < 0), which can never be true for an unsigned
return value.
Use the endptr mechanism to verify the entire string was consumed,
and check errno == ERANGE for overflow detection.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
SCX_EFLAG_INITIALIZED is the sole member of enum scx_exit_flags with no
explicit value, so the compiler assigns it 0. This makes the bitwise OR
in scx_ops_init() a no-op:
sch->exit_info->flags |= SCX_EFLAG_INITIALIZED; /* |= 0 */
As a result, BPF schedulers cannot distinguish whether ops.init()
completed successfully by inspecting exit_info->flags.
Assign the value 1LLU << 0 so the flag is actually set.
Fixes: f3aec2adce8d ("sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_idle_node_masks is allocated with num_possible_nodes() elements but
indexed by NUMA node IDs via for_each_node(). On systems with
non-contiguous NUMA node numbering (e.g. nodes 0 and 4), node IDs can
exceed the array size, causing out-of-bounds memory corruption.
Use nr_node_ids instead, which represents the maximum node ID range and
is the correct size for arrays indexed by node ID.
Fixes: 7c60329e3521 ("sched_ext: Add NUMA-awareness to the default idle selection policy")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_claim_exit() atomically sets exit_kind, which prevents scx_error() from
triggering further error handling. After claiming exit, the caller must kick
the helper kthread work which initiates bypass mode and teardown.
If the calling task gets preempted between claiming exit and kicking the
helper work, and the BPF scheduler fails to schedule it back (since error
handling is now disabled), the helper work is never queued, bypass mode
never activates, tasks stop being dispatched, and the system wedges.
Disable preemption across scx_claim_exit() and the subsequent work kicking
in all callers - scx_disable() and scx_vexit(). Add
lockdep_assert_preemption_disabled() to scx_claim_exit() to enforce the
requirement.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The '-f' option is defined in getopt() but not handled in the switch
statement or documented in the help text. Providing '-f' currently
triggers the default error path.
Remove it to sync the optstring with the actual implementation.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The '-p' option is defined in getopt() but not handled in the switch
statement or documented in the help text. Providing '-p' currently
triggers the default error path.
Remove it to sync the optstring with the actual implementation.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The read() call in run_test() triggers a warn_unused_result compiler
warning, which breaks the build under -Werror.
Check the return value of read() and exit the child process on failure to
satisfy the compiler and handle pipe read errors.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The runner sets exit_req on SIGINT/SIGTERM but ignores it during the
main loop. This prevents users from cleanly interrupting a test run.
Check exit_req each iteration to safely break out on exit signals.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.
Reset optind to 1 at the restart label so options are re-parsed.
Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The stats thread reads nr_vruntime_enqueues, nr_vruntime_dispatches,
nr_vruntime_failed, and nr_curr_enqueued concurrently with the main
thread writing them, with no synchronization.
Use __atomic builtins with relaxed ordering for all accesses to these
counters to eliminate the data races.
Only display accuracy is affected, not scheduling correctness.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use CPU_SET_S() instead of CPU_SET() on the dynamically allocated
cpuset to avoid a potential out-of-bounds write when nr_cpu_ids
exceeds CPU_SETSIZE.
Also destroy the skeleton before returning on invalid central CPU ID
to prevent a resource leak.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
fcg_read_stats() had a VLA allocating 21 * nr_cpus * 8 bytes on the
stack, risking stack overflow on large CPU counts (nr_cpus can be up
to 512).
Fix by using a single heap allocation with the correct size, reusing
it across all stat indices, and freeing it at the end.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The rt_stall test measures the runtime ratio between an EXT and an RT
task pinned to the same CPU, verifying that the deadline server prevents
RT tasks from starving SCHED_EXT tasks. It expects the EXT task to get
at least 4% of CPU time.
The test is flaky because sched_stress_test() calls sleep(RUN_TIME)
immediately after fork(), without waiting for the RT child to complete
its setup (set_affinity + set_sched). If the RT child experiences
scheduling latency before completing setup, that delay eats into the
measurement window: the RT child runs for less than RUN_TIME seconds,
and the EXT task's measured ratio drops below the 4% threshold.
For example, in the failing CI run [1]:
EXT=0.140s RT=4.750s total=4.890s (expected ~5.0s)
ratio=2.86% < 4% → FAIL
The 110ms gap (5.0 - 4.89) corresponds to the RT child's setup time
being counted inside the measurement window, during which fewer
deadline server ticks fire for the EXT task.
Fix by using pipes to synchronize: each child signals the parent after
completing its setup, and the parent waits for both signals before
starting sleep(RUN_TIME). This ensures the measurement window only
counts time when both tasks are fully configured and competing.
[1] https://github.com/kernel-patches/bpf/actions/runs/21961895809/job/63442490449
Fixes: be621a76341c ("selftests/sched_ext: Add test for sched_ext dl_server")
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix three issues in scx_userland's restart path:
- exit_req is not reset on restart, causing sched_main_loop() to exit
immediately without doing any scheduling work.
- stats_printer thread handle is local to spawn_stats_thread(), making
it impossible to join from main(). Promote it to file scope.
- The stats thread continues reading skel->bss after the skeleton is
destroyed on restart, causing a use-after-free. Join the stats thread
before destroying the skeleton to ensure it has exited.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cpu set is dynamically allocated for nr_cpu_ids using CPU_ALLOC(),
so the size passed to sched_setaffinity() should be CPU_ALLOC_SIZE()
rather than sizeof(cpu_set_t). Valgrind flagged this as accessing
unaddressable bytes past the allocation.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The local cnts array in read_stats() is not initialized before being
accumulated into per-CPU stats, which may lead to reading garbage
values. Zero it out with memset alongside the existing stats array
initialization.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull networking updates from Paolo Abeni:
"Core & protocols:
- A significant effort all around the stack to guide the compiler to
make the right choice when inlining code, to avoid unneeded calls
for small helper and stack canary overhead in the fast-path.
This generates better and faster code with very small or no text
size increases, as in many cases the call generated more code than
the actual inlined helper.
- Extend AccECN implementation so that is now functionally complete,
also allow the user-space enabling it on a per network namespace
basis.
- Add support for memory providers with large (above 4K) rx buffer.
Paired with hw-gro, larger rx buffer sizes reduce the number of
buffers traversing the stack, dincreasing single stream CPU usage
by up to ~30%.
- Do not add HBH header to Big TCP GSO packets. This simplifies the
RX path, the TX path and the NIC drivers, and is possible because
user-space taps can now interpret correctly such packets without
the HBH hint.
- Allow IPv6 routes to be configured with a gateway address that is
resolved out of a different interface than the one specified,
aligning IPv6 to IPv4 behavior.
- Multi-queue aware sch_cake. This makes it possible to scale the
rate shaper of sch_cake across multiple CPUs, while still enforcing
a single global rate on the interface.
- Add support for the nbcon (new buffer console) infrastructure to
netconsole, enabling lock-free, priority-based console operations
that are safer in crash scenarios.
- Improve the TCP ipv6 output path to cache the flow information,
saving cpu cycles, reducing cache line misses and stack use.
- Improve netfilter packet tracker to resolve clashes for most
protocols, avoiding unneeded drops on rare occasions.
- Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
- Reduce tcp socket size by one cache line.
- Notify neighbour changes atomically, avoiding inconsistencies
between the notification sequence and the actual states sequence.
- Add vsock namespace support, allowing complete isolation of vsocks
across different network namespaces.
- Improve xsk generic performances with cache-alignment-oriented
optimizations.
- Support netconsole automatic target recovery, allowing netconsole
to reestablish targets when underlying low-level interface comes
back online.
Driver API:
- Support for switching the working mode (automatic vs manual) of a
DPLL device via netlink.
- Introduce PHY ports representation to expose multiple front-facing
media ports over a single MAC.
- Introduce "rx-polarity" and "tx-polarity" device tree properties,
to generalize polarity inversion requirements for differential
signaling.
- Add helper to create, prepare and enable managed clocks.
Device drivers:
- Add Huawei hinic3 PF etherner driver.
- Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
controller.
- Add ethernet driver for MaxLinear MxL862xx switches
- Remove parallel-port Ethernet driver.
- Convert existing driver timestamp configuration reporting to
hwtstamp_get and remove legacy ioctl().
- Convert existing drivers to .get_rx_ring_count(), simplifing the RX
ring count retrieval. Also remove the legacy fallback path.
- Ethernet high-speed NICs:
- Broadcom (bnxt, bng):
- bnxt: add FW interface update to support FEC stats histogram
and NVRAM defragmentation
- bng: add TSO and H/W GRO support
- nVidia/Mellanox (mlx5):
- improve latency of channel restart operations, reducing the
used H/W resources
- add TSO support for UDP over GRE over VLAN
- add flow counters support for hardware steering (HWS) rules
- use a static memory area to store headers for H/W GRO,
leading to 12% RX tput improvement
- Intel (100G, ice, idpf):
- ice: reorganizes layout of Tx and Rx rings for cacheline
locality and utilizes __cacheline_group* macros on the new
layouts
- ice: introduces Synchronous Ethernet (SyncE) support
- Meta (fbnic):
- adds debugfs for firmware mailbox and tx/rx rings vectors
- Ethernet virtual:
- geneve: introduce GRO/GSO support for double UDP encapsulation
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- some code refactoring and cleanups
- RealTek (r8169):
- add support for RTL8127ATF (10G Fiber SFP)
- add dash and LTR support
- Airoha:
- AN8811HB 2.5 Gbps phy support
- Freescale (fec):
- add XDP zero-copy support
- Thunderbolt:
- add get link setting support to allow bonding
- Renesas:
- add support for RZ/G3L GBETH SoC
- Ethernet switches:
- Maxlinear:
- support R(G)MII slow rate configuration
- add support for Intel GSW150
- Motorcomm (yt921x):
- add DCB/QoS support
- TI:
- icssm-prueth: support bridging (STP/RSTP) via the switchdev
framework
- Ethernet PHYs:
- Realtek:
- enable SGMII and 2500Base-X in-band auto-negotiation
- simplify and reunify C22/C45 drivers
- Micrel: convert bindings to DT schema
- CAN:
- move skb headroom content into skb extensions, making CAN
metadata access more robust
- CAN drivers:
- rcar_canfd:
- add support for FD-only mode
- add support for the RZ/T2H SoC
- sja1000: cleanup the CAN state handling
- WiFi:
- implement EPPKE/802.1X over auth frames support
- split up drop reasons better, removing generic RX_DROP
- additional FTM capabilities: 6 GHz support, supported number of
spatial streams and supported number of LTF repetitions
- better mac80211 iterators to enumerate resources
- initial UHR (Wi-Fi 8) support for cfg80211/mac80211
- WiFi drivers:
- Qualcomm/Atheros:
- ath11k: support for Channel Frequency Response measurement
- ath12k: a significant driver refactor to support multi-wiphy
devices and and pave the way for future device support in the
same driver (rather than splitting to ath13k)
- ath12k: support for the QCC2072 chipset
- Intel:
- iwlwifi: partial Neighbor Awareness Networking (NAN) support
- iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
- RealTek (rtw89):
- preparations for RTL8922DE support
- Bluetooth:
- implement setsockopt(BT_PHY) to set the connection packet type/PHY
- set link_policy on incoming ACL connections
- Bluetooth drivers:
- btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
- btqca: add WCN6855 firmware priority selection feature"
* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
bnge/bng_re: Add a new HSI
net: macb: Fix tx/rx malfunction after phy link down and up
af_unix: Fix memleak of newsk in unix_stream_connect().
net: ti: icssg-prueth: Add optional dependency on HSR
net: dsa: add basic initial driver for MxL862xx switches
net: mdio: add unlocked mdiodev C45 bus accessors
net: dsa: add tag format for MxL862xx switches
dt-bindings: net: dsa: add MaxLinear MxL862xx
selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
octeontx2-pf: Unregister devlink on probe failure
net: renesas: rswitch: fix forwarding offload statemachine
ionic: Rate limit unknown xcvr type messages
tcp: inet6_csk_xmit() optimization
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
ipv6: use np->final in inet6_sk_rebuild_header()
ipv6: add daddr/final storage in struct ipv6_pinfo
net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
...
Pull devicetree updates from Rob Herring:
"DT core:
- Sync dtc/libfdt with upstream v1.7.2-62-ga26ef6400bd8
- Add a for_each_compatible_node_scoped() loop and convert users in
cpufreq, dmaengine, clk, cdx, powerpc and Arm
- Simplify of/platform.c with scoped loop helpers
- Add fw_devlink tracking for "mmc-pwrseq"
- Optimize fw_devlink callback code size for pinctrl-N properties
- Replace strcmp_suffix() with strends()
DT bindings:
- Support building single binding targets
- Convert google,goldfish-fb, cznic,turris-mox-rwtm, ti,prm-inst
- Add bindings for Freescale AVIC, Realtek RTD1xxx system
controllers, Microchip 25AA010A EEPROM, OnSemi FIN3385, IEI
WT61P803 PUZZLE, Delta Electronics DPS-800-AB power supply,
Infineon IR35221 Digital Multi-phase Controller, Infineon PXE1610
Digital Dual Output 6+1 VR12.5 & VR13 CPU Controller,
socionext,uniphier-smpctrl, and xlnx,zynqmp-firmware
- Lots of trivial binding fixes to address warnings in DTS files.
These are mostly for arm64 platforms which is getting closer to be
warning free. Some public shaming has helped.
- Fix I2C bus node names in examples
- Drop obsolete brcm,vulcan-soc binding
- Drop unreferenced binding headers"
* tag 'devicetree-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (60 commits)
dt-bindings: interrupt-controller: Add compatiblie string fsl,imx(1|25|27|31|35)-avic
dt-bindings: soc: imx: add fsl,aips and fsl,emi compatible strings
dt-bindings: display: bridge: lt8912b: Drop reset gpio requirement
dt-bindings: firmware: fsl,scu: Mark multi-channel MU layouts as deprecated
cpufreq: s5pv210: Simplify with scoped for each OF child loop
dmaengine: fsl_raid: Simplify with scoped for each OF child loop
clk: imx: imx31: Simplify with scoped for each OF child loop
clk: imx: imx27: Simplify with scoped for each OF child loop
cdx: Use mutex guard to simplify error handling
cdx: Simplify with scoped for each OF child loop
powerpc/wii: Simplify with scoped for each OF child loop
powerpc/fsp2: Simplify with scoped for each OF child loop
ARM: exynos: Simplify with scoped for each OF child loop
ARM: at91: Simplify with scoped for each OF child loop
of: Add for_each_compatible_node_scoped() helper
dt-bindings: Fix emails with spaces or missing brackets
scripts/dtc: Update to upstream version v1.7.2-62-ga26ef6400bd8
dt-bindings: crypto: inside-secure,safexcel: Mandate only ring IRQs
dt-bindings: crypto: inside-secure,safexcel: Add SoC compatibles
of: reserved_mem: Fix placement of __free() annotation
...
Pull driver core updates from Danilo Krummrich:
"Bus:
- Ensure bus->match() is consistently called with the device lock
held
- Improve type safety of bus_find_device_by_acpi_dev()
Devtmpfs:
- Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
simple_strtoul()
- Avoid sparse warning by making devtmpfs_context_ops static
IOMMU:
- Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()
MAINTAINERS:
- Add the new driver-core mailing list (driver-core@lists.linux.dev)
to all relevant entries
- Add missing tree location for "FIRMWARE LOADER (request_firmware)"
- Add driver-model documentation to the "DRIVER CORE" entry
- Add missing driver-core maintainers to the "AUXILIARY BUS" entry
Misc:
- Change return type of attribute_container_register() to void; it
has always been infallible
- Do not export sysfs_change_owner(), sysfs_file_change_owner() and
device_change_owner()
- Move devres_for_each_res() from the public devres header to
drivers/base/base.h
- Do not use a static struct device for the faux bus; allocate it
dynamically
Revocable:
- Patches for the revocable synchronization primitive have been
scheduled for v7.0-rc1, but have been reverted as they need some
more refinement
Rust:
- Device:
- Support dev_printk on all device types, not just the core Device
struct; remove now-redundant .as_ref() calls in dev_* print
calls
- Devres:
- Introduce an internal reference count in Devres<T> to avoid a
deadlock condition in case of (indirect) nesting
- DMA:
- Allow drivers to tune the maximum DMA segment size via
dma_set_max_seg_size()
- I/O:
- Introduce the concept of generic I/O backends to handle
different kinds of device shared memory through a common
interface.
This enables higher-level concepts such as register
abstractions, I/O slices, and field projections to be built
generically on top.
In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
trait hierarchy for sharing a common interface supporting offset
validation and bound-checking logic between I/O backends.
- Refactor MMIO to use the common I/O backend infrastructure
- Misc:
- Add __rust_helper annotations to C helpers for inlining into
Rust code
- Use "kernel vertical" style for imports
- Replace kernel::c_str! with C string literals
- Update ARef imports to use sync::aref
- Use pin_init::zeroed() for struct auxiliary_device_id and
debugfs file_operations initialization
- Use LKMM atomic types in debugfs doc-tests
- Various minor comment and documentation fixes
- PCI:
- Implement PCI configuration space accessors using the common I/O
backend infrastructure
- Document pci::Bar device endianness assumptions
- SoC:
- Abstractions for struct soc_device and struct soc_device_attribute
- Sample driver for soc::Device"
* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
rust: devres: fix race condition due to nesting
rust: dma: add missing __rust_helper annotations
samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
Revert "revocable: Revocable resource management"
Revert "revocable: Add Kunit test cases"
Revert "selftests: revocable: Add kselftest cases"
driver core: remove device_change_owner() export
sysfs: remove exports of sysfs_*change_owner()
driver core: disable revocable code from build
revocable: Add KUnit test for concurrent access
revocable: fix SRCU index corruption by requiring caller-provided storage
revocable: Add KUnit test for provider lifetime races
revocable: Fix races in revocable_alloc() using RCU
driver core: fix inverted "locked" suffix of driver_match_device()
rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
rust: pci: re-export ConfigSpace
rust: dma: allow drivers to tune max segment size
gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
rust: auxiliary: use `pin_init::zeroed()` for device ID
rust: debugfs: use pin_init::zeroed() for file_operations
...
The HSI is shared between the firmware and the driver and is
automatically generated.
Add a new HSI for the BNGE driver. The current HSI refers to BNXT,
which will become incompatible with ThorUltra devices as the
BNGE driver adds more features. The BNGE driver will not use the HSI
located in the bnxt folder.
Also, add an HSI for ThorUltra RoCE driver.
Changes in v3:
- Fix in bng_roce_hsi.h reported by Jakub (AI review)
https://lore.kernel.org/netdev/20260207051422.4181717-1-kuba@kernel.org/
- Add an entry in MAINTAINERS
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Link: https://patch.msgid.link/20260208172925.1861255-1-vikas.gupta@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 99537d5c476c ("net: macb: Relocate mog_init_rings() callback
from macb_mac_link_up() to macb_open()"), the mog_init_rings() callback
was moved from macb_mac_link_up() to macb_open() to resolve a deadlock
issue. However, this change introduced a tx/rx malfunction following
phy link down and up events. The issue arises from a mismatch between
the software queue->tx_head, queue->tx_tail, queue->rx_prepared_head,
and queue->rx_tail values and the hardware's internal tx/rx queue
pointers.
According to the Zynq UltraScale TRM [1], when tx/rx is disabled, the
internal tx queue pointer resets to the value in the tx queue base
address register, while the internal rx queue pointer remains unchanged.
The following is quoted from the Zynq UltraScale TRM:
When transmit is disabled, with bit [3] of the network control register
set low, the transmit-buffer queue pointer resets to point to the address
indicated by the transmit-buffer queue base address register. Disabling
receive does not have the same effect on the receive-buffer queue
pointer.
Additionally, there is no need to reset the RBQP and TBQP registers in a
phy event callback. Therefore, move macb_init_buffers() to macb_open().
In a phy link up event, the only required action is to reset the tx
software head and tail pointers to align with the hardware's behavior.
[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm
Fixes: 99537d5c476c ("net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260208-macb-init-ring-v1-1-939a32c14635@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Don't try to enable Extended Tags on VFs since that bit is Reserved
and causes misleading log messages (Håkon Bugge)
- Initialize Endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only
when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
Tag and Relaxed Ordering settings (Håkon Bugge)
Resource management:
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Fix bridge window alignment with optional resources, where
additional alignment requirement was previously lost (Ilpo
Järvinen)
- Stop over-estimating bridge window size since they are now assigned
without any gaps between them (Ilpo Järvinen)
- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
for nested bridges and endpoints (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim
failures (Ilpo Järvinen)
Driver binding:
- Fix device reference leak in pcie_port_remove_service() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Convert portdrv to use pcie_port_bus_type.probe() and .remove()
callbacks so .probe() and .remove() can eventually be removed from
struct device_driver (Uwe Kleine-König)
Error handling:
- Clear stale errors on reporting agents upon probe so they don't
look like recent errors (Lukas Wunner)
- Add generic RAS tracepoint for hotplug events (Shuai Xue)
- Add RAS tracepoint for link speed changes (Shuai Xue)
Power management:
- Avoid redundant delay on transition from D3hot to D3cold if the
device was already in D3hot (Brian Norris)
- Prevent runtime suspend until devices are fully initialized to
avoid saving incompletely configured device state (Brian Norris)
Power control:
- Add power_on/off callbacks with generic signature to pwrseq,
tc9563, and slot drivers so they can be used by pwrctrl core
(Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver
(Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the
PCI core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan
Sadhasivam)
Virtualization:
- Remove an incorrect unlock in pci_slot_trylock() error handling
(Jinhui Guo)
- Lock the bridge device for slot reset (Keith Busch)
- Enable ACS after IOMMU configuration on OF platforms so ACS is
enabled an all devices; previously the first device enumerated
(typically a Root Port) didn't have ACS enabled (Manivannan
Sadhasivam)
- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
work around hardware erratum; previously ACS SV was only
temporarily disabled, which worked for enumeration but not after
reset (Manivannan Sadhasivam)
Peer-to-peer DMA:
- Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
when removing the PCI device (Hou Tao)
- Remove incorrect p2pmem_alloc_mmap() warning about page refcount
(Hou Tao)
Endpoint framework:
- Add configfs sub-groups synchronously to avoid NULL pointer
dereference when racing with removal (Liu Song)
- Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
functions (Manikanta Maddireddy)
ASPEED PCIe controller driver:
- Add ASPEED Root Complex DT binding and driver (Jacky Chou)
Freescale i.MX6 PCIe controller driver:
- Add DT binding and driver support for an optional external refclock
in addition to the refclock from the internal PLL (Richard Zhu)
- Fix CLKREQ# control so host asserts it during enumeration and
Endpoints can use it afterwards to exit the L1.2 link state
(Richard Zhu)
NVIDIA Tegra PCIe controller driver:
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
down MSI domains to be built as modules (Aaron Kling)
- Allow pci-tegra to be built as a module (Aaron Kling)
NVIDIA Tegra194 PCIe controller driver:
- Relax Kconfig so tegra194 can be built for platforms beyond
Tegra194 (Vidya Sagar)
Qualcomm PCIe controller driver:
- Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)
- Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
schema (Krzysztof Kozlowski)
- Add DT binding and driver support for SA8255p Endpoint being
configured by firmware (Mrinmay Sarkar)
- Parse PERST# from all PCIe bridge nodes for future platforms that
will have PERST# in Switch Downstream Ports as well as in Root
Ports (Manivannan Sadhasivam)
Renesas RZ/G3S PCIe controller driver:
- Use pci_generic_config_write() since the writability provided by
the custom wrapper is unnecessary (Claudiu Beznea)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
Amaoto)
Synopsys DesignWare PCIe controller driver:
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability, to allow removal of
Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities in platforms that can't support
them, so the PCI core automatically falls back to INTx (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
for drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
link is not up to avoid an unnecessary timeout (Manivannan
Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up
IRQs to trigger enumeration instead of waiting for link to be up
because the PCI core doesn't allocate bus number space for
hierarchies that might be attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming
it dynamically, which is slow and racy with respect to other
concurrent traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Allow DWC host controller driver probe to continue if device is not
found or found but inactive; only fail when there's an error with
the link (Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
are not accessible after PME_Turn_Off, simply wait 10ms instead of
polling for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA
ranges when necessary instead of failing (Samuel Holland)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint
Controllers that can update BAR inbound address translation without
requiring EPF driver to clear/reset the BAR first, and advertise it
for DWC-based Endpoints (Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that
can map multiple independent inbound regions in a single BAR,
implement subrange mapping, advertise it for DWC-based Endpoints,
and add Endpoint selftests for it (Koichiro Den)
- Make resizable BARs work for Endpoint multi-PF configurations;
previously it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
and Address Match Mode (Aksh Garg)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound
windows weren't programmed, and ECAM-related iATU entries weren't
restored after suspend/resume, so config accesses failed (Krishna
Chaitanya Chundru)
Miscellaneous:
- Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"
* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
PCI: Disable ACS SV for IDT 0x8090 switch
PCI: Disable ACS SV for IDT 0x80b5 switch
PCI: Cache ACS Capabilities register
PCI: Enable ACS after configuring IOMMU for OF platforms
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
PCI: Use device_lock_assert() to verify device lock is held
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
PCI: Fix pci_slot_lock () device locking
PCI: Fix pci_slot_trylock() error handling
PCI: Mark Nvidia GB10 to avoid bus reset
PCI: Mark ASM1164 SATA controller to avoid bus reset
PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
PCI/PME: Replace RMW of Root Status register with direct write
PCI/AER: Clear stale errors on reporting agents upon probe
PCI: Don't claim disabled bridge windows
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
...
Commit f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc") did
attempt to optimize away the internal reference count of Devres.
However, without an internal reference count, we can't support cases
where Devres is indirectly nested, resulting into a deadlock.
Such indirect nesting easily happens in the following way:
A registration object (which is guarded by devres) hold a reference
count of an object that holds a device resource guarded by devres
itself.
For instance a drm::Registration holds a reference of a drm::Device. The
drm::Device itself holds a device resource in its private data.
When the drm::Registration is dropped by devres, and it happens that it
did hold the last reference count of the drm::Device, it also drops the
device resource, which is guarded by devres itself.
Thus, resulting into a deadlock in the Devres destructor of the device
resource, as in the following backtrace.
sysrq: Show Blocked State
task:rmmod state:D stack:0 pid:1331 tgid:1331 ppid:1330 task_flags:0x400100 flags:0x00000010
Call trace:
__switch_to+0x190/0x294 (T)
__schedule+0x878/0xf10
schedule+0x4c/0xcc
schedule_timeout+0x44/0x118
wait_for_common+0xc0/0x18c
wait_for_completion+0x18/0x24
_RINvNtCs4gKlGRWyJ5S_4core3ptr13drop_in_placeINtNtNtCsgzhNYVB7wSz_6kernel4sync3arc3ArcINtNtBN_6devres6DevresmEEECsRdyc7Hyps3_15rust_driver_pci+0x68/0xe8 [rust_driver_pci]
_RINvNvNtCsgzhNYVB7wSz_6kernel6devres16register_foreign8callbackINtNtCs4gKlGRWyJ5S_4core3pin3PinINtNtNtB6_5alloc4kbox3BoxINtNtNtB6_4sync3arc3ArcINtB4_6DevresmEENtNtB1A_9allocator7KmallocEEECsRdyc7Hyps3_15rust_driver_pci+0x34/0xc8 [rust_driver_pci]
devm_action_release+0x14/0x20
devres_release_all+0xb8/0x118
device_release_driver_internal+0x1c4/0x28c
driver_detach+0x94/0xd4
bus_remove_driver+0xdc/0x11c
driver_unregister+0x34/0x58
pci_unregister_driver+0x20/0x80
__arm64_sys_delete_module+0x1d8/0x254
invoke_syscall+0x40/0xcc
el0_svc_common+0x8c/0xd8
do_el0_svc+0x1c/0x28
el0_svc+0x54/0x1d4
el0t_64_sync_handler+0x84/0x12c
el0t_64_sync+0x198/0x19c
In order to fix this, re-introduce the internal reference count.
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Deadlock.20caused.20by.20nested.20Devres/with/571242651
Reported-by: Markus Probst <markus.probst@posteo.de>
Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20Devres.20inside.20Devres.20stuck.20on.20cleanup/with/571239721
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/56#note_3282757
Fixes: f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patch.msgid.link/20260205222529.91465-1-dakr@kernel.org
[ Call clone() prior to devm_add_action(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Daniel Golle says:
====================
net: dsa: initial support for MaxLinear MxL862xx switches
This series adds very basic DSA support for the MaxLinear MxL86252
(5x 2500Base-T PHYs) and MxL86282 (8x 2500Base-T PHYs) switches.
In addition to the 2.5G TP ports both switches also come with two
SerDes interfaces which can be used either to connect external PHYs
or SFP cages, or as CPU port when using the switch with this DSA driver.
MxL862xx integrates a firmware running on an embedded processor (based on
Zephyr RTOS). Host interaction uses a simple netlink-like API transported
over MDIO/MMD.
This series includes only what's needed to pass traffic between user
ports and the CPU port: relayed MDIO to internal PHYs, basic port
enable/disable, and CPU-port special tagging.
The SerDes interface of the CPU port is automatically configured by the
switch after reset using a board-specific configuration stored together
with the firmware in the flash chip attached to the switch, so no action
is needed from the driver to setup the interface mode of the CPU port.
Also MAC settings of the PHY ports are automatically configured, which
means the driver works fine with phylink_mac_ops being all no-op stubs.
Multiple follow up series will bring support for setting up the other
SerDes PCS interface (ie. not used for the CPU port), bridge, VLAN, ...
offloading, and support for using an 802.1Q-based special tag instead of
the proprietary 8-byte tag.
====================
Link: https://patch.msgid.link/cover.1770433307.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When prepare_peercred() fails in unix_stream_connect(),
unix_release_sock() is not called for newsk, and the memory
is leaked.
Let's move prepare_peercred() before unix_create1().
Fixes: fd0a109a0f6b ("net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260207232236.2557549-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull device mapper updates from Mikulas Patocka:
- dm-verity:
- various optimizations and fixes related to forward error correction
- add a .dm-verity keyring
- dm-integrity: fix bugs with growing a device in bitmap mode
- dm-mpath:
- fix leaking fake timeout requests
- fix UAF bug caused by stale rq->bio
- fix minor bugs in device creation
- dm-core:
- fix a bug related to blkg association
- avoid unnecessary blk-crypto work on invalid keys
- dm-bufio:
- dm-bufio cleanup and optimization (reducing hash table lookups)
- various other minor fixes and cleanups
* tag 'for-7.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
dm mpath: make pg_init_delay_msecs settable
Revert "dm: fix a race condition in retrieve_deps"
dm mpath: Add missing dm_put_device when failing to get scsi dh name
dm vdo encodings: clean up header and version functions
dm: use bio_clone_blkg_association
dm: fix excessive blk-crypto operations for invalid keys
dm-verity: fix section mismatch error
dm-unstripe: fix mapping bug when there are multiple targets in a table
dm-integrity: fix recalculation in bitmap mode
dm-bufio: avoid redundant buffer_tree lookups
dm-bufio: merge cache_put() into cache_put_and_wake()
selftests: add dm-verity keyring selftests
dm-verity: add dm-verity keyring
dm: clear cloned request bio pointer when last clone bio completes
dm-verity: fix up various workqueue-related comments
dm-verity: switch to bio_advance_iter_single()
dm-verity: consolidate the BH and normal work structs
dm: add WQ_PERCPU to alloc_workqueue users
dm-integrity: fix a typo in the code for write/discard race
dm: use READ_ONCE in dm_blk_report_zones
...
- Fix documentation typos (Shawn Lin)
- Add struct p2pdma_provider kernel doc (Leon Romanovsky)
- Remove useless devres WARN_ON() (Philipp Stanner)
* pci/misc:
PCI: Remove useless WARN_ON() from devres
PCI/P2PDMA: Add missing struct p2pdma_provider documentation
Documentation: PCI: Fix typos in msi-howto.rst
The commit d8932355f8c5 ("rust: dma: add helpers for architectures
without CONFIG_HAS_DMA") missed adding the __rust_helper annotations.
Add them.
Reported-by: Gary Guo <gary@garyguo.net>
Closes: https://lore.kernel.org/rust-for-linux/DFW4F5OSDO7A.TBUOX6RCN8G7@garyguo.net/
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260126071738.1670967-1-dirk.behme@de.bosch.com
[ Fix minor checkpatch.pl warning. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Drop the gpio reset from the list of required properties. The bridge works
fine with a reset not managed by Linux. In the driver itself the gpio is
already flagged as optional.
Signed-off-by: Dominik Haller <d.haller@phytec.de>
Link: https://patch.msgid.link/20260130205820.83189-6-d.haller@phytec.de
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Prior to this the receiver would sleep for the configured timeout,
then attempt to receive as many packets as possible. This would result
in a large burst of packets, and we don't necessarily need that many samples.
The tests now run faster.
Before
ok 12 toeplitz.test.rps_udp_ipv6
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
real 0m54.792s
user 0m12.486s
sys 0m10.887s
After
ok 12 toeplitz.test.rps_udp_ipv6
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
real 0m36.892s
user 0m4.203s
sys 0m8.314s
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20260207013018.551347-1-dimitri.daskalakis1@gmail.com
[pabeni@redhat.com: whitespaces fixes]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add very basic DSA driver for MaxLinear's MxL862xx switches.
In contrast to previous MaxLinear switches the MxL862xx has a built-in
processor that runs a sophisticated firmware based on Zephyr RTOS.
Interaction between the host and the switch hence is organized using a
software API of that firmware rather than accessing hardware registers
directly.
Add descriptions of the most basic firmware API calls to access the
built-in MDIO bus hosting the 2.5GE PHYs, basic port control as well as
setting up the CPU port.
Implement a very basic DSA driver using that API which is sufficient to
get packets flowing between the user ports and the CPU port.
The firmware offers all features one would expect from a modern switch
hardware, they are going to be added one by one in follow-up patch
series.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/ccde07e8cf33d8ae243000013b57cfaa2695e0a9.1770433307.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame
forward offload") introduced support for offloading HSR frame forwarding,
which relies on functions such as is_hsr_master() provided by the HSR
module. Although HSR provides stubs for configurations with HSR
disabled, this driver still requires an optional dependency on HSR.
Otherwise, build failures will occur when icssg-prueth is built-in
while HSR is configured as a module.
ld.lld: error: undefined symbol: is_hsr_master
>>> referenced by icssg_prueth.c:710 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:710)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:681 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:681)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:1812 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:1812)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(prueth_netdevice_event) in archive vmlinux.a
ld.lld: error: undefined symbol: hsr_get_port_ndev
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/etherneteth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:683 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:683)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced 1 more times
Fixes: 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame forward offload")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260207-icssg-dep-v3-1-8c47c1937f81@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull iommu updates from Joerg Roedel:
"Core changes:
- Rust bindings for IO-pgtable code
- IOMMU page allocation debugging support
- Disable ATS during PCI resets
Intel VT-d changes:
- Skip dev-iotlb flush for inaccessible PCIe device
- Flush cache for PASID table before using it
- Use right invalidation method for SVA and NESTED domains
- Ensure atomicity in context and PASID entry updates
AMD-Vi changes:
- Support for nested translations
- Other minor improvements
ARM-SMMU-v2 changes:
- Configure SoC-specific prefetcher settings for Qualcomm's "MDSS"
ARM-SMMU-v3 changes:
- Improve CMDQ locking fairness for pathetically small queue sizes
- Remove tracking of the IAS as this is only relevant for AArch32 and
was causing C_BAD_STE errors
- Add device-tree support for NVIDIA's CMDQV extension
- Allow some hitless transitions for the 'MEV' and 'EATS' STE fields
- Don't disable ATS for nested S1-bypass nested domains
- Additions to the kunit selftests"
* tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
iommu/amd: serialize sequence allocation under concurrent TLB invalidations
iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
iommu/tegra241-cmdqv: Decouple driver from ACPI
iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
iommu/vt-d: Fix race condition during PASID entry replacement
iommu/vt-d: Clear Present bit before tearing down context entry
iommu/vt-d: Clear Present bit before tearing down PASID entry
iommu/vt-d: Flush piotlb for SVM and Nested domain
iommu/vt-d: Flush cache for PASID table before using it
iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
rust: iommu: fix `srctree` link warning
rust: iommu: fix Rust formatting
...
"pg_init_delay_msecs X" can be passed as a feature in the multipath
table and is used to set m->pg_init_delay_msecs in parse_features().
However, alloc_multipath_stage2(), which is called after
parse_features(), resets m->pg_init_delay_msecs to its default value.
Instead, set m->pg_init_delay_msecs in alloc_multipath(), which is
called before parse_features(), to avoid overwriting a value passed in
by the table.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Cc: stable@vger.kernel.org
PCI's devres implementation contains a WARN_ON() which served to inform
users relying on the legacy devres iomap table that this table does not
support multiple mappings per BAR.
The WARN_ON() can be regarded as useless by now, since mapping a BAR
multiple times is legal behavior and old users of pcim_iomap_table(), the
accessor function for that table, did not break in the past PCI devres
cleanup. New PCI users will hopefully notice that pcim_iomap_table() is
deprecated and are unlikely to use it for mapping the same BAR multiple
times.
Moreover, WARN_ON()s create noisy, difficult to read error messages which
can be more confusing than helpful, since they don't inform the user about
what precisely the problem is.
Remove the WARN_ON().
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251218092819.149665-2-phasta@kernel.org
The commit 600de1c008b2 ("rust: pci: remove redundant `.as_ref()` for
`dev_*` print") removed `.as_ref()` for `dev_*` prints. Nearly at the
same time the commit e62e48adf76c ("sample: rust: pci: add tests for
config space routines") was merged. Which missed this removal, then.
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260202064001.176787-1-dirk.behme@de.bosch.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The SCU MU driver has already supported the simple and efficient single-TX
and single-RX channel layout since 2021. The older multi-channel MU
configurations (tx0..tx3 and rx0..rx3) are less efficient in practice and
not needed.
Mark these legacy mbox-names and mboxes tuple layouts as deprecated in the
binding schema. The driver continues to support them for backward
compatibility in case firmware publishes the legacy properties.
The example section is updated accordingly to demonstrate the recommended
layout.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260127-scu-v2-1-03f3aaa56e1b@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Eric Dumazet says:
====================
ipv6: tcp: no longer rebuild fl6 at each transmit
TCP v6 spends a good amount of time rebuilding a fresh fl6 at each
transmit in inet6_csk_xmit()/inet6_csk_route_socket().
TCP v4 caches the information in inet->cork.fl.u.ip4 instead.
This series changes TCP v6 to behave the same, saving cpu cycles
and reducing cache line misses and stack use.
====================
Link: https://patch.msgid.link/20260206173426.1638518-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add helper inline functions __mdiodev_c45_read() and
__mdiodev_c45_write(), which are the C45 equivalents of the existing
__mdiodev_read() and __mdiodev_write() added by commit e6a45700e7e1
("net: mdio: add unlocked mdiobus and mdiodev bus accessors")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/8d1d55949a75a871d2a3b90e421de4bd58d77685.1770433307.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When probe fails after devlink registration, the missing devlink unregister
call causing a memory leak.
Fixes: 2da489432747 ("octeontx2-pf: devlink params support to set mcam entry count")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20260206182645.4032737-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull landlock updates from Mickaël Salaün:
- extend Landlock to enforce restrictions on a whole process, similarly
to the seccomp's TSYNC flag
- refactor data structures to simplify code and improve performance
- add documentation to cover missing parts
* tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
mailmap: Add entry for Mickaël Salaün
landlock: Transpose the layer masks data structure
landlock: Add access_mask_subset() helper
selftests/landlock: Add filesystem access benchmark
landlock: Document audit blocker field format
landlock: Add errata documentation section
landlock: Add backwards compatibility for restrict flags
landlock: Refactor TCP socket type check
landlock: Minor reword of docs for TCP access rights
landlock: Document LANDLOCK_RESTRICT_SELF_TSYNC
selftests/landlock: Add LANDLOCK_RESTRICT_SELF_TSYNC tests
landlock: Multithreading support for landlock_restrict_self()
This reverts commit f6007dce0cd35d634d9be91ef3515a6385dcee16.
Commit f6007dce0cd3 ("dm: fix a race condition in retrieve_deps") was
added to fix a race between retrieving the list of dm table devices and
multipath_message() modifying the list of table devices. But Commit
a48f6b82c5c4 ("dm mpath: don't call dm_get_device in multipath_message")
removed the call to dm_get_device() from multipath_message(). After that
commit, the only calls to dm_get_device() and dm_put_device() are in
target constructors and destructors, so the race with retrieve_deps() is
no longer possible.
Suggested-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
After commit 894af4a1cde6 ("objtool: Validate kCFI calls"), compile
testing pcie-rcar-host.c with CONFIG_FINEIBT=y and CONFIG_OF=n results
in a no-cfi objtool warning in rcar_pcie_probe():
$ cat allno.config
CONFIG_CFI=y
CONFIG_COMPILE_TEST=y
CONFIG_CPU_MITIGATIONS=y
CONFIG_GENERIC_PHY=y
CONFIG_MITIGATION_RETPOLINE=y
CONFIG_MODULES=y
CONFIG_PCI=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_RCAR_HOST=y
CONFIG_X86_KERNEL_IBT=y
$ make -skj"$(nproc)" ARCH=x86_64 KCONFIG_ALLCONFIG=1 LLVM=1 clean allnoconfig vmlinux
vmlinux.o: warning: objtool: rcar_pcie_probe+0x191: no-cfi indirect call!
When CONFIG_OF is unset, of_device_get_match_data() returns NULL, so
LLVM knows this indirect call has no valid destination and drops the
kCFI setup before the call, triggering the objtool check that makes sure
all indirect calls have kCFI setup.
This driver depends on OF for probing with non-NULL data for every match
so this call will never be NULL in practice. Add a hard Kconfig
dependency on OF to avoid the warning.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510092124.O2IX0Jek-lkp@intel.com/
Closes: https://github.com/ClangBuiltLinux/linux/issues/2134
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251014-rcar_pcie_probe-avoid-nocfi-objtool-warning-v2-1-6e0204b002c6@kernel.org
Two fields in struct p2pdma_provider were not documented, which resulted
in the following kernel-doc warning:
Warning: include/linux/pci-p2pdma.h:26 struct member 'owner' not described in 'p2pdma_provider'
Warning: include/linux/pci-p2pdma.h:26 struct member 'bus_offset' not described in 'p2pdma_provider'
Repro:
$ scripts/kernel-doc -none include/linux/pci-p2pdma.h
Fixes: f58ef9d1d135 ("PCI/P2PDMA: Separate the mmap() support from the core logic")
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Closes: https://lore.kernel.org/all/20260102234033.GA246107@bhelgaas
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patch.msgid.link/20260104-fix-p2p-kdoc-v1-1-6d181233f8bc@nvidia.com