Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

net: dsa: properly keep track of conduit reference

Problem description
-------------------

DSA has a mumbo-jumbo of reference handling of the conduit net device
and its kobject which, sadly, is just wrong and doesn't make sense.

There are two distinct problems.

1. The OF path, which uses of_find_net_device_by_node(), never releases
the elevated refcount on the conduit's kobject. Nominally, the OF and
non-OF paths should result in objects having identical reference
counts taken, and it is already suspicious that
dsa_dev_to_net_device() has a put_device() call which is missing in
dsa_port_parse_of(), but we can actually even verify that an issue
exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command
"before" and "after" applying this patch:

(unbind the conduit driver for net device eno2)
echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind

we see these lines in the output diff which appear only with the patch
applied:

kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)

2. After we find the conduit interface one way (OF) or another (non-OF),
it can get unregistered at any time, and DSA remains with a long-lived,
but in this case stale, cpu_dp->conduit pointer. Holding the net
device's underlying kobject isn't actually of much help, it just
prevents it from being freed (but we never need that kobject
directly). What helps us to prevent the net device from being
unregistered is the parallel netdev reference mechanism (dev_hold()
and dev_put()).

Actually we actually use that netdev tracker mechanism implicitly on
user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
But time still passes at DSA switch probe time between the initial
of_find_net_device_by_node() code and the user port creation time, time
during which the conduit could unregister itself and DSA wouldn't know
about it.

So we have to run of_find_net_device_by_node() under rtnl_lock() to
prevent that from happening, and release the lock only with the netdev
tracker having acquired the reference.

Do we need to keep the reference until dsa_unregister_switch() /
dsa_switch_shutdown()?
1: Maybe yes. A switch device will still be registered even if all user
ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
make user port errors fatal"), and the cpu_dp->conduit pointers
remain valid. I haven't audited all call paths to see whether they
will actually use the conduit in lack of any user port, but if they
do, it seems safer to not rely on user ports for that reference.
2. Definitely yes. We support changing the conduit which a user port is
associated to, and we can get into a situation where we've moved all
user ports away from a conduit, thus no longer hold any reference to
it via the net device tracker. But we shouldn't let it go nonetheless
- see the next change in relation to dsa_tree_find_first_conduit()
and LAG conduits which disappear.
We have to be prepared to return to the physical conduit, so the CPU
port must explicitly keep another reference to it. This is also to
say: the user ports and their CPU ports may not always keep a
reference to the same conduit net device, and both are needed.

As for the conduit's kobject for the /sys/class/net/ entry, we don't
care about it, we can release it as soon as we hold the net device
object itself.

History and blame attribution
-----------------------------

The code has been refactored so many times, it is very difficult to
follow and properly attribute a blame, but I'll try to make a short
history which I hope to be correct.

We have two distinct probing paths:
- one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
new binding implementation")
- one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
Add support for platform data")

These are both complete rewrites of the original probing paths (which
used struct dsa_switch_driver and other weird stuff, instead of regular
devices on their respective buses for register access, like MDIO, SPI,
I2C etc):
- one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
device tree bindings to register DSA switches")
- one for non-OF, introduced in 2008 in commit 91da11f870f0 ("net:
Distributed Switch Architecture protocol support")

except for tiny bits and pieces like dsa_dev_to_net_device() which were
seemingly carried over since the original commit, and used to this day.

The point is that the original probing paths received a fix in 2015 in
the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
dev_put() calls"), but the fix never made it into the "new" (dsa2)
probing paths that can still be traced to today, and the fixed probing
path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
legacy probing support").

That is to say, the new probing paths were never quite correct in this
area.

The existence of the legacy probing support which was deleted in 2019
explains why dsa_dev_to_net_device() returns a conduit with elevated
refcount (because it was supposed to be released during
dsa_remove_dst()). After the removal of the legacy code, the only user
of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
function returns. This pattern makes no sense today, and can only be
interpreted historically to understand why dev_hold() was there in the
first place.

Change details
--------------

Today we have a better netdev tracking infrastructure which we should
use. Logically netdev_hold() belongs in common code
(dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a
tradeoff to be made with the rtnl_lock() section which would become a
bit too long if we did that - dsa_port_parse_cpu() also calls
request_module(). So we duplicate a bit of logic in order for the
callers of dsa_port_parse_cpu() to be the ones responsible of holding
the conduit reference and releasing it on error. This shortens the
rtnl_lock() section significantly.

In the dsa_switch_probe() error path, dsa_switch_release_ports() will be
called in a number of situations, one being where dsa_port_parse_cpu()
maybe didn't get the chance to run at all (a different port failed
earlier, etc). So we have to test for the conduit being NULL prior to
calling netdev_put().

There have still been so many transformations to the code since the
blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
dsa: be compatible with masters which unregister on shutdown")), that it
only makes sense to fix the code using the best methods available today
and see how it can be backported to stable later. I suspect the fix
cannot even be backported to kernels which lack dsa_switch_shutdown(),
and I suspect this is also maybe why the long-lived conduit reference
didn't make it into the new DSA probing paths at the time (problems
during shutdown).

Because dsa_dev_to_net_device() has a single call site and has to be
changed anyway, the logic was just absorbed into the non-OF
dsa_port_parse().

Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.

Reported-by: Ma Ke <make24@iscas.ac.cn>
Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251215150236.3931670-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

authored by

Vladimir Oltean and committed by
Paolo Abeni
06e219f6 5e7365b5

+35 -25
+1
include/net/dsa.h
··· 302 302 struct devlink_port devlink_port; 303 303 struct phylink *pl; 304 304 struct phylink_config pl_config; 305 + netdevice_tracker conduit_tracker; 305 306 struct dsa_lag *lag; 306 307 struct net_device *hsr_dev; 307 308
+34 -25
net/dsa/dsa.c
··· 1253 1253 if (ethernet) { 1254 1254 struct net_device *conduit; 1255 1255 const char *user_protocol; 1256 + int err; 1256 1257 1258 + rtnl_lock(); 1257 1259 conduit = of_find_net_device_by_node(ethernet); 1258 1260 of_node_put(ethernet); 1259 - if (!conduit) 1261 + if (!conduit) { 1262 + rtnl_unlock(); 1260 1263 return -EPROBE_DEFER; 1264 + } 1265 + 1266 + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL); 1267 + put_device(&conduit->dev); 1268 + rtnl_unlock(); 1261 1269 1262 1270 user_protocol = of_get_property(dn, "dsa-tag-protocol", NULL); 1263 - return dsa_port_parse_cpu(dp, conduit, user_protocol); 1271 + err = dsa_port_parse_cpu(dp, conduit, user_protocol); 1272 + if (err) 1273 + netdev_put(conduit, &dp->conduit_tracker); 1274 + return err; 1264 1275 } 1265 1276 1266 1277 if (link) ··· 1404 1393 return device_find_child(parent, class, dev_is_class); 1405 1394 } 1406 1395 1407 - static struct net_device *dsa_dev_to_net_device(struct device *dev) 1408 - { 1409 - struct device *d; 1410 - 1411 - d = dev_find_class(dev, "net"); 1412 - if (d != NULL) { 1413 - struct net_device *nd; 1414 - 1415 - nd = to_net_dev(d); 1416 - dev_hold(nd); 1417 - put_device(d); 1418 - 1419 - return nd; 1420 - } 1421 - 1422 - return NULL; 1423 - } 1424 - 1425 1396 static int dsa_port_parse(struct dsa_port *dp, const char *name, 1426 1397 struct device *dev) 1427 1398 { 1428 1399 if (!strcmp(name, "cpu")) { 1429 1400 struct net_device *conduit; 1401 + struct device *d; 1402 + int err; 1430 1403 1431 - conduit = dsa_dev_to_net_device(dev); 1432 - if (!conduit) 1404 + rtnl_lock(); 1405 + d = dev_find_class(dev, "net"); 1406 + if (!d) { 1407 + rtnl_unlock(); 1433 1408 return -EPROBE_DEFER; 1409 + } 1434 1410 1435 - dev_put(conduit); 1411 + conduit = to_net_dev(d); 1412 + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL); 1413 + put_device(d); 1414 + rtnl_unlock(); 1436 1415 1437 - return dsa_port_parse_cpu(dp, conduit, NULL); 1416 + err = dsa_port_parse_cpu(dp, conduit, NULL); 1417 + if (err) 1418 + netdev_put(conduit, &dp->conduit_tracker); 1419 + return err; 1438 1420 } 1439 1421 1440 1422 if (!strcmp(name, "dsa")) ··· 1495 1491 struct dsa_vlan *v, *n; 1496 1492 1497 1493 dsa_switch_for_each_port_safe(dp, next, ds) { 1494 + if (dsa_port_is_cpu(dp) && dp->conduit) 1495 + netdev_put(dp->conduit, &dp->conduit_tracker); 1496 + 1498 1497 /* These are either entries that upper layers lost track of 1499 1498 * (probably due to bugs), or installed through interfaces 1500 1499 * where one does not necessarily have to remove them, like ··· 1642 1635 /* Disconnect from further netdevice notifiers on the conduit, 1643 1636 * since netdev_uses_dsa() will now return false. 1644 1637 */ 1645 - dsa_switch_for_each_cpu_port(dp, ds) 1638 + dsa_switch_for_each_cpu_port(dp, ds) { 1646 1639 dp->conduit->dsa_ptr = NULL; 1640 + netdev_put(dp->conduit, &dp->conduit_tracker); 1641 + } 1647 1642 1648 1643 rtnl_unlock(); 1649 1644 out: