Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

driver core: fix potential deadlock in __driver_attach

In __driver_attach function, There are also AA deadlock problem,
like the commit b232b02bf3c2 ("driver core: fix deadlock in
__device_attach").

stack like commit b232b02bf3c2 ("driver core: fix deadlock in
__device_attach").
list below:
In __driver_attach function, The lock holding logic is as follows:
...
__driver_attach
if (driver_allows_async_probing(drv))
device_lock(dev) // get lock dev
async_schedule_dev(__driver_attach_async_helper, dev); // func
async_schedule_node
async_schedule_node_domain(func)
entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
/* when fail or work limit, sync to execute func, but
__driver_attach_async_helper will get lock dev as
will, which will lead to A-A deadlock. */
if (!entry || atomic_read(&entry_count) > MAX_WORK) {
func;
else
queue_work_node(node, system_unbound_wq, &entry->work)
device_unlock(dev)

As above show, when it is allowed to do async probes, because of
out of memory or work limit, async work is not be allowed, to do
sync execute instead. it will lead to A-A deadlock because of
__driver_attach_async_helper getting lock dev.

Reproduce:
and it can be reproduce by make the condition
(if (!entry || atomic_read(&entry_count) > MAX_WORK)) untenable, like
below:

[ 370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 370.787154] task:swapper/0 state:D stack: 0 pid: 1 ppid:
0 flags:0x00004000
[ 370.788865] Call Trace:
[ 370.789374] <TASK>
[ 370.789841] __schedule+0x482/0x1050
[ 370.790613] schedule+0x92/0x1a0
[ 370.791290] schedule_preempt_disabled+0x2c/0x50
[ 370.792256] __mutex_lock.isra.0+0x757/0xec0
[ 370.793158] __mutex_lock_slowpath+0x1f/0x30
[ 370.794079] mutex_lock+0x50/0x60
[ 370.794795] __device_driver_lock+0x2f/0x70
[ 370.795677] ? driver_probe_device+0xd0/0xd0
[ 370.796576] __driver_attach_async_helper+0x1d/0xd0
[ 370.797318] ? driver_probe_device+0xd0/0xd0
[ 370.797957] async_schedule_node_domain+0xa5/0xc0
[ 370.798652] async_schedule_node+0x19/0x30
[ 370.799243] __driver_attach+0x246/0x290
[ 370.799828] ? driver_allows_async_probing+0xa0/0xa0
[ 370.800548] bus_for_each_dev+0x9d/0x130
[ 370.801132] driver_attach+0x22/0x30
[ 370.801666] bus_add_driver+0x290/0x340
[ 370.802246] driver_register+0x88/0x140
[ 370.802817] ? virtio_scsi_init+0x116/0x116
[ 370.803425] scsi_register_driver+0x1a/0x30
[ 370.804057] init_sd+0x184/0x226
[ 370.804533] do_one_initcall+0x71/0x3a0
[ 370.805107] kernel_init_freeable+0x39a/0x43a
[ 370.805759] ? rest_init+0x150/0x150
[ 370.806283] kernel_init+0x26/0x230
[ 370.806799] ret_from_fork+0x1f/0x30

To fix the deadlock, move the async_schedule_dev outside device_lock,
as we can see, in async_schedule_node_domain, the parameter of
queue_work_node is system_unbound_wq, so it can accept concurrent
operations. which will also not change the code logic, and will
not lead to deadlock.

Fixes: ef0ff68351be ("driver core: Probe devices asynchronously instead of the driver")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Link: https://lore.kernel.org/r/20220622074327.497102-1-zhangwensheng5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

authored by

Zhang Wensheng and committed by
Greg Kroah-Hartman
70fe7583 1d248d23

+4 -1
+4 -1
drivers/base/dd.c
··· 1099 1099 static int __driver_attach(struct device *dev, void *data) 1100 1100 { 1101 1101 struct device_driver *drv = data; 1102 + bool async = false; 1102 1103 int ret; 1103 1104 1104 1105 /* ··· 1138 1137 if (!dev->driver && !dev->p->async_driver) { 1139 1138 get_device(dev); 1140 1139 dev->p->async_driver = drv; 1141 - async_schedule_dev(__driver_attach_async_helper, dev); 1140 + async = true; 1142 1141 } 1143 1142 device_unlock(dev); 1143 + if (async) 1144 + async_schedule_dev(__driver_attach_async_helper, dev); 1144 1145 return 0; 1145 1146 } 1146 1147