Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

blk-cgroup: wait for blkcg cleanup before initializing new disk

When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.

Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL,
which indicates the previous disk's blkcg cleanup has completed.

Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260311032837.2368714-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Ming Lei and committed by Jens Axboe
3dbaacf6 daa6c798

+15
block/blk-cgroup.c
···
 #include <linux/backing-dev.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
+#include <linux/wait_bit.h>
 #include <linux/atomic.h>
 #include <linux/ctype.h>
 #include <linux/resume_user_mode.h>
···

 	q->root_blkg = NULL;
 	spin_unlock_irq(&q->queue_lock);
+
+	wake_up_var(&q->root_blkg);
 }

 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
···
 	struct request_queue *q = disk->queue;
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
+
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	wait_var_event(&q->root_blkg, !READ_ONCE(q->root_blkg));

 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
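
The ordering that the wake_up_var() / wait_var_event() pair enforces can be illustrated with a small userspace analogy. This is only a sketch and not kernel code: a pthread mutex and condition variable stand in for the kernel's wait-on-variable helpers, and every name in it (fake_queue, fake_exit_disk, fake_init_disk, run_rebind_demo) is hypothetical and does not appear in the patch. The kernel version also differs in that the waiter checks q->root_blkg with READ_ONCE() rather than under a lock.

```c
/* Userspace analogy only: pthreads stand in for wake_up_var()/wait_var_event(). */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request_queue. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t root_cleared;	/* signalled once root_blkg is NULL */
	void *root_blkg;		/* non-NULL while the old disk's state is alive */
};

/* Plays the role of the old disk's asynchronous blkcg_exit_disk(). */
static void *fake_exit_disk(void *arg)
{
	struct fake_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	q->root_blkg = NULL;
	pthread_cond_broadcast(&q->root_cleared);	/* analogue of wake_up_var() */
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Plays the role of the new disk's blkcg_init_disk(). */
static void fake_init_disk(struct fake_queue *q, void *new_root)
{
	pthread_mutex_lock(&q->lock);
	while (q->root_blkg != NULL)	/* analogue of wait_var_event() */
		pthread_cond_wait(&q->root_cleared, &q->lock);
	assert(q->root_blkg == NULL);	/* old state is gone before we install ours */
	q->root_blkg = new_root;
	pthread_mutex_unlock(&q->lock);
}

/* Rebind scenario: old cleanup races with new init. Returns 0 on success. */
int run_rebind_demo(void)
{
	static int old_root, new_root;	/* dummy objects used only for their addresses */
	struct fake_queue q = { .root_blkg = &old_root };
	pthread_t cleaner;
	int ok;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.root_cleared, NULL);

	if (pthread_create(&cleaner, NULL, fake_exit_disk, &q) != 0)
		return -1;
	fake_init_disk(&q, &new_root);
	pthread_join(cleaner, NULL);

	ok = (q.root_blkg == &new_root);
	pthread_cond_destroy(&q.root_cleared);
	pthread_mutex_destroy(&q.lock);
	return ok ? 0 : -1;
}
```

Whichever thread runs first, fake_init_disk() installs the new pointer only after the old one has been cleared, which is the same guarantee the patch adds to blkcg_init_disk().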