Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

block: don't release queue's sysfs lock during switching elevator

cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing & storing 'queue/iosched' attributes and removing elevator's
kobject.

Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
required in .show & .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.

The biggest issue is that commit cecf5d87ff20 assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release & re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.

Fixes the issue by not releasing queue's sysfs lock during switching
elevator.

Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

authored by

Ming Lei and committed by
Jens Axboe
b89f625e 284b94be

+5 -39
+4 -9
block/blk-sysfs.c
··· 989 989 blk_mq_debugfs_register(q); 990 990 } 991 991 992 - /* 993 - * The flag of QUEUE_FLAG_REGISTERED isn't set yet, so elevator 994 - * switch won't happen at all. 995 - */ 992 + mutex_lock(&q->sysfs_lock); 996 993 if (q->elevator) { 997 994 ret = elv_register_queue(q, false); 998 995 if (ret) { 996 + mutex_unlock(&q->sysfs_lock); 999 997 mutex_unlock(&q->sysfs_dir_lock); 1000 998 kobject_del(&q->kobj); 1001 999 blk_trace_remove_sysfs(dev); ··· 1003 1005 has_elevator = true; 1004 1006 } 1005 1007 1006 - mutex_lock(&q->sysfs_lock); 1007 1008 blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); 1008 1009 wbt_enable_default(q); 1009 1010 blk_throtl_register_queue(q); ··· 1059 1062 kobject_del(&q->kobj); 1060 1063 blk_trace_remove_sysfs(disk_to_dev(disk)); 1061 1064 1062 - /* 1063 - * q->kobj has been removed, so it is safe to check if elevator 1064 - * exists without holding q->sysfs_lock. 1065 - */ 1065 + mutex_lock(&q->sysfs_lock); 1066 1066 if (q->elevator) 1067 1067 elv_unregister_queue(q); 1068 + mutex_unlock(&q->sysfs_lock); 1068 1069 mutex_unlock(&q->sysfs_dir_lock); 1069 1070 1070 1071 kobject_put(&disk_to_dev(disk)->kobj);
+1 -30
block/elevator.c
··· 503 503 if (uevent) 504 504 kobject_uevent(&e->kobj, KOBJ_ADD); 505 505 506 - mutex_lock(&q->sysfs_lock); 507 506 e->registered = 1; 508 - mutex_unlock(&q->sysfs_lock); 509 507 } 510 508 return error; 511 509 } ··· 521 523 kobject_uevent(&e->kobj, KOBJ_REMOVE); 522 524 kobject_del(&e->kobj); 523 525 524 - mutex_lock(&q->sysfs_lock); 525 526 e->registered = 0; 526 527 /* Re-enable throttling in case elevator disabled it */ 527 528 wbt_enable_default(q); 528 - mutex_unlock(&q->sysfs_lock); 529 529 } 530 530 } 531 531 ··· 586 590 lockdep_assert_held(&q->sysfs_lock); 587 591 588 592 if (q->elevator) { 589 - if (q->elevator->registered) { 590 - mutex_unlock(&q->sysfs_lock); 591 - 592 - /* 593 - * Concurrent elevator switch can't happen becasue 594 - * sysfs write is always exclusively on same file. 595 - * 596 - * Also the elevator queue won't be freed after 597 - * sysfs_lock is released becasue kobject_del() in 598 - * blk_unregister_queue() waits for completion of 599 - * .store & .show on its attributes. 600 - */ 593 + if (q->elevator->registered) 601 594 elv_unregister_queue(q); 602 595 603 - mutex_lock(&q->sysfs_lock); 604 - } 605 596 ioc_clear_queue(q); 606 597 elevator_exit(q, q->elevator); 607 - 608 - /* 609 - * sysfs_lock may be dropped, so re-check if queue is 610 - * unregistered. If yes, don't switch to new elevator 611 - * any more 612 - */ 613 - if (!blk_queue_registered(q)) 614 - return 0; 615 598 } 616 599 617 600 ret = blk_mq_init_sched(q, new_e); ··· 598 623 goto out; 599 624 600 625 if (new_e) { 601 - mutex_unlock(&q->sysfs_lock); 602 - 603 626 ret = elv_register_queue(q, true); 604 - 605 - mutex_lock(&q->sysfs_lock); 606 627 if (ret) { 607 628 elevator_exit(q, q->elevator); 608 629 goto out;