Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

block: fix zone write plug removal

Commit 7b295187287e ("block: Do not remove zone write plugs still in
use") modified disk_should_remove_zone_wplug() to add a check on the
reference count of a zone write plug to prevent removing zone write
plugs from a disk hash table when the plugs are still being referenced
by BIOs or requests in-flight. However, this check does not take into
account that a BIO completion may happen right after its submission by
a zone write plug BIO work, and before the zone write plug BIO work
releases the zone write plug reference count. This situation leads to
disk_should_remove_zone_wplug() returning false as in this case the zone
write plug reference count is at least equal to 3. If the BIO that
completes in such manner transitioned the zone to the FULL condition,
the zone write plug for the FULL zone will remain in the disk hash
table.

Furthermore, relying on a particular value of a zone write plug
reference count to set the BLK_ZONE_WPLUG_UNHASHED flag is fragile as
reading the atomic reference count and doing a comparison with some
value is not overall atomic at all.

Address these issues by reworking the reference counting of zone write
plugs so that removing plugs from a disk hash table can be done
directly from disk_put_zone_wplug() when the last reference on a plug
is dropped.

To do so, replace the function disk_remove_zone_wplug() with
disk_mark_zone_wplug_dead(). This new function sets the zone write plug
flag BLK_ZONE_WPLUG_DEAD (which replaces BLK_ZONE_WPLUG_UNHASHED) and
drops the initial reference on the zone write plug taken when the plug
was added to the disk hash table. This function is called either for
zones that are empty or full, or directly in the case of a forced plug
removal (e.g. when the disk hash table is being destroyed on disk
removal). With this change, disk_should_remove_zone_wplug() is also
removed.

disk_put_zone_wplug() is modified to call the function
disk_free_zone_wplug() to remove a zone write plug from a disk hash
table and free the plug structure (with a call_rcu()), when the last
reference on a zone write plug is dropped. disk_free_zone_wplug()
always checks that the BLK_ZONE_WPLUG_DEAD flag is set.

In order to avoid having multiple zone write plugs for the same zone in
the disk hash table, disk_get_and_lock_zone_wplug() checked for the
BLK_ZONE_WPLUG_UNHASHED flag. This check is removed and a check for
the new BLK_ZONE_WPLUG_DEAD flag is added to
blk_zone_wplug_handle_write(). With this change, we continue preventing
adding multiple zone write plugs for the same zone and at the same time
re-inforce checks on the user behavior by failing new incoming write
BIOs targeting a zone that is marked as dead. This case can happen only
if the user erroneously issues write BIOs to zones that are full, or to
zones that are currently being reset or finished.

Fixes: 7b295187287e ("block: Do not remove zone write plugs still in use")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

authored by

Damien Le Moal and committed by
Jens Axboe
b7d4ffb5 0ee8ab5d

+57 -94
+57 -94
block/blk-zoned.c
··· 99 99 * being executed or the zone write plug bio list is not empty. 100 100 * - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone 101 101 * write pointer offset and need to update it. 102 - * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed 103 - * from the disk hash table and that the initial reference to the zone 104 - * write plug set when the plug was first added to the hash table has been 105 - * dropped. This flag is set when a zone is reset, finished or become full, 106 - * to prevent new references to the zone write plug to be taken for 107 - * newly incoming BIOs. A zone write plug flagged with this flag will be 108 - * freed once all remaining references from BIOs or functions are dropped. 102 + * - BLK_ZONE_WPLUG_DEAD: Indicates that the zone write plug will be 103 + * removed from the disk hash table of zone write plugs when the last 104 + * reference on the zone write plug is dropped. If set, this flag also 105 + * indicates that the initial extra reference on the zone write plug was 106 + * dropped, meaning that the reference count indicates the current number of 107 + * active users (code context or BIOs and requests in flight). This flag is 108 + * set when a zone is reset, finished or becomes full. 109 109 */ 110 110 #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) 111 111 #define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) 112 - #define BLK_ZONE_WPLUG_UNHASHED (1U << 2) 112 + #define BLK_ZONE_WPLUG_DEAD (1U << 2) 113 113 114 114 /** 115 115 * blk_zone_cond_str - Return a zone condition name string ··· 587 587 mempool_free(zwplug, zwplug->disk->zone_wplugs_pool); 588 588 } 589 589 590 - static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) 590 + static void disk_free_zone_wplug(struct blk_zone_wplug *zwplug) 591 591 { 592 - if (refcount_dec_and_test(&zwplug->ref)) { 593 - WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); 594 - WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); 595 - WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)); 596 - 597 - call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); 598 - } 599 - } 600 - 601 - static inline bool disk_should_remove_zone_wplug(struct gendisk *disk, 602 - struct blk_zone_wplug *zwplug) 603 - { 604 - lockdep_assert_held(&zwplug->lock); 605 - 606 - /* If the zone write plug was already removed, we are done. */ 607 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) 608 - return false; 609 - 610 - /* If the zone write plug is still plugged, it cannot be removed. */ 611 - if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) 612 - return false; 613 - 614 - /* 615 - * Completions of BIOs with blk_zone_write_plug_bio_endio() may 616 - * happen after handling a request completion with 617 - * blk_zone_write_plug_finish_request() (e.g. with split BIOs 618 - * that are chained). In such case, disk_zone_wplug_unplug_bio() 619 - * should not attempt to remove the zone write plug until all BIO 620 - * completions are seen. Check by looking at the zone write plug 621 - * reference count, which is 2 when the plug is unused (one reference 622 - * taken when the plug was allocated and another reference taken by the 623 - * caller context). 624 - */ 625 - if (refcount_read(&zwplug->ref) > 2) 626 - return false; 627 - 628 - /* We can remove zone write plugs for zones that are empty or full. */ 629 - return !zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug); 630 - } 631 - 632 - static void disk_remove_zone_wplug(struct gendisk *disk, 633 - struct blk_zone_wplug *zwplug) 634 - { 592 + struct gendisk *disk = zwplug->disk; 635 593 unsigned long flags; 636 594 637 - /* If the zone write plug was already removed, we have nothing to do. */ 638 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) 639 - return; 595 + WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_DEAD)); 596 + WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); 597 + WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); 640 598 641 - /* 642 - * Mark the zone write plug as unhashed and drop the extra reference we 643 - * took when the plug was inserted in the hash table. Also update the 644 - * disk zone condition array with the current condition of the zone 645 - * write plug. 646 - */ 647 - zwplug->flags |= BLK_ZONE_WPLUG_UNHASHED; 648 599 spin_lock_irqsave(&disk->zone_wplugs_lock, flags); 649 600 blk_zone_set_cond(rcu_dereference_check(disk->zones_cond, 650 601 lockdep_is_held(&disk->zone_wplugs_lock)), ··· 603 652 hlist_del_init_rcu(&zwplug->node); 604 653 atomic_dec(&disk->nr_zone_wplugs); 605 654 spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); 606 - disk_put_zone_wplug(zwplug); 655 + 656 + call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); 657 + } 658 + 659 + static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) 660 + { 661 + if (refcount_dec_and_test(&zwplug->ref)) 662 + disk_free_zone_wplug(zwplug); 663 + } 664 + 665 + /* 666 + * Flag the zone write plug as dead and drop the initial reference we got when 667 + * the zone write plug was added to the hash table. The zone write plug will be 668 + * unhashed when its last reference is dropped. 669 + */ 670 + static void disk_mark_zone_wplug_dead(struct blk_zone_wplug *zwplug) 671 + { 672 + lockdep_assert_held(&zwplug->lock); 673 + 674 + if (!(zwplug->flags & BLK_ZONE_WPLUG_DEAD)) { 675 + zwplug->flags |= BLK_ZONE_WPLUG_DEAD; 676 + disk_put_zone_wplug(zwplug); 677 + } 607 678 } 608 679 609 680 static void blk_zone_wplug_bio_work(struct work_struct *work); ··· 645 672 again: 646 673 zwplug = disk_get_zone_wplug(disk, sector); 647 674 if (zwplug) { 648 - /* 649 - * Check that a BIO completion or a zone reset or finish 650 - * operation has not already removed the zone write plug from 651 - * the hash table and dropped its reference count. In such case, 652 - * we need to get a new plug so start over from the beginning. 653 - */ 654 675 spin_lock_irqsave(&zwplug->lock, *flags); 655 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) { 656 - spin_unlock_irqrestore(&zwplug->lock, *flags); 657 - disk_put_zone_wplug(zwplug); 658 - goto again; 659 - } 660 676 return zwplug; 661 677 } 662 678 ··· 750 788 disk_zone_wplug_update_cond(disk, zwplug); 751 789 752 790 disk_zone_wplug_abort(zwplug); 753 - 754 - /* 755 - * The zone write plug now has no BIO plugged: remove it from the 756 - * hash table so that it cannot be seen. The plug will be freed 757 - * when the last reference is dropped. 758 - */ 759 - if (disk_should_remove_zone_wplug(disk, zwplug)) 760 - disk_remove_zone_wplug(disk, zwplug); 791 + if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug)) 792 + disk_mark_zone_wplug_dead(zwplug); 761 793 } 762 794 763 795 static unsigned int blk_zone_wp_offset(struct blk_zone *zone) ··· 1403 1447 return true; 1404 1448 } 1405 1449 1450 + /* 1451 + * If we got a zone write plug marked as dead, then the user is issuing 1452 + * writes to a full zone, or without synchronizing with zone reset or 1453 + * zone finish operations. In such case, fail the BIO to signal this 1454 + * invalid usage. 1455 + */ 1456 + if (zwplug->flags & BLK_ZONE_WPLUG_DEAD) { 1457 + spin_unlock_irqrestore(&zwplug->lock, flags); 1458 + disk_put_zone_wplug(zwplug); 1459 + bio_io_error(bio); 1460 + return true; 1461 + } 1462 + 1406 1463 /* Indicate that this BIO is being handled using zone write plugging. */ 1407 1464 bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING); 1408 1465 ··· 1496 1527 disk->disk_name, zwplug->zone_no); 1497 1528 disk_zone_wplug_abort(zwplug); 1498 1529 } 1499 - disk_remove_zone_wplug(disk, zwplug); 1530 + disk_mark_zone_wplug_dead(zwplug); 1500 1531 spin_unlock_irqrestore(&zwplug->lock, flags); 1501 1532 1502 1533 disk_put_zone_wplug(zwplug); ··· 1599 1630 } 1600 1631 1601 1632 zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; 1602 - 1603 - /* 1604 - * If the zone is full (it was fully written or finished, or empty 1605 - * (it was reset), remove its zone write plug from the hash table. 1606 - */ 1607 - if (disk_should_remove_zone_wplug(disk, zwplug)) 1608 - disk_remove_zone_wplug(disk, zwplug); 1609 - 1633 + if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug)) 1634 + disk_mark_zone_wplug_dead(zwplug); 1610 1635 spin_unlock_irqrestore(&zwplug->lock, flags); 1611 1636 } 1612 1637 ··· 1811 1848 while (!hlist_empty(&disk->zone_wplugs_hash[i])) { 1812 1849 zwplug = hlist_entry(disk->zone_wplugs_hash[i].first, 1813 1850 struct blk_zone_wplug, node); 1814 - refcount_inc(&zwplug->ref); 1815 - disk_remove_zone_wplug(disk, zwplug); 1816 - disk_put_zone_wplug(zwplug); 1851 + spin_lock_irq(&zwplug->lock); 1852 + disk_mark_zone_wplug_dead(zwplug); 1853 + spin_unlock_irq(&zwplug->lock); 1817 1854 } 1818 1855 } 1819 1856