drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()

Even though we check that we "should" be able to do lc_get_cumulative()
while holding the device->al_lock spinlock, it may still fail,
if some other code path decided to do lc_try_lock() with bad timing.

If that happened, we logged "LOGIC BUG for enr=...",
but still did not return an error.

The rest of the code now assumed that this request has references
for the relevant activity log extents.

The implcations are that during an active resync, mutual exclusivity of
resync versus application IO is not guaranteed. And a potential crash
at this point may not realizs that these extents could have been target
of in-flight IO and would need to be resynced just in case.

Also, once the request completes, it will give up activity log references it
does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put().

Fix:

Do not crash the kernel for a condition that is harmless during normal
operation: also catch "e->refcnt == 0", not only "e == NULL"
when being noisy about "al_complete_io() called on inactive extent %u\n".

And do not try to be smart and "guess" whether something will work, then
be surprised when it does not.
Deal with the fact that it may or may not work. If it does not, remember a
possible "partially in activity log" state (only possible for requests that
cross extent boundaries), and return an error code from
drbd_al_begin_io_nonblock().

A latter call for the same request will then resume from where we left off.

Cc: stable@vger.kernel.org
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

authored by

Lars Ellenberg and committed by

Jens Axboe 3 months ago ab140365 2961f841

+27 -31

2 changed files

expand all

drivers

block

drbd

drbd_actlog.c

drbd_interval.h

+23 -30

drivers/block/drbd/drbd_actlog.c

··· 483 483 484 484 int drbd_al_begin_io_nonblock(struct drbd_device *device, struct drbd_interval *i) 485 485 { 486 - struct lru_cache *al = device->act_log; 487 486 /* for bios crossing activity log extent boundaries, 488 487 * we may need to activate two extents in one go */ 489 488 unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); 490 489 unsigned last = i->size == 0 ? first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); 491 - unsigned nr_al_extents; 492 - unsigned available_update_slots; 493 490 unsigned enr; 494 491 495 - D_ASSERT(device, first <= last); 496 - 497 - nr_al_extents = 1 + last - first; /* worst case: all touched extends are cold. */ 498 - available_update_slots = min(al->nr_elements - al->used, 499 - al->max_pending_changes - al->pending_changes); 500 - 501 - /* We want all necessary updates for a given request within the same transaction 502 - * We could first check how many updates are *actually* needed, 503 - * and use that instead of the worst-case nr_al_extents */ 504 - if (available_update_slots < nr_al_extents) { 505 - /* Too many activity log extents are currently "hot". 506 - * 507 - * If we have accumulated pending changes already, 508 - * we made progress. 509 - * 510 - * If we cannot get even a single pending change through, 511 - * stop the fast path until we made some progress, 512 - * or requests to "cold" extents could be starved. */ 513 - if (!al->pending_changes) 514 - __set_bit(__LC_STARVING, &device->act_log->flags); 515 - return -ENOBUFS; 492 + if (i->partially_in_al_next_enr) { 493 + D_ASSERT(device, first < i->partially_in_al_next_enr); 494 + D_ASSERT(device, last >= i->partially_in_al_next_enr); 495 + first = i->partially_in_al_next_enr; 516 496 } 497 + 498 + D_ASSERT(device, first <= last); 517 499 518 500 /* Is resync active in this area? */ 519 501 for (enr = first; enr <= last; enr++) { ··· 511 529 } 512 530 } 513 531 514 - /* Checkout the refcounts. 515 - * Given that we checked for available elements and update slots above, 516 - * this has to be successful. */ 532 + /* Try to checkout the refcounts. */ 517 533 for (enr = first; enr <= last; enr++) { 518 534 struct lc_element *al_ext; 519 535 al_ext = lc_get_cumulative(device->act_log, enr); 520 - if (!al_ext) 521 - drbd_info(device, "LOGIC BUG for enr=%u\n", enr); 536 + 537 + if (!al_ext) { 538 + /* Did not work. We may have exhausted the possible 539 + * changes per transaction. Or raced with someone 540 + * "locking" it against changes. 541 + * Remember where to continue from. 542 + */ 543 + if (enr > first) 544 + i->partially_in_al_next_enr = enr; 545 + return -ENOBUFS; 546 + } 522 547 } 523 548 return 0; 524 549 } ··· 545 556 546 557 for (enr = first; enr <= last; enr++) { 547 558 extent = lc_find(device->act_log, enr); 548 - if (!extent) { 559 + /* Yes, this masks a bug elsewhere. However, during normal 560 + * operation this is harmless, so no need to crash the kernel 561 + * by the BUG_ON(refcount == 0) in lc_put(). 562 + */ 563 + if (!extent || extent->refcnt == 0) { 549 564 drbd_err(device, "al_complete_io() called on inactive extent %u\n", enr); 550 565 continue; 551 566 }

+4 -1

drivers/block/drbd/drbd_interval.h

··· 8 8 struct drbd_interval { 9 9 struct rb_node rb; 10 10 sector_t sector; /* start sector of the interval */ 11 - unsigned int size; /* size in bytes */ 12 11 sector_t end; /* highest interval end in subtree */ 12 + unsigned int size; /* size in bytes */ 13 13 unsigned int local:1 /* local or remote request? */; 14 14 unsigned int waiting:1; /* someone is waiting for completion */ 15 15 unsigned int completed:1; /* this has been completed already; 16 16 * ignore for conflict detection */ 17 + 18 + /* to resume a partially successful drbd_al_begin_io_nonblock(); */ 19 + unsigned int partially_in_al_next_enr; 17 20 }; 18 21 19 22 static inline void drbd_clear_interval(struct drbd_interval *i)

Configure Feed

Configure Feed