Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

md/raid5: Fix UAF on IO across the reshape position

If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
raid5_make_request() will free the cloned bio. But raid5_make_request()
can call make_stripe_request() multiple times, writing to the various
stripes. If that bio got added to the toread or towrite lists of a
stripe disk in an earlier call to make_stripe_request(), then it's not
safe to just free the bio if a later part of it is found to cross the
reshape position. Doing so can lead to a UAF error, when bio_endio()
is called on the bio for the earlier stripes.

Instead, raid5_make_request() needs to wait until all parts of the bio
have called bio_endio(). To do this, bios that cross the reshape
position while the reshape can't make progress are flagged as needing to
wait for all parts to complete. When raid5_make_request() has a bio that
failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
bi->bi_private to a completion struct and waits for completion after
ending the bio. When the bio_endio() is called for the last time on a
clone bio with bi->bi_private set, it wakes up the waiter. This
guarantees that raid5_make_request() doesn't return until the cloned bio
needing a retry for IO across the reshape boundary is safely cleaned up.
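
The wait-for-last-endio scheme described above can be sketched in userspace as follows. This is a hedged model, not kernel code: `struct completion_model`, `struct clone_model`, `clone_endio()` and `run_clone_model()` are hypothetical names, a pthread mutex and condition variable stand in for the kernel's `struct completion`, and a plain counter stands in for the bio's outstanding references.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical model of the cloned bio: 'remaining' counts outstanding
 * parts, 'private' plays the role of bi_private. */
struct clone_model {
	pthread_mutex_t lock;
	int remaining;
	struct completion_model *private;
	bool freed;        /* stands in for bio_put() on the clone */
	bool orig_ended;   /* stands in for bio_endio(orig_bio) */
};

static void complete_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Models bio_endio() on the clone: only the last caller runs the
 * end_io logic, which either wakes the waiter or ends the original. */
static void clone_endio(struct clone_model *bio)
{
	pthread_mutex_lock(&bio->lock);
	int left = --bio->remaining;
	struct completion_model *waiter = bio->private;
	pthread_mutex_unlock(&bio->lock);

	if (left > 0)
		return;
	bio->freed = true;
	if (waiter)
		complete_model(waiter);
	else
		bio->orig_ended = true;
}

/* An earlier stripe finishing its part of the bio on another thread. */
static void *stripe_worker(void *arg)
{
	clone_endio(arg);
	return NULL;
}

/* Returns 1 if the original bio was ended, 0 if it was left for a retry,
 * -1 on thread creation failure. The clone is always freed on return. */
int run_clone_model(bool crosses_reshape)
{
	struct completion_model done = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
	};
	/* Two parts: the submitter's own reference and one earlier stripe. */
	struct clone_model bio = {
		PTHREAD_MUTEX_INITIALIZER, 2, NULL, false, false
	};
	pthread_t t;

	if (crosses_reshape)
		bio.private = &done;   /* set before any part can complete */
	if (pthread_create(&t, NULL, stripe_worker, &bio))
		return -1;

	clone_endio(&bio);             /* submitter ends its part */
	if (crosses_reshape)
		wait_for_completion_model(&done);
	pthread_join(t, NULL);

	return (bio.freed && bio.orig_ended) ? 1 : 0;
}
```

Calling `run_clone_model(true)` models the STRIPE_WAIT_RESHAPE path: the submitter does not return until the last part has ended, and the original bio is left untouched for a retry. `run_clone_model(false)` models the ordinary path, where the last part ends the original bio.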

There is a simple reproducer available at [1]. Compile the kernel with
KASAN for more useful reporting when the error is triggered (this is not
necessary to see the bug).

[1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20260408043548.1695157-1-bmarzins@redhat.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>

Authored by Benjamin Marzinski and committed by Yu Kuai
418b3e64 0898a817

+14 -25
+8 -23
drivers/md/md.c
···
 
 static void md_end_clone_io(struct bio *bio)
 {
-	struct md_io_clone *md_io_clone = bio->bi_private;
+	struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
+						       bio_clone);
 	struct bio *orig_bio = md_io_clone->orig_bio;
 	struct mddev *mddev = md_io_clone->mddev;
+	struct completion *reshape_completion = bio->bi_private;
 
 	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
 		md_bitmap_end(mddev, md_io_clone);
···
 		bio_end_io_acct(orig_bio, md_io_clone->start_time);
 
 	bio_put(bio);
-	bio_endio(orig_bio);
+	if (unlikely(reshape_completion))
+		complete(reshape_completion);
+	else
+		bio_endio(orig_bio);
 	percpu_ref_put(&mddev->active_io);
 }
 
···
 	}
 
 	clone->bi_end_io = md_end_clone_io;
-	clone->bi_private = md_io_clone;
+	clone->bi_private = NULL;
 	*bio = clone;
 }
 
···
 	md_clone_bio(mddev, bio);
 }
 EXPORT_SYMBOL_GPL(md_account_bio);
-
-void md_free_cloned_bio(struct bio *bio)
-{
-	struct md_io_clone *md_io_clone = bio->bi_private;
-	struct bio *orig_bio = md_io_clone->orig_bio;
-	struct mddev *mddev = md_io_clone->mddev;
-
-	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
-		md_bitmap_end(mddev, md_io_clone);
-
-	if (bio->bi_status && !orig_bio->bi_status)
-		orig_bio->bi_status = bio->bi_status;
-
-	if (md_io_clone->start_time)
-		bio_end_io_acct(orig_bio, md_io_clone->start_time);
-
-	bio_put(bio);
-	percpu_ref_put(&mddev->active_io);
-}
-EXPORT_SYMBOL_GPL(md_free_cloned_bio);
 
 /* md_allow_write(mddev)
  * Calling this ensures that the array is marked 'active' so that writes
-1
drivers/md/md.h
···
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 			struct bio *bio, sector_t start, sector_t size);
 void md_account_bio(struct mddev *mddev, struct bio **bio);
-void md_free_cloned_bio(struct bio *bio);
 
 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
 void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
+6 -1
drivers/md/raid5.c
···
 
 	mempool_free(ctx, conf->ctx_pool);
 	if (res == STRIPE_WAIT_RESHAPE) {
-		md_free_cloned_bio(bi);
+		DECLARE_COMPLETION_ONSTACK(done);
+		WRITE_ONCE(bi->bi_private, &done);
+
+		bio_endio(bi);
+
+		wait_for_completion(&done);
 		return false;
 	}
 