Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

ublk: fix deadlock when reading partition table

When one process(such as udev) opens ublk block device (e.g., to read
the partition table via bdev_open()), a deadlock[1] can occur:

1. bdev_open() grabs disk->open_mutex
2. The process issues read I/O to ublk backend to read partition table
3. In __ublk_complete_rq(), blk_update_request() or blk_mq_end_request()
runs bio->bi_end_io() callbacks
4. If this triggers fput() on file descriptor of ublk block device, the
work may be deferred to current task's task work (see fput() implementation)
5. This eventually calls blkdev_release() from the same context
6. blkdev_release() tries to grab disk->open_mutex again
7. Deadlock: same task waiting for a mutex it already holds

The fix is to run blk_update_request() and blk_mq_end_request() with bottom
halves disabled. This forces blkdev_release() to run in kernel work-queue
context instead of current task work context, and allows ublk server to make
forward progress, and avoids the deadlock.

Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Link: https://github.com/ublk-org/ublksrv/issues/170 [1]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
[axboe: rewrite comment in ublk]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

authored by

Ming Lei and committed by
Jens Axboe
c258f5c4 a58383fa

+28 -4
+28 -4
drivers/block/ublk_drv.c
··· 1080 1080 return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu); 1081 1081 } 1082 1082 1083 + static void ublk_end_request(struct request *req, blk_status_t error) 1084 + { 1085 + local_bh_disable(); 1086 + blk_mq_end_request(req, error); 1087 + local_bh_enable(); 1088 + } 1089 + 1083 1090 /* todo: handle partial completion */ 1084 1091 static inline void __ublk_complete_rq(struct request *req, struct ublk_io *io, 1085 1092 bool need_map) 1086 1093 { 1087 1094 unsigned int unmapped_bytes; 1088 1095 blk_status_t res = BLK_STS_OK; 1096 + bool requeue; 1089 1097 1090 1098 /* failed read IO if nothing is read */ 1091 1099 if (!io->res && req_op(req) == REQ_OP_READ) ··· 1125 1117 if (unlikely(unmapped_bytes < io->res)) 1126 1118 io->res = unmapped_bytes; 1127 1119 1128 - if (blk_update_request(req, BLK_STS_OK, io->res)) 1120 + /* 1121 + * Run bio->bi_end_io() with softirqs disabled. If the final fput 1122 + * happens off this path, then that will prevent ublk's blkdev_release() 1123 + * from being called on current's task work, see fput() implementation. 1124 + * 1125 + * Otherwise, ublk server may not provide forward progress in case of 1126 + * reading the partition table from bdev_open() with disk->open_mutex 1127 + * held, and causes dead lock as we could already be holding 1128 + * disk->open_mutex here. 1129 + * 1130 + * Preferably we would not be doing IO with a mutex held that is also 1131 + * used for release, but this work-around will suffice for now. 1132 + */ 1133 + local_bh_disable(); 1134 + requeue = blk_update_request(req, BLK_STS_OK, io->res); 1135 + local_bh_enable(); 1136 + if (requeue) 1129 1137 blk_mq_requeue_request(req, true); 1130 1138 else if (likely(!blk_should_fake_timeout(req->q))) 1131 1139 __blk_mq_end_request(req, BLK_STS_OK); 1132 1140 1133 1141 return; 1134 1142 exit: 1135 - blk_mq_end_request(req, res); 1143 + ublk_end_request(req, res); 1136 1144 } 1137 1145 1138 1146 static struct io_uring_cmd *__ublk_prep_compl_io_cmd(struct ublk_io *io, ··· 1188 1164 if (ublk_nosrv_dev_should_queue_io(ubq->dev)) 1189 1165 blk_mq_requeue_request(rq, false); 1190 1166 else 1191 - blk_mq_end_request(rq, BLK_STS_IOERR); 1167 + ublk_end_request(rq, BLK_STS_IOERR); 1192 1168 } 1193 1169 1194 1170 static void ··· 1233 1209 ublk_auto_buf_reg_fallback(ubq, req->tag); 1234 1210 return AUTO_BUF_REG_FALLBACK; 1235 1211 } 1236 - blk_mq_end_request(req, BLK_STS_IOERR); 1212 + ublk_end_request(req, BLK_STS_IOERR); 1237 1213 return AUTO_BUF_REG_FAIL; 1238 1214 } 1239 1215