Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

block: fix null pointer dereference in blk_mq_rq_timed_out()

We got a NULL pointer dereference BUG in blk_mq_rq_timed_out(),
as follows:

[ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 108.827059] PGD 0 P4D 0
[ 108.827313] Oops: 0000 [#1] SMP PTI
[ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[ 108.829503] Workqueue: kblockd blk_mq_timeout_work
[ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[ 108.838191] Call Trace:
[ 108.838406] bt_iter+0x74/0x80
[ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
[ 108.839074] ? __switch_to_asm+0x34/0x70
[ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
[ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
[ 108.840732] blk_mq_timeout_work+0x74/0x200
[ 108.841151] process_one_work+0x297/0x680
[ 108.841550] worker_thread+0x29c/0x6f0
[ 108.841926] ? rescuer_thread+0x580/0x580
[ 108.842344] kthread+0x16a/0x1a0
[ 108.842666] ? kthread_flush_work+0x170/0x170
[ 108.843100] ret_from_fork+0x35/0x40

The bug is caused by a race between the timeout handler and completion
of a flush request.

When the timeout handler blk_mq_rq_timed_out() tries to read
'req->q->mq_ops', 'req' has already completed and been reinitialized by
the next flush request, which calls blk_rq_init() to zero 'req'.

After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
the lifetime of normal requests is protected by a refcount. A request
can only be freed once 'rq->ref' drops to zero, so it cannot be reused
before the timeout handler finishes.

However, a flush request defines .end_io, and rq->end_io() is still
called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
can be reused by the next flush request, resulting in the NULL pointer
dereference shown above.

We fix this problem by covering the flush request with 'rq->ref'.
If the refcount is not zero, flush_end_io() returns and waits for the
last holder to call it again. To record the request status, we add a
new field 'rq_status', which is used in flush_end_io().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>

-------
v2:
- move rq_status from struct request to struct blk_flush_queue
v3:
- remove unnecessary '{}' pair.
v4:
- let the spinlock protect 'fq->rq_status'
v5:
- move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe <axboe@kernel.dk>

authored by Yufen Yu and committed by Jens Axboe
8d699663 2af2783f

+21 -1
+10
block/blk-flush.c
···
 
 	/* release the tag's ownership to the req cloned from */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
+
+	if (!refcount_dec_and_test(&flush_rq->ref)) {
+		fq->rq_status = error;
+		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
+		return;
+	}
+
+	if (fq->rq_status != BLK_STS_OK)
+		error = fq->rq_status;
+
 	hctx = flush_rq->mq_hctx;
 	if (!q->elevator) {
 		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+4 -1
block/blk-mq.c
···
 	 */
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
-	if (refcount_dec_and_test(&rq->ref))
+
+	if (is_flush_rq(rq, hctx))
+		rq->end_io(rq, 0);
+	else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 
 	return true;
+7
block/blk.h
···
 	unsigned int flush_queue_delayed:1;
 	unsigned int flush_pending_idx:1;
 	unsigned int flush_running_idx:1;
+	blk_status_t rq_status;
 	unsigned long flush_pending_since;
 	struct list_head flush_queue[2];
 	struct list_head flush_data_in_flight;
···
 static inline void __blk_get_queue(struct request_queue *q)
 {
 	kobject_get(&q->kobj);
+}
+
+static inline bool
+is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx)
+{
+	return hctx->fq->flush_rq == req;
 }
 
 struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
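
The deferral described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct request`, `struct flush_queue`, `status_t`, `STS_OK`, `STS_IOERR`, and `simulate_timeout_vs_completion()` are hypothetical stand-ins, C11 atomics replace the kernel's refcount helpers, and the `mq_flush_lock` spinlock is omitted because the interleaving is replayed sequentially on one thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
typedef int status_t;
#define STS_OK    0
#define STS_IOERR 10

struct flush_queue {
	status_t rq_status;     /* status saved by an early end_io call */
	int completions;        /* how often the completion body really ran */
	status_t final_status;  /* status seen by the completion body */
};

struct request {
	atomic_int ref;
	struct flush_queue *fq;
};

/*
 * Same shape as the patched flush_end_io(): a caller that does not drop
 * the last reference only records its status and returns; the last
 * holder runs the completion body and reports the recorded status.
 */
static void flush_end_io(struct request *rq, status_t error)
{
	struct flush_queue *fq = rq->fq;

	/* atomic_fetch_sub() returns the old value; 1 means "last reference" */
	if (atomic_fetch_sub(&rq->ref, 1) != 1) {
		fq->rq_status = error;
		return;
	}

	if (fq->rq_status != STS_OK)
		error = fq->rq_status;

	fq->completions++;
	fq->final_status = error;
}

/* Replays the race: completion arrives while the timeout handler holds a ref. */
static struct flush_queue simulate_timeout_vs_completion(status_t completion_status)
{
	struct flush_queue fq = { .rq_status = STS_OK };
	struct request rq = { .fq = &fq };

	atomic_init(&rq.ref, 1);               /* reference of the in-flight request */
	atomic_fetch_add(&rq.ref, 1);          /* timeout handler pins the request */

	flush_end_io(&rq, completion_status);  /* completion path runs first */
	assert(fq.completions == 0);           /* deferred: request is not reusable yet */

	flush_end_io(&rq, STS_OK);             /* timeout path, like rq->end_io(rq, 0) */
	return fq;
}
```

Calling `simulate_timeout_vs_completion(STS_IOERR)` in this model runs the completion body exactly once, on the second call, and still reports the error recorded by the first call. That is the property the patch relies on to keep `flush_rq` from being reused while the timeout handler is still reading it.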