Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ublk: document IO reference counting design

Add comprehensive documentation for ublk's split reference counting
model (io->ref + io->task_registered_buffers) above ublk_init_req_ref()
given this model isn't very straightforward.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Ming Lei and committed by Jens Axboe
0921abdc f46ebb91

+89
drivers/block/ublk_drv.c
···
 		ublk_dev_support_auto_buf_reg(ub);
 }

+/*
+ * ublk IO Reference Counting Design
+ * ==================================
+ *
+ * For user-copy and zero-copy modes, ublk uses a split reference model with
+ * two counters that together track IO lifetime:
+ *
+ *   - io->ref: refcount for off-task buffer registrations and user-copy ops
+ *   - io->task_registered_buffers: count of buffers registered on the IO task
+ *
+ * Key Invariant:
+ * --------------
+ * When IO is dispatched to the ublk server (UBLK_IO_FLAG_OWNED_BY_SRV set),
+ * the sum (io->ref + io->task_registered_buffers) must equal UBLK_REFCOUNT_INIT
+ * when no active references exist. After IO completion, both counters become
+ * zero. For I/Os not currently dispatched to the ublk server, both ref and
+ * task_registered_buffers are 0.
+ *
+ * This invariant is checked by ublk_check_and_reset_active_ref() during daemon
+ * exit to determine if all references have been released.
+ *
+ * Why Split Counters:
+ * -------------------
+ * Buffers registered on the IO daemon task can use the lightweight
+ * task_registered_buffers counter (simple increment/decrement) instead of
+ * atomic refcount operations. The ublk_io_release() callback checks if
+ * current == io->task to decide which counter to update.
+ *
+ * This optimization only applies before IO completion. At completion,
+ * ublk_sub_req_ref() collapses task_registered_buffers into the atomic ref.
+ * After that, all subsequent buffer unregistrations must use the atomic ref
+ * since they may be releasing the last reference.
+ *
+ * Reference Lifecycle:
+ * --------------------
+ * 1. ublk_init_req_ref(): Sets io->ref = UBLK_REFCOUNT_INIT at IO dispatch
+ *
+ * 2. During IO processing:
+ *    - On-task buffer reg: task_registered_buffers++ (no ref change)
+ *    - Off-task buffer reg: ref++ via ublk_get_req_ref()
+ *    - Buffer unregister callback (ublk_io_release):
+ *      * If on-task: task_registered_buffers--
+ *      * If off-task: ref-- via ublk_put_req_ref()
+ *
+ * 3. ublk_sub_req_ref() at IO completion:
+ *    - Computes: sub_refs = UBLK_REFCOUNT_INIT - task_registered_buffers
+ *    - Subtracts sub_refs from ref and zeroes task_registered_buffers
+ *    - This effectively collapses task_registered_buffers into the atomic ref,
+ *      accounting for the initial UBLK_REFCOUNT_INIT minus any on-task
+ *      buffers that were already counted
+ *
+ * Example (zero-copy, register on-task, unregister off-task):
+ *   - Dispatch: ref = UBLK_REFCOUNT_INIT, task_registered_buffers = 0
+ *   - Register buffer on-task: task_registered_buffers = 1
+ *   - Unregister off-task: ref-- (UBLK_REFCOUNT_INIT - 1), task_registered_buffers stays 1
+ *   - Completion via ublk_sub_req_ref():
+ *     sub_refs = UBLK_REFCOUNT_INIT - 1,
+ *     ref = (UBLK_REFCOUNT_INIT - 1) - (UBLK_REFCOUNT_INIT - 1) = 0
+ *
+ * Example (auto buffer registration):
+ *   Auto buffer registration sets task_registered_buffers = 1 at dispatch.
+ *
+ *   - Dispatch: ref = UBLK_REFCOUNT_INIT, task_registered_buffers = 1
+ *   - Buffer unregister: task_registered_buffers-- (becomes 0)
+ *   - Completion via ublk_sub_req_ref():
+ *     sub_refs = UBLK_REFCOUNT_INIT - 0, ref becomes 0
+ *
+ * Example (zero-copy, ublk server killed):
+ *   When daemon is killed, io_uring cleanup unregisters buffers off-task.
+ *   ublk_check_and_reset_active_ref() waits for the invariant to hold.
+ *
+ *   - Dispatch: ref = UBLK_REFCOUNT_INIT, task_registered_buffers = 0
+ *   - Register buffer on-task: task_registered_buffers = 1
+ *   - Daemon killed, io_uring cleanup unregisters buffer (off-task):
+ *     ref-- (UBLK_REFCOUNT_INIT - 1), task_registered_buffers stays 1
+ *   - Daemon exit check: sum = (UBLK_REFCOUNT_INIT - 1) + 1 = UBLK_REFCOUNT_INIT
+ *   - Sum equals UBLK_REFCOUNT_INIT, then both two counters are zeroed by
+ *     ublk_check_and_reset_active_ref(), so ublk_abort_queue() can proceed
+ *     and abort pending requests
+ *
+ * Batch IO Special Case:
+ * ----------------------
+ * In batch IO mode, io->task is NULL. This means ublk_io_release() always
+ * takes the off-task path (ublk_put_req_ref), decrementing io->ref. The
+ * task_registered_buffers counter still tracks registered buffers for the
+ * invariant check, even though the callback doesn't decrement it.
+ *
+ * Note: updating task_registered_buffers is protected by io->lock.
+ */
 static inline void ublk_init_req_ref(const struct ublk_queue *ubq,
 		struct ublk_io *io)
 {
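The counter arithmetic in the comment above can be checked with a small userspace model. This is only a sketch, not the driver code: the model_* and example_* names are hypothetical, plain ints stand in for the kernel's atomic refcount, locking is omitted, and MODEL_REFCOUNT_INIT is an arbitrary stand-in because the diff does not show the value of UBLK_REFCOUNT_INIT. It replays the three worked examples from the comment, plus one case where a buffer is still registered at daemon exit.

```c
#include <stdbool.h>

/* Arbitrary stand-in for UBLK_REFCOUNT_INIT (assumption, not the real value). */
#define MODEL_REFCOUNT_INIT 1000

/* Hypothetical model of the two counters kept per IO. */
struct model_io {
	int ref;                     /* models io->ref (atomic in the driver) */
	int task_registered_buffers; /* models io->task_registered_buffers */
};

/* Models dispatch: ref is armed; auto buffer registration starts at 1. */
static void model_dispatch(struct model_io *io, bool auto_buf_reg)
{
	io->ref = MODEL_REFCOUNT_INIT;
	io->task_registered_buffers = auto_buf_reg ? 1 : 0;
}

/* Models buffer registration: on-task uses the plain counter. */
static void model_register(struct model_io *io, bool on_task)
{
	if (on_task)
		io->task_registered_buffers++;
	else
		io->ref++;
}

/* Models the release callback choosing a counter by calling task. */
static void model_release(struct model_io *io, bool on_task)
{
	if (on_task)
		io->task_registered_buffers--;
	else
		io->ref--;
}

/* Models completion: collapse the plain counter into ref.
 * Returns true when this drops the last reference. */
static bool model_complete(struct model_io *io)
{
	int sub_refs = MODEL_REFCOUNT_INIT - io->task_registered_buffers;

	io->task_registered_buffers = 0;
	io->ref -= sub_refs;
	return io->ref == 0;
}

/* Models the daemon-exit check: reset only if the invariant holds. */
static bool model_check_and_reset(struct model_io *io)
{
	if (io->ref + io->task_registered_buffers != MODEL_REFCOUNT_INIT)
		return false;
	io->ref = 0;
	io->task_registered_buffers = 0;
	return true;
}

/* Zero-copy: register on-task, unregister off-task, then complete. */
bool example_zero_copy_mixed(void)
{
	struct model_io io;

	model_dispatch(&io, false);
	model_register(&io, true);
	model_release(&io, false);
	return model_complete(&io);
}

/* Auto buffer registration: unregister on-task, then complete. */
bool example_auto_buf_reg(void)
{
	struct model_io io;

	model_dispatch(&io, true);
	model_release(&io, true);
	return model_complete(&io);
}

/* Server killed: off-task cleanup restores the invariant. */
bool example_server_killed(void)
{
	struct model_io io;

	model_dispatch(&io, false);
	model_register(&io, true);
	model_release(&io, false);
	return model_check_and_reset(&io);
}

/* A buffer is still registered, so the exit check must not pass yet. */
bool example_buffer_still_registered(void)
{
	struct model_io io;

	model_dispatch(&io, false);
	model_register(&io, true);
	return model_check_and_reset(&io);
}
```

In this model the first three example functions return true and the last returns false, which matches the sums worked out in the comment: an outstanding registration pushes ref + task_registered_buffers above the initial value until the matching release arrives.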